oneAPI Math Kernel Library for C Developer Reference

The document is a Developer Reference for the Intel® oneAPI Math Kernel Library (MKL) for C, detailing its features, routines, and conventions. It covers topics such as performance enhancements, parallelism, BLAS and LAPACK routines, and various numerical methods. The reference serves as a comprehensive guide for developers using Intel MKL in their applications.

Developer Reference for Intel® oneAPI Math Kernel Library for C

Contents
Chapter 1: Developer Reference for Intel® oneAPI Math Kernel Library - C
Getting Help and Support ......................................................................... 17
What's New ............................................................................................ 18
Notational Conventions ............................................................................ 18
Overview................................................................................................ 19
Performance Enhancements.............................................................. 24
Parallelism ..................................................................................... 24
C Datatypes Specific to Intel MKL ...................................................... 25
OpenMP* Offload..................................................................................... 26
OpenMP* Offload for Intel® oneAPI Math Kernel Library ........................ 26
BLAS and Sparse BLAS Routines................................................................ 33
BLAS Routines ................................................................................ 33
Naming Conventions for BLAS Routines...................................... 33
C Interface Conventions for BLAS Routines ................................. 35
Matrix Storage Schemes for BLAS Routines ................................ 36
BLAS Level 1 Routines and Functions ......................................... 36
BLAS Level 2 Routines ............................................................. 51
BLAS Level 3 Routines ............................................................. 95
Sparse BLAS Level 1 Routines ......................................................... 121
Vector Arguments ................................................................. 122
Naming Conventions for Sparse BLAS Routines ......................... 122
Routines and Data Types........................................................ 122
BLAS Level 1 Routines That Can Work With Sparse Vectors......... 123
cblas_?axpyi ........................................................................ 123
cblas_?doti .......................................................................... 124
cblas_?dotci ......................................................................... 125
cblas_?dotui......................................................................... 125
cblas_?gthr .......................................................................... 126
cblas_?gthrz......................................................................... 127
cblas_?roti ........................................................................... 128
cblas_?sctr........................................................................... 128
Sparse BLAS Level 2 and Level 3 Routines ........................................ 129
Naming Conventions in Sparse BLAS Level 2 and Level 3............ 130
Sparse Matrix Storage Formats for Sparse BLAS Routines........... 130
Routines and Supported Operations......................................... 131
Interface Consideration.......................................................... 132
Sparse BLAS Level 2 and Level 3 Routines................................ 137
Sparse QR Routines....................................................................... 243
mkl_sparse_set_qr_hint ........................................................ 243
mkl_sparse_?_qr .................................................................. 244
mkl_sparse_qr_reorder.......................................................... 246
mkl_sparse_?_qr_factorize..................................................... 247
mkl_sparse_?_qr_solve ......................................................... 248
mkl_sparse_?_qr_qmult ........................................................ 250
mkl_sparse_?_qr_rsolve ........................................................ 252
Compact BLAS and LAPACK Functions .............................................. 253
mkl_?gemm_compact............................................................ 257
mkl_?trsm_compact .............................................................. 260
mkl_?potrf_compact.............................................................. 262
mkl_?getrfnp_compact .......................................................... 263
mkl_?geqrf_compact ............................................................. 264
mkl_?getrinp_compact .......................................................... 266
Numerical Limitations for Compact BLAS and Compact LAPACK Routines .......... 267
mkl_?get_size_compact ......................................................... 268
mkl_get_format_compact ...................................................... 268
mkl_?gepack_compact .......................................................... 269
mkl_?geunpack_compact ....................................................... 271
Inspector-executor Sparse BLAS Routines......................................... 272
Naming Conventions in Inspector-Executor Sparse BLAS Routines .......... 272
Sparse Matrix Storage Formats for Inspector-executor Sparse BLAS Routines .......... 274
Supported Inspector-executor Sparse BLAS Operations .............. 274
Two-stage Algorithm in Inspector-Executor Sparse BLAS Routines .......... 275
Matrix Manipulation Routines .................................................. 276
Inspector-Executor Sparse BLAS Analysis Routines .................... 296
Inspector-Executor Sparse BLAS Execution Routines .................. 313
BLAS-like Extensions ..................................................................... 357
cblas_?axpy_batch................................................................ 359
cblas_?axpy_batch_strided .................................................... 360
cblas_?axpby ....................................................................... 361
cblas_?copy_batch ................................................................ 362
cblas_?copy_batch_strided..................................................... 364
cblas_?gemmt ...................................................................... 365
cblas_?gemm3m................................................................... 368
cblas_?gemm_batch.............................................................. 371
cblas_?gemm_batch_strided .................................................. 374
cblas_?gemm3m_batch_strided .............................................. 377
cblas_?gemm3m_batch ......................................................... 381
cblas_?trsm_batch ................................................................ 384
cblas_?trsm_batch_strided..................................................... 387
mkl_?imatcopy ..................................................................... 389
mkl_?imatcopy_batch............................................................ 391
mkl_?imatcopy_batch_strided ................................................ 393
mkl_?omatadd_batch_strided ................................................. 394
mkl_?omatcopy .................................................................... 397
mkl_?omatcopy_batch........................................................... 398
mkl_?omatcopy_batch_strided ............................................... 400
mkl_?omatcopy2 .................................................................. 402
mkl_?omatadd ..................................................................... 404
cblas_?gemm_pack_get_size, cblas_gemm_*_pack_get_size ..... 406
cblas_?gemm_pack ............................................................... 408
cblas_gemm_*_pack ............................................................. 411
cblas_?gemm_compute ......................................................... 416
cblas_gemm_*_compute ....................................................... 419
cblas_gemm_bf16bf16f32_compute ........................................ 425
cblas_gemm_bf16bf16f32 ...................................................... 429
cblas_gemm_f16f16f32_compute............................................ 431
cblas_gemm_f16f16f32 ......................................................... 435
cblas_?gemm_free ................................................................ 438
cblas_gemm_* ..................................................................... 439
cblas_?gemv_batch_strided ................................................... 443
cblas_?gemv_batch............................................................... 444
cblas_?dgmm_batch_strided .................................................. 446
cblas_?dgmm_batch.............................................................. 448
mkl_jit_create_?gemm .......................................................... 450
mkl_jit_get_?gemm_ptr ........................................................ 452
mkl_jit_destroy .................................................................... 455
LAPACK Routines ................................................................................... 456
C Interface Conventions for LAPACK Routines.................................... 456
Matrix Layout for LAPACK Routines .................................................. 458
Matrix Storage Schemes for LAPACK Routines ................................... 460
Mathematical Notation for LAPACK Routines...................................... 467
Error Analysis ............................................................................... 468
LAPACK Linear Equation Routines .................................................... 469
LAPACK Linear Equation Computational Routines....................... 469
LAPACK Linear Equation Driver Routines .................................. 679
LAPACK Least Squares and Eigenvalue Problem Routines .................... 783
LAPACK Least Squares and Eigenvalue Problem Computational Routines .......... 784
LAPACK Least Squares and Eigenvalue Problem Driver Routines .1002
LAPACK Auxiliary Routines.............................................................1177
?lacgv ................................................................................1177
?lacrm................................................................................1178
?syconv ..............................................................................1179
?syr ...................................................................................1180
i?max1 ...............................................................................1182
?sum1 ................................................................................1182
?gelq2 ................................................................................1183
?geqr2 ...............................................................................1184
?geqrt2 ..............................................................................1186
?geqrt3 ..............................................................................1188
?getf2 ................................................................................1190
?lacn2 ................................................................................1191
?lacpy ................................................................................1193
?lakf2.................................................................................1194
?lange ................................................................................1195
?lansy ................................................................................1196
?lanhe ................................................................................1197
?lantr .................................................................................1198
LAPACKE_set_nancheck ........................................................1200
LAPACKE_get_nancheck........................................................1200
?lapmr ...............................................................................1200
?lapmt................................................................................1202
?lapy2 ................................................................................1203
?lapy3 ................................................................................1203
?laran.................................................................................1204
?larfb .................................................................................1204
?larfg .................................................................................1207
?larft ..................................................................................1208
?larfx .................................................................................1211
?large ................................................................................1212
?larnd ................................................................................1213
?larnv.................................................................................1214
?laror .................................................................................1215
?larot .................................................................................1217
?lartgp ...............................................................................1220
?lartgs................................................................................1221
?lascl .................................................................................1222
?lasd0 ................................................................................1223
?lasd1 ................................................................................1224
?lasd2 ................................................................................1227
?lasd3 ................................................................................1229
?lasd4 ................................................................................1231
?lasd5 ................................................................................1232
?lasd6 ................................................................................1233
?lasd7 ................................................................................1236
?lasd8 ................................................................................1239
?lasd9 ................................................................................1241
?lasda ................................................................................1242
?lasdq ................................................................................1245
?lasdt .................................................................................1247
?laset .................................................................................1247
?lasrt .................................................................................1249
?laswp................................................................................1250
?latm1................................................................................1251
?latm2................................................................................1253
?latm3................................................................................1255
?latm5................................................................................1259
?latm6................................................................................1262
?latme................................................................................1264
?latmr ................................................................................1268
?lauum ...............................................................................1274
?syswapr ............................................................................1275
?heswapr ............................................................................1276
?sfrk ..................................................................................1278
?hfrk ..................................................................................1279
?tfsm .................................................................................1281
?tfttp .................................................................................1283
?tfttr ..................................................................................1284
?tpqrt2 ...............................................................................1286
?tprfb .................................................................................1288
?tpttf .................................................................................1291
?tpttr .................................................................................1292
?trttf ..................................................................................1294
?trttp .................................................................................1295
?lacp2 ................................................................................1296
?larcm................................................................................1297
mkl_?tppack .......................................................................1298
mkl_?tpunpack ....................................................................1300
LAPACK Utility Functions and Routines ............................................1302
ilaver .................................................................................1303
ilaenv .................................................................................1303
?lamch ...............................................................................1306
LAPACK Test Functions and Routines ...............................................1307
?lagge ................................................................................1307
?laghe ................................................................................1308
?lagsy ................................................................................1309
?latms ................................................................................1310
Additional LAPACK Routines (Included for Compatibility with Netlib LAPACK) .......... 1314
ScaLAPACK Routines .............................................................................1318
Overview of ScaLAPACK Routines ...................................................1318
ScaLAPACK Array Descriptors.........................................................1319
Naming Conventions for ScaLAPACK Routines ..................................1321
ScaLAPACK Computational Routines................................................1322
Systems of Linear Equations: ScaLAPACK Computational Routines .......... 1322
Matrix Factorization: ScaLAPACK Computational Routines ..........1323
Solving Systems of Linear Equations: ScaLAPACK Computational Routines .......... 1337
Estimating the Condition Number: ScaLAPACK Computational Routines .......... 1353
Refining the Solution and Estimating Its Error: ScaLAPACK Computational Routines .......... 1361
Matrix Inversion: ScaLAPACK Computational Routines...............1371
Matrix Equilibration: ScaLAPACK Computational Routines ..........1375
Orthogonal Factorizations: ScaLAPACK Computational Routines ..1379
Symmetric Eigenvalue Problems: ScaLAPACK Computational Routines .......... 1448
Nonsymmetric Eigenvalue Problems: ScaLAPACK Computational Routines .......... 1481
Singular Value Decomposition: ScaLAPACK Driver Routines........1495
Generalized Symmetric-Definite Eigenvalue Problems: ScaLAPACK Computational Routines .......... 1507
ScaLAPACK Driver Routines ...........................................................1511
p?geevx .............................................................................1511
p?gesv ...............................................................................1515
p?gesvx..............................................................................1516
p?gbsv ...............................................................................1521
p?dbsv ...............................................................................1524
p?dtsv ................................................................................1526
p?posv ...............................................................................1528
p?posvx..............................................................................1530
p?pbsv ...............................................................................1535
p?ptsv ................................................................................1537
p?gels ................................................................................1539
p?syev ...............................................................................1542
p?syevd..............................................................................1545
p?syevr ..............................................................................1547
p?syevx ..............................................................................1551
p?heev ...............................................................................1557
p?heevd .............................................................................1560
p?heevr ..............................................................................1562
p?heevx .............................................................................1567
p?gesvd..............................................................................1574
p?sygvx..............................................................................1578
p?hegvx .............................................................................1585
ScaLAPACK Auxiliary Routines........................................................1592
p?lacgv...............................................................................1597
p?max1 ..............................................................................1598
pilaver................................................................................1599
pmpcol ...............................................................................1600
pmpim2..............................................................................1601
?combamax1.......................................................................1602
p?sum1 ..............................................................................1602
p?dbtrsv .............................................................................1603
p?dttrsv..............................................................................1606
p?gebal ..............................................................................1608
p?gebd2 .............................................................................1611
p?gehd2 .............................................................................1614
p?gelq2 ..............................................................................1616
p?geql2 ..............................................................................1618
p?geqr2..............................................................................1620
p?gerq2..............................................................................1622
p?getf2...............................................................................1624
p?labrd...............................................................................1626
p?lacon ..............................................................................1629
p?laconsb ...........................................................................1631
p?lacp2 ..............................................................................1632
p?lacp3 ..............................................................................1633
p?lacpy...............................................................................1635
p?laevswp...........................................................................1636
p?lahrd...............................................................................1638
p?laiect ..............................................................................1640
p?lamve .............................................................................1641
p?lange ..............................................................................1642
p?lanhs ..............................................................................1644
p?lansy, p?lanhe ..................................................................1646
p?lantr ...............................................................................1648
p?lapiv ...............................................................................1649
p?lapv2 ..............................................................................1652
p?laqge ..............................................................................1654
p?laqr0...............................................................................1655
p?laqr1...............................................................................1658
p?laqr2...............................................................................1661
p?laqr3...............................................................................1663
p?laqr5...............................................................................1666
p?laqsy...............................................................................1668
p?lared1d ...........................................................................1670
p?lared2d ...........................................................................1671
p?larf .................................................................................1672
p?larfb ...............................................................................1675
p?larfc................................................................................1678
p?larfg ...............................................................................1680
p?larft ................................................................................1682
p?larz.................................................................................1684
p?larzb ...............................................................................1687
p?larzc ...............................................................................1691
p?larzt................................................................................1693
p?lascl................................................................................1696
p?lase2 ..............................................................................1698
p?laset ...............................................................................1699
p?lasmsub ..........................................................................1701
p?lasrt................................................................................1702
p?lassq...............................................................................1704
p?laswp ..............................................................................1705
p?latra ...............................................................................1707
p?latrd ...............................................................................1708
p?latrs................................................................................1711
p?latrz................................................................................1713
p?lauu2 ..............................................................................1715
p?lauum .............................................................................1717
p?lawil................................................................................1718
p?org2l/p?ung2l...................................................................1719
p?org2r/p?ung2r..................................................................1721
p?orgl2/p?ungl2...................................................................1723
p?orgr2/p?ungr2..................................................................1725
p?orm2l/p?unm2l.................................................................1727
p?orm2r/p?unm2r................................................................1730
p?orml2/p?unml2.................................................................1734
p?ormr2/p?unmr2................................................................1737
p?pbtrsv .............................................................................1740
p?pttrsv..............................................................................1744
p?potf2...............................................................................1746
p?rot ..................................................................................1748
p?rscl .................................................................................1750
p?sygs2/p?hegs2 .................................................................1751
p?sytd2/p?hetd2..................................................................1753
p?trord ...............................................................................1756
p?trsen...............................................................................1760
p?trti2 ................................................................................1764
?lahqr2...............................................................................1765
?lamsh ...............................................................................1767
?lapst .................................................................................1768
?laqr6 ................................................................................1769
?lar1va ...............................................................................1772
?laref .................................................................................1773
?larrb2 ...............................................................................1776
?larrd2 ...............................................................................1778
?larre2 ...............................................................................1781
?larre2a..............................................................................1784
?larrf2 ................................................................................1788
?larrv2 ...............................................................................1789
?lasorte ..............................................................................1794
?lasrt2................................................................................1795
?stegr2...............................................................................1796
?stegr2a .............................................................................1799
?stegr2b .............................................................................1802
?stein2 ...............................................................................1805
?dbtf2 ................................................................................1807
?dbtrf .................................................................................1808
?dttrf .................................................................................1809
?dttrsv ...............................................................................1810
?pttrsv ...............................................................................1812
?steqr2...............................................................................1813
?trmvt ................................................................................1815
pilaenv ...............................................................................1817
pilaenvx .............................................................................1818
pjlaenv ...............................................................................1820
Additional ScaLAPACK Routines..............................................1821
ScaLAPACK Utility Functions and Routines .......................................1823
p?labad ..............................................................................1824
p?lachkieee .........................................................................1825
p?lamch .............................................................................1825
p?lasnbt .............................................................................1826
descinit ..............................................................................1827
numroc ..............................................................................1828
ScaLAPACK Redistribution/Copy Routines ........................................1829
p?gemr2d ...........................................................................1829
p?trmr2d ............................................................................1831
Sparse Solver Routines .........................................................................1833
oneMKL PARDISO - Parallel Direct Sparse Solver Interface .................1833
pardiso ...............................................................................1840
pardisoinit ..........................................................................1847
pardiso_64..........................................................................1848
mkl_pardiso_pivot ...............................................................1849
pardiso_getdiag ...................................................................1850
pardiso_export ....................................................................1851
pardiso_handle_store ...........................................................1853
pardiso_handle_restore ........................................................1854
pardiso_handle_delete..........................................................1854
pardiso_handle_store_64 ......................................................1855
pardiso_handle_restore_64 ...................................................1856
pardiso_handle_delete_64 ....................................................1857
oneMKL PARDISO Parameters in Tabular Form .........................1857
pardiso iparm Parameter.......................................................1862
PARDISO_DATA_TYPE...........................................................1876
Parallel Direct Sparse Solver for Clusters Interface............................1876
cluster_sparse_solver ...........................................................1878
cluster_sparse_solver_64......................................................1883
cluster_sparse_solver_get_csr_size ........................................1884
cluster_sparse_solver_set_csr_ptrs ........................................1885
cluster_sparse_solver_set_ptr ...............................................1887
cluster_sparse_solver_export ................................................1889
cluster_sparse_solver iparm Parameter...................................1891
Direct Sparse Solver (DSS) Interface Routines .................................1899
DSS Interface Description .....................................................1901
DSS Implementation Details..................................................1902
DSS Routines ......................................................................1903
Iterative Sparse Solvers based on Reverse Communication Interface (RCI ISS) ....1914
CG Interface Description .......................................................1915
FGMRES Interface Description ...............................................1920
RCI ISS Routines .................................................................1926
RCI ISS Implementation Details.............................................1939
Preconditioners based on Incomplete LU Factorization Technique ........1940
ILU0 and ILUT Preconditioners Interface Description .................1940
dcsrilu0 ..............................................................................1941
dcsrilut ...............................................................................1944
Sparse Matrix Checker Routines .....................................................1947
sparse_matrix_checker.........................................................1947
sparse_matrix_checker_init...................................................1949
Extended Eigensolver Routines ...............................................................1950
The FEAST Algorithm ....................................................................1950
Extended Eigensolver Functionality .................................................1952
Parallelism in Extended Eigensolver Routines ...........................1953
Achieving Performance With Extended Eigensolver Routines.......1953
Extended Eigensolver Interfaces for Eigenvalues within Interval .........1954
Extended Eigensolver Naming Conventions..............................1954
feastinit ..............................................................................1955
Extended Eigensolver Input Parameters ..................................1955
Extended Eigensolver Output Details ......................................1957
Extended Eigensolver RCI Routines ........................................1958
Extended Eigensolver Predefined Interfaces.............................1963
Extended Eigensolver Interfaces for Extremal Eigenvalues/Singular Values ........1976
Extended Eigensolver Interfaces to find largest/smallest eigenvalues ..........1976
Extended Eigensolver Interfaces to find largest/smallest singular values .......1981
mkl_sparse_ee_init ..............................................................1983
Extended Eigensolver Input Parameters for Extremal Eigenvalue Problem ..........1983
Vector Mathematical Functions ...............................................................1985
VM Data Types, Accuracy Modes, and Performance Tips.....................1986
VM Naming Conventions ...............................................................1986
VM Function Interfaces .........................................................1987
Vector Indexing Methods ...............................................................1989
VM Error Diagnostics ....................................................................1990
VM Mathematical Functions ...........................................................1991
Special Value Notations.........................................................1993
Arithmetic Functions.............................................................1994
Power and Root Functions .....................................................2012
Exponential and Logarithmic Functions ...................................2033
Trigonometric Functions ........................................................2050
Hyperbolic Functions ............................................................2083
Special Functions .................................................................2097
Rounding Functions ..............................................................2126
VM Pack/Unpack Functions ............................................................2139
v?Pack ...............................................................................2139
v?Unpack............................................................................2140
VM Service Functions....................................................................2142
vmlSetMode ........................................................................2142
vmlGetMode........................................................................2144
MKLFreeTls .........................................................................2144
vmlSetErrStatus ..................................................................2145
vmlGetErrStatus ..................................................................2146
vmlClearErrStatus ................................................................2146
vmlSetErrorCallBack.............................................................2147
vmlGetErrorCallBack ............................................................2149
vmlClearErrorCallBack ..........................................................2149
Miscellaneous VM Functions ...........................................................2149
v?CopySign .........................................................................2149
v?NextAfter.........................................................................2151
v?Fdim ...............................................................................2152
v?Fmax ..............................................................................2154
v?Fmin ...............................................................................2155
v?MaxMag...........................................................................2157
v?MinMag ...........................................................................2158
Statistical Functions..............................................................................2160
Random Number Generators..........................................................2160
Random Number Generators Conventions ...............................2161
Basic Generators..................................................................2166
Error Reporting....................................................................2169
Intel® oneMKL RNG Usage Model ...............................................2171
Service Routines ..................................................................2172
Distribution Generators.........................................................2194
Advanced Service Routines....................................................2239
Convolution and Correlation...........................................................2244
Convolution and Correlation Naming Conventions .....................2245
Convolution and Correlation Data Types ..................................2246
Convolution and Correlation Parameters..................................2246
Convolution and Correlation Task Status and Error Reporting .....2248
Convolution and Correlation Task Constructors .........................2249
Convolution and Correlation Task Editors.................................2256
Task Execution Routines........................................................2261
Convolution and Correlation Task Destructors ..........................2268
Convolution and Correlation Task Copiers ................................2269
Convolution and Correlation Usage Examples...........................2270
Convolution and Correlation Mathematical Notation and Definitions ...........2274
Convolution and Correlation Data Allocation.............................2275
Summary Statistics ......................................................................2277
Summary Statistics Naming Conventions ................................2278
Summary Statistics Data Types..............................................2278
Summary Statistics Parameters .............................................2279
Summary Statistics Task Status and Error Reporting .................2279
Summary Statistics Task Constructors ....................................2283
Summary Statistics Task Editors ............................................2285
Summary Statistics Task Computation Routines .......................2312
Summary Statistics Task Destructor .......................................2317
Summary Statistics Usage Examples ......................................2317
Summary Statistics Mathematical Notation and Definitions ........2319
Fourier Transform Functions ...................................................................2323
FFT Functions ..............................................................................2324
FFT Interface.......................................................................2325
Computing an FFT................................................................2325
Configuration Settings ..........................................................2326
FFT Descriptor Manipulation Functions ....................................2341
FFT Descriptor Configuration Functions ...................................2345
FFT Computation Functions ...................................................2347
Status Checking Functions ....................................................2354
Cluster FFT Functions ...................................................................2356
Computing Cluster FFT .........................................................2357
Distributing Data Among Processes ........................................2358
Cluster FFT Interface ............................................................2359
Cluster FFT Descriptor Manipulation Functions..........................2360
Cluster FFT Computation Functions.........................................2362
Cluster FFT Descriptor Configuration Functions.........................2365
Error Codes.........................................................................2369
PBLAS Routines....................................................................................2369
PBLAS Routines Overview..............................................................2370
PBLAS Routine Naming Conventions ...............................................2371
PBLAS Level 1 Routines.................................................................2372
p?amax ..............................................................................2373
p?asum ..............................................................................2374
p?axpy ...............................................................................2375
p?copy ...............................................................................2376
p?dot .................................................................................2377
p?dotc ................................................................................2379
p?dotu................................................................................2380
p?nrm2 ..............................................................................2381
p?scal ................................................................................2382
p?swap...............................................................................2383
PBLAS Level 2 Routines.................................................................2384
p?gemv ..............................................................................2385
p?agemv ............................................................................2387
p?ger .................................................................................2390
p?gerc................................................................................2391
p?geru ...............................................................................2393
p?hemv ..............................................................................2395
p?ahemv ............................................................................2396
p?her .................................................................................2398
p?her2 ...............................................................................2400
p?symv ..............................................................................2402
p?asymv.............................................................................2404
p?syr .................................................................................2405
p?syr2................................................................................2407
p?trmv ...............................................................................2409
p?atrmv..............................................................................2411
p?trsv ................................................................................2413
PBLAS Level 3 Routines.................................................................2415
p?geadd .............................................................................2416
p?tradd ..............................................................................2417
p?gemm .............................................................................2419
p?hemm .............................................................................2421
p?herk................................................................................2423
p?her2k ..............................................................................2425
p?symm .............................................................................2427
p?syrk ................................................................................2429
p?syr2k ..............................................................................2431
p?tran ................................................................................2434
p?tranu ..............................................................................2435
p?tranc...............................................................................2436
p?trmm ..............................................................................2437
p?trsm ...............................................................................2440
Partial Differential Equations Support ......................................................2442
Trigonometric Transform Routines...................................................2442
Trigonometric Transforms Implemented ..................................2443
Sequence of Invoking TT Routines ..........................................2444
Trigonometric Transform Interface Description .........................2445
TT Routines.........................................................................2446
Common Parameters of the Trigonometric Transforms ...............2453
Trigonometric Transform Implementation Details ......................2456
Fast Poisson Solver Routines .........................................................2457
Poisson Solver Implementation ..............................................2457
Sequence of Invoking Poisson Solver Routines .........................2463
Fast Poisson Solver Interface Description ................................2465
Routines for the Cartesian Solver ...........................................2466
Routines for the Spherical Solver ...........................................2475
Common Parameters for the Poisson Solver .............................2482
Poisson Solver Implementation Details....................................2491
Nonlinear Optimization Problem Solvers ..................................................2492
Nonlinear Solver Organization and Implementation ...........................2492
Nonlinear Solver Routine Naming Conventions .................................2494
Nonlinear Least Squares Problem without Constraints .......................2494
?trnlsp_init .........................................................................2495
?trnlsp_check ......................................................................2497
?trnlsp_solve.......................................................................2498
?trnlsp_get .........................................................................2500
?trnlsp_delete .....................................................................2501
Nonlinear Least Squares Problem with Linear (Bound) Constraints ......2502
?trnlspbc_init ......................................................................2502
?trnlspbc_check...................................................................2504
?trnlspbc_solve....................................................................2506
?trnlspbc_get ......................................................................2507
?trnlspbc_delete ..................................................................2509
Jacobian Matrix Calculation Routines...............................................2509
?jacobi_init .........................................................................2510
?jacobi_solve ......................................................................2511
?jacobi_delete .....................................................................2512
?jacobi ...............................................................................2512
?jacobix..............................................................................2513
Support Functions ................................................................................2515
Version Information......................................................................2518
mkl_get_version ..................................................................2518
mkl_get_version_string ........................................................2519
Threading Control ........................................................................2520
mkl_set_num_threads ..........................................................2521
mkl_domain_set_num_threads ..............................................2522
mkl_set_num_threads_local ..................................................2523
mkl_set_dynamic.................................................................2525
mkl_get_max_threads ..........................................................2526
mkl_domain_get_max_threads ..............................................2526
mkl_get_dynamic ................................................................2527
mkl_set_num_stripes ...........................................................2528
mkl_get_num_stripes ...........................................................2529
Error Handling .............................................................................2530
Error Handling for Linear Algebra Routines ..............................2530
Handling Fatal Errors ............................................................2533
Character Equality Testing .............................................................2534
lsame.................................................................................2534
lsamen ...............................................................................2534
Timing ........................................................................................2535
second/dsecnd ....................................................................2535
mkl_get_cpu_clocks .............................................................2536
mkl_get_cpu_frequency........................................................2536
mkl_get_max_cpu_frequency ................................................2537
mkl_get_clocks_frequency ....................................................2537
Memory Management ...................................................................2538
mkl_free_buffers .................................................................2538
mkl_thread_free_buffers.......................................................2539
mkl_disable_fast_mm ..........................................................2539
mkl_mem_stat ....................................................................2540
mkl_peak_mem_usage .........................................................2541
mkl_malloc .........................................................................2542
mkl_calloc ..........................................................................2542
mkl_realloc .........................................................................2543
mkl_free.............................................................................2544
mkl_set_memory_limit .........................................................2544
Usage Examples for the Memory Functions ..............................2545
Single Dynamic Library Control ......................................................2546
mkl_set_interface_layer........................................................2546
mkl_set_threading_layer ......................................................2547
mkl_set_xerbla....................................................................2548
mkl_set_progress ................................................................2549
mkl_set_pardiso_pivot..........................................................2550
Conditional Numerical Reproducibility Control...................................2550
mkl_cbwr_set......................................................................2551
mkl_cbwr_get .....................................................................2552
mkl_cbwr_get_auto_branch ..................................................2553
Named Constants for CNR Control ..........................................2554
Reproducibility Conditions .....................................................2555
Usage Examples for CNR Support Functions.............................2556
Miscellaneous ..............................................................................2557
mkl_progress ......................................................................2557
mkl_enable_instructions .......................................................2558
mkl_set_env_mode ..............................................................2561
mkl_verbose .......................................................................2561
mkl_verbose_output_file.......................................................2562
mkl_set_mpi .......................................................................2563
mkl_finalize ........................................................................2564
BLACS Routines ...................................................................................2565
Matrix Shapes..............................................................................2566
Repeatability and Coherence..........................................................2567
BLACS Combine Operations ...........................................................2570
?gamx2d ............................................................................2571
?gamn2d ............................................................................2572
?gsum2d ............................................................................2574
BLACS Point To Point Communication ..............................................2575
?gesd2d .............................................................................2577
?trsd2d...............................................................................2578
?gerv2d ..............................................................................2578
?trrv2d ...............................................................................2579
BLACS Broadcast Routines.............................................................2579
?gebs2d .............................................................................2581
?trbs2d...............................................................................2581
?gebr2d..............................................................................2582
?trbr2d ...............................................................................2583
BLACS Support Routines ...............................................................2584
Initialization Routines ...........................................................2584
Destruction Routines ............................................................2590
Informational Routines .........................................................2592
Miscellaneous Routines .........................................................2594
BLACS Routines Usage Examples....................................................2595
Data Fitting Functions ...........................................................................2595
Data Fitting Function Naming Conventions .......................................2595
Data Fitting Function Data Types ....................................................2596
Mathematical Conventions for Data Fitting Functions.........................2596
Data Fitting Usage Model...............................................................2599
Data Fitting Usage Examples .........................................................2599
Data Fitting Function Task Status and Error Reporting .......................2605
Data Fitting Task Creation and Initialization Routines ........................2607
df?NewTask1D ......................................................................2607
Task Configuration Routines...........................................................2609
df?EditPPSpline1D ..............................................................2610
df?EditPtr
dfiEditVal
df?EditIdxPtr
df?QueryPtr
dfiQueryVal
df?QueryIdxPtr
Data Fitting Computational Routines
df?Construct1D
df?Interpolate1D/df?InterpolateEx1D
df?Integrate1D/df?IntegrateEx1D
df?SearchCells1D/df?SearchCellsEx1D
df?InterpCallBack
df?IntegrCallBack
df?SearchCellsCallBack
Data Fitting Task Destructors
dfDeleteTask
Appendix A: Linear Solvers Basics
Sparse Linear Systems
Matrix Fundamentals
Direct Method
Sparse Matrix Storage Formats
DSS Symmetric Matrix Storage
DSS Nonsymmetric Matrix Storage
DSS Structurally Symmetric Matrix Storage
DSS Distributed Symmetric Matrix Storage
Sparse BLAS CSR Matrix Storage Format
Sparse BLAS CSC Matrix Storage Format
Sparse BLAS Coordinate Matrix Storage Format
Sparse BLAS Diagonal Matrix Storage Format
Sparse BLAS Skyline Matrix Storage Format
Sparse BLAS BSR Matrix Storage Format
Appendix B: Routine and Function Arguments
Vector Arguments in BLAS
Vector Arguments in Vector Math
Matrix Arguments
Appendix C: FFTW Interface to Intel® oneAPI Math Kernel Library (oneMKL)
Notational Conventions
FFTW2 Interface to Intel® oneAPI Math Kernel Library (oneMKL)
Wrappers Reference
Limitations of the FFTW2 Interface to Intel® oneAPI Math Kernel Library (oneMKL)
Installing FFTW2 Interface Wrappers
MPI FFTW2 Wrappers
FFTW3 Interface to Intel® oneAPI Math Kernel Library (oneMKL)
Using FFTW3 Wrappers
Building Your Own Wrapper Library
Building an Application With FFTW3 Interface Wrappers
Running FFTW3 Interface Wrapper Examples
MPI FFTW3 Wrappers
Appendix D: Code Examples
BLAS Code Examples
Fourier Transform Functions Code Examples
FFT Code Examples
Examples for Cluster FFT Functions


Auxiliary Data Transformations

Appendix F: oneMKL Functionality
BLAS Functionality
Transposition Functionality
LAPACK Functionality
DFT Functionality
Sparse BLAS Functionality
Sparse Solvers Functionality
Random Number Generators Functionality
Vector Math Functionality
Data Fitting Functionality
Summary Statistics Functionality
Bibliography
Glossary
Notices and Disclaimers


Chapter 1: Developer Reference for Intel® oneAPI Math Kernel Library - C
For detailed information on setting up and using Intel® oneAPI Math Kernel Library (oneMKL), refer to the
Developer Guide for Linux and the Developer Guide for Windows.
For more documentation on this and other products, visit the oneAPI Documentation Library.
Intel® Math Kernel Library is now Intel® oneAPI Math Kernel Library (oneMKL).
Documentation for versions of Intel® Math Kernel Library older than 2023.0 is available for download only.
See Downloadable Documentation.
This publication describes the C interface.

Basic Linear Algebra Subprograms (BLAS): The BLAS routines provide vector, matrix-vector, and matrix-matrix operations.

Sparse BLAS: The Sparse BLAS routines provide basic operations on sparse vectors and matrices.

Sparse QR: The Sparse QR routines provide a multifrontal sparse QR factorization method for solving a sparse system of linear equations.

LAPACK: The LAPACK routines solve systems of linear equations, least squares problems, eigenvalue and singular value problems, and Sylvester's equations.

Statistical Functions: The Statistical Functions provide a set of routines implementing commonly used pseudorandom number generators (RNGs) with continuous distributions.

Direct and Iterative Sparse Solvers: Among several options for solving sparse linear systems of equations, oneMKL offers a direct sparse solver based on PARDISO*, which is referred to here as Intel MKL PARDISO.

Vector Mathematics Functions: The Vector Mathematics (VM) functions compute core mathematical functions on vector arguments.

Vector Statistics Functions: The Vector Statistics (VS) functions generate vectors of pseudorandom numbers with different types of statistical distributions and perform convolution and correlation computations.

Fourier Transform Functions: The Fourier Transform Functions offer several options for computing Fast Fourier Transforms (FFTs).

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/
PerformanceIndex.
Notice revision #20201201

Getting Help and Support

Intel provides a support website that contains a rich repository of self-help information, including getting
started tips, known product issues, product errata, license information, user forums, and more. Visit the
Intel® oneAPI Math Kernel Library (oneMKL) support website at http://www.intel.com/software/products/support/.


What's New
This Developer Reference documents the Intel® oneAPI Math Kernel Library (oneMKL) release for the C interface.
Intel® Math Kernel Library is now Intel® oneAPI Math Kernel Library (oneMKL). Documentation for older
versions of Intel® Math Kernel Library is available for download only. For a list of available documentation
downloads by product version, see these pages:
• Download Documentation for Intel® Parallel Studio XE
• Download Documentation for Intel® System Studio
The manual has been updated to reflect enhancements to the product, as well as improvements and error
corrections.


Notational Conventions
This manual uses the following terms to refer to operating systems:

Windows* OS: This term refers to information that is valid on all supported Windows* operating systems.

Linux* OS: This term refers to information that is valid on all supported Linux* operating systems.

macOS*: This term refers to information that is valid on Intel®-based systems running the macOS* operating system.

This manual uses the following notational conventions:

• Routine name shorthand (for example, ?ungqr instead of cungqr/zungqr).
• Font conventions used for distinction between the text and the code.

Routine Name Shorthand

For shorthand, names that contain a question mark "?" represent groups of routines with similar
functionality. Each group typically consists of routines used with four basic data types: single-precision real,
double-precision real, single-precision complex, and double-precision complex. The question mark is used to
indicate any or all possible varieties of a function; for example:

?swap: Refers to all four data types of the vector-vector ?swap routine: sswap, dswap, cswap, and zswap.
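The "?" shorthand maps naturally onto C11 type-generic selection. As a minimal sketch, the four functions below, demo_sswap, demo_dswap, demo_cswap, and demo_zswap, are hypothetical unit-stride loops (not the oneMKL routines), and the demo_swap macro plays the role of "?swap" by choosing the single/double, real/complex variant from the element type of its arguments.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical stand-ins for the s, d, c, and z members of a "?swap" group.
   Each exchanges the first n elements of x and y (unit stride only). */
static void demo_sswap(size_t n, float *x, float *y) {
    for (size_t i = 0; i < n; ++i) { float t = x[i]; x[i] = y[i]; y[i] = t; }
}
static void demo_dswap(size_t n, double *x, double *y) {
    for (size_t i = 0; i < n; ++i) { double t = x[i]; x[i] = y[i]; y[i] = t; }
}
static void demo_cswap(size_t n, float complex *x, float complex *y) {
    for (size_t i = 0; i < n; ++i) { float complex t = x[i]; x[i] = y[i]; y[i] = t; }
}
static void demo_zswap(size_t n, double complex *x, double complex *y) {
    for (size_t i = 0; i < n; ++i) { double complex t = x[i]; x[i] = y[i]; y[i] = t; }
}

/* "?swap": the element type of x selects the s, d, c, or z variant. */
#define demo_swap(n, x, y) _Generic((x),   \
    float *: demo_sswap,                   \
    double *: demo_dswap,                  \
    float complex *: demo_cswap,           \
    double complex *: demo_zswap)((n), (x), (y))
```

For example, demo_swap(2, a, b) on two double arrays calls demo_dswap, while the same call on float complex arrays calls demo_cswap.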

Font Conventions
The following font conventions are used:

lowercase courier: Code examples, for example:

a[k+i][j] = matrix[i][j];

and data types, for example, const float*

lowercase courier mixed with UpperCase courier: Function names, for example, vmlSetMode

lowercase courier italic: Variables in argument and parameter descriptions, for example, incx

*: Used as a multiplication symbol in code examples and equations, and where required by the programming language syntax.

Overview
Intel® oneAPI Math Kernel Library (oneMKL) is optimized for performance on Intel processors. oneMKL also
runs on non-Intel x86-compatible processors.

NOTE
oneMKL provides limited input validation to minimize the performance overheads. It is your
responsibility when using oneMKL to ensure that input data has the required format and does not
contain invalid characters. These can cause unexpected behavior of the library. Examples of the inputs
that may result in unexpected behavior:
• Not-a-number (NaN) and other special floating-point values
• Large inputs that may lead to accumulator overflow
As the oneMKL API accepts raw pointers, it is your application's responsibility to validate the buffer
sizes before passing them to the library. The library requires subroutine and function parameters to be
valid before being passed. While some oneMKL routines do limited checking of parameter errors, your
application should check for NULL pointers, for example.
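As a minimal sketch of the checking this note asks the application to do, the hypothetical helper below (check_vector_input and its status codes are not part of oneMKL) rejects a NULL pointer, a negative length or non-positive stride, a buffer too short for the strided access, and non-finite values before the data is handed to a library routine. It assumes the BLAS-style rule that a vector of length n with positive stride incx touches elements 0, incx, ..., (n-1)*incx; BLAS also allows negative strides, which this sketch does not handle.

```c
#include <math.h>    /* isfinite, NAN */
#include <stddef.h>  /* size_t, NULL */

/* Hypothetical status codes for this sketch only; they are not oneMKL codes. */
enum input_status { INPUT_OK = 0, INPUT_NULL, INPUT_BAD_SIZE, INPUT_TOO_SHORT, INPUT_NOT_FINITE };

/* Validate a strided vector argument.
   n: logical length, incx: stride (must be positive here),
   buf_len: number of elements actually allocated behind x. */
static enum input_status check_vector_input(const double *x, long long n,
                                            long long incx, size_t buf_len) {
    if (x == NULL) return INPUT_NULL;
    if (n < 0 || incx <= 0) return INPUT_BAD_SIZE;
    if (n == 0) return INPUT_OK;
    /* The last element accessed is x[(n - 1) * incx]. */
    if ((size_t)((n - 1) * incx) >= buf_len) return INPUT_TOO_SHORT;
    for (long long i = 0; i < n; ++i)
        if (!isfinite(x[i * incx])) return INPUT_NOT_FINITE;  /* NaN or infinity */
    return INPUT_OK;
}
```

An application would call such a check once per argument and only pass the data on when it returns INPUT_OK.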

The Intel® oneAPI Math Kernel Library includes Fortran routines and functions optimized for Intel® processor-
based computers running operating systems that support multiprocessing. In addition to the Fortran
interface, Intel® oneAPI Math Kernel Library (oneMKL) includes a C-language interface for the Discrete
Fourier transform functions, as well as for the Vector Mathematics, Vector Statistics, and many other
functions. For hardware and software requirements to use Intel® oneAPI Math Kernel Library (oneMKL),
see Intel® oneAPI Math Kernel Library (oneMKL) Release Notes.

NOTE
Function calls at runtime for Intel® oneAPI Math Kernel Library (oneMKL) libraries on the Microsoft
Windows* operating system can use the LoadLibrary() function and related loading functions in the
static, dynamic, and single-dynamic library linking models. These functions attempt to access the
loader lock, which, when used within or at the same time as another DllMain function call, can lead to a
deadlock. If possible, avoid calling Intel® oneAPI Math Kernel Library (oneMKL) from a DllMain function
or at the same time as other calls to DllMain, even on separate threads. Refer to the Microsoft
documentation about DllMain and Dynamic-Link Library Best Practices for more details.

BLAS Routines
The BLAS routines and functions are divided into the following groups according to the operations they
perform:
• BLAS Level 1 Routines perform operations of both addition and reduction on vectors of data. Typical
operations include scaling and dot products.
• BLAS Level 2 Routines perform matrix-vector operations, such as matrix-vector multiplication, rank-1 and
rank-2 matrix updates, and solution of triangular systems.
• BLAS Level 3 Routines perform matrix-matrix operations, such as matrix-matrix multiplication, rank-k
update, and solution of triangular systems.


Starting from release 8.0, Intel® oneAPI Math Kernel Library (oneMKL) also supports the Fortran 95 interface
to the BLAS routines.
Starting from release 10.1, a number of BLAS-like Extensions were added to enable the user to perform
certain data manipulations, including in-place and out-of-place matrix transposition operations combined
with simple matrix arithmetic operations.
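To make the three BLAS levels concrete, the sketch below gives one naive reference loop per level: a dot product (Level 1), y := alpha*A*x + beta*y (Level 2), and C := alpha*A*B + beta*C (Level 3). The ref_ names are hypothetical, the matrices are row-major with unit strides, and, unlike the BLAS, the code does not special-case beta = 0, so output arrays must be initialized. These loops only illustrate the operations; they are not how oneMKL implements them.

```c
#include <stddef.h>

/* Level 1 (vector-vector): returns x . y for length-n vectors; O(n) work. */
static double ref_ddot(size_t n, const double *x, const double *y) {
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) s += x[i] * y[i];
    return s;
}

/* Level 2 (matrix-vector): y := alpha*A*x + beta*y, A is m-by-n row-major; O(mn) work. */
static void ref_dgemv(size_t m, size_t n, double alpha, const double *A,
                      const double *x, double beta, double *y) {
    for (size_t i = 0; i < m; ++i)
        y[i] = alpha * ref_ddot(n, &A[i * n], x) + beta * y[i];
}

/* Level 3 (matrix-matrix): C := alpha*A*B + beta*C with A m-by-k, B k-by-n,
   C m-by-n, all row-major; O(mnk) work on O(mk + kn + mn) data. */
static void ref_dgemm(size_t m, size_t n, size_t k, double alpha, const double *A,
                      const double *B, double beta, double *C) {
    for (size_t i = 0; i < m; ++i)
        for (size_t j = 0; j < n; ++j) {
            double s = 0.0;
            for (size_t p = 0; p < k; ++p) s += A[i * k + p] * B[p * n + j];
            C[i * n + j] = alpha * s + beta * C[i * n + j];
        }
}
```

The work counts in the comments show why Level 3 routines gain the most from optimization: they perform O(n^3) arithmetic on O(n^2) data, so each element loaded from memory can be reused many times.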

Sparse BLAS Routines

The Sparse BLAS Level 1 Routines and Functions and the Sparse BLAS Level 2 and Level 3 Routines operate on
sparse vectors and matrices. These routines perform operations similar to those of the BLAS Level 1, 2, and 3
routines. The Sparse BLAS routines take advantage of vector and matrix sparsity: they allow you to store only
the non-zero elements of vectors and matrices. Intel® oneAPI Math Kernel Library (oneMKL) also supports a
Fortran 95 interface to the Sparse BLAS routines.
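Storing only the non-zero elements usually means keeping two arrays: the non-zero values and their positions. The minimal sketch below uses that compressed form for a sparse vector x and computes its dot product with a dense vector y, touching only the stored entries. The function name is hypothetical and the indices are zero-based. It illustrates the storage idea and is not a oneMKL routine.

```c
#include <stddef.h>

/* Dot product of a sparse vector x, stored in compressed form as nz values
   x_val[] with their zero-based positions x_idx[], and a dense vector y.
   Only the nz stored entries are read; the zeros of x are never stored. */
static double sparse_dense_dot(size_t nz, const double *x_val,
                               const size_t *x_idx, const double *y) {
    double s = 0.0;
    for (size_t i = 0; i < nz; ++i) s += x_val[i] * y[x_idx[i]];
    return s;
}
```

For example, the dense vector {0, 2, 0, 0, 5} is stored as values {2, 5} at positions {1, 4}, and its dot product with {1, 2, 3, 4, 5} costs two multiplications instead of five.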

Sparse QR
Sparse QR in Intel® oneAPI Math Kernel Library (oneMKL) is a set of routines used to solve sparse linear systems
whose matrices have real coefficients and general structure. The Sparse QR workflow is divided into three steps:
reordering, factorization, and solving. Currently, only CSR format is supported for the input matrix, and
Sparse QR operates on the matrix handle used in all SpBLAS IE routines. (For details on how to create a
matrix handle, refer to mkl_sparse_?_create_csr.)
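Because Sparse QR accepts only CSR input, it helps to see what the format encodes. The minimal sketch below stores a matrix as row pointers, column indices, and values (zero-based), then computes y := A*x by visiting only the stored non-zeros. The csr_matrix struct and csr_matvec function are hypothetical illustrations, not the oneMKL matrix handle or its API.

```c
#include <stddef.h>

/* A sparse matrix in CSR (compressed sparse row) form, zero-based:
   the non-zeros of row i are values[row_ptr[i]] .. values[row_ptr[i+1]-1],
   and col_idx[p] is the column of values[p]. */
typedef struct {
    size_t rows;
    const size_t *row_ptr;  /* length rows + 1 */
    const size_t *col_idx;  /* length nnz */
    const double *values;   /* length nnz */
} csr_matrix;

/* y := A*x, reading only the stored non-zeros. */
static void csr_matvec(const csr_matrix *A, const double *x, double *y) {
    for (size_t i = 0; i < A->rows; ++i) {
        double s = 0.0;
        for (size_t p = A->row_ptr[i]; p < A->row_ptr[i + 1]; ++p)
            s += A->values[p] * x[A->col_idx[p]];
        y[i] = s;
    }
}
```

For the 3-by-3 matrix [1 0 2; 0 3 0; 4 0 5], the arrays are row_ptr = {0, 2, 3, 5}, col_idx = {0, 2, 1, 0, 2}, and values = {1, 2, 3, 4, 5}.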

LAPACK Routines
The Intel® oneAPI Math Kernel Library fully supports the LAPACK 3.7 set of computational, driver, auxiliary
and utility routines.
The original versions of LAPACK from which that part of Intel® oneAPI Math Kernel Library (oneMKL) was
derived can be obtained from http://www.netlib.org/lapack/index.html. The authors of LAPACK are E.
Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S.
Hammarling, A. McKenney, and D. Sorensen.
The LAPACK routines can be divided into the following groups according to the operations they perform:
• Routines for solving systems of linear equations, factoring and inverting matrices, and estimating
condition numbers (see LAPACK Routines: Linear Equations).
• Routines for solving least squares problems, eigenvalue and singular value problems, and Sylvester's
equations (see LAPACK Routines: Least Squares and Eigenvalue Problems).
Starting from release 8.0, Intel® oneAPI Math Kernel Library (oneMKL) also supports the Fortran 95 interface
to LAPACK computational and driver routines. This interface provides an opportunity for simplified calls of
LAPACK routines with fewer required arguments.
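For intuition about what the linear-equation routines compute, here is a minimal textbook sketch of Gaussian elimination with partial pivoting for a small dense system A x = b. It follows the same factor-then-solve idea as the ?getrf/?getrs pair, but it is not LAPACK code. gepp_solve is a hypothetical name, and the matrix is row-major. The sketch stops at the first exactly zero pivot; LAPACK's ?getrf instead completes the factorization and reports the position in info. It also has none of LAPACK's blocking or error analysis.

```c
#include <math.h>    /* fabs */
#include <stddef.h>

/* Solve A x = b in place. A is n-by-n row-major and is overwritten by its
   row-permuted LU factors; b is overwritten by x. Returns 0 on success, or
   k + 1 (1-based, loosely like LAPACK's info > 0) if pivot column k is zero. */
static int gepp_solve(size_t n, double *A, double *b) {
    for (size_t k = 0; k < n; ++k) {
        size_t p = k;  /* partial pivoting: pick the largest |A[i][k]|, i >= k */
        for (size_t i = k + 1; i < n; ++i)
            if (fabs(A[i * n + k]) > fabs(A[p * n + k])) p = i;
        if (A[p * n + k] == 0.0) return (int)k + 1;
        if (p != k) {  /* swap rows k and p of A and b */
            for (size_t j = 0; j < n; ++j) {
                double t = A[k * n + j]; A[k * n + j] = A[p * n + j]; A[p * n + j] = t;
            }
            double t = b[k]; b[k] = b[p]; b[p] = t;
        }
        for (size_t i = k + 1; i < n; ++i) {  /* eliminate below the pivot */
            double l = A[i * n + k] / A[k * n + k];
            A[i * n + k] = l;  /* store the multiplier (the L factor) */
            for (size_t j = k + 1; j < n; ++j) A[i * n + j] -= l * A[k * n + j];
            b[i] -= l * b[k];
        }
    }
    for (size_t i = n; i-- > 0;) {  /* back substitution with U */
        double s = b[i];
        for (size_t j = i + 1; j < n; ++j) s -= A[i * n + j] * b[j];
        b[i] = s / A[i * n + i];
    }
    return 0;
}
```

For the system with rows (2, 1, 1), (4, -6, 0), (-2, 7, 2) and right-hand side (5, -2, 9), the sketch returns 0 and leaves x = (1, 1, 2) in b.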

Sparse Solver Routines

Direct sparse solver routines in Intel® oneAPI Math Kernel Library (oneMKL) (see Sparse Solver Routines)
solve sparse linear systems with symmetric and structurally symmetric matrices with real or complex
coefficients. For symmetric matrices, these Intel® oneAPI Math Kernel Library (oneMKL) subroutines can solve
both positive-definite and indefinite systems. Intel® oneAPI Math Kernel Library (oneMKL) includes a solver
based on the PARDISO* sparse solver, referred to as Intel® oneAPI Math Kernel Library (oneMKL) PARDISO, as
well as an alternative set of user-callable direct sparse solver routines.
If you use the Intel® oneAPI Math Kernel Library (oneMKL) PARDISO sparse solver, please cite:
O. Schenk and K. Gartner. Solving unsymmetric sparse systems of linear equations with PARDISO. J. of Future
Generation Computer Systems, 20(3):475-487, 2004.
Intel® oneAPI Math Kernel Library (oneMKL) also provides an iterative sparse solver (see Sparse Solver
Routines) that uses Sparse BLAS level 2 and 3 routines and works with different sparse data formats.

Extended Eigensolver Routines
The Extended Eigensolver RCI Routines are a set of high-performance numerical routines for solving standard
(Ax = λx) and generalized (Ax = λBx) eigenvalue problems, where A and B are symmetric or Hermitian. They
yield all the eigenvalues and eigenvectors within a given search interval. They are based on the Feast algorithm,
an innovative fast and stable numerical algorithm presented in [Polizzi09], which deviates fundamentally
from the traditional Krylov subspace iteration based techniques (Arnoldi and Lanczos algorithms [Bai00]) or
other Davidson-Jacobi techniques [Sleijpen96]. The Feast algorithm is inspired by the density-matrix
representation and contour integration technique in quantum mechanics.
It is free from orthogonalization procedures. Its main computational tasks consist of solving very few inner
independent linear systems with multiple right-hand sides and one reduced eigenvalue problem orders of
magnitude smaller than the original one. The Feast algorithm combines simplicity and efficiency and offers
many important capabilities for achieving high performance, robustness, accuracy, and scalability on parallel
architectures. This algorithm is expected to significantly augment numerical performance in large-scale
modern applications.
Some of the characteristics of the Feast algorithm [Polizzi09] are:
• Converges quickly in 2-3 iterations with very high accuracy
• Naturally captures all eigenvalue multiplicities
• No explicit orthogonalization procedure
• Can reuse the basis of pre-computed subspace as suitable initial guess for performing outer-refinement
iterations
This capability can also be used for solving a series of eigenvalue problems that are close to one another.
• The number of internal iterations is independent of the size of the system and the number of eigenpairs in
the search interval
• The inner linear systems can be solved either iteratively (even with modest relative residual error) or
directly

VM Functions
The Vector Mathematics functions (see Vector Mathematical Functions) include a set of highly optimized
implementations of certain computationally expensive core mathematical functions (power, trigonometric,
exponential, hyperbolic, etc.) that operate on vectors of real and complex numbers.
Application programs that might significantly improve performance with VM include nonlinear programming
software, computation of integrals, and many others. VM provides interfaces for both Fortran and C.

Statistical Functions
Vector Statistics (VS) contains three sets of functions (see Statistical Functions) providing:
• Pseudorandom, quasi-random, and non-deterministic random number generator subroutines
implementing basic continuous and discrete distributions. To provide best performance, the VS
subroutines use calls to highly optimized Basic Random Number Generators (BRNGs) and a set of vector
mathematical functions.
• A wide variety of convolution and correlation operations.
• Initial statistical analysis of raw single and double precision multi-dimensional datasets.
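To make the idea of a basic random number generator concrete, the sketch below implements a 31-bit multiplicative congruential generator, x_n = a * x_{n-1} mod m with output u_n = x_n / m, using a = 1132489760 and m = 2^31 - 1. These are taken to be the MCG31m1 parameters listed for oneMKL's VSL_BRNG_MCG31. That is an assumption here, so verify it against the BRNG properties in the Statistical Functions chapter. The code is a teaching sketch, not the oneMKL implementation. All names in it are hypothetical, and mapping a zero seed to 1 is this sketch's own choice.

```c
#include <stddef.h>
#include <stdint.h>

#define MCG31_A 1132489760u  /* multiplier (assumed MCG31m1 value) */
#define MCG31_M 2147483647u  /* modulus 2^31 - 1 (prime) */

typedef struct { uint32_t x; } mcg31_state;

/* The state must lie in [1, m-1]: a zero state would stay zero forever. */
static void mcg31_init(mcg31_state *s, uint32_t seed) {
    seed %= MCG31_M;
    s->x = seed ? seed : 1u;
}

/* Advance x_n = a * x_{n-1} mod m (64-bit product avoids overflow)
   and return u_n = x_n / m, which lies strictly between 0 and 1. */
static double mcg31_next(mcg31_state *s) {
    s->x = (uint32_t)(((uint64_t)MCG31_A * s->x) % MCG31_M);
    return (double)s->x / (double)MCG31_M;
}

/* Vector-style interface: fill r[0..n-1] with uniforms scaled to lie
   between a and b, in the spirit of a VS generator call. */
static void mcg31_uniform(mcg31_state *s, size_t n, double *r, double a, double b) {
    for (size_t i = 0; i < n; ++i) r[i] = a + (b - a) * mcg31_next(s);
}
```

Starting from seed 1, the first state is 1132489760 (the multiplier itself), and every later state stays in [1, m-1] because m is prime.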

Fourier Transform Functions

The Intel® oneAPI Math Kernel Library (oneMKL) multidimensional Fast Fourier Transform (FFT) functions with
mixed radix support (see Fourier Transform Functions) provide uniformity of discrete Fourier transform
computation and combine functionality with ease of use. Both Fortran and C interface specifications are
given. There is also a cluster version of FFT functions, which runs on distributed-memory architectures and is
provided only for Intel® 64 architectures.
The FFT functions provide fast computation via the FFT algorithms for arbitrary lengths. See the Intel®
oneAPI Math Kernel Library (oneMKL) Developer Guide for the specific radices supported.


Partial Differential Equations Support

Intel® oneAPI Math Kernel Library (oneMKL) provides tools for solving Partial Differential Equations (PDE)
(see Partial Differential Equations Support). These tools are the Trigonometric Transform interface routines
and the Poisson Solver.
The Trigonometric Transform routines may be helpful to users who implement their own solvers similar to the
Intel® oneAPI Math Kernel Library (oneMKL) Poisson Solver. Users can improve the performance of their
solvers by using the fast sine, cosine, and staggered cosine transforms implemented in the Trigonometric
Transform interface.
The Poisson Solver is designed for fast solving of simple Helmholtz, Poisson, and Laplace problems. The
Trigonometric Transform interface, which underlies the solver, is based on the Intel® oneAPI Math Kernel
Library (oneMKL) FFT interface (refer to Fourier Transform Functions), optimized for Intel® processors.

Support Functions
The Intel® oneAPI Math Kernel Library (oneMKL) support functions (see Support Functions) are used to
support the operation of the Intel® oneAPI Math Kernel Library (oneMKL) software and provide basic
information on the library and library operation, such as the current library version, timing, setting and
measuring of CPU frequency, error handling, and memory allocation.
Starting from release 10.0, the Intel® oneAPI Math Kernel Library (oneMKL) support functions provide
additional threading control.
Starting from release 10.1, Intel® oneAPI Math Kernel Library (oneMKL) selectively supports a Progress
Routine feature to track the progress of a lengthy computation and/or interrupt the computation using a
callback function mechanism. The user application can define a function called mkl_progress that is regularly
called from the Intel® oneAPI Math Kernel Library (oneMKL) routines supporting the progress routine feature.
See Progress Routine in Support Functions for reference. Refer to the description of a specific LAPACK or
DSS/PARDISO function to see whether it supports this feature.
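The callback mechanism can be sketched without the library. In the hypothetical program below, run_with_progress stands in for a long-running routine and polls a user callback after each step. The callback's parameters (thread, step, stage name, and stage-name length) are modeled on the mkl_progress description. The convention that a nonzero return requests a stop is an assumption of this sketch, so check the Progress Routine reference for the exact prototype and return semantics before defining the real mkl_progress.

```c
#include <stdio.h>
#include <string.h>

/* Callback type modeled on the mkl_progress parameter list. */
typedef int (*progress_fn)(int *thread, int *step, char *stage, int lstage);

static int g_last_step = -1;  /* last step the callback saw (for illustration) */
static int g_stop_after = 3;  /* hypothetical limit: ask to stop at this step */

/* User callback: report progress, then return nonzero to request a stop
   (the convention assumed in this sketch) or 0 to continue. */
static int my_progress(int *thread, int *step, char *stage, int lstage) {
    (void)thread;
    g_last_step = *step;
    printf("%.*s: step %d\n", lstage, stage, *step);
    return *step >= g_stop_after;
}

/* Hypothetical long-running routine that polls the callback after every step.
   Returns 0 if it completed, or the step at which it was interrupted. */
static int run_with_progress(int nsteps, progress_fn cb) {
    char stage[] = "DEMO";
    int thread = 0;
    for (int step = 1; step <= nsteps; ++step) {
        /* ... one chunk of real work would go here ... */
        if (cb(&thread, &step, stage, (int)strlen(stage)) != 0)
            return step;
    }
    return 0;
}
```

With a limit of 3, run_with_progress(10, my_progress) prints three progress lines and returns 3, while a two-step run completes and returns 0.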

oneMKL Initialization on CPU

When a user first invokes any oneMKL function, there is an initialization cost to keep in mind. When we run
an application with oneMKL C/Fortran functions on a CPU, some time is spent in service routines. Here is what
happens inside the library when we call oneMKL C/Fortran functions:
• The first step is setting xerbla, a oneMKL routine that acts as an error handler for the BLAS, LAPACK, VS,
and VM domains when an input parameter has an invalid value. See xerbla for more information.
• The next step is to check which oneMKL verbose mode was chosen. oneMKL verbose mode is used to
profile oneMKL usage in the application. For more information, see Using oneMKL Verbose Mode (Linux,
Windows).
The oneMKL Verbose feature is enabled only for certain domains, such as BLAS (and BLAS-like
extensions), LAPACK, selected functionality in ScaLAPACK and FFT, and (in the DPC++ API only) RNG.

• The next item in the list is the oneMKL dispatcher. The dispatcher checks the hardware used for running
the application and the available instruction set. Based on the results, different function implementations
(optimized for different hardware and instruction sets) are called. For details, see Instruction Set–Specific
Dispatching (Linux, Windows).

• During the function run (or even before), memory may need to be allocated. oneMKL has a memory
manager that provides a set of support functions, the ability to redefine memory functions, and fast
internal memory allocation with memory reuse. For more information, see Memory Management,
Redefining Memory Functions (Linux), and Redefining Memory Functions (Windows).

• In threading mode, oneMKL also calls its own threading manager, which checks various environment
variables and sets the number of threads. For more information, see Improving Performance with
Threading (Linux, Windows).

As an example, BLAS dgemm was run on a 4th Gen Intel® Xeon® Scalable processor system. Matrices A and B
were 10000x10000. Running dgemm in sequential mode took 32.5 seconds (32500 milliseconds), of which:
• Setting oneMKL xerbla took 0.001 milliseconds.
• Setting/checking oneMKL verbose mode took 0.009 milliseconds.
• Checking MKL_CBWR settings and detecting the CPU with the oneMKL dispatcher took 0.004 milliseconds.
• Additional internal memory allocations in dgemm took 0.009 milliseconds, followed by 0.002 milliseconds of
deallocation.
In this example, several mkl_malloc calls allocate memory for the A, B, and C matrices before dgemm runs;
overall, memory allocation took around 0.084 milliseconds. After dgemm completes, several mkl_free calls
free the A, B, and C matrix memory, which took around 5.159 milliseconds.
If you run dgemm with Intel OpenMP* threading, 24 milliseconds are spent in the oneMKL threading manager.
If you run dgemm with TBB threading, around 5 milliseconds are spent in the oneMKL threading manager.
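Adding up the service-routine times reported for the sequential run puts the one-time overhead in perspective. This is plain arithmetic on the figures above. The mkl_malloc/mkl_free calls around dgemm and the threading-manager times are left out.

```latex
\[
t_{\mathrm{service}} = 0.001 + 0.009 + 0.004 + 0.009 + 0.002 = 0.025~\mathrm{ms},
\qquad
\frac{t_{\mathrm{service}}}{t_{\mathrm{dgemm}}} = \frac{0.025~\mathrm{ms}}{32500~\mathrm{ms}} \approx 7.7 \times 10^{-7}
\]
```

That is, less than one millionth of the total run time for a problem of this size.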


Performance Enhancements
The Intel® oneAPI Math Kernel Library has been optimized by exploiting both processor and system features
and capabilities. Special care has been given to those routines that profit most from cache-management
techniques. These especially include matrix-matrix operation routines such as dgemm().
In addition, code optimization techniques have been applied to minimize dependencies of integer and
floating-point unit scheduling on results within the processor.
The major optimization techniques used throughout the library include:
• Loop unrolling to minimize loop management costs
• Blocking of data to improve data reuse opportunities
• Copying to reduce chances of data eviction from cache
• Data prefetching to help hide memory latency
• Multiple simultaneous operations (for example, dot products in dgemm) to eliminate stalls due to
arithmetic unit pipelines
• Use of hardware features such as the SIMD arithmetic units, where appropriate
These are techniques from which the arithmetic code benefits the most.
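Blocking is the technique with the largest effect on matrix-matrix routines such as dgemm. The minimal sketch below computes C += A*B tile by tile so that each block of A, B, and C is reused while it is still in cache. It also hoists A[i][k] into a local variable for the inner loop. The code is plain C, not the oneMKL kernel, and TILE = 64 is an arbitrary assumed block size. Real libraries tune block sizes per cache level and processor, and they add unrolling, prefetching, and SIMD on top of this.

```c
#include <stddef.h>

#define TILE 64  /* assumed tile edge; tune for the target cache */

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* C += A*B for n-by-n row-major matrices, computed tile by tile. Each
   TILE x TILE block is reused many times while it is cache-resident,
   instead of streaming whole rows and columns from memory. */
static void blocked_dgemm(size_t n, const double *A, const double *B, double *C) {
    for (size_t ii = 0; ii < n; ii += TILE)
        for (size_t kk = 0; kk < n; kk += TILE)
            for (size_t jj = 0; jj < n; jj += TILE) {
                size_t i_end = min_size(ii + TILE, n);
                size_t k_end = min_size(kk + TILE, n);
                size_t j_end = min_size(jj + TILE, n);
                for (size_t i = ii; i < i_end; ++i)
                    for (size_t k = kk; k < k_end; ++k) {
                        double a_ik = A[i * n + k];  /* reused across the j loop */
                        for (size_t j = jj; j < j_end; ++j)
                            C[i * n + j] += a_ik * B[k * n + j];
                    }
            }
}
```

The result matches a plain triple loop. Only the order of the memory accesses changes, which is what lets each tile be reused from cache.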


Parallelism
Intel® oneAPI Math Kernel Library (oneMKL) offers performance gains through parallelism provided by the
symmetric multiprocessing (SMP) feature. You can obtain improvements from SMP in the following ways:
• One way is based on user-managed threads in the program and further distribution of the operations over
the threads based on data decomposition, domain decomposition, control decomposition, or some other
parallelizing technique. Each thread can use any of the Intel® oneAPI Math Kernel Library (oneMKL)
functions (except for the deprecated ?lacon LAPACK routine) because the library has been designed to be
thread-safe.
• Another method is to use the FFT and BLAS level 3 routines. They have been parallelized and require no
alterations of your application to gain the performance enhancements of multiprocessing. Performance
using multiple processors on the level 3 BLAS shows excellent scaling. Since the threads are called and
managed within the library, the application does not need to be recompiled to be thread-safe.
• Yet another method is to use tuned LAPACK routines. Currently these include the single- and
double-precision flavors of routines for QR factorization of general matrices, triangular factorization of
general and symmetric positive-definite matrices, solving systems of equations with such matrices, as well
as solving symmetric eigenvalue problems.
For instructions on setting the number of available processors for the BLAS level 3 and LAPACK routines, see
the Intel® oneAPI Math Kernel Library (oneMKL) Developer Guide.
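A minimal sketch of the first approach, user-managed threads with data decomposition, assuming OpenMP*. Compile with -fopenmp or your compiler's equivalent. Without it, the pragma is ignored and the loop runs serially with the same result. The rows of y = A*x are split among the threads, and each thread calls a serial kernel on the rows it owns. serial_dot is a hypothetical stand-in for a thread-safe library call such as a BLAS dot product, and matvec_by_rows is likewise illustrative.

```c
/* Hypothetical stand-in for a serial, thread-safe library kernel
   (for example, a BLAS dot product): returns x . y for length-n vectors. */
static double serial_dot(long n, const double *x, const double *y) {
    double s = 0.0;
    for (long j = 0; j < n; ++j) s += x[j] * y[j];
    return s;
}

/* y = A*x for an m-by-n row-major A. The row range is split among the
   threads (data decomposition); each y[i] is written by exactly one
   thread, so no locking is needed. */
static void matvec_by_rows(long m, long n, const double *A, const double *x, double *y) {
#pragma omp parallel for schedule(static)
    for (long i = 0; i < m; ++i)
        y[i] = serial_dot(n, &A[i * n], x);
}
```

The same pattern applies when each thread calls a oneMKL routine on its own slice of the data. The library's thread safety is what makes the concurrent calls legal.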


C Datatypes Specific to Intel MKL

The mkl_types.h file defines datatypes specific to Intel® oneAPI Math Kernel Library (oneMKL). For each
C/C++ type, the following list gives the Fortran type and the LP32, LP64, and ILP64 equivalents (size in bytes).

MKL_INT (MKL integer)
  Fortran type: INTEGER (default INTEGER)
  LP32 equivalent: C/C++: int; Fortran: INTEGER*4 (4 bytes)
  LP64 equivalent: C/C++: int; Fortran: INTEGER*4 (4 bytes)
  ILP64 equivalent: C/C++: long long (or define MKL_ILP64 macros); Fortran: INTEGER*8 (8 bytes)

MKL_UINT (MKL unsigned integer)
  Fortran type: N/A
  LP32 equivalent: C/C++: unsigned int (4 bytes)
  LP64 equivalent: C/C++: unsigned int (4 bytes)
  ILP64 equivalent: C/C++: unsigned long long (8 bytes)

MKL_LONG (MKL long integer)
  Fortran type: N/A
  LP32 equivalent: C/C++: long (4 bytes)
  LP64 equivalent: C/C++: long (Windows: 4 bytes; Linux, Mac: 8 bytes)
  ILP64 equivalent: C/C++: long (8 bytes)

MKL_Complex8 (like C99 complex float)
  Fortran type: COMPLEX*8
  LP32, LP64, and ILP64 equivalents: 8 bytes

MKL_Complex16 (like C99 complex double)
  Fortran type: COMPLEX*16
  LP32, LP64, and ILP64 equivalents: 16 bytes

You can redefine datatypes specific to Intel® oneAPI Math Kernel Library (oneMKL). One reason to do this is if
you have your own types which are binary-compatible with Intel® oneAPI Math Kernel Library (oneMKL)
datatypes, with the same representation or memory layout. To redefine a datatype, use one of these
methods:
• Insert the #define statement redefining the datatype before the mkl.h header file #include statement.
For example,
#define MKL_INT size_t
#include "mkl.h"
• Use the compiler -D option to redefine the datatype. For example,

...-DMKL_INT=size_t...

NOTE
If you redefine Intel® oneAPI Math Kernel Library (oneMKL) datatypes, you are responsible for making sure
that your definition is compatible with that of Intel® oneAPI Math Kernel Library (oneMKL). Otherwise, it
might cause unpredictable results or crash the application.
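Before redefining a oneMKL datatype, you can let the compiler confirm that your type really has the expected size and layout. The sketch below applies C11 _Static_assert to two hypothetical user types, my_int and my_complex16, which are not taken from mkl_types.h. They mirror the list above: a 4-byte integer for LP64, or an 8-byte integer when MKL_ILP64 is defined, and a 16-byte real/imaginary pair comparable to C99 double complex.

```c
#include <complex.h>
#include <stddef.h>  /* offsetof */

/* Hypothetical user types intended to be binary-compatible with the
   corresponding oneMKL types; they are illustrations only. */
#ifdef MKL_ILP64
typedef long long my_int;  /* ILP64: 8-byte integer, like MKL_INT */
#else
typedef int my_int;        /* LP64: 4-byte integer, like MKL_INT */
#endif

/* Real part first, then imaginary part: 16 bytes, like MKL_Complex16. */
typedef struct { double re; double im; } my_complex16;

/* Compile-time layout checks to pass before substituting a oneMKL type. */
_Static_assert(sizeof(my_complex16) == 16, "expected 16 bytes");
_Static_assert(offsetof(my_complex16, im) == sizeof(double),
               "imaginary part must directly follow the real part");
_Static_assert(sizeof(my_complex16) == sizeof(double complex),
               "size should match C99 double complex");
#ifdef MKL_ILP64
_Static_assert(sizeof(my_int) == 8, "ILP64 integers are 8 bytes");
#else
_Static_assert(sizeof(my_int) == 4, "LP64 integers are 4 bytes");
#endif
```

Compiling once with and once without -DMKL_ILP64 exercises both integer checks. If an assertion fails, the build stops before an incompatible redefinition can reach the library.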


OpenMP* Offload
This section describes how to perform OpenMP offload computations using Intel® oneAPI Math Kernel Library.

OpenMP* Offload for Intel® oneAPI Math Kernel Library

You can use Intel® oneAPI Math Kernel Library (oneMKL) and OpenMP* offload to run standard oneMKL
computations on Intel GPUs. You can find the list of oneMKL features that support OpenMP offload in the
mkl_omp_offload.h header file, which includes:
• All Level 1, 2, and 3 BLAS functions through the CBLAS and BLAS interfaces, supporting both synchronous
and asynchronous execution
• BLAS-like extensions: cblas_?axpby, cblas_?axpy_batch{_strided},
cblas_?copy_batch{_strided}, cblas_?gemv_batch{_strided}, cblas_?dgmm_batch{_strided},
cblas_hgemm, cblas_gemm_bf16bf16f32, cblas_gemm_s8u8s32, cblas_?gemm_batch{_strided},
cblas_?trsm_batch{_strided}, cblas_?trmm_oop, cblas_?trsm_oop, and cblas_?gemmt
functionality through the CBLAS and BLAS interfaces as well as mkl_?omatcopy_batch_strided,
mkl_?imatcopy_batch_strided, and mkl_?omatadd_batch_strided, supporting both synchronous
and asynchronous execution
• LAPACK, including LAPACK-like extensions
• All computations on the Intel GPU (supports both synchronous and asynchronous execution):
• ?getrf_batch
• ?getrf_batch_strided
• ?getrfnp_batch
• ?getrfnp_batch_strided
• ?getri
• ?getri_oop_batch
• ?getri_oop_batch_strided
• ?getrs
• ?getrs_batch_strided
• ?getrsnp_batch_strided
• ?gels_batch_strided
• ?potrf
• ?potri
• ?potrs
• ?trtri
• ?trtrs
• Hybrid; some computations on the Intel GPU (supports synchronous execution):
• ?gels
• ?geqrf
• ?getrf (all computations on the CPU for n <= 256)
• mkl_?getrfnp (all computations on the CPU for n <= 512)
• ?ormqr, ?unmqr
• dsyevd, zheevd
• dsyevx, zheevx
• dsygvd, zhegvd
• dsygvx, zhegvx
• Interface support only; all computations on the CPU (supports synchronous execution):
• ?gebrd
• ?gesvd
• ?gesvda_batch_strided
• ?orgqr, ?ungqr
• ?steqr

• ?syev, ?heev
• ssyevd, cheevd
• ssyevx, cheevx
• ssygvd, chegvd
• ssygvx, chegvx
• ?sytrd, ?hetrd
• Vector Statistics
• Random number generators

NOTE
All distributions are supported. See https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-c/2025-0/distribution-generators.html

Basic random number generators:

• VSL_BRNG_MCG31
• VSL_BRNG_MCG59
• VSL_BRNG_PHILOX4X32X10
• VSL_BRNG_MRG32K3A
• VSL_BRNG_MT19937
• VSL_BRNG_MT2203
• VSL_BRNG_SOBOL

Important: Check the oneMKL DPC++ developer reference for the BRNG data type used in the
distributions if the offload device does not support sycl::aspect::fp64.

• Summary statistics
Supports the vsl?SSCompute routine for the following estimates:
• VSL_SS_MEAN
• VSL_SS_SUM
• VSL_SS_2R_MOM
• VSL_SS_2R_SUM
• VSL_SS_3R_MOM
• VSL_SS_3R_SUM
• VSL_SS_4R_MOM
• VSL_SS_4R_SUM
• VSL_SS_2C_MOM
• VSL_SS_2C_SUM
• VSL_SS_3C_MOM
• VSL_SS_3C_SUM
• VSL_SS_4C_MOM
• VSL_SS_4C_SUM
• VSL_SS_KURTOSIS
• VSL_SS_SKEWNESS
• VSL_SS_MIN
• VSL_SS_MAX
• VSL_SS_VARIATION
Supported methods:
• VSL_SS_METHOD_FAST
• VSL_SS_METHOD_FAST_USER_MEAN
• FFTs through both DFTI and FFTW3 interfaces in one, two, and three dimensions.
• For COMPLEX_STORAGE, only the DFTI_COMPLEX_COMPLEX format is currently supported on CPU and
GPU devices.
• Both synchronous and asynchronous computations are supported.


• For R2C/C2R transforms on the GPU, only DFTI_CONJUGATE_EVEN_STORAGE=DFTI_COMPLEX_COMPLEX is
supported (implying DFTI_PACKED_FORMAT=DFTI_CCE_FORMAT).
• NOTE: INCONSISTENT_CONFIGURATION errors at compute time indicate an invalid descriptor or invalid
data pointer. Double-check your data mapping if you encounter such errors.

• Arbitrary strides and batch distances are not supported for multi-dimensional R2C transforms offloaded
to the GPU. Considering the last dimension of the data, every element must be separated from its two
nearest peers (along another dimension and/or in another batch) by a constant distance. For example,
to compute a batched, two-dimensional R2C FFT of size [N2, N1] with input strides [0, S2, 1]
(row-major layout with unit elementary stride and no offset), INPUT_DISTANCE must be equal to
N2*S2 so that every element is separated from its nearest last-dimension counterpart(s) by a distance
S2 (in this example), even across batches.
• Due to the variadic implementation of DftiComputeForward and DftiComputeBackward, out-of-place
compute calls using the DFTI API with the OpenMP 5.1 dispatch construct differ from common dispatch
construct usage by requiring a "need_device_ptr" clause. The oneMKL examples provided on
installation demonstrate this usage.
• Transforms on GPU devices may overwrite padding entries in the output data that are not part of the FFT result.
• Sparse BLAS
• mkl_sparse_{s, d}_create_csr
• mkl_sparse_{s, d}_export_csr
• mkl_sparse_destroy
• mkl_sparse_order
• Currently supports only CSR matrix format.
• mkl_sparse_set_mv_hint
• Currently supports only SPARSE_OPERATION_NON_TRANSPOSE with CSR matrix format for general
MV (SPARSE_MATRIX_TYPE_GENERAL) and triangular MV (SPARSE_MATRIX_TYPE_TRIANGULAR with
fill modes SPARSE_FILL_MODE_LOWER/SPARSE_FILL_MODE_UPPER).
• mkl_sparse_set_sv_hint
• mkl_sparse_set_sm_hint
• Currently supports only CSR matrix format and SPARSE_MATRIX_TYPE_TRIANGULAR type.
• mkl_sparse_optimize
• Supports optimization for mkl_sparse_{s, d}_mv functionality based on supported hints added
through the mkl_sparse_set_mv_hint offload function.
• Supports optimization for mkl_sparse_{s, d}_trsv functionality based on supported hints added
through the mkl_sparse_set_sv_hint offload function.
• Supports optimization for mkl_sparse_{s, d}_trsm functionality based on supported hints added
through the mkl_sparse_set_sm_hint offload function.
• Both synchronous and asynchronous execution are supported.

NOTE Although you can run the mkl_sparse_optimize offload function asynchronously, you are responsible
for managing the data dependency between the optimization routine and the execution routines.

• mkl_sparse_{s, d}_mv:
• Currently supports only SPARSE_OPERATION_NON_TRANSPOSE with the following combinations of
matrix types:
• SPARSE_MATRIX_TYPE_GENERAL
• SPARSE_MATRIX_TYPE_TRIANGULAR with fill modes SPARSE_FILL_MODE_LOWER/
SPARSE_FILL_MODE_UPPER and diagonal types SPARSE_DIAG_UNIT/SPARSE_DIAG_NON_UNIT
• SPARSE_MATRIX_TYPE_SYMMETRIC with fill modes SPARSE_FILL_MODE_LOWER/
SPARSE_FILL_MODE_UPPER and diagonal type SPARSE_DIAG_NON_UNIT (currently,
SPARSE_DIAG_UNIT is not supported)

• Both synchronous and asynchronous computations are supported.
• mkl_sparse_{s, d}_mm:
• Currently supported only with SPARSE_MATRIX_TYPE_GENERAL and
SPARSE_OPERATION_NON_TRANSPOSE.
• Both SPARSE_LAYOUT_ROW_MAJOR and SPARSE_LAYOUT_COLUMN_MAJOR are supported.
• Both synchronous and asynchronous computations are supported.
• mkl_sparse_{s, d}_trsv
• Currently supports only CSR matrix format with SPARSE_MATRIX_TYPE_TRIANGULAR and
SPARSE_OPERATION_NON_TRANSPOSE.
• Both synchronous and asynchronous computations are supported.
• mkl_sparse_{s, d}_trsm
• Currently supports only CSR matrix format with SPARSE_MATRIX_TYPE_TRIANGULAR and
SPARSE_OPERATION_NON_TRANSPOSE.
• Both SPARSE_LAYOUT_ROW_MAJOR and SPARSE_LAYOUT_COLUMN_MAJOR are supported.
• Both synchronous and asynchronous computations are supported.
• mkl_sparse_sp2m
• Currently supported only with SPARSE_MATRIX_TYPE_GENERAL.
• Both synchronous and asynchronous computations are supported with Level Zero backend, and
currently only synchronous computations are supported with OpenCL backend.
• You can run the mkl_sparse_sp2m offload function asynchronously, but you are responsible for the
data dependency between the first stage and the second stage of mkl_sparse_sp2m.
• mkl_sparse_sp2m internally creates arrays for the sparse C matrix output. Because these arrays may
subsequently be used on both the host and the device, they are allocated internally in USM shared
memory. The arrays are managed by the library and are cleaned up when the corresponding C matrix
handle is destroyed; however, the mkl_sparse_{s,d}_export_csr() OpenMP offload function provides
direct access to them. If you need the data beyond the lifetime of the C matrix handle, copy it into
your own arrays. USM shared memory is chosen for the C arrays so that they work in all OpenMP
Offload use cases; this carries a performance cost compared with USM device memory, which would be
faster but would not work in all subsequent use cases.
• The created C matrix in the provided handle is not guaranteed to be sorted, so the
mkl_sparse_order() OpenMP offload API is provided for user convenience if that property is
needed.
• The input matrix handle A is not required to be sorted on input, but the input matrix handle B is
required to be sorted on input.
• In Sparse BLAS, the usage model consists of the creation stage, the inspection stage, the execution
stage, and the destruction stage. For Sparse BLAS with C OpenMP Offload, all stages can be
asynchronously executed, provided any data dependencies are already respected.
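In the FFT item above, the batched two-dimensional R2C example requires INPUT_DISTANCE to equal N2*S2. The following plain-C sketch makes that constraint concrete; r2c_input_distance_ok is a hypothetical helper, not a oneMKL function. It walks the input offsets b*dist + n2*S2 of the example (strides [0, S2, 1], no offset) for one fixed last-dimension index and checks that consecutive offsets are all exactly S2 apart. With more than one batch this holds precisely when dist == N2*S2, because the gap from the last row of batch b to the first row of batch b+1 is dist - (N2-1)*S2.

```c
#include <stdbool.h>

/* Hypothetical checker for the layout rule described above. Element
   (b, n2, n1) of the input sits at b*dist + n2*s2 + n1. Since n1 is added
   uniformly, checking n1 = 0 covers every last-dimension index. */
bool r2c_input_distance_ok(long n2_size, long s2, long dist, long batches)
{
    long prev = 0;                       /* offset of element (0, 0, 0) */
    for (long b = 0; b < batches; b++) {
        for (long n2 = 0; n2 < n2_size; n2++) {
            long off = b * dist + n2 * s2;
            /* every neighbour, within a batch or across batches, must be s2 away */
            if ((b > 0 || n2 > 0) && off - prev != s2)
                return false;
            prev = off;
        }
    }
    return true;
}
```

For example, with N2 = 4 and S2 = 10, a distance of 40 passes for any batch count, while 41 fails as soon as there is a second batch.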
The OpenMP offload feature from Intel® oneAPI Math Kernel Library (oneMKL) enables you to run oneMKL
computations on Intel GPUs through the standard oneMKL APIs within an omp dispatch section. For
example, the standard CBLAS API for single precision real data type matrix multiply is:
void cblas_sgemm(const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE TransA,
const CBLAS_TRANSPOSE TransB, const MKL_INT M, const MKL_INT N,
const MKL_INT K, const float alpha, const float *A, const MKL_INT lda,
const float *B, const MKL_INT ldb, const float beta, float *C,
const MKL_INT ldc);
If the oneMKL function (for example, cblas_sgemm) is called outside of an omp dispatch section, or if
offload is disabled, then the CPU implementation is dispatched. If the same function is called within an omp
dispatch section and offload is possible, then the GPU implementation is dispatched. By default, the
execution of the oneMKL function within a dispatch construct is synchronous. OpenMP offload computations
can be done asynchronously by adding the nowait clause to the dispatch construct. This ensures that the
host thread encountering the task region generated by this construct is not blocked by the oneMKL call;
instead, control returns to the host thread for further use. To finish the asynchronous (nowait)
computations and ensure memory and execution model consistency (for example, that the results of a
computation are ready in memory to map), follow the last such nowait computation with the stand-alone
construct #pragma omp taskwait.
From the OpenMP Application Programming Interface version 5.0 specification: "The taskwait region binds to
the current task region [i.e., in this case, the last nowait computation]. The current task region is suspended
at an implicit task scheduling point associated with the construct. The current task region remains suspended
until all child tasks that it generated before the taskwait region complete execution [currently, depend clause
is not supported]."
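The nowait and taskwait pattern described above is plain OpenMP, so it can be tried without oneMKL. As a minimal sketch, assuming a compiler with OpenMP 4.5 or later target support, the following uses ordinary target regions as stand-ins for oneMKL calls inside #pragma omp dispatch (scale_two_async is a hypothetical function). Two independent kernels are launched as deferred target tasks, and taskwait blocks until both have finished before the target data region copies the arrays back. Built without OpenMP, the pragmas are ignored and the loops simply run on the host.

```c
#include <stddef.h>

/* Stand-in for two back-to-back asynchronous oneMKL calls: each target
   region with nowait becomes a deferred target task. */
void scale_two_async(float *x, float *y, size_t n, float alpha)
{
    #pragma omp target data map(tofrom: x[0:n], y[0:n])
    {
        #pragma omp target nowait
        for (size_t i = 0; i < n; i++)
            x[i] *= alpha;

        #pragma omp target nowait
        for (size_t i = 0; i < n; i++)
            y[i] *= alpha;

        /* Wait for both child target tasks before the data region ends,
           so the copy back to the host sees the finished results. */
        #pragma omp taskwait
    }
}
```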

Example
Examples for using the OpenMP offload for oneMKL are located in the Intel® oneAPI Math Kernel Library
(oneMKL) installation directory, under:

examples/c_offload
The following code snippet shows how to use OpenMP offload for single-call oneMKL features such as most
dense linear algebra functionality.

#include <omp.h>
#include "mkl.h"
#include "mkl_omp_offload.h" // MKL header file for OpenMP offload
int dnum = 0;
int main() {
    float *a, *b, *c, alpha = 1.0, beta = 1.0;
    MKL_INT m = 150, n = 200, k = 128, lda = m, ldb = k, ldc = m;
    MKL_INT sizea = lda * k, sizeb = ldb * n, sizec = ldc * n;
    // allocate matrices and check pointers
    a = (float *)mkl_malloc(sizea * sizeof(float), 64);
    // ...
    // initialize matrices
    #pragma omp target map(c[0:sizec])
    {
        for (MKL_INT i = 0; i < sizec; i++) {
            c[i] = 42;
        }
        // ...
    }
    // run gemm on host, use standard MKL interface
    cblas_sgemm(CblasColMajor, CblasNoTrans, CblasNoTrans, m, n, k, alpha, a,
                lda, b, ldb, beta, c, ldc);
    // map the a, b, and c matrices on the device memory
    #pragma omp target data map(to:a[0:sizea],b[0:sizeb]) map(tofrom:c[0:sizec]) device(dnum)
    {
        // run gemm on gpu, use standard MKL interface within a dispatch construct
        // if offload is not possible, default to cpu
        #pragma omp dispatch device(dnum)
        cblas_sgemm(
            CblasColMajor, CblasNoTrans, CblasNoTrans, m, n, k,
            alpha, a, lda, b, ldb, beta, c, ldc
        );
    }
    // Free matrices
    mkl_free(a);
    // …
}

Some oneMKL functionality requires calling a set of functions to perform the corresponding computation.
This is the case, for example, for the Discrete Fourier Transform, where a typical computation involves
calling the following functions:
DFTI_EXTERN MKL_LONG DftiCreateDescriptor(DFTI_DESCRIPTOR_HANDLE*,
enum DFTI_CONFIG_VALUE,
enum DFTI_CONFIG_VALUE,
MKL_LONG, ...);
DFTI_EXTERN MKL_LONG DftiCommitDescriptor(DFTI_DESCRIPTOR_HANDLE);
DFTI_EXTERN MKL_LONG DftiComputeForward(DFTI_DESCRIPTOR_HANDLE, void*, ...);
DFTI_EXTERN MKL_LONG DftiComputeBackward(DFTI_DESCRIPTOR_HANDLE, void*, ...);
DFTI_EXTERN MKL_LONG DftiFreeDescriptor(DFTI_DESCRIPTOR_HANDLE*);

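In that case, only a subset of the calls must be wrapped in an omp dispatch construct, as shown in the
following code snippet for DFTI, where only the DftiCommitDescriptor and DftiComputeForward calls for the
GPU descriptor are dispatched.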
#include <stdio.h>
#include <omp.h>
#include "mkl.h"
#include "mkl_omp_offload.h"
int main(void)
{
    const int devNum = 0;
    const MKL_LONG N = 64; // Size of 1D transform
    MKL_LONG status = 0;
    MKL_LONG statusGPU = 0;
    DFTI_DESCRIPTOR_HANDLE descHandle = NULL;
    DFTI_DESCRIPTOR_HANDLE descHandleGPU = NULL;
    MKL_Complex8 *x = NULL;
    MKL_Complex8 *xGPU = NULL;
    printf("Create DFTI descriptor\n");
    status = DftiCreateDescriptor(&descHandle, DFTI_SINGLE, DFTI_COMPLEX, 1, N);
    printf("Create GPU DFTI descriptor\n");
    statusGPU = DftiCreateDescriptor(&descHandleGPU, DFTI_SINGLE, DFTI_COMPLEX,
                                     1, N);
    printf("Commit DFTI descriptor\n");
    status = DftiCommitDescriptor(descHandle);
    printf("Commit GPU DFTI descriptor\n");
    // the GPU descriptor is committed within a dispatch construct
    #pragma omp dispatch device(devNum)
    statusGPU = DftiCommitDescriptor(descHandleGPU);
    printf("Allocate memory for input array\n");
    x = (MKL_Complex8 *)mkl_malloc(N*sizeof(MKL_Complex8), 64);
    printf("Allocate memory for GPU input array\n");
    xGPU = (MKL_Complex8 *)mkl_malloc(N*sizeof(MKL_Complex8), 64);
    printf("Initialize input for forward FFT\n");
    // init x and xGPU ...
    printf("Compute forward FFT in-place\n");
    status = DftiComputeForward(descHandle, x);
    printf("Compute GPU forward FFT in-place\n");
    #pragma omp target data map(tofrom:xGPU[0:N]) device(devNum)
    {
        #pragma omp dispatch device(devNum)
        statusGPU = DftiComputeForward(descHandleGPU, xGPU);
    }
    // only the status of the last call on each path is checked here
    if (status != DFTI_NO_ERROR || statusGPU != DFTI_NO_ERROR)
        goto cleanup;
    // use results now in x and xGPU ...
cleanup:
    DftiFreeDescriptor(&descHandle);
    DftiFreeDescriptor(&descHandleGPU);
    mkl_free(x);
    mkl_free(xGPU);
}
For asynchronous execution of multi-call oneMKL computation, the nowait clause needs to be used only on
the call to the function performing the actual computation (for example,
DftiCompute{Forward,Backward}). For instance, the following snippet shows how the DFTI example above
could be changed to have two, back-to-back, asynchronous (nowait) computations dispatched, with a
taskwait at the end of the second to ensure the completion of both computations before their results are
accessed:
printf("Compute Intel GPU forward FFT 1 in-place\n");
#pragma omp target data map(tofrom:x1GPU[0:N1], x2GPU[0:N2]) device(devNum)
{
#pragma omp dispatch device(devNum) nowait
status1GPU = DftiComputeForward(descHandle1GPU, x1GPU);
printf("Compute Intel GPU forward FFT 2 in-place\n");
#pragma omp dispatch device(devNum) nowait
status2GPU = DftiComputeForward(descHandle2GPU, x2GPU);
#pragma omp taskwait
}
if (status1GPU != DFTI_NO_ERROR) goto failed;
if (status2GPU != DFTI_NO_ERROR) goto failed;
For sparse BLAS computations, the workflow ‘create a CSR matrix handle’ → ‘compute’ → ‘destroy the CSR
matrix handle’ must be structured so that the offloaded data arrays remain alive throughout the full workflow. For
instance, if you are using a target data map, then the workflow must be contained in a single target data
region. On the other hand, if the arrays were allocated directly using omp_target_alloc() or the Intel
Extensions omp_target_alloc_host/omp_target_alloc_device/omp_target_alloc_shared, then the
workflow must be contained at least in a subset of the scope where those arrays are usable; that is, before
the corresponding calls to omp_target_free. The following snippet shows how the Sparse BLAS OpenMP
Offload example for mkl_sparse_s_mv() could be run using a target data map region, where N is the
number of rows, M is the number of columns, and NNZ is the number of non-zero entries of the sparse
matrix csrA_gpu, x is the input vector, and the output is stored in the z array:
#pragma omp target data map(to:ia[0:N+1],ja[0:NNZ],a[0:NNZ],x[0:M]) map(tofrom:z[0:N]) device(devNum)
{
    #pragma omp dispatch device(devNum)
    status_gpu1 = mkl_sparse_s_create_csr(&csrA_gpu, SPARSE_INDEX_BASE_ZERO, N, M, ia, ia + 1,
                                          ja, a);
    #pragma omp dispatch device(devNum)
    status_gpu2 = mkl_sparse_s_mv(SPARSE_OPERATION_NON_TRANSPOSE, alpha, csrA_gpu, descrA,
                                  x, beta, z);
    #pragma omp dispatch device(devNum)
    status_gpu3 = mkl_sparse_destroy(csrA_gpu);
}
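When checking offloaded results, it can help to recompute them on the host. With SPARSE_OPERATION_NON_TRANSPOSE and a general matrix descriptor, the mkl_sparse_s_mv call above computes z := alpha*A*x + beta*z. The plain-C reference below is a validation sketch only, not the oneMKL implementation. ref_csr_mv is a hypothetical name, it uses int instead of MKL_INT, and it assumes zero-based indexing as with SPARSE_INDEX_BASE_ZERO. Passing ia and ia + 1 as the row-start and row-end arrays, as the create_csr call does, makes row i span entries ia[i] up to ia[i+1].

```c
/* Reference y := alpha*A*x + beta*y for an n_rows-by-M CSR matrix with
   zero-based indices. Row i occupies values[rows_start[i] .. rows_end[i]-1],
   with matching column indices in col_indx. */
void ref_csr_mv(int n_rows, float alpha, const int *rows_start,
                const int *rows_end, const int *col_indx,
                const float *values, const float *x, float beta, float *y)
{
    for (int i = 0; i < n_rows; i++) {
        float acc = 0.0f;
        for (int k = rows_start[i]; k < rows_end[i]; k++)
            acc += values[k] * x[col_indx[k]];
        y[i] = alpha * acc + beta * y[i];
    }
}
```

For the 2-by-3 matrix [[1, 0, 2], [0, 3, 0]], the CSR arrays are ia = {0, 2, 3}, ja = {0, 2, 1}, and a = {1, 2, 3}.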
For asynchronous execution of multi-call oneMKL Sparse BLAS computation, the nowait clause can be added
to the call of the function performing the actual computation (for example, calls to the
mkl_sparse_{s,d}_mv() function).
As an example, the following snippet shows how the Sparse BLAS example above could be changed to have
two asynchronous (nowait) computations using the same matrix handle, csrA_gpu, but unrelated vector
data so there is no read/write dependency between them. Add a taskwait at the end of the second
execution to ensure the completion of both computations before the mkl_sparse_destroy() function is
called:
#pragma omp target data map(to:ia[0:N+1],ja[0:NNZ],a[0:NNZ],x[0:M],w[0:M]) map(tofrom:y[0:N],z[0:N]) device(devNum)
{
    #pragma omp dispatch device(devNum)
    status_gpu1 = mkl_sparse_s_create_csr(&csrA_gpu, SPARSE_INDEX_BASE_ZERO, N, M, ia, ia + 1,
                                          ja, a);
    #pragma omp dispatch device(devNum) nowait
    status_gpu2 = mkl_sparse_s_mv(SPARSE_OPERATION_NON_TRANSPOSE, alpha, csrA_gpu, descrA,
                                  x, beta, z);
    #pragma omp dispatch device(devNum) nowait
    status_gpu3 = mkl_sparse_s_mv(SPARSE_OPERATION_NON_TRANSPOSE, alpha, csrA_gpu, descrA,
                                  w, beta, y);
    #pragma omp taskwait
    #pragma omp dispatch device(devNum)
    status_gpu4 = mkl_sparse_destroy(csrA_gpu);
}

BLAS and Sparse BLAS Routines

Intel® oneAPI Math Kernel Library (oneMKL) implements the BLAS and Sparse BLAS routines, and BLAS-like
extensions. The routine descriptions are arranged in several sections:
• BLAS Level 1 Routines (vector-vector operations)
• BLAS Level 2 Routines (matrix-vector operations)
• BLAS Level 3 Routines (matrix-matrix operations)
• Sparse BLAS Level 1 Routines (vector-vector operations)
• Sparse BLAS Level 2 and Level 3 Routines (matrix-vector and matrix-matrix operations)
• BLAS-like Extensions
The question mark in the group name corresponds to different character codes indicating the data type (s, d,
c, and z or their combination); see Routine Naming Conventions.
When BLAS or Sparse BLAS routines encounter an error, they call the error reporting routine xerbla.

BLAS Routines

NOTE Different arrays used as parameters to Intel® MKL BLAS routines must not overlap.

Naming Conventions for BLAS Routines

BLAS routine names have the following structure:
<character> <name> <mod> [_64]
The <character> field indicates the data type:

s real, single precision

c complex, single precision

d real, double precision

z complex, double precision

Some routines and functions can have combined character codes, such as sc or dz.


For example, the function scasum uses a complex input array and returns a real value.
The <name> field, in BLAS level 1, indicates the operation type. For example, the BLAS level 1
routines ?dot, ?rot, ?swap compute a vector dot product, vector rotation, and vector swap, respectively.
In BLAS level 2 and 3, <name> reflects the matrix argument type:

ge general matrix

gb general band matrix

sy symmetric matrix

sp symmetric matrix (packed storage)

sb symmetric band matrix

he Hermitian matrix

hp Hermitian matrix (packed storage)

hb Hermitian band matrix

tr triangular matrix

tp triangular matrix (packed storage)

tb triangular band matrix.

The <mod> field, if present, provides additional details of the operation. BLAS level 1 names can have the
following characters in the <mod> field:

c conjugated vector

u unconjugated vector

g Givens rotation construction

m modified Givens rotation

mg modified Givens rotation construction

BLAS level 2 names can have the following characters in the <mod> field:

mv matrix-vector product

sv solving a system of linear equations with a single unknown vector

r rank-1 update of a matrix

r2 rank-2 update of a matrix.

BLAS level 3 names can have the following characters in the <mod> field:

mm matrix-matrix product

sm solving a system of linear equations with multiple unknown vectors

rk rank-k update of a matrix

r2k rank-2k update of a matrix.

On 64-bit platforms, routines with the _64 suffix support large data arrays in the LP64 interface library and
enable you to mix integer types in one application. For example, when an application is linked with the LP64
interface library, SGEMM indexes arrays with the 32-bit integer type, while SGEMM_64 indexes arrays with the
64-bit integer type. For more interface library details, see "Using the ILP64 Interface vs. LP64 Interface" in
the developer guide.
The examples below illustrate how to interpret BLAS routine names:

ddot <d> <dot>: real and double precision, vector-vector dot product

cdotc <c> <dot> <c>: complex and single precision, vector-vector dot product,
conjugated

cdotu <c> <dot> <u>: complex and single precision, vector-vector dot product,
unconjugated

scasum <sc> <asum>: real and single-precision output, complex and single-precision
input, sum of magnitudes of vector elements

sgemv <s> <ge> <mv>: real and single precision, general matrix, matrix-vector product

ztrmm <z> <tr> <mm> _64: complex and double precision, triangular matrix, matrix-
matrix product, 64-bit integer type
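For BLAS Level 2 and Level 3 names, the fields can be split mechanically, because the data-type code is one letter and <name> is always a two-letter matrix type. The sketch below illustrates this. decode_blas_l23_name and blas_name_fields are hypothetical, not part of oneMKL. The sketch does not handle Level 1 names such as scasum, where neither field has a fixed width.

```c
#include <string.h>

/* Fields of a Level 2/3 name: <character> <name> <mod> [_64]. */
typedef struct {
    char type;       /* s, d, c, or z */
    char matrix[3];  /* ge, gb, sy, sp, sb, he, hp, hb, tr, tp, tb */
    char mod[8];     /* mv, sv, r, r2, mm, sm, rk, r2k, ... */
    int  is_64;      /* 1 if the _64 suffix is present */
} blas_name_fields;

/* Returns 0 on success, -1 if the name does not fit the assumed pattern. */
int decode_blas_l23_name(const char *name, blas_name_fields *out)
{
    size_t len = strlen(name);
    out->is_64 = 0;
    if (len >= 3 && strcmp(name + len - 3, "_64") == 0) {
        out->is_64 = 1;
        len -= 3;                        /* ignore the suffix from here on */
    }
    if (len < 4 || strchr("sdcz", name[0]) == NULL)
        return -1;                       /* too short or unknown type code */
    size_t mod_len = len - 3;
    if (mod_len >= sizeof out->mod)
        return -1;
    out->type = name[0];
    memcpy(out->matrix, name + 1, 2);    /* two-letter matrix type */
    out->matrix[2] = '\0';
    memcpy(out->mod, name + 3, mod_len); /* remaining characters are <mod> */
    out->mod[mod_len] = '\0';
    return 0;
}
```

For example, "ztrmm_64" decodes to type 'z', matrix "tr", mod "mm", with is_64 set, matching the last row of the list above.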

Sparse BLAS level 1 naming conventions are similar to those of BLAS level 1. For more information, see
Naming Conventions.

C Interface Conventions for BLAS Routines

CBLAS, the C interface to the Basic Linear Algebra Subprograms (BLAS), provides a C language interface to
BLAS routines for Intel® oneAPI Math Kernel Library (oneMKL). While you can call the Fortran implementation
of BLAS, for coding in C the CBLAS interface has some advantages, such as allowing you to specify
column-major or row-major ordering with the layout parameter.
For more information about calling Fortran routines from C in general, and specifically about calling BLAS and
CBLAS routines, see "Mixed-language Programming with the Intel® oneAPI Math Kernel Library" in the Intel®
oneAPI Math Kernel Library Developer Guide.

NOTE
This reference contains syntax in C for both the CBLAS interface and the Fortran BLAS routines.

In CBLAS, the Fortran routine names are prefixed with cblas_ (for example, dasum becomes cblas_dasum).
Names of all CBLAS functions are in lowercase letters. Like BLAS routines, Intel® oneAPI Math Kernel Library
provides CBLAS routines with the _64 suffix (for example, cblas_dasum_64) to support large data arrays in
the LP64 interface library on 64-bit platforms. For more interface library details, see "Using the ILP64
Interface vs. LP64 Interface" in the developer guide.
Complex functions ?dotc and ?dotu become CBLAS subroutines (void functions); they return the complex
result via a void pointer, added as the last parameter. CBLAS names of these functions are suffixed with
_sub. For example, the BLAS function cdotc corresponds to cblas_cdotc_sub.
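To illustrate the _sub convention, here is a plain-C reference sketch (not the oneMKL implementation) that uses the same argument order as cblas_cdotc_sub. The conjugated dot product, the sum of conj(x_i)*y_i, is written through the trailing void pointer instead of being returned. The sketch assumes C99 <complex.h> float complex values, whose layout is a (real, imaginary) pair of floats, int in place of MKL_INT, and positive increments. ref_cdotc_sub is a hypothetical name.

```c
#include <complex.h>

/* Conjugated complex dot product returned through the last argument,
   mirroring the CBLAS _sub style. */
void ref_cdotc_sub(int n, const void *x, int incx,
                   const void *y, int incy, void *dotc)
{
    const float complex *xv = (const float complex *)x;
    const float complex *yv = (const float complex *)y;
    float complex acc = 0.0f;
    for (int i = 0; i < n; i++)
        acc += conjf(xv[i * incx]) * yv[i * incy];  /* conjugate x, not y */
    *(float complex *)dotc = acc;                   /* result via void pointer */
}
```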

WARNING
Users of the CBLAS interface should be aware that the CBLAS are just a C interface to the BLAS, which
is based on the FORTRAN standard and subject to the FORTRAN standard restrictions. In particular, the
output parameters should not be referenced through more than one argument.

NOTE
This interface is not implemented in the Sparse BLAS Level 2 and Level 3 routines.

The arguments of CBLAS functions comply with the following rules:

• Input arguments are declared with the const modifier.
• Non-complex scalar input arguments are passed by value.
• Complex scalar input arguments are passed as void pointers.


• Array arguments are passed by address.

• BLAS character arguments are replaced by the appropriate enumerated type.
• Level 2 and Level 3 routines acquire an additional parameter of type CBLAS_LAYOUT as their first
argument. This parameter specifies whether two-dimensional arrays are row-major (CblasRowMajor) or
column-major (CblasColMajor).

Enumerated Types
The CBLAS interface uses the following enumerated types:

enum CBLAS_LAYOUT {
CblasRowMajor=101, /* row-major arrays */
CblasColMajor=102}; /* column-major arrays */
enum CBLAS_TRANSPOSE {
CblasNoTrans=111, /* trans='N' */
CblasTrans=112, /* trans='T' */
CblasConjTrans=113}; /* trans='C' */
enum CBLAS_UPLO {
CblasUpper=121, /* uplo ='U' */
CblasLower=122}; /* uplo ='L' */
enum CBLAS_DIAG {
CblasNonUnit=131, /* diag ='N' */
CblasUnit=132}; /* diag ='U' */
enum CBLAS_SIDE {
CblasLeft=141, /* side ='L' */
CblasRight=142}; /* side ='R' */
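Because each enumerator stands in for one BLAS character argument, converting between the two is a simple lookup. The following is a minimal sketch with hypothetical helper names (not oneMKL functions). It reproduces two of the enumerations above so that it compiles without mkl.h, which already defines them, so the two should not be combined.

```c
/* Copied from the listing above so the sketch is self-contained. */
enum CBLAS_TRANSPOSE { CblasNoTrans = 111, CblasTrans = 112, CblasConjTrans = 113 };
enum CBLAS_UPLO { CblasUpper = 121, CblasLower = 122 };

/* Map a CBLAS transpose enumerator back to the Fortran BLAS trans character. */
char trans_to_fortran_char(enum CBLAS_TRANSPOSE t)
{
    switch (t) {
    case CblasNoTrans:   return 'N';
    case CblasTrans:     return 'T';
    case CblasConjTrans: return 'C';
    }
    return '?';  /* value outside the enumeration */
}

/* Map a CBLAS uplo enumerator back to the Fortran BLAS uplo character. */
char uplo_to_fortran_char(enum CBLAS_UPLO u)
{
    return u == CblasUpper ? 'U' : (u == CblasLower ? 'L' : '?');
}
```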

Matrix Storage Schemes for BLAS Routines

Matrix arguments of BLAS and CBLAS routines can use the following storage schemes:
• Full storage: a matrix A is stored in a two-dimensional array a, with the matrix element Aij stored in the
array element a[i + j*lda] for column-major layout and a[j + i*lda] for row-major layout, where
lda is the leading dimension for the array.
• Packed storage scheme allows you to store symmetric, Hermitian, or triangular matrices more compactly.
For column-major layout, the upper or lower triangle of the matrix is packed by columns in a one
dimensional array. For row-major layout, the upper or lower triangle of the matrix is packed by rows in a
one dimensional array.
• Band storage: a band matrix is stored compactly in a two-dimensional array. For column-major layout,
columns of the matrix are stored in the corresponding columns of the array, and diagonals of the matrix
are stored in a specific row of the array. For row-major layout, rows of the matrix are stored in the
corresponding rows of the array, and diagonals of the matrix are stored in a specific column of the array.
For more information on matrix storage schemes, see Matrix Arguments in the Appendix “Routine and
Function Arguments”.
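The full-storage formulas above can be written as offset helpers. The sketch adds the standard column-major packing of an upper triangle, where A_ij with i <= j is stored at position i + j(j+1)/2, counting from 0. The function names are hypothetical. See the Matrix Arguments appendix for the authoritative description of each scheme.

```c
#include <stddef.h>

/* 0-based offset of A(i,j) in full storage with leading dimension lda:
   a[i + j*lda] for column-major layout, a[j + i*lda] for row-major layout. */
size_t full_offset(int col_major, size_t i, size_t j, size_t lda)
{
    return col_major ? i + j * lda : j + i * lda;
}

/* 0-based offset of A(i,j), i <= j, when the upper triangle is packed
   column by column: column j starts after 1 + 2 + ... + j earlier entries. */
size_t packed_upper_colmajor_offset(size_t i, size_t j)
{
    return i + j * (j + 1) / 2;
}
```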

Row-Major and Column-Major Layout

The BLAS routines follow the Fortran convention of storing two-dimensional arrays using column-major
layout. When calling BLAS routines from C, remember that they require arrays to be in column-major format,
not the row-major format that is the convention for C. Unless otherwise specified, the pseudo-code examples
for the BLAS routines illustrate matrices stored using column-major layout.
The CBLAS interface allows you to specify either column-major or row-major layout for BLAS Level 2 and
Level 3 routines, by setting the layout parameter to CblasColMajor or CblasRowMajor.
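One way to see the difference between the two layouts is to run the same computation on both storage orders of one matrix. In the plain-C reference sketch below (ref_gemv_n is hypothetical, not oneMKL code, and uses int instead of MKL_INT), the 2-by-3 matrix [[1, 2, 3], [4, 5, 6]] can be stored row-major with lda = 3 or column-major with lda = 2. Only the element addressing changes, and both calls give y = A*x.

```c
/* Reference y := alpha*A*x + beta*y (no transpose) for an m-by-n matrix A
   in either layout: row-major reads a[i*lda + j], column-major reads
   a[i + j*lda] (the Fortran/BLAS default). */
void ref_gemv_n(int row_major, int m, int n, float alpha, const float *a,
                int lda, const float *x, float beta, float *y)
{
    for (int i = 0; i < m; i++) {
        float acc = 0.0f;
        for (int j = 0; j < n; j++) {
            float aij = row_major ? a[i * lda + j] : a[i + j * lda];
            acc += aij * x[j];
        }
        y[i] = alpha * acc + beta * y[i];
    }
}
```

With x = {1, 1, 1}, both the row-major array {1, 2, 3, 4, 5, 6} and the column-major array {1, 4, 2, 5, 3, 6} produce y = {6, 15}.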

BLAS Level 1 Routines and Functions

BLAS Level 1 includes routines and functions, which perform vector-vector operations. The following table
lists the BLAS Level 1 routine and function groups and the data types associated with them.

BLAS Level 1 Routine and Function Groups and Their Data Types

Routine or Function Group   Data Types          Description
cblas_?asum                 s, d, sc, dz        Sum of vector magnitudes (functions)
cblas_?axpy                 s, d, c, z          Scalar-vector product (routines)
cblas_?copy                 s, d, c, z          Copy vector (routines)
cblas_?dot                  s, d                Dot product (functions)
cblas_?sdot                 sd, d               Dot product with double precision (functions)
cblas_?dotc                 c, z                Dot product conjugated (functions)
cblas_?dotu                 c, z                Dot product unconjugated (functions)
cblas_?nrm2                 s, d, sc, dz        Vector 2-norm (Euclidean norm) (functions)
cblas_?rot                  s, d, c, z, cs, zd  Plane rotation of points (routines)
cblas_?rotg                 s, d, c, z          Generate Givens rotation of points (routines)
cblas_?rotm                 s, d                Modified Givens plane rotation of points (routines)
cblas_?rotmg                s, d                Generate modified Givens plane rotation of points (routines)
cblas_?scal                 s, d, c, z, cs, zd  Vector-scalar product (routines)
cblas_?swap                 s, d, c, z          Vector-vector swap (routines)
cblas_i?amax                s, d, c, z          Index of the maximum absolute value element of a vector (functions)
cblas_i?amin                s, d, c, z          Index of the minimum absolute value element of a vector (functions)
cblas_?cabs1                s, d                Auxiliary functions, compute the absolute value of a complex
                                                number of single or double precision

cblas_?asum
Computes the sum of magnitudes of the vector
elements.

Syntax
float cblas_sasum (const MKL_INT n, const float *x, const MKL_INT incx);
float cblas_scasum (const MKL_INT n, const void *x, const MKL_INT incx);
double cblas_dasum (const MKL_INT n, const double *x, const MKL_INT incx);
double cblas_dzasum (const MKL_INT n, const void *x, const MKL_INT incx);

Include Files
• mkl.h

Description


The ?asum routine computes the sum of the magnitudes of elements of a real vector, or the sum of
magnitudes of the real and imaginary parts of elements of a complex vector:

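$$\mathit{res} = |\operatorname{Re} x_1| + |\operatorname{Im} x_1| + |\operatorname{Re} x_2| + |\operatorname{Im} x_2| + \dots + |\operatorname{Re} x_n| + |\operatorname{Im} x_n| = \sum_{i=1}^{n} \left( |\operatorname{Re} x_i| + |\operatorname{Im} x_i| \right),$$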

where x is a vector with n elements.

Input Parameters

n Specifies the number of elements in vector x.

x Array, size at least (1 + (n-1)*abs(incx)).

incx Specifies the increment for indexing vector x.

Output Parameters

res Contains the sum of magnitudes of real and imaginary parts of all elements
of the vector.

Return Values
Contains the sum of magnitudes of real and imaginary parts of all elements of the vector.
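The definition above translates directly into a loop. The following is a plain-C reference sketch for checking results, not the oneMKL implementation. The names are hypothetical, and int stands in for MKL_INT. Note that the complex variant sums |Re| + |Im| rather than the complex modulus. Following the reference BLAS convention, the sketch returns 0 when n <= 0 or incx <= 0.

```c
/* Absolute value without <math.h>. */
static float abs_f(float v) { return v < 0.0f ? -v : v; }

/* Sketch of cblas_sasum semantics: sum of |x[i*incx]|. */
float ref_sasum(int n, const float *x, int incx)
{
    float res = 0.0f;
    if (n <= 0 || incx <= 0)
        return res;
    for (int i = 0; i < n; i++)
        res += abs_f(x[i * incx]);
    return res;
}

/* Sketch of cblas_scasum semantics: x holds n complex elements stored as
   interleaved (real, imaginary) floats, so element i starts at 2*i*incx. */
float ref_scasum(int n, const void *x, int incx)
{
    const float *v = (const float *)x;
    float res = 0.0f;
    if (n <= 0 || incx <= 0)
        return res;
    for (int i = 0; i < n; i++)
        res += abs_f(v[2 * i * incx]) + abs_f(v[2 * i * incx + 1]);
    return res;
}
```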

cblas_?axpy
Computes a vector-scalar product and adds the result
to a vector.

Syntax
void cblas_saxpy (const MKL_INT n, const float a, const float *x, const MKL_INT incx,
float *y, const MKL_INT incy);
void cblas_daxpy (const MKL_INT n, const double a, const double *x, const MKL_INT incx,
double *y, const MKL_INT incy);
void cblas_caxpy (const MKL_INT n, const void *a, const void *x, const MKL_INT incx,
void *y, const MKL_INT incy);
void cblas_zaxpy (const MKL_INT n, const void *a, const void *x, const MKL_INT incx,
void *y, const MKL_INT incy);

Include Files
• mkl.h

Description

The ?axpy routines perform a vector-vector operation defined as

y := a*x + y
where:
a is a scalar
x and y are vectors, each with n elements.

Input Parameters

n Specifies the number of elements in vectors x and y.

a Specifies the scalar a.

x Array, size at least (1 + (n-1)*abs(incx)).

incx Specifies the increment for the elements of x.

y Array, size at least (1 + (n-1)*abs(incy)).

incy Specifies the increment for the elements of y.

Output Parameters

y Contains the updated vector y.
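The role of the increments, including negative ones, is easiest to see in a reference loop. The following plain-C sketch is not oneMKL code. ref_saxpy is a hypothetical name, and int stands in for MKL_INT. It follows the reference BLAS convention: a negative increment walks the array backward, starting at element (1 - n)*inc.

```c
/* Sketch of cblas_saxpy semantics: y := a*x + y over n strided elements. */
void ref_saxpy(int n, float a, const float *x, int incx, float *y, int incy)
{
    if (n <= 0 || a == 0.0f)
        return;                                  /* y is left unchanged */
    int ix = incx >= 0 ? 0 : (1 - n) * incx;     /* start at the far end for inc < 0 */
    int iy = incy >= 0 ? 0 : (1 - n) * incy;
    for (int i = 0; i < n; i++, ix += incx, iy += incy)
        y[iy] += a * x[ix];
}
```

With incx = -1 and incy = 1, x is consumed in reverse order, so adding x = {1, 2, 3} into a zero vector yields {3, 2, 1}.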

cblas_?copy
Copies a vector to another vector.

Syntax
void cblas_scopy (const MKL_INT n, const float *x, const MKL_INT incx, float *y, const
MKL_INT incy);
void cblas_dcopy (const MKL_INT n, const double *x, const MKL_INT incx, double *y,
const MKL_INT incy);
void cblas_ccopy (const MKL_INT n, const void *x, const MKL_INT incx, void *y, const
MKL_INT incy);
void cblas_zcopy (const MKL_INT n, const void *x, const MKL_INT incx, void *y, const
MKL_INT incy);

Include Files
• mkl.h

Description

The ?copy routines perform a vector-vector operation defined as

y = x,
where x and y are vectors.

Input Parameters

n Specifies the number of elements in vectors x and y.

x Array, size at least (1 + (n-1)*abs(incx)).

incx Specifies the increment for the elements of x.

y Array, size at least (1 + (n-1)*abs(incy)).

incy Specifies the increment for the elements of y.


Output Parameters

y Contains a copy of the vector x if n is positive. Otherwise, parameters are unaltered.
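A common use of the increment parameters is gathering a strided subvector, such as one column of a row-major matrix, where incx equals the leading dimension. The following is a plain-C reference sketch, not oneMKL code. ref_scopy is hypothetical, handles only positive increments, and uses int instead of MKL_INT.

```c
/* Sketch of cblas_scopy semantics for positive increments:
   y[i*incy] = x[i*incx] for i = 0..n-1; nothing is written when n <= 0. */
void ref_scopy(int n, const float *x, int incx, float *y, int incy)
{
    for (int i = 0; i < n; i++)
        y[i * incy] = x[i * incx];
}
```

For a 3-by-4 row-major matrix a with lda = 4, the call ref_scopy(3, a + 1, 4, col, 1) copies column 1 into a contiguous array col.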

cblas_?dot
Computes a vector-vector dot product.

Syntax
float cblas_sdot (const MKL_INT n, const float *x, const MKL_INT incx, const float *y,
const MKL_INT incy);
double cblas_ddot (const MKL_INT n, const double *x, const MKL_INT incx, const double
*y, const MKL_INT incy);

Include Files
• mkl.h

Description

The ?dot routines perform a vector-vector reduction operation defined as

res = \sum_{i=1}^{n} x_i y_i,

where x_i and y_i are elements of vectors x and y.

Input Parameters

n Specifies the number of elements in vectors x and y.

x Array, size at least (1+(n-1)*abs(incx)).

incx Specifies the increment for the elements of x.

y Array, size at least (1+(n-1)*abs(incy)).

incy Specifies the increment for the elements of y.

Return Values
The result of the dot product of x and y, if n is positive. Otherwise, returns 0.
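As a sketch of the definition above, including the 0 returned for non-positive n, a plain-C loop for the double-precision case might look as follows. The name ref_ddot is hypothetical, int stands in for MKL_INT, and only positive increments are handled to keep the example short.

```c
/* Hypothetical reference version of cblas_ddot:
   res = sum of x[i]*y[i] over the n logical elements.
   Positive increments only. */
double ref_ddot(int n, const double *x, int incx, const double *y, int incy)
{
    double res = 0.0;                /* returned unchanged when n <= 0 */
    for (int i = 0; i < n; i++)
        res += x[i * incx] * y[i * incy];
    return res;
}
```

With incx = 2, only every other array element of x takes part, so x = {1, -9, 2, -9, 3} and y = {4, 5, 6} give the same result as x = {1, 2, 3}, namely 32.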

cblas_?sdot
Computes a vector-vector dot product with double
precision.

Syntax
float cblas_sdsdot (const MKL_INT n, const float sb, const float *sx, const MKL_INT
incx, const float *sy, const MKL_INT incy);

double cblas_dsdot (const MKL_INT n, const float *sx, const MKL_INT incx, const float
*sy, const MKL_INT incy);

Include Files
• mkl.h

Description

The ?sdot routines compute the inner product of two single precision vectors. Both routines accumulate the intermediate results in double precision, but sdsdot returns the final result in single precision, whereas dsdot returns it in double precision. The sdsdot function also adds the scalar value sb to the inner product.

Input Parameters

n Specifies the number of elements in the input vectors sx and sy.

sb Single precision scalar to be added to the inner product (for the function sdsdot only).

sx, sy Arrays, size at least (1+(n-1)*abs(incx)) and (1+(n-1)*abs(incy)), respectively. Contain the input single precision vectors.

incx Specifies the increment for the elements of sx.

incy Specifies the increment for the elements of sy.

Output Parameters

res Contains the result of the dot product of sx and sy (with sb added for
sdsdot), if n is positive. Otherwise, res contains sb for sdsdot and 0 for
dsdot.

Return Values
The result of the dot product of sx and sy (with sb added for sdsdot), if n is positive. Otherwise, returns sb
for sdsdot and 0 for dsdot.
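The effect of double-precision accumulation can be illustrated with a plain-C sketch. The names ref_sdsdot and ref_dsdot are hypothetical, only positive increments are handled, and starting the accumulator at sb mirrors the reference BLAS sdsdot; MKL's internal ordering of operations is not documented here.

```c
/* Hypothetical reference versions of cblas_sdsdot and cblas_dsdot.
   Both accumulate float products in a double; they differ only in the
   starting value (sb or 0) and in the precision of the returned result.
   Positive increments only. */
float ref_sdsdot(int n, float sb, const float *sx, int incx,
                 const float *sy, int incy)
{
    double acc = sb;                 /* sb is added to the inner product */
    for (int i = 0; i < n; i++)
        acc += (double)sx[i * incx] * (double)sy[i * incy];
    return (float)acc;               /* rounded to single precision once, at the end */
}

double ref_dsdot(int n, const float *sx, int incx,
                 const float *sy, int incy)
{
    double acc = 0.0;
    for (int i = 0; i < n; i++)
        acc += (double)sx[i * incx] * (double)sy[i * incy];
    return acc;
}
```

With sx = {1e8, 1, -1e8} and sy = {1, 1, 1}, a single precision running sum would typically lose the middle term, because adjacent float values near 1e8 are 8 apart; the double accumulator keeps it, so ref_dsdot returns exactly 1 and ref_sdsdot with sb = 0.5 returns 1.5.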

cblas_?dotc
Computes a dot product of a conjugated vector with
another vector.

Syntax
void cblas_cdotc_sub (const MKL_INT n, const void *x, const MKL_INT incx, const void
*y, const MKL_INT incy, void *dotc);
void cblas_zdotc_sub (const MKL_INT n, const void *x, const MKL_INT incx, const void
*y, const MKL_INT incy, void *dotc);

Include Files
• mkl.h

Description

The ?dotc routines perform a vector-vector operation defined as:

res = \sum_{i=1}^{n} \overline{x_i} y_i,

where x_i and y_i are elements of vectors x and y, and \overline{x_i} is the complex conjugate of x_i.

Input Parameters

n Specifies the number of elements in vectors x and y.

x Array, size at least (1 + (n -1)*abs(incx)).

incx Specifies the increment for the elements of x.

y Array, size at least (1 + (n -1)*abs(incy)).

incy Specifies the increment for the elements of y.

Output Parameters

dotc Contains the result of the dot product of the conjugated x and unconjugated
y, if n is positive. Otherwise, it contains 0.

cblas_?dotu
Computes a complex vector-vector dot product.

Syntax
void cblas_cdotu_sub (const MKL_INT n, const void *x, const MKL_INT incx, const void
*y, const MKL_INT incy, void *dotu);
void cblas_zdotu_sub (const MKL_INT n, const void *x, const MKL_INT incx, const void
*y, const MKL_INT incy, void *dotu);

Include Files
• mkl.h

Description

The ?dotu routines perform a vector-vector reduction operation defined as

res = \sum_{i=1}^{n} x_i y_i,

where x_i and y_i are elements of complex vectors x and y (neither vector is conjugated).

NOTE The _sub suffix on cblas_cdotu_sub and cblas_zdotu_sub indicates that these routines are subroutines rather than functions: the result is stored through the dotu pointer instead of being returned.

Input Parameters

n Specifies the number of elements in vectors x and y.

x Array, size at least (1 + (n -1)*abs(incx)).

incx Specifies the increment for the elements of x.

y Array, size at least (1 + (n -1)*abs(incy)).

incy Specifies the increment for the elements of y.

Output Parameters

dotu Contains the result of the dot product of x and y, if n is positive. Otherwise,
it contains 0.
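The only difference between ?dotc and ?dotu is whether the elements of x are conjugated. The plain-C sketch below shows both for the double-complex case using C99 <complex.h> types; MKL itself takes void pointers, and the names ref_zdotc_sub and ref_zdotu_sub are hypothetical. Like the _sub routines, the sketch returns its result through the last argument. Only positive increments are handled.

```c
#include <complex.h>

/* Hypothetical reference version of cblas_zdotc_sub: x is conjugated. */
void ref_zdotc_sub(int n, const double complex *x, int incx,
                   const double complex *y, int incy, double complex *dotc)
{
    double complex acc = 0.0;        /* result stays 0 when n <= 0 */
    for (int i = 0; i < n; i++)
        acc += conj(x[i * incx]) * y[i * incy];
    *dotc = acc;
}

/* Hypothetical reference version of cblas_zdotu_sub: no conjugation. */
void ref_zdotu_sub(int n, const double complex *x, int incx,
                   const double complex *y, int incy, double complex *dotu)
{
    double complex acc = 0.0;
    for (int i = 0; i < n; i++)
        acc += x[i * incx] * y[i * incy];
    *dotu = acc;
}
```

For x = {1 + 2i} and y = {3 + 4i}, the conjugated result is (1 - 2i)(3 + 4i) = 11 - 2i, and the unconjugated result is (1 + 2i)(3 + 4i) = -5 + 10i.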

cblas_?nrm2
Computes the Euclidean norm of a vector.

Syntax
float cblas_snrm2 (const MKL_INT n, const float *x, const MKL_INT incx);
double cblas_dnrm2 (const MKL_INT n, const double *x, const MKL_INT incx);
float cblas_scnrm2 (const MKL_INT n, const void *x, const MKL_INT incx);
double cblas_dznrm2 (const MKL_INT n, const void *x, const MKL_INT incx);

Include Files
• mkl.h

Description

The ?nrm2 routines perform a vector reduction operation defined as

res = ||x||,
where:
x is a vector,
res is a value containing the Euclidean norm of the elements of x.

Input Parameters

n Specifies the number of elements in vector x.

x Array, size at least (1 + (n -1)*abs (incx)).

incx Specifies the increment for the elements of x.

Return Values
The Euclidean norm of the vector x.

cblas_?rot
Performs rotation of points in the plane.

Syntax
void cblas_srot (const MKL_INT n, float *x, const MKL_INT incx, float *y, const MKL_INT incy,
const float c, const float s);
void cblas_drot (const MKL_INT n, double *x, const MKL_INT incx, double *y, const MKL_INT incy,
const double c, const double s);
void cblas_crot (const MKL_INT n, void *x, const MKL_INT incx, void *y, const MKL_INT incy,
const float c, const void* s);
void cblas_zrot (const MKL_INT n, void *x, const MKL_INT incx, void *y, const MKL_INT incy,
const double c, const void* s);
void cblas_csrot (const MKL_INT n, void *x, const MKL_INT incx, void *y, const MKL_INT incy,
const float c, const float s);
void cblas_zdrot (const MKL_INT n, void *x, const MKL_INT incx, void *y, const MKL_INT incy,
const double c, const double s);

Description
Given two vectors x and y, each pair of elements xi, yi is replaced as follows, where the right-hand sides use the original values of xi and yi:

xi = c*xi + s*yi
yi = c*yi - s*xi
If s is a complex type (cblas_crot and cblas_zrot), each pair of elements is replaced as follows:

xi = c*xi + s*yi
yi = c*yi - conj(s)*xi

Input Parameters

n Specifies the number of elements in vectors x and y.

x Array, size at least (1 + (n-1)*abs(incx)).

incx Specifies the increment for the elements of x.

y Array, size at least (1 + (n -1)*abs(incy)).

incy Specifies the increment for the elements of y.

c A real scalar.

s A scalar: complex for cblas_crot and cblas_zrot, real for the other flavors.

Output Parameters

x Each element is replaced by c*x + s*y.

y Each element is replaced by c*y - s*x, or by c*y - conj(s)*x if s is a complex type.
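Both updates must use the original values of x[i] and y[i]. The plain-C sketch below covers only the real double-precision flavor; the name ref_drot is hypothetical, only positive increments are handled, and x[i] is saved before it is overwritten.

```c
/* Hypothetical reference version of cblas_drot: applies the plane rotation
   [c s; -s c] to each pair (x[i], y[i]). Positive increments only. */
void ref_drot(int n, double *x, int incx, double *y, int incy,
              double c, double s)
{
    for (int i = 0; i < n; i++) {
        double xi = x[i * incx], yi = y[i * incy];   /* original values */
        x[i * incx] = c * xi + s * yi;
        y[i * incy] = c * yi - s * xi;
    }
}
```

With c = 0 and s = 1, x = {1, 2} and y = {3, 4} become x = {3, 4} and y = {-1, -2}. Overwriting x[i] before computing y[i] would give y = {-3, -4} instead.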

cblas_?rotg
Computes the parameters for a Givens rotation.

Syntax
void cblas_srotg (float *a, float *b, float *c, float *s);
void cblas_drotg (double *a, double *b, double *c, double *s);

void cblas_crotg (void *a, const void *b, float *c, void *s);
void cblas_zrotg (void *a, const void *b, double *c, void *s);

Include Files
• mkl.h

Description

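Given the Cartesian coordinates (a, b) of a point, these routines return the parameters c, s, r, and z associated with the Givens rotation. The parameters c and s define a unitary matrix such that:

\begin{bmatrix} c & s \\ -\bar{s} & c \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} r \\ 0 \end{bmatrix}

where \bar{s} is the complex conjugate of s (equal to s for the real flavors). The parameter z is defined such that if |a| > |b|, z is s; otherwise, if c is not 0, z is 1/c; otherwise, z is 1.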

Input Parameters

a Provides the x-coordinate of the point p.

b Provides the y-coordinate of the point p.

Output Parameters

a Contains the parameter r associated with the Givens rotation.

b Contains the parameter z associated with the Givens rotation (cblas_srotg and cblas_drotg only; in cblas_crotg and cblas_zrotg, b is an input-only argument).

c Contains the parameter c associated with the Givens rotation.

s Contains the parameter s associated with the Givens rotation.

cblas_?rotm
Performs modified Givens rotation of points in the
plane.

Syntax
void cblas_srotm (const MKL_INT n, float *x, const MKL_INT incx, float *y, const
MKL_INT incy, const float *param);
void cblas_drotm (const MKL_INT n, double *x, const MKL_INT incx, double *y, const
MKL_INT incy, const double *param);

Include Files
• mkl.h

Description

Given two vectors x and y, each pair of elements (x_i, y_i) is replaced as follows:

\begin{bmatrix} x_i \\ y_i \end{bmatrix} = H \begin{bmatrix} x_i \\ y_i \end{bmatrix}

for i = 1 to n, where H is a modified Givens transformation matrix whose values are stored in param[1] through param[4]. See the description of the param argument.

Input Parameters

n Specifies the number of elements in vectors x and y.

x Array, size at least (1 + (n -1)*abs(incx)).

incx Specifies the increment for the elements of x.

y Array, size at least (1 + (n -1)*abs(incy)).

incy Specifies the increment for the elements of y.

param Array, size 5.

The elements of the param array are:
param[0] contains a switch, flag. param[1-4] contain h11, h21, h12, and h22, respectively, the components of the matrix H.
Depending on the values of flag, the components of H are set as follows:

flag = -1.0: H = \begin{bmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{bmatrix}

flag = 0.0: H = \begin{bmatrix} 1.0 & h_{12} \\ h_{21} & 1.0 \end{bmatrix}

flag = 1.0: H = \begin{bmatrix} h_{11} & 1.0 \\ -1.0 & h_{22} \end{bmatrix}

flag = -2.0: H = \begin{bmatrix} 1.0 & 0.0 \\ 0.0 & 1.0 \end{bmatrix}

In the last three cases, the matrix entries of 1.0, -1.0, and 0.0 are assumed based on the value of flag and are not required to be set in the param vector.

Output Parameters

x Each element x[i] is replaced by h11*x[i] + h12*y[i].

y Each element y[i] is replaced by h21*x[i] + h22*y[i], using the original value of x[i].
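The flag conventions for param can be made concrete with a plain-C sketch of the double-precision flavor. The name ref_drotm is hypothetical, only positive increments are handled, and treating any negative flag other than -2.0 as -1.0 and any positive flag as 1.0 mirrors the reference BLAS drotm.

```c
/* Hypothetical reference version of cblas_drotm. param[0] is the flag;
   param[1..4] hold h11, h21, h12, h22. Positive increments only. */
void ref_drotm(int n, double *x, int incx, double *y, int incy,
               const double *param)
{
    double flag = param[0];
    double h11, h12, h21, h22;

    if (n <= 0 || flag == -2.0)
        return;                       /* H is the identity: nothing to do */

    if (flag < 0.0) {                 /* flag = -1.0: all four entries from param */
        h11 = param[1]; h21 = param[2]; h12 = param[3]; h22 = param[4];
    } else if (flag == 0.0) {         /* unit diagonal */
        h11 = 1.0;      h21 = param[2]; h12 = param[3]; h22 = 1.0;
    } else {                          /* flag = 1.0: fixed off-diagonal entries */
        h11 = param[1]; h21 = -1.0;     h12 = 1.0;      h22 = param[4];
    }

    for (int i = 0; i < n; i++) {
        double xi = x[i * incx], yi = y[i * incy];   /* original values */
        x[i * incx] = h11 * xi + h12 * yi;
        y[i * incy] = h21 * xi + h22 * yi;
    }
}
```

With flag = 1.0, only param[1] and param[4] are read; the entries 1.0 and -1.0 come from the flag, as described above.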

cblas_?rotmg
Computes the parameters for a modified Givens
rotation.

Syntax
void cblas_srotmg (float *d1, float *d2, float *x1, const float y1, float *param);
void cblas_drotmg (double *d1, double *d2, double *x1, const double y1, double *param);

Include Files
• mkl.h

Description

Given Cartesian coordinates (x1, y1) of an input vector, these routines compute the components of a modified Givens transformation matrix H that zeros the y-component of the resulting vector:

\begin{bmatrix} x_1 \\ 0 \end{bmatrix} = H \begin{bmatrix} x_1\sqrt{d_1} \\ y_1\sqrt{d_2} \end{bmatrix}

Input Parameters

d1 Provides the scaling factor for the x-coordinate of the input vector.

d2 Provides the scaling factor for the y-coordinate of the input vector.

x1 Provides the x-coordinate of the input vector.

y1 Provides the y-coordinate of the input vector.

Output Parameters

d1 Provides the first diagonal element of the updated matrix.

d2 Provides the second diagonal element of the updated matrix.

x1 Provides the x-coordinate of the rotated vector before scaling.

param Array, size 5.

The elements of the param array are:
param[0] contains a switch, flag. The other array elements, param[1-4], contain the components of the matrix H: h11, h21, h12, and h22, respectively.
Depending on the values of flag, the components of H are set as follows:

flag = -1.0: H = \begin{bmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{bmatrix}

flag = 0.0: H = \begin{bmatrix} 1.0 & h_{12} \\ h_{21} & 1.0 \end{bmatrix}

flag = 1.0: H = \begin{bmatrix} h_{11} & 1.0 \\ -1.0 & h_{22} \end{bmatrix}

flag = -2.0: H = \begin{bmatrix} 1.0 & 0.0 \\ 0.0 & 1.0 \end{bmatrix}

In the last three cases, the matrix entries of 1.0, -1.0, and 0.0 are assumed based on the value of flag and are not required to be set in the param vector.

cblas_?scal
Computes the product of a vector by a scalar.

Syntax
void cblas_sscal (const MKL_INT n, const float a, float *x, const MKL_INT incx);
void cblas_dscal (const MKL_INT n, const double a, double *x, const MKL_INT incx);
void cblas_cscal (const MKL_INT n, const void *a, void *x, const MKL_INT incx);
void cblas_zscal (const MKL_INT n, const void *a, void *x, const MKL_INT incx);
void cblas_csscal (const MKL_INT n, const float a, void *x, const MKL_INT incx);
void cblas_zdscal (const MKL_INT n, const double a, void *x, const MKL_INT incx);

Include Files
• mkl.h

Description

The ?scal routines perform a vector operation defined as

x = a*x
where:
a is a scalar, x is an n-element vector.

Input Parameters

n Specifies the number of elements in vector x.

a Specifies the scalar a.

x Array, size at least (1 + (n -1)*abs(incx)).

incx Specifies the increment for the elements of x.

Output Parameters

x Updated vector x.

cblas_?swap
Swaps a vector with another vector.

Syntax
void cblas_sswap (const MKL_INT n, float *x, const MKL_INT incx, float *y, const
MKL_INT incy);
void cblas_dswap (const MKL_INT n, double *x, const MKL_INT incx, double *y, const
MKL_INT incy);

void cblas_cswap (const MKL_INT n, void *x, const MKL_INT incx, void *y, const MKL_INT
incy);
void cblas_zswap (const MKL_INT n, void *x, const MKL_INT incx, void *y, const MKL_INT
incy);

Include Files
• mkl.h

Description

Given two vectors x and y, the ?swap routines exchange their contents: on exit, x holds the original y and y holds the original x.

Input Parameters

n Specifies the number of elements in vectors x and y.

x Array, size at least (1 + (n-1)*abs(incx)).

incx Specifies the increment for the elements of x.

y Array, size at least (1 + (n-1)*abs(incy)).

incy Specifies the increment for the elements of y.

Output Parameters

x Contains the resultant vector x, that is, the input vector y.

y Contains the resultant vector y, that is, the input vector x.

cblas_i?amax
Finds the index of the element with maximum
absolute value.

Syntax
CBLAS_INDEX cblas_isamax (const MKL_INT n, const float *x, const MKL_INT incx);
CBLAS_INDEX cblas_idamax (const MKL_INT n, const double *x, const MKL_INT incx);
CBLAS_INDEX cblas_icamax (const MKL_INT n, const void *x, const MKL_INT incx);
CBLAS_INDEX cblas_izamax (const MKL_INT n, const void *x, const MKL_INT incx);

Include Files
• mkl.h

Description

Given a vector x, the i?amax functions return the position of the vector element x[i] that has the largest
absolute value for real flavors, or the largest sum |Re(x[i])|+|Im(x[i])| for complex flavors.

If either n or incx is not positive, the routine returns 0.

If more than one vector element is found with the same largest absolute value, the index of the first one
encountered is returned.

If the vector contains NaN values, then the routine returns the index of the first NaN.

Input Parameters

n Specifies the number of elements in vector x.

x Array, size at least (1+(n-1)*abs(incx)).

incx Specifies the increment for the elements of x.

Return Values
Returns the zero-based position index of the vector element with the largest absolute value (as defined above for real and complex flavors); for incx = 1, that element is x[index].
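The selection rules above (the |Re| + |Im| measure for complex flavors, the first maximum on ties, the first NaN, and a zero-based result) can be summarized in a plain-C sketch of the double-complex flavor. The names ref_izamax and cabs1_sketch are hypothetical, C99 complex types stand in for MKL's void pointers, size_t stands in for CBLAS_INDEX, and only positive increments are handled.

```c
#include <complex.h>
#include <math.h>
#include <stddef.h>

/* Magnitude used by the complex flavors: |Re(z)| + |Im(z)|. */
static double cabs1_sketch(double complex z)
{
    return fabs(creal(z)) + fabs(cimag(z));
}

/* Hypothetical reference version of cblas_izamax returning a zero-based
   index. The strict '>' keeps the first maximum on ties, and the first
   NaN element ends the search. Positive increments only. */
size_t ref_izamax(int n, const double complex *x, int incx)
{
    size_t best = 0;
    double bestval = -1.0;
    if (n < 1 || incx < 1)
        return 0;
    for (int i = 0; i < n; i++) {
        double complex z = x[i * incx];
        if (isnan(creal(z)) || isnan(cimag(z)))
            return (size_t)i;        /* first NaN wins */
        double v = cabs1_sketch(z);
        if (v > bestval) {
            bestval = v;
            best = (size_t)i;
        }
    }
    return best;
}
```

For x = {3 + 4i, -6, 7i}, the measures are 7, 6, and 7, so the sketch returns 0 (the first maximum), even though by modulus (5, 6, 7) the last element is largest.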

cblas_i?amin
Finds the index of the element with the smallest
absolute value.

Syntax
CBLAS_INDEX cblas_isamin (const MKL_INT n, const float *x, const MKL_INT incx);
CBLAS_INDEX cblas_idamin (const MKL_INT n, const double *x, const MKL_INT incx);
CBLAS_INDEX cblas_icamin (const MKL_INT n, const void *x, const MKL_INT incx);
CBLAS_INDEX cblas_izamin (const MKL_INT n, const void *x, const MKL_INT incx);

Include Files
• mkl.h

Description
Given a vector x, the i?amin functions return the position of the vector element x[i] that has the smallest
absolute value for real flavors, or the smallest sum |Re(x[i])|+|Im(x[i])| for complex flavors.

If either n or incx is not positive, the routine returns 0.

If more than one vector element is found with the same smallest absolute value, the index of the first one
encountered is returned.
If the vector contains NaN values, then the routine returns the index of the first NaN.

Input Parameters

n Specifies the number of elements in vector x.

x Array, size at least (1+(n-1)*abs(incx)).

incx Specifies the increment for the elements of x.

Return Values
Returns the zero-based position index of the vector element with the smallest absolute value (as defined above for real and complex flavors); for incx = 1, that element is x[index].

cblas_?cabs1
Computes absolute value of complex number.

Syntax
float cblas_scabs1 (const void *z);
double cblas_dcabs1 (const void *z);

Include Files
• mkl.h

Description

?cabs1 is an auxiliary routine for a few BLAS Level 1 routines. It performs the operation defined as

res = |Re(z)| + |Im(z)|,
where z is a complex scalar and res is the sum of the absolute values of its real and imaginary parts. This is not the modulus sqrt(Re(z)^2 + Im(z)^2): for z = 3 + 4i, res is 7, whereas the modulus is 5.

Input Parameters

z Complex scalar.

Return Values
The value |Re(z)| + |Im(z)|.

BLAS Level 2 Routines

This section describes BLAS Level 2 routines, which perform matrix-vector operations. The following table
lists the BLAS Level 2 routine groups and the data types associated with them.
BLAS Level 2 Routine Groups and Their Data Types

Routine Group   Data Types   Description
cblas_?gbmv     s, d, c, z   Matrix-vector product using a general band matrix
cblas_?gemv     s, d, c, z   Matrix-vector product using a general matrix
cblas_?ger      s, d         Rank-1 update of a general matrix
cblas_?gerc     c, z         Rank-1 update (conjugated) of a general matrix
cblas_?geru     c, z         Rank-1 update (unconjugated) of a general matrix
cblas_?hbmv     c, z         Matrix-vector product using a Hermitian band matrix
cblas_?hemv     c, z         Matrix-vector product using a Hermitian matrix
cblas_?her      c, z         Rank-1 update of a Hermitian matrix
cblas_?her2     c, z         Rank-2 update of a Hermitian matrix
cblas_?hpmv     c, z         Matrix-vector product using a Hermitian packed matrix
cblas_?hpr      c, z         Rank-1 update of a Hermitian packed matrix
cblas_?hpr2     c, z         Rank-2 update of a Hermitian packed matrix
cblas_?sbmv     s, d         Matrix-vector product using a symmetric band matrix
cblas_?spmv     s, d         Matrix-vector product using a symmetric packed matrix
cblas_?spr      s, d         Rank-1 update of a symmetric packed matrix
cblas_?spr2     s, d         Rank-2 update of a symmetric packed matrix
cblas_?symv     s, d         Matrix-vector product using a symmetric matrix
cblas_?syr      s, d         Rank-1 update of a symmetric matrix
cblas_?syr2     s, d         Rank-2 update of a symmetric matrix
cblas_?tbmv     s, d, c, z   Matrix-vector product using a triangular band matrix
cblas_?tbsv     s, d, c, z   Solution of a linear system of equations with a triangular band matrix
cblas_?tpmv     s, d, c, z   Matrix-vector product using a triangular packed matrix
cblas_?tpsv     s, d, c, z   Solution of a linear system of equations with a triangular packed matrix
cblas_?trmv     s, d, c, z   Matrix-vector product using a triangular matrix
cblas_?trsv     s, d, c, z   Solution of a linear system of equations with a triangular matrix

cblas_?gbmv
Computes a matrix-vector product with a general
band matrix.

Syntax
void cblas_sgbmv (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE trans, const MKL_INT
m, const MKL_INT n, const MKL_INT kl, const MKL_INT ku, const float alpha, const float
*a, const MKL_INT lda, const float *x, const MKL_INT incx, const float beta, float *y,
const MKL_INT incy);
void cblas_dgbmv (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE trans, const MKL_INT
m, const MKL_INT n, const MKL_INT kl, const MKL_INT ku, const double alpha, const
double *a, const MKL_INT lda, const double *x, const MKL_INT incx, const double beta,
double *y, const MKL_INT incy);
void cblas_cgbmv (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE trans, const MKL_INT
m, const MKL_INT n, const MKL_INT kl, const MKL_INT ku, const void *alpha, const void
*a, const MKL_INT lda, const void *x, const MKL_INT incx, const void *beta, void *y,
const MKL_INT incy);
void cblas_zgbmv (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE trans, const MKL_INT
m, const MKL_INT n, const MKL_INT kl, const MKL_INT ku, const void *alpha, const void
*a, const MKL_INT lda, const void *x, const MKL_INT incx, const void *beta, void *y,
const MKL_INT incy);

Include Files
• mkl.h

Description

The ?gbmv routines perform a matrix-vector operation defined as

y := alpha*A*x + beta*y,
or
y := alpha*A'*x + beta*y,
or
y := alpha*conjg(A')*x + beta*y,
where:
alpha and beta are scalars,
x and y are vectors,
A is an m-by-n band matrix, with kl sub-diagonals and ku super-diagonals.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).

trans Specifies the operation:

If trans=CblasNoTrans, then y := alpha*A*x + beta*y

If trans=CblasTrans, then y := alpha*A'*x + beta*y

If trans=CblasConjTrans, then y := alpha*conjg(A')*x + beta*y

m Specifies the number of rows of the matrix A.

The value of m must be at least zero.

n Specifies the number of columns of the matrix A.

The value of n must be at least zero.

kl Specifies the number of sub-diagonals of the matrix A.

The value of kl must satisfy 0≤kl.

ku Specifies the number of super-diagonals of the matrix A.

The value of ku must satisfy 0≤ku.

alpha Specifies the scalar alpha.

a Array, size lda*n for Layout = CblasColMajor, and lda*m for Layout = CblasRowMajor.

Layout = CblasColMajor: Before entry, the leading (kl + ku + 1) by n part of the array a must contain the matrix of coefficients. This matrix must be supplied column-by-column, with the leading diagonal of the matrix in row (ku) of the array, the first super-diagonal starting at position 1 in row (ku - 1), the first sub-diagonal starting at position 0 in row (ku + 1), and so on. Elements in the array a that do not correspond to elements in the band matrix (such as the top left ku by ku triangle) are not referenced.

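The following program segment transfers a band matrix from conventional full matrix storage (matrix, with leading dimension ldm) to band storage (a, with leading dimension lda):

/* Column-major: full-matrix element (i, j) goes to row ku + i - j of column j. */
for (j = 0; j < n; j++) {
    k = ku - j;
    /* rows of column j inside the band: max(0, j-ku) up to min(m, j+kl+1) - 1 */
    for (i = (j - ku > 0 ? j - ku : 0); i < (j + kl + 1 < m ? j + kl + 1 : m); i++) {
        a[(k+i) + j*lda] = matrix[i + j*ldm];
    }
}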
Layout = CblasRowMajor: Before entry, the leading (kl + ku + 1) by m part of the array a must contain the matrix of coefficients. This matrix must be supplied row-by-row, with the leading diagonal of the matrix in column (kl) of the array, the first super-diagonal starting at position 0 in column (kl + 1), the first sub-diagonal starting at position 1 in column (kl - 1), and so on. Elements in the array a that do not correspond to elements in the band matrix (such as the top left kl by kl triangle) are not referenced.
The following program segment transfers a band matrix from row-major full matrix storage (matrix, with leading dimension ldm) to band storage (a, with leading dimension lda):

/* Row-major: full-matrix element (i, j) goes to column kl + j - i of row i. */
for (i = 0; i < m; i++) {
    k = kl - i;
    /* columns of row i inside the band: max(0, i-kl) up to min(n, i+ku+1) - 1 */
    for (j = (i - kl > 0 ? i - kl : 0); j < (i + ku + 1 < n ? i + ku + 1 : n); j++) {
        a[(k+j) + i*lda] = matrix[j + i*ldm];
    }
}

lda Specifies the leading dimension of a as declared in the calling (sub)program. The value of lda must be at least (kl + ku + 1).

x Array, size at least (1 + (n - 1)*abs(incx)) when trans=CblasNoTrans, and at least (1 + (m - 1)*abs(incx)) otherwise. Before entry, the array x must contain the vector x.

incx Specifies the increment for the elements of x. incx must not be zero.

beta Specifies the scalar beta. When beta is equal to zero, then y need not be
set on input.

y Array, size at least (1 + (m - 1)*abs(incy)) when trans=CblasNoTrans and at least (1 + (n - 1)*abs(incy)) otherwise. Before entry, the incremented array y must contain the vector y.

incy Specifies the increment for the elements of y.

The value of incy must not be zero.

Output Parameters

y Buffer holding the updated vector y.
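The band-storage indexing used in the column-major program segment above (full-matrix element (i, j) stored at a[(ku + i - j) + j*lda]) is all a matrix-vector product needs. The plain-C sketch below covers only Layout = CblasColMajor, trans = CblasNoTrans, and unit increments; the name ref_dgbmv_colmajor_notrans is hypothetical, and it is not MKL's implementation.

```c
/* Hypothetical reference loop for cblas_dgbmv with Layout = CblasColMajor
   and trans = CblasNoTrans: y := alpha*A*x + beta*y, where the m-by-n band
   matrix A (kl sub-diagonals, ku super-diagonals) is read from band storage.
   Unit increments only. */
void ref_dgbmv_colmajor_notrans(int m, int n, int kl, int ku, double alpha,
                                const double *a, int lda, const double *x,
                                double beta, double *y)
{
    /* y need not be set on input when beta == 0, so overwrite it */
    for (int i = 0; i < m; i++)
        y[i] = (beta == 0.0) ? 0.0 : beta * y[i];

    for (int j = 0; j < n; j++) {
        int i0 = j - ku > 0 ? j - ku : 0;            /* first row inside the band */
        int i1 = j + kl + 1 < m ? j + kl + 1 : m;    /* one past the last row */
        for (int i = i0; i < i1; i++)
            y[i] += alpha * a[(ku + i - j) + j * lda] * x[j];
    }
}
```

For the tridiagonal matrix A = [1 2 0; 3 4 5; 0 6 7] (kl = ku = 1), the band array with lda = 3 is {*, 1, 3, 2, 4, 6, 5, 7, *}, where * marks unreferenced entries; with x = {1, 1, 1}, y = {1, 1, 1}, alpha = 1, and beta = 2, the result is y = {5, 14, 15}.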

cblas_?gemv
Computes a matrix-vector product using a general
matrix.

Syntax
void cblas_sgemv (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE trans, const MKL_INT
m, const MKL_INT n, const float alpha, const float *a, const MKL_INT lda, const float
*x, const MKL_INT incx, const float beta, float *y, const MKL_INT incy);
void cblas_dgemv (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE trans, const MKL_INT
m, const MKL_INT n, const double alpha, const double *a, const MKL_INT lda, const
double *x, const MKL_INT incx, const double beta, double *y, const MKL_INT incy);
void cblas_cgemv (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE trans, const MKL_INT
m, const MKL_INT n, const void *alpha, const void *a, const MKL_INT lda, const void *x,
const MKL_INT incx, const void *beta, void *y, const MKL_INT incy);
void cblas_zgemv (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE trans, const MKL_INT
m, const MKL_INT n, const void *alpha, const void *a, const MKL_INT lda, const void *x,
const MKL_INT incx, const void *beta, void *y, const MKL_INT incy);

Include Files
• mkl.h

Description

The ?gemv routines perform a matrix-vector operation defined as:

y := alpha*A*x + beta*y,
or
y := alpha*A'*x + beta*y,
or
y := alpha*conjg(A')*x + beta*y,
where:
alpha and beta are scalars,
x and y are vectors,
A is an m-by-n matrix.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).

trans Specifies the operation:

if trans=CblasNoTrans, then y := alpha*A*x + beta*y;

if trans=CblasTrans, then y := alpha*A'*x + beta*y;

if trans=CblasConjTrans, then y := alpha*conjg(A')*x + beta*y.

m Specifies the number of rows of the matrix A. The value of m must be at least zero.

n Specifies the number of columns of the matrix A. The value of n must be at least zero.

alpha Specifies the scalar alpha.

a Array, size lda*k.

For Layout = CblasColMajor, k is n. Before entry, the leading m-by-n part of the array a must contain the matrix A.

For Layout = CblasRowMajor, k is m. Before entry, the leading n-by-m part of the array a must contain the matrix A.

lda Specifies the leading dimension of a as declared in the calling (sub)program.
For Layout = CblasColMajor, the value of lda must be at least max(1, m).
For Layout = CblasRowMajor, the value of lda must be at least max(1, n).

x Array, size at least (1+(n-1)*abs(incx)) when trans=CblasNoTrans and at least (1+(m-1)*abs(incx)) otherwise. Before entry, the incremented array x must contain the vector x.

incx Specifies the increment for the elements of x.

The value of incx must not be zero.

beta Specifies the scalar beta. When beta is set to zero, then y need not be set
on input.

y Array, size at least (1 + (m - 1)*abs(incy)) when trans=CblasNoTrans and at least (1 + (n - 1)*abs(incy)) otherwise. Before entry with non-zero beta, the incremented array y must contain the vector y.

incy Specifies the increment for the elements of y.

The value of incy must not be zero.

Output Parameters

y Updated vector y.
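A plain-C sketch of the real double-precision flavor for Layout = CblasColMajor, where element A(i, j) is a[i + j*lda]. The name ref_dgemv_colmajor and the integer flag trans_a (standing in for CblasNoTrans and CblasTrans) are hypothetical, and only unit increments are handled.

```c
/* Hypothetical reference loop for cblas_dgemv with Layout = CblasColMajor.
   trans_a == 0: y := alpha*A*x + beta*y   (x has n elements, y has m)
   trans_a != 0: y := alpha*A'*x + beta*y  (x has m elements, y has n)
   Unit increments only. */
void ref_dgemv_colmajor(int trans_a, int m, int n, double alpha,
                        const double *a, int lda, const double *x,
                        double beta, double *y)
{
    int leny = trans_a ? n : m;
    /* y need not be set on input when beta == 0, so overwrite it */
    for (int i = 0; i < leny; i++)
        y[i] = (beta == 0.0) ? 0.0 : beta * y[i];

    for (int j = 0; j < n; j++) {
        for (int i = 0; i < m; i++) {
            if (trans_a)
                y[j] += alpha * a[i + j * lda] * x[i];
            else
                y[i] += alpha * a[i + j * lda] * x[j];
        }
    }
}
```

Because a row-major m-by-n array with leading dimension lda occupies the same memory as the column-major n-by-m array holding A', the row-major trans = CblasNoTrans case can be computed with this sketch by passing trans_a = 1 and swapping m and n.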

cblas_?ger
Performs a rank-1 update of a general matrix.

Syntax
void cblas_sger (const CBLAS_LAYOUT Layout, const MKL_INT m, const MKL_INT n, const
float alpha, const float *x, const MKL_INT incx, const float *y, const MKL_INT incy,
float *a, const MKL_INT lda);
void cblas_dger (const CBLAS_LAYOUT Layout, const MKL_INT m, const MKL_INT n, const
double alpha, const double *x, const MKL_INT incx, const double *y, const MKL_INT incy,
double *a, const MKL_INT lda);

Include Files
• mkl.h

Description

The ?ger routines perform a matrix-vector operation defined as

A := alpha*x*y' + A,
where:
alpha is a scalar,
x is an m-element vector,
y is an n-element vector,
A is an m-by-n general matrix.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).

m Specifies the number of rows of the matrix A.

The value of m must be at least zero.

n Specifies the number of columns of the matrix A.

The value of n must be at least zero.

alpha Specifies the scalar alpha.

x Array, size at least (1 + (m - 1)*abs(incx)). Before entry, the incremented array x must contain the m-element vector x.

incx Specifies the increment for the elements of x.

The value of incx must not be zero.

y Array, size at least (1 + (n - 1)*abs(incy)). Before entry, the incremented array y must contain the n-element vector y.

incy Specifies the increment for the elements of y.

The value of incy must not be zero.

a Array, size lda*k.

For Layout = CblasColMajor, k is n. Before entry, the leading m-by-n part of the array a must contain the matrix A.

For Layout = CblasRowMajor, k is m. Before entry, the leading n-by-m part of the array a must contain the matrix A.

lda Specifies the leading dimension of a as declared in the calling (sub)program.
For Layout = CblasColMajor, the value of lda must be at least max(1, m).
For Layout = CblasRowMajor, the value of lda must be at least max(1, n).

Output Parameters

a Overwritten by the updated matrix.
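A plain-C sketch of the rank-1 update for the real double-precision flavor with Layout = CblasColMajor, where element A(i, j) is a[i + j*lda]. The name ref_dger_colmajor is hypothetical and only unit increments are handled; computing alpha*y[j] once per column is a design choice of the sketch.

```c
/* Hypothetical reference loop for cblas_dger with Layout = CblasColMajor:
   A := alpha*x*y' + A. Unit increments only. */
void ref_dger_colmajor(int m, int n, double alpha, const double *x,
                       const double *y, double *a, int lda)
{
    for (int j = 0; j < n; j++) {
        double t = alpha * y[j];     /* scale factor for column j of x*y' */
        for (int i = 0; i < m; i++)
            a[i + j * lda] += x[i] * t;
    }
}
```

For x = {1, 2}, y = {3, 4}, alpha = 1, and A initially zero, the result is A = [3 4; 6 8], stored column by column as {3, 6, 4, 8}.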

cblas_?gerc
Performs a rank-1 update (conjugated) of a general
matrix.

Syntax
void cblas_cgerc (const CBLAS_LAYOUT Layout, const MKL_INT m, const MKL_INT n, const
void *alpha, const void *x, const MKL_INT incx, const void *y, const MKL_INT incy, void
*a, const MKL_INT lda);
void cblas_zgerc (const CBLAS_LAYOUT Layout, const MKL_INT m, const MKL_INT n, const
void *alpha, const void *x, const MKL_INT incx, const void *y, const MKL_INT incy, void
*a, const MKL_INT lda);

Include Files
• mkl.h

Description

The ?gerc routines perform a matrix-vector operation defined as

A := alpha*x*conjg(y') + A,
where:
alpha is a scalar,
x is an m-element vector,
y is an n-element vector,
A is an m-by-n matrix.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).

m Specifies the number of rows of the matrix A.

The value of m must be at least zero.

n Specifies the number of columns of the matrix A.

The value of n must be at least zero.

alpha Specifies the scalar alpha.

x Array, size at least (1 + (m - 1)*abs(incx)). Before entry, the incremented array x must contain the m-element vector x.

incx Specifies the increment for the elements of x.

The value of incx must not be zero.

y Array, size at least (1 + (n - 1)*abs(incy)). Before entry, the
incremented array y must contain the n-element vector y.

incy Specifies the increment for the elements of y.

The value of incy must not be zero.

a Array, size lda*k.

For Layout = CblasColMajor, k is n. Before entry, the leading m-by-n part of the array a must contain the matrix A.

For Layout = CblasRowMajor, k is m. Before entry, the leading n-by-m part of the array a must contain the matrix A.

lda Specifies the leading dimension of a as declared in the calling (sub)program.
For Layout = CblasColMajor, the value of lda must be at least max(1, m).
For Layout = CblasRowMajor, the value of lda must be at least max(1, n).

Output Parameters

a Overwritten by the updated matrix.

cblas_?geru
Performs a rank-1 update (unconjugated) of a general
matrix.

Syntax
void cblas_cgeru (const CBLAS_LAYOUT Layout, const MKL_INT m, const MKL_INT n, const
void *alpha, const void *x, const MKL_INT incx, const void *y, const MKL_INT incy, void
*a, const MKL_INT lda);
void cblas_zgeru (const CBLAS_LAYOUT Layout, const MKL_INT m, const MKL_INT n, const
void *alpha, const void *x, const MKL_INT incx, const void *y, const MKL_INT incy, void
*a, const MKL_INT lda);

Include Files
• mkl.h

Description

The ?geru routines perform a matrix-vector operation defined as

A := alpha*x*y' + A,
where:
alpha is a scalar,
x is an m-element vector,
y is an n-element vector,
A is an m-by-n matrix.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).

m Specifies the number of rows of the matrix A.

The value of m must be at least zero.

n Specifies the number of columns of the matrix A.

The value of n must be at least zero.

alpha Specifies the scalar alpha.

x Array, size at least (1 + (m - 1)*abs(incx)). Before entry, the incremented array x must contain the m-element vector x.

incx Specifies the increment for the elements of x.

The value of incx must not be zero.

y Array, size at least (1 + (n - 1)*abs(incy)). Before entry, the incremented array y must contain the n-element vector y.

incy Specifies the increment for the elements of y.

The value of incy must not be zero.

a Array, size lda*k.

For Layout = CblasColMajor, k is n. Before entry, the leading m-by-n part of the array a must contain the matrix A.

For Layout = CblasRowMajor, k is m. Before entry, the leading n-by-m part of the array a must contain the matrix A.

lda Specifies the leading dimension of a as declared in the calling (sub)program.
For Layout = CblasColMajor, the value of lda must be at least max(1, m).
For Layout = CblasRowMajor, the value of lda must be at least max(1, n).

Output Parameters

a Overwritten by the updated matrix.
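The only difference from ?gerc is that y is not conjugated. A minimal plain-C99 sketch for Layout = CblasColMajor makes this explicit; ref_zgeru_colmajor is a hypothetical name, positive increments are assumed, and this is not MKL's implementation.

```c
#include <complex.h>

/* Hypothetical reference sketch of ?geru for Layout = CblasColMajor:
 * A := alpha*x*y' + A, with A(i,j) stored at a[i + j*lda].
 * Assumes incx > 0 and incy > 0; not MKL's implementation. */
static void ref_zgeru_colmajor(int m, int n, double complex alpha,
                               const double complex *x, int incx,
                               const double complex *y, int incy,
                               double complex *a, int lda)
{
    for (int j = 0; j < n; j++) {
        /* y(j) is used as-is: no conj(), unlike the ?gerc update */
        double complex t = alpha * y[j * incy];
        for (int i = 0; i < m; i++)
            a[i + j * lda] += x[i * incx] * t;
    }
}
```

For comparison, the library call with the same arguments would be cblas_zgeru(CblasColMajor, m, n, &alpha, x, incx, y, incy, a, lda); alpha is passed by address because the prototype above declares it as const void *.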

cblas_?hbmv
Computes a matrix-vector product using a Hermitian
band matrix.

Syntax
void cblas_chbmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const MKL_INT k, const void *alpha, const void *a, const MKL_INT lda, const void *x,
const MKL_INT incx, const void *beta, void *y, const MKL_INT incy);

void cblas_zhbmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const MKL_INT k, const void *alpha, const void *a, const MKL_INT lda, const void *x,
const MKL_INT incx, const void *beta, void *y, const MKL_INT incy);

Include Files
• mkl.h

Description

The ?hbmv routines perform a matrix-vector operation defined as

y := alpha*A*x + beta*y,

where:
alpha and beta are scalars,
x and y are n-element vectors,
A is an n-by-n Hermitian band matrix, with k super-diagonals.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the upper or lower triangular part of the Hermitian band
matrix A is used:
If uplo = CblasUpper, then the upper triangular part of the matrix A is
used.
If uplo = CblasLower, then the lower triangular part of the matrix A is used.

n Specifies the order of the matrix A. The value of n must be at least zero.

k For uplo = CblasUpper: Specifies the number of super-diagonals of the matrix A.
For uplo = CblasLower: Specifies the number of sub-diagonals of the matrix A.
The value of k must satisfy 0≤k.

alpha Specifies the scalar alpha.

a Array, size lda*n.

Layout = CblasColMajor:
Before entry with uplo = CblasUpper, the leading (k + 1)-by-n part of the array a must contain the upper triangular band part of the Hermitian matrix. The matrix must be supplied column-by-column, with the leading diagonal of the matrix in row k of the array, the first super-diagonal starting at position 1 in row (k - 1), and so on. The top left k-by-k triangle of the array a is not referenced.


The following program segment transfers the upper triangular part of a Hermitian band matrix from conventional full matrix storage (matrix, with leading dimension ldm) to band storage (a, with leading dimension lda):

for (j = 0; j < n; j++) {
    m = k - j;                      /* band row of A(i,j) is k + i - j */
    for (i = (j - k > 0) ? j - k : 0; i <= j; i++) {
        a[(m+i) + j*lda] = matrix[i + j*ldm];
    }
}
Before entry with uplo = CblasLower, the leading (k + 1)-by-n part of the array a must contain the lower triangular band part of the Hermitian matrix, supplied column-by-column, with the leading diagonal of the matrix in row 0 of the array, the first sub-diagonal starting at position 0 in row 1, and so on. The bottom right k-by-k triangle of the array a is not referenced.
The following program segment transfers the lower triangular part of a
Hermitian band matrix from conventional full matrix storage (matrix, with
leading dimension ldm) to band storage (a, with leading dimension lda):

for (j = 0; j < n; j++) {
    m = -j;                         /* band row of A(i,j) is i - j */
    for (i = j; i < ((n < j + k + 1) ? n : j + k + 1); i++) {
        a[(m+i) + j*lda] = matrix[i + j*ldm];
    }
}
Layout = CblasRowMajor:
Before entry with uplo = CblasUpper, the leading (k + 1)-by-n part of
array a must contain the upper triangular band part of the Hermitian
matrix. The matrix must be supplied row-by-row, with the leading diagonal
of the matrix in column 0 of the array, the first super-diagonal starting at
position 0 in column 1, and so on. The bottom right k-by-k triangle of array
a is not referenced.
The following program segment transfers the upper triangular part of a
Hermitian band matrix from row-major full matrix storage (matrix with
leading dimension ldm) to row-major band storage (a, with leading
dimension lda):

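for (i = 0; i < n; i++) {
    m = -i;                         /* band column of A(i,j) is j - i */
    for (j = i; j < ((n < i + k + 1) ? n : i + k + 1); j++) {
        a[(m+j) + i*lda] = matrix[j + i*ldm];
    }
}
Before entry with uplo = CblasLower, the leading (k + 1)-by-n part of array a must contain the lower triangular band part of the Hermitian matrix, supplied row-by-row, with the leading diagonal of the matrix in column k of the array, the first sub-diagonal starting at position 1 in column k-1, and so on. The top left k-by-k triangle of array a is not referenced.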
The following program segment transfers the lower triangular part of a
Hermitian row-major band matrix from row-major full matrix storage
(matrix, with leading dimension ldm) to row-major band storage (a, with
leading dimension lda):

for (i = 0; i < n; i++) {
    m = k - i;                      /* band column of A(i,j) is k + j - i */
    for (j = (i - k > 0) ? i - k : 0; j <= i; j++) {
        a[(m+j) + i*lda] = matrix[j + i*ldm];
    }
}
The imaginary parts of the diagonal elements need not be set and are
assumed to be zero.

lda Specifies the leading dimension of a as declared in the calling (sub)program. The value of lda must be at least (k + 1).

x Array, size at least (1 + (n - 1)*abs(incx)). Before entry, the incremented array x must contain the vector x.

incx Specifies the increment for the elements of x.

The value of incx must not be zero.

beta Specifies the scalar beta.

y Array, size at least (1 + (n - 1)*abs(incy)). Before entry, the incremented array y must contain the vector y.

incy Specifies the increment for the elements of y.

The value of incy must not be zero.

Output Parameters

y Overwritten by the updated vector y.
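For the column-major, uplo = CblasUpper layout described above, A(i,j) with max(0, j - k) <= i <= j is read from a[(k + i - j) + j*lda] (0-based), and the lower triangle follows from A(j,i) = conjg(A(i,j)). The plain-C99 sketch below computes y := alpha*A*x + beta*y directly from that storage. It is an illustration, not MKL's implementation; ref_zhbmv_colmajor_upper is a hypothetical name and unit increments are assumed.

```c
#include <complex.h>

/* Hypothetical sketch: y := alpha*A*x + beta*y for an n-by-n Hermitian band
 * matrix with k super-diagonals, stored as for ?hbmv with
 * Layout = CblasColMajor and uplo = CblasUpper (lda >= k + 1).
 * Unit increments assumed; not MKL's implementation. */
static void ref_zhbmv_colmajor_upper(int n, int k, double complex alpha,
                                     const double complex *a, int lda,
                                     const double complex *x,
                                     double complex beta, double complex *y)
{
    for (int i = 0; i < n; i++)
        y[i] = (beta == 0) ? 0 : beta * y[i];   /* input y ignored if beta == 0 */
    for (int j = 0; j < n; j++) {
        int i0 = (j - k > 0) ? j - k : 0;
        for (int i = i0; i < j; i++) {
            double complex aij = a[(k + i - j) + j * lda];  /* A(i,j), i < j */
            y[i] += alpha * aij * x[j];
            y[j] += alpha * conj(aij) * x[i];   /* A(j,i) = conj(A(i,j)) */
        }
        /* the diagonal sits in band row k; its imaginary part is ignored */
        y[j] += alpha * creal(a[k + j * lda]) * x[j];
    }
}
```

Passing the same a, lda, x and y to cblas_zhbmv(CblasColMajor, CblasUpper, n, k, &alpha, a, lda, x, 1, &beta, y, 1) should give the same result up to rounding.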

cblas_?hemv
Computes a matrix-vector product using a Hermitian
matrix.

Syntax
void cblas_chemv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const void *alpha, const void *a, const MKL_INT lda, const void *x, const MKL_INT incx,
const void *beta, void *y, const MKL_INT incy);
void cblas_zhemv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const void *alpha, const void *a, const MKL_INT lda, const void *x, const MKL_INT incx,
const void *beta, void *y, const MKL_INT incy);

Include Files
• mkl.h

Description


The ?hemv routines perform a matrix-vector operation defined as

y := alpha*A*x + beta*y,
where:
alpha and beta are scalars,
x and y are n-element vectors,
A is an n-by-n Hermitian matrix.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the upper or lower triangular part of the array a is used.

If uplo = CblasUpper, then the upper triangular part of the array a is used.

If uplo = CblasLower, then the lower triangular part of the array a is used.

n Specifies the order of the matrix A. The value of n must be at least zero.

alpha Specifies the scalar alpha.

a Array, size lda*n.

Before entry with uplo = CblasUpper, the leading n-by-n upper triangular
part of the array a must contain the upper triangular part of the Hermitian
matrix and the strictly lower triangular part of a is not referenced. Before
entry with uplo = CblasLower, the leading n-by-n lower triangular part of
the array a must contain the lower triangular part of the Hermitian matrix
and the strictly upper triangular part of a is not referenced.
The imaginary parts of the diagonal elements need not be set and are
assumed to be zero.

lda Specifies the leading dimension of a as declared in the calling (sub)program. The value of lda must be at least max(1, n).

x Array, size at least (1 + (n - 1)*abs(incx)). Before entry, the incremented array x must contain the n-element vector x.

incx Specifies the increment for the elements of x.

The value of incx must not be zero.

beta Specifies the scalar beta. When beta is supplied as zero, y need not be set on input.

y Array, size at least (1 + (n - 1)*abs(incy)). Before entry, the incremented array y must contain the n-element vector y.

incy Specifies the increment for the elements of y.

The value of incy must not be zero.

Output Parameters

y Overwritten by the updated vector y.
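As a sketch of how uplo limits what is read, the following plain-C99 code (not MKL's implementation) evaluates y := alpha*A*x + beta*y for Layout = CblasColMajor and uplo = CblasLower. It touches only A(i,j) with i >= j at a[i + j*lda] (0-based), reconstructs the upper part as conjg(A(j,i)), and ignores the imaginary parts of the diagonal. The name ref_zhemv_colmajor_lower is hypothetical and unit increments are assumed.

```c
#include <complex.h>

/* Hypothetical sketch of ?hemv for Layout = CblasColMajor, uplo = CblasLower.
 * The strictly upper part of a is never read. Unit increments assumed. */
static void ref_zhemv_colmajor_lower(int n, double complex alpha,
                                     const double complex *a, int lda,
                                     const double complex *x,
                                     double complex beta, double complex *y)
{
    for (int i = 0; i < n; i++)
        y[i] = (beta == 0) ? 0 : beta * y[i];   /* y need not be set if beta == 0 */
    for (int j = 0; j < n; j++) {
        y[j] += alpha * creal(a[j + j * lda]) * x[j];  /* real diagonal */
        for (int i = j + 1; i < n; i++) {
            double complex aij = a[i + j * lda];       /* lower element A(i,j) */
            y[i] += alpha * aij * x[j];
            y[j] += alpha * conj(aij) * x[i];          /* A(j,i) = conj(A(i,j)) */
        }
    }
}
```

In the check below, a[2] holds an arbitrary value in the unreferenced upper part and a[0] carries a nonzero imaginary part; neither affects the result.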

cblas_?her
Performs a rank-1 update of a Hermitian matrix.

Syntax
void cblas_cher (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const float alpha, const void *x, const MKL_INT incx, void *a, const MKL_INT lda);
void cblas_zher (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const double alpha, const void *x, const MKL_INT incx, void *a, const MKL_INT lda);

Include Files
• mkl.h

Description

The ?her routines perform a matrix-vector operation defined as

A := alpha*x*conjg(x') + A,
where:
alpha is a real scalar,
x is an n-element vector,
A is an n-by-n Hermitian matrix.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the upper or lower triangular part of the array a is used.

If uplo = CblasUpper, then the upper triangular part of the array a is used.

If uplo = CblasLower, then the lower triangular part of the array a is used.

n Specifies the order of the matrix A. The value of n must be at least zero.

alpha Specifies the scalar alpha.

x Array, size at least (1 + (n - 1)*abs(incx)). Before entry, the incremented array x must contain the n-element vector x.

incx Specifies the increment for the elements of x.

The value of incx must not be zero.

a Array, size lda*n.

Before entry with uplo = CblasUpper, the leading n-by-n upper triangular
part of the array a must contain the upper triangular part of the Hermitian
matrix and the strictly lower triangular part of a is not referenced.
Before entry with uplo = CblasLower, the leading n-by-n lower triangular
part of the array a must contain the lower triangular part of the Hermitian
matrix and the strictly upper triangular part of a is not referenced.
The imaginary parts of the diagonal elements need not be set and are
assumed to be zero.


lda Specifies the leading dimension of a as declared in the calling (sub)program. The value of lda must be at least max(1, n).

Output Parameters

a With uplo = CblasUpper, the upper triangular part of the array a is overwritten by the upper triangular part of the updated matrix.
With uplo = CblasLower, the lower triangular part of the array a is
overwritten by the lower triangular part of the updated matrix.
If alpha is zero, matrix A is unchanged; otherwise, the imaginary parts of
the diagonal elements are set to zero.
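The following plain-C99 sketch (not MKL's implementation) applies A := alpha*x*conjg(x') + A to the upper triangle for Layout = CblasColMajor. Because alpha is real, the diagonal update alpha*|x(j)|^2 is real, and the sketch clears the diagonal imaginary parts as described above. The name ref_zher_colmajor_upper is hypothetical and a unit increment is assumed.

```c
#include <complex.h>

/* Hypothetical sketch of ?her for Layout = CblasColMajor, uplo = CblasUpper:
 * only A(i,j), i <= j, stored at a[i + j*lda] is updated.
 * Unit increment assumed; not MKL's implementation. */
static void ref_zher_colmajor_upper(int n, double alpha,
                                    const double complex *x,
                                    double complex *a, int lda)
{
    if (alpha == 0.0)
        return;                                  /* A is left unchanged */
    for (int j = 0; j < n; j++) {
        double complex t = alpha * conj(x[j]);
        for (int i = 0; i < j; i++)
            a[i + j * lda] += x[i] * t;          /* strictly upper part */
        /* diagonal: alpha*|x(j)|^2 is real; the old imaginary part is dropped */
        a[j + j * lda] = creal(a[j + j * lda]) + creal(x[j] * t);
    }
}
```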

cblas_?her2
Performs a rank-2 update of a Hermitian matrix.

Syntax
void cblas_cher2 (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const void *alpha, const void *x, const MKL_INT incx, const void *y, const MKL_INT
incy, void *a, const MKL_INT lda);
void cblas_zher2 (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const void *alpha, const void *x, const MKL_INT incx, const void *y, const MKL_INT
incy, void *a, const MKL_INT lda);

Include Files
• mkl.h

Description

The ?her2 routines perform a matrix-vector operation defined as

A := alpha*x*conjg(y') + conjg(alpha)*y*conjg(x') + A,

where:
alpha is a scalar,
x and y are n-element vectors,
A is an n-by-n Hermitian matrix.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the upper or lower triangular part of the array a is used.

If uplo = CblasUpper, then the upper triangular part of the array a is used.

If uplo = CblasLower, then the lower triangular part of the array a is used.

n Specifies the order of the matrix A. The value of n must be at least zero.

alpha Specifies the scalar alpha.

x Array, size at least (1 + (n - 1)*abs(incx)). Before entry, the
incremented array x must contain the n-element vector x.

incx Specifies the increment for the elements of x.

The value of incx must not be zero.

y Array, size at least (1 + (n - 1)*abs(incy)). Before entry, the incremented array y must contain the n-element vector y.

incy Specifies the increment for the elements of y.

The value of incy must not be zero.

a Array, size lda*n.

Before entry with uplo = CblasUpper, the leading n-by-n upper triangular
part of the array a must contain the upper triangular part of the Hermitian
matrix and the strictly lower triangular part of a is not referenced.
Before entry with uplo = CblasLower, the leading n-by-n lower triangular
part of the array a must contain the lower triangular part of the Hermitian
matrix and the strictly upper triangular part of a is not referenced.
The imaginary parts of the diagonal elements need not be set and are
assumed to be zero.

lda Specifies the leading dimension of a as declared in the calling (sub)program. The value of lda must be at least max(1, n).

Output Parameters

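a With uplo = CblasUpper, the upper triangular part of the array a is overwritten by the upper triangular part of the updated matrix.
With uplo = CblasLower, the lower triangular part of the array a is overwritten by the lower triangular part of the updated matrix.
If alpha is zero, matrix A is unchanged; otherwise, the imaginary parts of the diagonal elements are set to zero.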
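The two terms of the ?her2 update are conjugate transposes of each other, so their sum is Hermitian and has a real diagonal. A minimal plain-C99 sketch for Layout = CblasColMajor and uplo = CblasUpper, assuming unit increments and using the hypothetical name ref_zher2_colmajor_upper, is shown below; it is not MKL's implementation.

```c
#include <complex.h>

/* Hypothetical sketch of ?her2 for Layout = CblasColMajor, uplo = CblasUpper:
 * A := alpha*x*conjg(y') + conjg(alpha)*y*conjg(x') + A, updating only
 * A(i,j), i <= j, at a[i + j*lda]. Unit increments assumed. */
static void ref_zher2_colmajor_upper(int n, double complex alpha,
                                     const double complex *x,
                                     const double complex *y,
                                     double complex *a, int lda)
{
    if (alpha == 0)
        return;                                   /* A is left unchanged */
    for (int j = 0; j < n; j++) {
        double complex t1 = alpha * conj(y[j]);   /* alpha*conjg(y(j)) */
        double complex t2 = conj(alpha * x[j]);   /* conjg(alpha)*conjg(x(j)) */
        for (int i = 0; i < j; i++)
            a[i + j * lda] += x[i] * t1 + y[i] * t2;
        /* the diagonal term is 2*Re(alpha*x(j)*conjg(y(j))), a real number */
        a[j + j * lda] = creal(a[j + j * lda]) + creal(x[j] * t1 + y[j] * t2);
    }
}
```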

cblas_?hpmv
Computes a matrix-vector product using a Hermitian
packed matrix.

Syntax
void cblas_chpmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const void *alpha, const void *ap, const void *x, const MKL_INT incx, const void *beta,
void *y, const MKL_INT incy);
void cblas_zhpmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const void *alpha, const void *ap, const void *x, const MKL_INT incx, const void *beta,
void *y, const MKL_INT incy);

Include Files
• mkl.h

Description


The ?hpmv routines perform a matrix-vector operation defined as

y := alpha*A*x + beta*y,
where:
alpha and beta are scalars,
x and y are n-element vectors,
A is an n-by-n Hermitian matrix, supplied in packed form.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the upper or lower triangular part of the matrix A is supplied in the packed array ap.

If uplo = CblasUpper, then the upper triangular part of the matrix A is supplied in the packed array ap.

If uplo = CblasLower, then the lower triangular part of the matrix A is supplied in the packed array ap.

n Specifies the order of the matrix A. The value of n must be at least zero.

alpha Specifies the scalar alpha.

ap Array, size at least ((n*(n + 1))/2).

For Layout = CblasColMajor:

Before entry with uplo = CblasUpper, the array ap must contain the upper triangular part of the Hermitian matrix packed sequentially, column-by-column, so that ap[0] contains A(1,1), ap[1] and ap[2] contain A(1,2) and A(2,2) respectively, and so on. Before entry with uplo = CblasLower, the array ap must contain the lower triangular part of the Hermitian matrix packed sequentially, column-by-column, so that ap[0] contains A(1,1), ap[1] and ap[2] contain A(2,1) and A(3,1) respectively, and so on.
For Layout = CblasRowMajor:

Before entry with uplo = CblasUpper, the array ap must contain the upper triangular part of the Hermitian matrix packed sequentially, row-by-row, so that ap[0] contains A(1,1), ap[1] and ap[2] contain A(1,2) and A(1,3) respectively, and so on. Before entry with uplo = CblasLower, the array ap must contain the lower triangular part of the Hermitian matrix packed sequentially, row-by-row, so that ap[0] contains A(1,1), ap[1] and ap[2] contain A(2,1) and A(2,2) respectively, and so on.
The imaginary parts of the diagonal elements need not be set and are assumed to be zero.

x Array, size at least (1 + (n - 1)*abs(incx)). Before entry, the incremented array x must contain the n-element vector x.

incx Specifies the increment for the elements of x.

The value of incx must not be zero.

beta Specifies the scalar beta.

When beta is equal to zero, y need not be set on input.

y Array, size at least (1 + (n - 1)*abs(incy)). Before entry, the incremented array y must contain the n-element vector y.

incy Specifies the increment for the elements of y.

The value of incy must not be zero.

Output Parameters

y Overwritten by the updated vector y.
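All four packing rules above reduce to two column-major formulas, because packing A row-by-row places each element where packing the transposed index pattern column-by-column would. The following plain-C99 sketch uses that observation in a hypothetical packed_index helper and then evaluates y := alpha*A*x + beta*y from ap. It is an illustration, not MKL's implementation, and assumes unit increments. Indices in the code are 0-based, whereas the text above numbers A from 1.

```c
#include <complex.h>
#include <stdbool.h>

/* Hypothetical helper: 0-based offset in ap of A(i,j), where (i,j) lies in
 * the stored triangle (i <= j if upper, i >= j if lower). */
static int packed_index(bool row_major, bool upper, int n, int i, int j)
{
    if (row_major) {        /* row-major packing = column-major packing of A' */
        int t = i;
        i = j;
        j = t;
        upper = !upper;
    }
    return upper ? i + j * (j + 1) / 2           /* upper, column by column */
                 : i + j * (2 * n - j - 1) / 2;  /* lower, column by column */
}

/* Hypothetical sketch of ?hpmv: y := alpha*A*x + beta*y, A Hermitian in ap.
 * Unit increments assumed; x and y must not overlap. */
static void ref_zhpmv(bool row_major, bool upper, int n, double complex alpha,
                      const double complex *ap, const double complex *x,
                      double complex beta, double complex *y)
{
    for (int i = 0; i < n; i++) {
        double complex sum = 0;
        for (int j = 0; j < n; j++) {
            double complex aij;
            if (i == j)                   /* diagonal: imaginary part ignored */
                aij = creal(ap[packed_index(row_major, upper, n, i, i)]);
            else if ((i < j) == upper)    /* (i,j) is in the stored triangle */
                aij = ap[packed_index(row_major, upper, n, i, j)];
            else                          /* A(i,j) = conj(A(j,i)) */
                aij = conj(ap[packed_index(row_major, upper, n, j, i)]);
            sum += aij * x[j];
        }
        y[i] = alpha * sum + ((beta == 0) ? 0 : beta * y[i]);
    }
}
```

For example, packed_index(true, false, 3, 2, 1) returns 4: with row-major lower packing of a 3-by-3 matrix, rows 0 and 1 occupy ap[0] through ap[2], so the 1-based element A(3,2) is ap[4].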

cblas_?hpr
Performs a rank-1 update of a Hermitian packed
matrix.

Syntax
void cblas_chpr (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const float alpha, const void *x, const MKL_INT incx, void *ap);
void cblas_zhpr (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const double alpha, const void *x, const MKL_INT incx, void *ap);

Include Files
• mkl.h

Description

The ?hpr routines perform a matrix-vector operation defined as

A := alpha*x*conjg(x') + A,
where:
alpha is a real scalar,
x is an n-element vector,
A is an n-by-n Hermitian matrix, supplied in packed form.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the upper or lower triangular part of the matrix A is supplied in the packed array ap.

If uplo = CblasUpper, the upper triangular part of the matrix A is supplied in the packed array ap.

If uplo = CblasLower, the lower triangular part of the matrix A is supplied in the packed array ap.

n Specifies the order of the matrix A. The value of n must be at least zero.

alpha Specifies the scalar alpha.


x Array, size at least (1 + (n - 1)*abs(incx)). Before entry, the incremented array x must contain the n-element vector x.

incx Specifies the increment for the elements of x. incx must not be zero.

ap Array, size at least ((n*(n + 1))/2).

For Layout = CblasColMajor:

Before entry with uplo = CblasUpper, the array ap must contain the upper triangular part of the Hermitian matrix packed sequentially, column-by-column, so that ap[0] contains A(1,1), ap[1] and ap[2] contain A(1,2) and A(2,2) respectively, and so on.
Before entry with uplo = CblasLower, the array ap must contain the lower triangular part of the Hermitian matrix packed sequentially, column-by-column, so that ap[0] contains A(1,1), ap[1] and ap[2] contain A(2,1) and A(3,1) respectively, and so on.
For Layout = CblasRowMajor:

Before entry with uplo = CblasUpper, the array ap must contain the upper triangular part of the Hermitian matrix packed sequentially, row-by-row, so that ap[0] contains A(1,1), ap[1] and ap[2] contain A(1,2) and A(1,3) respectively, and so on.
Before entry with uplo = CblasLower, the array ap must contain the lower triangular part of the Hermitian matrix packed sequentially, row-by-row, so that ap[0] contains A(1,1), ap[1] and ap[2] contain A(2,1) and A(2,2) respectively, and so on.
The imaginary parts of the diagonal elements need not be set and are assumed to be zero.

Output Parameters

ap With uplo = CblasUpper, overwritten by the upper triangular part of the updated matrix.
With uplo = CblasLower, overwritten by the lower triangular part of the
updated matrix.
If alpha is zero, matrix A is unchanged; otherwise, the imaginary parts of
the diagonal elements are set to zero.
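Because ap is filled sequentially, an update can walk it with a single running offset instead of an index formula. The plain-C99 sketch below (not MKL's implementation) does this for Layout = CblasColMajor and uplo = CblasLower, where column j contributes A(j,j), A(j+1,j), ..., A(n-1,j) in turn (0-based). The name ref_zhpr_colmajor_lower is hypothetical and a unit increment is assumed.

```c
#include <complex.h>

/* Hypothetical sketch of ?hpr, A := alpha*x*conjg(x') + A, for
 * Layout = CblasColMajor, uplo = CblasLower. The running offset p follows
 * the column-by-column packing of the lower triangle. Unit increment assumed. */
static void ref_zhpr_colmajor_lower(int n, double alpha,
                                    const double complex *x,
                                    double complex *ap)
{
    if (alpha == 0.0)
        return;                                      /* A is left unchanged */
    int p = 0;
    for (int j = 0; j < n; j++) {
        double complex t = alpha * conj(x[j]);
        ap[p] = creal(ap[p]) + creal(x[j] * t);      /* real diagonal A(j,j) */
        p++;
        for (int i = j + 1; i < n; i++, p++)
            ap[p] += x[i] * t;                       /* A(i,j), i > j */
    }
}
```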

cblas_?hpr2
Performs a rank-2 update of a Hermitian packed
matrix.

Syntax
void cblas_chpr2 (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const void *alpha, const void *x, const MKL_INT incx, const void *y, const MKL_INT
incy, void *ap);
void cblas_zhpr2 (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const void *alpha, const void *x, const MKL_INT incx, const void *y, const MKL_INT
incy, void *ap);

Include Files
• mkl.h

Description

The ?hpr2 routines perform a matrix-vector operation defined as

A := alpha*x*conjg(y') + conjg(alpha)*y*conjg(x') + A,
where:
alpha is a scalar,
x and y are n-element vectors,
A is an n-by-n Hermitian matrix, supplied in packed form.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the upper or lower triangular part of the matrix A is supplied in the packed array ap.

If uplo = CblasUpper, then the upper triangular part of the matrix A is supplied in the packed array ap.

If uplo = CblasLower, then the lower triangular part of the matrix A is supplied in the packed array ap.

n Specifies the order of the matrix A. The value of n must be at least zero.

alpha Specifies the scalar alpha.

x Array, size at least (1 + (n - 1)*abs(incx)). Before entry, the incremented array x must contain the n-element vector x.

incx Specifies the increment for the elements of x.

The value of incx must not be zero.

y Array, size at least (1 + (n - 1)*abs(incy)). Before entry, the incremented array y must contain the n-element vector y.

incy Specifies the increment for the elements of y.

The value of incy must not be zero.

ap Array, size at least ((n*(n + 1))/2).

For Layout = CblasColMajor:

Before entry with uplo = CblasUpper, the array ap must contain the upper triangular part of the Hermitian matrix packed sequentially, column-by-column, so that ap[0] contains A(1,1), ap[1] and ap[2] contain A(1,2) and A(2,2) respectively, and so on.
Before entry with uplo = CblasLower, the array ap must contain the lower triangular part of the Hermitian matrix packed sequentially, column-by-column, so that ap[0] contains A(1,1), ap[1] and ap[2] contain A(2,1) and A(3,1) respectively, and so on.

For Layout = CblasRowMajor:

Before entry with uplo = CblasUpper, the array ap must contain the upper triangular part of the Hermitian matrix packed sequentially, row-by-row, so that ap[0] contains A(1,1), ap[1] and ap[2] contain A(1,2) and A(1,3) respectively, and so on.
Before entry with uplo = CblasLower, the array ap must contain the lower triangular part of the Hermitian matrix packed sequentially, row-by-row, so that ap[0] contains A(1,1), ap[1] and ap[2] contain A(2,1) and A(2,2) respectively, and so on.
The imaginary parts of the diagonal elements need not be set and are assumed to be zero.

Output Parameters

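ap With uplo = CblasUpper, overwritten by the upper triangular part of the updated matrix.
With uplo = CblasLower, overwritten by the lower triangular part of the updated matrix.
If alpha is zero, matrix A is unchanged; otherwise, the imaginary parts of the diagonal elements are set to zero.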
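A running-offset sketch for ?hpr2 with Layout = CblasRowMajor and uplo = CblasUpper visits A(i,i), A(i,i+1), ..., A(i,n-1) for each row i (0-based). It adds alpha*x(i)*conjg(y(j)) + conjg(alpha)*y(i)*conjg(x(j)) to each stored element and keeps the diagonal real. The name ref_zhpr2_rowmajor_upper is hypothetical, unit increments are assumed, and this is not MKL's implementation.

```c
#include <complex.h>

/* Hypothetical sketch of ?hpr2,
 * A := alpha*x*conjg(y') + conjg(alpha)*y*conjg(x') + A,
 * for Layout = CblasRowMajor, uplo = CblasUpper (upper triangle packed
 * row by row). Unit increments assumed; not MKL's implementation. */
static void ref_zhpr2_rowmajor_upper(int n, double complex alpha,
                                     const double complex *x,
                                     const double complex *y,
                                     double complex *ap)
{
    if (alpha == 0)
        return;                                   /* A is left unchanged */
    int p = 0;
    for (int i = 0; i < n; i++) {
        double complex u = alpha * x[i];          /* alpha*x(i) */
        double complex v = conj(alpha) * y[i];    /* conjg(alpha)*y(i) */
        /* diagonal: 2*Re(alpha*x(i)*conjg(y(i))), imaginary part cleared */
        ap[p] = creal(ap[p]) + creal(u * conj(y[i]) + v * conj(x[i]));
        p++;
        for (int j = i + 1; j < n; j++, p++)
            ap[p] += u * conj(y[j]) + v * conj(x[j]);
    }
}
```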

cblas_?sbmv
Computes a matrix-vector product with a symmetric
band matrix.

Syntax
void cblas_ssbmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const MKL_INT k, const float alpha, const float *a, const MKL_INT lda, const float *x,
const MKL_INT incx, const float beta, float *y, const MKL_INT incy);
void cblas_dsbmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const MKL_INT k, const double alpha, const double *a, const MKL_INT lda, const double
*x, const MKL_INT incx, const double beta, double *y, const MKL_INT incy);

Include Files
• mkl.h

Description

The ?sbmv routines perform a matrix-vector operation defined as

y := alpha*A*x + beta*y,
where:
alpha and beta are scalars,
x and y are n-element vectors,
A is an n-by-n symmetric band matrix, with k super-diagonals.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the upper or lower triangular part of the band matrix A is
used:
if uplo = CblasUpper - upper triangular part;

if uplo = CblasLower - lower triangular part.

n Specifies the order of the matrix A. The value of n must be at least zero.

k Specifies the number of super-diagonals of the matrix A.

The value of k must satisfy 0≤k.

alpha Specifies the scalar alpha.

a Array, size lda*n.

Layout = CblasColMajor:
Before entry with uplo = CblasUpper, the leading (k + 1)-by-n part of the array a must contain the upper triangular band part of the symmetric matrix, supplied column-by-column, with the leading diagonal of the matrix in row k of the array, the first super-diagonal starting at position 1 in row (k - 1), and so on. The top left k-by-k triangle of the array a is not referenced.
The following program segment transfers the upper triangular part of a
symmetric band matrix from conventional full matrix storage (matrix, with
leading dimension ldm) to band storage (a, with leading dimension lda):

for (j = 0; j < n; j++) {
    m = k - j;                      /* band row of A(i,j) is k + i - j */
    for (i = (j - k > 0) ? j - k : 0; i <= j; i++) {
        a[(m+i) + j*lda] = matrix[i + j*ldm];
    }
}
Before entry with uplo = CblasLower, the leading (k + 1)-by-n part of the array a must contain the lower triangular band part of the symmetric matrix, supplied column-by-column, with the leading diagonal of the matrix in row 0 of the array, the first sub-diagonal starting at position 0 in row 1, and so on. The bottom right k-by-k triangle of the array a is not referenced.
The following program segment transfers the lower triangular part of a
symmetric band matrix from conventional full matrix storage (matrix, with
leading dimension ldm) to band storage (a, with leading dimension lda):

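for (j = 0; j < n; j++) {
    m = -j;                         /* band row of A(i,j) is i - j */
    for (i = j; i < ((n < j + k + 1) ? n : j + k + 1); i++) {
        a[(m+i) + j*lda] = matrix[i + j*ldm];
    }
}
Layout = CblasRowMajor:
Before entry with uplo = CblasUpper, the leading (k + 1)-by-n part of array a must contain the upper triangular band part of the symmetric matrix. The matrix must be supplied row-by-row, with the leading diagonal of the matrix in column 0 of the array, the first super-diagonal starting at position 0 in column 1, and so on. The bottom right k-by-k triangle of array a is not referenced.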
The following program segment transfers the upper triangular part of a
symmetric band matrix from row-major full matrix storage (matrix with
leading dimension ldm) to row-major band storage (a, with leading
dimension lda):

for (i = 0; i < n; i++) {
    m = -i;                         /* band column of A(i,j) is j - i */
    for (j = i; j < ((n < i + k + 1) ? n : i + k + 1); j++) {
        a[(m+j) + i*lda] = matrix[j + i*ldm];
    }
}
Before entry with uplo = CblasLower, the leading (k + 1)-by-n part of
array a must contain the lower triangular band part of the symmetric
matrix, supplied row-by-row, with the leading diagonal of the matrix in
column k of the array, the first sub-diagonal starting at position 1 in column
k-1, and so on. The top left k-by-k triangle of array a is not referenced.
The following program segment transfers the lower triangular part of a
symmetric row-major band matrix from row-major full matrix storage
(matrix, with leading dimension ldm) to row-major band storage (a, with
leading dimension lda):

for (i = 0; i < n; i++) {
    m = k - i;                      /* band column of A(i,j) is k + j - i */
    for (j = (i - k > 0) ? i - k : 0; j <= i; j++) {
        a[(m+j) + i*lda] = matrix[j + i*ldm];
    }
}

lda Specifies the leading dimension of a as declared in the calling (sub)program. The value of lda must be at least (k + 1).

x Array, size at least (1 + (n - 1)*abs(incx)). Before entry, the incremented array x must contain the vector x.

incx Specifies the increment for the elements of x.

The value of incx must not be zero.

beta Specifies the scalar beta.

y Array, size at least (1 + (n - 1)*abs(incy)). Before entry, the incremented array y must contain the vector y.

incy Specifies the increment for the elements of y.

The value of incy must not be zero.

Output Parameters

y Overwritten by the updated vector y.
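The row-major, uplo = CblasLower band layout above stores A(i,j), max(0, i - k) <= j <= i, at a[(k + j - i) + i*lda] (0-based), with the diagonal in column k. The following plain-C99 sketch of y := alpha*A*x + beta*y for real double data reads only that triangle and mirrors it using A(j,i) = A(i,j). The name ref_dsbmv_rowmajor_lower is hypothetical, unit increments are assumed, and this is not MKL's implementation.

```c
/* Hypothetical sketch of ?sbmv (real double) for Layout = CblasRowMajor,
 * uplo = CblasLower, lda >= k + 1. Unit increments assumed. */
static void ref_dsbmv_rowmajor_lower(int n, int k, double alpha,
                                     const double *a, int lda,
                                     const double *x, double beta, double *y)
{
    for (int i = 0; i < n; i++)
        y[i] = (beta == 0.0) ? 0.0 : beta * y[i];
    for (int i = 0; i < n; i++) {
        int j0 = (i - k > 0) ? i - k : 0;
        for (int j = j0; j < i; j++) {
            double aij = a[(k + j - i) + i * lda];
            y[i] += alpha * aij * x[j];   /* lower element A(i,j) */
            y[j] += alpha * aij * x[i];   /* symmetric element A(j,i) */
        }
        y[i] += alpha * a[k + i * lda] * x[i];   /* diagonal in column k */
    }
}
```

In the check below, the symmetric tridiagonal matrix [[4,1,0],[1,5,2],[0,2,6]] is stored with k = 1 and lda = 2, so a[0] is the unreferenced top-left slot.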

cblas_?spmv
Computes a matrix-vector product with a symmetric
packed matrix.

Syntax
void cblas_sspmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const float alpha, const float *ap, const float *x, const MKL_INT incx, const float
beta, float *y, const MKL_INT incy);
void cblas_dspmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const double alpha, const double *ap, const double *x, const MKL_INT incx, const double
beta, double *y, const MKL_INT incy);

Include Files
• mkl.h

Description

The ?spmv routines perform a matrix-vector operation defined as

y := alpha*A*x + beta*y,
where:
alpha and beta are scalars,
x and y are n-element vectors,
A is an n-by-n symmetric matrix, supplied in packed form.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the upper or lower triangular part of the matrix A is supplied in the packed array ap.

If uplo = CblasUpper, then the upper triangular part of the matrix A is supplied in the packed array ap.

If uplo = CblasLower, then the lower triangular part of the matrix A is supplied in the packed array ap.

n Specifies the order of the matrix A. The value of n must be at least zero.

alpha Specifies the scalar alpha.

ap Array, size at least ((n*(n + 1))/2).

For Layout = CblasColMajor:

Before entry with uplo = CblasUpper, the array ap must contain the upper triangular part of the symmetric matrix packed sequentially, column-by-column, so that ap[0] contains A(1,1), ap[1] and ap[2] contain A(1,2) and A(2,2) respectively, and so on. Before entry with uplo = CblasLower, the array ap must contain the lower triangular part of the symmetric matrix packed sequentially, column-by-column, so that ap[0] contains A(1,1), ap[1] and ap[2] contain A(2,1) and A(3,1) respectively, and so on.

For Layout = CblasRowMajor:

Before entry with uplo = CblasUpper, the array ap must contain the upper triangular part of the symmetric matrix packed sequentially, row-by-row, so that ap[0] contains A(1,1), ap[1] and ap[2] contain A(1,2) and A(1,3) respectively, and so on. Before entry with uplo = CblasLower, the array ap must contain the lower triangular part of the symmetric matrix packed sequentially, row-by-row, so that ap[0] contains A(1,1), ap[1] and ap[2] contain A(2,1) and A(2,2) respectively, and so on.

x Array, size at least (1 + (n - 1)*abs(incx)). Before entry, the incremented array x must contain the n-element vector x.

incx Specifies the increment for the elements of x.

The value of incx must not be zero.

beta Specifies the scalar beta.

When beta is supplied as zero, then y need not be set on input.

y Array, size at least (1 + (n - 1)*abs(incy)). Before entry, the incremented array y must contain the n-element vector y.

incy Specifies the increment for the elements of y.

The value of incy must not be zero.

Output Parameters

y Overwritten by the updated vector y.
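For Layout = CblasColMajor and uplo = CblasUpper, column j of the packed upper triangle starts at ap[j*(j+1)/2] (0-based) and holds A(0,j), ..., A(j,j). The plain-C99 sketch below uses that offset to compute y := alpha*A*x + beta*y for real double data; ref_dspmv_colmajor_upper is a hypothetical name, unit increments are assumed, and it is not MKL's implementation.

```c
/* Hypothetical sketch of ?spmv (real double) for Layout = CblasColMajor,
 * uplo = CblasUpper. Unit increments assumed. */
static void ref_dspmv_colmajor_upper(int n, double alpha, const double *ap,
                                     const double *x, double beta, double *y)
{
    for (int i = 0; i < n; i++)
        y[i] = (beta == 0.0) ? 0.0 : beta * y[i];  /* y need not be set if beta == 0 */
    for (int j = 0; j < n; j++) {
        const double *col = ap + j * (j + 1) / 2;  /* start of packed column j */
        for (int i = 0; i < j; i++) {
            y[i] += alpha * col[i] * x[j];   /* A(i,j) */
            y[j] += alpha * col[i] * x[i];   /* A(j,i) = A(i,j) */
        }
        y[j] += alpha * col[j] * x[j];       /* diagonal A(j,j) */
    }
}
```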

cblas_?spr
Performs a rank-1 update of a symmetric packed
matrix.

Syntax
void cblas_sspr (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const float alpha, const float *x, const MKL_INT incx, float *ap);
void cblas_dspr (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const double alpha, const double *x, const MKL_INT incx, double *ap);

Include Files
• mkl.h

Description

The ?spr routines perform a matrix-vector operation defined as

A := alpha*x*x' + A,
where:
alpha is a real scalar,
x is an n-element vector,
A is an n-by-n symmetric matrix, supplied in packed form.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the upper or lower triangular part of the matrix A is supplied in the packed array ap.

If uplo = CblasUpper, then the upper triangular part of the matrix A is supplied in the packed array ap.

If uplo = CblasLower, then the lower triangular part of the matrix A is supplied in the packed array ap.

n Specifies the order of the matrix A. The value of n must be at least zero.

alpha Specifies the scalar alpha.

x Array, size at least (1 + (n - 1)*abs(incx)). Before entry, the incremented array x must contain the n-element vector x.

incx Specifies the increment for the elements of x.

The value of incx must not be zero.

ap For Layout = CblasColMajor:

Before entry with uplo = CblasUpper, the array ap must contain the upper
triangular part of the symmetric matrix packed sequentially, column-by-
column, so that ap[0] contains A1, 1, ap[1] and ap[2] contain A1, 2 and
A2, 2 respectively, and so on.
Before entry with uplo = CblasLower, the array ap must contain the lower
triangular part of the symmetric matrix packed sequentially, column-by-
column, so that ap[0] contains A1, 1, ap[1] and ap[2] contain A2, 1 and
A3, 1 respectively, and so on.
For Layout = CblasRowMajor:

Before entry with uplo = CblasUpper, the array ap must contain the upper
triangular part of the symmetric matrix packed sequentially, row-by-row, so
that ap[0] contains A1, 1, ap[1] and ap[2] contain A1, 2 and A1, 3 respectively,
and so on.
Before entry with uplo = CblasLower, the array ap must contain the lower
triangular part of the symmetric matrix packed sequentially, row-by-row, so
that ap[0] contains A1, 1, ap[1] and ap[2] contain A2, 1 and A2, 2
respectively, and so on.
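The four packing orders above can be summarized as index formulas. The following sketch returns the position of A(i, j) in ap using 0-based row i and column j (the text above uses 1-based subscripts). The function name is hypothetical, and the two `bool` flags stand in for the CBLAS_LAYOUT and CBLAS_UPLO arguments.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: position in ap of element A(i, j) (0-based) of an
   n-by-n matrix in packed form.  row_major selects CblasRowMajor vs
   CblasColMajor; upper selects CblasUpper vs CblasLower.  The element must
   lie in the stored triangle. */
long packed_index(bool row_major, bool upper, long n, long i, long j) {
    assert(0 <= i && i < n && 0 <= j && j < n);
    if (upper) {
        assert(i <= j);
        return row_major ? j + i * (2 * n - i - 1) / 2  /* row i holds A(i, i..n-1) */
                         : i + j * (j + 1) / 2;         /* column j holds A(0..j, j) */
    }
    assert(i >= j);
    return row_major ? j + i * (i + 1) / 2              /* row i holds A(i, 0..i) */
                     : i + j * (2 * n - j - 1) / 2;     /* column j holds A(j..n-1, j) */
}
```

In every case the last element A(n-1, n-1) lands at position n*(n + 1)/2 - 1, which is why ap needs at least n*(n + 1)/2 entries.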

Output Parameters

ap With uplo = CblasUpper, overwritten by the upper triangular part of the
updated matrix.
With uplo = CblasLower, overwritten by the lower triangular part of the
updated matrix.
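As a minimal sketch of the update itself, the loop below applies A := alpha*x*x' + A to a column-major, upper-packed ap, assuming incx = 1. It is a naive reference loop written for illustration, not MKL's implementation; the function name is hypothetical and `long` stands in for MKL_INT.

```c
#include <assert.h>

/* Hypothetical naive sketch of ?spr for Layout = CblasColMajor,
   uplo = CblasUpper, incx = 1.  Only the stored upper triangle is updated. */
void ref_dspr_col_upper(long n, double alpha, const double *x, double *ap) {
    assert(n >= 0);
    long k = 0;                       /* running position in ap */
    for (long j = 0; j < n; j++)      /* column j holds A(0..j, j) */
        for (long i = 0; i <= j; i++)
            ap[k++] += alpha * x[i] * x[j];
}
```

Because only one triangle is stored, the mirrored element A(j, i) is never written; symmetry supplies it.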

cblas_?spr2
Computes a rank-2 update of a symmetric packed
matrix.

Syntax
void cblas_sspr2 (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const float alpha, const float *x, const MKL_INT incx, const float *y, const MKL_INT
incy, float *ap);
void cblas_dspr2 (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const double alpha, const double *x, const MKL_INT incx, const double *y, const MKL_INT
incy, double *ap);

Include Files
• mkl.h

Description

The ?spr2 routines perform a matrix-vector operation defined as

A := alpha*x*y' + alpha*y*x' + A,

where:
alpha is a scalar,
x and y are n-element vectors,
A is an n-by-n symmetric matrix, supplied in packed form.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major
(CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the upper or lower triangular part of the matrix A is
supplied in the packed array ap.

If uplo = CblasUpper, then the upper triangular part of the matrix A is
supplied in the packed array ap.

If uplo = CblasLower, then the lower triangular part of the matrix A is
supplied in the packed array ap.

n Specifies the order of the matrix A. The value of n must be at least zero.

alpha Specifies the scalar alpha.

x Array, size at least (1 + (n - 1)*abs(incx)). Before entry, the
incremented array x must contain the n-element vector x.

incx Specifies the increment for the elements of x.

The value of incx must not be zero.

y Array, size at least (1 + (n - 1)*abs(incy)). Before entry, the
incremented array y must contain the n-element vector y.

incy Specifies the increment for the elements of y. The value of incy must not be
zero.

ap Array, size at least ((n*(n + 1))/2).

For Layout = CblasColMajor:

Before entry with uplo = CblasUpper, the array ap must contain the upper
triangular part of the symmetric matrix packed sequentially, column-by-
column, so that ap[0] contains A1, 1, ap[1] and ap[2] contain A1, 2 and
A2, 2 respectively, and so on.
Before entry with uplo = CblasLower, the array ap must contain the lower
triangular part of the symmetric matrix packed sequentially, column-by-
column, so that ap[0] contains A1, 1, ap[1] and ap[2] contain A2, 1 and
A3, 1 respectively, and so on.
For Layout = CblasRowMajor:

Before entry with uplo = CblasUpper, the array ap must contain the upper
triangular part of the symmetric matrix packed sequentially, row-by-row, so
that ap[0] contains A1, 1, ap[1] and ap[2] contain A1, 2 and A1, 3 respectively,
and so on.
Before entry with uplo = CblasLower, the array ap must contain the lower
triangular part of the symmetric matrix packed sequentially, row-by-row, so
that ap[0] contains A1, 1, ap[1] and ap[2] contain A2, 1 and A2, 2
respectively, and so on.

Output Parameters

ap With uplo = CblasUpper, overwritten by the upper triangular part of the
updated matrix.
With uplo = CblasLower, overwritten by the lower triangular part of the
updated matrix.
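A minimal sketch of the rank-2 update A := alpha*x*y' + alpha*y*x' + A, assuming Layout = CblasColMajor, uplo = CblasLower and unit increments. This naive loop is for illustration only, not MKL's implementation; the function name is hypothetical and `long` stands in for MKL_INT.

```c
#include <assert.h>

/* Hypothetical naive sketch of ?spr2 for Layout = CblasColMajor,
   uplo = CblasLower, incx = incy = 1.  ap holds the lower triangle
   packed column by column. */
void ref_dspr2_col_lower(long n, double alpha, const double *x,
                         const double *y, double *ap) {
    assert(n >= 0);
    long k = 0;                        /* running position in ap */
    for (long j = 0; j < n; j++)       /* column j holds A(j..n-1, j) */
        for (long i = j; i < n; i++)
            ap[k++] += alpha * (x[i] * y[j] + y[i] * x[j]);
}
```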

cblas_?symv
Computes a matrix-vector product for a symmetric
matrix.

Syntax
void cblas_ssymv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const float alpha, const float *a, const MKL_INT lda, const float *x, const MKL_INT
incx, const float beta, float *y, const MKL_INT incy);
void cblas_dsymv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const double alpha, const double *a, const MKL_INT lda, const double *x, const MKL_INT
incx, const double beta, double *y, const MKL_INT incy);

Include Files
• mkl.h

Description

The ?symv routines perform a matrix-vector operation defined as

y := alpha*A*x + beta*y,
where:
alpha and beta are scalars,
x and y are n-element vectors,
A is an n-by-n symmetric matrix.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major
(CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the upper or lower triangular part of the array a is used.

If uplo = CblasUpper, then the upper triangular part of the array a is used.
If uplo = CblasLower, then the lower triangular part of the array a is used.

n Specifies the order of the matrix A. The value of n must be at least zero.

alpha Specifies the scalar alpha.

a Array, size lda*n.

Before entry with uplo = CblasUpper, the leading n-by-n upper triangular
part of the array a must contain the upper triangular part of the symmetric
matrix A and the strictly lower triangular part of a is not referenced. Before
entry with uplo = CblasLower, the leading n-by-n lower triangular part of
the array a must contain the lower triangular part of the symmetric matrix
A and the strictly upper triangular part of a is not referenced.

lda Specifies the leading dimension of a as declared in the calling
(sub)program. The value of lda must be at least max(1, n).

x Array, size at least (1 + (n - 1)*abs(incx)). Before entry, the
incremented array x must contain the n-element vector x.

incx Specifies the increment for the elements of x.

The value of incx must not be zero.

beta Specifies the scalar beta.

When beta is supplied as zero, then y need not be set on input.

y Array, size at least (1 + (n - 1)*abs(incy)). Before entry, the
incremented array y must contain the n-element vector y.

incy Specifies the increment for the elements of y.

The value of incy must not be zero.

Output Parameters

y Overwritten by the updated vector y.
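The sketch below illustrates two points from the parameter list: only the chosen triangle of a is read, and when beta is zero y is written without being read. It assumes Layout = CblasColMajor, uplo = CblasUpper and unit increments; it is a naive loop for illustration, not MKL's implementation, and the function name is hypothetical (`long` stands in for MKL_INT).

```c
#include <assert.h>

/* Hypothetical naive sketch of ?symv: y := alpha*A*x + beta*y with A stored
   column-major, only the upper triangle referenced, incx = incy = 1. */
void ref_dsymv_col_upper(long n, double alpha, const double *a, long lda,
                         const double *x, double beta, double *y) {
    assert(n >= 0 && lda >= (n > 1 ? n : 1));
    for (long i = 0; i < n; i++) {
        double sum = 0.0;
        for (long j = 0; j < n; j++) {
            /* A(i, j) = A(j, i): fetch whichever copy lies in the upper triangle. */
            double aij = (i <= j) ? a[i + j * lda] : a[j + i * lda];
            sum += aij * x[j];
        }
        /* With beta == 0, y[i] is not read, so it may hold garbage or NaN. */
        y[i] = (beta == 0.0) ? alpha * sum : alpha * sum + beta * y[i];
    }
}
```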

cblas_?syr
Performs a rank-1 update of a symmetric matrix.

Syntax
void cblas_ssyr (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const float alpha, const float *x, const MKL_INT incx, float *a, const MKL_INT lda);
void cblas_dsyr (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const double alpha, const double *x, const MKL_INT incx, double *a, const MKL_INT lda);

Include Files
• mkl.h

Description

The ?syr routines perform a matrix-vector operation defined as

A := alpha*x*x' + A,
where:
alpha is a real scalar,
x is an n-element vector,
A is an n-by-n symmetric matrix.
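For comparison with the packed variant ?spr, here is a minimal sketch of the same rank-1 update on full storage, assuming Layout = CblasColMajor, uplo = CblasLower and incx = 1. Only the lower triangle of a (including the diagonal) is touched; the strictly upper part is left as it was. The function name is hypothetical, `long` stands in for MKL_INT, and this is not MKL's implementation.

```c
#include <assert.h>

/* Hypothetical naive sketch of ?syr: A := alpha*x*x' + A, column-major,
   lower triangle only, incx = 1. */
void ref_dsyr_col_lower(long n, double alpha, const double *x,
                        double *a, long lda) {
    assert(n >= 0 && lda >= (n > 1 ? n : 1));
    for (long j = 0; j < n; j++)
        for (long i = j; i < n; i++)          /* rows j..n-1 of column j */
            a[i + j * lda] += alpha * x[i] * x[j];
}
```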

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major
(CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the upper or lower triangular part of the array a is used.

If uplo = CblasUpper, then the upper triangular part of the array a is used.
If uplo = CblasLower, then the lower triangular part of the array a is used.

n Specifies the order of the matrix A. The value of n must be at least zero.

alpha Specifies the scalar alpha.

x Array, size at least (1 + (n - 1)*abs(incx)). Before entry, the
incremented array x must contain the n-element vector x.

incx Specifies the increment for the elements of x.

The value of incx must not be zero.

a Array, size lda*n.

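Before entry with uplo = CblasUpper, the leading n-by-n upper triangular
part of the array a must contain the upper triangular part of the symmetric
matrix A and the strictly lower triangular part of a is not referenced.
Before entry with uplo = CblasLower, the leading n-by-n lower triangular
part of the array a must contain the lower triangular part of the symmetric
matrix A and the strictly upper triangular part of a is not referenced.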

lda Specifies the leading dimension of a as declared in the calling
(sub)program. The value of lda must be at least max(1, n).

Output Parameters

a With uplo = CblasUpper, the upper triangular part of the array a is
overwritten by the upper triangular part of the updated matrix.
With uplo = CblasLower, the lower triangular part of the array a is
overwritten by the lower triangular part of the updated matrix.

cblas_?syr2
Performs a rank-2 update of a symmetric matrix.

Syntax
void cblas_ssyr2 (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const float alpha, const float *x, const MKL_INT incx, const float *y, const MKL_INT
incy, float *a, const MKL_INT lda);
void cblas_dsyr2 (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const double alpha, const double *x, const MKL_INT incx, const double *y, const MKL_INT
incy, double *a, const MKL_INT lda);

Include Files
• mkl.h

Description

The ?syr2 routines perform a matrix-vector operation defined as

A := alpha*x*y'+ alpha*y*x' + A,
where:
alpha is a scalar,
x and y are n-element vectors,
A is an n-by-n symmetric matrix.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major
(CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the upper or lower triangular part of the array a is used.

If uplo = CblasUpper, then the upper triangular part of the array a is used.
If uplo = CblasLower, then the lower triangular part of the array a is used.

n Specifies the order of the matrix A. The value of n must be at least zero.

alpha Specifies the scalar alpha.

x Array, size at least (1 + (n - 1)*abs(incx)). Before entry, the
incremented array x must contain the n-element vector x.

incx Specifies the increment for the elements of x.

The value of incx must not be zero.

y Array, size at least (1 + (n - 1)*abs(incy)). Before entry, the
incremented array y must contain the n-element vector y.

incy Specifies the increment for the elements of y. The value of incy must not be
zero.

a Array, size lda*n.

Before entry with uplo = CblasUpper, the leading n-by-n upper triangular
part of the array a must contain the upper triangular part of the symmetric
matrix and the strictly lower triangular part of a is not referenced.
Before entry with uplo = CblasLower, the leading n-by-n lower triangular
part of the array a must contain the lower triangular part of the symmetric
matrix and the strictly upper triangular part of a is not referenced.

lda Specifies the leading dimension of a as declared in the calling
(sub)program. The value of lda must be at least max(1, n).

Output Parameters

a With uplo = CblasUpper, the upper triangular part of the array a is
overwritten by the upper triangular part of the updated matrix.
With uplo = CblasLower, the lower triangular part of the array a is
overwritten by the lower triangular part of the updated matrix.
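A minimal sketch of the rank-2 update for Layout = CblasRowMajor and uplo = CblasUpper, assuming unit increments. In row-major storage A(i, j) lives at a[i*lda + j], so lda may exceed n and the padding between rows is never touched. The function name is hypothetical, `long` stands in for MKL_INT, and this is a naive loop, not MKL's implementation.

```c
#include <assert.h>

/* Hypothetical naive sketch of ?syr2: A := alpha*x*y' + alpha*y*x' + A,
   row-major, upper triangle only, incx = incy = 1. */
void ref_dsyr2_row_upper(long n, double alpha, const double *x,
                         const double *y, double *a, long lda) {
    assert(n >= 0 && lda >= (n > 1 ? n : 1));
    for (long i = 0; i < n; i++)
        for (long j = i; j < n; j++)          /* columns i..n-1 of row i */
            a[i * lda + j] += alpha * (x[i] * y[j] + y[i] * x[j]);
}
```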

cblas_?tbmv
Computes a matrix-vector product using a triangular
band matrix.

Syntax
void cblas_stbmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const MKL_INT k, const
float *a, const MKL_INT lda, float *x, const MKL_INT incx);
void cblas_dtbmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const MKL_INT k, const
double *a, const MKL_INT lda, double *x, const MKL_INT incx);
void cblas_ctbmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const MKL_INT k, const
void *a, const MKL_INT lda, void *x, const MKL_INT incx);
void cblas_ztbmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const MKL_INT k, const
void *a, const MKL_INT lda, void *x, const MKL_INT incx);

Include Files
• mkl.h

Description

The ?tbmv routines perform one of the matrix-vector operations defined as

x := A*x, or x := A'*x, or x := conjg(A')*x,

where:
x is an n-element vector,
A is an n-by-n unit, or non-unit, upper or lower triangular band matrix, with (k + 1) diagonals.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major
(CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the matrix A is an upper or lower triangular matrix:

if uplo = CblasUpper, then the matrix is upper triangular;

if uplo = CblasLower, then the matrix is lower triangular.

trans Specifies the operation:

if trans=CblasNoTrans, then x := A*x;

if trans=CblasTrans, then x := A'*x;

if trans=CblasConjTrans, then x := conjg(A')*x.

diag Specifies whether the matrix A is unit triangular:

if diag = CblasUnit then the matrix is unit triangular;

if diag = CblasNonUnit, then the matrix is not unit triangular.

n Specifies the order of the matrix A. The value of n must be at least zero.

k On entry with uplo = CblasUpper, k specifies the number of super-diagonals
of the matrix A. On entry with uplo = CblasLower, k specifies the number
of sub-diagonals of the matrix A.
The value of k must satisfy 0≤k.

a Array, size lda*n.

Layout = CblasColMajor:
Before entry with uplo = CblasUpper, the leading (k + 1) by n part of
the array a must contain the upper triangular band part of the matrix of
coefficients, supplied column-by-column, with the leading diagonal of the
matrix in row k of the array, the first super-diagonal starting at position 1 in
row (k - 1), and so on. The top left k by k triangle of the array a is not
referenced. The following program segment transfers an upper triangular
band matrix from conventional full matrix storage (matrix, with leading
dimension ldm) to band storage (a, with leading dimension lda):

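for (j = 0; j < n; j++) {
    m = k - j;
    for (i = max(0, j - k); i <= j; i++) {
        a[(m+i) + j*lda] = matrix[i + j*ldm];
    }
}
Before entry with uplo = CblasLower, the leading (k + 1) by n part of
the array a must contain the lower triangular band part of the matrix of
coefficients, supplied column-by-column, with the leading diagonal of the
matrix in row 0 of the array, the first sub-diagonal starting at position 0 in
row 1, and so on. The bottom right k by k triangle of the array a is not
referenced. The following program segment transfers a lower triangular
band matrix from conventional full matrix storage (matrix, with leading
dimension ldm) to band storage (a, with leading dimension lda):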

for (j = 0; j < n; j++) {
    m = -j;
    for (i = j; i < min(n, j + k + 1); i++) {
        a[(m+i) + j*lda] = matrix[i + j*ldm];
    }
}
Note that when diag = CblasUnit, the elements of the array a
corresponding to the diagonal elements of the matrix are not referenced,
but are assumed to be unity.
Layout = CblasRowMajor:
Before entry with uplo = CblasUpper, the leading (k + 1)-by-n part of
array a must contain the upper triangular band part of the matrix of
coefficients. The matrix must be supplied row-by-row, with the leading
diagonal of the matrix in column 0 of the array, the first super-diagonal
starting at position 0 in column 1, and so on. The bottom right k-by-k
triangle of array a is not referenced.

The following program segment transfers an upper triangular band matrix
from row-major full matrix storage (matrix, with leading dimension ldm) to
row-major band storage (a, with leading dimension lda):

for (i = 0; i < n; i++) {
    m = -i;
    for (j = i; j < MIN(n, i+k+1); j++) {
        a[(m+j) + i*lda] = matrix[j + i*ldm];
    }
}
Before entry with uplo = CblasLower, the leading (k + 1)-by-n part of
array a must contain the lower triangular band part of the matrix of
coefficients, supplied row-by-row, with the leading diagonal of the matrix in
column k of the array, the first sub-diagonal starting at position 1 in column
k-1, and so on. The top left k-by-k triangle of array a is not referenced.
The following program segment transfers a lower triangular band matrix
from row-major full matrix storage (matrix, with leading dimension ldm) to
row-major band storage (a, with leading dimension lda):

for (i = 0; i < n; i++) {
    m = k - i;
    for (j = max(0, i-k); j <= i; j++) {
        a[(m+j) + i*lda] = matrix[j + i*ldm];
    }
}

lda Specifies the leading dimension of a as declared in the calling
(sub)program. The value of lda must be at least (k + 1).

x Array, size at least (1 + (n - 1)*abs(incx)). Before entry, the
incremented array x must contain the n-element vector x.

incx Specifies the increment for the elements of x.

The value of incx must not be zero.

Output Parameters

x Overwritten with the transformed vector x.
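The sketch below reads the column-major upper band layout described above directly, without unpacking it: A(i, j) with max(0, j - k) ≤ i ≤ j is found at a[(k + i - j) + j*lda], so the diagonal sits in row k. It assumes trans = CblasNoTrans and incx = 1; `int unit_diag` stands in for CBLAS_DIAG, `long` for MKL_INT, and the function name is hypothetical. It is a naive loop, not MKL's implementation.

```c
#include <assert.h>

/* Hypothetical naive sketch of ?tbmv: x := A*x for an upper triangular band
   matrix in column-major band storage, incx = 1.  unit_diag != 0 mirrors
   diag = CblasUnit (the diagonal of a is not referenced). */
void ref_dtbmv_col_upper(long n, long k, int unit_diag,
                         const double *a, long lda, double *x) {
    assert(n >= 0 && k >= 0 && lda >= k + 1);
    /* Ascending i: x[j] for j >= i has not been overwritten yet. */
    for (long i = 0; i < n; i++) {
        double sum = unit_diag ? x[i] : a[k + i * lda] * x[i];
        long jmax = (i + k < n - 1) ? i + k : n - 1;   /* last column in the band */
        for (long j = i + 1; j <= jmax; j++)
            sum += a[(k + i - j) + j * lda] * x[j];
        x[i] = sum;
    }
}
```

For example, the bidiagonal matrix [1 2 0; 0 3 4; 0 0 5] with k = 1 and lda = 2 is stored as a = {*, 1, 2, 3, 4, 5}, where * is the unreferenced top left corner.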

cblas_?tbsv
Solves a system of linear equations whose coefficients
are in a triangular band matrix.

Syntax
void cblas_stbsv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const MKL_INT k, const
float *a, const MKL_INT lda, float *x, const MKL_INT incx);
void cblas_dtbsv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const MKL_INT k, const
double *a, const MKL_INT lda, double *x, const MKL_INT incx);
void cblas_ctbsv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const MKL_INT k, const
void *a, const MKL_INT lda, void *x, const MKL_INT incx);
void cblas_ztbsv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const MKL_INT k, const
void *a, const MKL_INT lda, void *x, const MKL_INT incx);

Include Files
• mkl.h

Description

The ?tbsv routines solve one of the following systems of equations:

A*x = b, or A'*x = b, or conjg(A')*x = b,

where:
b and x are n-element vectors,
A is an n-by-n unit, or non-unit, upper or lower triangular band matrix, with (k + 1) diagonals.

The routine does not test for singularity or near-singularity.

Such tests must be performed before calling this routine.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major
(CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the matrix A is an upper or lower triangular matrix:

if uplo = CblasUpper the matrix is upper triangular;

if uplo = CblasLower, the matrix is lower triangular.

trans Specifies the system of equations:
if trans=CblasNoTrans, then A*x = b;

if trans=CblasTrans, then A'*x = b;

if trans=CblasConjTrans, then conjg(A')*x = b.

diag Specifies whether the matrix A is unit triangular:

if diag = CblasUnit then the matrix is unit triangular;

if diag = CblasNonUnit, then the matrix is not unit triangular.

n Specifies the order of the matrix A. The value of n must be at least zero.

k On entry with uplo = CblasUpper, k specifies the number of
super-diagonals of the matrix A. On entry with uplo = CblasLower, k specifies
the number of sub-diagonals of the matrix A.
The value of k must satisfy 0≤k.

a Array, size lda*n.

Layout = CblasColMajor:
Before entry with uplo = CblasUpper, the leading (k + 1) by n part of
the array a must contain the upper triangular band part of the matrix of
coefficients, supplied column-by-column, with the leading diagonal of the
matrix in row k of the array, the first super-diagonal starting at position 1 in
row (k - 1), and so on. The top left k by k triangle of the array a is not
referenced.
The following program segment transfers an upper triangular band matrix
from conventional full matrix storage (matrix, with leading dimension ldm)
to band storage (a, with leading dimension lda):

for (j = 0; j < n; j++) {
    m = k - j;
    for (i = max( 0, j - k); i <= j; i++) {
        a[(m+i) + j*lda] = matrix[i + j*ldm];
    }
}
Before entry with uplo = CblasLower, the leading (k + 1) by n part of
the array a must contain the lower triangular band part of the matrix of
coefficients, supplied column-by-column, with the leading diagonal of the
matrix in row 0 of the array, the first sub-diagonal starting at position 0 in
row 1, and so on. The bottom right k by k triangle of the array a is not
referenced.
The following program segment transfers a lower triangular band matrix
from conventional full matrix storage (matrix, with leading dimension ldm)
to band storage (a, with leading dimension lda):

for (j = 0; j < n; j++) {
    m = -j;
    for (i = j; i < min(n, j + k + 1); i++) {
        a[(m+i) + j*lda] = matrix[i + j*ldm];
    }
}

When diag = CblasUnit, the elements of the array a corresponding to the
diagonal elements of the matrix are not referenced, but are assumed to be
unity.
Layout = CblasRowMajor:
Before entry with uplo = CblasUpper, the leading (k + 1)-by-n part of
array a must contain the upper triangular band part of the matrix of
coefficients. The matrix must be supplied row-by-row, with the leading
diagonal of the matrix in column 0 of the array, the first super-diagonal
starting at position 0 in column 1, and so on. The bottom right k-by-k
triangle of array a is not referenced.

The following program segment transfers an upper triangular band matrix
from row-major full matrix storage (matrix, with leading dimension ldm) to
row-major band storage (a, with leading dimension lda):

for (i = 0; i < n; i++) {
    m = -i;
    for (j = i; j < MIN(n, i+k+1); j++) {
        a[(m+j) + i*lda] = matrix[j + i*ldm];
    }
}
Before entry with uplo = CblasLower, the leading (k + 1)-by-n part of
array a must contain the lower triangular band part of the matrix of
coefficients, supplied row-by-row, with the leading diagonal of the matrix in
column k of the array, the first sub-diagonal starting at position 1 in column
k-1, and so on. The top left k-by-k triangle of array a is not referenced.
The following program segment transfers a lower triangular band matrix
from row-major full matrix storage (matrix, with leading dimension ldm) to
row-major band storage (a, with leading dimension lda):

for (i = 0; i < n; i++) {
    m = k - i;
    for (j = max(0, i-k); j <= i; j++) {
        a[(m+j) + i*lda] = matrix[j + i*ldm];
    }
}

lda Specifies the leading dimension of a as declared in the calling
(sub)program. The value of lda must be at least (k + 1).

x Array, size at least (1 + (n - 1)*abs(incx)). Before entry, the
incremented array x must contain the n-element right-hand side vector b.

incx Specifies the increment for the elements of x.

The value of incx must not be zero.

Output Parameters

x Overwritten with the solution vector x.
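A minimal sketch of the banded solve for an upper triangular matrix in column-major band storage, assuming trans = CblasNoTrans and incx = 1. It performs back substitution, reading A(i, j) from a[(k + i - j) + j*lda], and, like the routine, it does not test for a zero diagonal. The function name is hypothetical, `int unit_diag` stands in for CBLAS_DIAG and `long` for MKL_INT; this is not MKL's implementation.

```c
#include <assert.h>

/* Hypothetical naive sketch of ?tbsv: solves A*x = b in place (x holds b on
   entry) for an upper triangular band matrix, column-major band storage. */
void ref_dtbsv_col_upper(long n, long k, int unit_diag,
                         const double *a, long lda, double *x) {
    assert(n >= 0 && k >= 0 && lda >= k + 1);
    for (long i = n - 1; i >= 0; i--) {               /* back substitution */
        double sum = x[i];
        long jmax = (i + k < n - 1) ? i + k : n - 1;
        for (long j = i + 1; j <= jmax; j++)
            sum -= a[(k + i - j) + j * lda] * x[j];   /* x[j] is already solved */
        x[i] = unit_diag ? sum : sum / a[k + i * lda]; /* no singularity check */
    }
}
```

With A = [1 2 0; 0 3 4; 0 0 5] stored as a = {*, 1, 2, 3, 4, 5} (k = 1, lda = 2), the right-hand side b = (3, 7, 5) yields x = (1, 1, 1).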

cblas_?tpmv
Computes a matrix-vector product using a triangular
packed matrix.

Syntax
void cblas_stpmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const float *ap, float
*x, const MKL_INT incx);
void cblas_dtpmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const double *ap, double
*x, const MKL_INT incx);
void cblas_ctpmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const void *ap, void *x,
const MKL_INT incx);
void cblas_ztpmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const void *ap, void *x,
const MKL_INT incx);

Include Files
• mkl.h

Description

The ?tpmv routines perform one of the matrix-vector operations defined as

x := A*x, or x := A'*x, or x := conjg(A')*x,

where:
x is an n-element vector,
A is an n-by-n unit, or non-unit, upper or lower triangular matrix, supplied in packed form.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major
(CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the matrix A is upper or lower triangular:

if uplo = CblasUpper, then the matrix is upper triangular;

if uplo = CblasLower, then the matrix is lower triangular.

trans Specifies the operation:

if trans=CblasNoTrans, then x := A*x;

if trans=CblasTrans, then x := A'*x;

if trans=CblasConjTrans, then x := conjg(A')*x.

diag Specifies whether the matrix A is unit triangular:

if diag = CblasUnit then the matrix is unit triangular;

if diag = CblasNonUnit, then the matrix is not unit triangular.

n Specifies the order of the matrix A. The value of n must be at least zero.

ap Array, size at least ((n*(n + 1))/2).

For Layout = CblasColMajor:

Before entry with uplo = CblasUpper, the array ap must contain the upper
triangular matrix packed sequentially, column-by-column, so that ap[0]
contains A1, 1, ap[1] and ap[2] contain A1, 2 and A2, 2 respectively, and
so on. Before entry with uplo = CblasLower, the array ap must contain the
lower triangular matrix packed sequentially, column-by-column, so that
ap[0] contains A1, 1, ap[1] and ap[2] contain A2, 1 and A3, 1
respectively, and so on.
For Layout = CblasRowMajor:

Before entry with uplo = CblasUpper, the array ap must contain the upper
triangular matrix packed sequentially, row-by-row, so that ap[0] contains
A1, 1, ap[1] and ap[2] contain A1, 2 and A1, 3 respectively, and so on.
Before entry with uplo = CblasLower, the array ap must contain the lower
triangular matrix packed sequentially, row-by-row, so that ap[0] contains
A1, 1, ap[1] and ap[2] contain A2, 1 and A2, 2 respectively, and so on.
When diag = CblasUnit, the diagonal elements of A are not referenced, but
are assumed to be unity.

x Array, size at least (1 + (n - 1)*abs(incx)). Before entry, the
incremented array x must contain the n-element vector x.

incx Specifies the increment for the elements of x.

The value of incx must not be zero.

Output Parameters

x Overwritten with the transformed vector x.
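A minimal sketch of the transposed product x := A'*x for Layout = CblasColMajor, uplo = CblasLower and incx = 1. Column i of the lower packed matrix starts at offset i*(2n - i - 1)/2 in ap, so A(r, i) for r ≥ i is ap[i*(2n - i - 1)/2 + r]. The function name is hypothetical, `int unit_diag` stands in for CBLAS_DIAG and `long` for MKL_INT; this is a naive loop, not MKL's implementation.

```c
#include <assert.h>

/* Hypothetical naive sketch of ?tpmv with trans = CblasTrans:
   x := A'*x, A lower triangular, column-major packed, incx = 1. */
void ref_dtpmv_col_lower_trans(long n, int unit_diag, const double *ap, double *x) {
    assert(n >= 0);
    /* (A'*x)[i] uses x[j] for j >= i, so ascending i reads unmodified inputs. */
    for (long i = 0; i < n; i++) {
        long col = i * (2 * n - i - 1) / 2;   /* ap[col + r] = A(r, i), r >= i */
        double sum = unit_diag ? x[i] : ap[col + i] * x[i];
        for (long j = i + 1; j < n; j++)
            sum += ap[col + j] * x[j];        /* A(j, i) is element (i, j) of A' */
        x[i] = sum;
    }
}
```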

cblas_?tpsv
Solves a system of linear equations whose coefficients
are in a triangular packed matrix.

Syntax
void cblas_stpsv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const float *ap, float
*x, const MKL_INT incx);
void cblas_dtpsv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const double *ap, double
*x, const MKL_INT incx);
void cblas_ctpsv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const void *ap, void *x,
const MKL_INT incx);
void cblas_ztpsv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const void *ap, void *x,
const MKL_INT incx);

Include Files
• mkl.h

Description

The ?tpsv routines solve one of the following systems of equations

A*x = b, or A'*x = b, or conjg(A')*x = b,

where:
b and x are n-element vectors,
A is an n-by-n unit, or non-unit, upper or lower triangular matrix, supplied in packed form.
This routine does not test for singularity or near-singularity.
Such tests must be performed before calling this routine.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major
(CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the matrix A is upper or lower triangular:

if uplo = CblasUpper, then the matrix is upper triangular;

if uplo = CblasLower, then the matrix is lower triangular.

trans Specifies the system of equations:

if trans=CblasNoTrans, then A*x = b;

if trans=CblasTrans, then A'*x = b;

if trans=CblasConjTrans, then conjg(A')*x = b.

diag Specifies whether the matrix A is unit triangular:

if diag = CblasUnit then the matrix is unit triangular;

if diag = CblasNonUnit, then the matrix is not unit triangular.

n Specifies the order of the matrix A. The value of n must be at least zero.

ap Array, size at least ((n*(n + 1))/2).

For Layout = CblasColMajor:

Before entry with uplo = CblasUpper, the array ap must contain the upper
triangular part of the triangular matrix packed sequentially, column-by-
column, so that ap[0] contains A1, 1, ap[1] and ap[2] contain A1, 2 and
A2, 2 respectively, and so on.
Before entry with uplo = CblasLower, the array ap must contain the lower
triangular part of the triangular matrix packed sequentially, column-by-
column, so that ap[0] contains A1, 1, ap[1] and ap[2] contain A2, 1 and
A3, 1 respectively, and so on.
For Layout = CblasRowMajor:

Before entry with uplo = CblasUpper, the array ap must contain the upper
triangular part of the triangular matrix packed sequentially, row-by-row, so
that ap[0] contains A1, 1, ap[1] and ap[2] contain A1, 2 and A1, 3
respectively, and so on. Before entry with uplo = CblasLower, the array ap
must contain the lower triangular part of the triangular matrix packed
sequentially, row-by-row, so that ap[0] contains A1, 1, ap[1] and ap[2]
contain A2, 1 and A2, 2 respectively, and so on.
When diag = CblasUnit, the diagonal elements of A are not referenced,
but are assumed to be unity.

x Array, size at least (1 + (n - 1)*abs(incx)). Before entry, the
incremented array x must contain the n-element right-hand side vector b.

incx Specifies the increment for the elements of x.

The value of incx must not be zero.

Output Parameters

x Overwritten with the solution vector x.
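A minimal sketch of forward substitution on a row-major, lower-packed matrix, assuming trans = CblasNoTrans and incx = 1. Row i of the packed array starts at offset i*(i + 1)/2 and holds A(i, 0..i) contiguously. As with the routine, no singularity test is made. The function name is hypothetical, `int unit_diag` stands in for CBLAS_DIAG and `long` for MKL_INT; this is not MKL's implementation.

```c
#include <assert.h>

/* Hypothetical naive sketch of ?tpsv: solves A*x = b in place (x holds b on
   entry), A lower triangular, row-major packed, incx = 1. */
void ref_dtpsv_row_lower(long n, int unit_diag, const double *ap, double *x) {
    assert(n >= 0);
    for (long i = 0; i < n; i++) {
        const double *row = ap + i * (i + 1) / 2;   /* row[j] = A(i, j), j <= i */
        double sum = x[i];
        for (long j = 0; j < i; j++)
            sum -= row[j] * x[j];                   /* x[j] is already solved */
        x[i] = unit_diag ? sum : sum / row[i];      /* no singularity check */
    }
}
```

For A = [2 0; 1 4] packed as ap = {2, 1, 4}, the right-hand side b = (2, 9) gives x = (1, 2).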

cblas_?trmv
Computes a matrix-vector product using a triangular
matrix.

Syntax
void cblas_strmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const float *a, const
MKL_INT lda, float *x, const MKL_INT incx);
void cblas_dtrmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const double *a, const
MKL_INT lda, double *x, const MKL_INT incx);
void cblas_ctrmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const void *a, const
MKL_INT lda, void *x, const MKL_INT incx);
void cblas_ztrmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const void *a, const
MKL_INT lda, void *x, const MKL_INT incx);

Include Files
• mkl.h

Description

The ?trmv routines perform one of the following matrix-vector operations defined as

x := A*x, or x := A'*x, or x := conjg(A')*x,

where:
x is an n-element vector,
A is an n-by-n unit, or non-unit, upper or lower triangular matrix.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major
(CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the matrix A is upper or lower triangular:

if uplo = CblasUpper, then the matrix is upper triangular;

if uplo = CblasLower, then the matrix is lower triangular.

trans Specifies the operation:

if trans=CblasNoTrans, then x := A*x;

if trans=CblasTrans, then x := A'*x;

if trans=CblasConjTrans, then x := conjg(A')*x.

diag Specifies whether the matrix A is unit triangular:

if diag = CblasUnit then the matrix is unit triangular;

if diag = CblasNonUnit, then the matrix is not unit triangular.

n Specifies the order of the matrix A. The value of n must be at least zero.

a Array, size lda*n. Before entry with uplo = CblasUpper, the leading n-by-
n upper triangular part of the array a must contain the upper triangular
matrix and the strictly lower triangular part of a is not referenced. Before
entry with uplo = CblasLower, the leading n-by-n lower triangular part of
the array a must contain the lower triangular matrix and the strictly upper
triangular part of a is not referenced.
When diag = CblasUnit, the diagonal elements of a are not referenced
either, but are assumed to be unity.

lda Specifies the leading dimension of a as declared in the calling
(sub)program. The value of lda must be at least max(1, n).

x Array, size at least (1 + (n - 1)*abs(incx)). Before entry, the

incremented array x must contain the n-element vector x.

incx Specifies the increment for the elements of x.

The value of incx must not be zero.

Output Parameters

x Overwritten with the transformed vector x.
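To make the parameter rules above concrete, the following is a minimal plain-C sketch of the arithmetic ?trmv performs, intended only for checking small cases by hand. It is not oneMKL code and does not call oneMKL: ref_dtrmv_colmajor and the REF_* enumerators are hypothetical stand-ins for cblas_dtrmv and the CBLAS enumerations. It assumes column-major storage and real data, and omits CblasConjTrans (which equals CblasTrans for real matrices). It shows which triangle of a is read, how diag = CblasUnit replaces the diagonal with ones, and how a negative incx walks x from its last element, which is the usual BLAS convention (assumed here, since this page only gives the array size).

```c
#include <stddef.h>

/* Hypothetical stand-ins for the CBLAS enumerations (not oneMKL types). */
typedef enum { REF_UPPER, REF_LOWER } ref_uplo;
typedef enum { REF_NOTRANS, REF_TRANS } ref_trans;
typedef enum { REF_NONUNIT, REF_UNIT } ref_diag;

/* Element (i, j) of op(A) for column-major a; only the uplo triangle is read. */
static double op_tri(ref_uplo uplo, ref_trans trans, ref_diag diag,
                     const double *a, int lda, int i, int j)
{
    int r = (trans == REF_NOTRANS) ? i : j;   /* row index into A */
    int c = (trans == REF_NOTRANS) ? j : i;   /* column index into A */
    if (r == c)                               /* with REF_UNIT the stored diagonal is ignored */
        return (diag == REF_UNIT) ? 1.0 : a[r + (size_t)c * lda];
    if ((uplo == REF_UPPER) ? (r > c) : (r < c))
        return 0.0;                           /* outside the stored triangle */
    return a[r + (size_t)c * lda];
}

/* Hypothetical reference sketch: x := op(A)*x in place, column-major only. */
void ref_dtrmv_colmajor(ref_uplo uplo, ref_trans trans, ref_diag diag, int n,
                        const double *a, int lda, double *x, int incx)
{
    /* op(A) is upper triangular for (upper, no-trans) and (lower, trans). */
    int op_upper = (uplo == REF_UPPER) == (trans == REF_NOTRANS);
    int kx = (incx > 0) ? 0 : (1 - n) * incx;   /* storage index of element 0 */
    for (int s = 0; s < n; ++s) {
        /* Visit rows so that every x[j] still needed is not yet overwritten:
           top-down when op(A) is upper, bottom-up when it is lower. */
        int i = op_upper ? s : n - 1 - s;
        int jlo = op_upper ? i : 0, jhi = op_upper ? n - 1 : i;
        double sum = 0.0;
        for (int j = jlo; j <= jhi; ++j)
            sum += op_tri(uplo, trans, diag, a, lda, i, j) * x[kx + j * incx];
        x[kx + i * incx] = sum;
    }
}
```

In an application you would call cblas_dtrmv(CblasColMajor, CblasUpper, CblasNoTrans, CblasNonUnit, n, a, lda, x, incx) instead; a loop like this is only useful as a cross-check of small results.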

cblas_?trsv
Solves a system of linear equations whose coefficients
are in a triangular matrix.

Syntax
void cblas_strsv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const float *a, const
MKL_INT lda, float *x, const MKL_INT incx);

void cblas_dtrsv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const double *a, const
MKL_INT lda, double *x, const MKL_INT incx);
void cblas_ctrsv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const void *a, const
MKL_INT lda, void *x, const MKL_INT incx);
void cblas_ztrsv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const void *a, const
MKL_INT lda, void *x, const MKL_INT incx);

Include Files
• mkl.h

Description

The ?trsv routines solve one of the systems of equations:

A*x = b, or A'*x = b, or conjg(A')*x = b,

where:
b and x are n-element vectors,
A is an n-by-n unit, or non-unit, upper or lower triangular matrix.
The routine does not test for singularity or near-singularity.
Such tests must be performed before calling this routine.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major
(CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the matrix A is upper or lower triangular:

if uplo = CblasUpper, then the matrix is upper triangular;

if uplo = CblasLower, then the matrix is lower triangular.

trans Specifies the systems of equations:

if trans=CblasNoTrans, then A*x = b;

if trans=CblasTrans, then A'*x = b;

if trans=CblasConjTrans, then conjg(A')*x = b.

diag Specifies whether the matrix A is unit triangular:

if diag = CblasUnit then the matrix is unit triangular;

if diag = CblasNonUnit, then the matrix is not unit triangular.

n Specifies the order of the matrix A. The value of n must be at least zero.

a Array, size lda*n. Before entry with uplo = CblasUpper, the leading
n-by-n upper triangular part of the array a must contain the upper triangular
matrix and the strictly lower triangular part of a is not referenced. Before
entry with uplo = CblasLower, the leading n-by-n lower triangular part of
the array a must contain the lower triangular matrix and the strictly upper
triangular part of a is not referenced.
When diag = CblasUnit, the diagonal elements of a are not referenced
either, but are assumed to be unity.

lda Specifies the leading dimension of a as declared in the calling
(sub)program. The value of lda must be at least max(1, n).

x Array, size at least (1 + (n - 1)*abs(incx)). Before entry, the
incremented array x must contain the n-element right-hand side vector b.

incx Specifies the increment for the elements of x.

The value of incx must not be zero.

Output Parameters

x Overwritten with the solution vector x.
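The substitution that ?trsv performs can be written out as a short plain-C sketch. It assumes column-major storage and real data, omits CblasConjTrans (identical to CblasTrans for real matrices), and uses hypothetical names (ref_dtrsv_colmajor and the REF_* enumerators) rather than the oneMKL API. As the description warns, it does not test for singularity: on IEEE-754 platforms a zero on a non-unit diagonal typically produces an infinity or NaN rather than an error.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the CBLAS enumerations (not oneMKL types). */
typedef enum { REF_UPPER, REF_LOWER } ref_uplo;
typedef enum { REF_NOTRANS, REF_TRANS } ref_trans;
typedef enum { REF_NONUNIT, REF_UNIT } ref_diag;

/* Hypothetical reference sketch: solves op(A)*x = b in place (x holds b on
   entry), column-major only. No singularity check is made. */
void ref_dtrsv_colmajor(ref_uplo uplo, ref_trans trans, ref_diag diag, int n,
                        const double *a, int lda, double *x, int incx)
{
    /* op(A) is upper triangular for (upper, no-trans) and (lower, trans). */
    int op_upper = (uplo == REF_UPPER) == (trans == REF_NOTRANS);
    int kx = (incx > 0) ? 0 : (1 - n) * incx;   /* storage index of element 0 */
    for (int s = 0; s < n; ++s) {
        /* Upper op(A): back substitution, last row first.
           Lower op(A): forward substitution, first row first. */
        int i = op_upper ? n - 1 - s : s;
        int jlo = op_upper ? i + 1 : 0, jhi = op_upper ? n - 1 : i - 1;
        double sum = x[kx + i * incx];
        for (int j = jlo; j <= jhi; ++j) {
            /* op(A)(i, j) is A(i, j) without transpose and A(j, i) with it;
               either way only the stored triangle of a is read. */
            double aij = (trans == REF_NOTRANS) ? a[i + (size_t)j * lda]
                                                : a[j + (size_t)i * lda];
            sum -= aij * x[kx + j * incx];
        }
        /* With REF_UNIT the stored diagonal is never read. */
        x[kx + i * incx] = (diag == REF_UNIT) ? sum : sum / a[i + (size_t)i * lda];
    }
}
```

Solving with a matrix and then multiplying the result back with ?trmv using the same uplo, trans, and diag is a quick way to sanity-check a call.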

BLAS Level 3 Routines

BLAS Level 3 routines perform matrix-matrix operations. The following table lists the BLAS Level 3 routine
groups and the data types associated with them.
BLAS Level 3 Routine Groups and Their Data Types

Routine Group    Data Types    Description
cblas_?gemm      s, d, c, z    Computes a matrix-matrix product with general matrices.
cblas_?hemm      c, z          Computes a matrix-matrix product where one input matrix is Hermitian.
cblas_?herk      c, z          Performs a Hermitian rank-k update.
cblas_?her2k     c, z          Performs a Hermitian rank-2k update.
cblas_?symm      s, d, c, z    Computes a matrix-matrix product where one input matrix is symmetric.
cblas_?syrk      s, d, c, z    Performs a symmetric rank-k update.
cblas_?syr2k     s, d, c, z    Performs a symmetric rank-2k update.
cblas_?trmm      s, d, c, z    Computes a matrix-matrix product where one input matrix is triangular.
cblas_?trsm      s, d, c, z    Solves a triangular matrix equation.

Symmetric Multiprocessing Version of Intel® MKL

Many applications spend considerable time executing BLAS routines. This time can be reduced by using the
processors available on the system through the symmetric multiprocessing (SMP) feature built into
Intel® oneMKL. These parallel performance enhancements require no programming effort on your part.
To enhance performance, the library uses the following methods:
• The BLAS functions are blocked where possible, restructuring the code to improve the locality of data
references, make better use of cache memory, and reduce the dependency on the memory bus.
• The code is distributed across the processors to maximize parallelism.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.
Notice revision #20201201

cblas_?gemm
Computes a matrix-matrix product with general
matrices.

Syntax
void cblas_hgemm (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE transa, const
CBLAS_TRANSPOSE transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const
MKL_F16 alpha, const MKL_F16 *a, const MKL_INT lda, const MKL_F16 *b, const MKL_INT
ldb, const MKL_F16 beta, MKL_F16 *c, const MKL_INT ldc);
void cblas_sgemm (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE transa, const
CBLAS_TRANSPOSE transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const float
alpha, const float *a, const MKL_INT lda, const float *b, const MKL_INT ldb, const
float beta, float *c, const MKL_INT ldc);
void cblas_dgemm (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE transa, const
CBLAS_TRANSPOSE transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const double
alpha, const double *a, const MKL_INT lda, const double *b, const MKL_INT ldb, const
double beta, double *c, const MKL_INT ldc);
void cblas_cgemm (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE transa, const
CBLAS_TRANSPOSE transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const void
*alpha, const void *a, const MKL_INT lda, const void *b, const MKL_INT ldb, const void
*beta, void *c, const MKL_INT ldc);
void cblas_zgemm (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE transa, const
CBLAS_TRANSPOSE transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const void
*alpha, const void *a, const MKL_INT lda, const void *b, const MKL_INT ldb, const void
*beta, void *c, const MKL_INT ldc);

Include Files
• mkl.h

Description

The ?gemm routines compute a scalar-matrix-matrix product and add the result to a scalar-matrix product,
with general matrices. The operation is defined as

C := alpha*op(A)*op(B) + beta*C
where:
op(X) is one of op(X) = X, or op(X) = X^T, or op(X) = X^H,
alpha and beta are scalars,
A, B and C are matrices:
op(A) is an m-by-k matrix,

op(B) is a k-by-n matrix,
C is an m-by-n matrix.
See also:

• ?gemm3m, BLAS-like extension routines that perform the same operation on general complex matrices using fewer matrix multiplication operations

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major
(CblasRowMajor) or column-major (CblasColMajor).

transa Specifies the form of op(A) used in the matrix multiplication:

• if transa=CblasNoTrans, then op(A) = A;
• if transa=CblasTrans, then op(A) = A^T;
• if transa=CblasConjTrans, then op(A) = A^H.

transb Specifies the form of op(B) used in the matrix multiplication:

• if transb=CblasNoTrans, then op(B) = B;
• if transb=CblasTrans, then op(B) = B^T;
• if transb=CblasConjTrans, then op(B) = B^H.

m Specifies the number of rows of the matrix op(A) and of the matrix C. The
value of m must be at least zero.

n Specifies the number of columns of the matrix op(B) and the number of
columns of the matrix C. The value of n must be at least zero.

k Specifies the number of columns of the matrix op(A) and the number of
rows of the matrix op(B). The value of k must be at least zero.

alpha Specifies the scalar alpha.

a For Layout = CblasColMajor:
• if transa=CblasNoTrans: array, size lda*k. Before entry, the leading m-by-k part of the array a must contain the matrix A.
• if transa=CblasTrans or transa=CblasConjTrans: array, size lda*m. Before entry, the leading k-by-m part of the array a must contain the matrix A.
For Layout = CblasRowMajor:
• if transa=CblasNoTrans: array, size lda*m. Before entry, the leading k-by-m part of the array a must contain the matrix A.
• if transa=CblasTrans or transa=CblasConjTrans: array, size lda*k. Before entry, the leading m-by-k part of the array a must contain the matrix A.

lda Specifies the leading dimension of a as declared in the calling
(sub)program.
For Layout = CblasColMajor: if transa=CblasNoTrans, lda must be at least max(1, m); otherwise, lda must be at least max(1, k).
For Layout = CblasRowMajor: if transa=CblasNoTrans, lda must be at least max(1, k); otherwise, lda must be at least max(1, m).

b For Layout = CblasColMajor:
• if transb=CblasNoTrans: array, size ldb*n. Before entry, the leading k-by-n part of the array b must contain the matrix B.
• if transb=CblasTrans or transb=CblasConjTrans: array, size ldb*k. Before entry, the leading n-by-k part of the array b must contain the matrix B.
For Layout = CblasRowMajor:
• if transb=CblasNoTrans: array, size ldb*k. Before entry, the leading n-by-k part of the array b must contain the matrix B.
• if transb=CblasTrans or transb=CblasConjTrans: array, size ldb*n. Before entry, the leading k-by-n part of the array b must contain the matrix B.

ldb Specifies the leading dimension of b as declared in the calling
(sub)program.
For Layout = CblasColMajor: if transb=CblasNoTrans, ldb must be at least max(1, k); otherwise, ldb must be at least max(1, n).
For Layout = CblasRowMajor: if transb=CblasNoTrans, ldb must be at least max(1, n); otherwise, ldb must be at least max(1, k).

beta Specifies the scalar beta. When beta is equal to zero, then c need not be
set on input.

c For Layout = CblasColMajor: array, size ldc*n. Before entry, the leading m-by-n part of the array c must contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry.
For Layout = CblasRowMajor: array, size ldc*m. Before entry, the leading n-by-m part of the array c must contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry.

ldc Specifies the leading dimension of c as declared in the calling
(sub)program.
For Layout = CblasColMajor, ldc must be at least max(1, m).
For Layout = CblasRowMajor, ldc must be at least max(1, n).

Output Parameters

c Overwritten by the m-by-n matrix (alpha*op(A)*op(B) + beta*C).

Example
For examples of routine usage, see these code examples in the Intel® oneAPI Math Kernel Library (oneMKL)
installation directory:

• cblas_hgemm: examples\cblas\source\cblas_hgemmx.c
• cblas_sgemm: examples\cblas\source\cblas_sgemmx.c
• cblas_dgemm: examples\cblas\source\cblas_dgemmx.c
• cblas_cgemm: examples\cblas\source\cblas_cgemmx.c
• cblas_zgemm: examples\cblas\source\cblas_zgemmx.c
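To make the roles of op(A), the leading dimensions, and beta concrete, here is a straightforward triple-loop sketch of C := alpha*op(A)*op(B) + beta*C in plain C. It is not how oneMKL computes ?gemm (the library uses blocked, parallel code, as described in the SMP section) and it assumes column-major storage and real data; ref_dgemm_colmajor, op_elem, and the ref_trans enumeration are hypothetical names. Element (i, j) of a column-major array with leading dimension ld sits at index i + j*ld, which is why each leading dimension must be at least the number of rows actually stored.

```c
#include <stddef.h>

/* Hypothetical stand-in for CBLAS_TRANSPOSE (not a oneMKL type). */
typedef enum { REF_NOTRANS, REF_TRANS } ref_trans;

/* Element (i, j) of op(X) for a column-major array x with leading dimension ldx. */
static double op_elem(ref_trans t, const double *x, int ldx, int i, int j)
{
    return (t == REF_NOTRANS) ? x[i + (size_t)j * ldx] : x[j + (size_t)i * ldx];
}

/* Hypothetical reference sketch: C := alpha*op(A)*op(B) + beta*C, column-major;
   op(A) is m-by-k, op(B) is k-by-n, C is m-by-n. */
void ref_dgemm_colmajor(ref_trans transa, ref_trans transb, int m, int n, int k,
                        double alpha, const double *a, int lda,
                        const double *b, int ldb, double beta,
                        double *c, int ldc)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            double sum = 0.0;
            for (int p = 0; p < k; ++p)
                sum += op_elem(transa, a, lda, i, p) * op_elem(transb, b, ldb, p, j);
            double *cij = &c[i + (size_t)j * ldc];
            /* With beta == 0, C is never read, so it need not be set on input. */
            *cij = (beta == 0.0) ? alpha * sum : alpha * sum + beta * *cij;
        }
    }
}
```

A note on the row-major cases: a row-major matrix with leading dimension ld occupies memory exactly like the column-major storage of its transpose, which is why the Layout = CblasRowMajor rows in the parameter descriptions swap the roles of the row and column counts.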

cblas_?hemm
Computes a matrix-matrix product where one input
matrix is Hermitian.

Syntax
void cblas_chemm (const CBLAS_LAYOUT Layout, const CBLAS_SIDE side, const CBLAS_UPLO
uplo, const MKL_INT m, const MKL_INT n, const void *alpha, const void *a, const MKL_INT
lda, const void *b, const MKL_INT ldb, const void *beta, void *c, const MKL_INT ldc);
void cblas_zhemm (const CBLAS_LAYOUT Layout, const CBLAS_SIDE side, const CBLAS_UPLO
uplo, const MKL_INT m, const MKL_INT n, const void *alpha, const void *a, const MKL_INT
lda, const void *b, const MKL_INT ldb, const void *beta, void *c, const MKL_INT ldc);

Include Files
• mkl.h

Description

The ?hemm routines compute a scalar-matrix-matrix product using a Hermitian matrix A and a general matrix
B and add the result to a scalar-matrix product using a general matrix C. The operation is defined as

C := alpha*A*B + beta*C
or

C := alpha*B*A + beta*C
where:
alpha and beta are scalars,
A is a Hermitian matrix,
B and C are m-by-n matrices.


Input Parameters

Layout Specifies whether two-dimensional array storage is row-major
(CblasRowMajor) or column-major (CblasColMajor).

side Specifies whether the Hermitian matrix A appears on the left or right in the
operation as follows:
if side = CblasLeft, then C := alpha*A*B + beta*C;

if side = CblasRight, then C := alpha*B*A + beta*C.

uplo Specifies whether the upper or lower triangular part of the Hermitian matrix
A is used:
If uplo = CblasUpper, then the upper triangular part of the Hermitian
matrix A is used.
If uplo = CblasLower, then the lower triangular part of the Hermitian matrix
A is used.

m Specifies the number of rows of the matrix C.

The value of m must be at least zero.

n Specifies the number of columns of the matrix C.

The value of n must be at least zero.

alpha Specifies the scalar alpha.

a Array, size lda*ka, where ka is m when side = CblasLeft and is n
otherwise. Before entry with side = CblasLeft, the m-by-m part of the
array a must contain the Hermitian matrix, such that when uplo =
CblasUpper, the leading m-by-m upper triangular part of the array a must
contain the upper triangular part of the Hermitian matrix and the strictly
lower triangular part of a is not referenced, and when uplo = CblasLower,
the leading m-by-m lower triangular part of the array a must contain the
lower triangular part of the Hermitian matrix, and the strictly upper
triangular part of a is not referenced.
Before entry with side = CblasRight, the n-by-n part of the array a must
contain the Hermitian matrix, such that when uplo = CblasUpper, the
leading n-by-n upper triangular part of the array a must contain the upper
triangular part of the Hermitian matrix and the strictly lower triangular part
of a is not referenced, and when uplo = CblasLower, the leading n-by-n
lower triangular part of the array a must contain the lower triangular part of
the Hermitian matrix, and the strictly upper triangular part of a is not
referenced. The imaginary parts of the diagonal elements need not be set;
they are assumed to be zero.

lda Specifies the leading dimension of a as declared in the calling
(sub)program. When side = CblasLeft, lda must be at least max(1, m);
otherwise, lda must be at least max(1, n).

b For Layout = CblasColMajor: array, size ldb*n. The leading m-by-n part
of the array b must contain the matrix B.

For Layout = CblasRowMajor: array, size ldb*m. The leading n-by-m part
of the array b must contain the matrix B.

ldb Specifies the leading dimension of b as declared in the calling
(sub)program. When Layout = CblasColMajor, ldb must be at least
max(1, m); otherwise, ldb must be at least max(1, n).

beta Specifies the scalar beta.

When beta is supplied as zero, then c need not be set on input.

c For Layout = CblasColMajor: array, size ldc*n. Before entry, the leading
m-by-n part of the array c must contain the matrix C, except when beta is
zero, in which case c need not be set on entry.
For Layout = CblasRowMajor: array, size ldc*m. Before entry, the leading
n-by-m part of the array c must contain the matrix C, except when beta is
zero, in which case c need not be set on entry.

ldc Specifies the leading dimension of c as declared in the calling
(sub)program. When Layout = CblasColMajor, ldc must be at least
max(1, m); otherwise, ldc must be at least max(1, n).

Output Parameters

c Overwritten by the m-by-n updated matrix.
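The sketch below spells out ?hemm for side = CblasLeft in plain C99 with <complex.h>, under stated assumptions: column-major storage only, no attempt at performance, and hypothetical names (ref_zhemm_left_colmajor, herm_elem, ref_uplo) that are not part of the oneMKL API. Its helper rebuilds each element of A from the stored triangle, taking the conjugate of the mirrored element and ignoring the imaginary part on the diagonal, as the description of a requires.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical stand-in for CBLAS_UPLO (not a oneMKL type). */
typedef enum { REF_UPPER, REF_LOWER } ref_uplo;

/* Element (i, j) of Hermitian A rebuilt from the stored triangle of column-major a.
   The imaginary part of a diagonal element is ignored (assumed to be zero). */
static double complex herm_elem(ref_uplo uplo, const double complex *a, int lda,
                                int i, int j)
{
    if (i == j)
        return creal(a[i + (size_t)i * lda]);
    int stored = (uplo == REF_UPPER) ? (i < j) : (i > j);
    return stored ? a[i + (size_t)j * lda] : conj(a[j + (size_t)i * lda]);
}

/* Hypothetical reference sketch, side = CblasLeft only:
   C := alpha*A*B + beta*C, A m-by-m Hermitian, B and C m-by-n, column-major. */
void ref_zhemm_left_colmajor(ref_uplo uplo, int m, int n, double complex alpha,
                             const double complex *a, int lda,
                             const double complex *b, int ldb,
                             double complex beta, double complex *c, int ldc)
{
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            double complex sum = 0.0;
            for (int p = 0; p < m; ++p)
                sum += herm_elem(uplo, a, lda, i, p) * b[p + (size_t)j * ldb];
            double complex *cij = &c[i + (size_t)j * ldc];
            /* With beta == 0, C is not read. */
            *cij = (beta == 0.0) ? alpha * sum : alpha * sum + beta * *cij;
        }
}
```

Multiplying by an identity B, as a quick check, returns the full Hermitian matrix, so the unreferenced triangle of a can be filled with any value without changing the result.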

cblas_?herk
Performs a Hermitian rank-k update.

Syntax
void cblas_cherk (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const MKL_INT n, const MKL_INT k, const float alpha, const void
*a, const MKL_INT lda, const float beta, void *c, const MKL_INT ldc);
void cblas_zherk (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const MKL_INT n, const MKL_INT k, const double alpha, const void
*a, const MKL_INT lda, const double beta, void *c, const MKL_INT ldc);

Include Files
• mkl.h

Description

The ?herk routines perform a rank-k matrix-matrix operation using a general matrix A and a Hermitian
matrix C. The operation is defined as:

C := alpha*A*A^H + beta*C,
or

C := alpha*A^H*A + beta*C,
where:
alpha and beta are real scalars,
C is an n-by-n Hermitian matrix,
A is an n-by-k matrix in the first case and a k-by-n matrix in the second case.


Input Parameters

Layout Specifies whether two-dimensional array storage is row-major
(CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the upper or lower triangular part of the array c is used.

If uplo = CblasUpper, then the upper triangular part of the array c is used.
If uplo = CblasLower, then the lower triangular part of the array c is used.

trans Specifies the operation:

if trans=CblasNoTrans, then C := alpha*A*A^H + beta*C;

if trans=CblasConjTrans, then C := alpha*A^H*A + beta*C.

n Specifies the order of the matrix C. The value of n must be at least zero.

k With trans=CblasNoTrans, k specifies the number of columns of the
matrix A, and with trans=CblasConjTrans, k specifies the number of
rows of the matrix A.
The value of k must be at least zero.

alpha Specifies the scalar alpha.

a For Layout = CblasColMajor:
• if trans=CblasNoTrans: array, size lda*k. Before entry, the leading n-by-k part of the array a must contain the matrix A.
• if trans=CblasConjTrans: array, size lda*n. Before entry, the leading k-by-n part of the array a must contain the matrix A.
For Layout = CblasRowMajor:
• if trans=CblasNoTrans: array, size lda*n. Before entry, the leading k-by-n part of the array a must contain the matrix A.
• if trans=CblasConjTrans: array, size lda*k. Before entry, the leading n-by-k part of the array a must contain the matrix A.

lda For Layout = CblasColMajor: if trans=CblasNoTrans, lda must be at least max(1, n); otherwise, lda must be at least max(1, k).
For Layout = CblasRowMajor: if trans=CblasNoTrans, lda must be at least max(1, k); otherwise, lda must be at least max(1, n).

beta Specifies the scalar beta.

c Array, size ldc by n.
Before entry with uplo = CblasUpper, the leading n-by-n upper triangular
part of the array c must contain the upper triangular part of the Hermitian
matrix and the strictly lower triangular part of c is not referenced.
Before entry with uplo = CblasLower, the leading n-by-n lower triangular
part of the array c must contain the lower triangular part of the Hermitian
matrix and the strictly upper triangular part of c is not referenced.
The imaginary parts of the diagonal elements need not be set; they are
assumed to be zero.

ldc Specifies the leading dimension of c as declared in the calling
(sub)program. The value of ldc must be at least max(1, n).

Output Parameters

c With uplo = CblasUpper, the upper triangular part of the array c is
overwritten by the upper triangular part of the updated matrix.
With uplo = CblasLower, the lower triangular part of the array c is
overwritten by the lower triangular part of the updated matrix.
The imaginary parts of the diagonal elements are set to zero.
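As a cross-check of the ?herk output rules, this C99 sketch implements only trans = CblasNoTrans with column-major storage. ref_zherk_notrans_colmajor and ref_uplo are hypothetical names, not oneMKL API. It reads and writes only the uplo triangle of c, takes only the real part of input diagonal elements, and stores a real value on the diagonal, matching the statement that the imaginary parts of the diagonal are set to zero.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical stand-in for CBLAS_UPLO (not a oneMKL type). */
typedef enum { REF_UPPER, REF_LOWER } ref_uplo;

/* Hypothetical reference sketch, trans = CblasNoTrans only:
   C := alpha*A*A^H + beta*C, A n-by-k, C n-by-n Hermitian, column-major;
   alpha and beta are real. Only the uplo triangle of c is read or written. */
void ref_zherk_notrans_colmajor(ref_uplo uplo, int n, int k, double alpha,
                                const double complex *a, int lda, double beta,
                                double complex *c, int ldc)
{
    for (int j = 0; j < n; ++j) {
        int ilo = (uplo == REF_UPPER) ? 0 : j;
        int ihi = (uplo == REF_UPPER) ? j : n - 1;
        for (int i = ilo; i <= ihi; ++i) {
            double complex sum = 0.0;
            for (int p = 0; p < k; ++p)   /* (A*A^H)(i,j) = sum_p A(i,p)*conj(A(j,p)) */
                sum += a[i + (size_t)p * lda] * conj(a[j + (size_t)p * lda]);
            double complex *cij = &c[i + (size_t)j * ldc];
            double complex old = (beta == 0.0) ? 0.0 : *cij;   /* not read if beta == 0 */
            if (i == j)   /* diagonal: input imaginary part ignored, output real */
                *cij = alpha * creal(sum) + beta * creal(old);
            else
                *cij = alpha * sum + beta * old;
        }
    }
}
```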

cblas_?her2k
Performs a Hermitian rank-2k update.

Syntax
void cblas_cher2k (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const MKL_INT n, const MKL_INT k, const void *alpha, const void
*a, const MKL_INT lda, const void *b, const MKL_INT ldb, const float beta, void *c,
const MKL_INT ldc);
void cblas_zher2k (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const MKL_INT n, const MKL_INT k, const void *alpha, const void
*a, const MKL_INT lda, const void *b, const MKL_INT ldb, const double beta, void *c,
const MKL_INT ldc);

Include Files
• mkl.h

Description

The ?her2k routines perform a rank-2k matrix-matrix operation using general matrices A and B and a
Hermitian matrix C. The operation is defined as

C := alpha*A*B^H + conjg(alpha)*B*A^H + beta*C,
or
C := alpha*A^H*B + conjg(alpha)*B^H*A + beta*C,

where:
alpha is a scalar and beta is a real scalar.
C is an n-by-n Hermitian matrix.
A and B are n-by-k matrices in the first case and k-by-n matrices in the second case.
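The role of conjg(alpha) in the second term is easiest to see numerically. The following C99 sketch (hypothetical name ref_zher2k_notrans_colmajor, trans = CblasNoTrans and column-major storage only; not oneMKL code) accumulates both terms for each element of the selected triangle of C. With a purely imaginary alpha the diagonal still comes out real, which holds only because the second term uses conjg(alpha) rather than alpha.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical stand-in for CBLAS_UPLO (not a oneMKL type). */
typedef enum { REF_UPPER, REF_LOWER } ref_uplo;

/* Hypothetical reference sketch, trans = CblasNoTrans only, column-major:
   C := alpha*A*B^H + conj(alpha)*B*A^H + beta*C, A and B n-by-k, beta real. */
void ref_zher2k_notrans_colmajor(ref_uplo uplo, int n, int k, double complex alpha,
                                 const double complex *a, int lda,
                                 const double complex *b, int ldb, double beta,
                                 double complex *c, int ldc)
{
    for (int j = 0; j < n; ++j) {
        int ilo = (uplo == REF_UPPER) ? 0 : j;
        int ihi = (uplo == REF_UPPER) ? j : n - 1;
        for (int i = ilo; i <= ihi; ++i) {
            double complex sum = 0.0;
            for (int p = 0; p < k; ++p) {
                double complex aip = a[i + (size_t)p * lda], ajp = a[j + (size_t)p * lda];
                double complex bip = b[i + (size_t)p * ldb], bjp = b[j + (size_t)p * ldb];
                /* conj(alpha) on the second term keeps the update Hermitian */
                sum += alpha * aip * conj(bjp) + conj(alpha) * bip * conj(ajp);
            }
            double complex old = (beta == 0.0) ? 0.0 : c[i + (size_t)j * ldc];
            double complex out = sum + beta * old;
            /* Diagonal elements of a Hermitian matrix are real. */
            c[i + (size_t)j * ldc] = (i == j) ? creal(out) : out;
        }
    }
}
```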


Input Parameters

Layout Specifies whether two-dimensional array storage is row-major
(CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the upper or lower triangular part of the array c is used.

If uplo = CblasUpper, then the upper triangular part of the array c is used.

If uplo = CblasLower, then the lower triangular part of the array c is used.

trans Specifies the operation:

if trans=CblasNoTrans, then C := alpha*A*B^H + conjg(alpha)*B*A^H + beta*C;

if trans=CblasConjTrans, then C := alpha*A^H*B + conjg(alpha)*B^H*A + beta*C.

n Specifies the order of the matrix C. The value of n must be at least zero.

k With trans=CblasNoTrans, k specifies the number of columns of the matrices
A and B, and with trans=CblasConjTrans, k specifies the number of rows of the
matrices A and B.
The value of k must be at least zero.

alpha Specifies the scalar alpha.

a For Layout = CblasColMajor:
• if trans=CblasNoTrans: array, size lda*k. Before entry, the leading n-by-k part of the array a must contain the matrix A.
• if trans=CblasConjTrans: array, size lda*n. Before entry, the leading k-by-n part of the array a must contain the matrix A.
For Layout = CblasRowMajor:
• if trans=CblasNoTrans: array, size lda*n. Before entry, the leading k-by-n part of the array a must contain the matrix A.
• if trans=CblasConjTrans: array, size lda*k. Before entry, the leading n-by-k part of the array a must contain the matrix A.

lda Specifies the leading dimension of a as declared in the calling
(sub)program.
For Layout = CblasColMajor: if trans=CblasNoTrans, lda must be at least max(1, n); otherwise, lda must be at least max(1, k).
For Layout = CblasRowMajor: if trans=CblasNoTrans, lda must be at least max(1, k); otherwise, lda must be at least max(1, n).

beta Specifies the scalar beta.

b For Layout = CblasColMajor:
• if trans=CblasNoTrans: array, size ldb*k. Before entry, the leading n-by-k part of the array b must contain the matrix B.
• if trans=CblasConjTrans: array, size ldb*n. Before entry, the leading k-by-n part of the array b must contain the matrix B.
For Layout = CblasRowMajor:
• if trans=CblasNoTrans: array, size ldb*n. Before entry, the leading k-by-n part of the array b must contain the matrix B.
• if trans=CblasConjTrans: array, size ldb*k. Before entry, the leading n-by-k part of the array b must contain the matrix B.

ldb Specifies the leading dimension of b as declared in the calling
(sub)program.
For Layout = CblasColMajor: if trans=CblasNoTrans, ldb must be at least max(1, n); otherwise, ldb must be at least max(1, k).
For Layout = CblasRowMajor: if trans=CblasNoTrans, ldb must be at least max(1, k); otherwise, ldb must be at least max(1, n).

c Array, size ldc by n.

Before entry with uplo = CblasUpper, the leading n-by-n upper triangular
part of the array c must contain the upper triangular part of the Hermitian
matrix and the strictly lower triangular part of c is not referenced.
Before entry with uplo = CblasLower, the leading n-by-n lower triangular
part of the array c must contain the lower triangular part of the Hermitian
matrix and the strictly upper triangular part of c is not referenced.
The imaginary parts of the diagonal elements need not be set; they are
assumed to be zero.

ldc Specifies the leading dimension of c as declared in the calling
(sub)program. The value of ldc must be at least max(1, n).

Output Parameters

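c With uplo = CblasUpper, the upper triangular part of the array c is
overwritten by the upper triangular part of the updated matrix.
With uplo = CblasLower, the lower triangular part of the array c is
overwritten by the lower triangular part of the updated matrix.
The imaginary parts of the diagonal elements are set to zero.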


cblas_?symm
Computes a matrix-matrix product where one input
matrix is symmetric.

Syntax
void cblas_ssymm (const CBLAS_LAYOUT Layout, const CBLAS_SIDE side, const CBLAS_UPLO
uplo, const MKL_INT m, const MKL_INT n, const float alpha, const float *a, const
MKL_INT lda, const float *b, const MKL_INT ldb, const float beta, float *c, const
MKL_INT ldc);
void cblas_dsymm (const CBLAS_LAYOUT Layout, const CBLAS_SIDE side, const CBLAS_UPLO
uplo, const MKL_INT m, const MKL_INT n, const double alpha, const double *a, const
MKL_INT lda, const double *b, const MKL_INT ldb, const double beta, double *c, const
MKL_INT ldc);
void cblas_csymm (const CBLAS_LAYOUT Layout, const CBLAS_SIDE side, const CBLAS_UPLO
uplo, const MKL_INT m, const MKL_INT n, const void *alpha, const void *a, const MKL_INT
lda, const void *b, const MKL_INT ldb, const void *beta, void *c, const MKL_INT ldc);
void cblas_zsymm (const CBLAS_LAYOUT Layout, const CBLAS_SIDE side, const CBLAS_UPLO
uplo, const MKL_INT m, const MKL_INT n, const void *alpha, const void *a, const MKL_INT
lda, const void *b, const MKL_INT ldb, const void *beta, void *c, const MKL_INT ldc);

Include Files
• mkl.h

Description

The ?symm routines compute a scalar-matrix-matrix product with one symmetric matrix and add the result to
a scalar-matrix product. The operation is defined as

C := alpha*A*B + beta*C,
or

C := alpha*B*A + beta*C,
where:
alpha and beta are scalars,
A is a symmetric matrix,
B and C are m-by-n matrices.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major
(CblasRowMajor) or column-major (CblasColMajor).

side Specifies whether the symmetric matrix A appears on the left or right in the
operation:
if side = CblasLeft, then C := alpha*A*B + beta*C;

if side = CblasRight, then C := alpha*B*A + beta*C.

uplo Specifies whether the upper or lower triangular part of the symmetric
matrix A is used:

if uplo = CblasUpper, then the upper triangular part is used;

if uplo = CblasLower, then the lower triangular part is used.

m Specifies the number of rows of the matrix C.

The value of m must be at least zero.

n Specifies the number of columns of the matrix C.

The value of n must be at least zero.

alpha Specifies the scalar alpha.

a Array, size lda*ka, where ka is m when side = CblasLeft and is n
otherwise.
Before entry with side = CblasLeft, the m-by-m part of the array a must
contain the symmetric matrix, such that when uplo = CblasUpper, the
leading m-by-m upper triangular part of the array a must contain the upper
triangular part of the symmetric matrix and the strictly lower triangular part
of a is not referenced, and when uplo = CblasLower, the leading m-by-m
lower triangular part of the array a must contain the lower triangular part of
the symmetric matrix and the strictly upper triangular part of a is not
referenced.
Before entry with side = CblasRight, the n-by-n part of the array a must
contain the symmetric matrix, such that when uplo = CblasUpper, the
leading n-by-n upper triangular part of the array a must contain the upper
triangular part of the symmetric matrix and the strictly lower triangular part
of a is not referenced, and when uplo = CblasLower, the leading n-by-n
lower triangular part of the array a must contain the lower triangular part of
the symmetric matrix and the strictly upper triangular part of a is not
referenced.

lda Specifies the leading dimension of a as declared in the calling
(sub)program. When side = CblasLeft, lda must be at least max(1, m);
otherwise, lda must be at least max(1, n).

b For Layout = CblasColMajor: array, size ldb*n. The leading m-by-n part
of the array b must contain the matrix B.
For Layout = CblasRowMajor: array, size ldb*m. The leading n-by-m part
of the array b must contain the matrix B.

ldb Specifies the leading dimension of b as declared in the calling
(sub)program. When Layout = CblasColMajor, ldb must be at least
max(1, m); otherwise, ldb must be at least max(1, n).

beta Specifies the scalar beta.

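When beta is set to zero, then c need not be set on input.

c For Layout = CblasColMajor: array, size ldc*n. Before entry, the leading
m-by-n part of the array c must contain the matrix C, except when beta is
zero, in which case c need not be set on entry.
For Layout = CblasRowMajor: array, size ldc*m. Before entry, the leading
n-by-m part of the array c must contain the matrix C, except when beta is
zero, in which case c need not be set on entry.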

ldc Specifies the leading dimension of c as declared in the calling
(sub)program. When Layout = CblasColMajor, ldc must be at least
max(1, m); otherwise, ldc must be at least max(1, n).

Output Parameters

c Overwritten by the m-by-n updated matrix.
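For the side = CblasRight case, this plain C sketch forms C := alpha*B*A + beta*C while reading only the uplo triangle of a; the helper mirrors the other triangle. It assumes column-major storage and real data, and ref_dsymm_right_colmajor, sym_elem, and ref_uplo are hypothetical names rather than oneMKL API. Like the routine, it does not read c when beta is zero.

```c
#include <stddef.h>

/* Hypothetical stand-in for CBLAS_UPLO (not a oneMKL type). */
typedef enum { REF_UPPER, REF_LOWER } ref_uplo;

/* Element (i, j) of symmetric A, reading only the stored triangle (column-major). */
static double sym_elem(ref_uplo uplo, const double *a, int lda, int i, int j)
{
    int stored = (uplo == REF_UPPER) ? (i <= j) : (i >= j);
    return stored ? a[i + (size_t)j * lda] : a[j + (size_t)i * lda];
}

/* Hypothetical reference sketch, side = CblasRight only:
   C := alpha*B*A + beta*C, A n-by-n symmetric, B and C m-by-n, column-major. */
void ref_dsymm_right_colmajor(ref_uplo uplo, int m, int n, double alpha,
                              const double *a, int lda, const double *b, int ldb,
                              double beta, double *c, int ldc)
{
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            double sum = 0.0;
            for (int p = 0; p < n; ++p)
                sum += b[i + (size_t)p * ldb] * sym_elem(uplo, a, lda, p, j);
            double *cij = &c[i + (size_t)j * ldc];
            *cij = (beta == 0.0) ? alpha * sum : alpha * sum + beta * *cij;
        }
}
```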

cblas_?syrk
Performs a symmetric rank-k update.

Syntax
void cblas_ssyrk (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const MKL_INT n, const MKL_INT k, const float alpha, const float
*a, const MKL_INT lda, const float beta, float *c, const MKL_INT ldc);
void cblas_dsyrk (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const MKL_INT n, const MKL_INT k, const double alpha, const
double *a, const MKL_INT lda, const double beta, double *c, const MKL_INT ldc);
void cblas_csyrk (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const MKL_INT n, const MKL_INT k, const void *alpha, const void
*a, const MKL_INT lda, const void *beta, void *c, const MKL_INT ldc);
void cblas_zsyrk (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const MKL_INT n, const MKL_INT k, const void *alpha, const void
*a, const MKL_INT lda, const void *beta, void *c, const MKL_INT ldc);

Include Files
• mkl.h

Description

The ?syrk routines perform a rank-k matrix-matrix operation for a symmetric matrix C using a general
matrix A. The operation is defined as:

C := alpha*A*A' + beta*C,
or

C := alpha*A'*A + beta*C,
where:
alpha and beta are scalars,
C is an n-by-n symmetric matrix,
A is an n-by-k matrix in the first case and a k-by-n matrix in the second case.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major
(CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the upper or lower triangular part of the array c is used.

If uplo = CblasUpper, then the upper triangular part of the array c is
used.
If uplo = CblasLower, then the lower triangular part of the array c is used.

trans Specifies the operation:

if trans=CblasNoTrans, then C := alpha*A*A' + beta*C;

if trans=CblasTrans, then C := alpha*A'*A + beta*C;

if trans=CblasConjTrans, then C := alpha*A'*A + beta*C.

n Specifies the order of the matrix C. The value of n must be at least zero.

k On entry with trans=CblasNoTrans, k specifies the number of columns of
the matrix A, and on entry with trans=CblasTrans or
trans=CblasConjTrans, k specifies the number of rows of the matrix A.
The value of k must be at least zero.

alpha Specifies the scalar alpha.

a For Layout = CblasColMajor:
• if trans=CblasNoTrans: array, size lda*k. Before entry, the leading n-by-k part of the array a must contain the matrix A.
• if trans=CblasTrans or trans=CblasConjTrans: array, size lda*n. Before entry, the leading k-by-n part of the array a must contain the matrix A.
For Layout = CblasRowMajor:
• if trans=CblasNoTrans: array, size lda*n. Before entry, the leading k-by-n part of the array a must contain the matrix A.
• if trans=CblasTrans or trans=CblasConjTrans: array, size lda*k. Before entry, the leading n-by-k part of the array a must contain the matrix A.

lda For Layout = CblasColMajor: if trans=CblasNoTrans, lda must be at least max(1, n); otherwise, lda must be at least max(1, k).
For Layout = CblasRowMajor: if trans=CblasNoTrans, lda must be at least max(1, k); otherwise, lda must be at least max(1, n).

beta Specifies the scalar beta.

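c Array, size ldc by n.
Before entry with uplo = CblasUpper, the leading n-by-n upper triangular
part of the array c must contain the upper triangular part of the symmetric
matrix and the strictly lower triangular part of c is not referenced.
Before entry with uplo = CblasLower, the leading n-by-n lower triangular
part of the array c must contain the lower triangular part of the symmetric
matrix and the strictly upper triangular part of c is not referenced.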

ldc Specifies the leading dimension of c as declared in the calling

(sub)program. The value of ldc must be at least max(1, n).

Output Parameters

c With uplo = CblasUpper, the upper triangular part of the array c is
overwritten by the upper triangular part of the updated matrix.
With uplo = CblasLower, the lower triangular part of the array c is
overwritten by the lower triangular part of the updated matrix.
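The lda requirements listed above collapse to a single rule: lda must cover n exactly when column-major storage is combined with trans=CblasNoTrans, or row-major storage with a transposed operand, and k otherwise. The sketch below encodes that rule. The enum types and the function name syrk_min_lda are hypothetical stand-ins chosen for illustration; they are not the oneMKL CBLAS_LAYOUT and CBLAS_TRANSPOSE types.

```c
/* Hypothetical stand-ins for the CBLAS_LAYOUT and CBLAS_TRANSPOSE values
   named in the text; these are not the oneMKL enum types. */
typedef enum { LAYOUT_ROW_MAJOR, LAYOUT_COL_MAJOR } layout_t;
typedef enum { OP_NO_TRANS, OP_TRANS, OP_CONJ_TRANS } trans_t;

static long max_1(long v) { return v > 1 ? v : 1; }

/* Minimum lda for ?syrk, following the lda rules:
   ColMajor + NoTrans      -> max(1, n)
   ColMajor + (Conj)Trans  -> max(1, k)
   RowMajor + NoTrans      -> max(1, k)
   RowMajor + (Conj)Trans  -> max(1, n) */
long syrk_min_lda(layout_t layout, trans_t trans, long n, long k)
{
    int col_major = (layout == LAYOUT_COL_MAJOR);
    int no_trans = (trans == OP_NO_TRANS);
    /* n is required exactly when both flags agree. */
    return (col_major == no_trans) ? max_1(n) : max_1(k);
}
```

The same rule gives the minimum lda and ldb for ?syr2k, whose tables have the same shape.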

cblas_?syr2k
Performs a symmetric rank-2k update.

Syntax
void cblas_ssyr2k (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const MKL_INT n, const MKL_INT k, const float alpha, const float
*a, const MKL_INT lda, const float *b, const MKL_INT ldb, const float beta, float *c,
const MKL_INT ldc);
void cblas_dsyr2k (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const MKL_INT n, const MKL_INT k, const double alpha, const
double *a, const MKL_INT lda, const double *b, const MKL_INT ldb, const double beta,
double *c, const MKL_INT ldc);
void cblas_csyr2k (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const MKL_INT n, const MKL_INT k, const void *alpha, const void
*a, const MKL_INT lda, const void *b, const MKL_INT ldb, const void *beta, void *c,
const MKL_INT ldc);
void cblas_zsyr2k (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const MKL_INT n, const MKL_INT k, const void *alpha, const void
*a, const MKL_INT lda, const void *b, const MKL_INT ldb, const void *beta, void *c,
const MKL_INT ldc);

Include Files
• mkl.h

Description

The ?syr2k routines perform a rank-2k matrix-matrix operation for a symmetric matrix C using general
matrices A and B. The operation is defined as

C := alpha*A*B' + alpha*B*A' + beta*C,
or

C := alpha*A'*B + alpha*B'*A + beta*C,

where:
alpha and beta are scalars,

C is an n-by-n symmetric matrix,
A and B are n-by-k matrices in the first case, and k-by-n matrices in the second case.
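To make the first formula concrete, here is a naive reference loop for one case only: real double data, Layout = CblasColMajor, uplo = CblasUpper, trans = CblasNoTrans. It is a sketch for checking results, not how oneMKL computes ?syr2k. The name ref_dsyr2k_upper_notrans is hypothetical, int stands in for MKL_INT, and not reading C when beta is zero follows the Netlib reference BLAS convention (an assumption here, not a documented oneMKL guarantee).

```c
/* Naive reference for one ?syr2k case: real double data,
   column-major storage, uplo = upper, trans = no transpose:
       C := alpha*A*B' + alpha*B*A' + beta*C
   A and B are n-by-k; only the upper triangle of C (i <= j) is touched. */
void ref_dsyr2k_upper_notrans(int n, int k, double alpha,
                              const double *a, int lda,
                              const double *b, int ldb,
                              double beta, double *c, int ldc)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i <= j; ++i) {
            double s = 0.0;
            for (int l = 0; l < k; ++l)
                s += a[i + l * lda] * b[j + l * ldb]    /* (A*B')(i,j) */
                   + b[i + l * ldb] * a[j + l * lda];   /* (B*A')(i,j) */
            /* Assumption (Netlib convention): C is not read if beta == 0. */
            double old = (beta == 0.0) ? 0.0 : beta * c[i + j * ldc];
            c[i + j * ldc] = alpha * s + old;
        }
    }
}
```

Because C is symmetric, only its upper triangle is computed; the strictly lower part of c keeps whatever it held on entry.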

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major
(CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the upper or lower triangular part of the array c is used.
If uplo = CblasUpper, then the upper triangular part of the array c is used.
If uplo = CblasLower, then the lower triangular part of the array c is used.

trans Specifies the operation:

if trans=CblasNoTrans, then C := alpha*A*B' + alpha*B*A' + beta*C;

if trans=CblasTrans, then C := alpha*A'*B + alpha*B'*A + beta*C;

if trans=CblasConjTrans, then C := alpha*A'*B + alpha*B'*A + beta*C.

n Specifies the order of the matrix C. The value of n must be at least zero.

k On entry with trans=CblasNoTrans, k specifies the number of columns of
the matrices A and B; on entry with trans=CblasTrans or
trans=CblasConjTrans, k specifies the number of rows of the matrices A
and B. The value of k must be at least zero.

alpha Specifies the scalar alpha.

a Array whose size and contents depend on Layout and trans:

• Layout = CblasColMajor, trans=CblasNoTrans: array, size lda*k. Before entry,
  the leading n-by-k part of the array a must contain the matrix A.
• Layout = CblasColMajor, trans=CblasTrans or trans=CblasConjTrans: array,
  size lda*n. Before entry, the leading k-by-n part of the array a must contain
  the matrix A.
• Layout = CblasRowMajor, trans=CblasNoTrans: array, size lda*n. Before entry,
  the leading k-by-n part of the array a must contain the matrix A.
• Layout = CblasRowMajor, trans=CblasTrans or trans=CblasConjTrans: array,
  size lda*k. Before entry, the leading n-by-k part of the array a must contain
  the matrix A.

lda Specifies the leading dimension of a as declared in the calling
(sub)program:

• Layout = CblasColMajor: lda must be at least max(1, n) when trans=CblasNoTrans,
  and at least max(1, k) when trans=CblasTrans or trans=CblasConjTrans.
• Layout = CblasRowMajor: lda must be at least max(1, k) when trans=CblasNoTrans,
  and at least max(1, n) when trans=CblasTrans or trans=CblasConjTrans.

b Array whose size and contents depend on Layout and trans:

• Layout = CblasColMajor, trans=CblasNoTrans: array, size ldb*k. Before entry,
  the leading n-by-k part of the array b must contain the matrix B.
• Layout = CblasColMajor, trans=CblasTrans or trans=CblasConjTrans: array,
  size ldb*n. Before entry, the leading k-by-n part of the array b must contain
  the matrix B.
• Layout = CblasRowMajor, trans=CblasNoTrans: array, size ldb*n. Before entry,
  the leading k-by-n part of the array b must contain the matrix B.
• Layout = CblasRowMajor, trans=CblasTrans or trans=CblasConjTrans: array,
  size ldb*k. Before entry, the leading n-by-k part of the array b must contain
  the matrix B.

ldb Specifies the leading dimension of b as declared in the calling
(sub)program:

• Layout = CblasColMajor: ldb must be at least max(1, n) when trans=CblasNoTrans,
  and at least max(1, k) when trans=CblasTrans or trans=CblasConjTrans.
• Layout = CblasRowMajor: ldb must be at least max(1, k) when trans=CblasNoTrans,
  and at least max(1, n) when trans=CblasTrans or trans=CblasConjTrans.

beta Specifies the scalar beta.

c Array, size ldc* n. Before entry with uplo = CblasUpper, the leading n-
by-n upper triangular part of the array c must contain the upper triangular
part of the symmetric matrix and the strictly lower triangular part of c is not
referenced.
Before entry with uplo = CblasLower, the leading n-by-n lower triangular
part of the array c must contain the lower triangular part of the symmetric
matrix and the strictly upper triangular part of c is not referenced.

ldc Specifies the leading dimension of c as declared in the calling
(sub)program. The value of ldc must be at least max(1, n).

Output Parameters

c With uplo = CblasUpper, the upper triangular part of the array c is
overwritten by the upper triangular part of the updated matrix.
With uplo = CblasLower, the lower triangular part of the array c is
overwritten by the lower triangular part of the updated matrix.

cblas_?trmm
Computes a matrix-matrix product where one input
matrix is triangular.

Syntax
void cblas_strmm (const CBLAS_LAYOUT Layout, const CBLAS_SIDE side, const CBLAS_UPLO
uplo, const CBLAS_TRANSPOSE transa, const CBLAS_DIAG diag, const MKL_INT m, const
MKL_INT n, const float alpha, const float *a, const MKL_INT lda, float *b, const
MKL_INT ldb);
void cblas_dtrmm (const CBLAS_LAYOUT Layout, const CBLAS_SIDE side, const CBLAS_UPLO
uplo, const CBLAS_TRANSPOSE transa, const CBLAS_DIAG diag, const MKL_INT m, const
MKL_INT n, const double alpha, const double *a, const MKL_INT lda, double *b, const
MKL_INT ldb);
void cblas_ctrmm (const CBLAS_LAYOUT Layout, const CBLAS_SIDE side, const CBLAS_UPLO
uplo, const CBLAS_TRANSPOSE transa, const CBLAS_DIAG diag, const MKL_INT m, const
MKL_INT n, const void *alpha, const void *a, const MKL_INT lda, void *b, const MKL_INT
ldb);
void cblas_ztrmm (const CBLAS_LAYOUT Layout, const CBLAS_SIDE side, const CBLAS_UPLO
uplo, const CBLAS_TRANSPOSE transa, const CBLAS_DIAG diag, const MKL_INT m, const
MKL_INT n, const void *alpha, const void *a, const MKL_INT lda, void *b, const MKL_INT
ldb);

Include Files
• mkl.h

Description

The ?trmm routines compute a scalar-matrix-matrix product with one triangular matrix. The operation is
defined as

B := alpha*op(A)*B
or

B := alpha*B*op(A)
where:
alpha is a scalar,
B is an m-by-n matrix,
A is a unit, or non-unit, upper or lower triangular matrix, and
op(A) is one of op(A) = A, or op(A) = A', or op(A) = conjg(A').
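The in-place update B := alpha*op(A)*B needs no scratch matrix if the rows of each column are visited in the right order. A minimal sketch for one case (real double data, Layout = CblasColMajor, side = CblasLeft, uplo = CblasUpper, transa = CblasNoTrans); ref_dtrmm_lun is a hypothetical name, int stands in for MKL_INT, and this is not oneMKL's implementation.

```c
#include <stdbool.h>

/* Naive in-place reference for one ?trmm case: side = left, uplo = upper,
   transa = no transpose, column-major, real double data:
       B := alpha * A * B
   A is m-by-m upper triangular; only its upper triangle is read, and when
   unit_diag is true the diagonal is not read but taken as 1. */
void ref_dtrmm_lun(bool unit_diag, int m, int n, double alpha,
                   const double *a, int lda, double *b, int ldb)
{
    for (int j = 0; j < n; ++j) {
        double *col = b + (long)j * ldb;   /* column j of B */
        if (alpha == 0.0) {                /* a is not referenced */
            for (int i = 0; i < m; ++i)
                col[i] = 0.0;
            continue;
        }
        /* Row i of A*col uses only col[i..m-1]; walking i upward means
           those entries have not been overwritten yet. */
        for (int i = 0; i < m; ++i) {
            double s = unit_diag ? col[i] : a[i + i * lda] * col[i];
            for (int p = i + 1; p < m; ++p)
                s += a[i + p * lda] * col[p];
            col[i] = alpha * s;
        }
    }
}
```

For a lower triangular A the same idea works with i walking downward instead.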

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major
(CblasRowMajor) or column-major (CblasColMajor).

side Specifies whether op(A) appears on the left or right of B in the operation:

if side = CblasLeft, then B := alpha*op(A)*B;

if side = CblasRight, then B := alpha*B*op(A).


uplo Specifies whether the matrix A is upper or lower triangular:

if uplo = CblasUpper, then the matrix is upper triangular;
if uplo = CblasLower, then the matrix is lower triangular.

transa Specifies the form of op(A) used in the matrix multiplication:

if transa=CblasNoTrans, then op(A) = A;

if transa=CblasTrans, then op(A) = A';

if transa=CblasConjTrans, then op(A) = conjg(A').

diag Specifies whether the matrix A is unit triangular:

if diag = CblasUnit, then the matrix is unit triangular;

if diag = CblasNonUnit, then the matrix is not unit triangular.

m Specifies the number of rows of B. The value of m must be at least zero.

n Specifies the number of columns of B. The value of n must be at least zero.

alpha Specifies the scalar alpha.

When alpha is zero, then a is not referenced and b need not be set before
entry.

a Array, size lda*k, where k is m when side = CblasLeft and is n when
side = CblasRight. Before entry with uplo = CblasUpper, the leading k-by-k
upper triangular part of the array a must contain the upper triangular
matrix and the strictly lower triangular part of a is not referenced.
Before entry with uplo = CblasLower, the leading k-by-k lower triangular
part of the array a must contain the lower triangular matrix and the strictly
upper triangular part of a is not referenced.
When diag = CblasUnit, the diagonal elements of a are not referenced
either, but are assumed to be unity.

lda Specifies the leading dimension of a as declared in the calling
(sub)program. When side = CblasLeft, lda must be at least max(1, m);
when side = CblasRight, lda must be at least max(1, n).

b For Layout = CblasColMajor: array, size ldb*n. Before entry, the leading
m-by-n part of the array b must contain the matrix B.
For Layout = CblasRowMajor: array, size ldb*m. Before entry, the leading
n-by-m part of the array b must contain the matrix B.

ldb Specifies the leading dimension of b as declared in the calling
(sub)program. When Layout = CblasColMajor, ldb must be at least
max(1, m); otherwise, ldb must be at least max(1, n).

Output Parameters

b Overwritten by the transformed matrix.

cblas_?trsm
Solves a triangular matrix equation.

Syntax
void cblas_strsm (const CBLAS_LAYOUT Layout, const CBLAS_SIDE side, const CBLAS_UPLO
uplo, const CBLAS_TRANSPOSE transa, const CBLAS_DIAG diag, const MKL_INT m, const
MKL_INT n, const float alpha, const float *a, const MKL_INT lda, float *b, const
MKL_INT ldb);
void cblas_dtrsm (const CBLAS_LAYOUT Layout, const CBLAS_SIDE side, const CBLAS_UPLO
uplo, const CBLAS_TRANSPOSE transa, const CBLAS_DIAG diag, const MKL_INT m, const
MKL_INT n, const double alpha, const double *a, const MKL_INT lda, double *b, const
MKL_INT ldb);
void cblas_ctrsm (const CBLAS_LAYOUT Layout, const CBLAS_SIDE side, const CBLAS_UPLO
uplo, const CBLAS_TRANSPOSE transa, const CBLAS_DIAG diag, const MKL_INT m, const
MKL_INT n, const void *alpha, const void *a, const MKL_INT lda, void *b, const MKL_INT
ldb);
void cblas_ztrsm (const CBLAS_LAYOUT Layout, const CBLAS_SIDE side, const CBLAS_UPLO
uplo, const CBLAS_TRANSPOSE transa, const CBLAS_DIAG diag, const MKL_INT m, const
MKL_INT n, const void *alpha, const void *a, const MKL_INT lda, void *b, const MKL_INT
ldb);

Include Files
• mkl.h

Description

The ?trsm routines solve one of the following matrix equations:

op(A)*X = alpha*B,
or

X*op(A) = alpha*B,
where:
alpha is a scalar,
X and B are m-by-n matrices,
A is a unit, or non-unit, upper or lower triangular matrix, and
op(A) is one of op(A) = A, or op(A) = A', or op(A) = conjg(A').
The matrix B is overwritten by the solution matrix X.
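For side = CblasLeft, uplo = CblasLower and transa = CblasNoTrans, solving op(A)*X = alpha*B column by column is plain forward substitution, and X can overwrite B in place. A minimal sketch for real double data in column-major order; ref_dtrsm_lln is a hypothetical name, int stands in for MKL_INT, and, like the Netlib reference BLAS, it performs no check for a singular A.

```c
#include <stdbool.h>

/* Naive reference for one ?trsm case: side = left, uplo = lower,
   transa = no transpose, column-major, real double data.
   Solves A * X = alpha * B by forward substitution, overwriting B with X.
   A is m-by-m lower triangular; with unit_diag its diagonal is taken as 1.
   A zero diagonal entry is not detected. */
void ref_dtrsm_lln(bool unit_diag, int m, int n, double alpha,
                   const double *a, int lda, double *b, int ldb)
{
    for (int j = 0; j < n; ++j) {
        double *col = b + (long)j * ldb;   /* column j of B, becomes X */
        for (int i = 0; i < m; ++i) {
            double s = alpha * col[i];
            for (int p = 0; p < i; ++p)    /* subtract already-solved terms */
                s -= a[i + p * lda] * col[p];
            col[i] = unit_diag ? s : s / a[i + i * lda];
        }
    }
}
```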

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major
(CblasRowMajor) or column-major (CblasColMajor).

side Specifies whether op(A) appears on the left or right of X in the equation:

if side = CblasLeft, then op(A)*X = alpha*B;

if side = CblasRight, then X*op(A) = alpha*B.

uplo Specifies whether the matrix A is upper or lower triangular:

if uplo = CblasUpper, then the matrix is upper triangular;
if uplo = CblasLower, then the matrix is lower triangular.

transa Specifies the form of op(A) used in the matrix equation:

if transa=CblasNoTrans, then op(A) = A;

if transa=CblasTrans, then op(A) = A';

if transa=CblasConjTrans, then op(A) = conjg(A').

diag Specifies whether the matrix A is unit triangular:

if diag = CblasUnit, then the matrix is unit triangular;

if diag = CblasNonUnit, then the matrix is not unit triangular.

m Specifies the number of rows of B. The value of m must be at least zero.

n Specifies the number of columns of B. The value of n must be at least zero.

alpha Specifies the scalar alpha.

When alpha is zero, then a is not referenced and b need not be set before
entry.

a Array, size lda*k, where k is m when side = CblasLeft and is n when
side = CblasRight. Before entry with uplo = CblasUpper, the leading k-by-k
upper triangular part of the array a must contain the upper triangular
matrix and the strictly lower triangular part of a is not referenced.
Before entry with uplo = CblasLower, the leading k-by-k lower triangular part
of the array a must contain the lower triangular matrix and the strictly upper
triangular part of a is not referenced.
When diag = CblasUnit, the diagonal elements of a are not referenced
either, but are assumed to be unity.

lda Specifies the leading dimension of a as declared in the calling
(sub)program. When side = CblasLeft, lda must be at least max(1, m);
when side = CblasRight, lda must be at least max(1, n).

b For Layout = CblasColMajor: array, size ldb*n. Before entry, the leading
m-by-n part of the array b must contain the right-hand side matrix B.
For Layout = CblasRowMajor: array, size ldb*m. Before entry, the leading
n-by-m part of the array b must contain the right-hand side matrix B.

ldb Specifies the leading dimension of b as declared in the calling
(sub)program. When Layout = CblasColMajor, ldb must be at least
max(1, m); otherwise, ldb must be at least max(1, n).

Output Parameters

b Overwritten by the solution matrix X.

cblas_?trmm_oop
Computes a matrix-matrix product where one input
matrix is triangular and the other matrix is general,
putting output into a different matrix.

Syntax
void cblas_strmm_oop (
const CBLAS_LAYOUT layout, const CBLAS_SIDE side, const CBLAS_UPLO uplo,
const CBLAS_TRANSPOSE transa, const CBLAS_DIAG diag, const MKL_INT m,
const MKL_INT n, const float alpha, const float *a, const MKL_INT lda,
const float *b, const MKL_INT ldb, const float beta, float *c,
const MKL_INT ldc);

void cblas_dtrmm_oop (
const CBLAS_LAYOUT layout, const CBLAS_SIDE side, const CBLAS_UPLO uplo,
const CBLAS_TRANSPOSE transa, const CBLAS_DIAG diag, const MKL_INT m,
const MKL_INT n, const double alpha, const double *a, const MKL_INT lda,
const double *b, const MKL_INT ldb, const double beta, double *c,
const MKL_INT ldc);

void cblas_ctrmm_oop (
const CBLAS_LAYOUT layout, const CBLAS_SIDE side,
const CBLAS_UPLO uplo, const CBLAS_TRANSPOSE transa,
const CBLAS_DIAG diag, const MKL_INT m, const MKL_INT n,
const void* alpha, const void *a, const MKL_INT lda, const void *b,
const MKL_INT ldb, const void* beta, void *c, const MKL_INT ldc);

void cblas_ztrmm_oop (
const CBLAS_LAYOUT layout, const CBLAS_SIDE side,
const CBLAS_UPLO uplo, const CBLAS_TRANSPOSE transa,
const CBLAS_DIAG diag, const MKL_INT m, const MKL_INT n,
const void* alpha, const void *a, const MKL_INT lda, const void *b,
const MKL_INT ldb, const void* beta, void *c, const MKL_INT ldc);

Include Files
• mkl.h

Description
The cblas_?trmm_oop routines compute a scalar-matrix-matrix product where one of the matrices in the
multiplication is triangular, and then add the result to a scalar-matrix product. The operation is defined as

C := alpha*op(A)*B + beta*C
or

C := alpha*B*op(A) + beta*C
where:

• alpha and beta are scalars

• A is a unit, or non-unit, upper or lower triangular matrix
• B is an m-by-n matrix
• C is an m-by-n matrix
• op(A) is one of op(A) = A, op(A) = A', or op(A) = conjg(A')
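Unlike ?trmm, the out-of-place form leaves B untouched and accumulates the product into C. A naive sketch of one case (real double data, layout = CblasColMajor, side = CblasRight, uplo = CblasUpper, transa = CblasNoTrans). The name ref_dtrmm_oop_run is hypothetical, int stands in for MKL_INT, and not reading C when beta is zero is an assumption borrowed from reference BLAS conventions rather than a documented oneMKL guarantee.

```c
#include <stdbool.h>

/* Naive out-of-place reference for one ?trmm_oop case: side = right,
   uplo = upper, transa = no transpose, column-major, real double data:
       C := alpha * B * A + beta * C
   B is m-by-n and only read; A is n-by-n upper triangular. */
void ref_dtrmm_oop_run(bool unit_diag, int m, int n, double alpha,
                       const double *a, int lda, const double *b, int ldb,
                       double beta, double *c, int ldc)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            /* (B*A)(i,j) = sum over p <= j of B(i,p) * A(p,j) */
            double s = b[i + j * ldb];
            if (!unit_diag)
                s *= a[j + j * lda];
            for (int p = 0; p < j; ++p)
                s += b[i + p * ldb] * a[p + j * lda];
            /* Assumption: C is not read when beta is zero. */
            double old = (beta == 0.0) ? 0.0 : beta * c[i + j * ldc];
            c[i + j * ldc] = alpha * s + old;
        }
    }
}
```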

Input Parameters

layout Specifies whether two-dimensional array storage is row-major
(CblasRowMajor) or column-major (CblasColMajor).

side Specifies whether op(A) appears on the left or right of B in the operation.

If side = CblasLeft, then C := alpha*op(A)*B + beta*C.

If side = CblasRight, then C := alpha*B*op(A) + beta*C.

uplo Specifies whether the matrix A is upper or lower triangular.

If uplo = CblasUpper, then the matrix is upper triangular.

If uplo = CblasLower, then the matrix is lower triangular.

transa Specifies the form of op(A) used in the matrix multiplication.

If transa=CblasNoTrans, then op(A) = A.

If transa=CblasTrans, then op(A) = A'.

If transa=CblasConjTrans, then op(A) = conjg(A').

diag Specifies whether the matrix A is unit triangular.

If diag = CblasUnit, then the matrix is unit triangular.

If diag = CblasNonUnit, then the matrix is not unit triangular.

m Specifies the number of rows of matrix B. The value of m must be at least
zero.

n Specifies the number of columns of matrix B. The value of n must be at
least zero.

alpha Specifies the scalar alpha.

a Array of size lda*k, where k is m when side = CblasLeft and k is n
when side = CblasRight.

Before entry with uplo = CblasUpper, the leading k-by-k upper triangular
part of the array a must contain the upper triangular matrix and the strictly
lower triangular part of a is not referenced.

Before entry with uplo = CblasLower, the leading k-by-k lower triangular part
of the array a must contain the lower triangular matrix and the strictly upper
triangular part of a is not referenced.

When diag = CblasUnit, the diagonal elements of a are not referenced
either, but are assumed to be unity.

lda Specifies the leading dimension of a. When side = CblasLeft, then lda
must be at least max(1, m). When side = CblasRight, then lda must
be at least max(1, n).

b For layout = CblasColMajor, array of size ldb*n. Before entry, the
leading m-by-n part of the array b must contain the matrix B.

For layout = CblasRowMajor, array of size ldb*m. Before entry, the
leading n-by-m part of the array b must contain the matrix B.

ldb Specifies the leading dimension of b. When layout = CblasColMajor,
ldb must be at least max(1, m); otherwise, ldb must be at least max(1, n).

beta Specifies the scalar beta.

c For layout = CblasColMajor, array of size ldc*n. Before entry, the
leading m-by-n part of the array c must contain the matrix C.

For layout = CblasRowMajor, array of size ldc*m. Before entry, the
leading n-by-m part of the array c must contain the matrix C.

ldc Specifies the leading dimension of c. When layout = CblasColMajor,
ldc must be at least max(1, m); otherwise, ldc must be at least max(1, n).

Output Parameters

c Output matrix overwritten by the operation.

cblas_?trsm_oop
Solves a triangular matrix equation and adds the
result to another scaled matrix.

Syntax

void cblas_strsm_oop (
const CBLAS_LAYOUT layout, const CBLAS_SIDE side, const CBLAS_UPLO uplo,
const CBLAS_TRANSPOSE transa, const CBLAS_DIAG diag, const MKL_INT m,
const MKL_INT n, const float alpha, const float *a, const MKL_INT lda,
const float *b, const MKL_INT ldb, const float beta, float *c,
const MKL_INT ldc);

void cblas_dtrsm_oop (
const CBLAS_LAYOUT layout, const CBLAS_SIDE side, const CBLAS_UPLO uplo,
const CBLAS_TRANSPOSE transa, const CBLAS_DIAG diag, const MKL_INT m,
const MKL_INT n, const double alpha, const double *a, const MKL_INT lda,
const double *b, const MKL_INT ldb, const double beta, double *c,
const MKL_INT ldc);

void cblas_ctrsm_oop (
const CBLAS_LAYOUT layout, const CBLAS_SIDE side,
const CBLAS_UPLO uplo, const CBLAS_TRANSPOSE transa,
const CBLAS_DIAG diag, const MKL_INT m, const MKL_INT n,
const void* alpha, const void *a, const MKL_INT lda, const void *b,
const MKL_INT ldb, const void* beta, void *c, const MKL_INT ldc);

void cblas_ztrsm_oop (
const CBLAS_LAYOUT layout, const CBLAS_SIDE side,
const CBLAS_UPLO uplo, const CBLAS_TRANSPOSE transa,
const CBLAS_DIAG diag, const MKL_INT m, const MKL_INT n,
const void* alpha, const void *a, const MKL_INT lda, const void *b,
const MKL_INT ldb, const void* beta, void *c, const MKL_INT ldc);

Include Files
• mkl.h

Description
The cblas_?trsm_oop routines perform a triangular matrix solve followed by a scaled matrix addition.

For a left-side solve, the routine solves

op(A)*X = alpha*B

for X and then computes

C := X + beta*C
For a right-side solve, the routine solves

X*op(A) = alpha*B
followed by the same scaled addition

C := X + beta*C
where:

• alpha and beta are scalars.

• A is a unit or non-unit upper or lower triangular matrix.
• B is an m-by-n matrix.
• C is an m-by-n matrix.
• op(A) is one of op(A) = A, op(A) = A', or op(A) = conjg(A').
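The two steps above can be sketched with one scratch column: solve for a column of X, then fold it into C. This covers one case only (real double data, layout = CblasColMajor, side = CblasLeft, uplo = CblasUpper, transa = CblasNoTrans) using back substitution. The name ref_dtrsm_oop_lun and its int return code are hypothetical, int stands in for MKL_INT, and skipping the read of C when beta is zero is an assumption. The scratch column is needed because writing C := X + beta*C row by row would lose X values that later rows of the substitution still use.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Naive reference for one ?trsm_oop case: side = left, uplo = upper,
   transa = no transpose, column-major, real double data.
   Step 1: solve A * X = alpha * B by back substitution (B is read only).
   Step 2: C := X + beta * C.
   Returns 0 on success, -1 if the scratch column cannot be allocated. */
int ref_dtrsm_oop_lun(bool unit_diag, int m, int n, double alpha,
                      const double *a, int lda, const double *b, int ldb,
                      double beta, double *c, int ldc)
{
    double *x = malloc((size_t)(m > 0 ? m : 1) * sizeof *x);
    if (x == NULL)
        return -1;
    for (int j = 0; j < n; ++j) {
        for (int i = m - 1; i >= 0; --i) {       /* bottom row first */
            double s = alpha * b[i + j * ldb];
            for (int p = i + 1; p < m; ++p)
                s -= a[i + p * lda] * x[p];
            x[i] = unit_diag ? s : s / a[i + i * lda];
        }
        for (int i = 0; i < m; ++i) {
            /* Assumption: C is not read when beta is zero. */
            double old = (beta == 0.0) ? 0.0 : beta * c[i + j * ldc];
            c[i + j * ldc] = x[i] + old;
        }
    }
    free(x);
    return 0;
}
```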

Input Parameters

layout Specifies whether two-dimensional array storage is row-major
(CblasRowMajor) or column-major (CblasColMajor).

side Specifies whether op(A) appears on the left or right of X in the triangular
solve.
If side = CblasLeft, the routine solves op(A)*X = alpha*B before
performing C := X + beta*C.

If side = CblasRight, the routine solves X*op(A) = alpha*B before
performing C := X + beta*C.

uplo Specifies whether the matrix A is upper or lower triangular.

If uplo = CblasUpper, then the matrix is upper triangular.

If uplo = CblasLower, then the matrix is lower triangular.

transa Specifies the form of op(A) used in the triangular solve.

If transa=CblasNoTrans, then op(A) = A.

If transa=CblasTrans, then op(A) = A'.

If transa=CblasConjTrans, then op(A) = conjg(A').

diag Specifies whether the matrix A is unit triangular.

If diag = CblasUnit, then the matrix is unit triangular.

If diag = CblasNonUnit, then the matrix is not unit triangular.

m Specifies the number of rows of matrix B. The value of m must be at least
zero.

n Specifies the number of columns of matrix B. The value of n must be at
least zero.

alpha Specifies the scalar alpha.

a Array of size lda*k, where k is m when side = CblasLeft and k is n
when side = CblasRight.

Before entry with uplo = CblasUpper, the leading k-by-k upper triangular
part of the array a must contain the upper triangular matrix and the strictly
lower triangular part of a is not referenced.

Before entry with uplo = CblasLower, the leading k-by-k lower triangular part
of the array a must contain the lower triangular matrix and the strictly upper
triangular part of a is not referenced.

When diag = CblasUnit, the diagonal elements of a are not referenced
either, but are assumed to be unity.

lda Specifies the leading dimension of a. When side = CblasLeft, then lda
must be at least max(1, m). When side = CblasRight, then lda must
be at least max(1, n).

b For layout = CblasColMajor, array of size ldb*n. Before entry, the leading m-
by-n part of the array b must contain the matrix B.
For layout = CblasRowMajor, array of size ldb*m. Before entry, the leading
n-by-m part of the array b must contain the matrix B.

ldb Specifies the leading dimension of b. When layout = CblasColMajor,
ldb must be at least max(1, m); otherwise, ldb must be at least max(1, n).

beta Specifies the scalar beta.

c For layout = CblasColMajor, array of size ldc*n. Before entry, the
leading m-by-n part of the array c must contain the matrix C.

For layout = CblasRowMajor, array of size ldc*m. Before entry, the
leading n-by-m part of the array c must contain the matrix C.

ldc Specifies the leading dimension of c. When layout = CblasColMajor,
ldc must be at least max(1, m); otherwise, ldc must be at least max(1, n).

Output Parameters

c Output matrix overwritten by the operation.

Sparse BLAS Level 1 Routines

This section describes Sparse BLAS Level 1, an extension of BLAS Level 1 that has been included in Intel® MKL
(now the Intel® oneAPI Math Kernel Library, oneMKL) since release 2.1. Sparse BLAS Level 1 is a group of
routines and functions that perform common vector operations on sparse vectors stored in compressed form.
Sparse vectors are those in which the majority of elements are zeros. Sparse BLAS routines and functions
are implemented to take advantage of vector sparsity, which can save substantial computation time and
memory: if nz is the number of non-zero vector elements, the time taken by a Sparse BLAS operation is O(nz)
rather than proportional to the full vector length.


Vector Arguments
Compressed sparse vectors. Let a be a vector stored in an array, and assume that the only non-zero
elements of a are the following:
a[k_1], a[k_2], a[k_3], ..., a[k_nz],
where nz is the total number of non-zero elements in a.
In Sparse BLAS, this vector can be represented in compressed form by two arrays, x (values) and indx
(indices). Each array has nz elements:
x[0] = a[k_1], x[1] = a[k_2], ..., x[nz-1] = a[k_nz],
indx[0] = k_1, indx[1] = k_2, ..., indx[nz-1] = k_nz.
Thus, a sparse vector is fully determined by the triple (nz, x, indx). If you pass a negative or zero value of nz
to Sparse BLAS, the subroutines do not modify any arrays or variables.
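Building the triple (nz, x, indx) from a dense array is a single pass that records each non-zero value together with its zero-based position. A minimal sketch; compress_dense is a hypothetical helper, not a oneMKL routine, and int stands in for MKL_INT.

```c
/* Builds the compressed form (nz, x, indx) of a dense double vector a of
   length n by recording every non-zero entry and its zero-based position.
   x and indx must each have room for n entries; the return value is nz. */
int compress_dense(int n, const double *a, double *x, int *indx)
{
    int nz = 0;
    for (int k = 0; k < n; ++k) {
        if (a[k] != 0.0) {
            x[nz] = a[k];      /* value */
            indx[nz] = k;      /* its position in the full-storage array */
            ++nz;
        }
    }
    return nz;
}
```

For a = {0, 3.5, 0, 0, -1} this yields nz = 2, x = {3.5, -1}, indx = {1, 4}.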
Full-storage vectors. Sparse BLAS routines can also use a vector argument fully stored in a single array (a
full-storage vector). If y is a full-storage vector, its elements must be stored contiguously: the first element
in y[0], the second in y[1], and so on. This corresponds to an increment incy = 1 in BLAS Level 1. No
increment value for full-storage vectors is passed as an argument to Sparse BLAS routines or functions.

Naming Conventions for Sparse BLAS Routines

Similar to BLAS, the names of Sparse BLAS subprograms have prefixes that determine the data type
involved: s and d for single- and double-precision real; c and z for single- and double-precision complex
respectively.
If a Sparse BLAS routine is an extension of a "dense" one, the subprogram name is formed by appending the
suffix i (standing for indexed) to the name of the corresponding "dense" subprogram. For example, the
Sparse BLAS routine saxpyi corresponds to the BLAS routine saxpy, and the Sparse BLAS function cdotci
corresponds to the BLAS function cdotc.

Routines and Data Types

Routines and data types supported in the Intel® oneAPI Math Kernel Library (oneMKL) implementation of
Sparse BLAS are listed in Table “Sparse BLAS Routines and Their Data Types”.

Sparse BLAS Routines and Their Data Types

Routine/Function   Data Types   Description
cblas_?axpyi       s, d, c, z   Scalar-vector product plus vector (routines)
cblas_?doti        s, d         Dot product (functions)
cblas_?dotci       c, z         Complex dot product conjugated (functions)
cblas_?dotui       c, z         Complex dot product unconjugated (functions)
cblas_?gthr        s, d, c, z   Gathering a full-storage sparse vector into compressed
                                form (nz, x, indx) (routines)
cblas_?gthrz       s, d, c, z   Gathering a full-storage sparse vector into compressed
                                form and assigning zeros to gathered elements in the
                                full-storage vector (routines)
cblas_?roti        s, d         Givens rotation (routines)
cblas_?sctr        s, d, c, z   Scattering a vector from compressed form to full-storage
                                form (routines)


BLAS Level 1 Routines That Can Work With Sparse Vectors

The following BLAS Level 1 routines give correct results when you pass them a compressed-form array
x (with the increment incx=1):

cblas_?asum sum of absolute values of vector elements

cblas_?copy copying a vector

cblas_?nrm2 Euclidean norm of a vector

cblas_?scal scaling a vector

cblas_i?amax index of the element with the largest absolute value for real flavors, or the
largest sum |Re(x[i])|+|Im(x[i])| for complex flavors.

cblas_i?amin index of the element with the smallest absolute value for real flavors, or the
smallest sum |Re(x[i])|+|Im(x[i])| for complex flavors.

The result i returned by i?amax and i?amin should be interpreted as an index in the compressed-form array, so
that the largest (smallest) value is x[i-1]; the corresponding index in the full-storage array is indx[i-1].
You can also call cblas_?rotg to compute the parameters of a Givens rotation and then pass these
parameters to the Sparse BLAS routine cblas_?roti.

cblas_?axpyi
Adds a scalar multiple of a compressed sparse vector to
a full-storage vector.

Syntax
void cblas_saxpyi (const MKL_INT nz, const float a, const float *x, const MKL_INT
*indx, float *y);
void cblas_daxpyi (const MKL_INT nz, const double a, const double *x, const MKL_INT
*indx, double *y);
void cblas_caxpyi (const MKL_INT nz, const void *a, const void *x, const MKL_INT *indx,
void *y);
void cblas_zaxpyi (const MKL_INT nz, const void *a, const void *x, const MKL_INT *indx,
void *y);

Include Files
• mkl.h

Description

The ?axpyi routines perform a vector-vector operation defined as

y := a*x + y
where:
a is a scalar,
x is a sparse vector stored in compressed form,
y is a vector in full storage form.
The ?axpyi routines reference or modify only the elements of y whose indices are listed in the array indx.


The values in indx must be distinct.
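The update touches only the nz positions listed in indx, which is where the O(nz) cost comes from. A minimal sketch of the real double flavor; ref_daxpyi is a hypothetical name, int stands in for MKL_INT, and this is not oneMKL's implementation.

```c
/* Reference loop for y := a*x + y, where x is the compressed vector
   (nz, x, indx). Only y[indx[i]] is read or written; when nz <= 0 the
   loop body never runs and y is unchanged. The caller must keep the
   entries of indx distinct; this sketch does not check that. */
void ref_daxpyi(int nz, double a, const double *x, const int *indx,
                double *y)
{
    for (int i = 0; i < nz; ++i)
        y[indx[i]] += a * x[i];
}
```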

Input Parameters

nz The number of elements in x and indx.

a Specifies the scalar a.

x Array, size at least nz.

indx Specifies the indices for the elements of x.

Array, size at least nz.

y Array, size at least max(indx[i]) + 1.

Output Parameters

y Contains the updated vector y.

cblas_?doti
Computes the dot product of a compressed sparse real
vector by a full-storage real vector.

Syntax
float cblas_sdoti (const MKL_INT nz, const float *x, const MKL_INT *indx, const float
*y);
double cblas_ddoti (const MKL_INT nz, const double *x, const MKL_INT *indx, const
double *y);

Include Files
• mkl.h

Description

The ?doti routines return the dot product of x and y defined as

res = x[0]*y[indx[0]] + x[1]*y[indx[1]] + ... + x[nz-1]*y[indx[nz-1]]

where the triple (nz, x, indx) defines a sparse real vector stored in compressed form, and y is a real vector in
full storage form. The functions reference only the elements of y whose indices are listed in the array indx.
The values in indx must be distinct.
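The sum above reads y only at the nz listed positions. A minimal sketch of the double flavor; ref_ddoti is a hypothetical name and int stands in for MKL_INT.

```c
/* Reference loop for res = x[0]*y[indx[0]] + ... + x[nz-1]*y[indx[nz-1]].
   Returns 0 when nz <= 0, matching the description of res. */
double ref_ddoti(int nz, const double *x, const int *indx, const double *y)
{
    double res = 0.0;
    for (int i = 0; i < nz; ++i)
        res += x[i] * y[indx[i]];
    return res;
}
```

With x = {2, 3}, indx = {1, 4} and y = {10, 20, 30, 40, 50}, the result is 2*20 + 3*50 = 190.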

Input Parameters

nz The number of elements in x and indx.

x Array, size at least nz.

indx Specifies the indices for the elements of x.

Array, size at least nz.

y Array, size at least max(indx[i]) + 1.

Output Parameters

res Contains the dot product of x and y, if nz is positive. Otherwise, res
contains 0.

cblas_?dotci
Computes the conjugated dot product of a
compressed sparse complex vector with a full-storage
complex vector.

Syntax
void cblas_cdotci_sub (const MKL_INT nz, const void *x, const MKL_INT *indx, const void
*y, void *dotui);
void cblas_zdotci_sub (const MKL_INT nz, const void *x, const MKL_INT *indx, const void
*y, void *dotui);

Include Files
• mkl.h

Description

The ?dotci routines compute the conjugated dot product of x and y, defined as

conjg(x[0])*y[indx[0]] + ... + conjg(x[nz-1])*y[indx[nz-1]]

where the triple (nz, x, indx) defines a sparse complex vector stored in compressed form, and y is a complex
vector in full storage form. The result is returned in dotui. The routines reference only the elements of y
whose indices are listed in the array indx. The values in indx must be distinct.
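A minimal sketch of the conjugated sum using C99 double complex. The z flavor writes this value through its void *dotui argument; the sketch simply returns it. ref_zdotci is a hypothetical name, int stands in for MKL_INT, and C99 double complex is assumed to match the layout of the data passed through the void pointers.

```c
#include <complex.h>

/* Reference loop for the conjugated sparse dot product
   conj(x[0])*y[indx[0]] + ... + conj(x[nz-1])*y[indx[nz-1]].
   Returns 0 when nz <= 0. */
double complex ref_zdotci(int nz, const double complex *x, const int *indx,
                          const double complex *y)
{
    double complex res = 0.0;
    for (int i = 0; i < nz; ++i)
        res += conj(x[i]) * y[indx[i]];
    return res;
}
```

For x[0] = 1+2i and y[0] = 3+4i, the single term is (1-2i)(3+4i) = 11-2i.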

Input Parameters

nz The number of elements in x and indx.

x Array, size at least nz.

indx Specifies the indices for the elements of x.

Array, size at least nz.

y Array, size at least max(indx[i]) + 1.

Output Parameters

dotui Contains the conjugated dot product of x and y, if nz is positive. Otherwise,
it contains 0.

cblas_?dotui
Computes the dot product of a compressed sparse
complex vector by a full-storage complex vector.

Syntax
void cblas_cdotui_sub (const MKL_INT nz, const void *x, const MKL_INT *indx, const void
*y, void *dotui);

void cblas_zdotui_sub (const MKL_INT nz, const void *x, const MKL_INT *indx, const void
*y, void *dotui);

Include Files
• mkl.h

Description

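The ?dotui routines compute the unconjugated dot product of x and y, defined as

res = x[0]*y[indx[0]] + x[1]*y[indx[1]] + ... + x[nz-1]*y[indx[nz-1]]

where the triple (nz, x, indx) defines a sparse complex vector stored in compressed form, and y is a complex
vector in full storage form. The result is returned in dotui.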
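The only difference from ?dotci is that x[i] is used without conjugation. A minimal single-precision sketch that, like cblas_cdotui_sub, returns its result through an output pointer; ref_cdotui is a hypothetical name, int stands in for MKL_INT, and C99 float complex is assumed to match the data layout.

```c
#include <complex.h>

/* Reference loop for the unconjugated sparse dot product
   x[0]*y[indx[0]] + ... + x[nz-1]*y[indx[nz-1]] in single precision.
   Writes the result to *dotui, which is 0 when nz <= 0. */
void ref_cdotui(int nz, const float complex *x, const int *indx,
                const float complex *y, float complex *dotui)
{
    float complex res = 0.0f;
    for (int i = 0; i < nz; ++i)
        res += x[i] * y[indx[i]];
    *dotui = res;
}
```

For x = {1+2i, 5}, indx = {1, 0} and y = {1, 3+4i}, the terms are (1+2i)(3+4i) = -5+10i and 5*1 = 5, so the result is 10i.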

Input Parameters

nz The number of elements in x and indx.

x Array, size at least nz.

indx Specifies the indices for the elements of x.

Array, size at least nz.

y Array, size at least max(indx[i]) + 1.

Output Parameters

dotui Contains the dot product of x and y, if nz is positive. Otherwise,
it contains 0.

cblas_?gthr
Gathers a full-storage sparse vector's elements into
compressed form.

Syntax
void cblas_sgthr (const MKL_INT nz, const float *y, float *x, const MKL_INT *indx);
void cblas_dgthr (const MKL_INT nz, const double *y, double *x, const MKL_INT *indx);
void cblas_cgthr (const MKL_INT nz, const void *y, void *x, const MKL_INT *indx);
void cblas_zgthr (const MKL_INT nz, const void *y, void *x, const MKL_INT *indx);

Include Files
• mkl.h

Description

The ?gthr routines gather the specified elements of a full-storage sparse vector y into compressed form (nz,
x, indx). The routines reference only the elements of y whose indices are listed in the array indx:
x[i] = y[indx[i]], for i = 0, 1, ..., nz-1.
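The gather operation is a single indexed copy. Below is a minimal double-precision sketch of that loop. The name ref_dgthr is hypothetical, int stands in for MKL_INT, and the argument order simply mirrors cblas_dgthr.

```c
/* Hypothetical reference loop for ?gthr (double precision, not the oneMKL code):
 * copy the elements of the full-storage vector y listed in indx into the
 * compressed array x, i.e. x[i] = y[indx[i]] for i = 0, ..., nz-1. */
void ref_dgthr(int nz, const double *y, double *x, const int *indx)
{
    for (int i = 0; i < nz; ++i) {
        x[i] = y[indx[i]];   /* y itself is not modified */
    }
}
```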

Input Parameters

nz The number of elements of y to be gathered.

indx Specifies indices of elements to be gathered.

Array, size at least nz.

y Array, size at least max(indx[i]) + 1.

Output Parameters

x Array, size at least nz.

Contains the vector converted to the compressed form.

cblas_?gthrz
Gathers a sparse vector's elements into compressed
form, replacing them with zeros.

Syntax
void cblas_sgthrz (const MKL_INT nz, float *y, float *x, const MKL_INT *indx);
void cblas_dgthrz (const MKL_INT nz, double *y, double *x, const MKL_INT *indx);
void cblas_cgthrz (const MKL_INT nz, void *y, void *x, const MKL_INT *indx);
void cblas_zgthrz (const MKL_INT nz, void *y, void *x, const MKL_INT *indx);

Include Files
• mkl.h

Description

The ?gthrz routines gather the elements with indices specified by the array indx from a full-storage vector y
into compressed form (nz, x, indx) and overwrite the gathered elements of y with zeros. Other elements of y
are not referenced or modified (see also ?gthr).
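The only difference from ?gthr is that each gathered element of y is zeroed after it is copied. Here is a minimal double-precision sketch. The name ref_dgthrz is hypothetical and int stands in for MKL_INT.

```c
/* Hypothetical reference loop for ?gthrz (double precision, not the oneMKL code):
 * gather y[indx[i]] into x[i], then set that element of y to zero.
 * Elements of y not listed in indx are left untouched. */
void ref_dgthrz(int nz, double *y, double *x, const int *indx)
{
    for (int i = 0; i < nz; ++i) {
        x[i] = y[indx[i]];
        y[indx[i]] = 0.0;
    }
}
```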

Input Parameters

nz The number of elements of y to be gathered.

indx Specifies indices of elements to be gathered.

Array, size at least nz.

y Array, size at least max(indx[i]) + 1.

Output Parameters

x Array, size at least nz.

Contains the vector converted to the compressed form.

y The updated vector y.


cblas_?roti
Applies a Givens rotation to sparse vectors, one of which
is in compressed form.

Syntax
void cblas_sroti (const MKL_INT nz, float *x, const MKL_INT *indx, float *y, const
float c, const float s);
void cblas_droti (const MKL_INT nz, double *x, const MKL_INT *indx, double *y, const
double c, const double s);

Include Files
• mkl.h

Description

The ?roti routines apply the Givens rotation to elements of two real vectors, x (in compressed form nz, x,
indx) and y (in full storage form):

x[i] = c*x[i] + s*y[indx[i]]

y[indx[i]] = c*y[indx[i]] - s*x[i]

for i = 0, 1, ..., nz-1, where x[i] and y[indx[i]] on the right-hand sides denote the values before the update.
The routines reference only the elements of y whose indices are listed in the array indx. The values in indx
must be distinct.
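Both updated values depend on the old x[i] and y[indx[i]], so a straightforward loop must save them before it overwrites either one. Below is a minimal double-precision sketch. The name ref_droti is hypothetical and int stands in for MKL_INT.

```c
/* Hypothetical reference loop for ?roti (double precision, not the oneMKL code).
 * Applies the Givens rotation
 *   x[i]       = c*x[i]       + s*y[indx[i]]
 *   y[indx[i]] = c*y[indx[i]] - s*x[i]
 * where both right-hand sides use the values from before the update. */
void ref_droti(int nz, double *x, const int *indx, double *y,
               double c, double s)
{
    for (int i = 0; i < nz; ++i) {
        double xi = x[i];          /* old x[i] */
        double yi = y[indx[i]];    /* old y[indx[i]] */
        x[i]       = c * xi + s * yi;
        y[indx[i]] = c * yi - s * xi;
    }
}
```

Setting c = 0 and s = 1 makes a quick check: x picks up the selected entries of y, and those entries of y become the negated old values of x.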

Input Parameters

nz The number of elements in x and indx.

x Array, size at least nz.

indx Specifies the indices for the elements of x.

Array, size at least nz.

y Array, size at least max(indx[i]) + 1.

c A scalar.

s A scalar.

Output Parameters

x and y The updated arrays.

cblas_?sctr
Converts compressed sparse vectors into full storage
form.

Syntax
void cblas_ssctr (const MKL_INT nz, const float *x, const MKL_INT *indx, float *y);
void cblas_dsctr (const MKL_INT nz, const double *x, const MKL_INT *indx, double *y);
void cblas_csctr (const MKL_INT nz, const void *x, const MKL_INT *indx, void *y);

void cblas_zsctr (const MKL_INT nz, const void *x, const MKL_INT *indx, void *y);

Include Files
• mkl.h

Description

The ?sctr routines scatter the elements of the compressed sparse vector (nz, x, indx) to a full-storage
vector y. The routines modify only the elements of y whose indices are listed in the array indx:
y[indx[i]] = x[i], for i = 0, 1, ..., nz-1.
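Scatter is the inverse of gather: each compressed entry is written back to its position in y. Here is a minimal double-precision sketch. The name ref_dsctr is hypothetical and int stands in for MKL_INT.

```c
/* Hypothetical reference loop for ?sctr (double precision, not the oneMKL code):
 * y[indx[i]] = x[i] for i = 0, ..., nz-1; other elements of y are unchanged. */
void ref_dsctr(int nz, const double *x, const int *indx, double *y)
{
    for (int i = 0; i < nz; ++i) {
        y[indx[i]] = x[i];
    }
}
```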

Input Parameters

nz The number of elements of x to be scattered.

indx Specifies indices of elements to be scattered.

Array, size at least nz.

x Array, size at least nz.

Contains the vector to be converted to full-storage form.

Output Parameters

y Array, size at least max(indx[i]) + 1.

Contains the vector y with updated elements.

Sparse BLAS Level 2 and Level 3 Routines

NOTE The Intel® oneAPI Math Kernel Library (oneMKL) Sparse BLAS Level 2 and Level 3 routines are
deprecated. Use the corresponding routine from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface as indicated in the description for each routine.

This section describes Sparse BLAS Level 2 and Level 3 routines included in the Intel® oneAPI Math Kernel
Library (oneMKL). Sparse BLAS Level 2 is a group of routines and functions that perform operations between
a sparse matrix and dense vectors. Sparse BLAS Level 3 is a group of routines and functions that perform
operations between a sparse matrix and dense matrices.
The terms and concepts required to understand the use of the Intel® oneAPI Math Kernel Library (oneMKL)
Sparse BLAS Level 2 and Level 3 routines are discussed in the Linear Solvers Basics appendix.
The Sparse BLAS routines can be useful to implement iterative methods for solving large sparse systems of
equations or eigenvalue problems. For example, these routines can be considered as building blocks for
Iterative Sparse Solvers based on Reverse Communication Interface (RCI ISS).
Intel® oneAPI Math Kernel Library (oneMKL) provides Sparse BLAS Level 2 and Level 3 routines with typical
(or conventional) interface similar to the interface used in the NIST* Sparse BLAS library [Rem05].
Some software packages and libraries (the PARDISO* Solver used in Intel® oneAPI Math Kernel Library
(oneMKL), Sparskit 2 [Saad94], the Compaq* Extended Math Library (CXML) [CXML01]) use a different (early)
variation of the compressed sparse row (CSR) format and support only Level 2 operations with simplified
interfaces. Intel® oneAPI Math Kernel Library (oneMKL) provides an additional set of Sparse BLAS Level 2
routines with similar simplified interfaces. Each of these routines operates only on a matrix of a fixed type.
The routines described in this section support both one-based indexing and zero-based indexing of the input
data (see details in the section One-based and Zero-based Indexing).


Naming Conventions in Sparse BLAS Level 2 and Level 3

Each Sparse BLAS Level 2 and Level 3 routine has a six- or eight-character base name preceded by the prefix
mkl_ or mkl_cspblas_.
The routines with typical (conventional) interfaces have six-character base names in accordance with the
template:
mkl_<character><data><operation>( )
The routines with simplified interfaces have eight-character base names in accordance with the templates:
mkl_<character><data><mtype><operation>( )
for routines with one-based indexing; and
mkl_cspblas_<character><data><mtype><operation>( )
for routines with zero-based indexing.
The <character> field indicates the data type:

s real, single precision

c complex, single precision

d real, double precision

z complex, double precision

The <data> field indicates the sparse matrix storage format (see section Sparse Matrix Storage Formats):

coo coordinate format

csr compressed sparse row format and its variations

csc compressed sparse column format and its variations

dia diagonal format

sky skyline storage format

bsr block sparse row format and its variations

The <operation> field indicates the type of operation:

mv matrix-vector product (Level 2)

mm matrix-matrix product (Level 3)

sv solving a single triangular system (Level 2)

sm solving triangular systems with multiple right-hand sides (Level 3)

The field <mtype> indicates the matrix type:

ge sparse representation of a general matrix

sy sparse representation of the upper or lower triangle of a symmetric matrix

tr sparse representation of a triangular matrix

Sparse Matrix Storage Formats for Sparse BLAS Routines

In the current version, the Intel® oneAPI Math Kernel Library (oneMKL) Sparse BLAS Level 2 and Level 3
routines support the following point entry [Duff86] storage formats for sparse matrices:

• compressed sparse row format (CSR) and its variations;
• compressed sparse column format (CSC);
• coordinate format;
• diagonal format;
• skyline storage format;
and one block entry storage format:
• block sparse row format (BSR) and its variations.
For more information, see "Sparse Matrix Storage Formats" in the Appendix "Linear Solvers Basics".
Intel® oneAPI Math Kernel Library (oneMKL) provides auxiliary routines (matrix converters) that convert
a sparse matrix from one storage format to another.

Routines and Supported Operations

This section describes operations supported by the Intel® oneAPI Math Kernel Library (oneMKL) Sparse BLAS
Level 2 and Level 3 routines. The following notations are used here:
A is a sparse matrix;
B and C are dense matrices;
D is a diagonal scaling matrix;
x and y are dense vectors;
alpha and beta are scalars;
op(A) is one of the possible operations:
op(A) = A;
op(A) = A^T - transpose of A;
op(A) = A^H - conjugated transpose of A.
inv(op(A)) denotes the inverse of op(A).
The Intel® oneAPI Math Kernel Library (oneMKL) Sparse BLAS Level 2 and Level 3 routines support the
following operations:
• computing the vector product between a sparse matrix and a dense vector:

y := alpha*op(A)*x + beta*y
• solving a single triangular system:

y := alpha*inv(op(A))*x
• computing the product of a sparse matrix and a dense matrix:

C := alpha*op(A)*B + beta*C
• solving a sparse triangular system with multiple right-hand sides:

C := alpha*inv(op(A))*B
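For the single triangular system y := alpha*inv(op(A))*x, the product inv(op(A))*x is never formed explicitly. Instead, op(A)*y = alpha*x is solved by substitution. The sketch below covers only op(A) = A with a lower triangular, non-unit matrix (matdescra T, L, N) stored in zero-based 3-array CSR form. The function name ref_csr_lower_solve and the int indices are assumptions for illustration, not oneMKL API.

```c
/* Hypothetical reference routine (not the oneMKL code): solve the single
 * triangular system y := alpha*inv(A)*x, i.e. A*y = alpha*x, by forward
 * substitution. A is an m-by-m lower triangular matrix with a non-unit main
 * diagonal stored in zero-based 3-array CSR form (a, ia, ja). Each row is
 * assumed to contain its (non-zero) diagonal entry. */
void ref_csr_lower_solve(int m, double alpha, const double *a, const int *ia,
                         const int *ja, const double *x, double *y)
{
    for (int i = 0; i < m; ++i) {
        double sum = alpha * x[i];
        double diag = 0.0;
        for (int k = ia[i]; k < ia[i + 1]; ++k) {
            if (ja[k] < i) {
                sum -= a[k] * y[ja[k]];   /* y[ja[k]] was computed in an earlier row */
            } else if (ja[k] == i) {
                diag = a[k];              /* diagonal entry of row i */
            }
        }
        y[i] = sum / diag;
    }
}
```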
Intel® oneAPI Math Kernel Library (oneMKL) provides an additional set of Sparse BLAS Level 2 routines
with simplified interfaces. Each of these routines operates on a matrix of a fixed type. The following
operations are supported:
• computing the vector product between a sparse matrix and a dense vector (for general and symmetric
matrices):

y := op(A)*x
• solving a single triangular system (for triangular matrices):

y := inv(op(A))*x
Matrix type is indicated by the field <mtype> in the routine name (see section Naming Conventions in Sparse
BLAS Level 2 and Level 3).


NOTE
The routines with simplified interfaces support only four sparse matrix storage formats, specifically:
CSR format in the 3-array variation accepted in the direct sparse solvers and in the CXML;
diagonal format accepted in the CXML;
coordinate format;
BSR format in the 3-array variation.

Note that routines with both typical (conventional) and simplified interfaces use the same computational
kernels that work with certain internal data structures.
The Intel® oneAPI Math Kernel Library (oneMKL) Sparse BLAS Level 2 and Level 3 routines do not support
in-place operations.
The complete list of routines is given in the table “Sparse BLAS Level 2 and Level 3 Routines”.

Interface Consideration

One-Based and Zero-Based Indexing

The Intel® oneAPI Math Kernel Library (oneMKL) Sparse BLAS Level 2 and Level 3 routines support one-based
and zero-based indexing of data arrays.
Routines with typical interfaces support zero-based indexing for the following sparse data storage formats:
CSR, CSC, BSR, and COO. Routines with simplified interfaces support zero-based indexing for the following
sparse data storage formats: CSR, BSR, and COO. See the complete list of Sparse BLAS Level 2 and Level 3
Routines.
The one-based indexing uses the convention of starting array indices at 1. The zero-based indexing uses the
convention of starting array indices at 0. For example, the indices of the 5-element array x are numbered as
follows with one-based indexing:
Element index: 1 2 3 4 5

Element value: 1.0 5.0 7.0 8.0 9.0

and as follows with zero-based indexing:

Element index: 0 1 2 3 4

Element value: 1.0 5.0 7.0 8.0 9.0
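To move a matrix in the 3-array CSR variation from one convention to the other, every stored row pointer and column index is shifted by one. The values array does not change. Below is a minimal in-place sketch. The helper name csr_one_to_zero_based and the int indices are hypothetical, and no oneMKL routine is involved.

```c
/* Hypothetical helper (not part of oneMKL): convert the row-pointer array ia
 * (length m + 1) and the column-index array ja of a 3-array CSR matrix from
 * one-based to zero-based indexing in place. The values array is unchanged. */
void csr_one_to_zero_based(int m, int *ia, int *ja)
{
    int nnz = ia[m] - ia[0];     /* number of stored non-zeros, read before ia changes */
    for (int k = 0; k < nnz; ++k) {
        ja[k] -= 1;              /* column indices now start at 0 */
    }
    for (int i = 0; i <= m; ++i) {
        ia[i] -= 1;              /* row pointers now start at 0 */
    }
}
```

For the 2-by-2 matrix [[1, 0], [2, 3]], the one-based arrays ia = {1, 2, 4} and ja = {1, 1, 2} become ia = {0, 1, 3} and ja = {0, 0, 1}.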

The detailed descriptions of the one-based and zero-based variants of the sparse data storage formats are
given in the "Sparse Matrix Storage Formats" in the Appendix "Linear Solvers Basics".
Most parameters of the routines are identical for both one-based and zero-based indexing, but some of them
have certain differences. The following table lists all these differences.

val
    One-based indexing: Array containing non-zero elements of the matrix A; its length is pntre[m] - pntrb[1].
    Zero-based indexing: Array containing non-zero elements of the matrix A; its length is pntre[m-1] - pntrb[0].

pntrb
    One-based indexing: Array of length m. This array contains row indices, such that pntrb[i] - pntrb[1] + 1
    is the first index of row i in the arrays val and indx.
    Zero-based indexing: Array of length m. This array contains row indices, such that pntrb[i] - pntrb[0] is
    the first index of row i in the arrays val and indx.

pntre
    One-based indexing: Array of length m. This array contains row indices, such that pntre[i] - pntrb[1] is
    the last index of row i in the arrays val and indx.
    Zero-based indexing: Array of length m. This array contains row indices, such that pntre[i] - pntrb[0] - 1
    is the last index of row i in the arrays val and indx.

ia
    One-based indexing: Array of length m + 1, containing indices of elements in the array a, such that ia[i]
    is the index in the array a of the first non-zero element from the row i. The value of the last element
    ia[m + 1] is equal to the number of non-zeros plus one.
    Zero-based indexing: Array of length m + 1, containing indices of elements in the array a, such that ia[i]
    is the index in the array a of the first non-zero element from the row i. The value of the last element
    ia[m] is equal to the number of non-zeros.

ldb
    One-based indexing: Specifies the leading dimension of b as declared in the calling (sub)program.
    Zero-based indexing: Specifies the second dimension of b as declared in the calling (sub)program.

ldc
    One-based indexing: Specifies the leading dimension of c as declared in the calling (sub)program.
    Zero-based indexing: Specifies the second dimension of c as declared in the calling (sub)program.

In the one-based entries, array subscripts are written in one-based notation (the first element of each array
is numbered 1). For example, ia[m + 1] denotes the last element of an array of length m + 1.

Differences Between Intel MKL and NIST* Interfaces

The Intel® oneAPI Math Kernel Library (oneMKL) Sparse BLAS Level 3 routines have the following
conventional interfaces:
mkl_xyyymm(transa, m, n, k, alpha, matdescra, arg(A), b, ldb, beta, c, ldc), for matrix-
matrix product;
mkl_xyyysm(transa, m, n, alpha, matdescra, arg(A), b, ldb, c, ldc), for triangular solvers
with multiple right-hand sides.
Here x denotes the data type and yyy denotes the sparse matrix data structure (storage format).

The analogous NIST* Sparse BLAS (NSB) library routines have the following interfaces:
xyyymm(transa, m, n, k, alpha, descra, arg(A), b, ldb, beta, c, ldc, work, lwork), for
matrix-matrix product;
xyyysm(transa, m, n, unitd, dv, alpha, descra, arg(A), b, ldb, beta, c, ldc, work,
lwork), for triangular solvers with multiple right-hand sides.
Some similar arguments are used in both libraries. The argument transa indicates what operation is
performed and is slightly different in the NSB library (see Table "Parameter transa"). The arguments m and k
are the numbers of rows and columns in the matrix A, respectively, and n is the number of columns in the matrix C.
The arguments alpha and beta are the scalars alpha and beta, respectively (beta is not used in the Intel® oneAPI
Math Kernel Library (oneMKL) triangular solvers). The arguments b and c are rectangular arrays with the
leading dimensions ldb and ldc, respectively. arg(A) denotes the list of arguments that describe the sparse
representation of A.
Parameter transa

               MKL interface   NSB interface   Operation
data type      char *          INTEGER
value          N or n          0               op(A) = A
               T or t          1               op(A) = A^T
               C or c          2               op(A) = A^T or op(A) = A^H

Parameter matdescra
The parameter matdescra describes the relevant characteristics of the matrix A. This manual describes
matdescra as an array of six elements in line with the NIST* implementation. However, only the first four
elements of the array are used in the current versions of the Intel® oneAPI Math Kernel Library (oneMKL)
Sparse BLAS routines. Elements matdescra[4] and matdescra[5] are reserved for future use. Note that
whether matdescra is declared in your application as an array of length 6 or 4 is of no importance because
the array is declared as a pointer in the Intel® oneAPI Math Kernel Library (oneMKL) routines. To learn more
about the declaration of the matdescra array, see the Sparse BLAS examples located in the Intel® oneAPI Math
Kernel Library (oneMKL) installation directory: examples/spblasc/ for C. The table below lists the elements of
the parameter matdescra, their values, and their meanings. The parameter matdescra corresponds
to the argument descra from the NSB library.

Possible Values of the Parameter matdescra [descra - 1]

              MKL interface                   NSB interface   Matrix characteristics
              one-based       zero-based
              indexing        indexing
data type     char *          char *          int *

1st element   matdescra[1]    matdescra[0]    descra[0]       matrix structure
value         G               G               0               general
              S               S               1               symmetric (A = A^T)
              H               H               2               Hermitian (A = A^H)
              T               T               3               triangular
              A               A               4               skew(anti)-symmetric (A = -A^T)
              D               D               5               diagonal

2nd element   matdescra[2]    matdescra[1]    descra[1]       upper/lower triangular indicator
value         L               L               1               lower
              U               U               2               upper

3rd element   matdescra[3]    matdescra[2]    descra[2]       main diagonal type
value         N               N               0               non-unit
              U               U               1               unit

4th element   matdescra[4]    matdescra[3]    descra[3]       type of indexing
value         F               F               1               one-based indexing
              C               C               0               zero-based indexing

In some cases possible element values of the parameter matdescra depend on the values of other elements.
The Table "Possible Combinations of Element Values of the Parameter matdescra" lists all possible
combinations of element values for both multiplication routines and triangular solvers.

Possible Combinations of Element Values of the Parameter matdescra

                 matdescra[0]   matdescra[1]   matdescra[2]   matdescra[3]

Multiplication routines
                 G              ignored        ignored        F (default) or C
                 S or H         L (default)    N (default)    F (default) or C
                 S or H         L (default)    U              F (default) or C
                 S or H         U              N (default)    F (default) or C
                 S or H         U              U              F (default) or C
                 A              L (default)    ignored        F (default) or C
                 A              U              ignored        F (default) or C

Multiplication routines and triangular solvers
                 T              L              U              F (default) or C
                 T              L              N              F (default) or C
                 T              U              U              F (default) or C
                 T              U              N              F (default) or C
                 D              ignored        N (default)    F (default) or C
                 D              ignored        U              F (default) or C
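The combination rules in the table above can be expressed as a small validity check. The helper below is hypothetical, not a oneMKL function. It uses zero-based C subscripts d[0]..d[3], accepts only the upper-case letters listed in the table, skips entries marked "ignored", and does not model the "(default)" annotations.

```c
#include <stdbool.h>
#include <string.h>

/* True if c is one of the characters in allowed (and not the terminator). */
static bool in_set(char c, const char *allowed)
{
    return c != '\0' && strchr(allowed, c) != NULL;
}

/* Hypothetical helper (not part of oneMKL): check a matdescra combination
 * against the table for multiplication routines (triangular_solver = false)
 * or triangular solvers (triangular_solver = true). */
bool matdescra_ok(const char d[4], bool triangular_solver)
{
    if (!in_set(d[3], "FC")) {
        return false;                       /* F: one-based, C: zero-based */
    }
    switch (d[0]) {
    case 'G':                               /* general: multiplication only */
        return !triangular_solver;
    case 'S': case 'H':                     /* symmetric/Hermitian: multiplication only */
        return !triangular_solver && in_set(d[1], "LU") && in_set(d[2], "NU");
    case 'A':                               /* skew-symmetric: diagonal type ignored */
        return !triangular_solver && in_set(d[1], "LU");
    case 'T':                               /* triangular: both kinds of routines */
        return in_set(d[1], "LU") && in_set(d[2], "NU");
    case 'D':                               /* diagonal: triangle indicator ignored */
        return in_set(d[2], "NU");
    default:
        return false;
    }
}
```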

For a matrix in the skyline format with the main diagonal declared to be a unit, diagonal elements must be
stored in the sparse representation even if they are zero. In all other formats, diagonal elements can be
stored (if needed) in the sparse representation if they are not zero.

Operations with Partial Matrices

One of the distinctive features of the Intel® oneAPI Math Kernel Library (oneMKL) Sparse BLAS routines is the
ability to perform operations only on partial matrices composed of certain parts (triangles and the main
diagonal) of the input sparse matrix. This is done by setting the first three elements of the parameter
matdescra appropriately.

An arbitrary sparse matrix A can be decomposed as

A = L + D + U
where L is the strict lower triangle of A, U is the strict upper triangle of A, D is the main diagonal.
Table "Output Matrices for Multiplication Routines" shows correspondence between the output matrices and
values of the parameter matdescra for the sparse matrix A for multiplication routines.

Output Matrices for Multiplication Routines

matdescra[0]  matdescra[1]  matdescra[2]  Output Matrix

G             ignored       ignored       alpha*op(A)*x + beta*y
                                          alpha*op(A)*B + beta*C
S or H        L             N             alpha*op(L+D+L')*x + beta*y
                                          alpha*op(L+D+L')*B + beta*C
S or H        L             U             alpha*op(L+I+L')*x + beta*y
                                          alpha*op(L+I+L')*B + beta*C
S or H        U             N             alpha*op(U'+D+U)*x + beta*y
                                          alpha*op(U'+D+U)*B + beta*C
S or H        U             U             alpha*op(U'+I+U)*x + beta*y
                                          alpha*op(U'+I+U)*B + beta*C
T             L             U             alpha*op(L+I)*x + beta*y
                                          alpha*op(L+I)*B + beta*C
T             L             N             alpha*op(L+D)*x + beta*y
                                          alpha*op(L+D)*B + beta*C
T             U             U             alpha*op(U+I)*x + beta*y
                                          alpha*op(U+I)*B + beta*C
T             U             N             alpha*op(U+D)*x + beta*y
                                          alpha*op(U+D)*B + beta*C
A             L             ignored       alpha*op(L-L')*x + beta*y
                                          alpha*op(L-L')*B + beta*C
A             U             ignored       alpha*op(U-U')*x + beta*y
                                          alpha*op(U-U')*B + beta*C
D             ignored       N             alpha*D*x + beta*y
                                          alpha*D*B + beta*C
D             ignored       U             alpha*x + beta*y
                                          alpha*B + beta*C
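As a concrete case, take the row S, L, N of the table above. Only the strict lower triangle L and the diagonal D are read, and the upper part is reconstructed as L'. The sketch below is written for a real matrix in zero-based 3-array CSR form. The function name ref_csr_sym_lower_mv and the int indices are hypothetical, so it illustrates the rule rather than oneMKL's kernel. Following the usual BLAS convention (an assumption here), y is not read when beta is zero.

```c
/* Hypothetical reference routine (not the oneMKL code) for the partial-matrix
 * rule with matdescra[0..2] = 'S', 'L', 'N': compute
 *     y := alpha*(L + D + L')*x + beta*y
 * for a real m-by-m matrix in zero-based 3-array CSR form (a, ia, ja).
 * Only entries on or below the main diagonal are read; entries above it
 * are skipped even if they are stored. */
void ref_csr_sym_lower_mv(int m, double alpha, const double *a, const int *ia,
                          const int *ja, const double *x, double beta,
                          double *y)
{
    for (int i = 0; i < m; ++i) {
        y[i] = (beta == 0.0) ? 0.0 : beta * y[i];   /* scale (or clear) y first */
    }
    for (int i = 0; i < m; ++i) {
        for (int k = ia[i]; k < ia[i + 1]; ++k) {
            int j = ja[k];
            if (j > i) {
                continue;                           /* strict upper triangle: ignored */
            }
            y[i] += alpha * a[k] * x[j];            /* L or D contribution */
            if (j < i) {
                y[j] += alpha * a[k] * x[i];        /* mirrored L' contribution */
            }
        }
    }
}
```

For a stored matrix [[2, 9], [1, 3]], the entry 9 is ignored and the routine behaves as if A were [[2, 1], [1, 3]].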

Table "Output Matrices for Triangular Solvers" shows correspondence between the output matrices and values
of the parameter matdescra for the sparse matrix A for triangular solvers.

Output Matrices for Triangular Solvers

matdescra[0]  matdescra[1]  matdescra[2]  Output Matrix

T             L             N             alpha*inv(op(L))*x
                                          alpha*inv(op(L))*B
T             L             U             alpha*inv(op(L))*x
                                          alpha*inv(op(L))*B
T             U             N             alpha*inv(op(U))*x
                                          alpha*inv(op(U))*B
T             U             U             alpha*inv(op(U))*x
                                          alpha*inv(op(U))*B
D             ignored       N             alpha*inv(D)*x
                                          alpha*inv(D)*B
D             ignored       U             alpha*x
                                          alpha*B


Sparse BLAS Level 2 and Level 3 Routines

Table “Sparse BLAS Level 2 and Level 3 Routines” lists the sparse BLAS Level 2 and Level 3 routines
described in more detail later in this section.

Sparse BLAS Level 2 and Level 3 Routines

Routine/Function        Description

Simplified interface, one-based indexing

mkl_?csrgemv            Computes the matrix-vector product of a sparse general matrix in the CSR format (3-array variation).
mkl_?bsrgemv            Computes the matrix-vector product of a sparse general matrix in the BSR format (3-array variation).
mkl_?coogemv            Computes the matrix-vector product of a sparse general matrix in the coordinate format.
mkl_?diagemv            Computes the matrix-vector product of a sparse general matrix in the diagonal format.
mkl_?csrsymv            Computes the matrix-vector product of a sparse symmetric matrix in the CSR format (3-array variation).
mkl_?bsrsymv            Computes the matrix-vector product of a sparse symmetric matrix in the BSR format (3-array variation).
mkl_?coosymv            Computes the matrix-vector product of a sparse symmetric matrix in the coordinate format.
mkl_?diasymv            Computes the matrix-vector product of a sparse symmetric matrix in the diagonal format.
mkl_?csrtrsv            Triangular solver with simplified interface for a sparse matrix in the CSR format (3-array variation).
mkl_?bsrtrsv            Triangular solver with simplified interface for a sparse matrix in the BSR format (3-array variation).
mkl_?cootrsv            Triangular solver with simplified interface for a sparse matrix in the coordinate format.
mkl_?diatrsv            Triangular solver with simplified interface for a sparse matrix in the diagonal format.

Simplified interface, zero-based indexing

mkl_cspblas_?csrgemv    Computes the matrix-vector product of a sparse general matrix in the CSR format (3-array variation) with zero-based indexing.
mkl_cspblas_?bsrgemv    Computes the matrix-vector product of a sparse general matrix in the BSR format (3-array variation) with zero-based indexing.
mkl_cspblas_?coogemv    Computes the matrix-vector product of a sparse general matrix in the coordinate format with zero-based indexing.
mkl_cspblas_?csrsymv    Computes the matrix-vector product of a sparse symmetric matrix in the CSR format (3-array variation) with zero-based indexing.
mkl_cspblas_?bsrsymv    Computes the matrix-vector product of a sparse symmetric matrix in the BSR format (3-array variation) with zero-based indexing.
mkl_cspblas_?coosymv    Computes the matrix-vector product of a sparse symmetric matrix in the coordinate format with zero-based indexing.
mkl_cspblas_?csrtrsv    Triangular solver with simplified interface for a sparse matrix in the CSR format (3-array variation) with zero-based indexing.
mkl_cspblas_?bsrtrsv    Triangular solver with simplified interface for a sparse matrix in the BSR format (3-array variation) with zero-based indexing.
mkl_cspblas_?cootrsv    Triangular solver with simplified interface for a sparse matrix in the coordinate format with zero-based indexing.

Typical (conventional) interface, one-based and zero-based indexing

mkl_?csrmv              Computes the matrix-vector product of a sparse matrix in the CSR format.
mkl_?bsrmv              Computes the matrix-vector product of a sparse matrix in the BSR format.
mkl_?cscmv              Computes the matrix-vector product of a sparse matrix in the CSC format.
mkl_?coomv              Computes the matrix-vector product of a sparse matrix in the coordinate format.
mkl_?csrsv              Solves a system of linear equations for a sparse matrix in the CSR format.
mkl_?bsrsv              Solves a system of linear equations for a sparse matrix in the BSR format.
mkl_?cscsv              Solves a system of linear equations for a sparse matrix in the CSC format.
mkl_?coosv              Solves a system of linear equations for a sparse matrix in the coordinate format.
mkl_?csrmm              Computes the matrix-matrix product of a sparse matrix in the CSR format.
mkl_?bsrmm              Computes the matrix-matrix product of a sparse matrix in the BSR format.
mkl_?cscmm              Computes the matrix-matrix product of a sparse matrix in the CSC format.
mkl_?coomm              Computes the matrix-matrix product of a sparse matrix in the coordinate format.
mkl_?csrsm              Solves a system of linear matrix equations for a sparse matrix in the CSR format.
mkl_?bsrsm              Solves a system of linear matrix equations for a sparse matrix in the BSR format.
mkl_?cscsm              Solves a system of linear matrix equations for a sparse matrix in the CSC format.
mkl_?coosm              Solves a system of linear matrix equations for a sparse matrix in the coordinate format.

Typical (conventional) interface, one-based indexing

mkl_?diamv              Computes the matrix-vector product of a sparse matrix in the diagonal format.
mkl_?skymv              Computes the matrix-vector product of a sparse matrix in the skyline storage format.
mkl_?diasv              Solves a system of linear equations for a sparse matrix in the diagonal format.
mkl_?skysv              Solves a system of linear equations for a sparse matrix in the skyline format.
mkl_?diamm              Computes the matrix-matrix product of a sparse matrix in the diagonal format.
mkl_?skymm              Computes the matrix-matrix product of a sparse matrix in the skyline storage format.
mkl_?diasm              Solves a system of linear matrix equations for a sparse matrix in the diagonal format.
mkl_?skysm              Solves a system of linear matrix equations for a sparse matrix in the skyline storage format.

Auxiliary routines

Matrix converters

mkl_?dnscsr             Converts a sparse matrix in uncompressed representation to CSR format (3-array variation) and vice versa.
mkl_?csrcoo             Converts a sparse matrix in CSR format (3-array variation) to coordinate format and vice versa.
mkl_?csrbsr             Converts a sparse matrix in CSR format to BSR format (3-array variations) and vice versa.
mkl_?csrcsc             Converts a sparse matrix in CSR format to CSC format and vice versa (3-array variations).
mkl_?csrdia             Converts a sparse matrix in CSR format (3-array variation) to diagonal format and vice versa.
mkl_?csrsky             Converts a sparse matrix in CSR format (3-array variation) to skyline format and vice versa.

Operations on sparse matrices

mkl_?csradd             Computes the sum of two sparse matrices stored in the CSR format (3-array variation) with one-based indexing.
mkl_?csrmultcsr         Computes the product of two sparse matrices stored in the CSR format (3-array variation) with one-based indexing.
mkl_?csrmultd           Computes the product of two sparse matrices stored in the CSR format (3-array variation) with one-based indexing. The result is stored in a dense matrix.

mkl_?csrgemv
Computes matrix - vector product of a sparse general
matrix stored in the CSR format (3-array variation)
with one-based indexing (deprecated).

Syntax
void mkl_scsrgemv (const char *transa , const MKL_INT *m , const float *a , const
MKL_INT *ia , const MKL_INT *ja , const float *x , float *y );
void mkl_dcsrgemv (const char *transa , const MKL_INT *m , const double *a , const
MKL_INT *ia , const MKL_INT *ja , const double *x , double *y );
void mkl_ccsrgemv (const char *transa , const MKL_INT *m , const MKL_Complex8 *a ,
const MKL_INT *ia , const MKL_INT *ja , const MKL_Complex8 *x , MKL_Complex8 *y );
void mkl_zcsrgemv (const char *transa , const MKL_INT *m , const MKL_Complex16 *a ,
const MKL_INT *ia , const MKL_INT *ja , const MKL_Complex16 *x , MKL_Complex16 *y );

Include Files
• mkl.h

Description

This routine is deprecated. Use mkl_sparse_?_mv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?csrgemv routine performs a matrix-vector operation defined as

y := A*x
or

y := A^T*x,
where:
x and y are vectors,
A is an m-by-m sparse square matrix in the CSR format (3-array variation), and A^T is the transpose of A.

NOTE
This routine supports only one-based indexing of the input arrays.
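The sketch below shows how the one-based arrays described under Input Parameters are read. The offset ia[0] is subtracted from each row pointer, and 1 is subtracted from each column index, before they are used as C subscripts. It is a hypothetical double-precision illustration named ref_dcsrgemv. Unlike mkl_dcsrgemv, it takes transa and m by value and uses int instead of MKL_INT.

```c
/* Hypothetical reference routine (not the oneMKL code) for the mkl_?csrgemv
 * operation in double precision: y := A*x if transa is 'N' or 'n', and
 * y := A^T*x otherwise, where A is an m-by-m matrix in 3-array CSR form
 * with ONE-based ia and ja. */
void ref_dcsrgemv(char transa, int m, const double *a, const int *ia,
                  const int *ja, const double *x, double *y)
{
    int base = ia[0];                            /* 1 for one-based arrays */
    int notrans = (transa == 'N' || transa == 'n');
    for (int i = 0; i < m; ++i) {
        y[i] = 0.0;
    }
    for (int i = 0; i < m; ++i) {
        for (int k = ia[i] - base; k < ia[i + 1] - base; ++k) {
            int j = ja[k] - 1;                   /* zero-based column index */
            if (notrans) {
                y[i] += a[k] * x[j];             /* row i of A times x */
            } else {
                y[j] += a[k] * x[i];             /* column i of A^T times x */
            }
        }
    }
}
```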

Input Parameters

transa Specifies the operation.

If transa = 'N' or 'n', then y := A*x.

If transa = 'T' or 't' or 'C' or 'c', then y := A^T*x.

m Number of rows of the matrix A.

a Array containing non-zero elements of the matrix A. Its length is equal to
the number of non-zero elements in the matrix A. Refer to values array
description in Sparse Matrix Storage Formats for more details.

ia Array of length m + 1, containing indices of elements in the array a, such
that ia[i] - ia[0] is the index in the array a of the first non-zero
element from the row i. The value of the last element ia[m] - ia[0] is
equal to the number of non-zeros. Refer to rowIndex array description in
Sparse Matrix Storage Formats for more details.

ja Array containing the column indices plus one for each non-zero element of
the matrix A.
Its length is equal to the length of the array a. Refer to columns array
description in Sparse Matrix Storage Formats for more details.

x Array, size is m.

On entry, the array x must contain the vector x.

Output Parameters

y Array, size at least m.

On exit, the array y contains the vector y.

mkl_?bsrgemv
Computes matrix - vector product of a sparse general
matrix stored in the BSR format (3-array variation)
with one-based indexing (deprecated).

Syntax
void mkl_sbsrgemv (const char *transa , const MKL_INT *m , const MKL_INT *lb , const
float *a , const MKL_INT *ia , const MKL_INT *ja , const float *x , float *y );
void mkl_dbsrgemv (const char *transa , const MKL_INT *m , const MKL_INT *lb , const
double *a , const MKL_INT *ia , const MKL_INT *ja , const double *x , double *y );
void mkl_cbsrgemv (const char *transa , const MKL_INT *m , const MKL_INT *lb , const
MKL_Complex8 *a , const MKL_INT *ia , const MKL_INT *ja , const MKL_Complex8 *x ,
MKL_Complex8 *y );
void mkl_zbsrgemv (const char *transa , const MKL_INT *m , const MKL_INT *lb , const
MKL_Complex16 *a , const MKL_INT *ia , const MKL_INT *ja , const MKL_Complex16 *x ,
MKL_Complex16 *y );

Include Files
• mkl.h


Description

This routine is deprecated. Use mkl_sparse_?_mv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?bsrgemv routine performs a matrix-vector operation defined as

y := A*x
or

y := A^T*x,
where:
x and y are vectors,
A is an m-by-m block sparse square matrix in the BSR format (3-array variation), and A^T is the transpose of A.

NOTE
This routine supports only one-based indexing of the input arrays.

Input Parameters

transa Specifies the operation.

If transa = 'N' or 'n', then the matrix-vector product is computed as
y := A*x
If transa = 'T' or 't' or 'C' or 'c', then the matrix-vector product is
computed as y := A^T*x.

m Number of block rows of the matrix A.

lb Size of the block in the matrix A.

a Array containing elements of non-zero blocks of the matrix A. Its length is
equal to the number of non-zero blocks in the matrix A multiplied by lb*lb.
Refer to values array description in BSR Format for more details.

ia Array of length (m + 1), containing indices of blocks in the array a, such
that ia[i] - ia[0] is the index of the first non-zero block of block row i,
counted in blocks rather than in elements of a. The value of the last element
ia[m] - ia[0] is equal to the number of non-zero blocks. Refer to rowIndex
array description in BSR Format for more details.

ja Array containing the column indices plus one for each non-zero block in the
matrix A.
Its length is equal to the number of non-zero blocks of the matrix A. Refer
to columns array description in BSR Format for more details.

x Array, size (m*lb).

On entry, the array x must contain the vector x.

Output Parameters

y Array, size at least (m*lb).

On exit, the array y contains the vector y.
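
The following sketch illustrates the BSR layout by computing y := A*x with plain loops. It assumes that, as the BSR Format description states for one-based indexing, the lb*lb elements inside each block are stored in column-major order. Check that section before relying on this layout. The helper name ref_bsrgemv_1based is hypothetical. The sketch covers only the non-transposed, real double-precision case and uses int in place of MKL_INT.

```c
#include <stddef.h>

/* Hypothetical reference helper, not a oneMKL function.
 * Computes y := A*x for a real BSR matrix with m block rows, lb-by-lb blocks,
 * and one-based ia/ja. ASSUMPTION: elements inside each block are stored in
 * column-major order (element (r, c) of a block at offset c*lb + r). */
void ref_bsrgemv_1based(int m, int lb, const double *a, const int *ia,
                        const int *ja, const double *x, double *y)
{
    for (int i = 0; i < m * lb; i++)
        y[i] = 0.0;
    for (int bi = 0; bi < m; bi++) {                       /* block row */
        for (int k = ia[bi] - ia[0]; k < ia[bi + 1] - ia[0]; k++) {
            int bj = ja[k] - 1;                            /* zero-based block column */
            const double *blk = a + (size_t)k * lb * lb;   /* k-th stored block */
            for (int r = 0; r < lb; r++)
                for (int c = 0; c < lb; c++)
                    y[bi * lb + r] += blk[(size_t)c * lb + r] * x[bj * lb + c];
        }
    }
}
```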

mkl_?coogemv
Computes matrix-vector product of a sparse general
matrix stored in the coordinate format with one-based
indexing (deprecated).

Syntax
void mkl_scoogemv(const char *transa, const MKL_INT *m, const float *val,
const MKL_INT *rowind, const MKL_INT *colind, const MKL_INT *nnz, const float *x,
float *y);
void mkl_dcoogemv(const char *transa, const MKL_INT *m, const double *val,
const MKL_INT *rowind, const MKL_INT *colind, const MKL_INT *nnz, const double *x,
double *y);
void mkl_ccoogemv(const char *transa, const MKL_INT *m, const MKL_Complex8 *val,
const MKL_INT *rowind, const MKL_INT *colind, const MKL_INT *nnz, const MKL_Complex8 *x,
MKL_Complex8 *y);
void mkl_zcoogemv(const char *transa, const MKL_INT *m, const MKL_Complex16 *val,
const MKL_INT *rowind, const MKL_INT *colind, const MKL_INT *nnz, const MKL_Complex16 *x,
MKL_Complex16 *y);

Include Files
• mkl.h

Description

This routine is deprecated. Use mkl_sparse_?_mv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?coogemv routine performs a matrix-vector operation defined as

y := A*x
or

y := A^T*x,
where:
x and y are vectors,
A is an m-by-m sparse square matrix in the coordinate format, and A^T is the transpose of A.

NOTE
This routine supports only one-based indexing of the input arrays.

Input Parameters

transa Specifies the operation.

If transa = 'N' or 'n', then the matrix-vector product is computed as
y := A*x
If transa = 'T' or 't' or 'C' or 'c', then the matrix-vector product is
computed as y := A^T*x.

m Number of rows of the matrix A.

val Array of length nnz, contains non-zero elements of the matrix A in
arbitrary order.
Refer to values array description in Coordinate Format for more details.

rowind Array of length nnz, contains the row indices plus one for each non-zero
element of the matrix A.
Refer to rows array description in Coordinate Format for more details.

colind Array of length nnz, contains the column indices plus one for each non-zero
element of the matrix A. Refer to columns array description in Coordinate
Format for more details.

nnz Specifies the number of non-zero elements of the matrix A.

Refer to nnz description in Coordinate Format for more details.

x Array, size is m.

On entry, the array x must contain the vector x.

Output Parameters

y Array, size at least m.

On exit, the array y contains the vector y.
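
COO entries may appear in any order, so each stored entry simply contributes one product term to y. The hypothetical helper below sketches that accumulation for both values of transa. It is not part of oneMKL, handles real double precision only, and uses int in place of MKL_INT.

```c
/* Hypothetical reference helper, not a oneMKL function.
 * Computes y := A*x (trans == 0) or y := A^T*x (trans != 0) for an m-by-m
 * real matrix in COO format with one-based rowind/colind; entry order is free. */
void ref_coogemv_1based(int trans, int m, const double *val, const int *rowind,
                        const int *colind, int nnz, const double *x, double *y)
{
    for (int i = 0; i < m; i++)
        y[i] = 0.0;
    for (int k = 0; k < nnz; k++) {
        int r = rowind[k] - 1;          /* stored indices are index plus one */
        int c = colind[k] - 1;
        if (!trans)
            y[r] += val[k] * x[c];      /* A(r, c) contributes to row r */
        else
            y[c] += val[k] * x[r];      /* A^T(c, r) contributes to row c */
    }
}
```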

mkl_?diagemv
Computes matrix-vector product of a sparse general
matrix stored in the diagonal format with one-based
indexing (deprecated).

Syntax
void mkl_sdiagemv(const char *transa, const MKL_INT *m, const float *val,
const MKL_INT *lval, const MKL_INT *idiag, const MKL_INT *ndiag, const float *x,
float *y);
void mkl_ddiagemv(const char *transa, const MKL_INT *m, const double *val,
const MKL_INT *lval, const MKL_INT *idiag, const MKL_INT *ndiag, const double *x,
double *y);
void mkl_cdiagemv(const char *transa, const MKL_INT *m, const MKL_Complex8 *val,
const MKL_INT *lval, const MKL_INT *idiag, const MKL_INT *ndiag, const MKL_Complex8 *x,
MKL_Complex8 *y);
void mkl_zdiagemv(const char *transa, const MKL_INT *m, const MKL_Complex16 *val,
const MKL_INT *lval, const MKL_INT *idiag, const MKL_INT *ndiag, const MKL_Complex16 *x,
MKL_Complex16 *y);

Include Files
• mkl.h

Description

This routine is deprecated, but no replacement is available yet in the Inspector-executor Sparse BLAS
interface. You can continue using this routine until a replacement is provided and the routine is removed.
The mkl_?diagemv routine performs a matrix-vector operation defined as

y := A*x
or

y := A^T*x,
where:
x and y are vectors,
A is an m-by-m sparse square matrix in the diagonal storage format, and A^T is the transpose of A.

NOTE
This routine supports only one-based indexing of the input arrays.

Input Parameters

transa Specifies the operation.

If transa = 'N' or 'n', then y := A*x

If transa = 'T' or 't' or 'C' or 'c', then y := A^T*x.

m Number of rows of the matrix A.

val Two-dimensional array of size lval*ndiag, contains non-zero diagonals of
the matrix A. Refer to values array description in Diagonal Storage Scheme
for more details.

lval Leading dimension of val, lval ≥ m. Refer to lval description in Diagonal
Storage Scheme for more details.

idiag Array of length ndiag, contains the distances between the main diagonal and
each non-zero diagonal in the matrix A.
Refer to distance array description in Diagonal Storage Scheme for more
details.

ndiag Specifies the number of non-zero diagonals of the matrix A.

x Array, size is m.

On entry, the array x must contain the vector x.

Output Parameters

y Array, size at least m.

On exit, the array y contains the vector y.
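
A minimal sketch of the diagonal-format product follows. It assumes that val is laid out column by column with leading dimension lval, so that val[j*lval + i] holds A(i, i + idiag[j]) for zero-based row i, and that slots whose column falls outside the matrix are padding. Check this layout against Diagonal Storage Scheme. The helper ref_diagemv is hypothetical and covers only the non-transposed, real double-precision case.

```c
#include <stddef.h>

/* Hypothetical reference helper, not a oneMKL function.
 * Computes y := A*x for an m-by-m real matrix in diagonal storage.
 * ASSUMPTION: val is column-major with leading dimension lval, and
 * val[j*lval + i] holds A(i, i + idiag[j]) for zero-based row i. */
void ref_diagemv(int m, const double *val, int lval, const int *idiag,
                 int ndiag, const double *x, double *y)
{
    for (int i = 0; i < m; i++)
        y[i] = 0.0;
    for (int j = 0; j < ndiag; j++) {
        int d = idiag[j];                   /* distance from the main diagonal */
        for (int i = 0; i < m; i++) {
            int c = i + d;                  /* column of A(i, i + d) */
            if (c >= 0 && c < m)            /* other slots are padding */
                y[i] += val[(size_t)j * lval + i] * x[c];
        }
    }
}
```

For the tridiagonal matrix [[2,-1,0],[-1,2,-1],[0,-1,2]], idiag = {-1,0,1} and val = {0,-1,-1, 2,2,2, -1,-1,0}, where the zeros at the ends are padding.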

mkl_?csrsymv
Computes matrix-vector product of a sparse
symmetric matrix stored in the CSR format (3-array
variation) with one-based indexing (deprecated).

Syntax
void mkl_scsrsymv(const char *uplo, const MKL_INT *m, const float *a,
const MKL_INT *ia, const MKL_INT *ja, const float *x, float *y);
void mkl_dcsrsymv(const char *uplo, const MKL_INT *m, const double *a,
const MKL_INT *ia, const MKL_INT *ja, const double *x, double *y);
void mkl_ccsrsymv(const char *uplo, const MKL_INT *m, const MKL_Complex8 *a,
const MKL_INT *ia, const MKL_INT *ja, const MKL_Complex8 *x, MKL_Complex8 *y);
void mkl_zcsrsymv(const char *uplo, const MKL_INT *m, const MKL_Complex16 *a,
const MKL_INT *ia, const MKL_INT *ja, const MKL_Complex16 *x, MKL_Complex16 *y);

Include Files
• mkl.h

Description

This routine is deprecated. Use mkl_sparse_?_mv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?csrsymv routine performs a matrix-vector operation defined as

y := A*x
where:
x and y are vectors,
A is a symmetric sparse matrix in the CSR format (3-array variation); only its upper or lower triangle, as selected by uplo, is used.

NOTE
This routine supports only one-based indexing of the input arrays.

Input Parameters

uplo Specifies whether the upper or lower triangle of the matrix A is used.
If uplo = 'U' or 'u', then the upper triangle of the matrix A is used.

If uplo = 'L' or 'l', then the lower triangle of the matrix A is used.

m Number of rows of the matrix A.

a Array containing non-zero elements of the matrix A. Its length is equal to
the number of non-zero elements in the matrix A. Refer to values array
description in Sparse Matrix Storage Formats for more details.

ia Array of length m + 1, containing indices of elements in the array a, such
that ia[i] - ia[0] is the index in the array a of the first non-zero
element from the row i. The value of the last element ia[m] - ia[0] is
equal to the number of non-zeros. Refer to rowIndex array description in
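Sparse Matrix Storage Formats for more details.

ja Array containing the column indices plus one for each non-zero element of
the matrix A.
Its length is equal to the length of the array a. Refer to columns array
description in Sparse Matrix Storage Formats for more details.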

x Array, size is m.

On entry, the array x must contain the vector x.

Output Parameters

y Array, size at least m.

On exit, the array y contains the vector y.
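
The sketch below shows how a product with a symmetric matrix can be formed from one stored triangle. Each off-diagonal entry A(i,j) is used twice: once for row i and once, mirrored, for row j. The helper ref_csrsymv_1based is hypothetical, handles real double precision only, and uses int in place of MKL_INT. It skips stored entries outside the triangle selected by uplo. That behavior is an assumption about the intended semantics, not documented oneMKL behavior.

```c
/* Hypothetical reference helper, not a oneMKL function.
 * Computes y := A*x for a real symmetric m-by-m matrix in 3-array CSR with
 * one-based ia/ja, reading only the triangle selected by uplo.
 * ASSUMPTION: stored entries outside that triangle are ignored. */
void ref_csrsymv_1based(char uplo, int m, const double *a, const int *ia,
                        const int *ja, const double *x, double *y)
{
    int upper = (uplo == 'U' || uplo == 'u');
    for (int i = 0; i < m; i++)
        y[i] = 0.0;
    for (int i = 0; i < m; i++) {
        for (int k = ia[i] - ia[0]; k < ia[i + 1] - ia[0]; k++) {
            int j = ja[k] - 1;
            if (upper ? (j < i) : (j > i))
                continue;              /* entry outside the selected triangle */
            y[i] += a[k] * x[j];       /* stored entry A(i, j) */
            if (j != i)
                y[j] += a[k] * x[i];   /* mirrored entry A(j, i) = A(i, j) */
        }
    }
}
```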

mkl_?bsrsymv
Computes matrix-vector product of a sparse
symmetric matrix stored in the BSR format (3-array
variation) with one-based indexing (deprecated).

Syntax
void mkl_sbsrsymv(const char *uplo, const MKL_INT *m, const MKL_INT *lb,
const float *a, const MKL_INT *ia, const MKL_INT *ja, const float *x, float *y);
void mkl_dbsrsymv(const char *uplo, const MKL_INT *m, const MKL_INT *lb,
const double *a, const MKL_INT *ia, const MKL_INT *ja, const double *x, double *y);
void mkl_cbsrsymv(const char *uplo, const MKL_INT *m, const MKL_INT *lb,
const MKL_Complex8 *a, const MKL_INT *ia, const MKL_INT *ja, const MKL_Complex8 *x,
MKL_Complex8 *y);
void mkl_zbsrsymv(const char *uplo, const MKL_INT *m, const MKL_INT *lb,
const MKL_Complex16 *a, const MKL_INT *ia, const MKL_INT *ja, const MKL_Complex16 *x,
MKL_Complex16 *y);

Include Files
• mkl.h

Description

This routine is deprecated. Use mkl_sparse_?_mv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?bsrsymv routine performs a matrix-vector operation defined as

y := A*x
where:
x and y are vectors,
A is a symmetric sparse matrix in the BSR format (3-array variation); only its upper or lower triangle, as selected by uplo, is used.

NOTE
This routine supports only one-based indexing of the input arrays.

Input Parameters

uplo Specifies whether the upper or lower triangle of the matrix A is used.
If uplo = 'U' or 'u', then the upper triangle of the matrix A is used.

If uplo = 'L' or 'l', then the lower triangle of the matrix A is used.

m Number of block rows of the matrix A.

lb Size of the block in the matrix A.

a Array containing elements of non-zero blocks of the matrix A. Its length is
equal to the number of non-zero blocks in the matrix A multiplied by lb*lb.
Refer to values array description in BSR Format for more details.

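ia Array of length (m + 1), containing indices of blocks in the array a, such
that ia[i] - ia[0] is the index of the first non-zero block of block row i,
counted in blocks rather than in elements of a. The value of the last element
ia[m] - ia[0] is equal to the number of non-zero blocks. Refer to rowIndex
array description in BSR Format for more details.

ja Array containing the column indices plus one for each non-zero block in the
matrix A.
Its length is equal to the number of non-zero blocks of the matrix A. Refer
to columns array description in BSR Format for more details.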

x Array, size (m*lb).

On entry, the array x must contain the vector x.

Output Parameters

y Array, size at least (m*lb).

On exit, the array y contains the vector y.

mkl_?coosymv
Computes matrix-vector product of a sparse
symmetric matrix stored in the coordinate format
with one-based indexing (deprecated).

Syntax
void mkl_scoosymv(const char *uplo, const MKL_INT *m, const float *val,
const MKL_INT *rowind, const MKL_INT *colind, const MKL_INT *nnz, const float *x,
float *y);
void mkl_dcoosymv(const char *uplo, const MKL_INT *m, const double *val,
const MKL_INT *rowind, const MKL_INT *colind, const MKL_INT *nnz, const double *x,
double *y);
void mkl_ccoosymv(const char *uplo, const MKL_INT *m, const MKL_Complex8 *val,
const MKL_INT *rowind, const MKL_INT *colind, const MKL_INT *nnz, const MKL_Complex8 *x,
MKL_Complex8 *y);
void mkl_zcoosymv(const char *uplo, const MKL_INT *m, const MKL_Complex16 *val,
const MKL_INT *rowind, const MKL_INT *colind, const MKL_INT *nnz, const MKL_Complex16 *x,
MKL_Complex16 *y);

Include Files
• mkl.h

Description

This routine is deprecated. Use mkl_sparse_?_mv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?coosymv routine performs a matrix-vector operation defined as

y := A*x
where:
x and y are vectors,
A is a symmetric sparse matrix in the coordinate format; only its upper or lower triangle, as selected by uplo, is used.

NOTE
This routine supports only one-based indexing of the input arrays.

Input Parameters

uplo Specifies whether the upper or lower triangle of the matrix A is used.
If uplo = 'U' or 'u', then the upper triangle of the matrix A is used.

If uplo = 'L' or 'l', then the lower triangle of the matrix A is used.

m Number of rows of the matrix A.

val Array of length nnz, contains non-zero elements of the matrix A in
arbitrary order.
Refer to values array description in Coordinate Format for more details.

rowind Array of length nnz, contains the row indices plus one for each non-zero
element of the matrix A.
Refer to rows array description in Coordinate Format for more details.

colind Array of length nnz, contains the column indices plus one for each non-zero
element of the matrix A. Refer to columns array description in Coordinate
Format for more details.

nnz Specifies the number of non-zero elements of the matrix A.

Refer to nnz description in Coordinate Format for more details.

x Array, size is m.

On entry, the array x must contain the vector x.

Output Parameters

y Array, size at least m.

On exit, the array y contains the vector y.
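
The same mirroring idea applies to the coordinate format. The hypothetical helper below adds each stored off-diagonal entry of the selected triangle to y twice. It is not oneMKL code, handles real double precision only, and uses int in place of MKL_INT. It also assumes that entries outside the selected triangle are ignored.

```c
/* Hypothetical reference helper, not a oneMKL function.
 * Computes y := A*x for a real symmetric m-by-m matrix in COO format with
 * one-based rowind/colind, reading only the triangle selected by uplo.
 * ASSUMPTION: stored entries outside that triangle are ignored. */
void ref_coosymv_1based(char uplo, int m, const double *val, const int *rowind,
                        const int *colind, int nnz, const double *x, double *y)
{
    int upper = (uplo == 'U' || uplo == 'u');
    for (int i = 0; i < m; i++)
        y[i] = 0.0;
    for (int k = 0; k < nnz; k++) {
        int i = rowind[k] - 1;
        int j = colind[k] - 1;
        if (upper ? (j < i) : (j > i))
            continue;                  /* entry outside the selected triangle */
        y[i] += val[k] * x[j];         /* stored entry A(i, j) */
        if (i != j)
            y[j] += val[k] * x[i];     /* mirrored entry A(j, i) */
    }
}
```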

mkl_?diasymv
Computes matrix-vector product of a sparse
symmetric matrix stored in the diagonal format with
one-based indexing (deprecated).

Syntax
void mkl_sdiasymv(const char *uplo, const MKL_INT *m, const float *val,
const MKL_INT *lval, const MKL_INT *idiag, const MKL_INT *ndiag, const float *x,
float *y);
void mkl_ddiasymv(const char *uplo, const MKL_INT *m, const double *val,
const MKL_INT *lval, const MKL_INT *idiag, const MKL_INT *ndiag, const double *x,
double *y);
void mkl_cdiasymv(const char *uplo, const MKL_INT *m, const MKL_Complex8 *val,
const MKL_INT *lval, const MKL_INT *idiag, const MKL_INT *ndiag, const MKL_Complex8 *x,
MKL_Complex8 *y);
void mkl_zdiasymv(const char *uplo, const MKL_INT *m, const MKL_Complex16 *val,
const MKL_INT *lval, const MKL_INT *idiag, const MKL_INT *ndiag, const MKL_Complex16 *x,
MKL_Complex16 *y);

Include Files
• mkl.h

Description

This routine is deprecated, but no replacement is available yet in the Inspector-executor Sparse BLAS
interface. You can continue using this routine until a replacement is provided and the routine is removed.
The mkl_?diasymv routine performs a matrix-vector operation defined as

y := A*x
where:
x and y are vectors,
A is a symmetric sparse matrix in the diagonal storage format; only its upper or lower triangle, as selected by uplo, is used.

NOTE
This routine supports only one-based indexing of the input arrays.

Input Parameters

uplo Specifies whether the upper or lower triangle of the matrix A is used.
If uplo = 'U' or 'u', then the upper triangle of the matrix A is used.

If uplo = 'L' or 'l', then the lower triangle of the matrix A is used.

m Number of rows of the matrix A.

val Two-dimensional array of size lval by ndiag, contains non-zero diagonals
of the matrix A. Refer to values array description in Diagonal Storage
Scheme for more details.

lval Leading dimension of val, lval ≥ m. Refer to lval description in Diagonal
Storage Scheme for more details.

idiag Array of length ndiag, contains the distances between the main diagonal and
each non-zero diagonal in the matrix A.
Refer to distance array description in Diagonal Storage Scheme for more
details.

ndiag Specifies the number of non-zero diagonals of the matrix A.

x Array, size is m.

On entry, the array x must contain the vector x.

Output Parameters

y Array, size at least m.

On exit, the array y contains the vector y.
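
In the diagonal format, a stored upper diagonal at distance d > 0 also represents the mirrored lower diagonal at distance -d. The hypothetical sketch below relies on this. It uses the same assumed column-major val layout as the general case, in which val[j*lval + i] holds A(i, i + idiag[j]). It handles real double precision only and skips diagonals outside the triangle selected by uplo.

```c
#include <stddef.h>

/* Hypothetical reference helper, not a oneMKL function.
 * Computes y := A*x for a real symmetric m-by-m matrix in diagonal storage,
 * using only the diagonals in the triangle selected by uplo.
 * ASSUMPTION: val is column-major with leading dimension lval and
 * val[j*lval + i] holds A(i, i + idiag[j]). */
void ref_diasymv(char uplo, int m, const double *val, int lval,
                 const int *idiag, int ndiag, const double *x, double *y)
{
    int upper = (uplo == 'U' || uplo == 'u');
    for (int i = 0; i < m; i++)
        y[i] = 0.0;
    for (int j = 0; j < ndiag; j++) {
        int d = idiag[j];
        if (upper ? (d < 0) : (d > 0))
            continue;                        /* diagonal outside the chosen triangle */
        for (int i = 0; i < m; i++) {
            int c = i + d;
            if (c < 0 || c >= m)
                continue;                    /* padding slot */
            double v = val[(size_t)j * lval + i];
            y[i] += v * x[c];                /* stored entry A(i, c) */
            if (d != 0)
                y[c] += v * x[i];            /* mirrored entry A(c, i) */
        }
    }
}
```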

mkl_?csrtrsv
Triangular solver with simplified interface for a sparse
matrix in the CSR format (3-array variation) with
one-based indexing (deprecated).

Syntax
void mkl_scsrtrsv(const char *uplo, const char *transa, const char *diag,
const MKL_INT *m, const float *a, const MKL_INT *ia, const MKL_INT *ja, const float *x,
float *y);
void mkl_dcsrtrsv(const char *uplo, const char *transa, const char *diag,
const MKL_INT *m, const double *a, const MKL_INT *ia, const MKL_INT *ja, const double *x,
double *y);
void mkl_ccsrtrsv(const char *uplo, const char *transa, const char *diag,
const MKL_INT *m, const MKL_Complex8 *a, const MKL_INT *ia, const MKL_INT *ja,
const MKL_Complex8 *x, MKL_Complex8 *y);
void mkl_zcsrtrsv(const char *uplo, const char *transa, const char *diag,
const MKL_INT *m, const MKL_Complex16 *a, const MKL_INT *ia, const MKL_INT *ja,
const MKL_Complex16 *x, MKL_Complex16 *y);

Include Files
• mkl.h

Description

This routine is deprecated. Use mkl_sparse_?_trsv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?csrtrsv routine solves a system of linear equations with a sparse triangular
matrix stored in the CSR format (3-array variation):

A*y = x
or

A^T*y = x,
where:
x and y are vectors,
A is a sparse upper or lower triangular matrix with a unit or non-unit main diagonal, and A^T is the transpose of A.

NOTE
This routine supports only one-based indexing of the input arrays.

Input Parameters

uplo Specifies whether the upper or lower triangle of the matrix A is used.
If uplo = 'U' or 'u', then the upper triangle of the matrix A is used.

If uplo = 'L' or 'l', then the lower triangle of the matrix A is used.

transa Specifies the system of linear equations.

If transa = 'N' or 'n', then A*y = x

If transa = 'T' or 't' or 'C' or 'c', then A^T*y = x.

diag Specifies whether A is unit triangular.

If diag = 'U' or 'u', then A is unit triangular.

If diag = 'N' or 'n', then A is not unit triangular.

m Number of rows of the matrix A.

a Array containing non-zero elements of the matrix A. Its length is equal to
the number of non-zero elements in the matrix A. Refer to values array
description in Sparse Matrix Storage Formats for more details.

NOTE
The non-zero elements of the given row of the matrix must be
stored in the same order as they appear in the row (from left to
right).
No diagonal element can be omitted from the sparse storage if the solver
is called with the non-unit indicator (diag = 'N').

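ia Array of length m + 1, containing indices of elements in the array a, such
that ia[i] - ia[0] is the index in the array a of the first non-zero
element from the row i. The value of the last element ia[m] - ia[0] is
equal to the number of non-zeros. Refer to rowIndex array description in
Sparse Matrix Storage Formats for more details.

ja Array containing the column indices plus one for each non-zero element of
the matrix A.
Its length is equal to the length of the array a. Refer to columns array
description in Sparse Matrix Storage Formats for more details.

NOTE
Column indices must be sorted in increasing order for each row.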

x Array, size is m.

On entry, the array x must contain the vector x.

Output Parameters

y Array, size at least m.

On exit, the array y contains the solution vector y.
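
Triangular solves of this kind are usually done by forward substitution for a lower triangle or back substitution for an upper triangle. The sketch below solves A*y = x in that way for one-based CSR input. It is not the oneMKL implementation. The helper ref_csrtrsv_1based is hypothetical and handles only transa = 'N' for real double precision. As an assumption, it ignores stored diagonal values when diag = 'U' and ignores entries outside the selected triangle.

```c
/* Hypothetical reference helper, not a oneMKL function.
 * Solves A*y = x (transa = 'N' only) for a real m-by-m triangular matrix in
 * 3-array CSR with one-based ia/ja. ASSUMPTIONS: with diag = 'U' any stored
 * diagonal value is ignored; entries outside the uplo triangle are ignored. */
void ref_csrtrsv_1based(char uplo, char diag, int m, const double *a,
                        const int *ia, const int *ja, const double *x, double *y)
{
    int upper = (uplo == 'U' || uplo == 'u');
    int unit = (diag == 'U' || diag == 'u');
    int base = ia[0];
    for (int step = 0; step < m; step++) {
        /* lower: forward substitution (rows 0..m-1);
           upper: back substitution (rows m-1..0) */
        int i = upper ? m - 1 - step : step;
        double s = x[i];
        double dval = 1.0;              /* kept at 1.0 when diag = 'U' */
        for (int k = ia[i] - base; k < ia[i + 1] - base; k++) {
            int j = ja[k] - 1;
            if (j == i) {
                if (!unit)
                    dval = a[k];
            } else if (upper ? (j > i) : (j < i)) {
                s -= a[k] * y[j];       /* y[j] was solved in an earlier step */
            }
        }
        y[i] = s / dval;
    }
}
```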

mkl_?bsrtrsv
Triangular solver with simplified interface for a sparse
matrix stored in the BSR format (3-array variation)
with one-based indexing (deprecated).

Syntax
void mkl_sbsrtrsv(const char *uplo, const char *transa, const char *diag,
const MKL_INT *m, const MKL_INT *lb, const float *a, const MKL_INT *ia,
const MKL_INT *ja, const float *x, float *y);
void mkl_dbsrtrsv(const char *uplo, const char *transa, const char *diag,
const MKL_INT *m, const MKL_INT *lb, const double *a, const MKL_INT *ia,
const MKL_INT *ja, const double *x, double *y);
void mkl_cbsrtrsv(const char *uplo, const char *transa, const char *diag,
const MKL_INT *m, const MKL_INT *lb, const MKL_Complex8 *a, const MKL_INT *ia,
const MKL_INT *ja, const MKL_Complex8 *x, MKL_Complex8 *y);
void mkl_zbsrtrsv(const char *uplo, const char *transa, const char *diag,
const MKL_INT *m, const MKL_INT *lb, const MKL_Complex16 *a, const MKL_INT *ia,
const MKL_INT *ja, const MKL_Complex16 *x, MKL_Complex16 *y);

Include Files
• mkl.h

Description

This routine is deprecated. Use mkl_sparse_?_trsv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?bsrtrsv routine solves a system of linear equations with a sparse triangular
matrix stored in the BSR format (3-array variation):

A*y = x
or

A^T*y = x,
where:
x and y are vectors,
A is a sparse upper or lower triangular matrix with a unit or non-unit main diagonal, and A^T is the transpose of A.

NOTE
This routine supports only one-based indexing of the input arrays.

Input Parameters

uplo Specifies whether the upper or lower triangle of the matrix A is used.

If uplo = 'U' or 'u', then the upper triangle of the matrix A is used.

If uplo = 'L' or 'l', then the lower triangle of the matrix A is used.

transa Specifies the system of linear equations.

If transa = 'N' or 'n', then A*y = x

If transa = 'T' or 't' or 'C' or 'c', then A^T*y = x.

diag Specifies whether A is a unit triangular matrix.

If diag = 'U' or 'u', then A is unit triangular.

If diag = 'N' or 'n', then A is not unit triangular.

m Number of block rows of the matrix A.

lb Size of the block in the matrix A.

a Array containing elements of non-zero blocks of the matrix A. Its length is
equal to the number of non-zero blocks in the matrix A multiplied by lb*lb.
Refer to values array description in BSR Format for more details.

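ia Array of length (m + 1), containing indices of blocks in the array a, such
that ia[i] - ia[0] is the index of the first non-zero block of block row i,
counted in blocks rather than in elements of a. The value of the last element
ia[m] - ia[0] is equal to the number of non-zero blocks. Refer to rowIndex
array description in BSR Format for more details.

ja Array containing the column indices plus one for each non-zero block in the
matrix A.
Its length is equal to the number of non-zero blocks of the matrix A. Refer
to columns array description in BSR Format for more details.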

x Array, size (m*lb).

On entry, the array x must contain the vector x.

Output Parameters

y Array, size at least (m*lb).

On exit, the array y contains the solution vector y.

mkl_?cootrsv
Triangular solver with simplified interface for a sparse
matrix in the coordinate format with one-based
indexing (deprecated).

Syntax
void mkl_scootrsv(const char *uplo, const char *transa, const char *diag,
const MKL_INT *m, const float *val, const MKL_INT *rowind, const MKL_INT *colind,
const MKL_INT *nnz, const float *x, float *y);
void mkl_dcootrsv(const char *uplo, const char *transa, const char *diag,
const MKL_INT *m, const double *val, const MKL_INT *rowind, const MKL_INT *colind,
const MKL_INT *nnz, const double *x, double *y);
void mkl_ccootrsv(const char *uplo, const char *transa, const char *diag,
const MKL_INT *m, const MKL_Complex8 *val, const MKL_INT *rowind, const MKL_INT *colind,
const MKL_INT *nnz, const MKL_Complex8 *x, MKL_Complex8 *y);
void mkl_zcootrsv(const char *uplo, const char *transa, const char *diag,
const MKL_INT *m, const MKL_Complex16 *val, const MKL_INT *rowind, const MKL_INT *colind,
const MKL_INT *nnz, const MKL_Complex16 *x, MKL_Complex16 *y);

Include Files
• mkl.h

Description

This routine is deprecated. Use mkl_sparse_?_trsv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?cootrsv routine solves a system of linear equations with a sparse triangular
matrix stored in the coordinate format:

A*y = x
or

A^T*y = x,
where:
x and y are vectors,
A is a sparse upper or lower triangular matrix with a unit or non-unit main diagonal, and A^T is the transpose of A.

NOTE
This routine supports only one-based indexing of the input arrays.

Input Parameters

uplo Specifies whether the upper or lower triangle of the matrix A is used.
If uplo = 'U' or 'u', then the upper triangle of the matrix A is used.

If uplo = 'L' or 'l', then the lower triangle of the matrix A is used.

transa Specifies the system of linear equations.

If transa = 'N' or 'n', then A*y = x

If transa = 'T' or 't' or 'C' or 'c', then A^T*y = x.

diag Specifies whether A is unit triangular.

If diag = 'U' or 'u', then A is unit triangular.

If diag = 'N' or 'n', then A is not unit triangular.

m Number of rows of the matrix A.

val Array of length nnz, contains non-zero elements of the matrix A in
arbitrary order.
Refer to values array description in Coordinate Format for more details.

rowind Array of length nnz, contains the row indices plus one for each non-zero
element of the matrix A.
Refer to rows array description in Coordinate Format for more details.

colind Array of length nnz, contains the column indices plus one for each non-zero
element of the matrix A. Refer to columns array description in Coordinate
Format for more details.

nnz Specifies the number of non-zero elements of the matrix A.

Refer to nnz description in Coordinate Format for more details.

x Array, size is m.

On entry, the array x must contain the vector x.

Output Parameters

y Array, size at least m.

On exit, the array y contains the solution vector y.
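
The coordinate format imposes no order on the entries, so a straightforward solver must locate each row's entries before it can substitute. The deliberately naive sketch below rescans all nnz entries for every row, which costs O(m*nnz) work, to solve a lower-triangular system A*y = x by forward substitution. The helper ref_cootrsv_lower_1based is hypothetical. It handles only real double precision with uplo = 'L' and transa = 'N'.

```c
/* Hypothetical reference helper, not a oneMKL function.
 * Solves A*y = x for a real lower-triangular m-by-m matrix in one-based COO
 * format with entries in arbitrary order (uplo = 'L', transa = 'N' only).
 * Naive O(m*nnz): all entries are rescanned for every row. */
void ref_cootrsv_lower_1based(char diag, int m, const double *val,
                              const int *rowind, const int *colind, int nnz,
                              const double *x, double *y)
{
    int unit = (diag == 'U' || diag == 'u');
    for (int i = 0; i < m; i++) {               /* forward substitution */
        double s = x[i];
        double dval = 1.0;                      /* kept at 1.0 when diag = 'U' */
        for (int k = 0; k < nnz; k++) {
            if (rowind[k] - 1 != i)
                continue;                       /* entry belongs to another row */
            int j = colind[k] - 1;
            if (j == i) {
                if (!unit)
                    dval = val[k];
            } else if (j < i) {
                s -= val[k] * y[j];             /* y[j] already solved */
            }
            /* entries with j > i are outside the lower triangle and ignored */
        }
        y[i] = s / dval;
    }
}
```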

mkl_?diatrsv
Triangular solver with simplified interface for a sparse
matrix in the diagonal format with one-based indexing
(deprecated).

Syntax
void mkl_sdiatrsv(const char *uplo, const char *transa, const char *diag,
const MKL_INT *m, const float *val, const MKL_INT *lval, const MKL_INT *idiag,
const MKL_INT *ndiag, const float *x, float *y);
void mkl_ddiatrsv(const char *uplo, const char *transa, const char *diag,
const MKL_INT *m, const double *val, const MKL_INT *lval, const MKL_INT *idiag,
const MKL_INT *ndiag, const double *x, double *y);
void mkl_cdiatrsv(const char *uplo, const char *transa, const char *diag,
const MKL_INT *m, const MKL_Complex8 *val, const MKL_INT *lval, const MKL_INT *idiag,
const MKL_INT *ndiag, const MKL_Complex8 *x, MKL_Complex8 *y);
void mkl_zdiatrsv(const char *uplo, const char *transa, const char *diag,
const MKL_INT *m, const MKL_Complex16 *val, const MKL_INT *lval, const MKL_INT *idiag,
const MKL_INT *ndiag, const MKL_Complex16 *x, MKL_Complex16 *y);

Include Files
• mkl.h

Description

This routine is deprecated, but no replacement is available yet in the Inspector-executor Sparse BLAS
interface. You can continue using this routine until a replacement is provided and the routine is removed.
The mkl_?diatrsv routine solves a system of linear equations with a sparse triangular
matrix stored in the diagonal format:

A*y = x
or

A^T*y = x,
where:
x and y are vectors,
A is a sparse upper or lower triangular matrix with a unit or non-unit main diagonal, and A^T is the transpose of A.

NOTE
This routine supports only one-based indexing of the input arrays.

Input Parameters

uplo Specifies whether the upper or lower triangle of the matrix A is used.
If uplo = 'U' or 'u', then the upper triangle of the matrix A is used.

If uplo = 'L' or 'l', then the lower triangle of the matrix A is used.

transa Specifies the system of linear equations.

If transa = 'N' or 'n', then A*y = x

If transa = 'T' or 't' or 'C' or 'c', then A^T*y = x.

diag Specifies whether A is unit triangular.

If diag = 'U' or 'u', then A is unit triangular.

If diag = 'N' or 'n', then A is not unit triangular.

m Number of rows of the matrix A.

val Two-dimensional array of size lval by ndiag, contains non-zero diagonals
of the matrix A. Refer to values array description in Diagonal Storage
Scheme for more details.

lval Leading dimension of val, lval ≥ m. Refer to lval description in Diagonal
Storage Scheme for more details.

idiag Array of length ndiag, contains the distances between the main diagonal and
each non-zero diagonal in the matrix A.

NOTE
All elements of this array must be sorted in increasing order.

Refer to distance array description in Diagonal Storage Scheme for more
details.

ndiag Specifies the number of non-zero diagonals of the matrix A.

x Array, size is m.

On entry, the array x must contain the vector x.

Output Parameters

y Array, size at least m.

On exit, the array y contains the solution vector y.
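
A forward-substitution sketch for a lower-triangular matrix in the diagonal format follows. It assumes the same column-major val layout as the other diagonal-format sketches, in which val[j*lval + i] holds A(i, i + idiag[j]). It uses only diagonals with idiag[j] ≤ 0 and handles real double precision with transa = 'N' only. The helper name ref_diatrsv_lower is hypothetical.

```c
#include <stddef.h>

/* Hypothetical reference helper, not a oneMKL function.
 * Solves A*y = x for a real lower-triangular m-by-m matrix in diagonal storage
 * (uplo = 'L', transa = 'N' only). ASSUMPTION: val is column-major with
 * leading dimension lval and val[j*lval + i] holds A(i, i + idiag[j]). */
void ref_diatrsv_lower(char diag, int m, const double *val, int lval,
                       const int *idiag, int ndiag, const double *x, double *y)
{
    int unit = (diag == 'U' || diag == 'u');
    for (int i = 0; i < m; i++) {                /* forward substitution */
        double s = x[i];
        double dval = 1.0;                       /* kept at 1.0 when diag = 'U' */
        for (int j = 0; j < ndiag; j++) {
            int d = idiag[j];
            int c = i + d;                       /* column of A(i, i + d) */
            if (d > 0 || c < 0)
                continue;                        /* upper diagonal or padding */
            double v = val[(size_t)j * lval + i];
            if (d == 0) {
                if (!unit)
                    dval = v;
            } else {
                s -= v * y[c];                   /* c < i, already solved */
            }
        }
        y[i] = s / dval;
    }
}
```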

mkl_cspblas_?csrgemv
Computes matrix-vector product of a sparse general
matrix stored in the CSR format (3-array variation)
with zero-based indexing (deprecated).

Syntax
void mkl_cspblas_scsrgemv(const char *transa, const MKL_INT *m, const float *a,
const MKL_INT *ia, const MKL_INT *ja, const float *x, float *y);
void mkl_cspblas_dcsrgemv(const char *transa, const MKL_INT *m, const double *a,
const MKL_INT *ia, const MKL_INT *ja, const double *x, double *y);
void mkl_cspblas_ccsrgemv(const char *transa, const MKL_INT *m, const MKL_Complex8 *a,
const MKL_INT *ia, const MKL_INT *ja, const MKL_Complex8 *x, MKL_Complex8 *y);
void mkl_cspblas_zcsrgemv(const char *transa, const MKL_INT *m, const MKL_Complex16 *a,
const MKL_INT *ia, const MKL_INT *ja, const MKL_Complex16 *x, MKL_Complex16 *y);

Include Files
• mkl.h

Description

This routine is deprecated. Use mkl_sparse_?_mv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_cspblas_?csrgemv routine performs a matrix-vector operation defined as

y := A*x
or

y := A^T*x,
where:
x and y are vectors,
A is an m-by-m sparse square matrix in the CSR format (3-array variation) with zero-based indexing, and A^T is
the transpose of A.

NOTE
This routine supports only zero-based indexing of the input arrays.

Input Parameters

transa Specifies the operation.

If transa = 'N' or 'n', then the matrix-vector product is computed as
y := A*x
If transa = 'T' or 't' or 'C' or 'c', then the matrix-vector product is
computed as y := A^T*x.

m Number of rows of the matrix A.

a Array containing non-zero elements of the matrix A. Its length is equal to
the number of non-zero elements in the matrix A. Refer to values array
description in Sparse Matrix Storage Formats for more details.

ia Array of length m + 1, containing indices of elements in the array a, such
that ia[i] is the index in the array a of the first non-zero element from the
row i. The value of the last element ia[m] is equal to the number of
non-zeros. Refer to rowIndex array description in Sparse Matrix Storage
Formats for more details.

ja Array containing the column indices for each non-zero element of the
matrix A.
Its length is equal to the length of the array a. Refer to columns array
description in Sparse Matrix Storage Formats for more details.

x Array, size is m.

On entry, the array x must contain the vector x.

Output Parameters

y Array, size at least m.

On exit, the array y contains the vector y.
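
Code that moves from the one-based mkl_?csrgemv to mkl_cspblas_?csrgemv must shift its index arrays. The sketch below shows one way to do that in place. It also includes a plain zero-based product that reads ia[i] directly, as described above. Both helpers are hypothetical, not oneMKL functions. They use int in place of MKL_INT and handle only real double precision with transa = 'N'.

```c
/* Hypothetical helpers, not oneMKL functions. */

/* Converts one-based CSR index arrays to zero-based, in place. */
void csr_one_to_zero_based(int m, int *ia, int *ja)
{
    int base = ia[0];                 /* 1 for one-based input */
    int nnz = ia[m] - base;           /* number of stored non-zeros */
    for (int k = 0; k < nnz; k++)
        ja[k] -= 1;                   /* column index plus one -> column index */
    for (int i = 0; i <= m; i++)
        ia[i] -= base;                /* afterwards ia[0] == 0 and ia[m] == nnz */
}

/* Computes y := A*x for an m-by-m real CSR matrix with zero-based ia/ja. */
void ref_csrgemv_0based(int m, const double *a, const int *ia, const int *ja,
                        const double *x, double *y)
{
    for (int i = 0; i < m; i++) {
        double s = 0.0;
        for (int k = ia[i]; k < ia[i + 1]; k++)   /* ia[i] is used directly */
            s += a[k] * x[ja[k]];
        y[i] = s;
    }
}
```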

mkl_cspblas_?bsrgemv
Computes matrix-vector product of a sparse general
matrix stored in the BSR format (3-array variation)
with zero-based indexing (deprecated).

Syntax
void mkl_cspblas_sbsrgemv(const char *transa, const MKL_INT *m, const MKL_INT *lb,
const float *a, const MKL_INT *ia, const MKL_INT *ja, const float *x, float *y);
void mkl_cspblas_dbsrgemv(const char *transa, const MKL_INT *m, const MKL_INT *lb,
const double *a, const MKL_INT *ia, const MKL_INT *ja, const double *x, double *y);
void mkl_cspblas_cbsrgemv(const char *transa, const MKL_INT *m, const MKL_INT *lb,
const MKL_Complex8 *a, const MKL_INT *ia, const MKL_INT *ja, const MKL_Complex8 *x,
MKL_Complex8 *y);
void mkl_cspblas_zbsrgemv(const char *transa, const MKL_INT *m, const MKL_INT *lb,
const MKL_Complex16 *a, const MKL_INT *ia, const MKL_INT *ja, const MKL_Complex16 *x,
MKL_Complex16 *y);

Include Files
• mkl.h

Description

This routine is deprecated. Use mkl_sparse_?_mv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_cspblas_?bsrgemv routine performs a matrix-vector operation defined as

y := A*x
or

y := A^T*x,

where:
x and y are vectors,
A is an m-by-m block sparse square matrix in the BSR format (3-array variation) with zero-based indexing,
AT is the transpose of A.

NOTE
This routine supports only zero-based indexing of the input arrays.

Input Parameters

transa Specifies the operation.

If transa = 'N' or 'n', then the matrix-vector product is computed as
y := A*x
If transa = 'T' or 't' or 'C' or 'c', then the matrix-vector product is
computed as y := A^T*x.

m Number of block rows of the matrix A.

lb Size of the block in the matrix A.

a Array containing elements of non-zero blocks of the matrix A. Its length is
equal to the number of non-zero blocks in the matrix A multiplied by lb*lb.
Refer to values array description in BSR Format for more details.

ia Array of length (m + 1), containing indices of blocks, such that ia[i]
is the index of the first non-zero block in block row i; the elements of
that block begin at a[ia[i]*lb*lb]. The value of the last element ia[m] is
equal to the number of non-zero blocks. Refer to rowIndex array description
in BSR Format for more details.

ja Array containing the column indices for each non-zero block in the matrix A.
Its length is equal to the number of non-zero blocks of the matrix A. Refer
to columns array description in BSR Format for more details.

x Array, size (m*lb).

On entry, the array x must contain the vector x.

Output Parameters

y Array, size at least (m*lb).

On exit, the array y contains the vector y.
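To make the block indexing concrete, the sketch below computes only the non-transposed product y := A*x for a zero-based 3-array BSR matrix. It is a hypothetical plain-C helper (bsr3_gemv_ref), not MKL code, and it assumes that the lb*lb values of each block are stored contiguously in row-major order; check the BSR Format section for the layout MKL actually expects before relying on it.

```c
#include <assert.h>

/* Hypothetical sketch (not MKL code): y := A*x for a zero-based 3-array
 * BSR matrix with m block rows and lb-by-lb blocks.
 * ASSUMPTION: the lb*lb values of each block are stored row-major. */
void bsr3_gemv_ref(int m, int lb, const double *a, const int *ia,
                   const int *ja, const double *x, double *y)
{
    assert(lb > 0 && ia[0] == 0);
    for (int i = 0; i < m * lb; ++i)
        y[i] = 0.0;
    for (int bi = 0; bi < m; ++bi) {
        /* Blocks of block row bi are numbered ia[bi] .. ia[bi+1]-1. */
        for (int k = ia[bi]; k < ia[bi + 1]; ++k) {
            const double *blk = a + k * lb * lb;  /* block k starts here */
            int bj = ja[k];                       /* its block column */
            for (int r = 0; r < lb; ++r)
                for (int c = 0; c < lb; ++c)
                    y[bi * lb + r] += blk[r * lb + c] * x[bj * lb + c];
        }
    }
}
```

With m = 2, lb = 2, ia = {0, 2, 3}, ja = {0, 1, 1}, blocks {1, 2, 3, 4}, {1, 0, 0, 1}, {5, 6, 7, 8} and x = {1, 2, 3, 4}, the sketch returns y = {8, 15, 39, 53}.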

mkl_cspblas_?coogemv
Computes matrix-vector product of a sparse general
matrix stored in the coordinate format with
zero-based indexing (deprecated).

Syntax
void mkl_cspblas_scoogemv (const char *transa , const MKL_INT *m , const float *val ,
const MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const float *x ,
float *y );
void mkl_cspblas_dcoogemv (const char *transa , const MKL_INT *m , const double *val ,
const MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const double *x ,
double *y );
void mkl_cspblas_ccoogemv (const char *transa , const MKL_INT *m , const MKL_Complex8
*val , const MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const
MKL_Complex8 *x , MKL_Complex8 *y );
void mkl_cspblas_zcoogemv (const char *transa , const MKL_INT *m , const MKL_Complex16
*val , const MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const
MKL_Complex16 *x , MKL_Complex16 *y );

Include Files
• mkl.h

Description

This routine is deprecated. Use mkl_sparse_?_mv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_cspblas_?coogemv routine performs a matrix-vector operation defined as

y := A*x
or

y := A^T*x,
where:
x and y are vectors,
A is an m-by-m sparse square matrix in the coordinate format with zero-based indexing, A^T is the transpose
of A.

NOTE
This routine supports only zero-based indexing of the input arrays.

Input Parameters

transa Specifies the operation.

If transa = 'N' or 'n', then the matrix-vector product is computed as
y := A*x
If transa = 'T' or 't' or 'C' or 'c', then the matrix-vector product is
computed as y := A^T*x.

m Number of rows of the matrix A.

val Array of length nnz, containing the non-zero elements of the matrix A in
arbitrary order.
Refer to values array description in Coordinate Format for more details.

rowind Array of length nnz, containing the row indices for each non-zero element of
the matrix A.
Refer to rows array description in Coordinate Format for more details.

colind Array of length nnz, containing the column indices for each non-zero element
of the matrix A. Refer to columns array description in Coordinate Format
for more details.

nnz Specifies the number of non-zero elements of the matrix A.

Refer to nnz description in Coordinate Format for more details.

x Array, size is m.

On entry, the array x must contain the vector x.

Output Parameters

y Array, size at least m.

On exit, the array y contains the vector y.
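Because the coordinate format places no ordering requirement on val, rowind, and colind, a matrix-vector product reduces to one pass over the nnz triplets. The plain-C sketch below shows both transa cases; coo_gemv_ref is a hypothetical name, not an MKL routine, and it assumes double data, zero-based indices, and int in place of MKL_INT.

```c
#include <assert.h>

/* Hypothetical sketch (not MKL code): y := A*x (trans 'N'/'n') or
 * y := A^T*x (otherwise) for an m-by-m matrix in zero-based COO format.
 * The triplets (val[k], rowind[k], colind[k]) may be in any order. */
void coo_gemv_ref(char trans, int m, const double *val, const int *rowind,
                  const int *colind, int nnz, const double *x, double *y)
{
    int notrans = (trans == 'N' || trans == 'n');
    for (int i = 0; i < m; ++i)
        y[i] = 0.0;
    for (int k = 0; k < nnz; ++k) {
        int r = rowind[k], c = colind[k];
        assert(r >= 0 && r < m && c >= 0 && c < m);  /* zero-based */
        if (notrans)
            y[r] += val[k] * x[c];  /* A(r,c) contributes to row r */
        else
            y[c] += val[k] * x[r];  /* A^T: contributes to row c */
    }
}
```

With the triplets of the matrix (1, 0, 2), (0, 3, 0), (4, 0, 5) given in the shuffled order val = {5, 1, 3, 2, 4}, rowind = {2, 0, 1, 0, 2}, colind = {2, 0, 1, 2, 0} and x = {1, 2, 3}, the sketch yields y = {7, 6, 19} for 'N' and y = {13, 6, 17} for 'T'.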

mkl_cspblas_?csrsymv
Computes matrix-vector product of a sparse
symmetrical matrix stored in the CSR format (3-array
variation) with zero-based indexing (deprecated).

Syntax
void mkl_cspblas_scsrsymv (const char *uplo , const MKL_INT *m , const float *a , const
MKL_INT *ia , const MKL_INT *ja , const float *x , float *y );
void mkl_cspblas_dcsrsymv (const char *uplo , const MKL_INT *m , const double *a ,
const MKL_INT *ia , const MKL_INT *ja , const double *x , double *y );
void mkl_cspblas_ccsrsymv (const char *uplo , const MKL_INT *m , const MKL_Complex8
*a , const MKL_INT *ia , const MKL_INT *ja , const MKL_Complex8 *x , MKL_Complex8 *y );
void mkl_cspblas_zcsrsymv (const char *uplo , const MKL_INT *m , const MKL_Complex16
*a , const MKL_INT *ia , const MKL_INT *ja , const MKL_Complex16 *x , MKL_Complex16
*y );

Include Files
• mkl.h

Description

This routine is deprecated. Use mkl_sparse_?_mv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_cspblas_?csrsymv routine performs a matrix-vector operation defined as

y := A*x
where:
x and y are vectors,
A is an upper or lower triangle of the symmetrical sparse matrix in the CSR format (3-array variation) with
zero-based indexing.

NOTE
This routine supports only zero-based indexing of the input arrays.

Input Parameters

uplo Specifies whether the upper or lower triangle of the matrix A is used.
If uplo = 'U' or 'u', then the upper triangle of the matrix A is used.

If uplo = 'L' or 'l', then the lower triangle of the matrix A is used.

m Number of rows of the matrix A.

a Array containing non-zero elements of the matrix A. Its length is equal to
the number of non-zero elements in the matrix A. Refer to values array
description in Sparse Matrix Storage Formats for more details.

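ia Array of length m + 1, containing indices of elements in the array a, such
that ia[i] is the index in the array a of the first non-zero element from the
row i. The value of the last element ia[m] is equal to the number of
non-zeros. Refer to rowIndex array description in Sparse Matrix Storage
Formats for more details.

ja Array containing the column indices for each non-zero element of the
matrix A. Its length is equal to the length of the array a. Refer to columns
array description in Sparse Matrix Storage Formats for more details.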

x Array, size is m.

On entry, the array x must contain the vector x.

Output Parameters

y Array, size at least m.

On exit, the array y contains the vector y.
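Since only one triangle of the symmetric matrix is supplied, each stored off-diagonal entry A(i,j) also stands for A(j,i). The plain-C sketch below (hypothetical name csr3_symv_ref, double data, zero-based indexing, int for MKL_INT) therefore applies each such entry twice. Skipping entries that fall outside the triangle selected by uplo is an assumption of the sketch, not documented MKL behavior.

```c
#include <assert.h>

/* Hypothetical sketch (not MKL code): y := A*x for a symmetric m-by-m
 * matrix A of which only the triangle chosen by uplo is read from
 * zero-based 3-array CSR (a, ia, ja). int stands in for MKL_INT. */
void csr3_symv_ref(char uplo, int m, const double *a, const int *ia,
                   const int *ja, const double *x, double *y)
{
    int upper = (uplo == 'U' || uplo == 'u');
    assert(ia[0] == 0);  /* zero-based indexing */
    for (int i = 0; i < m; ++i)
        y[i] = 0.0;
    for (int i = 0; i < m; ++i) {
        for (int k = ia[i]; k < ia[i + 1]; ++k) {
            int j = ja[k];
            /* Assumption: entries outside the selected triangle are skipped. */
            if (upper ? (j < i) : (j > i))
                continue;
            y[i] += a[k] * x[j];      /* stored entry A(i,j) */
            if (j != i)
                y[j] += a[k] * x[i];  /* mirrored entry A(j,i) */
        }
    }
}
```

For the symmetric matrix with rows (2, 1, 0), (1, 3, 4), (0, 4, 5), both the upper-triangle storage (ia = {0, 2, 4, 5}, ja = {0, 1, 1, 2, 2}) and the lower-triangle storage (ia = {0, 1, 3, 5}, ja = {0, 0, 1, 1, 2}), each with a = {2, 1, 3, 4, 5}, give y = {4, 19, 23} for x = {1, 2, 3}.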

mkl_cspblas_?bsrsymv
Computes matrix-vector product of a sparse
symmetrical matrix stored in the BSR format (3-array
variation) with zero-based indexing (deprecated).

Syntax
void mkl_cspblas_sbsrsymv (const char *uplo , const MKL_INT *m , const MKL_INT *lb ,
const float *a , const MKL_INT *ia , const MKL_INT *ja , const float *x , float *y );
void mkl_cspblas_dbsrsymv (const char *uplo , const MKL_INT *m , const MKL_INT *lb ,
const double *a , const MKL_INT *ia , const MKL_INT *ja , const double *x , double
*y );
void mkl_cspblas_cbsrsymv (const char *uplo , const MKL_INT *m , const MKL_INT *lb ,
const MKL_Complex8 *a , const MKL_INT *ia , const MKL_INT *ja , const MKL_Complex8 *x ,
MKL_Complex8 *y );
void mkl_cspblas_zbsrsymv (const char *uplo , const MKL_INT *m , const MKL_INT *lb ,
const MKL_Complex16 *a , const MKL_INT *ia , const MKL_INT *ja , const MKL_Complex16
*x , MKL_Complex16 *y );

Include Files
• mkl.h

Description

This routine is deprecated. Use mkl_sparse_?_mv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_cspblas_?bsrsymv routine performs a matrix-vector operation defined as

y := A*x
where:
x and y are vectors,
A is an upper or lower triangle of the symmetrical sparse matrix in the BSR format (3-array variation) with
zero-based indexing.

NOTE
This routine supports only zero-based indexing of the input arrays.

Input Parameters

uplo Specifies whether the upper or lower triangle of the matrix A is used.
If uplo = 'U' or 'u', then the upper triangle of the matrix A is used.

If uplo = 'L' or 'l', then the lower triangle of the matrix A is used.

m Number of block rows of the matrix A.

lb Size of the block in the matrix A.

a Array containing elements of non-zero blocks of the matrix A. Its length is
equal to the number of non-zero blocks in the matrix A multiplied by lb*lb.
Refer to values array description in BSR Format for more details.

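ia Array of length (m + 1), containing indices of blocks, such that ia[i]
is the index of the first non-zero block in block row i; the elements of
that block begin at a[ia[i]*lb*lb]. The value of the last element ia[m] is
equal to the number of non-zero blocks. Refer to rowIndex array description
in BSR Format for more details.

ja Array containing the column indices for each non-zero block in the matrix A.
Its length is equal to the number of non-zero blocks of the matrix A. Refer
to columns array description in BSR Format for more details.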

x Array, size (m*lb).

On entry, the array x must contain the vector x.

Output Parameters

y Array, size at least (m*lb).

On exit, the array y contains the vector y.

mkl_cspblas_?coosymv
Computes matrix-vector product of a sparse
symmetrical matrix stored in the coordinate format
with zero-based indexing (deprecated).

Syntax
void mkl_cspblas_scoosymv (const char *uplo , const MKL_INT *m , const float *val ,
const MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const float *x ,
float *y );
void mkl_cspblas_dcoosymv (const char *uplo , const MKL_INT *m , const double *val ,
const MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const double *x ,
double *y );
void mkl_cspblas_ccoosymv (const char *uplo , const MKL_INT *m , const MKL_Complex8
*val , const MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const
MKL_Complex8 *x , MKL_Complex8 *y );
void mkl_cspblas_zcoosymv (const char *uplo , const MKL_INT *m , const MKL_Complex16
*val , const MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const
MKL_Complex16 *x , MKL_Complex16 *y );

Include Files
• mkl.h

Description

This routine is deprecated. Use mkl_sparse_?_mv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_cspblas_?coosymv routine performs a matrix-vector operation defined as

y := A*x
where:
x and y are vectors,
A is an upper or lower triangle of the symmetrical sparse matrix in the coordinate format with zero-based
indexing.

NOTE
This routine supports only zero-based indexing of the input arrays.

Input Parameters

uplo Specifies whether the upper or lower triangle of the matrix A is used.
If uplo = 'U' or 'u', then the upper triangle of the matrix A is used.

If uplo = 'L' or 'l', then the lower triangle of the matrix A is used.

m Number of rows of the matrix A.

val Array of length nnz, containing the non-zero elements of the matrix A in
arbitrary order.
Refer to values array description in Coordinate Format for more details.

rowind Array of length nnz, containing the row indices for each non-zero element of
the matrix A.
Refer to rows array description in Coordinate Format for more details.

colind Array of length nnz, containing the column indices for each non-zero element
of the matrix A. Refer to columns array description in Coordinate Format
for more details.

nnz Specifies the number of non-zero elements of the matrix A.

Refer to nnz description in Coordinate Format for more details.

x Array, size is m.

On entry, the array x must contain the vector x.

Output Parameters

y Array, size at least m.

On exit, the array y contains the vector y.
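The same mirroring idea applies to the coordinate format, where the triplets may arrive in any order. In the plain-C sketch below (hypothetical name coo_symv_ref, double data, zero-based indices, int for MKL_INT), triplets outside the triangle selected by uplo are skipped; that filtering is an assumption made for illustration, not a statement of how MKL treats such entries.

```c
#include <assert.h>

/* Hypothetical sketch (not MKL code): y := A*x for a symmetric m-by-m
 * matrix given as zero-based COO triplets in arbitrary order; only the
 * triangle chosen by uplo is used. int stands in for MKL_INT. */
void coo_symv_ref(char uplo, int m, const double *val, const int *rowind,
                  const int *colind, int nnz, const double *x, double *y)
{
    int upper = (uplo == 'U' || uplo == 'u');
    for (int i = 0; i < m; ++i)
        y[i] = 0.0;
    for (int k = 0; k < nnz; ++k) {
        int r = rowind[k], c = colind[k];
        assert(r >= 0 && r < m && c >= 0 && c < m);
        /* Assumption: triplets outside the selected triangle are skipped. */
        if (upper ? (c < r) : (c > r))
            continue;
        y[r] += val[k] * x[c];      /* stored entry A(r,c) */
        if (r != c)
            y[c] += val[k] * x[r];  /* mirrored entry A(c,r) */
    }
}
```

Given all seven non-zeros of the symmetric matrix with rows (2, 1, 0), (1, 3, 4), (0, 4, 5) in shuffled order, the sketch returns y = {4, 19, 23} for x = {1, 2, 3} with either uplo value, because each setting keeps exactly one copy of every off-diagonal pair.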

mkl_cspblas_?csrtrsv
Triangular solvers with simplified interface for a sparse
matrix in the CSR format (3-array variation) with
zero-based indexing (deprecated).

Syntax
void mkl_cspblas_scsrtrsv (const char *uplo , const char *transa , const char *diag ,
const MKL_INT *m , const float *a , const MKL_INT *ia , const MKL_INT *ja , const float
*x , float *y );
void mkl_cspblas_dcsrtrsv (const char *uplo , const char *transa , const char *diag ,
const MKL_INT *m , const double *a , const MKL_INT *ia , const MKL_INT *ja , const
double *x , double *y );
void mkl_cspblas_ccsrtrsv (const char *uplo , const char *transa , const char *diag ,
const MKL_INT *m , const MKL_Complex8 *a , const MKL_INT *ia , const MKL_INT *ja ,
const MKL_Complex8 *x , MKL_Complex8 *y );
void mkl_cspblas_zcsrtrsv (const char *uplo , const char *transa , const char *diag ,
const MKL_INT *m , const MKL_Complex16 *a , const MKL_INT *ia , const MKL_INT *ja ,
const MKL_Complex16 *x , MKL_Complex16 *y );

Include Files
• mkl.h

Description

This routine is deprecated. Use mkl_sparse_?_trsv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_cspblas_?csrtrsv routine solves a triangular system of linear equations for a
sparse matrix stored in the CSR format (3-array variation) with zero-based indexing:

A*y = x
or

A^T*y = x,
where:
x and y are vectors,
A is a sparse upper or lower triangular matrix with unit or non-unit main diagonal, A^T is the transpose of A.

NOTE
This routine supports only zero-based indexing of the input arrays.

Input Parameters

uplo Specifies whether the upper or lower triangle of the matrix A is used.
If uplo = 'U' or 'u', then the upper triangle of the matrix A is used.

If uplo = 'L' or 'l', then the lower triangle of the matrix A is used.

transa Specifies the system of linear equations.

If transa = 'N' or 'n', then A*y = x.

If transa = 'T' or 't' or 'C' or 'c', then A^T*y = x.

diag Specifies whether matrix A is unit triangular.

If diag = 'U' or 'u', then A is unit triangular.

If diag = 'N' or 'n', then A is not unit triangular.

m Number of rows of the matrix A.

a Array containing non-zero elements of the matrix A. Its length is equal to
the number of non-zero elements in the matrix A. Refer to values array
description in Sparse Matrix Storage Formats for more details.

ia Array of length m+1, containing indices of elements in the array a, such that
ia[i] is the index in the array a of the first non-zero element from the row
i. The value of the last element ia[m] is equal to the number of non-zeros.
Refer to rowIndex array description in Sparse Matrix Storage Formats for
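more details.

ja Array containing the column indices for each non-zero element of the
matrix A. Its length is equal to the length of the array a. Refer to columns
array description in Sparse Matrix Storage Formats for more details.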

NOTE
Column indices must be sorted in increasing order for each row.

x Array, size is m.

On entry, the array x must contain the vector x.

Output Parameters

y Array, size at least m.

Contains the vector y.
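For the non-transposed lower-triangular case, the solve is ordinary forward substitution: row i needs only the already computed y[j] for j < i. The hypothetical plain-C sketch below (csr3_lower_trsv_ref, double data, zero-based indexing, int for MKL_INT) covers that case only. Ignoring entries above the diagonal and, when unit_diag is set, ignoring any stored diagonal value are assumptions of the sketch.

```c
#include <assert.h>

/* Hypothetical sketch (not MKL code): solves A*y = x by forward
 * substitution, where A is lower triangular in zero-based 3-array CSR.
 * ASSUMPTIONS: entries above the diagonal are ignored; when unit_diag is
 * nonzero the diagonal is taken as 1 and stored diagonal values are
 * ignored. int stands in for MKL_INT. */
void csr3_lower_trsv_ref(int unit_diag, int m, const double *a,
                         const int *ia, const int *ja,
                         const double *x, double *y)
{
    assert(ia[0] == 0);  /* zero-based indexing */
    for (int i = 0; i < m; ++i) {
        double sum = x[i], diag = 0.0;
        for (int k = ia[i]; k < ia[i + 1]; ++k) {
            int j = ja[k];
            if (j < i)
                sum -= a[k] * y[j];  /* y[j] was solved in an earlier row */
            else if (j == i)
                diag = a[k];
        }
        if (unit_diag) {
            y[i] = sum;
        } else {
            assert(diag != 0.0);     /* non-unit: diagonal must be stored */
            y[i] = sum / diag;
        }
    }
}
```

For the lower-triangular matrix with rows (2, 0, 0), (1, 4, 0), (0, 3, 5), stored as a = {2, 1, 4, 3, 5}, ia = {0, 1, 3, 5}, ja = {0, 0, 1, 1, 2}, the right-hand side x = {2, 5, 8} gives y = {1, 1, 1}; in unit-diagonal mode, x = {1, 3, 7} gives y = {1, 2, 1}.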

mkl_cspblas_?bsrtrsv
Triangular solver with simplified interface for a sparse
matrix stored in the BSR format (3-array variation)
with zero-based indexing (deprecated).

Syntax
void mkl_cspblas_sbsrtrsv (const char *uplo , const char *transa , const char *diag ,
const MKL_INT *m , const MKL_INT *lb , const float *a , const MKL_INT *ia , const
MKL_INT *ja , const float *x , float *y );
void mkl_cspblas_dbsrtrsv (const char *uplo , const char *transa , const char *diag ,
const MKL_INT *m , const MKL_INT *lb , const double *a , const MKL_INT *ia , const
MKL_INT *ja , const double *x , double *y );
void mkl_cspblas_cbsrtrsv (const char *uplo , const char *transa , const char *diag ,
const MKL_INT *m , const MKL_INT *lb , const MKL_Complex8 *a , const MKL_INT *ia ,
const MKL_INT *ja , const MKL_Complex8 *x , MKL_Complex8 *y );
void mkl_cspblas_zbsrtrsv (const char *uplo , const char *transa , const char *diag ,
const MKL_INT *m , const MKL_INT *lb , const MKL_Complex16 *a , const MKL_INT *ia ,
const MKL_INT *ja , const MKL_Complex16 *x , MKL_Complex16 *y );

Include Files
• mkl.h

Description

This routine is deprecated. Use mkl_sparse_?_trsv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_cspblas_?bsrtrsv routine solves a triangular system of linear equations for a
sparse matrix stored in the BSR format (3-array variation) with zero-based indexing:

A*y = x
or

A^T*y = x,
where:
x and y are vectors,
A is a sparse upper or lower triangular matrix with unit or non-unit main diagonal, A^T is the transpose of A.

NOTE
This routine supports only zero-based indexing of the input arrays.

Input Parameters

uplo Specifies whether the upper or lower triangle of the matrix A is used.

If uplo = 'U' or 'u', then the upper triangle of the matrix A is used.

If uplo = 'L' or 'l', then the lower triangle of the matrix A is used.

transa Specifies the system of linear equations.

If transa = 'N' or 'n', then A*y = x.

If transa = 'T' or 't' or 'C' or 'c', then A^T*y = x.

diag Specifies whether matrix A is unit triangular or not.

If diag = 'U' or 'u', A is unit triangular.

If diag = 'N' or 'n', A is not unit triangular.

m Number of block rows of the matrix A.

lb Size of the block in the matrix A.

a Array containing elements of non-zero blocks of the matrix A. Its length is
equal to the number of non-zero blocks in the matrix A multiplied by lb*lb.
Refer to values array description in BSR Format for more details.

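ia Array of length (m + 1), containing indices of blocks, such that ia[i]
is the index of the first non-zero block in block row i; the elements of
that block begin at a[ia[i]*lb*lb]. The value of the last element ia[m] is
equal to the number of non-zero blocks. Refer to rowIndex array description
in BSR Format for more details.

ja Array containing the column indices for each non-zero block in the matrix A.
Its length is equal to the number of non-zero blocks of the matrix A. Refer
to columns array description in BSR Format for more details.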

x Array, size (m*lb).

On entry, the array x must contain the vector x.

Output Parameters

y Array, size at least (m*lb).

On exit, the array y contains the vector y.

mkl_cspblas_?cootrsv
Triangular solvers with simplified interface for a sparse
matrix in the coordinate format with zero-based
indexing (deprecated).

Syntax
void mkl_cspblas_scootrsv (const char *uplo , const char *transa , const char *diag ,
const MKL_INT *m , const float *val , const MKL_INT *rowind , const MKL_INT *colind ,
const MKL_INT *nnz , const float *x , float *y );
void mkl_cspblas_dcootrsv (const char *uplo , const char *transa , const char *diag ,
const MKL_INT *m , const double *val , const MKL_INT *rowind , const MKL_INT *colind ,
const MKL_INT *nnz , const double *x , double *y );
void mkl_cspblas_ccootrsv (const char *uplo , const char *transa , const char *diag ,
const MKL_INT *m , const MKL_Complex8 *val , const MKL_INT *rowind , const MKL_INT
*colind , const MKL_INT *nnz , const MKL_Complex8 *x , MKL_Complex8 *y );
void mkl_cspblas_zcootrsv (const char *uplo , const char *transa , const char *diag ,
const MKL_INT *m , const MKL_Complex16 *val , const MKL_INT *rowind , const MKL_INT
*colind , const MKL_INT *nnz , const MKL_Complex16 *x , MKL_Complex16 *y );

Include Files
• mkl.h

Description

This routine is deprecated. Use mkl_sparse_?_trsv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_cspblas_?cootrsv routine solves a triangular system of linear equations for a
sparse matrix stored in the coordinate format with zero-based indexing:

A*y = x
or

A^T*y = x,
where:
x and y are vectors,
A is a sparse upper or lower triangular matrix with unit or non-unit main diagonal, A^T is the transpose of A.

NOTE
This routine supports only zero-based indexing of the input arrays.

Input Parameters

uplo Specifies whether the upper or lower triangle of the matrix A is considered.
If uplo = 'U' or 'u', then the upper triangle of the matrix A is used.

If uplo = 'L' or 'l', then the lower triangle of the matrix A is used.

transa Specifies the system of linear equations.

If transa = 'N' or 'n', then A*y = x.

If transa = 'T' or 't' or 'C' or 'c', then A^T*y = x.

diag Specifies whether A is unit triangular.

If diag = 'U' or 'u', then A is unit triangular.

If diag = 'N' or 'n', then A is not unit triangular.

m Number of rows of the matrix A.

val Array of length nnz, containing the non-zero elements of the matrix A in
arbitrary order.
Refer to values array description in Coordinate Format for more details.

rowind Array of length nnz, containing the row indices for each non-zero element of
the matrix A.
Refer to rows array description in Coordinate Format for more details.

colind Array of length nnz, containing the column indices for each non-zero element
of the matrix A. Refer to columns array description in Coordinate Format
for more details.

nnz Specifies the number of non-zero elements of the matrix A.

Refer to nnz description in Coordinate Format for more details.

x Array, size is m.

On entry, the array x must contain the vector x.

Output Parameters

y Array, size at least m.

Contains the vector y.
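Because coordinate-format triplets are unordered, a triangular solve cannot simply walk the arrays once. The deliberately naive plain-C sketch below (hypothetical name coo_upper_trsv_ref, double data, zero-based indices) handles the non-transposed upper-triangular case by rescanning all nnz triplets for each row during back substitution, at O(m*nnz) cost; a practical code would first sort or convert to CSR. Ignoring triplets below the diagonal and, for unit_diag, ignoring stored diagonal values are assumptions of the sketch.

```c
#include <assert.h>

/* Hypothetical, deliberately naive O(m*nnz) sketch (not MKL code):
 * solves A*y = x by back substitution, where A is upper triangular and
 * given as zero-based COO triplets in arbitrary order.
 * ASSUMPTIONS: triplets below the diagonal are ignored; when unit_diag
 * is nonzero the diagonal is taken as 1 and stored diagonal values are
 * ignored. */
void coo_upper_trsv_ref(int unit_diag, int m, const double *val,
                        const int *rowind, const int *colind, int nnz,
                        const double *x, double *y)
{
    for (int i = m - 1; i >= 0; --i) {
        double sum = x[i], diag = 0.0;
        for (int k = 0; k < nnz; ++k) {   /* scan all triplets for row i */
            if (rowind[k] != i)
                continue;
            int j = colind[k];
            if (j > i)
                sum -= val[k] * y[j];     /* y[j] was solved earlier */
            else if (j == i)
                diag = val[k];
        }
        if (unit_diag) {
            y[i] = sum;
        } else {
            assert(diag != 0.0);          /* non-unit: diagonal required */
            y[i] = sum / diag;
        }
    }
}
```

For the upper-triangular matrix with rows (2, 1, 0), (0, 4, 3), (0, 0, 5) given as val = {5, 1, 2, 3, 4}, rowind = {2, 0, 0, 1, 1}, colind = {2, 1, 0, 2, 1}, the right-hand side x = {3, 7, 5} gives y = {1, 1, 1}; in unit-diagonal mode, x = {1, 1, 1} gives y = {3, -2, 1}.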

mkl_?csrmv
Computes matrix-vector product of a sparse matrix
stored in the CSR format (deprecated).

Syntax
void mkl_scsrmv (const char *transa , const MKL_INT *m , const MKL_INT *k , const float
*alpha , const char *matdescra , const float *val , const MKL_INT *indx , const MKL_INT
*pntrb , const MKL_INT *pntre , const float *x , const float *beta , float *y );
void mkl_dcsrmv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
double *alpha , const char *matdescra , const double *val , const MKL_INT *indx , const
MKL_INT *pntrb , const MKL_INT *pntre , const double *x , const double *beta , double
*y );
void mkl_ccsrmv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8 *val , const MKL_INT
*indx , const MKL_INT *pntrb , const MKL_INT *pntre , const MKL_Complex8 *x , const
MKL_Complex8 *beta , MKL_Complex8 *y );
void mkl_zcsrmv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16 *val , const MKL_INT
*indx , const MKL_INT *pntrb , const MKL_INT *pntre , const MKL_Complex16 *x , const
MKL_Complex16 *beta , MKL_Complex16 *y );

Include Files
• mkl.h

Description

This routine is deprecated. Use mkl_sparse_?_mv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?csrmv routine performs a matrix-vector operation defined as

y := alpha*A*x + beta*y
or

y := alpha*A^T*x + beta*y,
where:
alpha and beta are scalars,
x and y are vectors,
A is an m-by-k sparse matrix in the CSR format, A^T is the transpose of A.

NOTE
This routine supports the CSR format with both one-based and zero-based indexing.

Input Parameters

transa Specifies the operation.

If transa = 'N' or 'n', then y := alpha*A*x + beta*y

If transa = 'T' or 't' or 'C' or 'c', then y := alpha*A^T*x + beta*y.

m Number of rows of the matrix A.

k Number of columns of the matrix A.

alpha Specifies the scalar alpha.

matdescra Array of six elements, specifies properties of the matrix used for operation.
Only the first four array elements are used; their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.

val Array containing non-zero elements of the matrix A.

Its length is pntre[m-1] - pntrb[0].

Refer to values array description in CSR Format for more details.

indx For one-based indexing, array containing the column indices plus one for
each non-zero element of the matrix A. For zero-based indexing, array
containing the column indices for each non-zero element of the matrix A.
Its length is equal to the length of the val array.

Refer to columns array description in CSR Format for more details.

pntrb Array of length m.

This array contains row indices, such that pntrb[i] - pntrb[0] is the
first index of row i in the arrays val and indx.

Refer to pointerB array description in CSR Format for more details.

pntre Array of length m.

This array contains row indices, such that pntre[i] - pntrb[0]-1 is the
last index of row i in the arrays val and indx.

Refer to pointerE array description in CSR Format for more details.

x Array, size at least k if transa = 'N' or 'n' and at least m otherwise. On
entry, the array x must contain the vector x.

beta Specifies the scalar beta.

y Array, size at least m if transa = 'N' or 'n' and at least k otherwise. On
entry, the array y must contain the vector y.

Output Parameters

y Overwritten by the updated vector y.
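The pntrb/pntre formulas above give the positions of row i in val and indx regardless of the indexing base. The plain-C sketch below uses them for the non-transposed, general case of y := alpha*A*x + beta*y. It is hypothetical (csr4_mv_ref is not an MKL routine), handles real double data only, and reads the indexing base from a single character following the matdescra[3] convention assumed here ('C' for zero-based, 'F' for one-based). Passing a 3-array rowIndex as pntrb = ia and pntre = ia + 1, as the usage below does, is assumed to be the usual way of feeding 3-array data into the 4-array layout.

```c
#include <assert.h>

/* Hypothetical sketch (not MKL code) of the non-transposed, general case:
 * y := alpha*A*x + beta*y for an m-row matrix in 4-array CSR
 * (val, indx, pntrb, pntre). index_base follows matdescra[3]:
 * 'C' for zero-based, 'F' for one-based indexing. */
void csr4_mv_ref(int m, double alpha, char index_base, const double *val,
                 const int *indx, const int *pntrb, const int *pntre,
                 const double *x, double beta, double *y)
{
    assert(index_base == 'C' || index_base == 'F');
    int base = (index_base == 'F') ? 1 : 0;
    for (int i = 0; i < m; ++i) {
        double sum = 0.0;
        /* Row i occupies C positions pntrb[i]-pntrb[0] .. pntre[i]-pntrb[0]-1. */
        for (int p = pntrb[i] - pntrb[0]; p < pntre[i] - pntrb[0]; ++p)
            sum += val[p] * x[indx[p] - base];  /* indx holds column indices */
        y[i] = alpha * sum + beta * y[i];
    }
}
```

For the 2-by-3 matrix with rows (1, 0, 2), (0, 3, 0), stored zero-based as val = {1, 2, 3}, indx = {0, 2, 1}, ia = {0, 2, 3} or one-based as indx = {1, 3, 2}, ia = {1, 3, 4}, with x = {1, 2, 3}, y = {1, 1}, alpha = 2 and beta = 10, both calls produce y = {24, 22}.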

mkl_?bsrmv
Computes matrix-vector product of a sparse matrix
stored in the BSR format (deprecated).

Syntax
void mkl_sbsrmv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
MKL_INT *lb , const float *alpha , const char *matdescra , const float *val , const
MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const float *x , const
float *beta , float *y );
void mkl_dbsrmv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
MKL_INT *lb , const double *alpha , const char *matdescra , const double *val , const
MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const double *x , const
double *beta , double *y );
void mkl_cbsrmv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
MKL_INT *lb , const MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8
*val , const MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const
MKL_Complex8 *x , const MKL_Complex8 *beta , MKL_Complex8 *y );
void mkl_zbsrmv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
MKL_INT *lb , const MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16
*val , const MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const
MKL_Complex16 *x , const MKL_Complex16 *beta , MKL_Complex16 *y );

Include Files
• mkl.h

Description

This routine is deprecated. Use mkl_sparse_?_mv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?bsrmv routine performs a matrix-vector operation defined as

y := alpha*A*x + beta*y
or

y := alpha*A^T*x + beta*y,
where:
alpha and beta are scalars,
x and y are vectors,
A is an m-by-k block sparse matrix in the BSR format, A^T is the transpose of A.

NOTE
This routine supports the BSR format with both one-based and zero-based indexing.

Input Parameters

transa Specifies the operation.

If transa = 'N' or 'n', then the matrix-vector product is computed as
y := alpha*A*x + beta*y
If transa = 'T' or 't' or 'C' or 'c', then the matrix-vector product is
computed as y := alpha*A^T*x + beta*y.

m Number of block rows of the matrix A.

k Number of block columns of the matrix A.

lb Size of the block in the matrix A.

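alpha Specifies the scalar alpha.

matdescra Array of six elements, specifies properties of the matrix used for operation.
Only the first four array elements are used; their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.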

val Array containing elements of non-zero blocks of the matrix A. Its length is
equal to the number of non-zero blocks in the matrix A multiplied by lb*lb.

Refer to values array description in BSR Format for more details.

indx For one-based indexing, array containing the column indices plus one for
each non-zero block of the matrix A. For zero-based indexing, array
containing the column indices for each non-zero block of the matrix A.
Its length is equal to the number of non-zero blocks in the matrix A.
Refer to columns array description in BSR Format for more details.

pntrb Array of length m.

This array contains row indices, such that pntrb[i] - pntrb[0] is the
first index of block row i in the array indx.

Refer to pointerB array description in BSR Format for more details.

pntre Array of length m.

For zero-based indexing this array contains row indices, such that
pntre[i] - pntrb[0] - 1 is the last index of block row i in the array
indx.
Refer to pointerE array description in BSR Format for more details.

x Array, size at least (k*lb) if transa = 'N' or 'n', and at least (m*lb)
otherwise. On entry, the array x must contain the vector x.

beta Specifies the scalar beta.

y Array, size at least (m*lb) if transa = 'N' or 'n', and at least (k*lb)
otherwise. On entry, the array y must contain the vector y.

Output Parameters

y Overwritten by the updated vector y.

mkl_?cscmv
Computes matrix-vector product for a sparse matrix in
the CSC format (deprecated).

Syntax
void mkl_scscmv (const char *transa , const MKL_INT *m , const MKL_INT *k , const float
*alpha , const char *matdescra , const float *val , const MKL_INT *indx , const MKL_INT
*pntrb , const MKL_INT *pntre , const float *x , const float *beta , float *y );
void mkl_dcscmv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
double *alpha , const char *matdescra , const double *val , const MKL_INT *indx , const
MKL_INT *pntrb , const MKL_INT *pntre , const double *x , const double *beta , double
*y );
void mkl_ccscmv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8 *val , const MKL_INT
*indx , const MKL_INT *pntrb , const MKL_INT *pntre , const MKL_Complex8 *x , const
MKL_Complex8 *beta , MKL_Complex8 *y );
void mkl_zcscmv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16 *val , const MKL_INT
*indx , const MKL_INT *pntrb , const MKL_INT *pntre , const MKL_Complex16 *x , const
MKL_Complex16 *beta , MKL_Complex16 *y );

Include Files
• mkl.h

Description

This routine is deprecated. Use mkl_sparse_?_mv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?cscmv routine performs a matrix-vector operation defined as

y := alpha*A*x + beta*y
or

y := alpha*A^T*x + beta*y,
where:
alpha and beta are scalars,
x and y are vectors,
A is an m-by-k sparse matrix in compressed sparse column (CSC) format, A^T is the transpose of A.

NOTE
This routine supports the CSC format with both one-based and zero-based indexing.

Input Parameters

transa Specifies the operation.

If transa = 'N' or 'n', then y := alpha*A*x + beta*y

If transa = 'T' or 't' or 'C' or 'c', then y := alpha*A^T*x + beta*y.

m Number of rows of the matrix A.

k Number of columns of the matrix A.

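alpha Specifies the scalar alpha.

matdescra Array of six elements, specifies properties of the matrix used for operation.
Only the first four array elements are used; their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.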

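val Array containing non-zero elements of the matrix A.

Its length is pntre[k-1] - pntrb[0].

Refer to values array description in CSC Format for more details.

indx For one-based indexing, array containing the row indices plus one for
each non-zero element of the matrix A. For zero-based indexing, array
containing the row indices for each non-zero element of the matrix A.
Its length is equal to the length of the val array.

Refer to rows array description in CSC Format for more details.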

pntrb Array of length k.

This array contains column indices, such that pntrb[i] - pntrb[0] is the
first index of column i in the arrays val and indx.

Refer to pointerB array description in CSC Format for more details.

pntre Array of length k.

This array contains column indices, such that pntre[i] - pntrb[0] - 1 is
the last index of column i in the arrays val and indx.

Refer to pointerE array description in CSC Format for more details.

x Array, size at least k if transa = 'N' or 'n' and at least m otherwise. On
entry, the array x must contain the vector x.

beta Specifies the scalar beta.

y Array, size at least m if transa = 'N' or 'n' and at least k otherwise. On
entry, the array y must contain the vector y.

Output Parameters

y Overwritten by the updated vector y.
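In CSC the natural loop runs over columns: each x[j] is scaled once and scattered down column j using the row indices in indx. The plain-C sketch below shows only the non-transposed, zero-based, general case of y := alpha*A*x + beta*y; csc4_mv_ref is a hypothetical name, not an MKL routine, and it assumes double data and int in place of MKL_INT.

```c
#include <assert.h>

/* Hypothetical sketch (not MKL code) of the non-transposed case with
 * zero-based indexing: y := alpha*A*x + beta*y for an m-by-k matrix in
 * 4-array CSC (val, indx = row indices, pntrb/pntre = column pointers). */
void csc4_mv_ref(int m, int k, double alpha, const double *val,
                 const int *indx, const int *pntrb, const int *pntre,
                 const double *x, double beta, double *y)
{
    assert(pntrb[0] == 0);  /* zero-based sketch only */
    for (int i = 0; i < m; ++i)
        y[i] *= beta;  /* scale y once, then accumulate column by column */
    for (int j = 0; j < k; ++j) {
        double axj = alpha * x[j];
        /* Column j occupies positions pntrb[j]-pntrb[0] .. pntre[j]-pntrb[0]-1. */
        for (int p = pntrb[j] - pntrb[0]; p < pntre[j] - pntrb[0]; ++p)
            y[indx[p]] += val[p] * axj;
    }
}
```

The 2-by-3 matrix with rows (1, 0, 2), (0, 3, 0) in CSC form is val = {1, 3, 2}, indx = {0, 1, 0}, column pointers {0, 1, 2, 3}; with x = {1, 2, 3}, y = {1, 1}, alpha = 2 and beta = 10, the sketch produces y = {24, 22}. Note that scaling by beta first means a beta of 0 does not clear NaN values already present in y, which may differ from library behavior.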

mkl_?coomv
Computes matrix-vector product for a sparse matrix
in the coordinate format (deprecated).

Syntax
void mkl_scoomv (const char *transa , const MKL_INT *m , const MKL_INT *k , const float
*alpha , const char *matdescra , const float *val , const MKL_INT *rowind , const
MKL_INT *colind , const MKL_INT *nnz , const float *x , const float *beta , float *y );
void mkl_dcoomv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
double *alpha , const char *matdescra , const double *val , const MKL_INT *rowind ,
const MKL_INT *colind , const MKL_INT *nnz , const double *x , const double *beta ,
double *y );
void mkl_ccoomv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8 *val , const MKL_INT
*rowind , const MKL_INT *colind , const MKL_INT *nnz , const MKL_Complex8 *x , const
MKL_Complex8 *beta , MKL_Complex8 *y );
void mkl_zcoomv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16 *val , const MKL_INT
*rowind , const MKL_INT *colind , const MKL_INT *nnz , const MKL_Complex16 *x , const
MKL_Complex16 *beta , MKL_Complex16 *y );

Include Files
• mkl.h

Description

This routine is deprecated. Use mkl_sparse_?_mv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?coomv routine performs a matrix-vector operation defined as

y := alpha*A*x + beta*y
or

y := alpha*A^T*x + beta*y,
where:
alpha and beta are scalars,
x and y are vectors,
A is an m-by-k sparse matrix in the coordinate format, and A^T is the transpose of A.

NOTE
This routine supports the coordinate format with both one-based and zero-based indexing.


Input Parameters

transa Specifies the operation.

If transa = 'N' or 'n', then y := alpha*A*x + beta*y

If transa = 'T' or 't' or 'C' or 'c', then y := alpha*A^T*x + beta*y.

m Number of rows of the matrix A.

k Number of columns of the matrix A.

alpha Specifies the scalar alpha.

val Array of length nnz; contains the non-zero elements of the matrix A in
arbitrary order.
Refer to values array description in Coordinate Format for more details.

rowind Array of length nnz.

For one-based indexing, contains the row indices plus one for each non-zero
element of the matrix A.
For zero-based indexing, contains the row indices for each non-zero
element of the matrix A.
Refer to rows array description in Coordinate Format for more details.

colind Array of length nnz.

For one-based indexing, contains the column indices plus one for each non-
zero element of the matrix A.
For zero-based indexing, contains the column indices for each non-zero
element of the matrix A.
Refer to columns array description in Coordinate Format for more details.

nnz Specifies the number of non-zero elements of the matrix A.

Refer to nnz description in Coordinate Format for more details.

x Array, size at least k if transa = 'N' or 'n' and at least m otherwise.
On entry, the array x must contain the vector x.

beta Specifies the scalar beta.

y Array, size at least m if transa = 'N' or 'n' and at least k otherwise.
On entry, the array y must contain the vector y.

Output Parameters

y Overwritten by the updated vector y.
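In the coordinate format every stored entry carries its own row and column index, so the product is a single pass over the nnz entries in whatever order they are stored. The sketch below is hypothetical (coo_mv_ref is not part of oneMKL); it assumes zero-based indexing, double data and int indices, and it simply accumulates every stored entry, including repeated (row, column) pairs.

```c
/* Hypothetical reference loop, not the oneMKL implementation.
 * Computes y := alpha*op(A)*x + beta*y for an m-by-k matrix A in coordinate
 * format (zero-based, entries in any order). trans != 0 selects A^T. */
void coo_mv_ref(int trans, int m, int k, double alpha,
                const double *val, const int *rowind, const int *colind,
                int nnz, const double *x, double beta, double *y)
{
    int ylen = trans ? k : m;                  /* length of y */
    for (int i = 0; i < ylen; ++i)             /* BLAS-style beta == 0 rule (assumed) */
        y[i] = (beta == 0.0) ? 0.0 : beta * y[i];

    for (int p = 0; p < nnz; ++p) {
        int r = rowind[p], c = colind[p];
        if (!trans)
            y[r] += alpha * val[p] * x[c];     /* A(r,c)*x(c) contributes to y(r) */
        else
            y[c] += alpha * val[p] * x[r];     /* A^T(c,r) = A(r,c) */
    }
}
```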

mkl_?csrsv
Solves a system of linear equations for a sparse
matrix in the CSR format (deprecated).

Syntax
void mkl_scsrsv (const char *transa , const MKL_INT *m , const float *alpha , const
char *matdescra , const float *val , const MKL_INT *indx , const MKL_INT *pntrb , const
MKL_INT *pntre , const float *x , float *y );
void mkl_dcsrsv (const char *transa , const MKL_INT *m , const double *alpha , const
char *matdescra , const double *val , const MKL_INT *indx , const MKL_INT *pntrb ,
const MKL_INT *pntre , const double *x , double *y );
void mkl_ccsrsv (const char *transa , const MKL_INT *m , const MKL_Complex8 *alpha ,
const char *matdescra , const MKL_Complex8 *val , const MKL_INT *indx , const MKL_INT
*pntrb , const MKL_INT *pntre , const MKL_Complex8 *x , MKL_Complex8 *y );
void mkl_zcsrsv (const char *transa , const MKL_INT *m , const MKL_Complex16 *alpha ,
const char *matdescra , const MKL_Complex16 *val , const MKL_INT *indx , const MKL_INT
*pntrb , const MKL_INT *pntre , const MKL_Complex16 *x , MKL_Complex16 *y );

Include Files
• mkl.h

Description

This routine is deprecated. Use mkl_sparse_?_trsv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?csrsv routine solves a system of linear equations with matrix-vector operations for a sparse
matrix in the CSR format:

y := alpha*inv(A)*x
or

y := alpha*inv(A^T)*x,
where:
alpha is a scalar, x and y are vectors, A is a sparse upper or lower triangular matrix with unit or non-unit main
diagonal, and A^T is the transpose of A.

NOTE
This routine supports the CSR format with both one-based and zero-based indexing.

Input Parameters

transa Specifies the system of linear equations.

If transa = 'N' or 'n', then y := alpha*inv(A)*x

If transa = 'T' or 't' or 'C' or 'c', then y := alpha*inv(A^T)*x.

m Number of columns of the matrix A.

alpha Specifies the scalar alpha.


val Array containing non-zero elements of the matrix A.

Its length is pntre[m - 1] - pntrb[0].

Refer to values array description in CSR Format for more details.

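indx For one-based indexing, array containing the column indices plus one for
each non-zero element of the matrix A.
For zero-based indexing, array containing the column indices for each
non-zero element of the matrix A.
Its length is equal to the length of the val array.

Refer to columns array description in CSR Format for more details.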

NOTE
Column indices must be sorted in increasing order for each row.

pntrb Array of length m.

For each row i, pntrb[i] - pntrb[0] is the first index of row i in the
arrays val and indx.

Refer to pointerB array description in CSR Format for more details.

pntre Array of length m.

For each row i, pntre[i] - pntrb[0] - 1 is the last index of row i in the
arrays val and indx.

Refer to pointerE array description in CSR Format for more details.

x Array, size at least m.

On entry, the array x must contain the vector x. The elements are accessed
with unit increment.

y Array, size at least m.

On entry, the array y must contain the vector y. The elements are accessed
with unit increment.

Output Parameters

y Contains the solution vector.
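For a lower triangular matrix, y := alpha*inv(A)*x amounts to forward substitution on A*y = alpha*x, processing rows in order. The following is a hypothetical plain-C sketch (csr_lower_sv_ref is not a oneMKL function) that assumes zero-based indexing, op(A) = A, a non-unit diagonal stored explicitly in every row, and int in place of MKL_INT; the upper triangular, unit-diagonal and transposed cases that matdescra and transa can select are not covered.

```c
/* Hypothetical sketch, not the oneMKL implementation.
 * Solves A*y = alpha*x by forward substitution for a lower triangular
 * m-by-m matrix A in the four-array CSR variant (zero-based indexing).
 * Assumes op(A) = A and a non-unit diagonal stored in every row. */
void csr_lower_sv_ref(int m, double alpha, const double *val, const int *indx,
                      const int *pntrb, const int *pntre,
                      const double *x, double *y)
{
    for (int i = 0; i < m; ++i) {
        double sum = alpha * x[i];
        double diag = 0.0;
        for (int p = pntrb[i] - pntrb[0]; p < pntre[i] - pntrb[0]; ++p) {
            int j = indx[p];
            if (j < i)
                sum -= val[p] * y[j];   /* y[j] was solved in an earlier row */
            else if (j == i)
                diag = val[p];          /* diagonal entry A(i,i) */
            /* j > i lies above the diagonal and is ignored */
        }
        y[i] = sum / diag;
    }
}
```

With A = [2 0 0; 1 4 0; 0 2 1], alpha = 0.5 and x = {4, 18, 14}, the sketch produces y = {1, 2, 3}.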

mkl_?bsrsv
Solves a system of linear equations for a sparse
matrix in the BSR format (deprecated).

Syntax
void mkl_sbsrsv (const char *transa , const MKL_INT *m , const MKL_INT *lb , const
float *alpha , const char *matdescra , const float *val , const MKL_INT *indx , const
MKL_INT *pntrb , const MKL_INT *pntre , const float *x , float *y );
void mkl_dbsrsv (const char *transa , const MKL_INT *m , const MKL_INT *lb , const
double *alpha , const char *matdescra , const double *val , const MKL_INT *indx , const
MKL_INT *pntrb , const MKL_INT *pntre , const double *x , double *y );
void mkl_cbsrsv (const char *transa , const MKL_INT *m , const MKL_INT *lb , const
MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8 *val , const MKL_INT
*indx , const MKL_INT *pntrb , const MKL_INT *pntre , const MKL_Complex8 *x ,
MKL_Complex8 *y );
void mkl_zbsrsv (const char *transa , const MKL_INT *m , const MKL_INT *lb , const
MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16 *val , const MKL_INT
*indx , const MKL_INT *pntrb , const MKL_INT *pntre , const MKL_Complex16 *x ,
MKL_Complex16 *y );

Include Files
• mkl.h

Description

This routine is deprecated. Use mkl_sparse_?_trsv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?bsrsv routine solves a system of linear equations with matrix-vector operations for a sparse
matrix in the BSR format:

y := alpha*inv(A)*x
or

y := alpha*inv(A^T)*x,
where:
alpha is a scalar, x and y are vectors, A is a sparse upper or lower triangular matrix with unit or non-unit main
diagonal, and A^T is the transpose of A.

NOTE
This routine supports the BSR format with both one-based and zero-based indexing.

Input Parameters

transa Specifies the operation.

If transa = 'N' or 'n', then y := alpha*inv(A)*x

If transa = 'T' or 't' or 'C' or 'c', then y := alpha*inv(A^T)*x.

m Number of block columns of the matrix A.


lb Size of the block in the matrix A.

alpha Specifies the scalar alpha.

val Array containing elements of non-zero blocks of the matrix A. Its length is
equal to the number of non-zero blocks in the matrix A multiplied by lb*lb.

Refer to the values array description in BSR Format for more details.

pntrb Array of length m.

For each block row i, pntrb[i] - pntrb[0] is the first index of block row i
in the array indx.

Refer to pointerB array description in BSR Format for more details.

pntre Array of length m.

For each block row i, pntre[i] - pntrb[0] - 1 is the last index of block
row i in the array indx. This holds for both one-based and zero-based
indexing.
Refer to pointerE array description in BSR Format for more details.

x Array, size at least (m*lb).

On entry, the array x must contain the vector x. The elements are accessed
with unit increment.

y Array, size at least (m*lb).

On entry, the array y must contain the vector y. The elements are accessed
with unit increment.

Output Parameters

y Contains the solution vector.
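In BSR storage the same forward substitution runs over the scalar rows of each block row, using the block column index in indx to locate global columns. The sketch below (bsr_lower_sv_ref, hypothetical, not a oneMKL function) assumes zero-based indexing, op(A) = A, a non-unit diagonal, int in place of MKL_INT, and that each lb-by-lb block is stored row-major in val; the block storage order is an assumption of the sketch, so check the BSR Format description before relying on it.

```c
#include <stddef.h>

/* Hypothetical sketch, not the oneMKL implementation.
 * Solves A*y = alpha*x for a lower triangular matrix A with m block rows of
 * lb-by-lb blocks in the four-array BSR variant (zero-based indexing).
 * Assumptions: op(A) = A, non-unit diagonal, blocks stored row-major in val. */
void bsr_lower_sv_ref(int m, int lb, double alpha, const double *val,
                      const int *indx, const int *pntrb, const int *pntre,
                      const double *x, double *y)
{
    for (int bi = 0; bi < m; ++bi) {              /* block row */
        for (int r = 0; r < lb; ++r) {            /* row inside the block row */
            int row = bi * lb + r;                /* global row index */
            double sum = alpha * x[row];
            double diag = 0.0;
            for (int p = pntrb[bi] - pntrb[0]; p < pntre[bi] - pntrb[0]; ++p) {
                const double *blk = val + (size_t)p * lb * lb;  /* block p */
                for (int c = 0; c < lb; ++c) {
                    int col = indx[p] * lb + c;   /* global column index */
                    double a = blk[r * lb + c];
                    if (col < row)
                        sum -= a * y[col];        /* already solved */
                    else if (col == row)
                        diag = a;                 /* diagonal entry */
                }
            }
            y[row] = sum / diag;
        }
    }
}
```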

mkl_?cscsv
Solves a system of linear equations for a sparse
matrix in the CSC format (deprecated).

Syntax
void mkl_scscsv (const char *transa , const MKL_INT *m , const float *alpha , const
char *matdescra , const float *val , const MKL_INT *indx , const MKL_INT *pntrb , const
MKL_INT *pntre , const float *x , float *y );
void mkl_dcscsv (const char *transa , const MKL_INT *m , const double *alpha , const
char *matdescra , const double *val , const MKL_INT *indx , const MKL_INT *pntrb ,
const MKL_INT *pntre , const double *x , double *y );
void mkl_ccscsv (const char *transa , const MKL_INT *m , const MKL_Complex8 *alpha ,
const char *matdescra , const MKL_Complex8 *val , const MKL_INT *indx , const MKL_INT
*pntrb , const MKL_INT *pntre , const MKL_Complex8 *x , MKL_Complex8 *y );
void mkl_zcscsv (const char *transa , const MKL_INT *m , const MKL_Complex16 *alpha ,
const char *matdescra , const MKL_Complex16 *val , const MKL_INT *indx , const MKL_INT
*pntrb , const MKL_INT *pntre , const MKL_Complex16 *x , MKL_Complex16 *y );

Include Files
• mkl.h

Description

This routine is deprecated. Use mkl_sparse_?_trsv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?cscsv routine solves a system of linear equations with matrix-vector operations for a sparse
matrix in the CSC format:

y := alpha*inv(A)*x
or

y := alpha*inv(A^T)*x,
where:
alpha is a scalar, x and y are vectors, A is a sparse upper or lower triangular matrix with unit or non-unit main
diagonal, and A^T is the transpose of A.

NOTE
This routine supports the CSC format with both one-based and zero-based indexing.

Input Parameters

transa Specifies the operation.

If transa = 'N' or 'n', then y := alpha*inv(A)*x


If transa = 'T' or 't' or 'C' or 'c', then y := alpha*inv(A^T)*x.

m Number of columns of the matrix A.

alpha Specifies the scalar alpha.

val Array containing non-zero elements of the matrix A.

Its length is pntre[m-1] - pntrb[0].

Refer to values array description in CSC Format for more details.

indx For one-based indexing, array containing the row indices plus one for each
non-zero element of the matrix A.
For zero-based indexing, array containing the row indices for each non-zero
element of the matrix A.
Its length is equal to the length of the val array.

Refer to rows array description in CSC Format for more details.

NOTE
Row indices must be sorted in increasing order for each column.

pntrb Array of length m.

For each column i, pntrb[i] - pntrb[0] is the first index of column i in
the arrays val and indx.

Refer to pointerB array description in CSC Format for more details.

pntre Array of length m.

For each column i, pntre[i] - pntrb[0] - 1 is the last index of column i in
the arrays val and indx.

Refer to pointerE array description in CSC Format for more details.

x Array, size at least m.

On entry, the array x must contain the vector x. The elements are accessed
with unit increment.

y Array, size at least m.

On entry, the array y must contain the vector y. The elements are accessed
with unit increment.

Output Parameters

y Contains the solution vector.
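With CSC storage the natural order is column-oriented forward substitution: once y[j] is known, the entries below the diagonal in column j are subtracted from the remaining right-hand side. Below is a hypothetical sketch (csc_lower_sv_ref, not a oneMKL function) under the assumptions of zero-based indexing, op(A) = A, a non-unit diagonal stored in every column, and int in place of MKL_INT.

```c
/* Hypothetical sketch, not the oneMKL implementation.
 * Solves A*y = alpha*x for a lower triangular m-by-m matrix A in the
 * four-array CSC variant (zero-based indexing), column by column.
 * Assumes op(A) = A and a non-unit diagonal stored in every column. */
void csc_lower_sv_ref(int m, double alpha, const double *val, const int *indx,
                      const int *pntrb, const int *pntre,
                      const double *x, double *y)
{
    for (int i = 0; i < m; ++i)
        y[i] = alpha * x[i];                  /* right-hand side alpha*x */

    for (int j = 0; j < m; ++j) {
        int first = pntrb[j] - pntrb[0], end = pntre[j] - pntrb[0];
        double diag = 0.0;
        for (int p = first; p < end; ++p)     /* locate A(j,j) */
            if (indx[p] == j)
                diag = val[p];
        y[j] /= diag;                         /* y[j] is now final */
        for (int p = first; p < end; ++p)     /* eliminate rows below j */
            if (indx[p] > j)
                y[indx[p]] -= val[p] * y[j];
    }
}
```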

mkl_?coosv
Solves a system of linear equations for a sparse
matrix in the coordinate format (deprecated).

Syntax
void mkl_scoosv (const char *transa , const MKL_INT *m , const float *alpha , const
char *matdescra , const float *val , const MKL_INT *rowind , const MKL_INT *colind ,
const MKL_INT *nnz , const float *x , float *y );
void mkl_dcoosv (const char *transa , const MKL_INT *m , const double *alpha , const
char *matdescra , const double *val , const MKL_INT *rowind , const MKL_INT *colind ,
const MKL_INT *nnz , const double *x , double *y );
void mkl_ccoosv (const char *transa , const MKL_INT *m , const MKL_Complex8 *alpha ,
const char *matdescra , const MKL_Complex8 *val , const MKL_INT *rowind , const MKL_INT
*colind , const MKL_INT *nnz , const MKL_Complex8 *x , MKL_Complex8 *y );
void mkl_zcoosv (const char *transa , const MKL_INT *m , const MKL_Complex16 *alpha ,
const char *matdescra , const MKL_Complex16 *val , const MKL_INT *rowind , const
MKL_INT *colind , const MKL_INT *nnz , const MKL_Complex16 *x , MKL_Complex16 *y );

Include Files
• mkl.h

Description

This routine is deprecated. Use mkl_sparse_?_trsv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?coosv routine solves a system of linear equations with matrix-vector operations for a sparse
matrix in the coordinate format:

y := alpha*inv(A)*x
or

y := alpha*inv(A^T)*x,
where:
alpha is a scalar, x and y are vectors, A is a sparse upper or lower triangular matrix with unit or non-unit main
diagonal, and A^T is the transpose of A.

NOTE
This routine supports the coordinate format with both one-based and zero-based indexing.

Input Parameters


transa Specifies the system of linear equations.

If transa = 'N' or 'n', then y := alpha*inv(A)*x

If transa = 'T' or 't' or 'C' or 'c', then y := alpha*inv(A^T)*x.

m Number of rows of the matrix A.

alpha Specifies the scalar alpha.

val Array of length nnz; contains the non-zero elements of the matrix A in
arbitrary order.
Refer to values array description in Coordinate Format for more details.

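rowind Array of length nnz.

For one-based indexing, contains the row indices plus one for each non-zero
element of the matrix A.
For zero-based indexing, contains the row indices for each non-zero
element of the matrix A.
Refer to rows array description in Coordinate Format for more details.

colind Array of length nnz.

For one-based indexing, contains the column indices plus one for each
non-zero element of the matrix A.
For zero-based indexing, contains the column indices for each non-zero
element of the matrix A.
Refer to columns array description in Coordinate Format for more details.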

nnz Specifies the number of non-zero elements of the matrix A.

Refer to nnz description in Coordinate Format for more details.

x Array, size at least m.

On entry, the array x must contain the vector x. The elements are accessed
with unit increment.

y Array, size at least m.

On entry, the array y must contain the vector y. The elements are accessed
with unit increment.

Output Parameters

y Contains the solution vector.
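Because coordinate entries may appear in arbitrary order, a triangular solve first needs the entries grouped by row. This hypothetical sketch (coo_lower_sv_ref, not a oneMKL function) does that with a counting sort, which builds the same kind of row pointer array that CSR uses, and then performs forward substitution. It assumes zero-based indexing, op(A) = A, a lower triangular A with a non-unit diagonal present in every row, and int in place of MKL_INT; it returns -1 if an allocation fails.

```c
#include <stdlib.h>

/* Hypothetical sketch, not the oneMKL implementation.
 * Solves A*y = alpha*x for a lower triangular m-by-m COO matrix A
 * (zero-based, unordered entries). Returns 0 on success, -1 on OOM. */
int coo_lower_sv_ref(int m, double alpha, const double *val,
                     const int *rowind, const int *colind, int nnz,
                     const double *x, double *y)
{
    /* start[i]..start[i+1]-1 will index the entries of row i in perm */
    int *start = calloc((size_t)m + 1, sizeof *start);
    int *fill = malloc(((size_t)m + 1) * sizeof *fill);
    int *perm = malloc(((size_t)nnz + 1) * sizeof *perm);
    if (!start || !fill || !perm) {
        free(start); free(fill); free(perm);
        return -1;
    }

    for (int p = 0; p < nnz; ++p)          /* count entries per row */
        start[rowind[p] + 1]++;
    for (int i = 0; i < m; ++i)            /* prefix sum gives row pointers */
        start[i + 1] += start[i];
    for (int i = 0; i < m; ++i)
        fill[i] = start[i];
    for (int p = 0; p < nnz; ++p)          /* bucket entry p into its row */
        perm[fill[rowind[p]]++] = p;

    for (int i = 0; i < m; ++i) {          /* forward substitution */
        double sum = alpha * x[i], diag = 0.0;
        for (int q = start[i]; q < start[i + 1]; ++q) {
            int p = perm[q], j = colind[p];
            if (j < i)
                sum -= val[p] * y[j];
            else if (j == i)
                diag = val[p];
        }
        y[i] = sum / diag;
    }

    free(start); free(fill); free(perm);
    return 0;
}
```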

mkl_?csrmm
Computes a matrix-matrix product of a sparse matrix
stored in the CSR format (deprecated).

Syntax
void mkl_scsrmm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const float *alpha , const char *matdescra , const float *val , const
MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const float *b , const
MKL_INT *ldb , const float *beta , float *c , const MKL_INT *ldc );
void mkl_dcsrmm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const double *alpha , const char *matdescra , const double *val , const
MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const double *b , const
MKL_INT *ldb , const double *beta , double *c , const MKL_INT *ldc );
void mkl_ccsrmm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8
*val , const MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const
MKL_Complex8 *b , const MKL_INT *ldb , const MKL_Complex8 *beta , MKL_Complex8 *c ,
const MKL_INT *ldc );
void mkl_zcsrmm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16
*val , const MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const
MKL_Complex16 *b , const MKL_INT *ldb , const MKL_Complex16 *beta , MKL_Complex16 *c ,
const MKL_INT *ldc );

Include Files
• mkl.h

Description

This routine is deprecated. Use mkl_sparse_?_mm from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?csrmm routine performs a matrix-matrix operation defined as

C := alpha*A*B + beta*C
or

C := alpha*A^T*B + beta*C
or

C := alpha*A^H*B + beta*C,
where:
alpha and beta are scalars,
B and C are dense matrices, A is an m-by-k sparse matrix in compressed sparse row (CSR) format, A^T is the
transpose of A, and A^H is the conjugate transpose of A.

NOTE
This routine supports the CSR format with both one-based and zero-based indexing.

Input Parameters

transa Specifies the operation.


If transa = 'N' or 'n', then C := alpha*A*B + beta*C,

If transa = 'T' or 't', then C := alpha*A^T*B + beta*C,

If transa = 'C' or 'c', then C := alpha*A^H*B + beta*C.

m Number of rows of the matrix A.

n Number of columns of the matrix C.

k Number of columns of the matrix A.

alpha Specifies the scalar alpha.

matdescra Array of six elements, specifies properties of the matrix used for operation.
Only the first four array elements are used; their possible values are given in
Table "Possible Values of the Parameter matdescra (descra)". Possible
combinations of element values of this parameter are given in Table
"Possible Combinations of Element Values of the Parameter matdescra".

val Array containing non-zero elements of the matrix A.

Its length is pntre[m-1] - pntrb[0].

Refer to values array description in CSR Format for more details.

indx For one-based indexing, array containing the column indices plus one for
each non-zero element of the matrix A.
For zero-based indexing, array containing the column indices for each non-
zero element of the matrix A.
Its length is equal to the length of the val array.

Refer to columns array description in CSR Format for more details.

pntrb Array of length m.

For each row i, pntrb[i] - pntrb[0] is the first index of row i in the
arrays val and indx.

Refer to pointerB array description in CSR Format for more details.

pntre Array of length m.

For each row i, pntre[i] - pntrb[0] - 1 is the last index of row i in the
arrays val and indx.

Refer to pointerE array description in CSR Format for more details.

b Array. For one-based indexing, its size is ldb by at least n for
non-transposed A and at least m for transposed A; for zero-based indexing,
its size is (at least k for non-transposed A and at least m for transposed
A) by ldb.

On entry with transa='N' or 'n', the leading k-by-n part of the array b
must contain the matrix B, otherwise the leading m-by-n part of the array b
must contain the matrix B.

ldb Specifies the leading dimension of b for one-based indexing, and the second
dimension of b for zero-based indexing, as declared in the calling
(sub)program.

beta Specifies the scalar beta.

c Array, size ldc by n for one-based indexing, and (m, ldc) for zero-based
indexing.
On entry with transa = 'N' or 'n', the leading m-by-n part of the array c
must contain the matrix C, otherwise the leading k-by-n part of the array c
must contain the matrix C.

ldc Specifies the leading dimension of c for one-based indexing, and the second
dimension of c for zero-based indexing, as declared in the calling
(sub)program.

Output Parameters

c Overwritten by the matrix (alpha*A*B + beta*C), (alpha*A^T*B + beta*C),
or (alpha*A^H*B + beta*C).
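The (k, ldb) and (m, ldc) sizes given above for zero-based indexing suggest row-major dense storage with ldb and ldc as row strides; the hypothetical sketch below (csr_mm_ref, not a oneMKL function) assumes that layout and real data. Each stored A(i, col) adds a scaled row of B to a row of C, and for A^T the two row roles are swapped. For real data the conjugate-transpose case coincides with A^T, so it is not shown separately.

```c
#include <stddef.h>

/* Hypothetical sketch, not the oneMKL implementation.
 * C := alpha*op(A)*B + beta*C for an m-by-k CSR matrix A (four-array
 * variant, zero-based) and row-major dense B and C with row strides ldb
 * and ldc (assumed layout). trans != 0 selects op(A) = A^T. */
void csr_mm_ref(int trans, int m, int n, int k, double alpha,
                const double *val, const int *indx,
                const int *pntrb, const int *pntre,
                const double *b, int ldb, double beta, double *c, int ldc)
{
    int crows = trans ? k : m;                 /* rows of C */
    for (int i = 0; i < crows; ++i)            /* BLAS-style beta == 0 rule (assumed) */
        for (int j = 0; j < n; ++j)
            c[(size_t)i * ldc + j] =
                (beta == 0.0) ? 0.0 : beta * c[(size_t)i * ldc + j];

    for (int i = 0; i < m; ++i) {              /* row i of A */
        for (int p = pntrb[i] - pntrb[0]; p < pntre[i] - pntrb[0]; ++p) {
            int col = indx[p];
            double a = alpha * val[p];
            /* A(i,col) pairs row col of B with row i of C; for A^T it
               pairs row i of B with row col of C */
            const double *brow = b + (size_t)(trans ? i : col) * ldb;
            double *crow = c + (size_t)(trans ? col : i) * ldc;
            for (int j = 0; j < n; ++j)
                crow[j] += a * brow[j];
        }
    }
}
```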

mkl_?bsrmm
Computes a matrix-matrix product of a sparse matrix
stored in the BSR format (deprecated).

Syntax
void mkl_sbsrmm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const MKL_INT *lb , const float *alpha , const char *matdescra , const
float *val , const MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const
float *b , const MKL_INT *ldb , const float *beta , float *c , const MKL_INT *ldc );
void mkl_dbsrmm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const MKL_INT *lb , const double *alpha , const char *matdescra , const
double *val , const MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const
double *b , const MKL_INT *ldb , const double *beta , double *c , const MKL_INT *ldc );
void mkl_cbsrmm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const MKL_INT *lb , const MKL_Complex8 *alpha , const char *matdescra ,
const MKL_Complex8 *val , const MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT
*pntre , const MKL_Complex8 *b , const MKL_INT *ldb , const MKL_Complex8 *beta ,
MKL_Complex8 *c , const MKL_INT *ldc );
void mkl_zbsrmm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const MKL_INT *lb , const MKL_Complex16 *alpha , const char *matdescra ,
const MKL_Complex16 *val , const MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT
*pntre , const MKL_Complex16 *b , const MKL_INT *ldb , const MKL_Complex16 *beta ,
MKL_Complex16 *c , const MKL_INT *ldc );

Include Files
• mkl.h

Description

This routine is deprecated. Use mkl_sparse_?_mm from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?bsrmm routine performs a matrix-matrix operation defined as

C := alpha*A*B + beta*C
or

C := alpha*A^T*B + beta*C
or

C := alpha*A^H*B + beta*C,
where:
alpha and beta are scalars,
B and C are dense matrices, A is an m-by-k sparse matrix in block sparse row (BSR) format, A^T is the
transpose of A, and A^H is the conjugate transpose of A.

NOTE
This routine supports the BSR format with both one-based and zero-based indexing.

Input Parameters

transa Specifies the operation.

If transa = 'N' or 'n', then the matrix-matrix product is computed as
C := alpha*A*B + beta*C
If transa = 'T' or 't', then the matrix-matrix product is computed as
C := alpha*A^T*B + beta*C
If transa = 'C' or 'c', then the matrix-matrix product is computed as
C := alpha*A^H*B + beta*C.

m Number of block rows of the matrix A.

n Number of columns of the matrix C.

k Number of block columns of the matrix A.

lb Size of the block in the matrix A.

alpha Specifies the scalar alpha.

val Array containing elements of non-zero blocks of the matrix A. Its length is
equal to the number of non-zero blocks in the matrix A multiplied by lb*lb.
Refer to the values array description in BSR Format for more details.

indx For one-based indexing, array containing the column indices plus one for
each non-zero block in the matrix A.
For zero-based indexing, array containing the column indices for each non-
zero block in the matrix A.
Its length is equal to the number of non-zero blocks in the matrix A. Refer
to the columns array description in BSR Format for more details.

pntrb Array of length m.

For each block row i, pntrb[i] - pntrb[0] is the first index of block row i
in the array indx.

Refer to pointerB array description in BSR Format for more details.

pntre Array of length m.

For each block row i, pntre[i] - pntrb[0] - 1 is the last index of block
row i in the array indx.

Refer to pointerE array description in BSR Format for more details.

b On entry with transa = 'N' or 'n', the leading k-by-n block part of the
array b must contain the matrix B, otherwise the leading m-by-n block part
of the array b must contain the matrix B.

ldb Specifies the leading dimension (in blocks) of b as declared in the calling
(sub)program.

beta Specifies the scalar beta.

c Array, size ldc*n for one-based indexing, size k*ldc for zero-based
indexing.
On entry with transa = 'N' or 'n', the leading m-by-n block part of the
array c must contain the matrix C, otherwise the leading k-by-n block part
of the array c must contain the matrix C.

ldc Specifies the leading dimension (in blocks) of c as declared in the calling
(sub)program.

Output Parameters

c Overwritten by the matrix (alpha*A*B + beta*C) or (alpha*A^T*B + beta*C)
or (alpha*A^H*B + beta*C).
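A BSR product can be read as the CSR product with every scalar replaced by an lb-by-lb block. The hypothetical sketch below (bsr_mm_ref, not a oneMKL function) shows only op(A) = A with zero-based indexing. It assumes that each block is stored row-major in val and that B ((k*lb)-by-n) and C ((m*lb)-by-n) are packed row-major with exactly n columns, so it sidesteps the block-based ldb and ldc described above.

```c
#include <stddef.h>

/* Hypothetical sketch, not the oneMKL implementation.
 * C := alpha*A*B + beta*C where A has m block rows and k block columns of
 * lb-by-lb blocks (four-array BSR variant, zero-based indexing).
 * Assumptions: blocks stored row-major in val; B and C packed row-major
 * with exactly n columns. */
void bsr_mm_ref(int m, int n, int k, int lb, double alpha,
                const double *val, const int *indx,
                const int *pntrb, const int *pntre,
                const double *b, double beta, double *c)
{
    (void)k;  /* B's row count k*lb is implied by the block column indices */
    for (size_t i = 0; i < (size_t)m * lb * n; ++i)
        c[i] = (beta == 0.0) ? 0.0 : beta * c[i];

    for (int bi = 0; bi < m; ++bi)                        /* block row of A */
        for (int p = pntrb[bi] - pntrb[0]; p < pntre[bi] - pntrb[0]; ++p) {
            const double *blk = val + (size_t)p * lb * lb;
            for (int r = 0; r < lb; ++r)                  /* row within block */
                for (int s = 0; s < lb; ++s) {            /* column within block */
                    double a = alpha * blk[r * lb + s];
                    const double *brow = b + ((size_t)indx[p] * lb + s) * n;
                    double *crow = c + ((size_t)bi * lb + r) * n;
                    for (int j = 0; j < n; ++j)
                        crow[j] += a * brow[j];
                }
        }
}
```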

mkl_?cscmm
Computes matrix-matrix product of a sparse matrix
stored in the CSC format (deprecated).

Syntax
void mkl_scscmm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const float *alpha , const char *matdescra , const float *val , const
MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const float *b , const
MKL_INT *ldb , const float *beta , float *c , const MKL_INT *ldc );
void mkl_dcscmm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const double *alpha , const char *matdescra , const double *val , const
MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const double *b , const
MKL_INT *ldb , const double *beta , double *c , const MKL_INT *ldc );


void mkl_ccscmm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8
*val , const MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const
MKL_Complex8 *b , const MKL_INT *ldb , const MKL_Complex8 *beta , MKL_Complex8 *c ,
const MKL_INT *ldc );
void mkl_zcscmm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16
*val , const MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const
MKL_Complex16 *b , const MKL_INT *ldb , const MKL_Complex16 *beta , MKL_Complex16 *c ,
const MKL_INT *ldc );

Include Files
• mkl.h

Description

This routine is deprecated. Use mkl_sparse_?_mm from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?cscmm routine performs a matrix-matrix operation defined as

C := alpha*A*B + beta*C
or

C := alpha*A^T*B + beta*C,
or

C := alpha*A^H*B + beta*C,
where:
alpha and beta are scalars,
B and C are dense matrices, A is an m-by-k sparse matrix in compressed sparse column (CSC) format, A^T is
the transpose of A, and A^H is the conjugate transpose of A.

NOTE
This routine supports the CSC format with both one-based and zero-based indexing.

Input Parameters

transa Specifies the operation.

If transa = 'N' or 'n', then C := alpha*A*B + beta*C

If transa = 'T' or 't', then C := alpha*A^T*B + beta*C,

If transa = 'C' or 'c', then C := alpha*A^H*B + beta*C

m Number of rows of the matrix A.

n Number of columns of the matrix C.

k Number of columns of the matrix A.

alpha Specifies the scalar alpha.

matdescra Array of six elements, specifies properties of the matrix used for operation.
Only the first four array elements are used; their possible values are given in
Table "Possible Values of the Parameter matdescra (descra)". Possible
combinations of element values of this parameter are given in Table
"Possible Combinations of Element Values of the Parameter matdescra".

val Array containing non-zero elements of the matrix A.

Its length is pntre[k-1] - pntrb[0].

Refer to values array description in CSC Format for more details.

indx For one-based indexing, array containing the row indices plus one for each
non-zero element of the matrix A.
For zero-based indexing, array containing the row indices for each
non-zero element of the matrix A.
Its length is equal to the length of the val array.

Refer to rows array description in CSC Format for more details.

pntrb Array of length k.

For each column i, pntrb[i] - pntrb[0] is the first index of column i in
the arrays val and indx.

Refer to pointerB array description in CSC Format for more details.

pntre Array of length k.

For each column i, pntre[i] - pntrb[0] - 1 is the last index of column i in
the arrays val and indx.

Refer to pointerE array description in CSC Format for more details.

b On entry with transa = 'N' or 'n', the leading k-by-n part of the array b
must contain the matrix B, otherwise the leading m-by-n part of the array b
must contain the matrix B.

ldb Specifies the leading dimension of b for one-based indexing, and the second
dimension of b for zero-based indexing, as declared in the calling
(sub)program.

beta Specifies the scalar beta.

c Array, size ldc by n for one-based indexing, and (m, ldc) for zero-based
indexing.
On entry with transa = 'N' or 'n', the leading m-by-n part of the array c
must contain the matrix C, otherwise the leading k-by-n part of the array c
must contain the matrix C.

ldc Specifies the leading dimension of c for one-based indexing, and the second
dimension of c for zero-based indexing, as declared in the calling
(sub)program.


Output Parameters

c Overwritten by the matrix (alpha*A*B + beta*C) or (alpha*A^T*B + beta*C)
or (alpha*A^H*B + beta*C).
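The same operation looks slightly different under one-based conventions: indx stores row indices plus one, pntrb[0] is 1, and, as the "ldc by n" size of c above suggests, the dense matrices are taken to be column-major. The hypothetical sketch below (csc_mm_onebased_ref, not a oneMKL function) covers only op(A) = A and uses int in place of MKL_INT; the column-major layout is an assumption of the sketch.

```c
#include <stddef.h>

/* Hypothetical sketch, not the oneMKL implementation.
 * C := alpha*A*B + beta*C for an m-by-k CSC matrix A with one-based
 * indexing (indx holds row indices plus one, pntrb[0] == 1). B (k-by-n)
 * and C (m-by-n) are assumed column-major with ldb >= k and ldc >= m. */
void csc_mm_onebased_ref(int m, int n, int k, double alpha,
                         const double *val, const int *indx,
                         const int *pntrb, const int *pntre,
                         const double *b, int ldb, double beta,
                         double *c, int ldc)
{
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            double *cij = &c[i + (size_t)j * ldc];
            *cij = (beta == 0.0) ? 0.0 : beta * *cij;
        }

    for (int col = 0; col < k; ++col)              /* column of A */
        for (int p = pntrb[col] - pntrb[0]; p < pntre[col] - pntrb[0]; ++p) {
            int row = indx[p] - 1;                 /* convert to zero-based */
            double a = alpha * val[p];
            for (int j = 0; j < n; ++j)            /* C(row,j) += A(row,col)*B(col,j) */
                c[row + (size_t)j * ldc] += a * b[col + (size_t)j * ldb];
        }
}
```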

mkl_?coomm
Computes matrix-matrix product of a sparse matrix
stored in the coordinate format (deprecated).

Syntax
void mkl_scoomm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const float *alpha , const char *matdescra , const float *val , const
MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const float *b , const
MKL_INT *ldb , const float *beta , float *c , const MKL_INT *ldc );
void mkl_dcoomm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const double *alpha , const char *matdescra , const double *val , const
MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const double *b , const
MKL_INT *ldb , const double *beta , double *c , const MKL_INT *ldc );
void mkl_ccoomm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8
*val , const MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const
MKL_Complex8 *b , const MKL_INT *ldb , const MKL_Complex8 *beta , MKL_Complex8 *c ,
const MKL_INT *ldc );
void mkl_zcoomm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16
*val , const MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const
MKL_Complex16 *b , const MKL_INT *ldb , const MKL_Complex16 *beta , MKL_Complex16 *c ,
const MKL_INT *ldc );

Include Files
• mkl.h

Description

This routine is deprecated. Use mkl_sparse_?_mm from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?coomm routine performs a matrix-matrix operation defined as

C := alpha*A*B + beta*C
or

C := alpha*A^T*B + beta*C,
or

C := alpha*AH*B + beta*C,
where:
alpha and beta are scalars,
B and C are dense matrices, A is an m-by-k sparse matrix in the coordinate format, AT is the transpose of A,
and AH is the conjugate transpose of A.

NOTE
This routine supports the coordinate format with both one-based and zero-based indexing.

Input Parameters

transa Specifies the operation.

If transa = 'N' or 'n', then C := alpha*A*B + beta*C

If transa = 'T' or 't', then C := alpha*AT*B + beta*C,

If transa = 'C' or 'c', then C := alpha*AH*B + beta*C.

m Number of rows of the matrix A.

n Number of columns of the matrix C.

k Number of columns of the matrix A.

alpha Specifies the scalar alpha.

val Array of length nnz, contains non-zero elements of the matrix A in
arbitrary order.
Refer to values array description in Coordinate Format for more details.

rowind Array of length nnz.

colind Array of length nnz.

nnz Specifies the number of non-zero elements of the matrix A.

Refer to nnz description in Coordinate Format for more details.

On entry with transa = 'N' or 'n', the leading k-by-n part of the array b
must contain the matrix B, otherwise the leading m-by-n part of the array b
must contain the matrix B.

ldb Specifies the leading dimension of b for one-based indexing, and the second
dimension of b for zero-based indexing, as declared in the calling
(sub)program.

beta Specifies the scalar beta.

ldc Specifies the leading dimension of c for one-based indexing, and the second
dimension of c for zero-based indexing, as declared in the calling
(sub)program.

Output Parameters

c Overwritten by the matrix (alpha*A*B + beta*C), (alpha*AT*B + beta*C), or (alpha*AH*B + beta*C).
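
To make the operation concrete, the following plain-C sketch computes what mkl_dcoomm computes for a general matrix with zero-based indexing and transa = 'N' or 'T'. It does not call oneMKL and is not its implementation; the function coo_mm_ref, its plain int indices, and the omission of matdescra are simplifications for illustration. Following the ldb and ldc descriptions above, it assumes that with zero-based indexing b and c are stored row by row, so ldb and ldc are their second dimensions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference for C := alpha*op(A)*B + beta*C, where A is an
 * m-by-k general sparse matrix in zero-based COO format and op(A) is A
 * (trans == 0) or AT (trans != 0). B and C are row-major, so element
 * (i, j) of B is b[i*ldb + j]. Not the oneMKL implementation. */
void coo_mm_ref(int trans, int m, int n, int k, double alpha,
                const double *val, const int *rowind, const int *colind,
                int nnz, const double *b, int ldb, double beta,
                double *c, int ldc)
{
    int rows_c = trans ? k : m;   /* C has as many rows as op(A) */
    assert(ldb >= n && ldc >= n);

    /* Scale C by beta; this sketch treats beta == 0 as "overwrite C". */
    for (int i = 0; i < rows_c; ++i)
        for (int j = 0; j < n; ++j) {
            double *cij = &c[(size_t)i * ldc + j];
            *cij = (beta == 0.0) ? 0.0 : beta * *cij;
        }

    /* Stored entry p is A(rowind[p], colind[p]); as entry (r, s) of op(A)
     * it adds alpha*val[p]*B(s, :) to C(r, :). Entry order does not matter. */
    for (int p = 0; p < nnz; ++p) {
        int r = trans ? colind[p] : rowind[p];   /* row of op(A) */
        int s = trans ? rowind[p] : colind[p];   /* column of op(A) */
        for (int j = 0; j < n; ++j)
            c[(size_t)r * ldc + j] += alpha * val[p] * b[(size_t)s * ldb + j];
    }
}
```

A dense-loop reference like this is mainly useful for checking results on small inputs, for example when migrating calls to mkl_sparse_?_mm.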

mkl_?csrsm
Solves a system of linear matrix equations for a
sparse matrix in the CSR format (deprecated).

Syntax
void mkl_scsrsm (const char *transa , const MKL_INT *m , const MKL_INT *n , const float
*alpha , const char *matdescra , const float *val , const MKL_INT *indx , const MKL_INT
*pntrb , const MKL_INT *pntre , const float *b , const MKL_INT *ldb , float *c , const
MKL_INT *ldc );
void mkl_dcsrsm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
double *alpha , const char *matdescra , const double *val , const MKL_INT *indx , const
MKL_INT *pntrb , const MKL_INT *pntre , const double *b , const MKL_INT *ldb , double
*c , const MKL_INT *ldc );
void mkl_ccsrsm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8 *val , const MKL_INT
*indx , const MKL_INT *pntrb , const MKL_INT *pntre , const MKL_Complex8 *b , const
MKL_INT *ldb , MKL_Complex8 *c , const MKL_INT *ldc );
void mkl_zcsrsm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16 *val , const MKL_INT
*indx , const MKL_INT *pntrb , const MKL_INT *pntre , const MKL_Complex16 *b , const
MKL_INT *ldb , MKL_Complex16 *c , const MKL_INT *ldc );

Include Files
• mkl.h

Description

This routine is deprecated. Use mkl_sparse_?_trsm from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?csrsm routine solves a system of linear equations with matrix-matrix operations for a sparse
matrix in the CSR format:

C := alpha*inv(A)*B
or

C := alpha*inv(AT)*B,
where:
alpha is a scalar, B and C are dense matrices, A is a sparse upper or lower triangular matrix with a unit or non-unit
main diagonal, and AT is the transpose of A.

NOTE
This routine supports the CSR format with both one-based and zero-based indexing.

Input Parameters

transa Specifies the system of linear equations.

If transa = 'N' or 'n', then C := alpha*inv(A)*B

If transa = 'T' or 't' or 'C' or 'c', then C := alpha*inv(AT)*B.

m Number of columns of the matrix A.

n Number of columns of the matrix C.

alpha Specifies the scalar alpha.

val Array containing non-zero elements of the matrix A.

For zero-based indexing its length is pntre[m-1] - pntrb[0].

Refer to values array description in CSR Format for more details.

indx Array containing the column indices of the non-zero elements of the matrix A.
Its length is equal to the length of the val array.

Refer to columns array description in CSR Format for more details.

NOTE
Column indices must be sorted in increasing order for each row.

pntrb Array of length m.

This array contains row indices, such that pntrb[i] - pntrb[0] is the
first index of row i in the arrays val and indx.

Refer to pointerB array description in CSR Format for more details.

pntre Array of length m.

For zero-based indexing this array contains row indices, such that
pntre[i] - pntrb[0] - 1 is the last index of row i in the arrays val and
indx.
Refer to pointerE array description in CSR Format for more details.

b Array, size ldb* n for one-based indexing, and (m, ldb) for zero-based
indexing.
On entry the leading m-by-n part of the array b must contain the matrix B.

ldb Specifies the leading dimension of b for one-based indexing, and the second
dimension of b for zero-based indexing, as declared in the calling
(sub)program.

ldc Specifies the leading dimension of c for one-based indexing, and the second
dimension of c for zero-based indexing, as declared in the calling
(sub)program.

Output Parameters

c Array, size ldc by n for one-based indexing, and (m, ldc) for zero-based
indexing.
The leading m-by-n part of the array c contains the output matrix C.
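
The triangular solve can be sketched in plain C as row-by-row forward substitution. The sketch below covers only one case: transa = 'N' with a lower triangular, non-unit A in zero-based CSR format. It assumes row-major b and c, as implied by the ldb and ldc descriptions for zero-based indexing, and uses plain int indices. The name csr_lower_trsm_ref is hypothetical; this is a reference for checking results, not the oneMKL algorithm, and it simply ignores entries above the diagonal.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference for C := alpha*inv(A)*B with transa = 'N', where
 * A is m-by-m, lower triangular with a non-unit diagonal, and stored in
 * zero-based CSR format (val, indx, pntrb, pntre). B and C are row-major
 * with leading (second) dimensions ldb and ldc. Entries above the diagonal
 * are ignored. */
void csr_lower_trsm_ref(int m, int n, double alpha, const double *val,
                        const int *indx, const int *pntrb, const int *pntre,
                        const double *b, int ldb, double *c, int ldc)
{
    int base = pntrb[0];   /* positions in val/indx are relative to pntrb[0] */
    for (int i = 0; i < m; ++i) {
        double *ci = &c[(size_t)i * ldc];
        double diag = 0.0;
        for (int j = 0; j < n; ++j)
            ci[j] = alpha * b[(size_t)i * ldb + j];
        /* Subtract contributions of rows solved earlier (columns < i). */
        for (int p = pntrb[i] - base; p < pntre[i] - base; ++p) {
            int col = indx[p];
            if (col < i) {
                const double *ccol = &c[(size_t)col * ldc];
                for (int j = 0; j < n; ++j)
                    ci[j] -= val[p] * ccol[j];
            } else if (col == i) {
                diag = val[p];
            }
        }
        assert(diag != 0.0);   /* a non-unit A must store its diagonal */
        for (int j = 0; j < n; ++j)
            ci[j] /= diag;
    }
}
```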

mkl_?cscsm
Solves a system of linear matrix equations for a
sparse matrix in the CSC format (deprecated).

Syntax
void mkl_scscsm (const char *transa , const MKL_INT *m , const MKL_INT *n , const float
*alpha , const char *matdescra , const float *val , const MKL_INT *indx , const MKL_INT
*pntrb , const MKL_INT *pntre , const float *b , const MKL_INT *ldb , float *c , const
MKL_INT *ldc );
void mkl_dcscsm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
double *alpha , const char *matdescra , const double *val , const MKL_INT *indx , const
MKL_INT *pntrb , const MKL_INT *pntre , const double *b , const MKL_INT *ldb , double
*c , const MKL_INT *ldc );

void mkl_ccscsm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8 *val , const MKL_INT
*indx , const MKL_INT *pntrb , const MKL_INT *pntre , const MKL_Complex8 *b , const
MKL_INT *ldb , MKL_Complex8 *c , const MKL_INT *ldc );
void mkl_zcscsm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16 *val , const MKL_INT
*indx , const MKL_INT *pntrb , const MKL_INT *pntre , const MKL_Complex16 *b , const
MKL_INT *ldb , MKL_Complex16 *c , const MKL_INT *ldc );

Include Files
• mkl.h

Description

This routine is deprecated. Use mkl_sparse_?_trsm from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?cscsm routine solves a system of linear equations with matrix-matrix operations for a sparse
matrix in the CSC format:

C := alpha*inv(A)*B
or

C := alpha*inv(AT)*B,
where:
alpha is a scalar, B and C are dense matrices, A is a sparse upper or lower triangular matrix with a unit or non-unit
main diagonal, and AT is the transpose of A.

NOTE
This routine supports the CSC format with both one-based and zero-based indexing.

Input Parameters

transa Specifies the system of equations.

If transa = 'N' or 'n', then C := alpha*inv(A)*B

If transa = 'T' or 't' or 'C' or 'c', then C := alpha*inv(AT)*B.

m Number of columns of the matrix A.

n Number of columns of the matrix C.

alpha Specifies the scalar alpha.

val Array containing non-zero elements of the matrix A.

For zero-based indexing its length is pntre[m-1] - pntrb[0].

Refer to values array description in CSC Format for more details.

indx For one-based indexing, array containing the row indices plus one for each
non-zero element of the matrix A. For zero-based indexing, array containing
the row indices for each non-zero element of the matrix A.
Refer to rows array description in CSC Format for more details.

NOTE
Row indices must be sorted in increasing order for each column.

pntrb Array of length m.

This array contains column indices, such that pntrb[i] - pntrb[0] is the
first index of column i in the arrays val and indx.

Refer to pointerB array description in CSC Format for more details.

pntre Array of length m.

This array contains column indices, such that pntre[i] - pntrb[0] - 1 is
the last index of column i in the arrays val and indx.

Refer to pointerE array description in CSC Format for more details.

b Array, size ldb by n for one-based indexing, and (m, ldb) for zero-based
indexing.
On entry the leading m-by-n part of the array b must contain the matrix B.

ldb Specifies the leading dimension of b for one-based indexing, and the second
dimension of b for zero-based indexing, as declared in the calling
(sub)program.

ldc Specifies the leading dimension of c for one-based indexing, and the second
dimension of c for zero-based indexing, as declared in the calling
(sub)program.

Output Parameters

c Array, size ldc by n for one-based indexing, and (m, ldc) for zero-based
indexing.
The leading m-by-n part of the array c contains the output matrix C.
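
With CSC storage the same lower triangular solve is naturally column-oriented: once row j of C is final, column j of A updates every row below it. The following plain-C sketch (hypothetical name csc_lower_trsm_ref, plain int indices) covers only transa = 'N' with a lower triangular, non-unit A in zero-based CSC format, and assumes row-major b and c. It illustrates the technique under those assumptions and is not the oneMKL implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference for C := alpha*inv(A)*B with transa = 'N', where
 * A is m-by-m, lower triangular with a non-unit diagonal, and stored in
 * zero-based CSC format (val, indx = row indices, pntrb, pntre). B and C
 * are row-major with leading (second) dimensions ldb and ldc. */
void csc_lower_trsm_ref(int m, int n, double alpha, const double *val,
                        const int *indx, const int *pntrb, const int *pntre,
                        const double *b, int ldb, double *c, int ldc)
{
    int base = pntrb[0];
    for (int i = 0; i < m; ++i)              /* start from C = alpha*B */
        for (int j = 0; j < n; ++j)
            c[(size_t)i * ldc + j] = alpha * b[(size_t)i * ldb + j];

    for (int col = 0; col < m; ++col) {
        int first = pntrb[col] - base, last = pntre[col] - base;
        double *cc = &c[(size_t)col * ldc];
        double diag = 0.0;
        for (int p = first; p < last; ++p)   /* locate the diagonal entry */
            if (indx[p] == col)
                diag = val[p];
        assert(diag != 0.0);
        for (int j = 0; j < n; ++j)          /* row col of C is now final */
            cc[j] /= diag;
        for (int p = first; p < last; ++p) { /* update the rows below it */
            int row = indx[p];
            if (row > col)
                for (int j = 0; j < n; ++j)
                    c[(size_t)row * ldc + j] -= val[p] * cc[j];
        }
    }
}
```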

mkl_?coosm
Solves a system of linear matrix equations for a
sparse matrix in the coordinate format (deprecated).

Syntax
void mkl_scoosm (const char *transa , const MKL_INT *m , const MKL_INT *n , const float
*alpha , const char *matdescra , const float *val , const MKL_INT *rowind , const
MKL_INT *colind , const MKL_INT *nnz , const float *b , const MKL_INT *ldb , float *c ,
const MKL_INT *ldc );
void mkl_dcoosm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
double *alpha , const char *matdescra , const double *val , const MKL_INT *rowind ,
const MKL_INT *colind , const MKL_INT *nnz , const double *b , const MKL_INT *ldb ,
double *c , const MKL_INT *ldc );
void mkl_ccoosm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8 *val , const MKL_INT
*rowind , const MKL_INT *colind , const MKL_INT *nnz , const MKL_Complex8 *b , const
MKL_INT *ldb , MKL_Complex8 *c , const MKL_INT *ldc );
void mkl_zcoosm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16 *val , const MKL_INT
*rowind , const MKL_INT *colind , const MKL_INT *nnz , const MKL_Complex16 *b , const
MKL_INT *ldb , MKL_Complex16 *c , const MKL_INT *ldc );

Include Files
• mkl.h

Description

This routine is deprecated. Use mkl_sparse_?_trsm from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?coosm routine solves a system of linear equations with matrix-matrix operations for a sparse
matrix in the coordinate format:

C := alpha*inv(A)*B
or

C := alpha*inv(AT)*B,
where:
alpha is a scalar, B and C are dense matrices, A is a sparse upper or lower triangular matrix with a unit or non-unit
main diagonal, and AT is the transpose of A.

NOTE
This routine supports the coordinate format with both one-based and zero-based indexing.

Input Parameters

transa Specifies the system of linear equations.

If transa = 'N' or 'n', then the solution is computed as
C := alpha*inv(A)*B

If transa = 'T' or 't' or 'C' or 'c', then the solution is computed as
C := alpha*inv(AT)*B.

m Number of rows of the matrix A.

n Number of columns of the matrix C.

alpha Specifies the scalar alpha.

val Array of length nnz, contains non-zero elements of the matrix A in
arbitrary order.
Refer to values array description in Coordinate Format for more details.

rowind Array of length nnz.

colind Array of length nnz.

For one-based indexing, contains the column indices plus one for each
non-zero element of the matrix A.
For zero-based indexing, contains the column indices for each non-zero
element of the matrix A.
Refer to columns array description in Coordinate Format for more details.

nnz Specifies the number of non-zero elements of the matrix A.

Refer to nnz description in Coordinate Format for more details.

b Array, size ldb by n for one-based indexing, and (m, ldb) for zero-based
indexing.
Before entry the leading m-by-n part of the array b must contain the matrix
B.

ldb Specifies the leading dimension of b for one-based indexing, and the second
dimension of b for zero-based indexing, as declared in the calling
(sub)program.

ldc Specifies the leading dimension of c for one-based indexing, and the second
dimension of c for zero-based indexing, as declared in the calling
(sub)program.

Output Parameters

c Array, size ldc by n for one-based indexing, and (m, ldc) for zero-based
indexing.

The leading m-by-n part of the array c contains the output matrix C.
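
Because coordinate-format entries may appear in arbitrary order, a straightforward reference has to search all entries for each row. The plain-C sketch below (hypothetical name coo_lower_trsm_ref, plain int indices) performs forward substitution for transa = 'N' with a lower triangular, non-unit A in zero-based coordinate format, assuming row-major b and c. Its O(m*nnz) cost makes it suitable only for checking small cases; it is not how oneMKL implements the routine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference for C := alpha*inv(A)*B with transa = 'N', where
 * A is m-by-m, lower triangular with a non-unit diagonal, and stored in
 * zero-based COO format with entries in any order. B and C are row-major
 * with leading (second) dimensions ldb and ldc. */
void coo_lower_trsm_ref(int m, int n, double alpha, const double *val,
                        const int *rowind, const int *colind, int nnz,
                        const double *b, int ldb, double *c, int ldc)
{
    for (int i = 0; i < m; ++i) {
        double *ci = &c[(size_t)i * ldc];
        double diag = 0.0;
        for (int j = 0; j < n; ++j)
            ci[j] = alpha * b[(size_t)i * ldb + j];
        /* Entries are unordered, so scan all of them for row i. Rows are
         * solved in increasing order, so any column < i is already final. */
        for (int p = 0; p < nnz; ++p) {
            if (rowind[p] != i)
                continue;
            if (colind[p] < i) {
                const double *cprev = &c[(size_t)colind[p] * ldc];
                for (int j = 0; j < n; ++j)
                    ci[j] -= val[p] * cprev[j];
            } else if (colind[p] == i) {
                diag = val[p];
            }
        }
        assert(diag != 0.0);
        for (int j = 0; j < n; ++j)
            ci[j] /= diag;
    }
}
```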

mkl_?bsrsm
Solves a system of linear matrix equations for a
sparse matrix in the BSR format (deprecated).

Syntax
void mkl_sbsrsm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *lb , const float *alpha , const char *matdescra , const float *val , const
MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const float *b , const
MKL_INT *ldb , float *c , const MKL_INT *ldc );
void mkl_dbsrsm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *lb , const double *alpha , const char *matdescra , const double *val , const
MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const double *b , const
MKL_INT *ldb , double *c , const MKL_INT *ldc );
void mkl_cbsrsm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *lb , const MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8
*val , const MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const
MKL_Complex8 *b , const MKL_INT *ldb , MKL_Complex8 *c , const MKL_INT *ldc );
void mkl_zbsrsm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *lb , const MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16
*val , const MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const
MKL_Complex16 *b , const MKL_INT *ldb , MKL_Complex16 *c , const MKL_INT *ldc );

Include Files
• mkl.h

Description

This routine is deprecated. Use mkl_sparse_?_trsm from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?bsrsm routine solves a system of linear equations with matrix-matrix operations for a sparse
matrix in the BSR format:

C := alpha*inv(A)*B
or

C := alpha*inv(AT)*B,
where:
alpha is a scalar, B and C are dense matrices, A is a sparse upper or lower triangular matrix with a unit or non-unit
main diagonal, and AT is the transpose of A.

NOTE
This routine supports the BSR format with both one-based and zero-based indexing.

Input Parameters

transa Specifies the operation.

If transa = 'N' or 'n', then the solution is computed as
C := alpha*inv(A)*B.
If transa = 'T' or 't' or 'C' or 'c', then the solution is computed as
C := alpha*inv(AT)*B.

m Number of block columns of the matrix A.

n Number of columns of the matrix C.

lb Size of the block in the matrix A.

alpha Specifies the scalar alpha.

pntrb Array of length m.

This array contains row indices, such that pntrb[i] - pntrb[0] is the
first index of block row i in the array indx.

Refer to pointerB array description in BSR Format for more details.

pntre Array of length m.

This array contains row indices, such that pntre[i] - pntrb[0] - 1 is
the last index of block row i in the arrays val and indx.

Refer to pointerE array description in BSR Format for more details.

b Array, size ldb* n for one-based indexing, size m* ldb for zero-based
indexing.
On entry the leading m-by-n part of the array b must contain the matrix B.

ldb Specifies the leading dimension (in blocks) of b as declared in the calling
(sub)program.

ldc Specifies the leading dimension (in blocks) of c as declared in the calling
(sub)program.

Output Parameters

c Array, size ldc* n for one-based indexing, size m* ldc for zero-based
indexing.
The leading m-by-n part of the array c contains the output matrix C.
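
When checking mkl_?bsrsm inputs against a dense solver, it can help to expand the BSR matrix first. The plain-C helper below (hypothetical name bsr_to_dense, plain int indices) assumes zero-based indexing, that block p (counted from pntrb[0]) occupies lb*lb consecutive entries of val starting at p*lb*lb, and that indx holds block column indices. The element order inside each block depends on the convention described in BSR Format, so the sketch takes it as a parameter rather than assuming one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: expand an m-by-m (in blocks) zero-based BSR matrix
 * with lb-by-lb blocks into a dense row-major array of size
 * (m*lb) by (m*lb). Block p (counted from pntrb[0]) occupies
 * val[p*lb*lb .. p*lb*lb + lb*lb - 1] and lies in block column indx[p].
 * block_row_major selects the element order inside each block. */
void bsr_to_dense(int m, int lb, const double *val, const int *indx,
                  const int *pntrb, const int *pntre, int block_row_major,
                  double *dense)
{
    int dim = m * lb;
    int base = pntrb[0];
    assert(lb > 0);
    for (size_t t = 0; t < (size_t)dim * dim; ++t)
        dense[t] = 0.0;
    for (int bi = 0; bi < m; ++bi)                      /* block row */
        for (int p = pntrb[bi] - base; p < pntre[bi] - base; ++p) {
            int bj = indx[p];                           /* block column */
            const double *blk = val + (size_t)p * lb * lb;
            for (int r = 0; r < lb; ++r)
                for (int s = 0; s < lb; ++s) {
                    double v = block_row_major ? blk[r * lb + s]
                                               : blk[s * lb + r];
                    dense[(size_t)(bi * lb + r) * dim + (size_t)(bj * lb + s)] = v;
                }
        }
}
```

The dense copy can then be passed to any dense triangular solver to cross-check a small mkl_?bsrsm result.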

mkl_?diamv
Computes matrix-vector product for a sparse matrix
in the diagonal format with one-based indexing
(deprecated).

Syntax
void mkl_sdiamv (const char *transa , const MKL_INT *m , const MKL_INT *k , const float
*alpha , const char *matdescra , const float *val , const MKL_INT *lval , const MKL_INT
*idiag , const MKL_INT *ndiag , const float *x , const float *beta , float *y );
void mkl_ddiamv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
double *alpha , const char *matdescra , const double *val , const MKL_INT *lval , const
MKL_INT *idiag , const MKL_INT *ndiag , const double *x , const double *beta , double
*y );
void mkl_cdiamv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8 *val , const MKL_INT
*lval , const MKL_INT *idiag , const MKL_INT *ndiag , const MKL_Complex8 *x , const
MKL_Complex8 *beta , MKL_Complex8 *y );
void mkl_zdiamv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16 *val , const MKL_INT
*lval , const MKL_INT *idiag , const MKL_INT *ndiag , const MKL_Complex16 *x , const
MKL_Complex16 *beta , MKL_Complex16 *y );

Include Files
• mkl.h

Description

This routine is deprecated, but no replacement is available yet in the Inspector-executor Sparse BLAS API.
You can continue using this routine until a replacement is provided and the routine is fully removed.
The mkl_?diamv routine performs a matrix-vector operation defined as

y := alpha*A*x + beta*y
or

y := alpha*AT*x + beta*y,
where:
alpha and beta are scalars,

x and y are vectors,

A is an m-by-k sparse matrix stored in the diagonal format, AT is the transpose of A.

NOTE
This routine supports only one-based indexing of the input arrays.

Input Parameters

transa Specifies the operation.

If transa = 'N' or 'n', then y := alpha*A*x + beta*y,

If transa = 'T' or 't' or 'C' or 'c', then y := alpha*AT*x + beta*y.

m Number of rows of the matrix A.

k Number of columns of the matrix A.

alpha Specifies the scalar alpha.

val Two-dimensional array of size lval by ndiag, contains non-zero diagonals
of the matrix A. Refer to values array description in Diagonal Storage
Scheme for more details.

lval Leading dimension of val, lval ≥ m. Refer to lval description in Diagonal
Storage Scheme for more details.

idiag Array of length ndiag, contains the distances between the main diagonal and
each non-zero diagonal of the matrix A.
Refer to distance array description in Diagonal Storage Scheme for more
details.

ndiag Specifies the number of non-zero diagonals of the matrix A.

x Array, size at least k if transa = 'N' or 'n', and at least m otherwise.
On entry, the array x must contain the vector x.

beta Specifies the scalar beta.

y Array, size at least m if transa = 'N' or 'n', and at least k otherwise.
On entry, the array y must contain the vector y.

Output Parameters

y Overwritten by the updated vector y.
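
For transa = 'N', the diagonal format reduces to a short double loop. The plain-C sketch below (hypothetical name dia_mv_ref, plain int indices) assumes, as in the Diagonal Storage Scheme, that element A(i, i+idiag[d]) is stored in row i, column d of val, with val laid out column by column using leading dimension lval as the lval description implies. Positions that fall outside A are treated as padding and skipped. Because idiag holds distances rather than indices, ordinary zero-based loops suffice. It illustrates the operation only and does not call oneMKL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference for y := alpha*A*x + beta*y (transa = 'N') with an
 * m-by-k matrix A in diagonal storage. Column d of val (leading dimension
 * lval) holds the diagonal at distance idiag[d], so A(i, i+idiag[d]) is
 * val[i + d*lval]. idiag holds distances, so zero-based loops work. */
void dia_mv_ref(int m, int k, double alpha, const double *val, int lval,
                const int *idiag, int ndiag, const double *x,
                double beta, double *y)
{
    assert(lval >= m);
    for (int i = 0; i < m; ++i) {
        double sum = 0.0;
        for (int d = 0; d < ndiag; ++d) {
            int j = i + idiag[d];            /* column of this element */
            if (j >= 0 && j < k)             /* skip padding outside A */
                sum += val[(size_t)d * lval + i] * x[j];
        }
        y[i] = alpha * sum + (beta == 0.0 ? 0.0 : beta * y[i]);
    }
}
```

For a tridiagonal matrix, for example, idiag = {-1, 0, 1} and val holds the sub-, main, and super-diagonals as its three columns.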

mkl_?skymv
Computes matrix-vector product for a sparse matrix
in the skyline storage format with one-based indexing
(deprecated).

Syntax
void mkl_sskymv (const char *transa , const MKL_INT *m , const MKL_INT *k , const float
*alpha , const char *matdescra , const float *val , const MKL_INT *pntr , const float
*x , const float *beta , float *y );
void mkl_dskymv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
double *alpha , const char *matdescra , const double *val , const MKL_INT *pntr , const
double *x , const double *beta , double *y );
void mkl_cskymv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8 *val , const MKL_INT
*pntr , const MKL_Complex8 *x , const MKL_Complex8 *beta , MKL_Complex8 *y );
void mkl_zskymv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16 *val , const MKL_INT
*pntr , const MKL_Complex16 *x , const MKL_Complex16 *beta , MKL_Complex16 *y );

Include Files
• mkl.h

Description

This routine is deprecated, but no replacement is available yet in the Inspector-executor Sparse BLAS API.
You can continue using this routine until a replacement is provided and the routine is fully removed.
The mkl_?skymv routine performs a matrix-vector operation defined as

y := alpha*A*x + beta*y
or

y := alpha*AT*x + beta*y,
where:
alpha and beta are scalars,
x and y are vectors,
A is an m-by-k sparse matrix stored using the skyline storage scheme, AT is the transpose of A.

NOTE
This routine supports only one-based indexing of the input arrays.

Input Parameters

transa Specifies the operation.

If transa = 'N' or 'n', then y := alpha*A*x + beta*y

If transa = 'T' or 't' or 'C' or 'c', then y := alpha*AT*x + beta*y,

m Number of rows of the matrix A.

k Number of columns of the matrix A.

alpha Specifies the scalar alpha.

NOTE
General matrices (matdescra[0]='G') are not supported.

val Array containing the set of elements of the matrix A in the skyline profile
form.
If matdescra[1] = 'L', then val contains elements from the lower triangle
of the matrix A.
If matdescra[1] = 'U', then val contains elements from the upper
triangle of the matrix A.
Refer to values array description in Skyline Storage Scheme for more
details.

pntr Array of length (m + 1) for lower triangle, and (k + 1) for upper triangle.

It contains the indices specifying the positions in val of the first
element in each row (column) of the matrix A. Refer to pointers array
description in Skyline Storage Scheme for more details.

x Array, size at least k if transa = 'N' or 'n' and at least m otherwise.
On entry, the array x must contain the vector x.

beta Specifies the scalar beta.

y Array, size at least m if transa = 'N' or 'n' and at least k otherwise.
On entry, the array y must contain the vector y.

Output Parameters

y Overwritten by the updated vector y.
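
A sketch of the skyline product for the lower triangular, square case with transa = 'N' follows. It assumes the layout described in Skyline Storage Scheme: each row is stored contiguously from its first stored element up to and including the diagonal, and pntr holds the one-based positions of the row starts, with pntr[m] marking the end of the last row. The function name sky_lower_mv_ref and its plain int indices are hypothetical, and the code only illustrates the indexing; it is not the oneMKL implementation.

```c
#include <assert.h>

/* Hypothetical reference for y := alpha*A*x + beta*y (transa = 'N') where A
 * is m-by-m lower triangular in one-based skyline storage. Assumed layout:
 * row i (zero-based here) occupies val[pntr[i]-1 .. pntr[i+1]-2], from its
 * first stored column up to and including the diagonal; pntr[0] == 1. */
void sky_lower_mv_ref(int m, double alpha, const double *val,
                      const int *pntr, const double *x, double beta,
                      double *y)
{
    assert(pntr[0] == 1);
    for (int i = 0; i < m; ++i) {
        int len = pntr[i + 1] - pntr[i];   /* stored elements in row i */
        int first_col = i - len + 1;       /* zero-based first column */
        double sum = 0.0;
        for (int t = 0; t < len; ++t)
            sum += val[pntr[i] - 1 + t] * x[first_col + t];
        y[i] = alpha * sum + (beta == 0.0 ? 0.0 : beta * y[i]);
    }
}
```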

mkl_?diasv
Solves a system of linear equations for a sparse
matrix in the diagonal format with one-based indexing
(deprecated).

Syntax
void mkl_sdiasv (const char *transa , const MKL_INT *m , const float *alpha , const
char *matdescra , const float *val , const MKL_INT *lval , const MKL_INT *idiag , const
MKL_INT *ndiag , const float *x , float *y );
void mkl_ddiasv (const char *transa , const MKL_INT *m , const double *alpha , const
char *matdescra , const double *val , const MKL_INT *lval , const MKL_INT *idiag ,
const MKL_INT *ndiag , const double *x , double *y );

void mkl_cdiasv (const char *transa , const MKL_INT *m , const MKL_Complex8 *alpha ,
const char *matdescra , const MKL_Complex8 *val , const MKL_INT *lval , const MKL_INT
*idiag , const MKL_INT *ndiag , const MKL_Complex8 *x , MKL_Complex8 *y );
void mkl_zdiasv (const char *transa , const MKL_INT *m , const MKL_Complex16 *alpha ,
const char *matdescra , const MKL_Complex16 *val , const MKL_INT *lval , const MKL_INT
*idiag , const MKL_INT *ndiag , const MKL_Complex16 *x , MKL_Complex16 *y );

Include Files
• mkl.h

Description

This routine is deprecated, but no replacement is available yet in the Inspector-executor Sparse BLAS API.
You can continue using this routine until a replacement is provided and the routine is fully removed.
The mkl_?diasv routine solves a system of linear equations with matrix-vector operations for a sparse
matrix stored in the diagonal format:

y := alpha*inv(A)*x
or

y := alpha*inv(AT)* x,
where:
alpha is a scalar, x and y are vectors, A is a sparse upper or lower triangular matrix with a unit or non-unit main
diagonal, and AT is the transpose of A.

NOTE
This routine supports only one-based indexing of the input arrays.

Input Parameters

transa Specifies the system of linear equations.

If transa = 'N' or 'n', then y := alpha*inv(A)*x

If transa = 'T' or 't' or 'C' or 'c', then y := alpha*inv(AT)*x.

m Number of rows of the matrix A.

alpha Specifies the scalar alpha.

val Two-dimensional array of size lval by ndiag, contains non-zero diagonals
of the matrix A. Refer to values array description in Diagonal Storage
Scheme for more details.

lval Leading dimension of val, lval ≥ m. Refer to lval description in Diagonal
Storage Scheme for more details.

idiag Array of length ndiag, contains the distances between the main diagonal and
each non-zero diagonal of the matrix A.

NOTE
All elements of this array must be sorted in increasing order.

Refer to distance array description in Diagonal Storage Scheme for more
details.

ndiag Specifies the number of non-zero diagonals of the matrix A.

x Array, size at least m.

On entry, the array x must contain the vector x. The elements are accessed
with unit increment.

y Array, size at least m.

On entry, the array y must contain the vector y. The elements are accessed
with unit increment.

Output Parameters

y Contains the solution vector.
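
The lower triangular case of this solve can be sketched as forward substitution over the stored diagonals. The plain-C sketch below (hypothetical name dia_lower_sv_ref, plain int indices) assumes transa = 'N', a non-unit lower triangular A whose main diagonal appears in idiag with distance 0, and a column-by-column val layout in which A(i, i+idiag[d]) is stored in row i, column d. Diagonals with positive distance are ignored. It reads the right-hand side from x and writes the solution into y; it is an illustration, not the oneMKL algorithm.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference for y := alpha*inv(A)*x (transa = 'N') where A is
 * m-by-m lower triangular with a non-unit diagonal in diagonal storage:
 * A(i, i+idiag[d]) is val[i + d*lval], the main diagonal has idiag[d] == 0,
 * and diagonals with idiag[d] > 0 are ignored. */
void dia_lower_sv_ref(int m, double alpha, const double *val, int lval,
                      const int *idiag, int ndiag, const double *x,
                      double *y)
{
    assert(lval >= m);
    for (int i = 0; i < m; ++i) {
        double s = alpha * x[i];
        double diag = 0.0;
        for (int d = 0; d < ndiag; ++d) {
            int j = i + idiag[d];
            if (idiag[d] < 0 && j >= 0)
                s -= val[(size_t)d * lval + i] * y[j];   /* y[j] is solved */
            else if (idiag[d] == 0)
                diag = val[(size_t)d * lval + i];
        }
        assert(diag != 0.0);
        y[i] = s / diag;
    }
}
```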

mkl_?skysv
Solves a system of linear equations for a sparse
matrix in the skyline format with one-based indexing
(deprecated).

Syntax
void mkl_sskysv (const char *transa , const MKL_INT *m , const float *alpha , const
char *matdescra , const float *val , const MKL_INT *pntr , const float *x , float *y );
void mkl_dskysv (const char *transa , const MKL_INT *m , const double *alpha , const
char *matdescra , const double *val , const MKL_INT *pntr , const double *x , double
*y );
void mkl_cskysv (const char *transa , const MKL_INT *m , const MKL_Complex8 *alpha ,
const char *matdescra , const MKL_Complex8 *val , const MKL_INT *pntr , const
MKL_Complex8 *x , MKL_Complex8 *y );
void mkl_zskysv (const char *transa , const MKL_INT *m , const MKL_Complex16 *alpha ,
const char *matdescra , const MKL_Complex16 *val , const MKL_INT *pntr , const
MKL_Complex16 *x , MKL_Complex16 *y );

Include Files
• mkl.h

Description

The mkl_?skysv routine solves a system of linear equations with matrix-vector operations for a sparse
matrix in the skyline storage format:

y := alpha*inv(A)*x
or

y := alpha*inv(AT)*x,
where:
alpha is a scalar, x and y are vectors, A is a sparse upper or lower triangular matrix with a unit or non-unit main
diagonal, and AT is the transpose of A.

NOTE
This routine supports only one-based indexing of the input arrays.

Input Parameters

transa Specifies the system of linear equations.

If transa = 'N' or 'n', then y := alpha*inv(A)*x

If transa = 'T' or 't' or 'C' or 'c', then y := alpha*inv(AT)*x.

m Number of rows of the matrix A.

alpha Specifies the scalar alpha.

NOTE
General matrices (matdescra[0]='G') are not supported.

val Array containing the set of elements of the matrix A in the skyline profile
form.
If matdescra[1] = 'L', then val contains elements from the lower triangle
of the matrix A.
If matdescra[1] = 'U', then val contains elements from the upper
triangle of the matrix A.
Refer to values array description in Skyline Storage Scheme for more
details.

pntr Array of length (m + 1).

It contains the indices specifying the positions in val of the first
element in each row (column) of the matrix A. Refer to pointers array
description in Skyline Storage Scheme for more details.

x Array, size at least m.

On entry, the array x must contain the vector x. The elements are accessed
with unit increment.

y Array, size at least m.

On entry, the array y must contain the vector y. The elements are accessed
with unit increment.

Output Parameters

y Contains the solution vector.
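
Assuming the layout described in Skyline Storage Scheme (each row stored contiguously and ending with its diagonal element, with one-based pntr of length m + 1), the lower triangular solve with transa = 'N' becomes a forward substitution in which the last stored element of each row is the divisor. The function name sky_lower_sv_ref and its plain int indices are hypothetical, and the sketch does not call oneMKL.

```c
#include <assert.h>

/* Hypothetical reference for y := alpha*inv(A)*x (transa = 'N') where A is
 * m-by-m lower triangular with a non-unit diagonal in one-based skyline
 * storage: row i (zero-based here) occupies val[pntr[i]-1 .. pntr[i+1]-2]
 * and ends with the diagonal element; pntr[0] == 1. */
void sky_lower_sv_ref(int m, double alpha, const double *val,
                      const int *pntr, const double *x, double *y)
{
    assert(pntr[0] == 1);
    for (int i = 0; i < m; ++i) {
        int start = pntr[i] - 1;           /* zero-based offset into val */
        int len = pntr[i + 1] - pntr[i];
        int first_col = i - len + 1;
        double s = alpha * x[i];
        assert(len > 0);                   /* every row stores its diagonal */
        /* All stored elements except the last lie left of the diagonal. */
        for (int t = 0; t < len - 1; ++t)
            s -= val[start + t] * y[first_col + t];
        assert(val[start + len - 1] != 0.0);
        y[i] = s / val[start + len - 1];
    }
}
```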

mkl_?diamm
Computes matrix-matrix product of a sparse matrix
stored in the diagonal format with one-based indexing
(deprecated).

Syntax
void mkl_sdiamm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const float *alpha , const char *matdescra , const float *val , const
MKL_INT *lval , const MKL_INT *idiag , const MKL_INT *ndiag , const float *b , const
MKL_INT *ldb , const float *beta , float *c , const MKL_INT *ldc );
void mkl_ddiamm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const double *alpha , const char *matdescra , const double *val , const
MKL_INT *lval , const MKL_INT *idiag , const MKL_INT *ndiag , const double *b , const
MKL_INT *ldb , const double *beta , double *c , const MKL_INT *ldc );
void mkl_cdiamm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8
*val , const MKL_INT *lval , const MKL_INT *idiag , const MKL_INT *ndiag , const
MKL_Complex8 *b , const MKL_INT *ldb , const MKL_Complex8 *beta , MKL_Complex8 *c ,
const MKL_INT *ldc );
void mkl_zdiamm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16
*val , const MKL_INT *lval , const MKL_INT *idiag , const MKL_INT *ndiag , const
MKL_Complex16 *b , const MKL_INT *ldb , const MKL_Complex16 *beta , MKL_Complex16 *c ,
const MKL_INT *ldc );

Include Files
• mkl.h

Description

This routine is deprecated, but no replacement is available yet in the Inspector-executor Sparse BLAS API.
You can continue using this routine until a replacement is provided and the routine is fully removed.
The mkl_?diamm routine performs a matrix-matrix operation defined as

C := alpha*A*B + beta*C
or

C := alpha*AT*B + beta*C,

or

C := alpha*AH*B + beta*C,
where:
alpha and beta are scalars,
B and C are dense matrices, A is an m-by-k sparse matrix in the diagonal format, AT is the transpose of A,
and AH is the conjugate transpose of A.

NOTE
This routine supports only one-based indexing of the input arrays.

Input Parameters

transa Specifies the operation.

If transa = 'N' or 'n', then C := alpha*A*B + beta*C,

If transa = 'T' or 't', then C := alpha*AT*B + beta*C,

If transa = 'C' or 'c', then C := alpha*AH*B + beta*C.

m Number of rows of the matrix A.

n Number of columns of the matrix C.

k Number of columns of the matrix A.

alpha Specifies the scalar alpha.

val Two-dimensional array of size lval by ndiag, contains non-zero diagonals
of the matrix A. Refer to values array description in Diagonal Storage
Scheme for more details.

lval Leading dimension of val, lval ≥ m. Refer to lval description in Diagonal
Storage Scheme for more details.

idiag Array of length ndiag, contains the distances between the main diagonal and
each non-zero diagonal of the matrix A.
Refer to distance array description in Diagonal Storage Scheme for more
details.

ndiag Specifies the number of non-zero diagonals of the matrix A.

b Array, size ldb* n.

On entry with transa = 'N' or 'n', the leading k-by-n part of the array b
must contain the matrix B, otherwise the leading m-by-n part of the array b
must contain the matrix B.

ldb Specifies the leading dimension of b as declared in the calling
(sub)program.

beta Specifies the scalar beta.

c Array, size ldc by n.

On entry with transa = 'N' or 'n', the leading m-by-n part of the array c must contain the matrix C,
otherwise the leading k-by-n part of the array c must contain the matrix C.

ldc Specifies the leading dimension of c as declared in the calling
(sub)program.

Output Parameters

c Overwritten by the matrix (alpha*A*B + beta*C), (alpha*A^T*B +
beta*C), or (alpha*A^H*B + beta*C).

mkl_?skymm
Computes matrix-matrix product of a sparse matrix
stored using the skyline storage scheme with one-
based indexing (deprecated).

Syntax
void mkl_sskymm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const float *alpha , const char *matdescra , const float *val , const
MKL_INT *pntr , const float *b , const MKL_INT *ldb , const float *beta , float *c ,
const MKL_INT *ldc );
void mkl_dskymm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const double *alpha , const char *matdescra , const double *val , const
MKL_INT *pntr , const double *b , const MKL_INT *ldb , const double *beta , double *c ,
const MKL_INT *ldc );
void mkl_cskymm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8
*val , const MKL_INT *pntr , const MKL_Complex8 *b , const MKL_INT *ldb , const
MKL_Complex8 *beta , MKL_Complex8 *c , const MKL_INT *ldc );
void mkl_zskymm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16
*val , const MKL_INT *pntr , const MKL_Complex16 *b , const MKL_INT *ldb , const
MKL_Complex16 *beta , MKL_Complex16 *c , const MKL_INT *ldc );

Include Files
• mkl.h

Description

This routine is deprecated, but no replacement is available yet in the Inspector-Executor Sparse BLAS API
interfaces. You can continue to use this routine until a replacement is provided and the routine can be fully removed.
The mkl_?skymm routine performs a matrix-matrix operation defined as

C := alpha*A*B + beta*C
or

C := alpha*A^T*B + beta*C,
or

C := alpha*A^H*B + beta*C,
where:
alpha and beta are scalars,
B and C are dense matrices, A is an m-by-k sparse matrix in the skyline storage format, A^T is the transpose
of A, and A^H is the conjugate transpose of A.

NOTE
This routine supports only one-based indexing of the input arrays.

Input Parameters

transa Specifies the operation.

If transa = 'N' or 'n', then C := alpha*A*B + beta*C,

If transa = 'T' or 't', then C := alpha*A^T*B + beta*C,

If transa = 'C' or 'c', then C := alpha*A^H*B + beta*C.

m Number of rows of the matrix A.

n Number of columns of the matrix C.

k Number of columns of the matrix A.

alpha Specifies the scalar alpha.

NOTE
General matrices (matdescra[0]='G') are not supported.

val Array containing the set of elements of the matrix A in the skyline profile
form.
If matdescra[2]= 'L', then val contains elements from the lower triangle
of the matrix A.
If matdescra[2]= 'U', then val contains elements from the upper
triangle of the matrix A.
Refer to values array description in Skyline Storage Scheme for more
details.

pntr Array of length (m + 1) for the lower triangle, and (k + 1) for the upper triangle.
It contains the indices specifying the positions of the first element of the
matrix A in each row (for the lower triangle) or column (for the upper triangle)
in the val array, such that val[pntr[i] - 1] is the first element in row or
column i + 1. Refer to pointers array description in Skyline Storage
Scheme for more details.

b Array, size ldb* n.

On entry with transa = 'N' or 'n', the leading k-by-n part of the array b
must contain the matrix B, otherwise the leading m-by-n part of the array b
must contain the matrix B.

ldb Specifies the leading dimension of b as declared in the calling
(sub)program.

beta Specifies the scalar beta.

c Array, size ldc by n.

On entry with transa = 'N' or 'n', the leading m-by-n part of the array c must contain the matrix C,
otherwise the leading k-by-n part of the array c must contain the matrix C.

ldc Specifies the leading dimension of c as declared in the calling
(sub)program.

Output Parameters

c Overwritten by the matrix (alpha*A*B + beta*C), (alpha*A^T*B +
beta*C), or (alpha*A^H*B + beta*C).
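The pntr convention above can be read as follows: row i (0-based) of a lower-triangular skyline matrix occupies val[pntr[i]-1] through val[pntr[i+1]-2] and ends on the diagonal. The sketch below uses that reading to form C := alpha*A*B + beta*C for a square lower-triangular A with an explicitly stored (non-unit) diagonal. It is a hypothetical plain-C reference (ref_dskymm_lower is not a oneMKL function), and it assumes column-major b and c.

```c
#include <stddef.h>

/* Hypothetical sketch (not oneMKL): C := alpha*A*B + beta*C for an m-by-m
 * lower-triangular A in skyline format with one-based pntr:
 *   row r (0-based) is val[pntr[r]-1 .. pntr[r+1]-2], ending on the diagonal,
 *   so it spans columns r - len + 1 .. r, where len = pntr[r+1] - pntr[r].
 * b and c are assumed column-major with leading dimensions ldb and ldc.
 */
void ref_dskymm_lower(int m, int n, double alpha, const double *val,
                      const int *pntr, const double *b, int ldb,
                      double beta, double *c, int ldc)
{
    for (int r = 0; r < m; ++r) {
        int first = pntr[r] - 1;            /* 0-based offset into val */
        int len = pntr[r + 1] - pntr[r];    /* stored entries in row r */
        int col0 = r - len + 1;             /* column of val[first]    */
        for (int j = 0; j < n; ++j) {
            double sum = 0.0;
            for (int t = 0; t < len; ++t)
                sum += val[first + t] * b[(col0 + t) + (size_t)j * ldb];
            double *cij = &c[r + (size_t)j * ldc];
            /* beta == 0 overwrites C instead of scaling it */
            *cij = alpha * sum + (beta == 0.0 ? 0.0 : beta * *cij);
        }
    }
}
```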

mkl_?diasm
Solves a system of linear matrix equations for a
sparse matrix in the diagonal format with one-based
indexing (deprecated).

Syntax
void mkl_sdiasm (const char *transa , const MKL_INT *m , const MKL_INT *n , const float
*alpha , const char *matdescra , const float *val , const MKL_INT *lval , const MKL_INT
*idiag , const MKL_INT *ndiag , const float *b , const MKL_INT *ldb , float *c , const
MKL_INT *ldc );
void mkl_ddiasm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
double *alpha , const char *matdescra , const double *val , const MKL_INT *lval , const
MKL_INT *idiag , const MKL_INT *ndiag , const double *b , const MKL_INT *ldb , double
*c , const MKL_INT *ldc );
void mkl_cdiasm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8 *val , const MKL_INT
*lval , const MKL_INT *idiag , const MKL_INT *ndiag , const MKL_Complex8 *b , const
MKL_INT *ldb , MKL_Complex8 *c , const MKL_INT *ldc );
void mkl_zdiasm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16 *val , const MKL_INT
*lval , const MKL_INT *idiag , const MKL_INT *ndiag , const MKL_Complex16 *b , const
MKL_INT *ldb , MKL_Complex16 *c , const MKL_INT *ldc );

Include Files
• mkl.h

Description

This routine is deprecated, but no replacement is available yet in the Inspector-Executor Sparse BLAS API
interfaces. You can continue to use this routine until a replacement is provided and the routine can be fully removed.
The mkl_?diasm routine solves a system of linear equations with matrix-matrix operations for a sparse
matrix in the diagonal format:

C := alpha*inv(A)*B
or

C := alpha*inv(A^T)*B,
where:
alpha is a scalar, B and C are dense matrices, A is a sparse upper or lower triangular matrix with unit or
non-unit main diagonal, A^T is the transpose of A.
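For a lower-triangular A, C := alpha*inv(A)*B amounts to forward substitution on each column of B. The following plain-C sketch (ref_ddiasm_lower is a hypothetical name, not a oneMKL function) covers only transa = 'N', a lower-triangular A with a non-unit diagonal, and real data. It assumes val is column-major, so that A(i, i + idiag[d]) == val[i + d*lval] for 0-based row i, that b and c are column-major, and that the main diagonal (distance 0) is present in idiag.

```c
#include <stddef.h>

/* Hypothetical sketch (not oneMKL): C := alpha*inv(A)*B for an m-by-m
 * lower-triangular A with a non-unit main diagonal, in diagonal format:
 *   A(i, i + idiag[d]) == val[i + d*lval]   (0-based i, idiag[d] <= 0).
 * b and c are assumed column-major. Returns 0 on success, or -1 if the
 * main diagonal (distance 0) is not among the stored diagonals.
 */
int ref_ddiasm_lower(int m, int n, double alpha, const double *val, int lval,
                     const int *idiag, int ndiag, const double *b, int ldb,
                     double *c, int ldc)
{
    int dmain = -1;
    for (int d = 0; d < ndiag; ++d)
        if (idiag[d] == 0)
            dmain = d;
    if (dmain < 0)
        return -1;

    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {          /* forward substitution */
            double s = alpha * b[i + (size_t)j * ldb];
            for (int d = 0; d < ndiag; ++d) {
                int col = i + idiag[d];
                /* strictly lower entries use already computed C(col, j) */
                if (idiag[d] < 0 && col >= 0)
                    s -= val[i + (size_t)d * lval] * c[col + (size_t)j * ldc];
            }
            c[i + (size_t)j * ldc] = s / val[i + (size_t)dmain * lval];
        }
    return 0;
}
```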

NOTE
This routine supports only one-based indexing of the input arrays.

Input Parameters

transa Specifies the system of linear equations.

If transa = 'N' or 'n', then C := alpha*inv(A)*B,

If transa = 'T' or 't' or 'C' or 'c', then C := alpha*inv(A^T)*B.

m Number of rows of the matrix A.

n Number of columns of the matrix C.

alpha Specifies the scalar alpha.

val Two-dimensional array of size lval by ndiag, contains non-zero diagonals
of the matrix A. Refer to values array description in Diagonal Storage
Scheme for more details.

lval Leading dimension of val, lval≥m. Refer to lval description in Diagonal
Storage Scheme for more details.

idiag Array of length ndiag, contains the distances between the main diagonal and
each non-zero diagonal in the matrix A.

NOTE
All elements of this array must be sorted in increasing order.

Refer to distance array description in Diagonal Storage Scheme for more
details.

ndiag Specifies the number of non-zero diagonals of the matrix A.

b Array, size ldb* n.

On entry the leading m-by-n part of the array b must contain the matrix B.

ldb Specifies the leading dimension of b as declared in the calling
(sub)program.

ldc Specifies the leading dimension of c as declared in the calling
(sub)program.

Output Parameters

c Array, size ldc by n.

The leading m-by-n part of the array c contains the matrix C.

mkl_?skysm
Solves a system of linear matrix equations for a
sparse matrix stored using the skyline storage scheme
with one-based indexing (deprecated).

Syntax
void mkl_sskysm (const char *transa , const MKL_INT *m , const MKL_INT *n , const float
*alpha , const char *matdescra , const float *val , const MKL_INT *pntr , const float
*b , const MKL_INT *ldb , float *c , const MKL_INT *ldc );
void mkl_dskysm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
double *alpha , const char *matdescra , const double *val , const MKL_INT *pntr , const
double *b , const MKL_INT *ldb , double *c , const MKL_INT *ldc );
void mkl_cskysm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8 *val , const MKL_INT
*pntr , const MKL_Complex8 *b , const MKL_INT *ldb , MKL_Complex8 *c , const MKL_INT
*ldc );
void mkl_zskysm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16 *val , const MKL_INT
*pntr , const MKL_Complex16 *b , const MKL_INT *ldb , MKL_Complex16 *c , const MKL_INT
*ldc );

Include Files
• mkl.h

Description

This routine is deprecated, but no replacement is available yet in the Inspector-Executor Sparse BLAS API
interfaces. You can continue to use this routine until a replacement is provided and the routine can be fully removed.
The mkl_?skysm routine solves a system of linear equations with matrix-matrix operations for a sparse
matrix in the skyline storage format:

C := alpha*inv(A)*B
or

C := alpha*inv(A^T)*B,
where:
alpha is a scalar, B and C are dense matrices, A is a sparse upper or lower triangular matrix with unit or
non-unit main diagonal, A^T is the transpose of A.

NOTE
This routine supports only one-based indexing of the input arrays.

Input Parameters

transa Specifies the system of linear equations.

If transa = 'N' or 'n', then C := alpha*inv(A)*B,

If transa = 'T' or 't' or 'C' or 'c', then C := alpha*inv(A^T)*B.

m Number of rows of the matrix A.

n Number of columns of the matrix C.

alpha Specifies the scalar alpha.

NOTE
General matrices (matdescra[0]='G') are not supported.

pntr Array of length (m + 1) for both the lower and the upper triangle (A is m-by-m).

b Array, size ldb* n.

On entry the leading m-by-n part of the array b must contain the matrix B.

ldb Specifies the leading dimension of b as declared in the calling
(sub)program.

ldc Specifies the leading dimension of c as declared in the calling
(sub)program.

Output Parameters

c Array, size ldc by n.

The leading m-by-n part of the array c contains the matrix C.
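In the same way, mkl_?skysm with transa = 'N' on a lower-triangular skyline matrix amounts to forward substitution, where each row's stored entries are the off-diagonal coefficients followed by the diagonal. The plain-C sketch below (ref_dskysm_lower is hypothetical, not part of oneMKL) handles only that case. It assumes a non-unit diagonal, the one-based pntr convention described for mkl_?skymm, and column-major b and c, and it reports a zero diagonal entry instead of dividing by it.

```c
#include <stddef.h>

/* Hypothetical sketch (not oneMKL): C := alpha*inv(A)*B for an m-by-m
 * lower-triangular A in skyline format with one-based pntr. Row r (0-based)
 * is val[pntr[r]-1 .. pntr[r+1]-2]; its last entry is the diagonal.
 * b and c are assumed column-major. Returns 0 on success, or r+1 if the
 * diagonal entry of 0-based row r is zero.
 */
int ref_dskysm_lower(int m, int n, double alpha, const double *val,
                     const int *pntr, const double *b, int ldb,
                     double *c, int ldc)
{
    for (int j = 0; j < n; ++j)
        for (int r = 0; r < m; ++r) {
            int first = pntr[r] - 1;
            int len = pntr[r + 1] - pntr[r];   /* includes the diagonal */
            int col0 = r - len + 1;
            double s = alpha * b[r + (size_t)j * ldb];
            /* every stored entry except the last lies left of the diagonal */
            for (int t = 0; t < len - 1; ++t)
                s -= val[first + t] * c[(col0 + t) + (size_t)j * ldc];
            double diag = val[first + len - 1];
            if (diag == 0.0)
                return r + 1;
            c[r + (size_t)j * ldc] = s / diag;
        }
    return 0;
}
```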

mkl_?dnscsr
Converts a sparse matrix in uncompressed
representation to the CSR format and vice versa
(deprecated).

Syntax
void mkl_ddnscsr (const MKL_INT *job , const MKL_INT *m , const MKL_INT *n , double
*adns , const MKL_INT *lda , double *acsr , MKL_INT *ja , MKL_INT *ia , MKL_INT *info );
void mkl_sdnscsr (const MKL_INT *job , const MKL_INT *m , const MKL_INT *n , float
*adns , const MKL_INT *lda , float *acsr , MKL_INT *ja , MKL_INT *ia , MKL_INT *info );
void mkl_cdnscsr (const MKL_INT *job , const MKL_INT *m , const MKL_INT *n ,
MKL_Complex8 *adns , const MKL_INT *lda , MKL_Complex8 *acsr , MKL_INT *ja , MKL_INT
*ia , MKL_INT *info );
void mkl_zdnscsr (const MKL_INT *job , const MKL_INT *m , const MKL_INT *n ,
MKL_Complex16 *adns , const MKL_INT *lda , MKL_Complex16 *acsr , MKL_INT *ja , MKL_INT
*ia , MKL_INT *info );

Include Files
• mkl.h

Description

This routine is deprecated, but no replacement is available yet in the Inspector-Executor Sparse BLAS API
interfaces. Either write your own (see the examples/c/sparse_blas/source/sparse_converters.c
example for hints) or continue to use this routine until a replacement is provided and the routine can be fully
removed.
This routine converts a sparse matrix A between two representations: a rectangular array (dense
representation) and the compressed sparse row (CSR) format (3-array variation).

Input Parameters

job Array, contains the following conversion parameters:

• job[0]: Conversion type.
  • If job[0]=0, the rectangular matrix A is converted to the CSR format;
  • if job[0]=1, the rectangular matrix A is restored from the CSR format.
• job[1]: Index base for the rectangular matrix A.
  • If job[1]=0, zero-based indexing for the rectangular matrix A is used;
  • if job[1]=1, one-based indexing for the rectangular matrix A is used.
• job[2]: Index base for the matrix in CSR format.
  • If job[2]=0, zero-based indexing for the matrix in CSR format is used;
  • if job[2]=1, one-based indexing for the matrix in CSR format is used.
• job[3]: Portion of matrix.
  • If job[3]=0, adns contains the lower triangular part of matrix A;
  • if job[3]=1, adns contains the upper triangular part of matrix A;
  • if job[3]=2, adns contains the whole matrix A.
• job[4]=nzmax: maximum number of non-zero elements allowed if job[0]=0.
• job[5]: job indicator for conversion to CSR format.
  • If job[5]=0, only array ia is generated for the output storage.
  • If job[5]>0, arrays acsr, ia, ja are generated for the output storage.

m Number of rows of the matrix A.

n Number of columns of the matrix A.

adns (input/output)
If the conversion type is from uncompressed to CSR, on input adns
contains an uncompressed (dense) representation of matrix A.

lda Specifies the leading dimension of adns as declared in the calling
(sub)program.
For zero-based indexing of A, lda must be at least max(1, n).

For one-based indexing of A, lda must be at least max(1, m).

acsr (input/output)
If conversion type is from CSR to uncompressed, on input acsr contains
the non-zero elements of the matrix A. Its length is equal to the number of
non-zero elements in the matrix A. Refer to values array description in
Sparse Matrix Storage Formats for more details.

ja (input/output). If conversion type is from CSR to uncompressed, on input
for zero-based indexing of A ja contains the column indices plus one for
each non-zero element of the matrix A. For one-based indexing of A ja
contains the column indices for each non-zero element of the matrix A.
Its length is equal to the length of the array acsr. Refer to columns array
description in Sparse Matrix Storage Formats for more details.

ia (input/output). Array of length m + 1.

If conversion type is from CSR to uncompressed, on input for zero-based
indexing of A ia contains indices of elements in the array acsr, such that
ia[i] - 1 is the index in the array acsr of the first non-zero element from
the row i. For one-based indexing of A ia contains indices of elements in
the array acsr, such that ia[i] is the index in the array acsr of the first
non-zero element from the row i.
The value of ia[m] - ia[0] is equal to the number of non-zeros. Refer to
rowIndex array description in Sparse Matrix Storage Formats for more
details.

Output Parameters

adns If conversion type is from CSR to uncompressed, on output adns contains
the uncompressed (dense) representation of matrix A.

acsr, ja, ia If conversion type is from uncompressed to CSR, on output acsr, ja, and
ia contain the compressed sparse row (CSR) format (3-array variation) of
matrix A (see Sparse Matrix Storage Formats for a description of the
storage format).

info Integer info indicator, used only when converting the matrix A to the CSR format (job[0]=0).
If info=0, the execution is successful.

If info=i, the routine was interrupted while processing the i-th row because there
is no space in the arrays acsr and ja according to the value nzmax.
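The dense-to-CSR direction (job[0]=0) can be sketched in plain C as one row-by-row scan that appends non-zeros and records where each row ends. This is a hypothetical reference (ref_dense_to_csr is not a oneMKL function) for job[1]=job[2]=0 and job[3]=2. It assumes the zero-based dense array is row-major, which is what the lda ≥ max(1, n) requirement above suggests, and it reports a full output array with a 1-based row number, which is an assumption about how info counts rows.

```c
#include <stddef.h>

/* Hypothetical sketch (not oneMKL) of dense -> CSR for the whole matrix,
 * zero-based dense and CSR indexing. adns is assumed row-major, lda >= n.
 * acsr and ja must hold nzmax entries; ia must hold m + 1 entries.
 * Returns 0 on success, or i+1 if 0-based row i does not fit in nzmax.
 */
int ref_dense_to_csr(int m, int n, const double *adns, int lda, int nzmax,
                     double *acsr, int *ja, int *ia)
{
    int nnz = 0;
    ia[0] = 0;
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            double v = adns[(size_t)i * lda + j];
            if (v != 0.0) {
                if (nnz == nzmax)
                    return i + 1;   /* out of space while processing row i */
                acsr[nnz] = v;
                ja[nnz] = j;
                ++nnz;
            }
        }
        ia[i + 1] = nnz;            /* end of row i == start of row i+1 */
    }
    return 0;
}
```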

mkl_?csrcoo
Converts a sparse matrix in the CSR format to the
coordinate format and vice versa (deprecated).

Syntax
void mkl_scsrcoo (const MKL_INT *job , const MKL_INT *n , float *acsr , MKL_INT *ja ,
MKL_INT *ia , MKL_INT *nnz , float *acoo , MKL_INT *rowind , MKL_INT *colind , MKL_INT
*info );
void mkl_dcsrcoo (const MKL_INT *job , const MKL_INT *n , double *acsr , MKL_INT *ja ,
MKL_INT *ia , MKL_INT *nnz , double *acoo , MKL_INT *rowind , MKL_INT *colind , MKL_INT
*info );
void mkl_ccsrcoo (const MKL_INT *job , const MKL_INT *n , MKL_Complex8 *acsr , MKL_INT
*ja , MKL_INT *ia , MKL_INT *nnz , MKL_Complex8 *acoo , MKL_INT *rowind , MKL_INT
*colind , MKL_INT *info );
void mkl_zcsrcoo (const MKL_INT *job , const MKL_INT *n , MKL_Complex16 *acsr , MKL_INT
*ja , MKL_INT *ia , MKL_INT *nnz , MKL_Complex16 *acoo , MKL_INT *rowind , MKL_INT
*colind , MKL_INT *info );

Include Files
• mkl.h

Description

This routine is deprecated. Use the matrix manipulation routines from the Intel® oneAPI Math Kernel Library
(oneMKL) Inspector-executor Sparse BLAS interface instead.
This routine converts a sparse matrix A stored in the compressed sparse row (CSR) format (3-array
variation) to coordinate format and vice versa.
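Going from CSR to coordinate format only requires the row index of each stored entry, which can be recovered from ia. The following plain-C sketch (ref_csr_to_coo is a hypothetical name, not a oneMKL function) assumes zero-based indexing on both sides and signals insufficient space with -1 rather than through info.

```c
/* Hypothetical sketch (not oneMKL) of CSR -> coordinate, zero-based.
 * Values and column indices are copied; the row index of entry p is the
 * row i whose range ia[i] .. ia[i+1]-1 contains p.
 * Returns the number of converted entries, or -1 if nzmax is too small.
 */
int ref_csr_to_coo(int n, const double *acsr, const int *ja, const int *ia,
                   int nzmax, double *acoo, int *rowind, int *colind)
{
    int nnz = ia[n] - ia[0];
    if (nnz > nzmax)
        return -1;
    for (int i = 0; i < n; ++i)
        for (int p = ia[i] - ia[0]; p < ia[i + 1] - ia[0]; ++p) {
            rowind[p] = i;
            colind[p] = ja[p];
            acoo[p] = acsr[p];
        }
    return nnz;
}
```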

Input Parameters

job Array, contains the following conversion parameters:

job[0]
If job[0]=0, the matrix in the CSR format is converted to the coordinate
format;
if job[0]=1, the matrix in the coordinate format is converted to the CSR
format.
if job[0]=2, the matrix in the coordinate format is converted to the CSR
format, and the column indices in CSR representation are sorted in the
increasing order within each row.
job[1]
If job[1]=0, zero-based indexing for the matrix in CSR format is used;

if job[1]=1, one-based indexing for the matrix in CSR format is used.

job[2]
If job[2]=0, zero-based indexing for the matrix in coordinate format is
used;
if job[2]=1, one-based indexing for the matrix in coordinate format is
used.
job[4]
job[4]=nzmax - maximum number of the non-zero elements allowed if
job[0]=0.
job[5] - job indicator.
For conversion to the coordinate format:
If job[5]=1, only array rowind is filled in for the output storage.

If job[5]=2, arrays rowind, colind are filled in for the output storage.

If job[5]=3, all arrays rowind, colind, acoo are filled in for the output
storage.
For conversion to the CSR format:
If job[5]=0, all arrays acsr, ja, ia are filled in for the output storage.

If job[5]=1, only array ia is filled in for the output storage.

If job[5]=2, it is assumed that the routine has already been called
with job[5]=1, and that the user has allocated the required space for storing
the output arrays acsr and ja.

n Dimension of the matrix A.

nnz Specifies the number of non-zero elements of the matrix A for job[0]≠0.
Refer to nnz description in Coordinate Format for more details.

acsr (input/output)
Array containing non-zero elements of the matrix A. Its length is equal to
the number of non-zero elements in the matrix A. Refer to values array
description in Sparse Matrix Storage Formats for more details.

ja (input/output). For job[1] = 1 (one-based indexing for the matrix in CSR
format), array containing the column indices plus one for each non-zero
element of the matrix A.
For job[1] = 0 (zero-based indexing for the matrix in CSR format), array
containing the column indices for each non-zero element of the matrix A.
Its length is equal to the length of the array acsr. Refer to columns array
description in Sparse Matrix Storage Formats for more details.

ia (input/output). Array of length n + 1, containing indices of elements in the
array acsr, such that ia[i] - ia[0] is the index in the array acsr of the
first non-zero element from the row i. The value of the last element
ia[n] - ia[0] is equal to the number of non-zeros. Refer to rowIndex
array description in Sparse Matrix Storage Formats for more details.

acoo (input/output)
Array containing non-zero elements of the matrix A. Its length is equal to
the number of non-zero elements in the matrix A. Refer to values array
description in Sparse Matrix Storage Formats for more details.

rowind (input/output). Array of length nnz, contains the row indices for each non-
zero element of the matrix A.
Refer to rows array description in Coordinate Format for more details.

colind (input/output). Array of length nnz, contains the column indices for each
non-zero element of the matrix A. Refer to columns array description in
Coordinate Format for more details.

Output Parameters

nnz Returns the number of converted elements of the matrix A for job[0]=0.

info Integer info indicator only for converting the matrix A from the CSR format.
If info=0, the execution is successful.

If info=1, the routine is interrupted because there is no space in the arrays
acoo, rowind, colind according to the value nzmax.
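The coordinate-to-CSR direction with sorted columns (job[0]=2) can be sketched as a counting sort by row followed by a per-row insertion sort on the column indices. The function below is a hypothetical plain-C reference (ref_coo_to_csr_sorted is not a oneMKL function). It assumes zero-based indexing on both sides and that acsr and ja already have room for nnz entries, as in the job[5]=2 case described above.

```c
#include <stdlib.h>

/* Hypothetical sketch (not oneMKL) of coordinate -> CSR, zero-based, with
 * column indices sorted in increasing order within each row.
 * ia must hold n + 1 entries; acsr and ja must hold nnz entries.
 * Returns 0 on success, -1 if scratch allocation fails.
 */
int ref_coo_to_csr_sorted(int n, int nnz, const double *acoo,
                          const int *rowind, const int *colind,
                          double *acsr, int *ja, int *ia)
{
    int *next = malloc((size_t)(n > 0 ? n : 1) * sizeof *next);
    if (!next)
        return -1;

    for (int i = 0; i <= n; ++i) ia[i] = 0;
    for (int p = 0; p < nnz; ++p) ia[rowind[p] + 1]++;  /* count per row */
    for (int i = 0; i < n; ++i) ia[i + 1] += ia[i];     /* prefix sums   */

    for (int i = 0; i < n; ++i) next[i] = ia[i];
    for (int p = 0; p < nnz; ++p) {                     /* scatter by row */
        int q = next[rowind[p]]++;
        ja[q] = colind[p];
        acsr[q] = acoo[p];
    }
    free(next);

    for (int i = 0; i < n; ++i)                         /* sort each row */
        for (int q = ia[i] + 1; q < ia[i + 1]; ++q) {
            int col = ja[q];
            double v = acsr[q];
            int t = q - 1;
            while (t >= ia[i] && ja[t] > col) {
                ja[t + 1] = ja[t];
                acsr[t + 1] = acsr[t];
                --t;
            }
            ja[t + 1] = col;
            acsr[t + 1] = v;
        }
    return 0;
}
```

Insertion sort is quadratic in the row length, which is acceptable when rows hold few entries; a general-purpose sort would suit long rows better.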

mkl_?csrbsr
Converts a square sparse matrix in the CSR format to
the BSR format and vice versa (deprecated).

Syntax
void mkl_scsrbsr (const MKL_INT *job , const MKL_INT *m , const MKL_INT *mblk , const
MKL_INT *ldabsr , float *acsr , MKL_INT *ja , MKL_INT *ia , float *absr , MKL_INT *jab ,
MKL_INT *iab , MKL_INT *info );
void mkl_dcsrbsr (const MKL_INT *job , const MKL_INT *m , const MKL_INT *mblk , const
MKL_INT *ldabsr , double *acsr , MKL_INT *ja , MKL_INT *ia , double *absr , MKL_INT
*jab , MKL_INT *iab , MKL_INT *info );
void mkl_ccsrbsr (const MKL_INT *job , const MKL_INT *m , const MKL_INT *mblk , const
MKL_INT *ldabsr , MKL_Complex8 *acsr , MKL_INT *ja , MKL_INT *ia , MKL_Complex8 *absr ,
MKL_INT *jab , MKL_INT *iab , MKL_INT *info );
void mkl_zcsrbsr (const MKL_INT *job , const MKL_INT *m , const MKL_INT *mblk , const
MKL_INT *ldabsr , MKL_Complex16 *acsr , MKL_INT *ja , MKL_INT *ia , MKL_Complex16
*absr , MKL_INT *jab , MKL_INT *iab , MKL_INT *info );

Include Files
• mkl.h

Description

This routine is deprecated. Use the matrix manipulation routines from the Intel® oneAPI Math Kernel Library
(oneMKL) Inspector-executor Sparse BLAS interface instead.
This routine converts a square sparse matrix A stored in the compressed sparse row (CSR) format (3-array
variation) to the block sparse row (BSR) format and vice versa.

Input Parameters

job Array, contains the following conversion parameters:

job[0]
If job[0]=0, the matrix in the CSR format is converted to the BSR format;

if job[0]=1, the matrix in the BSR format is converted to the CSR format.

job[1]
If job[1]=0, zero-based indexing for the matrix in CSR format is used;

if job[1]=1, one-based indexing for the matrix in CSR format is used.

job[2]
If job[2]=0, zero-based indexing for the matrix in the BSR format is used;

if job[2]=1, one-based indexing for the matrix in the BSR format is used.

job[3] is only used for conversion to CSR format. By default, the converter
saves the blocks without checking whether an element is zero or not. If
job[3]=1, then the converter only saves non-zero elements in blocks.
job[5] - job indicator.
For conversion to the BSR format:


If job[5]=0, only arrays jab, iab are generated for the output storage.

If job[5]>0, all output arrays absr, jab, and iab are filled in for the
output storage.
If job[5]=-1, iab[m] - iab[0] returns the number of non-zero blocks.

For conversion to the CSR format:

If job[5]=0, only arrays ja, ia are generated for the output storage.

m Actual row dimension of the matrix A for conversion to the BSR format; block
row dimension of the matrix A for conversion to the CSR format.

mblk Size of the block in the matrix A.

ldabsr Leading dimension of the array absr as declared in the calling program.
ldabsr must be greater than or equal to mblk*mblk.

ja (input/output). Array containing the column indices for each non-zero
element of the matrix A.
Its length is equal to the length of the array acsr. Refer to columns array
description in Sparse Matrix Storage Formats for more details.

ia (input/output). Array of length m + 1, containing indices of elements in the
array acsr, such that ia[i] - ia[0] is the index in the array acsr of
the first non-zero element from the row i. The value of ia[m] - ia[0]
is equal to the number of non-zeros. Refer to rowIndex array description in
Sparse Matrix Storage Formats for more details.

absr (input/output)
Array containing elements of non-zero blocks of the matrix A. Its length is
equal to the number of non-zero blocks in the matrix A multiplied by
mblk*mblk. Refer to values array description in BSR Format for more
details.

jab (input/output). Array containing the column indices for each non-zero block
of the matrix A.
Its length is equal to the number of non-zero blocks of the matrix A. Refer
to columns array description in BSR Format for more details.

iab (input/output). Array of length (m + 1), containing indices of blocks in the
array absr, such that iab[i] - iab[0] is the index in the array absr of
the first non-zero block of the i-th block row. The value of iab[m] - iab[0]
is equal to the number of non-zero blocks. Refer to rowIndex array description in
BSR Format for more details.

Output Parameters

info Integer info indicator only for converting the matrix A from the CSR format.
If info=0, the execution is successful.
If info=1, it means that mblk is equal to 0.

If info=2, it means that ldabsr is less than mblk*mblk and there is no
space for all blocks.
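The block structure produced by the CSR-to-BSR direction can be sketched by mapping every column index to its block column (ja[p] / mblk) and recording each block column once per block row. The plain-C function below (ref_csr_block_pattern) is hypothetical, not a oneMKL routine. It fills only iab and jab (not absr), uses zero-based indexing, and returns the number of non-zero blocks, the same quantity that job[5]=-1 reports. Block columns within a block row appear in order of first encounter, not sorted.

```c
#include <stdlib.h>

/* Hypothetical sketch (not oneMKL): block pattern of a square m-by-m CSR
 * matrix (zero-based) for block size mblk. With mb = ceil(m/mblk), fills
 * iab[0..mb] and, if jab is not NULL, the block-column index of every
 * non-zero block. Returns the number of non-zero blocks, or -1 if
 * mblk <= 0 or scratch allocation fails.
 */
int ref_csr_block_pattern(int m, int mblk, const int *ja, const int *ia,
                          int *iab, int *jab)
{
    if (mblk <= 0)
        return -1;
    int mb = (m + mblk - 1) / mblk;
    int *mark = malloc((size_t)(mb > 0 ? mb : 1) * sizeof *mark);
    if (!mark)
        return -1;
    for (int bj = 0; bj < mb; ++bj)
        mark[bj] = -1;                  /* no block row has seen bj yet */

    int nblk = 0;
    iab[0] = 0;
    for (int bi = 0; bi < mb; ++bi) {
        int rend = (bi + 1) * mblk < m ? (bi + 1) * mblk : m;
        for (int r = bi * mblk; r < rend; ++r)
            for (int p = ia[r]; p < ia[r + 1]; ++p) {
                int bj = ja[p] / mblk;  /* block column of this entry */
                if (mark[bj] != bi) {   /* first entry of block (bi, bj) */
                    mark[bj] = bi;
                    if (jab)
                        jab[nblk] = bj;
                    ++nblk;
                }
            }
        iab[bi + 1] = nblk;
    }
    free(mark);
    return nblk;
}
```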

mkl_?csrcsc
Converts a square sparse matrix in the CSR format to
the CSC format and vice versa (deprecated).

Syntax
void mkl_dcsrcsc (const MKL_INT *job , const MKL_INT *n , double *acsr , MKL_INT *ja ,
MKL_INT *ia , double *acsc , MKL_INT *ja1 , MKL_INT *ia1 , MKL_INT *info );
void mkl_scsrcsc (const MKL_INT *job , const MKL_INT *n , float *acsr , MKL_INT *ja ,
MKL_INT *ia , float *acsc , MKL_INT *ja1 , MKL_INT *ia1 , MKL_INT *info );
void mkl_ccsrcsc (const MKL_INT *job , const MKL_INT *n , MKL_Complex8 *acsr , MKL_INT
*ja , MKL_INT *ia , MKL_Complex8 *acsc , MKL_INT *ja1 , MKL_INT *ia1 , MKL_INT *info );
void mkl_zcsrcsc (const MKL_INT *job , const MKL_INT *n , MKL_Complex16 *acsr , MKL_INT
*ja , MKL_INT *ia , MKL_Complex16 *acsc , MKL_INT *ja1 , MKL_INT *ia1 , MKL_INT *info );

Include Files
• mkl.h

Description

This routine is deprecated. Use the matrix manipulation routines from the Intel® oneAPI Math Kernel Library
(oneMKL) Inspector-executor Sparse BLAS interface instead.
This routine converts a square sparse matrix A stored in the compressed sparse row (CSR) format (3-array
variation) to the compressed sparse column (CSC) format and vice versa.
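Because the CSC arrays of A are the CSR arrays of A^T, this conversion is a sparse transpose. The plain-C sketch below (ref_csr_to_csc, a hypothetical name, not a oneMKL function) uses the usual counting approach: count entries per column, prefix-sum the counts into ia1, and scatter each entry into its column. It assumes zero-based indexing on both sides (ia[0] == 0) and returns -1 only if scratch memory cannot be allocated.

```c
#include <stdlib.h>

/* Hypothetical sketch (not oneMKL) of CSR -> CSC for an n-by-n matrix,
 * zero-based on both sides (ia[0] == 0). ja1 receives row indices; rows
 * come out in increasing order within each column because rows are
 * visited in order. Returns 0 on success, -1 on allocation failure.
 */
int ref_csr_to_csc(int n, const double *acsr, const int *ja, const int *ia,
                   double *acsc, int *ja1, int *ia1)
{
    int *next = malloc((size_t)(n > 0 ? n : 1) * sizeof *next);
    if (!next)
        return -1;

    for (int j = 0; j <= n; ++j) ia1[j] = 0;
    for (int p = 0; p < ia[n]; ++p) ia1[ja[p] + 1]++;   /* count per column */
    for (int j = 0; j < n; ++j) ia1[j + 1] += ia1[j];   /* prefix sums      */

    for (int j = 0; j < n; ++j) next[j] = ia1[j];
    for (int i = 0; i < n; ++i)
        for (int p = ia[i]; p < ia[i + 1]; ++p) {
            int q = next[ja[p]]++;      /* next free slot in column ja[p] */
            ja1[q] = i;                 /* row index of the entry */
            acsc[q] = acsr[p];
        }
    free(next);
    return 0;
}
```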

Input Parameters

job Array, contains the following conversion parameters:

job[0]
If job[0]=0, the matrix in the CSR format is converted to the CSC format;

if job[0]=1, the matrix in the CSC format is converted to the CSR format.

job[1]
If job[1]=0, zero-based indexing for the matrix in CSR format is used;

if job[1]=1, one-based indexing for the matrix in CSR format is used.

job[2]
If job[2]=0, zero-based indexing for the matrix in the CSC format is used;

if job[2]=1, one-based indexing for the matrix in the CSC format is used.

job[5] - job indicator.

For conversion to the CSC format:
If job[5]=0, only arrays ja1, ia1 are filled in for the output storage.


If job[5]≠0, all output arrays acsc, ja1, and ia1 are filled in for the
output storage.
For conversion to the CSR format:
If job[5]=0, only arrays ja, ia are filled in for the output storage.

If job[5]≠0, all output arrays acsr, ja, and ia are filled in for the output
storage.

n Dimension of the square matrix A.

acsr (input/output)
Array containing non-zero elements of the square matrix A. Its length is
equal to the number of non-zero elements in the matrix A. Refer to values
array description in Sparse Matrix Storage Formats for more details.

ja (input/output). Array containing the column indices for each non-zero
element of the matrix A.
Its length is equal to the length of the array acsr. Refer to columns array
description in Sparse Matrix Storage Formats for more details.

ia (input/output). Array of length n + 1, containing indices of elements in the
array acsr, such that ia[i] - ia[0] is the index in the array acsr of the
first non-zero element from the row i. The value of ia[n] - ia[0] is equal
to the number of non-zeros. Refer to rowIndex array description in Sparse
Matrix Storage Formats for more details.

acsc (input/output)
Array containing non-zero elements of the square matrix A. Its length is
equal to the number of non-zero elements in the matrix A. Refer to values
array description in Sparse Matrix Storage Formats for more details.

ja1 (input/output). Array containing the row indices for each non-zero element
of the matrix A.
Its length is equal to the length of the array acsc. Refer to columns array
description in Sparse Matrix Storage Formats for more details.

ia1 (input/output). Array of length n + 1, containing indices of elements in the
array acsc, such that ia1[i] - ia1[0] is the index in the array acsc of
the first non-zero element from the column i. The value of
ia1[n] - ia1[0] is equal to the number of non-zeros. Refer to rowIndex array
description in Sparse Matrix Storage Formats for more details.

Output Parameters

info This parameter is currently not used.

mkl_?csrdia
Converts a sparse matrix in the CSR format to the
diagonal format and vice versa (deprecated).

Syntax
void mkl_dcsrdia (const MKL_INT *job , const MKL_INT *n , double *acsr , MKL_INT *ja ,
MKL_INT *ia , double *adia , const MKL_INT *ndiag , MKL_INT *distance , MKL_INT
*idiag , double *acsr_rem , MKL_INT *ja_rem , MKL_INT *ia_rem , MKL_INT *info );
void mkl_scsrdia (const MKL_INT *job , const MKL_INT *n , float *acsr , MKL_INT *ja ,
MKL_INT *ia , float *adia , const MKL_INT *ndiag , MKL_INT *distance , MKL_INT *idiag ,
float *acsr_rem , MKL_INT *ja_rem , MKL_INT *ia_rem , MKL_INT *info );
void mkl_ccsrdia (const MKL_INT *job , const MKL_INT *n , MKL_Complex8 *acsr , MKL_INT
*ja , MKL_INT *ia , MKL_Complex8 *adia , const MKL_INT *ndiag , MKL_INT *distance ,
MKL_INT *idiag , MKL_Complex8 *acsr_rem , MKL_INT *ja_rem , MKL_INT *ia_rem , MKL_INT
*info );
void mkl_zcsrdia (const MKL_INT *job , const MKL_INT *n , MKL_Complex16 *acsr , MKL_INT
*ja , MKL_INT *ia , MKL_Complex16 *adia , const MKL_INT *ndiag , MKL_INT *distance ,
MKL_INT *idiag , MKL_Complex16 *acsr_rem , MKL_INT *ja_rem , MKL_INT *ia_rem , MKL_INT
*info );

Include Files
• mkl.h

Description

This routine is deprecated, but no replacement is available yet in the Inspector-Executor Sparse BLAS API
interfaces. Either write your own (see the examples/c/sparse_blas/source/sparse_converters.c
example for hints) or continue to use this routine until a replacement is provided and the routine can be fully
removed.
This routine converts a sparse matrix A stored in the compressed sparse row (CSR) format (3-array
variation) to the diagonal format and vice versa.
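For caller-selected diagonals, the CSR-to-diagonal direction reduces to computing each entry's distance ja[p] - i and placing the entry in the matching column of adia while keeping its row number, as the adia description below explains. This is a hypothetical plain-C sketch (ref_csr_to_dia is not a oneMKL function) for zero-based indexing. It assumes adia is column-major with leading dimension ndiag, leaves padding slots at zero, and only counts entries on unlisted diagonals instead of building the acsr_rem, ja_rem, ia_rem remainder arrays.

```c
#include <stddef.h>

/* Hypothetical sketch (not oneMKL) of CSR -> diagonal, zero-based, with
 * caller-chosen diagonals. adia is assumed column-major with leading
 * dimension ndiag (>= n) and idiag columns; column k holds the diagonal
 * at distance[k], and each entry keeps its row:
 *   adia[i + k*ndiag] = A(i, i + distance[k]).
 * Padding slots are set to 0. Returns the number of entries that lie on
 * diagonals not listed in distance (they are skipped, not stored).
 */
int ref_csr_to_dia(int n, const double *acsr, const int *ja, const int *ia,
                   double *adia, int ndiag, const int *distance, int idiag)
{
    for (int k = 0; k < idiag; ++k)
        for (int i = 0; i < ndiag; ++i)
            adia[i + (size_t)k * ndiag] = 0.0;

    int skipped = 0;
    for (int i = 0; i < n; ++i)
        for (int p = ia[i]; p < ia[i + 1]; ++p) {
            int dist = ja[p] - i, k = 0;
            while (k < idiag && distance[k] != dist)
                ++k;                    /* find the column for this diagonal */
            if (k < idiag)
                adia[i + (size_t)k * ndiag] = acsr[p];
            else
                ++skipped;
        }
    return skipped;
}
```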

Input Parameters

job Array, contains the following conversion parameters:

job[0]
If job[0]=0, the matrix in the CSR format is converted to the diagonal
format;
if job[0]=1, the matrix in the diagonal format is converted to the CSR
format.
job[1]
If job[1]=0, zero-based indexing for the matrix in CSR format is used;

if job[1]=1, one-based indexing for the matrix in CSR format is used.

job[2]
If job[2]=0, zero-based indexing for the matrix in the diagonal format is
used;
if job[2]=1, one-based indexing for the matrix in the diagonal format is
used.
job[5] - job indicator.

For conversion to the diagonal format:

If job[5]=0, diagonals are not selected internally, and acsr_rem, ja_rem,
ia_rem are not filled in for the output storage.
If job[5]=1, diagonals are not selected internally, and acsr_rem, ja_rem,
ia_rem are filled in for the output storage.
If job[5]=10, diagonals are selected internally, and acsr_rem, ja_rem,
ia_rem are not filled in for the output storage.
If job[5]=11, diagonals are selected internally, and acsr_rem, ja_rem,
ia_rem are filled in for the output storage.
For conversion to the CSR format:
If job[5]=0, each entry of the array adia is checked for zero, and zero entries are not included
in the array acsr.

If job[5]≠0, the entries of the array adia are not checked for zero.

n Dimension of the matrix A.

ja (input/output). Array containing the column indices for each non-zero
element of the matrix A.
Its length is equal to the length of the array acsr. Refer to columns array
description in Sparse Matrix Storage Formats for more details.

ia (input/output). Array of length n + 1, containing indices of elements in the
array acsr, such that ia[i] - ia[0] is the index in the array acsr of the first
non-zero element from the row i. Refer to rowIndex array description in Sparse
Matrix Storage Formats for more details.

adia (input/output)
Array of size (ndiag*idiag) containing diagonals of the matrix A.

The key point of the storage is that each element in the array adia retains
the row number of the original matrix. To achieve this, diagonals in the
lower triangular part of the matrix are padded from the top, and those in
the upper triangular part are padded from the bottom.

ndiag Specifies the leading dimension of the array adia as declared in the calling
(sub)program, and must be at least max(1, n).

distance Array of length idiag, containing the distances between the main diagonal
and each non-zero diagonal to be extracted. The distance is positive if the
diagonal is above the main diagonal, and negative if the diagonal is below
the main diagonal. The main diagonal has a distance equal to zero.

idiag Number of diagonals to be extracted. For conversion to diagonal format, this parameter
may be modified on return.

acsr_rem, ja_rem, ia_rem Remainder of the matrix in the CSR format if it is needed for conversion to
the diagonal format.

Output Parameters

info This parameter is currently not used.
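The diagonal layout described above, in which every stored element keeps its original row index, can be illustrated with a small zero-based conversion in plain C. This is a hypothetical helper, not the library routine: it uses int in place of MKL_INT, takes the distances from the caller (the case where diagonals are not selected internally), does not produce acsr_rem/ja_rem/ia_rem, and assumes that diagonal j is stored contiguously, so that A(i, i + distance[j]) lands at adia[i + j*ndiag].

```c
#include <stddef.h>

/* Sketch: copy the diagonals listed in `distance` from a square n-by-n
 * CSR matrix (zero-based, 3-array variation) into diagonal storage.
 * Assumed layout: A(i, i + distance[j]) is stored at adia[i + j*ndiag]
 * (ndiag >= n), so every entry keeps its row index i. Slots outside the
 * matrix (top of lower diagonals, bottom of upper diagonals) stay 0.
 * Entries on diagonals not listed in `distance` are simply skipped. */
void csr_to_dia(int n, const double *acsr, const int *ja, const int *ia,
                double *adia, int ndiag, const int *distance, int idiag)
{
    for (int k = 0; k < ndiag * idiag; ++k)
        adia[k] = 0.0;                       /* padding defaults to zero */

    for (int i = 0; i < n; ++i) {
        for (int p = ia[i]; p < ia[i + 1]; ++p) {
            int d = ja[p] - i;               /* distance from main diagonal */
            for (int j = 0; j < idiag; ++j) {
                if (distance[j] == d) {
                    adia[i + (size_t)j * ndiag] = acsr[p];
                    break;
                }
            }
        }
    }
}
```

For the 3-by-3 matrix with rows (1 2 0), (3 4 5), (0 6 7) and distance = {-1, 0, 1}, this fills adia with 0 3 6 | 1 4 7 | 2 5 0: the sub-diagonal is padded at the top and the super-diagonal at the bottom.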

mkl_?csrsky
Converts a sparse matrix in CSR format to the skyline
format and vice versa (deprecated).

Syntax
void mkl_dcsrsky (const MKL_INT *job , const MKL_INT *m , double *acsr , MKL_INT *ja ,
MKL_INT *ia , double *asky , MKL_INT *pointers , MKL_INT *info );
void mkl_scsrsky (const MKL_INT *job , const MKL_INT *m , float *acsr , MKL_INT *ja ,
MKL_INT *ia , float *asky , MKL_INT *pointers , MKL_INT *info );
void mkl_ccsrsky (const MKL_INT *job , const MKL_INT *m , MKL_Complex8 *acsr , MKL_INT
*ja , MKL_INT *ia , MKL_Complex8 *asky , MKL_INT *pointers , MKL_INT *info );
void mkl_zcsrsky (const MKL_INT *job , const MKL_INT *m , MKL_Complex16 *acsr , MKL_INT
*ja , MKL_INT *ia , MKL_Complex16 *asky , MKL_INT *pointers , MKL_INT *info );

Include Files
• mkl.h

Description

This routine is deprecated, but the Inspector-Executor Sparse BLAS API does not provide a replacement yet.
Either write your own conversion (see the examples/c/sparse_blas/source/sparse_converters.c
example for hints) or continue using this routine until a replacement is provided and the routine can be
removed.
This routine converts a sparse matrix A stored in the compressed sparse row (CSR) format (3-array
variation) to the skyline format and vice versa.

Input Parameters

job Array containing the following conversion parameters:

job[0]
If job[0]=0, the matrix in the CSR format is converted to the skyline
format;
if job[0]=1, the matrix in the skyline format is converted to the CSR
format.
job[1]
If job[1]=0, zero-based indexing for the matrix in CSR format is used;

if job[1]=1, one-based indexing for the matrix in CSR format is used.

job[2]
If job[2]=0, zero-based indexing for the matrix in the skyline format is
used;
if job[2]=1, one-based indexing for the matrix in the skyline format is
used.
job[3]
For conversion to the skyline format:
If job[3]=0, the upper part of the matrix A in the CSR format is converted.

If job[3]=1, the lower part of the matrix A in the CSR format is converted.

For conversion to the CSR format:

If job[3]=0, the matrix is converted to the upper part of the matrix A in
the CSR format.
If job[3]=1, the matrix is converted to the lower part of the matrix A in
the CSR format.
job[4]
job[4]=nzmax, the maximum number of non-zero elements of the matrix A; used
only if job[0]=0.

job[5] - job indicator.

Only for conversion to the skyline format:
If job[5]=0, only the array pointers is filled in for the output storage.

If job[5]=1, all output arrays asky and pointers are filled in for the
output storage.

m Dimension of the matrix A.

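ja (input/output). Array containing the column indices for each non-zero
element of the matrix A.
Its length is equal to the length of the array acsr. Refer to columns array
description in Sparse Matrix Storage Formats for more details.

ia (input/output). Array of length m + 1, containing indices of elements in the
array acsr, such that ia[i] - ia[0] is the index in the array acsr of the first
non-zero element from the row i. Refer to rowIndex array description in Sparse
Matrix Storage Formats for more details.
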
asky (input/output)
Array. For the lower triangular part of A, it contains the set of elements from
each row, starting from the first non-zero element up to and including the
diagonal element. For an upper triangular matrix, it contains the set of
elements from each column of the matrix, starting with the first non-zero
element down to and including the diagonal element. Zero elements encountered
within these ranges are included in the sets. Refer to values array description in
Skyline Storage Format for more details.

pointers (input/output).
Array of length m+1, where m is the number of rows for the lower triangle
(columns for the upper triangle). pointers[i-1] - pointers[0] gives the
index in the array asky of the first non-zero element in row
(column) i. The value of pointers[m] is set to nnz + pointers[0],
where nnz is the number of elements in the array asky. Refer to pointers
array description in Skyline Storage Format for more details.

Output Parameters

info Integer info indicator only for converting the matrix A from the CSR format.
If info=0, the execution is successful.

If info=1, the routine is interrupted because the array asky does not have enough
space, as specified by nzmax.
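For the lower-triangular case (job[3]=1), the skyline profile and the pointers array can be sketched as follows. The helper name is hypothetical, indexing is zero-based (so pointers[0] = 0), int stands in for MKL_INT, and returning -1 when the profile does not fit into nzmax elements plays the role of info=1 in this sketch.

```c
/* Sketch: build the lower skyline of an m-by-m matrix given in zero-based
 * CSR. Row i stores A(i, f_i..i), where f_i is the column of the first
 * non-zero in row i (f_i = i if nothing lies left of the diagonal).
 * pointers[i] is the offset of row i in asky and pointers[m] the total
 * length. Returns that length, or -1 if it would exceed nzmax. */
int csr_to_lower_skyline(int m, const double *acsr, const int *ja,
                         const int *ia, double *asky, int *pointers,
                         int nzmax)
{
    pointers[0] = 0;
    for (int i = 0; i < m; ++i) {
        int first = i;                       /* diagonal is always stored */
        for (int p = ia[i]; p < ia[i + 1]; ++p)
            if (ja[p] < first)
                first = ja[p];
        pointers[i + 1] = pointers[i] + (i - first + 1);
    }
    if (pointers[m] > nzmax)
        return -1;                           /* not enough room in asky */

    for (int k = 0; k < pointers[m]; ++k)
        asky[k] = 0.0;                       /* zeros inside the profile */
    for (int i = 0; i < m; ++i) {
        int first = i - (pointers[i + 1] - pointers[i] - 1);
        for (int p = ia[i]; p < ia[i + 1]; ++p)
            if (ja[p] <= i)                  /* keep the lower part only */
                asky[pointers[i] + (ja[p] - first)] = acsr[p];
    }
    return pointers[m];
}
```

For the 3-by-3 matrix with rows (1 0 9), (0 2 0), (3 0 4), the upper entry 9 is skipped, pointers becomes {0, 1, 2, 5}, and asky holds 1 | 2 | 3 0 4; the explicit zero inside the profile of the last row is stored, as the asky description requires.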

mkl_?csradd
Computes the sum of two matrices stored in the CSR
format (3-array variation) with one-based indexing
(deprecated).

Syntax
void mkl_dcsradd (const char *trans , const MKL_INT *request , const MKL_INT *sort ,
const MKL_INT *m , const MKL_INT *n , double *a , MKL_INT *ja , MKL_INT *ia , const
double *beta , double *b , MKL_INT *jb , MKL_INT *ib , double *c , MKL_INT *jc , MKL_INT
*ic , const MKL_INT *nzmax , MKL_INT *info );
void mkl_scsradd (const char *trans , const MKL_INT *request , const MKL_INT *sort ,
const MKL_INT *m , const MKL_INT *n , float *a , MKL_INT *ja , MKL_INT *ia , const
float *beta , float *b , MKL_INT *jb , MKL_INT *ib , float *c , MKL_INT *jc , MKL_INT
*ic , const MKL_INT *nzmax , MKL_INT *info );
void mkl_ccsradd (const char *trans , const MKL_INT *request , const MKL_INT *sort ,
const MKL_INT *m , const MKL_INT *n , MKL_Complex8 *a , MKL_INT *ja , MKL_INT *ia ,
const MKL_Complex8 *beta , MKL_Complex8 *b , MKL_INT *jb , MKL_INT *ib , MKL_Complex8
*c , MKL_INT *jc , MKL_INT *ic , const MKL_INT *nzmax , MKL_INT *info );
void mkl_zcsradd (const char *trans , const MKL_INT *request , const MKL_INT *sort ,
const MKL_INT *m , const MKL_INT *n , MKL_Complex16 *a , MKL_INT *ja , MKL_INT *ia ,
const MKL_Complex16 *beta , MKL_Complex16 *b , MKL_INT *jb , MKL_INT *ib ,
MKL_Complex16 *c , MKL_INT *jc , MKL_INT *ic , const MKL_INT *nzmax , MKL_INT *info );

Include Files
• mkl.h

Description

This routine is deprecated. Use mkl_sparse_?_add from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.

The mkl_?csradd routine performs a matrix-matrix operation defined as

C := A+beta*op(B)
where:
A, B, C are the sparse matrices in the CSR format (3-array variation).
op(B) is one of op(B) = B, or op(B) = B^T, or op(B) = B^H
beta is a scalar.
The routine works correctly if and only if the column indices in the sparse matrix representations of matrices A
and B are arranged in increasing order for each row. If not, use the parameter sort (see below) to
reorder the column indices and the corresponding elements of the input matrices.

NOTE
This routine supports only one-based indexing of the input arrays.

Input Parameters

trans Specifies the operation.

If trans = 'N' or 'n', then C := A+beta*B

If trans = 'T' or 't', then C := A+beta*B^T

If trans = 'C' or 'c', then C := A+beta*B^H.

request If request=0, the routine performs addition. The memory for the output
arrays ic, jc, c must be allocated beforehand.

If request=1, the routine only computes the values of the array ic of
length m + 1. The memory for the ic array must be allocated beforehand.
On exit, the value ic[m] - 1 is the actual number of elements in the
arrays c and jc.

If request=2, the routine performs addition. Use this value after a previous
call with request=1, once the output arrays jc and c have been allocated in the
calling program with length at least ic[m] - 1.

sort Specifies the type of reordering. If this parameter is not set (default), the
routine does not perform reordering.
If sort=1, the routine arranges the column indices ja for each row in the
increasing order and reorders the corresponding values of the matrix A in
the array a.

If sort=2, the routine arranges the column indices jb for each row in the
increasing order and reorders the corresponding values of the matrix B in
the array b.

If sort=3, the routine performs reordering for both input matrices A and B.

m Number of rows of the matrix A.

n Number of columns of the matrix A.

a Array containing non-zero elements of the matrix A. Its length is equal to
the number of non-zero elements in the matrix A. Refer to values array
description in Sparse Matrix Storage Formats for more details.

ja Array containing the column indices plus one for each non-zero element of
the matrix A. For each row the column indices must be arranged in the
increasing order.
The length of this array is equal to the length of the array a. Refer to
columns array description in Sparse Matrix Storage Formats for more
details.

ia Array of length m + 1, containing indices of elements in the array a, such
that ia[i] - ia[0] is the index in the array a of the first non-zero
element from the row i. The value of the last element ia[m] is equal to the
number of non-zero elements of the matrix A plus one. Refer to rowIndex
array description in Sparse Matrix Storage Formats for more details.

beta Specifies the scalar beta.

b Array containing non-zero elements of the matrix B. Its length is equal to
the number of non-zero elements in the matrix B. Refer to values array
description in Sparse Matrix Storage Formats for more details.

jb Array containing the column indices plus one for each non-zero element of
the matrix B. For each row the column indices must be arranged in the
increasing order.
The length of this array is equal to the length of the array b. Refer to
columns array description in Sparse Matrix Storage Formats for more
details.

ib Array of length m + 1 when trans = 'N' or 'n', or n + 1 otherwise.

This array contains indices of elements in the array b, such that ib[i] -
ib[0] is the index in the array b of the first non-zero element from the row
i. The value of the last element ib[m] or ib[n] is equal to the number of
non-zero elements of the matrix B plus one. Refer to rowIndex array
description in Sparse Matrix Storage Formats for more details.

nzmax The length of the arrays c and jc.

This parameter is used only if request=0. The routine stops calculation if
the number of elements in the result matrix C exceeds the specified value
of nzmax.

Output Parameters

c Array containing non-zero elements of the result matrix C. Its length is
equal to the number of non-zero elements in the matrix C. Refer to values
array description in Sparse Matrix Storage Formats for more details.

jc Array containing the column indices plus one for each non-zero element of
the matrix C.
The length of this array is equal to the length of the array c. Refer to
columns array description in Sparse Matrix Storage Formats for more
details.

ic Array of length m + 1, containing indices of elements in the array c, such
that ic[i] - ic[0] is the index in the array c of the first non-zero
element from the row i. The value of the last element ic[m] is equal to the
number of non-zero elements of the matrix C plus one. Refer to rowIndex
array description in Sparse Matrix Storage Formats for more details.

info If info=0, the execution is successful.

If info=I>0, the routine stops calculation in the I-th row of the matrix C
because the number of elements in C exceeds nzmax.

If info=-1, the routine calculates only the size of the arrays c and jc and
returns this value plus 1 as the last element of the array ic.
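The two-call protocol (request=1 to size ic, then request=2 after allocating c and jc with ic[m] - 1 entries) can be sketched in plain C for the case trans = 'N'. This is a hypothetical helper, not the library implementation: it uses int for MKL_INT, assumes ia[0] = ib[0] = 1 and sorted column indices in each row, and passes NULL for c and jc to request the counting pass. Entries that cancel to zero are still stored in this sketch.

```c
#include <limits.h>
#include <stddef.h>

/* Sketch: C := A + beta*B for m-by-n one-based CSR matrices whose column
 * indices are sorted within each row (trans = 'N' only).
 * c == NULL or jc == NULL: counting pass, fills ic only (like request=1).
 * Otherwise: ic must hold the counting-pass result (like request=2). */
void csr_add_1based(int m, const double *a, const int *ja, const int *ia,
                    double beta, const double *b, const int *jb,
                    const int *ib, double *c, int *jc, int *ic)
{
    int count_only = (c == NULL || jc == NULL);
    if (count_only)
        ic[0] = 1;
    for (int i = 0; i < m; ++i) {
        int pa = ia[i] - 1, ea = ia[i + 1] - 1;   /* zero-based offsets */
        int pb = ib[i] - 1, eb = ib[i + 1] - 1;
        int out = ic[i] - 1;
        while (pa < ea || pb < eb) {              /* merge two sorted rows */
            int ca = pa < ea ? ja[pa] : INT_MAX;
            int cb = pb < eb ? jb[pb] : INT_MAX;
            int col = ca < cb ? ca : cb;
            double v = 0.0;
            if (ca == col) v += a[pa++];
            if (cb == col) v += beta * b[pb++];
            if (!count_only) {
                c[out] = v;
                jc[out] = col;
            }
            ++out;
        }
        if (count_only)
            ic[i + 1] = out + 1;
    }
}
```

With A = (1 0 2; 0 3 0), B = (0 4 5; 6 0 0) and beta = 2, the counting pass yields ic = {1, 4, 6}, and the second pass stores C = (1 8 12; 12 3 0) as c = {1, 8, 12, 12, 3} and jc = {1, 2, 3, 1, 2}.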

mkl_?csrmultcsr
Computes product of two sparse matrices stored in
the CSR format (3-array variation) with one-based
indexing (deprecated).

Syntax
void mkl_dcsrmultcsr (const char *trans , const MKL_INT *request , const MKL_INT
*sort , const MKL_INT *m , const MKL_INT *n , const MKL_INT *k , double *a , MKL_INT
*ja , MKL_INT *ia , double *b , MKL_INT *jb , MKL_INT *ib , double *c , MKL_INT *jc ,
MKL_INT *ic , const MKL_INT *nzmax , MKL_INT *info );
void mkl_scsrmultcsr (const char *trans , const MKL_INT *request , const MKL_INT
*sort , const MKL_INT *m , const MKL_INT *n , const MKL_INT *k , float *a , MKL_INT
*ja , MKL_INT *ia , float *b , MKL_INT *jb , MKL_INT *ib , float *c , MKL_INT *jc ,
MKL_INT *ic , const MKL_INT *nzmax , MKL_INT *info );
void mkl_ccsrmultcsr (const char *trans , const MKL_INT *request , const MKL_INT
*sort , const MKL_INT *m , const MKL_INT *n , const MKL_INT *k , MKL_Complex8 *a ,
MKL_INT *ja , MKL_INT *ia , MKL_Complex8 *b , MKL_INT *jb , MKL_INT *ib , MKL_Complex8
*c , MKL_INT *jc , MKL_INT *ic , const MKL_INT *nzmax , MKL_INT *info );
void mkl_zcsrmultcsr (const char *trans , const MKL_INT *request , const MKL_INT
*sort , const MKL_INT *m , const MKL_INT *n , const MKL_INT *k , MKL_Complex16 *a ,
MKL_INT *ja , MKL_INT *ia , MKL_Complex16 *b , MKL_INT *jb , MKL_INT *ib ,
MKL_Complex16 *c , MKL_INT *jc , MKL_INT *ic , const MKL_INT *nzmax , MKL_INT *info );

Include Files
• mkl.h

Description

This routine is deprecated. Use mkl_sparse_spmm from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?csrmultcsr routine performs a matrix-matrix operation defined as

C := op(A)*B
where:
A, B, C are the sparse matrices in the CSR format (3-array variation);
op(A) is one of op(A) = A, or op(A) = A^T, or op(A) = A^H.

You can use the parameter sort to choose whether the non-zero entries of the input and output sparse matrices
are reordered. The purpose of reordering is to rearrange the non-zero entries of a compressed sparse row matrix
so that the column indices in its compressed sparse representation are sorted in increasing order for each row.
The following table shows the correspondence between the value of the parameter sort and the type of
reordering performed by this routine for each sparse matrix involved:
Value of the        Reordering of A      Reordering of B      Reordering of C
parameter sort      (arrays a, ja, ia)   (arrays b, jb, ib)   (arrays c, jc, ic)
1                   yes                  no                   yes
2                   no                   yes                  yes
3                   yes                  yes                  yes
4                   yes                  no                   no
5                   no                   yes                  no
6                   yes                  yes                  no
7                   no                   no                   no
any value other     no                   no                   yes
than 1, 2,..., 7
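Reordering, as used in the table above, means sorting the (column index, value) pairs of each row by column. A minimal sketch for one-based CSR arrays follows; the helper is hypothetical, uses int for MKL_INT, and uses insertion sort, which suits short rows but is not necessarily how the library reorders.

```c
/* Sketch: sort the column indices of every row of a one-based CSR matrix
 * in increasing order, moving the matching values along with them.
 * row has m + 1 entries; row[i] - 1 is the zero-based start of row i. */
void csr_sort_rows_1based(int m, double *val, int *col, const int *row)
{
    for (int i = 0; i < m; ++i) {
        int start = row[i] - 1, end = row[i + 1] - 1;
        for (int p = start + 1; p < end; ++p) {   /* insertion sort */
            int c = col[p];
            double v = val[p];
            int q = p - 1;
            while (q >= start && col[q] > c) {    /* shift larger columns */
                col[q + 1] = col[q];
                val[q + 1] = val[q];
                --q;
            }
            col[q + 1] = c;
            val[q + 1] = v;
        }
    }
}
```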

NOTE
This routine supports only one-based indexing of the input arrays.

Input Parameters

trans Specifies the operation.

If trans = 'N' or 'n', then C := A*B

If trans = 'T' or 't' or 'C' or 'c', then C := A^T*B.

request If request=0, the routine performs multiplication; the memory for the
output arrays ic, jc, c must be allocated beforehand.

If request=1, the routine computes only the values of the array ic of length m
+ 1; the memory for this array must be allocated beforehand. On exit, the
value ic[m] - 1 is the actual number of elements in the arrays c and
jc.
If request=2, the routine performs multiplication. Use this value when the routine
has been called previously with request=1 and the output arrays jc and c have been
allocated in the calling program with length at least ic[m] - 1.

sort Specifies whether the routine performs reordering of non-zero entries in
input and/or output sparse matrices (see table above).

m Number of rows of the matrix A.

n Number of columns of the matrix A.

k Number of columns of the matrix B.

a Array containing non-zero elements of the matrix A. Its length is equal to
the number of non-zero elements in the matrix A. Refer to values array
description in Sparse Matrix Storage Formats for more details.

ia Array of length m + 1.

This array contains indices of elements in the array a, such that ia[i] -
ia[0] is the index in the array a of the first non-zero element from the row
i. The value of the last element ia[m] is equal to the number of non-zero
elements of the matrix A plus one. Refer to rowIndex array description in
Sparse Matrix Storage Formats for more details.

b Array containing non-zero elements of the matrix B. Its length is equal to
the number of non-zero elements in the matrix B. Refer to values array
description in Sparse Matrix Storage Formats for more details.

ib Array of length n + 1 when trans = 'N' or 'n', or m + 1 otherwise.

This array contains indices of elements in the array b, such that ib[i] -
ib[0] is the index in the array b of the first non-zero element from the row
i. The value of the last element ib[n] or ib[m] is equal to the number of
non-zero elements of the matrix B plus one. Refer to rowIndex array
description in Sparse Matrix Storage Formats for more details.

nzmax The length of the arrays c and jc.

This parameter is used only if request=0. The routine stops calculation if
the number of elements in the result matrix C exceeds the specified value
of nzmax.

Output Parameters

c Array containing non-zero elements of the result matrix C. Its length is
equal to the number of non-zero elements in the matrix C. Refer to values
array description in Sparse Matrix Storage Formats for more details.

ic Array of length m + 1 when trans = 'N' or 'n', or n + 1 otherwise.

This array contains indices of elements in the array c, such that ic[i] -
ic[0] is the index in the array c of the first non-zero element from the row
i. The value of the last element ic[m] or ic[n] is equal to the number of
non-zero elements of the matrix C plus one. Refer to rowIndex array
description in Sparse Matrix Storage Formats for more details.

info If info=0, the execution is successful.

If info=I>0, the routine stops calculation in the I-th row of the matrix C
because the number of elements in C exceeds nzmax.

If info=-1, the routine calculates only the size of the arrays c and jc and
returns this value plus 1 as the last element of the array ic.
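The request=1 / request=2 pattern for the product can be sketched as a counting pass followed by a filling pass. This is a hypothetical helper, not the library implementation: it handles only trans = 'N', uses int for MKL_INT, assumes ia[0] = ib[0] = 1 and k >= 1, and passes NULL for c and jc to request the counting pass. A dense work row of length k accumulates each output row; scanning it left to right produces sorted column indices at an O(m*k) scanning cost that a production code would avoid.

```c
#include <stdlib.h>

/* Sketch: C := A*B for one-based CSR inputs (A is m-by-n, B is n-by-k,
 * k >= 1, trans = 'N'). With c == NULL only ic is computed; then allocate
 * c/jc with ic[m] - 1 entries and call again with ic unchanged.
 * Returns 0 on success, -1 if the work arrays cannot be allocated. */
int csr_mult_1based(int m, int k, const double *a, const int *ja,
                    const int *ia, const double *b, const int *jb,
                    const int *ib, double *c, int *jc, int *ic)
{
    double *acc = calloc((size_t)k, sizeof *acc);  /* dense work row */
    char *used = calloc((size_t)k, 1);             /* touched columns */
    if (acc == NULL || used == NULL) {
        free(acc);
        free(used);
        return -1;
    }
    int count_only = (c == NULL || jc == NULL);
    if (count_only)
        ic[0] = 1;
    for (int i = 0; i < m; ++i) {
        /* scatter: A(i, r) times row r of B */
        for (int p = ia[i] - 1; p < ia[i + 1] - 1; ++p) {
            int r = ja[p] - 1;
            for (int q = ib[r] - 1; q < ib[r + 1] - 1; ++q) {
                acc[jb[q] - 1] += a[p] * b[q];
                used[jb[q] - 1] = 1;
            }
        }
        /* gather in increasing column order and reset the work row */
        int out = ic[i] - 1;
        for (int j = 0; j < k; ++j) {
            if (used[j]) {
                if (!count_only) {
                    c[out] = acc[j];
                    jc[out] = j + 1;               /* one-based column */
                }
                ++out;
                acc[j] = 0.0;
                used[j] = 0;
            }
        }
        if (count_only)
            ic[i + 1] = out + 1;
    }
    free(acc);
    free(used);
    return 0;
}
```

For A = (1 2; 0 3) and B = (4 0 5; 0 6 0), the first call gives ic = {1, 4, 5}; after allocating 4 entries, the second call gives c = {4, 12, 5, 18} and jc = {1, 2, 3, 2}.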

mkl_?csrmultd
Computes product of two sparse matrices stored in
the CSR format (3-array variation) with one-based
indexing. The result is stored in the dense matrix
(deprecated).

Syntax
void mkl_dcsrmultd (const char *trans , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , double *a , MKL_INT *ja , MKL_INT *ia , double *b , MKL_INT *jb , MKL_INT
*ib , double *c , MKL_INT *ldc );
void mkl_scsrmultd (const char *trans , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , float *a , MKL_INT *ja , MKL_INT *ia , float *b , MKL_INT *jb , MKL_INT
*ib , float *c , MKL_INT *ldc );
void mkl_ccsrmultd (const char *trans , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , MKL_Complex8 *a , MKL_INT *ja , MKL_INT *ia , MKL_Complex8 *b , MKL_INT
*jb , MKL_INT *ib , MKL_Complex8 *c , MKL_INT *ldc );
void mkl_zcsrmultd (const char *trans , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , MKL_Complex16 *a , MKL_INT *ja , MKL_INT *ia , MKL_Complex16 *b , MKL_INT
*jb , MKL_INT *ib , MKL_Complex16 *c , MKL_INT *ldc );

Include Files
• mkl.h

Description

This routine is deprecated. Use mkl_sparse_?_spmmd from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?csrmultd routine performs a matrix-matrix operation defined as

C := op(A)*B
where:
A, B are the sparse matrices in the CSR format (3-array variation), C is a dense matrix;
op(A) is one of op(A) = A, or op(A) = A^T, or op(A) = A^H.
The routine works correctly if and only if the column indices in the sparse matrix representations of matrices A
and B are arranged in increasing order for each row.

NOTE
This routine supports only one-based indexing of the input arrays.

Input Parameters

trans Specifies the operation.

If trans = 'N' or 'n', then C := A*B

If trans = 'T' or 't' or 'C' or 'c', then C := A^T*B.

m Number of rows of the matrix A.

n Number of columns of the matrix A.

k Number of columns of the matrix B.

a Array containing non-zero elements of the matrix A. Its length is equal to
the number of non-zero elements in the matrix A. Refer to values array
description in Sparse Matrix Storage Formats for more details.

ia Array of length m + 1.

This array contains indices of elements in the array a, such that ia[i] -
ia[0] is the index in the array a of the first non-zero element from the row
i. The value of the last element ia[m] is equal to the number of non-zero
elements of the matrix A plus one. Refer to rowIndex array description in
Sparse Matrix Storage Formats for more details.

b Array containing non-zero elements of the matrix B. Its length is equal to
the number of non-zero elements in the matrix B. Refer to values array
description in Sparse Matrix Storage Formats for more details.

ib Array of length n + 1 when trans = 'N' or 'n', or m + 1 otherwise.

This array contains indices of elements in the array b, such that ib[i] -
ib[0] is the index in the array b of the first non-zero element from the row
i. The value of the last element ib[n] or ib[m] is equal to the number of
non-zero elements of the matrix B plus one. Refer to rowIndex array
description in Sparse Matrix Storage Formats for more details.

Output Parameters

c Array containing the result matrix C in dense format.

ldc Specifies the leading dimension of the dense matrix C as declared in the
calling (sub)program. Must be at least max(1, m) when trans = 'N' or
'n', or max(1, n) otherwise.
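A dense-output product of two one-based CSR matrices can be sketched as follows. The helper is hypothetical: it handles only trans = 'N', uses int for MKL_INT, and assumes C is stored column-major with leading dimension ldc >= m. That layout is consistent with the ldc bound above but is an assumption of this sketch.

```c
#include <stddef.h>

/* Sketch: C := A*B with one-based CSR inputs (A is m-by-n, B is n-by-k,
 * trans = 'N') and a dense result. C is assumed column-major with
 * leading dimension ldc >= m, so C(i, j) is c[i + j*ldc] (zero-based). */
void csr_mult_dense_1based(int m, int k, const double *a, const int *ja,
                           const int *ia, const double *b, const int *jb,
                           const int *ib, double *c, int ldc)
{
    for (int j = 0; j < k; ++j)              /* clear the m-by-k result */
        for (int i = 0; i < m; ++i)
            c[i + (size_t)j * ldc] = 0.0;

    for (int i = 0; i < m; ++i)
        for (int p = ia[i] - 1; p < ia[i + 1] - 1; ++p) {
            int r = ja[p] - 1;               /* A(i, r) pairs with row r of B */
            for (int q = ib[r] - 1; q < ib[r + 1] - 1; ++q)
                c[i + (size_t)(jb[q] - 1) * ldc] += a[p] * b[q];
        }
}
```

With A = (1 2; 0 3) and B = (4 0 5; 0 6 0), both in one-based CSR, and ldc = 2, c becomes 4 0 12 18 5 0, that is, C = (4 12 5; 0 18 0) stored by columns.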

Sparse QR Routines
Sparse QR routines and their data types

Routine or function group    Data types  Description

mkl_sparse_set_qr_hint                   Enables a pivot strategy for an ill-conditioned matrix.
mkl_sparse_?_qr              s, d        Calculates the solution of a sparse system of linear equations
                                         using QR factorization.
mkl_sparse_qr_reorder                    Performs reordering and symbolic analysis of the matrix A.
mkl_sparse_?_qr_factorize    s, d        Performs numerical factorization of the matrix A.
mkl_sparse_?_qr_solve        s, d        Solves the system A*x = b using QR factorization of the matrix
                                         A.
mkl_sparse_?_qr_qmult        s, d        Performs x := Q^(-1)*b.
mkl_sparse_?_qr_rsolve       s, d        Performs x := R^(-1)*b.

NOTE Underdetermined systems of equations are not supported. The number of columns must
be less than or equal to the number of rows.

For more information about the workflow of sparse QR functionality, refer to oneMKL Sparse QR solver.
Multifrontal Sparse QR Factorization Method for Solving a Sparse System of Linear Equations.

mkl_sparse_set_qr_hint
Defines the pivot strategy for further calls of
mkl_sparse_?_qr.

Syntax
sparse_status_t mkl_sparse_set_qr_hint (sparse_matrix_t A, sparse_qr_hint_t hint);

Include Files
• mkl_sparse_qr.h

Description
You can use this routine to enable a pivot strategy in the case of an ill-conditioned matrix.

Input Parameters

A Handle containing a sparse matrix in an internal data structure.

hint Value specifying whether to use pivoting.

NOTE The only value currently supported is
SPARSE_QR_WITH_PIVOTS, which enables the use of a pivot
strategy for an ill-conditioned matrix.

Return Values

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.
SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.
SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.
SPARSE_STATUS_EXECUTION_FAILED Execution failed.
SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.
SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.

mkl_sparse_?_qr
Computes the QR decomposition for the matrix of a
sparse linear system and calculates the solution.

Syntax
sparse_status_t mkl_sparse_d_qr ( sparse_operation_t operation, sparse_matrix_t A,
struct matrix_descr descr, sparse_layout_t layout, MKL_INT columns, double *x, MKL_INT
ldx, const double *b, MKL_INT ldb );
sparse_status_t mkl_sparse_s_qr ( sparse_operation_t operation, sparse_matrix_t A,
struct matrix_descr descr, sparse_layout_t layout, MKL_INT columns, float *x, MKL_INT
ldx, const float *b, MKL_INT ldb );

Include Files
• mkl_sparse_qr.h

Description
The mkl_sparse_?_qr routine computes the QR decomposition for the matrix of a sparse linear system A*x
= b, so that A = Q*R where Q is the orthogonal matrix and R is upper triangular, and calculates the solution.

NOTE
Currently, mkl_sparse_?_qr supports only square and overdetermined systems. For
underdetermined systems, you can manually transpose the system matrix and use the QR
decomposition of A^T to get the minimum-norm solution of the original underdetermined
system.
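The transpose approach in the note can be written out as follows, assuming A has full row rank (a condition the note does not state). Q and R below are the factors of A^T, not of A, and the last line shows why x = Q*y has the smallest norm among all solutions.

```latex
\begin{aligned}
&A \in \mathbb{R}^{m \times n},\quad m < n,\quad \operatorname{rank}(A) = m \\
&A^{T} = Q R,\quad Q \in \mathbb{R}^{n \times m},\ Q^{T} Q = I_m,\ R \in \mathbb{R}^{m \times m}\ \text{upper triangular} \\
&A x = b \iff R^{T} \left( Q^{T} x \right) = b \\
&\text{solve } R^{T} y = b \text{ by forward substitution, then set } x = Q y \\
&\text{every solution is } x = Q y + z \text{ with } Q^{T} z = 0,\ \text{so } \|x\|_2^2 = \|y\|_2^2 + \|z\|_2^2 \text{ is minimal at } z = 0
\end{aligned}
```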

NOTE Currently, mkl_sparse_?_qr supports only CSR format for the input matrix, non-
transpose operation, and single right-hand side.

Input Parameters

operation Specifies the operation to perform.

NOTE Currently, the only supported value is
SPARSE_OPERATION_NON_TRANSPOSE (non-transpose case; that is, A*x
= b is solved).

A Handle containing a sparse matrix in an internal data structure.

descr Structure specifying sparse matrix properties. Only the parameters listed here
are currently supported.

type Specifies the type of sparse matrix.

NOTE Currently, the only supported value is
SPARSE_MATRIX_TYPE_GENERAL (the matrix is processed as-is).

layout Describes the storage scheme for the dense matrix:

SPARSE_LAYOUT_COLUMN_MAJOR Storage of elements uses column-major layout.
SPARSE_LAYOUT_ROW_MAJOR Storage of elements uses row-major layout.

x Array with a size of at least rows*cols:

                                 layout =                      layout =
                                 SPARSE_LAYOUT_COLUMN_MAJOR    SPARSE_LAYOUT_ROW_MAJOR
rows (number of rows in x)       ldx                           Number of columns in A
cols (number of columns in x)    columns                       ldx

columns Number of columns in matrix b.

ldx Specifies the leading dimension of matrix x.

b Array with a size of at least rows*cols:

                                 layout =                      layout =
                                 SPARSE_LAYOUT_COLUMN_MAJOR    SPARSE_LAYOUT_ROW_MAJOR
rows (number of rows in b)       ldb                           Number of rows in A
cols (number of columns in b)    columns                       ldb

ldb Specifies the leading dimension of matrix b.
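The size requirements in the x and b tables follow from how a leading dimension is used to address elements. A minimal sketch, in which the enum is a hypothetical stand-in for sparse_layout_t: in column-major storage the columns are ld elements apart, in row-major storage the rows are.

```c
#include <stddef.h>

/* Hypothetical stand-in for sparse_layout_t, for illustration only. */
enum dense_layout { LAYOUT_COLUMN_MAJOR, LAYOUT_ROW_MAJOR };

/* Offset of element (i, j), zero-based, in a dense array with leading
 * dimension ld: columns are ld apart in column-major storage, rows are
 * ld apart in row-major storage. */
size_t dense_index(enum dense_layout lay, size_t i, size_t j, size_t ld)
{
    return lay == LAYOUT_COLUMN_MAJOR ? i + j * ld : i * ld + j;
}

/* Minimum array length (rows*cols in the tables above) for an
 * nrows-by-ncols matrix: ld must be >= nrows for column-major
 * storage and >= ncols for row-major storage. */
size_t dense_min_size(enum dense_layout lay, size_t nrows, size_t ncols,
                      size_t ld)
{
    return lay == LAYOUT_COLUMN_MAJOR ? ld * ncols : nrows * ld;
}
```

For a single right-hand side (columns = 1) in column-major layout, this reduces to x needing at least ldx elements, with ldx no smaller than the number of columns of A.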

Output Parameters

x Contains the solution of the system A*x = b.

Return Values

SPARSE_STATUS_SUCCESS The operation was successful.

mkl_sparse_qr_reorder
Reordering step of the SPARSE QR solver.

Syntax
sparse_status_t mkl_sparse_qr_reorder (sparse_matrix_t A, struct matrix_descr descr);

Include Files
• mkl_sparse_qr.h

Description
The mkl_sparse_qr_reorder routine performs ordering and symbolic analysis of matrix A.

NOTE Currently, mkl_sparse_qr_reorder supports only general structure and CSR format
for the input matrix.

Input Parameters

A Handle containing a sparse matrix in an internal data structure.

descr Structure specifying sparse matrix properties. Only the parameters listed here
are currently supported.

Return Values

SPARSE_STATUS_SUCCESS The operation was successful.

mkl_sparse_?_qr_factorize
Factorization step of the SPARSE QR solver.

Syntax
sparse_status_t mkl_sparse_d_qr_factorize (sparse_matrix_t A, double *alt_values);
sparse_status_t mkl_sparse_s_qr_factorize (sparse_matrix_t A, float *alt_values);

Include Files
• mkl_sparse_qr.h

Description
The mkl_sparse_?_qr_factorize routine performs numerical factorization of matrix A. Prior to calling this
routine, the mkl_sparse_qr_reorder routine must be called for the matrix handle A. For more
information about the workflow of sparse QR functionality, refer to oneMKL Sparse QR solver. Multifrontal
Sparse QR Factorization Method for Solving a Sparse System of Linear Equations.

NOTE Currently, mkl_sparse_?_qr_factorize supports only CSR format for the input matrix.

Input Parameters

A Handle containing a sparse matrix in an internal data structure.

alt_values Array with alternative values. Its length must equal the number of non-zero elements in the
initial input matrix. When passed to the routine, these values are used during the
factorization step instead of the values stored in handle A.

Return Values

SPARSE_STATUS_SUCCESS The operation was successful.

mkl_sparse_?_qr_solve
Solving step of the SPARSE QR solver.

Syntax
sparse_status_t mkl_sparse_d_qr_solve ( sparse_operation_t operation, sparse_matrix_t
A, double *alt_values, sparse_layout_t layout, MKL_INT columns, double *x, MKL_INT ldx,
const double *b, MKL_INT ldb );
sparse_status_t mkl_sparse_s_qr_solve ( sparse_operation_t operation, sparse_matrix_t
A, float *alt_values, sparse_layout_t layout, MKL_INT columns, float *x, MKL_INT ldx,
const float *b, MKL_INT ldb );

Include Files
• mkl_sparse_qr.h

Description
The mkl_sparse_?_qr_solve routine computes the solution of sparse systems of linear equations A*x =
b. Prior to calling this routine, the mkl_sparse_?_qr_factorize routine must be called for the matrix
handle A. For more information about the workflow of sparse QR functionality, refer to oneMKL Sparse QR
solver. Multifrontal Sparse QR Factorization Method for Solving a Sparse System of Linear Equations.

NOTE
Currently, mkl_sparse_?_qr_solve supports only CSR format for the input matrix, non-
transpose operation, and single right-hand side.
Alternative values are not supported and must be set to NULL.

Input Parameters

operation Specifies the operation to perform.

NOTE Currently, the only supported value is
SPARSE_OPERATION_NON_TRANSPOSE (non-transpose case; that is, A*x
= b is solved).

A Handle containing a sparse matrix in an internal data structure.

alt_values Reserved for future use.

layout Describes the storage scheme for the dense matrix:

SPARSE_LAYOUT_COLUMN_MAJOR Storage of elements uses column-major layout.
SPARSE_LAYOUT_ROW_MAJOR Storage of elements uses row-major layout.

x Array with a size of at least rows*cols:

                                 layout =                      layout =
                                 SPARSE_LAYOUT_COLUMN_MAJOR    SPARSE_LAYOUT_ROW_MAJOR
rows (number of rows in x)       ldx                           Number of columns in A
cols (number of columns in x)    columns                       ldx

columns Number of columns in matrix b.

ldx Specifies the leading dimension of matrix x.

b Array with a size of at least rows*cols, where:
• layout = SPARSE_LAYOUT_COLUMN_MAJOR: rows (number of rows in b) = ldb, and cols (number of columns in b) = columns.
• layout = SPARSE_LAYOUT_ROW_MAJOR: rows (number of rows in b) = number of columns in A, and cols (number of columns in b) = ldb.

ldb Specifies the leading dimension of matrix b.
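The size rule for x and b can be expressed as a small helper. This is a minimal sketch, not part of the oneMKL API: dense_layout is a hypothetical stand-in for sparse_layout_t so that the snippet compiles without oneMKL headers, and ncols_a stands for the number of columns in A.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for sparse_layout_t, so this compiles without oneMKL. */
typedef enum { LAYOUT_COLUMN_MAJOR, LAYOUT_ROW_MAJOR } dense_layout;

/* Minimum number of elements of the dense array x (or b) per the rows*cols rule:
 *   column-major: rows = ld,                     cols = columns
 *   row-major:    rows = number of columns in A, cols = ld           */
static size_t qr_dense_min_elems(dense_layout layout, size_t ld,
                                 size_t columns, size_t ncols_a)
{
    assert(ld > 0);
    if (layout == LAYOUT_COLUMN_MAJOR)
        return ld * columns;   /* rows = ld, cols = columns */
    return ncols_a * ld;       /* rows = ncols_a, cols = ld */
}
```

For example, with the single right-hand side currently supported (columns = 1) and column-major layout, x needs at least ldx elements.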

Output Parameters

x Contains the solution of system A*x = b.

Return Values

SPARSE_STATUS_SUCCESS The operation was successful.


SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.
SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.
SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.
SPARSE_STATUS_EXECUTION_FAILED Execution failed.
SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.
SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.

mkl_sparse_?_qr_qmult
First stage of the solving step of the SPARSE QR
solver.

Syntax
sparse_status_t mkl_sparse_d_qr_qmult ( sparse_operation_t operation, sparse_matrix_t
A, sparse_layout_t layout, MKL_INT columns, double *x, MKL_INT ldx, const double *b,
MKL_INT ldb );
sparse_status_t mkl_sparse_s_qr_qmult ( sparse_operation_t operation, sparse_matrix_t
A, sparse_layout_t layout, MKL_INT columns, float *x, MKL_INT ldx, const float *b,
MKL_INT ldb );

Include Files
• mkl_sparse_qr.h

Description
The mkl_sparse_?_qr_qmult routine computes the product of the inverse of the matrix Q and the right-hand
side matrix b, that is, x = inv(Q)*b. Together with mkl_sparse_?_qr_rsolve, this routine can be used to
perform the solving step in two separate calls as an alternative to a single call of mkl_sparse_?_qr_solve.
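The arithmetic of the two-stage solve can be illustrated with a small dense example. This sketch does not call oneMKL and ignores any reordering the sparse solver may apply internally; it assumes only that A = Q*R with Q orthogonal, so inv(Q) = Q^T. The hypothetical function apply_qt plays the role of mkl_sparse_?_qr_qmult, and back_substitute plays the role of mkl_sparse_?_qr_rsolve.

```c
#include <assert.h>

/* Stage 1 (role of qr_qmult): y = Q^T * b. For an orthogonal Q, inv(Q) = Q^T.
 * Q is n-by-n, stored row-major. */
static void apply_qt(int n, const double *q, const double *b, double *y)
{
    for (int i = 0; i < n; ++i) {
        double s = 0.0;
        for (int k = 0; k < n; ++k)
            s += q[k * n + i] * b[k];   /* (Q^T)[i][k] = Q[k][i] */
        y[i] = s;
    }
}

/* Stage 2 (role of qr_rsolve): solve R * x = y by back substitution.
 * R is n-by-n upper triangular with a nonzero diagonal, stored row-major. */
static void back_substitute(int n, const double *r, const double *y, double *x)
{
    for (int i = n - 1; i >= 0; --i) {
        double s = y[i];
        for (int j = i + 1; j < n; ++j)
            s -= r[i * n + j] * x[j];   /* subtract already-solved unknowns */
        assert(r[i * n + i] != 0.0);
        x[i] = s / r[i * n + i];
    }
}
```

With Q = [0.6 -0.8; 0.8 0.6], R = [5 1; 0 2], and b = (1, 8), which equals (Q*R)*(1, 2), the two stages recover x = (1, 2).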

NOTE Currently, mkl_sparse_?_qr_qmult supports only CSR format for the input matrix,
non-transpose operation, and single right-hand side.

Input Parameters

operation Specifies the operation to perform.

NOTE Currently, the only supported value is
SPARSE_OPERATION_NON_TRANSPOSE (non-transpose case; that is, A*x = b is solved).

A Handle containing a sparse matrix in an internal data structure.

layout Describes the storage scheme for the dense matrix:
• SPARSE_LAYOUT_COLUMN_MAJOR: storage of elements uses column-major layout.
• SPARSE_LAYOUT_ROW_MAJOR: storage of elements uses row-major layout.

x Array with a size of at least rows*cols, where:
• layout = SPARSE_LAYOUT_COLUMN_MAJOR: rows (number of rows in x) = ldx, and cols (number of columns in x) = columns.
• layout = SPARSE_LAYOUT_ROW_MAJOR: rows (number of rows in x) = number of columns in A, and cols (number of columns in x) = ldx.

columns Number of columns in matrix b.

ldx Specifies the leading dimension of matrix x.

b Array with a size of at least rows*cols, where:
• layout = SPARSE_LAYOUT_COLUMN_MAJOR: rows (number of rows in b) = ldb, and cols (number of columns in b) = columns.
• layout = SPARSE_LAYOUT_ROW_MAJOR: rows (number of rows in b) = number of columns in A, and cols (number of columns in b) = ldb.

ldb Specifies the leading dimension of matrix b.

Output Parameters

x Overwritten by the updated matrix x = inv(Q)*b.

Return Values

SPARSE_STATUS_SUCCESS The operation was successful.


SPARSE_STATUS_EXECUTION_FAILED Execution failed.
SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.
SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.

mkl_sparse_?_qr_rsolve
Second stage of the solving step of the SPARSE QR
solver.

Syntax
sparse_status_t mkl_sparse_d_qr_rsolve ( sparse_operation_t operation, sparse_matrix_t
A, sparse_layout_t layout, MKL_INT columns, double *x, MKL_INT ldx, const double *b,
MKL_INT ldb );
sparse_status_t mkl_sparse_s_qr_rsolve ( sparse_operation_t operation, sparse_matrix_t
A, sparse_layout_t layout, MKL_INT columns, float *x, MKL_INT ldx, const float *b,
MKL_INT ldb );

Include Files
• mkl_sparse_qr.h

Description
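The mkl_sparse_?_qr_rsolve routine performs the second stage of the solving step: it solves the triangular
system R*x = b. When b is the output of mkl_sparse_?_qr_qmult, the two calls together compute the solution
of A*x = b.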

NOTE Currently, mkl_sparse_?_qr_rsolve supports only CSR format for the input matrix,
non-transpose operation, and single right-hand side.

Input Parameters

operation Specifies the operation to perform.

NOTE Currently, the only supported value is
SPARSE_OPERATION_NON_TRANSPOSE (non-transpose case; that is, A*x = b is solved).

A Handle containing a sparse matrix in an internal data structure.

layout Describes the storage scheme for the dense matrix:
• SPARSE_LAYOUT_COLUMN_MAJOR: storage of elements uses column-major layout.
• SPARSE_LAYOUT_ROW_MAJOR: storage of elements uses row-major layout.

x Array with a size of at least rows*cols, where:
• layout = SPARSE_LAYOUT_COLUMN_MAJOR: rows (number of rows in x) = ldx, and cols (number of columns in x) = columns.
• layout = SPARSE_LAYOUT_ROW_MAJOR: rows (number of rows in x) = number of columns in A, and cols (number of columns in x) = ldx.

columns Number of columns in matrix b.

ldx Specifies the leading dimension of matrix x.

b Array with a size of at least rows*cols, where:
• layout = SPARSE_LAYOUT_COLUMN_MAJOR: rows (number of rows in b) = ldb, and cols (number of columns in b) = columns.
• layout = SPARSE_LAYOUT_ROW_MAJOR: rows (number of rows in b) = number of columns in A, and cols (number of columns in b) = ldb.

ldb Specifies the leading dimension of matrix b.

Output Parameters

x Contains the solution of the triangular system R*x = b.

Return Values

SPARSE_STATUS_SUCCESS The operation was successful.

Compact BLAS and LAPACK Functions


Overview
Many HPC applications apply BLAS and LAPACK operations to groups of very small matrices. While the existing
batch Intel® oneAPI Math Kernel Library (oneMKL) BLAS routines already provide meaningful speedup over
OpenMP* loops around BLAS operations for these sizes, another customization offers potential speedup by
allocating matrices in a SIMD-friendly format, which allows cross-matrix vectorization in the BLAS and
LAPACK routines of oneMKL called Compact BLAS and LAPACK.
The main idea behind these compact methods is to create true SIMD computations in which subgroups of
matrices are operated on with kernels that abstractly appear as scalar kernels, while registers are filled by
cross-matrix vectorization.
These are the BLAS/LAPACK compact functions:
• mkl_?gemm_compact
• mkl_?trsm_compact
• mkl_?potrf_compact
• mkl_?getrfnp_compact
• mkl_?geqrf_compact
• mkl_?getrinp_compact
The compact API provides additional service functions to refactor data. Because this capability is not specific
to any particular BLAS or LAPACK operation, the data manipulation can be executed once for an application's
data. The entire program, consisting of any number of BLAS and LAPACK operations for which compact kernels
have been written, can then be performed on the compact data without any further refactoring. Applications
that already work on data in compact format do not need to use the packing function.
See "About the Compact Format" below for more details.
Along with this new data format, the API consists of two components:
• BLAS and LAPACK Compact Kernels: The first component of the API is a compact kernel that works on
matrices stored in compact format.
• Service Functions for the Compact Format: The second component of the API is a set of compact service
functions that convert data into and out of compact format. These are:
• mkl_?gepack_compact
• mkl_?geunpack_compact
• mkl_get_format_compact
• mkl_?get_size_compact
Note that there are some Numerical Limitations for the routines mentioned above.

About the Compact Format

In compact format, for calculations involving real precision, matrices are organized in packs of size V, where
V is the SIMD vector length of the underlying architecture. Each pack is a 3D-tensor with the matrix index
incrementing the fastest. These packs are then loaded into registers and operated on using SIMD
instructions.
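Under one reading of this layout (real data, each matrix stored column-major with leading dimension ld and ncols columns), the position of a single element in the compact buffer can be computed as below. This is an illustrative sketch, not a oneMKL function; in applications, mkl_?gepack_compact produces the layout, and compact_offset_real and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Offset (in elements) of entry (i, j) of matrix k in a real, column-major
 * compact buffer with pack length v. Within a pack, the matrix index varies
 * fastest, then the row index, then the column index. */
static size_t compact_offset_real(size_t i, size_t j, size_t k,
                                  size_t ld, size_t ncols, size_t v)
{
    assert(i < ld && v > 0);
    size_t pack = k / v;               /* which pack holds matrix k     */
    size_t lane = k % v;               /* position of k inside the pack */
    size_t pack_elems = ld * ncols * v;
    return pack * pack_elems + (j * ld + i) * v + lane;
}
```

For four 3 x 3 matrices with V = 2, entry (0,0) of the first two matrices sits at offsets 0 and 1, and the second pack starts at offset 18.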

The figure below demonstrates the packing of a set of four 3 x 3 real-precision matrices into compact format.
The pack length for this example is V = 2, resulting in 2 compact packs.

Interleaved Data for Compact BLAS and LAPACK


For calculations involving complex precision, the real and imaginary parts of each matrix are packed
separately. In the figure below, the group of four 3 x 3 complex matrices is packed into compact format with
pack length V = 2. The first pack consists of the real parts of the first two matrices, and the second pack
consists of the imaginary parts of the first two matrices. Real and imaginary packs alternate in memory. This
storage format means that all compact arrays can be handled as a real type.
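Extending the same assumed column-major layout to complex data, a hypothetical helper can locate the real and imaginary parts of one element: each group of V matrices occupies one real pack followed by one imaginary pack of the same size. This is a sketch for understanding the layout, not a oneMKL routine.

```c
#include <assert.h>
#include <stddef.h>

/* Offsets of the real and imaginary parts of entry (i, j) of complex matrix k
 * in a column-major compact buffer: real and imaginary packs alternate. */
static void compact_offset_complex(size_t i, size_t j, size_t k,
                                   size_t ld, size_t ncols, size_t v,
                                   size_t *re, size_t *im)
{
    assert(i < ld && v > 0);
    size_t pack_elems = ld * ncols * v;          /* elements per pack        */
    size_t base = (k / v) * 2 * pack_elems;      /* real pack of this group  */
    size_t in_pack = (j * ld + i) * v + k % v;   /* position inside the pack */
    *re = base + in_pack;
    *im = base + pack_elems + in_pack;           /* next pack: imaginary parts */
}
```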

Compact Format for Complex Precision

The particular specifications (size and number) of the compact packs for a given architecture and problem
precision are specified by an MKL_COMPACT_PACK enum type. For example, for a double-precision problem
involving a group of 128 matrices on an architecture with a 256-bit SIMD vector length, the optimal pack
length is V = 4, and the number of packs is 32.
The initially permitted values for the enum are:
• MKL_COMPACT_SSE - pack length 2 for double precision, pack length 4 for single precision.
• MKL_COMPACT_AVX - pack length 4 for double precision, pack length 8 for single precision.
• MKL_COMPACT_AVX512 - pack length 8 for double precision, pack length 16 for single precision.
For calculations involving complex precision, the pack length is the same; however, half of the packs store
the real parts of the matrices, and half store the imaginary parts. This means that twice as many packs are
needed to store the same number of matrices.
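The pack lengths listed above equal the SIMD register width divided by the element width, and the number of packs is the number of matrices divided by V, rounded up (the last pack is only partially filled when nm is not a multiple of V). The sketch below encodes both rules; pack_kind is a hypothetical mirror of MKL_COMPACT_PACK, not the real enum.

```c
#include <assert.h>

/* Hypothetical mirror of the three MKL_COMPACT_PACK values listed above. */
typedef enum { PACK_SSE, PACK_AVX, PACK_AVX512 } pack_kind;

/* Pack length V = SIMD register width in bits / element width in bits. */
static int pack_length(pack_kind kind, int is_double)
{
    int bits = (kind == PACK_SSE) ? 128 : (kind == PACK_AVX) ? 256 : 512;
    return bits / (is_double ? 64 : 32);
}

/* Packs needed for nm matrices: ceil(nm / V) for real data, twice that for
 * complex data (separate real and imaginary packs). */
static long num_packs(long nm, int v, int is_complex)
{
    assert(nm >= 0 && v > 0);
    long packs = (nm + v - 1) / v;
    return is_complex ? 2 * packs : packs;
}
```

For the double-precision example above, pack_length(PACK_AVX, 1) gives V = 4 and num_packs(128, 4, 0) gives 32 packs.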

The above examples illustrate the case when the number of matrices is evenly divisible by the pack length.
When this is not the case, there will be partially unfilled packs at the end of the memory segment, and the
compact-packing routine pads these partially unfilled packs with identity matrices, so that compact
routines use only completely filled registers in their calculations. The next figure illustrates this padding
for a group of three 3 x 3 real-precision matrices with a pack length of 2.

Compact Format with Padding
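A minimal packing sketch with identity padding follows, restricted to square, real, column-major matrices whose compact leading dimension equals n, and assuming the element ordering described in About the Compact Format. It only shows the data movement; in applications, mkl_?gepack_compact performs this step, and pack_square_compact is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Pack nm real n-by-n column-major matrices (mats[k], leading dimension lda)
 * into a compact buffer with pack length v and leading dimension n. Lanes
 * beyond nm in the last pack are filled with the identity matrix.
 * out must hold ceil(nm / v) * n * n * v doubles. */
static void pack_square_compact(size_t n, size_t nm, const double *const *mats,
                                size_t lda, size_t v, double *out)
{
    assert(v > 0 && lda >= n);
    size_t npacks = (nm + v - 1) / v;
    for (size_t p = 0; p < npacks; ++p)
        for (size_t j = 0; j < n; ++j)
            for (size_t i = 0; i < n; ++i)
                for (size_t lane = 0; lane < v; ++lane) {
                    size_t k = p * v + lane;              /* matrix index */
                    double val = (k < nm) ? mats[k][j * lda + i]
                                          : (i == j ? 1.0 : 0.0); /* padding */
                    out[p * n * n * v + (j * n + i) * v + lane] = val;
                }
}
```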

Before calling a BLAS or LAPACK compact function, the input data must be packed in compact format. After
execution, the output data should be unpacked from this compact format, unless another compact routine
will be called immediately afterward. Two service functions, mkl_?gepack_compact and
mkl_?geunpack_compact, facilitate storing matrices in and retrieving them from compact format. It is
recommended to call mkl_get_format_compact before calling mkl_?gepack_compact to obtain the
optimal format for performance. Advanced users can pack and unpack the matrices themselves and still use
Intel® oneAPI Math Kernel Library (oneMKL) compact functions on the packed set.
Compact routines can only be called for groups of matrices that have the same dimensions, leading
dimension, and storage format. For example, the routine mkl_?getrfnp_compact, which calculates the LU
factorization of a group of m x n matrices without pivoting, can only be called for a group of matrices with
the same number of rows (m) and the same number of columns (n). All of the matrices must also be stored
in arrays with the same leading dimension, and all must be stored in the same storage format (column-major
or row-major).

mkl_?gemm_compact
Computes a matrix-matrix product of a set of compact
format general matrices.

Syntax
void mkl_sgemm_compact (MKL_LAYOUT layout, MKL_TRANSPOSE transa, MKL_TRANSPOSE transb,
MKL_INT m, MKL_INT n, MKL_INT k, float alpha, const float *ap, MKL_INT ldap, const float
*bp, MKL_INT ldbp, float beta, float *cp, MKL_INT ldcp, MKL_COMPACT_PACK format, MKL_INT
nm);

void mkl_dgemm_compact (MKL_LAYOUT layout, MKL_TRANSPOSE transa, MKL_TRANSPOSE transb,
MKL_INT m, MKL_INT n, MKL_INT k, double alpha, const double *ap, MKL_INT ldap, const
double *bp, MKL_INT ldbp, double beta, double *cp, MKL_INT ldcp, MKL_COMPACT_PACK
format, MKL_INT nm);
void mkl_cgemm_compact (MKL_LAYOUT layout, MKL_TRANSPOSE transa, MKL_TRANSPOSE transb,
MKL_INT m, MKL_INT n, MKL_INT k, mkl_compact_complex_float *alpha, const float *ap,
MKL_INT ldap, const float *bp, MKL_INT ldbp, mkl_compact_complex_float *beta, float
*cp, MKL_INT ldcp, MKL_COMPACT_PACK format, MKL_INT nm);
void mkl_zgemm_compact (MKL_LAYOUT layout, MKL_TRANSPOSE transa, MKL_TRANSPOSE transb,
MKL_INT m, MKL_INT n, MKL_INT k, mkl_compact_complex_double *alpha, const double *ap,
MKL_INT ldap, const double *bp, MKL_INT ldbp, mkl_compact_complex_double *beta, double
*cp, MKL_INT ldcp, MKL_COMPACT_PACK format, MKL_INT nm);

Description

The mkl_?gemm_compact routine computes a scalar-matrix-matrix product and adds the result to a
scalar-matrix product for a group of nm general matrices Ac, Bc, and Cc that have been stored in compact
format. The operation is defined for each matrix as:
Cc := alpha*op(Ac)*op(Bc) + beta*Cc
Where

• op(Xc) is one of op(Xc) = Xc, op(Xc) = Xc^T, or op(Xc) = Xc^H,

• alpha and beta are scalars,
• Ac, Bc, and Cc are matrices that have been stored in compact format,
• op(Ac) is an m-by-k matrix for each matrix in the group,
• op(Bc) is a k-by-n matrix for each matrix in the group,
• and Cc is an m-by-n matrix.
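To show what cross-matrix vectorization means for this operation, here is an unoptimized reference loop nest for the real, column-major, non-transposed case, with leading dimensions equal to the row counts and the element ordering described in About the Compact Format. It is a sketch under those assumptions, not oneMKL's kernel, and gemm_compact_ref is a hypothetical name. The innermost loops run across the v matrices of a pack, the dimension a SIMD implementation would map onto one register; as in the description of cp, C is not read when beta is zero.

```c
#include <assert.h>
#include <stddef.h>

/* Reference sketch of C := alpha*A*B + beta*C for real, column-major,
 * non-transposed compact matrices with pack length v:
 * A is m-by-k (ld m), B is k-by-n (ld k), C is m-by-n (ld m). */
static void gemm_compact_ref(size_t m, size_t n, size_t k, double alpha,
                             const double *a, const double *b, double beta,
                             double *c, size_t v, size_t npacks)
{
    assert(v > 0);
    for (size_t p = 0; p < npacks; ++p) {
        const double *ap = a + p * m * k * v;   /* pack p of A */
        const double *bp = b + p * k * n * v;   /* pack p of B */
        double *cp = c + p * m * n * v;         /* pack p of C */
        for (size_t j = 0; j < n; ++j)
            for (size_t i = 0; i < m; ++i) {
                double *cij = cp + (j * m + i) * v;   /* C(i,j) for v matrices */
                for (size_t lane = 0; lane < v; ++lane)
                    cij[lane] = (beta == 0.0) ? 0.0 : beta * cij[lane];
                for (size_t l = 0; l < k; ++l) {
                    const double *ail = ap + (l * m + i) * v;   /* A(i,l) */
                    const double *blj = bp + (j * k + l) * v;   /* B(l,j) */
                    for (size_t lane = 0; lane < v; ++lane)     /* across matrices */
                        cij[lane] += alpha * ail[lane] * blj[lane];
                }
            }
    }
}
```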

Input Parameters

layout Specifies whether two-dimensional array storage is row-major (MKL_ROW_MAJOR) or column-major (MKL_COL_MAJOR).

transa Specifies the operation:

If transa=MKL_NOTRANS, then op(Ac):=Ac.

If transa=MKL_TRANS, then op(Ac):=Ac^T.

If transa=MKL_CONJTRANS, then op(Ac):=Ac^H.

transb Specifies the operation:

If transb=MKL_NOTRANS, then op(Bc):=Bc.

If transb=MKL_TRANS, then op(Bc):=Bc^T.

If transb=MKL_CONJTRANS, then op(Bc):=Bc^H.

m The number of rows of the matrices op(Ac), m >= 0.

n The number of columns of matrices op(Bc) and Cc. n≥0.

k The number of columns of matrices op(Ac) and the number of rows of matrices op(Bc); k≥0.

alpha Specifies the scalar alpha.

ap Points to the beginning of the array that stores the nm Ac matrices. See
Compact Format for more details.
• layout = MKL_COL_MAJOR: ap has size ldap*k*nm if transa = MKL_NOTRANS, and size ldap*m*nm if transa = MKL_TRANS or MKL_CONJTRANS.
• layout = MKL_ROW_MAJOR: ap has size ldap*m*nm if transa = MKL_NOTRANS, and size ldap*k*nm if transa = MKL_TRANS or MKL_CONJTRANS.

ldap Specifies the leading dimension of Ac.

bp Points to the beginning of the array that stores the nm Bc matrices. See
Compact Format for more details.
• layout = MKL_COL_MAJOR: bp has size ldbp*n*nm if transb = MKL_NOTRANS, and size ldbp*k*nm if transb = MKL_TRANS or MKL_CONJTRANS.
• layout = MKL_ROW_MAJOR: bp has size ldbp*k*nm if transb = MKL_NOTRANS, and size ldbp*n*nm if transb = MKL_TRANS or MKL_CONJTRANS.

ldbp Specifies the leading dimension of Bc.

beta Specifies the scalar beta.

cp Before entry, cp points to the beginning of the array that stores the nm Cc
matrices, except when beta is equal to zero, in which case the contents of cp
need not be set on entry.

layout = MKL_COL_MAJOR cp has size ldcp*n*nm.

layout = MKL_ROW_MAJOR cp has size ldcp*m*nm.

ldcp Specifies the leading dimension of Cc.

layout = MKL_COL_MAJOR ldcp must be at least max (1,m).

layout = MKL_ROW_MAJOR ldcp must be at least max (1,n).

format Specifies the format of the compact matrices. See Compact Format or
mkl_get_format_compact for details.

nm Total number of matrices stored in compact format in the group of matrices.


NOTE
The values of ldap, ldbp, and ldcp used in mkl_?gemm_compact must be consistent with the
values used in mkl_?get_size_compact, mkl_?gepack_compact, and mkl_?geunpack_compact.

Output Parameters

cp Each matrix Cc is overwritten by the m-by-n matrix (alpha*op(Ac)*op(Bc) + beta*Cc).

mkl_?trsm_compact
Solves a triangular matrix equation for a set of
general, m x n matrices that have been stored in
Compact format.

Syntax
void mkl_strsm_compact (MKL_LAYOUT layout, MKL_SIDE side, MKL_UPLO uplo, MKL_TRANSPOSE
transa, MKL_DIAG diag, MKL_INT m, MKL_INT n, float alpha, const float *ap, MKL_INT
ldap, float *bp, MKL_INT ldbp, MKL_COMPACT_PACK format, MKL_INT nm);
void mkl_dtrsm_compact (MKL_LAYOUT layout, MKL_SIDE side, MKL_UPLO uplo, MKL_TRANSPOSE
transa, MKL_DIAG diag, MKL_INT m, MKL_INT n, double alpha, const double *ap, MKL_INT
ldap, double *bp, MKL_INT ldbp, MKL_COMPACT_PACK format, MKL_INT nm);
void mkl_ctrsm_compact (MKL_LAYOUT layout, MKL_SIDE side, MKL_UPLO uplo, MKL_TRANSPOSE
transa, MKL_DIAG diag, MKL_INT m, MKL_INT n, mkl_compact_complex_float *alpha, const
float *ap, MKL_INT ldap, float *bp, MKL_INT ldbp, MKL_COMPACT_PACK format,
MKL_INT nm);
void mkl_ztrsm_compact (MKL_LAYOUT layout, MKL_SIDE side, MKL_UPLO uplo, MKL_TRANSPOSE
transa, MKL_DIAG diag, MKL_INT m, MKL_INT n, mkl_compact_complex_double *alpha, const
double *ap, MKL_INT ldap, double *bp, MKL_INT ldbp, MKL_COMPACT_PACK format,
MKL_INT nm);

Description
The routine solves one of the following matrix equations for a group of nm matrices:

op(Ac)*Xc = alpha*Bc,
or
Xc*op(Ac) = alpha*Bc
where:
alpha is a scalar, Xc and Bc are m-by-n matrices that have been stored in compact format, and Ac is a unit or
non-unit, upper or lower triangular matrix (m-by-m if side = MKL_LEFT, n-by-n if side = MKL_RIGHT) that has
been stored in compact format.
op(Ac) is one of op(Ac) = Ac, op(Ac) = Ac^T, or op(Ac) = Ac^H.
Bc is overwritten by the solution matrix Xc.
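A reference sketch of one case (side = MKL_LEFT, uplo = MKL_LOWER, transa = MKL_NOTRANS, diag = MKL_NONUNIT) follows, assuming real, column-major compact data with leading dimension m for both Ac and Bc. The forward substitution is written once and applied to all v matrices of a pack in the innermost loops. trsm_llnn_compact_ref is a hypothetical name; the special case alpha = 0 is not handled, and, like the compact routines, the sketch does not check for a zero diagonal element.

```c
#include <assert.h>
#include <stddef.h>

/* Solve A*X = alpha*B for every matrix in compact format (real, column-major,
 * ld = m for A and B, pack length v). A is lower triangular, non-unit; its
 * strictly upper triangle is never read. B is overwritten by X. */
static void trsm_llnn_compact_ref(size_t m, size_t n, double alpha,
                                  const double *a, double *b,
                                  size_t v, size_t npacks)
{
    assert(v > 0);
    for (size_t p = 0; p < npacks; ++p) {
        const double *ap = a + p * m * m * v;
        double *bp = b + p * m * n * v;
        for (size_t j = 0; j < n; ++j)
            for (size_t i = 0; i < m; ++i) {              /* forward substitution */
                double *bij = bp + (j * m + i) * v;
                for (size_t lane = 0; lane < v; ++lane)
                    bij[lane] *= alpha;
                for (size_t l = 0; l < i; ++l) {
                    const double *ail = ap + (l * m + i) * v;   /* A(i,l) */
                    const double *xlj = bp + (j * m + l) * v;   /* X(l,j), solved */
                    for (size_t lane = 0; lane < v; ++lane)
                        bij[lane] -= ail[lane] * xlj[lane];
                }
                const double *aii = ap + (i * m + i) * v;
                for (size_t lane = 0; lane < v; ++lane)
                    bij[lane] /= aii[lane];
            }
    }
}
```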

Input Parameters

layout Specifies whether two-dimensional array storage is row-major (MKL_ROW_MAJOR) or column-major (MKL_COL_MAJOR).

side Specifies whether op(Ac) appears on the left or right of Xc in the equation:

If side = MKL_LEFT, then op(Ac)*Xc = alpha*Bc.
If side = MKL_RIGHT, then Xc*op(Ac) = alpha*Bc.

uplo Specifies whether matrix Ac is upper or lower triangular.

If uplo = MKL_UPPER, Ac is upper triangular.
If uplo = MKL_LOWER, Ac is lower triangular.

transa Specifies the operation:

If transa=MKL_NOTRANS, then op(Ac) = Ac;

If transa=MKL_TRANS, then op(Ac) = Ac^T;

If transa=MKL_CONJTRANS, then op(Ac) = Ac^H;

diag Specifies whether the matrix Ac is unit triangular:

If diag=MKL_UNIT, then the matrix is unit triangular;

if diag=MKL_NONUNIT, then the matrix is not unit triangular.

m The number of rows of Bc and the number of rows and columns of Ac when side=MKL_LEFT; m >= 0.
n The number of columns of Bc and the number of rows and columns
of Ac when side=MKL_RIGHT; n >= 0.
alpha Specifies the scalar alpha. When alpha is zero, then ap is not
referenced and bp need not be set before entry.
ap Array, size ldap*k*nm, where k is m when side = MKL_LEFT and n
when side = MKL_RIGHT. ap points to the beginning of nm Ac matrices
stored in compact format. When uplo = MKL_UPPER, Ac is assumed
to be an upper triangular matrix and the lower triangular part of Ac
is not referenced. With uplo = MKL_LOWER, Ac is assumed to be a
lower triangular matrix and the upper triangular part of Ac is not
referenced. With diag = MKL_UNIT, the diagonal elements of Ac are
not referenced either, but are assumed to be unity.
ldap Column stride (column-major layout) or row stride (row-major
layout) of Ac.
When side=MKL_LEFT, ldap must be at least max (1,m).

When side=MKL_RIGHT, ldap must be at least max (1,n).

bp Array, size ldbp*n*nm when layout = MKL_COL_MAJOR; size ldbp*m*nm when layout = MKL_ROW_MAJOR.
Before entry, bp points to the beginning of nm Bc matrices stored in compact format.
ldbp Column stride (column-major layout) or row stride (row-major
layout) of Bc.

layout = MKL_COL_MAJOR ldbp must be at least max (1,m).

layout = MKL_ROW_MAJOR ldbp must be at least max (1,n).

format Specifies the format of the compact matrices. See Compact Format or mkl_get_format_compact for details.
nm Total number of matrices stored in Compact format; nm >= 0.


NOTE
The values of ldap and ldbp used in mkl_?trsm_compact must be consistent with the values
used in mkl_?get_size_compact, mkl_?gepack_compact, and mkl_?geunpack_compact.

Output Parameters

bp On exit, Bc is overwritten by the solution matrix Xc. bp points to the beginning of nm such Xc matrices.

mkl_?potrf_compact
Computes the Cholesky factorization of a set of
symmetric (Hermitian), positive-definite matrices,
stored in Compact format (see Compact Format for
details).

Syntax
void mkl_spotrf_compact (MKL_LAYOUT layout, MKL_UPLO uplo, MKL_INT n, float * ap,
MKL_INT ldap, MKL_INT * info, MKL_COMPACT_PACK format, MKL_INT nm);
void mkl_cpotrf_compact (MKL_LAYOUT layout, MKL_UPLO uplo, MKL_INT n, float * ap,
MKL_INT ldap, MKL_INT * info, MKL_COMPACT_PACK format, MKL_INT nm);
void mkl_dpotrf_compact (MKL_LAYOUT layout, MKL_UPLO uplo, MKL_INT n, double * ap,
MKL_INT ldap, MKL_INT * info, MKL_COMPACT_PACK format, MKL_INT nm);
void mkl_zpotrf_compact (MKL_LAYOUT layout, MKL_UPLO uplo, MKL_INT n, double * ap,
MKL_INT ldap, MKL_INT * info, MKL_COMPACT_PACK format, MKL_INT nm);

Description
The routine forms the Cholesky factorization of a set of symmetric, positive definite (or, for complex data,
Hermitian, positive-definite), n x n matrices Ac, stored in Compact format, as:

• Ac = Uc^T*Uc (for real data), Ac = Uc^H*Uc (for complex data), if uplo = MKL_UPPER
• Ac = Lc*Lc^T (for real data), Ac = Lc*Lc^H (for complex data), if uplo = MKL_LOWER
where Lc is a lower triangular matrix, and Uc is upper triangular. The factorization (output) data will also be
stored in Compact format.
Before calling this routine, call mkl_?gepack_compact to store the matrices in the Compact format.

NOTE
Compact routines have some limitations; see Numerical Limitations.

Input Parameters

layout Specifies whether two-dimensional array storage is row-major (MKL_ROW_MAJOR) or column-major (MKL_COL_MAJOR).

uplo Must be MKL_UPPER or MKL_LOWER

Indicates whether the upper or lower triangular part of Ac has been stored
and will be factored.
If uplo = MKL_UPPER, the upper triangular part of Ac is stored, and the
strictly lower triangular part of Ac is not referenced.

If uplo = MKL_LOWER, the lower triangular part of Ac is stored, and the
strictly upper triangular part of Ac is not referenced.

n The order of Ac; n >= 0.

ldap Column stride (column-major layout) or row stride (row-major layout) of Ac.
ap Points to the beginning of the nm Ac matrices. On entry, ap contains
either the upper or the lower triangular part of Ac (see uplo).
format Specifies the format of the compact matrices. See Compact Format or
mkl_get_format_compact for details.

nm Total number of matrices stored in Compact format; nm >= 0.

Output Parameters

ap The upper or lower triangular part of Ac, stored in Compact format in ap, is overwritten by its
Cholesky factor Uc or Lc (as specified by uplo). ap then points to the beginning of this set of
factors, stored in Compact format.
info The parameter is not currently used in this routine. It is reserved for
future use.

mkl_?getrfnp_compact
The routine computes the LU factorization, without
pivoting, of a set of general, m x n matrices that have
been stored in Compact format (see Compact
Format).

Syntax
void mkl_sgetrfnp_compact (MKL_LAYOUT layout, MKL_INT m, MKL_INT n, float * ap, MKL_INT
ldap, MKL_INT * info, MKL_COMPACT_PACK format, MKL_INT nm);
void mkl_dgetrfnp_compact (MKL_LAYOUT layout, MKL_INT m, MKL_INT n, double * ap,
MKL_INT ldap, MKL_INT * info, MKL_COMPACT_PACK format, MKL_INT nm);
void mkl_cgetrfnp_compact (MKL_LAYOUT layout, MKL_INT m, MKL_INT n, float * ap, MKL_INT
ldap, MKL_INT * info, MKL_COMPACT_PACK format, MKL_INT nm);
void mkl_zgetrfnp_compact (MKL_LAYOUT layout, MKL_INT m, MKL_INT n, double * ap,
MKL_INT ldap, MKL_INT * info, MKL_COMPACT_PACK format, MKL_INT nm);

Description
The mkl_?getrfnp_compact routine calculates the LU factorizations of a set of nm general (m x n) matrices
Ac, stored in Compact format, as Ac = Lc*Uc. The factorization (output) data is also stored in Compact
format.
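The factorization can be sketched as a right-looking, unpivoted LU applied lane by lane across each pack. This illustration assumes square, real matrices in column-major compact layout with leading dimension n; getrfnp_compact_ref is a hypothetical name, it is not oneMKL's implementation, and, like the compact routines, it does not check for zero pivots.

```c
#include <assert.h>
#include <stddef.h>

/* In-place LU without pivoting, A = L*U, for every real n-by-n matrix in
 * compact format (column-major, ld = n, pack length v). The unit diagonal
 * of L is not stored; U occupies the diagonal and upper triangle. */
static void getrfnp_compact_ref(size_t n, double *a, size_t v, size_t npacks)
{
    assert(v > 0);
    for (size_t p = 0; p < npacks; ++p) {
        double *ap = a + p * n * n * v;
        for (size_t k = 0; k < n; ++k) {
            const double *akk = ap + (k * n + k) * v;   /* pivot A(k,k) */
            for (size_t i = k + 1; i < n; ++i) {
                double *aik = ap + (k * n + i) * v;
                for (size_t lane = 0; lane < v; ++lane)
                    aik[lane] /= akk[lane];                 /* L(i,k) */
                for (size_t j = k + 1; j < n; ++j) {
                    double *aij = ap + (j * n + i) * v;
                    const double *akj = ap + (j * n + k) * v;
                    for (size_t lane = 0; lane < v; ++lane)
                        aij[lane] -= aik[lane] * akj[lane]; /* Schur update */
                }
            }
        }
    }
}
```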


NOTE
Compact routines have some limitations; see Numerical Limitations.

Input Parameters

layout Specifies whether two-dimensional array storage is row-major (MKL_ROW_MAJOR) or column-major (MKL_COL_MAJOR).

m The number of rows of A; m ≥ 0.

n The number of columns of A; n ≥ 0.

ap Points to the beginning of the array that stores nm Ac matrices.
See Compact Format for more details.

ldap Leading dimension of Ac.

format Specifies the format of the compact matrices. See Compact Format
or mkl_get_format_compact for details.
nm Total number of matrices stored in Compact format.

Application Notes:
Before calling this routine, mkl_?gepack_compact must be called. After calling this routine,
mkl_?geunpack_compact should be called, unless another compact routine will subsequently be called for the
Compact format matrices.
The approximate number of floating-point operations for real flavors is:
nm*(2/3)*n^3, if m = n,
nm*(1/3)*n^2*(3m-n), if m > n,
nm*(1/3)*m^2*(3n-m), if m < n.
The number of operations for complex flavors is four times greater. Directly after calling this routine, you can
call the following:
• mkl_?getrinp_compact, for computing the inverse of the nm input matrices in Compact format
Output Parameters

ap On exit, Ac is overwritten by its factorization data. ap points to the beginning of nm Lc and Uc
factors of Ac. The unit diagonal elements of Lc are not stored.
info The parameter is not currently used in this routine. It is reserved for
future use.

mkl_?geqrf_compact
Computes the QR factorization of a set of general m x
n matrices, stored in Compact format (see Compact
Format for details).

Syntax
void mkl_sgeqrf_compact (MKL_LAYOUT layout, MKL_INT m, MKL_INT n, float * ap, MKL_INT
ldap, float * taup, float * work, MKL_INT lwork, MKL_INT * info, MKL_COMPACT_PACK
format, MKL_INT nm);

void mkl_cgeqrf_compact (MKL_LAYOUT layout, MKL_INT m, MKL_INT n, float * ap, MKL_INT
ldap, float * taup, float * work, MKL_INT lwork, MKL_INT * info, MKL_COMPACT_PACK
format, MKL_INT nm);
void mkl_dgeqrf_compact (MKL_LAYOUT layout, MKL_INT m, MKL_INT n, double * ap, MKL_INT
ldap, double * taup, double * work, MKL_INT lwork, MKL_INT * info, MKL_COMPACT_PACK
format, MKL_INT nm);
void mkl_zgeqrf_compact (MKL_LAYOUT layout, MKL_INT m, MKL_INT n, double * ap, MKL_INT
ldap, double * taup, double * work, MKL_INT lwork, MKL_INT * info, MKL_COMPACT_PACK
format, MKL_INT nm);

Description
The routine forms the QR factorization of a set of general, m x n matrices A, stored in Compact format. The
routine does not form the Q factors explicitly. Instead, Q is represented as a product of min(m,n) elementary
reflectors. The factorization (output) data will also be stored in Compact format.

NOTE
Compact routines have some limitations; see Numerical Limitations.

Input Parameters

layout Specifies whether two-dimensional array storage is row-major (MKL_ROW_MAJOR) or column-major (MKL_COL_MAJOR).

m The number of rows of Ac; m≥ 0.

n The number of columns of Ac; n≥ 0.

ap Points to the beginning of the nm Ac matrices. On entry, ap contains
the m-by-n matrices Ac, stored in Compact format.
ldap Column stride (column-major layout) or row stride (row-major
layout) of Ac.
work Points to the beginning of the workspace array.

lwork The size of the work array. If lwork = -1, a workspace query is
assumed; the routine only calculates the optimal size of the work
array and returns this value as the first entry of the work array.
format Specifies the format of the compact matrices. See Compact Format
or mkl_get_format_compact for details.
nm Total number of matrices stored in Compact format.

Application Notes:
The compact array that stores the elementary reflectors must be allocated before the routine is called
and unpacked afterward. First, call mkl_?get_size_compact to determine the size of taup, and allocate
memory for taup. After calling mkl_?geqrf_compact, taup stores the elementary reflectors in compact form,
so it should be unpacked using mkl_?geunpack_compact. See Compact Format for more details, or refer to
the example below. (Note: the following example is meant to demonstrate the calling sequence to allocate
memory and unpack taup. All other parameters are assumed to be already set up before the sequence below
is executed.)

MKL_R_TYPE *tau_array[nm]; // nm pointers, each to a buffer of at least min(m, n) elements
// ...
MKL_INT tau_buffer_size = mkl_?get_size_compact(min(m, n), 1, format, nm);
MKL_R_TYPE *tau_compact = (MKL_R_TYPE *)mkl_malloc(tau_buffer_size, 128);

mkl_?geqrf_compact(layout, m, n, a_compact, ldap, tau_compact, work, lwork, &info, format, nm);
// Note that here MKL_COL_MAJOR is used because tau is a 1-d array
mkl_?geunpack_compact(MKL_COL_MAJOR, min(m, n), 1, tau_array, min(m, n), tau_compact, min(m, n),
format, nm);
mkl_free(tau_compact); // release the compact buffer once tau has been unpacked

Output Parameters

ap On exit, Ac is overwritten by its factorization data. ap points to the
beginning of nm factorizations of Ac, stored in Compact format.
The factorization data is stored as follows: the elements on and
above the diagonal contain the min(m,n)-by-n upper trapezoidal
matrix Rc (Rc is upper triangular if m ≥ n); the elements below
the diagonal, with tau, represent the orthogonal matrix Qc as a
product of min(m,n) elementary reflectors (see Orthogonal
Factorizations: LAPACK Computational Routines). See Compact
Format for more details.
taup Points to the beginning of a set of the tauc arrays, each of which has size
min(m,n), stored in Compact format. tauc contains scalars that define
elementary reflectors for Qc in its decomposition in a product of elementary
reflectors. taup needs to be allocated by the user before calling this routine.
See the application notes (below the description) for more details.

work[0] On exit contains the minimum value of lwork required for optimum
performance. Use this lwork for subsequent runs.
info The parameter is not currently used in this routine. It is reserved for
future use.

mkl_?getrinp_compact
Computes the inverse of a set of LU-factorized general
matrices, without pivoting, stored in the compact
format (see Compact Format for details).

Syntax
void mkl_sgetrinp_compact (MKL_LAYOUT layout, MKL_INT n, float * ap, MKL_INT ldap,
float * work, MKL_INT lwork, MKL_INT * info, MKL_COMPACT_PACK format, MKL_INT nm);
void mkl_dgetrinp_compact (MKL_LAYOUT layout, MKL_INT n, double * ap, MKL_INT ldap,
double * work, MKL_INT lwork, MKL_INT * info, MKL_COMPACT_PACK format, MKL_INT nm);
void mkl_cgetrinp_compact (MKL_LAYOUT layout, MKL_INT n, float * ap, MKL_INT ldap,
float * work, MKL_INT lwork, MKL_INT * info, MKL_COMPACT_PACK format, MKL_INT nm);
void mkl_zgetrinp_compact (MKL_LAYOUT layout, MKL_INT n, double * ap, MKL_INT ldap,
double * work, MKL_INT lwork, MKL_INT * info, MKL_COMPACT_PACK format, MKL_INT nm);

Description
This routine computes the inverse inv(Ac) of a set of general, n x n matrices Ac that have been stored in
Compact format, starting from their LU factorizations computed by mkl_?getrfnp_compact. The output data is
also stored in Compact format.

NOTE
Compact routines have some limitations; see Numerical Limitations.

Input Parameters

layout Specifies whether two-dimensional array storage is row-major (MKL_ROW_MAJOR) or column-major (MKL_COL_MAJOR).

n The order of Ac; n >= 0.

ap Points to the beginning of the nm Ac matrices. On entry, ap contains the LU factorizations of Ac,
stored in Compact format, as returned by mkl_?getrfnp_compact: Ac = Lc*Uc.
See Compact Format for more details.

ldap Column stride (column-major layout) or row stride (row-major
layout) of Ac.
work Points to the beginning of the work array.

lwork The size of the work array. If lwork = -1, a workspace query is
assumed; the routine calculates only the optimal size of the work
array and returns this value as the first entry of the work array.
format Specifies the format of the compact matrices. See Compact Format
or mkl_get_format_compact for details.
nm Total number of matrices stored in Compact format.

Output Parameters

ap On exit, Ac is overwritten by inv(Ac). ap points to the beginning of
the nm inv(Ac) matrices stored in Compact format.
work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance. Use this lwork for subsequent runs.
info The parameter is not currently used in this routine. It is reserved for
future use.
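For reference, the sketch below shows in plain C what is computed for each individual matrix: a no-pivoting LU factorization (the role of mkl_?getrfnp_compact) followed by inversion from those factors (the role of mkl_?getrinp_compact). It is a minimal, unvectorized sketch that assumes one column-major n x n double matrix at a time and ignores the Compact layout entirely; lu_nopiv and inv_from_lu are hypothetical helper names, not oneMKL routines, and int stands in for MKL_INT.

```c
#include <stdlib.h>

/* Element (i,j) of a column-major matrix with leading dimension ld. */
#define IDX(i, j, ld) ((size_t)(j) * (size_t)(ld) + (size_t)(i))

/* Hypothetical helper: in-place LU factorization without pivoting.
   On exit, the strictly lower part holds L (unit diagonal implied) and
   the upper part, including the diagonal, holds U.
   Returns 0 on success, or k + 1 if pivot k (zero-based) is exactly zero. */
int lu_nopiv(int n, double *a, int lda)
{
    for (int k = 0; k < n; ++k) {
        double pivot = a[IDX(k, k, lda)];
        if (pivot == 0.0)
            return k + 1;
        for (int i = k + 1; i < n; ++i)
            a[IDX(i, k, lda)] /= pivot;              /* multipliers l(i,k) */
        for (int j = k + 1; j < n; ++j)              /* update trailing block */
            for (int i = k + 1; i < n; ++i)
                a[IDX(i, j, lda)] -= a[IDX(i, k, lda)] * a[IDX(k, j, lda)];
    }
    return 0;
}

/* Hypothetical helper: overwrite the in-place LU factors in a with inv(A)
   by solving L*U*x = e_j for every column j, using an n*n scratch copy.
   Returns 0 on success, -1 if the scratch allocation fails. */
int inv_from_lu(int n, double *a, int lda)
{
    double *lu = malloc((size_t)n * (size_t)n * sizeof *lu);
    if (lu == NULL)
        return -1;
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i)
            lu[IDX(i, j, n)] = a[IDX(i, j, lda)];

    for (int j = 0; j < n; ++j) {
        double *x = &a[IDX(0, j, lda)];              /* column j of the result */
        for (int i = 0; i < n; ++i)
            x[i] = (i == j) ? 1.0 : 0.0;             /* right-hand side e_j */
        for (int i = 1; i < n; ++i)                  /* forward solve L*y = e_j */
            for (int k = 0; k < i; ++k)
                x[i] -= lu[IDX(i, k, n)] * x[k];
        for (int i = n - 1; i >= 0; --i) {           /* back solve U*x = y */
            for (int k = i + 1; k < n; ++k)
                x[i] -= lu[IDX(i, k, n)] * x[k];
            x[i] /= lu[IDX(i, i, n)];
        }
    }
    free(lu);
    return 0;
}
```

Because no rows are exchanged, lu_nopiv stops at a zero pivot even for a nonsingular matrix such as the 2 x 2 exchange matrix [[0, 1], [1, 0]]; the compact routines perform no such check at all.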

Numerical Limitations for Compact BLAS and Compact LAPACK Routines

Compact routines are subject to a set of numerical limitations. To allow efficient vectorization, they also
skip most of the checks performed by the regular BLAS and LAPACK routines. Each of the following limitations
applies to at least one compact routine.
Complex division: BLAS and LAPACK compact routines rely on a naïve method for complex division that does
not protect the solution against overflow, underflow, or loss of precision.
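To make the complex-division limitation concrete, the following sketch contrasts the textbook (naive) formula with Smith's algorithm, a standard scaled alternative. It is an illustration only, assuming double precision and a hypothetical cplx struct; it does not show how the compact routines are implemented. With both operands equal to 1e300 + 1e300i, the naive formula squares the divisor components, overflows, and produces a NaN quotient, while Smith's algorithm returns 1.

```c
typedef struct { double re, im; } cplx;   /* hypothetical complex type */

static double abs_d(double v) { return v < 0.0 ? -v : v; }

/* Textbook formula: (a+bi)/(c+di) = ((ac+bd) + (bc-ad)i) / (c^2+d^2).
   The intermediate c^2+d^2 can overflow or underflow even when the
   quotient itself is representable. */
cplx cdiv_naive(cplx x, cplx y)
{
    double den = y.re * y.re + y.im * y.im;
    cplx q = { (x.re * y.re + x.im * y.im) / den,
               (x.im * y.re - x.re * y.im) / den };
    return q;
}

/* Smith's algorithm: divide through by the larger component of the
   divisor so that no squared term is ever formed. */
cplx cdiv_smith(cplx x, cplx y)
{
    cplx q;
    if (abs_d(y.re) >= abs_d(y.im)) {
        double r = y.im / y.re, den = y.re + y.im * r;
        q.re = (x.re + x.im * r) / den;
        q.im = (x.im - x.re * r) / den;
    } else {
        double r = y.re / y.im, den = y.re * r + y.im;
        q.re = (x.re * r + x.im) / den;
        q.im = (x.im * r - x.re) / den;
    }
    return q;
}
```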
Error checking: the LAPACK compact routines skip error checking for performance reasons; therefore, the
user is responsible for passing correct parameters. There are no checks for incorrect matrices (such as
singular matrices for LU or non-positive-definite matrices for Cholesky); it is always assumed that the
algorithm can be completed without error for the input matrix.
No pivoting: the generic LU factorization routine, ?getrf, calculates the factorization using partial pivoting.
However, because pivoting involves comparisons that cannot be vectorized effectively, only non-pivoting
versions of LU factorization (mkl_?getrfnp_compact) and inverse from LU (mkl_?getrinp_compact) are provided
as compact routines.
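The classic 2 x 2 example below illustrates why factoring without pivoting is safe only for suitable matrices. The hypothetical helper solve2 solves A*x = b by Gaussian elimination, optionally exchanging the rows first (partial pivoting, shown for comparison only; it is not what the compact routines do). For A = [[1e-20, 1], [1, 1]] and b = [1, 2], the exact solution is very close to [1, 1]. Without pivoting, the huge multiplier (about 1e20) swamps the second row in rounding, and x[0] comes out as 0; with pivoting, both components come out as 1. The sketch assumes IEEE double arithmetic.

```c
/* Hypothetical helper: solve the 2 x 2 system A*x = b by Gaussian
   elimination. If pivot is nonzero, rows are exchanged first when
   |A[1][0]| > |A[0][0]| (partial pivoting); otherwise no pivoting. */
void solve2(const double A[2][2], const double b[2], int pivot, double x[2])
{
    double a00 = A[0][0], a01 = A[0][1], a10 = A[1][0], a11 = A[1][1];
    double b0 = b[0], b1 = b[1];
    double abs00 = a00 < 0.0 ? -a00 : a00;
    double abs10 = a10 < 0.0 ? -a10 : a10;
    if (pivot && abs10 > abs00) {               /* exchange rows 0 and 1 */
        double t;
        t = a00; a00 = a10; a10 = t;
        t = a01; a01 = a11; a11 = t;
        t = b0;  b0 = b1;   b1 = t;
    }
    double l = a10 / a00;                        /* elimination multiplier */
    double u11 = a11 - l * a01;                  /* U(1,1) */
    double y1 = b1 - l * b0;                     /* forward substitution */
    x[1] = y1 / u11;                             /* back substitution */
    x[0] = (b0 - a01 * x[1]) / a00;
}
```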


Matrices scaled near underflow/overflow: the LAPACK compact routines do not provide safe handling for
values near underflow/overflow, so compact routines may return incorrect results for such matrices. This
limitation applies to the compact QR routine, mkl_?geqrf_compact.
It is the responsibility of the user to ensure that the input matrices can be factorized, inverted, and/or solved
given these numerical limitations.

mkl_?get_size_compact
Returns the buffer size, in bytes, needed to pack data
in Compact format.

Syntax
MKL_INT mkl_sget_size_compact (MKL_INT ld, MKL_INT sd, MKL_COMPACT_PACK format, MKL_INT nm);
MKL_INT mkl_dget_size_compact (MKL_INT ld, MKL_INT sd, MKL_COMPACT_PACK format, MKL_INT nm);
MKL_INT mkl_cget_size_compact (MKL_INT ld, MKL_INT sd, MKL_COMPACT_PACK format, MKL_INT nm);
MKL_INT mkl_zget_size_compact (MKL_INT ld, MKL_INT sd, MKL_COMPACT_PACK format, MKL_INT nm);

Description
The routine returns the buffer size, in bytes, required for mkl_?gepack_compact.

Input Parameters

ld Leading dimension of the matrices in Compact format.

sd Second dimension of the matrices in Compact format.

format Describes the compact packing format according to the
MKL_COMPACT_PACK enum type.
nm Total number of matrices to be packed in Compact format.

Application Notes:
Before calling this routine, mkl_?get_format_compact can be called to determine the optimal format.

After calling this routine and allocating the amount of memory indicated by size, the user can call
mkl_?gepack_compact to pack the nm input matrices in Compact format.

Return Values
This function returns the following value:
size The buffer size, in bytes, required by the packing function
mkl_?gepack_compact.

mkl_get_format_compact
Returns the optimal compact packing format for the
architecture, needed for all compact routines.

Syntax
MKL_COMPACT_PACK mkl_get_format_compact ();

Description
The routine returns the optimal compact packing format, which is an MKL_COMPACT_PACK type, for the
current architecture. The optimal value of format is determined by the architecture's vector-register length.
format is a required parameter for any packing, unpacking, or BLAS/LAPACK compact routine. See Compact
Format for details.

Return Values
The function returns the following value:
format One of the following three values.
MKL_COMPACT_AVX512 is the optimal format value for:
• Intel® Advanced Vector Extensions 512 (Intel® AVX-512)-enabled processors.
• Intel® AVX-512 for Intel® Many Integrated Core Architecture (Intel® MIC
Architecture)-enabled processors.
• Intel® AVX-512 for Intel® MIC Architecture processors with support for the
AVX512_4FMAPS and AVX512_4VNNIW instruction groups.
MKL_COMPACT_AVX is the optimal format value for:
• Intel® Advanced Vector Extensions (Intel® AVX)-enabled processors.
• Intel® Advanced Vector Extensions 2 (Intel® AVX2)-enabled processors.
MKL_COMPACT_SSE is the optimal format value for all other processors.

Application Notes:
After calling this routine, mkl_?get_size_compact can be called to calculate the buffer size needed for
mkl_?gepack_compact.

mkl_?gepack_compact
Packs matrices from standard (row or column-major)
format to Compact format.

Syntax
mkl_sgepack_compact(MKL_LAYOUT layout, MKL_INT rows, MKL_INT columns, const float *
const *a, MKL_INT lda, float *ap, MKL_INT ldap, MKL_COMPACT_PACK format, MKL_INT nm);
mkl_dgepack_compact(MKL_LAYOUT layout, MKL_INT rows, MKL_INT columns, const double *
const *a, MKL_INT lda, double *ap, MKL_INT ldap, MKL_COMPACT_PACK format, MKL_INT nm);
mkl_cgepack_compact (MKL_LAYOUT layout, MKL_INT rows, MKL_INT columns, const
mkl_compact_complex_float * const *a, MKL_INT lda, float *ap, MKL_INT ldap,
MKL_COMPACT_PACK format, const MKL_INT nm);
mkl_zgepack_compact (MKL_LAYOUT layout, MKL_INT rows, MKL_INT columns, const
mkl_compact_complex_double * const *a, MKL_INT lda, double *ap, MKL_INT ldap,
MKL_COMPACT_PACK format, MKL_INT nm);

Description
The routine packs nm matrices A from standard format (row or column-major, pointer to pointer) in a into
Compact format, storing the new compact format matrices Ac in array ap.
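The a argument is a pointer-to-pointer array: a[k] addresses the k-th of the nm standard-format matrices. If the matrices already sit back to back in one buffer, the pointer array can be built as in this minimal sketch, which assumes double precision, a hypothetical helper name (build_matrix_pointers), and that each matrix occupies lda*columns elements (column-major) or lda*rows elements (row-major).

```c
#include <stddef.h>

/* Hypothetical helper: point ptrs[k] at the k-th of nm matrices stored
   back to back in base, each occupying ld * sd elements, where ld is the
   leading dimension (lda) and sd is columns (column-major) or rows
   (row-major). ptrs then has the pointer-to-pointer shape of the a
   argument of mkl_?gepack_compact and mkl_?geunpack_compact. */
void build_matrix_pointers(double *base, size_t ld, size_t sd,
                           size_t nm, double **ptrs)
{
    for (size_t k = 0; k < nm; ++k)
        ptrs[k] = base + k * ld * sd;   /* start of matrix k */
}
```

C does not implicitly convert double ** to const double * const *, so passing ptrs to mkl_dgepack_compact usually needs an explicit cast, for example (const double * const *)ptrs.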

Input Parameters

layout Specifies whether two-dimensional array storage is row-major
(MKL_ROW_MAJOR) or column-major (MKL_COL_MAJOR).

rows The number of rows of A; rows >= 0.

columns The number of columns of A; columns >= 0.

a A standard format (row or column-major, pointer-to-pointer) array,
storing nm input A matrices.
lda Leading dimension of A.

layout = MKL_COL_MAJOR lda must be at least max (1,rows).

layout = MKL_ROW_MAJOR lda must be at least max (1,columns).

ldap Leading dimension of Ac.

layout = MKL_COL_MAJOR ldap must be at least max (1,rows).

layout = MKL_ROW_MAJOR ldap must be at least max (1,columns).

NOTE
The values of ldap used in mkl_?gepack_compact must be
consistent with the values used in mkl_?get_size_compact and
mkl_?geunpack_compact.

format Specifies the format of the compact matrices. See Compact Format
or mkl_get_format_compact for details.
nm Total number of matrices that will be stored in Compact format.

Application Notes:
Directly after calling this routine, any BLAS or LAPACK compact routine can be called. Unpacking matrices
from Compact format can be done by calling mkl_?geunpack_compact.

Output Parameters

ap Array storing the compact format input matrices Ac. ap must be at
least size bytes long, where size is the value returned by mkl_?get_size_compact.


mkl_?geunpack_compact
Unpacks matrices from Compact format to standard
(row- or column-major, pointer-to-pointer) format.

Syntax
mkl_sgeunpack_compact (MKL_LAYOUT layout, MKL_INT rows, MKL_INT columns, float * const
*a, MKL_INT lda, const float *ap, MKL_INT ldap, MKL_COMPACT_PACK format, MKL_INT nm);
mkl_dgeunpack_compact (MKL_LAYOUT layout, MKL_INT rows, MKL_INT columns, double * const
*a, MKL_INT lda, const double *ap, MKL_INT ldap, MKL_COMPACT_PACK format, MKL_INT nm);
mkl_cgeunpack_compact (MKL_LAYOUT layout, MKL_INT rows, MKL_INT columns,
mkl_compact_complex_float * const *a, MKL_INT lda, const float *ap, MKL_INT ldap,
MKL_COMPACT_PACK format, MKL_INT nm);
mkl_zgeunpack_compact (MKL_LAYOUT layout, MKL_INT rows, MKL_INT columns,
mkl_compact_complex_double * const *a, MKL_INT lda, const double *ap, MKL_INT ldap,
MKL_COMPACT_PACK format, MKL_INT nm);

Description
The routine unpacks nm Compact format matrices Ac from array ap into standard (row- or column-major,
pointer-to-pointer) format in array a.

Input Parameters

layout Specifies whether two-dimensional array storage is row-major
(MKL_ROW_MAJOR) or column-major (MKL_COL_MAJOR).

rows The number of rows of A; rows >= 0.

columns The number of columns of A; columns >= 0.

lda Leading dimension of A.

layout = MKL_COL_MAJOR lda must be at least max (1,rows).

layout = MKL_ROW_MAJOR lda must be at least max (1,columns).

ap Array storing the compact format of input matrices Ac. See Compact
Format or mkl_get_format_compact for details.

layout = MKL_COL_MAJOR ap has size ldap*columns*nm.

layout = MKL_ROW_MAJOR ap has size ldap*rows*nm.

ldap Leading dimension of Ac.

layout = MKL_COL_MAJOR ldap must be at least max (1,rows).

layout = MKL_ROW_MAJOR ldap must be at least max (1,columns).


NOTE
The values of ldap used in mkl_?geunpack_compact must be
consistent with the values used in mkl_?get_size_compact and
mkl_?gepack_compact.

format Specifies the format of the compact matrices. See Compact Format
or mkl_get_format_compact for details.
nm Total number of matrices stored in Compact format.

Output Parameters

a A standard format (row- or column-major, pointer-to-pointer) array,
storing nm output A matrices.

layout = MKL_COL_MAJOR a has size lda*columns*nm.

layout = MKL_ROW_MAJOR a has size lda*rows*nm.

Inspector-executor Sparse BLAS Routines

The inspector-executor API for Sparse BLAS divides operations into two stages: analysis and execution.
During the initial analysis stage, the API inspects the matrix sparsity pattern and applies matrix structure
changes. In the execution stage, subsequent routine calls reuse this information in order to improve
performance.
The inspector-executor API supports key Sparse BLAS operations for iterative sparse solvers:
• Sparse matrix-vector multiplication
• Sparse matrix-matrix multiplication with a sparse or dense result
• Solution of triangular systems
• Sparse matrix addition

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.
Notice revision #20201201

Naming Conventions in Inspector-Executor Sparse BLAS Routines

The Inspector-Executor Sparse BLAS API routine names use the following convention:
mkl_sparse_[<character>_]<operation>[_<format>]
The <character> field indicates the data type:

s real, single precision

c complex, single precision

d real, double precision

z complex, double precision

The data type is included in the name only if the function accepts dense matrix or scalar floating point
parameters.
The <operation> field indicates the type of operation:

create create matrix handle

copy create a copy of matrix handle

convert convert matrix between sparse formats

export export matrix from internal representation to CSR or BSR format

destroy free memory allocated for a matrix handle

set_<op>_hint provide information about the number of upcoming compute operations and
the operation type for optimization purposes, where <op> is mv, sv, mm, sm, dotmv,
symgs, or memory

optimize analyze the matrix using hints and store optimization information in matrix
handle

mv compute sparse matrix-vector product

mm compute sparse matrix by dense matrix product (batch mv)

set_value change a value in a matrix

spmm/spmmd compute sparse matrix by sparse matrix product and store the result as a
sparse/dense matrix

trsv solve a triangular system

trsm solve a triangular system with multiple right-hand sides

add compute sum of two sparse matrices

symgs compute a symmetric Gauss-Seidel preconditioner

symgs_mv compute a symmetric Gauss-Seidel preconditioner with a final matrix-vector
multiplication

sorv compute forward sweeps, backward sweeps, or a symmetric successive over-relaxation
preconditioner

sypr compute the symmetric or Hermitian product of sparse matrices and store the
result as a sparse matrix

syprd compute the symmetric or Hermitian product of sparse and dense matrices and
store the result as a dense matrix

syrk compute the product of sparse matrix with its transposed matrix and store the
result as a sparse matrix

syrkd compute the product of sparse matrix with its transposed matrix and store the
result as a dense matrix

order perform ordering of column indexes of the matrix in CSR format

dotmv compute a sparse matrix-vector product with dot product

The <format> field indicates the sparse matrix storage format:

coo coordinate format

bsr block sparse row format plus variations. Fill out either rows_start and rows_end
(for the 4-array representation) or the rowIndex array (for 3-array BSR/CSR).

csr compressed sparse row format plus variations. Fill out either rows_start and
rows_end (for the 4-array representation) or the rowIndex array (for 3-array BSR/
CSR).

csc compressed sparse column format plus variations. Fill out either cols_start
and cols_end (for the 4-array representation) or the colIndex array (for 3-array
CSC).

The format is included in the function name only if the function parameters include an explicit sparse matrix
in one of the conventional sparse matrix formats.
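In the 3-array form, a single rowIndex array of rows+1 entries describes the rows. The 4-array rows_start and rows_end arrays can simply alias it, as this plain-C sketch shows: rows_start is rowIndex itself and rows_end is rowIndex shifted by one entry. The helper names csr3_to_csr4 and csr_nnz are hypothetical, int stands in for MKL_INT, and csr_nnz assumes the rows are stored contiguously and in order, as they are in the 3-array form.

```c
/* Hypothetical helper: derive the 4-array CSR pointers from a 3-array
   rowIndex (rows+1 entries). No data is copied; both outputs alias
   row_index. */
void csr3_to_csr4(int *row_index, int **rows_start, int **rows_end)
{
    *rows_start = row_index;       /* row i starts at row_index[i] */
    *rows_end   = row_index + 1;   /* and ends before row_index[i+1] */
}

/* Hypothetical helper: number of stored non-zeros. Valid for zero- and
   one-based indexing alike, because both ends carry the same offset. */
int csr_nnz(int rows, const int *rows_start, const int *rows_end)
{
    return rows_end[rows - 1] - rows_start[0];
}
```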

Sparse Matrix Storage Formats for Inspector-executor Sparse BLAS Routines

Inspector-executor Sparse BLAS routines support four conventional sparse matrix storage formats:
• compressed sparse row format (CSR) plus variations
• compressed sparse column format (CSC) plus variations
• coordinate format (COO)
• block sparse row format (BSR) plus variations
Computational routines operate on a matrix handle that stores a matrix in CSR or BSR formats. Other
formats should be converted to CSR or BSR format before calling any computational routines. For more
information see Sparse Matrix Storage Formats.

Supported Inspector-executor Sparse BLAS Operations

The Inspector-executor Sparse BLAS API can perform several operations involving sparse matrices. These
notations are used in the description of the operations:
• A, G, V are sparse matrices
• B and C are dense matrices
• x and y are dense vectors
• alpha and beta are scalars
op(A) represents a possible transposition of matrix A:
op(A) = A
op(A) = A^T, the transpose of A
op(A) = A^H, the conjugate transpose of A
op(A)^-1 denotes the inverse of op(A).
The Inspector-executor Sparse BLAS routines support the following operations:
• computing the vector product between a sparse matrix and a dense vector:
y := alpha*op(A)*x + beta*y
• solving a single triangular system:
y := alpha*inv(op(A))*x
• computing a product between a sparse matrix and a dense matrix:
C := alpha*op(A)*B + beta*C
• computing a product between sparse matrices with a sparse result:
V := alpha*op(A)*op(G)
• computing a product between sparse matrices with a dense result:
C := alpha*op(A)*op(G)
• computing a sum of sparse matrices with a sparse result:
V := alpha*op(A) + G
• solving a sparse triangular system with multiple right-hand sides:
C := alpha*inv(op(A))*B
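As a point of reference for the first operation in the list above, the sketch below computes y := alpha*op(A)*x + beta*y for the non-transposed case, op(A) = A, directly from the 4-array CSR arrays described under mkl_sparse_?_create_csr. It assumes zero-based indexing, double values, and int in place of MKL_INT; csr_mv is a hypothetical name, and the sketch says nothing about how the inspector-executor routines are optimized.

```c
/* Hypothetical helper: y := alpha*A*x + beta*y for a CSR matrix with
   `rows` rows, stored in the 4-array variant with zero-based indexing.
   Row i occupies positions rows_start[i] .. rows_end[i]-1 of
   col_indx and values. */
void csr_mv(int rows, const int *rows_start, const int *rows_end,
            const int *col_indx, const double *values,
            double alpha, const double *x, double beta, double *y)
{
    for (int i = 0; i < rows; ++i) {
        double sum = 0.0;                        /* (A*x)[i] */
        for (int k = rows_start[i]; k < rows_end[i]; ++k)
            sum += values[k] * x[col_indx[k]];
        y[i] = alpha * sum + beta * y[i];
    }
}
```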


Two-stage Algorithm in Inspector-Executor Sparse BLAS Routines

You can use a two-stage algorithm in the Inspector-executor Sparse BLAS routines that produce a sparse
matrix. The applicable routines are:
• mkl_sparse_sp2m (BSR/CSR/CSC formats)
• mkl_sparse_sypr (CSR format)
The two-stage algorithm allows you to split computations into stages. The main purpose of the splitting is to
provide an estimate for the memory required for the output prior to allocating the largest part of the memory
(for the indices and values of the non-zero elements). Additionally, the two-stage approach extends the
functionality and allows more complex usage models.

NOTE The multistage approach currently does not allow you to allocate memory for the output matrix
outside oneMKL.

In the two-stage algorithm:

1. The first stage allocates the data necessary for the memory estimation (the arrays rows_start/rows_end
or cols_start/cols_end, depending on the format; see Sparse Matrix Storage Formats) and computes the
number of entries or the full structure of the matrix.

NOTE The format of the output is decided internally but can be checked using the export functionality
mkl_sparse_?_export_<format>.

2. The second stage allocates data and computes column or row indices (depending on the format) of
non-zero elements and/or values of the output matrix.
Specifying the stage for execution is supported through the sparse_request_t parameter in the API with
the following options:

Values for sparse_request_t parameter

Value Description

SPARSE_STAGE_NNZ_COUNT Allocates and computes only the rows_start/rows_end (CSR/BSR format) or
cols_start/cols_end (CSC format) arrays for the output matrix. After this stage, by calling
mkl_sparse_?_export_<format>, you can obtain the number of non-zeros in the output matrix and
calculate the amount of memory required for the output matrix.

SPARSE_STAGE_FINALIZE_MULT_NO_VAL Allocates and computes row/column indices, provided that
rows_start/rows_end or cols_start/cols_end have already been computed in a prior call with the
request SPARSE_STAGE_NNZ_COUNT. The values of the output matrix are not computed.

SPARSE_STAGE_FINALIZE_MULT Depending on the state of the output matrix C on entry to the routine,
this stage does one of the following:
• allocates and computes row/column indices and values of non-zero elements, if only
rows_start/rows_end or cols_start/cols_end are present
• allocates and computes values of non-zero elements, if rows_start/rows_end or
cols_start/cols_end and row/column indices of non-zero elements are present

SPARSE_STAGE_FULL_MULT_NO_VAL Allocates and computes the output matrix structure in a single step.
The values of the output matrix are not computed.

SPARSE_STAGE_FULL_MULT Allocates and computes the entire output matrix (structure and values) in a
single step.


The example below shows how you can use the two-stage approach for estimating the memory requirements
for the output matrix in CSR format:
First stage (sparse_request_t = SPARSE_STAGE_NNZ_COUNT)
1. The routine mkl_sparse_sp2m is called with the request parameter SPARSE_STAGE_NNZ_COUNT.
2. The arrays rows_start and rows_end are exported using the mkl_sparse_?_export_csr routine.
3. These arrays are used to calculate the number of non-zeros (nnz) of the resulting output matrix.
Note that by the end of the first stage, the arrays associated with column indices and values of the output
matrix have not been allocated or computed yet.

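sparse_matrix_t csrC = NULL;
sparse_status_t status;
sparse_index_base_t indexing;
MKL_INT nrows, ncols, *rows_start, *rows_end, *col_indx;
double *values; /* double precision assumed; use the s, c, or z variant for other data types */

status = mkl_sparse_sp2m (opA, descrA, csrA, opB, descrB, csrB, SPARSE_STAGE_NNZ_COUNT, &csrC);

/* optional calculation of nnz in the output matrix for getting a memory estimate */
status = mkl_sparse_d_export_csr (csrC, &indexing, &nrows, &ncols, &rows_start, &rows_end,
&col_indx, &values);

MKL_INT nnz = rows_end[nrows-1] - rows_start[0];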

Second stage (sparse_request_t = SPARSE_STAGE_FINALIZE_MULT)
This stage allocates and computes the remaining output arrays (associated with column indices and values of
output matrix entries) and completes the matrix-matrix multiplication.
status = mkl_sparse_sp2m (opA, descrA, csrA, opB, descrB, csrB, SPARSE_STAGE_FINALIZE_MULT, &csrC);
When the two-stage approach is not needed, you can perform both stages in a single call:
Single stage operation (sparse_request_t = SPARSE_STAGE_FULL_MULT)

status = mkl_sparse_sp2m (opA, descrA, csrA, opB, descrB, csrB, SPARSE_STAGE_FULL_MULT, &csrC);
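The following plain-C sketch mirrors the two-stage idea on zero-based 3-array CSR matrices: stage 1 computes only the row pointer of C = A*B (and therefore nnz), so the memory requirement is known before the column-index and value arrays are allocated in stage 2. It is a textbook row-by-row (Gustavson-style) product written for illustration, assuming int indices and double values; csr_t, spgemm_count, and spgemm_finalize are hypothetical names, and nothing here reflects how mkl_sparse_sp2m is implemented. Column indices within each row of C come out in first-touch order, not sorted.

```c
#include <stdlib.h>

typedef struct {            /* hypothetical zero-based 3-array CSR */
    int rows, cols;
    int *row_ptr;           /* rows + 1 entries */
    int *col;               /* nnz column indices */
    double *val;            /* nnz values */
} csr_t;

/* Stage 1 (cf. SPARSE_STAGE_NNZ_COUNT): count non-zeros per row of A*B.
   marker needs B->cols ints. Returns the row pointer or NULL. */
int *spgemm_count(const csr_t *A, const csr_t *B, int *marker)
{
    int *row_ptr = malloc((size_t)(A->rows + 1) * sizeof *row_ptr);
    if (row_ptr == NULL) return NULL;
    for (int j = 0; j < B->cols; ++j) marker[j] = -1;
    row_ptr[0] = 0;
    for (int i = 0; i < A->rows; ++i) {
        int count = 0;
        for (int ka = A->row_ptr[i]; ka < A->row_ptr[i + 1]; ++ka) {
            int k = A->col[ka];
            for (int kb = B->row_ptr[k]; kb < B->row_ptr[k + 1]; ++kb) {
                int j = B->col[kb];
                if (marker[j] != i) { marker[j] = i; ++count; }  /* new column */
            }
        }
        row_ptr[i + 1] = row_ptr[i] + count;
    }
    return row_ptr;
}

/* Stage 2 (cf. SPARSE_STAGE_FINALIZE_MULT): allocate exactly
   row_ptr[rows] entries and fill them. pos needs B->cols ints.
   Returns 0 on success, -1 on allocation failure. */
int spgemm_finalize(const csr_t *A, const csr_t *B, int *row_ptr,
                    int *pos, csr_t *C)
{
    int nnz = row_ptr[A->rows];
    C->rows = A->rows; C->cols = B->cols; C->row_ptr = row_ptr;
    C->col = malloc((size_t)nnz * sizeof *C->col);
    C->val = malloc((size_t)nnz * sizeof *C->val);
    if (nnz > 0 && (C->col == NULL || C->val == NULL)) {
        free(C->col); free(C->val); C->col = NULL; C->val = NULL;
        return -1;
    }
    for (int j = 0; j < B->cols; ++j) pos[j] = -1;
    for (int i = 0; i < A->rows; ++i) {
        int next = row_ptr[i];                   /* next free slot in row i */
        for (int ka = A->row_ptr[i]; ka < A->row_ptr[i + 1]; ++ka) {
            int k = A->col[ka];
            double a = A->val[ka];
            for (int kb = B->row_ptr[k]; kb < B->row_ptr[k + 1]; ++kb) {
                int j = B->col[kb];
                if (pos[j] < row_ptr[i]) {       /* first hit of column j in row i */
                    pos[j] = next;
                    C->col[next] = j;
                    C->val[next] = a * B->val[kb];
                    ++next;
                } else {
                    C->val[pos[j]] += a * B->val[kb];
                }
            }
        }
    }
    return 0;
}
```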

Matrix Manipulation Routines

The Matrix Manipulation Routines table lists the matrix manipulation routines and the data types associated
with them.

Matrix Manipulation Routines and Their Data Types

Routine or Function Group    Data Types   Description
mkl_sparse_?_create_csr      s, d, c, z   Creates a handle for a CSR-format matrix.
mkl_sparse_?_create_csc      s, d, c, z   Creates a handle for a CSC-format matrix.
mkl_sparse_?_create_coo      s, d, c, z   Creates a handle for a matrix in COO format.
mkl_sparse_?_create_bsr      s, d, c, z   Creates a handle for a matrix in BSR format.
mkl_sparse_copy              NA           Creates a copy of a matrix handle.
mkl_sparse_destroy           NA           Frees memory allocated for matrix handle.
mkl_sparse_convert_csr       NA           Converts internal matrix representation to CSR format.
mkl_sparse_convert_bsr       NA           Converts internal matrix representation to BSR format or
                                          changes BSR block size.
mkl_sparse_?_export_csr      s, d, c, z   Exports CSR matrix from internal representation.
mkl_sparse_?_export_csc      s, d, c, z   Exports CSC matrix from internal representation.
mkl_sparse_?_export_bsr      s, d, c, z   Exports BSR matrix from internal representation.
mkl_sparse_?_set_value       s, d, c, z   Changes a single value of matrix in internal representation.
mkl_sparse_?_update_values   s, d, c, z   Changes all or selected matrix values in internal
                                          representation.
mkl_sparse_order             NA           Performs ordering of column indexes of the matrix in CSR
                                          format.

mkl_sparse_?_create_csr
Creates a handle for a CSR-format matrix.

Syntax
sparse_status_t mkl_sparse_s_create_csr (sparse_matrix_t *A, const sparse_index_base_t
indexing, const MKL_INT rows, const MKL_INT cols, MKL_INT *rows_start, MKL_INT
*rows_end, MKL_INT *col_indx, float *values);
sparse_status_t mkl_sparse_d_create_csr (sparse_matrix_t *A, const sparse_index_base_t
indexing, const MKL_INT rows, const MKL_INT cols, MKL_INT *rows_start, MKL_INT
*rows_end, MKL_INT *col_indx, double *values);
sparse_status_t mkl_sparse_c_create_csr (sparse_matrix_t *A, const sparse_index_base_t
indexing, const MKL_INT rows, const MKL_INT cols, MKL_INT *rows_start, MKL_INT
*rows_end, MKL_INT *col_indx, MKL_Complex8 *values);
sparse_status_t mkl_sparse_z_create_csr (sparse_matrix_t *A, const sparse_index_base_t
indexing, const MKL_INT rows, const MKL_INT cols, MKL_INT *rows_start, MKL_INT
*rows_end, MKL_INT *col_indx, MKL_Complex16 *values);

Include Files
• mkl_spblas.h

Description
The mkl_sparse_?_create_csr routine creates a handle for a rows-by-cols matrix A in CSR format.

NOTE
The input arrays are left unchanged, except when mkl_sparse_order is called: it performs ordering of
the column indexes of the matrix and therefore modifies the input arrays. To avoid any changes to the
input data, use mkl_sparse_copy.


Input Parameters

indexing Indicates how input arrays are indexed.

SPARSE_INDEX_BASE_ZERO Zero-based (C-style) indexing: indices start at 0.

SPARSE_INDEX_BASE_ONE One-based (Fortran-style) indexing: indices start at 1.

rows Number of rows of matrix A.

cols Number of columns of matrix A.

rows_start Array of length at least rows. This array contains row indices, such that
rows_start[i] - indexing is the first index of row i in the arrays values
and col_indx. The value of indexing is 0 for zero-based indexing and 1
for one-based indexing.
Refer to pointerB array description in CSR Format for more details.

rows_end Array of at least length rows. This array contains row indices, such that
rows_end[i] - indexing - 1 is the last index of row i in the arrays
values and col_indx. The value of indexing is 0 for zero-based indexing
and 1 for one-based indexing.
Refer to pointerE array description in CSR Format for more details.

col_indx For one-based indexing, array containing the column indices plus one for
each non-zero element of the matrix A. For zero-based indexing, array
containing the column indices for each non-zero element of the matrix A.
Its length is at least rows_end[rows - 1] - indexing.
The value of indexing is 0 for zero-based indexing and 1 for one-based indexing.

values Array containing non-zero elements of the matrix A. Its length is equal to
the length of the col_indx array.

Refer to values array description in CSR Format for more details.
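To make the indexing rules above concrete, this sketch looks up element (i, j) using exactly the rows_start, rows_end, col_indx, and values arrays passed to mkl_sparse_?_create_csr. It is a minimal illustration, not a oneMKL routine: csr_get is a hypothetical name, int stands in for MKL_INT, i and j are always zero-based, and base is the value of indexing (0 or 1). Note that with one-based indexing the stored column indices carry the offset as well.

```c
/* Hypothetical helper: return element (i, j) of a CSR matrix, or 0.0 if
   it is not stored. i and j are zero-based; base is 0 for
   SPARSE_INDEX_BASE_ZERO and 1 for SPARSE_INDEX_BASE_ONE. */
double csr_get(int base, const int *rows_start, const int *rows_end,
               const int *col_indx, const double *values, int i, int j)
{
    /* rows_start[i] - base is the first position of row i;
       rows_end[i] - base - 1 is the last one. */
    for (int k = rows_start[i] - base; k < rows_end[i] - base; ++k)
        if (col_indx[k] - base == j)   /* stored index includes the base */
            return values[k];
    return 0.0;
}
```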

Output Parameters

A Handle containing internal data for subsequent Inspector-executor Sparse
BLAS operations.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED Execution failed.

SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.

mkl_sparse_?_create_csc
Creates a handle for a CSC format matrix.

Syntax
sparse_status_t mkl_sparse_s_create_csc (sparse_matrix_t *A, const sparse_index_base_t
indexing, const MKL_INT rows, const MKL_INT cols, MKL_INT *cols_start, MKL_INT
*cols_end, MKL_INT *row_indx, float *values);
sparse_status_t mkl_sparse_d_create_csc (sparse_matrix_t *A, const sparse_index_base_t
indexing, const MKL_INT rows, const MKL_INT cols, MKL_INT *cols_start, MKL_INT
*cols_end, MKL_INT *row_indx, double *values);
sparse_status_t mkl_sparse_c_create_csc (sparse_matrix_t *A, const sparse_index_base_t
indexing, const MKL_INT rows, const MKL_INT cols, MKL_INT *cols_start, MKL_INT
*cols_end, MKL_INT *row_indx, MKL_Complex8 *values);
sparse_status_t mkl_sparse_z_create_csc (sparse_matrix_t *A, const sparse_index_base_t
indexing, const MKL_INT rows, const MKL_INT cols, MKL_INT *cols_start, MKL_INT
*cols_end, MKL_INT *row_indx, MKL_Complex16 *values);

Include Files
• mkl_spblas.h

Description
The mkl_sparse_?_create_csc routine creates a handle for a rows-by-cols matrix A in CSC format.

Input Parameters

indexing Indicates how input arrays are indexed.

SPARSE_INDEX_BASE_ZERO Zero-based (C-style) indexing: indices start at 0.

SPARSE_INDEX_BASE_ONE One-based (Fortran-style) indexing: indices start at 1.

rows Number of rows of the matrix A.

cols Number of columns of the matrix A.

cols_start Array of length at least cols. This array contains column indices, such that
cols_start[i] - ind is the first index of column i in the arrays values and
row_indx. ind takes 0 for zero-based indexing and 1 for one-based
indexing.
Refer to pointerB array description in CSC Format for more details.

cols_end Array of length at least cols. This array contains column indices, such that
cols_end[i] - ind - 1 is the last index of column i in the arrays values and
row_indx. ind takes 0 for zero-based indexing and 1 for one-based
indexing.
Refer to pointerE array description in CSC Format for more details.

row_indx For one-based indexing, array containing the row indices plus one for each
non-zero element of the matrix A. For zero-based indexing, array containing
the row indices for each non-zero element of the matrix A. Its length is at
least cols_end[cols - 1] - ind. ind takes 0 for zero-based indexing and
1 for one-based indexing.

values Array containing non-zero elements of the matrix A. Its length is equal to
the length of the row_indx array.

Refer to values array description in CSC Format for more details.
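Because CSC groups entries by column, a matrix-vector product walks the columns and scatters contributions into y, which is the transpose of the CSR access pattern. The sketch below computes y := A*x from the cols_start, cols_end, row_indx, and values arrays described above, honoring ind for both the column pointers and the stored row indices. It is an illustration only, assuming double values and int in place of MKL_INT; csc_mv is a hypothetical name.

```c
/* Hypothetical helper: y := A*x for a rows-by-cols CSC matrix in the
   4-array variant. ind is 0 for zero-based and 1 for one-based indexing. */
void csc_mv(int rows, int cols, int ind, const int *cols_start,
            const int *cols_end, const int *row_indx, const double *values,
            const double *x, double *y)
{
    for (int i = 0; i < rows; ++i)
        y[i] = 0.0;
    for (int j = 0; j < cols; ++j)                       /* column j */
        for (int k = cols_start[j] - ind; k < cols_end[j] - ind; ++k)
            y[row_indx[k] - ind] += values[k] * x[j];   /* scatter into y */
}
```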

Output Parameters

A Handle containing internal data.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED Execution failed.

SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.

mkl_sparse_?_create_coo
Creates a handle for a matrix in COO format.

Syntax
sparse_status_t mkl_sparse_s_create_coo (sparse_matrix_t *A, const sparse_index_base_t
indexing, const MKL_INT rows, const MKL_INT cols, const MKL_INT nnz, MKL_INT *row_indx,
MKL_INT * col_indx, float *values);
sparse_status_t mkl_sparse_d_create_coo (sparse_matrix_t *A, const sparse_index_base_t
indexing, const MKL_INT rows, const MKL_INT cols, const MKL_INT nnz, MKL_INT *row_indx,
MKL_INT * col_indx, double *values);
sparse_status_t mkl_sparse_c_create_coo (sparse_matrix_t *A, const sparse_index_base_t
indexing, const MKL_INT rows, const MKL_INT cols, const MKL_INT nnz, MKL_INT *row_indx,
MKL_INT * col_indx, MKL_Complex8 *values);

sparse_status_t mkl_sparse_z_create_coo (sparse_matrix_t *A, const sparse_index_base_t
indexing, const MKL_INT rows, const MKL_INT cols, const MKL_INT nnz, MKL_INT *row_indx,
MKL_INT * col_indx, MKL_Complex16 *values);

Include Files
• mkl_spblas.h

Description
The mkl_sparse_?_create_coo routine creates a handle for a rows-by-cols matrix A in COO format.

Input Parameters

indexing Indicates how input arrays are indexed.

SPARSE_INDEX_BASE_ZERO Zero-based (C-style) indexing: indices start at 0.

SPARSE_INDEX_BASE_ONE One-based (Fortran-style) indexing: indices start at 1.

rows Number of rows of matrix A.

cols Number of columns of matrix A.

nnz Specifies the number of non-zero elements of the matrix A.

Refer to nnz description in Coordinate Format for more details.

row_indx Array of length nnz, containing the row indices for each non-zero element
of matrix A.
Refer to rows array description in Coordinate Format for more details.

col_indx Array of length nnz, containing the column indices for each non-zero
element of matrix A.
Refer to columns array description in Coordinate Format for more details.

values Array of length nnz, containing the non-zero elements of matrix A in
arbitrary order.
Refer to values array description in Coordinate Format for more details.
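Since computational routines work on CSR or BSR handles, COO input is normally converted (for example with mkl_sparse_convert_csr). To show what such a conversion involves, here is a minimal counting-sort sketch in plain C. It assumes zero-based indices, int in place of MKL_INT, and no duplicate entries; coo_to_csr is a hypothetical name and this is not the library's algorithm. Entries keep their COO order within each row, so column indices may come out unsorted, which is the situation mkl_sparse_order addresses.

```c
/* Hypothetical helper: convert a zero-based COO matrix (entries in any
   order) to zero-based 3-array CSR. row_ptr needs rows+1 entries; col and
   val need nnz entries. Duplicates are not merged. */
void coo_to_csr(int rows, int nnz, const int *row_indx, const int *col_indx,
                const double *values, int *row_ptr, int *col, double *val)
{
    for (int i = 0; i <= rows; ++i)
        row_ptr[i] = 0;
    for (int k = 0; k < nnz; ++k)
        row_ptr[row_indx[k] + 1]++;              /* count entries per row */
    for (int i = 0; i < rows; ++i)
        row_ptr[i + 1] += row_ptr[i];            /* prefix sum: row starts */
    for (int k = 0; k < nnz; ++k) {
        int dst = row_ptr[row_indx[k]]++;        /* next free slot in row */
        col[dst] = col_indx[k];
        val[dst] = values[k];
    }
    for (int i = rows; i > 0; --i)               /* undo the advancing */
        row_ptr[i] = row_ptr[i - 1];
    row_ptr[0] = 0;
}
```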

Output Parameters

A Handle containing internal data.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS The operation was successful.


SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED Execution failed.

SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.

mkl_sparse_?_create_bsr
Creates a handle for a matrix in BSR format.

Syntax
sparse_status_t mkl_sparse_s_create_bsr (sparse_matrix_t *A, const sparse_index_base_t
indexing, const sparse_layout_t block_layout, const MKL_INT rows, const MKL_INT cols,
const MKL_INT block_size, MKL_INT *rows_start, MKL_INT *rows_end, MKL_INT *col_indx,
float *values);
sparse_status_t mkl_sparse_d_create_bsr (sparse_matrix_t *A, const sparse_index_base_t
indexing, const sparse_layout_t block_layout, const MKL_INT rows, const MKL_INT cols,
const MKL_INT block_size, MKL_INT *rows_start, MKL_INT *rows_end, MKL_INT *col_indx,
double *values);
sparse_status_t mkl_sparse_c_create_bsr (sparse_matrix_t *A, const sparse_index_base_t
indexing, const sparse_layout_t block_layout, const MKL_INT rows, const MKL_INT cols,
const MKL_INT block_size, MKL_INT *rows_start, MKL_INT *rows_end, MKL_INT *col_indx,
MKL_Complex8 *values);
sparse_status_t mkl_sparse_z_create_bsr (sparse_matrix_t *A, const sparse_index_base_t
indexing, const sparse_layout_t block_layout, const MKL_INT rows, const MKL_INT cols,
const MKL_INT block_size, MKL_INT *rows_start, MKL_INT *rows_end, MKL_INT *col_indx,
MKL_Complex16 *values);

Include Files
• mkl_spblas.h

Description
The mkl_sparse_?_create_bsr routine creates a handle for an m-by-k matrix A in BSR format.

Input Parameters

indexing Indicates how input arrays are indexed.

SPARSE_INDEX_BASE_ZERO Zero-based (C-style) indexing: indices start at 0.

SPARSE_INDEX_BASE_ONE One-based (Fortran-style) indexing: indices start at 1.

block_layout Specifies layout of blocks:

SPARSE_LAYOUT_ROW_MAJOR Storage of elements of blocks uses row-major layout.

SPARSE_LAYOUT_COLUMN_MAJOR Storage of elements of blocks uses column-major layout.

rows Number of block rows of matrix A.

cols Number of block columns of matrix A.

block_size Size of blocks in matrix A.

rows_start Array of length rows. This array contains row indices, such that
rows_start[i] - ind is the first index of block row i in the arrays values
and col_indx. ind takes 0 for zero-based indexing and 1 for one-based
indexing.
Refer to pointerB array description in CSR Format for more details.

rows_end Array of length rows. This array contains row indices, such that
rows_end[i] - ind - 1 is the last index of block row i in the arrays values
and col_indx. ind takes 0 for zero-based indexing and 1 for one-based
indexing.
Refer to pointerE array description in CSR Format for more details.

col_indx For one-based indexing, array containing the column indices plus one for
each non-zero block of the matrix A. For zero-based indexing, array
containing the column indices for each non-zero block of the matrix A. Its
length is rows_end[rows - 1] - ind. ind takes 0 for zero-based indexing
and 1 for one-based indexing.

values Array containing non-zero elements of the matrix A. Its length is equal to
length of the col_indx array multiplied by block_size*block_size.

Refer to the values array description in BSR Format for more details.

Output Parameters

A Handle containing internal data.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED Execution failed.


SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.
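The following minimal plain-C sketch shows how an element A(row, col) can be located in the BSR arrays described above: block row i owns positions rows_start[i] - ind through rows_end[i] - ind - 1 of col_indx, and, consistent with the stated length of values, the sketch assumes that block k occupies block_size*block_size consecutive entries of values starting at k*block_size*block_size. It does not call oneMKL; bsr_get is a hypothetical helper, int stands in for MKL_INT, and the row_major flag stands in for the sparse_layout_t argument.

```c
#include <stddef.h>

/* Hypothetical helper (not part of oneMKL): return element A(row, col),
 * given as zero-based global indices, of a BSR matrix with bs x bs blocks,
 * or 0.0 if the element lies in a block that is not stored.
 * ind is 0 for zero-based and 1 for one-based indexing of the arrays.
 * row_major != 0 mimics SPARSE_LAYOUT_ROW_MAJOR inside each block;
 * row_major == 0 mimics SPARSE_LAYOUT_COLUMN_MAJOR. */
double bsr_get(int row, int col, int bs, int ind, int row_major,
               const int *rows_start, const int *rows_end,
               const int *col_indx, const double *values)
{
    int bi = row / bs, r = row % bs;   /* block row, row inside the block */
    int bj = col / bs, c = col % bs;   /* block column, column inside the block */
    for (int k = rows_start[bi] - ind; k < rows_end[bi] - ind; ++k) {
        if (col_indx[k] - ind == bj) {
            /* Block k starts at values[k * bs * bs]. */
            const double *blk = values + (size_t)k * bs * bs;
            return row_major ? blk[r * bs + c] : blk[c * bs + r];
        }
    }
    return 0.0;   /* block (bi, bj) is not stored */
}
```

With block_size = 2, rows_start = {0, 1}, rows_end = {1, 2}, col_indx = {1, 0} and values = {1, ..., 8}, the first stored block sits in block column 1, so A(0, 3) is 2 in row-major layout and 3 in column-major layout.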

mkl_sparse_copy
Creates a copy of a matrix handle.

Syntax
sparse_status_t mkl_sparse_copy (const sparse_matrix_t source, const struct
matrix_descr descr, sparse_matrix_t *dest);

Include Files
• mkl_spblas.h

Description
The mkl_sparse_copy routine creates a copy of a matrix handle.

NOTE
Currently, the mkl_sparse_copy routine does not support the descriptor argument and
creates an exact (deep) copy of the input matrix.

Input Parameters

source Specifies handle containing internal data.

descr Structure specifying sparse matrix properties.

sparse_matrix_type_t type - Specifies the type of a sparse matrix:

SPARSE_MATRIX_TYPE_GENERAL The matrix is processed as is.

SPARSE_MATRIX_TYPE_SYMMETRIC The matrix is symmetric (only the requested triangle is processed).

SPARSE_MATRIX_TYPE_HERMITIAN The matrix is Hermitian (only the requested triangle is processed).

SPARSE_MATRIX_TYPE_TRIANGULAR The matrix is triangular (only the requested triangle is processed).

SPARSE_MATRIX_TYPE_DIAGONAL The matrix is diagonal (only diagonal elements are processed).

SPARSE_MATRIX_TYPE_BLOCK_TRIANGULAR The matrix is block-triangular (only the requested triangle is processed). Applies to BSR format only.

SPARSE_MATRIX_TYPE_BLOCK_DIAGONAL The matrix is block-diagonal (only diagonal blocks are processed). Applies to BSR format only.

sparse_fill_mode_t mode - Specifies the triangular matrix part for symmetric, Hermitian, triangular, and block-triangular matrices:

SPARSE_FILL_MODE_LOWER The lower triangular matrix part is processed.

SPARSE_FILL_MODE_UPPER The upper triangular matrix part is processed.
sparse_diag_type_t diag - Specifies diagonal type for non-general
matrices:

SPARSE_DIAG_NON_UNIT Diagonal elements might not be equal to one.

SPARSE_DIAG_UNIT Diagonal elements are equal to one.

Output Parameters

dest Handle containing internal data.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED Execution failed.

SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.

mkl_sparse_destroy
Frees memory allocated for a matrix handle.

Syntax
sparse_status_t mkl_sparse_destroy (sparse_matrix_t A);

Include Files
• mkl_spblas.h

Description
The mkl_sparse_destroy routine frees memory allocated for a matrix handle.

NOTE
You must free the memory allocated for a matrix handle when you have finished using it. Use the
mkl_sparse_destroy routine to do so.

Input Parameters

A Handle containing internal data.


Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED Execution failed.

SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.

mkl_sparse_convert_csr
Converts internal matrix representation to CSR
format.

Syntax
sparse_status_t mkl_sparse_convert_csr (const sparse_matrix_t source, const
sparse_operation_t operation, sparse_matrix_t *dest);

Include Files
• mkl_spblas.h

Description
The mkl_sparse_convert_csr routine converts internal matrix representation to CSR format.

When the source matrix is in COO format, the routine performs a sum reduction on duplicate elements.
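The sum reduction on duplicate COO elements can be pictured with a minimal plain-C sketch of a COO-to-CSR conversion. It is not the oneMKL implementation: coo_to_csr_sum is a hypothetical helper that assumes zero-based input, uses int in place of MKL_INT, and produces the three-array CSR variant (row_ptr of length rows + 1). It leaves column indices within a row in order of first appearance rather than sorting them.

```c
/* Hypothetical helper (not part of oneMKL): convert zero-based COO
 * triplets into zero-based 3-array CSR, summing entries that share the
 * same (row, col) pair. Column order inside a row follows the first
 * appearance of each column; indices are not sorted.
 * row_ptr must hold rows + 1 entries; out_col and out_val must hold nnz
 * entries. Returns the number of stored entries after merging. */
int coo_to_csr_sum(int rows, int nnz, const int *row_indx,
                   const int *col_indx, const double *values,
                   int *row_ptr, int *out_col, double *out_val)
{
    /* 1. Count entries per row and prefix-sum into row start positions. */
    for (int i = 0; i <= rows; ++i)
        row_ptr[i] = 0;
    for (int k = 0; k < nnz; ++k)
        row_ptr[row_indx[k] + 1]++;
    for (int i = 0; i < rows; ++i)
        row_ptr[i + 1] += row_ptr[i];

    /* 2. Scatter entries into their rows; row_ptr[r] serves as a fill
     *    cursor and is then shifted back to hold row starts again. */
    for (int k = 0; k < nnz; ++k) {
        int p = row_ptr[row_indx[k]]++;
        out_col[p] = col_indx[k];
        out_val[p] = values[k];
    }
    for (int i = rows; i > 0; --i)
        row_ptr[i] = row_ptr[i - 1];
    row_ptr[0] = 0;

    /* 3. Compact each row in place, summing duplicate columns. */
    int w = 0, start = 0;
    for (int i = 0; i < rows; ++i) {
        int end = row_ptr[i + 1];
        row_ptr[i] = w;                      /* new start of row i */
        for (int p = start; p < end; ++p) {
            int q = row_ptr[i];
            while (q < w && out_col[q] != out_col[p])
                ++q;                         /* search row i for this column */
            if (q < w) {
                out_val[q] += out_val[p];    /* duplicate: sum reduction */
            } else {
                out_col[w] = out_col[p];     /* first occurrence: keep it */
                out_val[w] = out_val[p];
                ++w;
            }
        }
        start = end;
    }
    row_ptr[rows] = w;
    return w;
}
```

For example, the triplets (0, 2, 1.0) and (0, 2, 4.0) collapse into a single stored entry A(0, 2) = 5.0, so five input triplets can yield four stored entries.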

Input Parameters

source Handle containing internal data.

operation Specifies operation op() on input matrix.

SPARSE_OPERATION_NON_TRANSPOSE Non-transpose, op(A) = A.

SPARSE_OPERATION_TRANSPOSE Transpose, op(A) = A^T.

SPARSE_OPERATION_CONJUGATE_TRANSPOSE Conjugate transpose, op(A) = A^H.

Output Parameters

dest Handle containing internal data.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED Execution failed.

SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.
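With SPARSE_OPERATION_TRANSPOSE, the result describes op(A) = A^T. A minimal plain-C sketch of that transposition on zero-based three-array CSR data follows; csr_transpose is a hypothetical helper, not a oneMKL routine, and int stands in for MKL_INT. For real data, SPARSE_OPERATION_CONJUGATE_TRANSPOSE gives the same result; for complex data the values would additionally be conjugated, which this sketch does not show.

```c
/* Hypothetical helper (not part of oneMKL): given a zero-based 3-array CSR
 * matrix A (rows x cols), build the zero-based CSR arrays of A^T
 * (cols x rows). t_ptr needs cols + 1 entries; t_col and t_val need
 * nnz = a_ptr[rows] entries. */
void csr_transpose(int rows, int cols, const int *a_ptr, const int *a_col,
                   const double *a_val, int *t_ptr, int *t_col, double *t_val)
{
    /* Count entries in each column of A (= each row of A^T). */
    for (int j = 0; j <= cols; ++j)
        t_ptr[j] = 0;
    for (int k = 0; k < a_ptr[rows]; ++k)
        t_ptr[a_col[k] + 1]++;
    for (int j = 0; j < cols; ++j)
        t_ptr[j + 1] += t_ptr[j];

    /* Scatter: row i of A becomes column i of A^T. Visiting rows in order
     * keeps the column indices of each row of A^T sorted. */
    for (int i = 0; i < rows; ++i)
        for (int k = a_ptr[i]; k < a_ptr[i + 1]; ++k) {
            int dst = t_ptr[a_col[k]]++;
            t_col[dst] = i;
            t_val[dst] = a_val[k];
        }

    /* Undo the shift introduced by the post-increments above. */
    for (int j = cols; j > 0; --j)
        t_ptr[j] = t_ptr[j - 1];
    t_ptr[0] = 0;
}
```

For A = [1 0 2; 0 3 0], the sketch produces the CSR arrays of A^T = [1 0; 0 3; 2 0].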

mkl_sparse_convert_bsr
Converts internal matrix representation to BSR format
or changes BSR block size.

Syntax
sparse_status_t mkl_sparse_convert_bsr (const sparse_matrix_t source, const MKL_INT
block_size, const sparse_layout_t block_layout, const sparse_operation_t operation,
sparse_matrix_t *dest);

Include Files
• mkl_spblas.h

Description
The mkl_sparse_convert_bsr routine converts internal matrix representation to BSR format or changes
BSR block size.
When the source matrix is in COO format, the routine performs a sum reduction on duplicate elements.

Input Parameters

source Handle containing internal data.

block_size Size of the block in the output structure.

block_layout Specifies layout of blocks:

SPARSE_LAYOUT_ROW_MAJOR Storage of elements of blocks uses row-major layout.

SPARSE_LAYOUT_COLUMN_MAJOR Storage of elements of blocks uses column-major layout.

operation Specifies operation op() on input matrix.

SPARSE_OPERATION_NON_TRANSPOSE Non-transpose, op(A) = A.

SPARSE_OPERATION_TRANSPOSE Transpose, op(A) = A^T.

SPARSE_OPERATION_CONJUGATE_TRANSPOSE Conjugate transpose, op(A) = A^H.


Output Parameters

dest Handle containing internal data.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED Execution failed.

SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.

mkl_sparse_?_export_csr
Exports CSR matrix from internal representation.

Syntax
sparse_status_t mkl_sparse_s_export_csr (const sparse_matrix_t source,
sparse_index_base_t *indexing, MKL_INT *rows, MKL_INT *cols, MKL_INT **rows_start,
MKL_INT **rows_end, MKL_INT **col_indx, float **values);
sparse_status_t mkl_sparse_d_export_csr (const sparse_matrix_t source,
sparse_index_base_t *indexing, MKL_INT *rows, MKL_INT *cols, MKL_INT **rows_start,
MKL_INT **rows_end, MKL_INT **col_indx, double **values);
sparse_status_t mkl_sparse_c_export_csr (const sparse_matrix_t source,
sparse_index_base_t *indexing, MKL_INT *rows, MKL_INT *cols, MKL_INT **rows_start,
MKL_INT **rows_end, MKL_INT **col_indx, MKL_Complex8 **values);
sparse_status_t mkl_sparse_z_export_csr (const sparse_matrix_t source,
sparse_index_base_t *indexing, MKL_INT *rows, MKL_INT *cols, MKL_INT **rows_start,
MKL_INT **rows_end, MKL_INT **col_indx, MKL_Complex16 **values);

Include Files
• mkl_spblas.h

Description
If the matrix specified by the source handle is in CSR format, the mkl_sparse_?_export_csr routine
exports an m-by-k matrix A in CSR format from the internal representation. The routine returns
pointers to the internal representation and does not allocate additional memory.
If the matrix is not already in CSR format, the routine returns SPARSE_STATUS_INVALID_VALUE.
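Because the exported arrays point into the handle's internal storage, they can be read in place. The minimal plain-C sketch below computes y = A*x directly from rows_start, rows_end, col_indx, and values, applying the ind offset described under Output Parameters. It does not call oneMKL; csr4_spmv is a hypothetical helper and int stands in for MKL_INT. Reading each row through its own rows_start/rows_end pair means the sketch also works if rows are not stored back to back.

```c
/* Hypothetical helper (not part of oneMKL): y = A * x for a matrix stored
 * in the four-array CSR variant (rows_start / rows_end), as returned by an
 * export routine. ind is 0 for zero-based and 1 for one-based indexing;
 * x and y are ordinary zero-based C arrays. */
void csr4_spmv(int rows, int ind, const int *rows_start, const int *rows_end,
               const int *col_indx, const double *values,
               const double *x, double *y)
{
    for (int i = 0; i < rows; ++i) {
        double sum = 0.0;
        /* Row i occupies positions rows_start[i]-ind .. rows_end[i]-ind-1. */
        for (int k = rows_start[i] - ind; k < rows_end[i] - ind; ++k)
            sum += values[k] * x[col_indx[k] - ind];
        y[i] = sum;
    }
}
```

In the zero-based case of the usage below, position 2 of col_indx and values lies between rows_end[0] and rows_start[1] and is never read.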

Input Parameters

source Handle containing internal data.

Output Parameters

indexing Indicates how the output arrays are indexed.

SPARSE_INDEX_BASE_ZERO Zero-based (C-style) indexing: indices start at 0.

SPARSE_INDEX_BASE_ONE One-based (Fortran-style) indexing: indices start at 1.

rows Number of rows of the matrix source.

cols Number of columns of the matrix source.

rows_start Pointer to array of length m. This array contains row indices, such that
rows_start[i] - ind is the first index of row i in the arrays values and
col_indx. ind takes 0 for zero-based indexing and 1 for one-based
indexing.
Refer to pointerB array description in CSR Format for more details.

rows_end Pointer to array of length m. This array contains row indices, such that
rows_end[i] - ind - 1 is the last index of row i in the arrays values and
col_indx. ind takes 0 for zero-based indexing and 1 for one-based
indexing.
Refer to pointerE array description in CSR Format for more details.

col_indx For one-based indexing, pointer to array containing the column indices plus
one for each non-zero element of the matrix source. For zero-based
indexing, pointer to array containing the column indices for each non-zero
element of the matrix source. Its length is rows_end[rows - 1] - ind.
ind takes 0 for zero-based indexing and 1 for one-based indexing.

values Pointer to array containing non-zero elements of the matrix A. Its length is
equal to length of the col_indx array.

Refer to values array description in CSR Format for more details.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED Execution failed.

SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.

mkl_sparse_?_export_csc
Exports CSC matrix from internal representation.


Syntax
sparse_status_t mkl_sparse_s_export_csc (const sparse_matrix_t source,
sparse_index_base_t *indexing, MKL_INT *rows, MKL_INT *cols, MKL_INT **cols_start,
MKL_INT **cols_end, MKL_INT **row_indx, float **values);
sparse_status_t mkl_sparse_d_export_csc (const sparse_matrix_t source,
sparse_index_base_t *indexing, MKL_INT *rows, MKL_INT *cols, MKL_INT **cols_start,
MKL_INT **cols_end, MKL_INT **row_indx, double **values);
sparse_status_t mkl_sparse_c_export_csc (const sparse_matrix_t source,
sparse_index_base_t *indexing, MKL_INT *rows, MKL_INT *cols, MKL_INT **cols_start,
MKL_INT **cols_end, MKL_INT **row_indx, MKL_Complex8 **values);
sparse_status_t mkl_sparse_z_export_csc (const sparse_matrix_t source,
sparse_index_base_t *indexing, MKL_INT *rows, MKL_INT *cols, MKL_INT **cols_start,
MKL_INT **cols_end, MKL_INT **row_indx, MKL_Complex16 **values);

Include Files
• mkl_spblas.h

Description
If the matrix specified by the source handle is in CSC format, the mkl_sparse_?_export_csc routine
exports an m-by-k matrix A in CSC format from the internal representation. The routine returns
pointers to the internal representation and does not allocate additional memory.
If the matrix is not already in CSC format, the routine returns SPARSE_STATUS_INVALID_VALUE.
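For the CSC export, positions in row_indx and values are measured relative to cols_start[0] (see Output Parameters), while the row indices themselves still carry the indexing base. The minimal plain-C sketch below uses both conventions to compute y = A*x column by column. It does not call oneMKL; csc4_spmv is a hypothetical helper, int stands in for MKL_INT, and ind is assumed to be 0 for SPARSE_INDEX_BASE_ZERO and 1 for SPARSE_INDEX_BASE_ONE.

```c
/* Hypothetical helper (not part of oneMKL): y = A * x for an m-by-n matrix
 * in the four-array CSC variant (cols_start / cols_end). Positions in
 * row_indx and values are taken relative to cols_start[0]; ind (0 or 1)
 * converts the stored row indices to zero-based ones. */
void csc4_spmv(int m, int n, int ind, const int *cols_start,
               const int *cols_end, const int *row_indx,
               const double *values, const double *x, double *y)
{
    int base = cols_start[0];
    for (int i = 0; i < m; ++i)
        y[i] = 0.0;
    for (int j = 0; j < n; ++j)
        /* Column j occupies positions cols_start[j]-base .. cols_end[j]-base-1. */
        for (int k = cols_start[j] - base; k < cols_end[j] - base; ++k)
            y[row_indx[k] - ind] += values[k] * x[j];
}
```

Because each column scatters into y, the product accumulates across columns; this is the natural access pattern for column-oriented storage.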

Input Parameters

source Handle containing internal data.

Output Parameters

indexing Indicates how the output arrays are indexed.

SPARSE_INDEX_BASE_ZERO Zero-based (C-style) indexing: indices start at 0.

SPARSE_INDEX_BASE_ONE One-based (Fortran-style) indexing: indices start at 1.

rows Number of rows of the matrix source.

cols Number of columns of the matrix source.

cols_start Pointer to array of length cols. This array contains column indices, such that
cols_start[i] - cols_start[0] is the first index of column i in the
arrays values and row_indx.

Refer to pointerB array description in CSC Format for more details.

cols_end Pointer to array of length cols. This array contains column indices, such that
cols_end[i] - cols_start[0] - 1 is the last index of column i in the
arrays values and row_indx.

Refer to pointerE array description in CSC Format for more details.

row_indx For one-based indexing, pointer to array containing the row indices plus one
for each non-zero element of the matrix source. For zero-based indexing,
pointer to array containing the row indices for each non-zero element of the
matrix source. Its length is cols_end[cols - 1] - cols_start[0].

values Pointer to array containing non-zero elements of the matrix A. Its length is
equal to length of the row_indx array.

Refer to values array description in CSC Format for more details.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED Execution failed.

SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.

mkl_sparse_?_export_bsr
Exports BSR matrix from internal representation.

Syntax
sparse_status_t mkl_sparse_s_export_bsr (const sparse_matrix_t source,
sparse_index_base_t *indexing, sparse_layout_t *block_layout, MKL_INT *rows, MKL_INT
*cols, MKL_INT *block_size, MKL_INT **rows_start, MKL_INT **rows_end, MKL_INT
**col_indx, float **values);
sparse_status_t mkl_sparse_d_export_bsr (const sparse_matrix_t source,
sparse_index_base_t *indexing, sparse_layout_t *block_layout, MKL_INT *rows, MKL_INT
*cols, MKL_INT *block_size, MKL_INT **rows_start, MKL_INT **rows_end, MKL_INT
**col_indx, double **values);
sparse_status_t mkl_sparse_c_export_bsr (const sparse_matrix_t source,
sparse_index_base_t *indexing, sparse_layout_t *block_layout, MKL_INT *rows, MKL_INT
*cols, MKL_INT *block_size, MKL_INT **rows_start, MKL_INT **rows_end, MKL_INT
**col_indx, MKL_Complex8 **values);
sparse_status_t mkl_sparse_z_export_bsr (const sparse_matrix_t source,
sparse_index_base_t *indexing, sparse_layout_t *block_layout, MKL_INT *rows, MKL_INT
*cols, MKL_INT *block_size, MKL_INT **rows_start, MKL_INT **rows_end, MKL_INT
**col_indx, MKL_Complex16 **values);

Include Files
• mkl_spblas.h


Description
If the matrix specified by the source handle is in BSR format, the mkl_sparse_?_export_bsr routine
exports an (block_size * rows)-by-(block_size * cols) matrix A in BSR format from the internal
representation. The routine returns pointers to the internal representation and does not allocate additional
memory.
If the matrix is not already in BSR format, the routine returns SPARSE_STATUS_INVALID_VALUE.
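The exported BSR arrays can be used directly for a matrix-vector product. In the minimal plain-C sketch below (no oneMKL calls; bsr_spmv is a hypothetical helper and int stands in for MKL_INT), A is (block_size * rows)-by-(block_size * cols), block k is assumed to start at values[k*block_size*block_size], and the row_major flag selects between the two block_layout values.

```c
#include <stddef.h>

/* Hypothetical helper (not part of oneMKL): y = A * x for a BSR matrix with
 * rows block rows of square bs x bs blocks. x has bs * (block columns)
 * entries and y has bs * rows entries. ind is 0 or 1; row_major != 0
 * mimics SPARSE_LAYOUT_ROW_MAJOR, row_major == 0 mimics
 * SPARSE_LAYOUT_COLUMN_MAJOR. */
void bsr_spmv(int rows, int bs, int ind, int row_major,
              const int *rows_start, const int *rows_end,
              const int *col_indx, const double *values,
              const double *x, double *y)
{
    for (int i = 0; i < rows * bs; ++i)
        y[i] = 0.0;
    for (int bi = 0; bi < rows; ++bi)
        for (int k = rows_start[bi] - ind; k < rows_end[bi] - ind; ++k) {
            int bj = col_indx[k] - ind;                        /* zero-based block column */
            const double *blk = values + (size_t)k * bs * bs;  /* block k */
            for (int r = 0; r < bs; ++r)
                for (int c = 0; c < bs; ++c) {
                    double a = row_major ? blk[r * bs + c] : blk[c * bs + r];
                    y[bi * bs + r] += a * x[bj * bs + c];
                }
        }
}
```

The same values array gives different products in the two layouts, since each block is read as its transpose when the layout flag is flipped.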

Input Parameters

source Handle containing internal data.

Output Parameters

indexing Indicates how the output arrays are indexed.

SPARSE_INDEX_BASE_ZERO Zero-based (C-style) indexing: indices start at 0.

SPARSE_INDEX_BASE_ONE One-based (Fortran-style) indexing: indices start at 1.

block_layout Specifies layout of blocks:

SPARSE_LAYOUT_ROW_MAJOR Storage of elements of blocks uses row-major layout.

SPARSE_LAYOUT_COLUMN_MAJOR Storage of elements of blocks uses column-major layout.

rows Number of block rows of the matrix source.

cols Number of block columns of the matrix source.

block_size Size of the square block in matrix source.

rows_start Pointer to array of length rows. This array contains row indices, such that
rows_start[i] - ind is the first index of block row i in the arrays values
and col_indx. ind takes 0 for zero-based indexing and 1 for one-based
indexing.
Refer to pointerB array description in BSR Format for more details.

rows_end Pointer to array of length rows. This array contains row indices, such that
rows_end[i] - ind - 1 is the last index of block row i in the arrays values
and col_indx. ind takes 0 for zero-based indexing and 1 for one-based
indexing.
Refer to pointerE array description in BSR Format for more details.

col_indx For one-based indexing, pointer to array containing the column indices plus
one for each non-zero block of the matrix source. For zero-based indexing,
pointer to array containing the column indices for each non-zero block of
the matrix source. Its length is rows_end[rows - 1] - ind. ind takes
0 for zero-based indexing and 1 for one-based indexing.

values Pointer to array containing non-zero elements of the matrix source. Its length is
equal to length of the col_indx array multiplied by
block_size*block_size.
Refer to the values array description in BSR Format for more details.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED Execution failed.

SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.

mkl_sparse_?_set_value
Changes a single value of a matrix in internal
representation.

Syntax
sparse_status_t mkl_sparse_s_set_value (const sparse_matrix_t A, const MKL_INT row,
const MKL_INT col, const float value);
sparse_status_t mkl_sparse_d_set_value (const sparse_matrix_t A, const MKL_INT row,
const MKL_INT col, const double value);
sparse_status_t mkl_sparse_c_set_value (const sparse_matrix_t A, const MKL_INT row,
const MKL_INT col, const MKL_Complex8 value);
sparse_status_t mkl_sparse_z_set_value (const sparse_matrix_t A, const MKL_INT row,
const MKL_INT col, const MKL_Complex16 value);

Include Files
• mkl_spblas.h

Description
Use the mkl_sparse_?_set_value routine to change a single value of a matrix in the internal Inspector-executor
Sparse BLAS format. The entry must already be present in the matrix structure.
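Because the entry must already exist in the matrix structure, setting a value amounts to locating it and overwriting it in place. A minimal plain-C sketch of that idea on zero-based three-array CSR data follows. It does not call oneMKL: csr_set_value is hypothetical, int stands in for MKL_INT, and returning -1 for an entry outside the structure is this sketch's convention, not the status code of mkl_sparse_?_set_value.

```c
/* Hypothetical helper (not part of oneMKL): overwrite A(row, col) in a
 * zero-based 3-array CSR matrix. The entry must already be part of the
 * sparsity structure; no new entry is ever inserted.
 * Returns 0 on success and -1 if (row, col) is not stored. */
int csr_set_value(int row, int col, double value, const int *row_ptr,
                  const int *col_indx, double *values)
{
    for (int k = row_ptr[row]; k < row_ptr[row + 1]; ++k) {
        if (col_indx[k] == col) {
            values[k] = value;   /* update in place; structure unchanged */
            return 0;
        }
    }
    return -1;   /* not in the structure: nothing is changed */
}
```

Since the sparsity structure is never modified, any optimized data the handle holds for that structure could in principle remain valid; only values change.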

Input Parameters

A Specifies handle containing internal data.

row Indicates row of matrix in which to set value.

col Indicates column of matrix in which to set value.

value New value to store at position (row, col).


Output Parameters

A Handle containing modified internal data.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

mkl_sparse_?_update_values
Changes all or selected matrix values in internal
representation.

Syntax

NOTE
This routine is supported for sparse matrices in BSR format only.

sparse_status_t mkl_sparse_s_update_values (sparse_matrix_t A, MKL_INT nvalues, MKL_INT
*indx, MKL_INT *indy, float *values);
sparse_status_t mkl_sparse_d_update_values (sparse_matrix_t A, MKL_INT nvalues, MKL_INT
*indx, MKL_INT *indy, double *values);
sparse_status_t mkl_sparse_c_update_values (sparse_matrix_t A, MKL_INT nvalues, MKL_INT
*indx, MKL_INT *indy, MKL_Complex8 *values);
sparse_status_t mkl_sparse_z_update_values (sparse_matrix_t A, MKL_INT nvalues, MKL_INT
*indx, MKL_INT *indy, MKL_Complex16 *values);

Include Files
• mkl_spblas.h

Description
Use the mkl_sparse_?_update_values routine to change all or selected values of a matrix in the internal
Inspector-Executor Sparse BLAS format.
The values to be updated should already be present in the matrix structure.

• To change selected values, provide an array values with the new values, the corresponding row and
column indices of each value in the indx and indy arrays, and the total number of changed elements in
nvalues.
For example, to change A(0, 0) to 1 and A(0, 1) to 2, pass the following input parameters:
nvalues = 2, indx = {0, 0}, indy = {0, 1}, and values = {1, 2}.
• To change all the values in the matrix, provide the values array and explicitly set nvalues to 0 or to the
actual number of non-zero elements. The indx and indy arrays are not needed in this case.

Input Parameters

A Specifies handle containing internal data.

nvalues Total number of elements changed.

indx Row indices for the new values.

NOTE
Currently, only updating the full matrix is supported. Set indx
and indy to NULL.

indy Column indices for the new values.

NOTE
Currently, only updating the full matrix is supported. Set indx
and indy to NULL.

values New values.

Output Parameters

A Handle containing modified internal data.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED Execution failed.

SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.

mkl_sparse_order
Performs ordering of column indexes of the matrix in
CSR format

Syntax
sparse_status_t mkl_sparse_order (const sparse_matrix_t csrA);

Include Files
• mkl_spblas.h

Description
Use the mkl_sparse_order routine to order the column indices of a matrix in CSR format.
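Assuming that ordering here means sorting the column indices inside each row into ascending order while moving the matching values with them, the idea can be sketched in plain C on zero-based three-array CSR data. csr_order is a hypothetical helper, not the oneMKL implementation; int stands in for MKL_INT, and insertion sort is an arbitrary choice for illustration.

```c
/* Hypothetical helper (not part of oneMKL): sort the column indices of each
 * row of a zero-based 3-array CSR matrix in ascending order, permuting the
 * values together with their column indices (insertion sort per row). */
void csr_order(int rows, const int *row_ptr, int *col_indx, double *values)
{
    for (int i = 0; i < rows; ++i) {
        for (int p = row_ptr[i] + 1; p < row_ptr[i + 1]; ++p) {
            int c = col_indx[p];
            double v = values[p];
            int q = p - 1;
            /* Shift larger column indices of this row one slot to the right. */
            while (q >= row_ptr[i] && col_indx[q] > c) {
                col_indx[q + 1] = col_indx[q];
                values[q + 1] = values[q];
                --q;
            }
            col_indx[q + 1] = c;
            values[q + 1] = v;
        }
    }
}
```

Rows are sorted independently, so row_ptr is left untouched; only the order of entries within each row changes.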


Input Parameters

csrA Handle containing internal data for a matrix in CSR format.

Output Parameters

csrA Handle containing modified internal data.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

Inspector-Executor Sparse BLAS Analysis Routines

Analysis Routines and Their Data Types
Routine or Function Group Description

mkl_sparse_set_lu_smoother_hint Provides an estimate of the number and type of upcoming calls to LU
smoother functionality.

mkl_sparse_set_mv_hint Provides an estimate of the number and type of upcoming matrix-vector operations.

mkl_sparse_set_sv_hint Provides an estimate of the number and type of upcoming triangular system solver
operations.

mkl_sparse_set_mm_hint Provides an estimate of the number and type of upcoming matrix-matrix
multiplication operations.

mkl_sparse_set_sm_hint Provides an estimate of the number and type of upcoming triangular solves with
multiple right-hand sides.

mkl_sparse_set_dotmv_hint Sets an estimate of the number and type of upcoming matrix-vector operations.

mkl_sparse_set_symgs_hint Sets an estimate of the number and type of upcoming mkl_sparse_?_symgs
operations.

mkl_sparse_set_sorv_hint Sets an estimate of the number and type of upcoming mkl_sparse_?_sorv
operations.

mkl_sparse_set_memory_hint Provides memory requirements for performance optimization purposes.

mkl_sparse_optimize Analyzes matrix structure and performs optimizations using the hints
provided in the handle.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.
Notice revision #20201201

mkl_sparse_set_lu_smoother_hint
Provides an estimate of the number and type of
upcoming calls to LU smoother functionality.

Syntax
sparse_status_t mkl_sparse_set_lu_smoother_hint (sparse_matrix_t A, const
sparse_operation_t operation, struct matrix_descr descr, MKL_INT expected_calls);

Include Files
• mkl_spblas.h

Description
The mkl_sparse_set_lu_smoother_hint function provides an estimate of the number of upcoming calls to the
lu_smoother routine, which may influence the optimizations applied by subsequent Inspector-Executor Sparse
BLAS calls, and specifies the operation op() to be applied to the matrix.


Input Parameters

operation Specifies the operation op() on input matrix.

SPARSE_OPERATION_NON_TRANSPOSE Non-transpose, op(A) = A.

SPARSE_OPERATION_TRANSPOSE Transpose, op(A) = A^T.

SPARSE_OPERATION_CONJUGATE_TRANSPOSE Conjugate transpose, op(A) = A^H.

descr Structure specifying sparse matrix properties.

sparse_matrix_type_t type - Specifies the type of a sparse matrix:

SPARSE_MATRIX_TYPE_GENERAL The matrix is processed as is.

SPARSE_MATRIX_TYPE_SYMMETRIC The matrix is symmetric (only the requested triangle is processed).

SPARSE_MATRIX_TYPE_HERMITIAN The matrix is Hermitian (only the requested triangle is processed).

SPARSE_MATRIX_TYPE_TRIANGULAR The matrix is triangular (only the requested triangle is processed).

SPARSE_MATRIX_TYPE_DIAGONAL The matrix is diagonal (only diagonal elements are processed).

SPARSE_MATRIX_TYPE_BLOCK_TRIANGULAR The matrix is block-triangular (only the requested triangle is processed). Applies to BSR format only.

SPARSE_MATRIX_TYPE_BLOCK_DIAGONAL The matrix is block-diagonal (only diagonal blocks are processed). Applies to BSR format only.

sparse_fill_mode_t mode - Specifies the triangular matrix part for symmetric, Hermitian, triangular, and block-triangular matrices:

SPARSE_FILL_MODE_LOWER The lower triangular matrix part is processed.

SPARSE_FILL_MODE_UPPER The upper triangular matrix part is processed.

sparse_diag_type_t diag - Specifies the diagonal type for non-general matrices:

SPARSE_DIAG_NON_UNIT Diagonal elements might not be equal to one.

SPARSE_DIAG_UNIT Diagonal elements are equal to one.

expected_calls Number of expected calls to execution routine.

A Handle containing internal data.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED Execution failed.

SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.

mkl_sparse_set_mv_hint
Provides estimate of number and type of upcoming
matrix-vector operations.

Syntax
sparse_status_t mkl_sparse_set_mv_hint (const sparse_matrix_t A, const
sparse_operation_t operation, const struct matrix_descr descr, const MKL_INT
expected_calls);

Include Files
• mkl_spblas.h

Description
Use the mkl_sparse_set_mv_hint routine to give the Inspector-executor Sparse BLAS API an estimate
of the number of upcoming matrix-vector multiplication operations, which it can use for performance
optimization, and to specify the operation op() that those calls will apply to the matrix.


Input Parameters

operation Specifies operation op() on input matrix.

SPARSE_OPERATION_NON_TRANSPOSE Non-transpose, op(A) = A.

SPARSE_OPERATION_TRANSPOSE Transpose, op(A) = A^T.

SPARSE_OPERATION_CONJUGATE_TRANSPOSE Conjugate transpose, op(A) = A^H.

descr Structure specifying sparse matrix properties.

sparse_matrix_type_t type - Specifies the type of a sparse matrix:

SPARSE_MATRIX_TYPE_GENERAL The matrix is processed as is.

SPARSE_MATRIX_TYPE_SYMMETRIC The matrix is symmetric (only the requested triangle is processed).

SPARSE_MATRIX_TYPE_HERMITIAN The matrix is Hermitian (only the requested triangle is processed).

SPARSE_MATRIX_TYPE_TRIANGULAR The matrix is triangular (only the requested triangle is processed).

SPARSE_MATRIX_TYPE_DIAGONAL The matrix is diagonal (only diagonal elements are processed).

SPARSE_MATRIX_TYPE_BLOCK_TRIANGULAR The matrix is block-triangular (only the requested triangle is processed). Applies to BSR format only.

SPARSE_MATRIX_TYPE_BLOCK_DIAGONAL The matrix is block-diagonal (only diagonal blocks are processed). Applies to BSR format only.

sparse_fill_mode_t mode - Specifies the triangular matrix part for symmetric, Hermitian, triangular, and block-triangular matrices:

SPARSE_FILL_MODE_LOWER The lower triangular matrix part is processed.

SPARSE_FILL_MODE_UPPER The upper triangular matrix part is processed.

sparse_diag_type_t diag - Specifies the diagonal type for non-general matrices:

SPARSE_DIAG_NON_UNIT Diagonal elements might not be equal to one.

SPARSE_DIAG_UNIT Diagonal elements are equal to one.

expected_calls Expected number of calls to the execution routine.

Output Parameters

A Handle containing internal data.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED Execution failed.

SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.

mkl_sparse_set_sv_hint
Provides an estimate of the number and type of upcoming
triangular system solver operations.

Syntax
sparse_status_t mkl_sparse_set_sv_hint (const sparse_matrix_t A, const
sparse_operation_t operation, const struct matrix_descr descr, const MKL_INT
expected_calls);

Include Files
• mkl_spblas.h

Description
The mkl_sparse_set_sv_hint routine provides an estimate of the number and type of upcoming triangular system
solver operations for performance optimization.


Input Parameters

operation Specifies operation op() on input matrix.

SPARSE_OPERATION_NON_TRANSPOSE Non-transpose, op(A) = A.

SPARSE_OPERATION_TRANSPOSE Transpose, op(A) = A^T.

SPARSE_OPERATION_CONJUGATE_TRANSPOSE Conjugate transpose, op(A) = A^H.

descr Structure specifying sparse matrix properties.

sparse_matrix_type_t type - Specifies the type of a sparse matrix:

SPARSE_MATRIX_TYPE_GENERAL The matrix is processed as is.

SPARSE_MATRIX_TYPE_SYMMETRIC The matrix is symmetric (only the requested triangle is processed).

SPARSE_MATRIX_TYPE_HERMITIAN The matrix is Hermitian (only the requested triangle is processed).

SPARSE_MATRIX_TYPE_TRIANGULAR The matrix is triangular (only the requested triangle is processed).

SPARSE_MATRIX_TYPE_DIAGONAL The matrix is diagonal (only diagonal elements are processed).

SPARSE_MATRIX_TYPE_BLOCK_TRIANGULAR The matrix is block-triangular (only the requested triangle is processed). Applies to BSR format only.

SPARSE_MATRIX_TYPE_BLOCK_DIAGONAL The matrix is block-diagonal (only diagonal blocks are processed). Applies to BSR format only.

sparse_fill_mode_t mode - Specifies the triangular matrix part for symmetric, Hermitian, triangular, and block-triangular matrices:

SPARSE_FILL_MODE_LOWER The lower triangular matrix part is processed.

SPARSE_FILL_MODE_UPPER The upper triangular matrix part is processed.

sparse_diag_type_t diag - Specifies the diagonal type for non-general matrices:

SPARSE_DIAG_NON_UNIT Diagonal elements might not be equal to one.

SPARSE_DIAG_UNIT Diagonal elements are equal to one.

expected_calls Expected number of calls to the execution routine.

Output Parameters

A Handle containing internal data.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS The operation was successful.


SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED Execution failed.

SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.

mkl_sparse_set_mm_hint
Provides an estimate of the number and type of upcoming
matrix-matrix multiplication operations.

Syntax
sparse_status_t mkl_sparse_set_mm_hint (const sparse_matrix_t A, const
sparse_operation_t operation, const struct matrix_descr descr, const sparse_layout_t
layout, const MKL_INT dense_matrix_size, const MKL_INT expected_calls);

Include Files
• mkl_spblas.h

Description
The mkl_sparse_set_mm_hint routine provides an estimate of the number and type of upcoming matrix-matrix
multiplication operations for performance optimization purposes.


Input Parameters

operation Specifies operation op() on input matrix.

SPARSE_OPERATION_NON_TRANSPOSE Non-transpose, op(A) = A.

SPARSE_OPERATION_TRANSPOSE Transpose, op(A) = A^T.

SPARSE_OPERATION_CONJUGATE_TRANSPOSE Conjugate transpose, op(A) = A^H.

descr Structure specifying sparse matrix properties.

sparse_matrix_type_t type - Specifies the type of a sparse matrix:

SPARSE_MATRIX_TYPE_GENERAL The matrix is processed as is.

SPARSE_MATRIX_TYPE_SYMMETRIC The matrix is symmetric (only the requested triangle is processed).

SPARSE_MATRIX_TYPE_HERMITIAN The matrix is Hermitian (only the requested triangle is processed).

SPARSE_MATRIX_TYPE_TRIANGULAR The matrix is triangular (only the requested triangle is processed).

SPARSE_MATRIX_TYPE_DIAGONAL The matrix is diagonal (only diagonal elements are processed).

SPARSE_MATRIX_TYPE_BLOCK_TRIANGULAR The matrix is block-triangular (only the requested triangle is processed). Applies to BSR format only.

SPARSE_MATRIX_TYPE_BLOCK_DIAGONAL The matrix is block-diagonal (only diagonal blocks are processed). Applies to BSR format only.

sparse_fill_mode_t mode - Specifies the triangular matrix part for symmetric, Hermitian, triangular, and block-triangular matrices:

SPARSE_FILL_MODE_LOWER The lower triangular matrix part is processed.

SPARSE_FILL_MODE_UPPER The upper triangular matrix part is processed.

sparse_diag_type_t diag - Specifies the diagonal type for non-general matrices:

SPARSE_DIAG_NON_UNIT Diagonal elements might not be equal to one.

SPARSE_DIAG_UNIT Diagonal elements are equal to one.

layout Specifies the layout of the dense matrix elements:

SPARSE_LAYOUT_COLUMN_MAJOR Storage of elements uses column-major layout.

SPARSE_LAYOUT_ROW_MAJOR Storage of elements uses row-major layout.

dense_matrix_size Number of columns in the dense matrix.

expected_calls Expected number of calls to the execution routine.

Output Parameters

A Handle containing internal data.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.


SPARSE_STATUS_EXECUTION_FAILED Execution failed.

SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.

mkl_sparse_set_sm_hint
Provides an estimate of the number and type of upcoming
triangular matrix solve operations with multiple right-hand
sides.

Syntax
sparse_status_t mkl_sparse_set_sm_hint (const sparse_matrix_t A, const
sparse_operation_t operation, const struct matrix_descr descr, const sparse_layout_t
layout, const MKL_INT dense_matrix_size, const MKL_INT expected_calls);

Include Files
• mkl_spblas.h

Description
The mkl_sparse_set_sm_hint routine provides an estimate of the number and type of upcoming triangular
matrix solve operations with multiple right-hand sides for performance optimization purposes.


Input Parameters

operation Specifies operation op() on input matrix.

SPARSE_OPERATION_NON_TRANSPOSE Non-transpose, op(A) = A.

SPARSE_OPERATION_TRANSPOSE Transpose, op(A) = A^T.

SPARSE_OPERATION_CONJUGATE_TRANSPOSE Conjugate transpose, op(A) = A^H.

descr Structure specifying sparse matrix properties.

sparse_matrix_type_t type - Specifies the type of a sparse matrix:

SPARSE_MATRIX_TYPE_GENERAL The matrix is processed as is.

SPARSE_MATRIX_TYPE_SYMMETRIC The matrix is symmetric (only the requested triangle is processed).

SPARSE_MATRIX_TYPE_HERMITIAN The matrix is Hermitian (only the requested triangle is processed).

SPARSE_MATRIX_TYPE_TRIANGULAR The matrix is triangular (only the requested triangle is processed).

SPARSE_MATRIX_TYPE_DIAGONAL The matrix is diagonal (only diagonal elements are processed).

SPARSE_MATRIX_TYPE_BLOCK_TRIANGULAR The matrix is block-triangular (only the requested triangle is processed). Applies to BSR format only.

SPARSE_MATRIX_TYPE_BLOCK_DIAGONAL The matrix is block-diagonal (only diagonal blocks are processed). Applies to BSR format only.

sparse_fill_mode_t mode - Specifies the triangular matrix part for symmetric, Hermitian, triangular, and block-triangular matrices:

SPARSE_FILL_MODE_LOWER The lower triangular matrix part is processed.

SPARSE_FILL_MODE_UPPER The upper triangular matrix part is processed.

sparse_diag_type_t diag - Specifies the diagonal type for non-general matrices:

SPARSE_DIAG_NON_UNIT Diagonal elements might not be equal to one.

SPARSE_DIAG_UNIT Diagonal elements are equal to one.

layout Specifies the layout of the dense matrix elements:

SPARSE_LAYOUT_COLUMN_MAJOR Storage of elements uses column-major layout.

SPARSE_LAYOUT_ROW_MAJOR Storage of elements uses row-major layout.

dense_matrix_size Number of right-hand sides.

expected_calls Expected number of calls to the execution routine.

Output Parameters

A Handle containing internal data.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED Execution failed.

SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.


SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.

mkl_sparse_set_dotmv_hint
Sets an estimate of the number and type of upcoming
mkl_sparse_?_dotmv operations.

Syntax
sparse_status_t mkl_sparse_set_dotmv_hint (const sparse_matrix_t A, const
sparse_operation_t operation, const struct matrix_descr descr, const MKL_INT
expected_calls);

Include Files
• mkl_spblas.h

Description
Use the mkl_sparse_set_dotmv_hint routine to provide the Inspector-executor Sparse BLAS API with an
estimate of the number of upcoming matrix-vector multiplication operations followed by a dot product
(mkl_sparse_?_dotmv) for performance optimization, and to specify which operation those calls will apply to the matrix.

Input Parameters

operation Specifies the operation performed on matrix A.

If operation = SPARSE_OPERATION_NON_TRANSPOSE, op(A) = A.

If operation = SPARSE_OPERATION_TRANSPOSE, op(A) = A^T.

If operation = SPARSE_OPERATION_CONJUGATE_TRANSPOSE, op(A) = A^H.

descr Structure specifying sparse matrix properties.

sparse_matrix_type_t type - Specifies the type of a sparse matrix:

SPARSE_MATRIX_TYPE_GENERAL The matrix is processed as is.

SPARSE_MATRIX_TYPE_SYMMETRIC The matrix is symmetric (only the requested triangle is processed).

SPARSE_MATRIX_TYPE_HERMITIAN The matrix is Hermitian (only the requested triangle is processed).

SPARSE_MATRIX_TYPE_TRIANGULAR The matrix is triangular (only the requested triangle is processed).

SPARSE_MATRIX_TYPE_DIAGONAL The matrix is diagonal (only diagonal elements are processed).

SPARSE_MATRIX_TYPE_BLOCK_TRIANGULAR The matrix is block-triangular (only the requested triangle is processed). Applies to BSR format only.

SPARSE_MATRIX_TYPE_BLOCK_DIAGONAL The matrix is block-diagonal (only diagonal blocks are processed). Applies to BSR format only.

sparse_fill_mode_t mode - Specifies the triangular matrix part for symmetric, Hermitian, triangular, and block-triangular matrices:

SPARSE_FILL_MODE_LOWER The lower triangular matrix part is processed.

SPARSE_FILL_MODE_UPPER The upper triangular matrix part is processed.

sparse_diag_type_t diag - Specifies the diagonal type for non-general matrices:

SPARSE_DIAG_NON_UNIT Diagonal elements might not be equal to one.

SPARSE_DIAG_UNIT Diagonal elements are equal to one.

expected_calls Expected number of calls to the execution routine.

Output Parameters

A Handle containing internal data.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED Execution failed.

SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.

mkl_sparse_set_symgs_hint
Sets an estimate of the number and type of upcoming
mkl_sparse_?_symgs operations.

Syntax
sparse_status_t mkl_sparse_set_symgs_hint (const sparse_matrix_t A, const
sparse_operation_t operation, const struct matrix_descr descr, const MKL_INT
expected_calls);

Include Files
• mkl_spblas.h

Description
Use the mkl_sparse_set_symgs_hint routine to provide the Inspector-executor Sparse BLAS API with an
estimate of the number of upcoming symmetric Gauss-Seidel preconditioner operations for performance
optimization, and to specify which operation those calls will apply to the matrix.


Input Parameters

operation Specifies the operation performed on matrix A.

If operation = SPARSE_OPERATION_NON_TRANSPOSE, op(A) = A.

If operation = SPARSE_OPERATION_TRANSPOSE, op(A) = A^T.

If operation = SPARSE_OPERATION_CONJUGATE_TRANSPOSE, op(A) = A^H.

descr Structure specifying sparse matrix properties.

sparse_matrix_type_t type - Specifies the type of a sparse matrix:

SPARSE_MATRIX_TYPE_GENERAL The matrix is processed as is.

SPARSE_MATRIX_TYPE_SYMMETRIC The matrix is symmetric (only the requested triangle is processed).

SPARSE_MATRIX_TYPE_HERMITIAN The matrix is Hermitian (only the requested triangle is processed).

SPARSE_MATRIX_TYPE_TRIANGULAR The matrix is triangular (only the requested triangle is processed).

SPARSE_MATRIX_TYPE_DIAGONAL The matrix is diagonal (only diagonal elements are processed).

SPARSE_MATRIX_TYPE_BLOCK_TRIANGULAR The matrix is block-triangular (only the requested triangle is processed). Applies to BSR format only.

SPARSE_MATRIX_TYPE_BLOCK_DIAGONAL The matrix is block-diagonal (only diagonal blocks are processed). Applies to BSR format only.

sparse_fill_mode_t mode - Specifies the triangular matrix part for symmetric, Hermitian, triangular, and block-triangular matrices:

SPARSE_FILL_MODE_LOWER The lower triangular matrix part is processed.

SPARSE_FILL_MODE_UPPER The upper triangular matrix part is processed.

sparse_diag_type_t diag - Specifies the diagonal type for non-general matrices:

SPARSE_DIAG_NON_UNIT Diagonal elements might not be equal to one.

SPARSE_DIAG_UNIT Diagonal elements are equal to one.

diag Specifies the diagonal type for non-general matrices.

mode Specifies the triangular matrix part for symmetric, Hermitian, triangular,
and block-triangular matrices.

type Specifies the type of a sparse matrix.

expected_calls Estimate of the number of calls to the execution routine.

Output Parameters

A Handle containing internal data.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED Execution failed.

SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.

mkl_sparse_set_sorv_hint
Sets an estimate of the number and type of upcoming
mkl_sparse_?_sorv operations.

Syntax

sparse_status_t mkl_sparse_set_sorv_hint(
const sparse_sor_type_t type,
const sparse_matrix_t A,
const struct matrix_descr descr,
const MKL_INT expected_calls
);

Include Files
• mkl_spblas.h

Description
Use the mkl_sparse_set_sorv_hint routine to provide the Inspector-Executor Sparse BLAS API an
estimate of the number of upcoming forward/backward sweeps or symmetric SOR preconditioner operations
for performance optimization.


Input Parameters

type Specifies the operation performed by the SORV preconditioner.

SPARSE_SOR_FORWARD Performs forward sweep as defined by:


SPARSE_SOR_BACKWARD Performs backward sweep as defined by:

SPARSE_SOR_SYMMETRIC Preconditioner matrix could be expressed as:

descr Structure specifying sparse matrix properties.

sparse_matrix_type_t type Specifies the type of a sparse matrix:

• SPARSE_MATRIX_TYPE_GENERAL
The matrix is processed as-is.
• SPARSE_MATRIX_TYPE_SYMMETRIC
The matrix is symmetric (only the requested triangle is processed).
• SPARSE_MATRIX_TYPE_HERMITIAN
The matrix is Hermitian (only the requested triangle is processed).
• SPARSE_MATRIX_TYPE_TRIANGULAR
The matrix is triangular (only the requested triangle is processed).
• SPARSE_MATRIX_TYPE_DIAGONAL
The matrix is diagonal (only diagonal elements are processed).
• SPARSE_MATRIX_TYPE_BLOCK_TRIANGULAR
The matrix is block-triangular (only the requested triangle is processed). Applies to BSR format only.
• SPARSE_MATRIX_TYPE_BLOCK_DIAGONAL
The matrix is block-diagonal (only diagonal blocks are processed). Applies to BSR format only.

sparse_fill_mode_t mode Specifies the triangular matrix part for symmetric, Hermitian, triangular, and block-triangular matrices:

• SPARSE_FILL_MODE_LOWER
The lower triangular matrix part is processed.
• SPARSE_FILL_MODE_UPPER
The upper triangular matrix part is processed.

sparse_diag_type_t diag Specifies the diagonal type for non-general matrices:

• SPARSE_DIAG_NON_UNIT
Diagonal elements might not be equal to one.
• SPARSE_DIAG_UNIT
Diagonal elements are equal to one.

A Handle containing internal data.

expected_calls Estimate of the number of calls to the execution routine.

Output Parameters

A Handle containing internal data.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED Execution failed.

SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.
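The sweep definitions referenced under type are not reproduced in this copy of the page. For orientation only, the plain-C sketch below implements the textbook forward SOR sweep, (D + ωL)x^1 = ωb - (ωU + (ω - 1)D)x^0, on a CSR matrix. It makes no MKL calls; treating it as what SPARSE_SOR_FORWARD computes is an assumption, and sor_forward_sweep and its three-array CSR input are hypothetical.

```c
/* Hypothetical helper: one in-place textbook forward SOR sweep for a square
 * CSR matrix (zero-based, row_ptr of length n+1, both triangles stored):
 *   x[i] = (1 - omega)*x[i] + omega*(b[i] - sum_{j != i} a_ij*x[j]) / a_ii,
 * where x[j] for j < i already holds its updated value.
 * Returns 0 on success and -1 if some row has no nonzero diagonal entry. */
int sor_forward_sweep(int n, const int *row_ptr, const int *col_idx,
                      const double *val, double omega,
                      const double *b, double *x)
{
    for (int i = 0; i < n; ++i) {
        double sigma = 0.0; /* off-diagonal part of row i times x */
        double a_ii = 0.0;  /* diagonal entry of row i */
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
            if (col_idx[k] == i)
                a_ii = val[k];
            else
                sigma += val[k] * x[col_idx[k]];
        }
        if (a_ii == 0.0)
            return -1;
        x[i] = (1.0 - omega) * x[i] + omega * (b[i] - sigma) / a_ii;
    }
    return 0;
}
```

With omega = 1.0 the sweep reduces to a forward Gauss-Seidel sweep; a backward sweep would visit the rows in reverse order.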

mkl_sparse_set_memory_hint
Provides memory requirements for performance
optimization purposes.

Syntax
sparse_status_t mkl_sparse_set_memory_hint (const sparse_matrix_t A, const
sparse_memory_usage_t policy);

Include Files
• mkl_spblas.h

Description
The mkl_sparse_set_memory_hint routine sets the policy that controls how much additional memory the
optimization routine may allocate for performance optimization purposes.


Input Parameters

policy Specifies the memory utilization policy for the optimization routine using these types:

SPARSE_MEMORY_NONE Routine can allocate memory only for auxiliary structures (such as for workload balancing); the amount of memory is proportional to vector size.

SPARSE_MEMORY_AGGRESSIVE Default.
Routine can allocate memory up to the size of matrix A for converting into the appropriate sparse format.

Output Parameters

A Handle containing internal data.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED Execution failed.

SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.

mkl_sparse_optimize
Analyzes matrix structure and performs optimizations
using the hints provided in the handle.

Syntax
sparse_status_t mkl_sparse_optimize (sparse_matrix_t A);

Include Files
• mkl_spblas.h

Description
The mkl_sparse_optimize routine analyzes matrix structure and performs optimizations using the hints
provided in the handle. Generally, specifying a higher number of expected operations allows for more
aggressive and time-consuming optimizations.


Input Parameters

A Handle containing internal data.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED Execution failed.

SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.

Inspector-Executor Sparse BLAS Execution Routines

Execution Routines and Their Data Types

Routine or Function Group   Data Types   Description

mkl_sparse_?_lu_smoother   s, d, c, z   Computes an action of a preconditioner which corresponds to the approximate matrix decomposition A ≈ (L+D)*E*(U+D) for the system Ax = b.

mkl_sparse_?_mv   s, d, c, z   Computes a sparse matrix-vector product.

mkl_sparse_?_trsv   s, d, c, z   Solves a system of linear equations for a square sparse matrix.

mkl_sparse_?_mm   s, d, c, z   Computes the product of a sparse matrix and a dense matrix and stores the result as a dense matrix.

mkl_sparse_?_trsm   s, d, c, z   Solves a system of linear equations with multiple right-hand sides for a square sparse matrix.

mkl_sparse_?_add   s, d, c, z   Computes the sum of two sparse matrices. The result is stored in a newly allocated sparse matrix.

mkl_sparse_spmm   s, d, c, z   Computes the product of two sparse matrices and stores the result in a newly allocated sparse matrix.

mkl_sparse_?_spmmd   s, d, c, z   Computes the product of two sparse matrices and stores the result as a dense matrix.

mkl_sparse_sp2m   s, d, c, z   Computes the product of two sparse matrices (supports operations on both matrices) and stores the result in a newly allocated sparse matrix.

mkl_sparse_?_sp2md   s, d, c, z   Computes the product of two sparse matrices (supports operations on both matrices) and stores the result as a dense matrix.

mkl_sparse_sypr   s, d, c, z   Computes the symmetric product of three sparse matrices and stores the result in a newly allocated sparse matrix.

mkl_sparse_?_syprd   s, d, c, z   Computes the symmetric triple product of a sparse matrix and a dense matrix and stores the result as a dense matrix.

mkl_sparse_?_symgs   s, d, c, z   Computes an action of a symmetric Gauss-Seidel preconditioner.

mkl_sparse_?_symgs_mv   s, d, c, z   Computes an action of a symmetric Gauss-Seidel preconditioner followed by a matrix-vector multiplication.

mkl_sparse_?_syrkd   s, d, c, z   Computes the product of a sparse matrix with its transpose (or conjugate transpose) and stores the result as a dense matrix.

mkl_sparse_syrk   s, d, c, z   Computes the product of a sparse matrix with its transpose (or conjugate transpose) and stores the result in a newly allocated sparse matrix.

mkl_sparse_?_dotmv   s, d, c, z   Computes a sparse matrix-vector product followed by a dot product.

mkl_sparse_?_lu_smoother
Computes an action of a preconditioner which
corresponds to the approximate matrix decomposition
A ≈ (L + D)*E*(U + D) for the system Ax = b (see
description below).

Syntax
sparse_status_t mkl_sparse_s_lu_smoother (const sparse_operation_t op, const
sparse_matrix_t A, const struct matrix_descr descr, const float *diag, const float
*approx_diag_inverse, float *x, const float *b);
sparse_status_t mkl_sparse_d_lu_smoother (const sparse_operation_t op, const
sparse_matrix_t A, const struct matrix_descr descr, const double *diag, const double
*approx_diag_inverse, double *x, const double *b);
sparse_status_t mkl_sparse_c_lu_smoother (const sparse_operation_t op, const
sparse_matrix_t A, const struct matrix_descr descr, const MKL_Complex8 *diag, const
MKL_Complex8 *approx_diag_inverse, MKL_Complex8 *x, const MKL_Complex8 *b);
sparse_status_t mkl_sparse_z_lu_smoother (const sparse_operation_t op, const
sparse_matrix_t A, const struct matrix_descr descr, const MKL_Complex16 *diag, const
MKL_Complex16 *approx_diag_inverse, MKL_Complex16 *x, const MKL_Complex16 *b);

Include Files
• mkl_spblas.h

Description
This routine computes an update for an iterative solution x of the system Ax = b by applying one
iteration of an approximate preconditioner based on the following approximation:

A ≈ (L + D)*E*(U + D), where E is an approximate inverse of the diagonal (using the exact inverse results in the
Gauss-Seidel preconditioner), L and U are the strictly lower and upper triangular parts of A, and D is the diagonal
(block diagonal in the case of BSR format) of A.
The mkl_sparse_?_lu_smoother routine performs these operations:

r = b - A*x                  /* 1. Computes the residual */

(L + D)*E*(U + D)*dx = r     /* 2. Finds the update dx by solving the system */
y = x + dx                   /* 3. Performs an update */
This is equivalent to the symmetric Gauss-Seidel operation in the case of CSR format (1x1 diagonal
blocks) when E is the exact inverse of D:

(L + D)*x^1 = b - U*x        /* Lower solve for intermediate x^1 */

(U + D)*x = b - L*x^1        /* Upper solve */

NOTE
This routine is supported only for non-transpose operation, real data types, and CSR/BSR
sparse formats. In a BSR format, both diagonal values and approximate diagonal inverse
arrays should be passed explicitly. For CSR format, diagonal values should be passed
explicitly.
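To make the three steps and the Gauss-Seidel equivalence concrete, here is a plain-C reference sketch for the CSR case with 1x1 diagonal blocks. It makes no MKL calls and is not the library's implementation; the helper name lu_smoother_reference and its three-array CSR input (row_ptr of length n+1) are illustrative assumptions. Because E is diagonal, E*(U + D)*dx = w is solved as (U + D)*dx = w ./ approx_diag_inverse.

```c
#include <stdlib.h>

/* Hypothetical helper: one update x = x + dx with (L + D)*E*(U + D)*dx = b - A*x,
 * where D = diag, E = diag(approx_diag_inverse), and L/U are the strictly
 * lower/upper entries of the CSR matrix. Returns 0 on success and -1 if the
 * work array cannot be allocated. */
int lu_smoother_reference(int n, const int *row_ptr, const int *col_idx,
                          const double *val, const double *diag,
                          const double *approx_diag_inverse,
                          double *x, const double *b)
{
    double *w = malloc((size_t)n * sizeof *w);
    if (w == NULL)
        return -1;

    /* Step 1: residual r = b - A*x, stored in w. */
    for (int i = 0; i < n; ++i) {
        double ax = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            ax += val[k] * x[col_idx[k]];
        w[i] = b[i] - ax;
    }

    /* Step 2a: forward solve (L + D)*w = r, in place. */
    for (int i = 0; i < n; ++i) {
        double s = w[i];
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            if (col_idx[k] < i)
                s -= val[k] * w[col_idx[k]];
        w[i] = s / diag[i];
    }

    /* Step 2b: backward solve (U + D)*dx = w ./ e, overwriting w with dx. */
    for (int i = n - 1; i >= 0; --i) {
        double s = w[i] / approx_diag_inverse[i];
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            if (col_idx[k] > i)
                s -= val[k] * w[col_idx[k]];
        w[i] = s / diag[i];
    }

    /* Step 3: x = x + dx. */
    for (int i = 0; i < n; ++i)
        x[i] += w[i];

    free(w);
    return 0;
}
```

With approx_diag_inverse set to 1/diag, one call from x = 0 gives the same iterate as one symmetric Gauss-Seidel sweep, and a call starting from the exact solution leaves x unchanged because the residual is zero.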

Input Parameters

operation Specifies the operation performed on matrix A.

SPARSE_OPERATION_NON_TRANSPOSE Non-transpose, op(A) = A.

NOTE
Transpose and conjugate transpose (SPARSE_OPERATION_TRANSPOSE and
SPARSE_OPERATION_CONJUGATE_TRANSPOSE) are not supported.

A Handle which contains the sparse matrix A.

descr Structure specifying sparse matrix properties.

sparse_matrix_type_t type - Specifies the type of a sparse matrix:

SPARSE_MATRIX_TYPE_GENERAL The matrix is processed as is.

SPARSE_MATRIX_TYPE_SYMMETRIC The matrix is symmetric (only the requested triangle is processed).

SPARSE_MATRIX_TYPE_HERMITIAN The matrix is Hermitian (only the requested triangle is processed).

SPARSE_MATRIX_TYPE_TRIANGULAR The matrix is triangular (only the requested triangle is processed).

SPARSE_MATRIX_TYPE_DIAGONAL The matrix is diagonal (only diagonal elements are processed).

SPARSE_MATRIX_TYPE_BLOCK_TRIANGULAR The matrix is block-triangular (only the requested triangle is processed). Applies to BSR format only.

SPARSE_MATRIX_TYPE_BLOCK_DIAGONAL The matrix is block-diagonal (only diagonal blocks are processed). Applies to BSR format only.

sparse_fill_mode_t mode - Specifies the triangular matrix part for symmetric, Hermitian, triangular, and block-triangular matrices:

SPARSE_FILL_MODE_LOWER The lower triangular matrix part is processed.

SPARSE_FILL_MODE_UPPER The upper triangular matrix part is processed.

sparse_diag_type_t diag - Specifies the diagonal type for non-general matrices:

SPARSE_DIAG_NON_UNIT Diagonal elements might not be equal to one.

SPARSE_DIAG_UNIT Diagonal elements are equal to one.

NOTE
Only SPARSE_MATRIX_TYPE_GENERAL is supported.

diag Array of size at least m, where m is the number of rows (or nrows *
block_size * block_size in case of BSR format) of matrix A.
The array diag must contain the diagonal values of matrix A.

approx_diag_inverse Array of size at least m, where m is the number of rows (or the number of
rows * block_size * block_size in case of BSR format) of matrix A.
The array approx_diag_inverse is used as E, an approximate inverse of the diagonal of matrix A.

x Array of size at least k, where k is the number of columns (or columns * block_size in case of BSR format) of matrix A.
On entry, the array x must contain the input vector.

b Array of size at least m, where m is the number of rows (or rows * block_size in case of BSR format) of matrix A. The array b must contain the values of the right-hand side of the system.

Output Parameters

x Overwritten by the computed vector y.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED Execution failed.

SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.

mkl_sparse_?_mv
Computes a sparse matrix-vector product.

Syntax
sparse_status_t mkl_sparse_s_mv (const sparse_operation_t operation, const float alpha,
const sparse_matrix_t A, const struct matrix_descr descr, const float *x, const float
beta, float *y);
sparse_status_t mkl_sparse_d_mv (const sparse_operation_t operation, const double
alpha, const sparse_matrix_t A, const struct matrix_descr descr, const double *x, const
double beta, double *y);
sparse_status_t mkl_sparse_c_mv (const sparse_operation_t operation, const MKL_Complex8
alpha, const sparse_matrix_t A, const struct matrix_descr descr, const MKL_Complex8 *x,
const MKL_Complex8 beta, MKL_Complex8 *y);
sparse_status_t mkl_sparse_z_mv (const sparse_operation_t operation, const
MKL_Complex16 alpha, const sparse_matrix_t A, const struct matrix_descr descr, const
MKL_Complex16 *x, const MKL_Complex16 beta, MKL_Complex16 *y);

Include Files
• mkl_spblas.h

Description
The mkl_sparse_?_mv routine computes a sparse matrix-dense vector product defined as

y := alpha*op(A)*x + beta*y
where:
alpha and beta are scalars, x and y are vectors, and A is a sparse matrix handle of a matrix with m rows and
k columns, and op is a matrix modifier for matrix A.
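To make the formula concrete, here is a minimal plain-C sketch of what y := alpha*op(A)*x + beta*y computes for a general matrix stored in 0-based CSR format. It is not oneMKL code: the csr_ref struct and the ref_csr_mv function are hypothetical stand-ins for the opaque sparse_matrix_t handle and mkl_sparse_d_mv. Only real (double) data is handled, so conjugate transpose would coincide with transpose, and treating beta == 0 as a pure overwrite of y is an assumption borrowed from dense BLAS.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical plain-C stand-in for a 0-based CSR matrix (NOT sparse_matrix_t). */
typedef struct {
    int m, k;             /* number of rows and columns of A */
    const int *row_ptr;   /* length m + 1 */
    const int *col_idx;   /* length row_ptr[m] */
    const double *vals;   /* length row_ptr[m] */
} csr_ref;

/* Reference semantics of y := alpha*op(A)*x + beta*y for a general matrix.
 * transpose == 0: op(A) = A,   x has length k, y has length m.
 * transpose != 0: op(A) = A^T, x has length m, y has length k. */
static void ref_csr_mv(int transpose, double alpha, const csr_ref *A,
                       const double *x, double beta, double *y)
{
    assert(A != NULL && x != NULL && y != NULL);
    int ylen = transpose ? A->k : A->m;
    /* y := beta*y first; beta == 0 overwrites y (assumed, as in dense BLAS) */
    for (int i = 0; i < ylen; ++i)
        y[i] = (beta == 0.0) ? 0.0 : beta * y[i];
    for (int i = 0; i < A->m; ++i) {
        for (int p = A->row_ptr[i]; p < A->row_ptr[i + 1]; ++p) {
            int j = A->col_idx[p];                  /* stored entry A(i, j) */
            if (transpose)
                y[j] += alpha * A->vals[p] * x[i];  /* contributes to (A^T x)_j */
            else
                y[i] += alpha * A->vals[p] * x[j];  /* contributes to (A x)_i */
        }
    }
}
```

For example, with A = [[1, 0, 2], [0, 3, 0]] stored as row_ptr = {0, 2, 3}, col_idx = {0, 2, 1}, vals = {1, 2, 3}, x = {1, 1, 1}, y = {10, 10}, alpha = 1, and beta = 0.5, the non-transposed call leaves y = {8, 8}.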

Input Parameters

operation Specifies operation op() on input matrix.

SPARSE_OPERATION_NON_TRANSPOSE Non-transpose, op(A) = A.

SPARSE_OPERATION_TRANSPOSE Transpose, op(A) = A^T.

SPARSE_OPERATION_CONJUGATE_TRANSPOSE Conjugate transpose, op(A) = A^H.

alpha Specifies the scalar alpha.

A Handle which contains the input matrix A.

descr Structure specifying sparse matrix properties.

sparse_matrix_type_t type - Specifies the type of a sparse matrix:

SPARSE_MATRIX_TYPE_GENERAL The matrix is processed as is.

SPARSE_MATRIX_TYPE_SYMMETRIC The matrix is symmetric (only the requested triangle is processed).

SPARSE_MATRIX_TYPE_HERMITIAN The matrix is Hermitian (only the requested triangle is processed).

SPARSE_MATRIX_TYPE_TRIANGULAR The matrix is triangular (only the requested triangle is processed).

SPARSE_MATRIX_TYPE_DIAGONAL The matrix is diagonal (only diagonal elements are processed).

SPARSE_MATRIX_TYPE_BLOCK_TRIANGULAR The matrix is block-triangular (only the requested triangle is processed). Applies to BSR format only.

SPARSE_MATRIX_TYPE_BLOCK_DIAGONAL The matrix is block-diagonal (only diagonal blocks are processed). Applies to BSR format only.

sparse_fill_mode_t mode - Specifies the triangular matrix part for symmetric, Hermitian, triangular, and block-triangular matrices:

SPARSE_FILL_MODE_LOWER The lower triangular matrix part is processed.

SPARSE_FILL_MODE_UPPER The upper triangular matrix part is processed.

sparse_diag_type_t diag - Specifies the diagonal type for non-general matrices:

SPARSE_DIAG_NON_UNIT Diagonal elements might not be equal to one.

SPARSE_DIAG_UNIT Diagonal elements are equal to one.

x Array of size at least k, the number of columns of A, if operation = SPARSE_OPERATION_NON_TRANSPOSE, and at least m, the number of rows of A, otherwise. On entry, the array must contain the vector x.

beta Specifies the scalar beta.

y Array of size at least m, the number of rows of A, if operation = SPARSE_OPERATION_NON_TRANSPOSE, and at least k, the number of columns of A, otherwise. On entry, the array y must contain the vector y.

Output Parameters

y Overwritten by the updated vector y.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED Execution failed.

SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.

mkl_sparse_?_trsv
Solves a system of linear equations for a triangular
sparse matrix.

Syntax
sparse_status_t mkl_sparse_s_trsv (const sparse_operation_t operation, const float
alpha, const sparse_matrix_t A, const struct matrix_descr descr, const float *x, float
*y);
sparse_status_t mkl_sparse_d_trsv (const sparse_operation_t operation, const double
alpha, const sparse_matrix_t A, const struct matrix_descr descr, const double *x,
double *y);
sparse_status_t mkl_sparse_c_trsv (const sparse_operation_t operation, const
MKL_Complex8 alpha, const sparse_matrix_t A, const struct matrix_descr descr, const
MKL_Complex8 *x, MKL_Complex8 *y);
sparse_status_t mkl_sparse_z_trsv (const sparse_operation_t operation, const
MKL_Complex16 alpha, const sparse_matrix_t A, const struct matrix_descr descr, const
MKL_Complex16 *x, MKL_Complex16 *y);

Include Files
• mkl_spblas.h

Description
The mkl_sparse_?_trsv routine solves a system of linear equations for a matrix:

op(A)*y = alpha * x
where A is a triangular sparse matrix, op is a matrix modifier for matrix A, alpha is a scalar, and x and y are
vectors.
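The following plain-C sketch shows what solving op(A)*y = alpha*x amounts to for op(A) = A with a lower triangular matrix (the SPARSE_MATRIX_TYPE_TRIANGULAR, SPARSE_FILL_MODE_LOWER case): forward substitution, one row at a time. The struct and function names are hypothetical, 0-based CSR storage and real data are assumed, and stored entries above the diagonal are skipped, mirroring the rule that only the requested triangle is processed.

```c
#include <assert.h>

/* Hypothetical square 0-based CSR matrix (NOT oneMKL's sparse_matrix_t). */
typedef struct {
    int n;                /* order of A */
    const int *row_ptr;   /* length n + 1 */
    const int *col_idx;
    const double *vals;
} csr_sq_ref;

/* Solves A*y = alpha*x by forward substitution, reading only the lower
 * triangle of A. unit_diag != 0 assumes ones on the diagonal without reading
 * stored diagonal entries (cf. SPARSE_DIAG_UNIT). */
static void ref_csr_trsv_lower(double alpha, const csr_sq_ref *A, int unit_diag,
                               const double *x, double *y)
{
    for (int i = 0; i < A->n; ++i) {
        double sum = alpha * x[i];
        double d = unit_diag ? 1.0 : 0.0;   /* a missing diagonal entry counts as 0 */
        for (int p = A->row_ptr[i]; p < A->row_ptr[i + 1]; ++p) {
            int j = A->col_idx[p];
            if (j < i)
                sum -= A->vals[p] * y[j];   /* y[j] was solved in an earlier row */
            else if (j == i && !unit_diag)
                d = A->vals[p];
            /* j > i: strictly upper entry, ignored */
        }
        assert(d != 0.0);                   /* zero pivot: A is singular */
        y[i] = sum / d;
    }
}
```

Each row needs the already computed y[j] for j < i, which is why a triangular solve is inherently sequential in this naive row-by-row form.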


NOTE
For sparse matrices in the BSR format, the supported combinations of
(indexing,block_layout) are:
• (SPARSE_INDEX_BASE_ZERO, SPARSE_LAYOUT_ROW_MAJOR)
• (SPARSE_INDEX_BASE_ONE, SPARSE_LAYOUT_COLUMN_MAJOR)

Input Parameters

operation Specifies operation op() on input matrix.

SPARSE_OPERATION_NON_TRANSPOSE Non-transpose, op(A) = A.

SPARSE_OPERATION_TRANSPOSE Transpose, op(A) = A^T.

SPARSE_OPERATION_CONJUGATE_TRANSPOSE Conjugate transpose, op(A) = A^H.

alpha Specifies the scalar alpha.

A Handle which contains the input matrix A.

descr Structure specifying sparse matrix properties.

sparse_matrix_type_t type - Specifies the type of a sparse matrix:

SPARSE_MATRIX_TYPE_GENERAL The matrix is processed as is.

SPARSE_MATRIX_TYPE_SYMMETRIC The matrix is symmetric (only the requested triangle is processed).

SPARSE_MATRIX_TYPE_HERMITIAN The matrix is Hermitian (only the requested triangle is processed).

SPARSE_MATRIX_TYPE_TRIANGULAR The matrix is triangular (only the requested triangle is processed).

SPARSE_MATRIX_TYPE_DIAGONAL The matrix is diagonal (only diagonal elements are processed).

SPARSE_MATRIX_TYPE_BLOCK_TRIANGULAR The matrix is block-triangular (only the requested triangle is processed). Applies to BSR format only.

SPARSE_MATRIX_TYPE_BLOCK_DIAGONAL The matrix is block-diagonal (only diagonal blocks are processed). Applies to BSR format only.

sparse_fill_mode_t mode - Specifies the triangular matrix part for symmetric, Hermitian, triangular, and block-triangular matrices:

SPARSE_FILL_MODE_LOWER The lower triangular matrix part is processed.

SPARSE_FILL_MODE_UPPER The upper triangular matrix part is processed.

sparse_diag_type_t diag - Specifies the diagonal type for non-general matrices:

SPARSE_DIAG_NON_UNIT Diagonal elements might not be equal to one.

SPARSE_DIAG_UNIT Diagonal elements are equal to one.

x Array of size at least m, where m is the number of rows of matrix A. On entry, the array must contain the vector x.

Output Parameters

y Array of size at least m containing the solution to the system of linear equations.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED Execution failed.

SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.

mkl_sparse_?_mm
Computes the product of a sparse matrix and a dense
matrix and stores the result as a dense matrix.

Syntax
sparse_status_t mkl_sparse_s_mm (const sparse_operation_t operation, const float alpha,
const sparse_matrix_t A, const struct matrix_descr descr, const sparse_layout_t layout,
const float *B, const MKL_INT columns, const MKL_INT ldb, const float beta, float *C,
const MKL_INT ldc);
sparse_status_t mkl_sparse_d_mm (const sparse_operation_t operation, const double
alpha, const sparse_matrix_t A, const struct matrix_descr descr, const sparse_layout_t
layout, const double *B, const MKL_INT columns, const MKL_INT ldb, const double beta,
double *C, const MKL_INT ldc);
sparse_status_t mkl_sparse_c_mm (const sparse_operation_t operation, const MKL_Complex8
alpha, const sparse_matrix_t A, const struct matrix_descr descr, const sparse_layout_t
layout, const MKL_Complex8 *B, const MKL_INT columns, const MKL_INT ldb, const
MKL_Complex8 beta, MKL_Complex8 *C, const MKL_INT ldc);
sparse_status_t mkl_sparse_z_mm (const sparse_operation_t operation, const
MKL_Complex16 alpha, const sparse_matrix_t A, const struct matrix_descr descr, const
sparse_layout_t layout, const MKL_Complex16 *B, const MKL_INT columns, const MKL_INT
ldb, const MKL_Complex16 beta, MKL_Complex16 *C, const MKL_INT ldc);


Include Files
• mkl_spblas.h

Description
The mkl_sparse_?_mm routine performs a matrix-matrix operation:

C := alpha*op(A)*B + beta*C
where alpha and beta are scalars, A is a sparse matrix, op is a matrix modifier for matrix A, and B and C are
dense matrices.
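As a rough illustration of C := alpha*op(A)*B + beta*C, the plain-C sketch below handles op(A) = A for a general 0-based CSR matrix with row-major dense B and C, which is one of the supported configurations listed below. The names are hypothetical; columns, ldb, and ldc play the same roles as the routine's parameters, and overwriting C when beta == 0 is an assumption borrowed from dense BLAS.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 0-based CSR stand-in (NOT sparse_matrix_t). */
typedef struct {
    int m, k;             /* A is m x k */
    const int *row_ptr;   /* length m + 1 */
    const int *col_idx;
    const double *vals;
} csr_ref;

/* C := alpha*A*B + beta*C with row-major B (k x columns, leading dimension ldb)
 * and row-major C (m x columns, leading dimension ldc). */
static void ref_csr_mm_rowmajor(double alpha, const csr_ref *A,
                                const double *B, int columns, int ldb,
                                double beta, double *C, int ldc)
{
    assert(ldb >= columns && ldc >= columns);
    for (int i = 0; i < A->m; ++i) {
        double *c_row = C + (size_t)i * (size_t)ldc;
        /* C(i,:) := beta*C(i,:); beta == 0 overwrites (assumed, as in dense BLAS) */
        for (int c = 0; c < columns; ++c)
            c_row[c] = (beta == 0.0) ? 0.0 : beta * c_row[c];
        for (int p = A->row_ptr[i]; p < A->row_ptr[i + 1]; ++p) {
            /* A(i, j) scales row j of B and adds it into row i of C */
            const double *b_row = B + (size_t)A->col_idx[p] * (size_t)ldb;
            double a = alpha * A->vals[p];
            for (int c = 0; c < columns; ++c)
                c_row[c] += a * b_row[c];
        }
    }
}
```

Sweeping each sparse row against whole rows of B keeps the inner loop contiguous in row-major storage; this is just one reasonable loop order, not a statement about how oneMKL schedules the work.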
The mkl_sparse_?_mm and mkl_sparse_?_trsm routines support these configurations:

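• 0-based sparse matrix (SPARSE_INDEX_BASE_ZERO) with a column-major dense matrix (layout = SPARSE_LAYOUT_COLUMN_MAJOR): CSR, and BSR for general non-transposed matrix multiplication only.
• 0-based sparse matrix (SPARSE_INDEX_BASE_ZERO) with a row-major dense matrix (layout = SPARSE_LAYOUT_ROW_MAJOR): all formats.
• 1-based sparse matrix (SPARSE_INDEX_BASE_ONE) with a column-major dense matrix (layout = SPARSE_LAYOUT_COLUMN_MAJOR): all formats.
• 1-based sparse matrix (SPARSE_INDEX_BASE_ONE) with a row-major dense matrix (layout = SPARSE_LAYOUT_ROW_MAJOR): CSR, and BSR for general non-transposed matrix multiplication only.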

NOTE
For sparse matrices in the BSR format, the supported combinations of
(indexing,block_layout) are:
• (SPARSE_INDEX_BASE_ZERO, SPARSE_LAYOUT_ROW_MAJOR)
• (SPARSE_INDEX_BASE_ONE, SPARSE_LAYOUT_COLUMN_MAJOR)

Input Parameters

operation Specifies operation op() on input matrix.

SPARSE_OPERATION_NON_TRANSPOSE Non-transpose, op(A) = A.

SPARSE_OPERATION_TRANSPOSE Transpose, op(A) = A^T.

SPARSE_OPERATION_CONJUGATE_TRANSPOSE Conjugate transpose, op(A) = A^H.

alpha Specifies the scalar alpha.

A Handle which contains the sparse matrix A.

descr Structure specifying sparse matrix properties.

sparse_matrix_type_t type - Specifies the type of a sparse matrix:

SPARSE_MATRIX_TYPE_GENERAL The matrix is processed as is.

SPARSE_MATRIX_TYPE_SYMMETRIC The matrix is symmetric (only the requested triangle is processed).

SPARSE_MATRIX_TYPE_HERMITIAN The matrix is Hermitian (only the requested triangle is processed).

SPARSE_MATRIX_TYPE_TRIANGULAR The matrix is triangular (only the requested triangle is processed).

SPARSE_MATRIX_TYPE_DIAGONAL The matrix is diagonal (only diagonal elements are processed).

SPARSE_MATRIX_TYPE_BLOCK_TRIANGULAR The matrix is block-triangular (only the requested triangle is processed). Applies to BSR format only.

SPARSE_MATRIX_TYPE_BLOCK_DIAGONAL The matrix is block-diagonal (only diagonal blocks are processed). Applies to BSR format only.

sparse_fill_mode_t mode - Specifies the triangular matrix part for symmetric, Hermitian, triangular, and block-triangular matrices:

SPARSE_FILL_MODE_LOWER The lower triangular matrix part is processed.

SPARSE_FILL_MODE_UPPER The upper triangular matrix part is processed.

sparse_diag_type_t diag - Specifies the diagonal type for non-general matrices:

SPARSE_DIAG_NON_UNIT Diagonal elements might not be equal to one.

SPARSE_DIAG_UNIT Diagonal elements are equal to one.

layout Describes the storage scheme for the dense matrix:

SPARSE_LAYOUT_COLUMN_MAJOR Storage of elements uses column-major layout.

SPARSE_LAYOUT_ROW_MAJOR Storage of elements uses row-major layout.

B Array of size at least rows*cols, where:

• layout = SPARSE_LAYOUT_COLUMN_MAJOR: rows (number of rows in B) = ldb; cols (number of columns in B) = columns.
• layout = SPARSE_LAYOUT_ROW_MAJOR: rows (number of rows in B) = number of columns in A if op(A) = A, or number of rows in A if op(A) = A^T; cols (number of columns in B) = ldb.

columns Number of columns of matrix C.


ldb Specifies the leading dimension of matrix B.

beta Specifies the scalar beta.

C Array of size at least rows*cols, where:

• layout = SPARSE_LAYOUT_COLUMN_MAJOR: rows (number of rows in C) = ldc; cols (number of columns in C) = columns.
• layout = SPARSE_LAYOUT_ROW_MAJOR: rows (number of rows in C) = number of rows in A if op(A) = A, or number of columns in A if op(A) = A^T; cols (number of columns in C) = ldc.

ldc Specifies the leading dimension of matrix C.
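The rows*cols size rules for B and C are easy to misread, so here is a small hypothetical helper that evaluates them literally for each layout. It takes the dimensions of A (a_rows x a_cols), a flag for op(A) = A^T, and the columns, ldb, and ldc parameters. Requiring ldb and ldc to be at least the relevant dense dimension is an assumption carried over from dense BLAS; the tables do not state it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers: minimal element counts (rows*cols) for the dense B and C
 * arguments of mkl_sparse_?_mm, following the B and C size rules literally.
 * a_rows x a_cols: dimensions of sparse A; transposed != 0 means op(A) = A^T;
 * row_major != 0 means layout = SPARSE_LAYOUT_ROW_MAJOR. */
static size_t mm_min_len_B(int row_major, int transposed,
                           int a_rows, int a_cols, int columns, int ldb)
{
    int b_rows = transposed ? a_rows : a_cols;   /* rows of B = columns of op(A) */
    if (row_major) {
        assert(ldb >= columns);                  /* assumption, as in dense BLAS */
        return (size_t)b_rows * (size_t)ldb;     /* rows = b_rows, cols = ldb */
    }
    assert(ldb >= b_rows);                       /* assumption, as in dense BLAS */
    return (size_t)ldb * (size_t)columns;        /* rows = ldb, cols = columns */
}

static size_t mm_min_len_C(int row_major, int transposed,
                           int a_rows, int a_cols, int columns, int ldc)
{
    int c_rows = transposed ? a_cols : a_rows;   /* rows of C = rows of op(A) */
    if (row_major) {
        assert(ldc >= columns);                  /* assumption, as in dense BLAS */
        return (size_t)c_rows * (size_t)ldc;     /* rows = c_rows, cols = ldc */
    }
    assert(ldc >= c_rows);                       /* assumption, as in dense BLAS */
    return (size_t)ldc * (size_t)columns;        /* rows = ldc, cols = columns */
}
```

For instance, for a 2 x 3 matrix A, columns = 4, row-major layout, and op(A) = A^T, B must be 2 x 4, so with ldb = 4 the helper asks for 2*4 = 8 elements.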

Output Parameters

C Overwritten by the updated matrix C.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED Execution failed.

SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.

mkl_sparse_?_trsm
Solves a system of linear equations with multiple right-hand
sides for a triangular sparse matrix.

Syntax
sparse_status_t mkl_sparse_s_trsm (const sparse_operation_t operation, const float
alpha, const sparse_matrix_t A, const struct matrix_descr descr, const sparse_layout_t
layout, const float *x, const MKL_INT columns, const MKL_INT ldx, float *y, const
MKL_INT ldy);
sparse_status_t mkl_sparse_d_trsm (const sparse_operation_t operation, const double
alpha, const sparse_matrix_t A, const struct matrix_descr descr, const sparse_layout_t
layout, const double *x, const MKL_INT columns, const MKL_INT ldx, double *y, const
MKL_INT ldy);

sparse_status_t mkl_sparse_c_trsm (const sparse_operation_t operation, const
MKL_Complex8 alpha, const sparse_matrix_t A, const struct matrix_descr descr, const
sparse_layout_t layout, const MKL_Complex8 *x, const MKL_INT columns, const MKL_INT
ldx, MKL_Complex8 *y, const MKL_INT ldy);
sparse_status_t mkl_sparse_z_trsm (const sparse_operation_t operation, const
MKL_Complex16 alpha, const sparse_matrix_t A, const struct matrix_descr descr, const
sparse_layout_t layout, const MKL_Complex16 *x, const MKL_INT columns, const MKL_INT
ldx, MKL_Complex16 *y, const MKL_INT ldy);

Include Files
• mkl_spblas.h

Description
The mkl_sparse_?_trsm routine solves a system of linear equations with multiple right-hand sides for a
triangular sparse matrix:

Y := alpha*inv(op(A))*X
where:
alpha is a scalar, X and Y are dense matrices, A is a sparse matrix, and op is a matrix modifier for matrix A.
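Conceptually, Y := alpha*inv(op(A))*X is one triangular solve per column of X. The hypothetical plain-C sketch below handles all columns in the same forward-substitution sweep for a lower triangular, non-unit, 0-based CSR matrix with op(A) = A and row-major X and Y (leading dimensions ldx and ldy). It never forms inv(A) explicitly, and its names are illustrative rather than part of oneMKL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical square 0-based CSR matrix (NOT oneMKL's sparse_matrix_t). */
typedef struct {
    int n;                /* order of A */
    const int *row_ptr;   /* length n + 1 */
    const int *col_idx;
    const double *vals;
} csr_sq_ref;

/* Y := alpha*inv(A)*X for lower triangular A with a non-unit diagonal.
 * X and Y are row-major n x columns arrays with leading dimensions ldx, ldy. */
static void ref_csr_trsm_lower_rowmajor(double alpha, const csr_sq_ref *A,
                                        const double *X, int columns, int ldx,
                                        double *Y, int ldy)
{
    assert(ldx >= columns && ldy >= columns);
    for (int i = 0; i < A->n; ++i) {
        const double *x_i = X + (size_t)i * (size_t)ldx;
        double *y_i = Y + (size_t)i * (size_t)ldy;
        double d = 0.0;                       /* a missing diagonal counts as 0 */
        for (int c = 0; c < columns; ++c)     /* start from alpha*X(i,:) */
            y_i[c] = alpha * x_i[c];
        for (int p = A->row_ptr[i]; p < A->row_ptr[i + 1]; ++p) {
            int j = A->col_idx[p];
            if (j < i) {                      /* subtract A(i,j)*Y(j,:), already solved */
                const double *y_j = Y + (size_t)j * (size_t)ldy;
                for (int c = 0; c < columns; ++c)
                    y_i[c] -= A->vals[p] * y_j[c];
            } else if (j == i) {
                d = A->vals[p];
            }                                 /* j > i: upper triangle, ignored */
        }
        assert(d != 0.0);                     /* zero pivot: A is singular */
        for (int c = 0; c < columns; ++c)
            y_i[c] /= d;
    }
}
```

Processing all right-hand sides per sparse row reuses each A(i, j) across every column, which is the usual motivation for a multiple-right-hand-side solve over repeated single-vector calls.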

The mkl_sparse_?_mm and mkl_sparse_?_trsm routines support these configurations:

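• 0-based sparse matrix (SPARSE_INDEX_BASE_ZERO) with a column-major dense matrix (layout = SPARSE_LAYOUT_COLUMN_MAJOR): CSR, and BSR for general non-transposed matrix multiplication only.
• 0-based sparse matrix (SPARSE_INDEX_BASE_ZERO) with a row-major dense matrix (layout = SPARSE_LAYOUT_ROW_MAJOR): all formats.
• 1-based sparse matrix (SPARSE_INDEX_BASE_ONE) with a column-major dense matrix (layout = SPARSE_LAYOUT_COLUMN_MAJOR): all formats.
• 1-based sparse matrix (SPARSE_INDEX_BASE_ONE) with a row-major dense matrix (layout = SPARSE_LAYOUT_ROW_MAJOR): CSR, and BSR for general non-transposed matrix multiplication only.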

NOTE
For sparse matrices in the BSR format, the supported combinations of
(indexing,block_layout) are:
• (SPARSE_INDEX_BASE_ZERO, SPARSE_LAYOUT_ROW_MAJOR)
• (SPARSE_INDEX_BASE_ONE, SPARSE_LAYOUT_COLUMN_MAJOR)

Input Parameters

operation Specifies operation op() on input matrix.

SPARSE_OPERATION_NON_TRANSPOSE Non-transpose, op(A) = A.

SPARSE_OPERATION_TRANSPOSE Transpose, op(A) = A^T.

SPARSE_OPERATION_CONJUGATE_TRANSPOSE Conjugate transpose, op(A) = A^H.

alpha Specifies the scalar alpha.

A Handle which contains the sparse matrix A.

descr Structure specifying sparse matrix properties.

sparse_matrix_type_t type - Specifies the type of a sparse matrix:

SPARSE_MATRIX_TYPE_GENERAL The matrix is processed as is.

SPARSE_MATRIX_TYPE_SYMMETRIC The matrix is symmetric (only the requested triangle is processed).

SPARSE_MATRIX_TYPE_HERMITIAN The matrix is Hermitian (only the requested triangle is processed).

SPARSE_MATRIX_TYPE_TRIANGULAR The matrix is triangular (only the requested triangle is processed).

SPARSE_MATRIX_TYPE_DIAGONAL The matrix is diagonal (only diagonal elements are processed).

SPARSE_MATRIX_TYPE_BLOCK_TRIANGULAR The matrix is block-triangular (only the requested triangle is processed). Applies to BSR format only.

SPARSE_MATRIX_TYPE_BLOCK_DIAGONAL The matrix is block-diagonal (only diagonal blocks are processed). Applies to BSR format only.

sparse_fill_mode_t mode - Specifies the triangular matrix part for symmetric, Hermitian, triangular, and block-triangular matrices:

SPARSE_FILL_MODE_LOWER The lower triangular matrix part is processed.

SPARSE_FILL_MODE_UPPER The upper triangular matrix part is processed.

sparse_diag_type_t diag - Specifies the diagonal type for non-general matrices:

SPARSE_DIAG_NON_UNIT Diagonal elements might not be equal to one.

SPARSE_DIAG_UNIT Diagonal elements are equal to one.

layout Describes the storage scheme for the dense matrix:

SPARSE_LAYOUT_COLUMN_MAJOR Storage of elements uses column-major layout.

SPARSE_LAYOUT_ROW_MAJOR Storage of elements uses row-major layout.

x Array of size at least rows*cols, where:

• layout = SPARSE_LAYOUT_COLUMN_MAJOR: rows (number of rows in x) = ldx; cols (number of columns in x) = columns.
• layout = SPARSE_LAYOUT_ROW_MAJOR: rows (number of rows in x) = number of rows in A; cols (number of columns in x) = ldx.

On entry, the array x must contain the matrix X.

columns Number of columns in matrix Y.

ldx Specifies the leading dimension of matrix X.

y Array of size at least rows*cols, where:

• layout = SPARSE_LAYOUT_COLUMN_MAJOR: rows (number of rows in y) = ldy; cols (number of columns in y) = columns.
• layout = SPARSE_LAYOUT_ROW_MAJOR: rows (number of rows in y) = number of rows in A; cols (number of columns in y) = ldy.

Output Parameters

y Overwritten by the updated matrix Y.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED Execution failed.

SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.

mkl_sparse_?_add
Computes the sum of two sparse matrices. The result
is stored in a newly allocated sparse matrix.

Syntax
sparse_status_t mkl_sparse_s_add (const sparse_operation_t operation, const
sparse_matrix_t A, const float alpha, const sparse_matrix_t B, sparse_matrix_t *C);
sparse_status_t mkl_sparse_d_add (const sparse_operation_t operation, const
sparse_matrix_t A, const double alpha, const sparse_matrix_t B, sparse_matrix_t *C);

sparse_status_t mkl_sparse_c_add (const sparse_operation_t operation, const
sparse_matrix_t A, const MKL_Complex8 alpha, const sparse_matrix_t B, sparse_matrix_t
*C);
sparse_status_t mkl_sparse_z_add (const sparse_operation_t operation, const
sparse_matrix_t A, const MKL_Complex16 alpha, const sparse_matrix_t B, sparse_matrix_t
*C);

Include Files
• mkl_spblas.h

Description
The mkl_sparse_?_add routine performs a matrix-matrix operation:

C := alpha*op(A) + B
where alpha is a scalar, op is a matrix modifier, and A, B, and C are sparse matrices.

NOTE
This routine is only supported for sparse matrices in CSR and BSR formats. It is not
supported for COO or CSC formats.
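For CSR inputs with sorted column indices, C := alpha*op(A) + B with op(A) = A reduces to merging matching rows. The hypothetical sketch below mimics the routine's contract of returning a newly allocated result: a first pass counts the entries of each output row, a second pass writes them. It assumes 0-based indexing, real data, and sorted columns in A and B, and it keeps entries that happen to cancel as explicit zeros; whether oneMKL does the same is not stated here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical 0-based CSR container (NOT oneMKL's sparse_matrix_t). */
typedef struct {
    int m, k;
    int *row_ptr;    /* length m + 1 */
    int *col_idx;    /* sorted within each row (assumed) */
    double *vals;
} csr_mat;

/* Merges row i of alpha*A with row i of B. With out_col == NULL it only
 * counts the output entries; otherwise it also writes them. */
static int merge_row(const csr_mat *A, double alpha, const csr_mat *B, int i,
                     int *out_col, double *out_val)
{
    int pa = A->row_ptr[i], ea = A->row_ptr[i + 1];
    int pb = B->row_ptr[i], eb = B->row_ptr[i + 1];
    int n = 0;
    while (pa < ea || pb < eb) {
        int ca = (pa < ea) ? A->col_idx[pa] : A->k;   /* k acts as "past the end" */
        int cb = (pb < eb) ? B->col_idx[pb] : B->k;
        int col = (ca < cb) ? ca : cb;
        double v = 0.0;
        if (ca == col) v += alpha * A->vals[pa++];
        if (cb == col) v += B->vals[pb++];
        if (out_col != NULL) { out_col[n] = col; out_val[n] = v; }
        ++n;
    }
    return n;
}

/* C := alpha*A + B; allocates C's arrays. Returns 0 on success, -1 on failure. */
static int ref_csr_add(double alpha, const csr_mat *A, const csr_mat *B, csr_mat *C)
{
    assert(A->m == B->m && A->k == B->k);
    C->m = A->m;
    C->k = A->k;
    C->col_idx = NULL;
    C->vals = NULL;
    C->row_ptr = malloc(((size_t)A->m + 1) * sizeof *C->row_ptr);
    if (C->row_ptr == NULL)
        return -1;
    C->row_ptr[0] = 0;
    for (int i = 0; i < A->m; ++i)      /* pass 1: count entries per row */
        C->row_ptr[i + 1] = C->row_ptr[i] + merge_row(A, alpha, B, i, NULL, NULL);
    size_t nnz = (size_t)C->row_ptr[A->m];
    C->col_idx = malloc((nnz + 1) * sizeof *C->col_idx);   /* +1 avoids malloc(0) */
    C->vals = malloc((nnz + 1) * sizeof *C->vals);
    if (C->col_idx == NULL || C->vals == NULL) {
        free(C->row_ptr); free(C->col_idx); free(C->vals);
        return -1;
    }
    for (int i = 0; i < A->m; ++i)      /* pass 2: write entries */
        merge_row(A, alpha, B, i, C->col_idx + C->row_ptr[i], C->vals + C->row_ptr[i]);
    return 0;
}
```

The caller owns the three arrays of C afterwards and frees them, much as a real handle returned in C must eventually be released.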

Input Parameters

A Handle which contains the sparse matrix A.

alpha Specifies the scalar alpha.

operation Specifies operation op() on input matrix.

SPARSE_OPERATION_NON_TRANSPOSE Non-transpose, op(A) = A.

SPARSE_OPERATION_TRANSPOSE Transpose, op(A) = A^T.

SPARSE_OPERATION_CONJUGATE_TRANSPOSE Conjugate transpose, op(A) = A^H.

B Handle which contains the sparse matrix B.

Output Parameters

C Handle which contains the resulting sparse matrix.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED Execution failed.

SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.

mkl_sparse_spmm
Computes the product of two sparse matrices. The
result is stored in a newly allocated sparse matrix.

Syntax
sparse_status_t mkl_sparse_spmm (const sparse_operation_t operation, const
sparse_matrix_t A, const sparse_matrix_t B, sparse_matrix_t *C);

Include Files
• mkl_spblas.h

Description
The mkl_sparse_spmm routine performs a matrix-matrix operation:

C := op(A)*B
where A, B, and C are sparse matrices and op is a matrix modifier for matrix A.

Notes
• This routine is supported only for sparse matrices in CSC, CSR, and BSR formats. It is not
supported for sparse matrices in COO format.
• The column indices of the output matrix (if in CSR format) can appear unsorted due to the
algorithm chosen internally. To ensure sorted column indices (if that is important), call
mkl_sparse_order().
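One common way to compute a sparse product row by row is Gustavson's algorithm with a dense accumulator, and it shows how the column indices of a CSR result can come out unsorted: columns are emitted in the order they are first touched. The sketch below is that textbook technique, not a description of oneMKL's internal algorithm. The struct and helper names are hypothetical, and sort_row is only a stand-in for the sorted-index result that mkl_sparse_order() provides on a real handle.

```c
#include <assert.h>

/* Hypothetical 0-based CSR container (NOT oneMKL's sparse_matrix_t). */
typedef struct {
    int m, k;
    const int *row_ptr;
    const int *col_idx;
    const double *vals;
} csr_ref;

/* Row i of C = A*B via a dense accumulator. acc (length B->k) must be all
 * zeros on entry and is zeroed again on exit. mark (length B->k) must be set
 * to -1 once before the first row; it is stamped with the row index, so it
 * never needs clearing. Returns the number of entries written. */
static int gustavson_row(const csr_ref *A, const csr_ref *B, int i,
                         double *acc, int *mark, int *out_col, double *out_val)
{
    assert(A->k == B->m);
    int n = 0;
    for (int p = A->row_ptr[i]; p < A->row_ptr[i + 1]; ++p) {
        int j = A->col_idx[p];                    /* A(i, j) selects row j of B */
        for (int q = B->row_ptr[j]; q < B->row_ptr[j + 1]; ++q) {
            int c = B->col_idx[q];
            if (mark[c] != i) {                   /* first touch of column c */
                mark[c] = i;
                out_col[n++] = c;                 /* first-touch order: unsorted */
            }
            acc[c] += A->vals[p] * B->vals[q];
        }
    }
    for (int t = 0; t < n; ++t) {                 /* gather, then reset acc */
        out_val[t] = acc[out_col[t]];
        acc[out_col[t]] = 0.0;
    }
    return n;
}

/* Sorts one row's (column, value) pairs by column (insertion sort). */
static void sort_row(int *col, double *val, int n)
{
    for (int a = 1; a < n; ++a) {
        int c = col[a];
        double v = val[a];
        int b = a - 1;
        while (b >= 0 && col[b] > c) {
            col[b + 1] = col[b];
            val[b + 1] = val[b];
            --b;
        }
        col[b + 1] = c;
        val[b + 1] = v;
    }
}
```

With A = [[1, 1]] and B = [[0, 0, 2], [3, 0, 0]], gustavson_row emits columns {2, 0} with values {2, 3}; sort_row turns that into columns {0, 2} with values {3, 2}.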

Input Parameters

operation Specifies operation op() on input matrix.

SPARSE_OPERATION_NON_TRANSPOSE Non-transpose, op(A) = A.

SPARSE_OPERATION_TRANSPOSE Transpose, op(A) = A^T.

SPARSE_OPERATION_CONJUGATE_TRANSPOSE Conjugate transpose, op(A) = A^H.

A Handle which contains the sparse matrix A.

B Handle which contains the sparse matrix B.

Output Parameters

C Handle which contains the resulting sparse matrix.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.


SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED Execution failed.

SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.

mkl_sparse_?_spmmd
Computes the product of two sparse matrices and
stores the result as a dense matrix.

Syntax
sparse_status_t mkl_sparse_s_spmmd (const sparse_operation_t operation, const
sparse_matrix_t A, const sparse_matrix_t B, const sparse_layout_t layout, float *C,
const MKL_INT ldc);
sparse_status_t mkl_sparse_d_spmmd (const sparse_operation_t operation, const
sparse_matrix_t A, const sparse_matrix_t B, const sparse_layout_t layout, double *C,
const MKL_INT ldc);
sparse_status_t mkl_sparse_c_spmmd (const sparse_operation_t operation, const
sparse_matrix_t A, const sparse_matrix_t B, const sparse_layout_t layout, MKL_Complex8
*C, const MKL_INT ldc);
sparse_status_t mkl_sparse_z_spmmd (const sparse_operation_t operation, const
sparse_matrix_t A, const sparse_matrix_t B, const sparse_layout_t layout, MKL_Complex16
*C, const MKL_INT ldc);

Include Files
• mkl_spblas.h

Description
The mkl_sparse_?_spmmd routine performs a matrix-matrix operation:

C := op(A)*B
where A and B are sparse matrices, op is a matrix modifier for matrix A, and C is a dense matrix.

NOTE
This routine is not supported for sparse matrices in the COO format. For sparse matrices in
BSR format, these combinations of (indexing, block_layout) are supported:
• (SPARSE_INDEX_BASE_ZERO, SPARSE_LAYOUT_ROW_MAJOR)
• (SPARSE_INDEX_BASE_ONE, SPARSE_LAYOUT_COLUMN_MAJOR)
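Because the result is dense, this operation needs no symbolic phase: every product A(i, j)*B(j, c) can be added straight into C(i, c). The hypothetical plain-C sketch below covers op(A) = A for 0-based CSR inputs and shows how a layout flag and ldc select where element (r, c) of C lives. C is fully overwritten, matching C := op(A)*B; the function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 0-based CSR container (NOT oneMKL's sparse_matrix_t). */
typedef struct {
    int m, k;
    const int *row_ptr;
    const int *col_idx;
    const double *vals;
} csr_ref;

/* Offset of element (r, c) in a dense array with leading dimension ld. */
static size_t dense_index(int row_major, int r, int c, int ld)
{
    return row_major ? (size_t)r * (size_t)ld + (size_t)c
                     : (size_t)c * (size_t)ld + (size_t)r;
}

/* C := A*B into a dense A->m x B->k array (row_major != 0 selects row-major). */
static void ref_csr_spmmd(const csr_ref *A, const csr_ref *B,
                          int row_major, double *C, int ldc)
{
    assert(A->k == B->m);
    assert(ldc >= (row_major ? B->k : A->m));
    for (int r = 0; r < A->m; ++r)                /* C is overwritten, not updated */
        for (int c = 0; c < B->k; ++c)
            C[dense_index(row_major, r, c, ldc)] = 0.0;
    for (int i = 0; i < A->m; ++i)
        for (int p = A->row_ptr[i]; p < A->row_ptr[i + 1]; ++p) {
            int j = A->col_idx[p];                /* A(i, j) times row j of B */
            for (int q = B->row_ptr[j]; q < B->row_ptr[j + 1]; ++q)
                C[dense_index(row_major, i, B->col_idx[q], ldc)] += A->vals[p] * B->vals[q];
        }
}
```

With A = [[1, 2], [0, 3]] and B = [[4, 0], [0, 5]], the product is [[4, 10], [0, 15]], stored as {4, 10, 0, 15} in row-major order and {4, 0, 10, 15} in column-major order when ldc = 2.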

Input Parameters

operation Specifies operation op() on input matrix.

SPARSE_OPERATION_NON_TRANSPOSE Non-transpose, op(A) = A.

SPARSE_OPERATION_TRANSPOSE Transpose, op(A) = A^T.

SPARSE_OPERATION_CONJUGATE_TRANSPOSE Conjugate transpose, op(A) = A^H.

A Handle which contains the sparse matrix A.

B Handle which contains the sparse matrix B.

layout Describes the storage scheme for the dense matrix:

SPARSE_LAYOUT_COLUMN_MAJOR Storage of elements uses column-major layout.

SPARSE_LAYOUT_ROW_MAJOR Storage of elements uses row-major layout.

ldc Leading dimension of matrix C.

Output Parameters

C Resulting dense matrix.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED Execution failed.

SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.

mkl_sparse_sp2m
Computes the product of two sparse matrices. The
result is stored in a newly allocated sparse matrix.

Syntax
sparse_status_t mkl_sparse_sp2m (const sparse_operation_t transA, const struct
matrix_descr descrA, const sparse_matrix_t A, const sparse_operation_t transB, const
struct matrix_descr descrB, const sparse_matrix_t B, const sparse_request_t request,
sparse_matrix_t *C);

Include Files
• mkl_spblas.h


Description
The mkl_sparse_sp2m routine performs a matrix-matrix operation:

C := opA(A)*opB(B)
where A, B, and C are sparse matrices, and opA and opB are matrix modifiers for matrices A and B, respectively.

NOTE
The column indices of the output matrix (if in CSR format) can appear unsorted due to the
algorithm chosen internally. To ensure sorted column indices (if that is important), call
mkl_sparse_order().

Input Parameters

transA Specifies operation opA() on input matrix A.

SPARSE_OPERATION_NON_TRANSPOSE Non-transpose, opA(A) = A.

SPARSE_OPERATION_TRANSPOSE Transpose, opA(A) = A^T.

SPARSE_OPERATION_CONJUGATE_TRANSPOSE Conjugate transpose, opA(A) = A^H.

transB Specifies operation opB() on input matrix B.

SPARSE_OPERATION_NON_TRANSPOSE Non-transpose, opB(B) = B.

SPARSE_OPERATION_TRANSPOSE Transpose, opB(B) = B^T.

SPARSE_OPERATION_CONJUGATE_TRANSPOSE Conjugate transpose, opB(B) = B^H.

descrA Structure that specifies sparse matrix properties.

NOTE Currently, only SPARSE_MATRIX_TYPE_GENERAL is supported.

sparse_matrix_type_t type specifies the type of sparse matrix.

SPARSE_MATRIX_TYPE_GENERAL The matrix is processed as is.

SPARSE_MATRIX_TYPE_BLOCK_TRIANGULAR The matrix is block-triangular (only the requested triangle is processed). This applies to BSR format only.

SPARSE_MATRIX_TYPE_BLOCK_DIAGONAL The matrix is block-diagonal (only diagonal blocks are processed). This applies to BSR format only.

sparse_fill_mode_t mode specifies the triangular matrix portion for symmetric, Hermitian, triangular, and block-triangular matrices.

SPARSE_FILL_MODE_LOWER The lower triangular matrix is processed.

SPARSE_FILL_MODE_UPPER The upper triangular matrix is processed.

sparse_diag_type_t diag specifies the type of diagonal for non-general matrices.

SPARSE_DIAG_NON_UNIT Diagonal elements might not be equal to 1.

SPARSE_DIAG_UNIT Diagonal elements are equal to 1.

descrB Structure that specifies sparse matrix properties.

NOTE Currently, only SPARSE_MATRIX_TYPE_GENERAL is supported.

sparse_matrix_type_t type specifies the type of sparse matrix.

SPARSE_MATRIX_TYPE_GENERAL The matrix is processed as is.

SPARSE_MATRIX_TYPE_SYMMETRIC The matrix is symmetric (only the requested triangle is processed).

SPARSE_MATRIX_TYPE_HERMITIAN The matrix is Hermitian (only the requested triangle is processed).

SPARSE_MATRIX_TYPE_TRIANGULAR The matrix is triangular (only the requested triangle is processed).

SPARSE_MATRIX_TYPE_DIAGONAL The matrix is diagonal (only diagonal elements are processed).

SPARSE_MATRIX_TYPE_BLOCK_TRIANGULAR The matrix is block-triangular (only the requested triangle is processed). This applies to BSR format only.

SPARSE_MATRIX_TYPE_BLOCK_DIAGONAL The matrix is block-diagonal (only diagonal blocks are processed). This applies to BSR format only.

sparse_fill_mode_t mode specifies the triangular matrix portion for symmetric, Hermitian, triangular, and block-triangular matrices.

SPARSE_FILL_MODE_LOWER The lower triangular matrix is processed.

SPARSE_FILL_MODE_UPPER The upper triangular matrix is processed.

sparse_diag_type_t diag specifies the type of diagonal for non-general matrices.

SPARSE_DIAG_NON_UNIT Diagonal elements might not be equal to 1.

SPARSE_DIAG_UNIT Diagonal elements are equal to 1.

A Handle which contains the sparse matrix A.

B Handle which contains the sparse matrix B.

request Specifies whether the full computation is performed at once or using the
two-stage algorithm. See Two-stage Algorithm for Inspector-executor
Sparse BLAS Routines.

SPARSE_STAGE_NNZ_COUNT Only the rowIndex (BSR/CSR format) or
colIndex (CSC format) array of the
matrix is computed internally. The
result can be extracted to measure
the memory required for the full
operation.
SPARSE_STAGE_FINALIZE_MULT_NO_VAL Finalize computation of the matrix
structure (values are not
computed). Use only after the call
with the SPARSE_STAGE_NNZ_COUNT
parameter.
SPARSE_STAGE_FINALIZE_MULT Finalize computation. Can also be used
when the matrix structure remains
unchanged and only the values of the
resulting matrix C need to be
recomputed.
SPARSE_STAGE_FULL_MULT_NO_VAL Perform computation of the matrix
structure only (values are not computed).
SPARSE_STAGE_FULL_MULT Perform the entire computation in a
single step.
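To make the two-stage idea concrete, the following plain-C sketch (not MKL code; the names spgemm_count and spgemm_fill are hypothetical) splits a CSR product C = A*B into a counting pass, analogous to SPARSE_STAGE_NNZ_COUNT, that produces only the row pointer array, and a second pass, analogous to SPARSE_STAGE_FINALIZE_MULT, that fills column indices and values into arrays the caller sizes from that count. It assumes zero-based CSR with int indices and double values; MKL's internal algorithm may differ.

```c
/* Stage 1 (compare SPARSE_STAGE_NNZ_COUNT): count the nonzeros in each row of
 * C = A*B and fill c_rowptr[0..m]. A is m-by-k and B is k-by-n, both in
 * zero-based CSR. marker is caller-provided scratch of length n. */
void spgemm_count(int m, int n, const int *a_rowptr, const int *a_col,
                  const int *b_rowptr, const int *b_col,
                  int *c_rowptr, int *marker)
{
    for (int j = 0; j < n; ++j)
        marker[j] = -1;
    c_rowptr[0] = 0;
    for (int i = 0; i < m; ++i) {
        int count = 0;
        for (int p = a_rowptr[i]; p < a_rowptr[i + 1]; ++p) {
            int kk = a_col[p];
            for (int q = b_rowptr[kk]; q < b_rowptr[kk + 1]; ++q) {
                int j = b_col[q];
                if (marker[j] != i) {   /* column j not yet seen in row i */
                    marker[j] = i;
                    ++count;
                }
            }
        }
        c_rowptr[i + 1] = c_rowptr[i] + count;
    }
}

/* Stage 2 (compare SPARSE_STAGE_FINALIZE_MULT): fill c_col and c_val, which
 * the caller allocated with c_rowptr[m] entries. pos is scratch of length n. */
void spgemm_fill(int m, int n, const int *a_rowptr, const int *a_col,
                 const double *a_val, const int *b_rowptr, const int *b_col,
                 const double *b_val, const int *c_rowptr,
                 int *c_col, double *c_val, int *pos)
{
    for (int j = 0; j < n; ++j)
        pos[j] = -1;
    for (int i = 0; i < m; ++i) {
        int next = c_rowptr[i];
        for (int p = a_rowptr[i]; p < a_rowptr[i + 1]; ++p) {
            int kk = a_col[p];
            for (int q = b_rowptr[kk]; q < b_rowptr[kk + 1]; ++q) {
                int j = b_col[q];
                if (pos[j] < c_rowptr[i]) {   /* first hit of column j in row i */
                    pos[j] = next++;
                    c_col[pos[j]] = j;
                    c_val[pos[j]] = 0.0;
                }
                c_val[pos[j]] += a_val[p] * b_val[q];
            }
        }
    }
}
```

Between the two calls, the caller allocates c_rowptr[m] column indices and values, which is the point of the counting stage. In this sketch, column indices within each output row appear in first-encounter order, not sorted order.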

Output Parameters

C Handle which contains the resulting sparse matrix.

Return Values
The function returns a value indicating whether the operation was successful, or the reason why it failed.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED The internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED The execution failed.

SPARSE_STATUS_INTERNAL_ERROR An error occurred in the implementation of the algorithm.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.

mkl_sparse_?_sp2md
Computes the product of two sparse matrices (supports
operations on both matrices) and stores the result as
a dense matrix.

Syntax
sparse_status_t mkl_sparse_s_sp2md ( const sparse_operation_t transA, const struct
matrix_descr descrA, const sparse_matrix_t A, const sparse_operation_t transB, const
struct matrix_descr descrB, const sparse_matrix_t B, const float alpha, const float
beta, float *C, const sparse_layout_t layout, const MKL_INT ldc );
sparse_status_t mkl_sparse_d_sp2md ( const sparse_operation_t transA, const struct
matrix_descr descrA, const sparse_matrix_t A, const sparse_operation_t transB, const
struct matrix_descr descrB, const sparse_matrix_t B, const double alpha, const double
beta, double *C, const sparse_layout_t layout, const MKL_INT ldc );
sparse_status_t mkl_sparse_c_sp2md ( const sparse_operation_t transA, const struct
matrix_descr descrA, const sparse_matrix_t A, const sparse_operation_t transB, const
struct matrix_descr descrB, const sparse_matrix_t B, const MKL_Complex8 alpha, const
MKL_Complex8 beta, MKL_Complex8 *C, const sparse_layout_t layout, const MKL_INT ldc );
sparse_status_t mkl_sparse_z_sp2md ( const sparse_operation_t transA, const struct
matrix_descr descrA, const sparse_matrix_t A, const sparse_operation_t transB, const
struct matrix_descr descrB, const sparse_matrix_t B, const MKL_Complex16 alpha, const
MKL_Complex16 beta, MKL_Complex16 *C, const sparse_layout_t layout, const MKL_INT
ldc );

Include Files
• mkl_spblas.h

Description
The mkl_sparse_?_sp2md routine performs a matrix-matrix operation:

C := alpha*opA(A)*opB(B) + beta*C

where A and B are sparse matrices, opA and opB are matrix modifiers for A and B respectively,
C is a dense matrix, and alpha and beta are scalars.
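As a reference for the semantics (not for performance), this plain-C sketch computes the non-transpose case C := alpha*A*B + beta*C with A and B in zero-based CSR and C dense. The row_major flag stands in for sparse_layout_t and, together with ldc, selects how element (i, j) of C is addressed. The function names are hypothetical, and treating beta == 0 as "overwrite C" is the BLAS convention, assumed here rather than taken from this reference.

```c
#include <stddef.h>

/* Offset of element (i, j) in a dense array with leading dimension ld. */
static size_t dense_offset(int row_major, int i, int j, int ld)
{
    return row_major ? (size_t)i * ld + j : (size_t)j * ld + i;
}

/* Reference for C := alpha*A*B + beta*C (both operands non-transposed).
 * A is m-by-k and B is k-by-n in zero-based CSR; C is a dense m-by-n array.
 * row_major != 0 means row-major storage, otherwise column-major. */
void sp2md_reference(int m, int n,
                     const int *a_rowptr, const int *a_col, const double *a_val,
                     const int *b_rowptr, const int *b_col, const double *b_val,
                     double alpha, double beta, double *c, int row_major, int ldc)
{
    /* Scale C by beta; beta == 0 overwrites C (assumed BLAS-style convention). */
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j) {
            double *cij = &c[dense_offset(row_major, i, j, ldc)];
            *cij = (beta == 0.0) ? 0.0 : beta * *cij;
        }
    /* Accumulate alpha * A(i,kk) * B(kk,j) over the stored entries only. */
    for (int i = 0; i < m; ++i)
        for (int p = a_rowptr[i]; p < a_rowptr[i + 1]; ++p) {
            int kk = a_col[p];
            for (int q = b_rowptr[kk]; q < b_rowptr[kk + 1]; ++q)
                c[dense_offset(row_major, i, b_col[q], ldc)] +=
                    alpha * a_val[p] * b_val[q];
        }
}
```

With a row-major layout and ldc larger than n, the padding columns of C are never touched, which is how a leading dimension larger than the logical width behaves.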


Input Parameters

transA Specifies the operation opA() on the input matrix A.

SPARSE_OPERATION_NON_TRANSPOSE Non-transpose, opA(A)=A.
SPARSE_OPERATION_TRANSPOSE Transpose, opA(A)=A^T.
SPARSE_OPERATION_CONJUGATE_TRANSPOSE Conjugate transpose, opA(A)=A^H.

descrA Structure that specifies the sparse matrix properties.

NOTE Currently, only SPARSE_MATRIX_TYPE_GENERAL is supported.

sparse_matrix_type_t type specifies the type of sparse matrix.

SPARSE_MATRIX_TYPE_GENERAL The matrix is processed as is.

sparse_fill_mode_t mode specifies the triangular matrix portion for
symmetric, Hermitian, triangular, and block-triangular matrices.

SPARSE_FILL_MODE_LOWER The lower triangular matrix is processed.
SPARSE_FILL_MODE_UPPER The upper triangular matrix is processed.

sparse_diag_type_t diag specifies the type of diagonal for non-general
matrices.

SPARSE_DIAG_NON_UNIT Diagonal elements might not be equal to 1.
SPARSE_DIAG_UNIT Diagonal elements are equal to 1.

A Handle which contains the sparse matrix A.

transB Specifies the operation opB() on the input matrix B.

SPARSE_OPERATION_NON_TRANSPOSE Non-transpose, opB(B)=B.
SPARSE_OPERATION_TRANSPOSE Transpose, opB(B)=B^T.
SPARSE_OPERATION_CONJUGATE_TRANSPOSE Conjugate transpose, opB(B)=B^H.

descrB Structure that specifies the sparse matrix properties.

NOTE
Currently, only SPARSE_MATRIX_TYPE_GENERAL is supported.

sparse_matrix_type_t type specifies the type of sparse matrix.

SPARSE_MATRIX_TYPE_GENERAL The matrix is processed as is.

sparse_fill_mode_t mode specifies the triangular matrix portion for
symmetric, Hermitian, triangular, and block-triangular matrices.

SPARSE_FILL_MODE_LOWER The lower triangular matrix is processed.
SPARSE_FILL_MODE_UPPER The upper triangular matrix is processed.

sparse_diag_type_t diag specifies the type of diagonal for non-general
matrices.

SPARSE_DIAG_NON_UNIT Diagonal elements might not be equal to 1.
SPARSE_DIAG_UNIT Diagonal elements are equal to 1.

B Handle which contains the sparse matrix B.

alpha Specifies the scalar alpha.

beta Specifies the scalar beta.

layout Describes the storage scheme for the dense matrix:

SPARSE_LAYOUT_COLUMN_MAJOR Storage of elements uses column-major layout.
SPARSE_LAYOUT_ROW_MAJOR Storage of elements uses row-major layout.

ldc Leading dimension of matrix C.

Output Parameters

C The resulting dense matrix.

Return Values
The function returns a value indicating whether the operation was successful, or the reason why it failed.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED The internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED The execution failed.

SPARSE_STATUS_INTERNAL_ERROR An error occurred in the implementation of the algorithm.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.

mkl_sparse_sypr
Computes the symmetric product of three sparse
matrices and stores the result in a newly allocated
sparse matrix.

Syntax
sparse_status_t mkl_sparse_sypr (const sparse_operation_t operation, const
sparse_matrix_t A, const sparse_matrix_t B, const struct matrix_descr descrB,
sparse_matrix_t *C, const sparse_request_t request);

Include Files
• mkl_spblas.h

Description
The mkl_sparse_sypr routine performs a multiplication of three sparse matrices that results in a symmetric
or Hermitian matrix, C.

C:=A*B*opA(A)
or

C:=opA(A)*B*A
depending on the matrix modifier operation.
Here, A, B, and C are sparse matrices, where A has a general structure while B and C are symmetric (for real
data types) or Hermitian (for complex data types) matrices. opA is the transpose (real data types) or
conjugate transpose (complex data types) operator.

NOTE
This routine is not supported for sparse matrices in COO or CSC formats. This routine
supports only CSR and BSR formats. In addition, it supports only the sorted CSR and sorted
BSR formats for the input matrix. If the data is unsorted, call the mkl_sparse_order routine
before either mkl_sparse_sypr or mkl_sparse_?_syprd.
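The following dense reference sketch spells out the real non-transpose case C := A*B*A^T. It assumes row-major dense arrays, reads B only through its upper triangle (a symmetric matrix stored that way), and writes only the upper triangle of C, matching the statement that only the upper-triangular part of C is computed. It illustrates the formula, not MKL's algorithm; sypr_reference and sym_upper are hypothetical names.

```c
/* Entry (i, j) of a symmetric n-by-n matrix stored row-major in its upper triangle. */
static double sym_upper(const double *s, int n, int i, int j)
{
    return i <= j ? s[i * n + j] : s[j * n + i];
}

/* Dense reference for the real non-transpose case C := A*B*A^T.
 * A is m-by-k, B is k-by-k symmetric (only its upper triangle is read), and
 * only the upper triangle of the m-by-m result C is written. All arrays are
 * row-major with no padding. */
void sypr_reference(int m, int k, const double *a, const double *b, double *c)
{
    for (int i = 0; i < m; ++i)
        for (int j = i; j < m; ++j) {
            double sum = 0.0;
            /* C(i,j) = sum over p,q of A(i,p) * B(p,q) * A(j,q) */
            for (int p = 0; p < k; ++p)
                for (int q = 0; q < k; ++q)
                    sum += a[i * k + p] * sym_upper(b, k, p, q) * a[j * k + q];
            c[i * m + j] = sum;
        }
}
```

Because B is read only through sym_upper, whatever is stored below its diagonal is ignored, and entries of C below the diagonal keep their previous contents.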

Input Parameters

operation Specifies operation on the input sparse matrices.

SPARSE_OPERATION_NON_TRANSPOSE Non-transpose case.
C:=A*B*(A^T) for real precision.
C:=A*B*(A^H) for complex precision.
SPARSE_OPERATION_TRANSPOSE Transpose case. This is not supported
for complex matrices.
C:=(A^T)*B*A
SPARSE_OPERATION_CONJUGATE_TRANSPOSE Conjugate transpose case. This is
not supported for real matrices.
C:=(A^H)*B*A

A Handle which contains the sparse matrix A.

B Handle which contains the sparse matrix B.

descrB Structure specifying properties of the sparse matrix.

sparse_matrix_type_t type specifies the type of a sparse matrix

SPARSE_MATRIX_TYPE_SYMMETRIC
The matrix is symmetric (only
the specified triangle is
processed).
SPARSE_MATRIX_TYPE_HERMITIAN
The matrix is Hermitian (only
the specified triangle is
processed).

sparse_fill_mode_t mode specifies the triangular matrix part.

SPARSE_FILL_MODE_LOWER
The lower triangular matrix part
is processed.
SPARSE_FILL_MODE_UPPER
The upper triangular matrix part
is processed.

sparse_diag_type_t diag specifies the type of diagonal.

SPARSE_DIAG_NON_UNIT
Diagonal elements might not be
equal to one.

NOTE
This routine also supports C = A*A^T (real) or C = A*A^H (complex) with these parameters:
descrB.type=SPARSE_MATRIX_TYPE_DIAGONAL
descrB.diag=SPARSE_DIAG_UNIT
In this case, you do not need to allocate matrix B. Use the
routine as a two-stage version of mkl_sparse_syrk.

request Specifies whether the computations are performed in a single step
or using the two-stage algorithm. See Two-stage Algorithm for
Inspector-executor Sparse BLAS Routines for more information.

SPARSE_STAGE_NNZ_COUNT Only the rowIndex (BSR/CSR format)
or colIndex (CSC format) array
of the matrix is computed internally.
The result can be extracted
to measure the memory required
for the full operation.
SPARSE_STAGE_FINALIZE_MULT_NO_VAL Finalize computation of the matrix
structure (values are not
computed). Use only after the call
with the SPARSE_STAGE_NNZ_COUNT
parameter.
SPARSE_STAGE_FINALIZE_MULT Finalize computation. Can be used
after the call with
SPARSE_STAGE_NNZ_COUNT or
SPARSE_STAGE_FINALIZE_MULT_NO_VAL. Can also be used when the
matrix structure remains unchanged
and only the values of the resulting
matrix C need to be recomputed.
SPARSE_STAGE_FULL_MULT_NO_VAL Perform computation of the matrix
structure only (values are not computed).
SPARSE_STAGE_FULL_MULT Perform the entire computation in a
single step.

Output Parameters

C Handle which contains the resulting sparse matrix. Only the upper-
triangular part of the matrix is computed.

Return Values
The function returns a value indicating whether the operation was successful, or the reason why it failed.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED Execution failed.

SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.

mkl_sparse_?_syprd
Computes the symmetric triple product of a sparse
matrix and a dense matrix and stores the result as a
dense matrix.

Syntax
sparse_status_t mkl_sparse_s_syprd (const sparse_operation_t op, const sparse_matrix_t
A, const float *B, const sparse_layout_t layoutB, const MKL_INT ldb, const float alpha,
const float beta, float *C, const sparse_layout_t layoutC, const MKL_INT ldc);
sparse_status_t mkl_sparse_d_syprd (const sparse_operation_t op, const sparse_matrix_t
A, const double *B, const sparse_layout_t layoutB, const MKL_INT ldb, const double
alpha, const double beta, double *C, const sparse_layout_t layoutC, const MKL_INT ldc);
sparse_status_t mkl_sparse_c_syprd (const sparse_operation_t op, const sparse_matrix_t
A, const MKL_Complex8 *B, const sparse_layout_t layoutB, const MKL_INT ldb, const
MKL_Complex8 alpha, const MKL_Complex8 beta, MKL_Complex8 *C, const sparse_layout_t
layoutC, const MKL_INT ldc);
sparse_status_t mkl_sparse_z_syprd (const sparse_operation_t op, const sparse_matrix_t
A, const MKL_Complex16 *B, const sparse_layout_t layoutB, const MKL_INT ldb, const
MKL_Complex16 alpha, const MKL_Complex16 beta, MKL_Complex16 *C, const sparse_layout_t
layoutC, const MKL_INT ldc);


Include Files
• mkl_spblas.h

Description
The mkl_sparse_?_syprd routine computes a triple product of a sparse matrix A and a dense matrix B that
results in a dense symmetric or Hermitian matrix, C:

C:=alpha*A*B*op(A) + beta*C
or

C:=alpha*op(A)*B*A + beta*C
depending on the matrix modifier operation. Here A is a sparse matrix, and B and C are dense symmetric
(or Hermitian) matrices.
op is the transpose (real precision) or conjugate transpose (complex precision) operator.

NOTE
This routine is not supported for sparse matrices in COO or CSC formats. It supports only
CSR and BSR formats. In addition, this routine supports only the sorted CSR and sorted BSR
formats for the input matrix. If the data is unsorted, call the mkl_sparse_order routine
before either mkl_sparse_sypr or mkl_sparse_?_syprd.
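For the real transpose case, C := alpha*A^T*B*A + beta*C, entry C(i,j) is the sum over stored entries A(p,i) and A(q,j) of A(p,i)*B(p,q)*A(q,j). The plain-C sketch below uses exactly that definition with A in zero-based CSR and B, C as row-major dense arrays; like the routine, it reads only the upper triangle of B and updates only the upper triangle of C. It is a quadratic-cost reference for checking results, not an efficient method; all names are hypothetical, and treating beta == 0 as an overwrite is an assumed BLAS-style convention.

```c
/* Entry (i, j) of a symmetric matrix held in the upper triangle of a
 * row-major array with leading dimension ld. */
static double upper_entry(const double *sym, int ld, int i, int j)
{
    return i <= j ? sym[i * ld + j] : sym[j * ld + i];
}

/* Reference for the real transpose case C := alpha*A^T*B*A + beta*C.
 * A is m-by-k in zero-based CSR; B is m-by-m and C is k-by-k, both dense
 * row-major. Only the upper triangles of B and C are accessed. */
void syprd_transpose_reference(int m, int k,
                               const int *rowptr, const int *col, const double *val,
                               const double *b, int ldb,
                               double alpha, double beta, double *c, int ldc)
{
    /* Scale the upper triangle of C; beta == 0 overwrites (assumed convention). */
    for (int i = 0; i < k; ++i)
        for (int j = i; j < k; ++j)
            c[i * ldc + j] = (beta == 0.0) ? 0.0 : beta * c[i * ldc + j];
    /* C(i,j) += alpha * A(p,i) * B(p,q) * A(q,j) for every pair of stored entries. */
    for (int p = 0; p < m; ++p)
        for (int s = rowptr[p]; s < rowptr[p + 1]; ++s)
            for (int q = 0; q < m; ++q)
                for (int t = rowptr[q]; t < rowptr[q + 1]; ++t) {
                    int i = col[s], j = col[t];
                    if (i <= j)
                        c[i * ldc + j] += alpha * val[s] * upper_entry(b, ldb, p, q) * val[t];
                }
}
```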

Input Parameters

operation Specifies operation on the input sparse matrix.

SPARSE_OPERATION_NON_TRANSPOSE Non-transpose case.
C:=alpha*A*B*(A^T)+beta*C for real precision.
C:=alpha*A*B*(A^H)+beta*C for complex precision.
SPARSE_OPERATION_TRANSPOSE Transpose case. This is not supported
for complex matrices.
C:=alpha*(A^T)*B*A+beta*C
SPARSE_OPERATION_CONJUGATE_TRANSPOSE Conjugate transpose case. This is
not supported for real matrices.
C:=alpha*(A^H)*B*A+beta*C

A Handle which contains the sparse matrix A.

B Input dense matrix. Only the upper triangular part of the matrix is
used for computation.
layoutB Describes the storage scheme for the dense matrix B.

SPARSE_LAYOUT_COLUMN_MAJOR Store elements in a column-major layout.
SPARSE_LAYOUT_ROW_MAJOR Store elements in a row-major layout.

ldb Leading dimension of matrix B.

alpha Specifies the scalar alpha.

beta Specifies the scalar beta.

NOTE
Because only the upper triangular part of matrix C is processed,
use real values of alpha and beta in the complex case to obtain a
Hermitian matrix.

layoutC Describes the storage scheme for the dense matrix C.

SPARSE_LAYOUT_COLUMN_MAJOR Store elements in a column-major layout.
SPARSE_LAYOUT_ROW_MAJOR Store elements in a row-major layout.

ldc Leading dimension of matrix C.

Output Parameters

C Resulting dense matrix. Only the upper triangular part of the matrix
is computed.

Return Values
The function returns a value indicating whether the operation was successful, or the reason why it failed.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED The internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED The execution failed.

SPARSE_STATUS_INTERNAL_ERROR An error occurred in the implementation of the algorithm.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.

mkl_sparse_?_symgs
Computes a symmetric Gauss-Seidel preconditioner.


Syntax
sparse_status_t mkl_sparse_s_symgs (const sparse_operation_t operation, const
sparse_matrix_t A, const struct matrix_descr descr, const float alpha, const float *b,
float *x);
sparse_status_t mkl_sparse_d_symgs (const sparse_operation_t operation, const
sparse_matrix_t A, const struct matrix_descr descr, const double alpha, const double
*b, double *x);
sparse_status_t mkl_sparse_c_symgs (const sparse_operation_t operation, const
sparse_matrix_t A, const struct matrix_descr descr, const MKL_Complex8 alpha, const
MKL_Complex8 *b, MKL_Complex8 *x);
sparse_status_t mkl_sparse_z_symgs (const sparse_operation_t operation, const
sparse_matrix_t A, const struct matrix_descr descr, const MKL_Complex16 alpha, const
MKL_Complex16 *b, MKL_Complex16 *x);

Include Files
• mkl_spblas.h

Description
The mkl_sparse_?_symgs routine performs this operation:

x0 := x*alpha;
(L + D)*x1 = b - U*x0;
(U + D)*x = b - L*x1;
where A = L + D + U.

NOTE
This routine is not supported for sparse matrices in BSR, COO, or CSC formats. It supports
only the CSR format. Additionally, only symmetric matrices are supported, so descr.type
must be SPARSE_MATRIX_TYPE_SYMMETRIC.
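The three steps of the operation amount to a forward Gauss-Seidel sweep followed by a backward sweep, each updating x in place so that already-computed entries are used immediately. This plain-C sketch assumes the matrix is stored in full (both triangles) zero-based CSR with a nonzero diagonal entry in every row, which differs from the single-triangle storage that SPARSE_MATRIX_TYPE_SYMMETRIC implies; symgs_reference and gs_update_row are hypothetical names, and this is not MKL's implementation.

```c
/* One Gauss-Seidel row update: x[i] := (b[i] - sum_{j != i} A(i,j)*x[j]) / A(i,i).
 * Entries of x already updated in the current sweep are used as-is. */
static void gs_update_row(int i, const int *rowptr, const int *col,
                          const double *val, const double *b, double *x)
{
    double sum = b[i], diag = 0.0;
    for (int p = rowptr[i]; p < rowptr[i + 1]; ++p) {
        if (col[p] == i)
            diag = val[p];
        else
            sum -= val[p] * x[col[p]];
    }
    x[i] = sum / diag;
}

/* Symmetric Gauss-Seidel on a matrix stored in full zero-based CSR:
 *   x0 := x*alpha;  (L + D)*x1 = b - U*x0;  (U + D)*x = b - L*x1. */
void symgs_reference(int m, const int *rowptr, const int *col,
                     const double *val, double alpha, const double *b, double *x)
{
    for (int i = 0; i < m; ++i)
        x[i] *= alpha;                      /* x0 := x*alpha */
    for (int i = 0; i < m; ++i)             /* forward sweep produces x1 */
        gs_update_row(i, rowptr, col, val, b, x);
    for (int i = m - 1; i >= 0; --i)        /* backward sweep produces x */
        gs_update_row(i, rowptr, col, val, b, x);
}
```

In the forward sweep, row i sees new values for j < i (x1) and old values for j > i (x0), which is exactly (L + D)*x1 = b - U*x0; the backward sweep mirrors this for (U + D)*x = b - L*x1.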

Input Parameters

operation Specifies the operation performed on matrix A.

SPARSE_OPERATION_NON_TRANSPOSE, op(A) := A.

NOTE
Transpose (SPARSE_OPERATION_TRANSPOSE) and conjugate
transpose (SPARSE_OPERATION_CONJUGATE_TRANSPOSE) are not
supported.

A Handle which contains the sparse matrix A.

alpha Specifies the scalar alpha.

descr Structure specifying sparse matrix properties.

sparse_matrix_type_t type - Specifies the type of a sparse matrix:

SPARSE_MATRIX_TYPE_GENERAL The matrix is processed as is.

SPARSE_MATRIX_TYPE_SYMMETRIC The matrix is symmetric (only the requested
triangle is processed).

SPARSE_MATRIX_TYPE_HERMITIAN The matrix is Hermitian (only the requested
triangle is processed).

SPARSE_MATRIX_TYPE_TRIANGULAR The matrix is triangular (only the requested
triangle is processed).

SPARSE_MATRIX_TYPE_DIAGONAL The matrix is diagonal (only diagonal elements
are processed).

SPARSE_MATRIX_TYPE_BLOCK_TRIANGULAR The matrix is block-triangular (only the requested
triangle is processed). Applies to BSR format only.

SPARSE_MATRIX_TYPE_BLOCK_DIAGONAL The matrix is block-diagonal (only diagonal
blocks are processed). Applies to BSR format only.

sparse_fill_mode_t mode - Specifies the triangular matrix part for
symmetric, Hermitian, triangular, and block-triangular matrices:

SPARSE_FILL_MODE_LOWER The lower triangular matrix part is processed.

SPARSE_FILL_MODE_UPPER The upper triangular matrix part is processed.

sparse_diag_type_t diag - Specifies diagonal type for non-general
matrices:

SPARSE_DIAG_NON_UNIT Diagonal elements might not be equal to one.

SPARSE_DIAG_UNIT Diagonal elements are equal to one.

x Array of size at least m, where m is the number of rows of matrix A.

On entry, the array x must contain the vector x.

b Array of size at least m, where m is the number of rows of matrix A.

On entry, the array b must contain the vector b.

Output Parameters

x Overwritten by the computed vector x.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.


SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED Execution failed.

SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.

mkl_sparse_?_symgs_mv
Computes a symmetric Gauss-Seidel preconditioner
followed by a matrix-vector multiplication.

Syntax
sparse_status_t mkl_sparse_s_symgs_mv (const sparse_operation_t operation, const
sparse_matrix_t A, const struct matrix_descr descr, const float alpha, const float *b,
float *x, float *y);
sparse_status_t mkl_sparse_d_symgs_mv (const sparse_operation_t operation, const
sparse_matrix_t A, const struct matrix_descr descr, const double alpha, const double
*b, double *x, double *y);
sparse_status_t mkl_sparse_c_symgs_mv (const sparse_operation_t operation, const
sparse_matrix_t A, const struct matrix_descr descr, const MKL_Complex8 alpha, const
MKL_Complex8 *b, MKL_Complex8 *x, MKL_Complex8 *y);
sparse_status_t mkl_sparse_z_symgs_mv (const sparse_operation_t operation, const
sparse_matrix_t A, const struct matrix_descr descr, const MKL_Complex16 alpha, const
MKL_Complex16 *b, MKL_Complex16 *x, MKL_Complex16 *y);

Include Files
• mkl_spblas.h

Description
The mkl_sparse_?_symgs_mv routine performs this operation:

x0 := x*alpha;
(L + D)*x1 = b - U*x0;
(U + D)*x = b - L*x1;
y := A*x
where A = L + D + U
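The last step, y := A*x, is an ordinary sparse matrix-vector product applied to the freshly updated x. A minimal plain-C sketch for zero-based CSR is shown below; csr_spmv is a hypothetical name, not an MKL routine.

```c
/* y := A*x for an m-row matrix in zero-based CSR. */
void csr_spmv(int m, const int *rowptr, const int *col, const double *val,
              const double *x, double *y)
{
    for (int i = 0; i < m; ++i) {
        double sum = 0.0;
        for (int p = rowptr[i]; p < rowptr[i + 1]; ++p)
            sum += val[p] * x[col[p]];
        y[i] = sum;
    }
}
```

In an iterative solver, a caller could, for example, use y to form the residual b - y for the updated x.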

Input Parameters

operation Specifies the operation performed on input matrix.

SPARSE_OPERATION_NON_TRANSPOSE, op(A) = A.


NOTE
Transpose (SPARSE_OPERATION_TRANSPOSE) and conjugate
transpose (SPARSE_OPERATION_CONJUGATE_TRANSPOSE) are not
supported.

A Handle which contains the sparse matrix A.

alpha Specifies the scalar alpha.

descr Structure specifying sparse matrix properties.

sparse_matrix_type_t type - Specifies the type of a sparse matrix:

SPARSE_MATRIX_TYPE_GENERAL The matrix is processed as is.

SPARSE_MATRIX_TYPE_SYMMETRIC The matrix is symmetric (only the requested
triangle is processed).

SPARSE_MATRIX_TYPE_HERMITIAN The matrix is Hermitian (only the requested
triangle is processed).

SPARSE_MATRIX_TYPE_TRIANGULAR The matrix is triangular (only the requested
triangle is processed).

SPARSE_MATRIX_TYPE_DIAGONAL The matrix is diagonal (only diagonal elements
are processed).

SPARSE_MATRIX_TYPE_BLOCK_TRIANGULAR The matrix is block-triangular (only the requested
triangle is processed). Applies to BSR format only.

SPARSE_MATRIX_TYPE_BLOCK_DIAGONAL The matrix is block-diagonal (only diagonal
blocks are processed). Applies to BSR format only.

sparse_fill_mode_t mode - Specifies the triangular matrix part for
symmetric, Hermitian, triangular, and block-triangular matrices:

SPARSE_FILL_MODE_LOWER The lower triangular matrix part is processed.

SPARSE_FILL_MODE_UPPER The upper triangular matrix part is processed.

sparse_diag_type_t diag - Specifies diagonal type for non-general
matrices:

SPARSE_DIAG_NON_UNIT Diagonal elements might not be equal to one.

SPARSE_DIAG_UNIT Diagonal elements are equal to one.

x Array of size at least m, where m is the number of rows of matrix A.

On entry, the array x must contain the vector x.

b Array of size at least m, where m is the number of rows of matrix A.

On entry, the array b must contain the vector b.


Output Parameters

x Overwritten by the computed vector x.

y Array of size at least m, where m is the number of rows of matrix A.

Overwritten by the computed vector y.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED Execution failed.

SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.

mkl_sparse_syrk
Computes the product of a sparse matrix with its
transpose (or conjugate transpose) and stores the
result in a newly allocated sparse matrix.

Syntax
sparse_status_t mkl_sparse_syrk (const sparse_operation_t operation, const
sparse_matrix_t A, sparse_matrix_t *C);

Include Files
• mkl_spblas.h

Description
The mkl_sparse_syrk routine performs a sparse matrix-matrix operation that results in a sparse matrix C
that is either symmetric (real case) or Hermitian (complex case):

C := A*op(A)
or

C := op(A)*A
depending on the operation parameter, where op is the transpose for real matrices or the conjugate
transpose for complex matrices.
Here, A and C are sparse matrices.

NOTE This routine is not supported for sparse matrices in COO or CSC formats. It supports
only CSR and BSR formats. Additionally, this routine supports only the sorted CSR and
sorted BSR formats for the input matrix. If data is unsorted, call the mkl_sparse_order
routine before either mkl_sparse_syrk or mkl_sparse_?_syrkd.
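Sorted column indices make a simple formulation of C = A*A^T possible: each entry C(i,j) is the dot product of rows i and j of A, and when both rows are sorted they can be intersected in a single merge-style pass. The plain-C sketch below assumes zero-based CSR with sorted column indices and, for brevity, writes the upper triangle of C into a dense row-major array rather than a sparse handle. It illustrates the arithmetic only and is not a description of MKL's internal algorithm; the names are hypothetical.

```c
/* Dot product of rows r and s of a zero-based CSR matrix. Walking the two
 * rows in lockstep is valid only when column indices are sorted in each row. */
static double sorted_row_dot(const int *rowptr, const int *col, const double *val,
                             int r, int s)
{
    int p = rowptr[r], pend = rowptr[r + 1];
    int q = rowptr[s], qend = rowptr[s + 1];
    double sum = 0.0;
    while (p < pend && q < qend) {
        if (col[p] < col[q])
            ++p;
        else if (col[p] > col[q])
            ++q;
        else
            sum += val[p++] * val[q++];   /* matching column index */
    }
    return sum;
}

/* Upper triangle of C = A*A^T (real, non-transpose case) written into a dense
 * row-major m-by-m array; entries below the diagonal are left untouched. */
void syrk_upper_reference(int m, const int *rowptr, const int *col,
                          const double *val, double *c)
{
    for (int i = 0; i < m; ++i)
        for (int j = i; j < m; ++j)
            c[i * m + j] = sorted_row_dot(rowptr, col, val, i, j);
}
```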

Input Parameters

operation Specifies the operation op() on the input matrix.

SPARSE_OPERATION_NON_TRANSPOSE Non-transpose, C := A*op(A), where
op(*) is the transpose for real matrices and the conjugate transpose for
complex matrices.
SPARSE_OPERATION_TRANSPOSE Transpose, C := (A^T)*A for a real matrix A.
SPARSE_OPERATION_CONJUGATE_TRANSPOSE Conjugate transpose, C :=
(A^H)*A for a complex matrix A.

A Handle which contains the sparse matrix A.

Output Parameters

C Handle which contains the resulting sparse matrix. Only the upper-
triangular part of the matrix is computed.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED Execution failed.

SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.

mkl_sparse_?_syrkd
Computes the product of a sparse matrix with its
transpose (or conjugate transpose) and stores the
result as a dense matrix.

Syntax
sparse_status_t mkl_sparse_s_syrkd (sparse_operation_t operation, const sparse_matrix_t
A, float alpha, float beta, float *C, sparse_layout_t layout, MKL_INT ldc);
sparse_status_t mkl_sparse_d_syrkd (sparse_operation_t operation, const sparse_matrix_t
A, double alpha, double beta, double *C, sparse_layout_t layout, MKL_INT ldc);
sparse_status_t mkl_sparse_c_syrkd (sparse_operation_t operation, const sparse_matrix_t
A, const MKL_Complex8 alpha, MKL_Complex8 beta, MKL_Complex8 *C, sparse_layout_t
layout, MKL_INT ldc);
sparse_status_t mkl_sparse_z_syrkd (sparse_operation_t operation, const sparse_matrix_t
A, MKL_Complex16 alpha, MKL_Complex16 beta, MKL_Complex16 *C, sparse_layout_t layout,
const MKL_INT ldc);


Include Files
• mkl_spblas.h

Description
The mkl_sparse_?_syrkd routine performs a sparse matrix-matrix operation which results in a dense
matrix C that is either symmetric (real case) or Hermitian (complex case):

C := beta*C + alpha*A*op(A)
or

C := beta*C + alpha*op(A)*A
depending on the matrix modifier op which can be the transpose for real matrices or conjugate transpose for
complex matrices. Here, A is a sparse matrix and C is a dense matrix.
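For the real transpose case C := beta*C + alpha*A^T*A, the product can be accumulated row by row: each row of A contributes the outer product of that row with itself. The plain-C sketch below does this for A in zero-based CSR and a column-major dense C with leading dimension ldc, touching only the upper triangle of C as the routine does. The function name is hypothetical, and treating beta == 0 as an overwrite is an assumption borrowed from BLAS convention.

```c
#include <stddef.h>

/* Reference for the real transpose case C := beta*C + alpha*A^T*A.
 * A is m-by-k in zero-based CSR; C is a dense k-by-k column-major array with
 * leading dimension ldc. Only the upper triangle (i <= j) is read or written. */
void syrkd_transpose_reference(int m, int k, const int *rowptr, const int *col,
                               const double *val, double alpha, double beta,
                               double *c, int ldc)
{
    /* Scale the upper triangle; beta == 0 overwrites (assumed convention). */
    for (int j = 0; j < k; ++j)
        for (int i = 0; i <= j; ++i) {
            double *cij = &c[i + (size_t)j * ldc];
            *cij = (beta == 0.0) ? 0.0 : beta * *cij;
        }
    /* Row p of A adds alpha * A(p,i) * A(p,j) to C(i,j) for each stored pair. */
    for (int p = 0; p < m; ++p)
        for (int s = rowptr[p]; s < rowptr[p + 1]; ++s)
            for (int t = rowptr[p]; t < rowptr[p + 1]; ++t)
                if (col[s] <= col[t])
                    c[col[s] + (size_t)col[t] * ldc] += alpha * val[s] * val[t];
}
```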

Input Parameters

operation Specifies the operation op() performed on the input matrix.

SPARSE_OPERATION_NON_TRANSPOSE Non-transpose, C := beta*C +
alpha*A*op(A), where op(*) is the transpose (real matrices) or conjugate
transpose (complex matrices).
SPARSE_OPERATION_TRANSPOSE Transpose, C := beta*C + alpha*(A^T)*A
for a real matrix A.
SPARSE_OPERATION_CONJUGATE_TRANSPOSE Conjugate transpose, C :=
beta*C + alpha*(A^H)*A for a complex matrix A.

A Handle which contains the sparse matrix A.

alpha Scalar parameter alpha.

beta Scalar parameter beta.

layout Describes the storage scheme for the dense matrix.

SPARSE_LAYOUT_COLUMN_MAJOR Storage of elements uses column-major layout.
SPARSE_LAYOUT_ROW_MAJOR Storage of elements uses row-major layout.

ldc Leading dimension of matrix C.

NOTE
Only the upper triangular part of matrix C is processed. Therefore, you must set real values
of alpha and beta for complex matrices in order to obtain a Hermitian matrix.

Output Parameters

C Resulting dense matrix. Only the upper triangular part of the matrix is
computed.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED Execution failed.

SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.

mkl_sparse_?_dotmv
Computes a sparse matrix-vector product followed by
a dot product.

Syntax
sparse_status_t mkl_sparse_s_dotmv (const sparse_operation_t operation, const float
alpha, const sparse_matrix_t A, const struct matrix_descr descr, const float *x, const
float beta, float *y, float *d);
sparse_status_t mkl_sparse_d_dotmv (const sparse_operation_t operation, const double
alpha, const sparse_matrix_t A, const struct matrix_descr descr, const double *x, const
double beta, double *y, double *d);
sparse_status_t mkl_sparse_c_dotmv (const sparse_operation_t operation, const
MKL_Complex8 alpha, const sparse_matrix_t A, const struct matrix_descr descr, const
MKL_Complex8 *x, const MKL_Complex8 beta, MKL_Complex8 *y, MKL_Complex8 *d);
sparse_status_t mkl_sparse_z_dotmv (const sparse_operation_t operation, const
MKL_Complex16 alpha, const sparse_matrix_t A, const struct matrix_descr descr, const
MKL_Complex16 *x, const MKL_Complex16 beta, MKL_Complex16 *y, MKL_Complex16 *d);

Include Files
• mkl_spblas.h

Description
The mkl_sparse_?_dotmv routine computes a sparse matrix-vector product followed by a dot product:

y := alpha*op(A)*x + beta*y
d := ∑_i x_i*y_i (real case)
d := ∑_i conj(x_i)*y_i (complex case)
where
• alpha and beta are scalars.
• x and y are vectors.
• A is an m-by-k matrix.


• conj represents complex conjugation.

• op(A) is a matrix modifier.
Available options for op(A) are A, A^T, or A^H.

NOTE
For sparse matrices in the BSR format, the supported combinations of
(indexing,block_layout) are:
• (SPARSE_INDEX_BASE_ZERO, SPARSE_LAYOUT_ROW_MAJOR )
• (SPARSE_INDEX_BASE_ONE, SPARSE_LAYOUT_COLUMN_MAJOR )
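Because each y[i] is final as soon as row i of the product is done, the dot product can be accumulated in the same loop over rows. The plain-C sketch below shows the real non-transpose case for a zero-based CSR matrix. It assumes A is square (m == k), since otherwise x and y would have different lengths in the non-transpose case; dotmv_reference is a hypothetical name, it returns d instead of writing through a pointer, and treating beta == 0 as an overwrite is an assumed convention.

```c
/* Real non-transpose case of the fused operation:
 *   y := alpha*A*x + beta*y,  then  d := sum_i x[i]*y[i].
 * A is a square n-by-n matrix in zero-based CSR. Returns d. */
double dotmv_reference(int n, const int *rowptr, const int *col, const double *val,
                       double alpha, const double *x, double beta, double *y)
{
    double d = 0.0;
    for (int i = 0; i < n; ++i) {
        double ax = 0.0;
        for (int p = rowptr[i]; p < rowptr[i + 1]; ++p)
            ax += val[p] * x[col[p]];
        y[i] = alpha * ax + (beta == 0.0 ? 0.0 : beta * y[i]);
        d += x[i] * y[i];   /* y[i] is final here, so accumulate the dot product */
    }
    return d;
}
```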

Input Parameters

operation Specifies the operation performed on matrix A.

If operation = SPARSE_OPERATION_NON_TRANSPOSE, op(A) = A.

If operation = SPARSE_OPERATION_TRANSPOSE, op(A) = A^T.

If operation = SPARSE_OPERATION_CONJUGATE_TRANSPOSE, op(A) = A^H.

alpha Specifies the scalar alpha.

A Handle which contains the sparse matrix A.

descr Structure specifying sparse matrix properties.

sparse_matrix_type_t type - Specifies the type of a sparse matrix:

SPARSE_MATRIX_TYPE_GENERAL The matrix is processed as is.

SPARSE_MATRIX_TYPE_SYMMETRIC The matrix is symmetric (only the requested
triangle is processed).

SPARSE_MATRIX_TYPE_HERMITIAN The matrix is Hermitian (only the requested
triangle is processed).

SPARSE_MATRIX_TYPE_TRIANGULAR The matrix is triangular (only the requested
triangle is processed).

SPARSE_MATRIX_TYPE_DIAGONAL The matrix is diagonal (only diagonal elements
are processed).

SPARSE_MATRIX_TYPE_BLOCK_TRIANGULAR The matrix is block-triangular (only the requested
triangle is processed). Applies to BSR format only.

SPARSE_MATRIX_TYPE_BLOCK_DIAGONAL The matrix is block-diagonal (only diagonal
blocks are processed). Applies to BSR format only.

sparse_fill_mode_t mode - Specifies the triangular matrix part for
symmetric, Hermitian, triangular, and block-triangular matrices:

SPARSE_FILL_MODE_LOWER The lower triangular matrix part is processed.

SPARSE_FILL_MODE_UPPER The upper triangular matrix part is processed.

sparse_diag_type_t diag - Specifies diagonal type for non-general
matrices:

SPARSE_DIAG_NON_UNIT Diagonal elements might not be equal to one.

SPARSE_DIAG_UNIT Diagonal elements are equal to one.

x If operation = SPARSE_OPERATION_NON_TRANSPOSE, array of size at least
k, where k is the number of columns of matrix A.
Otherwise, array of size at least m, where m is the number of rows of
matrix A.
On entry, the array x must contain the vector x.

beta Specifies the scalar beta.

y If operation = SPARSE_OPERATION_NON_TRANSPOSE, array of size at least
m, where m is the number of rows of matrix A.
Otherwise, array of size at least k, where k is the number of columns of
matrix A.
On entry, the array y must contain the vector y.

Output Parameters

y Overwritten by the updated vector y.

d Overwritten by the dot product of x and y.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED Execution failed.

SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.

mkl_sparse_?_sorv
Computes a forward sweep, a backward sweep, or a symmetric
successive over-relaxation (SSOR) preconditioner operation.

Syntax

sparse_status_t mkl_sparse_s_sorv(
const sparse_sor_type_t type,
const struct matrix_descr descrA,
const sparse_matrix_t A,
float omega,
float alpha,
float* x,
float* b
);

sparse_status_t mkl_sparse_d_sorv(
const sparse_sor_type_t type,
const struct matrix_descr descrA,
const sparse_matrix_t A,
double omega,
double alpha,
double* x,
double* b
);

Include Files
• mkl_spblas.h

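Description
The mkl_sparse_?_sorv routine performs one of the following operations:

SPARSE_SOR_FORWARD:
$$(\omega L + D)\,x^1 = \omega b - (\omega U + (\omega - 1) D)\,x^0$$

SPARSE_SOR_BACKWARD:
$$(\omega U + D)\,x^1 = \omega b - (\omega L + (\omega - 1) D)\,x^0$$

SPARSE_SOR_SYMMETRIC: Performs application of a symmetric SOR (SSOR) preconditioner.

where A = L + D + U, with L the strictly lower triangular part, D the diagonal, and U the strictly upper triangular part of A; x^0 is the input vector x scaled by the input parameter alpha (x^0 = alpha*x); and x^1 is the output stored in vector x.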

NOTE
Currently this routine only supports the following configuration:
• CSR format of the input matrix
• SPARSE_SOR_FORWARD operation
• General matrix (descr.type is SPARSE_MATRIX_TYPE_GENERAL) or symmetric matrix with full
portrait and unit diagonal (descr.type is SPARSE_MATRIX_TYPE_SYMMETRIC, descr.mode is
SPARSE_FILL_MODE_FULL, and descr.diag is SPARSE_DIAG_UNIT)

NOTE
Currently, this routine is optimized only for sequential threading execution mode.

Warning It is currently not allowed to place a sorv call in a parallel section (e.g., under
#pragma omp parallel), because it is not thread-safe in this scenario. This limitation will be
addressed in one of the upcoming releases.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/
PerformanceIndex.
Notice revision #20201201

Input Parameters

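type Specifies the operation performed by the SORV preconditioner.

SPARSE_SOR_FORWARD Performs a forward sweep as defined by:
$$(\omega L + D)\,x^1 = \omega b - (\omega U + (\omega - 1) D)\,x^0$$

SPARSE_SOR_BACKWARD Performs a backward sweep as defined by:
$$(\omega U + D)\,x^1 = \omega b - (\omega L + (\omega - 1) D)\,x^0$$

SPARSE_SOR_SYMMETRIC The preconditioner matrix can be expressed as:
$$P = \frac{1}{\omega(2 - \omega)}\,(D + \omega L)\,D^{-1}\,(D + \omega U)$$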

descr Structure specifying sparse matrix properties.

sparse_matrix_type_t type Specifies the type of a sparse matrix:

• SPARSE_MATRIX_TYPE_GENERAL
The matrix is processed as is.
• SPARSE_MATRIX_TYPE_SYMMETRIC
The matrix is symmetric (only the requested triangle is processed).
• SPARSE_MATRIX_TYPE_HERMITIAN
The matrix is Hermitian (only the requested triangle is processed).
• SPARSE_MATRIX_TYPE_TRIANGULAR
The matrix is triangular (only the requested triangle is processed).
• SPARSE_MATRIX_TYPE_DIAGONAL
The matrix is diagonal (only diagonal elements are processed).
• SPARSE_MATRIX_TYPE_BLOCK_TRIANGULAR
The matrix is block-triangular (only the requested triangle is processed). Applies to BSR format only.
• SPARSE_MATRIX_TYPE_BLOCK_DIAGONAL
The matrix is block-diagonal (only diagonal blocks are processed). Applies to BSR format only.

sparse_fill_mode_t mode Specifies the triangular matrix part for symmetric, Hermitian, triangular, and block-triangular matrices:

• SPARSE_FILL_MODE_LOWER
The lower triangular matrix part is processed.
• SPARSE_FILL_MODE_UPPER
The upper triangular matrix part is processed.

sparse_diag_type_t diag Specifies diagonal type for non-general matrices:

• SPARSE_DIAG_NON_UNIT
Diagonal elements might not be equal to one.
• SPARSE_DIAG_UNIT
Diagonal elements are equal to one.

A Handle containing internal data.

omega Relaxation factor.

alpha Scaling factor applied to the vector x that holds the initial guess (x^0 = alpha*x). It can be used
to normalize the initial guess or, with alpha = 0, to set it to zero.

x Initial guess on input.

b Right-hand side.

Output Parameters

x Solution vector on output.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED Execution failed.

SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.
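To make the forward-sweep formula concrete, here is a plain-C reference sketch of a single SPARSE_SOR_FORWARD step on a general square matrix. It is not the oneMKL implementation and does not call oneMKL: the function name ref_dsor_forward and the raw 0-based CSR arguments (row_ptr, col_idx, val), which stand in for the sparse_matrix_t handle, are hypothetical, and the sketch assumes every row stores a nonzero diagonal entry. Rearranging row i of (omega*L + D)*x1 = omega*b - (omega*U + (omega - 1)*D)*x0 yields the familiar in-place update used below.

```c
#include <assert.h>

/* Hypothetical reference sketch, not oneMKL code: one forward SOR sweep.
 * A is a general n-by-n matrix in 0-based CSR format (row_ptr, col_idx, val).
 * Row i of (omega*L + D) x1 = omega*b - (omega*U + (omega - 1)*D) x0
 * rearranges to
 *   x1[i] = (1 - omega)*x0[i]
 *         + omega*(b[i] - sum_{j<i} a_ij*x1[j] - sum_{j>i} a_ij*x0[j]) / a_ii,
 * so x can be overwritten in place while sweeping rows in increasing order. */
void ref_dsor_forward(int n, const int *row_ptr, const int *col_idx,
                      const double *val, double omega, double alpha,
                      double *x, const double *b)
{
    /* x0 = alpha*x; alpha == 0 is treated as an explicit zero initial guess. */
    for (int i = 0; i < n; ++i)
        x[i] = (alpha == 0.0) ? 0.0 : alpha * x[i];

    for (int i = 0; i < n; ++i) {
        double diag = 0.0, sigma = 0.0;
        for (int p = row_ptr[i]; p < row_ptr[i + 1]; ++p) {
            int j = col_idx[p];
            if (j == i)
                diag = val[p];
            else
                sigma += val[p] * x[j]; /* x[j] holds x1 for j < i, x0 for j > i */
        }
        assert(diag != 0.0); /* this sketch requires a stored nonzero diagonal */
        x[i] = (1.0 - omega) * x[i] + omega * (b[i] - sigma) / diag;
    }
}
```

With omega = 1 the sweep reduces to a Gauss-Seidel step; calling it repeatedly with alpha = 1 reuses the previous result as the next initial guess.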

BLAS-like Extensions
Intel® oneAPI Math Kernel Library provides C and Fortran routines that extend the functionality of the BLAS
routines. These include routines to compute vector products, matrix-vector products, and matrix-matrix
products.
Intel® oneAPI Math Kernel Library also provides routines to perform certain data manipulation, including
matrix in-place and out-of-place transposition operations combined with simple matrix arithmetic operations.
The transposition operations are Copy As Is, Conjugate transpose, Transpose, and Conjugate. Each routine can
also scale the data during the transposition through its alpha and/or beta parameters, and each routine
supports both row-major and column-major ordering.
Table “BLAS-like Extensions” lists these routines.
The <?> symbol in the routine short names is a precision prefix that indicates the data type:

s float

d double

c MKL_Complex8

z MKL_Complex16
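The combination of transposition and scaling described above can be illustrated with a small plain-C reference sketch of B := alpha*op(A) for real, row-major data. It is not the oneMKL implementation; the function name ref_domatcopy_rowmajor and the single-character trans codes ('N' copy as is, 'T' transpose, 'R' conjugate, 'C' conjugate transpose) are assumptions of this sketch. For real data conjugation has no effect, so 'R' behaves like 'N' and 'C' like 'T'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference sketch, not oneMKL code: out-of-place
 * B := alpha * op(A) for a real row-major rows-by-cols matrix A.
 * lda and ldb are the row strides (in elements) of A and B.
 * The trans codes 'N', 'T', 'R', 'C' are an assumption of this sketch. */
void ref_domatcopy_rowmajor(char trans, size_t rows, size_t cols,
                            double alpha, const double *A, size_t lda,
                            double *B, size_t ldb)
{
    /* For real data, conjugate transpose equals transpose and
     * conjugate equals copy as is. */
    int transpose = (trans == 'T' || trans == 't' || trans == 'C' || trans == 'c');
    assert(lda >= cols);
    assert(ldb >= (transpose ? rows : cols));

    for (size_t i = 0; i < rows; ++i)
        for (size_t j = 0; j < cols; ++j) {
            double v = alpha * A[i * lda + j];
            if (transpose)
                B[j * ldb + i] = v;   /* B is cols-by-rows */
            else
                B[i * ldb + j] = v;   /* B is rows-by-cols */
        }
}
```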

BLAS-like Extensions
Routine                         Data Types         Description
cblas_?axpby                    s, d, c, z         Scales two vectors, adds them to one another, and stores the result in the vector.
cblas_?axpy_batch               s, d, c, z         Computes groups of vector-scalar products added to a vector.
cblas_?axpy_batch_strided
cblas_?dgmm_batch               s, d, c, z         Computes groups of diagonal matrix-general matrix products.
cblas_?dgmm_batch_strided
cblas_?gemm_batch               s, d, c, z         Computes scalar-matrix-matrix products and adds the results to scalar-matrix products for groups of general matrices.
cblas_?gemm_batch_strided
cblas_gemm_bf16bf16f32          bfloat16           Computes a matrix-matrix product with general matrices of bfloat16 data type.
cblas_gemm_bf16bf16f32_compute  bfloat16           Computes a matrix-matrix product with general matrices of bfloat16 data type where one or both input matrices are stored in a packed data structure, and adds the result to a scalar-matrix product.
cblas_gemm_f16f16f32            half precision     Computes a matrix-matrix product with general matrices of half precision data type.
cblas_gemm_f16f16f32_compute    half precision     Computes a matrix-matrix product with general matrices of half precision data type where one or both input matrices are stored in a packed data structure, and adds the result to a scalar-matrix product.
cblas_gemm_*                    Integer            Computes a matrix-matrix product with general integer matrices.
cblas_?gemm_compute             h, s, d            Computes a matrix-matrix product with general matrices where one or both input matrices are stored in a packed data structure and adds the result to a scalar-matrix product.
cblas_gemm_*_compute            Integer            Computes a matrix-matrix product with general integer matrices where one or both input matrices are stored in a packed data structure and adds the result to a scalar-matrix product.
cblas_?gemm_pack                h, s, d            Performs scaling and packing of the matrix into the previously allocated buffer.
cblas_gemm_*_pack               Integer, bfloat16  Packs the matrix into the previously allocated buffer.
cblas_?gemm_pack_get_size       h, s, d            Returns the number of bytes required to store the packed matrix.
cblas_gemm_*_pack_get_size      Integer, bfloat16  Returns the number of bytes required to store the packed matrix.
cblas_?gemm3m                   c, z               Computes a scalar-matrix-matrix product using matrix multiplications and adds the result to a scalar-matrix product.
cblas_?gemm3m_batch             c, z               Computes a scalar-matrix-matrix product using matrix multiplications and adds the result to a scalar-matrix product.
cblas_?gemm3m_batch_strided
cblas_?gemmt                    s, d, c, z         Computes a matrix-matrix product with general matrices but updates only the upper or lower triangular part of the result matrix.
cblas_?gemv_batch               s, d, c, z         Computes groups of matrix-vector products using general matrices.
cblas_?gemv_batch_strided
cblas_?trsm_batch               s, d, c, z         Solves a triangular matrix equation for a group of matrices.
cblas_?trsm_batch_strided
mkl_?imatcopy                   s, d, c, z         Performs scaling and in-place transposition/copying of matrices.
mkl_?imatcopy_batch             s, d, c, z         Computes groups of in-place matrix copy/transposition with scaling using general matrices.
mkl_?imatcopy_batch_strided
mkl_?omatadd                    s, d, c, z         Performs scaling and sum of two matrices including their out-of-place transposition/copying.
mkl_?omatcopy                   s, d, c, z         Performs scaling and out-of-place transposition/copying of matrices.
mkl_?omatcopy_batch             s, d, c, z         Computes groups of out-of-place matrix copy/transposition with scaling using general matrices.
mkl_?omatcopy_batch_strided
mkl_?omatcopy2                  s, d, c, z         Performs two-strided scaling and out-of-place transposition/copying of matrices.
mkl_jit_create_?gemm            s, d, c, z         Creates a handle on a jitter and generates a GEMM kernel that computes a scalar-matrix-matrix product and adds the result to a scalar-matrix product, with general matrices.
mkl_jit_destroy                                    Deletes the previously created jitter and the generated GEMM kernel.
mkl_jit_get_?gemm_ptr           s, d, c, z         Returns the GEMM kernel previously generated.

cblas_?axpy_batch
Computes a group of vector-scalar products added to
a vector.

Syntax
void cblas_saxpy_batch (const MKL_INT *n_array, const float *alpha_array, const float
**x_array, const MKL_INT *incx_array, float **y_array, const MKL_INT *incy_array, const
MKL_INT group_count, const MKL_INT *group_size_array);
void cblas_daxpy_batch (const MKL_INT *n_array, const double *alpha_array, const double
**x_array, const MKL_INT *incx_array, double **y_array, const MKL_INT *incy_array,
const MKL_INT group_count, const MKL_INT *group_size_array);
void cblas_caxpy_batch (const MKL_INT *n_array, const void *alpha_array, const void
**x_array, const MKL_INT *incx_array, void **y_array, const MKL_INT *incy_array, const
MKL_INT group_count, const MKL_INT *group_size_array);
void cblas_zaxpy_batch (const MKL_INT *n_array, const void *alpha_array, const void
**x_array, const MKL_INT *incx_array, void **y_array, const MKL_INT *incy_array, const
MKL_INT group_count, const MKL_INT *group_size_array);

Description
The cblas_?axpy_batch routines perform a series of scalar-vector products added to a vector. They are
similar to their cblas_?axpy routine counterparts, but the cblas_?axpy_batch routines perform vector
operations with a group of vectors. Each group contains vectors with the same parameters (size, increments,
and alpha), while those parameters may vary between groups.
The operation is defined as

idx = 0
for i = 0 … group_count – 1
    n, alpha, incx, incy and group_size at position i in n_array, alpha_array, incx_array,
    incy_array and group_size_array
    for j = 0 … group_size – 1
        x and y are vectors of size n at position idx in x_array and y_array
        y := alpha * x + y
        idx := idx + 1
    end for
end for

The number of entries in x_array and y_array is total_batch_count, which is the sum of all the group_size
entries.

Input Parameters

n_array Array of size group_count. For the group i, n_i = n_array[i] is the
number of elements in vectors x and y.

alpha_array Array of size group_count. For the group i, alpha_i = alpha_array[i] is
the scalar alpha.

x_array Array of size total_batch_count of pointers used to store x vectors. The
array allocated for the x vectors of the group i must be of size at least (1 +
(n_i - 1)*abs(incx_i)).

incx_array Array of size group_count. For the group i, incx_i = incx_array[i] is the
stride of vector x.

y_array Array of size total_batch_count of pointers used to store y vectors. The
array allocated for the y vectors of the group i must be of size at least (1 +
(n_i - 1)*abs(incy_i)).

incy_array Array of size group_count. For the group i, incy_i = incy_array[i] is the
stride of vector y.

group_count Number of groups. Must be at least 0.

group_size_array Array of size group_count. The element group_size_array[i] is the
number of vectors in the group i. Each element in group_size_array must be
at least 0.

Output Parameters

y_array Array of pointers holding the total_batch_count updated vectors y.
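The grouping and the running index idx in the pseudocode above can be illustrated with a plain-C reference sketch. It is not oneMKL code and does not call cblas_daxpy_batch; ref_daxpy_batch and ref_daxpy are hypothetical names, int stands in for MKL_INT, and negative increments are assumed to follow the reference BLAS convention (traversal starts from the last stored element).

```c
#include <assert.h>

/* Hypothetical reference sketch, not oneMKL code: the group semantics of
 * cblas_daxpy_batch.  int stands in for MKL_INT. */

/* y := alpha*x + y for one strided vector pair; a negative increment walks
 * the vector backwards (reference BLAS convention, assumed here). */
static void ref_daxpy(int n, double alpha, const double *x, int incx,
                      double *y, int incy)
{
    int ix = incx < 0 ? (1 - n) * incx : 0;
    int iy = incy < 0 ? (1 - n) * incy : 0;
    for (int k = 0; k < n; ++k, ix += incx, iy += incy)
        y[iy] += alpha * x[ix];
}

void ref_daxpy_batch(const int *n_array, const double *alpha_array,
                     const double **x_array, const int *incx_array,
                     double **y_array, const int *incy_array,
                     int group_count, const int *group_size_array)
{
    assert(group_count >= 0);
    int idx = 0; /* running position in x_array / y_array across all groups */
    for (int i = 0; i < group_count; ++i) {
        assert(group_size_array[i] >= 0);
        /* every (x, y) pair in group i shares n, alpha, incx and incy */
        for (int j = 0; j < group_size_array[i]; ++j, ++idx)
            ref_daxpy(n_array[i], alpha_array[i], x_array[idx], incx_array[i],
                      y_array[idx], incy_array[i]);
    }
}
```

For example, with n_array = {2, 3}, alpha_array = {2.0, -1.0} and group_size_array = {2, 1}, three vectors are updated: the first two with n = 2 and alpha = 2.0, the third with n = 3 and alpha = -1.0.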

cblas_?axpy_batch_strided
Computes a group of vector-scalar products added to
a vector.

Syntax
void cblas_saxpy_batch_strided (const MKL_INT n, const float alpha, const float *x,
const MKL_INT incx, const MKL_INT stridex, float *y, const MKL_INT incy, const MKL_INT
stridey, const MKL_INT batch_size);
void cblas_daxpy_batch_strided (const MKL_INT n, const double alpha, const double *x,
const MKL_INT incx, const MKL_INT stridex, double *y, const MKL_INT incy, const MKL_INT
stridey, const MKL_INT batch_size);
void cblas_caxpy_batch_strided (const MKL_INT n, const void *alpha, const void *x, const
MKL_INT incx, const MKL_INT stridex, void *y, const MKL_INT incy, const MKL_INT
stridey, const MKL_INT batch_size);
void cblas_zaxpy_batch_strided (const MKL_INT n, const void *alpha, const void *x, const
MKL_INT incx, const MKL_INT stridex, void *y, const MKL_INT incy, const MKL_INT
stridey, const MKL_INT batch_size);

Include Files
• mkl.h

Description
The cblas_?axpy_batch_strided routines perform a series of scalar-vector products added to a vector.
They are similar to their cblas_?axpy routine counterparts, but the cblas_?axpy_batch_strided routines
perform vector operations with a group of vectors.
All vectors x (respectively, y) have the same parameters (size, increments) and are stored at a constant
distance stridex (respectively, stridey) from each other. The operation is defined as

for i = 0 … batch_size – 1
    X and Y are vectors at offset i * stridex and i * stridey in x and y
    Y = alpha * X + Y
end for

Input Parameters

n Number of elements in vectors x and y.

alpha Specifies the scalar alpha.

x Array of size at least stridex*batch_size holding the x vectors.

incx Specifies the increment for the elements of x.

stridex Stride between two consecutive x vectors; must be at least zero.

y Array of size at least stridey*batch_size holding the y vectors.

incy Specifies the increment for the elements of y.

stridey Stride between two consecutive y vectors; must be at least (1 +

(n-1)*abs(incy)).

batch_size Number of axpy computations to perform; also the number of x and y vectors. Must be at
least 0.

Output Parameters

y Array holding the batch_size updated vectors y.
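The strided layout can be sketched in plain C as follows. This is a reference illustration, not oneMKL code: ref_daxpy_batch_strided is a hypothetical name, int stands in for MKL_INT, and negative increments are assumed to follow the reference BLAS convention.

```c
#include <assert.h>

/* Hypothetical reference sketch, not oneMKL code: cblas_daxpy_batch_strided
 * semantics.  Vector i starts at offset i*stridex in x and i*stridey in y.
 * int stands in for MKL_INT. */
void ref_daxpy_batch_strided(int n, double alpha, const double *x, int incx,
                             int stridex, double *y, int incy, int stridey,
                             int batch_size)
{
    assert(batch_size >= 0);
    for (int i = 0; i < batch_size; ++i) {
        const double *xi = x + (long)i * stridex; /* stridex == 0 reuses one x */
        double *yi = y + (long)i * stridey;
        /* negative increments walk backwards (reference BLAS convention) */
        int ix = incx < 0 ? (1 - n) * incx : 0;
        int iy = incy < 0 ? (1 - n) * incy : 0;
        for (int k = 0; k < n; ++k, ix += incx, iy += incy)
            yi[iy] += alpha * xi[ix];
    }
}
```

Because stridex may be zero, the same x vector can be applied to every y vector in the batch; stridey must still be at least (1 + (n-1)*abs(incy)) so that the y vectors do not overlap.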

cblas_?axpby
Scales two vectors, adds them to one another and
stores result in the vector.

Syntax
void cblas_saxpby (const MKL_INT n, const float a, const float *x, const MKL_INT incx,
const float b, float *y, const MKL_INT incy);
void cblas_daxpby (const MKL_INT n, const double a, const double *x, const MKL_INT
incx, const double b, double *y, const MKL_INT incy);


void cblas_caxpby (const MKL_INT n, const void *a, const void *x, const MKL_INT incx,
const void *b, void *y, const MKL_INT incy);
void cblas_zaxpby (const MKL_INT n, const void *a, const void *x, const MKL_INT incx,
const void *b, void *y, const MKL_INT incy);

Include Files
• mkl.h

Description

The ?axpby routines perform a vector-vector operation defined as

y := a*x + b*y
where:
a and b are scalars
x and y are vectors each with n elements.

Input Parameters

n Specifies the number of elements in vectors x and y.

a Specifies the scalar a.

x Array, size at least (1 + (n-1)*abs(incx)).

incx Specifies the increment for the elements of x.

b Specifies the scalar b.

y Array, size at least (1 + (n-1)*abs(incy)).

incy Specifies the increment for the elements of y.

Output Parameters

y Contains the updated vector y.

Example
For examples of routine usage, see these code examples in the Intel® oneAPI Math Kernel Library (oneMKL)
installation directory:

• cblas_saxpby: examples\cblas\source\cblas_saxpbyx.c
• cblas_daxpby: examples\cblas\source\cblas_daxpbyx.c
• cblas_caxpby: examples\cblas\source\cblas_caxpbyx.c
• cblas_zaxpby: examples\cblas\source\cblas_zaxpbyx.c
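For reference, the operation y := a*x + b*y with strided access can be sketched in plain C. This is not the oneMKL implementation and is no substitute for the shipped examples listed above; the name ref_daxpby is hypothetical, int stands in for MKL_INT, and the handling of negative increments (starting from the last stored element) follows the reference BLAS convention, which is assumed to apply here as well.

```c
#include <assert.h>

/* Hypothetical reference sketch, not oneMKL code: y := a*x + b*y.
 * A negative increment traverses the vector from its last stored element
 * (reference BLAS convention, assumed).  int stands in for MKL_INT. */
void ref_daxpby(int n, double a, const double *x, int incx,
                double b, double *y, int incy)
{
    if (n <= 0)
        return;               /* nothing to do, as in the reference BLAS */
    assert(x && y);
    int ix = incx < 0 ? (1 - n) * incx : 0;
    int iy = incy < 0 ? (1 - n) * incy : 0;
    for (int i = 0; i < n; ++i, ix += incx, iy += incy)
        y[iy] = a * x[ix] + b * y[iy];
}
```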

cblas_?copy_batch
Computes a group of vector copies.

Syntax
void cblas_scopy_batch (const MKL_INT *n_array, const float **x_array, const MKL_INT
*incx_array, float **y_array, const MKL_INT *incy_array, const MKL_INT group_count,
const MKL_INT *group_size_array);

void cblas_dcopy_batch (const MKL_INT *n_array, const double **x_array, const MKL_INT
*incx_array, double **y_array, const MKL_INT *incy_array, const MKL_INT group_count,
const MKL_INT *group_size_array);
void cblas_ccopy_batch (const MKL_INT *n_array, const void **x_array, const MKL_INT
*incx_array, void **y_array, const MKL_INT *incy_array, const MKL_INT group_count,
const MKL_INT *group_size_array);
void cblas_zcopy_batch (const MKL_INT *n_array, const void **x_array, const MKL_INT
*incx_array, void **y_array, const MKL_INT *incy_array, const MKL_INT group_count,
const MKL_INT *group_size_array);

Description
The cblas_?copy_batch routines perform a series of vector copies. They are similar to their cblas_?copy
routine counterparts, but the cblas_?copy_batch routines perform vector operations with a group of
vectors. Each group contains vectors with the same parameters (size and increment), while those
parameters may vary between groups.
The operation is defined as follows:

idx = 0
for i = 0 … group_count – 1
    n, incx, incy and group_size at position i in n_array, incx_array, incy_array
    and group_size_array
    for j = 0 … group_size – 1
        x and y are vectors of size n at position idx in x_array and y_array
        y := x
        idx := idx + 1
    end for
end for

The number of entries in x_array and y_array is total_batch_count, which is the sum of all the
group_size entries.

Input Parameters

n_array Array of size group_count. For the group i, n_i = n_array[i] is the
number of elements in the vectors x and y.
x_array Array of size total_batch_count of pointers used to store x vectors.
The array allocated for the x vectors of the group i must be of size
at least (1 + (n_i - 1)*abs(incx_i)).
incx_array Array of size group_count. For the group i, incx_i = incx_array[i]
is the increment (or stride) between two consecutive elements of
the vector x.
y_array Array of size total_batch_count of pointers used to store the output
vectors y. The array allocated for the y vectors of the group i must
be of size at least (1 + (n_i - 1)*abs(incy_i)).
incy_array Array of size group_count. For the group i, incy_i = incy_array[i]
is the increment (or stride) between two consecutive elements of
the vector y.
group_count Number of groups. Must be at least 0.

group_size_array Array of size group_count. The element group_size_array[i] is the

number of vectors in the group i. Each element in
group_size_array must be at least 0.


Output Parameters

y_array Array of pointers holding the total_batch_count copied vectors y.
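The following plain-C reference sketch (not oneMKL code) mirrors the pseudocode above and returns the final value of the running index, which equals total_batch_count, so a caller can check that x_array and y_array hold the expected number of entries. The name ref_dcopy_batch is hypothetical, int stands in for MKL_INT, and negative increments are assumed to follow the reference BLAS convention.

```c
#include <assert.h>

/* Hypothetical reference sketch, not oneMKL code: cblas_dcopy_batch
 * semantics.  Returns total_batch_count, the sum of group_size_array[i].
 * int stands in for MKL_INT. */
int ref_dcopy_batch(const int *n_array, const double **x_array,
                    const int *incx_array, double **y_array,
                    const int *incy_array, int group_count,
                    const int *group_size_array)
{
    assert(group_count >= 0);
    int idx = 0; /* running position in x_array / y_array */
    for (int i = 0; i < group_count; ++i) {
        int n = n_array[i], incx = incx_array[i], incy = incy_array[i];
        for (int j = 0; j < group_size_array[i]; ++j, ++idx) {
            const double *x = x_array[idx];
            double *y = y_array[idx];
            /* negative increments walk backwards (reference BLAS convention) */
            int ix = incx < 0 ? (1 - n) * incx : 0;
            int iy = incy < 0 ? (1 - n) * incy : 0;
            for (int k = 0; k < n; ++k, ix += incx, iy += incy)
                y[iy] = x[ix];
        }
    }
    return idx; /* total_batch_count */
}
```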

cblas_?copy_batch_strided
Computes a group of vector copies.

Syntax
void cblas_scopy_batch_strided (const MKL_INT n, const float *x, const MKL_INT incx,
const MKL_INT stridex, float *y, const MKL_INT incy, const MKL_INT stridey, const
MKL_INT batch_size);
void cblas_dcopy_batch_strided (const MKL_INT n, const double *x, const MKL_INT incx,
const MKL_INT stridex, double *y, const MKL_INT incy, const MKL_INT stridey, const
MKL_INT batch_size);
void cblas_ccopy_batch_strided (const MKL_INT n, const void *x, const MKL_INT incx,
const MKL_INT stridex, void *y, const MKL_INT incy, const MKL_INT stridey, const
MKL_INT batch_size);
void cblas_zcopy_batch_strided (const MKL_INT n, const void *x, const MKL_INT incx,
const MKL_INT stridex, void *y, const MKL_INT incy, const MKL_INT stridey, const
MKL_INT batch_size);

Description
The cblas_?copy_batch_strided routines perform a series of vector copies. They are similar to their
cblas_?copy routine counterparts, but the cblas_?copy_batch_strided routines perform vector
operations with a group of vectors.
All vectors x and y have the same parameters (size, increments) and are stored at constant distance
stridex (respectively, stridey) from each other. The operation is defined as follows:

for i = 0 … batch_size – 1
    X and Y are vectors at offset i * stridex and i * stridey in x and y
    Y = X
end for

Input Parameters

n Number of elements in vectors x and y. Must be at least 0.

x Array containing the input vectors. Must be of size at least (1 +

(n-1)*abs(incx)) + (batch_size – 1) * stridex.
incx Increment between two consecutive elements of a single vector in x.

stridex Stride between two consecutive vectors in x. Must be at least (1 +

(n-1)*abs(incx)).
y Array holding the output vectors. Must be of size at least (1 +
(n-1)*abs(incy)) + (batch_size – 1) * stridey.
incy Increment between two consecutive elements of a single vector in y.

stridey Stride between two consecutive y vectors. Must be at least (1 +

(n-1)*abs(incy)).
batch_size Number of copy computations to perform; also the number of x and
y vectors. Must be at least 0.

Output Parameters

y Array holding the batch_size copied vectors y.
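A plain-C reference sketch can make the sizing rule and the stride arithmetic concrete. ref_strided_min_len evaluates (1 + (n-1)*abs(inc)) + (batch_size - 1)*stride, the minimum number of elements the x or y array must hold, and ref_dcopy_batch_strided performs the copies. Both names are hypothetical, long stands in for MKL_INT, and returning 0 for an empty batch is an assumption of this sketch rather than documented behavior.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimum array length (in elements) from the sizing rule
 * (1 + (n-1)*|inc|) + (batch_size-1)*stride.  Returning 0 for an empty
 * batch is an assumption of this hypothetical sketch. */
long ref_strided_min_len(long n, long inc, long stride, long batch_size)
{
    if (n <= 0 || batch_size <= 0)
        return 0;
    return (1 + (n - 1) * labs(inc)) + (batch_size - 1) * stride;
}

/* Hypothetical reference sketch, not oneMKL code: cblas_dcopy_batch_strided
 * semantics.  long stands in for MKL_INT. */
void ref_dcopy_batch_strided(long n, const double *x, long incx, long stridex,
                             double *y, long incy, long stridey, long batch_size)
{
    assert(batch_size >= 0);
    for (long i = 0; i < batch_size; ++i) {
        const double *xi = x + i * stridex;
        double *yi = y + i * stridey;
        /* negative increments walk backwards (reference BLAS convention) */
        long ix = incx < 0 ? (1 - n) * incx : 0;
        long iy = incy < 0 ? (1 - n) * incy : 0;
        for (long k = 0; k < n; ++k, ix += incx, iy += incy)
            yi[iy] = xi[ix];
    }
}
```

For instance, with n = 2, incy = 2 and the smallest allowed stridey = 3, three y vectors need at least 3 + 2*3 = 9 elements, and the elements between vector entries are left untouched.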

cblas_?gemmt
Computes a matrix-matrix product with general
matrices but updates only the upper or lower
triangular part of the result matrix.

Syntax
void cblas_sgemmt (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE transa, const CBLAS_TRANSPOSE transb, const MKL_INT n, const MKL_INT k,
const float alpha, const float *a, const MKL_INT lda, const float *b, const MKL_INT
ldb, const float beta, float *c, const MKL_INT ldc);
void cblas_dgemmt (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE transa, const CBLAS_TRANSPOSE transb, const MKL_INT n, const MKL_INT k,
const double alpha, const double *a, const MKL_INT lda, const double *b, const MKL_INT
ldb, const double beta, double *c, const MKL_INT ldc);
void cblas_cgemmt (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE transa, const CBLAS_TRANSPOSE transb, const MKL_INT n, const MKL_INT k,
const void *alpha, const void *a, const MKL_INT lda, const void *b, const MKL_INT ldb,
const void *beta, void *c, const MKL_INT ldc);
void cblas_zgemmt (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE transa, const CBLAS_TRANSPOSE transb, const MKL_INT n, const MKL_INT k,
const void *alpha, const void *a, const MKL_INT lda, const void *b, const MKL_INT ldb,
const void *beta, void *c, const MKL_INT ldc);

Include Files
• mkl.h

Description
The ?gemmt routines compute a scalar-matrix-matrix product with general matrices and add the result to the
upper or lower part of a scalar-matrix product. These routines are similar to the ?gemm routines, but they
only access and update a triangular part of the square result matrix (see Application Notes below).
The operation is defined as
C := alpha*op(A)*op(B) + beta*C,
where:
op(X) is one of op(X) = X, or op(X) = X^T, or op(X) = X^H,
alpha and beta are scalars,
A, B and C are matrices:
op(A) is an n-by-k matrix,
op(B) is a k-by-n matrix,
C is an n-by-n upper or lower triangular matrix.


Input Parameters

Layout Specifies whether two-dimensional array storage is row-major

(CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the upper or lower triangular part of the array c is used.
If uplo = CblasUpper, then the upper triangular part of the array c is
used. If uplo = CblasLower, then the lower triangular part of the array
c is used.

transa Specifies the form of op(A) used in the matrix multiplication:

if transa = CblasNoTrans, then op(A) = A;

if transa = CblasTrans, then op(A) = A^T;

if transa = CblasConjTrans, then op(A) = A^H.

transb Specifies the form of op(B) used in the matrix multiplication:

if transb = CblasNoTrans, then op(B) = B;

if transb = CblasTrans, then op(B) = B^T;

if transb = CblasConjTrans, then op(B) = B^H.

n Specifies the order of the matrix C. The value of n must be at least zero.

k Specifies the number of columns of the matrix op(A) and the number of
rows of the matrix op(B). The value of k must be at least zero.

alpha Specifies the scalar alpha.

a Array. Its size and required contents depend on Layout and transa:
• Layout = CblasColMajor, transa = CblasNoTrans: size lda * k. Before entry, the leading n-by-k part of the array a must contain the matrix A.
• Layout = CblasColMajor, transa = CblasTrans or CblasConjTrans: size lda * n. Before entry, the leading k-by-n part of the array a must contain the matrix A.
• Layout = CblasRowMajor, transa = CblasNoTrans: size lda * n. Before entry, the leading k-by-n part of the array a must contain the matrix A.
• Layout = CblasRowMajor, transa = CblasTrans or CblasConjTrans: size lda * k. Before entry, the leading n-by-k part of the array a must contain the matrix A.

lda Specifies the leading dimension of a as declared in the calling (sub)program.
• Layout = CblasColMajor, transa = CblasNoTrans: lda must be at least max(1, n).
• Layout = CblasColMajor, transa = CblasTrans or CblasConjTrans: lda must be at least max(1, k).
• Layout = CblasRowMajor, transa = CblasNoTrans: lda must be at least max(1, k).
• Layout = CblasRowMajor, transa = CblasTrans or CblasConjTrans: lda must be at least max(1, n).

b Array. Its size and required contents depend on Layout and transb:
• Layout = CblasColMajor, transb = CblasNoTrans: size ldb * n. Before entry, the leading k-by-n part of the array b must contain the matrix B.
• Layout = CblasColMajor, transb = CblasTrans or CblasConjTrans: size ldb * k. Before entry, the leading n-by-k part of the array b must contain the matrix B.
• Layout = CblasRowMajor, transb = CblasNoTrans: size ldb * k. Before entry, the leading n-by-k part of the array b must contain the matrix B.
• Layout = CblasRowMajor, transb = CblasTrans or CblasConjTrans: size ldb * n. Before entry, the leading k-by-n part of the array b must contain the matrix B.

ldb Specifies the leading dimension of b as declared in the calling (sub)program.
• Layout = CblasColMajor, transb = CblasNoTrans: ldb must be at least max(1, k).
• Layout = CblasColMajor, transb = CblasTrans or CblasConjTrans: ldb must be at least max(1, n).
• Layout = CblasRowMajor, transb = CblasNoTrans: ldb must be at least max(1, n).
• Layout = CblasRowMajor, transb = CblasTrans or CblasConjTrans: ldb must be at least max(1, k).

beta Specifies the scalar beta. When beta is equal to zero, then c need not be
set on input.

c Array, size ldc by n.

When beta is equal to zero, c need not be set on input.

• uplo = CblasUpper: the leading n-by-n upper triangular part of the array c must contain the upper triangular part of the matrix C, and the strictly lower triangular part of c is not referenced.
• uplo = CblasLower: the leading n-by-n lower triangular part of the array c must contain the lower triangular part of the matrix C, and the strictly upper triangular part of c is not referenced.

ldc Specifies the leading dimension of c as declared in the calling

(sub)program. The value of ldc must be at least max(1, n).

Output Parameters

c When uplo = CblasUpper, the upper triangular part of the array c
is overwritten by the upper triangular part of the updated matrix.
When uplo = CblasLower, the lower triangular part of the array c
is overwritten by the lower triangular part of the updated matrix.


Application Notes
These routines only access and update the upper or lower triangular part of the result matrix. This can be
useful when the result is known to be symmetric; for example, when computing a product of the form
C := alpha*B*S*B^T + beta*C, where S and C are symmetric matrices and B is a general matrix. In this case,
first compute A := B*S (which can be done using the corresponding ?symm routine), then compute
C := alpha*A*B^T + beta*C using the ?gemmt routine.
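The triangle-only update can be illustrated with a plain-C reference sketch for the simplest case: column-major storage, op(A) = A, op(B) = B, and real double data. It is not the oneMKL implementation and does not cover the transpose or row-major cases; the function name ref_dgemmt_colmajor_nn and the int upper flag (standing in for CBLAS_UPLO) are hypothetical. Entries of C outside the selected triangle are never read or written, and C is not read when beta is zero.

```c
#include <assert.h>

/* Hypothetical reference sketch, not oneMKL code: gemmt semantics for
 * column-major storage with op(A) = A and op(B) = B:
 *   C := alpha*A*B + beta*C, updating only the triangle selected by upper.
 * A is n-by-k (lda >= n), B is k-by-n (ldb >= k), C is n-by-n (ldc >= n). */
void ref_dgemmt_colmajor_nn(int upper, int n, int k, double alpha,
                            const double *A, int lda, const double *B, int ldb,
                            double beta, double *C, int ldc)
{
    assert(n >= 0 && k >= 0);
    assert(lda >= n && ldb >= k && ldc >= n);
    for (int j = 0; j < n; ++j) {
        int i_start = upper ? 0 : j;     /* upper: rows 0..j   */
        int i_end   = upper ? j + 1 : n; /* lower: rows j..n-1 */
        for (int i = i_start; i < i_end; ++i) {
            double sum = 0.0;
            for (int p = 0; p < k; ++p)
                sum += A[i + p * lda] * B[p + j * ldb];
            /* when beta == 0, C need not be set on input, so it is not read */
            C[i + j * ldc] = alpha * sum
                           + (beta == 0.0 ? 0.0 : beta * C[i + j * ldc]);
        }
    }
}
```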

cblas_?gemm3m
Computes a scalar-matrix-matrix product using matrix
multiplications and adds the result to a scalar-matrix
product.

Syntax
void cblas_cgemm3m (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE transa, const
CBLAS_TRANSPOSE transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const void
*alpha, const void *a, const MKL_INT lda, const void *b, const MKL_INT ldb, const void
*beta, void *c, const MKL_INT ldc);
void cblas_zgemm3m (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE transa, const
CBLAS_TRANSPOSE transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const void
*alpha, const void *a, const MKL_INT lda, const void *b, const MKL_INT ldb, const void
*beta, void *c, const MKL_INT ldc);

Include Files
• mkl.h

Description

The ?gemm3m routines perform a matrix-matrix operation with general complex matrices. These routines are
similar to the ?gemm routines, but they use fewer matrix multiplication operations (see Application Notes
below).
The operation is defined as

C := alpha*op(A)*op(B) + beta*C,
where:
op(x) is one of op(x) = x, or op(x) = x', or op(x) = conjg(x'),
alpha and beta are scalars,
A, B and C are matrices:
op(A) is an m-by-k matrix,
op(B) is a k-by-n matrix,
C is an m-by-n matrix.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major

(CblasRowMajor) or column-major (CblasColMajor).

transa Specifies the form of op(A) used in the matrix multiplication:

if transa=CblasNoTrans, then op(A) = A;

if transa=CblasTrans, then op(A) = A';

if transa=CblasConjTrans, then op(A) = conjg(A').

transb Specifies the form of op(B) used in the matrix multiplication:

if transb=CblasNoTrans, then op(B) = B;

if transb=CblasTrans, then op(B) = B';

if transb=CblasConjTrans, then op(B) = conjg(B').

m Specifies the number of rows of the matrix op(A) and of the matrix C. The
value of m must be at least zero.

n Specifies the number of columns of the matrix op(B) and the number of
columns of the matrix C.
The value of n must be at least zero.

k Specifies the number of columns of the matrix op(A) and the number of
rows of the matrix op(B).

The value of k must be at least zero.

alpha Specifies the scalar alpha.

a
• Layout = CblasColMajor, transa = CblasNoTrans: Array, size lda*k. Before entry, the leading m-by-k part of the array a must contain the matrix A.
• Layout = CblasColMajor, transa = CblasTrans or CblasConjTrans: Array, size lda*m. Before entry, the leading k-by-m part of the array a must contain the matrix A.
• Layout = CblasRowMajor, transa = CblasNoTrans: Array, size lda*m. Before entry, the leading k-by-m part of the array a must contain the matrix A.
• Layout = CblasRowMajor, transa = CblasTrans or CblasConjTrans: Array, size lda*k. Before entry, the leading m-by-k part of the array a must contain the matrix A.

lda Specifies the leading dimension of a as declared in the calling (sub)program.
• Layout = CblasColMajor, transa = CblasNoTrans: lda must be at least max(1, m).
• Layout = CblasColMajor, transa = CblasTrans or CblasConjTrans: lda must be at least max(1, k).
• Layout = CblasRowMajor, transa = CblasNoTrans: lda must be at least max(1, k).
• Layout = CblasRowMajor, transa = CblasTrans or CblasConjTrans: lda must be at least max(1, m).


b Array whose size and contents depend on Layout and transb:

Layout = CblasColMajor, transb = CblasNoTrans: size ldb*n. Before entry, the leading k-by-n part of the array b must contain the matrix B.
Layout = CblasColMajor, transb = CblasTrans or CblasConjTrans: size ldb*k. Before entry, the leading n-by-k part of the array b must contain the matrix B.
Layout = CblasRowMajor, transb = CblasNoTrans: size ldb*k. Before entry, the leading n-by-k part of the array b must contain the matrix B.
Layout = CblasRowMajor, transb = CblasTrans or CblasConjTrans: size ldb*n. Before entry, the leading k-by-n part of the array b must contain the matrix B.

ldb Specifies the leading dimension of b as declared in the calling (sub)program:

Layout = CblasColMajor, transb = CblasNoTrans: ldb must be at least max(1, k).
Layout = CblasColMajor, transb = CblasTrans or CblasConjTrans: ldb must be at least max(1, n).
Layout = CblasRowMajor, transb = CblasNoTrans: ldb must be at least max(1, n).
Layout = CblasRowMajor, transb = CblasTrans or CblasConjTrans: ldb must be at least max(1, k).

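beta Specifies the scalar beta.

When beta is equal to zero, then c need not be set on input.

c Layout = CblasColMajor: Array, size ldc*n. Before entry, the leading m-by-n part of the array c must contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry.

Layout = CblasRowMajor: Array, size ldc*m. Before entry, the leading n-by-m part of the array c must contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry.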

ldc Specifies the leading dimension of c as declared in the calling (sub)program:

Layout = CblasColMajor: ldc must be at least max(1, m).
Layout = CblasRowMajor: ldc must be at least max(1, n).
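The lda, ldb and ldc rules above follow one pattern: a transposed operand is stored with its dimensions swapped, and the leading dimension must be at least the number of rows of the stored array for column-major storage, or its number of columns for row-major storage. The sketch below encodes that pattern. It is an illustration, not a oneMKL function: the enums are hypothetical stand-ins for CBLAS_LAYOUT and CBLAS_TRANSPOSE so the snippet compiles without mkl.h, and long is used in place of MKL_INT.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for CBLAS_LAYOUT / CBLAS_TRANSPOSE; not oneMKL types. */
typedef enum { SKETCH_ROW_MAJOR, SKETCH_COL_MAJOR } sketch_layout;
typedef enum { SKETCH_NO_TRANS, SKETCH_TRANS, SKETCH_CONJ_TRANS } sketch_trans;

static long max1(long x) { return x > 1 ? x : 1; }

/* Minimum leading dimension for a matrix X whose op(X) is rows-by-cols.
   If X is (conjugate-)transposed, the stored array is cols-by-rows.
   Column-major storage needs ld >= stored rows; row-major needs
   ld >= stored columns; both are clamped to at least 1. */
static long min_ld(sketch_layout layout, sketch_trans trans, long rows, long cols)
{
    bool transposed = (trans != SKETCH_NO_TRANS);
    long stored_rows = transposed ? cols : rows;
    long stored_cols = transposed ? rows : cols;
    return layout == SKETCH_COL_MAJOR ? max1(stored_rows) : max1(stored_cols);
}
```

For example, the minimum lda is min_ld(layout, transa, m, k), the minimum ldb is min_ld(layout, transb, k, n), and the minimum ldc is min_ld(layout, SKETCH_NO_TRANS, m, n).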

Output Parameters

c Overwritten by the m-by-n matrix (alpha*op(A)*op(B) + beta*C).

Application Notes
These routines perform a complex matrix multiplication by forming the real and imaginary parts of the input matrices. This uses three real matrix multiplications and five real matrix additions instead of the conventional four real matrix multiplications and two real matrix additions. Replacing four real matrix multiplications with three reduces the time spent in matrix multiplication by about 25%, resulting in significant savings in compute time for large matrices.
If the errors in the floating-point calculations satisfy the following conditions:

$$\mathrm{fl}(x \mathbin{\mathrm{op}} y) = (x \mathbin{\mathrm{op}} y)(1+\delta), \quad |\delta| \le u, \quad \mathrm{op} \in \{\times, /\},$$
$$\mathrm{fl}(x \pm y) = x(1+\alpha) \pm y(1+\beta), \quad |\alpha|, |\beta| \le u,$$

then for an n-by-n matrix $\hat{C} = \mathrm{fl}(C_1 + iC_2) = \mathrm{fl}\big((A_1 + iA_2)(B_1 + iB_2)\big) = \hat{C}_1 + i\hat{C}_2$, the following bounds are satisfied:

$$\|\hat{C}_1 - C_1\| \le 2(n+1)\,u\,\|A\|_\infty \|B\|_\infty + O(u^2),$$
$$\|\hat{C}_2 - C_2\| \le 4(n+4)\,u\,\|A\|_\infty \|B\|_\infty + O(u^2),$$

where $\|A\|_\infty = \max(\|A_1\|_\infty, \|A_2\|_\infty)$ and $\|B\|_\infty = \max(\|B_1\|_\infty, \|B_2\|_\infty)$.

Thus the corresponding matrix multiplications are stable.
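The three-multiplication scheme can be illustrated on dense real arrays. The sketch below uses one standard arrangement of the 3M method: T1 = A1*B1, T2 = A2*B2, T3 = (A1 + A2)(B1 + B2), then C1 = T1 - T2 and C2 = T3 - T1 - T2, which takes exactly three real products and five real matrix additions. This arrangement is an assumption for illustration; oneMKL's internal formulation, blocking and data layout are not described here and may differ. Matrices are column-major with leading dimension equal to the row count, and the function names are hypothetical.

```c
#include <stddef.h>

/* Naive column-major real product C = A*B (A m-by-k, B k-by-n),
   dense storage: leading dimension equals the number of rows. */
static void real_gemm(size_t m, size_t n, size_t k,
                      const double *A, const double *B, double *C)
{
    for (size_t j = 0; j < n; ++j)
        for (size_t i = 0; i < m; ++i) {
            double s = 0.0;
            for (size_t p = 0; p < k; ++p)
                s += A[i + p * m] * B[p + j * k];
            C[i + j * m] = s;
        }
}

/* (A1 + iA2)(B1 + iB2) = C1 + iC2 with three real products.
   Caller-supplied work buffers: sa (m*k), sb (k*n), t1 and t2 (m*n). */
static void complex_gemm_3m(size_t m, size_t n, size_t k,
                            const double *A1, const double *A2,
                            const double *B1, const double *B2,
                            double *C1, double *C2,
                            double *sa, double *sb, double *t1, double *t2)
{
    for (size_t x = 0; x < m * k; ++x) sa[x] = A1[x] + A2[x]; /* addition 1 */
    for (size_t x = 0; x < k * n; ++x) sb[x] = B1[x] + B2[x]; /* addition 2 */
    real_gemm(m, n, k, A1, B1, t1);   /* product 1: T1 = A1*B1 */
    real_gemm(m, n, k, A2, B2, t2);   /* product 2: T2 = A2*B2 */
    real_gemm(m, n, k, sa, sb, C2);   /* product 3: T3, held in C2 */
    for (size_t x = 0; x < m * n; ++x) {
        C1[x] = t1[x] - t2[x];              /* addition 3: real part */
        C2[x] = C2[x] - t1[x] - t2[x];      /* additions 4 and 5: imaginary part */
    }
}
```

For a 1-by-1 check, (1 + 2i)(3 + 4i) gives C1 = -5 and C2 = 10, matching the conventional four-product formula.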

cblas_?gemm_batch
Computes scalar-matrix-matrix products and adds the
results to scalar matrix products for groups of general
matrices.

Syntax
void cblas_sgemm_batch (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE* transa_array,
const CBLAS_TRANSPOSE* transb_array, const MKL_INT* m_array, const MKL_INT* n_array,
const MKL_INT* k_array, const float* alpha_array, const float **a_array, const MKL_INT*
lda_array, const float **b_array, const MKL_INT* ldb_array, const float* beta_array,
float **c_array, const MKL_INT* ldc_array, const MKL_INT group_count, const MKL_INT*
group_size);
void cblas_dgemm_batch (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE* transa_array,
const CBLAS_TRANSPOSE* transb_array, const MKL_INT* m_array, const MKL_INT* n_array,
const MKL_INT* k_array, const double* alpha_array, const double **a_array, const
MKL_INT* lda_array, const double **b_array, const MKL_INT* ldb_array, const double*
beta_array, double **c_array, const MKL_INT* ldc_array, const MKL_INT group_count,
const MKL_INT* group_size);
void cblas_cgemm_batch (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE* transa_array,
const CBLAS_TRANSPOSE* transb_array, const MKL_INT* m_array, const MKL_INT* n_array,
const MKL_INT* k_array, const void *alpha_array, const void **a_array, const MKL_INT*
lda_array, const void **b_array, const MKL_INT* ldb_array, const void *beta_array, void
**c_array, const MKL_INT* ldc_array, const MKL_INT group_count, const MKL_INT*
group_size);
void cblas_zgemm_batch (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE* transa_array,
const CBLAS_TRANSPOSE* transb_array, const MKL_INT* m_array, const MKL_INT* n_array,
const MKL_INT* k_array, const void *alpha_array, const void **a_array, const MKL_INT*
lda_array, const void **b_array, const MKL_INT* ldb_array, const void *beta_array, void
**c_array, const MKL_INT* ldc_array, const MKL_INT group_count, const MKL_INT*
group_size);


Include Files
• mkl.h

Description

The ?gemm_batch routines perform a series of matrix-matrix operations with general matrices. They are similar to their ?gemm counterparts, but operate on groups of matrices, processing several groups in one call. All matrices within a group share the same parameters.
The operation is defined as

idx = 0
for i = 0..group_count - 1
    alpha and beta in alpha_array[i] and beta_array[i]
    for j = 0..group_size[i] - 1
        A, B, and C matrix in a_array[idx], b_array[idx], and c_array[idx]
        C := alpha*op(A)*op(B) + beta*C,
        idx = idx + 1
    end for
end for
where:
op(X) is one of op(X) = X, op(X) = X^T, or op(X) = X^H,
alpha and beta are scalar elements of alpha_array and beta_array,

A, B and C are matrices such that, for m, n, and k which are elements of m_array, n_array, and k_array:

op(A) is an m-by-k matrix,
op(B) is a k-by-n matrix,
C is an m-by-n matrix.
A, B, and C represent matrices stored at addresses pointed to by a_array, b_array, and c_array, respectively. The number of entries in each of a_array, b_array, and c_array is total_batch_count, the sum of all the group_size entries.
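A naive reference implementation of the grouped loop above can help when checking how the per-group parameters (indexed by i) and the matrix pointers (indexed by the running counter idx) line up. This sketch is not oneMKL code: it handles only real double data, column-major storage and op(A) = A, op(B) = B, uses long in place of MKL_INT, and the function names are hypothetical.

```c
/* Naive column-major, no-transpose update C := alpha*A*B + beta*C for one
   m-by-n matrix C with leading dimensions lda, ldb and ldc. */
static void ref_dgemm_nn(long m, long n, long k, double alpha,
                         const double *A, long lda, const double *B, long ldb,
                         double beta, double *C, long ldc)
{
    for (long j = 0; j < n; ++j)
        for (long i = 0; i < m; ++i) {
            double s = 0.0;
            for (long p = 0; p < k; ++p)
                s += A[i + p * lda] * B[p + j * ldb];
            /* When beta == 0, C is not read, so it need not be set on input. */
            C[i + j * ldc] = alpha * s + (beta == 0.0 ? 0.0 : beta * C[i + j * ldc]);
        }
}

/* Grouped walk: scalars and dimensions come from index i (the group),
   matrix pointers from the running index idx. Returns total_batch_count. */
static long ref_dgemm_batch_nn(const long *m_array, const long *n_array, const long *k_array,
                               const double *alpha_array,
                               const double **a_array, const long *lda_array,
                               const double **b_array, const long *ldb_array,
                               const double *beta_array,
                               double **c_array, const long *ldc_array,
                               long group_count, const long *group_size)
{
    long idx = 0;
    for (long i = 0; i < group_count; ++i)
        for (long j = 0; j < group_size[i]; ++j, ++idx)
            ref_dgemm_nn(m_array[i], n_array[i], k_array[i], alpha_array[i],
                         a_array[idx], lda_array[i], b_array[idx], ldb_array[i],
                         beta_array[i], c_array[idx], ldc_array[i]);
    return idx;
}
```

The return value is the number of entries each of the pointer arrays must hold.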

See also gemm for a detailed description of multiplication for general matrices and ?gemm3m_batch, BLAS-
like extension routines for similar matrix-matrix operations.

NOTE
Error checking is not performed for oneMKL Windows* single dynamic libraries for the ?gemm_batch routines.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).

transa_array Array of size group_count. For the group i, transa_i = transa_array[i] specifies the form of op(A) used in the matrix multiplication:

if transa_i = CblasNoTrans, then op(A) = A;

if transa_i = CblasTrans, then op(A) = A^T;

if transa_i = CblasConjTrans, then op(A) = A^H.

transb_array Array of size group_count. For the group i, transb_i = transb_array[i] specifies the form of op(B) used in the matrix multiplication:

if transb_i = CblasNoTrans, then op(B) = B;

if transb_i = CblasTrans, then op(B) = B^T;

if transb_i = CblasConjTrans, then op(B) = B^H.

m_array Array of size group_count. For the group i, m_i = m_array[i] specifies the number of rows of the matrix op(A) and of the matrix C.

The value of each element of m_array must be at least zero.

n_array Array of size group_count. For the group i, n_i = n_array[i] specifies the number of columns of the matrix op(B) and the number of columns of the matrix C.
The value of each element of n_array must be at least zero.

k_array Array of size group_count. For the group i, k_i = k_array[i] specifies the number of columns of the matrix op(A) and the number of rows of the matrix op(B).

The value of each element of k_array must be at least zero.

alpha_array Array of size group_count. For the group i, alpha_array[i] specifies the scalar alpha_i.

a_array Array, size total_batch_count, of pointers to arrays used to store A matrices.

lda_array Array of size group_count. For the group i, lda_i = lda_array[i] specifies the leading dimension of the array storing matrix A as declared in the calling (sub)program:

Layout = CblasColMajor, transa_i = CblasNoTrans: lda_i must be at least max(1, m_i).
Layout = CblasColMajor, transa_i = CblasTrans or CblasConjTrans: lda_i must be at least max(1, k_i).
Layout = CblasRowMajor, transa_i = CblasNoTrans: lda_i must be at least max(1, k_i).
Layout = CblasRowMajor, transa_i = CblasTrans or CblasConjTrans: lda_i must be at least max(1, m_i).

b_array Array, size total_batch_count, of pointers to arrays used to store B matrices.

ldb_array Array of size group_count. For the group i, ldb_i = ldb_array[i] specifies the leading dimension of the array storing matrix B as declared in the calling (sub)program:

Layout = CblasColMajor, transb_i = CblasNoTrans: ldb_i must be at least max(1, k_i).
Layout = CblasColMajor, transb_i = CblasTrans or CblasConjTrans: ldb_i must be at least max(1, n_i).
Layout = CblasRowMajor, transb_i = CblasNoTrans: ldb_i must be at least max(1, n_i).
Layout = CblasRowMajor, transb_i = CblasTrans or CblasConjTrans: ldb_i must be at least max(1, k_i).

beta_array Array of size group_count. For the group i, beta_array[i] specifies the scalar beta_i.
When beta_i is equal to zero, then C matrices in group i need not be set on input.

c_array Array, size total_batch_count, of pointers to arrays used to store C matrices.

ldc_array Array of size group_count. For the group i, ldc_i = ldc_array[i] specifies the leading dimension of all arrays storing matrix C in group i as declared in the calling (sub)program.
When Layout = CblasColMajor, ldc_i must be at least max(1, m_i).

When Layout = CblasRowMajor, ldc_i must be at least max(1, n_i).

group_count Specifies the number of groups. Must be at least 0.

group_size Array of size group_count. The element group_size[i] specifies the number of matrices in group i. Each element in group_size must be at least 0.

Output Parameters

c_array Output buffer, overwritten by total_batch_count matrix multiply operations of the form alpha*op(A)*op(B) + beta*C.
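The pointer arrays a_array, b_array and c_array list one pointer per matrix, group after group. When the matrices of a group are packed back to back in one allocation, the entries can be filled with a short loop. The helper below is hypothetical (not part of oneMKL) and assumes each matrix of a group occupies a fixed number of elements, stride, which must be at least the storage required by the leading-dimension rules. It is shown for c_array; a_array and b_array are filled the same way with const pointers.

```c
#include <stddef.h>

/* Hypothetical helper: set ptrs[first .. first + count - 1] to consecutive
   matrices that are `stride` elements apart inside `buffer`. Returns the
   next free index, so calls can be chained group by group in the order
   the batch routines walk the pointer arrays. */
static size_t fill_pointer_array(double **ptrs, size_t first,
                                 double *buffer, size_t stride, size_t count)
{
    for (size_t j = 0; j < count; ++j)
        ptrs[first + j] = buffer + j * stride;
    return first + count;
}
```

Chaining one call per group, with count = group_size[i], yields exactly total_batch_count entries.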

cblas_?gemm_batch_strided
Computes groups of matrix-matrix products with general matrices.

Syntax
void cblas_sgemm_batch_strided (const CBLAS_LAYOUT layout, const CBLAS_TRANSPOSE
transa, const CBLAS_TRANSPOSE transb, const MKL_INT m, const MKL_INT n, const MKL_INT
k, const float alpha, const float *a, const MKL_INT lda, const MKL_INT stridea, const
float *b, const MKL_INT ldb, const MKL_INT strideb, const float beta, float *c, const
MKL_INT ldc, const MKL_INT stridec, const MKL_INT batch_size);
void cblas_dgemm_batch_strided (const CBLAS_LAYOUT layout, const CBLAS_TRANSPOSE
transa, const CBLAS_TRANSPOSE transb, const MKL_INT m, const MKL_INT n, const MKL_INT
k, const double alpha, const double *a, const MKL_INT lda, const MKL_INT stridea, const
double *b, const MKL_INT ldb, const MKL_INT strideb, const double beta, double *c,
const MKL_INT ldc, const MKL_INT stridec, const MKL_INT batch_size);

void cblas_cgemm_batch_strided (const CBLAS_LAYOUT layout, const CBLAS_TRANSPOSE
transa, const CBLAS_TRANSPOSE transb, const MKL_INT m, const MKL_INT n, const MKL_INT
k, const void *alpha, const void *a, const MKL_INT lda, const MKL_INT stridea, const
void *b, const MKL_INT ldb, const MKL_INT strideb, const void *beta, void *c, const
MKL_INT ldc, const MKL_INT stridec, const MKL_INT batch_size);
void cblas_zgemm_batch_strided (const CBLAS_LAYOUT layout, const CBLAS_TRANSPOSE
transa, const CBLAS_TRANSPOSE transb, const MKL_INT m, const MKL_INT n, const MKL_INT
k, const void *alpha, const void *a, const MKL_INT lda, const MKL_INT stridea, const
void *b, const MKL_INT ldb, const MKL_INT strideb, const void *beta, void *c, const
MKL_INT ldc, const MKL_INT stridec, const MKL_INT batch_size);

Include Files
• mkl.h

Description
The cblas_?gemm_batch_strided routines perform a series of matrix-matrix operations with general matrices. They are similar to their cblas_?gemm counterparts, but operate on a batch of matrices that all share the same parameters.
All a matrices (respectively, b or c matrices) have the same parameters (size, leading dimension, transpose operation, alpha and beta scaling) and are stored at a constant distance stridea (respectively, strideb or stridec) from each other. The operation is defined as

for i = 0..batch_size - 1
    A_i, B_i and C_i are the matrices at offsets i*stridea, i*strideb and i*stridec in a, b and c
    C_i = alpha*op(A_i)*op(B_i) + beta*C_i
end for
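The strided loop above differs from ?gemm_batch only in how the i-th matrices are located: by pointer arithmetic on one base pointer per operand instead of through pointer arrays. A naive reference sketch, restricted for brevity to real double data, column-major storage and op(A) = A, op(B) = B, with hypothetical function names and long in place of MKL_INT:

```c
/* Naive column-major, no-transpose update C := alpha*A*B + beta*C for one
   m-by-n matrix C (A is m-by-k, B is k-by-n). */
static void ref_gemm_nn(long m, long n, long k, double alpha,
                        const double *A, long lda, const double *B, long ldb,
                        double beta, double *C, long ldc)
{
    for (long j = 0; j < n; ++j)
        for (long i = 0; i < m; ++i) {
            double s = 0.0;
            for (long p = 0; p < k; ++p)
                s += A[i + p * lda] * B[p + j * ldb];
            /* C is not read when beta == 0. */
            C[i + j * ldc] = alpha * s + (beta == 0.0 ? 0.0 : beta * C[i + j * ldc]);
        }
}

/* Strided batch: matrix i of each operand starts i*stride elements
   after the base pointer, so no pointer arrays are needed. */
static void ref_gemm_batch_strided_nn(long m, long n, long k, double alpha,
                                      const double *a, long lda, long stridea,
                                      const double *b, long ldb, long strideb,
                                      double beta, double *c, long ldc, long stridec,
                                      long batch_size)
{
    for (long i = 0; i < batch_size; ++i)
        ref_gemm_nn(m, n, k, alpha,
                    a + i * stridea, lda,
                    b + i * strideb, ldb,
                    beta, c + i * stridec, ldc);
}
```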

Input Parameters

layout Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).

transa Specifies op(A), the transposition operation applied to the A matrices:

if transa = CblasNoTrans, then op(A) = A;
if transa = CblasTrans, then op(A) = A^T;
if transa = CblasConjTrans, then op(A) = A^H.

transb Specifies op(B), the transposition operation applied to the B matrices:

if transb = CblasNoTrans, then op(B) = B;
if transb = CblasTrans, then op(B) = B^T;
if transb = CblasConjTrans, then op(B) = B^H.

m Number of rows of the op(A) and C matrices. Must be at least 0.

n Number of columns of the op(B) and C matrices. Must be at least 0.

k Number of columns of the op(A) matrix and number of rows of the op(B)
matrix. Must be at least 0.

alpha Specifies the scalar alpha.


a Array of size at least stridea*batch_size holding the a matrices. Before entry, for each i:

layout = CblasColMajor, transa = CblasNoTrans: the leading m-by-k part of the array a + i*stridea must contain the matrix A_i.
layout = CblasColMajor, transa = CblasTrans or CblasConjTrans: the leading k-by-m part of the array a + i*stridea must contain the matrix A_i.
layout = CblasRowMajor, transa = CblasNoTrans: the leading k-by-m part of the array a + i*stridea must contain the matrix A_i.
layout = CblasRowMajor, transa = CblasTrans or CblasConjTrans: the leading m-by-k part of the array a + i*stridea must contain the matrix A_i.

lda Specifies the leading dimension of the a matrices:

layout = CblasColMajor, transa = CblasNoTrans: lda must be at least max(1,m).
layout = CblasColMajor, transa = CblasTrans or CblasConjTrans: lda must be at least max(1,k).
layout = CblasRowMajor, transa = CblasNoTrans: lda must be at least max(1,k).
layout = CblasRowMajor, transa = CblasTrans or CblasConjTrans: lda must be at least max(1,m).

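stridea Stride between two consecutive a matrices:

layout = CblasColMajor, transa = CblasNoTrans: must be at least lda*k.
layout = CblasColMajor, transa = CblasTrans or CblasConjTrans: must be at least lda*m.
layout = CblasRowMajor, transa = CblasNoTrans: must be at least lda*m.
layout = CblasRowMajor, transa = CblasTrans or CblasConjTrans: must be at least lda*k.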

b Array of size at least strideb*batch_size holding the b matrices. Before entry, for each i:

layout = CblasColMajor, transb = CblasNoTrans: the leading k-by-n part of the array b + i*strideb must contain the matrix B_i.
layout = CblasColMajor, transb = CblasTrans or CblasConjTrans: the leading n-by-k part of the array b + i*strideb must contain the matrix B_i.
layout = CblasRowMajor, transb = CblasNoTrans: the leading n-by-k part of the array b + i*strideb must contain the matrix B_i.
layout = CblasRowMajor, transb = CblasTrans or CblasConjTrans: the leading k-by-n part of the array b + i*strideb must contain the matrix B_i.

ldb Specifies the leading dimension of the b matrices:

layout = CblasColMajor, transb = CblasNoTrans: ldb must be at least max(1,k).
layout = CblasColMajor, transb = CblasTrans or CblasConjTrans: ldb must be at least max(1,n).
layout = CblasRowMajor, transb = CblasNoTrans: ldb must be at least max(1,n).
layout = CblasRowMajor, transb = CblasTrans or CblasConjTrans: ldb must be at least max(1,k).

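strideb Stride between two consecutive b matrices:

layout = CblasColMajor, transb = CblasNoTrans: must be at least ldb*n.
layout = CblasColMajor, transb = CblasTrans or CblasConjTrans: must be at least ldb*k.
layout = CblasRowMajor, transb = CblasNoTrans: must be at least ldb*k.
layout = CblasRowMajor, transb = CblasTrans or CblasConjTrans: must be at least ldb*n.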

beta Specifies the scalar beta.

c Array of size at least stridec*batch_size holding the c matrices.

If layout = CblasColMajor, before entry, the leading m-by-n part of the array c + i*stridec must contain the matrix C_i.
If layout = CblasRowMajor, before entry, the leading n-by-m part of the array c + i*stridec must contain the matrix C_i.

ldc Specifies the leading dimension of the c matrices.

Must be at least max(1,m) if layout=CblasColMajor or max(1,n) if
layout=CblasRowMajor.

stridec Specifies the stride between two consecutive c matrices.

Must be at least ldc*n if layout = CblasColMajor or ldc*m if layout = CblasRowMajor.

batch_size Number of gemm computations to perform, which is also the number of a, b and c matrices. Must be at least 0.

Output Parameters

c Array holding the batch_size updated c matrices.
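The stridea, strideb and stridec minimums listed above share one rule: the leading dimension times the number of columns of the stored array for column-major storage, or times its number of rows for row-major storage, where a transposed operand is stored with its dimensions swapped. A sketch of that rule, with hypothetical enums standing in for CBLAS_LAYOUT and CBLAS_TRANSPOSE and long in place of MKL_INT:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for CBLAS_LAYOUT / CBLAS_TRANSPOSE; not oneMKL types. */
typedef enum { SK_ROW_MAJOR, SK_COL_MAJOR } sk_layout;
typedef enum { SK_NO_TRANS, SK_TRANS, SK_CONJ_TRANS } sk_trans;

/* Minimum distance, in elements, between consecutive matrices X_i whose
   op(X_i) is rows-by-cols and whose leading dimension is ld. */
static long min_stride(sk_layout layout, sk_trans trans, long ld, long rows, long cols)
{
    bool transposed = (trans != SK_NO_TRANS);
    long stored_rows = transposed ? cols : rows;
    long stored_cols = transposed ? rows : cols;
    /* Column-major: ld elements per stored column; row-major: per stored row. */
    return ld * (layout == SK_COL_MAJOR ? stored_cols : stored_rows);
}
```

With op(A) m-by-k, op(B) k-by-n and C m-by-n, the minimums are min_stride(layout, transa, lda, m, k), min_stride(layout, transb, ldb, k, n) and min_stride(layout, SK_NO_TRANS, ldc, m, n).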

cblas_?gemm3m_batch_strided
Computes groups of matrix-matrix products with general matrices.


Syntax
void cblas_cgemm3m_batch_strided (const CBLAS_LAYOUT layout, const CBLAS_TRANSPOSE
transa, const CBLAS_TRANSPOSE transb, const MKL_INT m, const MKL_INT n, const MKL_INT
k, const void *alpha, const void *a, const MKL_INT lda, const MKL_INT stridea, const
void *b, const MKL_INT ldb, const MKL_INT strideb, const void *beta, void *c, const
MKL_INT ldc, const MKL_INT stridec, const MKL_INT batch_size);
void cblas_zgemm3m_batch_strided (const CBLAS_LAYOUT layout, const CBLAS_TRANSPOSE
transa, const CBLAS_TRANSPOSE transb, const MKL_INT m, const MKL_INT n, const MKL_INT
k, const void *alpha, const void *a, const MKL_INT lda, const MKL_INT stridea, const
void *b, const MKL_INT ldb, const MKL_INT strideb, const void *beta, void *c, const
MKL_INT ldc, const MKL_INT stridec, const MKL_INT batch_size);

Include Files
• mkl.h

Description
The cblas_?gemm3m_batch_strided routines perform a series of matrix-matrix operations with general matrices. They are similar to their cblas_?gemm counterparts, but operate on a batch of matrices that all share the same parameters.
All a matrices (respectively, b or c matrices) have the same parameters (size, leading dimension, transpose operation, alpha and beta scaling) and are stored at a constant distance stridea (respectively, strideb or stridec) from each other. The operation is defined as

for i = 0..batch_size - 1
    A_i, B_i and C_i are the matrices at offsets i*stridea, i*strideb and i*stridec in a, b and c
    C_i = alpha*op(A_i)*op(B_i) + beta*C_i
end for
The cblas_?gemm3m_batch_strided routines use fewer matrix multiplications than the cblas_?gemm
routines, as described in the Application Notes below.

Input Parameters

layout Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).

transa Specifies op(A), the transposition operation applied to the A matrices:

if transa = CblasNoTrans, then op(A) = A;
if transa = CblasTrans, then op(A) = A^T;
if transa = CblasConjTrans, then op(A) = A^H.

transb Specifies op(B), the transposition operation applied to the B matrices:

if transb = CblasNoTrans, then op(B) = B;
if transb = CblasTrans, then op(B) = B^T;
if transb = CblasConjTrans, then op(B) = B^H.

m Number of rows of the op(A) and C matrices. Must be at least 0.

n Number of columns of the op(B) and C matrices. Must be at least 0.

k Number of columns of the op(A) matrix and number of rows of the op(B)
matrix. Must be at least 0.

alpha Specifies the scalar alpha.

a Array of size at least stridea*batch_size holding the a matrices. Before entry, for each i:

layout = CblasColMajor, transa = CblasNoTrans: the leading m-by-k part of the array a + i*stridea must contain the matrix A_i.
layout = CblasColMajor, transa = CblasTrans or CblasConjTrans: the leading k-by-m part of the array a + i*stridea must contain the matrix A_i.
layout = CblasRowMajor, transa = CblasNoTrans: the leading k-by-m part of the array a + i*stridea must contain the matrix A_i.
layout = CblasRowMajor, transa = CblasTrans or CblasConjTrans: the leading m-by-k part of the array a + i*stridea must contain the matrix A_i.

lda Specifies the leading dimension of the a matrices:

layout = CblasColMajor, transa = CblasNoTrans: lda must be at least max(1,m).
layout = CblasColMajor, transa = CblasTrans or CblasConjTrans: lda must be at least max(1,k).
layout = CblasRowMajor, transa = CblasNoTrans: lda must be at least max(1,k).
layout = CblasRowMajor, transa = CblasTrans or CblasConjTrans: lda must be at least max(1,m).

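stridea Stride between two consecutive a matrices:

layout = CblasColMajor, transa = CblasNoTrans: must be at least lda*k.
layout = CblasColMajor, transa = CblasTrans or CblasConjTrans: must be at least lda*m.
layout = CblasRowMajor, transa = CblasNoTrans: must be at least lda*m.
layout = CblasRowMajor, transa = CblasTrans or CblasConjTrans: must be at least lda*k.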

b Array of size at least strideb*batch_size holding the b matrices. Before entry, for each i:

layout = CblasColMajor, transb = CblasNoTrans: the leading k-by-n part of the array b + i*strideb must contain the matrix B_i.
layout = CblasColMajor, transb = CblasTrans or CblasConjTrans: the leading n-by-k part of the array b + i*strideb must contain the matrix B_i.
layout = CblasRowMajor, transb = CblasNoTrans: the leading n-by-k part of the array b + i*strideb must contain the matrix B_i.
layout = CblasRowMajor, transb = CblasTrans or CblasConjTrans: the leading k-by-n part of the array b + i*strideb must contain the matrix B_i.

ldb Specifies the leading dimension of the b matrices:

layout = CblasColMajor, transb = CblasNoTrans: ldb must be at least max(1,k).
layout = CblasColMajor, transb = CblasTrans or CblasConjTrans: ldb must be at least max(1,n).
layout = CblasRowMajor, transb = CblasNoTrans: ldb must be at least max(1,n).
layout = CblasRowMajor, transb = CblasTrans or CblasConjTrans: ldb must be at least max(1,k).

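strideb Stride between two consecutive b matrices:

layout = CblasColMajor, transb = CblasNoTrans: must be at least ldb*n.
layout = CblasColMajor, transb = CblasTrans or CblasConjTrans: must be at least ldb*k.
layout = CblasRowMajor, transb = CblasNoTrans: must be at least ldb*k.
layout = CblasRowMajor, transb = CblasTrans or CblasConjTrans: must be at least ldb*n.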

beta Specifies the scalar beta.

c Array of size at least stridec*batch_size holding the c matrices.

ldc Specifies the leading dimension of the c matrices.

Must be at least max(1,m) if layout=CblasColMajor or max(1,n) if
layout=CblasRowMajor.

stridec Specifies the stride between two consecutive c matrices.

Must be at least ldc*n if layout = CblasColMajor or ldc*m if layout = CblasRowMajor.

batch_size Number of gemm computations to perform, which is also the number of a, b and c matrices. Must be at least 0.

Output Parameters

c Array holding the batch_size updated c matrices.

Application Notes
These routines perform a complex matrix multiplication by forming the real and imaginary parts of the input matrices. This uses three real matrix multiplications and five real matrix additions instead of the conventional four real matrix multiplications and two real matrix additions. Replacing four real matrix multiplications with three reduces the time spent in matrix multiplication by about 25%, resulting in significant savings in compute time for large matrices.
If the errors in the floating-point calculations satisfy the following conditions:

$$\mathrm{fl}(x \mathbin{\mathrm{op}} y) = (x \mathbin{\mathrm{op}} y)(1+\delta), \quad |\delta| \le u, \quad \mathrm{op} \in \{\times, /\},$$
$$\mathrm{fl}(x \pm y) = x(1+\alpha) \pm y(1+\beta), \quad |\alpha|, |\beta| \le u,$$

then for an n-by-n matrix $\hat{C} = \mathrm{fl}(C_1 + iC_2) = \mathrm{fl}\big((A_1 + iA_2)(B_1 + iB_2)\big) = \hat{C}_1 + i\hat{C}_2$, the following bounds are satisfied:

$$\|\hat{C}_1 - C_1\| \le 2(n+1)\,u\,\|A\|_\infty \|B\|_\infty + O(u^2),$$
$$\|\hat{C}_2 - C_2\| \le 4(n+4)\,u\,\|A\|_\infty \|B\|_\infty + O(u^2),$$

where $\|A\|_\infty = \max(\|A_1\|_\infty, \|A_2\|_\infty)$ and $\|B\|_\infty = \max(\|B_1\|_\infty, \|B_2\|_\infty)$.

Thus the corresponding matrix multiplications are stable.
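The 25% figure refers to the matrix multiplications. A rough operation count shows how the totals compare once the extra additions are included. This sketch assumes the textbook cost model in which a real m-by-k times k-by-n product takes m*n*k multiplications and m*n*(k - 1) additions; real kernels and hardware behave differently. The function names are hypothetical, and k must be at least 1.

```c
/* Real multiplications for an m-by-k times k-by-n complex product. */
static unsigned long long mults_4m(unsigned long long m, unsigned long long n,
                                   unsigned long long k)
{ return 4ULL * m * n * k; }

static unsigned long long mults_3m(unsigned long long m, unsigned long long n,
                                   unsigned long long k)
{ return 3ULL * m * n * k; }

/* Real additions: each real product adds m*n*(k-1). The conventional method
   then adds two m-by-n matrices; 3M forms A1+A2 (m*k), B1+B2 (k*n) and
   performs three m-by-n additions/subtractions. Requires k >= 1. */
static unsigned long long adds_4m(unsigned long long m, unsigned long long n,
                                  unsigned long long k)
{ return 4ULL * m * n * (k - 1) + 2ULL * m * n; }

static unsigned long long adds_3m(unsigned long long m, unsigned long long n,
                                  unsigned long long k)
{ return 3ULL * m * n * (k - 1) + m * k + k * n + 3ULL * m * n; }
```

For m = n = k = 1000 these give 4.0e9 versus 3.0e9 multiplications and roughly 8.0e9 versus 6.0e9 total operations, so the extra additions of the 3M scheme barely dent the saving for large matrices.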

cblas_?gemm3m_batch
Computes scalar-matrix-matrix products and adds the
results to scalar matrix products for groups of general
matrices.

Syntax
void cblas_cgemm3m_batch (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE*
transa_array, const CBLAS_TRANSPOSE* transb_array, const MKL_INT* m_array, const
MKL_INT* n_array, const MKL_INT* k_array, const void *alpha_array, const void
**a_array, const MKL_INT* lda_array, const void **b_array, const MKL_INT* ldb_array,
const void *beta_array, void **c_array, const MKL_INT* ldc_array, const MKL_INT
group_count, const MKL_INT* group_size);
void cblas_zgemm3m_batch (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE*
transa_array, const CBLAS_TRANSPOSE* transb_array, const MKL_INT* m_array, const
MKL_INT* n_array, const MKL_INT* k_array, const void *alpha_array, const void
**a_array, const MKL_INT* lda_array, const void **b_array, const MKL_INT* ldb_array,
const void *beta_array, void **c_array, const MKL_INT* ldc_array, const MKL_INT
group_count, const MKL_INT* group_size);

Include Files
• mkl.h

Description

The ?gemm3m_batch routines perform a series of matrix-matrix operations with general matrices. They are similar to their ?gemm3m counterparts, but operate on groups of matrices, processing several groups in one call. All matrices within a group share the same parameters. The ?gemm3m_batch routines use fewer matrix multiplications than the ?gemm_batch routines, as described in the Application Notes.
The operation is defined as

idx = 0
for i = 0..group_count - 1
    alpha and beta in alpha_array[i] and beta_array[i]
    for j = 0..group_size[i] - 1
        A, B, and C matrix in a_array[idx], b_array[idx], and c_array[idx]
        C := alpha*op(A)*op(B) + beta*C,
        idx = idx + 1
    end for
end for
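where:
op(X) is one of op(X) = X, op(X) = X^T, or op(X) = X^H,
alpha and beta are scalar elements of alpha_array and beta_array,

A, B and C are matrices such that, for m, n, and k which are elements of m_array, n_array, and k_array:

op(A) is an m-by-k matrix,
op(B) is a k-by-n matrix,
C is an m-by-n matrix.
A, B, and C represent matrices stored at addresses pointed to by a_array, b_array, and c_array, respectively. The number of entries in each of a_array, b_array, and c_array is total_batch_count, the sum of all the group_size entries.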

See also gemm for a detailed description of multiplication for general matrices and gemm_batch, BLAS-like
extension routines for similar matrix-matrix operations.

NOTE
Error checking is not performed for Intel® oneAPI Math Kernel Library (oneMKL) Windows* single dynamic libraries for the ?gemm3m_batch routines.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).

transa_array Array of size group_count. For the group i, transa_i = transa_array[i] specifies the form of op(A) used in the matrix multiplication:

if transa_i = CblasNoTrans, then op(A) = A;

if transa_i = CblasTrans, then op(A) = A^T;

if transa_i = CblasConjTrans, then op(A) = A^H.

transb_array Array of size group_count. For the group i, transb_i = transb_array[i] specifies the form of op(B) used in the matrix multiplication:

if transb_i = CblasNoTrans, then op(B) = B;

if transb_i = CblasTrans, then op(B) = B^T;

if transb_i = CblasConjTrans, then op(B) = B^H.

m_array Array of size group_count. For the group i, m_i = m_array[i] specifies the number of rows of the matrix op(A) and of the matrix C.

The value of each element of m_array must be at least zero.

n_array Array of size group_count. For the group i, n_i = n_array[i] specifies the number of columns of the matrix op(B) and the number of columns of the matrix C.
The value of each element of n_array must be at least zero.

k_array Array of size group_count. For the group i, k_i = k_array[i] specifies the number of columns of the matrix op(A) and the number of rows of the matrix op(B).

The value of each element of k_array must be at least zero.

alpha_array Array of size group_count. For the group i, alpha_array[i] specifies the scalar alpha_i.

a_array Array, size total_batch_count, of pointers to arrays used to store A matrices.

lda_array Array of size group_count. For the group i, lda_i = lda_array[i] specifies the leading dimension of the array storing matrix A as declared in the calling (sub)program:

Layout = CblasColMajor, transa_i = CblasNoTrans: lda_i must be at least max(1, m_i).
Layout = CblasColMajor, transa_i = CblasTrans or CblasConjTrans: lda_i must be at least max(1, k_i).
Layout = CblasRowMajor, transa_i = CblasNoTrans: lda_i must be at least max(1, k_i).
Layout = CblasRowMajor, transa_i = CblasTrans or CblasConjTrans: lda_i must be at least max(1, m_i).

b_array Array, size total_batch_count, of pointers to arrays used to store B matrices.

ldb_array Array of size group_count. For the group i, ldb_i = ldb_array[i] specifies the leading dimension of the array storing matrix B as declared in the calling (sub)program:

Layout = CblasColMajor, transb_i = CblasNoTrans: ldb_i must be at least max(1, k_i).
Layout = CblasColMajor, transb_i = CblasTrans or CblasConjTrans: ldb_i must be at least max(1, n_i).
Layout = CblasRowMajor, transb_i = CblasNoTrans: ldb_i must be at least max(1, n_i).
Layout = CblasRowMajor, transb_i = CblasTrans or CblasConjTrans: ldb_i must be at least max(1, k_i).

beta_array Array of size group_count. For the group i, beta_array[i] specifies the scalar beta_i.

When beta_i is equal to zero, then C matrices in group i need not be set on input.

c_array Array, size total_batch_count, of pointers to arrays used to store C matrices.


ldc_array Array of size group_count. For the group i, ldc_i = ldc_array[i] specifies the leading dimension of all arrays storing matrix C in group i as declared in the calling (sub)program.
When Layout = CblasColMajor, ldc_i must be at least max(1, m_i).

When Layout = CblasRowMajor, ldc_i must be at least max(1, n_i).

group_count Specifies the number of groups. Must be at least 0.

group_size Array of size group_count. The element group_size[i] specifies the number of matrices in group i. Each element in group_size must be at least 0.

Output Parameters

c_array Each C matrix in group i is overwritten by the m_i-by-n_i matrix (alpha_i*op(A)*op(B) + beta_i*C).
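Because error checking is skipped for the Windows* single dynamic libraries (see the NOTE above), a caller may want to validate the group description before the call. The helper below is a hypothetical sketch, not a oneMKL function: it checks only the "must be at least 0" rules for group_count, group_size, m_array, n_array and k_array, uses long in place of MKL_INT, and returns total_batch_count, or -1 on a violation. The leading-dimension rules would need separate checks.

```c
/* Hypothetical validation helper: returns total_batch_count, or -1 if any
   non-negativity requirement from the parameter list is violated. */
static long check_batch_groups(long group_count, const long *group_size,
                               const long *m_array, const long *n_array,
                               const long *k_array)
{
    if (group_count < 0)
        return -1;
    long total = 0;
    for (long i = 0; i < group_count; ++i) {
        if (group_size[i] < 0 || m_array[i] < 0 || n_array[i] < 0 || k_array[i] < 0)
            return -1;
        total += group_size[i];   /* running sum of group sizes */
    }
    return total;
}
```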

Application Notes
These routines perform a complex matrix multiplication by forming the real and imaginary parts of the input matrices. This uses three real matrix multiplications and five real matrix additions instead of the conventional four real matrix multiplications and two real matrix additions. Replacing four real matrix multiplications with three reduces the time spent in matrix multiplication by about 25%, resulting in significant savings in compute time for large matrices.
If the errors in the floating-point calculations satisfy the following conditions:

$$\mathrm{fl}(x \mathbin{\mathrm{op}} y) = (x \mathbin{\mathrm{op}} y)(1+\delta), \quad |\delta| \le u, \quad \mathrm{op} \in \{\times, /\},$$
$$\mathrm{fl}(x \pm y) = x(1+\alpha) \pm y(1+\beta), \quad |\alpha|, |\beta| \le u,$$

then for an n-by-n matrix $\hat{C} = \mathrm{fl}(C_1 + iC_2) = \mathrm{fl}\big((A_1 + iA_2)(B_1 + iB_2)\big) = \hat{C}_1 + i\hat{C}_2$, the following bounds are satisfied:

$$\|\hat{C}_1 - C_1\| \le 2(n+1)\,u\,\|A\|_\infty \|B\|_\infty + O(u^2),$$
$$\|\hat{C}_2 - C_2\| \le 4(n+4)\,u\,\|A\|_\infty \|B\|_\infty + O(u^2),$$

where $\|A\|_\infty = \max(\|A_1\|_\infty, \|A_2\|_\infty)$ and $\|B\|_\infty = \max(\|B_1\|_\infty, \|B_2\|_\infty)$.

Thus the corresponding matrix multiplications are stable.
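Forming the real and imaginary parts of the input, as these notes describe, amounts to separating the components of each complex element. Assuming the usual interleaved storage for complex data (each element stored as its real part followed by its imaginary part, as with MKL_Complex16 or C99 double _Complex), a de-interleaving pass looks like the sketch below. The helper names are hypothetical, and this is not a description of how oneMKL organizes the computation internally.

```c
#include <stddef.h>

/* Hypothetical helper: split `count` interleaved complex values
   (re0, im0, re1, im1, ...) into separate real and imaginary arrays. */
static void split_complex(const double *interleaved, size_t count,
                          double *re, double *im)
{
    for (size_t x = 0; x < count; ++x) {
        re[x] = interleaved[2 * x];
        im[x] = interleaved[2 * x + 1];
    }
}

/* Inverse operation: merge real and imaginary arrays back into
   interleaved form. */
static void merge_complex(const double *re, const double *im, size_t count,
                          double *interleaved)
{
    for (size_t x = 0; x < count; ++x) {
        interleaved[2 * x] = re[x];
        interleaved[2 * x + 1] = im[x];
    }
}
```

Applied to a complex array of lda*k elements, split_complex yields the real arrays A1 and A2 with the same leading dimension.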

cblas_?trsm_batch
Solves a triangular matrix equation for a group of
matrices.

Syntax
void cblas_strsm_batch (const CBLAS_LAYOUT Layout, const CBLAS_SIDE *Side_Array, const
CBLAS_UPLO *Uplo_Array, const CBLAS_TRANSPOSE *TransA_Array, const CBLAS_DIAG
*Diag_Array, const MKL_INT *M_Array, const MKL_INT *N_Array, const float *alpha_Array,
const float * *A_Array, const MKL_INT *lda_Array, float * *B_Array, const MKL_INT
*ldb_Array, const MKL_INT group_count, const MKL_INT *group_size );
void cblas_dtrsm_batch (const CBLAS_LAYOUT Layout, const CBLAS_SIDE *Side_Array, const
CBLAS_UPLO *Uplo_Array, const CBLAS_TRANSPOSE *Transa_Array, const CBLAS_DIAG
*Diag_Array, const MKL_INT *M_Array, const MKL_INT *N_Array, const double *alpha_Array,
const double * *A_Array, const MKL_INT *lda_Array, double * *B_Array, const MKL_INT
*ldb_Array, const MKL_INT group_count, const MKL_INT *group_size );

void cblas_ctrsm_batch (const CBLAS_LAYOUT Layout, const CBLAS_SIDE *Side_Array, const
CBLAS_UPLO *Uplo_Array, const CBLAS_TRANSPOSE *Transa_Array, const CBLAS_DIAG
*Diag_Array, const MKL_INT *M_Array, const MKL_INT *N_Array, const void *alpha_Array,
const void * *A_Array, const MKL_INT *lda_Array, void * *B_Array, const MKL_INT
*ldb_Array, const MKL_INT group_count, const MKL_INT *group_size );
void cblas_ztrsm_batch (const CBLAS_LAYOUT Layout, const CBLAS_SIDE *Side_Array, const
CBLAS_UPLO *Uplo_Array, const CBLAS_TRANSPOSE *Transa_Array, const CBLAS_DIAG
*Diag_Array, const MKL_INT *M_Array, const MKL_INT *N_Array, const void *alpha_Array,
const void * *A_Array, const MKL_INT *lda_Array, void * *B_Array, const MKL_INT
*ldb_Array, const MKL_INT group_count, const MKL_INT *group_size );

Include Files
• mkl.h

Description

The ?trsm_batch routines solve a series of matrix equations. They are similar to the ?trsm routines except that they operate on groups of matrices, where all matrices within a group share the same parameters. The ?trsm_batch routines process a number of groups at once.

idx = 0
for i = 0..group_count - 1
    alpha in alpha_array[i]
    for j = 0..group_size[i] - 1
        A and B matrix in a_array[idx] and b_array[idx]
        Solve op(A)*X = alpha*B
        or
        Solve X*op(A) = alpha*B
        idx = idx + 1
    end for
end for
where:
alpha is a scalar element of alpha_array,

X and B are m-by-n matrices for m and n which are elements of m_array and n_array, respectively,

A is a unit, or non-unit, upper or lower triangular matrix,

and op(A) is one of op(A) = A, or op(A) = A^T, or op(A) = conjg(A^T).

A and B represent matrices stored at addresses pointed to by a_array and b_array, respectively. There are
total_batch_count entries in each of a_array and b_array, where total_batch_count is the sum of all the
group_size entries.
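A naive reference for one configuration of the grouped loop above may help when checking results. The sketch assumes side = CblasLeft, uplo = CblasLower, op(A) = A and column-major storage, solves A*X = alpha*B by forward substitution, and overwrites B with X as ?trsm does. unit_array is a hypothetical stand-in for diag_array (true meaning CblasUnit), long replaces MKL_INT, and the function names are hypothetical; this is not oneMKL's algorithm.

```c
#include <stdbool.h>

/* Forward substitution for a lower triangular A (column-major):
   solves A*X = alpha*B for the m-by-n X, overwriting B. With unit_diag,
   the diagonal of A is taken as 1 and is not read. */
static void ref_trsm_llnn(long m, long n, double alpha, const double *A, long lda,
                          double *B, long ldb, bool unit_diag)
{
    for (long j = 0; j < n; ++j) {
        double *x = B + j * ldb;              /* column j of B becomes column j of X */
        for (long i = 0; i < m; ++i)
            x[i] *= alpha;
        for (long i = 0; i < m; ++i) {
            if (!unit_diag)
                x[i] /= A[i + i * lda];
            for (long r = i + 1; r < m; ++r)  /* eliminate x[i] from later rows */
                x[r] -= A[r + i * lda] * x[i];
        }
    }
}

/* Grouped walk mirroring the pseudocode: parameters per group i,
   matrices by the running index idx. Returns total_batch_count. */
static long ref_trsm_batch_llnn(const long *m_array, const long *n_array,
                                const double *alpha_array, const double **a_array,
                                const long *lda_array, double **b_array,
                                const long *ldb_array, const bool *unit_array,
                                long group_count, const long *group_size)
{
    long idx = 0;
    for (long i = 0; i < group_count; ++i)
        for (long j = 0; j < group_size[i]; ++j, ++idx)
            ref_trsm_llnn(m_array[i], n_array[i], alpha_array[i], a_array[idx],
                          lda_array[i], b_array[idx], ldb_array[i], unit_array[i]);
    return idx;
}
```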

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).

side_array Array of size group_count. For group i, 0 ≤i≤group_count - 1, sidei =

side_array[i] specifies whether op(A) appears on the left or right of X in
the equation:
if sidei = CblasLeft, then op(A)*X = alpha*B;

if sidei = CblasRight, then Xop(A) = alphaB.


uplo_array Array of size group_count. For group i, 0 ≤ i ≤ group_count - 1, uplo_i =
uplo_array[i] specifies whether the matrix A is upper or lower triangular:
if uplo_i = CblasUpper, then the matrix is upper triangular;
if uplo_i = CblasLower, then the matrix is lower triangular.

transa_array Array of size group_count. For group i, 0 ≤ i ≤ group_count - 1, transa_i =
transa_array[i] specifies the form of op(A) used in the matrix equation:
if transa_i = CblasNoTrans, then op(A) = A;
if transa_i = CblasTrans, then op(A) = A';
if transa_i = CblasConjTrans, then op(A) = conjg(A').

diag_array Array of size group_count. For group i, 0 ≤ i ≤ group_count - 1, diag_i =
diag_array[i] specifies whether the matrix A is unit triangular:
if diag_i = CblasUnit, then the matrix is unit triangular;
if diag_i = CblasNonUnit, then the matrix is not unit triangular.

m_array Array of size group_count. For group i, 0 ≤ i ≤ group_count - 1, m_i =
m_array[i] specifies the number of rows of B. The value of m_i must be at
least zero.

n_array Array of size group_count. For group i, 0 ≤ i ≤ group_count - 1, n_i =
n_array[i] specifies the number of columns of B. The value of n_i must be
at least zero.

alpha_array Array of size group_count. For group i, 0 ≤ i ≤ group_count - 1,
alpha_array[i] specifies the scalar alpha_i.

a_array Array, size total_batch_count, of pointers to arrays used to store A
matrices.
For group i, 0 ≤ i ≤ group_count - 1, k is m_i when side_i = CblasLeft and is n_i
when side_i = CblasRight, and a is any of the group_size[i] arrays
starting with a_array[group_size[0] + group_size[1] + ... +
group_size[i - 1]]:
Before entry with uplo_i = CblasUpper, the leading k-by-k upper triangular
part of the array a must contain the upper triangular matrix, and the strictly
lower triangular part of a is not referenced.
Before entry with uplo_i = CblasLower, the leading k-by-k lower triangular part
of the array a must contain the lower triangular matrix, and the strictly upper
triangular part of a is not referenced.
When diag_i = CblasUnit, the diagonal elements of a are not referenced
either, but are assumed to be unity.

lda_array Array of size group_count. For group i, 0 ≤ i ≤ group_count - 1, lda_i =
lda_array[i] specifies the leading dimension of a as declared in the
calling (sub)program. When side_i = CblasLeft, lda_i must be at least
max(1, m_i); when side_i = CblasRight, lda_i must be at least max(1, n_i).

b_array Array, size total_batch_count, of pointers to arrays used to store B
matrices.
For group i, 0 ≤ i ≤ group_count - 1, b is any of the group_size[i] arrays
starting with b_array[group_size[0] + group_size[1] + ... +
group_size[i - 1]]:
For Layout = CblasColMajor: before entry, the leading m_i-by-n_i part of
the array b must contain the matrix B.
For Layout = CblasRowMajor: before entry, the leading n_i-by-m_i part of
the array b must contain the matrix B.

ldb_array Array of size group_count. For group i, 0 ≤ i ≤ group_count - 1, ldb_i =
ldb_array[i] specifies the leading dimension of b as declared in the
calling (sub)program. When Layout = CblasColMajor, ldb_i must be at
least max(1, m_i); otherwise, ldb_i must be at least max(1, n_i).

group_count Specifies the number of groups. Must be at least 0.

group_size Array of size group_count. The element group_size[i] specifies the
number of matrices in group i. Each element in group_size must be at
least 0.

Output Parameters

b_array Overwritten by the solution matrix X.

cblas_?trsm_batch_strided
Solves groups of triangular matrix equations.

Syntax
void cblas_strsm_batch_strided(const CBLAS_LAYOUT layout, const CBLAS_SIDE side, const
CBLAS_UPLO uplo, const CBLAS_TRANSPOSE transa, const CBLAS_DIAG diag, const MKL_INT m,
const MKL_INT n, const float alpha, const float *a, const MKL_INT lda, const MKL_INT
stridea, float *b, const MKL_INT ldb, const MKL_INT strideb, MKL_INT batch_size);
void cblas_dtrsm_batch_strided(const CBLAS_LAYOUT layout, const CBLAS_SIDE side, const
CBLAS_UPLO uplo, const CBLAS_TRANSPOSE transa, const CBLAS_DIAG diag, const MKL_INT m,
const MKL_INT n, const double alpha, const double *a, const MKL_INT lda, const MKL_INT
stridea, double *b, const MKL_INT ldb, const MKL_INT strideb, const MKL_INT
batch_size);
void cblas_ctrsm_batch_strided(const CBLAS_LAYOUT layout, const CBLAS_SIDE side, const
CBLAS_UPLO uplo, const CBLAS_TRANSPOSE transa, const CBLAS_DIAG diag, const MKL_INT m,
const MKL_INT n, const void *alpha, const void *a, const MKL_INT lda, const MKL_INT
stridea, void *b, const MKL_INT ldb, const MKL_INT strideb, const MKL_INT batch_size);
void cblas_ztrsm_batch_strided(const CBLAS_LAYOUT layout, const CBLAS_SIDE side, const
CBLAS_UPLO uplo, const CBLAS_TRANSPOSE transa, const CBLAS_DIAG diag, const MKL_INT m,
const MKL_INT n, const void *alpha, const void *a, const MKL_INT lda, const MKL_INT
stridea, void *b, const MKL_INT ldb, const MKL_INT strideb, const MKL_INT batch_size);


Include Files
• mkl.h

Description
The cblas_?trsm_batch_strided routines solve a series of triangular matrix equations. They are similar to
their cblas_?trsm routine counterparts, but the cblas_?trsm_batch_strided routines solve triangular
matrix equations with groups of matrices. All A matrices have the same parameters (size, leading dimension,
side, uplo, diag, transpose operation) and are stored at a constant stride, stridea, from each other. Similarly,
all B matrices have the same parameters (size, leading dimension, alpha scaling) and are stored at a constant
stride, strideb, from each other.
The operation is defined as

for i = 0 … batch_size – 1
    A and B are matrices at offset i * stridea in a and i * strideb in b
    if side = CblasLeft
        solve op(A)*X = alpha*B
    else
        solve X*op(A) = alpha*B
end for

Input Parameters

layout Specifies whether two-dimensional array storage is row-major
(CblasRowMajor) or column-major (CblasColMajor).

side Specifies whether op(A) appears on the left or right of X in the equation:
if side = CblasLeft, then op(A)*X = alpha*B;
if side = CblasRight, then X*op(A) = alpha*B.

uplo Specifies whether the matrices A are upper or lower triangular.

if uplo = CblasUpper, then A are upper triangular;
if uplo = CblasLower, then A are lower triangular.

transa Specifies op(A), the transposition operation applied to the matrices A:
if transa = CblasNoTrans, then op(A) = A;
if transa = CblasTrans, then op(A) = A';
if transa = CblasConjTrans, then op(A) = conjg(A').

diag Specifies whether the matrices A are unit triangular:
if diag = CblasUnit, then A are unit triangular;
if diag = CblasNonUnit, then A are non-unit triangular.

m Number of rows of the B matrices. Must be at least 0.

n Number of columns of the B matrices. Must be at least 0.

alpha Specifies the scalar alpha.

a Array of size at least stridea*batch_size holding the A matrices. Each A
matrix is stored at a constant stride, stridea, from the next.
Each A matrix has size lda*k, where k is m when side = CblasLeft and is
n when side = CblasRight.
Before entry with uplo = CblasUpper, the leading k-by-k upper triangular
part of the array A must contain the upper triangular matrix, and the strictly
lower triangular part of A is not referenced.
Before entry with uplo = CblasLower, the leading k-by-k lower triangular part
of the array A must contain the lower triangular matrix, and the strictly upper
triangular part of A is not referenced.
When diag = CblasUnit, the diagonal elements of A are not referenced
either, but are assumed to be unity.

lda Specifies the leading dimension of the A matrices. When side = CblasLeft,
lda must be at least max(1, m); when side = CblasRight, lda must be at
least max(1, n).

stridea Stride between two consecutive A matrices.
When side = CblasLeft, stridea must be at least lda*m.
When side = CblasRight, stridea must be at least lda*n.

b Array of size at least strideb*batch_size holding the B matrices. Each B
matrix is stored at a constant stride, strideb, from the next.
When layout = CblasColMajor, each B matrix has size ldb*n. Before entry,
the leading m-by-n part of the array B must contain the matrix B.
When layout = CblasRowMajor, each B matrix has size ldb*m. Before entry,
the leading n-by-m part of the array B must contain the matrix B.

ldb Specifies the leading dimension of the B matrices.
When layout = CblasColMajor, ldb must be at least max(1, m).
Otherwise, ldb must be at least max(1, n).

strideb Stride between two consecutive B matrices.
When layout = CblasColMajor, strideb must be at least ldb*n. Otherwise,
strideb must be at least ldb*m.

batch_size Number of trsm computations to perform. Must be at least 0.

Output Parameters

b Overwritten by the batch_size solution matrices X.
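In the strided variant, matrix i simply starts at a + i*stridea and b + i*strideb. The following is a hypothetical plain-C sketch (not MKL code; function names are illustrative and `int` stands in for MKL_INT) for double data, column-major storage, side = CblasLeft, uplo = CblasUpper, op(A) = A and a non-unit diagonal. The assertions restate the lda, stridea, ldb and strideb minimums listed above for that case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference, not MKL: side = Left, uplo = Upper, op(A) = A,
 * column-major, non-unit diagonal. Solves A*X = alpha*B in place in B. */
static void ref_dtrsm_left_upper(int m, int n, double alpha, const double *a,
                                 int lda, double *b, int ldb)
{
    for (int j = 0; j < n; ++j) {
        double *bj = b + (size_t)j * ldb;            /* column j of B */
        for (int i = 0; i < m; ++i)
            bj[i] *= alpha;
        for (int k = m - 1; k >= 0; --k) {            /* back substitution */
            bj[k] /= a[k + (size_t)k * lda];          /* divide by A(k,k) */
            for (int i = 0; i < k; ++i)
                bj[i] -= bj[k] * a[i + (size_t)k * lda];  /* A(i,k) above diagonal */
        }
    }
}

/* Strided batch: matrix i starts at a + i*stridea and b + i*strideb. */
static void ref_dtrsm_batch_strided_left_upper(int m, int n, double alpha,
        const double *a, int lda, int stridea,
        double *b, int ldb, int strideb, int batch_size)
{
    int min_ld = m > 1 ? m : 1;
    assert(lda >= min_ld && stridea >= lda * m);   /* side = CblasLeft */
    assert(ldb >= min_ld && strideb >= ldb * n);   /* layout = CblasColMajor */
    for (int i = 0; i < batch_size; ++i)
        ref_dtrsm_left_upper(m, n, alpha, a + (size_t)i * stridea, lda,
                             b + (size_t)i * strideb, ldb);
}
```

The strictly lower part of each A is never read, so in the usage below it can hold arbitrary values.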

mkl_?imatcopy
Performs scaling and in-place transposition/copying of
matrices.

Syntax
void mkl_simatcopy (const char ordering, const char trans, size_t rows, size_t cols,
const float alpha, float * AB, size_t lda, size_t ldb);
void mkl_dimatcopy (const char ordering, const char trans, size_t rows, size_t cols,
const double alpha, double * AB, size_t lda, size_t ldb);
void mkl_cimatcopy (const char ordering, const char trans, size_t rows, size_t cols,
const MKL_Complex8 alpha, MKL_Complex8 * AB, size_t lda, size_t ldb);
void mkl_zimatcopy (const char ordering, const char trans, size_t rows, size_t cols,
const MKL_Complex16 alpha, MKL_Complex16 * AB, size_t lda, size_t ldb);


Include Files
• mkl.h

Description

The mkl_?imatcopy routine performs scaling and in-place transposition/copying of matrices. A transposition
operation can be a normal matrix copy, a transposition, a conjugate transposition, or just a conjugation. The
operation is defined as follows:
AB := alpha*op(AB).

NOTE
Different arrays must not overlap.
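To make AB := alpha*op(AB) concrete, here is a hypothetical sketch (ref_dimatcopy_rowmajor is an illustrative name, not an MKL function) for real row-major data. It copies the source into a scratch buffer so that reading with lda and writing with ldb cannot interfere; MKL's routine performs the operation in place and is not implemented this way.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch, not MKL's algorithm: AB := alpha*op(AB) for real
 * row-major data. For real data 'C' acts like 'T' and 'R' like 'N'.
 * ab must be large enough for both the source (lda*rows elements) and the
 * result (ldb*cols when transposed, ldb*rows otherwise). */
static int ref_dimatcopy_rowmajor(char trans, size_t rows, size_t cols,
                                  double alpha, double *ab, size_t lda, size_t ldb)
{
    int t = (trans == 'T' || trans == 't' || trans == 'C' || trans == 'c');
    assert(lda >= cols && ldb >= (t ? rows : cols));
    double *tmp = malloc(rows * cols * sizeof *tmp);  /* scratch copy */
    if (tmp == NULL && rows * cols != 0)
        return -1;
    for (size_t i = 0; i < rows; ++i)                 /* gather the source */
        for (size_t j = 0; j < cols; ++j)
            tmp[i * cols + j] = ab[i * lda + j];
    for (size_t i = 0; i < rows; ++i)                 /* scatter alpha*op(source) */
        for (size_t j = 0; j < cols; ++j) {
            double v = alpha * tmp[i * cols + j];
            if (t)
                ab[j * ldb + i] = v;   /* result is cols x rows */
            else
                ab[i * ldb + j] = v;   /* result is rows x cols */
        }
    free(tmp);
    return 0;
}
```

For example, the 2-by-3 row-major matrix {1, 2, 3, 4, 5, 6} with trans = 'T', alpha = 2, lda = 3 and ldb = 2 becomes the 3-by-2 matrix {2, 8, 4, 10, 6, 12}.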

Input Parameters

ordering Ordering of the matrix storage.

If ordering = 'R' or 'r', the ordering is row-major.

If ordering = 'C' or 'c', the ordering is column-major.

trans Parameter that specifies the operation type.

If trans = 'N' or 'n', op(AB)=AB and the matrix AB is assumed
unchanged on input.
If trans = 'T' or 't', it is assumed that AB should be transposed.

If trans = 'C' or 'c', it is assumed that AB should be conjugate
transposed.
If trans = 'R' or 'r', it is assumed that AB should be only conjugated.

If the data is real, then trans = 'R' is the same as trans = 'N', and
trans = 'C' is the same as trans = 'T'.

rows The number of rows in matrix AB before the transpose operation.

cols The number of columns in matrix AB before the transpose operation.

ab Array holding the matrix AB.

alpha This parameter scales the input matrix by alpha.

lda Distance between the first elements in adjacent columns (in the case of the
column-major order) or rows (in the case of the row-major order) in the
source matrix; measured in the number of elements.
This parameter must be at least rows if ordering = 'C' or 'c', and
max(1,cols) otherwise.

ldb Distance between the first elements in adjacent columns (in the case of the
column-major order) or rows (in the case of the row-major order) in the
destination matrix; measured in the number of elements.
To determine the minimum value of ldb on output, consider the following
guideline:

If ordering = 'C' or 'c', then
• If trans = 'T' or 't' or 'C' or 'c', this parameter must be at least max(1,cols)
• If trans = 'N' or 'n' or 'R' or 'r', this parameter must be at least max(1,rows)
If ordering = 'R' or 'r', then
• If trans = 'T' or 't' or 'C' or 'c', this parameter must be at least max(1,rows)
• If trans = 'N' or 'n' or 'R' or 'r', this parameter must be at least max(1,cols)
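The ldb guideline above reduces to one rule: ldb must be at least the number of rows of op(AB) for column-major ordering, or the number of columns of op(AB) for row-major ordering, and never less than 1. A hypothetical helper that encodes it (min_ldb_matcopy is illustrative, not part of MKL; it returns 0 for an unrecognized flag):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not an MKL function: minimum ldb for the
 * ordering/trans combinations listed in the ldb guideline. */
static size_t min_ldb_matcopy(char ordering, char trans, size_t rows, size_t cols)
{
    int col_major = (ordering == 'C' || ordering == 'c');
    int row_major = (ordering == 'R' || ordering == 'r');
    int transposed = (trans == 'T' || trans == 't' || trans == 'C' || trans == 'c');
    int plain = (trans == 'N' || trans == 'n' || trans == 'R' || trans == 'r');
    if ((!col_major && !row_major) || (!transposed && !plain))
        return 0;                                  /* invalid flag */
    /* op(AB) is cols x rows when transposed, rows x cols otherwise. */
    size_t out_rows = transposed ? cols : rows;
    size_t out_cols = transposed ? rows : cols;
    size_t need = col_major ? out_rows : out_cols;
    return need > 1 ? need : 1;                    /* max(1, ...) */
}
```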

Output Parameters

ab Array containing the updated matrix AB.

Application Notes
For threading to be active in mkl_?imatcopy, the pointer AB must be aligned on the 64-byte boundary. This
requirement can be met by allocating AB with mkl_malloc.


mkl_?imatcopy_batch
Computes a group of in-place scaled matrix copy or
transposition operations on general matrices.

Syntax
void mkl_simatcopy_batch (char layout, const char * trans_array, const size_t *
rows_array, const size_t * cols_array, const float * alpha_array, float ** ab_array,
const size_t * lda_array, const size_t * ldb_array, size_t group_count, const size_t *
group_size);
void mkl_dimatcopy_batch (char layout, const char * trans_array, const size_t *
rows_array, const size_t * cols_array, const double * alpha_array, double ** ab_array,
const size_t * lda_array, const size_t * ldb_array, size_t group_count, const size_t *
group_size);
void mkl_cimatcopy_batch (char layout, const char * trans_array, const size_t *
rows_array, const size_t * cols_array, const MKL_Complex8 * alpha_array, MKL_Complex8 **
ab_array, const size_t * lda_array, const size_t * ldb_array, size_t group_count, const
size_t * group_size);
void mkl_zimatcopy_batch (char layout, const char * trans_array, const size_t *
rows_array, const size_t * cols_array, const MKL_Complex16 * alpha_array, MKL_Complex16
** ab_array, const size_t * lda_array, const size_t * ldb_array, size_t group_count,
const size_t * group_size);


Description
The mkl_?imatcopy_batch routines perform a series of in-place scaled matrix copies or transpositions. They
are similar to their mkl_?imatcopy routine counterparts, but the mkl_?imatcopy_batch routines perform
matrix operations with groups of matrices. All matrices within a group share the same parameters (matrix size,
leading dimension, and scaling parameter), and a single call to mkl_?imatcopy_batch operates on multiple groups.
Unlike with the related mkl_?imatcopy_batch_strided routines, each group can have different parameters.

The operation is defined as

idx = 0
for i = 0..group_count - 1
    m in rows_array[i], n in cols_array[i], and alpha in alpha_array[i]
    for j = 0..group_size[i] - 1
        AB matrix in AB_array[idx]
        AB := alpha*op(AB)
        idx = idx + 1
    end for
end for
Where op(X) is one of op(X)=X, op(X)=X', op(X)=conjg(X'), or op(X)=conjg(X). On entry, AB is an m-by-n
matrix such that m and n are elements of rows_array and cols_array.
AB represents a matrix stored at addresses pointed to by AB_array. The number of entries in AB_array is
total_batch_count = the sum of all of the group_size entries.
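A hypothetical sketch of the grouped loop for real double data. To keep it short, it assumes square matrices (rows == cols) with lda == ldb, where an in-place transposition reduces to swapping elements across the diagonal. The name ref_dimatcopy_batch_square and its combined n_array/ld_array parameters are illustrative simplifications, not the MKL interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch, not MKL: grouped AB := alpha*op(AB) for real square
 * matrices with lda == ldb (passed as ld_array). For square storage the
 * diagonal swap is the same for row- and column-major layouts. For real
 * data 'C' acts like 'T', and 'R' like 'N'. */
static void ref_dimatcopy_batch_square(const char *trans_array, const size_t *n_array,
        const double *alpha_array, double **ab_array, const size_t *ld_array,
        size_t group_count, const size_t *group_size)
{
    size_t idx = 0;
    for (size_t g = 0; g < group_count; ++g) {
        size_t n = n_array[g], ld = ld_array[g];
        double alpha = alpha_array[g];
        char tr = trans_array[g];
        int t = (tr == 'T' || tr == 't' || tr == 'C' || tr == 'c');
        assert(ld >= n);
        for (size_t j = 0; j < group_size[g]; ++j, ++idx) {
            double *ab = ab_array[idx];            /* matrix idx of the batch */
            for (size_t r = 0; r < n; ++r) {
                ab[r * ld + r] *= alpha;           /* diagonal stays in place */
                for (size_t c = r + 1; c < n; ++c) {
                    double upper = ab[r * ld + c], lower = ab[c * ld + r];
                    ab[r * ld + c] = alpha * (t ? lower : upper);
                    ab[c * ld + r] = alpha * (t ? upper : lower);
                }
            }
        }
    }
}
```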

Input Parameters

layout Specifies whether two-dimensional array storage is row-major (R) or
column-major (C).

trans_array Array of size group_count. For the group i, trans = trans_array[i]
specifies the form of op(AB), the transposition operation applied to the AB
matrix:
If trans = 'N' or 'n', op(AB)=AB.
If trans = 'T' or 't', op(AB)=AB'.
If trans = 'C' or 'c', op(AB)=conjg(AB').
If trans = 'R' or 'r', op(AB)=conjg(AB).

rows_array Array of size group_count. Specifies the number of rows of the input
matrix AB. The value of each element must be at least zero.

cols_array Array of size group_count. Specifies the number of columns of the input
matrix AB. The value of each element must be at least zero.

alpha_array Array of size group_count. Specifies the scalar alpha.

AB_array Array of size total_batch_count, holding pointers to arrays used to store AB
matrices.

lda_array Array of size group_count. The leading dimension of the input matrix AB.
It must be positive and at least m if column major layout is used or at least
n if row major layout is used.

ldb_array Array of size group_count. The leading dimension of the output matrix AB.
It must be positive and at least
m if column major layout is used and op(AB) = AB or conjg(AB)
m if row major layout is used and op(AB) = AB' or conjg(AB')
n otherwise

group_count Specifies the number of groups. Must be at least 0.

group_size Array of size group_count. The element group_size[i] specifies the
number of matrices in group i. Each element in group_size must be at
least 0.

Output Parameters

AB_array Output array of size total_batch_count, holding pointers to arrays used to
store the updated AB matrices.

mkl_?imatcopy_batch_strided
Computes a group of in-place scaled matrix copy or
transposition using general matrices.

Syntax
void mkl_simatcopy_batch_strided (const char layout, const char trans, size_t row,
size_t col, const float alpha, float * ab, size_t lda, size_t ldb, size_t stride, size_t
batch_size);
void mkl_dimatcopy_batch_strided (const char layout, const char trans, size_t row,
size_t col, const double alpha, double * ab, size_t lda, size_t ldb, size_t stride,
size_t batch_size);
void mkl_cimatcopy_batch_strided (const char layout, const char trans, size_t row,
size_t col, MKL_Complex8 alpha, MKL_Complex8 * ab, size_t lda, size_t ldb, size_t
stride, size_t batch_size);
void mkl_zimatcopy_batch_strided (const char layout, const char trans, size_t row,
size_t col, MKL_Complex16 alpha, MKL_Complex16 * ab, size_t lda, size_t ldb, size_t
stride, size_t batch_size);

Description
The mkl_?imatcopy_batch_strided routines perform a series of in-place scaled matrix copies or transpositions.
They are similar to their mkl_?imatcopy routine counterparts, but the mkl_?imatcopy_batch_strided routines
perform matrix operations with a group of matrices.
All AB matrices have the same parameters (size, leading dimensions, transposition operation, and scaling) and
are stored at a constant stride from each other. The operation is defined as

for i = 0 … batch_size – 1
    AB is a matrix at offset i * stride in ab
    AB = alpha * op(AB)
end for
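A hypothetical sketch of the strided loop for complex double data, limited to trans = 'N' or 'R' (no transposition) in column-major storage with lda == ldb, so every element is updated where it sits. The cplx16 struct is an assumed stand-in with the same two-double layout (real, imag) as MKL's default MKL_Complex16, and the stride check used here (ld*col, the footprint of one column-major matrix) is the sketch's own assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in assumed to match MKL_Complex16's layout. */
typedef struct { double real, imag; } cplx16;

static cplx16 cmul(cplx16 x, cplx16 y)            /* complex product x*y */
{
    cplx16 r = { x.real * y.real - x.imag * y.imag,
                 x.real * y.imag + x.imag * y.real };
    return r;
}

/* Hypothetical sketch, not MKL: strided AB := alpha*op(AB) for
 * trans = 'N' or 'R' only, column-major, lda == ldb == ld. */
static void ref_zimatcopy_batch_strided_nr(char trans, size_t row, size_t col,
        cplx16 alpha, cplx16 *ab, size_t ld, size_t stride, size_t batch_size)
{
    int conj_only = (trans == 'R' || trans == 'r');
    assert(trans == 'N' || trans == 'n' || conj_only);
    assert(ld >= row && stride >= ld * col);
    for (size_t i = 0; i < batch_size; ++i) {
        cplx16 *m = ab + i * stride;               /* matrix i at offset i*stride */
        for (size_t c = 0; c < col; ++c)
            for (size_t r = 0; r < row; ++r) {
                cplx16 x = m[r + c * ld];
                if (conj_only)
                    x.imag = -x.imag;              /* op(AB) = conjg(AB) */
                m[r + c * ld] = cmul(alpha, x);
            }
    }
}
```

Elements between matrices (padding inside the stride) are never touched.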

Input Parameters

layout Specifies whether two-dimensional array storage is row-major (R)
or column-major (C).


trans Specifies op(AB), the transposition operation applied to the AB matrices.
If trans = 'N' or 'n', op(AB)=AB.
If trans = 'T' or 't', op(AB)=AB'.
If trans = 'C' or 'c', op(AB)=conjg(AB').
If trans = 'R' or 'r', op(AB)=conjg(AB).

row Specifies the number of rows of the matrices AB. The value of row must be
at least zero.

col Specifies the number of columns of the matrices AB. The value of col must
be at least zero.

alpha Specifies the scalar alpha.

ab Array holding all the input matrices AB. Must be of size at least
batch_size*stride.

lda The leading dimension of the input matrices AB. It must be positive and at
least row if column major layout is used or at least col if row major layout
is used.

ldb The leading dimension of the output matrices AB. It must be positive and at
least
row if column major layout is used and op(AB) = AB or conjg(AB)
row if row major layout is used and op(AB) = AB' or conjg(AB')
col otherwise

stride Stride between two consecutive AB matrices, must be at least
max(ldb,lda)*max(ka, kb) where
• ka is row if column major layout is used or col if row major layout is used
• kb is col if column major layout is used and op(AB) = AB or
conjg(AB) or row major layout is used and op(AB) = AB' or
conjg(AB'); kb is row otherwise.

batch_size Number of imatcopy operations to perform, that is, the number of AB matrices.
Must be at least 0.

Output Parameters

ab Array holding the batch_size updated matrices AB.

mkl_?omatadd_batch_strided
Computes a group of out-of-place scaled matrix
additions using general matrices.

Syntax
void mkl_somatadd_batch_strided(char ordering, char transa, char transb, size_t rows,
size_t cols, float alpha, const float * A, size_t lda, size_t stridea, float beta,
const float * B, size_t ldb, size_t strideb, float * C, size_t ldc, size_t stridec,
size_t batch_size);

void mkl_domatadd_batch_strided(char ordering, char transa, char transb, size_t rows,
size_t cols, double alpha, const double * A, size_t lda, size_t stridea, double beta,
const double * B, size_t ldb, size_t strideb, double * C, size_t ldc, size_t stridec,
size_t batch_size);
void mkl_comatadd_batch_strided(char ordering, char transa, char transb, size_t rows,
size_t cols, MKL_Complex8 alpha, const MKL_Complex8 * A, size_t lda, size_t stridea,
MKL_Complex8 beta, const MKL_Complex8 * B, size_t ldb, size_t strideb, MKL_Complex8 *
C, size_t ldc, size_t stridec, size_t batch_size);
void mkl_zomatadd_batch_strided(char ordering, char transa, char transb, size_t rows,
size_t cols, MKL_Complex16 alpha, const MKL_Complex16 * A, size_t lda, size_t stridea,
MKL_Complex16 beta, const MKL_Complex16 * B, size_t ldb, size_t strideb, MKL_Complex16
* C, size_t ldc, size_t stridec, size_t batch_size);

Description
The mkl_?omatadd_batch_strided routines perform a series of scaled matrix additions. They are similar to
the mkl_?omatadd routines, but the mkl_?omatadd_batch_strided routines perform matrix operations with a
group of matrices.
The matrices A, B, and C are stored at a constant stride from each other in memory, given by the parameters
stridea, strideb, and stridec. The operation is defined as:

for i = 0 … batch_size – 1
    A is a matrix at offset i * stridea in the array a
    B is a matrix at offset i * strideb in the array b
    C is a matrix at offset i * stridec in the array c
    C = alpha * op(A) + beta * op(B)
end for
where:

• op(X) is one of op(X) = X, op(X) = X', op(X) = conjg(X) or op(X) = conjg(X').

• alpha and beta are scalars.
• A, B, and C are matrices.
The input arrays a and b contain all the input matrices, and the single output array c contains all the output
matrices. The locations of the individual matrices within the array are given by stride lengths, while the
number of matrices is given by the batch_size parameter.

In general, the a, b, and c arrays must not overlap in memory, with the exception of the following in-place
operations:

• a and c can point to the same memory if transa is non-transpose and all the A matrices within a have
the same parameters as all the respective C matrices within c.
• b and c can point to the same memory if transb is non-transpose and all the B matrices within b have
the same parameters as all the respective C matrices within c.
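A hypothetical plain-C sketch of the strided loop (not MKL code; the function name is illustrative) for real double data with ordering = 'C' and transa = transb = 'N'. Because each element of C depends only on the elements at the same position in A and B, writing C over A (or B) is safe when the leading dimensions and strides match, which is the in-place case permitted above. The sketch also skips reading a or b when alpha or beta is zero, mirroring the documented rule that such arrays may then be null pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch, not MKL: C_i = alpha*A_i + beta*B_i for
 * column-major storage with transa = transb = 'N'. c may alias a or b
 * when leading dimensions and strides match. */
static void ref_domatadd_batch_strided_nn(size_t rows, size_t cols,
        double alpha, const double *a, size_t lda, size_t stridea,
        double beta, const double *b, size_t ldb, size_t strideb,
        double *c, size_t ldc, size_t stridec, size_t batch_size)
{
    assert(ldc >= rows);
    assert(alpha == 0.0 || lda >= rows);
    assert(beta == 0.0 || ldb >= rows);
    for (size_t i = 0; i < batch_size; ++i)
        for (size_t col = 0; col < cols; ++col)
            for (size_t row = 0; row < rows; ++row) {
                double v = 0.0;
                /* a (b) is not read when alpha (beta) is zero, so it may be NULL. */
                if (alpha != 0.0)
                    v += alpha * a[i * stridea + row + col * lda];
                if (beta != 0.0)
                    v += beta * b[i * strideb + row + col * ldb];
                c[i * stridec + row + col * ldc] = v;  /* same position as the reads */
            }
}
```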

Input Parameters

ordering Specifies whether two-dimensional array storage is row-major ('R')
or column-major ('C').
transa Specifies op(A), the transposition operation applied to the matrices
A. 'N' or 'n' indicates no operation, 'T' or 't' is transposition, 'R' or 'r'
is complex conjugation without transposition, and 'C' or 'c' is
conjugate transposition.


transb Specifies op(B), the transposition operation applied to the matrices
B. It takes the same values as transa.
rows Number of rows for the result matrix C. Must be at least zero.

cols Number of columns for the result matrix C. Must be at least zero.

alpha Scaling factor for the matrices A.

a Array holding the input matrices A. If alpha is zero, a is never
accessed and may be a null pointer. Otherwise it must have size at
least stridea*batch_size.
lda Leading dimension of the A matrices. If matrices are stored using
column major layout, lda must be at least rows if A is not
transposed or cols if A is transposed. If matrices are stored using
row major layout, lda must be at least cols if A is not transposed or
at least rows if A is transposed. Must be positive.
stridea Stride between the different A matrices. If matrices are stored using
column major layout, stridea must be at least lda*cols if A is not
transposed or at least lda*rows if A is transposed. If matrices are
stored using row major layout, stridea must be at least lda*rows
if A is not transposed or at least lda*cols if A is transposed.
beta Scaling factor for the matrices B.

b Array holding the input matrices B. If beta is zero, b is never
accessed and may be a null pointer. Otherwise it must have size at
least strideb*batch_size.
ldb Leading dimension of the B matrices. If matrices are stored using
column major layout, ldb must be at least rows if B is not
transposed or cols if B is transposed. If matrices are stored using
row major layout, ldb must be at least cols if B is not transposed or
at least rows if B is transposed. Must be positive.
strideb Stride between the different B matrices. If matrices are stored using
column major layout, strideb must be at least ldb*cols if B is not
transposed or at least ldb*rows if B is transposed. If matrices are
stored using row major layout, strideb must be at least ldb*rows
if B is not transposed or at least ldb*cols if B is transposed.
c Output array, overwritten by batch_size matrix addition operations
of the form alpha*op(A) + beta*op(B). Must have size at least
stridec*batch_size.
ldc Leading dimension of the C matrices. If matrices are stored using
column major layout, ldc must be at least rows. If matrices are
stored using row major layout, ldc must be at least cols. Must be
positive.
stridec Stride between the different C matrices. If matrices are stored using
column major layout, stridec must be at least ldc*cols. If
matrices are stored using row major layout, stridec must be at
least ldc*rows.
batch_size Specifies the number of input and output matrices to add.

Output Parameters

c Array holding the updated matrices C.


mkl_?omatcopy
Performs scaling and out-of-place transposition/copying
of matrices.

Syntax
void mkl_somatcopy (char ordering, char trans, size_t rows, size_t cols, const float
alpha, const float * A, size_t lda, float * B, size_t ldb);
void mkl_domatcopy (char ordering, char trans, size_t rows, size_t cols, const double
alpha, const double * A, size_t lda, double * B, size_t ldb);
void mkl_comatcopy (char ordering, char trans, size_t rows, size_t cols, const
MKL_Complex8 alpha, const MKL_Complex8 * A, size_t lda, MKL_Complex8 * B, size_t ldb);
void mkl_zomatcopy (char ordering, char trans, size_t rows, size_t cols, const
MKL_Complex16 alpha, const MKL_Complex16 * A, size_t lda, MKL_Complex16 * B, size_t
ldb);

Include Files
• mkl.h

Description

The mkl_?omatcopy routine performs scaling and out-of-place transposition/copying of matrices. A
transposition operation can be a normal matrix copy, a transposition, a conjugate transposition, or just a
conjugation. The operation is defined as follows:
B := alpha*op(A)

NOTE
Different arrays must not overlap.
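A hypothetical sketch of B := alpha*op(A) for real double data in column-major ordering (ref_domatcopy_colmajor is an illustrative name, not an MKL function). For real data 'R' acts like 'N' and 'C' like 'T', so only two index maps are needed; as the note says, a and b must not overlap.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch, not MKL: B := alpha*op(A), column-major, real data.
 * A is rows x cols with leading dimension lda; B is cols x rows when
 * transposed and rows x cols otherwise, with leading dimension ldb. */
static void ref_domatcopy_colmajor(char trans, size_t rows, size_t cols,
        double alpha, const double *a, size_t lda, double *b, size_t ldb)
{
    int t = (trans == 'T' || trans == 't' || trans == 'C' || trans == 'c');
    assert(lda >= rows && ldb >= (t ? cols : rows));
    for (size_t j = 0; j < cols; ++j)
        for (size_t i = 0; i < rows; ++i) {
            double v = alpha * a[i + j * lda];   /* alpha*A(i,j) */
            if (t)
                b[j + i * ldb] = v;              /* B(j,i) */
            else
                b[i + j * ldb] = v;              /* B(i,j) */
        }
}
```

Padding rows between lda and rows are skipped, so only the leading rows-by-cols part of A is read.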

Input Parameters

ordering Ordering of the matrix storage.

If ordering = 'R' or 'r', the ordering is row-major.

If ordering = 'C' or 'c', the ordering is column-major.

trans Parameter that specifies the operation type.

If trans = 'N' or 'n', op(A)=A and the matrix A is assumed unchanged
on input.
If trans = 'T' or 't', it is assumed that A should be transposed.

If trans = 'C' or 'c', it is assumed that A should be conjugate
transposed.
If trans = 'R' or 'r', it is assumed that A should be only conjugated.

If the data is real, then trans = 'R' is the same as trans = 'N', and
trans = 'C' is the same as trans = 'T'.

rows The number of rows in matrix A (the input matrix).


cols The number of columns in matrix A (the input matrix).

alpha This parameter scales the input matrix by alpha.

a Input array.
If ordering = 'R' or 'r', the size of a is lda*rows.

If ordering = 'C' or 'c', the size of a is lda*cols.

lda If ordering = 'R' or 'r', lda represents the number of elements in array
a between adjacent rows of matrix A; lda must be at least equal to the
number of columns of matrix A.
If ordering = 'C' or 'c', lda represents the number of elements in array
a between adjacent columns of matrix A; lda must be at least equal to the
number of rows in matrix A.

b Output array.
If ordering = 'R' or 'r':

• If trans = 'T' or 't' or 'C' or 'c', the size of b is ldb * cols.

• If trans = 'N' or 'n' or 'R' or 'r', the size of b is ldb * rows.

If ordering = 'C' or 'c':

• If trans = 'T' or 't' or 'C' or 'c', the size of b is ldb * rows.

• If trans = 'N' or 'n' or 'R' or 'r', the size of b is ldb * cols.

ldb If ordering = 'R' or 'r', ldb represents the number of elements in array
b between adjacent rows of matrix B.
• If trans = 'T' or 't' or 'C' or 'c', ldb must be at least equal to
rows.
• If trans = 'N' or 'n' or 'R' or 'r', ldb must be at least equal to
cols.
If ordering = 'C' or 'c', ldb represents the number of elements in array
b between adjacent columns of matrix B.
• If trans = 'T' or 't' or 'C' or 'c', ldb must be at least equal to
cols.
• If trans = 'N' or 'n' or 'R' or 'r', ldb must be at least equal to
rows.

Output Parameters

b Output array.
Contains the destination matrix.


mkl_?omatcopy_batch
Computes a group of out of place scaled matrix copy
or transposition operations on general matrices.

Syntax
void mkl_somatcopy_batch (char layout, const char * trans_array, const size_t *
rows_array, const size_t * cols_array, const float * alpha_array, float ** A_array,
const size_t * lda_array, float ** B_array, const size_t * ldb_array, size_t
group_count, const size_t * group_size);
void mkl_domatcopy_batch (char layout, const char * trans_array, const size_t *
rows_array, const size_t * cols_array, const double * alpha_array, double ** A_array,
const size_t * lda_array, double ** B_array, const size_t * ldb_array, size_t
group_count, const size_t * group_size);
void mkl_comatcopy_batch (char layout, const char * trans_array, const size_t *
rows_array, const size_t * cols_array, const MKL_Complex8 * alpha_array, MKL_Complex8 **
A_array, const size_t * lda_array, MKL_Complex8 ** B_array, const size_t * ldb_array,
size_t group_count, const size_t * group_size);
void mkl_zomatcopy_batch (char layout, const char * trans_array, const size_t *
rows_array, const size_t * cols_array, const MKL_Complex16 * alpha_array, MKL_Complex16
** A_array, const size_t * lda_array, MKL_Complex16 ** B_array, const size_t *
ldb_array, size_t group_count, const size_t * group_size);

Description
The mkl_?omatcopy_batch routines perform a series of out-of-place scaled matrix copies or transpositions.
They are similar to their mkl_?omatcopy routine counterparts, but the mkl_?omatcopy_batch routines
perform matrix operations with groups of matrices. All matrices within a group share the same parameters
(matrix size, leading dimension, and scaling parameter), and a single call to mkl_?omatcopy_batch operates
on multiple groups. Unlike with the related mkl_?omatcopy_batch_strided routines, each group can have
different parameters.
The operation is defined as

idx = 0
for i = 0..group_count - 1
    m in rows_array[i], n in cols_array[i], and alpha in alpha_array[i]
    for j = 0..group_size[i] - 1
        A and B matrices in a_array[idx] and b_array[idx], respectively
        B := alpha*op(A)
        idx = idx + 1
    end for
end for
Where op(X) is one of op(X)=X, op(X)=X', op(X)=conjg(X'), or op(X)=conjg(X). A is an m-by-n matrix
such that m and n are elements of rows_array and cols_array.
A and B represent matrices stored at addresses pointed to by A_array and B_array. The number of entries in
A_array and B_array is total_batch_count = the sum of all of the group_size entries.
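A hypothetical sketch of the grouped loop for real double data in row-major layout (R). The function name is illustrative, not an MKL routine. The assertions use the row-major minimums lda >= n and ldb >= n, or ldb >= m when op(A) is a transposition, because B is then n-by-m.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch, not MKL: grouped B := alpha*op(A), row-major, real.
 * Group g supplies rows, cols, alpha, trans, lda and ldb for group_size[g]
 * consecutive pointer pairs A_array[idx], B_array[idx]. */
static void ref_domatcopy_batch_rowmajor(const char *trans_array,
        const size_t *rows_array, const size_t *cols_array,
        const double *alpha_array, const double *const *A_array,
        const size_t *lda_array, double *const *B_array,
        const size_t *ldb_array, size_t group_count, const size_t *group_size)
{
    size_t idx = 0;
    for (size_t g = 0; g < group_count; ++g) {
        size_t m = rows_array[g], n = cols_array[g];
        size_t lda = lda_array[g], ldb = ldb_array[g];
        char tr = trans_array[g];
        int t = (tr == 'T' || tr == 't' || tr == 'C' || tr == 'c');
        assert(lda >= n && ldb >= (t ? m : n));    /* row-major minimums */
        for (size_t k = 0; k < group_size[g]; ++k, ++idx) {
            const double *A = A_array[idx];
            double *B = B_array[idx];
            for (size_t i = 0; i < m; ++i)
                for (size_t j = 0; j < n; ++j) {
                    double v = alpha_array[g] * A[i * lda + j];
                    if (t)
                        B[j * ldb + i] = v;        /* B is n x m */
                    else
                        B[i * ldb + j] = v;        /* B is m x n */
                }
        }
    }
}
```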

Input Parameters

layout Specifies whether two-dimensional array storage is row-major (R) or
column-major (C).

trans_array Array of size group_count. For the group i, trans = trans_array[i]
specifies the form of op(A), the transposition operation applied to the A
matrix:
If trans = 'N' or 'n', op(A)=A.
If trans = 'T' or 't', op(A)=A'.
If trans = 'C' or 'c', op(A)=conjg(A').
If trans = 'R' or 'r', op(A)=conjg(A).

rows_array Array of size group_count. Specifies the number of rows of the matrix A.
The value of each element must be at least zero.

cols_array Array of size group_count. Specifies the number of columns of the matrix
A. The value of each element must be at least zero.

alpha_array Array of size group_count. Specifies the scalar alpha.

A_array Array of size total_batch_count, holding pointers to arrays used to store the A
input matrices.

lda_array Array of size group_count. The leading dimension of the input matrix A. It
must be positive and at least m if column major layout is used or at least n
if row major layout is used.

ldb_array Array of size group_count. The leading dimension of the output matrix B.
It must be positive and at least:
• m if column major layout is used and op(A) = A or conjg(A)
• m if row major layout is used and op(A) = A' or conjg(A')
• n otherwise

group_count Specifies the number of groups. Must be at least 0.

group_size Array of size group_count. The element group_size[i] specifies the
number of matrices in group i. Each element in group_size must be at
least 0.

Output Parameters

B_array Output array of size total_batch_count, holding pointers to arrays used to
store the B output matrices, the contents of which are overwritten by the
operation of the form alpha*op(A).

mkl_?omatcopy_batch_strided
Computes a group of out-of-place scaled matrix copies
or transpositions using general matrices.

Syntax
void mkl_somatcopy_batch_strided (const char layout, const char trans, size_t row,
size_t col, const float alpha, const float * a, size_t lda, size_t stridea, float * b,
size_t ldb, size_t strideb, size_t batch_size);
void mkl_domatcopy_batch_strided (const char layout, const char trans, size_t row,
size_t col, const double alpha, const double * a, size_t lda, size_t stridea, double *
b, size_t ldb, size_t strideb, size_t batch_size);
void mkl_comatcopy_batch_strided (const char layout, const char trans, size_t row,
size_t col, const MKL_Complex8 alpha, const MKL_Complex8 * a, size_t lda, size_t
stridea, MKL_Complex8 * b, size_t ldb, size_t strideb, size_t batch_size);
void mkl_zomatcopy_batch_strided (const char layout, const char trans, size_t row,
size_t col, const MKL_Complex16 alpha, const MKL_Complex16 * a, size_t lda, size_t
stridea, MKL_Complex16 * b, size_t ldb, size_t strideb, size_t batch_size);

Description
The mkl_?omatcopy_batch_strided routine performs a series of out-of-place scaled matrix copies or
transpositions. It is similar to its mkl_?omatcopy counterparts, but performs matrix operations on a
group of matrices.
All matrices A and B have the same parameters (size, transposition operation, and so on) and are stored
at a constant stride from each other, given by stridea and strideb, respectively. The operation is defined as

for i = 0..batch_size - 1
    A and B are the matrices at offset i * stridea in a and i * strideb in b
    B := alpha*op(A)
end for
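The strided loop can be sketched in plain C for real double-precision data. ref_domatcopy_batch_strided is a hypothetical name used only for illustration, not the oneMKL implementation; the point is how the i-th A and B matrices are located at a + i*stridea and b + i*strideb. As in the other sketches, trans 'R' and 'C' are treated like 'N' and 'T', which holds only for real data.

```c
#include <stddef.h>

/* Hypothetical reference sketch (not oneMKL code) of what
 * mkl_domatcopy_batch_strided computes for real data. */
static void ref_domatcopy_batch_strided(char layout, char trans,
                                        size_t row, size_t col, double alpha,
                                        const double *a, size_t lda, size_t stridea,
                                        double *b, size_t ldb, size_t strideb,
                                        size_t batch_size)
{
    int col_major = (layout == 'C' || layout == 'c');
    int transpose = (trans == 'T' || trans == 't' || trans == 'C' || trans == 'c');
    for (size_t s = 0; s < batch_size; ++s) {
        const double *A = a + s * stridea; /* s-th input matrix */
        double *B = b + s * strideb;       /* s-th output matrix */
        for (size_t i = 0; i < row; ++i) {
            for (size_t j = 0; j < col; ++j) {
                double v = alpha * (col_major ? A[i + j * lda] : A[i * lda + j]);
                /* write B(i,j), or B(j,i) when transposing */
                size_t r = transpose ? j : i, c = transpose ? i : j;
                if (col_major) B[r + c * ldb] = v;
                else           B[r * ldb + c] = v;
            }
        }
    }
}
```

With stridea larger than lda * k, the elements between consecutive A matrices are simply skipped, which is why the minimum size of a grows with stridea * (batch_size - 1).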

Input Parameters

layout Specifies whether two-dimensional array storage is row-major (R) or
column-major (C).

trans Specifies op(A), the transposition operation applied to the A matrices.

If trans = 'N' or 'n', op(A)=A.

If trans = 'T' or 't', op(A)=A'.

If trans = 'C' or 'c', op(A)=conjg(A').

If trans = 'R' or 'r', op(A)=conjg(A).

row Specifies the number of rows of the matrices A and B. The value of row
must be at least zero.

col Specifies the number of columns of the matrices A and B. The value of col
must be at least zero.

alpha Specifies the scalar alpha.

a Array holding all the input matrices A. Must be of size at least lda * k +
stridea * (batch_size - 1), where k is col if column major layout is used and
row otherwise.

lda The leading dimension of the matrix input A. It must be positive and at
least row if column major layout is used or at least col if row major layout
is used.

stridea Stride between two consecutive A matrices, must be at least 0.

b Array holding all the output matrices B. Must be of size at least batch_size
* strideb. The b array must not overlap the a array.

ldb The leading dimension of the output matrix B. It must be positive and at
least:
• row if column major layout is used and op(A) = A or conjg(A)
• row if row major layout is used and op(A) = A' or conjg(A')


• col otherwise

strideb Stride between two consecutive B matrices. It must be positive and at
least:
• ldb*col if column major layout is used and op(A) = A or conjg(A)
• ldb*col if row major layout is used and op(A) = A' or conjg(A')
• ldb*row otherwise
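The size, lda, ldb, and strideb rules for this routine can be collected in one small helper. This is a hypothetical sketch (strided_minimums and strided_mins are not oneMKL names) that evaluates the documented minimums. The "must be positive" requirement is expressed as max(1, ...), and returning 0 for a_size when batch_size is 0 is a choice of the sketch.

```c
#include <stddef.h>

/* Hypothetical helper, not part of oneMKL: minimum lda, ldb, strideb and
 * array sizes for mkl_?omatcopy_batch_strided, per the documented rules. */
typedef struct {
    size_t lda, ldb, strideb, a_size, b_size;
} strided_mins;

static size_t max1(size_t x) { return x > 1 ? x : 1; } /* "must be positive" */

static strided_mins strided_minimums(char layout, char trans, size_t row,
                                     size_t col, size_t stridea, size_t batch_size)
{
    int col_major = (layout == 'C' || layout == 'c');
    int transpose = (trans == 'T' || trans == 't' || trans == 'C' || trans == 'c');
    /* B is row x col for op(A) = A or conjg(A), col x row otherwise */
    size_t b_rows = transpose ? col : row;
    size_t b_cols = transpose ? row : col;
    strided_mins r;
    r.lda = max1(col_major ? row : col);
    r.ldb = max1(col_major ? b_rows : b_cols);
    /* one B matrix spans ldb times its number of stored columns (or rows) */
    r.strideb = max1(r.ldb * (col_major ? b_cols : b_rows));
    /* a: lda * k + stridea * (batch_size - 1), k = col (column major) or row */
    size_t k = col_major ? col : row;
    r.a_size = batch_size ? r.lda * k + stridea * (batch_size - 1) : 0;
    r.b_size = batch_size * r.strideb;
    return r;
}
```

For example, a column-major transposition of four 3 x 2 matrices with stridea = 8 needs lda >= 3, ldb >= 2, strideb >= 6, at least 30 elements in a, and at least 24 elements in b.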

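batch_size Specifies the number of matrices A to copy or transpose, and therefore the number
of output matrices B.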

Output Parameters

b Array holding the batch_size updated matrices B.

mkl_?omatcopy2
Performs two-strided scaling and out-of-place
transposition/copying of matrices.

Syntax
void mkl_somatcopy2 (char ordering, char trans, size_t rows, size_t cols, const float
alpha, const float * A, size_t lda, size_t stridea, float * B, size_t ldb, size_t
strideb);
void mkl_domatcopy2 (char ordering, char trans, size_t rows, size_t cols, const double
alpha, const double * A, size_t lda, size_t stridea, double * B, size_t ldb, size_t
strideb);
void mkl_comatcopy2 (char ordering, char trans, size_t rows, size_t cols, const
MKL_Complex8 alpha, const MKL_Complex8 * A, size_t lda, size_t stridea, MKL_Complex8 *
B, size_t ldb, size_t strideb);
void mkl_zomatcopy2 (char ordering, char trans, size_t rows, size_t cols, const
MKL_Complex16 alpha, const MKL_Complex16 * A, size_t lda, size_t stridea, MKL_Complex16
* B, size_t ldb, size_t strideb);

Include Files
• mkl.h

Description

The mkl_?omatcopy2 routine performs two-strided scaling and out-of-place transposition/copying of
matrices. A transposition operation can be a normal matrix copy, a transposition, a conjugate transposition,
or just a conjugation. The operation is defined as follows:
B := alpha*op(A)
Normally, matrices in the BLAS or LAPACK are specified by a single stride index. For instance, in the column-
major order, A(2,1) is stored in memory one element away from A(1,1), but A(1,2) is a leading dimension
away. The leading dimension in this case is at least the number of rows of the source matrix. If a matrix has
two strides, then both A(2,1) and A(1,2) may be an arbitrary distance from A(1,1).
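For column-major ordering, the two strides mean that element (i, j) of A is read from A[i*stridea + j*lda] and written to B[i*strideb + j*ldb], or to position (j, i) when transposing. The following hypothetical reference sketch (ref_domatcopy2_colmajor is not a oneMKL function and is not how the library implements the routine) shows this for real double-precision data; with stridea = 2 it copies every other row of the source buffer.

```c
#include <stddef.h>

/* Hypothetical reference sketch (not oneMKL code) of mkl_domatcopy2 for
 * ordering = 'C' and real data.  In column-major order with two strides,
 * element (i,j) of A sits at A[i*stridea + j*lda], and likewise for B. */
static void ref_domatcopy2_colmajor(char trans, size_t rows, size_t cols,
                                    double alpha, const double *A, size_t lda,
                                    size_t stridea, double *B, size_t ldb,
                                    size_t strideb)
{
    /* for real data 'C' behaves like 'T' and 'R' like 'N' */
    int transpose = (trans == 'T' || trans == 't' || trans == 'C' || trans == 'c');
    for (size_t i = 0; i < rows; ++i)
        for (size_t j = 0; j < cols; ++j) {
            double v = alpha * A[i * stridea + j * lda];
            if (transpose) B[j * strideb + i * ldb] = v; /* B(j,i) */
            else           B[i * strideb + j * ldb] = v; /* B(i,j) */
        }
}
```

Setting stridea = strideb = 1 reduces this to the usual single-stride copy, where lda and ldb are the ordinary leading dimensions.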

NOTE
Different arrays must not overlap.

Input Parameters

ordering Ordering of the matrix storage.

If ordering = 'R' or 'r', the ordering is row-major.

If ordering = 'C' or 'c', the ordering is column-major.

trans Parameter that specifies the operation type.

If trans = 'N' or 'n', op(A)=A and the matrix A is assumed unchanged
on input.
If trans = 'T' or 't', it is assumed that A should be transposed.

If trans = 'C' or 'c', it is assumed that A should be conjugate
transposed.
If trans = 'R' or 'r', it is assumed that A should be only conjugated.

If the data is real, then trans = 'R' is the same as trans = 'N', and
trans = 'C' is the same as trans = 'T'.

rows Number of rows for the input matrix A. Must be at least zero.

cols Number of columns for the input matrix A. Must be at least zero.

alpha Scaling factor for the matrix transposition or copy.

a Array holding the input matrix A. Here and below, m denotes rows and n denotes cols.
Must have size at least lda * n for column major ordering and at least lda * m for row
major ordering.

lda Leading dimension of the matrix A. If matrices are stored using column
major layout, lda is the number of elements in the array between adjacent
columns of the matrix and must be at least stridea * (m-1) + 1. If
using row major layout, lda is the number of elements between adjacent
rows of the matrix and must be at least stridea * (n-1) + 1.

stridea The second stride of the matrix A. For column major layout, stridea is the
number of elements in the array between adjacent rows of the matrix. For
row major layout stridea is the number of elements between adjacent
columns of the matrix. In both cases stridea must be at least 1.

b Array holding the output matrix B.
• Column major, trans = 'N' or 'R': B is an m x n matrix. The size of array b must be at
least ldb * n.
• Column major, trans = 'T' or 'C': B is an n x m matrix. The size of array b must be at
least ldb * m.
• Row major, trans = 'N' or 'R': B is an m x n matrix. The size of array b must be at
least ldb * m.
• Row major, trans = 'T' or 'C': B is an n x m matrix. The size of array b must be at
least ldb * n.


ldb The leading dimension of the matrix B. Must be positive and:
• Column major, trans = 'N' or 'R': ldb must be at least strideb * (m-1) + 1.
• Column major, trans = 'T' or 'C': ldb must be at least strideb * (n-1) + 1.
• Row major, trans = 'N' or 'R': ldb must be at least strideb * (n-1) + 1.
• Row major, trans = 'T' or 'C': ldb must be at least strideb * (m-1) + 1.

strideb The second stride of the matrix B. For column major layout, strideb is the
number of elements in the array between adjacent rows of the matrix. For
row major layout, strideb is the number of elements between adjacent
columns of the matrix. In both cases strideb must be at least 1.

Output Parameters

b Array, of at least the size given for b in Input Parameters.

Contains the destination matrix.

Interfaces

mkl_?omatadd
Scales and sums two matrices in addition to
performing out-of-place transposition operations.

Syntax
void mkl_somatadd (char ordering, char transa, char transb, size_t m, size_t n, const
float alpha, const float * A, size_t lda, const float beta, const float * B, size_t ldb,
float * C, size_t ldc);
void mkl_domatadd (char ordering, char transa, char transb, size_t m, size_t n, const
double alpha, const double * A, size_t lda, const double beta, const double * B, size_t
ldb, double * C, size_t ldc);
void mkl_comatadd (char ordering, char transa, char transb, size_t m, size_t n, const
MKL_Complex8 alpha, const MKL_Complex8 * A, size_t lda, const MKL_Complex8 beta, const
MKL_Complex8 * B, size_t ldb, MKL_Complex8 * C, size_t ldc);
void mkl_zomatadd (char ordering, char transa, char transb, size_t m, size_t n, const
MKL_Complex16 alpha, const MKL_Complex16 * A, size_t lda, const MKL_Complex16 beta,
const MKL_Complex16 * B, size_t ldb, MKL_Complex16 * C, size_t ldc);

Include Files
• mkl.h

Description

The mkl_?omatadd routine scales and adds two matrices in addition to performing out-of-place transposition
operations. A transposition operation can be no operation, a transposition, a conjugate transposition, or a
conjugation (without transposition). The following out-of-place memory movement is done:
C := alpha*op(A) + beta*op(B)
where the op(A) and op(B) operations are transpose, conjugate-transpose, conjugate (no transpose), or no
transpose, depending on the values of transa and transb. If no transposition of the source matrices is
required, m is the number of rows and n is the number of columns in the source matrices A and B. In this
case, the output matrix C is m-by-n.
In general, a, b, and c must not overlap in memory, with the exception of the following in-place operations:

• a and c can point to the same memory if transa is non-transpose and lda = ldc.
• b and c can point to the same memory if transb is non-transpose and ldb = ldc.
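The operation and the in-place rules can be illustrated with a plain C reference sketch for real double-precision data in column-major order. The function ref_domatadd_colmajor is hypothetical and is not the oneMKL implementation; conjugation is ignored because the data is real. Each C(i,j) is computed from A(i,j) (or A(j,i)) and B(i,j) (or B(j,i)) before it is stored, which is why c may alias a when transa is non-transpose and lda = ldc.

```c
#include <stddef.h>

/* Hypothetical reference sketch (not oneMKL code) of mkl_domatadd for
 * ordering = 'C' and real data.  An element of C is stored only after the
 * element of A and B it depends on has been read, so the documented
 * in-place cases (c == a with transa 'N' and lda == ldc, or c == b with
 * transb 'N' and ldb == ldc) give the same result as separate arrays. */
static void ref_domatadd_colmajor(char transa, char transb, size_t m, size_t n,
                                  double alpha, const double *A, size_t lda,
                                  double beta, const double *B, size_t ldb,
                                  double *C, size_t ldc)
{
    int ta = (transa == 'T' || transa == 't' || transa == 'C' || transa == 'c');
    int tb = (transb == 'T' || transb == 't' || transb == 'C' || transb == 'c');
    for (size_t j = 0; j < n; ++j)
        for (size_t i = 0; i < m; ++i) {
            /* op(X)(i,j) is X(i,j) or X(j,i); a zero scale skips the read,
             * so A or B may then be a null pointer */
            double a = alpha != 0.0 ? (ta ? A[j + i * lda] : A[i + j * lda]) : 0.0;
            double b = beta != 0.0 ? (tb ? B[j + i * ldb] : B[i + j * ldb]) : 0.0;
            C[i + j * ldc] = alpha * a + beta * b;
        }
}
```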

Input Parameters

ordering Ordering of the matrix storage.

If ordering = 'R' or 'r', the ordering is row-major.

If ordering = 'C' or 'c', the ordering is column-major.

transa Parameter that specifies the operation type on matrix A.

If transa = 'N' or 'n', op(A)=A and the matrix A is assumed unchanged
on input.
If transa = 'T' or 't', it is assumed that A should be transposed.

If transa = 'C' or 'c', it is assumed that A should be conjugate
transposed.
If transa = 'R' or 'r', it is assumed that A should be conjugated (and not
transposed).
If the data is real, then transa = 'R' is the same as transa = 'N', and
transa = 'C' is the same as transa = 'T'.

transb Parameter that specifies the operation type on matrix B.

If transb = 'N' or 'n', op(B)=B and the matrix B is assumed unchanged
on input.
If transb = 'T' or 't', it is assumed that B should be transposed.

If transb = 'C' or 'c', it is assumed that B should be conjugate
transposed.
If transb = 'R' or 'r', it is assumed that B should be conjugated (and not
transposed).
If the data is real, then transb = 'R' is the same as transb = 'N', and
transb = 'C' is the same as transb = 'T'.

m The number of matrix rows in op(A), op(B), and C.

n The number of matrix columns in op(A), op(B), and C.

alpha Scaling factor for the input matrix A.


a Pointer to array for input matrix A. If alpha is zero, a is never accessed and
may be a null pointer.

lda Distance between the first elements in adjacent columns (in the case of the
column-major order) or rows (in the case of the row-major order) in the
source matrix A; measured in the number of elements.
For ordering = 'C' or 'c': when transa = 'N', 'n', 'R', or 'r', lda
must be at least max(1,m); otherwise lda must be at least max(1,n).

For ordering = 'R' or 'r': when transa = 'N', 'n', 'R', or 'r', lda
must be at least max(1,n); otherwise lda must be at least max(1,m).

beta Scaling factor for the input matrix B.

b Pointer to array for input matrix B. If beta is zero, b is never accessed and
may be a null pointer.

ldb Distance between the first elements in adjacent columns (in the case of the
column-major order) or rows (in the case of the row-major order) in the
source matrix B; measured in the number of elements.
For ordering = 'C' or 'c': when transb = 'N', 'n', 'R', or 'r', ldb
must be at least max(1,m); otherwise ldb must be at least max(1,n).

For ordering = 'R' or 'r': when transb = 'N', 'n', 'R', or 'r', ldb
must be at least max(1,n); otherwise ldb must be at least max(1,m).

ldc Distance between the first elements in adjacent columns (in the case of the
column-major order) or rows (in the case of the row-major order) in the
destination matrix C; measured in the number of elements.
If ordering = 'C' or 'c', then ldc must be at least max(1, m),
otherwise ldc must be at least max(1, n).

Output Parameters

c Array holding the output matrix C, overwritten by alpha*op(A) + beta*op(B).

Interfaces

cblas_?gemm_pack_get_size, cblas_gemm_*_pack_get_size
Returns the number of bytes required to store the
packed matrix.

Syntax
size_t cblas_hgemm_pack_get_size (const CBLAS_IDENTIFIER identifier, const MKL_INT m,
const MKL_INT n, const MKL_INT k)
size_t cblas_sgemm_pack_get_size (const CBLAS_IDENTIFIER identifier, const MKL_INT m,
const MKL_INT n, const MKL_INT k)
size_t cblas_dgemm_pack_get_size (const CBLAS_IDENTIFIER identifier, const MKL_INT m,
const MKL_INT n, const MKL_INT k)
size_t cblas_gemm_s8u8s32_pack_get_size (const CBLAS_IDENTIFIER identifier, const
MKL_INT m, const MKL_INT n, const MKL_INT k)

size_t cblas_gemm_s16s16s32_pack_get_size (const CBLAS_IDENTIFIER identifier, const
MKL_INT m, const MKL_INT n, const MKL_INT k)
size_t cblas_gemm_bf16bf16f32_pack_get_size (const CBLAS_IDENTIFIER identifier, const
MKL_INT m, const MKL_INT n, const MKL_INT k)
size_t cblas_gemm_f16f16f32_pack_get_size (const CBLAS_IDENTIFIER identifier, const
MKL_INT m, const MKL_INT n, const MKL_INT k)

Include Files
• mkl.h

Description
The cblas_?gemm_pack_get_size and cblas_gemm_*_pack_get_size routines belong to a set of related
routines that enable the use of an internal packed storage. Call the cblas_?gemm_pack_get_size and
cblas_gemm_*_pack_get_size routines first to query the size of storage required for a packed matrix
structure to be used in subsequent calls. Ultimately, the packed matrix structure is used to compute
C := alpha*op(A)*op(B) + beta*C for bfloat16, half, single, and double precision, or
C := alpha*(op(A) + A_offset)*(op(B) + B_offset) + beta*C + C_offset for integer types,
where:
op(X) is one of the operations op(X) = X or op(X) = X^T,
alpha and beta are scalars,
A, A_offset, B, B_offset, C, and C_offset are matrices,
op(A) is an m-by-k matrix,
op(B) is a k-by-n matrix,
C is an m-by-n matrix,
A_offset is an m-by-k matrix,
B_offset is a k-by-n matrix,
C_offset is an m-by-n matrix.
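To make the integer formula concrete, here is a hypothetical unpacked reference sketch for the s8u8s32 case in column-major order with op(X) = X. It is not the oneMKL implementation and does not use the packed format. It assumes that every entry of A_offset equals a scalar ao, every entry of B_offset equals bo, and every entry of C_offset equals co, and that the scaled result is rounded to the nearest integer; ref_gemm_s8u8s32_colmajor is an illustrative name.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference sketch of
 *   C := alpha*(op(A) + A_offset)*(op(B) + B_offset) + beta*C + C_offset
 * for column-major storage and op(X) = X only.  Assumptions of this sketch:
 * A_offset, B_offset and C_offset have all entries equal to the scalars
 * ao, bo and co, and results are rounded to nearest.  Not oneMKL code. */
static void ref_gemm_s8u8s32_colmajor(size_t m, size_t n, size_t k, float alpha,
                                      const int8_t *A, size_t lda, int8_t ao,
                                      const uint8_t *B, size_t ldb, int8_t bo,
                                      float beta, int32_t *C, size_t ldc, int32_t co)
{
    for (size_t j = 0; j < n; ++j)
        for (size_t i = 0; i < m; ++i) {
            /* exact integer dot product of row i of (A + ao) and column j of (B + bo) */
            int32_t acc = 0;
            for (size_t p = 0; p < k; ++p)
                acc += ((int32_t)A[i + p * lda] + ao) * ((int32_t)B[p + j * ldb] + bo);
            double v = (double)alpha * acc + (double)beta * C[i + j * ldc] + co;
            C[i + j * ldc] = (int32_t)(v >= 0 ? v + 0.5 : v - 0.5);
        }
}
```

The offsets are added before the multiplication, so they change every product term, while C_offset is added once to the final result.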

Input Parameters
Parameter Type Description

identifier CBLAS_IDENTIFIER
Specifies which matrix is to be packed:
If identifier = CblasAMatrix, the size
returned is the size required to store matrix A
in an internal format.
If identifier = CblasBMatrix, the size
returned is the size required to store matrix B
in an internal format.
m MKL_INT
Specifies the number of rows of matrix op(A)
and of the matrix C. The value of m must be
at least zero.
n MKL_INT
Specifies the number of columns of matrix
op(B) and the number of columns of matrix
C. The value of n must be at least zero.


k MKL_INT
Specifies the number of columns of matrix
op(A) and the number of rows of matrix
op(B). The value of k must be at least zero.

Return Values
Parameter Type Description

size size_t
Returns the size (in bytes) required to store
the matrix when packed into the internal
format of Intel® oneAPI Math Kernel Library
(oneMKL).

Example
See the following examples in the MKL installation directory to understand the use of these routines:
cblas_hgemm_pack_get_size: examples\cblas\source\cblas_hgemm_computex.c
cblas_sgemm_pack_get_size: examples\cblas\source\cblas_sgemm_computex.c
cblas_dgemm_pack_get_size: examples\cblas\source\cblas_dgemm_computex.c
cblas_gemm_s8u8s32_pack_get_size: examples\cblas\source\cblas_gemm_s8u8s32_computex.c
cblas_gemm_s16s16s32_pack_get_size: examples\cblas\source\cblas_gemm_s16s16s32_computex.c
cblas_gemm_bf16bf16f32_pack_get_size: examples\cblas\source\cblas_gemm_bf16bf16f32_computex.c
cblas_gemm_f16f16f32_pack_get_size: examples\cblas\source\cblas_gemm_f16f16f32_computex.c

See Also
cblas_?gemm_pack and cblas_gemm_*_pack
to pack the matrix into a buffer allocated previously.
cblas_?gemm_compute and cblas_gemm_*_compute
to compute a matrix-matrix product with general matrices (where one or both input matrices are stored in
a packed data structure) and add the result to a scalar-matrix product.

cblas_?gemm_pack
Performs scaling and packing of the matrix into the
previously allocated buffer.

Syntax
void cblas_hgemm_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER identifier,
const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const MKL_INT k, const
MKL_F16 alpha, const MKL_F16 *src, const MKL_INT ld, MKL_F16 *dest);
void cblas_sgemm_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER identifier,
const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const MKL_INT k, const
float alpha, const float *src, const MKL_INT ld, float *dest);
void cblas_dgemm_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER identifier,
const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const MKL_INT k, const
double alpha, const double *src, const MKL_INT ld, double *dest);

Include Files
• mkl.h

Description
The cblas_?gemm_pack routine is one of a set of related routines that enable use of an internal packed
storage. Call cblas_?gemm_pack after you allocate a buffer whose size is given by
cblas_?gemm_pack_get_size. The cblas_?gemm_pack routine scales the identified matrix by alpha and
packs it into the buffer allocated previously.

NOTE
Do not copy the packed matrix to a different address because the internal implementation
depends on the alignment of internally-stored metadata.

The cblas_?gemm_pack routine performs this operation:

dest := alpha*op(src) as part of the computation C := alpha*op(A)*op(B) + beta*C

where:
op(X) is one of the operations op(X) = X, op(X) = X^T, or op(X) = X^H,
alpha and beta are scalars,
src is a matrix,
A, B, and C are matrices,
op(src) is an m-by-k matrix if identifier = CblasAMatrix,
op(src) is a k-by-n matrix if identifier = CblasBMatrix,
dest is an internal packed storage buffer.

NOTE
You must use the same value of the Layout parameter for the entire sequence of related
cblas_?gemm_pack and cblas_?gemm_compute calls.
For best performance, use the same number of threads for packing and for computing.
If packing for both A and B matrices, you must use the same number of threads for packing A as for
packing B.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major
(CblasRowMajor) or column-major (CblasColMajor).

identifier Specifies which matrix is to be packed:

If identifier = CblasAMatrix, the routine packs matrix A into dest.
If identifier = CblasBMatrix, the routine packs matrix B into dest.

trans Specifies the form of op(src) used in the packing:

If trans = CblasNoTrans, op(src) = src.

If trans = CblasTrans, op(src) = src^T.

If trans = CblasConjTrans, op(src) = src^H.

m Specifies the number of rows of the matrix op(A) and of the matrix C. The
value of m must be at least zero.

n Specifies the number of columns of the matrix op(B) and the number of
columns of the matrix C. The value of n must be at least zero.

k Specifies the number of columns of the matrix op(A) and the number of
rows of the matrix op(B). The value of k must be at least zero.

alpha Specifies the scalar alpha.

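src Array:
For Layout = CblasColMajor:
• identifier = CblasAMatrix, trans = CblasNoTrans: size ld*k.
• identifier = CblasAMatrix, trans = CblasTrans or CblasConjTrans: size ld*m.
• identifier = CblasBMatrix, trans = CblasNoTrans: size ld*n.
• identifier = CblasBMatrix, trans = CblasTrans or CblasConjTrans: size ld*k.
For Layout = CblasRowMajor:
• identifier = CblasAMatrix, trans = CblasNoTrans: size ld*m.
• identifier = CblasAMatrix, trans = CblasTrans or CblasConjTrans: size ld*k.
• identifier = CblasBMatrix, trans = CblasNoTrans: size ld*k.
• identifier = CblasBMatrix, trans = CblasTrans or CblasConjTrans: size ld*n.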

ld Specifies the leading dimension of src as declared in the calling
(sub)program.
For Layout = CblasColMajor:
• identifier = CblasAMatrix, trans = CblasNoTrans: ld must be at least max(1, m).
• identifier = CblasAMatrix, trans = CblasTrans or CblasConjTrans: ld must be at least max(1, k).
• identifier = CblasBMatrix, trans = CblasNoTrans: ld must be at least max(1, k).
• identifier = CblasBMatrix, trans = CblasTrans or CblasConjTrans: ld must be at least max(1, n).
For Layout = CblasRowMajor:
• identifier = CblasAMatrix, trans = CblasNoTrans: ld must be at least max(1, k).
• identifier = CblasAMatrix, trans = CblasTrans or CblasConjTrans: ld must be at least max(1, m).
• identifier = CblasBMatrix, trans = CblasNoTrans: ld must be at least max(1, n).
• identifier = CblasBMatrix, trans = CblasTrans or CblasConjTrans: ld must be at least max(1, k).
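The src and ld requirements follow one rule: determine the shape of src from identifier and trans, then ld must cover one stored column (column major) or one stored row (row major), and the array holds ld times the number of stored columns or rows. A hypothetical helper that encodes this rule (pack_src_requirements is not a oneMKL function, and the integer flags stand in for the CBLAS enumerations):

```c
#include <stddef.h>

/* Hypothetical helper, not part of oneMKL: minimum ld and minimum number of
 * elements of src for cblas_?gemm_pack.
 *   row_major  : nonzero for Layout = CblasRowMajor
 *   is_a       : nonzero for identifier = CblasAMatrix, zero for CblasBMatrix
 *   transposed : nonzero for trans = CblasTrans or CblasConjTrans */
static void pack_src_requirements(int row_major, int is_a, int transposed,
                                  size_t m, size_t n, size_t k,
                                  size_t *ld_min, size_t *size_min)
{
    /* op(src) is m-by-k for A and k-by-n for B; src is that or its transpose */
    size_t r = is_a ? m : k, c = is_a ? k : n;
    if (transposed) { size_t tmp = r; r = c; c = tmp; }
    size_t ld = row_major ? c : r;   /* length of one stored row or column */
    *ld_min = ld > 1 ? ld : 1;       /* max(1, ...) */
    *size_min = *ld_min * (row_major ? r : c);
}
```

The same rule reproduces the src tables of cblas_gemm_*_pack and, for unpacked inputs, the a table of cblas_?gemm_compute.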

dest Scaled and packed internal storage buffer.

Output Parameters

dest Overwritten by the matrix alpha*op(src).

See Also
cblas_?gemm_pack_get_size Returns the number of bytes required to store the packed matrix.
cblas_?gemm_compute Computes a matrix-matrix product with general matrices where one or both
input matrices are stored in a packed data structure and adds the result to a scalar-matrix
product.
cblas_?gemm
for a detailed description of general matrix multiplication.

cblas_gemm_*_pack
Packs the matrix into the buffer allocated previously.

Syntax
void cblas_gemm_s8u8s32_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER
identifier, const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const
MKL_INT k, const void *src, const MKL_INT ld, void *dest);
void cblas_gemm_s16s16s32_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER
identifier, const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const
MKL_INT k, const MKL_INT16 *src, const MKL_INT ld, MKL_INT16 *dest);
void cblas_gemm_bf16bf16f32_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER
identifier, const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const
MKL_INT k, const MKL_BF16 *src, const MKL_INT ld, MKL_BF16 *dest);


void cblas_gemm_f16f16f32_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER
identifier, const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const
MKL_INT k, const MKL_F16 *src, const MKL_INT ld, MKL_F16 *dest);

Include Files
• mkl.h

Description
The cblas_gemm_*_pack routine is one of a set of related routines that enable the use of an internal packed
storage. Call cblas_gemm_*_pack after you allocate a buffer whose size is given by
cblas_gemm_*_pack_get_size. The cblas_gemm_*_pack routine packs the identified matrix into the
buffer allocated previously.
The cblas_gemm_*_pack routine performs this operation:

dest := op(src) as part of the computation C := alpha*(op(A) + A_offset)*(op(B) + B_offset) +
beta*C + C_offset for integer types, or
C := alpha*op(A)*op(B) + beta*C for bfloat16 and half-precision types,
where:
op(X) is one of the operations op(X) = X or op(X) = X^T,
alpha and beta are scalars,
src is a matrix,
A, A_offset, B, B_offset, C, and C_offset are matrices,
op(src) is an m-by-k matrix if identifier = CblasAMatrix,
op(src) is a k-by-n matrix if identifier = CblasBMatrix,
dest is the buffer previously allocated to store the matrix packed into an internal format,
A_offset is an m-by-k matrix,
B_offset is a k-by-n matrix,
C_offset is an m-by-n matrix.

NOTE
You must use the same value of the Layout parameter for the entire sequence of related
cblas_gemm_*_pack and cblas_gemm_*_compute calls.
For best performance, use the same number of threads for packing and for computing.
If packing for both A and B matrices, you must use the same number of threads for packing A as for
packing B.

Input Parameters

Layout CBLAS_LAYOUT
Specifies whether two-dimensional array storage is row-major
(CblasRowMajor) or column-major (CblasColMajor).

identifier CBLAS_IDENTIFIER
Specifies which matrix is to be packed:
If identifier = CblasAMatrix, the A matrix is packed.

If identifier = CblasBMatrix, the B matrix is packed.

trans CBLAS_TRANSPOSE
Specifies the form of op(src) used in the packing:

If trans = CblasNoTrans op(src) = src.

If trans = CblasTrans, op(src) = src^T.

m MKL_INT
Specifies the number of rows of matrix op(A) and of the matrix C. The value
of m must be at least zero.

n MKL_INT
Specifies the number of columns of matrix op(B) and the number of
columns of matrix C. The value of n must be at least zero.

k MKL_INT
Specifies the number of columns of matrix op(A) and the number of rows of
matrix op(B). The value of k must be at least zero.

src MKL_BF16* for cblas_gemm_bf16bf16f32_pack, MKL_F16* for
cblas_gemm_f16f16f32_pack, void* for cblas_gemm_s8u8s32_pack,
and MKL_INT16* for cblas_gemm_s16s16s32_pack
For Layout = CblasColMajor:
• identifier = CblasAMatrix, trans = CblasNoTrans: size ld*k. Before entry, the leading
m-by-k part of the array src must contain the matrix A.
• identifier = CblasAMatrix, trans = CblasTrans: size ld*m. Before entry, the leading
k-by-m part of the array src must contain the matrix A.
• identifier = CblasBMatrix, trans = CblasNoTrans: size ld*n. Before entry, the leading
k-by-n part of the array src must contain the matrix B.
• identifier = CblasBMatrix, trans = CblasTrans: size ld*k. Before entry, the leading
n-by-k part of the array src must contain the matrix B.
• For cblas_gemm_s8u8s32_pack, the elements in the src array must be 8-bit signed integers
when identifier = CblasAMatrix and 8-bit unsigned integers when identifier = CblasBMatrix.
For Layout = CblasRowMajor:
• identifier = CblasAMatrix, trans = CblasNoTrans: size ld*m. Before entry, the leading
k-by-m part of the array src must contain the matrix A.
• identifier = CblasAMatrix, trans = CblasTrans: size ld*k. Before entry, the leading
m-by-k part of the array src must contain the matrix A.
• identifier = CblasBMatrix, trans = CblasNoTrans: size ld*k. Before entry, the leading
n-by-k part of the array src must contain the matrix B.
• identifier = CblasBMatrix, trans = CblasTrans: size ld*n. Before entry, the leading
k-by-n part of the array src must contain the matrix B.
• For cblas_gemm_s8u8s32_pack, the elements in the src array must be 8-bit unsigned integers
when identifier = CblasAMatrix and 8-bit signed integers when identifier = CblasBMatrix.

ld MKL_INT
Specifies the leading dimension of src as declared in the calling
(sub)program.
For Layout = CblasColMajor:
• identifier = CblasAMatrix, trans = CblasNoTrans: ld must be at least max(1, m).
• identifier = CblasAMatrix, trans = CblasTrans: ld must be at least max(1, k).
• identifier = CblasBMatrix, trans = CblasNoTrans: ld must be at least max(1, k).
• identifier = CblasBMatrix, trans = CblasTrans: ld must be at least max(1, n).
For Layout = CblasRowMajor:
• identifier = CblasAMatrix, trans = CblasNoTrans: ld must be at least max(1, k).
• identifier = CblasAMatrix, trans = CblasTrans: ld must be at least max(1, m).
• identifier = CblasBMatrix, trans = CblasNoTrans: ld must be at least max(1, n).
• identifier = CblasBMatrix, trans = CblasTrans: ld must be at least max(1, k).

dest MKL_BF16* for cblas_gemm_bf16bf16f32_pack, MKL_F16* for
cblas_gemm_f16f16f32_pack, void* for cblas_gemm_s8u8s32_pack, or
MKL_INT16* for cblas_gemm_s16s16s32_pack
Buffer for the packed matrix.

Output Parameters

dest MKL_BF16* for cblas_gemm_bf16bf16f32_pack, MKL_F16* for
cblas_gemm_f16f16f32_pack, void* for
cblas_gemm_s8u8s32_pack, or MKL_INT16* for
cblas_gemm_s16s16s32_pack
Overwritten by the matrix op(src) stored in a format internal to Intel®
oneAPI Math Kernel Library (oneMKL).

Example
See the following examples in the MKL installation directory to understand the use of these routines:
cblas_gemm_s8u8s32_pack: examples\cblas\source\cblas_gemm_s8u8s32_computex.c
cblas_gemm_s16s16s32_pack: examples\cblas\source\cblas_gemm_s16s16s32_computex.c
cblas_gemm_bf16bf16f32_pack: examples\cblas\source\cblas_gemm_bf16bf16f32_computex.c
cblas_gemm_f16f16f32_pack: examples\cblas\source\cblas_gemm_f16f16f32_computex.c

Application Notes
When using cblas_gemm_s8u8s32_pack with row-major layout, the data types of A and B must be
swapped. That is, you must provide an 8-bit unsigned integer array for matrix A and an 8-bit signed integer
array for matrix B.

See Also
cblas_gemm_*_pack_get_size
to return the number of bytes needed to store the packed matrix.
cblas_gemm_*_compute
to compute a matrix-matrix product with general integer matrices (where one or both input matrices are
stored in a packed data structure) and add the result to a scalar-matrix product.


cblas_?gemm_compute
Computes a matrix-matrix product with general
matrices where one or both input matrices are stored
in a packed data structure and adds the result to a
scalar-matrix product.

Syntax
void cblas_hgemm_compute (const CBLAS_LAYOUT Layout, const MKL_INT transa, const
MKL_INT transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const MKL_F16 *a,
const MKL_INT lda, const MKL_F16 *b, const MKL_INT ldb, const MKL_F16 beta, MKL_F16 *c,
const MKL_INT ldc);
void cblas_sgemm_compute (const CBLAS_LAYOUT Layout, const MKL_INT transa, const
MKL_INT transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const float *a,
const MKL_INT lda, const float *b, const MKL_INT ldb, const float beta, float *c, const
MKL_INT ldc);
void cblas_dgemm_compute (const CBLAS_LAYOUT Layout, const MKL_INT transa, const
MKL_INT transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const double *a,
const MKL_INT lda, const double *b, const MKL_INT ldb, const double beta, double *c,
const MKL_INT ldc);

Include Files
• mkl.h

Description
The cblas_?gemm_compute routine is one of a set of related routines that enable use of an internal packed
storage. After calling cblas_?gemm_pack, call cblas_?gemm_compute to compute

C := op(A)*op(B) + beta*C,
where:
op(X) is one of the operations op(X) = X, op(X) = X^T, or op(X) = X^H,
beta is a scalar,
A, B, and C are matrices:
op(A) is an m-by-k matrix,
op(B) is a k-by-n matrix,
C is an m-by-n matrix.
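The compute formula has no alpha because, per the cblas_?gemm_pack description, alpha is applied when a matrix is packed (dest := alpha*op(src)). Assuming only A was packed and passed with transa = CblasPacked, the two calls combine into the usual ?gemm update:

```latex
\begin{aligned}
\text{cblas\_?gemm\_pack:}\quad    & \mathrm{dest} := \alpha\,\mathrm{op}(A) \\
\text{cblas\_?gemm\_compute:}\quad & C := \mathrm{dest}\,\mathrm{op}(B) + \beta C
                                     = \alpha\,\mathrm{op}(A)\,\mathrm{op}(B) + \beta C
\end{aligned}
```

If B is packed instead, alpha is folded into op(B) in the same way; if both matrices are packed, each packing call applies its own scalar, so the product carries both factors.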

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major
(CblasRowMajor) or column-major (CblasColMajor).
transa Specifies the form of op(A) used in the matrix multiplication, one of the
CBLAS_TRANSPOSE or CBLAS_STORAGE enumerated types:
If transa = CblasNoTrans, op(A) = A.
If transa = CblasTrans, op(A) = A^T.
If transa = CblasConjTrans, op(A) = A^H.
If transa = CblasPacked, the matrix in array a is packed and lda is ignored.

transb Specifies the form of op(B) used in the matrix multiplication, one of the
CBLAS_TRANSPOSE or CBLAS_STORAGE enumerated types:
If transb = CblasNoTrans, op(B) = B.
If transb = CblasTrans, op(B) = B^T.
If transb = CblasConjTrans, op(B) = B^H.
If transb = CblasPacked, the matrix in array b is packed and ldb is ignored.

m Specifies the number of rows of the matrix op(A) and of the matrix C. The
value of m must be at least zero.

n Specifies the number of columns of the matrix op(B) and the number of
columns of the matrix C. The value of n must be at least zero.

k Specifies the number of columns of the matrix op(A) and the number of
rows of the matrix op(B). The value of k must be at least zero.

a Array:
Layout = CblasColMajor:
  transa = CblasNoTrans: size lda*k. Before entry, the leading m-by-k part of the array a must contain the matrix A.
  transa = CblasTrans or CblasConjTrans: size lda*m. Before entry, the leading k-by-m part of the array a must contain the matrix A.
  transa = CblasPacked: stored in internal packed format.
Layout = CblasRowMajor:
  transa = CblasNoTrans: size lda*m. Before entry, the leading k-by-m part of the array a must contain the matrix A.
  transa = CblasTrans or CblasConjTrans: size lda*k. Before entry, the leading m-by-k part of the array a must contain the matrix A.
  transa = CblasPacked: stored in internal packed format.

lda Specifies the leading dimension of a as declared in the calling (sub)program.
Layout = CblasColMajor:
  transa = CblasNoTrans: lda must be at least max(1, m).
  transa = CblasTrans or CblasConjTrans: lda must be at least max(1, k).
  transa = CblasPacked: lda is ignored.
Layout = CblasRowMajor:
  transa = CblasNoTrans: lda must be at least max(1, k).
  transa = CblasTrans or CblasConjTrans: lda must be at least max(1, m).
  transa = CblasPacked: lda is ignored.
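The non-packed lda rules follow one pattern: in column-major layout the leading dimension bounds the number of rows of the stored array (m if A is stored untransposed, k otherwise), while in row-major layout it bounds the number of columns (k untransposed, m otherwise). The hypothetical helper below, which is not a oneMKL routine, encodes that pattern; passing rows = k and cols = n gives the ldb rules, and rows = m, cols = n with no transpose gives the ldc rule.

```c
#include <stdbool.h>

/* Hypothetical helper (not part of oneMKL): minimum leading dimension of a
 * non-packed operand whose op() is rows-by-cols (rows = m, cols = k for A). */
int min_leading_dim(bool row_major, bool transposed, int rows, int cols)
{
    /* Column-major bounds the stored row count, row-major the stored column
     * count; storing the operand transposed swaps which of the two that is. */
    int ld = (row_major != transposed) ? cols : rows;
    return ld > 1 ? ld : 1;
}
```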

b Array:
Layout = CblasColMajor:
  transb = CblasNoTrans: size ldb*n. Before entry, the leading k-by-n part of the array b must contain the matrix B.
  transb = CblasTrans or CblasConjTrans: size ldb*k. Before entry, the leading n-by-k part of the array b must contain the matrix B.
  transb = CblasPacked: stored in internal packed format.
Layout = CblasRowMajor:
  transb = CblasNoTrans: size ldb*k. Before entry, the leading n-by-k part of the array b must contain the matrix B.
  transb = CblasTrans or CblasConjTrans: size ldb*n. Before entry, the leading k-by-n part of the array b must contain the matrix B.
  transb = CblasPacked: stored in internal packed format.

ldb Specifies the leading dimension of b as declared in the calling (sub)program.
Layout = CblasColMajor:
  transb = CblasNoTrans: ldb must be at least max(1, k).
  transb = CblasTrans or CblasConjTrans: ldb must be at least max(1, n).
  transb = CblasPacked: ldb is ignored.
Layout = CblasRowMajor:
  transb = CblasNoTrans: ldb must be at least max(1, n).
  transb = CblasTrans or CblasConjTrans: ldb must be at least max(1, k).
  transb = CblasPacked: ldb is ignored.

beta Specifies the scalar beta. When beta is equal to zero, then c need not be
set on input.

c Array:
Layout = CblasColMajor: size ldc*n. Before entry, the leading m-by-n part of the array c must contain the
matrix C, except when beta is equal to zero, in which case c need not be set on entry.
Layout = CblasRowMajor: size ldc*m. Before entry, the leading n-by-m part of the array c must contain the
matrix C, except when beta is equal to zero, in which case c need not be set on entry.

ldc Specifies the leading dimension of c as declared in the calling (sub)program.
Layout = CblasColMajor: ldc must be at least max(1, m).
Layout = CblasRowMajor: ldc must be at least max(1, n).

Output Parameters

c Overwritten by the m-by-n matrix op(A)*op(B) + beta*C.

See Also
cblas_?gemm_pack_get_size Returns the number of bytes required to store the packed matrix.
cblas_?gemm_pack Performs scaling and packing of the matrix into the previously allocated buffer.
cblas_?gemm
for a detailed description of general matrix multiplication.

cblas_gemm_*_compute
Computes a matrix-matrix product with general
integer matrices (where one or both input matrices
are stored in a packed data structure) and adds the
result to a scalar-matrix product.

Syntax
void cblas_gemm_s8u8s32_compute(const CBLAS_LAYOUT Layout, const MKL_INT transa, const
MKL_INT transb, const CBLAS_OFFSET offsetc, const MKL_INT m, const MKL_INT n, const
MKL_INT k, const float alpha, const void *a, const MKL_INT lda, const MKL_INT8 oa,
const void *b, const MKL_INT ldb, const MKL_INT8 ob, const float beta, MKL_INT32 *c,
const MKL_INT ldc, const MKL_INT32 *oc);
void cblas_gemm_s16s16s32_compute(const CBLAS_LAYOUT Layout, const MKL_INT transa,
const MKL_INT transb, const CBLAS_OFFSET offsetc, const MKL_INT m, const MKL_INT n,
const MKL_INT k, const float alpha, const MKL_INT16 *a, const MKL_INT lda, const
MKL_INT16 oa, const MKL_INT16 *b, const MKL_INT ldb, const MKL_INT16 ob, const float
beta, MKL_INT32 *c, const MKL_INT ldc, const MKL_INT32 *oc);


Include Files
• mkl.h

Description
The cblas_gemm_*_compute routine is one of a set of related routines that enable the use of an internal packed
storage format. After calling cblas_gemm_*_pack, call cblas_gemm_*_compute to compute

C := alpha*(op(A) + A_offset)*(op(B) + B_offset) + beta*C + C_offset,

where:
op(X) is either op(X) = X or op(X) = X^T,
alpha and beta are scalars,
A, B, and C are matrices:
op(A) is an m-by-k matrix,
op(B) is a k-by-n matrix,
C is an m-by-n matrix.
A_offset is an m-by-k matrix with every element equal to the value oa.
B_offset is a k-by-n matrix with every element equal to the value ob.
C_offset is an m-by-n matrix defined by the oc array, as described for the offsetc
parameter.

NOTE
You must use the same value of the Layout parameter for the entire sequence of related
cblas_gemm_*_pack and cblas_gemm_*_compute calls.
For best performance, use the same number of threads for packing and for computing.
If you are packing both the A and B matrices, you must use the same number of threads for packing A
as for packing B.

Input Parameters

Layout CBLAS_LAYOUT
Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major
(CblasColMajor).

transa MKL_INT Specifies the form of op(A) used in the matrix multiplication:
If transa = CblasNoTrans, op(A) = A.
If transa = CblasTrans, op(A) = A^T.
If transa = CblasPacked, the matrix in array a is packed into a format internal to Intel® oneAPI
Math Kernel Library (oneMKL) and lda is ignored.

transb MKL_INT Specifies the form of op(B) used in the matrix multiplication:
If transb = CblasNoTrans, op(B) = B.
If transb = CblasTrans, op(B) = B^T.
If transb = CblasPacked, the matrix in array b is packed into a format internal to Intel® oneAPI
Math Kernel Library (oneMKL) and ldb is ignored.

offsetc CBLAS_OFFSET Specifies the form of C_offset used in the matrix multiplication.
If offsetc = CblasFixOffset: oc has a single element and every element of C_offset is equal to
this element.
If offsetc = CblasColOffset: oc has a size of m and every column of C_offset is equal to oc.
If offsetc = CblasRowOffset: oc has a size of n and every row of C_offset is equal to oc.
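As an illustration of the formula and of the three offsetc modes, the following naive sketch computes the cblas_gemm_s8u8s32_compute result for column-major, non-transposed, non-packed inputs. It is not oneMKL code; the function ref_gemm_s8u8s32_colmajor and the enum offset_mode (standing in for CBLAS_OFFSET) are hypothetical. Following the Application Notes for this routine, the exact integer product is scaled in double precision and rounded to the nearest integer. Rounding halves away from zero and saturating on overflow are assumptions of the sketch: the tie-breaking rule is not stated, and the overflow behavior is architecture-dependent.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for CBLAS_OFFSET. */
typedef enum { OFFSET_FIX, OFFSET_COL, OFFSET_ROW } offset_mode;

/* Round half away from zero and clamp to the int32 range (assumed saturation). */
static int32_t round_to_i32(double v)
{
    if (v >= 2147483647.0) return INT32_MAX;
    if (v <= -2147483648.0) return INT32_MIN;
    return (int32_t)(v >= 0.0 ? v + 0.5 : v - 0.5);
}

/* Hypothetical reference: C := alpha*(A + A_offset)*(B + B_offset) + beta*C + C_offset,
 * column-major, no transposes, no packed operands. */
void ref_gemm_s8u8s32_colmajor(offset_mode mode, int m, int n, int k,
                               float alpha, const int8_t *a, int lda, int8_t oa,
                               const uint8_t *b, int ldb, int8_t ob,
                               float beta, int32_t *c, int ldc, const int32_t *oc)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            /* Exact integer element of (A + A_offset)*(B + B_offset). */
            int64_t acc = 0;
            for (int p = 0; p < k; ++p)
                acc += ((int64_t)a[i + (size_t)p * lda] + oa) *
                       ((int64_t)b[p + (size_t)j * ldb] + ob);

            /* FIX: one value; COL: every column equals oc (length m);
             * ROW: every row equals oc (length n). */
            int32_t co = (mode == OFFSET_FIX) ? oc[0]
                       : (mode == OFFSET_COL) ? oc[i] : oc[j];

            size_t idx = i + (size_t)j * ldc;
            double v = (double)alpha * (double)acc;
            if (beta != 0.0f)             /* c is not read when beta == 0 */
                v += (double)beta * (double)c[idx];
            c[idx] = round_to_i32(v + (double)co);
        }
    }
}
```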

m MKL_INT Specifies the number of rows of the matrix op(A) and of the matrix C. The value of m
must be at least zero.

n MKL_INT Specifies the number of columns of the matrix op(B) and the number of columns of the
matrix C. The value of n must be at least zero.

k MKL_INT Specifies the number of columns of the matrix op(A) and the number of rows of the
matrix op(B). The value of k must be at least zero.

alpha float Specifies the scalar alpha.

a void* for cblas_gemm_s8u8s32_compute
MKL_INT16* for cblas_gemm_s16s16s32_compute

Layout = CblasColMajor:
  transa = CblasNoTrans: array, size lda*k. Before entry, the leading m-by-k part of the array a must
  contain the matrix A. For cblas_gemm_s8u8s32_compute, the elements in the a array must be 8-bit
  signed integers.
  transa = CblasTrans: array, size lda*m. Before entry, the leading k-by-m part of the array a must
  contain the matrix A. For cblas_gemm_s8u8s32_compute, the elements in the a array must be 8-bit
  signed integers.
  transa = CblasPacked: array of the size returned by cblas_gemm_*_pack_get_size and initialized
  using cblas_gemm_*_pack.
Layout = CblasRowMajor:
  transa = CblasNoTrans: array, size lda*m. Before entry, the leading k-by-m part of the array a must
  contain the matrix A. For cblas_gemm_s8u8s32_compute, the elements in the a array must be 8-bit
  unsigned integers.
  transa = CblasTrans: array, size lda*k. Before entry, the leading m-by-k part of the array a must
  contain the matrix A. For cblas_gemm_s8u8s32_compute, the elements in the a array must be 8-bit
  unsigned integers.
  transa = CblasPacked: array of the size returned by cblas_gemm_*_pack_get_size and initialized
  using cblas_gemm_*_pack.

lda MKL_INT Specifies the leading dimension of a as declared in the calling (sub)program.
Layout = CblasColMajor:
  transa = CblasNoTrans: lda must be at least max(1, m).
  transa = CblasTrans: lda must be at least max(1, k).
Layout = CblasRowMajor:
  transa = CblasNoTrans: lda must be at least max(1, k).
  transa = CblasTrans: lda must be at least max(1, m).

oa MKL_INT8 for cblas_gemm_s8u8s32_compute

MKL_INT16 for cblas_gemm_s16s16s32_compute

Specifies the scalar offset value for the matrix A.

b void* for cblas_gemm_s8u8s32_compute
MKL_INT16* for cblas_gemm_s16s16s32_compute

Layout = CblasColMajor:
  transb = CblasNoTrans: array, size ldb*n. Before entry, the leading k-by-n part of the array b must
  contain the matrix B. For cblas_gemm_s8u8s32_compute, the elements in the b array must be 8-bit
  unsigned integers.
  transb = CblasTrans: array, size ldb*k. Before entry, the leading n-by-k part of the array b must
  contain the matrix B. For cblas_gemm_s8u8s32_compute, the elements in the b array must be 8-bit
  unsigned integers.
  transb = CblasPacked: array of the size returned by cblas_gemm_*_pack_get_size and initialized
  using cblas_gemm_*_pack.
Layout = CblasRowMajor:
  transb = CblasNoTrans: array, size ldb*k. Before entry, the leading n-by-k part of the array b must
  contain the matrix B. For cblas_gemm_s8u8s32_compute, the elements in the b array must be 8-bit
  signed integers.
  transb = CblasTrans: array, size ldb*n. Before entry, the leading k-by-n part of the array b must
  contain the matrix B. For cblas_gemm_s8u8s32_compute, the elements in the b array must be 8-bit
  signed integers.
  transb = CblasPacked: array of the size returned by cblas_gemm_*_pack_get_size and initialized
  using cblas_gemm_*_pack.

ldb MKL_INT Specifies the leading dimension of b as declared in the calling (sub)program.
Layout = CblasColMajor:
  transb = CblasNoTrans: ldb must be at least max(1, k).
  transb = CblasTrans: ldb must be at least max(1, n).
Layout = CblasRowMajor:
  transb = CblasNoTrans: ldb must be at least max(1, n).
  transb = CblasTrans: ldb must be at least max(1, k).

ob MKL_INT8 for cblas_gemm_s8u8s32_compute

MKL_INT16 for cblas_gemm_s16s16s32_compute

Specifies the scalar offset value for the matrix B.

beta float
Specifies the scalar beta.

c MKL_INT32*
Array:
Layout = CblasColMajor: array, size ldc*n. Before entry, the leading m-by-n part of the array c must
contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry.
Layout = CblasRowMajor: array, size ldc*m. Before entry, the leading n-by-m part of the array c must
contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry.

ldc MKL_INT Specifies the leading dimension of c as declared in the calling (sub)program.
Layout = CblasColMajor: ldc must be at least max(1, m).
Layout = CblasRowMajor: ldc must be at least max(1, n).

oc MKL_INT32*
Array, size len. Specifies the offset values for the matrix C, interpreted according to offsetc.
If offsetc = CblasFixOffset, len must be at least 1.
If offsetc = CblasColOffset, len must be at least max(1, m).
If offsetc = CblasRowOffset, len must be at least max(1, n).

Output Parameters

c MKL_INT32*
Overwritten by the matrix alpha*(op(A) + A_offset)*(op(B) +
B_offset) + beta*C + C_offset.

Example
See the following examples in the Intel® oneAPI Math Kernel Library (oneMKL) installation directory to understand the use of these routines:
cblas_gemm_s8u8s32_compute: examples\cblas\source\cblas_gemm_s8u8s32_computex.c
cblas_gemm_s16s16s32_compute: examples\cblas\source\cblas_gemm_s16s16s32_computex.c

Application Notes
You can expand the matrix-matrix product as follows:
(op(A) + A_offset)*(op(B) + B_offset) = op(A)*op(B) + op(A)*B_offset + A_offset*op(B) +
A_offset*B_offset
These four terms are computed separately and then summed from left to right. The resulting matrix-matrix
product and the C matrix are scaled by the floating-point values alpha and beta, respectively, using
double-precision arithmetic. Before the results are stored in the output c array, the floating-point values
are rounded to the nearest integers.
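For a single element (i, j), the three offset terms reduce to ob times the sum of row i of op(A), oa times the sum of column j of op(B), and k*oa*ob. The sketch below compares the direct and expanded forms for 8-bit inputs; the function names are hypothetical, this is not oneMKL code, and both functions use exact 64-bit integer arithmetic, so they agree for any inputs.

```c
#include <stdint.h>

/* Direct: sum_p (x[p] + oa) * (y[p] + ob), where x is a row of op(A)
 * and y is a column of op(B). */
int64_t offset_dot_direct(const int8_t *x, const uint8_t *y, int k, int oa, int ob)
{
    int64_t s = 0;
    for (int p = 0; p < k; ++p)
        s += ((int64_t)x[p] + oa) * ((int64_t)y[p] + ob);
    return s;
}

/* Expanded: op(A)*op(B) + op(A)*B_offset + A_offset*op(B) + A_offset*B_offset,
 * summed left to right for one element. */
int64_t offset_dot_expanded(const int8_t *x, const uint8_t *y, int k, int oa, int ob)
{
    int64_t dot = 0, sum_x = 0, sum_y = 0;
    for (int p = 0; p < k; ++p) {
        dot   += (int64_t)x[p] * y[p];
        sum_x += x[p];
        sum_y += y[p];
    }
    return dot + (int64_t)ob * sum_x + (int64_t)oa * sum_y + (int64_t)k * oa * ob;
}
```

The expanded form is useful because the row sums of op(A) and column sums of op(B) can be computed once and reused for every element of C.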
In the event of overflow or underflow, the results depend on the architecture: they are either
unsaturated (wrapped) or saturated to the maximum or minimum representable integer values for the data type
of the output matrix.
When using cblas_gemm_s8u8s32_compute with row-major layout, the data types of A and B must be
swapped. That is, you must provide an 8-bit unsigned integer array for matrix A and an 8-bit signed integer
array for matrix B.
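One way to see why the types swap: a row-major m-by-n array C occupies the same memory as a column-major n-by-m array C^T, and C^T = B^T * A^T. A row-major product can therefore be served by a column-major kernel with the operands exchanged, which puts B in the kernel's signed first-operand slot and A in its unsigned second-operand slot. The sketch below illustrates this with a naive kernel; both function names are hypothetical, and it explains the data-type rule rather than describing how oneMKL is implemented.

```c
#include <stddef.h>
#include <stdint.h>

/* Naive column-major Z (m x n) = X (m x k, signed) * Y (k x n, unsigned). */
static void colmajor_s8u8s32(int m, int n, int k,
                             const int8_t *x, int ldx,
                             const uint8_t *y, int ldy,
                             int32_t *z, int ldz)
{
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            int32_t s = 0;
            for (int p = 0; p < k; ++p)
                s += (int32_t)x[i + (size_t)p * ldx] * y[p + (size_t)j * ldy];
            z[i + (size_t)j * ldz] = s;
        }
}

/* Row-major C (m x n) = A (m x k, unsigned) * B (k x n, signed).
 * Read column-major, the same memory holds C^T = B^T * A^T, so B fills
 * the signed slot and A the unsigned slot of the column-major kernel. */
void rowmajor_u8s8s32(int m, int n, int k,
                      const uint8_t *a, int lda,
                      const int8_t *b, int ldb,
                      int32_t *c, int ldc)
{
    colmajor_s8u8s32(n, m, k, b, ldb, a, lda, c, ldc);
}
```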

See Also
cblas_gemm_*_pack_get_size
to return the number of bytes needed to store the packed matrix.
cblas_gemm_*_pack

to pack the matrix into the buffer allocated previously.

cblas_gemm_bf16bf16f32_compute
Computes a matrix-matrix product with general
bfloat16 matrices (where one or both input matrices
are stored in a packed data structure) and adds the
result to a scalar-matrix product.

Syntax
C:
void cblas_gemm_bf16bf16f32_compute (const CBLAS_LAYOUT Layout, const MKL_INT transa,
const MKL_INT transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const float
alpha, const MKL_BF16 *a, const MKL_INT lda, const MKL_BF16 *b, const MKL_INT ldb,
const float beta, float *c, const MKL_INT ldc);

Include Files
• mkl.h

Description
The cblas_gemm_bf16bf16f32_compute routine is one of a set of related routines that enable the use of an
internal packed storage format. After calling cblas_gemm_bf16bf16f32_pack, call
cblas_gemm_bf16bf16f32_compute to compute
C := alpha*op(A)*op(B) + beta*C,
where:
op(X) is either op(X) = X or op(X) = X^T,
alpha and beta are scalars,
A, B, and C are matrices:
op(A) is an m-by-k matrix,
op(B) is a k-by-n matrix,
C is an m-by-n matrix.

NOTE
You must use the same value of the Layout parameter for the entire sequence of related
cblas_gemm_bf16bf16f32_pack and cblas_gemm_bf16bf16f32_compute calls.
For best performance, use the same number of threads for packing and for computing.
If you are packing both the A and B matrices, you must use the same number of threads for packing A as for
packing B.

Input Parameters

Layout CBLAS_LAYOUT
Specifies whether two-dimensional array storage is row-major
(CblasRowMajor) or column-major (CblasColMajor).

transa MKL_INT
Specifies the form of op(A) used in the matrix multiplication:
If transa = CblasNoTrans, op(A) = A.
If transa = CblasTrans, op(A) = A^T.
If transa = CblasPacked, the matrix in array a is packed into a
format internal to Intel® oneAPI Math Kernel Library (oneMKL) and
lda is ignored.

transb MKL_INT
Specifies the form of op(B) used in the matrix multiplication:
If transb = CblasNoTrans, op(B) = B.
If transb = CblasTrans, op(B) = B^T.
If transb = CblasPacked, the matrix in array b is packed into a
format internal to Intel® oneAPI Math Kernel Library (oneMKL) and
ldb is ignored.

m MKL_INT
Specifies the number of rows of the matrix op(A) and of the matrix C.
The value of m must be at least zero.

n MKL_INT
Specifies the number of columns of the matrix op(B) and the number
of columns of the matrix C. The value of n must be at least zero.

k MKL_INT
Specifies the number of columns of the matrix op(A) and the number
of rows of the matrix op(B). The value of k must be at least zero.

alpha float
Specifies the scalar alpha.

a MKL_BF16*
Layout = CblasColMajor:
  transa = CblasNoTrans: array, size lda*k. Before entry, the leading m-by-k part of the array a must
  contain the matrix A.
  transa = CblasTrans: array, size lda*m. Before entry, the leading k-by-m part of the array a must
  contain the matrix A.
  transa = CblasPacked: array of the size returned by cblas_gemm_bf16bf16f32_pack_get_size and
  initialized using cblas_gemm_bf16bf16f32_pack.
Layout = CblasRowMajor:
  transa = CblasNoTrans: array, size lda*m. Before entry, the leading k-by-m part of the array a must
  contain the matrix A.
  transa = CblasTrans: array, size lda*k. Before entry, the leading m-by-k part of the array a must
  contain the matrix A.
  transa = CblasPacked: array of the size returned by cblas_gemm_bf16bf16f32_pack_get_size and
  initialized using cblas_gemm_bf16bf16f32_pack.

lda MKL_INT
Specifies the leading dimension of a as declared in the calling (sub)program.
Layout = CblasColMajor:
  transa = CblasNoTrans: lda must be at least max(1, m).
  transa = CblasTrans: lda must be at least max(1, k).
Layout = CblasRowMajor:
  transa = CblasNoTrans: lda must be at least max(1, k).
  transa = CblasTrans: lda must be at least max(1, m).

b MKL_BF16*
Layout = CblasColMajor:
  transb = CblasNoTrans: array, size ldb*n. Before entry, the leading k-by-n part of the array b must
  contain the matrix B.
  transb = CblasTrans: array, size ldb*k. Before entry, the leading n-by-k part of the array b must
  contain the matrix B.
  transb = CblasPacked: array of the size returned by cblas_gemm_bf16bf16f32_pack_get_size and
  initialized using cblas_gemm_bf16bf16f32_pack.
Layout = CblasRowMajor:
  transb = CblasNoTrans: array, size ldb*k. Before entry, the leading n-by-k part of the array b must
  contain the matrix B.
  transb = CblasTrans: array, size ldb*n. Before entry, the leading k-by-n part of the array b must
  contain the matrix B.
  transb = CblasPacked: array of the size returned by cblas_gemm_bf16bf16f32_pack_get_size and
  initialized using cblas_gemm_bf16bf16f32_pack.

ldb MKL_INT
Specifies the leading dimension of b as declared in the calling (sub)program.
Layout = CblasColMajor:
  transb = CblasNoTrans: ldb must be at least max(1, k).
  transb = CblasTrans: ldb must be at least max(1, n).
Layout = CblasRowMajor:
  transb = CblasNoTrans: ldb must be at least max(1, n).
  transb = CblasTrans: ldb must be at least max(1, k).

beta float
Specifies the scalar beta.

c float*
Layout = CblasColMajor: array, size ldc*n. Before entry, the leading m-by-n part of the array c must
contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry.
Layout = CblasRowMajor: array, size ldc*m. Before entry, the leading n-by-m part of the array c must
contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry.

ldc MKL_INT
Specifies the leading dimension of c as declared in the calling
(sub)program.

Layout = CblasColMajor ldc must be at least max(1, m).

Layout = CblasRowMajor ldc must be at least max(1, n).

Output Parameters

c float*
Overwritten by the matrix alpha*op(A)*op(B) + beta*C.

Example
See the following examples in the Intel® oneAPI Math Kernel Library (oneMKL) installation directory to
understand the use of these routines:

cblas_gemm_bf16bf16f32_compute:
examples\cblas\source\cblas_gemm_bf16bf16f32_computex.c

Application Notes
On architectures without native bfloat16 hardware instructions, matrices A and B are up-converted to single
precision and SGEMM is called to compute the matrix multiplication.
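bfloat16 keeps the upper 16 bits of an IEEE single-precision value, so the up-conversion described above is exact: the 16-bit pattern is shifted into the high half of a 32-bit word. The sketch below shows that conversion and a single-precision dot product over bfloat16 inputs, illustrating the up-convert-then-accumulate idea rather than how SGEMM is implemented. It assumes the data is held as raw 16-bit patterns (uint16_t standing in for MKL_BF16); the function names are hypothetical, and float_to_bf16 (round to nearest, ties to even) is included only to build test inputs.

```c
#include <stdint.h>
#include <string.h>

/* Exact up-conversion: a bfloat16 value is the top half of a float. */
float bf16_to_float(uint16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Down-conversion with round to nearest, ties to even. */
uint16_t float_to_bf16(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    if ((bits & 0x7FFFFFFFu) > 0x7F800000u)      /* NaN: keep it a quiet NaN */
        return (uint16_t)((bits >> 16) | 0x0040u);
    bits += 0x7FFFu + ((bits >> 16) & 1u);
    return (uint16_t)(bits >> 16);
}

/* Up-convert both operands, then multiply-accumulate in single precision. */
float bf16_dot(const uint16_t *x, const uint16_t *y, int k)
{
    float s = 0.0f;
    for (int p = 0; p < k; ++p)
        s += bf16_to_float(x[p]) * bf16_to_float(y[p]);
    return s;
}
```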

cblas_gemm_bf16bf16f32
Computes a matrix-matrix product with general
bfloat16 matrices.

Syntax
void cblas_gemm_bf16bf16f32 (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE transa,
const CBLAS_TRANSPOSE transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const
float alpha, const MKL_BF16 *a, const MKL_INT lda, const MKL_BF16 *b, const MKL_INT
ldb, const float beta, float *c, const MKL_INT ldc);

Include Files
• mkl.h

Description
The cblas_gemm_bf16bf16f32 routine computes a scalar-matrix-matrix product and adds the result to a
scalar-matrix product. The operation is defined as:

C := alpha*op(A)*op(B) + beta*C

where:
op(X) is one of op(X) = X or op(X) = X^T,
alpha and beta are scalars,
A, B, and C are matrices:
op(A) is an m-by-k matrix,
op(B) is a k-by-n matrix,
C is an m-by-n matrix.
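A naive reference sketch of this definition, covering both layouts and both transpose options, may help when checking results. It is not oneMKL code: ref_gemm_bf16bf16f32, the elem() helper, and the use of uint16_t bit patterns in place of MKL_BF16 are assumptions for illustration, and its summation order (hence its rounding) will generally differ from the library's.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static float bf16_to_float(uint16_t h)
{
    uint32_t bits = (uint32_t)h << 16;   /* bfloat16 is the top half of a float */
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Element (r, c) of a stored array with leading dimension ld. */
static float elem(const uint16_t *x, int ld, bool row_major, int r, int c)
{
    size_t idx = row_major ? (size_t)r * ld + c : r + (size_t)c * ld;
    return bf16_to_float(x[idx]);
}

/* Hypothetical reference: C := alpha*op(A)*op(B) + beta*C. */
void ref_gemm_bf16bf16f32(bool row_major, bool trans_a, bool trans_b,
                          int m, int n, int k, float alpha,
                          const uint16_t *a, int lda, const uint16_t *b, int ldb,
                          float beta, float *c, int ldc)
{
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (int p = 0; p < k; ++p) {
                /* A is stored k-by-m when transposed, B is stored n-by-k. */
                float aip = trans_a ? elem(a, lda, row_major, p, i) : elem(a, lda, row_major, i, p);
                float bpj = trans_b ? elem(b, ldb, row_major, j, p) : elem(b, ldb, row_major, p, j);
                sum += aip * bpj;
            }
            float *cij = row_major ? &c[(size_t)i * ldc + j] : &c[i + (size_t)j * ldc];
            /* C is not read when beta == 0. */
            *cij = alpha * sum + (beta == 0.0f ? 0.0f : beta * *cij);
        }
}
```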

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major
(CblasRowMajor) or column-major (CblasColMajor).

transa Specifies the form of op(A) used in the matrix multiplication:
if transa=CblasNoTrans, then op(A) = A;
if transa=CblasTrans, then op(A) = A^T.

transb Specifies the form of op(B) used in the matrix multiplication:
if transb=CblasNoTrans, then op(B) = B;
if transb=CblasTrans, then op(B) = B^T.

m Specifies the number of rows of the matrix op(A) and of the matrix C.
The value of m must be at least zero.

n Specifies the number of columns of the matrix op(B) and the number
of columns of the matrix C. The value of n must be at least zero.


k Specifies the number of columns of the matrix op(A) and the number
of rows of the matrix op(B). The value of k must be at least zero.

alpha Specifies the scalar alpha.

a
Layout = CblasColMajor:
  transa=CblasNoTrans: array, size lda*k. Before entry, the leading m-by-k part of the array a must
  contain the matrix A.
  transa=CblasTrans: array, size lda*m. Before entry, the leading k-by-m part of the array a must
  contain the matrix A.
Layout = CblasRowMajor:
  transa=CblasNoTrans: array, size lda*m. Before entry, the leading k-by-m part of the array a must
  contain the matrix A.
  transa=CblasTrans: array, size lda*k. Before entry, the leading m-by-k part of the array a must
  contain the matrix A.

lda Specifies the leading dimension of a as declared in the calling (sub)program.
Layout = CblasColMajor:
  transa=CblasNoTrans: lda must be at least max(1, m).
  transa=CblasTrans: lda must be at least max(1, k).
Layout = CblasRowMajor:
  transa=CblasNoTrans: lda must be at least max(1, k).
  transa=CblasTrans: lda must be at least max(1, m).

b
Layout = CblasColMajor:
  transb=CblasNoTrans: array, size ldb*n. Before entry, the leading k-by-n part of the array b must
  contain the matrix B.
  transb=CblasTrans: array, size ldb*k. Before entry, the leading n-by-k part of the array b must
  contain the matrix B.
Layout = CblasRowMajor:
  transb=CblasNoTrans: array, size ldb*k. Before entry, the leading n-by-k part of the array b must
  contain the matrix B.
  transb=CblasTrans: array, size ldb*n. Before entry, the leading k-by-n part of the array b must
  contain the matrix B.

ldb Specifies the leading dimension of b as declared in the calling (sub)program.
Layout = CblasColMajor:
  transb=CblasNoTrans: ldb must be at least max(1, k).
  transb=CblasTrans: ldb must be at least max(1, n).
Layout = CblasRowMajor:
  transb=CblasNoTrans: ldb must be at least max(1, n).
  transb=CblasTrans: ldb must be at least max(1, k).

beta Specifies the scalar beta. When beta is equal to zero, then c need not
be set on input.

c
Layout = CblasColMajor: array, size ldc*n. Before entry, the leading m-by-n part of the array c must
contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry.
Layout = CblasRowMajor: array, size ldc*m. Before entry, the leading n-by-m part of the array c must
contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry.

ldc Specifies the leading dimension of c as declared in the calling (sub)program.
Layout = CblasColMajor: ldc must be at least max(1, m).
Layout = CblasRowMajor: ldc must be at least max(1, n).

Output Parameters

c Overwritten by alpha*op(A)*op(B) + beta*C.

Example
For examples of routine usage, see these code examples in the Intel® oneAPI Math Kernel Library (oneMKL)
installation directory:

• cblas_gemm_bf16bf16f32: examples\cblas\source\cblas_gemm_bf16bf16f32x.c

Application Notes
On architectures without native bfloat16 hardware instructions, matrices A and B are up-converted to single
precision and SGEMM is called to compute the matrix multiplication.

cblas_gemm_f16f16f32_compute
Computes a matrix-matrix product with general
matrices of half-precision data type (where one or
both input matrices are stored in a packed data
structure) and adds the result to a scalar-matrix
product.

Syntax
C:

void cblas_gemm_f16f16f32_compute (const CBLAS_LAYOUT Layout, const MKL_INT transa,
const MKL_INT transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const float
alpha, const MKL_F16 *a, const MKL_INT lda, const MKL_F16 *b, const MKL_INT ldb, const
float beta, float *c, const MKL_INT ldc);

Include Files
• mkl.h

Description
The cblas_gemm_f16f16f32_compute routine is one of a set of related routines that enable the use of an
internal packed storage format. After calling cblas_gemm_f16f16f32_pack, call cblas_gemm_f16f16f32_compute
to compute
C := alpha*op(A)*op(B) + beta*C,
where:
op(X) is either op(X) = X or op(X) = X^T,
alpha and beta are scalars,
A, B, and C are matrices:
op(A) is an m-by-k matrix,
op(B) is a k-by-n matrix,
C is an m-by-n matrix.

NOTE
You must use the same value of the Layout parameter for the entire sequence of related
cblas_gemm_f16f16f32_pack and cblas_gemm_f16f16f32_compute calls.
For best performance, use the same number of threads for packing and for computing.
If you are packing both the A and B matrices, you must use the same number of threads for packing A as for
packing B.

Input Parameters

Layout CBLAS_LAYOUT
Specifies whether two-dimensional array storage is row-major
(CblasRowMajor) or column-major (CblasColMajor).

transa MKL_INT
Specifies the form of op(A) used in the matrix multiplication:
If transa = CblasNoTrans, op(A) = A.
If transa = CblasTrans, op(A) = A^T.
If transa = CblasPacked, the matrix in array a is packed into a
format internal to Intel® oneAPI Math Kernel Library (oneMKL) and
lda is ignored.

transb MKL_INT
Specifies the form of op(B) used in the matrix multiplication:
If transb = CblasNoTrans, op(B) = B.
If transb = CblasTrans, op(B) = B^T.
If transb = CblasPacked, the matrix in array b is packed into a
format internal to Intel® oneAPI Math Kernel Library (oneMKL) and
ldb is ignored.

m MKL_INT
Specifies the number of rows of the matrix op(A) and of the matrix C.
The value of m must be at least zero.

n MKL_INT
Specifies the number of columns of the matrix op(B) and the number
of columns of the matrix C. The value of n must be at least zero.

k MKL_INT
Specifies the number of columns of the matrix op(A) and the number
of rows of the matrix op(B). The value of k must be at least zero.

alpha float
Specifies the scalar alpha.

a MKL_F16*
Layout = CblasColMajor:
  transa = CblasNoTrans: array, size lda*k. Before entry, the leading m-by-k part of the array a must
  contain the matrix A.
  transa = CblasTrans: array, size lda*m. Before entry, the leading k-by-m part of the array a must
  contain the matrix A.
  transa = CblasPacked: array of size returned by cblas_gemm_f16f16f32_pack_g and initialized using
  cblas_gemm_f16f16f32_pack.
Layout = CblasRowMajor:
  transa = CblasNoTrans: array, size lda*m. Before entry, the leading k-by-m part of the array a must
  contain the matrix A.
  transa = CblasTrans: array, size lda*k. Before entry, the leading m-by-k part of the array a must
  contain the matrix A.
  transa = CblasPacked: array of size returned by cblas_gemm_f16f16f32_pack_g and initialized using
  cblas_gemm_f16f16f32_pack.

lda MKL_INT
Specifies the leading dimension of a as declared in the calling (sub)program.
Layout = CblasColMajor:
  transa = CblasNoTrans: lda must be at least max(1, m).
  transa = CblasTrans: lda must be at least max(1, k).
Layout = CblasRowMajor:
  transa = CblasNoTrans: lda must be at least max(1, k).
  transa = CblasTrans: lda must be at least max(1, m).

b MKL_F16*
Layout = CblasColMajor:
  transb = CblasNoTrans: array, size ldb*n. Before entry, the leading k-by-n part of the array b must
  contain the matrix B.
  transb = CblasTrans: array, size ldb*k. Before entry, the leading n-by-k part of the array b must
  contain the matrix B.
  transb = CblasPacked: array of size returned by cblas_gemm_f16f16f32_pack_g and initialized using
  cblas_gemm_f16f16f32_pack.
Layout = CblasRowMajor:
  transb = CblasNoTrans: array, size ldb*k. Before entry, the leading n-by-k part of the array b must
  contain the matrix B.
  transb = CblasTrans: array, size ldb*n. Before entry, the leading k-by-n part of the array b must
  contain the matrix B.
  transb = CblasPacked: array of size returned by cblas_gemm_f16f16f32_pack_g and initialized using
  cblas_gemm_f16f16f32_pack.

ldb MKL_INT
Specifies the leading dimension of b as declared in the calling (sub)program.
Layout = CblasColMajor:
  transb = CblasNoTrans: ldb must be at least max(1, k).
  transb = CblasTrans: ldb must be at least max(1, n).
Layout = CblasRowMajor:
  transb = CblasNoTrans: ldb must be at least max(1, n).
  transb = CblasTrans: ldb must be at least max(1, k).

beta float
Specifies the scalar beta.

c float*
Layout = CblasColMajor: array, size ldc*n. Before entry, the leading m-by-n part of the array c must
contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry.
Layout = CblasRowMajor: array, size ldc*m. Before entry, the leading n-by-m part of the array c must
contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry.

ldc MKL_INT
Specifies the leading dimension of c as declared in the calling
(sub)program.

Layout = CblasColMajor ldc must be at least max(1, m).

Layout = CblasRowMajor ldc must be at least max(1, n).

Output Parameters

c float*
Overwritten by the matrix alpha*op(A)*op(B) + beta*C.

Example
See the following examples in the Intel® oneAPI Math Kernel Library (oneMKL) installation directory to
understand the use of these routines:

cblas_gemm_f16f16f32_compute:
examples\cblas\source\cblas_gemm_f16f16f32_computex.c

Application Notes
On architectures without native half-precision hardware instructions, matrices A and B are up-converted to
single precision and SGEMM is called to compute the matrix multiplication.
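Every IEEE 754 half-precision (binary16) value is exactly representable in single precision, so the up-conversion described above loses nothing. The sketch below converts a binary16 bit pattern to float (rebiasing the exponent from 15 to 127 for normal values, scaling subnormals by 2^-24, and mapping Inf/NaN) and uses it in a single-precision dot product. It illustrates the up-convert-then-accumulate idea, not how SGEMM is implemented; uint16_t stands in for MKL_F16 and the function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Convert an IEEE 754 binary16 bit pattern to float (exact for all inputs). */
float f16_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h >> 15) << 31;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;
    float f;

    if (exp == 0) {
        /* Zero or subnormal: value is mant * 2^-24, exact in float. */
        f = (float)mant * 0x1p-24f;
        return sign ? -f : f;
    }
    if (exp == 0x1Fu)
        bits = sign | 0x7F800000u | (mant << 13);          /* Inf or NaN */
    else
        bits = sign | ((exp + 112u) << 23) | (mant << 13); /* rebias 15 -> 127 */
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Up-convert both operands, then multiply-accumulate in single precision. */
float f16_dot(const uint16_t *x, const uint16_t *y, int k)
{
    float s = 0.0f;
    for (int p = 0; p < k; ++p)
        s += f16_to_float(x[p]) * f16_to_float(y[p]);
    return s;
}
```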

cblas_gemm_f16f16f32
Computes a matrix-matrix product with general
matrices of half precision data type.

Syntax
void cblas_gemm_f16f16f32 (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE transa,
const CBLAS_TRANSPOSE transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const
float alpha, const MKL_F16 *a, const MKL_INT lda, const MKL_F16 *b, const MKL_INT ldb,
const float beta, float *c, const MKL_INT ldc);


Include Files
• mkl.h

Description
The cblas_gemm_f16f16f32 routine computes a scalar-matrix-matrix product and adds the result to a
scalar-matrix product. The operation is defined as:

C := alpha*op(A)*op(B) + beta*C

where:
op(X) is one of op(X) = X or op(X) = X^T,
alpha and beta are scalars,
A, B, and C are matrices:
op(A) is an m-by-k matrix,
op(B) is a k-by-n matrix,
C is an m-by-n matrix.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major
(CblasRowMajor) or column-major (CblasColMajor).

transa Specifies the form of op(A) used in the matrix multiplication:
if transa=CblasNoTrans, then op(A) = A;
if transa=CblasTrans, then op(A) = A^T.

transb Specifies the form of op(B) used in the matrix multiplication:
if transb=CblasNoTrans, then op(B) = B;
if transb=CblasTrans, then op(B) = B^T.

m Specifies the number of rows of the matrix op(A) and of the matrix C.
The value of m must be at least zero.

n Specifies the number of columns of the matrix op(B) and the number
of columns of the matrix C. The value of n must be at least zero.

k Specifies the number of columns of the matrix op(A) and the number
of rows of the matrix op(B). The value of k must be at least zero.

alpha Specifies the scalar alpha.

a Array containing the matrix A. Its size and required contents depend on Layout and transa:
• Layout = CblasColMajor, transa = CblasNoTrans: array of size lda*k. Before entry, the leading m-by-k part
of the array a must contain the matrix A.
• Layout = CblasColMajor, transa = CblasTrans: array of size lda*m. Before entry, the leading k-by-m part
of the array a must contain the matrix A.
• Layout = CblasRowMajor, transa = CblasNoTrans: array of size lda*m. Before entry, the leading k-by-m part
of the array a must contain the matrix A.
• Layout = CblasRowMajor, transa = CblasTrans: array of size lda*k. Before entry, the leading m-by-k part
of the array a must contain the matrix A.

lda Specifies the leading dimension of a as declared in the calling (sub)program:
• Layout = CblasColMajor, transa = CblasNoTrans: lda must be at least max(1, m).
• Layout = CblasColMajor, transa = CblasTrans: lda must be at least max(1, k).
• Layout = CblasRowMajor, transa = CblasNoTrans: lda must be at least max(1, k).
• Layout = CblasRowMajor, transa = CblasTrans: lda must be at least max(1, m).

b Array containing the matrix B. Its size and required contents depend on Layout and transb:
• Layout = CblasColMajor, transb = CblasNoTrans: array of size ldb*n. Before entry, the leading k-by-n part
of the array b must contain the matrix B.
• Layout = CblasColMajor, transb = CblasTrans: array of size ldb*k. Before entry, the leading n-by-k part
of the array b must contain the matrix B.
• Layout = CblasRowMajor, transb = CblasNoTrans: array of size ldb*k. Before entry, the leading n-by-k part
of the array b must contain the matrix B.
• Layout = CblasRowMajor, transb = CblasTrans: array of size ldb*n. Before entry, the leading k-by-n part
of the array b must contain the matrix B.

ldb Specifies the leading dimension of b as declared in the calling (sub)program:
• Layout = CblasColMajor, transb = CblasNoTrans: ldb must be at least max(1, k).
• Layout = CblasColMajor, transb = CblasTrans: ldb must be at least max(1, n).
• Layout = CblasRowMajor, transb = CblasNoTrans: ldb must be at least max(1, n).
• Layout = CblasRowMajor, transb = CblasTrans: ldb must be at least max(1, k).
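The four cases in each of the lda and ldb rules reduce to one rule: the leading dimension must cover the stored rows in column-major layout, or the stored columns in row-major layout, where the stored matrix is op(X) itself without transposition and its transpose otherwise. A minimal sketch of that rule, with hypothetical enum and function names; passing the m-by-n shape of C with no transposition gives the ldc bound as well.

```c
/* Hypothetical stand-ins for the CBLAS layout and transpose values. */
enum ld_layout { LD_COL_MAJOR, LD_ROW_MAJOR };
enum ld_trans { LD_NO_TRANS, LD_TRANS };

/* Smallest valid leading dimension for an operand whose op() is rows-by-cols.
   The array holds op(X) itself for LD_NO_TRANS and its transpose for LD_TRANS. */
long min_ld(enum ld_layout layout, enum ld_trans trans, long rows, long cols)
{
    long stored_rows = trans == LD_NO_TRANS ? rows : cols;
    long stored_cols = trans == LD_NO_TRANS ? cols : rows;
    /* Column major: ld spans one stored column; row major: one stored row. */
    long need = layout == LD_COL_MAJOR ? stored_rows : stored_cols;
    return need > 1 ? need : 1; /* the max(1, ...) in the rules above */
}
```

For example, for A with op(A) of size m-by-k, min_ld(layout, transa, m, k) reproduces the lda rule, and min_ld(layout, transb, k, n) reproduces the ldb rule.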

beta Specifies the scalar beta. When beta is equal to zero, then c need not
be set on input.

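c Array containing the matrix C:
• Layout = CblasColMajor: array of size ldc*n. Before entry, the leading m-by-n part of the array c must
contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry.
• Layout = CblasRowMajor: array of size ldc*m. Before entry, the leading n-by-m part of the array c must
contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry.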

ldc Specifies the leading dimension of c as declared in the calling (sub)program.

Layout = CblasColMajor ldc must be at least max(1, m).

Layout = CblasRowMajor ldc must be at least max(1, n).

Output Parameters

c Overwritten by alpha*op(A)*op(B) + beta*C.

Example
For examples of routine usage, see these code examples in the Intel® oneAPI Math Kernel Library (oneMKL)
installation directory:

• cblas_gemm_f16f16f32: examples\cblas\source\cblas_gemm_f16f16f32x.c

cblas_?gemm_free
Frees the storage previously allocated for the packed
matrix (deprecated).

Syntax
void cblas_sgemm_free (float *dest);
void cblas_dgemm_free (double *dest);

Include Files
• mkl.h

Description
The cblas_?gemm_free routine is one of a set of related routines that enable the use of internal packed
storage. Call the cblas_?gemm_free routine last to release storage for the packed matrix structure allocated
with cblas_?gemm_alloc (deprecated).

Input Parameters

dest Previously allocated storage.

Output Parameters

dest The freed buffer.

See Also
cblas_?gemm_pack Performs scaling and packing of the matrix into the previously allocated buffer.
cblas_?gemm_compute Computes a matrix-matrix product with general matrices where one or both
input matrices are stored in a packed data structure and adds the result to a scalar-matrix
product.
cblas_?gemm For a detailed description of general matrix multiplication.

cblas_gemm_*
Computes a matrix-matrix product with general
integer matrices.

Syntax
void cblas_gemm_s8u8s32 (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE transa, const
CBLAS_TRANSPOSE transb, const CBLAS_OFFSET offsetc, const MKL_INT m, const MKL_INT n,
const MKL_INT k, const float alpha, const void *a, const MKL_INT lda, const MKL_INT8
oa, const void *b, const MKL_INT ldb, const MKL_INT8 ob, const float beta, MKL_INT32 *c,
const MKL_INT ldc, const MKL_INT32 *oc);

void cblas_gemm_s16s16s32 (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE transa,
const CBLAS_TRANSPOSE transb, const CBLAS_OFFSET offsetc, const MKL_INT m, const
MKL_INT n, const MKL_INT k, const float alpha, const MKL_INT16 *a, const MKL_INT lda,
const MKL_INT16 oa, const MKL_INT16 *b, const MKL_INT ldb, const MKL_INT16 ob, const
float beta, MKL_INT32 *c, const MKL_INT ldc, const MKL_INT32 *oc);

Include Files
• mkl.h

Description
The cblas_gemm_* routines compute a scalar-matrix-matrix product and add the result to a scalar-matrix
product. To get the final result, a vector is added to each row or column of the output matrix. The operation
is defined as:

C := alpha*(op(A) + A_offset)*(op(B) + B_offset) + beta*C + C_offset

where:
op(X) is either op(X) = X or op(X) = X^T,
A_offset is an m-by-k matrix with every element equal to the value oa,
B_offset is a k-by-n matrix with every element equal to the value ob,
C_offset is an m-by-n matrix defined by the oc array as described for the offsetc parameter,
alpha and beta are scalars,
A is a matrix such that op(A) is m-by-k,
B is a matrix such that op(B) is k-by-n,
and C is an m-by-n matrix.
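As a plain-C reference for checking results, the sketch below evaluates this definition directly for column-major storage, no transposition, 16-bit inputs (the cblas_gemm_s16s16s32 element types), and a single fixed C offset (the CblasFixOffset case). The function name ref_gemm_s16s16s32 is hypothetical. Rounding ties away from zero and saturating on overflow are assumptions drawn from the application notes; oneMKL itself may wrap on overflow, depending on the architecture.

```c
#include <stdint.h>

/* Plain-C model (not oneMKL code) of
   C := alpha*(A + A_offset)*(B + B_offset) + beta*C + C_offset
   for column-major storage, op(X) = X, and one fixed C offset. */
void ref_gemm_s16s16s32(int m, int n, int k, float alpha,
                        const int16_t *a, int lda, int16_t oa,
                        const int16_t *b, int ldb, int16_t ob,
                        float beta, int32_t *c, int ldc, int32_t oc_fixed)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            /* Exact integer dot product of the offset-adjusted row and column. */
            int64_t acc = 0;
            for (int p = 0; p < k; ++p)
                acc += (int64_t)(a[i + p * lda] + oa) * (b[p + j * ldb] + ob);

            /* Scale with alpha and beta in double precision; C is not read when beta == 0. */
            double cij = beta == 0.0f ? 0.0 : (double)beta * c[i + j * ldc];
            double r = (double)alpha * (double)acc + cij + oc_fixed;

            /* Round to nearest (ties away from zero) and saturate to the int32 range. */
            int32_t out;
            if (r >= 2147483647.0)
                out = INT32_MAX;
            else if (r <= -2147483648.0)
                out = INT32_MIN;
            else
                out = (int32_t)(r >= 0 ? r + 0.5 : r - 0.5);
            c[i + j * ldc] = out;
        }
    }
}
```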

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major
(CblasRowMajor) or column-major (CblasColMajor).

transa Specifies the form of op(A) used in the matrix multiplication:
if transa=CblasNoTrans, then op(A) = A;
if transa=CblasTrans, then op(A) = A^T.

transb Specifies the form of op(B) used in the matrix multiplication:
if transb=CblasNoTrans, then op(B) = B;
if transb=CblasTrans, then op(B) = B^T.

offsetc Specifies the form of C_offset used in the matrix multiplication.

offsetc = CblasFixOffset: oc has a single element and every
element of C_offset is equal to this element.
offsetc = CblasColOffset: oc has a size of m and every column of
C_offset is equal to oc.
offsetc = CblasRowOffset: oc has a size of n and every row of
C_offset is equal to oc.
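The three offsetc modes differ only in which element of oc supplies element (i, j) of C_offset. A small sketch, using hypothetical enum values in place of CblasFixOffset, CblasColOffset, and CblasRowOffset:

```c
#include <stdint.h>

/* Hypothetical stand-ins for CblasFixOffset, CblasColOffset, and CblasRowOffset. */
enum c_offset_mode { FIX_OFFSET, COL_OFFSET, ROW_OFFSET };

/* Element (i, j) of the m-by-n matrix C_offset described by oc and the mode. */
int32_t c_offset_at(enum c_offset_mode mode, const int32_t *oc, int i, int j)
{
    switch (mode) {
    case FIX_OFFSET:
        return oc[0]; /* oc has one element, shared by all of C_offset */
    case COL_OFFSET:
        return oc[i]; /* oc has m elements; every column of C_offset equals oc */
    case ROW_OFFSET:
        return oc[j]; /* oc has n elements; every row of C_offset equals oc */
    }
    return 0; /* not reached for valid modes */
}
```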

m Specifies the number of rows of the matrix op(A) and of the matrix C. The
value of m must be at least zero.

n Specifies the number of columns of the matrix op(B) and the number of
columns of the matrix C. The value of n must be at least zero.

k Specifies the number of columns of the matrix op(A) and the number of
rows of the matrix op(B). The value of k must be at least zero.

alpha Specifies the scalar alpha.

a Array containing the matrix A. Its size and required contents depend on Layout and transa:
• Layout = CblasColMajor, transa = CblasNoTrans: array of size lda*k. Before entry, the leading m-by-k part
of the array a must contain the matrix A of 8-bit signed integers for cblas_gemm_s8u8s32 or 16-bit signed
integers for cblas_gemm_s16s16s32.
• Layout = CblasColMajor, transa = CblasTrans: array of size lda*m. Before entry, the leading k-by-m part
of the array a must contain the matrix A of 8-bit signed integers for cblas_gemm_s8u8s32 or 16-bit signed
integers for cblas_gemm_s16s16s32.
• Layout = CblasRowMajor, transa = CblasNoTrans: array of size lda*m. Before entry, the leading k-by-m part
of the array a must contain the matrix A of 8-bit unsigned integers for cblas_gemm_s8u8s32 or 16-bit
signed integers for cblas_gemm_s16s16s32.
• Layout = CblasRowMajor, transa = CblasTrans: array of size lda*k. Before entry, the leading m-by-k part
of the array a must contain the matrix A of 8-bit unsigned integers for cblas_gemm_s8u8s32 or 16-bit
signed integers for cblas_gemm_s16s16s32.

lda Specifies the leading dimension of a as declared in the calling (sub)program:
• Layout = CblasColMajor, transa = CblasNoTrans: lda must be at least max(1, m).
• Layout = CblasColMajor, transa = CblasTrans: lda must be at least max(1, k).
• Layout = CblasRowMajor, transa = CblasNoTrans: lda must be at least max(1, k).
• Layout = CblasRowMajor, transa = CblasTrans: lda must be at least max(1, m).

oa Specifies the scalar offset value for matrix A.

b Array containing the matrix B. Its size and required contents depend on Layout and transb:
• Layout = CblasColMajor, transb = CblasNoTrans: array of size ldb*n. Before entry, the leading k-by-n part
of the array b must contain the matrix B of 8-bit unsigned integers for cblas_gemm_s8u8s32 or 16-bit
signed integers for cblas_gemm_s16s16s32.
• Layout = CblasColMajor, transb = CblasTrans: array of size ldb*k. Before entry, the leading n-by-k part
of the array b must contain the matrix B of 8-bit unsigned integers for cblas_gemm_s8u8s32 or 16-bit
signed integers for cblas_gemm_s16s16s32.
• Layout = CblasRowMajor, transb = CblasNoTrans: array of size ldb*k. Before entry, the leading n-by-k part
of the array b must contain the matrix B of 8-bit signed integers for cblas_gemm_s8u8s32 or 16-bit signed
integers for cblas_gemm_s16s16s32.
• Layout = CblasRowMajor, transb = CblasTrans: array of size ldb*n. Before entry, the leading k-by-n part
of the array b must contain the matrix B of 8-bit signed integers for cblas_gemm_s8u8s32 or 16-bit signed
integers for cblas_gemm_s16s16s32.

ldb Specifies the leading dimension of b as declared in the calling (sub)program:
• Layout = CblasColMajor, transb = CblasNoTrans: ldb must be at least max(1, k).
• Layout = CblasColMajor, transb = CblasTrans: ldb must be at least max(1, n).
• Layout = CblasRowMajor, transb = CblasNoTrans: ldb must be at least max(1, n).
• Layout = CblasRowMajor, transb = CblasTrans: ldb must be at least max(1, k).

ob Specifies the scalar offset value for matrix B.

beta Specifies the scalar beta. When beta is equal to zero, then c need not be
set on input.

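c Array containing the matrix C:
• Layout = CblasColMajor: array of size ldc*n. Before entry, the leading m-by-n part of the array c must
contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry.
• Layout = CblasRowMajor: array of size ldc*m. Before entry, the leading n-by-m part of the array c must
contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry.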

ldc Specifies the leading dimension of c as declared in the calling (sub)program.

Layout = CblasColMajor ldc must be at least max(1, m).

Layout = CblasRowMajor ldc must be at least max(1, n).

oc Array, size len. Specifies the offset values for matrix C.

If offsetc = CblasFixOffset: len must be at least 1.
If offsetc = CblasColOffset: len must be at least max(1, m).
If offsetc = CblasRowOffset: len must be at least max(1, n).

Output Parameters

c Overwritten by alpha(op(A) + A_offset)(op(B) + B_offset)

+ beta*C+ C_offset.

Example
For examples of routine usage, see the following code examples in the Intel® oneAPI Math Kernel
Library (oneMKL) installation directory:

• cblas_gemm_s8u8s32: examples\cblas\source\cblas_gemm_s8u8s32x.c
• cblas_gemm_s16s16s32: examples\cblas\source\cblas_gemm_s16s16s32x.c

Application Notes
The matrix-matrix product can be expanded:
(op(A) + A_offset)*(op(B) + B_offset)
= op(A)*op(B) + op(A)*B_offset + A_offset*op(B) + A_offset*B_offset
These four multiplication terms are computed separately and then summed from left to right. The results
from the matrix-matrix product and the C matrix are scaled with the floating-point values alpha and beta,
respectively, using double-precision arithmetic. Before the results are stored to the output c array, the
floating-point values are rounded to the nearest integers. In the event of overflow or underflow, the results
depend on the architecture: they are either unsaturated (wrapped) or saturated to the maximum or minimum
representable integer values for the data type of the output matrix.
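The rounding and overflow behavior described above can be modeled in plain C. This sketch converts one double-precision result to the int32 output type in the two ways the note mentions, saturating and wrapping. The function names are hypothetical, rounding ties away from zero is an assumption (the text only says "nearest"), and inputs are assumed to satisfy |v| < 2^62.

```c
#include <stdint.h>

/* Round to the nearest integer, ties away from zero (assumed; the text only
   says "nearest"). Valid for |v| < 2^62. */
int64_t round_nearest(double v)
{
    return (int64_t)(v >= 0.0 ? v + 0.5 : v - 0.5);
}

/* Saturating store: clamp to the representable int32 range. */
int32_t store_saturated(double v)
{
    int64_t r = round_nearest(v);
    if (r > INT32_MAX)
        return INT32_MAX;
    if (r < INT32_MIN)
        return INT32_MIN;
    return (int32_t)r;
}

/* Wrapping store: keep the low 32 bits, as two's-complement overflow would. */
int32_t store_wrapped(double v)
{
    uint32_t u = (uint32_t)round_nearest(v); /* conversion to unsigned is modulo 2^32 */
    if (u <= (uint32_t)INT32_MAX)
        return (int32_t)u;
    return (int32_t)(u - 2147483648u) + INT32_MIN; /* maps u to u - 2^32 */
}
```

For example, a result of 2^31 saturates to INT32_MAX but wraps to INT32_MIN.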
When using cblas_gemm_s8u8s32 with row-major layout, the data types of A and B must be swapped. That
is, you must provide an 8-bit unsigned integer array for matrix A and an 8-bit signed integer array for matrix
B.
Intermediate integer computations in cblas_gemm_s8u8s32 on 64-bit Intel® Advanced Vector Extensions 2
(Intel® AVX2) and Intel® Advanced Vector Extensions 512 (Intel® AVX-512) architectures without Vector
Neural Network Instructions (VNNI) extensions can saturate, because only 16 bits are available for
the accumulation of intermediate results. You can avoid integer saturation by keeping all integer
elements of the A or B matrix within 7 bits (under 8 bits).
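The saturation risk can be seen with a small model. It assumes the non-VNNI path adds pairs of unsigned 8-bit by signed 8-bit products into a saturating signed 16-bit lane, as the AVX2 VPMADDUBSW instruction does; this is an assumption about oneMKL internals that the text does not spell out, and the function name is hypothetical. Full-range operands overflow, while limiting the unsigned operand to 7 bits (at most 127) keeps the largest magnitude at 2*127*128 = 32512, which fits in int16.

```c
#include <stdint.h>

/* Model of one saturating multiply-add step: two u8 x s8 products summed into a
   signed 16-bit result, clamped as the AVX2 VPMADDUBSW instruction does. */
int16_t madd_u8s8_sat16(uint8_t a0, int8_t b0, uint8_t a1, int8_t b1)
{
    int32_t sum = (int32_t)a0 * b0 + (int32_t)a1 * b1; /* exact in 32 bits */
    if (sum > INT16_MAX)
        return INT16_MAX;
    if (sum < INT16_MIN)
        return INT16_MIN;
    return (int16_t)sum;
}
```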


cblas_?gemv_batch_strided
Computes groups of matrix-vector products with
general matrices.

Syntax
void cblas_sgemv_batch_strided (const CBLAS_LAYOUT layout, const CBLAS_TRANSPOSE trans,
const MKL_INT m, const MKL_INT n, const float alpha, const float *a, const MKL_INT lda,
const MKL_INT stridea, const float *x, const MKL_INT incx, const MKL_INT stridex, const
float beta, float *y, const MKL_INT incy, const MKL_INT stridey, const MKL_INT
batch_size);
void cblas_dgemv_batch_strided (const CBLAS_LAYOUT layout, const CBLAS_TRANSPOSE trans,
const MKL_INT m, const MKL_INT n, const double alpha, const double *a, const MKL_INT
lda, const MKL_INT stridea, const double *x, const MKL_INT incx, const MKL_INT stridex,
const double beta, double *y, const MKL_INT incy, const MKL_INT stridey, const MKL_INT
batch_size);
void cblas_cgemv_batch_strided (const CBLAS_LAYOUT layout, const CBLAS_TRANSPOSE trans,
const MKL_INT m, const MKL_INT n, const void *alpha, const void *a, const MKL_INT lda,
const MKL_INT stridea, const void *x, const MKL_INT incx, const MKL_INT stridex, const
void *beta, void *y, const MKL_INT incy, const MKL_INT stridey, const MKL_INT
batch_size);
void cblas_zgemv_batch_strided (const CBLAS_LAYOUT layout, const CBLAS_TRANSPOSE trans,
const MKL_INT m, const MKL_INT n, const void *alpha, const void *a, const MKL_INT lda,
const MKL_INT stridea, const void *x, const MKL_INT incx, const MKL_INT stridex, const
void *beta, void *y, const MKL_INT incy, const MKL_INT stridey, const MKL_INT
batch_size);

Include Files
• mkl.h

Description
The cblas_?gemv_batch_strided routines perform a series of matrix-vector products added to a scaled
vector. They are similar to their cblas_?gemv counterparts, but the cblas_?gemv_batch_strided
routines perform matrix-vector operations with groups of matrices and vectors.
All matrices a and vectors x and y have the same parameters (size, increments) and are stored at the
constant strides stridea, stridex, and stridey, respectively, from each other. The operation is defined as

for i = 0 … batch_size – 1
A is a matrix at offset i * stridea in a
X and Y are vectors at offset i * stridex and i * stridey in x and y
Y = alpha * op(A) * X + beta * Y
end for
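The loop above can be modeled in plain C for checking results. This sketch is hypothetical and covers only single precision, column-major layout, trans = CblasNoTrans, and positive increments; it uses long for the strides where the real interface uses MKL_INT. A stridea of 0 makes every problem in the batch reuse the same matrix.

```c
#include <stddef.h>

/* Naive column-major, non-transposed model of the strided batch:
   y_i := alpha*A_i*x_i + beta*y_i for i = 0 .. batch_size-1. */
void ref_sgemv_batch_strided(int m, int n, float alpha,
                             const float *a, int lda, long stridea,
                             const float *x, int incx, long stridex,
                             float beta, float *y, int incy, long stridey,
                             int batch_size)
{
    for (int i = 0; i < batch_size; ++i) {
        const float *A = a + i * stridea; /* i-th matrix starts stridea elements later */
        const float *X = x + i * stridex;
        float *Y = y + i * stridey;
        for (int r = 0; r < m; ++r) {
            float sum = 0.0f;
            for (int col = 0; col < n; ++col)
                sum += A[r + (size_t)col * lda] * X[col * incx];
            Y[r * incy] = alpha * sum + beta * Y[r * incy];
        }
    }
}
```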

Input Parameters

layout Specifies whether two-dimensional array storage is row-major
(CblasRowMajor) or column-major (CblasColMajor).

trans Specifies op(A), the transposition operation applied to the A matrices:
if trans = CblasNoTrans, then op(A) = A;
if trans = CblasTrans, then op(A) = A';
if trans = CblasConjTrans, then op(A) = conjg(A').

m Number of rows of the matrices A. The value of m must be at least 0.

n Number of columns of the matrices A. The value of n must be at least 0.

alpha Specifies the scalar alpha.

a Array holding all the input matrices A. Must be of size at least lda*k + stridea
* (batch_size - 1), where k is n if column major layout is used or m if row
major layout is used.

lda Specifies the leading dimension of the matrices A. It must be positive and at
least m if column major layout is used or at least n if row major layout is
used.

stridea Stride between two consecutive A matrices. Must be at least 0.

x Array holding all the input vectors x. Must be of size at least (1 +
(len-1)*abs(incx)) + stridex * (batch_size - 1), where len is n if the A
matrix is not transposed or m otherwise.

incx Stride between two consecutive elements of the x vectors. Must not be
zero.

stridex Stride between two consecutive x vectors, must be at least 0.

beta Specifies the scalar beta.

y Array holding all the input vectors y. Must be of size at least batch_size *
stridey.

incy Stride between two consecutive elements of the y vectors. Must not be
zero.

stridey Stride between two consecutive y vectors; must be at least (1 +
(len-1)*abs(incy)), where len is m if the matrix A is not transposed or n
otherwise.

batch_size Number of gemv computations to perform, which is also the number of A matrices
and of x and y vectors. Must be at least 0.
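The buffer-size rules above can be collected into small helpers, for example to validate allocations before the call. A sketch with hypothetical names, assuming batch_size is at least 1 and using long in place of MKL_INT:

```c
#include <stdlib.h> /* labs */

/* Minimum length of a: lda*k + stridea*(batch_size - 1),
   where k = n (column major) or m (row major). */
long gemv_strided_min_a(int col_major, long m, long n, long lda, long stridea, long batch_size)
{
    long k = col_major ? n : m;
    return lda * k + stridea * (batch_size - 1);
}

/* Minimum length of x: (1 + (len - 1)*|incx|) + stridex*(batch_size - 1),
   where len = n if A is not transposed and m otherwise. */
long gemv_strided_min_x(int transposed, long m, long n, long incx, long stridex, long batch_size)
{
    long len = transposed ? m : n;
    return (1 + (len - 1) * labs(incx)) + stridex * (batch_size - 1);
}

/* Minimum length of y: batch_size*stridey. */
long gemv_strided_min_y(long stridey, long batch_size)
{
    return batch_size * stridey;
}
```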

Output Parameters

y Array holding the batch_size updated vectors y.

cblas_?gemv_batch
Computes groups of matrix-vector products with
general matrices.

Syntax
void cblas_sgemv_batch (const CBLAS_LAYOUT layout, const CBLAS_TRANSPOSE *trans_array,
const MKL_INT *m_array, const MKL_INT *n_array, const float *alpha_array, const float
**a_array, const MKL_INT *lda_array, const float **x_array, const MKL_INT *incx_array,
const float *beta_array, float **y_array, const MKL_INT *incy_array, const MKL_INT
group_count, const MKL_INT *group_size);

void cblas_dgemv_batch (const CBLAS_LAYOUT layout, const CBLAS_TRANSPOSE *trans_array,
const MKL_INT *m_array, const MKL_INT *n_array, const double *alpha_array, const double
**a_array, const MKL_INT *lda_array, const double **x_array, const MKL_INT *incx_array,
const double *beta_array, double **y_array, const MKL_INT *incy_array, const MKL_INT
group_count, const MKL_INT *group_size);
void cblas_cgemv_batch (const CBLAS_LAYOUT layout, const CBLAS_TRANSPOSE *trans_array,
const MKL_INT *m_array, const MKL_INT *n_array, const void *alpha_array, const void
**a_array, const MKL_INT *lda_array, const void **x_array, const MKL_INT *incx_array,
const void *beta_array, void **y_array, const MKL_INT *incy_array, const MKL_INT
group_count, const MKL_INT *group_size);
void cblas_zgemv_batch (const CBLAS_LAYOUT layout, const CBLAS_TRANSPOSE *trans_array,
const MKL_INT *m_array, const MKL_INT *n_array, const void *alpha_array, const void
**a_array, const MKL_INT *lda_array, const void **x_array, const MKL_INT *incx_array,
const void *beta_array, void **y_array, const MKL_INT *incy_array, const MKL_INT
group_count, const MKL_INT *group_size);

Include Files
• mkl.h

Description
The cblas_?gemv_batch routines perform a series of matrix-vector products added to a scaled vector. They
are similar to their cblas_?gemv counterparts, but the cblas_?gemv_batch routines perform
matrix-vector operations with groups of matrices and vectors.
Each group contains matrices and vectors with the same parameters (size, increments). The operation is
defined as:

idx = 0
For i = 0 … group_count – 1
trans, m, n, alpha, lda, incx, beta, incy and group_size at position i in trans_array,
m_array, n_array, alpha_array, lda_array, incx_array, beta_array, incy_array and group_size
for j = 0 … group_size – 1
a is a matrix of size mxn at position idx in a_array
x and y are vectors of size m or n depending on trans, at position idx in x_array and
y_array
y := alpha * op(a) * x + beta * y
idx := idx + 1
end for
end for
The number of entries in a_array, x_array, and y_array is total_batch_count = the sum of all of the
group_size entries.
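Because a_array, x_array, and y_array are flat arrays indexed by idx across all groups, it can help to compute total_batch_count and to find which group owns a given position. A hypothetical sketch of that bookkeeping, using long in place of MKL_INT:

```c
/* total_batch_count: the sum of group_size[0 .. group_count - 1]. */
long total_batch_count(const long *group_size, long group_count)
{
    long total = 0;
    for (long g = 0; g < group_count; ++g)
        total += group_size[g];
    return total;
}

/* Group that owns flat position idx of a_array/x_array/y_array,
   or -1 if idx is not below total_batch_count. */
long group_of_index(const long *group_size, long group_count, long idx)
{
    for (long g = 0; g < group_count; ++g) {
        if (idx < group_size[g])
            return g;
        idx -= group_size[g]; /* skip past this group's entries */
    }
    return -1;
}
```

Empty groups (group_size[i] == 0) own no positions, so they are skipped naturally.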

Input Parameters

layout Specifies whether two-dimensional array storage is row-major
(CblasRowMajor) or column-major (CblasColMajor).

trans_array Array of size group_count. For the group i, trans_i = trans_array[i] specifies
the transposition operation applied to A:
if trans = CblasNoTrans, then op(A) = A;
if trans = CblasTrans, then op(A) = A';
if trans = CblasConjTrans, then op(A) = conjg(A').


m_array Array of size group_count. For the group i, m_i = m_array[i] is the number
of rows of the matrix A.

n_array Array of size group_count. For the group i, n_i = n_array[i] is the number of
columns in the matrix A.

alpha_array Array of size group_count. For the group i, alpha_i = alpha_array[i] is the
scalar alpha.

a_array Array of size total_batch_count of pointers used to store A matrices. The
array allocated for the A matrices of the group i must be of size at least lda_i
* n_i if column major layout is used or at least lda_i * m_i if row major layout
is used.

lda_array Array of size group_count. For the group i, lda_i = lda_array[i] is the leading
dimension of the matrix A. It must be positive and at least m_i if column
major layout is used or at least n_i if row major layout is used.

x_array Array of size total_batch_count of pointers used to store x vectors. The
array allocated for the x vectors of the group i must be of size at least (1 +
(len_i - 1)*abs(incx_i)), where len_i is n_i if the A matrix is not transposed or m_i
otherwise.

incx_array Array of size group_count. For the group i, incx_i = incx_array[i] is the stride
of vector x. Must not be zero.

beta_array Array of size group_count. For the group i, beta_i = beta_array[i] is the
scalar beta.

y_array Array of size total_batch_count of pointers used to store y vectors. The
array allocated for the y vectors of the group i must be of size at least (1 +
(len_i - 1)*abs(incy_i)), where len_i is m_i if the A matrix is not transposed or n_i
otherwise.

incy_array Array of size group_count. For the group i, incy_i = incy_array[i] is the stride
of vector y. Must not be zero.

group_count Number of groups. Must be at least 0.

group_size Array of size group_count. The element group_size[i] is the number of
operations in the group i. Each element in group_size must be at least 0.

Output Parameters

y_array Array of pointers holding the total_batch_count updated vectors y.

cblas_?dgmm_batch_strided
Computes groups of matrix-matrix products of a diagonal
matrix and a general matrix.

Syntax
void cblas_sdgmm_batch_strided (const CBLAS_LAYOUT layout, const CBLAS_SIDE left_right,
const MKL_INT m, const MKL_INT n, const float *a, const MKL_INT lda, const MKL_INT
stridea, const float *x, const MKL_INT incx, const MKL_INT stridex, float *c,
const MKL_INT ldc, const MKL_INT stridec, const MKL_INT batch_size);
void cblas_ddgmm_batch_strided (const CBLAS_LAYOUT layout, const CBLAS_SIDE left_right,
const MKL_INT m, const MKL_INT n, const double *a, const MKL_INT lda, const MKL_INT
stridea, const double *x, const MKL_INT incx, const MKL_INT stridex, double *c,
const MKL_INT ldc, const MKL_INT stridec, const MKL_INT batch_size);
void cblas_cdgmm_batch_strided (const CBLAS_LAYOUT layout, const CBLAS_SIDE left_right,
const MKL_INT m, const MKL_INT n, const void *a, const MKL_INT lda, const MKL_INT
stridea, const void *x, const MKL_INT incx, const MKL_INT stridex, void *c, const
MKL_INT ldc, const MKL_INT stridec, const MKL_INT batch_size);
void cblas_zdgmm_batch_strided (const CBLAS_LAYOUT layout, const CBLAS_SIDE left_right,
const MKL_INT m, const MKL_INT n, const void *a, const MKL_INT lda, const MKL_INT
stridea, const void *x, const MKL_INT incx, const MKL_INT stridex, void *c, const
MKL_INT ldc, const MKL_INT stridec, const MKL_INT batch_size);

Include Files
• mkl.h

Description
The cblas_?dgmm_batch_strided routines perform a series of diagonal matrix-matrix products. The
diagonal matrices are stored as dense vectors, and the operations are performed with groups of matrices and
vectors.
All matrices a and c and vectors x have the same parameters (size, increments) and are stored at the
constant strides stridea, stridec, and stridex, respectively, from each other. The operation is defined as

for i = 0 … batch_size – 1
A and C are matrices at offset i * stridea in a and i * stridec in c
X is a vector at offset i * stridex in x
C = diag(X) * A or C = A * diag(X)
end for
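A plain-C reference for the strided diagonal product, hypothetical and limited to single precision, column-major layout, and positive incx. For left_right = CblasLeft, row i of A is scaled by x[i]; for CblasRight, column j is scaled by x[j]. The int flag right stands in for the CBLAS_SIDE argument, and long stands in for MKL_INT strides.

```c
/* Naive column-major model of C_b := diag(X_b)*A_b (right == 0)
   or C_b := A_b*diag(X_b) (right != 0) for b = 0 .. batch_size-1. */
void ref_sdgmm_batch_strided(int right, int m, int n,
                             const float *a, int lda, long stridea,
                             const float *x, int incx, long stridex,
                             float *c, int ldc, long stridec, int batch_size)
{
    for (int b = 0; b < batch_size; ++b) {
        const float *A = a + b * stridea;
        const float *X = x + b * stridex;
        float *C = c + b * stridec;
        for (int j = 0; j < n; ++j) {
            for (int i = 0; i < m; ++i) {
                /* Left: scale row i by X[i]; right: scale column j by X[j]. */
                float d = right ? X[j * incx] : X[i * incx];
                C[i + j * ldc] = d * A[i + j * lda];
            }
        }
    }
}
```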

Input Parameters

layout Specifies whether two-dimensional array storage is row-major
(CblasRowMajor) or column-major (CblasColMajor).

left_right Specifies the position of the diagonal matrix in the matrix product:
if left_right = CblasLeft, then C = diag(X) * A;
if left_right = CblasRight, then C = A * diag(X).

m Number of rows of the matrices A and C. The value of m must be at least 0.

n Number of columns of the matrices A and C. The value of n must be at least 0.

a Array holding all the input matrices A. Must be of size at least lda*k + stridea
* (batch_size - 1), where k is n if column major layout is used or m if row
major layout is used.

lda Specifies the leading dimension of the matrices A. It must be positive and at
least m if column major layout is used or at least n if row major layout is
used.

stridea Stride between two consecutive A matrices, must be at least 0.

x Array holding all the input vectors x. Must be of size at least (1 + (len
- 1)*abs(incx)) + stridex * (batch_size - 1), where len is n if the diagonal
matrix is on the right of the product or m otherwise.

incx Stride between two consecutive elements of the x vectors.

stridex Stride between two consecutive x vectors, must be at least 0.

c Array holding all the matrices C. Must be of size at least batch_size *
stridec.

ldc Specifies the leading dimension of the matrices C. It must be positive and at
least m if column major layout is used or at least n if row major layout is
used.

stridec Stride between two consecutive C matrices; must be at least ldc * n if
column major layout is used or ldc * m if row major layout is used.

batch_size Number of dgmm computations to perform, which is also the number of A and C matrices
and of x vectors. Must be at least 0.

Output Parameters

c Array holding the batch_size updated matrices c.

cblas_?dgmm_batch
Computes groups of matrix-matrix products of a diagonal
matrix and a general matrix.

Syntax
void cblas_sdgmm_batch (const CBLAS_LAYOUT layout, const CBLAS_SIDE *left_right_array,
const MKL_INT *m_array, const MKL_INT *n_array, const float **a_array, const MKL_INT
*lda_array, const float **x_array, const MKL_INT *incx_array, float **c_array, const
MKL_INT *ldc_array, const MKL_INT group_count, const MKL_INT *group_size);
void cblas_ddgmm_batch (const CBLAS_LAYOUT layout, const CBLAS_SIDE *left_right_array,
const MKL_INT *m_array, const MKL_INT *n_array, const double **a_array, const MKL_INT
*lda_array, const double **x_array, const MKL_INT *incx_array, double **c_array, const
MKL_INT *ldc_array, const MKL_INT group_count, const MKL_INT *group_size);
void cblas_cdgmm_batch (const CBLAS_LAYOUT layout, const CBLAS_SIDE *left_right_array,
const MKL_INT *m_array, const MKL_INT *n_array, const void **a_array, const MKL_INT
*lda_array, const void **x_array, const MKL_INT *incx_array, void **c_array, const
MKL_INT *ldc_array, const MKL_INT group_count, const MKL_INT *group_size);
void cblas_zdgmm_batch (const CBLAS_LAYOUT layout, const CBLAS_SIDE *left_right_array,
const MKL_INT *m_array, const MKL_INT *n_array, const void **a_array, const MKL_INT
*lda_array, const void **x_array, const MKL_INT *incx_array, void **c_array, const
MKL_INT *ldc_array, const MKL_INT group_count, const MKL_INT *group_size);

Include Files
• mkl.h

Description
The cblas_?dgmm_batch routines perform a series of diagonal matrix-matrix products. The diagonal matrices
are stored as dense vectors, and the operations are performed with groups of matrices and vectors.

Each group contains matrices and vectors with the same parameters (size, increments). The operation is
defined as:

idx = 0
For i = 0 … group_count – 1
left_right, m, n, lda, incx, ldc and group_size at position i in left_right_array, m_array,
n_array, lda_array, incx_array, ldc_array and group_size
for j = 0 … group_size – 1
a and c are matrices of size mxn at position idx in a_array and c_array
x is a vector of size m or n depending on left_right, at position idx in x_array
if (left_right == CblasLeft) c := diag(x) * a
else c := a * diag(x)
idx := idx + 1
end for
end for
The number of entries in a_array, x_array, and c_array is total_batch_count = the sum of all of the
group_size entries.
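The grouped variant can be modeled the same way, walking idx across groups exactly as in the pseudocode above. This hypothetical sketch is limited to single precision, column-major layout, and positive increments, and uses int flags and int sizes in place of CBLAS_SIDE and MKL_INT.

```c
/* Naive column-major model of the grouped dgmm batch: for each problem idx,
   c := diag(x)*a if the group's right flag is 0, else c := a*diag(x). */
void ref_sdgmm_batch(const int *right_array, const int *m_array, const int *n_array,
                     const float **a_array, const int *lda_array,
                     const float **x_array, const int *incx_array,
                     float **c_array, const int *ldc_array,
                     int group_count, const int *group_size)
{
    int idx = 0; /* flat position across all groups */
    for (int g = 0; g < group_count; ++g) {
        for (int s = 0; s < group_size[g]; ++s, ++idx) {
            const float *A = a_array[idx];
            const float *X = x_array[idx];
            float *C = c_array[idx];
            for (int j = 0; j < n_array[g]; ++j) {
                for (int i = 0; i < m_array[g]; ++i) {
                    /* Per-group parameters come from index g, matrices from idx. */
                    float d = right_array[g] ? X[j * incx_array[g]] : X[i * incx_array[g]];
                    C[i + j * ldc_array[g]] = d * A[i + j * lda_array[g]];
                }
            }
        }
    }
}
```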

Input Parameters

layout Specifies whether two-dimensional array storage is row-major
(CblasRowMajor) or column-major (CblasColMajor).

left_right_array Array of size group_count. For the group i, left_right_i = left_right_array[i]
specifies the position of the diagonal matrix in the matrix product:
if left_right_i = CblasLeft, then C = diag(X) * A;
if left_right_i = CblasRight, then C = A * diag(X).

m_array Array of size group_count. For the group i, m_i = m_array[i] is the number
of rows of the matrices A and C.

n_array Array of size group_count. For the group i, n_i = n_array[i] is the number of
columns in the matrices A and C.

a_array Array of size total_batch_count of pointers used to store A matrices. The
array allocated for the A matrices of the group i must be of size at least lda_i
* n_i if column major layout is used or at least lda_i * m_i if row major layout is
used.

x_array Array of size total_batch_count of pointers used to store x vectors. The
array allocated for the x vectors of the group i must be of size at least (1 +
(len_i - 1)*abs(incx_i)), where len_i is n_i if the diagonal matrix is on the right of
the product or m_i otherwise.

incx_array Array of size group_count. For the group i, incx_i = incx_array[i] is the stride
of vector x.

c_array Array of size total_batch_count of pointers used to store C matrices. The
array allocated for the C matrices of the group i must be of size at least ldc_i
* n_i if column major layout is used or at least ldc_i * m_i if row major layout
is used.

ldc_array Array of size group_count. For the group i, ldc_i = ldc_array[i] is the leading
dimension of the matrix C. It must be positive and at least m_i if column
major layout is used or at least n_i if row major layout is used.

group_count Number of groups. Must be at least 0.

group_size Array of size group_count. The element group_size[i] is the number of
operations in the group i. Each element in group_size must be at least 0.

Output Parameters

c_array Array of pointers holding the total_batch_count updated matrices C.

mkl_jit_create_?gemm
Create a GEMM kernel that computes a scalar-matrix-
matrix product and adds the result to a scalar-matrix
product.

Syntax
mkl_jit_status_t mkl_jit_create_sgemm(void** jitter, const MKL_LAYOUT layout, const
MKL_TRANSPOSE transa, const MKL_TRANSPOSE transb, const MKL_INT m, const MKL_INT n,
const MKL_INT k, const float alpha, const MKL_INT lda, const MKL_INT ldb, const float
beta, const MKL_INT ldc);
mkl_jit_status_t mkl_jit_create_dgemm(void** jitter, const MKL_LAYOUT layout, const
MKL_TRANSPOSE transa, const MKL_TRANSPOSE transb, const MKL_INT m, const MKL_INT n,
const MKL_INT k, const double alpha, const MKL_INT lda, const MKL_INT ldb, const double
beta, const MKL_INT ldc);
mkl_jit_status_t mkl_jit_create_cgemm(void** jitter, const MKL_LAYOUT layout, const
MKL_TRANSPOSE transa, const MKL_TRANSPOSE transb, const MKL_INT m, const MKL_INT n,
const MKL_INT k, const void* alpha, const MKL_INT lda, const MKL_INT ldb, const void*
beta, const MKL_INT ldc);
mkl_jit_status_t mkl_jit_create_zgemm(void** jitter, const MKL_LAYOUT layout, const
MKL_TRANSPOSE transa, const MKL_TRANSPOSE transb, const MKL_INT m, const MKL_INT n,
const MKL_INT k, const void* alpha, const MKL_INT lda, const MKL_INT ldb, const void*
beta, const MKL_INT ldc);

Include Files
• mkl.h

Description
The mkl_jit_create_?gemm functions belong to a set of related routines that enable the use of just-in-time
code generation.
The mkl_jit_create_?gemm functions create a handle to a just-in-time code generator (a jitter) and
generate a GEMM kernel that computes a scalar-matrix-matrix product and adds the result to a scalar-matrix
product, with general matrices. The operation of the generated GEMM kernel is defined as follows:

C := alpha*op(A)*op(B) + beta*C
Where:

• op(X) is either op(X) = X, op(X) = X^T, or op(X) = X^H
• alpha and beta are scalars
• A, B, and C are matrices
• op(A) is an m-by-k matrix
• op(B) is a k-by-n matrix
• C is an m-by-n matrix

NOTE
Generating a new kernel with mkl_jit_create_?gemm involves moderate runtime overhead.
To benefit from JIT code generation, use this feature when you need to call the generated
kernel many times (for example, several hundred calls).

Input Parameters

layout Specifies whether two-dimensional array storage is row-major (MKL_ROW_MAJOR) or column-major (MKL_COL_MAJOR).

transa Specifies the form of op(A) used in the generated matrix multiplication:
• if transa = MKL_NOTRANS, then op(A) = A
• if transa = MKL_TRANS, then op(A) = $A^T$
• if transa = MKL_CONJTRANS, then op(A) = $A^H$

transb Specifies the form of op(B) used in the generated matrix multiplication:
• if transb = MKL_NOTRANS, then op(B) = B
• if transb = MKL_TRANS, then op(B) = $B^T$
• if transb = MKL_CONJTRANS, then op(B) = $B^H$

m Specifies the number of rows of the matrix op(A) and of the matrix C. The
value of m must be at least zero.

n Specifies the number of columns of the matrix op(B) and of the matrix C.
The value of n must be at least zero.

k Specifies the number of columns of the matrix op(A) and the number of
rows of the matrix op(B). The value of k must be at least zero.

alpha Specifies the scalar alpha.

NOTE
alpha is passed by pointer for mkl_jit_create_cgemm and
mkl_jit_create_zgemm.

lda Specifies the leading dimension of a:
• layout = MKL_COL_MAJOR: lda must be at least max(1, m) if transa = MKL_NOTRANS, and at least max(1, k) if transa = MKL_TRANS or transa = MKL_CONJTRANS.
• layout = MKL_ROW_MAJOR: lda must be at least max(1, k) if transa = MKL_NOTRANS, and at least max(1, m) if transa = MKL_TRANS or transa = MKL_CONJTRANS.


ldb Specifies the leading dimension of b:
• layout = MKL_COL_MAJOR: ldb must be at least max(1, k) if transb = MKL_NOTRANS, and at least max(1, n) if transb = MKL_TRANS or transb = MKL_CONJTRANS.
• layout = MKL_ROW_MAJOR: ldb must be at least max(1, n) if transb = MKL_NOTRANS, and at least max(1, k) if transb = MKL_TRANS or transb = MKL_CONJTRANS.

beta Specifies the scalar beta.

NOTE
beta is passed by pointer for mkl_jit_create_cgemm and
mkl_jit_create_zgemm.

ldc Specifies the leading dimension of c.

layout=MKL_ROW_MAJOR ldc must be at least max(1,n)
layout=MKL_COL_MAJOR ldc must be at least max(1,m)
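The lda, ldb, and ldc requirements listed above all follow one rule: in column-major storage the leading dimension is bounded by the number of rows of the matrix as it is actually stored, and in row-major storage by its number of columns. The sketch below encodes that rule in plain C. The helper names min_lda, min_ldb, and min_ldc are hypothetical (they are not oneMKL routines), and the two flags stand in for the MKL_LAYOUT and MKL_TRANSPOSE values so the code builds without oneMKL headers.

```c
#include <stdbool.h>

/* Hypothetical helpers, not part of oneMKL.
 * row_major: true for MKL_ROW_MAJOR, false for MKL_COL_MAJOR.
 * trans:     true for MKL_TRANS or MKL_CONJTRANS, false for MKL_NOTRANS. */

static long max1(long x) { return x > 1 ? x : 1; }

/* op(A) is m-by-k, so the stored A is k-by-m when it is transposed. */
long min_lda(bool row_major, bool trans, long m, long k) {
    long rows = trans ? k : m;   /* rows of A as stored */
    long cols = trans ? m : k;   /* columns of A as stored */
    return max1(row_major ? cols : rows);
}

/* op(B) is k-by-n, so the stored B is n-by-k when it is transposed. */
long min_ldb(bool row_major, bool trans, long k, long n) {
    long rows = trans ? n : k;
    long cols = trans ? k : n;
    return max1(row_major ? cols : rows);
}

/* C is always m-by-n. */
long min_ldc(bool row_major, long m, long n) {
    return max1(row_major ? n : m);
}
```

For example, a column-major, non-transposed 4-by-3 A needs lda of at least 4, while the same A in row-major storage needs only 3.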

Output Parameters

jitter Pointer to a handle to the newly created code generator.

Return Values

status Returns one of the following:

• MKL_JIT_ERROR if the handle cannot be created (no memory)

—or—
• MKL_JIT_SUCCESS if the jitter has been created and the GEMM kernel was
successfully created
—or—
• MKL_NO_JIT if the jitter has been created, but a JIT GEMM kernel was not
created because JIT is not beneficial for the given input parameters. The
function pointer returned by mkl_jit_get_?gemm_ptr will call standard
(non-JIT) GEMM.

mkl_jit_get_?gemm_ptr
Return the GEMM kernel associated with a jitter
previously created with mkl_jit_create_?gemm.

Syntax
sgemm_jit_kernel_t mkl_jit_get_sgemm_ptr(const void* jitter);
dgemm_jit_kernel_t mkl_jit_get_dgemm_ptr(const void* jitter);
cgemm_jit_kernel_t mkl_jit_get_cgemm_ptr(const void* jitter);

zgemm_jit_kernel_t mkl_jit_get_zgemm_ptr(const void* jitter);

Include Files
• mkl.h

Description
The mkl_jit_get_?gemm_ptr functions belong to a set of related routines that enable use of just-in-time
code generation.
The mkl_jit_get_?gemm_ptr functions take as input a jitter previously created with
mkl_jit_create_?gemm, and return the GEMM kernel associated with that jitter. The returned GEMM kernel
computes a scalar-matrix-matrix product and adds the result to a scalar-matrix product, with general
matrices. The operation is defined as follows:

C := alpha*op(A)*op(B) + beta*C
Where:

• op(X) is one of op(X) = X, op(X) = $X^T$, or op(X) = $X^H$

• alpha and beta are scalars
• A, B, and C are matrices
• op(A) is an m-by-k matrix
• op(B) is a k-by-n matrix
• C is an m-by-n matrix

Input Parameter

jitter Handle to the code generator.

Return Values

func Returns a function pointer of one of the following types:
• sgemm_jit_kernel_t: a function pointer type expecting four inputs of type void*, float*, float*, and float*
typedef void (*sgemm_jit_kernel_t)(void*, float*, float*, float*);
• dgemm_jit_kernel_t: a function pointer type expecting four inputs of type void*, double*, double*, and double*
typedef void (*dgemm_jit_kernel_t)(void*, double*, double*, double*);
• cgemm_jit_kernel_t: a function pointer type expecting four inputs of type void*, MKL_Complex8*, MKL_Complex8*, and MKL_Complex8*
typedef void (*cgemm_jit_kernel_t)(void*, MKL_Complex8*, MKL_Complex8*, MKL_Complex8*);
• zgemm_jit_kernel_t: a function pointer type expecting four inputs of type void*, MKL_Complex16*, MKL_Complex16*, and MKL_Complex16*
typedef void (*zgemm_jit_kernel_t)(void*, MKL_Complex16*, MKL_Complex16*, MKL_Complex16*);
If the jitter input is not NULL, returns a function pointer to a GEMM kernel. The GEMM kernel is called with four parameters: the jitter and the three matrices a, b, and c. Otherwise, returns NULL.

If layout, transa, transb, m, n, k, lda, ldb, and ldc are the parameters used during the creation of the input jitter, then the arrays a, b, and c passed to the returned kernel must be as follows:

a
• transa = MKL_NOTRANS, layout = MKL_COL_MAJOR: array of size lda*k. Before calling the returned function pointer, the leading m-by-k part of the array a must contain the matrix A.
• transa = MKL_NOTRANS, layout = MKL_ROW_MAJOR: array of size lda*m. Before calling the returned function pointer, the leading k-by-m part of the array a must contain the matrix A.
• transa = MKL_TRANS or transa = MKL_CONJTRANS, layout = MKL_COL_MAJOR: array of size lda*m. Before calling the returned function pointer, the leading k-by-m part of the array a must contain the matrix A.
• transa = MKL_TRANS or transa = MKL_CONJTRANS, layout = MKL_ROW_MAJOR: array of size lda*k. Before calling the returned function pointer, the leading m-by-k part of the array a must contain the matrix A.

b
• transb = MKL_NOTRANS, layout = MKL_COL_MAJOR: array of size ldb*n. Before calling the returned function pointer, the leading k-by-n part of the array b must contain the matrix B.
• transb = MKL_NOTRANS, layout = MKL_ROW_MAJOR: array of size ldb*k. Before calling the returned function pointer, the leading n-by-k part of the array b must contain the matrix B.
• transb = MKL_TRANS or transb = MKL_CONJTRANS, layout = MKL_COL_MAJOR: array of size ldb*k. Before calling the returned function pointer, the leading n-by-k part of the array b must contain the matrix B.
• transb = MKL_TRANS or transb = MKL_CONJTRANS, layout = MKL_ROW_MAJOR: array of size ldb*n. Before calling the returned function pointer, the leading k-by-n part of the array b must contain the matrix B.

c
• layout = MKL_COL_MAJOR: array of size ldc*n. Before calling the returned function pointer, the leading m-by-n part of the array c must contain the matrix C.
• layout = MKL_ROW_MAJOR: array of size ldc*m. Before calling the returned function pointer, the leading n-by-m part of the array c must contain the matrix C.

mkl_jit_destroy
Delete the jitter previously created with
mkl_jit_create_?gemm as well as the GEMM kernel
that it contains.

Syntax
mkl_jit_status_t mkl_jit_destroy (void* jitter);

Include Files
• mkl.h

Description
The mkl_jit_destroy function belongs to a set of related routines that enable use of just-in-time code
generation.
The mkl_jit_destroy function takes as input a jitter previously created with mkl_jit_create_?gemm and
deletes the jitter as well as the GEMM kernel that it contains.

Input Parameter

jitter Jitter handle

Return Values

status Returns one of the following:

• MKL_JIT_ERROR if the pointer is not NULL and is not a handle on a jitter, that is, if it was not created with mkl_jit_create_?gemm

—or—
• MKL_JIT_SUCCESS if the jitter has been successfully destroyed


LAPACK Routines
Intel® oneAPI Math Kernel Library (oneMKL) implements routines from the LAPACK package that are used for
solving systems of linear equations, linear least squares problems, eigenvalue and singular value problems,
and performing a number of related computational tasks. The library includes LAPACK routines for both real
and complex data. Routines are supported for systems of equations with the following types of matrices:
• General
• Banded
• Symmetric or Hermitian positive-definite (full, packed, and rectangular full packed (RFP) storage)
• Symmetric or Hermitian positive-definite banded
• Symmetric or Hermitian indefinite (both full and packed storage)
• Symmetric or Hermitian indefinite banded
• Triangular (full, packed, and RFP storage)
• Triangular banded
• Tridiagonal
• Diagonally dominant tridiagonal.

NOTE
Different arrays used as parameters to Intel® MKL LAPACK routines must not overlap.

Warning
LAPACK routines assume that input matrices do not contain IEEE 754 special values such as INF or
NaN values. Using these special values may cause LAPACK to return unexpected results or become
unstable.

C Interface Conventions for LAPACK Routines

The C interfaces are implemented for most of the Intel® oneAPI Math Kernel Library (oneMKL) LAPACK driver
and computational routines.

NaN Checking in LAPACKE

NaN checking can affect the performance of an application. By default, it is ON.
See the Support Functions section for details on the methods and options to turn NaN checking off or back on
in LAPACKE.

Function Prototypes
Intel® oneAPI Math Kernel Library (oneMKL) supports four distinct floating-point precisions. Each
corresponding prototype looks similar, usually differing only in the data type. C interface LAPACK function
names follow the form <?><name>[_64], where <?> is:

• LAPACKE_s for float

• LAPACKE_d for double
• LAPACKE_c for lapack_complex_float
• LAPACKE_z for lapack_complex_double

On 64-bit platforms, Intel® oneAPI Math Kernel Library (oneMKL) provides LAPACK C interfaces with the _64
suffix to support large data arrays in the LP64 interface library. For more interface library details, see "Using
the ILP64 Interface vs. LP64 Interface" in the developer guide.
A specific example follows. To solve a system of linear equations with a packed Cholesky-factored Hermitian
positive-definite matrix with complex precision, use the following:

lapack_int LAPACKE_cpptrs(int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const lapack_complex_float* ap, lapack_complex_float* b, lapack_int ldb);
For matrices whose dimensions are greater than $2^{31} - 1$, you can use either LAPACKE_cpptrs in the ILP64
interface library or LAPACKE_cpptrs_64 in the LP64 interface library.

Workspace Arrays
In contrast to the Fortran interface, the LAPACK C interface omits workspace parameters because workspace
is allocated during runtime and released upon completion of the function operation.
If you prefer to allocate workspace arrays yourself, the LAPACK C interface provides alternate interfaces with
work parameters. The name of the alternate interface is the same as the LAPACK C interface with _work
appended. For example, the syntax for the singular value decomposition of a real bidiagonal matrix is:

Fortran:
call sbdsdc ( uplo, compq, n, d, e, u, ldu, vt, ldvt, q, iq, work, iwork, info )

C LAPACK interface:
lapack_int LAPACKE_sbdsdc ( int matrix_layout, char uplo, char compq, lapack_int n, float* d, float* e,
    float* u, lapack_int ldu, float* vt, lapack_int ldvt, float* q, lapack_int* iq );

Alternate C LAPACK interface with work parameters:
lapack_int LAPACKE_sbdsdc_work( int matrix_layout, char uplo, char compq, lapack_int n, float* d, float* e,
    float* u, lapack_int ldu, float* vt, lapack_int ldvt, float* q, lapack_int* iq, float* work,
    lapack_int* iwork );

See the install_dir/include/mkl_lapacke.h file for the full list of alternative C LAPACK interfaces.

The Intel® oneAPI Math Kernel Library (oneMKL) Fortran-specific documentation contains details about
workspace arrays.

Mapping Fortran Data Types against C Data Types

Fortran Data Types vs. C Data Types
FORTRAN C

INTEGER lapack_int

LOGICAL lapack_logical

REAL float

DOUBLE PRECISION double

COMPLEX lapack_complex_float

COMPLEX*16/DOUBLE COMPLEX lapack_complex_double

CHARACTER char

C Type Definitions
You can find type definitions specific to Intel® oneAPI Math Kernel Library (oneMKL), such as MKL_INT,
MKL_Complex8, and MKL_Complex16, in install_dir/mkl_types.h.

C types:

#ifndef lapack_int
#define lapack_int MKL_INT
#endif

#ifndef lapack_logical
#define lapack_logical lapack_int
#endif

Complex type definitions:

Complex type for single precision:
#ifndef lapack_complex_float
#define lapack_complex_float MKL_Complex8
#endif

Complex type for double precision:
#ifndef lapack_complex_double
#define lapack_complex_double MKL_Complex16
#endif

Matrix layout definitions:

#define LAPACK_ROW_MAJOR 101
#define LAPACK_COL_MAJOR 102

See Matrix Layout for LAPACK Routines below for an explanation of row-major order and column-major order storage.

Error code definitions:

#define LAPACK_WORK_MEMORY_ERROR -1010 /* Failed to allocate memory for a working array */
#define LAPACK_TRANSPOSE_MEMORY_ERROR -1011 /* Failed to allocate memory for transposed matrix */

Matrix Layout for LAPACK Routines

There are two general methods of storing a two dimensional matrix in linear (one dimensional) memory:
column-wise (column major order) or row-wise (row major order). Consider an M-by-N matrix A:

Column Major Layout
In column major layout, the first index, i, of matrix elements $a_{i,j}$ changes faster than the second index when
accessing sequential memory locations. In other words, for $1 \le i < M$, if the element $a_{i,j}$ is stored in a specific
location in memory, the element $a_{i+1,j}$ is stored in the next location, and, for $1 \le j < N$, the element $a_{M,j}$ is
stored in the location previous to element $a_{1,j+1}$. So the matrix elements are located in memory according to
this sequence:
$$\{a_{1,1}, a_{2,1}, \ldots, a_{M,1}, a_{1,2}, a_{2,2}, \ldots, a_{M,2}, \ldots, a_{1,N}, a_{2,N}, \ldots, a_{M,N}\}$$

Row Major Layout

In row major layout, the second index, j, of matrix elements $a_{i,j}$ changes faster than the first index when
accessing sequential memory locations. In other words, for $1 \le j < N$, if the element $a_{i,j}$ is stored in a specific
location in memory, the element $a_{i,j+1}$ is stored in the next location, and, for $1 \le i < M$, the element $a_{i,N}$ is
stored in the location previous to element $a_{i+1,1}$. So the matrix elements are located in memory according to
this sequence:
$$\{a_{1,1}, a_{1,2}, \ldots, a_{1,N}, a_{2,1}, a_{2,2}, \ldots, a_{2,N}, \ldots, a_{M,1}, a_{M,2}, \ldots, a_{M,N}\}$$
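The two orderings can be seen concretely by flattening the same small matrix both ways. This sketch uses a fixed 2-by-3 matrix (the sizes ROWS and COLS and the function names are illustrative assumptions) and 0-based C indices, whereas the text uses 1-based indices.

```c
#include <stddef.h>

/* Illustrative fixed size for a small M-by-N matrix. */
enum { ROWS = 2, COLS = 3 };

/* Column major: the row index changes fastest in memory. */
void to_col_major(const double A[ROWS][COLS], double out[ROWS * COLS]) {
    for (size_t j = 0; j < COLS; ++j)
        for (size_t i = 0; i < ROWS; ++i)
            out[i + j * ROWS] = A[i][j];
}

/* Row major: the column index changes fastest in memory. */
void to_row_major(const double A[ROWS][COLS], double out[ROWS * COLS]) {
    for (size_t i = 0; i < ROWS; ++i)
        for (size_t j = 0; j < COLS; ++j)
            out[i * COLS + j] = A[i][j];
}
```

If each element holds the value 10i + j (1-based), so that 23 stands for $a_{2,3}$, column major produces {11, 21, 12, 22, 13, 23} and row major produces {11, 12, 13, 21, 22, 23}.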

Leading Dimension Parameter

A leading dimension parameter allows use of LAPACK routines on a submatrix of a larger matrix. For
example, the submatrix B can be extracted from the original matrix A defined previously:

B is formed from rows with indices i0 + 1 to i0 + K and columns j0 + 1 to j0 + L of matrix A. To specify matrix
B, LAPACK routines require four parameters:
• the number of rows K;
• the number of columns L;
• a pointer to the start of the array containing elements of B;
• the leading dimension of the array containing elements of B.
The leading dimension depends on the layout of the matrix:
• Column major layout
Leading dimension ldb = M, the number of rows of matrix A.
Starting address: offset by i0 + j0*ldb from $a_{1,1}$.
• Row major layout
Leading dimension ldb = N, the number of columns of matrix A.
Starting address: offset by i0*ldb + j0 from $a_{1,1}$.
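The offsets above are all a routine needs to work on a submatrix in place. The sketch below builds the base pointer of B from i0, j0, and the leading dimension, then reads B only through that pointer and ldb, as a LAPACK routine would. Both function names are hypothetical, and indices are 0-based (i0 and j0 are the number of rows and columns skipped).

```c
#include <stddef.h>

/* Base pointer of the submatrix B that starts i0 rows and j0 columns
 * into A, using the offsets given in the text. row_major selects the
 * layout; lda is M (column major) or N (row major). */
const double *submatrix_ptr(const double *a, size_t i0, size_t j0,
                            size_t lda, int row_major) {
    return row_major ? a + i0 * lda + j0 : a + i0 + j0 * lda;
}

/* Sum of the K-by-L submatrix, accessed only through its base pointer
 * b and leading dimension ldb (equal to the parent's lda). */
double submatrix_sum(const double *b, size_t K, size_t L, size_t ldb,
                     int row_major) {
    double s = 0.0;
    for (size_t i = 0; i < K; ++i)
        for (size_t j = 0; j < L; ++j)
            s += row_major ? b[i * ldb + j] : b[i + j * ldb];
    return s;
}
```

For a 3-by-4 matrix A with elements 10i + j (0-based), the 2-by-2 submatrix with i0 = 1 and j0 = 2 contains 12, 13, 22, and 23 in either layout, so both sums are 70.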

Matrix Storage Schemes for LAPACK Routines

LAPACK routines use the following matrix storage schemes:
• Full Storage
• Packed Storage
• Band Storage
• Rectangular Full Packed (RFP) Storage

Full Storage
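Consider an m-by-n matrix A:
$$A = \begin{pmatrix}
a_{1,1} & a_{1,2} & a_{1,3} & \cdots & a_{1,n} \\
a_{2,1} & a_{2,2} & a_{2,3} & \cdots & a_{2,n} \\
a_{3,1} & a_{3,2} & a_{3,3} & \cdots & a_{3,n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_{m,1} & a_{m,2} & a_{m,3} & \cdots & a_{m,n}
\end{pmatrix}$$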

It is stored in a one-dimensional array a of length at least lda*n for column major layout or m*lda for row
major layout. Element $a_{i,j}$ is stored as array element a[k], where the mapping k(i, j) is defined as follows:

• column major layout: k(i, j) = i - 1 + (j - 1)*lda

• row major layout: k(i, j) = (i - 1)*lda + j - 1

NOTE
Although LAPACK accepts parameter values of zero for matrix size, in general the size of the array
used to store an m-by-n matrix A with leading dimension lda should be greater than or equal to
max(1, n*lda) for column major layout and max (1, m*lda) for row major layout.

NOTE
Even though the array used to store a matrix is one-dimensional, for simplicity the documentation
sometimes refers to parts of the array such as rows, columns, upper and lower triangular parts, and
diagonals. These refer to the parts of the matrix stored within the array. For example, the lower
triangle of array a is defined as the subset of elements a[k(i,j)] with i ≥ j.

Packed Storage
The packed storage format compactly stores matrix elements when only one part of the matrix, the upper or
lower triangle, is necessary to determine all of the elements of the matrix. This is the case when the matrix
is upper triangular, lower triangular, symmetric, or Hermitian. For an n-by-n matrix of one of these types, a
linear array ap of length n*(n + 1)/2 is adequate. Two parameters define the storage scheme:
matrix_layout, which specifies column major (with the value LAPACK_COL_MAJOR) or row major (with the
value LAPACK_ROW_MAJOR) matrix layout, and uplo, which specifies that the upper triangle (with the value
'U') or the lower triangle (with the value 'L') is stored.
Element $a_{i,j}$ is stored as array element ap[k], where the mapping k(i, j) is defined as follows:
• uplo = 'U', 1 ≤ i ≤ j ≤ n:
matrix_layout = LAPACK_COL_MAJOR: k(i, j) = i - 1 + j(j - 1)/2
matrix_layout = LAPACK_ROW_MAJOR: k(i, j) = j - 1 + (i - 1)(2n - i)/2
• uplo = 'L', 1 ≤ j ≤ i ≤ n:
matrix_layout = LAPACK_COL_MAJOR: k(i, j) = i - 1 + (j - 1)(2n - j)/2
matrix_layout = LAPACK_ROW_MAJOR: k(i, j) = j - 1 + i(i - 1)/2

NOTE
Although LAPACK accepts parameter values of zero for matrix size, in general the size of the array
should be greater than or equal to max(1, n*(n + 1)/2).
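The four packed-storage mappings transcribe directly into C. In this sketch (the function name packed_index is hypothetical) i and j are 1-based as in the text, the result is a 0-based position in ap, and the caller must pass an element of the stored triangle. The products (i - 1)(2n - i) and (j - 1)(2n - j) are always even, so the integer division is exact.

```c
#include <stddef.h>

/* 0-based position in ap of a(i,j), for 1-based i and j.
 * row_major: nonzero for LAPACK_ROW_MAJOR, zero for LAPACK_COL_MAJOR.
 * uplo: 'U' (requires i <= j) or 'L' (requires i >= j). */
size_t packed_index(int row_major, char uplo, size_t n, size_t i, size_t j) {
    if (!row_major)
        return (uplo == 'U') ? (i - 1) + j * (j - 1) / 2
                             : (i - 1) + (j - 1) * (2 * n - j) / 2;
    return (uplo == 'U') ? (j - 1) + (i - 1) * (2 * n - i) / 2
                         : (j - 1) + i * (i - 1) / 2;
}
```

Walking the stored triangle column by column (column major) or row by row (row major) visits positions 0, 1, 2, and so on up to n(n + 1)/2 - 1 with no gaps, which is exactly what makes the packed array compact.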

Band Storage
When the non-zero elements of a matrix are confined to diagonal bands, it is possible to store the elements
more efficiently using band storage. For example, consider an m-by-n band matrix A with kl subdiagonals
and ku superdiagonals:
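$$A = \begin{pmatrix}
a_{1,1} & a_{1,2} & \cdots & a_{1,k_u+1} & & & \\
\vdots & \vdots & \ddots & \ddots & \ddots & & \\
a_{k_l+1,1} & a_{k_l+1,2} & \cdots & a_{k_l+1,k_u+1} & \ddots & a_{k_l+1,k_l+k_u+1} & \\
 & a_{k_l+2,2} & \ddots & \ddots & \ddots & \ddots & \ddots \\
 & & \ddots & \ddots & \ddots & \ddots & \ddots \\
 & & & a_{k_l+j,j} & a_{k_l+j,j+1} & \cdots & a_{k_l+j,k_l+k_u+j} \\
 & & & & \ddots & \ddots & \ddots
\end{pmatrix}$$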
This matrix can be stored compactly in a one dimensional array ab. There are two operations involved in
storing the matrix: packing the band matrix into matrix AB, and converting the packed matrix to a one-
dimensional array.
• Packing the Band Matrix: How the band matrix is packed depends on the matrix layout.
• Column major layout: matrix A is packed in an ldab-by-n matrix AB column-wise so that the diagonals
of A become rows of array AB.


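$$AB = \begin{pmatrix}
 & & & & a_{1,k_u+1} & a_{2,k_u+2} & a_{3,k_u+3} & \cdots \\
 & & & & \vdots & \vdots & \vdots & \cdots \\
 & & a_{1,3} & \cdots & a_{k_u-1,k_u+1} & a_{k_u,k_u+2} & a_{k_u+1,k_u+3} & \cdots \\
 & a_{1,2} & a_{2,3} & \cdots & a_{k_u,k_u+1} & a_{k_u+1,k_u+2} & a_{k_u+2,k_u+3} & \cdots \\
a_{1,1} & a_{2,2} & a_{3,3} & \cdots & a_{k_u+1,k_u+1} & a_{k_u+2,k_u+2} & a_{k_u+3,k_u+3} & \cdots \\
a_{2,1} & a_{3,2} & a_{4,3} & \cdots & a_{k_u+2,k_u+1} & a_{k_u+3,k_u+2} & a_{k_u+4,k_u+3} & \cdots \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \cdots \\
a_{k_l+1,1} & a_{k_l+2,2} & a_{k_l+3,3} & \cdots & a_{k_l+k_u+1,k_u+1} & a_{k_l+k_u+2,k_u+2} & a_{k_l+k_u+3,k_u+3} & \cdots
\end{pmatrix}$$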

The number of rows of AB is ldab ≥ kl + ku + 1, and the number of columns of AB is n.

• Row major layout: matrix A is packed in an m-by-ldab matrix AB row-wise so that the diagonals of A
become columns of AB.
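$$AB = \begin{pmatrix}
 & & & & a_{1,1} & a_{1,2} & \cdots & a_{1,k_u+1} \\
 & & & a_{2,1} & a_{2,2} & a_{2,3} & \cdots & a_{2,k_u+2} \\
 & & a_{3,1} & a_{3,2} & a_{3,3} & a_{3,4} & \cdots & a_{3,k_u+3} \\
 & & & \vdots & \vdots & \vdots & & \vdots \\
a_{k_l+1,1} & a_{k_l+1,2} & \cdots & \cdots & a_{k_l+1,k_l+1} & a_{k_l+1,k_l+2} & \cdots & a_{k_l+1,k_l+k_u+1} \\
a_{k_l+2,2} & a_{k_l+2,3} & \cdots & \cdots & a_{k_l+2,k_l+2} & a_{k_l+2,k_l+3} & \cdots & a_{k_l+2,k_l+k_u+2} \\
a_{k_l+3,3} & a_{k_l+3,4} & \cdots & \cdots & a_{k_l+3,k_l+3} & a_{k_l+3,k_l+4} & \cdots & a_{k_l+3,k_l+k_u+3} \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots
\end{pmatrix}$$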
The number of columns of AB is ldab ≥ kl + ku + 1, and the number of rows of AB is m.

NOTE
For both column major and row major layout, elements of the upper left triangle of AB are not used.
Depending on the relationship of the dimensions m, n, kl, and ku, the lower right triangle might not be
used.

• Converting the Packed Matrix to a One-Dimensional Array: The packed matrix AB is stored in a linear
array ab as described in Full Storage. The size of ab should be greater than or equal to the total number
of elements of matrix AB: ldab*n for column major layout or ldab*m for row major layout. The leading
dimension of ab, ldab, must be greater than or equal to kl + ku + 1 (and some routines require it to be
even larger).
Element $a_{i,j}$ is stored as array element ab[k(i, j)], where the mapping k(i, j) is defined as follows:

• column major layout: k(i, j) = i + ku - j + (j - 1)*ldab; 1 ≤j≤n, max(1, j - ku) ≤i≤ min(m, j + kl)
• row major layout: k(i,j) = j-i+kl+(i-1)(kl+ku+1), 1 ≤ i ≤ m, max(1, i - kl) ≤ j ≤ min(n, i + ku)

NOTE
Although LAPACK accepts parameter values of zero for matrix size, in general the size of the array
should be greater than or equal to max(1, n*ldab) for column major layout and max (1, m*ldab) for
row major layout.
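The column major band mapping k(i, j) = i + ku - j + (j - 1)*ldab can be used both to locate a single element and to pack a dense matrix into ab. This sketch covers only the column major case. The names band_index_col_major and pack_band_col_major are hypothetical, i and j are 1-based, and positions outside the band are reported with the sentinel (size_t)-1.

```c
#include <stddef.h>

/* 0-based position of a(i,j) in ab for column major band storage,
 * or (size_t)-1 if a(i,j) lies outside the band (and is not stored). */
size_t band_index_col_major(size_t m, size_t n, size_t kl, size_t ku,
                            size_t ldab, size_t i, size_t j) {
    if (i < 1 || i > m || j < 1 || j > n) return (size_t)-1;
    if (i + ku < j || i > j + kl) return (size_t)-1;   /* outside the band */
    return (i + ku - j) + (j - 1) * ldab;
}

/* Copy the band of a dense column-major m-by-n matrix a (leading
 * dimension lda) into ab. Unused entries of ab are left untouched. */
void pack_band_col_major(const double *a, size_t lda, size_t m, size_t n,
                         size_t kl, size_t ku, double *ab, size_t ldab) {
    for (size_t j = 1; j <= n; ++j)
        for (size_t i = 1; i <= m; ++i) {
            size_t k = band_index_col_major(m, n, kl, ku, ldab, i, j);
            if (k != (size_t)-1)
                ab[k] = a[(i - 1) + (j - 1) * lda];
        }
}
```

For a 4-by-4 matrix with kl = 1 and ku = 2, the diagonal element $a_{1,1}$ lands in row ku of column 1 of AB (position 2), and the first two positions of column 1 belong to the unused upper-left triangle.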

Rectangular Full Packed Storage
A combination of full and packed storage, rectangular full packed storage can be used to store the upper or
lower triangle of a matrix which is upper triangular, lower triangular, symmetric, or Hermitian. It offers the
storage savings of packed storage plus the efficiency of using full storage Level 3 BLAS and LAPACK routines.
Three parameters define the storage scheme: matrix_layout, which specifies column major (with the value
LAPACK_COL_MAJOR) or row major (with the value LAPACK_ROW_MAJOR) matrix layout; uplo, which specifies
that the upper triangle (with the value 'U') or the lower triangle (with the value 'L') is stored; and transr,
which specifies normal (with the value 'N'), transpose (with the value 'T'), or conjugate transpose (with the
value 'C') operation on the matrix.
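Consider an N-by-N matrix A:
$$A = \begin{pmatrix}
a_{0,0} & a_{0,1} & a_{0,2} & \cdots & a_{0,N-1} \\
a_{1,0} & a_{1,1} & a_{1,2} & \cdots & a_{1,N-1} \\
a_{2,0} & a_{2,1} & a_{2,2} & \cdots & a_{2,N-1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_{N-1,0} & a_{N-1,1} & a_{N-1,2} & \cdots & a_{N-1,N-1}
\end{pmatrix}$$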

The upper or lower triangle of A can be stored in the array ap of length N*(N + 1)/2.

Additionally, define k as the integer part of N/2, such that N=2*k if N is even, and N=2*k + 1 if N is odd.
Storing the matrix involves packing the matrix into a rectangular matrix, and then storing the matrix in a
one-dimensional array. The size of rectangular matrix AP required for the N-by-N matrix A is N + 1 by N/2 for
even N, and N by (N + 1)/2 for odd N.
These examples illustrate the rectangular full packed storage method.
• Upper triangular - uplo = 'U'

Consider a matrix A with N = 6:

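$$A = \begin{pmatrix}
a_{0,0} & a_{0,1} & a_{0,2} & a_{0,3} & a_{0,4} & a_{0,5} \\
a_{1,0} & a_{1,1} & a_{1,2} & a_{1,3} & a_{1,4} & a_{1,5} \\
a_{2,0} & a_{2,1} & a_{2,2} & a_{2,3} & a_{2,4} & a_{2,5} \\
a_{3,0} & a_{3,1} & a_{3,2} & a_{3,3} & a_{3,4} & a_{3,5} \\
a_{4,0} & a_{4,1} & a_{4,2} & a_{4,3} & a_{4,4} & a_{4,5} \\
a_{5,0} & a_{5,1} & a_{5,2} & a_{5,3} & a_{5,4} & a_{5,5}
\end{pmatrix}$$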

• Not transposed - transr = 'N'

The elements of the upper triangle of A can be packed in a matrix with the dimensions (N + 1)-by-
(N/2) = 7 by 3:
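$$AP = \begin{pmatrix}
a_{0,3} & a_{0,4} & a_{0,5} \\
a_{1,3} & a_{1,4} & a_{1,5} \\
a_{2,3} & a_{2,4} & a_{2,5} \\
a_{3,3} & a_{3,4} & a_{3,5} \\
a_{0,0} & a_{4,4} & a_{4,5} \\
a_{0,1} & a_{1,1} & a_{5,5} \\
a_{0,2} & a_{1,2} & a_{2,2}
\end{pmatrix}$$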
• Transposed or conjugate transposed - transr = 'T' or transr = 'C'

The elements of the upper triangle of A can be packed in a matrix with the dimensions (N/2) by (N +
1) = 3 by 7:


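$$AP = \begin{pmatrix}
a_{0,3} & a_{1,3} & a_{2,3} & a_{3,3} & a_{0,0} & a_{0,1} & a_{0,2} \\
a_{0,4} & a_{1,4} & a_{2,4} & a_{3,4} & a_{4,4} & a_{1,1} & a_{1,2} \\
a_{0,5} & a_{1,5} & a_{2,5} & a_{3,5} & a_{4,5} & a_{5,5} & a_{2,2}
\end{pmatrix}$$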

Consider a matrix A with N = 5:

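$$A = \begin{pmatrix}
a_{0,0} & a_{0,1} & a_{0,2} & a_{0,3} & a_{0,4} \\
a_{1,0} & a_{1,1} & a_{1,2} & a_{1,3} & a_{1,4} \\
a_{2,0} & a_{2,1} & a_{2,2} & a_{2,3} & a_{2,4} \\
a_{3,0} & a_{3,1} & a_{3,2} & a_{3,3} & a_{3,4} \\
a_{4,0} & a_{4,1} & a_{4,2} & a_{4,3} & a_{4,4}
\end{pmatrix}$$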

• Not transposed - transr = 'N'

The elements of the upper triangle of A can be packed in a matrix with the dimensions (N)-by-((N
+1)/2) = 5 by 3:
$$AP = \begin{pmatrix}
a_{0,2} & a_{0,3} & a_{0,4} \\
a_{1,2} & a_{1,3} & a_{1,4} \\
a_{2,2} & a_{2,3} & a_{2,4} \\
a_{0,0} & a_{3,3} & a_{3,4} \\
a_{0,1} & a_{1,1} & a_{4,4}
\end{pmatrix}$$
• Transposed or conjugate transposed - transr = 'T' or transr = 'C'

The elements of the upper triangle of A can be packed in a matrix with the dimensions ((N + 1)/2)-by-N
= 3 by 5:
$$AP = \begin{pmatrix}
a_{0,2} & a_{1,2} & a_{2,2} & a_{0,0} & a_{0,1} \\
a_{0,3} & a_{1,3} & a_{2,3} & a_{3,3} & a_{1,1} \\
a_{0,4} & a_{1,4} & a_{2,4} & a_{3,4} & a_{4,4}
\end{pmatrix}$$
• Lower triangular - uplo = 'L'

Consider a matrix A with N = 6:

• Not transposed - transr = 'N'

The elements of the lower triangle of A can be packed in a matrix with the dimensions (N + 1)-by-
(N/2) = 7 by 3:

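$$AP = \begin{pmatrix}
a_{3,3} & a_{4,3} & a_{5,3} \\
a_{0,0} & a_{4,4} & a_{5,4} \\
a_{1,0} & a_{1,1} & a_{5,5} \\
a_{2,0} & a_{2,1} & a_{2,2} \\
a_{3,0} & a_{3,1} & a_{3,2} \\
a_{4,0} & a_{4,1} & a_{4,2} \\
a_{5,0} & a_{5,1} & a_{5,2}
\end{pmatrix}$$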
• Transposed or conjugate transposed - transr = 'T' or transr = 'C'

The elements of the lower triangle of A can be packed in a matrix with the dimensions (N/2) by (N +
1) = 3 by 7:
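$$AP = \begin{pmatrix}
a_{3,3} & a_{0,0} & a_{1,0} & a_{2,0} & a_{3,0} & a_{4,0} & a_{5,0} \\
a_{4,3} & a_{4,4} & a_{1,1} & a_{2,1} & a_{3,1} & a_{4,1} & a_{5,1} \\
a_{5,3} & a_{5,4} & a_{5,5} & a_{2,2} & a_{3,2} & a_{4,2} & a_{5,2}
\end{pmatrix}$$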

Consider a matrix A with N = 5:

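$$A = \begin{pmatrix}
a_{0,0} & a_{0,1} & a_{0,2} & a_{0,3} & a_{0,4} \\
a_{1,0} & a_{1,1} & a_{1,2} & a_{1,3} & a_{1,4} \\
a_{2,0} & a_{2,1} & a_{2,2} & a_{2,3} & a_{2,4} \\
a_{3,0} & a_{3,1} & a_{3,2} & a_{3,3} & a_{3,4} \\
a_{4,0} & a_{4,1} & a_{4,2} & a_{4,3} & a_{4,4}
\end{pmatrix}$$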

• Not transposed - transr = 'N'

The elements of the lower triangle of A can be packed in a matrix with the dimensions (N)-by-((N
+1)/2) = 5 by 3:
$$AP = \begin{pmatrix}
a_{0,0} & a_{3,3} & a_{4,3} \\
a_{1,0} & a_{1,1} & a_{4,4} \\
a_{2,0} & a_{2,1} & a_{2,2} \\
a_{3,0} & a_{3,1} & a_{3,2} \\
a_{4,0} & a_{4,1} & a_{4,2}
\end{pmatrix}$$
• Transposed or conjugate transposed - transr = 'T' or transr = 'C'

The elements of the lower triangle of A can be packed in a matrix with the dimensions ((N + 1)/2)-by-N
= 3 by 5:
$$AP = \begin{pmatrix}
a_{0,0} & a_{1,0} & a_{2,0} & a_{3,0} & a_{4,0} \\
a_{3,3} & a_{1,1} & a_{2,1} & a_{3,1} & a_{4,1} \\
a_{4,3} & a_{4,4} & a_{2,2} & a_{3,2} & a_{4,2}
\end{pmatrix}$$

The packed matrix AP can be stored using column major layout or row major layout.

NOTE
The matrix_layout and transr parameters can specify the same storage scheme: for example, the
storage scheme for matrix_layout = LAPACK_COL_MAJOR and transr = 'N' is the same as that for
matrix_layout = LAPACK_ROW_MAJOR and transr = 'T'.


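Element $a_{i,j}$ is stored as array element ap[l], where the mapping l(i, j) is defined in the following tables. Indices i and j are 0-based, and k is the integer part of N/2.

• Column major layout: matrix_layout = LAPACK_COL_MAJOR
  • transr = 'N', uplo = 'U'
    N = 2k:
      l(i, j) = (j - k)(N + 1) + i, for 0 ≤ i < N, max(i, k) ≤ j < N
      l(i, j) = i(N + 1) + j + k + 1, for 0 ≤ i < k, i ≤ j < k
    N = 2k + 1:
      l(i, j) = (j - k)N + i, for 0 ≤ i < N, max(i, k) ≤ j < N
      l(i, j) = iN + j + k + 1, for 0 ≤ i < k, i ≤ j < k
  • transr = 'N', uplo = 'L'
    N = 2k:
      l(i, j) = j(N + 1) + i + 1, for 0 ≤ i < N, 0 ≤ j ≤ min(i, k - 1)
      l(i, j) = (i - k)(N + 1) + j - k, for k ≤ i < N, k ≤ j ≤ i
    N = 2k + 1:
      l(i, j) = jN + i, for 0 ≤ i < N, 0 ≤ j ≤ min(i, k)
      l(i, j) = (i - k)N + j - k - 1, for k + 1 ≤ i < N, k + 1 ≤ j ≤ i
  • transr = 'T' or 'C', uplo = 'U'
    N = 2k:
      l(i, j) = ik + j - k, for 0 ≤ i < N, max(i, k) ≤ j < N
      l(i, j) = (j + k + 1)k + i, for 0 ≤ i < k, i ≤ j < k
    N = 2k + 1:
      l(i, j) = i(k + 1) + j - k, for 0 ≤ i < N, max(i, k) ≤ j < N
      l(i, j) = (j + k + 1)(k + 1) + i, for 0 ≤ i < k, i ≤ j < k
  • transr = 'T' or 'C', uplo = 'L'
    N = 2k:
      l(i, j) = (i + 1)k + j, for 0 ≤ i < N, 0 ≤ j ≤ min(i, k - 1)
      l(i, j) = (j - k)k + i - k, for k ≤ i < N, k ≤ j ≤ i
    N = 2k + 1:
      l(i, j) = i(k + 1) + j, for 0 ≤ i < N, 0 ≤ j ≤ min(i, k)
      l(i, j) = (j - k - 1)(k + 1) + i - k, for k + 1 ≤ i < N, k + 1 ≤ j ≤ i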

• Row major layout: matrix_layout = LAPACK_ROW_MAJOR
  • transr = 'N', uplo = 'U'
    N = 2k:
      l(i, j) = ik + j - k, for 0 ≤ i < N, max(i, k) ≤ j < N
      l(i, j) = (k + j + 1)k + i, for 0 ≤ i < k, i ≤ j < k
    N = 2k + 1:
      l(i, j) = i(k + 1) + j - k, for 0 ≤ i < N, max(i, k) ≤ j < N
      l(i, j) = (k + j + 1)(k + 1) + i, for 0 ≤ i < k, i ≤ j < k
  • transr = 'N', uplo = 'L'
    N = 2k:
      l(i, j) = (i + 1)k + j, for 0 ≤ i < N, 0 ≤ j ≤ min(i, k - 1)
      l(i, j) = (j - k)k + i - k, for k ≤ i < N, k ≤ j ≤ i
    N = 2k + 1:
      l(i, j) = i(k + 1) + j, for 0 ≤ i < N, 0 ≤ j ≤ min(i, k)
      l(i, j) = (j - k - 1)(k + 1) + i - k, for k + 1 ≤ i < N, k + 1 ≤ j ≤ i
  • transr = 'T' or 'C', uplo = 'U'
    N = 2k:
      l(i, j) = (j - k)(N + 1) + i, for 0 ≤ i < N, max(i, k) ≤ j < N
      l(i, j) = i(N + 1) + k + j + 1, for 0 ≤ i < k, i ≤ j < k
    N = 2k + 1:
      l(i, j) = (j - k)N + i, for 0 ≤ i < N, max(i, k) ≤ j < N
      l(i, j) = iN + k + j + 1, for 0 ≤ i < k, i ≤ j < k
  • transr = 'T' or 'C', uplo = 'L'
    N = 2k:
      l(i, j) = j(N + 1) + i + 1, for 0 ≤ i < N, 0 ≤ j ≤ min(i, k - 1)
      l(i, j) = (i - k)(N + 1) + j - k, for k ≤ i < N, k ≤ j ≤ i
    N = 2k + 1:
      l(i, j) = jN + i, for 0 ≤ i < N, 0 ≤ j ≤ min(i, k)
      l(i, j) = (i - k)N + j - k - 1, for k + 1 ≤ i < N, k + 1 ≤ j ≤ i

NOTE
Although LAPACK accepts parameter values of zero for matrix size, in general the size of the array
should be greater than or equal to max(1, N*(N + 1)/2).
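As a check on the l(i, j) formulas given above for column major layout, the sketch below transcribes only the transr = 'N' rows into C (the function name rfp_index_colmajor_n is hypothetical); the transposed and row major variants follow the same pattern with the roles of the formulas swapped. Indices are 0-based, as in the RFP examples, and the caller must pass an element of the stored triangle.

```c
#include <stddef.h>

/* 0-based position l(i,j) of a(i,j) in ap for matrix_layout =
 * LAPACK_COL_MAJOR and transr = 'N' only. uplo is 'U' (i <= j) or
 * 'L' (i >= j); k is the integer part of N/2. */
size_t rfp_index_colmajor_n(char uplo, size_t N, size_t i, size_t j) {
    size_t k = N / 2;
    int even = (N % 2 == 0);
    if (uplo == 'U') {
        if (j >= k)   /* trailing columns of A, stored as-is */
            return even ? (j - k) * (N + 1) + i : (j - k) * N + i;
        /* leading triangle, stored transposed below the trapezoid */
        return even ? i * (N + 1) + j + k + 1 : i * N + j + k + 1;
    }
    if (even) {       /* uplo == 'L', N = 2k */
        if (j < k) return j * (N + 1) + i + 1;
        return (i - k) * (N + 1) + j - k;
    }
    /* uplo == 'L', N = 2k + 1 */
    if (j <= k) return j * N + i;
    return (i - k) * N + j - k - 1;
}
```

For N = 6 and uplo = 'U', this places $a_{0,0}$ at position 4 and $a_{2,2}$ at position 20, matching the 7-by-3 array AP shown earlier read in column major order. Every stored element maps to a distinct position in 0 through N(N + 1)/2 - 1.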

Mathematical Notation for LAPACK Routines

Descriptions of LAPACK routines use the following notation:
$A^H$ For an M-by-N matrix A, denotes the conjugate transposed N-by-M matrix with elements $(A^H)_{i,j} = \overline{a_{j,i}}$.
For a real-valued matrix, $A^H = A^T$.

x·y The dot product of two vectors.

Ax = b A system of linear equations with an n-by-n matrix $A = \{a_{ij}\}$, a right-hand side vector $b = \{b_i\}$, and an unknown vector $x = \{x_i\}$.

AX = B A set of systems with a common matrix A and multiple right-hand sides. The columns of B are individual right-hand sides, and the columns of X are the corresponding solutions.

|x| The vector with elements $|x_i|$ (absolute values of $x_i$).

|A| The matrix with elements $|a_{ij}|$ (absolute values of $a_{ij}$).

$\|x\|_\infty = \max_i |x_i|$ The infinity-norm of the vector x.

$\|A\|_\infty = \max_i \sum_j |a_{ij}|$ The infinity-norm of the matrix A.

$\|A\|_1 = \max_j \sum_i |a_{ij}|$ The one-norm of the matrix A: $\|A\|_1 = \|A^T\|_\infty = \|A^H\|_\infty$.

$\|x\|_2$ The 2-norm of the vector x: $\|x\|_2 = \left(\sum_i |x_i|^2\right)^{1/2} = \|x\|_E$ (see the definition of the Euclidean norm in this topic).

$\|A\|_2$ The 2-norm (or spectral norm) of the matrix A.

$\|A\|_E$ The Euclidean norm of the matrix A: $\|A\|_E^2 = \sum_i \sum_j |a_{ij}|^2$.

$\kappa(A) = \|A\| \cdot \|A^{-1}\|$ The condition number of the matrix A.

$\lambda_i$ Eigenvalues of the matrix A (for the definition of eigenvalues, see Eigenvalue Problems).

$\sigma_i$ Singular values of the matrix A. They are equal to the square roots of the eigenvalues of $A^H A$. (For more information, see Singular Value Decomposition.)
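The infinity-norm and one-norm definitions, and the identity $\|A\|_1 = \|A^T\|_\infty$, can be checked with a few lines of C. This is an illustrative sketch for real column-major matrices with a leading dimension; the function names are hypothetical (in LAPACK itself, the ?lange routines compute these norms).

```c
#include <stddef.h>

/* Absolute value without <math.h>. */
static double absval(double x) { return x < 0 ? -x : x; }

/* ||A||_inf = max_i sum_j |a_ij| for an m-by-n column-major matrix. */
double norm_inf(const double *a, size_t m, size_t n, size_t lda) {
    double best = 0.0;
    for (size_t i = 0; i < m; ++i) {
        double row_sum = 0.0;
        for (size_t j = 0; j < n; ++j)
            row_sum += absval(a[i + j * lda]);
        if (row_sum > best) best = row_sum;
    }
    return best;
}

/* ||A||_1 = max_j sum_i |a_ij| for an m-by-n column-major matrix. */
double norm_one(const double *a, size_t m, size_t n, size_t lda) {
    double best = 0.0;
    for (size_t j = 0; j < n; ++j) {
        double col_sum = 0.0;
        for (size_t i = 0; i < m; ++i)
            col_sum += absval(a[i + j * lda]);
        if (col_sum > best) best = col_sum;
    }
    return best;
}
```

For A = [1 -2 3; -4 5 -6], the row sums are 6 and 15, so $\|A\|_\infty = 15$; the column sums are 5, 7, and 9, so $\|A\|_1 = 9$, which is also the infinity-norm of $A^T$.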

Error Analysis
In practice, most computations are performed with rounding errors. Moreover, you often need to solve a
system Ax = b where the data (the elements of A and b) are not known exactly. Therefore, it is important
to understand how data errors and rounding errors can affect the solution x.
Data perturbations. If x is the exact solution of Ax = b, and x + δx is the exact solution of a perturbed
problem (A + δA)(x + δx) = (b + δb), then this estimate, given up to linear terms of perturbations,
holds:
$$\frac{\|\delta x\|}{\|x\|} \le \kappa(A)\left(\frac{\|\delta A\|}{\|A\|} + \frac{\|\delta b\|}{\|b\|}\right),$$
where A + δA is nonsingular. In other words, relative errors in A or b may be amplified in the solution
vector x by a factor $\kappa(A) = \|A\|\,\|A^{-1}\|$, called the condition number of A.
Rounding errors have the same effect as relative perturbations c(n)ε in the original data. Here ε is the
machine precision, defined as the smallest positive number x such that 1 + x > 1, and c(n) is a modest
function of the matrix order n. The corresponding solution error is
$\|\delta x\|/\|x\| \le c(n)\,\kappa(A)\,\varepsilon$. (The value of c(n) is seldom greater than 10n.)

NOTE
Machine precision depends on the data type used. For example, it is usually defined in the float.h
file as FLT_EPSILON for the float data type and DBL_EPSILON for the double data type.
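A small C sketch can make these quantities concrete. It shows that adding DBL_EPSILON (the gap between 1.0 and the next larger double) to 1.0 changes the result, while adding half of it rounds back to 1.0 under the default round-to-nearest mode, and it turns the bound above into a rough number. Taking c(n) = 10n is an assumption based only on the remark that c(n) is seldom greater than 10n, and the function names are hypothetical.

```c
#include <float.h>

/* Pessimistic estimate of ||dx||/||x|| <= c(n) * kappa(A) * eps in double
 * precision, assuming the worst case c(n) = 10*n. */
double rough_error_bound(int n, double kappa) {
    return 10.0 * n * kappa * DBL_EPSILON;
}

/* Returns 1 if adding x to 1.0 in double precision changes the result.
 * The volatile store forces rounding to double even on platforms that
 * evaluate expressions in wider precision. */
int changes_one(double x) {
    volatile double s = 1.0 + x;
    return s > 1.0;
}
```

For n = 100 and κ(A) = 10^6, the estimate is about 2.2e-7: roughly six to seven of the sixteen significant decimal digits of a double may be lost.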

Thus, if your matrix A is ill-conditioned (that is, its condition number κ(A) is very large), then the error in
the solution x can also be large; you might even encounter a complete loss of precision. LAPACK provides
routines that allow you to estimate κ(A) (see Routines for Estimating the Condition Number) and also give
you a more precise estimate for the actual solution error (see Refining the Solution and Estimating Its Error).

LAPACK Linear Equation Routines

This section describes routines for performing the following computations:
– factoring the matrix (except for triangular matrices)
– equilibrating the matrix (except for RFP matrices)
– solving a system of linear equations
– estimating the condition number of a matrix (except for RFP matrices)
– refining the solution of linear equations and computing its error bounds (except for RFP matrices)
– inverting the matrix.
To solve a particular problem, you can call two or more computational routines or call a corresponding driver
routine that combines several tasks in one call. For example, to solve a system of linear equations with a
general matrix, call ?getrf (LU factorization) and then ?getrs (computing the solution). Then, call ?gerfs
to refine the solution and get the error bounds. Alternatively, use the driver routine ?gesvx that performs all
these tasks in one call.

LAPACK Linear Equation Computational Routines

Table "Computational Routines for Systems of Equations with Real Matrices" lists the LAPACK computational
routines for factorizing, equilibrating, and inverting real matrices, estimating their condition numbers, solving
systems of equations with real matrices, refining the solution, and estimating its error. Table "Computational
Routines for Systems of Equations with Complex Matrices" lists similar routines for complex matrices.
Computational Routines for Systems of Equations with Real Matrices

| Matrix type, storage scheme | Factorize matrix | Equilibrate matrix | Solve system | Condition number | Estimate error | Invert matrix |
|---|---|---|---|---|---|---|
| general | ?getrf | ?geequ, ?geequb | ?getrs | ?gecon | ?gerfs, ?gerfsx | ?getri |
| general band | ?gbtrf | ?gbequ, ?gbequb | ?gbtrs | ?gbcon | ?gbrfs, ?gbrfsx | |
| general tridiagonal | ?gttrf | | ?gttrs | ?gtcon | ?gtrfs | |
| diagonally dominant tridiagonal | ?dttrfb | | ?dttrsb | | | |
| symmetric positive-definite | ?potrf | ?poequ, ?poequb | ?potrs | ?pocon | ?porfs, ?porfsx | ?potri |
| symmetric positive-definite, packed storage | ?pptrf | ?ppequ | ?pptrs | ?ppcon | ?pprfs | ?pptri |
| symmetric positive-definite, RFP storage | ?pftrf | | ?pftrs | | | ?pftri |
| symmetric positive-definite, band | ?pbtrf | ?pbequ | ?pbtrs | ?pbcon | ?pbrfs | |
| symmetric positive-definite, tridiagonal | ?pttrf | | ?pttrs | ?ptcon | ?ptrfs | |
| symmetric indefinite | ?sytrf, ?sytrf_rk, ?sytrf_aa | ?syequb | ?sytrs, ?sytrs2, ?sytrs3, ?sytrs_aa | ?sycon, ?sycon_3 | ?syrfs, ?syrfsx | ?sytri, ?sytri2, ?sytri2x, ?sytri_3 |
| symmetric indefinite, packed storage | ?sptrf, mkl_?spffrt2, mkl_?spffrtx | | ?sptrs | ?spcon | ?sprfs | ?sptri |
| triangular | | | ?trtrs | ?trcon | ?trrfs | ?trtri |
| triangular, packed storage | | | ?tptrs | ?tpcon | ?tprfs | ?tptri |
| triangular, RFP storage | | | | | | ?tftri |
| triangular band | | | ?tbtrs | ?tbcon | ?tbrfs | |

Computational Routines for Systems of Equations with Complex Matrices

| Matrix type, storage scheme | Factorize matrix | Equilibrate matrix | Solve system | Condition number | Estimate error | Invert matrix |
|---|---|---|---|---|---|---|
| general | ?getrf | ?geequ, ?geequb | ?getrs | ?gecon | ?gerfs, ?gerfsx | ?getri |
| general band | ?gbtrf | ?gbequ, ?gbequb | ?gbtrs | ?gbcon | ?gbrfs, ?gbrfsx | |
| general tridiagonal | ?gttrf | | ?gttrs | ?gtcon | ?gtrfs | |
| Hermitian positive-definite | ?potrf | ?poequ, ?poequb | ?potrs | ?pocon | ?porfs, ?porfsx | ?potri |
| Hermitian positive-definite, packed storage | ?pptrf | ?ppequ | ?pptrs | ?ppcon | ?pprfs | ?pptri |
| Hermitian positive-definite, RFP storage | ?pftrf | | ?pftrs | | | ?pftri |
| Hermitian positive-definite, band | ?pbtrf | ?pbequ | ?pbtrs | ?pbcon | ?pbrfs | |
| Hermitian positive-definite, tridiagonal | ?pttrf | | ?pttrs | ?ptcon | ?ptrfs | |
| Hermitian indefinite | ?hetrf, ?hetrf_rk, ?hetrf_aa | ?heequb | ?hetrs, ?hetrs2, ?hetrs_3, ?hetrs_aa | ?hecon, ?hecon_3 | ?herfs, ?herfsx | ?hetri, ?hetri2, ?hetri2x, ?hetri_3 |
| symmetric indefinite | ?sytrf, ?sytrf_rk | ?syequb | ?sytrs, ?sytrs2, ?sytrs3 | ?sycon, ?sycon_3 | ?syrfs, ?syrfsx | ?sytri, ?sytri2, ?sytri2x, ?sytri_3 |
| Hermitian indefinite, packed storage | ?hptrf | | ?hptrs | ?hpcon | ?hprfs | ?hptri |
| symmetric indefinite, packed storage | ?sptrf, mkl_?spffrt2, mkl_?spffrtx | | ?sptrs | ?spcon | ?sprfs | ?sptri |
| triangular | | | ?trtrs | ?trcon | ?trrfs | ?trtri |
| triangular, packed storage | | | ?tptrs | ?tpcon | ?tprfs | ?tptri |
| triangular, RFP storage | | | | | | ?tftri |
| triangular band | | | ?tbtrs | ?tbcon | ?tbrfs | |


Matrix Factorization: LAPACK Computational Routines

This section describes the LAPACK routines for matrix factorization. The following factorizations are
supported:
• LU factorization
• Cholesky factorization of real symmetric positive-definite matrices
• Cholesky factorization of real symmetric positive-definite matrices with pivoting
• Cholesky factorization of Hermitian positive-definite matrices
• Cholesky factorization of Hermitian positive-definite matrices with pivoting
• Bunch-Kaufman factorization of real and complex symmetric matrices
• Bunch-Kaufman factorization of Hermitian matrices.
You can compute:
• the LU factorization using full and band storage of matrices
• the Cholesky factorization using full, packed, RFP, and band storage
• the Bunch-Kaufman factorization using full and packed storage.

?getrf
Computes the LU factorization of a general m-by-n
matrix.

Syntax
lapack_int LAPACKE_sgetrf (int matrix_layout , lapack_int m , lapack_int n , float *
a , lapack_int lda , lapack_int * ipiv );
lapack_int LAPACKE_dgetrf (int matrix_layout , lapack_int m , lapack_int n , double *
a , lapack_int lda , lapack_int * ipiv );
lapack_int LAPACKE_cgetrf (int matrix_layout , lapack_int m , lapack_int n ,
lapack_complex_float * a , lapack_int lda , lapack_int * ipiv );
lapack_int LAPACKE_zgetrf (int matrix_layout , lapack_int m , lapack_int n ,
lapack_complex_double * a , lapack_int lda , lapack_int * ipiv );

Include Files
• mkl.h

Description

The routine computes the LU factorization of a general m-by-n matrix A as

A = P*L*U,
where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m >
n) and U is upper triangular (upper trapezoidal if m < n). The routine uses partial pivoting, with row
interchanges.
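The factorization can be illustrated with a short, unblocked C sketch of LU factorization with partial pivoting on a column-major array. It mirrors the output conventions described on this page (L and U packed into a, unit diagonal of L not stored, 1-based pivot indices, a positive return value for an exactly zero u_ii), but it is not the blocked, parallel algorithm used inside ?getrf, and the function name lu_partial_pivot is hypothetical.

```c
#include <math.h>

/* Hypothetical unblocked LU with partial pivoting (column-major storage).
   a:    m-by-n matrix, element (i,j) at a[i + j*lda]; overwritten by L and U
   ipiv: min(m,n) entries; row k was interchanged with row ipiv[k]-1 (0-based)
   Returns 0, or the first k (1-based) for which U(k,k) is exactly zero. */
int lu_partial_pivot(int m, int n, double *a, int lda, int *ipiv)
{
    int info = 0;
    int kmax = m < n ? m : n;
    for (int k = 0; k < kmax; ++k) {
        /* Find the entry of largest magnitude in column k, rows k..m-1. */
        int p = k;
        for (int i = k + 1; i < m; ++i)
            if (fabs(a[i + k * lda]) > fabs(a[p + k * lda]))
                p = i;
        ipiv[k] = p + 1;                      /* 1-based, as in ?getrf */
        if (a[p + k * lda] == 0.0) {          /* column is zero below row k */
            if (info == 0)
                info = k + 1;
            continue;                         /* nothing to eliminate */
        }
        if (p != k)                           /* interchange entire rows */
            for (int j = 0; j < n; ++j) {
                double t = a[k + j * lda];
                a[k + j * lda] = a[p + j * lda];
                a[p + j * lda] = t;
            }
        /* Multipliers: column k of L below the diagonal. */
        for (int i = k + 1; i < m; ++i)
            a[i + k * lda] /= a[k + k * lda];
        /* Rank-1 update of the trailing submatrix. */
        for (int j = k + 1; j < n; ++j)
            for (int i = k + 1; i < m; ++i)
                a[i + j * lda] -= a[i + k * lda] * a[k + j * lda];
    }
    return info;
}
```

For A = [1 2; 3 4] stored column-major as {1, 3, 2, 4}, the sketch returns ipiv = {2, 2} and a = {3, 1/3, 4, 2/3}: the two rows are interchanged first, so L = [1 0; 1/3 1] and U = [3 4; 0 2/3] multiply to the row-interchanged A. In MKL code, LAPACKE_dgetrf with the signature shown above does this job, with lapack_int pivots.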

NOTE
This routine supports the Progress Routine feature. See Progress Function for details.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

m The number of rows in the matrix A (m ≥ 0).

n The number of columns in A; n ≥ 0.

a Array, size at least max(1, lda*n) for column-major layout or max(1, lda*m) for row-major
layout. Contains the matrix A.

lda The leading dimension of array a, which must be at least max(1, m) for column-major layout
or max(1, n) for row-major layout.

Output Parameters

a Overwritten by L and U. The unit diagonal elements of L are not stored.

ipiv Array, size at least max(1, min(m, n)). Contains the pivot indices; for
1 ≤ i ≤ min(m, n), row i was interchanged with row ipiv[i-1].
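Because ipiv records a sequence of interchanges rather than a permutation vector, the swaps must be applied one after another in increasing order of i. The following C sketch (hypothetical helper name apply_row_swaps, not an MKL routine) applies such pivots to a vector, which is the first step of solving A*x = b with the computed factors.

```c
/* Hypothetical helper: apply the row interchanges recorded by an LU
   factorization with partial pivoting to the vector b.
   ipiv holds k 1-based indices; for i = 0..k-1, in this order,
   entry i of b is swapped with entry ipiv[i]-1. */
void apply_row_swaps(int k, const int *ipiv, double *b)
{
    for (int i = 0; i < k; ++i) {
        int p = ipiv[i] - 1;          /* convert to a 0-based index */
        if (p != i) {
            double t = b[i];
            b[i] = b[p];
            b[p] = t;
        }
    }
}
```

For example, ipiv = {3, 3, 3} applied to b = {1, 2, 3} gives {3, 1, 2}: rows 1 and 3 are swapped, then rows 2 and 3, and the last entry requests no swap. Treating ipiv as a permutation vector would be wrong; {3, 3, 3} is not even a permutation.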

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, u_ii is 0. The factorization has been completed, but U is exactly singular. Division by 0 will
occur if you use the factor U for solving a system of linear equations.

Application Notes
The computed L and U are the exact factors of a perturbed matrix A + E, where

|E| ≤ c(min(m,n))εP|L||U|,

c(n) is a modest linear function of n, and ε is the machine precision.
The approximate number of floating-point operations for real flavors is

$$\begin{cases} \tfrac{2}{3}n^3 & \text{if } m = n,\\ \tfrac{1}{3}n^2(3m-n) & \text{if } m > n,\\ \tfrac{1}{3}m^2(3n-m) & \text{if } m < n. \end{cases}$$

The number of operations for complex flavors is four times greater.
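These operation counts can be evaluated with a small C helper (hypothetical name getrf_flops) that applies the three formulas for real flavors and multiplies by four for complex flavors, as stated above. The formulas agree at m = n, since (1/3)n^2(3n - n) = (2/3)n^3.

```c
/* Hypothetical helper: approximate floating-point operation count of an
   m-by-n LU factorization, using the formulas above. A nonzero is_complex
   selects the complex flavors, whose count is four times greater. */
double getrf_flops(double m, double n, int is_complex)
{
    double f;
    if (m == n)
        f = 2.0 / 3.0 * n * n * n;
    else if (m > n)
        f = n * n * (3.0 * m - n) / 3.0;
    else
        f = m * m * (3.0 * n - m) / 3.0;
    return is_complex ? 4.0 * f : f;
}
```

For example, a 1000-by-1000 real factorization takes roughly (2/3)·10^9, or about 6.7·10^8, operations.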

After calling this routine with m = n, you can call the following:

?getrs to solve A*X = B, A^T*X = B, or A^H*X = B

?gecon to estimate the condition number of A

?getri to compute the inverse of A.
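To show how the packed factors are used, here is a C sketch of roughly what a solve step such as ?getrs does for A*x = b with one right-hand side (non-transposed case): apply the recorded interchanges to b, then forward-substitute with the unit lower triangular L and back-substitute with U. The function name lu_solve is hypothetical, the code is unblocked, and it assumes the factorization returned info = 0.

```c
/* Hypothetical sketch of the solve step for A*x = b, given the packed LU
   factors (column-major, element (i,j) at lu[i + j*lda]) and the 1-based
   pivot indices of an LU factorization with partial pivoting.
   b is overwritten by the solution x. Assumes U has no zero diagonal. */
void lu_solve(int n, const double *lu, int lda, const int *ipiv, double *b)
{
    /* 1. Apply the row interchanges in order: b := P^T * b. */
    for (int i = 0; i < n; ++i) {
        int p = ipiv[i] - 1;
        if (p != i) {
            double t = b[i];
            b[i] = b[p];
            b[p] = t;
        }
    }
    /* 2. Forward substitution with unit lower triangular L: L*y = b. */
    for (int j = 0; j < n; ++j)
        for (int i = j + 1; i < n; ++i)
            b[i] -= lu[i + j * lda] * b[j];
    /* 3. Back substitution with upper triangular U: U*x = y. */
    for (int j = n - 1; j >= 0; --j) {
        b[j] /= lu[j + j * lda];
        for (int i = 0; i < j; ++i)
            b[i] -= lu[i + j * lda] * b[j];
    }
}
```

With the factors of A = [1 2; 3 4] (lu = {3, 1/3, 4, 2/3}, ipiv = {2, 2}) and b = {3, 7}, the sketch returns x = {1, 1}.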

See Also
mkl_progress

Matrix Storage Schemes


mkl_?getrfnp
Computes the LU factorization of a general m-by-n
matrix without pivoting.

Syntax
lapack_int LAPACKE_mkl_sgetrfnp (int matrix_layout , lapack_int m , lapack_int n ,
float * a , lapack_int lda );
lapack_int LAPACKE_mkl_dgetrfnp (int matrix_layout , lapack_int m , lapack_int n ,
double * a , lapack_int lda );
lapack_int LAPACKE_mkl_cgetrfnp (int matrix_layout , lapack_int m , lapack_int n ,
lapack_complex_float * a , lapack_int lda );
lapack_int LAPACKE_mkl_zgetrfnp (int matrix_layout , lapack_int m , lapack_int n ,
lapack_complex_double * a , lapack_int lda );

Include Files
• mkl.h

Description

The routine computes the LU factorization of a general m-by-n matrix A as

A = L*U,
where L is lower triangular with unit-diagonal elements (lower trapezoidal if m > n) and U is upper triangular
(upper trapezoidal if m < n). The routine does not use pivoting.
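The computation can be sketched in C as pivoted elimination without the row search and interchanges, which is why a zero pivot cannot be avoided. This is a minimal, unblocked illustration with a hypothetical name (lu_no_pivot), not the MKL implementation; unlike the documented routine, which completes the factorization when u_ii is 0, the sketch stops at the first zero pivot to avoid dividing by zero.

```c
/* Hypothetical unblocked LU factorization without pivoting, A = L*U.
   a is m-by-n column-major (element (i,j) at a[i + j*lda]); on exit it holds
   L below the diagonal (unit diagonal not stored) and U on and above it.
   Returns 0, or k (1-based) for the first exactly-zero pivot U(k,k);
   this sketch stops there because the multipliers would divide by zero. */
int lu_no_pivot(int m, int n, double *a, int lda)
{
    int kmax = m < n ? m : n;
    for (int k = 0; k < kmax; ++k) {
        double pivot = a[k + k * lda];
        if (pivot == 0.0)
            return k + 1;
        for (int i = k + 1; i < m; ++i)
            a[i + k * lda] /= pivot;             /* column k of L */
        for (int j = k + 1; j < n; ++j)          /* update trailing block */
            for (int i = k + 1; i < m; ++i)
                a[i + j * lda] -= a[i + k * lda] * a[k + j * lda];
    }
    return 0;
}
```

A diagonally dominant matrix such as [4 1; 2 3] factors without trouble, giving L = [1 0; 0.5 1] and U = [4 1; 0 2.5], whereas [0 1; 1 0], which a pivoting factorization handles by interchanging rows, fails immediately here.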

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

m The number of rows in the matrix A (m ≥ 0).

n The number of columns in A; n ≥ 0.

a Array, size at least max(1, lda*n) for column-major layout or max(1, lda*m) for row-major
layout. Contains the matrix A.

lda The leading dimension of array a, which must be at least max(1, m) for column-major layout
or max(1, n) for row-major layout.

Output Parameters

a Overwritten by L and U. The unit diagonal elements of L are not stored.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, u_ii is 0. The factorization has been completed, but U is exactly singular. Division by 0 will
occur if you use the factor U for solving a system of linear equations.

Application Notes
The approximate number of floating-point operations for real flavors is

$$\begin{cases} \tfrac{2}{3}n^3 & \text{if } m = n,\\ \tfrac{1}{3}n^2(3m-n) & \text{if } m > n,\\ \tfrac{1}{3}m^2(3n-m) & \text{if } m < n. \end{cases}$$

The number of operations for complex flavors is four times greater.

After calling this routine with m = n, you can call the following:

mkl_?getrinp to compute the inverse of A

See Also
mkl_progress

Matrix Storage Schemes

mkl_?getrfnpi
Performs LU factorization (complete or incomplete) of
a general matrix without pivoting.

Syntax
lapack_int LAPACKE_mkl_sgetrfnpi (int matrix_layout, lapack_int m, lapack_int n,
lapack_int nfact, float* a, lapack_int lda);
lapack_int LAPACKE_mkl_dgetrfnpi (int matrix_layout, lapack_int m, lapack_int n,
lapack_int nfact, double* a, lapack_int lda);
lapack_int LAPACKE_mkl_cgetrfnpi (int matrix_layout, lapack_int m, lapack_int n,
lapack_int nfact, lapack_complex_float* a, lapack_int lda);
lapack_int LAPACKE_mkl_zgetrfnpi (int matrix_layout, lapack_int m, lapack_int n,
lapack_int nfact, lapack_complex_double* a, lapack_int lda);

Include Files
• mkl.h

Description
The routine computes the LU factorization of a general m-by-n matrix A without using pivoting. It supports
incomplete factorization. The factorization has the form:
A = L*U,
where L is lower triangular with unit diagonal elements (lower trapezoidal if m > n) and U is upper triangular
(upper trapezoidal if m < n).
Incomplete factorization has the form:

A = L*U + Ã,

where L is lower trapezoidal with unit diagonal elements, U is upper trapezoidal, and Ã is the unfactored
part of matrix A. See the application notes section for further details.


NOTE
Use ?getrf if the matrix might not be diagonally dominant.

Input Parameters
A <datatype> placeholder, if present, is used for the C interface data types in the C interface section above.
See C Interface Conventions for the C interface principal conventions and type definitions.

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

m The number of rows in matrix A; m≥ 0.

n The number of columns in matrix A; n≥ 0.

nfact The number of rows and columns to factor; 0 ≤nfact≤ min(m, n). Note that
if nfact < min(m, n), incomplete factorization is performed.

a Array of size at least lda*n for column major layout and at least lda*m for
row major layout. Contains the matrix A.

lda The leading dimension of array a. lda≥ max(1, m) for column major layout
and lda≥ max(1, n) for row major layout.

Output Parameters

a Overwritten by L and U. The unit diagonal elements of L are not stored.

When incomplete factorization is specified by setting nfact < min(m, n), a
also contains the unfactored submatrix Ã22. See the application notes
section for further details.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, u_ii is 0. The requested factorization has been completed, but U is exactly singular. Division by 0
will occur if factorization is completed and factor U is used for solving a system of linear equations.

Application Notes
The computed L and U are the exact factors of a perturbed matrix A + E, with

|E| ≤c(min(m, n))ε|L||U|

where c(n) is a modest linear function of n, and ε is the machine precision.

The approximate number of floating-point operations for real flavors is

$$\begin{cases}
\tfrac{2}{3}n^3 & \text{if } m = n = nfact,\\
\tfrac{1}{3}n^2(3m-n) & \text{if } m > n = nfact,\\
\tfrac{1}{3}m^2(3n-m) & \text{if } m = nfact < n,\\
\tfrac{2}{3}\left(n^3 - (n-nfact)^3\right) & \text{if } m = n,\ nfact < \min(m,n),\\
\tfrac{1}{3}\left(n^2(3m-n) - (n-nfact)^2(3m - 2\,nfact - n)\right) & \text{if } m > n > nfact,\\
\tfrac{1}{3}\left(m^2(3n-m) - (m-nfact)^2(3n - 2\,nfact - m)\right) & \text{if } nfact < m < n.
\end{cases}$$

The number of operations for complex flavors is four times greater.

When incomplete factorization is specified, the first nfact rows and columns are factored, and the
remaining rows and columns of A are updated as follows.

If matrix A is represented as a block 2-by-2 matrix:

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$$

where

• A11 is a square matrix of order nfact,
• A21 is an (m - nfact)-by-nfact matrix,
• A12 is an nfact-by-(n - nfact) matrix, and
• A22 is an (m - nfact)-by-(n - nfact) matrix.

The result is

$$A = \begin{bmatrix} L_1 & 0 \\ L_2 & I \end{bmatrix} \begin{bmatrix} U_1 & U_2 \\ 0 & \tilde{A}_{22} \end{bmatrix}$$

where I is the identity matrix of order m - nfact.

L1 is a lower triangular square matrix of order nfact with unit diagonal, and U1 is an upper triangular square
matrix of order nfact. L1 and U1 result from the LU factorization of matrix A11: A11 = L1U1.

L2 is an (m - nfact)-by-nfact matrix and L2 = A21U1^{-1}. U2 is an nfact-by-(n - nfact) matrix and
U2 = L1^{-1}A12.

Ã22 is an (m - nfact)-by-(n - nfact) matrix and Ã22 = A22 - L2U2.

On exit, elements of the upper triangle U1 are stored in place of the upper triangle of block A11 in array a;
elements of the lower triangle L1 are stored in the lower triangle of block A11 in array a (unit diagonal
elements are not stored). Elements of L2 replace elements of A21, U2 replaces elements of A12, and Ã22
replaces elements of A22.
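The block formulas above amount to stopping the elimination after nfact steps. The following C sketch (hypothetical name lu_incomplete_no_pivot, unblocked, column-major, not the MKL implementation) performs only the first nfact elimination steps without pivoting. That leaves L1 and U1 in block A11, L2 in A21, U2 in A12, and the Schur complement Ã22 = A22 - L2U2 in A22.

```c
/* Hypothetical sketch of incomplete LU without pivoting: performs the first
   nfact elimination steps on the m-by-n column-major array a (element (i,j)
   at a[i + j*lda]). On exit:
     the leading nfact-by-nfact block holds L1 (unit diagonal implied) and U1;
     rows nfact.. of columns 0..nfact-1 hold L2 = A21*inv(U1);
     rows 0..nfact-1 of columns nfact.. hold U2 = inv(L1)*A12;
     the trailing block holds the Schur complement A22 - L2*U2.
   Returns 0, or k (1-based) if a zero pivot U1(k,k) is met. */
int lu_incomplete_no_pivot(int m, int n, int nfact, double *a, int lda)
{
    for (int k = 0; k < nfact; ++k) {
        double pivot = a[k + k * lda];
        if (pivot == 0.0)
            return k + 1;
        for (int i = k + 1; i < m; ++i)
            a[i + k * lda] /= pivot;
        /* The update reaches the whole trailing block, including A22. */
        for (int j = k + 1; j < n; ++j)
            for (int i = k + 1; i < m; ++i)
                a[i + j * lda] -= a[i + k * lda] * a[k + j * lda];
    }
    return 0;
}
```

For instance, with A = [4 1 2; 2 5 1; 1 1 3] and nfact = 1, the first column becomes (4, 0.5, 0.25), U2 = (1, 2), and the trailing block becomes Ã22 = [4.5 0; 0.75 2.5].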

?getrf2
Computes LU factorization using partial pivoting with
row interchanges.

Syntax
lapack_int LAPACKE_sgetrf2 (int matrix_layout, lapack_int m, lapack_int n, float * a,
lapack_int lda, lapack_int * ipiv);
lapack_int LAPACKE_dgetrf2 (int matrix_layout, lapack_int m, lapack_int n, double * a,
lapack_int lda, lapack_int * ipiv);
lapack_int LAPACKE_cgetrf2 (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_float * a, lapack_int lda, lapack_int * ipiv);
lapack_int LAPACKE_zgetrf2 (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_double * a, lapack_int lda, lapack_int * ipiv);

Include Files
• mkl.h

Description
?getrf2 computes an LU factorization of a general m-by-n matrix A using partial pivoting with row
interchanges.
The factorization has the form
A=P*L*U
where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m >
n), and U is upper triangular (upper trapezoidal if m < n).
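This is the recursive version of the algorithm. It divides the matrix into four submatrices:

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$$

where A11 is n1-by-n1, A12 is n1-by-n2, A21 is (m - n1)-by-n1, and A22 is (m - n1)-by-n2, with
n1 = min(m, n)/2 and n2 = n - n1.

The routine calls itself to factor the left panel [A11; A21], applies the resulting row interchanges to the
right panel [A12; A22], overwrites A12 with the solution X of L11*X = A12 (where L11 is the unit lower
triangular factor just computed for A11), updates A22 := A22 - A21*A12, and then calls itself to factor A22
and applies the resulting row interchanges to A21.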
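The recursion can be sketched in C as follows. This is an unblocked illustration of the splitting just described, with plain loops in place of the BLAS calls a production implementation would use; the function name lu_recursive is hypothetical, and it is not the MKL code. Base cases handle a single row or a single column, and pivot indices are 1-based and relative to the block passed in, so the top-level call yields pivots in the same form as ?getrf2.

```c
#include <math.h>

/* Hypothetical recursive LU with partial pivoting (n1 = min(m,n)/2,
   n2 = n - n1). a is m-by-n column-major, element (i,j) at a[i + j*lda].
   ipiv receives min(m,n) 1-based pivots relative to this block.
   Returns 0, or the 1-based index of the first exactly-zero U(i,i). */
int lu_recursive(int m, int n, double *a, int lda, int *ipiv)
{
    if (m == 0 || n == 0)
        return 0;
    if (m == 1) {                       /* one row: nothing to eliminate */
        ipiv[0] = 1;
        return a[0] == 0.0 ? 1 : 0;
    }
    if (n == 1) {                       /* one column: pivot, swap, scale */
        int p = 0;
        for (int i = 1; i < m; ++i)
            if (fabs(a[i]) > fabs(a[p]))
                p = i;
        ipiv[0] = p + 1;
        if (a[p] == 0.0)
            return 1;
        double t = a[0]; a[0] = a[p]; a[p] = t;
        for (int i = 1; i < m; ++i)
            a[i] /= a[0];
        return 0;
    }
    int mn = m < n ? m : n;
    int n1 = mn / 2, n2 = n - n1;
    double *right = a + n1 * lda;       /* right panel [A12; A22] */

    /* 1. Factor the left panel [A11; A21] (m-by-n1). */
    int info = lu_recursive(m, n1, a, lda, ipiv);

    /* 2. Apply its row interchanges to the right panel. */
    for (int k = 0; k < n1; ++k) {
        int p = ipiv[k] - 1;
        if (p != k)
            for (int j = 0; j < n2; ++j) {
                double t = right[k + j * lda];
                right[k + j * lda] = right[p + j * lda];
                right[p + j * lda] = t;
            }
    }
    /* 3. A12 := inv(L11) * A12, with L11 unit lower triangular. */
    for (int j = 0; j < n2; ++j)
        for (int k = 0; k < n1; ++k)
            for (int i = k + 1; i < n1; ++i)
                right[i + j * lda] -= a[i + k * lda] * right[k + j * lda];

    /* 4. A22 := A22 - A21 * A12. */
    for (int j = 0; j < n2; ++j)
        for (int k = 0; k < n1; ++k)
            for (int i = n1; i < m; ++i)
                right[i + j * lda] -= a[i + k * lda] * right[k + j * lda];

    /* 5. Factor A22 ((m-n1)-by-n2) recursively. */
    int info2 = lu_recursive(m - n1, n2, right + n1, lda, ipiv + n1);
    if (info == 0 && info2 > 0)
        info = info2 + n1;

    /* 6. Shift those pivots to this block's numbering and apply them to A21. */
    for (int k = n1; k < mn; ++k) {
        ipiv[k] += n1;
        int p = ipiv[k] - 1;
        if (p != k)
            for (int j = 0; j < n1; ++j) {
                double t = a[k + j * lda];
                a[k + j * lda] = a[p + j * lda];
                a[p + j * lda] = t;
            }
    }
    return info;
}
```

For A = [1 2 0; 3 4 5; 0 6 7] the sketch returns ipiv = {2, 3, 3} and diag(U) = (3, 6, -22/9); with two interchanges, the product of the diagonal gives det(A) = -44.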

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

m The number of rows of the matrix A. m >= 0.

n The number of columns of the matrix A. n >= 0.

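a Array, size lda*n for column-major layout or lda*m for row-major layout.

On entry, the m-by-n matrix to be factored.

lda The leading dimension of the array a. lda >= max(1,m) for column-major layout or
lda >= max(1,n) for row-major layout.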

Output Parameters

a On exit, the factors L and U from the factorization A = P*L*U; the
unit diagonal elements of L are not stored.

ipiv Array, size (min(m,n)).

The pivot indices; for 1 <= i <= min(m,n), row i of the matrix was
interchanged with row ipiv[i - 1].

Return Values
This function returns a value info.

= 0: successful exit
< 0: if info = -i, the i-th argument had an illegal value.
> 0: if info = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly
singular, and division by zero will occur if it is used to solve a system of equations.

?gbtrf
Computes the LU factorization of a general m-by-n
band matrix.

Syntax
lapack_int LAPACKE_sgbtrf (int matrix_layout , lapack_int m , lapack_int n , lapack_int
kl , lapack_int ku , float * ab , lapack_int ldab , lapack_int * ipiv );
lapack_int LAPACKE_dgbtrf (int matrix_layout , lapack_int m , lapack_int n , lapack_int
kl , lapack_int ku , double * ab , lapack_int ldab , lapack_int * ipiv );
lapack_int LAPACKE_cgbtrf (int matrix_layout , lapack_int m , lapack_int n , lapack_int
kl , lapack_int ku , lapack_complex_float * ab , lapack_int ldab , lapack_int * ipiv );
lapack_int LAPACKE_zgbtrf (int matrix_layout , lapack_int m , lapack_int n , lapack_int
kl , lapack_int ku , lapack_complex_double * ab , lapack_int ldab , lapack_int *
ipiv );


Include Files
• mkl.h

Description

The routine forms the LU factorization of a general m-by-n band matrix A with kl non-zero subdiagonals and
ku non-zero superdiagonals, that is,

A = P*L*U,
where P is a permutation matrix; L is lower triangular with unit diagonal elements and at most kl non-zero
elements in each column; U is an upper triangular band matrix with kl + ku superdiagonals. The routine uses
partial pivoting, with row interchanges (which creates the additional kl superdiagonals in U).

NOTE
This routine supports the Progress Routine feature. See Progress Function for details.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

m The number of rows in matrix A; m≥ 0.

n The number of columns in matrix A; n≥ 0.

kl The number of subdiagonals within the band of A; kl≥ 0.

ku The number of superdiagonals within the band of A; ku≥ 0.

ab Array, size at least max(1, ldab*n) for column-major layout or
max(1, ldab*m) for row-major layout.

The array ab contains the matrix A in band storage as described in Band Storage.

ldab The leading dimension of the array ab. (ldab≥ 2*kl + ku + 1)

Output Parameters

ab Overwritten with elements of L and U. U is stored as an upper
triangular band matrix with kl + ku superdiagonals, and L is stored
as a lower triangular band matrix with kl subdiagonals (diagonal unit
values are not stored). Since the output array has more nonzero
elements than the initial matrix A, there are limitations on the value of
ldab and the placement of elements of A in array ab.
See Application Notes below for further details.

ipiv Array, size at least max(1, min(m, n)). The pivot indices; for 1 ≤ i ≤
min(m, n), row i was interchanged with row ipiv[i-1].

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, u_ii is 0. The factorization has been completed, but U is exactly singular. Division by 0 will occur
if you use the factor U for solving a system of linear equations.

Application Notes
The computed L and U are the exact factors of a perturbed matrix A + E, where

|E| ≤c(kl+ku+1) εP|L||U|

c(k) is a modest linear function of k, and ε is the machine precision.
The total number of floating-point operations for real flavors varies between approximately 2n(ku+1)kl and
2n(kl+ku+1)kl. The number of operations for complex flavors is four times greater. All these estimates
assume that kl and ku are much less than min(m,n).

As described in Band Storage, storage of a band matrix can be considered in two steps: packing band matrix
elements into a matrix AB, then storing the elements in a linear array ab using a full storage scheme. The
effect of the ?gbtrf routine on matrix AB is illustrated by this example, for m = n = 6, kl = 2, ku = 1.

• matrix_layout = LAPACK_COL_MAJOR

[Figure: contents of array ab on entry and on exit]

• matrix_layout = LAPACK_ROW_MAJOR

[Figure: contents of array ab on entry and on exit]

Elements marked * are not used; elements marked + need not be set on entry, but are required by the
routine to store elements of U because of fill-in resulting from the row interchanges.
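For the column-major case, the index arithmetic can be summarized in a small C helper (hypothetical name pack_band_for_gbtrf, not an MKL routine). Assuming the reference LAPACK convention for ?gbtrf input, element a_ij (0-based i, j) goes to ab[(kl + ku + i - j) + j*ldab], so the first kl rows of each column of ab are left free for the fill-in marked +; this is why ldab must be at least 2*kl + ku + 1. Please check the mapping against Band Storage before relying on it.

```c
#include <string.h>

/* Hypothetical helper: pack a dense m-by-n column-major matrix (element
   (i,j) at dense[i + j*lda]) with kl subdiagonals and ku superdiagonals
   into a column-major band array laid out for ?gbtrf input.
   ab must hold ldab*n doubles, ldab >= 2*kl + ku + 1. Rows 0..kl-1 of each
   column of ab are reserved for fill-in; everything is zeroed first. */
void pack_band_for_gbtrf(int m, int n, int kl, int ku,
                         const double *dense, int lda,
                         double *ab, int ldab)
{
    memset(ab, 0, sizeof(double) * (size_t)ldab * (size_t)n);
    for (int j = 0; j < n; ++j) {
        int ilo = j - ku > 0 ? j - ku : 0;           /* first row in band */
        int ihi = j + kl < m - 1 ? j + kl : m - 1;   /* last row in band  */
        for (int i = ilo; i <= ihi; ++i)
            ab[(kl + ku + i - j) + j * ldab] = dense[i + j * lda];
    }
}
```

For a 3-by-3 tridiagonal matrix (kl = ku = 1, ldab = 4), row 0 of ab stays zero for fill-in, row 1 holds the superdiagonal, row 2 the diagonal, and row 3 the subdiagonal.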
After calling this routine with m = n, you can call the following routines:

?gbtrs to solve A*X = B, A^T*X = B, or A^H*X = B

?gbcon to estimate the condition number of A.


See Also
mkl_progress

Matrix Storage Schemes

?gttrf
Computes the LU factorization of a tridiagonal matrix.

Syntax
lapack_int LAPACKE_sgttrf (lapack_int n , float * dl , float * d , float * du , float *
du2 , lapack_int * ipiv );
lapack_int LAPACKE_dgttrf (lapack_int n , double * dl , double * d , double * du ,
double * du2 , lapack_int * ipiv );
lapack_int LAPACKE_cgttrf (lapack_int n , lapack_complex_float * dl ,
lapack_complex_float * d , lapack_complex_float * du , lapack_complex_float * du2 ,
lapack_int * ipiv );
lapack_int LAPACKE_zgttrf (lapack_int n , lapack_complex_double * dl ,
lapack_complex_double * d , lapack_complex_double * du , lapack_complex_double * du2 ,
lapack_int * ipiv );

Include Files
• mkl.h

Description

The routine computes the LU factorization of a real or complex tridiagonal matrix A using elimination with
partial pivoting and row interchanges.
The factorization has the form

A = L*U,
where L is a product of permutation and unit lower bidiagonal matrices and U is upper triangular with
nonzeroes in only the main diagonal and first two superdiagonals.

Input Parameters

n The order of the matrix A; n≥ 0.

dl, d, du Arrays containing elements of A.

The array dl of dimension (n - 1) contains the subdiagonal elements
of A.
The array d of dimension n contains the diagonal elements of A.
The array du of dimension (n - 1) contains the superdiagonal
elements of A.

Output Parameters

dl Overwritten by the (n-1) multipliers that define the matrix L from the
LU factorization of A.

d Overwritten by the n diagonal elements of the upper triangular matrix
U from the LU factorization of A.

du Overwritten by the (n-1) elements of the first superdiagonal of U.

du2 Array, dimension (n - 2). On exit, du2 contains (n - 2) elements of
the second superdiagonal of U.

ipiv Array, dimension (n). The pivot indices: for 1 ≤ i ≤ n, row i was
interchanged with row ipiv[i-1]. ipiv[i-1] is always i or i+1;
ipiv[i-1] = i indicates a row interchange was not required.
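The elimination that produces these outputs can be sketched in C. At each step, |d[i]| is compared with |dl[i]|, and rows i and i+1 are interchanged when the subdiagonal entry is larger; that interchange is where the second superdiagonal du2 of U comes from. This sketch follows the reference LAPACK ?gttrf algorithm as we understand it, uses a hypothetical name (tridiag_lu), and is meant only as an illustration.

```c
#include <math.h>

/* Hypothetical sketch of tridiagonal LU with partial pivoting (0-based
   arrays, 1-based ipiv). On entry dl[0..n-2], d[0..n-1], du[0..n-2] hold the
   sub-, main and superdiagonal. On exit dl holds the multipliers, d and du
   the first two diagonals of U, and du2[0..n-3] its second superdiagonal.
   Returns 0, or i (1-based) for the first U(i,i) that is exactly zero. */
int tridiag_lu(int n, double *dl, double *d, double *du, double *du2,
               int *ipiv)
{
    for (int i = 0; i < n; ++i)
        ipiv[i] = i + 1;
    for (int i = 0; i + 2 < n; ++i)
        du2[i] = 0.0;
    for (int i = 0; i + 1 < n; ++i) {
        if (fabs(d[i]) >= fabs(dl[i])) {
            /* No interchange: eliminate dl[i] using the pivot d[i]. */
            if (d[i] != 0.0) {
                double fact = dl[i] / d[i];
                dl[i] = fact;
                d[i + 1] -= fact * du[i];
            }
        } else {
            /* Interchange rows i and i+1; dl[i] becomes the pivot. */
            double fact = d[i] / dl[i];
            d[i] = dl[i];
            dl[i] = fact;
            double temp = du[i];
            du[i] = d[i + 1];
            d[i + 1] = temp - fact * d[i + 1];
            if (i + 2 < n) {          /* row i+1 reaches column i+2 */
                du2[i] = du[i + 1];
                du[i + 1] = -fact * du[i + 1];
            }
            ipiv[i] = i + 2;
        }
    }
    for (int i = 0; i < n; ++i)
        if (d[i] == 0.0)
            return i + 1;
    return 0;
}
```

For A = [1 2 0; 3 4 5; 0 6 7] (dl = {3, 6}, d = {1, 4, 7}, du = {2, 5}) both steps interchange rows, giving ipiv = {2, 3, 3}, d = {3, 6, -22/9}, du = {4, 7}, and du2 = {5}.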

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, u_ii is 0. The factorization has been completed, but U is exactly singular. Division by zero will
occur if you use the factor U for solving a system of linear equations.

Application Notes
After calling this routine, you can call the following routines:

?gttrs to solve A*X = B, A^T*X = B, or A^H*X = B

?gtcon to estimate the condition number of A.

?dttrfb
Computes the factorization of a diagonally dominant
tridiagonal matrix.

Syntax
void sdttrfb (const MKL_INT * n , float * dl , float * d , const float * du , MKL_INT *
info );
void ddttrfb (const MKL_INT * n , double * dl , double * d , const double * du ,
MKL_INT * info );
void cdttrfb (const MKL_INT * n , MKL_Complex8 * dl , MKL_Complex8 * d , const
MKL_Complex8 * du , MKL_INT * info );
void zdttrfb (const MKL_INT * n , MKL_Complex16 * dl , MKL_Complex16 * d , const
MKL_Complex16 * du , MKL_INT * info );

Include Files
• mkl.h

Description

The ?dttrfb routine computes the factorization of a real or complex tridiagonal matrix A with the BABE
(Burning At Both Ends) algorithm without pivoting. The factorization has the form

A = L1*U*L2
where


• L1 and L2 are unit lower bidiagonal with k and n - k - 1 subdiagonal elements, respectively, where k =
n/2, and
• U is an upper bidiagonal matrix with nonzeroes in only the main diagonal and first superdiagonal.

Input Parameters

n The order of the matrix A; n≥ 0.

dl, d, du Arrays containing elements of A.

The array dl of dimension (n - 1) contains the subdiagonal
elements of A.
The array d of dimension n contains the diagonal elements of A.
The array du of dimension (n - 1) contains the superdiagonal
elements of A.

Output Parameters

dl Overwritten by the (n - 1) multipliers that define the matrix L from
the LU factorization of A.

d Overwritten by the n diagonal element reciprocals of the upper
triangular matrix U from the factorization of A.

du Overwritten by the (n - 1) elements of the superdiagonal of U.

info If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, u_ii is 0. The factorization has been completed, but U is
exactly singular. Division by zero will occur if you use the factor U for
solving a system of linear equations.

Application Notes
A diagonally dominant tridiagonal system is defined by the conditions

$$|d_i| > |dl_{i-1}| + |du_i| \ \text{ for } 1 < i < n, \qquad |d_1| > |du_1|, \qquad |d_n| > |dl_{n-1}|.$$
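The dominance condition can be checked before calling ?dttrfb with a small C helper (hypothetical name is_diag_dominant_tridiag; not part of MKL). Arrays are 0-based here, so the conditions read |d[0]| > |du[0]|, |d[n-1]| > |dl[n-2]|, and |d[i]| > |dl[i-1]| + |du[i]| for 0 < i < n-1. Treating the 1-by-1 case as dominant when d[0] is nonzero is an assumption of this sketch.

```c
#include <math.h>

/* Hypothetical helper: returns 1 if the n-by-n tridiagonal matrix with
   subdiagonal dl[0..n-2], diagonal d[0..n-1] and superdiagonal du[0..n-2]
   satisfies the strict diagonal dominance conditions above, else 0.
   Assumes n >= 1; the n == 1 rule (d[0] != 0) is this sketch's choice. */
int is_diag_dominant_tridiag(int n, const double *dl, const double *d,
                             const double *du)
{
    if (n == 1)
        return d[0] != 0.0;
    if (!(fabs(d[0]) > fabs(du[0])) || !(fabs(d[n - 1]) > fabs(dl[n - 2])))
        return 0;
    for (int i = 1; i < n - 1; ++i)
        if (!(fabs(d[i]) > fabs(dl[i - 1]) + fabs(du[i])))
            return 0;
    return 1;
}
```

For example, the matrix with d = {4, 4, 4} and dl = du = {1, 1} passes the check, while d = {1, 4, 7}, dl = {3, 6}, du = {2, 5} fails it already in the first row.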

The underlying BABE algorithm is designed for diagonally dominant systems. For such systems, elimination
without pivoting is numerically stable, so the partial pivoting used for general tridiagonal systems (see ?gttrf)
is unnecessary. As a result, the routines for diagonally dominant systems are much faster than the routines for
general tridiagonal systems.

NOTE
• The current implementation of BABE has a potential accuracy issue on very small or very large data
close to the underflow or overflow threshold, respectively. Scale the matrix before applying the
solver in the case of such input data.
• Applying the ?dttrfb factorization to non-diagonally dominant systems may lead to a loss of accuracy
or to falsely detected singularity, because no pivoting is performed.

?potrf
Computes the Cholesky factorization of a symmetric
(Hermitian) positive-definite matrix.

Syntax
lapack_int LAPACKE_spotrf (int matrix_layout , char uplo , lapack_int n , float * a ,
lapack_int lda );
lapack_int LAPACKE_dpotrf (int matrix_layout , char uplo , lapack_int n , double * a ,
lapack_int lda );
lapack_int LAPACKE_cpotrf (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * a , lapack_int lda );
lapack_int LAPACKE_zpotrf (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * a , lapack_int lda );

Include Files
• mkl.h

Description

The routine forms the Cholesky factorization of a symmetric positive-definite or, for complex data, Hermitian
positive-definite matrix A:

A = U^T*U for real data, A = U^H*U for complex data, if uplo = 'U'

A = L*L^T for real data, A = L*L^H for complex data, if uplo = 'L'

where L is a lower triangular matrix and U is upper triangular.

NOTE
This routine supports the Progress Routine feature. See Progress Function for details.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored and how
A is factored:
If uplo = 'U', the array a stores the upper triangular part of the matrix A,
and the strictly lower triangular part of the matrix is not referenced.
If uplo = 'L', the array a stores the lower triangular part of the matrix A,
and the strictly upper triangular part of the matrix is not referenced.

n Specifies the order of the matrix A. The value of n must be at least zero.

a Array, size max(1, lda*n). The array a contains either the upper or the
lower triangular part of the matrix A (see uplo).

lda The leading dimension of a. Must be at least max(1, n).


Output Parameters

a The upper or lower triangular part of a is overwritten by the Cholesky factor
U or L, as specified by uplo.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, the leading minor of order i (and therefore the matrix A itself) is not positive-definite, and the
factorization could not be completed. This may indicate an error in forming the matrix A.

Application Notes
If uplo = 'U', the computed factor U is the exact factor of a perturbed matrix A + E, where

|E| ≤ c(n)ε|U^H||U|,

c(n) is a modest linear function of n, and ε is the machine precision.

A similar estimate holds for uplo = 'L'.

The total number of floating-point operations is approximately (1/3)n^3 for real flavors or (4/3)n^3 for
complex flavors.
After calling this routine, you can call the following routines:

?potrs to solve A*X = B

?pocon to estimate the condition number of A

?potri to compute the inverse of A.

See Also
mkl_progress

Matrix Storage Schemes

?potrf2
Computes Cholesky factorization using a recursive
algorithm.

Syntax
lapack_int LAPACKE_spotrf2 (int matrix_layout, char uplo, lapack_int n, float * a,
lapack_int lda);
lapack_int LAPACKE_dpotrf2 (int matrix_layout, char uplo, lapack_int n, double * a,
lapack_int lda);
lapack_int LAPACKE_cpotrf2 (int matrix_layout, char uplo, lapack_int n,
lapack_complex_float * a, lapack_int lda);
lapack_int LAPACKE_zpotrf2 (int matrix_layout, char uplo, lapack_int n,
lapack_complex_double * a, lapack_int lda);

Include Files
• mkl.h

Description
?potrf2 computes the Cholesky factorization of a real symmetric or complex Hermitian positive-definite
matrix A using the recursive algorithm.
The factorization has the form
for real flavors:
A = U^T * U, if uplo = 'U', or

A = L * L^T, if uplo = 'L',

for complex flavors:

A = U^H * U, if uplo = 'U', or

A = L * L^H, if uplo = 'L',

where U is an upper triangular matrix and L is lower triangular.

This is the recursive version of the algorithm. It divides the matrix into four submatrices:

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$$

where A11 is n1-by-n1 and A22 is n2-by-n2, with n1 = n/2 and n2 = n - n1.

The routine calls itself to factor A11, updates and scales A21 (if uplo = 'L') or A12 (if uplo = 'U'),
updates A22, and then calls itself to factor A22.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

uplo = 'U': Upper triangle of A is stored;

= 'L': Lower triangle of A is stored.

n The order of the matrix A; n ≥ 0.

a Array, size (lda*n).

On entry, the symmetric matrix A.

If uplo = 'U', the leading n-by-n upper triangular part of a contains the
upper triangular part of the matrix A, and the strictly lower triangular part
of a is not referenced.

If uplo = 'L', the leading n-by-n lower triangular part of a contains the
lower triangular part of the matrix A, and the strictly upper triangular part
of a is not referenced.

lda The leading dimension of the array a; lda ≥ max(1, n).


Output Parameters

a On exit, if info = 0, the factor U or L from the Cholesky factorization:

for real flavors, A = U^T*U or A = L*L^T;
for complex flavors, A = U^H*U or A = L*L^H.

Return Values
This function returns a value info.

= 0: successful exit
< 0: if info = -i, the i-th argument had an illegal value

> 0: if info = i, the leading minor of order i is not positive definite, and the factorization could not be
completed.

?pstrf
Computes the Cholesky factorization with complete
pivoting of a real symmetric (complex Hermitian)
positive semidefinite matrix.

Syntax
lapack_int LAPACKE_spstrf( int matrix_layout, char uplo, lapack_int n, float* a,
lapack_int lda, lapack_int* piv, lapack_int* rank, float tol );
lapack_int LAPACKE_dpstrf( int matrix_layout, char uplo, lapack_int n, double* a,
lapack_int lda, lapack_int* piv, lapack_int* rank, double tol );
lapack_int LAPACKE_cpstrf( int matrix_layout, char uplo, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_int* piv, lapack_int* rank, float tol );
lapack_int LAPACKE_zpstrf( int matrix_layout, char uplo, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_int* piv, lapack_int* rank, double
tol );

Include Files
• mkl.h

Description

The routine computes the Cholesky factorization with complete pivoting of a real symmetric (complex
Hermitian) positive semidefinite matrix. The form of the factorization is:

$P^T A P = U^T U$, if uplo = 'U' for real flavors,

$P^T A P = U^H U$, if uplo = 'U' for complex flavors,
$P^T A P = L L^T$, if uplo = 'L' for real flavors,
$P^T A P = L L^H$, if uplo = 'L' for complex flavors,

where P is a permutation matrix stored as vector piv, and U and L are upper and lower triangular matrices,
respectively.
This algorithm does not attempt to check that A is positive semidefinite. This version of the algorithm calls
Level 3 BLAS.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored:

If uplo = 'U', the array a stores the upper triangular part of the
matrix A, and the strictly lower triangular part of the matrix is not
referenced.
If uplo = 'L', the array a stores the lower triangular part of the
matrix A, and the strictly upper triangular part of the matrix is not
referenced.

n The order of matrix A; n≥ 0.

a Array a, size max(1,lda*n). The array a contains either the upper or
the lower triangular part of the matrix A (see uplo).

tol User-defined tolerance. If tol < 0, then $n \varepsilon \max_k A_{kk}$, where ε is the
machine precision, is used instead (see Error Analysis for the definition of
machine precision). The algorithm terminates at the (k-1)st step if
the pivot is ≤ tol.

lda The leading dimension of a; at least max(1, n).

Output Parameters

a On exit, if info = 0, a contains the factor U or L from the Cholesky factorization, as
described in Description.

piv Array, size at least max(1, n). The array piv is such that the nonzero
entries of P are P(piv[k-1], k) = 1 (1 ≤ k ≤ n).

rank The rank of A, given by the number of steps the algorithm completed.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -k, the k-th argument had an illegal value.

If info > 0, the matrix A is either rank deficient with a computed rank as returned in rank, or is not
positive semidefinite.

See Also
Matrix Storage Schemes

?pftrf
Computes the Cholesky factorization of a symmetric
(Hermitian) positive-definite matrix using the
Rectangular Full Packed (RFP) format.


Syntax
lapack_int LAPACKE_spftrf (int matrix_layout , char transr , char uplo , lapack_int n ,
float * a );
lapack_int LAPACKE_dpftrf (int matrix_layout , char transr , char uplo , lapack_int n ,
double * a );
lapack_int LAPACKE_cpftrf (int matrix_layout , char transr , char uplo , lapack_int n ,
lapack_complex_float * a );
lapack_int LAPACKE_zpftrf (int matrix_layout , char transr , char uplo , lapack_int n ,
lapack_complex_double * a );

Include Files
• mkl.h

Description

The routine forms the Cholesky factorization of a symmetric positive-definite or, for complex data, a
Hermitian positive-definite matrix A:

$A = U^T U$ for real data, $A = U^H U$ for complex data, if uplo = 'U';

$A = L L^T$ for real data, $A = L L^H$ for complex data, if uplo = 'L',

where L is a lower triangular matrix and U is upper triangular.

The matrix A is in the Rectangular Full Packed (RFP) format. For the description of the RFP format, see Matrix
Storage Schemes.
This is the block version of the algorithm, calling Level 3 BLAS.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

transr Must be 'N', 'T' (for real data) or 'C' (for complex data).

If transr = 'N', the normal (untransposed) form of RFP A is stored.

If transr = 'T', the transposed form of RFP A is stored.

If transr = 'C', the conjugate-transposed form of RFP A is stored.

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored:

If uplo = 'U', the array a stores the upper triangular part of the
matrix A.
If uplo = 'L', the array a stores the lower triangular part of the
matrix A.

n The order of the matrix A; n≥ 0.

a Array, size (n*(n+1)/2). The array a contains the matrix A in the RFP
format.

Output Parameters

a On exit, a is overwritten by the Cholesky factor U or L, as specified by uplo
and transr.

Return Values
This function returns a value info.

If info=0, the execution is successful.

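If info = -i, parameter i had an illegal value.

If info = i, the leading minor of order i is not positive-definite, and the factorization could not be
completed.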

See Also
Matrix Storage Schemes

?pptrf
Computes the Cholesky factorization of a symmetric
(Hermitian) positive-definite matrix using packed
storage.

Syntax
lapack_int LAPACKE_spptrf (int matrix_layout , char uplo , lapack_int n , float * ap );
lapack_int LAPACKE_dpptrf (int matrix_layout , char uplo , lapack_int n , double *
ap );
lapack_int LAPACKE_cpptrf (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * ap );
lapack_int LAPACKE_zpptrf (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * ap );

Include Files
• mkl.h

Description

The routine forms the Cholesky factorization of a symmetric positive-definite or, for complex data, Hermitian
positive-definite packed matrix A:

$A = U^T U$ for real data, $A = U^H U$ for complex data, if uplo = 'U';

$A = L L^T$ for real data, $A = L L^H$ for complex data, if uplo = 'L',

where L is a lower triangular matrix and U is upper triangular.

NOTE
This routine supports the Progress Routine feature. See Progress Function for details.


Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is packed in
the array ap, and how A is factored:
If uplo = 'U', the array ap stores the upper triangular part of the
matrix A, and A is factored as $U^H U$ ($U^T U$ for real flavors).
If uplo = 'L', the array ap stores the lower triangular part of the
matrix A, and A is factored as $L L^H$ ($L L^T$ for real flavors).

n The order of matrix A; n≥ 0.

ap Array, size at least max(1, n(n+1)/2). The array ap contains either the
upper or the lower triangular part of the matrix A (as specified by
uplo) in packed storage (see Matrix Storage Schemes).

Output Parameters

ap Overwritten by the Cholesky factor U or L, as specified by uplo.

Return Values
This function returns a value info.

If info=0, the execution is successful.

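If info = -i, parameter i had an illegal value.

If info = i, the leading minor of order i is not positive-definite, and the factorization could not be
completed.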

Application Notes
If uplo = 'U', the computed factor U is the exact factor of a perturbed matrix A + E, where

$|E| \le c(n)\,\varepsilon\,|U^H|\,|U|$,

c(n) is a modest linear function of n, and ε is the machine precision.

A similar estimate holds for uplo = 'L'.

The total number of floating-point operations is approximately $(1/3)n^3$ for real flavors and $(4/3)n^3$ for
complex flavors.
After calling this routine, you can call the following routines:

?pptrs to solve A*X = B

?ppcon to estimate the condition number of A

?pptri to compute the inverse of A.

See Also
mkl_progress


Matrix Storage Schemes

?pbtrf
Computes the Cholesky factorization of a symmetric
(Hermitian) positive-definite band matrix.

Syntax
lapack_int LAPACKE_spbtrf (int matrix_layout , char uplo , lapack_int n , lapack_int
kd , float * ab , lapack_int ldab );
lapack_int LAPACKE_dpbtrf (int matrix_layout , char uplo , lapack_int n , lapack_int
kd , double * ab , lapack_int ldab );
lapack_int LAPACKE_cpbtrf (int matrix_layout , char uplo , lapack_int n , lapack_int
kd , lapack_complex_float * ab , lapack_int ldab );
lapack_int LAPACKE_zpbtrf (int matrix_layout , char uplo , lapack_int n , lapack_int
kd , lapack_complex_double * ab , lapack_int ldab );

Include Files
• mkl.h

Description

The routine forms the Cholesky factorization of a symmetric positive-definite or, for complex data, Hermitian
positive-definite band matrix A:

$A = U^T U$ for real data, $A = U^H U$ for complex data, if uplo = 'U';

$A = L L^T$ for real data, $A = L L^H$ for complex data, if uplo = 'L',

where L is a lower triangular matrix and U is upper triangular.

NOTE
This routine supports the Progress Routine feature. See Progress Function for details.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored in
the array ab, and how A is factored:
If uplo = 'U', the upper triangle of A is stored, and A is factored as $U^H U$ ($U^T U$ for real flavors).

If uplo = 'L', the lower triangle of A is stored, and A is factored as $L L^H$ ($L L^T$ for real flavors).

n The order of matrix A; n≥ 0.

kd The number of superdiagonals or subdiagonals in the matrix A; kd≥ 0.


ab Array, size max(1, ldab*n). The array ab contains either the upper or
the lower triangular part of the matrix A (as specified by uplo) in band
storage (see Matrix Storage Schemes).

ldab The leading dimension of the array ab. (ldab≥kd + 1)

Output Parameters

ab On exit, the upper or lower triangular part of A (in band storage) is
overwritten by the Cholesky factor U or L, as specified by uplo.

Return Values
This function returns a value info.

If info=0, the execution is successful.

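If info = -i, parameter i had an illegal value.

If info = i, the leading minor of order i is not positive-definite, and the factorization could not be
completed.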

Application Notes
If uplo = 'U', the computed factor U is the exact factor of a perturbed matrix A + E, where

$|E| \le c(n)\,\varepsilon\,|U^H|\,|U|$,

c(n) is a modest linear function of n, and ε is the machine precision.

A similar estimate holds for uplo = 'L'.

The total number of floating-point operations for real flavors is approximately $n(kd+1)^2$. The number of
operations for complex flavors is 4 times greater. All these estimates assume that kd is much less than n.
After calling this routine, you can call the following routines:

?pbtrs to solve A*X = B

?pbcon to estimate the condition number of A.

See Also
mkl_progress

Matrix Storage Schemes

?pttrf
Computes the factorization of a symmetric (Hermitian)
positive-definite tridiagonal matrix.

Syntax
lapack_int LAPACKE_spttrf( lapack_int n, float* d, float* e );
lapack_int LAPACKE_dpttrf( lapack_int n, double* d, double* e );
lapack_int LAPACKE_cpttrf( lapack_int n, float* d, lapack_complex_float* e );
lapack_int LAPACKE_zpttrf( lapack_int n, double* d, lapack_complex_double* e );

Include Files
• mkl.h

Description

The routine forms the factorization of a symmetric positive-definite or, for complex data, Hermitian positive-
definite tridiagonal matrix A:
$A = L D L^T$ for real flavors, or
$A = L D L^H$ for complex flavors,
where D is diagonal and L is unit lower bidiagonal. The factorization may also be regarded as having the form
$A = U^T D U$ for real flavors, or $A = U^H D U$ for complex flavors, where U is unit upper bidiagonal.
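Because A is tridiagonal, the $L D L^T$ factorization reduces to a single short recurrence over the d and e arrays. The following is a minimal sketch for real data that follows the d/e conventions of this routine; the name pttrf_sketch is hypothetical and it is not the oneMKL implementation.

```c
/* Hypothetical sketch (not a oneMKL routine): A = L*D*L^T for a real
   symmetric positive-definite tridiagonal matrix. On entry d[0..n-1] is
   the diagonal and e[0..n-2] the subdiagonal of A; on exit d holds D and
   e holds the subdiagonal of the unit lower bidiagonal factor L.
   Returns 0, or i > 0 if the leading minor of order i is not positive
   definite (i = n means the loop finished but d[n-1] <= 0). */
static int pttrf_sketch(int n, double *d, double *e)
{
    for (int i = 0; i < n - 1; ++i) {
        if (!(d[i] > 0.0))
            return i + 1;
        double ei = e[i];
        e[i] = ei / d[i];          /* multiplier L(i+1, i) */
        d[i + 1] -= e[i] * ei;     /* Schur complement of the 1-by-1 pivot */
    }
    if (n > 0 && !(d[n - 1] > 0.0))
        return n;
    return 0;
}
```

For d = {4, 5, 6} and e = {2, 3}, the sketch returns d = {4, 4, 3.75} and e = {0.5, 0.75}.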

Input Parameters

n The order of the matrix A; n≥ 0.

d Array, dimension (n). Contains the diagonal elements of A.

e Array, dimension (n - 1). Contains the subdiagonal elements of A.

Output Parameters

d Overwritten by the n diagonal elements of the diagonal matrix D from
the $L D L^T$ (for real flavors) or $L D L^H$ (for complex flavors)
factorization of A.

e Overwritten by the (n - 1) subdiagonal elements of the unit
bidiagonal factor L or U from the factorization of A.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, the leading minor of order i (and therefore the matrix A itself) is not positive-definite; if i < n,
the factorization could not be completed, while if i = n, the factorization was completed, but d[n - 1] ≤ 0.

?sytrf
Computes the Bunch-Kaufman factorization of a
symmetric matrix.

Syntax
lapack_int LAPACKE_ssytrf (int matrix_layout , char uplo , lapack_int n , float * a ,
lapack_int lda , lapack_int * ipiv );
lapack_int LAPACKE_dsytrf (int matrix_layout , char uplo , lapack_int n , double * a ,
lapack_int lda , lapack_int * ipiv );
lapack_int LAPACKE_csytrf (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * a , lapack_int lda , lapack_int * ipiv );


lapack_int LAPACKE_zsytrf (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * a , lapack_int lda , lapack_int * ipiv );

Include Files
• mkl.h

Description

The routine computes the factorization of a real/complex symmetric matrix A using the Bunch-Kaufman
diagonal pivoting method. The form of the factorization is:

if uplo='U', $A = U D U^T$
if uplo='L', $A = L D L^T$

where A is the input matrix, U and L are products of permutation and triangular matrices with unit diagonal
(upper triangular for U and lower triangular for L), and D is a symmetric block-diagonal matrix with 1-by-1
and 2-by-2 diagonal blocks. U and L have 2-by-2 unit diagonal blocks corresponding to the 2-by-2 blocks of
D.

NOTE This routine supports the Progress Routine feature. See Progress Routine for details.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored and
how A is factored:
If uplo = 'U', the array a stores the upper triangular part of the
matrix A, and A is factored as $U D U^T$.

If uplo = 'L', the array a stores the lower triangular part of the
matrix A, and A is factored as $L D L^T$.

n The order of matrix A; n≥ 0.

a Array, size max(1, lda*n). The array a contains either the upper or
the lower triangular part of the matrix A (see uplo).

lda The leading dimension of a; at least max(1, n).

Output Parameters

a The upper or lower triangular part of a is overwritten by details of the
block-diagonal matrix D and the multipliers used to obtain the factor U
(or L).

ipiv Array, size at least max(1, n). Contains details of the interchanges
and the block structure of D. If ipiv[i-1] = k > 0, then D(i,i) is a 1-by-1
block, and the i-th row and column of A was interchanged with
the k-th row and column.
If uplo = 'U' and ipiv[i] = ipiv[i-1] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and i-th row and column of A was
interchanged with the m-th row and column.
If uplo = 'L' and ipiv[i] = ipiv[i-1] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and (i+1)-th row and column of A
was interchanged with the m-th row and column.
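The ipiv conventions above are easiest to see in code. The hypothetical helper below walks ipiv for uplo = 'L' (increasing k) and lists the diagonal blocks of D together with the interchange recorded for each. It assumes the LP64 interface, where lapack_int is a 32-bit int, assumes ipiv is valid output of the routine, and relies only on the rules stated in this section.

```c
/* Hypothetical helper (not part of oneMKL): decode ipiv returned by
   ?sytrf/?hetrf for uplo = 'L'. Block b starts at 1-based row start[b] and
   has order size[b] (1 or 2); swap[b] is the 1-based row interchanged with
   row start[b] (1-by-1 block) or with row start[b]+1 (2-by-2 block).
   Returns the number of blocks. ipiv is assumed to be valid output. */
static int decode_ipiv_lower(int n, const int *ipiv,
                             int *start, int *size, int *swap)
{
    int nb = 0;
    int k = 1;                                  /* 1-based, increasing */
    while (k <= n) {
        if (ipiv[k - 1] > 0) {                  /* 1-by-1 block D(k,k) */
            start[nb] = k; size[nb] = 1; swap[nb] = ipiv[k - 1];
            k += 1;
        } else {                                /* 2-by-2 block D(k:k+1,k:k+1) */
            start[nb] = k; size[nb] = 2; swap[nb] = -ipiv[k - 1];
            k += 2;
        }
        ++nb;
    }
    return nb;
}
```

For n = 4 and ipiv = {1, -3, -3, 4}, it reports three blocks: a 1-by-1 block at row 1, a 2-by-2 block in rows 2 and 3, and a 1-by-1 block at row 4.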

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, D(i,i) is 0. The factorization has been completed, but D is exactly singular. Division by 0 will occur
if you use D for solving a system of linear equations.

Application Notes
The 2-by-2 unit diagonal blocks and the unit diagonal elements of U and L are not stored. The remaining
elements of U and L are stored in the corresponding columns of the array a, but additional row interchanges
are required to recover U or L explicitly (which is seldom necessary).
If ipiv[i-1] = i for all i =1...n, then all off-diagonal elements of U (L) are stored explicitly in the
corresponding elements of the array a.
If uplo = 'U', the computed factors U and D are the exact factors of a perturbed matrix A + E, where

$|E| \le c(n)\,\varepsilon\,P|U||D||U^T|P^T$,
c(n) is a modest linear function of n, and ε is the machine precision. A similar estimate holds for the
computed L and D when uplo = 'L'.

The total number of floating-point operations is approximately $(1/3)n^3$ for real flavors or $(4/3)n^3$ for
complex flavors.
After calling this routine, you can call the following routines:

?sytrs to solve A*X = B

?sycon to estimate the condition number of A

?sytri to compute the inverse of A.

If uplo = 'U', then $A = U D U^T$, where

$U = P(n) U(n) \cdots P(k) U(k) \cdots$,

that is, U is a product of terms P(k)*U(k), where

• k decreases from n to 1 in steps of 1 and 2.

• D is a block diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks D(k).
• P(k) is a permutation matrix as defined by ipiv[k-1].
• U(k) is a unit upper triangular matrix, such that if the diagonal block D(k) is of order s (s = 1 or 2), then

$$U(k) = \begin{pmatrix} I & v & 0 \\ 0 & I & 0 \\ 0 & 0 & I \end{pmatrix},$$

where the block rows and columns have orders k-s, s, and n-k, respectively.

If s = 1, D(k) overwrites A(k,k), and v overwrites A(1:k-1,k).

If s = 2, the upper triangle of D(k) overwrites A(k-1,k-1), A(k-1,k) and A(k,k), and v overwrites
A(1:k-2,k-1:k).

If uplo = 'L', then $A = L D L^T$, where

$L = P(1) L(1) \cdots P(k) L(k) \cdots$,

that is, L is a product of terms P(k)*L(k), where

• k increases from 1 to n in steps of 1 and 2.

• D is a block diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks D(k).
• P(k) is a permutation matrix as defined by ipiv[k-1].
• L(k) is a unit lower triangular matrix, such that if the diagonal block D(k) is of order s (s = 1 or 2), then

$$L(k) = \begin{pmatrix} I & 0 & 0 \\ 0 & I & 0 \\ 0 & v & I \end{pmatrix},$$

where the block rows and columns have orders k-1, s, and n-k-s+1, respectively.

If s = 1, D(k) overwrites A(k,k), and v overwrites A(k+1:n,k).

If s = 2, the lower triangle of D(k) overwrites A(k,k), A(k+1,k), and A(k+1,k+1), and v overwrites
A(k+2:n,k:k+1).

See Also
mkl_progress

Matrix Storage Schemes

?sytrf_aa
Computes the factorization of a symmetric matrix
using Aasen's algorithm.

Syntax
lapack_int LAPACKE_ssytrf_aa (int matrix_layout, char uplo, lapack_int n, float * A,
lapack_int lda, lapack_int * ipiv);

lapack_int LAPACKE_dsytrf_aa (int matrix_layout, char uplo, lapack_int n, double * A,
lapack_int lda, lapack_int * ipiv);
lapack_int LAPACKE_csytrf_aa (int matrix_layout, char uplo, lapack_int n,
lapack_complex_float * A, lapack_int lda, lapack_int * ipiv);
lapack_int LAPACKE_zsytrf_aa (int matrix_layout, char uplo, lapack_int n,
lapack_complex_double * A, lapack_int lda, lapack_int * ipiv);

Description
?sytrf_aa computes the factorization of a symmetric matrix A using Aasen's algorithm. The form of the
factorization is $A = U T U^T$ or $A = L T L^T$, where U (or L) is a product of permutation and unit upper (lower)
triangular matrices, and T is a symmetric tridiagonal matrix (complex symmetric for the complex flavors).
This is the blocked version of the algorithm, calling Level 3 BLAS.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

uplo • = 'U': The upper triangle of A is stored.

• = 'L': The lower triangle of A is stored.

n The order of the matrix A. n ≥ 0.

A Array of size max(1, lda*n). The array A contains either the upper or the
lower triangular part of the matrix A (see uplo).

lda The leading dimension of the array A; lda ≥ max(1, n).

Output Parameters

A On exit, the tridiagonal matrix T is stored in the diagonal and in the
subdiagonal (or superdiagonal) of A, and L (or U) is stored below the subdiagonal
(or above the superdiagonal), when uplo is 'L' (or 'U').

ipiv Array of size n. On exit, it contains the details of the interchanges; that is,
the row and column k of A were interchanged with the row and column
ipiv[k-1].
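For uplo = 'L', the description above places T on the diagonal and the first subdiagonal of A. The hypothetical helper below, which assumes real data in column-major order, copies T out into two vectors, for example to inspect it or to hand it to a tridiagonal solver. Since T is symmetric, T(k,k+1) equals T(k+1,k) and is not copied separately.

```c
#include <stddef.h>

/* Hypothetical helper (not part of oneMKL): copy the tridiagonal matrix T
   out of the array A returned by ?sytrf_aa with uplo = 'L', assuming
   column-major storage. diag receives T(k,k) (n values) and sub receives
   T(k+1,k) (n-1 values), read from the diagonal and first subdiagonal. */
static void aa_extract_t_lower(int n, const double *a, int lda,
                               double *diag, double *sub)
{
    for (int k = 0; k < n; ++k) {
        diag[k] = a[(size_t)k + (size_t)k * lda];             /* T(k,k)   */
        if (k + 1 < n)
            sub[k] = a[(size_t)(k + 1) + (size_t)k * lda];    /* T(k+1,k) */
    }
}
```

Entries below the first subdiagonal (the stored part of L) are ignored by this helper.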

Return Values
This function returns a value info.

= 0: Successful exit.
< 0: If info = -i, the ith argument had an illegal value.

> 0: If info = i, D(i,i) is exactly zero. The factorization has been completed, but the block diagonal matrix D
is exactly singular, and division by zero will occur if it is used to solve a system of equations.

?sytrf_rook
Computes the bounded Bunch-Kaufman factorization
of a symmetric matrix.

Syntax
lapack_int LAPACKE_ssytrf_rook (int matrix_layout, char uplo, lapack_int n, float * a,
lapack_int lda, lapack_int * ipiv);


lapack_int LAPACKE_dsytrf_rook (int matrix_layout, char uplo, lapack_int n, double * a,
lapack_int lda, lapack_int * ipiv);
lapack_int LAPACKE_csytrf_rook (int matrix_layout, char uplo, lapack_int n,
lapack_complex_float * a, lapack_int lda, lapack_int * ipiv);
lapack_int LAPACKE_zsytrf_rook (int matrix_layout, char uplo, lapack_int n,
lapack_complex_double * a, lapack_int lda, lapack_int * ipiv);

Include Files
• mkl.h

Description

The routine computes the factorization of a real/complex symmetric matrix A using the bounded Bunch-Kaufman
("rook") diagonal pivoting method. The form of the factorization is:

if uplo='U', $A = U D U^T$
if uplo='L', $A = L D L^T$,

where U (or L) is a product of permutation and unit upper (lower) triangular matrices, and D is a symmetric
block-diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored and
how A is factored:
If uplo = 'U', the array a stores the upper triangular part of the
matrix A, and A is factored as $U D U^T$.

If uplo = 'L', the array a stores the lower triangular part of the
matrix A, and A is factored as $L D L^T$.

n The order of matrix A; n≥ 0.

a Array, size lda*n. The array a contains either the upper or the lower
triangular part of the matrix A (see uplo).

lda The leading dimension of a; at least max(1, n).

Output Parameters

a The upper or lower triangular part of a is overwritten by details of the
block-diagonal matrix D and the multipliers used to obtain the factor U
(or L).

ipiv Array, size at least max(1, n). Contains details of the interchanges and the block structure of D.
Below, ipiv(k) denotes the element ipiv[k-1] (1 ≤ k ≤ n).
If ipiv(k) > 0, then rows and columns k and ipiv(k) were
interchanged and D(k,k) is a 1-by-1 diagonal block.
If uplo = 'U' and ipiv(k) < 0 and ipiv(k - 1) < 0, then rows
and columns k and -ipiv(k) were interchanged, rows and columns
k - 1 and -ipiv(k - 1) were interchanged, and D(k-1:k, k-1:k) is a 2-by-2
diagonal block.
If uplo = 'L' and ipiv(k) < 0 and ipiv(k + 1) < 0, then rows
and columns k and -ipiv(k) were interchanged, rows and columns
k + 1 and -ipiv(k + 1) were interchanged, and D(k:k+1, k:k+1) is a 2-by-2
diagonal block.
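In the rook variant, each 2-by-2 block records two interchanges rather than one. The hypothetical helper below lists them for uplo = 'U' in factorization order (k decreasing), giving the pair for k before the pair for k - 1 as in the description above. It assumes the LP64 interface (lapack_int is int) and a valid ipiv produced by the routine.

```c
/* Hypothetical helper (not part of oneMKL): list the row/column
   interchanges recorded in ipiv by ?sytrf_rook for uplo = 'U', in
   factorization order. Pair t means rows/columns row[t] and with[t] were
   interchanged (row[t] == with[t] means no interchange). Values are
   1-based. Returns the number of pairs written (at most n). */
static int rook_swaps_upper(int n, const int *ipiv, int *row, int *with)
{
    int t = 0;
    int k = n;                                  /* 1-based, decreasing */
    while (k >= 1) {
        if (ipiv[k - 1] > 0) {                  /* 1-by-1 block D(k,k) */
            row[t] = k; with[t] = ipiv[k - 1]; ++t;
            k -= 1;
        } else {                                /* 2-by-2 block D(k-1:k,k-1:k) */
            row[t] = k;     with[t] = -ipiv[k - 1]; ++t;
            row[t] = k - 1; with[t] = -ipiv[k - 2]; ++t;
            k -= 2;
        }
    }
    return t;
}
```

For n = 3 and ipiv = {-1, -1, 3}, it produces the pairs (3, 3), (2, 1), (1, 1): a 1-by-1 block at row 3 with no interchange, then a 2-by-2 block in rows 1 and 2.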

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, D(i,i) is 0. The factorization has been completed, but D is exactly singular. Division by 0 will occur
if you use D for solving a system of linear equations.

Application Notes
The total number of floating-point operations is approximately $(1/3)n^3$ for real flavors or $(4/3)n^3$ for
complex flavors.
After calling this routine, you can call the following routines:

?sytrs_rook to solve A*X = B

?sycon_rook (Fortran only) to estimate the condition number of A

?sytri_rook (Fortran only) to compute the inverse of A.

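If uplo = 'U', then $A = U D U^T$, where

$U = P(n) U(n) \cdots P(k) U(k) \cdots$,

that is, U is a product of terms P(k)*U(k), where

• k decreases from n to 1 in steps of 1 and 2.

• D is a block diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks D(k).
• P(k) is a permutation matrix as defined by ipiv[k-1].
• U(k) is a unit upper triangular matrix, such that if the diagonal block D(k) is of order s (s = 1 or 2), then

$$U(k) = \begin{pmatrix} I & v & 0 \\ 0 & I & 0 \\ 0 & 0 & I \end{pmatrix},$$

where the block rows and columns have orders k-s, s, and n-k, respectively.

If s = 1, D(k) overwrites A(k,k), and v overwrites A(1:k-1,k).

If s = 2, the upper triangle of D(k) overwrites A(k-1,k-1), A(k-1,k) and A(k,k), and v overwrites
A(1:k-2,k-1:k).

If uplo = 'L', then $A = L D L^T$, where

$L = P(1) L(1) \cdots P(k) L(k) \cdots$,

that is, L is a product of terms P(k)*L(k), where

• k increases from 1 to n in steps of 1 and 2.

• D is a block diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks D(k).
• P(k) is a permutation matrix as defined by ipiv[k-1].
• L(k) is a unit lower triangular matrix, such that if the diagonal block D(k) is of order s (s = 1 or 2), then

$$L(k) = \begin{pmatrix} I & 0 & 0 \\ 0 & I & 0 \\ 0 & v & I \end{pmatrix},$$

where the block rows and columns have orders k-1, s, and n-k-s+1, respectively.

If s = 1, D(k) overwrites A(k,k), and v overwrites A(k+1:n,k).

If s = 2, the lower triangle of D(k) overwrites A(k,k), A(k+1,k), and A(k+1,k+1), and v overwrites
A(k+2:n,k:k+1).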

See Also
Matrix Storage Schemes

?sytrf_rk
Computes the factorization of a real or complex
symmetric indefinite matrix using the bounded Bunch-
Kaufman (rook) diagonal pivoting method (BLAS3
blocked algorithm).

Syntax
lapack_int LAPACKE_ssytrf_rk (int matrix_layout, char uplo, lapack_int n, float * A,
lapack_int lda, float * e, lapack_int * ipiv);
lapack_int LAPACKE_dsytrf_rk (int matrix_layout, char uplo, lapack_int n, double * A,
lapack_int lda, double * e, lapack_int * ipiv);
lapack_int LAPACKE_csytrf_rk (int matrix_layout, char uplo, lapack_int n,
lapack_complex_float * A, lapack_int lda, lapack_complex_float * e, lapack_int * ipiv);
lapack_int LAPACKE_zsytrf_rk (int matrix_layout, char uplo, lapack_int n,
lapack_complex_double * A, lapack_int lda, lapack_complex_double * e, lapack_int *
ipiv);

Description
?sytrf_rk computes the factorization of a real or complex symmetric matrix A using the bounded Bunch-Kaufman
(rook) diagonal pivoting method: $A = P U D U^T P^T$ or $A = P L D L^T P^T$, where U (or L) is a unit upper (or
lower) triangular matrix, $U^T$ (or $L^T$) is the transpose of U (or L), P is a permutation matrix, $P^T$
is the transpose of P, and D is symmetric and block diagonal with 1-by-1 and 2-by-2 diagonal blocks.
This is the blocked version of the algorithm, calling Level 3 BLAS.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

uplo Specifies whether the upper or lower triangular part of the symmetric
matrix A is stored:

• = 'U': Upper triangular

• = 'L': Lower triangular

n The order of the matrix A. n ≥ 0.

A Array of size max(1, lda*n). On entry, the symmetric matrix A. If uplo =
'U', the leading n-by-n upper triangular part of A contains the upper
triangular part of the matrix A, and the strictly lower triangular part of A is
not referenced. If uplo = 'L', the leading n-by-n lower triangular part of A
contains the lower triangular part of the matrix A, and the strictly upper
triangular part of A is not referenced.

lda The leading dimension of the array A; lda ≥ max(1, n).

Output Parameters

A On exit, contains:

• Only diagonal elements of the symmetric block diagonal matrix D on the
diagonal of A; that is, D(k,k) = A(k,k). (Superdiagonal (or subdiagonal)
elements of D are stored on exit in array e.)
• If uplo = 'U', factor U in the superdiagonal part of A. If uplo = 'L',
factor L in the subdiagonal part of A.

e Array of size n. On exit, contains the superdiagonal (or subdiagonal)
elements of the symmetric block diagonal matrix D with 1-by-1 or 2-by-2
diagonal blocks. If uplo = 'U', e[i-1] = D(i-1,i) for i = 2, ..., n, and e[0] is set to 0.
If uplo = 'L', e[i-1] = D(i+1,i) for i = 1, ..., n-1, and e[n-1] is set to 0.

NOTE For 1-by-1 diagonal block D(k), where 1 ≤ k ≤ n, the
element e[k-1] is set to 0 in both the uplo = 'U' and uplo =
'L' cases.

ipiv Array of size n. ipiv describes the permutation matrix P in the factorization
of matrix A as follows (here ipiv(k) denotes the element ipiv[k-1], 1 ≤ k ≤ n). The absolute value of
ipiv(k) represents the index of the row and column that were interchanged with the kth row and column.
The value of uplo describes the order in which the interchanges were
applied. Also, the sign of ipiv represents the block structure of the
symmetric block diagonal matrix D with 1-by-1 or 2-by-2 diagonal blocks,
which correspond to 1 or 2 interchanges at each factorization step. If uplo
= 'U' (in factorization order, k decreases from n to 1):

1. A single positive entry ipiv(k) > 0 means that D(k,k) is a 1-by-1
diagonal block. If ipiv(k) != k, rows and columns k and ipiv(k) were
interchanged in the matrix A(1:n,1:n). If ipiv(k) = k, no interchange
occurred.

2. A pair of consecutive negative entries ipiv(k) < 0 and ipiv(k-1) < 0
means that D(k-1:k,k-1:k) is a 2-by-2 diagonal block. (Note that
negative entries in ipiv appear only in pairs.)

• If -ipiv(k) != k, rows and columns k and -ipiv(k) were
interchanged in the matrix A(1:n,1:n). If -ipiv(k) = k, no
interchange occurred.
• If -ipiv(k-1) != k-1, rows and columns k-1 and -ipiv(k-1) were
interchanged in the matrix A(1:n,1:n). If -ipiv(k-1) = k-1, no
interchange occurred.
3. In both cases 1 and 2, always |ipiv(k)| ≤ k.

NOTE Any entry ipiv(k) is always nonzero on output.

If uplo = 'L' (in factorization order, k increases from 1 to n):

1. A single positive entry ipiv(k) > 0 means that D(k,k) is a 1-by-1
diagonal block. If ipiv(k) != k, rows and columns k and ipiv(k) were
interchanged in the matrix A(1:n,1:n). If ipiv(k) = k, no interchange
occurred.
2. A pair of consecutive negative entries ipiv(k) < 0 and ipiv(k+1) < 0
means that D(k:k+1,k:k+1) is a 2-by-2 diagonal block. (Note that
negative entries in ipiv appear only in pairs.)

• If -ipiv(k) != k, rows and columns k and -ipiv(k) were
interchanged in the matrix A(1:n,1:n). If -ipiv(k) = k, no
interchange occurred.
• If -ipiv(k+1) != k+1, rows and columns k+1 and -ipiv(k+1) were
interchanged in the matrix A(1:n,1:n). If -ipiv(k+1) = k+1, no
interchange occurred.
3. In both cases 1 and 2, always |ipiv(k)| ≥ k.

NOTE Any entry ipiv(k) is always nonzero on output.
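Because the off-diagonal entries of D are stored separately in e, block operations with D are straightforward. The sketch below, with the hypothetical name rk_solve_d_lower, solves D*x = b for uplo = 'L', which is only the middle step of a full solve with the factored matrix. It assumes real data, a vector d holding D(k,k) copied from the diagonal of A, and an LP64 int ipiv; it uses Cramer's rule on each 2-by-2 block for brevity, where a production solver would use a more carefully scaled formulation.

```c
/* Hypothetical helper (not part of oneMKL): solve D*x = b in place, where
   D is the block-diagonal factor from ?sytrf_rk with uplo = 'L'.
   Assumes real data, d[k-1] = D(k,k), e[k-1] = D(k+1,k) as described
   above, and a valid 1-based ipiv in a plain int array. Returns 0, or the
   1-based row k at which an exactly singular block starts. */
static int rk_solve_d_lower(int n, const double *d, const double *e,
                            const int *ipiv, double *b)
{
    int k = 1;                                   /* 1-based, increasing */
    while (k <= n) {
        if (ipiv[k - 1] > 0) {                   /* 1-by-1 block D(k,k) */
            if (d[k - 1] == 0.0)
                return k;
            b[k - 1] /= d[k - 1];
            k += 1;
        } else {                                 /* 2-by-2 block, rows k, k+1 */
            double a11 = d[k - 1], a21 = e[k - 1], a22 = d[k];
            double det = a11 * a22 - a21 * a21;
            if (det == 0.0)
                return k;
            double b1 = b[k - 1], b2 = b[k];
            b[k - 1] = (a22 * b1 - a21 * b2) / det;
            b[k]     = (a11 * b2 - a21 * b1) / det;
            k += 2;
        }
    }
    return 0;
}
```

For d = {2, 4, 1}, e = {1, 0, 0}, ipiv = {-1, -2, 3}, and b = {3, 5, 2}, the 2-by-2 block with rows (2, 1), (1, 4) and the 1-by-1 block 1 give x = {1, 1, 2}.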

Return Values
This function returns a value info.

= 0: Successful exit.
< 0: If info = -k, the kth argument had an illegal value.

> 0: If info = k, the matrix A is singular. If uplo = 'U', column k in the upper triangular part of A contains
all zeros. If uplo = 'L', column k in the lower triangular part of A contains all zeros. Therefore, D(k,k) is
exactly zero, and superdiagonal elements of column k of U (or subdiagonal elements of column k of L) are all
zeros. The factorization has been completed, but the block diagonal matrix D is exactly singular, and division
by zero will occur if it is used to solve a system of equations.

?hetrf
Computes the Bunch-Kaufman factorization of a
complex Hermitian matrix.

Syntax
lapack_int LAPACKE_chetrf (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * a , lapack_int lda , lapack_int * ipiv );
lapack_int LAPACKE_zhetrf (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * a , lapack_int lda , lapack_int * ipiv );

Include Files
• mkl.h

Description

The routine computes the factorization of a complex Hermitian matrix A using the Bunch-Kaufman diagonal
pivoting method:

if uplo='U', $A = U D U^H$
if uplo='L', $A = L D L^H$,

where A is the input matrix, U and L are products of permutation and triangular matrices with unit diagonal
(upper triangular for U and lower triangular for L), and D is a Hermitian block-diagonal matrix with 1-by-1
and 2-by-2 diagonal blocks. U and L have 2-by-2 unit diagonal blocks corresponding to the 2-by-2 blocks of
D.

NOTE
This routine supports the Progress Routine feature. See Progress Routine for details.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored and
how A is factored:
If uplo = 'U', the array a stores the upper triangular part of the
matrix A, and A is factored as $U D U^H$.

If uplo = 'L', the array a stores the lower triangular part of the
matrix A, and A is factored as $L D L^H$.

n The order of matrix A; n≥ 0.

a Array, size max(1, lda*n). The array a contains the upper or the lower triangular part of the
matrix A (see uplo).

lda The leading dimension of a; at least max(1, n).


Output Parameters

a The upper or lower triangular part of a is overwritten by details of the

block-diagonal matrix D and the multipliers used to obtain the factor U
(or L).

ipiv Array, size at least max(1, n). Contains details of the interchanges
and the block structure of D. If ipiv[i-1] = k > 0, then d_ii is a 1-by-1
block, and the i-th row and column of A were interchanged with
the k-th row and column.
If uplo = 'U' and ipiv[i] = ipiv[i-1] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and the i-th row and column of A were
interchanged with the m-th row and column.
If uplo = 'L' and ipiv[i] = ipiv[i-1] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and the (i+1)-th row and column of A
were interchanged with the m-th row and column.
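The ipiv convention above can be decoded mechanically. The following is a minimal sketch, not part of oneMKL: decode_hetrf_blocks is a hypothetical helper name, plain int stands in for lapack_int (assuming the LP64 interface), and ipiv is the 0-based C array holding 1-based row numbers exactly as returned by ?hetrf. Because ?hetrf stores the same value -m in both entries of a 2-by-2 block, a single forward scan works for both uplo = 'U' and uplo = 'L'.

```c
#include <assert.h>

/* Hypothetical helper, not part of oneMKL: decode the block structure of D
 * from the ipiv array returned by ?hetrf (Bunch-Kaufman pivoting).
 * On return, block_size[i] is 1 if row i+1 (1-based) belongs to a 1-by-1
 * block of D and 2 if it belongs to a 2-by-2 block.
 * Returns the number of 2-by-2 blocks, or -1 if ipiv is malformed. */
int decode_hetrf_blocks(int n, const int *ipiv, int *block_size)
{
    assert(n >= 0);
    int n2x2 = 0;
    int i = 0;
    while (i < n) {
        if (ipiv[i] > 0) {
            /* 1-by-1 block d_ii; row i+1 was swapped with row ipiv[i]. */
            block_size[i] = 1;
            i += 1;
        } else if (ipiv[i] < 0) {
            /* 2-by-2 block in rows/columns i+1 and i+2 (1-based):
             * ?hetrf stores the same value -m in both entries. */
            if (i + 1 >= n || ipiv[i + 1] != ipiv[i])
                return -1;
            block_size[i] = 2;
            block_size[i + 1] = 2;
            n2x2 += 1;
            i += 2;
        } else {
            return -1;  /* zero is never a valid pivot entry */
        }
    }
    return n2x2;
}
```

For example, ipiv = {1, -3, -3, 4} describes a 1-by-1 block, a 2-by-2 block in rows 2 and 3, and another 1-by-1 block. The equality check is specific to ?hetrf; the rook variants may store two different negative values in a pair.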

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, d_ii is 0. The factorization has been completed, but D is exactly singular. Division by 0 will occur
if you use D for solving a system of linear equations.

Application Notes
This routine is suitable for Hermitian matrices that are not known to be positive-definite. If A is in fact
positive-definite, the routine does not perform interchanges, and no 2-by-2 diagonal blocks occur in D.
The 2-by-2 unit diagonal blocks and the unit diagonal elements of U and L are not stored. The remaining
elements of U and L are stored in the corresponding columns of the array a, but additional row interchanges
are required to recover U or L explicitly (which is seldom necessary).
If ipiv[i-1] = i for all i = 1...n, then all off-diagonal elements of U (L) are stored explicitly in the
corresponding elements of the array a.
If uplo = 'U', the computed factors U and D are the exact factors of a perturbed matrix A + E, where

$$|E| \le c(n)\,\varepsilon\,P|U||D||U^T|P^T$$
c(n) is a modest linear function of n, and ε is the machine precision.
A similar estimate holds for the computed L and D when uplo = 'L'.

The total number of floating-point operations is approximately (4/3)n^3.

After calling this routine, you can call the following routines:

?hetrs to solve A*X = B

?hecon to estimate the condition number of A

?hetri to compute the inverse of A.

See Also
mkl_progress

Matrix Storage Schemes

?hetrf_aa
Computes the factorization of a complex Hermitian
matrix using Aasen's algorithm.
LAPACK_DECL lapack_int LAPACKE_chetrf_aa (int matrix_layout, char uplo, lapack_int n,
lapack_complex_float * a, lapack_int lda, lapack_int * ipiv );
LAPACK_DECL lapack_int LAPACKE_zhetrf_aa (int matrix_layout, char uplo, lapack_int n,
lapack_complex_double * a, lapack_int lda, lapack_int * ipiv );

Description
?hetrf_aa computes the factorization of a complex Hermitian matrix A using Aasen's algorithm. The form of
the factorization is A = U*T*U^H or A = L*T*L^H, where U (or L) is a product of permutation and unit upper
(lower) triangular matrices, and T is a Hermitian tridiagonal matrix. This is the blocked version of the
algorithm, calling Level 3 BLAS.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

uplo If uplo = 'U', the upper triangle of A is stored; if uplo = 'L', the lower triangle of A is stored.

n The order of the matrix A. n≥ 0.

a Array of size lda*n. On entry, the Hermitian matrix A.

If uplo = 'U', the leading n-by-n upper triangular part of a contains the
upper triangular part of the matrix A, and the strictly lower triangular part
of a is not referenced.

If uplo = 'L', the leading n-by-n lower triangular part of a contains the
lower triangular part of the matrix A, and the strictly upper triangular part
of a is not referenced.

lda The leading dimension of the array a. lda≥ max(1,n).

lwork See Syntax - Workspace. The length of work; lwork ≥ 2*n. For optimum
performance, lwork ≥ n*(1 + nb), where nb is the optimal block size. If
lwork = -1, a workspace query is assumed: the routine only
calculates the optimal size of the work array, returns this value as the first
entry of the work array, and xerbla issues no error message related to lwork.

Output Parameters

a On exit, the tridiagonal matrix T is stored in the diagonal of a and in the
subdiagonal just below (or the superdiagonal just above) it, and L (or U) is stored
below that subdiagonal (or above that superdiagonal), when uplo is 'L' (or 'U').

ipiv Array, dimension (n). On exit, it contains the details of the interchanges: the
row and column k of A were interchanged with the row and column
ipiv[k].

work See Syntax - Workspace. Array of size (max(1, lwork)). On exit, if info =
0, work[0] returns the optimal lwork.


Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, D(i,i) is exactly zero. The factorization has been completed, but the block diagonal
matrix D is exactly singular, and division by zero will occur if it is used to solve a system of equations.

Syntax - Workspace
Use this interface if you want to explicitly provide the workspace array.
LAPACK_DECL lapack_int LAPACKE_chetrf_aa_work (int matrix_layout, char uplo, lapack_int
n, lapack_complex_float * a, lapack_int lda, lapack_int * ipiv, lapack_complex_float *
work, lapack_int lwork );
LAPACK_DECL lapack_int LAPACKE_zhetrf_aa_work (int matrix_layout, char uplo, lapack_int
n, lapack_complex_double * a, lapack_int lda, lapack_int * ipiv, lapack_complex_double
* work, lapack_int lwork );

?hetrf_rook
Computes the bounded Bunch-Kaufman factorization
of a complex Hermitian matrix.

Syntax
lapack_int LAPACKE_chetrf_rook (int matrix_layout, char uplo, lapack_int n,
lapack_complex_float * a, lapack_int lda, lapack_int * ipiv);
lapack_int LAPACKE_zhetrf_rook (int matrix_layout, char uplo, lapack_int n,
lapack_complex_double * a, lapack_int lda, lapack_int * ipiv);

Include Files
• mkl.h

Description

The routine computes the factorization of a complex Hermitian matrix A using the bounded Bunch-Kaufman
diagonal pivoting method:

if uplo='U', A = U*D*U^H
if uplo='L', A = L*D*L^H,

where A is the input matrix, U (or L) is a product of permutation and unit upper (or lower) triangular
matrices, and D is a Hermitian block-diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks.
This is the blocked version of the algorithm, calling Level 3 BLAS.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored:

If uplo = 'U', the array a stores the upper triangular part of the
matrix A.

If uplo = 'L', the array a stores the lower triangular part of the
matrix A.

n The order of matrix A; n≥ 0.

a Array a, size (lda*n)

The array a contains the upper or the lower triangular part of the
matrix A (see uplo).
If uplo = 'U', the leading n-by-n upper triangular part of a contains
the upper triangular part of the matrix A, and the strictly lower
triangular part of a is not referenced. If uplo = 'L', the leading n-by-n
lower triangular part of a contains the lower triangular part of the
matrix A, and the strictly upper triangular part of a is not referenced.

lda The leading dimension of a; at least max(1, n).

Output Parameters

a The block diagonal matrix D and the multipliers used to obtain the
factor U or L (see Application Notes for further details).

ipiv • If uplo = 'U':

If ipiv(k) > 0, then rows and columns k and ipiv(k) were
interchanged and D(k,k) is a 1-by-1 diagonal block.
If ipiv(k) < 0 and ipiv(k - 1) < 0, then rows and columns k and
-ipiv(k) were interchanged, rows and columns k - 1 and
-ipiv(k - 1) were interchanged, and D(k-1:k, k-1:k) is a 2-by-2 diagonal
block.
• If uplo = 'L':
If ipiv(k) > 0, then rows and columns k and ipiv(k) were
interchanged and D(k,k) is a 1-by-1 diagonal block.
If ipiv(k) < 0 and ipiv(k + 1) < 0, then rows and columns k and
-ipiv(k) were interchanged, rows and columns k + 1 and
-ipiv(k + 1) were interchanged, and D(k:k+1, k:k+1) is a 2-by-2 diagonal
block.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, D(i,i) is exactly 0. The factorization has been completed, but the block diagonal matrix D is
exactly singular, and division by 0 will occur if you use D for solving a system of linear equations.

Application Notes

If uplo = 'U', then A = U*D*U^H, where

U = P(n)*U(n)* ... *P(k)*U(k)* ...,


i.e., U is a product of terms P(k)*U(k), where k decreases from n to 1 in steps of 1 or 2, and D is a block
diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks D(k). P(k) is a permutation matrix as defined by
ipiv(k), and U(k) is a unit upper triangular matrix, such that if the diagonal block D(k) is of order s (s = 1
or 2), then

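$$U(k) = \begin{pmatrix} I & v & 0 \\ 0 & I & 0 \\ 0 & 0 & I \end{pmatrix},$$
where the block rows and block columns have sizes k-s, s, and n-k.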
If s = 1, D(k) overwrites A(k,k), and v overwrites A(1:k-1,k).
If s = 2, the upper triangle of D(k) overwrites A(k-1,k-1), A(k-1,k), and A(k,k), and v overwrites
A(1:k-2,k-1:k).
If uplo = 'L', then A = L*D*L^H, where

L = P(1)*L(1)* ... *P(k)*L(k)* ...,

i.e., L is a product of terms P(k)*L(k), where k increases from 1 to n in steps of 1 or 2, and D is a block
diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks D(k). P(k) is a permutation matrix as defined by
ipiv(k), and L(k) is a unit lower triangular matrix, such that if the diagonal block D(k) is of order s (s = 1 or
2), then

$$L(k) = \begin{pmatrix} I & 0 & 0 \\ 0 & I & 0 \\ 0 & v & I \end{pmatrix},$$
where the block rows and block columns have sizes k-1, s, and n-k-s+1.
If s = 1, D(k) overwrites A(k,k), and v overwrites A(k+1:n,k).
If s = 2, the lower triangle of D(k) overwrites A(k,k), A(k+1,k), and A(k+1,k+1), and v overwrites
A(k+2:n,k:k+1).
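One common use of the stored D is computing the inertia of A: since A = (P*U)*D*(P*U)^H is a congruence, Sylvester's law of inertia says A and D have the same numbers of positive, negative, and zero eigenvalues. The sketch below reads the blocks of D from the output array following the storage rules above (uplo = 'U' only). It assumes LAPACK_COL_MAJOR storage, C99 complex numbers in place of lapack_complex_double, and int in place of lapack_int; inertia_hetrf_rook_upper is a hypothetical name, not a oneMKL routine.

```c
#include <assert.h>
#include <complex.h>

/* Hypothetical helper, not part of oneMKL. Counts positive, negative and
 * zero eigenvalues of D (hence of A) from ?hetrf_rook output, uplo = 'U',
 * column-major: element (i,j), 0-based, is a[i + j*lda]. ipiv holds the
 * 1-based values returned by the routine, so ipiv(K) in the text is
 * ipiv[K-1] here. */
void inertia_hetrf_rook_upper(int n, const double complex *a, int lda,
                              const int *ipiv, int *npos, int *nneg, int *nzero)
{
    assert(n >= 0 && lda >= (n > 1 ? n : 1));
    *npos = *nneg = *nzero = 0;
    int k = n - 1;                        /* k decreases, as in the factorization */
    while (k >= 0) {
        if (ipiv[k] > 0) {
            /* 1-by-1 block D(k) = A(k,k); real because D is Hermitian. */
            double d = creal(a[k + k * lda]);
            if (d > 0) ++*npos; else if (d < 0) ++*nneg; else ++*nzero;
            k -= 1;
        } else {
            /* 2-by-2 block in rows/columns k-1:k, upper triangle stored. */
            assert(k >= 1 && ipiv[k - 1] < 0);
            double p = creal(a[(k - 1) + (k - 1) * lda]);  /* D(k-1,k-1) */
            double q = creal(a[k + k * lda]);              /* D(k,k)     */
            double complex b = a[(k - 1) + k * lda];       /* D(k-1,k)   */
            double det = p * q - creal(b * conj(b));       /* p*q - |b|^2 */
            double tr = p + q;
            if (det < 0) {                 /* eigenvalues of opposite sign */
                ++*npos; ++*nneg;
            } else if (det > 0) {          /* both share the sign of the trace */
                if (tr > 0) *npos += 2; else *nneg += 2;
            } else {                       /* one eigenvalue is zero */
                ++*nzero;
                if (tr > 0) ++*npos; else if (tr < 0) ++*nneg; else ++*nzero;
            }
            k -= 2;
        }
    }
}
```

The uplo = 'L' case is symmetric: scan k upward and read the lower triangle of each 2-by-2 block at A(k,k), A(k+1,k), and A(k+1,k+1).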

See Also
mkl_progress

Matrix Storage Schemes

?hetrf_rk
Computes the factorization of a complex Hermitian
indefinite matrix using the bounded Bunch-Kaufman
(rook) diagonal pivoting method (BLAS3 blocked
algorithm).
lapack_int LAPACKE_chetrf_rk (int matrix_layout, char uplo, lapack_int n,
lapack_complex_float * A, lapack_int lda, lapack_complex_float * e, lapack_int * ipiv);
lapack_int LAPACKE_zhetrf_rk (int matrix_layout, char uplo, lapack_int n,
lapack_complex_double * A, lapack_int lda, lapack_complex_double * e, lapack_int *
ipiv);

Description
?hetrf_rk computes the factorization of a complex Hermitian matrix A using the bounded Bunch-Kaufman
(rook) diagonal pivoting method: A = P*U*D*(U^H)*(P^T) or A = P*L*D*(L^H)*(P^T), where U (or L) is a unit upper
(or lower) triangular matrix, U^H (or L^H) is the conjugate transpose of U (or L), P is a permutation matrix, P^T is
the transpose of P, and D is Hermitian and block diagonal with 1-by-1 and 2-by-2 diagonal blocks.
This is the blocked version of the algorithm, calling Level 3 BLAS.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

uplo Specifies whether the upper or lower triangular part of the Hermitian matrix
A is stored:

• = 'U': Upper triangular.

• = 'L': Lower triangular.

n The order of the matrix A. n ≥ 0.

A Array of size max(1, lda*n). On entry, the Hermitian matrix A. If uplo =
'U', the leading n-by-n upper triangular part of A contains the upper
triangular part of the matrix A, and the strictly lower triangular part of A is
not referenced. If uplo = 'L', the leading n-by-n lower triangular part of A
contains the lower triangular part of the matrix A, and the strictly upper
triangular part of A is not referenced.

lda The leading dimension of the array A; lda ≥ max(1, n).

Output Parameters

A On exit, contains:

• Only the diagonal elements of the Hermitian block diagonal matrix D on the
diagonal of A; that is, D(k,k) = A(k,k). Superdiagonal (or subdiagonal)
elements of D are stored on exit in array e.

and

• If uplo = 'U', factor U in the superdiagonal part of A. If uplo = 'L',
factor L in the subdiagonal part of A.

e Array of size n. On exit, contains the superdiagonal (or subdiagonal)
elements of the Hermitian block diagonal matrix D with 1-by-1 or 2-by-2
diagonal blocks. If uplo = 'U', e(i) = D(i-1,i) for i = 2:n, and e(1) is set to 0.
If uplo = 'L', e(i) = D(i+1,i) for i = 1:n-1, and e(n) is set to 0.

NOTE For 1-by-1 diagonal block D(k), where 1 ≤ k ≤ n, the
element e[k-1] is set to 0 in both the uplo = 'U' and uplo =
'L' cases.

ipiv Array of size n. ipiv describes the permutation matrix P in the factorization
of matrix A as follows: the absolute value of ipiv[k-1] is the index of the row
and column that were interchanged with the k-th row and column. The value of
uplo determines the order in which the interchanges were applied. The sign of
ipiv encodes the block structure of the Hermitian block diagonal matrix D with
1-by-1 or 2-by-2 diagonal blocks, which correspond to 1 or 2 interchanges at
each factorization step. If uplo = 'U' (in factorization order, k decreases
from n to 1):


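1. A single positive entry ipiv(k) > 0 means that D(k,k) is a 1-by-1
diagonal block. If ipiv(k) != k, rows and columns k and ipiv(k) were
interchanged in the matrix A(1:n,1:n). If ipiv(k) = k, no interchange
occurred.
2. A pair of consecutive negative entries ipiv(k) < 0 and ipiv(k-1) < 0
means that D(k-1:k,k-1:k) is a 2-by-2 diagonal block. (Note that
negative entries in ipiv appear only in pairs.)

• If -ipiv(k) != k, rows and columns k and -ipiv(k) were
interchanged in the matrix A(1:n,1:n). If -ipiv(k) = k, no interchange
occurred.
• If -ipiv(k-1) != k-1, rows and columns k-1 and -ipiv(k-1) were
interchanged in the matrix A(1:n,1:n). If -ipiv(k-1) = k-1, no
interchange occurred.

NOTE Any entry ipiv(k) is always nonzero on output.

If uplo = 'L' (in factorization order, k increases from 1 to n):

1. A single positive entry ipiv(k) > 0 means that D(k,k) is a 1-by-1
diagonal block. If ipiv(k) != k, rows and columns k and ipiv(k) were
interchanged in the matrix A(1:n,1:n). If ipiv(k) = k, no interchange
occurred.
2. A pair of consecutive negative entries ipiv(k) < 0 and ipiv(k+1) < 0
means that D(k:k+1,k:k+1) is a 2-by-2 diagonal block. (Note that
negative entries in ipiv appear only in pairs.)

• If -ipiv(k) != k, rows and columns k and -ipiv(k) were
interchanged in the matrix A(1:n,1:n). If -ipiv(k) = k, no interchange
occurred.
• If -ipiv(k+1) != k+1, rows and columns k+1 and -ipiv(k+1) were
interchanged in the matrix A(1:n,1:n). If -ipiv(k+1) = k+1, no
interchange occurred.

NOTE Any entry ipiv(k) is always nonzero on output.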
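Because ?hetrf_rk returns D split between the diagonal of A and the array e, applying inv(D) (the middle step of a solve with these factors) needs only those two arrays plus the block structure from ipiv. The following is a minimal sketch for uplo = 'U', assuming LAPACK_COL_MAJOR storage, C99 complex numbers in place of lapack_complex_double, and int in place of lapack_int; apply_inv_d_rk_upper is a hypothetical name, not a oneMKL routine. Each 2-by-2 Hermitian block is solved with Cramer's rule.

```c
#include <assert.h>
#include <complex.h>

/* Hypothetical helper, not part of oneMKL: overwrite z with inv(D)*z, where
 * D is the block diagonal factor from ?hetrf_rk with uplo = 'U' and
 * column-major storage. Diagonal of D: a[k + k*lda]. Superdiagonal of D:
 * e[k] = D(k-1,k) (0-based), matching e(i) = D(i-1,i) in the 1-based text. */
void apply_inv_d_rk_upper(int n, const double complex *a, int lda,
                          const double complex *e, const int *ipiv,
                          double complex *z)
{
    assert(n >= 0 && lda >= (n > 1 ? n : 1));
    int k = n - 1;
    while (k >= 0) {
        if (ipiv[k] > 0) {
            /* 1-by-1 block: D(k,k) is real for a Hermitian D. */
            z[k] /= creal(a[k + k * lda]);
            k -= 1;
        } else {
            /* 2-by-2 block [p b; conj(b) q] in rows/columns k-1:k. */
            assert(k >= 1 && ipiv[k - 1] < 0);
            double p = creal(a[(k - 1) + (k - 1) * lda]);
            double q = creal(a[k + k * lda]);
            double complex b = e[k];
            double det = p * q - creal(b * conj(b));   /* p*q - |b|^2 */
            double complex z1 = z[k - 1], z2 = z[k];
            z[k - 1] = (q * z1 - b * z2) / det;        /* Cramer's rule */
            z[k]     = (p * z2 - conj(b) * z1) / det;
            k -= 2;
        }
    }
}
```

A production solver would use a scaled 2-by-2 solve to limit overflow and cancellation; the direct formula is kept here for readability.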

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -k, the k-th parameter had an illegal value.

If info = k, the matrix A is singular. If uplo = 'U', column k in the upper triangular part of A
contains all zeros. If uplo = 'L', column k in the lower triangular part of A contains all zeros. Therefore
D(k,k) is exactly zero, and the superdiagonal elements of column k of U (or the subdiagonal elements of
column k of L) are all zeros. The factorization has been completed, but the block diagonal matrix D is exactly
singular, and division by zero will occur if it is used to solve a system of equations.

?sptrf
Computes the Bunch-Kaufman factorization of a
symmetric matrix using packed storage.

Syntax
lapack_int LAPACKE_ssptrf (int matrix_layout , char uplo , lapack_int n , float * ap ,
lapack_int * ipiv );
lapack_int LAPACKE_dsptrf (int matrix_layout , char uplo , lapack_int n , double * ap ,
lapack_int * ipiv );
lapack_int LAPACKE_csptrf (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * ap , lapack_int * ipiv );
lapack_int LAPACKE_zsptrf (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * ap , lapack_int * ipiv );

Include Files
• mkl.h

Description

The routine computes the factorization of a real/complex symmetric matrix A stored in the packed format
using the Bunch-Kaufman diagonal pivoting method. The form of the factorization is:

if uplo='U', A = U*D*U^T
if uplo='L', A = L*D*L^T,

where U and L are products of permutation and triangular matrices with unit diagonal (upper triangular for U
and lower triangular for L), and D is a symmetric block-diagonal matrix with 1-by-1 and 2-by-2 diagonal
blocks. U and L have 2-by-2 unit diagonal blocks corresponding to the 2-by-2 blocks of D.

NOTE
This routine supports the Progress Routine feature. See Progress Function for details.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is packed in
the array ap and how A is factored:
If uplo = 'U', the array ap stores the upper triangular part of the
matrix A, and A is factored as U*D*U^T.

If uplo = 'L', the array ap stores the lower triangular part of the
matrix A, and A is factored as L*D*L^T.

n The order of matrix A; n≥ 0.

ap Array, size at least max(1, n(n+1)/2). The array ap contains the upper
or the lower triangular part of the matrix A (as specified by uplo) in
packed storage (see Matrix Storage Schemes).
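The packed layout can be made concrete with index formulas. The following is a minimal sketch, assuming LAPACK_COL_MAJOR storage and the standard LAPACK packed conventions; packed_index_upper, packed_index_lower, and pack_symmetric are hypothetical helper names, not oneMKL routines. The row-major layout is not covered here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers, not part of oneMKL: 0-based offset of A(i,j) in the
 * packed array ap for column-major storage, with 1-based i and j.
 *   uplo = 'U' (i <= j): ap[(i-1) + j*(j-1)/2]
 *   uplo = 'L' (i >= j): ap[(i-1) + (j-1)*(2n-j)/2]                        */
size_t packed_index_upper(int i, int j)
{
    assert(1 <= i && i <= j);
    return (size_t)(i - 1) + (size_t)j * (size_t)(j - 1) / 2;
}

size_t packed_index_lower(int n, int i, int j)
{
    assert(1 <= j && j <= i && i <= n);
    return (size_t)(i - 1) + (size_t)(j - 1) * (size_t)(2 * n - j) / 2;
}

/* Copy the uplo triangle of a column-major n-by-n matrix a into ap, which
 * must hold at least n*(n+1)/2 elements. */
void pack_symmetric(char uplo, int n, const double *a, int lda, double *ap)
{
    for (int j = 1; j <= n; ++j) {
        int lo = (uplo == 'U') ? 1 : j;
        int hi = (uplo == 'U') ? j : n;
        for (int i = lo; i <= hi; ++i) {
            size_t p = (uplo == 'U') ? packed_index_upper(i, j)
                                     : packed_index_lower(n, i, j);
            ap[p] = a[(size_t)(i - 1) + (size_t)(j - 1) * lda];
        }
    }
}
```

For the symmetric 3-by-3 matrix with rows (1 2 3), (2 4 5), (3 5 6), uplo = 'U' packs the columns of the upper triangle as {1, 2, 4, 3, 5, 6}, while uplo = 'L' gives {1, 2, 3, 4, 5, 6}.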


Output Parameters

ap The upper or lower triangle of A (as specified by uplo) is overwritten
by details of the block-diagonal matrix D and the multipliers used to
obtain the factor U (or L).

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, d_ii is 0. The factorization has been completed, but D is exactly singular. Division by 0 will occur
if you use D for solving a system of linear equations.

Application Notes
The 2-by-2 unit diagonal blocks and the unit diagonal elements of U and L are not stored. The remaining
elements of U and L overwrite elements of the corresponding columns of the array ap, but additional row
interchanges are required to recover U or L explicitly (which is seldom necessary).
If ipiv(i) = i for all i = 1...n, then all off-diagonal elements of U (L) are stored explicitly in packed form.

If uplo = 'U', the computed factors U and D are the exact factors of a perturbed matrix A + E, where

$$|E| \le c(n)\,\varepsilon\,P|U||D||U^T|P^T$$
c(n) is a modest linear function of n, and ε is the machine precision. A similar estimate holds for the
computed L and D when uplo = 'L'.

The total number of floating-point operations is approximately (1/3)n^3 for real flavors or (4/3)n^3 for
complex flavors.
After calling this routine, you can call the following routines:

?sptrs to solve A*X = B

?spcon to estimate the condition number of A

?sptri to compute the inverse of A.

See Also
mkl_progress

Matrix Storage Schemes

?hptrf
Computes the Bunch-Kaufman factorization of a
complex Hermitian matrix using packed storage.

Syntax
lapack_int LAPACKE_chptrf (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * ap , lapack_int * ipiv );
lapack_int LAPACKE_zhptrf (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * ap , lapack_int * ipiv );

Include Files
• mkl.h

Description

The routine computes the factorization of a complex Hermitian packed matrix A using the Bunch-Kaufman
diagonal pivoting method:

if uplo='U', A = U*D*U^H
if uplo='L', A = L*D*L^H,

NOTE
This routine supports the Progress Routine feature. See Progress Function for details.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is packed
and how A is factored:
If uplo = 'U', the array ap stores the upper triangular part of the
matrix A, and A is factored as U*D*U^H.

If uplo = 'L', the array ap stores the lower triangular part of the
matrix A, and A is factored as L*D*L^H.

n The order of matrix A; n≥ 0.

ap Array, size at least max(1, n(n+1)/2). The array ap contains the upper
or the lower triangular part of the matrix A (as specified by uplo) in
packed storage (see Matrix Storage Schemes).
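For a Hermitian matrix, only one triangle is packed, and entries of the other triangle are conjugates of stored entries. A minimal sketch of reading any element of A from the input array ap with uplo = 'U', assuming LAPACK_COL_MAJOR packed storage and C99 complex numbers in place of lapack_complex_double; hp_upper_get is a hypothetical helper name.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Hypothetical helper, not part of oneMKL: return A(i,j) (1-based) of a
 * Hermitian matrix whose upper triangle is packed column by column in ap,
 * as passed to ?hptrf with uplo = 'U' before factorization.
 * Upper entries: ap[(i-1) + j*(j-1)/2]. Lower entries: conj(A(j,i)). */
double complex hp_upper_get(const double complex *ap, int n, int i, int j)
{
    assert(1 <= i && i <= n && 1 <= j && j <= n);
    if (i <= j)
        return ap[(size_t)(i - 1) + (size_t)j * (size_t)(j - 1) / 2];
    return conj(ap[(size_t)(j - 1) + (size_t)i * (size_t)(i - 1) / 2]);
}
```

After ?hptrf returns, ap no longer holds A but the factorization details, so this accessor only applies to the input.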


Output Parameters

ap The upper or lower triangle of A (as specified by uplo) is overwritten
by details of the block-diagonal matrix D and the multipliers used to
obtain the factor U (or L).

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, d_ii is 0. The factorization has been completed, but D is exactly singular. Division by 0 will occur
if you use D for solving a system of linear equations.

Application Notes
The 2-by-2 unit diagonal blocks and the unit diagonal elements of U and L are not stored. The remaining
elements of U and L are stored in the array ap, but additional row interchanges are required to recover U or L
explicitly (which is seldom necessary).
If ipiv[i-1] = i for all i = 1...n, then all off-diagonal elements of U (L) are stored explicitly in the
corresponding elements of the array ap.
If uplo = 'U', the computed factors U and D are the exact factors of a perturbed matrix A + E, where

$$|E| \le c(n)\,\varepsilon\,P|U||D||U^T|P^T$$
c(n) is a modest linear function of n, and ε is the machine precision.
A similar estimate holds for the computed L and D when uplo = 'L'.

The total number of floating-point operations is approximately (4/3)n^3.

After calling this routine, you can call the following routines:

?hptrs to solve A*X = B

?hpcon to estimate the condition number of A

?hptri to compute the inverse of A.

See Also
mkl_progress

Matrix Storage Schemes

mkl_?spffrt2, mkl_?spffrtx
Computes the partial LDL^T factorization of a
symmetric matrix using packed storage.

Syntax
void mkl_sspffrt2 (float *ap , const MKL_INT *n , const MKL_INT *ncolm , float *work ,
float *work2 );
void mkl_dspffrt2 (double *ap , const MKL_INT *n , const MKL_INT *ncolm , double
*work , double *work2 );
void mkl_cspffrt2 (MKL_Complex8 *ap , const MKL_INT *n , const MKL_INT *ncolm ,
MKL_Complex8 *work , MKL_Complex8 *work2 );
void mkl_zspffrt2 (MKL_Complex16 *ap , const MKL_INT *n , const MKL_INT *ncolm ,
MKL_Complex16 *work , MKL_Complex16 *work2 );
void mkl_sspffrtx (float *ap , const MKL_INT *n , const MKL_INT *ncolm , float *work ,
float *work2 );
void mkl_dspffrtx (double *ap , const MKL_INT *n , const MKL_INT *ncolm , double
*work , double *work2 );
void mkl_cspffrtx (MKL_Complex8 *ap , const MKL_INT *n , const MKL_INT *ncolm ,
MKL_Complex8 *work , MKL_Complex8 *work2 );
void mkl_zspffrtx (MKL_Complex16 *ap , const MKL_INT *n , const MKL_INT *ncolm ,
MKL_Complex16 *work , MKL_Complex16 *work2 );

Include Files
• mkl.h

Description

The routine computes the partial factorization A = L*D*L^T, where L is a lower triangular matrix and D is a
diagonal matrix.

Caution
The routine assumes that the matrix A is factorizable. The routine does not perform pivoting
and does not handle zero diagonal elements; such elements cause the routine to produce
incorrect results without any indication.

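Consider the matrix

$$A = \begin{pmatrix} a & b^T \\ b & C \end{pmatrix},$$

where a is the element in the first row and first column of A, b is a column vector of size n - 1
containing elements 2 through n of the first column of A (by symmetry, also of the first row), C is the
lower-right (n - 1)-by-(n - 1) submatrix of A, and I is the identity matrix.
The mkl_?spffrt2 routine performs ncolm successive factorizations of the form

$$A = \begin{pmatrix} a & b^T \\ b & C \end{pmatrix} = \begin{pmatrix} a & 0 \\ b & I \end{pmatrix} \begin{pmatrix} a^{-1} & 0 \\ 0 & C - b a^{-1} b^T \end{pmatrix} \begin{pmatrix} a & b^T \\ 0 & I \end{pmatrix}.$$

The mkl_?spffrtx routine performs ncolm successive factorizations of the form

$$A = \begin{pmatrix} a & b^T \\ b & C \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ b a^{-1} & I \end{pmatrix} \begin{pmatrix} a & 0 \\ 0 & C - b a^{-1} b^T \end{pmatrix} \begin{pmatrix} 1 & (b a^{-1})^T \\ 0 & I \end{pmatrix}.$$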


The approximate number of floating point operations performed by real flavors of these routines is
(1/6)*ncolm*(2*ncolm^2 - 6*ncolm*n + 3*ncolm + 6*n^2 - 6*n + 7).

The approximate number of floating point operations performed by complex flavors of these routines is
(1/3)*ncolm*(4*ncolm^2 - 12*ncolm*n + 9*ncolm + 12*n^2 - 18*n + 8).
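A quick consistency check on these counts is to set ncolm = n, which corresponds to the complete factorization. Worked through term by term:

$$\text{real: } \tfrac{1}{6}\,n\,(2n^2 - 6n^2 + 3n + 6n^2 - 6n + 7) = \tfrac{1}{6}\,n\,(2n^2 - 3n + 7) = \tfrac{1}{3}n^3 - \tfrac{1}{2}n^2 + \tfrac{7}{6}n$$

$$\text{complex: } \tfrac{1}{3}\,n\,(4n^2 - 12n^2 + 9n + 12n^2 - 18n + 8) = \tfrac{1}{3}\,n\,(4n^2 - 9n + 8) = \tfrac{4}{3}n^3 - 3n^2 + \tfrac{8}{3}n$$

The leading terms, (1/3)n^3 and (4/3)n^3, agree with the operation counts quoted for ?sptrf.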

Input Parameters

ap Array, size at least max(1, n(n+1)/2). The array ap contains the lower
triangular part of the matrix A in packed storage (see Matrix Storage
Schemes for uplo = 'L').

n The order of matrix A; n≥ 0.

ncolm The number of columns to factor, ncolm≤n.

work, work2 Workspace arrays, size of each at least n.

Output Parameters

ap Overwritten by the factor L. The first ncolm diagonal elements of the
input matrix A are replaced with the diagonal elements of D. The
subdiagonal elements of the first ncolm columns are replaced with the
corresponding elements of L. The rest of the input array is updated as
indicated in the Description section.

NOTE
Specifying ncolm = n results in the complete factorization A = L*D*L^T.
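The mkl_?spffrtx form can be illustrated with a short reference sketch: at each of ncolm steps, keep the pivot a as D(k), replace b with the multipliers b*a^{-1}, and overwrite the trailing triangle with the Schur complement C - b*a^{-1}*b^T. This is written from the formulas in the Description for real data in lower packed column-major storage; it is not oneMKL's implementation, ldlt_partial_packed is a hypothetical name, and whether the library stores exactly these values for both mkl_?spffrt2 and mkl_?spffrtx should be checked against the Output Parameters.

```c
#include <assert.h>
#include <stddef.h>

/* 0-based offset of A(i,j), i >= j, in column-major lower packed storage. */
static size_t lp(int n, int i, int j)
{
    return (size_t)i + (size_t)j * (size_t)(2 * n - j - 1) / 2;
}

/* Reference sketch (hypothetical, not oneMKL): ncolm unpivoted steps of the
 * mkl_?spffrtx-style factorization on a real symmetric matrix whose lower
 * triangle is packed in ap. Afterwards, for k < ncolm:
 *   ap[lp(n,k,k)] holds D(k) = a, ap[lp(n,i,k)] (i > k) holds L(i,k) = b/a,
 * and the trailing triangle holds the Schur complement C - b*inv(a)*b^T.
 * Like the documented routines, a zero pivot silently gives Inf/NaN. */
void ldlt_partial_packed(double *ap, int n, int ncolm)
{
    assert(0 <= ncolm && ncolm <= n);
    for (int k = 0; k < ncolm; ++k) {
        double a = ap[lp(n, k, k)];
        /* Schur complement update of the trailing lower triangle. */
        for (int j = k + 1; j < n; ++j) {
            double bj = ap[lp(n, j, k)];
            for (int i = j; i < n; ++i)
                ap[lp(n, i, j)] -= ap[lp(n, i, k)] * bj / a;
        }
        /* Column k of the unit lower triangular L: multipliers b / a. */
        for (int i = k + 1; i < n; ++i)
            ap[lp(n, i, k)] /= a;
    }
}
```

For the matrix with rows (4 2 2), (2 5 3), (2 3 6), one step (ncolm = 1) leaves {4, 0.5, 0.5, 4, 2, 5} in ap, and ncolm = 3 gives D = diag(4, 4, 4) with all multipliers equal to 0.5.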

See Also
mkl_progress

Matrix Storage Schemes

Solving Systems of Linear Equations: LAPACK Computational Routines

This section describes the LAPACK routines for solving systems of linear equations. Before calling most of
these routines, you need to factorize the matrix of your system of equations (see Routines for Matrix
Factorization). However, the factorization is not necessary if your system of equations has a triangular
matrix.

?getrs
Solves a system of linear equations with an LU-
factored square coefficient matrix, with multiple right-
hand sides.

Syntax
lapack_int LAPACKE_sgetrs (int matrix_layout , char trans , lapack_int n , lapack_int
nrhs , const float * a , lapack_int lda , const lapack_int * ipiv , float * b ,
lapack_int ldb );
lapack_int LAPACKE_dgetrs (int matrix_layout , char trans , lapack_int n , lapack_int
nrhs , const double * a , lapack_int lda , const lapack_int * ipiv , double * b ,
lapack_int ldb );

lapack_int LAPACKE_cgetrs (int matrix_layout , char trans , lapack_int n , lapack_int
nrhs , const lapack_complex_float * a , lapack_int lda , const lapack_int * ipiv ,
lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_zgetrs (int matrix_layout , char trans , lapack_int n , lapack_int
nrhs , const lapack_complex_double * a , lapack_int lda , const lapack_int * ipiv ,
lapack_complex_double * b , lapack_int ldb );

Include Files
• mkl.h

Description

The routine solves for X the following systems of linear equations:

A*X = B if trans='N',

A^T*X = B if trans='T',

A^H*X = B if trans='C' (for complex matrices only).

Before calling this routine, you must call ?getrf to compute the LU factorization of A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

trans Must be 'N' or 'T' or 'C'.

Indicates the form of the equations:

If trans = 'N', then A*X = B is solved for X.

If trans = 'T', then A^T*X = B is solved for X.

If trans = 'C', then A^H*X = B is solved for X.

n The order of A; the number of rows in B; n ≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

a Array of size max(1, lda*n).

The array a contains the LU factorization of matrix A resulting from the
call of ?getrf.

b Array of size max(1, ldb*nrhs) for column major layout, and
max(1, ldb*n) for row major layout.

The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.

lda The leading dimension of a; lda≥ max(1, n).

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ipiv Array, size at least max(1, n). The ipiv array, as returned by ?getrf.


Output Parameters

b Overwritten by the solution matrix X.
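What the trans = 'N' case computes can be sketched directly from the ?getrf output: apply the recorded row interchanges to B, then forward substitution with the unit lower triangular L and back substitution with U, both stored in a. This is a minimal reference sketch for LAPACK_COL_MAJOR data, not oneMKL's blocked implementation; getrs_n_sketch is a hypothetical name and int stands in for lapack_int.

```c
#include <assert.h>
#include <stddef.h>

/* Reference sketch (hypothetical, not oneMKL): trans = 'N' case of ?getrs
 * for column-major storage. a and ipiv are as returned by ?getrf (ipiv
 * holds 1-based row numbers); b (n-by-nrhs) is overwritten with X. */
void getrs_n_sketch(int n, int nrhs, const double *a, int lda,
                    const int *ipiv, double *b, int ldb)
{
    assert(n >= 0 && nrhs >= 0);
    for (int c = 0; c < nrhs; ++c) {
        double *x = b + (size_t)c * ldb;
        /* Apply the interchanges in the order ?getrf recorded them. */
        for (int i = 0; i < n; ++i) {
            int p = ipiv[i] - 1;
            if (p != i) { double t = x[i]; x[i] = x[p]; x[p] = t; }
        }
        /* Forward substitution with unit lower triangular L. */
        for (int j = 0; j < n; ++j)
            for (int i = j + 1; i < n; ++i)
                x[i] -= a[i + (size_t)j * lda] * x[j];
        /* Back substitution with upper triangular U. */
        for (int j = n - 1; j >= 0; --j) {
            x[j] /= a[j + (size_t)j * lda];
            for (int i = 0; i < j; ++i)
                x[i] -= a[i + (size_t)j * lda] * x[j];
        }
    }
}
```

For A with rows (2 1) and (4 3), ?getrf swaps the rows (ipiv = {2, 2}) and stores L and U as a = {4, 0.5, 3, -0.5}; solving with B = (3, 7)^T then returns X = (1, 1)^T.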

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
For each right-hand side b, the computed solution is the exact solution of a perturbed system of equations
(A + E)x = b, where

$$|E| \le c(n)\,\varepsilon\,P|L||U|,$$

c(n) is a modest linear function of n, and ε is the machine precision.
If x_0 is the true solution, the computed solution x satisfies this error bound:

$$\frac{\|x - x_0\|_\infty}{\|x\|_\infty} \le c(n)\,\mathrm{cond}(A,x)\,\varepsilon,$$

where

$$\mathrm{cond}(A,x) = \frac{\|\,|A^{-1}||A||x|\,\|_\infty}{\|x\|_\infty} \le \|A^{-1}\|_\infty\,\|A\|_\infty = \kappa_\infty(A).$$

Note that cond(A,x) can be much smaller than κ∞(A); the condition number of A^T and A^H might or might
not be equal to κ∞(A).

The approximate number of floating-point operations for one right-hand side vector b is 2n^2 for real flavors
and 8n^2 for complex flavors.

To estimate the condition number κ∞(A), call ?gecon.

To refine the solution and estimate the error, call ?gerfs.

See Also
Matrix Storage Schemes

?gbtrs
Solves a system of linear equations with an LU-
factored band coefficient matrix, with multiple right-
hand sides.

Syntax
lapack_int LAPACKE_sgbtrs (int matrix_layout , char trans , lapack_int n , lapack_int
kl , lapack_int ku , lapack_int nrhs , const float * ab , lapack_int ldab , const
lapack_int * ipiv , float * b , lapack_int ldb );
lapack_int LAPACKE_dgbtrs (int matrix_layout , char trans , lapack_int n , lapack_int
kl , lapack_int ku , lapack_int nrhs , const double * ab , lapack_int ldab , const
lapack_int * ipiv , double * b , lapack_int ldb );

lapack_int LAPACKE_cgbtrs (int matrix_layout , char trans , lapack_int n , lapack_int
kl , lapack_int ku , lapack_int nrhs , const lapack_complex_float * ab , lapack_int
ldab , const lapack_int * ipiv , lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_zgbtrs (int matrix_layout , char trans , lapack_int n , lapack_int
kl , lapack_int ku , lapack_int nrhs , const lapack_complex_double * ab , lapack_int
ldab , const lapack_int * ipiv , lapack_complex_double * b , lapack_int ldb );

Include Files
• mkl.h

Description
The routine solves for X the following systems of linear equations:

A*X = B if trans='N',

A^T*X = B if trans='T',

A^H*X = B if trans='C' (for complex matrices only).

Here A is an LU-factored general band matrix of order n with kl nonzero subdiagonals and ku nonzero
superdiagonals. Before calling this routine, call ?gbtrf to compute the LU factorization of A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

trans Must be 'N' or 'T' or 'C'.

n The order of A; the number of rows in B; n≥ 0.

kl The number of subdiagonals within the band of A; kl≥ 0.

ku The number of superdiagonals within the band of A; ku≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

ab Array, size max(1, ldab*n).

The array ab contains elements of the LU factors of the matrix A as
returned by ?gbtrf.

b Array b size max(1, ldb*nrhs) for column major layout and max(1,
ldb*n) for row major layout.
The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.

ldab The leading dimension of the array ab; ldab≥ 2*kl + ku +1.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ipiv Array, size at least max(1, n). The ipiv array, as returned by ?gbtrf.
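The bound ldab ≥ 2*kl + ku + 1 follows from the band layout used by ?gbtrf: in column-major storage, A(i,j) (0-based) occupies row kl + ku + i - j of column j, and the top kl rows are left free for the extra superdiagonals of U created by row interchanges. The sketch below shows the mapping and a dense-to-band copy, assuming LAPACK_COL_MAJOR; band_lu_index and dense_to_band_lu are hypothetical helper names, not oneMKL routines.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of oneMKL: 0-based offset in ab of A(i,j)
 * (0-based) for the column-major band layout expected by ?gbtrf.
 * Valid for max(0, j-ku) <= i <= min(n-1, j+kl). */
size_t band_lu_index(int kl, int ku, int ldab, int i, int j)
{
    assert(ldab >= 2 * kl + ku + 1);
    assert(i - j <= kl && j - i <= ku);
    return (size_t)(kl + ku + i - j) + (size_t)j * (size_t)ldab;
}

/* Copy the band of a column-major n-by-n matrix a into ab. Rows 0..kl-1 of
 * ab are not written; ?gbtrf uses them for fill-in. */
void dense_to_band_lu(int n, int kl, int ku, const double *a, int lda,
                      double *ab, int ldab)
{
    for (int j = 0; j < n; ++j) {
        int lo = j - ku > 0 ? j - ku : 0;
        int hi = j + kl < n - 1 ? j + kl : n - 1;
        for (int i = lo; i <= hi; ++i)
            ab[band_lu_index(kl, ku, ldab, i, j)] = a[i + (size_t)j * lda];
    }
}
```

For a tridiagonal matrix (kl = ku = 1), ldab = 4 and the diagonal lands in row 2 of each column of ab, with the superdiagonal in row 1 and the subdiagonal in row 3.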

Output Parameters

b Overwritten by the solution matrix X.


Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
For each right-hand side b, the computed solution is the exact solution of a perturbed system of equations (A
+ E)x = b, where

$$|E| \le c(kl + ku + 1)\,\varepsilon\,P|L||U|$$

c(k) is a modest linear function of k, and ε is the machine precision.
If x0 is the true solution, the computed solution x satisfies this error bound:

where

$$\mathrm{cond}(A,x) = \frac{\|\,|A^{-1}||A||x|\,\|_\infty}{\|x\|_\infty} \le \|A^{-1}\|_\infty\,\|A\|_\infty = \kappa_\infty(A).$$

Note that cond(A,x) can be much smaller than κ∞(A); the condition number of A^T and A^H might or might
not be equal to κ∞(A).

The approximate number of floating-point operations for one right-hand side vector is 2n(ku + 2kl) for real
flavors. The number of operations for complex flavors is 4 times greater. All these estimates assume that kl
and ku are much less than n.
To estimate the condition number κ∞(A), call ?gbcon.

To refine the solution and estimate the error, call ?gbrfs.

See Also
Matrix Storage Schemes

?gttrs
Solves a system of linear equations with a tridiagonal
coefficient matrix using the LU factorization computed
by ?gttrf.

Syntax
lapack_int LAPACKE_sgttrs (int matrix_layout , char trans , lapack_int n , lapack_int
nrhs , const float * dl , const float * d , const float * du , const float * du2 ,
const lapack_int * ipiv , float * b , lapack_int ldb );
lapack_int LAPACKE_dgttrs (int matrix_layout , char trans , lapack_int n , lapack_int
nrhs , const double * dl , const double * d , const double * du , const double * du2 ,
const lapack_int * ipiv , double * b , lapack_int ldb );
lapack_int LAPACKE_cgttrs (int matrix_layout , char trans , lapack_int n , lapack_int
nrhs , const lapack_complex_float * dl , const lapack_complex_float * d , const
lapack_complex_float * du , const lapack_complex_float * du2 , const lapack_int *
ipiv , lapack_complex_float * b , lapack_int ldb );

lapack_int LAPACKE_zgttrs (int matrix_layout , char trans , lapack_int n , lapack_int
nrhs , const lapack_complex_double * dl , const lapack_complex_double * d , const
lapack_complex_double * du , const lapack_complex_double * du2 , const lapack_int *
ipiv , lapack_complex_double * b , lapack_int ldb );

Include Files
• mkl.h

Description

The routine solves for X the following systems of linear equations with multiple right-hand sides:

A*X = B if trans='N',

Aᵀ*X = B if trans='T',

Aᴴ*X = B if trans='C' (for complex matrices only).

Before calling this routine, you must call ?gttrf to compute the LU factorization of A.
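The substitution that follows the ?gttrf factorization can be sketched as below for trans = 'N', real data, and a single right-hand side. The sketch is modeled on the reference LAPACK kernel ?gtts2 rather than on MKL's own code. The function name gttrs_n_sketch is hypothetical, int stands in for lapack_int, and ipiv is assumed to hold 1-based indices as returned by ?gttrf, so ipiv[i] == i + 1 means that no rows were interchanged at step i.

```c
/* Overwrite b with the solution x of A*x = b (trans = 'N', one right-hand
   side, real data), given the ?gttrf factors dl, d, du, du2 and ipiv.
   ipiv holds 1-based row indices: ipiv[i] == i + 1 means that no rows were
   interchanged at step i, otherwise rows i and i + 1 were swapped. */
static void gttrs_n_sketch(int n, const double *dl, const double *d,
                           const double *du, const double *du2,
                           const int *ipiv, double *b)
{
    if (n <= 0)
        return;
    /* Forward sweep: solve L*y = b, replaying the interchanges. */
    for (int i = 0; i < n - 1; ++i) {
        if (ipiv[i] == i + 1) {
            b[i + 1] -= dl[i] * b[i];
        } else {
            double temp = b[i];
            b[i] = b[i + 1];
            b[i + 1] = temp - dl[i] * b[i];
        }
    }
    /* Backward sweep: solve U*x = y; U has diagonal d, first
       superdiagonal du and second superdiagonal du2. */
    b[n - 1] /= d[n - 1];
    if (n > 1)
        b[n - 2] = (b[n - 2] - du[n - 2] * b[n - 1]) / d[n - 2];
    for (int i = n - 3; i >= 0; --i)
        b[i] = (b[i] - du[i] * b[i + 1] - du2[i] * b[i + 2]) / d[i];
}
```

For the 2-by-2 permutation matrix A = [0 1; 1 0], the reference ?gttrf algorithm produces d = {1, 1}, dl = {0}, du = {0} and ipiv = {2, 2}; with b = {2, 3} the sketch returns x = {3, 2}.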

Input Parameters

matrix_layout Specifies whether matrix storage layout for array b is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

trans Must be 'N' or 'T' or 'C'.

Indicates the form of the equations:

If trans = 'N', then A*X = B is solved for X.

If trans = 'T', then Aᵀ*X = B is solved for X.

If trans = 'C', then Aᴴ*X = B is solved for X.

n The order of A; n≥ 0.

nrhs The number of right-hand sides, that is, the number of columns in B;
nrhs≥ 0.

dl, d, du, du2 Arrays: dl(n - 1), d(n), du(n - 1), du2(n - 2).

The array dl contains the (n - 1) multipliers that define the matrix L
from the LU factorization of A.
The array d contains the n diagonal elements of the upper triangular
matrix U from the LU factorization of A.
The array du contains the (n - 1) elements of the first superdiagonal
of U.
The array du2 contains the (n - 2) elements of the second
superdiagonal of U.

b Array of size max(1, ldb*nrhs) for column major layout and max(1,
n*ldb) for row major layout. Contains the matrix B whose columns
are the right-hand sides for the systems of equations.

ldb The leading dimension of b; ldb≥ max(1, n) for column major
layout and ldb≥nrhs for row major layout.

ipiv Array, size (n). The ipiv array, as returned by ?gttrf.
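The two ldb rules above follow from how B(i,j) is addressed in each layout. In column major layout each of the nrhs columns occupies ldb consecutive elements, so ldb must cover the n rows; in row major layout each of the n rows occupies ldb consecutive elements, so ldb must cover the nrhs columns. A minimal sketch with hypothetical helper names (b_offset, min_ldb), where a nonzero col_major flag plays the role of LAPACK_COL_MAJOR:

```c
#include <stddef.h>

/* Offset of B(i,j) (0-based, 0 <= i < n, 0 <= j < nrhs) in the array b.
   col_major != 0 stands for LAPACK_COL_MAJOR, col_major == 0 for
   LAPACK_ROW_MAJOR. */
static size_t b_offset(int col_major, int i, int j, int ldb)
{
    return col_major ? (size_t)i + (size_t)j * (size_t)ldb
                     : (size_t)i * (size_t)ldb + (size_t)j;
}

/* Smallest ldb allowed by the ldb rules for an n-by-nrhs matrix B. */
static int min_ldb(int col_major, int n, int nrhs)
{
    if (col_major)
        return n > 1 ? n : 1;   /* ldb >= max(1, n) */
    return nrhs;                /* ldb >= nrhs      */
}
```

The last element B(n - 1, nrhs - 1) then sits at offset (n - 1) + (nrhs - 1)*ldb in column major layout or (n - 1)*ldb + (nrhs - 1) in row major layout, which is why the array sizes are max(1, ldb*nrhs) and max(1, n*ldb), respectively.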

Output Parameters

b Overwritten by the solution matrix X.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
For each right-hand side b, the computed solution is the exact solution of a perturbed system of equations (A + E)x = b, where

|E| ≤ c(n) ε P|L||U|

c(n) is a modest linear function of n, and ε is the machine precision.
If x₀ is the true solution, the computed solution x satisfies this error bound:

||x - x₀||∞ / ||x||∞ ≤ c(n) cond(A,x) ε

where cond(A,x) = || |A⁻¹||A||x| ||∞ / ||x||∞ ≤ ||A⁻¹||∞ ||A||∞ = κ∞(A).

Note that cond(A,x) can be much smaller than κ∞(A); the condition number of Aᵀ and Aᴴ might or might
not be equal to κ∞(A).

The approximate number of floating-point operations for one right-hand side vector b is 7n (including n
divisions) for real flavors and 34n (including 2n divisions) for complex flavors.

To estimate the condition number κ∞(A), call ?gtcon.

To refine the solution and estimate the error, call ?gtrfs.

See Also
Matrix Storage Schemes

?dttrsb
Solves a system of linear equations with a diagonally
dominant tridiagonal coefficient matrix using the LU
factorization computed by ?dttrfb.

Syntax
void sdttrsb (const char * trans, const MKL_INT * n, const MKL_INT * nrhs, const float
* dl, const float * d, const float * du, float * b, const MKL_INT * ldb, MKL_INT *
info );
void ddttrsb (const char * trans, const MKL_INT * n, const MKL_INT * nrhs, const double
* dl, const double * d, const double * du, double * b, const MKL_INT * ldb, MKL_INT *
info );

void cdttrsb (const char * trans, const MKL_INT * n, const MKL_INT * nrhs, const
MKL_Complex8 * dl, const MKL_Complex8 * d, const MKL_Complex8 * du, MKL_Complex8 * b,
const MKL_INT * ldb, MKL_INT * info );
void zdttrsb (const char * trans, const MKL_INT * n, const MKL_INT * nrhs, const
MKL_Complex16 * dl, const MKL_Complex16 * d, const MKL_Complex16 * du, MKL_Complex16 *
b, const MKL_INT * ldb, MKL_INT * info );

Include Files
• mkl.h

Description

The ?dttrsb routine solves the following systems of linear equations with multiple right-hand sides for X:

A*X = B if trans='N',

Aᵀ*X = B if trans='T',

Aᴴ*X = B if trans='C' (for complex matrices only).

Before calling this routine, call ?dttrfb to compute the factorization of A.

Input Parameters

trans Must be 'N' or 'T' or 'C'.

Indicates the form of the equations solved for X:

If trans = 'N', then A*X = B.

If trans = 'T', then Aᵀ*X = B.

If trans = 'C', then Aᴴ*X = B.

n The order of A; n≥ 0.

nrhs The number of right-hand sides, that is, the number of columns in B;
nrhs≥ 0.

dl, d, du Arrays: dl(n - 1), d(n), du(n - 1).

The array dl contains the (n - 1) multipliers that define the
matrices L₁, L₂ from the factorization of A.
The array d contains the n diagonal elements of the upper triangular
matrix U from the factorization of A.
The array du contains the (n - 1) elements of the superdiagonal of U.

b Array of size max(1, ldb*nrhs). Contains the matrix B whose
columns are the right-hand sides for the systems of equations.

ldb The leading dimension of b; ldb≥ max(1, n).

Output Parameters

b Overwritten by the solution matrix X.

info If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

?potrs
Solves a system of linear equations with a Cholesky-
factored symmetric (Hermitian) positive-definite
coefficient matrix.

Syntax
lapack_int LAPACKE_spotrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const float * a , lapack_int lda , float * b , lapack_int ldb );
lapack_int LAPACKE_dpotrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const double * a , lapack_int lda , double * b , lapack_int ldb );
lapack_int LAPACKE_cpotrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const lapack_complex_float * a , lapack_int lda , lapack_complex_float * b ,
lapack_int ldb );
lapack_int LAPACKE_zpotrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const lapack_complex_double * a , lapack_int lda , lapack_complex_double * b ,
lapack_int ldb );

Include Files
• mkl.h

Description

The routine solves for X the system of linear equations A*X = B with a symmetric positive-definite or, for
complex data, Hermitian positive-definite matrix A, given the Cholesky factorization of A:

A = Uᵀ*U for real data, A = Uᴴ*U for complex data if uplo='U'

A = L*Lᵀ for real data, A = L*Lᴴ for complex data if uplo='L'

where L is a lower triangular matrix and U is upper triangular. The system is solved with multiple right-hand
sides stored in the columns of the matrix B.
Before calling this routine, you must call ?potrf to compute the Cholesky factorization of A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:

If uplo = 'U', U is stored, where A = Uᵀ*U for real data, A = Uᴴ*U
for complex data.
If uplo = 'L', L is stored, where A = L*Lᵀ for real data, A = L*Lᴴ for
complex data.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides (nrhs≥ 0).

a Array a of size at least max(1, lda*n).

The array a contains the factor U or L (see uplo) as returned by
?potrf.

lda The leading dimension of a. lda≥ max(1, n).

b The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations. The size of b must be at least
max(1, ldb*nrhs) for column major layout and max(1, ldb*n) for
row major layout.

ldb The leading dimension of b. ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.
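With uplo = 'U', the solve amounts to a forward substitution with Uᵀ followed by a back substitution with U, each costing about n² flops per right-hand side, which accounts for the 2n² estimate for real flavors in the Application Notes. The following minimal sketch covers real data, column major layout, and one right-hand side, and reads only the upper triangle of a. The name potrs_upper_sketch is hypothetical, and this is not MKL's implementation.

```c
#include <stddef.h>

/* Overwrite b with x such that A*x = b, where A = U^T*U and U is the upper
   triangle of the column-major array a (real data, one right-hand side).
   Entries below the diagonal of a are never read. */
static void potrs_upper_sketch(int n, const double *a, int lda, double *b)
{
    /* Forward substitution: U^T * y = b, using column i of U. */
    for (int i = 0; i < n; ++i) {
        double s = b[i];
        for (int k = 0; k < i; ++k)
            s -= a[k + (size_t)i * lda] * b[k];
        b[i] = s / a[i + (size_t)i * lda];
    }
    /* Back substitution: U * x = y, using row i of U. */
    for (int i = n - 1; i >= 0; --i) {
        double s = b[i];
        for (int k = i + 1; k < n; ++k)
            s -= a[i + (size_t)k * lda] * b[k];
        b[i] = s / a[i + (size_t)i * lda];
    }
}
```

For U = [2 1 0; 0 2 1; 0 0 2], A = Uᵀ*U = [4 2 0; 2 5 2; 0 2 5], and b = {6, 9, 7} yields x = {1, 1, 1}.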

Output Parameters

b Overwritten by the solution matrix X.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
If uplo = 'U', the computed solution for each right-hand side b is the exact solution of a perturbed system
of equations (A + E)x = b, where

|E| ≤ c(n) ε |Uᴴ||U|

c(n) is a modest linear function of n, and ε is the machine precision.
A similar estimate holds for uplo = 'L'. If x₀ is the true solution, the computed solution x satisfies this
error bound:

||x - x₀||∞ / ||x||∞ ≤ c(n) cond(A,x) ε

where cond(A,x) = || |A⁻¹||A||x| ||∞ / ||x||∞ ≤ ||A⁻¹||∞ ||A||∞ = κ∞(A).

Note that cond(A,x) can be much smaller than κ∞(A). The approximate number of floating-point operations
for one right-hand side vector b is 2n² for real flavors and 8n² for complex flavors.

To estimate the condition number κ∞(A), call ?pocon.

To refine the solution and estimate the error, call ?porfs.

See Also
Matrix Storage Schemes

?pftrs
Solves a system of linear equations with a Cholesky-
factored symmetric (Hermitian) positive-definite
coefficient matrix using the Rectangular Full Packed
(RFP) format.

Syntax
lapack_int LAPACKE_spftrs (int matrix_layout , char transr , char uplo , lapack_int n ,
lapack_int nrhs , const float * a , float * b , lapack_int ldb );
lapack_int LAPACKE_dpftrs (int matrix_layout , char transr , char uplo , lapack_int n ,
lapack_int nrhs , const double * a , double * b , lapack_int ldb );
lapack_int LAPACKE_cpftrs (int matrix_layout , char transr , char uplo , lapack_int n ,
lapack_int nrhs , const lapack_complex_float * a , lapack_complex_float * b ,
lapack_int ldb );
lapack_int LAPACKE_zpftrs (int matrix_layout , char transr , char uplo , lapack_int n ,
lapack_int nrhs , const lapack_complex_double * a , lapack_complex_double * b ,
lapack_int ldb );

Include Files
• mkl.h

Description

The routine solves a system of linear equations A*X = B with a symmetric positive-definite or, for complex
data, Hermitian positive-definite matrix A using the Cholesky factorization of A:

A = Uᵀ*U for real data, A = Uᴴ*U for complex data if uplo='U'

A = L*Lᵀ for real data, A = L*Lᴴ for complex data if uplo='L'

Before calling ?pftrs, you must call ?pftrf to compute the Cholesky factorization of A. L stands for a lower
triangular matrix and U for an upper triangular matrix.
The matrix A is in the Rectangular Full Packed (RFP) format. For the description of the RFP format, see Matrix
Storage Schemes.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

transr Must be 'N', 'T' (for real data) or 'C' (for complex data).

If transr = 'N', the untransposed factor of A is stored in RFP format.

If transr = 'T', the transposed factor of A is stored in RFP format.

If transr = 'C', the conjugate-transposed factor of A is stored in RFP
format.

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:

If uplo = 'U', U is stored, where A = Uᵀ*U for real data, A = Uᴴ*U
for complex data.

If uplo = 'L', L is stored, where A = L*Lᵀ for real data, A = L*Lᴴ for
complex data.

n The order of the matrix A; n≥ 0.

nrhs The number of right-hand sides, that is, the number of columns of the
matrix B; nrhs≥ 0.

a Array a of size max(1,n*(n + 1)/2).

The array a contains, in the RFP format, the factor U or L obtained by
factorization of matrix A.

b The array b of size max(1, ldb*nrhs) for column major layout and
max(1,ldb*n) for row major layout contains the matrix B whose
columns are the right-hand sides for the systems of equations.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

Output Parameters

b The solution matrix X.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, parameter i had an illegal value.

See Also
Matrix Storage Schemes

?pptrs
Solves a system of linear equations with a packed
Cholesky-factored symmetric (Hermitian) positive-
definite coefficient matrix.

Syntax
lapack_int LAPACKE_spptrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const float * ap , float * b , lapack_int ldb );
lapack_int LAPACKE_dpptrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const double * ap , double * b , lapack_int ldb );
lapack_int LAPACKE_cpptrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const lapack_complex_float * ap , lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_zpptrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const lapack_complex_double * ap , lapack_complex_double * b , lapack_int ldb );

Include Files
• mkl.h

Description
The routine solves for X the system of linear equations A*X = B with a packed symmetric positive-definite or,
for complex data, Hermitian positive-definite matrix A, given the Cholesky factorization of A:

A = Uᵀ*U for real data, A = Uᴴ*U for complex data if uplo='U'

A = L*Lᵀ for real data, A = L*Lᴴ for complex data if uplo='L'

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:

If uplo = 'U', U is stored, where A = Uᵀ*U for real data, A = Uᴴ*U
for complex data.
If uplo = 'L', L is stored, where A = L*Lᵀ for real data, A = L*Lᴴ for
complex data.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides (nrhs≥ 0).

ap Array ap of size at least max(1, n(n + 1)/2).

The array ap contains the factor U or L, as specified by uplo, in
packed storage (see Matrix Storage Schemes).

b The array b of size max(1, ldb*nrhs) for column major layout and
max(1, ldb*n) for row major layout contains the matrix B whose
columns are the right-hand sides for the systems of equations.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.
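In column major packed storage, element (i,j) (0-based) of the stored triangle sits at offset i + j(j + 1)/2 when uplo = 'U' and at i + j(2n - j - 1)/2 when uplo = 'L'. The sketch below uses the lower-triangle formula to carry out the two triangular solves, L*y = b and then Lᵀ*x = y, for real data and one right-hand side. The names pp_upper, pp_lower, and pptrs_lower_sketch are hypothetical; row major layout and complex data are not covered.

```c
#include <stddef.h>

/* 0-based offsets in column-major packed storage of an n-by-n triangle. */
static size_t pp_upper(int i, int j)          /* U(i,j), i <= j */
{
    return (size_t)i + (size_t)j * (size_t)(j + 1) / 2;
}

static size_t pp_lower(int i, int j, int n)   /* L(i,j), i >= j */
{
    return (size_t)i + (size_t)j * (size_t)(2 * n - j - 1) / 2;
}

/* Overwrite b with x such that A*x = b, where A = L*L^T and L is held in
   packed lower storage ap (real data, one right-hand side). */
static void pptrs_lower_sketch(int n, const double *ap, double *b)
{
    for (int i = 0; i < n; ++i) {              /* L * y = b   */
        double s = b[i];
        for (int k = 0; k < i; ++k)
            s -= ap[pp_lower(i, k, n)] * b[k];
        b[i] = s / ap[pp_lower(i, i, n)];
    }
    for (int i = n - 1; i >= 0; --i) {         /* L^T * x = y */
        double s = b[i];
        for (int k = i + 1; k < n; ++k)
            s -= ap[pp_lower(k, i, n)] * b[k];
        b[i] = s / ap[pp_lower(i, i, n)];
    }
}
```

For L = [2 0 0; 1 2 0; 0 1 2], packed by columns as ap = {2, 1, 0, 2, 1, 2}, A = L*Lᵀ = [4 2 0; 2 5 2; 0 2 5], and b = {6, 9, 7} yields x = {1, 1, 1}.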

Output Parameters

b Overwritten by the solution matrix X.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
If uplo = 'U', the computed solution for each right-hand side b is the exact solution of a perturbed system
of equations (A + E)x = b, where

|E| ≤ c(n) ε |Uᴴ||U|

c(n) is a modest linear function of n, and ε is the machine precision.
A similar estimate holds for uplo = 'L'.

If x₀ is the true solution, the computed solution x satisfies this error bound:

||x - x₀||∞ / ||x||∞ ≤ c(n) cond(A,x) ε

where cond(A,x) = || |A⁻¹||A||x| ||∞ / ||x||∞ ≤ ||A⁻¹||∞ ||A||∞ = κ∞(A).

Note that cond(A,x) can be much smaller than κ∞(A).

The approximate number of floating-point operations for one right-hand side vector b is 2n² for real flavors
and 8n² for complex flavors.

To estimate the condition number κ∞(A), call ?ppcon.

To refine the solution and estimate the error, call ?pprfs.

See Also
Matrix Storage Schemes

?pbtrs
Solves a system of linear equations with a Cholesky-
factored symmetric (Hermitian) positive-definite band
coefficient matrix.

Syntax
lapack_int LAPACKE_spbtrs (int matrix_layout , char uplo , lapack_int n , lapack_int
kd , lapack_int nrhs , const float * ab , lapack_int ldab , float * b , lapack_int
ldb );
lapack_int LAPACKE_dpbtrs (int matrix_layout , char uplo , lapack_int n , lapack_int
kd , lapack_int nrhs , const double * ab , lapack_int ldab , double * b , lapack_int
ldb );
lapack_int LAPACKE_cpbtrs (int matrix_layout , char uplo , lapack_int n , lapack_int
kd , lapack_int nrhs , const lapack_complex_float * ab , lapack_int ldab ,
lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_zpbtrs (int matrix_layout , char uplo , lapack_int n , lapack_int
kd , lapack_int nrhs , const lapack_complex_double * ab , lapack_int ldab ,
lapack_complex_double * b , lapack_int ldb );

Include Files
• mkl.h

Description

The routine solves for X a system of linear equations A*X = B with a symmetric positive-definite or,
for complex data, Hermitian positive-definite band matrix A, given the Cholesky factorization of A:

A = Uᵀ*U for real data, A = Uᴴ*U for complex data if uplo='U'

A = L*Lᵀ for real data, A = L*Lᴴ for complex data if uplo='L'

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:

If uplo = 'U', U is stored in ab, where A = Uᵀ*U for real matrices
and A = Uᴴ*U for complex matrices.
If uplo = 'L', L is stored in ab, where A = L*Lᵀ for real matrices and
A = L*Lᴴ for complex matrices.

n The order of matrix A; n≥ 0.

kd The number of superdiagonals or subdiagonals in the matrix A; kd≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

ab Array ab of size max(1, ldab*n).

The array ab contains the Cholesky factor, as returned by the
factorization routine, in band storage form.

b The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.
The size of b is at least max(1, ldb*nrhs) for column major layout
and max(1, ldb*n) for row major layout.

ldab The leading dimension of the array ab; ldab≥kd +1.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.
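In column major band storage with uplo = 'U', element U(i,j) (0-based, max(0, j - kd) ≤ i ≤ j) is stored in row kd + i - j of column j, which is why ldab ≥ kd + 1 suffices. Each triangular solve touches at most kd off-diagonal entries per row, about 2n*kd flops, consistent with the 4n*kd total for real flavors given in the Application Notes. The following minimal sketch covers real data and one right-hand side. The names ub and pbtrs_upper_sketch are hypothetical, and this is not MKL's implementation.

```c
#include <stddef.h>

/* U(i,j), max(0, j - kd) <= i <= j, in column-major upper band storage. */
static double ub(const double *ab, int ldab, int kd, int i, int j)
{
    return ab[(size_t)(kd + i - j) + (size_t)j * (size_t)ldab];
}

/* Overwrite b with x such that A*x = b, where A = U^T*U and U is held in
   upper band storage ab with kd superdiagonals (real data, one RHS). */
static void pbtrs_upper_sketch(int n, int kd, const double *ab, int ldab,
                               double *b)
{
    for (int i = 0; i < n; ++i) {                    /* U^T * y = b */
        double s = b[i];
        int k0 = (i - kd > 0) ? i - kd : 0;
        for (int k = k0; k < i; ++k)
            s -= ub(ab, ldab, kd, k, i) * b[k];
        b[i] = s / ub(ab, ldab, kd, i, i);
    }
    for (int i = n - 1; i >= 0; --i) {               /* U * x = y */
        double s = b[i];
        int k1 = (i + kd < n - 1) ? i + kd : n - 1;
        for (int k = i + 1; k <= k1; ++k)
            s -= ub(ab, ldab, kd, i, k) * b[k];
        b[i] = s / ub(ab, ldab, kd, i, i);
    }
}
```

For U = [2 1 0; 0 2 1; 0 0 2] with kd = 1 and ldab = 2, the band array is {*, 2, 1, 2, 1, 2}, where ab[0] is never referenced.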

Output Parameters

b Overwritten by the solution matrix X.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
For each right-hand side b, the computed solution is the exact solution of a perturbed system of equations (A + E)x = b, where

|E| ≤ c(kd + 1) ε |Uᴴ||U| or |E| ≤ c(kd + 1) ε |L||Lᴴ|

c(k) is a modest linear function of k, and ε is the machine precision.
If x₀ is the true solution, the computed solution x satisfies this error bound:

||x - x₀||∞ / ||x||∞ ≤ c(n) cond(A,x) ε

where cond(A,x) = || |A⁻¹||A||x| ||∞ / ||x||∞ ≤ ||A⁻¹||∞ ||A||∞ = κ∞(A).

Note that cond(A,x) can be much smaller than κ∞(A).

The approximate number of floating-point operations for one right-hand side vector is 4n*kd for real flavors
and 16n*kd for complex flavors.
To estimate the condition number κ∞(A), call ?pbcon.

To refine the solution and estimate the error, call ?pbrfs.

See Also
Matrix Storage Schemes

?pttrs
Solves a system of linear equations with a symmetric
(Hermitian) positive-definite tridiagonal coefficient
matrix using the factorization computed by ?pttrf.

Syntax
lapack_int LAPACKE_spttrs( int matrix_layout, lapack_int n, lapack_int nrhs, const
float* d, const float* e, float* b, lapack_int ldb );
lapack_int LAPACKE_dpttrs( int matrix_layout, lapack_int n, lapack_int nrhs, const
double* d, const double* e, double* b, lapack_int ldb );
lapack_int LAPACKE_cpttrs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const float* d, const lapack_complex_float* e, lapack_complex_float* b, lapack_int
ldb );
lapack_int LAPACKE_zpttrs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const double* d, const lapack_complex_double* e, lapack_complex_double* b, lapack_int
ldb );

Include Files
• mkl.h

Description

The routine solves for X a system of linear equations A*X = B with a symmetric (Hermitian) positive-definite
tridiagonal matrix A. Before calling this routine, call ?pttrf to compute the L*D*Lᵀ or Uᵀ*D*U factorization
of A for real data or the L*D*Lᴴ or Uᴴ*D*U factorization of A for complex data.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Used for cpttrs/zpttrs only. Must be 'U' or 'L'.

Specifies whether the superdiagonal or the subdiagonal of the
tridiagonal matrix A is stored and how A is factored:
If uplo = 'U', the array e stores the conjugated values of the
superdiagonal of U, and A is factored as Uᴴ*D*U.

If uplo = 'L', the array e stores the subdiagonal of L, and A is
factored as L*D*Lᴴ.

n The order of A; n≥ 0.

nrhs The number of right-hand sides, that is, the number of columns of the
matrix B; nrhs≥ 0.

d Array, dimension (n). Contains the diagonal elements of the diagonal
matrix D from the factorization computed by ?pttrf.

e Array e of size (n - 1).

The array e contains the (n - 1) subdiagonal elements of the unit
bidiagonal factor L or the conjugated values of the superdiagonal of U
from the factorization computed by ?pttrf (see uplo).

b The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.
The size of b is at least max(1, ldb*nrhs) for column major layout
and max(1, ldb*n) for row major layout.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.
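For real data the factorization is A = L*D*Lᵀ, with L unit lower bidiagonal (subdiagonal e) and D = diag(d). The solve is therefore a forward sweep with L, a diagonal scaling with D, and a backward sweep with Lᵀ. The following minimal sketch handles one right-hand side and is modeled on the reference LAPACK kernel ?ptts2. The name pttrs_sketch is hypothetical; complex data and the uplo option are not covered.

```c
/* Overwrite b with x such that A*x = b, where A = L*D*L^T from ?pttrf:
   L is unit lower bidiagonal with subdiagonal e[0..n-2] and
   D = diag(d[0..n-1]) (real data, one right-hand side). */
static void pttrs_sketch(int n, const double *d, const double *e, double *b)
{
    if (n <= 0)
        return;
    for (int i = 1; i < n; ++i)          /* L * y = b */
        b[i] -= e[i - 1] * b[i - 1];
    b[n - 1] /= d[n - 1];                /* D * z = y, then L^T * x = z */
    for (int i = n - 2; i >= 0; --i)
        b[i] = b[i] / d[i] - e[i] * b[i + 1];
}
```

With d = {2, 2, 2} and e = {0.5, 0.5}, A = [2 1 0; 1 2.5 1; 0 1 2.5], and b = {3, 4.5, 3.5} yields x = {1, 1, 1}.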

Output Parameters

b Overwritten by the solution matrix X.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, parameter i had an illegal value.

See Also
Matrix Storage Schemes

?sytrs
Solves a system of linear equations with a UDUᵀ- or
LDLᵀ-factored symmetric coefficient matrix.

Syntax
lapack_int LAPACKE_ssytrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const float * a , lapack_int lda , const lapack_int * ipiv , float * b ,
lapack_int ldb );
lapack_int LAPACKE_dsytrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const double * a , lapack_int lda , const lapack_int * ipiv , double * b ,
lapack_int ldb );
lapack_int LAPACKE_csytrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const lapack_complex_float * a , lapack_int lda , const lapack_int * ipiv ,
lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_zsytrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const lapack_complex_double * a , lapack_int lda , const lapack_int * ipiv ,
lapack_complex_double * b , lapack_int ldb );

Include Files
• mkl.h

Description

The routine solves for X the system of linear equations A*X = B with a symmetric matrix A, given the Bunch-
Kaufman factorization of A:

if uplo='U', A = U*D*Uᵀ

if uplo='L', A = L*D*Lᵀ,

where U and L are upper and lower triangular matrices with unit diagonal and D is a symmetric block-
diagonal matrix. The system is solved with multiple right-hand sides stored in the columns of the matrix B.
You must supply to this routine the factor U (or L) and the array ipiv returned by the factorization
routine ?sytrf.
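The ipiv array passed here also encodes the block structure of D. Under the ?sytrf convention for uplo = 'L', ipiv[k] > 0 (1-based value) marks a 1-by-1 block, while ipiv[k] = ipiv[k+1] < 0 marks a 2-by-2 block in rows and columns k and k + 1. For uplo = 'U', the 2-by-2 marker is ipiv[k-1] = ipiv[k] < 0, so the array is scanned from the bottom instead. The following minimal sketch performs the uplo = 'L' scan. The name count_d_blocks_lower is hypothetical, and int stands in for lapack_int.

```c
/* Count the 1-by-1 (n1) and 2-by-2 (n2) diagonal blocks of D described by
   an ipiv array from ?sytrf with uplo = 'L' (values are 1-based). */
static void count_d_blocks_lower(int n, const int *ipiv, int *n1, int *n2)
{
    *n1 = 0;
    *n2 = 0;
    for (int k = 0; k < n;) {
        if (ipiv[k] > 0) {        /* 1-by-1 block in row/column k */
            ++*n1;
            k += 1;
        } else {                  /* ipiv[k] == ipiv[k + 1] < 0: 2-by-2 block */
            ++*n2;
            k += 2;
        }
    }
}
```

For example, ipiv = {1, -3, -3, 4} describes a D made of a 1-by-1, a 2-by-2, and a 1-by-1 block.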

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:

If uplo = 'U', the array a stores the upper triangular factor U of the
factorization A = U*D*Uᵀ.

If uplo = 'L', the array a stores the lower triangular factor L of the
factorization A = L*D*Lᵀ.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

ipiv Array, size at least max(1, n). The ipiv array, as returned by ?sytrf.

a The array a of size max(1, lda*n) contains the factor U or L (see
uplo).

b The array b contains the matrix B whose columns are the right-hand
sides for the system of equations.
The size of b is at least max(1, ldb*nrhs) for column major layout
and max(1, ldb*n) for row major layout.

lda The leading dimension of a; lda≥ max(1, n).

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

Output Parameters

b Overwritten by the solution matrix X.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
For each right-hand side b, the computed solution is the exact solution of a perturbed system of equations (A + E)x = b, where

|E| ≤ c(n) ε P|U||D||Uᵀ|Pᵀ or |E| ≤ c(n) ε P|L||D||Lᵀ|Pᵀ

c(n) is a modest linear function of n, and ε is the machine precision.
If x₀ is the true solution, the computed solution x satisfies this error bound:

||x - x₀||∞ / ||x||∞ ≤ c(n) cond(A,x) ε

where cond(A,x) = || |A⁻¹||A||x| ||∞ / ||x||∞ ≤ ||A⁻¹||∞ ||A||∞ = κ∞(A).

Note that cond(A,x) can be much smaller than κ∞(A).

The total number of floating-point operations for one right-hand side vector is approximately 2n² for real
flavors or 8n² for complex flavors.

To estimate the condition number κ∞(A), call ?sycon.

To refine the solution and estimate the error, call ?syrfs.

See Also
Matrix Storage Schemes

?sytrs_aa
Solves a system of linear equations A * X = B with a
symmetric matrix.
lapack_int LAPACKE_ssytrs_aa (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, const float * A, lapack_int lda, const lapack_int * ipiv, float * B, lapack_int
ldb);
lapack_int LAPACKE_dsytrs_aa (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, const double * A, lapack_int lda, const lapack_int * ipiv, double * B, lapack_int
ldb);
lapack_int LAPACKE_csytrs_aa (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, const lapack_complex_float * A, lapack_int lda, const lapack_int * ipiv,
lapack_complex_float * B, lapack_int ldb);
lapack_int LAPACKE_zsytrs_aa (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, const lapack_complex_double * A, lapack_int lda, const lapack_int * ipiv,
lapack_complex_double * B, lapack_int ldb);

Description
?sytrs_aa solves a system of linear equations A * X = B with a symmetric matrix A using the factorization
A = U*T*Uᵀ or A = L*T*Lᵀ computed by ?sytrf_aa.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

uplo Specifies whether the details of the factorization are stored as an upper or
lower triangular matrix.

• = 'U': Upper triangular; the form is A = U*T*Uᵀ.

• = 'L': Lower triangular; the form is A = L*T*Lᵀ.

n The order of the matrix A. n ≥ 0.

nrhs The number of right-hand sides; that is, the number of columns of the
matrix B. nrhs ≥ 0.

A Array of size max(1, lda*n). Details of factors computed by ?sytrf_aa.

lda The leading dimension of the array A; lda ≥ max(1, n).

ipiv Array of size n. Details of the interchanges as computed by ?sytrf_aa.

B Array of size max(1, ldb*nrhs) for column-major layout and max(1, ldb*n) for row-major layout. On entry, the right-hand side matrix B.

ldb The leading dimension of the array B. ldb ≥ max(1, n) for column-major
layout and ldb ≥ nrhs for row-major layout.

Output Parameters

B On exit, the solution matrix X.

Return Values
This function returns a value info.

= 0: Successful exit.
< 0: If info = -i, the ith argument had an illegal value.

?sytrs_rook
Solves a system of linear equations with a UDU- or
LDL-factored symmetric coefficient matrix.

Syntax
lapack_int LAPACKE_ssytrs_rook (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, const float * a, lapack_int lda, const lapack_int * ipiv, float * b, lapack_int
ldb);
lapack_int LAPACKE_dsytrs_rook (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, const double * a, lapack_int lda, const lapack_int * ipiv, double * b, lapack_int
ldb);
lapack_int LAPACKE_csytrs_rook (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, const lapack_complex_float * a, lapack_int lda, const lapack_int * ipiv,
lapack_complex_float * b, lapack_int ldb);
lapack_int LAPACKE_zsytrs_rook (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, const lapack_complex_double * a, lapack_int lda, const lapack_int * ipiv,
lapack_complex_double * b, lapack_int ldb);

Include Files
• mkl.h

Description

The routine solves a system of linear equations A*X = B with a symmetric matrix A, using the factorization
A = U*D*Uᵀ or A = L*D*Lᵀ computed by ?sytrf_rook.

Input Parameters

matrix_layout Specifies whether matrix storage layout for array b is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:

If uplo = 'U', the factorization is of the form A = U*D*Uᵀ.

If uplo = 'L', the factorization is of the form A = L*D*Lᵀ.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

ipiv Array, size at least max(1, n). The ipiv array, as returned
by ?sytrf_rook.

a, b Arrays: a of size max(1, lda*n); b of size max(1, ldb*nrhs) for column major layout and max(1, ldb*n) for row major layout.

The array a contains the block diagonal matrix D and the multipliers
used to obtain U or L as computed by ?sytrf_rook (see uplo).

The array b contains the matrix B whose columns are the right-hand
sides for the system of equations.

lda The leading dimension of a; lda≥ max(1, n).

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

Output Parameters

b Overwritten by the solution matrix X.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The total number of floating-point operations for one right-hand side vector is approximately 2n² for real
flavors or 8n² for complex flavors.

See Also
Matrix Storage Schemes

?hetrs
Solves a system of linear equations with a UDUᴴ- or
LDLᴴ-factored Hermitian coefficient matrix.

Syntax
lapack_int LAPACKE_chetrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const lapack_complex_float * a , lapack_int lda , const lapack_int * ipiv ,
lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_zhetrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const lapack_complex_double * a , lapack_int lda , const lapack_int * ipiv ,
lapack_complex_double * b , lapack_int ldb );

Include Files
• mkl.h

Description

The routine solves for X the system of linear equations A*X = B with a Hermitian matrix A, given the Bunch-
Kaufman factorization of A:

if uplo='U', A = U*D*Uᴴ

if uplo='L', A = L*D*Lᴴ.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:

If uplo = 'U', the array a stores the upper triangular factor U of the
factorization A = U*D*Uᴴ.

If uplo = 'L', the array a stores the lower triangular factor L of the
factorization A = L*D*Lᴴ.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

ipiv Array, size at least max(1, n).

The ipiv array, as returned by ?hetrf.

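a The array a of size max(1, lda*n) contains the factor U or L (see
uplo).

b The array b contains the matrix B whose columns are the right-hand
sides for the system of equations.
The size of b is at least max(1, ldb*nrhs) for column major layout
and max(1, ldb*n) for row major layout.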

lda The leading dimension of a; lda≥ max(1, n).

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.
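The leading-dimension and array-size rules for b recur throughout this section. A minimal sketch of them in C follows; min_ldb, min_b_size, and b_offset are hypothetical helper names, and the two layout constants are defined locally with the values used by lapacke.h only so that the sketch compiles without MKL.

```c
#include <stddef.h>

/* Layout constants with the values used by lapacke.h; defined here only
 * so that this sketch compiles on its own. */
#ifndef LAPACK_ROW_MAJOR
#define LAPACK_ROW_MAJOR 101
#endif
#ifndef LAPACK_COL_MAJOR
#define LAPACK_COL_MAJOR 102
#endif

/* Smallest ldb allowed for an n-by-nrhs right-hand side matrix B. */
long min_ldb(int matrix_layout, long n, long nrhs)
{
    if (matrix_layout == LAPACK_ROW_MAJOR)
        return nrhs;                  /* ldb >= nrhs for row major */
    return n > 1 ? n : 1;             /* ldb >= max(1, n) for column major */
}

/* Minimum number of elements b must hold for a given ldb:
 * max(1, ldb*nrhs) for column major, max(1, ldb*n) for row major. */
long min_b_size(int matrix_layout, long n, long nrhs, long ldb)
{
    /* number of ldb-strided lines: rows (row major) or columns (column major) */
    long lines = (matrix_layout == LAPACK_ROW_MAJOR) ? n : nrhs;
    long size = ldb * lines;
    return size > 1 ? size : 1;
}

/* Offset of B(i,j) (0-based row i, column j) within the array b. */
size_t b_offset(int matrix_layout, size_t i, size_t j, size_t ldb)
{
    return matrix_layout == LAPACK_ROW_MAJOR ? i * ldb + j : i + j * ldb;
}
```

For example, a right-hand side with n = 5 rows and nrhs = 2 columns needs ldb ≥ 5 in column major layout and ldb ≥ 2 in row major layout; with those minimal values b holds 10 elements either way.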

Output Parameters

b Overwritten by the solution matrix X.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
For each right-hand side b, the computed solution is the exact solution of a perturbed system of equations
(A + E)x = b, where

$$|E| \le c(n)\,\varepsilon\,P\,|U|\,|D|\,|U^H|\,P^T \quad\text{or}\quad |E| \le c(n)\,\varepsilon\,P\,|L|\,|D|\,|L^H|\,P^T$$

c(n) is a modest linear function of n, and ε is the machine precision.
If x0 is the true solution, the computed solution x satisfies this error bound:

$$\frac{\|x - x_0\|_\infty}{\|x\|_\infty} \le c(n)\,\mathrm{cond}(A,x)\,\varepsilon$$

where

$$\mathrm{cond}(A,x) = \frac{\|\,|A^{-1}|\,|A|\,|x|\,\|_\infty}{\|x\|_\infty} \le \|A^{-1}\|_\infty\,\|A\|_\infty = \kappa_\infty(A).$$

Note that cond(A,x) can be much smaller than κ∞(A).

The total number of floating-point operations for one right-hand side vector is approximately 8n^2.

To estimate the condition number κ∞(A), call ?hecon.

To refine the solution and estimate the error, call ?herfs.

See Also
Matrix Storage Schemes

?hetrs_aa
Solves a system of linear equations A*X = B with a
complex Hermitian matrix.
LAPACK_DECL lapack_int LAPACKE_chetrs_aa (int matrix_layout, char uplo, lapack_int n,
lapack_int nrhs, const lapack_complex_float * a, lapack_int lda, const lapack_int *
ipiv, lapack_complex_float * b, lapack_int ldb );
LAPACK_DECL lapack_int LAPACKE_zhetrs_aa (int matrix_layout, char uplo, lapack_int n,
lapack_int nrhs, const lapack_complex_double * a, lapack_int lda, const lapack_int *
ipiv, lapack_complex_double * b, lapack_int ldb );

Description
?hetrs_aa solves a system of linear equations A*X = B with a complex Hermitian matrix A using the
factorization A = U * T * U^H or A = L * T * L^H computed by ?hetrf_aa.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

uplo Specifies whether the details of the factorization are stored as an upper or
lower triangular matrix.
If uplo = 'U': Upper triangular of the form A = U * T * U^H.

If uplo = 'L': Lower triangular of the form A = L * T * L^H.

n The order of the matrix A. n≥ 0.

nrhs The number of right-hand sides, that is, the number of columns of the matrix B;
nrhs ≥ 0.

a Array of size lda*n. Details of factors computed by ?hetrf_aa.

lda The leading dimension of the array a. lda≥ max(1,n).

ipiv Array of size (n). Details of the interchanges as computed by ?hetrf_aa.

b Array of size ldb*nrhs. On entry, the right-hand side matrix B.

ldb The leading dimension of the array b. ldb≥ max(1, n).

work See Syntax - Workspace. Array of size (max(1, lwork)).

lwork See Syntax - Workspace. lwork≥ max(1, 3*n-2).
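The lwork bound above translates directly into code. The sketch below, with the hypothetical names hetrs_aa_min_lwork and alloc_hetrs_aa_work, computes the minimum workspace length and allocates a zeroed buffer of that length; the element size is passed in so that the same helper serves both the single and double precision complex flavors.

```c
#include <stdlib.h>

/* Minimum workspace length for the _work interface: lwork >= max(1, 3*n-2). */
long hetrs_aa_min_lwork(long n)
{
    long need = 3 * n - 2;
    return need > 1 ? need : 1;
}

/* Hypothetical helper: allocates a zeroed workspace of the minimum length.
 * elem_size is the size of one workspace element, for example
 * sizeof(lapack_complex_double) for the z flavor. On return *lwork holds
 * the length to pass alongside the buffer. The caller frees the buffer;
 * a NULL result means the allocation failed. */
void *alloc_hetrs_aa_work(long n, size_t elem_size, long *lwork)
{
    *lwork = hetrs_aa_min_lwork(n);
    return calloc((size_t)*lwork, elem_size);
}
```

With MKL, a caller would pass sizeof(lapack_complex_double) as elem_size, hand the buffer and lwork to LAPACKE_zhetrs_aa_work as work and lwork, and free the buffer afterwards.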

Output Parameters

b On exit, the solution matrix X.

Return Values
This function returns a value info.

If info = 0: successful exit.

If info < 0: if info = -i, the i-th argument had an illegal value.

Syntax - Workspace
Use this interface if you want to explicitly provide the workspace array.
LAPACK_DECL lapack_int LAPACKE_chetrs_aa_work (int matrix_layout, char uplo, lapack_int
n, lapack_int nrhs, const lapack_complex_float * a, lapack_int lda, const lapack_int *
ipiv, lapack_complex_float * b, lapack_int ldb, lapack_complex_float * work, lapack_int
lwork );
LAPACK_DECL lapack_int LAPACKE_zhetrs_aa_work (int matrix_layout, char uplo, lapack_int
n, lapack_int nrhs, const lapack_complex_double * a, lapack_int lda, const lapack_int *
ipiv, lapack_complex_double * b, lapack_int ldb, lapack_complex_double * work,
lapack_int lwork );

?hetrs_rook
Solves a system of linear equations with a UDU- or
LDL-factored Hermitian coefficient matrix.

Syntax
lapack_int LAPACKE_chetrs_rook (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, const lapack_complex_float * a, lapack_int lda, const lapack_int * ipiv,
lapack_complex_float * b, lapack_int ldb);
lapack_int LAPACKE_zhetrs_rook (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, const lapack_complex_double * a, lapack_int lda, const lapack_int * ipiv,
lapack_complex_double * b, lapack_int ldb);

Include Files
• mkl.h

Description

The routine solves a system of linear equations A*X = B with a complex Hermitian matrix A using the
factorization A = U*D*U^H or A = L*D*L^H computed by ?hetrf_rook.

Input Parameters

matrix_layout Specifies whether matrix storage layout for array b is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:

If uplo = 'U', the factorization is of the form A = U*D*U^H.

If uplo = 'L', the factorization is of the form A = L*D*L^H.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

ipiv Array, size at least max(1, n).

The ipiv array, as returned by ?hetrf_rook.

a, b Arrays: a of size max(1, lda*n); b of size max(1, ldb*nrhs) for column major layout and max(1, ldb*n) for row major layout.

The array a contains the block diagonal matrix D and the multipliers
used to obtain the factor U or L as computed by ?hetrf_rook (see
uplo).
The array b contains the matrix B whose columns are the right-hand
sides for the system of equations.

lda The leading dimension of a; lda≥ max(1, n).

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

Output Parameters

b Overwritten by the solution matrix X.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

?sytrs2
Solves a system of linear equations with a UDU- or
LDL-factored symmetric coefficient matrix.

Syntax
lapack_int LAPACKE_ssytrs2 (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const float * a , lapack_int lda , const lapack_int * ipiv , float * b ,
lapack_int ldb );
lapack_int LAPACKE_dsytrs2 (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const double * a , lapack_int lda , const lapack_int * ipiv , double * b ,
lapack_int ldb );
lapack_int LAPACKE_csytrs2 (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const lapack_complex_float * a , lapack_int lda , const lapack_int * ipiv ,
lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_zsytrs2 (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const lapack_complex_double * a , lapack_int lda , const lapack_int * ipiv ,
lapack_complex_double * b , lapack_int ldb );

Include Files
• mkl.h

Description
The routine solves a system of linear equations A*X = B with a symmetric matrix A using the factorization of
A:

if uplo='U', A = U*D*U^T

if uplo='L', A = L*D*L^T

where

• U and L are upper and lower triangular matrices with unit diagonal
• D is a symmetric block-diagonal matrix.
The factorization is computed by ?sytrf.
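For intuition, the sketch below solves A*X = B for A = U*D*U^T in the simplest case: real double precision, column-major storage, uplo = 'U', no interchanges, and only 1-by-1 diagonal blocks, so D is diagonal and the multipliers of U sit strictly above the diagonal of a. The function name sym_udut_solve is hypothetical; the library routine also applies the interchanges recorded in ipiv and handles 2-by-2 blocks, which this sketch ignores.

```c
#include <stddef.h>

/* Hypothetical sketch: solves A*X = B in place for A = U*D*U^T with U unit
 * upper triangular (strictly above the diagonal of a) and D diagonal (on
 * the diagonal of a); assumes no interchanges and no 2-by-2 blocks.
 * Column-major storage for a and b. */
void sym_udut_solve(int n, int nrhs, const double *a, int lda,
                    double *b, int ldb)
{
    for (int r = 0; r < nrhs; ++r) {
        double *x = b + (size_t)r * ldb;   /* column r of B */

        /* 1. U*y = b: back substitution, column by column from the right. */
        for (int j = n - 1; j >= 0; --j)
            for (int i = 0; i < j; ++i)
                x[i] -= a[i + (size_t)j * lda] * x[j];

        /* 2. D*z = y. */
        for (int k = 0; k < n; ++k)
            x[k] /= a[k + (size_t)k * lda];

        /* 3. U^T*x = z: forward substitution with the transposed multipliers. */
        for (int j = 0; j < n; ++j)
            for (int i = 0; i < j; ++i)
                x[j] -= a[i + (size_t)j * lda] * x[i];
    }
}
```

For U = [1 0.5; 0 1] and D = diag(2, 3), A = [2.75 1.5; 1.5 3], and the right-hand side (5.75, 7.5) is overwritten by x = (1, 2).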

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:

If uplo = 'U', the array a stores the upper triangular factor U of the
factorization A = U*D*U^T.

If uplo = 'L', the array a stores the lower triangular factor L of the
factorization A = L*D*L^T.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

a The array a of size max(1, lda*n) contains the block diagonal matrix D
and the multipliers used to obtain the factor U or L as computed
by ?sytrf.

b The array b contains the right-hand side matrix B.
The size of b is at least max(1, ldb*nrhs) for column major layout
and max(1, ldb*n) for row major layout.

lda The leading dimension of a; lda≥ max(1, n).

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ipiv Array of size n. The ipiv array contains details of the interchanges
and the block structure of D as determined by ?sytrf.

Output Parameters

b Overwritten by the solution matrix X.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

?hetrs2
Solves a system of linear equations with a UDU- or
LDL-factored Hermitian coefficient matrix.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:

If uplo = 'U', the array a stores the upper triangular factor U of the
factorization A = U*D*U^H.

If uplo = 'L', the array a stores the lower triangular factor L of the
factorization A = L*D*L^H.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

a The array a of size max(1, lda*n) contains the block diagonal matrix
D and the multipliers used to obtain the factor U or L as computed
by ?hetrf.

b The array b of size max(1, ldb*nrhs) for column major layout and
max(1, ldb*n) for row major layout contains the right-hand side
matrix B.

lda The leading dimension of a; lda≥ max(1, n).

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ipiv Array of size n. The ipiv array contains details of the interchanges
and the block structure of D as determined by ?hetrf.

Output Parameters

b Overwritten by the solution matrix X.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

?sytrs_3
Solves a system of linear equations A * X = B with a
real or complex symmetric matrix using the factorization
computed by ?sytrf_rk.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

uplo Specifies whether the details of the factorization are stored as an upper or
lower triangular matrix:

• = 'U': Upper triangular; the form is A = P*U*D*(U^T)*(P^T).

• = 'L': Lower triangular; the form is A = P*L*D*(L^T)*(P^T).

n The order of the matrix A. n ≥ 0.

nrhs The number of right-hand sides; that is, the number of columns of the
matrix B. nrhs ≥ 0.

A Array of size max(1, lda*n). Contains both the diagonal of the block diagonal matrix D and
the factor U or L, as computed by ?sytrf_rk:

• The diagonal elements of the symmetric block diagonal matrix D are stored on the
diagonal of A; that is, D(k,k) = A(k,k). The superdiagonal (or subdiagonal)
elements of D should be provided on entry in array e.
• If uplo = 'U', factor U is stored in the superdiagonal part of A. If uplo = 'L',
factor L is stored in the subdiagonal part of A.

lda The leading dimension of the array A; lda ≥ max(1, n).

e Array of size n. On entry, contains the superdiagonal (or subdiagonal)
elements of the symmetric block diagonal matrix D with 1-by-1 or 2-by-2
diagonal blocks. Using 1-based indices: if uplo = 'U', e(i) = D(i-1,i), i = 2:n, and e(1) is not
referenced; if uplo = 'L', e(i) = D(i+1,i), i = 1:n-1, and e(n) is not
referenced.

NOTE For a 1-by-1 diagonal block D(k), where 1 ≤ k ≤ n, the
element e[k-1] is not referenced in either the uplo = 'U' or the
uplo = 'L' case.

ipiv Array of size n. Details of the interchanges and the block structure of D as
determined by ?sytrf_rk.

B On entry, the right-hand side matrix B.

The size of B is at least max(1, ldb*nrhs) for column-major layout and
max(1, ldb*n) for row-major layout.

ldb The leading dimension of the array B. ldb ≥ max(1, n) for column-major
layout and ldb ≥ nrhs for row-major layout.
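The e array is described above with 1-based indices, while the note uses C indexing. The sketch below restates the rules with 0-based indices and rebuilds D as a full column-major matrix, which can help when checking a factorization by hand. expand_block_diagonal is a hypothetical name; the sketch covers only the real double flavor and assumes e holds zeros at the positions the note says are not referenced (the reference LAPACK ?sytrf_rk sets them to zero, but verify this for your library version).

```c
#include <stddef.h>

/* Hypothetical helper: rebuilds the symmetric block diagonal matrix D
 * (n-by-n, column-major, leading dimension n) from the diagonal of A and
 * the array e, using 0-based indices:
 *   uplo 'U': e[i] = D(i-1,i) for i = 1..n-1 (e[0] unused)
 *   uplo 'L': e[i] = D(i+1,i) for i = 0..n-2 (e[n-1] unused)
 * Assumes e is zero wherever D has a 1-by-1 block. */
void expand_block_diagonal(char uplo, int n, const double *a, int lda,
                           const double *e, double *d)
{
    for (size_t k = 0; k < (size_t)n * n; ++k)
        d[k] = 0.0;
    for (int k = 0; k < n; ++k)
        d[k + (size_t)k * n] = a[k + (size_t)k * lda];    /* D(k,k) = A(k,k) */

    if (uplo == 'U' || uplo == 'u') {
        for (int i = 1; i < n; ++i) {
            d[(i - 1) + (size_t)i * n] = e[i];             /* D(i-1,i) */
            d[i + (size_t)(i - 1) * n] = e[i];             /* symmetric partner */
        }
    } else {
        for (int i = 0; i + 1 < n; ++i) {
            d[(i + 1) + (size_t)i * n] = e[i];             /* D(i+1,i) */
            d[i + (size_t)(i + 1) * n] = e[i];             /* symmetric partner */
        }
    }
}
```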

Output Parameters

B On exit, the solution matrix X.

Return Values
This function returns a value info.

= 0: Successful exit.
< 0: If info = -i, the ith argument had an illegal value.

?hetrs_3
Solves a system of linear equations A * X = B with a
complex Hermitian matrix using the factorization
computed by ?hetrf_rk.
lapack_int LAPACKE_chetrs_3 (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, const lapack_complex_float * A, lapack_int lda, const lapack_complex_float * e,
const lapack_int * ipiv, lapack_complex_float * B, lapack_int ldb);
lapack_int LAPACKE_zhetrs_3 (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, const lapack_complex_double * A, lapack_int lda, const lapack_complex_double * e,
const lapack_int * ipiv, lapack_complex_double * B, lapack_int ldb);

Description
?hetrs_3 solves a system of linear equations A * X = B with a complex Hermitian matrix A using the
factorization computed by ?hetrf_rk: A = P*U*D*(U^H)*(P^T) or A = P*L*D*(L^H)*(P^T), where U (or L) is a unit
upper (or lower) triangular matrix, U^H (or L^H) is the conjugate transpose of U (or L), P is a permutation
matrix, P^T is the transpose of P, and D is Hermitian and block diagonal with 1-by-1 and 2-by-2 diagonal blocks.
This algorithm uses Level 3 BLAS.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

uplo Specifies whether the details of the factorization are stored as an upper or
lower triangular matrix:

• = 'U': Upper triangular; form is A = P*U*D*(U^H)*(P^T).
• = 'L': Lower triangular; form is A = P*L*D*(L^H)*(P^T).

n The order of the matrix A. n ≥ 0.

nrhs The number of right-hand sides; that is, the number of columns in the
matrix B. nrhs ≥ 0.

A Array of size max(1, lda*n). Contains both the diagonal of the block diagonal matrix D and
the factor U or L, as computed by ?hetrf_rk:

• The diagonal elements of the Hermitian block diagonal matrix D are stored on the
diagonal of A; that is, D(k,k) = A(k,k). The superdiagonal (or subdiagonal)
elements of D should be provided on entry in array e.
• If uplo = 'U', factor U is stored in the superdiagonal part of A. If uplo = 'L',
factor L is stored in the subdiagonal part of A.

lda The leading dimension of the array A; lda ≥ max(1, n).

e Array of size n. On entry, contains the superdiagonal (or subdiagonal)
elements of the Hermitian block diagonal matrix D with 1-by-1 or 2-by-2
diagonal blocks. Using 1-based indices: if uplo = 'U', e(i) = D(i-1,i), i = 2:n, and e(1) is not
referenced; if uplo = 'L', e(i) = D(i+1,i), i = 1:n-1, and e(n) is not
referenced.

NOTE For a 1-by-1 diagonal block D(k), where 1 ≤ k ≤ n, the
element e[k-1] is not referenced in either the uplo = 'U' or the
uplo = 'L' case.

ipiv Array of size n. Details of the interchanges and the block structure of D as
determined by ?hetrf_rk.

B On entry, the right-hand side matrix B.

The size of B is at least max(1, ldb*nrhs) for column-major layout and
max(1, ldb*n) for row-major layout.

ldb The leading dimension of the array B. ldb ≥ max(1, n) for column-major
layout and ldb ≥ nrhs for row-major layout.

Output Parameters

B On exit, the solution matrix X.

Return Values
This function returns a value info.

= 0: Successful exit.
< 0: If info = -i, the ith argument had an illegal value.

?sptrs
Solves a system of linear equations with a UDU- or
LDL-factored symmetric coefficient matrix using
packed storage.

Syntax
lapack_int LAPACKE_ssptrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const float * ap , const lapack_int * ipiv , float * b , lapack_int ldb );
lapack_int LAPACKE_dsptrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const double * ap , const lapack_int * ipiv , double * b , lapack_int ldb );
lapack_int LAPACKE_csptrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const lapack_complex_float * ap , const lapack_int * ipiv , lapack_complex_float
* b , lapack_int ldb );
lapack_int LAPACKE_zsptrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const lapack_complex_double * ap , const lapack_int * ipiv ,
lapack_complex_double * b , lapack_int ldb );

Include Files
• mkl.h

Description

The routine solves for X the system of linear equations A*X = B with a symmetric matrix A, given the Bunch-
Kaufman factorization of A:

if uplo='U', A = U*D*U^T

if uplo='L', A = L*D*L^T,

where U and L are upper and lower packed triangular matrices with unit diagonal and D is a symmetric
block-diagonal matrix. The system is solved with multiple right-hand sides stored in the columns of the
matrix B. You must supply the factor U (or L) and the array ipiv returned by the factorization routine ?sptrf.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:

If uplo = 'U', the array ap stores the packed factor U of the
factorization A = U*D*U^T. If uplo = 'L', the array ap stores the
packed factor L of the factorization A = L*D*L^T.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

ipiv Array, size at least max(1, n). The ipiv array, as returned by ?sptrf.

ap The dimension of array ap must be at least max(1, n(n+1)/2). The
array ap contains the factor U or L, as specified by uplo, in packed
storage (see Matrix Storage Schemes).

b The array b contains the matrix B whose columns are the right-hand
sides for the system of equations. The size of b is max(1, ldb*nrhs)
for column major layout and max(1, ldb*n) for row major layout.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.
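Locating an element of A inside ap is a common stumbling block with packed storage. The sketch below gives the 0-based offsets for column-major packed storage only (row-major packing is not covered); packed_index_upper, packed_index_lower, and packed_length are hypothetical names.

```c
#include <stddef.h>

/* Number of elements needed to pack one triangle of an n-by-n matrix. */
size_t packed_length(size_t n) { return n * (n + 1) / 2; }

/* Upper triangle (uplo = 'U'), valid for i <= j:
 * column j holds rows 0..j and starts at offset j*(j+1)/2. */
size_t packed_index_upper(size_t i, size_t j) { return i + j * (j + 1) / 2; }

/* Lower triangle (uplo = 'L'), valid for i >= j:
 * column j holds rows j..n-1 and starts at offset j*(2n-j+1)/2,
 * which simplifies to the expression below. */
size_t packed_index_lower(size_t n, size_t i, size_t j)
{
    return i + j * (2 * n - j - 1) / 2;
}
```

For n = 3, upper packing stores A(0,0), A(0,1), A(1,1), A(0,2), A(1,2), A(2,2) in ap[0] through ap[5]; lower packing stores A(0,0), A(1,0), A(2,0), A(1,1), A(2,1), A(2,2).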

Output Parameters

b Overwritten by the solution matrix X.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
For each right-hand side b, the computed solution is the exact solution of a perturbed system of equations
(A + E)x = b, where

$$|E| \le c(n)\,\varepsilon\,P\,|U|\,|D|\,|U^T|\,P^T \quad\text{or}\quad |E| \le c(n)\,\varepsilon\,P\,|L|\,|D|\,|L^T|\,P^T$$

c(n) is a modest linear function of n, and ε is the machine precision.
If x0 is the true solution, the computed solution x satisfies this error bound:

$$\frac{\|x - x_0\|_\infty}{\|x\|_\infty} \le c(n)\,\mathrm{cond}(A,x)\,\varepsilon$$

where

$$\mathrm{cond}(A,x) = \frac{\|\,|A^{-1}|\,|A|\,|x|\,\|_\infty}{\|x\|_\infty} \le \|A^{-1}\|_\infty\,\|A\|_\infty = \kappa_\infty(A).$$

Note that cond(A,x) can be much smaller than κ∞(A).

The total number of floating-point operations for one right-hand side vector is approximately 2n^2 for real
flavors or 8n^2 for complex flavors.

To estimate the condition number κ∞(A), call ?spcon.

To refine the solution and estimate the error, call ?sprfs.

See Also
Matrix Storage Schemes

?hptrs
Solves a system of linear equations with a UDU- or
LDL-factored Hermitian coefficient matrix using
packed storage.

Syntax
lapack_int LAPACKE_chptrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const lapack_complex_float * ap , const lapack_int * ipiv , lapack_complex_float
* b , lapack_int ldb );
lapack_int LAPACKE_zhptrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const lapack_complex_double * ap , const lapack_int * ipiv ,
lapack_complex_double * b , lapack_int ldb );

Include Files
• mkl.h

Description

The routine solves for X the system of linear equations A*X = B with a Hermitian matrix A, given the Bunch-
Kaufman factorization of A:

if uplo='U', A = U*D*U^H

if uplo='L', A = L*D*L^H,

where U and L are upper and lower packed triangular matrices with unit diagonal and D is a Hermitian
block-diagonal matrix. The system is solved with multiple right-hand sides stored in the columns of the
matrix B.
You must supply to this routine the arrays ap (containing U or L) and ipiv in the form returned by the
factorization routine ?hptrf.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:

If uplo = 'U', the array ap stores the packed factor U of the
factorization A = U*D*U^H. If uplo = 'L', the array ap stores the
packed factor L of the factorization A = L*D*L^H.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

ipiv Array, size at least max(1, n). The ipiv array, as returned by ?hptrf.

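ap The dimension of array ap must be at least max(1, n(n+1)/2). The
array ap contains the factor U or L, as specified by uplo, in packed
storage (see Matrix Storage Schemes).

b The array b contains the matrix B whose columns are the right-hand
sides for the system of equations. The size of b is max(1, ldb*nrhs)
for column major layout and max(1, ldb*n) for row major layout.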

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.
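Because only one triangle is stored, the other half of a Hermitian matrix has to be recovered by conjugation. The hypothetical accessor below reads A(i,j) (0-based) from upper, column-major packed storage of the original matrix A, that is, before ?hptrf overwrites ap with its factor; it uses C99 double complex rather than lapack_complex_double.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical accessor: returns A(i,j) of a Hermitian matrix whose upper
 * triangle is held in ap in column-major packed form (uplo = 'U').
 * Entries below the diagonal come from the relation A(i,j) = conj(A(j,i)). */
double complex hermitian_packed_get(const double complex *ap, size_t i, size_t j)
{
    if (i <= j)
        return ap[i + j * (j + 1) / 2];        /* stored directly */
    return conj(ap[j + i * (i + 1) / 2]);      /* mirror of A(j,i) */
}
```

For example, with ap = {2, 1-i, 4} (A(0,0), A(0,1), A(1,1)), reading A(1,0) returns 1+i.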

Output Parameters

b Overwritten by the solution matrix X.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
For each right-hand side b, the computed solution is the exact solution of a perturbed system of equations
(A + E)x = b, where

$$|E| \le c(n)\,\varepsilon\,P\,|U|\,|D|\,|U^H|\,P^T \quad\text{or}\quad |E| \le c(n)\,\varepsilon\,P\,|L|\,|D|\,|L^H|\,P^T$$

c(n) is a modest linear function of n, and ε is the machine precision.
If x0 is the true solution, the computed solution x satisfies this error bound:

$$\frac{\|x - x_0\|_\infty}{\|x\|_\infty} \le c(n)\,\mathrm{cond}(A,x)\,\varepsilon$$

where

$$\mathrm{cond}(A,x) = \frac{\|\,|A^{-1}|\,|A|\,|x|\,\|_\infty}{\|x\|_\infty} \le \|A^{-1}\|_\infty\,\|A\|_\infty = \kappa_\infty(A).$$

Note that cond(A,x) can be much smaller than κ∞(A).

The total number of floating-point operations for one right-hand side vector is approximately 8n^2 for complex
flavors.

To estimate the condition number κ∞(A), call ?hpcon.

To refine the solution and estimate the error, call ?hprfs.

See Also
Matrix Storage Schemes

?trtrs
Solves a system of linear equations with a triangular
coefficient matrix, with multiple right-hand sides.

Syntax
lapack_int LAPACKE_strtrs (int matrix_layout , char uplo , char trans , char diag ,
lapack_int n , lapack_int nrhs , const float * a , lapack_int lda , float * b ,
lapack_int ldb );
lapack_int LAPACKE_dtrtrs (int matrix_layout , char uplo , char trans , char diag ,
lapack_int n , lapack_int nrhs , const double * a , lapack_int lda , double * b ,
lapack_int ldb );
lapack_int LAPACKE_ctrtrs (int matrix_layout , char uplo , char trans , char diag ,
lapack_int n , lapack_int nrhs , const lapack_complex_float * a , lapack_int lda ,
lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_ztrtrs (int matrix_layout , char uplo , char trans , char diag ,
lapack_int n , lapack_int nrhs , const lapack_complex_double * a , lapack_int lda ,
lapack_complex_double * b , lapack_int ldb );

Include Files
• mkl.h

Description

The routine solves for X the following systems of linear equations with a triangular matrix A, with multiple
right-hand sides stored in B:

A*X = B if trans='N',

A^T*X = B if trans='T',

A^H*X = B if trans='C' (for complex matrices only).
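As an illustration of the trans = 'N' case, the following sketch solves a real, column-major triangular system in place by forward substitution (uplo = 'L') or back substitution (uplo = 'U'). tri_solve is a hypothetical name; unlike ?trtrs it does not handle trans = 'T' or 'C', does not validate its arguments, and does not check for zero diagonal elements.

```c
#include <stddef.h>

/* Hypothetical sketch of the trans = 'N' case for a real, column-major,
 * nonsingular triangular A. uplo = 'L' uses forward substitution and
 * uplo = 'U' back substitution; diag = 'U' treats the diagonal as ones
 * without reading it. Each right-hand side costs about n^2 flops. */
void tri_solve(char uplo, char diag, int n, int nrhs,
               const double *a, int lda, double *b, int ldb)
{
    int lower = (uplo == 'L' || uplo == 'l');
    int unit = (diag == 'U' || diag == 'u');

    for (int r = 0; r < nrhs; ++r) {
        double *x = b + (size_t)r * ldb;   /* column r of B, overwritten by X */
        if (lower) {
            for (int j = 0; j < n; ++j) {          /* x[j] is final after the divide */
                if (!unit) x[j] /= a[j + (size_t)j * lda];
                for (int i = j + 1; i < n; ++i)
                    x[i] -= a[i + (size_t)j * lda] * x[j];
            }
        } else {
            for (int j = n - 1; j >= 0; --j) {     /* same idea, from the bottom up */
                if (!unit) x[j] /= a[j + (size_t)j * lda];
                for (int i = 0; i < j; ++i)
                    x[i] -= a[i + (size_t)j * lda] * x[j];
            }
        }
    }
}
```

For the lower triangular A = [2 0; 1 4] and b = (2, 9), the sketch overwrites b with x = (1, 2).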

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether A is upper or lower triangular:

If uplo = 'U', then A is upper triangular.

If uplo = 'L', then A is lower triangular.

trans Must be 'N' or 'T' or 'C'.

If trans = 'N', then A*X = B is solved for X.

If trans = 'T', then A^T*X = B is solved for X.

If trans = 'C', then A^H*X = B is solved for X.

diag Must be 'N' or 'U'.

If diag = 'N', then A is not a unit triangular matrix.

If diag = 'U', then A is unit triangular: diagonal elements of A are
assumed to be 1 and not referenced in the array a.

n The order of A; the number of rows in B; n≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

a The array a contains the matrix A.

The size of a is max(1, lda*n).

b The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.
The size of b is max(1, ldb*nrhs) for column major layout and
max(1, ldb*n) for row major layout.

lda The leading dimension of a; lda≥ max(1, n).

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

Output Parameters

b Overwritten by the solution matrix X.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
For each right-hand side b, the computed solution is the exact solution of a perturbed system of equations
(A + E)x = b, where

$$|E| \le c(n)\,\varepsilon\,|A|$$

c(n) is a modest linear function of n, and ε is the machine precision. If x0 is the true solution, the computed
solution x satisfies this error bound:

$$\frac{\|x - x_0\|_\infty}{\|x\|_\infty} \le c(n)\,\mathrm{cond}(A,x)\,\varepsilon$$

where

$$\mathrm{cond}(A,x) = \frac{\|\,|A^{-1}|\,|A|\,|x|\,\|_\infty}{\|x\|_\infty} \le \|A^{-1}\|_\infty\,\|A\|_\infty = \kappa_\infty(A).$$

Note that cond(A,x) can be much smaller than κ∞(A); the condition number of A^T and A^H might or might
not be equal to κ∞(A).

The approximate number of floating-point operations for one right-hand side vector b is n^2 for real flavors
and 4n^2 for complex flavors.

To estimate the condition number κ∞(A), call ?trcon.

To estimate the error in the solution, call ?trrfs.

See Also
Matrix Storage Schemes

?tptrs
Solves a system of linear equations with a packed
triangular coefficient matrix, with multiple right-hand
sides.

Syntax
lapack_int LAPACKE_stptrs (int matrix_layout , char uplo , char trans , char diag ,
lapack_int n , lapack_int nrhs , const float * ap , float * b , lapack_int ldb );
lapack_int LAPACKE_dtptrs (int matrix_layout , char uplo , char trans , char diag ,
lapack_int n , lapack_int nrhs , const double * ap , double * b , lapack_int ldb );
lapack_int LAPACKE_ctptrs (int matrix_layout , char uplo , char trans , char diag ,
lapack_int n , lapack_int nrhs , const lapack_complex_float * ap , lapack_complex_float
* b , lapack_int ldb );
lapack_int LAPACKE_ztptrs (int matrix_layout , char uplo , char trans , char diag ,
lapack_int n , lapack_int nrhs , const lapack_complex_double * ap ,
lapack_complex_double * b , lapack_int ldb );

Include Files
• mkl.h

Description

The routine solves for X the following systems of linear equations with a packed triangular matrix A, with
multiple right-hand sides stored in B:

A*X = B if trans='N',

A^T*X = B if trans='T',

A^H*X = B if trans='C' (for complex matrices only).

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether A is upper or lower triangular:

If uplo = 'U', then A is upper triangular.

If uplo = 'L', then A is lower triangular.

trans Must be 'N' or 'T' or 'C'.

If trans = 'N', then A*X = B is solved for X.

If trans = 'T', then A^T*X = B is solved for X.

If trans = 'C', then A^H*X = B is solved for X.

diag Must be 'N' or 'U'.

If diag = 'N', then A is not a unit triangular matrix.

If diag = 'U', then A is unit triangular: diagonal elements are
assumed to be 1 and not referenced in the array ap.

n The order of A; the number of rows in B; n≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

ap The dimension of array ap must be at least max(1, n(n+1)/2). The
array ap contains the matrix A in packed storage (see Matrix Storage
Schemes).

b The array b contains the matrix B whose columns are the right-hand
sides for the system of equations.
The size of b is max(1, ldb*nrhs) for column major layout and
max(1, ldb*n) for row major layout.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.
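The same back substitution works directly on packed storage. This hypothetical sketch (packed_upper_solve) covers only uplo = 'U', trans = 'N', real data, and column-major packing, where A(i,j) with i ≤ j sits at ap[i + j*(j+1)/2] (0-based).

```c
#include <stddef.h>

/* Hypothetical sketch: solves A*X = B in place for a real upper triangular
 * A held in column-major packed storage (uplo = 'U', trans = 'N').
 * diag = 'U' skips the division by the (implicitly unit) diagonal.
 * No check for zero diagonal elements is made. */
void packed_upper_solve(char diag, int n, int nrhs, const double *ap,
                        double *b, int ldb)
{
    int unit = (diag == 'U' || diag == 'u');
    for (int r = 0; r < nrhs; ++r) {
        double *x = b + (size_t)r * ldb;        /* column r of B */
        for (int j = n - 1; j >= 0; --j) {
            size_t col = (size_t)j * (j + 1) / 2;   /* start of column j in ap */
            if (!unit) x[j] /= ap[col + j];         /* A(j,j) */
            for (int i = 0; i < j; ++i)
                x[i] -= ap[col + i] * x[j];         /* A(i,j) */
        }
    }
}
```

For A = [2 1; 0 4], packed as ap = {2, 1, 4}, the right-hand side (4, 8) is overwritten by (1, 2).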

Output Parameters

b Overwritten by the solution matrix X.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
For each right-hand side b, the computed solution is the exact solution of a perturbed system of equations
(A + E)x = b, where

$$|E| \le c(n)\,\varepsilon\,|A|$$

c(n) is a modest linear function of n, and ε is the machine precision.
If x0 is the true solution, the computed solution x satisfies this error bound:

$$\frac{\|x - x_0\|_\infty}{\|x\|_\infty} \le c(n)\,\mathrm{cond}(A,x)\,\varepsilon$$

where

$$\mathrm{cond}(A,x) = \frac{\|\,|A^{-1}|\,|A|\,|x|\,\|_\infty}{\|x\|_\infty} \le \|A^{-1}\|_\infty\,\|A\|_\infty = \kappa_\infty(A).$$

Note that cond(A,x) can be much smaller than κ∞(A); the condition number of A^T and A^H might or might
not be equal to κ∞(A).

The approximate number of floating-point operations for one right-hand side vector b is n^2 for real flavors
and 4n^2 for complex flavors.

To estimate the condition number κ∞(A), call ?tpcon.

To estimate the error in the solution, call ?tprfs.

See Also
Matrix Storage Schemes

?tbtrs
Solves a system of linear equations with a band
triangular coefficient matrix, with multiple right-hand
sides.

Syntax
lapack_int LAPACKE_stbtrs (int matrix_layout , char uplo , char trans , char diag ,
lapack_int n , lapack_int kd , lapack_int nrhs , const float * ab , lapack_int ldab ,
float * b , lapack_int ldb );
lapack_int LAPACKE_dtbtrs (int matrix_layout , char uplo , char trans , char diag ,
lapack_int n , lapack_int kd , lapack_int nrhs , const double * ab , lapack_int ldab ,
double * b , lapack_int ldb );
lapack_int LAPACKE_ctbtrs (int matrix_layout , char uplo , char trans , char diag ,
lapack_int n , lapack_int kd , lapack_int nrhs , const lapack_complex_float * ab ,
lapack_int ldab , lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_ztbtrs (int matrix_layout , char uplo , char trans , char diag ,
lapack_int n , lapack_int kd , lapack_int nrhs , const lapack_complex_double * ab ,
lapack_int ldab , lapack_complex_double * b , lapack_int ldb );

Include Files
• mkl.h

Description

The routine solves for X the following systems of linear equations with a band triangular matrix A, with
multiple right-hand sides stored in B:

A*X = B if trans='N',

A^T*X = B if trans='T',

A^H*X = B if trans='C' (for complex matrices only).

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether A is upper or lower triangular:

If uplo = 'U', then A is upper triangular.

If uplo = 'L', then A is lower triangular.

trans Must be 'N' or 'T' or 'C'.

If trans = 'N', then A*X = B is solved for X.

If trans = 'T', then A^T*X = B is solved for X.

If trans = 'C', then A^H*X = B is solved for X.

diag Must be 'N' or 'U'.

If diag = 'N', then A is not a unit triangular matrix.

If diag = 'U', then A is unit triangular: diagonal elements are
assumed to be 1 and not referenced in the array ab.

n The order of A; the number of rows in B; n≥ 0.

kd The number of superdiagonals or subdiagonals in the matrix A; kd≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

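ab The array ab contains the matrix A in band storage form.

The size of ab must be max(1, ldab*n).

b The array b contains the matrix B whose columns are the right-hand
sides for the system of equations.
The size of b is max(1, ldb*nrhs) for column major layout and
max(1, ldb*n) for row major layout.
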
ldab The leading dimension of ab; ldab≥kd + 1.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.
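For band storage, the sketch below assumes the conventional LAPACK column-major band layout, in which element A(i,j) of an upper band matrix is stored at ab[kd + i - j + j*ldab] (0-based), and performs back substitution for uplo = 'U', trans = 'N' on real data. The name band_upper_solve is hypothetical. Each column touches at most kd off-diagonal entries, which is where the roughly 2n*kd flops per right-hand side come from.

```c
#include <stddef.h>

/* Hypothetical sketch: solves A*X = B in place for a real upper triangular
 * band matrix with kd superdiagonals in column-major band storage, where
 * A(i,j), max(0, j-kd) <= i <= j, is ab[kd + i - j + j*ldab].
 * diag = 'U' skips the division by the diagonal. No singularity check. */
void band_upper_solve(char diag, int n, int kd, int nrhs,
                      const double *ab, int ldab, double *b, int ldb)
{
    int unit = (diag == 'U' || diag == 'u');
    for (int r = 0; r < nrhs; ++r) {
        double *x = b + (size_t)r * ldb;           /* column r of B */
        for (int j = n - 1; j >= 0; --j) {
            const double *col = ab + (size_t)j * ldab;   /* band column j */
            if (!unit) x[j] /= col[kd];                  /* A(j,j) */
            int first = j - kd > 0 ? j - kd : 0;         /* topmost row in the band */
            for (int i = first; i < j; ++i)
                x[i] -= col[kd + i - j] * x[j];          /* A(i,j) */
        }
    }
}
```

For the 3-by-3 upper bidiagonal matrix with 2 on the diagonal and 1 on the superdiagonal (kd = 1, ldab = 2), the right-hand side (3, 3, 2) is overwritten by (1, 1, 1).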

Output Parameters

b Overwritten by the solution matrix X.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
For each right-hand side b, the computed solution is the exact solution of a perturbed system of equations
(A + E)x = b, where

$$|E| \le c(n)\,\varepsilon\,|A|$$

c(n) is a modest linear function of n, and ε is the machine precision. If x0 is the true solution, the computed
solution x satisfies this error bound:

$$\frac{\|x - x_0\|_\infty}{\|x\|_\infty} \le c(n)\,\mathrm{cond}(A,x)\,\varepsilon$$

where

$$\mathrm{cond}(A,x) = \frac{\|\,|A^{-1}|\,|A|\,|x|\,\|_\infty}{\|x\|_\infty} \le \|A^{-1}\|_\infty\,\|A\|_\infty = \kappa_\infty(A).$$

Note that cond(A,x) can be much smaller than κ∞(A); the condition number of A^T and A^H might or might
not be equal to κ∞(A).

The approximate number of floating-point operations for one right-hand side vector b is 2n*kd for real
flavors and 8n*kd for complex flavors.

To estimate the condition number κ∞(A), call ?tbcon.

To estimate the error in the solution, call ?tbrfs.

See Also
Matrix Storage Schemes

Estimating the Condition Number: LAPACK Computational Routines

This section describes the LAPACK routines for estimating the condition number of a matrix. The condition
number is used for analyzing the errors in the solution of a system of linear equations (see Error Analysis).
Since the condition number may be arbitrarily large when the matrix is nearly singular, the routines actually
compute the reciprocal condition number.

?gecon
Estimates the reciprocal of the condition number of a
general matrix in the 1-norm or the infinity-norm.


Syntax
lapack_int LAPACKE_sgecon( int matrix_layout, char norm, lapack_int n, const float* a,
lapack_int lda, float anorm, float* rcond );
lapack_int LAPACKE_dgecon( int matrix_layout, char norm, lapack_int n, const double* a,
lapack_int lda, double anorm, double* rcond );
lapack_int LAPACKE_cgecon( int matrix_layout, char norm, lapack_int n, const
lapack_complex_float* a, lapack_int lda, float anorm, float* rcond );
lapack_int LAPACKE_zgecon( int matrix_layout, char norm, lapack_int n, const
lapack_complex_double* a, lapack_int lda, double anorm, double* rcond );

Include Files
• mkl.h

Description
The routine estimates the reciprocal of the condition number of a general matrix A in the 1-norm or infinity-norm:
$\kappa_1(A) = \|A\|_1 \|A^{-1}\|_1 = \kappa_\infty(A^T) = \kappa_\infty(A^H)$
$\kappa_\infty(A) = \|A\|_\infty \|A^{-1}\|_\infty = \kappa_1(A^T) = \kappa_1(A^H)$.
An estimate is obtained for $\|A^{-1}\|$, and the reciprocal of the condition number is computed as
$\mathrm{rcond} = 1/(\|A\|\,\|A^{-1}\|)$.
Before calling this routine:

• compute anorm (either $\|A\|_1 = \max_j \sum_i |a_{ij}|$ or $\|A\|_\infty = \max_i \sum_j |a_{ij}|$)

• call ?getrf to compute the LU factorization of A.
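A minimal sketch of the anorm step, assuming a real n-by-n matrix in column-major storage (LAPACK_COL_MAJOR); the helper names are hypothetical, and LAPACK's ?lange auxiliary routine computes the same norms. The norm has to be taken from the original matrix, because ?getrf overwrites a with its LU factors. The 2-by-2 helper evaluates rcond exactly from the closed-form inverse, which is the quantity ?gecon estimates without forming $A^{-1}$.

```c
#include <math.h>

/* ||A||_1 of an n-by-n column-major matrix: max_j sum_i |a_ij|. */
static double norm1_colmajor(int n, const double *a, int lda) {
    double best = 0.0;
    for (int j = 0; j < n; ++j) {
        double s = 0.0;
        for (int i = 0; i < n; ++i) s += fabs(a[i + j * lda]);
        if (s > best) best = s;
    }
    return best;
}

/* ||A||_inf of an n-by-n column-major matrix: max_i sum_j |a_ij|. */
static double norminf_colmajor(int n, const double *a, int lda) {
    double best = 0.0;
    for (int i = 0; i < n; ++i) {
        double s = 0.0;
        for (int j = 0; j < n; ++j) s += fabs(a[i + j * lda]);
        if (s > best) best = s;
    }
    return best;
}

/* Exact 1-norm rcond of a 2-by-2 column-major matrix via its closed-form
   inverse; returns 0 if the matrix is exactly singular. */
static double rcond1_2x2(const double a[4]) {
    double det = a[0] * a[3] - a[2] * a[1];
    if (det == 0.0) return 0.0;
    double inv[4] = { a[3] / det, -a[1] / det, -a[2] / det, a[0] / det };
    return 1.0 / (norm1_colmajor(2, a, 2) * norm1_colmajor(2, inv, 2));
}
```

In a full program one would compute anorm = norm1_colmajor(n, a, lda) first, then call LAPACKE_dgetrf(LAPACK_COL_MAJOR, n, n, a, lda, ipiv) and LAPACKE_dgecon(LAPACK_COL_MAJOR, '1', n, a, lda, anorm, &rcond), or pass norm = 'I' together with the infinity-norm. For the matrix with rows (1, -2) and (3, 4), $\|A\|_1 = 6$, $\|A\|_\infty = 7$, and the exact 1-norm rcond is $1/4.2 \approx 0.238$.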

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

norm Must be '1' or 'O' or 'I'.

If norm = '1' or 'O', then the routine estimates the condition number of matrix A in the 1-norm.
If norm = 'I', then the routine estimates the condition number of matrix A in the infinity-norm.

n The order of the matrix A; n≥ 0.

a The array a contains the LU-factored matrix A, as returned by ?getrf.

anorm The norm of the original matrix A (see Description).

lda The leading dimension of a; lda≥ max(1, n).

Output Parameters

rcond An estimate of the reciprocal of the condition number. The routine sets
rcond = 0 if the estimate underflows; in this case the matrix is singular
(to working precision). More generally, whenever rcond is small compared to
1.0 at the working precision, the matrix may be poorly conditioned or even
singular.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The computed rcond is never less than r (the reciprocal of the true condition number) and in practice is
nearly always less than 10r. A call to this routine involves solving a number of systems of linear equations
$A x = b$ or $A^H x = b$; the number is usually 4 or 5 and never more than 11. Each solution requires
approximately $2n^2$ floating-point operations for real flavors and $8n^2$ for complex flavors.

See Also
Matrix Storage Schemes

?gbcon
Estimates the reciprocal of the condition number of a
band matrix in the 1-norm or the infinity-norm.

Syntax
lapack_int LAPACKE_sgbcon( int matrix_layout, char norm, lapack_int n, lapack_int kl,
lapack_int ku, const float* ab, lapack_int ldab, const lapack_int* ipiv, float anorm,
float* rcond );
lapack_int LAPACKE_dgbcon( int matrix_layout, char norm, lapack_int n, lapack_int kl,
lapack_int ku, const double* ab, lapack_int ldab, const lapack_int* ipiv, double anorm,
double* rcond );
lapack_int LAPACKE_cgbcon( int matrix_layout, char norm, lapack_int n, lapack_int kl,
lapack_int ku, const lapack_complex_float* ab, lapack_int ldab, const lapack_int* ipiv,
float anorm, float* rcond );
lapack_int LAPACKE_zgbcon( int matrix_layout, char norm, lapack_int n, lapack_int kl,
lapack_int ku, const lapack_complex_double* ab, lapack_int ldab, const lapack_int*
ipiv, double anorm, double* rcond );

Include Files
• mkl.h

Description
The routine estimates the reciprocal of the condition number of a general band matrix A in the 1-norm or
infinity-norm:
$\kappa_1(A) = \|A\|_1 \|A^{-1}\|_1 = \kappa_\infty(A^T) = \kappa_\infty(A^H)$
$\kappa_\infty(A) = \|A\|_\infty \|A^{-1}\|_\infty = \kappa_1(A^T) = \kappa_1(A^H)$.


An estimate is obtained for $\|A^{-1}\|$, and the reciprocal of the condition number is computed as
$\mathrm{rcond} = 1/(\|A\|\,\|A^{-1}\|)$.
Before calling this routine:

• compute anorm (either $\|A\|_1 = \max_j \sum_i |a_{ij}|$ or $\|A\|_\infty = \max_i \sum_j |a_{ij}|$)

• call ?gbtrf to compute the LU factorization of A.
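The sketch below computes both norms from a band matrix already laid out for ?gbtrf. It assumes column-major storage, 0-based indices, and the usual LAPACK band layout in which $a_{ij}$ is stored in ab[(kl + ku + i - j) + j*ldab] and the first kl rows of ab are left free for fill-in produced by the factorization; check this against Matrix Storage Schemes before relying on it. The helper names are hypothetical, and the norms must be computed before ?gbtrf overwrites ab.

```c
#include <math.h>

/* Element a(i, j), 0-based, of a band matrix in the ?gbtrf input layout
   (column-major; the first kl rows of ab are reserved for fill-in). */
static double gb_get(const double *ab, int ldab, int kl, int ku, int i, int j) {
    return ab[(kl + ku + i - j) + j * ldab];
}

/* ||A||_1: maximum absolute column sum over the band. */
static double gb_norm1(int n, int kl, int ku, const double *ab, int ldab) {
    double best = 0.0;
    for (int j = 0; j < n; ++j) {
        int ilo = j - ku > 0 ? j - ku : 0;
        int ihi = j + kl < n - 1 ? j + kl : n - 1;
        double s = 0.0;
        for (int i = ilo; i <= ihi; ++i) s += fabs(gb_get(ab, ldab, kl, ku, i, j));
        if (s > best) best = s;
    }
    return best;
}

/* ||A||_inf: maximum absolute row sum over the band. */
static double gb_norminf(int n, int kl, int ku, const double *ab, int ldab) {
    double best = 0.0;
    for (int i = 0; i < n; ++i) {
        int jlo = i - kl > 0 ? i - kl : 0;
        int jhi = i + ku < n - 1 ? i + ku : n - 1;
        double s = 0.0;
        for (int j = jlo; j <= jhi; ++j) s += fabs(gb_get(ab, ldab, kl, ku, i, j));
        if (s > best) best = s;
    }
    return best;
}
```

For the 3-by-3 matrix with rows (2, -1, 0), (3, 4, 5), (0, -6, 7), kl = ku = 1 and ldab = 2*kl + ku + 1 = 4, so ab = {0, 0, 2, 3, 0, -1, 4, -6, 0, 5, 7, 0}, giving $\|A\|_1 = 12$ and $\|A\|_\infty = 13$.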

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

norm Must be '1' or 'O' or 'I'.

If norm = '1' or 'O', then the routine estimates the condition number of matrix A in the 1-norm.
If norm = 'I', then the routine estimates the condition number of matrix A in the infinity-norm.

n The order of the matrix A; n≥ 0.

kl The number of subdiagonals within the band of A; kl≥ 0.

ku The number of superdiagonals within the band of A; ku≥ 0.

ldab The leading dimension of the array ab; ldab ≥ 2*kl + ku + 1.

ipiv Array, size at least max(1, n). The ipiv array, as returned by ?gbtrf.

ab The array ab of size max(1, ldab*n) contains the factored band matrix
A, as returned by ?gbtrf.

anorm The norm of the original matrix A (see Description).

Output Parameters

rcond An estimate of the reciprocal of the condition number. The routine sets
rcond = 0 if the estimate underflows; in this case the matrix is singular
(to working precision). More generally, whenever rcond is small compared to
1.0 at the working precision, the matrix may be poorly conditioned or even
singular.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The computed rcond is never less than r (the reciprocal of the true condition number) and in practice is
nearly always less than 10r. A call to this routine involves solving a number of systems of linear equations
$A x = b$ or $A^H x = b$; the number is usually 4 or 5 and never more than 11. Each solution requires
approximately 2n(ku + 2kl) floating-point operations for real flavors and 8n(ku + 2kl) for complex
flavors.

See Also
Matrix Storage Schemes

?gtcon
Estimates the reciprocal of the condition number of a
tridiagonal matrix.

Syntax
lapack_int LAPACKE_sgtcon( char norm, lapack_int n, const float* dl, const float* d,
const float* du, const float* du2, const lapack_int* ipiv, float anorm, float* rcond );
lapack_int LAPACKE_dgtcon( char norm, lapack_int n, const double* dl, const double* d,
const double* du, const double* du2, const lapack_int* ipiv, double anorm, double*
rcond );
lapack_int LAPACKE_cgtcon( char norm, lapack_int n, const lapack_complex_float* dl,
const lapack_complex_float* d, const lapack_complex_float* du, const
lapack_complex_float* du2, const lapack_int* ipiv, float anorm, float* rcond );
lapack_int LAPACKE_zgtcon( char norm, lapack_int n, const lapack_complex_double* dl,
const lapack_complex_double* d, const lapack_complex_double* du, const
lapack_complex_double* du2, const lapack_int* ipiv, double anorm, double* rcond );

Include Files
• mkl.h

Description

The routine estimates the reciprocal of the condition number of a real or complex tridiagonal matrix A in the
1-norm or infinity-norm:
$\kappa_1(A) = \|A\|_1 \|A^{-1}\|_1$
$\kappa_\infty(A) = \|A\|_\infty \|A^{-1}\|_\infty$
An estimate is obtained for $\|A^{-1}\|$, and the reciprocal of the condition number is computed as
$\mathrm{rcond} = 1/(\|A\|\,\|A^{-1}\|)$.
Before calling this routine:

• compute anorm (either $\|A\|_1 = \max_j \sum_i |a_{ij}|$ or $\|A\|_\infty = \max_i \sum_j |a_{ij}|$)

• call ?gttrf to compute the LU factorization of A.
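Because ?gttrf overwrites dl, d, and du with the factors (and fills du2 and ipiv), anorm has to be computed before the factorization. A minimal sketch for the real case, using 0-based indices with a(i+1,i) = dl[i], a(i,i) = d[i], and a(i,i+1) = du[i]; the helper names are hypothetical.

```c
#include <math.h>

/* ||A||_1 of a tridiagonal matrix: maximum absolute column sum. */
static double gt_norm1(int n, const double *dl, const double *d, const double *du) {
    double best = 0.0;
    for (int j = 0; j < n; ++j) {
        double s = fabs(d[j]);
        if (j > 0) s += fabs(du[j - 1]);     /* a(j-1, j) */
        if (j < n - 1) s += fabs(dl[j]);     /* a(j+1, j) */
        if (s > best) best = s;
    }
    return best;
}

/* ||A||_inf of a tridiagonal matrix: maximum absolute row sum. */
static double gt_norminf(int n, const double *dl, const double *d, const double *du) {
    double best = 0.0;
    for (int i = 0; i < n; ++i) {
        double s = fabs(d[i]);
        if (i > 0) s += fabs(dl[i - 1]);     /* a(i, i-1) */
        if (i < n - 1) s += fabs(du[i]);     /* a(i, i+1) */
        if (s > best) best = s;
    }
    return best;
}
```

For the matrix with rows (2, -1, 0), (3, 4, 5), (0, -6, 7), that is dl = {3, -6}, d = {2, 4, 7}, du = {-1, 5}, the sketch gives $\|A\|_1 = 12$ and $\|A\|_\infty = 13$.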

Input Parameters

norm Must be '1' or 'O' or 'I'.

If norm = '1' or 'O', then the routine estimates the condition number of matrix A in the 1-norm.
If norm = 'I', then the routine estimates the condition number of matrix A in the infinity-norm.

n The order of the matrix A; n≥ 0.

dl,d,du,du2 Arrays: dl (size n - 1), d (size n), du (size n - 1), du2 (size n - 2).

The array dl contains the (n - 1) multipliers that define the matrix L from the LU factorization of A as computed by ?gttrf.


The array d contains the n diagonal elements of the upper triangular matrix U from the LU factorization of A.
The array du contains the (n - 1) elements of the first superdiagonal of U.
The array du2 contains the (n - 2) elements of the second superdiagonal of U.

ipiv Array, size (n). The array of pivot indices, as returned by ?gttrf.

anorm The norm of the original matrix A (see Description).

Output Parameters

rcond An estimate of the reciprocal of the condition number. The routine sets
rcond = 0 if the estimate underflows; in this case the matrix is singular
(to working precision). More generally, whenever rcond is small compared to
1.0 at the working precision, the matrix may be poorly conditioned or even
singular.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The computed rcond is never less than r (the reciprocal of the true condition number) and in practice is
nearly always less than 10r. A call to this routine involves solving a number of systems of linear equations
$A x = b$; the number is usually 4 or 5 and never more than 11. Because the LU factors of a tridiagonal matrix are
banded, each solution requires a number of floating-point operations proportional to n.

?pocon
Estimates the reciprocal of the condition number of a
symmetric (Hermitian) positive-definite matrix.

Syntax
lapack_int LAPACKE_spocon( int matrix_layout, char uplo, lapack_int n, const float* a,
lapack_int lda, float anorm, float* rcond );
lapack_int LAPACKE_dpocon( int matrix_layout, char uplo, lapack_int n, const double* a,
lapack_int lda, double anorm, double* rcond );
lapack_int LAPACKE_cpocon( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_float* a, lapack_int lda, float anorm, float* rcond );
lapack_int LAPACKE_zpocon( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_double* a, lapack_int lda, double anorm, double* rcond );

Include Files
• mkl.h

Description

The routine estimates the reciprocal of the condition number of a symmetric (Hermitian) positive-definite
matrix A:
$\kappa_1(A) = \|A\|_1 \|A^{-1}\|_1$ (since A is symmetric or Hermitian, $\kappa_\infty(A) = \kappa_1(A)$).
An estimate is obtained for $\|A^{-1}\|$, and the reciprocal of the condition number is computed as
$\mathrm{rcond} = 1/(\|A\|\,\|A^{-1}\|)$.
Before calling this routine:

• compute anorm (either $\|A\|_1 = \max_j \sum_i |a_{ij}|$ or $\|A\|_\infty = \max_i \sum_j |a_{ij}|$)

• call ?potrf to compute the Cholesky factorization of A.
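For a symmetric matrix only one triangle is referenced, and ?potrf overwrites that triangle with the Cholesky factor, so anorm must be computed beforehand from the stored triangle. Since A is symmetric, $\|A\|_1 = \|A\|_\infty$. The sketch assumes uplo = 'U', column-major storage, and real data; the helper name is hypothetical.

```c
#include <math.h>

/* ||A||_1 (= ||A||_inf) of a real symmetric matrix whose upper triangle is
   stored column-major in a; the strictly lower part is never read. */
static double sy_norm1_upper(int n, const double *a, int lda) {
    double best = 0.0;
    for (int j = 0; j < n; ++j) {
        double s = 0.0;
        for (int i = 0; i < n; ++i) {
            int r = i < j ? i : j;           /* map (i, j) into the upper triangle */
            int c = i < j ? j : i;
            s += fabs(a[r + c * lda]);
        }
        if (s > best) best = s;
    }
    return best;
}
```

For the matrix with rows (4, 1, -2), (1, 5, 3), (-2, 3, 6) the result is 11, whatever values occupy the strictly lower triangle of a.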

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:

If uplo = 'U', A is factored as $A = U^T U$ for real flavors or $A = U^H U$
for complex flavors, and U is stored.
If uplo = 'L', A is factored as $A = L L^T$ for real flavors or $A = L L^H$
for complex flavors, and L is stored.

n The order of the matrix A; n≥ 0.

a The array a of size max(1, lda*n) contains the factored matrix A, as returned by ?potrf.

lda The leading dimension of a; lda≥ max(1, n).

anorm The norm of the original matrix A (see Description).

Output Parameters

rcond An estimate of the reciprocal of the condition number. The routine sets
rcond = 0 if the estimate underflows; in this case the matrix is singular
(to working precision). More generally, whenever rcond is small compared to
1.0 at the working precision, the matrix may be poorly conditioned or even
singular.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.


Application Notes
The computed rcond is never less than r (the reciprocal of the true condition number) and in practice is
nearly always less than 10r. A call to this routine involves solving a number of systems of linear equations
$A x = b$; the number is usually 4 or 5 and never more than 11. Each solution requires approximately $2n^2$
floating-point operations for real flavors and $8n^2$ for complex flavors.

See Also
Matrix Storage Schemes

?ppcon
Estimates the reciprocal of the condition number of a
packed symmetric (Hermitian) positive-definite
matrix.

Syntax
lapack_int LAPACKE_sppcon( int matrix_layout, char uplo, lapack_int n, const float* ap,
float anorm, float* rcond );
lapack_int LAPACKE_dppcon( int matrix_layout, char uplo, lapack_int n, const double*
ap, double anorm, double* rcond );
lapack_int LAPACKE_cppcon( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_float* ap, float anorm, float* rcond );
lapack_int LAPACKE_zppcon( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_double* ap, double anorm, double* rcond );

Include Files
• mkl.h

Description

The routine estimates the reciprocal of the condition number of a packed symmetric (Hermitian) positive-definite
matrix A:
$\kappa_1(A) = \|A\|_1 \|A^{-1}\|_1$ (since A is symmetric or Hermitian, $\kappa_\infty(A) = \kappa_1(A)$).
An estimate is obtained for $\|A^{-1}\|$, and the reciprocal of the condition number is computed as
$\mathrm{rcond} = 1/(\|A\|\,\|A^{-1}\|)$.
Before calling this routine:

• compute anorm (either $\|A\|_1 = \max_j \sum_i |a_{ij}|$ or $\|A\|_\infty = \max_i \sum_j |a_{ij}|$)

• call ?pptrf to compute the Cholesky factorization of A.
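With packed storage the norm can be read directly from ap. This sketch assumes uplo = 'U' and column-major packing (LAPACK_COL_MAJOR), where $a_{ij}$ with $i \le j$ (0-based) is stored in ap[i + j*(j+1)/2]; consult Matrix Storage Schemes for the row-major packing. As before, compute anorm before ?pptrf overwrites ap; the helper names are hypothetical.

```c
#include <math.h>

/* Position of a(i, j), i <= j, 0-based, in column-major upper packed storage. */
static int up_index(int i, int j) {
    return i + j * (j + 1) / 2;
}

/* ||A||_1 (= ||A||_inf) of a real symmetric matrix in upper packed storage. */
static double sp_norm1_upper(int n, const double *ap) {
    double best = 0.0;
    for (int j = 0; j < n; ++j) {
        double s = 0.0;
        for (int i = 0; i < n; ++i)
            s += fabs(ap[i <= j ? up_index(i, j) : up_index(j, i)]);
        if (s > best) best = s;
    }
    return best;
}
```

For the matrix with rows (4, 1, -2), (1, 5, 3), (-2, 3, 6), the upper packed array is ap = {4, 1, 5, -2, 3, 6} and the sketch returns 11.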

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:

If uplo = 'U', A is factored as $A = U^T U$ for real flavors or $A = U^H U$
for complex flavors, and U is stored.

If uplo = 'L', A is factored as $A = L L^T$ for real flavors or $A = L L^H$
for complex flavors, and L is stored.

n The order of the matrix A; n≥ 0.

ap The array ap contains the packed factored matrix A, as returned by ?pptrf. The dimension of ap must be at least max(1, n(n+1)/2).

anorm The norm of the original matrix A (see Description).

Output Parameters

rcond An estimate of the reciprocal of the condition number. The routine sets
rcond = 0 if the estimate underflows; in this case the matrix is singular
(to working precision). More generally, whenever rcond is small compared to
1.0 at the working precision, the matrix may be poorly conditioned or even
singular.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The computed rcond is never less than r (the reciprocal of the true condition number) and in practice is
nearly always less than 10r. A call to this routine involves solving a number of systems of linear equations
$A x = b$; the number is usually 4 or 5 and never more than 11. Each solution requires approximately $2n^2$
floating-point operations for real flavors and $8n^2$ for complex flavors.

See Also
Matrix Storage Schemes

?pbcon
Estimates the reciprocal of the condition number of a
symmetric (Hermitian) positive-definite band matrix.

Syntax
lapack_int LAPACKE_spbcon( int matrix_layout, char uplo, lapack_int n, lapack_int kd,
const float* ab, lapack_int ldab, float anorm, float* rcond );
lapack_int LAPACKE_dpbcon( int matrix_layout, char uplo, lapack_int n, lapack_int kd,
const double* ab, lapack_int ldab, double anorm, double* rcond );
lapack_int LAPACKE_cpbcon( int matrix_layout, char uplo, lapack_int n, lapack_int kd,
const lapack_complex_float* ab, lapack_int ldab, float anorm, float* rcond );
lapack_int LAPACKE_zpbcon( int matrix_layout, char uplo, lapack_int n, lapack_int kd,
const lapack_complex_double* ab, lapack_int ldab, double anorm, double* rcond );

Include Files
• mkl.h

Description


The routine estimates the reciprocal of the condition number of a symmetric (Hermitian) positive-definite
band matrix A:
$\kappa_1(A) = \|A\|_1 \|A^{-1}\|_1$ (since A is symmetric or Hermitian, $\kappa_\infty(A) = \kappa_1(A)$).
An estimate is obtained for $\|A^{-1}\|$, and the reciprocal of the condition number is computed as
$\mathrm{rcond} = 1/(\|A\|\,\|A^{-1}\|)$.
Before calling this routine:

• compute anorm (either $\|A\|_1 = \max_j \sum_i |a_{ij}|$ or $\|A\|_\infty = \max_i \sum_j |a_{ij}|$)

• call ?pbtrf to compute the Cholesky factorization of A.
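For the band case, the sketch below assumes uplo = 'U', column-major storage, real data, and the LAPACK symmetric band layout in which $a_{ij}$ with $\max(0, j - kd) \le i \le j$ (0-based) is stored in ab[(kd + i - j) + j*ldab]. Compute anorm before ?pbtrf overwrites ab; the helper name is hypothetical.

```c
#include <math.h>

/* ||A||_1 (= ||A||_inf) of a real symmetric band matrix with kd
   superdiagonals, upper band storage, column-major. */
static double sb_norm1_upper(int n, int kd, const double *ab, int ldab) {
    double best = 0.0;
    for (int j = 0; j < n; ++j) {
        int lo = j - kd > 0 ? j - kd : 0;
        int hi = j + kd < n - 1 ? j + kd : n - 1;
        double s = 0.0;
        for (int i = lo; i <= hi; ++i) {
            int r = i < j ? i : j;           /* map (i, j) into the upper triangle */
            int c = i < j ? j : i;
            s += fabs(ab[(kd + r - c) + c * ldab]);
        }
        if (s > best) best = s;
    }
    return best;
}
```

For the 3-by-3 matrix with 2 on the diagonal and -1 on the first off-diagonals, kd = 1 and ldab = 2, so ab = {unused, 2, -1, 2, -1, 2} and the sketch returns 4.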

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

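uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:

If uplo = 'U', A is factored as $A = U^T U$ for real flavors or $A = U^H U$
for complex flavors, and U is stored.
If uplo = 'L', A is factored as $A = L L^T$ for real flavors or $A = L L^H$
for complex flavors, and L is stored.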

n The order of the matrix A; n≥ 0.

kd The number of superdiagonals or subdiagonals in the matrix A; kd≥ 0.

ldab The leading dimension of the array ab; ldab ≥ kd + 1.

ab The array ab of size max(1, ldab*n) contains the factored matrix A in band form, as returned by ?pbtrf.

anorm The norm of the original matrix A (see Description).

Output Parameters

rcond An estimate of the reciprocal of the condition number. The routine sets
rcond = 0 if the estimate underflows; in this case the matrix is singular
(to working precision). More generally, whenever rcond is small compared to
1.0 at the working precision, the matrix may be poorly conditioned or even
singular.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The computed rcond is never less than r (the reciprocal of the true condition number) and in practice is
nearly always less than 10r. A call to this routine involves solving a number of systems of linear equations
$A x = b$; the number is usually 4 or 5 and never more than 11. Each solution requires approximately
4*n(kd + 1) floating-point operations for real flavors and 16*n(kd + 1) for complex flavors.

See Also
Matrix Storage Schemes

?ptcon
Estimates the reciprocal of the condition number of a
symmetric (Hermitian) positive-definite tridiagonal
matrix.

Syntax
lapack_int LAPACKE_sptcon( lapack_int n, const float* d, const float* e, float anorm,
float* rcond );
lapack_int LAPACKE_dptcon( lapack_int n, const double* d, const double* e, double
anorm, double* rcond );
lapack_int LAPACKE_cptcon( lapack_int n, const float* d, const lapack_complex_float* e,
float anorm, float* rcond );
lapack_int LAPACKE_zptcon( lapack_int n, const double* d, const lapack_complex_double*
e, double anorm, double* rcond );

Include Files
• mkl.h

Description

The routine computes the reciprocal of the condition number (in the 1-norm) of a real symmetric or complex
Hermitian positive-definite tridiagonal matrix using the factorization $A = L D L^T$ for real flavors and
$A = L D L^H$ for complex flavors, or $A = U^T D U$ for real flavors and $A = U^H D U$ for complex flavors,
computed by ?pttrf:

$\kappa_1(A) = \|A\|_1 \|A^{-1}\|_1$ (since A is symmetric or Hermitian, $\kappa_\infty(A) = \kappa_1(A)$).

The norm $\|A^{-1}\|$ is computed by a direct method, and the reciprocal of the condition number is computed
as $\mathrm{rcond} = 1/(\|A\|\,\|A^{-1}\|)$.

Before calling this routine:

• compute anorm as $\|A\|_1 = \max_j \sum_i |a_{ij}|$

• call ?pttrf to compute the factorization of A.
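The direct method can be sketched for the real case as follows. To our understanding this mirrors the approach of reference LAPACK ?ptcon: with M(A) denoting the matrix with entries $|a_{ii}|$ on the diagonal and $-|a_{ij}|$ off it, $M(A) = M(L)\,D\,M(L)^T$, and solving $M(A)x = (1, \ldots, 1)^T$ yields $\|A^{-1}\|_1 = \max_i |x_i|$. The helper pttrf_sketch is a hypothetical stand-in for ?pttrf in the form $A = L D L^T$ (e receives the subdiagonal of L); none of this is MKL code.

```c
#include <math.h>

/* ||A||_1 of a real symmetric tridiagonal matrix (diagonal d, off-diagonal e). */
static double st_norm1(int n, const double *d, const double *e) {
    double best = 0.0;
    for (int j = 0; j < n; ++j) {
        double s = fabs(d[j]);
        if (j > 0) s += fabs(e[j - 1]);
        if (j < n - 1) s += fabs(e[j]);
        if (s > best) best = s;
    }
    return best;
}

/* Hypothetical stand-in for ?pttrf: A = L*D*L^T in place. On return d holds D
   and e holds the subdiagonal of the unit lower bidiagonal L.
   Returns 0, or k > 0 if the k-th pivot (1-based) is not positive. */
static int pttrf_sketch(int n, double *d, double *e) {
    for (int i = 0; i < n; ++i) {
        if (d[i] <= 0.0) return i + 1;
        if (i < n - 1) {
            double off = e[i];
            e[i] = off / d[i];
            d[i + 1] -= e[i] * off;
        }
    }
    return 0;
}

/* rcond = 1 / (anorm * ||A^-1||_1), with ||A^-1||_1 obtained by solving
   M(L) * D * M(L)^T * x = (1, ..., 1)^T; work must hold n doubles. */
static double ptcon_sketch(int n, const double *d, const double *e,
                           double anorm, double *work) {
    if (n == 0) return 1.0;
    if (anorm == 0.0) return 0.0;
    work[0] = 1.0;                                   /* M(L) * b = ones */
    for (int i = 1; i < n; ++i) work[i] = 1.0 + work[i - 1] * fabs(e[i - 1]);
    work[n - 1] /= d[n - 1];                         /* D * M(L)^T * x = b */
    for (int i = n - 2; i >= 0; --i)
        work[i] = work[i] / d[i] + work[i + 1] * fabs(e[i]);
    double ainvnm = 0.0;                             /* max_i |x_i| */
    for (int i = 0; i < n; ++i)
        if (fabs(work[i]) > ainvnm) ainvnm = fabs(work[i]);
    return ainvnm != 0.0 ? (1.0 / ainvnm) / anorm : 0.0;
}
```

For n = 3 with 2 on the diagonal and -1 on the off-diagonals, $\|A\|_1 = 4$ and $\|A^{-1}\|_1 = 2$, so the sketch returns rcond = 0.125. Note that anorm is taken before the factorization overwrites d and e.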

Input Parameters

n The order of the matrix A; n≥ 0.

d Array, size (n).

The array d contains the n diagonal elements of the diagonal matrix D
from the factorization of A, as computed by ?pttrf.

e Array, size (n - 1).

Contains the off-diagonal elements of the unit bidiagonal factor U or L from the factorization computed by ?pttrf.

anorm The 1-norm of the original matrix A (see Description).


Output Parameters

rcond An estimate of the reciprocal of the condition number. The routine sets
rcond = 0 if the estimate underflows; in this case the matrix is singular
(to working precision). More generally, whenever rcond is small compared to
1.0 at the working precision, the matrix may be poorly conditioned or even
singular.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The computed rcond is never less than r (the reciprocal of the true condition number) and in practice is
nearly always less than 10r. Because $\|A^{-1}\|$ is computed by a direct method, the routine solves a single
tridiagonal system using the factors from ?pttrf, at a cost proportional to n.

?sycon
Estimates the reciprocal of the condition number of a
symmetric matrix.

Syntax
lapack_int LAPACKE_ssycon( int matrix_layout, char uplo, lapack_int n, const float* a,
lapack_int lda, const lapack_int* ipiv, float anorm, float* rcond );
lapack_int LAPACKE_dsycon( int matrix_layout, char uplo, lapack_int n, const double* a,
lapack_int lda, const lapack_int* ipiv, double anorm, double* rcond );
lapack_int LAPACKE_csycon( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_float* a, lapack_int lda, const lapack_int* ipiv, float anorm, float*
rcond );
lapack_int LAPACKE_zsycon( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_double* a, lapack_int lda, const lapack_int* ipiv, double anorm, double*
rcond );

Include Files
• mkl.h

Description

The routine estimates the reciprocal of the condition number of a symmetric matrix A:
$\kappa_1(A) = \|A\|_1 \|A^{-1}\|_1$ (since A is symmetric, $\kappa_\infty(A) = \kappa_1(A)$).
An estimate is obtained for $\|A^{-1}\|$, and the reciprocal of the condition number is computed as
$\mathrm{rcond} = 1/(\|A\|\,\|A^{-1}\|)$.
Before calling this routine:

• compute anorm (either $\|A\|_1 = \max_j \sum_i |a_{ij}|$ or $\|A\|_\infty = \max_i \sum_j |a_{ij}|$)
• call ?sytrf to compute the factorization of A.
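Because ?sycon accepts either triangle and either layout, a helper that computes anorm has to honor both uplo and matrix_layout, and it must run before ?sytrf overwrites the stored triangle. The sketch below is a hypothetical example for real data; its boolean row_major flag stands in for comparing matrix_layout with LAPACK_ROW_MAJOR.

```c
#include <math.h>
#include <stdbool.h>

/* Read a(i, j), 0-based, from a full-storage array in either layout. */
static double sy_get(const double *a, int lda, bool row_major, int i, int j) {
    return row_major ? a[i * lda + j] : a[i + j * lda];
}

/* ||A||_1 (= ||A||_inf) of a real symmetric matrix, reading only the
   triangle selected by uplo ('U' or 'L'). */
static double sy_norm1(char uplo, bool row_major, int n, const double *a, int lda) {
    bool upper = (uplo == 'U' || uplo == 'u');
    double best = 0.0;
    for (int j = 0; j < n; ++j) {
        double s = 0.0;
        for (int i = 0; i < n; ++i) {
            int lo = i < j ? i : j, hi = i < j ? j : i;
            /* (lo, hi) lies in the upper triangle, (hi, lo) in the lower one */
            s += fabs(upper ? sy_get(a, lda, row_major, lo, hi)
                            : sy_get(a, lda, row_major, hi, lo));
        }
        if (s > best) best = s;
    }
    return best;
}
```

The same memory can serve as a row-major lower triangle or a column-major upper triangle: for a = {4, 99, 99, 1, 5, 99, -2, 3, 6}, both sy_norm1('L', true, 3, a, 3) and sy_norm1('U', false, 3, a, 3) read the matrix with rows (4, 1, -2), (1, 5, 3), (-2, 3, 6) and return 11.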

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:

If uplo = 'U', the array a stores the upper triangular factor U of the
factorization $A = U D U^T$.

If uplo = 'L', the array a stores the lower triangular factor L of the
factorization $A = L D L^T$.

n The order of matrix A; n≥ 0.

a The array a of size max(1, lda*n) contains the factored matrix A, as returned by ?sytrf.

lda The leading dimension of a; lda≥ max(1, n).

ipiv Array, size at least max(1, n).

The array ipiv, as returned by ?sytrf.

anorm The norm of the original matrix A (see Description).

Output Parameters

rcond An estimate of the reciprocal of the condition number. The routine sets
rcond = 0 if the estimate underflows; in this case the matrix is singular
(to working precision). More generally, whenever rcond is small compared to
1.0 at the working precision, the matrix may be poorly conditioned or even
singular.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The computed rcond is never less than r (the reciprocal of the true condition number) and in practice is
nearly always less than 10r. A call to this routine involves solving a number of systems of linear equations
$A x = b$; the number is usually 4 or 5 and never more than 11. Each solution requires approximately $2n^2$
floating-point operations for real flavors and $8n^2$ for complex flavors.

See Also
Matrix Storage Schemes


?sycon_3
Estimates the reciprocal of the condition number (in
the 1-norm) of a real or complex symmetric matrix A
using the factorization computed by ?sytrf_rk.

Syntax
lapack_int LAPACKE_ssycon_3 (int matrix_layout, char uplo, lapack_int n, const float *
A, lapack_int lda, const float * e, const lapack_int * ipiv, float anorm, float *
rcond);
lapack_int LAPACKE_dsycon_3 (int matrix_layout, char uplo, lapack_int n, const double *
A, lapack_int lda, const double * e, const lapack_int * ipiv, double anorm, double *
rcond);
lapack_int LAPACKE_csycon_3 (int matrix_layout, char uplo, lapack_int n, const
lapack_complex_float * A, lapack_int lda, const lapack_complex_float * e, const
lapack_int * ipiv, float anorm, float * rcond);
lapack_int LAPACKE_zsycon_3 (int matrix_layout, char uplo, lapack_int n, const
lapack_complex_double * A, lapack_int lda, const lapack_complex_double * e, const
lapack_int * ipiv, double anorm, double * rcond);

Description
?sycon_3 estimates the reciprocal of the condition number (in the 1-norm) of a real or complex symmetric
matrix A using the factorization computed by ?sytrf_rk: $A = P U D U^T P^T$ or $A = P L D L^T P^T$,
where U (or L) is a unit upper (or lower) triangular matrix, $U^T$ (or $L^T$) is the transpose of U (or L), P is a
permutation matrix, $P^T$ is the transpose of P, and D is symmetric and block diagonal with 1-by-1 and 2-by-2
diagonal blocks.
An estimate is obtained for $\|A^{-1}\|_1$, and the reciprocal of the condition number is computed as
$\mathrm{rcond} = 1/(\mathrm{anorm} \cdot \|A^{-1}\|_1)$.

This routine uses the BLAS3 solver ?sytrs_3.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Specifies whether the details of the factorization are stored as an upper or
lower triangular matrix:

• = 'U': Upper triangular. The form is $A = P U D U^T P^T$.

• = 'L': Lower triangular. The form is $A = P L D L^T P^T$.

n The order of the matrix A. n ≥ 0.

A Array of size max(1, lda*n). Diagonal of the block diagonal matrix D and
factors U or L as computed by ?sytrf_rk:

• Only the diagonal elements of the symmetric block diagonal matrix D, stored on the diagonal of A;
that is, D(k,k) = A(k,k). The superdiagonal (or subdiagonal) elements of D must be provided on entry in array e.

and

• If uplo = 'U', factor U in the superdiagonal part of A. If uplo = 'L',
factor L in the subdiagonal part of A.

lda The leading dimension of the array A.

e Array of size n. On entry, contains the superdiagonal (or subdiagonal)
elements of the symmetric block diagonal matrix D with 1-by-1 or 2-by-2
diagonal blocks. Using 1-based indices, if uplo = 'U', e(i) = D(i-1,i) for i = 2, ..., n, and e(1) is not
referenced; if uplo = 'L', e(i) = D(i+1,i) for i = 1, ..., n-1, and e(n) is not referenced.

NOTE For a 1-by-1 diagonal block D(k), where 1 ≤ k ≤ n, the element e[k-1] is not referenced in
either the uplo = 'U' or the uplo = 'L' case.

ipiv Array of size n. Details of the interchanges and the block structure of D as
determined by ?sytrf_rk.

anorm The 1-norm of the original matrix A.

Output Parameters

rcond The reciprocal of the condition number of the matrix A, computed as rcond
= 1/(anorm * AINVNM), where AINVNM is an estimate of the 1-norm of
inv(A) computed in this routine.

Return Values
This function returns a value info.

= 0: Successful exit.
< 0: If info = -i, the ith argument had an illegal value.

?hecon
Estimates the reciprocal of the condition number of a
Hermitian matrix.

Syntax
lapack_int LAPACKE_checon( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_float* a, lapack_int lda, const lapack_int* ipiv, float anorm, float*
rcond );
lapack_int LAPACKE_zhecon( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_double* a, lapack_int lda, const lapack_int* ipiv, double anorm, double*
rcond );

Include Files
• mkl.h

Description

The routine estimates the reciprocal of the condition number of a Hermitian matrix A:
$\kappa_1(A) = \|A\|_1 \|A^{-1}\|_1$ (since A is Hermitian, $\kappa_\infty(A) = \kappa_1(A)$).
Before calling this routine:

• compute anorm (either $\|A\|_1 = \max_j \sum_i |a_{ij}|$ or $\|A\|_\infty = \max_i \sum_j |a_{ij}|$)


• call ?hetrf to compute the factorization of A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:

If uplo = 'U', the array a stores the upper triangular factor U of the
factorization $A = U D U^H$.

If uplo = 'L', the array a stores the lower triangular factor L of the
factorization $A = L D L^H$.

n The order of matrix A; n≥ 0.

a The array a of size max(1, lda*n) contains the factored matrix A, as returned by ?hetrf.

lda The leading dimension of a; lda≥ max(1, n).

ipiv Array, size at least max(1, n).

The array ipiv, as returned by ?hetrf.

anorm The norm of the original matrix A (see Description).

Output Parameters

rcond An estimate of the reciprocal of the condition number. The routine sets
rcond = 0 if the estimate underflows; in this case the matrix is singular
(to working precision). More generally, whenever rcond is small compared to
1.0 at the working precision, the matrix may be poorly conditioned or even
singular.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The computed rcond is never less than r (the reciprocal of the true condition number) and in practice is
nearly always less than 10r. A call to this routine involves solving a number of systems of linear equations
$A x = b$; the number is usually 5 and never more than 11. Each solution requires approximately $8n^2$
floating-point operations.

See Also
Matrix Storage Schemes

?hecon_3
Estimates the reciprocal of the condition number (in
the 1-norm) of a complex Hermitian matrix A.

Syntax

lapack_int LAPACKE_checon_3 (int matrix_layout, char uplo, lapack_int n, const
lapack_complex_float * A, lapack_int lda, const lapack_complex_float * e, const
lapack_int * ipiv, float anorm, float * rcond);
lapack_int LAPACKE_zhecon_3 (int matrix_layout, char uplo, lapack_int n, const
lapack_complex_double * A, lapack_int lda, const lapack_complex_double * e, const
lapack_int * ipiv, double anorm, double * rcond);

Description
?hecon_3 estimates the reciprocal of the condition number (in the 1-norm) of a complex Hermitian matrix A
using the factorization computed by ?hetrf_rk: $A = P U D U^H P^T$ or $A = P L D L^H P^T$, where U (or
L) is a unit upper (or lower) triangular matrix, $U^H$ (or $L^H$) is the conjugate transpose of U (or L), P is a permutation
matrix, $P^T$ is the transpose of P, and D is Hermitian and block diagonal with 1-by-1 and 2-by-2 diagonal
blocks. An estimate is obtained for $\|A^{-1}\|_1$, and the reciprocal of the condition number is computed as
$\mathrm{rcond} = 1/(\mathrm{anorm} \cdot \|A^{-1}\|_1)$.
This routine uses the BLAS3 solver ?hetrs_3.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Specifies whether the details of the factorization are stored as an upper or
lower triangular matrix:

• = 'U': Upper triangular. The form is $A = P U D U^H P^T$.

• = 'L': Lower triangular. The form is $A = P L D L^H P^T$.

n The order of the matrix A. n ≥ 0.

A Array of size max(1, lda*n). Diagonal of the block diagonal matrix D and
factor U or L as computed by ?hetrf_rk:

• Only the diagonal elements of the Hermitian block diagonal matrix D, stored on the diagonal of A;
that is, D(k,k) = A(k,k). The superdiagonal (or subdiagonal) elements of D must be provided on entry in array e.

and

• If uplo = 'U', factor U in the superdiagonal part of A. If uplo = 'L',
factor L in the subdiagonal part of A.

lda The leading dimension of the array A.

e Array of size n. On entry, contains the superdiagonal (or subdiagonal)
elements of the Hermitian block diagonal matrix D with 1-by-1 or 2-by-2
diagonal blocks. If uplo = 'U', e(i) = D(i-1,i) for i = 2, ..., n, and e(1) is not
referenced. If uplo = 'L', e(i) = D(i+1,i) for i = 1, ..., n-1, and e(n) is not
referenced.

NOTE For a 1-by-1 diagonal block D(k), where 1 ≤ k ≤ n, the
element e[k-1] is not referenced in either the uplo = 'U' or the
uplo = 'L' case.

ipiv Array of size n. Details of the interchanges and the block structure of D as
determined by ?hetrf_rk.

anorm The 1-norm of the original matrix A.
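The 1-based rules for e translate to 0-based C indexing as follows. This is a minimal sketch, not an MKL routine: the helper name pack_block_offdiag is hypothetical, D is assumed to be available as a full column-major n-by-n array, and C99 double complex is used in place of lapack_complex_double (an assumption; how lapack_complex_double is defined depends on your build). In normal use e is produced by ?hetrf_rk and passed to ?hecon_3 unchanged.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical helper (not part of MKL): copies the off-diagonal entries of a
   Hermitian block diagonal matrix D, stored as a full column-major n-by-n
   array, into e using 0-based C indices:
     uplo 'U': e[i] = D(i-1, i) for i = 1, ..., n-1; e[0] is unused
     uplo 'L': e[i] = D(i+1, i) for i = 0, ..., n-2; e[n-1] is unused
   Any other uplo value is treated as 'L' in this sketch.
   Positions belonging to 1-by-1 blocks receive 0, because D is zero there. */
void pack_block_offdiag(char uplo, int n, const double complex *D,
                        double complex *e)
{
    for (int i = 0; i < n; ++i)
        e[i] = 0.0;
    if (uplo == 'U' || uplo == 'u') {
        for (int i = 1; i < n; ++i)
            e[i] = D[(i - 1) + (size_t)i * n];   /* row i-1, column i */
    } else {
        for (int i = 0; i + 1 < n; ++i)
            e[i] = D[(i + 1) + (size_t)i * n];   /* row i+1, column i */
    }
}
```

For a 3-by-3 D with a 1-by-1 block in row 0 and a 2-by-2 block in rows 1 and 2, uplo = 'U' places D(1,2) in e[2], while uplo = 'L' places D(2,1) in e[1].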


Output Parameters

rcond The reciprocal of the condition number of the matrix A, computed as rcond
= 1/(anorm * AINVNM), where AINVNM is an estimate of the 1-norm of
inv(A) computed in this routine.

Return Values
This function returns a value info.

= 0: Successful exit.
< 0: If info = -i, the ith argument had an illegal value.

?spcon
Estimates the reciprocal of the condition number of a
packed symmetric matrix.

Syntax
lapack_int LAPACKE_sspcon( int matrix_layout, char uplo, lapack_int n, const float* ap,
const lapack_int* ipiv, float anorm, float* rcond );
lapack_int LAPACKE_dspcon( int matrix_layout, char uplo, lapack_int n, const double*
ap, const lapack_int* ipiv, double anorm, double* rcond );
lapack_int LAPACKE_cspcon( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_float* ap, const lapack_int* ipiv, float anorm, float* rcond );
lapack_int LAPACKE_zspcon( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_double* ap, const lapack_int* ipiv, double anorm, double* rcond );

Include Files
• mkl.h

Description

The routine estimates the reciprocal of the condition number of a packed symmetric matrix A:
$\kappa_1(A) = \|A\|_1 \|A^{-1}\|_1$ (since A is symmetric, $\kappa_\infty(A) = \kappa_1(A)$).
An estimate is obtained for $\|A^{-1}\|$, and the reciprocal of the condition number is computed as
$rcond = 1 / (\|A\| \|A^{-1}\|)$.
Before calling this routine:

• compute anorm (either $\|A\|_1 = \max_j \sum_i |a_{ij}|$ or $\|A\|_\infty = \max_i \sum_j |a_{ij}|$)

• call ?sptrf to compute the factorization of A.
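A minimal sketch of the first step for the real double flavor, assuming column-major packed upper storage, where A(i,j) with 0 ≤ i ≤ j is held in ap[i + j*(j+1)/2]. The helper name sym_packed_upper_norm1 is hypothetical, not an MKL routine. Because ?sptrf overwrites ap with the factorization, anorm has to be computed from the original packed matrix before the factorization call.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper (not part of MKL): returns ||A||_1 = max_j sum_i |a_ij|
   for a real symmetric n-by-n matrix A in column-major packed upper storage,
   where A(i,j) with 0 <= i <= j is stored in ap[i + j*(j+1)/2].
   Because A is symmetric, the result is also ||A||_inf. */
double sym_packed_upper_norm1(int n, const double *ap)
{
    double best = 0.0;
    for (int j = 0; j < n; ++j) {
        double colsum = 0.0;
        for (int i = 0; i < n; ++i) {
            /* Entries below the diagonal are read as A(j,i) = A(i,j). */
            size_t idx = (i <= j) ? (size_t)i + (size_t)j * (j + 1) / 2
                                  : (size_t)j + (size_t)i * (i + 1) / 2;
            colsum += fabs(ap[idx]);
        }
        if (colsum > best)
            best = colsum;
    }
    return best;
}
```

The returned value is what would be passed as anorm, together with the ap and ipiv produced by ?sptrf.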

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:

If uplo = 'U', the array ap stores the packed upper triangular factor
U of the factorization $A = U D U^T$.

If uplo = 'L', the array ap stores the packed lower triangular factor
L of the factorization $A = L D L^T$.

n The order of matrix A; n≥ 0.

ap The array ap contains the packed factored matrix A, as returned
by ?sptrf. The dimension of ap must be at least max(1, n(n+1)/2).

ipiv Array, size at least max(1, n). The array ipiv, as returned by ?sptrf.

anorm The norm of the original matrix A (see Description).

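Output Parameters

rcond An estimate of the reciprocal of the condition number. The routine sets
rcond = 0 if the estimate underflows; in this case the matrix is singular
(to working precision). More generally, whenever rcond is small compared to
1.0 at the working precision, the matrix may be poorly conditioned
or even singular.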

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The computed rcond is never less than r (the reciprocal of the true condition number) and in practice is
nearly always less than 10r. A call to this routine involves solving a number of systems of linear equations
A*x = b; the number is usually 4 or 5 and never more than 11. Each solution requires approximately $2n^2$
floating-point operations for real flavors and $8n^2$ for complex flavors.

See Also
Matrix Storage Schemes

?hpcon
Estimates the reciprocal of the condition number of a
packed Hermitian matrix.

Syntax
lapack_int LAPACKE_chpcon( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_float* ap, const lapack_int* ipiv, float anorm, float* rcond );
lapack_int LAPACKE_zhpcon( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_double* ap, const lapack_int* ipiv, double anorm, double* rcond );

Include Files
• mkl.h


Description

The routine estimates the reciprocal of the condition number of a packed Hermitian matrix A:
$\kappa_1(A) = \|A\|_1 \|A^{-1}\|_1$ (since A is Hermitian, $\kappa_\infty(A) = \kappa_1(A)$).
An estimate is obtained for $\|A^{-1}\|$, and the reciprocal of the condition number is computed as
$rcond = 1 / (\|A\| \|A^{-1}\|)$.
Before calling this routine:

• compute anorm (either $\|A\|_1 = \max_j \sum_i |a_{ij}|$ or $\|A\|_\infty = \max_i \sum_j |a_{ij}|$)

• call ?hptrf to compute the factorization of A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:

If uplo = 'U', the array ap stores the packed upper triangular factor
U of the factorization $A = U D U^H$.

If uplo = 'L', the array ap stores the packed lower triangular factor
L of the factorization $A = L D L^H$.

n The order of matrix A; n≥ 0.

ap The array ap contains the packed factored matrix A, as returned
by ?hptrf. The dimension of ap must be at least max(1, n(n+1)/2).

ipiv Array, size at least max(1, n). The array ipiv, as returned by ?hptrf.

anorm The norm of the original matrix A (see Description).

Output Parameters

rcond An estimate of the reciprocal of the condition number. The routine sets
rcond = 0 if the estimate underflows; in this case the matrix is singular
(to working precision). More generally, whenever rcond is small compared to
1.0 at the working precision, the matrix may be poorly conditioned
or even singular.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The computed rcond is never less than r (the reciprocal of the true condition number) and in practice is
nearly always less than 10r. A call to this routine involves solving a number of systems of linear equations
A*x = b; the number is usually 5 and never more than 11. Each solution requires approximately $8n^2$
floating-point operations.

See Also
Matrix Storage Schemes

?trcon
Estimates the reciprocal of the condition number of a
triangular matrix.

Syntax
lapack_int LAPACKE_strcon( int matrix_layout, char norm, char uplo, char diag,
lapack_int n, const float* a, lapack_int lda, float* rcond );
lapack_int LAPACKE_dtrcon( int matrix_layout, char norm, char uplo, char diag,
lapack_int n, const double* a, lapack_int lda, double* rcond );
lapack_int LAPACKE_ctrcon( int matrix_layout, char norm, char uplo, char diag,
lapack_int n, const lapack_complex_float* a, lapack_int lda, float* rcond );
lapack_int LAPACKE_ztrcon( int matrix_layout, char norm, char uplo, char diag,
lapack_int n, const lapack_complex_double* a, lapack_int lda, double* rcond );

Include Files
• mkl.h

Description

The routine estimates the reciprocal of the condition number of a triangular matrix A in either the 1-norm or
infinity-norm:
$\kappa_1(A) = \|A\|_1 \|A^{-1}\|_1 = \kappa_\infty(A^T) = \kappa_\infty(A^H)$
$\kappa_\infty(A) = \|A\|_\infty \|A^{-1}\|_\infty = \kappa_1(A^T) = \kappa_1(A^H)$.
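For very small matrices, the quantity that ?trcon estimates can be computed exactly by forming $A^{-1}$, which can be handy when checking results in a test. This is a sketch under stated assumptions: real double data, a nonsingular upper triangular matrix in column-major storage with leading dimension n, and diag = 'N'. The function names are hypothetical. It costs $O(n^3)$ operations, which is precisely the work the estimator avoids.

```c
#include <math.h>
#include <stdlib.h>

/* 1-norm ('1' or 'O') or infinity-norm ('I') of a column-major n-by-n matrix. */
static double dense_norm(char norm_char, int n, const double *m)
{
    double best = 0.0;
    for (int k = 0; k < n; ++k) {
        double s = 0.0;
        for (int l = 0; l < n; ++l)
            s += (norm_char == 'I') ? fabs(m[k + l * n])   /* sum over row k    */
                                    : fabs(m[l + k * n]);  /* sum over column k */
        if (s > best)
            best = s;
    }
    return best;
}

/* Exact 1 / (||A|| * ||inv(A)||) for a nonsingular upper triangular A
   (column-major, leading dimension n, diag = 'N').
   Returns -1.0 if memory allocation fails. */
double exact_rcond_upper(char norm_char, int n, const double *a)
{
    double *inv = calloc((size_t)n * n, sizeof *inv);
    if (inv == NULL)
        return -1.0;
    /* Column j of inv(A) solves A*x = e_j; it is zero below row j, so only
       rows j, j-1, ..., 0 are computed, by back substitution. */
    for (int j = 0; j < n; ++j)
        for (int i = j; i >= 0; --i) {
            double s = (i == j) ? 1.0 : 0.0;
            for (int k = i + 1; k <= j; ++k)
                s -= a[i + k * n] * inv[k + j * n];
            inv[i + j * n] = s / a[i + i * n];
        }
    double rcond = 1.0 / (dense_norm(norm_char, n, a) * dense_norm(norm_char, n, inv));
    free(inv);
    return rcond;
}
```

For A = [[2, 1], [0, 4]], both norms give rcond = 0.4. For any 2-by-2 matrix the 1-norm and infinity-norm condition numbers coincide (the inverse is the adjugate divided by the determinant), so a 3-by-3 example is needed to see them differ.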

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

norm Must be '1' or 'O' or 'I'.

If norm = '1' or 'O', then the routine estimates the condition
number of matrix A in the 1-norm.
If norm = 'I', then the routine estimates the condition number of
matrix A in the infinity-norm.

uplo Must be 'U' or 'L'.

Indicates whether A is upper or lower triangular:

If uplo = 'U', the array a stores the upper triangle of A, other array
elements are not referenced.
If uplo = 'L', the array a stores the lower triangle of A, other array
elements are not referenced.

diag Must be 'N' or 'U'.

If diag = 'N', then A is not a unit triangular matrix.

If diag = 'U', then A is unit triangular: diagonal elements are
assumed to be 1 and not referenced in the array a.

n The order of the matrix A; n≥ 0.

a The array a of size max(1, lda*n) contains the matrix A.

lda The leading dimension of a; lda≥ max(1, n).

Output Parameters

rcond An estimate of the reciprocal of the condition number. The routine sets
rcond = 0 if the estimate underflows; in this case the matrix is singular
(to working precision). More generally, whenever rcond is small compared to
1.0 at the working precision, the matrix may be poorly conditioned
or even singular.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The computed rcond is never less than r (the reciprocal of the true condition number) and in practice is
nearly always less than 10r. A call to this routine involves solving a number of systems of linear equations
A*x = b; the number is usually 4 or 5 and never more than 11. Each solution requires approximately $n^2$
floating-point operations for real flavors and $4n^2$ operations for complex flavors.

See Also
Matrix Storage Schemes

?tpcon
Estimates the reciprocal of the condition number of a
packed triangular matrix.

Syntax
lapack_int LAPACKE_stpcon( int matrix_layout, char norm, char uplo, char diag,
lapack_int n, const float* ap, float* rcond );
lapack_int LAPACKE_dtpcon( int matrix_layout, char norm, char uplo, char diag,
lapack_int n, const double* ap, double* rcond );
lapack_int LAPACKE_ctpcon( int matrix_layout, char norm, char uplo, char diag,
lapack_int n, const lapack_complex_float* ap, float* rcond );
lapack_int LAPACKE_ztpcon( int matrix_layout, char norm, char uplo, char diag,
lapack_int n, const lapack_complex_double* ap, double* rcond );

Include Files
• mkl.h

Description

The routine estimates the reciprocal of the condition number of a packed triangular matrix A in either the
1-norm or infinity-norm:
$\kappa_1(A) = \|A\|_1 \|A^{-1}\|_1 = \kappa_\infty(A^T) = \kappa_\infty(A^H)$
$\kappa_\infty(A) = \|A\|_\infty \|A^{-1}\|_\infty = \kappa_1(A^T) = \kappa_1(A^H)$.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

norm Must be '1' or 'O' or 'I'.

If norm = '1' or 'O', then the routine estimates the condition
number of matrix A in the 1-norm.
If norm = 'I', then the routine estimates the condition number of
matrix A in the infinity-norm.

uplo Must be 'U' or 'L'. Indicates whether A is upper or lower triangular:

If uplo = 'U', the array ap stores the upper triangle of A in packed form.
If uplo = 'L', the array ap stores the lower triangle of A in packed form.

diag Must be 'N' or 'U'.

If diag = 'N', then A is not a unit triangular matrix.

If diag = 'U', then A is unit triangular: diagonal elements are
assumed to be 1 and not referenced in the array ap.

n The order of the matrix A; n≥ 0.

ap The array ap contains the packed matrix A. The dimension of ap must
be at least max(1, n(n+1)/2).
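To make the packed layout and the effect of diag = 'U' concrete, the following sketch computes the norm that the condition number refers to, for real double data in column-major packed upper storage (A(i,j) in ap[i + j*(j+1)/2], 0-based). The helper name is hypothetical, not part of MKL. Unlike ?spcon, ?tpcon computes this norm itself, which is why it has no anorm argument.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper (not part of MKL): returns ||A||_1 (norm_char '1' or 'O')
   or ||A||_inf (norm_char 'I') of an upper triangular n-by-n matrix in
   column-major packed form, A(i,j) in ap[i + j*(j+1)/2] for 0 <= i <= j.
   With diag == 'U' the stored diagonal is ignored and taken to be 1. */
double tri_packed_upper_norm(char norm_char, char diag, int n, const double *ap)
{
    double best = 0.0;
    for (int k = 0; k < n; ++k) {
        double s = 0.0;
        for (int l = 0; l < n; ++l) {
            /* Row sums use A(k,l); column sums use A(l,k). */
            int i = (norm_char == 'I') ? k : l;
            int j = (norm_char == 'I') ? l : k;
            if (i > j)
                continue;                       /* below the diagonal: zero */
            if (i == j && diag == 'U') {
                s += 1.0;                       /* implicit unit diagonal */
                continue;
            }
            s += fabs(ap[(size_t)i + (size_t)j * (j + 1) / 2]);
        }
        if (s > best)
            best = s;
    }
    return best;
}
```

For A = [[2, -1, 3], [0, 5, 0], [0, 0, -4]], the 1-norm is 7 and the infinity-norm is 6 with diag = 'N'; with diag = 'U' they become 4 and 5.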

Output Parameters

rcond An estimate of the reciprocal of the condition number. The routine sets
rcond = 0 if the estimate underflows; in this case the matrix is singular
(to working precision). More generally, whenever rcond is small compared to
1.0 at the working precision, the matrix may be poorly conditioned
or even singular.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The computed rcond is never less than r (the reciprocal of the true condition number) and in practice is
nearly always less than 10r. A call to this routine involves solving a number of systems of linear equations
A*x = b; the number is usually 4 or 5 and never more than 11. Each solution requires approximately $n^2$
floating-point operations for real flavors and $4n^2$ operations for complex flavors.


See Also
Matrix Storage Schemes

?tbcon
Estimates the reciprocal of the condition number of a
triangular band matrix.

Syntax
lapack_int LAPACKE_stbcon( int matrix_layout, char norm, char uplo, char diag,
lapack_int n, lapack_int kd, const float* ab, lapack_int ldab, float* rcond );
lapack_int LAPACKE_dtbcon( int matrix_layout, char norm, char uplo, char diag,
lapack_int n, lapack_int kd, const double* ab, lapack_int ldab, double* rcond );
lapack_int LAPACKE_ctbcon( int matrix_layout, char norm, char uplo, char diag,
lapack_int n, lapack_int kd, const lapack_complex_float* ab, lapack_int ldab, float*
rcond );
lapack_int LAPACKE_ztbcon( int matrix_layout, char norm, char uplo, char diag,
lapack_int n, lapack_int kd, const lapack_complex_double* ab, lapack_int ldab, double*
rcond );

Include Files
• mkl.h

Description

The routine estimates the reciprocal of the condition number of a triangular band matrix A in either the
1-norm or infinity-norm:
$\kappa_1(A) = \|A\|_1 \|A^{-1}\|_1 = \kappa_\infty(A^T) = \kappa_\infty(A^H)$
$\kappa_\infty(A) = \|A\|_\infty \|A^{-1}\|_\infty = \kappa_1(A^T) = \kappa_1(A^H)$.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

norm Must be '1' or 'O' or 'I'.

If norm = '1' or 'O', then the routine estimates the condition
number of matrix A in the 1-norm.
If norm = 'I', then the routine estimates the condition number of
matrix A in the infinity-norm.

uplo Must be 'U' or 'L'. Indicates whether A is upper or lower triangular:

If uplo = 'U', the array ab stores the upper triangular band matrix A in band storage.
If uplo = 'L', the array ab stores the lower triangular band matrix A in band storage.

diag Must be 'N' or 'U'.

If diag = 'N', then A is not a unit triangular matrix.

If diag = 'U', then A is unit triangular: diagonal elements are
assumed to be 1 and not referenced in the array ab.

n The order of the matrix A; n≥ 0.

kd The number of superdiagonals or subdiagonals in the matrix A; kd≥ 0.

ab The array ab of size max(1, ldab*n) contains the band matrix A.

ldab The leading dimension of the array ab; ldab ≥ kd + 1.
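The band storage used for ab can be sketched as an element accessor. The index formulas follow the usual LAPACK column-major band layout (see Matrix Storage Schemes); the helper names are hypothetical, not part of MKL, and only diag = 'N' is handled. Like ?tpcon, ?tbcon computes the matrix norm internally, so this is for illustration only.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical accessor (not part of MKL): returns A(i,j) (0-based) of a
   triangular band matrix with kd super- or subdiagonals, held in column-major
   band storage with leading dimension ldab >= kd + 1:
     uplo 'U': A(i,j) is in ab[(kd + i - j) + j*ldab] for j-kd <= i <= j
     uplo 'L': A(i,j) is in ab[(i - j) + j*ldab]      for j <= i <= j+kd
   Entries outside the band are zero and are never read. */
double tri_band_get(char uplo, int kd, int ldab, const double *ab, int i, int j)
{
    if (uplo == 'U') {
        if (i > j || j - i > kd)
            return 0.0;
        return ab[(size_t)(kd + i - j) + (size_t)j * ldab];
    }
    if (i < j || i - j > kd)
        return 0.0;
    return ab[(size_t)(i - j) + (size_t)j * ldab];
}

/* ||A||_1 of the band matrix (diag = 'N'). For clarity this visits all n rows
   of each column; a production version would visit only the band. */
double tri_band_norm1(char uplo, int n, int kd, int ldab, const double *ab)
{
    double best = 0.0;
    for (int j = 0; j < n; ++j) {
        double s = 0.0;
        for (int i = 0; i < n; ++i)
            s += fabs(tri_band_get(uplo, kd, ldab, ab, i, j));
        if (s > best)
            best = s;
    }
    return best;
}
```

For the upper bidiagonal matrix [[1, 2, 0], [0, 3, 4], [0, 0, 5]] with kd = 1 and ldab = 2, the band array is {*, 1, 2, 3, 4, 5}, where * marks the unused slot.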

Output Parameters

rcond An estimate of the reciprocal of the condition number. The routine sets
rcond = 0 if the estimate underflows; in this case the matrix is singular
(to working precision). More generally, whenever rcond is small compared to
1.0 at the working precision, the matrix may be poorly conditioned
or even singular.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The computed rcond is never less than r (the reciprocal of the true condition number) and in practice is
nearly always less than 10r. A call to this routine involves solving a number of systems of linear equations
A*x = b; the number is usually 4 or 5 and never more than 11. Each solution requires approximately
2*n(kd + 1) floating-point operations for real flavors and 8*n(kd + 1) operations for complex flavors.

See Also
Matrix Storage Schemes

Refining the Solution and Estimating Its Error: LAPACK Computational Routines
This section describes the LAPACK routines for refining the computed solution of a system of linear equations
and estimating the solution error. You can call these routines after factorizing the matrix of the system of
equations and computing the solution (see Routines for Matrix Factorization and Routines for Solving
Systems of Linear Equations).

?gerfs
Refines the solution of a system of linear equations
with a general coefficient matrix and estimates its
error.

Syntax
lapack_int LAPACKE_sgerfs( int matrix_layout, char trans, lapack_int n, lapack_int
nrhs, const float* a, lapack_int lda, const float* af, lapack_int ldaf, const
lapack_int* ipiv, const float* b, lapack_int ldb, float* x, lapack_int ldx, float* ferr,
float* berr );
lapack_int LAPACKE_dgerfs( int matrix_layout, char trans, lapack_int n, lapack_int
nrhs, const double* a, lapack_int lda, const double* af, lapack_int ldaf, const
lapack_int* ipiv, const double* b, lapack_int ldb, double* x, lapack_int ldx, double*
ferr, double* berr );


lapack_int LAPACKE_cgerfs( int matrix_layout, char trans, lapack_int n, lapack_int
nrhs, const lapack_complex_float* a, lapack_int lda, const lapack_complex_float* af,
lapack_int ldaf, const lapack_int* ipiv, const lapack_complex_float* b, lapack_int ldb,
lapack_complex_float* x, lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_zgerfs( int matrix_layout, char trans, lapack_int n, lapack_int
nrhs, const lapack_complex_double* a, lapack_int lda, const lapack_complex_double* af,
lapack_int ldaf, const lapack_int* ipiv, const lapack_complex_double* b, lapack_int
ldb, lapack_complex_double* x, lapack_int ldx, double* ferr, double* berr );

Include Files
• mkl.h

Description

The routine performs an iterative refinement of the solution to a system of linear equations $A X = B$,
$A^T X = B$, or $A^H X = B$ with a general matrix A, with multiple right-hand sides. For each computed solution
vector x, the routine computes the component-wise backward error β. This error is the smallest relative
perturbation in elements of A and b such that x is the exact solution of the perturbed system:
$|\delta a_{ij}| \le \beta |a_{ij}|$, $|\delta b_i| \le \beta |b_i|$ such that $(A + \delta A) x = b + \delta b$.
Finally, the routine estimates the forward error in the computed solution,
$\|x - x_e\|_\infty / \|x\|_\infty$ (here $x_e$ is the exact solution).
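The backward error β defined above has a closed form, the Oettli-Prager formula: $\beta = \max_i |b - A x|_i / (|A| |x| + |b|)_i$. Below is a minimal sketch for one real right-hand side with trans = 'N'. The function names are hypothetical. LAPACK's own code also guards against tiny denominators, which is simplified here to skipping zero rows. The forward error can only be evaluated this way when the exact solution is known, for example in a test.

```c
#include <math.h>

/* Sketch (not the MKL implementation): componentwise backward error of an
   approximate solution x of A*x = b, for a real n-by-n column-major A:
       berr = max_i |b - A*x|_i / (|A|*|x| + |b|)_i
   Rows whose denominator is exactly 0 are skipped in this sketch. */
double componentwise_berr(int n, const double *a, const double *x, const double *b)
{
    double berr = 0.0;
    for (int i = 0; i < n; ++i) {
        double r = b[i];            /* accumulates (b - A*x)_i   */
        double den = fabs(b[i]);    /* accumulates (|A||x| + |b|)_i */
        for (int j = 0; j < n; ++j) {
            r -= a[i + j * n] * x[j];
            den += fabs(a[i + j * n]) * fabs(x[j]);
        }
        if (den > 0.0 && fabs(r) / den > berr)
            berr = fabs(r) / den;
    }
    return berr;
}

/* Forward error ||x - xe||_inf / ||x||_inf when the exact solution xe is known. */
double forward_error(int n, const double *x, const double *xe)
{
    double num = 0.0, den = 0.0;
    for (int i = 0; i < n; ++i) {
        if (fabs(x[i] - xe[i]) > num) num = fabs(x[i] - xe[i]);
        if (fabs(x[i]) > den) den = fabs(x[i]);
    }
    return num / den;
}
```

For A = diag(2, 4), b = (2, 4), and the perturbed solution x = (1.5, 1), the backward error is 1/5 = 0.2 and the forward error is 0.5/1.5 = 1/3.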
Before calling this routine:

• call the factorization routine ?getrf

• call the solver routine ?getrs.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

trans Must be 'N' or 'T' or 'C'.

Indicates the form of the equations:

If trans = 'N', the system has the form $A X = B$.

If trans = 'T', the system has the form $A^T X = B$.

If trans = 'C', the system has the form $A^H X = B$.

n The order of the matrix A; n≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

a, af, b, x Arrays:
a (size max(1, lda*n)) contains the original matrix A, as supplied
to ?getrf.

af (size max(1, ldaf*n)) contains the factored matrix A, as returned
by ?getrf.

b (size max(1, ldb*nrhs) for column major layout and max(1, ldb*n)
for row major layout) contains the right-hand side matrix B.

x (size max(1, ldx*nrhs) for column major layout and max(1, ldx*n)
for row major layout) contains the solution matrix X.

lda The leading dimension of a; lda≥ max(1, n).

ldaf The leading dimension of af; ldaf≥ max(1, n).

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ldx The leading dimension of x; ldx≥ max(1, n) for column major layout
and ldx≥nrhs for row major layout.

ipiv Array, size at least max(1, n). The ipiv array, as returned by ?getrf.

Output Parameters

x The refined solution matrix X.

ferr, berr Arrays, size at least max(1, nrhs). Contain the estimated forward
error bound and the component-wise backward error, respectively, for each solution vector.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The bounds returned in ferr are not rigorous, but in practice they almost always overestimate the actual
error.
For each right-hand side, computation of the backward error involves a minimum of $4n^2$ floating-point
operations (for real flavors) or $16n^2$ operations (for complex flavors). In addition, each step of iterative
refinement involves $6n^2$ operations (for real flavors) or $24n^2$ operations (for complex flavors); the number of
iterations may range from 1 to 5. Estimating the forward error involves solving a number of systems of linear
equations A*x = b with the same coefficient matrix A and different right-hand sides b; the number is usually
4 or 5 and never more than 11. Each solution requires approximately $2n^2$ floating-point operations for real
flavors or $8n^2$ for complex flavors.
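One step of iterative refinement computes the residual r = b - A*x, solves A*d = r with the existing factors, and updates x := x + d. The sketch below uses a small hand-written LU factorization with partial pivoting in place of ?getrf and ?getrs; all names and the size limit NMAX are hypothetical. Accumulating the residual in double for float data is a choice made for this sketch, closer in spirit to the extra-precise refinement of ?gerfsx; it is not a description of how ?gerfs forms its residual.

```c
#include <math.h>

#define NMAX 8  /* hypothetical size limit for this sketch */

/* In-place LU with partial pivoting of a column-major n-by-n float matrix
   (stand-in for ?getrf). Returns 0, or j+1 if U(j,j) is exactly zero. */
static int lu_factor(int n, float *a, int *ipiv)
{
    for (int j = 0; j < n; ++j) {
        int p = j;
        for (int i = j + 1; i < n; ++i)
            if (fabsf(a[i + j * n]) > fabsf(a[p + j * n])) p = i;
        ipiv[j] = p;
        if (a[p + j * n] == 0.0f) return j + 1;
        if (p != j)                              /* swap whole rows j and p */
            for (int k = 0; k < n; ++k) {
                float t = a[j + k * n]; a[j + k * n] = a[p + k * n]; a[p + k * n] = t;
            }
        for (int i = j + 1; i < n; ++i) {
            a[i + j * n] /= a[j + j * n];        /* multiplier, stored in L */
            for (int k = j + 1; k < n; ++k)
                a[i + k * n] -= a[i + j * n] * a[j + k * n];
        }
    }
    return 0;
}

/* Solves A*x = b in place using the factors (stand-in for ?getrs). */
static void lu_solve(int n, const float *lu, const int *ipiv, float *b)
{
    for (int j = 0; j < n; ++j) {                /* apply row interchanges */
        float t = b[j]; b[j] = b[ipiv[j]]; b[ipiv[j]] = t;
    }
    for (int i = 1; i < n; ++i)                  /* forward substitution, unit L */
        for (int k = 0; k < i; ++k) b[i] -= lu[i + k * n] * b[k];
    for (int i = n - 1; i >= 0; --i) {           /* back substitution with U */
        for (int k = i + 1; k < n; ++k) b[i] -= lu[i + k * n] * b[k];
        b[i] /= lu[i + i * n];
    }
}

/* One refinement step (n <= NMAX): residual accumulated in double,
   correction solved with the float factors, then x := x + d. */
void refine_step(int n, const float *a, const float *lu, const int *ipiv,
                 const float *b, float *x)
{
    float d[NMAX];
    for (int i = 0; i < n; ++i) {
        double r = b[i];
        for (int j = 0; j < n; ++j) r -= (double)a[i + j * n] * x[j];
        d[i] = (float)r;
    }
    lu_solve(n, lu, ipiv, d);
    for (int i = 0; i < n; ++i) x[i] += d[i];
}
```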

See Also
Matrix Storage Schemes

?gerfsx
Uses extra precise iterative refinement to improve the
solution to the system of linear equations with a
general coefficient matrix A and provides error bounds
and backward error estimates.


Syntax
lapack_int LAPACKE_sgerfsx( int matrix_layout, char trans, char equed, lapack_int n,
lapack_int nrhs, const float* a, lapack_int lda, const float* af, lapack_int ldaf,
const lapack_int* ipiv, const float* r, const float* c, const float* b, lapack_int ldb,
float* x, lapack_int ldx, float* rcond, float* berr, lapack_int n_err_bnds, float*
err_bnds_norm, float* err_bnds_comp, lapack_int nparams, float* params );
lapack_int LAPACKE_dgerfsx( int matrix_layout, char trans, char equed, lapack_int n,
lapack_int nrhs, const double* a, lapack_int lda, const double* af, lapack_int ldaf,
const lapack_int* ipiv, const double* r, const double* c, const double* b, lapack_int
ldb, double* x, lapack_int ldx, double* rcond, double* berr, lapack_int n_err_bnds,
double* err_bnds_norm, double* err_bnds_comp, lapack_int nparams, double* params );
lapack_int LAPACKE_cgerfsx( int matrix_layout, char trans, char equed, lapack_int n,
lapack_int nrhs, const lapack_complex_float* a, lapack_int lda, const
lapack_complex_float* af, lapack_int ldaf, const lapack_int* ipiv, const float* r,
const float* c, const lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x,
lapack_int ldx, float* rcond, float* berr, lapack_int n_err_bnds, float* err_bnds_norm,
float* err_bnds_comp, lapack_int nparams, float* params );
lapack_int LAPACKE_zgerfsx( int matrix_layout, char trans, char equed, lapack_int n,
lapack_int nrhs, const lapack_complex_double* a, lapack_int lda, const
lapack_complex_double* af, lapack_int ldaf, const lapack_int* ipiv, const double* r,
const double* c, const lapack_complex_double* b, lapack_int ldb, lapack_complex_double*
x, lapack_int ldx, double* rcond, double* berr, lapack_int n_err_bnds, double*
err_bnds_norm, double* err_bnds_comp, lapack_int nparams, double* params );

Include Files
• mkl.h

Description

The routine improves the computed solution to a system of linear equations and provides error bounds and
backward error estimates for the solution. In addition to a normwise error bound, the code provides a
maximum componentwise error bound, if possible. See comments for err_bnds_norm and err_bnds_comp
for details of the error bounds.
The original system of linear equations may have been equilibrated before calling this routine, as described
by the parameters equed, r, and c below. In this case, the solution and error bounds returned are for the
original unequilibrated system.
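The scaling described by equed, r, and c, together with the advice to use powers of the radix, can be sketched as follows for real double data in column-major layout. Both helpers are hypothetical and not part of MKL. In practice, scale factors restricted to powers of the radix are typically obtained from an equilibration routine such as ?geequb; rounding to the nearest power of 2 here is only an illustrative choice.

```c
#include <math.h>
#include <stddef.h>

/* Applies the equilibration described by equed to a column-major n-by-n
   matrix A in place: 'N' leaves A unchanged, 'R' forms diag(r)*A,
   'C' forms A*diag(c), 'B' forms diag(r)*A*diag(c). r is read only for
   'R'/'B' and c only for 'C'/'B'. Returns -1 for any other equed value. */
int apply_equilibration(char equed, int n, double *a,
                        const double *r, const double *c)
{
    int use_r = (equed == 'R' || equed == 'B');
    int use_c = (equed == 'C' || equed == 'B');
    if (!use_r && !use_c && equed != 'N')
        return -1;
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i) {
            if (use_r) a[i + (size_t)j * n] *= r[i];
            if (use_c) a[i + (size_t)j * n] *= c[j];
        }
    return 0;
}

/* Rounds a positive scale factor s to 2^k with k = round(log2(s)).
   Multiplying by a power of 2 only changes the exponent, so it is exact
   unless the result overflows or underflows. */
double round_to_power_of_two(double s)
{
    int e;
    double f = frexp(s, &e);                  /* s = f * 2^e, 0.5 <= f < 1 */
    return (f < 0.70710678118654752440)       /* log2(f) < -1/2 */
               ? ldexp(1.0, e - 1)
               : ldexp(1.0, e);
}
```

For instance, equed = 'B' with r = (2, 0.5) and c = (1, 4) turns [[1, 2], [3, 4]] into [[2, 16], [1.5, 8]], and round_to_power_of_two(3.0) returns 4.0.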

Input Parameters

matrix_layout Specifies whether two-dimensional array storage is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

trans Must be 'N', 'T', or 'C'.

Specifies the form of the system of equations:

If trans = 'N', the system has the form $A X = B$ (No transpose).

If trans = 'T', the system has the form $A^T X = B$ (Transpose).

If trans = 'C', the system has the form $A^H X = B$ (Conjugate
transpose for complex flavors, Transpose for real flavors).

equed Must be 'N', 'R', 'C', or 'B'.

Specifies the form of equilibration that was done to A before calling
this routine.
If equed = 'N', no equilibration was done.

If equed = 'R', row equilibration was done, that is, A has been
premultiplied by diag(r).
If equed = 'C', column equilibration was done, that is, A has been
postmultiplied by diag(c).
If equed = 'B', both row and column equilibration were done, that is,
A has been replaced by diag(r)*A*diag(c). The right-hand side B
has been changed accordingly.

n The number of linear equations; the order of the matrix A; n≥ 0.

nrhs The number of right-hand sides; the number of columns of the
matrices B and X; nrhs ≥ 0.

a, af, b Arrays: a (size max(1, lda*n)), af (size max(1, ldaf*n)), b (size
max(1, ldb*nrhs) for column major layout and max(1, ldb*n) for
row major layout).
The array a contains the original n-by-n matrix A.

The array af contains the factored form of the matrix A, that is, the
factors L and U from the factorization A = P*L*U as computed
by ?getrf.

The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.

lda The leading dimension of a; lda≥ max(1, n).

ldaf The leading dimension of af; ldaf≥ max(1, n).

ipiv Array, size at least max(1, n). Contains the pivot indices as
computed by ?getrf; for row 1 ≤i≤n, row i of the matrix was
interchanged with row ipiv(i).

r, c Arrays: r (size n), c (size n). The array r contains the row scale
factors for A, and the array c contains the column scale factors for A.

If equed = 'R' or 'B', A is multiplied on the left by diag(r); if equed =
'N' or 'C', r is not accessed.
If equed = 'R' or 'B', each element of r must be positive.

If equed = 'C' or 'B', A is multiplied on the right by diag(c); if
equed = 'N' or 'R', c is not accessed.
If equed = 'C' or 'B', each element of c must be positive.

Each element of r or c should be a power of the radix to ensure a
reliable solution and error estimates. Scaling by powers of the radix
does not cause rounding errors unless the result underflows or
overflows. Rounding errors during scaling lead to refining with a
matrix that is not equivalent to the input matrix, producing error
estimates that may not be reliable.

ldb The leading dimension of the array b; ldb≥ max(1, n) for column
major layout and ldb≥nrhs for row major layout.

x Array, of size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout.
The solution matrix X as computed by ?getrs.

ldx The leading dimension of the output array x; ldx≥ max(1, n) for
column major layout and ldx≥nrhs for row major layout.

n_err_bnds Number of error bounds to return for each right-hand side and each
type (normwise or componentwise). See the err_bnds_norm and
err_bnds_comp descriptions in the Output Parameters section below.

nparams Specifies the number of parameters set in params. If nparams ≤ 0, the params
array is never referenced and default values are used.

params Array, size nparams. Specifies algorithm parameters. If an entry is
less than 0.0, that entry is filled with the default value used for that
parameter. Only positions up to nparams are accessed; defaults are
used for higher-numbered parameters. If defaults are acceptable, you
can pass nparams = 0, which prevents the source code from
accessing the params argument.

params[0] : Whether to perform iterative refinement or not. Default: 1.0

=0.0 No refinement is performed and no error
bounds are computed.

=1.0 Use the double-precision refinement
algorithm, possibly with doubled-single
computations if the compilation environment
does not support double precision.

(Other values are reserved for future use.)

params[1] : Maximum number of residual computations allowed for
refinement.

Default 10.0

Aggressive Set to 100.0 to permit convergence using
approximate factorizations or factorizations
other than LU. If the factorization uses a
technique other than Gaussian elimination,
the guarantees in err_bnds_norm and
err_bnds_comp may no longer be
trustworthy.

params[2] : Flag determining if the code will attempt to find a
solution with a small componentwise relative error in the double-
precision algorithm. Positive is true, 0.0 is false. Default: 1.0 (attempt
componentwise convergence).
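The rules for nparams and params can be summarized by a small resolver that reports which settings take effect. This is a hypothetical helper for illustration, not part of MKL. For example, passing nparams = 2 with params[0] = -1.0 and params[1] = 100.0 requests the default refinement mode with the aggressive residual limit, while params[2] keeps its default.

```c
/* Hypothetical helper (not part of MKL) mirroring the documented rules:
   positions at or beyond nparams, and entries below 0.0, resolve to the
   defaults params[0] = 1.0, params[1] = 10.0, params[2] = 1.0.
   params is never dereferenced when nparams <= 0. */
void effective_refine_params(int nparams, const double *params, double out[3])
{
    static const double defaults[3] = {1.0, 10.0, 1.0};
    for (int k = 0; k < 3; ++k)
        out[k] = (k < nparams && params[k] >= 0.0) ? params[k] : defaults[k];
}
```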

Output Parameters

x The improved solution matrix X.

rcond Reciprocal scaled condition number. An estimate of the reciprocal
Skeel condition number of the matrix A after equilibration (if done). If
rcond is less than the machine precision, in particular, if rcond = 0,
the matrix is singular to working precision. Note that the error may
still be small even if this number is very small and the matrix appears
ill-conditioned.

berr Array, size at least max(1, nrhs). Contains the componentwise
relative backward error for each solution vector $x_j$, that is, the
smallest relative change in any element of A or B that makes $x_j$ an
exact solution.

err_bnds_norm Array of size nrhs*n_err_bnds. For each right-hand side, contains
information about various error bounds and condition numbers
corresponding to the normwise relative error, which is defined as
follows:
Normwise relative error in the i-th solution vector:
$\max_j |x^{true}_{ji} - x_{ji}| \,/\, \max_j |x_{ji}|$, where $x^{true}$ denotes the exact solution.

The array is indexed by the type of error information as described
below. There are currently up to three pieces of information returned.

err=1 "Trust/don't trust" boolean. Trust the answer
if the reciprocal condition number is at least
the threshold sqrt(n)*slamch('Epsilon') for
single precision flavors and
sqrt(n)*dlamch('Epsilon') for double precision
flavors.

err=2 "Guaranteed" error bound. The estimated
forward error, almost certainly within a factor
of 10 of the true error so long as the next
entry is greater than the threshold
sqrt(n)*slamch('Epsilon') for single precision
flavors and sqrt(n)*dlamch('Epsilon') for double
precision flavors. This error bound should
only be trusted if the previous boolean is
true.


err=3 Reciprocal condition number. Estimated
normwise reciprocal condition number.
Compared with the threshold
sqrt(n)*slamch('Epsilon') for single precision
flavors and sqrt(n)*dlamch('Epsilon') for double
precision flavors to determine if the error
estimate is "guaranteed". These reciprocal
condition numbers are $1 / (\|Z^{-1}\|_\infty \|Z\|_\infty)$
for some appropriately scaled matrix Z.

Let Z = S*A, where S scales each row by a
power of the radix so all absolute row sums
of Z are approximately 1.

The information for right-hand side i, where 1 ≤ i ≤ nrhs, and type of
error err is stored in:

• Column major layout: err_bnds_norm[(err - 1)*nrhs + i - 1].
• Row major layout: err_bnds_norm[err - 1 + (i - 1)*n_err_bnds].

err_bnds_comp Array of size nrhs*n_err_bnds. For each right-hand side, contains
information about various error bounds and condition numbers
corresponding to the componentwise relative error, which is defined as
follows:
Componentwise relative error in the i-th solution vector:
$\max_j \left( |x^{true}_{ji} - x_{ji}| \,/\, |x_{ji}| \right)$, where $x^{true}$ denotes the exact solution.

The array is indexed by the type of error information as described
below. There are currently up to three pieces of information returned
for each right-hand side. If componentwise accuracy is not requested
(params[2] = 0.0), then err_bnds_comp is not accessed. If
n_err_bnds < 3, then at most the first n_err_bnds columns of the
err_bnds_comp array are returned.

err=1 "Trust/don't trust" boolean. Trust the answer
if the reciprocal condition number is at least
the threshold sqrt(n)*slamch('Epsilon') for
single precision flavors and
sqrt(n)*dlamch('Epsilon') for double precision
flavors.

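err=2 "Guaranteed" error bound. The estimated
forward error, almost certainly within a factor
of 10 of the true error so long as the next
entry is greater than the threshold
sqrt(n)*slamch('Epsilon') for single precision
flavors and sqrt(n)*dlamch('Epsilon') for double
precision flavors. This error bound should
only be trusted if the previous boolean is
true.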

err=3 Reciprocal condition number. Estimated
componentwise reciprocal condition number.
Compared with the threshold
sqrt(n)*slamch('Epsilon') for single precision
flavors and sqrt(n)*dlamch('Epsilon') for double
precision flavors to determine if the error
estimate is "guaranteed". These reciprocal
condition numbers are $1 / (\|Z^{-1}\|_\infty \|Z\|_\infty)$
for some appropriately scaled matrix Z.

Let Z = S*(A*diag(x)), where x is the
solution for the current right-hand side and S
scales each row of A*diag(x) by a power of
the radix so all absolute row sums of Z are
approximately 1.

The information for right-hand side i, where 1 ≤ i ≤ nrhs, and type of
error err is stored in:

• Column major layout: err_bnds_comp[(err - 1)*nrhs + i - 1].
• Row major layout: err_bnds_comp[err - 1 + (i - 1)*n_err_bnds].

params Output parameter only if the input contains erroneous values, namely,
in params[0], params[1], params[2]. In such a case, the
corresponding elements of params are filled with default values on
output.
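Because the storage order of err_bnds_norm and err_bnds_comp depends on matrix_layout, a small accessor built from the index formulas above can help avoid indexing mistakes. This is a sketch with hypothetical helper names for the real double flavor; a nonzero row_major argument stands for LAPACK_ROW_MAJOR.

```c
/* Hypothetical accessor (not part of MKL): returns the entry of
   err_bnds_norm or err_bnds_comp for right-hand side i (1 <= i <= nrhs)
   and information type err (1 = trust flag, 2 = error bound,
   3 = reciprocal condition number). */
double err_bnds_get(const double *err_bnds, int row_major, int nrhs,
                    int n_err_bnds, int i, int err)
{
    return row_major ? err_bnds[(err - 1) + (i - 1) * n_err_bnds]
                     : err_bnds[(err - 1) * nrhs + (i - 1)];
}

/* A right-hand side counts as guaranteed when its err = 1 flag is nonzero. */
int rhs_guaranteed(const double *err_bnds, int row_major, int nrhs,
                   int n_err_bnds, int i)
{
    return err_bnds_get(err_bnds, row_major, nrhs, n_err_bnds, i, 1) != 0.0;
}
```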

Return Values
This function returns a value info.

If info = 0, the execution is successful. The solution to every right-hand side is guaranteed.

If info = -i, parameter i had an illegal value.

If 0 < info ≤ n: U(info,info) is exactly zero. The factorization has been completed, but the factor U is exactly
singular, so the solution and error bounds could not be computed; rcond = 0 is returned.

If info = n+j: The solution corresponding to the j-th right-hand side is not guaranteed. The solutions
corresponding to other right-hand sides k with k > j may not be guaranteed as well, but only the first such
right-hand side is reported. If a small componentwise error is not requested (params[2] = 0.0), then the j-th
right-hand side is the first with a normwise error bound that is not guaranteed (the smallest j such that
err_bnds_norm[j - 1] = 0.0 for column major layout, or err_bnds_norm[(j - 1)*n_err_bnds] = 0.0 for row
major layout). Otherwise (params[2] positive, the default), the j-th right-hand side is the first with either
error bound not guaranteed (the smallest j such that, for column major layout, err_bnds_norm[j - 1] = 0.0 or
err_bnds_comp[j - 1] = 0.0; or, for row major layout, err_bnds_norm[(j - 1)*n_err_bnds] = 0.0 or
err_bnds_comp[(j - 1)*n_err_bnds] = 0.0).
See the definition of err_bnds_norm and err_bnds_comp for err = 1. To get information about all of the
right-hand sides, check err_bnds_norm or err_bnds_comp.
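The cases above can be distinguished with a few comparisons on info and n. A minimal sketch: plain int stands in for lapack_int (matching the LP64 interface), and the type and function names are hypothetical, not oneMKL API.

```c
/* Hypothetical classification of the info value returned by the ?*rfsx routines. */
typedef enum {
    RFSX_OK,             /* info = 0: every solution is guaranteed       */
    RFSX_ILLEGAL_ARG,    /* info = -i: parameter i had an illegal value  */
    RFSX_SINGULAR,       /* 0 < info <= n: U(info,info) is exactly zero  */
    RFSX_NOT_GUARANTEED  /* info = n + j: first unguaranteed rhs is j    */
} rfsx_status;

typedef struct {
    rfsx_status status;
    int index;           /* i, info, or j depending on status; 0 if OK */
} rfsx_result;

/* Decodes info for a system of order n. */
rfsx_result decode_rfsx_info(int info, int n)
{
    rfsx_result res;
    if (info == 0) {
        res.status = RFSX_OK;
        res.index = 0;
    } else if (info < 0) {
        res.status = RFSX_ILLEGAL_ARG;
        res.index = -info;
    } else if (info <= n) {
        res.status = RFSX_SINGULAR;
        res.index = info;
    } else {
        res.status = RFSX_NOT_GUARANTEED;
        res.index = info - n;
    }
    return res;
}
```

For n = 5, info = 7 decodes to RFSX_NOT_GUARANTEED with index 2: the second right-hand side is the first whose error bound is not guaranteed.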

See Also
Matrix Storage Schemes


?gbrfs
Refines the solution of a system of linear equations
with a general band coefficient matrix and estimates
its error.

Syntax
lapack_int LAPACKE_sgbrfs( int matrix_layout, char trans, lapack_int n, lapack_int kl,
lapack_int ku, lapack_int nrhs, const float* ab, lapack_int ldab, const float* afb,
lapack_int ldafb, const lapack_int* ipiv, const float* b, lapack_int ldb, float* x,
lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_dgbrfs( int matrix_layout, char trans, lapack_int n, lapack_int kl,
lapack_int ku, lapack_int nrhs, const double* ab, lapack_int ldab, const double* afb,
lapack_int ldafb, const lapack_int* ipiv, const double* b, lapack_int ldb, double* x,
lapack_int ldx, double* ferr, double* berr );
lapack_int LAPACKE_cgbrfs( int matrix_layout, char trans, lapack_int n, lapack_int kl,
lapack_int ku, lapack_int nrhs, const lapack_complex_float* ab, lapack_int ldab, const
lapack_complex_float* afb, lapack_int ldafb, const lapack_int* ipiv, const
lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x, lapack_int ldx,
float* ferr, float* berr );
lapack_int LAPACKE_zgbrfs( int matrix_layout, char trans, lapack_int n, lapack_int kl,
lapack_int ku, lapack_int nrhs, const lapack_complex_double* ab, lapack_int ldab, const
lapack_complex_double* afb, lapack_int ldafb, const lapack_int* ipiv, const
lapack_complex_double* b, lapack_int ldb, lapack_complex_double* x, lapack_int ldx,
double* ferr, double* berr );

Include Files
• mkl.h

Description

The routine performs an iterative refinement of the solution to a system of linear equations A*X = B,
A^T*X = B, or A^H*X = B with a band matrix A, with multiple right-hand sides. For each computed solution
vector x, the routine computes the component-wise backward error β. This error is the smallest relative
perturbation in elements of A and b such that x is the exact solution of the perturbed system:

$$|\delta a_{ij}| \le \beta |a_{ij}|, \quad |\delta b_i| \le \beta |b_i| \quad \text{such that} \quad (A + \delta A)x = b + \delta b.$$

Finally, the routine estimates the component-wise forward error in the computed solution,
$\|x - x_e\|_\infty / \|x\|_\infty$ (here $x_e$ is the exact solution).
Before calling this routine:

• call the factorization routine ?gbtrf

• call the solver routine ?gbtrs.
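For a single right-hand side, the backward error β defined above has a closed form, the Oettli-Prager formula: β = max_i |b - A*x|_i / (|A|*|x| + |b|)_i. The sketch below evaluates it for a small dense column-major matrix to illustrate the definition. It is not the code path ?gbrfs uses internally (that works on band storage and adds safeguards against underflow), and the function name is hypothetical.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical sketch (not a oneMKL routine): componentwise backward error
 * of x for A*x = b, with A dense n-by-n, column major, leading dimension lda:
 *   beta = max_i |b - A*x|_i / (|A|*|x| + |b|)_i   (Oettli-Prager). */
double componentwise_backward_error(int n, const double *a, int lda,
                                    const double *x, const double *b)
{
    double beta = 0.0;
    for (int i = 0; i < n; ++i) {
        double r = b[i];            /* accumulates (b - A*x)_i */
        double denom = fabs(b[i]);  /* accumulates (|A|*|x| + |b|)_i */
        for (int j = 0; j < n; ++j) {
            double aij = a[i + (size_t)j * lda];
            r -= aij * x[j];
            denom += fabs(aij) * fabs(x[j]);
        }
        /* A zero denominator means every term in row i is zero, so the
         * residual is zero too and the row imposes no constraint. */
        if (denom > 0.0 && fabs(r) / denom > beta)
            beta = fabs(r) / denom;
    }
    return beta;
}
```

With A = diag(2, 4), b = {2, 4}, and x = {1.5, 1}, the first equation is off by 1 against a scale of 2 + 3 = 5, so β = 0.2; for the exact solution x = {1, 1}, β = 0.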

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

trans Must be 'N' or 'T' or 'C'.

Indicates the form of the equations:

If trans = 'N', the system has the form A*X = B.

If trans = 'T', the system has the form A^T*X = B.

If trans = 'C', the system has the form A^H*X = B.

n The order of the matrix A; n≥ 0.

kl The number of sub-diagonals within the band of A; kl≥ 0.

ku The number of super-diagonals within the band of A; ku≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

ab, afb, b, x Arrays:
ab (size max(1, ldab*n)) contains the original band matrix A, as
supplied to ?gbtrf, but stored in rows from 1 to kl + ku + 1 for
column major layout, and columns from 1 to kl + ku + 1 for row
major layout.
afb (size max(1, ldafb*n)) contains the factored band matrix A, as
returned by ?gbtrf.

b of size max(1, ldb*nrhs) for column major layout and max(1,
ldb*n) for row major layout contains the right-hand side matrix B.
x of size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout contains the solution matrix X.

ldab The leading dimension of ab, ldab≥kl + ku + 1.

ldafb The leading dimension of afb, ldafb≥ 2*kl + ku + 1.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ldx The leading dimension of x; ldx ≥ max(1, n) for column major layout
and ldx ≥ nrhs for row major layout.

ipiv Array, size at least max(1, n). The ipiv array, as returned by ?gbtrf.
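For column major layout, ab uses the standard LAPACK band storage: element a(i,j) (0-based i and j) lies in row ku + i - j of column j of ab, for max(0, j - ku) ≤ i ≤ min(n - 1, j + kl), so the main diagonal occupies row ku (row ku + 1 in the 1-based numbering used above). The sketch below packs a dense column-major matrix into that layout. The function name is hypothetical, entries of ab outside the band are left untouched, and the row major band layout is deliberately not covered.

```c
#include <stddef.h>

/* Hypothetical sketch: pack an n-by-n dense matrix a (column major, leading
 * dimension lda) with kl subdiagonals and ku superdiagonals into column-major
 * LAPACK band storage ab, ldab >= kl + ku + 1.
 * Element a(i,j) (0-based) goes to ab[(ku + i - j) + j*ldab]. */
void dense_to_band_colmajor(int n, int kl, int ku, const double *a, int lda,
                            double *ab, int ldab)
{
    for (int j = 0; j < n; ++j) {
        int i_first = j - ku > 0 ? j - ku : 0;         /* max(0, j - ku) */
        int i_last = j + kl < n - 1 ? j + kl : n - 1;  /* min(n - 1, j + kl) */
        for (int i = i_first; i <= i_last; ++i)
            ab[(size_t)(ku + i - j) + (size_t)j * ldab] = a[i + (size_t)j * lda];
    }
}
```

For the tridiagonal matrix [[1, 2, 0], [3, 4, 5], [0, 6, 7]] with kl = ku = 1, the three rows of ab hold the superdiagonal {*, 2, 5}, the diagonal {1, 4, 7}, and the subdiagonal {3, 6, *}, where * marks unused positions.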

Output Parameters

x The refined solution matrix X.

ferr, berr Arrays, size at least max(1, nrhs). Contain the component-wise
forward and backward errors, respectively, for each solution vector.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The bounds returned in ferr are not rigorous, but in practice they almost always overestimate the actual
error.


For each right-hand side, computation of the backward error involves a minimum of 4n(kl + ku) floating-point
operations (for real flavors) or 16n(kl + ku) operations (for complex flavors). In addition, each step of
iterative refinement involves 2n(4kl + 3ku) operations (for real flavors) or 8n(4kl + 3ku) operations (for
complex flavors); the number of iterations may range from 1 to 5. Estimating the forward error involves
solving a number of systems of linear equations A*x = b; the number is usually 4 or 5 and never more
than 11. Each solution with the band factors requires approximately 2n(2kl + ku) floating-point operations
for real flavors or 8n(2kl + ku) for complex flavors.

See Also
Matrix Storage Schemes

?gbrfsx
Uses extra precise iterative refinement to improve the
solution to the system of linear equations with a
banded coefficient matrix A and provides error bounds
and backward error estimates.

Syntax
lapack_int LAPACKE_sgbrfsx( int matrix_layout, char trans, char equed, lapack_int n,
lapack_int kl, lapack_int ku, lapack_int nrhs, const float* ab, lapack_int ldab, const
float* afb, lapack_int ldafb, const lapack_int* ipiv, const float* r, const float* c,
const float* b, lapack_int ldb, float* x, lapack_int ldx, float* rcond, float* berr,
lapack_int n_err_bnds, float* err_bnds_norm, float* err_bnds_comp, lapack_int nparams,
float* params );
lapack_int LAPACKE_dgbrfsx( int matrix_layout, char trans, char equed, lapack_int n,
lapack_int kl, lapack_int ku, lapack_int nrhs, const double* ab, lapack_int ldab, const
double* afb, lapack_int ldafb, const lapack_int* ipiv, const double* r, const double*
c, const double* b, lapack_int ldb, double* x, lapack_int ldx, double* rcond, double*
berr, lapack_int n_err_bnds, double* err_bnds_norm, double* err_bnds_comp, lapack_int
nparams, double* params );
lapack_int LAPACKE_cgbrfsx( int matrix_layout, char trans, char equed, lapack_int n,
lapack_int kl, lapack_int ku, lapack_int nrhs, const lapack_complex_float* ab,
lapack_int ldab, const lapack_complex_float* afb, lapack_int ldafb, const lapack_int*
ipiv, const float* r, const float* c, const lapack_complex_float* b, lapack_int ldb,
lapack_complex_float* x, lapack_int ldx, float* rcond, float* berr, lapack_int
n_err_bnds, float* err_bnds_norm, float* err_bnds_comp, lapack_int nparams, float*
params );
lapack_int LAPACKE_zgbrfsx( int matrix_layout, char trans, char equed, lapack_int n,
lapack_int kl, lapack_int ku, lapack_int nrhs, const lapack_complex_double* ab,
lapack_int ldab, const lapack_complex_double* afb, lapack_int ldafb, const lapack_int*
ipiv, const double* r, const double* c, const lapack_complex_double* b, lapack_int ldb,
lapack_complex_double* x, lapack_int ldx, double* rcond, double* berr, lapack_int
n_err_bnds, double* err_bnds_norm, double* err_bnds_comp, lapack_int nparams, double*
params );

Include Files
• mkl.h

Description
The routine improves the computed solution to a system of linear equations and provides error bounds and
backward error estimates for the solution. In addition to a normwise error bound, the code provides a
maximum componentwise error bound, if possible. See comments for err_bnds_norm and err_bnds_comp
for details of the error bounds.
The original system of linear equations may have been equilibrated before calling this routine, as described
by the parameters equed, r, and c below. In this case, the solution and error bounds returned are for the
original unequilibrated system.
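The equilibrated and original systems are related as follows: with equed = 'B', the routine works with (diag(r)*A*diag(c))*y = diag(r)*b, and the solution of the original system is x = diag(c)*y. The sketch below spells this out for a small dense column-major matrix (the routine itself uses band storage), together with a helper that rounds a scale factor down to a power of two, in line with the advice under r, c below to use powers of the radix. Radix 2 (IEEE binary floating point) is assumed, and all three function names are hypothetical.

```c
#include <math.h>
#include <stddef.h>

/* Rounds a positive scale factor down to the largest power of two not
 * exceeding it, so that scaling by it is exact barring overflow/underflow. */
double round_to_power_of_two(double s)
{
    int e;
    (void)frexp(s, &e);        /* s = m * 2^e with 0.5 <= m < 1 */
    return ldexp(1.0, e - 1);  /* 2^(e-1) <= s < 2^e */
}

/* Replaces A by diag(r)*A*diag(c) (the equed = 'B' case), where A is a
 * dense n-by-n column-major matrix with leading dimension lda. */
void equilibrate_both(int n, double *a, int lda, const double *r,
                      const double *c)
{
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i)
            a[i + (size_t)j * lda] *= r[i] * c[j];
}

/* Maps the solution y of the equilibrated system back to the solution
 * x = diag(c)*y of the original system. */
void unscale_solution(int n, const double *c, const double *y, double *x)
{
    for (int j = 0; j < n; ++j)
        x[j] = c[j] * y[j];
}
```

With A = diag(2, 8), r = {0.5, 0.25}, and c = {1, 0.5}, the equilibrated matrix is the identity, so y = diag(r)*b; for b = {2, 8} that gives y = {1, 2} and x = diag(c)*y = {1, 1}, which indeed satisfies A*x = b.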

Input Parameters

matrix_layout Specifies whether two-dimensional array storage is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

trans Must be 'N', 'T', or 'C'.

Specifies the form of the system of equations:

If trans = 'N', the system has the form A*X = B (No transpose).

If trans = 'T', the system has the form A^T*X = B (Transpose).

If trans = 'C', the system has the form A^H*X = B (Conjugate transpose
for complex flavors, Transpose for real flavors).

equed Must be 'N', 'R', 'C', or 'B'.

Specifies the form of equilibration that was done to A before calling this
routine.
If equed = 'N', no equilibration was done.

If equed = 'R', row equilibration was done, that is, A has been
premultiplied by diag(r).
If equed = 'C', column equilibration was done, that is, A has been
postmultiplied by diag(c).
If equed = 'B', both row and column equilibration were done, that is, A has
been replaced by diag(r)*A*diag(c). The right-hand side B has been
changed accordingly.

n The number of linear equations; the order of the matrix A; n≥ 0.

kl The number of subdiagonals within the band of A; kl≥ 0.

ku The number of superdiagonals within the band of A; ku≥ 0.

nrhs The number of right-hand sides; the number of columns of the matrices B
and X; nrhs≥ 0.

ab, afb, b The array ab of size max(1, ldab*n) contains the original matrix A in band
storage, in rows from 1 to kl+ku+1 for column major layout, and in
columns from 1 to kl+ku+1 for row major layout.

The array afb of size max(1, ldafb*n) contains details of the LU
factorization of the banded matrix A as computed by ?gbtrf.


The array b of size max(1, ldb*nrhs) for column major layout and max(1,
ldb*n) for row major layout contains the matrix B whose columns are the
right-hand sides for the systems of equations.

ldab The leading dimension of the array ab; ldab≥kl+ku+1.

ldafb The leading dimension of the array afb; ldafb≥ 2*kl+ku+1.

ipiv Array, size at least max(1, n). Contains the pivot indices as computed
by ?gbtrf; for row 1 ≤ i ≤ n, row i of the matrix was interchanged with row
ipiv[i-1].

r, c Arrays of size n. The array r contains the row scale factors for A, and
the array c contains the column scale factors for A.

If equed = 'R' or 'B', A is multiplied on the left by diag(r); if equed =
'N' or 'C', r is not accessed.
If equed = 'R' or 'B', each element of r must be positive.

If equed = 'C' or 'B', A is multiplied on the right by diag(c); if equed =
'N' or 'R', c is not accessed.
If equed = 'C' or 'B', each element of c must be positive.

Each element of r or c should be a power of the radix to ensure a reliable
solution and error estimates. Scaling by powers of the radix does not cause
rounding errors unless the result underflows or overflows. Rounding errors
during scaling lead to refining with a matrix that is not equivalent to the
input matrix, producing error estimates that may not be reliable.

ldb The leading dimension of the array b; ldb≥ max(1, n) for column major
layout and ldb≥nrhs for row major layout.

x Array, size max(1, ldx*nrhs) for column major layout and max(1, ldx*n)
for row major layout.
The solution matrix X as computed by sgbtrs/dgbtrs for real flavors or
cgbtrs/zgbtrs for complex flavors.

ldx The leading dimension of the output array x; ldx≥ max(1, n) for column
major layout and ldx≥nrhs for row major layout.

n_err_bnds Number of error bounds to return for each right-hand side and each type
(normwise or componentwise). See the err_bnds_norm and err_bnds_comp
descriptions in the Output Parameters section below.

nparams Specifies the number of parameters set in params. If ≤ 0, the params array
is never referenced and default values are used.

params Array, size nparams. Specifies algorithm parameters. If an entry is less than
0.0, that entry will be filled with the default value used for that parameter.
Only positions up to nparams are accessed; defaults are used for higher-
numbered parameters. If defaults are acceptable, you can pass nparams =
0, which prevents the source code from accessing the params argument.

params[0] : Whether to perform iterative refinement or not. Default: 1.0
(for single precision flavors), 1.0D+0 (for double precision flavors).

=0.0 No refinement is performed and no error bounds
are computed.

=1.0 Use the double-precision refinement algorithm,
possibly with doubled-single computations if the
compilation environment does not support
double precision.

(Other values are reserved for future use.)

params[1] : Maximum number of residual computations allowed for
refinement.

Default 10.0

Aggressive Set to 100.0 to permit convergence using
approximate factorizations or factorizations
other than LU. If the factorization uses a
technique other than Gaussian elimination, the
guarantees in err_bnds_norm and
err_bnds_comp may no longer be trustworthy.

params[2] : Flag determining if the code will attempt to find a solution
with a small componentwise relative error in the double-precision algorithm.
Positive is true, 0.0 is false. Default: 1.0 (attempt componentwise
convergence).
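The rules for params can be summarized in code. The sketch below reproduces the documented substitution of defaults for negative entries among the first nparams positions, using the defaults listed above (1.0, 10.0, 1.0). The enum names and the helper are hypothetical; the routine performs this substitution itself, so a caller would only use such code to inspect the effective settings.

```c
/* Hypothetical indices into params, matching the descriptions above. */
enum { PARAM_REFINE = 0, PARAM_MAX_RESIDUALS = 1, PARAM_COMPONENTWISE = 2 };

/* Emulates the documented rule: any of the first min(nparams, 3) entries
 * that is less than 0.0 is replaced by its default value. */
void fill_rfsx_param_defaults(double *params, int nparams)
{
    static const double defaults[3] = { 1.0, 10.0, 1.0 };
    for (int k = 0; k < nparams && k < 3; ++k)
        if (params[k] < 0.0)
            params[k] = defaults[k];
}
```

For example, params = { 1.0, 100.0, 0.0 } with nparams = 3 requests refinement with the aggressive limit of 100 residual computations and normwise convergence only, in which case err_bnds_comp is not accessed.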

Output Parameters

x The improved solution matrix X.

rcond Reciprocal scaled condition number. An estimate of the reciprocal Skeel
condition number of the matrix A after equilibration (if done). If rcond is
less than the machine precision, in particular, if rcond = 0, the matrix is
singular to working precision. Note that the error may still be small even if
this number is very small and the matrix appears ill-conditioned.

berr Array, size at least max(1, nrhs). Contains the componentwise relative
backward error for each solution vector xj, that is, the smallest relative
change in any element of A or B that makes xj an exact solution.

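err_bnds_norm Array of size nrhs*n_err_bnds. For each right-hand side, contains
information about various error bounds and condition numbers
corresponding to the normwise relative error, which is defined as follows:
Normwise relative error in the i-th solution vector:

$$\frac{\max_j |x^{\mathrm{true}}_{ji} - x_{ji}|}{\max_j |x_{ji}|}$$

where x_{ji} is the j-th component of the i-th computed solution vector and x^{true}_{ji} is the corresponding component of the exact solution.

The array is indexed by the type of error information as described below.
There are currently up to three pieces of information returned.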


err=1 "Trust/don't trust" boolean. Trust the answer if
the reciprocal condition number is greater than the
threshold sqrt(n)*slamch(ε) for single
precision flavors and sqrt(n)*dlamch(ε) for
double precision flavors.

err=2 "Guaranteed" error bound. The estimated
forward error, almost certainly within a factor of
10 of the true error so long as the next entry is
greater than the threshold sqrt(n)*slamch(ε)
for single precision flavors and
sqrt(n)*dlamch(ε) for double precision
flavors. This error bound should only be trusted
if the previous boolean is true.

err=3 Reciprocal condition number. Estimated
normwise reciprocal condition number.
Compared with the threshold
sqrt(n)*slamch(ε) for single precision flavors
and sqrt(n)*dlamch(ε) for double precision
flavors to determine if the error estimate is
"guaranteed". These reciprocal condition
numbers, for some appropriately scaled matrix Z,
are:

$$\frac{1}{\|Z^{-1}\|_\infty \, \|Z\|_\infty}$$

Let z = s*a, where s scales each row by a power
of the radix so all absolute row sums of z are
approximately 1.

The information for right-hand side i, where 1 ≤ i ≤ nrhs, and type of error
err is stored in:

• Column major layout: err_bnds_norm[(err - 1)*nrhs + i - 1].
• Row major layout: err_bnds_norm[err - 1 + (i - 1)*n_err_bnds].

err_bnds_comp Array of size nrhs*n_err_bnds. For each right-hand side, contains
information about various error bounds and condition numbers
corresponding to the componentwise relative error, which is defined as
follows:
Componentwise relative error in the i-th solution vector:

$$\max_j \frac{|x^{\mathrm{true}}_{ji} - x_{ji}|}{|x_{ji}|}$$

where x_{ji} is the j-th component of the i-th computed solution vector and x^{true}_{ji} is the corresponding component of the exact solution.

The array is indexed by the type of error information as described below.
There are currently up to three pieces of information returned for each
right-hand side. If componentwise accuracy is not requested (params[2] =
0.0), then err_bnds_comp is not accessed.

err=1 "Trust/don't trust" boolean. Trust the answer if
the reciprocal condition number is greater than the
threshold sqrt(n)*slamch(ε) for single
precision flavors and sqrt(n)*dlamch(ε) for
double precision flavors.

err=2 "Guaranteed" error bound. The estimated
forward error, almost certainly within a factor of
10 of the true error so long as the next entry is
greater than the threshold sqrt(n)*slamch(ε)
for single precision flavors and
sqrt(n)*dlamch(ε) for double precision
flavors. This error bound should only be trusted
if the previous boolean is true.

err=3 Reciprocal condition number. Estimated
componentwise reciprocal condition number.
Compared with the threshold
sqrt(n)*slamch(ε) for single precision flavors
and sqrt(n)*dlamch(ε) for double precision
flavors to determine if the error estimate is
"guaranteed". These reciprocal condition
numbers, for some appropriately scaled matrix Z,
are:

$$\frac{1}{\|Z^{-1}\|_\infty \, \|Z\|_\infty}$$

Let z = s*(a*diag(x)), where x is the solution
for the current right-hand side and s scales each
row of a*diag(x) by a power of the radix so all
absolute row sums of z are approximately 1.

The information for right-hand side i, where 1 ≤ i ≤ nrhs, and type of error
err is stored in:

• Column major layout: err_bnds_comp[(err - 1)*nrhs + i - 1].
• Row major layout: err_bnds_comp[err - 1 + (i - 1)*n_err_bnds].

params Output parameter only if the input contains erroneous values, namely, in
params[0], params[1], and params[2]. In such a case, the corresponding
elements of params are filled with default values on output.

Return Values
This function returns a value info.

If info = 0, the execution is successful. The solution to every right-hand side is guaranteed.

If info = -i, parameter i had an illegal value.

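If 0 < info ≤ n: U(info,info) is exactly zero. The factorization has been completed, but the factor U is exactly
singular, so the solution and error bounds could not be computed; rcond = 0 is returned.

If info = n+j: The solution corresponding to the j-th right-hand side is not guaranteed. The solutions
corresponding to other right-hand sides k with k > j may not be guaranteed as well, but only the first such
right-hand side is reported. If a small componentwise error is not requested (params[2] = 0.0), then the j-th
right-hand side is the first with a normwise error bound that is not guaranteed (the smallest j such that
err_bnds_norm[j - 1] = 0.0 for column major layout, or err_bnds_norm[(j - 1)*n_err_bnds] = 0.0 for row
major layout). Otherwise (params[2] positive, the default), the j-th right-hand side is the first with either
error bound not guaranteed (the smallest j such that, for column major layout, err_bnds_norm[j - 1] = 0.0 or
err_bnds_comp[j - 1] = 0.0; or, for row major layout, err_bnds_norm[(j - 1)*n_err_bnds] = 0.0 or
err_bnds_comp[(j - 1)*n_err_bnds] = 0.0).
See the definition of err_bnds_norm and err_bnds_comp for err = 1. To get information about all of the
right-hand sides, check err_bnds_norm or err_bnds_comp.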

See Also
Matrix Storage Schemes

?gtrfs
Refines the solution of a system of linear equations
with a tridiagonal coefficient matrix and estimates its
error.

Syntax
lapack_int LAPACKE_sgtrfs( int matrix_layout, char trans, lapack_int n, lapack_int
nrhs, const float* dl, const float* d, const float* du, const float* dlf, const float*
df, const float* duf, const float* du2, const lapack_int* ipiv, const float* b,
lapack_int ldb, float* x, lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_dgtrfs( int matrix_layout, char trans, lapack_int n, lapack_int
nrhs, const double* dl, const double* d, const double* du, const double* dlf, const
double* df, const double* duf, const double* du2, const lapack_int* ipiv, const double*
b, lapack_int ldb, double* x, lapack_int ldx, double* ferr, double* berr );
lapack_int LAPACKE_cgtrfs( int matrix_layout, char trans, lapack_int n, lapack_int
nrhs, const lapack_complex_float* dl, const lapack_complex_float* d, const
lapack_complex_float* du, const lapack_complex_float* dlf, const lapack_complex_float*
df, const lapack_complex_float* duf, const lapack_complex_float* du2, const lapack_int*
ipiv, const lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x,
lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_zgtrfs( int matrix_layout, char trans, lapack_int n, lapack_int
nrhs, const lapack_complex_double* dl, const lapack_complex_double* d, const
lapack_complex_double* du, const lapack_complex_double* dlf, const
lapack_complex_double* df, const lapack_complex_double* duf, const
lapack_complex_double* du2, const lapack_int* ipiv, const lapack_complex_double* b,
lapack_int ldb, lapack_complex_double* x, lapack_int ldx, double* ferr, double* berr );

Include Files
• mkl.h

Description
The routine performs an iterative refinement of the solution to a system of linear equations A*X = B,
A^T*X = B, or A^H*X = B with a tridiagonal matrix A, with multiple right-hand sides. For each computed
solution vector x, the routine computes the component-wise backward error β. This error is the smallest
relative perturbation in elements of A and b such that x is the exact solution of the perturbed system:

$$|\delta a_{ij}| \le \beta |a_{ij}|, \quad |\delta b_i| \le \beta |b_i| \quad \text{such that} \quad (A + \delta A)x = b + \delta b.$$

Finally, the routine estimates the component-wise forward error in the computed solution,
$\|x - x_e\|_\infty / \|x\|_\infty$ (here $x_e$ is the exact solution).
Before calling this routine:

• call the factorization routine ?gttrf

• call the solver routine ?gttrs.
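Each refinement step computes the residual r = b - A*x and then solves A*d = r with the existing ?gttrf factorization; the backward error β above is max_i |r_i| / (|A|*|x| + |b|)_i. With A stored as dl, d, du, both quantities need only O(n) work. A minimal sketch for one real right-hand side; the function name is hypothetical and this is not the routine's internal code.

```c
#include <math.h>

/* Hypothetical sketch: residual r = b - A*x and componentwise backward error
 * for a tridiagonal A given by dl (n-1 subdiagonal entries, dl[k] = a(k+1,k)),
 * d (n diagonal entries), and du (n-1 superdiagonal entries, du[k] = a(k,k+1)). */
double tridiag_backward_error(int n, const double *dl, const double *d,
                              const double *du, const double *x,
                              const double *b, double *r)
{
    double beta = 0.0;
    for (int i = 0; i < n; ++i) {
        double ax = d[i] * x[i];                 /* (A*x)_i */
        double absax = fabs(d[i]) * fabs(x[i]);  /* (|A|*|x|)_i */
        if (i > 0) {
            ax += dl[i - 1] * x[i - 1];
            absax += fabs(dl[i - 1]) * fabs(x[i - 1]);
        }
        if (i < n - 1) {
            ax += du[i] * x[i + 1];
            absax += fabs(du[i]) * fabs(x[i + 1]);
        }
        r[i] = b[i] - ax;
        double denom = absax + fabs(b[i]);
        /* denom == 0 forces r[i] == 0, so such rows are simply skipped. */
        if (denom > 0.0 && fabs(r[i]) / denom > beta)
            beta = fabs(r[i]) / denom;
    }
    return beta;
}
```

For dl = du = {1, 1}, d = {4, 4, 4}, x = {1, 1, 1}, and b = {5, 6, 6}, only the last equation is off (r = {0, 0, 1}), and β = 1/(1 + 4 + 6) = 1/11.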

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

trans Must be 'N' or 'T' or 'C'.

Indicates the form of the equations:

If trans = 'N', the system has the form A*X = B.

If trans = 'T', the system has the form A^T*X = B.

If trans = 'C', the system has the form A^H*X = B.

n The order of the matrix A; n≥ 0.

nrhs The number of right-hand sides, that is, the number of columns of the
matrix B; nrhs≥ 0.

dl Array dl of size n - 1 contains the subdiagonal elements of A.

d Array d of size n contains the diagonal elements of A.

du Array du of size n - 1 contains the superdiagonal elements of A.

dlf Array dlf of size n - 1 contains the (n - 1) multipliers that define the
matrix L from the LU factorization of A as computed by ?gttrf.

df Array df of size n contains the n diagonal elements of the upper
triangular matrix U from the LU factorization of A.

duf Array duf of size n - 1 contains the (n - 1) elements of the first
superdiagonal of U.

du2 Array du2 of size n - 2 contains the (n - 2) elements of the second
superdiagonal of U.

b Array b (size max(1, ldb*nrhs) for column major layout and max(1,
ldb*n) for row major layout) contains the right-hand side matrix B.

x Array x (size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout) contains the solution matrix X, as computed by ?gttrs.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ldx The leading dimension of x; ldx≥ max(1, n) for column major layout
and ldx≥nrhs for row major layout.

ipiv Array, size at least max(1, n). The ipiv array, as returned by ?gttrf.

Output Parameters

x The refined solution matrix X.

ferr, berr Arrays, size at least max(1, nrhs). Contain the component-wise
forward and backward errors, respectively, for each solution vector.


Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

See Also
Matrix Storage Schemes

?porfs
Refines the solution of a system of linear equations
with a symmetric (Hermitian) positive-definite
coefficient matrix and estimates its error.

Syntax
lapack_int LAPACKE_sporfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const float* a, lapack_int lda, const float* af, lapack_int ldaf, const float* b,
lapack_int ldb, float* x, lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_dporfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const double* a, lapack_int lda, const double* af, lapack_int ldaf, const double* b,
lapack_int ldb, double* x, lapack_int ldx, double* ferr, double* berr );
lapack_int LAPACKE_cporfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const lapack_complex_float* a, lapack_int lda, const lapack_complex_float* af,
lapack_int ldaf, const lapack_complex_float* b, lapack_int ldb, lapack_complex_float*
x, lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_zporfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const lapack_complex_double* a, lapack_int lda, const lapack_complex_double* af,
lapack_int ldaf, const lapack_complex_double* b, lapack_int ldb, lapack_complex_double*
x, lapack_int ldx, double* ferr, double* berr );

Include Files
• mkl.h

Description

The routine performs an iterative refinement of the solution to a system of linear equations A*X = B with a
symmetric (Hermitian) positive definite matrix A, with multiple right-hand sides. For each computed solution
vector x, the routine computes the component-wise backward error β. This error is the smallest relative
perturbation in elements of A and b such that x is the exact solution of the perturbed system:

$$|\delta a_{ij}| \le \beta |a_{ij}|, \quad |\delta b_i| \le \beta |b_i| \quad \text{such that} \quad (A + \delta A)x = b + \delta b.$$

Finally, the routine estimates the component-wise forward error in the computed solution,
$\|x - x_e\|_\infty / \|x\|_\infty$ (here $x_e$ is the exact solution).
Before calling this routine:

• call the factorization routine ?potrf

• call the solver routine ?potrs.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

n The order of the matrix A; n≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

a Array a (size max(1, lda*n)) contains the original matrix A, as
supplied to ?potrf.

af Array af (size max(1, ldaf*n)) contains the factored matrix A, as
returned by ?potrf.

b Array b of size max(1, ldb*nrhs) for column major layout and max(1,
ldb*n) for row major layout contains the right-hand side matrix B.

x Array x of size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout contains the solution matrix X.

lda The leading dimension of a; lda≥ max(1, n).

ldaf The leading dimension of af; ldaf≥ max(1, n).

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ldx The leading dimension of x; ldx≥ max(1, n) for column major layout
and ldx≥nrhs for row major layout.
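Because only one triangle of a is stored, computing the residual b - A*x during refinement has to read the missing triangle by symmetry. A minimal sketch of the matrix-vector product for real symmetric matrices in column major layout (for Hermitian matrices the mirrored entries would also have to be conjugated); the function name is hypothetical.

```c
#include <stddef.h>

/* Hypothetical sketch: y = A*x for a real symmetric n-by-n matrix A of which
 * only the triangle selected by uplo ('U'; anything else is treated as 'L')
 * is read. A is column major with leading dimension lda; entries in the other
 * triangle are never accessed. */
void symv_from_triangle(char uplo, int n, const double *a, int lda,
                        const double *x, double *y)
{
    for (int i = 0; i < n; ++i)
        y[i] = 0.0;
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < n; ++i) {
            /* Is (i,j) itself inside the stored triangle? */
            int stored = (uplo == 'U') ? (i <= j) : (i >= j);
            /* If not, read the mirrored entry (j,i), since a_ij = a_ji. */
            double aij = stored ? a[i + (size_t)j * lda] : a[j + (size_t)i * lda];
            y[i] += aij * x[j];
        }
    }
}
```

For A = [[4, 1], [1, 3]] and x = {1, 2}, both the 'U' and the 'L' storage give y = {6, 7}, whatever value sits in the unreferenced triangle.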

Output Parameters

x The refined solution matrix X.

ferr, berr Arrays, size at least max(1, nrhs). Contain the component-wise
forward and backward errors, respectively, for each solution vector.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The bounds returned in ferr are not rigorous, but in practice they almost always overestimate the actual
error.


For each right-hand side, computation of the backward error involves a minimum of 4n^2 floating-point
operations (for real flavors) or 16n^2 operations (for complex flavors). In addition, each step of iterative
refinement involves 6n^2 operations (for real flavors) or 24n^2 operations (for complex flavors); the number of
iterations may range from 1 to 5. Estimating the forward error involves solving a number of systems of linear
equations A*x = b; the number is usually 4 or 5 and never more than 11. Each solution requires
approximately 2n^2 floating-point operations for real flavors or 8n^2 for complex flavors.
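The operation-count estimates for ?porfs can be combined into a rough per-right-hand-side total. A minimal sketch for real flavors (multiply by 4 for complex flavors, matching the ratios 16/4, 24/6, and 8/2 in the estimates); the function name is hypothetical, and the result is only as approximate as the estimates it comes from.

```c
/* Hypothetical helper: approximate operation count for one right-hand side
 * of a real-flavor ?porfs call:
 *   4n^2 (backward error) + 6n^2 per refinement step + 2n^2 per solve. */
double porfs_flops_real(double n, int refinement_steps, int forward_solves)
{
    double n2 = n * n;
    return 4.0 * n2 + 6.0 * n2 * refinement_steps + 2.0 * n2 * forward_solves;
}
```

For n = 100 with 2 refinement steps and 5 forward-error solves this gives 260,000 operations; the worst case of 5 steps and 11 solves is 56n^2.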
See Also
Matrix Storage Schemes

?porfsx
Uses extra precise iterative refinement to improve the
solution to the system of linear equations with a
symmetric/Hermitian positive-definite coefficient
matrix A and provides error bounds and backward
error estimates.

Syntax
lapack_int LAPACKE_sporfsx( int matrix_layout, char uplo, char equed, lapack_int n,
lapack_int nrhs, const float* a, lapack_int lda, const float* af, lapack_int ldaf,
const float* s, const float* b, lapack_int ldb, float* x, lapack_int ldx, float* rcond,
float* berr, lapack_int n_err_bnds, float* err_bnds_norm, float* err_bnds_comp,
lapack_int nparams, float* params );
lapack_int LAPACKE_dporfsx( int matrix_layout, char uplo, char equed, lapack_int n,
lapack_int nrhs, const double* a, lapack_int lda, const double* af, lapack_int ldaf,
const double* s, const double* b, lapack_int ldb, double* x, lapack_int ldx, double*
rcond, double* berr, lapack_int n_err_bnds, double* err_bnds_norm, double*
err_bnds_comp, lapack_int nparams, double* params );
lapack_int LAPACKE_cporfsx( int matrix_layout, char uplo, char equed, lapack_int n,
lapack_int nrhs, const lapack_complex_float* a, lapack_int lda, const
lapack_complex_float* af, lapack_int ldaf, const float* s, const lapack_complex_float*
b, lapack_int ldb, lapack_complex_float* x, lapack_int ldx, float* rcond, float* berr,
lapack_int n_err_bnds, float* err_bnds_norm, float* err_bnds_comp, lapack_int nparams,
float* params );
lapack_int LAPACKE_zporfsx( int matrix_layout, char uplo, char equed, lapack_int n,
lapack_int nrhs, const lapack_complex_double* a, lapack_int lda, const
lapack_complex_double* af, lapack_int ldaf, const double* s, const
lapack_complex_double* b, lapack_int ldb, lapack_complex_double* x, lapack_int ldx,
double* rcond, double* berr, lapack_int n_err_bnds, double* err_bnds_norm, double*
err_bnds_comp, lapack_int nparams, double* params );

Include Files
• mkl.h

Description
The routine improves the computed solution to a system of linear equations and provides error bounds and
backward error estimates for the solution. In addition to a normwise error bound, the code provides a
maximum componentwise error bound, if possible. See comments for err_bnds_norm and err_bnds_comp
for details of the error bounds.
The original system of linear equations may have been equilibrated before calling this routine, as described
by the parameters equed and s below. In this case, the solution and error bounds returned are for the
original unequilibrated system.

Input Parameters

matrix_layout Specifies whether two-dimensional array storage is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored:

If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

equed Must be 'N' or 'Y'.

Specifies the form of equilibration that was done to A before calling this
routine.
If equed = 'N', no equilibration was done.

If equed = 'Y', both row and column equilibration were done, that is, A has
been replaced by diag(s)*A*diag(s). The right-hand side B has been
changed accordingly.

n The number of linear equations; the order of the matrix A; n≥ 0.

nrhs The number of right-hand sides; the number of columns of the matrices B
and X; nrhs≥ 0.

a The array a (size max(1, lda*n)) contains the symmetric/Hermitian matrix
A as specified by uplo. If uplo = 'U', the leading n-by-n upper triangular
part of a contains the upper triangular part of the matrix A and the strictly
lower triangular part of a is not referenced. If uplo = 'L', the leading
n-by-n lower triangular part of a contains the lower triangular part of the
matrix A and the strictly upper triangular part of a is not referenced.

af The array af (size max(1, ldaf*n)) contains the triangular factor L or U
from the Cholesky factorization A = U^T*U or A = L*L^T (A = U^H*U or
A = L*L^H for complex flavors) as computed by ?potrf.

b The array b (size max(1, ldb*nrhs) for column major layout and max(1,
ldb*n) for row major layout) contains the matrix B whose columns are the
right-hand sides for the systems of equations.

lda The leading dimension of a; lda≥ max(1, n).

ldaf The leading dimension of af; ldaf≥ max(1, n).

s Array of size n. The array s contains the scale factors for A.

If equed = 'N', s is not accessed.

If equed = 'Y', each element of s must be positive.

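Each element of s should be a power of the radix to ensure a reliable
solution and error estimates. Scaling by powers of the radix does not cause
rounding errors unless the result underflows or overflows.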

ldb The leading dimension of the array b; ldb≥ max(1, n) for column major
layout and ldb≥nrhs for row major layout.

x Array, size max(1, ldx*nrhs) for column major layout and max(1, ldx*n)
for row major layout.
On entry, the solution matrix X as computed by ?potrs.

ldx The leading dimension of the output array x; ldx≥ max(1, n) for column
major layout and ldx≥nrhs for row major layout.

n_err_bnds Number of error bounds to return for each right-hand side and each type
(normwise or componentwise). See the err_bnds_norm and err_bnds_comp
descriptions in the Output Parameters section below.

nparams Specifies the number of parameters set in params. If nparams ≤ 0, the params
array is never referenced and default values are used.

params[0] : Whether to perform iterative refinement or not. Default: 1.0
(for single precision flavors), 1.0D+0 (for double precision flavors).

=0.0 No refinement is performed and no error bounds are computed.

=1.0 Use the double-precision refinement algorithm, possibly with
doubled-single computations if the compilation environment does not
support double precision.

(Other values are reserved for future use.)

params[1] : Maximum number of residual computations allowed for refinement.

Default: 10.0

Aggressive: Set to 100.0 to permit convergence using approximate
factorizations or factorizations other than LU. If the factorization uses
a technique other than Gaussian elimination, the guarantees in
err_bnds_norm and err_bnds_comp may no longer be trustworthy.

params[2] : Flag determining if the code will attempt to find a solution with
a small componentwise relative error in the double-precision algorithm.
Positive is true, 0.0 is false. Default: 1.0 (attempt componentwise
convergence).
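The recommendation in the description of s that each scale factor be a power of the radix can be met by rounding computed scale factors before equilibrating A. The following C sketch is illustrative only: it assumes a binary radix (FLT_RADIX == 2, as on IEEE 754 hardware), and round_scale_to_radix_power and scale_vector are hypothetical helper names, not MKL routines.

```c
#include <math.h>

/* Hypothetical helper (not an MKL routine): round a positive scale factor
 * to the nearest power of two. Assumes FLT_RADIX == 2. */
double round_scale_to_radix_power(double s)
{
    int e;
    double m = frexp(s, &e);  /* s = m * 2^e with 0.5 <= m < 1 */
    /* s lies in [2^(e-1), 2^e); the midpoint of that interval is 0.75 * 2^e. */
    return (m < 0.75) ? ldexp(1.0, e - 1) : ldexp(1.0, e);
}

/* Hypothetical helper: multiply v[i] by s[i] in place. When every s[i] is a
 * power of two, each product only adjusts the exponent, so it is exact
 * unless it overflows or underflows. */
void scale_vector(int n, const double *s, double *v)
{
    for (int i = 0; i < n; ++i)
        v[i] *= s[i];
}
```

For example, a computed factor of 0.3 would be replaced by 0.25; undoing that scaling later recovers the original entries bit for bit.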

Output Parameters

x The improved solution matrix X.

rcond Reciprocal scaled condition number. An estimate of the reciprocal Skeel
condition number of the matrix A after equilibration (if done). If rcond is
less than the machine precision, in particular, if rcond = 0, the matrix is
singular to working precision. Note that the error may still be small even if
this number is very small and the matrix appears ill-conditioned.

err_bnds_norm Array of size nrhs*n_err_bnds. For each right-hand side, contains
information about various error bounds and condition numbers
corresponding to the normwise relative error, which is defined as follows.
Normwise relative error in the i-th solution vector, where x_true denotes the
true solution:

$$\frac{\max_j |x_{\mathrm{true}}(j,i) - x(j,i)|}{\max_j |x(j,i)|}$$

The array is indexed by the type of error information as described below.

There are currently up to three pieces of information returned.

err=1 "Trust/don't trust" boolean. Trust the answer if the reciprocal
condition number is at least the threshold sqrt(n)*slamch(ε) for single
precision flavors and sqrt(n)*dlamch(ε) for double precision flavors.

err=2 "Guaranteed" error bound. The estimated forward error, almost
certainly within a factor of 10 of the true error so long as the next
entry is greater than the threshold sqrt(n)*slamch(ε) for single
precision flavors and sqrt(n)*dlamch(ε) for double precision flavors.
This error bound should only be trusted if the previous boolean is true.

err=3 Reciprocal condition number. Estimated normwise reciprocal
condition number. Compared with the threshold sqrt(n)*slamch(ε) for
single precision flavors and sqrt(n)*dlamch(ε) for double precision
flavors to determine if the error estimate is "guaranteed". These
reciprocal condition numbers are

$$\frac{1}{\|Z^{-1}\|_\infty \, \|Z\|_\infty}$$

for some appropriately scaled matrix Z. Let Z = S*A, where S scales each
row by a power of the radix so all absolute row sums of Z are
approximately 1.

The information for right-hand side i, where 1 ≤ i ≤ nrhs, and type of error
err is stored in:

• Column major layout: err_bnds_norm[(err - 1)*nrhs + i - 1].

• Row major layout: err_bnds_norm[err - 1 + (i - 1)*n_err_bnds].

err_bnds_comp Array of size nrhs*n_err_bnds. For each right-hand side, contains
information about various error bounds and condition numbers
corresponding to the componentwise relative error, which is defined as
follows. Componentwise relative error in the i-th solution vector, where
x_true denotes the true solution:

$$\max_j \frac{|x_{\mathrm{true}}(j,i) - x(j,i)|}{|x(j,i)|}$$

The array is indexed by the type of error information as described below.

There are currently up to three pieces of information returned for each
right-hand side. If componentwise accuracy is not requested (params[2] =
0.0), then err_bnds_comp is not accessed.

err=1 "Trust/don't trust" boolean. Trust the answer if the reciprocal
condition number is at least the threshold sqrt(n)*slamch(ε) for single
precision flavors and sqrt(n)*dlamch(ε) for double precision flavors.

err=2 "Guaranteed" error bound. The estimated forward error, almost
certainly within a factor of 10 of the true error so long as the next
entry is greater than the threshold sqrt(n)*slamch(ε) for single
precision flavors and sqrt(n)*dlamch(ε) for double precision flavors.
This error bound should only be trusted if the previous boolean is true.

err=3 Reciprocal condition number. Estimated componentwise reciprocal
condition number. Compared with the threshold sqrt(n)*slamch(ε) for
single precision flavors and sqrt(n)*dlamch(ε) for double precision
flavors to determine if the error estimate is "guaranteed". These
reciprocal condition numbers are

$$\frac{1}{\|Z^{-1}\|_\infty \, \|Z\|_\infty}$$

for some appropriately scaled matrix Z. Let Z = S*(A*diag(x)), where x is
the solution for the current right-hand side and S scales each row of
A*diag(x) by a power of the radix so all absolute row sums of Z are
approximately 1.

The information for right-hand side i, where 1 ≤ i ≤ nrhs, and type of error
err is stored in:

• Column major layout: err_bnds_comp[(err - 1)*nrhs + i - 1].
• Row major layout: err_bnds_comp[err - 1 + (i - 1)*n_err_bnds].

params Output parameter only if the input contains erroneous values, namely in
params[0], params[1], or params[2]. In such a case, the corresponding
elements of params are filled with default values on output.
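To read a particular entry of err_bnds_norm or err_bnds_comp, the index formulas above can be wrapped in a small helper. The C sketch below also evaluates the normwise and componentwise relative-error definitions for one solution vector, which can be handy when testing against a known true solution. The enum and function names are hypothetical; the enum stands in for LAPACK_ROW_MAJOR and LAPACK_COL_MAJOR so that the sketch compiles without mkl.h.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical stand-in for LAPACK_COL_MAJOR / LAPACK_ROW_MAJOR. */
enum err_layout { ERR_COL_MAJOR, ERR_ROW_MAJOR };

/* 0-based position of the entry for right-hand side i (1 <= i <= nrhs)
 * and error type err (1 <= err <= n_err_bnds). */
size_t err_bnds_index(enum err_layout layout, int err, int i,
                      int nrhs, int n_err_bnds)
{
    if (layout == ERR_COL_MAJOR)
        return (size_t)(err - 1) * (size_t)nrhs + (size_t)(i - 1);
    return (size_t)(err - 1) + (size_t)(i - 1) * (size_t)n_err_bnds;
}

/* Normwise relative error of one solution vector x against xtrue:
 * max_j |xtrue[j] - x[j]| / max_j |x[j]|. Assumes x is not all zeros. */
double normwise_rel_err(int n, const double *x, const double *xtrue)
{
    double num = 0.0, den = 0.0;
    for (int j = 0; j < n; ++j) {
        double diff = fabs(xtrue[j] - x[j]);
        double mag = fabs(x[j]);
        if (diff > num) num = diff;
        if (mag > den) den = mag;
    }
    return num / den;
}

/* Componentwise relative error: max_j |xtrue[j] - x[j]| / |x[j]|.
 * Assumes every x[j] is nonzero. */
double componentwise_rel_err(int n, const double *x, const double *xtrue)
{
    double worst = 0.0;
    for (int j = 0; j < n; ++j) {
        double ratio = fabs(xtrue[j] - x[j]) / fabs(x[j]);
        if (ratio > worst) worst = ratio;
    }
    return worst;
}
```

For example, with nrhs = 4 and n_err_bnds = 3, the "guaranteed" bound (err = 2) for the third right-hand side is at offset 6 in column major layout and at offset 7 in row major layout.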

Return Values
This function returns a value info.

If info = 0, the execution is successful. The solution to every right-hand side is guaranteed.

If info = -i, parameter i had an illegal value.

If 0 < info ≤ n: U(info,info) is exactly zero. The factorization has been completed, but the factor U is exactly
singular, so the solution and error bounds could not be computed; rcond = 0 is returned.

See Also
Matrix Storage Schemes

?pprfs
Refines the solution of a system of linear equations
with a symmetric (Hermitian) positive-definite
coefficient matrix stored in a packed format and
estimates its error.

Syntax
lapack_int LAPACKE_spprfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const float* ap, const float* afp, const float* b, lapack_int ldb, float* x, lapack_int
ldx, float* ferr, float* berr );
lapack_int LAPACKE_dpprfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const double* ap, const double* afp, const double* b, lapack_int ldb, double* x,
lapack_int ldx, double* ferr, double* berr );
lapack_int LAPACKE_cpprfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const lapack_complex_float* ap, const lapack_complex_float* afp, const
lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x, lapack_int ldx,
float* ferr, float* berr );
lapack_int LAPACKE_zpprfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const lapack_complex_double* ap, const lapack_complex_double* afp, const
lapack_complex_double* b, lapack_int ldb, lapack_complex_double* x, lapack_int ldx,
double* ferr, double* berr );

Include Files
• mkl.h

Description
The routine performs an iterative refinement of the solution to a system of linear equations A*X = B with a
symmetric (Hermitian) positive definite matrix A, with multiple right-hand sides. For each computed solution
vector x, the routine computes the component-wise backward error β. This error is the smallest relative
perturbation in elements of A and b such that x is the exact solution of the perturbed system:

$$|\delta a_{ij}| \le \beta |a_{ij}|, \quad |\delta b_i| \le \beta |b_i| \quad \text{such that} \quad (A + \delta A)x = b + \delta b.$$

Finally, the routine estimates the component-wise forward error in the computed solution

$$\frac{\|x - x_e\|_\infty}{\|x\|_\infty}$$

where x_e is the exact solution.
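The component-wise backward error described above has the closed form (Oettli-Prager) β = max_i |b - A*x|_i / (|A|*|x| + |b|)_i. The following C sketch evaluates it for one right-hand side of a real matrix held in packed storage. It assumes column major layout with uplo = 'U', so that A(i,j) for i ≤ j is stored at ap[i + j*(j+1)/2] (0-based). The function names are hypothetical, and the sketch omits the safeguards against tiny denominators that a production implementation would need.

```c
#include <math.h>
#include <stddef.h>

/* Read A(i,j) (0-based) from upper packed, column major storage.
 * Symmetry lets every access go through the upper triangle. */
static double ap_upper_get(const double *ap, int i, int j)
{
    if (i > j) { int t = i; i = j; j = t; }
    return ap[(size_t)i + (size_t)j * (size_t)(j + 1) / 2];
}

/* Component-wise backward error of x for A*x = b:
 * beta = max_i |b - A*x|_i / (|A|*|x| + |b|)_i.
 * A row with a zero denominator also has a zero residual and is skipped. */
double packed_backward_error(int n, const double *ap,
                             const double *b, const double *x)
{
    double beta = 0.0;
    for (int i = 0; i < n; ++i) {
        double r = b[i];            /* accumulates (b - A*x)_i */
        double denom = fabs(b[i]);  /* accumulates (|A|*|x| + |b|)_i */
        for (int j = 0; j < n; ++j) {
            double aij = ap_upper_get(ap, i, j);
            r -= aij * x[j];
            denom += fabs(aij) * fabs(x[j]);
        }
        if (denom > 0.0 && fabs(r) / denom > beta)
            beta = fabs(r) / denom;
    }
    return beta;
}
```

For A = [[4, 1], [1, 3]] (ap = {4, 1, 3}) and b = (5, 4), the exact solution (1, 1) gives β = 0, while x = (1, 2) gives β = 3/11: relative perturbations of that size in A and b are needed to make x exact.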

Before calling this routine:

• call the factorization routine ?pptrf

• call the solver routine ?pptrs.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:

If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

n The order of the matrix A; n≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

ap ap contains the original matrix A in a packed format, as supplied
to ?pptrf. The dimension of ap must be at least max(1, n(n+1)/2).

afp afp contains the factored matrix A in a packed format, as returned
by ?pptrf. The dimension of afp must be at least max(1, n(n+1)/2).

b Array b of size max(1, ldb*nrhs) for column major layout and
max(1, ldb*n) for row major layout contains the right-hand side
matrix B.

x Array x of size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout contains the solution matrix X.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ldx The leading dimension of x; ldx≥ max(1, n) for column major layout
and ldx≥nrhs for row major layout.

Output Parameters

x The refined solution matrix X.

ferr, berr Arrays, size at least max(1, nrhs). Contain the component-wise
forward and backward errors, respectively, for each solution vector.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The bounds returned in ferr are not rigorous, but in practice they almost always overestimate the actual
error.
For each right-hand side, computation of the backward error involves a minimum of 4n^2 floating-point
operations (for real flavors) or 16n^2 operations (for complex flavors). In addition, each step of iterative
refinement involves 6n^2 operations (for real flavors) or 24n^2 operations (for complex flavors); the number of
iterations may range from 1 to 5.
Estimating the forward error involves solving a number of systems of linear equations A*x = b; the number
of systems is usually 4 or 5 and never more than 11. Each solution requires approximately 2n^2 floating-point
operations for real flavors or 8n^2 for complex flavors.
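The operation counts quoted above can be combined into a rough per-right-hand-side estimate. The C sketch below simply adds the stated terms; the number of refinement steps and of forward-error solves are passed in because the routine determines them at run time (1 to 5 steps, usually 4 or 5 solves and at most 11). The function name is hypothetical, and the result is an order-of-magnitude guide, not a measured count.

```c
/* Hypothetical estimate of floating-point operations for one right-hand
 * side of ?pprfs, built from the per-step figures quoted above. */
double pprfs_flop_estimate(int n, int is_complex, int refine_steps,
                           int forward_solves)
{
    double n2 = (double)n * (double)n;
    double backward = is_complex ? 16.0 : 4.0;  /* backward error, times n^2 */
    double step     = is_complex ? 24.0 : 6.0;  /* per refinement step, times n^2 */
    double solve    = is_complex ? 8.0  : 2.0;  /* per forward-error solve, times n^2 */
    return n2 * (backward + step * refine_steps + solve * forward_solves);
}
```

For n = 100, real data, one refinement step, and five forward-error solves, the estimate is 200,000 operations.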

See Also
Matrix Storage Schemes

?pbrfs
Refines the solution of a system of linear equations
with a band symmetric (Hermitian) positive-definite
coefficient matrix and estimates its error.

Syntax
lapack_int LAPACKE_spbrfs( int matrix_layout, char uplo, lapack_int n, lapack_int kd,
lapack_int nrhs, const float* ab, lapack_int ldab, const float* afb, lapack_int ldafb,
const float* b, lapack_int ldb, float* x, lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_dpbrfs( int matrix_layout, char uplo, lapack_int n, lapack_int kd,
lapack_int nrhs, const double* ab, lapack_int ldab, const double* afb, lapack_int
ldafb, const double* b, lapack_int ldb, double* x, lapack_int ldx, double* ferr, double*
berr );
lapack_int LAPACKE_cpbrfs( int matrix_layout, char uplo, lapack_int n, lapack_int kd,
lapack_int nrhs, const lapack_complex_float* ab, lapack_int ldab, const
lapack_complex_float* afb, lapack_int ldafb, const lapack_complex_float* b, lapack_int
ldb, lapack_complex_float* x, lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_zpbrfs( int matrix_layout, char uplo, lapack_int n, lapack_int kd,
lapack_int nrhs, const lapack_complex_double* ab, lapack_int ldab, const
lapack_complex_double* afb, lapack_int ldafb, const lapack_complex_double* b,
lapack_int ldb, lapack_complex_double* x, lapack_int ldx, double* ferr, double* berr );

Include Files
• mkl.h

Description

The routine performs an iterative refinement of the solution to a system of linear equations A*X = B with a
symmetric (Hermitian) positive definite band matrix A, with multiple right-hand sides. For each computed
solution vector x, the routine computes the component-wise backward error β. This error is the smallest
relative perturbation in elements of A and b such that x is the exact solution of the perturbed system:

$$|\delta a_{ij}| \le \beta |a_{ij}|, \quad |\delta b_i| \le \beta |b_i| \quad \text{such that} \quad (A + \delta A)x = b + \delta b.$$

Finally, the routine estimates the component-wise forward error in the computed solution

$$\frac{\|x - x_e\|_\infty}{\|x\|_\infty}$$

where x_e is the exact solution.
Before calling this routine:

• call the factorization routine ?pbtrf

• call the solver routine ?pbtrs.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:

If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

n The order of the matrix A; n≥ 0.

kd The number of superdiagonals or subdiagonals in the matrix A; kd≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

ab Array ab (size max(1, ldab*n)) contains the original band matrix A, as
supplied to ?pbtrf.

afb Array afb (size max(1, ldafb*n)) contains the factored band matrix A,
as returned by ?pbtrf.

b Array b of size max(1, ldb*nrhs) for column major layout and
max(1, ldb*n) for row major layout contains the right-hand side
matrix B.

x Array x of size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout contains the solution matrix X.

ldab The leading dimension of ab; ldab≥kd + 1.

ldafb The leading dimension of afb; ldafb≥kd + 1.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ldx The leading dimension of x; ldx≥ max(1, n) for column major layout
and ldx≥nrhs for row major layout.
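Each refinement step needs the residual b - A*x, and therefore products with the band matrix held in ab. To illustrate the band layout behind ldab ≥ kd + 1, the C sketch below assumes column major layout with uplo = 'U', where A(i,j) for max(0, j - kd) ≤ i ≤ j is stored at ab[kd + i - j + j*ldab] (0-based), and forms y = A*x for a real matrix. Row major layout is not covered, and both function names are hypothetical.

```c
#include <stddef.h>

/* Read A(i,j) (0-based) from upper band storage, column major.
 * Entries outside the band are zero; symmetry maps i > j to the upper part. */
static double band_upper_get(const double *ab, int ldab, int kd, int i, int j)
{
    if (i > j) { int t = i; i = j; j = t; }
    if (j - i > kd)
        return 0.0;
    return ab[(size_t)(kd + i - j) + (size_t)j * (size_t)ldab];
}

/* y = A*x for a symmetric band matrix with kd superdiagonals. */
void band_sym_matvec(int n, int kd, const double *ab, int ldab,
                     const double *x, double *y)
{
    for (int i = 0; i < n; ++i) {
        int lo = (i - kd < 0) ? 0 : i - kd;          /* first column in band */
        int hi = (i + kd > n - 1) ? n - 1 : i + kd;  /* last column in band */
        double sum = 0.0;
        for (int j = lo; j <= hi; ++j)
            sum += band_upper_get(ab, ldab, kd, i, j) * x[j];
        y[i] = sum;
    }
}
```

For the tridiagonal matrix with 4 on the diagonal and 1 off the diagonal (n = 3, kd = 1, ldab = 2), ab is {*, 4, 1, 4, 1, 4}, where * marks the unused first entry; multiplying by x = (1, 1, 1) gives y = (5, 6, 5).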

Output Parameters

x The refined solution matrix X.

ferr, berr Arrays, size at least max(1, nrhs). Contain the component-wise
forward and backward errors, respectively, for each solution vector.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The bounds returned in ferr are not rigorous, but in practice they almost always overestimate the actual
error.
For each right-hand side, computation of the backward error involves a minimum of 8n*kd floating-point
operations (for real flavors) or 32n*kd operations (for complex flavors). In addition, each step of iterative
refinement involves 12n*kd operations (for real flavors) or 48n*kd operations (for complex flavors); the
number of iterations may range from 1 to 5.
Estimating the forward error involves solving a number of systems of linear equations A*x = b; the number
is usually 4 or 5 and never more than 11. Each solution requires approximately 4n*kd floating-point
operations for real flavors or 16n*kd for complex flavors.

See Also
Matrix Storage Schemes

?ptrfs
Refines the solution of a system of linear equations
with a symmetric (Hermitian) positive-definite
tridiagonal coefficient matrix and estimates its error.

Syntax
lapack_int LAPACKE_sptrfs( int matrix_layout, lapack_int n, lapack_int nrhs, const
float* d, const float* e, const float* df, const float* ef, const float* b, lapack_int
ldb, float* x, lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_dptrfs( int matrix_layout, lapack_int n, lapack_int nrhs, const
double* d, const double* e, const double* df, const double* ef, const double* b,
lapack_int ldb, double* x, lapack_int ldx, double* ferr, double* berr );
lapack_int LAPACKE_cptrfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const float* d, const lapack_complex_float* e, const float* df, const
lapack_complex_float* ef, const lapack_complex_float* b, lapack_int ldb,
lapack_complex_float* x, lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_zptrfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const double* d, const lapack_complex_double* e, const double* df, const
lapack_complex_double* ef, const lapack_complex_double* b, lapack_int ldb,
lapack_complex_double* x, lapack_int ldx, double* ferr, double* berr );

Include Files
• mkl.h

Description

The routine performs an iterative refinement of the solution to a system of linear equations A*X = B with a
symmetric (Hermitian) positive definite tridiagonal matrix A, with multiple right-hand sides. For each
computed solution vector x, the routine computes the component-wise backward error β. This error is the
smallest relative perturbation in elements of A and b such that x is the exact solution of the perturbed
system:

$$|\delta a_{ij}| \le \beta |a_{ij}|, \quad |\delta b_i| \le \beta |b_i| \quad \text{such that} \quad (A + \delta A)x = b + \delta b.$$

Finally, the routine estimates the component-wise forward error in the computed solution

$$\frac{\|x - x_e\|_\infty}{\|x\|_\infty}$$

where x_e is the exact solution.
Before calling this routine:

• call the factorization routine ?pttrf

• call the solver routine ?pttrs.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Used for complex flavors only. Must be 'U' or 'L'.

Specifies whether the superdiagonal or the subdiagonal of the
tridiagonal matrix A is stored and how A is factored:

If uplo = 'U', the array e stores the superdiagonal of A, and A is
factored as U^H*D*U.

If uplo = 'L', the array e stores the subdiagonal of A, and A is
factored as L*D*L^H.

n The order of the matrix A; n≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

d The array d (size n) contains the n diagonal elements of the
tridiagonal matrix A.

df The array df (size n) contains the n diagonal elements of the diagonal
matrix D from the factorization of A as computed by ?pttrf.

e,ef,b,x The array e (size n - 1) contains the (n - 1) off-diagonal elements of
the tridiagonal matrix A (see uplo).
The array ef (size n - 1) contains the (n - 1) off-diagonal elements of
the unit bidiagonal factor U or L from the factorization computed
by ?pttrf (see uplo).

The array b of size max(1, ldb*nrhs) for column major layout and
max(1, ldb*n) for row major layout contains the matrix B whose
columns are the right-hand sides for the systems of equations.

The array x of size max(1, ldx*nrhs) for column major layout and
max(1, ldx*n) for row major layout contains the solution matrix X as
computed by ?pttrs.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ldx The leading dimension of x; ldx≥ max(1, n) for column major layout
and ldx≥nrhs for row major layout.
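For real flavors, df and ef are related to d and e by the factorization A = L*D*L^T with L unit lower bidiagonal, which reduces to a short recurrence. The C sketch below is a textbook version of that recurrence, for illustration only; it is not claimed to reproduce the exact operation order used inside ?pttrf, and ldlt_tridiag is a hypothetical name. It returns 0 on success, or k if the leading minor of order k is not positive.

```c
/* Hypothetical sketch: factor a real symmetric tridiagonal matrix with
 * diagonal d[0..n-1] and off-diagonal e[0..n-2] as L*D*L^T.
 * On success df holds diag(D) and ef holds the subdiagonal of L. */
int ldlt_tridiag(int n, const double *d, const double *e,
                 double *df, double *ef)
{
    if (n <= 0)
        return 0;
    df[0] = d[0];
    for (int i = 0; i < n - 1; ++i) {
        if (df[i] <= 0.0)
            return i + 1;                     /* minor of order i+1 not positive */
        ef[i] = e[i] / df[i];                 /* A(i+1,i) = L(i+1,i) * D(i) */
        df[i + 1] = d[i + 1] - ef[i] * e[i];  /* A(i+1,i+1) = L(i+1,i)^2 * D(i) + D(i+1) */
    }
    return (df[n - 1] <= 0.0) ? n : 0;
}
```

Multiplying the factors back together reproduces d and e up to rounding, which gives a quick consistency check on df and ef before they are passed to ?ptrfs.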

Output Parameters

x The refined solution matrix X.

ferr, berr Arrays, size at least max(1, nrhs). Contain the component-wise
forward and backward errors, respectively, for each solution vector.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

See Also
Matrix Storage Schemes

?syrfs
Refines the solution of a system of linear equations
with a symmetric coefficient matrix and estimates its
error.

Syntax
lapack_int LAPACKE_ssyrfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const float* a, lapack_int lda, const float* af, lapack_int ldaf, const lapack_int*
ipiv, const float* b, lapack_int ldb, float* x, lapack_int ldx, float* ferr, float*
berr );
lapack_int LAPACKE_dsyrfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const double* a, lapack_int lda, const double* af, lapack_int ldaf, const lapack_int*
ipiv, const double* b, lapack_int ldb, double* x, lapack_int ldx, double* ferr, double*
berr );
lapack_int LAPACKE_csyrfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const lapack_complex_float* a, lapack_int lda, const lapack_complex_float* af,
lapack_int ldaf, const lapack_int* ipiv, const lapack_complex_float* b, lapack_int ldb,
lapack_complex_float* x, lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_zsyrfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const lapack_complex_double* a, lapack_int lda, const lapack_complex_double* af,
lapack_int ldaf, const lapack_int* ipiv, const lapack_complex_double* b, lapack_int
ldb, lapack_complex_double* x, lapack_int ldx, double* ferr, double* berr );

Include Files
• mkl.h

Description

The routine performs an iterative refinement of the solution to a system of linear equations A*X = B with a
symmetric full-storage matrix A, with multiple right-hand sides. For each computed solution vector x, the
routine computes the component-wise backward error β. This error is the smallest relative perturbation in
elements of A and b such that x is the exact solution of the perturbed system:

$$|\delta a_{ij}| \le \beta |a_{ij}|, \quad |\delta b_i| \le \beta |b_i| \quad \text{such that} \quad (A + \delta A)x = b + \delta b.$$

Finally, the routine estimates the component-wise forward error in the computed solution

$$\frac{\|x - x_e\|_\infty}{\|x\|_\infty}$$

where x_e is the exact solution.
Before calling this routine:

• call the factorization routine ?sytrf

• call the solver routine ?sytrs.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

n The order of the matrix A; n≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

a Array a (size max(1, lda*n)) contains the original matrix A, as
supplied to ?sytrf.

af Array af (size max(1, ldaf*n)) contains the factored matrix A, as
returned by ?sytrf.

b Array b of size max(1, ldb*nrhs) for column major layout and
max(1, ldb*n) for row major layout contains the right-hand side
matrix B.

x Array x of size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout contains the solution matrix X.

lda The leading dimension of a; lda≥ max(1, n).

ldaf The leading dimension of af; ldaf≥ max(1, n).

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ldx The leading dimension of x; ldx≥ max(1, n) for column major layout
and ldx≥nrhs for row major layout.

ipiv Array, size at least max(1, n). The ipiv array, as returned by ?sytrf.

Output Parameters

x The refined solution matrix X.

ferr, berr Arrays, size at least max(1, nrhs). Contain the component-wise
forward and backward errors, respectively, for each solution vector.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The bounds returned in ferr are not rigorous, but in practice they almost always overestimate the actual
error.
For each right-hand side, computation of the backward error involves a minimum of 4n^2 floating-point
operations (for real flavors) or 16n^2 operations (for complex flavors). In addition, each step of iterative
refinement involves 6n^2 operations (for real flavors) or 24n^2 operations (for complex flavors); the number of
iterations may range from 1 to 5. Estimating the forward error involves solving a number of systems of linear
equations A*x = b; the number is usually 4 or 5 and never more than 11. Each solution requires
approximately 2n^2 floating-point operations for real flavors or 8n^2 for complex flavors.
See Also
Matrix Storage Schemes

?syrfsx
Uses extra precise iterative refinement to improve the
solution to the system of linear equations with a
symmetric indefinite coefficient matrix A and provides
error bounds and backward error estimates.

Syntax
lapack_int LAPACKE_ssyrfsx( int matrix_layout, char uplo, char equed, lapack_int n,
lapack_int nrhs, const float* a, lapack_int lda, const float* af, lapack_int ldaf,
const lapack_int* ipiv, const float* s, const float* b, lapack_int ldb, float* x,
lapack_int ldx, float* rcond, float* berr, lapack_int n_err_bnds, float* err_bnds_norm,
float* err_bnds_comp, lapack_int nparams, float* params );
lapack_int LAPACKE_dsyrfsx( int matrix_layout, char uplo, char equed, lapack_int n,
lapack_int nrhs, const double* a, lapack_int lda, const double* af, lapack_int ldaf,
const lapack_int* ipiv, const double* s, const double* b, lapack_int ldb, double* x,
lapack_int ldx, double* rcond, double* berr, lapack_int n_err_bnds, double*
err_bnds_norm, double* err_bnds_comp, lapack_int nparams, double* params );
lapack_int LAPACKE_csyrfsx( int matrix_layout, char uplo, char equed, lapack_int n,
lapack_int nrhs, const lapack_complex_float* a, lapack_int lda, const
lapack_complex_float* af, lapack_int ldaf, const lapack_int* ipiv, const float* s,
const lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x, lapack_int ldx,
float* rcond, float* berr, lapack_int n_err_bnds, float* err_bnds_norm, float*
err_bnds_comp, lapack_int nparams, float* params );
lapack_int LAPACKE_zsyrfsx( int matrix_layout, char uplo, char equed, lapack_int n,
lapack_int nrhs, const lapack_complex_double* a, lapack_int lda, const
lapack_complex_double* af, lapack_int ldaf, const lapack_int* ipiv, const double* s,

const lapack_complex_double* b, lapack_int ldb, lapack_complex_double* x, lapack_int
ldx, double* rcond, double* berr, lapack_int n_err_bnds, double* err_bnds_norm, double*
err_bnds_comp, lapack_int nparams, double* params );

Include Files
• mkl.h

Description

The routine improves the computed solution to a system of linear equations when the coefficient matrix is
symmetric indefinite, and provides error bounds and backward error estimates for the solution. In addition to
a normwise error bound, the code provides a maximum componentwise error bound, if possible. See
comments for err_bnds_norm and err_bnds_comp for details of the error bounds.

The original system of linear equations may have been equilibrated before calling this routine, as described
by the parameters equed and s below. In this case, the solution and error bounds returned are for the
original unequilibrated system.

Input Parameters

matrix_layout Specifies whether two-dimensional array storage is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored:

If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

equed Must be 'N' or 'Y'.

Specifies the form of equilibration that was done to A before calling this
routine.
If equed = 'N', no equilibration was done.

If equed = 'Y', both row and column equilibration was done, that is, A has
been replaced by diag(s)*A*diag(s). The right-hand side B has been
changed accordingly.

n The number of linear equations; the order of the matrix A; n≥ 0.

nrhs The number of right-hand sides; the number of columns of the matrices B
and X; nrhs≥ 0.

a, af, b The array a (size max(1, lda*n)) contains the symmetric matrix
A as specified by uplo. If uplo = 'U', the leading n-by-n upper triangular
part of a contains the upper triangular part of the matrix A and the strictly
lower triangular part of a is not referenced. If uplo = 'L', the leading
n-by-n lower triangular part of a contains the lower triangular part of the
matrix A and the strictly upper triangular part of a is not referenced.

The array af (size max(1, ldaf*n)) contains the block diagonal matrix D and
the multipliers used to obtain the factor U or L from the factorization
A = U*D*U^T or A = L*D*L^T as computed by ?sytrf.

The array b (size max(1, ldb*nrhs) for column major layout and max(1,
ldb*n) for row major layout) contains the matrix B whose columns are the
right-hand sides for the systems of equations.

lda The leading dimension of a; lda≥ max(1, n).

ldaf The leading dimension of af; ldaf≥ max(1, n).

ipiv Array, size at least max(1, n). Contains details of the interchanges and the
block structure of D as determined by ?sytrf.

s Array, size (n). The array s contains the scale factors for A.

If equed = 'N', s is not accessed.

If equed = 'Y', each element of s must be positive.

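Each element of s should be a power of the radix to ensure a reliable
solution and error estimates. Scaling by powers of the radix does not cause
rounding errors unless the result underflows or overflows.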

ldb The leading dimension of the array b; ldb≥ max(1, n) for column major
layout and ldb≥nrhs for row major layout.

x Array, size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout.
On entry, the solution matrix X as computed by ?sytrs.

ldx The leading dimension of the output array x; ldx≥ max(1, n) for column
major layout and ldx≥nrhs for row major layout.

n_err_bnds Number of error bounds to return for each right-hand side and each type
(normwise or componentwise). See the err_bnds_norm and err_bnds_comp
descriptions in the Output Parameters section below.

nparams Specifies the number of parameters set in params. If nparams ≤ 0, the params
array is never referenced and default values are used.

params[0] : Whether to perform iterative refinement or not. Default: 1.0
(for single precision flavors), 1.0D+0 (for double precision flavors).

=0.0 No refinement is performed and no error bounds are computed.

=1.0 Use the double-precision refinement algorithm, possibly with
doubled-single computations if the compilation environment does not
support double precision.

(Other values are reserved for future use.)

params[1] : Maximum number of residual computations allowed for
refinement.

Default 10.0

Aggressive Set to 100.0 to permit convergence using

approximate factorizations or factorizations
other than LU. If the factorization uses a
technique other than Gaussian elimination, the
guarantees in err_bnds_norm and
err_bnds_comp may no longer be trustworthy.

params[2] : Flag determining if the code will attempt to find a solution

with a small componentwise relative error in the double-precision algorithm.
Positive is true, 0.0 is false. Default: 1.0 (attempt componentwise
convergence).
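As an illustration of the params conventions above, the following sketch fills a three-element params array with the documented defaults and shows how to switch off the componentwise bound. The enum names and helper functions are hypothetical (they are not defined by mkl.h). Double precision is assumed; the single precision flavors take a float array instead.

```c
/* Hypothetical index names for the three documented params entries;
   mkl.h is not assumed to define them. */
enum { PARAM_REFINE = 0, PARAM_MAX_RESIDUALS = 1, PARAM_COMPONENTWISE = 2, NPARAMS = 3 };

/* Documented defaults: refinement on (1.0), at most 10 residual
   computations, componentwise convergence attempted (1.0). */
void set_default_params(double params[NPARAMS])
{
    params[PARAM_REFINE] = 1.0;
    params[PARAM_MAX_RESIDUALS] = 10.0;
    params[PARAM_COMPONENTWISE] = 1.0;
}

/* Normwise bounds only: with params[2] = 0.0 the routine does not access
   err_bnds_comp. */
void set_normwise_only_params(double params[NPARAMS])
{
    set_default_params(params);
    params[PARAM_COMPONENTWISE] = 0.0;
}
```

A caller would then pass the array as params together with nparams = NPARAMS; passing nparams ≤ 0 instead makes the routine ignore params and use the same defaults.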

Output Parameters

x The improved solution matrix X.

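rcond Reciprocal scaled condition number. An estimate of the reciprocal Skeel
condition number of the matrix A after equilibration (if done). If rcond is
less than the machine precision (in particular, if rcond = 0), the matrix is
singular to working precision.

berr Array, size at least max(1, nrhs). Contains the componentwise relative
backward error for each solution vector x(j), that is, the smallest relative
change in any element of A or B that makes x(j) an exact solution.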

err_bnds_norm Array of size nrhs*n_err_bnds. For each right-hand side, contains
information about various error bounds and condition numbers
corresponding to the normwise relative error, which is defined as follows:
Normwise relative error in the i-th solution vector:
$$\frac{\max_j |x_{\mathrm{true},ji} - x_{ji}|}{\max_j |x_{ji}|}$$

The array is indexed by the type of error information as described below.

There are currently up to three pieces of information returned.

err=1 "Trust/don't trust" boolean. Trust the answer if
the reciprocal condition number is greater than the
threshold sqrt(n)*slamch(ε) for single
precision flavors and sqrt(n)*dlamch(ε) for
double precision flavors.

err=2 "Guaranteed" error bound. The estimated
forward error, almost certainly within a factor of
10 of the true error so long as the next entry is
greater than the threshold sqrt(n)*slamch(ε)
for single precision flavors and
sqrt(n)*dlamch(ε) for double precision
flavors. This error bound should only be trusted
if the previous boolean is true.

err=3 Reciprocal condition number. Estimated
normwise reciprocal condition number.
Compared with the threshold
sqrt(n)*slamch(ε) for single precision flavors
and sqrt(n)*dlamch(ε) for double precision
flavors to determine if the error estimate is
"guaranteed". These reciprocal condition
numbers for some appropriately scaled matrix Z
are:
$$\frac{1}{\|Z^{-1}\|_\infty \, \|Z\|_\infty}$$
Let Z = S*A, where S scales each row by a power
of the radix so all absolute row sums of Z are
approximately 1.

The information for right-hand side i, where 1 ≤i≤nrhs, and type of error
err is stored in:

• Column major layout: err_bnds_norm[(err - 1)*nrhs + i - 1].

• Row major layout: err_bnds_norm[err - 1 + (i - 1)*n_err_bnds]

err_bnds_comp Array of size nrhs*n_err_bnds. For each right-hand side, contains
information about various error bounds and condition numbers
corresponding to the componentwise relative error, which is defined as
follows:
Componentwise relative error in the i-th solution vector:
$$\max_j \frac{|x_{\mathrm{true},ji} - x_{ji}|}{|x_{ji}|}$$

The array is indexed by the type of error information as described below.

There are currently up to three pieces of information returned for each
right-hand side. If componentwise accuracy is not requested (params[2] =
0.0), then err_bnds_comp is not accessed.

err=1 "Trust/don't trust" boolean. Trust the answer if
the reciprocal condition number is greater than the
threshold sqrt(n)*slamch(ε) for single
precision flavors and sqrt(n)*dlamch(ε) for
double precision flavors.

err=2 "Guaranteed" error bound. The estimated
forward error, almost certainly within a factor of
10 of the true error so long as the next entry is
greater than the threshold sqrt(n)*slamch(ε)
for single precision flavors and
sqrt(n)*dlamch(ε) for double precision
flavors. This error bound should only be trusted
if the previous boolean is true.

err=3 Reciprocal condition number. Estimated
componentwise reciprocal condition number.
Compared with the threshold
sqrt(n)*slamch(ε) for single precision flavors
and sqrt(n)*dlamch(ε) for double precision
flavors to determine if the error estimate is
"guaranteed". These reciprocal condition
numbers for some appropriately scaled matrix Z
are:
$$\frac{1}{\|Z^{-1}\|_\infty \, \|Z\|_\infty}$$
Let Z = S*(A*diag(x)), where x is the solution
for the current right-hand side and S scales each
row of A*diag(x) by a power of the radix so all
absolute row sums of Z are approximately 1.

The information for right-hand side i, where 1 ≤i≤nrhs, and type of error
err is stored in:

• Column major layout: err_bnds_comp[(err - 1)*nrhs + i - 1].

• Row major layout: err_bnds_comp[err - 1 + (i - 1)*n_err_bnds]

params Output parameter only if the input contains erroneous values, namely, in
params[0], params[1], params[2]. In such a case, the corresponding
elements of params are filled with default values on output.
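The two error measures reported in err_bnds_norm and err_bnds_comp can be computed directly when a reference solution is known, for example in a test with a manufactured right-hand side. The sketch below evaluates the normwise and componentwise relative errors of one real solution vector exactly as defined above. The function names are hypothetical, and the routine itself only estimates these quantities, since it has no access to the true solution.

```c
#include <math.h>

/* Normwise relative error: max_j |xtrue[j] - x[j]| / max_j |x[j]|.
   Assumes x is not the zero vector. */
double normwise_rel_err(const double *x, const double *xtrue, int n)
{
    double num = 0.0, den = 0.0;
    for (int j = 0; j < n; ++j) {
        double d = fabs(xtrue[j] - x[j]);
        if (d > num) num = d;
        if (fabs(x[j]) > den) den = fabs(x[j]);
    }
    return num / den;
}

/* Componentwise relative error: max_j |xtrue[j] - x[j]| / |x[j]|.
   Assumes every x[j] is nonzero. */
double componentwise_rel_err(const double *x, const double *xtrue, int n)
{
    double worst = 0.0;
    for (int j = 0; j < n; ++j) {
        double r = fabs(xtrue[j] - x[j]) / fabs(x[j]);
        if (r > worst) worst = r;
    }
    return worst;
}
```

The componentwise measure is never smaller than the normwise one and can be much larger when x has small entries, which is why the two are reported separately.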

Return Values
This function returns a value info.

If info = 0, the execution is successful. The solution to every right-hand side is guaranteed.

If info = -i, parameter i had an illegal value.

If 0 < info≤n: U(info,info) is exactly zero. The factorization has been completed, but the factor U is exactly
singular, so the solution and error bounds could not be computed; rcond = 0 is returned.
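The return-value cases listed above can be folded into a small classifier when checking results. This is a sketch: the enum and function names are hypothetical, info is taken as long long so that it can hold either width of lapack_int, and values not described in this section are reported as INFO_OTHER rather than guessed at.

```c
/* Hypothetical classification of the info value returned by ?syrfsx. */
typedef enum { INFO_OK, INFO_BAD_ARG, INFO_SINGULAR, INFO_OTHER } info_kind;

info_kind classify_info(long long info, long long n)
{
    if (info == 0)
        return INFO_OK;        /* success */
    if (info < 0)
        return INFO_BAD_ARG;   /* parameter -info had an illegal value */
    if (info <= n)
        return INFO_SINGULAR;  /* U(info,info) is exactly zero; rcond = 0 */
    return INFO_OTHER;         /* not covered by the cases listed here */
}
```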

See Also
Matrix Storage Schemes

?herfs
Refines the solution of a system of linear equations
with a complex Hermitian coefficient matrix and
estimates its error.

Syntax
lapack_int LAPACKE_cherfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const lapack_complex_float* a, lapack_int lda, const lapack_complex_float* af,
lapack_int ldaf, const lapack_int* ipiv, const lapack_complex_float* b, lapack_int ldb,
lapack_complex_float* x, lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_zherfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const lapack_complex_double* a, lapack_int lda, const lapack_complex_double* af,
lapack_int ldaf, const lapack_int* ipiv, const lapack_complex_double* b, lapack_int
ldb, lapack_complex_double* x, lapack_int ldx, double* ferr, double* berr );

Include Files
• mkl.h

Description

The routine performs an iterative refinement of the solution to a system of linear equations A*X = B with a
complex Hermitian full-storage matrix A, with multiple right-hand sides. For each computed solution vector x,
the routine computes the component-wise backward error β. This error is the smallest relative perturbation
in elements of A and b such that x is the exact solution of the perturbed system:
$|\delta a_{ij}| \le \beta |a_{ij}|$, $|\delta b_i| \le \beta |b_i|$ such that $(A + \delta A)x = b + \delta b$.
Finally, the routine estimates the component-wise forward error in the computed solution
$\|x - x_e\|_\infty / \|x\|_\infty$ (here $x_e$ is the exact solution).
Before calling this routine:

• call the factorization routine ?hetrf

• call the solver routine ?hetrs.
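The backward error β defined above has a well-known closed form (the Oettli-Prager result): β = max_i |b - A*x|_i / (|A|*|x| + |b|)_i. The sketch below evaluates it for one right-hand side. To keep it short it assumes a real matrix held in full column major storage (both triangles present), whereas ?herfs works on complex Hermitian data; the function name is hypothetical, and rows with a zero denominator are simply skipped, a simplification of the safeguards used by LAPACK-style codes.

```c
#include <math.h>
#include <stddef.h>

/* Componentwise backward error of one solution vector (Oettli-Prager):
     beta = max_i |b - A*x|_i / (|A|*|x| + |b|)_i
   A is n-by-n, real, column major with leading dimension lda, and stored
   in full. Rows with a zero denominator are skipped. */
double componentwise_backward_error(int n, const double *a, int lda,
                                    const double *x, const double *b)
{
    double beta = 0.0;
    for (int i = 0; i < n; ++i) {
        double r = b[i];          /* accumulates b_i - (A*x)_i */
        double den = fabs(b[i]);  /* accumulates (|A|*|x| + |b|)_i */
        for (int j = 0; j < n; ++j) {
            double aij = a[i + (size_t)j * lda];
            r -= aij * x[j];
            den += fabs(aij) * fabs(x[j]);
        }
        if (den > 0.0 && fabs(r) / den > beta)
            beta = fabs(r) / den;
    }
    return beta;
}
```

For the exact solution the result is 0; for a perturbed x it is the smallest relative change in the entries of A and b that would make that x exact.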

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

n The order of the matrix A; n≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

a,af,b,x Arrays:
a (size max(1, lda*n)) contains the original matrix A, as supplied
to ?hetrf.

af (size max(1, ldaf*n)) contains the factored matrix A, as returned
by ?hetrf.

b of size max(1, ldb*nrhs) for column major layout and max(1,
ldb*n) for row major layout contains the right-hand side matrix B.

x of size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout contains the solution matrix X.

lda The leading dimension of a; lda≥ max(1, n).

ldaf The leading dimension of af; ldaf≥ max(1, n).

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ldx The leading dimension of x; ldx≥ max(1, n) for column major layout
and ldx≥nrhs for row major layout.

ipiv Array, size at least max(1, n). The ipiv array, as returned by ?hetrf.
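With uplo = 'U' or 'L' only one triangle of the Hermitian matrix is referenced; the other is implied by conjugate symmetry. The sketch below reads element (i,j) from such storage in either layout. It uses the C99 double complex type, which is an assumption: whether lapack_complex_double maps to that type depends on how the headers are configured. The function name is hypothetical.

```c
#include <complex.h>
#include <stddef.h>

/* Element (i,j), zero-based, of a Hermitian matrix of which only the
   triangle selected by uplo ('U' or 'L') is stored in a.
   row_major is nonzero for LAPACK_ROW_MAJOR storage. Elements of the
   other triangle are recovered as conj(a(j,i)). */
double complex herm_get(int row_major, char uplo, const double complex *a,
                        int lda, int i, int j)
{
    int stored = (uplo == 'U') ? (i <= j) : (i >= j);
    int r = stored ? i : j;   /* row and column of the stored partner */
    int c = stored ? j : i;
    double complex v = row_major ? a[(size_t)r * lda + c]
                                 : a[r + (size_t)c * lda];
    return stored ? v : conj(v);
}
```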

Output Parameters

x The refined solution matrix X.

ferr, berr Arrays, size at least max(1, nrhs). Contain the component-wise
forward and backward errors, respectively, for each solution vector.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The bounds returned in ferr are not rigorous, but in practice they almost always overestimate the actual
error.
For each right-hand side, computation of the backward error involves a minimum of $16n^2$ operations. In
addition, each step of iterative refinement involves $24n^2$ operations; the number of iterations may range
from 1 to 5.
Estimating the forward error involves solving a number of systems of linear equations A*x = b; the number
is usually 4 or 5 and never more than 11. Each solution requires approximately $8n^2$ floating-point operations.

The real counterpart of this routine is ?syrfs (ssyrfs/dsyrfs).

See Also
Matrix Storage Schemes

?herfsx
Uses extra precise iterative refinement to improve the
solution to the system of linear equations with a
Hermitian indefinite coefficient matrix A and provides
error bounds and backward error estimates.

Syntax
lapack_int LAPACKE_cherfsx( int matrix_layout, char uplo, char equed, lapack_int n,
lapack_int nrhs, const lapack_complex_float* a, lapack_int lda, const
lapack_complex_float* af, lapack_int ldaf, const lapack_int* ipiv, const float* s,
const lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x, lapack_int ldx,
float* rcond, float* berr, lapack_int n_err_bnds, float* err_bnds_norm, float*
err_bnds_comp, lapack_int nparams, float* params );
lapack_int LAPACKE_zherfsx( int matrix_layout, char uplo, char equed, lapack_int n,
lapack_int nrhs, const lapack_complex_double* a, lapack_int lda, const
lapack_complex_double* af, lapack_int ldaf, const lapack_int* ipiv, const double* s,
const lapack_complex_double* b, lapack_int ldb, lapack_complex_double* x, lapack_int
ldx, double* rcond, double* berr, lapack_int n_err_bnds, double* err_bnds_norm, double*
err_bnds_comp, lapack_int nparams, double* params );

Include Files
• mkl.h

Description

The routine improves the computed solution to a system of linear equations when the coefficient matrix is
Hermitian indefinite, and provides error bounds and backward error estimates for the solution. In addition to
a normwise error bound, the code provides a maximum componentwise error bound, if possible. See
comments for err_bnds_norm and err_bnds_comp for details of the error bounds.

Input Parameters

matrix_layout Specifies whether two-dimensional array storage is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored:

If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

equed Must be 'N' or 'Y'.

Specifies the form of equilibration that was done to A before calling this
routine.
If equed = 'N', no equilibration was done.

If equed = 'Y', both row and column equilibration were done, that is, A has
been replaced by diag(s)*A*diag(s). The right-hand side B has been
changed accordingly.

n The number of linear equations; the order of the matrix A; n≥ 0.

nrhs The number of right-hand sides; the number of columns of the matrices B
and X; nrhs≥ 0.

a, af, b The array a of size max(1, lda*n) contains the Hermitian matrix A as
specified by uplo. If uplo = 'U', the leading n-by-n upper triangular part
of a contains the upper triangular part of the matrix A and the strictly lower
triangular part of a is not referenced. If uplo = 'L', the leading n-by-n
lower triangular part of a contains the lower triangular part of the matrix A
and the strictly upper triangular part of a is not referenced.

The array af of size max(1, ldaf*n) contains the block diagonal matrix D
and the multipliers used to obtain the factor U or L from the factorization
A = U*D*U^H or A = L*D*L^H as computed by chetrf for cherfsx or zhetrf
for zherfsx.

The array b of size max(1, ldb*nrhs) for column major layout and max(1,
ldb*n) for row major layout contains the matrix B whose columns are
the right-hand sides for the systems of equations.

lda The leading dimension of a; lda≥ max(1, n).

ldaf The leading dimension of af; ldaf≥ max(1, n).

ipiv Array, size at least max(1, n). Contains details of the interchanges and the
block structure of D as determined by ?hetrf.

s Array, size (n). The array s contains the scale factors for A.

If equed = 'N', s is not accessed.

If equed = 'Y', each element of s must be positive.

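Each element of s should be a power of the radix to ensure a reliable
solution and error estimates. Scaling by powers of the radix does not cause
rounding errors unless the result underflows or overflows.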

ldb The leading dimension of the array b; ldb≥ max(1, n) for column major
layout and ldb≥nrhs for row major layout.

x Array, size max(1, ldx*nrhs) for column major layout and max(1, ldx*n)
for row major layout.
The solution matrix X as computed by ?hetrs.

ldx The leading dimension of the output array x; ldx≥ max(1, n) for column
major layout and ldx≥nrhs for row major layout.

n_err_bnds Number of error bounds to return for each right-hand side and each type
(normwise or componentwise). See the err_bnds_norm and err_bnds_comp
descriptions in the Output Parameters section below.

nparams Specifies the number of parameters set in params. If ≤ 0, the params array
is never referenced and default values are used.

params[0] : Whether to perform iterative refinement or not. Default: 1.0.

=0.0 No refinement is performed and no error bounds
are computed.

=1.0 Use the double-precision refinement algorithm,
possibly with doubled-single computations if the
compilation environment does not support
double precision.

(Other values are reserved for future use.)

params[1] : Maximum number of residual computations allowed for
refinement.

Default: 10

Aggressive: Set to 100 to permit convergence using
approximate factorizations or factorizations
other than LU. If the factorization uses a
technique other than Gaussian elimination, the
guarantees in err_bnds_norm and
err_bnds_comp may no longer be trustworthy.

params[2] : Flag determining if the code will attempt to find a solution
with a small componentwise relative error in the double-precision algorithm.
Positive is true, 0.0 is false. Default: 1.0 (attempt componentwise
convergence).
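The transformation described for equed = 'Y' can be checked on a small example. The sketch below applies diag(s)*A*diag(s) to a real symmetric matrix in column major full storage, scales b the same way, and maps a solution y of the scaled system back via x = diag(s)*y (since (S*A*S)*y = S*b implies A*(S*y) = b). Real data and all function names are assumptions made for brevity; for a Hermitian A the same formulas apply because s is real. The sketch does not claim to reproduce how ?herfsx handles b and x internally.

```c
#include <stddef.h>

/* a(i,j) <- s[i]*a(i,j)*s[j] for an n-by-n column major matrix with
   leading dimension lda. Both triangles are scaled, so either uplo
   choice stays valid. */
void equilibrate_sym(int n, double *a, int lda, const double *s)
{
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i)
            a[i + (size_t)j * lda] *= s[i] * s[j];
}

/* Scale the right-hand side the same way: b <- diag(s)*b. */
void scale_rhs(int n, double *b, const double *s)
{
    for (int i = 0; i < n; ++i)
        b[i] *= s[i];
}

/* If y solves (diag(s)*A*diag(s))*y = diag(s)*b, then x = diag(s)*y
   solves A*x = b; y is overwritten with x. */
void unscale_solution(int n, double *y, const double *s)
{
    for (int i = 0; i < n; ++i)
        y[i] *= s[i];
}
```

Choosing every s[i] as a power of two, as recommended above, keeps all three steps free of rounding error barring underflow or overflow.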

Output Parameters

x The improved solution matrix X.

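rcond Reciprocal scaled condition number. An estimate of the reciprocal Skeel
condition number of the matrix A after equilibration (if done). If rcond is
less than the machine precision (in particular, if rcond = 0), the matrix is
singular to working precision.

berr Array, size at least max(1, nrhs). Contains the componentwise relative
backward error for each solution vector x(j), that is, the smallest relative
change in any element of A or B that makes x(j) an exact solution.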

err_bnds_norm Array of size nrhs*n_err_bnds. For each right-hand side, contains
information about various error bounds and condition numbers
corresponding to the normwise relative error, which is defined as follows:
Normwise relative error in the i-th solution vector:
$$\frac{\max_j |x_{\mathrm{true},ji} - x_{ji}|}{\max_j |x_{ji}|}$$

The array is indexed by the type of error information as described below.

There are currently up to three pieces of information returned.

err=1 "Trust/don't trust" boolean. Trust the answer if
the reciprocal condition number is greater than the
threshold sqrt(n)*slamch(ε) for cherfsx and
sqrt(n)*dlamch(ε) for zherfsx.

err=2 "Guaranteed" error bound. The estimated
forward error, almost certainly within a factor of
10 of the true error so long as the next entry is
greater than the threshold sqrt(n)*slamch(ε)
for cherfsx and sqrt(n)*dlamch(ε) for
zherfsx. This error bound should only be
trusted if the previous boolean is true.

err=3 Reciprocal condition number. Estimated
normwise reciprocal condition number.
Compared with the threshold
sqrt(n)*slamch(ε) for single precision flavors
and sqrt(n)*dlamch(ε) for double precision
flavors to determine if the error estimate is
"guaranteed". These reciprocal condition
numbers for some appropriately scaled matrix Z
are:
$$\frac{1}{\|Z^{-1}\|_\infty \, \|Z\|_\infty}$$
Let Z = S*A, where S scales each row by a power
of the radix so all absolute row sums of Z are
approximately 1.

The information for right-hand side i, where 1 ≤i≤nrhs, and type of error
err is stored in:

• Column major layout: err_bnds_norm[(err - 1)*nrhs + i - 1].

• Row major layout: err_bnds_norm[err - 1 + (i - 1)*n_err_bnds]

err_bnds_comp Array of size nrhs*n_err_bnds. For each right-hand side, contains
information about various error bounds and condition numbers
corresponding to the componentwise relative error, which is defined as
follows:
Componentwise relative error in the i-th solution vector:
$$\max_j \frac{|x_{\mathrm{true},ji} - x_{ji}|}{|x_{ji}|}$$

The array is indexed by the type of error information as described below.

There are currently up to three pieces of information returned for each
right-hand side. If componentwise accuracy is not requested (params[2] =
0.0), then err_bnds_comp is not accessed.

err=1 "Trust/don't trust" boolean. Trust the answer if
the reciprocal condition number is greater than the
threshold sqrt(n)*slamch(ε) for cherfsx and
sqrt(n)*dlamch(ε) for zherfsx.

err=2 "Guaranteed" error bound. The estimated
forward error, almost certainly within a factor of
10 of the true error so long as the next entry is
greater than the threshold sqrt(n)*slamch(ε)
for cherfsx and sqrt(n)*dlamch(ε) for
zherfsx. This error bound should only be
trusted if the previous boolean is true.

err=3 Reciprocal condition number. Estimated
componentwise reciprocal condition number.
Compared with the threshold
sqrt(n)*slamch(ε) for single precision flavors
and sqrt(n)*dlamch(ε) for double precision
flavors to determine if the error estimate is
"guaranteed". These reciprocal condition
numbers for some appropriately scaled matrix Z
are:
$$\frac{1}{\|Z^{-1}\|_\infty \, \|Z\|_\infty}$$
Let Z = S*(A*diag(x)), where x is the solution
for the current right-hand side and S scales each
row of A*diag(x) by a power of the radix so all
absolute row sums of Z are approximately 1.

The information for right-hand side i, where 1 ≤i≤nrhs, and type of error
err is stored in:

• Column major layout: err_bnds_comp[(err - 1)*nrhs + i - 1].

• Row major layout: err_bnds_comp[err - 1 + (i - 1)*n_err_bnds]

params Output parameter only if the input contains erroneous values, namely, in
params[0], params[1], params[2]. In such a case, the corresponding
elements of params are filled with default values on output.
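Because the layout of err_bnds_norm and err_bnds_comp depends on matrix_layout, a small accessor helps avoid off-by-one mistakes. The sketch below implements the two indexing formulas given above for 1-based i and err; the function names and the row_major flag (nonzero for LAPACK_ROW_MAJOR) are hypothetical.

```c
#include <stddef.h>

/* Zero-based offset of the entry for right-hand side i (1 <= i <= nrhs)
   and error type err (1 = trust flag, 2 = error bound, 3 = reciprocal
   condition number) in err_bnds_norm or err_bnds_comp. */
size_t err_bnds_offset(int row_major, int nrhs, int n_err_bnds, int i, int err)
{
    if (row_major)  /* err - 1 + (i - 1)*n_err_bnds */
        return (size_t)(err - 1) + (size_t)(i - 1) * (size_t)n_err_bnds;
    /* column major: (err - 1)*nrhs + i - 1 */
    return (size_t)(err - 1) * (size_t)nrhs + (size_t)(i - 1);
}

/* Convenience accessor for a double precision bounds array. */
double err_bnds_get(const double *bnds, int row_major, int nrhs,
                    int n_err_bnds, int i, int err)
{
    return bnds[err_bnds_offset(row_major, nrhs, n_err_bnds, i, err)];
}
```

For example, err_bnds_get(err_bnds_norm, 0, nrhs, 3, j, 2) would read the normwise error bound for right-hand side j of a column major call.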

Return Values
This function returns a value info.

If info = 0, the execution is successful. The solution to every right-hand side is guaranteed.

If info = -i, parameter i had an illegal value.

If 0 < info≤n: U(info,info) is exactly zero. The factorization has been completed, but the factor D is exactly
singular, so the solution and error bounds could not be computed; rcond = 0 is returned.

?sprfs
Refines the solution of a system of linear equations
with a packed symmetric coefficient matrix and
estimates the solution error.

Syntax
lapack_int LAPACKE_ssprfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const float* ap, const float* afp, const lapack_int* ipiv, const float* b, lapack_int
ldb, float* x, lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_dsprfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const double* ap, const double* afp, const lapack_int* ipiv, const double* b,
lapack_int ldb, double* x, lapack_int ldx, double* ferr, double* berr );
lapack_int LAPACKE_csprfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const lapack_complex_float* ap, const lapack_complex_float* afp, const lapack_int*
ipiv, const lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x,
lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_zsprfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const lapack_complex_double* ap, const lapack_complex_double* afp, const lapack_int*
ipiv, const lapack_complex_double* b, lapack_int ldb, lapack_complex_double* x,
lapack_int ldx, double* ferr, double* berr );

Include Files
• mkl.h

Description

The routine performs an iterative refinement of the solution to a system of linear equations A*X = B with a
packed symmetric matrix A, with multiple right-hand sides. For each computed solution vector x, the routine
computes the component-wise backward error β. This error is the smallest relative perturbation in elements
of A and b such that x is the exact solution of the perturbed system:
$|\delta a_{ij}| \le \beta |a_{ij}|$, $|\delta b_i| \le \beta |b_i|$ such that $(A + \delta A)x = b + \delta b$.
Finally, the routine estimates the component-wise forward error in the computed solution
$\|x - x_e\|_\infty / \|x\|_\infty$ (here $x_e$ is the exact solution).
Before calling this routine:

• call the factorization routine ?sptrf

• call the solver routine ?sptrs.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

n The order of the matrix A; n≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

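ap,afp,b,x Arrays:
ap of size max(1, n(n+1)/2) contains the original packed matrix A, as
supplied to ?sptrf.

afp of size max(1, n(n+1)/2) contains the factored packed matrix A,
as returned by ?sptrf.

b of size max(1, ldb*nrhs) for column major layout and max(1,
ldb*n) for row major layout contains the right-hand side matrix B.

x of size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout contains the solution matrix X.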

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ldx The leading dimension of x; ldx≥ max(1, n) for column major layout
and ldx≥ max(1,nrhs) for row major layout.

ipiv Array, size at least max(1, n). The ipiv array, as returned by ?sptrf.
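The index arithmetic behind the packed arrays ap and afp can be sketched as follows. It assumes the conventions described under Matrix Storage Schemes: column major packs the stored triangle column by column, and row major packs it row by row. The function name is hypothetical; the LAPACKE routines take the packed array directly and need no such helper.

```c
#include <stddef.h>

/* Zero-based position of element (i,j) of an n-by-n symmetric matrix in
   packed storage (length n*(n+1)/2), for uplo = 'U' or 'L'.
   row_major is nonzero for LAPACK_ROW_MAJOR. An element of the triangle
   that is not stored is mapped to its symmetric partner (j,i). */
size_t packed_index(int row_major, char uplo, size_t n, size_t i, size_t j)
{
    int upper = (uplo == 'U');
    if (upper ? (i > j) : (i < j)) {  /* move (i,j) into the stored triangle */
        size_t t = i;
        i = j;
        j = t;
    }
    if (!row_major)                   /* columns of the triangle, in order */
        return upper ? i + j * (j + 1) / 2
                     : i + j * (2 * n - j - 1) / 2;
    /* row major: rows of the triangle, in order */
    return upper ? j + i * (2 * n - i - 1) / 2
                 : j + i * (i + 1) / 2;
}
```

The products j*(2n - j - 1) and i*(i + 1) are always even, so the integer divisions are exact.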

Output Parameters

x The refined solution matrix X.

ferr, berr Arrays, size at least max(1, nrhs). Contain the component-wise
forward and backward errors, respectively, for each solution vector.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The bounds returned in ferr are not rigorous, but in practice they almost always overestimate the actual
error.
For each right-hand side, computation of the backward error involves a minimum of $4n^2$ floating-point
operations (for real flavors) or $16n^2$ operations (for complex flavors). In addition, each step of iterative
refinement involves $6n^2$ operations (for real flavors) or $24n^2$ operations (for complex flavors); the number of
iterations may range from 1 to 5.
Estimating the forward error involves solving a number of systems of linear equations A*x = b; the number
of systems is usually 4 or 5 and never more than 11. Each solution requires approximately $2n^2$ floating-point
operations for real flavors or $8n^2$ for complex flavors.
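The figures above can be combined into a rough per-right-hand-side cost estimate. The sketch below simply adds the backward-error cost, 5 refinement steps, and 11 forward-error solves, so it is an upper-end figure built only from the numbers quoted here; the function name is hypothetical and the estimate is not a measured or guaranteed count.

```c
/* Rough per-right-hand-side flop estimate for ?sprfs from the quoted
   figures: backward error (4n^2 real / 16n^2 complex, a minimum),
   up to 5 refinement steps (6n^2 / 24n^2 each), and up to 11 solves
   for the forward error (2n^2 / 8n^2 each). */
double sprfs_flop_estimate(double n, int is_complex)
{
    double n2 = n * n;
    double backward = (is_complex ? 16.0 : 4.0) * n2;
    double refine = 5.0 * (is_complex ? 24.0 : 6.0) * n2;
    double forward = 11.0 * (is_complex ? 8.0 : 2.0) * n2;
    return backward + refine + forward;
}
```

This works out to $56n^2$ for real flavors and $224n^2$ for complex flavors; for n = 1000 that is about 5.6e7 flops per right-hand side in the real case, with typical runs needing fewer refinement steps and solves.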

See Also
Matrix Storage Schemes

?hprfs
Refines the solution of a system of linear equations
with a packed complex Hermitian coefficient matrix
and estimates the solution error.

Syntax
lapack_int LAPACKE_chprfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const lapack_complex_float* ap, const lapack_complex_float* afp, const lapack_int*
ipiv, const lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x,
lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_zhprfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const lapack_complex_double* ap, const lapack_complex_double* afp, const lapack_int*
ipiv, const lapack_complex_double* b, lapack_int ldb, lapack_complex_double* x,
lapack_int ldx, double* ferr, double* berr );

Include Files
• mkl.h

Description

The routine performs an iterative refinement of the solution to a system of linear equations A*X = B with a
packed complex Hermitian matrix A, with multiple right-hand sides. For each computed solution vector x, the
routine computes the component-wise backward error β. This error is the smallest relative perturbation in
elements of A and b such that x is the exact solution of the perturbed system:
$|\delta a_{ij}| \le \beta |a_{ij}|$, $|\delta b_i| \le \beta |b_i|$ such that $(A + \delta A)x = b + \delta b$.
Finally, the routine estimates the component-wise forward error in the computed solution
$\|x - x_e\|_\infty / \|x\|_\infty$ (here $x_e$ is the exact solution).
Before calling this routine:

• call the factorization routine ?hptrf

• call the solver routine ?hptrs.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

n The order of the matrix A; n≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

ap,afp,b,x Arrays:
ap of size max(1, n(n + 1)/2) contains the original packed matrix A, as
supplied to ?hptrf.

afp of size max(1, n(n + 1)/2) contains the factored packed matrix A, as
returned by ?hptrf.

b of size max(1, ldb*nrhs) for column major layout and max(1,
ldb*n) for row major layout contains the right-hand side matrix B.

x of size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout contains the solution matrix X.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ldx The leading dimension of x; ldx≥ max(1, n) for column major layout
and ldx≥nrhs for row major layout.

ipiv Array, size at least max(1, n). The ipiv array, as returned by ?hptrf.
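For debugging, it can be convenient to expand the packed Hermitian array ap into full storage, for instance to inspect it or to compare against the full-storage path through ?herfs. The sketch below does this for column major packed storage only; the function name is hypothetical, and C99 double complex is assumed as the element type (lapack_complex_double may be configured differently).

```c
#include <complex.h>
#include <stddef.h>

/* Expand a column major packed Hermitian matrix ap (uplo = 'U' or 'L',
   length n*(n+1)/2) into a full n-by-n column major array a with leading
   dimension n. The unstored triangle is filled with conjugates.
   Row major packed input is not handled in this sketch. */
void unpack_hermitian(char uplo, int n, const double complex *ap,
                      double complex *a)
{
    size_t k = 0;                           /* position in ap */
    for (int j = 0; j < n; ++j) {
        int first = (uplo == 'U') ? 0 : j;  /* stored rows of column j */
        int last = (uplo == 'U') ? j : n - 1;
        for (int i = first; i <= last; ++i, ++k) {
            a[i + (size_t)j * n] = ap[k];
            if (i != j)                     /* mirror off-diagonal entries */
                a[j + (size_t)i * n] = conj(ap[k]);
        }
    }
}
```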

Output Parameters

x The refined solution matrix X.

ferr, berr Arrays, size at least max(1, nrhs). Contain the component-wise
forward and backward errors, respectively, for each solution vector.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The bounds returned in ferr are not rigorous, but in practice they almost always overestimate the actual
error.
For each right-hand side, computation of the backward error involves a minimum of $16n^2$ operations. In
addition, each step of iterative refinement involves $24n^2$ operations; the number of iterations may range
from 1 to 5.
Estimating the forward error involves solving a number of systems of linear equations A*x = b; the number
is usually 4 or 5 and never more than 11. Each solution requires approximately $8n^2$ floating-point operations.

The real counterpart of this routine is ?sprfs (ssprfs/dsprfs).

See Also
Matrix Storage Schemes

?trrfs
Estimates the error in the solution of a system of
linear equations with a triangular coefficient matrix.

Syntax
lapack_int LAPACKE_strrfs( int matrix_layout, char uplo, char trans, char diag,
lapack_int n, lapack_int nrhs, const float* a, lapack_int lda, const float* b,
lapack_int ldb, const float* x, lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_dtrrfs( int matrix_layout, char uplo, char trans, char diag,
lapack_int n, lapack_int nrhs, const double* a, lapack_int lda, const double* b,
lapack_int ldb, const double* x, lapack_int ldx, double* ferr, double* berr );
lapack_int LAPACKE_ctrrfs( int matrix_layout, char uplo, char trans, char diag,
lapack_int n, lapack_int nrhs, const lapack_complex_float* a, lapack_int lda, const
lapack_complex_float* b, lapack_int ldb, const lapack_complex_float* x, lapack_int ldx,
float* ferr, float* berr );
lapack_int LAPACKE_ztrrfs( int matrix_layout, char uplo, char trans, char diag,
lapack_int n, lapack_int nrhs, const lapack_complex_double* a, lapack_int lda, const
lapack_complex_double* b, lapack_int ldb, const lapack_complex_double* x, lapack_int
ldx, double* ferr, double* berr );

Include Files
• mkl.h

Description

The routine estimates the errors in the solution to a system of linear equations A*X = B or A^T*X = B or
A^H*X = B with a triangular matrix A, with multiple right-hand sides. For each computed solution vector x, the
routine computes the component-wise backward error β. This error is the smallest relative perturbation in
elements of A and b such that x is the exact solution of the perturbed system:
$|\delta a_{ij}| \le \beta |a_{ij}|$, $|\delta b_i| \le \beta |b_i|$ such that $(A + \delta A)x = b + \delta b$.
The routine also estimates the component-wise forward error in the computed solution
$\|x - x_e\|_\infty / \|x\|_\infty$ (here $x_e$ is the exact solution).
Before calling this routine, call the solver routine ?trtrs.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether A is upper or lower triangular:

If uplo = 'U', then A is upper triangular.

If uplo = 'L', then A is lower triangular.

trans Must be 'N' or 'T' or 'C'.

Indicates the form of the equations:

If trans = 'N', the system has the form A*X = B.

If trans = 'T', the system has the form A^T*X = B.

If trans = 'C', the system has the form A^H*X = B.

diag Must be 'N' or 'U'.

If diag = 'N', then A is not a unit triangular matrix.

If diag = 'U', then A is unit triangular: diagonal elements of A are
assumed to be 1 and not referenced in the array a.

n The order of the matrix A; n≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

a, b, x Arrays:
a (size max(1, lda*n)) contains the upper or lower triangular matrix
A, as specified by uplo.
b of size max(1, ldb*nrhs) for column major layout and max(1,
ldb*n) for row major layout contains the right-hand side matrix B.
x of size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout contains the solution matrix X.

lda The leading dimension of a; lda≥ max(1, n).

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ldx The leading dimension of x; ldx≥ max(1, n) for column major layout
and ldx≥nrhs for row major layout.
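When testing ?trrfs it can help to form the residual b - op(A)*x independently. The sketch below computes op(A)*x for a real triangular matrix in column major storage, honoring uplo, trans and diag as described above, including not reading the diagonal when diag = 'U'. It is a plain reference loop with a hypothetical name; in production code the CBLAS routine cblas_dtrmv performs the same operation (overwriting x in place).

```c
#include <stddef.h>

/* y = op(A)*x for an n-by-n real triangular A stored column major in a
   (leading dimension lda). uplo: 'U' or 'L'; trans: 'N', 'T' or 'C'
   ('C' equals 'T' for real data); diag: 'N' or 'U'. With diag = 'U'
   the diagonal of a is never read and is taken to be 1. */
void trmv_ref(char uplo, char trans, char diag, int n,
              const double *a, int lda, const double *x, double *y)
{
    for (int i = 0; i < n; ++i) {
        double sum = 0.0;
        for (int j = 0; j < n; ++j) {
            /* element (i,j) of op(A) is a(i,j) for 'N', a(j,i) otherwise */
            int r = (trans == 'N') ? i : j;
            int c = (trans == 'N') ? j : i;
            int in_triangle = (uplo == 'U') ? (r <= c) : (r >= c);
            if (!in_triangle)
                continue;            /* outside the triangle: zero */
            if (r == c && diag == 'U')
                sum += x[j];         /* implicit unit diagonal */
            else
                sum += a[r + (size_t)c * lda] * x[j];
        }
        y[i] = sum;
    }
}
```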

Output Parameters

ferr, berr Arrays, size at least max(1, nrhs). Contain the component-wise
forward and backward errors, respectively, for each solution vector.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The bounds returned in ferr are not rigorous, but in practice they almost always overestimate the actual
error.
A call to this routine involves, for each right-hand side, solving a number of systems of linear equations
A*x = b; the number of systems is usually 4 or 5 and never more than 11. Each solution requires approximately
$n^2$ floating-point operations for real flavors or $4n^2$ for complex flavors.

See Also
Matrix Storage Schemes

?tprfs
Estimates the error in the solution of a system of
linear equations with a packed triangular coefficient
matrix.

Syntax
lapack_int LAPACKE_stprfs( int matrix_layout, char uplo, char trans, char diag,
lapack_int n, lapack_int nrhs, const float* ap, const float* b, lapack_int ldb, const
float* x, lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_dtprfs( int matrix_layout, char uplo, char trans, char diag,
lapack_int n, lapack_int nrhs, const double* ap, const double* b, lapack_int ldb, const
double* x, lapack_int ldx, double* ferr, double* berr );
lapack_int LAPACKE_ctprfs( int matrix_layout, char uplo, char trans, char diag,
lapack_int n, lapack_int nrhs, const lapack_complex_float* ap, const
lapack_complex_float* b, lapack_int ldb, const lapack_complex_float* x, lapack_int ldx,
float* ferr, float* berr );
lapack_int LAPACKE_ztprfs( int matrix_layout, char uplo, char trans, char diag,
lapack_int n, lapack_int nrhs, const lapack_complex_double* ap, const
lapack_complex_double* b, lapack_int ldb, const lapack_complex_double* x, lapack_int
ldx, double* ferr, double* berr );

Include Files
• mkl.h

Description

The routine estimates the errors in the solution to a system of linear equations $A*X = B$, $A^T*X = B$, or
$A^H*X = B$ with a packed triangular matrix A, with multiple right-hand sides. For each computed solution
vector x, the routine computes the component-wise backward error β. This error is the smallest relative
perturbation in elements of A and b such that x is the exact solution of the perturbed system:
$$|\delta a_{ij}| \le \beta\,|a_{ij}|, \quad |\delta b_i| \le \beta\,|b_i| \quad \text{such that} \quad (A + \delta A)x = b + \delta b.$$
The routine also estimates the component-wise forward error in the computed solution,
$\|x - x_e\|_\infty / \|x\|_\infty$ (here $x_e$ is the exact solution).
Before calling this routine, call the solver routine ?tptrs.
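The backward error β defined above can be computed directly from the residual $r = b - A*x$ as $\beta = \max_i |r_i| / (|A|\,|x| + |b|)_i$ (the Oettli-Prager formula). The plain-C sketch below does this for a single right-hand side, assuming column-major packed upper-triangular storage with uplo = 'U', trans = 'N', diag = 'N'. The function names are hypothetical, the code does not call MKL, and it is not MKL's implementation (it skips rows with a zero denominator and does not estimate ferr).

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Column-major packed upper triangle (uplo = 'U'):
   A(i,j), 0 <= i <= j < n, is stored at ap[i + j*(j+1)/2]. */
static double ap_upper(const double *ap, int i, int j) {
    return ap[i + (size_t)j * (j + 1) / 2];
}

/* Component-wise backward error of x for A*x = b (trans = 'N', diag = 'N'):
   beta = max_i |b - A*x|_i / (|A|*|x| + |b|)_i. */
double backward_error_tp_upper(int n, const double *ap,
                               const double *b, const double *x) {
    assert(n >= 0);
    double beta = 0.0;
    for (int i = 0; i < n; ++i) {
        double r = b[i];            /* accumulates b_i - (A*x)_i */
        double denom = fabs(b[i]);  /* accumulates (|A|*|x| + |b|)_i */
        for (int j = i; j < n; ++j) {   /* row i of an upper triangle: j >= i */
            double aij = ap_upper(ap, i, j);
            r -= aij * x[j];
            denom += fabs(aij) * fabs(x[j]);
        }
        if (denom > 0.0 && fabs(r) / denom > beta)
            beta = fabs(r) / denom;
    }
    return beta;
}
```

With ap = {2, 1, 4} (that is, A = [2 1; 0 4]) and b = {3, 4}, the exact solution x = {1, 1} gives β = 0, while the perturbed x = {1, 1.5} gives β = 0.2.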

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether A is upper or lower triangular:

If uplo = 'U', then A is upper triangular.

If uplo = 'L', then A is lower triangular.

trans Must be 'N' or 'T' or 'C'.

Indicates the form of the equations:

If trans = 'N', the system has the form $A*X = B$.

If trans = 'T', the system has the form $A^T*X = B$.

If trans = 'C', the system has the form $A^H*X = B$.

diag Must be 'N' or 'U'.

If diag = 'N', A is not a unit triangular matrix.

If diag = 'U', A is unit triangular: diagonal elements of A are assumed to be 1 and not referenced in the array ap.

n The order of the matrix A; n≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

ap, b, x Arrays:
ap of size max(1, n(n + 1)/2) contains the upper or lower triangular matrix A,
as specified by uplo.
b of size max(1, ldb*nrhs) for column major layout and max(1, ldb*n)
for row major layout contains the right-hand side matrix B.
x of size max(1, ldx*nrhs) for column major layout and max(1, ldx*n)
for row major layout contains the solution matrix X.

ldb The leading dimension of b; ldb ≥ max(1, n) for column major layout
and ldb ≥ nrhs for row major layout.

ldx The leading dimension of x; ldx ≥ max(1, n) for column major layout
and ldx ≥ nrhs for row major layout.
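The leading-dimension rules above follow from how element (i, j) of b or x is addressed in each layout. A small sketch with hypothetical helper names; the local enum stands in for LAPACK_COL_MAJOR / LAPACK_ROW_MAJOR so the code compiles without mkl.h.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for LAPACK_COL_MAJOR / LAPACK_ROW_MAJOR (hypothetical names). */
enum layout { COL_MAJOR, ROW_MAJOR };

/* Offset of element (i, j), 0-based, of a matrix stored with leading dimension ld. */
size_t elem_offset(enum layout lay, size_t i, size_t j, size_t ld) {
    assert(ld > 0);
    return lay == COL_MAJOR ? i + j * ld   /* column j starts at offset j*ld */
                            : i * ld + j;  /* row i starts at offset i*ld */
}

/* Smallest leading dimension allowed for an n-by-nrhs array such as b or x:
   max(1, n) for column major layout, nrhs for row major layout. */
size_t min_ld(enum layout lay, size_t n, size_t nrhs) {
    if (lay == COL_MAJOR) return n > 1 ? n : 1;
    return nrhs;
}
```

For example, element (1, 2) of a column-major b with ldb = 5 is at offset 1 + 2*5 = 11, while in a row-major b with ldb = 3 it is at offset 1*3 + 2 = 5.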

Output Parameters

ferr, berr Arrays, size at least max(1, nrhs). Contain the component-wise
forward and backward errors, respectively, for each solution vector.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

See Also
Matrix Storage Schemes

?tbrfs
Estimates the error in the solution of a system of
linear equations with a triangular band coefficient
matrix.

Syntax
lapack_int LAPACKE_stbrfs( int matrix_layout, char uplo, char trans, char diag,
lapack_int n, lapack_int kd, lapack_int nrhs, const float* ab, lapack_int ldab, const
float* b, lapack_int ldb, const float* x, lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_dtbrfs( int matrix_layout, char uplo, char trans, char diag,
lapack_int n, lapack_int kd, lapack_int nrhs, const double* ab, lapack_int ldab, const
double* b, lapack_int ldb, const double* x, lapack_int ldx, double* ferr, double*
berr );
lapack_int LAPACKE_ctbrfs( int matrix_layout, char uplo, char trans, char diag,
lapack_int n, lapack_int kd, lapack_int nrhs, const lapack_complex_float* ab,
lapack_int ldab, const lapack_complex_float* b, lapack_int ldb, const
lapack_complex_float* x, lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_ztbrfs( int matrix_layout, char uplo, char trans, char diag,
lapack_int n, lapack_int kd, lapack_int nrhs, const lapack_complex_double* ab,
lapack_int ldab, const lapack_complex_double* b, lapack_int ldb, const
lapack_complex_double* x, lapack_int ldx, double* ferr, double* berr );

Include Files
• mkl.h

Description

The routine estimates the errors in the solution to a system of linear equations $A*X = B$, $A^T*X = B$, or
$A^H*X = B$ with a triangular band matrix A, with multiple right-hand sides. For each computed solution vector
x, the routine computes the component-wise backward error β. This error is the smallest relative
perturbation in elements of A and b such that x is the exact solution of the perturbed system:
$$|\delta a_{ij}| \le \beta\,|a_{ij}|, \quad |\delta b_i| \le \beta\,|b_i| \quad \text{such that} \quad (A + \delta A)x = b + \delta b.$$
The routine also estimates the component-wise forward error in the computed solution,
$\|x - x_e\|_\infty / \|x\|_\infty$ (here $x_e$ is the exact solution).
Before calling this routine, call the solver routine ?tbtrs.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether A is upper or lower triangular:

If uplo = 'U', then A is upper triangular.

If uplo = 'L', then A is lower triangular.

trans Must be 'N' or 'T' or 'C'.

Indicates the form of the equations:

If trans = 'N', the system has the form $A*X = B$.

If trans = 'T', the system has the form $A^T*X = B$.

If trans = 'C', the system has the form $A^H*X = B$.

diag Must be 'N' or 'U'.

If diag = 'N', A is not a unit triangular matrix.

If diag = 'U', A is unit triangular: diagonal elements of A are assumed to be 1 and not referenced in the array ab.

n The order of the matrix A; n≥ 0.

kd The number of super-diagonals or sub-diagonals in the matrix A; kd ≥ 0.

nrhs The number of right-hand sides; nrhs ≥ 0.

ab, b, x Arrays:
ab of size max(1, ldab*n) contains the upper or lower triangular matrix
A, as specified by uplo, in band storage format.
b of size max(1, ldb*nrhs) for column major layout and max(1, ldb*n)
for row major layout contains the right-hand side matrix B.
x of size max(1, ldx*nrhs) for column major layout and max(1, ldx*n)
for row major layout contains the solution matrix X.

ldab The leading dimension of the array ab; ldab ≥ kd + 1.
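In band storage each column j of the triangle keeps at most kd + 1 entries. The sketch below packs a dense column-major triangular band matrix into ab using the usual LAPACK column-major band mapping (0-based): for uplo = 'U', A(i,j) goes to ab[(kd + i - j) + j*ldab]; for uplo = 'L', to ab[(i - j) + j*ldab]. The function name is hypothetical, and row-major band layout is not covered.

```c
#include <assert.h>
#include <stddef.h>

/* Copy the band of a dense column-major n-by-n triangular matrix a (leading
   dimension lda) into column-major band storage ab (ldab >= kd + 1).
   Positions of ab that fall outside the matrix are left untouched. */
void pack_triangular_band(char uplo, int n, int kd, const double *a, int lda,
                          double *ab, int ldab) {
    assert(n >= 0 && kd >= 0 && ldab >= kd + 1);
    for (int j = 0; j < n; ++j) {
        if (uplo == 'U') {
            /* rows max(0, j - kd) .. j of column j */
            for (int i = (j - kd > 0 ? j - kd : 0); i <= j; ++i)
                ab[(kd + i - j) + (size_t)j * ldab] = a[i + (size_t)j * lda];
        } else {
            /* rows j .. min(n - 1, j + kd) of column j */
            for (int i = j; i <= (j + kd < n - 1 ? j + kd : n - 1); ++i)
                ab[(i - j) + (size_t)j * ldab] = a[i + (size_t)j * lda];
        }
    }
}
```

For the 3-by-3 upper bidiagonal matrix with rows (1 2 0), (0 3 4), (0 0 5), kd = 1 and ldab = 2, the call leaves ab[0] unused and fills ab[1] through ab[5] with 1, 2, 3, 4, 5.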

ldb The leading dimension of b; ldb ≥ max(1, n) for column major layout
and ldb ≥ nrhs for row major layout.

ldx The leading dimension of x; ldx ≥ max(1, n) for column major layout
and ldx ≥ nrhs for row major layout.

Output Parameters

ferr, berr Arrays, size at least max(1, nrhs). Contain the component-wise
forward and backward errors, respectively, for each solution vector.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The bounds returned in ferr are not rigorous, but in practice they almost always overestimate the actual
error.
A call to this routine involves, for each right-hand side, solving a number of systems of linear equations A*x
= b; the number of systems is usually 4 or 5 and never more than 11. Each solution requires approximately
2n*kd floating-point operations for real flavors or 8n*kd operations for complex flavors.
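As a rough worked example of the cost estimate above, take the illustrative values n = 1000 and kd = 10 for a real flavor (these numbers are chosen for illustration and are not from the original text):

```latex
\begin{aligned}
\text{flops per solve} &\approx 2\,n\,kd = 2 \cdot 1000 \cdot 10 = 2 \times 10^{4},\\
\text{flops per right-hand side} &\approx (4 \text{ to } 5) \times 2 \times 10^{4} = 8 \times 10^{4} \text{ to } 1 \times 10^{5},\\
\text{worst case per right-hand side} &\le 11 \times 2 \times 10^{4} = 2.2 \times 10^{5}.
\end{aligned}
```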

See Also
Matrix Storage Schemes

Matrix Inversion: LAPACK Computational Routines

It is seldom necessary to compute an explicit inverse of a matrix. In particular, do not attempt to solve a
system of equations Ax = b by first computing $A^{-1}$ and then forming the matrix-vector product $x = A^{-1}b$.
Call a solver routine instead (see Routines for Solving Systems of Linear Equations); this is more efficient
and more accurate.
However, matrix inversion routines are provided for the rare occasions when an explicit inverse matrix is
needed.
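To illustrate the advice above, here is a textbook Gaussian elimination with partial pivoting that solves A*x = b directly in about $(2/3)n^3$ flops without ever forming $A^{-1}$. It is a plain-C sketch with a hypothetical name, not MKL code; in practice you would call ?gesv, or ?getrf followed by ?getrs.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Solve A*x = b by Gaussian elimination with partial pivoting, never forming
   inv(A). a is n-by-n, column-major, lda >= max(1, n); on exit it holds the
   multipliers (L) below the diagonal and U on and above it, and b holds x.
   Returns 0 on success, or k+1 if column k has no nonzero pivot. */
int solve_gepp(int n, double *a, int lda, double *b) {
#define A(i, j) a[(i) + (size_t)(j) * lda]
    assert(n >= 0 && lda >= (n > 1 ? n : 1));
    for (int k = 0; k < n; ++k) {
        int p = k;                          /* row of the largest |A(i,k)|, i >= k */
        for (int i = k + 1; i < n; ++i)
            if (fabs(A(i, k)) > fabs(A(p, k))) p = i;
        if (A(p, k) == 0.0) return k + 1;   /* U(k,k) would be zero */
        if (p != k) {                       /* swap rows k and p of A and b */
            for (int j = 0; j < n; ++j) { double t = A(k, j); A(k, j) = A(p, j); A(p, j) = t; }
            double t = b[k]; b[k] = b[p]; b[p] = t;
        }
        for (int i = k + 1; i < n; ++i) {   /* eliminate below the pivot */
            double m = A(i, k) / A(k, k);
            A(i, k) = m;                    /* keep the multiplier as L(i,k) */
            for (int j = k + 1; j < n; ++j) A(i, j) -= m * A(k, j);
            b[i] -= m * b[k];
        }
    }
    for (int i = n - 1; i >= 0; --i) {      /* back substitution with U */
        double s = b[i];
        for (int j = i + 1; j < n; ++j) s -= A(i, j) * b[j];
        b[i] = s / A(i, i);
    }
#undef A
    return 0;
}
```

For A = [2 1; 4 3] (column-major {2, 4, 1, 3}) and b = {3, 7}, the routine returns 0 and overwrites b with x = {1, 1}.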

?getri
Computes the inverse of an LU-factored general
matrix.

Syntax
lapack_int LAPACKE_sgetri (int matrix_layout , lapack_int n , float * a , lapack_int
lda , const lapack_int * ipiv );
lapack_int LAPACKE_dgetri (int matrix_layout , lapack_int n , double * a , lapack_int
lda , const lapack_int * ipiv );
lapack_int LAPACKE_cgetri (int matrix_layout , lapack_int n , lapack_complex_float *
a , lapack_int lda , const lapack_int * ipiv );
lapack_int LAPACKE_zgetri (int matrix_layout , lapack_int n , lapack_complex_double *
a , lapack_int lda , const lapack_int * ipiv );

Include Files
• mkl.h

Description

The routine computes the inverse inv(A) of a general matrix A. Before calling this routine, call ?getrf to
factorize A.
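The sketch below shows what the factors and ipiv returned by ?getrf encode: it forms inv(A) column by column by solving A*x = e_k with those factors. In exact arithmetic this gives the same result as ?getri, but it is a different algorithm (?getri overwrites a in place). The function name is hypothetical and the code does not call MKL.

```c
#include <assert.h>
#include <stddef.h>

/* lu: ?getrf-style factors, column-major, unit lower L below the diagonal and
   U on and above it; ipiv: 1-based row interchanges. Writes inv(A) to ainv
   (column-major, leading dimension n). Returns 0, or i (1-based) if U(i,i) == 0. */
int inverse_from_lu(int n, const double *lu, int lda, const int *ipiv, double *ainv) {
    assert(n >= 0 && lda >= (n > 1 ? n : 1));
    for (int i = 0; i < n; ++i)
        if (lu[i + (size_t)i * lda] == 0.0) return i + 1;
    for (int k = 0; k < n; ++k) {
        double *x = ainv + (size_t)k * n;        /* column k of inv(A) */
        for (int i = 0; i < n; ++i) x[i] = (i == k) ? 1.0 : 0.0;
        /* apply the recorded interchanges in order; this computes P^T * e_k */
        for (int i = 0; i < n; ++i) {
            int p = ipiv[i] - 1;                 /* ipiv is 1-based */
            if (p != i) { double t = x[i]; x[i] = x[p]; x[p] = t; }
        }
        /* forward substitution with unit lower-triangular L */
        for (int i = 1; i < n; ++i)
            for (int j = 0; j < i; ++j) x[i] -= lu[i + (size_t)j * lda] * x[j];
        /* back substitution with upper-triangular U */
        for (int i = n - 1; i >= 0; --i) {
            for (int j = i + 1; j < n; ++j) x[i] -= lu[i + (size_t)j * lda] * x[j];
            x[i] /= lu[i + (size_t)i * lda];
        }
    }
    return 0;
}
```

With the factors of A = [2 1; 4 3], namely lu = {4, 0.5, 3, -0.5} and ipiv = {2, 2}, the result is inv(A) = {1.5, -2, -0.5, 1} in column-major order.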

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

n The order of the matrix A; n≥ 0.

a Array a of size max(1, lda*n) contains the factorization of the matrix A, as returned by ?getrf: A = P*L*U.
The second dimension of a must be at least max(1,n).

lda The leading dimension of a; lda≥ max(1, n).

ipiv Array, size at least max(1, n).

The ipiv array, as returned by ?getrf.

Output Parameters

a Overwritten by the n-by-n matrix inv(A).

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, the i-th diagonal element of the factor U is zero, U is singular, and the inversion could not be
completed.

Application Notes
The computed inverse X satisfies the following error bound:

$$|XA - I| \le c(n)\,\varepsilon\,|X|\,P\,|L|\,|U|,$$

where c(n) is a modest linear function of n; ε is the machine precision; I denotes the identity matrix; P, L,
and U are the factors of the matrix factorization A = P*L*U.

The total number of floating-point operations is approximately $(4/3)n^3$ for real flavors and $(16/3)n^3$ for
complex flavors.
See Also
Matrix Storage Schemes

mkl_?getrinp
Computes the inverse of an LU-factored general
matrix without pivoting.

Syntax
lapack_int LAPACKE_mkl_sgetrinp (int matrix_layout , lapack_int n , float * a ,
lapack_int lda );
lapack_int LAPACKE_mkl_dgetrinp (int matrix_layout , lapack_int n , double * a ,
lapack_int lda );
lapack_int LAPACKE_mkl_cgetrinp (int matrix_layout , lapack_int n ,
lapack_complex_float * a , lapack_int lda );
lapack_int LAPACKE_mkl_zgetrinp (int matrix_layout , lapack_int n ,
lapack_complex_double * a , lapack_int lda );

Include Files
• mkl.h

Description

The routine computes the inverse inv(A) of a general matrix A. Before calling this routine, call
mkl_?getrfnp to factorize A.
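Because mkl_?getrfnp does not pivot, every leading pivot must be nonzero (strictly diagonally dominant matrices are a typical safe case). The hypothetical sketch below of an unpivoted LU factorization shows the failure mode: the permutation matrix [0 1; 1 0] is nonsingular, yet its first pivot is zero. It is illustrative C, not MKL's implementation, and it simply stops at the first zero pivot.

```c
#include <assert.h>
#include <stddef.h>

/* In-place LU without pivoting, A = L*U (column-major, lda >= max(1, n)):
   L (unit diagonal, not stored) below the diagonal, U on and above it.
   Returns 0, or k+1 if the pivot U(k,k) is exactly zero. */
int lu_nopivot(int n, double *a, int lda) {
    assert(n >= 0 && lda >= (n > 1 ? n : 1));
    for (int k = 0; k < n; ++k) {
        double piv = a[k + (size_t)k * lda];
        if (piv == 0.0) return k + 1;    /* no row interchange is allowed to fix this */
        for (int i = k + 1; i < n; ++i) {
            double m = a[i + (size_t)k * lda] / piv;
            a[i + (size_t)k * lda] = m;  /* store L(i,k) */
            for (int j = k + 1; j < n; ++j)
                a[i + (size_t)j * lda] -= m * a[k + (size_t)j * lda];
        }
    }
    return 0;
}
```

For A = [4 3; 6 3] (column-major {4, 6, 3, 3}) it returns 0 and leaves {4, 1.5, 3, -1.5}, that is, L(1,0) = 1.5 and U(1,1) = -1.5; for {0, 1, 1, 0} it returns 1.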

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

n The order of the matrix A; n≥ 0.

a Array a of size max(1, lda*n) contains the factorization of the matrix A, as returned by mkl_?getrfnp: A = L*U.
The second dimension of a must be at least max(1,n).

lda The leading dimension of a; lda≥ max(1, n).

Output Parameters

a Overwritten by the n-by-n matrix inv(A).

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, the i-th diagonal element of the factor U is zero, U is singular, and the inversion could not be
completed.

Application Notes
The total number of floating-point operations is approximately $(4/3)n^3$ for real flavors and $(16/3)n^3$ for
complex flavors.

See Also
Matrix Storage Schemes

?potri
Computes the inverse of a symmetric (Hermitian)
positive-definite matrix using the Cholesky
factorization.

Syntax
lapack_int LAPACKE_spotri (int matrix_layout , char uplo , lapack_int n , float * a ,
lapack_int lda );
lapack_int LAPACKE_dpotri (int matrix_layout , char uplo , lapack_int n , double * a ,
lapack_int lda );
lapack_int LAPACKE_cpotri (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * a , lapack_int lda );
lapack_int LAPACKE_zpotri (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * a , lapack_int lda );

Include Files
• mkl.h

Description

The routine computes the inverse inv(A) of a symmetric positive definite or, for complex flavors, Hermitian
positive-definite matrix A. Before calling this routine, call ?potrf to factorize A.
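In reference LAPACK, ?potri works in two steps: it inverts the triangular Cholesky factor and then multiplies the result by its own (conjugate) transpose. The plain-C sketch below mirrors those two steps for the real uplo = 'U' case (A = U^T*U, column-major). The function name is hypothetical and it is not MKL's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* a holds U (A = U^T*U) in its upper triangle, column-major, lda >= max(1, n).
   Overwrites that triangle with the upper triangle of inv(A); entries below
   the diagonal are not referenced. Returns 0, or i (1-based) if U(i,i) == 0. */
int inverse_from_upper_cholesky(int n, double *a, int lda) {
#define A(i, j) a[(i) + (size_t)(j) * lda]
    assert(n >= 0 && lda >= (n > 1 ? n : 1));
    for (int i = 0; i < n; ++i)
        if (A(i, i) == 0.0) return i + 1;
    /* Step 1: overwrite U with W = inv(U), column by column. Above the diagonal,
       column j of W is -W(j,j) * W(0:j-1, 0:j-1) * U(0:j-1, j). */
    for (int j = 0; j < n; ++j) {
        A(j, j) = 1.0 / A(j, j);
        double ajj = -A(j, j);
        for (int i = 0; i < j; ++i) {       /* top-down: A(k,j), k > i, still holds U */
            double s = 0.0;
            for (int k = i; k < j; ++k) s += A(i, k) * A(k, j);
            A(i, j) = s * ajj;
        }
    }
    /* Step 2: upper triangle of inv(A) = W*W^T. Entry (i,j), i <= j, is the sum
       over k >= j of W(i,k)*W(j,k); this order reads each W entry before it is
       overwritten. */
    for (int j = 0; j < n; ++j)
        for (int i = 0; i <= j; ++i) {
            double s = 0.0;
            for (int k = j; k < n; ++k) s += A(i, k) * A(j, k);
            A(i, j) = s;
        }
#undef A
    return 0;
}
```

For A = [4 2; 2 5], whose Cholesky factor is U = [2 1; 0 2], pass a = {2, *, 1, 2} (the * entry is not referenced); the upper triangle becomes 0.3125, -0.125, 0.25, matching inv(A) = (1/16)[5 -2; -2 4].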

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:

If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

n The order of the matrix A; n≥ 0.

a Array a of size max(1, lda*n). Contains the factorization of the matrix A, as returned by ?potrf.

lda The leading dimension of a. lda≥ max(1, n).

Output Parameters

a Overwritten by the upper or lower triangle of the inverse of A.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, the i-th diagonal element of the Cholesky factor (and therefore the factor itself) is zero, and the
inversion could not be completed.

Application Notes
The computed inverse X satisfies the following error bounds:

$$\|XA - I\|_2 \le c(n)\,\varepsilon\,\kappa_2(A), \qquad \|AX - I\|_2 \le c(n)\,\varepsilon\,\kappa_2(A),$$

where c(n) is a modest linear function of n, and ε is the machine precision; I denotes the identity matrix.

The 2-norm $\|A\|_2$ of a matrix A is defined by $\|A\|_2 = \max_{x \cdot x = 1} (Ax \cdot Ax)^{1/2}$, and the condition number
$\kappa_2(A)$ is defined by $\kappa_2(A) = \|A\|_2\,\|A^{-1}\|_2$.
The total number of floating-point operations is approximately $(2/3)n^3$ for real flavors and $(8/3)n^3$ for
complex flavors.
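A small worked example of the bound, using an illustrative diagonal positive-definite matrix chosen for this purpose (for a diagonal positive-definite matrix, the 2-norm is its largest diagonal entry):

```latex
A = \begin{pmatrix} 1 & 0 \\ 0 & 10^{-6} \end{pmatrix}, \quad
\|A\|_2 = 1, \quad \|A^{-1}\|_2 = 10^{6}, \quad \kappa_2(A) = 10^{6},
\qquad
\|XA - I\|_2 \le c(n)\,\varepsilon\,10^{6} \approx c(n) \times 10^{-10}
\quad (\varepsilon \approx 10^{-16} \text{ in double precision}).
```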

See Also
Matrix Storage Schemes

?pftri
Computes the inverse of a symmetric (Hermitian)
positive-definite matrix in RFP format using the
Cholesky factorization.

Syntax
lapack_int LAPACKE_spftri (int matrix_layout , char transr , char uplo , lapack_int n ,
float * a );
lapack_int LAPACKE_dpftri (int matrix_layout , char transr , char uplo , lapack_int n ,
double * a );
lapack_int LAPACKE_cpftri (int matrix_layout , char transr , char uplo , lapack_int n ,
lapack_complex_float * a );
lapack_int LAPACKE_zpftri (int matrix_layout , char transr , char uplo , lapack_int n ,
lapack_complex_double * a );

Include Files
• mkl.h

Description

The routine computes the inverse inv(A) of a symmetric positive definite or, for complex data, Hermitian
positive-definite matrix A using the Cholesky factorization:

$A = U^T*U$ for real data, $A = U^H*U$ for complex data if uplo = 'U'

$A = L*L^T$ for real data, $A = L*L^H$ for complex data if uplo = 'L'

Before calling this routine, call ?pftrf to factorize A.

The matrix A is in the Rectangular Full Packed (RFP) format. For the description of the RFP format, see Matrix
Storage Schemes.
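The RFP array holds exactly n(n+1)/2 elements arranged as a rectangle. The sketch below computes that rectangle's dimensions for transr = 'N', assuming the convention described for reference LAPACK's RFP routines: an (n+1)-by-(n/2) rectangle for even n and an n-by-((n+1)/2) rectangle for odd n (for transr = 'T' or 'C' the two dimensions are swapped). The helper name is hypothetical.

```c
#include <assert.h>

/* Rows and columns of the rectangle that stores an n-by-n triangle in RFP
   format with transr = 'N' (assumed convention, see the lead-in).
   In both cases rows * cols == n*(n+1)/2, the length of the array a. */
void rfp_dims_normal(int n, int *rows, int *cols) {
    assert(n >= 0);
    if (n % 2 == 0) { *rows = n + 1; *cols = n / 2; }
    else            { *rows = n;     *cols = (n + 1) / 2; }
}
```

For n = 4 this gives a 5-by-2 rectangle and for n = 5 a 5-by-3 rectangle, holding 10 and 15 elements respectively.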

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

transr Must be 'N', 'T' (for real data), or 'C' (for complex data).

If transr = 'N', the normal form of RFP U (if uplo = 'U') or L (if uplo = 'L') is stored.

If transr = 'T', the transposed form of RFP U (if uplo = 'U') or L (if uplo = 'L') is stored.

If transr = 'C', the conjugate-transposed form of RFP U (if uplo = 'U') or L (if uplo = 'L') is stored.

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:

If uplo = 'U', $A = U^T*U$ for real data or $A = U^H*U$ for complex data,
and U is stored.
If uplo = 'L', $A = L*L^T$ for real data or $A = L*L^H$ for complex data,
and L is stored.

n The order of the matrix A; n≥ 0.

a Array, size n*(n+1)/2. The array a contains the factor U or L of the matrix A in the RFP format.

Output Parameters

a The symmetric/Hermitian inverse of the original matrix in the same storage format.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, the (i,i) element of the factor U or L is zero, and the inverse could not be computed.

See Also
Matrix Storage Schemes

?pptri
Computes the inverse of a packed symmetric
(Hermitian) positive-definite matrix using Cholesky
factorization.

Syntax
lapack_int LAPACKE_spptri (int matrix_layout , char uplo , lapack_int n , float * ap );
lapack_int LAPACKE_dpptri (int matrix_layout , char uplo , lapack_int n , double *
ap );
lapack_int LAPACKE_cpptri (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * ap );
lapack_int LAPACKE_zpptri (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * ap );

Include Files
• mkl.h

Description

The routine computes the inverse inv(A) of a symmetric positive definite or, for complex flavors, Hermitian
positive-definite matrix A in packed form. Before calling this routine, call ?pptrf to factorize A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular factor is stored in ap:

If uplo = 'U', then the upper triangular factor is stored.

If uplo = 'L', then the lower triangular factor is stored.

n The order of the matrix A; n≥ 0.

ap Array, size at least max(1, n(n+1)/2).

Contains the factorization of the packed matrix A, as returned by ?pptrf.
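Packed storage keeps only the triangle selected by uplo, column by column. This hypothetical helper (plain C, no MKL calls) copies that triangle from a dense column-major matrix into ap in the column-major packed layout used by ?pptrf and ?pptri.

```c
#include <assert.h>
#include <stddef.h>

/* Copy the uplo triangle of a dense column-major n-by-n matrix a (leading
   dimension lda) into column-major packed storage ap of length n*(n+1)/2. */
void pack_triangle(char uplo, int n, const double *a, int lda, double *ap) {
    assert(n >= 0 && (uplo == 'U' || uplo == 'L'));
    size_t k = 0;
    for (int j = 0; j < n; ++j) {
        int first = (uplo == 'U') ? 0 : j;     /* first row kept in column j */
        int last  = (uplo == 'U') ? j : n - 1; /* last row kept in column j */
        for (int i = first; i <= last; ++i)
            ap[k++] = a[i + (size_t)j * lda];
    }
}
```

For the symmetric matrix with rows (1 2 3), (2 4 5), (3 5 6), uplo = 'U' yields ap = {1, 2, 4, 3, 5, 6} and uplo = 'L' yields ap = {1, 2, 3, 4, 5, 6}.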

Output Parameters

ap Overwritten by the packed n-by-n matrix inv(A).

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, the i-th diagonal element of the Cholesky factor (and therefore the factor itself) is zero, and the
inversion could not be completed.

Application Notes
The computed inverse X satisfies the following error bounds:

$$\|XA - I\|_2 \le c(n)\,\varepsilon\,\kappa_2(A), \qquad \|AX - I\|_2 \le c(n)\,\varepsilon\,\kappa_2(A),$$

where c(n) is a modest linear function of n, and ε is the machine precision; I denotes the identity matrix.

The 2-norm $\|A\|_2$ of a matrix A is defined by $\|A\|_2 = \max_{x \cdot x = 1} (Ax \cdot Ax)^{1/2}$, and the condition number
$\kappa_2(A)$ is defined by $\kappa_2(A) = \|A\|_2\,\|A^{-1}\|_2$.
The total number of floating-point operations is approximately $(2/3)n^3$ for real flavors and $(8/3)n^3$ for
complex flavors.

See Also
Matrix Storage Schemes

?sytri
Computes the inverse of a symmetric matrix using
$U*D*U^T$ or $L*D*L^T$ Bunch-Kaufman factorization.

Syntax
lapack_int LAPACKE_ssytri (int matrix_layout , char uplo , lapack_int n , float * a ,
lapack_int lda , const lapack_int * ipiv );
lapack_int LAPACKE_dsytri (int matrix_layout , char uplo , lapack_int n , double * a ,
lapack_int lda , const lapack_int * ipiv );
lapack_int LAPACKE_csytri (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * a , lapack_int lda , const lapack_int * ipiv );
lapack_int LAPACKE_zsytri (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * a , lapack_int lda , const lapack_int * ipiv );

Include Files
• mkl.h

Description

The routine computes the inverse inv(A) of a symmetric matrix A. Before calling this routine, call ?sytrf to
factorize A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:

If uplo = 'U', the array a stores the Bunch-Kaufman factorization $A = U*D*U^T$.
If uplo = 'L', the array a stores the Bunch-Kaufman factorization $A = L*D*L^T$.

n The order of the matrix A; n≥ 0.

a Array a of size max(1, lda*n) contains the factorization of the matrix A, as
returned by ?sytrf.

lda The leading dimension of a; lda≥ max(1, n).

ipiv Array, size at least max(1, n).

The ipiv array, as returned by ?sytrf.
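The ipiv array passed to ?sytri also encodes the block structure of D. Following the ?sytrf conventions (1-based values; ipiv[k] > 0 marks a 1-by-1 block, and a pair of equal negative entries marks a 2-by-2 block, ipiv[k] == ipiv[k-1] for uplo = 'U' and ipiv[k] == ipiv[k+1] for uplo = 'L'), this hypothetical helper counts the blocks of each size.

```c
#include <assert.h>

/* Count the 1-by-1 (n1) and 2-by-2 (n2) diagonal blocks of D encoded in a
   ?sytrf-style ipiv array (0-based C index k corresponds to row k+1). */
void count_blocks(char uplo, int n, const int *ipiv, int *n1, int *n2) {
    assert(n >= 0 && (uplo == 'U' || uplo == 'L'));
    *n1 = 0;
    *n2 = 0;
    if (uplo == 'U') {
        for (int k = n - 1; k >= 0; ) {       /* upper case: scan bottom-up */
            if (ipiv[k] > 0) { ++*n1; k -= 1; }
            else             { ++*n2; k -= 2; }  /* rows k-1 and k form a block */
        }
    } else {
        for (int k = 0; k < n; ) {            /* lower case: scan top-down */
            if (ipiv[k] > 0) { ++*n1; k += 1; }
            else             { ++*n2; k += 2; }  /* rows k and k+1 form a block */
        }
    }
}
```

For example, with uplo = 'L' and ipiv = {1, -3, -3}, it reports one 1-by-1 block and one 2-by-2 block (rows 1 and 2 in 0-based indexing).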

Output Parameters

a Overwritten by the upper or lower triangle (as specified by uplo) of the n-by-n matrix inv(A).

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, the i-th diagonal element of D is zero, D is singular, and the inversion could not be completed.

Application Notes
The computed inverse X satisfies the following error bounds:

$$|D\,U^T P^T X P\,U - I| \le c(n)\,\varepsilon\,(|D|\,|U^T|\,P^T |X|\,P\,|U| + |D|\,|D^{-1}|)$$

for uplo = 'U', and

$$|D\,L^T P^T X P\,L - I| \le c(n)\,\varepsilon\,(|D|\,|L^T|\,P^T |X|\,P\,|L| + |D|\,|D^{-1}|)$$

for uplo = 'L'. Here c(n) is a modest linear function of n, and ε is the machine precision; I denotes the
identity matrix.
The total number of floating-point operations is approximately $(2/3)n^3$ for real flavors and $(8/3)n^3$ for
complex flavors.

See Also
Matrix Storage Schemes

?hetri
Computes the inverse of a complex Hermitian matrix
using $U*D*U^H$ or $L*D*L^H$ Bunch-Kaufman
factorization.

Syntax
lapack_int LAPACKE_chetri (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * a , lapack_int lda , const lapack_int * ipiv );
lapack_int LAPACKE_zhetri (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * a , lapack_int lda , const lapack_int * ipiv );

Include Files
• mkl.h

Description

The routine computes the inverse inv(A) of a complex Hermitian matrix A. Before calling this routine,
call ?hetrf to factorize A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:

If uplo = 'U', the array a stores the Bunch-Kaufman factorization $A = U*D*U^H$.
If uplo = 'L', the array a stores the Bunch-Kaufman factorization $A = L*D*L^H$.

n The order of the matrix A; n≥ 0.

a Array a of size max(1, lda*n) contains the factorization of the matrix A, as returned by ?hetrf.
The second dimension of a must be at least max(1,n).

lda The leading dimension of a; lda≥ max(1, n).

ipiv Array, size at least max(1, n). The ipiv array, as returned by ?hetrf.

Output Parameters

a Overwritten by the upper or lower triangle (as specified by uplo) of the n-by-n matrix inv(A).
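In reference LAPACK, ?hetri forms only the triangle of inv(A) selected by uplo. If the full matrix is needed afterwards, the other triangle can be filled by conjugate symmetry. This sketch assumes uplo = 'U', column-major storage, and C99 double complex elements; whether double complex is the same type as lapack_complex_double depends on how the LAPACKE header is configured, so treat that as an assumption. The helper name is hypothetical.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Fill the strictly lower triangle of a column-major n-by-n Hermitian matrix
   from its upper triangle: a(i,j) = conj(a(j,i)) for i > j. */
void hermitian_fill_lower(int n, double complex *a, int lda) {
    assert(n >= 0 && lda >= (n > 1 ? n : 1));
    for (int j = 0; j < n; ++j)
        for (int i = j + 1; i < n; ++i)
            a[i + (size_t)j * lda] = conj(a[j + (size_t)i * lda]);
}
```

The same loop without conj fills a real symmetric matrix after ?sytri.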

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, the i-th diagonal element of D is zero, D is singular, and the inversion could not be completed.

Application Notes
The computed inverse X satisfies the following error bounds:

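$$|D\,U^H P^T X P\,U - I| \le c(n)\,\varepsilon\,(|D|\,|U^H|\,P^T |X|\,P\,|U| + |D|\,|D^{-1}|)$$

for uplo = 'U', and

$$|D\,L^H P^T X P\,L - I| \le c(n)\,\varepsilon\,(|D|\,|L^H|\,P^T |X|\,P\,|L| + |D|\,|D^{-1}|)$$

for uplo = 'L'. Here c(n) is a modest linear function of n, and ε is the machine precision; I denotes the
identity matrix.

The real counterpart of this routine is ?sytri.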

See Also
Matrix Storage Schemes

?sytri2
Computes the inverse of a symmetric indefinite matrix
by allocating workspace and calling ?sytri2x.

Syntax
lapack_int LAPACKE_ssytri2 (int matrix_layout , char uplo , lapack_int n , float * a ,
lapack_int lda , const lapack_int * ipiv );
lapack_int LAPACKE_dsytri2 (int matrix_layout , char uplo , lapack_int n , double * a ,
lapack_int lda , const lapack_int * ipiv );
lapack_int LAPACKE_csytri2 (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * a , lapack_int lda , const lapack_int * ipiv );
lapack_int LAPACKE_zsytri2 (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * a , lapack_int lda , const lapack_int * ipiv );

Include Files
• mkl.h

Description
The routine computes the inverse inv(A) of a symmetric indefinite matrix A using the factorization
$A = U*D*U^T$ or $A = L*D*L^T$ computed by ?sytrf.
The ?sytri2 routine allocates a temporary workspace buffer and then calls ?sytri2x, which actually computes the
inverse.
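Assuming the workspace shape documented for reference LAPACK's ?sytri2x, an (n + nb + 1)-by-(nb + 3) array, the size of the temporary buffer that ?sytri2 allocates can be estimated as below. The formula and the helper name are assumptions for illustration, not documented MKL behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Number of workspace elements for block size nb, assuming an
   (n + nb + 1)-by-(nb + 3) work array as in reference LAPACK ?sytri2x. */
size_t sytri2x_work_elems(size_t n, size_t nb) {
    assert(nb >= 1);
    return (n + nb + 1) * (nb + 3);
}
```

For n = 1000 and nb = 64 this is 71,355 elements, or 570,840 bytes in double precision.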

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:

If uplo = 'U', the array a stores the factorization $A = U*D*U^T$.

If uplo = 'L', the array a stores the factorization $A = L*D*L^T$.

n The order of the matrix A; n≥ 0.

a Array a(size max(1, lda*n)) contains the block diagonal matrix D and
the multipliers used to obtain the factor U or L as returned by ?sytrf.

lda The leading dimension of a; lda≥ max(1, n).

ipiv Array, size at least max(1, n).

Details of the interchanges and the block structure of D, as returned by ?sytrf.

Output Parameters

a If info = 0, the symmetric inverse of the original matrix.

If uplo = 'U', the upper triangular part of the inverse is formed and
the part of A below the diagonal is not referenced.

If uplo = 'L', the lower triangular part of the inverse is formed and
the part of A above the diagonal is not referenced.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, D(i,i) = 0; D is singular and its inversion could not be computed.

See Also
?sytrf
?sytri2x
Matrix Storage Schemes

?hetri2
Computes the inverse of a Hermitian indefinite matrix
by allocating workspace and calling ?hetri2x.

Syntax
lapack_int LAPACKE_chetri2 (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * a , lapack_int lda , const lapack_int * ipiv );
lapack_int LAPACKE_zhetri2 (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * a , lapack_int lda , const lapack_int * ipiv );

Include Files
• mkl.h

Description
The routine computes the inverse inv(A) of a Hermitian indefinite matrix A using the factorization
$A = U*D*U^H$ or $A = L*D*L^H$ computed by ?hetrf.
The ?hetri2 routine allocates a temporary workspace buffer and then calls ?hetri2x, which actually computes the
inverse.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:

If uplo = 'U', the array a stores the factorization $A = U*D*U^H$.

If uplo = 'L', the array a stores the factorization $A = L*D*L^H$.

n The order of the matrix A; n≥ 0.

a Array a of size max(1, lda*n) contains the block diagonal matrix D and
the multipliers used to obtain the factor U or L as returned by ?hetrf.

lda The leading dimension of a; lda≥ max(1, n).

ipiv Array, size at least max(1, n).

Details of the interchanges and the block structure of D, as returned by ?hetrf.

Output Parameters

a If info = 0, the inverse of the original matrix.

If uplo = 'U', the upper triangular part of the inverse is formed and
the part of A below the diagonal is not referenced.
If uplo = 'L', the lower triangular part of the inverse is formed and
the part of A above the diagonal is not referenced.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, D(i,i) = 0; D is singular and its inversion could not be computed.

See Also
?hetrf
?hetri2x
Matrix Storage Schemes

?sytri2x
Computes the inverse of a symmetric indefinite matrix
after ?sytri2 allocates memory.

Syntax
lapack_int LAPACKE_ssytri2x (int matrix_layout , char uplo , lapack_int n , float * a ,
lapack_int lda , const lapack_int * ipiv , lapack_int nb );
lapack_int LAPACKE_dsytri2x (int matrix_layout , char uplo , lapack_int n , double *
a , lapack_int lda , const lapack_int * ipiv , lapack_int nb );
lapack_int LAPACKE_csytri2x (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * a , lapack_int lda , const lapack_int * ipiv , lapack_int nb );
lapack_int LAPACKE_zsytri2x (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * a , lapack_int lda , const lapack_int * ipiv , lapack_int nb );

Include Files
• mkl.h

Description
The routine computes the inverse inv(A) of a symmetric indefinite matrix A using the factorization
$A = U*D*U^T$ or $A = L*D*L^T$ computed by ?sytrf.
?sytri2x performs the actual computation of the inverse; ?sytri2 allocates the workspace and then
calls ?sytri2x.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:

If uplo = 'U', the array a stores the factorization $A = U*D*U^T$.

If uplo = 'L', the array a stores the factorization $A = L*D*L^T$.

n The order of the matrix A; n≥ 0.

a Array a of size max(1, lda*n) contains the block diagonal matrix D and the multipliers used to obtain
the factor U or L as returned by ?sytrf. The second dimension of a must be at least max(1,n).

lda The leading dimension of a; lda≥ max(1, n).

ipiv Array, size at least max(1, n).

Details of the interchanges and the block structure of D, as returned by ?sytrf.

nb Block size.

Output Parameters

a If info = 0, the symmetric inverse of the original matrix.

If uplo = 'U', the upper triangular part of the inverse is formed and
the part of A below the diagonal is not referenced.
If uplo = 'L', the lower triangular part of the inverse is formed and
the part of A above the diagonal is not referenced.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, D(i,i) = 0; D is singular and its inversion could not be computed.

See Also
?sytrf
?sytri2
Matrix Storage Schemes

?hetri2x
Computes the inverse of a Hermitian indefinite matrix
after ?hetri2 allocates memory.

Syntax
lapack_int LAPACKE_chetri2x (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * a , lapack_int lda , const lapack_int * ipiv , lapack_int nb );
lapack_int LAPACKE_zhetri2x (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * a , lapack_int lda , const lapack_int * ipiv , lapack_int nb );

Include Files
• mkl.h

Description
The routine computes the inverse inv(A) of a Hermitian indefinite matrix A using the factorization
$A = U*D*U^H$ or $A = L*D*L^H$ computed by ?hetrf.
?hetri2x performs the actual computation of the inverse; ?hetri2 allocates the workspace and then
calls ?hetri2x.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:

If uplo = 'U', the array a stores the factorization $A = U*D*U^H$.

If uplo = 'L', the array a stores the factorization $A = L*D*L^H$.

n The order of the matrix A; n≥ 0.

a Array a of size max(1, lda*n) contains the block diagonal matrix D and the multipliers used to obtain
the factor U or L as returned by ?hetrf.

lda The leading dimension of a; lda≥ max(1, n).

ipiv Array, size at least max(1, n).

Details of the interchanges and the block structure of D, as returned by ?hetrf.

nb Block size.

Output Parameters

a If info = 0, the Hermitian inverse of the original matrix.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, D(i,i) = 0; D is singular and its inversion could not be computed.

See Also
?hetrf
?hetri2
Matrix Storage Schemes

?sytri_3
Computes the inverse of a real or complex symmetric
matrix.

Syntax
lapack_int LAPACKE_ssytri_3 (int matrix_layout, char uplo, lapack_int n, float * A,
lapack_int lda, const float * e, const lapack_int * ipiv);
lapack_int LAPACKE_dsytri_3 (int matrix_layout, char uplo, lapack_int n, double * A,
lapack_int lda, const double * e, const lapack_int * ipiv);
lapack_int LAPACKE_csytri_3 (int matrix_layout, char uplo, lapack_int n,
lapack_complex_float * A, lapack_int lda, const lapack_complex_float * e, const
lapack_int * ipiv);
lapack_int LAPACKE_zsytri_3 (int matrix_layout, char uplo, lapack_int n,
lapack_complex_double * A, lapack_int lda, const lapack_complex_double * e, const
lapack_int * ipiv);

Include Files
• mkl.h

Description
?sytri_3 computes the inverse of a real or complex symmetric matrix A using the factorization computed
by ?sytrf_rk: $A = P*U*D*U^T*P^T$ or $A = P*L*D*L^T*P^T$, where U (or L) is a unit upper (or lower)
triangular matrix, $U^T$ (or $L^T$) is the transpose of U (or L), P is a permutation matrix, $P^T$ is the transpose of P,
and D is symmetric and block diagonal with 1-by-1 and 2-by-2 diagonal blocks.
?sytri_3 sets the leading dimension of the workspace before calling ?sytri_3x, which actually computes
the inverse. This is the blocked version of the algorithm, calling Level-3 BLAS.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

uplo Specifies whether the details of the factorization are stored as an upper or
lower triangular matrix.

• = 'U': The upper triangle of A is stored.

• = 'L': The lower triangle of A is stored.

n The order of the matrix A. n ≥ 0.

A Array of size max(1, lda*n). On entry, the diagonal of the block diagonal
matrix D and the factor U or L as computed by ?sytrf_rk:

• Only the diagonal elements of the symmetric block diagonal matrix D are stored on the
diagonal of A; that is, D(k,k) = A(k,k). The superdiagonal (or subdiagonal)
elements of D should be provided on entry in array e.

and

• If uplo = 'U', factor U in the superdiagonal part of A. If uplo = 'L',
factor L in the subdiagonal part of A.

lda The leading dimension of the array A.

e Array of size n. On entry, contains the superdiagonal (or subdiagonal)
elements of the symmetric block diagonal matrix D with 1-by-1 or 2-by-2
diagonal blocks. If uplo = 'U', e[i-1] = D(i-1,i) for i = 2, ..., n, and e[0] is not
referenced. If uplo = 'L', e[i-1] = D(i+1,i) for i = 1, ..., n-1, and e[n-1] is not
referenced.

NOTE For a 1-by-1 diagonal block D(k), where 1 ≤ k ≤ n, the
element e[k-1] is not referenced in either the uplo = 'U' or the
uplo = 'L' case.

ipiv Array of size n. Details of the interchanges and the block structure of D as
determined by ?sytrf_rk.

Output Parameters

A On exit, if info = 0, the symmetric inverse of the original matrix. If uplo =
'U', the upper triangular part of the inverse is formed and the part of A
below the diagonal is not referenced. If uplo = 'L', the lower triangular
part of the inverse is formed and the part of A above the diagonal is not
referenced.

Return Values
This function returns a value info.

= 0: Successful exit.
< 0: If info = -i, the i-th argument had an illegal value.

> 0: If info = i, D(i,i) = 0; the matrix is singular and its inverse could not be computed.

?hetri_3
Computes the inverse of a complex Hermitian matrix
using the factorization computed by ?hetrf_rk.

Syntax
lapack_int LAPACKE_chetri_3 (int matrix_layout, char uplo, lapack_int n,
lapack_complex_float * A, lapack_int lda, const lapack_complex_float * e, const
lapack_int * ipiv);
lapack_int LAPACKE_zhetri_3 (int matrix_layout, char uplo, lapack_int n,
lapack_complex_double * A, lapack_int lda, const lapack_complex_double * e, const
lapack_int * ipiv);

Description
?hetri_3 computes the inverse of a complex Hermitian matrix A using the factorization computed
by ?hetrf_rk: A = P*U*D*(U^H)*(P^T) or A = P*L*D*(L^H)*(P^T), where U (or L) is a unit upper (or lower)
triangular matrix, U^H (or L^H) is the conjugate transpose of U (or L), P is a permutation matrix, P^T is the transpose of P,
and D is Hermitian and block diagonal with 1-by-1 and 2-by-2 diagonal blocks.
?hetri_3 sets the leading dimension of the workspace before calling ?hetri_3x, which actually computes
the inverse. This is the blocked version of the algorithm, calling Level-3 BLAS.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

uplo Specifies whether the details of the factorization are stored as an upper or
lower triangular matrix.

• = 'U': The upper triangle of A is stored.

• = 'L': The lower triangle of A is stored.

n The order of the matrix A. n ≥ 0.

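A Array of size max(1, lda*n). On entry, the diagonal of the block diagonal
matrix D and the factor U or L as computed by ?hetrf_rk:

• Only the diagonal elements of the Hermitian block diagonal matrix D are stored on the
diagonal of A; that is, D(k,k) = A(k,k). The superdiagonal (or subdiagonal)
elements of D should be provided on entry in array e.

and

• If uplo = 'U', factor U in the superdiagonal part of A. If uplo = 'L',
factor L in the subdiagonal part of A.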

lda The leading dimension of the array A.

e Array of size n. On entry, contains the superdiagonal (or subdiagonal)
elements of the Hermitian block diagonal matrix D with 1-by-1 or 2-by-2
diagonal blocks. If uplo = 'U', e[i-1] = D(i-1,i) for i = 2, ..., n, and e[0] is not
referenced. If uplo = 'L', e[i-1] = D(i+1,i) for i = 1, ..., n-1, and e[n-1] is not
referenced.

NOTE For a 1-by-1 diagonal block D(k), where 1 ≤ k ≤ n, the
element e[k-1] is not referenced in either the uplo = 'U' or the
uplo = 'L' case.

ipiv Array of size n. Details of the interchanges and the block structure of D as
determined by ?hetrf_rk.

Output Parameters

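A On exit, if info = 0, the Hermitian inverse of the original matrix. If uplo =
'U', the upper triangular part of the inverse is formed and the part of A
below the diagonal is not referenced. If uplo = 'L', the lower triangular
part of the inverse is formed and the part of A above the diagonal is not
referenced.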

Return Values
This function returns a value info.

= 0: Successful exit.
< 0: If info = -i, the i-th argument had an illegal value.

> 0: If info = i, D(i,i) = 0; the matrix is singular and its inverse could not be computed.

?sptri
Computes the inverse of a symmetric matrix using
U*D*U^T or L*D*L^T Bunch-Kaufman factorization of a
matrix in packed storage.

Syntax
lapack_int LAPACKE_ssptri (int matrix_layout , char uplo , lapack_int n , float * ap ,
const lapack_int * ipiv );
lapack_int LAPACKE_dsptri (int matrix_layout , char uplo , lapack_int n , double * ap ,
const lapack_int * ipiv );
lapack_int LAPACKE_csptri (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * ap , const lapack_int * ipiv );
lapack_int LAPACKE_zsptri (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * ap , const lapack_int * ipiv );

Include Files
• mkl.h

Description

The routine computes the inverse inv(A) of a packed symmetric matrix A. Before calling this routine,
call ?sptrf to factorize A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:

If uplo = 'U', the array ap stores the Bunch-Kaufman factorization A = U*D*U^T.
If uplo = 'L', the array ap stores the Bunch-Kaufman factorization A = L*D*L^T.

n The order of the matrix A; n≥ 0.

ap Array, size max(1, n(n+1)/2). Contains the factorization of the
matrix A, as returned by ?sptrf.

ipiv Array, size at least max(1, n). The ipiv array, as returned by ?sptrf.

Output Parameters

ap Overwritten by the matrix inv(A) in packed form.
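To read a single entry of the packed inverse, or to fill ap before calling ?sptrf, you need the packed index mapping. The sketch below assumes column major layout (LAPACK_COL_MAJOR) and 0-based indices. The helper names are hypothetical and not part of MKL.

It encodes the standard LAPACK packing:

• For uplo = 'U', A(i,j) with i ≤ j is stored in ap[i + j*(j+1)/2].
• For uplo = 'L', A(i,j) with i ≥ j is stored in ap[i + j*(2n-j-1)/2].

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers (not MKL APIs). They map the 0-based entry (i, j) of an
   n-by-n matrix to its offset in a packed array, assuming column major
   (LAPACK_COL_MAJOR) packed storage. */

/* Number of stored elements in a packed n-by-n triangle: n(n+1)/2. */
size_t packed_size(size_t n)
{
    return n * (n + 1) / 2;
}

/* uplo = 'U': the upper triangle is stored column by column; requires i <= j. */
size_t packed_upper_index(size_t i, size_t j)
{
    assert(i <= j);
    return i + j * (j + 1) / 2;
}

/* uplo = 'L': the lower triangle is stored column by column; requires j <= i < n. */
size_t packed_lower_index(size_t n, size_t i, size_t j)
{
    assert(j <= i && i < n);
    return i + j * (2 * n - j - 1) / 2;
}
```

For example, suppose LAPACKE_dsptri(LAPACK_COL_MAJOR, 'U', n, ap, ipiv) returns 0. Then for i ≤ j, the (i, j) entry of inv(A) is ap[packed_upper_index(i, j)], and by symmetry it is also the (j, i) entry.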

Return Values
This function returns a value info.

If info = 0, the execution is successful.


If info = -i, parameter i had an illegal value.

If info = i, the i-th diagonal element of D is zero, D is singular, and the inversion could not be completed.

Application Notes
The computed inverse X satisfies the following error bounds:

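$$|D U^T P^T X P U - I| \le c(n)\,\varepsilon\,(|D|\,|U^T|\,P^T|X|P\,|U| + |D|\,|D^{-1}|)$$

for uplo = 'U', and

$$|D L^T P^T X P L - I| \le c(n)\,\varepsilon\,(|D|\,|L^T|\,P^T|X|P\,|L| + |D|\,|D^{-1}|)$$

for uplo = 'L'. Here c(n) is a modest linear function of n, and ε is the machine precision; I denotes the
identity matrix.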

See Also
Matrix Storage Schemes

?hptri
Computes the inverse of a complex Hermitian matrix
using U*D*U^H or L*D*L^H Bunch-Kaufman factorization
of a matrix in packed storage.

Syntax
lapack_int LAPACKE_chptri (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * ap , const lapack_int * ipiv );
lapack_int LAPACKE_zhptri (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * ap , const lapack_int * ipiv );

Include Files
• mkl.h

Description

The routine computes the inverse inv(A) of a complex Hermitian matrix A using packed storage. Before
calling this routine, call ?hptrf to factorize A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:

If uplo = 'U', the array ap stores the packed Bunch-Kaufman
factorization A = U*D*U^H.

If uplo = 'L', the array ap stores the packed Bunch-Kaufman
factorization A = L*D*L^H.

n The order of the matrix A; n≥ 0.

ap Array, size max(1, n(n+1)/2). Contains the factorization of the
matrix A, as returned by ?hptrf.

ipiv Array, size at least max(1, n).

The ipiv array, as returned by ?hptrf.

Output Parameters

ap Overwritten by the matrix inv(A).

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, the i-th diagonal element of D is zero, D is singular, and the inversion could not be completed.

Application Notes
The computed inverse X satisfies the following error bounds:

$$|D U^H P^T X P U - I| \le c(n)\,\varepsilon\,(|D|\,|U^H|\,P^T|X|P\,|U| + |D|\,|D^{-1}|)$$

for uplo = 'U', and

$$|D L^H P^T X P L - I| \le c(n)\,\varepsilon\,(|D|\,|L^H|\,P^T|X|P\,|L| + |D|\,|D^{-1}|)$$

for uplo = 'L'. Here c(n) is a modest linear function of n, and ε is the machine precision; I denotes the
identity matrix.
The total number of floating-point operations is approximately (8/3)n^3.

The real counterpart of this routine is ?sptri.

See Also
Matrix Storage Schemes

?trtri
Computes the inverse of a triangular matrix.

Syntax
lapack_int LAPACKE_strtri (int matrix_layout , char uplo , char diag , lapack_int n ,
float * a , lapack_int lda );
lapack_int LAPACKE_dtrtri (int matrix_layout , char uplo , char diag , lapack_int n ,
double * a , lapack_int lda );
lapack_int LAPACKE_ctrtri (int matrix_layout , char uplo , char diag , lapack_int n ,
lapack_complex_float * a , lapack_int lda );
lapack_int LAPACKE_ztrtri (int matrix_layout , char uplo , char diag , lapack_int n ,
lapack_complex_double * a , lapack_int lda );

Include Files
• mkl.h


Description

The routine computes the inverse inv(A) of a triangular matrix A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether A is upper or lower triangular:

If uplo = 'U', then A is upper triangular.

If uplo = 'L', then A is lower triangular.

diag Must be 'N' or 'U'.

If diag = 'N', then A is not a unit triangular matrix.

If diag = 'U', A is unit triangular: diagonal elements of A are
assumed to be 1 and not referenced in the array a.

n The order of the matrix A; n≥ 0.

a Array, size max(1, lda*n). Contains the matrix A.

lda The leading dimension of a; lda≥ max(1, n).

Output Parameters

a Overwritten by the matrix inv(A).

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, the i-th diagonal element of A is zero, A is singular, and the inversion could not be completed.
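The in-place overwrite and the info convention can be illustrated with a small unblocked sketch. It covers the real case with an upper triangular (uplo = 'U'), non-unit (diag = 'N') matrix in column major layout. It follows the column-by-column scheme of the reference LAPACK unblocked routine ?trti2, not MKL's optimized blocked code. The function name is hypothetical. It checks every diagonal element before modifying a, so a singular input is returned unchanged.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an MKL API): overwrite the n-by-n upper triangular,
   non-unit matrix A (column major, leading dimension lda) with inv(A).
   Returns 0 on success, or i > 0 (1-based) if A(i,i) is exactly zero; in that
   case a is left unchanged. */
int upper_triangular_inverse(int n, double *a, int lda)
{
    assert(n >= 0 && lda >= (n > 1 ? n : 1));
#define A(i, j) a[(size_t)(i) + (size_t)(j) * (size_t)lda]
    for (int i = 0; i < n; ++i)
        if (A(i, i) == 0.0)
            return i + 1;                 /* singular: report info = i + 1 */

    for (int j = 0; j < n; ++j) {
        A(j, j) = 1.0 / A(j, j);
        const double ajj = -A(j, j);      /* -1 / (original A(j,j)) */
        /* Column j above the diagonal becomes
           -inv(A(0:j-1,0:j-1)) * A(0:j-1,j) / (original A(j,j)); the leading
           block already holds its own inverse. Updating k in increasing order
           is safe in place: entry k only reads entries k..j-1 of the column,
           which have not been overwritten yet. */
        for (int k = 0; k < j; ++k) {
            double sum = 0.0;
            for (int l = k; l < j; ++l)
                sum += A(k, l) * A(l, j);
            A(k, j) = sum * ajj;
        }
    }
#undef A
    return 0;
}
```

For A = [2 1; 0 4] stored as {2, 0, 1, 4}, the array becomes {0.5, 0, -0.125, 0.25}, which is inv(A) in the same layout.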

Application Notes
The computed inverse X satisfies the following error bounds:

$$|XA - I| \le c(n)\,\varepsilon\,|X|\,|A|$$

$$|X - A^{-1}| \le c(n)\,\varepsilon\,|A^{-1}|\,|A|\,|X|,$$

where c(n) is a modest linear function of n; ε is the machine precision; I denotes the identity matrix.

The total number of floating-point operations is approximately (1/3)n^3 for real flavors and (4/3)n^3 for
complex flavors.

See Also
Matrix Storage Schemes

?tftri
Computes the inverse of a triangular matrix stored in
the Rectangular Full Packed (RFP) format.

Syntax
lapack_int LAPACKE_stftri (int matrix_layout , char transr , char uplo , char diag ,
lapack_int n , float * a );
lapack_int LAPACKE_dtftri (int matrix_layout , char transr , char uplo , char diag ,
lapack_int n , double * a );
lapack_int LAPACKE_ctftri (int matrix_layout , char transr , char uplo , char diag ,
lapack_int n , lapack_complex_float * a );
lapack_int LAPACKE_ztftri (int matrix_layout , char transr , char uplo , char diag ,
lapack_int n , lapack_complex_double * a );

Include Files
• mkl.h

Description

Computes the inverse of a triangular matrix A stored in the Rectangular Full Packed (RFP) format. For the
description of the RFP format, see Matrix Storage Schemes.
This is the block version of the algorithm, calling Level 3 BLAS.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

transr Must be 'N', 'T' (for real data) or 'C' (for complex data).

If transr = 'N', the RFP representation of A is stored in normal (non-transposed) form.

If transr = 'T', the transpose of the RFP representation of A is stored.

If transr = 'C', the conjugate transpose of the RFP representation of A is stored.

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of RFP A is
stored:
If uplo = 'U', the array a stores the upper triangular part of the
matrix A.
If uplo = 'L', the array a stores the lower triangular part of the
matrix A.

diag Must be 'N' or 'U'.

If diag = 'N', then A is not a unit triangular matrix.

If diag = 'U', A is unit triangular: diagonal elements of A are
assumed to be 1 and not referenced in the array a.

n The order of the matrix A; n≥ 0.


a Array, size max(1, n*(n + 1)/2). The array a contains the matrix A in
the RFP format.

Output Parameters

a The (triangular) inverse of the original matrix in the same storage
format.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, A(i,i) is exactly zero. The triangular matrix is singular and its inverse cannot be computed.

See Also
Matrix Storage Schemes

?tptri
Computes the inverse of a triangular matrix using
packed storage.

Syntax
lapack_int LAPACKE_stptri (int matrix_layout , char uplo , char diag , lapack_int n ,
float * ap );
lapack_int LAPACKE_dtptri (int matrix_layout , char uplo , char diag , lapack_int n ,
double * ap );
lapack_int LAPACKE_ctptri (int matrix_layout , char uplo , char diag , lapack_int n ,
lapack_complex_float * ap );
lapack_int LAPACKE_ztptri (int matrix_layout , char uplo , char diag , lapack_int n ,
lapack_complex_double * ap );

Include Files
• mkl.h

Description

The routine computes the inverse inv(A) of a packed triangular matrix A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether A is upper or lower triangular:

If uplo = 'U', then A is upper triangular.

If uplo = 'L', then A is lower triangular.

diag Must be 'N' or 'U'.

If diag = 'N', then A is not a unit triangular matrix.

If diag = 'U', A is unit triangular: diagonal elements of A are
assumed to be 1 and not referenced in the array ap.

n The order of the matrix A; n≥ 0.

ap Array, size at least max(1,n(n+1)/2).

Contains the packed triangular matrix A.

Output Parameters

ap Overwritten by the packed n-by-n matrix inv(A).

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, the i-th diagonal element of A is zero, A is singular, and the inversion could not be completed.

Application Notes
The computed inverse X satisfies the following error bounds:

$$|XA - I| \le c(n)\,\varepsilon\,|X|\,|A|$$

$$|X - A^{-1}| \le c(n)\,\varepsilon\,|A^{-1}|\,|A|\,|X|,$$

where c(n) is a modest linear function of n; ε is the machine precision; I denotes the identity matrix.

The total number of floating-point operations is approximately (1/3)n^3 for real flavors and (4/3)n^3 for
complex flavors.

See Also
Matrix Storage Schemes

Matrix Equilibration: LAPACK Computational Routines

Routines described in this section are used to compute scaling factors needed to equilibrate a matrix. Note
that these routines do not actually scale the matrices.

?geequ
Computes row and column scaling factors intended to
equilibrate a general matrix and reduce its condition
number.

Syntax
lapack_int LAPACKE_sgeequ( int matrix_layout, lapack_int m, lapack_int n, const float*
a, lapack_int lda, float* r, float* c, float* rowcnd, float* colcnd, float* amax );
lapack_int LAPACKE_dgeequ( int matrix_layout, lapack_int m, lapack_int n, const double*
a, lapack_int lda, double* r, double* c, double* rowcnd, double* colcnd, double* amax );


lapack_int LAPACKE_cgeequ( int matrix_layout, lapack_int m, lapack_int n, const
lapack_complex_float* a, lapack_int lda, float* r, float* c, float* rowcnd, float*
colcnd, float* amax );
lapack_int LAPACKE_zgeequ( int matrix_layout, lapack_int m, lapack_int n, const
lapack_complex_double* a, lapack_int lda, double* r, double* c, double* rowcnd, double*
colcnd, double* amax );

Include Files
• mkl.h

Description

The routine computes row and column scalings intended to equilibrate an m-by-n matrix A and reduce its
condition number. The output array r returns the row scale factors and the array c the column scale factors.
These factors are chosen to try to make the largest element in each row and column of the matrix B with
elements b(i,j) = r[i-1]*a(i,j)*c[j-1] have absolute value 1.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

m The number of rows of the matrix A; m≥ 0.

n The number of columns of the matrix A; n≥ 0.

a Array: size max(1, lda*n) for column major layout and max(1,
lda*m) for row major layout.
Contains the m-by-n matrix A whose equilibration factors are to be
computed.

lda The leading dimension of a; lda≥ max(1, m) for column major layout and lda≥ max(1, n) for row major layout.

Output Parameters

r, c Arrays: r (size m), c (size n).

If info = 0, or info>m, the array r contains the row scale factors of
the matrix A.
If info = 0, the array c contains the column scale factors of the
matrix A.

rowcnd If info = 0 or info>m, rowcnd contains the ratio of the smallest
r[i] to the largest r[i].

colcnd If info = 0, colcnd contains the ratio of the smallest c[i] to the
largest c[i].

amax Absolute value of the largest element of the matrix A.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, i > 0, and

i≤m, the i-th row of A is exactly zero;

i>m, the (i-m)th column of A is exactly zero.
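The plain C sketch below reproduces these scale factors for a real matrix in column major layout. It follows the steps of the reference LAPACK ?geequ:

• compute the row maxima;
• take their clamped reciprocals;
• compute the column maxima of the row-scaled matrix.

It is not MKL's implementation, and the function name is hypothetical. DBL_MIN stands in for SMLNUM, assuming IEEE-754 double precision (where dlamch('S') should return that value).

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

static double abs_val(double x) { return x < 0.0 ? -x : x; }
static double clamp_range(double x, double lo, double hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}

/* Hypothetical ?geequ-style sketch for a real m-by-n column-major matrix.
   The return value follows the info convention above: 0, i for a zero row i,
   or m + j for a zero column j (1-based). */
int geequ_sketch(int m, int n, const double *a, int lda, double *r, double *c,
                 double *rowcnd, double *colcnd, double *amax)
{
    const double smlnum = DBL_MIN;       /* assumed value of dlamch('S') */
    const double bignum = 1.0 / smlnum;  /* BIGNUM = 1 / SMLNUM */
    assert(m >= 0 && n >= 0 && lda >= (m > 1 ? m : 1));

    *rowcnd = 1.0;
    *colcnd = 1.0;
    *amax = 0.0;
    if (m == 0 || n == 0)
        return 0;

    /* Step 1: r[i] = max_j |a(i,j)|. */
    for (int i = 0; i < m; ++i)
        r[i] = 0.0;
    for (int j = 0; j < n; ++j) {
        const double *col = a + (size_t)j * (size_t)lda;
        for (int i = 0; i < m; ++i)
            if (abs_val(col[i]) > r[i])
                r[i] = abs_val(col[i]);
    }
    double rcmin = bignum, rcmax = 0.0;
    for (int i = 0; i < m; ++i) {
        if (r[i] < rcmin) rcmin = r[i];
        if (r[i] > rcmax) rcmax = r[i];
    }
    *amax = rcmax;
    for (int i = 0; i < m; ++i)
        if (r[i] == 0.0)
            return i + 1;                /* row i+1 is exactly zero */

    /* Step 2: row scale factors are the reciprocals of the clamped maxima. */
    for (int i = 0; i < m; ++i)
        r[i] = 1.0 / clamp_range(r[i], smlnum, bignum);
    *rowcnd = (rcmin > smlnum ? rcmin : smlnum) / (rcmax < bignum ? rcmax : bignum);

    /* Step 3: c[j] = max_i |a(i,j)| * r[i], the column maxima after row scaling. */
    rcmin = bignum;
    rcmax = 0.0;
    for (int j = 0; j < n; ++j) {
        const double *col = a + (size_t)j * (size_t)lda;
        c[j] = 0.0;
        for (int i = 0; i < m; ++i)
            if (abs_val(col[i]) * r[i] > c[j])
                c[j] = abs_val(col[i]) * r[i];
        if (c[j] < rcmin) rcmin = c[j];
        if (c[j] > rcmax) rcmax = c[j];
    }
    for (int j = 0; j < n; ++j)
        if (c[j] == 0.0)
            return m + j + 1;            /* column j+1 is exactly zero */

    /* Step 4: column scale factors. */
    for (int j = 0; j < n; ++j)
        c[j] = 1.0 / clamp_range(c[j], smlnum, bignum);
    *colcnd = (rcmin > smlnum ? rcmin : smlnum) / (rcmax < bignum ? rcmax : bignum);
    return 0;
}
```

For A = [1 2; 3 4], the sketch returns r = {0.5, 0.25}, c ≈ {1.333, 1}, rowcnd = 0.5, colcnd = 0.75 and amax = 4. The largest entry in every row and column of the scaled matrix then has magnitude 1.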

Application Notes
All the components of r and c are restricted to be between SMLNUM = smallest safe number and BIGNUM =
largest safe number. Use of these scaling factors is not guaranteed to reduce the condition number of A but
works well in practice.
SMLNUM and BIGNUM are machine-dependent parameters that bound the safe floating-point range. You can use the ?lamch routines to
compute them. For example, compute single precision values of SMLNUM and BIGNUM as follows:

SMLNUM = slamch ('s')

BIGNUM = 1 / SMLNUM
If rowcnd≥ 0.1 and amax is neither too large nor too small, it is not worth scaling by r.

If colcnd≥ 0.1, it is not worth scaling by c.

If amax is very close to SMLNUM or very close to BIGNUM, the matrix A should be scaled.
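In C, the single precision SMLNUM and BIGNUM can be taken from <float.h> on IEEE-754 systems, where slamch('s') should coincide with FLT_MIN. In an MKL program you could also call LAPACKE_slamch('S').

The hypothetical helper below turns the three rules of thumb into a single decision. It assumes the thresholds used by the reference LAPACK routine ?laqge:

• 0.1 for the ratios rowcnd and colcnd;
• SMLNUM/eps as the "too small" limit for amax, and its reciprocal as the "too large" limit.

```c
#include <assert.h>
#include <float.h>

/* Assumed single precision values for IEEE-754 arithmetic, where
   slamch('S') should equal FLT_MIN. */
float smlnum_single(void) { return FLT_MIN; }
float bignum_single(void) { return 1.0f / FLT_MIN; }

/* Hypothetical helper: apply the rules of thumb above with the thresholds of
   reference LAPACK ?laqge. Returns 'N' (no scaling), 'R' (rows only),
   'C' (columns only) or 'B' (both). */
char choose_equilibration(float rowcnd, float colcnd, float amax)
{
    const float thresh = 0.1f;
    const float too_small = FLT_MIN / FLT_EPSILON;  /* slamch('S') / slamch('P') */
    const float too_large = 1.0f / too_small;
    assert(rowcnd >= 0.0f && colcnd >= 0.0f && amax >= 0.0f);

    int scale_rows = !(rowcnd >= thresh && amax >= too_small && amax <= too_large);
    int scale_cols = colcnd < thresh;
    if (scale_rows)
        return scale_cols ? 'B' : 'R';
    return scale_cols ? 'C' : 'N';
}
```

The values rowcnd, colcnd and amax that ?geequ returns can be passed straight to this predicate.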

See Also
Error Analysis
Matrix Storage Schemes

?geequb
Computes row and column scaling factors restricted to
a power of radix to equilibrate a general matrix and
reduce its condition number.

Syntax
lapack_int LAPACKE_sgeequb( int matrix_layout, lapack_int m, lapack_int n, const float*
a, lapack_int lda, float* r, float* c, float* rowcnd, float* colcnd, float* amax );
lapack_int LAPACKE_dgeequb( int matrix_layout, lapack_int m, lapack_int n, const
double* a, lapack_int lda, double* r, double* c, double* rowcnd, double* colcnd, double*
amax );
lapack_int LAPACKE_cgeequb( int matrix_layout, lapack_int m, lapack_int n, const
lapack_complex_float* a, lapack_int lda, float* r, float* c, float* rowcnd, float*
colcnd, float* amax );
lapack_int LAPACKE_zgeequb( int matrix_layout, lapack_int m, lapack_int n, const
lapack_complex_double* a, lapack_int lda, double* r, double* c, double* rowcnd, double*
colcnd, double* amax );

Include Files
• mkl.h

Description


The routine computes row and column scalings intended to equilibrate an m-by-n general matrix A and
reduce its condition number. The output array r returns the row scale factors and the array c the column
scale factors. These factors are chosen to try to make the largest element in each row and column of the
matrix B with elements b(i,j) = r[i-1]*a(i,j)*c[j-1] have an absolute value of at most the radix.

r[i-1] and c[j-1] are restricted to be a power of the radix between SMLNUM = smallest safe number and
BIGNUM = largest safe number. Use of these scaling factors is not guaranteed to reduce the condition number
of A but works well in practice.
SMLNUM and BIGNUM are machine-dependent parameters that bound the safe floating-point range. You can use the ?lamch routines to
compute them. For example, compute single precision values of SMLNUM and BIGNUM as follows:

SMLNUM = slamch ('s')

BIGNUM = 1 / SMLNUM
This routine differs from ?geequ by restricting the scaling factors to a power of the radix. Except for overflow
and underflow, scaling by these factors introduces no additional rounding errors. However, the scaled entries'
magnitudes are no longer equal to approximately 1 but lie between sqrt(radix) and 1/sqrt(radix).
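Multiplying by a power of the radix changes only the exponent of a floating-point number, which is why these factors add no rounding error. The hypothetical helper below (not MKL code) finds the largest power of FLT_RADIX that does not exceed a positive row or column maximum, using only exact multiplications. The library may round the exponent differently in edge cases, so treat this only as an illustration of the idea.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical helper: largest power of FLT_RADIX that does not exceed x > 0.
   Only multiplications and divisions by the radix are used, so every step is
   exact (barring overflow or underflow). */
double radix_floor_power(double x)
{
    assert(x > 0.0 && x <= DBL_MAX);
    double p = 1.0;
    while (p * FLT_RADIX <= x)
        p *= FLT_RADIX;
    while (p > x)
        p /= FLT_RADIX;
    return p;
}
```

A row whose largest magnitude is 0.3 would then be scaled by 1/radix_floor_power(0.3) = 4 on a binary machine. Multiplying any entry by 4 and dividing by 4 afterwards restores it bit for bit.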

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

m The number of rows of the matrix A; m≥ 0.

n The number of columns of the matrix A; n≥ 0.

a Array: size max(1, lda*n) for column major layout and max(1,
lda*m) for row major layout.
Contains the m-by-n matrix A whose equilibration factors are to be
computed.

lda The leading dimension of a; lda≥ max(1, m) for column major layout and lda≥ max(1, n) for row major layout.

Output Parameters

r, c Arrays: r(m), c(n).

If info = 0, or info>m, the array r contains the row scale factors for
the matrix A.
If info = 0, the array c contains the column scale factors for the
matrix A.

rowcnd If info = 0 or info>m, rowcnd contains the ratio of the smallest
r[i] to the largest r[i]. If rowcnd≥ 0.1, and amax is neither too
large nor too small, it is not worth scaling by r.

colcnd If info = 0, colcnd contains the ratio of the smallest c[i] to the
largest c[i]. If colcnd≥ 0.1, it is not worth scaling by c.

amax Absolute value of the largest element of the matrix A. If amax is very
close to SMLNUM or very close to BIGNUM, the matrix should be scaled.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, i > 0, and

i≤m, the i-th row of A is exactly zero;

i>m, the (i-m)-th column of A is exactly zero.

See Also
Error Analysis
Matrix Storage Schemes

?gbequ
Computes row and column scaling factors intended to
equilibrate a banded matrix and reduce its condition
number.

Syntax
lapack_int LAPACKE_sgbequ( int matrix_layout, lapack_int m, lapack_int n, lapack_int
kl, lapack_int ku, const float* ab, lapack_int ldab, float* r, float* c, float* rowcnd,
float* colcnd, float* amax );
lapack_int LAPACKE_dgbequ( int matrix_layout, lapack_int m, lapack_int n, lapack_int
kl, lapack_int ku, const double* ab, lapack_int ldab, double* r, double* c, double*
rowcnd, double* colcnd, double* amax );
lapack_int LAPACKE_cgbequ( int matrix_layout, lapack_int m, lapack_int n, lapack_int
kl, lapack_int ku, const lapack_complex_float* ab, lapack_int ldab, float* r, float* c,
float* rowcnd, float* colcnd, float* amax );
lapack_int LAPACKE_zgbequ( int matrix_layout, lapack_int m, lapack_int n, lapack_int
kl, lapack_int ku, const lapack_complex_double* ab, lapack_int ldab, double* r, double*
c, double* rowcnd, double* colcnd, double* amax );

Include Files
• mkl.h

Description

The routine computes row and column scalings intended to equilibrate an m-by-n band matrix A and reduce
its condition number. The output array r returns the row scale factors and the array c the column scale
factors. These factors are chosen to try to make the largest element in each row and column of the matrix B
with elements b(i,j) = r[i-1]*a(i,j)*c[j-1] have absolute value 1.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

m The number of rows of the matrix A; m≥ 0.

n The number of columns of the matrix A; n≥ 0.

kl The number of subdiagonals within the band of A; kl≥ 0.


ku The number of superdiagonals within the band of A; ku≥ 0.

ab Array, size max(1, ldab*n) for column major layout and max(1,
ldab*m) for row major layout. Contains the original band matrix A.

ldab The leading dimension of ab; ldab≥kl+ku+1.
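In column major layout, band storage keeps A(i,j) in row ku + i - j of column j of ab (0-based indices). This is why ldab must be at least kl+ku+1. The hypothetical sketch below uses that mapping to compute the row maxima from which a ?gbequ-style scaling starts. It assumes a real matrix stored with LAPACK_COL_MAJOR and is not MKL code.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of A(i,j) (0-based) in column-major band storage; valid only for
   entries inside the band, i.e. j - ku <= i <= j + kl. */
size_t band_index(int ku, int ldab, int i, int j)
{
    return (size_t)(ku + i - j) + (size_t)j * (size_t)ldab;
}

/* Hypothetical helper: rowmax[i] = max |A(i,j)| over the band of row i,
   the first step of a ?gbequ-style row scaling. */
void band_row_max(int m, int n, int kl, int ku, const double *ab, int ldab,
                  double *rowmax)
{
    assert(m >= 0 && n >= 0 && kl >= 0 && ku >= 0 && ldab >= kl + ku + 1);
    for (int i = 0; i < m; ++i)
        rowmax[i] = 0.0;
    for (int j = 0; j < n; ++j) {
        int ilo = j - ku > 0 ? j - ku : 0;          /* first row in the band */
        int ihi = j + kl < m - 1 ? j + kl : m - 1;  /* last row in the band */
        for (int i = ilo; i <= ihi; ++i) {
            double v = ab[band_index(ku, ldab, i, j)];
            if (v < 0.0) v = -v;
            if (v > rowmax[i]) rowmax[i] = v;
        }
    }
}
```

For the tridiagonal matrix [1 -2 0; 3 4 -5; 0 6 7] with kl = ku = 1 and ldab = 3, ab holds the columns {*, 1, 3}, {-2, 4, 6} and {-5, 7, *}, where * marks unused slots. The row maxima are 2, 5 and 7.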

Output Parameters

r, c Arrays: r (size m), c (size n).

If info = 0, or info>m, the array r contains the row scale factors of
the matrix A.
If info = 0, the array c contains the column scale factors of the
matrix A.

rowcnd If info = 0 or info>m, rowcnd contains the ratio of the smallest
r[i] to the largest r[i].

colcnd If info = 0, colcnd contains the ratio of the smallest c[i] to the
largest c[i].

amax Absolute value of the largest element of the matrix A.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

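If info = i, i > 0, and

i≤m, the i-th row of A is exactly zero;

i>m, the (i-m)th column of A is exactly zero.

Application Notes
All the components of r and c are restricted to be between SMLNUM = smallest safe number and BIGNUM =
largest safe number. Use of these scaling factors is not guaranteed to reduce the condition number of A but
works well in practice.
SMLNUM and BIGNUM are machine-dependent parameters that bound the safe floating-point range. You can use the ?lamch routines to
compute them. For example, compute single precision values of SMLNUM and BIGNUM as follows:

SMLNUM = slamch ('s')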

BIGNUM = 1 / SMLNUM
If rowcnd≥ 0.1 and amax is neither too large nor too small, it is not worth scaling by r.

If colcnd≥ 0.1, it is not worth scaling by c.

If amax is very close to SMLNUM or very close to BIGNUM, the matrix A should be scaled.

See Also
Error Analysis
Matrix Storage Schemes

?gbequb
Computes row and column scaling factors restricted to
a power of radix to equilibrate a banded matrix and
reduce its condition number.

Syntax
lapack_int LAPACKE_sgbequb( int matrix_layout, lapack_int m, lapack_int n, lapack_int
kl, lapack_int ku, const float* ab, lapack_int ldab, float* r, float* c, float* rowcnd,
float* colcnd, float* amax );
lapack_int LAPACKE_dgbequb( int matrix_layout, lapack_int m, lapack_int n, lapack_int
kl, lapack_int ku, const double* ab, lapack_int ldab, double* r, double* c, double*
rowcnd, double* colcnd, double* amax );
lapack_int LAPACKE_cgbequb( int matrix_layout, lapack_int m, lapack_int n, lapack_int
kl, lapack_int ku, const lapack_complex_float* ab, lapack_int ldab, float* r, float* c,
float* rowcnd, float* colcnd, float* amax );
lapack_int LAPACKE_zgbequb( int matrix_layout, lapack_int m, lapack_int n, lapack_int
kl, lapack_int ku, const lapack_complex_double* ab, lapack_int ldab, double* r, double*
c, double* rowcnd, double* colcnd, double* amax );

Include Files
• mkl.h

Description

The routine computes row and column scalings intended to equilibrate an m-by-n banded matrix A and
reduce its condition number. The output array r returns the row scale factors and the array c the column
scale factors. These factors are chosen to try to make the largest element in each row and column of the
matrix B with elements b(i,j) = r[i-1]*a(i,j)*c[j-1] have an absolute value of at most the radix.

r[i-1] and c[j-1] are restricted to be a power of the radix between SMLNUM = smallest safe number and
BIGNUM = largest safe number. Use of these scaling factors is not guaranteed to reduce the condition
number of A but works well in practice.
SMLNUM and BIGNUM are machine-dependent parameters that bound the safe floating-point range. You can use the ?lamch routines to
compute them. For example, compute single precision values of SMLNUM and BIGNUM as follows:

SMLNUM = slamch ('s')

BIGNUM = 1 / SMLNUM
This routine differs from ?gbequ by restricting the scaling factors to a power of the radix. Except for overflow
and underflow, scaling by these factors introduces no additional rounding errors. However, the scaled entries'
magnitudes are no longer equal to approximately 1 but lie between sqrt(radix) and 1/sqrt(radix).

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

m The number of rows of the matrix A; m≥ 0.

n The number of columns of the matrix A; n≥ 0.

kl The number of subdiagonals within the band of A; kl≥ 0.


ku The number of superdiagonals within the band of A; ku≥ 0.

ab Array: size max(1, ldab*n) for column major layout and max(1,
ldab*m) for row major layout. Contains the original band matrix A.

ldab The leading dimension of ab; ldab≥kl+ku+1.

Output Parameters

r, c Arrays: r (size m), c (size n).

If info = 0, or info>m, the array r contains the row scale factors for
the matrix A.
If info = 0, the array c contains the column scale factors for the
matrix A.

rowcnd If info = 0 or info>m, rowcnd contains the ratio of the smallest
r[i] to the largest r[i]. If rowcnd≥ 0.1, and amax is neither too
large nor too small, it is not worth scaling by r.

colcnd If info = 0, colcnd contains the ratio of the smallest c[i] to the
largest c[i]. If colcnd≥ 0.1, it is not worth scaling by c.

amax Absolute value of the largest element of the matrix A. If amax is very
close to SMLNUM or BIGNUM, the matrix should be scaled.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, i > 0, and

i≤m, the i-th row of A is exactly zero;

i>m, the (i-m)-th column of A is exactly zero.

See Also
Error Analysis
Matrix Storage Schemes

?poequ
Computes row and column scaling factors intended to
equilibrate a symmetric (Hermitian) positive definite
matrix and reduce its condition number.

Syntax
lapack_int LAPACKE_spoequ( int matrix_layout, lapack_int n, const float* a, lapack_int
lda, float* s, float* scond, float* amax );
lapack_int LAPACKE_dpoequ( int matrix_layout, lapack_int n, const double* a, lapack_int
lda, double* s, double* scond, double* amax );
lapack_int LAPACKE_cpoequ( int matrix_layout, lapack_int n, const lapack_complex_float*
a, lapack_int lda, float* s, float* scond, float* amax );

lapack_int LAPACKE_zpoequ( int matrix_layout, lapack_int n, const
lapack_complex_double* a, lapack_int lda, double* s, double* scond, double* amax );

Include Files
• mkl.h

Description

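The routine computes row and column scalings intended to equilibrate a symmetric (Hermitian)
positive-definite matrix A and reduce its condition number (with respect to the two-norm).
These factors are chosen so that the scaled matrix B with elements B(i,j) = s[i-1]*A(i,j)*s[j-1] has diagonal
elements equal to 1.
This choice of s puts the condition number of B within a factor n of the smallest possible condition number
over all possible diagonal scalings.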

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

n The order of the matrix A; n≥ 0.

a Array: size max(1, lda*n).
Contains the n-by-n symmetric or Hermitian positive definite matrix A
whose scaling factors are to be computed. Only the diagonal elements
of A are referenced.

lda The leading dimension of a; lda≥ max(1,n).

Output Parameters

s Array, size n.
If info = 0, the array s contains the scale factors for A.

scond If info = 0, scond contains the ratio of the smallest s[i] to the
largest s[i].

amax Absolute value of the largest element of the matrix A.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, the i-th diagonal element of A is nonpositive.


Application Notes
If scond≥ 0.1 and amax is neither too large nor too small, it is not worth scaling by s.

If amax is very close to SMLNUM or very close to BIGNUM, the matrix A should be scaled.

See Also
Error Analysis
Matrix Storage Schemes

?poequb
Computes row and column scaling factors intended to
equilibrate a symmetric (Hermitian) positive definite
matrix and reduce its condition number.

Syntax
lapack_int LAPACKE_spoequb( int matrix_layout, lapack_int n, const float* a, lapack_int
lda, float* s, float* scond, float* amax );
lapack_int LAPACKE_dpoequb( int matrix_layout, lapack_int n, const double* a,
lapack_int lda, double* s, double* scond, double* amax );
lapack_int LAPACKE_cpoequb( int matrix_layout, lapack_int n, const
lapack_complex_float* a, lapack_int lda, float* s, float* scond, float* amax );
lapack_int LAPACKE_zpoequb( int matrix_layout, lapack_int n, const
lapack_complex_double* a, lapack_int lda, double* s, double* scond, double* amax );

Include Files
• mkl.h

Description

The routine computes row and column scalings intended to equilibrate a symmetric (Hermitian) positive-definite matrix A and reduce its condition number (with respect to the two-norm).
Each scale factor s[i-1] is the power of two nearest to, but not exceeding, $1/\sqrt{a_{i,i}}$. The scaled matrix B with elements $b_{i,j} = s[i-1]\,a_{i,j}\,s[j-1]$ therefore has diagonal elements close to 1, and exactly 1 only when $1/\sqrt{a_{i,i}}$ is itself a power of two.

This choice of s puts the condition number of B within a factor n of the smallest possible condition number
over all possible diagonal scalings.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

n The order of the matrix A; n≥ 0.

a Array, size max(1, lda*n).
Contains the n-by-n symmetric or Hermitian positive-definite matrix A whose scaling factors are to be computed. Only the diagonal elements of A are referenced.

lda The leading dimension of a; lda ≥ max(1, n).

Output Parameters

s Array, size (n).

If info = 0, the array s contains the scale factors for A.

scond If info = 0, scond contains the ratio of the smallest s[i] to the largest s[i]. If scond ≥ 0.1 and amax is neither too large nor too small, it is not worth scaling by s.

amax Absolute value of the largest element of the matrix A. If amax is very close to SMLNUM (the underflow threshold) or BIGNUM (the overflow threshold), the matrix should be scaled.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, the i-th diagonal element of A is nonpositive.

See Also
Error Analysis
Matrix Storage Schemes

?ppequ
Computes row and column scaling factors intended to
equilibrate a symmetric (Hermitian) positive definite
matrix in packed storage and reduce its condition
number.

Syntax
lapack_int LAPACKE_sppequ( int matrix_layout, char uplo, lapack_int n, const float* ap,
float* s, float* scond, float* amax );
lapack_int LAPACKE_dppequ( int matrix_layout, char uplo, lapack_int n, const double*
ap, double* s, double* scond, double* amax );
lapack_int LAPACKE_cppequ( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_float* ap, float* s, float* scond, float* amax );
lapack_int LAPACKE_zppequ( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_double* ap, double* s, double* scond, double* amax );

Include Files
• mkl.h

Description

The routine computes row and column scalings intended to equilibrate a symmetric (Hermitian) positive-definite matrix A in packed storage and reduce its condition number (with respect to the two-norm). The output array s returns the scale factors $s[i-1] = 1/\sqrt{a_{i,i}}$.

These factors are chosen so that the scaled matrix B with elements $b_{i,j} = s[i-1]\,a_{i,j}\,s[j-1]$ has diagonal elements equal to 1.
This choice of s puts the condition number of B within a factor n of the smallest possible condition number over all possible diagonal scalings.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is packed in the array ap:
If uplo = 'U', the array ap stores the upper triangular part of the
matrix A.
If uplo = 'L', the array ap stores the lower triangular part of the
matrix A.

n The order of matrix A; n≥ 0.

ap Array, size at least max(1, n(n+1)/2). The array ap contains the upper or the lower triangular part of the matrix A (as specified by uplo) in packed storage (see Matrix Storage Schemes).

Output Parameters

s Array, size (n).

If info = 0, the array s contains the scale factors for A.

scond If info = 0, scond contains the ratio of the smallest s[i] to the
largest s[i].

amax Absolute value of the largest element of the matrix A.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, the i-th diagonal element of A is nonpositive.

Application Notes
If scond ≥ 0.1 and amax is neither too large nor too small, it is not worth scaling by s.

If amax is very close to SMLNUM (the underflow threshold) or to BIGNUM (the overflow threshold), the matrix A should be scaled.

See Also
Error Analysis
Matrix Storage Schemes

?pbequ
Computes row and column scaling factors intended to
equilibrate a symmetric (Hermitian) positive-definite
band matrix and reduce its condition number.

Syntax
lapack_int LAPACKE_spbequ( int matrix_layout, char uplo, lapack_int n, lapack_int kd,
const float* ab, lapack_int ldab, float* s, float* scond, float* amax );
lapack_int LAPACKE_dpbequ( int matrix_layout, char uplo, lapack_int n, lapack_int kd,
const double* ab, lapack_int ldab, double* s, double* scond, double* amax );
lapack_int LAPACKE_cpbequ( int matrix_layout, char uplo, lapack_int n, lapack_int kd,
const lapack_complex_float* ab, lapack_int ldab, float* s, float* scond, float* amax );
lapack_int LAPACKE_zpbequ( int matrix_layout, char uplo, lapack_int n, lapack_int kd,
const lapack_complex_double* ab, lapack_int ldab, double* s, double* scond, double*
amax );

Include Files
• mkl.h

Description

The routine computes row and column scalings intended to equilibrate a symmetric (Hermitian) positive-definite band matrix A and reduce its condition number (with respect to the two-norm). The output array s returns the scale factors $s[i-1] = 1/\sqrt{a_{i,i}}$.

These factors are chosen so that the scaled matrix B with elements $b_{i,j} = s[i-1]\,a_{i,j}\,s[j-1]$ has diagonal elements equal to 1. This choice of s puts the condition number of B within a factor n of the smallest possible condition number over all possible diagonal scalings.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored in the array ab:
If uplo = 'U', the array ab stores the upper triangular part of the
matrix A.
If uplo = 'L', the array ab stores the lower triangular part of the
matrix A.

n The order of matrix A; n≥ 0.

kd The number of superdiagonals or subdiagonals in the matrix A; kd≥ 0.

ab Array, size max(1, ldab*n).

The array ab contains either the upper or the lower triangular part of the matrix A (as specified by uplo) in band storage (see Matrix Storage Schemes).

ldab The leading dimension of the array ab; ldab ≥ kd + 1.

Output Parameters

s Array, size (n).

If info = 0, the array s contains the scale factors for A.

scond If info = 0, scond contains the ratio of the smallest s[i] to the
largest s[i].

amax Absolute value of the largest element of the matrix A.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, the i-th diagonal element of A is nonpositive.

Application Notes
If scond ≥ 0.1 and amax is neither too large nor too small, it is not worth scaling by s.

If amax is very close to SMLNUM (the underflow threshold) or to BIGNUM (the overflow threshold), the matrix A should be scaled.

See Also
Error Analysis
Matrix Storage Schemes

?syequb
Computes row and column scaling factors intended to
equilibrate a symmetric indefinite matrix and reduce
its condition number.

Syntax
lapack_int LAPACKE_ssyequb( int matrix_layout, char uplo, lapack_int n, const float* a,
lapack_int lda, float* s, float* scond, float* amax );
lapack_int LAPACKE_dsyequb( int matrix_layout, char uplo, lapack_int n, const double*
a, lapack_int lda, double* s, double* scond, double* amax );
lapack_int LAPACKE_csyequb( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_float* a, lapack_int lda, float* s, float* scond, float* amax );
lapack_int LAPACKE_zsyequb( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_double* a, lapack_int lda, double* s, double* scond, double* amax );

Include Files
• mkl.h

Description

The routine computes row and column scalings intended to equilibrate a symmetric indefinite matrix A and
reduce its condition number (with respect to the two-norm).
The array s contains the scale factors, $s[i-1] = 1/\sqrt{a_{i,i}}$. These factors are chosen so that the scaled matrix B with elements $b_{i,j} = s[i-1]\,a_{i,j}\,s[j-1]$ has ones on the diagonal.

This choice of s puts the condition number of B within a factor n of the smallest possible condition number
over all possible diagonal scalings.
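Once s is available, forming B is a single pass over the matrix. The hypothetical helper below (not MKL code) overwrites a column-major n-by-n matrix with $b_{i,j} = s[i-1]\,a_{i,j}\,s[j-1]$, written with 0-based indices in the code, and touches the full square, so it assumes both triangles are stored. Reference LAPACK provides the auxiliary routine ?laqsy for this step; the sketch only shows the arithmetic.

```c
#include <stddef.h>

/* Hypothetical helper (not MKL code): a <- diag(s) * a * diag(s) for a
 * column-major n-by-n matrix with both triangles stored. */
void sketch_apply_sym_scaling(int n, double *a, int lda, const double *s)
{
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i)
            a[i + (size_t)j * lda] = s[i] * a[i + (size_t)j * lda] * s[j];
}
```

With a positive diagonal and s[i] = 1/sqrt(a(i,i)), the diagonal of the result is 1 up to rounding.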

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored:

If uplo = 'U', the array a stores the upper triangular part of the
matrix A.
If uplo = 'L', the array a stores the lower triangular part of the
matrix A.

n The order of the matrix A; n≥ 0.

a Array, size max(1, lda*n).
Contains the n-by-n symmetric indefinite matrix A whose scaling factors are to be computed. Only the diagonal elements of A are referenced.

lda The leading dimension of a; lda ≥ max(1, n).

Output Parameters

s Array, size (n).

If info = 0, the array s contains the scale factors for A.

scond If info = 0, scond contains the ratio of the smallest s[i] to the largest s[i]. If scond ≥ 0.1 and amax is neither too large nor too small, it is not worth scaling by s.

amax Absolute value of the largest element of the matrix A. If amax is very close to SMLNUM (the underflow threshold) or BIGNUM (the overflow threshold), the matrix should be scaled.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, the i-th diagonal element of A is nonpositive.

See Also
Error Analysis
Matrix Storage Schemes

?heequb
Computes row and column scaling factors intended to
equilibrate a Hermitian indefinite matrix and reduce its
condition number.

Syntax
lapack_int LAPACKE_cheequb( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_float* a, lapack_int lda, float* s, float* scond, float* amax );
lapack_int LAPACKE_zheequb( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_double* a, lapack_int lda, double* s, double* scond, double* amax );

Include Files
• mkl.h

Description

The routine computes row and column scalings intended to equilibrate a Hermitian indefinite matrix A and
reduce its condition number (with respect to the two-norm).
The array s contains the scale factors, $s[i-1] = 1/\sqrt{a_{i,i}}$. These factors are chosen so that the scaled matrix B with elements $b_{i,j} = s[i-1]\,a_{i,j}\,s[j-1]$ has ones on the diagonal.

This choice of s puts the condition number of B within a factor n of the smallest possible condition number
over all possible diagonal scalings.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored:

If uplo = 'U', the array a stores the upper triangular part of the
matrix A.
If uplo = 'L', the array a stores the lower triangular part of the
matrix A.

n The order of the matrix A; n≥ 0.

a Array, size max(1, lda*n).
Contains the n-by-n Hermitian indefinite matrix A whose scaling factors are to be computed. Only the diagonal elements of A are referenced.

lda The leading dimension of a; lda ≥ max(1, n).

Output Parameters

s Array, size (n).

If info = 0, the array s contains the scale factors for A.

scond If info = 0, scond contains the ratio of the smallest s[i] to the largest s[i]. If scond ≥ 0.1 and amax is neither too large nor too small, it is not worth scaling by s.

amax Absolute value of the largest element of the matrix A. If amax is very close to SMLNUM (the underflow threshold) or BIGNUM (the overflow threshold), the matrix should be scaled.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, the i-th diagonal element of A is nonpositive.

See Also
Error Analysis
Matrix Storage Schemes

LAPACK Linear Equation Driver Routines

Table "Driver Routines for Solving Systems of Linear Equations" lists the LAPACK driver routines for solving
systems of linear equations with real or complex matrices.
Driver Routines for Solving Systems of Linear Equations
Matrix type, storage scheme | Simple Driver | Expert Driver | Expert Driver using Extra-Precise Iterative Refinement
general | ?gesv | ?gesvx | ?gesvxx
general band | ?gbsv | ?gbsvx | ?gbsvxx
general tridiagonal | ?gtsv | ?gtsvx |
diagonally dominant tridiagonal | ?dtsvb | |
symmetric/Hermitian positive-definite | ?posv | ?posvx | ?posvxx
symmetric/Hermitian positive-definite, packed storage | ?ppsv | ?ppsvx |
symmetric/Hermitian positive-definite, band | ?pbsv | ?pbsvx |
symmetric/Hermitian positive-definite, tridiagonal | ?ptsv | ?ptsvx |
symmetric/Hermitian indefinite | ?sysv/?hesv, ?sysv_rook/?sysv_rk/?hesv_rk, ?sysv_aa/?hesv_aa | ?sysvx/?hesvx | ?sysvxx/?hesvxx
symmetric/Hermitian indefinite, packed storage | ?spsv/?hpsv | ?spsvx/?hpsvx |
complex symmetric | ?sysv, ?sysv_rook | ?sysvx |
complex symmetric, packed storage | ?spsv | ?spsvx |

In this table ? stands for s (single precision real), d (double precision real), c (single precision complex), or z (double precision complex). In the description of the ?gesv and ?posv routines, the ? sign also stands for the combined character codes ds and zc of the mixed precision subroutines.

?gesv
Computes the solution to the system of linear
equations with a square coefficient matrix A and
multiple right-hand sides.

Syntax
lapack_int LAPACKE_sgesv (int matrix_layout , lapack_int n , lapack_int nrhs , float *
a , lapack_int lda , lapack_int * ipiv , float * b , lapack_int ldb );
lapack_int LAPACKE_dgesv (int matrix_layout , lapack_int n , lapack_int nrhs , double *
a , lapack_int lda , lapack_int * ipiv , double * b , lapack_int ldb );
lapack_int LAPACKE_cgesv (int matrix_layout , lapack_int n , lapack_int nrhs ,
lapack_complex_float * a , lapack_int lda , lapack_int * ipiv , lapack_complex_float *
b , lapack_int ldb );
lapack_int LAPACKE_zgesv (int matrix_layout , lapack_int n , lapack_int nrhs ,
lapack_complex_double * a , lapack_int lda , lapack_int * ipiv , lapack_complex_double
* b , lapack_int ldb );
lapack_int LAPACKE_dsgesv (int matrix_layout, lapack_int n, lapack_int nrhs, double *
a, lapack_int lda, lapack_int * ipiv, double * b, lapack_int ldb, double * x, lapack_int
ldx, lapack_int * iter);
lapack_int LAPACKE_zcgesv (int matrix_layout, lapack_int n, lapack_int nrhs,
lapack_complex_double * a, lapack_int lda, lapack_int * ipiv, lapack_complex_double *
b, lapack_int ldb, lapack_complex_double * x, lapack_int ldx, lapack_int * iter);

Include Files
• mkl.h

Description

The routine solves for X the system of linear equations A*X = B, where A is an n-by-n matrix, the columns
of matrix B are individual right-hand sides, and the columns of X are the corresponding solutions.
The LU decomposition with partial pivoting and row interchanges is used to factor A as A = P*L*U, where P
is a permutation matrix, L is unit lower triangular, and U is upper triangular. The factored form of A is then
used to solve the system of equations A*X = B.
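To make the factorization and the pivot convention concrete, here is a deliberately small, unblocked sketch of the same computation. The function sketch_dgesv is hypothetical teaching code, not MKL's implementation (which is blocked and tuned): it assumes column-major storage and plain int indices, factors A = P*L*U with partial pivoting, stores 1-based pivot indices in ipiv as described under Output Parameters, and overwrites b with X.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical teaching sketch (not MKL code): solves A*X = B for a
 * column-major n-by-n matrix by unblocked LU with partial pivoting.
 * On return, a holds L (unit diagonal not stored) and U, ipiv holds
 * 1-based pivot indices, and b holds X. Returns 0, or i (1-based) if
 * U(i,i) is exactly zero, in which case b is left unsolved. */
int sketch_dgesv(int n, int nrhs, double *a, int lda,
                 int *ipiv, double *b, int ldb)
{
#define A(i, j) a[(i) + (size_t)(j) * lda]
#define B(i, j) b[(i) + (size_t)(j) * ldb]
    int info = 0;
    for (int k = 0; k < n; ++k) {
        int p = k;                            /* partial pivoting */
        for (int i = k + 1; i < n; ++i)
            if (fabs(A(i, k)) > fabs(A(p, k)))
                p = i;
        ipiv[k] = p + 1;                      /* row k <-> row p */
        if (A(p, k) == 0.0) {                 /* zero column: U(k,k) = 0 */
            if (info == 0)
                info = k + 1;
            continue;
        }
        if (p != k)
            for (int j = 0; j < n; ++j) {
                double t = A(k, j); A(k, j) = A(p, j); A(p, j) = t;
            }
        for (int i = k + 1; i < n; ++i) {
            A(i, k) /= A(k, k);               /* multiplier stored in L */
            for (int j = k + 1; j < n; ++j)
                A(i, j) -= A(i, k) * A(k, j); /* update trailing block */
        }
    }
    if (info == 0) {
        for (int r = 0; r < nrhs; ++r) {
            for (int k = 0; k < n; ++k) {     /* apply the interchanges */
                int p = ipiv[k] - 1;
                if (p != k) {
                    double t = B(k, r); B(k, r) = B(p, r); B(p, r) = t;
                }
            }
            for (int i = 1; i < n; ++i)       /* forward solve with L */
                for (int k = 0; k < i; ++k)
                    B(i, r) -= A(i, k) * B(k, r);
            for (int i = n - 1; i >= 0; --i) {/* back solve with U */
                for (int k = i + 1; k < n; ++k)
                    B(i, r) -= A(i, k) * B(k, r);
                B(i, r) /= A(i, i);
            }
        }
    }
#undef A
#undef B
    return info;
}
```

With MKL linked, the equivalent library call is LAPACKE_dgesv(LAPACK_COL_MAJOR, n, nrhs, a, lda, ipiv, b, ldb), using lapack_int in place of int.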

The routines dsgesv and zcgesv are mixed precision iterative refinement subroutines for exploiting fast single precision hardware. They first attempt to factorize the matrix in single precision (dsgesv) or single complex precision (zcgesv) and use this factorization within an iterative refinement procedure to produce a solution with double precision (dsgesv) or double complex precision (zcgesv) normwise backward error quality (see below). If the approach fails, the method switches to a double precision or double complex precision factorization, respectively, and computes the solution.
Iterative refinement is not a winning strategy if the ratio of single precision performance to double precision performance is too small. A reasonable strategy should take the number of right-hand sides and the size of the matrix into account; this might be done with a call to ilaenv in the future. At present, iterative refinement is always attempted.
The iterative refinement process is stopped if

iter > itermax

or, for all the right-hand sides,

$\mathrm{rnrm} < \sqrt{n}\cdot\mathrm{xnrm}\cdot\mathrm{anrm}\cdot\mathrm{eps}\cdot\mathrm{bwdmax}$

where
• iter is the number of the current iteration in the iterative refinement process
• rnrm is the infinity-norm of the residual
• xnrm is the infinity-norm of the solution
• anrm is the infinity-operator-norm of the matrix A
• eps is the machine epsilon returned by dlamch('Epsilon').
The values itermax and bwdmax are fixed to 30 and 1.0 respectively.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

n The number of linear equations, that is, the order of the matrix A; n ≥ 0.

nrhs The number of right-hand sides, that is, the number of columns of the
matrix B; nrhs≥ 0.

a The array a (size max(1, lda*n)) contains the n-by-n coefficient matrix A.

b The array b of size max(1, ldb*nrhs) for column major layout and max(1, ldb*n) for row major layout contains the n-by-nrhs right-hand side matrix B.

lda The leading dimension of the array a; lda≥ max(1, n).

ldb The leading dimension of the array b; ldb≥ max(1, n) for column
major layout and ldb≥nrhs for row major layout.

ldx The leading dimension of the array x; ldx≥ max(1, n) for column
major layout and ldx≥nrhs for row major layout.

Output Parameters

a Overwritten by the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
If iterative refinement has been successfully used (info = 0 and iter ≥ 0), then A is unchanged.
If double precision factorization has been used (info = 0 and iter < 0), then the array A contains the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.

b Overwritten by the solution matrix X for sgesv, dgesv, cgesv, and zgesv.

Unchanged for dsgesv and zcgesv.

ipiv Array, size at least max(1, n). The pivot indices that define the
permutation matrix P; row i of the matrix was interchanged with row
ipiv[i-1]. Corresponds to the single precision factorization (if
info= 0 and iter≥ 0) or the double precision factorization (if info=
0 and iter < 0).

x Array, size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout. If info = 0, contains the n-by-nrhs
solution matrix X.

iter If iter < 0: iterative refinement has failed, and double precision factorization has been performed:
• If iter = -1: the routine fell back to full precision for implementation- or machine-specific reasons.
• If iter = -2: narrowing the precision induced an overflow, so the routine fell back to full precision.
• If iter = -3: failure of sgetrf for dsgesv, or of cgetrf for zcgesv.
• If iter = -31: the iterative refinement was stopped after the 30th iteration.
If iter > 0: iterative refinement has been successfully used; iter returns the number of iterations.
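Callers of the mixed precision routines usually branch on iter before interpreting a and ipiv. The hypothetical helpers below (not MKL code) encode the cases listed for iter, assuming info = 0. Following the descriptions of a and ipiv, they treat iter ≥ 0 as successful refinement, meaning a is unchanged and ipiv belongs to the single precision factorization.

```c
/* Hypothetical helpers (not MKL code) for the iter output of
 * LAPACKE_dsgesv and LAPACKE_zcgesv, assuming info == 0. */

/* Nonzero when the routine fell back to a full (double) precision
 * factorization, so a holds that factorization and ipiv its pivots. */
int sketch_used_full_precision(int iter)
{
    return iter < 0;
}

const char *sketch_describe_iter(int iter)
{
    if (iter >= 0)
        return "mixed precision refinement succeeded";
    switch (iter) {
    case -1:  return "fell back: implementation- or machine-specific reason";
    case -2:  return "fell back: narrowing the precision caused an overflow";
    case -3:  return "fell back: the single precision factorization failed";
    case -31: return "fell back: no convergence after 30 iterations";
    default:  return "fell back to full precision";
    }
}
```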

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, $U_{i,i}$ (computed in double precision for mixed precision subroutines) is exactly zero. The factorization has been completed, but the factor U is exactly singular, so the solution could not be computed.

See Also
dlamch
sgetrf
Matrix Storage Schemes

?gesvx
Computes the solution to the system of linear
equations with a square coefficient matrix A and
multiple right-hand sides, and provides error bounds
on the solution.

Syntax
lapack_int LAPACKE_sgesvx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int nrhs, float* a, lapack_int lda, float* af, lapack_int ldaf, lapack_int* ipiv,
char* equed, float* r, float* c, float* b, lapack_int ldb, float* x, lapack_int ldx,
float* rcond, float* ferr, float* berr, float* rpivot );
lapack_int LAPACKE_dgesvx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int nrhs, double* a, lapack_int lda, double* af, lapack_int ldaf, lapack_int*
ipiv, char* equed, double* r, double* c, double* b, lapack_int ldb, double* x,
lapack_int ldx, double* rcond, double* ferr, double* berr, double* rpivot );
lapack_int LAPACKE_cgesvx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int nrhs, lapack_complex_float* a, lapack_int lda, lapack_complex_float* af,
lapack_int ldaf, lapack_int* ipiv, char* equed, float* r, float* c,
lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x, lapack_int ldx,
float* rcond, float* ferr, float* berr, float* rpivot );
lapack_int LAPACKE_zgesvx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int nrhs, lapack_complex_double* a, lapack_int lda, lapack_complex_double* af,
lapack_int ldaf, lapack_int* ipiv, char* equed, double* r, double* c,
lapack_complex_double* b, lapack_int ldb, lapack_complex_double* x, lapack_int ldx,
double* rcond, double* ferr, double* berr, double* rpivot );

Include Files
• mkl.h

Description

The routine uses the LU factorization to compute the solution to a real or complex system of linear equations
A*X = B, where A is an n-by-n matrix, the columns of matrix B are individual right-hand sides, and the
columns of X are the corresponding solutions.
Error bounds on the solution and a condition estimate are also provided.
The routine ?gesvx performs the following steps:

1. If fact = 'E', real scaling factors r and c are computed to equilibrate the system:

trans = 'N': $(\mathrm{diag}(r)\,A\,\mathrm{diag}(c))\,(\mathrm{diag}(c)^{-1} X) = \mathrm{diag}(r)\,B$

trans = 'T': $(\mathrm{diag}(r)\,A\,\mathrm{diag}(c))^T\,(\mathrm{diag}(r)^{-1} X) = \mathrm{diag}(c)\,B$
trans = 'C': $(\mathrm{diag}(r)\,A\,\mathrm{diag}(c))^H\,(\mathrm{diag}(r)^{-1} X) = \mathrm{diag}(c)\,B$
Whether or not the system will be equilibrated depends on the scaling of the matrix A, but if equilibration is used, A is overwritten by diag(r)*A*diag(c) and B by diag(r)*B (if trans = 'N') or diag(c)*B (if trans = 'T' or 'C').
2. If fact = 'N' or 'E', the LU decomposition is used to factor the matrix A (after equilibration if fact = 'E') as A = P*L*U, where P is a permutation matrix, L is a unit lower triangular matrix, and U is upper triangular.
3. If some $U_{i,i} = 0$, so that U is exactly singular, then the routine returns with info = i. Otherwise, the factored form of A is used to estimate the condition number of the matrix A. If the reciprocal of the condition number is less than machine precision, info = n + 1 is returned as a warning, but the routine still goes on to solve for X and compute error bounds as described below.
4. The system of equations is solved for X using the factored form of A.
5. Iterative refinement is applied to improve the computed solution matrix and calculate error bounds and
backward error estimates for it.
6. If equilibration was used, the matrix X is premultiplied by diag(c) (if trans = 'N') or diag(r) (if
trans = 'T' or 'C') so that it solves the original system before equilibration.
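For trans = 'N', steps 1 and 6 amount to scaling before the solve and unscaling after it. The hypothetical helpers below (not MKL code) assume column-major storage and the equed codes documented under Input Parameters: sketch_equilibrate forms diag(r)*A*diag(c) and diag(r)*B as equed requires, and sketch_unscale_solution premultiplies the solution Y of the equilibrated system by diag(c), because Y = inv(diag(c))*X. The solve in between is whatever factorization the caller uses.

```c
#include <stddef.h>

/* Hypothetical helpers (not MKL code), trans = 'N' only.
 * equed: 'N' none, 'R' rows, 'C' columns, 'B' both. */
void sketch_equilibrate(char equed, int n, int nrhs,
                        double *a, int lda, double *b, int ldb,
                        const double *r, const double *c)
{
    int rows = (equed == 'R' || equed == 'B');
    int cols = (equed == 'C' || equed == 'B');
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i) {
            double f = 1.0;
            if (rows) f *= r[i];
            if (cols) f *= c[j];
            a[i + (size_t)j * lda] *= f;        /* diag(r)*A*diag(c) */
        }
    if (rows)
        for (int k = 0; k < nrhs; ++k)
            for (int i = 0; i < n; ++i)
                b[i + (size_t)k * ldb] *= r[i]; /* diag(r)*B */
}

/* Step 6 for trans = 'N': X = diag(c) * Y when columns were scaled. */
void sketch_unscale_solution(char equed, int n, int nrhs,
                             double *x, int ldx, const double *c)
{
    if (equed != 'C' && equed != 'B')
        return;
    for (int k = 0; k < nrhs; ++k)
        for (int i = 0; i < n; ++i)
            x[i + (size_t)k * ldx] *= c[i];
}
```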

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

fact Must be 'F', 'N', or 'E'.

Specifies whether or not the factored form of the matrix A is supplied on entry, and if not, whether the matrix A should be equilibrated before it is factored.
If fact = 'F': on entry, af and ipiv contain the factored form of A. If equed is not 'N', the matrix A has been equilibrated with scaling factors given by r and c.
a, af, and ipiv are not modified.
If fact = 'N', the matrix A will be copied to af and factored.

If fact = 'E', the matrix A will be equilibrated if necessary, then copied to af and factored.

trans Must be 'N', 'T', or 'C'.

Specifies the form of the system of equations:

If trans = 'N', the system has the form A*X = B (No transpose).

If trans = 'T', the system has the form $A^T X = B$ (Transpose).

If trans = 'C', the system has the form $A^H X = B$ (Transpose for real flavors, conjugate transpose for complex flavors).

n The number of linear equations; the order of the matrix A; n≥ 0.

nrhs The number of right hand sides; the number of columns of the
matrices B and X; nrhs≥ 0.

a The array a (size max(1, lda*n)) contains the matrix A. If fact = 'F' and equed is not 'N', then A must have been equilibrated by the scaling factors in r and/or c.

af The array af (size max(1, ldaf*n)) is an input argument if fact = 'F'. It contains the factored form of the matrix A, that is, the factors L and U from the factorization A = P*L*U as computed by ?getrf. If equed is not 'N', then af is the factored form of the equilibrated matrix A.

b The array b of size max(1, ldb*nrhs) for column major layout and max(1, ldb*n) for row major layout contains the matrix B whose columns are the right-hand sides for the systems of equations.

lda The leading dimension of a; lda≥ max(1, n).

ldaf The leading dimension of af; ldaf≥ max(1, n).

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ipiv Array, size at least max(1, n). The array ipiv is an input argument if
fact = 'F'. It contains the pivot indices from the factorization A =
P*L*U as computed by ?getrf; row i of the matrix was interchanged
with row ipiv[i-1].

equed Must be 'N', 'R', 'C', or 'B'.

equed is an input argument if fact = 'F'. It specifies the form of equilibration that was done:
If equed = 'N', no equilibration was done (always true if fact = 'N').
If equed = 'R', row equilibration was done, that is, A has been premultiplied by diag(r).
If equed = 'C', column equilibration was done, that is, A has been postmultiplied by diag(c).
If equed = 'B', both row and column equilibration were done, that is, A has been replaced by diag(r)*A*diag(c).

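r, c Arrays: r (size n), c (size n). The array r contains the row scale factors for A, and the array c contains the column scale factors for A. These arrays are input arguments if fact = 'F' only; otherwise they are output arguments.
If equed = 'R' or 'B', A is multiplied on the left by diag(r); if equed = 'N' or 'C', r is not accessed. If fact = 'F' and equed = 'R' or 'B', each element of r must be positive.
If equed = 'C' or 'B', A is multiplied on the right by diag(c); if equed = 'N' or 'R', c is not accessed. If fact = 'F' and equed = 'C' or 'B', each element of c must be positive.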

ldx The leading dimension of the output array x; ldx≥ max(1, n) for
column major layout and ldx≥nrhs for row major layout.

Output Parameters

x Array, size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout.
If info = 0 or info = n+1, the array x contains the solution matrix X to the original system of equations. Note that A and B are modified on exit if equed ≠ 'N', and the solution to the equilibrated system is:

$\mathrm{diag}(c)^{-1} X$, if trans = 'N' and equed = 'C' or 'B';

$\mathrm{diag}(r)^{-1} X$, if trans = 'T' or 'C' and equed = 'R' or 'B'. The second dimension of x must be at least max(1,nrhs).

a Array a is not modified on exit if fact = 'F' or 'N', or if fact = 'E' and equed = 'N'. If equed ≠ 'N', A is scaled on exit as follows:
equed = 'R': A = diag(r)*A
equed = 'C': A = A*diag(c)
equed = 'B': A = diag(r)*A*diag(c).

af If fact = 'N' or 'E', then af is an output argument and on exit returns the factors L and U from the factorization A = P*L*U of the original matrix A (if fact = 'N') or of the equilibrated matrix A (if fact = 'E'). See the description of a for the form of the equilibrated matrix.

b Overwritten by diag(r)*B if trans = 'N' and equed = 'R' or 'B'; overwritten by diag(c)*B if trans = 'T' or 'C' and equed = 'C' or 'B'; not changed if equed = 'N'.

r, c These arrays are output arguments if fact ≠ 'F'. See the description of r, c in the Input Parameters section.

rcond An estimate of the reciprocal condition number of the matrix A after equilibration (if done). If rcond is less than the machine precision, in particular, if rcond = 0, the matrix is singular to working precision. This condition is indicated by a return code of info > 0.

ferr Array, size at least max(1, nrhs). Contains the estimated forward
error bound for each solution vector xj (the j-th column of the
solution matrix X). If xtrue is the true solution corresponding to xj,
ferr[j-1] is an estimated upper bound for the magnitude of the
largest element in (xj - xtrue) divided by the magnitude of the
largest element in xj. The estimate is as reliable as the estimate for
rcond, and is almost always a slight overestimate of the true error.

berr Array, size at least max(1, nrhs). Contains the component-wise
relative backward error for each solution vector xj, that is, the
smallest relative change in any element of A or B that makes xj an
exact solution.

ipiv If fact = 'N' or 'E', then ipiv is an output argument and on exit contains the pivot indices from the factorization A = P*L*U of the original matrix A (if fact = 'N') or of the equilibrated matrix A (if fact = 'E').

equed If fact ≠ 'F', then equed is an output argument. It specifies the form of equilibration that was done (see the description of equed in the Input Parameters section).

rpivot On exit, rpivot contains the reciprocal pivot growth factor $\|A\|_{\max} / \|U\|_{\max}$, where $\|\cdot\|_{\max}$ denotes the largest absolute value of any element.
If rpivot is much less than 1, then the stability of the LU factorization of the (equilibrated) matrix A could be poor. This also means that the solution X, the condition estimator rcond, and the forward error bound ferr could be unreliable. If factorization fails with 0 < info ≤ n, then rpivot contains the reciprocal pivot growth factor for the leading info columns of A.
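The berr and rpivot outputs have simple definitions that can be checked by hand on small problems. The hypothetical helpers below (not MKL code) assume column-major real data. sketch_berr evaluates the componentwise relative backward error of one solution vector with the standard formula $\max_i |b - A x|_i / (|A|\,|x| + |b|)_i$ and, unlike the library, simply skips rows whose denominator is zero. sketch_rpivot computes the ratio of the largest absolute element of A to the largest absolute element of the upper triangle U held in af.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper (not MKL code): componentwise relative backward
 * error of x for A*x = b. Rows with a zero denominator are skipped;
 * the library's safeguards are not reproduced. */
double sketch_berr(int n, const double *a, int lda,
                   const double *x, const double *b)
{
    double berr = 0.0;
    for (int i = 0; i < n; ++i) {
        double res = b[i], den = fabs(b[i]);
        for (int j = 0; j < n; ++j) {
            double aij = a[i + (size_t)j * lda];
            res -= aij * x[j];                /* residual b - A*x */
            den += fabs(aij) * fabs(x[j]);    /* (|A|*|x| + |b|)_i */
        }
        if (den > 0.0 && fabs(res) / den > berr)
            berr = fabs(res) / den;
    }
    return berr;
}

/* Hypothetical helper: max|a(i,j)| / max|u(i,j)| with U read from the
 * upper triangle of af. Returns 1 if U is entirely zero (an assumed
 * convention for this sketch). */
double sketch_rpivot(int n, const double *a, int lda,
                     const double *af, int ldaf)
{
    double amax = 0.0, umax = 0.0;
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i) {
            double v = fabs(a[i + (size_t)j * lda]);
            if (v > amax) amax = v;
            if (i <= j) {
                double u = fabs(af[i + (size_t)j * ldaf]);
                if (u > umax) umax = u;
            }
        }
    return umax == 0.0 ? 1.0 : amax / umax;
}
```

A zero berr means x solves the system exactly in the computed arithmetic; an rpivot well below 1 signals large element growth during elimination.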

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, and i≤n, then U(i, i) is exactly zero. The factorization has been completed, but the factor U is
exactly singular, so the solution and error bounds could not be computed; rcond = 0 is returned.
If info = n + 1, then U is nonsingular, but rcond is less than machine precision, meaning that the matrix is
singular to working precision. Nevertheless, the solution and error bounds are computed because there are a
number of situations where the computed solution can be more accurate than the value of rcond would
suggest.

See Also
Matrix Storage Schemes

?gesvxx
Uses extra precise iterative refinement to compute the
solution to the system of linear equations with a
square coefficient matrix A and multiple right-hand
sides.

Syntax
lapack_int LAPACKE_sgesvxx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int nrhs, float* a, lapack_int lda, float* af, lapack_int ldaf, lapack_int* ipiv,
char* equed, float* r, float* c, float* b, lapack_int ldb, float* x, lapack_int ldx,
float* rcond, float* rpvgrw, float* berr, lapack_int n_err_bnds, float* err_bnds_norm,
float* err_bnds_comp, lapack_int nparams, const float* params );

lapack_int LAPACKE_dgesvxx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int nrhs, double* a, lapack_int lda, double* af, lapack_int ldaf, lapack_int*
ipiv, char* equed, double* r, double* c, double* b, lapack_int ldb, double* x,
lapack_int ldx, double* rcond, double* rpvgrw, double* berr, lapack_int n_err_bnds,
double* err_bnds_norm, double* err_bnds_comp, lapack_int nparams, const double*
params );
lapack_int LAPACKE_cgesvxx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int nrhs, lapack_complex_float* a, lapack_int lda, lapack_complex_float* af,
lapack_int ldaf, lapack_int* ipiv, char* equed, float* r, float* c,
lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x, lapack_int ldx,
float* rcond, float* rpvgrw, float* berr, lapack_int n_err_bnds, float* err_bnds_norm,
float* err_bnds_comp, lapack_int nparams, const float* params );
lapack_int LAPACKE_zgesvxx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int nrhs, lapack_complex_double* a, lapack_int lda, lapack_complex_double* af,
lapack_int ldaf, lapack_int* ipiv, char* equed, double* r, double* c,
lapack_complex_double* b, lapack_int ldb, lapack_complex_double* x, lapack_int ldx,
double* rcond, double* rpvgrw, double* berr, lapack_int n_err_bnds, double*
err_bnds_norm, double* err_bnds_comp, lapack_int nparams, const double* params );

Include Files
• mkl.h

Description

The routine uses the LU factorization to compute the solution to a real or complex system of linear equations
A*X = B, where A is an n-by-n matrix, the columns of the matrix B are individual right-hand sides, and the
columns of X are the corresponding solutions.
Both normwise and maximum componentwise error bounds are also provided on request. The routine returns
a solution with a small guaranteed error (O(eps), where eps is the working machine precision) unless the
matrix is very ill-conditioned, in which case a warning is returned. Relevant condition numbers are also
calculated and returned.
The routine accepts user-provided factorizations and equilibration factors; see the definitions of the fact and
equed options. Solving with refinement using a factorization from a previous call of the routine also
produces a solution with O(eps) errors or warnings, but this may not hold for general user-provided
factorizations and equilibration factors that differ from what the routine itself would produce.
The routine ?gesvxx performs the following steps:

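1. If fact = 'E', scaling factors r and c are computed to equilibrate the system:

trans = 'N': diag(r)*A*diag(c)*inv(diag(c))*X = diag(r)*B
trans = 'T': (diag(r)*A*diag(c))^T*inv(diag(r))*X = diag(c)*B
trans = 'C': (diag(r)*A*diag(c))^H*inv(diag(r))*X = diag(c)*B
Whether or not the system will be equilibrated depends on the scaling of the matrix A, but if
equilibration is used, A is overwritten by diag(r)*A*diag(c) and B by diag(r)*B (if trans = 'N') or
diag(c)*B (if trans = 'T' or 'C').
2. If fact = 'N' or 'E', the LU decomposition is used to factor the matrix A (after equilibration if fact
= 'E') as A = P*L*U, where P is a permutation matrix, L is a unit lower triangular matrix, and U is
upper triangular.
3. If some U(i, i) = 0, so that U is exactly singular, then the routine returns with info = i. Otherwise, the
factored form of A is used to estimate the condition number of the matrix A (see the rcond parameter).
If the reciprocal of the condition number is less than machine precision, the routine still goes on to
solve for X and compute error bounds.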
4. The system of equations is solved for X using the factored form of A.
5. By default, unless params[0] is set to zero, the routine applies iterative refinement to improve the
computed solution matrix and calculate error bounds. Refinement calculates the residual to at least
twice the working precision.
6. If equilibration was used, the matrix X is premultiplied by diag(c) (if trans = 'N') or diag(r) (if
trans = 'T' or 'C') so that it solves the original system before equilibration.
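To make steps 1 and 6 concrete for trans = 'N', the sketch below equilibrates a 2-by-2 system, solves the scaled system, and premultiplies the intermediate result by diag(c). Two simplifications are assumptions of this illustration only: Cramer's rule stands in for the LU factorization, and the scale factors are supplied by the caller rather than computed. The helper names are hypothetical.

```c
#include <assert.h>

/* Solves the 2-by-2 system M*y = f (M row-major) by Cramer's rule,
 * standing in for the LU factorization. Returns -1 if M is singular. */
static int solve2(const double m[4], const double f[2], double y[2])
{
    double det = m[0] * m[3] - m[1] * m[2];
    if (det == 0.0)
        return -1;
    y[0] = (f[0] * m[3] - m[1] * f[1]) / det;
    y[1] = (m[0] * f[1] - f[0] * m[2]) / det;
    return 0;
}

/* trans = 'N': solve (diag(r)*A*diag(c)) * y = diag(r)*b, where
 * y = inv(diag(c))*x, then recover x = diag(c)*y (step 6). */
int solve_equilibrated_n(const double a[4], const double b[2],
                         const double r[2], const double c[2], double x[2])
{
    double ae[4], be[2], y[2];
    for (int i = 0; i < 2; ++i) {
        assert(r[i] > 0.0 && c[i] > 0.0); /* scale factors must be positive */
        be[i] = r[i] * b[i];
        for (int j = 0; j < 2; ++j)
            ae[i * 2 + j] = r[i] * a[i * 2 + j] * c[j];
    }
    if (solve2(ae, be, y) != 0)
        return -1;
    for (int i = 0; i < 2; ++i)
        x[i] = c[i] * y[i];
    return 0;
}
```

Choosing r and c as powers of 2 (for example 1.0/1024.0 and 1024.0) keeps the scaling itself free of rounding error, which is why the parameter descriptions recommend powers of the radix.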

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

fact Must be 'F', 'N', or 'E'.

Specifies whether or not the factored form of the matrix A is supplied
on entry, and if not, whether the matrix A should be equilibrated
before it is factored.
If fact = 'F', on entry, af and ipiv contain the factored form of A.
If equed is not 'N', the matrix A has been equilibrated with scaling
factors given by r and c. Parameters a, af, and ipiv are not
modified.
If fact = 'N', the matrix A will be copied to af and factored.

If fact = 'E', the matrix A will be equilibrated if necessary, then
copied to af and factored.

trans Must be 'N', 'T', or 'C'.

Specifies the form of the system of equations:

If trans = 'N', the system has the form A*X = B (No transpose).

If trans = 'T', the system has the form A^T*X = B (Transpose).

If trans = 'C', the system has the form A^H*X = B (Conjugate
transpose: transpose for real flavors, conjugate transpose for
complex flavors).

n The number of linear equations; the order of the matrix A; n≥ 0.

nrhs The number of right hand sides; the number of columns of the
matrices B and X; nrhs≥ 0.

a, af, b Arrays: a (size max(1, lda*n)), af (size max(1, ldaf*n)), b (size max(1,
ldb*nrhs) for column major layout and max(1, ldb*n) for row major
layout).
The array a contains the matrix A. If fact = 'F' and equed is not
'N', then A must have been equilibrated by the scaling factors in r
and/or c.

The array af is an input argument if fact = 'F'. It contains the
factored form of the matrix A, that is, the factors L and U from the
factorization A = P*L*U as computed by ?getrf. If equed is not 'N',
then af is the factored form of the equilibrated matrix A.

The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.

lda The leading dimension of a; lda≥ max(1, n).

ldaf The leading dimension of af; ldaf≥ max(1, n).

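equed Must be 'N', 'R', 'C', or 'B'.

equed is an input argument if fact = 'F'. It specifies the form of
equilibration that was done:
If equed = 'N', no equilibration was done (always true if fact = 'N').
If equed = 'R', row equilibration was done, that is, A has been
premultiplied by diag(r).
If equed = 'C', column equilibration was done, that is, A has been
postmultiplied by diag(c).
If equed = 'B', both row and column equilibration were done, that is,
A has been replaced by diag(r)*A*diag(c).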

r, c Arrays: r (size n), c (size n). The array r contains the row scale
factors for A, and the array c contains the column scale factors for A.
These arrays are input arguments if fact = 'F' only; otherwise they
are output arguments.
If equed = 'R' or 'B', A is multiplied on the left by diag(r); if equed
= 'N' or 'C', r is not accessed.
If fact = 'F' and equed = 'R' or 'B', each element of r must be
positive.
If equed = 'C' or 'B', A is multiplied on the right by diag(c); if
equed = 'N' or 'R', c is not accessed.
If fact = 'F' and equed = 'C' or 'B', each element of c must be
positive.
Each element of r or c should be a power of the radix to ensure a
reliable solution and error estimates. Scaling by powers of the radix
does not cause rounding errors unless the result underflows or
overflows. Rounding errors during scaling lead to refining with a
matrix that is not equivalent to the input matrix, producing error
estimates that may not be reliable.

ldb The leading dimension of the array b; ldb≥ max(1, n) for column
major layout and ldb≥nrhs for row major layout.

ldx The leading dimension of the output array x; ldx≥ max(1, n) for
column major layout and ldx≥nrhs for row major layout.

n_err_bnds Number of error bounds to return for each right-hand side and each
type (normwise or componentwise). See the err_bnds_norm and
err_bnds_comp descriptions in the Output Parameters section below.

nparams Specifies the number of parameters set in params. If nparams ≤ 0, the
params array is never referenced and default values are used.

params Array, size max(1, nparams). Specifies algorithm parameters. If an
entry is less than 0.0, that entry is filled with the default value used
for that parameter. Only positions up to nparams are accessed;
defaults are used for higher-numbered parameters. If defaults are
acceptable, you can pass nparams = 0, which prevents the source
code from accessing the params argument.

params[0] : Whether to perform iterative refinement or not. Default:
1.0

=0.0 No refinement is performed and no error
bounds are computed.

=1.0 Use the double-precision refinement
algorithm, possibly with doubled-single
computations if the compilation environment
does not support double precision.

(Other values are reserved for future use.)

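params[1] : Maximum number of residual computations allowed for
refinement.

Default: 10.0

Aggressive: Set to 100.0 to permit convergence using approximate
factorizations or factorizations other than LU. If the factorization
uses a technique other than Gaussian elimination, the guarantees in
err_bnds_norm and err_bnds_comp may no longer be trustworthy.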

params[2] : Flag determining if the code will attempt to find a
solution with a small componentwise relative error in the
double-precision algorithm. Positive is true, 0.0 is false. Default: 1.0
(attempt componentwise convergence).
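The defaulting rules for nparams and params can be summarized in a few lines of C. This is a hypothetical helper that only restates the documented behavior (negative entries, and entries at or beyond nparams, take their defaults); it is not an MKL function, and the routine applies these rules on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not an MKL routine. Resolves the three documented
 * algorithm parameters of ?gesvxx / ?gbsvxx:
 *   out[0]: iterative refinement flag            (default 1.0)
 *   out[1]: max number of residual computations  (default 10.0)
 *   out[2]: componentwise convergence flag       (default 1.0) */
enum { SVXX_NPARAMS = 3 };

void resolve_svxx_params(int nparams, const double *params,
                         double out[SVXX_NPARAMS])
{
    static const double defaults[SVXX_NPARAMS] = {1.0, 10.0, 1.0};
    assert(out != NULL);
    for (int k = 0; k < SVXX_NPARAMS; ++k) {
        /* Only positions below nparams are read; negative entries
         * are replaced by the default value. */
        if (k < nparams && params != NULL && params[k] >= 0.0)
            out[k] = params[k];
        else
            out[k] = defaults[k];
    }
}
```

For example, nparams = 3 with params = {-1.0, -1.0, 0.0} keeps the default refinement settings but requests only normwise convergence.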

Output Parameters

x Array, size max(1, ldx*nrhs) for column major layout and max(1, ldx*n)
for row major layout.
If info = 0, the array x contains the solution n-by-nrhs matrix X to the
original system of equations. Note that A and B are modified on exit if
equed≠'N', and the solution to the equilibrated system is:

inv(diag(c))*X, if trans = 'N' and equed = 'C' or 'B'; or

inv(diag(r))*X, if trans = 'T' or 'C' and equed = 'R' or 'B'.

a Array a is not modified on exit if fact = 'F' or 'N', or if fact = 'E' and
equed = 'N'.
If equed≠'N', A is scaled on exit as follows:

equed = 'R': A = diag(r)*A

equed = 'C': A = A*diag(c)
equed = 'B': A = diag(r)*A*diag(c).

af If fact = 'N' or 'E', then af is an output argument and on exit returns
the factors L and U from the factorization A = P*L*U of the original matrix A
(if fact = 'N') or of the equilibrated matrix A (if fact = 'E'). See the
description of a for the form of the equilibrated matrix.

b Overwritten by diag(r)*B if trans = 'N' and equed = 'R' or 'B';
overwritten by diag(c)*B if trans = 'T' or 'C' and equed = 'C' or 'B';
not changed if equed = 'N'.

r, c These arrays are output arguments if fact≠'F'. Each element of these
arrays is a power of the radix. See the description of r, c in the Input
Parameters section.

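rcond Reciprocal scaled condition number. An estimate of the reciprocal Skeel
condition number of the matrix A after equilibration (if done). If rcond is
less than the machine precision (in particular, if rcond = 0), the matrix is
singular to working precision. Note that the error may still be small even if
this number is very small and the matrix appears ill-conditioned.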

rpvgrw Contains the reciprocal pivot growth factor norm(A)/norm(U), computed
with the "max absolute element" norm.
If this is much less than 1, the stability of the LU factorization of the
(equilibrated) matrix A could be poor. This also means that the solution X,
estimated condition numbers, and error bounds could be unreliable. If
factorization fails with 0 < info ≤ n, this parameter contains the reciprocal
pivot growth factor for the leading info columns of A. In ?gesvx, this
quantity is returned in rpivot.

err_bnds_norm Array of size nrhs*n_err_bnds. For each right-hand side, contains
information about various error bounds and condition numbers
corresponding to the normwise relative error, which is defined as follows:
Normwise relative error in the i-th solution vector:

$$\frac{\max_j \left| X_{\mathrm{true}}(j,i) - X(j,i) \right|}{\max_j \left| X(j,i) \right|}$$

The array is indexed by the type of error information as described below.

There are currently up to three pieces of information returned.

err=1 "Trust/don't trust" boolean. Trust the answer if
the reciprocal condition number is at least the
threshold sqrt(n)*slamch('Epsilon') for single
precision flavors and sqrt(n)*dlamch('Epsilon') for
double precision flavors.

err=2 "Guaranteed" error bound. The estimated
forward error, almost certainly within a factor of
10 of the true error so long as the next entry is
greater than the threshold sqrt(n)*slamch('Epsilon')
for single precision flavors and
sqrt(n)*dlamch('Epsilon') for double precision
flavors. This error bound should only be trusted
if the previous boolean is true.

err=3 Reciprocal condition number. Estimated
normwise reciprocal condition number.
Compared with the threshold
sqrt(n)*slamch('Epsilon') for single precision flavors
and sqrt(n)*dlamch('Epsilon') for double precision
flavors to determine if the error estimate is
"guaranteed". These reciprocal condition
numbers, for some appropriately scaled matrix Z,
are:

$$\frac{1}{\left\| Z^{-1} \right\|_\infty \, \left\| Z \right\|_\infty}$$

Let Z = S*A, where S scales each row by a power
of the radix so all absolute row sums of Z are
approximately 1.

The information for right-hand side i, where 1 ≤i≤nrhs, and type of error
err is stored in err_bnds_norm[(err-1)*nrhs + i - 1].

err_bnds_comp Array of size nrhs*n_err_bnds. For each right-hand side, contains
information about various error bounds and condition numbers
corresponding to the componentwise relative error, which is defined as
follows:
Componentwise relative error in the i-th solution vector:

$$\max_j \frac{\left| X_{\mathrm{true}}(j,i) - X(j,i) \right|}{\left| X(j,i) \right|}$$

The array is indexed by the type of error information as described below.

There are currently up to three pieces of information returned for each
right-hand side. If componentwise accuracy is not requested (params[2] =
0.0), then err_bnds_comp is not accessed.

err=1 "Trust/don't trust" boolean. Trust the answer if
the reciprocal condition number is at least the
threshold sqrt(n)*slamch('Epsilon') for single
precision flavors and sqrt(n)*dlamch('Epsilon') for
double precision flavors.

err=2 "Guaranteed" error bound. The estimated
forward error, almost certainly within a factor of
10 of the true error so long as the next entry is
greater than the threshold sqrt(n)*slamch('Epsilon')
for single precision flavors and
sqrt(n)*dlamch('Epsilon') for double precision
flavors. This error bound should only be trusted
if the previous boolean is true.

err=3 Reciprocal condition number. Estimated
componentwise reciprocal condition number.
Compared with the threshold
sqrt(n)*slamch('Epsilon') for single precision flavors
and sqrt(n)*dlamch('Epsilon') for double precision
flavors to determine if the error estimate is
"guaranteed". These reciprocal condition
numbers, for some appropriately scaled matrix Z,
are:

$$\frac{1}{\left\| Z^{-1} \right\|_\infty \, \left\| Z \right\|_\infty}$$

Let Z = S*(A*diag(x)), where x is the solution
for the current right-hand side and S scales each
row of A*diag(x) by a power of the radix so all
absolute row sums of Z are approximately 1.

The information for right-hand side i, where 1 ≤i≤nrhs, and type of error
err is stored in err_bnds_comp[(err-1)*nrhs + i - 1].

ipiv If fact = 'N' or 'E', then ipiv is an output argument and on exit
contains the pivot indices from the factorization A = P*L*U of the original
matrix A (if fact = 'N') or of the equilibrated matrix A (if fact = 'E').

equed If fact≠'F', then equed is an output argument. It specifies the form of
equilibration that was done (see the description of equed in the Input
Parameters section).

params If an entry is less than 0.0, that entry is filled with the default value used
for that parameter; otherwise the entry is not modified.
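The sketch below shows how calling code might read err_bnds_norm and err_bnds_comp, and how the normwise and componentwise error measures defined above differ. It assumes the column-major index (err-1)*nrhs + i - 1 stated above; it assumes, as the err=2 description implies, that the answer is trusted when the reciprocal condition number is at least sqrt(n)*eps; and it assumes eps = DBL_EPSILON/2, matching reference LAPACK's dlamch('Epsilon') under round-to-nearest. All function names are hypothetical.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>

enum { ERR_TRUST = 1, ERR_BOUND = 2, ERR_RCOND = 3 }; /* err = 1, 2, 3 */

static double abs_d(double v) { return v < 0.0 ? -v : v; }

/* Entry for right-hand side i (1-based) and error type err, using the
 * column-major index (err-1)*nrhs + i - 1 (assumed layout). */
double err_bnds_get(const double *err_bnds, int nrhs, int n_err_bnds,
                    int i, int err)
{
    assert(1 <= i && i <= nrhs);
    assert(1 <= err && err <= n_err_bnds && err <= 3);
    return err_bnds[(err - 1) * nrhs + (i - 1)];
}

/* Assumed trust test: rcond >= sqrt(n)*eps, compared in squared form
 * so that no sqrt() call (and no libm) is needed. */
bool rcond_trustworthy(double rcond, int n)
{
    const double eps = DBL_EPSILON / 2.0;
    return rcond >= 0.0 && rcond * rcond >= (double)n * eps * eps;
}

/* Normwise relative error: max_j |xtrue_j - x_j| / max_j |x_j|.
 * Assumes the computed x is not identically zero. */
double normwise_rel_error(int n, const double *x, const double *xtrue)
{
    double num = 0.0, den = 0.0;
    for (int j = 0; j < n; ++j) {
        double d = abs_d(xtrue[j] - x[j]), m = abs_d(x[j]);
        if (d > num)
            num = d;
        if (m > den)
            den = m;
    }
    assert(den > 0.0);
    return num / den;
}

/* Componentwise relative error: max_j |xtrue_j - x_j| / |x_j|.
 * Assumes every computed component x_j is nonzero. */
double componentwise_rel_error(int n, const double *x, const double *xtrue)
{
    double worst = 0.0;
    for (int j = 0; j < n; ++j) {
        assert(x[j] != 0.0);
        double e = abs_d(xtrue[j] - x[j]) / abs_d(x[j]);
        if (e > worst)
            worst = e;
    }
    return worst;
}
```

For x = {1e6, 1e-6} and x_true = {1e6, 2e-6}, the normwise error is about 1e-12 while the componentwise error is 1, which illustrates why a small componentwise error (params[2] > 0) is the stronger requirement.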

Return Values
This function returns a value info.

If info = 0, the execution is successful. The solution to every right-hand side is guaranteed.

If info = -i, parameter i had an illegal value.

If 0 < info ≤ n: U(info, info) is exactly zero. The factorization has been completed, but the factor U is exactly
singular, so the solution and error bounds could not be computed; rcond = 0 is returned.

See Also
Matrix Storage Schemes

?gbsv
Computes the solution to the system of linear
equations with a band coefficient matrix A and
multiple right-hand sides.

Syntax
lapack_int LAPACKE_sgbsv( int matrix_layout, lapack_int n, lapack_int kl, lapack_int
ku, lapack_int nrhs, float* ab, lapack_int ldab, lapack_int* ipiv, float* b,
lapack_int ldb );
lapack_int LAPACKE_dgbsv( int matrix_layout, lapack_int n, lapack_int kl, lapack_int
ku, lapack_int nrhs, double* ab, lapack_int ldab, lapack_int* ipiv, double* b,
lapack_int ldb );
lapack_int LAPACKE_cgbsv( int matrix_layout, lapack_int n, lapack_int kl, lapack_int
ku, lapack_int nrhs, lapack_complex_float* ab, lapack_int ldab, lapack_int*
ipiv, lapack_complex_float* b, lapack_int ldb );
lapack_int LAPACKE_zgbsv( int matrix_layout, lapack_int n, lapack_int kl, lapack_int
ku, lapack_int nrhs, lapack_complex_double* ab, lapack_int ldab, lapack_int*
ipiv, lapack_complex_double* b, lapack_int ldb );

Include Files
• mkl.h

Description

The routine solves for X the real or complex system of linear equations A*X = B, where A is an n-by-n band
matrix with kl subdiagonals and ku superdiagonals, the columns of matrix B are individual right-hand sides,
and the columns of X are the corresponding solutions.
The LU decomposition with partial pivoting and row interchanges is used to factor A as A = L*U, where L is a
product of permutation and unit lower triangular matrices with kl subdiagonals, and U is upper triangular
with kl+ku superdiagonals. The factored form of A is then used to solve the system of equations A*X = B.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

n The order of A. The number of rows in B; n≥ 0.

kl The number of subdiagonals within the band of A; kl≥ 0.

ku The number of superdiagonals within the band of A; ku≥ 0.

nrhs The number of right-hand sides. The number of columns in B; nrhs ≥ 0.

ab, b Arrays: ab (size max(1, ldab*n)), b (size max(1, ldb*nrhs) for
column major layout and max(1, ldb*n) for row major layout).

The array ab contains the matrix A in band storage (see Matrix
Storage Schemes).
The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.

ldab The leading dimension of the array ab; ldab ≥ 2*kl + ku + 1.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.
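To show what "band storage" means for ab in ?gbsv, the sketch below packs a dense matrix into the band array. It assumes the column-major LAPACK band layout in which A(i,j) (0-based) is stored in row kl+ku+i-j of column j of ab, with the first kl rows reserved for the fill-in produced by the LU factorization; the row-major variant is not covered. The function name `gb_pack_colmajor` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies a dense column-major n-by-n matrix A with
 * kl subdiagonals and ku superdiagonals into the band array ab for
 * LAPACK_COL_MAJOR layout. A(i,j) (0-based) goes to row kl+ku+i-j of
 * column j; the top kl rows are zeroed and left for LU fill-in. */
void gb_pack_colmajor(int n, int kl, int ku, const double *a, int lda,
                      double *ab, int ldab)
{
    assert(n >= 0 && kl >= 0 && ku >= 0);
    assert(ldab >= 2 * kl + ku + 1);
    for (size_t k = 0; k < (size_t)ldab * (size_t)n; ++k)
        ab[k] = 0.0;
    for (int j = 0; j < n; ++j) {
        int i_lo = j - ku > 0 ? j - ku : 0;          /* first row in band */
        int i_hi = j + kl < n - 1 ? j + kl : n - 1;  /* last row in band  */
        for (int i = i_lo; i <= i_hi; ++i)
            ab[(size_t)(kl + ku + i - j) + (size_t)j * ldab] =
                a[(size_t)i + (size_t)j * lda];
    }
}
```

For a tridiagonal 3-by-3 matrix (kl = ku = 1, ldab = 4), the diagonal lands in row 2 of ab, the superdiagonal in row 1, the subdiagonal in row 3, and row 0 stays free for fill-in.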

Output Parameters

ab Overwritten by L and U. U is stored as an upper triangular band
matrix with kl + ku superdiagonals and L is stored as a lower
triangular band matrix with kl subdiagonals. See Matrix Storage
Schemes.

b Overwritten by the solution matrix X.

ipiv Array, size at least max(1, n). The pivot indices: row i was
interchanged with row ipiv[i-1].
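The ipiv convention above describes a sequence of row interchanges applied in order i = 1, ..., n, with 1-based pivot values. The sketch below, a hypothetical helper not part of MKL, replays that sequence on an index array to show which original row ends up in each position. It only decodes the pivot indices; it does not reproduce how the band solve interleaves these interchanges with the L factor, and `int` stands in for `lapack_int`.

```c
#include <assert.h>

/* Replays "row i was interchanged with row ipiv[i-1]" for i = 1..n on
 * the identity permutation. On return, perm[k] is the 0-based index of
 * the original row that occupies position k. */
void ipiv_to_permutation(int n, const int *ipiv, int *perm)
{
    for (int k = 0; k < n; ++k)
        perm[k] = k;
    for (int i = 0; i < n; ++i) {
        int p = ipiv[i] - 1; /* ipiv holds 1-based row numbers */
        assert(p >= 0 && p < n);
        int tmp = perm[i];
        perm[i] = perm[p];
        perm[p] = tmp;
    }
}
```

Because each interchange acts on the current row order, the result depends on the order in which the swaps are replayed; ipiv = {3, 3, 3} gives the order {2, 0, 1}, not a single swap of rows 0 and 2.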

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, U(i, i) is exactly zero. The factorization has been completed, but the factor U is exactly singular,
so the solution could not be computed.

See Also
Matrix Storage Schemes

?gbsvx
Computes the solution to the real or complex system
of linear equations with a band coefficient matrix A
and multiple right-hand sides, and provides error
bounds on the solution.

Syntax
lapack_int LAPACKE_sgbsvx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int kl, lapack_int ku, lapack_int nrhs, float* ab, lapack_int ldab, float* afb,
lapack_int ldafb, lapack_int* ipiv, char* equed, float* r, float* c, float* b,
lapack_int ldb, float* x, lapack_int ldx, float* rcond, float* ferr, float* berr, float*
rpivot );
lapack_int LAPACKE_dgbsvx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int kl, lapack_int ku, lapack_int nrhs, double* ab, lapack_int ldab, double* afb,
lapack_int ldafb, lapack_int* ipiv, char* equed, double* r, double* c, double* b,
lapack_int ldb, double* x, lapack_int ldx, double* rcond, double* ferr, double* berr,
double* rpivot );
lapack_int LAPACKE_cgbsvx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int kl, lapack_int ku, lapack_int nrhs, lapack_complex_float* ab, lapack_int
ldab, lapack_complex_float* afb, lapack_int ldafb, lapack_int* ipiv, char* equed,
float* r, float* c, lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x,
lapack_int ldx, float* rcond, float* ferr, float* berr, float* rpivot );
lapack_int LAPACKE_zgbsvx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int kl, lapack_int ku, lapack_int nrhs, lapack_complex_double* ab, lapack_int
ldab, lapack_complex_double* afb, lapack_int ldafb, lapack_int* ipiv, char* equed,
double* r, double* c, lapack_complex_double* b, lapack_int ldb, lapack_complex_double*
x, lapack_int ldx, double* rcond, double* ferr, double* berr, double* rpivot );

Include Files
• mkl.h

Description
The routine uses the LU factorization to compute the solution to a real or complex system of linear equations
A*X = B, A^T*X = B, or A^H*X = B, where A is a band matrix of order n with kl subdiagonals and ku
superdiagonals, the columns of matrix B are individual right-hand sides, and the columns of X are the
corresponding solutions.
Error bounds on the solution and a condition estimate are also provided.
The routine ?gbsvx performs the following steps:

1. If fact = 'E', real scaling factors r and c are computed to equilibrate the system:

trans = 'N': diag(r)*A*diag(c)*inv(diag(c))*X = diag(r)*B
trans = 'T': (diag(r)*A*diag(c))^T*inv(diag(r))*X = diag(c)*B
trans = 'C': (diag(r)*A*diag(c))^H*inv(diag(r))*X = diag(c)*B
Whether the system will be equilibrated depends on the scaling of the matrix A, but if equilibration is
used, A is overwritten by diag(r)*A*diag(c) and B by diag(r)*B (if trans = 'N') or diag(c)*B (if
trans = 'T' or 'C').
2. If fact = 'N' or 'E', the LU decomposition is used to factor the matrix A (after equilibration if fact =
'E') as A = L*U, where L is a product of permutation and unit lower triangular matrices with kl
subdiagonals, and U is upper triangular with kl+ku superdiagonals.
3. If some U(i, i) = 0, so that U is exactly singular, then the routine returns with info = i. Otherwise, the
factored form of A is used to estimate the condition number of the matrix A. If the reciprocal of the
condition number is less than machine precision, info = n + 1 is returned as a warning, but the
routine still goes on to solve for X and compute error bounds as described below.
4. The system of equations is solved for X using the factored form of A.
5. Iterative refinement is applied to improve the computed solution matrix and calculate error bounds and
backward error estimates for it.
6. If equilibration was used, the matrix X is premultiplied by diag(c) (if trans = 'N') or diag(r) (if
trans = 'T' or 'C') so that it solves the original system before equilibration.
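Steps 1 and 6 for trans = 'T' can be checked on a tiny example. With Ae = diag(r)*A*diag(c), solving Ae^T*y = diag(c)*b and setting x = diag(r)*y gives A^T*x = b, because Ae^T*y = diag(c)*A^T*diag(r)*y = diag(c)*A^T*x. The sketch uses a dense 2-by-2 matrix (any 2-by-2 matrix is a band matrix with kl = ku = 1) and Cramer's rule in place of the band LU factorization; both are simplifications, and the names are hypothetical.

```c
#include <assert.h>

/* Solves the 2-by-2 system M*y = f (M row-major) by Cramer's rule,
 * standing in for the band LU factorization. Returns -1 if singular. */
static int solve2(const double m[4], const double f[2], double y[2])
{
    double det = m[0] * m[3] - m[1] * m[2];
    if (det == 0.0)
        return -1;
    y[0] = (f[0] * m[3] - m[1] * f[1]) / det;
    y[1] = (m[0] * f[1] - f[0] * m[2]) / det;
    return 0;
}

/* trans = 'T': solve (diag(r)*A*diag(c))^T * y = diag(c)*b, then
 * x = diag(r)*y, which satisfies A^T * x = b. A is 2-by-2, row-major. */
int solve_equilibrated_t(const double a[4], const double b[2],
                         const double r[2], const double c[2], double x[2])
{
    double aet[4], be[2], y[2];
    for (int i = 0; i < 2; ++i) {
        assert(r[i] > 0.0 && c[i] > 0.0);
        be[i] = c[i] * b[i];
        for (int j = 0; j < 2; ++j)
            aet[j * 2 + i] = r[i] * a[i * 2 + j] * c[j]; /* transpose of Ae */
    }
    if (solve2(aet, be, y) != 0)
        return -1;
    for (int i = 0; i < 2; ++i)
        x[i] = r[i] * y[i];
    return 0;
}
```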

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

fact Must be 'F', 'N', or 'E'.

Specifies whether the factored form of the matrix A is supplied on
entry, and if not, whether the matrix A should be equilibrated before it
is factored.
If fact = 'F': on entry, afb and ipiv contain the factored form of A.
If equed is not 'N', the matrix A has been equilibrated with scaling factors
given by r and c.
ab, afb, and ipiv are not modified.
If fact = 'N', the matrix A will be copied to afb and factored.

If fact = 'E', the matrix A will be equilibrated if necessary, then
copied to afb and factored.

trans Must be 'N', 'T', or 'C'.

Specifies the form of the system of equations:

If trans = 'N', the system has the form A*X = B (No transpose).

If trans = 'T', the system has the form A^T*X = B (Transpose).

If trans = 'C', the system has the form A^H*X = B (Transpose for
real flavors, conjugate transpose for complex flavors).

n The number of linear equations, the order of the matrix A; n≥ 0.

kl The number of subdiagonals within the band of A; kl≥ 0.

ku The number of superdiagonals within the band of A; ku≥ 0.

nrhs The number of right hand sides, the number of columns of the
matrices B and X; nrhs≥ 0.

ab, afb, b Arrays: ab (size max(1, ldab*n)), afb (size max(1, ldafb*n)), b (size max(1,
ldb*nrhs) for column major layout and max(1, ldb*n) for row major
layout).
The array ab contains the matrix A in band storage (see Matrix
Storage Schemes). If fact = 'F' and equed is not 'N', then A must
have been equilibrated by the scaling factors in r and/or c.
The array afb is an input argument if fact = 'F'. It contains the
factored form of the matrix A, that is, the factors L and U from the
factorization A = P*L*U as computed by ?gbtrf. U is stored as an
upper triangular band matrix with kl + ku superdiagonals. L is stored
as a lower triangular band matrix with kl subdiagonals. If equed is not
'N', then afb is the factored form of the equilibrated matrix A.

The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.

ldab The leading dimension of ab; ldab≥kl+ku+1.

ldafb The leading dimension of afb; ldafb≥ 2*kl+ku+1.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ipiv Array, size at least max(1, n). The array ipiv is an input argument if
fact = 'F'. It contains the pivot indices from the factorization A =
P*L*U as computed by ?gbtrf; row i of the matrix was interchanged
with row ipiv[i-1].

equed Must be 'N', 'R', 'C', or 'B'.

equed is an input argument if fact = 'F'. It specifies the form of
equilibration that was done:
If equed = 'N', no equilibration was done (always true if fact =
'N').
If equed = 'R', row equilibration was done, that is, A has been
premultiplied by diag(r).
If equed = 'C', column equilibration was done, that is, A has been
postmultiplied by diag(c).
If equed = 'B', both row and column equilibration were done, that is,
A has been replaced by diag(r)*A*diag(c).

r, c Arrays: r (size n), c (size n).

The array r contains the row scale factors for A, and the array c
contains the column scale factors for A. These arrays are input
arguments if fact = 'F' only; otherwise they are output arguments.

If equed = 'R' or 'B', A is multiplied on the left by diag(r); if
equed = 'N' or 'C', r is not accessed.
If fact = 'F' and equed = 'R' or 'B', each element of r must be
positive.
If equed = 'C' or 'B', A is multiplied on the right by diag(c); if
equed = 'N' or 'R', c is not accessed.
If fact = 'F' and equed = 'C' or 'B', each element of c must be
positive.

ldx The leading dimension of the output array x; ldx≥ max(1, n) for
column major layout and ldx≥nrhs for row major layout.
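The four equed values correspond to four ways of scaling A. The sketch below applies them to a dense column-major matrix; dense storage instead of band storage is a simplification for readability, and `apply_equed` is a hypothetical name, not an MKL routine. As in the description above, r is not read for 'N' or 'C', and c is not read for 'N' or 'R'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: scales a dense column-major m-by-n matrix in place
 *   'N': A unchanged     'R': diag(r)*A
 *   'C': A*diag(c)       'B': diag(r)*A*diag(c)
 * Returns 0 on success or -1 for an unrecognized equed value. */
int apply_equed(char equed, int m, int n, double *a, int lda,
                const double *r, const double *c)
{
    int use_r = (equed == 'R' || equed == 'B');
    int use_c = (equed == 'C' || equed == 'B');
    if (!use_r && !use_c && equed != 'N')
        return -1;
    assert(m >= 0 && n >= 0 && lda >= (m > 1 ? m : 1));
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            double *aij = &a[(size_t)i + (size_t)j * lda];
            if (use_r)
                *aij *= r[i]; /* row scaling */
            if (use_c)
                *aij *= c[j]; /* column scaling */
        }
    }
    return 0;
}
```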

Output Parameters

x Array, size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout.

If info = 0 or info = n+1, the array x contains the solution matrix
X to the original system of equations. Note that A and B are modified
on exit if equed≠'N', and the solution to the equilibrated system is:
inv(diag(c))*X, if trans = 'N' and equed = 'C' or 'B';
inv(diag(r))*X, if trans = 'T' or 'C' and equed = 'R' or 'B'.

ab Array ab is not modified on exit if fact = 'F' or 'N', or if fact =
'E' and equed = 'N'.
If equed≠'N', A is scaled on exit as follows:

equed = 'R': A = diag(r)*A

equed = 'C': A = A*diag(c)

equed = 'B': A = diag(r)*A*diag(c).

afb If fact = 'N' or 'E', then afb is an output argument and on exit
returns details of the LU factorization of the original matrix A (if fact
= 'N') or of the equilibrated matrix A (if fact = 'E'). See the
description of ab for the form of the equilibrated matrix.

b Overwritten by diag(r)*b if trans = 'N' and equed = 'R' or 'B';
overwritten by diag(c)*b if trans = 'T' or 'C' and equed = 'C' or 'B';
not changed if equed = 'N'.

r, c These arrays are output arguments if fact≠'F'. See the description
of r, c in the Input Parameters section.

rcond An estimate of the reciprocal condition number of the matrix A after
equilibration (if done).
If rcond is less than the machine precision (in particular, if rcond = 0),
the matrix is singular to working precision. This condition is indicated
by a return code of info > 0.

ferr Array, size at least max(1, nrhs). Contains the estimated forward
error bound for each solution vector xj (the j-th column of the solution
matrix X). If xtrue is the true solution corresponding to xj, ferr[j-1]
is an estimated upper bound for the magnitude of the largest element
in (xj - xtrue) divided by the magnitude of the largest element in xj.
The estimate is as reliable as the estimate for rcond, and is almost
always a slight overestimate of the true error.

berr Array, size at least max(1, nrhs). Contains the component-wise
relative backward error for each solution vector xj, that is, the
smallest relative change in any element of A or B that makes xj an
exact solution.

ipiv If fact = 'N' or 'E', then ipiv is an output argument and on exit
contains the pivot indices from the factorization A = L*U of the
original matrix A (if fact = 'N') or of the equilibrated matrix A (if
fact = 'E').

equed If fact≠'F', then equed is an output argument. It specifies the form
of equilibration that was done (see the description of equed in the Input
Parameters section).

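rpivot On exit, rpivot contains the reciprocal pivot growth factor norm(A)/norm(U),
computed with the "max absolute element" norm.
If rpivot is much less than 1, then the stability of the LU
factorization of the (equilibrated) matrix A could be poor. This also
means that the solution x, condition estimator rcond, and forward
error bound ferr could be unreliable. If factorization fails with 0 <
info ≤ n, then rpivot contains the reciprocal pivot growth factor for
the leading info columns of A.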
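The berr output is the componentwise relative backward error. The sketch below computes it for one solution vector with the standard formula berr = max_i |b - A*x|_i / (|A|*|x| + |b|)_i, which is the smallest relative perturbation of the entries of A and b that makes x exact. It is an illustration under stated assumptions: dense column-major storage instead of band storage, trans = 'N' only, and every entry of |A|*|x| + |b| assumed positive (the routine itself guards against zero denominators). The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

static double abs_d(double v) { return v < 0.0 ? -v : v; }

/* Componentwise relative backward error of x for A*x = b, where A is a
 * dense column-major n-by-n matrix (hypothetical helper, not MKL):
 *   berr = max_i |b - A*x|_i / (|A|*|x| + |b|)_i */
double componentwise_backward_error(int n, const double *a, int lda,
                                    const double *x, const double *b)
{
    double berr = 0.0;
    for (int i = 0; i < n; ++i) {
        double res = b[i];        /* accumulates (b - A*x)_i   */
        double den = abs_d(b[i]); /* accumulates (|A||x|+|b|)_i */
        for (int j = 0; j < n; ++j) {
            double aij = a[(size_t)i + (size_t)j * lda];
            res -= aij * x[j];
            den += abs_d(aij) * abs_d(x[j]);
        }
        assert(den > 0.0);
        double ratio = abs_d(res) / den;
        if (ratio > berr)
            berr = ratio;
    }
    return berr;
}
```

A berr near the working precision means x is the exact solution of a nearby system, even when ferr is large because A is ill-conditioned.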

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, and i ≤ n, then U(i, i) is exactly zero. The factorization has been completed, but the factor U is
exactly singular, so the solution and error bounds could not be computed; rcond = 0 is returned. If info =
i, and i = n+1, then U is nonsingular, but rcond is less than machine precision, meaning that the matrix is
singular to working precision. Nevertheless, the solution and error bounds are computed because there are a
number of situations where the computed solution can be more accurate than the value of rcond would
suggest.

See Also
Matrix Storage Schemes

?gbsvxx
Uses extra precise iterative refinement to compute the
solution to the system of linear equations with a
banded coefficient matrix A and multiple right-hand
sides

Syntax
lapack_int LAPACKE_sgbsvxx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int kl, lapack_int ku, lapack_int nrhs, float* ab, lapack_int ldab, float* afb,
lapack_int ldafb, lapack_int* ipiv, char* equed, float* r, float* c, float* b,
lapack_int ldb, float* x, lapack_int ldx, float* rcond, float* rpvgrw, float* berr,
lapack_int n_err_bnds, float* err_bnds_norm, float* err_bnds_comp, lapack_int nparams,
const float* params );
lapack_int LAPACKE_dgbsvxx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int kl, lapack_int ku, lapack_int nrhs, double* ab, lapack_int ldab, double* afb,
lapack_int ldafb, lapack_int* ipiv, char* equed, double* r, double* c, double* b,
lapack_int ldb, double* x, lapack_int ldx, double* rcond, double* rpvgrw, double* berr,
lapack_int n_err_bnds, double* err_bnds_norm, double* err_bnds_comp, lapack_int
nparams, const double* params );
lapack_int LAPACKE_cgbsvxx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int kl, lapack_int ku, lapack_int nrhs, lapack_complex_float* ab, lapack_int
ldab, lapack_complex_float* afb, lapack_int ldafb, lapack_int* ipiv, char* equed,
float* r, float* c, lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x,
lapack_int ldx, float* rcond, float* rpvgrw, float* berr, lapack_int n_err_bnds, float*
err_bnds_norm, float* err_bnds_comp, lapack_int nparams, const float* params );
lapack_int LAPACKE_zgbsvxx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int kl, lapack_int ku, lapack_int nrhs, lapack_complex_double* ab, lapack_int
ldab, lapack_complex_double* afb, lapack_int ldafb, lapack_int* ipiv, char* equed,
double* r, double* c, lapack_complex_double* b, lapack_int ldb, lapack_complex_double*
x, lapack_int ldx, double* rcond, double* rpvgrw, double* berr, lapack_int n_err_bnds,
double* err_bnds_norm, double* err_bnds_comp, lapack_int nparams, const double*
params );

Include Files
• mkl.h

Description

The routine uses the LU factorization to compute the solution to a real or complex system of linear equations
A*X = B, A^T*X = B, or A^H*X = B, where A is an n-by-n banded matrix, the columns of the matrix B are
individual right-hand sides, and the columns of X are the corresponding solutions.
Both normwise and maximum componentwise error bounds are also provided on request. The routine returns
a solution with a small guaranteed error (O(eps), where eps is the working machine precision) unless the
matrix is very ill-conditioned, in which case a warning is returned. Relevant condition numbers are also
calculated and returned.
The routine accepts user-provided factorizations and equilibration factors; see definitions of the fact and
equed options. Solving with refinement and using a factorization from a previous call of the routine also
produces a solution with O(eps) errors or warnings, but this may not be true for general user-provided
factorizations and equilibration factors if they differ from what the routine would itself produce.
The routine ?gbsvxx performs the following steps:

1. If fact = 'E', scaling factors r and c are computed to equilibrate the system:
trans = 'N': diag(r)*A*diag(c)*inv(diag(c))*X = diag(r)*B
trans = 'T': (diag(r)*A*diag(c))^T*inv(diag(r))*X = diag(c)*B
trans = 'C': (diag(r)*A*diag(c))^H*inv(diag(r))*X = diag(c)*B
Whether or not the system will be equilibrated depends on the scaling of the matrix A, but if
equilibration is used, A is overwritten by diag(r)*A*diag(c) and B by diag(r)*B (if trans = 'N') or
diag(c)*B (if trans = 'T' or 'C').
2. If fact = 'N' or 'E', the LU decomposition is used to factor the matrix A (after equilibration if fact
= 'E') as A = P*L*U, where P is a permutation matrix, L is a unit lower triangular matrix, and U is
upper triangular.
3. If some U_{i,i} = 0, so that U is exactly singular, then the routine returns with info = i. Otherwise, the
factored form of A is used to estimate the condition number of the matrix A (see the rcond parameter).
If the reciprocal of the condition number is less than machine precision, the routine still goes on to
solve for X and compute error bounds.
4. The system of equations is solved for X using the factored form of A.
5. By default, unless params[0] is set to zero, the routine applies iterative refinement to improve the
computed solution matrix and calculate error bounds. Refinement calculates the residual to at least
twice the working precision.
6. If equilibration was used, the matrix X is premultiplied by diag(c) (if trans = 'N') or diag(r) (if
trans = 'T' or 'C') so that it solves the original system before equilibration.
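The scaling in step 1 and the unscaling in step 6 can be illustrated for trans = 'N'. The sketch below is not MKL code: it assumes a small dense, column-major matrix with a single right-hand side (rather than the band storage ?gbsvxx actually uses), and both function names are hypothetical.

```c
/* Step 1 (trans = 'N'): overwrite A with diag(r)*A*diag(c) and b with diag(r)*b.
   a is n-by-n, column-major with leading dimension lda; b is one right-hand side. */
void equilibrate_notrans(int n, double *a, int lda, double *b,
                         const double *r, const double *c)
{
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i)
            a[i + j * lda] *= r[i] * c[j];
    for (int i = 0; i < n; ++i)
        b[i] *= r[i];
}

/* Step 6 (trans = 'N'): if y solves the equilibrated system,
   then x = diag(c)*y solves the original system A*x = b. Overwrites y with x. */
void unscale_solution_notrans(int n, double *y, const double *c)
{
    for (int i = 0; i < n; ++i)
        y[i] *= c[i];
}
```

Because diag(r)*A*diag(c)*y = diag(r)*b implies A*(diag(c)*y) = b, only c is needed to recover X when trans = 'N'.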

Input Parameters

matrix_layout Specifies whether two-dimensional array storage is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

fact Must be 'F', 'N', or 'E'.

Specifies whether or not the factored form of the matrix A is supplied
on entry, and if not, whether the matrix A should be equilibrated
before it is factored.
If fact = 'F', on entry, afb and ipiv contain the factored form of
A. If equed is not 'N', the matrix A has been equilibrated with scaling
factors given by r and c. Parameters ab, afb, and ipiv are not
modified.
If fact = 'N', the matrix A will be copied to afb and factored.
If fact = 'E', the matrix A will be equilibrated, if necessary, copied
to afb and factored.

trans Must be 'N', 'T', or 'C'.

Specifies the form of the system of equations:

If trans = 'N', the system has the form A*X = B (No transpose).

If trans = 'T', the system has the form A^T*X = B (Transpose).

If trans = 'C', the system has the form A^H*X = B (Conjugate
Transpose = Transpose for real flavors, Conjugate Transpose for
complex flavors).

n The number of linear equations; the order of the matrix A; n≥ 0.

kl The number of subdiagonals within the band of A; kl≥ 0.

ku The number of superdiagonals within the band of A; ku≥ 0.

nrhs The number of right-hand sides; the number of columns of the
matrices B and X; nrhs ≥ 0.

ab, afb, b Arrays: ab (size max(ldab*n)), afb (size max(ldafb*n)), b (size max(1,
ldb*nrhs) for column major layout and max(1, ldb*n) for row major
layout).
The array ab contains the matrix A in band storage.
If fact = 'F' and equed is not 'N', then ab must have been
equilibrated by the scaling factors in r and/or c.

The array afb is an input argument if fact = 'F'. It contains the
factored form of the banded matrix A, that is, the factors L and U from
the factorization A = P*L*U as computed by ?gbtrf. U is stored as
an upper triangular banded matrix with kl + ku superdiagonals. L is
stored as a lower triangular band matrix with kl subdiagonals. If equed
is not 'N', then afb is the factored form of the equilibrated matrix A.

The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.

ldab The leading dimension of the array ab; ldab≥kl+ku+1.

ldafb The leading dimension of the array afb; ldafb≥ 2*kl+ku+1.
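The band layout behind ldab ≥ kl+ku+1 can be illustrated with a small packing routine. This sketch assumes the conventional LAPACK column-major band scheme, in which element A(i,j) (0-based) is stored at ab[(ku + i - j) + j*ldab]; row major LAPACKE layouts differ, and the helper names are hypothetical. In that convention the afb array needs kl additional rows (hence ldafb ≥ 2*kl+ku+1) because the factor U can have kl + ku superdiagonals.

```c
/* Column-major band index of A(i,j), 0-based; meaningful only when
   max(0, j - ku) <= i <= min(n - 1, j + kl). */
static int band_index(int i, int j, int ku, int ldab)
{
    return (ku + i - j) + j * ldab;
}

/* Copy the band of a dense column-major n-by-n matrix into ab.
   Requires ldab >= kl + ku + 1; positions of ab outside the band are left unchanged. */
void dense_to_band(int n, int kl, int ku, const double *a, int lda,
                   double *ab, int ldab)
{
    for (int j = 0; j < n; ++j) {
        int ilo = j - ku > 0 ? j - ku : 0;
        int ihi = j + kl < n - 1 ? j + kl : n - 1;
        for (int i = ilo; i <= ihi; ++i)
            ab[band_index(i, j, ku, ldab)] = a[i + j * lda];
    }
}
```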

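equed Must be 'N', 'R', 'C', or 'B'.

equed is an input argument if fact = 'F'. It specifies the form of
equilibration that was done:
If equed = 'N', no equilibration was done (always true if fact = 'N').
If equed = 'R', row equilibration was done, that is, A has been premultiplied by diag(r).
If equed = 'C', column equilibration was done, that is, A has been postmultiplied by diag(c).
If equed = 'B', both row and column equilibration was done, that is, A has been replaced by
diag(r)*A*diag(c).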

r, c Arrays: r (size n), c (size n). The array r contains the row scale factors
for A, and the array c contains the column scale factors for A. These
arrays are input arguments if fact = 'F' only; otherwise they are
output arguments.
If equed = 'R' or 'B', A is multiplied on the left by diag(r); if equed
= 'N' or 'C', r is not accessed.
If fact = 'F' and equed = 'R' or 'B', each element of r must be
positive.
If equed = 'C' or 'B', A is multiplied on the right by diag(c); if
equed = 'N' or 'R', c is not accessed.
If fact = 'F' and equed = 'C' or 'B', each element of c must be
positive.
Each element of r or c should be a power of the radix to ensure a
reliable solution and error estimates. Scaling by powers of the radix
does not cause rounding errors unless the result underflows or
overflows. Rounding errors during scaling lead to refining with a
matrix that is not equivalent to the input matrix, producing error
estimates that may not be reliable.
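The advice to use powers of the radix can be checked directly: on a binary machine, scaling by a power of 2 changes only the exponent, so multiplying and then dividing back is exact unless the result underflows or overflows. The helper below is hypothetical, assumes radix 2, and rounds a positive scale factor down to a power of two.

```c
/* Largest power of two that does not exceed s (s > 0 and finite).
   Assumes radix 2; uses repeated doubling/halving so no libm call is needed. */
double pow2_floor(double s)
{
    double p = 1.0;
    if (s >= 1.0) {
        while (p * 2.0 <= s)
            p *= 2.0;
    } else {
        while (p > s)
            p *= 0.5;
    }
    return p;
}
```

For example, a computed scale of 0.3 becomes 0.25 and 3.0 becomes 2.0; for any x whose scaled value neither underflows nor overflows, (x * 0.25) / 0.25 == x holds exactly.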

ldb The leading dimension of the array b; ldb≥ max(1, n) for column
major layout and ldb≥nrhs for row major layout.

ldx The leading dimension of the output array x; ldx≥ max(1, n) for
column major layout and ldx≥nrhs for row major layout.
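The two lower bounds on ldb and ldx follow from how element (i, j) of an n-by-nrhs matrix is addressed in each layout. The sketch below uses its own hypothetical enum rather than the LAPACKE constants, and 0-based indices.

```c
#include <stddef.h>

/* Hypothetical stand-ins for LAPACK_COL_MAJOR / LAPACK_ROW_MAJOR. */
enum layout { COL_MAJOR, ROW_MAJOR };

/* Offset of element (i, j) (0-based) in a matrix with leading dimension ld.
   Column major: each column is contiguous, so ld >= number of rows (n).
   Row major: each row is contiguous, so ld >= number of columns (nrhs). */
size_t elem_offset(enum layout lay, size_t i, size_t j, size_t ld)
{
    return lay == COL_MAJOR ? i + j * ld : i * ld + j;
}
```

With n = 3 and nrhs = 2, column major storage with ldb = 3 places element (1, 1) at offset 4, while row major storage with ldb = 2 places it at offset 3; a larger leading dimension only adds padding between columns (or rows).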

n_err_bnds Number of error bounds to return for each right-hand side and each
type (normwise or componentwise). See the err_bnds_norm and
err_bnds_comp descriptions in the Output Parameters section below.

nparams Specifies the number of parameters set in params. If nparams ≤ 0, the params
array is never referenced and default values are used.

params Array, size max(1, nparams). Specifies algorithm parameters. If an
entry is less than 0.0, that entry is filled with the default value used
for that parameter. Only positions up to nparams are accessed;
defaults are used for higher-numbered parameters. If defaults are
acceptable, you can pass nparams = 0, which prevents the source
code from accessing the params argument.

params[0] : Whether to perform iterative refinement or not. Default:
1.0 (for single precision flavors), 1.0D+0 (for double precision
flavors).

=0.0 No refinement is performed and no error bounds are computed.

=1.0 Use the extra-precise refinement algorithm.

(Other values are reserved for future use.)

params[1] : Maximum number of residual computations allowed for
refinement.

Default: 10.0

Aggressive: Set to 100.0 to permit convergence using

params[2] : Flag determining if the code will attempt to find a
solution with a small componentwise relative error in the
double-precision algorithm. Positive is true, 0.0 is false. Default: 1.0 (attempt
componentwise convergence).
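The rule for params (negative entries are replaced by defaults, entries at or beyond nparams are never read) can be mirrored in a small helper. This is a hypothetical illustration of the documented behavior, not MKL source; the default values are the ones listed above for params[0] to params[2].

```c
/* Documented defaults for params[0..2]: refinement on, at most 10 residual
   computations, attempt componentwise convergence. */
static const double param_defaults[3] = {1.0, 10.0, 1.0};

/* Fill effective[0..2]. Entries with index < nparams are read from params,
   and a negative entry is replaced by its default (and written back into
   params). Entries at or beyond nparams use defaults; params is not read there. */
void resolve_params(int nparams, double *params, double effective[3])
{
    for (int k = 0; k < 3; ++k) {
        if (k < nparams) {
            if (params[k] < 0.0)
                params[k] = param_defaults[k];
            effective[k] = params[k];
        } else {
            effective[k] = param_defaults[k];
        }
    }
}
```

For example, passing nparams = 1 with params[0] = 0.0 turns refinement off while the other two parameters keep their defaults.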

Output Parameters

ab Array ab is not modified on exit if fact = 'F' or 'N', or if fact = 'E'
and equed = 'N'.

If equed ≠ 'N', A is scaled on exit as follows:

equed = 'R': A = diag(r)*A
equed = 'C': A = A*diag(c)
equed = 'B': A = diag(r)*A*diag(c).

afb If fact = 'N' or 'E', then afb is an output argument and on exit returns
the factors L and U from the factorization A = P*L*U of the original matrix A
(if fact = 'N') or of the equilibrated matrix A (if fact = 'E').

b Overwritten by diag(r)*B if trans = 'N' and equed = 'R' or 'B';
overwritten by diag(c)*B if trans = 'T' or 'C' and equed = 'C' or 'B';
not changed if equed = 'N'.

r, c These arrays are output arguments if fact ≠ 'F'. Each element of these
arrays is a power of the radix. See the description of r, c in the Input
Parameters section.

rcond Reciprocal scaled condition number. An estimate of the reciprocal Skeel
condition number of the matrix A.

rpvgrw Contains the reciprocal pivot growth factor.
If this is much less than 1, the stability of the LU factorization of the
(equilibrated) matrix A could be poor.

err_bnds_norm Array of size nrhs*n_err_bnds. For each right-hand side, contains
information about various error bounds and condition numbers
corresponding to the normwise relative error, which is defined as follows:
Normwise relative error in the i-th solution vector:
max_j |xtrue(j,i) - x(j,i)| / max_j |x(j,i)|, where xtrue is the true solution.
The array is indexed by the type of error information as described below.
There are currently up to three pieces of information returned.

err=1 "Trust/don't trust" boolean. Trust the answer if
the reciprocal condition number is greater than the
threshold sqrt(n)*slamch('Epsilon') for single
precision flavors and sqrt(n)*dlamch('Epsilon') for
double precision flavors.

err=2 "Guaranteed" error bound. The estimated
forward error, almost certainly within a factor of
10 of the true error so long as the next entry is
greater than the threshold sqrt(n)*slamch('Epsilon')
for single precision flavors and
sqrt(n)*dlamch('Epsilon') for double precision
flavors. This error bound should only be trusted
if the previous boolean is true.

err=3 Reciprocal condition number. Estimated
normwise reciprocal condition number.
Compared with the threshold
sqrt(n)*slamch('Epsilon') for single precision flavors
and sqrt(n)*dlamch('Epsilon') for double precision
flavors to determine if the error estimate is
"guaranteed". These reciprocal condition
numbers are 1/(norm(inv(Z), inf)*norm(Z, inf))
for some appropriately scaled matrix Z.
Let Z = S*A, where S scales each row by a power
of the radix so all absolute row sums of Z are
approximately 1.

The information for right-hand side i, where 1 ≤ i ≤ nrhs, and type of error
err is stored in err_bnds_norm[(err-1)*nrhs + i - 1].
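Reading these flat arrays is easiest through a small accessor that applies the documented index formula. The helpers and their names are hypothetical; err and i are 1-based as in the text, and the same formula applies to err_bnds_comp.

```c
/* Entry for error type err (1 = trust flag, 2 = error bound,
   3 = reciprocal condition number) and right-hand side i (both 1-based). */
double err_bnd(const double *err_bnds, int nrhs, int i, int err)
{
    return err_bnds[(err - 1) * nrhs + i - 1];
}

/* Nonzero if the err = 1 "trust" flag for right-hand side i is set. */
int err_bnd_trusted(const double *err_bnds, int nrhs, int i)
{
    return err_bnd(err_bnds, nrhs, i, 1) != 0.0;
}
```

Note that the layout groups all right-hand sides for one error type together, so the entries for consecutive right-hand sides with the same err are adjacent in memory.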

err_bnds_comp Array of size nrhs*n_err_bnds. For each right-hand side, contains
information about various error bounds and condition numbers
corresponding to the componentwise relative error, which is defined as
follows:
Componentwise relative error in the i-th solution vector:
max_j |xtrue(j,i) - x(j,i)| / |x(j,i)|, where xtrue is the true solution.
The array is indexed by the type of error information as described below.
There are currently up to three pieces of information returned for each
right-hand side. If componentwise accuracy is not requested (params[2] =
0.0), then err_bnds_comp is not accessed.

err=1 "Trust/don't trust" boolean. Trust the answer if
the reciprocal condition number is greater than the
threshold sqrt(n)*slamch('Epsilon') for single
precision flavors and sqrt(n)*dlamch('Epsilon') for
double precision flavors.

err=2 "Guaranteed" error bound. The estimated
forward error, almost certainly within a factor of
10 of the true error so long as the next entry is
greater than the threshold sqrt(n)*slamch('Epsilon')
for single precision flavors and
sqrt(n)*dlamch('Epsilon') for double precision
flavors. This error bound should only be trusted
if the previous boolean is true.

err=3 Reciprocal condition number. Estimated
componentwise reciprocal condition number.
Compared with the threshold
sqrt(n)*slamch('Epsilon') for single precision flavors
and sqrt(n)*dlamch('Epsilon') for double precision
flavors to determine if the error estimate is
"guaranteed". These reciprocal condition
numbers are 1/(norm(inv(Z), inf)*norm(Z, inf))
for some appropriately scaled matrix Z.
Let Z = S*(A*diag(x)), where x is the solution
for the current right-hand side and S scales each
row of A*diag(x) by a power of the radix so all
absolute row sums of Z are approximately 1.

The information for right-hand side i, where 1 ≤ i ≤ nrhs, and type of error
err is stored in err_bnds_comp[(err-1)*nrhs + i - 1].
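To make the normwise and componentwise error measures concrete, the following sketch evaluates both definitions for a single right-hand side when a reference solution xtrue is known (for example, in a test harness). It is illustrative only, the function names are hypothetical, and the componentwise version assumes every x[j] is nonzero.

```c
/* Absolute value without libm. */
static double absd(double v) { return v < 0.0 ? -v : v; }

/* Normwise relative error: max_j |xtrue[j] - x[j]| / max_j |x[j]|. */
double normwise_rel_err(int n, const double *x, const double *xtrue)
{
    double num = 0.0, den = 0.0;
    for (int j = 0; j < n; ++j) {
        if (absd(xtrue[j] - x[j]) > num) num = absd(xtrue[j] - x[j]);
        if (absd(x[j]) > den) den = absd(x[j]);
    }
    return num / den;
}

/* Componentwise relative error: max_j |xtrue[j] - x[j]| / |x[j]|
   (assumes x[j] != 0 for all j). */
double componentwise_rel_err(int n, const double *x, const double *xtrue)
{
    double worst = 0.0;
    for (int j = 0; j < n; ++j) {
        double e = absd(xtrue[j] - x[j]) / absd(x[j]);
        if (e > worst) worst = e;
    }
    return worst;
}
```

With x = (1024, 0.5) and xtrue = (1024, 0.25), the normwise error is 0.25/1024 (about 2.4e-4) while the componentwise error is 0.5, which is why a small componentwise bound is the stronger guarantee for solutions whose entries vary widely in magnitude.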

ipiv If fact = 'N' or 'E', then ipiv is an output argument and on exit
contains the pivot indices from the factorization A = P*L*U of the original
matrix A (if fact = 'N') or of the equilibrated matrix A (if fact = 'E').

equed If fact ≠ 'F', then equed is an output argument. It specifies the form of
equilibration that was done (see the description of equed in the Input
Parameters section).

params If an entry is less than 0.0, that entry is filled with the default value used
for that parameter, otherwise the entry is not modified.

info If info = 0, the execution is successful. The solution to every right-hand
side is guaranteed.
If info = -i, the i-th parameter had an illegal value.
If 0 < info ≤ n: U_{info,info} is exactly zero. The factorization has been
completed, but the factor U is exactly singular, so the solution and error
bounds could not be computed; rcond = 0 is returned.
If info = n+j: The solution corresponding to the j-th right-hand side is not
guaranteed. The solutions corresponding to other right-hand sides k with k
> j may not be guaranteed as well, but only the first such right-hand side is
reported. If a small componentwise error is not requested (params[2] =
0.0), then the j-th right-hand side is the first with a normwise error bound
that is not guaranteed (the smallest j such that err_bnds_norm[j - 1] =
0.0). By default (params[2] = 1.0), the j-th right-hand side is the first with
either error bound not guaranteed (the smallest j such that err_bnds_norm[j - 1] =
0.0 or err_bnds_comp[j - 1] = 0.0). See the definition of
err_bnds_norm and err_bnds_comp for err = 1. To get information about
all of the right-hand sides, check err_bnds_norm or err_bnds_comp.
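The info conventions above can be folded into a small classifier. This is a hypothetical helper for illustration, not part of Intel MKL; it uses plain int instead of lapack_int, and n is the matrix order passed to the routine.

```c
/* Hypothetical classification of the info value returned by ?gbsvxx. */
enum gbsvxx_status {
    STATUS_OK,             /* info == 0: all solutions guaranteed             */
    STATUS_BAD_ARGUMENT,   /* info == -i: parameter i had an illegal value    */
    STATUS_SINGULAR,       /* 0 < info <= n: U(info,info) is exactly zero     */
    STATUS_NOT_GUARANTEED  /* info == n + j: right-hand side j not guaranteed */
};

/* Returns the category and stores the associated index (i or j) in *which. */
enum gbsvxx_status classify_gbsvxx_info(int info, int n, int *which)
{
    if (info == 0) { *which = 0; return STATUS_OK; }
    if (info < 0) { *which = -info; return STATUS_BAD_ARGUMENT; }
    if (info <= n) { *which = info; return STATUS_SINGULAR; }
    *which = info - n;
    return STATUS_NOT_GUARANTEED;
}
```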

Return Values
This function returns a value info.

If info = 0, the execution is successful. The solution to every right-hand side is guaranteed.

If info = -i, parameter i had an illegal value.

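If 0 < info ≤ n: U_{info,info} is exactly zero. The factorization has been completed, but the factor U is exactly
singular, so the solution and error bounds could not be computed; rcond = 0 is returned.

If info = n+j: the solution corresponding to the j-th right-hand side is not guaranteed (see the description
of info in Output Parameters).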

See Also
Matrix Storage Schemes

?gtsv
Computes the solution to the system of linear
equations with a tridiagonal coefficient matrix A and
multiple right-hand sides.

Syntax
lapack_int LAPACKE_sgtsv (int matrix_layout , lapack_int n , lapack_int nrhs , float *
dl , float * d , float * du , float * b , lapack_int ldb );
lapack_int LAPACKE_dgtsv (int matrix_layout , lapack_int n , lapack_int nrhs , double *
dl , double * d , double * du , double * b , lapack_int ldb );
lapack_int LAPACKE_cgtsv (int matrix_layout , lapack_int n , lapack_int nrhs ,
lapack_complex_float * dl , lapack_complex_float * d , lapack_complex_float * du ,
lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_zgtsv (int matrix_layout , lapack_int n , lapack_int nrhs ,
lapack_complex_double * dl , lapack_complex_double * d , lapack_complex_double * du ,
lapack_complex_double * b , lapack_int ldb );

Include Files
• mkl.h

Description

The routine solves for X the system of linear equations A*X = B, where A is an n-by-n tridiagonal matrix, the
columns of matrix B are individual right-hand sides, and the columns of X are the corresponding solutions.
The routine uses Gaussian elimination with partial pivoting.
Note that the equation A^T*X = B may be solved by interchanging the order of the arguments du and dl.
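The elimination described above can be sketched for a single right-hand side. This is a simplified, hypothetical routine written to mirror the documented behavior (partial pivoting between adjacent rows, fill-in of a second superdiagonal stored in dl, info = i for a zero pivot); it is not MKL's implementation and handles only one column of B.

```c
/* Solve A*x = b for one right-hand side, where A is tridiagonal with
   subdiagonal dl[0..n-2], diagonal d[0..n-1], superdiagonal du[0..n-2].
   Gaussian elimination with partial pivoting (interchanges between adjacent
   rows). On exit b holds x, d and du hold the diagonal and first superdiagonal
   of U, and dl[0..n-3] holds its second superdiagonal.
   Returns 0 on success, or i (1-based) if U(i,i) is exactly zero. */
int gtsv_sketch(int n, double *dl, double *d, double *du, double *b)
{
    if (n == 0) return 0;
    for (int i = 0; i < n - 1; ++i) {
        double ai = d[i] < 0 ? -d[i] : d[i];
        double li = dl[i] < 0 ? -dl[i] : dl[i];
        if (ai >= li) {
            /* No interchange: eliminate dl[i] using row i. */
            if (d[i] == 0.0) return i + 1;
            double fact = dl[i] / d[i];
            d[i + 1] -= fact * du[i];
            b[i + 1] -= fact * b[i];
            if (i < n - 2) dl[i] = 0.0;        /* no fill-in in row i */
        } else {
            /* Interchange rows i and i+1, then eliminate. */
            double fact = d[i] / dl[i];
            d[i] = dl[i];
            double temp = d[i + 1];
            d[i + 1] = du[i] - fact * temp;
            if (i < n - 2) {
                dl[i] = du[i + 1];             /* fill-in: U(i, i+2) */
                du[i + 1] = -fact * dl[i];
            }
            du[i] = temp;
            temp = b[i];
            b[i] = b[i + 1];
            b[i + 1] = temp - fact * b[i];
        }
    }
    if (d[n - 1] == 0.0) return n;
    /* Back substitution with U (diagonal plus two superdiagonals). */
    b[n - 1] /= d[n - 1];
    if (n > 1) b[n - 2] = (b[n - 2] - du[n - 2] * b[n - 1]) / d[n - 2];
    for (int i = n - 3; i >= 0; --i)
        b[i] = (b[i] - du[i] * b[i + 1] - dl[i] * b[i + 2]) / d[i];
    return 0;
}
```

Solving A^T*x = b with the same routine only requires passing du in place of dl and dl in place of du.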

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

n The order of A, the number of rows in B; n≥ 0.

nrhs The number of right-hand sides, the number of columns in B; nrhs ≥ 0.

dl The array dl (size n - 1) contains the (n - 1) subdiagonal elements
of A.

d The array d (size n) contains the diagonal elements of A.

du The array du (size n - 1) contains the (n - 1) superdiagonal
elements of A.

b The array b of size max(1, ldb*nrhs) for column major layout and
max(1, ldb*n) for row major layout contains the matrix B whose
columns are the right-hand sides for the systems of equations.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

Output Parameters

dl Overwritten by the (n-2) elements of the second superdiagonal of the
upper triangular matrix U from the LU factorization of A. These
elements are stored in dl[0], ..., dl[n - 3].

d Overwritten by the n diagonal elements of U.

du Overwritten by the (n-1) elements of the first superdiagonal of U.

b Overwritten by the solution matrix X.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, U_{i,i} is exactly zero, and the solution has not been computed. The factorization has not been
completed unless i = n.

See Also
Matrix Storage Schemes

?gtsvx
Computes the solution to the real or complex system
of linear equations with a tridiagonal coefficient matrix
A and multiple right-hand sides, and provides error
bounds on the solution.

Syntax
lapack_int LAPACKE_sgtsvx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int nrhs, const float* dl, const float* d, const float* du, float* dlf, float*
df, float* duf, float* du2, lapack_int* ipiv, const float* b, lapack_int ldb, float* x,
lapack_int ldx, float* rcond, float* ferr, float* berr );
lapack_int LAPACKE_dgtsvx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int nrhs, const double* dl, const double* d, const double* du, double* dlf,
double* df, double* duf, double* du2, lapack_int* ipiv, const double* b, lapack_int ldb,
double* x, lapack_int ldx, double* rcond, double* ferr, double* berr );
lapack_int LAPACKE_cgtsvx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int nrhs, const lapack_complex_float* dl, const lapack_complex_float* d, const
lapack_complex_float* du, lapack_complex_float* dlf, lapack_complex_float* df,
lapack_complex_float* duf, lapack_complex_float* du2, lapack_int* ipiv, const
lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x, lapack_int ldx,
float* rcond, float* ferr, float* berr );
lapack_int LAPACKE_zgtsvx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int nrhs, const lapack_complex_double* dl, const lapack_complex_double* d, const
lapack_complex_double* du, lapack_complex_double* dlf, lapack_complex_double* df,
lapack_complex_double* duf, lapack_complex_double* du2, lapack_int* ipiv, const
lapack_complex_double* b, lapack_int ldb, lapack_complex_double* x, lapack_int ldx,
double* rcond, double* ferr, double* berr );

Include Files
• mkl.h

Description

The routine uses the LU factorization to compute the solution to a real or complex system of linear equations
A*X = B, A^T*X = B, or A^H*X = B, where A is a tridiagonal matrix of order n, the columns of matrix B are
individual right-hand sides, and the columns of X are the corresponding solutions.
Error bounds on the solution and a condition estimate are also provided.
The routine ?gtsvx performs the following steps:

1. If fact = 'N', the LU decomposition is used to factor the matrix A as A = L*U, where L is a product
of permutation and unit lower bidiagonal matrices and U is an upper triangular matrix with nonzeroes in
only the main diagonal and first two superdiagonals.
2. If some U_{i,i} = 0, so that U is exactly singular, then the routine returns with info = i. Otherwise, the
factored form of A is used to estimate the condition number of the matrix A. If the reciprocal of the
condition number is less than machine precision, info = n + 1 is returned as a warning, but the
routine still goes on to solve for X and compute error bounds as described below.
3. The system of equations is solved for X using the factored form of A.
4. Iterative refinement is applied to improve the computed solution matrix and calculate error bounds and
backward error estimates for it.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

fact Must be 'F' or 'N'.

Specifies whether or not the factored form of the matrix A has been
supplied on entry.
If fact = 'F': on entry, dlf, df, duf, du2, and ipiv contain the
factored form of A; arrays dl, d, du, dlf, df, duf, du2, and ipiv will not
be modified.
If fact = 'N', the matrix A will be copied to dlf, df, and duf and
factored.

trans Must be 'N', 'T', or 'C'.

Specifies the form of the system of equations:

If trans = 'N', the system has the form A*X = B (No transpose).

If trans = 'T', the system has the form A^T*X = B (Transpose).

If trans = 'C', the system has the form A^H*X = B (Conjugate
transpose).

n The number of linear equations, the order of the matrix A; n≥ 0.

nrhs The number of right-hand sides, the number of columns of the
matrices B and X; nrhs ≥ 0.

dl,d,du,dlf,df, duf,du2,b Arrays:

dl, size (n -1), contains the subdiagonal elements of A.
d, size (n), contains the diagonal elements of A.
du, size (n -1), contains the superdiagonal elements of A.
dlf, size (n -1). If fact = 'F', then dlf is an input argument and on
entry contains the (n -1) multipliers that define the matrix L from the
LU factorization of A as computed by ?gttrf.

df, size (n). If fact = 'F', then df is an input argument and on
entry contains the n diagonal elements of the upper triangular matrix
U from the LU factorization of A.
duf, size (n -1). If fact = 'F', then duf is an input argument and on
entry contains the (n -1) elements of the first superdiagonal of U.
du2, size (n -2). If fact = 'F', then du2 is an input argument and
on entry contains the (n-2) elements of the second superdiagonal of
U.
b, size max(ldb*nrhs) for column major layout and max(ldb*n) for
row major layout, contains the right-hand side matrix B.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ldx The leading dimension of x; ldx≥ max(1, n) for column major layout
and ldx≥nrhs for row major layout.

ipiv Array, size at least max(1, n). If fact = 'F', then ipiv is an input
argument and on entry contains the pivot indices, as returned
by ?gttrf.

Output Parameters

x Array, size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout.
If info = 0 or info = n+1, the array x contains the solution matrix
X.

dlf If fact = 'N', then dlf is an output argument and on exit contains
the (n-1) multipliers that define the matrix L from the LU
factorization of A.

df If fact = 'N', then df is an output argument and on exit contains
the n diagonal elements of the upper triangular matrix U from the LU
factorization of A.

duf If fact = 'N', then duf is an output argument and on exit contains
the (n-1) elements of the first superdiagonal of U.

du2 If fact = 'N', then du2 is an output argument and on exit contains
the (n-2) elements of the second superdiagonal of U.

ipiv The array ipiv is an output argument if fact = 'N' and, on exit,
contains the pivot indices from the factorization A = L*U; row i of
the matrix was interchanged with row ipiv[i-1]. The value of ipiv[i-1]
will always be i or i+1; ipiv[i-1] = i indicates a row interchange was not
required.

rcond An estimate of the reciprocal condition number of the matrix A. If
rcond is less than the machine precision (in particular, if rcond = 0),
the matrix is singular to working precision. This condition is indicated
by a return code of info > 0.

ferr Array, size at least max(1, nrhs). Contains the estimated forward
error bound for each solution vector xj (the j-th column of the solution
matrix X). If xtrue is the true solution corresponding to xj, ferr[j-1]
is an estimated upper bound for the magnitude of the largest element
in xj - xtrue divided by the magnitude of the largest element in xj. The
estimate is as reliable as the estimate for rcond, and is almost always
a slight overestimate of the true error.

berr Array, size at least max(1, nrhs). Contains the component-wise
relative backward error for each solution vector xj, that is, the
smallest relative change in any element of A or B that makes xj an
exact solution.
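The component-wise relative backward error can be evaluated from its usual definition, berr = max_i |b - A*x|_i / (|A|*|x| + |b|)_i. The sketch below does this for one solution vector of a tridiagonal system in the dl/d/du storage used here; it is an illustration of that definition with a hypothetical function name, not the routine's internal code.

```c
static double absd(double v) { return v < 0.0 ? -v : v; }

/* berr = max_i |b - A*x|_i / (|A|*|x| + |b|)_i for a tridiagonal A
   (subdiagonal dl, diagonal d, superdiagonal du) and one solution x.
   Rows whose denominator is zero are skipped: there |(A*x)_i| <= (|A|*|x|)_i = 0
   and b_i = 0, so the residual is zero as well. */
double tridiag_berr(int n, const double *dl, const double *d, const double *du,
                    const double *x, const double *b)
{
    double berr = 0.0;
    for (int i = 0; i < n; ++i) {
        double ax = d[i] * x[i];
        double den = absd(d[i]) * absd(x[i]) + absd(b[i]);
        if (i > 0) {
            ax += dl[i - 1] * x[i - 1];
            den += absd(dl[i - 1]) * absd(x[i - 1]);
        }
        if (i < n - 1) {
            ax += du[i] * x[i + 1];
            den += absd(du[i]) * absd(x[i + 1]);
        }
        if (den > 0.0) {
            double ratio = absd(b[i] - ax) / den;
            if (ratio > berr) berr = ratio;
        }
    }
    return berr;
}
```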

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, and i ≤ n, then U_{i,i} is exactly zero. The factorization has not been completed unless i = n, but
the factor U is exactly singular, so the solution and error bounds could not be computed; rcond = 0 is
returned. If info = i, and i = n + 1, then U is nonsingular, but rcond is less than machine precision,
meaning that the matrix is singular to working precision. Nevertheless, the solution and error bounds are
computed because there are a number of situations where the computed solution can be more accurate than
the value of rcond would suggest.

See Also
Matrix Storage Schemes

?dtsvb
Computes the solution to the system of linear
equations with a diagonally dominant tridiagonal
coefficient matrix A and multiple right-hand sides.

Syntax
void sdtsvb (const MKL_INT * n, const MKL_INT * nrhs, float * dl, float * d, const
float * du, float * b, const MKL_INT * ldb, MKL_INT * info );
void ddtsvb (const MKL_INT * n, const MKL_INT * nrhs, double * dl, double * d, const
double * du, double * b, const MKL_INT * ldb, MKL_INT * info );
void cdtsvb (const MKL_INT * n, const MKL_INT * nrhs, MKL_Complex8 * dl, MKL_Complex8 *
d, const MKL_Complex8 * du, MKL_Complex8 * b, const MKL_INT * ldb, MKL_INT * info );
void zdtsvb (const MKL_INT * n, const MKL_INT * nrhs, MKL_Complex16 * dl, MKL_Complex16
* d, const MKL_Complex16 * du, MKL_Complex16 * b, const MKL_INT * ldb, MKL_INT *
info );

Include Files
• mkl.h

Description

The ?dtsvb routine solves a system of linear equations A*X = B for X, where A is an n-by-n diagonally
dominant tridiagonal matrix, the columns of matrix B are individual right-hand sides, and the columns of X
are the corresponding solutions. The routine uses the BABE (Burning At Both Ends) algorithm.
Note that the equation A^T*X = B may be solved by interchanging the order of the arguments du and dl.
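The idea of burning at both ends can be sketched as elimination that proceeds from the top and from the bottom of the matrix at the same time and meets in the middle. The routine below is a minimal, hypothetical illustration for one right-hand side with no pivoting (so it relies on the diagonal dominance discussed in the Application Notes); MKL's actual BABE algorithm may differ, and this sketch does not reproduce the ?dtsvb output format (for example, d is not overwritten with reciprocals).

```c
/* Two-ended (top-down and bottom-up) elimination for a tridiagonal system
   A*x = b without pivoting. dl: subdiagonal, d: diagonal, du: superdiagonal.
   d and b are overwritten; on success b holds x.
   Returns 0, or i (1-based) if a zero pivot d[i-1] is met. */
int two_ended_tridiag_solve(int n, const double *dl, double *d,
                            const double *du, double *b)
{
    if (n == 0) return 0;
    int m = n / 2;                          /* meeting row */
    /* Top sweep: eliminate the subdiagonal entries in rows 1..m. */
    for (int i = 0; i < m; ++i) {
        if (d[i] == 0.0) return i + 1;
        double f = dl[i] / d[i];
        d[i + 1] -= f * du[i];
        b[i + 1] -= f * b[i];
    }
    /* Bottom sweep: eliminate the superdiagonal entries in rows n-2..m. */
    for (int i = n - 1; i > m; --i) {
        if (d[i] == 0.0) return i + 1;
        double g = du[i - 1] / d[i];
        d[i - 1] -= g * dl[i - 1];
        b[i - 1] -= g * b[i];
    }
    /* Row m now has a single nonzero, d[m]. */
    if (d[m] == 0.0) return m + 1;
    b[m] /= d[m];
    /* Substitute outward from the middle in both directions. */
    for (int i = m - 1; i >= 0; --i)
        b[i] = (b[i] - du[i] * b[i + 1]) / d[i];
    for (int i = m + 1; i < n; ++i)
        b[i] = (b[i] - dl[i - 1] * b[i - 1]) / d[i];
    return 0;
}
```

Because the two sweeps touch disjoint rows except for the meeting row m, they could in principle run concurrently, which is the usual motivation for this scheme.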

Input Parameters

n The order of A, the number of rows in B; n≥ 0.

nrhs The number of right-hand sides, the number of columns in B; nrhs ≥ 0.

dl, d, du, b Arrays: dl (size n - 1), d (size n), du (size n - 1), b(max(ldb*nrhs)
for column major layout and max(ldb*n) for row major layout).

The array dl contains the (n - 1) subdiagonal elements of A.

The array d contains the diagonal elements of A.

The array du contains the (n - 1) superdiagonal elements of A.

The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.

ldb The leading dimension of b; ldb≥ max(1, n).

Output Parameters

dl Overwritten by the (n-1) elements of the subdiagonal of the lower
triangular matrices L1, L2 from the factorization of A (see ?dttrfb).

d Overwritten by the n diagonal element reciprocals of U.

b Overwritten by the solution matrix X.

info If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, u_{i,i} is exactly zero, and the solution has not been
computed. The factorization has not been completed unless i = n.

Application Notes
A diagonally dominant tridiagonal system is defined such that |d_i| > |dl_{i-1}| + |du_i| for any i with
1 < i < n, and |d_1| > |du_1|, |d_n| > |dl_{n-1}|.
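The dominance condition above can be checked before calling ?dtsvb. The following helper is a hypothetical illustration in plain C for real data, using the same dl/d/du arrays with 0-based indices, so the condition for row i reads |d[i]| > |dl[i-1]| + |du[i]|, with the missing neighbor omitted in the first and last rows.

```c
static double absd(double v) { return v < 0.0 ? -v : v; }

/* Returns 1 if the n-by-n tridiagonal matrix (dl, d, du) is strictly
   diagonally dominant by rows, as defined above, and 0 otherwise. */
int is_diag_dominant_tridiag(int n, const double *dl, const double *d,
                             const double *du)
{
    for (int i = 0; i < n; ++i) {
        double off = 0.0;
        if (i > 0) off += absd(dl[i - 1]);      /* subdiagonal neighbor   */
        if (i < n - 1) off += absd(du[i]);      /* superdiagonal neighbor */
        if (!(absd(d[i]) > off)) return 0;
    }
    return 1;
}
```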

The underlying BABE algorithm is designed for diagonally dominant systems. Such systems can be solved stably
without pivoting, unlike general tridiagonal systems, which require elimination with partial pivoting (see ?gtsv).
As a result, solving a diagonally dominant system with ?dtsvb is much faster than solving a general system.

NOTE
• The current implementation of BABE has a potential accuracy issue on very small or large data
close to the underflow or overflow threshold respectively. Scale the matrix before applying the
solver in the case of such input data.
• Applying the ?dtsvb factorization to systems that are not diagonally dominant may lead to a loss of
accuracy, or to false detection of singularity, because no pivoting is performed.

?posv
Computes the solution to the system of linear
equations with a symmetric or Hermitian positive-
definite coefficient matrix A and multiple right-hand
sides.

Syntax
lapack_int LAPACKE_sposv (int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
float * a, lapack_int lda, float * b, lapack_int ldb);
lapack_int LAPACKE_dposv (int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
double * a, lapack_int lda, double * b, lapack_int ldb);
lapack_int LAPACKE_cposv (int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
lapack_complex_float * a, lapack_int lda, lapack_complex_float * b, lapack_int ldb);
lapack_int LAPACKE_zposv (int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
lapack_complex_double * a, lapack_int lda, lapack_complex_double * b, lapack_int ldb);
lapack_int LAPACKE_dsposv (int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
double * a, lapack_int lda, double * b, lapack_int ldb, double * x, lapack_int ldx,
lapack_int * iter);
lapack_int LAPACKE_zcposv (int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
lapack_complex_double * a, lapack_int lda, lapack_complex_double * b, lapack_int ldb,
lapack_complex_double * x, lapack_int ldx, lapack_int * iter);

Include Files
• mkl.h

Description

The routine solves for X the real or complex system of linear equations A*X = B, where A is an n-by-n
symmetric/Hermitian positive-definite matrix, the columns of matrix B are individual right-hand sides, and
the columns of X are the corresponding solutions.
The Cholesky decomposition is used to factor A as
A = U^T*U (real flavors) and A = U^H*U (complex flavors), if uplo = 'U'
or A = L*L^T (real flavors) and A = L*L^H (complex flavors), if uplo = 'L',

where U is an upper triangular matrix and L is a lower triangular matrix. The factored form of A is then used
to solve the system of equations A*X = B.

The dsposv and zcposv routines are mixed precision iterative refinement subroutines for exploiting fast single
precision hardware. They first attempt to factorize the matrix in single precision (dsposv) or single complex
precision (zcposv) and use this factorization within an iterative refinement procedure to produce a solution
with double precision (dsposv) / double complex precision (zcposv) normwise backward error quality (see
below). If the approach fails, the method switches to a double precision or double complex precision
factorization respectively and computes the solution.
Iterative refinement is not a winning strategy if the ratio of single precision (single complex) performance
to double precision (double complex) performance is too small. A reasonable strategy should
take the number of right-hand sides and the size of the matrix into account. This might be done with a call to
ilaenv in the future. At present, iterative refinement is always attempted.
The iterative refinement process is stopped if
iter > itermax
or for all the right-hand sides:
rnrm < sqrt(n)*xnrm*anrm*eps*bwdmax,
where
• iter is the number of the current iteration in the iterative refinement process
• rnrm is the infinity-norm of the residual
• xnrm is the infinity-norm of the solution
• anrm is the infinity-operator-norm of the matrix A
• eps is the machine epsilon returned by dlamch('Epsilon').
The values itermax and bwdmax are fixed to 30 and 1.0d+00 respectively.
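The refinement loop and its stopping rule can be sketched on a toy problem. The code below is a hypothetical illustration, not the dsposv algorithm: to stay self-contained it replaces the single precision Cholesky factorization with an explicit 2-by-2 solve carried out in float, computes residuals in double, and falls back to a double precision solve when refinement does not converge within itermax = 30 steps (the -3 and -31 return codes only mirror the iter values described below). DBL_EPSILON stands in for dlamch('Epsilon'); LAPACK's value may be half as large.

```c
#include <float.h>

/* Infinity norm of a vector: max_i |v[i]|. */
static double vec_inf_norm(int n, const double *v)
{
    double m = 0.0;
    for (int i = 0; i < n; ++i) {
        double t = v[i] < 0.0 ? -v[i] : v[i];
        if (t > m) m = t;
    }
    return m;
}

/* Infinity-operator-norm of a column-major n-by-n matrix: largest absolute row sum. */
static double mat_inf_norm(int n, const double *a, int lda)
{
    double m = 0.0;
    for (int i = 0; i < n; ++i) {
        double s = 0.0;
        for (int j = 0; j < n; ++j) {
            double t = a[i + j * lda];
            s += t < 0.0 ? -t : t;
        }
        if (s > m) m = s;
    }
    return m;
}

/* Stand-in for "factor and solve in single precision": solves the 2-by-2
   column-major system a*out = rhs in float arithmetic. Returns 0, or -1 if singular. */
static int solve2_float(const double *a, const double *rhs, double *out)
{
    float a11 = (float)a[0], a21 = (float)a[1], a12 = (float)a[2], a22 = (float)a[3];
    float r0 = (float)rhs[0], r1 = (float)rhs[1];
    float det = a11 * a22 - a12 * a21;
    if (det == 0.0f) return -1;
    out[0] = (double)((a22 * r0 - a12 * r1) / det);
    out[1] = (double)((a11 * r1 - a21 * r0) / det);
    return 0;
}

/* Double precision fallback for the same 2-by-2 system. */
static void solve2_double(const double *a, const double *rhs, double *out)
{
    double det = a[0] * a[3] - a[2] * a[1];
    out[0] = (a[3] * rhs[0] - a[2] * rhs[1]) / det;
    out[1] = (a[0] * rhs[1] - a[1] * rhs[0]) / det;
}

/* Returns the number of refinement steps (>= 0) on success; otherwise x is
   recomputed in double precision and -3 or -(itermax + 1) is returned. */
int mixed_refine2(const double *a, const double *b, double *x)
{
    const int n = 2, itermax = 30;
    const double eps = DBL_EPSILON, bwdmax = 1.0;
    const double anrm = mat_inf_norm(n, a, n);
    double r[2], dx[2];

    if (solve2_float(a, b, x) != 0) {
        solve2_double(a, b, x);
        return -3;
    }
    for (int iter = 0; iter <= itermax; ++iter) {
        /* Residual r = b - A*x in double precision. */
        r[0] = b[0] - (a[0] * x[0] + a[2] * x[1]);
        r[1] = b[1] - (a[1] * x[0] + a[3] * x[1]);
        double rnrm = vec_inf_norm(n, r);
        double bound = vec_inf_norm(n, x) * anrm * eps * bwdmax;
        /* rnrm < sqrt(n)*bound, compared via squares (both sides are >= 0). */
        if (rnrm * rnrm < n * bound * bound) return iter;
        if (iter == itermax) break;
        /* Correction from the single precision solver (same matrix, so it
           cannot fail here), accumulated in double precision. */
        solve2_float(a, r, dx);
        x[0] += dx[0];
        x[1] += dx[1];
    }
    solve2_double(a, b, x);
    return -(itermax + 1);
}
```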

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored:

If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides, the number of columns in B; nrhs ≥ 0.

a, b Arrays: a (size max(1, lda*n)), b (size max(1, ldb*nrhs) for column major layout and max(1, ldb*n)
for row major layout). The array a contains the upper or the lower triangular part of the matrix A (see uplo).

Note that in the case of zcposv the imaginary parts of the diagonal
elements need not be set and are assumed to be zero.
The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.

lda The leading dimension of a; lda≥ max(1, n).

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ldx The leading dimension of the array x; ldx≥ max(1, n) for column
major layout and ldx≥nrhs for row major layout.

Output Parameters

a For ?posv: if info = 0, the upper or lower triangular part of a is overwritten by the Cholesky factor U
or L, as specified by uplo.
For dsposv and zcposv: if iterative refinement has been successfully used (info = 0 and iter ≥ 0), then A
is unchanged. If double precision factorization has been used (info = 0 and iter < 0), then the array a
contains the factor U or L from the Cholesky factorization.

b For ?posv, overwritten by the solution matrix X. For dsposv and zcposv, the solution matrix X is returned
in x.

x Array, size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout. If info = 0, contains the n-by-nrhs
solution matrix X.

iter If iter < 0: iterative refinement has failed, and double precision factorization has been performed:
• If iter = -1: the routine fell back to full precision for an implementation- or machine-specific reason.
• If iter = -2: narrowing the precision induced an overflow; the routine fell back to full precision.
• If iter = -3: spotrf failed (dsposv) or cpotrf failed (zcposv).
• If iter = -31: the iterative refinement was stopped after the 30th iteration.

If iter ≥ 0: iterative refinement has been successfully used; iter returns the number of iterations.
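The rules for a and iter can be encoded in a small helper. The sketch below is hypothetical (the function names are not part of MKL) and uses plain int rather than lapack_int so that it compiles without mkl.h; it reports whether a still holds the original matrix after a dsposv/zcposv call and gives a short explanation of iter.

```c
#include <stdbool.h>

/* Hypothetical helper: true if a is guaranteed to still hold the original
   matrix (info == 0 and iter >= 0). false means a may hold the double
   precision Cholesky factor, or the call failed. */
static bool posv_mixed_a_unchanged(int info, int iter) {
    return info == 0 && iter >= 0;
}

/* Hypothetical helper: a short explanation of the iter output. */
static const char *posv_mixed_iter_reason(int iter) {
    if (iter >= 0) return "iterative refinement succeeded";
    switch (iter) {
    case -1:  return "fell back to full precision (implementation- or machine-specific reason)";
    case -2:  return "single precision overflow; fell back to full precision";
    case -3:  return "single precision Cholesky factorization failed";
    case -31: return "refinement stopped after the 30th iteration";
    default:  return "undocumented iter value";
    }
}
```

A caller that reuses a after the call (for example, to solve again with a new right-hand side) can check posv_mixed_a_unchanged first and refactor only when it returns false.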


Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, the leading minor of order i (and therefore the matrix A itself) is not positive definite, so the
factorization could not be completed, and the solution has not been computed.

See Also
Matrix Storage Schemes

?posvx
Uses the Cholesky factorization to compute the
solution to the system of linear equations with a
symmetric or Hermitian positive-definite coefficient
matrix A, and provides error bounds on the solution.

Syntax
lapack_int LAPACKE_sposvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, float* a, lapack_int lda, float* af, lapack_int ldaf, char* equed,
float* s, float* b, lapack_int ldb, float* x, lapack_int ldx, float* rcond, float* ferr,
float* berr );
lapack_int LAPACKE_dposvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, double* a, lapack_int lda, double* af, lapack_int ldaf, char* equed,
double* s, double* b, lapack_int ldb, double* x, lapack_int ldx, double* rcond, double*
ferr, double* berr );
lapack_int LAPACKE_cposvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, lapack_complex_float* a, lapack_int lda, lapack_complex_float* af,
lapack_int ldaf, char* equed, float* s, lapack_complex_float* b, lapack_int ldb,
lapack_complex_float* x, lapack_int ldx, float* rcond, float* ferr, float* berr );
lapack_int LAPACKE_zposvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, lapack_complex_double* a, lapack_int lda, lapack_complex_double* af,
lapack_int ldaf, char* equed, double* s, lapack_complex_double* b, lapack_int ldb,
lapack_complex_double* x, lapack_int ldx, double* rcond, double* ferr, double* berr );

Include Files
• mkl.h

Description

1. If fact = 'E', real scaling factors s are computed to equilibrate the system:

diag(s)*A*diag(s)*inv(diag(s))*X = diag(s)*B.

Whether or not the system will be equilibrated depends on the scaling of the matrix A, but if
equilibration is used, A is overwritten by diag(s)*A*diag(s) and B by diag(s)*B.
2. If fact = 'N' or 'E', the Cholesky decomposition is used to factor the matrix A (after equilibration if
fact = 'E') as
A = U^T*U (real), A = U^H*U (complex), if uplo = 'U',
or A = L*L^T (real), A = L*L^H (complex), if uplo = 'L',

where U is an upper triangular matrix and L is a lower triangular matrix.

3. If the leading i-by-i principal minor is not positive-definite, then the routine returns with info = i.
Otherwise, the factored form of A is used to estimate the condition number of the matrix A. If the
reciprocal of the condition number is less than machine precision, info = n + 1 is returned as a
warning, but the routine still goes on to solve for X and compute error bounds as described below.
4. The system of equations is solved for X using the factored form of A.
5. Iterative refinement is applied to improve the computed solution matrix and calculate error bounds and
backward error estimates for it.
6. If equilibration was used, the matrix X is premultiplied by diag(s) so that it solves the original system
before equilibration.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

fact Must be 'F', 'N', or 'E'.

Specifies whether or not the factored form of the matrix A is supplied on entry, and if not, whether the
matrix A should be equilibrated before it is factored.
If fact = 'F': on entry, af contains the factored form of A. If equed = 'Y', the matrix A has been
equilibrated with scaling factors given by s.
a and af will not be modified.
If fact = 'N', the matrix A will be copied to af and factored.

If fact = 'E', the matrix A will be equilibrated if necessary, then copied to af and factored.

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored:

If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides, the number of columns in B; nrhs ≥ 0.

a, af, b Arrays: a (size max(1, lda*n)), af (size max(1, ldaf*n)), b (size max(1, ldb*nrhs) for column
major layout and max(1, ldb*n) for row major layout).

The array a contains the matrix A as specified by uplo. If fact = 'F' and equed = 'Y', then A must have
been equilibrated by the scaling factors in s, and a must contain the equilibrated matrix
diag(s)*A*diag(s).
The array af is an input argument if fact = 'F'. It contains the
triangular factor U or L from the Cholesky factorization of A in the
same storage format as A. If equed is not 'N', then af is the factored
form of the equilibrated matrix diag(s)*A*diag(s).

The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.

lda The leading dimension of a; lda≥ max(1, n).

ldaf The leading dimension of af; ldaf≥ max(1, n).

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

equed Must be 'N' or 'Y'.

equed is an input argument if fact = 'F'. It specifies the form of equilibration that was done:
if equed = 'N', no equilibration was done (always true if fact = 'N');
if equed = 'Y', equilibration was done, that is, A has been replaced by diag(s)*A*diag(s).

s Array, size (n). The array s contains the scale factors for A. This array
is an input argument if fact = 'F' only; otherwise it is an output
argument.
If equed = 'N', s is not accessed.

If fact = 'F' and equed = 'Y', each element of s must be positive.

ldx The leading dimension of the output array x; ldx≥ max(1, n) for
column major layout and ldx≥nrhs for row major layout.

Output Parameters

x Array, size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout.
If info = 0 or info = n+1, the array x contains the solution matrix
X to the original system of equations. Note that if equed = 'Y', A
and B are modified on exit, and the solution to the equilibrated system
is inv(diag(s))*X.

a Array a is not modified on exit if fact = 'F' or 'N', or if fact = 'E' and equed = 'N'.
If fact = 'E' and equed = 'Y', A is overwritten by diag(s)*A*diag(s).

af If fact = 'N' or 'E', then af is an output argument and on exit returns the triangular factor U or L
from the Cholesky factorization A = U^T*U or A = L*L^T (real routines), A = U^H*U or A = L*L^H
(complex routines) of the original matrix A (if fact = 'N'), or of the equilibrated matrix A (if
fact = 'E'). See the description of a for the form of the equilibrated matrix.

b Overwritten by diag(s)*B if equed = 'Y'; not changed if equed = 'N'.

s This array is an output argument if fact ≠ 'F'. See the description of s in the Input Parameters
section.

rcond An estimate of the reciprocal condition number of the matrix A after equilibration (if done). If
rcond is less than the machine precision (in particular, if rcond = 0), the matrix is singular to working
precision. This condition is indicated by a return code of info > 0.

ferr Array, size at least max(1, nrhs). Contains the estimated forward error bound for each solution
vector xj (the j-th column of the solution matrix X). If xtrue is the true solution corresponding to xj,
ferr[j-1] is an estimated upper bound for the magnitude of the largest element in (xj - xtrue) divided by
the magnitude of the largest element in xj. The estimate is as reliable as the estimate for rcond, and is
almost always a slight overestimate of the true error.

berr Array, size at least max(1, nrhs). Contains the component-wise relative backward error for each
solution vector xj, that is, the smallest relative change in any element of A or B that makes xj an
exact solution.

equed If fact ≠ 'F', then equed is an output argument. It specifies the form of equilibration that was
done (see the description of equed in the Input Parameters section).

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, and i≤n, the leading minor of order i (and therefore the matrix A itself) is not positive-definite,
so the factorization could not be completed, and the solution and error bounds could not be computed; rcond
=0 is returned.
If info = i, and i = n + 1, then U is nonsingular, but rcond is less than machine precision, meaning that the
matrix is singular to working precision. Nevertheless, the solution and error bounds are computed because
there are a number of situations where the computed solution can be more accurate than the value of rcond
would suggest.
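A caller typically branches on the four outcomes listed above. The hypothetical helper below shows one way to do that; the enum and function names are illustrative only, and plain int is used instead of lapack_int so that it compiles without mkl.h.

```c
/* Hypothetical classification of the info value returned by ?posvx. */
typedef enum {
    POSVX_OK,              /* info == 0 */
    POSVX_BAD_ARGUMENT,    /* info == -i: parameter i had an illegal value */
    POSVX_NOT_POS_DEF,     /* 1 <= info <= n: no solution computed, rcond = 0 */
    POSVX_ILL_CONDITIONED  /* info == n + 1: solution computed, rcond < eps */
} posvx_status;

static posvx_status classify_posvx_info(int info, int n) {
    if (info < 0) return POSVX_BAD_ARGUMENT;
    if (info == 0) return POSVX_OK;
    if (info <= n) return POSVX_NOT_POS_DEF;
    return POSVX_ILL_CONDITIONED;  /* only info == n + 1 is documented above n */
}

/* x, ferr and berr hold computed results only in these two cases. */
static int posvx_solution_available(int info, int n) {
    return info == 0 || info == n + 1;
}
```

Treating info = n + 1 as a warning rather than an error matches the text: the solution and error bounds are still computed and may be more accurate than rcond suggests.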

See Also
Matrix Storage Schemes


?posvxx
Uses extra precise iterative refinement to compute the
solution to the system of linear equations with a
symmetric or Hermitian positive-definite coefficient
matrix A applying the Cholesky factorization.

Syntax
lapack_int LAPACKE_sposvxx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, float* a, lapack_int lda, float* af, lapack_int ldaf, char* equed,
float* s, float* b, lapack_int ldb, float* x, lapack_int ldx, float* rcond, float*
rpvgrw, float* berr, lapack_int n_err_bnds, float* err_bnds_norm, float* err_bnds_comp,
lapack_int nparams, const float* params );
lapack_int LAPACKE_dposvxx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, double* a, lapack_int lda, double* af, lapack_int ldaf, char* equed,
double* s, double* b, lapack_int ldb, double* x, lapack_int ldx, double* rcond, double*
rpvgrw, double* berr, lapack_int n_err_bnds, double* err_bnds_norm, double*
err_bnds_comp, lapack_int nparams, const double* params );
lapack_int LAPACKE_cposvxx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, lapack_complex_float* a, lapack_int lda, lapack_complex_float* af,
lapack_int ldaf, char* equed, float* s, lapack_complex_float* b, lapack_int ldb,
lapack_complex_float* x, lapack_int ldx, float* rcond, float* rpvgrw, float* berr,
lapack_int n_err_bnds, float* err_bnds_norm, float* err_bnds_comp, lapack_int nparams,
const float* params );
lapack_int LAPACKE_zposvxx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, lapack_complex_double* a, lapack_int lda, lapack_complex_double* af,
lapack_int ldaf, char* equed, double* s, lapack_complex_double* b, lapack_int ldb,
lapack_complex_double* x, lapack_int ldx, double* rcond, double* rpvgrw, double* berr,
lapack_int n_err_bnds, double* err_bnds_norm, double* err_bnds_comp, lapack_int
nparams, const double* params );

Include Files
• mkl.h

Description

The routine uses the Cholesky factorization A = U^T*U (real flavors) / A = U^H*U (complex flavors) or
A = L*L^T (real flavors) / A = L*L^H (complex flavors) to compute the solution to a real or complex system
of linear equations A*X = B, where A is an n-by-n symmetric (real flavors) or Hermitian (complex flavors)
positive definite matrix, the columns of matrix B are individual right-hand sides, and the columns of X are
the corresponding solutions.
Both normwise and maximum componentwise error bounds are also provided on request. The routine returns
a solution with a small guaranteed error (O(eps), where eps is the working machine precision) unless the
matrix is very ill-conditioned, in which case a warning is returned. Relevant condition numbers are also
calculated and returned.
The routine accepts user-provided factorizations and equilibration factors; see the definitions of the fact
and equed options. Solving with refinement and using a factorization from a previous call of the routine
also produces a solution with O(eps) errors or warnings, but that may not be true for general user-provided
factorizations and equilibration factors if they differ from what the routine itself would produce.
The routine ?posvxx performs the following steps:

1. If fact = 'E', scaling factors are computed to equilibrate the system:

diag(s)*A*diag(s)*inv(diag(s))*X = diag(s)*B
Whether or not the system will be equilibrated depends on the scaling of the matrix A, but if
equilibration is used, A is overwritten by diag(s)*A*diag(s) and B by diag(s)*B.
2. If fact = 'N' or 'E', the Cholesky decomposition is used to factor the matrix A (after equilibration if
fact = 'E') as
A = U^T*U (real), A = U^H*U (complex), if uplo = 'U',
or A = L*L^T (real), A = L*L^H (complex), if uplo = 'L',

where U is an upper triangular matrix and L is a lower triangular matrix.

3. If the leading i-by-i principal minor is not positive-definite, the routine returns with info = i.
Otherwise, the factored form of A is used to estimate the condition number of the matrix A (see the
rcond parameter). If the reciprocal of the condition number is less than machine precision, the routine
still goes on to solve for X and compute error bounds.
4. The system of equations is solved for X using the factored form of A.
5. By default, unless params[0] is set to zero, the routine applies iterative refinement to get a small error
and error bounds. Refinement calculates the residual to at least twice the working precision.
6. If equilibration was used, the matrix X is premultiplied by diag(s) so that it solves the original system
before equilibration.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

fact Must be 'F', 'N', or 'E'.

Specifies whether or not the factored form of the matrix A is supplied on entry, and if not, whether the
matrix A should be equilibrated before it is factored.
If fact = 'F', on entry, af contains the factored form of A. If equed is not 'N', the matrix A has been
equilibrated with scaling factors given by s. Parameters a and af are not modified.

If fact = 'N', the matrix A will be copied to af and factored.

If fact = 'E', the matrix A will be equilibrated, if necessary, copied to af and factored.

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored:

If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

n The number of linear equations; the order of the matrix A; n≥ 0.

nrhs The number of right-hand sides; the number of columns of the matrices B and X; nrhs ≥ 0.

a, af, b Arrays: a (size max(1, lda*n)), af (size max(1, ldaf*n)), b (size max(1, ldb*nrhs) for column
major layout and max(1, ldb*n) for row major layout).

The array a contains the matrix A as specified by uplo. If fact = 'F' and equed = 'Y', then A must have
been equilibrated by the scaling factors in s, and a must contain the equilibrated matrix
diag(s)*A*diag(s).
The array af is an input argument if fact = 'F'. It contains the
triangular factor U or L from the Cholesky factorization of A in the
same storage format as A. If equed is not 'N', then af is the factored
form of the equilibrated matrix diag(s)*A*diag(s).

The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.

lda The leading dimension of the array a; lda≥ max(1,n).

ldaf The leading dimension of the array af; ldaf≥ max(1,n).

equed Must be 'N' or 'Y'.

equed is an input argument if fact = 'F'. It specifies the form of equilibration that was done:
If equed = 'N', no equilibration was done (always true if fact = 'N').
If equed = 'Y', both row and column equilibration were done, that is, A has been replaced by
diag(s)*A*diag(s).

s Array, size (n). The array s contains the scale factors for A. This array
is an input argument if fact = 'F' only; otherwise it is an output
argument.
If equed = 'N', s is not accessed.

If fact = 'F' and equed = 'Y', each element of s must be positive.

Each element of s should be a power of the radix to ensure a reliable solution and error estimates.
Scaling by powers of the radix does not cause rounding errors unless the result underflows or overflows.
Rounding errors during scaling lead to refining with a matrix that is not equivalent to the input matrix,
producing error estimates that may not be reliable.
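The sketch below shows one way to turn an arbitrary positive scale factor into a power of the radix (2 on IEEE hardware) with the standard C frexp and ldexp functions, and illustrates why such a factor is harmless: multiplying and then dividing by it reproduces the original value exactly, barring underflow or overflow. Rounding down to the largest power of two not exceeding s is an arbitrary choice for this illustration, and the helper name is hypothetical.

```c
#include <float.h>
#include <math.h>

#if FLT_RADIX != 2
#error "this sketch assumes a binary radix"
#endif

/* Hypothetical helper: largest power of two that does not exceed s (s > 0).
   frexp writes s = m * 2^e with 0.5 <= m < 1, so 2^(e-1) <= s < 2^e. */
static double pow2_floor(double s) {
    int e;
    (void)frexp(s, &e);
    return ldexp(1.0, e - 1);
}
```

For example, pow2_floor(3.0) is 2.0 and pow2_floor(0.3) is 0.25; scaling a matrix entry by either value changes only its exponent, not its significand.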

ldb The leading dimension of the array b; ldb≥ max(1, n) for column
major layout and ldb≥nrhs for row major layout.

ldx The leading dimension of the output array x; ldx≥ max(1, n) for
column major layout and ldx≥nrhs for row major layout.

n_err_bnds Number of error bounds to return for each right-hand side and each type (normwise or
componentwise). See the err_bnds_norm and err_bnds_comp descriptions in the Output Parameters
section below.

nparams Specifies the number of parameters set in params. If nparams ≤ 0, the params array is never
referenced and default values are used.

params Array, size max(1, nparams). Specifies algorithm parameters. If an entry is less than 0.0, that
entry is filled with the default value used for that parameter. Only positions up to nparams are accessed;
defaults are used for higher-numbered parameters. If defaults are acceptable, you can pass nparams = 0,
which prevents the source code from accessing the params argument.

params[0]: Whether to perform iterative refinement or not. Default: 1.0 (for single precision flavors),
1.0D+0 (for double precision flavors).

=0.0 No refinement is performed and no error bounds are computed.

=1.0 Use the extra-precise refinement algorithm.

(Other values are reserved for future use.)

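params[1]: Maximum number of residual computations allowed for refinement.

Default 10.0

Aggressive Set to 100.0 to permit convergence using approximate factorizations or factorizations other
than LU. If the factorization uses a technique other than Gaussian elimination, the guarantees in
err_bnds_norm and err_bnds_comp may no longer be trustworthy.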

params[2]: Flag determining if the code will attempt to find a solution with a small componentwise
relative error in the double-precision algorithm. Positive is true, 0.0 is false. Default: 1.0 (attempt
componentwise convergence).
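The default rules for nparams and params can be modelled with a few lines of C. This is a hypothetical illustration of the documented behavior (the function and constant names are not part of MKL), including the documented side effect that negative entries within the first nparams positions are overwritten with their defaults.

```c
#include <stddef.h>

/* Documented defaults for params[0..2] of ?posvxx. */
static const double POSVXX_DEFAULTS[3] = {1.0, 10.0, 1.0};

/* Hypothetical model: fills eff[0..2] with the values the routine would
   use. Entries at positions >= nparams are never read; negative entries
   below nparams are replaced by their defaults, as described above. */
static void posvxx_effective_params(int nparams, double *params, double eff[3]) {
    for (int k = 0; k < 3; ++k) {
        if (params != NULL && nparams > k) {
            if (params[k] < 0.0) params[k] = POSVXX_DEFAULTS[k];
            eff[k] = params[k];
        } else {
            eff[k] = POSVXX_DEFAULTS[k];
        }
    }
}
```

For example, passing params = {0.0} with nparams = 1 disables refinement (and therefore the error bounds) while leaving params[1] and params[2] at their defaults.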

Output Parameters

a Array a is not modified on exit if fact = 'F' or 'N', or if fact = 'E' and
equed = 'N'.
If fact = 'E' and equed = 'Y', A is overwritten by diag(s)*A*diag(s).

af If fact = 'N' or 'E', then af is an output argument and on exit returns the triangular factor U or L
from the Cholesky factorization A = U^T*U or A = L*L^T (real routines), A = U^H*U or A = L*L^H
(complex routines) of the original matrix A (if fact = 'N'), or of the equilibrated matrix A (if
fact = 'E'). See the description of a for the form of the equilibrated matrix.

b If equed = 'N', B is not modified.

If equed = 'Y', B is overwritten by diag(s)*B.


s This array is an output argument if fact ≠ 'F'. Each element of this array is a power of the radix.
See the description of s in the Input Parameters section.

rcond Reciprocal scaled condition number. An estimate of the reciprocal Skeel condition number of the
matrix A after equilibration (if done).

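rpvgrw Contains the reciprocal pivot growth factor norm(A)/norm(U), where the "max absolute element"
norm is used. If this is much less than 1, the stability of the LU factorization of the (equilibrated)
matrix A could be poor. This also means that the solution X, the estimated condition numbers, and the
error bounds could be unreliable.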

err_bnds_norm Array of size nrhs*n_err_bnds. For each right-hand side, contains information about various
error bounds and condition numbers corresponding to the normwise relative error, which is defined as
follows:
Normwise relative error in the i-th solution vector: max_j |xtrue(j,i) - x(j,i)| / max_j |x(j,i)|

The array is indexed by the type of error information as described below.

There are currently up to three pieces of information returned.

err=1 "Trust/don't trust" boolean. Trust the answer if the reciprocal condition number is greater than
the threshold sqrt(n)*slamch(ε) for single precision flavors and sqrt(n)*dlamch(ε) for double precision
flavors.

err=2 "Guaranteed" error bound. The estimated forward error, almost certainly within a factor of 10 of
the true error so long as the next entry is greater than the threshold sqrt(n)*slamch(ε) for single
precision flavors and sqrt(n)*dlamch(ε) for double precision flavors. This error bound should only be
trusted if the previous boolean is true.

err=3 Reciprocal condition number. Estimated normwise reciprocal condition number. Compared with the
threshold sqrt(n)*slamch(ε) for single precision flavors and sqrt(n)*dlamch(ε) for double precision
flavors to determine if the error estimate is "guaranteed". These reciprocal condition numbers are
1/(norm(inv(Z),inf)*norm(Z,inf)) for some appropriately scaled matrix Z.

Let Z = S*A, where S scales each row by a power of the radix so all absolute row sums of Z are
approximately 1.

The information for right-hand side i, where 1 ≤i≤nrhs, and type of error
err is stored in err_bnds_norm[(err-1)*nrhs + i - 1].
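The indexing formula above can be wrapped in small accessors. The helpers below are hypothetical (not part of MKL); they use 1-based err and i exactly as in the formula and then check the err=1 "trust" entry of every right-hand side. The formula is stated without distinguishing matrix layouts, so it is worth confirming against a small known case before relying on it with LAPACK_ROW_MAJOR.

```c
#include <stdbool.h>

/* Entry for error type err (1..n_err_bnds) and right-hand side i (1..nrhs),
   following the formula err_bnds[(err-1)*nrhs + i - 1]. */
static double err_bnd(const double *err_bnds, int nrhs, int err, int i) {
    return err_bnds[(err - 1) * nrhs + i - 1];
}

/* True if every right-hand side has its "trust" flag (err = 1) set. */
static bool all_bounds_trusted(const double *err_bnds, int nrhs) {
    for (int i = 1; i <= nrhs; ++i)
        if (err_bnd(err_bnds, nrhs, 1, i) != 1.0) return false;
    return true;
}
```

The same accessors apply unchanged to err_bnds_comp, which uses the identical indexing.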

err_bnds_comp Array of size nrhs*n_err_bnds. For each right-hand side, contains information about various
error bounds and condition numbers corresponding to the componentwise relative error, which is defined as
follows:
Componentwise relative error in the i-th solution vector: max_j |xtrue(j,i) - x(j,i)| / |x(j,i)|

The array is indexed by the type of error information as described below.

There are currently up to three pieces of information returned for each
right-hand side. If componentwise accuracy is not requested (params[2] =
0.0), then err_bnds_comp is not accessed.

err=1 "Trust/don't trust" boolean. Trust the answer if the reciprocal condition number is greater than
the threshold sqrt(n)*slamch(ε) for single precision flavors and sqrt(n)*dlamch(ε) for double precision
flavors.

err=2 "Guaranteed" error bound. The estimated forward error, almost certainly within a factor of 10 of
the true error so long as the next entry is greater than the threshold sqrt(n)*slamch(ε) for single
precision flavors and sqrt(n)*dlamch(ε) for double precision flavors. This error bound should only be
trusted if the previous boolean is true.

err=3 Reciprocal condition number. Estimated componentwise reciprocal condition number. Compared with
the threshold sqrt(n)*slamch(ε) for single precision flavors and sqrt(n)*dlamch(ε) for double precision
flavors to determine if the error estimate is "guaranteed". These reciprocal condition numbers are
1/(norm(inv(Z),inf)*norm(Z,inf)) for some appropriately scaled matrix Z.

Let Z = S*(A*diag(x)), where x is the solution for the current right-hand side and S scales each row of
A*diag(x) by a power of the radix so all absolute row sums of Z are approximately 1.

The information for right-hand side i, where 1 ≤i≤nrhs, and type of error
err is stored in err_bnds_comp[(err-1)*nrhs + i - 1].

equed If fact ≠ 'F', then equed is an output argument. It specifies the form of equilibration that was
done (see the description of equed in the Input Parameters section).

params If an entry is less than 0.0, that entry is filled with the default value used
for that parameter, otherwise the entry is not modified.

Return Values
This function returns a value info.

If info = 0, the execution is successful. The solution to every right-hand side is guaranteed.

If info = -i, parameter i had an illegal value.

If 0 < info ≤ n: U(info,info) is exactly zero. The factorization has been completed, but the factor U is
exactly singular, so the solution and error bounds could not be computed; rcond = 0 is returned.

See Also
Matrix Storage Schemes

?ppsv
Computes the solution to the system of linear
equations with a symmetric (Hermitian) positive
definite packed coefficient matrix A and multiple right-hand sides.

Syntax
lapack_int LAPACKE_sppsv (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , float * ap , float * b , lapack_int ldb );
lapack_int LAPACKE_dppsv (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , double * ap , double * b , lapack_int ldb );

lapack_int LAPACKE_cppsv (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , lapack_complex_float * ap , lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_zppsv (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , lapack_complex_double * ap , lapack_complex_double * b , lapack_int ldb );

Include Files
• mkl.h

Description

The routine solves for X the real or complex system of linear equations A*X = B, where A is an n-by-n
symmetric (real flavors) or Hermitian (complex flavors) positive-definite matrix stored in packed format,
the columns of matrix B are individual right-hand sides, and the columns of X are the corresponding
solutions.
The Cholesky decomposition is used to factor A as
A = U^T*U (real flavors) and A = U^H*U (complex flavors), if uplo = 'U'
or A = L*L^T (real flavors) and A = L*L^H (complex flavors), if uplo = 'L',

where U is an upper triangular matrix and L is a lower triangular matrix. The factored form of A is then used
to solve the system of equations A*X = B.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored:

If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides, the number of columns in B; nrhs ≥ 0.

ap, b Arrays: ap (size max(1, n*(n+1)/2)), b (size max(1, ldb*nrhs) for column major layout and
max(1, ldb*n) for row major layout). The array ap contains the upper or the lower triangular part of the
matrix A (as specified by uplo) in packed storage (see Matrix Storage Schemes).
The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.
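Packed storage keeps only the n*(n+1)/2 elements of the selected triangle, one column after another in column major layout or one row after another in row major layout. The hypothetical helper below computes where element (i, j), 0-based, lives in ap under that scheme; it is a sketch for illustration, so treat Matrix Storage Schemes as the authoritative definition. For a real symmetric matrix an element outside the stored triangle is read from its mirror image; for a Hermitian matrix that element is the complex conjugate of the stored one.

```c
#include <stdbool.h>

/* Hypothetical helper: 0-based position of element (i, j) of an n-by-n
   symmetric matrix in packed storage. upper selects uplo = 'U' versus
   'L'; row_major selects the layout. For (i, j) outside the stored
   triangle, the mirrored element (j, i) is located instead. */
static int packed_index(int n, bool upper, bool row_major, int i, int j) {
    if ((i <= j) != upper) { int t = i; i = j; j = t; }  /* mirror into stored triangle */
    if (!row_major)
        return upper ? i + j * (j + 1) / 2              /* columns 0..j-1 hold j*(j+1)/2 */
                     : i + j * (2 * n - j - 1) / 2;     /* columns 0..j-1 hold n, n-1, ... */
    return upper ? j + i * (2 * n - i - 1) / 2          /* rows 0..i-1 hold n, n-1, ... */
                 : j + i * (i + 1) / 2;                 /* rows 0..i-1 hold i*(i+1)/2 */
}
```

For n = 3, uplo = 'U' and column major layout, this ordering places a00, a01, a11, a02, a12, a22 at ap[0] through ap[5].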

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

Output Parameters

ap If info = 0, the upper or lower triangular part of A in packed storage is overwritten by the Cholesky
factor U or L, as specified by uplo.

b Overwritten by the solution matrix X.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, the leading minor of order i (and therefore the matrix A itself) is not positive-definite, so the
factorization could not be completed, and the solution has not been computed.

See Also
Matrix Storage Schemes

?ppsvx
Uses the Cholesky factorization to compute the
solution to the system of linear equations with a
symmetric (Hermitian) positive definite packed
coefficient matrix A, and provides error bounds on the
solution.

Syntax
lapack_int LAPACKE_sppsvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, float* ap, float* afp, char* equed, float* s, float* b, lapack_int ldb,
float* x, lapack_int ldx, float* rcond, float* ferr, float* berr );
lapack_int LAPACKE_dppsvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, double* ap, double* afp, char* equed, double* s, double* b, lapack_int
ldb, double* x, lapack_int ldx, double* rcond, double* ferr, double* berr );
lapack_int LAPACKE_cppsvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, lapack_complex_float* ap, lapack_complex_float* afp, char* equed,
float* s, lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x, lapack_int
ldx, float* rcond, float* ferr, float* berr );
lapack_int LAPACKE_zppsvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, lapack_complex_double* ap, lapack_complex_double* afp, char* equed,
double* s, lapack_complex_double* b, lapack_int ldb, lapack_complex_double* x,
lapack_int ldx, double* rcond, double* ferr, double* berr );

Include Files
• mkl.h

Description

1. If fact = 'E', real scaling factors s are computed to equilibrate the system:

diag(s)*A*diag(s)*inv(diag(s))*X = diag(s)*B.
Whether or not the system will be equilibrated depends on the scaling of the matrix A, but if
equilibration is used, A is overwritten by diag(s)*A*diag(s) and B by diag(s)*B.
2. If fact = 'N' or 'E', the Cholesky decomposition is used to factor the matrix A (after equilibration if
fact = 'E') as
A = U^T*U (real), A = U^H*U (complex), if uplo = 'U',
or A = L*L^T (real), A = L*L^H (complex), if uplo = 'L',

where U is an upper triangular matrix and L is a lower triangular matrix.

3. If the leading i-by-i principal minor is not positive-definite, then the routine returns with info = i.
Otherwise, the factored form of A is used to estimate the condition number of the matrix A. If the
reciprocal of the condition number is less than machine precision, info = n+1 is returned as a
warning, but the routine still goes on to solve for X and compute error bounds as described below.
4. The system of equations is solved for X using the factored form of A.
5. Iterative refinement is applied to improve the computed solution matrix and calculate error bounds and
backward error estimates for it.
6. If equilibration was used, the matrix X is premultiplied by diag(s) so that it solves the original system
before equilibration.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

fact Must be 'F', 'N', or 'E'.

Specifies whether or not the factored form of the matrix A is supplied on entry, and if not, whether the
matrix A should be equilibrated before it is factored.
If fact = 'F': on entry, afp contains the factored form of A. If equed = 'Y', the matrix A has been
equilibrated with scaling factors given by s.
ap and afp will not be modified.
If fact = 'N', the matrix A will be copied to afp and factored.

If fact = 'E', the matrix A will be equilibrated if necessary, then copied to afp and factored.

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored:

If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides; the number of columns in B; nrhs ≥ 0.

ap, afp, b Arrays: ap (size max(1, n*(n+1)/2)), afp (size max(1, n*(n+1)/2)), and b of size
max(1, ldb*nrhs) for column major layout and max(1, ldb*n) for
row major layout.

The array ap contains the upper or lower triangle of the original
symmetric/Hermitian matrix A in packed storage (see Matrix Storage
Schemes). If fact = 'F' and equed = 'Y', ap must
contain the equilibrated matrix diag(s)*A*diag(s).

The array afp is an input argument if fact = 'F' and contains the
triangular factor U or L from the Cholesky factorization of A in the
same storage format as A. If equed is not 'N', then afp is the
factored form of the equilibrated matrix A.
The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

equed Must be 'N' or 'Y'.

equed is an input argument if fact = 'F'. It specifies the form of
equilibration that was done:
if equed = 'N', no equilibration was done (always true if fact =
'N');
if equed = 'Y', equilibration was done, that is, A has been replaced
by diag(s)*A*diag(s).

s Array, size (n). The array s contains the scale factors for A. This array
is an input argument if fact = 'F' only; otherwise it is an output
argument.
If equed = 'N', s is not accessed.

If fact = 'F' and equed = 'Y', each element of s must be positive.

ldx The leading dimension of the output array x; ldx≥ max(1, n) for
column major layout and ldx≥nrhs for row major layout.

Output Parameters

x Array, size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout.
If info = 0 or info = n+1, the array x contains the solution matrix
X to the original system of equations. Note that if equed = 'Y', A
and B are modified on exit, and the solution to the equilibrated system
is inv(diag(s))*X.

ap Array ap is not modified on exit if fact = 'F' or 'N', or if fact =
'E' and equed = 'N'.
If fact = 'E' and equed = 'Y', ap is overwritten by
diag(s)*A*diag(s).

afp If fact = 'N' or 'E', then afp is an output argument and on exit
returns the triangular factor U or L from the Cholesky factorization
A = U^T*U or A = L*L^T (real routines), A = U^H*U or A = L*L^H (complex
routines) of the original matrix A (if fact = 'N'), or of the
equilibrated matrix A (if fact = 'E'). See the description of ap for
the form of the equilibrated matrix.

b Overwritten by diag(s)*B, if equed = 'Y'; not changed if equed =
'N'.

s This array is an output argument if fact ≠ 'F'. See the description of s
in the Input Parameters section.

rcond An estimate of the reciprocal condition number of the matrix A after
equilibration (if done). If rcond is less than the machine precision (in
particular, if rcond = 0), the matrix is singular to working precision.
This condition is indicated by a return code of info > 0.

ferr Array, size at least max(1, nrhs). Contains the estimated forward
error bound for each solution vector xj(the j-th column of the solution
matrix X). If xtrue is the true solution corresponding to xj,
ferr[j-1] is an estimated upper bound for the magnitude of the
largest element in (xj - xtrue) divided by the magnitude of the
largest element in xj. The estimate is as reliable as the estimate for
rcond, and is almost always a slight overestimate of the true error.

berr Array, size at least max(1, nrhs). Contains the component-wise
relative backward error for each solution vector xj, that is, the
smallest relative change in any element of A or B that makes xj an
exact solution.

equed If fact ≠ 'F', then equed is an output argument. It specifies the form
of equilibration that was done (see the description of equed in the Input
Parameters section).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, and i≤n, the leading minor of order i (and therefore the matrix A itself) is not positive-definite,
so the factorization could not be completed, and the solution and error bounds could not be computed; rcond
= 0 is returned.
If info = i, and i = n + 1, then U is nonsingular, but rcond is less than machine precision, meaning that the
matrix is singular to working precision. Nevertheless, the solution and error bounds are computed because
there are a number of situations where the computed solution can be more accurate than the value of rcond
would suggest.
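In calling code, the return values above map naturally onto a small classification step. The sketch uses hypothetical names and encodes only the cases listed in this section; info is taken as long long so that it can hold either a 32-bit or a 64-bit lapack_int.

```c
/* Hypothetical status categories for the info value returned by an expert
   driver such as LAPACKE_dppsvx for a matrix of order n. */
typedef enum {
    SOLVE_OK,              /* info == 0 */
    SOLVE_BAD_ARGUMENT,    /* info == -i: argument i had an illegal value */
    SOLVE_NOT_POSDEF,      /* 1 <= info <= n: leading minor not positive definite */
    SOLVE_ILL_CONDITIONED, /* info == n+1: rcond below machine precision */
    SOLVE_UNEXPECTED       /* any value not described in this section */
} solve_status;

solve_status classify_expert_info(long long info, long long n) {
    if (info == 0)
        return SOLVE_OK;
    if (info < 0)
        return SOLVE_BAD_ARGUMENT;
    if (info <= n)
        return SOLVE_NOT_POSDEF;
    if (info == n + 1)
        return SOLVE_ILL_CONDITIONED;
    return SOLVE_UNEXPECTED;
}

/* x holds a computed solution only for info == 0 or info == n+1; in the
   second case rcond, ferr and berr deserve a closer look before use. */
int solution_available(solve_status st) {
    return st == SOLVE_OK || st == SOLVE_ILL_CONDITIONED;
}
```

A caller would pass the value returned by LAPACKE_dppsvx together with n and read x only when solution_available returns nonzero.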

See Also
Matrix Storage Schemes

?pbsv
Computes the solution to the system of linear
equations with a symmetric or Hermitian positive-
definite band coefficient matrix A and multiple right-
hand sides.

Syntax
lapack_int LAPACKE_spbsv (int matrix_layout , char uplo , lapack_int n , lapack_int
kd , lapack_int nrhs , float * ab , lapack_int ldab , float * b , lapack_int ldb );
lapack_int LAPACKE_dpbsv (int matrix_layout , char uplo , lapack_int n , lapack_int
kd , lapack_int nrhs , double * ab , lapack_int ldab , double * b , lapack_int ldb );
lapack_int LAPACKE_cpbsv (int matrix_layout , char uplo , lapack_int n , lapack_int
kd , lapack_int nrhs , lapack_complex_float * ab , lapack_int ldab ,
lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_zpbsv (int matrix_layout , char uplo , lapack_int n , lapack_int
kd , lapack_int nrhs , lapack_complex_double * ab , lapack_int ldab ,
lapack_complex_double * b , lapack_int ldb );

Include Files
• mkl.h

Description

The routine solves for X the real or complex system of linear equations A*X = B, where A is an n-by-n
symmetric/Hermitian positive definite band matrix, the columns of matrix B are individual right-hand sides,
and the columns of X are the corresponding solutions.
The Cholesky decomposition is used to factor A as
A = U^T*U (real flavors) and A = U^H*U (complex flavors), if uplo = 'U',
or A = L*L^T (real flavors) and A = L*L^H (complex flavors), if uplo = 'L',

where U is an upper triangular band matrix and L is a lower triangular band matrix, with the same number of
superdiagonals or subdiagonals as A. The factored form of A is then used to solve the system of equations
A*X = B.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored:

If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

n The order of matrix A; n≥ 0.

kd The number of superdiagonals of the matrix A if uplo = 'U', or the
number of subdiagonals if uplo = 'L'; kd ≥ 0.

nrhs The number of right-hand sides, the number of columns in B; nrhs ≥ 0.

ab, b Arrays: ab (size max(1, ldab*n)), b of size max(1, ldb*nrhs) for
column major layout and max(1, ldb*n) for row major layout. The
array ab contains the upper or the lower triangular part of the matrix
A (as specified by uplo) in band storage (see Matrix Storage
Schemes).
The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.

ldab The leading dimension of the array ab; ldab≥kd +1.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.
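For uplo = 'U' and column major layout, band storage puts A(i,j) (0-based, max(0, j-kd) ≤ i ≤ j) at ab[(kd+i-j) + j*ldab], so the diagonal occupies row kd of ab. The sketch below builds ab from a dense column major matrix, assuming real double-precision data; the function name is hypothetical, and uplo = 'L' or row major layout use different mappings (see Matrix Storage Schemes).

```c
#include <stddef.h>

/* Copy the upper triangle of a dense column major n-by-n matrix a (leading
   dimension lda) into column major upper band storage ab (leading dimension
   ldab >= kd+1). A(i,j) goes to ab[(kd+i-j) + j*ldab] for
   max(0, j-kd) <= i <= j; every other slot of ab is set to zero. */
void dense_to_upper_band(int n, int kd, const double *a, int lda,
                         double *ab, int ldab) {
    for (int j = 0; j < n; ++j) {
        for (int r = 0; r < ldab; ++r)
            ab[(size_t)r + (size_t)j * (size_t)ldab] = 0.0;
        int i0 = j > kd ? j - kd : 0;
        for (int i = i0; i <= j; ++i)
            ab[(size_t)(kd + i - j) + (size_t)j * (size_t)ldab] =
                a[(size_t)i + (size_t)j * (size_t)lda];
    }
}
```

For the 3-by-3 matrix [4 2 0; 2 5 2; 0 2 5] with kd = 1 and ldab = 2, the result is ab = {0, 4, 2, 5, 2, 5}.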

Output Parameters

ab The upper or lower triangular part of A (in band storage) is
overwritten by the Cholesky factor U or L, as specified by uplo, in the
same storage format as A.

b Overwritten by the solution matrix X.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, the leading minor of order i (and therefore the matrix A itself) is not positive-definite, so the
factorization could not be completed, and the solution has not been computed.
See Also
Matrix Storage Schemes

?pbsvx
Uses the Cholesky factorization to compute the
solution to the system of linear equations with a
symmetric (Hermitian) positive-definite band
coefficient matrix A, and provides error bounds on the
solution.

Syntax
lapack_int LAPACKE_spbsvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int kd, lapack_int nrhs, float* ab, lapack_int ldab, float* afb, lapack_int
ldafb, char* equed, float* s, float* b, lapack_int ldb, float* x, lapack_int ldx, float*
rcond, float* ferr, float* berr );
lapack_int LAPACKE_dpbsvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int kd, lapack_int nrhs, double* ab, lapack_int ldab, double* afb, lapack_int
ldafb, char* equed, double* s, double* b, lapack_int ldb, double* x, lapack_int ldx,
double* rcond, double* ferr, double* berr );
lapack_int LAPACKE_cpbsvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int kd, lapack_int nrhs, lapack_complex_float* ab, lapack_int ldab,
lapack_complex_float* afb, lapack_int ldafb, char* equed, float* s,
lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x, lapack_int ldx,
float* rcond, float* ferr, float* berr );

lapack_int LAPACKE_zpbsvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int kd, lapack_int nrhs, lapack_complex_double* ab, lapack_int ldab,
lapack_complex_double* afb, lapack_int ldafb, char* equed, double* s,
lapack_complex_double* b, lapack_int ldb, lapack_complex_double* x, lapack_int ldx,
double* rcond, double* ferr, double* berr );

Include Files
• mkl.h

Description

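1. If fact = 'E', real scaling factors s are computed to equilibrate the system:
diag(s)*A*diag(s)*inv(diag(s))*X = diag(s)*B
The matrix is equilibrated only if necessary; if it is, A is overwritten by diag(s)*A*diag(s) and B by
diag(s)*B.
2. If fact = 'N' or 'E', the Cholesky decomposition is used to factor the matrix A (after equilibration if
fact = 'E') as
A = U^T*U (real), A = U^H*U (complex), if uplo = 'U',
or A = L*L^T (real), A = L*L^H (complex), if uplo = 'L',
where U is an upper triangular band matrix and L is a lower triangular band matrix.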
3. If the leading i-by-i principal minor is not positive definite, then the routine returns with info = i.
Otherwise, the factored form of A is used to estimate the condition number of the matrix A. If the
reciprocal of the condition number is less than machine precision, info = n+1 is returned as a
warning, but the routine still goes on to solve for X and compute error bounds as described below.
4. The system of equations is solved for X using the factored form of A.
5. Iterative refinement is applied to improve the computed solution matrix and calculate error bounds and
backward error estimates for it.
6. If equilibration was used, the matrix X is premultiplied by diag(s) so that it solves the original system
before equilibration.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

fact Must be 'F', 'N', or 'E'.

Specifies whether or not the factored form of the matrix A is supplied
on entry, and if not, whether the matrix A should be equilibrated
before it is factored.
If fact = 'F': on entry, afb contains the factored form of A. If
equed = 'Y', the matrix A has been equilibrated with scaling factors
given by s.
ab and afb will not be modified.
If fact = 'N', the matrix A will be copied to afb and factored.
If fact = 'E', the matrix A will be equilibrated if necessary, then
copied to afb and factored.

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored:

If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

n The order of matrix A; n≥ 0.

kd The number of superdiagonals or subdiagonals in the matrix A; kd≥ 0.

nrhs The number of right-hand sides, the number of columns in B; nrhs ≥ 0.

ab, afb, b Arrays: ab (size max(1, ldab*n)), afb (size max(1, ldafb*n)), and b of size
max(1, ldb*nrhs) for column major layout and max(1, ldb*n) for
row major layout.
The array ab contains the upper or lower triangle of the matrix A in
band storage (see Matrix Storage Schemes).
If fact = 'F' and equed = 'Y', then ab must contain the
equilibrated matrix diag(s)*A*diag(s).

The array afb is an input argument if fact = 'F'. It contains the
triangular factor U or L from the Cholesky factorization of the band
matrix A in the same storage format as A. If equed = 'Y', then afb
is the factored form of the equilibrated matrix A.
The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.

ldab The leading dimension of ab; ldab≥kd+1.

ldafb The leading dimension of afb; ldafb≥kd+1.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

equed Must be 'N' or 'Y'.

equed is an input argument if fact = 'F'. It specifies the form of
equilibration that was done:
if equed = 'N', no equilibration was done (always true if fact =
'N');
if equed = 'Y', equilibration was done, that is, A has been replaced
by diag(s)*A*diag(s).

s Array, size (n). The array s contains the scale factors for A. This array
is an input argument if fact = 'F' only; otherwise it is an output
argument.
If equed = 'N', s is not accessed.

If fact = 'F' and equed = 'Y', each element of s must be positive.

ldx The leading dimension of the output array x; ldx≥ max(1, n) for
column major layout and ldx≥nrhs for row major layout.

Output Parameters

x Array, size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout.
If info = 0 or info = n+1, the array x contains the solution matrix X
to the original system of equations. Note that if equed = 'Y', A and
B are modified on exit, and the solution to the equilibrated system is
inv(diag(s))*X.

ab On exit, if fact = 'E' and equed = 'Y', A is overwritten by
diag(s)*A*diag(s).

afb If fact = 'N' or 'E', then afb is an output argument and on exit
returns the triangular factor U or L from the Cholesky factorization
A = U^T*U or A = L*L^T (real routines), A = U^H*U or A = L*L^H (complex
routines) of the original matrix A (if fact = 'N'), or of the
equilibrated matrix A (if fact = 'E'). See the description of ab for
the form of the equilibrated matrix.

b Overwritten by diag(s)*B, if equed = 'Y'; not changed if equed =
'N'.

s This array is an output argument if fact ≠ 'F'. See the description of s
in the Input Parameters section.

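rcond An estimate of the reciprocal condition number of the matrix A after
equilibration (if done). If rcond is less than the machine precision (in
particular, if rcond = 0), the matrix is singular to working precision.
This condition is indicated by a return code of info > 0.

ferr Array, size at least max(1, nrhs). Contains the estimated forward
error bound for each solution vector xj (the j-th column of the solution
matrix X). If xtrue is the true solution corresponding to xj,
ferr[j-1] is an estimated upper bound for the magnitude of the
largest element in (xj - xtrue) divided by the magnitude of the
largest element in xj. The estimate is as reliable as the estimate for
rcond, and is almost always a slight overestimate of the true error.
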
berr Array, size at least max(1, nrhs). Contains the component-wise
relative backward error for each solution vector xj, that is, the
smallest relative change in any element of A or B that makes xj an
exact solution.

equed If fact ≠ 'F', then equed is an output argument. It specifies the form
of equilibration that was done (see the description of equed in the Input
Parameters section).

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, and i≤n, the leading minor of order i (and therefore the matrix A itself) is not positive definite,
so the factorization could not be completed, and the solution and error bounds could not be computed; rcond
=0 is returned. If info = i, and i = n + 1, then U is nonsingular, but rcond is less than machine precision,
meaning that the matrix is singular to working precision. Nevertheless, the solution and error bounds are
computed because there are a number of situations where the computed solution can be more accurate than
the value of rcond would suggest.

See Also
Matrix Storage Schemes

?ptsv
Computes the solution to the system of linear
equations with a symmetric or Hermitian positive
definite tridiagonal coefficient matrix A and multiple
right-hand sides.

Syntax
lapack_int LAPACKE_sptsv( int matrix_layout, lapack_int n, lapack_int nrhs, float* d,
float* e, float* b, lapack_int ldb );
lapack_int LAPACKE_dptsv( int matrix_layout, lapack_int n, lapack_int nrhs, double* d,
double* e, double* b, lapack_int ldb );
lapack_int LAPACKE_cptsv( int matrix_layout, lapack_int n, lapack_int nrhs, float* d,
lapack_complex_float* e, lapack_complex_float* b, lapack_int ldb );
lapack_int LAPACKE_zptsv( int matrix_layout, lapack_int n, lapack_int nrhs, double* d,
lapack_complex_double* e, lapack_complex_double* b, lapack_int ldb );

Include Files
• mkl.h

Description

The routine solves for X the real or complex system of linear equations A*X = B, where A is an n-by-n
symmetric/Hermitian positive-definite tridiagonal matrix, the columns of matrix B are individual right-hand
sides, and the columns of X are the corresponding solutions.
A is factored as A = L*D*L^T (real flavors) or A = L*D*L^H (complex flavors), and the factored form of A is
then used to solve the system of equations A*X = B.
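Because A is tridiagonal, both the factorization and the solve take O(n) operations. A minimal sketch for real double-precision data and a single right-hand side follows; the function names are hypothetical, and although the library exposes the two phases separately as the computational routines ?pttrf and ?pttrs, their internals are not reproduced here.

```c
/* A = L*D*L^T for a real SPD tridiagonal matrix with diagonal d[0..n-1] and
   subdiagonal e[0..n-2]. On return d holds D and e holds the subdiagonal of
   the unit bidiagonal L. Returns 0, or i+1 if the leading minor of order
   i+1 is not positive definite (the factorization is complete only when
   the returned value is n). */
int tridiag_ldlt(int n, double *d, double *e) {
    for (int i = 0; i < n - 1; ++i) {
        if (d[i] <= 0.0)
            return i + 1;
        double ei = e[i];
        e[i] = ei / d[i];          /* L(i+1, i) */
        d[i + 1] -= e[i] * ei;     /* update the next pivot */
    }
    if (n > 0 && d[n - 1] <= 0.0)
        return n;
    return 0;
}

/* Solve L*D*L^T x = b for one right-hand side, overwriting b with x. */
void tridiag_ldlt_solve(int n, const double *d, const double *e, double *b) {
    for (int i = 1; i < n; ++i)          /* L*y = b */
        b[i] -= e[i - 1] * b[i - 1];
    for (int i = 0; i < n; ++i)          /* D*z = y */
        b[i] /= d[i];
    for (int i = n - 2; i >= 0; --i)     /* L^T*x = z */
        b[i] -= e[i] * b[i + 1];
}
```

For d = {4, 5, 5}, e = {2, 2} and b = {6, 9, 7}, the factors are D = diag(4, 4, 4) and L with subdiagonal {0.5, 0.5}, and the solution is x = {1, 1, 1}.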

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides, the number of columns in B; nrhs ≥ 0.

d Array, size at least max(1, n). Contains the diagonal elements
of the tridiagonal matrix A.

e, b Arrays: e (size n - 1), b of size max(1, ldb*nrhs) for column major
layout and max(1, ldb*n) for row major layout. The array e contains
the (n - 1) subdiagonal elements of A.

The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

Output Parameters

d Overwritten by the n diagonal elements of the diagonal matrix D from
the L*D*L^T (real)/L*D*L^H (complex) factorization of A.

e Overwritten by the (n - 1) subdiagonal elements of the unit
bidiagonal factor L from the factorization of A.

b Overwritten by the solution matrix X.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, the leading minor of order i (and therefore the matrix A itself) is not positive-definite, and the
solution has not been computed. The factorization has not been completed unless i = n.

See Also
Matrix Storage Schemes

?ptsvx
Uses factorization to compute the solution to the
system of linear equations with a symmetric
(Hermitian) positive definite tridiagonal coefficient
matrix A, and provides error bounds on the solution.

Syntax
lapack_int LAPACKE_sptsvx( int matrix_layout, char fact, lapack_int n, lapack_int nrhs,
const float* d, const float* e, float* df, float* ef, const float* b, lapack_int ldb,
float* x, lapack_int ldx, float* rcond, float* ferr, float* berr );
lapack_int LAPACKE_dptsvx( int matrix_layout, char fact, lapack_int n, lapack_int nrhs,
const double* d, const double* e, double* df, double* ef, const double* b, lapack_int
ldb, double* x, lapack_int ldx, double* rcond, double* ferr, double* berr );
lapack_int LAPACKE_cptsvx( int matrix_layout, char fact, lapack_int n, lapack_int nrhs,
const float* d, const lapack_complex_float* e, float* df, lapack_complex_float* ef,
const lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x, lapack_int ldx,
float* rcond, float* ferr, float* berr );

lapack_int LAPACKE_zptsvx( int matrix_layout, char fact, lapack_int n, lapack_int nrhs,
const double* d, const lapack_complex_double* e, double* df, lapack_complex_double* ef,
const lapack_complex_double* b, lapack_int ldb, lapack_complex_double* x, lapack_int
ldx, double* rcond, double* ferr, double* berr );

Include Files
• mkl.h

Description

The routine uses the Cholesky factorization A = L*D*L^T (real)/A = L*D*L^H (complex) to compute the
solution to a real or complex system of linear equations A*X = B, where A is an n-by-n symmetric or
Hermitian positive definite tridiagonal matrix, the columns of matrix B are individual right-hand sides, and
the columns of X are the corresponding solutions.
Error bounds on the solution and a condition estimate are also provided.
The routine ?ptsvx performs the following steps:

1. If fact = 'N', the matrix A is factored as A = L*D*L^T (real flavors)/A = L*D*L^H (complex flavors),
where L is a unit lower bidiagonal matrix and D is diagonal. The factorization can also be regarded as
having the form A = U^T*D*U (real flavors)/A = U^H*D*U (complex flavors).
2. If the leading i-by-i principal minor is not positive-definite, then the routine returns with info = i.
Otherwise, the factored form of A is used to estimate the condition number of the matrix A. If the
reciprocal of the condition number is less than machine precision, info = n+1 is returned as a
warning, but the routine still goes on to solve for X and compute error bounds as described below.
3. The system of equations is solved for X using the factored form of A.
4. Iterative refinement is applied to improve the computed solution matrix and calculate error bounds and
backward error estimates for it.
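rcond estimates the reciprocal of the 1-norm condition number, 1/(||A||_1 * ||inv(A)||_1). For small problems this quantity can be computed exactly from the factors df and ef by solving for each column of inv(A), which gives a useful cross-check of the estimate. The sketch assumes real double-precision data and hypothetical function names; it costs O(n^2) operations and says nothing about how the library obtains its estimate.

```c
#include <math.h>
#include <stdlib.h>

/* A = L*D*L^T for a real SPD tridiagonal matrix (d: diagonal, e: subdiagonal);
   d and e are overwritten by D and the subdiagonal of L. Returns 0 or i+1. */
int tridiag_factor(int n, double *d, double *e) {
    for (int i = 0; i < n - 1; ++i) {
        if (d[i] <= 0.0)
            return i + 1;
        double ei = e[i];
        e[i] = ei / d[i];
        d[i + 1] -= e[i] * ei;
    }
    return (n > 0 && d[n - 1] <= 0.0) ? n : 0;
}

/* Solve L*D*L^T x = b in place using the factors df, ef. */
static void tridiag_solve(int n, const double *df, const double *ef, double *b) {
    for (int i = 1; i < n; ++i)
        b[i] -= ef[i - 1] * b[i - 1];
    for (int i = 0; i < n; ++i)
        b[i] /= df[i];
    for (int i = n - 2; i >= 0; --i)
        b[i] -= ef[i] * b[i + 1];
}

/* 1-norm (largest absolute column sum) of the symmetric tridiagonal A. */
static double tridiag_norm1(int n, const double *d, const double *e) {
    double best = 0.0;
    for (int j = 0; j < n; ++j) {
        double col = fabs(d[j]);
        if (j > 0)
            col += fabs(e[j - 1]);
        if (j < n - 1)
            col += fabs(e[j]);
        if (col > best)
            best = col;
    }
    return best;
}

/* Exact 1/(||A||_1 * ||inv(A)||_1), forming inv(A) one column at a time
   from the factors df, ef. Returns 0.0 for a zero matrix or if memory
   allocation fails; an empty matrix is treated as perfectly conditioned. */
double tridiag_rcond_exact(int n, const double *d, const double *e,
                           const double *df, const double *ef) {
    if (n == 0)
        return 1.0;
    double anorm = tridiag_norm1(n, d, e);
    if (anorm == 0.0)
        return 0.0;
    double *col = malloc((size_t)n * sizeof *col);
    if (col == NULL)
        return 0.0;
    double ainv_norm = 0.0;
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < n; ++i)
            col[i] = (i == j) ? 1.0 : 0.0;   /* j-th unit vector */
        tridiag_solve(n, df, ef, col);       /* j-th column of inv(A) */
        double sum = 0.0;
        for (int i = 0; i < n; ++i)
            sum += fabs(col[i]);
        if (sum > ainv_norm)
            ainv_norm = sum;
    }
    free(col);
    return 1.0 / (anorm * ainv_norm);
}
```

For d = {4, 5, 5} and e = {2, 2}, ||A||_1 = 9 and ||inv(A)||_1 = 38/64, so the exact value is 64/342 ≈ 0.187.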

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

fact Must be 'F' or 'N'.

Specifies whether or not the factored form of the matrix A is supplied
on entry.
If fact = 'F': on entry, df and ef contain the factored form of A.
Arrays d, e, df, and ef will not be modified.
If fact = 'N', the matrix A will be copied to df and ef, and factored.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides, the number of columns in B; nrhs ≥ 0.

d, df Arrays: d (size n), df (size n).
The array d contains the n diagonal elements of the tridiagonal matrix
A.
The array df is an input argument if fact = 'F' and on entry
contains the n diagonal elements of the diagonal matrix D from the
L*D*L^T (real)/L*D*L^H (complex) factorization of A.

e, ef, b Arrays: e (size n - 1), ef (size n - 1), b of size max(1, ldb*nrhs) for column
major layout and max(1, ldb*n) for row major layout. The array e
contains the (n - 1) subdiagonal elements of the tridiagonal matrix
A.
The array ef is an input argument if fact = 'F' and on entry
contains the (n - 1) subdiagonal elements of the unit bidiagonal
factor L from the L*D*L^T (real)/L*D*L^H (complex) factorization of A.

The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ldx The leading dimension of x; ldx≥ max(1, n) for column major layout
and ldx≥nrhs for row major layout.

Output Parameters

x Array, size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout.
If info = 0 or info = n+1, the array x contains the solution matrix
X to the system of equations.

df, ef These arrays are output arguments if fact = 'N'. See the
description of df, ef in the Input Parameters section.

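rcond An estimate of the reciprocal condition number of the matrix A. If
rcond is less than the machine precision (in particular, if rcond = 0), the
matrix is singular to working precision. This condition is indicated by a
return code of info > 0.
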
ferr Array, size at least max(1, nrhs). Contains the estimated forward
error bound for each solution vector xj (the j-th column of the
solution matrix X). If xtrue is the true solution corresponding to xj,
ferr[j-1] is an estimated upper bound for the magnitude of the largest
element in (xj - xtrue) divided by the magnitude of the largest
element in xj. The estimate is as reliable as the estimate for rcond,
and is almost always a slight overestimate of the true error.

berr Array, size at least max(1, nrhs). Contains the component-wise
relative backward error for each solution vector xj, that is, the
smallest relative change in any element of A or B that makes xj an
exact solution.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, and i≤n, the leading minor of order i (and therefore the matrix A itself) is not positive-definite,
so the factorization could not be completed, and the solution has not been computed; rcond = 0 is returned.
If info = i, and i = n + 1, then U is nonsingular, but rcond is less than machine precision, meaning that the
matrix is singular to working precision. Nevertheless, the solution and error bounds are computed because
there are a number of situations where the computed solution can be more accurate than the value of rcond
would suggest.

See Also
Matrix Storage Schemes

?sysv
Computes the solution to the system of linear
equations with a real or complex symmetric coefficient
matrix A and multiple right-hand sides.

Syntax
lapack_int LAPACKE_ssysv (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , float * a , lapack_int lda , lapack_int * ipiv , float * b , lapack_int ldb );
lapack_int LAPACKE_dsysv (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , double * a , lapack_int lda , lapack_int * ipiv , double * b , lapack_int ldb );
lapack_int LAPACKE_csysv (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , lapack_complex_float * a , lapack_int lda , lapack_int * ipiv ,
lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_zsysv (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , lapack_complex_double * a , lapack_int lda , lapack_int * ipiv ,
lapack_complex_double * b , lapack_int ldb );

Include Files
• mkl.h

Description

The routine solves for X the real or complex system of linear equations A*X = B, where A is an n-by-n
symmetric matrix, the columns of matrix B are individual right-hand sides, and the columns of X are the
corresponding solutions.
The diagonal pivoting method is used to factor A as A = U*D*U^T or A = L*D*L^T, where U (or L) is a product
of permutation and unit upper (lower) triangular matrices, and D is symmetric and block diagonal with 1-by-1
and 2-by-2 diagonal blocks.
The factored form of A is then used to solve the system of equations A*X = B.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored:

If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides; the number of columns in B; nrhs ≥ 0.

a, b Arrays: a (size max(1, lda*n)), b of size max(1, ldb*nrhs) for column
major layout and max(1, ldb*n) for row major layout.

The array a contains the upper or the lower triangular part of the
symmetric matrix A (see uplo).
The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.

lda The leading dimension of a; lda≥ max(1, n).

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

Output Parameters

a If info = 0, a is overwritten by the block-diagonal matrix D and the
multipliers used to obtain the factor U (or L) from the factorization of
A as computed by ?sytrf.

b If info = 0, b is overwritten by the solution matrix X.

ipiv Array, size at least max(1, n). Contains details of the interchanges
and the block structure of D, as determined by ?sytrf.

If ipiv[i-1] = k > 0, then D(i,i) is a 1-by-1 diagonal block, and the i-th row and
column of A was interchanged with the k-th row and column.
If uplo = 'U' and ipiv[i] = ipiv[i-1] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and the i-th row and column of A was
interchanged with the m-th row and column.
If uplo = 'L' and ipiv[i] = ipiv[i-1] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and the (i+1)-th row and column of A
was interchanged with the m-th row and column.
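The rules above can be turned into a small routine that recovers the block structure of D from ipiv. This sketch assumes that lapack_int is a 32-bit int (the LP64 interface) and that ipiv is read in the order in which the factorization produces it: upward from the last row for uplo = 'U', downward from the first row for uplo = 'L'. The function name is hypothetical.

```c
/* Recover the block structure of D from ipiv as returned by ?sysv (1-based
   pivot values). On return block[i] is 1 if row i (0-based) is a 1-by-1
   block, 2 if rows i and i+1 form a 2-by-2 block, and 0 for the second row
   of a 2-by-2 block. Returns the number of 2-by-2 blocks, or -1 if ipiv
   does not follow the rules for 1-by-1 and 2-by-2 blocks. */
int describe_d_blocks(char uplo, int n, const int *ipiv, int *block) {
    int count = 0;
    if (uplo == 'L' || uplo == 'l') {
        /* uplo = 'L': blocks are read from the first row downward */
        int i = 0;
        while (i < n) {
            if (ipiv[i] > 0) {
                block[i] = 1;
                i += 1;
            } else if (ipiv[i] < 0 && i + 1 < n && ipiv[i + 1] == ipiv[i]) {
                block[i] = 2;
                block[i + 1] = 0;
                ++count;
                i += 2;
            } else {
                return -1;
            }
        }
    } else {
        /* uplo = 'U': blocks are read from the last row upward */
        int i = n - 1;
        while (i >= 0) {
            if (ipiv[i] > 0) {
                block[i] = 1;
                i -= 1;
            } else if (ipiv[i] < 0 && i > 0 && ipiv[i - 1] == ipiv[i]) {
                block[i - 1] = 2;
                block[i] = 0;
                ++count;
                i -= 2;
            } else {
                return -1;
            }
        }
    }
    return count;
}
```

For uplo = 'L' and ipiv = {1, -3, -3}, the first row is a 1-by-1 block and rows 2 and 3 (1-based) form a 2-by-2 block.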

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, D(i,i) is exactly zero. The factorization has been completed, but D is exactly singular, so the
solution could not be computed.

See Also
Matrix Storage Schemes

?sysv_aa
Computes the solution to a system of linear equations
A * X = B for symmetric matrices.

Syntax
lapack_int LAPACKE_ssysv_aa (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, float * A, lapack_int lda, lapack_int * ipiv, float * B, lapack_int ldb);
lapack_int LAPACKE_dsysv_aa (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, double * A, lapack_int lda, lapack_int * ipiv, double * B, lapack_int ldb);
lapack_int LAPACKE_csysv_aa (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, lapack_complex_float * A, lapack_int lda, lapack_int * ipiv, lapack_complex_float
* B, lapack_int ldb);
lapack_int LAPACKE_zsysv_aa (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, lapack_complex_double * A, lapack_int lda, lapack_int * ipiv,
lapack_complex_double * B, lapack_int ldb);

Description
The ?sysv_aa routine computes the solution to a real or complex system of linear equations A * X = B, where
A is an n-by-n symmetric matrix and X and B are n-by-nrhs matrices.
Aasen's algorithm is used to factor A as A = U * T * U^T, if uplo = 'U', or A = L * T * L^T, if uplo = 'L',
where U (or L) is a product of permutation and unit upper (lower) triangular matrices, and T is symmetric
tridiagonal. The factored form of A is then used to solve the system of equations A * X = B.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

uplo • = 'U': The upper triangle of A is stored.

• = 'L': The lower triangle of A is stored.

n The number of linear equations; that is, the order of the matrix A. n ≥ 0.

nrhs The number of right-hand sides; that is, the number of columns of the
matrix B. nrhs ≥ 0.

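A Array of size max(1, lda*n). On entry, the symmetric matrix A. If uplo = 'U', the leading n-by-n
upper triangular part of A contains the upper triangular part of the matrix A, and the strictly lower
triangular part of A is not referenced. If uplo = 'L', the leading n-by-n lower triangular part of A
contains the lower triangular part of the matrix A, and the strictly upper triangular part of A is not
referenced.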

lda The leading dimension of the array A; lda ≥ max(1, n).

B Array of size max(1, ldb*nrhs) for column-major layout and max(1,
ldb*n) for row-major layout. On entry, the n-by-nrhs right-hand side
matrix B.

ldb The leading dimension of the array B. ldb ≥ max(1, n) for column-major
layout and ldb ≥ nrhs for row-major layout.

Output Parameters

A On exit, if info = 0, the tridiagonal matrix T and the multipliers used to
obtain the factor U or L from the factorization A = U*T*U^T or A = L*T*L^T as
computed by ?sytrf.

ipiv Array of size n. On exit, it contains the details of the interchanges; that is,
the row and column k (1-based) of A were interchanged with the row and column
ipiv[k-1].

B On exit, if info = 0, the n-by-nrhs solution matrix X.
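Because ipiv records a sequence of interchanges, applying the same swaps to the rows of a right-hand side in the order k = 1, ..., n and then in reverse order leaves it unchanged. The sketch below illustrates that reading of ipiv only; it does not describe how ?sysv_aa applies the permutation internally. It assumes that lapack_int is int and that all pivot values are positive and 1-based, and the function name is hypothetical.

```c
/* Apply the interchanges recorded in ipiv (1-based values) to a vector:
   for k = 1, ..., n swap entries k and ipiv[k-1]. With reverse != 0 the
   swaps are applied for k = n, ..., 1, which undoes the forward pass. */
void apply_interchanges(int n, const int *ipiv, double *v, int reverse) {
    for (int t = 0; t < n; ++t) {
        int k = reverse ? n - 1 - t : t;   /* 0-based row index */
        int p = ipiv[k] - 1;               /* 0-based partner row */
        if (p != k) {
            double tmp = v[k];
            v[k] = v[p];
            v[p] = tmp;
        }
    }
}
```

With ipiv = {2, 3, 3} the forward pass turns {10, 20, 30} into {20, 30, 10}, and the reverse pass restores the original order.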

Return Values
This function returns a value info.

= 0: Successful exit.
< 0: If info = -i, the i-th argument had an illegal value.

> 0: If info = i, D(i,i) is exactly zero. The factorization has been completed, but the block diagonal matrix D
is exactly singular, so the solution could not be computed.

?sysv_rook
Computes the solution to the system of linear
equations with a real or complex symmetric coefficient
matrix A and multiple right-hand sides.

Syntax
lapack_int LAPACKE_ssysv_rook (int matrix_layout , char uplo , lapack_int n ,
lapack_int nrhs , float * a , lapack_int lda , lapack_int * ipiv , float * b ,
lapack_int ldb );
lapack_int LAPACKE_dsysv_rook (int matrix_layout , char uplo , lapack_int n ,
lapack_int nrhs , double * a , lapack_int lda , lapack_int * ipiv , double * b ,
lapack_int ldb );
lapack_int LAPACKE_csysv_rook (int matrix_layout , char uplo , lapack_int n ,
lapack_int nrhs , lapack_complex_float * a , lapack_int lda , lapack_int * ipiv ,
lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_zsysv_rook (int matrix_layout , char uplo , lapack_int n ,
lapack_int nrhs , lapack_complex_double * a , lapack_int lda , lapack_int * ipiv ,
lapack_complex_double * b , lapack_int ldb );

Include Files
• mkl.h

Description

The routine solves for X the real or complex system of linear equations A*X = B, where A is an n-by-n
symmetric matrix, the columns of matrix B are individual right-hand sides, and the columns of X are the
corresponding solutions.
The diagonal pivoting method is used to factor A as A = U*D*U^T or A = L*D*L^T, where U (or L) is a product
of permutation and unit upper (lower) triangular matrices, and D is symmetric and block diagonal with 1-by-1
and 2-by-2 diagonal blocks.
The ?sytrf_rook routine is called to compute the factorization of a real or complex symmetric matrix A using
the bounded Bunch-Kaufman ("rook") diagonal pivoting method.
The factored form of A is then used to solve the system of equations A*X = B by calling ?sytrs_rook.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored:

If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides; the number of columns in B; nrhs ≥ 0.

a, b Arrays: a (size max(1, lda*n)), b (size max(1, ldb*nrhs) for column
major layout and max(1, ldb*n) for row major layout).

The array a contains the upper or the lower triangular part of the
symmetric matrix A (see uplo). The second dimension of a must be at
least max(1, n).

The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations. The second dimension of b must
be at least max(1,nrhs).

lda The leading dimension of a; lda≥ max(1, n).

ldb The leading dimension of b; ldb ≥ max(1, n) for column major layout
and ldb ≥ nrhs for row major layout.
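The size and leading-dimension rules for b can be expressed as a few helper functions. This is a minimal sketch, not part of the MKL API. The names min_ldb, min_b_size, and b_offset are hypothetical, plain long stands in for lapack_int, and the LAPACK_ROW_MAJOR/LAPACK_COL_MAJOR values (101 and 102, as in lapacke.h) are defined locally only when mkl.h is not included.

```c
#include <assert.h>

/* Layout codes; 101 and 102 are the values used by lapacke.h / mkl.h.
   Defined here only so that this sketch compiles on its own. */
#ifndef LAPACK_ROW_MAJOR
#define LAPACK_ROW_MAJOR 101
#endif
#ifndef LAPACK_COL_MAJOR
#define LAPACK_COL_MAJOR 102
#endif

/* Smallest legal ldb for an n-by-nrhs right-hand side matrix B. */
static long min_ldb(int layout, long n, long nrhs)
{
    assert(layout == LAPACK_ROW_MAJOR || layout == LAPACK_COL_MAJOR);
    if (layout == LAPACK_COL_MAJOR)
        return n > 1 ? n : 1;          /* ldb >= max(1, n) */
    return nrhs;                       /* ldb >= nrhs      */
}

/* Minimum number of elements the array b must hold. */
static long min_b_size(int layout, long n, long nrhs, long ldb)
{
    long size = (layout == LAPACK_COL_MAJOR) ? ldb * nrhs : ldb * n;
    return size > 1 ? size : 1;        /* max(1, ...) as documented */
}

/* Offset of element B(i,j) in b, with 0-based i and j. */
static long b_offset(int layout, long i, long j, long ldb)
{
    return (layout == LAPACK_COL_MAJOR) ? i + j * ldb   /* columns contiguous */
                                        : i * ldb + j;  /* rows contiguous    */
}
```

For example, with n = 5 and nrhs = 2, a column-major B needs ldb ≥ 5 and at least 10 elements. A row-major B needs only ldb ≥ 2, and with ldb = 2 it also needs 10 elements.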

Output Parameters

a If info = 0, a is overwritten by the block-diagonal matrix D and the
multipliers used to obtain the factor U (or L) from the factorization of
A, as computed by ?sytrf_rook.

b If info = 0, b is overwritten by the solution matrix X.

ipiv Array, size at least max(1, n). Contains details of the interchanges
and the block structure of D.
If ipiv[k - 1] > 0, then rows and columns k and ipiv[k - 1] were
interchanged and D(k,k) is a 1-by-1 diagonal block.
If uplo = 'U' and ipiv[k - 1] < 0 and ipiv[k - 2] < 0, then
rows and columns k and -ipiv[k - 1] were interchanged, rows and
columns k - 1 and -ipiv[k - 2] were interchanged, and D(k-1:k, k-1:k) is
a 2-by-2 diagonal block.
If uplo = 'L' and ipiv[k - 1] < 0 and ipiv[k] < 0, then rows
and columns k and -ipiv[k - 1] were interchanged, rows and columns
k + 1 and -ipiv[k] were interchanged, and D(k:k+1, k:k+1) is a 2-by-2
diagonal block.
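You can walk the rook ipiv encoding to recover the block structure of D. The sketch below is illustrative only. rook_blocks is a hypothetical helper that assumes ipiv holds the 1-based values returned by LAPACKE_?sysv_rook, and it uses int for lapack_int (the LP64 interface). For uplo = 'L' it reports blocks from the top of D, and for uplo = 'U' from the bottom, following the direction in which the conditions above look at neighboring entries.

```c
#include <assert.h>

/* Decode the rook-pivoting ipiv (1-based values) into the diagonal blocks
   of D. start[b] receives the 1-based first row of block b and size[b] its
   order (1 or 2). Returns the number of blocks found. */
static int rook_blocks(char uplo, int n, const int *ipiv, int *start, int *size)
{
    int nb = 0;
    if (uplo == 'L' || uplo == 'l') {
        int k = 1;                          /* scan from the top down */
        while (k <= n) {
            if (ipiv[k - 1] > 0) {          /* 1-by-1 block D(k,k) */
                start[nb] = k; size[nb++] = 1; k += 1;
            } else {                        /* ipiv[k-1] < 0 and ipiv[k] < 0 */
                assert(k < n && ipiv[k] < 0);
                start[nb] = k; size[nb++] = 2; k += 2;
            }
        }
    } else {
        int k = n;                          /* uplo = 'U': scan bottom up */
        while (k >= 1) {
            if (ipiv[k - 1] > 0) {          /* 1-by-1 block D(k,k) */
                start[nb] = k; size[nb++] = 1; k -= 1;
            } else {                        /* ipiv[k-1] < 0 and ipiv[k-2] < 0 */
                assert(k > 1 && ipiv[k - 2] < 0);
                start[nb] = k - 1; size[nb++] = 2; k -= 2;
            }
        }
    }
    return nb;
}
```

For example, with uplo = 'L' and ipiv = {1, -3, -4, 4}, D consists of a 1-by-1 block at row 1, a 2-by-2 block at rows 2 and 3, and a 1-by-1 block at row 4.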

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.


If info = i, D(i,i) is exactly zero. The factorization has been completed, but D is exactly singular, so the
solution could not be computed.

See Also
Matrix Storage Schemes

?sysv_rk
Computes the solution to a system of linear equations
A*X = B for symmetric (SY) matrices.

Syntax
lapack_int LAPACKE_ssysv_rk (int matrix_layout, char uplo, lapack_int n,
lapack_int nrhs, float * A, lapack_int lda, float * e, lapack_int * ipiv,
float * B, lapack_int ldb);
lapack_int LAPACKE_dsysv_rk (int matrix_layout, char uplo, lapack_int n,
lapack_int nrhs, double * A, lapack_int lda, double * e, lapack_int * ipiv,
double * B, lapack_int ldb);
lapack_int LAPACKE_csysv_rk (int matrix_layout, char uplo, lapack_int n,
lapack_int nrhs, lapack_complex_float * A, lapack_int lda,
lapack_complex_float * e, lapack_int * ipiv, lapack_complex_float * B,
lapack_int ldb);
lapack_int LAPACKE_zsysv_rk (int matrix_layout, char uplo, lapack_int n,
lapack_int nrhs, lapack_complex_double * A, lapack_int lda,
lapack_complex_double * e, lapack_int * ipiv, lapack_complex_double * B,
lapack_int ldb);

Description
?sysv_rk computes the solution to a real or complex system of linear equations A*X = B, where A is an
n-by-n symmetric matrix and X and B are n-by-nrhs matrices.

The bounded Bunch-Kaufman (rook) diagonal pivoting method is used to factor A as A = P*U*D*(U^T)*(P^T) if
uplo = 'U', or A = P*L*D*(L^T)*(P^T) if uplo = 'L', where U (or L) is a unit upper (or lower) triangular matrix,
U^T (or L^T) is the transpose of U (or L), P is a permutation matrix, P^T is the transpose of P, and D is
symmetric and block diagonal with 1-by-1 and 2-by-2 diagonal blocks.
?sytrf_rk is called to compute the factorization of a real or complex symmetric matrix. The factored form of
A is then used to solve the system of equations A*X = B by calling the BLAS3 routine ?sytrs_3.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

uplo Specifies whether the upper or lower triangular part of the symmetric
matrix A is stored:

• = 'U': The upper triangle of A is stored.

• = 'L': The lower triangle of A is stored.

n The number of linear equations; that is, the order of the matrix A. n ≥ 0.

nrhs The number of right-hand sides; that is, the number of columns of the
matrix B. nrhs ≥ 0.

A Array of size max(1, lda*n). On entry, the symmetric matrix A. If uplo =
'U', the leading n-by-n upper triangular part of A contains the upper
triangular part of the matrix A, and the strictly lower triangular part of A is
not referenced. If uplo = 'L', the leading n-by-n lower triangular part of A
contains the lower triangular part of the matrix A, and the strictly upper
triangular part of A is not referenced.

lda The leading dimension of the array A; lda ≥ max(1, n).

B Array of size max(1, ldb*nrhs) for column-major layout and max(1,
ldb*n) for row-major layout. On entry, the n-by-nrhs right-hand side
matrix B.

ldb The leading dimension of the array B. ldb ≥ max(1, n) for column-major
layout and ldb ≥ nrhs for row-major layout.

Output Parameters

A On exit, if info = 0, the diagonal of the block diagonal matrix D and factors
U or L as computed by ?sytrf_rk:

• Only diagonal elements of the symmetric block diagonal matrix D on the
diagonal of A; that is, D(k,k) = A(k,k). Superdiagonal (or subdiagonal)
elements of D are stored on exit in array e.
• If uplo = 'U', factor U in the superdiagonal part of A. If uplo = 'L',
factor L in the subdiagonal part of A. For more information, see the
description of the ?sytrf_rk routine.

e Array of size n. On exit, contains the output computed by the factorization
routine ?sytrf_rk; that is, the superdiagonal (or subdiagonal) elements of
the symmetric block diagonal matrix D with 1-by-1 or 2-by-2 diagonal
blocks. If uplo = 'U', e(i) = D(i-1,i), i=2:n, and e(1) is set to 0. If uplo
= 'L', e(i) = D(i+1,i), i=1:n-1, and e(n) is set to 0. In C, e(i) is the
element e[i-1].

NOTE For 1-by-1 diagonal block D(k), where 1 ≤ k ≤ n, the
element e(k) is set to 0 in both the uplo = 'U' and uplo = 'L'
cases. For more information, see the description of
the ?sytrf_rk routine.

ipiv Array of size n. Details of the interchanges and the block structure of D, as
determined by ?sytrf_rk. For more information, see the description of
the ?sytrf_rk routine.

B On exit, if info = 0, the n-by-nrhs solution matrix X.
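?sysv_rk splits D between the diagonal of A and the array e, so a caller who needs D explicitly has to reassemble it. The following sketch makes several assumptions: real double precision, column-major storage, and a hypothetical helper rk_build_d that writes D as a dense n-by-n matrix with leading dimension n. It follows the e(i) definitions given for the e parameter, using 0-based C indexing.

```c
#include <assert.h>
#include <string.h>

/* Rebuild the symmetric block diagonal D from the ?sysv_rk output:
   the diagonal of a (column-major, leading dimension lda) and the array e.
   d receives D as a dense column-major n-by-n matrix (leading dimension n). */
static void rk_build_d(char uplo, int n, const double *a, int lda,
                       const double *e, double *d)
{
    assert(lda >= (n > 1 ? n : 1));
    memset(d, 0, sizeof(double) * (size_t)n * (size_t)n);
    for (int i = 0; i < n; ++i)
        d[i + i * n] = a[i + i * lda];               /* D(k,k) = A(k,k) */
    if (uplo == 'L' || uplo == 'l') {
        for (int i = 0; i + 1 < n; ++i)              /* e(i) = D(i+1,i), i = 1..n-1 */
            d[(i + 1) + i * n] = d[i + (i + 1) * n] = e[i];
    } else {
        for (int i = 1; i < n; ++i)                  /* e(i) = D(i-1,i), i = 2..n */
            d[(i - 1) + i * n] = d[i + (i - 1) * n] = e[i];
    }
}
```

Entries of e that belong to 1-by-1 blocks are zero, so copying them simply writes zeros, which leaves the dense D correct.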

Return Values
This function returns a value info.

= 0: Successful exit.
< 0: If info = -k, the kth argument had an illegal value.

> 0: If info = k, the matrix A is singular. If uplo = 'U', column k in the upper triangular part of A contains
all zeros. If uplo = 'L', column k in the lower triangular part of A contains all zeros. Therefore D(k,k) is
exactly zero, and superdiagonal elements of column k of U (or subdiagonal elements of column k of L) are all
zeros. The factorization has been completed, but the block diagonal matrix D is exactly singular, and division
by zero will occur if it is used to solve a system of equations.


?sysvx
Uses the diagonal pivoting factorization to compute
the solution to the system of linear equations with a
real or complex symmetric coefficient matrix A, and
provides error bounds on the solution.

Syntax
lapack_int LAPACKE_ssysvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, const float* a, lapack_int lda, float* af, lapack_int ldaf,
lapack_int* ipiv, const float* b, lapack_int ldb, float* x, lapack_int ldx,
float* rcond, float* ferr, float* berr );
lapack_int LAPACKE_dsysvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, const double* a, lapack_int lda, double* af, lapack_int ldaf,
lapack_int* ipiv, const double* b, lapack_int ldb, double* x, lapack_int ldx,
double* rcond, double* ferr, double* berr );
lapack_int LAPACKE_csysvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, const lapack_complex_float* a, lapack_int lda,
lapack_complex_float* af, lapack_int ldaf, lapack_int* ipiv,
const lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x,
lapack_int ldx, float* rcond, float* ferr, float* berr );
lapack_int LAPACKE_zsysvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, const lapack_complex_double* a, lapack_int lda,
lapack_complex_double* af, lapack_int ldaf, lapack_int* ipiv,
const lapack_complex_double* b, lapack_int ldb, lapack_complex_double* x,
lapack_int ldx, double* rcond, double* ferr, double* berr );

Include Files
• mkl.h

Description

The routine uses the diagonal pivoting factorization to compute the solution to a real or complex system of
linear equations A*X = B, where A is an n-by-n symmetric matrix, the columns of matrix B are individual
right-hand sides, and the columns of X are the corresponding solutions.
Error bounds on the solution and a condition estimate are also provided.
The routine ?sysvx performs the following steps:

1. If fact = 'N', the diagonal pivoting method is used to factor the matrix A. The form of the
factorization is A = U*D*U^T or A = L*D*L^T, where U (or L) is a product of permutation and unit upper
(lower) triangular matrices, and D is symmetric and block diagonal with 1-by-1 and 2-by-2 diagonal
blocks.
2. If some D(i,i) = 0, so that D is exactly singular, then the routine returns with info = i. Otherwise, the
factored form of A is used to estimate the condition number of the matrix A. If the reciprocal of the
condition number is less than machine precision, info = n+1 is returned as a warning, but the routine
still goes on to solve for X and compute error bounds as described below.
3. The system of equations is solved for X using the factored form of A.
4. Iterative refinement is applied to improve the computed solution matrix and calculate error bounds and
backward error estimates for it.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

fact Must be 'F' or 'N'.

Specifies whether or not the factored form of the matrix A has been
supplied on entry.
If fact = 'F': on entry, af and ipiv contain the factored form of A.
Arrays a, af, and ipiv will not be modified.
If fact = 'N', the matrix A will be copied to af and factored.

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored:

If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides, the number of columns in B; nrhs ≥ 0.

a, af, b Arrays: a (size max(1, lda*n)), af (size max(1, ldaf*n)), b (size
max(1, ldb*nrhs) for column major layout and max(1, ldb*n) for
row major layout).
The array a contains the upper or the lower triangular part of the
symmetric matrix A (see uplo).
The array af is an input argument if fact = 'F'. It contains the block
diagonal matrix D and the multipliers used to obtain the factor U or L
from the factorization A = U*D*U^T or A = L*D*L^T as computed
by ?sytrf.

The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.

lda The leading dimension of a; lda≥ max(1, n).

ldaf The leading dimension of af; ldaf≥ max(1, n).

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ipiv Array, size at least max(1, n). The array ipiv is an input argument if
fact = 'F'. It contains details of the interchanges and the block
structure of D, as determined by ?sytrf.

If ipiv[i-1] = k > 0, then D(i,i) is a 1-by-1 diagonal block, and the
i-th row and column of A was interchanged with the k-th row and
column.
If uplo = 'U' and ipiv[i] = ipiv[i-1] = -m < 0, then D has a
2-by-2 block in rows/columns i and i+1, and the i-th row and column of
A was interchanged with the m-th row and column.
If uplo = 'L' and ipiv[i] = ipiv[i-1] = -m < 0, then D has a
2-by-2 block in rows/columns i and i+1, and the (i+1)-th row and column
of A was interchanged with the m-th row and column.


ldx The leading dimension of the output array x; ldx≥ max(1, n) for
column major layout and ldx≥nrhs for row major layout.
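The ?sytrf-style ipiv convention used by ?sysvx differs from the rook convention. Both entries of a 2-by-2 block carry the same negative value -m, and only one row of the block is interchanged with row m. The following hedged sketch shows how you might decode the array. The struct bk_block and the function bk_blocks are hypothetical, and int stands in for lapack_int. In each decoded block, row is the row of A that was interchanged with row partner.

```c
#include <assert.h>

/* One diagonal block of D; all indices are 1-based, as stored in ipiv. */
typedef struct { int first; int size; int row; int partner; } bk_block;

/* Decode Bunch-Kaufman ipiv into blocks; returns the number of blocks. */
static int bk_blocks(char uplo, int n, const int *ipiv, bk_block *out)
{
    int nb = 0;
    if (uplo == 'U' || uplo == 'u') {
        for (int k = n; k >= 1; ) {                  /* scan bottom up */
            if (ipiv[k - 1] > 0) {                   /* 1-by-1 block D(k,k) */
                out[nb++] = (bk_block){k, 1, k, ipiv[k - 1]};
                k -= 1;
            } else {                                 /* ipiv[k-2] = ipiv[k-1] = -m */
                assert(k > 1 && ipiv[k - 2] == ipiv[k - 1]);
                out[nb++] = (bk_block){k - 1, 2, k - 1, -ipiv[k - 1]};
                k -= 2;
            }
        }
    } else {
        for (int k = 1; k <= n; ) {                  /* scan top down */
            if (ipiv[k - 1] > 0) {                   /* 1-by-1 block D(k,k) */
                out[nb++] = (bk_block){k, 1, k, ipiv[k - 1]};
                k += 1;
            } else {                                 /* ipiv[k-1] = ipiv[k] = -m */
                assert(k < n && ipiv[k] == ipiv[k - 1]);
                out[nb++] = (bk_block){k, 2, k + 1, -ipiv[k - 1]};
                k += 2;
            }
        }
    }
    return nb;
}
```

For uplo = 'U' the first row of a 2-by-2 block is the one interchanged. For uplo = 'L' it is the second row, matching the (i+1)-th row rule above.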

Output Parameters

x Array, size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout.
If info = 0 or info = n+1, the array x contains the solution matrix
X to the system of equations.

af, ipiv These arrays are output arguments if fact = 'N'.
See the description of af and ipiv in the Input Parameters section.

rcond An estimate of the reciprocal condition number of the matrix A. If
rcond is less than the machine precision (in particular, if rcond = 0),
the matrix is singular to working precision. This condition is indicated
by a return code of info > 0.

ferr Array, size at least max(1, nrhs). Contains the estimated forward
error bound for each solution vector xj (the j-th column of the solution
matrix X). If xtrue is the true solution corresponding to xj, ferr[j-1]
is an estimated upper bound for the magnitude of the largest element
in (xj - xtrue) divided by the magnitude of the largest element in xj.
The estimate is as reliable as the estimate for rcond, and is almost
always a slight overestimate of the true error.

berr Array, size at least max(1, nrhs). Contains the component-wise
relative backward error for each solution vector xj, that is, the
smallest relative change in any element of A or B that makes xj an
exact solution.
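The berr definition above corresponds to the componentwise backward error max_i |b - A*x|_i / (|A|*|x| + |b|)_i. A minimal sketch for one real right-hand side follows. It assumes A is available as a full column-major matrix (both triangles filled), and the function name backward_error is hypothetical. The sketch is meant for independently checking results, not as a replacement for the berr values that ?sysvx returns. It simply skips rows whose denominator is zero.

```c
#include <assert.h>
#include <math.h>

/* Componentwise relative backward error of x for A*x = b:
   max_i |b - A*x|_i / (|A|*|x| + |b|)_i.
   A is a full n-by-n column-major matrix with leading dimension lda. */
static double backward_error(int n, const double *a, int lda,
                             const double *x, const double *b)
{
    assert(lda >= (n > 1 ? n : 1));
    double berr = 0.0;
    for (int i = 0; i < n; ++i) {
        double r = b[i], denom = fabs(b[i]);
        for (int j = 0; j < n; ++j) {
            r     -= a[i + j * lda] * x[j];              /* residual row i   */
            denom += fabs(a[i + j * lda]) * fabs(x[j]);  /* (|A||x| + |b|)_i */
        }
        if (denom > 0.0 && fabs(r) / denom > berr)
            berr = fabs(r) / denom;
    }
    return berr;
}
```

An exact solution gives 0. For A = [[2, 1], [1, 3]], b = (3, 4), and the inexact x = (1, 1.5), the second row dominates with 1.5/9.5.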

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, and i ≤ n, then D(i,i) is exactly zero. The factorization has been completed, but the block diagonal
matrix D is exactly singular, so the solution and error bounds could not be computed; rcond = 0 is returned.
If info = i, and i = n + 1, then D is nonsingular, but rcond is less than machine precision, meaning that the
matrix is singular to working precision. Nevertheless, the solution and error bounds are computed because
there are a number of situations where the computed solution can be more accurate than the value of rcond
would suggest.
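A positive info from ?sysvx can mean either failure (no solution) or a warning (solution computed), so callers usually branch on its range. The sketch below encodes the return-value rules above in a hypothetical helper, classify_sysvx_info, which is not part of MKL. Only info == 0 and info == n + 1 leave a usable solution in x.

```c
#include <assert.h>

/* Hypothetical classification of the info value returned by LAPACKE_?sysvx. */
typedef enum {
    SYSVX_OK,        /* info == 0: solution and error bounds computed       */
    SYSVX_BAD_ARG,   /* info < 0: parameter -info had an illegal value      */
    SYSVX_SINGULAR,  /* 1 <= info <= n: D(info,info) == 0, no solution      */
    SYSVX_ILL_COND   /* info == n+1: solution computed, rcond below epsilon */
} sysvx_status;

static sysvx_status classify_sysvx_info(long info, long n)
{
    if (info == 0) return SYSVX_OK;
    if (info < 0)  return SYSVX_BAD_ARG;
    if (info <= n) return SYSVX_SINGULAR;
    assert(info == n + 1);             /* the only documented value above n */
    return SYSVX_ILL_COND;
}
```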

See Also
Matrix Storage Schemes

?sysvxx
Uses extra precise iterative refinement to compute the
solution to the system of linear equations with a
symmetric indefinite coefficient matrix A applying the
diagonal pivoting factorization.

Syntax
lapack_int LAPACKE_ssysvxx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, float* a, lapack_int lda, float* af, lapack_int ldaf, lapack_int* ipiv,
char* equed, float* s, float* b, lapack_int ldb, float* x, lapack_int ldx, float* rcond,
float* rpvgrw, float* berr, lapack_int n_err_bnds, float* err_bnds_norm,
float* err_bnds_comp, lapack_int nparams, const float* params );
lapack_int LAPACKE_dsysvxx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, double* a, lapack_int lda, double* af, lapack_int ldaf,
lapack_int* ipiv, char* equed, double* s, double* b, lapack_int ldb, double* x,
lapack_int ldx, double* rcond, double* rpvgrw, double* berr, lapack_int n_err_bnds,
double* err_bnds_norm, double* err_bnds_comp, lapack_int nparams,
const double* params );
lapack_int LAPACKE_csysvxx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, lapack_complex_float* a, lapack_int lda, lapack_complex_float* af,
lapack_int ldaf, lapack_int* ipiv, char* equed, float* s, lapack_complex_float* b,
lapack_int ldb, lapack_complex_float* x, lapack_int ldx, float* rcond, float* rpvgrw,
float* berr, lapack_int n_err_bnds, float* err_bnds_norm, float* err_bnds_comp,
lapack_int nparams, const float* params );
lapack_int LAPACKE_zsysvxx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, lapack_complex_double* a, lapack_int lda, lapack_complex_double* af,
lapack_int ldaf, lapack_int* ipiv, char* equed, double* s, lapack_complex_double* b,
lapack_int ldb, lapack_complex_double* x, lapack_int ldx, double* rcond,
double* rpvgrw, double* berr, lapack_int n_err_bnds, double* err_bnds_norm,
double* err_bnds_comp, lapack_int nparams, const double* params );

Include Files
• mkl.h

Description

The routine uses the diagonal pivoting factorization to compute the solution to a real or complex system of
linear equations A*X = B, where A is an n-by-n symmetric matrix, the columns of matrix B
are individual right-hand sides, and the columns of X are the corresponding solutions.
Both normwise and maximum componentwise error bounds are also provided on request. The routine returns
a solution with a small guaranteed error (O(eps), where eps is the working machine precision) unless the
matrix is very ill-conditioned, in which case a warning is returned. Relevant condition numbers are also
calculated and returned.
The routine accepts user-provided factorizations and equilibration factors; see the definitions of the fact and
equed options. Solving with refinement and using a factorization from a previous call of the routine also
produces a solution with O(eps) errors or warnings. This may not hold for general user-provided
factorizations and equilibration factors if they differ from what the routine itself would produce.
The routine ?sysvxx performs the following steps:

1. If fact = 'E', scaling factors are computed to equilibrate the system:

diag(s)*A*diag(s)*inv(diag(s))*X = diag(s)*B

Whether or not the system will be equilibrated depends on the scaling of the matrix A, but if
equilibration is used, A is overwritten by diag(s)*A*diag(s) and B by diag(s)*B.
2. If fact = 'N' or 'E', the diagonal pivoting method is used to factor the matrix A (after equilibration if
fact = 'E') as

A = U*D*U^T, if uplo = 'U',

or A = L*D*L^T, if uplo = 'L',

where U or L is a product of permutation and unit upper (lower) triangular matrices, and D is
symmetric and block diagonal with 1-by-1 and 2-by-2 diagonal blocks.
3. If some D(i,i) = 0, so that D is exactly singular, the routine returns with info = i. Otherwise, the
factored form of A is used to estimate the condition number of the matrix A (see the rcond parameter).
If the reciprocal of the condition number is less than machine precision, the routine still goes on to
solve for X and compute error bounds.
4. The system of equations is solved for X using the factored form of A.
5. By default, unless params[0] is set to zero, the routine applies iterative refinement to get a small error
and error bounds. Refinement calculates the residual to at least twice the working precision.
6. If equilibration was used, the matrix X is premultiplied by diag(s) so that it solves the original system
before equilibration.
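Steps 1 and 6 amount to scaling with diag(s) and undoing that scaling afterward. As the description of the s parameter notes, choosing each s[i] as a power of the radix keeps the scaling free of rounding error. The sketch below uses a simple hypothetical rule: a power of two close to 1/sqrt(|A(i,i)|). It is not claimed to match how ?sysvxx computes s, and the function names are illustrative. A is a full column-major double matrix.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical rule: s[i] = 2^(-(e/2)) where |A(i,i)| = m * 2^e, 0.5 <= m < 1.
   For nonzero A(i,i), s[i]*s[i]*|A(i,i)| then lies in [0.25, 2). */
static void pow2_scale(int n, const double *a, int lda, double *s)
{
    assert(lda >= (n > 1 ? n : 1));
    for (int i = 0; i < n; ++i) {
        double aii = fabs(a[i + i * lda]);
        int e = 0;
        if (aii == 0.0) {            /* no information: leave the row unscaled */
            s[i] = 1.0;
            continue;
        }
        frexp(aii, &e);              /* only the binary exponent e is needed */
        s[i] = ldexp(1.0, -(e / 2)); /* exact power of two */
    }
}

/* Step 1: A := diag(s)*A*diag(s), in place.
   Multiplying by powers of two is exact barring underflow or overflow. */
static void apply_scale(int n, double *a, int lda, const double *s)
{
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i)
            a[i + j * lda] *= s[i] * s[j];
}

/* Step 6: X = diag(s) * Xhat for one solution column of length n. */
static void unscale_solution(int n, const double *s, double *x)
{
    for (int i = 0; i < n; ++i)
        x[i] *= s[i];
}
```

Step 1 scales the right-hand side the same way, b[i] *= s[i], so unscale_solution can also serve for that.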

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

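fact Must be 'F', 'N', or 'E'.

Specifies whether or not the factored form of the matrix A is supplied
on entry, and if not, whether the matrix A should be equilibrated before
it is factored.

If fact = 'F', on entry, af and ipiv contain the factored form of A. If
equed is not 'N', the matrix A has been equilibrated with scaling factors
given by s. Arrays a, af, and ipiv are not modified.

If fact = 'N', the matrix A will be copied to af and factored.

If fact = 'E', the matrix A will be equilibrated, if necessary, copied
to af and factored.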

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored:

If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

n The number of linear equations; the order of the matrix A; n≥ 0.

nrhs The number of right-hand sides; the number of columns of the
matrices B and X; nrhs ≥ 0.

a, af, b Arrays: a (size max(1, lda*n)), af (size max(1, ldaf*n)), b (size
max(1, ldb*nrhs) for column major layout and max(1, ldb*n) for row
major layout).
The array a contains the symmetric matrix A as specified by uplo. If
uplo = 'U', the leading n-by-n upper triangular part of a contains the
upper triangular part of the matrix A and the strictly lower triangular
part of a is not referenced. If uplo = 'L', the leading n-by-n lower
triangular part of a contains the lower triangular part of the matrix A
and the strictly upper triangular part of a is not referenced.

The array af is an input argument if fact = 'F'. It contains the
block diagonal matrix D and the multipliers used to obtain the factor U
or L from the factorization A = U*D*U^T or A = L*D*L^T as computed
by ?sytrf.

The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.

lda The leading dimension of the array a; lda≥ max(1,n).

ldaf The leading dimension of the array af; ldaf≥ max(1,n).

ipiv Array, size at least max(1, n). The array ipiv is an input argument if
fact = 'F'. It contains details of the interchanges and the block
structure of D as determined by ?sytrf. If ipiv[k-1] > 0, rows and
columns k and ipiv[k-1] were interchanged and D(k,k) is a 1-by-1
diagonal block.
If uplo = 'U' and ipiv[i] = ipiv[i - 1] = -m < 0, D has a 2-by-2
diagonal block in rows and columns i and i + 1, and the i-th row
and column of A were interchanged with the m-th row and column.
If uplo = 'L' and ipiv[i] = ipiv[i - 1] = -m < 0, D has a 2-by-2
diagonal block in rows and columns i and i + 1, and the (i + 1)-st
row and column of A were interchanged with the m-th row and
column.

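equed Must be 'N' or 'Y'.

equed is an input argument if fact = 'F'. It specifies the form of
equilibration that was done:
If equed = 'N', no equilibration was done (always true if fact = 'N').
If equed = 'Y', equilibration was done; that is, A has been replaced by
diag(s)*A*diag(s).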

s Array, size (n). The array s contains the scale factors for A. If equed
= 'Y', A is multiplied on the left and right by diag(s).
This array is an input argument if fact = 'F' only; otherwise it is an
output argument.
If fact = 'F' and equed = 'Y', each element of s must be positive.

Each element of s should be a power of the radix to ensure a reliable
solution and error estimates. Scaling by powers of the radix does not
cause rounding errors unless the result underflows or overflows.
Rounding errors during scaling lead to refining with a matrix that is
not equivalent to the input matrix, producing error estimates that may
not be reliable.

ldb The leading dimension of the array b; ldb≥ max(1, n) for column
major layout and ldb≥nrhs for row major layout.

ldx The leading dimension of the output array x; ldx≥ max(1, n) for
column major layout and ldx≥nrhs for row major layout.


n_err_bnds Number of error bounds to return for each right-hand side and each
type (normwise or componentwise). See the err_bnds_norm and
err_bnds_comp descriptions in the Output Parameters section below.

nparams Specifies the number of parameters set in params. If ≤ 0, the params
array is never referenced and default values are used.

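params Array, size max(1,nparams). Specifies algorithm parameters. If an
entry is less than 0.0, that entry is filled with the default value used for
that parameter. Only positions up to nparams are accessed; defaults are
used for higher-numbered parameters.

params[0] : Whether to perform iterative refinement or not. Default:
1.0 (for single precision flavors), 1.0D+0 (for double precision
flavors).

=0.0 No refinement is performed and no error
bounds are computed.

=1.0 Use the extra-precise refinement algorithm.

(Other values are reserved for future use.)

params[1] : Maximum number of residual computations allowed for
refinement.

Default 10.0

Aggressive Set to 100.0 to permit convergence using
approximate factorizations or factorizations other than LU. If
the factorization uses a technique other than Gaussian
elimination, the guarantees in err_bnds_norm and err_bnds_comp
may no longer be trustworthy.

params[2] : Flag determining if the code will attempt to find a
solution with a small componentwise relative error in the
double-precision algorithm. Positive is true, 0.0 is false. Default: 1.0
(attempt componentwise convergence).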
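For reference, the following hypothetical helper fills a params array with the defaults listed above. The enumerator names are not part of MKL. Passing nparams ≤ 0, or setting entries to negative values, would let the routine apply the same defaults itself.

```c
#include <assert.h>

/* Positions in params as documented above; the names are hypothetical. */
enum { PARAM_REFINE = 0, PARAM_MAX_ITER = 1, PARAM_COMPONENTWISE = 2, NPARAMS = 3 };

/* Fill params with the documented defaults; aggressive != 0 raises the
   residual-computation limit to 100.0 as described for params[1]. */
static void sysvxx_default_params(double params[NPARAMS], int aggressive)
{
    params[PARAM_REFINE]        = 1.0;                        /* extra-precise refinement */
    params[PARAM_MAX_ITER]      = aggressive ? 100.0 : 10.0;  /* max residual computations */
    params[PARAM_COMPONENTWISE] = 1.0;                        /* try componentwise convergence */
}
```

The filled array would then be passed as params together with nparams = NPARAMS.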

Output Parameters

x Array, size max(1, ldx*nrhs) for column major layout and max(1, ldx*n)
for row major layout.
If info = 0, the array x contains the solution n-by-nrhs matrix X to the
original system of equations. Note that A and B are modified on exit if
equed ≠ 'N', and the solution to the equilibrated system is
inv(diag(s))*X.

a If fact = 'E' and equed = 'Y', overwritten by diag(s)*A*diag(s).

af If fact = 'N', af is an output argument and on exit returns the block
diagonal matrix D and the multipliers used to obtain the factor U or L from
the factorization A = U*D*U^T or A = L*D*L^T.

b If equed = 'N', B is not modified.

If equed = 'Y', B is overwritten by diag(s)*B.

s This array is an output argument if fact ≠ 'F'. Each element of this array is
a power of the radix. See the description of s in the Input Parameters section.

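rcond Reciprocal scaled condition number. An estimate of the reciprocal Skeel
condition number of the matrix A after equilibration (if done). If this is less
than the machine precision (in particular, if it is zero), the matrix is singular
to working precision. Note that the error may still be small even if this number
is very small and the matrix appears ill-conditioned.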

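rpvgrw Contains the reciprocal pivot growth factor.
If this is much less than 1, the stability of the LU factorization of the
(equilibrated) matrix A could be poor. This also means that the solution X,
estimated condition numbers, and error bounds could be unreliable.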

err_bnds_norm Array of size nrhs*n_err_bnds. For each right-hand side, contains
information about various error bounds and condition numbers
corresponding to the normwise relative error, which is defined as follows:
Normwise relative error in the i-th solution vector:

max_j |xtrue(j,i) - x(j,i)| / max_j |x(j,i)|

The array is indexed by the type of error information as described below. Up
to three pieces of information are returned.

err=1 "Trust/don't trust" boolean. Trust the answer if
the reciprocal condition number is greater than the
threshold sqrt(n)*slamch('Epsilon') for single
precision flavors and sqrt(n)*dlamch('Epsilon') for
double precision flavors.

err=2 "Guaranteed" error bound. The estimated
forward error, almost certainly within a factor of
10 of the true error so long as the next entry is
greater than the threshold sqrt(n)*slamch('Epsilon')
for single precision flavors and
sqrt(n)*dlamch('Epsilon') for double precision
flavors. This error bound should only be trusted
if the previous boolean is true.

err=3 Reciprocal condition number. Estimated
normwise reciprocal condition number.
Compared with the threshold
sqrt(n)*slamch('Epsilon') for single precision flavors
and sqrt(n)*dlamch('Epsilon') for double precision
flavors to determine if the error estimate is
"guaranteed". These reciprocal condition
numbers are 1/(norm(Z^{-1},inf)*norm(Z,inf)) for
some appropriately scaled matrix Z.

Let Z = S*A, where S scales each row by a power
of the radix so all absolute row sums of Z are
approximately 1.

The information for right-hand side i, where 1 ≤ i ≤ nrhs, and type of error
err is stored in err_bnds_norm[(err-1)*nrhs + i - 1].

err_bnds_comp Array of size nrhs*n_err_bnds. For each right-hand side, contains
information about various error bounds and condition numbers
corresponding to the componentwise relative error, which is defined as
follows:
Componentwise relative error in the i-th solution vector:

max_j |xtrue(j,i) - x(j,i)| / |x(j,i)|

The array is indexed by the type of error information as described below.
There are currently up to three pieces of information returned for each
right-hand side. If componentwise accuracy is not requested (params[2] =
0.0), then err_bnds_comp is not accessed.

err=1 "Trust/don't trust" boolean. Trust the answer if
the reciprocal condition number is greater than the
threshold sqrt(n)*slamch('Epsilon') for single
precision flavors and sqrt(n)*dlamch('Epsilon') for
double precision flavors.

err=2 "Guaranteed" error bound. The estimated
forward error, almost certainly within a factor of
10 of the true error so long as the next entry is
greater than the threshold sqrt(n)*slamch('Epsilon')
for single precision flavors and
sqrt(n)*dlamch('Epsilon') for double precision
flavors. This error bound should only be trusted
if the previous boolean is true.

err=3 Reciprocal condition number. Estimated
componentwise reciprocal condition number.
Compared with the threshold
sqrt(n)*slamch('Epsilon') for single precision flavors
and sqrt(n)*dlamch('Epsilon') for double precision
flavors to determine if the error estimate is
"guaranteed". These reciprocal condition
numbers are 1/(norm(Z^{-1},inf)*norm(Z,inf)) for
some appropriately scaled matrix Z.

Let Z = S*(A*diag(x)), where x is the solution
for the current right-hand side and S scales each
row of A*diag(x) by a power of the radix so all
absolute row sums of Z are approximately 1.

The information for right-hand side i, where 1 ≤ i ≤ nrhs, and type of error
err is stored in err_bnds_comp[(err-1)*nrhs + i - 1].

ipiv If fact = 'N', ipiv is an output argument and on exit contains details of
the interchanges and the block structure of D, as determined by ?sytrf.

equed If fact ≠ 'F', then equed is an output argument. It specifies the form of
equilibration that was done (see the description of equed in the Input
Parameters section).

params If an entry is less than 0.0, that entry is filled with the default value used
for that parameter, otherwise the entry is not modified.

Return Values
This function returns a value info.

If info = 0, the execution is successful. The solution to every right-hand side is guaranteed.

If info = -i, parameter i had an illegal value.

If 0 < info ≤ n: D(info,info) is exactly zero. The factorization has been completed, but the block diagonal
matrix D is exactly singular, so the solution and error bounds could not be computed; rcond = 0 is returned.

See Also
Matrix Storage Schemes


?hesv
Computes the solution to the system of linear
equations with a Hermitian matrix A and multiple
right-hand sides.

Syntax
lapack_int LAPACKE_chesv (int matrix_layout, char uplo, lapack_int n,
lapack_int nrhs, lapack_complex_float * a, lapack_int lda, lapack_int * ipiv,
lapack_complex_float * b, lapack_int ldb);
lapack_int LAPACKE_zhesv (int matrix_layout, char uplo, lapack_int n,
lapack_int nrhs, lapack_complex_double * a, lapack_int lda, lapack_int * ipiv,
lapack_complex_double * b, lapack_int ldb);

Include Files
• mkl.h

Description

The routine solves for X the complex system of linear equations A*X = B, where A is an n-by-n Hermitian
matrix, the columns of matrix B are individual right-hand sides, and the columns of X are the corresponding
solutions.
The diagonal pivoting method is used to factor A as A = U*D*U^H or A = L*D*L^H, where U (or L) is a product
of permutation and unit upper (lower) triangular matrices, and D is Hermitian and block diagonal with 1-by-1
and 2-by-2 diagonal blocks.
The factored form of A is then used to solve the system of equations A*X = B.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored and
how A is factored:
If uplo = 'U', the array a stores the upper triangular part of the
matrix A, and A is factored as U*D*U^H.

If uplo = 'L', the array a stores the lower triangular part of the
matrix A, and A is factored as L*D*L^H.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides, the number of columns in B; nrhs ≥ 0.

a, b Arrays: a (size max(1, lda*n)), b (size max(1, ldb*nrhs) for
column major layout and max(1, ldb*n) for row major layout). The
array a contains the upper or the lower triangular part of the
Hermitian matrix A (see uplo).
The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.

lda The leading dimension of a; lda≥ max(1, n).

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.
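The sizing rules above depend on the layout. Because A is square, a always needs lda ≥ max(1, n) and lda*n elements, but for b the leading dimension runs along n in column major layout and along nrhs in row major layout. The following C sketch computes the smallest legal ldb, the matching length of b, and the offset of element B(i,j). The enum sketch_layout and the helper names are hypothetical stand-ins for the LAPACK_COL_MAJOR/LAPACK_ROW_MAJOR constants from mkl.h, and long is used in place of lapack_int.

```c
#include <assert.h>

/* Hypothetical layout tag for this sketch only; real calls pass
 * LAPACK_COL_MAJOR or LAPACK_ROW_MAJOR from mkl.h. */
typedef enum { SKETCH_COL_MAJOR, SKETCH_ROW_MAJOR } sketch_layout;

/* Smallest legal ldb for an n-by-nrhs right-hand side matrix B. */
static long min_ldb(sketch_layout layout, long n, long nrhs)
{
    assert(n >= 0 && nrhs >= 0);
    long need = (layout == SKETCH_COL_MAJOR) ? n : nrhs;
    return need > 1 ? need : 1;
}

/* Number of elements the b array must hold for a given ldb:
 * max(1, ldb*nrhs) column major, max(1, ldb*n) row major. */
static long b_elems(sketch_layout layout, long n, long nrhs, long ldb)
{
    long count = (layout == SKETCH_COL_MAJOR) ? nrhs : n;
    long total = ldb * count;
    return total > 1 ? total : 1;
}

/* 0-based offset of element B(i, j) inside b. */
static long b_index(sketch_layout layout, long i, long j, long ldb)
{
    return (layout == SKETCH_COL_MAJOR) ? i + j * ldb : i * ldb + j;
}
```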

Output Parameters

a If info = 0, a is overwritten by the block-diagonal matrix D and the
multipliers used to obtain the factor U (or L) from the factorization of
A as computed by ?hetrf.

b If info = 0, b is overwritten by the solution matrix X.

ipiv Array, size at least max(1, n). Contains details of the interchanges
and the block structure of D, as determined by ?hetrf.

If ipiv[i-1] = k > 0, then d(i,i) is a 1-by-1 diagonal block, and the
i-th row and column of A were interchanged with the k-th row and
column.
If uplo = 'U' and ipiv[i] = ipiv[i-1] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and the i-th row and column of A were
interchanged with the m-th row and column.
If uplo = 'L' and ipiv[i] = ipiv[i-1] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and the (i+1)-th row and column of A
were interchanged with the m-th row and column.
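The ipiv encoding above can be walked from the top to recover the block structure of D: a positive entry is a 1-by-1 block, and a 2-by-2 block occupies two consecutive entries holding the same negative value -m. The C sketch below counts both kinds of blocks and rejects arrays that do not follow this encoding. It assumes lapack_int is a 32-bit int (the LP64 interface) and applies to the ipiv produced by ?hesv/?hetrf as described here; count_d_blocks is a hypothetical name.

```c
#include <assert.h>

/* Counts the 1-by-1 (n1) and 2-by-2 (n2) diagonal blocks of D encoded in
 * ipiv as returned by ?hesv / ?hetrf (entries are 1-based row numbers).
 * Returns 0 on success, or -1 if ipiv is not a valid encoding. */
static int count_d_blocks(const int *ipiv, int n, int *n1, int *n2)
{
    assert(n >= 0);
    *n1 = 0;
    *n2 = 0;
    for (int k = 0; k < n; ) {
        if (ipiv[k] > 0) {
            /* 1-by-1 block d(k+1,k+1); row/column k+1 swapped with ipiv[k] */
            ++*n1;
            k += 1;
        } else if (ipiv[k] < 0 && k + 1 < n && ipiv[k + 1] == ipiv[k]) {
            /* 2-by-2 block in rows/columns k+1 and k+2 (1-based);
             * the partner row is m = -ipiv[k] */
            ++*n2;
            k += 2;
        } else {
            return -1;   /* zero entry or unpaired negative entry */
        }
    }
    return 0;
}
```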

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, d(i,i) is 0. The factorization has been completed, but D is exactly singular, so the solution could
not be computed.

See Also
Matrix Storage Schemes

?hesv_aa
Computes the solution to a system of linear equations
for Hermitian (HE) matrices using Aasen's algorithm.
LAPACK_DECL lapack_int LAPACKE_chesv_aa (int matrix_layout, char uplo, lapack_int n,
lapack_int nrhs, lapack_complex_float * a, lapack_int lda, lapack_int * ipiv,
lapack_complex_float * b, lapack_int ldb );
LAPACK_DECL lapack_int LAPACKE_chesv_aa_work (int matrix_layout, char uplo, lapack_int
n, lapack_int nrhs, lapack_complex_float * a, lapack_int lda, lapack_int * ipiv,
lapack_complex_float * b, lapack_int ldb, lapack_complex_float * work, lapack_int
lwork );

Description
?hesv_aa computes the solution to a complex system of linear equations A * X = B, where A is an n-by-n
Hermitian matrix and X and B are n-by-nrhs matrices. Aasen's algorithm is used to factor A as

A = U * T * U^H if uplo = 'U', or

A = L * T * L^H if uplo = 'L',

where U (or L) is a product of permutation and unit upper (lower) triangular matrices, and T is Hermitian and
tridiagonal. The factored form of A is then used to solve the system of equations A * X = B.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

uplo If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

n The number of linear equations or the order of the matrix A. n≥ 0.

nrhs The number of right-hand sides or the number of columns of the matrix B.
nrhs ≥ 0.

a Array of size lda*n. On entry, the Hermitian matrix A.

If uplo = 'U', the leading n-by-n upper triangular part of a contains the
upper triangular part of the matrix A, and the strictly lower triangular part
of a is not referenced.

If uplo = 'L', the leading n-by-n lower triangular part of a contains the
lower triangular part of the matrix A, and the strictly upper triangular part
of a is not referenced.

lda The leading dimension of the array a. lda≥ max(1,n).

b Array of size ldb*nrhs. On entry, the n-by-nrhs right-hand side matrix B.

ldb The leading dimension of the array b. ldb≥ max(1,n).

lwork The length of work (used only by LAPACKE_chesv_aa_work). lwork ≥ max(1, 2*n, 3*n-2), and for best
performance lwork ≥ max(1, n*nb), where nb is the optimal block size for ?hetrf.
If lwork < n, the triangular solve (TRS) is done with Level 2 BLAS. If lwork ≥ n, it is done with
Level 3 BLAS.
If lwork = -1, a workspace query is assumed: the routine only
calculates the optimal size of the work array, returns this value as the first
entry of the work array, and xerbla issues no error message related to
lwork.
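With LAPACKE_chesv_aa_work the caller allocates work. A common pattern is to call it first with lwork = -1, read the optimal size from the real part of work[0], and then allocate. The C sketch below contains only the size arithmetic so that it runs without MKL: the minimum from the bound above and a helper that keeps a queried optimum no smaller than that minimum. The helper names are hypothetical, and long stands in for lapack_int.

```c
#include <assert.h>

/* Minimum lwork for order n, per lwork >= max(1, 2*n, 3*n-2). */
static long hesv_aa_min_lwork(long n)
{
    assert(n >= 0);
    long m = 1;
    if (2 * n > m) m = 2 * n;
    if (3 * n - 2 > m) m = 3 * n - 2;
    return m;
}

/* Length to allocate: the optimum reported by a workspace query
 * (real part of work[0] after a call with lwork = -1), but never
 * less than the documented minimum. */
static long hesv_aa_pick_lwork(long n, double queried)
{
    long q = (long)queried;
    long lo = hesv_aa_min_lwork(n);
    return q > lo ? q : lo;
}
```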

Output Parameters

a On exit, if info = 0, the tridiagonal matrix T and the multipliers used to
obtain the factor U or L from the factorization A = U*T*U^H or A = L*T*L^H as
computed by ?hetrf_aa.

ipiv Array of size n. On exit, it contains the details of the interchanges: row
and column k of A were interchanged with the row and column ipiv[k].

b On exit, if info = 0, the n-by-nrhs solution matrix X.

work Array of size (max(1, lwork)). On exit, if info = 0, work[0] returns the
optimal lwork.

Return Values
This function returns a value info.

If info = 0: successful exit.

If info < 0: if info = -i, the i-th argument had an illegal value.

If info > 0: if info = i, D(i,i) is exactly zero. The factorization has been completed, but the block diagonal
matrix D is exactly singular, so the solution could not be computed.

?hesv_rk
?hesv_rk computes the solution to a system of linear
equations A * X = B for Hermitian matrices.
lapack_int LAPACKE_chesv_rk (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, lapack_complex_float * A, lapack_int lda, lapack_complex_float * e, lapack_int *
ipiv, lapack_complex_float * B, lapack_int ldb);
lapack_int LAPACKE_zhesv_rk (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, lapack_complex_double * A, lapack_int lda, lapack_complex_double * e, lapack_int
* ipiv, lapack_complex_double * B, lapack_int ldb);

Description
?hesv_rk computes the solution to a complex system of linear equations A * X = B, where A is an n-by-n
Hermitian matrix and X and B are n-by-nrhs matrices.

The bounded Bunch-Kaufman (rook) diagonal pivoting method is used to factor A as A = P*U*D*(U^H)*(P^T), if
uplo = 'U', or A = P*L*D*(L^H)*(P^T), if uplo = 'L', where U (or L) is a unit upper (or lower) triangular
matrix, U^H (or L^H) is the conjugate transpose of U (or L), P is a permutation matrix, P^T is the transpose
of P, and D is Hermitian and block diagonal with 1-by-1 and 2-by-2 diagonal blocks.
?hetrf_rk is called to compute the factorization of the complex Hermitian matrix. The factored form of A is
then used to solve the system of equations A * X = B by calling ?hetrs_3, which uses Level 3 BLAS.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

uplo Specifies whether the upper or lower triangular part of the Hermitian matrix
A is stored:

• = 'U': The upper triangle of A is stored.

• = 'L': The lower triangle of A is stored.

n The number of linear equations; that is, the order of the matrix A. n ≥ 0.

nrhs The number of right-hand sides; that is, the number of columns of the
matrix B. nrhs ≥ 0.

A Array of size max(1, lda*n). On entry, the Hermitian matrix A. If uplo =
'U': the leading n-by-n upper triangular part of A contains the upper
triangular part of the matrix A, and the strictly lower triangular part of A is
not referenced. If uplo = 'L': the leading n-by-n lower triangular part of A
contains the lower triangular part of the matrix A, and the strictly upper
triangular part of A is not referenced.

lda The leading dimension of the array A; lda ≥ max(1, n).


B On entry, the n-by-nrhs right-hand side matrix B.
The size of B is max(1, ldb*nrhs) for column-major layout and max(1,
ldb*n) for row-major layout.

ldb The leading dimension of the array B. ldb ≥ max(1, n) for column-major
layout and ldb ≥ nrhs for row-major layout.

Output Parameters

A On exit, if info = 0, the diagonal of the block diagonal matrix D and the
factors U or L as computed by ?hetrf_rk:
• Only the diagonal elements of the Hermitian block diagonal matrix D are
stored on the diagonal of A; that is, D(k,k) = A(k,k). The superdiagonal
(or subdiagonal) elements of D are stored on exit in array e.
and
• If uplo = 'U', factor U in the superdiagonal part of A. If uplo = 'L',
factor L in the subdiagonal part of A.
For more information, see the description of the ?hetrf_rk routine.

e Array of size n. On exit, contains the output computed by the factorization
routine ?hetrf_rk; that is, the superdiagonal (or subdiagonal) elements of
the Hermitian block diagonal matrix D with 1-by-1 or 2-by-2 diagonal
blocks (indices here are 1-based, so e(i) is e[i-1] in C):

• If uplo = 'U', e(i) = D(i-1,i), i = 2:n, and e(1) is set to 0.

• If uplo = 'L', e(i) = D(i+1,i), i = 1:n-1, and e(n) is set to 0.

NOTE For a 1-by-1 diagonal block D(k), where 1 ≤ k ≤ n, the
element e(k) is set to 0 in both the uplo = 'U' and uplo = 'L'
cases.

For more information, see the description of the ?hetrf_rk routine.

ipiv Array of size n. Details of the interchanges and the block structure of D, as
determined by ?hetrf_rk.

B On exit, if info = 0, the n-by-nrhs solution matrix X.
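On exit, D is split between the diagonal of A and the array e. The following C sketch reassembles D as a dense column-major matrix: it translates the 1-based formulas e(i) = D(i-1,i) and e(i) = D(i+1,i) to 0-based indices and fills the mirrored entry with the complex conjugate, since D is Hermitian. It assumes column major layout and uses C99 double complex in place of lapack_complex_double, which is an assumption about how that type is configured in your build; rk_build_d is a hypothetical name.

```c
#include <assert.h>
#include <complex.h>

/* Builds the dense n-by-n matrix D (column-major, leading dimension n)
 * from the ?hesv_rk outputs: the diagonal of the overwritten array a
 * (leading dimension lda) and the off-diagonal array e. */
static void rk_build_d(char uplo, int n, const double complex *a, int lda,
                       const double complex *e, double complex *d)
{
    assert(n >= 0 && lda >= (n > 1 ? n : 1));
    for (int k = 0; k < n * n; ++k)
        d[k] = 0.0;
    for (int k = 0; k < n; ++k)
        d[k + k * n] = a[k + k * lda];           /* D(k,k) = A(k,k) */
    for (int k = 0; k < n; ++k) {
        if (e[k] == 0.0)
            continue;                            /* 1-by-1 block or unused slot */
        if (uplo == 'U' && k > 0) {
            d[(k - 1) + k * n] = e[k];           /* D(k-1,k), 0-based */
            d[k + (k - 1) * n] = conj(e[k]);     /* Hermitian mirror */
        } else if (uplo == 'L' && k + 1 < n) {
            d[(k + 1) + k * n] = e[k];           /* D(k+1,k), 0-based */
            d[k + (k + 1) * n] = conj(e[k]);     /* Hermitian mirror */
        }
    }
}
```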

Return Values
This function returns a value info.

If info = 0: successful exit.
If info < 0: if info = -k, the k-th argument had an illegal value.

If info > 0: if info = k, the matrix A is singular. If uplo = 'U', column k in the upper triangular part of A contains
all zeros. If uplo = 'L', column k in the lower triangular part of A contains all zeros. Therefore D(k,k) is
exactly zero, and the superdiagonal elements of column k of U (or the subdiagonal elements of column k of L) are
all zeros. The factorization has been completed, but the block diagonal matrix D is exactly singular, and
division by zero will occur if it is used to solve a system of equations.

?hesvx
Uses the diagonal pivoting factorization to compute
the solution to the complex system of linear equations
with a Hermitian coefficient matrix A, and provides
error bounds on the solution.

Syntax
lapack_int LAPACKE_chesvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, const lapack_complex_float* a, lapack_int lda, lapack_complex_float*
af, lapack_int ldaf, lapack_int* ipiv, const lapack_complex_float* b, lapack_int ldb,
lapack_complex_float* x, lapack_int ldx, float* rcond, float* ferr, float* berr );
lapack_int LAPACKE_zhesvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, const lapack_complex_double* a, lapack_int lda, lapack_complex_double*
af, lapack_int ldaf, lapack_int* ipiv, const lapack_complex_double* b, lapack_int ldb,
lapack_complex_double* x, lapack_int ldx, double* rcond, double* ferr, double* berr );

Include Files
• mkl.h

Description

The routine uses the diagonal pivoting factorization to compute the solution to a complex system of linear
equations A*X = B, where A is an n-by-n Hermitian matrix, the columns of matrix B are individual right-hand
sides, and the columns of X are the corresponding solutions.
Error bounds on the solution and a condition estimate are also provided.
The routine ?hesvx performs the following steps:

1. If fact = 'N', the diagonal pivoting method is used to factor the matrix A. The form of the
factorization is A = U*D*U^H or A = L*D*L^H, where U (or L) is a product of permutation and unit upper
(lower) triangular matrices, and D is Hermitian and block diagonal with 1-by-1 and 2-by-2 diagonal
blocks.
2. If some d(i,i) = 0, so that D is exactly singular, then the routine returns with info = i. Otherwise, the
factored form of A is used to estimate the condition number of the matrix A. If the reciprocal of the
condition number is less than machine precision, info = n+1 is returned as a warning, but the routine
still goes on to solve for X and compute error bounds as described below.
3. The system of equations is solved for X using the factored form of A.
4. Iterative refinement is applied to improve the computed solution matrix and calculate error bounds and
backward error estimates for it.
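Because info can signal four different outcomes in these steps, and only two of them leave a usable X, callers typically branch on it. A minimal C sketch of that classification, with hypothetical enum and function names and long standing in for lapack_int:

```c
#include <assert.h>

/* Hypothetical outcome codes for this sketch. */
typedef enum {
    SV_OK,         /* info == 0: X and error bounds computed            */
    SV_BAD_ARG,    /* info == -i: argument i had an illegal value       */
    SV_SINGULAR,   /* 1 <= info <= n: d(info,info) == 0, no solution    */
    SV_ILL_COND    /* info == n+1: rcond below eps, X still computed    */
} sv_status;

static sv_status classify_hesvx_info(long info, long n)
{
    assert(n >= 0);
    if (info == 0) return SV_OK;
    if (info < 0) return SV_BAD_ARG;
    if (info <= n) return SV_SINGULAR;
    assert(info == n + 1);
    return SV_ILL_COND;
}
```

Only SV_OK and SV_ILL_COND leave x filled in; in the second case the error bounds deserve extra scrutiny.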

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

fact Must be 'F' or 'N'.

Specifies whether or not the factored form of the matrix A has been
supplied on entry.
If fact = 'F': on entry, af and ipiv contain the factored form of A.
Arrays a, af, and ipiv are not modified.
If fact = 'N', the matrix A is copied to af and factored.


uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored and
how A is factored:
If uplo = 'U', the array a stores the upper triangular part of the
Hermitian matrix A, and A is factored as U*D*U^H.

If uplo = 'L', the array a stores the lower triangular part of the
Hermitian matrix A, and A is factored as L*D*L^H.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides, the number of columns in B; nrhs ≥ 0.

a, af, b Arrays: a (size max(1, lda*n)), af (size max(1, ldaf*n)), b of size
max(1, ldb*nrhs) for column major layout and max(1, ldb*n) for
row major layout.
The array a contains the upper or the lower triangular part of the
Hermitian matrix A (see uplo).
The array af is an input argument if fact = 'F'. It contains the block
diagonal matrix D and the multipliers used to obtain the factor U or L
from the factorization A = U*D*U^H or A = L*D*L^H as computed
by ?hetrf.

The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.

lda The leading dimension of a; lda≥ max(1, n).

ldaf The leading dimension of af; ldaf≥ max(1, n).

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ipiv Array, size at least max(1, n). The array ipiv is an input argument if
fact = 'F'. It contains details of the interchanges and the block
structure of D, as determined by ?hetrf.

If ipiv[i-1] = k > 0, then d(i,i) is a 1-by-1 diagonal block, and the
i-th row and column of A were interchanged with the k-th row and
column.
If uplo = 'U' and ipiv[i] = ipiv[i-1] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and the i-th row and column of A were
interchanged with the m-th row and column.
If uplo = 'L' and ipiv[i] = ipiv[i-1] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and the (i+1)-th row and column of A
were interchanged with the m-th row and column.

ldx The leading dimension of the output array x; ldx≥ max(1, n) for
column major layout and ldx≥nrhs for row major layout.

Output Parameters

x Array, size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout.
If info = 0 or info = n+1, the array x contains the solution matrix
X to the system of equations.

af, ipiv These arrays are output arguments if fact = 'N'. See the
description of af and ipiv in the Input Parameters section.

rcond An estimate of the reciprocal condition number of the matrix A. If
rcond is less than the machine precision (in particular, if rcond = 0),
the matrix is singular to working precision. This condition is indicated
by a return code of info > 0.

ferr Array, size at least max(1, nrhs). Contains the estimated forward
error bound for each solution vector xj (the j-th column of the solution
matrix X). If xtrue is the true solution corresponding to xj, ferr[j-1]
is an estimated upper bound for the magnitude of the largest element
in (xj - xtrue) divided by the magnitude of the largest element in xj.
The estimate is as reliable as the estimate for rcond, and is almost
always a slight overestimate of the true error.

berr Array, size at least max(1, nrhs). Contains the component-wise
relative backward error for each solution vector xj, that is, the
smallest relative change in any element of A or B that makes xj an
exact solution.
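When a reference solution is available (for example in a test), the quantity that ferr[j-1] bounds can be computed directly and compared with ferr. The C sketch below does this for one column. It measures the magnitude of a complex entry as |Re| + |Im|, the "cabs1" convention used inside LAPACK's complex refinement code; substituting cabs() would use the Euclidean modulus instead. C99 double complex stands in for lapack_complex_double, and the function names are hypothetical.

```c
#include <assert.h>
#include <complex.h>
#include <math.h>

/* Magnitude of a complex entry as |Re| + |Im|. */
static double cabs1(double complex z)
{
    return fabs(creal(z)) + fabs(cimag(z));
}

/* Normwise relative forward error of a computed solution column xj
 * against a reference xtrue of length n:
 *   max_i |xj[i] - xtrue[i]| / max_i |xj[i]|                       */
static double forward_error(int n, const double complex *xj,
                            const double complex *xtrue)
{
    assert(n >= 0);
    double num = 0.0, den = 0.0;
    for (int i = 0; i < n; ++i) {
        double diff = cabs1(xj[i] - xtrue[i]);
        double mag = cabs1(xj[i]);
        if (diff > num) num = diff;
        if (mag > den) den = mag;
    }
    if (den == 0.0)
        return num == 0.0 ? 0.0 : INFINITY;   /* xj is all zeros */
    return num / den;
}
```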

Return Values
This function returns a value info.

If info = 0, the execution is successful.

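If info = -i, parameter i had an illegal value.

If info = i and i ≤ n, then d(i,i) is exactly zero. The factorization has been completed, but the block diagonal
matrix D is exactly singular, so the solution and error bounds could not be computed; rcond = 0 is returned.

If info = n+1, D is nonsingular, but rcond is less than machine precision, meaning that the matrix is singular
to working precision. Nevertheless, the solution and error bounds are computed.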

See Also
Matrix Storage Schemes

?hesvxx
Uses extra precise iterative refinement to compute the
solution to the system of linear equations with a
Hermitian indefinite coefficient matrix A applying the
diagonal pivoting factorization.

Syntax
lapack_int LAPACKE_chesvxx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, lapack_complex_float* a, lapack_int lda, lapack_complex_float* af,
lapack_int ldaf, lapack_int* ipiv, char* equed, float* s, lapack_complex_float* b,
lapack_int ldb, lapack_complex_float* x, lapack_int ldx, float* rcond, float* rpvgrw,
float* berr, lapack_int n_err_bnds, float* err_bnds_norm, float* err_bnds_comp,
lapack_int nparams, const float* params );
lapack_int LAPACKE_zhesvxx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, lapack_complex_double* a, lapack_int lda, lapack_complex_double* af,
lapack_int ldaf, lapack_int* ipiv, char* equed, double* s, lapack_complex_double* b,
lapack_int ldb, lapack_complex_double* x, lapack_int ldx, double* rcond, double*
rpvgrw, double* berr, lapack_int n_err_bnds, double* err_bnds_norm, double*
err_bnds_comp, lapack_int nparams, const double* params );

Include Files
• mkl.h

Description

The routine uses the diagonal pivoting factorization to compute the solution to a complex/double complex
system of linear equations A*X = B, where A is an n-by-n Hermitian matrix, the columns of matrix B are
individual right-hand sides, and the columns of X are the corresponding solutions.
Both normwise and maximum componentwise error bounds are also provided on request. The routine returns
a solution with a small guaranteed error (O(eps), where eps is the working machine precision) unless the
matrix is very ill-conditioned, in which case a warning is returned. Relevant condition numbers are also
calculated and returned.
The routine accepts user-provided factorizations and equilibration factors; see the definitions of the fact and
equed options. Solving with refinement and using a factorization from a previous call of the routine also
produces a solution with O(eps) errors or warnings, but this may not hold for general user-provided
factorizations and equilibration factors if they differ from what the routine would itself produce.
The routine ?hesvxx performs the following steps:

1. If fact = 'E', scaling factors are computed to equilibrate the system:

diag(s)*A*diag(s)*inv(diag(s))*X = diag(s)*B
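The equilibrated system in step 1 is solved for Y = inv(diag(s))*X, so X is recovered as diag(s)*Y. The C sketch below applies the scaling and the back-transformation for a single right-hand side. It uses the real symmetric case for readability (s is real, so the Hermitian case scales the same way), takes s as given rather than reproducing how ?hesvxx chooses it, and includes a helper that rounds a desired scale down to a power of two, since scaling by powers of the radix is exact. All names are hypothetical.

```c
#include <assert.h>

/* Largest power of two not exceeding c (c > 0), built by exact
 * doubling/halving, so scaling by it introduces no rounding error
 * (barring underflow or overflow). */
static double pow2_floor(double c)
{
    assert(c > 0.0);
    double p = 1.0;
    while (p > c) p *= 0.5;
    while (p * 2.0 <= c) p *= 2.0;
    return p;
}

/* Replaces A by diag(s)*A*diag(s) (column-major, order n, leading
 * dimension lda) and b by diag(s)*b. */
static void equilibrate(int n, double *a, int lda, double *b, const double *s)
{
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i)
            a[i + j * lda] *= s[i] * s[j];
    for (int i = 0; i < n; ++i)
        b[i] *= s[i];
}

/* The scaled system yields Y = inv(diag(s))*X, so X = diag(s)*Y. */
static void unscale_solution(int n, double *y, const double *s)
{
    for (int i = 0; i < n; ++i)
        y[i] *= s[i];
}
```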

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

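fact Must be 'F', 'N', or 'E'.

Specifies whether or not the factored form of the matrix A is supplied
on entry, and if not, whether A should be equilibrated before it is factored.

If fact = 'F', on entry, af and ipiv contain the factored form of A.

If fact = 'N', the matrix A will be copied to af and factored.

If fact = 'E', the matrix A will be equilibrated, if necessary, copied
to af and factored.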

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored:

If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

n The number of linear equations; the order of the matrix A; n ≥ 0.

nrhs The number of right-hand sides; the number of columns of the
matrices B and X; nrhs ≥ 0.

a, af, b Arrays: a (size max(1, lda*n)), af (size max(1, ldaf*n)), b (size
max(1, ldb*nrhs) for column major layout and max(1, ldb*n) for row
major layout).
The array a contains the Hermitian matrix A as specified by uplo. If
uplo = 'U', the leading n-by-n upper triangular part of a contains the
upper triangular part of the matrix A and the strictly lower triangular
part of a is not referenced. If uplo = 'L', the leading n-by-n lower
triangular part of a contains the lower triangular part of the matrix A
and the strictly upper triangular part of a is not referenced.

The array af is an input argument if fact = 'F'. It contains the
block diagonal matrix D and the multipliers used to obtain the factor U
or L from the factorization A = U*D*U^H or A = L*D*L^H as computed
by ?hetrf.

The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.

lda The leading dimension of the array a; lda≥ max(1,n).

ldaf The leading dimension of the array af; ldaf≥ max(1,n).

ipiv Array, size at least max(1, n). The array ipiv is an input argument if
fact = 'F'. It contains details of the interchanges and the block
structure of D as determined by ?hetrf.

If ipiv[k-1] > 0, rows and columns k and ipiv[k-1] were
interchanged and D(k,k) is a 1-by-1 diagonal block.
If uplo = 'U' and ipiv[i] = ipiv[i - 1] = -m < 0, D has a 2-by-2
diagonal block in rows and columns i and i + 1, and the i-th row
and column of A were interchanged with the m-th row and column.
If uplo = 'L' and ipiv[i] = ipiv[i - 1] = -m < 0, D has a 2-by-2
diagonal block in rows and columns i and i + 1, and the (i + 1)-st
row and column of A were interchanged with the m-th row and
column.

equed Must be 'N' or 'Y'.

equed is an input argument if fact = 'F'. It specifies the form of
equilibration that was done: if equed = 'N', no equilibration was done;
if equed = 'Y', A has been replaced by diag(s)*A*diag(s).

Each element of s should be a power of the radix to ensure a reliable
solution and error estimates. Scaling by powers of the radix does not
cause rounding errors unless the result underflows or overflows.
Rounding errors during scaling lead to refining with a matrix that is
not equivalent to the input matrix, producing error estimates that may
not be reliable.

ldb The leading dimension of the array b; ldb≥ max(1, n) for column
major layout and ldb≥nrhs for row major layout.

ldx The leading dimension of the output array x; ldx≥ max(1, n) for
column major layout and ldx≥nrhs for row major layout.

n_err_bnds Number of error bounds to return for each right-hand side and each
type (normwise or componentwise). See the err_bnds_norm and
err_bnds_comp descriptions in the Output Parameters section below.

nparams Specifies the number of parameters set in params. If ≤ 0, the params
array is never referenced and default values are used.

params Array, size max(1, nparams). Specifies algorithm parameters. If an
entry is less than 0.0, the default value is used for that parameter.

params[0] : Whether to perform iterative refinement or not. Default:
1.0 (for single precision flavors), 1.0D+0 (for double precision
flavors).

=0.0 No refinement is performed and no error bounds are computed.

=1.0 Use the extra-precise refinement algorithm.

(Other values are reserved for future use.)

params[1] : Maximum number of residual computations allowed for
refinement.

Default 10

Aggressive Set to 100 to permit convergence using

params[2] : Flag determining if the code will attempt to find a
solution with a small componentwise relative error in the
double-precision algorithm. Positive is true, 0.0 is false. Default: 1.0
(attempt componentwise convergence).
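The params conventions can be captured in a small C helper, for example to log the settings a call will effectively use. This sketch reproduces the rule that negative entries take their default and uses the defaults listed above; an array {1.0, 100.0, 0.0} with nparams = 3 would request extra-precise refinement, up to 100 residual computations, and no componentwise convergence. The names are hypothetical, and double is used (the zhesvxx flavor); chesvxx takes float.

```c
#include <assert.h>

/* Defaults for params[0..2] as listed in the description. */
static const double hesvxx_param_defaults[3] = {
    1.0,   /* params[0]: use extra-precise refinement        */
    10.0,  /* params[1]: max residual computations           */
    1.0    /* params[2]: attempt componentwise convergence   */
};

/* Entries below 0.0 take the default; other entries are left as is.
 * nparams <= 0 touches nothing, matching "params is never referenced". */
static void resolve_params(int nparams, double *params)
{
    for (int k = 0; k < nparams && k < 3; ++k)
        if (params[k] < 0.0)
            params[k] = hesvxx_param_defaults[k];
}
```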

Output Parameters

a If fact = 'E' and equed = 'Y', overwritten by diag(s)*A*diag(s).

af If fact = 'N', af is an output argument and on exit returns the block
diagonal matrix D and the multipliers used to obtain the factor U or L from
the factorization A = U*D*U^H or A = L*D*L^H.

b If equed = 'N', B is not modified.

If equed = 'Y', B is overwritten by diag(s)*B.

s This array is an output argument if fact≠'F'. Each element of this array is
a power of the radix. See the description of s in the Input Parameters section.

rcond Reciprocal scaled condition number. An estimate of the reciprocal Skeel


rpvgrw Contains the reciprocal pivot growth factor:

If this is much less than 1, the stability of the LU factorization of the

berr Array, size at least max(1, nrhs). Contains the component-wise relative
backward error for each solution vector xj, that is, the smallest relative
change in any element of A or B that makes xj an exact solution.

err_bnds_norm Array of size nrhs*n_err_bnds. For each right-hand side, contains
information about various error bounds and condition numbers
corresponding to the normwise relative error, which is defined as follows:
Normwise relative error in the i-th solution vector:
max_j |xtrue(j,i) - x(j,i)| / max_j |x(j,i)|

The array is indexed by the type of error information as described below.

There are currently up to three pieces of information returned.

err=1 "Trust/don't trust" boolean. Trust the answer if
the reciprocal condition number is at least the
threshold sqrt(n)*slamch(ε) for chesvxx and
sqrt(n)*dlamch(ε) for zhesvxx.

err=2 "Guaranteed" error bound. The estimated
forward error, almost certainly within a factor of
10 of the true error so long as the next entry is
greater than the threshold sqrt(n)*slamch(ε)
for chesvxx and sqrt(n)*dlamch(ε) for
zhesvxx. This error bound should only be
trusted if the previous boolean is true.

err=3 Reciprocal condition number. Estimated
normwise reciprocal condition number.
Compared with the threshold
sqrt(n)*slamch(ε) for chesvxx and
sqrt(n)*dlamch(ε) for zhesvxx to determine if
the error estimate is "guaranteed". These
reciprocal condition numbers are
1/(norm(Z^-1, inf)*norm(Z, inf)) for some
appropriately scaled matrix Z.
Let Z = S*A, where S scales each row by a power
of the radix so all absolute row sums of Z are
approximately 1.

The information for right-hand side i, where 1 ≤ i ≤ nrhs, and type of error
err is stored in err_bnds_norm[(err-1)*nrhs + i - 1].

err_bnds_comp Array of size nrhs*n_err_bnds. For each right-hand side, contains
information about various error bounds and condition numbers
corresponding to the componentwise relative error, which is defined as
follows:
Componentwise relative error in the i-th solution vector:
max_j (|xtrue(j,i) - x(j,i)| / |x(j,i)|)

The array is indexed by the type of error information as described below.

There are currently up to three pieces of information returned for each
right-hand side. If componentwise accuracy is not requested (params[2] =
0.0), then err_bnds_comp is not accessed.

err=1 "Trust/don't trust" boolean. Trust the answer if
the reciprocal condition number is at least the
threshold sqrt(n)*slamch(ε) for chesvxx and
sqrt(n)*dlamch(ε) for zhesvxx.

err=2 "Guaranteed" error bound. The estimated
forward error, almost certainly within a factor of
10 of the true error so long as the next entry is
greater than the threshold sqrt(n)*slamch(ε)
for chesvxx and sqrt(n)*dlamch(ε) for
zhesvxx. This error bound should only be
trusted if the previous boolean is true.

err=3 Reciprocal condition number. Estimated
componentwise reciprocal condition number.
Compared with the threshold
sqrt(n)*slamch(ε) for chesvxx and
sqrt(n)*dlamch(ε) for zhesvxx to determine
if the error estimate is "guaranteed". These
reciprocal condition numbers are
1/(norm(Z^-1, inf)*norm(Z, inf)) for some
appropriately scaled matrix Z.
Let Z = S*(A*diag(x)), where x is the solution
for the current right-hand side and S scales each
row of A*diag(x) by a power of the radix so all
absolute row sums of Z are approximately 1.

The information for right-hand side i, where 1 ≤ i ≤ nrhs, and type of error
err is stored in err_bnds_comp[(err-1)*nrhs + i - 1].


equed If fact≠'F', then equed is an output argument. It specifies the form of
equilibration that was done (see the description of equed in the Input
Parameters section).

params If an entry is less than 0.0, that entry is filled with the default value used
for that parameter; otherwise the entry is not modified.
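To read err_bnds_norm or err_bnds_comp, the index formula (err-1)*nrhs + i - 1 given above can be wrapped in a small C helper. The sketch assumes the zhesvxx (double) flavor; the helper names are hypothetical, and the trust test simply checks whether the err=1 entry is nonzero.

```c
#include <assert.h>

/* Entry `err` (1 = trust flag, 2 = error bound, 3 = rcond) for the
 * 1-based right-hand side i, stored at err_bnds[(err-1)*nrhs + i - 1]. */
static double err_bnd(const double *err_bnds, int nrhs, int n_err_bnds,
                      int i, int err)
{
    assert(1 <= i && i <= nrhs);
    assert(1 <= err && err <= n_err_bnds && n_err_bnds <= 3);
    return err_bnds[(err - 1) * nrhs + i - 1];
}

/* Nonzero if the "Trust/don't trust" flag for right-hand side i is set. */
static int err_bnd_trusted(const double *err_bnds, int nrhs,
                           int n_err_bnds, int i)
{
    return err_bnd(err_bnds, nrhs, n_err_bnds, i, 1) != 0.0;
}
```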

Return Values
This function returns a value info.

If info = 0, the execution is successful. The solution to every right-hand side is guaranteed.

If info = -i, the i-th parameter had an illegal value.

If 0 < info ≤ n: U(info,info) is exactly zero. The factorization has been completed, but the factor U is exactly
singular, so the solution and error bounds could not be computed; rcond = 0 is returned.

See Also
Matrix Storage Schemes

?spsv
Computes the solution to the system of linear
equations with a real or complex symmetric coefficient
matrix A stored in packed format, and multiple right-
hand sides.

Syntax
lapack_int LAPACKE_sspsv (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, float * ap, lapack_int * ipiv, float * b, lapack_int ldb);
lapack_int LAPACKE_dspsv (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, double * ap, lapack_int * ipiv, double * b, lapack_int ldb);
lapack_int LAPACKE_cspsv (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, lapack_complex_float * ap, lapack_int * ipiv, lapack_complex_float * b,
lapack_int ldb);
lapack_int LAPACKE_zspsv (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, lapack_complex_double * ap, lapack_int * ipiv, lapack_complex_double * b,
lapack_int ldb);

Include Files
• mkl.h

Description

The routine solves for X the real or complex system of linear equations A*X = B, where A is an n-by-n
symmetric matrix stored in packed format, the columns of matrix B are individual right-hand sides, and the
columns of X are the corresponding solutions.
The diagonal pivoting method is used to factor A as A = U*D*U^T or A = L*D*L^T, where U (or L) is a product
of permutation and unit upper (lower) triangular matrices, and D is symmetric and block diagonal with
1-by-1 and 2-by-2 diagonal blocks.
The factored form of A is then used to solve the system of equations A*X = B.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored:

If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides, the number of columns in B; nrhs ≥ 0.

ap, b Arrays: ap (size max(1, n(n+1)/2)), b of size max(1, ldb*nrhs) for
column major layout and max(1, ldb*n) for row major layout.

The array ap contains the upper or lower triangle of the symmetric matrix A,
as specified by uplo, in packed storage (see Matrix Storage Schemes).
The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.
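For column major layout, the packed array ap holds one triangle of A column by column. The C sketch below gives the total length and the 0-based offset of A(i,j) for both values of uplo; because A is symmetric (not Hermitian), a request for the other triangle is served by swapping i and j with no conjugation. Row major packing is covered in Matrix Storage Schemes and is not modeled here. The helper names are hypothetical, and long stands in for lapack_int.

```c
#include <assert.h>

/* Number of elements in packed storage of order n (at least 1). */
static long packed_size(long n)
{
    long sz = n * (n + 1) / 2;
    return sz > 1 ? sz : 1;
}

/* 0-based offset of A(i,j) in column-major packed storage of order n.
 * uplo = 'U': columns of the upper triangle stored one after another.
 * uplo = 'L': columns of the lower triangle stored one after another. */
static long packed_index(char uplo, long n, long i, long j)
{
    assert(0 <= i && i < n && 0 <= j && j < n);
    if (uplo == 'U') {
        if (i > j) { long t = i; i = j; j = t; }   /* use A(j,i) = A(i,j) */
        return i + j * (j + 1) / 2;
    }
    if (i < j) { long t = i; i = j; j = t; }       /* use A(j,i) = A(i,j) */
    return i + j * (2 * n - j - 1) / 2;
}
```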

Output Parameters

ap The block-diagonal matrix D and the multipliers used to obtain the
factor U (or L) from the factorization of A as computed by ?sptrf,
stored as a packed triangular matrix in the same storage format as A.

b If info = 0, b is overwritten by the solution matrix X.

ipiv Array, size at least max(1, n). Contains details of the interchanges
and the block structure of D, as determined by ?sptrf.

If ipiv[i-1] = k > 0, then d(i,i) is a 1-by-1 block, and the i-th row
and column of A were interchanged with the k-th row and column.
If uplo = 'U' and ipiv[i] = ipiv[i-1] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and the i-th row and column of A were
interchanged with the m-th row and column.
If uplo = 'L' and ipiv[i-1] = ipiv[i] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and the (i+1)-th row and column of A
were interchanged with the m-th row and column.


Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, d(i,i) is 0. The factorization has been completed, but D is exactly singular, so the solution could
not be computed.

See Also
Matrix Storage Schemes

?spsvx
Uses the diagonal pivoting factorization to compute
the solution to the system of linear equations with a
real or complex symmetric coefficient matrix A stored
in packed format, and provides error bounds on the
solution.

Syntax
lapack_int LAPACKE_sspsvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, const float* ap, float* afp, lapack_int* ipiv, const float* b,
lapack_int ldb, float* x, lapack_int ldx, float* rcond, float* ferr, float* berr );
lapack_int LAPACKE_dspsvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, const double* ap, double* afp, lapack_int* ipiv, const double* b,
lapack_int ldb, double* x, lapack_int ldx, double* rcond, double* ferr, double* berr );
lapack_int LAPACKE_cspsvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, const lapack_complex_float* ap, lapack_complex_float* afp, lapack_int*
ipiv, const lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x,
lapack_int ldx, float* rcond, float* ferr, float* berr );
lapack_int LAPACKE_zspsvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, const lapack_complex_double* ap, lapack_complex_double* afp,
lapack_int* ipiv, const lapack_complex_double* b, lapack_int ldb,
lapack_complex_double* x, lapack_int ldx, double* rcond, double* ferr, double* berr );

Include Files
• mkl.h

Description

The routine uses the diagonal pivoting factorization to compute the solution to a real or complex system of
linear equations A*X = B, where A is an n-by-n symmetric matrix stored in packed format, the columns of
matrix B are individual right-hand sides, and the columns of X are the corresponding solutions.
Error bounds on the solution and a condition estimate are also provided.
The routine ?spsvx performs the following steps:

1. If fact = 'N', the diagonal pivoting method is used to factor the matrix A. The form of the
factorization is A = U*D*U^T or A = L*D*L^T, where U (or L) is a product of permutation and unit upper
(lower) triangular matrices, and D is symmetric and block diagonal with 1-by-1 and 2-by-2 diagonal
blocks.
2. If some D(i,i) = 0, so that D is exactly singular, then the routine returns with info = i. Otherwise, the
factored form of A is used to estimate the condition number of the matrix A. If the reciprocal of the
condition number is less than machine precision, info = n+1 is returned as a warning, but the routine
still goes on to solve for X and compute error bounds as described below.
3. The system of equations is solved for X using the factored form of A.
4. Iterative refinement is applied to improve the computed solution matrix and calculate error bounds and
backward error estimates for it.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

fact Must be 'F' or 'N'.

Specifies whether or not the factored form of the matrix A has been
supplied on entry.
If fact = 'F': on entry, afp and ipiv contain the factored form of A.
Arrays ap, afp, and ipiv are not modified.
If fact = 'N', the matrix A is copied to afp and factored.

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored and
how A is factored:
If uplo = 'U', the array ap stores the upper triangular part of the
symmetric matrix A, and A is factored as U*D*U^T.

If uplo = 'L', the array ap stores the lower triangular part of the
symmetric matrix A, and A is factored as L*D*L^T.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides, the number of columns in B; nrhs ≥ 0.

ap, afp, b Arrays: ap of size max(1, n(n+1)/2), afp of size max(1, n(n+1)/2), and b of
size max(1, ldb*nrhs) for column major layout and max(1, ldb*n)
for row major layout.
The array ap contains the upper or lower triangle of the symmetric
matrix A in packed storage (see Matrix Storage Schemes).
The array afp is an input argument if fact = 'F'. It contains the
block diagonal matrix D and the multipliers used to obtain the factor U
or L from the factorization A = U*D*U^T or A = L*D*L^T as computed
by ?sptrf, in the same storage format as A.

The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ipiv Array, size at least max(1, n). The array ipiv is an input argument if
fact = 'F'. It contains details of the interchanges and the block
structure of D, as determined by ?sptrf.


ldx The leading dimension of the output array x; ldx≥ max(1, n) for
column major layout and ldx≥nrhs for row major layout.
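To make the packed layout of ap and afp concrete, here is a C sketch of the column-major packed indexing used by LAPACK (matrix_layout = LAPACK_COL_MAJOR only; the row-major packed order is not covered). packed_index and pack_triangle are illustrative helpers, not oneMKL functions, and the indices i and j are 1-based to match the text.

```c
#include <assert.h>
#include <stddef.h>

/* 0-based offset of A(i,j) (1-based i, j) in a column-major packed array
 * holding n*(n+1)/2 elements of the uplo triangle. */
size_t packed_index(char uplo, size_t n, size_t i, size_t j)
{
    assert(1 <= i && i <= n && 1 <= j && j <= n);
    if (uplo == 'U' || uplo == 'u') {
        assert(i <= j);                      /* only the upper triangle is stored */
        return (i - 1) + j * (j - 1) / 2;    /* columns 1..j-1 hold j(j-1)/2 items */
    }
    assert(i >= j);                          /* only the lower triangle is stored */
    return (i - 1) + (2 * n - j) * (j - 1) / 2;
}

/* Packs the uplo triangle of a full column-major n-by-n matrix a
 * (leading dimension lda) into ap. */
void pack_triangle(char uplo, size_t n, const double *a, size_t lda, double *ap)
{
    int upper = (uplo == 'U' || uplo == 'u');
    for (size_t j = 1; j <= n; ++j) {
        size_t lo = upper ? 1 : j;
        size_t hi = upper ? j : n;
        for (size_t i = lo; i <= hi; ++i)
            ap[packed_index(uplo, n, i, j)] = a[(i - 1) + (j - 1) * lda];
    }
}
```

For a 3-by-3 symmetric matrix, the upper packed order is A(1,1), A(1,2), A(2,2), A(1,3), A(2,3), A(3,3), and the lower packed order is A(1,1), A(2,1), A(3,1), A(2,2), A(3,2), A(3,3).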

Output Parameters

x Array, size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout.
If info = 0 or info = n+1, the array x contains the solution matrix
X to the system of equations.

afp, ipiv These arrays are output arguments if fact = 'N'. See the
description of afp and ipiv in the Input Parameters section.

rcond An estimate of the reciprocal condition number of the matrix A. If
rcond is less than the machine precision (in particular, if rcond = 0),
the matrix is singular to working precision. This condition is indicated
by a return code of info > 0.

ferr, berr Arrays, size at least max(1, nrhs). Contain the estimated forward error bound and the
component-wise relative backward error, respectively, for each solution
vector.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

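If info = -i, parameter i had an illegal value.

If info = i and i ≤ n, D(i,i) is exactly zero. The factorization has been completed, but D is exactly
singular, so the solution and error bounds could not be computed.

If info = n+1, D is nonsingular, but rcond is less than machine precision: the matrix is singular to
working precision. The solution and error bounds are nevertheless computed.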
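Because info = n+1 is only a warning, calling code usually needs to tell it apart from a hard failure. The C sketch below encodes the cases from steps 2 through 4 of the Description. classify_spsvx_info and spsvx_status are hypothetical names, plain int stands in for lapack_int, and treating every negative value as an argument error is a simplifying assumption.

```c
/* Hypothetical classification of the value returned by LAPACKE_?spsvx
 * for a system of order n (not an oneMKL API). */
typedef enum {
    SPSVX_OK,              /* info == 0                                    */
    SPSVX_BAD_ARGUMENT,    /* info < 0: an argument had an illegal value   */
    SPSVX_SINGULAR,        /* 1 <= info <= n: D(info,info) is exactly zero */
    SPSVX_ILL_CONDITIONED  /* info == n+1: rcond below machine precision   */
} spsvx_status;

spsvx_status classify_spsvx_info(int info, int n)
{
    if (info == 0)
        return SPSVX_OK;
    if (info < 0)
        return SPSVX_BAD_ARGUMENT;
    if (info <= n)
        return SPSVX_SINGULAR;
    return SPSVX_ILL_CONDITIONED;
}

/* Nonzero when x holds a computed solution: the routine still solves for X
 * when it only warns with info = n+1. */
int spsvx_solution_available(int info, int n)
{
    spsvx_status s = classify_spsvx_info(info, n);
    return s == SPSVX_OK || s == SPSVX_ILL_CONDITIONED;
}
```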

See Also
Matrix Storage Schemes

?hpsv
Computes the solution to the system of linear
equations with a Hermitian coefficient matrix A stored
in packed format, and multiple right-hand sides.

Syntax
lapack_int LAPACKE_chpsv (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , lapack_complex_float * ap , lapack_int * ipiv , lapack_complex_float * b ,
lapack_int ldb );
lapack_int LAPACKE_zhpsv (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , lapack_complex_double * ap , lapack_int * ipiv , lapack_complex_double * b ,
lapack_int ldb );

Include Files
• mkl.h

Description

The routine solves for X the system of linear equations A*X = B, where A is an n-by-n Hermitian matrix
stored in packed format, the columns of matrix B are individual right-hand sides, and the columns of X are
the corresponding solutions.
The diagonal pivoting method is used to factor A as A = U*D*U^H or A = L*D*L^H, where U (or L) is a product
of permutation and unit upper (lower) triangular matrices, and D is Hermitian and block diagonal with 1-by-1
and 2-by-2 diagonal blocks.
The factored form of A is then used to solve the system of equations A*X = B.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored:

If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides; the number of columns in B; nrhs ≥ 0.

ap, b Arrays: ap of size max(1, n(n+1)/2), and b of size max(1, ldb*nrhs) for
column major layout and max(1, ldb*n) for row major layout.

The array ap contains the upper or lower triangle of the Hermitian matrix A, as specified by
uplo, in packed storage (see Matrix Storage Schemes).
The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.


Output Parameters

ap The block-diagonal matrix D and the multipliers used to obtain the
factor U (or L) from the factorization of A as computed by ?hptrf,
stored as a packed triangular matrix in the same storage format as A.

b If info = 0, b is overwritten by the solution matrix X.

ipiv Array, size at least max(1, n). Contains details of the interchanges
and the block structure of D, as determined by ?hptrf.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, D(i,i) is exactly zero. The factorization has been completed, but D is exactly singular, so the
solution could not be computed.

See Also
Matrix Storage Schemes

?hpsvx
Uses the diagonal pivoting factorization to compute
the solution to the system of linear equations with a
Hermitian coefficient matrix A stored in packed
format, and provides error bounds on the solution.

Syntax
lapack_int LAPACKE_chpsvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, const lapack_complex_float* ap, lapack_complex_float* afp, lapack_int*
ipiv, const lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x,
lapack_int ldx, float* rcond, float* ferr, float* berr );
lapack_int LAPACKE_zhpsvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, const lapack_complex_double* ap, lapack_complex_double* afp,
lapack_int* ipiv, const lapack_complex_double* b, lapack_int ldb,
lapack_complex_double* x, lapack_int ldx, double* rcond, double* ferr, double* berr );

Include Files
• mkl.h

Description

The routine uses the diagonal pivoting factorization to compute the solution to a complex system of linear
equations A*X = B, where A is an n-by-n Hermitian matrix stored in packed format, the columns of matrix B
are individual right-hand sides, and the columns of X are the corresponding solutions.
Error bounds on the solution and a condition estimate are also provided.
The routine ?hpsvx performs the following steps:

1. If fact = 'N', the diagonal pivoting method is used to factor the matrix A. The form of the
factorization is A = U*D*U^H or A = L*D*L^H, where U (or L) is a product of permutation and unit upper
(lower) triangular matrices, and D is Hermitian and block diagonal with 1-by-1 and 2-by-2 diagonal
blocks.
2. If some D(i,i) = 0, so that D is exactly singular, then the routine returns with info = i. Otherwise, the
factored form of A is used to estimate the condition number of the matrix A. If the reciprocal of the
condition number is less than machine precision, info = n+1 is returned as a warning, but the routine
still goes on to solve for X and compute error bounds as described below.
3. The system of equations is solved for X using the factored form of A.
4. Iterative refinement is applied to improve the computed solution matrix and calculate error bounds and
backward error estimates for it.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

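fact Must be 'F' or 'N'.

Specifies whether or not the factored form of the matrix A has been
supplied on entry.
If fact = 'F': on entry, afp and ipiv contain the factored form of A.
Arrays ap, afp, and ipiv are not modified.
If fact = 'N', the matrix A is copied to afp and factored.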

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored and
how A is factored:
If uplo = 'U', the array ap stores the upper triangular part of the
Hermitian matrix A, and A is factored as U*D*U^H.

If uplo = 'L', the array ap stores the lower triangular part of the
Hermitian matrix A, and A is factored as L*D*L^H.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides, the number of columns in B; nrhs ≥ 0.

ap, afp, b Arrays: ap of size max(1, n(n+1)/2), afp of size max(1, n(n+1)/2), and b of
size max(1, ldb*nrhs) for column major layout and max(1, ldb*n)
for row major layout.
The array ap contains the upper or lower triangle of the Hermitian
matrix A in packed storage (see Matrix Storage Schemes).

The array afp is an input argument if fact = 'F'. It contains the
block diagonal matrix D and the multipliers used to obtain the factor U
or L from the factorization A = U*D*U^H or A = L*D*L^H as computed
by ?hptrf, in the same storage format as A.

The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ipiv Array, size at least max(1, n). The array ipiv is an input argument if
fact = 'F'. It contains details of the interchanges and the block
structure of D, as determined by ?hptrf.

ldx The leading dimension of the output array x; ldx≥ max(1, n) for
column major layout and ldx≥nrhs for row major layout.

Output Parameters

x Array, size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout.
If info = 0 or info = n+1, the array x contains the solution matrix
X to the system of equations.

afp, ipiv These arrays are output arguments if fact = 'N'. See the
description of afp and ipiv in the Input Parameters section.

rcond An estimate of the reciprocal condition number of the matrix A. If
rcond is less than the machine precision (in particular, if rcond = 0),
the matrix is singular to working precision. This condition is indicated
by a return code of info > 0.

ferr Array, size at least max(1, nrhs). Contains the estimated forward
error bound for each solution vector x_j (the j-th column of the solution
matrix X). If x_true is the true solution corresponding to x_j, ferr[j-1]
is an estimated upper bound for the magnitude of the largest element
in (x_j - x_true) divided by the magnitude of the largest element in x_j.
The estimate is as reliable as the estimate for rcond, and is almost
always a slight overestimate of the true error.

berr Array, size at least max(1, nrhs). Contains the component-wise
relative backward error for each solution vector x_j, that is, the
smallest relative change in any element of A or B that makes x_j an
exact solution.
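The component-wise relative backward error reported in berr can be written as max_i |b - A*x|_i / (|A|*|x| + |b|)_i. The C sketch below evaluates that expression for one dense, real, column-major system. It illustrates the definition only; it is not the code ?hpsvx runs (that routine works from the packed factorization in complex arithmetic and guards zero denominators differently), and backward_error is a hypothetical name.

```c
#include <math.h>
#include <stddef.h>

/* berr = max_i |b - A*x|_i / (|A|*|x| + |b|)_i for a dense n-by-n
 * column-major A (leading dimension n). Rows with a zero denominator are
 * skipped, which is a simplification of what LAPACK does. */
double backward_error(size_t n, const double *a, const double *x, const double *b)
{
    double berr = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double r = b[i];          /* residual r_i = b_i - (A*x)_i */
        double s = fabs(b[i]);    /* (|A|*|x| + |b|)_i            */
        for (size_t j = 0; j < n; ++j) {
            r -= a[i + j * n] * x[j];
            s += fabs(a[i + j * n]) * fabs(x[j]);
        }
        if (s > 0.0 && fabs(r) / s > berr)
            berr = fabs(r) / s;
    }
    return berr;
}
```

A value on the order of machine epsilon means x is the exact solution of a system whose entries differ from A and b only in their last few bits.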

Return Values
This function returns a value info.

If info = 0, the execution is successful.

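If info = -i, parameter i had an illegal value.

If info = i and i ≤ n, D(i,i) is exactly zero. The factorization has been completed, but D is exactly
singular, so the solution and error bounds could not be computed.

If info = n+1, D is nonsingular, but rcond is less than machine precision: the matrix is singular to
working precision. The solution and error bounds are nevertheless computed.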

See Also
Matrix Storage Schemes

LAPACK Least Squares and Eigenvalue Problem Routines

This section includes descriptions of LAPACK computational routines and driver routines for solving linear
least squares problems, eigenvalue and singular value problems, and performing a number of related
computational tasks. For a full reference on LAPACK routines and related information see [LUG].
Least Squares Problems. A typical least squares problem is as follows: given a matrix A and a vector b,
find the vector x that minimizes the sum of squares Σ_i ((Ax)_i - b_i)^2 or, equivalently, find the vector x
that minimizes the 2-norm ||Ax - b||_2.
In the most usual case, A is an m-by-n matrix with m ≥ n and rank(A) = n. This problem is also referred to
as finding the least squares solution to an overdetermined system of linear equations (here we have more
equations than unknowns). To solve this problem, you can use the QR factorization of the matrix A (see QR
Factorization).
If m < n and rank(A) = m, there are infinitely many solutions x that exactly satisfy Ax = b, and
thus minimize the norm ||Ax - b||_2. In this case it is often useful to find the unique solution that
minimizes ||x||_2. This problem is referred to as finding the minimum-norm solution to an
underdetermined system of linear equations (here we have more unknowns than equations). To solve this
problem, you can use the LQ factorization of the matrix A (see LQ Factorization).
In the general case you may have a rank-deficient least squares problem, with rank(A) < min(m, n): find
the minimum-norm least squares solution, that is, among all vectors x that minimize ||Ax - b||_2, the one
with the smallest ||x||_2. In this case (or when the rank of A is in doubt) you can use the QR factorization
with pivoting or the singular value decomposition (see Singular Value Decomposition).
Eigenvalue Problems. The eigenvalue problems (from German eigen "own") are stated as follows: given a
matrix A, find the eigenvalues λ and the corresponding eigenvectors z that satisfy the equation
Az = λz (right eigenvectors z)
or the equation
z^H A = λz^H (left eigenvectors z).
If A is a real symmetric or complex Hermitian matrix, the above two equations are equivalent, and the
problem is called a symmetric eigenvalue problem. Routines for solving this type of problem are described
in the topic Symmetric Eigenvalue Problems.
Routines for solving eigenvalue problems with nonsymmetric or non-Hermitian matrices are described in the
topic Nonsymmetric Eigenvalue Problems.
The library also includes routines that handle generalized symmetric-definite eigenvalue problems: find
the eigenvalues λ and the corresponding eigenvectors z that satisfy one of the following equations:
Az = λBz, ABz = λz, or BAz = λz,


where A is symmetric or Hermitian, and B is symmetric positive-definite or Hermitian positive-definite.

Routines for reducing these problems to standard symmetric eigenvalue problems are described in the topic
Generalized Symmetric-Definite Eigenvalue Problems.
To solve a particular problem, you usually call several computational routines. Sometimes you need to
combine the routines of this chapter with other LAPACK routines described in "LAPACK Routines: Linear
Equations" as well as with BLAS routines described in "BLAS and Sparse BLAS Routines".
For example, to solve a set of least squares problems minimizing ||Ax - b||_2 for all columns b of a given
matrix B (where A and B are real matrices), you can call ?geqrf to form the factorization A = QR, then
call ?ormqr to compute C = Q^T*B, and finally call the BLAS routine ?trsm to solve for X the system of
equations R*X = C.
Another way is to call an appropriate driver routine that performs several tasks in one call. For example, to
solve the least squares problem the driver routine ?gels can be used.

LAPACK Least Squares and Eigenvalue Problem Computational Routines

In the topics that follow, the descriptions of LAPACK computational routines are given. These routines
perform distinct computational tasks that can be used for:
Orthogonal Factorizations
Singular Value Decomposition
Symmetric Eigenvalue Problems
Generalized Symmetric-Definite Eigenvalue Problems
Nonsymmetric Eigenvalue Problems
Generalized Nonsymmetric Eigenvalue Problems
Generalized Singular Value Decomposition
See also the respective driver routines.

Orthogonal Factorizations: LAPACK Computational Routines

This topic describes the LAPACK routines for the QR (RQ) and LQ (QL) factorization of matrices. Routines for
the RZ factorization as well as for generalized QR and RQ factorizations are also included.
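QR Factorization. Assume that A is an m-by-n matrix to be factored.
If m ≥ n, the QR factorization is given by

A = Q \begin{pmatrix} R \\ 0 \end{pmatrix} = \begin{pmatrix} Q_1 & Q_2 \end{pmatrix} \begin{pmatrix} R \\ 0 \end{pmatrix} = Q_1 R,

where R is an n-by-n upper triangular matrix with real diagonal elements, Q is an m-by-m orthogonal (or
unitary) matrix, and Q_1 consists of the first n columns of Q.
You can use the QR factorization for solving the following least squares problem: minimize ||Ax - b||_2
where A is a full-rank m-by-n matrix (m ≥ n). After factoring the matrix, compute the solution x by solving
R x = Q_1^T b (R x = Q_1^H b for complex matrices).
If m < n, the QR factorization is given by

A = QR = Q \begin{pmatrix} R_1 & R_2 \end{pmatrix},

where R is an m-by-n upper trapezoidal matrix, R_1 is m-by-m upper triangular, and R_2 is m-by-(n-m)
rectangular.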

Q is represented as a product of min(m, n) elementary reflectors. Routines are provided to work with Q in
this representation.
LQ Factorization. The LQ factorization of an m-by-n matrix A is as follows. If m ≤ n,

A = \begin{pmatrix} L & 0 \end{pmatrix} Q = \begin{pmatrix} L & 0 \end{pmatrix} \begin{pmatrix} Q_1 \\ Q_2 \end{pmatrix} = L Q_1,

where L is an m-by-m lower triangular matrix with real diagonal elements, Q is an n-by-n orthogonal (or
unitary) matrix, and Q_1 consists of the first m rows of Q.
If m > n, the LQ factorization is

A = \begin{pmatrix} L_1 \\ L_2 \end{pmatrix} Q,

where L_1 is an n-by-n lower triangular matrix, L_2 is an (m-n)-by-n rectangular matrix, and Q is an n-by-n
orthogonal (or unitary) matrix.
You can use the LQ factorization to find the minimum-norm solution of an underdetermined system of linear
equations Ax = b where A is an m-by-n matrix of rank m (m < n). After factoring the matrix, compute the
solution vector x as follows: solve Ly = b for y, and then compute x = Q_1^H y.
Table "Computational Routines for Orthogonal Factorization" lists LAPACK routines that perform orthogonal
factorization of matrices.
Computational Routines for Orthogonal Factorization

Matrix type, factorization                               | Factorize without pivoting | Factorize with pivoting | Generate matrix Q | Apply matrix Q
general matrices, QR factorization                       | geqrf, geqrfp              | geqpf, geqp3            | orgqr, ungqr      | ormqr, unmqr
general matrices, blocked QR factorization               | geqrt                      |                         |                   | gemqrt
general matrices, RQ factorization                       | gerqf                      |                         | orgrq, ungrq      | ormrq, unmrq
general matrices, LQ factorization                       | gelqf                      |                         | orglq, unglq      | ormlq, unmlq
general matrices, QL factorization                       | geqlf                      |                         | orgql, ungql      | ormql, unmql
trapezoidal matrices, RZ factorization                   | tzrzf                      |                         |                   | ormrz, unmrz
pair of matrices, generalized QR factorization           | ggqrf                      |                         |                   |
pair of matrices, generalized RQ factorization           | ggrqf                      |                         |                   |
triangular-pentagonal matrices, blocked QR factorization | tpqrt                      |                         |                   | tpmqrt


?geqrf
Computes the QR factorization of a general m-by-n
matrix.

Syntax
lapack_int LAPACKE_sgeqrf (int matrix_layout, lapack_int m, lapack_int n, float* a,
lapack_int lda, float* tau);
lapack_int LAPACKE_dgeqrf (int matrix_layout, lapack_int m, lapack_int n, double* a,
lapack_int lda, double* tau);
lapack_int LAPACKE_cgeqrf (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* tau);
lapack_int LAPACKE_zgeqrf (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* tau);

Include Files
• mkl.h

Description

The routine forms the QR factorization of a general m-by-n matrix A (see Orthogonal Factorizations). No
pivoting is performed.
The routine does not form the matrix Q explicitly. Instead, Q is represented as a product of min(m, n)
elementary reflectors. Routines are provided to work with Q in this representation.

NOTE
This routine supports the Progress Routine feature. See Progress Function for details.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

m The number of rows in the matrix A (m≥ 0).

n The number of columns in A (n≥ 0).

a Array a of size max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout contains the matrix A.

lda The leading dimension of a; at least max(1, m) for column major layout and
at least max(1, n) for row major layout.
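The layout-dependent lower bounds on lda follow from how element (i, j) is addressed in each layout. A minimal C sketch, assuming 0-based indices. The enum values are stand-ins for LAPACK_COL_MAJOR and LAPACK_ROW_MAJOR (not the real macros), and elem_offset and min_ld are hypothetical helpers.

```c
#include <stddef.h>

/* Stand-ins for LAPACK_COL_MAJOR / LAPACK_ROW_MAJOR. */
enum storage_layout { COL_MAJOR, ROW_MAJOR };

/* 0-based offset of element (i, j) in an array with leading dimension ld. */
size_t elem_offset(enum storage_layout lay, size_t ld, size_t i, size_t j)
{
    return lay == COL_MAJOR ? i + j * ld   /* each column is contiguous */
                            : i * ld + j;  /* each row is contiguous    */
}

/* Smallest legal leading dimension of an m-by-n array, following the rule
 * stated for lda: max(1, m) for column major, max(1, n) for row major. */
size_t min_ld(enum storage_layout lay, size_t m, size_t n)
{
    size_t need = (lay == COL_MAJOR) ? m : n;
    return need > 1 ? need : 1;
}
```

In column major layout a whole column must fit between consecutive column starts, so lda ≥ m; in row major layout a whole row must fit, so lda ≥ n. The array then needs lda*n (column major) or lda*m (row major) elements, which matches the sizes given for a.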

Output Parameters

a Overwritten by the factorization data as follows:

The elements on and above the diagonal of the array contain the min(m,n)-
by-n upper trapezoidal matrix R (R is upper triangular if m≥n); the elements
below the diagonal, with the array tau, present the orthogonal matrix Q as
a product of min(m,n) elementary reflectors (see Orthogonal Factorizations).

tau Array, size at least max (1, min(m, n)). Contains scalars that define
elementary reflectors for the matrix Q in its decomposition in a product of
elementary reflectors (see Orthogonal Factorizations).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed factorization is the exact factorization of a matrix A + E, where
||E||_2 = O(ε)||A||_2.
The approximate number of floating-point operations for real flavors is

(4/3)n^3 if m = n,

(2/3)n^2(3m - n) if m > n,

(2/3)m^2(3n - m) if m < n.

The number of operations for complex flavors is 4 times greater.
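The three operation-count formulas are easy to transcribe. The C sketch below (geqrf_flops is a hypothetical name) returns the real-arithmetic estimate, which should be multiplied by 4 for complex flavors as stated above. Both the m > n and the m < n formulas reduce to (4/3)n^3 when m = n, so the estimate is continuous across the cases.

```c
/* Approximate flop count of ?geqrf / ?geqrfp for an m-by-n real matrix,
 * transcribed from the Application Notes. Multiply by 4 for complex data. */
double geqrf_flops(double m, double n)
{
    if (m > n)
        return (2.0 / 3.0) * n * n * (3.0 * m - n);
    if (m < n)
        return (2.0 / 3.0) * m * m * (3.0 * n - m);
    return (4.0 / 3.0) * n * n * n;
}
```

For a square 1000-by-1000 matrix the estimate is about 1.33e9 flops. Exchanging m and n gives the same count, because the m < n formula is the m > n formula with m and n swapped.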

To solve a set of least squares problems minimizing ||A*x - b||_2 for all columns b of a given matrix B, you
can call the following:

?geqrf (this routine) to factorize A = QR;

ormqr to compute C = Q^T*B (for real matrices);

unmqr to compute C = Q^H*B (for complex matrices);

trsm (a BLAS routine) to solve R*X = C.

(The columns of the computed X are the least squares solution vectors x.)
To compute the elements of Q explicitly, call

orgqr (for real matrices)

ungqr (for complex matrices).

See Also
mkl_progress

Matrix Storage Schemes

?geqrfp
Computes the QR factorization of a general m-by-n
matrix with non-negative diagonal elements.

Syntax
lapack_int LAPACKE_sgeqrfp (int matrix_layout, lapack_int m, lapack_int n, float* a,
lapack_int lda, float* tau);
lapack_int LAPACKE_dgeqrfp (int matrix_layout, lapack_int m, lapack_int n, double* a,
lapack_int lda, double* tau);
lapack_int LAPACKE_cgeqrfp (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* tau);
lapack_int LAPACKE_zgeqrfp (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* tau);

Include Files
• mkl.h

Description

The routine forms the QR factorization of a general m-by-n matrix A (see Orthogonal Factorizations). No
pivoting is performed. The diagonal entries of R are real and nonnegative.
The routine does not form the matrix Q explicitly. Instead, Q is represented as a product of min(m, n)
elementary reflectors. Routines are provided to work with Q in this representation.

NOTE
This routine supports the Progress Routine feature. See Progress Function for details.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

m The number of rows in the matrix A (m≥ 0).

n The number of columns in A (n≥ 0).

a Array, size max(1,lda*n) for column major layout and max(1,lda*m) for
row major layout, containing the matrix A.

lda The leading dimension of a; at least max(1, m) for column major layout and
at least max(1, n) for row major layout.

Output Parameters

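a Overwritten by the factorization data as follows:

The elements on and above the diagonal of the array contain the min(m,n)-by-n upper trapezoidal
matrix R (R is upper triangular if m ≥ n); the elements below the diagonal, with the array tau,
present the orthogonal matrix Q as a product of min(m,n) elementary reflectors (see Orthogonal
Factorizations). The diagonal elements of the matrix R are real and non-negative.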

tau Array, size at least max (1, min(m, n)). Contains scalars that define
elementary reflectors for the matrix Q in its decomposition in a product of
elementary reflectors (see Orthogonal Factorizations).
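The non-negative diagonal is a normalization of the QR factorization, not a different factorization: for a real factorization A = Q*R, flipping the sign of row k of R and of column k of Q leaves the product unchanged, because D*D = I for D = diag(±1). The C sketch below applies that normalization to explicitly stored factors. make_r_diag_nonneg is a hypothetical helper; it handles only real data (the complex case would scale by a unit-modulus phase instead) and does not reproduce the reflector representation that ?geqrfp returns.

```c
#include <stddef.h>

/* Q is m-by-k and R is k-by-n (k = min(m, n)), both column-major.
 * For every negative R(d,d), negate row d of R and column d of Q. */
void make_r_diag_nonneg(size_t m, size_t n, double *q, size_t ldq,
                        double *r, size_t ldr)
{
    size_t k = m < n ? m : n;
    for (size_t d = 0; d < k; ++d) {
        if (r[d + d * ldr] >= 0.0)
            continue;
        for (size_t j = d; j < n; ++j)   /* row d of R (zero left of the diagonal) */
            r[d + j * ldr] = -r[d + j * ldr];
        for (size_t i = 0; i < m; ++i)   /* column d of Q */
            q[i + d * ldq] = -q[i + d * ldq];
    }
}
```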

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed factorization is the exact factorization of a matrix A + E, where
||E||_2 = O(ε)||A||_2.
The approximate number of floating-point operations for real flavors is

(4/3)n^3 if m = n,

(2/3)n^2(3m - n) if m > n,

(2/3)m^2(3n - m) if m < n.

The number of operations for complex flavors is 4 times greater.

To solve a set of least squares problems minimizing ||A*x - b||_2 for all columns b of a given matrix B, you
can call the following:

?geqrfp (this routine) to factorize A = QR;

ormqr to compute C = Q^T*B (for real matrices);

unmqr to compute C = Q^H*B (for complex matrices);

trsm (a BLAS routine) to solve R*X = C.

(The columns of the computed X are the least squares solution vectors x.)
To compute the elements of Q explicitly, call

orgqr (for real matrices)

ungqr (for complex matrices).

See Also
mkl_progress

Matrix Storage Schemes

?geqrt
Computes a blocked QR factorization of a general real
or complex matrix using the compact WY
representation of Q.

Syntax
lapack_int LAPACKE_sgeqrt (int matrix_layout, lapack_int m, lapack_int n, lapack_int
nb, float* a, lapack_int lda, float* t, lapack_int ldt);
lapack_int LAPACKE_dgeqrt (int matrix_layout, lapack_int m, lapack_int n, lapack_int
nb, double* a, lapack_int lda, double* t, lapack_int ldt);
lapack_int LAPACKE_cgeqrt (int matrix_layout, lapack_int m, lapack_int n, lapack_int
nb, lapack_complex_float* a, lapack_int lda, lapack_complex_float* t, lapack_int ldt);
lapack_int LAPACKE_zgeqrt (int matrix_layout, lapack_int m, lapack_int n, lapack_int
nb, lapack_complex_double* a, lapack_int lda, lapack_complex_double* t, lapack_int
ldt);

Include Files
• mkl.h

Description

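The routine computes a blocked QR factorization of a general m-by-n matrix A using the compact WY
representation of Q.
The unit lower trapezoidal matrix V stores the vectors that define the elementary reflectors H(i): the
vector for H(i) is in the i-th column, below the diagonal. For example, if m=5 and n=3, the matrix V is

V = \begin{pmatrix} 1 & & \\ v_1 & 1 & \\ v_1 & v_2 & 1 \\ v_1 & v_2 & v_3 \\ v_1 & v_2 & v_3 \end{pmatrix},

where v_i represents one of the vectors that define H(i). The vectors are returned in the lower triangular
part of array a.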

NOTE
The 1s along the diagonal of V are not stored in a.

Let k = min(m,n). The number of blocks is b = ceiling(k/nb), where each block is of order nb except for
the last block, which is of order ib = k - (b-1)*nb. For each of the b blocks, an upper triangular block
reflector factor is computed: t_1, t_2, ..., t_b. The nb-by-nb (and ib-by-ib for the last block) factors t_i
are stored in the nb-by-k array t as

t = (t_1 t_2 ... t_b).
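The block bookkeeping above can be checked with a few lines of integer arithmetic. In this C sketch, geqrt_blocks, geqrt_block_layout, and geqrt_block_start are hypothetical names, and nb ≥ 1 is assumed, as the constraint on nb requires. The ceiling is computed as (k + nb - 1)/nb, and block p begins at the same column offset in a and in t because the factors are concatenated in order.

```c
/* Block bookkeeping for ?geqrt: k = min(m, n), b = ceiling(k/nb) blocks,
 * the last of order ib = k - (b-1)*nb. Assumes nb >= 1. */
typedef struct {
    int k;           /* number of elementary reflectors, min(m, n) */
    int nblocks;     /* b                                          */
    int last_order;  /* ib (0 when k == 0)                         */
} geqrt_blocks;

geqrt_blocks geqrt_block_layout(int m, int n, int nb)
{
    geqrt_blocks r;
    r.k = m < n ? m : n;
    r.nblocks = (r.k + nb - 1) / nb;          /* integer ceiling of k/nb */
    r.last_order = r.nblocks > 0 ? r.k - (r.nblocks - 1) * nb : 0;
    return r;
}

/* 0-based first column (in a and in t) of block p, for p = 1, ..., b. */
int geqrt_block_start(int p, int nb)
{
    return (p - 1) * nb;
}
```

For example, m = 10, n = 12, and nb = 4 give k = 10, three blocks, and a last block of order 2 starting at 0-based column 8.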

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

m The number of rows in the matrix A (m ≥ 0).

n The number of columns in A (n ≥ 0).

nb The block size to be used in the blocked QR (min(m, n) ≥ nb ≥ 1).

a Array a of size max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout contains the m-by-n matrix A.

lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.

ldt The leading dimension of t; at least nb for column major layout and max(1,
min(m, n)) for row major layout.

Output Parameters

a Overwritten by the factorization data as follows:

The elements on and above the diagonal of the array contain the min(m,n)-
by-n upper trapezoidal matrix R (R is upper triangular if m≥n); the elements
below the diagonal, with the array t, present the orthogonal matrix Q as a
product of min(m,n) elementary reflectors (see Orthogonal Factorizations).

t Array, size max(1, ldt*min(m, n)) for column major layout and max(1,
ldt*nb) for row major layout.
The upper triangular block reflector's factors stored as a sequence of upper
triangular blocks.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info < 0 and info = -i, the i-th parameter had an illegal value.


?gemqrt
Multiplies a general matrix by the orthogonal/unitary
matrix Q of the QR factorization formed by ?geqrt.

Syntax
lapack_int LAPACKE_sgemqrt (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, lapack_int nb, const float* v, lapack_int ldv, const float*
t, lapack_int ldt, float* c, lapack_int ldc);
lapack_int LAPACKE_dgemqrt (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, lapack_int nb, const double* v, lapack_int ldv, const
double* t, lapack_int ldt, double* c, lapack_int ldc);
lapack_int LAPACKE_cgemqrt (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, lapack_int nb, const lapack_complex_float* v, lapack_int
ldv, const lapack_complex_float* t, lapack_int ldt, lapack_complex_float* c, lapack_int
ldc);
lapack_int LAPACKE_zgemqrt (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, lapack_int nb, const lapack_complex_double* v, lapack_int
ldv, const lapack_complex_double* t, lapack_int ldt, lapack_complex_double* c,
lapack_int ldc);

Include Files
• mkl.h

Description
The ?gemqrt routine overwrites the general real or complex m-by-n matrix C with

               side = 'L'    side = 'R'
trans = 'N':   Q*C           C*Q
trans = 'T':   Q^T*C         C*Q^T
trans = 'C':   Q^H*C         C*Q^H

where Q is a real orthogonal (complex unitary) matrix defined as the product of k elementary reflectors
Q = H(1) H(2)... H(k) = I - V*T*V^T for real flavors, and

Q = H(1) H(2)... H(k) = I - V*T*V^H for complex flavors,

generated using the compact WY representation as returned by geqrt. Q is of order m if side = 'L' and of
order n if side = 'R'.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

side = 'L': apply Q, Q^T, or Q^H from the left.

= 'R': apply Q, Q^T, or Q^H from the right.

trans = 'N', no transpose, apply Q.

= 'T', transpose, apply Q^T.
= 'C', conjugate transpose, apply Q^H.

m The number of rows in the matrix C, (m ≥ 0).

n The number of columns in the matrix C, (n ≥ 0).

k The number of elementary reflectors whose product defines the matrix Q.

Constraints:
If side = 'L', m ≥ k≥0

If side = 'R', n ≥ k≥0.

nb The block size used for the storage of t, k ≥ nb ≥ 1. This must be the same
value of nb used to generate t in geqrt.

v Array of size max(1, ldv*k) for column major layout, max(1, ldv*m) for
row major layout and side = 'L', and max(1, ldv*n) for row major layout
and side = 'R'.

The i-th column must contain the vector which defines the elementary
reflector H(i), for i = 1,2,...,k, as returned by geqrt in the first k columns of
its array argument a.

ldv The leading dimension of the array v.

if side = 'L', ldv must be at least max(1,m) for column major layout and
max(1, k) for row major layout;

if side = 'R', ldv must be at least max(1,n) for column major layout and
max(1, k) for row major layout.

t Array, size max(1, ldt*k) for column major layout and max(1,
ldt*nb) for row major layout.
The upper triangular factors of the block reflectors as returned by geqrt.

ldt The leading dimension of the array t. ldt must be at least nb for column
major layout and max(1, k) for row major layout.

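c Array of size max(1, ldc*n) for column major layout and max(1, ldc*m) for row major layout; contains the m-by-n matrix C.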

ldc The leading dimension of the array c. ldc must be at least max(1, m) for
column major layout and max(1, n) for row major layout.

Output Parameters

c Overwritten by the product Q*C, C*Q, Q^T*C, C*Q^T, Q^H*C, or C*Q^H, as
specified by side and trans.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

?geqpf
Computes the QR factorization of a general m-by-n
matrix with pivoting.

Syntax
lapack_int LAPACKE_sgeqpf (int matrix_layout, lapack_int m, lapack_int n, float* a,
lapack_int lda, lapack_int* jpvt, float* tau);
lapack_int LAPACKE_dgeqpf (int matrix_layout, lapack_int m, lapack_int n, double* a,
lapack_int lda, lapack_int* jpvt, double* tau);
lapack_int LAPACKE_cgeqpf (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_int* jpvt, lapack_complex_float* tau);
lapack_int LAPACKE_zgeqpf (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_int* jpvt, lapack_complex_double*
tau);

Include Files
• mkl.h

Description
The routine is deprecated and has been replaced by the routine geqp3.
The routine ?geqpf forms the QR factorization of a general m-by-n matrix A with column pivoting: A*P =
Q*R (see Orthogonal Factorizations). Here P denotes an n-by-n permutation matrix.
The routine does not form the matrix Q explicitly. Instead, Q is represented as a product of min(m, n)
elementary reflectors. Routines are provided to work with Q in this representation.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

m The number of rows in the matrix A (m≥ 0).

n The number of columns in A (n≥ 0).

a Array a of size max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout contains the matrix A.

lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.

jpvt Array, size at least max(1, n).

On entry, if jpvt[i - 1] > 0, the i-th column of A is moved to the
beginning of A*P before the computation, and fixed in place during the
computation.
If jpvt[i - 1] = 0, the i-th column of A is a free column (that is, it may
be interchanged during the computation with any other free column).

Output Parameters

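a Overwritten by the factorization data as follows:

The elements on and above the diagonal of the array contain the min(m,n)-by-n upper trapezoidal matrix R
(R is upper triangular if m ≥ n); the elements below the diagonal, with the array tau, represent the
orthogonal (unitary) matrix Q as a product of min(m,n) elementary reflectors.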

tau Array, size at least max(1, min(m, n)). Contains scalar factors of the
elementary reflectors for the matrix Q.

jpvt Overwritten by details of the permutation matrix P in the factorization A*P = Q*R. More precisely,
the columns of A*P are the columns of A in the following order:
jpvt[0], jpvt[1], ..., jpvt[n - 1].
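As a hedged illustration of how jpvt describes P, the plain-C sketch below (hypothetical helper form_ap, not part of MKL) builds A*P column by column. It assumes column-major storage, plain int in place of lapack_int, and that the entries of jpvt are 1-based column numbers, as in the underlying Fortran LAPACK interface.

```c
#include <assert.h>

/* Hypothetical helper: copy the columns of the column-major m-by-n matrix A
 * into AP in the order given by jpvt, so that AP = A*P.
 * Assumption: jpvt[j] is a 1-based column number of A. */
static void form_ap(int m, int n, const double *a, int lda,
                    const int *jpvt, double *ap, int ldap)
{
    for (int j = 0; j < n; ++j) {
        int src = jpvt[j] - 1;            /* convert to a 0-based column index */
        assert(src >= 0 && src < n);
        for (int i = 0; i < m; ++i)
            ap[i + j * ldap] = a[i + src * lda];
    }
}
```

For example, with n = 3 and jpvt = {3, 1, 2}, the first column of A*P is the third column of A.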

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed factorization is the exact factorization of a matrix A + E, where

||E||_2 = O(ε)*||A||_2.
The approximate number of floating-point operations for real flavors is

(4/3)n^3 if m = n,

(2/3)n^2(3m - n) if m > n,

(2/3)m^2(3n - m) if m < n.

The number of operations for complex flavors is 4 times greater.
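The flop-count formulas above can be evaluated with a small planning helper such as the hypothetical qr_flops below; it does not query MKL. The m = n case agrees with both rectangular formulas, since (2/3)n^2(3n - n) = (4/3)n^3.

```c
#include <assert.h>

/* Hypothetical helper: approximate flop count for the QR factorization of an
 * m-by-n matrix, using the formulas above. Complex flavors cost 4 times more. */
static double qr_flops(double m, double n, int is_complex)
{
    assert(m >= 0 && n >= 0);
    double f;
    if (m == n)
        f = 4.0 / 3.0 * n * n * n;
    else if (m > n)
        f = 2.0 / 3.0 * n * n * (3.0 * m - n);
    else
        f = 2.0 / 3.0 * m * m * (3.0 * n - m);
    return is_complex ? 4.0 * f : f;
}
```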

To solve a set of least squares problems minimizing ||A*x - b||_2 for all columns b of a given matrix B, you
can call the following:

?geqpf (this routine) to factorize A*P = Q*R;

ormqr to compute C = Q^T*B (for real matrices);

unmqr to compute C = Q^H*B (for complex matrices);

trsm (a BLAS routine) to solve R*X = C.

(The columns of the computed X are the permuted least squares solution vectors x; the output array jpvt
specifies the permutation order.)
To compute the elements of Q explicitly, call

orgqr (for real matrices)

ungqr (for complex matrices).

?geqp3
Computes the QR factorization of a general m-by-n
matrix with column pivoting using level 3 BLAS.

Syntax
lapack_int LAPACKE_sgeqp3 (int matrix_layout, lapack_int m, lapack_int n, float* a,
lapack_int lda, lapack_int* jpvt, float* tau);

lapack_int LAPACKE_dgeqp3 (int matrix_layout, lapack_int m, lapack_int n, double* a,
lapack_int lda, lapack_int* jpvt, double* tau);
lapack_int LAPACKE_cgeqp3 (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_int* jpvt, lapack_complex_float* tau);
lapack_int LAPACKE_zgeqp3 (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_int* jpvt, lapack_complex_double*
tau);

Include Files
• mkl.h

Description

The routine forms the QR factorization of a general m-by-n matrix A with column pivoting: A*P = Q*R (see
Orthogonal Factorizations) using Level 3 BLAS. Here P denotes an n-by-n permutation matrix. Use this
routine instead of geqpf for better performance.
The routine does not form the matrix Q explicitly. Instead, Q is represented as a product of min(m, n)
elementary reflectors. Routines are provided to work with Q in this representation.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

m The number of rows in the matrix A (m≥ 0).

n The number of columns in A (n≥ 0).

a Array a of size max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout contains the matrix A.

lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.

jpvt Array, size at least max(1, n).

On entry, if jpvt[i - 1] ≠ 0, the i-th column of A is moved to the
beginning of A*P before the computation, and fixed in place during the
computation.
If jpvt[i - 1] = 0, the i-th column of A is a free column (that is, it may
be interchanged during the computation with any other free column).

Output Parameters

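a Overwritten by the factorization data as follows:

The elements on and above the diagonal of the array contain the min(m,n)-by-n upper trapezoidal matrix R
(R is upper triangular if m ≥ n); the elements below the diagonal, with the array tau, represent the
orthogonal (unitary) matrix Q as a product of min(m,n) elementary reflectors.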

tau Array, size at least max (1, min(m, n)). Contains scalar factors of the
elementary reflectors for the matrix Q.

jpvt Overwritten by details of the permutation matrix P in the factorization A*P = Q*R. More precisely,
the columns of A*P are the columns of A in the following order:
jpvt[0], jpvt[1], ..., jpvt[n - 1].

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
To solve a set of least squares problems minimizing ||A*x - b||_2 for all columns b of a given matrix B, you
can call the following:

?geqp3 (this routine) to factorize A*P = Q*R;

ormqr to compute C = Q^T*B (for real matrices);

unmqr to compute C = Q^H*B (for complex matrices);

trsm (a BLAS routine) to solve R*X = C.

(The columns of the computed X are the permuted least squares solution vectors x; the output array jpvt
specifies the permutation order.)
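The final un-permutation step mentioned above can be sketched as follows. Because column i of A*P is column jpvt[i] of A, the solution y of the permuted problem min ||(A*P)*y - b||_2 maps back to x = P*y by a scatter. The helper name is hypothetical, int stands in for lapack_int, and jpvt is assumed to hold 1-based column numbers.

```c
#include <assert.h>

/* Hypothetical helper: recover x = P*y from the solution y of the permuted
 * least squares problem. Assumes 1-based column numbers in jpvt. */
static void unpermute_solution(int n, const int *jpvt, const double *y, double *x)
{
    for (int i = 0; i < n; ++i) {
        int col = jpvt[i] - 1;            /* 0-based column of A */
        assert(col >= 0 && col < n);
        x[col] = y[i];                    /* (A*P)*y == A*x */
    }
}
```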
To compute the elements of Q explicitly, call

orgqr (for real matrices)

ungqr (for complex matrices).

?orgqr
Generates the real orthogonal matrix Q of the QR
factorization formed by ?geqrf.

Syntax
lapack_int LAPACKE_sorgqr (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
float* a, lapack_int lda, const float* tau);
lapack_int LAPACKE_dorgqr (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
double* a, lapack_int lda, const double* tau);

Include Files
• mkl.h

Description

The routine generates the whole or part of the m-by-m orthogonal matrix Q of the QR factorization formed by
the routine ?geqrf or ?geqpf. Use this routine after a call to sgeqrf/dgeqrf or sgeqpf/dgeqpf.

Usually Q is determined from the QR factorization of an m-by-p matrix A with m ≥ p. To compute the whole
matrix Q, use:

LAPACKE_?orgqr(matrix_layout, m, m, p, a, lda, tau)

To compute the leading p columns of Q (which form an orthonormal basis in the space spanned by the
columns of A):

LAPACKE_?orgqr(matrix_layout, m, p, p, a, lda, tau)

To compute the matrix Q_k of the QR factorization of the leading k columns of the matrix A:

LAPACKE_?orgqr(matrix_layout, m, m, k, a, lda, tau)

To compute the leading k columns of Q_k (which form an orthonormal basis in the space spanned by the leading k
columns of the matrix A):

LAPACKE_?orgqr(matrix_layout, m, k, k, a, lda, tau)
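To make the calls above concrete, here is an unblocked plain-C sketch (hypothetical helper form_q, not the library's implementation) of what ?orgqr computes: the leading n columns of Q = H(1)*H(2)*...*H(k). It assumes column-major storage and the usual ?geqrf convention in which H(i) = I - tau[i-1]*v*v^T, where v has zeros above position i, an implicit 1 at position i, and its remaining entries stored below the diagonal in column i of a.

```c
#include <assert.h>

/* Hypothetical sketch: q (m-by-n, column-major) := leading n columns of
 * Q = H(1)*...*H(k), with reflectors stored ?geqrf-style in a and tau. */
static void form_q(int m, int n, int k, const double *a, int lda,
                   const double *tau, double *q, int ldq)
{
    assert(0 <= k && k <= n && n <= m && lda >= m && ldq >= m);
    /* Start from the first n columns of the identity. */
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i)
            q[i + j * ldq] = (i == j) ? 1.0 : 0.0;
    /* Q*I = H(1)*(...*(H(k)*I)): apply H(k) first and H(1) last.
     * Reflector p (0-based) is zero above row p and 1 at row p. */
    for (int p = k - 1; p >= 0; --p)
        for (int j = 0; j < n; ++j) {
            double s = q[p + j * ldq];            /* v(p) = 1 implicitly */
            for (int i = p + 1; i < m; ++i)
                s += a[i + p * lda] * q[i + j * ldq];
            s *= tau[p];
            q[p + j * ldq] -= s;
            for (int i = p + 1; i < m; ++i)
                q[i + j * ldq] -= s * a[i + p * lda];
        }
}
```

With m = 3, k = 1, a stored tail (1, 1) and tau = 2/3, the first column of Q is (1/3, -2/3, -2/3).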

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

m The order of the orthogonal matrix Q (m≥ 0).

n The number of columns of Q to be computed (0 ≤ n ≤ m).

k The number of elementary reflectors whose product defines the matrix Q (0 ≤ k ≤ n).

a, tau Arrays:
a and tau are the arrays returned by sgeqrf / dgeqrf or sgeqpf / dgeqpf.

The size of a is max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout.
The size of tau must be at least max(1, k).

lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.

Output Parameters

a Overwritten by n leading columns of the m-by-m orthogonal matrix Q.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed Q differs from an exactly orthogonal matrix by a matrix E such that
||E||_2 = O(ε)*||A||_2, where ε is the machine precision.
The total number of floating-point operations is approximately 4*m*n*k - 2*(m + n)*k^2 + (4/3)*k^3.

If n = k, the number is approximately (2/3)*n^2*(3m - n).
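A quick numerical check of the count above (hypothetical helper, plain C): with n = k the general expression reduces to (2/3)*n^2*(3m - n). The complex variant uses the coefficients given for ?ungqr, 16*m*n*k - 8*(m + n)*k^2 + (16/3)*k^3, which are exactly four times the real ones.

```c
#include <assert.h>

/* Hypothetical helper: approximate flop count of ?orgqr (real) or ?ungqr
 * (complex) when generating n columns of an m-by-m Q from k reflectors. */
static double gen_q_flops(double m, double n, double k, int is_complex)
{
    assert(0 <= k && k <= n && n <= m);
    double f = 4.0 * m * n * k - 2.0 * (m + n) * k * k + 4.0 / 3.0 * k * k * k;
    return is_complex ? 4.0 * f : f;
}
```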

The complex counterpart of this routine is ungqr.

?ormqr
Multiplies a real matrix by the orthogonal matrix Q of
the QR factorization formed by ?geqrf or ?geqpf.

Syntax
lapack_int LAPACKE_sormqr (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const float* a, lapack_int lda, const float* tau, float* c,
lapack_int ldc);
lapack_int LAPACKE_dormqr (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const double* a, lapack_int lda, const double* tau, double*
c, lapack_int ldc);

Include Files
• mkl.h

Description

The routine multiplies a real matrix C by Q or Q^T, where Q is the orthogonal matrix Q of the QR factorization
formed by the routine ?geqrf or ?geqpf.

Depending on the parameters side and trans, the routine can form one of the matrix products
Q*C, Q^T*C, C*Q, or C*Q^T (overwriting the result on C).
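The following plain-C sketch (hypothetical helper apply_q_left; unblocked, side = 'L' only, column-major) illustrates the kind of computation ?ormqr performs: Q*C or Q^T*C is formed directly from the ?geqrf-style reflectors without ever building Q. Since each H(i) is symmetric, Q*C applies H(k) first, while Q^T*C applies H(1) first.

```c
#include <assert.h>

/* Hypothetical sketch: C (m-by-n, column-major) := Q*C (trans = 'N') or
 * Q^T*C (trans = 'T'), where Q = H(1)*...*H(k) is stored ?geqrf-style:
 * reflector tails below the diagonal of a, unit entries implicit, scalars in tau. */
static void apply_q_left(char trans, int m, int n, int k,
                         const double *a, int lda, const double *tau,
                         double *c, int ldc)
{
    assert(trans == 'N' || trans == 'T');
    assert(0 <= k && k <= m && lda >= m && ldc >= m);
    for (int step = 0; step < k; ++step) {
        int p = (trans == 'N') ? k - 1 - step : step;   /* reflector order */
        for (int j = 0; j < n; ++j) {
            double s = c[p + j * ldc];                  /* v(p) = 1 implicitly */
            for (int i = p + 1; i < m; ++i)
                s += a[i + p * lda] * c[i + j * ldc];
            s *= tau[p];
            c[p + j * ldc] -= s;
            for (int i = p + 1; i < m; ++i)
                c[i + j * ldc] -= s * a[i + p * lda];
        }
    }
}
```

Applying trans = 'N' and then trans = 'T' to the same C returns C, because Q is orthogonal.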

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

side Must be either 'L' or 'R'.

If side='L', Q or Q^T is applied to C from the left.

If side='R', Q or Q^T is applied to C from the right.

trans Must be either 'N' or 'T'.

If trans='N', the routine multiplies C by Q.

If trans='T', the routine multiplies C by Q^T.

m The number of rows in the matrix C (m≥ 0).

n The number of columns in C (n≥ 0).

k The number of elementary reflectors whose product defines the matrix Q.

Constraints:
0 ≤k≤m if side='L';
0 ≤k≤n if side='R'.

a, tau, c Arrays:
a and tau are the arrays returned by sgeqrf / dgeqrf or sgeqpf / dgeqpf.

The size of a is max(1, lda*k) for column major layout, max(1, lda*m) for
row major layout and side = 'L', and max(1, lda*n) for row major layout
and side = 'R'.

The size of tau must be at least max(1, k).

Array c of size max(1, ldc*n) for column major layout and max(1, ldc*m)
for row major layout contains the m-by-n matrix C.

lda The leading dimension of a. Constraints:

if side = 'L', lda≥ max(1, m) for column major layout and max(1, k) for
row major layout;
if side = 'R', lda≥ max(1, n) for column major layout and max(1, k) for
row major layout.

ldc The leading dimension of c. Constraint:

ldc≥ max(1, m) for column major layout and max(1, n) for row major layout.

Output Parameters

c Overwritten by the product Q*C, Q^T*C, C*Q, or C*Q^T (as specified by side
and trans).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The complex counterpart of this routine is unmqr.

?ungqr
Generates the complex unitary matrix Q of the QR
factorization formed by ?geqrf.

Syntax
lapack_int LAPACKE_cungqr (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
lapack_complex_float* a, lapack_int lda, const lapack_complex_float* tau);
lapack_int LAPACKE_zungqr (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
lapack_complex_double* a, lapack_int lda, const lapack_complex_double* tau);

Include Files
• mkl.h

Description

The routine generates the whole or part of the m-by-m unitary matrix Q of the QR factorization formed by the
routine ?geqrf or ?geqpf. Use this routine after a call to cgeqrf/zgeqrf or cgeqpf/zgeqpf.

Usually Q is determined from the QR factorization of an m-by-p matrix A with m ≥ p. To compute the whole
matrix Q, use:

LAPACKE_?ungqr(matrix_layout, m, m, p, a, lda, tau)

To compute the leading p columns of Q (which form an orthonormal basis in the space spanned by the
columns of A):

LAPACKE_?ungqr(matrix_layout, m, p, p, a, lda, tau)

To compute the matrix Q_k of the QR factorization of the leading k columns of the matrix A:

LAPACKE_?ungqr(matrix_layout, m, m, k, a, lda, tau)

To compute the leading k columns of Q_k (which form an orthonormal basis in the space spanned by the
leading k columns of the matrix A):

LAPACKE_?ungqr(matrix_layout, m, k, k, a, lda, tau)

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

m The order of the unitary matrix Q (m≥ 0).

n The number of columns of Q to be computed (0 ≤ n ≤ m).

k The number of elementary reflectors whose product defines the matrix Q (0 ≤ k ≤ n).

a, tau Arrays: a and tau are the arrays returned by cgeqrf/zgeqrf or cgeqpf/zgeqpf.
The size of a is max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout.
The size of tau must be at least max(1, k).

lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.

Output Parameters

a Overwritten by n leading columns of the m-by-m unitary matrix Q.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed Q differs from an exactly unitary matrix by a matrix E such that ||E||_2 = O(ε)*||A||_2,
where ε is the machine precision.
The total number of floating-point operations is approximately 16*m*n*k - 8*(m + n)*k^2 + (16/3)*k^3.

If n = k, the number is approximately (8/3)n^2(3m - n).

The real counterpart of this routine is orgqr.

?unmqr
Multiplies a complex matrix by the unitary matrix Q of
the QR factorization formed by ?geqrf.

Syntax
lapack_int LAPACKE_cunmqr (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const lapack_complex_float* a, lapack_int lda, const
lapack_complex_float* tau, lapack_complex_float* c, lapack_int ldc);
lapack_int LAPACKE_zunmqr (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const lapack_complex_double* a, lapack_int lda, const
lapack_complex_double* tau, lapack_complex_double* c, lapack_int ldc);

Include Files
• mkl.h

Description

The routine multiplies a rectangular complex matrix C by Q or Q^H, where Q is the unitary matrix Q of the QR
factorization formed by the routine ?geqrf or ?geqpf.

Depending on the parameters side and trans, the routine can form one of the matrix products Q*C, Q^H*C,
C*Q, or C*Q^H (overwriting the result on C).
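For complex flavors each reflector is H(i) = I - tau*v*v^H with a possibly complex tau, so H(i) is not Hermitian in general and applying Q^H must use conj(tau). The C99 sketch below (hypothetical helper apply_qc_left; one column of C, side = 'L', column-major, double complex in place of lapack_complex_double) shows this; it is not the MKL implementation.

```c
#include <assert.h>
#include <complex.h>

/* Hypothetical sketch: c (length m) := Q*c (trans = 'N') or Q^H*c
 * (trans = 'C'), with Q = H(1)*...*H(k), H(p) = I - tau[p]*v*v^H stored
 * ?geqrf-style (tails below the diagonal of a, unit entries implicit).
 * H(p)^H = I - conj(tau[p])*v*v^H. */
static void apply_qc_left(char trans, int m, int k, const double complex *a,
                          int lda, const double complex *tau, double complex *c)
{
    assert(trans == 'N' || trans == 'C');
    assert(0 <= k && k <= m && lda >= m);
    for (int step = 0; step < k; ++step) {
        int p = (trans == 'N') ? k - 1 - step : step;
        double complex t = (trans == 'N') ? tau[p] : conj(tau[p]);
        double complex s = c[p];                 /* s = v^H * c, v(p) = 1 */
        for (int i = p + 1; i < m; ++i)
            s += conj(a[i + p * lda]) * c[i];
        s *= t;
        c[p] -= s;
        for (int i = p + 1; i < m; ++i)
            c[i] -= s * a[i + p * lda];
    }
}
```

With v = (1, 1) and tau = 0.5 + 0.5i, H is unitary, and applying trans = 'N' followed by trans = 'C' returns the original vector; using tau in both steps would not.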

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

side Must be either 'L' or 'R'.

If side = 'L', Q or Q^H is applied to C from the left.

If side = 'R', Q or Q^H is applied to C from the right.

trans Must be either 'N' or 'C'.

If trans = 'N', the routine multiplies C by Q.

If trans = 'C', the routine multiplies C by Q^H.

m The number of rows in the matrix C (m≥ 0).

n The number of columns in C (n≥ 0).

k The number of elementary reflectors whose product defines the matrix Q.

Constraints:
0 ≤k≤m if side = 'L';
0 ≤k≤n if side = 'R'.

a, c, tau Arrays:

a and tau are the arrays returned by cgeqrf / zgeqrf or cgeqpf / zgeqpf.
The size of a is max(1, lda*k) for column major layout, max(1, lda*m) for row
major layout when side = 'L', and max(1, lda*n) for row major layout when side = 'R'.
The size of tau must be at least max(1, k).
Array c of size max(1, ldc*n) for column major layout and max(1, ldc*m) for row
major layout contains the m-by-n matrix C.

lda The leading dimension of a. Constraints:

lda≥ max(1, m) for column major layout and lda≥ max(1, k) for row
major layout if side = 'L';

lda≥ max(1, n) for column major layout and lda≥ max(1, k) for row
major layout if side = 'R'.

ldc The leading dimension of c. Constraint:

ldc≥ max(1, m) for column major layout and max(1, n) for row major
layout.

Output Parameters

c Overwritten by the product Q*C, Q^H*C, C*Q, or C*Q^H (as specified by side
and trans).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The real counterpart of this routine is ormqr.

?gelqf
Computes the LQ factorization of a general m-by-n
matrix.

Syntax
lapack_int LAPACKE_sgelqf (int matrix_layout, lapack_int m, lapack_int n, float* a,
lapack_int lda, float* tau);
lapack_int LAPACKE_dgelqf (int matrix_layout, lapack_int m, lapack_int n, double* a,
lapack_int lda, double* tau);
lapack_int LAPACKE_cgelqf (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* tau);
lapack_int LAPACKE_zgelqf (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* tau);

Include Files
• mkl.h

Description

The routine forms the LQ factorization of a general m-by-n matrix A (see Orthogonal Factorizations). No
pivoting is performed.
The routine does not form the matrix Q explicitly. Instead, Q is represented as a product of min(m, n)
elementary reflectors. Routines are provided to work with Q in this representation.

NOTE
This routine supports the Progress Routine feature. See Progress Function for details.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

m The number of rows in the matrix A (m≥ 0).

n The number of columns in A (n≥ 0).

a Array a of size max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout contains the matrix A.

lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.

Output Parameters

a Overwritten by the factorization data as follows:

The elements on and below the diagonal of the array contain the m-by-
min(m,n) lower trapezoidal matrix L (L is lower triangular if m≤n); the
elements above the diagonal, with the array tau, represent the orthogonal
matrix Q as a product of elementary reflectors.

tau Array, size at least max(1, min(m, n)).

Contains scalars that define elementary reflectors for the matrix Q (see
Orthogonal Factorizations).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed factorization is the exact factorization of a matrix A + E, where
||E||_2 = O(ε)*||A||_2.
The approximate number of floating-point operations for real flavors is

(4/3)n^3 if m = n,

(2/3)n^2(3m - n) if m > n,

(2/3)m^2(3n - m) if m < n.

The number of operations for complex flavors is 4 times greater.

To find the minimum-norm solution of an underdetermined least squares problem minimizing ||A*x - b||_2
for all columns b of a given matrix B, you can call the following:

?gelqf (this routine) to factorize A = L*Q;

trsm (a BLAS routine) to solve L*Y = B for Y;

ormlq to compute X = (Q_1)^T*Y (for real matrices);

unmlq to compute X = (Q_1)^H*Y (for complex matrices).

(The columns of the computed X are the minimum-norm solution vectors x. Here A is an m-by-n matrix with
m < n; Q_1 denotes the first m rows of Q.)
To compute the elements of Q explicitly, call

orglq (for real matrices)

unglq (for complex matrices).

See Also
mkl_progress

Matrix Storage Schemes

?orglq
Generates the real orthogonal matrix Q of the LQ
factorization formed by ?gelqf.

Syntax
lapack_int LAPACKE_sorglq (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
float* a, lapack_int lda, const float* tau);
lapack_int LAPACKE_dorglq (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
double* a, lapack_int lda, const double* tau);

Include Files
• mkl.h

Description

The routine generates the whole or part of the n-by-n orthogonal matrix Q of the LQ factorization formed by the
routine gelqf. Use this routine after a call to sgelqf/dgelqf.

Usually Q is determined from the LQ factorization of a p-by-n matrix A with n ≥ p. To compute the whole
matrix Q, use:

info = LAPACKE_?orglq(matrix_layout, n, n, p, a, lda, tau)

To compute the leading p rows of Q, which form an orthonormal basis in the space spanned by the rows of A,
use:

info = LAPACKE_?orglq(matrix_layout, p, n, p, a, lda, tau)

To compute the matrix Q_k of the LQ factorization of the leading k rows of A, use:

info = LAPACKE_?orglq(matrix_layout, n, n, k, a, lda, tau)

To compute the leading k rows of Q_k, which form an orthonormal basis in the space spanned by the leading k
rows of A, use:

info = LAPACKE_?orglq(matrix_layout, k, n, k, a, lda, tau)

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

m The number of rows of Q to be computed (0 ≤ m ≤ n).

n The order of the orthogonal matrix Q (n ≥ m).

k The number of elementary reflectors whose product defines the matrix Q (0 ≤ k ≤ m).

a, tau Arrays: a (size max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout) and tau are the arrays returned by sgelqf/dgelqf.

The size of tau must be at least max(1, k).

lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.

Output Parameters

a Overwritten by m leading rows of the n-by-n orthogonal matrix Q.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed Q differs from an exactly orthogonal matrix by a matrix E such that ||E||_2 = O(ε)*||A||_2,
where ε is the machine precision.
The total number of floating-point operations is approximately 4*m*n*k - 2*(m + n)*k^2 + (4/3)*k^3.

If m = k, the number is approximately (2/3)m^2(3n - m).

The complex counterpart of this routine is unglq.

?ormlq
Multiplies a real matrix by the orthogonal matrix Q of
the LQ factorization formed by ?gelqf.

Syntax
lapack_int LAPACKE_sormlq (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const float* a, lapack_int lda, const float* tau, float* c,
lapack_int ldc);
lapack_int LAPACKE_dormlq (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const double* a, lapack_int lda, const double* tau, double*
c, lapack_int ldc);

Include Files
• mkl.h

Description

The routine multiplies a real m-by-n matrix C by Q or Q^T, where Q is the orthogonal matrix Q of the LQ
factorization formed by the routine gelqf.
Depending on the parameters side and trans, the routine can form one of the matrix products Q*C, Q^T*C,
C*Q, or C*Q^T (overwriting the result on C).

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

side Must be either 'L' or 'R'.

If side = 'L', Q or Q^T is applied to C from the left.

If side = 'R', Q or Q^T is applied to C from the right.

trans Must be either 'N' or 'T'.

If trans = 'N', the routine multiplies C by Q.

If trans = 'T', the routine multiplies C by Q^T.

m The number of rows in the matrix C (m≥ 0).

n The number of columns in C (n≥ 0).

k The number of elementary reflectors whose product defines the matrix Q.

Constraints:
0 ≤k≤m if side = 'L';
0 ≤k≤n if side = 'R'.

a, c, tau Arrays:
a and tau are arrays returned by ?gelqf.

The size of a must be:

For side = 'L' and column major layout, max(1, lda*m).

For side = 'R' and column major layout, max(1, lda*n).

For row major layout regardless of side, max(1, lda*k).

The dimension of tau must be at least max(1, k).

c (size max(1, ldc*n) for column major layout and max(1, ldc*m) for row
major layout) contains the m-by-n matrix C.

lda The leading dimension of a. For column major layout, lda≥ max(1, k). For
row major layout, if side = 'L', lda≥ max(1, m), or, if side = 'R', lda≥
max(1, n).

ldc The leading dimension of c; ldc≥ max(1, m) for column major layout and
max(1, n) for row major layout.
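Because the ?ormlq storage rules for a swap between layouts, a small checker can help. The sketch below (hypothetical helper; the enum values are stand-ins for LAPACK_COL_MAJOR and LAPACK_ROW_MAJOR, and int stands in for lapack_int) returns the smallest lda permitted by the rules above and the minimum size of a for that lda. The k reflectors are stored as rows, so a is logically k-by-m when side = 'L' and k-by-n when side = 'R'.

```c
#include <assert.h>

/* Stand-ins for LAPACK_COL_MAJOR / LAPACK_ROW_MAJOR (hypothetical). */
enum layout { LAYOUT_COL, LAYOUT_ROW };

static int imax(int x, int y) { return x > y ? x : y; }

/* Hypothetical helper: smallest lda and the resulting minimum number of
 * elements of a for ?ormlq / ?unmlq, following the rules above. */
static void ormlq_a_requirements(enum layout lay, char side, int m, int n, int k,
                                 int *min_lda, int *min_size)
{
    assert(side == 'L' || side == 'R');
    int order = (side == 'L') ? m : n;        /* order of Q */
    assert(k >= 0 && k <= order);
    if (lay == LAYOUT_COL) {
        *min_lda = imax(1, k);                /* column major: lda >= max(1, k) */
        *min_size = imax(1, *min_lda * order);
    } else {
        *min_lda = imax(1, order);            /* row major: lda >= max(1, m or n) */
        *min_size = imax(1, *min_lda * k);
    }
}
```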

Output Parameters

c Overwritten by the product Q*C, Q^T*C, C*Q, or C*Q^T (as specified by side
and trans).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The complex counterpart of this routine is unmlq.

?unglq
Generates the complex unitary matrix Q of the LQ
factorization formed by ?gelqf.

Syntax
lapack_int LAPACKE_cunglq (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
lapack_complex_float* a, lapack_int lda, const lapack_complex_float* tau);
lapack_int LAPACKE_zunglq (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
lapack_complex_double* a, lapack_int lda, const lapack_complex_double* tau);

Include Files
• mkl.h

Description

The routine generates the whole or part of the n-by-n unitary matrix Q of the LQ factorization formed by the
routine gelqf. Use this routine after a call to cgelqf/zgelqf.

Usually Q is determined from the LQ factorization of a p-by-n matrix A with n ≥ p. To compute the whole
matrix Q, use:

info = LAPACKE_?unglq(matrix_layout, n, n, p, a, lda, tau)

To compute the leading p rows of Q, which form an orthonormal basis in the space spanned by the rows of A,
use:

info = LAPACKE_?unglq(matrix_layout, p, n, p, a, lda, tau)

To compute the matrix Q_k of the LQ factorization of the leading k rows of A, use:

info = LAPACKE_?unglq(matrix_layout, n, n, k, a, lda, tau)

To compute the leading k rows of Q_k, which form an orthonormal basis in the space spanned by the leading k
rows of A, use:

info = LAPACKE_?unglq(matrix_layout, k, n, k, a, lda, tau)

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

m The number of rows of Q to be computed (0 ≤ m ≤ n).

n The order of the unitary matrix Q (n ≥ m).

k The number of elementary reflectors whose product defines the matrix Q (0 ≤ k ≤ m).

a, tau Arrays: a (size max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout) and tau are the arrays returned by cgelqf/zgelqf.

The dimension of tau must be at least max(1, k).

lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.

Output Parameters

a Overwritten by m leading rows of the n-by-n unitary matrix Q.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

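Application Notes
The computed Q differs from an exactly unitary matrix by a matrix E such that ||E||_2 = O(ε)*||A||_2,
where ε is the machine precision.
The total number of floating-point operations is approximately 16*m*n*k - 8*(m + n)*k^2 + (16/3)*k^3.

If m = k, the number is approximately (8/3)m^2(3n - m).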

The real counterpart of this routine is orglq.

?unmlq
Multiplies a complex matrix by the unitary matrix Q of
the LQ factorization formed by ?gelqf.

Syntax
lapack_int LAPACKE_cunmlq (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const lapack_complex_float* a, lapack_int lda, const
lapack_complex_float* tau, lapack_complex_float* c, lapack_int ldc);
lapack_int LAPACKE_zunmlq (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const lapack_complex_double* a, lapack_int lda, const
lapack_complex_double* tau, lapack_complex_double* c, lapack_int ldc);

Include Files
• mkl.h

Description

The routine multiplies a complex m-by-n matrix C by Q or Q^H, where Q is the unitary matrix Q of the LQ
factorization formed by the routine gelqf.
Depending on the parameters side and trans, the routine can form one of the matrix products Q*C, Q^H*C,
C*Q, or C*Q^H (overwriting the result on C).

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

side Must be either 'L' or 'R'.

If side = 'L', Q or Q^H is applied to C from the left.

If side = 'R', Q or Q^H is applied to C from the right.

trans Must be either 'N' or 'C'.

If trans = 'N', the routine multiplies C by Q.

If trans = 'C', the routine multiplies C by Q^H.

m The number of rows in the matrix C (m≥ 0).

n The number of columns in C (n≥ 0).

k The number of elementary reflectors whose product defines the matrix Q.

Constraints:
0 ≤k≤m if side = 'L';
0 ≤k≤n if side = 'R'.

a, c, tau Arrays:
a and tau are arrays returned by ?gelqf.

The size of a must be:

For side = 'L' and column major layout, max(1, lda*m).

For side = 'R' and column major layout, max(1, lda*n).

For row major layout regardless of side, max(1, lda*k).

The size of tau must be at least max(1, k).

c (size max(1, ldc*n) for column major layout and max(1, ldc*m) for row
major layout) contains the m-by-n matrix C.

lda The leading dimension of a. For column major layout, lda≥ max(1, k). For
row major layout, if side = 'L', lda≥ max(1, m), or, if side = 'R', lda≥
max(1, n).

ldc The leading dimension of c; ldc≥ max(1, m) for column major layout and
max(1, n) for row major layout.

Output Parameters

c Overwritten by the product Q*C, Q^H*C, C*Q, or C*Q^H (as specified by side
and trans).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The real counterpart of this routine is ormlq.

?geqlf
Computes the QL factorization of a general m-by-n
matrix.

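Syntax
lapack_int LAPACKE_sgeqlf (int matrix_layout, lapack_int m, lapack_int n, float* a,
lapack_int lda, float* tau);
lapack_int LAPACKE_dgeqlf (int matrix_layout, lapack_int m, lapack_int n, double* a,
lapack_int lda, double* tau);
lapack_int LAPACKE_cgeqlf (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* tau);
lapack_int LAPACKE_zgeqlf (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* tau);

Include Files
• mkl.h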

Description

The routine forms the QL factorization of a general m-by-n matrix A (see Orthogonal Factorizations). No
pivoting is performed.
The routine does not form the matrix Q explicitly. Instead, Q is represented as a product of min(m, n)
elementary reflectors. Routines are provided to work with Q in this representation.
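In the real case each elementary reflector has the form H = I - tau*v*v^T (see Orthogonal Factorizations), so it can be applied without ever forming H: H*C = C - tau*v*(v^T*C). The sketch below illustrates this for a column major matrix. apply_reflector_left is a hypothetical name; the MKL routines that work with Q use blocked, optimized kernels rather than this loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration, not an MKL routine: C := H*C with
 * H = I - tau*v*v^T. C is m-by-n, column major, leading dimension ldc >= m;
 * v has length m. */
static void apply_reflector_left(size_t m, size_t n, const double *v, double tau,
                                 double *c, size_t ldc)
{
    assert(ldc >= m);
    for (size_t j = 0; j < n; ++j) {
        double *col = c + j * ldc;
        double w = 0.0;               /* w = v^T * C(:, j) */
        for (size_t i = 0; i < m; ++i)
            w += v[i] * col[i];
        for (size_t i = 0; i < m; ++i)
            col[i] -= tau * v[i] * w; /* C(:, j) -= tau * v * w */
    }
}
```

With v = (1, 1) and tau = 1, H maps each column (x, y) to (-y, -x), and applying it twice restores the original matrix, since H is symmetric and orthogonal.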

NOTE
This routine supports the Progress Routine feature. See Progress Function for details.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

m The number of rows in the matrix A (m≥ 0).


n The number of columns in A (n≥ 0).

a Array a of size max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout contains the matrix A.

lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.

Output Parameters

a Overwritten on exit by the factorization data as follows:

if m≥n, the lower triangle of the subarray a(m-n+1:m, 1:n) contains the
n-by-n lower triangular matrix L; if m≤n, the elements on and below the
(n-m)-th superdiagonal contain the m-by-n lower trapezoidal matrix L; in both
cases, the remaining elements, with the array tau, represent the
orthogonal/unitary matrix Q as a product of elementary reflectors.
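Both cases above reduce to one rule: in 0-based indices, entry (i, j) of the array belongs to L exactly when i - j ≥ m - n; every other entry holds reflector data. A minimal sketch, assuming column major layout and a hypothetical helper name extract_ql_factor (not an MKL routine):

```c
#include <assert.h>

/* Hypothetical helper, not an MKL routine. Copies the factor L out of the
 * array a returned by ?geqlf (column major, lda >= m) into l, an m-by-n
 * column major array with leading dimension m, and zeros every entry that
 * holds reflector data. Entry (i, j), 0-based, belongs to L when
 * i - j >= m - n:
 *   m >= n: the lower triangle of rows m-n .. m-1 (a(m-n+1:m, 1:n), 1-based);
 *   m <= n: the entries on and below the (n-m)-th superdiagonal. */
static void extract_ql_factor(int m, int n, const double *a, int lda, double *l)
{
    assert(m >= 0 && n >= 0 && lda >= m);
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i)
            l[i + j * m] = (i - j >= m - n) ? a[i + j * lda] : 0.0;
}
```

For a 3-by-2 input (m ≥ n), the result keeps a(2,1), a(3,1), and a(3,2) in 1-based terms and zeros the first row, which leaves the 2-by-2 lower triangle of a(2:3, 1:2).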

tau Array, size at least max(1, min(m, n)). Contains scalar factors of the
elementary reflectors for the matrix Q (see Orthogonal Factorizations).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
Related routines include:

orgql to generate matrix Q (for real matrices);

ungql to generate matrix Q (for complex matrices);

ormql to apply matrix Q (for real matrices);

unmql to apply matrix Q (for complex matrices).

See Also
mkl_progress

Matrix Storage Schemes

?orgql
Generates the real matrix Q of the QL factorization
formed by ?geqlf.

Syntax
lapack_int LAPACKE_sorgql (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
float* a, lapack_int lda, const float* tau);
lapack_int LAPACKE_dorgql (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
double* a, lapack_int lda, const double* tau);

Include Files
• mkl.h

Description

The routine generates an m-by-n real matrix Q with orthonormal columns, which is defined as the last n
columns of a product of k elementary reflectors H(i) of order m: Q = H(k)*...*H(2)*H(1), as returned by
?geqlf. Use this routine after a call to sgeqlf/dgeqlf.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

m The number of rows of the matrix Q (m≥ 0).

n The number of columns of the matrix Q (m≥ n≥ 0).

k The number of elementary reflectors whose product defines the matrix Q
(n≥ k≥ 0).

a, tau Arrays: a (size max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout), tau.
On entry, the (n - k + i)-th column of a must contain the vector which
defines the elementary reflector H(i), for i = 1,2,...,k, as returned by
sgeqlf/dgeqlf in the last k columns of its array argument a; tau[i - 1]
must contain the scalar factor of the elementary reflector H(i), as returned
by sgeqlf/dgeqlf.

The size of tau must be at least max(1, k).

lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.

Output Parameters

a Overwritten by the last n columns of the m-by-m orthogonal matrix Q.
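To make the reflector convention concrete, the following unblocked reference sketch forms the matrix that ?orgql is documented to return by multiplying out H(k)*...*H(2)*H(1) explicitly and keeping the last n columns. It assumes column major storage, the real case, and the reference LAPACK ?geqlf convention that the vector for H(i) has a unit entry in row m-k+i (1-based), zeros below it, and the entries above it stored in column n-k+i of a. orgql_reference is a hypothetical name; it costs O(m^3) work plus an m-by-m temporary and is meant only for checking small results, not as a replacement for ?orgql.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, unblocked reference (real case, column major), not an MKL
 * routine. Forms the last n columns of H(k)*...*H(2)*H(1), where
 * H(i) = I - tau[i-1]*v*v^T and, per the ?geqlf storage convention, v has a
 * unit entry in row m-k+i-1 (0-based), zeros below it, and the entries above
 * it stored in column n-k+i-1 of a. q receives the m-by-n result with leading
 * dimension m. Returns 0 on success, -1 if allocation fails. */
static int orgql_reference(int m, int n, int k, const double *a, int lda,
                           const double *tau, double *q)
{
    assert(m >= n && n >= k && k >= 0 && lda >= m);
    if (m == 0) return 0;
    double *p = calloc((size_t)m * m, sizeof *p); /* P starts as the zero matrix */
    double *v = malloc((size_t)m * sizeof *v);
    if (!p || !v) { free(p); free(v); return -1; }
    for (int i = 0; i < m; ++i) p[i + i * m] = 1.0; /* P = I */

    /* Left-multiply by H(1), then H(2), ..., so that P = H(k)*...*H(1). */
    for (int r = 1; r <= k; ++r) {
        int unit = m - k + r - 1;                      /* row of the implicit 1 */
        const double *col = a + (size_t)(n - k + r - 1) * lda;
        for (int i = 0; i < m; ++i)
            v[i] = (i < unit) ? col[i] : (i == unit ? 1.0 : 0.0);
        for (int j = 0; j < m; ++j) {                  /* P(:,j) -= tau*v*(v^T*P(:,j)) */
            double w = 0.0;
            for (int i = 0; i < m; ++i) w += v[i] * p[i + j * m];
            for (int i = 0; i < m; ++i) p[i + j * m] -= tau[r - 1] * v[i] * w;
        }
    }
    for (int j = 0; j < n; ++j)                        /* keep the last n columns */
        for (int i = 0; i < m; ++i)
            q[i + j * m] = p[i + (m - n + j) * m];
    free(p);
    free(v);
    return 0;
}
```

With m = 2, n = 1, k = 1, a(1,1) = 1, and tau[0] = 1, the reflector vector is (1, 1), H(1) = [[0, -1], [-1, 0]], and the sketch returns its last column, (-1, 0).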

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The complex counterpart of this routine is ungql.

?ungql
Generates the complex matrix Q of the QL
factorization formed by ?geqlf.


Syntax
lapack_int LAPACKE_cungql (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
lapack_complex_float* a, lapack_int lda, const lapack_complex_float* tau);
lapack_int LAPACKE_zungql (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
lapack_complex_double* a, lapack_int lda, const lapack_complex_double* tau);

Include Files
• mkl.h

Description

The routine generates an m-by-n complex matrix Q with orthonormal columns, which is defined as the last n
columns of a product of k elementary reflectors H(i) of order m: Q = H(k)*...*H(2)*H(1), as returned by
?geqlf. Use this routine after a call to cgeqlf/zgeqlf.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

m The number of rows of the matrix Q (m≥0).

n The number of columns of the matrix Q (m≥n≥0).

k The number of elementary reflectors whose product defines the matrix Q
(n≥k≥0).

a, tau Arrays: a (size max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout), tau.
On entry, the (n - k + i)-th column of a must contain the vector which
defines the elementary reflector H(i), for i = 1,2,...,k, as returned by
cgeqlf/zgeqlf in the last k columns of its array argument a;
tau[i - 1] must contain the scalar factor of the elementary reflector H(i), as
returned by cgeqlf/zgeqlf.

The size of tau must be at least max(1, k).

lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.

Output Parameters

a Overwritten by the last n columns of the m-by-m unitary matrix Q.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The real counterpart of this routine is orgql.

?ormql
Multiplies a real matrix by the orthogonal matrix Q of
the QL factorization formed by ?geqlf.

Syntax
lapack_int LAPACKE_sormql (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const float* a, lapack_int lda, const float* tau, float* c,
lapack_int ldc);
lapack_int LAPACKE_dormql (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const double* a, lapack_int lda, const double* tau, double*
c, lapack_int ldc);

Include Files
• mkl.h

Description

The routine multiplies a real m-by-n matrix C by Q or Q^T, where Q is the orthogonal matrix Q of the QL
factorization formed by the routine ?geqlf.
Depending on the parameters side and trans, the routine ormql can form one of the matrix products Q*C,
Q^T*C, C*Q, or C*Q^T (overwriting the result on C).

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

side Must be either 'L' or 'R'.

If side = 'L', Q or Q^T is applied to C from the left.

If side = 'R', Q or Q^T is applied to C from the right.

trans Must be either 'N' or 'T'.

If trans = 'N', the routine multiplies C by Q.

If trans = 'T', the routine multiplies C by Q^T.

m The number of rows in the matrix C (m≥ 0).

n The number of columns in C (n≥ 0).

k The number of elementary reflectors whose product defines the matrix Q.

Constraints:
0 ≤k≤m if side = 'L';
0 ≤k≤n if side = 'R'.

a, tau, c Arrays: a, tau, c.

The size of a must be:

For column major layout regardless of side, max(1, lda*k).

For side = 'L' and row major layout, max(1, lda*m).

For side = 'R' and row major layout, max(1, lda*n).


On entry, the i-th column of a must contain the vector which defines the
elementary reflector H(i), for i = 1,2,...,k, as returned by sgeqlf/dgeqlf in
the last k columns of its array argument a.
tau[i - 1] must contain the scalar factor of the elementary reflector H(i), as
returned by sgeqlf/dgeqlf.

The size of tau must be at least max(1, k).

c (size max(1, ldc*n) for column major layout and max(1, ldc*m) for row
major layout) contains the m-by-n matrix C.

lda The leading dimension of a:

if side = 'L', lda≥ max(1, m) for column major layout and max(1, k) for
row major layout;
if side = 'R', lda≥ max(1, n) for column major layout and max(1, k) for
row major layout.

ldc The leading dimension of c; ldc≥ max(1, m) for column major layout and
max(1, n) for row major layout.

Output Parameters

c Overwritten by the product Q*C, Q^T*C, C*Q, or C*Q^T (as specified by side
and trans).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The complex counterpart of this routine is unmql.

?unmql
Multiplies a complex matrix by the unitary matrix Q of
the QL factorization formed by ?geqlf.

Syntax
lapack_int LAPACKE_cunmql (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const lapack_complex_float* a, lapack_int lda, const
lapack_complex_float* tau, lapack_complex_float* c, lapack_int ldc);
lapack_int LAPACKE_zunmql (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const lapack_complex_double* a, lapack_int lda, const
lapack_complex_double* tau, lapack_complex_double* c, lapack_int ldc);

Include Files
• mkl.h

Description

The routine multiplies a complex m-by-n matrix C by Q or Q^H, where Q is the unitary matrix Q of the QL
factorization formed by the routine ?geqlf.
Depending on the parameters side and trans, the routine unmql can form one of the matrix products Q*C,
Q^H*C, C*Q, or C*Q^H (overwriting the result on C).

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

side Must be either 'L' or 'R'.

If side = 'L', Q or Q^H is applied to C from the left.

If side = 'R', Q or Q^H is applied to C from the right.

trans Must be either 'N' or 'C'.

If trans = 'N', the routine multiplies C by Q.

If trans = 'C', the routine multiplies C by Q^H.
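For a complex reflector H = I - tau*v*v^H, the conjugate transpose is H^H = I - conj(tau)*v*v^H. This is the identity that lets Q^H be applied from the same stored vectors and tau values, with each tau conjugated. The sketch below shows it for a single reflector and one vector using C99 complex arithmetic; apply_complex_reflector is a hypothetical name, and MKL's lapack_complex_double type is deliberately not used here.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Hypothetical sketch, not an MKL routine: x := H*x (adjoint == 0) or
 * x := H^H*x (adjoint != 0) for one complex elementary reflector
 * H = I - tau*v*v^H of order m. Since H^H = I - conj(tau)*v*v^H, the
 * conjugate-transpose case only replaces tau with conj(tau). */
static void apply_complex_reflector(size_t m, const double complex *v,
                                    double complex tau, int adjoint,
                                    double complex *x)
{
    double complex t = adjoint ? conj(tau) : tau;
    double complex w = 0.0;            /* w = v^H * x */
    for (size_t i = 0; i < m; ++i)
        w += conj(v[i]) * x[i];
    for (size_t i = 0; i < m; ++i)
        x[i] -= t * v[i] * w;          /* x -= t * v * w */
}
```

With v = (1, i) and tau = 0.5 + 0.5i (values chosen so that H is unitary), applying H and then H^H to x = (1, 2) returns (1, 2).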

m The number of rows in the matrix C (m≥ 0).

n The number of columns in C (n≥ 0).

k The number of elementary reflectors whose product defines the matrix Q.

Constraints:
0 ≤k≤m if side = 'L';
0 ≤k≤n if side = 'R'.

a, tau, c Arrays: a, tau, c.

The size of a must be:

For column major layout regardless of side, max(1, lda*k).

For side = 'L' and row major layout, max(1, lda*m).

For side = 'R' and row major layout, max(1, lda*n).

On entry, the i-th column of a must contain the vector which defines the
elementary reflector H(i), for i = 1,2,...,k, as returned by cgeqlf/zgeqlf
in the last k columns of its array argument a.
tau[i - 1] must contain the scalar factor of the elementary reflector H(i),
as returned by cgeqlf/zgeqlf.

The size of tau must be at least max(1, k).

c (size max(1, ldc*n) for column major layout and max(1, ldc*m) for row
major layout) contains the m-by-n matrix C.

lda The leading dimension of a.

If side = 'L', lda≥ max(1, m) for column major layout and max(1, k) for
row major layout.
If side = 'R', lda≥ max(1, n) for column major layout and max(1, k) for
row major layout.


ldc The leading dimension of c; ldc≥ max(1, m) for column major layout and
max(1, n) for row major layout.

Output Parameters

c Overwritten by the product Q*C, Q^H*C, C*Q, or C*Q^H (as specified by side
and trans).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The real counterpart of this routine is ormql.

?gerqf
Computes the RQ factorization of a general m-by-n
matrix.

Syntax
lapack_int LAPACKE_sgerqf (int matrix_layout, lapack_int m, lapack_int n, float* a,
lapack_int lda, float* tau);
lapack_int LAPACKE_dgerqf (int matrix_layout, lapack_int m, lapack_int n, double* a,
lapack_int lda, double* tau);
lapack_int LAPACKE_cgerqf (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* tau);
lapack_int LAPACKE_zgerqf (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* tau);

Include Files
• mkl.h

Description

The routine forms the RQ factorization of a general m-by-n matrix A (see Orthogonal Factorizations). No
pivoting is performed.
The routine does not form the matrix Q explicitly. Instead, Q is represented as a product of min(m, n)
elementary reflectors. Routines are provided to work with Q in this representation.

NOTE
This routine supports the Progress Routine feature. See Progress Function for details.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

m The number of rows in the matrix A (m≥ 0).

n The number of columns in A (n≥ 0).

a Array a of size max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout contains the m-by-n matrix A.

lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.

Output Parameters

a Overwritten on exit by the factorization data as follows:

if m≤n, the upper triangle of the subarray a(1:m, n-m+1:n) contains the
m-by-m upper triangular matrix R;

if m≥n, the elements on and above the (m-n)-th subdiagonal contain the
m-by-n upper trapezoidal matrix R;
in both cases, the remaining elements, with the array tau, represent the
orthogonal/unitary matrix Q as a product of min(m,n) elementary
reflectors.
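As with ?geqlf, both cases above collapse into one rule: in 0-based indices, entry (i, j) belongs to R exactly when j - i ≥ n - m. The sketch below applies that rule to row major storage; extract_rq_factor is a hypothetical helper, not an MKL routine.

```c
#include <assert.h>

/* Hypothetical helper, not an MKL routine. Copies the factor R out of the
 * array a returned by ?gerqf (row major, lda >= n) into r, an m-by-n row
 * major array with leading dimension n, and zeros every entry that holds
 * reflector data. Entry (i, j), 0-based, belongs to R when j - i >= n - m:
 *   m <= n: the upper triangle of the last m columns (a(1:m, n-m+1:n), 1-based);
 *   m >= n: the entries on and above the (m-n)-th subdiagonal. */
static void extract_rq_factor(int m, int n, const double *a, int lda, double *r)
{
    assert(m >= 0 && n >= 0 && lda >= n);
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j)
            r[i * n + j] = (j - i >= n - m) ? a[i * lda + j] : 0.0;
}
```

For a 2-by-3 input (m ≤ n), only a(1,2), a(1,3), and a(2,3) in 1-based terms survive, forming the 2-by-2 upper triangle of the last two columns.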

tau Array, size at least max(1, min(m, n)). Contains scalar factors of the
elementary reflectors for the matrix Q (see Orthogonal Factorizations).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
Related routines include:

orgrq to generate matrix Q (for real matrices);

ungrq to generate matrix Q (for complex matrices);

ormrq to apply matrix Q (for real matrices);

unmrq to apply matrix Q (for complex matrices).

See Also
mkl_progress

Matrix Storage Schemes


?orgrq
Generates the real matrix Q of the RQ factorization
formed by ?gerqf.

Syntax
lapack_int LAPACKE_sorgrq (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
float* a, lapack_int lda, const float* tau);
lapack_int LAPACKE_dorgrq (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
double* a, lapack_int lda, const double* tau);

Include Files
• mkl.h

Description

The routine generates an m-by-n real matrix Q with orthonormal rows, which is defined as the last m rows of a
product of k elementary reflectors H(i) of order n: Q = H(1)*H(2)*...*H(k), as returned by
?gerqf. Use this routine after a call to sgerqf/dgerqf.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

m The number of rows of the matrix Q (m≥ 0).

n The number of columns of the matrix Q (n≥ m).

k The number of elementary reflectors whose product defines the matrix Q
(m≥ k≥ 0).

a, tau Arrays: a(size max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout), tau.
On entry, the (m - k + i)-th row of a must contain the vector which defines
the elementary reflector H(i), for i = 1,2,...,k, as returned by sgerqf/
dgerqf in the last k rows of its array argument a;
tau[i - 1] must contain the scalar factor of the elementary reflector H(i),
as returned by sgerqf/dgerqf.

The size of tau must be at least max(1, k).

lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.

Output Parameters

a Overwritten by the last m rows of the n-by-n orthogonal matrix Q.
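A quick way to sanity-check the output of LAPACKE_dorgrq is to confirm that the rows are orthonormal, that is, Q*Q^T equals the m-by-m identity to within rounding. A minimal sketch for row major storage; orthonormal_rows_error is a hypothetical helper, and the tolerance you compare its result against (for example, a small multiple of n times machine epsilon) is a judgment call, not a documented MKL value.

```c
#include <assert.h>

/* Hypothetical check, not an MKL routine: for an m-by-n matrix q with
 * orthonormal rows (such as the output of ?orgrq), Q*Q^T should equal the
 * m-by-m identity. Returns the largest absolute deviation.
 * q is row major with leading dimension ldq >= n. */
static double orthonormal_rows_error(int m, int n, const double *q, int ldq)
{
    assert(ldq >= n);
    double worst = 0.0;
    for (int i = 0; i < m; ++i)
        for (int p = 0; p < m; ++p) {
            double dot = 0.0;                  /* (Q*Q^T)(i, p) */
            for (int j = 0; j < n; ++j)
                dot += q[i * ldq + j] * q[p * ldq + j];
            double dev = dot - (i == p ? 1.0 : 0.0);
            if (dev < 0) dev = -dev;
            if (dev > worst) worst = dev;
        }
    return worst;
}
```

The rows (0.6, 0.8, 0) and (0, 0, 1) pass with an error near machine precision, while (1, 1, 0) and (0, 0, 1) fail with an error of 1, because the first row has squared norm 2.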

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The complex counterpart of this routine is ungrq.

?ungrq
Generates the complex matrix Q of the RQ
factorization formed by ?gerqf.

Syntax
lapack_int LAPACKE_cungrq (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
lapack_complex_float* a, lapack_int lda, const lapack_complex_float* tau);
lapack_int LAPACKE_zungrq (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
lapack_complex_double* a, lapack_int lda, const lapack_complex_double* tau);

Include Files
• mkl.h

Description

The routine generates an m-by-n complex matrix Q with orthonormal rows, which is defined as the last m rows
of a product of k elementary reflectors H(i) of order n: Q = H(1)^H*H(2)^H*...*H(k)^H, as returned by
?gerqf. Use this routine after a call to cgerqf/zgerqf.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

m The number of rows of the matrix Q (m≥0).

n The number of columns of the matrix Q (n≥m).

k The number of elementary reflectors whose product defines the matrix Q
(m≥k≥0).

a, tau Arrays: a(size max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout), tau.
On entry, the (m - k + i)th row of a must contain the vector which defines
the elementary reflector H(i), for i = 1,2,...,k, as returned by cgerqf/
zgerqf in the last k rows of its array argument a;
tau[i - 1] must contain the scalar factor of the elementary reflector H(i), as
returned by cgerqf/zgerqf.

The size of tau must be at least max(1, k).

lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.

Output Parameters

a Overwritten by the last m rows of the n-by-n unitary matrix Q.


Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The real counterpart of this routine is orgrq.

?ormrq
Multiplies a real matrix by the orthogonal matrix Q of
the RQ factorization formed by ?gerqf.

Syntax
lapack_int LAPACKE_sormrq (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const float* a, lapack_int lda, const float* tau, float* c,
lapack_int ldc);
lapack_int LAPACKE_dormrq (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const double* a, lapack_int lda, const double* tau, double*
c, lapack_int ldc);

Include Files
• mkl.h

Description

The routine multiplies a real m-by-n matrix C by Q or Q^T, where Q is the real orthogonal matrix defined as a
product of k elementary reflectors H(i): Q = H(1)*H(2)*...*H(k), as returned by the RQ factorization routine ?gerqf.

Depending on the parameters side and trans, the routine can form one of the matrix products Q*C, Q^T*C,
C*Q, or C*Q^T (overwriting the result on C).
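When testing a call to ?ormrq on small problems, it can help to compare against a dense reference that forms the four products with an explicit Q. The sketch below does that in column major storage. apply_explicit_q is a hypothetical helper: Q must be supplied explicitly as an nq-by-nq matrix, and unlike ?ormrq it costs O(m*n*nq) work and allocates a temporary copy of C.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical dense reference, not an MKL routine, for the products
 * Q*C, Q^T*C, C*Q, C*Q^T. Column major throughout: q is nq-by-nq
 * (nq = m for side 'L', n for side 'R'), c is m-by-n with leading
 * dimension m and is overwritten. trans is 'N' or 'T'.
 * Returns 0 on success, -1 if allocation fails. */
static int apply_explicit_q(char side, char trans, int m, int n,
                            const double *q, double *c)
{
    int nq = (side == 'L') ? m : n;
    double *out = malloc((size_t)m * n * sizeof *out);
    if (!out && m * n > 0) return -1;
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            double sum = 0.0;
            for (int p = 0; p < nq; ++p) {
                if (side == 'L') {   /* (op(Q)*C)(i, j) = sum_p op(Q)(i, p) * C(p, j) */
                    double qv = (trans == 'N') ? q[i + p * nq] : q[p + i * nq];
                    sum += qv * c[p + j * m];
                } else {             /* (C*op(Q))(i, j) = sum_p C(i, p) * op(Q)(p, j) */
                    double qv = (trans == 'N') ? q[p + j * nq] : q[j + p * nq];
                    sum += c[i + p * m] * qv;
                }
            }
            out[i + j * m] = sum;
        }
    if (m * n > 0) memcpy(c, out, (size_t)m * n * sizeof *c);
    free(out);
    return 0;
}
```

With the rotation Q = [[0, -1], [1, 0]], side = 'L' and trans = 'N' map the column (1, 2) to (-2, 1), and trans = 'T' maps it back, which is the round trip you would expect from an orthogonal Q.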

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

side Must be either 'L' or 'R'.

If side = 'L', Q or Q^T is applied to C from the left.

If side = 'R', Q or Q^T is applied to C from the right.

trans Must be either 'N' or 'T'.

If trans = 'N', the routine multiplies C by Q.

If trans = 'T', the routine multiplies C by Q^T.

m The number of rows in the matrix C (m≥ 0).

n The number of columns in C (n≥ 0).

k The number of elementary reflectors whose product defines the matrix Q.
Constraints:
0 ≤k≤m, if side = 'L';
0 ≤k≤n, if side = 'R'.

a, tau, c Arrays: a(size for side = 'L': max(1, lda*m) for column major layout and
max(1, lda*k) for row major layout; for side = 'R': max(1, lda*n) for
column major layout and max(1, lda*k) for row major layout), tau, c (size
max(1, ldc*n) for column major layout and max(1, ldc*m) for row major
layout).
On entry, the i-th row of a must contain the vector which defines the
elementary reflector H(i), for i = 1,2,...,k, as returned by sgerqf/dgerqf in
the last k rows of its array argument a.
tau[i - 1] must contain the scalar factor of the elementary reflector H(i), as
returned by sgerqf/dgerqf.

The size of tau must be at least max(1, k).

c contains the m-by-n matrix C.

lda The leading dimension of a; lda≥ max(1, k) for column major layout. For
row major layout, lda≥ max(1, m) if side = 'L', and lda≥ max(1, n) if
side = 'R'.

ldc The leading dimension of c; ldc≥ max(1, m) for column major layout and
max(1, n) for row major layout.

Output Parameters

c Overwritten by the product Q*C, Q^T*C, C*Q, or C*Q^T (as specified by side
and trans).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The complex counterpart of this routine is unmrq.

?unmrq
Multiplies a complex matrix by the unitary matrix Q of
the RQ factorization formed by ?gerqf.

Syntax
lapack_int LAPACKE_cunmrq (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const lapack_complex_float* a, lapack_int lda, const
lapack_complex_float* tau, lapack_complex_float* c, lapack_int ldc);
lapack_int LAPACKE_zunmrq (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const lapack_complex_double* a, lapack_int lda, const
lapack_complex_double* tau, lapack_complex_double* c, lapack_int ldc);


Include Files
• mkl.h

Description

The routine multiplies a complex m-by-n matrix C by Q or Q^H, where Q is the complex unitary matrix defined
as a product of k elementary reflectors H(i): Q = H(1)^H*H(2)^H*...*H(k)^H, as returned by the
RQ factorization routine ?gerqf.
Depending on the parameters side and trans, the routine can form one of the matrix products Q*C, Q^H*C,
C*Q, or C*Q^H (overwriting the result on C).

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

side Must be either 'L' or 'R'.

If side = 'L', Q or Q^H is applied to C from the left.

If side = 'R', Q or Q^H is applied to C from the right.

trans Must be either 'N' or 'C'.

If trans = 'N', the routine multiplies C by Q.

If trans = 'C', the routine multiplies C by Q^H.

m The number of rows in the matrix C (m≥ 0).

n The number of columns in C (n≥ 0).

k The number of elementary reflectors whose product defines the matrix Q.

Constraints:
0 ≤k≤m, if side = 'L';
0 ≤k≤n, if side = 'R'.

a, tau, c Arrays: a(size for side = 'L': max(1, lda*m) for column major layout and
max(1, lda*k) for row major layout; for side = 'R': max(1, lda*n) for
column major layout and max(1, lda*k) for row major layout), tau, c (size
max(1, ldc*n) for column major layout and max(1, ldc*m) for row major
layout).
On entry, the ith row of a must contain the vector which defines the
elementary reflector H(i), for i = 1,2,...,k, as returned by cgerqf/zgerqf in
the last k rows of its array argument a.
tau[i - 1] must contain the scalar factor of the elementary reflector H(i), as
returned by cgerqf/zgerqf.

The size of tau must be at least max(1, k).

c (size max(1, ldc*n) for column major layout and max(1, ldc*m) for row
major layout) contains the m-by-n matrix C.

lda The leading dimension of a; lda≥ max(1, k) for column major layout. For row
major layout, lda≥ max(1, m) if side = 'L', and lda≥ max(1, n) if side = 'R'.

ldc The leading dimension of c; ldc≥ max(1, m) for column major layout and
max(1, n) for row major layout.

Output Parameters

c Overwritten by the product Q*C, Q^H*C, C*Q, or C*Q^H (as specified by side
and trans).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The real counterpart of this routine is ormrq.

?tzrzf
Reduces the upper trapezoidal matrix A to upper
triangular form.

Syntax
lapack_int LAPACKE_stzrzf (int matrix_layout, lapack_int m, lapack_int n, float* a,
lapack_int lda, float* tau);
lapack_int LAPACKE_dtzrzf (int matrix_layout, lapack_int m, lapack_int n, double* a,
lapack_int lda, double* tau);
lapack_int LAPACKE_ctzrzf (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* tau);
lapack_int LAPACKE_ztzrzf (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* tau);

Include Files
• mkl.h

Description

The routine reduces the m-by-n (m≤n) real/complex upper trapezoidal matrix A to upper triangular form by
means of orthogonal/unitary transformations. The upper trapezoidal matrix
A = [A1 A2] = [A(1:m, 1:m), A(1:m, m+1:n)] is factored as

A = [R 0]*Z,

where Z is an n-by-n orthogonal/unitary matrix, R is an m-by-m upper triangular matrix, and 0 is the
m-by-(n-m) zero matrix.
The ?tzrzf routine replaces the deprecated ?tzrqf routine.


Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

m The number of rows in the matrix A (m≥ 0).

n The number of columns in A (n≥m).

a Array a is of size max(1, lda*n) for column major layout and max(1,
lda*m) for row major layout.
The leading m-by-n upper trapezoidal part of the array a contains the
matrix A to be factorized.

lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.

Output Parameters

a Overwritten on exit by the factorization data as follows:

the leading m-by-m upper triangular part of a contains the upper triangular
matrix R, and elements m+1 to n of the first m rows of a, with the array
tau, represent the orthogonal/unitary matrix Z as a product of m elementary
reflectors.

tau Array, size at least max(1, m). Contains scalar factors of the elementary
reflectors for the matrix Z.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The factorization is obtained by Householder's method. The k-th transformation matrix, Z(k), which is used
to introduce zeros into the (m - k + 1)-th row of A, is given in the form

$$Z(k) = \begin{pmatrix} I & 0 \\ 0 & T(k) \end{pmatrix},$$

where for real flavors

$$T(k) = I - \tau\, u(k)\, u(k)^T, \qquad u(k) = \begin{pmatrix} 1 \\ 0 \\ z(k) \end{pmatrix},$$

and for complex flavors

$$T(k) = I - \tau\, u(k)\, u(k)^H, \qquad u(k) = \begin{pmatrix} 1 \\ 0 \\ z(k) \end{pmatrix}.$$

tau is a scalar and z(k) is an (n - m)-element vector. tau and z(k) are chosen to annihilate the elements of
the k-th row of A2.
The scalar tau is returned in the k-th element of tau and the vector u(k) in the k-th row of A, such that the
elements of z(k) are stored in the last n - m elements of the k-th row of array a.

The elements of R are returned in the upper triangular part of A.

The matrix Z is given by
Z = Z(1)*Z(2)*...*Z(m).
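Because of the block structure of Z(k), a single transformation touches only column k and columns m+1 to n of a row. The sketch below applies one real Z(k) to a row vector from the right, following the block form above. apply_zk_to_row is a hypothetical helper; tau and z(k) are taken as inputs rather than computed. In the usage below they were worked out by hand (tau = 1.6, z(k) = (0.5, 0)) for the 1-by-3 row (3, 4, 0), which is what a standard Householder construction gives for that row, reducing it to (-5, 0, 0).

```c
#include <assert.h>

/* Hypothetical sketch, not an MKL routine, of one transformation from the
 * ?tzrzf application notes (real case): row := row * Z(k). Z(k) is the
 * identity except for T(k) = I - tau*u*u^T acting on columns k..n (1-based),
 * where u has 1 in column k, zeros in columns k+1..m, and z(k) in columns
 * m+1..n. T(k) is symmetric, so multiplying from the right is the same as
 * applying T(k) to the transposed row. row has n entries (0-based), z has
 * n - m entries, and k is 1-based. */
static void apply_zk_to_row(int m, int n, int k, double tau, const double *z,
                            double *row)
{
    assert(1 <= k && k <= m && m <= n);
    /* s = row * u: only the unit entry and the z(k) part contribute. */
    double s = row[k - 1];
    for (int j = m; j < n; ++j)
        s += row[j] * z[j - m];
    /* row -= tau * s * u^T */
    row[k - 1] -= tau * s;
    for (int j = m; j < n; ++j)
        row[j] -= tau * s * z[j - m];
}
```

Applied to (3, 4, 0) with m = 1, n = 3, and k = 1, the sketch yields (-5, 0, 0): the A2 part of the row is annihilated and -5 is the single entry of R.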
Related routines include:

ormrz to apply matrix Z (for real matrices);

unmrz to apply matrix Z (for complex matrices).

?ormrz
Multiplies a real matrix by the orthogonal matrix
defined from the factorization formed by ?tzrzf.

Syntax
lapack_int LAPACKE_sormrz (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, lapack_int l, const float* a, lapack_int lda, const float*
tau, float* c, lapack_int ldc);
lapack_int LAPACKE_dormrz (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, lapack_int l, const double* a, lapack_int lda, const
double* tau, double* c, lapack_int ldc);

Include Files
• mkl.h


Description

The ?ormrz routine multiplies a real m-by-n matrix C by Q or Q^T, where Q is the real orthogonal matrix
defined as a product of k elementary reflectors H(i): Q = H(1)*H(2)*...*H(k), as returned by
the factorization routine ?tzrzf.
Depending on the parameters side and trans, the routine can form one of the matrix products Q*C, Q^T*C,
C*Q, or C*Q^T (overwriting the result on C).
The matrix Q is of order m if side = 'L' and of order n if side = 'R'.

The ?ormrz routine replaces the deprecated ?latzm routine.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

side Must be either 'L' or 'R'.

If side = 'L', Q or Q^T is applied to C from the left.

If side = 'R', Q or Q^T is applied to C from the right.

trans Must be either 'N' or 'T'.

If trans = 'N', the routine multiplies C by Q.

If trans = 'T', the routine multiplies C by Q^T.

m The number of rows in the matrix C (m≥ 0).

n The number of columns in C (n≥ 0).

k The number of elementary reflectors whose product defines the matrix Q.

Constraints:
0 ≤k≤m, if side = 'L';
0 ≤k≤n, if side = 'R'.

l The number of columns of the matrix A containing the meaningful part of
the Householder reflectors. Constraints:
0 ≤l≤m, if side = 'L';
0 ≤l≤n, if side = 'R'.

a, tau, c Arrays: a(size for side = 'L': max(1, lda*m) for column major layout and
max(1, lda*k) for row major layout; for side = 'R': max(1, lda*n) for
column major layout and max(1, lda*k) for row major layout), tau, c (size
max(1, ldc*n) for column major layout and max(1, ldc*m) for row major
layout).
On entry, the ith row of a must contain the vector which defines the
elementary reflector H(i), for i = 1,2,...,k, as returned by stzrzf/dtzrzf
in the last k rows of its array argument a.
tau[i - 1] must contain the scalar factor of the elementary reflector H(i),
as returned by stzrzf/dtzrzf.

The size of tau must be at least max(1, k).

c contains the m-by-n matrix C.

lda The leading dimension of a; lda≥ max(1, k) for column major layout. For row
major layout, lda≥ max(1, m) if side = 'L', and lda≥ max(1, n) if side = 'R'.

ldc The leading dimension of c; ldc≥ max(1, m) for column major layout and
max(1, n) for row major layout.
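The constraints above can be checked before the call. The sketch below encodes them for column major layout and returns 0 or -i, mirroring the Return Values convention, with arguments counted in the order of the LAPACKE_?ormrz prototype (matrix_layout is argument 1, side is 2, ldc is 12). check_ormrz_args is a hypothetical helper: it accepts only the uppercase option letters listed above, and MKL's own checks may run in a different order or test additional conditions.

```c
#include <assert.h>

/* Hypothetical argument check for ?ormrz, column major layout only; not an
 * MKL routine. Returns 0, or -i for the first illegal argument, counting
 * arguments in LAPACKE_?ormrz order:
 * matrix_layout=1, side=2, trans=3, m=4, n=5, k=6, l=7, a=8, lda=9,
 * tau=10, c=11, ldc=12. */
static int check_ormrz_args(char side, char trans, int m, int n, int k, int l,
                            int lda, int ldc)
{
    if (side != 'L' && side != 'R') return -2;
    if (trans != 'N' && trans != 'T') return -3;
    if (m < 0) return -4;
    if (n < 0) return -5;
    int nq = (side == 'L') ? m : n;           /* order of Q */
    if (k < 0 || k > nq) return -6;           /* 0 <= k <= m or n */
    if (l < 0 || l > nq) return -7;           /* 0 <= l <= m or n */
    if (lda < (k > 1 ? k : 1)) return -9;     /* lda >= max(1, k) */
    if (ldc < (m > 1 ? m : 1)) return -12;    /* ldc >= max(1, m) */
    return 0;
}
```

For instance, side = 'R' with n = 3 and k = 4 is rejected with -6, because k may not exceed the order of Q.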

Output Parameters

c Overwritten by the product Q*C, Q^T*C, C*Q, or C*Q^T (as specified by side
and trans).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The complex counterpart of this routine is unmrz.

?unmrz
Multiplies a complex matrix by the unitary matrix
defined from the factorization formed by ?tzrzf.

Syntax
lapack_int LAPACKE_cunmrz (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, lapack_int l, const lapack_complex_float* a, lapack_int
lda, const lapack_complex_float* tau, lapack_complex_float* c, lapack_int ldc);
lapack_int LAPACKE_zunmrz (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, lapack_int l, const lapack_complex_double* a, lapack_int
lda, const lapack_complex_double* tau, lapack_complex_double* c, lapack_int ldc);

Include Files
• mkl.h

Description

The routine multiplies a complex m-by-n matrix C by Q or Q^H, where Q is the unitary matrix defined as a
product of k elementary reflectors H(i):

Q = H(1)^H*H(2)^H*...*H(k)^H, as returned by the factorization routine ?tzrzf.

Depending on the parameters side and trans, the routine can form one of the matrix products Q*C, Q^H*C,
C*Q, or C*Q^H (overwriting the result on C).
The matrix Q is of order m if side = 'L' and of order n if side = 'R'.


Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

side Must be either 'L' or 'R'.

If side = 'L', Q or Q^H is applied to C from the left.

If side = 'R', Q or Q^H is applied to C from the right.

trans Must be either 'N' or 'C'.

If trans = 'N', the routine multiplies C by Q.

If trans = 'C', the routine multiplies C by Q^H.

m The number of rows in the matrix C (m≥ 0).

n The number of columns in C (n≥ 0).

k The number of elementary reflectors whose product defines the matrix Q.

Constraints:
0 ≤ k ≤ m, if side = 'L';
0 ≤ k ≤ n, if side = 'R'.

l The number of columns of the matrix A containing the meaningful part of the Householder reflectors.
Constraints:
0 ≤ l ≤ m, if side = 'L';
0 ≤ l ≤ n, if side = 'R'.

a, tau, c Arrays:
a (size for side = 'L': max(1, lda*m) for column major layout and max(1, lda*k) for row major layout;
for side = 'R': max(1, lda*n) for column major layout and max(1, lda*k) for row major layout);
c (size max(1, ldc*n) for column major layout and max(1, ldc*m) for row major layout).
On entry, the i-th row of a must contain the vector that defines the
elementary reflector H(i), for i = 1,2,...,k, as returned by ctzrzf/ztzrzf
in the last k rows of its array argument a.
tau[i - 1] must contain the scalar factor of the elementary reflector H(i),
as returned by ctzrzf/ztzrzf.

The size of tau must be at least max(1, k).

c contains the m-by-n matrix C.

lda The leading dimension of a; lda ≥ max(1, k) for column major layout. For
row major layout, lda ≥ max(1, m) if side = 'L', and lda ≥ max(1, n) if side = 'R'.

ldc The leading dimension of c; ldc ≥ max(1, m) for column major layout and
ldc ≥ max(1, n) for row major layout.
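The leading-dimension rules above depend on both the layout and side, so they are easy to get wrong when code moves between row major and column major storage. The following minimal sketch is not part of the oneMKL API. It encodes those minimums in plain C. The helper names and the sketch_layout enum are hypothetical stand-ins that let the code compile without mkl.h. For side = 'R', the sketch takes the column major size of a to be max(1, lda*n), because A then holds k reflectors of length n.

```c
#include <assert.h>

/* Hypothetical stand-ins for LAPACK_ROW_MAJOR / LAPACK_COL_MAJOR,
   so this sketch compiles without mkl.h. */
typedef enum { SKETCH_ROW_MAJOR, SKETCH_COL_MAJOR } sketch_layout;

static long max1(long x) { return x > 1 ? x : 1; }

/* Smallest legal lda for ?unmrz: A holds k reflectors of length
   m (side 'L') or n (side 'R'). */
static long unmrz_min_lda(sketch_layout layout, char side, long m, long n, long k)
{
    assert(side == 'L' || side == 'R');
    if (layout == SKETCH_COL_MAJOR)
        return max1(k);                      /* k rows, stored column by column */
    return side == 'L' ? max1(m) : max1(n);  /* each stored row has length m or n */
}

/* Smallest legal ldc for the m-by-n matrix C. */
static long unmrz_min_ldc(sketch_layout layout, long m, long n)
{
    return layout == SKETCH_COL_MAJOR ? max1(m) : max1(n);
}

/* Minimum number of elements array a must hold for the lda in use. */
static long unmrz_min_a_size(sketch_layout layout, char side,
                             long m, long n, long k, long lda)
{
    assert(side == 'L' || side == 'R');
    if (layout == SKETCH_COL_MAJOR)
        return max1(lda * (side == 'L' ? m : n));
    return max1(lda * k);
}
```

For example, with m = 6, n = 4, k = 3 and side = 'L', column major storage needs lda ≥ 3, while row major storage needs lda ≥ 6.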

Output Parameters

c Overwritten by the product Q*C, Q^H*C, C*Q, or C*Q^H (as specified by side
and trans).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The real counterpart of this routine is ormrz.

?ggqrf
Computes the generalized QR factorization of two
matrices.

Syntax
lapack_int LAPACKE_sggqrf (int matrix_layout, lapack_int n, lapack_int m, lapack_int p,
float* a, lapack_int lda, float* taua, float* b, lapack_int ldb, float* taub);
lapack_int LAPACKE_dggqrf (int matrix_layout, lapack_int n, lapack_int m, lapack_int p,
double* a, lapack_int lda, double* taua, double* b, lapack_int ldb, double* taub);
lapack_int LAPACKE_cggqrf (int matrix_layout, lapack_int n, lapack_int m, lapack_int p,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* taua,
lapack_complex_float* b, lapack_int ldb, lapack_complex_float* taub);
lapack_int LAPACKE_zggqrf (int matrix_layout, lapack_int n, lapack_int m, lapack_int p,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* taua,
lapack_complex_double* b, lapack_int ldb, lapack_complex_double* taub);

Include Files
• mkl.h

Description

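The routine forms the generalized QR factorization of an n-by-m matrix A and an n-by-p matrix B as A =
Q*R, B = Q*T*Z, where Q is an n-by-n orthogonal/unitary matrix, Z is a p-by-p orthogonal/unitary matrix,
and R and T assume one of the forms:

$$R = \begin{pmatrix} R_{11} \\ 0 \end{pmatrix} \ \text{if } n \ge m, \qquad R = \begin{pmatrix} R_{11} & R_{12} \end{pmatrix} \ \text{if } n < m,$$

where R11 is upper triangular, and

$$T = \begin{pmatrix} 0 & T_{12} \end{pmatrix} \ \text{if } n \le p, \qquad T = \begin{pmatrix} T_{11} \\ T_{21} \end{pmatrix} \ \text{if } n > p,$$

where T12 (n-by-n, when n ≤ p) or T21 (p-by-p, when n > p) is upper triangular.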

In particular, if B is square and nonsingular, the GQR factorization of A and B implicitly gives the QR
factorization of $B^{-1}A$:

$$B^{-1}A = Z^T (T^{-1} R) \ \text{(real flavors)}, \qquad B^{-1}A = Z^H (T^{-1} R) \ \text{(complex flavors)}.$$

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

n The number of rows of the matrices A and B (n≥ 0).

m The number of columns in A (m≥ 0).

p The number of columns in B (p≥ 0).

a, b Array a of size max(1, lda*m) for column major layout and max(1, lda*n)
for row major layout contains the matrix A.
Array b of size max(1, ldb*p) for column major layout and max(1, ldb*n)
for row major layout contains the matrix B.

lda The leading dimension of a; at least max(1, n) for column major layout and
at least max(1, m) for row major layout.

ldb The leading dimension of b; at least max(1, n) for column major layout and
at least max(1, p) for row major layout.

Output Parameters

a, b Overwritten by the factorization data as follows:

on exit, the elements on and above the diagonal of the array a contain the
min(n,m)-by-m upper trapezoidal matrix R (R is upper triangular if n ≥ m); the
elements below the diagonal, with the array taua, represent the orthogonal/
unitary matrix Q as a product of min(n,m) elementary reflectors;
if n ≤ p, the upper triangle of the subarray b(1:n, p-n+1:p) contains the
n-by-n upper triangular matrix T;
if n > p, the elements on and above the (n-p)-th subdiagonal contain the
n-by-p upper trapezoidal matrix T; the remaining elements, with the array
taub, represent the orthogonal/unitary matrix Z as a product of elementary
reflectors.

taua, taub Arrays, size at least max (1, min(n, m)) for taua and at least max (1,
min(n, p)) for taub. The array taua contains the scalar factors of the
elementary reflectors which represent the orthogonal/unitary matrix Q.
The array taub contains the scalar factors of the elementary reflectors
which represent the orthogonal/unitary matrix Z.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The matrix Q is represented as a product of elementary reflectors
Q = H(1)H(2)...H(k), where k = min(n,m).
Each H(i) has the form

$$H(i) = I - \tau_a v v^T \ \text{(real flavors)}, \qquad H(i) = I - \tau_a v v^H \ \text{(complex flavors)},$$

where $\tau_a$ is a real/complex scalar, and v is a real/complex vector with $v_j = 0$ for $1 \le j \le i - 1$ and $v_i = 1$.

On exit, for $i + 1 \le j \le n$, $v_j$ is stored in a[(j - 1) + (i - 1)*lda] for column major layout and in
a[(j - 1)*lda + (i - 1)] for row major layout, and $\tau_a$ is stored in taua[i - 1].
The matrix Z is represented as a product of elementary reflectors
Z = H(1)H(2)...H(k), where k = min(n,p).
Each H(i) has the form

$$H(i) = I - \tau_b v v^T \ \text{(real flavors)}, \qquad H(i) = I - \tau_b v v^H \ \text{(complex flavors)},$$

where $\tau_b$ is a real/complex scalar, and v is a real/complex vector with $v_{p-k+i} = 1$ and $v_j = 0$ for
$p - k + i + 1 \le j \le p$.
On exit, for $1 \le j \le p - k + i - 1$, $v_j$ is stored in b[(n - k + i - 1) + (j - 1)*ldb] for column major layout
and in b[(n - k + i - 1)*ldb + (j - 1)] for row major layout, and $\tau_b$ is stored in taub[i - 1].
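The storage scheme for Q can be made concrete with a short sketch for the real flavors. The sketch rebuilds the i-th Householder vector v from the array a returned by ?ggqrf. It then applies H(i) = I - τ_a v v^T to a vector without forming H. The function names and the sketch_layout enum are hypothetical. In practice, a and taua would come from LAPACKE_sggqrf or LAPACKE_dggqrf. Indices are 1-based to match the notes above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout tag so the sketch compiles without mkl.h. */
typedef enum { SKETCH_ROW_MAJOR, SKETCH_COL_MAJOR } sketch_layout;

/* Offset of element (row, col), both 1-based, in an array with leading dimension ld. */
static size_t idx(sketch_layout layout, size_t row, size_t col, size_t ld)
{
    return layout == SKETCH_COL_MAJOR ? (row - 1) + (col - 1) * ld
                                      : (row - 1) * ld + (col - 1);
}

/* Rebuild the length-n vector v of the i-th reflector of Q (real flavors):
   v_j = 0 for j < i, v_i = 1, and v_j = A(j, i) for j > i. */
static void ggqrf_q_vector(sketch_layout layout, const double *a, size_t lda,
                           size_t n, size_t i, double *v)
{
    assert(i >= 1 && i <= n);
    for (size_t j = 1; j <= n; ++j)
        v[j - 1] = j < i ? 0.0 : j == i ? 1.0 : a[idx(layout, j, i, lda)];
}

/* Apply H = I - tau * v * v^T to x in place: x := x - tau * (v^T x) * v. */
static void apply_reflector(size_t n, double tau, const double *v, double *x)
{
    double dot = 0.0;
    for (size_t j = 0; j < n; ++j)
        dot += v[j] * x[j];
    for (size_t j = 0; j < n; ++j)
        x[j] -= tau * dot * v[j];
}
```

Because Q = H(1)H(2)...H(k), computing Q*x this way would apply H(k) first and H(1) last. The one-reflector-at-a-time loop is only meant to illustrate the storage convention. It is not how the library applies Q internally.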


?ggrqf
Computes the generalized RQ factorization of two
matrices.

Syntax
lapack_int LAPACKE_sggrqf (int matrix_layout, lapack_int m, lapack_int p, lapack_int n,
float* a, lapack_int lda, float* taua, float* b, lapack_int ldb, float* taub);
lapack_int LAPACKE_dggrqf (int matrix_layout, lapack_int m, lapack_int p, lapack_int n,
double* a, lapack_int lda, double* taua, double* b, lapack_int ldb, double* taub);
lapack_int LAPACKE_cggrqf (int matrix_layout, lapack_int m, lapack_int p, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* taua,
lapack_complex_float* b, lapack_int ldb, lapack_complex_float* taub);
lapack_int LAPACKE_zggrqf (int matrix_layout, lapack_int m, lapack_int p, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* taua,
lapack_complex_double* b, lapack_int ldb, lapack_complex_double* taub);

Include Files
• mkl.h

Description

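The routine forms the generalized RQ factorization of an m-by-n matrix A and a p-by-n matrix B as A =
R*Q, B = Z*T*Q, where Q is an n-by-n orthogonal/unitary matrix, Z is a p-by-p orthogonal/unitary matrix,
and R and T assume one of the forms:

$$R = \begin{pmatrix} 0 & R_{12} \end{pmatrix} \ \text{if } m \le n, \qquad R = \begin{pmatrix} R_{11} \\ R_{21} \end{pmatrix} \ \text{if } m > n,$$

where R12 (m-by-m, when m ≤ n) or R21 (n-by-n, when m > n) is upper triangular, and

$$T = \begin{pmatrix} T_{11} \\ 0 \end{pmatrix} \ \text{if } p \ge n, \qquad T = \begin{pmatrix} T_{11} & T_{12} \end{pmatrix} \ \text{if } p < n,$$

where T11 is upper triangular.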

In particular, if B is square and nonsingular, the GRQ factorization of A and B implicitly gives the RQ
factorization of $AB^{-1}$:

$$AB^{-1} = (R T^{-1}) Z^T \ \text{(real flavors)}, \qquad AB^{-1} = (R T^{-1}) Z^H \ \text{(complex flavors)}.$$

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

m The number of rows of the matrix A (m≥ 0).

p The number of rows in B (p≥ 0).

n The number of columns of the matrices A and B (n≥ 0).

a, b Arrays:
a(size max(1, lda*n) for column major layout and max(1, lda*m) for row
major layout) contains the m-by-n matrix A.
b(size max(1, ldb*n) for column major layout and max(1, ldb*p) for row
major layout) contains the p-by-n matrix B.

lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.

ldb The leading dimension of b; at least max(1, p) for column major layout and
max(1, n) for row major layout.

Output Parameters

a, b Overwritten by the factorization data as follows:

on exit, if m ≤ n, the element $R_{ij}$ (1 ≤ i ≤ j ≤ m) of the upper triangular matrix R is
stored in a[(i - 1) + (n - m + j - 1)*lda] for column major layout
and in a[(i - 1)*lda + (n - m + j - 1)] for row major layout.


if m > n, the elements on and above the (m-n)th subdiagonal contain the
m-by-n upper trapezoidal matrix R;
the remaining elements, with the array taua, represent the orthogonal/
unitary matrix Q as a product of elementary reflectors.
The elements on and above the diagonal of the array b contain the
min(p,n)-by-n upper trapezoidal matrix T (T is upper triangular if p≥n); the
elements below the diagonal, with the array taub, represent the orthogonal/
unitary matrix Z as a product of elementary reflectors.

taua, taub Arrays, size at least max (1, min(m, n)) for taua and at least max (1,
min(p, n)) for taub.
The array taua contains the scalar factors of the elementary reflectors
which represent the orthogonal/unitary matrix Q.
The array taub contains the scalar factors of the elementary reflectors
which represent the orthogonal/unitary matrix Z.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The matrix Q is represented as a product of elementary reflectors
Q = H(1)H(2)...H(k), where k = min(m,n).
Each H(i) has the form

$$H(i) = I - \tau_a v v^T \ \text{(real flavors)}, \qquad H(i) = I - \tau_a v v^H \ \text{(complex flavors)},$$

where $\tau_a$ is a real/complex scalar, and v is a real/complex vector with $v_{n-k+i} = 1$ and $v_j = 0$ for
$n - k + i + 1 \le j \le n$.
On exit, $v_j$ for $1 \le j \le n - k + i - 1$ is stored in a(m-k+i, 1:n-k+i-1) (1-based row and column
indices), and $\tau_a$ is stored in taua[i - 1].

The matrix Z is represented as a product of elementary reflectors
Z = H(1)H(2)...H(k), where k = min(p,n).
Each H(i) has the form

$$H(i) = I - \tau_b v v^T \ \text{(real flavors)}, \qquad H(i) = I - \tau_b v v^H \ \text{(complex flavors)},$$

where $\tau_b$ is a real/complex scalar, and v is a real/complex vector with $v_j = 0$ for $1 \le j \le i - 1$ and $v_i = 1$.
On exit, $v_j$ for $i + 1 \le j \le p$ is stored in b(i+1:p, i) (1-based indices), and $\tau_b$ is stored in taub[i - 1].
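For ?ggrqf, the reflectors of Q run along the rows of A, and the implicit unit element sits at position n-k+i rather than i. The sketch below rebuilds v for the real flavors from the a array. It translates the 1-based indices above into C offsets for both layouts. The names are hypothetical, and the test matrix is arbitrary data that stands in for actual ?ggrqf output.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout tag so the sketch compiles without mkl.h. */
typedef enum { SKETCH_ROW_MAJOR, SKETCH_COL_MAJOR } sketch_layout;

/* Offset of element (row, col), both 1-based, in an array with leading dimension ld. */
static size_t idx(sketch_layout layout, size_t row, size_t col, size_t ld)
{
    return layout == SKETCH_COL_MAJOR ? (row - 1) + (col - 1) * ld
                                      : (row - 1) * ld + (col - 1);
}

/* Rebuild the length-n vector v of the i-th reflector of Q (real flavors)
   from the array a returned by ?ggrqf, with k = min(m, n):
   v_j = A(m-k+i, j) for j < n-k+i, v_{n-k+i} = 1, v_j = 0 for j > n-k+i. */
static void ggrqf_q_vector(sketch_layout layout, const double *a, size_t lda,
                           size_t m, size_t n, size_t i, double *v)
{
    size_t k = m < n ? m : n;
    assert(i >= 1 && i <= k);
    size_t row = m - k + i;    /* row of A that holds reflector i */
    size_t unit = n - k + i;   /* position of the implicit 1 */
    for (size_t j = 1; j <= n; ++j)
        v[j - 1] = j > unit ? 0.0 : j == unit ? 1.0 : a[idx(layout, row, j, lda)];
}
```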

?tpqrt
Computes a blocked QR factorization of a real or
complex "triangular-pentagonal" matrix, which is
composed of a triangular block and a pentagonal
block, using the compact WY representation for Q.

Syntax
lapack_int LAPACKE_stpqrt (int matrix_layout, lapack_int m, lapack_int n, lapack_int l,
lapack_int nb, float* a, lapack_int lda, float* b, lapack_int ldb, float* t, lapack_int
ldt);
lapack_int LAPACKE_dtpqrt (int matrix_layout, lapack_int m, lapack_int n, lapack_int l,
lapack_int nb, double* a, lapack_int lda, double* b, lapack_int ldb, double* t,
lapack_int ldt);
lapack_int LAPACKE_ctpqrt (int matrix_layout, lapack_int m, lapack_int n, lapack_int l,
lapack_int nb, lapack_complex_float* a, lapack_int lda, lapack_complex_float* b,
lapack_int ldb, lapack_complex_float* t, lapack_int ldt);
lapack_int LAPACKE_ztpqrt (int matrix_layout, lapack_int m, lapack_int n, lapack_int l,
lapack_int nb, lapack_complex_double* a, lapack_int lda, lapack_complex_double* b,
lapack_int ldb, lapack_complex_double* t, lapack_int ldt);

Include Files
• mkl.h

Description

The input matrix C is an (n+m)-by-n matrix

$$C = \begin{pmatrix} A \\ B \end{pmatrix},$$

where A is an n-by-n upper triangular matrix, and B is an m-by-n pentagonal matrix consisting of an (m-l)-by-n
rectangular matrix B1 on top of an l-by-n upper trapezoidal matrix B2:

$$B = \begin{pmatrix} B_1 \\ B_2 \end{pmatrix}.$$

The upper trapezoidal matrix B2 consists of the first l rows of an n-by-n upper triangular matrix, where
0 ≤ l ≤ min(m,n). If l = 0, B is an m-by-n rectangular matrix. If m = l = n, B is upper triangular. The elementary
reflectors H(i) are stored in the i-th column below the diagonal in the (n+m)-by-n input matrix C. The
structure of the vectors defining the elementary reflectors is illustrated by

$$\begin{pmatrix} I \\ V \end{pmatrix},$$

where I is the n-by-n identity matrix and V is m-by-n.

The elements of the identity matrix I are not stored. Thus, V contains all of the necessary information, and is
returned in array b.

NOTE
V has the same form as B:

$$V = \begin{pmatrix} V_1 \\ V_2 \end{pmatrix},$$

where V1 is (m-l)-by-n rectangular and V2 is l-by-n upper trapezoidal.

The columns of V represent the vectors which define the H(i)s.

The number of blocks is k = ceiling(n/nb), where each block is of order nb except for the last block, which is
of order ib = n - (k-1)*nb. For each of the k blocks, an upper triangular block reflector factor is computed:
T1, T2, ..., Tk. The nb-by-nb (ib-by-ib for the last block) factors Ti are stored in the nb-by-n array t as

$$t = \begin{pmatrix} T_1 & T_2 & \cdots & T_k \end{pmatrix}.$$
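The block bookkeeping above reduces to a few integer formulas. The sketch below uses hypothetical helper names. It computes k = ceiling(n/nb), the order of each block, and the column at which T_j starts inside t. It assumes column major storage of the nb-by-n array t, so column c begins at offset c*ldt.

```c
#include <assert.h>

/* Number of diagonal blocks used by ?tpqrt: k = ceiling(n / nb). */
static int tpqrt_num_blocks(int n, int nb)
{
    assert(n >= 0 && nb >= 1);
    return (n + nb - 1) / nb;
}

/* Order of block j (1-based): nb for every block except the last,
   which has order ib = n - (k - 1) * nb. */
static int tpqrt_block_order(int n, int nb, int j)
{
    int k = tpqrt_num_blocks(n, nb);
    assert(j >= 1 && j <= k);
    return j < k ? nb : n - (k - 1) * nb;
}

/* First column (0-based) of T_j inside t = [T1 T2 ... Tk]. */
static int tpqrt_block_first_col(int nb, int j)
{
    assert(nb >= 1 && j >= 1);
    return (j - 1) * nb;
}
```

For n = 10 and nb = 4, there are three blocks of orders 4, 4, and 2, and T3 starts at column 8 of t.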

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

m The total number of rows in the matrix B (m ≥ 0).

n The number of columns in B and the order of the triangular matrix A (n ≥ 0).

l The number of rows of the upper trapezoidal part of B (min(m, n) ≥ l ≥ 0).

nb The block size to use in the blocked QR factorization (n ≥ nb ≥ 1).

a, b Arrays: a (size lda*n) contains the n-by-n upper triangular matrix A.

b (size max(1, ldb*n) for column major layout and max(1, ldb*m) for row
major layout) contains the pentagonal m-by-n matrix B. The first (m-l) rows contain
the rectangular B1 matrix, and the next l rows contain the upper
trapezoidal B2 matrix.

lda The leading dimension of a; at least max(1, n).

ldb The leading dimension of b; at least max(1, m) for column major layout and
at least max(1, n) for row major layout.

ldt The leading dimension of t; at least nb for column major layout and at least
max(1, n) for row major layout.

Output Parameters

a The elements on and above the diagonal of the array contain the upper
triangular matrix R.

b The pentagonal matrix V.

t Array, size ldt*n for column major layout and ldt*nb for row major
layout.
The upper triangular block reflectors stored in compact form as a sequence
of upper triangular blocks.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

?tpmqrt
Applies a real orthogonal or complex unitary matrix obtained
from a "triangular-pentagonal" block reflector
to a general real or complex matrix, which consists of
two blocks.

Syntax
lapack_int LAPACKE_stpmqrt (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, lapack_int l, lapack_int nb, const float* v, lapack_int ldv,
const float* t, lapack_int ldt, float* a, lapack_int lda, float* b, lapack_int ldb);
lapack_int LAPACKE_dtpmqrt (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, lapack_int l, lapack_int nb, const double* v, lapack_int
ldv, const double* t, lapack_int ldt, double* a, lapack_int lda, double* b, lapack_int
ldb);
lapack_int LAPACKE_ctpmqrt (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, lapack_int l, lapack_int nb, const lapack_complex_float* v,
lapack_int ldv, const lapack_complex_float* t, lapack_int ldt, lapack_complex_float* a,
lapack_int lda, lapack_complex_float* b, lapack_int ldb);
lapack_int LAPACKE_ztpmqrt (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, lapack_int l, lapack_int nb, const lapack_complex_double*
v, lapack_int ldv, const lapack_complex_double* t, lapack_int ldt,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* b, lapack_int ldb);

Include Files
• mkl.h


Description

The columns of the pentagonal matrix V contain the elementary reflectors H(1), H(2), ..., H(k); V is
composed of a rectangular block V1 and a trapezoidal block V2:

$$V = \begin{pmatrix} V_1 \\ V_2 \end{pmatrix}.$$

The size of the trapezoidal block V2 is determined by the parameter l, where 0 ≤ l ≤ k. V2 is upper
trapezoidal, consisting of the first l rows of a k-by-k upper triangular matrix.

If l = k, V2 is upper triangular.

If l = 0, there is no trapezoidal block, so V = V1 is rectangular.

If side = 'L':

$$C = \begin{pmatrix} A \\ B \end{pmatrix},$$

where A is k-by-n, B is m-by-n and V is m-by-k.

If side = 'R':

$$C = \begin{pmatrix} A & B \end{pmatrix},$$

where A is m-by-k, B is m-by-n and V is n-by-k.

The real orthogonal or complex unitary matrix Q is formed from V and T.

If trans = 'N' and side = 'L', C contains Q*C on exit.

If trans = 'T' and side = 'L', C contains Q^T*C on exit.

If trans = 'C' and side = 'L', C contains Q^H*C on exit.

If trans = 'N' and side = 'R', C contains C*Q on exit.

If trans = 'T' and side = 'R', C contains C*Q^T on exit.

If trans = 'C' and side = 'R', C contains C*Q^H on exit.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

side = 'L': apply Q, Q^T, or Q^H from the left.

= 'R': apply Q, Q^T, or Q^H from the right.

trans = 'N', no transpose, apply Q.

= 'T', transpose, apply Q^T.
= 'C', conjugate transpose, apply Q^H.

m The number of rows in the matrix B, (m ≥ 0).

n The number of columns in the matrix B, (n ≥ 0).

k The number of elementary reflectors whose product defines the matrix Q (k ≥ 0).

l The order of the trapezoidal part of V (k ≥ l ≥ 0).

nb The block size used for the storage of t (k ≥ nb ≥ 1). This must be the same
value of nb that was used to generate t in tpqrt.

v Size ldv*k for column major layout; ldv*m for row major layout and side
= 'L', ldv*n for row major layout and side = 'R'.

The i-th column must contain the vector which defines the elementary
reflector H(i), for i = 1,2,...,k, as returned by tpqrt in array argument b.

ldv The leading dimension of the array v.

If side = 'L', ldv must be at least max(1, m) for column major layout and
max(1, k) for row major layout;

If side = 'R', ldv must be at least max(1, n) for column major layout and
max(1, k) for row major layout.

t Array, size ldt*k for column major layout and ldt*nb for row major
layout.
The upper triangular factors of the block reflectors as returned by tpqrt.

ldt The leading dimension of the array t. ldt must be at least nb for column
major layout and max(1, k) for row major layout.

a If side = 'L', size lda*n for column major layout and lda*k for row major
layout.
If side = 'R', size lda*k for column major layout and lda*m for row major
layout.
The k-by-n or m-by-k matrix A.

lda The leading dimension of the array a.

If side = 'L', lda must be at least max(1, k) for column major layout and
max(1, n) for row major layout.

If side = 'R', lda must be at least max(1, m) for column major layout and
max(1, k) for row major layout.

b Size ldb*n for column major layout and ldb*m for row major layout.

The m-by-n matrix B.

ldb The leading dimension of the array b. ldb must be at least max(1, m) for
column major layout and max(1, n) for row major layout.
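The minimum leading dimensions of v, t, a, and b each depend on side and layout. It can help to compute all of them in one place before calling ?tpmqrt. This sketch simply encodes the constraints listed above. The struct, the helper names, and the sketch_layout enum are hypothetical and are not part of oneMKL.

```c
#include <assert.h>

/* Hypothetical layout tag so the sketch compiles without mkl.h. */
typedef enum { SKETCH_ROW_MAJOR, SKETCH_COL_MAJOR } sketch_layout;

/* Minimum leading dimensions for ?tpmqrt. */
typedef struct { long ldv, ldt, lda, ldb; } tpmqrt_lds;

static long max1(long x) { return x > 1 ? x : 1; }

static tpmqrt_lds tpmqrt_min_lds(sketch_layout layout, char side,
                                 long m, long n, long k, long nb)
{
    assert(side == 'L' || side == 'R');
    tpmqrt_lds r;
    if (layout == SKETCH_COL_MAJOR) {
        r.ldv = side == 'L' ? max1(m) : max1(n);  /* V is m-by-k or n-by-k */
        r.ldt = nb;                                /* T is nb-by-k          */
        r.lda = side == 'L' ? max1(k) : max1(m);  /* A is k-by-n or m-by-k */
        r.ldb = max1(m);                           /* B is m-by-n           */
    } else {
        r.ldv = max1(k);                           /* rows of V have length k */
        r.ldt = max1(k);
        r.lda = side == 'L' ? max1(n) : max1(k);
        r.ldb = max1(n);
    }
    return r;
}
```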

Output Parameters

a Overwritten by the corresponding block of the product Q*C, C*Q, Q^T*C,
C*Q^T, Q^H*C, or C*Q^H.

b Overwritten by the corresponding block of the product Q*C, C*Q, Q^T*C,
C*Q^T, Q^H*C, or C*Q^H.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Singular Value Decomposition: LAPACK Computational Routines

This topic describes LAPACK routines for computing the singular value decomposition (SVD) of a general
m-by-n matrix A:

$$A = U \Sigma V^H.$$

In this decomposition, U and V are unitary (for complex A) or orthogonal (for real A); Σ is an m-by-n
diagonal matrix with real diagonal elements $\sigma_i$:

$$\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_{\min(m,n)} \ge 0.$$

The diagonal elements $\sigma_i$ are the singular values of A. The first min(m, n) columns of the matrices U and V are,
respectively, left and right singular vectors of A. The singular values and singular vectors satisfy

$$A v_i = \sigma_i u_i, \qquad A^H u_i = \sigma_i v_i,$$

where $u_i$ and $v_i$ are the i-th columns of U and V, respectively.
To find the SVD of a general matrix A, call the LAPACK routine ?gebrd or ?gbbrd to reduce A to a
bidiagonal matrix B by a unitary (orthogonal) transformation: $A = QBP^H$. Then call ?bdsqr, which forms the
SVD of a bidiagonal matrix: $B = U_1 \Sigma V_1^H$.
Thus, the sought-for SVD of A is given by

$$A = U \Sigma V^H = (Q U_1)\, \Sigma\, (V_1^H P^H).$$
Table "Computational Routines for Singular Value Decomposition (SVD)" lists LAPACK routines that perform
singular value decomposition of matrices.
Computational Routines for Singular Value Decomposition (SVD)

Operation                                                    Real matrices    Complex matrices
Reduce A to a bidiagonal matrix B: A = QBP^H (full storage)  ?gebrd           ?gebrd
Reduce A to a bidiagonal matrix B: A = QBP^H (band storage)  ?gbbrd           ?gbbrd
Generate the orthogonal (unitary) matrix Q or P              ?orgbr           ?ungbr
Apply the orthogonal (unitary) matrix Q or P                 ?ormbr           ?unmbr
Form the SVD of the bidiagonal matrix B: B = UΣV^H           ?bdsqr, ?bdsdc   ?bdsqr

You can use the SVD to find a minimum-norm solution to a (possibly) rank-deficient least squares problem of
minimizing $\|Ax - b\|_2$. The effective rank k of the matrix A can be determined as the number of singular
values that exceed a suitable threshold. The minimum-norm solution is

$$x = V_k \Sigma_k^{-1} c,$$

where $\Sigma_k$ is the leading k-by-k submatrix of Σ, the matrix $V_k$ consists of the first k columns of $V = P V_1$, and
the vector c consists of the first k elements of $U^H b = U_1^H Q^H b$.
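The minimum-norm formula can be checked on small data. The sketch below assumes that the real SVD factors are already available in column major storage. It takes V itself, not V^T, even though SVD drivers such as ?gesvd return V^T in vt. It also assumes the singular values are sorted in decreasing order. The sketch finds the effective rank using a caller-supplied threshold tol and forms x = V_k Σ_k^{-1} c, where c holds the first k entries of U^T b. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Minimum-norm least-squares solution x = V_k * inv(S_k) * c for real data,
   given a precomputed SVD A = U * S * V^T (column major):
   u is m-by-m with leading dimension m, v is n-by-n with leading dimension n
   (V itself, not V^T), and s holds the min(m, n) singular values in
   decreasing order. Singular values <= tol are treated as zero.
   Returns the effective rank k; x must hold n entries. */
static size_t svd_min_norm_solve(size_t m, size_t n, const double *u,
                                 const double *s, const double *v,
                                 const double *b, double tol, double *x)
{
    assert(tol >= 0.0);
    size_t p = m < n ? m : n;
    size_t k = 0;
    while (k < p && s[k] > tol)          /* effective rank */
        ++k;
    for (size_t j = 0; j < n; ++j)
        x[j] = 0.0;
    for (size_t i = 0; i < k; ++i) {
        double c = 0.0;                  /* c_i = (U^T b)_i */
        for (size_t r = 0; r < m; ++r)
            c += u[r + i * m] * b[r];
        for (size_t j = 0; j < n; ++j)   /* x += (c_i / s_i) * v_i */
            x[j] += c / s[i] * v[j + i * n];
    }
    return k;
}
```

The right value of tol depends on the problem. One common heuristic is a small multiple of ε·σ1·max(m, n).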

?gebrd
Reduces a general matrix to bidiagonal form.

Syntax
lapack_int LAPACKE_sgebrd( int matrix_layout, lapack_int m, lapack_int n, float* a,
lapack_int lda, float* d, float* e, float* tauq, float* taup );
lapack_int LAPACKE_dgebrd( int matrix_layout, lapack_int m, lapack_int n, double* a,
lapack_int lda, double* d, double* e, double* tauq, double* taup );
lapack_int LAPACKE_cgebrd( int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_float* a, lapack_int lda, float* d, float* e, lapack_complex_float*
tauq, lapack_complex_float* taup );
lapack_int LAPACKE_zgebrd( int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_double* a, lapack_int lda, double* d, double* e, lapack_complex_double*
tauq, lapack_complex_double* taup );

Include Files
• mkl.h

Description

The routine reduces a general m-by-n matrix A to a bidiagonal matrix B by an orthogonal (unitary)
transformation.

If m ≥ n, the reduction is given by

$$A = Q B P^H = Q \begin{pmatrix} B_1 \\ 0 \end{pmatrix} P^H = Q_1 B_1 P^H,$$

where B1 is an n-by-n upper bidiagonal matrix, Q and P are orthogonal or, for a complex A, unitary matrices;
Q1 consists of the first n columns of Q.
If m < n, the reduction is given by

$$A = Q B P^H = Q \begin{pmatrix} B_1 & 0 \end{pmatrix} P^H = Q B_1 P_1^H,$$

where B1 is an m-by-m lower bidiagonal matrix, Q and P are orthogonal or, for a complex A, unitary matrices;
P1 consists of the first m columns of P.

The routine does not form the matrices Q and P explicitly, but represents them as products of elementary
reflectors. Routines are provided to work with the matrices Q and P in this representation:
If the matrix A is real,

• to compute Q and P explicitly, call orgbr.

• to multiply a general matrix by Q or P, call ormbr.
If the matrix A is complex,

• to compute Q and P explicitly, call ungbr.

• to multiply a general matrix by Q or P, call unmbr.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

m The number of rows in the matrix A (m≥ 0).

n The number of columns in A (n≥ 0).

a Array:
a (size max(1, lda*n) for column major layout and max(1, lda*m) for row
major layout) contains the matrix A.

lda The leading dimension of a; at least max(1, m) for column major layout
and at least max(1, n) for row major layout.

Output Parameters

a If m ≥ n, the diagonal and first superdiagonal of a are overwritten by the
upper bidiagonal matrix B. The elements below the diagonal, with the array
tauq, represent the orthogonal/unitary matrix Q as a product of elementary
reflectors, and the elements above the first superdiagonal, with the array
taup, represent the orthogonal/unitary matrix P as a product of elementary
reflectors.
If m < n, the diagonal and first subdiagonal of a are overwritten by the
lower bidiagonal matrix B. The elements below the first subdiagonal, with
the array tauq, represent the orthogonal/unitary matrix Q as a product of
elementary reflectors, and the elements above the diagonal, with the array
taup, represent the orthogonal/unitary matrix P as a product of elementary
reflectors.

d Array, size at least max(1, min(m, n)). Contains the diagonal elements of B.

e Array, size at least max(1, min(m, n) - 1). Contains the off-diagonal
elements of B.

tauq, taup Arrays, size at least max(1, min(m, n)). The scalar factors of the
elementary reflectors that represent the orthogonal or unitary matrices Q
(tauq) and P (taup).
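To inspect the result, d and e can be expanded into the dense bidiagonal block B1. The sketch below does this for the real flavors. It places e on the superdiagonal when m ≥ n and on the subdiagonal when m < n, as described above. The helper name is hypothetical. The output is column major with leading dimension min(m, n).

```c
#include <assert.h>
#include <stddef.h>

/* Expand d (length p = min(m, n)) and e (length p - 1) returned by ?gebrd
   into the dense p-by-p bidiagonal block B1, column major with leading
   dimension p: upper bidiagonal when m >= n, lower bidiagonal when m < n. */
static void gebrd_expand_b1(size_t m, size_t n, const double *d,
                            const double *e, double *b1)
{
    size_t p = m < n ? m : n;
    assert(p > 0);
    for (size_t t = 0; t < p * p; ++t)
        b1[t] = 0.0;
    for (size_t i = 0; i < p; ++i) {
        b1[i + i * p] = d[i];                 /* diagonal: B1(i, i) */
        if (i + 1 < p) {
            if (m >= n)
                b1[i + (i + 1) * p] = e[i];   /* superdiagonal: B1(i, i+1) */
            else
                b1[(i + 1) + i * p] = e[i];   /* subdiagonal: B1(i+1, i) */
        }
    }
}
```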

Return Values
This function returns a value info.


If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed matrices Q, B, and P satisfy $QBP^H = A + E$, where $\|E\|_2 = c(n)\varepsilon \|A\|_2$, c(n) is a
modestly increasing function of n, and ε is the machine precision.
The approximate number of floating-point operations for real flavors is
(4/3)*n^2*(3*m - n) for m ≥ n,
(4/3)*m^2*(3*n - m) for m < n.
The number of operations for complex flavors is four times greater.
If n is much less than m, it can be more efficient to first form the QR factorization of A by calling geqrf and
then reduce the factor R to bidiagonal form. This requires approximately 2*n^2*(m + n) floating-point
operations.
If m is much less than n, it can be more efficient to first form the LQ factorization of A by calling gelqf and
then reduce the factor L to bidiagonal form. This requires approximately 2*m^2*(m + n) floating-point
operations.
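The operation counts above can be compared directly to choose a path. This sketch evaluates both estimates for the real flavors, using hypothetical helper names. Going only by these formulas, factoring first becomes cheaper once m exceeds about 5n/3, and symmetrically once n exceeds about 5m/3. Actual run times also depend on the implementation and the hardware.

```c
#include <assert.h>

/* Approximate real-flavor flop count of ?gebrd on an m-by-n matrix
   (multiply by 4 for complex flavors). */
static double gebrd_flops(double m, double n)
{
    assert(m >= 0.0 && n >= 0.0);
    return m >= n ? (4.0 / 3.0) * n * n * (3.0 * m - n)
                  : (4.0 / 3.0) * m * m * (3.0 * n - m);
}

/* Approximate cost of first factoring (?geqrf when m >= n, ?gelqf otherwise)
   and then reducing the small triangular factor to bidiagonal form. */
static double factor_then_gebrd_flops(double m, double n)
{
    assert(m >= 0.0 && n >= 0.0);
    return m >= n ? 2.0 * n * n * (m + n) : 2.0 * m * m * (m + n);
}
```

For a 1000-by-100 matrix, for example, the estimates give about 3.9e7 flops for direct reduction and 2.2e7 flops for the QR-first path.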

?gbbrd
Reduces a general band matrix to bidiagonal form.

Syntax
lapack_int LAPACKE_sgbbrd( int matrix_layout, char vect, lapack_int m, lapack_int n,
lapack_int ncc, lapack_int kl, lapack_int ku, float* ab, lapack_int ldab, float* d,
float* e, float* q, lapack_int ldq, float* pt, lapack_int ldpt, float* c, lapack_int
ldc );
lapack_int LAPACKE_dgbbrd( int matrix_layout, char vect, lapack_int m, lapack_int n,
lapack_int ncc, lapack_int kl, lapack_int ku, double* ab, lapack_int ldab, double* d,
double* e, double* q, lapack_int ldq, double* pt, lapack_int ldpt, double* c, lapack_int
ldc );
lapack_int LAPACKE_cgbbrd( int matrix_layout, char vect, lapack_int m, lapack_int n,
lapack_int ncc, lapack_int kl, lapack_int ku, lapack_complex_float* ab, lapack_int
ldab, float* d, float* e, lapack_complex_float* q, lapack_int ldq,
lapack_complex_float* pt, lapack_int ldpt, lapack_complex_float* c, lapack_int ldc );
lapack_int LAPACKE_zgbbrd( int matrix_layout, char vect, lapack_int m, lapack_int n,
lapack_int ncc, lapack_int kl, lapack_int ku, lapack_complex_double* ab, lapack_int
ldab, double* d, double* e, lapack_complex_double* q, lapack_int ldq,
lapack_complex_double* pt, lapack_int ldpt, lapack_complex_double* c, lapack_int ldc );

Include Files
• mkl.h

Description
The routine reduces an m-by-n band matrix A to an upper bidiagonal matrix B: A = Q*B*P^H. Here the matrices
Q and P are orthogonal (for real A) or unitary (for complex A). They are determined as products of Givens
rotation matrices, and may be formed explicitly by the routine if required. The routine can also update a
matrix C as follows: C = Q^H*C.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

vect Must be 'N' or 'Q' or 'P' or 'B'.

If vect = 'N', neither Q nor P^H is generated.

If vect = 'Q', the routine generates the matrix Q.

If vect = 'P', the routine generates the matrix P^H.

If vect = 'B', the routine generates both Q and P^H.

m The number of rows in the matrix A (m≥ 0).

n The number of columns in A (n≥ 0).

ncc The number of columns in C (ncc≥ 0).

kl The number of sub-diagonals within the band of A (kl≥ 0).

ku The number of super-diagonals within the band of A (ku≥ 0).

ab, c Arrays:
ab(size max(1, ldab*n) for column major layout and max(1, ldab*m) for
row major layout) contains the matrix A in band storage (see Matrix
Storage Schemes).
c(size max(1, ldc*ncc) for column major layout and max(1, ldc*m) for
row major layout) contains an m-by-ncc matrix C.
If ncc = 0, the array c is not referenced.

ldab The leading dimension of the array ab (ldab≥kl + ku + 1).

ldq The leading dimension of the output array q.

ldq≥ max(1, m) if vect = 'Q' or 'B', ldq≥ 1 otherwise.

ldpt The leading dimension of the output array pt.

ldpt≥ max(1, n) if vect = 'P' or 'B', ldpt≥ 1 otherwise.

ldc The leading dimension of the array c.

ldc≥ max(1, m) if ncc > 0; ldc≥ 1 if ncc = 0.
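The ab array uses the standard LAPACK band layout. The minimal sketch below covers column major layout only, because row major band storage is arranged differently. With 0-based indices, element A(i, j) lives in row ku + i - j of column j of ab, which is why ldab must be at least kl + ku + 1. The helper names are hypothetical.

```c
#include <assert.h>

/* Column major LAPACK band storage: A(i, j) (0-based) lives in
   ab[(ku + i - j) + j * ldab] when j - ku <= i <= j + kl.
   Returns -1 for positions outside the band. */
static long band_offset(long i, long j, long kl, long ku, long ldab)
{
    assert(ldab >= kl + ku + 1);
    if (i < j - ku || i > j + kl)
        return -1;
    return (ku + i - j) + j * ldab;
}

/* Pack a dense column major m-by-n matrix a (leading dimension lda)
   into band storage ab; entries outside the band are ignored. */
static void band_pack(long m, long n, const double *a, long lda,
                      long kl, long ku, double *ab, long ldab)
{
    for (long j = 0; j < n; ++j)
        for (long i = 0; i < m; ++i) {
            long off = band_offset(i, j, kl, ku, ldab);
            if (off >= 0)
                ab[off] = a[i + j * lda];
        }
}
```

For a tridiagonal matrix (kl = ku = 1, ldab = 3), row 0 of ab holds the superdiagonal, row 1 holds the diagonal, and row 2 holds the subdiagonal.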

Output Parameters

ab Overwritten by values generated during the reduction.

d Array, size at least max(1, min(m, n)). Contains the diagonal elements of
the matrix B.

e Array, size at least max(1, min(m, n) - 1).

Contains the off-diagonal elements of B.

q, pt Arrays:
q (size max(1, ldq*m)) contains the output m-by-m matrix Q.

pt (size max(1, ldpt*n)) contains the output n-by-n matrix P^H (P^T for real flavors).

c Overwritten by the product Q^H*C.

c is not referenced if ncc = 0.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed matrices Q, B, and P satisfy Q*B*P^H = A + E, where $\|E\|_2 = c(n)\varepsilon \|A\|_2$, c(n) is a
modestly increasing function of n, and ε is the machine precision.
If m = n, the total number of floating-point operations for real flavors is approximately the sum of:

6*n^2*(kl + ku) if vect = 'N' and ncc = 0,

3*n^2*ncc*(kl + ku - 1)/(kl + ku) if C is updated, and
3*n^3*(kl + ku - 1)/(kl + ku) if either Q or P^H is generated (double this if both).
To estimate the number of operations for complex flavors, use the same formulas with the coefficients 20
and 10 (instead of 6 and 3).
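The estimate can be wrapped in a small helper for sizing experiments. The sketch below treats the first term as the cost of the reduction itself, so it is always present. It adds the other terms when C is updated or when Q or P^H is generated. Reading "the sum of" this way is an interpretation. The sketch also assumes kl + ku ≥ 1, so that the ratio (kl + ku - 1)/(kl + ku) is defined. The function name is hypothetical.

```c
#include <assert.h>

/* Approximate flop count of ?gbbrd for a square n-by-n band matrix.
   vect is 'N', 'Q', 'P' or 'B'; ncc is the number of columns of C;
   is_complex selects the coefficients 20 and 10 instead of 6 and 3. */
static double gbbrd_flops(char vect, double n, double kl, double ku,
                          double ncc, int is_complex)
{
    double w = kl + ku;
    assert(w >= 1.0);                      /* sketch assumes a nonzero bandwidth */
    double c1 = is_complex ? 20.0 : 6.0;
    double c2 = is_complex ? 10.0 : 3.0;
    double frac = (w - 1.0) / w;
    double flops = c1 * n * n * w;         /* the reduction itself */
    if (ncc > 0.0)
        flops += c2 * n * n * ncc * frac;  /* updating C */
    if (vect == 'Q' || vect == 'P')
        flops += c2 * n * n * n * frac;    /* generating one of Q, P^H */
    else if (vect == 'B')
        flops += 2.0 * c2 * n * n * n * frac;
    return flops;
}
```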

?orgbr
Generates the real orthogonal matrix Q or P^T
determined by ?gebrd.

Syntax
lapack_int LAPACKE_sorgbr (int matrix_layout, char vect, lapack_int m, lapack_int n,
lapack_int k, float* a, lapack_int lda, const float* tau);
lapack_int LAPACKE_dorgbr (int matrix_layout, char vect, lapack_int m, lapack_int n,
lapack_int k, double* a, lapack_int lda, const double* tau);

Include Files
• mkl.h

Description

The routine generates the whole or part of the orthogonal matrices Q and PT formed by the routines gebrd.
Use this routine after a call to sgebrd/dgebrd. All valid combinations of arguments are described in Input
parameters. In most cases you need the following:
To compute the whole m-by-m matrix Q:

LAPACKE_?orgbr(matrix_layout, 'Q', m, m, n, a, lda, tau )

(note that the array a must have at least m columns).
To form the n leading columns of Q if m > n:

LAPACKE_?orgbr(matrix_layout, 'Q', m, n, n, a, lda, tau )

To compute the whole n-by-n matrix P^T:

LAPACKE_?orgbr(matrix_layout, 'P', n, n, m, a, lda, tau )

(note that the array a must have at least n rows).
To form the m leading rows of P^T if m < n:

LAPACKE_?orgbr(matrix_layout, 'P', m, n, m, a, lda, tau )
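The four calls above differ only in their vect, m, n, and k arguments. The sketch below collects them in hypothetical helpers (orgbr_args and the orgbr_* functions are illustrative names, not oneMKL API), assuming ?gebrd was applied to an a_rows-by-a_cols matrix A, and checks each argument tuple against the constraints listed under Input Parameters.

```c
#include <stdbool.h>

/* Hypothetical helper type (not part of oneMKL): the vect, m, n, k
   arguments that would be passed to LAPACKE_?orgbr. */
typedef struct {
    char vect; /* 'Q' or 'P' */
    int m;     /* rows of the matrix to generate */
    int n;     /* columns of the matrix to generate */
    int k;     /* dimension of the original matrix reduced by ?gebrd */
} orgbr_args;

/* In all helpers, ?gebrd was applied to an a_rows-by-a_cols matrix A. */

/* Whole a_rows-by-a_rows matrix Q. */
static orgbr_args orgbr_whole_q(int a_rows, int a_cols) {
    orgbr_args r = {'Q', a_rows, a_rows, a_cols};
    return r;
}

/* Leading a_cols columns of Q (intended for a_rows > a_cols). */
static orgbr_args orgbr_leading_q(int a_rows, int a_cols) {
    orgbr_args r = {'Q', a_rows, a_cols, a_cols};
    return r;
}

/* Whole a_cols-by-a_cols matrix P^T. */
static orgbr_args orgbr_whole_pt(int a_rows, int a_cols) {
    orgbr_args r = {'P', a_cols, a_cols, a_rows};
    return r;
}

/* Leading a_rows rows of P^T (intended for a_rows < a_cols). */
static orgbr_args orgbr_leading_pt(int a_rows, int a_cols) {
    orgbr_args r = {'P', a_rows, a_cols, a_rows};
    return r;
}

static int min_int(int x, int y) { return x < y ? x : y; }

/* Documented constraints:
   vect = 'Q': m >= n >= min(m, k);  vect = 'P': n >= m >= min(n, k). */
static bool orgbr_args_valid(orgbr_args a) {
    if (a.m < 0 || a.n < 0 || a.k < 0)
        return false;
    if (a.vect == 'Q')
        return a.m >= a.n && a.n >= min_int(a.m, a.k);
    if (a.vect == 'P')
        return a.n >= a.m && a.m >= min_int(a.n, a.k);
    return false;
}
```

For example, orgbr_whole_q(5, 3) yields ('Q', 5, 5, 3), the arguments of LAPACKE_sorgbr(matrix_layout, 'Q', 5, 5, 3, a, lda, tau), where a must then have room for 5 columns.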

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)

or column major (LAPACK_COL_MAJOR).

vect Must be 'Q' or 'P'.

If vect = 'Q', the routine generates the matrix Q.

If vect = 'P', the routine generates the matrix P^T.

m, n The number of rows (m) and columns (n) in the matrix Q or P^T to be
returned (m ≥ 0, n ≥ 0).

If vect = 'Q', m ≥ n ≥ min(m, k).

If vect = 'P', n ≥ m ≥ min(n, k).

k If vect = 'Q', the number of columns in the original m-by-k matrix
reduced by ?gebrd.
If vect = 'P', the number of rows in the original k-by-n matrix reduced
by ?gebrd.

a Array, size at least lda*n for column major layout and lda*m for row major
layout. The vectors which define the elementary reflectors, as returned by
gebrd.

lda The leading dimension of the array a. lda ≥ max(1, m) for column major
layout and at least max(1, n) for row major layout.

tau Array, size min(m, k) if vect = 'Q', min(n, k) if vect = 'P'.
Scalar factors of the elementary reflectors H(i) or G(i), which determine Q
and P^T, as returned by ?gebrd in the array tauq or taup.

Output Parameters

a Overwritten by the orthogonal matrix Q or P^T (or the leading rows or
columns thereof) as specified by vect, m, and n.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed matrix Q differs from an exactly orthogonal matrix by a matrix E such that ||E||_2 = O(ε).

The approximate numbers of floating-point operations for the cases listed in Description are as follows:

To form the whole of Q:

(4/3)*n*(3m^2 - 3m*n + n^2) if m > n;
(4/3)*m^3 if m ≤ n.
To form the n leading columns of Q when m > n:

(2/3)*n^2*(3m - n).
To form the whole of P^T:
(4/3)*n^3 if m ≥ n;
(4/3)*m*(3n^2 - 3m*n + m^2) if m < n.
To form the m leading rows of P^T when m < n:

(2/3)*m^2*(3n - m).
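These estimates can be evaluated with a few small helpers. This is a sketch under two assumptions: m and n are the dimensions of the matrix A that ?gebrd reduced (as in Description), and the complex routine ?ungbr is modeled as four times the real cost, which matches the (16/3) and (8/3) coefficients documented for it. The function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical helpers (not oneMKL API): approximate flop counts for
   ?orgbr, where m and n are the dimensions of the matrix A reduced by
   ?gebrd. For ?ungbr the documented coefficients are 4 times larger,
   so is_complex scales the real estimate by 4. */

static double flop_scale(bool is_complex) { return is_complex ? 4.0 : 1.0; }

/* Whole m-by-m matrix Q. */
static double orgbr_flops_whole_q(double m, double n, bool is_complex) {
    double f = (m > n) ? 4.0 * n * (3.0 * m * m - 3.0 * m * n + n * n) / 3.0
                       : 4.0 * m * m * m / 3.0;
    return flop_scale(is_complex) * f;
}

/* Leading n columns of Q (m > n). */
static double orgbr_flops_leading_q(double m, double n, bool is_complex) {
    return flop_scale(is_complex) * (2.0 * n * n * (3.0 * m - n) / 3.0);
}

/* Whole n-by-n matrix P^T (P^H for ?ungbr). */
static double orgbr_flops_whole_p(double m, double n, bool is_complex) {
    double f = (m >= n) ? 4.0 * n * n * n / 3.0
                        : 4.0 * m * (3.0 * n * n - 3.0 * m * n + m * m) / 3.0;
    return flop_scale(is_complex) * f;
}

/* Leading m rows of P^T (m < n). */
static double orgbr_flops_leading_p(double m, double n, bool is_complex) {
    return flop_scale(is_complex) * (2.0 * m * m * (3.0 * n - m) / 3.0);
}
```

A quick sanity check on the formulas: for m = n, the "whole" and "leading" variants coincide at (4/3)*n^3, since generating the leading n columns of an n-by-n Q is the same as generating all of it.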
The complex counterpart of this routine is ungbr.

?ormbr
Multiplies an arbitrary real matrix by the real
orthogonal matrix Q or P^T determined by ?gebrd.

Syntax
lapack_int LAPACKE_sormbr (int matrix_layout, char vect, char side, char trans,
lapack_int m, lapack_int n, lapack_int k, const float* a, lapack_int lda, const float*
tau, float* c, lapack_int ldc);
lapack_int LAPACKE_dormbr (int matrix_layout, char vect, char side, char trans,
lapack_int m, lapack_int n, lapack_int k, const double* a, lapack_int lda, const
double* tau, double* c, lapack_int ldc);

Include Files
• mkl.h

Description

Given an arbitrary real matrix C, this routine forms one of the matrix products Q*C, Q^T*C, C*Q, C*Q^T, P*C,
P^T*C, C*P, or C*P^T, where Q and P are orthogonal matrices computed by a call to ?gebrd. The routine
overwrites C with the product.

Input Parameters
In the descriptions below, r denotes the order of Q or P^T:
If side = 'L', r = m; if side = 'R', r = n.

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)

or column major (LAPACK_COL_MAJOR).

vect Must be 'Q' or 'P'.

If vect = 'Q', then Q or Q^T is applied to C.

If vect = 'P', then P or P^T is applied to C.

side Must be 'L' or 'R'.

If side = 'L', multipliers are applied to C from the left.

If side = 'R', they are applied to C from the right.

trans Must be 'N' or 'T'.

If trans = 'N', then Q or P is applied to C.

If trans = 'T', then Q^T or P^T is applied to C.

m The number of rows in C.

n The number of columns in C.

k One of the dimensions of A in ?gebrd:

If vect = 'Q', the number of columns in A;

If vect = 'P', the number of rows in A.

Constraints: m≥ 0, n≥ 0, k≥ 0.

a, c Arrays:
a is the array a as returned by ?gebrd.

The size of a depends on the value of the matrix_layout, vect, and side
parameters ("any" means the size does not depend on that parameter):

matrix_layout   vect   side   size of a
column major    'Q'    any    max(1, lda*k)
column major    'P'    'L'    max(1, lda*m)
column major    'P'    'R'    max(1, lda*n)
row major       'Q'    'L'    max(1, lda*m)
row major       'Q'    'R'    max(1, lda*n)
row major       'P'    any    max(1, lda*k)

c (size max(1, ldc*n) for column major layout and max(1, ldc*m) for row
major layout) holds the matrix C.

lda The leading dimension of a. Constraints:

lda≥ max(1, r) for column major layout and at least max(1, k) for row
major layout if vect = 'Q';

lda≥ max(1, min(r,k)) for column major layout and at least max(1, r) for
row major layout if vect = 'P'.

ldc The leading dimension of c; ldc ≥ max(1, m) for column major layout and
ldc ≥ max(1, n) for row major layout.

tau Array, size at least max (1, min(r, k)).

For vect = 'Q', the array tauq as returned by ?gebrd. For vect = 'P',
the array taup as returned by ?gebrd.
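The size rules above (r, lda, the table for a, ldc, and tau) can be gathered into a few helpers, for example to validate buffers before calling LAPACKE_sormbr or LAPACKE_dormbr. This is a sketch with hypothetical function names that mirrors only the documented minimums; the library performs its own argument checks and reports violations through info = -i.

```c
#include <stdbool.h>

/* Hypothetical helpers (not oneMKL API) mirroring the documented minimum
   sizes for ?ormbr. row_major selects the LAPACK_ROW_MAJOR rules. */

static long max_l(long x, long y) { return x > y ? x : y; }
static long min_l(long x, long y) { return x < y ? x : y; }

/* r: order of Q or P; m if side = 'L', n if side = 'R'. */
static long ormbr_r(char side, long m, long n) {
    return side == 'L' ? m : n;
}

/* Minimum lda. */
static long ormbr_min_lda(bool row_major, char vect, char side,
                          long m, long n, long k) {
    long r = ormbr_r(side, m, n);
    if (vect == 'Q')
        return row_major ? max_l(1, k) : max_l(1, r);
    return row_major ? max_l(1, r) : max_l(1, min_l(r, k)); /* vect = 'P' */
}

/* Minimum number of elements in a, following the size table. */
static long ormbr_min_a_size(bool row_major, char vect, char side,
                             long m, long n, long k, long lda) {
    bool uses_k = row_major ? (vect == 'P') : (vect == 'Q');
    return max_l(1, lda * (uses_k ? k : ormbr_r(side, m, n)));
}

/* Minimum ldc and number of elements in c. */
static long ormbr_min_ldc(bool row_major, long m, long n) {
    return row_major ? max_l(1, n) : max_l(1, m);
}

static long ormbr_min_c_size(bool row_major, long m, long n, long ldc) {
    return max_l(1, ldc * (row_major ? m : n));
}

/* Minimum number of elements in tau. */
static long ormbr_min_tau(char side, long m, long n, long k) {
    return max_l(1, min_l(ormbr_r(side, m, n), k));
}
```

For instance, with a 6-by-4 matrix C, Q applied from the left, and k = 3, the helpers give lda ≥ 6 in column major layout but lda ≥ 3 in row major layout, while a needs 18 elements in both cases.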


Output Parameters

c Overwritten by the product Q*C, Q^T*C, C*Q, C*Q^T, P*C, P^T*C, C*P, or C*P^T,
as specified by vect, side, and trans.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed product differs from the exact product by a matrix E such that ||E||_2 = O(ε)*||C||_2.

The total number of floating-point operations is approximately

2*n*k*(2*m - k) if side = 'L' and m ≥ k;
2*m*k*(2*n - k) if side = 'R' and n ≥ k;
2*m^2*n if side = 'L' and m < k;
2*n^2*m if side = 'R' and n < k.
The complex counterpart of this routine is unmbr.
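The operation counts above, together with the corresponding ones for the complex routine ?unmbr (coefficient 8 instead of 2), can be evaluated by one small function. A sketch; the function name is hypothetical.

```c
#include <stdbool.h>

/* Hypothetical helper (not oneMKL API): approximate flop count of ?ormbr
   (real, coefficient 2) or ?unmbr (complex, coefficient 8).
   m, n: dimensions of C; k: as passed to the routine. */
static double ormbr_flops(char side, double m, double n, double k,
                          bool is_complex) {
    double c = is_complex ? 8.0 : 2.0;
    if (side == 'L')
        return (m >= k) ? c * n * k * (2.0 * m - k) : c * m * m * n;
    /* side = 'R' */
    return (n >= k) ? c * m * k * (2.0 * n - k) : c * n * n * m;
}
```

The two branches agree at the boundary: for side = 'L' and m = k, 2*n*k*(2*m - k) reduces to 2*m^2*n.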

?ungbr
Generates the complex unitary matrix Q or P^H
determined by ?gebrd.

Syntax
lapack_int LAPACKE_cungbr (int matrix_layout, char vect, lapack_int m, lapack_int n,
lapack_int k, lapack_complex_float* a, lapack_int lda, const lapack_complex_float*
tau);
lapack_int LAPACKE_zungbr (int matrix_layout, char vect, lapack_int m, lapack_int n,
lapack_int k, lapack_complex_double* a, lapack_int lda, const lapack_complex_double*
tau);

Include Files
• mkl.h

Description
The routine generates the whole or part of the unitary matrices Q and P^H formed by the routine ?gebrd. Use
this routine after a call to cgebrd/zgebrd. All valid combinations of arguments are described in Input
Parameters; in most cases you need the following:
To compute the whole m-by-m matrix Q, use:

LAPACKE_?ungbr(matrix_layout, 'Q', m, m, n, a, lda, tau)

(note that the array a must have at least m columns).
To form the n leading columns of Q if m > n, use:

LAPACKE_?ungbr(matrix_layout, 'Q', m, n, n, a, lda, tau)

To compute the whole n-by-n matrix P^H, use:

LAPACKE_?ungbr(matrix_layout, 'P', n, n, m, a, lda, tau)

(note that the array a must have at least n rows).
To form the m leading rows of P^H if m < n, use:

LAPACKE_?ungbr(matrix_layout, 'P', m, n, m, a, lda, tau)

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)

or column major (LAPACK_COL_MAJOR).

vect Must be 'Q' or 'P'.

If vect = 'Q', the routine generates the matrix Q.

If vect = 'P', the routine generates the matrix P^H.

m The number of required rows of Q or P^H.

n The number of required columns of Q or P^H.

k One of the dimensions of A in ?gebrd:

If vect = 'Q', the number of columns in A;

If vect = 'P', the number of rows in A.

Constraints: m≥ 0, n≥ 0, k≥ 0.

For vect = 'Q': k≤n≤m if m > k, or m = n if m≤k.

For vect = 'P': k≤m≤n if n > k, or m = n if n≤k.

a Array, size at least lda*n for column major layout and lda*m for row major
layout; the array a as returned by ?gebrd.

lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.

tau For vect = 'Q', the array tauq as returned by ?gebrd. For vect = 'P',
the array taup as returned by ?gebrd.

The dimension of tau must be at least max(1, min(m, k)) for vect = 'Q',
or max(1, min(n, k)) for vect = 'P'.

Output Parameters

a Overwritten by the unitary matrix Q or P^H (or the leading rows or
columns thereof) as specified by vect, m, and n.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.


Application Notes
The computed matrix Q differs from an exactly unitary matrix by a matrix E such that ||E||_2 = O(ε).
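A common practical check of this property after calling ?ungbr is to measure how far Q^H*Q is from the identity; this is a proxy for the bound above, not the same quantity. The sketch below assumes a column-major m-by-n array whose columns are expected to be orthonormal, and uses C99 double complex elements rather than lapack_complex_double (whose definition in oneMKL may differ). The function name is hypothetical. It returns the squared Frobenius norm of Q^H*Q - I so that no math library is needed; take the square root for the norm itself. For a matrix that is unitary to working precision the result should be tiny compared with 1.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical helper (not oneMKL API): squared Frobenius norm of
   Q^H*Q - I for an m-by-n complex matrix Q stored column-major with
   leading dimension ldq. Take sqrt() of the result for the norm. */
static double unitarity_residual_sq(int m, int n, const double complex *q,
                                    int ldq) {
    double sum = 0.0;
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < n; ++i) {
            /* g = (Q^H Q)(i, j) = sum over r of conj(Q(r, i)) * Q(r, j) */
            double complex g = 0.0;
            for (int r = 0; r < m; ++r)
                g += conj(q[r + (size_t)i * ldq]) * q[r + (size_t)j * ldq];
            if (i == j)
                g -= 1.0; /* subtract the identity */
            sum += creal(g) * creal(g) + cimag(g) * cimag(g);
        }
    }
    return sum;
}
```

For the same check on a real Q from ?orgbr, the conjugation simply has no effect, so a real-valued copy of this loop with the transpose in place of the conjugate transpose is sufficient.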

The approximate numbers of floating-point operations are listed below:

To compute the whole matrix Q:
(16/3)*n*(3m^2 - 3m*n + n^2) if m > n;
(16/3)*m^3 if m ≤ n.
To form the n leading columns of Q when m > n:

(8/3)*n^2*(3m - n).
To compute the whole matrix P^H:
(16/3)*n^3 if m ≥ n;
(16/3)*m*(3n^2 - 3m*n + m^2) if m < n.
To form the m leading rows of P^H when m < n:

(8/3)*m^2*(3n - m).

The real counterpart of this routine is orgbr.

?unmbr
Multiplies an arbitrary complex matrix by the unitary
matrix Q or P determined by ?gebrd.

Syntax
lapack_int LAPACKE_cunmbr (int matrix_layout, char vect, char side, char trans,
lapack_int m, lapack_int n, lapack_int k, const lapack_complex_float* a, lapack_int
lda, const lapack_complex_float* tau, lapack_complex_float* c, lapack_int ldc);
lapack_int LAPACKE_zunmbr (int matrix_layout, char vect, char side, char trans,
lapack_int m, lapack_int n, lapack_int k, const lapack_complex_double* a, lapack_int
lda, const lapack_complex_double* tau, lapack_complex_double* c, lapack_int ldc);

Include Files
• mkl.h

Description

Given an arbitrary complex matrix C, this routine forms one of the matrix products Q*C, Q^H*C, C*Q, C*Q^H,
P*C, P^H*C, C*P, or C*P^H, where Q and P are unitary matrices computed by a call to ?gebrd. The routine
overwrites C with the product.

Input Parameters
In the descriptions below, r denotes the order of Q or P^H:
If side = 'L', r = m; if side = 'R', r = n.

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)

or column major (LAPACK_COL_MAJOR).

vect Must be 'Q' or 'P'.

If vect = 'Q', then Q or Q^H is applied to C.

If vect = 'P', then P or P^H is applied to C.

side Must be 'L' or 'R'.

If side = 'L', multipliers are applied to C from the left.

If side = 'R', they are applied to C from the right.

trans Must be 'N' or 'C'.

If trans = 'N', then Q or P is applied to C.

If trans = 'C', then Q^H or P^H is applied to C.

m The number of rows in C.

n The number of columns in C.

k One of the dimensions of A in ?gebrd:

If vect = 'Q', the number of columns in A;

If vect = 'P', the number of rows in A.

Constraints: m≥ 0, n≥ 0, k≥ 0.

a, c Arrays:
a is the array a as returned by ?gebrd.

The size of a depends on the value of the matrix_layout, vect, and side
parameters ("any" means the size does not depend on that parameter):

matrix_layout   vect   side   size of a
column major    'Q'    any    max(1, lda*k)
column major    'P'    'L'    max(1, lda*m)
column major    'P'    'R'    max(1, lda*n)
row major       'Q'    'L'    max(1, lda*m)
row major       'Q'    'R'    max(1, lda*n)
row major       'P'    any    max(1, lda*k)

c (size max(1, ldc*n) for column major layout and max(1, ldc*m) for row
major layout) holds the matrix C.

lda The leading dimension of a. Constraints:

lda≥ max(1, r) for column major layout and at least max(1, k) for row
major layout if vect = 'Q';

lda≥ max(1, min(r,k)) for column major layout and at least max(1, r) for
row major layout if vect = 'P'.

ldc The leading dimension of c; ldc ≥ max(1, m) for column major layout and
ldc ≥ max(1, n) for row major layout.

tau Array, size at least max (1, min(r, k)).

For vect = 'Q', the array tauq as returned by ?gebrd. For vect = 'P',
the array taup as returned by ?gebrd.

Output Parameters

c Overwritten by the product Q*C, Q^H*C, C*Q, C*Q^H, P*C, P^H*C, C*P, or
C*P^H, as specified by vect, side, and trans.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed product differs from the exact product by a matrix E such that ||E||_2 = O(ε)*||C||_2.

The total number of floating-point operations is approximately

8*n*k*(2*m - k) if side = 'L' and m ≥ k;
8*m*k*(2*n - k) if side = 'R' and n ≥ k;
8*m^2*n if side = 'L' and m < k;
8*n^2*m if side = 'R' and n < k.
The real counterpart of this routine is ormbr.

?bdsqr
Computes the singular value decomposition of a
general matrix that has been reduced to bidiagonal
form.

Syntax
lapack_int LAPACKE_sbdsqr( int matrix_layout, char uplo, lapack_int n, lapack_int ncvt,
lapack_int nru, lapack_int ncc, float* d, float* e, float* vt, lapack_int ldvt, float*
u, lapack_int ldu, float* c, lapack_int ldc );
lapack_int LAPACKE_dbdsqr( int matrix_layout, char uplo, lapack_int n, lapack_int ncvt,
lapack_int nru, lapack_int ncc, double* d, double* e, double* vt, lapack_int ldvt,
double* u, lapack_int ldu, double* c, lapack_int ldc );
lapack_int LAPACKE_cbdsqr( int matrix_layout, char uplo, lapack_int n, lapack_int ncvt,
lapack_int nru, lapack_int ncc, float* d, float* e, lapack_complex_float* vt,
lapack_int ldvt, lapack_complex_float* u, lapack_int ldu, lapack_complex_float* c,
lapack_int ldc );
lapack_int LAPACKE_zbdsqr( int matrix_layout, char uplo, lapack_int n, lapack_int ncvt,
lapack_int nru, lapack_int ncc, double* d, double* e, lapack_complex_double* vt,
lapack_int ldvt, lapack_complex_double* u, lapack_int ldu, lapack_complex_double* c,
lapack_int ldc );

Include Files
• mkl.h

Description

The routine computes the singular values and, optionally, the right and/or left singular vectors from the
Singular Value Decomposition (SVD) of a real n-by-n (upper or lower) bidiagonal matrix B using the implicit
zero-shift QR algorithm. The SVD of B has the form B = Q*S*P^H, where S is the diagonal matrix of singular
values, Q is an orthogonal matrix of left singular vectors, and P is an orthogonal matrix of right singular
vectors. If left singular vectors are requested, this routine actually returns U*Q instead of Q, and if right
singular vectors are requested, it returns P^H*VT instead of P^H, for given real/complex input
matrices U and VT. When U and VT are the orthogonal/unitary matrices that reduce a general matrix A to
bidiagonal form, A = U*B*VT, as computed by ?gebrd, then

A = (U*Q)*S*(P^H*VT)
is the SVD of A. Optionally, the routine may also compute Q^H*C for a given real/complex input matrix C.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)

or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

If uplo = 'U', B is an upper bidiagonal matrix.

If uplo = 'L', B is a lower bidiagonal matrix.

n The order of the matrix B (n≥ 0).

ncvt The number of columns of the matrix VT, that is, the number of right
singular vectors (ncvt≥ 0).

Set ncvt = 0 if no right singular vectors are required.

nru The number of rows in U, that is, the number of left singular vectors (nru ≥ 0).
Set nru = 0 if no left singular vectors are required.

ncc The number of columns in the matrix C used for computing the product
Q^H*C (ncc ≥ 0). Set ncc = 0 if no matrix C is supplied.

d, e Arrays:
d contains the diagonal elements of B.
The size of d must be at least max(1, n).

e contains the (n-1) off-diagonal elements of B.

The size of e must be at least max(1, n - 1).
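When testing ?bdsqr on small inputs, it can help to expand d and e into a dense matrix B. A minimal sketch, assuming column-major storage, 0-based indexing, and the usual LAPACK convention that e[i] is B(i, i+1) when uplo = 'U' and B(i+1, i) when uplo = 'L'; bidiag_to_dense is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical helper (not oneMKL API): expands the bidiagonal matrix B
   given by d (n diagonal elements) and e (n-1 off-diagonal elements) into
   a dense n-by-n column-major array b with leading dimension ldb. */
static void bidiag_to_dense(char uplo, int n, const double *d, const double *e,
                            double *b, int ldb) {
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i)
            b[i + (size_t)j * ldb] = 0.0;
    for (int i = 0; i < n; ++i)
        b[i + (size_t)i * ldb] = d[i];              /* B(i, i) */
    for (int i = 0; i + 1 < n; ++i) {
        if (uplo == 'U')
            b[i + (size_t)(i + 1) * ldb] = e[i];    /* B(i, i+1) */
        else
            b[(i + 1) + (size_t)i * ldb] = e[i];    /* B(i+1, i) */
    }
}
```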

vt, u, c Arrays:
vt, size max(1, ldvt*ncvt) for column major layout and max(1, ldvt*n)
for row major layout, contains an n-by-ncvt matrix VT.
vt is not referenced if ncvt = 0.

u, size max(1, ldu*n) for column major layout and max(1, ldu*nru) for
row major layout, contains an nru-by-n matrix U.
u is not referenced if nru = 0.

c, size max(1, ldc*ncc) for column major layout and max(1, ldc*n) for
row major layout, contains the n-by-ncc matrix C for computing the
product Q^H*C.

ldvt The leading dimension of vt. Constraints:

ldvt≥ max(1, n) if ncvt > 0 for column major layout and ldvt≥ max(1,
ncvt) for row major layout;
ldvt≥ 1 if ncvt = 0.

ldu The leading dimension of u. Constraint:

ldu ≥ max(1, nru) for column major layout and ldu ≥ max(1, n) for row
major layout.

ldc The leading dimension of c. Constraints:

ldc≥ max(1, n) if ncc > 0 for column major layout and ldc≥ max(1, ncc)
for row major layout; ldc≥ 1 otherwise.

Output Parameters

d On exit, if info = 0, overwritten by the singular values in decreasing order
(see info).

e On exit, if info = 0, e is destroyed. See also info below.

c Overwritten by the product Q^H*C.

vt On exit, this array is overwritten by P^H*VT. Not referenced if ncvt = 0.

u On exit, this array is overwritten by U*Q. Not referenced if nru = 0.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info > 0,

If ncvt = nru = ncc = 0,

• info = 1, a split was marked by a positive value in e
• info = 2, the current block of z was not diagonalized after 100*n iterations (in the inner while loop)
• info = 3, the termination criterion of the outer while loop was not met (the program created more than n
unreduced blocks).
In all other cases when ncvt, nru, or ncc > 0, the algorithm did not converge; d and e contain the elements
of a bidiagonal matrix that is orthogonally similar to the input matrix B; if info = i, i elements of e have
not converged to zero.

Application Notes
Each singular value and singular vector is computed to high relative accuracy. However, the reduction to
bidiagonal form (prior to calling the routine) may decrease the relative accuracy in the small singular values
of the original matrix if its singular values vary widely in magnitude.
If σ_i is an exact singular value of B, and s_i is the corresponding computed value, then

|s_i - σ_i| ≤ p(m, n)*ε*σ_i

where p(m, n) is a modestly increasing function of m and n, and ε is the machine precision.
If only singular values are computed, they are computed more accurately than when some singular vectors
are also computed (that is, the function p(m, n) is smaller).
If u_i is the corresponding exact left singular vector of B, and w_i is the corresponding computed left singular
vector, then the angle θ(u_i, w_i) between them is bounded as follows:

θ(u_i, w_i) ≤ p(m, n)*ε / min_{j≠i}(|σ_i - σ_j| / |σ_i + σ_j|).

Here min_{j≠i}(|σ_i - σ_j| / |σ_i + σ_j|) is the relative gap between σ_i and the other singular values. A similar
error bound holds for the right singular vectors.
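The relative gap in the denominator of the angle bound can be computed directly from the singular values. A sketch with a hypothetical function name, assuming non-negative singular values such as those returned in d; two coincident zero values are treated as a zero gap.

```c
#include <math.h>

/* Hypothetical helper (not oneMKL API): relative gap of sigma[i] from the
   other singular values, min over j != i of
   |sigma_i - sigma_j| / |sigma_i + sigma_j|. */
static double relative_gap(int n, const double *sigma, int i) {
    double gap = HUGE_VAL; /* no other value: infinite gap */
    for (int j = 0; j < n; ++j) {
        if (j == i)
            continue;
        double den = fabs(sigma[i] + sigma[j]);
        /* sigma_i = sigma_j = 0: coincident values, so the gap is zero */
        double g = (den == 0.0) ? 0.0 : fabs(sigma[i] - sigma[j]) / den;
        if (g < gap)
            gap = g;
    }
    return gap;
}
```

A small relative gap, as for clustered singular values, makes the bound on θ(u_i, w_i) larger, so the corresponding computed singular vector may be less accurate even though the singular value itself is computed to high relative accuracy.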
The total number of real floating-point operations is roughly proportional to n^2 if only the singular values are
computed. About 6n^2*nru additional operations (12n^2*nru for complex flavors) are required to compute the
left singular vectors and about 6n^2*ncvt operations (12n^2*ncvt for complex flavors) to compute the right
singular vectors.

?bdsdc
Computes the singular value decomposition of a real
bidiagonal matrix using a divide and conquer method.

Syntax
lapack_int LAPACKE_sbdsdc (int matrix_layout, char uplo, char compq, lapack_int n,
float* d, float* e, float* u, lapack_int ldu, float* vt, lapack_int ldvt, float* q,
lapack_int* iq);
lapack_int LAPACKE_dbdsdc (int matrix_layout, char uplo, char compq, lapack_int n,
double* d, double* e, double* u, lapack_int ldu, double* vt, lapack_int ldvt, double* q,
lapack_int* iq);

Include Files
• mkl.h

Description
The routine computes the Singular Value Decomposition (SVD) of a real n-by-n (upper or lower) bidiagonal
matrix B: B = U*Σ*V^T, using a divide and conquer method, where Σ is a diagonal matrix with non-negative
diagonal elements (the singular values of B), and U and V are orthogonal matrices of left and right singular
vectors, respectively. ?bdsdc can be used to compute all singular values and, optionally, the singular vectors,
either explicitly or in compact form.
This routine uses ?lasd0, ?lasd1, ?lasd2, ?lasd3, ?lasd4, ?lasd5, ?lasd6, ?lasd7, ?lasd8, ?lasd9, ?lasda,
?lasdq, ?lasdt.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)

or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

If uplo = 'U', B is an upper bidiagonal matrix.

If uplo = 'L', B is a lower bidiagonal matrix.


compq Must be 'N', 'P', or 'I'.

If compq = 'N', compute singular values only.

If compq = 'P', compute singular values and compute singular vectors in
compact form.
If compq = 'I', compute singular values and singular vectors.

n The order of the matrix B (n ≥ 0).

d, e Arrays:
d contains the n diagonal elements of the bidiagonal matrix B. The size of d
must be at least max(1, n).
e contains the off-diagonal elements of the bidiagonal matrix B. The size of
e must be at least max(1, n).

ldu The leading dimension of the output array u; ldu≥ 1.

If singular vectors are desired, then ldu≥ max(1, n), regardless of the value
of matrix_layout.

ldvt The leading dimension of the output array vt; ldvt≥ 1.

If singular vectors are desired, then ldvt≥ max(1, n), regardless of the value
of matrix_layout.

Output Parameters

d If info = 0, overwritten by the singular values of B.

e On exit, e is overwritten.

u, vt, q Arrays: u (size ldu*n), vt (size ldvt*n), q (size at least n*(11 + 2*smlsiz +
8*int(log2(n/(smlsiz+1))))), where smlsiz is returned by ilaenv and is equal to the maximum size of
the subproblems at the bottom of the computation tree.
If compq = 'I', then on exit u contains the left singular vectors of the
bidiagonal matrix B, unless info ≠ 0 (see info). For other values of compq, u
is not referenced.
If compq = 'I', then on exit vt^T (the transpose of vt) contains the right singular vectors of the
bidiagonal matrix B, unless info ≠ 0 (see info). For other values of compq,
vt is not referenced.
If compq = 'P', then on exit, if info = 0, q and iq contain the left and
right singular vectors in a compact form. Specifically, q contains all the
float (for sbdsdc) or double (for dbdsdc) data for singular vectors. For
other values of compq, q is not referenced.

iq Array: iq (size at least n*(3 + 3*int(log2(n/(smlsiz+1))))), where smlsiz is
returned by ilaenv and is equal to the maximum size of the subproblems at
the bottom of the computation tree.

If compq = 'P', then on exit, if info = 0, q and iq contain the left and
right singular vectors in a compact form. Specifically, iq contains all the
lapack_int data for singular vectors. For other values of compq, iq is not
referenced.
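The q and iq sizes above depend on smlsiz, which must be queried from ilaenv. The sketch below evaluates both formulas for a given smlsiz. The helper names are hypothetical, n/(smlsiz+1) is evaluated in floating point, and int(log2(...)) is computed by repeated halving to avoid a libm dependency. For n < smlsiz + 1, where log2 would be negative, the level count is clamped at 0; this is an assumption that errs on the side of a larger buffer. The value 25 used in the example is only an illustration, not a value guaranteed by oneMKL.

```c
/* Hypothetical helpers (not oneMKL API) for ?bdsdc with compq = 'P'. */

/* int(log2(n/(smlsiz+1))), clamped at 0 when the argument is below 1. */
static int bdsdc_levels(int n, int smlsiz) {
    double x = (double)n / (smlsiz + 1);
    int levels = 0;
    while (x >= 2.0) {      /* count halvings: floor(log2(x)) for x >= 1 */
        x /= 2.0;
        ++levels;
    }
    return levels;
}

/* Minimum size of q: n*(11 + 2*smlsiz + 8*levels). */
static long bdsdc_q_size(int n, int smlsiz) {
    return (long)n * (11 + 2L * smlsiz + 8L * bdsdc_levels(n, smlsiz));
}

/* Minimum size of iq: n*(3 + 3*levels). */
static long bdsdc_iq_size(int n, int smlsiz) {
    return (long)n * (3 + 3L * bdsdc_levels(n, smlsiz));
}
```

With smlsiz = 25 and n = 100, n/(smlsiz+1) is about 3.85, so one level is counted and q needs 6900 elements while iq needs 600.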

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, the algorithm failed to compute a singular value. The update process of divide and conquer
failed.

Symmetric Eigenvalue Problems: LAPACK Computational Routines

Symmetric eigenvalue problems are posed as follows: given an n-by-n real symmetric or complex
Hermitian matrix A, find the eigenvalues λ and the corresponding eigenvectors z that satisfy the equation
Az = λz (or, equivalently, z^H*A = λ*z^H).
In such eigenvalue problems, all n eigenvalues are real not only for real symmetric but also for complex
Hermitian matrices A, and there exists an orthonormal system of n eigenvectors. If A is a symmetric or
Hermitian positive-definite matrix, all eigenvalues are positive.
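The defining equation Az = λz gives a direct way to check a computed eigenpair. A minimal sketch with a hypothetical function name that returns the squared 2-norm of the residual A*z - λ*z for a real symmetric matrix stored in full (both triangles filled in) column-major form.

```c
#include <stddef.h>

/* Hypothetical helper (not oneMKL API): squared 2-norm of A*z - lambda*z
   for an n-by-n real symmetric matrix A, column-major, leading dimension
   lda. Both triangles of A must be filled in. */
static double eig_residual_sq(int n, const double *a, int lda,
                              double lambda, const double *z) {
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        double r = -lambda * z[i];           /* start with -(lambda*z)_i */
        for (int j = 0; j < n; ++j)
            r += a[i + (size_t)j * lda] * z[j];  /* add (A*z)_i */
        sum += r * r;
    }
    return sum;
}
```

For example, A = [2 1; 1 2] (column-major {2, 1, 1, 2}) is symmetric positive-definite, with the real, positive eigenvalues 3 and 1 and eigenvectors proportional to (1, 1) and (1, -1); eig_residual_sq returns exactly 0 for both pairs.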
To solve a symmetric eigenvalue problem with LAPACK, you usually need to reduce the matrix to tridiagonal
form and then solve the eigenvalue problem with the tridiagonal matrix obtained. LAPACK includes routines
for reducing the matrix to a tridiagonal form by an orthogonal (or unitary) similarity transformation
A = Q*T*Q^H as well as for solving tridiagonal symmetric eigenvalue problems. These routines are listed in Table
"Computational Routines for Solving Symmetric Eigenvalue Problems".
There are different routines for symmetric eigenvalue problems, depending on whether you need all
eigenvectors or only some of them or eigenvalues only, whether the matrix A is positive-definite or not, and
so on.
These routines are based on three primary algorithms for computing eigenvalues and eigenvectors of
symmetric problems: the divide and conquer algorithm, the QR algorithm, and bisection followed by inverse
iteration. The divide and conquer algorithm is generally more efficient and is recommended for computing all
eigenvalues and eigenvectors. Furthermore, to solve an eigenvalue problem using the divide and conquer
algorithm, you need to call only one routine. In general, more than one routine has to be called if the QR
algorithm or bisection followed by inverse iteration is used.
Computational Routines for Solving Symmetric Eigenvalue Problems
Operation | Real symmetric matrices | Complex Hermitian matrices
Reduce to tridiagonal form A = Q*T*Q^H (full storage) | sytrd | hetrd
Reduce to tridiagonal form A = Q*T*Q^H (packed storage) | sptrd | hptrd
Reduce to tridiagonal form A = Q*T*Q^H (band storage) | sbtrd | hbtrd
Generate matrix Q (full storage) | orgtr | ungtr
Generate matrix Q (packed storage) | opgtr | upgtr
Apply matrix Q (full storage) | ormtr | unmtr
Apply matrix Q (packed storage) | opmtr | upmtr
Find all eigenvalues of a tridiagonal matrix | sterf | -
Find all eigenvalues and eigenvectors of a tridiagonal matrix T | steqr, stedc | steqr, stedc
Find all eigenvalues and eigenvectors of a tridiagonal positive-definite matrix T | pteqr | pteqr
Find selected eigenvalues of a tridiagonal matrix T | stebz, stegr | stegr
Find selected eigenvectors of a tridiagonal matrix T | stein, stegr | stein, stegr
Find selected eigenvalues and eigenvectors of a real symmetric tridiagonal matrix T | stemr | stemr
Compute the reciprocal condition numbers for the eigenvectors | disna | disna

?sytrd
Reduces a real symmetric matrix to tridiagonal form.

Syntax
lapack_int LAPACKE_ssytrd (int matrix_layout, char uplo, lapack_int n, float* a,
lapack_int lda, float* d, float* e, float* tau);
lapack_int LAPACKE_dsytrd (int matrix_layout, char uplo, lapack_int n, double* a,
lapack_int lda, double* d, double* e, double* tau);

Include Files
• mkl.h

Description

The routine reduces a real symmetric matrix A to symmetric tridiagonal form T by an orthogonal similarity
transformation: A = Q*T*Q^T. The orthogonal matrix Q is not formed explicitly but is represented as a
product of n-1 elementary reflectors. Routines are provided for working with Q in this representation (see
Application Notes below).

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)

or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

If uplo = 'U', a stores the upper triangular part of A.

If uplo = 'L', a stores the lower triangular part of A.

n The order of the matrix A (n≥ 0).

a a (size max(1, lda*n)) is an array containing either the upper or lower
triangular part of the matrix A, as specified by uplo. If uplo = 'U', the
leading n-by-n upper triangular part of a contains the upper triangular part
of the matrix A, and the strictly lower triangular part of A is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of a contains the
lower triangular part of the matrix A, and the strictly upper triangular part
of A is not referenced.

lda The leading dimension of a; at least max(1, n).

Output Parameters

a On exit,
if uplo = 'U', the diagonal and first superdiagonal of A are overwritten by
the corresponding elements of the tridiagonal matrix T, and the elements
above the first superdiagonal, with the array tau, represent the orthogonal
matrix Q as a product of elementary reflectors;
if uplo = 'L', the diagonal and first subdiagonal of A are overwritten by
the corresponding elements of the tridiagonal matrix T, and the elements
below the first subdiagonal, with the array tau, represent the orthogonal
matrix Q as a product of elementary reflectors.

d, e, tau Arrays:
d contains the diagonal elements of the matrix T.
The size of d must be at least max(1, n).
e contains the off-diagonal elements of T.
The size of e must be at least max(1, n-1).
tau stores (n-1) scalars that define elementary reflectors in decomposition
of the orthogonal matrix Q in a product of n-1 elementary reflectors. tau(n)
is used as workspace.
The size of tau must be at least max(1, n).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed matrix T is exactly similar to a matrix A+E, where ||E||_2 = c(n)*ε*||A||_2, c(n) is a
modestly increasing function of n, and ε is the machine precision.
The approximate number of floating-point operations is (4/3)*n^3.
After calling this routine, you can call the following:

orgtr to form the computed matrix Q explicitly

ormtr to multiply a real matrix by Q.


The complex counterpart of this routine is ?hetrd.
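After ?sytrd and ?orgtr, one way to check a small reduction is to rebuild Q*T*Q^T from the outputs and compare it with a saved copy of A (?sytrd overwrites a). A sketch with a hypothetical function name, assuming column-major storage, T given by the d and e arrays returned by ?sytrd, and Q formed explicitly, for example by ?orgtr. The result is the squared Frobenius norm of A - Q*T*Q^T and should be small relative to ||A||^2, in line with the backward error bound in the Application Notes.

```c
#include <stddef.h>

/* Hypothetical helper (not oneMKL API): squared Frobenius norm of
   A - Q*T*Q^T, where T is symmetric tridiagonal with diagonal d[0..n-1]
   and off-diagonal e[0..n-2]. A (full, both triangles) and Q are n-by-n,
   column-major, with leading dimensions lda and ldq. */
static double tridiag_recon_residual_sq(int n, const double *a, int lda,
                                        const double *q, int ldq,
                                        const double *d, const double *e) {
    double sum = 0.0;
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < n; ++i) {
            /* (Q*T*Q^T)(i, j) = sum_k Q(i, k) * t_k,
               where t_k = sum_l T(k, l) * Q(j, l) has at most 3 terms */
            double s = 0.0;
            for (int k = 0; k < n; ++k) {
                double t = d[k] * q[j + (size_t)k * ldq];             /* l = k */
                if (k > 0)
                    t += e[k - 1] * q[j + (size_t)(k - 1) * ldq];     /* l = k-1 */
                if (k + 1 < n)
                    t += e[k] * q[j + (size_t)(k + 1) * ldq];         /* l = k+1 */
                s += q[i + (size_t)k * ldq] * t;
            }
            double r = a[i + (size_t)j * lda] - s;
            sum += r * r;
        }
    }
    return sum;
}
```

With Q = I and A equal to the tridiagonal matrix itself the residual is exactly zero; changing one entry of A by 1 raises it to exactly 1.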

?orgtr
Generates the real orthogonal matrix Q determined
by ?sytrd.

Syntax
lapack_int LAPACKE_sorgtr (int matrix_layout, char uplo, lapack_int n, float* a,
lapack_int lda, const float* tau);
lapack_int LAPACKE_dorgtr (int matrix_layout, char uplo, lapack_int n, double* a,
lapack_int lda, const double* tau);

Include Files
• mkl.h

Description

The routine explicitly generates the n-by-n orthogonal matrix Q formed by ?sytrd when reducing a real
symmetric matrix A to tridiagonal form: A = Q*T*Q^T. Use this routine after a call to ?sytrd.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)

or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Use the same uplo as supplied to ?sytrd.

n The order of the matrix Q (n≥ 0).

a, tau Arrays:
a (size max(1, lda*n)) is the array a as returned by ?sytrd.

tau is the array tau as returned by ?sytrd.

The size of tau must be at least max(1, n-1).

lda The leading dimension of a; at least max(1, n).

Output Parameters

a Overwritten by the orthogonal matrix Q.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed matrix Q differs from an exactly orthogonal matrix by a matrix E such that ||E||_2 = O(ε),
where ε is the machine precision.

The approximate number of floating-point operations is (4/3)*n^3.
The complex counterpart of this routine is ungtr.

?ormtr
Multiplies a real matrix by the real orthogonal matrix
Q determined by ?sytrd.

Syntax
lapack_int LAPACKE_sormtr (int matrix_layout, char side, char uplo, char trans,
lapack_int m, lapack_int n, const float* a, lapack_int lda, const float* tau, float* c,
lapack_int ldc);
lapack_int LAPACKE_dormtr (int matrix_layout, char side, char uplo, char trans,
lapack_int m, lapack_int n, const double* a, lapack_int lda, const double* tau, double*
c, lapack_int ldc);

Include Files
• mkl.h

Description

The routine multiplies a real matrix C by Q or Q^T, where Q is the orthogonal matrix formed by ?sytrd when
reducing a real symmetric matrix A to tridiagonal form: A = Q*T*Q^T. Use this routine after a call to ?sytrd.

Depending on the parameters side and trans, the routine can form one of the matrix products Q*C, Q^T*C,
C*Q, or C*Q^T (overwriting the result on C).

Input Parameters
In the descriptions below, r denotes the order of Q:
If side = 'L', r = m; if side = 'R', r = n.

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)

or column major (LAPACK_COL_MAJOR).

side Must be either 'L' or 'R'.

If side = 'L', Q or Q^T is applied to C from the left.

If side = 'R', Q or Q^T is applied to C from the right.

uplo Must be 'U' or 'L'.

Use the same uplo as supplied to ?sytrd.

trans Must be either 'N' or 'T'.

If trans = 'N', the routine multiplies C by Q.

If trans = 'T', the routine multiplies C by Q^T.

m The number of rows in the matrix C (m≥ 0).

n The number of columns in C (n≥ 0).

a, c, tau a (size max(1, lda*r)) and tau are the arrays returned by ?sytrd.

The size of tau must be at least max(1, r-1).

c (size max(1, ldc*n) for column major layout and max(1, ldc*m) for row
major layout) contains the matrix C.

lda The leading dimension of a; lda≥ max(1, r).

ldc The leading dimension of c; ldc≥ max(1, m) for column major layout and
ldc≥ max(1, n) for row major layout.
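The rule that r is m for side = 'L' and n for side = 'R', combined with the layout-dependent minimum for ldc, can be expressed as a pre-call check. The sketch below is hypothetical (ormtr_dims_ok is not an MKL function) and takes the layout as a bool rather than the LAPACK_ROW_MAJOR/LAPACK_COL_MAJOR constants so that it compiles without mkl.h. The routine itself still reports an illegal argument through info = -i, so a check like this only catches mistakes earlier.

```c
#include <stdbool.h>

static int max1(int x) { return x > 1 ? x : 1; }

/* Hypothetical pre-call check for ?ormtr arguments, not an MKL function.
   r, the order of Q, is m when side == 'L' and n when side == 'R'.
   Returns true when side, m, n, lda and ldc meet the documented rules. */
bool ormtr_dims_ok(bool row_major, char side, int m, int n, int lda, int ldc)
{
    if (side != 'L' && side != 'R')
        return false;
    if (m < 0 || n < 0)
        return false;
    int r = (side == 'L') ? m : n;
    /* a holds the r-by-r reflector data returned by ?sytrd. */
    if (lda < max1(r))
        return false;
    /* C is m-by-n: its leading dimension counts rows in column-major
       storage and columns in row-major storage. */
    int min_ldc = row_major ? max1(n) : max1(m);
    return ldc >= min_ldc;
}
```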

Output Parameters

c Overwritten by the product Q*C, Q^T*C, C*Q, or C*Q^T (as specified by side
and trans).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed product differs from the exact product by a matrix E such that ||E||_2 = O(ε)*||C||_2, where
ε is the machine precision.

The total number of floating-point operations is approximately 2*m^2*n if side = 'L', or 2*n^2*m if side =
'R'.
The complex counterpart of this routine is unmtr.

?hetrd
Reduces a complex Hermitian matrix to tridiagonal
form.

Syntax
lapack_int LAPACKE_chetrd( int matrix_layout, char uplo, lapack_int n,
lapack_complex_float* a, lapack_int lda, float* d, float* e, lapack_complex_float*
tau );
lapack_int LAPACKE_zhetrd( int matrix_layout, char uplo, lapack_int n,
lapack_complex_double* a, lapack_int lda, double* d, double* e, lapack_complex_double*
tau );

Include Files
• mkl.h

Description

The routine reduces a complex Hermitian matrix A to symmetric tridiagonal form T by a unitary similarity
transformation: A = Q*T*Q^H. The unitary matrix Q is not formed explicitly but is represented as a product of
n-1 elementary reflectors. Routines are provided to work with Q in this representation. (They are described
later in this topic.)

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)

or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

If uplo = 'U', a stores the upper triangular part of A.

If uplo = 'L', a stores the lower triangular part of A.

n The order of the matrix A (n≥ 0).

a a (size max(1, lda*n)) is an array containing either the upper or the lower
triangular part of the matrix A (as specified by uplo).

lda The leading dimension of a; at least max(1, n).
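Because ?hetrd reads only the triangle selected by uplo, the other triangle of a may hold anything. The sketch below (hypothetical helper hermitian_expand, not an MKL function) rebuilds the full Hermitian matrix from the stored triangle using A(i,j) = conj(A(j,i)), which is handy when forming A for a residual check. It assumes column-major storage and uses C99 double complex; whether that type is interchangeable with lapack_complex_double depends on how your build defines it, so treat that as an assumption to verify.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical helper, not an MKL function.
   Rebuilds the full n-by-n Hermitian matrix from the triangle of a selected
   by uplo ('U' or 'L'). Both arrays are column-major; a has leading
   dimension lda and full has leading dimension n. Entries of a outside the
   selected triangle are never read. */
void hermitian_expand(char uplo, int n, const double complex *a, int lda,
                      double complex *full)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < n; ++i) {
            int stored = (uplo == 'U') ? (i <= j) : (i >= j);
            full[i + (size_t)j * n] = stored
                ? a[i + (size_t)j * lda]          /* read A(i,j) directly */
                : conj(a[j + (size_t)i * lda]);   /* A(i,j) = conj(A(j,i)) */
        }
    }
}
```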

Output Parameters

d, e Arrays:
d contains the diagonal elements of the matrix T.
The dimension of d must be at least max(1, n).
e contains the off-diagonal elements of T.
The dimension of e must be at least max(1, n-1).

tau Array, size at least max(1, n-1). Stores the n-1 scalar factors of the elementary
reflectors whose product is the unitary matrix Q.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed matrix T is exactly similar to a matrix A + E, where ||E||_2 = c(n)*ε*||A||_2, c(n) is a
modestly increasing function of n, and ε is the machine precision.
The approximate number of floating-point operations is (16/3)n^3.

After calling this routine, you can call the following:

ungtr to form the computed matrix Q explicitly

unmtr to multiply a complex matrix by Q.

The real counterpart of this routine is ?sytrd.

?ungtr
Generates the complex unitary matrix Q determined
by ?hetrd.

Syntax
lapack_int LAPACKE_cungtr (int matrix_layout, char uplo, lapack_int n,
lapack_complex_float* a, lapack_int lda, const lapack_complex_float* tau);
lapack_int LAPACKE_zungtr (int matrix_layout, char uplo, lapack_int n,
lapack_complex_double* a, lapack_int lda, const lapack_complex_double* tau);

Include Files
• mkl.h

Description

The routine explicitly generates the n-by-n unitary matrix Q formed by ?hetrd when reducing a complex
Hermitian matrix A to tridiagonal form: A = Q*T*Q^H. Use this routine after a call to ?hetrd.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)

or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Use the same uplo as supplied to ?hetrd.

n The order of the matrix Q (n≥ 0).

a, tau Arrays:
a (size max(1, lda*n)) is the array a as returned by ?hetrd.

tau is the array tau as returned by ?hetrd.

The dimension of tau must be at least max(1, n-1).

lda The leading dimension of a; at least max(1, n).

Output Parameters

a Overwritten by the unitary matrix Q.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed matrix Q differs from an exactly unitary matrix by a matrix E such that ||E||_2 = O(ε), where
ε is the machine precision.
The approximate number of floating-point operations is (16/3)n^3.

The real counterpart of this routine is orgtr.

?unmtr
Multiplies a complex matrix by the complex unitary
matrix Q determined by ?hetrd.

Syntax
lapack_int LAPACKE_cunmtr (int matrix_layout, char side, char uplo, char trans,
lapack_int m, lapack_int n, const lapack_complex_float* a, lapack_int lda, const
lapack_complex_float* tau, lapack_complex_float* c, lapack_int ldc);
lapack_int LAPACKE_zunmtr (int matrix_layout, char side, char uplo, char trans,
lapack_int m, lapack_int n, const lapack_complex_double* a, lapack_int lda, const
lapack_complex_double* tau, lapack_complex_double* c, lapack_int ldc);

Include Files
• mkl.h

Description

The routine multiplies a complex matrix C by Q or Q^H, where Q is the unitary matrix formed by ?hetrd
when reducing a complex Hermitian matrix A to tridiagonal form: A = Q*T*Q^H. Use this routine after a call
to ?hetrd.

Depending on the parameters side and trans, the routine can form one of the matrix products Q*C, Q^H*C,
C*Q, or C*Q^H (overwriting the result on C).

Input Parameters
In the descriptions below, r denotes the order of Q:
If side = 'L', r = m; if side = 'R', r = n.

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)

or column major (LAPACK_COL_MAJOR).

side Must be either 'L' or 'R'.

If side = 'L', Q or Q^H is applied to C from the left.

If side = 'R', Q or Q^H is applied to C from the right.

uplo Must be 'U' or 'L'.

Use the same uplo as supplied to ?hetrd.

trans Must be either 'N' or 'C'.

If trans = 'N', the routine multiplies C by Q.

If trans = 'C', the routine multiplies C by Q^H.

m The number of rows in the matrix C (m≥ 0).

n The number of columns in C (n≥ 0).

a, c, tau a (size max(1, lda*r)) and tau are the arrays returned by ?hetrd.

The dimension of tau must be at least max(1, r-1).

c (size max(1, ldc*n) for column major layout and max(1, ldc*m) for row
major layout) contains the matrix C.

lda The leading dimension of a; lda≥ max(1, r).

ldc The leading dimension of c; ldc≥ max(1, m) for column major layout and
ldc≥ max(1, n) for row major layout.

Output Parameters

c Overwritten by the product Q*C, Q^H*C, C*Q, or C*Q^H (as specified by side
and trans).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed product differs from the exact product by a matrix E such that ||E||_2 = O(ε)*||C||_2, where
ε is the machine precision.
The total number of floating-point operations is approximately 8*m^2*n if side = 'L' or 8*n^2*m if side =
'R'.
The real counterpart of this routine is ormtr.

?sptrd
Reduces a real symmetric matrix to tridiagonal form
using packed storage.

Syntax
lapack_int LAPACKE_ssptrd (int matrix_layout, char uplo, lapack_int n, float* ap,
float* d, float* e, float* tau);
lapack_int LAPACKE_dsptrd (int matrix_layout, char uplo, lapack_int n, double* ap,
double* d, double* e, double* tau);

Include Files
• mkl.h

Description

The routine reduces a packed real symmetric matrix A to symmetric tridiagonal form T by an orthogonal
similarity transformation: A = Q*T*Q^T. The orthogonal matrix Q is not formed explicitly but is represented as
a product of n-1 elementary reflectors. Routines are provided for working with Q in this representation. See
Application Notes below for details.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)

or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

If uplo = 'U', ap stores the packed upper triangle of A.

If uplo = 'L', ap stores the packed lower triangle of A.

n The order of the matrix A (n≥ 0).

ap Array, size at least max(1, n(n+1)/2). Contains either the upper or the lower
triangle of A (as specified by uplo) in the packed form described in Matrix
Storage Schemes.
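The packed form stores only one triangle, column by column. As a minimal sketch, assuming matrix_layout = LAPACK_COL_MAJOR and 0-based indices (the Application Notes below use 1-based notation such as A(1:i-1, i+1)), the position of A(i,j) in ap can be computed as follows. packed_index is a hypothetical helper, not an MKL function, and the row-major ordering is not covered here.

```c
#include <stddef.h>

/* Hypothetical helper, not an MKL function.
   0-based position of A(i,j) in a column-major packed array ap for an
   n-by-n symmetric matrix. Only the triangle selected by uplo is stored:
   uplo == 'U' requires i <= j, uplo == 'L' requires i >= j. */
size_t packed_index(char uplo, size_t n, size_t i, size_t j)
{
    if (uplo == 'U')
        /* Columns 0..j-1 hold 1 + 2 + ... + j = j(j+1)/2 entries. */
        return i + j * (j + 1) / 2;
    /* Lower: column k holds n - k entries, so column j starts at
       n + (n-1) + ... + (n-j+1) = j(2n - j + 1)/2, beginning with A(j,j). */
    return (i - j) + j * (2 * n - j + 1) / 2;
}
```

For n = 3 and uplo = 'U', the six entries are stored in the order A(0,0), A(0,1), A(1,1), A(0,2), A(1,2), A(2,2).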

Output Parameters

ap Overwritten by the tridiagonal matrix T and details of the orthogonal matrix
Q, as specified by uplo.

d, e, tau Arrays:
d contains the diagonal elements of the matrix T.
The dimension of d must be at least max(1, n).
e contains the off-diagonal elements of T.
The dimension of e must be at least max(1, n-1).
tau Stores the n-1 scalar factors of the elementary reflectors whose product
is the matrix Q.

The dimension of tau must be at least max(1, n-1).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The matrix Q is represented as a product of n-1 elementary reflectors, as follows:
• If uplo = 'U', Q = H(n-1) ... H(2)H(1)

Each H(i) has the form

H(i) = I - tau*v*v^T
where tau is a real scalar and v is a real vector with v(i+1:n) = 0 and v(i) = 1.

On exit, tau is stored in tau[i - 1], and v(1:i-1) is stored in AP, overwriting A(1:i-1, i+1).
• If uplo = 'L', Q = H(1)H(2) ... H(n-1)

Each H(i) has the form

H(i) = I - tau*v*v^T
where tau is a real scalar and v is a real vector with v(1:i) = 0 and v(i+1) = 1.

On exit, tau is stored in tau[i - 1], and v(i+2:n) is stored in AP, overwriting A(i+2:n, i).
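The reflector form H(i) = I - tau*v*v^T means Q never has to be stored as a full matrix: applying H(i) to a vector costs two passes over v. The minimal sketch below shows this, and also applies a product in the uplo = 'L' order Q = H(1)H(2)...H(k), where the rightmost factor acts first. For simplicity it assumes the reflector vectors are held explicitly, one length-n vector per reflector; that layout and the names apply_reflector and apply_q_lower are hypothetical. ?sptrd instead packs the vectors into ap as described above, and ?opmtr is the routine that works on that packed data.

```c
#include <stddef.h>

/* Applies H = I - tau*v*v^T to x (length n) in place:
   x := x - tau * v * (v^T x). H itself is never formed. */
void apply_reflector(size_t n, const double *v, double tau, double *x)
{
    double vtx = 0.0;
    for (size_t k = 0; k < n; ++k)
        vtx += v[k] * x[k];
    for (size_t k = 0; k < n; ++k)
        x[k] -= tau * vtx * v[k];
}

/* Computes x := Q*x for Q = H(1) H(2) ... H(k) (the uplo = 'L' ordering).
   Reflector H(i) uses vs[(i-1)*n .. (i-1)*n + n-1] and taus[i-1]
   (hypothetical explicit layout). The rightmost factor H(k) acts first,
   so i runs from k down to 1. */
void apply_q_lower(size_t n, size_t k, const double *vs, const double *taus,
                   double *x)
{
    for (size_t i = k; i >= 1; --i)
        apply_reflector(n, vs + (i - 1) * n, taus[i - 1], x);
}
```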

The computed matrix T is exactly similar to a matrix A+E, where ||E||_2 = c(n)*ε*||A||_2, c(n) is a
modestly increasing function of n, and ε is the machine precision. The approximate number of floating-point
operations is (4/3)n^3.

After calling this routine, you can call the following:

opgtr to form the computed matrix Q explicitly

opmtr to multiply a real matrix by Q.

The complex counterpart of this routine is hptrd.

?opgtr
Generates the real orthogonal matrix Q determined
by ?sptrd.

Syntax
lapack_int LAPACKE_sopgtr (int matrix_layout, char uplo, lapack_int n, const float* ap,
const float* tau, float* q, lapack_int ldq);
lapack_int LAPACKE_dopgtr (int matrix_layout, char uplo, lapack_int n, const double*
ap, const double* tau, double* q, lapack_int ldq);

Include Files
• mkl.h

Description

The routine explicitly generates the n-by-n orthogonal matrix Q formed by ?sptrd when reducing a packed real
symmetric matrix A to tridiagonal form: A = Q*T*Q^T. Use this routine after a call to ?sptrd.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)

or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'. Use the same uplo as supplied to ?sptrd.

n The order of the matrix Q (n≥ 0).

ap, tau Arrays ap and tau, as returned by ?sptrd.

The size of ap must be at least max(1, n(n+1)/2).

The size of tau must be at least max(1, n-1).

ldq The leading dimension of the output array q; at least max(1, n).

Output Parameters

q Array, size max(1, ldq*n).

Contains the computed matrix Q.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed matrix Q differs from an exactly orthogonal matrix by a matrix E such that ||E||_2 = O(ε),
where ε is the machine precision.
The approximate number of floating-point operations is (4/3)n^3.

The complex counterpart of this routine is upgtr.

?opmtr
Multiplies a real matrix by the real orthogonal matrix
Q determined by ?sptrd.

Syntax
lapack_int LAPACKE_sopmtr (int matrix_layout, char side, char uplo, char trans,
lapack_int m, lapack_int n, const float* ap, const float* tau, float* c, lapack_int
ldc);
lapack_int LAPACKE_dopmtr (int matrix_layout, char side, char uplo, char trans,
lapack_int m, lapack_int n, const double* ap, const double* tau, double* c, lapack_int
ldc);

Include Files
• mkl.h

Description

The routine multiplies a real matrix C by Q or Q^T, where Q is the orthogonal matrix formed by ?sptrd when
reducing a packed real symmetric matrix A to tridiagonal form: A = Q*T*Q^T. Use this routine after a call
to ?sptrd.

Depending on the parameters side and trans, the routine can form one of the matrix products Q*C, Q^T*C,
C*Q, or C*Q^T (overwriting the result on C).

Input Parameters
In the descriptions below, r denotes the order of Q:
If side = 'L', r = m; if side = 'R', r = n.

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)

or column major (LAPACK_COL_MAJOR).

side Must be either 'L' or 'R'.

If side = 'L', Q or Q^T is applied to C from the left.

If side = 'R', Q or Q^T is applied to C from the right.

uplo Must be 'U' or 'L'.

Use the same uplo as supplied to ?sptrd.

trans Must be either 'N' or 'T'.

If trans = 'N', the routine multiplies C by Q.

If trans = 'T', the routine multiplies C by Q^T.

m The number of rows in the matrix C (m≥ 0).

n The number of columns in C (n≥ 0).

ap, tau, c ap and tau are the arrays returned by ?sptrd.

The dimension of ap must be at least max(1, r(r+1)/2).

The dimension of tau must be at least max(1, r-1).
c (size max(1, ldc*n) for column major layout and max(1, ldc*m) for row
major layout) contains the matrix C.

ldc The leading dimension of c; ldc≥ max(1, m) for column major layout and
ldc≥ max(1, n) for row major layout.

Output Parameters

c Overwritten by the product Q*C, Q^T*C, C*Q, or C*Q^T (as specified by side
and trans).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed product differs from the exact product by a matrix E such that ||E||_2 = O(ε)*||C||_2, where
ε is the machine precision.
The total number of floating-point operations is approximately 2*m^2*n if side = 'L', or 2*n^2*m if side =
'R'.
The complex counterpart of this routine is upmtr.

?hptrd
Reduces a complex Hermitian matrix to tridiagonal
form using packed storage.

Syntax
lapack_int LAPACKE_chptrd( int matrix_layout, char uplo, lapack_int n,
lapack_complex_float* ap, float* d, float* e, lapack_complex_float* tau );

lapack_int LAPACKE_zhptrd( int matrix_layout, char uplo, lapack_int n,
lapack_complex_double* ap, double* d, double* e, lapack_complex_double* tau );

Include Files
• mkl.h

Description

The routine reduces a packed complex Hermitian matrix A to symmetric tridiagonal form T by a unitary
similarity transformation: A = Q*T*Q^H. The unitary matrix Q is not formed explicitly but is represented as a
product of n-1 elementary reflectors. Routines are provided for working with Q in this representation (see
Application Notes below).

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)

or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

If uplo = 'U', ap stores the packed upper triangle of A.

If uplo = 'L', ap stores the packed lower triangle of A.

n The order of the matrix A (n≥ 0).

ap Array, size at least max(1, n(n+1)/2). Contains either the upper or the lower
triangle of A (as specified by uplo) in the packed form described in Matrix
Storage Schemes.

Output Parameters

ap Overwritten by the tridiagonal matrix T and details of the unitary matrix Q,
as specified by uplo.

d, e Arrays:
d contains the diagonal elements of the matrix T.
The size of d must be at least max(1, n).
e contains the off-diagonal elements of T.
The size of e must be at least max(1, n-1).

tau Array, size at least max(1, n-1). Stores the n-1 scalar factors of the elementary
reflectors whose product is the unitary matrix Q.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed matrix T is exactly similar to a matrix A + E, where ||E||_2 = c(n)*ε*||A||_2, c(n) is a
modestly increasing function of n, and ε is the machine precision.
The approximate number of floating-point operations is (16/3)n^3.

After calling this routine, you can call the following:

upgtr to form the computed matrix Q explicitly

upmtr to multiply a complex matrix by Q.

The real counterpart of this routine is sptrd.

?upgtr
Generates the complex unitary matrix Q determined
by ?hptrd.

Syntax
lapack_int LAPACKE_cupgtr (int matrix_layout, char uplo, lapack_int n, const
lapack_complex_float* ap, const lapack_complex_float* tau, lapack_complex_float* q,
lapack_int ldq);
lapack_int LAPACKE_zupgtr (int matrix_layout, char uplo, lapack_int n, const
lapack_complex_double* ap, const lapack_complex_double* tau, lapack_complex_double* q,
lapack_int ldq);

Include Files
• mkl.h

Description

The routine explicitly generates the n-by-n unitary matrix Q formed by ?hptrd when reducing a packed
complex Hermitian matrix A to tridiagonal form: A = Q*T*Q^H. Use this routine after a call to ?hptrd.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)

or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'. Use the same uplo as supplied to ?hptrd.

n The order of the matrix Q (n≥ 0).

ap, tau Arrays ap and tau, as returned by ?hptrd.

The dimension of ap must be at least max(1, n(n+1)/2).

The dimension of tau must be at least max(1, n-1).

ldq The leading dimension of the output array q; at least max(1, n).

Output Parameters

q Array, size max(1, ldq*n).

Contains the computed matrix Q.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

The real counterpart of this routine is opgtr.

?upmtr
Multiplies a complex matrix by the unitary matrix Q
determined by ?hptrd.

Syntax
lapack_int LAPACKE_cupmtr (int matrix_layout, char side, char uplo, char trans,
lapack_int m, lapack_int n, const lapack_complex_float* ap, const lapack_complex_float*
tau, lapack_complex_float* c, lapack_int ldc);
lapack_int LAPACKE_zupmtr (int matrix_layout, char side, char uplo, char trans,
lapack_int m, lapack_int n, const lapack_complex_double* ap, const
lapack_complex_double* tau, lapack_complex_double* c, lapack_int ldc);

Include Files
• mkl.h

Description

The routine multiplies a complex matrix C by Q or Q^H, where Q is the unitary matrix formed by ?hptrd when
reducing a packed complex Hermitian matrix A to tridiagonal form: A = Q*T*Q^H. Use this routine after a call
to ?hptrd.

Depending on the parameters side and trans, the routine can form one of the matrix products Q*C, Q^H*C,
C*Q, or C*Q^H (overwriting the result on C).

Input Parameters
In the descriptions below, r denotes the order of Q:
If side = 'L', r = m; if side = 'R', r = n.

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)

or column major (LAPACK_COL_MAJOR).

side Must be either 'L' or 'R'.

If side = 'L', Q or Q^H is applied to C from the left.

If side = 'R', Q or Q^H is applied to C from the right.

uplo Must be 'U' or 'L'.

Use the same uplo as supplied to ?hptrd.

trans Must be either 'N' or 'C'.

If trans = 'N', the routine multiplies C by Q.

If trans = 'C', the routine multiplies C by Q^H.

m The number of rows in the matrix C (m≥ 0).

n The number of columns in C (n≥ 0).

ap, tau, c ap and tau are the arrays returned by ?hptrd.

The size of ap must be at least max(1, r(r+1)/2).

The size of tau must be at least max(1, r-1).
c (size max(1, ldc*n) for column major layout and max(1, ldc*m) for row
major layout) contains the matrix C.

ldc The leading dimension of c; ldc≥ max(1, m) for column major layout and
ldc≥ max(1, n) for row major layout.

Output Parameters

c Overwritten by the product Q*C, Q^H*C, C*Q, or C*Q^H (as specified by side
and trans).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

?sbtrd
Reduces a real symmetric band matrix to tridiagonal
form.

Syntax
lapack_int LAPACKE_ssbtrd (int matrix_layout, char vect, char uplo, lapack_int n,
lapack_int kd, float* ab, lapack_int ldab, float* d, float* e, float* q, lapack_int
ldq);
lapack_int LAPACKE_dsbtrd (int matrix_layout, char vect, char uplo, lapack_int n,
lapack_int kd, double* ab, lapack_int ldab, double* d, double* e, double* q, lapack_int
ldq);

Include Files
• mkl.h

Description

The routine reduces a real symmetric band matrix A to symmetric tridiagonal form T by an orthogonal
similarity transformation: A = Q*T*Q^T. The orthogonal matrix Q is determined as a product of Givens
rotations.
If required, the routine can also form the matrix Q explicitly.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)

or column major (LAPACK_COL_MAJOR).

vect Must be 'V', 'N', or 'U'.

If vect = 'V', the routine returns the explicit matrix Q.

If vect = 'N', the routine does not return Q.

If vect = 'U', the routine updates matrix X by forming X*Q.

uplo Must be 'U' or 'L'.

If uplo = 'U', ab stores the upper triangular part of A.

If uplo = 'L', ab stores the lower triangular part of A.

n The order of the matrix A (n≥0).

kd The number of super- or sub-diagonals in A (kd≥ 0).

ab, q ab (size at least max(1, ldab*n) for column major layout and at least
max(1, ldab*(kd+1)) for row major layout) is an array containing either
upper or lower triangular part of the matrix A (as specified by uplo) in band
storage format.
q (size max(1, ldq*n)) is an array.
If vect = 'U', the q array must contain an n-by-n matrix X.

If vect = 'N' or 'V', the q parameter need not be set.

ldab The leading dimension of ab; at least kd+1 for column major layout and
at least n for row major layout.

ldq The leading dimension of q. Constraints:

ldq≥ max(1, n) if vect = 'V' or 'U';
ldq≥ 1 if vect = 'N'.
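In column-major band storage, column j of ab holds the band entries of column j of A, shifted so that the diagonal sits in row kd (uplo = 'U') or row 0 (uplo = 'L'), using 0-based indices. The following minimal sketch maps A(i,j) to its offset in ab, assuming matrix_layout = LAPACK_COL_MAJOR and the standard LAPACK band scheme described in Matrix Storage Schemes. band_index is a hypothetical helper, not an MKL function, and the row-major layout (ldab ≥ n) is not covered.

```c
/* Hypothetical helper, not an MKL function.
   0-based offset of A(i,j) in a column-major band array ab with leading
   dimension ldab (>= kd+1) for a symmetric band matrix with kd super- or
   sub-diagonals.
   uplo == 'U': valid for j-kd <= i <= j, stored at ab[(kd+i-j) + j*ldab].
   uplo == 'L': valid for j <= i <= j+kd, stored at ab[(i-j) + j*ldab].
   Returns -1 if A(i,j) lies outside the stored triangle of the band. */
long band_index(char uplo, long kd, long ldab, long i, long j)
{
    if (uplo == 'U') {
        if (i > j || j - i > kd)
            return -1;
        return (kd + i - j) + j * ldab;
    }
    if (i < j || i - j > kd)
        return -1;
    return (i - j) + j * ldab;
}
```

With kd = 1, ldab = 2 and uplo = 'U', ab[0] is unused padding and A(0,0), A(0,1), A(1,1) sit at offsets 1, 2, 3.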

Output Parameters

ab On exit, the diagonal elements of the array ab are overwritten by the
diagonal elements of the tridiagonal matrix T. If kd > 0, the elements on
the first superdiagonal (if uplo = 'U') or the first subdiagonal (if uplo =
'L') are overwritten by the off-diagonal elements of T. The rest of ab is
overwritten by values generated during the reduction.

d, e, q Arrays:
d contains the diagonal elements of the matrix T.
The size of d must be at least max(1, n).
e contains the off-diagonal elements of T.
The size of e must be at least max(1, n-1).
q is not referenced if vect = 'N'.

If vect = 'V', q contains the n-by-n matrix Q.

If vect = 'U', q contains the product X*Q.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed matrix T is exactly similar to a matrix A+E, where ||E||_2 = c(n)*ε*||A||_2, c(n) is a
modestly increasing function of n, and ε is the machine precision. The computed matrix Q differs from an
exactly orthogonal matrix by a matrix E such that ||E||_2 = O(ε).

The total number of floating-point operations is approximately 6n^2*kd if vect = 'N', with 3n^3*(kd-1)/kd
additional operations if vect = 'V'.

The complex counterpart of this routine is hbtrd.

?hbtrd
Reduces a complex Hermitian band matrix to
tridiagonal form.

Syntax
lapack_int LAPACKE_chbtrd( int matrix_layout, char vect, char uplo, lapack_int n,
lapack_int kd, lapack_complex_float* ab, lapack_int ldab, float* d, float* e,
lapack_complex_float* q, lapack_int ldq );
lapack_int LAPACKE_zhbtrd( int matrix_layout, char vect, char uplo, lapack_int n,
lapack_int kd, lapack_complex_double* ab, lapack_int ldab, double* d, double* e,
lapack_complex_double* q, lapack_int ldq );

Include Files
• mkl.h

Description

The routine reduces a complex Hermitian band matrix A to symmetric tridiagonal form T by a unitary
similarity transformation: A = Q*T*Q^H. The unitary matrix Q is determined as a product of Givens rotations.

If required, the routine can also form the matrix Q explicitly.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)

or column major (LAPACK_COL_MAJOR).

vect Must be 'V', 'N', or 'U'.

If vect = 'V', the routine returns the explicit matrix Q.

If vect = 'N', the routine does not return Q.

If vect = 'U', the routine updates matrix X by forming X*Q.

uplo Must be 'U' or 'L'.

If uplo = 'U', ab stores the upper triangular part of A.

If uplo = 'L', ab stores the lower triangular part of A.

n The order of the matrix A (n≥ 0).

kd The number of super- or sub-diagonals in A (kd≥ 0).

ab ab (size at least max(1, ldab*n) for column major layout and at least
max(1, ldab*(kd+1)) for row major layout) is an array containing either
upper or lower triangular part of the matrix A (as specified by uplo) in band
storage format.

q q (size max(1, ldq*n)) is an array.

If vect = 'U', the q array must contain an n-by-n matrix X.

If vect = 'N' or 'V', the q parameter need not be set.

ldab The leading dimension of ab; at least kd+1 for column major layout and
at least n for row major layout.

ldq The leading dimension of q. Constraints:

ldq≥ max(1, n) if vect = 'V' or 'U';
ldq≥ 1 if vect = 'N'.

Output Parameters

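ab On exit, the diagonal elements of the array ab are overwritten by the
diagonal elements of the tridiagonal matrix T. If kd > 0, the elements on
the first superdiagonal (if uplo = 'U') or the first subdiagonal (if uplo =
'L') are overwritten by the off-diagonal elements of T. The rest of ab is
overwritten by values generated during the reduction.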

d, e Arrays:
d contains the diagonal elements of the matrix T.
The dimension of d must be at least max(1, n).

e contains the off-diagonal elements of T.

The dimension of e must be at least max(1, n-1).

q If vect = 'N', q is not referenced.

If vect = 'V', q contains the n-by-n matrix Q.

If vect = 'U', q contains the product X*Q.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed matrix T is exactly similar to a matrix A + E, where ||E||_2 = c(n)*ε*||A||_2, c(n) is a
modestly increasing function of n, and ε is the machine precision. The computed matrix Q differs from an
exactly unitary matrix by a matrix E such that ||E||_2 = O(ε).

The total number of floating-point operations is approximately 20n^2*kd if vect = 'N', with 10n^3*(kd-1)/kd
additional operations if vect = 'V'.
The real counterpart of this routine is sbtrd.

?sterf
Computes all eigenvalues of a real symmetric
tridiagonal matrix using the QR algorithm.

Syntax
lapack_int LAPACKE_ssterf (lapack_int n, float* d, float* e);
lapack_int LAPACKE_dsterf (lapack_int n, double* d, double* e);

Include Files
• mkl.h

Description

The routine computes all the eigenvalues of a real symmetric tridiagonal matrix T (which can be obtained by
reducing a symmetric or Hermitian matrix to tridiagonal form). The routine uses a square-root-free variant of
the QR algorithm.
If you need not only the eigenvalues but also the eigenvectors, call steqr.

Input Parameters

n The order of the matrix T (n≥ 0).

d, e Arrays:
d contains the diagonal elements of T.
The dimension of d must be at least max(1, n).
e contains the off-diagonal elements of T.

The dimension of e must be at least max(1, n-1).

Output Parameters

d The n eigenvalues in ascending order, unless info > 0.

e On exit, the array is overwritten; see info.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = i, the algorithm failed to find all the eigenvalues after 30n iterations: i off-diagonal elements
have not converged to zero. On exit, d and e contain, respectively, the diagonal and off-diagonal elements
of a tridiagonal matrix orthogonally similar to T.
If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed eigenvalues are exact for a matrix T+E such that ||E||_2 = O(ε)*||T||_2,
where ε is the machine precision.
If λ_i is an exact eigenvalue and μ_i is the corresponding computed value, then
|μ_i - λ_i| ≤ c(n)*ε*||T||_2
where c(n) is a modestly increasing function of n.

The total number of floating-point operations depends on how rapidly the algorithm converges. Typically, it is
about 14n^2.

?steqr
Computes all eigenvalues and eigenvectors of a
symmetric or Hermitian matrix reduced to tridiagonal
form (QR algorithm).

Syntax
lapack_int LAPACKE_ssteqr( int matrix_layout, char compz, lapack_int n, float* d,
float* e, float* z, lapack_int ldz );
lapack_int LAPACKE_dsteqr( int matrix_layout, char compz, lapack_int n, double* d,
double* e, double* z, lapack_int ldz );
lapack_int LAPACKE_csteqr( int matrix_layout, char compz, lapack_int n, float* d,
float* e, lapack_complex_float* z, lapack_int ldz );
lapack_int LAPACKE_zsteqr( int matrix_layout, char compz, lapack_int n, double* d,
double* e, lapack_complex_double* z, lapack_int ldz );

Include Files
• mkl.h

Description

The routine computes all the eigenvalues and (optionally) all the eigenvectors of a real symmetric tridiagonal
matrix T. In other words, the routine can compute the spectral factorization: T = Z*Λ*Z^T. Here Λ is a
diagonal matrix whose diagonal elements are the eigenvalues λ_i; Z is an orthogonal matrix whose columns
are eigenvectors. Thus,
T*z_i = λ_i*z_i for i = 1, 2, ..., n.
The routine normalizes the eigenvectors so that ||z_i||_2 = 1.
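The relation T*z_i = λ_i*z_i gives a direct check on the output. The sketch below, a hypothetical helper tridiag_eig_residual (not an MKL function), returns the largest entry of T*z_i - λ_i*z_i over all i, assuming compz = 'I', column-major z, and real data. Because ?steqr overwrites d with the eigenvalues and also overwrites e, keep copies of the original diagonal and off-diagonal to pass as d and e, and pass the returned d as w.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper, not an MKL function.
   Returns max over i, k of |(T*z_i - w[i]*z_i)(k)| for the real symmetric
   tridiagonal T with diagonal d[0..n-1] and off-diagonal e[0..n-2].
   Eigenvector z_i is column i of z (column-major, leading dimension ldz). */
double tridiag_eig_residual(int n, const double *d, const double *e,
                            const double *w, const double *z, int ldz)
{
    double worst = 0.0;
    for (int i = 0; i < n; ++i) {
        const double *zi = z + (size_t)i * ldz;
        for (int k = 0; k < n; ++k) {
            /* Row k of T has at most three nonzeros: e[k-1], d[k], e[k]. */
            double tz = d[k] * zi[k];
            if (k > 0)
                tz += e[k - 1] * zi[k - 1];
            if (k < n - 1)
                tz += e[k] * zi[k + 1];
            double r = fabs(tz - w[i] * zi[k]);
            if (r > worst)
                worst = r;
        }
    }
    return worst;
}
```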

You can also use the routine for computing the eigenvalues and eigenvectors of an arbitrary real symmetric
(or complex Hermitian) matrix A reduced to tridiagonal form T: A = Q*T*Q^H. In this case, the spectral
factorization is as follows: A = Q*T*Q^H = (Q*Z)*Λ*(Q*Z)^H. Before calling ?steqr, you must reduce A to
tridiagonal form and generate the explicit matrix Q by calling the following routines:

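• Full storage: ?sytrd and ?orgtr for real matrices; ?hetrd and ?ungtr for complex matrices.
• Packed storage: ?sptrd and ?opgtr for real matrices; ?hptrd and ?upgtr for complex matrices.
• Band storage: ?sbtrd (vect = 'V') for real matrices; ?hbtrd (vect = 'V') for complex matrices.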

If you need eigenvalues only, it is more efficient to call ?sterf. If T is positive-definite, ?pteqr can compute small eigenvalues more accurately than ?steqr.

To solve the problem by a single call, use one of the divide and conquer routines stevd, syevd, spevd, or
sbevd for real symmetric matrices or heevd, hpevd, or hbevd for complex Hermitian matrices.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

compz Must be 'N' or 'I' or 'V'.

If compz = 'N', the routine computes eigenvalues only.

If compz = 'I', the routine computes the eigenvalues and eigenvectors of the tridiagonal matrix T.
If compz = 'V', the routine computes the eigenvalues and eigenvectors of the original symmetric (or Hermitian) matrix. On entry, z must contain the orthogonal (or unitary) matrix used to reduce the original matrix to tridiagonal form.

n The order of the matrix T (n≥ 0).

d, e Arrays:
d contains the diagonal elements of T.
The size of d must be at least max(1, n).
e contains the off-diagonal elements of T.
The size of e must be at least max(1, n-1).

z Array, size max(1, ldz*n).

If compz = 'N' or 'I', z need not be set.

If compz = 'V', z must contain the orthogonal (or unitary) matrix used in the reduction to tridiagonal form.

ldz The leading dimension of z. Constraints:
ldz≥ 1 if compz = 'N';
ldz≥ max(1, n) if compz = 'V' or 'I'.

Output Parameters

d The n eigenvalues in ascending order, unless info > 0.

e On exit, the array is overwritten; see info.

z If info = 0, contains the n-by-n matrix whose columns are orthonormal eigenvectors (the i-th column corresponds to the i-th eigenvalue).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = i, the algorithm failed to find all the eigenvalues after 30n iterations: i off-diagonal elements have
not converged to zero. On exit, d and e contain, respectively, the diagonal and off-diagonal elements of a
tridiagonal matrix orthogonally similar to T.
If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed eigenvalues and eigenvectors are exact for a matrix T+E such that $\|E\|_2 = O(\varepsilon)\|T\|_2$, where $\varepsilon$ is the machine precision.
If $\lambda_i$ is an exact eigenvalue, and $\mu_i$ is the corresponding computed value, then
$|\mu_i - \lambda_i| \le c(n)\,\varepsilon\,\|T\|_2$
where c(n) is a modestly increasing function of n.

If $z_i$ is the corresponding exact eigenvector, and $w_i$ is the corresponding computed vector, then the angle $\theta(z_i, w_i)$ between them is bounded as follows:
$\theta(z_i, w_i) \le c(n)\,\varepsilon\,\|T\|_2 / \min_{j \ne i} |\lambda_i - \lambda_j|$.
The total number of floating-point operations depends on how rapidly the algorithm converges. Typically, it is about
$24n^2$ if compz = 'N';
$7n^3$ (for complex flavors, $14n^3$) if compz = 'V' or 'I'.

?stemr
Computes selected eigenvalues and eigenvectors of a
real symmetric tridiagonal matrix.

Syntax
lapack_int LAPACKE_sstemr( int matrix_layout, char jobz, char range, lapack_int n,
const float* d, float* e, float vl, float vu, lapack_int il, lapack_int iu, lapack_int*
m, float* w, float* z, lapack_int ldz, lapack_int nzc, lapack_int* isuppz,
lapack_logical* tryrac );
lapack_int LAPACKE_dstemr( int matrix_layout, char jobz, char range, lapack_int n,
const double* d, double* e, double vl, double vu, lapack_int il, lapack_int iu,
lapack_int* m, double* w, double* z, lapack_int ldz, lapack_int nzc, lapack_int* isuppz,
lapack_logical* tryrac );
lapack_int LAPACKE_cstemr( int matrix_layout, char jobz, char range, lapack_int n,
const float* d, float* e, float vl, float vu, lapack_int il, lapack_int iu, lapack_int*
m, float* w, lapack_complex_float* z, lapack_int ldz, lapack_int nzc, lapack_int*
isuppz, lapack_logical* tryrac );
lapack_int LAPACKE_zstemr( int matrix_layout, char jobz, char range, lapack_int n,
const double* d, double* e, double vl, double vu, lapack_int il, lapack_int iu,
lapack_int* m, double* w, lapack_complex_double* z, lapack_int ldz, lapack_int nzc,
lapack_int* isuppz, lapack_logical* tryrac );

Include Files
• mkl.h

Description

The routine computes selected eigenvalues and, optionally, eigenvectors of a real symmetric tridiagonal matrix T. Any such unreduced matrix has a well-defined set of pairwise distinct real eigenvalues, and the corresponding real eigenvectors are pairwise orthogonal.
The spectrum may be computed either completely or partially by specifying either an interval (vl,vu] or a
range of indices il:iu for the desired eigenvalues.

Depending on the number of desired eigenvalues, these are computed either by bisection or by the dqds algorithm. Numerically orthogonal eigenvectors are computed by the use of various suitable $LDL^T$ factorizations near clusters of close eigenvalues (referred to as RRRs, Relatively Robust Representations). An informal sketch of the algorithm follows.
For each unreduced block (submatrix) of T,

a. Compute $T - \sigma I = LDL^T$, so that L and D define all the wanted eigenvalues to high relative accuracy. This means that small relative changes in the entries of L and D cause only small relative changes in the eigenvalues and eigenvectors. The standard (unfactored) representation of the tridiagonal matrix T does not have this property in general.
b. Compute the eigenvalues to suitable accuracy. If the eigenvectors are desired, the algorithm attains full accuracy of the computed eigenvalues only right before the corresponding vectors have to be computed; see steps c and d.
c. For each cluster of close eigenvalues, select a new shift close to the cluster, find a new factorization, and refine the shifted eigenvalues to suitable accuracy.
d. For each eigenvalue with a large enough relative separation, compute the corresponding eigenvector by forming a rank-revealing twisted factorization. Go back to step c for any clusters that remain.

Normal execution of ?stemr may create NaNs and infinities and may abort due to a floating point exception
in environments that do not handle NaNs and infinities in the IEEE standard default manner.
For more details, see: [Dhillon04], [Dhillon04-02], [Dhillon97]

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

range Must be 'A' or 'V' or 'I'.

If range = 'A', the routine computes all eigenvalues.

If range = 'V', the routine computes all eigenvalues in the half-open interval (vl, vu].

If range = 'I', the routine computes eigenvalues with indices il to iu.

n The order of the matrix T (n≥0).

d Array, size n.
Contains n diagonal elements of the tridiagonal matrix T.

e Array, size n.
Contains (n-1) off-diagonal elements of the tridiagonal matrix T in
elements 0 to n-2 of e. e[n - 1] need not be set on input, but is used
internally as workspace.

vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues. Constraint: vl<vu.

If range = 'A' or 'I', vl and vu are not referenced.

il, iu If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned.
Constraint: 1≤il≤iu≤n, if n>0.

If range = 'A' or 'V', il and iu are not referenced.

ldz The leading dimension of the output array z.

If jobz = 'V', then ldz ≥ max(1, n) for column major layout and ldz ≥ max(1, m) for row major layout;

ldz ≥ 1 otherwise.

nzc The number of eigenvectors to be held in the array z.

If range = 'A', then nzc ≥ max(1, n);

If range = 'V', then nzc is greater than or equal to the number of eigenvalues in the half-open interval (vl, vu];

If range = 'I', then nzc ≥ iu-il+1.

If nzc = -1, then a workspace query is assumed; the routine calculates the
number of columns of the array z that are needed to hold the eigenvectors.
This value is returned as the first entry of the array z, and no error
message related to nzc is issued by the routine xerbla.

tryrac If tryrac is true, it indicates that the code should check whether the tridiagonal matrix defines its eigenvalues to high relative accuracy. If so, the code uses relative-accuracy preserving algorithms that might be (a bit) slower depending on the matrix. If the matrix does not define its eigenvalues to high relative accuracy, the code can use possibly faster algorithms.
If tryrac is not true, the code is not required to guarantee relatively accurate eigenvalues and can use the fastest possible techniques.

Output Parameters

d On exit, the array d is overwritten.

e On exit, the array e is overwritten.

m The total number of eigenvalues found, 0≤m≤n.

If range = 'A', then m=n, and if range = 'I', then m=iu-il+1.

w Array, size n.
The first m elements contain the selected eigenvalues in ascending order.

z Array z, size max(1, ldz*m) for column major layout and max(1, ldz*n) for row major layout.
If jobz = 'V', and info = 0, then the first m columns of z contain the orthonormal eigenvectors of the matrix T corresponding to the selected eigenvalues, with the i-th column of z holding the eigenvector associated with w[i - 1].

If jobz = 'N', then z is not referenced.

Note: the exact value of m is not known in advance and can be computed
with a workspace query by setting nzc=-1, see description of the
parameter nzc.
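Because the storage of z depends on matrix_layout, the same eigenvector component sits at different offsets in the two layouts. The minimal C sketch below shows the indexing, assuming z holds eigenvectors as columns with leading dimension ldz. The helper z_component is hypothetical; the layout constants are defined with the values used by LAPACKE (101 and 102) only so that the sketch compiles without mkl.h.

```c
#include <stddef.h>

/* Values used by LAPACKE; defined here only so that this sketch compiles
 * without mkl.h. */
#ifndef LAPACK_ROW_MAJOR
#define LAPACK_ROW_MAJOR 101
#endif
#ifndef LAPACK_COL_MAJOR
#define LAPACK_COL_MAJOR 102
#endif

/* Hypothetical helper: returns component k (0-based row) of eigenvector j
 * (0-based column) of z, stored with leading dimension ldz. */
double z_component(int matrix_layout, const double *z, size_t ldz,
                   size_t k, size_t j)
{
    if (matrix_layout == LAPACK_COL_MAJOR)
        return z[k + j * ldz];  /* each column (eigenvector) is contiguous */
    return z[k * ldz + j];      /* row major: each row is contiguous */
}
```

For example, the eigenvector paired with w[j] consists of z_component(layout, z, ldz, k, j) for k = 0, ..., n-1.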

isuppz Array, size (2*max(1, m)).

The support of the eigenvectors in z, that is, the indices indicating the nonzero elements in z. The i-th computed eigenvector is nonzero only in elements isuppz[2*i - 2] through isuppz[2*i - 1]. This is relevant in the case when the matrix is split. isuppz is only accessed when jobz = 'V' and n > 0.
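For example, the number of entries in the support of the i-th eigenvector follows directly from the two isuppz entries described above. This small C sketch assumes lapack_int is int (the LP64 interface) and uses a hypothetical helper name; it only takes the difference of the two stored indices, so it does not depend on the index base of the stored values.

```c
/* Hypothetical helper (assumes lapack_int is int): number of entries in the
 * support of the i-th computed eigenvector (1-based i, as in the text),
 * that is, isuppz[2*i - 1] - isuppz[2*i - 2] + 1. Only the difference of
 * the two stored indices is used, so the result does not depend on their
 * index base. */
int support_length(const int *isuppz, int i)
{
    return isuppz[2 * i - 1] - isuppz[2 * i - 2] + 1;
}
```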

tryrac On exit, if tryrac was true on entry, it is set to false if the matrix does not define its eigenvalues to high relative accuracy.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info > 0, an internal error occurred.

?stedc
Computes all eigenvalues and eigenvectors of a
symmetric tridiagonal matrix using the divide and
conquer method.

Syntax
lapack_int LAPACKE_sstedc( int matrix_layout, char compz, lapack_int n, float* d,
float* e, float* z, lapack_int ldz );
lapack_int LAPACKE_dstedc( int matrix_layout, char compz, lapack_int n, double* d,
double* e, double* z, lapack_int ldz );
lapack_int LAPACKE_cstedc( int matrix_layout, char compz, lapack_int n, float* d,
float* e, lapack_complex_float* z, lapack_int ldz );
lapack_int LAPACKE_zstedc( int matrix_layout, char compz, lapack_int n, double* d,
double* e, lapack_complex_double* z, lapack_int ldz );

Include Files
• mkl.h

Description

The routine computes all the eigenvalues and (optionally) all the eigenvectors of a symmetric tridiagonal
matrix using the divide and conquer method. The eigenvectors of a full or band real symmetric or complex
Hermitian matrix can also be found if sytrd/hetrd or sptrd/hptrd or sbtrd/hbtrd has been used to reduce this
matrix to tridiagonal form.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

compz Must be 'N' or 'I' or 'V'.

If compz = 'N', the routine computes eigenvalues only.

If compz = 'I', the routine computes the eigenvalues and eigenvectors of the tridiagonal matrix.
If compz = 'V', the routine computes the eigenvalues and eigenvectors of the original symmetric/Hermitian matrix. On entry, the array z must contain the orthogonal/unitary matrix used to reduce the original matrix to tridiagonal form.

n The order of the symmetric tridiagonal matrix (n≥ 0).

d, e Arrays:
d contains the diagonal elements of the tridiagonal matrix.
The dimension of d must be at least max(1, n).
e contains the subdiagonal elements of the tridiagonal matrix.
The dimension of e must be at least max(1, n-1).

z Array, size max(1, ldz*n).

If compz = 'V', then, on entry, z must contain the orthogonal/unitary matrix used to reduce the original matrix to tridiagonal form.

ldz The leading dimension of z. Constraints:

ldz≥ 1 if compz = 'N';

ldz≥ max(1, n) if compz = 'V' or 'I'.

Output Parameters

d The n eigenvalues in ascending order, unless info≠ 0.

e On exit, the array is overwritten; see info.

z If info = 0, then if compz = 'V', z contains the orthonormal eigenvectors of the original symmetric/Hermitian matrix, and if compz = 'I', z contains the orthonormal eigenvectors of the symmetric tridiagonal matrix. If compz = 'N', z is not referenced.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, the algorithm failed to compute an eigenvalue while working on the submatrix lying in rows and columns i/(n+1) through mod(i, n+1), where i/(n+1) denotes integer division.
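Decoding a positive info therefore takes one integer division and one remainder. The helper below is a hypothetical sketch, not an MKL routine; it assumes lapack_int is int and reports the row and column numbers exactly as given by the formula above.

```c
/* Hypothetical helper (not part of Intel MKL; assumes lapack_int is int):
 * for a positive info returned by ?stedc, computes the first and last
 * rows/columns of the submatrix on which the eigenvalue computation failed,
 * using the formula above: first = info/(n+1) (integer division) and
 * last = mod(info, n+1). */
void stedc_failed_block(int info, int n, int *first, int *last)
{
    *first = info / (n + 1);
    *last  = info % (n + 1);
}
```

For example, with n = 10 and info = 40, the failing submatrix occupies rows and columns 3 through 7.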

?stegr
Computes selected eigenvalues and eigenvectors of a
real symmetric tridiagonal matrix.

Syntax
lapack_int LAPACKE_sstegr( int matrix_layout, char jobz, char range, lapack_int n,
float* d, float* e, float vl, float vu, lapack_int il, lapack_int iu, float abstol,
lapack_int* m, float* w, float* z, lapack_int ldz, lapack_int* isuppz );
lapack_int LAPACKE_dstegr( int matrix_layout, char jobz, char range, lapack_int n,
double* d, double* e, double vl, double vu, lapack_int il, lapack_int iu, double abstol,
lapack_int* m, double* w, double* z, lapack_int ldz, lapack_int* isuppz );
lapack_int LAPACKE_cstegr( int matrix_layout, char jobz, char range, lapack_int n,
float* d, float* e, float vl, float vu, lapack_int il, lapack_int iu, float abstol,
lapack_int* m, float* w, lapack_complex_float* z, lapack_int ldz, lapack_int* isuppz );
lapack_int LAPACKE_zstegr( int matrix_layout, char jobz, char range, lapack_int n,
double* d, double* e, double vl, double vu, lapack_int il, lapack_int iu, double abstol,
lapack_int* m, double* w, lapack_complex_double* z, lapack_int ldz, lapack_int*
isuppz );

Include Files
• mkl.h

Description

The routine computes selected eigenvalues and, optionally, eigenvectors of a real symmetric tridiagonal
matrix T.
The spectrum may be computed either completely or partially by specifying either an interval (vl,vu] or a
range of indices il:iu for the desired eigenvalues.

?stegr is a compatibility wrapper around the improved ?stemr routine. See its description for further details.
Note that the abstol parameter no longer provides any benefit and hence is no longer used.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

range Must be 'A' or 'V' or 'I'.

If range = 'A', the routine computes all eigenvalues.

If range = 'V', the routine computes eigenvalues w[i] in the half-open interval: vl < w[i] ≤ vu.
If range = 'I', the routine computes eigenvalues with indices il to iu.

n The order of the matrix T (n≥ 0).

d, e Arrays:
d contains the diagonal elements of T.
The dimension of d must be at least max(1, n).
e contains the subdiagonal elements of T in its first n-1 elements (e[0] to e[n - 2]); e[n - 1] need not be set on input, but it is used as workspace.
The dimension of e must be at least max(1, n).

vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
Constraint: vl< vu.

If range = 'A' or 'I', vl and vu are not referenced.

il, iu If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned.
Constraint: 1 ≤il≤iu≤n, if n > 0.

If range = 'A' or 'V', il and iu are not referenced.

abstol Unused. Was the absolute error tolerance for the eigenvalues/eigenvectors
in previous versions.

ldz The leading dimension of the output array z. Constraints:

ldz≥ 1 if jobz = 'N';
ldz≥ max(1, n) if jobz = 'V'.

Output Parameters

d, e On exit, d and e are overwritten.

m The total number of eigenvalues found, 0 ≤ m ≤ n.
If range = 'A', m = n, and if range = 'I', m = iu-il+1.

w Array, size at least max(1, n).

The selected eigenvalues in ascending order, stored in w[0] to w[m - 1].

z Array z, size max(1, ldz*m).

If jobz = 'V', and if info = 0, the first m columns of z contain the orthonormal eigenvectors of the matrix T corresponding to the selected eigenvalues, with the i-th column of z holding the eigenvector associated with w[i - 1].

If jobz = 'N', then z is not referenced.

Note: if range = 'V', the exact value of m is not known in advance and an upper bound must be used. Using m = n is always safe.

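isuppz Array, size at least (2*max(1, m)).

The support of the eigenvectors in z, that is, the indices indicating the nonzero elements in z. The i-th computed eigenvector is nonzero only in elements isuppz[2*i - 2] through isuppz[2*i - 1].
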

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info > 0, an internal error occurred.

?pteqr
Computes all eigenvalues and (optionally) all
eigenvectors of a real symmetric positive-definite
tridiagonal matrix.

Syntax
lapack_int LAPACKE_spteqr( int matrix_layout, char compz, lapack_int n, float* d,
float* e, float* z, lapack_int ldz );
lapack_int LAPACKE_dpteqr( int matrix_layout, char compz, lapack_int n, double* d,
double* e, double* z, lapack_int ldz );
lapack_int LAPACKE_cpteqr( int matrix_layout, char compz, lapack_int n, float* d,
float* e, lapack_complex_float* z, lapack_int ldz );
lapack_int LAPACKE_zpteqr( int matrix_layout, char compz, lapack_int n, double* d,
double* e, lapack_complex_double* z, lapack_int ldz );

Include Files
• mkl.h

Description

The routine computes all the eigenvalues and (optionally) all the eigenvectors of a real symmetric positive-definite tridiagonal matrix T. In other words, the routine can compute the spectral factorization $T = Z \Lambda Z^T$.
Here $\Lambda$ is a diagonal matrix whose diagonal elements are the eigenvalues $\lambda_i$, and Z is an orthogonal matrix whose columns are eigenvectors. Thus,
$T z_i = \lambda_i z_i$ for i = 1, 2, ..., n.
(The routine normalizes the eigenvectors so that $\|z_i\|_2 = 1$.)

You can also use the routine to compute the eigenvalues and eigenvectors of real symmetric (or complex Hermitian) positive-definite matrices A reduced to tridiagonal form T: $A = Q T Q^H$. In this case, the spectral factorization is as follows: $A = Q T Q^H = (QZ)\Lambda(QZ)^H$. Before calling ?pteqr, you must reduce A to tridiagonal form and generate the explicit matrix Q by calling the following routines:

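• Full storage: ?sytrd and ?orgtr for real matrices; ?hetrd and ?ungtr for complex matrices.
• Packed storage: ?sptrd and ?opgtr for real matrices; ?hptrd and ?upgtr for complex matrices.
• Band storage: ?sbtrd (vect='V') for real matrices; ?hbtrd (vect='V') for complex matrices.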

The routine first factorizes T as $LDL^H$, where L is a unit lower bidiagonal matrix and D is a diagonal matrix. Then it forms the bidiagonal matrix $B = LD^{1/2}$ and calls ?bdsqr to compute the singular values of B, which are the square roots of the eigenvalues of T.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

compz Must be 'N' or 'I' or 'V'.

If compz = 'N', the routine computes eigenvalues only.

If compz = 'I', the routine computes the eigenvalues and eigenvectors of the tridiagonal matrix T.
If compz = 'V', the routine computes the eigenvalues and eigenvectors of
A (and the array z must contain the matrix Q on entry).

n The order of the matrix T (n≥ 0).

d, e Arrays:
d contains the diagonal elements of T.

The size of d must be at least max(1, n).

e contains the off-diagonal elements of T.
The size of e must be at least max(1, n-1).

z Array, size max(1, ldz*n).

If compz = 'N' or 'I', z need not be set.

If compz = 'V', z must contain the orthogonal (or unitary) matrix used in the reduction to tridiagonal form.

ldz The leading dimension of z. Constraints:

ldz≥ 1 if compz = 'N';

ldz≥ max(1, n) if compz = 'V' or 'I'.

Output Parameters

d The n eigenvalues in descending order, unless info > 0.

e On exit, the array is overwritten.

z If info = 0, contains an n-by-n matrix whose columns are orthonormal eigenvectors. (The i-th column corresponds to the i-th eigenvalue.)

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = i, the leading minor of order i (and hence T itself) is not positive-definite.

If info = n + i, the algorithm for computing singular values failed to converge; i off-diagonal elements
have not converged to zero.
If info = -i, the i-th parameter had an illegal value.

Application Notes
If $\lambda_i$ is an exact eigenvalue, and $\mu_i$ is the corresponding computed value, then
$|\mu_i - \lambda_i| \le c(n)\,\varepsilon\,K\,\lambda_i$
where c(n) is a modestly increasing function of n, $\varepsilon$ is the machine precision, $K = \|DTD\|_2 \, \|(DTD)^{-1}\|_2$, and D is diagonal with $d_{ii} = t_{ii}^{-1/2}$.
If $z_i$ is the corresponding exact eigenvector, and $w_i$ is the corresponding computed vector, then the angle $\theta(z_i, w_i)$ between them is bounded as follows:
$\theta(z_i, w_i) \le c(n)\,\varepsilon\,K / \min_{j \ne i}(|\lambda_i - \lambda_j| / |\lambda_i + \lambda_j|)$.
Here $\min_{j \ne i}(|\lambda_i - \lambda_j| / |\lambda_i + \lambda_j|)$ is the relative gap between $\lambda_i$ and the other eigenvalues.

The total number of floating-point operations depends on how rapidly the algorithm converges.
Typically, it is about
$30n^2$ if compz = 'N';
$6n^3$ (for complex flavors, $12n^3$) if compz = 'V' or 'I'.

?stebz
Computes selected eigenvalues of a real symmetric
tridiagonal matrix by bisection.

Syntax
lapack_int LAPACKE_sstebz (char range, char order, lapack_int n, float vl, float vu,
lapack_int il, lapack_int iu, float abstol, const float* d, const float* e, lapack_int*
m, lapack_int* nsplit, float* w, lapack_int* iblock, lapack_int* isplit);
lapack_int LAPACKE_dstebz (char range, char order, lapack_int n, double vl, double vu,
lapack_int il, lapack_int iu, double abstol, const double* d, const double* e,
lapack_int* m, lapack_int* nsplit, double* w, lapack_int* iblock, lapack_int* isplit);

Include Files
• mkl.h

Description

The routine computes some (or all) of the eigenvalues of a real symmetric tridiagonal matrix T by bisection.
The routine searches for zero or negligible off-diagonal elements to see if T splits into block-diagonal form $T = \mathrm{diag}(T_1, T_2, \ldots)$. Then it performs bisection on each of the blocks $T_i$ and returns the block index of each computed eigenvalue, so that a subsequent call to ?stein can also take advantage of the block structure.

Input Parameters

range Must be 'A' or 'V' or 'I'.

If range = 'A', the routine computes all eigenvalues.

If range = 'V', the routine computes eigenvalues w[i] in the half-open interval: vl < w[i] ≤ vu.

If range = 'I', the routine computes eigenvalues with indices il to iu.

order Must be 'B' or 'E'.

If order = 'B', the eigenvalues are to be ordered from smallest to largest within each split-off block.
If order = 'E', the eigenvalues for the entire matrix are to be ordered
from smallest to largest.

n The order of the matrix T (n≥ 0).

vl, vu If range = 'V', the routine computes eigenvalues w[i] in the half-open interval:
vl < w[i] ≤ vu.
If range = 'A' or 'I', vl and vu are not referenced.

il, iu If range = 'I', the indices (in ascending order) of the smallest and largest eigenvalues to be returned: the routine computes the il-th through the iu-th smallest eigenvalues.
Constraint: 1 ≤ il ≤ iu ≤ n.

If range = 'A' or 'V', il and iu are not referenced.

abstol The absolute tolerance to which each eigenvalue is required. An eigenvalue (or cluster) is considered to have converged if it lies in an interval of width abstol.
If abstol ≤ 0.0, then the tolerance is taken as eps*|T|, where eps is the machine precision, and |T| is the 1-norm of the matrix T.

d, e Arrays:
d contains the diagonal elements of T.
The size of d must be at least max(1, n).
e contains the off-diagonal elements of T.
The size of e must be at least max(1, n-1).

Output Parameters

m The actual number of eigenvalues found.

nsplit The number of diagonal blocks detected in T.

w Array, size at least max(1, n). The computed eigenvalues, stored in w[0] to
w[m - 1].

iblock, isplit Arrays, size at least max(1, n).

A positive value iblock[i] is the block number of the eigenvalue stored in
w[i] (see also info).
The leading nsplit elements of isplit contain points at which T splits into
blocks Ti as follows: the block T1 contains rows/columns 1 to isplit[0];
the block T2 contains rows/columns isplit[0]+1 to isplit[1], and so
on.
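The isplit convention translates into row ranges as follows. This C sketch uses a hypothetical helper, assumes lapack_int is int, and returns row/column numbers counted from 1, as in the description above.

```c
/* Hypothetical helper (assumes lapack_int is int): returns the first and
 * last rows/columns (counted from 1) of block T_b, 1 <= b <= nsplit, from
 * the isplit array returned by ?stebz: T_1 spans 1..isplit[0], T_2 spans
 * isplit[0]+1..isplit[1], and so on. */
void stebz_block_rows(const int *isplit, int b, int *first, int *last)
{
    *first = (b == 1) ? 1 : isplit[b - 2] + 1;
    *last  = isplit[b - 1];
}
```

The eigenvalue w[i] then belongs to the rows returned for b = iblock[i].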

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = 1, for range = 'A' or 'V', the algorithm failed to compute some of the required eigenvalues to
the desired accuracy; iblock[i] < 0 indicates that the eigenvalue stored in w[i] failed to converge.
If info = 2, for range = 'I', the algorithm failed to compute some of the required eigenvalues. Try calling
the routine again with range = 'A'.

If info = 3:

for range = 'A' or 'V', same as info = 1;

for range = 'I', same as info = 2.

If info = 4, no eigenvalues have been computed. The floating-point arithmetic on the computer is not
behaving as expected.
If info = -i, the i-th parameter had an illegal value.

Application Notes
The eigenvalues of T are computed to high relative accuracy, which means that if they vary widely in magnitude, then any small eigenvalues are computed more accurately than, for example, with the standard QR method. However, the reduction to tridiagonal form (prior to calling the routine) may exclude the possibility of obtaining high relative accuracy in the small eigenvalues of the original matrix if its eigenvalues vary widely in magnitude.

?stein
Computes the eigenvectors corresponding to specified
eigenvalues of a real symmetric tridiagonal matrix.

Syntax
lapack_int LAPACKE_sstein( int matrix_layout, lapack_int n, const float* d, const
float* e, lapack_int m, const float* w, const lapack_int* iblock, const lapack_int*
isplit, float* z, lapack_int ldz, lapack_int* ifailv );
lapack_int LAPACKE_dstein( int matrix_layout, lapack_int n, const double* d, const
double* e, lapack_int m, const double* w, const lapack_int* iblock, const lapack_int*
isplit, double* z, lapack_int ldz, lapack_int* ifailv );
lapack_int LAPACKE_cstein( int matrix_layout, lapack_int n, const float* d, const
float* e, lapack_int m, const float* w, const lapack_int* iblock, const lapack_int*
isplit, lapack_complex_float* z, lapack_int ldz, lapack_int* ifailv );
lapack_int LAPACKE_zstein( int matrix_layout, lapack_int n, const double* d, const
double* e, lapack_int m, const double* w, const lapack_int* iblock, const lapack_int*
isplit, lapack_complex_double* z, lapack_int ldz, lapack_int* ifailv );

Include Files
• mkl.h

Description

The routine computes the eigenvectors of a real symmetric tridiagonal matrix T corresponding to specified
eigenvalues, by inverse iteration. It is designed to be used in particular after the specified eigenvalues have
been computed by ?stebz with order = 'B', but may also be used when the eigenvalues have been
computed by other routines.
If you use this routine after ?stebz, it can take advantage of the block structure by performing inverse
iteration on each block Ti separately, which is more efficient than using the whole matrix T.
If T has been formed by reduction of a full symmetric or Hermitian matrix A to tridiagonal form, you can
transform eigenvectors of T to eigenvectors of A by calling ?ormtr or ?opmtr (for real flavors) or by
calling ?unmtr or ?upmtr (for complex flavors).

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

n The order of the matrix T (n≥ 0).

m The number of eigenvectors to be returned.

d, e, w Arrays:

d contains the diagonal elements of T.

The size of d must be at least max(1, n).
e contains the sub-diagonal elements of T, stored in its first n-1 elements (e[0] to e[n - 2]).
The size of e must be at least max(1, n-1).
w contains the eigenvalues of T, stored in w[0] to w[m - 1] (as returned
by stebz). Eigenvalues of T1 must be supplied first, in non-decreasing
order; then those of T2, again in non-decreasing order, and so on.
Constraint:
if iblock[i] = iblock[i+1], w[i] ≤w[i+1].

The size of w must be at least max(1, n).

iblock, isplit Arrays, size at least max(1, n). The arrays iblock and isplit, as returned by ?stebz with order = 'B'.

If you did not call ?stebz with order = 'B', set all elements of iblock to 1, and isplit[0] to n.
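The two requirements above (non-decreasing eigenvalues within each block, and the single-block defaults when ?stebz was not used) can be expressed as small C helpers. Both functions are hypothetical sketches, not MKL routines, and assume lapack_int is int.

```c
/* Hypothetical helpers (not part of Intel MKL); lapack_int is assumed to
 * be int. */

/* Single-block defaults for callers that did not use ?stebz with
 * order = 'B': every element of iblock is 1 and isplit[0] is n. */
void stein_single_block(int n, int *iblock, int *isplit)
{
    for (int i = 0; i < n; ++i) iblock[i] = 1;
    isplit[0] = n;
}

/* Checks the ordering constraint on w[0..m-1]: within a block, eigenvalues
 * must be non-decreasing. Returns 1 if it holds, 0 otherwise. */
int stein_order_ok(int m, const double *w, const int *iblock)
{
    for (int i = 0; i + 1 < m; ++i)
        if (iblock[i] == iblock[i + 1] && w[i] > w[i + 1]) return 0;
    return 1;
}
```

With the single-block defaults, the ordering check reduces to requiring that all of w[0..m-1] be sorted in non-decreasing order.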

ldz The leading dimension of the output array z; ldz ≥ max(1, n) for column major layout and ldz ≥ max(1, m) for row major layout.

Output Parameters

z Array, size at least max(1, ldz*m) for column major layout and max(1, ldz*n) for row major layout.

If info = 0, z contains an n-by-m matrix whose columns are orthonormal eigenvectors. (The i-th column corresponds to the i-th eigenvalue.)

ifailv Array, size at least max(1, m).

If info = i > 0, the first i elements of ifailv contain the indices of any
eigenvectors that failed to converge.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = i, then i eigenvectors (as indicated by the parameter ifailv) each failed to converge in 5 iterations.
The current iterates are stored in the corresponding columns/rows of the array z.
If info = -i, the i-th parameter had an illegal value.

Application Notes
Each computed eigenvector zi is an exact eigenvector of a matrix T+Ei, where ||Ei||2 = O(ε)*||T||2.
However, a set of eigenvectors computed by this routine may not be orthogonal to so high a degree of
accuracy as those computed by ?steqr.

?disna
Computes the reciprocal condition numbers for the
eigenvectors of a symmetric/Hermitian matrix or for
the left or right singular vectors of a general matrix.

Syntax
lapack_int LAPACKE_sdisna (char job, lapack_int m, lapack_int n, const float* d, float*
sep);
lapack_int LAPACKE_ddisna (char job, lapack_int m, lapack_int n, const double* d,
double* sep);

Include Files
• mkl.h

Description

The routine computes the reciprocal condition numbers for the eigenvectors of a real symmetric or complex Hermitian matrix or for the left or right singular vectors of a general m-by-n matrix.
The reciprocal condition number is the 'gap' between the corresponding eigenvalue or singular value and the nearest other one.
The bound on the error, measured by angle in radians, in the i-th computed vector is given by
?lamch('E')*(anorm/sep(i))
where anorm = $\|A\|_2 = \max_j |d(j)|$. sep(i) is not allowed to be smaller than ?lamch('E')*anorm, in order to limit the size of the error bound.
?disna may also be used to compute error bounds for eigenvectors of the generalized symmetric definite eigenproblem.

Input Parameters

job Must be 'E','L', or 'R'. Specifies for which problem the reciprocal
condition numbers should be computed:
job = 'E': for the eigenvectors of a symmetric/Hermitian matrix;

job = 'L': for the left singular vectors of a general matrix;

job = 'R': for the right singular vectors of a general matrix.

m The number of rows of the matrix (m≥ 0).

n If job = 'L', or 'R', the number of columns of the matrix (n≥ 0). Ignored
if job = 'E'.

d Array, dimension at least max(1,m) if job = 'E', and at least max(1, min(m,n)) if job = 'L' or 'R'.

This array must contain the eigenvalues (if job = 'E') or singular values
(if job = 'L' or 'R') of the matrix, in either increasing or decreasing
order.
If singular values, they must be non-negative.

Output Parameters

sep Array, dimension at least max(1,m) if job = 'E', and at least max(1,
min(m,n)) if job = 'L' or 'R'. The reciprocal condition numbers of the
vectors.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Generalized Symmetric-Definite Eigenvalue Problems: LAPACK Computational Routines

Generalized symmetric-definite eigenvalue problems are as follows: find the eigenvalues λ and the
corresponding eigenvectors z that satisfy one of these equations:
Az = λBz, ABz = λz, or BAz = λz,
where A is an n-by-n symmetric or Hermitian matrix, and B is an n-by-n symmetric positive-definite or
Hermitian positive-definite matrix.
In these problems, all n eigenvalues are real (even for complex Hermitian matrices A and B), and there are n corresponding linearly independent eigenvectors.
Routines described in this topic allow you to reduce the above generalized problems to a standard symmetric eigenvalue problem C*y = λ*y, which you can solve by calling LAPACK routines described earlier in this chapter (see Symmetric Eigenvalue Problems).
Different routines allow the matrices to be stored either conventionally or in packed storage. Prior to
reduction, the positive-definite matrix B must first be factorized using either potrf or pptrf.
The reduction routine for the banded matrices A and B uses a split Cholesky factorization for which a specific
routine pbstf is provided. This refinement halves the amount of work required to form matrix C.
Table "Computational Routines for Reducing Generalized Eigenproblems to Standard Problems" lists LAPACK
routines that can be used to solve generalized symmetric-definite eigenvalue problems.
Computational Routines for Reducing Generalized Eigenproblems to Standard Problems
Matrix type | Reduce to standard problem (full storage) | Reduce to standard problem (packed storage) | Reduce to standard problem (band matrices) | Factorize band matrix
Real symmetric matrices | sygst | spgst | sbgst | pbstf
Complex Hermitian matrices | hegst | hpgst | hbgst | pbstf

?sygst
Reduces a real symmetric-definite generalized
eigenvalue problem to the standard form.

Syntax
lapack_int LAPACKE_ssygst (int matrix_layout, lapack_int itype, char uplo, lapack_int
n, float* a, lapack_int lda, const float* b, lapack_int ldb);
lapack_int LAPACKE_dsygst (int matrix_layout, lapack_int itype, char uplo, lapack_int
n, double* a, lapack_int lda, const double* b, lapack_int ldb);

Include Files
• mkl.h

Description

The routine reduces real symmetric-definite generalized eigenproblems

A*z = λ*B*z, A*B*z = λ*z, or B*A*z = λ*z
to the standard form C*y = λ*y. Here A is a real symmetric matrix, and B is a real symmetric positive-definite matrix. Before calling this routine, call ?potrf to compute the Cholesky factorization: $B = U^T U$ or $B = L L^T$.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

itype Must be 1, 2, or 3.
If itype = 1, the generalized eigenproblem is A*z = λ*B*z

for uplo = 'U': $C = (U^T)^{-1} A U^{-1}$, $z = U^{-1} y$;

for uplo = 'L': $C = L^{-1} A (L^T)^{-1}$, $z = (L^T)^{-1} y$.

If itype = 2, the generalized eigenproblem is A*B*z = λ*z

for uplo = 'U': $C = U A U^T$, $z = U^{-1} y$;

for uplo = 'L': $C = L^T A L$, $z = (L^T)^{-1} y$.

If itype = 3, the generalized eigenproblem is B*A*z = λ*z

for uplo = 'U': $C = U A U^T$, $z = U^T y$;

for uplo = 'L': $C = L^T A L$, $z = L y$.

uplo Must be 'U' or 'L'.

If uplo = 'U', the array a stores the upper triangle of A; you must supply B in the factored form $B = U^T U$.

If uplo = 'L', the array a stores the lower triangle of A; you must supply B in the factored form $B = L L^T$.

n The order of the matrices A and B (n≥ 0).

a, b Arrays:
a (size max(1, lda*n)) contains the upper or lower triangle of A.

b (size max(1, ldb*n)) contains the Cholesky-factored matrix B:

$B = U^T U$ or $B = L L^T$ (as returned by ?potrf).

lda The leading dimension of a; at least max(1, n).

ldb The leading dimension of b; at least max(1, n).

Output Parameters

a The upper or lower triangle of A is overwritten by the upper or lower triangle of C, as specified by the arguments itype and uplo.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
Forming the reduced matrix C is a stable procedure. However, it involves implicit multiplication by inv(B) (if
itype = 1) or B (if itype = 2 or 3). When the routine is used as a step in the computation of eigenvalues
and eigenvectors of the original problem, there may be a significant loss of accuracy if B is ill-conditioned
with respect to inversion.
The approximate number of floating-point operations is $n^3$.

?hegst
Reduces a complex Hermitian positive-definite
generalized eigenvalue problem to the standard form.

Syntax
lapack_int LAPACKE_chegst (int matrix_layout, lapack_int itype, char uplo, lapack_int
n, lapack_complex_float* a, lapack_int lda, const lapack_complex_float* b, lapack_int
ldb);
lapack_int LAPACKE_zhegst (int matrix_layout, lapack_int itype, char uplo, lapack_int
n, lapack_complex_double* a, lapack_int lda, const lapack_complex_double* b, lapack_int
ldb);

Include Files
• mkl.h

Description

The routine reduces a complex Hermitian positive-definite generalized eigenvalue problem to standard form.

itype | Problem | Result
1 | A*x = λ*B*x | A overwritten by $(U^H)^{-1} A U^{-1}$ or $L^{-1} A (L^H)^{-1}$
2 | A*B*x = λ*x | A overwritten by $U A U^H$ or $L^H A L$
3 | B*A*x = λ*x | A overwritten by $U A U^H$ or $L^H A L$

Before calling this routine, you must call ?potrf to compute the Cholesky factorization: $B = U^H U$ or $B = L L^H$.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

itype Must be 1, 2, or 3.
If itype = 1, the generalized eigenproblem is A*z = λ*B*z

for uplo = 'U': $C = (U^H)^{-1} A U^{-1}$;

for uplo = 'L': $C = L^{-1} A (L^H)^{-1}$.

If itype = 2, the generalized eigenproblem is A*B*z = λ*z

for uplo = 'U': $C = U A U^H$;

for uplo = 'L': $C = L^H A L$.

If itype = 3, the generalized eigenproblem is B*A*z = λ*z

for uplo = 'U': $C = U A U^H$;

for uplo = 'L': $C = L^H A L$.

uplo Must be 'U' or 'L'.

If uplo = 'U', the array a stores the upper triangle of A; you must supply B in the factored form $B = U^H U$.

If uplo = 'L', the array a stores the lower triangle of A; you must supply B in the factored form $B = L L^H$.

n The order of the matrices A and B (n≥ 0).

a, b Arrays:
a (size max(1, lda*n)) contains the upper or lower triangle of A.

b (size max(1, ldb*n)) contains the Cholesky-factored matrix B:

$B = U^H U$ or $B = L L^H$ (as returned by ?potrf).

lda The leading dimension of a; at least max(1, n).

ldb The leading dimension of b; at least max(1, n).

Output Parameters

a The upper or lower triangle of A is overwritten by the upper or lower triangle of C, as specified by the arguments itype and uplo.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
Forming the reduced matrix C is a stable procedure. However, it involves implicit multiplication by $B^{-1}$ (if itype = 1) or B (if itype = 2 or 3). When the routine is used as a step in the computation of eigenvalues and eigenvectors of the original problem, there may be a significant loss of accuracy if B is ill-conditioned with respect to inversion.
The approximate number of floating-point operations is $n^3$.

?spgst
Reduces a real symmetric-definite generalized
eigenvalue problem to the standard form using packed
storage.

Syntax
lapack_int LAPACKE_sspgst (int matrix_layout, lapack_int itype, char uplo, lapack_int
n, float* ap, const float* bp);
lapack_int LAPACKE_dspgst (int matrix_layout, lapack_int itype, char uplo, lapack_int
n, double* ap, const double* bp);

Include Files
• mkl.h

Description

The routine reduces real symmetric-definite generalized eigenproblems

A*x = λ*B*x, A*B*x = λ*x, or B*A*x = λ*x
to the standard form C*y = λ*y, using packed matrix storage. Here A is a real symmetric matrix, and B is a real symmetric positive-definite matrix. Before calling this routine, call ?pptrf to compute the Cholesky factorization: $B = U^T U$ or $B = L L^T$.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

itype Must be 1, 2, or 3.
If itype = 1, the generalized eigenproblem is A*z = λ*B*z

for uplo = 'U': $C = (U^T)^{-1} A U^{-1}$, $z = U^{-1} y$;

for uplo = 'L': $C = L^{-1} A (L^T)^{-1}$, $z = (L^T)^{-1} y$.

If itype = 2, the generalized eigenproblem is A*B*z = λ*z

for uplo = 'U': $C = U A U^T$, $z = U^{-1} y$;

for uplo = 'L': $C = L^T A L$, $z = (L^T)^{-1} y$.

If itype = 3, the generalized eigenproblem is B*A*z = λ*z

for uplo = 'U': $C = U A U^T$, $z = U^T y$;

for uplo = 'L': $C = L^T A L$, $z = L y$.

uplo Must be 'U' or 'L'.

If uplo = 'U', ap stores the packed upper triangle of A; you must supply B in the factored form $B = U^T U$.

If uplo = 'L', ap stores the packed lower triangle of A; you must supply B in the factored form $B = L L^T$.

n The order of the matrices A and B (n≥ 0).

ap, bp Arrays:
ap contains the packed upper or lower triangle of A.
The dimension of ap must be at least max(1, n*(n+1)/2).
bp contains the packed Cholesky factor of B (as returned by ?pptrf with
the same uplo value).
The dimension of bp must be at least max(1, n*(n+1)/2).

Output Parameters

ap The upper or lower triangle of A is overwritten by the upper or lower triangle of C, as specified by the arguments itype and uplo.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

?hpgst
Reduces a generalized eigenvalue problem with a
Hermitian matrix to a standard eigenvalue problem
using packed storage.

Syntax
lapack_int LAPACKE_chpgst (int matrix_layout, lapack_int itype, char uplo, lapack_int
n, lapack_complex_float* ap, const lapack_complex_float* bp);
lapack_int LAPACKE_zhpgst (int matrix_layout, lapack_int itype, char uplo, lapack_int
n, lapack_complex_double* ap, const lapack_complex_double* bp);

Include Files
• mkl.h

Description

The routine reduces generalized eigenproblems with Hermitian matrices

A*z = λ*B*z, A*B*z = λ*z, or B*A*z = λ*z.
to standard eigenproblems C*y = λ*y, using packed matrix storage. Here A is a complex Hermitian matrix, and B is a complex Hermitian positive-definite matrix. Before calling this routine, you must call ?pptrf to compute the Cholesky factorization: $B = U^H U$ or $B = L L^H$.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

itype Must be 1, 2, or 3.
If itype = 1, the generalized eigenproblem is A*z = λ*B*z

for uplo = 'U': $C = (U^H)^{-1} A U^{-1}$, $z = U^{-1} y$;

for uplo = 'L': $C = L^{-1} A (L^H)^{-1}$, $z = (L^H)^{-1} y$.

If itype = 2, the generalized eigenproblem is A*B*z = λ*z

for uplo = 'U': $C = U A U^H$, $z = U^{-1} y$;

for uplo = 'L': $C = L^H A L$, $z = (L^H)^{-1} y$.

If itype = 3, the generalized eigenproblem is B*A*z = λ*z

for uplo = 'U': $C = U A U^H$, $z = U^H y$;

for uplo = 'L': $C = L^H A L$, $z = L y$.

uplo Must be 'U' or 'L'.

If uplo = 'U', ap stores the packed upper triangle of A; you must supply B in the factored form $B = U^H U$.

If uplo = 'L', ap stores the packed lower triangle of A; you must supply B in the factored form $B = L L^H$.

n The order of the matrices A and B (n≥ 0).

ap, bp Arrays:
ap contains the packed upper or lower triangle of A.
The dimension of ap must be at least max(1, n*(n+1)/2).
bp contains the packed Cholesky factor of B (as returned by ?pptrf with the same uplo value).
The dimension of bp must be at least max(1, n*(n+1)/2).

Output Parameters

ap The upper or lower triangle of A is overwritten by the upper or lower triangle of C, as specified by the arguments itype and uplo.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The approximate number of floating-point operations is $n^3$.

?sbgst
Reduces a real symmetric-definite generalized
eigenproblem for banded matrices to the standard
form using the factorization performed by ?pbstf.

Syntax
lapack_int LAPACKE_ssbgst (int matrix_layout, char vect, char uplo, lapack_int n,
lapack_int ka, lapack_int kb, float* ab, lapack_int ldab, const float* bb, lapack_int
ldbb, float* x, lapack_int ldx);
lapack_int LAPACKE_dsbgst (int matrix_layout, char vect, char uplo, lapack_int n,
lapack_int ka, lapack_int kb, double* ab, lapack_int ldab, const double* bb, lapack_int
ldbb, double* x, lapack_int ldx);

Include Files
• mkl.h

Description

To reduce the real symmetric-definite generalized eigenproblem A*z = λ*B*z to the standard form C*y = λ*y, where A, B and C are banded, this routine must be preceded by a call to pbstf, which computes the split Cholesky factorization of the positive-definite matrix B: $B = S^T S$. The split Cholesky factorization, compared with the ordinary Cholesky factorization, allows the work to be approximately halved.
This routine overwrites A with $C = X^T A X$, where X = inv(S)*Q and Q is an orthogonal matrix chosen (implicitly) to preserve the bandwidth of A. The routine also has an option to allow the accumulation of X, and then, if z is an eigenvector of C, X*z is an eigenvector of the original system.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

vect Must be 'N' or 'V'.

If vect = 'N', then matrix X is not returned;

If vect = 'V', then matrix X is returned.

uplo Must be 'U' or 'L'.

If uplo = 'U', ab stores the upper triangular part of A.

If uplo = 'L', ab stores the lower triangular part of A.

n The order of the matrices A and B (n≥ 0).

ka The number of super- or sub-diagonals in A (ka ≥ 0).

kb The number of super- or sub-diagonals in B (ka ≥ kb ≥ 0).

ab, bb ab (size at least max(1, ldab*n) for column major layout and at least max(1, ldab*(ka + 1)) for row major layout) is an array containing either upper or lower triangular part of the symmetric matrix A (as specified by uplo) in band storage format.
bb (size at least max(1, ldbb*n) for column major layout and at least max(1, ldbb*(kb + 1)) for row major layout) is an array containing the banded split Cholesky factor of B as specified by uplo, n and kb and returned by ?pbstf.

ldab The leading dimension of the array ab; must be at least ka+1 for column
major layout and max(1, n) for row major layout.

ldbb The leading dimension of the array bb; must be at least kb+1 for column
major layout and max(1, n) for row major layout.

ldx The leading dimension of the output array x. Constraints: if vect = 'N',
then ldx≥ 1;

if vect = 'V', then ldx≥ max(1, n).

Output Parameters

ab On exit, this array is overwritten by the upper or lower triangle of C as specified by uplo.

x Array.
If vect = 'V', then x (size at least max(1, ldx*n)) contains the n-by-n
matrix X = inv(S)*Q.

If vect = 'N', then x is not referenced.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
Forming the reduced matrix C involves implicit multiplication by inv(B). When the routine is used as a step
in the computation of eigenvalues and eigenvectors of the original problem, there may be a significant loss of
accuracy if B is ill-conditioned with respect to inversion.
If ka and kb are much less than n, then the total number of floating-point operations is approximately $6n^2 k_b$ when vect = 'N'. An additional $(3/2)\,n^3 (k_b/k_a)$ operations are required when vect = 'V'.
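A worked instance of these estimates, with sizes chosen only for illustration (n = 1000, ka = kb = 10):

```latex
n = 1000,\quad k_a = k_b = 10:\qquad
6n^2 k_b = 6 \cdot 10^{6} \cdot 10 = 6 \times 10^{7}\ \ (\text{vect} = \text{'N'}),\qquad
\tfrac{3}{2}\, n^3\, \frac{k_b}{k_a} = 1.5 \times 10^{9}\ \ (\text{additional for vect} = \text{'V'}).
```

Under these assumptions, accumulating X costs roughly 25 times more than the reduction itself, so it may be worth requesting vect = 'V' only when eigenvectors of the original problem are needed.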

?hbgst
Reduces a complex Hermitian positive-definite
generalized eigenproblem for banded matrices to the
standard form using the factorization performed
by ?pbstf.

Syntax
lapack_int LAPACKE_chbgst (int matrix_layout, char vect, char uplo, lapack_int n,
lapack_int ka, lapack_int kb, lapack_complex_float* ab, lapack_int ldab, const
lapack_complex_float* bb, lapack_int ldbb, lapack_complex_float* x, lapack_int ldx);
lapack_int LAPACKE_zhbgst (int matrix_layout, char vect, char uplo, lapack_int n,
lapack_int ka, lapack_int kb, lapack_complex_double* ab, lapack_int ldab, const
lapack_complex_double* bb, lapack_int ldbb, lapack_complex_double* x, lapack_int ldx);

Include Files
• mkl.h

Description

To reduce the complex Hermitian positive-definite generalized eigenproblem A*z = λ*B*z to the standard form C*y = λ*y, where A, B and C are banded, this routine must be preceded by a call to ?pbstf, which computes the split Cholesky factorization of the positive-definite matrix B: $B = S^H S$. The split Cholesky factorization, compared with the ordinary Cholesky factorization, allows the work to be approximately halved.
This routine overwrites A with $C = X^H A X$, where X = inv(S)*Q, and Q is a unitary matrix chosen (implicitly) to preserve the bandwidth of A. The routine also has an option to allow the accumulation of X, and then, if z is an eigenvector of C, X*z is an eigenvector of the original system.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

vect Must be 'N' or 'V'.

If vect = 'N', then matrix X is not returned;

If vect = 'V', then matrix X is returned.

uplo Must be 'U' or 'L'.

If uplo = 'U', ab stores the upper triangular part of A.

If uplo = 'L', ab stores the lower triangular part of A.

n The order of the matrices A and B (n≥ 0).

ka The number of super- or sub-diagonals in A (ka ≥ 0).

kb The number of super- or sub-diagonals in B (ka ≥ kb ≥ 0).

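ab, bb ab (size at least max(1, ldab*n) for column major layout and at least max(1, ldab*(ka + 1)) for row major layout) is an array containing either upper or lower triangular part of the Hermitian matrix A (as specified by uplo) in band storage format.
bb (size at least max(1, ldbb*n) for column major layout and at least max(1, ldbb*(kb + 1)) for row major layout) is an array containing the banded split Cholesky factor of B as specified by uplo, n and kb and returned by ?pbstf.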

ldab The leading dimension of the array ab; must be at least ka+1 for column
major layout and max(1, n) for row major layout.

ldbb The leading dimension of the array bb; must be at least kb+1 for column
major layout and max(1, n) for row major layout.

ldx The leading dimension of the output array x. Constraints:

if vect = 'N', then ldx≥ 1;

if vect = 'V', then ldx≥ max(1, n).

Output Parameters

ab On exit, this array is overwritten by the upper or lower triangle of C as specified by uplo.

x Array.
If vect = 'V', then x (size at least max(1, ldx*n)) contains the n-by-n
matrix X = inv(S)*Q.

If vect = 'N', then x is not referenced.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
Forming the reduced matrix C involves implicit multiplication by inv(B). When the routine is used as a step
in the computation of eigenvalues and eigenvectors of the original problem, there may be a significant loss of
accuracy if B is ill-conditioned with respect to inversion. The total number of floating-point operations is approximately $20n^2 k_b$ when vect = 'N'. An additional $5n^3 (k_b/k_a)$ operations are required when vect = 'V'. All these estimates assume that both ka and kb are much less than n.

?pbstf
Computes a split Cholesky factorization of a real
symmetric or complex Hermitian positive-definite
banded matrix used in ?sbgst/?hbgst.

Syntax
lapack_int LAPACKE_spbstf (int matrix_layout, char uplo, lapack_int n, lapack_int kb,
float* bb, lapack_int ldbb);
lapack_int LAPACKE_dpbstf (int matrix_layout, char uplo, lapack_int n, lapack_int kb,
double* bb, lapack_int ldbb);
lapack_int LAPACKE_cpbstf (int matrix_layout, char uplo, lapack_int n, lapack_int kb,
lapack_complex_float* bb, lapack_int ldbb);

lapack_int LAPACKE_zpbstf (int matrix_layout, char uplo, lapack_int n, lapack_int kb,
lapack_complex_double* bb, lapack_int ldbb);

Include Files
• mkl.h

Description

The routine computes a split Cholesky factorization of a real symmetric or complex Hermitian positive-definite band matrix B. It is to be used in conjunction with sbgst/hbgst.
The factorization has the form $B = S^T S$ (or $B = S^H S$ for complex flavors), where S is a band matrix of the same bandwidth as B and the following structure: S is upper triangular in the first (n+kb)/2 rows and lower triangular in the remaining rows.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

If uplo = 'U', bb stores the upper triangular part of B.

If uplo = 'L', bb stores the lower triangular part of B.

n The order of the matrix B (n≥ 0).

kb The number of super- or sub-diagonals in B (kb ≥ 0).

bb bb(size at least max(1, ldbb*n) for column major layout and at least
max(1, ldbb*(kb + 1)) for row major layout) is an array containing either
upper or lower triangular part of the matrix B (as specified by uplo) in band
storage format.

ldbb The leading dimension of bb; must be at least kb+1 for column major layout and at least max(1, n) for row major layout.

Output Parameters

bb On exit, this array is overwritten by the elements of the split Cholesky factor S.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = i, then the factorization could not be completed, because the updated element $b_{ii}$ would be the square root of a negative number; hence the matrix B is not positive-definite.
If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed factor S is the exact factor of a perturbed matrix B + E, where
$|E| \le c(k_b+1)\,\varepsilon\,|S^H|\,|S|$, $|e_{ij}| \le c(k_b+1)\,\varepsilon\,\sqrt{b_{ii} b_{jj}}$,
c(n) is a modest linear function of n, and ε is the machine precision.

The total number of floating-point operations for real flavors is approximately $n(k_b+1)^2$. The number of operations for complex flavors is 4 times greater. All these estimates assume that kb is much less than n.
After calling this routine, you can call sbgst/hbgst to solve the generalized eigenproblem Az = λBz, where A
and B are banded and B is positive-definite.

Nonsymmetric Eigenvalue Problems: LAPACK Computational Routines

This topic describes LAPACK routines for solving nonsymmetric eigenvalue problems, computing the Schur
factorization of general matrices, as well as performing a number of related computational tasks.
A nonsymmetric eigenvalue problem is as follows: given a nonsymmetric (or non-Hermitian) matrix A, find the eigenvalues λ and the corresponding eigenvectors z that satisfy the equation
$A z = \lambda z$ (right eigenvectors z)
or the equation
$z^H A = \lambda z^H$ (left eigenvectors z).
Nonsymmetric eigenvalue problems have the following properties:
• The number of eigenvectors may be less than the matrix order (but is not less than the number of
distinct eigenvalues of A).
• Eigenvalues may be complex even for a real matrix A.
• If a real nonsymmetric matrix has a complex eigenvalue $a + bi$ corresponding to an eigenvector z, then $a - bi$ is also an eigenvalue. The eigenvalue $a - bi$ corresponds to the eigenvector whose elements are the complex conjugates of the elements of z.
To solve a nonsymmetric eigenvalue problem with LAPACK, you usually need to reduce the matrix to the
upper Hessenberg form and then solve the eigenvalue problem with the Hessenberg matrix obtained. Table
"Computational Routines for Solving Nonsymmetric Eigenvalue Problems" lists LAPACK routines to reduce the
matrix to the upper Hessenberg form by an orthogonal (or unitary) similarity transformation $A = Q H Q^H$ as
well as routines to solve eigenvalue problems with Hessenberg matrices, forming the Schur factorization of
such matrices and computing the corresponding condition numbers.

Computational Routines for Solving Nonsymmetric Eigenvalue Problems

Operation performed | Routines for real matrices | Routines for complex matrices
Reduce to Hessenberg form $A = Q H Q^H$ | ?gehrd | ?gehrd
Generate the matrix Q | ?orghr | ?unghr
Apply the matrix Q | ?ormhr | ?unmhr
Balance matrix | ?gebal | ?gebal
Transform eigenvectors of balanced matrix to those of the original matrix | ?gebak | ?gebak
Find eigenvalues and Schur factorization (QR algorithm) | ?hseqr | ?hseqr
Find eigenvectors from Hessenberg form (inverse iteration) | ?hsein | ?hsein
Find eigenvectors from Schur factorization | ?trevc | ?trevc
Estimate sensitivities of eigenvalues and eigenvectors | ?trsna | ?trsna
Reorder Schur factorization | ?trexc | ?trexc
Reorder Schur factorization, find the invariant subspace and estimate sensitivities | ?trsen | ?trsen
Solve Sylvester's equation | ?trsyl | ?trsyl

?gehrd
Reduces a general matrix to upper Hessenberg form.

Syntax
lapack_int LAPACKE_sgehrd (int matrix_layout, lapack_int n, lapack_int ilo, lapack_int
ihi, float* a, lapack_int lda, float* tau);
lapack_int LAPACKE_dgehrd (int matrix_layout, lapack_int n, lapack_int ilo, lapack_int
ihi, double* a, lapack_int lda, double* tau);
lapack_int LAPACKE_cgehrd (int matrix_layout, lapack_int n, lapack_int ilo, lapack_int
ihi, lapack_complex_float* a, lapack_int lda, lapack_complex_float* tau);
lapack_int LAPACKE_zgehrd (int matrix_layout, lapack_int n, lapack_int ilo, lapack_int
ihi, lapack_complex_double* a, lapack_int lda, lapack_complex_double* tau);

Include Files
• mkl.h

Description

The routine reduces a general matrix A to upper Hessenberg form H by an orthogonal or unitary similarity transformation $A = Q H Q^H$. Here H has real subdiagonal elements.

The routine does not form the matrix Q explicitly. Instead, Q is represented as a product of elementary
reflectors. Routines are provided to work with Q in this representation.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

n The order of the matrix A (n≥ 0).

ilo, ihi If A has been output by ?gebal, then ilo and ihi must contain the values returned by that routine. Otherwise, ilo = 1 and ihi = n. (If n > 0, then 1 ≤ ilo ≤ ihi ≤ n; if n = 0, ilo = 1 and ihi = 0.)

a Array a (size max(1, lda*n)) contains the matrix A.

lda The leading dimension of a; at least max(1, n).

Output Parameters

a The elements on and above the subdiagonal contain the upper Hessenberg matrix H. The subdiagonal elements of H are real. The elements below the subdiagonal, with the array tau, represent the orthogonal (or unitary) matrix Q as a product of ihi - ilo elementary reflectors.

tau Array, size at least max (1, n-1).

Contains scalars that define elementary reflectors for the matrix Q.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed Hessenberg matrix H is exactly similar to a nearby matrix A + E, where $\|E\|_2 < c(n)\,\varepsilon\,\|A\|_2$, c(n) is a modestly increasing function of n, and ε is the machine precision.
The approximate number of floating-point operations for real flavors is $(2/3)(ihi - ilo)^2 (2\,ihi + 2\,ilo + 3n)$; for complex flavors it is 4 times greater.

?orghr
Generates the real orthogonal matrix Q determined
by ?gehrd.

Syntax
lapack_int LAPACKE_sorghr (int matrix_layout, lapack_int n, lapack_int ilo, lapack_int
ihi, float* a, lapack_int lda, const float* tau);
lapack_int LAPACKE_dorghr (int matrix_layout, lapack_int n, lapack_int ilo, lapack_int
ihi, double* a, lapack_int lda, const double* tau);

Include Files
• mkl.h

Description

The routine explicitly generates the orthogonal matrix Q that has been determined by a preceding call to sgehrd/dgehrd. (The routine ?gehrd reduces a real general matrix A to upper Hessenberg form H by an orthogonal similarity transformation, $A = Q H Q^T$, and represents the matrix Q as a product of ihi - ilo elementary reflectors. Here ilo and ihi are values determined by sgebal/dgebal when balancing the matrix; if the matrix has not been balanced, ilo = 1 and ihi = n.)

The matrix Q generated by ?orghr has the structure:
$$Q = \begin{pmatrix} I & 0 & 0 \\ 0 & Q_{22} & 0 \\ 0 & 0 & I \end{pmatrix},$$
where $Q_{22}$ occupies rows and columns ilo to ihi.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

n The order of the matrix Q (n≥ 0).

ilo, ihi These must be the same parameters ilo and ihi, respectively, as supplied
to ?gehrd. (If n > 0, then 1 ≤ilo≤ihi≤n; if n = 0, ilo = 1 and ihi =
0.)

a, tau Arrays: a (size max(1, lda*n)) contains details of the vectors which define the elementary reflectors, as returned by ?gehrd.

tau contains further details of the elementary reflectors, as returned by ?gehrd.

The dimension of tau must be at least max(1, n-1).

lda The leading dimension of a; at least max(1, n).

Output Parameters

a Overwritten by the n-by-n orthogonal matrix Q.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed matrix Q differs from the exact result by a matrix E such that $\|E\|_2 = O(\varepsilon)$, where ε is the machine precision.
The approximate number of floating-point operations is $(4/3)(ihi - ilo)^3$.

The complex counterpart of this routine is unghr.

?ormhr
Multiplies an arbitrary real matrix C by the real
orthogonal matrix Q determined by ?gehrd.

Syntax
lapack_int LAPACKE_sormhr (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int ilo, lapack_int ihi, const float* a, lapack_int lda, const
float* tau, float* c, lapack_int ldc);
lapack_int LAPACKE_dormhr (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int ilo, lapack_int ihi, const double* a, lapack_int lda, const
double* tau, double* c, lapack_int ldc);

Include Files
• mkl.h

Description

The routine multiplies a matrix C by the orthogonal matrix Q that has been determined by a preceding call to sgehrd/dgehrd. (The routine ?gehrd reduces a real general matrix A to upper Hessenberg form H by an orthogonal similarity transformation, $A = Q H Q^T$, and represents the matrix Q as a product of ihi - ilo elementary reflectors. Here ilo and ihi are values determined by sgebal/dgebal when balancing the matrix; if the matrix has not been balanced, ilo = 1 and ihi = n.)

With ?ormhr, you can form one of the matrix products $QC$, $Q^T C$, $CQ$, or $CQ^T$, overwriting the result on C (which may be any real rectangular matrix).
A common application of ?ormhr is to transform a matrix V of eigenvectors of H to the matrix $QV$ of eigenvectors of A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

side Must be 'L' or 'R'.

If side = 'L', then the routine forms $QC$ or $Q^TC$.

If side = 'R', then the routine forms $CQ$ or $CQ^T$.

trans Must be 'N' or 'T'.

If trans = 'N', then Q is applied to C.

If trans = 'T', then $Q^T$ is applied to C.

m The number of rows in C (m≥ 0).

n The number of columns in C (n≥ 0).

ilo, ihi These must be the same parameters ilo and ihi, respectively, as supplied
to ?gehrd.

If m > 0 and side = 'L', then 1 ≤ ilo ≤ ihi ≤ m.

If m = 0 and side = 'L', then ilo = 1 and ihi = 0.

If n > 0 and side = 'R', then 1 ≤ ilo ≤ ihi ≤ n.

If n = 0 and side = 'R', then ilo = 1 and ihi = 0.

a, tau, c Arrays:
a (size max(1, lda*n) for side = 'R' and size max(1, lda*m) for side = 'L')
contains details of the vectors which define the elementary reflectors, as
returned by ?gehrd.

tau contains further details of the elementary reflectors, as returned
by ?gehrd.

The dimension of tau must be at least max(1, m-1) if side = 'L' and at
least max(1, n-1) if side = 'R'.

c (size max(1, ldc*n) for column major layout and max(1, ldc*m) for row
major layout) contains the m-by-n matrix C.

lda The leading dimension of a; at least max(1, m) if side = 'L' and at least
max(1, n) if side = 'R'.

ldc The leading dimension of c; at least max(1, m) for column major layout and
at least max(1, n) for row major layout.
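The dimension and range rules above can be collected into a small pre-call check. The sketch below is hypothetical: enum layout stands in for LAPACK_ROW_MAJOR/LAPACK_COL_MAJOR, only upper-case option characters are accepted, and the negative return values simply number the arguments by their position in the LAPACKE_?ormhr prototype (an assumption for illustration, not a statement of what the library returns). elem_index shows where element C(i,j) lives in each layout, which is why ldc has different lower bounds for the two layouts.

```c
#include <stddef.h>

/* Hypothetical stand-ins for LAPACK_ROW_MAJOR / LAPACK_COL_MAJOR. */
enum layout { ROW_MAJOR, COL_MAJOR };

static int imax(int a, int b) { return a > b ? a : b; }

/* Offset of element (i, j), 0-based, in a matrix with leading dimension ld. */
size_t elem_index(enum layout lay, int i, int j, int ld)
{
    return lay == COL_MAJOR ? (size_t)i + (size_t)j * ld
                            : (size_t)i * ld + (size_t)j;
}

/* Hypothetical check of the ?ormhr argument rules listed above.
   Returns 0 if they hold, otherwise minus the (assumed) position of the
   first offending argument in the LAPACKE_?ormhr prototype. */
int ormhr_check(enum layout lay, char side, char trans, int m, int n,
                int ilo, int ihi, int lda, int ldc)
{
    if (lay != ROW_MAJOR && lay != COL_MAJOR) return -1;
    if (side != 'L' && side != 'R') return -2;
    if (trans != 'N' && trans != 'T') return -3;
    if (m < 0) return -4;
    if (n < 0) return -5;
    int nq = (side == 'L') ? m : n;           /* order of Q */
    if (nq > 0) {
        if (ilo < 1 || ilo > nq) return -6;
        if (ihi < ilo || ihi > nq) return -7;
    } else {
        if (ilo != 1) return -6;
        if (ihi != 0) return -7;
    }
    if (lda < imax(1, nq)) return -9;         /* a holds an nq-by-nq matrix */
    int ldc_min = (lay == COL_MAJOR) ? imax(1, m) : imax(1, n);
    if (ldc < ldc_min) return -12;
    return 0;
}
```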

Output Parameters

c C is overwritten by the product $QC$, $Q^TC$, $CQ$, or $CQ^T$, as specified by side and
trans.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed product differs from the exact product by a matrix E such that $\|E\|_2 = O(\varepsilon)\|C\|_2$, where
ε is the machine precision.
The approximate number of floating-point operations is
$2n(ihi - ilo)^2$ if side = 'L';
$2m(ihi - ilo)^2$ if side = 'R'.
The complex counterpart of this routine is unmhr.


?unghr
Generates the complex unitary matrix Q determined
by ?gehrd.

Syntax
lapack_int LAPACKE_cunghr (int matrix_layout, lapack_int n, lapack_int ilo, lapack_int
ihi, lapack_complex_float* a, lapack_int lda, const lapack_complex_float* tau);
lapack_int LAPACKE_zunghr (int matrix_layout, lapack_int n, lapack_int ilo, lapack_int
ihi, lapack_complex_double* a, lapack_int lda, const lapack_complex_double* tau);

Include Files
• mkl.h

Description

The routine is intended to be used following a call to cgehrd/zgehrd, which reduces a complex matrix A to
upper Hessenberg form H by a unitary similarity transformation: $A = QHQ^H$. ?gehrd represents the matrix
Q as a product of ihi-ilo elementary reflectors. Here ilo and ihi are values determined by cgebal/zgebal
when balancing the matrix; if the matrix has not been balanced, ilo = 1 and ihi = n.

Use the routine unghr to generate Q explicitly as a square matrix. The matrix Q has the structure:

$$
Q = \begin{pmatrix} I & 0 & 0 \\ 0 & Q_{22} & 0 \\ 0 & 0 & I \end{pmatrix}
$$

where Q22 occupies rows and columns ilo to ihi.
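The block structure of Q can be checked numerically after generating it. The following sketch (hypothetical name has_unghr_structure, column major storage, C99 double complex standing in for lapack_complex_double) verifies that Q equals the identity outside rows and columns ilo to ihi, up to an absolute tolerance; it does not test whether the Q22 block is unitary.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the n-by-n column-major matrix q
   (leading dimension ldq) equals the identity outside rows and columns
   ilo..ihi (1-based), as the block structure of Q requires; otherwise 0.
   tol is an absolute tolerance on each entry; Q22 entries are not checked. */
int has_unghr_structure(int n, int ilo, int ihi, const double complex *q,
                        int ldq, double tol)
{
    for (int j = 1; j <= n; ++j) {
        for (int i = 1; i <= n; ++i) {
            if (i >= ilo && i <= ihi && j >= ilo && j <= ihi)
                continue;                       /* inside the Q22 block */
            double complex diff = q[(i - 1) + (size_t)(j - 1) * ldq]
                                  - (i == j ? 1.0 : 0.0);
            double re = creal(diff), im = cimag(diff);
            if (re * re + im * im > tol * tol)  /* |diff| > tol */
                return 0;
        }
    }
    return 1;
}
```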

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

n The order of the matrix Q (n≥ 0).

ilo, ihi These must be the same parameters ilo and ihi, respectively, as supplied
to ?gehrd. (If n > 0, then 1 ≤ ilo ≤ ihi ≤ n. If n = 0, then ilo = 1 and
ihi = 0.)

a, tau Arrays:
a (size max(1, lda*n)) contains details of the vectors which define the
elementary reflectors, as returned by ?gehrd.
tau contains further details of the elementary reflectors, as returned
by ?gehrd.

The dimension of tau must be at least max(1, n-1).

lda The leading dimension of a; at least max(1, n).

Output Parameters

a Overwritten by the n-by-n unitary matrix Q.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed matrix Q differs from the exact result by a matrix E such that $\|E\|_2 = O(\varepsilon)$, where ε is the
machine precision.
The approximate number of real floating-point operations is $(16/3)(ihi - ilo)^3$.

The real counterpart of this routine is orghr.

?unmhr
Multiplies an arbitrary complex matrix C by the
complex unitary matrix Q determined by ?gehrd.

Syntax
lapack_int LAPACKE_cunmhr (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int ilo, lapack_int ihi, const lapack_complex_float* a, lapack_int
lda, const lapack_complex_float* tau, lapack_complex_float* c, lapack_int ldc);
lapack_int LAPACKE_zunmhr (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int ilo, lapack_int ihi, const lapack_complex_double* a,
lapack_int lda, const lapack_complex_double* tau, lapack_complex_double* c, lapack_int
ldc);

Include Files
• mkl.h

Description


The routine multiplies a matrix C by the unitary matrix Q that has been determined by a preceding call to
cgehrd/zgehrd. (The routine ?gehrd reduces a complex general matrix A to upper Hessenberg form H by a
unitary similarity transformation, $A = QHQ^H$, and represents the matrix Q as a product of ihi-ilo
elementary reflectors. Here ilo and ihi are values determined by cgebal/zgebal when balancing the
matrix; if the matrix has not been balanced, ilo = 1 and ihi = n.)

With ?unmhr, you can form one of the matrix products $QC$, $Q^HC$, $CQ$, or $CQ^H$, overwriting the result on C
(which may be any complex rectangular matrix). A common application of this routine is to transform a
matrix V of eigenvectors of H to the matrix QV of eigenvectors of A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

side Must be 'L' or 'R'.

If side = 'L', then the routine forms $QC$ or $Q^HC$.

If side = 'R', then the routine forms $CQ$ or $CQ^H$.

trans Must be 'N' or 'C'.

If trans = 'N', then Q is applied to C.

If trans = 'C', then $Q^H$ is applied to C.

m The number of rows in C (m≥ 0).

n The number of columns in C (n≥ 0).

ilo, ihi These must be the same parameters ilo and ihi, respectively, as supplied
to ?gehrd.

If m > 0 and side = 'L', then 1 ≤ ilo ≤ ihi ≤ m.

If m = 0 and side = 'L', then ilo = 1 and ihi = 0.

If n > 0 and side = 'R', then 1 ≤ ilo ≤ ihi ≤ n.

If n = 0 and side = 'R', then ilo = 1 and ihi = 0.

a, tau, c Arrays:
a (size max(1, lda*n) for side = 'R' and size max(1, lda*m) for side = 'L')
contains details of the vectors which define the elementary reflectors, as
returned by ?gehrd.

tau contains further details of the elementary reflectors, as returned
by ?gehrd.

The dimension of tau must be at least max(1, m-1) if side = 'L' and at
least max(1, n-1) if side = 'R'.

c (size max(1, ldc*n) for column major layout and max(1, ldc*m) for row
major layout) contains the m-by-n matrix C.

lda The leading dimension of a; at least max(1, m) if side = 'L' and at least
max(1, n) if side = 'R'.

ldc The leading dimension of c; at least max(1, m) for column major layout and
at least max(1, n) for row major layout.

Output Parameters

c C is overwritten by $QC$, $Q^HC$, $CQ$, or $CQ^H$, as specified by side and
trans.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed product differs from the exact product by a matrix E such that $\|E\|_2 = O(\varepsilon)\|C\|_2$, where
ε is the machine precision.
The approximate number of floating-point operations is
$8n(ihi - ilo)^2$ if side = 'L';
$8m(ihi - ilo)^2$ if side = 'R'.
The real counterpart of this routine is ormhr.
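The operation counts quoted in the Application Notes for ?orghr/?unghr and ?ormhr/?unmhr can be evaluated with two short functions. These helpers are hypothetical and simply encode the formulas above (with k = ihi - ilo); they are rough estimates useful for sizing work, not measured costs.

```c
/* Hypothetical helpers encoding the Application Notes formulas; k = ihi - ilo. */

/* ?orghr: (4/3) k^3 flops; ?unghr: (16/3) k^3 real flops. */
double hessenberg_generate_flops(int is_complex, int ilo, int ihi)
{
    double k = (double)(ihi - ilo);
    double coeff = is_complex ? 16.0 : 4.0;
    return coeff * k * k * k / 3.0;   /* divide last so integer k^3 stays exact */
}

/* ?ormhr: 2n k^2 (side = 'L') or 2m k^2 (side = 'R');
   ?unmhr: 8n k^2 or 8m k^2. */
double hessenberg_multiply_flops(int is_complex, char side, int m, int n,
                                 int ilo, int ihi)
{
    double k = (double)(ihi - ilo);
    double coeff = is_complex ? 8.0 : 2.0;
    double other = (side == 'L') ? (double)n : (double)m;
    return coeff * other * k * k;
}
```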

?gebal
Balances a general matrix to improve the accuracy of
computed eigenvalues and eigenvectors.

Syntax
lapack_int LAPACKE_sgebal( int matrix_layout, char job, lapack_int n, float* a,
lapack_int lda, lapack_int* ilo, lapack_int* ihi, float* scale );
lapack_int LAPACKE_dgebal( int matrix_layout, char job, lapack_int n, double* a,
lapack_int lda, lapack_int* ilo, lapack_int* ihi, double* scale );
lapack_int LAPACKE_cgebal( int matrix_layout, char job, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_int* ilo, lapack_int* ihi, float*
scale );
lapack_int LAPACKE_zgebal( int matrix_layout, char job, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_int* ilo, lapack_int* ihi, double*
scale );

Include Files
• mkl.h

Description

The routine balances a matrix A by performing either or both of the following two similarity transformations:
(1) The routine first attempts to permute A to block upper triangular form:

$$
P A P^T = A' =
\begin{pmatrix}
A'_{11} & A'_{12} & A'_{13} \\
0 & A'_{22} & A'_{23} \\
0 & 0 & A'_{33}
\end{pmatrix}
$$

where P is a permutation matrix, and A'11 and A'33 are upper triangular. The diagonal elements of A'11 and
A'33 are eigenvalues of A. The rest of the eigenvalues of A are the eigenvalues of the central diagonal block
A'22, in rows and columns ilo to ihi. Subsequent operations to compute the eigenvalues of A (or its Schur
factorization) need only be applied to these rows and columns; this can save a significant amount of work if
ilo > 1 and ihi < n.
If no suitable permutation exists (as is often the case), the routine sets ilo = 1 and ihi = n, and A'22 is
the whole of A.
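(2) The routine applies a diagonal similarity transformation to A' to make the rows and columns of A'22 as
close in norm as possible:

$$
A'' = D A' D^{-1} =
\begin{pmatrix}
A'_{11} & A'_{12} D_{22}^{-1} & A'_{13} \\
0 & D_{22} A'_{22} D_{22}^{-1} & D_{22} A'_{23} \\
0 & 0 & A'_{33}
\end{pmatrix},
\qquad
D = \begin{pmatrix} I & 0 & 0 \\ 0 & D_{22} & 0 \\ 0 & 0 & I \end{pmatrix}
$$

where D22 is a diagonal matrix acting on rows and columns ilo to ihi.
This scaling can reduce the norm of the matrix (that is, ||A''22|| < ||A'22||), and hence reduce the
effect of rounding errors on the accuracy of computed eigenvalues and eigenvectors.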

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

job Must be 'N' or 'P' or 'S' or 'B'.

If job = 'N', then A is neither permuted nor scaled (but ilo, ihi, and scale
get their values).
If job = 'P', then A is permuted but not scaled.

If job = 'S', then A is scaled but not permuted.

If job = 'B', then A is both scaled and permuted.

n The order of the matrix A (n≥ 0).

a Array a (size max(1, lda*n)) contains the matrix A.

lda The leading dimension of a; at least max(1, n).

Output Parameters

a Overwritten by the balanced matrix (a is not referenced if job = 'N').

ilo, ihi The values ilo and ihi such that on exit a(i,j) is zero if i > j and 1 ≤ j <
ilo or ihi < j ≤ n.
If job = 'N' or 'S', then ilo = 1 and ihi = n.

scale Array, size at least max(1, n).

Contains details of the permutations and scaling factors.
More precisely, if pj is the index of the row and column interchanged with
row and column j, and dj is the scaling factor used to balance row and
column j, then
scale[j - 1] = pj for j = 1, 2,..., ilo-1, ihi+1,..., n;
scale[j - 1] = dj for j = ilo, ilo + 1,..., ihi.
The order in which the interchanges are made is n to ihi+1, then 1 to ilo-1.
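The permutation part of scale can be decoded with a short loop. The sketch below (hypothetical name gebal_permutation) replays the recorded interchanges in the order stated above and reports, for each row of the permuted matrix A', which row of the original A it came from, so that A'(k,l) = A(perm[k], perm[l]) with 0-based indices. It assumes, as the form P*A*P^T implies, that each interchange swaps both the rows and the columns j and pj of the current matrix.

```c
/* Hypothetical helper: replays the row/column interchanges that ?gebal
   records in scale (the entries outside ilo..ihi, which hold 1-based
   indices pj) and returns in perm[k] the 0-based index of the row of the
   original A that ends up in row k of the permuted matrix A', so that
   A'(k,l) = A(perm[k], perm[l]). */
void gebal_permutation(int n, int ilo, int ihi, const double *scale, int *perm)
{
    for (int k = 0; k < n; ++k)
        perm[k] = k;                           /* start from the identity */

    /* The interchanges are made for j = n down to ihi+1, then j = 1 to ilo-1. */
    for (int j = n; j > ihi; --j) {
        int p = (int)scale[j - 1] - 1;         /* 0-based partner of row j */
        int tmp = perm[j - 1];
        perm[j - 1] = perm[p];
        perm[p] = tmp;
    }
    for (int j = 1; j < ilo; ++j) {
        int p = (int)scale[j - 1] - 1;
        int tmp = perm[j - 1];
        perm[j - 1] = perm[p];
        perm[p] = tmp;
    }
}
```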

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The errors are negligible compared with those in subsequent computations.
If the matrix A is balanced by this routine, then any eigenvectors computed subsequently are eigenvectors of
the matrix A'' and hence you must call gebak to transform them back to eigenvectors of A.
If the Schur vectors of A are required, do not call this routine with job = 'S' or 'B', because then the
balancing transformation is not orthogonal (not unitary for complex flavors).
If you call this routine with job = 'P', then any Schur vectors computed subsequently are Schur vectors of
the matrix A'', and you need to call gebak (with side = 'R') to transform them back to Schur vectors of A.

The total number of floating-point operations is proportional to $n^2$.

?gebak
Transforms eigenvectors of a balanced matrix to those
of the original nonsymmetric matrix.

Syntax
lapack_int LAPACKE_sgebak( int matrix_layout, char job, char side, lapack_int n,
lapack_int ilo, lapack_int ihi, const float* scale, lapack_int m, float* v, lapack_int
ldv );
lapack_int LAPACKE_dgebak( int matrix_layout, char job, char side, lapack_int n,
lapack_int ilo, lapack_int ihi, const double* scale, lapack_int m, double* v,
lapack_int ldv );
lapack_int LAPACKE_cgebak( int matrix_layout, char job, char side, lapack_int n,
lapack_int ilo, lapack_int ihi, const float* scale, lapack_int m, lapack_complex_float*
v, lapack_int ldv );
lapack_int LAPACKE_zgebak( int matrix_layout, char job, char side, lapack_int n,
lapack_int ilo, lapack_int ihi, const double* scale, lapack_int m,
lapack_complex_double* v, lapack_int ldv );


Include Files
• mkl.h

Description

The routine is intended to be used after a matrix A has been balanced by a call to ?gebal, and eigenvectors
of the balanced matrix A''22 have subsequently been computed. For a description of balancing, see gebal. The
balanced matrix A'' is obtained as $A'' = DPAP^TD^{-1}$, where P is a permutation matrix and D is a
diagonal scaling matrix. This routine transforms the eigenvectors as follows:
if x is a right eigenvector of A'', then $P^TD^{-1}x$ is a right eigenvector of A; if y is a left eigenvector of A'',
then $P^TDy$ is a left eigenvector of A.
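For side = 'R' and job = 'B', the back-transformation can be sketched in plain C. The sketch follows the loop structure of reference LAPACK dgebak: rows ilo to ihi of each right eigenvector are multiplied by scale[i - 1] (these entries play the role of the diagonal of inv(D) in the formula above), and the recorded interchanges are then undone for i = ilo-1 down to 1 and for i = ihi+1 up to n. The function name gebak_right_colmajor is hypothetical, and the sketch handles only real, column-major data and right eigenvectors.

```c
#include <stddef.h>

/* Hypothetical helper: back-transforms right eigenvectors of a matrix
   balanced with job = 'B'. v is n-by-m, column major, leading dimension
   ldv; ilo and ihi are 1-based, as returned by ?gebal. */
void gebak_right_colmajor(int n, int ilo, int ihi, const double *scale,
                          int m, double *v, int ldv)
{
    /* Undo the scaling: multiply rows ilo..ihi by scale[i - 1]. */
    for (int i = ilo; i <= ihi; ++i)
        for (int j = 0; j < m; ++j)
            v[(i - 1) + (size_t)j * ldv] *= scale[i - 1];

    /* Undo the interchanges: i = ilo-1 down to 1, then i = ihi+1 up to n. */
    for (int ii = 1; ii <= n; ++ii) {
        int i = ii;
        if (i >= ilo && i <= ihi)
            continue;
        if (i < ilo)
            i = ilo - ii;
        int k = (int)scale[i - 1];     /* 1-based row interchanged with row i */
        if (k == i)
            continue;
        for (int j = 0; j < m; ++j) {
            double t = v[(i - 1) + (size_t)j * ldv];
            v[(i - 1) + (size_t)j * ldv] = v[(k - 1) + (size_t)j * ldv];
            v[(k - 1) + (size_t)j * ldv] = t;
        }
    }
}
```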

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

job Must be 'N' or 'P' or 'S' or 'B'. The same parameter job as supplied
to ?gebal.

side Must be 'L' or 'R'.

If side = 'L', then left eigenvectors are transformed.

If side = 'R', then right eigenvectors are transformed.

n The number of rows of the matrix of eigenvectors (n≥ 0).

ilo, ihi The values ilo and ihi, as returned by ?gebal. (If n > 0, then 1 ≤ ilo ≤ ihi ≤ n;
if n = 0, then ilo = 1 and ihi = 0.)

scale Array, size at least max(1, n).

Contains details of the permutations and/or the scaling factors used to
balance the original general matrix, as returned by ?gebal.

m The number of columns of the matrix of eigenvectors (m≥ 0).

v Array:
v (size max(1, ldv*m) for column major layout and max(1, ldv*n) for row
major layout) contains the n-by-m matrix of left or right eigenvectors to be
transformed.

ldv The leading dimension of v; at least max(1, n) for column major layout and
at least max(1, m) for row major layout.

Output Parameters

v Overwritten by the transformed eigenvectors.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The errors in this routine are negligible.
The number of floating-point operations is approximately proportional to m*n.

?hseqr
Computes all eigenvalues and (optionally) the Schur
factorization of a matrix reduced to Hessenberg form.

Syntax
lapack_int LAPACKE_shseqr( int matrix_layout, char job, char compz, lapack_int n,
lapack_int ilo, lapack_int ihi, float* h, lapack_int ldh, float* wr, float* wi, float*
z, lapack_int ldz );
lapack_int LAPACKE_dhseqr( int matrix_layout, char job, char compz, lapack_int n,
lapack_int ilo, lapack_int ihi, double* h, lapack_int ldh, double* wr, double* wi,
double* z, lapack_int ldz );
lapack_int LAPACKE_chseqr( int matrix_layout, char job, char compz, lapack_int n,
lapack_int ilo, lapack_int ihi, lapack_complex_float* h, lapack_int ldh,
lapack_complex_float* w, lapack_complex_float* z, lapack_int ldz );
lapack_int LAPACKE_zhseqr( int matrix_layout, char job, char compz, lapack_int n,
lapack_int ilo, lapack_int ihi, lapack_complex_double* h, lapack_int ldh,
lapack_complex_double* w, lapack_complex_double* z, lapack_int ldz );

Include Files
• mkl.h

Description
The routine computes all the eigenvalues, and optionally the Schur factorization, of an upper Hessenberg
matrix H: $H = ZTZ^H$, where T is an upper triangular (or, for real flavors, quasi-triangular) matrix (the
Schur form of H), and Z is the unitary or orthogonal matrix whose columns are the Schur vectors $z_i$.
You can also use this routine to compute the Schur factorization of a general matrix A which has been
reduced to upper Hessenberg form H:
$A = QHQ^H$, where Q is unitary (orthogonal for real flavors);
$A = (QZ)T(QZ)^H$.
In this case, after reducing A to Hessenberg form by gehrd, call orghr (unghr for complex flavors) to form Q
explicitly and then pass Q to ?hseqr with compz = 'V'.

You can also call gebal to balance the original matrix before reducing it to Hessenberg form by ?gehrd, so
that the Hessenberg matrix H will have the structure:

$$
H = \begin{pmatrix} H_{11} & H_{12} & H_{13} \\ 0 & H_{22} & H_{23} \\ 0 & 0 & H_{33} \end{pmatrix}
$$

where H11 and H33 are upper triangular.

If so, only the central diagonal block H22 (in rows and columns ilo to ihi) needs to be further reduced to
Schur form (the blocks H12 and H23 are also affected). Therefore the values of ilo and ihi can be supplied
to ?hseqr directly. Also, after calling this routine you must call gebak to permute the Schur vectors of the
balanced matrix to those of the original matrix.
If ?gebal has not been called, however, then ilo must be set to 1 and ihi to n. Note that if the Schur
factorization of A is required, ?gebal must not be called with job = 'S' or 'B', because the balancing
transformation is not unitary (for real flavors, it is not orthogonal).
?hseqr uses a multishift form of the upper Hessenberg QR algorithm. The Schur vectors are normalized so
that $\|z_i\|_2 = 1$, but are determined only to within a complex factor of absolute value 1 (for the real
flavors, to within a factor ±1).

Input Parameters

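matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

job Must be 'E' or 'S'.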

If job = 'E', then eigenvalues only are required.

If job = 'S', then the Schur form T is required.

compz Must be 'N' or 'I' or 'V'.

If compz = 'N', then no Schur vectors are computed (and the array z is not
referenced).
If compz = 'I', then the Schur vectors of H are computed (and the array z
is initialized by the routine).
If compz = 'V', then the Schur vectors of A are computed (and the array z
must contain the matrix Q on entry).

n The order of the matrix H (n≥ 0).

ilo, ihi If A has been balanced by ?gebal, then ilo and ihi must contain the values
returned by ?gebal. Otherwise, ilo must be set to 1 and ihi to n.

h, z Arrays:
h (size max(1, ldh*n)) contains the n-by-n upper Hessenberg matrix H.

z (size max(1, ldz*n))

If compz = 'V', then z must contain the matrix Q from the reduction to
Hessenberg form.
If compz = 'I', then z need not be set.

If compz = 'N', then z is not referenced.

ldh The leading dimension of h; at least max(1, n).

ldz The leading dimension of z;

If compz = 'N', then ldz≥ 1.

If compz = 'V' or 'I', then ldz≥ max(1, n).

Output Parameters

w Array, size at least max (1, n). Contains the computed eigenvalues, unless
info>0. The eigenvalues are stored in the same order as on the diagonal of
the Schur form T (if computed).

wr, wi Arrays, size at least max (1, n) each.

Contain the real and imaginary parts, respectively, of the computed
eigenvalues, unless info > 0. Complex conjugate pairs of eigenvalues
appear consecutively with the eigenvalue having positive imaginary part
first. The eigenvalues are stored in the same order as on the diagonal of the
Schur form T (if computed).

h If info = 0 and job = 'S', h contains the upper quasi-triangular matrix T (upper triangular for complex
flavors) from the Schur decomposition (the Schur form).
If info = 0 and job = 'E', the contents of h are unspecified on exit. (The
output value of h when info > 0 is given under the description of info
below.)

z If compz = 'V' and info = 0, then z contains Q*Z.

If compz = 'I' and info = 0, then z contains the unitary or orthogonal
matrix Z of the Schur vectors of H.
If compz = 'N', then z is not referenced.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i > 0, ?hseqr failed to compute all of the eigenvalues. Elements 1, 2, ..., ilo-1 and i+1, i+2, ..., n of
the eigenvalue arrays (wr and wi for real flavors and w for complex flavors) contain the real and imaginary
parts of those eigenvalues that have been successfully found.


If info > 0, and job = 'E', then on exit, the remaining unconverged eigenvalues are the eigenvalues of
the upper Hessenberg matrix rows and columns ilo through info of the final output value of H.
If info > 0, and job = 'S', then on exit (initial value of H)*U = U*(final value of H), where U is a unitary
matrix. The final value of H is upper Hessenberg and triangular in rows and columns info+1 through ihi.

If info > 0, and compz = 'V', then on exit (final value of Z) = (initial value of Z)*U, where U is the
unitary matrix (regardless of the value of job).
If info > 0, and compz = 'I', then on exit (final value of Z) = U, where U is the unitary matrix (regardless
of the value of job).
If info > 0, and compz = 'N', then Z is not accessed.
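When info > 0, only some entries of the eigenvalue arrays are meaningful. The hypothetical helper below translates the 1-based ranges given above (1 to ilo-1 and info+1 to n) into a test on a 0-based C array index; it treats a negative info (an argument error) as meaning that no eigenvalues are available.

```c
/* Hypothetical helper: given the info value returned by ?hseqr, reports
   whether position k (0-based) of wr/wi (or w) holds an eigenvalue that
   was successfully computed. */
int hseqr_eigenvalue_found(int info, int ilo, int n, int k)
{
    if (k < 0 || k >= n || info < 0)
        return 0;                 /* out of range, or argument error */
    if (info == 0)
        return 1;                 /* all eigenvalues were computed */
    int pos = k + 1;              /* 1-based position used in the text */
    return pos <= ilo - 1 || pos >= info + 1;
}
```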

Application Notes
The computed Schur factorization is the exact factorization of a nearby matrix H + E, where $\|E\|_2 = O(\varepsilon)\|H\|_2$,
and ε is the machine precision.
If λi is an exact eigenvalue, and μi is the corresponding computed value, then $|\lambda_i - \mu_i| \le c(n)\,\varepsilon\,\|H\|_2 / s_i$,
where c(n) is a modestly increasing function of n, and si is the reciprocal condition number of λi. The
condition numbers si may be computed by calling trsna.
The total number of floating-point operations depends on how rapidly the algorithm converges; typical
numbers are as follows.

If only eigenvalues are computed: $7n^3$ for real flavors, $25n^3$ for complex flavors.

If the Schur form is computed: $10n^3$ for real flavors, $35n^3$ for complex flavors.

If the full Schur factorization is computed: $20n^3$ for real flavors, $70n^3$ for complex flavors.

?hsein
Computes selected eigenvectors of an upper
Hessenberg matrix that correspond to specified
eigenvalues.

Syntax
lapack_int LAPACKE_shsein( int matrix_layout, char side, char eigsrc, char initv,
lapack_logical* select, lapack_int n, const float* h, lapack_int ldh, float* wr, const
float* wi, float* vl, lapack_int ldvl, float* vr, lapack_int ldvr, lapack_int mm,
lapack_int* m, lapack_int* ifaill, lapack_int* ifailr );
lapack_int LAPACKE_dhsein( int matrix_layout, char side, char eigsrc, char initv,
lapack_logical* select, lapack_int n, const double* h, lapack_int ldh, double* wr,
const double* wi, double* vl, lapack_int ldvl, double* vr, lapack_int ldvr, lapack_int
mm, lapack_int* m, lapack_int* ifaill, lapack_int* ifailr );
lapack_int LAPACKE_chsein( int matrix_layout, char side, char eigsrc, char initv, const
lapack_logical* select, lapack_int n, const lapack_complex_float* h, lapack_int ldh,
lapack_complex_float* w, lapack_complex_float* vl, lapack_int ldvl,
lapack_complex_float* vr, lapack_int ldvr, lapack_int mm, lapack_int* m, lapack_int*
ifaill, lapack_int* ifailr );

lapack_int LAPACKE_zhsein( int matrix_layout, char side, char eigsrc, char initv, const
lapack_logical* select, lapack_int n, const lapack_complex_double* h, lapack_int ldh,
lapack_complex_double* w, lapack_complex_double* vl, lapack_int ldvl,
lapack_complex_double* vr, lapack_int ldvr, lapack_int mm, lapack_int* m, lapack_int*
ifaill, lapack_int* ifailr );

Include Files
• mkl.h

Description

The routine computes left and/or right eigenvectors of an upper Hessenberg matrix H, corresponding to
selected eigenvalues.
The right eigenvector x and the left eigenvector y, corresponding to an eigenvalue λ, are defined by: $Hx = \lambda x$
and $y^HH = \lambda y^H$ (or $H^Hy = \lambda^* y$). Here λ* denotes the conjugate of λ.
The eigenvectors are computed by inverse iteration. They are scaled so that, for a real eigenvector x,
$\max_i |x_i| = 1$, and for a complex eigenvector, $\max_i (|\mathrm{Re}\,x_i| + |\mathrm{Im}\,x_i|) = 1$.
If H has been formed by reduction of a general matrix A to upper Hessenberg form, then eigenvectors of H
may be transformed to eigenvectors of A by ormhr or unmhr.
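The normalization of complex eigenvectors stated above can be sketched as follows. The function name normalize_eigvec_complex is hypothetical and uses C99 double complex in place of lapack_complex_double; ?hsein performs this scaling itself, so the sketch only illustrates what max(|Re xi| + |Im xi|) = 1 means. Dividing by a positive real number leaves the phase of every component unchanged.

```c
#include <complex.h>

static double abs_d(double x) { return x < 0.0 ? -x : x; }

/* Hypothetical helper: rescales x so that max_i (|Re x_i| + |Im x_i|) = 1.
   A zero vector is left unchanged. */
void normalize_eigvec_complex(int n, double complex *x)
{
    double big = 0.0;
    for (int i = 0; i < n; ++i) {
        double s = abs_d(creal(x[i])) + abs_d(cimag(x[i]));
        if (s > big)
            big = s;
    }
    if (big == 0.0)
        return;
    for (int i = 0; i < n; ++i)
        x[i] /= big;              /* positive real divisor keeps each phase */
}
```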

Input Parameters

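matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

side Must be 'R' or 'L' or 'B'.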

If side = 'R', then only right eigenvectors are computed.

If side = 'L', then only left eigenvectors are computed.

If side = 'B', then all eigenvectors are computed.

eigsrc Must be 'Q' or 'N'.

If eigsrc = 'Q', then the eigenvalues of H were found using hseqr; thus if
H has any zero sub-diagonal elements (and so is block triangular), then the
j-th eigenvalue can be assumed to be an eigenvalue of the block containing
the j-th row/column. This property allows the routine to perform inverse
iteration on just one diagonal block. If eigsrc = 'N', then no such
assumption is made and the routine performs inverse iteration using the
whole matrix.

initv Must be 'N' or 'U'.

If initv = 'N', then no initial estimates for the selected eigenvectors are
supplied.
If initv = 'U', then initial estimates for the selected eigenvectors are
supplied in vl and/or vr.

select Array, size at least max (1, n). Specifies which eigenvectors are to be
computed.
For real flavors:
To obtain the real eigenvector corresponding to the real eigenvalue wr[j],
set select[j] to 1.
To select the complex eigenvector corresponding to the complex eigenvalue
(wr[j - 1], wi[j - 1]) with complex conjugate (wr[j], wi[j]), set select[j - 1]
and/or select[j] to 1; the eigenvector corresponding to the first eigenvalue
in the pair is computed.
For complex flavors:
To select the eigenvector corresponding to the eigenvalue w[j], set select[j]
to 1.

n The order of the matrix H (n≥ 0).

h, vl, vr Arrays:
h (size max(1, ldh*n)) contains the n-by-n upper Hessenberg matrix H. If a NaN
value is detected in h, the routine returns with info = -6.

vl(size max(1, ldvl*mm) for column major layout and max(1, ldvl*n) for
row major layout)
If initv = 'U' and side = 'L' or 'B', then vl must contain starting
vectors for inverse iteration for the left eigenvectors. Each starting vector
must be stored in the same column or columns as will be used to store the
corresponding eigenvector.
If initv = 'N', then vl need not be set.

The array vl is not referenced if side = 'R'.

vr(size max(1, ldvr*mm) for column major layout and max(1, ldvr*n) for
row major layout)
If initv = 'U' and side = 'R' or 'B', then vr must contain starting
vectors for inverse iteration for the right eigenvectors. Each starting vector
must be stored in the same column or columns as will be used to store the
corresponding eigenvector.
If initv = 'N', then vr need not be set.

The array vr is not referenced if side = 'L'.

ldh The leading dimension of h; at least max(1, n).

w Array, size at least max (1, n).

Contains the eigenvalues of the matrix H.
If eigsrc = 'Q', the array must be exactly as returned by ?hseqr.

wr, wi Arrays, size at least max (1, n) each.

Contain the real and imaginary parts, respectively, of the eigenvalues of the
matrix H. Complex conjugate pairs of values must be stored in consecutive
elements of the arrays. If eigsrc = 'Q', the arrays must be exactly as
returned by ?hseqr.

ldvl The leading dimension of vl.

If side = 'L' or 'B', ldvl ≥ max(1, n) for column major layout and ldvl ≥
max(1, mm) for row major layout.

If side = 'R', ldvl≥ 1.

ldvr The leading dimension of vr.

If side = 'R' or 'B', ldvr ≥ max(1, n) for column major layout and ldvr ≥
max(1, mm) for row major layout.

If side = 'L', ldvr ≥ 1.

mm The number of columns in vl and/or vr.

Must be at least m, the actual number of columns required (see Output
Parameters below).
For real flavors, m is obtained by counting 1 for each selected real
eigenvector and 2 for each selected complex eigenvector (see select).
For complex flavors, m is the number of selected eigenvectors (see select).
Constraint:
0 ≤ mm ≤ n.
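Before calling a real flavor of ?hsein, mm can be set to the exact number of columns required. The hypothetical helper below counts 1 for each selected real eigenvalue and 2 for each selected complex conjugate pair, following the select rules above. It assumes the pair layout produced by ?hseqr (a nonzero wi[j] marks the first member of a pair, with its conjugate at j + 1) and uses int for the select flags, standing in for lapack_logical.

```c
/* Hypothetical helper (real flavors): number of columns of vl/vr needed
   for the eigenvectors chosen by select. Selecting either member of a
   complex conjugate pair selects the pair, which needs two columns. */
int hsein_columns_needed(int n, const int *select, const double *wi)
{
    int m = 0;
    for (int j = 0; j < n; ++j) {
        if (wi[j] == 0.0) {                    /* real eigenvalue */
            if (select[j])
                m += 1;
        } else {                               /* first member of a pair */
            if (select[j] || (j + 1 < n && select[j + 1]))
                m += 2;
            ++j;                               /* skip the conjugate partner */
        }
    }
    return m;
}
```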

Output Parameters

select Overwritten for real flavors only.

If a complex eigenvector was selected as specified above, then select[j - 1]
is set to 1 and select[j] to 0.

w The real parts of some elements of w may be modified, as close eigenvalues
are perturbed slightly in searching for independent eigenvectors.

wr Some elements of wr may be modified, as close eigenvalues are perturbed
slightly in searching for independent eigenvectors.

vl, vr If side = 'L' or 'B', vl contains the computed left eigenvectors (as
specified by select).
If side = 'R' or 'B', vr contains the computed right eigenvectors (as
specified by select).
The eigenvectors treated column-wise form a rectangular n-by-mm matrix.

For real flavors: a real eigenvector corresponding to a real eigenvalue
occupies one column of the matrix; a complex eigenvector corresponding to
a complex eigenvalue occupies two columns: the first column holds the real
part of the eigenvector and the second column holds the imaginary part of
the eigenvector. The matrix is stored in a one-dimensional array as
described by matrix_layout (using either column major or row major
layout).

m For real flavors: the number of columns of vl and/or vr required to store the
selected eigenvectors.
For complex flavors: the number of selected eigenvectors.

ifaill, ifailr Arrays, size at least max(1, mm) each.

ifaill[i - 1] = 0 if the i-th column of vl converged;
ifaill[i - 1] = j > 0 if the eigenvector stored in the i-th column of vl
(corresponding to the j-th eigenvalue) failed to converge.
ifailr[i - 1] = 0 if the i-th column of vr converged;
ifailr[i - 1] = j > 0 if the eigenvector stored in the i-th column of vr
(corresponding to the j-th eigenvalue) failed to converge.
For real flavors: if the i-th and (i+1)-th columns of vl contain a selected
complex eigenvector, then ifaill[i - 1] and ifaill[i] are set to the same value.
A similar rule holds for vr and ifailr.
The array ifaill is not referenced if side = 'R'. The array ifailr is not
referenced if side = 'L'.
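For real flavors, a complex eigenvector occupies two consecutive columns of vl or vr. The hypothetical helper below rebuilds it as a C99 double complex vector from column-major storage; because H is real, the eigenvector belonging to the conjugate eigenvalue is the elementwise conjugate, which the conj_partner flag selects. The same two-column convention applies to ?trevc output for real flavors. Row-major arrays would need the index i*ldv + k instead.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical helper (real flavors, column major): column k (0-based) of v
   holds the real part and column k+1 the imaginary part of a complex
   eigenvector of length n. If conj_partner is nonzero, the conjugate vector
   (the eigenvector of the conjugate eigenvalue) is returned instead. */
void unpack_complex_eigvec(int n, const double *v, int ldv, int k,
                           int conj_partner, double complex *x)
{
    const double *re = v + (size_t)k * ldv;
    const double *im = v + (size_t)(k + 1) * ldv;
    for (int i = 0; i < n; ++i)
        x[i] = conj_partner ? re[i] - im[i] * I : re[i] + im[i] * I;
}
```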

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i > 0, then i eigenvectors (as indicated by the parameters ifaill and/or ifailr above) failed to converge.
The corresponding columns of vl and/or vr contain no useful information.

Application Notes
Each computed right eigenvector $x_i$ is the exact eigenvector of a nearby matrix $H + E_i$, such that
$\|E_i\| = O(\varepsilon)\|H\|$. Hence the residual is small:
$\|Hx_i - \lambda_i x_i\| = O(\varepsilon)\|H\|$.
However, eigenvectors corresponding to close or coincident eigenvalues may not accurately span the relevant
subspaces.
Similar remarks apply to computed left eigenvectors.

?trevc
Computes selected eigenvectors of an upper (quasi-)
triangular matrix computed by ?hseqr.

Syntax
lapack_int LAPACKE_strevc( int matrix_layout, char side, char howmny, lapack_logical*
select, lapack_int n, const float* t, lapack_int ldt, float* vl, lapack_int ldvl, float*
vr, lapack_int ldvr, lapack_int mm, lapack_int* m );
lapack_int LAPACKE_dtrevc( int matrix_layout, char side, char howmny, lapack_logical*
select, lapack_int n, const double* t, lapack_int ldt, double* vl, lapack_int ldvl,
double* vr, lapack_int ldvr, lapack_int mm, lapack_int* m );
lapack_int LAPACKE_ctrevc( int matrix_layout, char side, char howmny, const
lapack_logical* select, lapack_int n, lapack_complex_float* t, lapack_int ldt,
lapack_complex_float* vl, lapack_int ldvl, lapack_complex_float* vr, lapack_int ldvr,
lapack_int mm, lapack_int* m );
lapack_int LAPACKE_ztrevc( int matrix_layout, char side, char howmny, const
lapack_logical* select, lapack_int n, lapack_complex_double* t, lapack_int ldt,
lapack_complex_double* vl, lapack_int ldvl, lapack_complex_double* vr, lapack_int ldvr,
lapack_int mm, lapack_int* m );

Include Files
• mkl.h

Description

The routine computes some or all of the right and/or left eigenvectors of an upper triangular matrix T (or, for
real flavors, an upper quasi-triangular matrix T). Matrices of this type are produced by the Schur
factorization of a general matrix: $A = QTQ^H$, as computed by hseqr.

The right eigenvector x and the left eigenvector y of T corresponding to an eigenvalue w are defined by
$Tx = wx$ and $y^H T = w y^H$, where $y^H$ denotes the conjugate transpose of y.
The eigenvalues are not input to this routine, but are read directly from the diagonal blocks of T.
This routine returns the matrices X and/or Y of right and left eigenvectors of T, or the products Q*X and/or
Q*Y, where Q is an input matrix.
If Q is the orthogonal/unitary factor that reduces a matrix A to Schur form T, then Q*X and Q*Y are the
matrices of right and left eigenvectors of A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

side Must be 'R' or 'L' or 'B'.

If side = 'R', then only right eigenvectors are computed.

If side = 'L', then only left eigenvectors are computed.

If side = 'B', then all eigenvectors are computed.

howmny Must be 'A' or 'B' or 'S'.

If howmny = 'A', then all eigenvectors (as specified by side) are
computed.
If howmny = 'B', then all eigenvectors (as specified by side) are computed
and back-transformed by the matrices supplied in vl and vr.
If howmny = 'S', then selected eigenvectors (as specified by side and
select) are computed.

select Array, size at least max (1, n).

If howmny = 'S', select specifies which eigenvectors are to be computed.

If howmny = 'A' or 'B', select is not referenced.

For real flavors:

If omega[j] is a real eigenvalue, the corresponding real eigenvector is
computed if select[j] is 1.

If omega[j - 1] and omega[j] are the real and imaginary parts of a complex
eigenvalue, the corresponding complex eigenvector is computed if either
select[j - 1] or select[j] is 1, and on exit select[j - 1] is set to 1 and select[j]
is set to 0.

For complex flavors:

The eigenvector corresponding to the j-th eigenvalue is computed if select[j - 1] is 1.

n The order of the matrix T (n≥ 0).

t, vl, vr Arrays:
t (size max(1, ldt*n)) contains the n-by-n matrix T in Schur canonical
form. For complex flavors ctrevc and ztrevc, contains the upper
triangular matrix T.
vl (size max(1, ldvl*mm) for column major layout and max(1, ldvl*n) for
row major layout)
If howmny = 'B' and side = 'L' or 'B', then vl must contain an n-by-n
matrix Q (usually the matrix of Schur vectors returned by ?hseqr).

If howmny = 'A' or 'S', then vl need not be set.

The array vl is not referenced if side = 'R'.

vr (size max(1, ldvr*mm) for column major layout and max(1, ldvr*n) for
row major layout)
If howmny = 'B' and side = 'R' or 'B', then vr must contain an n-by-n
matrix Q (usually the matrix of Schur vectors returned by ?hseqr).

If howmny = 'A' or 'S', then vr need not be set.

The array vr is not referenced if side = 'L'.

ldt The leading dimension of t; at least max(1, n).

ldvl The leading dimension of vl.

If side = 'L' or 'B', ldvl≥n.

If side = 'R', ldvl≥ 1.

ldvr The leading dimension of vr.

If side = 'R' or 'B', ldvr≥n.

If side = 'L', ldvr≥ 1.

mm The number of columns in the arrays vl and/or vr. Must be at least m (the
precise number of columns required).
If howmny = 'A' or 'B', mm = n.

If howmny = 'S': for real flavors, mm is obtained by counting 1 for each
selected real eigenvector and 2 for each selected complex eigenvector;
for complex flavors, mm is the number of selected eigenvectors (see
select).
Constraint: 0 ≤ mm ≤ n.
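The following C sketch shows the counting rule for the real flavors with howmny = 'S', together with the normalization of select described above. It rests on these assumptions:

- The function name is hypothetical.
- int stands in for lapack_logical.
- A 2-by-2 diagonal block is detected from a nonzero subdiagonal entry of the row-major Schur form T.

```c
/* For real flavors with howmny = 'S': normalize select[] and return the
   number of columns m needed (1 per selected real eigenvalue, 2 per
   selected complex conjugate pair). t is the n-by-n real Schur form,
   stored row-major. */
int count_selected_columns(int n, const double *t, int *select)
{
    int m = 0;
    for (int j = 0; j < n; ) {
        int pair = (j + 1 < n) && (t[(j + 1) * n + j] != 0.0);
        if (pair) {
            if (select[j] || select[j + 1]) {
                select[j] = 1;               /* first row of the block keeps the flag */
                select[j + 1] = 0;
                m += 2;                      /* real part + imaginary part */
            }
            j += 2;
        } else {
            if (select[j])
                m += 1;
            j += 1;
        }
    }
    return m;
}
```

The value returned is the smallest mm that can be passed for that selection.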

Output Parameters

select If a complex eigenvector of a real matrix was selected as specified above,
then select[j - 1] is set to 1 and select[j] to 0.

t For complex flavors, ctrevc/ztrevc modify the t array, which is restored on exit.

vl, vr If side = 'L' or 'B', vl contains the computed left eigenvectors (as
specified by howmny and select).

If side = 'R' or 'B', vr contains the computed right eigenvectors (as
specified by howmny and select).
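Treated column-wise, the eigenvectors form a rectangular n-by-mm matrix.

For real flavors: a real eigenvector corresponding to a real eigenvalue
occupies one column of the matrix; a complex eigenvector corresponding to a
complex eigenvalue occupies two columns, the first holding the real part of
the eigenvector and the second holding the imaginary part.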

m For complex flavors: the number of selected eigenvectors.

If howmny = 'A' or 'B', m is set to n.

For real flavors: the number of columns of vl and/or vr actually used to
store the selected eigenvectors.
If howmny = 'A' or 'B', m is set to n.
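For the real flavors, a complex eigenvector occupies two adjacent columns of vl or vr: the real part comes first, then the imaginary part. The following C99 sketch uses hypothetical names. It rebuilds such a vector from columns col and col + 1 for either storage layout:

- In column major layout, element (i, j) is at v[i + j*ld].
- In row major layout, element (i, j) is at v[i*ld + j].

Because T is real, the eigenvector for the conjugate eigenvalue is the complex conjugate of the rebuilt vector.

```c
#include <complex.h>

/* Element (i, j) of a matrix stored with leading dimension ld. */
static double get_elem(int row_major, const double *v, int ld, int i, int j)
{
    return row_major ? v[i * ld + j] : v[i + j * ld];
}

/* Rebuild the complex eigenvector stored as real part in column col and
   imaginary part in column col + 1 of an n-row array v (vl or vr). */
void unpack_complex_eigvec(int row_major, int n, const double *v, int ld,
                           int col, double complex *z)
{
    for (int i = 0; i < n; ++i)
        z[i] = get_elem(row_major, v, ld, i, col)
             + get_elem(row_major, v, ld, i, col + 1) * I;
}
```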

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
If $x_i$ is an exact right eigenvector and $y_i$ is the corresponding computed eigenvector, then the angle $\theta(y_i, x_i)$ between them is bounded as follows:
$$\theta(y_i, x_i) \le \frac{c(n)\,\varepsilon\,\|T\|_2}{\mathrm{sep}_i},$$
where $\mathrm{sep}_i$ is the reciprocal condition number of $x_i$. The condition number $\mathrm{sep}_i$ may be computed by calling ?trsna.

?trevc3
Computes selected eigenvectors of an upper (quasi-)
triangular matrix computed by ?hseqr using Level 3
BLAS

Syntax
call strevc3(side, howmny, select, n, t, ldt, vl, ldvl, vr, ldvr, mm, m, work, lwork,
info)
call dtrevc3(side, howmny, select, n, t, ldt, vl, ldvl, vr, ldvr, mm, m, work, lwork,
info)
call ctrevc3(side, howmny, select, n, t, ldt, vl, ldvl, vr, ldvr, mm, m, work, lwork,
rwork, lrwork, info)
call ztrevc3(side, howmny, select, n, t, ldt, vl, ldvl, vr, ldvr, mm, m, work, lwork,
rwork, lrwork, info)

Include Files
• mkl.fi

Description
This routine computes some or all of the right and left eigenvectors of an upper triangular matrix T (or, for
real flavors, an upper quasi-triangular matrix T) using Level 3 BLAS. Matrices of this type are produced by
the Schur factorization of a general matrix: $A = QTQ^H$, as computed by hseqr.

The right eigenvector x and the left eigenvector y of T corresponding to an eigenvalue w are defined by the
following:

$Tx = wx$, $y^H T = w y^H$

where $y^H$ denotes the conjugate transpose of y.
The eigenvalues are not passed to this routine but are read directly from the diagonal blocks of T.
This routine returns one or both of the matrices X and Y of the right and left eigenvectors of T, or one or both
of the products Q*X and Q*Y, where Q is an input matrix.

If Q is the orthogonal/unitary factor that reduces a matrix A to Schur form T, then Q*X and Q*Y are the
matrices of the right and left eigenvectors of A.

Input Parameters

side CHARACTER*1
Must be 'R', 'L', or 'B'.

• If side = 'R', only right eigenvectors are computed.

• If side = 'L', only left eigenvectors are computed.
• If side = 'B', all eigenvectors are computed.

howmny CHARACTER*1
Must be 'A', 'B', or 'S'.

• If howmny = 'A', all eigenvectors (as specified by side) are
computed.
• If howmny = 'B', all eigenvectors (as specified by side) are
computed and back-transformed by the matrices supplied in vl
and vr.
• If howmny = 'S', selected eigenvectors (as specified by side and
select) are computed.

select Array with a size of at least max (1, n)

If howmny = 'S', select specifies which eigenvectors are to be
computed. If howmny = 'A' or howmny = 'B', select is not
referenced.
For real flavors:

• If omega(j) is a real eigenvalue and select(j) is .TRUE., the
corresponding real eigenvector is computed.
• If omega(j) and omega(j + 1) are the real and imaginary parts
of a complex eigenvalue and either select(j) or select(j + 1)
is .TRUE., the corresponding complex eigenvector is computed,
and on exit select(j) is set to .TRUE. and select(j + 1) is set
to .FALSE..

For complex flavors:

• If select(j) is .TRUE., the eigenvector corresponding to the j-th
eigenvalue is computed.

n INTEGER
The order of the matrix T (n≥ 0).

t, vl, vr, work • REAL for strevc3

• DOUBLE PRECISION for dtrevc3
• COMPLEX for ctrevc3
• DOUBLE COMPLEX for ztrevc3
Arrays:

• t(ldt,*) contains the n-by-n matrix T in Schur canonical form.

For complex flavors ctrevc3 and ztrevc3, the array contains the
upper triangular matrix T.
The second dimension of t must be at least max(1, n).
• vl(ldvl,*)
If howmny = 'B' and side = 'L' or 'B', then vl must contain an
n-by-n matrix Q (usually the matrix of Schur vectors returned
by ?hseqr).

If howmny = 'A' or 'S', vl need not be set.

The second dimension of vl must be at least max(1, mm) if side
= 'L' or 'B', and at least 1 if side = 'R'.

The array vl is not referenced if side = 'R'.

• vr(ldvr,*)
If howmny = 'B' and side = 'R' or 'B', vr must contain an n-by-
n matrix Q (usually the matrix of Schur vectors returned
by ?hseqr).

If howmny = 'A' or 'S', vr need not be set.

The second dimension of vr must be at least max(1, mm) if side
= 'R' or 'B', and at least 1 if side = 'L'.

The array vr is not referenced if side = 'L'.

• work(*) is a workspace array, and its dimension is max (1,
lwork).

lwork INTEGER
The size of the work array. Must be at least max(1, 3*n) for real
flavors, and at least max(1, 2*n) for complex flavors.

If lwork = -1, a workspace query is assumed; the routine calculates
only the optimal size of the work array and returns this value as the
first entry of the work array, and no error message related to lwork is
issued by xerbla. For details, see "Application Notes" below.

ldt INTEGER
The leading dimension of t. It is at least max(1, n).

ldvl INTEGER

The leading dimension of vl.

• If side = 'L' or 'B', ldvl≥n.

• If side = 'R', ldvl≥ 1.

ldvr INTEGER
The leading dimension of vr.

• If side = 'R' or 'B', ldvr≥n.

• If side = 'L', ldvr≥ 1.

mm INTEGER
The number of columns in one or both of the arrays vl and vr. Must
be at least m (the precise number of columns required).

• If howmny = 'A' or 'B', mm = n.

• If howmny = 'S': for real flavors, mm is obtained by counting 1 for
each selected real eigenvector and 2 for each selected complex
eigenvector; for complex flavors, mm is the number of selected
eigenvectors (see select).

Constraint: 0 ≤mm≤n.

rwork • REAL for ctrevc3

• DOUBLE PRECISION for ztrevc3
The workspace array is used in complex flavors only. Its dimension is
max (1, lrwork).

lrwork INTEGER
The size of the rwork array. It must be at least max(1, n).

If lrwork = -1, a workspace query is assumed; the routine calculates
only the optimal size of the rwork array and returns this value as the
first entry of the rwork array, and no error message related to
lrwork is issued by xerbla. For details, see "Application Notes" below.
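The lwork = -1 and lrwork = -1 convention supports a two-phase calling pattern. First query the optimal size, then allocate the workspace and call again. The C sketch below shows that pattern with a hypothetical stand-in routine; toy_routine is not part of MKL. Its minimum size, max(1, 3*n), mirrors the real-flavor lwork requirement above. It treats the minimum as the optimal size, which real routines need not do.

```c
#include <stdlib.h>

/* Hypothetical routine following the LAPACK workspace-query convention.
   It is NOT an MKL function; it only models the calling protocol. */
void toy_routine(int n, double *work, int lwork, int *info)
{
    int minwork = (n > 0) ? 3 * n : 1;   /* mirrors max(1, 3*n) */
    if (lwork == -1) {                   /* workspace query: report size only */
        work[0] = (double)minwork;
        *info = 0;
        return;
    }
    if (lwork < minwork) {               /* too small: flag argument 3 */
        *info = -3;
        return;
    }
    for (int i = 0; i < lwork; ++i)      /* stand-in for the real computation */
        work[i] = 0.0;
    work[0] = (double)minwork;           /* report the size again on exit */
    *info = 0;
}

/* Two-phase call: query, allocate, then compute. Returns info. */
int query_then_call(int n)
{
    double query;
    int info;
    toy_routine(n, &query, -1, &info);   /* phase 1: size query */
    if (info != 0)
        return info;
    int lwork = (int)query;
    double *work = malloc((size_t)lwork * sizeof *work);
    if (work == NULL)
        return -100;                     /* hypothetical allocation-failure code */
    toy_routine(n, work, lwork, &info);  /* phase 2: actual call */
    free(work);
    return info;
}
```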

Output Parameters

select If a complex eigenvector of a real matrix was selected as specified
above, then select(j) is set to .TRUE. and select(j + 1) is set
to .FALSE..

t COMPLEX for ctrevc3

DOUBLE COMPLEX for ztrevc3
ctrevc3 or ztrevc3 modifies the t(ldt,*) array, which is restored
on exit.

vl, vr If side = 'L' or 'B', vl contains the computed left eigenvectors (as
specified by howmny and select).

If side = 'R' or 'B', vr contains the computed right eigenvectors
(as specified by howmny and select).

Treated column-wise, the eigenvectors form a rectangular n-by-mm
matrix.

For real flavors: a real eigenvector corresponding to a real
eigenvalue occupies one column of the matrix; a complex
eigenvector corresponding to a complex eigenvalue
occupies two columns. The first column holds the real part
of the eigenvector, and the second column holds the
imaginary part of the eigenvector. The matrix is stored in a
one-dimensional array as described by matrix_layout
(using either column major or row major layout).

m INTEGER

For complex flavors: the number of selected
eigenvectors. If howmny = 'A' or 'B', m is set to n.

For real flavors: the number of columns of one or both of
vl and vr actually used to store the selected eigenvectors.
If howmny = 'A' or 'B', m is set to n.

work(1) On exit, if info = 0, work(1) returns the required optimal size of
lwork.

rwork(1) On exit, if info = 0, then rwork(1) returns the required optimal size
of lrwork.

info INTEGER
If info = 0, the execution is successful.

If info = -i, the i-th parameter contained an illegal value.

Application Notes
If $x_i$ is an exact right eigenvector and $y_i$ is the corresponding computed eigenvector, the angle $\theta(y_i, x_i)$
between them is bounded as follows:

$$\theta(y_i, x_i) \le \frac{c(n)\,\varepsilon\,\|T\|_2}{\mathrm{sep}_i}$$
where $\mathrm{sep}_i$ is the reciprocal condition number of $x_i$. You can compute the condition number $\mathrm{sep}_i$ by
calling ?trsna.

See Also
Matrix Storage Schemes

?trsna
Estimates condition numbers for specified eigenvalues
and right eigenvectors of an upper (quasi-) triangular
matrix.

Syntax
lapack_int LAPACKE_strsna( int matrix_layout, char job, char howmny, const
lapack_logical* select, lapack_int n, const float* t, lapack_int ldt, const float* vl,
lapack_int ldvl, const float* vr, lapack_int ldvr, float* s, float* sep, lapack_int mm,
lapack_int* m );
lapack_int LAPACKE_dtrsna( int matrix_layout, char job, char howmny, const
lapack_logical* select, lapack_int n, const double* t, lapack_int ldt, const double*
vl, lapack_int ldvl, const double* vr, lapack_int ldvr, double* s, double* sep,
lapack_int mm, lapack_int* m );
lapack_int LAPACKE_ctrsna( int matrix_layout, char job, char howmny, const
lapack_logical* select, lapack_int n, const lapack_complex_float* t, lapack_int ldt,
const lapack_complex_float* vl, lapack_int ldvl, const lapack_complex_float* vr,
lapack_int ldvr, float* s, float* sep, lapack_int mm, lapack_int* m );
lapack_int LAPACKE_ztrsna( int matrix_layout, char job, char howmny, const
lapack_logical* select, lapack_int n, const lapack_complex_double* t, lapack_int ldt,
const lapack_complex_double* vl, lapack_int ldvl, const lapack_complex_double* vr,
lapack_int ldvr, double* s, double* sep, lapack_int mm, lapack_int* m );

Include Files
• mkl.h

Description

The routine estimates condition numbers for specified eigenvalues and/or right eigenvectors of an upper
triangular matrix T (or, for real flavors, an upper quasi-triangular matrix T in canonical Schur form). These are
the same as the condition numbers of the eigenvalues and right eigenvectors of an original matrix $A = ZTZ^H$
(with unitary or, for real flavors, orthogonal Z), from which T may have been derived.
The routine computes the reciprocal of the condition number of an eigenvalue $\lambda_i$ as
$$s_i = \frac{|v^T u|}{\|u\|_E \, \|v\|_E}$$
for real flavors and
$$s_i = \frac{|v^H u|}{\|u\|_E \, \|v\|_E}$$
for complex flavors, where:
• u and v are the right and left eigenvectors of T, respectively, corresponding to $\lambda_i$.
• $v^T$/$v^H$ denote the transpose/conjugate transpose of v, respectively.
• $\|\cdot\|_E$ denotes the Euclidean norm.
This reciprocal condition number always lies between zero (ill-conditioned) and one (well-conditioned).
An approximate error estimate for a computed eigenvalue $\lambda_i$ is then given by $\varepsilon\|T\|/s_i$, where ε is the
machine precision.
To estimate the reciprocal of the condition number of the right eigenvector corresponding to $\lambda_i$, the routine
first calls trexc to reorder the diagonal elements of matrix T so that $\lambda_i$ is in the leading position. This brings T, by a unitary (or, for real flavors, orthogonal) similarity transformation, to the form
$$\begin{pmatrix} \lambda_i & c^H \\ 0 & T_{22} \end{pmatrix}.$$
The reciprocal condition number of the eigenvector is then estimated as $\mathrm{sep}_i$, the smallest singular value of
the matrix $T_{22} - \lambda_i I$.
An approximate error estimate for a computed right eigenvector u corresponding to $\lambda_i$ is then given by $\varepsilon\|T\|/\mathrm{sep}_i$.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

job Must be 'E' or 'V' or 'B'.

If job = 'E', then condition numbers for eigenvalues only are computed.

If job = 'V', then condition numbers for eigenvectors only are computed.

If job = 'B', then condition numbers for both eigenvalues and
eigenvectors are computed.

howmny Must be 'A' or 'S'.

If howmny = 'A', then the condition numbers for all eigenpairs are
computed.
If howmny = 'S', then condition numbers for selected eigenpairs (as
specified by select) are computed.

select Array, size at least max (1, n) if howmny = 'S' and at least 1 otherwise.

Specifies the eigenpairs for which condition numbers are to be computed if
howmny = 'S'.

For real flavors:

To select condition numbers for the eigenpair corresponding to the real
eigenvalue $\lambda_j$, select[j - 1] must be set to 1;

to select condition numbers for the eigenpair corresponding to a complex
conjugate pair of eigenvalues $\lambda_j$ and $\lambda_{j+1}$, select[j - 1] and/or select[j]
must be set to 1.

For complex flavors:

To select condition numbers for the eigenpair corresponding to the
eigenvalue $\lambda_j$, select[j - 1] must be set to 1.

select is not referenced if howmny = 'A'.

n The order of the matrix T (n≥ 0).

t, vl, vr Arrays:
t (size max(1, ldt*n)) contains the n-by-n matrix T.

vl (size max(1, ldvl*mm) for column major layout and max(1, ldvl*n) for
row major layout)
If job = 'E' or 'B', then vl must contain the left eigenvectors of T (or of
any matrix Q*T*QH with Q unitary or orthogonal) corresponding to the
eigenpairs specified by howmny and select. The eigenvectors must be
stored in consecutive columns of vl, as returned by trevc or hsein.
The array vl is not referenced if job = 'V'.

vr (size max(1, ldvr*mm) for column major layout and max(1, ldvr*n) for
row major layout)
If job = 'E' or 'B', then vr must contain the right eigenvectors of T (or of
any matrix Q*T*QH with Q unitary or orthogonal) corresponding to the
eigenpairs specified by howmny and select. The eigenvectors must be
stored in consecutive columns of vr, as returned by trevc or hsein.
The array vr is not referenced if job = 'V'.

ldt The leading dimension of t; at least max(1, n).

ldvl The leading dimension of vl.

If job = 'E' or 'B', ldvl ≥ max(1,n) for column major layout and ldvl ≥
max(1, mm) for row major layout.

If job = 'V', ldvl ≥ 1.

ldvr The leading dimension of vr.

If job = 'E' or 'B', ldvr ≥ max(1,n) for column major layout and ldvr ≥
max(1, mm) for row major layout.

If job = 'V', ldvr ≥ 1.

mm The number of elements in the arrays s and sep, and the number of
columns in vl and vr (if used). Must be at least m (the precise number
required).
If howmny = 'A', mm = n;

if howmny = 'S', for real flavors mm is obtained by counting 1 for each
selected real eigenvalue and 2 for each selected complex conjugate pair of
eigenvalues;
for complex flavors mm is the number of selected eigenpairs (see select).
Constraint:
0 ≤ mm ≤ n.

Output Parameters

s Array, size at least max(1, mm) if job = 'E' or 'B' and at least 1 if job =
'V'.

Contains the reciprocal condition numbers of the selected eigenvalues if job
= 'E' or 'B', stored in consecutive elements of the array. Thus s[j - 1],
sep[j - 1] and the j-th columns of vl and vr all correspond to the same
eigenpair (but not in general the j-th eigenpair unless all eigenpairs have
been selected).
For real flavors: for a complex conjugate pair of eigenvalues, two
consecutive elements of s are set to the same value. The array s is not
referenced if job = 'V'.

sep Array, size at least max(1, mm) if job = 'V' or 'B' and at least 1 if job =
'E'. Contains the estimated reciprocal condition numbers of the selected
right eigenvectors if job = 'V' or 'B', stored in consecutive elements of
the array.
For real flavors: for a complex eigenvector, two consecutive elements of sep
are set to the same value; if the eigenvalues cannot be reordered to
compute sep[j - 1], then sep[j - 1] is set to zero; this can only occur when
the true value would be very small anyway. The array sep is not referenced
if job = 'E'.

m For complex flavors: the number of selected eigenpairs.

If howmny = 'A', m is set to n.

For real flavors: the number of elements of s and/or sep actually used to
store the estimated condition numbers.
If howmny = 'A', m is set to n.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed values $\mathrm{sep}_i$ may overestimate the true value, but seldom by a factor of more than 3.

?trexc
Reorders the Schur factorization of a general matrix.

Syntax
lapack_int LAPACKE_strexc( int matrix_layout, char compq, lapack_int n, float* t,
lapack_int ldt, float* q, lapack_int ldq, lapack_int* ifst, lapack_int* ilst );
lapack_int LAPACKE_dtrexc( int matrix_layout, char compq, lapack_int n, double* t,
lapack_int ldt, double* q, lapack_int ldq, lapack_int* ifst, lapack_int* ilst );
lapack_int LAPACKE_ctrexc( int matrix_layout, char compq, lapack_int n,
lapack_complex_float* t, lapack_int ldt, lapack_complex_float* q, lapack_int ldq,
lapack_int ifst, lapack_int ilst );
lapack_int LAPACKE_ztrexc( int matrix_layout, char compq, lapack_int n,
lapack_complex_double* t, lapack_int ldt, lapack_complex_double* q, lapack_int ldq,
lapack_int ifst, lapack_int ilst );

Include Files
• mkl.h

Description
The routine reorders the Schur factorization of a general matrix $A = QTQ^H$, so that the diagonal element or
block of T with row index ifst is moved to row ilst.
The reordered Schur form S is computed by a unitary (or, for real flavors, orthogonal) similarity
transformation: $S = Z^H T Z$. Optionally the updated matrix P of Schur vectors is computed as $P = QZ$,
giving $A = PSP^H$.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

compq Must be 'V' or 'N'.

If compq = 'V', then the Schur vectors (Q) are updated.

If compq = 'N', then no Schur vectors are updated.

n The order of the matrix T (n≥ 0).

t, q Arrays:
t (size max(1, ldt*n)) contains the n-by-n matrix T.

q (size max(1, ldq*n))

If compq = 'V', then q must contain Q (Schur vectors).

If compq = 'N', then q is not referenced.

ldt The leading dimension of t; at least max(1, n).

ldq The leading dimension of q;

If compq = 'N', then ldq≥ 1.

If compq = 'V', then ldq≥ max(1, n).

ifst, ilst 1 ≤ ifst ≤ n; 1 ≤ ilst ≤ n.

Must specify the reordering of the diagonal elements (or blocks, which is
possible for real flavors) of the matrix T. The element (or block) with row
index ifst is moved to row ilst by a sequence of exchanges between
adjacent elements (or blocks).

Output Parameters

t Overwritten by the updated matrix S.

q If compq = 'V', q contains the updated matrix of Schur vectors.

ifst, ilst Overwritten for real flavors only.

If ifst pointed to the second row of a 2 by 2 block on entry, it is changed to
point to the first row; ilst always points to the first row of the block in its
final position (which may differ from its input value by ±1).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed matrix S is exactly similar to a matrix T + E, where $\|E\|_2 = O(\varepsilon)\|T\|_2$, and ε is the
machine precision.
Note that if a 2-by-2 diagonal block is involved in the reordering, its off-diagonal elements are in general
changed; the diagonal elements and the eigenvalues of the block are unchanged unless the block is
sufficiently ill-conditioned, in which case they may be noticeably altered. It is possible for a 2-by-2 block to
break into two 1-by-1 blocks, that is, for a pair of complex eigenvalues to become purely real.
The approximate number of floating-point operations is

for real flavors: 6n|ifst - ilst| if compq = 'N';
12n|ifst - ilst| if compq = 'V';

for complex flavors: 20n|ifst - ilst| if compq = 'N';
40n|ifst - ilst| if compq = 'V'.
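The flop-count formulas above are easy to evaluate. This C helper uses a hypothetical name. It takes the absolute distance between ifst and ilst, on the assumption that the count depends only on the number of adjacent exchanges, not on their direction.

```c
#include <stdlib.h>

/* Approximate flop count of ?trexc from the formulas above.
   is_complex: 0 for s/d flavors, 1 for c/z flavors.
   want_q: 1 if compq = 'V', 0 if compq = 'N'. */
long trexc_flops(int is_complex, int want_q, long n, long ifst, long ilst)
{
    long coef = is_complex ? 20 : 6;     /* coefficient for compq = 'N' */
    if (want_q)
        coef *= 2;                       /* the formulas double when Q is updated */
    return coef * n * labs(ifst - ilst);
}
```

For example, moving a diagonal element from row 10 to row 1 of a 100-by-100 real matrix without updating Q costs roughly 6*100*9 = 5400 flops.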

?trsen
Reorders the Schur factorization of a matrix and
(optionally) computes the reciprocal condition
numbers for the selected cluster of eigenvalues and
respective invariant subspace.

Syntax
lapack_int LAPACKE_strsen( int matrix_layout, char job, char compq, const
lapack_logical* select, lapack_int n, float* t, lapack_int ldt, float* q, lapack_int
ldq, float* wr, float* wi, lapack_int* m, float* s, float* sep );
lapack_int LAPACKE_dtrsen( int matrix_layout, char job, char compq, const
lapack_logical* select, lapack_int n, double* t, lapack_int ldt, double* q, lapack_int
ldq, double* wr, double* wi, lapack_int* m, double* s, double* sep );
lapack_int LAPACKE_ctrsen( int matrix_layout, char job, char compq, const
lapack_logical* select, lapack_int n, lapack_complex_float* t, lapack_int ldt,
lapack_complex_float* q, lapack_int ldq, lapack_complex_float* w, lapack_int* m, float*
s, float* sep );
lapack_int LAPACKE_ztrsen( int matrix_layout, char job, char compq, const
lapack_logical* select, lapack_int n, lapack_complex_double* t, lapack_int ldt,
lapack_complex_double* q, lapack_int ldq, lapack_complex_double* w, lapack_int* m,
double* s, double* sep );

Include Files
• mkl.h

Description

The routine reorders the Schur factorization of a general matrix $A = QTQ^T$ (for real flavors) or $A = QTQ^H$
(for complex flavors) so that a selected cluster of eigenvalues appears in the leading diagonal elements (or,
for real flavors, diagonal blocks) of the Schur form. The reordered Schur form R is computed by a unitary
(orthogonal) similarity transformation: $R = Z^H T Z$. Optionally the updated matrix P of Schur vectors is
computed as $P = QZ$, giving $A = PRP^H$.

Let
$$R = Z^H T Z = \begin{pmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{pmatrix},$$
where the selected eigenvalues are precisely the eigenvalues of the leading m-by-m submatrix $T_{11}$. Let P be
correspondingly partitioned as $(Q_1\ Q_2)$, where $Q_1$ consists of the first m columns of P. Then $A Q_1 = Q_1 T_{11}$,
and so the m columns of $Q_1$ form an orthonormal basis for the invariant subspace corresponding to the
selected cluster of eigenvalues.
Optionally the routine also computes estimates of the reciprocal condition numbers of the average of the
cluster of eigenvalues and of the invariant subspace.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

job Must be 'N' or 'E' or 'V' or 'B'.

If job = 'N', then no condition numbers are required.

If job = 'E', then only the condition number for the cluster of eigenvalues
is computed.
If job = 'V', then only the condition number for the invariant subspace is
computed.
If job = 'B', then condition numbers for both the cluster and the invariant
subspace are computed.

compq Must be 'V' or 'N'.

If compq = 'V', then Q of the Schur vectors is updated.

If compq = 'N', then no Schur vectors are updated.

select Array, size at least max (1, n).

Specifies the eigenvalues in the selected cluster. To select an eigenvalue $\lambda_j$,
select[j - 1] must be 1.

For real flavors: to select a complex conjugate pair of eigenvalues $\lambda_j$ and $\lambda_{j+1}$
(corresponding to a 2-by-2 diagonal block), select[j - 1] and/or select[j] must
be 1; the complex conjugates $\lambda_j$ and $\lambda_{j+1}$ must be either both included in the
cluster or both excluded.

n The order of the matrix T (n≥ 0).

t, q Arrays:
t (size max(1, ldt*n)) The upper quasi-triangular n-by-n matrix T, in Schur
canonical form.
q (size max(1, ldq*n))

If compq = 'V', then q must contain the matrix Q of Schur vectors.

If compq = 'N', then q is not referenced.

ldt The leading dimension of t; at least max(1, n).

ldq The leading dimension of q;

If compq = 'N', then ldq≥ 1.

If compq = 'V', then ldq≥ max(1, n).

Output Parameters

t Overwritten by the reordered matrix R in Schur canonical form with the
selected eigenvalues in the leading diagonal blocks.

q If compq = 'V', q contains the updated matrix of Schur vectors; the first
m columns of q form an orthonormal basis for the specified invariant
subspace.

w (complex flavors) Array, size at least max(1, n). The reordered eigenvalues of R. The
eigenvalues are stored in the same order as on the diagonal of R.

wr, wi (real flavors) Arrays, size at least max(1, n). Contain the real and imaginary parts,
respectively, of the reordered eigenvalues of R. The eigenvalues are stored
in the same order as on the diagonal of R. Note that if a complex
eigenvalue is sufficiently ill-conditioned, then its value may differ
significantly from its value before reordering.

m For complex flavors: the dimension of the specified invariant subspace,
which is the same as the number of selected eigenvalues (see select).
For real flavors: the dimension of the specified invariant subspace. The
value of m is obtained by counting 1 for each selected real eigenvalue and 2
for each selected complex conjugate pair of eigenvalues (see select).
Constraint: 0 ≤ m ≤ n.

s If job = 'E' or 'B', s is a lower bound on the reciprocal condition number
of the average of the selected cluster of eigenvalues.
If m = 0 or n, then s = 1.

For real flavors: if info = 1, then s is set to zero. s is not referenced if job
= 'N' or 'V'.

sep If job = 'V' or 'B', sep is the estimated reciprocal condition number of
the specified invariant subspace.
If m = 0 or n, then sep = ||T||.

For real flavors: if info = 1, then sep is set to zero.

sep is not referenced if job = 'N' or 'E'.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = 1, the reordering of T failed because some eigenvalues are too close to separate (the problem is
very ill-conditioned); T may have been partially reordered, and wr and wi contain the eigenvalues in the
same order as in T; s and sep (if requested) are set to zero.

Application Notes
The computed matrix R is exactly similar to a matrix T + E, where $\|E\|_2 = O(\varepsilon)\|T\|_2$, and ε is the
machine precision. The computed s cannot underestimate the true reciprocal condition number by more than
a factor of $(\min(m, n-m))^{1/2}$; sep may differ from the true value by $(mn - m^2)^{1/2}$. The angle between the
computed invariant subspace and the true subspace is $O(\varepsilon)\|A\|_2/\mathrm{sep}$. Note that if a 2-by-2 diagonal
block is involved in the reordering, its off-diagonal elements are in general changed; the diagonal elements
and the eigenvalues of the block are unchanged unless the block is sufficiently ill-conditioned, in which case
they may be noticeably altered. It is possible for a 2-by-2 block to break into two 1-by-1 blocks, that is, for a
pair of complex eigenvalues to become purely real.

?trsyl
Solves Sylvester equation for real quasi-triangular or
complex triangular matrices.

Syntax
lapack_int LAPACKE_strsyl( int matrix_layout, char trana, char tranb, lapack_int isgn,
lapack_int m, lapack_int n, const float* a, lapack_int lda, const float* b, lapack_int
ldb, float* c, lapack_int ldc, float* scale );
lapack_int LAPACKE_dtrsyl( int matrix_layout, char trana, char tranb, lapack_int isgn,
lapack_int m, lapack_int n, const double* a, lapack_int lda, const double* b,
lapack_int ldb, double* c, lapack_int ldc, double* scale );
lapack_int LAPACKE_ctrsyl( int matrix_layout, char trana, char tranb, lapack_int isgn,
lapack_int m, lapack_int n, const lapack_complex_float* a, lapack_int lda, const
lapack_complex_float* b, lapack_int ldb, lapack_complex_float* c, lapack_int ldc,
float* scale );
lapack_int LAPACKE_ztrsyl( int matrix_layout, char trana, char tranb, lapack_int isgn,
lapack_int m, lapack_int n, const lapack_complex_double* a, lapack_int lda, const
lapack_complex_double* b, lapack_int ldb, lapack_complex_double* c, lapack_int ldc,
double* scale );

Include Files
• mkl.h

Description

The routine solves the Sylvester matrix equation op(A)*X ± X*op(B) = α*C, where op(A) = A or A^H, and the
matrices A and B are upper triangular (or, for real flavors, upper quasi-triangular in canonical Schur form); α ≤ 1
is a scale factor determined by the routine to avoid overflow in X; A is m-by-m, B is n-by-n, and C and X
are both m-by-n. The matrix X is obtained by a straightforward process of back substitution.
The equation has a unique solution if and only if α_i ± β_j ≠ 0 for all i and j, where {α_i} and {β_j} are the
eigenvalues of A and B, respectively, and the sign (+ or -) is the same as that used in the equation to be solved.
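The column-by-column back substitution can be sketched in plain C. This is an illustration under stated assumptions, not MKL's ?trsyl implementation: real upper triangular A and B with no 2-by-2 blocks, op(A) = A, op(B) = B, isgn = +1, column major storage, and no overflow protection (the scale factor is implicitly 1). The function name tri_sylvester is hypothetical. Column j of X satisfies (A + B(j,j)*I)*x_j = c_j - (sum over k < j of B(k,j)*x_k), so each column needs one triangular solve, and a zero divisor A(i,i) + B(j,j) is exactly the case α_i + β_j = 0 in which the solution is not unique.

```c
#include <stddef.h>

/* Hypothetical helper, not an MKL routine: solves A*X + X*B = C for real
 * upper triangular A (m-by-m) and B (n-by-n), all stored column major,
 * by column-wise back substitution. No scaling is applied.
 * On entry c holds C; on exit it holds X.
 * Returns 0 on success, or 1 if some A(i,i) + B(j,j) == 0. */
int tri_sylvester(int m, int n, const double *a, int lda,
                  const double *b, int ldb, double *c, int ldc)
{
    for (int j = 0; j < n; ++j) {
        double *cj = c + (size_t)j * ldc;      /* column j: C on entry, X on exit */

        /* Right-hand side c_j - sum_{k<j} B(k,j) * x_k (x_k is already in c). */
        for (int k = 0; k < j; ++k) {
            const double bkj = b[k + (size_t)j * ldb];
            const double *xk = c + (size_t)k * ldc;
            for (int i = 0; i < m; ++i)
                cj[i] -= bkj * xk[i];
        }

        /* Back substitution with the upper triangular matrix A + B(j,j)*I. */
        const double bjj = b[j + (size_t)j * ldb];
        for (int i = m - 1; i >= 0; --i) {
            double s = cj[i];
            for (int l = i + 1; l < m; ++l)
                s -= a[i + (size_t)l * lda] * cj[l];
            const double d = a[i + (size_t)i * lda] + bjj;
            if (d == 0.0)
                return 1;                      /* alpha_i + beta_j == 0 */
            cj[i] = s / d;
        }
    }
    return 0;
}
```

With A = [1 2; 0 3], B = [4 1; 0 5] and C = A*X + X*B for X = [1 2; 3 4], the function overwrites c with X. Its cost is roughly m*n*(m + n) floating-point operations, in line with the estimate given in the Application Notes for the real flavors; the library routine additionally handles 2-by-2 blocks, transposed operands, isgn = -1, and scaling.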

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

trana Must be 'N' or 'T' or 'C'.

If trana = 'N', then op(A) = A.

If trana = 'T', then op(A) = A^T (real flavors only).

If trana = 'C', then op(A) = A^H.

tranb Must be 'N' or 'T' or 'C'.

If tranb = 'N', then op(B) = B.

If tranb = 'T', then op(B) = B^T (real flavors only).

If tranb = 'C', then op(B) = B^H.

isgn Indicates the form of the Sylvester equation.

If isgn = +1, op(A)*X + X*op(B) = alpha*C.

If isgn = -1, op(A)*X - X*op(B) = alpha*C.

m The order of A, and the number of rows in X and C (m≥ 0).

n The order of B, and the number of columns in X and C (n≥ 0).

a, b, c Arrays:
a (size max(1, lda*m)) contains the matrix A.

b (size max(1, ldb*n)) contains the matrix B.

c (size max(1, ldc*n) for column major layout and max(1, ldc*m) for row
major layout) contains the matrix C.

lda The leading dimension of a; at least max(1, m) for both column major and
row major layout, since A is m-by-m.

ldb The leading dimension of b; at least max(1, n).

ldc The leading dimension of c; at least max(1, m) for column major layout and
at least max(1, n) for row major layout.

Output Parameters

c Overwritten by the solution matrix X.


scale The value of the scale factor α.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = 1, A and B have common or close eigenvalues; perturbed values were used to solve the equation.

Application Notes
Let X be the exact, Y the corresponding computed solution, and R the residual matrix: R = C - (A*Y ± Y*B).
Then the residual is always small:
||R||_F = O(ε)*(||A||_F + ||B||_F)*||Y||_F.
However, Y is not necessarily the exact solution of a slightly perturbed equation; in other words, the solution
is not backward stable.
For the forward error, the following bound holds:
||Y - X||_F ≤ ||R||_F/sep(A,B)
but this may be a considerable overestimate. See [Golub96] for a definition of sep(A, B).
The approximate number of floating-point operations for real flavors is m*n*(m + n). For complex flavors it
is 4 times greater.

Generalized Nonsymmetric Eigenvalue Problems: LAPACK Computational Routines

This topic describes LAPACK routines for solving generalized nonsymmetric eigenvalue problems, reordering
the generalized Schur factorization of a pair of matrices, as well as performing a number of related
computational tasks.
A generalized nonsymmetric eigenvalue problem is as follows: given a pair of nonsymmetric (or non-Hermitian)
n-by-n matrices A and B, find the generalized eigenvalues λ and the corresponding generalized
eigenvectors x and y that satisfy the equations
A*x = λ*B*x (right generalized eigenvectors x)
and
y^H*A = λ*y^H*B (left generalized eigenvectors y).
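For a right eigenpair computed by the routines in this topic, the defining equation can be checked numerically. The following plain-C sketch (hypothetical function name; real λ and real x only, column major storage) returns the largest absolute entry of A*x - λ*B*x. Complex eigenvalues of a real pair would need complex arithmetic in the same loop.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: infinity norm of r = A*x - lambda*B*x for real
 * n-by-n A and B (column major), real lambda and real vector x. */
double gen_eig_residual_inf(int n, const double *a, int lda,
                            const double *b, int ldb,
                            double lambda, const double *x)
{
    double rmax = 0.0;
    for (int i = 0; i < n; ++i) {
        double ax = 0.0, bx = 0.0;             /* (A*x)(i) and (B*x)(i) */
        for (int j = 0; j < n; ++j) {
            ax += a[i + (size_t)j * lda] * x[j];
            bx += b[i + (size_t)j * ldb] * x[j];
        }
        const double r = fabs(ax - lambda * bx);
        if (r > rmax)
            rmax = r;
    }
    return rmax;
}
```

For A = [2 1; 0 6], B = [1 1; 0 2] and x = (-2, 1), the residual is 0 for λ = 3.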
Table "Computational Routines for Solving Generalized Nonsymmetric Eigenvalue Problems" lists LAPACK
routines used to solve the generalized nonsymmetric eigenvalue problems and the generalized Sylvester
equation.
Computational Routines for Solving Generalized Nonsymmetric Eigenvalue Problems
Routine name    Operation performed

gghrd    Reduces a pair of matrices to generalized upper Hessenberg form using orthogonal/unitary transformations.

ggbal    Balances a pair of general real or complex matrices.

ggbak    Forms the right or left eigenvectors of a generalized eigenvalue problem.

gghd3    Reduces a pair of matrices to generalized upper Hessenberg form.

hgeqz    Implements the QZ method for finding the generalized eigenvalues of the matrix pair (H,T).

tgevc    Computes some or all of the right and/or left generalized eigenvectors of a pair of upper triangular matrices.

tgexc    Reorders the generalized Schur decomposition of a pair of matrices (A,B) so that one diagonal block of (A,B)
         moves to another row index.

tgsen    Reorders the generalized Schur decomposition of a pair of matrices (A,B) so that a selected cluster of
         eigenvalues appears in the leading diagonal blocks of (A,B).

tgsyl    Solves the generalized Sylvester equation.

tgsna    Estimates reciprocal condition numbers for specified eigenvalues and/or eigenvectors of a pair of matrices in
         generalized real Schur canonical form.

?gghrd
Reduces a pair of matrices to generalized upper
Hessenberg form using orthogonal/unitary
transformations.

Syntax
lapack_int LAPACKE_sgghrd (int matrix_layout, char compq, char compz, lapack_int n,
lapack_int ilo, lapack_int ihi, float* a, lapack_int lda, float* b, lapack_int ldb,
float* q, lapack_int ldq, float* z, lapack_int ldz);
lapack_int LAPACKE_dgghrd (int matrix_layout, char compq, char compz, lapack_int n,
lapack_int ilo, lapack_int ihi, double* a, lapack_int lda, double* b, lapack_int ldb,
double* q, lapack_int ldq, double* z, lapack_int ldz);
lapack_int LAPACKE_cgghrd (int matrix_layout, char compq, char compz, lapack_int n,
lapack_int ilo, lapack_int ihi, lapack_complex_float* a, lapack_int lda,
lapack_complex_float* b, lapack_int ldb, lapack_complex_float* q, lapack_int ldq,
lapack_complex_float* z, lapack_int ldz);
lapack_int LAPACKE_zgghrd (int matrix_layout, char compq, char compz, lapack_int n,
lapack_int ilo, lapack_int ihi, lapack_complex_double* a, lapack_int lda,
lapack_complex_double* b, lapack_int ldb, lapack_complex_double* q, lapack_int ldq,
lapack_complex_double* z, lapack_int ldz);

Include Files
• mkl.h

Description

The routine reduces a pair of real/complex matrices (A,B) to generalized upper Hessenberg form using
orthogonal/unitary transformations, where A is a general matrix and B is upper triangular. The form of the
generalized eigenvalue problem is A*x = λ*B*x, and B is typically made upper triangular by computing its
QR factorization and moving the orthogonal matrix Q to the left side of the equation.
This routine simultaneously reduces A to a Hessenberg matrix H:
Q^H*A*Z = H


and transforms B to another upper triangular matrix T:

Q^H*B*Z = T
in order to reduce the problem to its standard form H*y = λ*T*y, where y = Z^H*x.

The orthogonal/unitary matrices Q and Z are determined as products of Givens rotations. They may either be
formed explicitly, or they may be postmultiplied into input matrices Q1 and Z1, so that
Q1*A*Z1^H = (Q1*Q)*H*(Z1*Z)^H
Q1*B*Z1^H = (Q1*Q)*T*(Z1*Z)^H
If Q1 is the orthogonal/unitary matrix from the QR factorization of B in the original equation A*x = λ*B*x,
then the routine ?gghrd reduces the original problem to generalized Hessenberg form.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

compq Must be 'N', 'I', or 'V'.

If compq = 'N', matrix Q is not computed.

If compq = 'I', Q is initialized to the unit matrix, and the orthogonal/unitary
matrix Q is returned;
If compq = 'V', Q must contain an orthogonal/unitary matrix Q1 on entry,
and the product Q1*Q is returned.

compz Must be 'N', 'I', or 'V'.

If compz = 'N', matrix Z is not computed.

If compz = 'I', Z is initialized to the unit matrix, and the orthogonal/unitary
matrix Z is returned;
If compz = 'V', Z must contain an orthogonal/unitary matrix Z1 on entry,
and the product Z1*Z is returned.

n The order of the matrices A and B (n≥ 0).

ilo, ihi ilo and ihi mark the rows and columns of A which are to be reduced. It is
assumed that A is already upper triangular in rows and columns 1:ilo-1 and
ihi+1:n. Values of ilo and ihi are normally set by a previous call to ggbal;
otherwise they should be set to 1 and n respectively.
Constraint:
If n > 0, then 1 ≤ ilo ≤ ihi ≤ n;

if n = 0, then ilo = 1 and ihi = 0.

a, b, q, z Arrays:
a (size max(1, lda*n)) contains the n-by-n general matrix A.

b (size max(1, ldb*n)) contains the n-by-n upper triangular matrix B.

q (size max(1, ldq*n))

If compq = 'N', then q is not referenced.

If compq = 'V', then q must contain the orthogonal/unitary matrix Q1,
typically from the QR factorization of B.

z (size max(1, ldz*n))

If compz = 'N', then z is not referenced.

If compz = 'V', then z must contain the orthogonal/unitary matrix Z1.

lda The leading dimension of a; at least max(1, n).

ldb The leading dimension of b; at least max(1, n).

ldq The leading dimension of q;

If compq = 'N', then ldq ≥ 1.

If compq = 'I' or 'V', then ldq ≥ max(1, n).

ldz The leading dimension of z;

If compz = 'N', then ldz ≥ 1.

If compz = 'I' or 'V', then ldz ≥ max(1, n).
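The constraints listed above can be checked before calling ?gghrd. The plain-C sketch below is a hypothetical helper, not part of MKL; it returns false on the first violated condition and accepts only uppercase option letters.

```c
#include <stdbool.h>

static int imax(int x, int y) { return x > y ? x : y; }

/* Hypothetical pre-call check for the documented ?gghrd constraints:
 * option letters, ilo/ihi range, and leading dimensions. */
bool gghrd_args_ok(int n, int ilo, int ihi, char compq, char compz,
                   int lda, int ldb, int ldq, int ldz)
{
    if (n < 0)
        return false;
    /* 1 <= ilo <= ihi <= n if n > 0; ilo = 1 and ihi = 0 if n = 0. */
    if (n > 0 ? !(1 <= ilo && ilo <= ihi && ihi <= n)
              : !(ilo == 1 && ihi == 0))
        return false;
    if (lda < imax(1, n) || ldb < imax(1, n))
        return false;
    if (compq != 'N' && compq != 'I' && compq != 'V')
        return false;
    if (compz != 'N' && compz != 'I' && compz != 'V')
        return false;
    /* q and z need a full leading dimension only when they are referenced. */
    if (ldq < (compq == 'N' ? 1 : imax(1, n)))
        return false;
    if (ldz < (compz == 'N' ? 1 : imax(1, n)))
        return false;
    return true;
}
```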

Output Parameters

a On exit, the upper triangle and the first subdiagonal of A are overwritten
with the upper Hessenberg matrix H, and the rest is set to zero.

b On exit, overwritten by the upper triangular matrix T = Q^H*B*Z. The
elements below the diagonal are set to zero.

q If compq = 'I', then q contains the orthogonal/unitary matrix Q;

If compq = 'V', then q is overwritten by the product Q1*Q.

z If compz = 'I', then z contains the orthogonal/unitary matrix Z;

If compz = 'V', then z is overwritten by the product Z1*Z.
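The output structure described above can be verified with a short check. The plain-C sketch below (hypothetical name is_hess_tri_pair, column major storage) tests that the returned a is zero below its first subdiagonal and b is zero below its diagonal; an exact comparison with 0.0 is used because the routine sets those entries to zero.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if h (n-by-n) is upper Hessenberg and t (n-by-n)
 * is upper triangular, both stored column major. */
bool is_hess_tri_pair(int n, const double *h, int ldh,
                      const double *t, int ldt)
{
    for (int j = 0; j < n; ++j) {
        for (int i = j + 2; i < n; ++i)        /* below the first subdiagonal */
            if (h[i + (size_t)j * ldh] != 0.0)
                return false;
        for (int i = j + 1; i < n; ++i)        /* below the diagonal */
            if (t[i + (size_t)j * ldt] != 0.0)
                return false;
    }
    return true;
}
```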

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

?ggbal
Balances a pair of general real or complex matrices.

Syntax
lapack_int LAPACKE_sggbal( int matrix_layout, char job, lapack_int n, float* a,
lapack_int lda, float* b, lapack_int ldb, lapack_int* ilo, lapack_int* ihi, float*
lscale, float* rscale );
lapack_int LAPACKE_dggbal( int matrix_layout, char job, lapack_int n, double* a,
lapack_int lda, double* b, lapack_int ldb, lapack_int* ilo, lapack_int* ihi, double*
lscale, double* rscale );
lapack_int LAPACKE_cggbal( int matrix_layout, char job, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* b, lapack_int ldb,
lapack_int* ilo, lapack_int* ihi, float* lscale, float* rscale );


lapack_int LAPACKE_zggbal( int matrix_layout, char job, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* b, lapack_int ldb,
lapack_int* ilo, lapack_int* ihi, double* lscale, double* rscale );

Include Files
• mkl.h

Description

The routine balances a pair of general real/complex matrices (A,B). This involves, first, permuting A and B by
similarity transformations to isolate eigenvalues in the first 1 to ilo-1 and last ihi+1 to n elements on the
diagonal; and second, applying a diagonal similarity transformation to rows and columns ilo to ihi to make the
rows and columns as close in norm as possible. Both steps are optional. Balancing may reduce the 1-norm of
the matrices, and improve the accuracy of the computed eigenvalues and/or eigenvectors in the generalized
eigenvalue problem A*x = λ*B*x.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

job Specifies the operations to be performed on A and B. Must be 'N' or 'P' or
'S' or 'B'.
If job = 'N', then no operations are done; simply set ilo = 1, ihi = n,
lscale[i] = 1.0 and rscale[i] = 1.0 for i = 0,..., n - 1.
If job = 'P', then permute only.

If job = 'S', then scale only.

If job = 'B', then both permute and scale.

n The order of the matrices A and B (n≥ 0).

a, b Arrays:
a (size max(1, lda*n)) contains the matrix A.

b (size max(1, ldb*n)) contains the matrix B.

If job = 'N', a and b are not referenced.

lda The leading dimension of a; at least max(1, n).

ldb The leading dimension of b; at least max(1, n).

Output Parameters

a, b Overwritten by the balanced matrices A and B, respectively.

ilo, ihi ilo and ihi are set to integers such that on exit A(i,j) = 0 and B(i,j) = 0 if i > j
and j = 1,..., ilo-1 or i = ihi+1,..., n.

If job = 'N' or 'S', then ilo = 1 and ihi = n.

lscale, rscale Arrays, size at least max(1, n).

lscale contains details of the permutations and scaling factors applied to the
left side of A and B.
If P_j is the index of the row interchanged with row j, and D_j is the scaling
factor applied to row j, then
lscale[j - 1] = P_j, for j = 1,..., ilo-1
lscale[j - 1] = D_j, for j = ilo,..., ihi
lscale[j - 1] = P_j, for j = ihi+1,..., n.
rscale contains details of the permutations and scaling factors applied to the
right side of A and B.
If P_j is the index of the column interchanged with column j, and D_j is the
scaling factor applied to column j, then
rscale[j - 1] = P_j, for j = 1,..., ilo-1
rscale[j - 1] = D_j, for j = ilo,..., ihi
rscale[j - 1] = P_j, for j = ihi+1,..., n.
The order in which the interchanges are made is n to ihi+1, then 1 to ilo-1.
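The permutation part of lscale (or rscale) can be decoded by replaying the interchanges in the documented order. The plain-C sketch below uses a hypothetical function name; it assumes the stored indices are exact 1-based integers held in floating-point entries, and it reports, for each position of the balanced matrix, which original row (or column, for rscale) ended up there. The scaling factors in positions ilo to ihi are ignored.

```c
/* Hypothetical helper: perm[i] (0-based) receives the 0-based index of the
 * original row that sits in position i after the ?ggbal interchanges. */
void ggbal_perm_from_scale(int n, int ilo, int ihi, const double *scale,
                           int *perm)
{
    for (int i = 0; i < n; ++i)
        perm[i] = i;                           /* start from the identity */

    /* Interchanges are made for j = n down to ihi+1 ... */
    for (int j = n; j >= ihi + 1; --j) {
        const int k = (int)scale[j - 1] - 1;   /* stored index is 1-based */
        const int tmp = perm[j - 1];
        perm[j - 1] = perm[k];
        perm[k] = tmp;
    }
    /* ... then for j = 1 up to ilo-1. */
    for (int j = 1; j <= ilo - 1; ++j) {
        const int k = (int)scale[j - 1] - 1;
        const int tmp = perm[j - 1];
        perm[j - 1] = perm[k];
        perm[k] = tmp;
    }
}
```

For example, with n = 4, ilo = 2, ihi = 3 and the made-up values lscale = {3, 0.5, 2, 1}, row 4 is first swapped with row 1 and then row 1 with row 3, so perm = {2, 1, 3, 0}.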

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

?ggbak
Forms the right or left eigenvectors of a generalized
eigenvalue problem.

Syntax
lapack_int LAPACKE_sggbak( int matrix_layout, char job, char side, lapack_int n,
lapack_int ilo, lapack_int ihi, const float* lscale, const float* rscale, lapack_int m,
float* v, lapack_int ldv );
lapack_int LAPACKE_dggbak( int matrix_layout, char job, char side, lapack_int n,
lapack_int ilo, lapack_int ihi, const double* lscale, const double* rscale, lapack_int
m, double* v, lapack_int ldv );
lapack_int LAPACKE_cggbak( int matrix_layout, char job, char side, lapack_int n,
lapack_int ilo, lapack_int ihi, const float* lscale, const float* rscale, lapack_int m,
lapack_complex_float* v, lapack_int ldv );
lapack_int LAPACKE_zggbak( int matrix_layout, char job, char side, lapack_int n,
lapack_int ilo, lapack_int ihi, const double* lscale, const double* rscale, lapack_int
m, lapack_complex_double* v, lapack_int ldv );

Include Files
• mkl.h

Description

The routine forms the right or left eigenvectors of a real/complex generalized eigenvalue problem


A*x = λ*B*x
by backward transformation on the computed eigenvectors of the balanced pair of matrices output by ggbal.
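The backward transformation can be sketched in plain C for right eigenvectors with job = 'B'. This hypothetical helper mirrors the description and is not MKL's implementation: it assumes real eigenvectors in column major storage, undoes the scaling by multiplying rows ilo to ihi by rscale, and then undoes the interchanges in the reverse of the order documented for ?ggbal (ilo-1 down to 1, then ihi+1 up to n). Multiplying by the scaling factors is what recovers x: if (D_L*A*D_R)*x_b = λ*(D_L*B*D_R)*x_b, then x = D_R*x_b satisfies A*x = λ*B*x. For side = 'L' the same steps apply with lscale in place of rscale.

```c
#include <stddef.h>

/* Swap rows r1 and r2 (0-based) of the n-by-m column-major matrix v. */
static void swap_rows(int m, double *v, int ldv, int r1, int r2)
{
    for (int col = 0; col < m; ++col) {
        const double tmp = v[r1 + (size_t)col * ldv];
        v[r1 + (size_t)col * ldv] = v[r2 + (size_t)col * ldv];
        v[r2 + (size_t)col * ldv] = tmp;
    }
}

/* Hypothetical sketch of job = 'B', side = 'R': v holds m right eigenvectors
 * of the balanced pair; rscale, ilo, ihi (1-based) come from ?ggbal. */
void ggbak_right(int n, int ilo, int ihi, const double *rscale,
                 int m, double *v, int ldv)
{
    /* Undo the scaling: rows ilo..ihi are multiplied by D_j. */
    for (int i = ilo; i <= ihi; ++i)
        for (int col = 0; col < m; ++col)
            v[(i - 1) + (size_t)col * ldv] *= rscale[i - 1];

    /* Undo the interchanges in reverse order. */
    for (int i = ilo - 1; i >= 1; --i) {
        const int k = (int)rscale[i - 1];
        if (k != i)
            swap_rows(m, v, ldv, i - 1, k - 1);
    }
    for (int i = ihi + 1; i <= n; ++i) {
        const int k = (int)rscale[i - 1];
        if (k != i)
            swap_rows(m, v, ldv, i - 1, k - 1);
    }
}
```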

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

job Specifies the type of backward transformation required. Must be 'N', 'P',
'S', or 'B'.
If job = 'N', then no operations are done; return.

If job = 'P', then do backward transformation for permutation only.

If job = 'S', then do backward transformation for scaling only.

If job = 'B', then do backward transformation for both permutation and
scaling.

This argument must be the same as the argument job supplied to ?ggbal.

side Must be 'L' or 'R'.

If side = 'L', then v contains left eigenvectors.

If side = 'R', then v contains right eigenvectors.

n The number of rows of the matrix V (n≥ 0).

ilo, ihi The integers ilo and ihi determined by ?ggbal. Constraint:

If n > 0, then 1 ≤ ilo ≤ ihi ≤ n;

if n = 0, then ilo = 1 and ihi = 0.

lscale, rscale Arrays, size at least max(1, n).

The array lscale contains details of the permutations and/or scaling factors
applied to the left side of A and B, as returned by ?ggbal.

The array rscale contains details of the permutations and/or scaling factors
applied to the right side of A and B, as returned by ?ggbal.

m The number of columns of the matrix V (m ≥ 0).

v Array v (size max(1, ldv*m) for column major layout and max(1, ldv*n) for
row major layout). Contains the matrix of right or left eigenvectors to be
transformed, as returned by tgevc.

ldv The leading dimension of v; at least max(1, n) for column major layout and
at least max(1, m) for row major layout.

Output Parameters

v Overwritten by the transformed eigenvectors.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

?gghd3
Reduces a pair of matrices to generalized upper
Hessenberg form.

Syntax
lapack_int LAPACKE_sgghd3 (int matrix_layout, char compq, char compz, lapack_int n,
lapack_int ilo, lapack_int ihi, float * a, lapack_int lda, float * b, lapack_int ldb,
float * q, lapack_int ldq, float * z, lapack_int ldz);
lapack_int LAPACKE_dgghd3 (int matrix_layout, char compq, char compz, lapack_int n,
lapack_int ilo, lapack_int ihi, double * a, lapack_int lda, double * b, lapack_int ldb,
double * q, lapack_int ldq, double * z, lapack_int ldz);
lapack_int LAPACKE_cgghd3 (int matrix_layout, char compq, char compz, lapack_int n,
lapack_int ilo, lapack_int ihi, lapack_complex_float * a, lapack_int lda,
lapack_complex_float * b, lapack_int ldb, lapack_complex_float * q, lapack_int ldq,
lapack_complex_float * z, lapack_int ldz);
lapack_int LAPACKE_zgghd3 (int matrix_layout, char compq, char compz, lapack_int n,
lapack_int ilo, lapack_int ihi, lapack_complex_double * a, lapack_int lda,
lapack_complex_double * b, lapack_int ldb, lapack_complex_double * q, lapack_int ldq,
lapack_complex_double * z, lapack_int ldz);

Include Files
• mkl.h

Description
?gghd3 reduces a pair of real or complex matrices (A, B) to generalized upper Hessenberg form using
orthogonal/unitary transformations, where A is a general matrix and B is upper triangular. The form of the
generalized eigenvalue problem is
A*x = λ*B*x,
and B is typically made upper triangular by computing its QR factorization and moving the orthogonal/unitary
matrix Q to the left side of the equation.
This subroutine simultaneously reduces A to a Hessenberg matrix H:
Q^T*A*Z = H for real flavors
or
Q^H*A*Z = H for complex flavors
and transforms B to another upper triangular matrix T:
Q^T*B*Z = T for real flavors
or
Q^H*B*Z = T for complex flavors
in order to reduce the problem to its standard form
H*y = λ*T*y
where y = Z^T*x for real flavors
or
y = Z^H*x for complex flavors.


The orthogonal/unitary matrices Q and Z are determined as products of Givens rotations. They may either be
formed explicitly, or they may be postmultiplied into input matrices Q1 and Z1, so that
for real flavors:
Q1 * A * Z1^T = (Q1*Q) * H * (Z1*Z)^T
Q1 * B * Z1^T = (Q1*Q) * T * (Z1*Z)^T
for complex flavors:
Q1 * A * Z1^H = (Q1*Q) * H * (Z1*Z)^H
Q1 * B * Z1^H = (Q1*Q) * T * (Z1*Z)^H
If Q1 is the orthogonal/unitary matrix from the QR factorization of B in the original equation A*x = λ*B*x,
then ?gghd3 reduces the original problem to generalized Hessenberg form.

This is a blocked variant of ?gghrd, using matrix-matrix multiplications for parts of the computation to
enhance performance.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

compq = 'N': do not compute q;
= 'I': q is initialized to the unit matrix, and the orthogonal/unitary matrix Q
is returned;
= 'V': q must contain an orthogonal/unitary matrix Q1 on entry, and the
product Q1*Q is returned.

compz = 'N': do not compute z;
= 'I': z is initialized to the unit matrix, and the orthogonal/unitary matrix Z
is returned;
= 'V': z must contain an orthogonal/unitary matrix Z1 on entry, and the
product Z1*Z is returned.

n The order of the matrices A and B (n ≥ 0).

ilo, ihi ilo and ihi mark the rows and columns of a which are to be reduced. It is
assumed that a is already upper triangular in rows and columns 1:ilo - 1
and ihi + 1:n. ilo and ihi are normally set by a previous call to ?ggbal;
otherwise they should be set to 1 and n, respectively.

1 ≤ ilo ≤ ihi ≤ n, if n > 0; ilo = 1 and ihi = 0, if n = 0.

a Array, size (lda*n).

On entry, the n-by-n general matrix to be reduced.

lda The leading dimension of the array a.

lda≥ max(1,n).

b Array, size (ldb*n).

On entry, the n-by-n upper triangular matrix B.

ldb The leading dimension of the array b.

ldb≥ max(1,n).

q Array, size (ldq*n).

On entry, if compq = 'V', the orthogonal/unitary matrix Q1, typically from
the QR factorization of b.

ldq The leading dimension of the array q.

ldq≥n if compq='V' or 'I'; ldq≥ 1 otherwise.

z Array, size (ldz*n).

On entry, if compz = 'V', the orthogonal/unitary matrix Z1.

Not referenced if compz='N'.

ldz The leading dimension of the array z. ldz≥n if compz='V' or 'I'; ldz≥ 1
otherwise.

Output Parameters

a On exit, the upper triangle and the first subdiagonal of a are
overwritten with the upper Hessenberg matrix H, and the rest is set to
zero.

b On exit, the upper triangular matrix T = Q^T*B*Z for real flavors or
T = Q^H*B*Z for complex flavors. The elements below the diagonal are set to
zero.

q On exit, if compq = 'I', the orthogonal/unitary matrix Q, and if compq =
'V', the product Q1*Q.
Not referenced if compq = 'N'.

z On exit, if compz = 'I', the orthogonal/unitary matrix Z, and if compz =
'V', the product Z1*Z.
Not referenced if compz = 'N'.

Return Values
This function returns a value info.

= 0: successful exit.
< 0: if info = -i, the i-th argument had an illegal value.

Application Notes
This routine reduces A to Hessenberg form and maintains B in triangular form using a blocked variant of Moler and Stewart's
original algorithm, as described by Kagstrom, Kressner, Quintana-Orti, and Quintana-Orti (BIT 2008).

?hgeqz
Implements the QZ method for finding the generalized
eigenvalues of the matrix pair (H,T).


Syntax
lapack_int LAPACKE_shgeqz( int matrix_layout, char job, char compq, char compz,
lapack_int n, lapack_int ilo, lapack_int ihi, float* h, lapack_int ldh, float* t,
lapack_int ldt, float* alphar, float* alphai, float* beta, float* q, lapack_int ldq,
float* z, lapack_int ldz );
lapack_int LAPACKE_dhgeqz( int matrix_layout, char job, char compq, char compz,
lapack_int n, lapack_int ilo, lapack_int ihi, double* h, lapack_int ldh, double* t,
lapack_int ldt, double* alphar, double* alphai, double* beta, double* q, lapack_int ldq,
double* z, lapack_int ldz );
lapack_int LAPACKE_chgeqz( int matrix_layout, char job, char compq, char compz,
lapack_int n, lapack_int ilo, lapack_int ihi, lapack_complex_float* h, lapack_int ldh,
lapack_complex_float* t, lapack_int ldt, lapack_complex_float* alpha,
lapack_complex_float* beta, lapack_complex_float* q, lapack_int ldq,
lapack_complex_float* z, lapack_int ldz );
lapack_int LAPACKE_zhgeqz( int matrix_layout, char job, char compq, char compz,
lapack_int n, lapack_int ilo, lapack_int ihi, lapack_complex_double* h, lapack_int ldh,
lapack_complex_double* t, lapack_int ldt, lapack_complex_double* alpha,
lapack_complex_double* beta, lapack_complex_double* q, lapack_int ldq,
lapack_complex_double* z, lapack_int ldz );

Include Files
• mkl.h

Description
The routine computes the eigenvalues of a real/complex matrix pair (H,T), where H is an upper Hessenberg
matrix and T is upper triangular, using the double-shift version (for real flavors) or single-shift version (for
complex flavors) of the QZ method. Matrix pairs of this type are produced by the reduction to generalized
upper Hessenberg form of a real/complex matrix pair (A,B):
A = Q1*H*Z1^H, B = Q1*T*Z1^H,
as computed by ?gghrd.

For real flavors:

If job = 'S', then the Hessenberg-triangular pair (H,T) is reduced to generalized Schur form,

H = Q*S*Z^T, T = Q*P*Z^T,
where Q and Z are orthogonal matrices, P is an upper triangular matrix, and S is a quasi-triangular matrix
with 1-by-1 and 2-by-2 diagonal blocks. The 1-by-1 blocks correspond to real eigenvalues of the matrix pair
(H,T) and the 2-by-2 blocks correspond to complex conjugate pairs of eigenvalues.
Additionally, the 2-by-2 upper triangular diagonal blocks of P corresponding to 2-by-2 blocks of S are reduced
to positive diagonal form, that is, if S(j+1,j) is non-zero, then P(j+1,j) = P(j,j+1) = 0, P(j,j) > 0, and
P(j+1,j+1) > 0.

For complex flavors:

If job = 'S', then the Hessenberg-triangular pair (H,T) is reduced to generalized Schur form,

H = Q*S*Z^H, T = Q*P*Z^H,
where Q and Z are unitary matrices, and S and P are upper triangular.
For all function flavors:

Optionally, the orthogonal/unitary matrix Q from the generalized Schur factorization may be post-multiplied
by an input matrix Q1, and the orthogonal/unitary matrix Z may be post-multiplied by an input matrix Z1.
If Q1 and Z1 are the orthogonal/unitary matrices from ?gghrd that reduced the matrix pair (A,B) to
generalized upper Hessenberg form, then the output matrices Q1*Q and Z1*Z are the orthogonal/unitary
factors from the generalized Schur factorization of (A,B):
A = (Q1*Q)*S*(Z1*Z)^H, B = (Q1*Q)*P*(Z1*Z)^H.
To avoid overflow, eigenvalues of the matrix pair (H,T) (equivalently, of (A,B)) are computed as a pair of
values (alpha,beta). For chgeqz/zhgeqz, alpha and beta are complex, and for shgeqz/dhgeqz, alpha is
complex and beta real. If beta is nonzero, λ = alpha/beta is an eigenvalue of the generalized
nonsymmetric eigenvalue problem (GNEP)
A*x = λ*B*x
and if alpha is nonzero, μ = beta/alpha is an eigenvalue of the alternate form of the GNEP
μ*A*y = B*y.
Real eigenvalues (for real flavors) or the values of alpha and beta for the i-th eigenvalue (for complex
flavors) can be read directly from the generalized Schur form:
alpha = S(i,i), beta = P(i,i).
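Because λ = alpha/beta can overflow when beta is tiny, an eigenvalue can be formed only when the quotient is representable. The plain-C sketch below (hypothetical name hgeqz_lambda, written for the real flavors, where alpha = alphar + i*alphai) returns 0 when beta is zero or the division would overflow, so the caller can treat that eigenvalue as infinite, and otherwise stores λ = alpha/beta.

```c
#include <float.h>
#include <math.h>

/* Hypothetical helper: returns 1 and stores lambda = (alphar + i*alphai)/beta
 * in (*re, *im) when that quotient is finite; returns 0 when beta == 0 or
 * max(|alphar|, |alphai|)/|beta| would exceed DBL_MAX. */
int hgeqz_lambda(double alphar, double alphai, double beta,
                 double *re, double *im)
{
    const double amax = fabs(alphar) > fabs(alphai) ? fabs(alphar) : fabs(alphai);
    const double babs = fabs(beta);

    if (babs == 0.0)
        return 0;                  /* infinite eigenvalue (indeterminate if alpha is 0 too) */
    if (babs < 1.0 && amax >= babs * DBL_MAX)
        return 0;                  /* quotient would overflow */
    *re = alphar / beta;
    *im = alphai / beta;
    return 1;
}
```

The same check in reverse (swapping the roles of alpha and beta) can be used when μ = beta/alpha is the quantity of interest.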

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

job Specifies the operations to be performed. Must be 'E' or 'S'.

If job = 'E', then compute eigenvalues only;

If job = 'S', then compute eigenvalues and the Schur form.

compq Must be 'N', 'I', or 'V'.

If compq = 'N', left Schur vectors (q) are not computed;

If compq = 'I', q is initialized to the unit matrix and the matrix of left
Schur vectors of (H,T) is returned;
If compq = 'V', q must contain an orthogonal/unitary matrix Q1 on entry
and the product Q1*Q is returned.

compz Must be 'N', 'I', or 'V'.

If compz = 'N', right Schur vectors (z) are not computed;

If compz = 'I', z is initialized to the unit matrix and the matrix of right
Schur vectors of (H,T) is returned;
If compz = 'V', z must contain an orthogonal/unitary matrix Z1 on entry
and the product Z1*Z is returned.

n The order of the matrices H, T, Q, and Z (n ≥ 0).

ilo, ihi ilo and ihi mark the rows and columns of H which are in Hessenberg form.
It is assumed that H is already upper triangular in rows and columns 1:ilo-1
and ihi+1:n.
Constraint:


If n > 0, then 1 ≤ ilo ≤ ihi ≤ n;

if n = 0, then ilo = 1 and ihi = 0.

h, t, q, z Arrays:
On entry, h (size max(1, ldh*n)) contains the n-by-n upper Hessenberg
matrix H.
On entry, t (size max(1, ldt*n)) contains the n-by-n upper triangular
matrix T.
q (size max(1, ldq*n)):

On entry, if compq = 'V', this array contains the orthogonal/unitary matrix
Q1 used in the reduction of (A,B) to generalized Hessenberg form.
If compq = 'N', then q is not referenced.

z (size max(1, ldz*n)):

On entry, if compz = 'V', this array contains the orthogonal/unitary matrix
Z1 used in the reduction of (A,B) to generalized Hessenberg form.
If compz = 'N', then z is not referenced.

ldh The leading dimension of h; at least max(1, n).

ldt The leading dimension of t; at least max(1, n).

ldq The leading dimension of q;

If compq = 'N', then ldq ≥ 1.

If compq = 'I' or 'V', then ldq ≥ max(1, n).

ldz The leading dimension of z;

If compz = 'N', then ldz ≥ 1.

If compz = 'I' or 'V', then ldz ≥ max(1, n).

Output Parameters

h For real flavors:

If job = 'S', then on exit h contains the upper quasi-triangular matrix S
from the generalized Schur factorization.
If job = 'E', then on exit the diagonal blocks of h match those of S, but
the rest of h is unspecified.
For complex flavors:
If job = 'S', then, on exit, h contains the upper triangular matrix S from
the generalized Schur factorization.
If job = 'E', then on exit the diagonal of h matches that of S, but the rest
of h is unspecified.

t If job = 'S', then, on exit, t contains the upper triangular matrix P from
the generalized Schur factorization.
For real flavors:

2-by-2 diagonal blocks of P corresponding to 2-by-2 blocks of S are reduced
to positive diagonal form, that is, if h(j+1,j) is non-zero, then
t(j+1,j) = t(j,j+1) = 0 and t(j,j) and t(j+1,j+1) will be positive.
If job = 'E', then on exit the diagonal blocks of t match those of P, but
the rest of t is unspecified.
For complex flavors:
If job = 'E', then on exit the diagonal of t matches that of P, but the rest
of t is unspecified.

alphar, alphai Arrays, size at least max(1, n). The real and imaginary parts, respectively,
of each scalar alpha defining an eigenvalue of GNEP.
If alphai[j - 1] is zero, then the j-th eigenvalue is real; if positive, then the
j-th and (j+1)-th eigenvalues are a complex conjugate pair, with
alphai[j] = -alphai[j - 1].

alpha Array, size at least max(1, n).

The complex scalars alpha that define the eigenvalues of GNEP. alpha[i - 1] = S(i,i)
in the generalized Schur factorization.

beta Array, size at least max(1, n).

For real flavors:
The scalars beta that define the eigenvalues of GNEP.
Together, the quantities alpha = (alphar[j - 1], alphai[j - 1]) and
beta = beta[j - 1] represent the j-th eigenvalue of the matrix pair
(A,B), in one of the forms lambda = alpha/beta or mu = beta/alpha.
Since either lambda or mu may overflow, they should not, in general, be
computed.
For complex flavors:
The real non-negative scalars beta that define the eigenvalues of GNEP.
beta[i - 1] = P(i,i) in the generalized Schur factorization. Together, the
quantities alpha = alpha[j - 1] and beta = beta[j - 1] represent
the j-th eigenvalue of the matrix pair (A,B), in one of the forms lambda =
alpha/beta or mu = beta/alpha. Since either lambda or mu may
overflow, they should not, in general, be computed.

q On exit, if compq = 'I', q is overwritten by the orthogonal/unitary matrix
of left Schur vectors of the pair (H,T), and if compq = 'V', q is overwritten
by the orthogonal/unitary matrix of left Schur vectors of (A,B).

z On exit, if compz = 'I', z is overwritten by the orthogonal/unitary matrix
of right Schur vectors of the pair (H,T), and if compz = 'V', z is
overwritten by the orthogonal/unitary matrix of right Schur vectors of
(A,B).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.


If info = 1,..., n, the QZ iteration did not converge.

(H,T) is not in Schur form, but alphar[i - 1], alphai[i - 1] (for real flavors), alpha[i - 1] (for complex flavors),
and beta[i - 1], i = info+1,..., n should be correct.

If info = n+1,...,2n, the shift calculation failed.

(H,T) is not in Schur form, but alphar[i - 1], alphai[i - 1] (for real flavors), alpha[i - 1] (for complex flavors),
and beta[i - 1], i = info-n+1,..., n should be correct.
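The info ranges above translate directly into the index of the first eigenvalue that can still be used. The plain-C sketch below is a hypothetical helper; it returns that 1-based index (n + 1 when none is usable) and 0 for an illegal-argument return.

```c
/* Hypothetical helper: for the info value returned by ?hgeqz with order n,
 * returns the 1-based index i0 such that alpha/beta entries i0..n should be
 * correct according to the documented info ranges. */
int hgeqz_first_reliable(int n, int info)
{
    if (info < 0)
        return 0;               /* illegal parameter value: nothing computed */
    if (info == 0)
        return 1;               /* success: all n eigenvalues */
    if (info <= n)
        return info + 1;        /* QZ iteration did not converge */
    return info - n + 1;        /* shift calculation failed (info = n+1..2n) */
}
```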

?tgevc
Computes some or all of the right and/or left
generalized eigenvectors of a pair of upper triangular
matrices.

Syntax
lapack_int LAPACKE_stgevc (int matrix_layout, char side, char howmny, const
lapack_logical* select, lapack_int n, const float* s, lapack_int lds, const float* p,
lapack_int ldp, float* vl, lapack_int ldvl, float* vr, lapack_int ldvr, lapack_int mm,
lapack_int* m);
lapack_int LAPACKE_dtgevc (int matrix_layout, char side, char howmny, const
lapack_logical* select, lapack_int n, const double* s, lapack_int lds, const double* p,
lapack_int ldp, double* vl, lapack_int ldvl, double* vr, lapack_int ldvr, lapack_int mm,
lapack_int* m);
lapack_int LAPACKE_ctgevc (int matrix_layout, char side, char howmny, const
lapack_logical* select, lapack_int n, const lapack_complex_float* s, lapack_int lds,
const lapack_complex_float* p, lapack_int ldp, lapack_complex_float* vl, lapack_int
ldvl, lapack_complex_float* vr, lapack_int ldvr, lapack_int mm, lapack_int* m);
lapack_int LAPACKE_ztgevc (int matrix_layout, char side, char howmny, const
lapack_logical* select, lapack_int n, const lapack_complex_double* s, lapack_int lds,
const lapack_complex_double* p, lapack_int ldp, lapack_complex_double* vl, lapack_int
ldvl, lapack_complex_double* vr, lapack_int ldvr, lapack_int mm, lapack_int* m);

Include Files
• mkl.h

Description

The routine computes some or all of the right and/or left eigenvectors of a pair of real/complex matrices
(S,P), where S is quasi-triangular (for real flavors) or upper triangular (for complex flavors) and P is upper
triangular.
Matrix pairs of this type are produced by the generalized Schur factorization of a real/complex matrix pair
(A,B):
$$A = Q S Z^H, \quad B = Q P Z^H$$
as computed by ?gghrd plus ?hgeqz.

The right eigenvector x and the left eigenvector y of (S,P) corresponding to an eigenvalue w are defined by:
$$S x = w P x, \quad y^H S = w\, y^H P$$
The eigenvalues are not input to this routine, but are computed directly from the diagonal blocks or diagonal
elements of S and P.

This routine returns the matrices X and/or Y of right and left eigenvectors of (S,P), or the products Z*X
and/or Q*Y, where Z and Q are input matrices.
If Q and Z are the orthogonal/unitary factors from the generalized Schur factorization of a matrix pair (A,B),
then Z*X and Q*Y are the matrices of right and left eigenvectors of (A,B).

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

side Must be 'R', 'L', or 'B'.

If side = 'R', compute right eigenvectors only.

If side = 'L', compute left eigenvectors only.

If side = 'B', compute both right and left eigenvectors.

howmny Must be 'A', 'B', or 'S'.

If howmny = 'A', compute all right and/or left eigenvectors.

If howmny = 'B', compute all right and/or left eigenvectors,
back-transformed by the matrices in vr and/or vl.

If howmny = 'S', compute selected right and/or left eigenvectors, specified
by the logical array select.

select Array, size at least max (1, n).

If howmny = 'S', select specifies the eigenvectors to be computed.

If howmny = 'A' or 'B', select is not referenced.

For real flavors:

If w[j] is a real eigenvalue, the corresponding real eigenvector is computed
if select[j] is 1.

If w[j] and w[j + 1] are the real and imaginary parts of a complex
eigenvalue, the corresponding complex eigenvector is computed if either
select[j] or select[j + 1] is 1, and on exit select[j] is set to 1 and
select[j + 1] is set to 0.

For complex flavors:

The eigenvector corresponding to the j-th eigenvalue is computed if
select[j] is 1.

n The order of the matrices S and P (n ≥ 0).

s, p, vl, vr Arrays:
s (size max(1, lds*n)) contains the matrix S from a generalized Schur
factorization as computed by ?hgeqz. This matrix is upper quasi-triangular
for real flavors, and upper triangular for complex flavors.
p (size max(1, ldp*n)) contains the upper triangular matrix P from a
generalized Schur factorization as computed by ?hgeqz.

For real flavors, 2-by-2 diagonal blocks of P corresponding to 2-by-2 blocks
of S must be in positive diagonal form.
For complex flavors, P must have real diagonal elements.

If side = 'L' or 'B' and howmny = 'B', vl (size max(1, ldvl*mm) for
column major layout and max(1, ldvl*n) for row major layout) must
contain an n-by-n matrix Q (usually the orthogonal/unitary matrix Q of left
Schur vectors returned by ?hgeqz).

If side = 'R', vl is not referenced.

If side = 'R' or 'B' and howmny = 'B', vr (size max(1, ldvr*mm) for
column major layout and max(1, ldvr*n) for row major layout) must
contain an n-by-n matrix Z (usually the orthogonal/unitary matrix Z of right
Schur vectors returned by ?hgeqz).

If side = 'L', vr is not referenced.

lds The leading dimension of s; at least max(1, n).

ldp The leading dimension of p; at least max(1, n).

ldvl The leading dimension of vl.

If side = 'L' or 'B', then ldvl ≥ n for column major layout and
ldvl ≥ max(1, mm) for row major layout.

If side = 'R', then ldvl ≥ 1.

ldvr The leading dimension of vr.

If side = 'R' or 'B', then ldvr ≥ n for column major layout and
ldvr ≥ max(1, mm) for row major layout.

If side = 'L', then ldvr ≥ 1.

mm The number of columns in the arrays vl and/or vr (mm ≥ m).
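The size and leading-dimension rules for vl and vr depend on matrix_layout. A hypothetical helper (not part of Intel MKL) that computes the smallest values satisfying the rules above for one requested side might look like this; it uses size_t rather than lapack_int, and row_major is a plain flag standing in for matrix_layout == LAPACK_ROW_MAJOR.

```c
#include <stddef.h>

/* Hypothetical helper (not an MKL routine): smallest leading dimension and
   element count that satisfy the vl/vr requirements of ?tgevc for a side
   that is actually computed. row_major != 0 selects the row major rules. */
typedef struct {
    size_t ld;   /* minimum ldvl or ldvr */
    size_t len;  /* minimum number of elements to allocate */
} tgevc_storage;

tgevc_storage tgevc_min_storage(int row_major, size_t n, size_t mm)
{
    tgevc_storage st;
    size_t prod;

    if (row_major) {
        st.ld = mm > 1 ? mm : 1;   /* ld >= max(1, mm) */
        prod  = st.ld * n;         /* size max(1, ld*n) */
    } else {
        st.ld = n > 1 ? n : 1;     /* ld >= n, and never below 1 */
        prod  = st.ld * mm;        /* size max(1, ld*mm) */
    }
    st.len = prod > 1 ? prod : 1;
    return st;
}
```

For a side that is not computed, a one-element dummy array with leading dimension 1 is enough, since that array is not referenced.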

Output Parameters

vl On exit, if side = 'L' or 'B', vl contains:

if howmny = 'A', the matrix Y of left eigenvectors of (S,P);

if howmny = 'B', the matrix Q*Y;

if howmny = 'S', the left eigenvectors of (S,P) specified by select, stored
consecutively in the columns of vl, in the same order as their eigenvalues.
For real flavors:
A complex eigenvector corresponding to a complex eigenvalue is stored in
two consecutive columns, the first holding the real part, and the second the
imaginary part.

vr On exit, if side = 'R' or 'B', vr contains:

if howmny = 'A', the matrix X of right eigenvectors of (S,P);

if howmny = 'B', the matrix Z*X;

if howmny = 'S', the right eigenvectors of (S,P) specified by select, stored
consecutively in the columns of vr, in the same order as their eigenvalues.
For real flavors:
A complex eigenvector corresponding to a complex eigenvalue is stored in
two consecutive columns, the first holding the real part, and the second the
imaginary part.

m The number of columns in the arrays vl and/or vr actually used to store the
eigenvectors.
If howmny = 'A' or 'B', m is set to n.

For real flavors:

Each selected real eigenvector occupies one column and each selected
complex eigenvector occupies two columns.
For complex flavors:
Each selected eigenvector occupies one column.
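For the real flavors, the column count m and the two-column storage of complex eigenvectors are easy to get wrong. The two hypothetical helpers below (not Intel MKL functions) sketch both: the first predicts m for howmny = 'S' by scanning the subdiagonal of S for 2-by-2 blocks, and the second reads one complex eigenvector out of vl or vr. They take int flags instead of lapack_logical, and the first assumes S is stored column major.

```c
#include <stddef.h>

/* Number of columns m that howmny = 'S' fills for a given select array.
   S is n-by-n upper quasi-triangular in column-major storage; a nonzero
   subdiagonal entry S(j+1, j) marks a 2-by-2 block (complex pair). */
size_t tgevc_count_columns(size_t n, const double *s, size_t lds, const int *select)
{
    size_t m = 0;
    for (size_t j = 0; j < n; ++j) {
        int pair = (j + 1 < n) && s[(j + 1) + j * lds] != 0.0;
        if (pair) {
            if (select[j] || select[j + 1])
                m += 2;          /* real-part column and imaginary-part column */
            ++j;                 /* skip the second row of the 2-by-2 block */
        } else if (select[j]) {
            m += 1;              /* real eigenvector: one column */
        }
    }
    return m;
}

/* Copies the complex eigenvector whose real part is in column col and whose
   imaginary part is in column col + 1 (0-based) of vl or vr. */
void tgevc_unpack_complex(const double *v, size_t ld, int row_major, size_t n,
                          size_t col, double *re, double *im)
{
    for (size_t k = 0; k < n; ++k) {
        re[k] = row_major ? v[k * ld + col]     : v[k + col * ld];
        im[k] = row_major ? v[k * ld + col + 1] : v[k + (col + 1) * ld];
    }
}
```

The eigenvector belonging to the conjugate eigenvalue is the complex conjugate, re - i*im, so it is not stored separately.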

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

For real flavors:

If info = i > 0, the 2-by-2 block (i:i+1) does not have a complex eigenvalue.

?tgexc
Reorders the generalized Schur decomposition of a
pair of matrices (A,B) so that one diagonal block of
(A,B) moves to another row index.

Syntax
lapack_int LAPACKE_stgexc (int matrix_layout, lapack_logical wantq, lapack_logical
wantz, lapack_int n, float* a, lapack_int lda, float* b, lapack_int ldb, float* q,
lapack_int ldq, float* z, lapack_int ldz, lapack_int* ifst, lapack_int* ilst);
lapack_int LAPACKE_dtgexc (int matrix_layout, lapack_logical wantq, lapack_logical
wantz, lapack_int n, double* a, lapack_int lda, double* b, lapack_int ldb, double* q,
lapack_int ldq, double* z, lapack_int ldz, lapack_int* ifst, lapack_int* ilst);
lapack_int LAPACKE_ctgexc (int matrix_layout, lapack_logical wantq, lapack_logical
wantz, lapack_int n, lapack_complex_float* a, lapack_int lda, lapack_complex_float* b,
lapack_int ldb, lapack_complex_float* q, lapack_int ldq, lapack_complex_float* z,
lapack_int ldz, lapack_int ifst, lapack_int ilst);
lapack_int LAPACKE_ztgexc (int matrix_layout, lapack_logical wantq, lapack_logical
wantz, lapack_int n, lapack_complex_double* a, lapack_int lda, lapack_complex_double*
b, lapack_int ldb, lapack_complex_double* q, lapack_int ldq, lapack_complex_double* z,
lapack_int ldz, lapack_int ifst, lapack_int ilst);

Include Files
• mkl.h

Description

The routine reorders the generalized real-Schur/Schur decomposition of a real/complex matrix pair (A,B)
using an orthogonal/unitary equivalence transformation
$$(A, B) = Q (A, B) Z^H,$$
so that the diagonal block of (A, B) with row index ifst is moved to row ilst. Matrix pair (A, B) must be in a
generalized real-Schur/Schur canonical form (as returned by gges), that is, A is block upper triangular with
1-by-1 and 2-by-2 diagonal blocks and B is upper triangular. Optionally, the matrices Q and Z of generalized
Schur vectors are updated so that (for real flavors, $Z^H$ is simply $Z^T$)
$$Q_{in} A_{in} Z_{in}^H = Q_{out} A_{out} Z_{out}^H,$$
$$Q_{in} B_{in} Z_{in}^H = Q_{out} B_{out} Z_{out}^H.$$

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

wantq, wantz If wantq = 1, update the left transformation matrix Q;

If wantq = 0, do not update Q;

If wantz = 1, update the right transformation matrix Z;

If wantz = 0, do not update Z.

n The order of the matrices A and B (n ≥ 0).

a, b, q, z Arrays:
a (size max(1, lda*n)) contains the matrix A.

b (size max(1, ldb*n)) contains the matrix B.

q (size at least 1 if wantq = 0 and at least max(1, ldq*n) if wantq = 1)

If wantq = 0, then q is not referenced.

If wantq = 1, then q must contain the orthogonal/unitary matrix Q.

z (size at least 1 if wantz = 0 and at least max(1, ldz*n) if wantz = 1)

If wantz = 0, then z is not referenced.

If wantz = 1, then z must contain the orthogonal/unitary matrix Z.

lda The leading dimension of a; at least max(1, n).

ldb The leading dimension of b; at least max(1, n).

ldq The leading dimension of q.

If wantq = 0, then ldq ≥ 1.

If wantq = 1, then ldq ≥ max(1, n).

ldz The leading dimension of z.

If wantz = 0, then ldz ≥ 1.

If wantz = 1, then ldz ≥ max(1, n).

ifst, ilst Specify the reordering of the diagonal blocks of (A, B). The block with row
index ifst is moved to row ilst by a sequence of swaps between adjacent
blocks. Constraint: 1 ≤ ifst, ilst ≤ n.

Output Parameters

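a, b, q, z Overwritten by the updated matrices A, B, Q, and Z, respectively.

ifst, ilst Overwritten for real flavors only. If ifst pointed on entry to the second row of a
2-by-2 block, it is changed to point to the first row; ilst always points to the first row of the
block in its final position, which may differ from its input value by +1 or -1.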
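For the real flavors, a 2-by-2 diagonal block of A is marked by a nonzero subdiagonal entry, and a block is identified by its first row. The hypothetical helper below (not an Intel MKL function) finds that first row for a 1-based ifst, assuming A is stored column major; it can be used to predict which block a given ifst refers to before calling ?tgexc.

```c
#include <stddef.h>

/* Hypothetical helper (not an MKL routine) for the real flavors of ?tgexc:
   returns the 1-based first row of the diagonal block of A that contains the
   1-based row ifst. A is upper quasi-triangular and column-major; a nonzero
   A(ifst, ifst-1) means ifst is the second row of a 2-by-2 block. */
int tgexc_block_start(const double *a, size_t lda, int ifst)
{
    /* 1-based A(ifst, ifst-1) is 0-based element (ifst-1, ifst-2). */
    if (ifst > 1 && a[(size_t)(ifst - 1) + (size_t)(ifst - 2) * lda] != 0.0)
        return ifst - 1;
    return ifst;
}
```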

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = 1, the transformed matrix pair (A, B) would be too far from generalized Schur form; the problem
is ill-conditioned. (A, B) may have been partially reordered, and ilst points to the first row of the current
position of the block being moved.

?tgsen
Reorders the generalized Schur decomposition of a
pair of matrices (A,B) so that a selected cluster of
eigenvalues appears in the leading diagonal blocks of
(A,B).

Syntax
lapack_int LAPACKE_stgsen( int matrix_layout, lapack_int ijob, lapack_logical wantq,
lapack_logical wantz, const lapack_logical* select, lapack_int n, float* a, lapack_int
lda, float* b, lapack_int ldb, float* alphar, float* alphai, float* beta, float* q,
lapack_int ldq, float* z, lapack_int ldz, lapack_int* m, float* pl, float* pr, float*
dif );
lapack_int LAPACKE_dtgsen( int matrix_layout, lapack_int ijob, lapack_logical wantq,
lapack_logical wantz, const lapack_logical* select, lapack_int n, double* a, lapack_int
lda, double* b, lapack_int ldb, double* alphar, double* alphai, double* beta, double* q,
lapack_int ldq, double* z, lapack_int ldz, lapack_int* m, double* pl, double* pr,
double* dif );
lapack_int LAPACKE_ctgsen( int matrix_layout, lapack_int ijob, lapack_logical wantq,
lapack_logical wantz, const lapack_logical* select, lapack_int n, lapack_complex_float*
a, lapack_int lda, lapack_complex_float* b, lapack_int ldb, lapack_complex_float*
alpha, lapack_complex_float* beta, lapack_complex_float* q, lapack_int ldq,
lapack_complex_float* z, lapack_int ldz, lapack_int* m, float* pl, float* pr, float*
dif );
lapack_int LAPACKE_ztgsen( int matrix_layout, lapack_int ijob, lapack_logical wantq,
lapack_logical wantz, const lapack_logical* select, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* b, lapack_int ldb,
lapack_complex_double* alpha, lapack_complex_double* beta, lapack_complex_double* q,
lapack_int ldq, lapack_complex_double* z, lapack_int ldz, lapack_int* m, double* pl,
double* pr, double* dif );

Include Files
• mkl.h

Description

The routine reorders the generalized real-Schur/Schur decomposition of a real/complex matrix pair (A, B) (in
terms of an orthogonal/unitary equivalence transformation $Q^T (A, B) Z$ for real flavors or $Q^H (A, B) Z$ for
complex flavors), so that a selected cluster of eigenvalues appears in the leading diagonal blocks of the pair
(A, B). The leading columns of Q and Z form orthonormal/unitary bases of the corresponding left and right
eigenspaces (deflating subspaces).
(A, B) must be in generalized real-Schur/Schur canonical form (as returned by gges), that is, A is upper
quasi-triangular (real flavors) or upper triangular (complex flavors), and B is upper triangular.
?tgsen also computes the generalized eigenvalues
$$\omega_j = (\mathrm{alphar}(j) + \mathrm{alphai}(j) \cdot i)/\mathrm{beta}(j) \quad \text{(for real flavors)}$$
$$\omega_j = \mathrm{alpha}(j)/\mathrm{beta}(j) \quad \text{(for complex flavors)}$$
of the reordered matrix pair (A, B).
Optionally, the routine computes the estimates of reciprocal condition numbers for eigenvalues and
eigenspaces. These are Difu[(A11, B11), (A22, B22)] and Difl[(A11, B11), (A22, B22)], that is, the
separation(s) between the matrix pairs (A11, B11) and (A22, B22) that correspond to the selected cluster and
the eigenvalues outside the cluster, respectively, and norms of "projections" onto left and right eigenspaces
with respect to the selected cluster in the (1,1)-block.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

ijob Specifies whether condition numbers are required for the cluster of
eigenvalues (pl and pr) or the deflating subspaces Difu and Difl.

If ijob = 0, only reorder with respect to select;

If ijob = 1, compute the reciprocals of the norms of "projections" onto left and right
eigenspaces with respect to the selected cluster (pl and pr);
If ijob = 2, compute upper bounds on Difu and Difl, using F-norm-based
estimates (dif[0] and dif[1]);
If ijob = 3, compute estimates of Difu and Difl, using 1-norm-based
estimates (dif[0] and dif[1]). This option is about 5 times as expensive as ijob = 2;

If ijob = 4, compute pl, pr and dif (that is, options 0, 1 and 2 above). This is
an economical way to get everything;
If ijob = 5, compute pl, pr and dif (that is, options 0, 1 and 3 above).

wantq, wantz If wantq = 1, update the left transformation matrix Q;

If wantq = 0, do not update Q;

If wantz = 1, update the right transformation matrix Z;

If wantz = 0, do not update Z.

select Array, size at least max (1, n). Specifies the eigenvalues in the selected
cluster.

To select an eigenvalue $\omega_j$, select[j - 1] must be 1. For real flavors: to select
a complex conjugate pair of eigenvalues $\omega_j$ and $\omega_{j+1}$ (corresponding to a 2-by-2
diagonal block), select[j - 1] and/or select[j] must be set to 1; the complex
conjugate eigenvalues $\omega_j$ and $\omega_{j+1}$ must be either both included in the cluster or both
excluded.

n The order of the matrices A and B (n ≥ 0).

a, b, q, z Arrays:
a (size max(1, lda*n)) contains the matrix A.

For real flavors: A is upper quasi-triangular, with (A, B) in generalized real
Schur canonical form.
For complex flavors: A is upper triangular, in generalized Schur canonical
form.
b (size max(1, ldb*n)) contains the matrix B.

For real flavors: B is upper triangular, with (A, B) in generalized real Schur
canonical form.
For complex flavors: B is upper triangular, in generalized Schur canonical
form.
q (size at least 1 if wantq = 0 and at least max(1, ldq*n) if wantq = 1)

If wantq = 1, then q is an n-by-n matrix;

If wantq = 0, then q is not referenced.

z (size at least 1 if wantz = 0 and at least max(1, ldz*n) if wantz = 1)

If wantz = 1, then z is an n-by-n matrix;

If wantz = 0, then z is not referenced.

lda The leading dimension of a; at least max(1, n).

ldb The leading dimension of b; at least max(1, n).

ldq The leading dimension of q; ldq ≥ 1.

If wantq = 1, then ldq ≥ max(1, n).

ldz The leading dimension of z; ldz ≥ 1.

If wantz = 1, then ldz ≥ max(1, n).
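A common way to fill select is from the eigenvalues of a previous generalized Schur factorization, for example the alphar, alphai, and beta outputs of ?gges for the same pair. The hypothetical helper below (not an Intel MKL function) selects the finite eigenvalues inside the unit circle for the real flavors and keeps each complex conjugate pair together, as the select description requires. It uses int flags in place of lapack_logical.

```c
#include <stddef.h>

/* Hypothetical helper (not an MKL routine) for the real flavors of ?tgsen.
   Selects the finite eigenvalues w = (alphar + i*alphai)/beta with |w| < 1,
   marking both members of a complex conjugate pair together. The test
   |alpha|^2 < beta^2 avoids forming alpha/beta explicitly. */
void tgsen_select_inside_unit_circle(size_t n, const double *alphar,
                                     const double *alphai, const double *beta,
                                     int *select)
{
    for (size_t j = 0; j < n; ++j) {
        double a2 = alphar[j] * alphar[j] + alphai[j] * alphai[j];
        int inside = a2 < beta[j] * beta[j];   /* beta == 0 (infinite): not selected */

        if (alphai[j] > 0.0 && j + 1 < n) {
            /* alphai[j] > 0 starts a pair; entry j + 1 is its conjugate */
            select[j] = inside;
            select[j + 1] = inside;
            ++j;
        } else {
            select[j] = inside;
        }
    }
}
```

Squaring can overflow for extremely large alpha or beta; a scaled comparison would avoid that, at the cost of a slightly longer sketch.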

Output Parameters

a, b Overwritten by the reordered matrices A and B, respectively.

alphar, alphai Arrays, size at least max(1, n). Contain values that form generalized
eigenvalues in real flavors.
See beta.

alpha Array, size at least max(1, n). Contains values that form generalized
eigenvalues in complex flavors.
See beta.

beta Array, size at least max(1, n).

For real flavors:

On exit, (alphar[j] + alphai[j]*i)/beta[j], j=0,..., n - 1, will be the
generalized eigenvalues.
alphar[j] + alphai[j]*i and beta[j], j=0,..., n - 1 are the diagonals
of the complex Schur form (S,T) that would result if the 2-by-2 diagonal
blocks of the real generalized Schur form of (A,B) were further reduced to
triangular form using complex unitary transformations.
If alphai[j - 1] is zero, then the j-th eigenvalue is real; if positive, then
the j-th and (j + 1)-st eigenvalues are a complex conjugate pair, with
alphai[j] negative.
For complex flavors:
The diagonal elements of A and B, respectively, when the pair (A,B) has
been reduced to generalized Schur form. alpha[i]/beta[i], i = 0,..., n - 1
are the generalized eigenvalues.

q If wantq = 1, then, on exit, Q has been postmultiplied by the left
orthogonal/unitary transformation matrix that reorders (A, B). The leading m
columns of Q form orthonormal bases for the specified pair of left
eigenspaces (deflating subspaces).

z If wantz = 1, then, on exit, Z has been postmultiplied by the right orthogonal/unitary
transformation matrix that reorders (A, B). The leading m columns of Z
form orthonormal bases for the specified pair of right eigenspaces (deflating
subspaces).

m The dimension of the specified pair of left and right eigenspaces (deflating
subspaces); 0 ≤ m ≤ n.

pl, pr If ijob = 1, 4, or 5, pl and pr are lower bounds on the reciprocal of the
norm of "projections" onto left and right eigenspaces with respect to the
selected cluster.
0 < pl, pr ≤ 1. If m = 0 or m = n, pl = pr = 1.
If ijob = 0, 2, or 3, pl and pr are not referenced.

dif Array, size 2.

If ijob ≥ 2, dif[0] and dif[1] store the estimates of Difu and Difl, respectively.

If ijob = 2 or 4, dif[0] and dif[1] are F-norm-based upper bounds on Difu and
Difl.
If ijob = 3 or 5, dif[0] and dif[1] are 1-norm-based estimates of Difu and Difl.

If m = 0 or m = n, dif[0] = dif[1] = F-norm([A, B]).

If ijob = 0 or 1, dif is not referenced.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = 1, reordering of (A, B) failed because the transformed matrix pair (A, B) would be too far from
generalized Schur form; the problem is very ill-conditioned. (A, B) may have been partially reordered.
If ijob > 0, 0 is returned in dif, pl, and pr.

?tgsyl
Solves the generalized Sylvester equation.

Syntax
lapack_int LAPACKE_stgsyl( int matrix_layout, char trans, lapack_int ijob, lapack_int
m, lapack_int n, const float* a, lapack_int lda, const float* b, lapack_int ldb, float*
c, lapack_int ldc, const float* d, lapack_int ldd, const float* e, lapack_int lde,
float* f, lapack_int ldf, float* scale, float* dif );
lapack_int LAPACKE_dtgsyl( int matrix_layout, char trans, lapack_int ijob, lapack_int
m, lapack_int n, const double* a, lapack_int lda, const double* b, lapack_int ldb,
double* c, lapack_int ldc, const double* d, lapack_int ldd, const double* e, lapack_int
lde, double* f, lapack_int ldf, double* scale, double* dif );
lapack_int LAPACKE_ctgsyl( int matrix_layout, char trans, lapack_int ijob, lapack_int
m, lapack_int n, const lapack_complex_float* a, lapack_int lda, const
lapack_complex_float* b, lapack_int ldb, lapack_complex_float* c, lapack_int ldc, const
lapack_complex_float* d, lapack_int ldd, const lapack_complex_float* e, lapack_int lde,
lapack_complex_float* f, lapack_int ldf, float* scale, float* dif );
lapack_int LAPACKE_ztgsyl( int matrix_layout, char trans, lapack_int ijob, lapack_int
m, lapack_int n, const lapack_complex_double* a, lapack_int lda, const
lapack_complex_double* b, lapack_int ldb, lapack_complex_double* c, lapack_int ldc,
const lapack_complex_double* d, lapack_int ldd, const lapack_complex_double* e,
lapack_int lde, lapack_complex_double* f, lapack_int ldf, double* scale, double* dif );

Include Files
• mkl.h

Description

The routine solves the generalized Sylvester equation:

$$A R - L B = \mathrm{scale} \cdot C$$
$$D R - L E = \mathrm{scale} \cdot F$$
where R and L are unknown m-by-n matrices, (A, D), (B, E) and (C, F) are given matrix pairs of size m-by-m,
n-by-n and m-by-n, respectively, with real/complex entries. (A, D) and (B, E) must be in generalized
real-Schur/Schur canonical form, that is, A, B are upper quasi-triangular/triangular and D, E are upper triangular.
The solution (R, L) overwrites (C, F). The factor scale, 0 ≤ scale ≤ 1, is an output scaling factor chosen to avoid
overflow.
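Because ?tgsyl overwrites C and F with R and L, checking a solution requires saved copies of the right-hand sides. The hypothetical routine below (not part of Intel MKL) computes the largest residual entry of the first equation for real, column-major data with leading dimensions equal to the row counts; calling it again with D, E, and the saved F checks the second equation.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical check (not an MKL routine): max |(A*R - L*B - scale*C)(i,j)|
   for real column-major data; A is m-by-m, B is n-by-n, and R, L, C are
   m-by-n, all with leading dimension equal to their row count. C must be
   the right-hand side saved before ?tgsyl overwrote it with R. */
double tgsyl_residual(size_t m, size_t n, const double *A, const double *B,
                      const double *R, const double *L, const double *C,
                      double scale)
{
    double worst = 0.0;
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            double v = -scale * C[i + j * m];
            for (size_t k = 0; k < m; ++k)
                v += A[i + k * m] * R[k + j * m];   /* (A*R)(i,j) */
            for (size_t k = 0; k < n; ++k)
                v -= L[i + k * m] * B[k + j * n];   /* (L*B)(i,j) */
            if (fabs(v) > worst)
                worst = fabs(v);
        }
    }
    return worst;
}
```

In floating point, expect a small multiple of machine precision times the norms involved rather than an exact zero.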
In matrix notation the above equation is equivalent to the following: solve Z*x = scale*b, where x and b
stack the columns of (R, L) and (C, F), respectively, and Z is defined as
$$Z = \begin{pmatrix} \mathrm{kron}(I_n, A) & -\mathrm{kron}(B^T, I_m) \\ \mathrm{kron}(I_n, D) & -\mathrm{kron}(E^T, I_m) \end{pmatrix}$$
Here $I_k$ is the identity matrix of size k and $X^T$ is the transpose/conjugate-transpose of X. kron(X, Y) is the
Kronecker product between the matrices X and Y.
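For tiny problems it can help to form Z explicitly, for example to compare dif with the smallest singular value from a dense SVD. The sketch below builds Z for real, column-major data, taking x to stack vec(R) and then vec(L) column by column (the standard vec ordering, stated here as an assumption). tgsyl_build_z is hypothetical, not an Intel MKL routine; for complex flavors, B^T and E^T would become conjugate transposes.

```c
#include <stddef.h>

/* Hypothetical helper (not an MKL routine): builds the dense 2mn-by-2mn matrix
       Z = [ kron(I_n, A)  -kron(B^T, I_m) ]
           [ kron(I_n, D)  -kron(E^T, I_m) ]
   in column-major order with leading dimension 2mn. A, D are m-by-m and
   B, E are n-by-n, all real and column-major with ld equal to their order. */
void tgsyl_build_z(size_t m, size_t n, const double *A, const double *B,
                   const double *D, const double *E, double *Z)
{
    size_t mn = m * n, ldz = 2 * mn;
    for (size_t t = 0; t < ldz * ldz; ++t)
        Z[t] = 0.0;

    /* Row r = i + j*m corresponds to entry (i,j) of an m-by-n matrix;
       columns 0..mn-1 multiply vec(R), columns mn..2mn-1 multiply vec(L). */
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            size_t r = i + j * m;
            for (size_t k = 0; k < m; ++k) {        /* (A*R)(i,j): A(i,k) R(k,j) */
                Z[r + (k + j * m) * ldz] = A[i + k * m];
                Z[(r + mn) + (k + j * m) * ldz] = D[i + k * m];
            }
            for (size_t k = 0; k < n; ++k) {        /* (L*B)(i,j): L(i,k) B(k,j) */
                Z[r + (mn + i + k * m) * ldz] = -B[k + j * n];
                Z[(r + mn) + (mn + i + k * m) * ldz] = -E[k + j * n];
            }
        }
    }
}
```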
If trans = 'T' (for real flavors), or trans = 'C' (for complex flavors), the routine ?tgsyl solves the
transposed/conjugate-transposed system $Z^T y = \mathrm{scale} \cdot b$, which is equivalent to solving for R and L in
$$A^T R + D^T L = \mathrm{scale} \cdot C$$
$$R B^T + L E^T = \mathrm{scale} \cdot (-F)$$
This case (trans = 'T' for stgsyl/dtgsyl or trans = 'C' for ctgsyl/ztgsyl) is used to compute a
one-norm-based estimate of Dif[(A, D), (B, E)], the separation between the matrix pairs (A, D) and
(B, E).
If ijob ≥ 1, ?tgsyl computes a Frobenius norm-based estimate of Dif[(A, D), (B, E)], that is, the
reciprocal of a lower bound on the reciprocal of the smallest singular value of Z. This is a level 3 BLAS
algorithm.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

trans Must be 'N', 'T', or 'C'.

If trans = 'N', solve the generalized Sylvester equation.

If trans = 'T', solve the 'transposed' system (for real flavors only).

If trans = 'C', solve the 'conjugate transposed' system (for complex
flavors only).

ijob Specifies what kind of functionality to be performed:

If ijob = 0, solve the generalized Sylvester equation only;

If ijob = 1, perform the functionality of ijob = 0 and ijob = 3;

If ijob = 2, perform the functionality of ijob = 0 and ijob = 4;

If ijob = 3, only an estimate of Dif[(A, D), (B, E)] is computed (a look-ahead
strategy is used);
If ijob = 4, only an estimate of Dif[(A, D), (B, E)] is computed (?gecon on
sub-systems is used). If trans = 'T' or 'C', ijob is not referenced.

m The order of the matrices A and D, and the row dimension of the matrices
C, F, R and L.

n The order of the matrices B and E, and the column dimension of the
matrices C, F, R and L.

a, b, c, d, e, f Arrays:

a (size max(1, lda*m)) contains the upper quasi-triangular (for real flavors)
or upper triangular (for complex flavors) matrix A.
b (size max(1, ldb*n)) contains the upper quasi-triangular (for real flavors)
or upper triangular (for complex flavors) matrix B.
c (size max(1, ldc*n) for column major layout and max(1, ldc*m) for row
major layout) contains the right-hand side of the first matrix equation in
the generalized Sylvester equation (as defined by trans).
d (size max(1, ldd*m)) contains the upper triangular matrix D.

e (size max(1, lde*n)) contains the upper triangular matrix E.

f (size max(1, ldf*n) for column major layout and max(1, ldf*m) for row
major layout) contains the right-hand side of the second matrix equation in
the generalized Sylvester equation (as defined by trans).

lda The leading dimension of a; at least max(1, m).

ldb The leading dimension of b; at least max(1, n).

ldc The leading dimension of c; at least max(1, m) for column major layout and
at least max(1, n) for row major layout.

ldd The leading dimension of d; at least max(1, m).

lde The leading dimension of e; at least max(1, n).

ldf The leading dimension of f; at least max(1, m) for column major layout and
at least max(1, n) for row major layout.

Output Parameters

c If ijob=0, 1, or 2, overwritten by the solution R.

If ijob=3 or 4 and trans = 'N', c holds R, the solution achieved during the
computation of the Dif-estimate.

f If ijob=0, 1, or 2, overwritten by the solution L.

If ijob=3 or 4 and trans = 'N', f holds L, the solution achieved during
the computation of the Dif-estimate.

dif On exit, dif is the reciprocal of a lower bound of the reciprocal of the Dif-
function, that is, dif is an upper bound of Dif[(A, D), (B, E)] =
sigma_min(Z), where Z is as defined in the description.
If ijob = 0, or trans = 'T' (for real flavors), or trans = 'C' (for
complex flavors), dif is not touched.

scale On exit, scale is the scaling factor in the generalized Sylvester equation.
If 0 < scale < 1, c and f hold the solutions R and L, respectively, to a
slightly perturbed system but the input matrices A, B, D and E have not
been changed.
If scale = 0, c and f hold the solutions R and L, respectively, to the
homogeneous system with C = F = 0. Normally, scale = 1.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info > 0, (A, D) and (B, E) have common or close eigenvalues.

?tgsna
Estimates reciprocal condition numbers for specified
eigenvalues and/or eigenvectors of a pair of matrices
in generalized real Schur canonical form.

Syntax
lapack_int LAPACKE_stgsna( int matrix_layout, char job, char howmny, const
lapack_logical* select, lapack_int n, const float* a, lapack_int lda, const float* b,
lapack_int ldb, const float* vl, lapack_int ldvl, const float* vr, lapack_int ldvr,
float* s, float* dif, lapack_int mm, lapack_int* m );
lapack_int LAPACKE_dtgsna( int matrix_layout, char job, char howmny, const
lapack_logical* select, lapack_int n, const double* a, lapack_int lda, const double* b,
lapack_int ldb, const double* vl, lapack_int ldvl, const double* vr, lapack_int ldvr,
double* s, double* dif, lapack_int mm, lapack_int* m );
lapack_int LAPACKE_ctgsna( int matrix_layout, char job, char howmny, const
lapack_logical* select, lapack_int n, const lapack_complex_float* a, lapack_int lda,
const lapack_complex_float* b, lapack_int ldb, const lapack_complex_float* vl,
lapack_int ldvl, const lapack_complex_float* vr, lapack_int ldvr, float* s, float* dif,
lapack_int mm, lapack_int* m );
lapack_int LAPACKE_ztgsna( int matrix_layout, char job, char howmny, const
lapack_logical* select, lapack_int n, const lapack_complex_double* a, lapack_int lda,
const lapack_complex_double* b, lapack_int ldb, const lapack_complex_double* vl,
lapack_int ldvl, const lapack_complex_double* vr, lapack_int ldvr, double* s, double*
dif, lapack_int mm, lapack_int* m );

Include Files
• mkl.h

Description

The real flavors stgsna/dtgsna of this routine estimate reciprocal condition numbers for specified
eigenvalues and/or eigenvectors of a matrix pair (A, B) in generalized real Schur canonical form (or of any
matrix pair $(Q A Z^T, Q B Z^T)$ with orthogonal matrices Q and Z).
(A, B) must be in generalized real Schur form (as returned by gges), that is, A is block upper triangular
with 1-by-1 and 2-by-2 diagonal blocks, and B is upper triangular.
The complex flavors ctgsna/ztgsna estimate reciprocal condition numbers for specified eigenvalues and/or
eigenvectors of a matrix pair (A, B). (A, B) must be in generalized Schur canonical form, that is, A and B are
both upper triangular.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

job Specifies whether condition numbers are required for eigenvalues or
eigenvectors. Must be 'E' or 'V' or 'B'.

If job = 'E', for eigenvalues only (compute s).

If job = 'V', for eigenvectors only (compute dif).

If job = 'B', for both eigenvalues and eigenvectors (compute both s and
dif).

howmny Must be 'A' or 'S'.

If howmny = 'A', compute condition numbers for all eigenpairs.

If howmny = 'S', compute condition numbers for selected eigenpairs
specified by the logical array select.

select Array, size at least max (1, n).

If howmny = 'S', select specifies the eigenpairs for which condition
numbers are required.
If howmny = 'A', select is not referenced.

For real flavors:

To select condition numbers for the eigenpair corresponding to a real
eigenvalue $\omega_j$, select[j - 1] must be set to 1; to select condition numbers
corresponding to a complex conjugate pair of eigenvalues $\omega_j$ and $\omega_{j+1}$,
either select[j - 1] or select[j] must be set to 1.

For complex flavors:

To select condition numbers for the corresponding j-th eigenvalue and/or
eigenvector, select[j - 1] must be set to 1.

n The order of the square matrix pair (A, B) (n ≥ 0).

a, b, vl, vr Arrays:
a (size max(1, lda*n)) contains the upper quasi-triangular (for real flavors)
or upper triangular (for complex flavors) matrix A in the pair (A, B).
b (size max(1, ldb*n)) contains the upper triangular matrix B in the pair
(A, B).
If job = 'E' or 'B', vl (size max(1, ldvl*m) for column major layout and
max(1, ldvl*n) for row major layout) must contain left eigenvectors of (A,
B), corresponding to the eigenpairs specified by howmny and select. The
eigenvectors must be stored in consecutive columns of vl, as returned
by ?tgevc.

If job = 'V', vl is not referenced.

If job = 'E' or 'B', vr (size max(1, ldvr*m) for column major layout and
max(1, ldvr*n) for row major layout) must contain right eigenvectors of
(A, B), corresponding to the eigenpairs specified by howmny and select.
The eigenvectors must be stored in consecutive columns of vr, as returned
by ?tgevc.

If job = 'V', vr is not referenced.

lda The leading dimension of a; at least max(1, n).

ldb The leading dimension of b; at least max(1, n).

ldvl The leading dimension of vl; ldvl ≥ 1.

If job = 'E' or 'B', then ldvl ≥ max(1, n) for column major layout and
ldvl ≥ max(1, m) for row major layout.

ldvr The leading dimension of vr; ldvr ≥ 1.

If job = 'E' or 'B', then ldvr ≥ max(1, n) for column major layout and
ldvr ≥ max(1, m) for row major layout.

mm The number of elements in the arrays s and dif (mm ≥ m).

Output Parameters

s Array, size mm.

If job = 'E' or 'B', contains the reciprocal condition numbers of the
selected eigenvalues, stored in consecutive elements of the array.
If job = 'V', s is not referenced.

For real flavors:

For a complex conjugate pair of eigenvalues, two consecutive elements of s
are set to the same value. Thus, s[j - 1], dif[j - 1], and the j-th columns of
vl and vr all correspond to the same eigenpair (but not in general the j-th
eigenpair, unless all eigenpairs are selected).

dif Array, size mm.

If job = 'V' or 'B', contains the estimated reciprocal condition numbers
of the selected eigenvectors, stored in consecutive elements of the array.
If the eigenvalues cannot be reordered to compute dif[j], dif[j] is set to 0;
this can only occur when the true value would be very small anyway.
If job = 'E', dif is not referenced.

For real flavors:

For a complex eigenvector, two consecutive elements of dif are set to the
same value.
For complex flavors:
For each eigenvalue/vector specified by select, dif stores a Frobenius norm-
based estimate of Difl.

m The number of elements in the arrays s and dif used to store the specified
condition numbers; for each selected eigenvalue one element is used.

If howmny = 'A', m is set to n.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Generalized Singular Value Decomposition: LAPACK Computational Routines

This topic describes LAPACK computational routines used for finding the generalized singular value
decomposition (GSVD) of two matrices A and B as

$$U^H A Q = D_1 \begin{pmatrix} 0 & R \end{pmatrix}, \qquad V^H B Q = D_2 \begin{pmatrix} 0 & R \end{pmatrix},$$

where U, V, and Q are orthogonal/unitary matrices, R is a nonsingular upper triangular matrix, and D1, D2
are "diagonal" matrices whose structure is detailed in the descriptions of the individual routines.
Table “Computational Routines for Generalized Singular Value Decomposition” lists LAPACK routines that
perform generalized singular value decomposition of matrices.

Computational Routines for Generalized Singular Value Decomposition

Routine name    Operation performed
ggsvp           Computes the preprocessing decomposition for the generalized SVD (deprecated)
ggsvp3          Performs preprocessing for a generalized SVD
ggsvd3          Computes generalized SVD
tgsja           Computes the generalized SVD of two upper triangular or trapezoidal matrices

You can use the routines listed in the above table, as well as the driver routine ggsvd (deprecated; superseded by
ggsvd3), to find the GSVD of a pair of general rectangular matrices.
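As a worked illustration (derived by hand for this example, not taken from the routine descriptions), consider the scalar case m = p = n = 1 with A = (a), B = (b), and b ≠ 0. Then B has rank 1, so k = 0 and l = 1, the block (0 R) reduces to the 1-by-1 matrix R, and U, V, and Q can be taken as ±1. The defining relations, together with the constraint C^2 + S^2 = I that the ?ggsvd3 description imposes on the diagonal blocks, give:

$$
|a| = \alpha R, \qquad |b| = \beta R, \qquad \alpha^2 + \beta^2 = 1
\;\Longrightarrow\;
R = \sqrt{a^2 + b^2}, \qquad
\alpha = \frac{|a|}{\sqrt{a^2 + b^2}}, \qquad
\beta = \frac{|b|}{\sqrt{a^2 + b^2}}.
$$

For a = 3 and b = 4 this yields R = 5, alpha = 0.6, and beta = 0.8, so alpha/beta = 0.75 = |a/b|, which is the singular value of A*inv(B).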

?ggsvp
Computes the preprocessing decomposition for the
generalized SVD (deprecated).

Syntax
lapack_int LAPACKE_sggsvp( int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int p, lapack_int n, float* a, lapack_int lda, float* b, lapack_int
ldb, float tola, float tolb, lapack_int* k, lapack_int* l, float* u, lapack_int ldu,
float* v, lapack_int ldv, float* q, lapack_int ldq );
lapack_int LAPACKE_dggsvp( int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int p, lapack_int n, double* a, lapack_int lda, double* b,
lapack_int ldb, double tola, double tolb, lapack_int* k, lapack_int* l, double* u,
lapack_int ldu, double* v, lapack_int ldv, double* q, lapack_int ldq );
lapack_int LAPACKE_cggsvp( int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int p, lapack_int n, lapack_complex_float* a, lapack_int lda,
lapack_complex_float* b, lapack_int ldb, float tola, float tolb, lapack_int* k,
lapack_int* l, lapack_complex_float* u, lapack_int ldu, lapack_complex_float* v,
lapack_int ldv, lapack_complex_float* q, lapack_int ldq );


lapack_int LAPACKE_zggsvp( int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int p, lapack_int n, lapack_complex_double* a, lapack_int lda,
lapack_complex_double* b, lapack_int ldb, double tola, double tolb, lapack_int* k,
lapack_int* l, lapack_complex_double* u, lapack_int ldu, lapack_complex_double* v,
lapack_int ldv, lapack_complex_double* q, lapack_int ldq );

Include Files
• mkl.h

Description
This routine is deprecated; use ggsvp3.

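The routine computes orthogonal/unitary matrices U, V and Q such that

$$
U^H A Q = \begin{pmatrix} 0 & A_{12} & A_{13} \\ 0 & 0 & A_{23} \\ 0 & 0 & 0 \end{pmatrix} \ \text{if } m-k-l \ge 0, \qquad
U^H A Q = \begin{pmatrix} 0 & A_{12} & A_{13} \\ 0 & 0 & A_{23} \end{pmatrix} \ \text{if } m-k-l < 0, \qquad
V^H B Q = \begin{pmatrix} 0 & 0 & B_{13} \\ 0 & 0 & 0 \end{pmatrix}
$$

(with ^T in place of ^H for real flavors). The column blocks have widths n-k-l, k, and l; the row blocks of U^H A Q have heights k, l, and m-k-l (or k and m-k if m-k-l < 0); the row blocks of V^H B Q have heights l and p-l.
The k-by-k matrix A12 and l-by-l matrix B13 are nonsingular upper triangular; A23 is l-by-l upper
triangular if m-k-l ≥ 0, otherwise A23 is (m-k)-by-l upper trapezoidal. The sum k+l is equal to the effective
numerical rank of the (m+p)-by-n matrix $(A^H, B^H)^H$.
This decomposition is the preprocessing step for computing the Generalized Singular Value Decomposition
(GSVD), see subroutine ?tgsja.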

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

jobu Must be 'U' or 'N'.

If jobu = 'U', orthogonal/unitary matrix U is computed.

If jobu = 'N', U is not computed.

jobv Must be 'V' or 'N'.

If jobv = 'V', orthogonal/unitary matrix V is computed.

If jobv = 'N', V is not computed.

jobq Must be 'Q' or 'N'.

If jobq = 'Q', orthogonal/unitary matrix Q is computed.

If jobq = 'N', Q is not computed.

m The number of rows of the matrix A (m≥ 0).

p The number of rows of the matrix B (p≥ 0).

n The number of columns of the matrices A and B (n≥ 0).

a, b Arrays:
a (size at least max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout) contains the m-by-n matrix A.
b (size at least max(1, ldb*n) for column major layout and max(1, ldb*p)
for row major layout) contains the p-by-n matrix B.

lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.

ldb The leading dimension of b; at least max(1, p) for column major layout and
max(1, n) for row major layout.

tola, tolb tola and tolb are the thresholds to determine the effective numerical rank of
matrix B and a subblock of A. Generally, they are set to
tola = max(m, n)*||A||*MACHEPS,
tolb = max(p, n)*||B||*MACHEPS.
The size of tola and tolb may affect the size of backward errors of the
decomposition.

ldu The leading dimension of the output array u. ldu ≥ max(1, m) if jobu =
'U'; ldu ≥ 1 otherwise.

ldv The leading dimension of the output array v. ldv ≥ max(1, p) if jobv =
'V'; ldv ≥ 1 otherwise.

ldq The leading dimension of the output array q. ldq ≥ max(1, n) if jobq =
'Q'; ldq ≥ 1 otherwise.
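The thresholds tola and tolb described above can be computed before the call. The following C sketch is one possible reading of the formula, under two assumptions that the text does not pin down: ||A|| is taken as the matrix 1-norm (largest absolute column sum), and MACHEPS is taken as DBL_EPSILON from <float.h>. The helper names norm1_colmajor and gsvd_threshold are hypothetical, and the matrix is read in column-major order.

```c
#include <float.h>

/* Hypothetical helper: 1-norm (largest absolute column sum) of a
 * rows-by-cols matrix stored in column-major order with leading dimension ld. */
double norm1_colmajor(int rows, int cols, const double *a, int ld)
{
    double best = 0.0;
    for (int j = 0; j < cols; ++j) {
        double sum = 0.0;
        for (int i = 0; i < rows; ++i) {
            double x = a[i + (long)j * ld];
            sum += (x < 0.0) ? -x : x;     /* |a(i,j)| */
        }
        if (sum > best)
            best = sum;
    }
    return best;
}

/* Hypothetical helper: max(rows, n) * ||M||_1 * DBL_EPSILON for a rows-by-n
 * column-major matrix M; call it with (m, n, a, lda) for tola and
 * (p, n, b, ldb) for tolb. */
double gsvd_threshold(int rows, int n, const double *mat, int ld)
{
    int dim = rows > n ? rows : n;
    return (double)dim * norm1_colmajor(rows, n, mat, ld) * DBL_EPSILON;
}
```

With column-major input, tola = gsvd_threshold(m, n, a, lda) and tolb = gsvd_threshold(p, n, b, ldb) can then be passed to LAPACKE_dggsvp or LAPACKE_dggsvp3. Another norm estimate can be swapped in without changing the shape of the formula.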

Output Parameters

a Overwritten by the triangular (or trapezoidal) matrix described in the Description section.

b Overwritten by the triangular matrix described in the Description section.

k, l On exit, k and l specify the dimension of subblocks. The sum k + l is equal
to the effective numerical rank of $(A^H, B^H)^H$.


u, v, q Arrays:
If jobu = 'U', u (size max(1, ldu*m)) contains the orthogonal/unitary
matrix U.
If jobu = 'N', u is not referenced.

If jobv = 'V', v (size max(1, ldv*p)) contains the orthogonal/unitary
matrix V.
If jobv = 'N', v is not referenced.

If jobq = 'Q', q (size max(1, ldq*n)) contains the orthogonal/unitary
matrix Q.
If jobq = 'N', q is not referenced.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

?ggsvp3
Performs preprocessing for a generalized SVD.

Syntax
lapack_int LAPACKE_sggsvp3 (int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int p, lapack_int n, float * a, lapack_int lda, float * b,
lapack_int ldb, float tola, float tolb, lapack_int * k, lapack_int * l, float * u,
lapack_int ldu, float * v, lapack_int ldv, float * q, lapack_int ldq);
lapack_int LAPACKE_dggsvp3 (int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int p, lapack_int n, double * a, lapack_int lda, double * b,
lapack_int ldb, double tola, double tolb, lapack_int * k, lapack_int * l, double * u,
lapack_int ldu, double * v, lapack_int ldv, double * q, lapack_int ldq);
lapack_int LAPACKE_cggsvp3 (int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int p, lapack_int n, lapack_complex_float * a, lapack_int lda,
lapack_complex_float * b, lapack_int ldb, float tola, float tolb, lapack_int * k,
lapack_int * l, lapack_complex_float * u, lapack_int ldu, lapack_complex_float * v,
lapack_int ldv, lapack_complex_float * q, lapack_int ldq);
lapack_int LAPACKE_zggsvp3 (int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int p, lapack_int n, lapack_complex_double * a, lapack_int lda,
lapack_complex_double * b, lapack_int ldb, double tola, double tolb, lapack_int * k,
lapack_int * l, lapack_complex_double * u, lapack_int ldu, lapack_complex_double * v,
lapack_int ldv, lapack_complex_double * q, lapack_int ldq);

Include Files
• mkl.h
• mkl_lapack.h

Description
?ggsvp3 computes orthogonal or unitary matrices U, V, and Q such that, for real flavors,

$$
U^T A Q = \begin{pmatrix} 0 & A_{12} & A_{13} \\ 0 & 0 & A_{23} \\ 0 & 0 & 0 \end{pmatrix} \ \text{if } m-k-l \ge 0, \qquad
U^T A Q = \begin{pmatrix} 0 & A_{12} & A_{13} \\ 0 & 0 & A_{23} \end{pmatrix} \ \text{if } m-k-l < 0, \qquad
V^T B Q = \begin{pmatrix} 0 & 0 & B_{13} \\ 0 & 0 & 0 \end{pmatrix},
$$

and, for complex flavors, the same relations hold with U^H and V^H in place of U^T and V^T. In each matrix the column blocks have widths n-k-l, k, and l; the row blocks of U^T A Q have heights k, l, and m-k-l if m-k-l ≥ 0, or k and m-k if m-k-l < 0; the row blocks of V^T B Q have heights l and p-l.
Here the k-by-k matrix A12 and the l-by-l matrix B13 are nonsingular upper triangular; A23 is l-by-l upper
triangular if m-k-l ≥ 0, otherwise A23 is (m-k)-by-l upper trapezoidal. k + l is the effective numerical rank of
the (m + p)-by-n matrix $(A^T, B^T)^T$ for real flavors or $(A^H, B^H)^H$ for complex flavors.

This decomposition is the preprocessing step for computing the Generalized Singular Value Decomposition
(GSVD), see ?ggsvd3.
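The block sizes in the decomposition above follow directly from m, p, n, k, and l. The C sketch below computes the row-block heights of U^T A Q (U^H A Q) and V^T B Q together with the common column-block widths, which can help when locating A12, A13, A23, and B13 in the overwritten arrays a and b. The helper name ggsvp3_blocks is hypothetical, not an MKL function.

```c
/* Hypothetical helper: block partition of the ?ggsvp3 output.
 *   a_rows : heights of the row blocks of U^H*A*Q
 *            (k, l, m-k-l when m-k-l >= 0; k, m-k otherwise)
 *   b_rows : heights of the row blocks of V^H*B*Q (l, p-l)
 *   cols   : common column-block widths (n-k-l, k, l)
 * Returns the number of row blocks written to a_rows (3 or 2). */
int ggsvp3_blocks(int m, int p, int n, int k, int l,
                  int a_rows[3], int b_rows[2], int cols[3])
{
    cols[0] = n - k - l;
    cols[1] = k;
    cols[2] = l;
    b_rows[0] = l;
    b_rows[1] = p - l;
    a_rows[0] = k;
    if (m - k - l >= 0) {
        a_rows[1] = l;          /* A23 is l-by-l upper triangular */
        a_rows[2] = m - k - l;  /* trailing zero rows */
        return 3;
    }
    a_rows[1] = m - k;          /* A23 is (m-k)-by-l upper trapezoidal */
    return 2;
}
```

For example, m = 5, p = 4, n = 6, k = 1, l = 2 gives row blocks (1, 2, 2) for A, (2, 2) for B, and column blocks (3, 1, 2).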

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

jobu = 'U': Orthogonal/unitary matrix U is computed;

= 'N': U is not computed.

jobv = 'V': Orthogonal/unitary matrix V is computed;

= 'N': V is not computed.

jobq = 'Q': Orthogonal/unitary matrix Q is computed;

= 'N': Q is not computed.

m The number of rows of the matrix A.

m≥ 0.


p The number of rows of the matrix B.

p≥ 0.

n The number of columns of the matrices A and B.

n≥ 0.

a Array, size (lda*n).

On entry, the m-by-n matrix A.

lda The leading dimension of the array a.

lda≥ max(1,m).

b Array, size (ldb*n).

On entry, the p-by-n matrix B.

ldb The leading dimension of the array b.

ldb≥ max(1,p).

tola, tolb tola and tolb are the thresholds to determine the effective numerical rank
of matrix B and a subblock of A. Generally, they are set to
tola = max(m,n)*norm(a)*MACHEPS,
tolb = max(p,n)*norm(b)*MACHEPS.
The size of tola and tolb may affect the size of backward errors of the
decomposition.

ldu The leading dimension of the array u.

ldu≥ max(1,m) if jobu = 'U'; ldu≥ 1 otherwise.

ldv The leading dimension of the array v.

ldv≥ max(1,p) if jobv = 'V'; ldv≥ 1 otherwise.

ldq The leading dimension of the array q.

ldq≥ max(1,n) if jobq = 'Q'; ldq≥ 1 otherwise.

Output Parameters

a On exit, a contains the triangular (or trapezoidal) matrix described in the Description section.

b On exit, b contains the triangular matrix described in the Description section.

k, l On exit, k and l specify the dimension of the subblocks described in the Description section.
k + l = effective numerical rank of $(A^T, B^T)^T$ for real flavors or
$(A^H, B^H)^H$ for complex flavors.

u Array, size (ldu*m).

If jobu = 'U', u contains the orthogonal/unitary matrix U.

If jobu = 'N', u is not referenced.

v Array, size (ldv*p).

If jobv = 'V', v contains the orthogonal/unitary matrix V.

If jobv = 'N', v is not referenced.

q Array, size (ldq*n).

If jobq = 'Q', q contains the orthogonal/unitary matrix Q.

If jobq = 'N', q is not referenced.

Return Values
This function returns a value info.

= 0: successful exit.
< 0: if info = -i, the i-th argument had an illegal value.

Application Notes
The subroutine uses LAPACK subroutine ?geqp3 for the QR factorization with column pivoting to detect the
effective numerical rank of the A matrix. It may be replaced by a better rank determination strategy.
?ggsvp3 replaces the deprecated subroutine ?ggsvp.
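The rank decision mentioned above can be illustrated in isolation. Assuming that a QR factorization with column pivoting (such as the one ?geqp3 computes) has already produced the diagonal of R, the magnitudes of the diagonal entries are non-increasing, and a common rule is to count the entries whose magnitude exceeds the threshold (tola or tolb). The sketch below implements that counting rule. It illustrates the idea only and is not MKL's internal code; effective_rank is a hypothetical name.

```c
/* Hypothetical helper: effective numerical rank from the diagonal of R
 * produced by a QR factorization with column pivoting.
 *   rdiag : the len diagonal entries R(0,0), ..., R(len-1,len-1)
 *   tol   : rank threshold, for example tola or tolb
 * With column pivoting, |R(i,i)| does not increase with i, so counting
 * stops at the first entry that is not above the threshold. */
int effective_rank(const double *rdiag, int len, double tol)
{
    int rank = 0;
    while (rank < len) {
        double d = rdiag[rank];
        if ((d < 0.0 ? -d : d) <= tol)
            break;
        ++rank;
    }
    return rank;
}
```

A better rank determination strategy, as the note above allows, would replace only this decision step; the threshold itself would still come from tola or tolb.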

?ggsvd3
Computes generalized SVD.

Syntax
lapack_int LAPACKE_sggsvd3 (int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int n, lapack_int p, lapack_int * k, lapack_int * l, float * a,
lapack_int lda, float * b, lapack_int ldb, float * alpha, float * beta, float * u,
lapack_int ldu, float * v, lapack_int ldv, float * q, lapack_int ldq, lapack_int *
iwork);
lapack_int LAPACKE_dggsvd3 (int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int n, lapack_int p, lapack_int * k, lapack_int * l, double * a,
lapack_int lda, double * b, lapack_int ldb, double * alpha, double * beta, double * u,
lapack_int ldu, double * v, lapack_int ldv, double * q, lapack_int ldq, lapack_int *
iwork);
lapack_int LAPACKE_cggsvd3 (int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int n, lapack_int p, lapack_int * k, lapack_int * l,
lapack_complex_float * a, lapack_int lda, lapack_complex_float * b, lapack_int ldb,
float * alpha, float * beta, lapack_complex_float * u, lapack_int ldu,
lapack_complex_float * v, lapack_int ldv, lapack_complex_float * q, lapack_int ldq,
lapack_int * iwork);
lapack_int LAPACKE_zggsvd3 (int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int n, lapack_int p, lapack_int * k, lapack_int * l,
lapack_complex_double * a, lapack_int lda, lapack_complex_double * b, lapack_int ldb,
double * alpha, double * beta, lapack_complex_double * u, lapack_int ldu,
lapack_complex_double * v, lapack_int ldv, lapack_complex_double * q, lapack_int ldq,
lapack_int * iwork);


Include Files
• mkl.h

Description
?ggsvd3 computes the generalized singular value decomposition (GSVD) of an m-by-n real or complex matrix
A and a p-by-n real or complex matrix B:

$$U^T A Q = D_1 \begin{pmatrix} 0 & R \end{pmatrix}, \qquad V^T B Q = D_2 \begin{pmatrix} 0 & R \end{pmatrix} \qquad \text{for real flavors,}$$

or

$$U^H A Q = D_1 \begin{pmatrix} 0 & R \end{pmatrix}, \qquad V^H B Q = D_2 \begin{pmatrix} 0 & R \end{pmatrix} \qquad \text{for complex flavors,}$$

where U, V and Q are orthogonal/unitary matrices.
Let k + l be the effective numerical rank of the matrix $(A^T, B^T)^T$ for real flavors or $(A^H, B^H)^H$ for
complex flavors. Then R is a (k + l)-by-(k + l) nonsingular upper triangular matrix, and D1 and D2 are m-by-(k + l)
and p-by-(k + l) "diagonal" matrices with the following structures, respectively:
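If m - k - l ≥ 0,

$$
D_1 = \begin{pmatrix} I & 0 \\ 0 & C \\ 0 & 0 \end{pmatrix}, \qquad
D_2 = \begin{pmatrix} 0 & S \\ 0 & 0 \end{pmatrix}, \qquad
\begin{pmatrix} 0 & R \end{pmatrix} = \begin{pmatrix} 0 & R_{11} & R_{12} \\ 0 & 0 & R_{22} \end{pmatrix},
$$

where the column blocks of D1 and D2 have widths k and l, the row blocks of D1 have heights k, l, and m - k - l, the row blocks of D2 have heights l and p - l, and the column blocks of (0 R) have widths n - k - l, k, and l (its row blocks have heights k and l). Here

C = diag(alpha[k], ..., alpha[k + l - 1]),
S = diag(beta[k], ..., beta[k + l - 1]),
C^2 + S^2 = I.

If m - k - l < 0,

$$
D_1 = \begin{pmatrix} I & 0 & 0 \\ 0 & C & 0 \end{pmatrix}, \qquad
D_2 = \begin{pmatrix} 0 & S & 0 \\ 0 & 0 & I \\ 0 & 0 & 0 \end{pmatrix}, \qquad
\begin{pmatrix} 0 & R \end{pmatrix} = \begin{pmatrix} 0 & R_{11} & R_{12} & R_{13} \\ 0 & 0 & R_{22} & R_{23} \\ 0 & 0 & 0 & R_{33} \end{pmatrix},
$$

where the column blocks of D1 and D2 have widths k, m - k, and k + l - m, the row blocks of D1 have heights k and m - k, the row blocks of D2 have heights m - k, k + l - m, and p - l, and the column blocks of (0 R) have widths n - k - l, k, m - k, and k + l - m (its row blocks have heights k, m - k, and k + l - m). Here

C = diag(alpha[k], ..., alpha[m - 1]),
S = diag(beta[k], ..., beta[m - 1]),
C^2 + S^2 = I.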
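Because C^2 + S^2 = I, each computed pair satisfies alpha[i]^2 + beta[i]^2 = 1 for i < k + l: the first k pairs are (1, 0), the pairs in the C and S blocks have unit norm, and, when m - k - l < 0, so do the pairs alpha[i] = 0, beta[i] = 1 for m ≤ i < k + l; the remaining entries up to n - 1 are zero (see the alpha and beta output parameters). The C sketch below checks these properties on the arrays returned by ?ggsvd3, up to a caller-chosen tolerance. gsvd_pairs_consistent is a hypothetical helper, not part of MKL.

```c
#include <stdbool.h>

/* Hypothetical check on the ?ggsvd3 output arrays alpha and beta (length n):
 * alpha[i]^2 + beta[i]^2 must lie within tol of 1 for i < k + l
 * (C^2 + S^2 = I) and within tol of 0 for k + l <= i < n. */
bool gsvd_pairs_consistent(int n, int k, int l,
                           const double *alpha, const double *beta, double tol)
{
    for (int i = 0; i < n; ++i) {
        double a = alpha[i], b = beta[i];
        double dev = (i < k + l) ? a * a + b * b - 1.0   /* unit-norm pair */
                                 : a * a + b * b;        /* trailing zeros */
        if (dev < -tol || dev > tol)
            return false;
    }
    return true;
}
```

For instance, with k = 1, l = 2, m = 2, and n = 4, the arrays alpha = {1, 0.6, 0, 0} and beta = {0, 0.8, 1, 0} pass this check.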
The routine computes C, S, R, and, optionally, the orthogonal/unitary transformation matrices U, V and Q.
In particular, if B is an n-by-n nonsingular matrix, then the GSVD of A and B implicitly gives the SVD of
A*inv(B):

$$A B^{-1} = U \left(D_1 D_2^{-1}\right) V^T \ \text{for real flavors}, \qquad A B^{-1} = U \left(D_1 D_2^{-1}\right) V^H \ \text{for complex flavors.}$$
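The identity above is why the ratios alpha[i]/beta[i] are called generalized singular values: when B is square and nonsingular, they are the singular values of A*inv(B). The sketch below forms these ratios from the ?ggsvd3 output for the k + l meaningful entries and reports entries with beta[i] = 0 (such as the first k) as infinite. gsvd_values is a hypothetical helper, not an MKL routine.

```c
#include <math.h>   /* INFINITY */

/* Hypothetical helper: generalized singular values sigma[i] = alpha[i]/beta[i]
 * for 0 <= i < k + l, using the alpha and beta arrays returned by ?ggsvd3.
 * A zero beta[i] (for example, one of the first k entries) yields INFINITY. */
void gsvd_values(int k, int l, const double *alpha, const double *beta,
                 double *sigma)
{
    for (int i = 0; i < k + l; ++i)
        sigma[i] = (beta[i] == 0.0) ? INFINITY : alpha[i] / beta[i];
}
```

Only the macro INFINITY is taken from <math.h>; no math-library function is called.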
If $(A^T, B^T)^T$ for real flavors or $(A^H, B^H)^H$ for complex flavors has orthonormal columns, then the GSVD of A and
B is also equal to the CS decomposition of A and B. Furthermore, the GSVD can be used to derive the
solution of the eigenvalue problem

$$A^T A x = \lambda B^T B x \ \text{for real flavors}, \qquad A^H A x = \lambda B^H B x \ \text{for complex flavors.}$$

In some literature, the GSVD of A and B is presented in the form

$$U^T A X = \begin{pmatrix} 0 & D_1 \end{pmatrix}, \quad V^T B X = \begin{pmatrix} 0 & D_2 \end{pmatrix} \ \text{for real } (A, B), \qquad U^H A X = \begin{pmatrix} 0 & D_1 \end{pmatrix}, \quad V^H B X = \begin{pmatrix} 0 & D_2 \end{pmatrix} \ \text{for complex } (A, B),$$

where U and V are orthogonal/unitary, X is nonsingular, and D1 and D2 are "diagonal". The former GSVD form can be
converted to the latter form by taking the nonsingular matrix X as

$$X = Q \begin{pmatrix} I & 0 \\ 0 & R^{-1} \end{pmatrix}.$$

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

jobu = 'U': Orthogonal/unitary matrix U is computed;

= 'N': U is not computed.

jobv = 'V': Orthogonal/unitary matrix V is computed;

= 'N': V is not computed.

jobq = 'Q': Orthogonal/unitary matrix Q is computed;

= 'N': Q is not computed.

m The number of rows of the matrix A.

m≥ 0.

n The number of columns of the matrices A and B.

n≥ 0.

p The number of rows of the matrix B.

p≥ 0.

a Array, size (lda*n).


On entry, the m-by-n matrix A.

lda The leading dimension of the array a.

lda≥ max(1,m).

b Array, size (ldb*n).

On entry, the p-by-n matrix B.

ldb The leading dimension of the array b.

ldb≥ max(1,p).

ldu The leading dimension of the array u.

ldu≥ max(1,m) if jobu = 'U'; ldu≥ 1 otherwise.

ldv The leading dimension of the array v.

ldv≥ max(1,p) if jobv = 'V'; ldv≥ 1 otherwise.

ldq The leading dimension of the array q.

ldq≥ max(1,n) if jobq = 'Q'; ldq≥ 1 otherwise.

iwork Array, size (n).

Output Parameters

k, l On exit, k and l specify the dimension of the subblocks described in the Description section.
k + l = effective numerical rank of $(A^T, B^T)^T$ for real flavors or
$(A^H, B^H)^H$ for complex flavors.

a On exit, a contains the triangular matrix R, or part of R.

If m - k - l ≥ 0, R is stored in the elements of array a corresponding to A(1:k + l, n - k - l + 1:n).
If m - k - l < 0, $\begin{pmatrix} R_{11} & R_{12} & R_{13} \\ 0 & R_{22} & R_{23} \end{pmatrix}$ is stored in the elements of array a
corresponding to A(1:m, n - k - l + 1:n), and R33 is stored in the elements of array b corresponding to
B(m - k + 1:l, n + m - k - l + 1:n) on exit.

b On exit, b contains part of the triangular matrix R if m - k - l < 0.

See Description for details.

alpha Array, size (n)

beta Array, size (n)

On exit, alpha and beta contain the generalized singular value pairs
of a and b:

alpha[0:k - 1] = 1,
beta[0:k - 1] = 0,
and if m - k - l ≥ 0,
alpha[k:k + l - 1] = diag(C),
beta[k:k + l - 1] = diag(S),
or if m - k - l < 0,
alpha[k:m - 1] = diag(C), alpha[m:k + l - 1] = 0,
beta[k:m - 1] = diag(S), beta[m:k + l - 1] = 1,
and
alpha[k + l:n - 1] = 0,
beta[k + l:n - 1] = 0.

u Array, size (ldu*m).

If jobu = 'U', u contains the m-by-m orthogonal/unitary matrix U.

If jobu = 'N', u is not referenced.

v Array, size (ldv*p).

If jobv = 'V', v contains the p-by-p orthogonal/unitary matrix V.

If jobv = 'N', v is not referenced.

q Array, size (ldq*n).

If jobq = 'Q', q contains the n-by-n orthogonal/unitary matrix Q.

If jobq = 'N', q is not referenced.

iwork On exit, iwork stores the sorting information. More precisely, the
following loop uses iwork to sort alpha:

for (i = k; i < min(m, k + l); i++) {
    swap(alpha[i], alpha[iwork[i] - 1]);
}

such that alpha[0] ≥ alpha[1] ≥ ... ≥ alpha[n - 1].
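The sorting loop above is pseudocode. A runnable C version is sketched below. The helper name apply_gsvd_sort is hypothetical; the code assumes, as stated above, that iwork holds 1-based indices, and it uses int where the LAPACKE interface uses lapack_int.

```c
/* Hypothetical helper: applies the ?ggsvd3 sorting information in iwork
 * to alpha, reproducing the loop described for the iwork output.
 * iwork holds 1-based indices, hence the "- 1". */
void apply_gsvd_sort(int m, int k, int l, double *alpha, const int *iwork)
{
    int end = (m < k + l) ? m : k + l;   /* min(m, k + l) */
    for (int i = k; i < end; ++i) {
        int j = iwork[i] - 1;
        double tmp = alpha[i];           /* swap(alpha[i], alpha[j]) */
        alpha[i] = alpha[j];
        alpha[j] = tmp;
    }
}
```

With hypothetical data m = 3, k = 0, l = 3, alpha = {0.3, 0.9, 0.5}, and iwork = {2, 3, 3}, the loop swaps positions 0 and 1, then 1 and 2, and leaves alpha = {0.9, 0.5, 0.3}.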

Return Values
This function returns a value info.

= 0: successful exit.
< 0: if info = -i, the i-th argument had an illegal value.

> 0: if info = 1, the Jacobi-type procedure failed to converge.

For further details, see subroutine ?tgsja.

Application Notes
?ggsvd3 replaces the deprecated subroutine ?ggsvd.

?tgsja
Computes the generalized SVD of two upper triangular
or trapezoidal matrices.


Syntax
lapack_int LAPACKE_stgsja( int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int p, lapack_int n, lapack_int k, lapack_int l, float* a,
lapack_int lda, float* b, lapack_int ldb, float tola, float tolb, float* alpha, float*
beta, float* u, lapack_int ldu, float* v, lapack_int ldv, float* q, lapack_int ldq,
lapack_int* ncycle );
lapack_int LAPACKE_dtgsja( int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int p, lapack_int n, lapack_int k, lapack_int l, double* a,
lapack_int lda, double* b, lapack_int ldb, double tola, double tolb, double* alpha,
double* beta, double* u, lapack_int ldu, double* v, lapack_int ldv, double* q,
lapack_int ldq, lapack_int* ncycle );
lapack_int LAPACKE_ctgsja( int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int p, lapack_int n, lapack_int k, lapack_int l,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* b, lapack_int ldb, float
tola, float tolb, float* alpha, float* beta, lapack_complex_float* u, lapack_int ldu,
lapack_complex_float* v, lapack_int ldv, lapack_complex_float* q, lapack_int ldq,
lapack_int* ncycle );
lapack_int LAPACKE_ztgsja( int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int p, lapack_int n, lapack_int k, lapack_int l,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* b, lapack_int ldb,
double tola, double tolb, double* alpha, double* beta, lapack_complex_double* u,
lapack_int ldu, lapack_complex_double* v, lapack_int ldv, lapack_complex_double* q,
lapack_int ldq, lapack_int* ncycle );

Include Files
• mkl.h

Description

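The routine computes the generalized singular value decomposition (GSVD) of two real/complex upper
triangular (or trapezoidal) matrices A and B. On entry, it is assumed that matrices A and B have the following
forms, which may be obtained by the preprocessing subroutine ggsvp from a general m-by-n matrix A and p-
by-n matrix B:

$$
A = \begin{pmatrix} 0 & A_{12} & A_{13} \\ 0 & 0 & A_{23} \\ 0 & 0 & 0 \end{pmatrix} \ \text{if } m-k-l \ge 0, \qquad
A = \begin{pmatrix} 0 & A_{12} & A_{13} \\ 0 & 0 & A_{23} \end{pmatrix} \ \text{if } m-k-l < 0, \qquad
B = \begin{pmatrix} 0 & 0 & B_{13} \\ 0 & 0 & 0 \end{pmatrix},
$$

where the column blocks have widths n-k-l, k, and l; the row blocks of A have heights k, l, and m-k-l (or k and m-k if m-k-l < 0); and the row blocks of B have heights l and p-l. The k-by-k matrix A12 and l-by-l matrix B13 are nonsingular upper triangular; A23 is l-by-l upper
triangular if m-k-l ≥ 0, otherwise A23 is (m-k)-by-l upper trapezoidal.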

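On exit,

$$U^H A Q = D_1 \begin{pmatrix} 0 & R \end{pmatrix}, \qquad V^H B Q = D_2 \begin{pmatrix} 0 & R \end{pmatrix}$$

(U^T and V^T for real flavors), where U, V and Q are orthogonal/unitary matrices, R is a nonsingular upper triangular matrix, and D1 and D2
are "diagonal" matrices with the following structures.
If m-k-l ≥ 0,

$$
D_1 = \begin{pmatrix} I & 0 \\ 0 & C \\ 0 & 0 \end{pmatrix}, \qquad
D_2 = \begin{pmatrix} 0 & S \\ 0 & 0 \end{pmatrix}, \qquad
\begin{pmatrix} 0 & R \end{pmatrix} = \begin{pmatrix} 0 & R_{11} & R_{12} \\ 0 & 0 & R_{22} \end{pmatrix},
$$

where the row blocks of D1 have heights k, l, and m-k-l, the row blocks of D2 have heights l and p-l, the column blocks of D1 and D2 have widths k and l, and the column blocks of (0 R) have widths n-k-l, k, and l, and
C = diag(alpha[k], ..., alpha[k+l-1]),
S = diag(beta[k], ..., beta[k+l-1]),
C^2 + S^2 = I.
R is stored in a(1:k+l, n-k-l+1:n) on exit.
If m-k-l < 0,

$$
D_1 = \begin{pmatrix} I & 0 & 0 \\ 0 & C & 0 \end{pmatrix}, \qquad
D_2 = \begin{pmatrix} 0 & S & 0 \\ 0 & 0 & I \\ 0 & 0 & 0 \end{pmatrix}, \qquad
\begin{pmatrix} 0 & R \end{pmatrix} = \begin{pmatrix} 0 & R_{11} & R_{12} & R_{13} \\ 0 & 0 & R_{22} & R_{23} \\ 0 & 0 & 0 & R_{33} \end{pmatrix},
$$

where the row blocks of D1 have heights k and m-k, the row blocks of D2 have heights m-k, k+l-m, and p-l, the column blocks of D1 and D2 have widths k, m-k, and k+l-m, and the column blocks of (0 R) have widths n-k-l, k, m-k, and k+l-m, and
C = diag(alpha[k], ..., alpha[m-1]),
S = diag(beta[k], ..., beta[m-1]),
C^2 + S^2 = I.

On exit, $\begin{pmatrix} R_{11} & R_{12} & R_{13} \\ 0 & R_{22} & R_{23} \end{pmatrix}$ is stored in a(1:m, n-k-l+1:n), and R33 is stored
in b(m-k+1:l, n+m-k-l+1:n).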
The computation of the orthogonal/unitary transformation matrices U, V or Q is optional. These matrices may
either be formed explicitly, or they may be postmultiplied into input matrices U1, V1, or Q1.
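The storage description above can be turned into an accessor. The following C sketch returns entry R(i, j) of the (k + l)-by-(k + l) matrix R after ?tgsja (or ?ggsvd3, which documents the same storage) returns, covering both the m - k - l ≥ 0 case and the case in which R33 is kept in b. It assumes column-major storage and uses 1-based i and j to match the a(1:k+l, n-k-l+1:n) notation; gsvd_r_entry is a hypothetical name.

```c
/* Hypothetical accessor: entry R(i, j), 1 <= i, j <= k + l, of the upper
 * triangular matrix R left in a (and, if m - k - l < 0, partly in b) on exit.
 * Column-major storage; i and j are 1-based to match the documentation.
 *   rows 1..min(m, k+l) of R : a(i, n-k-l+j)
 *   rows m+1..k+l of R       : b(i-k, n-k-l+j)   (the R33 block)           */
double gsvd_r_entry(int m, int n, int k, int l,
                    const double *a, int lda, const double *b, int ldb,
                    int i, int j)
{
    if (j < i)
        return 0.0;                       /* strictly lower triangle of R */
    int col = n - k - l + j;              /* 1-based column in a and b */
    if (i <= m)
        return a[(i - 1) + (long)(col - 1) * lda];
    return b[(i - k - 1) + (long)(col - 1) * ldb];
}
```

For i = m + 1 the row index in b is m - k + 1, and for j > m the column index is at least n + m - k - l + 1, which matches the b(m-k+1:l, n+m-k-l+1:n) range given above.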

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

jobu Must be 'U', 'I', or 'N'.

If jobu = 'U', u must contain an orthogonal/unitary matrix U1 on entry.

If jobu = 'I', u is initialized to the unit matrix.

If jobu = 'N', u is not computed.

jobv Must be 'V', 'I', or 'N'.

If jobv = 'V', v must contain an orthogonal/unitary matrix V1 on entry.

If jobv = 'I', v is initialized to the unit matrix.

If jobv = 'N', v is not computed.


jobq Must be 'Q', 'I', or 'N'.

If jobq = 'Q', q must contain an orthogonal/unitary matrix Q1 on entry.

If jobq = 'I', q is initialized to the unit matrix.

If jobq = 'N', q is not computed.

m The number of rows of the matrix A (m≥ 0).

p The number of rows of the matrix B (p≥ 0).

n The number of columns of the matrices A and B (n≥ 0).

k, l Specify the subblocks in the input matrices A and B whose GSVD is computed.

a, b, u, v, q Arrays:
a (size at least max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout) contains the m-by-n matrix A.
b (size at least max(1, ldb*n) for column major layout and max(1, ldb*p)
for row major layout) contains the p-by-n matrix B.
If jobu = 'U', u (size max(1, ldu*m)) must contain a matrix U1 (usually
the orthogonal/unitary matrix returned by ?ggsvp).

If jobv = 'V', v (size at least max(1, ldv*p)) must contain a matrix V1
(usually the orthogonal/unitary matrix returned by ?ggsvp).

If jobq = 'Q', q (size at least max(1, ldq*n)) must contain a matrix Q1
(usually the orthogonal/unitary matrix returned by ?ggsvp).

lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.

ldb The leading dimension of b; at least max(1, p) for column major layout and
max(1, n) for row major layout.

ldb The leading dimension of b; at least max(1, p) for column major layout and
max(1, n) for row major layout.

ldu The leading dimension of the array u.

ldu ≥ max(1, m) if jobu = 'U'; ldu ≥ 1 otherwise.

ldv The leading dimension of the array v.

ldv ≥ max(1, p) if jobv = 'V'; ldv ≥ 1 otherwise.

ldq The leading dimension of the array q.

ldq ≥ max(1, n) if jobq = 'Q'; ldq ≥ 1 otherwise.

tola, tolb tola and tolb are the convergence criteria for the Jacobi-Kogbetliantz
iteration procedure. Generally, they are the same as used in ?ggsvp:

tola = max(m, n)*|A|*MACHEPS,

tolb = max(p, n)*|B|*MACHEPS.

Output Parameters

a On exit, a contains the triangular matrix R or part of R; see the Description section for the
elements of a in which R is stored.

b On exit, if necessary, b(m-k+1:l, n+m-k-l+1:n) contains a part of R.

alpha, beta Arrays, size at least max(1, n). Contain the generalized singular value pairs
of A and B:
alpha[0:k-1] = 1,
beta[0:k-1] = 0,
and if m-k-l ≥ 0,
alpha[k:k+l-1] = diag(C),
beta[k:k+l-1] = diag(S),
or if m-k-l < 0,
alpha[k:m-1] = diag(C), alpha[m:k+l-1] = 0,
beta[k:m-1] = diag(S),
beta[m:k+l-1] = 1.
Furthermore, if k+l < n,
alpha[k+l:n-1] = 0 and
beta[k+l:n-1] = 0.

u If jobu = 'I', u contains the orthogonal/unitary matrix U.

If jobu = 'U', u contains the product U1*U.

If jobu = 'N', u is not referenced.

v If jobv = 'I', v contains the orthogonal/unitary matrix V.

If jobv = 'V', v contains the product V1*V.

If jobv = 'N', v is not referenced.

q If jobq = 'I', q contains the orthogonal/unitary matrix Q.

If jobq = 'Q', q contains the product Q1*Q.

If jobq = 'N', q is not referenced.

ncycle The number of cycles required for convergence.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = 1, the procedure does not converge after MAXIT cycles.

Cosine-Sine Decomposition: LAPACK Computational Routines

This topic describes LAPACK computational routines for computing the cosine-sine decomposition (CS
decomposition) of a partitioned unitary/orthogonal matrix. The algorithm computes a complete 2-by-2 CS
decomposition, which requires simultaneous diagonalization of all four blocks of a unitary/orthogonal
matrix partitioned into a 2-by-2 block structure.
The computation has the following phases:


1. The matrix is reduced to a bidiagonal block form.

2. The blocks are simultaneously diagonalized using techniques from the bidiagonal SVD algorithms.
Table "Computational Routines for Cosine-Sine Decomposition (CSD)" lists LAPACK routines that perform CS
decomposition of matrices.
Computational Routines for Cosine-Sine Decomposition (CSD)
Operation                                                                     Real matrices   Complex matrices
Compute the CS decomposition of an orthogonal/unitary matrix in
bidiagonal-block form                                                         bbcsd/bbcsd     bbcsd/bbcsd
Simultaneously bidiagonalize the blocks of a partitioned orthogonal matrix    orbdb           unbdb
Simultaneously bidiagonalize the blocks of a partitioned unitary matrix       orbdb           unbdb

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

jobu1 If jobu1 = 'Y', u1 is updated. Otherwise, u1 is not updated.

jobu2 If jobu2 = 'Y', u2 is updated. Otherwise, u2 is not updated.

jobv1t If jobv1t = 'Y', v1t is updated. Otherwise, v1t is not updated.

jobv2t If jobv2t = 'Y', v2t is updated. Otherwise, v2t is not updated.

trans = 'T': x, u1, u2, v1t, v2t are stored in row-major order.

otherwise x, u1, u2, v1t, v2t are stored in column-major order.

m The number of rows and columns of the orthogonal/unitary matrix X in bidiagonal-block form.

p The number of rows in the top-left block of x. 0 ≤ p ≤ m.

q The number of columns in the top-left block of x. 0 ≤ q ≤ min(p, m-p, m-q).

theta Array, size q.

On entry, the angles theta[0], ..., theta[q - 1] that, along with
phi[0], ..., phi[q - 2], define the matrix in bidiagonal-block form as
returned by orbdb/unbdb.

phi Array, size q-1.

The angles phi[0], ..., phi[q - 2] that, along with theta[0], ...,
theta[q - 1], define the matrix in bidiagonal-block form as returned by
orbdb/unbdb.

u1 Array, size at least max(1, ldu1*p).

On entry, a p-by-p matrix.

ldu1 The leading dimension of the array u1, ldu1 ≥ max(1, p).

u2 Array, size max(1, ldu2*(m-p)).

On entry, an (m-p)-by-(m-p) matrix.

ldu2 The leading dimension of the array u2, ldu2 ≥ max(1, m-p).

v1t Array, size max(1, ldv1t*q).

On entry, a q-by-q matrix.

ldv1t The leading dimension of the array v1t, ldv1t ≥ max(1, q).

v2t Array, size max(1, ldv2t*(m-q)).

On entry, an (m-q)-by-(m-q) matrix.

ldv2t The leading dimension of the array v2t, ldv2t ≥ max(1, m-q).
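The dimension and leading-dimension constraints above can be checked before calling ?bbcsd. The sketch below tests 0 ≤ p ≤ m, 0 ≤ q ≤ min(p, m - p, m - q), and the lower bounds ldu1 ≥ max(1, p), ldu2 ≥ max(1, m - p), ldv1t ≥ max(1, q), and ldv2t ≥ max(1, m - q). bbcsd_dims_ok is a hypothetical helper, not part of MKL, and it does not inspect the job flags.

```c
#include <stdbool.h>

static int max1(int x) { return x > 1 ? x : 1; }   /* max(1, x) */

/* Hypothetical pre-call check for the ?bbcsd dimension arguments. */
bool bbcsd_dims_ok(int m, int p, int q,
                   int ldu1, int ldu2, int ldv1t, int ldv2t)
{
    if (p < 0 || p > m)
        return false;
    int qmax = p;                          /* min(p, m - p, m - q) */
    if (m - p < qmax) qmax = m - p;
    if (m - q < qmax) qmax = m - q;
    if (q < 0 || q > qmax)
        return false;
    return ldu1 >= max1(p) && ldu2 >= max1(m - p) &&
           ldv1t >= max1(q) && ldv2t >= max1(m - q);
}
```

For example, m = 4, p = 2, q = 2 with all leading dimensions equal to 2 passes, while p = 3 with the same q fails because q would exceed m - p = 1.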

Output Parameters

theta On exit, the angles whose cosines and sines define the diagonal blocks in
the CS decomposition.

u1 On exit, u1 is postmultiplied by the left singular vector matrix common to
[ b11 ; 0 ] and [ b12 0 0 ; 0 -I 0 ].

u2 On exit, u2 is postmultiplied by the left singular vector matrix common to
[ b21 ; 0 ] and [ b22 0 0 ; 0 0 I ].

v1t On exit, v1t is premultiplied by the transpose of the right singular vector
matrix common to [ b11 ; 0 ] and [ b21 ; 0 ].

v2t On exit, v2t is premultiplied by the transpose of the right singular vector
matrix common to [ b12 0 0 ; 0 -I 0 ] and [ b22 0 0 ; 0 0 I ].

b11d Array, size q.

When ?bbcsd converges, b11d contains the cosines of theta[0], ...,
theta[q - 1]. If ?bbcsd fails to converge, b11d contains the diagonal of
the partially reduced top left block.

b11e Array, size q-1.

When ?bbcsd converges, b11e contains zeros. If ?bbcsd fails to converge,
b11e contains the superdiagonal of the partially reduced top left block.

b12d Array, size q.

When ?bbcsd converges, b12d contains the negative sines of
theta[0], ..., theta[q - 1]. If ?bbcsd fails to converge, b12d contains
the diagonal of the partially reduced top right block.

b12e Array, size q-1.

When ?bbcsd converges, b12e contains zeros. If ?bbcsd fails to converge,
b12e contains the superdiagonal of the partially reduced top right block.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info > 0 and if ?bbcsd did not converge, info specifies the number of nonzero entries in phi, and b11d,
b11e, etc. contain the partially reduced matrix.

p1, p2, q1, and q2 are represented as products of elementary reflectors.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

trans = 'T': x, u1, u2, v1t, v2t are stored in row-major order.

otherwise x, u1, u2, v1t, v2t are stored in column-major order.

signs = 'O': The lower-left block is made nonpositive (the "other" convention).
otherwise The upper-right block is made nonpositive (the "default" convention).

m The number of rows and columns of the matrix X.

p The number of rows in x11 and x12. 0 ≤ p ≤ m.

q The number of columns in x11 and x21. 0 ≤ q ≤ min(p, m-p, m-q).

x11 Array, size (size max(1, ldx11*q) for column major layout and max(1,
ldx11*p) for row major layout) .
On entry, the top-left block of the orthogonal/unitary matrix to be reduced.

ldx11 The leading dimension of the array X11. If trans = 'T', ldx11≥p for column
major layout and ldx11≥q for row major layout. Otherwise, ldx11≥q.

x12 Array, size (size max(1, ldx12*(m-q)) for column major layout and max(1,
ldx12*p) for row major layout).
On entry, the top-right block of the orthogonal/unitary matrix to be
reduced.

ldx12 The leading dimension of the array X12. If trans = 'N', ldx12≥p for column
major layout and ldx12≥m - q for row major layout. . Otherwise,
ldx12≥m-q.

x21 Array, size max(1, ldx21*q) for column major layout and max(1, ldx21*(m-p)) for row major layout.
On entry, the bottom-left block of the orthogonal/unitary matrix to be
reduced.

ldx21 The leading dimension of the array X21. If trans = 'N', ldx21 ≥ m-p for
column major layout and ldx21 ≥ q for row major layout. Otherwise, ldx21 ≥ q.

x22 Array, size max(1, ldx22*(m-q)) for column major layout and max(1, ldx22*(m-p)) for row major layout.
On entry, the bottom-right block of the orthogonal/unitary matrix to be
reduced.

ldx22 The leading dimension of the array X22. If trans = 'N', ldx22 ≥ m-p for
column major layout and ldx22 ≥ m-q for row major layout. Otherwise, ldx22 ≥ m-q.
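The dimension constraints above fix the shape of each block. The sketch below is illustrative only: `csd_check_dims`, `csd_blocks`, and `block_dims` are hypothetical names, not Intel MKL functions, and plain `long` stands in for `lapack_int`. It checks 0 ≤ p ≤ m and 0 ≤ q ≤ min(p, m-p, m-q) and reports the block shapes implied by the array sizes listed above (x11 is p-by-q, x12 is p-by-(m-q), x21 is (m-p)-by-q, x22 is (m-p)-by-(m-q)).

```c
#include <stdbool.h>

/* Logical shape (rows x columns) of one block of the m-by-m matrix X. */
typedef struct {
    long rows;
    long cols;
} block_dims;

/* Shapes of the four blocks x11, x12, x21, x22. */
typedef struct {
    block_dims x11, x12, x21, x22;
} csd_blocks;

static long min3(long a, long b, long c) {
    long r = a < b ? a : b;
    return r < c ? r : c;
}

/* Hypothetical helper: returns false if 0 <= p <= m or
   0 <= q <= min(p, m-p, m-q) is violated; otherwise fills *out
   with the block shapes implied by the documented array sizes. */
bool csd_check_dims(long m, long p, long q, csd_blocks *out) {
    if (p < 0 || p > m)
        return false;
    if (q < 0 || q > min3(p, m - p, m - q))
        return false;
    out->x11 = (block_dims){ p,     q     };
    out->x12 = (block_dims){ p,     m - q };
    out->x21 = (block_dims){ m - p, q     };
    out->x22 = (block_dims){ m - p, m - q };
    return true;
}
```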

Output Parameters

x11 On exit, the form depends on trans:
If trans = 'N', the columns of the lower triangle of x11 specify reflectors for p1, and the rows of the upper triangle of x11(1:q - 1, q:q - 1) specify reflectors for q1.
Otherwise (trans = 'T'), the rows of the upper triangle of x11 specify reflectors for p1, and the columns of the lower triangle of x11(1:q - 1, q:q - 1) specify reflectors for q1.

x12 On exit, the form depends on trans:
If trans = 'N', the rows of the upper triangle of x12 specify the first p reflectors for q2.
Otherwise (trans = 'T'), the columns of the lower triangle of x12 specify the first p reflectors for q2.

x21 On exit, the form depends on trans:
If trans = 'N', the columns of the lower triangle of x21 specify the reflectors for p2.
Otherwise (trans = 'T'), the rows of the upper triangle of x21 specify the reflectors for p2.

x22 On exit, the form depends on trans:
If trans = 'N', the rows of the upper triangle of x22(q+1:m-p, p+1:m-q) specify the last m-p-q reflectors for q2.
Otherwise (trans = 'T'), the columns of the lower triangle of x22(p+1:m-q, q+1:m-p) specify the last m-p-q reflectors for q2.

theta Array, size q. The entries of bidiagonal blocks b11, b12, b21, and b22 can be
computed from the angles theta and phi. See the Description section for
details.

phi Array, size q-1. The entries of bidiagonal blocks b11, b12, b21, and b22 can
be computed from the angles theta and phi. See the Description section
for details.

taup1 Array, size p.

Scalar factors of the elementary reflectors that define p1.
taup2 Array, size m-p.
Scalar factors of the elementary reflectors that define p2.
tauq1 Array, size q.
Scalar factors of the elementary reflectors that define q1.
tauq2 Array, size m-q.
Scalar factors of the elementary reflectors that define q2.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

See Also
?orcsd/?uncsd
?orgqr
?ungqr
?orglq
?unglq
xerbla

LAPACK Least Squares and Eigenvalue Problem Driver Routines

Each of the LAPACK driver routines solves a complete problem. To arrive at the solution, driver routines
typically call a sequence of appropriate computational routines.
Driver routines are described in the following topics:
Linear Least Squares (LLS) Problems

Generalized LLS Problems
Symmetric Eigenproblems
Nonsymmetric Eigenproblems
Singular Value Decomposition
Cosine-Sine Decomposition
Generalized Symmetric Definite Eigenproblems
Generalized Nonsymmetric Eigenproblems

Linear Least Squares (LLS) Problems: LAPACK Driver Routines

This topic describes LAPACK driver routines used for solving linear least squares problems. Table "Driver
Routines for Solving LLS Problems" lists all such routines.
Driver Routines for Solving LLS Problems
Routine Name Operation performed

gels Uses QR or LQ factorization to solve an overdetermined or underdetermined linear system with a full-rank matrix.

gelsy Computes the minimum-norm solution to a linear least squares problem using a
complete orthogonal factorization of A.

gelss Computes the minimum-norm solution to a linear least squares problem using the
singular value decomposition of A.

gelsd Computes the minimum-norm solution to a linear least squares problem using the
singular value decomposition of A and a divide and conquer method.

?gels
Uses QR or LQ factorization to solve an overdetermined
or underdetermined linear system with a full-rank
matrix.

Syntax
lapack_int LAPACKE_sgels (int matrix_layout, char trans, lapack_int m, lapack_int n,
lapack_int nrhs, float* a, lapack_int lda, float* b, lapack_int ldb);
lapack_int LAPACKE_dgels (int matrix_layout, char trans, lapack_int m, lapack_int n,
lapack_int nrhs, double* a, lapack_int lda, double* b, lapack_int ldb);
lapack_int LAPACKE_cgels (int matrix_layout, char trans, lapack_int m, lapack_int n,
lapack_int nrhs, lapack_complex_float* a, lapack_int lda, lapack_complex_float* b,
lapack_int ldb);
lapack_int LAPACKE_zgels (int matrix_layout, char trans, lapack_int m, lapack_int n,
lapack_int nrhs, lapack_complex_double* a, lapack_int lda, lapack_complex_double* b,
lapack_int ldb);

Include Files
• mkl.h

Description

The routine solves overdetermined or underdetermined real/complex linear systems involving an m-by-n
matrix A, or its transpose/conjugate-transpose, using a QR or LQ factorization of A. It is assumed that A has
full rank.
The following options are provided:
1. If trans = 'N' and m ≥ n: find the least squares solution of an overdetermined system, that is, solve the
least squares problem
$$\min_x \|b - A x\|_2$$
2. If trans = 'N' and m < n: find the minimum norm solution of an underdetermined system A*X = B.

3. If trans = 'T' or 'C' and m ≥ n: find the minimum norm solution of an underdetermined system $A^H X = B$ ($A^T X = B$ for real flavors).

4. If trans = 'T' or 'C' and m < n: find the least squares solution of an overdetermined system, that is,
solve the least squares problem
$$\min_x \|b - A^H x\|_2$$
(with $A^T$ in place of $A^H$ for real flavors).
Several right hand side vectors b and solution vectors x can be handled in a single call; they are formed by
the columns of the right hand side matrix B and the solution matrix X (when the coefficient matrix is A, B is
m-by-nrhs and X is n-by-nrhs; if the coefficient matrix is $A^T$ or $A^H$, B is n-by-nrhs and X is m-by-nrhs).
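The four cases can be summarized in a small classifier, which may help when sizing B before the call. This is a sketch with hypothetical names (`gels_classify`, `gels_kind`); it only restates the case analysis above and does not call the library.

```c
#include <stdbool.h>

/* Which kind of problem a ?gels call solves. */
typedef enum {
    GELS_LEAST_SQUARES,  /* overdetermined: minimize the 2-norm of the residual */
    GELS_MIN_NORM        /* underdetermined: minimum-norm exact solution */
} gels_kind;

/* Hypothetical helper mirroring options 1-4. On return, *b_rows is the
   row count of the right-hand side B and *x_rows the row count of the
   solution X (n and m for trans = 'N'; m and n swapped otherwise). */
gels_kind gels_classify(char trans, long m, long n, long *b_rows, long *x_rows) {
    bool no_trans = (trans == 'N' || trans == 'n');
    *b_rows = no_trans ? m : n;
    *x_rows = no_trans ? n : m;
    if (no_trans)
        return m >= n ? GELS_LEAST_SQUARES : GELS_MIN_NORM;   /* options 1, 2 */
    return m >= n ? GELS_MIN_NORM : GELS_LEAST_SQUARES;       /* options 3, 4 */
}
```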

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

trans Must be 'N', 'T', or 'C'.

If trans = 'N', the linear system involves matrix A;

If trans = 'T', the linear system involves the transposed matrix AT (for
real flavors only);
If trans = 'C', the linear system involves the conjugate-transposed
matrix AH (for complex flavors only).

m The number of rows of the matrix A (m≥ 0).

n The number of columns of the matrix A (n ≥ 0).

nrhs The number of right-hand sides; the number of columns in B (nrhs≥ 0).

a, b Arrays:
a(size max(1, lda*n) for column major layout and max(1, lda*m) for row
major layout) contains the m-by-n matrix A.
b(size max(1, ldb*nrhs) for column major layout and max(1, ldb*max(m,
n)) for row major layout) contains the matrix B of right hand side vectors.

lda The leading dimension of a; at least max(1, m) for column major layout and
at least max(1, n) for row major layout.

ldb The leading dimension of b; must be at least max(1, m, n) for column major layout and at least max(1, nrhs) for row major layout, regardless of the value of trans.
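A common pitfall is sizing b for the input right-hand sides only, although b must also hold the solution. The hypothetical helper below (`gels_min_sizes` is not an MKL function) computes minimum leading dimensions and array lengths from the size rules above. It assumes the conservative rule ldb ≥ max(1, m, n) for column major layout, which is valid for every value of trans, and uses an int flag instead of the LAPACK_ROW_MAJOR/LAPACK_COL_MAJOR constants so that it compiles without mkl.h.

```c
/* Smallest valid leading dimensions and array lengths (in elements). */
typedef struct {
    long lda, ldb;
    long a_len, b_len;
} gels_sizes;

static long max_l(long x, long y) { return x > y ? x : y; }

gels_sizes gels_min_sizes(int row_major, long m, long n, long nrhs) {
    gels_sizes s;
    long mn = max_l(m, n);  /* B holds the input rows and the solution rows */
    if (row_major) {
        s.lda = max_l(1, n);
        s.ldb = max_l(1, nrhs);
        s.a_len = max_l(1, s.lda * m);
        s.b_len = max_l(1, s.ldb * mn);
    } else {
        s.lda = max_l(1, m);
        s.ldb = max_l(1, mn);
        s.a_len = max_l(1, s.lda * n);
        s.b_len = max_l(1, s.ldb * nrhs);
    }
    return s;
}
```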

Output Parameters

a On exit, overwritten by the factorization data as follows:

if m≥n, array a contains the details of the QR factorization of the matrix A as
returned by ?geqrf;

if m < n, array a contains the details of the LQ factorization of the matrix A as returned by ?gelqf.

b If info = 0, b is overwritten by the solution vectors, stored columnwise:

if trans = 'N' and m≥n, rows 1 to n of b contain the least squares solution
vectors; the residual sum of squares for the solution in each column is
given by the sum of squares of modulus of elements n+1 to m in that
column;
if trans = 'N' and m < n, rows 1 to n of b contain the minimum norm
solution vectors;
if trans = 'T' or 'C' and m≥n, rows 1 to m of b contain the minimum
norm solution vectors;
if trans = 'T' or 'C' and m < n, rows 1 to m of b contain the least
squares solution vectors; the residual sum of squares for the solution in
each column is given by the sum of squares of modulus of elements m+1 to
n in that column.
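For the trans = 'N', m ≥ n case, the residual sum of squares can be read directly from b after a successful call. The following is a minimal sketch for real data in column major layout, using the hypothetical name `gels_residual_ss`. Note the shift from the 1-based element numbers in the text to 0-based C indices.

```c
/* Hypothetical helper: residual sum of squares for column j of B after a
   successful real ?gels call with trans = 'N' and m >= n, column major.
   Elements n+1..m (1-based) of the column are rows n..m-1 (0-based). */
double gels_residual_ss(const double *b, long ldb, long m, long n, long j) {
    double ss = 0.0;
    for (long i = n; i < m; ++i) {
        double r = b[i + j * ldb];  /* column-major element (i, j) */
        ss += r * r;
    }
    return ss;
}
```

For row major layout, element (i, j) would be b[i*ldb + j] instead. For complex flavors, the squared modulus replaces r*r.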

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, the i-th diagonal element of the triangular factor of A is zero, so that A does not have full rank;
the least squares solution could not be computed.

?gelsy
Computes the minimum-norm solution to a linear least
squares problem using a complete orthogonal
factorization of A.

Syntax
lapack_int LAPACKE_sgelsy( int matrix_layout, lapack_int m, lapack_int n, lapack_int
nrhs, float* a, lapack_int lda, float* b, lapack_int ldb, lapack_int* jpvt, float rcond,
lapack_int* rank );
lapack_int LAPACKE_dgelsy( int matrix_layout, lapack_int m, lapack_int n, lapack_int
nrhs, double* a, lapack_int lda, double* b, lapack_int ldb, lapack_int* jpvt, double
rcond, lapack_int* rank );
lapack_int LAPACKE_cgelsy( int matrix_layout, lapack_int m, lapack_int n, lapack_int
nrhs, lapack_complex_float* a, lapack_int lda, lapack_complex_float* b, lapack_int ldb,
lapack_int* jpvt, float rcond, lapack_int* rank );
lapack_int LAPACKE_zgelsy( int matrix_layout, lapack_int m, lapack_int n, lapack_int
nrhs, lapack_complex_double* a, lapack_int lda, lapack_complex_double* b, lapack_int
ldb, lapack_int* jpvt, double rcond, lapack_int* rank );

Include Files
• mkl.h

Description

The ?gelsy routine computes the minimum-norm solution to a real/complex linear least squares problem:

$$\min_x \|b - A x\|_2$$

using a complete orthogonal factorization of A. A is an m-by-n matrix which may be rank-deficient. Several
right hand side vectors b and solution vectors x can be handled in a single call; they are stored as the
columns of the m-by-nrhs right hand side matrix B and the n-by-nrhs solution matrix X.
The routine first computes a QR factorization with column pivoting:

$$A P = Q \begin{pmatrix} R_{11} & R_{12} \\ 0 & R_{22} \end{pmatrix}$$

with R11 defined as the largest leading submatrix whose estimated condition number is less than 1/rcond.
The order of R11, rank, is the effective rank of A. Then, R22 is considered to be negligible, and R12 is
annihilated by orthogonal/unitary transformations from the right, arriving at the complete orthogonal
factorization:

$$A P = Q \begin{pmatrix} T_{11} & 0 \\ 0 & 0 \end{pmatrix} Z$$

The minimum-norm solution is then

$$X = P Z^T \begin{pmatrix} T_{11}^{-1} Q_1^T B \\ 0 \end{pmatrix}$$

for real flavors and

$$X = P Z^H \begin{pmatrix} T_{11}^{-1} Q_1^H B \\ 0 \end{pmatrix}$$

for complex flavors, where Q1 consists of the first rank columns of Q.
The ?gelsy routine is identical to the original deprecated ?gelsx routine except for the following
differences:

• The call to the subroutine ?geqpf has been replaced by a call to the subroutine ?geqp3, which is a
BLAS-3 version of the QR factorization with column pivoting.
• The matrix B (the right hand side) is updated with BLAS-3.
• The permutation of the matrix B (the right hand side) is faster and simpler.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

m The number of rows of the matrix A (m≥ 0).

n The number of columns of the matrix A (n ≥ 0).

nrhs The number of right-hand sides; the number of columns in B (nrhs≥ 0).

lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.

ldb The leading dimension of b; must be at least max(1, m, n) for column major layout and at least max(1, nrhs) for row major layout.

jpvt Array, size at least max(1, n).

On entry, if jpvt[i - 1] ≠ 0, the i-th column of A is permuted to the front of
AP; otherwise, the i-th column of A is a free column.

rcond rcond is used to determine the effective rank of A, which is defined as the
order of the largest leading triangular submatrix R11 in the QR factorization
with pivoting of A, whose estimated condition number < 1/rcond.

Output Parameters

a On exit, overwritten by the details of the complete orthogonal factorization of A.

b Overwritten by the n-by-nrhs solution matrix X.

jpvt On exit, if jpvt[i - 1] = k, then the i-th column of AP was the k-th column of
A.

rank The effective rank of A, that is, the order of the submatrix R11. This is the
same as the order of the submatrix T11 in the complete orthogonal
factorization of A.
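The jpvt conventions above can be illustrated with two hypothetical helpers that are not part of Intel MKL. The first prepares jpvt on entry, and the second uses the returned jpvt to form AP from a saved copy of A. Both assume column major layout. `long` stands in for `lapack_int`, whose actual width depends on the interface (LP64 or ILP64) you link against.

```c
/* Hypothetical helper: mark the first nfixed columns of A as leading
   columns (nonzero entries) and leave the remaining columns free (zero). */
void gelsy_init_jpvt(long *jpvt, long n, long nfixed) {
    for (long i = 0; i < n; ++i)
        jpvt[i] = (i < nfixed) ? 1 : 0;
}

/* Hypothetical helper: build AP from A (both column major) using the jpvt
   returned by ?gelsy, where jpvt[i - 1] = k means that column i of AP was
   column k of A (1-based indices). */
void gelsy_apply_jpvt(const double *a, long lda, long m, long n,
                      const long *jpvt, double *ap, long ldap) {
    for (long i = 0; i < n; ++i) {
        long k = jpvt[i] - 1;  /* 1-based to 0-based */
        for (long r = 0; r < m; ++r)
            ap[r + i * ldap] = a[r + k * lda];
    }
}
```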

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

?gelss
Computes the minimum-norm solution to a linear least
squares problem using the singular value
decomposition of A.

Syntax
lapack_int LAPACKE_sgelss( int matrix_layout, lapack_int m, lapack_int n, lapack_int
nrhs, float* a, lapack_int lda, float* b, lapack_int ldb, float* s, float rcond,
lapack_int* rank );
lapack_int LAPACKE_dgelss( int matrix_layout, lapack_int m, lapack_int n, lapack_int
nrhs, double* a, lapack_int lda, double* b, lapack_int ldb, double* s, double rcond,
lapack_int* rank );
lapack_int LAPACKE_cgelss( int matrix_layout, lapack_int m, lapack_int n, lapack_int
nrhs, lapack_complex_float* a, lapack_int lda, lapack_complex_float* b, lapack_int ldb,
float* s, float rcond, lapack_int* rank );
lapack_int LAPACKE_zgelss( int matrix_layout, lapack_int m, lapack_int n, lapack_int
nrhs, lapack_complex_double* a, lapack_int lda, lapack_complex_double* b, lapack_int
ldb, double* s, double rcond, lapack_int* rank );

Include Files
• mkl.h

Description

The routine computes the minimum norm solution to a real/complex linear least squares problem:
$$\min_x \|b - A x\|_2$$
using the singular value decomposition (SVD) of A. A is an m-by-n matrix which may be rank-deficient.
Several right hand side vectors b and solution vectors x can be handled in a single call; they are stored as
the columns of the m-by-nrhs right hand side matrix B and the n-by-nrhs solution matrix X. The effective
rank of A is determined by treating as zero those singular values which are less than rcond times the largest
singular value.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

m The number of rows of the matrix A (m≥ 0).

n The number of columns of the matrix A (n ≥ 0).

nrhs The number of right-hand sides; the number of columns in B (nrhs ≥ 0).

lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.

ldb The leading dimension of b; must be at least max(1, m, n) for column major layout and at least max(1, nrhs) for row major layout.

rcond rcond is used to determine the effective rank of A. Singular values s(i)
≤ rcond*s(1) are treated as zero.
If rcond < 0, machine precision is used instead.

Output Parameters

a On exit, the first min(m, n) rows of a are overwritten with the matrix of
right singular vectors of A, stored row-wise.

b Overwritten by the n-by-nrhs solution matrix X.

If m ≥ n and rank = n, the residual sum-of-squares for the solution in the i-th
column is given by the sum of squares of modulus of elements n+1:m in
that column.

s Array, size at least max(1, min(m, n)). The singular values of A in decreasing order. The condition number of A in the 2-norm is $\kappa_2(A) = s(1)/s(\min(m, n))$.

rank The effective rank of A, that is, the number of singular values which are
greater than rcond *s(1).
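The rank and condition-number rules above translate directly into code. This sketch uses the hypothetical names `lss_effective_rank` and `lss_cond2`. DBL_EPSILON stands in for machine precision when rcond < 0. That substitution is an assumption, not a statement of the exact constant the library uses.

```c
#include <float.h>
#include <math.h>

/* Hypothetical helper: effective rank from k = min(m, n) singular values
   s[0] >= s[1] >= ... (as returned in s). Values s(i) <= rcond*s(1) count
   as zero; rcond < 0 is replaced by DBL_EPSILON (an assumption). */
long lss_effective_rank(const double *s, long k, double rcond) {
    if (k <= 0)
        return 0;
    double tol = (rcond < 0.0 ? DBL_EPSILON : rcond) * s[0];
    long rank = 0;
    for (long i = 0; i < k; ++i)
        if (s[i] > tol)
            ++rank;
    return rank;
}

/* 2-norm condition number s(1)/s(min(m, n)); requires k >= 1.
   Returns INFINITY when the smallest singular value is exactly zero. */
double lss_cond2(const double *s, long k) {
    if (s[k - 1] == 0.0)
        return INFINITY;
    return s[0] / s[k - 1];
}
```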

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, then the algorithm for computing the SVD failed to converge; i indicates the number of
off-diagonal elements of an intermediate bidiagonal form which did not converge to zero.

?gelsd
Computes the minimum-norm solution to a linear least
squares problem using the singular value
decomposition of A and a divide and conquer method.

Syntax
lapack_int LAPACKE_sgelsd( int matrix_layout, lapack_int m, lapack_int n, lapack_int
nrhs, float* a, lapack_int lda, float* b, lapack_int ldb, float* s, float rcond,
lapack_int* rank );

lapack_int LAPACKE_dgelsd( int matrix_layout, lapack_int m, lapack_int n, lapack_int
nrhs, double* a, lapack_int lda, double* b, lapack_int ldb, double* s, double rcond,
lapack_int* rank );
lapack_int LAPACKE_cgelsd( int matrix_layout, lapack_int m, lapack_int n, lapack_int
nrhs, lapack_complex_float* a, lapack_int lda, lapack_complex_float* b, lapack_int ldb,
float* s, float rcond, lapack_int* rank );
lapack_int LAPACKE_zgelsd( int matrix_layout, lapack_int m, lapack_int n, lapack_int
nrhs, lapack_complex_double* a, lapack_int lda, lapack_complex_double* b, lapack_int
ldb, double* s, double rcond, lapack_int* rank );

Include Files
• mkl.h

Description

The routine computes the minimum-norm solution to a real/complex linear least squares problem:
$$\min_x \|b - A x\|_2$$
using the singular value decomposition (SVD) of A. A is an m-by-n matrix which may be rank-deficient.
Several right hand side vectors b and solution vectors x can be handled in a single call; they are stored as
the columns of the m-by-nrhs right hand side matrix B and the n-by-nrhs solution matrix X.
The problem is solved in three steps:

1. Reduce the coefficient matrix A to bidiagonal form with Householder transformations, reducing the
original problem into a "bidiagonal least squares problem" (BLS).
2. Solve the BLS using a divide and conquer approach.
3. Apply back all the Householder transformations to solve the original least squares problem.

The effective rank of A is determined by treating as zero those singular values which are less than rcond
times the largest singular value.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

m The number of rows of the matrix A (m≥ 0).

n The number of columns of the matrix A (n ≥ 0).

nrhs The number of right-hand sides; the number of columns in B (nrhs≥ 0).

lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.

ldb The leading dimension of b; must be at least max(1, m, n) for column
major layout and at least max(1, nrhs) for row major layout.

rcond rcond is used to determine the effective rank of A. Singular values s(i)
≤ rcond*s(1) are treated as zero. If rcond ≤ 0, machine precision is used
instead.

Output Parameters

a On exit, A has been overwritten.

b Overwritten by the n-by-nrhs solution matrix X.

If m ≥ n and rank = n, the residual sum-of-squares for the solution in the i-th
column is given by the sum of squares of modulus of elements n+1:m in
that column.

s Array, size at least max(1, min(m, n)). The singular values of A in decreasing order. The condition number of A in the 2-norm is $\kappa_2(A) = s(1)/s(\min(m, n))$.

rank The effective rank of A, that is, the number of singular values which are
greater than rcond *s(1).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, then the algorithm for computing the SVD failed to converge; i indicates the number of
off-diagonal elements of an intermediate bidiagonal form that did not converge to zero.

Generalized Linear Least Squares (LLS) Problems: LAPACK Driver Routines

This topic describes LAPACK driver routines used for solving generalized linear least squares problems. Table
"Driver Routines for Solving Generalized LLS Problems" lists all such routines.
Driver Routines for Solving Generalized LLS Problems
Routine Name Operation performed

gglse Solves the linear equality-constrained least squares problem using a generalized RQ
factorization.

ggglm Solves a general Gauss-Markov linear model problem using a generalized QR factorization.

?gglse
Solves the linear equality-constrained least squares
problem using a generalized RQ factorization.

Syntax
lapack_int LAPACKE_sgglse (int matrix_layout, lapack_int m, lapack_int n, lapack_int p,
float* a, lapack_int lda, float* b, lapack_int ldb, float* c, float* d, float* x);
lapack_int LAPACKE_dgglse (int matrix_layout, lapack_int m, lapack_int n, lapack_int p,
double* a, lapack_int lda, double* b, lapack_int ldb, double* c, double* d, double* x);

lapack_int LAPACKE_cgglse (int matrix_layout, lapack_int m, lapack_int n, lapack_int p,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* b, lapack_int ldb,
lapack_complex_float* c, lapack_complex_float* d, lapack_complex_float* x);
lapack_int LAPACKE_zgglse (int matrix_layout, lapack_int m, lapack_int n, lapack_int p,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* b, lapack_int ldb,
lapack_complex_double* c, lapack_complex_double* d, lapack_complex_double* x);

Include Files
• mkl.h

Description

The routine solves the linear equality-constrained least squares (LSE) problem:
$$\min_x \|c - A x\|_2 \quad \text{subject to} \quad B x = d$$
where A is an m-by-n matrix, B is a p-by-n matrix, c is a given m-vector, and d is a given p-vector. It is
assumed that p ≤ n ≤ m+p, and

$$\operatorname{rank}(B) = p, \qquad \operatorname{rank}\begin{pmatrix} A \\ B \end{pmatrix} = n.$$

These conditions ensure that the LSE problem has a unique solution, which is obtained using a generalized
RQ factorization of the matrices (B, A) given by

$$B = \begin{pmatrix} 0 & R \end{pmatrix} Q, \qquad A = Z T Q$$

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

m The number of rows of the matrix A (m≥ 0).

n The number of columns of the matrices A and B (n≥ 0).

p The number of rows of the matrix B (0 ≤ p ≤ n ≤ m+p).

a, b, c, d Arrays:
a (size max(1, lda*n) for column major layout and max(1, lda*m) for row
major layout) contains the m-by-n matrix A.
b (size max(1, ldb*n) for column major layout and max(1, ldb*p) for row
major layout) contains the p-by-n matrix B.
c, size at least max(1, m), contains the right hand side vector for the least
squares part of the LSE problem.
d, size at least max(1, p), contains the right hand side vector for the
constrained equation.

lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.

ldb The leading dimension of b; at least max(1, p) for column major layout and
max(1, n) for row major layout.

Output Parameters

a The elements on and above the diagonal contain the min(m, n)-by-n upper
trapezoidal matrix T as returned by ?ggrqf.

x The solution of the LSE problem.

b On exit, the upper right triangle contains the p-by-p upper triangular matrix
R as returned by ?ggrqf.

d On exit, d is destroyed.

c On exit, the residual sum-of-squares for the solution is given by the sum of
squares of elements n-p+1 to m of vector c.
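Reading the residual from c after the call takes a single loop. The helper below is a sketch for real data with a hypothetical name, `gglse_residual_ss`. It converts the 1-based range n-p+1 to m from the text into 0-based C indices n-p to m-1.

```c
/* Hypothetical helper: residual sum of squares of the LSE solution,
   computed from the vector c returned by real ?gglse as the sum of
   squares of elements n-p+1..m (1-based), i.e. indices n-p..m-1. */
double gglse_residual_ss(const double *c, long m, long n, long p) {
    double ss = 0.0;
    for (long i = n - p; i < m; ++i)
        ss += c[i] * c[i];
    return ss;
}
```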

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = 1, the upper triangular factor R associated with B in the generalized RQ factorization of the pair
(B, A) is singular, so that rank(B) < p; the least squares solution could not be computed.
If info = 2, the (n-p)-by-(n-p) part of the upper trapezoidal factor T associated with A in the generalized
RQ factorization of the pair (B, A) is singular, so that

$$\operatorname{rank}\begin{pmatrix} A \\ B \end{pmatrix} < n;$$

the least squares solution could not be computed.

?ggglm
Solves a general Gauss-Markov linear model problem
using a generalized QR factorization.

Syntax
lapack_int LAPACKE_sggglm (int matrix_layout, lapack_int n, lapack_int m, lapack_int p,
float* a, lapack_int lda, float* b, lapack_int ldb, float* d, float* x, float* y);
lapack_int LAPACKE_dggglm (int matrix_layout, lapack_int n, lapack_int m, lapack_int p,
double* a, lapack_int lda, double* b, lapack_int ldb, double* d, double* x, double* y);
lapack_int LAPACKE_cggglm (int matrix_layout, lapack_int n, lapack_int m, lapack_int p,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* b, lapack_int ldb,
lapack_complex_float* d, lapack_complex_float* x, lapack_complex_float* y);
lapack_int LAPACKE_zggglm (int matrix_layout, lapack_int n, lapack_int m, lapack_int p,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* b, lapack_int ldb,
lapack_complex_double* d, lapack_complex_double* x, lapack_complex_double* y);

Include Files
• mkl.h

Description

The routine solves a general Gauss-Markov linear model (GLM) problem:

$$\min_x \|y\|_2 \quad \text{subject to} \quad d = A x + B y$$

where A is an n-by-m matrix, B is an n-by-p matrix, and d is a given n-vector. It is assumed that m ≤ n ≤ m+p,
rank(A) = m, and rank(A B) = n, where (A B) is the n-by-(m+p) matrix formed by placing B to the right of A.

Under these assumptions, the constrained equation is always consistent, and there is a unique solution x and
a minimal 2-norm solution y, which is obtained using a generalized QR factorization of the matrices (A, B)
given by

$$A = Q \begin{pmatrix} R \\ 0 \end{pmatrix}, \qquad B = Q T Z$$

In particular, if matrix B is square nonsingular, then the problem GLM is equivalent to the following weighted
linear least squares problem

$$\min_x \|B^{-1}(d - A x)\|_2.$$
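The equivalence with weighted least squares can be checked by hand in the smallest nontrivial case, without calling the library. Assume n = 2, m = 1, p = 2, A = (1, 1)^T, and a diagonal B = diag(b1, b2) with b1, b2 ≠ 0. Minimizing ||B^{-1}(d - A*x)||_2 then gives a weighted mean of d with weights 1/b_i^2. The vector y = B^{-1}(d - A*x) satisfies d = A*x + B*y exactly. The function name `glm_diag_example` is hypothetical.

```c
/* Solution of the hand-worked GLM example: scalar x and 2-vector y. */
typedef struct {
    double x;
    double y[2];
} glm_2x1;

/* Hypothetical worked example (no MKL call): minimize ||y||_2 subject to
   d = A*x + B*y with A = (1, 1)^T and B = diag(b1, b2), b1, b2 != 0.
   Since B is square and nonsingular, x minimizes ||B^{-1}(d - A*x)||_2,
   which is the weighted mean of d with weights 1/b_i^2. */
glm_2x1 glm_diag_example(const double d[2], double b1, double b2) {
    double w1 = 1.0 / (b1 * b1);
    double w2 = 1.0 / (b2 * b2);
    glm_2x1 r;
    r.x = (w1 * d[0] + w2 * d[1]) / (w1 + w2);
    r.y[0] = (d[0] - r.x) / b1;  /* y = B^{-1}(d - A*x) */
    r.y[1] = (d[1] - r.x) / b2;
    return r;
}
```

With d = (1, 5)^T, b1 = 1, and b2 = 2, this gives x = 1.8 and y = (-0.8, 1.6)^T. Both constraint rows then hold: 1.8 - 0.8 = 1 and 1.8 + 2*1.6 = 5.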

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

n The number of rows of the matrices A and B (n≥ 0).

m The number of columns in A (m≥ 0).

p The number of columns in B (p ≥ n - m).

a, b, d Arrays:
a(size max(1, lda*m) for column major layout and max(1, lda*n) for row
major layout) contains the n-by-m matrix A.

b(size max(1, ldb*p) for column major layout and max(1, ldb*n) for row
major layout) contains the n-by-p matrix B.
d, size at least max(1, n), contains the left hand side of the GLM equation.

lda The leading dimension of a; at least max(1, n) for column major layout and
max(1, m) for row major layout.

ldb The leading dimension of b; at least max(1, n) for column major layout and
max(1, p) for row major layout.

Output Parameters

x, y Arrays, size at least max(1, m) for x and at least max(1, p) for y.
On exit, x and y are the solutions of the GLM problem.

a On exit, the upper triangular part of the array a contains the m-by-m upper
triangular matrix R.

b On exit, if n ≤ p, the upper right triangle contains the n-by-n upper
triangular matrix T as returned by ?ggrqf; if n > p, the elements on and
above the (n-p)-th subdiagonal contain the n-by-p upper trapezoidal
matrix T.

d On exit, d is destroyed.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = 1, the upper triangular factor R associated with A in the generalized QR factorization of the pair
(A, B) is singular, so that rank(A) < m; the least squares solution could not be computed.
If info = 2, the bottom (n-m)-by-(n-m) part of the upper trapezoidal factor T associated with B in the
generalized QR factorization of the pair (A, B) is singular, so that rank(A B) < n, where (A B) is the matrix
formed by placing B to the right of A; the least squares solution could not be computed.

Symmetric Eigenvalue Problems: LAPACK Driver Routines

This topic describes LAPACK driver routines used for solving symmetric eigenvalue problems. See also
computational routines that can be called to solve these problems. Table "Driver Routines for Solving
Symmetric Eigenproblems" lists all such driver routines.
Driver Routines for Solving Symmetric Eigenproblems
Routine Name Operation performed

syev/heev Computes all eigenvalues and, optionally, eigenvectors of a real symmetric / Hermitian matrix.

syevd/heevd Computes all eigenvalues and (optionally) all eigenvectors of a real symmetric /
Hermitian matrix using a divide and conquer algorithm.

syevx/heevx Computes selected eigenvalues and, optionally, eigenvectors of a real symmetric / Hermitian matrix.

syevr/heevr Computes selected eigenvalues and, optionally, eigenvectors of a real symmetric / Hermitian matrix using the Relatively Robust Representations.

spev/hpev Computes all eigenvalues and, optionally, eigenvectors of a real symmetric / Hermitian matrix in packed storage.

spevd/hpevd Uses a divide and conquer algorithm to compute all eigenvalues and (optionally) all
eigenvectors of a real symmetric / Hermitian matrix held in packed storage.

spevx/hpevx Computes selected eigenvalues and, optionally, eigenvectors of a real symmetric / Hermitian matrix in packed storage.

sbev/hbev Computes all eigenvalues and, optionally, eigenvectors of a real symmetric /
Hermitian band matrix.

sbevd/hbevd Computes all eigenvalues and (optionally) all eigenvectors of a real symmetric /
Hermitian band matrix using a divide and conquer algorithm.

sbevx/hbevx Computes selected eigenvalues and, optionally, eigenvectors of a real symmetric / Hermitian band matrix.

stev Computes all eigenvalues and, optionally, eigenvectors of a real symmetric tridiagonal matrix.

stevd Computes all eigenvalues and (optionally) all eigenvectors of a real symmetric
tridiagonal matrix using a divide and conquer algorithm.

stevx Computes selected eigenvalues and, optionally, eigenvectors of a real symmetric tridiagonal matrix.

stevr Computes selected eigenvalues and, optionally, eigenvectors of a real symmetric tridiagonal matrix using the Relatively Robust Representations.

?syev
Computes all eigenvalues and, optionally,
eigenvectors of a real symmetric matrix.

Syntax
lapack_int LAPACKE_ssyev (int matrix_layout, char jobz, char uplo, lapack_int n, float*
a, lapack_int lda, float* w);
lapack_int LAPACKE_dsyev (int matrix_layout, char jobz, char uplo, lapack_int n,
double* a, lapack_int lda, double* w);

Include Files
• mkl.h

Description

The routine computes all eigenvalues and, optionally, eigenvectors of a real symmetric matrix A.
Note that for most cases of real symmetric eigenvalue problems the default choice should be the ?syevr routine,
as its underlying algorithm is faster and uses less workspace.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

uplo Must be 'U' or 'L'.

If uplo = 'U', a stores the upper triangular part of A.

If uplo = 'L', a stores the lower triangular part of A.

n The order of the matrix A (n≥ 0).

a a (size max(1, lda*n)) is an array containing either the upper or lower triangular part of the symmetric matrix A, as specified by uplo.

lda The leading dimension of the array a. Must be at least max(1, n).
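?syev reads only the triangle selected by uplo and overwrites a on exit, so it can be useful to keep a full copy of A for later checks. The hypothetical helper below (`sym_expand`, not an MKL function) expands the referenced triangle into a full symmetric matrix. It never reads entries of a outside that triangle. An int flag replaces the matrix_layout constants so that the sketch compiles without mkl.h.

```c
/* Hypothetical helper: expand the triangle of a that ?syev reads (selected
   by uplo) into a full symmetric n-by-n matrix with leading dimension n.
   row_major != 0 means a is stored in row major layout. The result is
   symmetric, so full reads the same in either layout. */
void sym_expand(int row_major, char uplo, long n,
                const double *a, long lda, double *full) {
    int upper = (uplo == 'U' || uplo == 'u');
    for (long j = 0; j < n; ++j) {
        for (long i = 0; i < n; ++i) {
            /* (r, c) locates A(i, j) or its mirror inside the stored triangle */
            int in_stored = upper ? (i <= j) : (i >= j);
            long r = in_stored ? i : j;
            long c = in_stored ? j : i;
            full[i + j * n] = row_major ? a[r * lda + c] : a[r + c * lda];
        }
    }
}
```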

Output Parameters

a On exit, if jobz = 'V', then if info = 0, array a contains the orthonormal eigenvectors of the matrix A.
If jobz = 'N', then on exit the lower triangle (if uplo = 'L') or the upper triangle (if uplo = 'U') of A, including the
diagonal, is overwritten.

w Array, size at least max(1, n).

If info = 0, contains the eigenvalues of the matrix A in ascending order.
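One way to sanity-check the output is to evaluate the residual A*v - w[j]*v for each returned eigenpair. This sketch uses the hypothetical name `syev_residual_sq`. It assumes column major layout, jobz = 'V', and info = 0. It also assumes you saved a full copy of the original A before the call, because a is overwritten with the eigenvectors. Column j of the returned a holds the eigenvector for w[j]. The function returns the squared norm to avoid a libm dependency.

```c
/* Hypothetical check: squared 2-norm of A*v - w[j]*v, where afull is a saved
   full copy of the original A (column major, leading dimension n) and z is
   the array a returned by ?syev (column j = eigenvector for w[j]). */
double syev_residual_sq(long n, const double *afull, const double *z, long ldz,
                        const double *w, long j) {
    double ss = 0.0;
    for (long i = 0; i < n; ++i) {
        double av = 0.0;
        for (long k = 0; k < n; ++k)
            av += afull[i + k * n] * z[k + j * ldz];  /* (A*v)_i */
        double r = av - w[j] * z[i + j * ldz];
        ss += r * r;
    }
    return ss;
}
```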

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, then the algorithm failed to converge; i indicates the number of off-diagonal elements of an intermediate
tridiagonal form which did not converge to zero.

?heev
Computes all eigenvalues and, optionally,
eigenvectors of a Hermitian matrix.

Syntax
lapack_int LAPACKE_cheev( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_complex_float* a, lapack_int lda, float* w );
lapack_int LAPACKE_zheev( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_complex_double* a, lapack_int lda, double* w );


Include Files
• mkl.h

Description

The routine computes all eigenvalues and, optionally, eigenvectors of a complex Hermitian matrix A.
Note that for most complex Hermitian eigenvalue problems, ?heevr should be the default choice, because its underlying algorithm is faster and uses less workspace.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

uplo Must be 'U' or 'L'.

If uplo = 'U', a stores the upper triangular part of A.

If uplo = 'L', a stores the lower triangular part of A.

n The order of the matrix A (n≥ 0).

a Array, size max(1, lda*n), containing either the upper or lower triangular part of the Hermitian matrix A, as specified by uplo.

lda The leading dimension of the array a. Must be at least max(1, n).

Output Parameters

a On exit, if jobz = 'V', then if info = 0, array a contains the orthonormal eigenvectors of the matrix A.
If jobz = 'N', then on exit the lower triangle (if uplo = 'L') or the upper triangle (if uplo = 'U') of A, including the diagonal, is overwritten.

w Array, size at least max(1, n).

If info = 0, contains the eigenvalues of the matrix A in ascending order.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, then the algorithm failed to converge; i off-diagonal elements of an intermediate tridiagonal form did not converge to zero.
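For the Hermitian case the same residual check applies, except that the unstored triangle is recovered by conjugation, A(j, i) = conj(A(i, j)), and each eigenvalue is real. This is a hedged sketch using C99 double complex; the helper names are hypothetical, column-major storage is assumed, and treating double complex as interchangeable with lapack_complex_double is an assumption about the build configuration, not something this reference states.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical helper: A(i, j) of a Hermitian matrix stored column-major,
   reading only the `uplo` triangle and conjugating for the other one. */
static double complex herm_elem(const double complex *a, int lda, char uplo,
                                int i, int j)
{
    int stored = (uplo == 'U') ? (i <= j) : (i >= j);
    if (stored)
        return a[(size_t)j * lda + i];
    return conj(a[(size_t)i * lda + j]);   /* A(i, j) = conj(A(j, i)) */
}

/* Hypothetical helper: squared 2-norm of A*z - lambda*z (lambda is real). */
static double herm_eig_residual2(int n, const double complex *a, int lda,
                                 char uplo, double lambda,
                                 const double complex *z)
{
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        double complex r = -lambda * z[i];
        for (int j = 0; j < n; ++j)
            r += herm_elem(a, lda, uplo, i, j) * z[j];
        sum += creal(r) * creal(r) + cimag(r) * cimag(r);
    }
    return sum;
}
```

For example, the matrix with rows (2, i) and (-i, 2) has eigenvalues 1 and 3, with eigenvectors proportional to (1, i) and (1, -i).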

?syevd
Computes all eigenvalues and, optionally, all
eigenvectors of a real symmetric matrix using divide
and conquer algorithm.

Syntax
lapack_int LAPACKE_ssyevd (int matrix_layout, char jobz, char uplo, lapack_int n,
float* a, lapack_int lda, float* w);
lapack_int LAPACKE_dsyevd (int matrix_layout, char jobz, char uplo, lapack_int n,
double* a, lapack_int lda, double* w);

Include Files
• mkl.h

Description

The routine computes all the eigenvalues, and optionally all the eigenvectors, of a real symmetric matrix A.
In other words, it can compute the spectral factorization of A as $A = Z \Lambda Z^T$.

Here $\Lambda$ is a diagonal matrix whose diagonal elements are the eigenvalues $\lambda_i$, and $Z$ is the orthogonal matrix whose columns are the eigenvectors $z_i$. Thus,
$A z_i = \lambda_i z_i$ for $i = 1, 2, \ldots, n$.
If the eigenvectors are requested, then this routine uses a divide and conquer algorithm to compute
eigenvalues and eigenvectors. However, if only eigenvalues are required, then it uses the Pal-Walker-Kahan
variant of the QL or QR algorithm.
Note that for most real symmetric eigenvalue problems, ?syevr should be the default choice, because its underlying algorithm is faster and uses less workspace. ?syevd requires more workspace but is faster in some cases, especially for large matrices.
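The spectral factorization $A = Z \Lambda Z^T$ can be checked element by element as $a_{ij} = \sum_k z_{ik} \lambda_k z_{jk}$. The sketch below is a hypothetical helper, not a oneMKL routine; it assumes column-major storage, a full (both triangles) copy of the original A saved before the call, and Z as returned in a with jobz = 'V'.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: max |A(i,j) - (Z * diag(w) * Z^T)(i,j)| over all
   i, j. `a` is the original full matrix (column-major, leading dim lda);
   column k of `z` (leading dim ldz) is the eigenvector for w[k]. */
static double spectral_error(int n, const double *a, int lda,
                             const double *z, int ldz, const double *w)
{
    double worst = 0.0;
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < n; ++i) {
            double s = 0.0;
            for (int k = 0; k < n; ++k)   /* sum_k z_ik * w_k * z_jk */
                s += z[(size_t)k * ldz + i] * w[k] * z[(size_t)k * ldz + j];
            double d = fabs(a[(size_t)j * lda + i] - s);
            if (d > worst)
                worst = d;
        }
    }
    return worst;
}
```

A result near machine precision times the norm of A indicates that the eigenvalues and eigenvectors are consistent; pairing w[k] with the wrong column produces a large error.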

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

uplo Must be 'U' or 'L'.

If uplo = 'U', a stores the upper triangular part of A.

If uplo = 'L', a stores the lower triangular part of A.

n The order of the matrix A (n≥ 0).

a Array, size max(1, lda*n), containing either the upper or lower triangular part of the symmetric matrix A, as specified by uplo.

lda The leading dimension of the array a.

Must be at least max(1, n).


Output Parameters

w Array, size at least max(1, n).

If info = 0, contains the eigenvalues of the matrix A in ascending order.
See also info.

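a If jobz = 'V', then on exit this array is overwritten by the orthogonal matrix Z, which contains the eigenvectors of A.
If jobz = 'N', then on exit the lower triangle (if uplo = 'L') or the upper triangle (if uplo = 'U') of A, including the diagonal, is overwritten.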

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = i, and jobz = 'N', then the algorithm failed to converge; i indicates the number of off-diagonal
elements of an intermediate tridiagonal form which did not converge to zero.
If info = i, and jobz = 'V', then the algorithm failed to compute an eigenvalue while working on the
submatrix lying in rows and columns info/(n+1) through mod(info,n+1).

If info = -i, the i-th parameter had an illegal value.
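When jobz = 'V' and info > 0, info encodes two indices at once. A minimal decoding sketch follows; the struct and function names are hypothetical, and it assumes, as in reference LAPACK, that the resulting row/column indices are 1-based.

```c
/* Hypothetical result type: rows/columns first..last (1-based, assumed as
   in reference LAPACK) of the submatrix on which ?syevd/?heevd failed. */
struct evd_block {
    long first;
    long last;
};

/* Decodes info > 0 returned with jobz = 'V': the failing submatrix lies in
   rows and columns info/(n+1) through mod(info, n+1). */
static struct evd_block evd_failed_block(long info, long n)
{
    struct evd_block b;
    b.first = info / (n + 1);   /* integer division */
    b.last = info % (n + 1);    /* mod(info, n+1) */
    return b;
}
```

For example, with n = 10 a returned info of 40 decodes to rows and columns 3 through 7, since 40 = 3*11 + 7.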

?heevd
Computes all eigenvalues and, optionally, all
eigenvectors of a complex Hermitian matrix using
divide and conquer algorithm.

Syntax
lapack_int LAPACKE_cheevd( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_complex_float* a, lapack_int lda, float* w );
lapack_int LAPACKE_zheevd( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_complex_double* a, lapack_int lda, double* w );

Include Files
• mkl.h

Description

The routine computes all the eigenvalues, and optionally all the eigenvectors, of a complex Hermitian matrix A. In other words, it can compute the spectral factorization of A as $A = Z \Lambda Z^H$.

Here $\Lambda$ is a real diagonal matrix whose diagonal elements are the eigenvalues $\lambda_i$, and $Z$ is the (complex) unitary matrix whose columns are the eigenvectors $z_i$. Thus,
$A z_i = \lambda_i z_i$ for $i = 1, 2, \ldots, n$.
If the eigenvectors are requested, then this routine uses a divide and conquer algorithm to compute
eigenvalues and eigenvectors. However, if only eigenvalues are required, then it uses the Pal-Walker-Kahan
variant of the QL or QR algorithm.

Note that for most complex Hermitian eigenvalue problems, ?heevr should be the default choice, because its underlying algorithm is faster and uses less workspace. ?heevd requires more workspace but is faster in some cases, especially for large matrices.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

uplo Must be 'U' or 'L'.

If uplo = 'U', a stores the upper triangular part of A.

If uplo = 'L', a stores the lower triangular part of A.

n The order of the matrix A (n≥ 0).

a Array, size max(1, lda*n), containing either the upper or lower triangular part of the Hermitian matrix A, as specified by uplo.

lda The leading dimension of the array a. Must be at least max(1, n).

Output Parameters

w Array, size at least max(1, n).

If info = 0, contains the eigenvalues of the matrix A in ascending order.
See also info.

a If jobz = 'V', then on exit this array is overwritten by the unitary matrix
Z which contains the eigenvectors of A.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = i, and jobz = 'N', then the algorithm failed to converge; i off-diagonal elements of an
intermediate tridiagonal form did not converge to zero;
if info = i, and jobz = 'V', then the algorithm failed to compute an eigenvalue while working on the
submatrix lying in rows and columns info/(n+1) through mod(info, n+1).

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed eigenvalues and eigenvectors are exact for a matrix A + E such that $\|E\|_2 = O(\varepsilon)\,\|A\|_2$, where $\varepsilon$ is the machine precision.
The real analogue of this routine is syevd. See also hpevd for matrices held in packed storage, and hbevd for
banded matrices.


?syevx
Computes selected eigenvalues and, optionally,
eigenvectors of a symmetric matrix.

Syntax
lapack_int LAPACKE_ssyevx (int matrix_layout, char jobz, char range, char uplo,
lapack_int n, float* a, lapack_int lda, float vl, float vu, lapack_int il, lapack_int
iu, float abstol, lapack_int* m, float* w, float* z, lapack_int ldz, lapack_int* ifail);
lapack_int LAPACKE_dsyevx (int matrix_layout, char jobz, char range, char uplo,
lapack_int n, double* a, lapack_int lda, double vl, double vu, lapack_int il, lapack_int
iu, double abstol, lapack_int* m, double* w, double* z, lapack_int ldz, lapack_int*
ifail);

Include Files
• mkl.h

Description

The routine computes selected eigenvalues and, optionally, eigenvectors of a real symmetric matrix A.
Eigenvalues and eigenvectors can be selected by specifying either a range of values or a range of indices for
the desired eigenvalues.
Note that for most real symmetric eigenvalue problems, ?syevr should be the default choice, because its underlying algorithm is faster and uses less workspace. ?syevx is faster when only a few eigenvalues are selected.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

range Must be 'A', 'V', or 'I'.

If range = 'A', all eigenvalues will be found.

If range = 'V', all eigenvalues in the half-open interval (vl, vu] will be
found.
If range = 'I', the eigenvalues with indices il through iu will be found.

uplo Must be 'U' or 'L'.

If uplo = 'U', a stores the upper triangular part of A.

If uplo = 'L', a stores the lower triangular part of A.

n The order of the matrix A (n≥ 0).

a Array, size max(1, lda*n), containing either the upper or lower triangular part of the symmetric matrix A, as specified by uplo.

lda The leading dimension of the array a. Must be at least max(1, n).

vl, vu If range = 'V', the lower and upper bounds of the interval to be searched for eigenvalues; vl ≤ vu. Not referenced if range = 'A' or 'I'.

il, iu If range = 'I', the indices of the smallest and largest eigenvalues to be returned.
Constraints: 1 ≤ il ≤ iu ≤ n, if n > 0;
il = 1 and iu = 0, if n = 0.
Not referenced if range = 'A' or 'V'.

abstol The absolute error tolerance for the eigenvalues. See Application Notes for
more information.

ldz The leading dimension of the output array z; ldz ≥ 1.

If jobz = 'V', then ldz ≥ max(1, n) for column major layout and ldz ≥ max(1, m) for row major layout.

Output Parameters

a On exit, the lower triangle (if uplo = 'L') or the upper triangle (if uplo =
'U') of A, including the diagonal, is overwritten.

m The total number of eigenvalues found; 0 ≤ m ≤ n.
If range = 'A', m = n, and if range = 'I', m = iu-il+1.

w Array, size at least max(1, n). The first m elements contain the selected
eigenvalues of the matrix A in ascending order.

z Array z (size max(1, ldz*m) for column major layout and max(1, ldz*n) for row major layout) contains eigenvectors.
If jobz = 'V', then if info = 0, the first m columns of z contain the orthonormal eigenvectors of the matrix A corresponding to the selected eigenvalues, with the i-th column of z holding the eigenvector associated with w[i - 1].
If an eigenvector fails to converge, then that column of z contains the latest
approximation to the eigenvector, and the index of the eigenvector is
returned in ifail.
If jobz = 'N', then z is not referenced.

Note: you must ensure that at least max(1,m) columns are supplied in the
array z; if range = 'V', the exact value of m is not known in advance and
an upper bound must be used.

ifail Array, size at least max(1, n).

If jobz = 'V', then if info = 0, the first m elements of ifail are zero; if info > 0, then ifail contains the indices of the eigenvectors that failed to converge.
If jobz = 'N', then ifail is not referenced.


Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, then i eigenvectors failed to converge; their indices are stored in the array ifail.
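Sizing z before the call follows from the rules for m: it is exact for range = 'A' and range = 'I', but only bounded by n for range = 'V'. A minimal sketch with a hypothetical helper name, using long where real code would use lapack_int:

```c
/* Hypothetical helper: number of columns to allocate for z (column-major)
   before calling ?syevx/?heevx, following the rules for m. */
static long z_columns_needed(char range, long n, long il, long iu)
{
    if (n <= 0)
        return 1;                 /* arrays must have size at least 1 */
    switch (range) {
    case 'I':
        return iu - il + 1;       /* m = iu - il + 1 exactly */
    case 'A':                     /* m = n */
    case 'V':                     /* m unknown in advance: use the bound n */
    default:
        return n;
    }
}
```

For column major layout, z then needs ldz * z_columns_needed(...) elements with ldz ≥ max(1, n); only the first m columns hold results after the call.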

Application Notes
An approximate eigenvalue is accepted as converged when it is determined to lie in an interval [a, b] of width less than or equal to abstol + ε*max(|a|, |b|), where ε is the machine precision.

If abstol is less than or equal to zero, then ε*||T|| is used as the tolerance, where ||T|| is the 1-norm of the tridiagonal matrix obtained by reducing A to tridiagonal form. Eigenvalues are computed most accurately when abstol is set to twice the underflow threshold, 2*?lamch('S'), not zero.

If this routine returns with info > 0, indicating that some eigenvectors did not converge, try setting abstol
to 2*?lamch('S').
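The tolerance rules above can be written out directly. This is a sketch with hypothetical helper names, not the library's internal code; the machine precision eps and the norm ||T|| are caller-supplied, and recommended_abstol assumes that ?lamch('S') equals DBL_MIN for IEEE-754 double precision, as in reference LAPACK (LAPACKE also provides LAPACKE_dlamch, which can be used instead).

```c
#include <float.h>
#include <math.h>

/* Assumption: ?lamch('S') == DBL_MIN for IEEE-754 double precision. */
static double recommended_abstol(void)
{
    return 2.0 * DBL_MIN;
}

/* Tolerance actually applied: eps*||T|| replaces a non-positive abstol.
   tnorm1 is the 1-norm of the tridiagonal form T. */
static double effective_tol(double abstol, double eps, double tnorm1)
{
    return (abstol <= 0.0) ? eps * tnorm1 : abstol;
}

/* Acceptance test for an eigenvalue bracketed in [lo, hi]:
   hi - lo <= tol + eps * max(|lo|, |hi|). */
static int interval_converged(double lo, double hi, double tol, double eps)
{
    double alo = fabs(lo), ahi = fabs(hi);
    double big = (alo > ahi) ? alo : ahi;
    return (hi - lo) <= tol + eps * big;
}
```

The relative term eps * max(|lo|, |hi|) keeps the test meaningful for large eigenvalues even when abstol is tiny.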

?heevx
Computes selected eigenvalues and, optionally,
eigenvectors of a Hermitian matrix.

Syntax
lapack_int LAPACKE_cheevx( int matrix_layout, char jobz, char range, char uplo,
lapack_int n, lapack_complex_float* a, lapack_int lda, float vl, float vu, lapack_int
il, lapack_int iu, float abstol, lapack_int* m, float* w, lapack_complex_float* z,
lapack_int ldz, lapack_int* ifail );
lapack_int LAPACKE_zheevx( int matrix_layout, char jobz, char range, char uplo,
lapack_int n, lapack_complex_double* a, lapack_int lda, double vl, double vu,
lapack_int il, lapack_int iu, double abstol, lapack_int* m, double* w,
lapack_complex_double* z, lapack_int ldz, lapack_int* ifail );

Include Files
• mkl.h

Description

The routine computes selected eigenvalues and, optionally, eigenvectors of a complex Hermitian matrix A.
Eigenvalues and eigenvectors can be selected by specifying either a range of values or a range of indices for
the desired eigenvalues.
Note that for most complex Hermitian eigenvalue problems, ?heevr should be the default choice, because its underlying algorithm is faster and uses less workspace. ?heevx is faster when only a few eigenvalues are selected.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

range Must be 'A', 'V', or 'I'.

If range = 'A', all eigenvalues will be found.

If range = 'V', all eigenvalues in the half-open interval (vl, vu] will be
found.
If range = 'I', the eigenvalues with indices il through iu will be found.

uplo Must be 'U' or 'L'.

If uplo = 'U', a stores the upper triangular part of A.

If uplo = 'L', a stores the lower triangular part of A.

n The order of the matrix A (n ≥ 0).

a Array, size max(1, lda*n), containing either the upper or lower triangular part of the Hermitian matrix A, as specified by uplo.

lda The leading dimension of the array a. Must be at least max(1, n).

vl, vu If range = 'V', the lower and upper bounds of the interval to be searched for eigenvalues; vl ≤ vu. Not referenced if range = 'A' or 'I'.

il, iu If range = 'I', the indices of the smallest and largest eigenvalues to be returned.
Constraints: 1 ≤ il ≤ iu ≤ n, if n > 0; il = 1 and iu = 0, if n = 0.
Not referenced if range = 'A' or 'V'.

abstol The absolute error tolerance for the eigenvalues. See Application Notes for more information.

ldz The leading dimension of the output array z; ldz ≥ 1.

If jobz = 'V', then ldz ≥ max(1, n) for column major layout and ldz ≥ max(1, m) for row major layout.

Output Parameters

a On exit, the lower triangle (if uplo = 'L') or the upper triangle (if uplo =
'U') of A, including the diagonal, is overwritten.

m The total number of eigenvalues found; 0 ≤m≤n.

If range = 'A', m = n, and if range = 'I', m = iu-il+1.

w Array, size max(1, n). The first m elements contain the selected eigenvalues
of the matrix A in ascending order.


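z Array z (size max(1, ldz*m) for column major layout and max(1, ldz*n) for row major layout) contains eigenvectors.
If jobz = 'V', then if info = 0, the first m columns of z contain the orthonormal eigenvectors of the matrix A corresponding to the selected eigenvalues, with the i-th column of z holding the eigenvector associated with w[i - 1].
If an eigenvector fails to converge, then that column of z contains the latest approximation to the eigenvector, and the index of the eigenvector is returned in ifail.
If jobz = 'N', then z is not referenced.

Note: you must ensure that at least max(1, m) columns are supplied in the array z; if range = 'V', the exact value of m is not known in advance and an upper bound must be used.

ifail Array, size at least max(1, n).

If jobz = 'V', then if info = 0, the first m elements of ifail are zero; if info > 0, then ifail contains the indices of the eigenvectors that failed to converge.
If jobz = 'N', then ifail is not referenced.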

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, then i eigenvectors failed to converge; their indices are stored in the array ifail.

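Application Notes
An approximate eigenvalue is accepted as converged when it is determined to lie in an interval [a, b] of width less than or equal to abstol + ε*max(|a|, |b|), where ε is the machine precision.

If abstol is less than or equal to zero, then ε*||T|| is used in its place, where ||T|| is the 1-norm of the tridiagonal matrix obtained by reducing A to tridiagonal form. Eigenvalues are computed most accurately when abstol is set to twice the underflow threshold, 2*?lamch('S'), not zero.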

If this routine returns with info > 0, indicating that some eigenvectors did not converge, try setting abstol
to 2*?lamch('S').
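The default tolerance ε*||T|| depends on the 1-norm of the tridiagonal form, which is the largest absolute column sum. For a Hermitian A, the reduced tridiagonal matrix is real symmetric, with diagonal d and off-diagonal e. A minimal sketch, with a hypothetical helper name:

```c
#include <math.h>

/* Hypothetical helper: 1-norm (max absolute column sum) of a symmetric
   tridiagonal T with diagonal d[0..n-1] and off-diagonal e[0..n-2]. */
static double tridiag_norm1(int n, const double *d, const double *e)
{
    double best = 0.0;
    for (int j = 0; j < n; ++j) {
        double col = fabs(d[j]);
        if (j > 0)
            col += fabs(e[j - 1]);   /* entry above the diagonal */
        if (j < n - 1)
            col += fabs(e[j]);       /* entry below the diagonal */
        if (col > best)
            best = col;
    }
    return best;
}
```

For d = (2, 2, 2) and e = (1, -1), the middle column gives the maximum, ||T|| = 4.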

?syevr
Computes selected eigenvalues and, optionally,
eigenvectors of a real symmetric matrix using the
Relatively Robust Representations.

Syntax
lapack_int LAPACKE_ssyevr (int matrix_layout, char jobz, char range, char uplo,
lapack_int n, float* a, lapack_int lda, float vl, float vu, lapack_int il, lapack_int
iu, float abstol, lapack_int* m, float* w, float* z, lapack_int ldz, lapack_int*
isuppz);
lapack_int LAPACKE_dsyevr (int matrix_layout, char jobz, char range, char uplo,
lapack_int n, double* a, lapack_int lda, double vl, double vu, lapack_int il, lapack_int
iu, double abstol, lapack_int* m, double* w, double* z, lapack_int ldz, lapack_int*
isuppz);

Include Files
• mkl.h

Description

The routine computes selected eigenvalues and, optionally, eigenvectors of a real symmetric matrix A.
Eigenvalues and eigenvectors can be selected by specifying either a range of values or a range of indices for
the desired eigenvalues.
The routine first reduces the matrix A to tridiagonal form T. Then, whenever possible, ?syevr calls stemr to compute the eigenspectrum using Relatively Robust Representations. stemr computes eigenvalues by the dqds algorithm, while orthogonal eigenvectors are computed from various "good" $L D L^T$ representations (also known as Relatively Robust Representations). Gram-Schmidt orthogonalization is avoided as far as possible. More specifically, the steps of the algorithm are as follows. For each unreduced block of T:

a. Compute $T - \sigma I = L D L^T$, so that L and D define all the wanted eigenvalues to high relative accuracy. This means that small relative changes in the entries of D and L cause only small relative changes in the eigenvalues and eigenvectors. The standard (unfactored) representation of the tridiagonal matrix T does not have this property in general.
b. Compute the eigenvalues to suitable accuracy. If the eigenvectors are desired, the algorithm attains full accuracy of the computed eigenvalues only right before the corresponding vectors have to be computed; see Steps c) and d).
c. For each cluster of close eigenvalues, select a new shift close to the cluster, find a new factorization, and refine the shifted eigenvalues to suitable accuracy.
d. For each eigenvalue with a large enough relative separation, compute the corresponding eigenvector by forming a rank-revealing twisted factorization. Go back to Step c) for any clusters that remain.
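Step a) relies on the factorization $T - \sigma I = L D L^T$, whose pivots satisfy $D_1 = d_1 - \sigma$ and $D_{i+1} = d_{i+1} - \sigma - e_i^2 / D_i$. By Sylvester's law of inertia, the number of negative pivots equals the number of eigenvalues of T smaller than $\sigma$, which is also the counting principle behind the bisection used by stebz. The following is an illustrative sketch only, not the stemr algorithm; the function name is hypothetical and the zero-pivot guard is a crude simplification.

```c
#include <float.h>

/* Hypothetical sketch: factor T - sigma*I = L*D*L^T for a symmetric
   tridiagonal T (diagonal d[0..n-1], off-diagonal e[0..n-2]) and return
   the number of negative pivots, i.e. eigenvalues of T below sigma. */
static int count_eigs_below(int n, const double *d, const double *e,
                            double sigma)
{
    int negatives = 0;
    double piv = 1.0;                  /* previous pivot D(i-1) */
    for (int i = 0; i < n; ++i) {
        double next = d[i] - sigma;    /* D(i) = d_i - sigma - e_{i-1}^2 / D(i-1) */
        if (i > 0)
            next -= e[i - 1] * e[i - 1] / piv;
        if (next == 0.0)               /* crude guard: perturb a zero pivot */
            next = -DBL_MIN;
        if (next < 0.0)
            ++negatives;
        piv = next;
    }
    return negatives;
}
```

For T with diagonal (2, 2) and off-diagonal (1), the eigenvalues are 1 and 3, so shifts of 0.5, 2.5, and 4.0 yield counts of 0, 1, and 2.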

The desired accuracy of the output can be specified by the input parameter abstol.
The routine ?syevr calls stemr when the full spectrum is requested on machines that conform to the
IEEE-754 floating point standard. ?syevr calls stebz and stein on non-IEEE machines and when partial
spectrum requests are made.
Normal execution of ?syevr may create NaNs and infinities and may abort due to a floating point exception in environments that do not handle NaNs and infinities in the IEEE standard default manner.
Note that ?syevr is preferable for most cases of real symmetric eigenvalue problems as its underlying
algorithm is fast and uses less workspace.

NOTE
This routine supports the Progress Routine feature. See Progress Function for details.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

range Must be 'A' or 'V' or 'I'.

If range = 'A', the routine computes all eigenvalues.

If range = 'V', the routine computes eigenvalues w[i] in the half-open interval vl < w[i] ≤ vu.
If range = 'I', the routine computes eigenvalues with indices il to iu.


For range = 'V' or 'I' and iu - il < n - 1, sstebz/dstebz and sstein/dstein are called.

uplo Must be 'U' or 'L'.

If uplo = 'U', a stores the upper triangular part of A.

If uplo = 'L', a stores the lower triangular part of A.

n The order of the matrix A (n≥ 0).

a Array, size max(1, lda*n), containing either the upper or lower triangular part of the symmetric matrix A, as specified by uplo.

lda The leading dimension of the array a. Must be at least max(1, n).

vl, vu If range = 'V', the lower and upper bounds of the interval to be searched for eigenvalues.
Constraint: vl < vu.

If range = 'A' or 'I', vl and vu are not referenced.

il, iu If range = 'I', the indices in ascending order of the smallest and largest eigenvalues to be returned.
Constraint:
1 ≤ il ≤ iu ≤ n, if n > 0;
il = 1 and iu = 0, if n = 0.
If range = 'A' or 'V', il and iu are not referenced.

abstol If jobz = 'V', the eigenvalues and eigenvectors output have residual
norms bounded by abstol, and the dot products between different
eigenvectors are bounded by abstol.
If abstol < n *eps*||T||, then n *eps*||T|| is used instead, where
eps is the machine precision, and ||T|| is the 1-norm of the matrix T. The
eigenvalues are computed to an accuracy of eps*||T|| irrespective of
abstol.
If high relative accuracy is important, set abstol to ?lamch('S').

ldz The leading dimension of the output array z.

Constraints:
ldz≥ 1 if jobz = 'N' and
ldz≥ max(1, n) for column major layout and ldz≥ max(1, m) for row major
layout if jobz = 'V'.

Output Parameters

a On exit, the lower triangle (if uplo = 'L') or the upper triangle (if uplo =
'U') of A, including the diagonal, is overwritten.

m The total number of eigenvalues found, 0 ≤m≤n.

If range = 'A', m = n, if range = 'I', m = iu-il+1, and if range =
'V' the exact value of m is not known in advance.

w, z Arrays:
w, size at least max(1, n), contains the selected eigenvalues in ascending
order, stored in w[0] to w[m - 1];

z (size max(1, ldz*m) for column major layout and max(1, ldz*n) for row major layout).
If jobz = 'V', then if info = 0, the first m columns of z contain the
orthonormal eigenvectors of the matrix A corresponding to the selected
eigenvalues, with the i-th column of z holding the eigenvector associated
with w[i - 1].

If jobz = 'N', then z is not referenced.

isuppz Array, size at least 2 *max(1, m).

The support of the eigenvectors in z, i.e., the indices indicating the nonzero elements in z. The i-th eigenvector is nonzero only in elements isuppz[2i - 2] through isuppz[2i - 1]. Referenced only if eigenvectors are needed (jobz = 'V') and all eigenvalues are needed, that is, range = 'A' or range = 'I' and il = 1 and iu = n.
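Reading isuppz in C involves two index bases: the C array is 0-based, while the stored row indices are assumed to be 1-based, as in reference LAPACK (this reference does not state the base explicitly). A hedged sketch with hypothetical names, using int where real code would use lapack_int:

```c
/* Hypothetical result type: 0-based row range in which an eigenvector
   may be nonzero. */
struct row_range {
    int first;
    int last;
};

/* Support of the i-th eigenvector (1-based i, i.e. column i-1 of z),
   read from isuppz[2i - 2] and isuppz[2i - 1]. Assumption: the stored
   values are 1-based row indices, so 1 is subtracted for C indexing. */
static struct row_range support_rows(const int *isuppz, int i)
{
    struct row_range r;
    r.first = isuppz[2 * i - 2] - 1;
    r.last = isuppz[2 * i - 1] - 1;
    return r;
}
```

Rows outside [first, last] of that column can be skipped, for example when forming products with sparse-support eigenvectors.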

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, an internal error has occurred.

Application Notes

?heevr
Computes selected eigenvalues and, optionally,
eigenvectors of a Hermitian matrix using the
Relatively Robust Representations.

Syntax
lapack_int LAPACKE_cheevr( int matrix_layout, char jobz, char range, char uplo,
lapack_int n, lapack_complex_float* a, lapack_int lda, float vl, float vu, lapack_int
il, lapack_int iu, float abstol, lapack_int* m, float* w, lapack_complex_float* z,
lapack_int ldz, lapack_int* isuppz );
lapack_int LAPACKE_zheevr( int matrix_layout, char jobz, char range, char uplo,
lapack_int n, lapack_complex_double* a, lapack_int lda, double vl, double vu,
lapack_int il, lapack_int iu, double abstol, lapack_int* m, double* w,
lapack_complex_double* z, lapack_int ldz, lapack_int* isuppz );

Include Files
• mkl.h


Description

The routine computes selected eigenvalues and, optionally, eigenvectors of a complex Hermitian matrix A.
Eigenvalues and eigenvectors can be selected by specifying either a range of values or a range of indices for
the desired eigenvalues.
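The routine first reduces the matrix A to tridiagonal form T with a call to hetrd. Then, whenever possible, ?heevr calls stemr to compute the eigenspectrum using Relatively Robust Representations. stemr computes eigenvalues by the dqds algorithm, while orthogonal eigenvectors are computed from various "good" $L D L^T$ representations (also known as Relatively Robust Representations). Gram-Schmidt orthogonalization is avoided as far as possible. More specifically, the steps of the algorithm are as follows. For each unreduced block (submatrix) of T:

a. Compute $T - \sigma I = L D L^T$, so that L and D define all the wanted eigenvalues to high relative accuracy. This means that small relative changes in the entries of D and L cause only small relative changes in the eigenvalues and eigenvectors. The standard (unfactored) representation of the tridiagonal matrix T does not have this property in general.
b. Compute the eigenvalues to suitable accuracy. If the eigenvectors are desired, the algorithm attains full accuracy of the computed eigenvalues only right before the corresponding vectors have to be computed; see Steps c) and d).
c. For each cluster of close eigenvalues, select a new shift close to the cluster, find a new factorization, and refine the shifted eigenvalues to suitable accuracy.
d. For each eigenvalue with a large enough relative separation, compute the corresponding eigenvector by forming a rank-revealing twisted factorization. Go back to Step c) for any clusters that remain.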

The desired accuracy of the output can be specified by the input parameter abstol.
The routine ?heevr calls stemr when the full spectrum is requested on machines which conform to the
IEEE-754 floating point standard, or stebz and stein on non-IEEE machines and when partial spectrum
requests are made.
Note that the routine ?heevr is preferable for most cases of complex Hermitian eigenvalue problems as its
underlying algorithm is fast and uses less workspace.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

range Must be 'A' or 'V' or 'I'.

If range = 'A', the routine computes all eigenvalues.

If range = 'V', the routine computes eigenvalues lambda(i) in the half-open interval vl < lambda(i) ≤ vu.

If range = 'I', the routine computes eigenvalues with indices il to iu.

For range = 'V' or 'I', sstebz/dstebz and cstein/zstein are called.

uplo Must be 'U' or 'L'.

If uplo = 'U', a stores the upper triangular part of A.

If uplo = 'L', a stores the lower triangular part of A.

n The order of the matrix A (n≥ 0).

a Array, size max(1, lda*n), containing either the upper or lower triangular part of the Hermitian matrix A, as specified by uplo.

lda The leading dimension of the array a.

Must be at least max(1, n).

vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
Constraint: vl < vu.

If range = 'A' or 'I', vl and vu are not referenced.

il, iu If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned.
Constraint: 1 ≤ il ≤ iu ≤ n, if n > 0; il = 1 and iu = 0, if n = 0.

If range = 'A' or 'V', il and iu are not referenced.

abstol The absolute error tolerance to which each eigenvalue/eigenvector is required.
If jobz = 'V', the eigenvalues and eigenvectors output have residual
norms bounded by abstol, and the dot products between different
eigenvectors are bounded by abstol.
If abstol < n *eps*||T||, then n *eps*||T|| is used instead, where
eps is the machine precision, and ||T|| is the 1-norm of the matrix T. The
eigenvalues are computed to an accuracy of eps*||T|| irrespective of
abstol.
If high relative accuracy is important, set abstol to ?lamch('S').

ldz The leading dimension of the output array z. Constraints:

ldz≥ 1 if jobz = 'N';
ldz≥ max(1, n) for column major layout and ldz≥ max(1, m) for row major
layout if jobz = 'V'.
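The ldz constraint depends on the layout. In column major layout, z has n rows, so ldz must be at least n. In row major layout, each row holds the selected columns, so ldz must be at least the number of columns allocated, which must itself be an upper bound on m (for example n when range = 'V'). A minimal sketch with a hypothetical helper name:

```c
/* Hypothetical helper: smallest legal ldz for ?syevr/?heevr.
   ncols is the number of columns allocated for z, an upper bound on m. */
static long min_ldz(char jobz, int row_major, long n, long ncols)
{
    long need = 1;                      /* jobz = 'N': ldz >= 1 */
    if (jobz == 'V')
        need = row_major ? ncols : n;   /* max(1, m) or max(1, n) */
    return (need > 1) ? need : 1;
}
```

The allocation size is then ldz * ncols elements for column major layout, and n * ldz elements for row major layout.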

Output Parameters

a On exit, the lower triangle (if uplo = 'L') or the upper triangle (if uplo =
'U') of A, including the diagonal, is overwritten.

m The total number of eigenvalues found,

0 ≤m≤n.
If range = 'A', m = n, if range = 'I', m = iu-il+1, and if range =
'V' the exact value of m is not known in advance.

w Array, size at least max(1, n), contains the selected eigenvalues in ascending order, stored in w[0] to w[m - 1].


z Array z (size max(1, ldz*m) for column major layout and max(1, ldz*n) for row major layout).
If jobz = 'V', then if info = 0, the first m columns of z contain the
orthonormal eigenvectors of the matrix A corresponding to the selected
eigenvalues, with the i-th column of z holding the eigenvector associated
with w[i - 1].

If jobz = 'N', then z is not referenced.

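isuppz Array, size at least 2*max(1, m).

The support of the eigenvectors in z, i.e., the indices indicating the nonzero elements in z. The i-th eigenvector is nonzero only in elements isuppz[2i - 2] through isuppz[2i - 1]. Referenced only if eigenvectors are needed (jobz = 'V') and all eigenvalues are needed, that is, range = 'A' or range = 'I' and il = 1 and iu = n.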

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, an internal error has occurred.

Application Notes
Normal execution of ?stemr may create NaNs and infinities and hence may abort due to a floating point
exception in environments which do not handle NaNs and infinities in the IEEE standard default manner.

For more details, see ?stemr and these references:

• Inderjit S. Dhillon and Beresford N. Parlett: "Multiple representations to compute orthogonal eigenvectors of symmetric tridiagonal matrices," Linear Algebra and its Applications, 387(1), pp. 1-28, August 2004.
• Inderjit S. Dhillon and Beresford N. Parlett: "Orthogonal Eigenvectors and Relative Gaps," SIAM Journal on Matrix Analysis and Applications, Vol. 25, 2004. Also LAPACK Working Note 154.
• Inderjit S. Dhillon: "A new O(n^2) algorithm for the symmetric tridiagonal eigenvalue/eigenvector problem," Computer Science Division Technical Report No. UCB/CSD-97-971, UC Berkeley, May 1997.

?spev
Computes all eigenvalues and, optionally,
eigenvectors of a real symmetric matrix in packed
storage.

Syntax
lapack_int LAPACKE_sspev (int matrix_layout, char jobz, char uplo, lapack_int n, float*
ap, float* w, float* z, lapack_int ldz);
lapack_int LAPACKE_dspev (int matrix_layout, char jobz, char uplo, lapack_int n,
double* ap, double* w, double* z, lapack_int ldz);

Include Files
• mkl.h

Description

The routine computes all the eigenvalues and, optionally, eigenvectors of a real symmetric matrix A in
packed storage.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

uplo Must be 'U' or 'L'.

If uplo = 'U', ap stores the packed upper triangular part of A.

If uplo = 'L', ap stores the packed lower triangular part of A.

n The order of the matrix A (n≥ 0).

ap Array ap contains the packed upper or lower triangle of the symmetric matrix A, as specified by uplo.
The size of ap must be at least max(1, n*(n+1)/2).

ldz The leading dimension of the output array z. Constraints:

if jobz = 'N', then ldz≥ 1;

if jobz = 'V', then ldz≥ max(1, n).
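Packed storage keeps one triangle column by column, which is why ap needs only n*(n+1)/2 elements. The sketch below computes where A(i, j) lives in ap; the helper name is hypothetical, indices are 0-based, and it assumes column-major packing (matrix_layout = LAPACK_COL_MAJOR). Row major packing uses a different order and is not covered here.

```c
#include <stddef.h>

/* Hypothetical helper: 0-based position of A(i, j) (0-based) in a
   column-major packed array ap of order n. Because A is symmetric,
   (i, j) is first mapped onto the stored triangle. */
static size_t packed_index(char uplo, size_t n, size_t i, size_t j)
{
    if (uplo == 'U') {
        if (i > j) { size_t t = i; i = j; j = t; }   /* need i <= j */
        return i + j * (j + 1) / 2;     /* columns 0..j-1 hold j(j+1)/2 entries */
    }
    if (i < j) { size_t t = i; i = j; j = t; }       /* need i >= j */
    return i + (2 * n - j - 1) * j / 2; /* skip columns 0..j-1 of the lower part */
}
```

For n = 3 and uplo = 'U', the packed order is A(0,0), A(0,1), A(1,1), A(0,2), A(1,2), A(2,2), so A(1,2) is at position 4.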

Output Parameters

w, z Arrays:
w, size at least max(1, n).
If info = 0, w contains the eigenvalues of the matrix A in ascending order.

z (size max(1, ldz*n)).

If jobz = 'V', then if info = 0, z contains the orthonormal eigenvectors
of the matrix A, with the i-th column of z holding the eigenvector associated
with w[i - 1].

If jobz = 'N', then z is not referenced.

ap On exit, this array is overwritten by the values generated during the
reduction to tridiagonal form. The elements of the diagonal and the
off-diagonal of the tridiagonal matrix overwrite the corresponding elements of
A.
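Because ap is overwritten on exit, checking an eigenpair against A*z = w*z needs a copy of the packed matrix saved before the call. The sketch below assumes column-major layout, uplo = 'U', and the packed index ap[i + j*(j+1)/2] for i ≤ j; the function names packed_upper_get and eigpair_residual are hypothetical. Column k of z (0-based) is paired with w[k], which matches the rule that the i-th column of z holds the eigenvector for w[i - 1].

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: A(i,j), 0-based, from column-major packed upper
 * storage. Symmetry gives A(i,j) = A(j,i), so swap indices below the diagonal. */
static double packed_upper_get(const double *ap, int i, int j)
{
    if (i > j) { int t = i; i = j; j = t; }
    return ap[i + (size_t)j * (j + 1) / 2];
}

/* Largest absolute entry of A*z_k - w[k]*z_k, where z_k is column k of the
 * column-major array z (leading dimension ldz) and ap_copy is a copy of the
 * packed matrix saved before the eigensolver overwrote ap. */
static double eigpair_residual(int n, const double *ap_copy, const double *w,
                               const double *z, int ldz, int k)
{
    assert(k >= 0 && k < n && ldz >= n);
    const double *zk = z + (size_t)k * ldz;
    double worst = 0.0;
    for (int i = 0; i < n; ++i) {
        double s = 0.0;
        for (int j = 0; j < n; ++j)
            s += packed_upper_get(ap_copy, i, j) * zk[j];
        double r = s - w[k] * zk[i];
        if (r < 0.0) r = -r;
        if (r > worst) worst = r;
    }
    return worst;
}
```

A residual that is small relative to the magnitude of the eigenvalues indicates a consistent result; the sketch does not define a specific acceptance threshold.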

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, then the algorithm failed to converge; i indicates the number of elements of an intermediate
tridiagonal form which did not converge to zero.

?hpev
Computes all eigenvalues and, optionally,
eigenvectors of a Hermitian matrix in packed storage.

Syntax
lapack_int LAPACKE_chpev( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_complex_float* ap, float* w, lapack_complex_float* z, lapack_int ldz );
lapack_int LAPACKE_zhpev( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_complex_double* ap, double* w, lapack_complex_double* z, lapack_int ldz );

Include Files
• mkl.h

Description

The routine computes all the eigenvalues and, optionally, eigenvectors of a complex Hermitian matrix A in
packed storage.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

uplo Must be 'U' or 'L'.

If uplo = 'U', ap stores the packed upper triangular part of A.

If uplo = 'L', ap stores the packed lower triangular part of A.

n The order of the matrix A (n≥ 0).

ap Array ap contains the packed upper or lower triangle of Hermitian matrix A,
as specified by uplo.
The size of ap must be at least max(1, n*(n+1)/2).

ldz The leading dimension of the output array z.

Constraints:
if jobz = 'N', then ldz≥ 1;

if jobz = 'V', then ldz≥ max(1, n).
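Because only one triangle of a Hermitian matrix is stored, entries of the other triangle are recovered by conjugation, A(i,j) = conj(A(j,i)). The sketch below assumes column-major layout, the same packed indexing as in the real symmetric case, and C99 double complex in place of lapack_complex_double (it assumes both store the real part followed by the imaginary part, which you should confirm for your build). The function hermitian_packed_get is hypothetical.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Hypothetical helper: A(i,j), 0-based, of an n-by-n Hermitian matrix held in
 * column-major packed storage with the given uplo ('U' or 'L'). */
static double complex hermitian_packed_get(char uplo, int n,
                                           const double complex *ap, int i, int j)
{
    assert(uplo == 'U' || uplo == 'L');
    int stored_upper = (uplo == 'U');
    /* Stored triangle: i <= j for 'U', i >= j for 'L'. */
    if ((stored_upper && i <= j) || (!stored_upper && i >= j)) {
        return stored_upper ? ap[i + (size_t)j * (j + 1) / 2]
                            : ap[i + (size_t)j * (2 * n - j - 1) / 2];
    }
    /* Other triangle: A(i,j) = conj(A(j,i)); the swapped pair is stored. */
    return conj(hermitian_packed_get(uplo, n, ap, j, i));
}
```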

Output Parameters

w Array, size at least max(1, n).

If info = 0, w contains the eigenvalues of the matrix A in ascending order.

z Array z (size at least max(1, ldz*n)).

If jobz = 'V', then if info = 0, z contains the orthonormal eigenvectors
of the matrix A, with the i-th column of z holding the eigenvector associated
with w[i - 1].

If jobz = 'N', then z is not referenced.

ap On exit, this array is overwritten by the values generated during the
reduction to tridiagonal form. The elements of the diagonal and the
off-diagonal of the tridiagonal matrix overwrite the corresponding elements of
A.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, then the algorithm failed to converge; i indicates the number of elements of an intermediate
tridiagonal form which did not converge to zero.

?spevd
Uses divide and conquer algorithm to compute all
eigenvalues and (optionally) all eigenvectors of a real
symmetric matrix held in packed storage.

Syntax
lapack_int LAPACKE_sspevd (int matrix_layout, char jobz, char uplo, lapack_int n,
float* ap, float* w, float* z, lapack_int ldz);
lapack_int LAPACKE_dspevd (int matrix_layout, char jobz, char uplo, lapack_int n,
double* ap, double* w, double* z, lapack_int ldz);

Include Files
• mkl.h

Description

The routine computes all the eigenvalues, and optionally all the eigenvectors, of a real symmetric matrix A
(held in packed storage). In other words, it can compute the spectral factorization of A as:
$A = Z \Lambda Z^{T}$.
Here $\Lambda$ is a diagonal matrix whose diagonal elements are the eigenvalues $\lambda_i$, and Z is the orthogonal matrix
whose columns are the eigenvectors $z_i$. Thus,
$A z_i = \lambda_i z_i$ for i = 1, 2, ..., n.
If the eigenvectors are requested, then this routine uses a divide and conquer algorithm to compute
eigenvalues and eigenvectors. However, if only eigenvalues are required, then it uses the Pal-Walker-Kahan
variant of the QL or QR algorithm.
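The factorization can be checked by rebuilding A from the outputs: with 0-based indices, column k of z holds z_k and w[k] holds λ_k, so A(i,j) = Σ_k z(i,k)·w[k]·z(j,k). The sketch below assumes column-major z with leading dimension ldz and writes a full (not packed) result; the function rebuild_from_eigs is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: rebuild the full n-by-n column-major matrix
 * a = Z * diag(w) * Z^T from eigenvalues w and eigenvectors z (column k of z,
 * leading dimension ldz, is the eigenvector for w[k]). */
static void rebuild_from_eigs(int n, const double *w, const double *z, int ldz,
                              double *a, int lda)
{
    assert(ldz >= n && lda >= n);
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i) {
            double s = 0.0;
            for (int k = 0; k < n; ++k)
                s += z[i + (size_t)k * ldz] * w[k] * z[j + (size_t)k * ldz];
            a[i + (size_t)j * lda] = s;
        }
}
```

Comparing the rebuilt matrix with a copy of A saved before the call gives a simple end-to-end check of the spectral factorization.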

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

uplo Must be 'U' or 'L'.

If uplo = 'U', ap stores the packed upper triangular part of A.

If uplo = 'L', ap stores the packed lower triangular part of A.

n The order of the matrix A (n≥ 0).

ap Array ap contains the packed upper or lower triangle of symmetric matrix A, as
specified by uplo.
The size of ap must be at least max(1, n*(n+1)/2).

ldz The leading dimension of the output array z.

Constraints:
if jobz = 'N', then ldz≥ 1;

if jobz = 'V', then ldz≥ max(1, n).

Output Parameters

w, z Arrays:
w, size at least max(1, n).
If info = 0, w contains the eigenvalues of the matrix A in ascending order.
See also info.
z (size max(1, ldz*n)).
If jobz = 'V', then this array is overwritten by the orthogonal matrix Z
which contains the eigenvectors of A. If jobz = 'N', then z is not
referenced.

ap On exit, this array is overwritten by the values generated during the
reduction to tridiagonal form. The elements of the diagonal and the
off-diagonal of the tridiagonal matrix overwrite the corresponding elements of
A.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, then the algorithm failed to converge; i indicates the number of elements of an intermediate
tridiagonal form which did not converge to zero.

Application Notes
The computed eigenvalues and eigenvectors are exact for a matrix A+E such that $\|E\|_2 = O(\varepsilon)\,\|A\|_2$,
where ε is the machine precision.
The complex analogue of this routine is hpevd.
See also syevd for matrices held in full storage, and sbevd for banded matrices.

?hpevd
Uses divide and conquer algorithm to compute all
eigenvalues and, optionally, all eigenvectors of a
complex Hermitian matrix held in packed storage.

Syntax
lapack_int LAPACKE_chpevd( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_complex_float* ap, float* w, lapack_complex_float* z, lapack_int ldz );
lapack_int LAPACKE_zhpevd( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_complex_double* ap, double* w, lapack_complex_double* z, lapack_int ldz );

Include Files
• mkl.h

Description

The routine computes all the eigenvalues, and optionally all the eigenvectors, of a complex Hermitian matrix
A (held in packed storage). In other words, it can compute the spectral factorization of A as: $A = Z \Lambda Z^{H}$.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

uplo Must be 'U' or 'L'.

If uplo = 'U', ap stores the packed upper triangular part of A.

If uplo = 'L', ap stores the packed lower triangular part of A.

n The order of the matrix A (n≥ 0).

ap Array ap contains the packed upper or lower triangle of Hermitian matrix A, as
specified by uplo.
The dimension of ap must be at least max(1, n*(n+1)/2).

ldz The leading dimension of the output array z.

Constraints:
if jobz = 'N', then ldz≥ 1;

if jobz = 'V', then ldz≥ max(1, n).

Output Parameters

w Array, size at least max(1, n).

If info = 0, w contains the eigenvalues of the matrix A in ascending order.
See also info.

z Array, size 1 if jobz = 'N' and max(1, ldz*n) if jobz = 'V'.

If jobz = 'V', then this array is overwritten by the unitary matrix Z which
contains the eigenvectors of A.
If jobz = 'N', then z is not referenced.

ap On exit, this array is overwritten by the values generated during the
reduction to tridiagonal form. The elements of the diagonal and the
off-diagonal of the tridiagonal matrix overwrite the corresponding elements of
A.
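Since z holds a unitary matrix on exit when jobz = 'V', a cheap sanity check is that Z^H*Z is close to the identity. The sketch below assumes column-major layout and C99 double complex in place of lapack_complex_double (same memory layout assumed, to be confirmed for your build); unitarity_error is a hypothetical name, and it reports the largest absolute real or imaginary part of Z^H*Z - I.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Hypothetical helper: largest deviation of Z^H * Z from the identity, where
 * z is an n-by-n column-major complex array with leading dimension ldz. */
static double unitarity_error(int n, const double complex *z, int ldz)
{
    assert(ldz >= n);
    double worst = 0.0;
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i) {
            /* (Z^H Z)(i,j) = sum_k conj(Z(k,i)) * Z(k,j) */
            double complex acc = 0.0;
            for (int k = 0; k < n; ++k)
                acc += conj(z[k + (size_t)i * ldz]) * z[k + (size_t)j * ldz];
            double complex d = acc - (i == j ? 1.0 : 0.0);
            double re = creal(d) < 0.0 ? -creal(d) : creal(d);
            double im = cimag(d) < 0.0 ? -cimag(d) : cimag(d);
            double err = re > im ? re : im;
            if (err > worst) worst = err;
        }
    return worst;
}
```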

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, then the algorithm failed to converge; i indicates the number of elements of an intermediate
tridiagonal form which did not converge to zero.

Application Notes
The computed eigenvalues and eigenvectors are exact for a matrix A + E such that $\|E\|_2 = O(\varepsilon)\,\|A\|_2$,
where ε is the machine precision.
The real analogue of this routine is spevd.
See also heevd for matrices held in full storage, and hbevd for banded matrices.

?spevx
Computes selected eigenvalues and, optionally,
eigenvectors of a real symmetric matrix in packed
storage.

Syntax
lapack_int LAPACKE_sspevx (int matrix_layout, char jobz, char range, char uplo,
lapack_int n, float* ap, float vl, float vu, lapack_int il, lapack_int iu, float abstol,
lapack_int* m, float* w, float* z, lapack_int ldz, lapack_int* ifail);
lapack_int LAPACKE_dspevx (int matrix_layout, char jobz, char range, char uplo,
lapack_int n, double* ap, double vl, double vu, lapack_int il, lapack_int iu, double
abstol, lapack_int* m, double* w, double* z, lapack_int ldz, lapack_int* ifail);

Include Files
• mkl.h

Description

The routine computes selected eigenvalues and, optionally, eigenvectors of a real symmetric matrix A in
packed storage. Eigenvalues and eigenvectors can be selected by specifying either a range of values or a
range of indices for the desired eigenvalues.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

range Must be 'A' or 'V' or 'I'.

If range = 'A', the routine computes all eigenvalues.

If range = 'V', the routine computes eigenvalues w[i] in the half-open
interval: vl < w[i] ≤ vu.

If range = 'I', the routine computes eigenvalues with indices il to iu.

uplo Must be 'U' or 'L'.

If uplo = 'U', ap stores the packed upper triangular part of A.

If uplo = 'L', ap stores the packed lower triangular part of A.

n The order of the matrix A (n≥ 0).

ap Array ap contains the packed upper or lower triangle of the symmetric
matrix A, as specified by uplo.
The size of ap must be at least max(1, n*(n+1)/2).

vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
Constraint: vl< vu.

If range = 'A' or 'I', vl and vu are not referenced.

il, iu If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned.
Constraint: 1 ≤ il ≤ iu ≤ n, if n > 0; il = 1 and iu = 0 if n = 0.

If range = 'A' or 'V', il and iu are not referenced.

abstol The absolute error tolerance to which each eigenvalue is required. See
Application notes for details on error tolerance.

ldz The leading dimension of the output array z.

Constraints:
if jobz = 'N', then ldz≥ 1;

if jobz = 'V', then ldz≥ max(1, n) for column major layout and ldz≥
max(1, m) for row major layout.

Output Parameters

ap On exit, this array is overwritten by the values generated during the
reduction to tridiagonal form. The elements of the diagonal and the
off-diagonal of the tridiagonal matrix overwrite the corresponding elements of
A.

m The total number of eigenvalues found, 0 ≤ m ≤ n. If range = 'A', m = n;
if range = 'I', m = iu-il+1; and if range = 'V', the exact value of m is not
known in advance.

w, z Arrays:
w, size at least max(1, n).
If info = 0, contains the selected eigenvalues of the matrix A in ascending
order.
z (size max(1, ldz*m) for column major layout and max(1, ldz*n) for row
major layout).
If jobz = 'V', then if info = 0, the first m columns of z contain the
orthonormal eigenvectors of the matrix A corresponding to the selected
eigenvalues, with the i-th column of z holding the eigenvector associated
with w[i - 1].

If an eigenvector fails to converge, then that column of z contains the latest
approximation to the eigenvector, and the index of the eigenvector is
returned in ifail.
If jobz = 'N', then z is not referenced.

ifail Array, size at least max(1, n).

If jobz = 'V', then if info = 0, the first m elements of ifail are zero; if
info > 0, ifail contains the indices of the eigenvectors that failed to
converge.
If jobz = 'N', then ifail is not referenced.
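The selection arguments can be checked against the constraints above before the call, and the value m will take is known in advance except for range = 'V'. The sketch below is hypothetical: expected_eig_count and its return conventions (-1 for "not known in advance", -2 for arguments that violate the documented constraints) are illustrative, not part of the MKL interface.

```c
#include <assert.h>

/* Hypothetical helper: number of eigenvalues m that ?spevx is expected to
 * find, -1 when range = 'V' (m is not known in advance), or -2 when the
 * selection arguments violate the documented constraints. */
static int expected_eig_count(char range, int n, double vl, double vu,
                              int il, int iu)
{
    assert(n >= 0);
    switch (range) {
    case 'A':
        return n;
    case 'V':
        return vl < vu ? -1 : -2;
    case 'I':
        if (n == 0)
            return (il == 1 && iu == 0) ? 0 : -2;
        return (1 <= il && il <= iu && iu <= n) ? iu - il + 1 : -2;
    default:
        return -2;
    }
}
```

For range = 'V', size w and z for the worst case m = n, since the actual count is only available after the call.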

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, then i eigenvectors failed to converge; their indices are stored in the array ifail.

Application Notes
An approximate eigenvalue is accepted as converged when it is determined to lie in an interval [a,b] of width
less than or equal to abstol+ε*max(|a|,|b|), where ε is the machine precision.

If abstol is less than or equal to zero, then $\varepsilon \|T\|_1$ will be used in its place, where T is the tridiagonal
matrix obtained by reducing A to tridiagonal form. Eigenvalues will be computed most accurately when abstol
is set to twice the underflow threshold 2*?lamch('S'), not zero.

If this routine returns with info > 0, indicating that some eigenvectors did not converge, try setting abstol
to 2*?lamch('S').
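The tolerance rules above can be written out directly. In the sketch below, the tridiagonal matrix T is given by its diagonal d and off-diagonal e; these arrays are illustrative inputs, not outputs of ?spevx. ε is passed in by the caller, and ?lamch('S') is assumed to equal DBL_MIN, which is the case for the reference LAPACK dlamch on IEEE double-precision hardware but should be confirmed for your library. All function names are hypothetical.

```c
#include <assert.h>
#include <float.h>

static double absd(double x) { return x < 0.0 ? -x : x; }

/* One-norm (largest absolute column sum) of the symmetric tridiagonal matrix
 * T with diagonal d[0..n-1] and off-diagonal e[0..n-2]. */
static double tridiag_one_norm(int n, const double *d, const double *e)
{
    assert(n >= 0);
    double best = 0.0;
    for (int j = 0; j < n; ++j) {
        double col = absd(d[j]);
        if (j > 0) col += absd(e[j - 1]);
        if (j < n - 1) col += absd(e[j]);
        if (col > best) best = col;
    }
    return best;
}

/* Tolerance actually applied: abstol if positive, otherwise eps * ||T||_1. */
static double effective_abstol(double abstol, double eps, int n,
                               const double *d, const double *e)
{
    return abstol > 0.0 ? abstol : eps * tridiag_one_norm(n, d, e);
}

/* Nonzero if an eigenvalue bracketed by [a, b] counts as converged:
 * b - a <= tol + eps * max(|a|, |b|). */
static int interval_converged(double a, double b, double tol, double eps)
{
    double scale = absd(a) > absd(b) ? absd(a) : absd(b);
    return (b - a) <= tol + eps * scale;
}

/* Suggested abstol, 2 * ?lamch('S'); assumes dlamch('S') == DBL_MIN. */
static double suggested_abstol(void) { return 2.0 * DBL_MIN; }
```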

?hpevx
Computes selected eigenvalues and, optionally,
eigenvectors of a Hermitian matrix in packed storage.

Syntax
lapack_int LAPACKE_chpevx( int matrix_layout, char jobz, char range, char uplo,
lapack_int n, lapack_complex_float* ap, float vl, float vu, lapack_int il, lapack_int
iu, float abstol, lapack_int* m, float* w, lapack_complex_float* z, lapack_int ldz,
lapack_int* ifail );
lapack_int LAPACKE_zhpevx( int matrix_layout, char jobz, char range, char uplo,
lapack_int n, lapack_complex_double* ap, double vl, double vu, lapack_int il,
lapack_int iu, double abstol, lapack_int* m, double* w, lapack_complex_double* z,
lapack_int ldz, lapack_int* ifail );

Include Files
• mkl.h

Description

The routine computes selected eigenvalues and, optionally, eigenvectors of a complex Hermitian matrix A in
packed storage. Eigenvalues and eigenvectors can be selected by specifying either a range of values or a
range of indices for the desired eigenvalues.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

range Must be 'A' or 'V' or 'I'.

If range = 'A', the routine computes all eigenvalues.

If range = 'V', the routine computes eigenvalues w[i] in the half-open
interval: vl < w[i] ≤ vu.

If range = 'I', the routine computes eigenvalues with indices il to iu.

uplo Must be 'U' or 'L'.

If uplo = 'U', ap stores the packed upper triangular part of A.

If uplo = 'L', ap stores the packed lower triangular part of A.

n The order of the matrix A (n≥ 0).

ap Array ap contains the packed upper or lower triangle of the Hermitian
matrix A, as specified by uplo.
The size of ap must be at least max(1, n*(n+1)/2).

vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
Constraint: vl< vu.

If range = 'A' or 'I', vl and vu are not referenced.

il, iu If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned.
Constraint: 1 ≤ il ≤ iu ≤ n, if n > 0; il = 1 and iu = 0 if n = 0.

If range = 'A' or 'V', il and iu are not referenced.

abstol The absolute error tolerance to which each eigenvalue is required. See
Application notes for details on error tolerance.

ldz The leading dimension of the output array z.

Constraints:
if jobz = 'N', then ldz≥ 1;

if jobz = 'V', then ldz≥ max(1, n) for column major layout and ldz≥
max(1, m) for row major layout.
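The layout-dependent rules for ldz and for the size of z can be collected in one helper. The sketch below is hypothetical: is_row_major stands in for comparing matrix_layout with LAPACK_ROW_MAJOR, and mmax is an upper bound on m (use n when range = 'V', because m is then unknown in advance).

```c
#include <assert.h>
#include <stddef.h>

static int imax(int a, int b) { return a > b ? a : b; }

/* Hypothetical helper: smallest valid ldz for ?hpevx. */
static int min_ldz(int is_row_major, char jobz, int n, int mmax)
{
    assert(jobz == 'N' || jobz == 'V');
    if (jobz == 'N') return 1;
    return is_row_major ? imax(1, mmax) : imax(1, n);
}

/* Number of elements to allocate for z: max(1, ldz*m) for column major
 * layout and max(1, ldz*n) for row major layout, with m bounded by mmax. */
static size_t z_elements(int is_row_major, int ldz, int n, int mmax)
{
    size_t count = (size_t)ldz * (size_t)(is_row_major ? n : mmax);
    return count > 0 ? count : 1;
}
```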

Output Parameters

ap On exit, this array is overwritten by the values generated during the
reduction to tridiagonal form. The elements of the diagonal and the
off-diagonal of the tridiagonal matrix overwrite the corresponding elements of
A.

m The total number of eigenvalues found, 0 ≤ m ≤ n. If range = 'A', m = n;
if range = 'I', m = iu-il+1; and if range = 'V', the exact value of m is not
known in advance.

w Array, size at least max(1, n).

If info = 0, contains the selected eigenvalues of the matrix A in ascending
order.

z Array z (size max(1, ldz*m) for column major layout and max(1, ldz*n) for
row major layout).
If jobz = 'V', then if info = 0, the first m columns of z contain the
orthonormal eigenvectors of the matrix A corresponding to the selected
eigenvalues, with the i-th column of z holding the eigenvector associated
with w[i - 1].

If an eigenvector fails to converge, then that column of z contains the latest
approximation to the eigenvector, and the index of the eigenvector is
returned in ifail.
If jobz = 'N', then z is not referenced.

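ifail Array, size at least max(1, n).

If jobz = 'V', then if info = 0, the first m elements of ifail are zero; if
info > 0, ifail contains the indices of the eigenvectors that failed to
converge.
If jobz = 'N', then ifail is not referenced.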

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, then i eigenvectors failed to converge; their indices are stored in the array ifail.

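Application Notes
An approximate eigenvalue is accepted as converged when it is determined to lie in an interval [a,b] of width
less than or equal to abstol+ε*max(|a|,|b|), where ε is the machine precision.

If abstol is less than or equal to zero, then $\varepsilon \|T\|_1$ will be used in its place, where T is the tridiagonal
matrix obtained by reducing A to tridiagonal form. Eigenvalues will be computed most accurately when abstol
is set to twice the underflow threshold 2*?lamch('S'), not zero.

If this routine returns with info > 0, indicating that some eigenvectors did not converge, try setting abstol
to 2*?lamch('S').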

?sbev
Computes all eigenvalues and, optionally,
eigenvectors of a real symmetric band matrix.

Syntax
lapack_int LAPACKE_ssbev (int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int kd, float* ab, lapack_int ldab, float* w, float* z, lapack_int ldz);
lapack_int LAPACKE_dsbev (int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int kd, double* ab, lapack_int ldab, double* w, double* z, lapack_int ldz);

Include Files
• mkl.h

Description

The routine computes all eigenvalues and, optionally, eigenvectors of a real symmetric band matrix A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

uplo Must be 'U' or 'L'.

If uplo = 'U', ab stores the upper triangular part of A.

If uplo = 'L', ab stores the lower triangular part of A.

n The order of the matrix A (n≥ 0).

kd The number of super- or sub-diagonals in A (kd ≥ 0).

ab ab (size at least max(1, ldab*n) for column major layout and at least
max(1, ldab*(kd + 1)) for row major layout) is an array containing either
upper or lower triangular part of the symmetric matrix A (as specified by
uplo) in band storage format.

ldab The leading dimension of ab; must be at least kd + 1 for column major
layout and n for row major layout.

ldz The leading dimension of the output array z.

Constraints:
if jobz = 'N', then ldz≥ 1;

if jobz = 'V', then ldz≥ max(1, n).
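The band storage format used for ab follows the LAPACK convention. For column-major layout and 0-based indices, A(i,j) is stored at ab[(kd + i - j) + j*ldab] for uplo = 'U' (max(0, j - kd) ≤ i ≤ j) and at ab[(i - j) + j*ldab] for uplo = 'L' (j ≤ i ≤ min(n - 1, j + kd)). The sketch below packs a full symmetric matrix that way; pack_symmetric_band is a hypothetical name, and row-major band storage is not covered.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy the band of a full n-by-n column-major symmetric
 * matrix a (leading dimension lda) into column-major band storage ab with
 * kd super-/sub-diagonals and leading dimension ldab >= kd + 1. */
static void pack_symmetric_band(char uplo, int n, int kd, const double *a,
                                int lda, double *ab, int ldab)
{
    assert((uplo == 'U' || uplo == 'L') && kd >= 0 && ldab >= kd + 1);
    for (int j = 0; j < n; ++j) {
        int first = (uplo == 'U') ? (j - kd > 0 ? j - kd : 0) : j;
        int last  = (uplo == 'U') ? j : (j + kd < n - 1 ? j + kd : n - 1);
        /* Row of A(i,j) inside column j of ab is i + shift. */
        int shift = (uplo == 'U') ? kd - j : -j;
        for (int i = first; i <= last; ++i)
            ab[(i + shift) + (size_t)j * ldab] = a[i + (size_t)j * lda];
    }
}
```

Elements of ab outside the band (for example ab[0] when uplo = 'U') are left untouched by the helper.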

Output Parameters

w, z Arrays:
w, size at least max(1, n).
If info = 0, contains the eigenvalues of the matrix A in ascending order.

z (size max(1, ldz*n)).

If jobz = 'V', then if info = 0, z contains the orthonormal eigenvectors
of the matrix A, with the i-th column of z holding the eigenvector associated
with w[i - 1].

If jobz = 'N', then z is not referenced.

ab On exit, this array is overwritten by the values generated during the
reduction to tridiagonal form (see the description of ?sbtrd).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, then the algorithm failed to converge; i indicates the number of elements of an intermediate
tridiagonal form which did not converge to zero.

?hbev
Computes all eigenvalues and, optionally,
eigenvectors of a Hermitian band matrix.

Syntax
lapack_int LAPACKE_chbev( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int kd, lapack_complex_float* ab, lapack_int ldab, float* w,
lapack_complex_float* z, lapack_int ldz );
lapack_int LAPACKE_zhbev( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int kd, lapack_complex_double* ab, lapack_int ldab, double* w,
lapack_complex_double* z, lapack_int ldz );

Include Files
• mkl.h

Description

The routine computes all eigenvalues and, optionally, eigenvectors of a complex Hermitian band matrix A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

uplo Must be 'U' or 'L'.

If uplo = 'U', ab stores the upper triangular part of A.

If uplo = 'L', ab stores the lower triangular part of A.

n The order of the matrix A (n≥ 0).

kd The number of super- or sub-diagonals in A (kd ≥ 0).

ab ab (size at least max(1, ldab*n) for column major layout and at least
max(1, ldab*(kd + 1)) for row major layout) is an array containing either
upper or lower triangular part of the Hermitian matrix A (as specified by
uplo) in band storage format.

ldab The leading dimension of ab; must be at least kd + 1 for column major
layout and n for row major layout.

ldz The leading dimension of the output array z.

Constraints:
if jobz = 'N', then ldz≥ 1;

if jobz = 'V', then ldz≥ max(1, n).
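Reading a Hermitian band matrix back from ab needs both the band test and conjugation. The sketch below assumes column-major layout, uplo = 'L' (diagonal in row 0 of each column of ab, element A(i,j) at ab[(i - j) + j*ldab]), and C99 double complex in place of lapack_complex_double (same memory layout assumed); hb_lower_get is a hypothetical name.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Hypothetical helper: A(i,j), 0-based, of a Hermitian band matrix stored in
 * column-major band format with uplo = 'L'. Entries outside the band are
 * zero; the upper triangle is obtained as A(i,j) = conj(A(j,i)). */
static double complex hb_lower_get(int kd, const double complex *ab, int ldab,
                                   int i, int j)
{
    assert(kd >= 0 && ldab >= kd + 1);
    if (i < j)
        return conj(hb_lower_get(kd, ab, ldab, j, i));
    if (i - j > kd)
        return 0.0;
    return ab[(i - j) + (size_t)j * ldab];
}
```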

Output Parameters

w Array, size at least max(1, n).

If info = 0, contains the eigenvalues in ascending order.

z Array z (size max(1, ldz*n)).

If jobz = 'V', then if info = 0, z contains the orthonormal eigenvectors
of the matrix A, with the i-th column of z holding the eigenvector associated
with w[i - 1].

If jobz = 'N', then z is not referenced.

ab On exit, this array is overwritten by the values generated during the
reduction to tridiagonal form (see the description of ?hbtrd).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, then the algorithm failed to converge; i indicates the number of elements of an intermediate
tridiagonal form which did not converge to zero.

?sbevd
Computes all eigenvalues and, optionally, all
eigenvectors of a real symmetric band matrix using
divide and conquer algorithm.

Syntax
lapack_int LAPACKE_ssbevd (int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int kd, float* ab, lapack_int ldab, float* w, float* z, lapack_int ldz);
lapack_int LAPACKE_dsbevd (int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int kd, double* ab, lapack_int ldab, double* w, double* z, lapack_int ldz);

Include Files
• mkl.h

Description

The routine computes all the eigenvalues, and optionally all the eigenvectors, of a real symmetric band
matrix A. In other words, it can compute the spectral factorization of A as:
$A = Z \Lambda Z^{T}$.
Here $\Lambda$ is a diagonal matrix whose diagonal elements are the eigenvalues $\lambda_i$, and Z is the orthogonal matrix
whose columns are the eigenvectors $z_i$. Thus,
$A z_i = \lambda_i z_i$ for i = 1, 2, ..., n.

If the eigenvectors are requested, then this routine uses a divide and conquer algorithm to compute
eigenvalues and eigenvectors. However, if only eigenvalues are required, then it uses the Pal-Walker-Kahan
variant of the QL or QR algorithm.
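The relation A*z_i = λ_i*z_i can be checked without expanding the band matrix to full storage. The sketch below assumes column-major layout, uplo = 'U' with the band index ab[(kd + i - j) + j*ldab], and a copy of ab saved before the call, since ab is overwritten on exit. The function names sb_upper_matvec and sb_residual are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* y = A*x for a symmetric band matrix in column-major upper band storage. */
static void sb_upper_matvec(int n, int kd, const double *ab, int ldab,
                            const double *x, double *y)
{
    assert(kd >= 0 && ldab >= kd + 1);
    for (int i = 0; i < n; ++i) y[i] = 0.0;
    for (int j = 0; j < n; ++j) {
        int first = j - kd > 0 ? j - kd : 0;
        for (int i = first; i <= j; ++i) {
            double aij = ab[(kd + i - j) + (size_t)j * ldab];
            y[i] += aij * x[j];              /* stored entry A(i,j) */
            if (i != j) y[j] += aij * x[i];  /* mirrored entry A(j,i) */
        }
    }
}

/* Largest absolute entry of A*z - lambda*z; y is scratch of length n. */
static double sb_residual(int n, int kd, const double *ab, int ldab,
                          double lambda, const double *z, double *y)
{
    sb_upper_matvec(n, kd, ab, ldab, z, y);
    double worst = 0.0;
    for (int i = 0; i < n; ++i) {
        double r = y[i] - lambda * z[i];
        if (r < 0.0) r = -r;
        if (r > worst) worst = r;
    }
    return worst;
}
```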

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

uplo Must be 'U' or 'L'.

If uplo = 'U', ab stores the upper triangular part of A.

If uplo = 'L', ab stores the lower triangular part of A.

n The order of the matrix A (n≥ 0).

kd The number of super- or sub-diagonals in A (kd ≥ 0).

ab ab (size at least max(1, ldab*n) for column major layout and at least
max(1, ldab*(kd + 1)) for row major layout) is an array containing either
upper or lower triangular part of the symmetric matrix A (as specified by
uplo) in band storage format.

ldab The leading dimension of ab; must be at least kd+1 for column major
layout and n for row major layout.

ldz The leading dimension of the output array z.

Constraints:
if jobz = 'N', then ldz≥ 1;

if jobz = 'V', then ldz≥ max(1, n).

Output Parameters

w, z Arrays:
w, size at least max(1, n).
If info = 0, w contains the eigenvalues of the matrix A in ascending order.
See also info.
z (size max(1, ldz*n) if jobz = 'V' and at least 1 if jobz = 'N').
If jobz = 'V', then this array is overwritten by the orthogonal matrix Z
which contains the eigenvectors of A. The i-th column of Z contains the
eigenvector which corresponds to the eigenvalue w[i - 1].

If jobz = 'N', then z is not referenced.

ab On exit, this array is overwritten by the values generated during the
reduction to tridiagonal form.

Return Values
This function returns a value info.

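If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, then the algorithm failed to converge; i indicates the number of elements of an intermediate
tridiagonal form which did not converge to zero.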

Application Notes
The computed eigenvalues and eigenvectors are exact for a matrix A+E such that $\|E\|_2 = O(\varepsilon)\,\|A\|_2$,
where ε is the machine precision.
The complex analogue of this routine is hbevd.
See also syevd for matrices held in full storage, and spevd for matrices held in packed storage.

?hbevd
Computes all eigenvalues and, optionally, all
eigenvectors of a complex Hermitian band matrix
using divide and conquer algorithm.

Syntax
lapack_int LAPACKE_chbevd( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int kd, lapack_complex_float* ab, lapack_int ldab, float* w,
lapack_complex_float* z, lapack_int ldz );
lapack_int LAPACKE_zhbevd( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int kd, lapack_complex_double* ab, lapack_int ldab, double* w,
lapack_complex_double* z, lapack_int ldz );

Include Files
• mkl.h

Description

The routine computes all the eigenvalues, and optionally all the eigenvectors, of a complex Hermitian band
matrix A. In other words, it can compute the spectral factorization of A as: $A = Z \Lambda Z^{H}$.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

uplo Must be 'U' or 'L'.

If uplo = 'U', ab stores the upper triangular part of A.

If uplo = 'L', ab stores the lower triangular part of A.

n The order of the matrix A (n≥ 0).

kd The number of super- or sub-diagonals in A (kd ≥ 0).

ab ab (size at least max(1, ldab*n) for column major layout and at least
max(1, ldab*(kd + 1)) for row major layout) is an array containing either
upper or lower triangular part of the Hermitian matrix A (as specified by
uplo) in band storage format.

ldab The leading dimension of ab; must be at least kd+1 for column major
layout and n for row major layout.

ldz The leading dimension of the output array z.

Constraints:
if jobz = 'N', then ldz≥ 1;

if jobz = 'V', then ldz≥ max(1, n).

Output Parameters

w Array, size at least max(1, n).

If info = 0, w contains the eigenvalues of the matrix A in ascending order.
See also info.

z Array, size max(1, ldz*n) if jobz = 'V' and at least 1 if jobz = 'N'.

If jobz = 'V', then this array is overwritten by the unitary matrix Z which
contains the eigenvectors of A. The i-th column of Z contains the
eigenvector which corresponds to the eigenvalue w[i - 1].

If jobz = 'N', then z is not referenced.

ab On exit, this array is overwritten by the values generated during the
reduction to tridiagonal form.

Return Values
This function returns a value info.

If info=0, the execution is successful.

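If info = -i, the i-th parameter had an illegal value.

If info = i, then the algorithm failed to converge; i indicates the number of elements of an intermediate
tridiagonal form which did not converge to zero.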

Application Notes
The computed eigenvalues and eigenvectors are exact for a matrix A + E such that $\|E\|_2 = O(\varepsilon)\,\|A\|_2$,
where ε is the machine precision.
The real analogue of this routine is sbevd.
See also heevd for matrices held in full storage, and hpevd for matrices held in packed storage.
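For a Hermitian (or real symmetric) matrix, ||A||_2 is the largest eigenvalue magnitude, and because w is returned in ascending order it equals max(|w[0]|, |w[n - 1]|). That gives a cheap estimate of the scale ε·||A||_2 in the error bound above. The constant hidden in O(ε) is not stated here, so the sketch returns only the scale; error_scale is a hypothetical name and ε is supplied by the caller.

```c
#include <assert.h>

/* Hypothetical helper: eps * ||A||_2 for a Hermitian or real symmetric matrix,
 * computed from its eigenvalues w[0..n-1] sorted in ascending order. */
static double error_scale(int n, const double *w, double eps)
{
    assert(n >= 0);
    if (n == 0) return 0.0;
    double lo = w[0] < 0.0 ? -w[0] : w[0];
    double hi = w[n - 1] < 0.0 ? -w[n - 1] : w[n - 1];
    return eps * (lo > hi ? lo : hi);
}
```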

?sbevx
Computes selected eigenvalues and, optionally,
eigenvectors of a real symmetric band matrix.

Syntax
lapack_int LAPACKE_ssbevx (int matrix_layout, char jobz, char range, char uplo,
lapack_int n, lapack_int kd, float* ab, lapack_int ldab, float* q, lapack_int ldq, float
vl, float vu, lapack_int il, lapack_int iu, float abstol, lapack_int* m, float* w,
float* z, lapack_int ldz, lapack_int* ifail);
lapack_int LAPACKE_dsbevx (int matrix_layout, char jobz, char range, char uplo,
lapack_int n, lapack_int kd, double* ab, lapack_int ldab, double* q, lapack_int ldq,
double vl, double vu, lapack_int il, lapack_int iu, double abstol, lapack_int* m,
double* w, double* z, lapack_int ldz, lapack_int* ifail);

Include Files
• mkl.h

Description

The routine computes selected eigenvalues and, optionally, eigenvectors of a real symmetric band matrix A.
Eigenvalues and eigenvectors can be selected by specifying either a range of values or a range of indices for
the desired eigenvalues.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

range Must be 'A' or 'V' or 'I'.

If range = 'A', the routine computes all eigenvalues.

If range = 'V', the routine computes eigenvalues w[i] in the half-open
interval: vl < w[i] ≤ vu.

If range = 'I', the routine computes eigenvalues with indices in range il
to iu.

uplo Must be 'U' or 'L'.

If uplo = 'U', ab stores the upper triangular part of A.

If uplo = 'L', ab stores the lower triangular part of A.

n The order of the matrix A (n≥ 0).

kd The number of super- or sub-diagonals in A (kd ≥ 0).

ab Array ab (size at least max(1, ldab*n) for column major layout and at least
max(1, ldab*(kd + 1)) for row major layout) contains either upper or
lower triangular part of the symmetric matrix A (as specified by uplo) in
band storage format.

ldab The leading dimension of ab; must be at least kd + 1 for column major
layout and n for row major layout.

vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
Constraint: vl< vu.

If range = 'A' or 'I', vl and vu are not referenced.

il, iu If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned.
Constraint: 1 ≤ il ≤ iu ≤ n, if n > 0; il = 1 and iu = 0 if n = 0.

If range = 'A' or 'V', il and iu are not referenced.

abstol The absolute error tolerance to which each eigenvalue is required. See
Application notes for details on error tolerance.

ldq, ldz The leading dimensions of the output arrays q and z, respectively.
Constraints:
ldq≥ 1, ldz≥ 1;
If jobz = 'V', then ldq≥ max(1, n) and ldz≥ max(1, n) for column
major layout and ldz≥ max(1, m) for row major layout.
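With range = 'V' the search interval is half-open: an eigenvalue equal to vl is excluded and one equal to vu is included. The sketch below applies that rule to eigenvalues already sorted in ascending order (for example, w from a range = 'A' call) to estimate which ones a range = 'V' call would select; values within rounding error of vl or vu may be classified differently by the routine itself. count_half_open is a hypothetical name.

```c
#include <assert.h>

/* Hypothetical helper: number of entries of ascending w[0..n-1] that lie in
 * the half-open interval (vl, vu]; *first receives the 0-based index of the
 * first such entry, or -1 if there is none. */
static int count_half_open(int n, const double *w, double vl, double vu,
                           int *first)
{
    assert(vl < vu);
    int count = 0;
    *first = -1;
    for (int k = 0; k < n; ++k) {
        if (vl < w[k] && w[k] <= vu) {
            if (count == 0) *first = k;
            ++count;
        }
    }
    return count;
}
```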

Output Parameters

q Array, size max(1, ldq*n).

If jobz = 'V', q contains the n-by-n orthogonal matrix used in the reduction
to tridiagonal form.
If jobz = 'N', the array q is not referenced.

m The total number of eigenvalues found, 0 ≤ m ≤ n.

If range = 'A', m = n; if range = 'I', m = iu - il + 1; and if range = 'V',
the exact value of m is not known in advance.
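Because m is not known in advance for range = 'V', a caller typically sizes w and z for the worst case (m = n). The sketch below only illustrates the selection rules stated above: which values of a sorted spectrum fall in the half-open interval (vl, vu], and the count iu - il + 1 for range = 'I'. Both functions are hypothetical illustrations; the routine does not determine m this way.

```c
/* Hypothetical illustration: number of entries of w[0..n-1] with
 * vl < w[i] <= vu, i.e., the eigenvalues that range = 'V' would select. */
int count_in_half_open(const double *w, int n, double vl, double vu)
{
    int m = 0;
    for (int i = 0; i < n; ++i)
        if (vl < w[i] && w[i] <= vu)   /* lower end open, upper end closed */
            ++m;
    return m;
}

/* Hypothetical illustration: m for range = 'I' with 1-based il <= iu. */
int count_by_index(int il, int iu)
{
    return iu - il + 1;
}
```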

w, z Arrays:
w, size at least max(1, n). The first m elements of w contain the selected
eigenvalues of the matrix A in ascending order.
z (size at least max(1, ldz*m) for column major layout and max(1, ldz*n)
for row major layout).
If jobz = 'V', then if info = 0, the first m columns of z contain the
orthonormal eigenvectors of the matrix A corresponding to the selected
eigenvalues, with the i-th column of z holding the eigenvector associated
with w[i - 1].

If an eigenvector fails to converge, then that column of z contains the latest
approximation to the eigenvector, and the index of the eigenvector is
returned in ifail.
If jobz = 'N', then z is not referenced.
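To make the column convention concrete, the sketch below reads component k of the eigenvector associated with w[i - 1] from z in either layout. It assumes column major storage places element (r, c) at z[r + c*ldz] and row major storage at z[r*ldz + c], which matches the ldz constraints above (ldz ≥ n versus ldz ≥ m). eigvec_component is a hypothetical helper, and int stands in for lapack_int.

```c
#include <stddef.h>

/* Hypothetical helper: component k (0-based) of the eigenvector associated
 * with w[i - 1], i.e., column i of z (1-based i, as in the text).
 * Column major: element (k, i-1) is z[k + (i-1)*ldz], ldz >= n.
 * Row major:    element (k, i-1) is z[k*ldz + (i-1)], ldz >= m. */
double eigvec_component(const double *z, int ldz, int row_major, int i, int k)
{
    size_t col = (size_t)(i - 1);
    return row_major ? z[(size_t)k * ldz + col]
                     : z[(size_t)k + col * ldz];
}
```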

ab On exit, this array is overwritten by the values generated during the
reduction to tridiagonal form.

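ifail Array, size at least max(1, n).

If jobz = 'V', then if info = 0, the first m elements of ifail are zero; if
info > 0, ifail contains the indices of the eigenvectors that failed to
converge.
If jobz = 'N', then ifail is not referenced.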

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, then i eigenvectors failed to converge; their indices are stored in the array ifail.

Application Notes
If abstol is less than or equal to zero, then ε*||T||1 is used as the tolerance, where ||T||1 is the 1-norm of the
tridiagonal matrix T obtained by reducing A to tridiagonal form. Eigenvalues are computed most accurately when
abstol is set to twice the underflow threshold, 2*?lamch('S'), not zero.

If this routine returns with info > 0, indicating that some eigenvectors did not converge, try setting abstol
to 2*?lamch('S').
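A minimal sketch of the recommended tolerance 2*?lamch('S'). It assumes IEEE-754 arithmetic, where the safe minimum returned by ?lamch('S') coincides with FLT_MIN/DBL_MIN from <float.h>; if your LAPACKE build exposes LAPACKE_slamch/LAPACKE_dlamch, calling them is the more faithful choice. The function names are hypothetical.

```c
#include <float.h>

/* Hypothetical helper: 2*dlamch('S') for double precision.
 * ASSUMPTION: on IEEE-754 hardware the safe minimum equals DBL_MIN. */
double recommended_abstol_double(void)
{
    return 2.0 * DBL_MIN;
}

/* Single-precision counterpart, assuming slamch('S') == FLT_MIN. */
float recommended_abstol_float(void)
{
    return 2.0f * FLT_MIN;
}
```

The returned value would be passed as the abstol argument of LAPACKE_dsbevx or LAPACKE_ssbevx, respectively.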

?hbevx
Computes selected eigenvalues and, optionally,
eigenvectors of a Hermitian band matrix.

Syntax
lapack_int LAPACKE_chbevx( int matrix_layout, char jobz, char range, char uplo,
lapack_int n, lapack_int kd, lapack_complex_float* ab, lapack_int ldab,
lapack_complex_float* q, lapack_int ldq, float vl, float vu, lapack_int il, lapack_int
iu, float abstol, lapack_int* m, float* w, lapack_complex_float* z, lapack_int ldz,
lapack_int* ifail );
lapack_int LAPACKE_zhbevx( int matrix_layout, char jobz, char range, char uplo,
lapack_int n, lapack_int kd, lapack_complex_double* ab, lapack_int ldab,
lapack_complex_double* q, lapack_int ldq, double vl, double vu, lapack_int il,
lapack_int iu, double abstol, lapack_int* m, double* w, lapack_complex_double* z,
lapack_int ldz, lapack_int* ifail );

Include Files
• mkl.h

Description

The routine computes selected eigenvalues and, optionally, eigenvectors of a complex Hermitian band matrix
A. Eigenvalues and eigenvectors can be selected by specifying either a range of values or a range of indices
for the desired eigenvalues.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

range Must be 'A' or 'V' or 'I'.

If range = 'A', the routine computes all eigenvalues.

If range = 'V', the routine computes eigenvalues w[i] in the half-open interval vl < w[i] ≤ vu.

If range = 'I', the routine computes eigenvalues with indices il to iu.

uplo Must be 'U' or 'L'.

If uplo = 'U', ab stores the upper triangular part of A.

If uplo = 'L', ab stores the lower triangular part of A.

n The order of the matrix A (n≥ 0).

kd The number of super- or sub-diagonals in A (kd ≥ 0).

ab Array ab (size at least max(1, ldab*n) for column major layout and at least
max(1, ldab*(kd + 1)) for row major layout) contains either the upper or the
lower triangular part of the Hermitian matrix A (as specified by uplo) in
band storage format.

ldab The leading dimension of ab; must be at least kd + 1 for column major
layout and n for row major layout.

vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
Constraint: vl< vu.

If range = 'A' or 'I', vl and vu are not referenced.

il, iu If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned.
Constraint: 1 ≤ il ≤ iu ≤ n, if n > 0; il = 1 and iu = 0 if n = 0.

If range = 'A' or 'V', il and iu are not referenced.


abstol The absolute error tolerance to which each eigenvalue is required. See
Application notes for details on error tolerance.

ldq, ldz The leading dimensions of the output arrays q and z, respectively.
Constraints:
ldq ≥ 1, ldz ≥ 1;
If jobz = 'V', then ldq ≥ max(1, n) and ldz ≥ max(1, n) for column major
layout and ldz ≥ max(1, m) for row major layout.

Output Parameters

q Array, size max(1, ldq*n).

If jobz = 'V', q contains the n-by-n unitary matrix used in the reduction to
tridiagonal form.
If jobz = 'N', the array q is not referenced.

m The total number of eigenvalues found, 0 ≤ m ≤ n.
If range = 'A', m = n; if range = 'I', m = iu - il + 1; and if range = 'V',
the exact value of m is not known in advance.

w Array, size at least max(1, n). The first m elements contain the selected
eigenvalues of the matrix A in ascending order.

z Array z (size at least max(1, ldz*m) for column major layout and max(1,
ldz*n) for row major layout).
If jobz = 'V', then if info = 0, the first m columns of z contain the
orthonormal eigenvectors of the matrix A corresponding to the selected
eigenvalues, with the i-th column of z holding the eigenvector associated
with w[i - 1].

If an eigenvector fails to converge, then that column of z contains the latest
approximation to the eigenvector, and the index of the eigenvector is
returned in ifail.
If jobz = 'N', then z is not referenced.

ab On exit, this array is overwritten by the values generated during the
reduction to tridiagonal form.
If uplo = 'U', the first superdiagonal and the diagonal of the tridiagonal
matrix T are returned in rows kd and kd+1 of ab, and if uplo = 'L', the
diagonal and first subdiagonal of T are returned in the first two rows of ab.

ifail Array, size at least max(1, n).

If jobz = 'V', then if info = 0, the first m elements of ifail are zero; if
info > 0, ifail contains the indices of the eigenvectors that failed to
converge.
If jobz = 'N', then ifail is not referenced.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, then i eigenvectors failed to converge; their indices are stored in the array ifail.

Application Notes
An approximate eigenvalue is accepted as converged when it is determined to lie in an interval [a,b] of width
less than or equal to abstol + ε * max( |a|,|b| ), where ε is the machine precision.

If this routine returns with info > 0, indicating that some eigenvectors did not converge, try setting abstol
to 2*?lamch('S').
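The acceptance criterion above can be written as a one-line predicate. This is a sketch of the stated rule only, not MKL's internal implementation; interval_converged is hypothetical, and the caller supplies ε (for example DBL_EPSILON, an assumption about which definition of machine precision applies).

```c
/* Absolute value without <math.h>, to keep the sketch dependency-free. */
static double abs_val(double x) { return x < 0.0 ? -x : x; }

/* Hypothetical predicate for the acceptance rule in the Application Notes:
 * an eigenvalue known to lie in [a, b] is treated as converged when
 *     b - a <= abstol + eps * max(|a|, |b|). */
int interval_converged(double a, double b, double abstol, double eps)
{
    double scale = abs_val(a) > abs_val(b) ? abs_val(a) : abs_val(b);
    return (b - a) <= abstol + eps * scale;
}
```

The relative term eps * max(|a|, |b|) is why a tiny positive abstol such as 2*?lamch('S') still lets large eigenvalues converge: for them the relative term dominates.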

?stev
Computes all eigenvalues and, optionally,
eigenvectors of a real symmetric tridiagonal matrix.

Syntax
lapack_int LAPACKE_sstev (int matrix_layout, char jobz, lapack_int n, float* d, float*
e, float* z, lapack_int ldz);
lapack_int LAPACKE_dstev (int matrix_layout, char jobz, lapack_int n, double* d,
double* e, double* z, lapack_int ldz);

Include Files
• mkl.h

Description

The routine computes all eigenvalues and, optionally, eigenvectors of a real symmetric tridiagonal matrix A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

n The order of the matrix A (n≥ 0).

d, e Arrays:
Array d contains the n diagonal elements of the tridiagonal matrix A.

The size of d must be at least max(1, n).

Array e contains the n-1 subdiagonal elements of the tridiagonal matrix A.

The size of e must be at least max(1, n). The n-th element of this array is
used as workspace.

ldz The leading dimension of the output array z; ldz ≥ 1. If jobz = 'V', then
ldz ≥ max(1, n).

Output Parameters

d On exit, if info = 0, contains the eigenvalues of the matrix A in ascending
order.

z Array, size max(1, ldz*n).

If jobz = 'V', then if info = 0, z contains the orthonormal eigenvectors
of the matrix A, with the i-th column of z holding the eigenvector associated
with the eigenvalue returned in d[i - 1].

If jobz = 'N', then z is not referenced.

e On exit, this array is overwritten with intermediate results.
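Since ?stev overwrites d and e, checking a returned eigenpair requires copies of the original arrays. The sketch below computes the infinity norm of T*x - λ*x directly from those copies; tridiag_residual is a hypothetical helper, not part of MKL.

```c
/* Hypothetical checking helper: infinity norm of T*x - lambda*x for the
 * symmetric tridiagonal T with diagonal d[0..n-1] and subdiagonal e[0..n-2].
 * Pass copies of d and e saved before the call, since ?stev overwrites them. */
double tridiag_residual(int n, const double *d, const double *e,
                        double lambda, const double *x)
{
    double worst = 0.0;
    for (int k = 0; k < n; ++k) {
        double r = (d[k] - lambda) * x[k];      /* diagonal term */
        if (k > 0)     r += e[k - 1] * x[k - 1]; /* subdiagonal neighbor */
        if (k < n - 1) r += e[k] * x[k + 1];     /* superdiagonal neighbor (T is symmetric) */
        if (r < 0.0) r = -r;
        if (r > worst) worst = r;
    }
    return worst;
}
```

For column major output, the eigenvector paired with d[i - 1] starts at z + (i - 1)*ldz; a residual near ε*||T|| times the vector norm indicates a good eigenpair.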

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, then the algorithm failed to converge; i elements of e did not converge to zero.

?stevd
Computes all eigenvalues and, optionally, all
eigenvectors of a real symmetric tridiagonal matrix
using divide and conquer algorithm.

Syntax
lapack_int LAPACKE_sstevd (int matrix_layout, char jobz, lapack_int n, float* d, float*
e, float* z, lapack_int ldz);
lapack_int LAPACKE_dstevd (int matrix_layout, char jobz, lapack_int n, double* d,
double* e, double* z, lapack_int ldz);

Include Files
• mkl.h

Description

The routine computes all the eigenvalues, and optionally all the eigenvectors, of a real symmetric tridiagonal
matrix T. In other words, the routine can compute the spectral factorization of T as $T = Z \Lambda Z^T$.

Here $\Lambda$ is a diagonal matrix whose diagonal elements are the eigenvalues $\lambda_i$, and Z is the orthogonal matrix
whose columns are the eigenvectors $z_i$. Thus,
$$T z_i = \lambda_i z_i, \quad i = 1, 2, \ldots, n.$$

If the eigenvectors are requested, then this routine uses a divide and conquer algorithm to compute
eigenvalues and eigenvectors. However, if only eigenvalues are required, then it uses the Pal-Walker-Kahan
variant of the QL or QR algorithm.
There is no complex analogue of this routine.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

n The order of the matrix T (n ≥ 0).

d, e Arrays:
d contains the n diagonal elements of the tridiagonal matrix T.
The dimension of d must be at least max(1, n).
e contains the n-1 off-diagonal elements of T.
The dimension of e must be at least max(1, n). The n-th element of this
array is used as workspace.

ldz The leading dimension of the output array z. Constraints:
ldz ≥ 1 if jobz = 'N';
ldz ≥ max(1, n) if jobz = 'V'.

Output Parameters

d On exit, if info = 0, contains the eigenvalues of the matrix T in ascending
order.
See also info.

z Array, size max(1, ldz*n) if jobz = 'V' and 1 if jobz = 'N'.

If jobz = 'V', then this array is overwritten by the orthogonal matrix Z,
which contains the eigenvectors of T.
If jobz = 'N', then z is not referenced.

e On exit, this array is overwritten with intermediate results.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed eigenvalues and eigenvectors are exact for a matrix T + E such that $\|E\|_2 = O(\varepsilon)\,\|T\|_2$,
where ε is the machine precision.
If $\lambda_i$ is an exact eigenvalue, and $\mu_i$ is the corresponding computed value, then
$$|\mu_i - \lambda_i| \le c(n)\,\varepsilon\,\|T\|_2,$$
where c(n) is a modestly increasing function of n.

If $z_i$ is the corresponding exact eigenvector, and $w_i$ is the corresponding computed vector, then the angle
$\theta(z_i, w_i)$ between them is bounded as follows:
$$\theta(z_i, w_i) \le \frac{c(n)\,\varepsilon\,\|T\|_2}{\min_{j \ne i} |\lambda_i - \lambda_j|}.$$
Thus the accuracy of a computed eigenvector depends on the gap between its eigenvalue and all the other
eigenvalues.
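The eigenvector bound above depends on the absolute gap between λi and its nearest neighbor. The sketch below computes that gap from a list of eigenvalues and evaluates the bound for a caller-chosen constant c standing in for c(n), which the text leaves unspecified (assumption). Both helpers are hypothetical, and since computed eigenvalues replace exact ones, the result is only an estimate.

```c
/* Hypothetical helper: min over i != j of |lam[i] - lam[j]| for n >= 2,
 * the denominator of the eigenvector angle bound. */
double eigen_gap(int n, const double *lam, int j)
{
    double gap = -1.0;  /* -1 marks "no neighbor seen yet" */
    for (int i = 0; i < n; ++i) {
        if (i == j) continue;
        double g = lam[i] - lam[j];
        if (g < 0.0) g = -g;
        if (gap < 0.0 || g < gap) gap = g;
    }
    return gap;
}

/* Hypothetical estimate c * eps * ||T||_2 / gap; c replaces the unspecified c(n). */
double eigvec_angle_bound(double c, double eps, double norm_t, double gap)
{
    return c * eps * norm_t / gap;
}
```

For symmetric T, ||T||_2 equals the largest |λi|, so the computed eigenvalues also give an estimate of norm_t.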

?stevx
Computes selected eigenvalues and eigenvectors of a
real symmetric tridiagonal matrix.

Syntax
lapack_int LAPACKE_sstevx (int matrix_layout, char jobz, char range, lapack_int n,
float* d, float* e, float vl, float vu, lapack_int il, lapack_int iu, float abstol,
lapack_int* m, float* w, float* z, lapack_int ldz, lapack_int* ifail);
lapack_int LAPACKE_dstevx (int matrix_layout, char jobz, char range, lapack_int n,
double* d, double* e, double vl, double vu, lapack_int il, lapack_int iu, double abstol,
lapack_int* m, double* w, double* z, lapack_int ldz, lapack_int* ifail);

Include Files
• mkl.h

Description

The routine computes selected eigenvalues and, optionally, eigenvectors of a real symmetric tridiagonal
matrix A. Eigenvalues and eigenvectors can be selected by specifying either a range of values or a range of
indices for the desired eigenvalues.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

range Must be 'A' or 'V' or 'I'.

If range = 'A', the routine computes all eigenvalues.

If range = 'V', the routine computes eigenvalues w[i] in the half-open interval vl < w[i] ≤ vu.

If range = 'I', the routine computes eigenvalues with indices il to iu.

n The order of the matrix A (n≥ 0).

d, e Arrays:
d contains the n diagonal elements of the tridiagonal matrix A.
The dimension of d must be at least max(1, n).
e contains the n-1 subdiagonal elements of A.
The dimension of e must be at least max(1, n-1). The n-th element of this
array is used as workspace.

vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
Constraint: vl< vu.

If range = 'A' or 'I', vl and vu are not referenced.

il, iu If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned.
Constraint: 1 ≤ il ≤ iu ≤ n, if n > 0; il = 1 and iu = 0 if n = 0.

If range = 'A' or 'V', il and iu are not referenced.
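A pre-call check of the il/iu constraint for range = 'I' fits in a few lines. valid_index_range is a hypothetical helper; the routine itself reports a violation through a negative info value. int is used in place of lapack_int, which may be 64-bit in ILP64 builds.

```c
/* Hypothetical pre-call check for range = 'I':
 * 1 <= il <= iu <= n if n > 0, and il = 1, iu = 0 if n = 0.
 * Returns 1 if the pair satisfies the constraint, 0 otherwise. */
int valid_index_range(int n, int il, int iu)
{
    if (n == 0)
        return il == 1 && iu == 0;
    return 1 <= il && il <= iu && iu <= n;
}
```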

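abstol The absolute error tolerance to which each eigenvalue is required. See
Application Notes for details on error tolerance.

ldz The leading dimension of the output array z; ldz ≥ 1. If jobz = 'V', then
ldz ≥ max(1, n) for column major layout and ldz ≥ max(1, m) for row major
layout.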

Output Parameters

m The total number of eigenvalues found, 0 ≤ m ≤ n.
If range = 'A', m = n; if range = 'I', m = iu - il + 1; and if range = 'V',
the exact value of m is unknown.

w, z Arrays:
w, size at least max(1, n).
The first m elements of w contain the selected eigenvalues of the matrix A
in ascending order.

z (size at least max(1, ldz*m) for column major layout and max(1, ldz*n)
for row major layout).
If jobz = 'V', then if info = 0, the first m columns of z contain the
orthonormal eigenvectors of the matrix A corresponding to the selected
eigenvalues, with the i-th column of z holding the eigenvector associated
with w[i - 1].

If an eigenvector fails to converge, then that column of z contains the latest
approximation to the eigenvector, and the index of the eigenvector is
returned in ifail.

If jobz = 'N', then z is not referenced.

d, e On exit, these arrays may be multiplied by a constant factor chosen to
avoid overflow or underflow in computing the eigenvalues.

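ifail Array, size at least max(1, n).

If jobz = 'V', then if info = 0, the first m elements of ifail are zero; if
info > 0, ifail contains the indices of the eigenvectors that failed to
converge.
If jobz = 'N', then ifail is not referenced.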

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, then i eigenvectors failed to converge; their indices are stored in the array ifail.

Application Notes
If abstol is less than or equal to zero, then ε*||A||1 is used instead, where ||A||1 is the 1-norm of A.
Eigenvalues are computed most accurately when abstol is set to twice the underflow threshold,
2*?lamch('S'), not zero.

If this routine returns with info > 0, indicating that some eigenvectors did not converge, set abstol to
2*?lamch('S').

?stevr
Computes selected eigenvalues and, optionally,
eigenvectors of a real symmetric tridiagonal matrix
using the Relatively Robust Representations.

Syntax
lapack_int LAPACKE_sstevr (int matrix_layout, char jobz, char range, lapack_int n,
float* d, float* e, float vl, float vu, lapack_int il, lapack_int iu, float abstol,
lapack_int* m, float* w, float* z, lapack_int ldz, lapack_int* isuppz);
lapack_int LAPACKE_dstevr (int matrix_layout, char jobz, char range, lapack_int n,
double* d, double* e, double vl, double vu, lapack_int il, lapack_int iu, double abstol,
lapack_int* m, double* w, double* z, lapack_int ldz, lapack_int* isuppz);

Include Files
• mkl.h

Description

The routine computes selected eigenvalues and, optionally, eigenvectors of a real symmetric tridiagonal
matrix T. Eigenvalues and eigenvectors can be selected by specifying either a range of values or a range of
indices for the desired eigenvalues.

Whenever possible, the routine calls stemr to compute the eigenspectrum using Relatively Robust
Representations. stemr computes eigenvalues by the dqds algorithm, while orthogonal eigenvectors are
computed from various "good" $L D L^T$ representations (also known as Relatively Robust Representations).
Gram-Schmidt orthogonalization is avoided as far as possible. More specifically, the various steps of the
algorithm are as follows. For the i-th unreduced block of T:

a. Compute $T - \sigma_i I = L_i D_i L_i^T$, such that $L_i D_i L_i^T$ is a relatively robust representation.
b. Compute the eigenvalues $\lambda_j$ of $L_i D_i L_i^T$ to high relative accuracy by the dqds algorithm.
c. If there is a cluster of close eigenvalues, "choose" $\sigma_i$ close to the cluster, and go to Step (a).
d. Given the approximate eigenvalue $\lambda_j$ of $L_i D_i L_i^T$, compute the corresponding eigenvector by forming a
rank-revealing twisted factorization.
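Step (a) factors a shifted tridiagonal matrix. For T with diagonal d and subdiagonal e, the unit lower bidiagonal L (subdiagonal l) and diagonal D satisfy D_1 = d_1 - σ, l_k = e_k / D_k, and D_{k+1} = d_{k+1} - σ - l_k e_k. The sketch below implements only this recurrence; it does not test whether the result is a relatively robust representation, and it is not the code stemr uses. shifted_ldlt is a hypothetical name.

```c
/* Hypothetical sketch of step (a): T - sigma*I = L*D*L^T for symmetric
 * tridiagonal T (diagonal d[0..n-1], subdiagonal e[0..n-2]).
 * Outputs: dd[0..n-1] = diagonal of D, l[0..n-2] = subdiagonal of unit L.
 * Returns 0 on success, or k+1 if pivot dd[k] is exactly zero before it is
 * needed (a real RRR code would perturb sigma instead of giving up). */
int shifted_ldlt(int n, const double *d, const double *e, double sigma,
                 double *dd, double *l)
{
    if (n <= 0) return 0;
    dd[0] = d[0] - sigma;
    for (int k = 0; k < n - 1; ++k) {
        if (dd[k] == 0.0) return k + 1;            /* breakdown */
        l[k] = e[k] / dd[k];                       /* (LDL^T)_{k+1,k} = l_k D_k = e_k */
        dd[k + 1] = d[k + 1] - sigma - l[k] * e[k]; /* since l_k^2 D_k = l_k e_k */
    }
    return 0;
}
```

By Sylvester's law of inertia, when no pivot is zero the number of negative entries of D equals the number of eigenvalues of T below σ. For example, T with d = {2, 2}, e = {1} has eigenvalues 1 and 3, and σ = 2.5 yields D = {-0.5, 1.5}, one negative pivot.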

The desired accuracy of the output can be specified by the input parameter abstol.
The routine ?stevr calls stemr when the full spectrum is requested on machines which conform to the
IEEE-754 floating point standard. ?stevr calls stebz and stein on non-IEEE machines and when partial
spectrum requests are made.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

range Must be 'A' or 'V' or 'I'.

If range = 'A', the routine computes all eigenvalues.

If range = 'V', the routine computes eigenvalues w[i] in the half-open interval vl < w[i] ≤ vu.
If range = 'I', the routine computes eigenvalues with indices il to iu.

For range = 'V' or 'I' and iu - il < n - 1, sstebz/dstebz and sstein/dstein
are called.

n The order of the matrix T (n≥ 0).

d, e Arrays:
d contains the n diagonal elements of the tridiagonal matrix T.
The dimension of d must be at least max(1, n).
e contains the n-1 subdiagonal elements of T.
The dimension of e must be at least max(1, n-1). The n-th element of this
array is used as workspace.

vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
Constraint: vl< vu.

If range = 'A' or 'I', vl and vu are not referenced.

il, iu If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned.
Constraint: 1 ≤ il ≤ iu ≤ n, if n > 0; il = 1 and iu = 0 if n = 0.

If range = 'A' or 'V', il and iu are not referenced.

abstol The absolute error tolerance to which each eigenvalue/eigenvector is
required.
If jobz = 'V', the eigenvalues and eigenvectors output have residual
norms bounded by abstol, and the dot products between different
eigenvectors are bounded by abstol. If abstol < n *eps*||T||, then n
*eps*||T|| will be used in its place, where eps is the machine precision,
and ||T|| is the 1-norm of the matrix T. The eigenvalues are computed to
an accuracy of eps*||T|| irrespective of abstol.

If high relative accuracy is important, set abstol to ?lamch('S').

ldz The leading dimension of the output array z.

Constraints:
ldz ≥ 1 if jobz = 'N';
ldz ≥ max(1, n) for column major layout and ldz ≥ max(1, m) for row major
layout if jobz = 'V'.

Output Parameters

m The total number of eigenvalues found, 0 ≤ m ≤ n. If range = 'A', m = n;
if range = 'I', m = iu - il + 1; and if range = 'V', the exact value of m is
unknown.

w, z Arrays:
w, size at least max(1, n).
The first m elements of w contain the selected eigenvalues of the matrix T
in ascending order.
z (size at least max(1, ldz*m) for column major layout and max(1, ldz*n)
for row major layout).
If jobz = 'V', then if info = 0, the first m columns of z contain the
orthonormal eigenvectors of the matrix T corresponding to the selected
eigenvalues, with the i-th column of z holding the eigenvector associated
with w[i - 1].

If jobz = 'N', then z is not referenced.

d, e On exit, these arrays may be multiplied by a constant factor chosen to
avoid overflow or underflow in computing the eigenvalues.

isuppz Array, size at least 2*max(1, m).

The support of the eigenvectors in z, that is, the indices indicating the nonzero
elements in z. The i-th eigenvector is nonzero only in elements isuppz[2i - 2]
through isuppz[2i - 1].
Implemented only for range = 'A' or 'I' and iu - il = n - 1.
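The isuppz indexing above is 1-based in i. The sketch below turns the stored bounds for eigenvector i into 0-based row indices for C code. It assumes, as in reference LAPACK, that the stored values are themselves 1-based row numbers; eigvec_support is hypothetical and int stands in for lapack_int.

```c
/* Hypothetical helper: 0-based inclusive row range [*first0, *last0] outside
 * of which eigenvector i (1-based) is zero.
 * ASSUMPTION: isuppz stores 1-based row numbers, as in reference LAPACK. */
void eigvec_support(const int *isuppz, int i, int *first0, int *last0)
{
    *first0 = isuppz[2 * i - 2] - 1;  /* first nonzero row, converted to 0-based */
    *last0  = isuppz[2 * i - 1] - 1;  /* last nonzero row, converted to 0-based */
}
```

A loop over rows *first0 through *last0 of a column of z then touches only the potentially nonzero entries.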

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, an internal error has occurred.

Application Notes
Normal execution of the routine ?stegr may create NaNs and infinities and hence may abort due to a floating
point exception in environments which do not handle NaNs and infinities in the IEEE standard default manner.

Nonsymmetric Eigenvalue Problems: LAPACK Driver Routines

This topic describes LAPACK driver routines used for solving nonsymmetric eigenproblems. See also
computational routines that can be called to solve these problems.
Table "Driver Routines for Solving Nonsymmetric Eigenproblems" lists all such driver routines.
Driver Routines for Solving Nonsymmetric Eigenproblems
Routine Name Operation performed

gees Computes the eigenvalues and Schur factorization of a general matrix, and orders
the factorization so that selected eigenvalues are at the top left of the Schur form.

geesx Computes the eigenvalues and Schur factorization of a general matrix, orders the
factorization and computes reciprocal condition numbers.

geev Computes the eigenvalues and left and right eigenvectors of a general matrix.

geevx Computes the eigenvalues and left and right eigenvectors of a general matrix, with
preliminary matrix balancing, and computes reciprocal condition numbers for the
eigenvalues and right eigenvectors.

?gees
Computes the eigenvalues and Schur factorization of a
general matrix, and orders the factorization so that
selected eigenvalues are at the top left of the Schur
form.

Syntax
lapack_int LAPACKE_sgees( int matrix_layout, char jobvs, char sort, LAPACK_S_SELECT2
select, lapack_int n, float* a, lapack_int lda, lapack_int* sdim, float* wr, float* wi,
float* vs, lapack_int ldvs );
lapack_int LAPACKE_dgees( int matrix_layout, char jobvs, char sort, LAPACK_D_SELECT2
select, lapack_int n, double* a, lapack_int lda, lapack_int* sdim, double* wr, double*
wi, double* vs, lapack_int ldvs );
lapack_int LAPACKE_cgees( int matrix_layout, char jobvs, char sort, LAPACK_C_SELECT1
select, lapack_int n, lapack_complex_float* a, lapack_int lda, lapack_int* sdim,
lapack_complex_float* w, lapack_complex_float* vs, lapack_int ldvs );
lapack_int LAPACKE_zgees( int matrix_layout, char jobvs, char sort, LAPACK_Z_SELECT1
select, lapack_int n, lapack_complex_double* a, lapack_int lda, lapack_int* sdim,
lapack_complex_double* w, lapack_complex_double* vs, lapack_int ldvs );


Include Files
• mkl.h

Description

For an n-by-n real/complex nonsymmetric matrix A, the routine computes the eigenvalues, the real-Schur/Schur
form T, and, optionally, the matrix of Schur vectors Z. This gives the Schur factorization $A = Z T Z^H$.

Optionally, it also orders the eigenvalues on the diagonal of the real-Schur/Schur form so that selected
eigenvalues are at the top left. The leading columns of Z then form an orthonormal basis for the invariant
subspace corresponding to the selected eigenvalues.
A real matrix is in real-Schur form if it is upper quasi-triangular with 1-by-1 and 2-by-2 blocks. 2-by-2 blocks
will be standardized in the form
$$\begin{pmatrix} a & b \\ c & a \end{pmatrix}$$
where b*c < 0. The eigenvalues of such a block are $a \pm \sqrt{bc}$.

A complex matrix is in Schur form if it is upper triangular.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

jobvs Must be 'N' or 'V'.

If jobvs = 'N', then Schur vectors are not computed.

If jobvs = 'V', then Schur vectors are computed.

sort Must be 'N' or 'S'. Specifies whether or not to order the eigenvalues on
the diagonal of the Schur form.
If sort = 'N', then eigenvalues are not ordered.

If sort = 'S', eigenvalues are ordered (see select).

select If sort = 'S', select is used to select eigenvalues to sort to the top left of
the Schur form.
If sort = 'N', select is not referenced.

For real flavors:

An eigenvalue wr[j]+sqrt(-1)*wi[j] is selected if select(wr[j], wi[j]) is
true; that is, if either one of a complex conjugate pair of eigenvalues is
selected, then both complex eigenvalues are selected.
For complex flavors:
An eigenvalue w[j] is selected if select(w[j]) is true.
Note that a selected complex eigenvalue may no longer satisfy select(wr[j],
wi[j]) = 1 after ordering, since ordering may change the value of complex
eigenvalues (especially if the eigenvalue is ill-conditioned); in this case info
may be set to n+2 (see info below).
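A select function for the real flavors receives pointers to the real and imaginary parts of an eigenvalue. The sketch below selects eigenvalues in the open left half-plane and shows how the pair rule above affects the count reported in sdim. To stay self-contained it uses a local stand-in type my_logical; with MKL you would include mkl.h and return lapack_logical so the function matches LAPACK_D_SELECT2 (an assumption about your header setup). count_selected is a hypothetical illustration of the counting rule, not how ?gees computes sdim after reordering.

```c
/* Local stand-in for lapack_logical so the sketch compiles without mkl.h. */
typedef int my_logical;

/* Selects eigenvalues with negative real part. Both members of a complex
 * conjugate pair share the same real part, so they are selected together. */
my_logical select_left_half_plane(const double *wr, const double *wi)
{
    (void)wi;  /* imaginary part not needed for this criterion */
    return *wr < 0.0;
}

/* Hypothetical illustration of the sdim rule for real flavors: a conjugate
 * pair (stored consecutively, positive imaginary part first) is selected if
 * either member is selected, and then counts as 2. */
int count_selected(int n, const double *wr, const double *wi,
                   my_logical (*sel)(const double *, const double *))
{
    int sdim = 0;
    for (int j = 0; j < n; ++j) {
        if (wi[j] != 0.0 && j + 1 < n) {
            if (sel(&wr[j], &wi[j]) || sel(&wr[j + 1], &wi[j + 1]))
                sdim += 2;
            ++j;  /* skip the conjugate partner */
        } else if (sel(&wr[j], &wi[j])) {
            ++sdim;
        }
    }
    return sdim;
}
```

With MKL, a function of this shape would be passed as the select argument of LAPACKE_dgees together with sort = 'S'.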

n The order of the matrix A (n≥ 0).

a Array a (size at least max(1, lda*n)) contains the n-by-n matrix A.

lda The leading dimension of the array a. Must be at least max(1, n).

ldvs The leading dimension of the output array vs. Constraints:

ldvs ≥ 1;
ldvs ≥ max(1, n) if jobvs = 'V'.

Output Parameters

a On exit, this array is overwritten by the real-Schur/Schur form T.

sdim If sort = 'N', sdim = 0.

If sort = 'S', sdim is equal to the number of eigenvalues (after sorting)
for which select is true.
Note that for real flavors complex conjugate pairs for which select is true for
either eigenvalue count as 2.


wr, wi Arrays, size at least max(1, n) each. Contain the real and imaginary parts,
respectively, of the computed eigenvalues, in the same order that they
appear on the diagonal of the output real-Schur form T. Complex conjugate
pairs of eigenvalues appear consecutively with the eigenvalue having
positive imaginary part first.

w Array, size at least max(1, n). Contains the computed eigenvalues. The
eigenvalues are stored in the same order as they appear on the diagonal of
the output Schur form T.

vs Array vs (size at least max(1, ldvs*n)).

If jobvs = 'V', vs contains the orthogonal/unitary matrix Z of Schur
vectors.
If jobvs = 'N', vs is not referenced.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, and

i ≤ n: the QR algorithm failed to compute all the eigenvalues; elements 1:ilo-1 and i+1:n of wr and wi (for real
flavors) or w (for complex flavors) contain those eigenvalues which have converged; if jobvs = 'V', vs
contains the matrix which reduces A to its partially converged Schur form;
i = n+1: the eigenvalues could not be reordered because some eigenvalues were too close to separate (the problem is
very ill-conditioned);
i = n+2: after reordering, round-off changed values of some complex eigenvalues so that leading eigenvalues in the
Schur form no longer satisfy select = 1. This could also be caused by underflow due to scaling.

?geesx
Computes the eigenvalues and Schur factorization of a
general matrix, orders the factorization and computes
reciprocal condition numbers.

Syntax
lapack_int LAPACKE_sgeesx( int matrix_layout, char jobvs, char sort, LAPACK_S_SELECT2
select, char sense, lapack_int n, float* a, lapack_int lda, lapack_int* sdim, float* wr,
float* wi, float* vs, lapack_int ldvs, float* rconde, float* rcondv );
lapack_int LAPACKE_dgeesx( int matrix_layout, char jobvs, char sort, LAPACK_D_SELECT2
select, char sense, lapack_int n, double* a, lapack_int lda, lapack_int* sdim, double*
wr, double* wi, double* vs, lapack_int ldvs, double* rconde, double* rcondv );
lapack_int LAPACKE_cgeesx( int matrix_layout, char jobvs, char sort, LAPACK_C_SELECT1
select, char sense, lapack_int n, lapack_complex_float* a, lapack_int lda, lapack_int*
sdim, lapack_complex_float* w, lapack_complex_float* vs, lapack_int ldvs, float*
rconde, float* rcondv );

lapack_int LAPACKE_zgeesx( int matrix_layout, char jobvs, char sort, LAPACK_Z_SELECT1
select, char sense, lapack_int n, lapack_complex_double* a, lapack_int lda, lapack_int*
sdim, lapack_complex_double* w, lapack_complex_double* vs, lapack_int ldvs, double*
rconde, double* rcondv );

Include Files
• mkl.h

Description

For an n-by-n real/complex nonsymmetric matrix A, the routine computes the eigenvalues, the real-Schur/Schur
form T, and, optionally, the matrix of Schur vectors Z. This gives the Schur factorization $A = Z T Z^H$.

Optionally, it also orders the eigenvalues on the diagonal of the real-Schur/Schur form so that selected
eigenvalues are at the top left; computes a reciprocal condition number for the average of the selected
eigenvalues (rconde); and computes a reciprocal condition number for the right invariant subspace
corresponding to the selected eigenvalues (rcondv). The leading columns of Z form an orthonormal basis for
this invariant subspace.
For further explanation of the reciprocal condition numbers rconde and rcondv, see [LUG], Section 4.10
(where these quantities are called s and sep respectively).
A real matrix is in real-Schur form if it is upper quasi-triangular with 1-by-1 and 2-by-2 blocks. 2-by-2 blocks
will be standardized in the form
$$\begin{pmatrix} a & b \\ c & a \end{pmatrix}$$
where b*c < 0. The eigenvalues of such a block are $a \pm \sqrt{bc}$.

A complex matrix is in Schur form if it is upper triangular.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

jobvs Must be 'N' or 'V'.

If jobvs = 'N', then Schur vectors are not computed.

If jobvs = 'V', then Schur vectors are computed.

sort Must be 'N' or 'S'. Specifies whether or not to order the eigenvalues on
the diagonal of the Schur form.
If sort = 'N', then eigenvalues are not ordered.

If sort = 'S', eigenvalues are ordered (see select).

select If sort = 'S', select is used to select eigenvalues to sort to the top left of
the Schur form.
If sort = 'N', select is not referenced.

For real flavors:
An eigenvalue wr[j]+sqrt(-1)*wi[j] is selected if select(wr[j], wi[j]) is
true; that is, if either one of a complex conjugate pair of eigenvalues is
selected, then both complex eigenvalues are selected.
For complex flavors:
An eigenvalue w[j] is selected if select(w[j]) is true.
Note that a selected complex eigenvalue may no longer satisfy select(wr[j],
wi[j]) = 1 after ordering, since ordering may change the value of complex
eigenvalues (especially if the eigenvalue is ill-conditioned); in this case info
may be set to n+2 (see info below).

sense Must be 'N', 'E', 'V', or 'B'. Determines which reciprocal condition
numbers are computed.
If sense = 'N', none are computed;

If sense = 'E', computed for average of selected eigenvalues only;

If sense = 'V', computed for selected right invariant subspace only;

If sense = 'B', computed for both.

If sense is 'E', 'V', or 'B', then sort must equal 'S'.

n The order of the matrix A (n≥ 0).

a Arrays:
a (size at least max(1, lda*n)) is an array containing the n-by-n matrix A.

lda The leading dimension of the array a. Must be at least max(1, n).

ldvs The leading dimension of the output array vs. Constraints:

ldvs ≥ 1;
ldvs ≥ max(1, n) if jobvs = 'V'.
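The input constraints above can be checked before the call. The sketch below is a hypothetical helper (schur_check_args is not a oneMKL function). For simplicity it accepts only the upper-case spellings of the character options and checks only the rules listed in this section:

```c
#include <stddef.h>

/* Hypothetical pre-flight check (not a oneMKL function) for the character and
 * dimension arguments of the Schur routine described above. Only upper-case
 * option letters are accepted here, for simplicity. Returns NULL when every
 * rule listed in this section holds, otherwise a message for the first
 * violated rule. */
const char *schur_check_args(char jobvs, char sort, char sense,
                             int n, int lda, int ldvs)
{
    int max1n = n > 1 ? n : 1;   /* max(1, n) */

    if (jobvs != 'N' && jobvs != 'V')
        return "jobvs must be 'N' or 'V'";
    if (sort != 'N' && sort != 'S')
        return "sort must be 'N' or 'S'";
    if (sense != 'N' && sense != 'E' && sense != 'V' && sense != 'B')
        return "sense must be 'N', 'E', 'V', or 'B'";
    if (sense != 'N' && sort != 'S')
        return "sense = 'E', 'V', or 'B' requires sort = 'S'";
    if (n < 0)
        return "n must be >= 0";
    if (lda < max1n)
        return "lda must be >= max(1, n)";
    if (ldvs < 1 || (jobvs == 'V' && ldvs < max1n))
        return "ldvs must be >= 1, and >= max(1, n) when jobvs = 'V'";
    return NULL;
}
```

A NULL result means only that these particular rules hold. The library performs its own argument checking and reports violations through info = -i.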

Output Parameters

a On exit, this array is overwritten by the real-Schur/Schur form T.

sdim If sort = 'N', sdim= 0.

If sort = 'S', sdim is equal to the number of eigenvalues (after sorting) for which select is true.
Note that for real flavors complex conjugate pairs for which select is true for
either eigenvalue count as 2.
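The selection rule for real flavors, and the way sdim counts conjugate pairs, can be illustrated without calling the library. In the sketch below, select_stable and count_selected are hypothetical names, and the criterion (negative real part) is only an example. The sketch assumes that conjugate pairs are stored consecutively, as in the wr/wi output. It ignores the roundoff effect reported by info = n+2, so the count is the value sdim would be expected to have in exact arithmetic:

```c
/* Example criterion (hypothetical): select eigenvalues with negative real
 * part. Like the real-flavor select, it receives pointers to the real and
 * imaginary parts of one eigenvalue. */
int select_stable(const double *re, const double *im)
{
    (void)im;                 /* only the real part matters here */
    return *re < 0.0;
}

/* Applies the real-flavor rule to eigenvalues stored as wr/wi, assuming that
 * conjugate pairs are consecutive. A pair is selected if the criterion holds
 * for either member, and it then counts as 2. Writes 0/1 flags into
 * selected[] and returns the number of selected eigenvalues. */
int count_selected(int n, const double *wr, const double *wi,
                   int (*crit)(const double *, const double *), int *selected)
{
    int count = 0;
    int j = 0;
    while (j < n) {
        if (wi[j] != 0.0 && j + 1 < n) {          /* conjugate pair */
            int s = crit(&wr[j], &wi[j]) || crit(&wr[j + 1], &wi[j + 1]);
            selected[j] = selected[j + 1] = s;
            count += 2 * s;
            j += 2;
        } else {                                  /* real eigenvalue */
            selected[j] = crit(&wr[j], &wi[j]) != 0;
            count += selected[j];
            j += 1;
        }
    }
    return count;
}
```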

w Array, size at least max(1, n). Contains the computed eigenvalues. The
eigenvalues are stored in the same order as they appear on the diagonal of
the output Schur form T.

vs Array vs (size at least max(1, ldvs*n)).

If jobvs = 'V', vs contains the orthogonal/unitary matrix Z of Schur vectors.
If jobvs = 'N', vs is not referenced.

rconde, rcondv If sense = 'E' or 'B', rconde contains the reciprocal condition number for
the average of the selected eigenvalues.
If sense = 'N' or 'V', rconde is not referenced.

If sense = 'V' or 'B', rcondv contains the reciprocal condition number for
the selected right invariant subspace.
If sense = 'N' or 'E', rcondv is not referenced.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i and i ≤ n, the QR algorithm failed to compute all the eigenvalues; elements 1:ilo-1 and i+1:n of wr and wi (for real flavors) or w (for complex flavors) contain those eigenvalues which have converged; if jobvs = 'V', vs contains the transformation which reduces A to its partially converged Schur form.

If info = n+1, the eigenvalues could not be reordered because some eigenvalues were too close to separate (the problem is very ill-conditioned).

If info = n+2, after reordering, roundoff changed values of some complex eigenvalues so that leading eigenvalues in the Schur form no longer satisfy select = 1. This could also be caused by underflow due to scaling.
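A caller typically branches on these cases. The sketch below maps info to a status value. The enum and function names are hypothetical and are not part of oneMKL:

```c
/* Hypothetical classification of the info value returned by the Schur
 * routine described above; n is the matrix order that was passed in. */
typedef enum {
    SCHUR_OK,             /* info = 0                                    */
    SCHUR_BAD_ARGUMENT,   /* info = -i: the i-th parameter was illegal   */
    SCHUR_QR_FAILED,      /* 1 <= info <= n: QR algorithm failed         */
    SCHUR_REORDER_FAILED, /* info = n+1: eigenvalues too close to reorder */
    SCHUR_SELECT_CHANGED, /* info = n+2: roundoff changed selected values */
    SCHUR_UNKNOWN         /* any other value (not listed above)          */
} schur_status;

schur_status schur_info_classify(int info, int n)
{
    if (info == 0)     return SCHUR_OK;
    if (info < 0)      return SCHUR_BAD_ARGUMENT;
    if (info <= n)     return SCHUR_QR_FAILED;
    if (info == n + 1) return SCHUR_REORDER_FAILED;
    if (info == n + 2) return SCHUR_SELECT_CHANGED;
    return SCHUR_UNKNOWN;
}
```

For SCHUR_QR_FAILED, only elements 1:ilo-1 and i+1:n of the eigenvalue output hold converged values, so the remaining entries should not be used.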

?geev
Computes the eigenvalues and left and right
eigenvectors of a general matrix.

Syntax
lapack_int LAPACKE_sgeev( int matrix_layout, char jobvl, char jobvr, lapack_int n,
float* a, lapack_int lda, float* wr, float* wi, float* vl, lapack_int ldvl, float* vr,
lapack_int ldvr );
lapack_int LAPACKE_dgeev( int matrix_layout, char jobvl, char jobvr, lapack_int n,
double* a, lapack_int lda, double* wr, double* wi, double* vl, lapack_int ldvl, double*
vr, lapack_int ldvr );
lapack_int LAPACKE_cgeev( int matrix_layout, char jobvl, char jobvr, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* w, lapack_complex_float*
vl, lapack_int ldvl, lapack_complex_float* vr, lapack_int ldvr );

lapack_int LAPACKE_zgeev( int matrix_layout, char jobvl, char jobvr, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* w,
lapack_complex_double* vl, lapack_int ldvl, lapack_complex_double* vr, lapack_int
ldvr );

Include Files
• mkl.h

Description

For an n-by-n real/complex nonsymmetric matrix A, the routine computes the eigenvalues and, optionally, the left and/or right eigenvectors. The right eigenvector v of A satisfies
A*v = λ*v
where λ is its eigenvalue.

The left eigenvector u of A satisfies

u^H*A = λ*u^H
where u^H denotes the conjugate transpose of u. The computed eigenvectors are normalized to have
Euclidean norm equal to 1 and largest component real.
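One way to sanity-check a computed real eigenpair against the definition A*v = λ*v is to evaluate the residual directly. This is a minimal sketch that assumes column major storage and a real eigenvalue. The names eigpair_residual and sq_norm are hypothetical helpers. Because the routine overwrites a, the check needs a copy of the original matrix:

```c
#include <stddef.h>

/* Returns max_i |(A*v - lambda*v)_i| for a real eigenpair (lambda, v) of an
 * n-by-n matrix stored column major with leading dimension lda, so element
 * (i, k) is a[i + k*lda] (0-based). A result that is small relative to the
 * size of A means v satisfies A*v = lambda*v to working accuracy. */
double eigpair_residual(int n, const double *a, int lda,
                        double lambda, const double *v)
{
    double worst = 0.0;
    for (int i = 0; i < n; ++i) {
        double r = -lambda * v[i];
        for (int k = 0; k < n; ++k)
            r += a[i + (size_t)k * lda] * v[k];
        if (r < 0.0) r = -r;          /* absolute value without <math.h> */
        if (r > worst) worst = r;
    }
    return worst;
}

/* Squared Euclidean norm; for a normalized eigenvector this is about 1. */
double sq_norm(int n, const double *v)
{
    double s = 0.0;
    for (int i = 0; i < n; ++i)
        s += v[i] * v[i];
    return s;
}
```

For a complex conjugate pair, the same check applies with complex arithmetic, using the storage scheme given for vr.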

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

jobvl Must be 'N' or 'V'.

If jobvl = 'N', then left eigenvectors of A are not computed.

If jobvl = 'V', then left eigenvectors of A are computed.

jobvr Must be 'N' or 'V'.

If jobvr = 'N', then right eigenvectors of A are not computed.

If jobvr = 'V', then right eigenvectors of A are computed.

n The order of the matrix A (n≥ 0).

a a (size at least max(1, lda*n)) is an array containing the n-by-n matrix A.

lda The leading dimension of the array a. Must be at least max(1, n).

ldvl, ldvr The leading dimensions of the output arrays vl and vr, respectively.

Constraints:
ldvl≥ 1; ldvr≥ 1.
If jobvl = 'V', ldvl≥ max(1, n);

If jobvr = 'V', ldvr≥ max(1, n).

Output Parameters

a On exit, this array is overwritten.

wr, wi Arrays, size at least max(1, n) each.
Contain the real and imaginary parts, respectively, of the computed eigenvalues. Complex conjugate pairs of eigenvalues appear consecutively with the eigenvalue having positive imaginary part first.

w Array, size at least max(1, n).

Contains the computed eigenvalues.

vl, vr Arrays:
vl (size at least max(1, ldvl*n)).
If jobvl = 'N', vl is not referenced.

For real flavors:

If the j-th eigenvalue is real, the i-th component of the j-th eigenvector uj is stored in vl[(i - 1) + (j - 1)*ldvl] for column major layout and in vl[(i - 1)*ldvl + (j - 1)] for row major layout.
If the j-th and (j+1)-st eigenvalues form a complex conjugate pair, then for i = sqrt(-1), the k-th component of the j-th eigenvector uj is vl[(k - 1) + (j - 1)*ldvl] + i*vl[(k - 1) + j*ldvl] for column major layout and vl[(k - 1)*ldvl + (j - 1)] + i*vl[(k - 1)*ldvl + j] for row major layout. Similarly, the k-th component of the (j+1)-st eigenvector uj+1 is vl[(k - 1) + (j - 1)*ldvl] - i*vl[(k - 1) + j*ldvl] for column major layout and vl[(k - 1)*ldvl + (j - 1)] - i*vl[(k - 1)*ldvl + j] for row major layout.

For complex flavors:

The i-th component of the j-th eigenvector uj is stored in vl[(i - 1) + (j - 1)*ldvl] for column major layout and in vl[(i - 1)*ldvl + (j - 1)] for row major layout.

vr (size at least max(1, ldvr*n)).
If jobvr = 'N', vr is not referenced.

For real flavors:

If the j-th eigenvalue is real, the i-th component of the j-th eigenvector vj is stored in vr[(i - 1) + (j - 1)*ldvr] for column major layout and in vr[(i - 1)*ldvr + (j - 1)] for row major layout.
If the j-th and (j+1)-st eigenvalues form a complex conjugate pair, then for i = sqrt(-1), the k-th component of the j-th eigenvector vj is vr[(k - 1) + (j - 1)*ldvr] + i*vr[(k - 1) + j*ldvr] for column major layout and vr[(k - 1)*ldvr + (j - 1)] + i*vr[(k - 1)*ldvr + j] for row major layout. Similarly, the k-th component of the (j+1)-st eigenvector vj+1 is vr[(k - 1) + (j - 1)*ldvr] - i*vr[(k - 1) + j*ldvr] for column major layout and vr[(k - 1)*ldvr + (j - 1)] - i*vr[(k - 1)*ldvr + j] for row major layout.

For complex flavors:

The i-th component of the j-th eigenvector vj is stored in vr[(i - 1) + (j - 1)*ldvr] for column major layout and in vr[(i - 1)*ldvr + (j - 1)] for row major layout.
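The storage formulas above can be written as two small functions. This is a sketch with hypothetical names (ev_offset, pair_component). The indices i, j, k are 1-based as in the text, and row_major is nonzero for LAPACK_ROW_MAJOR. The same code applies to vl with ldvl and to vr with ldvr. It uses C99 <complex.h> for the result:

```c
#include <complex.h>
#include <stddef.h>

/* Offset of element (i, j), both 1-based, in an array with leading
 * dimension ld, following the column major and row major formulas above. */
size_t ev_offset(int row_major, int i, int j, int ld)
{
    return row_major ? (size_t)(i - 1) * ld + (size_t)(j - 1)
                     : (size_t)(i - 1) + (size_t)(j - 1) * ld;
}

/* k-th component (1-based) of eigenvector j when eigenvalues j and j+1 form
 * a conjugate pair: the real part is stored in column j and the imaginary
 * part in column j+1. With conj_partner != 0, returns the k-th component of
 * eigenvector j+1 instead, which is the complex conjugate. */
double complex pair_component(const double *v, int row_major, int ld,
                              int k, int j, int conj_partner)
{
    double re = v[ev_offset(row_major, k, j, ld)];
    double im = v[ev_offset(row_major, k, j + 1, ld)];
    return conj_partner ? re - im * I : re + im * I;
}
```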

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, the QR algorithm failed to compute all the eigenvalues, and no eigenvectors have been
computed; elements i+1:n of wr and wi (for real flavors) or w (for complex flavors) contain those
eigenvalues which have converged.

?geevx
Computes the eigenvalues and left and right
eigenvectors of a general matrix, with preliminary
matrix balancing, and computes reciprocal condition
numbers for the eigenvalues and right eigenvectors.

Syntax
lapack_int LAPACKE_sgeevx( int matrix_layout, char balanc, char jobvl, char jobvr, char
sense, lapack_int n, float* a, lapack_int lda, float* wr, float* wi, float* vl,
lapack_int ldvl, float* vr, lapack_int ldvr, lapack_int* ilo, lapack_int* ihi, float*
scale, float* abnrm, float* rconde, float* rcondv );
lapack_int LAPACKE_dgeevx( int matrix_layout, char balanc, char jobvl, char jobvr, char
sense, lapack_int n, double* a, lapack_int lda, double* wr, double* wi, double* vl,
lapack_int ldvl, double* vr, lapack_int ldvr, lapack_int* ilo, lapack_int* ihi, double*
scale, double* abnrm, double* rconde, double* rcondv );
lapack_int LAPACKE_cgeevx( int matrix_layout, char balanc, char jobvl, char jobvr, char
sense, lapack_int n, lapack_complex_float* a, lapack_int lda, lapack_complex_float* w,
lapack_complex_float* vl, lapack_int ldvl, lapack_complex_float* vr, lapack_int ldvr,
lapack_int* ilo, lapack_int* ihi, float* scale, float* abnrm, float* rconde, float*
rcondv );
lapack_int LAPACKE_zgeevx( int matrix_layout, char balanc, char jobvl, char jobvr, char
sense, lapack_int n, lapack_complex_double* a, lapack_int lda, lapack_complex_double*
w, lapack_complex_double* vl, lapack_int ldvl, lapack_complex_double* vr, lapack_int
ldvr, lapack_int* ilo, lapack_int* ihi, double* scale, double* abnrm, double* rconde,
double* rcondv );

Include Files
• mkl.h

Description

For an n-by-n real/complex nonsymmetric matrix A, the routine computes the eigenvalues and, optionally, the left and/or right eigenvectors.
Optionally, it also computes a balancing transformation to improve the conditioning of the eigenvalues and eigenvectors (ilo, ihi, scale, and abnrm), reciprocal condition numbers for the eigenvalues (rconde), and reciprocal condition numbers for the right eigenvectors (rcondv).
The right eigenvector v of A satisfies

A·v = λ·v
where λ is its eigenvalue.

The left eigenvector u of A satisfies

u^H·A = λ·u^H
where u^H denotes the conjugate transpose of u. The computed eigenvectors are normalized to have Euclidean
norm equal to 1 and largest component real.
Balancing a matrix means permuting the rows and columns to make it more nearly upper triangular, and
applying a diagonal similarity transformation D*A*inv(D), where D is a diagonal matrix, to make its rows and
columns closer in norm and the condition numbers of its eigenvalues and eigenvectors smaller. The computed
reciprocal condition numbers correspond to the balanced matrix. Permuting rows and columns does not
change the condition numbers (in exact arithmetic), but diagonal scaling does. For further explanation of
balancing, see [LUG], Section 4.10.
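Entry by entry, the diagonal similarity used in balancing is b(i, j) = d(i)*a(i, j)/d(j). The sketch below applies a given D to a column major matrix. The name diag_similarity is hypothetical, and the sketch does not reproduce how the routine chooses D:

```c
#include <stddef.h>

/* Forms B = D*A*inv(D) for an n-by-n column major matrix (element (i, j) at
 * a[i + j*lda], 0-based) and a diagonal D given by its entries d[0..n-1].
 * Entry-wise, b(i, j) = d[i] * a(i, j) / d[j]. The diagonal is unchanged,
 * and as a similarity transformation it preserves the eigenvalues. */
void diag_similarity(int n, const double *a, int lda, const double *d,
                     double *b, int ldb)
{
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i)
            b[i + (size_t)j * ldb] = d[i] * a[i + (size_t)j * lda] / d[j];
}
```

For example, with D = diag(1, 8), the off-diagonal entries 64 and 0.5 of A = [[1, 64], [0.5, 1]] become 8 and 4. The rows and columns are then much closer in norm, and the diagonal is unchanged.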

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

balanc Must be 'N', 'P', 'S', or 'B'. Indicates how the input matrix should be
diagonally scaled and/or permuted to improve the conditioning of its
eigenvalues.
If balanc = 'N', do not diagonally scale or permute;

If balanc = 'P', perform permutations to make the matrix more nearly upper triangular. Do not diagonally scale;
If balanc = 'S', diagonally scale the matrix, i.e. replace A by
D*A*inv(D), where D is a diagonal matrix chosen to make the rows and
columns of A more equal in norm. Do not permute;
If balanc = 'B', both diagonally scale and permute A.

Computed reciprocal condition numbers will be for the matrix after balancing and/or permuting. Permuting does not change condition numbers (in exact arithmetic), but balancing does.

jobvl Must be 'N' or 'V'.

If jobvl = 'N', left eigenvectors of A are not computed;

If jobvl = 'V', left eigenvectors of A are computed.

If sense = 'E' or 'B', then jobvl must be 'V'.

jobvr Must be 'N' or 'V'.

If jobvr = 'N', right eigenvectors of A are not computed;

If jobvr = 'V', right eigenvectors of A are computed.

If sense = 'E' or 'B', then jobvr must be 'V'.

sense Must be 'N', 'E', 'V', or 'B'. Determines which reciprocal condition
numbers are computed.
If sense = 'N', none are computed;

If sense = 'E', computed for eigenvalues only;

If sense = 'V', computed for right eigenvectors only;

If sense = 'B', computed for eigenvalues and right eigenvectors.

If sense is 'E' or 'B', both left and right eigenvectors must also be
computed (jobvl = 'V' and jobvr = 'V').

n The order of the matrix A (n≥ 0).

a Arrays:
a (size at least max(1, lda*n)) is an array containing the n-by-n matrix A.

lda The leading dimension of the array a. Must be at least max(1, n).

ldvl, ldvr The leading dimensions of the output arrays vl and vr, respectively.
Constraints:
ldvl≥ 1; ldvr≥ 1.
If jobvl = 'V', ldvl≥ max(1, n);

If jobvr = 'V', ldvr≥ max(1, n).

Output Parameters

a On exit, this array is overwritten.

If jobvl = 'V' or jobvr = 'V', it contains the real-Schur/Schur form of
the balanced version of the input matrix A.

wr, wi Arrays, size at least max (1, n) each. Contain the real and imaginary parts,
respectively, of the computed eigenvalues. Complex conjugate pairs of
eigenvalues appear consecutively with the eigenvalue having positive
imaginary part first.

w Array, size at least max(1, n). Contains the computed eigenvalues.

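vl, vr Arrays:
vl (size at least max(1, ldvl*n)).
If jobvl = 'N', vl is not referenced.

For real flavors:

If the j-th eigenvalue is real, the i-th component of the j-th eigenvector uj is stored in vl[(i - 1) + (j - 1)*ldvl] for column major layout and in vl[(i - 1)*ldvl + (j - 1)] for row major layout.
If the j-th and (j+1)-st eigenvalues form a complex conjugate pair, then for i = sqrt(-1), the k-th component of the j-th eigenvector uj is vl[(k - 1) + (j - 1)*ldvl] + i*vl[(k - 1) + j*ldvl] for column major layout and vl[(k - 1)*ldvl + (j - 1)] + i*vl[(k - 1)*ldvl + j] for row major layout. Similarly, the k-th component of the (j+1)-st eigenvector uj+1 is vl[(k - 1) + (j - 1)*ldvl] - i*vl[(k - 1) + j*ldvl] for column major layout and vl[(k - 1)*ldvl + (j - 1)] - i*vl[(k - 1)*ldvl + j] for row major layout.

For complex flavors:

The i-th component of the j-th eigenvector uj is stored in vl[(i - 1) + (j - 1)*ldvl] for column major layout and in vl[(i - 1)*ldvl + (j - 1)] for row major layout.

vr (size at least max(1, ldvr*n)).
If jobvr = 'N', vr is not referenced.

For real flavors:

If the j-th eigenvalue is real, the i-th component of the j-th eigenvector vj is stored in vr[(i - 1) + (j - 1)*ldvr] for column major layout and in vr[(i - 1)*ldvr + (j - 1)] for row major layout.
If the j-th and (j+1)-st eigenvalues form a complex conjugate pair, then for i = sqrt(-1), the k-th component of the j-th eigenvector vj is vr[(k - 1) + (j - 1)*ldvr] + i*vr[(k - 1) + j*ldvr] for column major layout and vr[(k - 1)*ldvr + (j - 1)] + i*vr[(k - 1)*ldvr + j] for row major layout. Similarly, the k-th component of the (j+1)-st eigenvector vj+1 is vr[(k - 1) + (j - 1)*ldvr] - i*vr[(k - 1) + j*ldvr] for column major layout and vr[(k - 1)*ldvr + (j - 1)] - i*vr[(k - 1)*ldvr + j] for row major layout.

For complex flavors:

The i-th component of the j-th eigenvector vj is stored in vr[(i - 1) + (j - 1)*ldvr] for column major layout and in vr[(i - 1)*ldvr + (j - 1)] for row major layout.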

ilo, ihi ilo and ihi are integer values determined when A was balanced.
The balanced A(i,j) = 0 if i > j and j = 1,..., ilo-1 or i = ihi+1,..., n.
If balanc = 'N' or 'S', ilo = 1 and ihi = n.

scale Array, size at least max(1, n). Details of the permutations and scaling
factors applied when balancing A.
If P[j - 1] is the index of the row and column interchanged with row and
column j, and D[j - 1] is the scaling factor applied to row and column j,
then
scale[j - 1] = P[j - 1], for j = 1,...,ilo-1
= D[j - 1], for j = ilo,...,ihi
= P[j - 1] for j = ihi+1,..., n.
The order in which the interchanges are made is n to ihi+1, then 1 to ilo-1.
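The scale array can be read back into permutation indices and scaling factors directly from the three cases above. In this sketch, decode_scale is a hypothetical helper. The sketch takes ilo and ihi as the 1-based values returned and assumes the stored interchange indices are 1-based as well. Setting perm[j-1] = 0 is this sketch's own marker for positions with no recorded interchange:

```c
/* Hypothetical decoder for the scale array described above (1-based ilo,
 * ihi). For j outside ilo..ihi, scale[j-1] holds the index of the row and
 * column interchanged with j (stored as a floating-point value); inside that
 * range it holds the diagonal scaling factor D[j-1].
 * Output: perm[j-1] = interchange index, or 0 if none is recorded (a marker
 * chosen for this sketch); d[j-1] = scaling factor, or 1.0 if none applies. */
void decode_scale(int n, int ilo, int ihi, const double *scale,
                  int *perm, double *d)
{
    for (int j = 1; j <= n; ++j) {
        if (j < ilo || j > ihi) {
            perm[j - 1] = (int)scale[j - 1];
            d[j - 1] = 1.0;
        } else {
            perm[j - 1] = 0;
            d[j - 1] = scale[j - 1];
        }
    }
}
```

The interchanges were applied in the order n down to ihi+1, then 1 up to ilo-1, so undoing them requires the reverse order.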

abnrm The one-norm of the balanced matrix (the maximum of the sum of absolute
values of elements of any column).
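The abnrm definition corresponds to the following computation on a column major matrix. The routine returns abnrm itself; one_norm is a hypothetical helper shown only to make the definition concrete:

```c
#include <stddef.h>

/* One-norm of an m-by-n column major matrix with leading dimension lda:
 * the largest sum of absolute values over any column. */
double one_norm(int m, int n, const double *a, int lda)
{
    double best = 0.0;
    for (int j = 0; j < n; ++j) {
        double col = 0.0;
        for (int i = 0; i < m; ++i) {
            double x = a[i + (size_t)j * lda];
            col += x < 0.0 ? -x : x;   /* |x| without <math.h> */
        }
        if (col > best) best = col;
    }
    return best;
}
```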

rconde, rcondv Arrays, size at least max(1, n) each.

rconde[j - 1] is the reciprocal condition number of the j-th eigenvalue.
rcondv[j - 1] is the reciprocal condition number of the j-th right eigenvector.
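[LUG] suggests approximate error bounds of the form eps*abnrm/rconde[j] for the j-th eigenvalue and eps*abnrm/rcondv[j] for the j-th right eigenvector, where eps is the machine epsilon. The sketch below follows that recipe. The name eig_error_bounds is hypothetical. The sketch uses DBL_EPSILON, which may differ from the epsilon [LUG] obtains from ?lamch by a small constant factor, so the results are order-of-magnitude estimates:

```c
#include <float.h>

/* Approximate error bounds built from the outputs of the routine above
 * (sense = 'B'), following the form suggested in [LUG]:
 *   eerr[j] ~ eps * abnrm / rconde[j]   (j-th eigenvalue)
 *   verr[j] ~ eps * abnrm / rcondv[j]   (j-th right eigenvector)
 * eps is taken as DBL_EPSILON here, an assumption of this sketch. */
void eig_error_bounds(int n, double abnrm, const double *rconde,
                      const double *rcondv, double *eerr, double *verr)
{
    for (int j = 0; j < n; ++j) {
        eerr[j] = DBL_EPSILON * abnrm / rconde[j];
        verr[j] = DBL_EPSILON * abnrm / rcondv[j];
    }
}
```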

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, the QR algorithm failed to compute all the eigenvalues, and no eigenvectors or condition
numbers have been computed; elements 1:ilo-1 and i+1:n of wr and wi (for real flavors) or w (for complex
flavors) contain eigenvalues which have converged.

Singular Value Decomposition: LAPACK Driver Routines

Table "Driver Routines for Singular Value Decomposition" lists the LAPACK driver routines that perform
singular value decomposition.
Driver Routines for Singular Value Decomposition
Routine Name Operation performed

?gesvd Computes the singular value decomposition of a general rectangular matrix.

?gesdd Computes the singular value decomposition of a general rectangular matrix using a divide and conquer method.

?gejsv Computes the singular value decomposition of a real matrix using a preconditioned Jacobi SVD method.

?gesvj Computes the singular value decomposition of a real matrix using Jacobi plane rotations.

?ggsvd Computes the generalized singular value decomposition of a pair of general rectangular matrices.

?gesvdx Computes the SVD and left and right singular vectors for a matrix.

?bdsvdx Computes the SVD of a bidiagonal matrix.

?gesvda_batch_strided Computes the truncated SVD of a group of general m-by-n matrices that are stored at a constant stride from each other in a contiguous block of memory.

Singular Value Decomposition - LAPACK Computational Routines

?gesvd
Computes the singular value decomposition of a
general rectangular matrix.

Syntax
lapack_int LAPACKE_sgesvd( int matrix_layout, char jobu, char jobvt, lapack_int m,
lapack_int n, float* a, lapack_int lda, float* s, float* u, lapack_int ldu, float* vt,
lapack_int ldvt, float* superb );
lapack_int LAPACKE_dgesvd( int matrix_layout, char jobu, char jobvt, lapack_int m,
lapack_int n, double* a, lapack_int lda, double* s, double* u, lapack_int ldu, double*
vt, lapack_int ldvt, double* superb );
lapack_int LAPACKE_cgesvd( int matrix_layout, char jobu, char jobvt, lapack_int m,
lapack_int n, lapack_complex_float* a, lapack_int lda, float* s, lapack_complex_float*
u, lapack_int ldu, lapack_complex_float* vt, lapack_int ldvt, float* superb );
lapack_int LAPACKE_zgesvd( int matrix_layout, char jobu, char jobvt, lapack_int m,
lapack_int n, lapack_complex_double* a, lapack_int lda, double* s,
lapack_complex_double* u, lapack_int ldu, lapack_complex_double* vt, lapack_int ldvt,
double* superb );

Include Files
• mkl.h

Description

The routine computes the singular value decomposition (SVD) of a real/complex m-by-n matrix A, optionally
computing the left and/or right singular vectors. The SVD is written as
A = U*Σ*V^T for real routines
A = U*Σ*V^H for complex routines
where Σ is an m-by-n matrix which is zero except for its min(m,n) diagonal elements, U is an m-by-m
orthogonal/unitary matrix, and V is an n-by-n orthogonal/unitary matrix. The diagonal elements of Σ are the
singular values of A; they are real and non-negative, and are returned in descending order. The first min(m,
n) columns of U and V are the left and right singular vectors of A.
The routine returns V^T (for real flavors) or V^H (for complex flavors), not V.
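The factorization can be checked by multiplying the factors back together. The sketch below rebuilds A from the arrays produced with jobu = 'S' and jobvt = 'S' in column major layout (real flavors). The name svd_reconstruct is hypothetical. Only the first k = min(m, n) columns of U and rows of V^T are used, because Σ is zero elsewhere:

```c
#include <stddef.h>

/* Rebuilds A = U*Sigma*V^T from thin factors, all stored column major:
 *   u  : m-by-k, leading dimension ldu  (first k = min(m, n) columns of U)
 *   s  : k singular values
 *   vt : k-by-n, leading dimension ldvt (first k rows of V^T)
 * Result goes to the m-by-n array a with leading dimension lda. */
void svd_reconstruct(int m, int n, int k,
                     const double *u, int ldu, const double *s,
                     const double *vt, int ldvt, double *a, int lda)
{
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            double sum = 0.0;
            for (int p = 0; p < k; ++p)
                sum += u[i + (size_t)p * ldu] * s[p] * vt[p + (size_t)j * ldvt];
            a[i + (size_t)j * lda] = sum;
        }
}
```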

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

jobu Must be 'A', 'S', 'O', or 'N'. Specifies options for computing all or part
of the matrix U.
If jobu = 'A', all m columns of U are returned in the array u;

if jobu = 'S', the first min(m, n) columns of U (the left singular vectors)
are returned in the array u;
if jobu = 'O', the first min(m, n) columns of U (the left singular vectors)
are overwritten on the array a;
if jobu = 'N', no columns of U (no left singular vectors) are computed.

jobvt Must be 'A', 'S', 'O', or 'N'. Specifies options for computing all or part
of the matrix VT/VH.
If jobvt = 'A', all n rows of VT/VH are returned in the array vt;

if jobvt = 'S', the first min(m,n) rows of VT/VH (the right singular
vectors) are returned in the array vt;
if jobvt = 'O', the first min(m,n) rows of VT/VH (the right singular
vectors) are overwritten on the array a;
if jobvt = 'N', no rows of VT/VH (no right singular vectors) are computed.

jobvt and jobu cannot both be 'O'.

m The number of rows of the matrix A (m≥ 0).

n The number of columns in A (n≥ 0).

a Arrays:
a (size at least max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout) is an array containing the m-by-n matrix A.

lda The leading dimension of the array a.
Must be at least max(1, m) for column major layout and at least max(1, n)
for row major layout.

ldu, ldvt The leading dimensions of the output arrays u and vt, respectively.
Constraints:
ldu ≥ 1; ldvt ≥ 1.
If jobu = 'A', ldu ≥ m;
If jobu = 'S', ldu ≥ m for column major layout and ldu ≥ min(m, n) for row major layout;
If jobvt = 'A', ldvt ≥ n;
If jobvt = 'S', ldvt ≥ min(m, n) for column major layout and ldvt ≥ n for row major layout.
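The leading-dimension rules above can be encoded in a small helper so that buffers are sized before the call. The function names below are hypothetical. They transcribe only the constraints listed in this section and accept upper-case option letters only. The row_major argument is nonzero for LAPACK_ROW_MAJOR:

```c
/* Minimum ldu and ldvt for ?gesvd, transcribed from the constraints above.
 * Results are clamped to at least 1 (ldu >= 1; ldvt >= 1). */
static int imin(int x, int y) { return x < y ? x : y; }

int gesvd_min_ldu(int row_major, char jobu, int m, int n)
{
    int r = 1;                                   /* jobu = 'N' or 'O' */
    if (jobu == 'A')
        r = m;
    else if (jobu == 'S')
        r = row_major ? imin(m, n) : m;
    return r > 1 ? r : 1;
}

int gesvd_min_ldvt(int row_major, char jobvt, int m, int n)
{
    int r = 1;                                   /* jobvt = 'N' or 'O' */
    if (jobvt == 'A')
        r = n;
    else if (jobvt == 'S')
        r = row_major ? n : imin(m, n);
    return r > 1 ? r : 1;
}
```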

Output Parameters

a On exit,
If jobu = 'O', a is overwritten with the first min(m,n) columns of U (the
left singular vectors stored columnwise);
If jobvt = 'O', a is overwritten with the first min(m, n) rows of VT/VH (the
right singular vectors stored rowwise);
If jobu≠'O' and jobvt≠'O', the contents of a are destroyed.

s Array, size at least max(1, min(m,n)). Contains the singular values of A sorted so that s[i] ≥ s[i + 1].

u, vt Arrays:
Array u minimum size:

              Column major layout      Row major layout
jobu = 'A'    max(1, ldu*m)            max(1, ldu*m)
jobu = 'S'    max(1, ldu*min(m, n))    max(1, ldu*m)

If jobu = 'A', u contains the m-by-m orthogonal/unitary matrix U.
If jobu = 'S', u contains the first min(m, n) columns of U (the left singular vectors stored column-wise).
If jobu = 'N' or 'O', u is not referenced.

Array vt minimum size:

              Column major layout      Row major layout
jobvt = 'A'   max(1, ldvt*n)           max(1, ldvt*n)
jobvt = 'S'   max(1, ldvt*n)           max(1, ldvt*min(m, n))

If jobvt = 'A', vt contains the n-by-n orthogonal/unitary matrix VT/VH.
If jobvt = 'S', vt contains the first min(m, n) rows of VT/VH (the right singular vectors stored row-wise).
If jobvt = 'N' or 'O', vt is not referenced.

superb If ?bdsqr does not converge (indicated by the return value info > 0), on
exit superb[0] through superb[min(m,n)-2] contain the unconverged superdiagonal
elements of an upper bidiagonal matrix B whose diagonal is in s (not
necessarily sorted). B satisfies A = U*B*V^T (real flavors) or A = U*B*V^H
(complex flavors), so it has the same singular values as A, and singular
vectors related by u and vt.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, then if ?bdsqr did not converge, i specifies how many superdiagonals of the intermediate
bidiagonal form B did not converge to zero (see the description of the superb parameter for details).

?gesdd
Computes the singular value decomposition of a
general rectangular matrix using a divide and conquer
method.

Syntax
lapack_int LAPACKE_sgesdd( int matrix_layout, char jobz, lapack_int m, lapack_int n,
float* a, lapack_int lda, float* s, float* u, lapack_int ldu, float* vt, lapack_int
ldvt );
lapack_int LAPACKE_dgesdd( int matrix_layout, char jobz, lapack_int m, lapack_int n,
double* a, lapack_int lda, double* s, double* u, lapack_int ldu, double* vt, lapack_int
ldvt );
lapack_int LAPACKE_cgesdd( int matrix_layout, char jobz, lapack_int m, lapack_int n,
lapack_complex_float* a, lapack_int lda, float* s, lapack_complex_float* u, lapack_int
ldu, lapack_complex_float* vt, lapack_int ldvt );
lapack_int LAPACKE_zgesdd( int matrix_layout, char jobz, lapack_int m, lapack_int n,
lapack_complex_double* a, lapack_int lda, double* s, lapack_complex_double* u,
lapack_int ldu, lapack_complex_double* vt, lapack_int ldvt );

Include Files
• mkl.h

Description

The routine computes the singular value decomposition (SVD) of a real/complex m-by-n matrix A, optionally
computing the left and/or right singular vectors.
If singular vectors are desired, it uses a divide-and-conquer algorithm. The SVD is written as
A = U*Σ*V^T for real routines,
A = U*Σ*V^H for complex routines,
where Σ is an m-by-n matrix which is zero except for its min(m,n) diagonal elements, U is an m-by-m
orthogonal/unitary matrix, and V is an n-by-n orthogonal/unitary matrix. The diagonal elements of Σ are the
singular values of A; they are real and non-negative, and are returned in descending order. The first min(m,
n) columns of U and V are the left and right singular vectors of A.
Note that the routine returns vt = V^T (for real flavors) or vt = V^H (for complex flavors), not V.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

jobz Must be 'A', 'S', 'O', or 'N'.

Specifies options for computing all or part of the matrices U and V.

If jobz = 'A', all m columns of U and all n rows of VT or VH are returned
in the arrays u and vt;
if jobz = 'S', the first min(m, n) columns of U and the first min(m, n)
rows of VT or VH are returned in the arrays u and vt;
if jobz = 'O', then

if m≥ n, the first n columns of U are overwritten in the array a and all rows
of VT or VH are returned in the array vt;
if m<n, all columns of U are returned in the array u and the first m rows of
VT or VH are overwritten in the array a;
if jobz = 'N', no columns of U or rows of VT or VH are computed.

m The number of rows of the matrix A (m≥ 0).

n The number of columns in A (n≥ 0).

a a(size max(1, lda*n) for column major layout and max(1, lda*m) for row
major layout) is an array containing the m-by-n matrix A.

lda The leading dimension of the array a. Must be at least max(1, m) for
column major layout and at least max(1, n) for row major layout.

ldu, ldvt The leading dimensions of the output arrays u and vt, respectively.
The minimum size of ldu is

jobz    m≥n                                                  m<n
'N'     1                                                    1
'A'     m                                                    m
'S'     m for column major layout; n for row major layout    m
'O'     1                                                    m

The minimum size of ldvt is

jobz    m≥n    m<n
'N'     1      1
'A'     n      n
'S'     n      m for column major layout; n for row major layout
'O'     n      1
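The two tables above can be transcribed into a pair of functions. The names are hypothetical and the option letters are assumed to be upper case. The row_major argument is nonzero for LAPACK_ROW_MAJOR, and each result is clamped to at least 1, which the tables leave implicit:

```c
/* Minimum ldu and ldvt for ?gesdd, transcribed from the two tables above. */
static int at_least_1(int x) { return x > 1 ? x : 1; }

int gesdd_min_ldu(int row_major, char jobz, int m, int n)
{
    switch (jobz) {
    case 'A': return at_least_1(m);
    case 'S': return at_least_1((m >= n && row_major) ? n : m);
    case 'O': return m >= n ? 1 : at_least_1(m);
    default:  return 1;                                 /* jobz = 'N' */
    }
}

int gesdd_min_ldvt(int row_major, char jobz, int m, int n)
{
    switch (jobz) {
    case 'A': return at_least_1(n);
    case 'S': return at_least_1((m < n && !row_major) ? m : n);
    case 'O': return m >= n ? at_least_1(n) : 1;
    default:  return 1;                                 /* jobz = 'N' */
    }
}
```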

Output Parameters

a On exit:
If jobz = 'O', then if m≥ n, a is overwritten with the first n columns of U
(the left singular vectors, stored columnwise). If m < n, a is overwritten
with the first m rows of VT (the right singular vectors, stored rowwise);
If jobz≠'O', the contents of a are destroyed.

s Array, size at least max(1, min(m,n)). Contains the singular values of A sorted so that s[i] ≥ s[i + 1].

u, vt Arrays:
Array u is of size:

jobz    m≥n                                       m<n
'N'     1                                         1
'A'     max(1, ldu*m)                             max(1, ldu*m)
'S'     max(1, ldu*n) for column major layout;    max(1, ldu*m)
        max(1, ldu*m) for row major layout
'O'     1                                         max(1, ldu*m)

If jobz = 'A' or jobz = 'O' and m < n, u contains the m-by-m orthogonal/unitary matrix U.
If jobz = 'S', u contains the first min(m, n) columns of U (the left singular vectors, stored columnwise).
If jobz = 'O' and m≥n, or jobz = 'N', u is not referenced.

Array vt is of size:

jobz    m≥n               m<n
'N'     1                 1
'A'     max(1, ldvt*n)    max(1, ldvt*n)
'S'     max(1, ldvt*n)    max(1, ldvt*n) for column major layout;
                          max(1, ldvt*m) for row major layout
'O'     max(1, ldvt*n)    1

If jobz = 'A' or jobz = 'O' and m≥n, vt contains the n-by-n orthogonal/unitary matrix VT.
If jobz = 'S', vt contains the first min(m, n) rows of VT (the right singular vectors, stored rowwise).
If jobz = 'O' and m < n, or jobz = 'N', vt is not referenced.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = -4, A had a NAN entry.

If info = i, then ?bdsdc did not converge, updating process failed.

?gejsv
Computes the singular value decomposition using a
preconditioned Jacobi SVD method.

Syntax
lapack_int LAPACKE_sgejsv (int matrix_layout, char joba, char jobu, char jobv, char
jobr, char jobt, char jobp, lapack_int m, lapack_int n, float * a, lapack_int lda, float
* sva, float * u, lapack_int ldu, float * v, lapack_int ldv, float * stat, lapack_int *
istat);
lapack_int LAPACKE_dgejsv (int matrix_layout, char joba, char jobu, char jobv, char
jobr, char jobt, char jobp, lapack_int m, lapack_int n, double * a, lapack_int lda,
double * sva, double * u, lapack_int ldu, double * v, lapack_int ldv, double * stat,
lapack_int * istat);
lapack_int LAPACKE_cgejsv (int matrix_layout, char joba, char jobu, char jobv, char
jobr, char jobt, char jobp, lapack_int m, lapack_int n, lapack_complex_float * a,
lapack_int lda, float * sva, lapack_complex_float * u, lapack_int ldu,
lapack_complex_float * v, lapack_int ldv, float * stat, lapack_int * istat);
lapack_int LAPACKE_zgejsv (int matrix_layout, char joba, char jobu, char jobv, char
jobr, char jobt, char jobp, lapack_int m, lapack_int n, lapack_complex_double * a,
lapack_int lda, double * sva, lapack_complex_double * u, lapack_int ldu,
lapack_complex_double * v, lapack_int ldv, double * stat, lapack_int * istat);

Include Files
• mkl.h

Description
The routine computes the singular value decomposition (SVD) of a real/complex m-by-n matrix A, where m≥n.

The SVD is written as

A = U*Σ*V^T, for real routines
A = U*Σ*V^H, for complex routines
where Σ is an m-by-n matrix which is zero except for its n diagonal elements, U is an m-by-n (or m-by-m)
orthonormal matrix, and V is an n-by-n orthogonal matrix. The diagonal elements of Σ are the singular values
of A; the columns of U and V are the left and right singular vectors of A, respectively. The matrices U and V
are computed and stored in the arrays u and v, respectively. The diagonal of Σ is computed and stored in the
array sva.

The ?gejsv routine can sometimes compute tiny singular values and their singular vectors much more
accurately than other SVD routines.
The routine implements a preconditioned Jacobi SVD algorithm. It uses ?geqp3, ?geqrf, and ?gelqf as
preprocessors and preconditioners. Optionally, an additional row pivoting can be used as a preprocessor,
which in some cases results in much higher accuracy. An example is matrix A with the structure A = D1 * C
* D2, where D1, D2 are arbitrarily ill-conditioned diagonal matrices and C is a well-conditioned matrix. In that
case, complete pivoting in the first QR factorizations provides accuracy dependent on the condition number
of C, and independent of D1, D2. Such higher accuracy is not completely understood theoretically, but it
works well in practice.
If A can be written as A = B*D, with well-conditioned B and some diagonal D, then the high accuracy is
guaranteed, both theoretically and in software, independent of D. For more details see [Drmac08-1],
[Drmac08-2].
The computational range for the singular values can be the full range ( UNDERFLOW,OVERFLOW ), provided
that the machine arithmetic and the BLAS and LAPACK routines called by ?gejsv are implemented to work in
that range. If that is not the case, the restriction for safe computation with the singular values in the range
of normalized IEEE numbers is that the spectral condition number kappa(A)=sigma_max(A)/sigma_min(A)
does not overflow. This code (?gejsv) is best used in this restricted range, meaning that singular values of
magnitude below ||A||_2 / slamch('O') (for single precision) or ||A||_2 / dlamch('O') (for double
precision) are returned as zeros. See jobr for details on this.
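The spectral condition number kappa(A) mentioned above can be computed from the returned singular values. This is a minimal sketch. The name spectral_condition is hypothetical, and the function assumes n ≥ 1. It returns INFINITY when the smallest value is zero, as happens when tiny singular values are returned as zeros. A common positive scale factor applied to all the values cancels in the ratio:

```c
#include <math.h>   /* INFINITY (a macro; no libm function is called) */

/* kappa(A) = sigma_max / sigma_min from an array of n >= 1 singular values
 * such as sva. The values need not be sorted. Returns INFINITY if the
 * smallest value is zero. */
double spectral_condition(int n, const double *sv)
{
    double smax = sv[0], smin = sv[0];
    for (int i = 1; i < n; ++i) {
        if (sv[i] > smax) smax = sv[i];
        if (sv[i] < smin) smin = sv[i];
    }
    return smin == 0.0 ? INFINITY : smax / smin;
}
```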

This implementation is slower than the one described in [Drmac08-1], [Drmac08-2] due to replacement of
some non-LAPACK components, and because the choice of some tuning parameters in the iterative part
(?gesvj) is left to the implementer on a particular machine.

The rank revealing QR factorization (in this code: ?geqp3) should be implemented as in [Drmac08-3].

If m is much larger than n, it is obvious that the initial QRF with column pivoting can be preprocessed by the
QRF without pivoting. That well known trick is not used in ?gejsv because in some cases heavy row
weighting can be treated with complete pivoting. The overhead in cases m much larger than n is then only
due to pivoting, but the benefits in accuracy have prevailed. You can incorporate this extra QRF step easily
and also improve data movement (matrix transpose, matrix copy, matrix transposed copy) - this
implementation of ?gejsv uses only the simplest, naive data movement.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.


Notice revision #20201201

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

joba Must be 'C', 'E', 'F', 'G', 'A', or 'R'.

Specifies the level of accuracy:

If joba = 'C', high relative accuracy is achieved if A = B*D with well-conditioned B and arbitrary diagonal
matrix D. The accuracy cannot be spoiled by column scaling. The accuracy of the computed output depends
on the condition of B, and the procedure aims at the best theoretical accuracy. The relative error
max_{i=1:N}|d sigma_i| / sigma_i is bounded by f(M,N)*epsilon*cond(B), independent of D. The input
matrix is preprocessed with the QRF with column pivoting. This initial preprocessing and preconditioning by a
rank-revealing QR factorization is common for all values of joba. Additional actions are specified as follows:

If joba = 'E', computation is as with 'C', with an additional estimate of the condition number of B. This
provides a realistic error bound.
If joba = 'F', accuracy higher than with the 'C' option is achieved if A = D1*C*D2 with ill-conditioned
diagonal scalings D1, D2 and a well-conditioned matrix C. This option is advisable if the structure of the
input matrix is not known and relative accuracy is desirable. The input matrix A is preprocessed with QR
factorization with full (row and column) pivoting.
If joba = 'G', computation is as with 'F', with an additional estimate of the condition number of B, where
A = B*D. If A has heavily weighted rows, this condition number gives an overly pessimistic error bound.
If joba = 'A', small singular values are regarded as noise, and the matrix is treated as numerically rank
deficient. The error in the computed singular values is bounded by f(m,n)*epsilon*||A||. The computed SVD
A = U*S*V**T (for real flavors) or A = U*S*V**H (for complex flavors) restores A up to
f(m,n)*epsilon*||A||. This enables the procedure to set all singular values below n*epsilon*||A|| to zero.

If joba = 'R', the procedure is similar to the 'A' option. The rank-revealing property of the initial QR
factorization is used to reveal (using the triangular factor) a gap sigma_{r+1} < epsilon * sigma_r, in which
case the numerical rank is declared to be r. The SVD is computed with absolute error bounds, but more
accurately than with 'A'.
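The two rank-decision rules can be illustrated on a descending array of values. This is only a sketch: it applies the 'A' threshold n*epsilon*||A|| and the 'R' gap test sigma_{r+1} < epsilon*sigma_r directly to a list of numbers, whereas ?gejsv applies its test to the triangular factor of the rank-revealing QR factorization. The function names are hypothetical, and epsilon is passed in by the caller (DBL_EPSILON from <float.h> is one choice).

```c
#include <assert.h>
#include <float.h>

/* joba = 'A'-style rule (sketch): values below n*eps*norm_a, where norm_a
 * stands for ||A||, are treated as zero; returns how many remain. */
static int rank_absolute(const double *sv, int n, double norm_a, double eps)
{
    assert(n > 0 && eps > 0.0);
    double thresh = n * eps * norm_a;
    int r = 0;
    for (int i = 0; i < n; ++i)
        if (sv[i] >= thresh) ++r;
    return r;
}

/* joba = 'R'-style rule (sketch): for values sorted in decreasing order,
 * the numerical rank is r at the first gap sv[r] < eps * sv[r - 1]. */
static int rank_gap(const double *sv, int n, double eps)
{
    assert(n > 0 && eps > 0.0);
    for (int r = 1; r < n; ++r)
        if (sv[r] < eps * sv[r - 1])
            return r;
    return n;
}
```

For {10, 1, 1e-20, 1e-21} both rules give rank 2, but they can disagree in general: the absolute rule ignores gaps, while the gap rule ignores the overall scale.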

jobu Must be 'U', 'F', 'W', or 'N'.

Specifies whether to compute the columns of the matrix U:

If jobu = 'U', n columns of U are returned in the array u.

If jobu = 'F', a full set of m left singular vectors is returned in the array u.

If jobu = 'W', u may be used as workspace of length m*n. See the description of u.


If jobu = 'N', u is not computed.

jobv Must be 'V', 'J', 'W', or 'N'.

Specifies whether to compute the matrix V:

If jobv = 'V', n columns of V are returned in the array v; Jacobi rotations
are not explicitly accumulated.
If jobv = 'J', n columns of V are returned in the array v, but they are computed as the product of Jacobi
rotations. This option is allowed only if jobu ≠ 'N'.
If jobv = 'W', v may be used as workspace of length n*n. See the
description of v.

If jobv = 'N', v is not computed.

jobr Must be 'N' or 'R'.

Specifies the range for the singular values. If small positive singular values
are outside the specified range, they may be set to zero. If A is scaled so
that the largest singular value of the scaled matrix is around sqrt(big),
big = ?lamch('O'), the function can remove columns of A whose norm in
the scaled matrix is less than sqrt(?lamch('S')) (for jobr = 'R'), or
less than small = ?lamch('S')/?lamch('E').

If jobr = 'N', the function does not remove small columns of the scaled matrix. This option assumes that the
BLAS, the QR factorizations, and the triangular solvers are implemented to work in that range. If the
condition number of A is greater than big, use ?gesvj.

If jobr = 'R', the restricted range for singular values of the scaled matrix A is
[sqrt(?lamch('S')), sqrt(big)], roughly as described above. This option is recommended.
For computing the singular values in the full range [?lamch('S'), big], use ?gesvj.

jobt Must be 'T' or 'N'.

If the matrix is square, the procedure may decide to use the transposed A if A^T (for real flavors) or A^H
(for complex flavors) seems to be better with respect to convergence. If the matrix is not square, jobt is
ignored.

The decision is based on two values of entropy over the adjoint orbit of A^T*A (for real flavors) or A^H*A
(for complex flavors). See the descriptions of stat[5] and stat[6].
If jobt = 'T', the function performs the transposition if the entropy test indicates possibly faster
convergence of the Jacobi process when A^T (or A^H) is taken as input. If A is replaced with A^T or A^H, row
pivoting is included automatically.
If jobt = 'N', the function attempts no speculations. This option can be used to compute only the singular
values, or the full SVD (u, sigma, and v). For only one set of singular vectors (u or v), the caller should
provide both u and v, as one of the arrays is used as workspace if the matrix A is transposed. The
implementer can easily remove this constraint and make the code more complicated. See the descriptions of
u and v.


Caution
The jobt = 'T' option is experimental and its effect might not
be the same in subsequent releases. Consider using jobt = 'N'
instead.

jobp Must be 'P' or 'N'.

Enables structured perturbations of denormalized numbers. This option should be active if denormals are
poorly implemented, causing slow computation, especially in cases of fast convergence. For details, see
[Drmac08-1], [Drmac08-2]. For simplicity, such perturbations are included only when the full SVD or only the
singular values are requested. You can add the perturbation for the cases of computing one set of singular
vectors.
If jobp = 'P', the function introduces perturbation.

If jobp = 'N', the function introduces no perturbation.

m The number of rows of the input matrix A; m≥ 0.

n The number of columns in the input matrix A; m≥n≥ 0.

a, u, v Array a (size lda*n for column major layout and lda*m for row major layout) contains the m-by-n
matrix A.

u is a workspace array; its size for column major layout is ldu*n for jobu = 'U' or 'W' and ldu*m for
jobu = 'F'; for row major layout its size is at least ldu*m. When jobt = 'T' and m = n, u must be provided
even though jobu = 'N'.
v is a workspace array; its size is ldv*n. When jobt = 'T' and m = n, v must be provided even though
jobv = 'N'.

lda The leading dimension of the array a. Must be at least max(1, m) for
column major layout and at least max(1, n) for row major layout.

sva sva is a workspace array, its size is n.

ldu The leading dimension of the array u; ldu≥ 1.

If jobu = 'U', 'F', or 'W': ldu ≥ m for column major layout; for row major layout, ldu ≥ n if jobu = 'U'
or 'W', and ldu ≥ m if jobu = 'F'.

ldv The leading dimension of the array v; ldv≥ 1.

If jobv = 'V', 'J', or 'W', ldv ≥ n.

cwork cwork is a workspace array of size max(2, lwork).

rwork rwork is an array of size at least max(7, lrwork) for real flavors and at
least max(7, lwork) for complex flavors.
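The sizing rules for u differ between the two layouts. A hypothetical helper that encodes them as stated in this section might look like this; it covers jobu = 'U', 'F', and 'W' only (for jobu = 'N', ldu ≥ 1, and u must still be supplied when jobt = 'T' and m = n).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimum ldu and element count of the array u for ?gejsv, following the rules
 * above for jobu = 'U', 'F', or 'W' (hypothetical helper).
 * row_major selects LAPACK_ROW_MAJOR; otherwise LAPACK_COL_MAJOR. */
static void gejsv_u_dims(bool row_major, char jobu, int m, int n,
                         int *ldu, size_t *count)
{
    assert(m >= n && n >= 0);
    assert(jobu == 'U' || jobu == 'F' || jobu == 'W');
    int ucols = (jobu == 'F') ? m : n;   /* U is m-by-m for 'F', else m-by-n */
    int ld = row_major ? ucols : m;      /* row major: ldu >= columns of U */
    if (ld < 1) ld = 1;                  /* ldu >= 1 in all cases */
    *ldu = ld;
    /* Column major: ldu*n ('U', 'W') or ldu*m ('F'); row major: ldu*m. */
    *count = (size_t)ld * (size_t)(row_major ? m : ucols);
}
```

For m = 5, n = 3, and jobu = 'U', this gives ldu = 5 with 15 elements in column major layout, and ldu = 3 with 15 elements in row major layout.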

Output Parameters

sva On exit:


For stat[0]/stat[1] = 1: the singular values of A. During the computation, sva contains the Euclidean
column norms of the iterated matrices in the array a.

For stat[0] ≠ stat[1]: the singular values of A are (stat[0]/stat[1]) * sva[0:n - 1]. This factored form is
used if sigma_max(A) overflows or if small singular values have been saved from underflow by scaling the
input matrix A.
If jobr = 'R', some of the singular values may be returned as exact zeros obtained by 'setting to zero'
because they are below the numerical rank threshold or are denormalized numbers.

u On exit:
If jobu = 'U', contains the m-by-n matrix of the left singular vectors.

If jobu = 'F', contains the m-by-m matrix of the left singular vectors,
including an orthonormal basis of the orthogonal complement of the range
of A.
If jobu = 'W' and jobv = 'V', jobt = 'T', and m = n, then u is used as workspace if the procedure replaces
A with A^T (for real flavors) or A^H (for complex flavors). In that case, v is computed in u as left singular
vectors of A^T or A^H and copied back to the v array. This 'W' option is just a reminder to the caller that
in this case u is reserved as workspace of length n*n.

If jobu = 'N', u is not referenced.

v On exit:
If jobv = 'V' or 'J', contains the n-by-n matrix of the right singular
vectors.
If jobv = 'W' and jobu = 'U', jobt = 'T', and m = n, then v is used as workspace if the procedure replaces
A with A^T (for real flavors) or A^H (for complex flavors). In that case, u is computed in v as right singular
vectors of A^T or A^H and copied back to the u array. This 'W' option is just a reminder to the caller that
in this case v is reserved as workspace of length n*n.
If jobv = 'N', v is not referenced.

stat On exit,
stat[0] = scale = stat[1]/stat[0] is the scaling factor such that
scale*sva(1:n) are the computed singular values of A. See the
description of sva.

stat[1] = see the description of stat[0].

stat[2] = sconda is an estimate for the condition number of column equilibrated A. If joba = 'E' or 'G',
sconda is an estimate of sqrt(||(R^T * R)^(-1)||_1). It is computed using ?pocon. It holds that
n^(-1/4) * sconda ≤ ||R^(-1)||_2 ≤ n^(1/4) * sconda, where R is the triangular factor from the QRF of A.
However, if R is truncated and the numerical rank is determined to be strictly smaller than n, sconda is
returned as -1, indicating that the smallest singular values might be lost.

If the full SVD is needed, the following two condition numbers are useful for the analysis of the algorithm.
They are provided for a user who is familiar with the details of the method.
stat[3] = an estimate of the scaled condition number of the triangular
factor in the first QR factorization.
stat[4] = an estimate of the scaled condition number of the triangular
factor in the second QR factorization.
The following two parameters are computed if jobt = 'T'. They are provided for a user who is familiar with
the details of the method.
stat[5] = the entropy of A^T*A: this is the Shannon entropy of diag(A^T*A)/trace(A^T*A), taken as a point
in the probability simplex.
stat[6] = the entropy of A*A^T.

istat On exit,
istat[0] = the numerical rank determined after the initial QR factorization
with pivoting. See the descriptions of joba and jobr.

istat[1] = the number of the computed nonzero singular values.

istat[2] = if nonzero, a warning message. If istat[2] = 1, some of the column norms of A were
denormalized floats. The requested high accuracy is not warranted by the data.
For complex flavors, istat[3] = 1 or -1. If istat[3] = 1, then the procedure used A^H to do the job as
specified by the job parameters.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info > 0, the function did not converge in the maximal number of sweeps. The computed values may be
inaccurate.

See Also
?geqp3
?geqrf
?gelqf
?gesvj
?lamch
?pocon
?ormlq

?gesvj
Computes the singular value decomposition of a real
matrix using Jacobi plane rotations.


Syntax
lapack_int LAPACKE_sgesvj (int matrix_layout, char joba, char jobu, char jobv,
lapack_int m, lapack_int n, float * a, lapack_int lda, float * sva, lapack_int mv, float
* v, lapack_int ldv, float * stat);
lapack_int LAPACKE_dgesvj (int matrix_layout, char joba, char jobu, char jobv,
lapack_int m, lapack_int n, double * a, lapack_int lda, double * sva, lapack_int mv,
double * v, lapack_int ldv, double * stat);
lapack_int LAPACKE_cgesvj (int matrix_layout, char joba, char jobu, char jobv,
lapack_int m, lapack_int n, lapack_complex_float * a, lapack_int lda, float * sva,
lapack_int mv, lapack_complex_float * v, lapack_int ldv, float * stat);
lapack_int LAPACKE_zgesvj (int matrix_layout, char joba, char jobu, char jobv,
lapack_int m, lapack_int n, lapack_complex_double * a, lapack_int lda, double * sva,
lapack_int mv, lapack_complex_double * v, lapack_int ldv, double * stat);

Include Files
• mkl.h

Description
The routine computes the singular value decomposition (SVD) of a real or complex m-by-n matrix A, where
m≥n.
The SVD of A is written as
A = U*Σ*V^T for real flavors, or
A = U*Σ*V^H for complex flavors,
where Σ is an m-by-n diagonal matrix, U is an m-by-n orthonormal matrix, and V is an n-by-n orthogonal/
unitary matrix. The diagonal elements of Σ are the singular values of A; the columns of U and V are the left
and right singular vectors of A, respectively. The matrices U and V are computed and stored in the arrays u
and v, respectively. The diagonal of Σ is computed and stored in the array sva.

The ?gesvj routine can sometimes compute tiny singular values and their singular vectors much more
accurately than other SVD routines.
The n-by-n orthogonal matrix V is obtained as a product of Jacobi plane rotations. The rotations are
implemented as fast scaled rotations of Anda and Park [AndaPark94]. In the case of underflow of the Jacobi
angle, a modified Jacobi transformation of Drmac ([Drmac08-4]) is used. Pivot strategy uses column
interchanges of de Rijk ([deRijk98]). The relative accuracy of the computed singular values and the accuracy
of the computed singular vectors (in angle metric) is as guaranteed by the theory of Demmel and Veselic
[Demmel92]. The condition number that determines the accuracy in the full rank case is essentially

$$\min_{D = \mathrm{diag}} \kappa(A \cdot D),$$

where κ(.) is the spectral condition number. The best performance of this Jacobi SVD procedure is achieved if
it is used in an accelerated version of Drmac and Veselic [Drmac08-1], [Drmac08-2].

The computational range for the nonzero singular values is the machine number interval
(UNDERFLOW, OVERFLOW). In extreme cases, even denormalized singular values can be computed with the
corresponding gradual loss of accurate digits.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

joba Must be 'L', 'U' or 'G'.

Specifies the structure of A:

If joba = 'L', the input matrix A is lower triangular.

If joba = 'U', the input matrix A is upper triangular.

If joba = 'G', the input matrix A is a general m-by-n matrix, m ≥ n.

jobu Must be 'U', 'C' or 'N'.

Specifies whether to compute the left singular vectors (columns of U):

If jobu = 'U', the left singular vectors corresponding to the nonzero singular values are computed and
returned in the leading columns of A. See more details in the description of a. The default numerical
orthogonality threshold is set to approximately TOL=CTOL*EPS, CTOL=sqrt(m), EPS=?lamch('E').
If jobu = 'C', the behavior is analogous to jobu = 'U', except that you can control the level of numerical
orthogonality of the computed left singular vectors. TOL can be set to TOL=CTOL*EPS, where CTOL is given
on input in the array stat. No CTOL smaller than one is allowed. CTOL greater than 1/EPS is meaningless.
The option 'C' can be used if orthogonality of the computed left singular vectors to about m*EPS is
satisfactory, so CTOL=m could save a few sweeps of Jacobi rotations. See the descriptions of a and stat[0].

If jobu = 'N', u is not computed. However, see the description of a.

jobv Must be 'V', 'A' or 'N'.

Specifies whether to compute the right singular vectors, that is, the matrix
V:
If jobv = 'V', the matrix V is computed and returned in the array v.

If jobv = 'A', the Jacobi rotations are applied to the mv-by-n array v. In other words, the right singular
vector matrix V is not computed explicitly; instead it is applied to an mv-by-n matrix initially stored in the
first mv rows of v.
If jobv = 'N', the matrix V is not computed and the array v is not
referenced.

m The number of rows of the input matrix A.

1/slamch('E')> m≥ 0 for sgesvj.
1/dlamch('E')> m≥ 0 for dgesvj.

n The number of columns in the input matrix A; m≥n≥ 0.


a, v Array a (size at least lda*n for column major layout and lda*m for row major layout) contains the
m-by-n matrix A.

Array v (size at least max(1, ldv*n)) contains, if jobv = 'A', the mv-by-n matrix to be post-multiplied by
Jacobi rotations.

lda The leading dimension of the array a. Must be at least max(1, m) for
column major layout and at least max(1, n) for row major layout.

mv If jobv = 'A', the product of Jacobi rotations in ?gesvj is applied to the first mv rows of v. See the
description of jobv. 0 ≤ mv ≤ ldv.

ldv The leading dimension of the array v; ldv≥ 1.

jobv = 'V', ldv≥ max(1, n).

jobv = 'A', ldv≥ max(1, mv) for column major layout and ldv≥ max(1,
n) for row major layout.

stat Array size 6. If jobu = 'C', stat[0] = CTOL, where CTOL defines the threshold for convergence. The
process stops if all columns of A are mutually orthogonal up to CTOL*EPS, where EPS = ?lamch('E'). It is
required that CTOL ≥ 1; that is, the routine cannot be forced to obtain orthogonality below ε.

Output Parameters

a On exit:
If jobu = 'U' or jobu = 'C':

• if info = 0, the leading columns of A contain left singular vectors corresponding to the computed singular
values of A that are above the underflow threshold ?lamch('S'), that is, the nonzero singular values. The
number of the computed nonzero singular values is returned in stat[1]. Also see the descriptions of sva
and stat. The computed columns of u are mutually numerically orthogonal up to approximately
TOL=sqrt(m)*EPS (default), or TOL=CTOL*EPS if jobu = 'C'; see the description of jobu.
• if info > 0, the procedure ?gesvj did not converge in the given number of iterations (sweeps). In that
case, the computed columns of u may not be orthogonal up to TOL. The output u (stored in a), sigma
(given by the computed singular values in sva[0:n - 1]) and v is still a decomposition of the input matrix A
in the sense that the residual ||A - scale*U*sigma*V^T||_2 / ||A||_2 for real flavors or
||A - scale*U*sigma*V^H||_2 / ||A||_2 for complex flavors (where scale = stat[0]) is small.
If jobu = 'N':

• if info = 0, note that the left singular vectors are 'for free' in the one-sided Jacobi SVD algorithm.
However, if only the singular values are needed, the level of numerical orthogonality of u is not an issue
and iterations are stopped when the columns of the iterated matrix are numerically orthogonal up to
approximately m*EPS. Thus, on exit, a contains the columns of u scaled with the corresponding singular
values.

• if info > 0, the procedure ?gesvj did not converge in the given
number of iterations (sweeps).

sva Array size n.

If info = 0, depending on the value scale = stat[0], where scale is the scaling factor:

• if scale = 1, sva[0:n - 1] contains the computed singular values of A.
• if scale ≠ 1, the singular values of A are scale*sva[0:n - 1], and this factored representation is due to
the fact that some of the singular values of A might underflow or overflow.

If info > 0, the procedure ?gesvj did not converge in the given number of iterations (sweeps) and
scale*sva[0:n - 1] may not be accurate.
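Recovering the singular values from the factored form is a single multiplication per entry, but the product can overflow, which is exactly why the factored form exists. A minimal sketch (hypothetical helper) that forms scale*sva[i] and reports how many products are not finite; in that case the caller should keep working with scale and sva separately, for example through log10(scale) + log10(sva[i]). Products that underflow to zero are not detected here.

```c
#include <assert.h>
#include <math.h>

/* sigma[i] = scale * sva[i] with scale = stat[0] from ?gesvj.
 * Returns the number of products that are not finite (overflow). */
static int unscale_singular_values(double scale, const double *sva, int n,
                                   double *sigma)
{
    assert(n >= 0 && scale > 0.0);
    int bad = 0;
    for (int i = 0; i < n; ++i) {
        sigma[i] = scale * sva[i];
        if (!isfinite(sigma[i])) ++bad;
    }
    return bad;
}
```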

v On exit:
If jobv = 'V', contains the n-by-n matrix of the right singular vectors.

If jobv = 'A', then v contains the product of the computed right singular
vector matrix and the initial matrix in the array v.

If jobv = 'N', v is not referenced.

stat On exit,
stat[0] = scale is the scaling factor such that scale*sva(1:n) are the
computed singular values of A. See the description of sva.

stat[1] is the number of the computed nonzero singular values.

stat[2] is the number of the computed singular values that are larger than
the underflow threshold.
stat[3] is the number of sweeps of Jacobi rotations needed for numerical
convergence.
stat[4] = max_{i≠j} |COS(A(:,i),A(:,j))| in the last sweep. This is
useful information in cases when ?gesvj did not converge, as it can be
used to estimate whether the output is still useful and for post festum
analysis.
stat[5] is the largest absolute value over all sines of the Jacobi rotation
angles in the last sweep. It can be useful in a post festum analysis.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info > 0, the function did not converge in the maximal number (30) of sweeps. The output may still be
useful. See the description of stat.


?ggsvd
Computes the generalized singular value
decomposition of a pair of general rectangular
matrices (deprecated).

Syntax
lapack_int LAPACKE_sggsvd( int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int n, lapack_int p, lapack_int* k, lapack_int* l, float* a,
lapack_int lda, float* b, lapack_int ldb, float* alpha, float* beta, float* u,
lapack_int ldu, float* v, lapack_int ldv, float* q, lapack_int ldq, lapack_int* iwork );
lapack_int LAPACKE_dggsvd( int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int n, lapack_int p, lapack_int* k, lapack_int* l, double* a,
lapack_int lda, double* b, lapack_int ldb, double* alpha, double* beta, double* u,
lapack_int ldu, double* v, lapack_int ldv, double* q, lapack_int ldq, lapack_int*
iwork );
lapack_int LAPACKE_cggsvd( int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int n, lapack_int p, lapack_int* k, lapack_int* l,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* b, lapack_int ldb,
float* alpha, float* beta, lapack_complex_float* u, lapack_int ldu,
lapack_complex_float* v, lapack_int ldv, lapack_complex_float* q, lapack_int ldq,
lapack_int* iwork );
lapack_int LAPACKE_zggsvd( int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int n, lapack_int p, lapack_int* k, lapack_int* l,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* b, lapack_int ldb,
double* alpha, double* beta, lapack_complex_double* u, lapack_int ldu,
lapack_complex_double* v, lapack_int ldv, lapack_complex_double* q, lapack_int ldq,
lapack_int* iwork );

Include Files
• mkl.h

Description
This routine is deprecated; use ?ggsvd3.

The routine computes the generalized singular value decomposition (GSVD) of an m-by-n real/complex
matrix A and p-by-n real/complex matrix B:
U'*A*Q = D1*(0 R), V'*B*Q = D2*(0 R),
where U, V and Q are orthogonal/unitary matrices and U', V' mean transpose/conjugate transpose of U and V
respectively.
Let k+l be the effective numerical rank of the matrix (A', B')'; then R is a (k+l)-by-(k+l) nonsingular upper
triangular matrix, and D1 and D2 are m-by-(k+l) and p-by-(k+l) "diagonal" matrices with the following
structures, respectively:
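If m-k-l ≥ 0,

$$D_1 = \begin{array}{c|cc} & k & l \\ \hline k & I & 0 \\ l & 0 & C \\ m-k-l & 0 & 0 \end{array}, \qquad D_2 = \begin{array}{c|cc} & k & l \\ \hline l & 0 & S \\ p-l & 0 & 0 \end{array},$$

$$(0\;\; R) = \begin{array}{c|ccc} & n-k-l & k & l \\ \hline k & 0 & R_{11} & R_{12} \\ l & 0 & 0 & R_{22} \end{array},$$

where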
C = diag(alpha[k], ..., alpha[k + l - 1]),
S = diag(beta[k], ..., beta[k + l - 1]),
C^2 + S^2 = I.
The nonzero element r_ij (1 ≤ i ≤ j ≤ k + l) of R is stored in a[(i - 1) + (n - k - l + j - 1)*lda] for column
major layout and in a[(i - 1)*lda + (n - k - l + j - 1)] for row major layout.

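If m-k-l < 0,

$$D_1 = \begin{array}{c|ccc} & k & m-k & k+l-m \\ \hline k & I & 0 & 0 \\ m-k & 0 & C & 0 \end{array}, \qquad D_2 = \begin{array}{c|ccc} & k & m-k & k+l-m \\ \hline m-k & 0 & S & 0 \\ k+l-m & 0 & 0 & I \\ p-l & 0 & 0 & 0 \end{array},$$

$$(0\;\; R) = \begin{array}{c|cccc} & n-k-l & k & m-k & k+l-m \\ \hline k & 0 & R_{11} & R_{12} & R_{13} \\ m-k & 0 & 0 & R_{22} & R_{23} \\ k+l-m & 0 & 0 & 0 & R_{33} \end{array},$$

where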
C = diag(alpha[k], ..., alpha[m - 1]),
S = diag(beta[k], ..., beta[m - 1]),
C^2 + S^2 = I.
On exit, the location of the nonzero element r_ij (1 ≤ i ≤ j ≤ k + l) of R depends on the value of i. For
i ≤ m, this element is stored in a[(i - 1) + (n - k - l + j - 1)*lda] for column major layout and in
a[(i - 1)*lda + (n - k - l + j - 1)] for row major layout. For m < i ≤ k + l, it is stored in
b[(i - k - 1) + (n - k - l + j - 1)*ldb] for column major layout and in b[(i - k - 1)*ldb + (n - k - l + j - 1)]
for row major layout.
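The storage rules for R can be wrapped in a small lookup, which covers both the m-k-l ≥ 0 case (every row of R is in a) and the m-k-l < 0 case (rows m+1 through k+l are in b). This is a hypothetical helper that transcribes the index formulas above, with i and j 1-based as in the text and the returned offset 0-based.

```c
#include <assert.h>
#include <stddef.h>

/* Location of the nonzero element r_ij (1 <= i <= j <= k+l) of R on exit from
 * ?ggsvd. row_major is nonzero for LAPACK_ROW_MAJOR. Returns 'a' or 'b' for the
 * array that holds the element and stores its 0-based offset in *idx. */
static char ggsvd_r_location(int row_major, int m, int n, int k, int l,
                             int lda, int ldb, int i, int j, size_t *idx)
{
    assert(1 <= i && i <= j && j <= k + l);
    size_t col = (size_t)(n - k - l + j - 1);
    if (i <= m) {                        /* rows 1..m of R are in a */
        size_t row = (size_t)(i - 1);
        *idx = row_major ? row * (size_t)lda + col : row + col * (size_t)lda;
        return 'a';
    }
    size_t row = (size_t)(i - k - 1);    /* rows m+1..k+l (only if m-k-l < 0) */
    *idx = row_major ? row * (size_t)ldb + col : row + col * (size_t)ldb;
    return 'b';
}
```

For example, with m = 2, n = 5, k = 1, l = 3 in column major layout (lda = 2, ldb = 3), r_11 is a[2] and r_33 is b[10].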
The routine computes C, S, R, and optionally the orthogonal/unitary transformation matrices U, V and Q.
In particular, if B is an n-by-n nonsingular matrix, then the GSVD of A and B implicitly gives the SVD of
A*B^(-1):
A*B^(-1) = U*(D1*D2^(-1))*V'.
If (A', B')' has orthonormal columns, then the GSVD of A and B is also equal to the CS decomposition of A
and B. Furthermore, the GSVD can be used to derive the solution of the eigenvalue problem:
A'*A*x = λ*B'*B*x.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

jobu Must be 'U' or 'N'.

If jobu = 'U', orthogonal/unitary matrix U is computed.

If jobu = 'N', U is not computed.

jobv Must be 'V' or 'N'.

If jobv = 'V', orthogonal/unitary matrix V is computed.

If jobv = 'N', V is not computed.

jobq Must be 'Q' or 'N'.

If jobq = 'Q', orthogonal/unitary matrix Q is computed.

If jobq = 'N', Q is not computed.

m The number of rows of the matrix A (m≥ 0).

n The number of columns of the matrices A and B (n≥ 0).

p The number of rows of the matrix B (p≥ 0).

lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.

ldb The leading dimension of b; at least max(1, p) for column major layout and
max(1, n) for row major layout.

ldu The leading dimension of the array u.

ldu≥ max(1, m) if jobu = 'U'; ldu≥ 1 otherwise.

ldv The leading dimension of the array v.

ldv≥ max(1, p) if jobv = 'V'; ldv≥ 1 otherwise.

ldq The leading dimension of the array q.

ldq≥ max(1, n) if jobq = 'Q'; ldq≥ 1 otherwise.

Output Parameters

k, l On exit, k and l specify the dimension of the subblocks. The sum k+l is
equal to the effective numerical rank of (A', B')'.

a On exit, a contains the triangular matrix R or part of R.

b On exit, b contains part of the triangular matrix R if m-k-l < 0.


alpha, beta Arrays, size at least max(1, n) each.

Contain the generalized singular value pairs of A and B:
alpha(1:k) = 1,
beta(1:k) = 0,
and if m-k-l≥ 0,

alpha(k+1:k+l) = C,
beta(k+1:k+l) = S,
or if m-k-l < 0,

alpha(k+1:m)= C, alpha(m+1:k+l)=0
beta(k+1:m) = S, beta(m+1:k+l) = 1
and
alpha(k+l+1:n) = 0
beta(k+l+1:n) = 0.
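The generalized singular values themselves are the ratios alpha[i]/beta[i] of these pairs, which are infinite for the first k pairs (beta = 0). A hypothetical helper that forms the k+l nontrivial ratios and also checks the identity alpha^2 + beta^2 = 1 implied by C^2 + S^2 = I might look like this:

```c
#include <assert.h>
#include <math.h>

/* sigma[i] = alpha[i] / beta[i] for the k+l nontrivial pairs from ?ggsvd
 * (0-based i = 0..k+l-1); pairs with beta[i] = 0 give INFINITY.
 * Returns 1 if alpha^2 + beta^2 = 1 holds within tol for all pairs, else 0. */
static int gsv_ratios(const double *alpha, const double *beta, int k, int l,
                      double *sigma, double tol)
{
    assert(k >= 0 && l >= 0);
    int ok = 1;
    for (int i = 0; i < k + l; ++i) {
        sigma[i] = (beta[i] == 0.0) ? INFINITY : alpha[i] / beta[i];
        if (fabs(alpha[i] * alpha[i] + beta[i] * beta[i] - 1.0) > tol) ok = 0;
    }
    return ok;
}
```

For k = 1, l = 2 with alpha = {1, 0.6, 0.8} and beta = {0, 0.8, 0.6}, the ratios are infinity, 0.75, and 4/3.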

u, v, q Arrays:
u, size at least max(1, ldu*m).

If jobu = 'U', u contains the m-by-m orthogonal/unitary matrix U.

If jobu = 'N', u is not referenced.

v, size at least max(1, ldv*p).

If jobv = 'V', v contains the p-by-p orthogonal/unitary matrix V.

If jobv = 'N', v is not referenced.

q, size at least max(1, ldq*n).

If jobq = 'Q', q contains the n-by-n orthogonal/unitary matrix Q.

If jobq = 'N', q is not referenced.

iwork On exit, iwork stores the sorting information.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = 1, the Jacobi-type procedure failed to converge. For further details, see ?tgsja.

?gesvdx
Computes the SVD and left and right singular vectors
for a matrix.

Syntax
lapack_int LAPACKE_sgesvdx (int matrix_layout, char jobu, char jobvt, char range,
lapack_int m, lapack_int n, float * a, lapack_int lda, float vl, float vu, lapack_int
il, lapack_int iu, lapack_int * ns, float * s, float * u, lapack_int ldu, float * vt,
lapack_int ldvt, lapack_int * superb);

lapack_int LAPACKE_dgesvdx (int matrix_layout, char jobu, char jobvt, char range,
lapack_int m, lapack_int n, double * a, lapack_int lda, double vl, double vu, lapack_int
il, lapack_int iu, lapack_int *ns, double * s, double * u, lapack_int ldu, double * vt,
lapack_int ldvt, lapack_int * superb);
lapack_int LAPACKE_cgesvdx (int matrix_layout, char jobu, char jobvt, char range,
lapack_int m, lapack_int n, lapack_complex_float * a, lapack_int lda, float vl, float
vu, lapack_int il, lapack_int iu, lapack_int * ns, float * s, lapack_complex_float * u,
lapack_int ldu, lapack_complex_float * vt, lapack_int ldvt, lapack_int * superb);
lapack_int LAPACKE_zgesvdx (int matrix_layout, char jobu, char jobvt, char range,
lapack_int m, lapack_int n, lapack_complex_double * a, lapack_int lda, double vl,
double vu, lapack_int il, lapack_int iu, lapack_int * ns, double * s,
lapack_complex_double * u, lapack_int ldu, lapack_complex_double * vt, lapack_int ldvt,
lapack_int * superb);

Include Files
• mkl.h

Description
?gesvdx computes the singular value decomposition (SVD) of a real or complex m-by-n matrix A, optionally
computing the left and right singular vectors. The SVD is written
A = U * Σ * transpose(V)
where Σ is an m-by-n matrix which is zero except for its min(m,n) diagonal elements, U is an m-by-m matrix,
and V is an n-by-n matrix. The matrices U and V are orthogonal for real A, and unitary for complex A. The
diagonal elements of Σ are the singular values of A; they are real and non-negative, and are returned in
descending order. The first min(m,n) columns of U and V are the left and right singular vectors of A.

?gesvdx uses an eigenvalue problem for obtaining the SVD, which allows for the computation of a subset of
singular values and vectors. See ?bdsvdx for details.

Note that the routine returns VT, not V.
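Because the routine returns VT, the j-th right singular vector is row j of vt, not a column. The following hypothetical helper copies it out for real data in either layout; for complex data, row j of V^H is the conjugate of column j of V, so the copied row would also need to be conjugated.

```c
#include <assert.h>
#include <stddef.h>

/* Copy the j-th right singular vector (0-based) from the vt array returned by
 * ?gesvdx for real data. Row j of VT is column j of V: in column major layout
 * its elements are vt[j + i*ldvt], in row major layout vt[j*ldvt + i].
 * row_major is nonzero for LAPACK_ROW_MAJOR. */
static void right_singular_vector(int row_major, int n, const double *vt,
                                  int ldvt, int j, double *v_j)
{
    assert(n > 0 && j >= 0);
    for (int i = 0; i < n; ++i)
        v_j[i] = row_major ? vt[(size_t)j * ldvt + i]
                           : vt[j + (size_t)i * ldvt];
}
```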

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

jobu Specifies options for computing all or part of the matrix U:

= 'V': the first min(m,n) columns of U (the left singular vectors) or as
specified by range are returned in the array u;

= 'N': no columns of U (no left singular vectors) are computed.

jobvt Specifies options for computing all or part of the matrix VT:
= 'V': the first min(m,n) rows of VT (the right singular vectors) or as
specified by range are returned in the array vt;

= 'N': no rows of VT (no right singular vectors) are computed.

range = 'A': find all singular values.

= 'V': all singular values in the half-open interval (vl,vu] are found.

= 'I': the il-th through iu-th singular values are found.


m The number of rows of the input matrix A. m≥ 0.

n The number of columns of the input matrix A. n≥ 0.

a Array, size lda*n

On entry, the m-by-n matrix A.

lda The leading dimension of the array a.

lda≥ max(1,m).

vl, vu If range = 'V', the lower and upper bounds of the interval to be searched for singular values;
vl ≥ 0 and vu > vl. Not referenced if range = 'A' or 'I'.

il, iu If range = 'I', the indices (in ascending order) of the smallest and largest singular values to be
returned. 1 ≤ il ≤ iu ≤ min(m,n), if min(m,n) > 0. Not referenced if range = 'A' or 'V'.

ldu The leading dimension of the array u. ldu≥ 1; if jobu = 'V', ldu≥m.

ldvt The leading dimension of the array vt. ldvt≥ 1; if jobvt = 'V', ldvt≥ns
(see above).

Output Parameters

a On exit, the contents of a are destroyed.

ns The total number of singular values found,

0 ≤ns≤ min(m, n).

If range = 'A', ns = min(m, n); if range = 'I', ns = iu - il + 1.
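The rules for ns translate directly into a sizing helper for u and vt: ns is known in advance for range = 'A' and 'I', while for range = 'V' only the upper bound min(m, n) is available before the call. A hypothetical sketch:

```c
#include <assert.h>

/* Number of singular values ?gesvdx returns for range = 'A' or 'I', or the
 * upper bound to size u and vt with for range = 'V' (ns known only on exit). */
static int gesvdx_ns_bound(char range, int m, int n, int il, int iu)
{
    int mn = m < n ? m : n;
    assert(mn >= 0);
    if (range == 'I') {
        assert(1 <= il && il <= iu && iu <= mn);
        return iu - il + 1;
    }
    return mn;          /* 'A': exactly min(m,n); 'V': upper bound */
}
```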

s Array, size (min(m,n))

The singular values of A, sorted so that s[i]≥s[i + 1].

u Array, size ldu*ucol

If jobu = 'V', u contains columns of U (the left singular vectors, stored columnwise) as specified by range;
if jobu = 'N', u is not referenced.

NOTE
Make sure that ucol≥ns; if range = 'V', the exact value of ns
is not known in advance and an upper bound must be used.

vt Array, size ldvt*n

If jobvt = 'V', vt contains the rows of VT (the right singular vectors,
stored rowwise) as specified by range; if jobvt = 'N', vt is not
referenced.


NOTE
Make sure that ldvt≥ns; if range = 'V', the exact value of
ns is not known in advance and an upper bound must be
used.

superb Array, size (12*min(m, n)).

If info = 0, the first ns elements of superb are zero. If info > 0,
then superb contains the indices of the eigenvectors that failed to
converge in ?bdsvdx/?stevx.
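Because ns is known only after the call when range = 'V', u and vt have to be allocated for the worst case. The sketch below derives that bound from the rules above (ns = min(m,n) for range = 'A', ns = iu - il + 1 for range = 'I', and at most min(m,n) for range = 'V') and turns it into minimum element counts for u and vt. It assumes LAPACK_COL_MAJOR, jobu = jobvt = 'V', ldu = m and ldvt equal to the bound; the helper names are hypothetical and plain int stands in for lapack_int.

```c
#include <stddef.h>

/* Upper bound on ns for ?gesvdx from the range rules (hypothetical helper). */
static int gesvdx_ns_upper_bound(char range, int m, int n, int il, int iu)
{
    int mn = m < n ? m : n;
    if (range == 'I')
        return mn > 0 ? iu - il + 1 : 0;
    return mn;   /* 'A' finds all of them; 'V' finds at most min(m,n) */
}

/* Minimum element counts of u (ldu*ucol) and vt (ldvt*n), assuming
   LAPACK_COL_MAJOR, jobu = jobvt = 'V', ldu = m and ldvt = ns_max. */
static void gesvdx_min_sizes(int m, int n, int ns_max,
                             size_t *u_elems, size_t *vt_elems)
{
    int ldu  = m > 1 ? m : 1;            /* ldu >= m and ldu >= 1 */
    int ucol = ns_max > 1 ? ns_max : 1;  /* ucol >= ns */
    int ldvt = ns_max > 1 ? ns_max : 1;  /* ldvt >= ns and ldvt >= 1 */
    *u_elems  = (size_t)ldu * (size_t)ucol;
    *vt_elems = (size_t)ldvt * (size_t)n;
}
```

For m = 5, n = 3 and range = 'V' the bound is 3, so u needs at least 15 elements and vt at least 9.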

Return Values
This function returns a value info.

= 0: successful exit.
< 0: if info = -i, the i-th argument had an illegal value.
> 0: if info = i, then i eigenvectors failed to converge in ?bdsvdx/?stevx. If info = n*2 + 1, an internal error
occurred in ?bdsvdx.

?bdsvdx
Computes the SVD of a bidiagonal matrix.

Syntax
lapack_int LAPACKE_sbdsvdx (int matrix_layout, char uplo, char jobz, char range,
lapack_int n, float * d, float * e, float vl, float vu, lapack_int il, lapack_int iu,
lapack_int * ns, float * s, float * z, lapack_int ldz, lapack_int * superb);
lapack_int LAPACKE_dbdsvdx (int matrix_layout, char uplo, char jobz, char range,
lapack_int n, double * d, double * e, double vl, double vu, lapack_int il, lapack_int
iu, lapack_int * ns, double * s, double * z, lapack_int ldz, lapack_int * superb);

Include Files
• mkl.h

Description
?bdsvdx computes the singular value decomposition (SVD) of a real n-by-n (upper or lower) bidiagonal
matrix B, B = U * S * VT, where S is a diagonal matrix with non-negative diagonal elements (the singular
values of B), and U and VT are orthogonal matrices of left and right singular vectors, respectively.
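Given an upper bidiagonal B with diagonal d = [d1 d2 ... dn] and superdiagonal e = [e1 e2 ... en-1], ?bdsvdx
computes the singular value decomposition of B through the eigenvalues and eigenvectors of the n*2-by-n*2
tridiagonal matrix

$$
TGK = \begin{bmatrix}
0 & d_1 & & & & & \\
d_1 & 0 & e_1 & & & & \\
 & e_1 & 0 & d_2 & & & \\
 & & d_2 & \ddots & \ddots & & \\
 & & & \ddots & \ddots & e_{n-1} & \\
 & & & & e_{n-1} & 0 & d_n \\
 & & & & & d_n & 0
\end{bmatrix}
$$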
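A small helper makes the interleaving in TGK concrete: its diagonal is zero and its off-diagonal runs d1, e1, d2, e2, ..., en-1, dn. The sketch below, assuming an upper bidiagonal B stored in 0-based arrays d and e, fills that off-diagonal. ?bdsvdx handles TGK internally; this only illustrates the structure, and the function name is hypothetical.

```c
/* Fill the 2n-1 off-diagonal entries of TGK (its diagonal is all zeros)
   from the diagonal d[0..n-1] and superdiagonal e[0..n-2] of B:
   off = [d1, e1, d2, e2, ..., e(n-1), dn] in 1-based notation. */
static void tgk_offdiag(int n, const double *d, const double *e, double *off)
{
    for (int k = 0; k < n; ++k) {
        off[2 * k] = d[k];              /* entry (2k, 2k+1), 0-based */
        if (k < n - 1)
            off[2 * k + 1] = e[k];      /* entry (2k+1, 2k+2), 0-based */
    }
}
```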


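If (s,u,v) is a singular triplet of B with ||u|| = ||v|| = 1, then (±s,q), ||q|| = 1, are eigenpairs of TGK, with

$$
q = P \, \frac{\begin{bmatrix} u \\ \pm v \end{bmatrix}}{\sqrt{2}}
  = \frac{\begin{bmatrix} \pm v_1 & u_1 & \pm v_2 & u_2 & \cdots & \pm v_n & u_n \end{bmatrix}^T}{\sqrt{2}},
\qquad
P = \begin{bmatrix} e_{n+1} & e_1 & e_{n+2} & e_2 & \cdots \end{bmatrix}^T.
$$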

Given a TGK matrix, one can either

1. compute -s, -v and change signs so that the singular values (and corresponding vectors) are already in
descending order (as in ?gesvd/?gesdd) or
2. compute s, v and reorder the values (and corresponding vectors).

?bdsvdx implements (1) by calling ?stevx (bisection plus inverse iteration, to be replaced with a version of
the Multiple Relative Robust Representation algorithm; see P. Willems and B. Lang, A framework for the
MR^3 algorithm: theory and implementation, SIAM J. Sci. Comput., 35:740-766, 2013).
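Option (1) yields descending singular values without a separate sort: the eigenvalues of TGK come in pairs ±s_k, so the n smallest, in ascending order, are -s_1 ≤ -s_2 ≤ ... ≤ -s_n, and flipping their signs gives s_1 ≥ s_2 ≥ ... ≥ s_n. The sketch below illustrates that step on already computed eigenpairs; it assumes the eigenvalues are supplied in ascending order, it is not the ?bdsvdx implementation, and the names are hypothetical.

```c
#include <math.h>

/* Option (1): negate the n smallest eigenvalues of TGK (ascending input),
   which produces the singular values in descending order. */
static void singular_values_from_tgk(int n, const double *eig_asc, double *s)
{
    for (int k = 0; k < n; ++k)
        s[k] = -eig_asc[k];
}

/* An eigenvector of TGK for the eigenvalue -s has the form
   (-v1, u1, ..., -vn, un)/sqrt(2) up to an overall sign; de-interleave it
   and flip the sign of the v part to recover unit vectors u and v. */
static void uv_from_negative_pair(int n, const double *q, double *u, double *v)
{
    const double r = sqrt(2.0);
    for (int k = 0; k < n; ++k) {
        v[k] = -r * q[2 * k];
        u[k] =  r * q[2 * k + 1];
    }
}
```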

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

uplo = 'U': B is upper bidiagonal;

= 'L': B is lower bidiagonal.

jobz = 'N': Compute singular values only;

= 'V': Compute singular values and singular vectors.

range = 'A': Find all singular values.

= 'V': all singular values in the half-open interval [vl,vu) are found.

= 'I': the il-th through iu-th singular values are found.

n The order of the bidiagonal matrix.

n >= 0.

d Array, size n.

The n diagonal elements of the bidiagonal matrix B.

e Array, size (max(1,n - 1))

The (n - 1) superdiagonal elements of the bidiagonal matrix B, stored in
elements e[0] to e[n - 2].

vl vl≥ 0.

vu If range='V', the lower and upper bounds of the interval to be searched for
singular values. vu > vl.

Not referenced if range = 'A' or 'I'.

il, iu If range='I', the indices (in ascending order) of the smallest and largest
singular values to be returned.
1 ≤ il ≤ iu ≤ n, if n > 0.

Not referenced if range = 'A' or 'V'.

ldz The leading dimension of the array z.

ldz≥ 1, and if jobz = 'V', ldz≥ max(2,n*2).

Output Parameters

ns The total number of singular values found. 0 ≤ns≤n.

If range = 'A', ns = n, and if range = 'I', ns = iu - il + 1.

s Array, size (n)

The first ns elements contain the selected singular values in ascending
order.

z Array, size 2*n*k (k is given in the NOTE below).

If jobz = 'V', then if info = 0 the first ns columns of z contain the
singular vectors of the matrix B corresponding to the selected singular
values, with U in rows 1 to n and V in rows n+1 to n*2, i.e.

$$
z = \begin{bmatrix} U \\ V \end{bmatrix}
$$

If jobz = 'N', then z is not referenced.

NOTE
Make sure that at least k = ns+1 columns are supplied in
the array z; if range = 'V', the exact value of ns is not
known in advance and an upper bound must be used.

superb Array, size (12*n).

If jobz = 'V', then if info = 0, the first ns elements of superb are
zero. If info > 0, then superb contains the indices of the eigenvectors
that failed to converge in ?stevx.
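The layout of z can be summarized in code. The sketch computes the minimum allocation implied by ldz ≥ max(2,n*2) and the NOTE on k = ns+1, and extracts u and v from one column of z = [U; V]. It assumes LAPACK_COL_MAJOR and real double data; the helper names are hypothetical.

```c
#include <stddef.h>

/* Minimum element count of z for jobz = 'V', assuming LAPACK_COL_MAJOR:
   ldz = max(2, 2n) rows and k = ns_max + 1 columns, where ns_max is an
   upper bound on ns (for example n when range = 'V'). */
static size_t bdsvdx_z_elems(int n, int ns_max, int *ldz)
{
    *ldz = (2 * n > 2) ? 2 * n : 2;
    return (size_t)(*ldz) * (size_t)(ns_max + 1);
}

/* Split column j of a column-major z = [U; V] into u (rows 0..n-1)
   and v (rows n..2n-1). */
static void bdsvdx_split_column(int n, const double *z, int ldz, int j,
                                double *u, double *v)
{
    const double *col = z + (size_t)j * (size_t)ldz;
    for (int i = 0; i < n; ++i) {
        u[i] = col[i];
        v[i] = col[n + i];
    }
}
```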

Return Values
This function returns a value info.

= 0: successful exit.
< 0: if info = -i, the i-th argument had an illegal value.

> 0:
if info = i, then i eigenvectors failed to converge in ?stevx. The indices of the eigenvectors (as returned
by ?stevx) are stored in the array superb.

if info = n*2 + 1, an internal error occurred.
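A caller typically branches on info as listed above. This sketch maps info to a small status enumeration; the enumeration and the function are hypothetical and not part of the MKL API.

```c
/* Hypothetical status categories for the info value of ?bdsvdx. */
enum bdsvdx_status {
    BDSVDX_OK,             /* info = 0 */
    BDSVDX_BAD_ARGUMENT,   /* info = -i: argument i had an illegal value */
    BDSVDX_NO_CONVERGENCE, /* info = i: i eigenvectors did not converge */
    BDSVDX_INTERNAL_ERROR  /* info = n*2 + 1 */
};

/* Classify info; *detail receives i for the two indexed cases, else 0. */
static enum bdsvdx_status bdsvdx_classify(int info, int n, int *detail)
{
    *detail = 0;
    if (info == 0)
        return BDSVDX_OK;
    if (info < 0) {
        *detail = -info;
        return BDSVDX_BAD_ARGUMENT;
    }
    if (info == 2 * n + 1)
        return BDSVDX_INTERNAL_ERROR;
    *detail = info;    /* superb then holds the indices of the failed eigenvectors */
    return BDSVDX_NO_CONVERGENCE;
}
```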

?gesvda_batch_strided
Computes the truncated SVD of a group of general m-
by-n matrices that are stored at a constant stride from
each other in a contiguous block of memory.

Syntax
void sgesvda_batch_strided(
const MKL_INT* iparm, MKL_INT* irank,
const MKL_INT* m, const MKL_INT* n,
float* a, const MKL_INT* lda, const MKL_INT* stride_a,
float* s, const MKL_INT* stride_s,
float* u, const MKL_INT* ldu, const MKL_INT* stride_u,
float* vt, const MKL_INT* ldvt, const MKL_INT* stride_vt,
const float* tolerance, float *residual,
float* work, const MKL_INT* lwork,
const MKL_INT* batch_size, MKL_INT* info
)
void dgesvda_batch_strided(
const MKL_INT* iparm, MKL_INT* irank,
const MKL_INT* m, const MKL_INT* n,
double* a, const MKL_INT* lda, const MKL_INT* stride_a,
double* s, const MKL_INT* stride_s,
double* u, const MKL_INT* ldu, const MKL_INT* stride_u,
double* vt, const MKL_INT* ldvt, const MKL_INT* stride_vt,
const double* tolerance, double *residual,
double* work, const MKL_INT* lwork,
const MKL_INT* batch_size, MKL_INT* info
)
void cgesvda_batch_strided(
const MKL_INT* iparm, MKL_INT* irank,
const MKL_INT* m, const MKL_INT* n,
MKL_Complex8* a, const MKL_INT* lda, const MKL_INT* stride_a,
float* s, const MKL_INT* stride_s,
MKL_Complex8* u, const MKL_INT* ldu, const MKL_INT* stride_u,
MKL_Complex8* vt, const MKL_INT* ldvt, const MKL_INT* stride_vt,
const float* tolerance, float *residual,
MKL_Complex8* work, const MKL_INT* lwork,
const MKL_INT* batch_size, MKL_INT* info
)
void zgesvda_batch_strided(
const MKL_INT* iparm, MKL_INT* irank,
const MKL_INT* m, const MKL_INT* n,
MKL_Complex16* a, const MKL_INT* lda, const MKL_INT* stride_a,
double* s, const MKL_INT* stride_s,
MKL_Complex16* u, const MKL_INT* ldu, const MKL_INT* stride_u,
MKL_Complex16* vt, const MKL_INT* ldvt, const MKL_INT* stride_vt,
const double* tolerance, double *residual,
MKL_Complex16* work, const MKL_INT* lwork,
const MKL_INT* batch_size, MKL_INT* info
)

Include Files
mkl.h

Description
The ?gesvda_batch_strided routines compute the truncated SVD for a group of general m-by-n matrices.

All matrices have the same parameters (matrix size, leading dimension) and are stored at constant
stride_a from each other in a contiguous block of memory. The operation is defined as

for i = 0 … batch_size-1
    Ai is a matrix at offset i * stride_a from A
    Ai := Ui * Si * ViT
end for
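The strided layout means the i-th problem starts i * stride_a elements after the start of a. The following sketch addresses element (r, c) of Ai and fills each matrix, leaving padding rows and stride gaps untouched. It assumes column-major storage within each matrix and double data; the function names are hypothetical.

```c
#include <stddef.h>

/* Pointer to element (r, c) of Ai inside a strided batch, assuming
   column-major storage inside each matrix. */
static double *batch_elem(double *a, ptrdiff_t stride_a, ptrdiff_t lda,
                          ptrdiff_t i, ptrdiff_t r, ptrdiff_t c)
{
    return a + i * stride_a + c * lda + r;
}

/* Set the m-by-n part of every Ai to the identity pattern; rows m..lda-1
   and any gap up to the next stride_a are left untouched. */
static void batch_fill_identity(double *a, int m, int n, int lda,
                                int stride_a, int batch_size)
{
    for (int i = 0; i < batch_size; ++i)
        for (int c = 0; c < n; ++c)
            for (int r = 0; r < m; ++r)
                *batch_elem(a, stride_a, lda, i, r, c) = (r == c) ? 1.0 : 0.0;
}
```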

where Ui and Vi are orthogonal matrices, and Si is a diagonal matrix with singular values on the diagonal.
Singular values are nonnegative and listed in decreasing order. A truncated SVD of a given m-by-n matrix
produces matrices with the specified number of columns, where the number of columns is defined by the
user or determined at runtime with the help of the user-defined tolerance threshold.
An approximation of each matrix can also be obtained as a product of two low-rank matrices (low-rank
product):

Ai = Pi×Qi

where Pi = Ui×Si and Qi = ViT if m≥n, and Pi = Ui and Qi = Si×ViT otherwise.
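The rule for attaching Si to one of the two factors can be written out directly. The sketch forms Pi and Qi for one matrix from its truncated factors and evaluates entries of Pi×Qi. It assumes real double data, column-major storage, and leading dimensions m for U and P and k for Vt and Q (k being the number of retained singular values); the function names are hypothetical.

```c
/* Form the low-rank factors of one matrix from its rank-k truncated SVD.
   U and P are m-by-k (leading dimension m); Vt and Q are k-by-n
   (leading dimension k).
     m >= n:  P = U*S, Q = Vt
     m <  n:  P = U,   Q = S*Vt                                        */
static void low_rank_factors(int m, int n, int k, const double *U,
                             const double *s, const double *Vt,
                             double *P, double *Q)
{
    const int scale_p = (m >= n);
    for (int j = 0; j < k; ++j) {
        for (int r = 0; r < m; ++r)
            P[r + j * m] = scale_p ? U[r + j * m] * s[j] : U[r + j * m];
        for (int c = 0; c < n; ++c)
            Q[j + c * k] = scale_p ? Vt[j + c * k] : s[j] * Vt[j + c * k];
    }
}

/* Entry (r, c) of P*Q, to be compared with the same entry of Ai. */
static double low_rank_entry(int m, int k, const double *P, const double *Q,
                             int r, int c)
{
    double acc = 0.0;
    for (int j = 0; j < k; ++j)
        acc += P[r + j * m] * Q[j + c * k];
    return acc;
}
```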

The routines provide three ways to compute a truncated SVD:

• Compute the truncated SVD with the help of the input array irank, where irank[i] specifies the number of
singular values and vectors to be computed in Ui, Vi, and Si for each matrix Ai.
• Compute the truncated SVD using a tolerance threshold. Singular values that are less than the user-defined
tolerance are treated as zero: they are not computed but set to zero.
• Compute the truncated SVD using the effective rank. The effective rank of A is determined by treating as zero
those singular values that are less than the user-defined tolerance threshold times the largest singular
value.
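The two tolerance-based criteria differ only in the threshold: tolerance itself for iparm[0] = 1, and tolerance times the largest singular value for iparm[0] = 2. The sketch counts how many singular values survive each criterion, given values already sorted in decreasing order; it illustrates the rule as stated and is not MKL's implementation. The function name and its mode argument are hypothetical.

```c
/* Number of singular values kept by the tolerance-based criteria, given
   s[0..k-1] in decreasing order (s[0] is the largest).
   mode 1: values below tol are treated as zero.
   mode 2: values below tol * s[0] are treated as zero (effective rank). */
static int truncated_rank(int mode, int k, const double *s, double tol)
{
    double threshold = (mode == 2 && k > 0) ? tol * s[0] : tol;
    int r = 0;
    while (r < k && s[r] >= threshold)
        ++r;
    return r;
}
```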

The routines can be also used for computing singular values only.

Input Parameters

iparm Array of dimension 16 specifying options to compute truncated SVD. Also
specifies the type of returned SVD decomposition form. The individual
components of the iparm parameter appear below. Default values are
denoted with an asterisk (*).

iparm[0] Specifies a criterion for treating singular values
as zeros.

-1 Use default iparm values
(iparm[0] = iparm[1] = iparm[2] = 0, iparm[3] = 1). All
other iparm settings are ignored.
= 0* Computes the truncated SVD with the
help of the input array irank.
= 1 Computes the truncated SVD using the
parameter tolerance.
= 2 Computes the truncated SVD using the
effective rank. The effective rank of A
is determined by treating as zero those
singular values that are less than the
user-defined tolerance multiplied by
the largest singular value.

iparm[1] Specifies the option for computing singular
vectors.

0* Both singular values and singular vectors
are computed.
1 Only singular values are computed.

iparm[2] Specifies the type of the returned SVD
decomposition.

0* Computes the truncated SVD as a product
of three matrices:

Ai=Ui×Si×ViT

1 Computes the truncated SVD as a low-rank
product:

Ai=Pi×Qi

iparm[3] Specifies the option for computing the residual
vector.

0 The residual vector is not computed.

1* Computes the residual vector.

NOTE
iparm[4]–iparm[15] are reserved for future use.

irank Array with size at least batch_size. If iparm[0]=0 or iparm[0]=-1,
element irank[i] specifies the number of singular values and/or singular
vectors to be computed in Ui, ViT, and Si for each matrix Ai.

m The number of rows in the matrices Ai (m ≥ 0).

n The number of columns in the matrices Ai (n ≥ 0).

a Array of size at least stride_a * batch_size holding input matrices Ai.

lda Specifies the leading dimension of the Ai matrices: lda ≥ max(1, m).

stride_a Stride between two consecutive Ai matrices: stride_a ≥ max(1, lda * n).

stride_s The stride between two consecutive Si matrices: stride_s ≥ max(1, min(m,n)).

ldu Specifies the leading dimension of the Ui matrices: ldu ≥ max(1, m).

stride_u The stride between two consecutive Ui matrices: stride_u ≥ max(1, ldu
* m).

ldvt Specifies the leading dimension of the ViT matrices: ldvt ≥ max(1, n).

stride_vt The stride between two consecutive ViT matrices: stride_vt ≥ max(1,
ldvt * n).

tolerance Specifies the tolerance threshold for computing truncated SVD in the cases
of iparm[0]=1 and iparm[0]=2. Not used otherwise.

batch_size The number of problems in a batch. Must be at least 0.

work Workspace array with dimension max(1, lwork).

lwork The dimension of the array work.
If lwork = -1, a workspace query is assumed: the routine only calculates
the optimal size of the work array and returns this value as the first entry of
the work array, and no error message related to lwork is issued by xerbla. If
lwork is less than the required minimum size but is positive, the routine
internally allocates the needed memory.
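Before a call, iparm and the dimension arguments have to be set consistently. The sketch fills iparm from the options listed above and checks the dimension and stride constraints. idx_t is a hypothetical stand-in for MKL_INT (a 32-bit int in the LP64 interface), and the returned check codes are illustrative only, not MKL info values.

```c
/* Hypothetical stand-in for MKL_INT so the sketch compiles without mkl.h. */
typedef int idx_t;

/* Fill iparm as described above; iparm[4]..iparm[15] are reserved
   and left at zero. */
static void gesvda_set_iparm(idx_t iparm[16], idx_t criterion,
                             int values_only, int low_rank, int residual)
{
    for (int k = 0; k < 16; ++k)
        iparm[k] = 0;
    iparm[0] = criterion;            /* 0: irank, 1: tolerance, 2: effective rank */
    iparm[1] = values_only ? 1 : 0;  /* 1: singular values only */
    iparm[2] = low_rank ? 1 : 0;     /* 1: return Ai = Pi*Qi */
    iparm[3] = residual ? 1 : 0;     /* 1: compute residual[i] */
}

static idx_t max1(idx_t x) { return x > 1 ? x : 1; }

/* Check the dimension and stride constraints listed above. Returns 0 if
   they hold, otherwise an illustrative code for the first violation. */
static int gesvda_check_dims(idx_t m, idx_t n, idx_t lda, idx_t stride_a,
                             idx_t stride_s, idx_t ldu, idx_t stride_u,
                             idx_t ldvt, idx_t stride_vt, idx_t batch_size)
{
    idx_t mn = m < n ? m : n;
    if (m < 0 || n < 0)             return 1;
    if (lda < max1(m))              return 2;
    if (stride_a < max1(lda * n))   return 3;
    if (stride_s < max1(mn))        return 4;
    if (ldu < max1(m))              return 5;
    if (stride_u < max1(ldu * m))   return 6;
    if (ldvt < max1(n))             return 7;
    if (stride_vt < max1(ldvt * n)) return 8;
    if (batch_size < 0)             return 9;
    return 0;
}
```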

Output Parameters

irank On exit, if iparm[0]=1 or iparm[0]=2, element irank[i] is the
number of computed singular values and/or singular vectors for matrix
Ai.

a Unchanged on exit if the residual vector is not required. Otherwise,
contains the residual matrix

Ai := Ai - Ui×Si×ViT

if iparm[2]=0, and

Ai := Ai - Pi×Qi

otherwise.

s Array of size at least min(m,n)*batch_size to store a batch of
singular values Si.

u Array of size at least stride_u*batch_size to store a batch of Ui if
iparm[2]=0, or to store a batch of Pi if iparm[2]=1.

vt Array of size at least stride_vt*batch_size to store a batch of ViT if
iparm[2]=0, or to store a batch of Qi if iparm[2]=1.

residual Array of dimension batch_size. If iparm[3]=1, residual[i] is the
Frobenius norm ||Ai - Ui×Si×ViT|| if iparm[2]=0,
and ||Ai - Pi×Qi|| if iparm[2]=1.

info Array of size at least batch_size, which reports the status for each
matrix.
If info[i] = 0, the execution is successful for Ai.

If info[0] = -j, the j-th parameter had an illegal value.

If info[0] = 1, an internal memory allocation failed.

If info[i] = 2, an input parameter contains an invalid value.

If info[i] = 3, an error occurred in the algorithm while computing the singular values
of Ai.
If info[0] = 4, the routine encountered an empty structure or
matrix array.

Cosine-Sine Decomposition: LAPACK Driver Routines

This topic describes LAPACK driver routines for computing the cosine-sine decomposition (CS
decomposition). You can also call the corresponding computational routines to perform the same task.
The computation has the following phases:


1. The matrix is reduced to a bidiagonal block form.

2. The blocks are simultaneously diagonalized using techniques from the bidiagonal SVD algorithms.
Table "Driver Routines for Cosine-Sine Decomposition (CSD)" lists LAPACK routines that perform CS
decomposition of matrices.

Driver Routines for Cosine-Sine Decomposition (CSD)

Operation | Real matrices | Complex matrices
Compute the CS decomposition of a block-partitioned orthogonal matrix | orcsd | uncsd
Compute the CS decomposition of a block-partitioned unitary matrix | orcsd | uncsd

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

jobu1 If equal to 'Y', then u1 is computed. Otherwise, u1 is not computed.

jobu2 If equal to 'Y', then u2 is computed. Otherwise, u2 is not computed.

jobv1t If equal to 'Y', then v1t is computed. Otherwise, v1t is not computed.

jobv2t If equal to 'Y', then v2t is computed. Otherwise, v2t is not computed.

trans = 'T': x, u1, u2, v1t, v2t are stored in row-major order.

otherwise x, u1, u2, v1t, v2t are stored in column-major order.

signs = 'O': The lower-left block is made nonpositive (the
"other" convention).
otherwise The upper-right block is made nonpositive (the
"default" convention).

m The number of rows and columns of the matrix X.

p The number of rows in x11 and x12. 0 ≤p≤m.

q The number of columns in x11 and x21. 0 ≤q≤m.

x11, x12, x21, x22 Arrays of size x11 (ldx11,q), x12 (ldx12,m - q), x21 (ldx21,q), and x22
(ldx22,m - q).


Contain the parts of the orthogonal/unitary matrix whose CSD is desired.

ldx11, ldx12, ldx21, ldx22 The leading dimensions of the parts of array X. ldx11≥ max(1, p), ldx12≥
max(1, p), ldx21≥ max(1, m - p), ldx22≥ max(1, m - p).

ldu1 The leading dimension of the array u1. If jobu1 = 'Y', ldu1≥ max(1,p).

ldu2 The leading dimension of the array u2. If jobu2 = 'Y', ldu2≥ max(1,m-p).

ldv1t The leading dimension of the array v1t. If jobv1t = 'Y', ldv1t≥
max(1,q).

ldv2t The leading dimension of the array v2t. If jobv2t = 'Y', ldv2t≥ max(1,m-
q).

Output Parameters

theta Array, size r, in which r = min(p,m-p,q,m-q).

C = diag( cos(theta[0]), ..., cos(theta[r - 1]) ), and

S = diag( sin(theta[0]), ..., sin(theta[r - 1]) ).

u1 Array, size at least max(1, ldu1*p).

If jobu1 = 'Y', u1 contains the p-by-p orthogonal/unitary matrix u1.

u2 Array, size at least max(1, ldu2*(m - p)).

If jobu2 = 'Y', u2 contains the (m-p)-by-(m-p) orthogonal/unitary matrix
u2.

v1t Array, size at least max(1, ldv1t*q) .

If jobv1t = 'Y', v1t contains the q-by-q orthogonal matrix v1T or unitary
matrix v1H.

v2t Array, size at least max(1, ldv2t*(m - q)).

If jobv2t = 'Y', v2t contains the (m-q)-by-(m-q) orthogonal matrix v2T or
unitary matrix v2H.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

> 0: ?orcsd/?uncsd did not converge.

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

jobu1 If equal to 'Y', then u1 is computed. Otherwise, u1 is not computed.

jobu2 If equal to 'Y', then u2 is computed. Otherwise, u2 is not computed.

jobv1t If equal to 'Y', then v1t is computed. Otherwise, v1t is not computed.

m The number of rows and columns of the matrix X.

p The number of rows in x11. 0 ≤p≤m.

q The number of columns in x11 . 0 ≤q≤m.

x11 Array, size (ldx11*q).

On entry, the part of the orthogonal matrix whose CSD is desired.

ldx11 The leading dimension of the array x11. ldx11≥ max(1,p).

x21 Array, size (ldx21*q).

On entry, the part of the orthogonal matrix whose CSD is desired.

ldx21 The leading dimension of the array x21. ldx21≥ max(1,m - p).

ldu1 The leading dimension of the array u1. If jobu1 = 'Y', ldu1≥ max(1,p).

ldu2 The leading dimension of the array u2. If jobu2 = 'Y', ldu2≥ max(1,m-p).

ldv1t The leading dimension of the array v1t. If jobv1t = 'Y', ldv1t≥
max(1,q).

Output Parameters

theta Array, size r, in which r = min(p,m-p,q,m-q).

C = diag( cos(theta[0]), ..., cos(theta[r - 1]) ), and

S = diag( sin(theta[0]), ..., sin(theta[r - 1]) ).

u1 Array, size (ldu1*p) .

If jobu1 = 'Y', u1 contains the p-by-p orthogonal/unitary matrix u1.

u2 Array, size (ldu2*(m - p)) .

If jobu2 = 'Y', u2 contains the (m-p)-by-(m-p) orthogonal/unitary matrix
u2.

v1t Array, size (ldv1t*q) .

If jobv1t = 'Y', v1t contains the q-by-q orthogonal matrix v1T or unitary
matrix v1H.

Return Values
This function returns a value info.

= 0: successful exit

< 0: if info = -i, the i-th argument had an illegal value

> 0: ?orcsd2by1/?uncsd2by1 did not converge.

Generalized Symmetric Definite Eigenvalue Problems: LAPACK Driver Routines

This topic describes LAPACK driver routines used for solving generalized symmetric definite eigenproblems.
See also computational routines that can be called to solve these problems. Table "Driver Routines for
Solving Generalized Symmetric Definite Eigenproblems" lists all such driver routines.
Driver Routines for Solving Generalized Symmetric Definite Eigenproblems

Routine Name | Operation performed
sygv/hegv | Computes all eigenvalues and, optionally, eigenvectors of a real / complex generalized symmetric/Hermitian positive-definite eigenproblem.
sygvd/hegvd | Computes all eigenvalues and, optionally, eigenvectors of a real / complex generalized symmetric/Hermitian positive-definite eigenproblem. If eigenvectors are desired, it uses a divide and conquer method.
sygvx/hegvx | Computes selected eigenvalues and, optionally, eigenvectors of a real / complex generalized symmetric/Hermitian positive-definite eigenproblem.
spgv/hpgv | Computes all eigenvalues and, optionally, eigenvectors of a real / complex generalized symmetric/Hermitian positive-definite eigenproblem with matrices in packed storage.
spgvd/hpgvd | Computes all eigenvalues and, optionally, eigenvectors of a real / complex generalized symmetric/Hermitian positive-definite eigenproblem with matrices in packed storage. If eigenvectors are desired, it uses a divide and conquer method.
spgvx/hpgvx | Computes selected eigenvalues and, optionally, eigenvectors of a real / complex generalized symmetric/Hermitian positive-definite eigenproblem with matrices in packed storage.
sbgv/hbgv | Computes all eigenvalues and, optionally, eigenvectors of a real / complex generalized symmetric/Hermitian positive-definite eigenproblem with banded matrices.
sbgvd/hbgvd | Computes all eigenvalues and, optionally, eigenvectors of a real / complex generalized symmetric/Hermitian positive-definite eigenproblem with banded matrices. If eigenvectors are desired, it uses a divide and conquer method.
sbgvx/hbgvx | Computes selected eigenvalues and, optionally, eigenvectors of a real / complex generalized symmetric/Hermitian positive-definite eigenproblem with banded matrices.

?sygv
Computes all eigenvalues and, optionally,
eigenvectors of a real generalized symmetric definite
eigenproblem.

Syntax
lapack_int LAPACKE_ssygv (int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, float* a, lapack_int lda, float* b, lapack_int ldb, float* w);


lapack_int LAPACKE_dsygv (int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, double* a, lapack_int lda, double* b, lapack_int ldb, double* w);

Include Files
• mkl.h

Description

The routine computes all the eigenvalues, and optionally, the eigenvectors of a real generalized symmetric-
definite eigenproblem, of the form
A*x = λ*B*x, A*B*x = λ*x, or B*A*x = λ*x.
Here A and B are assumed to be symmetric and B is also positive definite.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

itype Must be 1 or 2 or 3.
Specifies the problem type to be solved:
if itype = 1, the problem type is A*x = lambda*B*x;

if itype = 2, the problem type is A*B*x = lambda*x;

if itype = 3, the problem type is B*A*x = lambda*x.

jobz Must be 'N' or 'V'.

If jobz = 'N', then compute eigenvalues only.

If jobz = 'V', then compute eigenvalues and eigenvectors.

uplo Must be 'U' or 'L'.

If uplo = 'U', arrays a and b store the upper triangles of A and B;

If uplo = 'L', arrays a and b store the lower triangles of A and B.

n The order of the matrices A and B (n≥ 0).

a, b Arrays:
a (size at least max(1, lda*n)) contains the upper or lower triangle of the
symmetric matrix A, as specified by uplo.
b (size at least max(1, ldb*n)) contains the upper or lower triangle of the
symmetric positive definite matrix B, as specified by uplo.

lda The leading dimension of a; at least max(1, n).

ldb The leading dimension of b; at least max(1, n).
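How itype changes the problem is easiest to see for diagonal A = diag(a) and B = diag(b) with b > 0: itype = 1 gives λ = a_i/b_i, while itype = 2 or 3 gives λ = a_i*b_i. (The eigenvalues of A*B and B*A always coincide; the two types differ in their eigenvectors and in the normalization listed under Output Parameters.) The toy sketch computes these values and sorts them in ascending order, as w is returned; the function name is hypothetical.

```c
/* Eigenvalues of the three problem types for diagonal A = diag(a) and
   B = diag(b), b[i] > 0, sorted ascending. Toy case only. */
static void diag_gen_eig(int itype, int n, const double *a, const double *b,
                         double *w)
{
    for (int i = 0; i < n; ++i)
        w[i] = (itype == 1) ? a[i] / b[i] : a[i] * b[i];
    for (int i = 1; i < n; ++i) {          /* insertion sort, ascending */
        double key = w[i];
        int j = i - 1;
        while (j >= 0 && w[j] > key) {
            w[j + 1] = w[j];
            --j;
        }
        w[j + 1] = key;
    }
}
```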

Output Parameters

a On exit, if jobz = 'V', then if info = 0, a contains the matrix Z of
eigenvectors. The eigenvectors are normalized as follows:
if itype = 1 or 2, ZT*B*Z = I;

if itype = 3, ZT*inv(B)*Z = I;

If jobz = 'N', then on exit the upper triangle (if uplo = 'U') or the
lower triangle (if uplo = 'L') of A, including the diagonal, is destroyed.

b On exit, if info≤n, the part of b containing the matrix is overwritten by the
triangular factor U or L from the Cholesky factorization B = UT*U or B = L*LT.

w Array, size at least max(1, n).

If info = 0, contains the eigenvalues in ascending order.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info > 0, spotrf/dpotrf or ssyev/dsyev returned an error code:

If info = i≤n, ssyev/dsyev failed to converge, and i off-diagonal elements of an intermediate tridiagonal
did not converge to zero;
If info = n + i, for 1 ≤i≤n, then the leading minor of order i of B is not positive-definite. The factorization
of B could not be completed and no eigenvalues or eigenvectors were computed.

?hegv
Computes all eigenvalues and, optionally,
eigenvectors of a complex generalized Hermitian
positive-definite eigenproblem.

Syntax
lapack_int LAPACKE_chegv( int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, lapack_complex_float* a, lapack_int lda, lapack_complex_float* b,
lapack_int ldb, float* w );
lapack_int LAPACKE_zhegv( int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, lapack_complex_double* a, lapack_int lda, lapack_complex_double* b,
lapack_int ldb, double* w );

Include Files
• mkl.h

Description

The routine computes all the eigenvalues, and optionally, the eigenvectors of a complex generalized
Hermitian positive-definite eigenproblem, of the form
A*x = λ*B*x, A*B*x = λ*x, or B*A*x = λ*x.
Here A and B are assumed to be Hermitian and B is also positive definite.


Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

itype Must be 1 or 2 or 3. Specifies the problem type to be solved:

if itype = 1, the problem type is A*x = lambda*B*x;

if itype = 2, the problem type is A*B*x = lambda*x;

if itype = 3, the problem type is B*A*x = lambda*x.

jobz Must be 'N' or 'V'.

If jobz = 'N', then compute eigenvalues only.

If jobz = 'V', then compute eigenvalues and eigenvectors.

uplo Must be 'U' or 'L'.

If uplo = 'U', arrays a and b store the upper triangles of A and B;

If uplo = 'L', arrays a and b store the lower triangles of A and B.

n The order of the matrices A and B (n≥ 0).

a, b Arrays:
a (size at least max(1, lda*n)) contains the upper or lower triangle of the
Hermitian matrix A, as specified by uplo.
b (size at least max(1, ldb*n)) contains the upper or lower triangle of the
Hermitian positive definite matrix B, as specified by uplo.

lda The leading dimension of a; at least max(1, n).

ldb The leading dimension of b; at least max(1, n).

Output Parameters

a On exit, if jobz = 'V', then if info = 0, a contains the matrix Z of
eigenvectors. The eigenvectors are normalized as follows:
if itype = 1 or 2, ZH*B*Z = I;

if itype = 3, ZH*inv(B)*Z = I;

If jobz = 'N', then on exit the upper triangle (if uplo = 'U') or the
lower triangle (if uplo = 'L') of A, including the diagonal, is destroyed.

b On exit, if info≤n, the part of b containing the matrix is overwritten by the
triangular factor U or L from the Cholesky factorization B = UH*U or B = L*LH.

w Array, size at least max(1, n).

If info = 0, contains the eigenvalues in ascending order.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info > 0, cpotrf/zpotrf or cheev/zheev returned an error code:

If info = i≤n, cheev/zheev failed to converge, and i off-diagonal elements of an intermediate tridiagonal did
not converge to zero;
If info = n + i, for 1 ≤i≤n, then the leading minor of order i of B is not positive-definite. The factorization
of B could not be completed and no eigenvalues or eigenvectors were computed.

?sygvd
Computes all eigenvalues and, optionally,
eigenvectors of a real generalized symmetric definite
eigenproblem using a divide and conquer method.

Syntax
lapack_int LAPACKE_ssygvd (int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, float* a, lapack_int lda, float* b, lapack_int ldb, float* w);
lapack_int LAPACKE_dsygvd (int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, double* a, lapack_int lda, double* b, lapack_int ldb, double* w);

Include Files
• mkl.h

Description

The routine computes all the eigenvalues, and optionally, the eigenvectors of a real generalized symmetric-
definite eigenproblem, of the form
A*x = λ*B*x, A*B*x = λ*x, or B*A*x = λ*x.
Here A and B are assumed to be symmetric and B is also positive definite.
It uses a divide and conquer algorithm.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

itype Must be 1 or 2 or 3. Specifies the problem type to be solved:

if itype = 1, the problem type is A*x = lambda*B*x;

if itype = 2, the problem type is A*B*x = lambda*x;

if itype = 3, the problem type is B*A*x = lambda*x.

jobz Must be 'N' or 'V'.

If jobz = 'N', then compute eigenvalues only.

If jobz = 'V', then compute eigenvalues and eigenvectors.

uplo Must be 'U' or 'L'.

If uplo = 'U', arrays a and b store the upper triangles of A and B;

If uplo = 'L', arrays a and b store the lower triangles of A and B.


n The order of the matrices A and B (n≥ 0).

a, b Arrays:
a (size at least lda*n) contains the upper or lower triangle of the
symmetric matrix A, as specified by uplo.
b (size at least ldb*n) contains the upper or lower triangle of the
symmetric positive definite matrix B, as specified by uplo.

lda The leading dimension of a; at least max(1, n).

ldb The leading dimension of b; at least max(1, n).

Output Parameters

a On exit, if jobz = 'V', then if info = 0, a contains the matrix Z of
eigenvectors. The eigenvectors are normalized as follows:
if itype = 1 or 2, ZT*B*Z = I;

if itype = 3, ZT*inv(B)*Z = I;

If jobz = 'N', then on exit the upper triangle (if uplo = 'U') or the
lower triangle (if uplo = 'L') of A, including the diagonal, is destroyed.

b On exit, if info≤n, the part of b containing the matrix is overwritten by the
triangular factor U or L from the Cholesky factorization B = UT*U or B = L*LT.

w Array, size at least max(1, n).

If info = 0, contains the eigenvalues in ascending order.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info > 0, an error code is returned as specified below.

• For info≤n:

• If info = i and jobz = 'N', then the algorithm failed to converge; i off-diagonal elements of an
intermediate tridiagonal form did not converge to zero.
• If jobz = 'V', then the algorithm failed to compute an eigenvalue while working on the submatrix
lying in rows and columns info/(n+1) through mod(info,n+1).
• For info > n:

• If info = n + i, for 1 ≤i≤n, then the leading minor of order i of B is not positive-definite. The
factorization of B could not be completed and no eigenvalues or eigenvectors were computed.

?hegvd
Computes all the eigenvalues, and optionally, the
eigenvectors of a complex generalized Hermitian
positive-definite eigenproblem using a divide and
conquer method.

Syntax
lapack_int LAPACKE_chegvd( int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, lapack_complex_float* a, lapack_int lda, lapack_complex_float* b,
lapack_int ldb, float* w );
lapack_int LAPACKE_zhegvd( int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, lapack_complex_double* a, lapack_int lda, lapack_complex_double* b,
lapack_int ldb, double* w );

Include Files
• mkl.h

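Description

The routine computes all the eigenvalues, and optionally, the eigenvectors of a complex generalized
Hermitian positive-definite eigenproblem, of the form
A*x = λ*B*x, A*B*x = λ*x, or B*A*x = λ*x.
Here A and B are assumed to be Hermitian and B is also positive definite.
It uses a divide and conquer algorithm.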

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

itype Must be 1 or 2 or 3. Specifies the problem type to be solved:

if itype = 1, the problem type is A*x = lambda*B*x;

if itype = 2, the problem type is A*B*x = lambda*x;

if itype = 3, the problem type is B*A*x = lambda*x.

jobz Must be 'N' or 'V'.

If jobz = 'N', then compute eigenvalues only.

If jobz = 'V', then compute eigenvalues and eigenvectors.

uplo Must be 'U' or 'L'.

If uplo = 'U', arrays a and b store the upper triangles of A and B;

If uplo = 'L', arrays a and b store the lower triangles of A and B.

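n The order of the matrices A and B (n≥ 0).

a, b Arrays:
a (size at least max(1, lda*n)) contains the upper or lower triangle of the
Hermitian matrix A, as specified by uplo.
b (size at least max(1, ldb*n)) contains the upper or lower triangle of the
Hermitian positive definite matrix B, as specified by uplo.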

lda The leading dimension of a; at least max(1, n).

ldb The leading dimension of b; at least max(1, n).


Output Parameters

a On exit, if jobz = 'V', then if info = 0, a contains the matrix Z of eigenvectors. The eigenvectors are normalized as follows:
if itype = 1 or 2, Z^H*B*Z = I;

if itype = 3, Z^H*inv(B)*Z = I.

If jobz = 'N', then on exit the upper triangle (if uplo = 'U') or the lower triangle (if uplo = 'L') of A, including the diagonal, is destroyed.

b On exit, if info ≤ n, the part of b containing the matrix is overwritten by the triangular factor U or L from the Cholesky factorization B = U^H*U or B = L*L^H.

w Array, size at least max(1, n).

If info = 0, contains the eigenvalues in ascending order.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = n + i, for 1 ≤ i ≤ n, then the leading minor of order i of B is not positive-definite. The factorization of B could not be completed and no eigenvalues or eigenvectors were computed.

?sygvx
Computes selected eigenvalues and, optionally,
eigenvectors of a real generalized symmetric definite
eigenproblem.

Syntax
lapack_int LAPACKE_ssygvx (int matrix_layout, lapack_int itype, char jobz, char range,
char uplo, lapack_int n, float* a, lapack_int lda, float* b, lapack_int ldb, float vl,
float vu, lapack_int il, lapack_int iu, float abstol, lapack_int* m, float* w, float* z,
lapack_int ldz, lapack_int* ifail);
lapack_int LAPACKE_dsygvx (int matrix_layout, lapack_int itype, char jobz, char range,
char uplo, lapack_int n, double* a, lapack_int lda, double* b, lapack_int ldb, double
vl, double vu, lapack_int il, lapack_int iu, double abstol, lapack_int* m, double* w,
double* z, lapack_int ldz, lapack_int* ifail);

Include Files
• mkl.h

Description

The routine computes selected eigenvalues, and optionally, the eigenvectors of a real generalized symmetric-
definite eigenproblem, of the form
A*x = λ*B*x, A*B*x = λ*x, or B*A*x = λ*x.
Here A and B are assumed to be symmetric and B is also positive definite. Eigenvalues and eigenvectors can
be selected by specifying either a range of values or a range of indices for the desired eigenvalues.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

itype Must be 1 or 2 or 3. Specifies the problem type to be solved:

if itype = 1, the problem type is A*x = λ*B*x;

if itype = 2, the problem type is A*B*x = λ*x;

if itype = 3, the problem type is B*A*x = λ*x.

jobz Must be 'N' or 'V'.

If jobz = 'N', then compute eigenvalues only.

If jobz = 'V', then compute eigenvalues and eigenvectors.

range Must be 'A' or 'V' or 'I'.

If range = 'A', the routine computes all eigenvalues.

If range = 'V', the routine computes eigenvalues w[i] in the half-open interval:
vl < w[i] ≤ vu.
If range = 'I', the routine computes eigenvalues with indices il to iu.

uplo Must be 'U' or 'L'.

If uplo = 'U', arrays a and b store the upper triangles of A and B;

If uplo = 'L', arrays a and b store the lower triangles of A and B.

n The order of the matrices A and B (n ≥ 0).

lda The leading dimension of a; at least max(1, n).

ldb The leading dimension of b; at least max(1, n).

vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
Constraint: vl < vu.

If range = 'A' or 'I', vl and vu are not referenced.

il, iu If range = 'I', the indices in ascending order of the smallest and largest eigenvalues to be returned.
Constraint: 1 ≤ il ≤ iu ≤ n, if n > 0; il = 1 and iu = 0 if n = 0.

If range = 'A' or 'V', il and iu are not referenced.

abstol The absolute error tolerance for the eigenvalues. See Application Notes for more information.

ldz The leading dimension of the output array z. Constraints:
ldz ≥ 1; if jobz = 'V', ldz ≥ max(1, n) for column major layout and ldz ≥ max(1, m) for row major layout.

Output Parameters

a On exit, the upper triangle (if uplo = 'U') or the lower triangle (if uplo =
'L') of A, including the diagonal, is overwritten.

b On exit, if info ≤ n, the part of b containing the matrix is overwritten by the triangular factor U or L from the Cholesky factorization B = U^T*U or B = L*L^T.

m The total number of eigenvalues found, 0 ≤ m ≤ n. If range = 'A', m = n, and if range = 'I', m = iu-il+1.

w, z Arrays:
w, size at least max(1, n).
The first m elements of w contain the selected eigenvalues in ascending
order.
z (size at least max(1, ldz*m) for column major layout and max(1, ldz*n) for row major layout).
If jobz = 'V', then if info = 0, the first m columns of z contain the orthonormal eigenvectors of the matrix A corresponding to the selected eigenvalues, with the i-th column of z holding the eigenvector associated with w[i - 1]. The eigenvectors are normalized as follows:
if itype = 1 or 2, Z^T*B*Z = I;

if itype = 3, Z^T*inv(B)*Z = I.

If jobz = 'N', then z is not referenced.

If an eigenvector fails to converge, then that column of z contains the latest approximation to the eigenvector, and the index of the eigenvector is returned in ifail.
Note: you must ensure that at least max(1,m) columns are supplied in the
array z; if range = 'V', the exact value of m is not known in advance and
an upper bound must be used.

ifail Array, size at least max(1, n).

If jobz = 'V', then if info = 0, the first m elements of ifail are zero; if info > 0, ifail contains the indices of the eigenvectors that failed to converge.
If jobz = 'N', then ifail is not referenced.
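Because m is only known after the call when range = 'V', the caller has to size z from an upper bound. The sketch below turns the ldz and z-size constraints above into a hypothetical helper, gvx_z_size. It uses its own layout enum rather than LAPACK_ROW_MAJOR/LAPACK_COL_MAJOR so that it compiles without mkl.h, and it uses n as the bound on m whenever m is not fixed by il and iu.

```c
#include <stddef.h>

/* Local layout tag; deliberately not LAPACK_ROW_MAJOR/LAPACK_COL_MAJOR so
   that this sketch compiles without mkl.h. */
typedef enum { GVX_COL_MAJOR, GVX_ROW_MAJOR } gvx_layout;

/* Hypothetical helper: smallest ldz and number of elements of z that
   satisfy the ?sygvx constraints. m is exact for range = 'A' (m = n) and
   range = 'I' (m = iu-il+1, assuming valid il and iu); for range = 'V'
   the bound m <= n is used. The leading dimension goes to *ldz_out. */
size_t gvx_z_size(gvx_layout layout, char jobz, char range,
                  int n, int il, int iu, int *ldz_out)
{
    int m = (range == 'I') ? iu - il + 1 : n;
    int ldz = 1;          /* jobz = 'N': z is not referenced, ldz >= 1 */
    size_t count = 1;
    if (jobz == 'V') {
        if (layout == GVX_COL_MAJOR) {
            ldz = n > 1 ? n : 1;              /* ldz >= max(1, n) */
            count = (size_t)ldz * (size_t)m;  /* max(1, ldz*m) */
        } else {
            ldz = m > 1 ? m : 1;              /* ldz >= max(1, m) */
            count = (size_t)ldz * (size_t)n;  /* max(1, ldz*n) */
        }
        if (count < 1) count = 1;
    }
    *ldz_out = ldz;
    return count;
}
```

For example, with n = 10, range = 'I', il = 2 and iu = 4, both layouts need 30 elements, but ldz is 10 in column major layout and 3 in row major layout.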

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info > 0, spotrf/dpotrf and ssyevx/dsyevx returned an error code:

If info = i ≤ n, ssyevx/dsyevx failed to converge, and i eigenvectors failed to converge. Their indices are stored in the array ifail;
If info = n + i, for 1 ≤ i ≤ n, then the leading minor of order i of B is not positive-definite. The factorization of B could not be completed and no eigenvalues or eigenvectors were computed.
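The three positive ranges of info mean different things, so callers usually branch on them. A minimal sketch of that mapping, following the Return Values list above; the enum and the function classify_gvx_info are hypothetical names, not MKL API.

```c
/* Hypothetical classification of the info value returned by ?sygvx for
   matrices of order n, following the Return Values list. */
typedef enum {
    GVX_SUCCESS,        /* info = 0 */
    GVX_ILLEGAL_ARG,    /* info = -i: the i-th parameter had an illegal value */
    GVX_NOT_CONVERGED,  /* info = i <= n: i eigenvectors failed (see ifail) */
    GVX_B_NOT_POSDEF,   /* info = n + i: leading minor of order i of B */
    GVX_UNKNOWN         /* any other value */
} gvx_status;

/* Returns the category; *detail receives i (parameter index, number of
   failed eigenvectors, or order of the failing minor), or 0 on success. */
gvx_status classify_gvx_info(int info, int n, int *detail)
{
    *detail = 0;
    if (info == 0) return GVX_SUCCESS;
    if (info < 0) { *detail = -info; return GVX_ILLEGAL_ARG; }
    if (info <= n) { *detail = info; return GVX_NOT_CONVERGED; }
    if (info <= 2 * n) { *detail = info - n; return GVX_B_NOT_POSDEF; }
    *detail = info;
    return GVX_UNKNOWN;
}
```

The Return Values of ?hegvx, ?spgvx and ?hpgvx have the same shape, so the same mapping applies to them; for the drivers without selection (?spgv, ?spgvd and so on), info ≤ n counts off-diagonal elements that did not converge rather than eigenvectors.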

Application Notes
An approximate eigenvalue is accepted as converged when it is determined to lie in an interval [a,b] of
width less than or equal to abstol+ε*max(|a|,|b|), where ε is the machine precision.

If abstol is less than or equal to zero, then ε*||T||_1 is used as the tolerance, where T is the tridiagonal matrix obtained by reducing C to tridiagonal form, and C is the symmetric matrix of the standard symmetric problem to which the generalized problem is transformed. Eigenvalues are computed most accurately when abstol is set to twice the underflow threshold, 2*?lamch('S'), not zero.

If this routine returns with 0 < info ≤ n, indicating that some eigenvectors did not converge, set abstol to 2*?lamch('S').

?hegvx
Computes selected eigenvalues and, optionally,
eigenvectors of a complex generalized Hermitian
positive-definite eigenproblem.

Syntax
lapack_int LAPACKE_chegvx( int matrix_layout, lapack_int itype, char jobz, char range,
char uplo, lapack_int n, lapack_complex_float* a, lapack_int lda, lapack_complex_float*
b, lapack_int ldb, float vl, float vu, lapack_int il, lapack_int iu, float abstol,
lapack_int* m, float* w, lapack_complex_float* z, lapack_int ldz, lapack_int* ifail );
lapack_int LAPACKE_zhegvx( int matrix_layout, lapack_int itype, char jobz, char range,
char uplo, lapack_int n, lapack_complex_double* a, lapack_int lda,
lapack_complex_double* b, lapack_int ldb, double vl, double vu, lapack_int il,
lapack_int iu, double abstol, lapack_int* m, double* w, lapack_complex_double* z,
lapack_int ldz, lapack_int* ifail );

Include Files
• mkl.h

Description


The routine computes selected eigenvalues, and optionally, the eigenvectors of a complex generalized
Hermitian positive-definite eigenproblem, of the form
A*x = λ*B*x, A*B*x = λ*x, or B*A*x = λ*x.
Here A and B are assumed to be Hermitian and B is also positive definite. Eigenvalues and eigenvectors can
be selected by specifying either a range of values or a range of indices for the desired eigenvalues.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

itype Must be 1 or 2 or 3. Specifies the problem type to be solved:

if itype = 1, the problem type is A*x = λ*B*x;

if itype = 2, the problem type is A*B*x = λ*x;

if itype = 3, the problem type is B*A*x = λ*x.

jobz Must be 'N' or 'V'.

If jobz = 'N', then compute eigenvalues only.

If jobz = 'V', then compute eigenvalues and eigenvectors.

range Must be 'A' or 'V' or 'I'.

If range = 'A', the routine computes all eigenvalues.

If range = 'V', the routine computes eigenvalues w[i] in the half-open interval:
vl < w[i] ≤ vu.
If range = 'I', the routine computes eigenvalues with indices il to iu.

uplo Must be 'U' or 'L'.

If uplo = 'U', arrays a and b store the upper triangles of A and B;

If uplo = 'L', arrays a and b store the lower triangles of A and B.

n The order of the matrices A and B (n ≥ 0).

lda The leading dimension of a; at least max(1, n).

ldb The leading dimension of b; at least max(1, n).

vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
Constraint: vl < vu.

If range = 'A' or 'I', vl and vu are not referenced.

il, iu If range = 'I', the indices in ascending order of the smallest and largest eigenvalues to be returned.
Constraint: 1 ≤ il ≤ iu ≤ n, if n > 0; il = 1 and iu = 0 if n = 0.

If range = 'A' or 'V', il and iu are not referenced.

abstol The absolute error tolerance for the eigenvalues. See Application Notes for
more information.

ldz The leading dimension of the output array z. Constraints:
ldz ≥ 1; if jobz = 'V', ldz ≥ max(1, n) for column major layout and ldz ≥ max(1, m) for row major layout.
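The three range settings select eigenvalues differently; in particular, the interval for range = 'V' excludes vl and includes vu, and il, iu are 1-based. The hypothetical helper below does not compute eigenvalues. It only applies these selection rules to an already sorted list, which can be useful when writing expectations for tests.

```c
/* Hypothetical illustration of which entries of an ascending eigenvalue
   list w_all[0..n-1] the range argument selects:
     'A': all values
     'V': values in the half-open interval vl < w <= vu
     'I': values with 1-based indices il..iu
   Copies the selected values to out and returns their count (m). */
int select_eigenvalues(char range, int n, const double *w_all,
                       double vl, double vu, int il, int iu, double *out)
{
    int m = 0;
    for (int i = 0; i < n; ++i) {
        int keep;
        if (range == 'V')
            keep = (vl < w_all[i]) && (w_all[i] <= vu);
        else if (range == 'I')
            keep = (i + 1 >= il) && (i + 1 <= iu);  /* 1-based indices */
        else
            keep = 1;                               /* range == 'A' */
        if (keep) out[m++] = w_all[i];
    }
    return m;
}
```

With eigenvalues 1, 2, 3, 4, both range = 'V' with vl = 1, vu = 3 and range = 'I' with il = 2, iu = 3 select the values 2 and 3.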

Output Parameters

a On exit, the upper triangle (if uplo = 'U') or the lower triangle (if uplo =
'L') of A, including the diagonal, is overwritten.

b On exit, if info ≤ n, the part of b containing the matrix is overwritten by the triangular factor U or L from the Cholesky factorization B = U^H*U or B = L*L^H.

m The total number of eigenvalues found, 0 ≤ m ≤ n. If range = 'A', m = n, and if range = 'I', m = iu-il+1.

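w Array, size at least max(1, n).

The first m elements of w contain the selected eigenvalues in ascending order.

z Array z (size at least max(1, ldz*m) for column major layout and max(1, ldz*n) for row major layout).
If jobz = 'V', then if info = 0, the first m columns of z contain the orthonormal eigenvectors of the matrix A corresponding to the selected eigenvalues, with the i-th column of z holding the eigenvector associated with w[i - 1]. The eigenvectors are normalized as follows:
if itype = 1 or 2, Z^H*B*Z = I;

if itype = 3, Z^H*inv(B)*Z = I.

If jobz = 'N', then z is not referenced.

If an eigenvector fails to converge, then that column of z contains the latest approximation to the eigenvector, and the index of the eigenvector is returned in ifail.

ifail Array, size at least max(1, n).

If jobz = 'V', then if info = 0, the first m elements of ifail are zero; if info > 0, ifail contains the indices of the eigenvectors that failed to converge.
If jobz = 'N', then ifail is not referenced.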


Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info > 0, cpotrf/zpotrf and cheevx/zheevx returned an error code:

If info = i ≤ n, cheevx/zheevx failed to converge, and i eigenvectors failed to converge. Their indices are stored in the array ifail;
If info = n + i, for 1 ≤ i ≤ n, then the leading minor of order i of B is not positive-definite. The factorization of B could not be completed and no eigenvalues or eigenvectors were computed.

Application Notes
If abstol is less than or equal to zero, then ε*||T||_1 is used in its place, where T is the tridiagonal matrix obtained by reducing C to tridiagonal form, and C is the Hermitian matrix of the standard Hermitian problem to which the generalized problem is transformed. Eigenvalues are computed most accurately when abstol is set to twice the underflow threshold, 2*?lamch('S'), not zero.

If this routine returns with 0 < info ≤ n, indicating that some eigenvectors did not converge, try setting abstol to 2*?lamch('S').

?spgv
Computes all eigenvalues and, optionally,
eigenvectors of a real generalized symmetric definite
eigenproblem with matrices in packed storage.

Syntax
lapack_int LAPACKE_sspgv (int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, float* ap, float* bp, float* w, float* z, lapack_int ldz);
lapack_int LAPACKE_dspgv (int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, double* ap, double* bp, double* w, double* z, lapack_int ldz);

Include Files
• mkl.h

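Description

The routine computes all the eigenvalues, and optionally, the eigenvectors of a real generalized symmetric-definite eigenproblem, of the form
A*x = λ*B*x, A*B*x = λ*x, or B*A*x = λ*x.
Here A and B are assumed to be symmetric, stored in packed format, and B is also positive definite.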

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

itype Must be 1 or 2 or 3. Specifies the problem type to be solved:

if itype = 1, the problem type is A*x = lambda*B*x;

if itype = 2, the problem type is A*B*x = lambda*x;

if itype = 3, the problem type is B*A*x = lambda*x.

jobz Must be 'N' or 'V'.

If jobz = 'N', then compute eigenvalues only.

If jobz = 'V', then compute eigenvalues and eigenvectors.

uplo Must be 'U' or 'L'.

If uplo = 'U', arrays ap and bp store the upper triangles of A and B;

If uplo = 'L', arrays ap and bp store the lower triangles of A and B.

n The order of the matrices A and B (n ≥ 0).

ap, bp Arrays:
ap contains the packed upper or lower triangle of the symmetric matrix A,
as specified by uplo.
The dimension of ap must be at least max(1, n*(n+1)/2).
bp contains the packed upper or lower triangle of the symmetric matrix B,
as specified by uplo.
The dimension of bp must be at least max(1, n*(n+1)/2).

ldz The leading dimension of the output array z; ldz ≥ 1. If jobz = 'V', ldz ≥ max(1, n).
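For uplo = 'U' and column major layout, ap and bp are expected in the conventional LAPACK packed format, in which the upper triangle is stored column by column. The sketch below fills such an array from a full matrix; pack_upper_colmajor is a hypothetical helper, and the row major packed layout is not covered.

```c
/* Packs the upper triangle of an n-by-n column-major matrix a (leading
   dimension lda) into ap using the conventional LAPACK column-major upper
   packed layout: A(i,j), 0 <= i <= j, goes to ap[i + j*(j+1)/2].
   ap must hold at least n*(n+1)/2 elements. */
void pack_upper_colmajor(int n, const double *a, int lda, double *ap)
{
    for (int j = 0; j < n; ++j)
        for (int i = 0; i <= j; ++i)
            ap[i + j * (j + 1) / 2] = a[i + j * lda];
}
```

Both ap and bp would be filled this way before a call such as LAPACKE_dspgv(LAPACK_COL_MAJOR, 1, 'V', 'U', n, ap, bp, w, z, ldz); because the routine overwrites ap and bp, keep copies if the original matrices are needed afterwards.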

Output Parameters

ap On exit, the contents of ap are overwritten.

bp On exit, contains the triangular factor U or L from the Cholesky factorization B = U^T*U or B = L*L^T, in the same storage format as B.

w, z Arrays:
w, size at least max(1, n).
If info = 0, contains the eigenvalues in ascending order.

z (size max(1, ldz*n)).

If jobz = 'V', then if info = 0, z contains the matrix Z of eigenvectors.
The eigenvectors are normalized as follows:
if itype = 1 or 2, Z^T*B*Z = I;

if itype = 3, Z^T*inv(B)*Z = I.

If jobz = 'N', then z is not referenced.


Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info > 0, spptrf/dpptrf and sspev/dspev returned an error code:

If info = i ≤ n, sspev/dspev failed to converge, and i off-diagonal elements of an intermediate tridiagonal form did not converge to zero;
If info = n + i, for 1 ≤ i ≤ n, then the leading minor of order i of B is not positive-definite. The factorization of B could not be completed and no eigenvalues or eigenvectors were computed.

?hpgv
Computes all eigenvalues and, optionally,
eigenvectors of a complex generalized Hermitian
positive-definite eigenproblem with matrices in packed
storage.

Syntax
lapack_int LAPACKE_chpgv( int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, lapack_complex_float* ap, lapack_complex_float* bp, float* w,
lapack_complex_float* z, lapack_int ldz );
lapack_int LAPACKE_zhpgv( int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, lapack_complex_double* ap, lapack_complex_double* bp, double* w,
lapack_complex_double* z, lapack_int ldz );

Include Files
• mkl.h

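Description

The routine computes all the eigenvalues, and optionally, the eigenvectors of a complex generalized Hermitian positive-definite eigenproblem, of the form
A*x = λ*B*x, A*B*x = λ*x, or B*A*x = λ*x.
Here A and B are assumed to be Hermitian, stored in packed format, and B is also positive definite.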

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

itype Must be 1 or 2 or 3. Specifies the problem type to be solved:

if itype = 1, the problem type is A*x = lambda*B*x;

if itype = 2, the problem type is A*B*x = lambda*x;

if itype = 3, the problem type is B*A*x = lambda*x.

jobz Must be 'N' or 'V'.

If jobz = 'N', then compute eigenvalues only.

If jobz = 'V', then compute eigenvalues and eigenvectors.

uplo Must be 'U' or 'L'.

If uplo = 'U', arrays ap and bp store the upper triangles of A and B;

If uplo = 'L', arrays ap and bp store the lower triangles of A and B.

n The order of the matrices A and B (n ≥ 0).

ap, bp Arrays:
ap contains the packed upper or lower triangle of the Hermitian matrix A, as
specified by uplo.
The dimension of ap must be at least max(1, n*(n+1)/2).
bp contains the packed upper or lower triangle of the Hermitian matrix B,
as specified by uplo.
The dimension of bp must be at least max(1, n*(n+1)/2).

ldz The leading dimension of the output array z; ldz ≥ 1. If jobz = 'V', ldz ≥ max(1, n).
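With uplo = 'L' the conventional packed layout stores the lower triangle column by column. A sketch for column major layout, assuming C99 double complex as a stand-in for lapack_complex_double; the helper name pack_lower_colmajor_z is hypothetical. Only the lower triangle of the input is read, which is all a Hermitian matrix needs.

```c
#include <complex.h>

/* Packs the lower triangle of an n-by-n column-major Hermitian matrix a
   (leading dimension lda) into ap using the conventional LAPACK
   column-major lower packed layout: A(i,j), i >= j, goes to
   ap[i + j*(2*n - j - 1)/2] (0-based). ap must hold n*(n+1)/2 elements. */
void pack_lower_colmajor_z(int n, const double complex *a, int lda,
                           double complex *ap)
{
    for (int j = 0; j < n; ++j)
        for (int i = j; i < n; ++i)
            ap[i + j * (2 * n - j - 1) / 2] = a[i + j * lda];
}
```

The offset j*(2*n - j - 1)/2 counts the elements stored for the columns before j, minus j, so that row i of column j lands right after them.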

Output Parameters

ap On exit, the contents of ap are overwritten.

bp On exit, contains the triangular factor U or L from the Cholesky factorization B = U^H*U or B = L*L^H, in the same storage format as B.

w Array, size at least max(1, n).

If info = 0, contains the eigenvalues in ascending order.

z Array z (size max(1, ldz*n)).

If jobz = 'V', then if info = 0, z contains the matrix Z of eigenvectors.
The eigenvectors are normalized as follows:
if itype = 1 or 2, Z^H*B*Z = I;

if itype = 3, Z^H*inv(B)*Z = I.

If jobz = 'N', then z is not referenced.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info > 0, cpptrf/zpptrf and chpev/zhpev returned an error code:

If info = i ≤ n, chpev/zhpev failed to converge, and i off-diagonal elements of an intermediate tridiagonal form did not converge to zero;
If info = n + i, for 1 ≤ i ≤ n, then the leading minor of order i of B is not positive-definite. The factorization of B could not be completed and no eigenvalues or eigenvectors were computed.


?spgvd
Computes all eigenvalues and, optionally,
eigenvectors of a real generalized symmetric definite
eigenproblem with matrices in packed storage using a
divide and conquer method.

Syntax
lapack_int LAPACKE_sspgvd (int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, float* ap, float* bp, float* w, float* z, lapack_int ldz);
lapack_int LAPACKE_dspgvd (int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, double* ap, double* bp, double* w, double* z, lapack_int ldz);

Include Files
• mkl.h

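Description

The routine computes all the eigenvalues, and optionally, the eigenvectors of a real generalized symmetric-definite eigenproblem, of the form
A*x = λ*B*x, A*B*x = λ*x, or B*A*x = λ*x.
Here A and B are assumed to be symmetric, stored in packed format, and B is also positive definite. If eigenvectors are desired, it uses a divide and conquer algorithm.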

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

itype Must be 1 or 2 or 3. Specifies the problem type to be solved:

if itype = 1, the problem type is A*x = lambda*B*x;

if itype = 2, the problem type is A*B*x = lambda*x;

if itype = 3, the problem type is B*A*x = lambda*x.

jobz Must be 'N' or 'V'.

If jobz = 'N', then compute eigenvalues only.

If jobz = 'V', then compute eigenvalues and eigenvectors.

uplo Must be 'U' or 'L'.

If uplo = 'U', arrays ap and bp store the upper triangles of A and B;

If uplo = 'L', arrays ap and bp store the lower triangles of A and B.

n The order of the matrices A and B (n ≥ 0).

ap, bp Arrays:
ap contains the packed upper or lower triangle of the symmetric matrix A,
as specified by uplo.
The dimension of ap must be at least max(1, n*(n+1)/2).

bp contains the packed upper or lower triangle of the symmetric matrix B,
as specified by uplo.
The dimension of bp must be at least max(1, n*(n+1)/2).

ldz The leading dimension of the output array z; ldz ≥ 1. If jobz = 'V', ldz ≥ max(1, n).

Output Parameters

ap On exit, the contents of ap are overwritten.

bp On exit, contains the triangular factor U or L from the Cholesky factorization B = U^T*U or B = L*L^T, in the same storage format as B.

w, z Arrays:
w, size at least max(1, n).
If info = 0, contains the eigenvalues in ascending order.

z (size at least max(1, ldz*n)).

If jobz = 'V', then if info = 0, z contains the matrix Z of eigenvectors.
The eigenvectors are normalized as follows:
if itype = 1 or 2, Z^T*B*Z = I;

if itype = 3, Z^T*inv(B)*Z = I.

If jobz = 'N', then z is not referenced.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info > 0, spptrf/dpptrf and sspevd/dspevd returned an error code:

If info = i ≤ n, sspevd/dspevd failed to converge, and i off-diagonal elements of an intermediate tridiagonal form did not converge to zero;
If info = n + i, for 1 ≤ i ≤ n, then the leading minor of order i of B is not positive-definite. The factorization of B could not be completed and no eigenvalues or eigenvectors were computed.

?hpgvd
Computes all eigenvalues and, optionally,
eigenvectors of a complex generalized Hermitian
positive-definite eigenproblem with matrices in packed
storage using a divide and conquer method.

Syntax
lapack_int LAPACKE_chpgvd( int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, lapack_complex_float* ap, lapack_complex_float* bp, float* w,
lapack_complex_float* z, lapack_int ldz );
lapack_int LAPACKE_zhpgvd( int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, lapack_complex_double* ap, lapack_complex_double* bp, double* w,
lapack_complex_double* z, lapack_int ldz );


Include Files
• mkl.h

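Description

The routine computes all the eigenvalues, and optionally, the eigenvectors of a complex generalized Hermitian positive-definite eigenproblem, of the form
A*x = λ*B*x, A*B*x = λ*x, or B*A*x = λ*x.
Here A and B are assumed to be Hermitian, stored in packed format, and B is also positive definite. If eigenvectors are desired, it uses a divide and conquer algorithm.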

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

itype Must be 1 or 2 or 3. Specifies the problem type to be solved:

if itype = 1, the problem type is A*x = lambda*B*x;

if itype = 2, the problem type is A*B*x = lambda*x;

if itype = 3, the problem type is B*A*x = lambda*x.

jobz Must be 'N' or 'V'.

If jobz = 'N', then compute eigenvalues only.

If jobz = 'V', then compute eigenvalues and eigenvectors.

uplo Must be 'U' or 'L'.

If uplo = 'U', arrays ap and bp store the upper triangles of A and B;

If uplo = 'L', arrays ap and bp store the lower triangles of A and B.

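n The order of the matrices A and B (n ≥ 0).

ap, bp Arrays:
ap contains the packed upper or lower triangle of the Hermitian matrix A, as specified by uplo.
The dimension of ap must be at least max(1, n*(n+1)/2).
bp contains the packed upper or lower triangle of the Hermitian matrix B, as specified by uplo.
The dimension of bp must be at least max(1, n*(n+1)/2).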

ldz The leading dimension of the output array z; ldz ≥ 1. If jobz = 'V', ldz ≥ max(1, n).

Output Parameters

ap On exit, the contents of ap are overwritten.

bp On exit, contains the triangular factor U or L from the Cholesky factorization B = U^H*U or B = L*L^H, in the same storage format as B.

w Array, size at least max(1, n).
If info = 0, contains the eigenvalues in ascending order.

z Array z (size at least max(1, ldz*n)).

If jobz = 'V', then if info = 0, z contains the matrix Z of eigenvectors.
The eigenvectors are normalized as follows:
if itype = 1 or 2, Z^H*B*Z = I;

if itype = 3, Z^H*inv(B)*Z = I.

If jobz = 'N', then z is not referenced.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info > 0, cpptrf/zpptrf and chpevd/zhpevd returned an error code:

If info = i ≤ n, chpevd/zhpevd failed to converge, and i off-diagonal elements of an intermediate tridiagonal form did not converge to zero;
If info = n + i, for 1 ≤ i ≤ n, then the leading minor of order i of B is not positive-definite. The factorization of B could not be completed and no eigenvalues or eigenvectors were computed.

?spgvx
Computes selected eigenvalues and, optionally,
eigenvectors of a real generalized symmetric definite
eigenproblem with matrices in packed storage.

Syntax
lapack_int LAPACKE_sspgvx (int matrix_layout, lapack_int itype, char jobz, char range,
char uplo, lapack_int n, float* ap, float* bp, float vl, float vu, lapack_int il,
lapack_int iu, float abstol, lapack_int* m, float* w, float* z, lapack_int ldz,
lapack_int* ifail);
lapack_int LAPACKE_dspgvx (int matrix_layout, lapack_int itype, char jobz, char range,
char uplo, lapack_int n, double* ap, double* bp, double vl, double vu, lapack_int il,
lapack_int iu, double abstol, lapack_int* m, double* w, double* z, lapack_int ldz,
lapack_int* ifail);

Include Files
• mkl.h

Description

The routine computes selected eigenvalues, and optionally, the eigenvectors of a real generalized symmetric-
definite eigenproblem, of the form
A*x = λ*B*x, A*B*x = λ*x, or B*A*x = λ*x.
Here A and B are assumed to be symmetric, stored in packed format, and B is also positive definite.
Eigenvalues and eigenvectors can be selected by specifying either a range of values or a range of indices for
the desired eigenvalues.


Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

itype Must be 1 or 2 or 3. Specifies the problem type to be solved:

if itype = 1, the problem type is A*x = lambda*B*x;

if itype = 2, the problem type is A*B*x = lambda*x;

if itype = 3, the problem type is B*A*x = lambda*x.

jobz Must be 'N' or 'V'.

If jobz = 'N', then compute eigenvalues only.

If jobz = 'V', then compute eigenvalues and eigenvectors.

range Must be 'A' or 'V' or 'I'.

If range = 'A', the routine computes all eigenvalues.

If range = 'V', the routine computes eigenvalues w[i] in the half-open interval:
vl < w[i] ≤ vu.
If range = 'I', the routine computes eigenvalues with indices il to iu.

uplo Must be 'U' or 'L'.

If uplo = 'U', arrays ap and bp store the upper triangles of A and B;

If uplo = 'L', arrays ap and bp store the lower triangles of A and B.

n The order of the matrices A and B (n ≥ 0).

ap, bp Arrays:
ap contains the packed upper or lower triangle of the symmetric matrix A,
as specified by uplo.
The size of ap must be at least max(1, n*(n+1)/2).
bp contains the packed upper or lower triangle of the symmetric matrix B,
as specified by uplo.
The size of bp must be at least max(1, n*(n+1)/2).

vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
Constraint: vl < vu.

If range = 'A' or 'I', vl and vu are not referenced.

il, iu If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned.
Constraint: 1 ≤ il ≤ iu ≤ n, if n > 0; il = 1 and iu = 0 if n = 0.

If range = 'A' or 'V', il and iu are not referenced.

abstol The absolute error tolerance for the eigenvalues. See Application Notes for
more information.

ldz The leading dimension of the output array z. Constraints:
ldz ≥ 1; if jobz = 'V', ldz ≥ max(1, n) for column major layout and ldz ≥ max(1, m) for row major layout.

Output Parameters

ap On exit, the contents of ap are overwritten.

bp On exit, contains the triangular factor U or L from the Cholesky factorization B = U^T*U or B = L*L^T, in the same storage format as B.

m The total number of eigenvalues found, 0 ≤ m ≤ n. If range = 'A', m = n, and if range = 'I', m = iu-il+1.

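w, z Arrays:
w, size at least max(1, n).
If info = 0, contains the eigenvalues in ascending order.
z (size at least max(1, ldz*m) for column major layout and max(1, ldz*n) for row major layout).
If jobz = 'V', then if info = 0, the first m columns of z contain the orthonormal eigenvectors of the matrix A corresponding to the selected eigenvalues, with the i-th column of z holding the eigenvector associated with w[i - 1]. The eigenvectors are normalized as follows:
if itype = 1 or 2, Z^T*B*Z = I;

if itype = 3, Z^T*inv(B)*Z = I.

If jobz = 'N', then z is not referenced.

If an eigenvector fails to converge, then that column of z contains the latest approximation to the eigenvector, and the index of the eigenvector is returned in ifail.

ifail Array, size at least max(1, n).

If jobz = 'V', then if info = 0, the first m elements of ifail are zero; if info > 0, ifail contains the indices of the eigenvectors that failed to converge.
If jobz = 'N', then ifail is not referenced.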

Return Values
This function returns a value info.

If info = 0, the execution is successful.


If info = -i, the i-th parameter had an illegal value.

If info > 0, spptrf/dpptrf and sspevx/dspevx returned an error code:

If info = i ≤ n, sspevx/dspevx failed to converge, and i eigenvectors failed to converge. Their indices are stored in the array ifail;
If info = n + i, for 1 ≤ i ≤ n, then the leading minor of order i of B is not positive-definite. The factorization of B could not be completed and no eigenvalues or eigenvectors were computed.

Application Notes
If abstol is less than or equal to zero, then ε*||T||_1 is used instead, where T is the tridiagonal matrix obtained by reducing C to tridiagonal form, and C is the symmetric matrix of the standard symmetric problem to which the generalized problem is transformed. Eigenvalues are computed most accurately when abstol is set to twice the underflow threshold, 2*?lamch('S'), not zero.

If this routine returns with 0 < info ≤ n, indicating that some eigenvectors did not converge, set abstol to 2*?lamch('S').
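The advice to retry with abstol = 2*?lamch('S') can be wrapped around any call. In this sketch the call is hidden behind a hypothetical callback type, gvx_call, so that the logic compiles without MKL; a real callback would restore ap and bp from saved copies (the routine overwrites them) and then call LAPACKE_dspgvx with the given abstol. DBL_MIN is assumed to match ?lamch('S') for double precision, and fake_solver is a stand-in used only for demonstration.

```c
#include <float.h>

/* Hypothetical callback standing in for a LAPACKE_dspgvx call: receives
   the abstol to use and returns the routine's info value. */
typedef int (*gvx_call)(double abstol, void *ctx);

/* Calls the solver with the caller's abstol; if some eigenvectors failed
   to converge (0 < info <= n), retries once with abstol = 2*DBL_MIN.
   info > n (B not positive definite) is returned as is, since a different
   tolerance cannot fix it. Returns the final info value. */
int spgvx_with_retry(gvx_call call, void *ctx, int n, double abstol)
{
    int info = call(abstol, ctx);
    if (info > 0 && info <= n)
        info = call(2.0 * DBL_MIN, ctx);
    return info;
}

/* Demonstration stand-in: reports one unconverged eigenvector unless the
   recommended tolerance is used; counts calls through ctx. */
int fake_solver(double abstol, void *ctx)
{
    int *calls = ctx;
    ++*calls;
    return abstol == 2.0 * DBL_MIN ? 0 : 1;
}
```

With fake_solver, a first call with abstol = 0 reports info = 1 and triggers exactly one retry, while a first call that already uses 2*DBL_MIN succeeds without a retry.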

?hpgvx
Computes selected eigenvalues and, optionally,
eigenvectors of a generalized Hermitian positive-
definite eigenproblem with matrices in packed
storage.

Syntax
lapack_int LAPACKE_chpgvx( int matrix_layout, lapack_int itype, char jobz, char range,
char uplo, lapack_int n, lapack_complex_float* ap, lapack_complex_float* bp, float vl,
float vu, lapack_int il, lapack_int iu, float abstol, lapack_int* m, float* w,
lapack_complex_float* z, lapack_int ldz, lapack_int* ifail );
lapack_int LAPACKE_zhpgvx( int matrix_layout, lapack_int itype, char jobz, char range,
char uplo, lapack_int n, lapack_complex_double* ap, lapack_complex_double* bp, double
vl, double vu, lapack_int il, lapack_int iu, double abstol, lapack_int* m, double* w,
lapack_complex_double* z, lapack_int ldz, lapack_int* ifail );

Include Files
• mkl.h

Description

The routine computes selected eigenvalues, and optionally, the eigenvectors of a complex generalized
Hermitian positive-definite eigenproblem, of the form
A*x = λ*B*x, A*B*x = λ*x, or B*A*x = λ*x.
Here A and B are assumed to be Hermitian, stored in packed format, and B is also positive definite.
Eigenvalues and eigenvectors can be selected by specifying either a range of values or a range of indices for
the desired eigenvalues.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

itype Must be 1 or 2 or 3. Specifies the problem type to be solved:
if itype = 1, the problem type is A*x = lambda*B*x;

if itype = 2, the problem type is A*B*x = lambda*x;

if itype = 3, the problem type is B*A*x = lambda*x.

jobz Must be 'N' or 'V'.

If jobz = 'N', then compute eigenvalues only.

If jobz = 'V', then compute eigenvalues and eigenvectors.

range Must be 'A' or 'V' or 'I'.

If range = 'A', the routine computes all eigenvalues.

If range = 'V', the routine computes eigenvalues w[i] in the half-open interval:
vl < w[i] ≤ vu.
If range = 'I', the routine computes eigenvalues with indices il to iu.

uplo Must be 'U' or 'L'.

If uplo = 'U', arrays ap and bp store the upper triangles of A and B;

If uplo = 'L', arrays ap and bp store the lower triangles of A and B.

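n The order of the matrices A and B (n ≥ 0).

ap, bp Arrays:
ap contains the packed upper or lower triangle of the Hermitian matrix A, as specified by uplo.
The dimension of ap must be at least max(1, n*(n+1)/2).
bp contains the packed upper or lower triangle of the Hermitian matrix B, as specified by uplo.
The dimension of bp must be at least max(1, n*(n+1)/2).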

vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
Constraint: vl < vu.

If range = 'A' or 'I', vl and vu are not referenced.

il, iu If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned.
Constraint: 1 ≤ il ≤ iu ≤ n, if n > 0; il = 1 and iu = 0 if n = 0.

If range = 'A' or 'V', il and iu are not referenced.

abstol The absolute error tolerance for the eigenvalues. See Application Notes for more information.


ldz The leading dimension of the output array z; ldz ≥ 1. If jobz = 'V', ldz ≥ max(1, n) for column major layout and ldz ≥ max(1, m) for row major layout.

Output Parameters

ap On exit, the contents of ap are overwritten.

bp On exit, contains the triangular factor U or L from the Cholesky factorization B = U^H*U or B = L*L^H, in the same storage format as B.

m The total number of eigenvalues found, 0 ≤ m ≤ n. If range = 'A', m = n, and if range = 'I', m = iu-il+1.

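w Array, size at least max(1, n).

If info = 0, contains the eigenvalues in ascending order.

z Array z (size at least max(1, ldz*m) for column major layout and max(1, ldz*n) for row major layout).
If jobz = 'V', then if info = 0, the first m columns of z contain the orthonormal eigenvectors of the matrix A corresponding to the selected eigenvalues, with the i-th column of z holding the eigenvector associated with w[i - 1]. The eigenvectors are normalized as follows:
if itype = 1 or 2, Z^H*B*Z = I;

if itype = 3, Z^H*inv(B)*Z = I.

If jobz = 'N', then z is not referenced.

If an eigenvector fails to converge, then that column of z contains the latest approximation to the eigenvector, and the index of the eigenvector is returned in ifail.

ifail Array, size at least max(1, n).

If jobz = 'V', then if info = 0, the first m elements of ifail are zero; if info > 0, ifail contains the indices of the eigenvectors that failed to converge.
If jobz = 'N', then ifail is not referenced.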

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info > 0, cpptrf/zpptrf and chpevx/zhpevx returned an error code:

If info = i≤n, chpevx/zhpevx failed to converge, and i eigenvectors failed to converge. Their indices are
stored in the array ifail;

If info = n + i, for 1 ≤i≤n, then the leading minor of order i of B is not positive-definite. The factorization
of B could not be completed and no eigenvalues or eigenvectors were computed.
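The info ranges above can be decoded mechanically. The following is a minimal sketch, not an MKL routine: the name decode_hpgvx_info, its return codes and the detail output are all hypothetical. It maps info onto the cases listed under Return Values.

```c
/* Hypothetical decoder (not an MKL routine) for the info value returned by
   ?hpgvx; n is the order of A and B. Return codes:
     0  success
    -1  argument number *detail had an illegal value
     1  *detail eigenvectors failed to converge (indices are in ifail)
     2  the leading minor of order *detail of B is not positive-definite
     3  any other value (not covered by the documented cases) */
int decode_hpgvx_info(int info, int n, int *detail)
{
    *detail = 0;
    if (info == 0)
        return 0;
    if (info < 0) {
        *detail = -info;      /* position of the illegal argument */
        return -1;
    }
    if (info <= n) {
        *detail = info;       /* number of unconverged eigenvectors */
        return 1;
    }
    if (info <= 2 * n) {
        *detail = info - n;   /* order of the failing leading minor */
        return 2;
    }
    return 3;
}
```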

If this routine returns with info > 0, indicating that some eigenvectors did not converge, try setting abstol
to 2*?lamch('S').
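?lamch('S') is LAPACK's "safe minimum": the smallest positive number whose reciprocal does not overflow. The sketch below assumes IEEE 754 arithmetic, where that value equals FLT_MIN for the single-precision flavors (s, c) and DBL_MIN for the double-precision flavors (d, z). It does not link against MKL. Code built with mkl.h could call LAPACKE_slamch('S') or LAPACKE_dlamch('S') instead. The function names are hypothetical.

```c
#include <float.h>

/* Suggested abstol = 2*?lamch('S').
   Assumption: IEEE 754 arithmetic, where the safe minimum equals
   FLT_MIN (s/c flavors) or DBL_MIN (d/z flavors). */
float recommended_abstol_single(void)
{
    return 2.0f * FLT_MIN;
}

double recommended_abstol_double(void)
{
    return 2.0 * DBL_MIN;
}
```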

?sbgv
Computes all eigenvalues and, optionally,
eigenvectors of a real generalized symmetric definite
eigenproblem with banded matrices.

Syntax
lapack_int LAPACKE_ssbgv (int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int ka, lapack_int kb, float* ab, lapack_int ldab, float* bb, lapack_int ldbb,
float* w, float* z, lapack_int ldz);
lapack_int LAPACKE_dsbgv (int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int ka, lapack_int kb, double* ab, lapack_int ldab, double* bb, lapack_int ldbb,
double* w, double* z, lapack_int ldz);

Include Files
• mkl.h

Description

The routine computes all the eigenvalues and, optionally, the eigenvectors of a real generalized symmetric-definite banded eigenproblem of the form A*x = λ*B*x. Here A and B are assumed to be symmetric and banded, and B is also positive definite.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then compute eigenvalues only.

If jobz = 'V', then compute eigenvalues and eigenvectors.

uplo Must be 'U' or 'L'.

If uplo = 'U', arrays ab and bb store the upper triangles of A and B;

If uplo = 'L', arrays ab and bb store the lower triangles of A and B.

n The order of the matrices A and B (n≥ 0).


ka The number of super- or sub-diagonals in A (ka ≥ 0).

kb The number of super- or sub-diagonals in B (kb≥ 0).

ab, bb Arrays:
ab(size at least max(1, ldab*n) for column major layout and max(1,
ldab*(ka + 1)) for row major layout) is an array containing either upper or
lower triangular part of the symmetric matrix A (as specified by uplo) in
band storage format.
bb(size at least max(1, ldbb*n) for column major layout and max(1,
ldbb*(kb + 1)) for row major layout) is an array containing either upper or
lower triangular part of the symmetric matrix B (as specified by uplo) in
band storage format.

ldab The leading dimension of the array ab; must be at least ka+1 for column major layout and at least max(1, n) for row major layout.

ldbb The leading dimension of the array bb; must be at least kb+1 for column
major layout and at least max(1, n) for row major layout.

ldz The leading dimension of the output array z; ldz≥ 1. If jobz = 'V', ldz≥
max(1, n).
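The ab and bb descriptions above refer to "band storage format" without restating it. The sketch below uses the standard LAPACK column-major band convention, given here as background rather than quoted from this reference. With 0-based indices and uplo = 'U', A(i,j) goes to ab[(k + i - j) + j*ldab]. With uplo = 'L', it goes to ab[(i - j) + j*ldab]. The helper name pack_band_colmajor is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper (not an MKL routine): copy the band of a dense
   symmetric n-by-n matrix a (column-major, leading dimension lda) into
   column-major band storage ab with k super-/sub-diagonals.
   0-based indices:
     uplo 'U': A(i,j), max(0, j-k) <= i <= j    -> ab[(k + i - j) + j*ldab]
     uplo 'L': A(i,j), j <= i <= min(n-1, j+k)  -> ab[(i - j) + j*ldab]
   ldab must be at least k + 1; unused corners of ab are left untouched. */
void pack_band_colmajor(char uplo, int n, int k, const double *a, int lda,
                        double *ab, int ldab)
{
    for (int j = 0; j < n; ++j) {
        if (uplo == 'U') {
            int i0 = j - k > 0 ? j - k : 0;
            for (int i = i0; i <= j; ++i)
                ab[(size_t)(k + i - j) + (size_t)j * ldab] =
                    a[(size_t)i + (size_t)j * lda];
        } else {
            int i1 = j + k < n - 1 ? j + k : n - 1;
            for (int i = j; i <= i1; ++i)
                ab[(size_t)(i - j) + (size_t)j * ldab] =
                    a[(size_t)i + (size_t)j * lda];
        }
    }
}
```

Packed this way, the arrays would be passed as ab/ldab (with ldab = ka + 1) and bb/ldbb (with ldbb = kb + 1) to LAPACKE_ssbgv or LAPACKE_dsbgv with matrix_layout = LAPACK_COL_MAJOR.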

Output Parameters

ab On exit, the contents of ab are overwritten.

bb On exit, contains the factor S from the split Cholesky factorization B = S^T*S, as returned by ?pbstf.

w, z Arrays:
w, size at least max(1, n).
If info = 0, contains the eigenvalues in ascending order.

z (size at least max(1, ldz*n)).

If jobz = 'V', then if info = 0, z contains the matrix Z of eigenvectors, with the i-th column of z holding the eigenvector associated with w[i - 1]. The eigenvectors are normalized so that Z^T*B*Z = I.

If jobz = 'N', then z is not referenced.
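The normalization Z^T*B*Z = I can be checked after a call. Because bb is overwritten on exit (it then holds S), the check needs a separate dense copy of B. The helper b_orthonormality_error below is hypothetical and assumes column-major layout, where column i of z (the eigenvector for w[i]) starts at z + i*ldz. It costs O(n^4), so it is meant only for small test problems.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical check (not an MKL routine): returns the largest entry of
   |Z^T*B*Z - I| for an n-by-n real Z (column-major, leading dimension ldz)
   and a dense symmetric n-by-n B (column-major, leading dimension ldb). */
double b_orthonormality_error(int n, const double *b, int ldb,
                              const double *z, int ldz)
{
    double worst = 0.0;
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            double s = 0.0;                  /* s = z_i^T * B * z_j */
            for (int r = 0; r < n; ++r) {
                double bz = 0.0;             /* (B * z_j)[r] */
                for (int c = 0; c < n; ++c)
                    bz += b[(size_t)r + (size_t)c * ldb] *
                          z[(size_t)c + (size_t)j * ldz];
                s += z[(size_t)r + (size_t)i * ldz] * bz;
            }
            double e = fabs(s - (i == j ? 1.0 : 0.0));
            if (e > worst)
                worst = e;
        }
    }
    return worst;
}
```

In row-major layout, the same eigenvector occupies column i of a row-major array, so its k-th element would be z[k*ldz + i]. The indexing would need to change accordingly.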

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info > 0, and

if info = i ≤ n, the algorithm failed to converge, and i off-diagonal elements of an intermediate tridiagonal form did not converge to zero;

if info = n + i, for 1 ≤ i ≤ n, then ?pbstf returned info = i and B is not positive-definite. The factorization of B could not be completed and no eigenvalues or eigenvectors were computed.
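The ?sbgv return values fall into disjoint ranges, and ?hbgv documents the same ranges. A minimal classification sketch follows. The enum and the function classify_sbgv_info are hypothetical names, not part of MKL.

```c
/* Hypothetical classification (not an MKL routine) of the info value
   returned by ?sbgv; n is the order of A and B. */
typedef enum {
    SBGV_SUCCESS,          /* info == 0 */
    SBGV_ILLEGAL_ARGUMENT, /* info == -i: the i-th argument was illegal */
    SBGV_NO_CONVERGENCE,   /* 1 <= info <= n: info off-diagonal elements of
                              the intermediate tridiagonal form did not
                              converge to zero */
    SBGV_B_NOT_POS_DEF,    /* n < info <= 2n: ?pbstf returned info - n */
    SBGV_UNEXPECTED        /* any other value */
} sbgv_status;

sbgv_status classify_sbgv_info(int info, int n)
{
    if (info == 0)
        return SBGV_SUCCESS;
    if (info < 0)
        return SBGV_ILLEGAL_ARGUMENT;
    if (info <= n)
        return SBGV_NO_CONVERGENCE;
    if (info <= 2 * n)
        return SBGV_B_NOT_POS_DEF;
    return SBGV_UNEXPECTED;
}
```

Per the parameter descriptions, w (and z when jobz = 'V') holds the documented results only when info = 0.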

?hbgv
Computes all eigenvalues and, optionally,
eigenvectors of a complex generalized Hermitian
positive-definite eigenproblem with banded matrices.

Syntax
lapack_int LAPACKE_chbgv( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int ka, lapack_int kb, lapack_complex_float* ab, lapack_int ldab,
lapack_complex_float* bb, lapack_int ldbb, float* w, lapack_complex_float* z,
lapack_int ldz );
lapack_int LAPACKE_zhbgv( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int ka, lapack_int kb, lapack_complex_double* ab, lapack_int ldab,
lapack_complex_double* bb, lapack_int ldbb, double* w, lapack_complex_double* z,
lapack_int ldz );

Include Files
• mkl.h

Description

The routine computes all the eigenvalues and, optionally, the eigenvectors of a complex generalized Hermitian positive-definite banded eigenproblem of the form A*x = λ*B*x. Here A and B are Hermitian and banded matrices, and matrix B is also positive definite.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then compute eigenvalues only.

If jobz = 'V', then compute eigenvalues and eigenvectors.

uplo Must be 'U' or 'L'.

If uplo = 'U', arrays ab and bb store the upper triangles of A and B;

If uplo = 'L', arrays ab and bb store the lower triangles of A and B.

n The order of the matrices A and B (n≥ 0).

ka The number of super- or sub-diagonals in A (ka ≥ 0).

kb The number of super- or sub-diagonals in B (kb≥ 0).

ab, bb Arrays:


ab(size at least max(1, ldab*n) for column major layout and max(1,
ldab*(ka + 1)) for row major layout) is an array containing either upper or
lower triangular part of the Hermitian matrix A (as specified by uplo) in
band storage format.
bb(size at least max(1, ldbb*n) for column major layout and max(1,
ldbb*(kb + 1)) for row major layout) is an array containing either upper or
lower triangular part of the Hermitian matrix B (as specified by uplo) in
band storage format.

ldab The leading dimension of the array ab; must be at least ka+1 for column major layout and at least max(1, n) for row major layout.

ldbb The leading dimension of the array bb; must be at least kb+1 for column major layout and at least max(1, n) for row major layout.

ldz The leading dimension of the output array z; ldz≥ 1. If jobz = 'V', ldz≥
max(1, n).
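The leading-dimension and array-size rules for ab and bb differ between the two layouts. The sketch below encodes them with two hypothetical helpers. It assumes, as LAPACKE does, that a row-major band array is the transpose of the column-major one. Under that assumption, A(i,j) with uplo = 'U' sits at row k + i - j, column j. The sizes apply equally to the real and complex flavors.

```c
#include <stddef.h>

/* Hypothetical helper (not an MKL routine) encoding the ab/bb rules;
   k is ka for ab and kb for bb.
   Column-major: ld >= k + 1, array holds at least max(1, ld*n) elements.
   Row-major:    ld >= max(1, n), array holds at least max(1, ld*(k + 1)).
   Returns the minimum element count, or 0 if ld is too small. */
size_t band_min_elements(int row_major, int n, int k, int ld)
{
    size_t need;
    if (row_major) {
        if (ld < (n > 1 ? n : 1))
            return 0;
        need = (size_t)ld * (size_t)(k + 1);
    } else {
        if (ld < k + 1)
            return 0;
        need = (size_t)ld * (size_t)n;
    }
    return need > 1 ? need : 1;
}

/* Position of A(i,j) (0-based, uplo = 'U', max(0, j-k) <= i <= j) in the
   band array. Assumption: row-major band storage is the transpose of the
   column-major band storage. */
size_t upper_band_index(int row_major, int k, int ld, int i, int j)
{
    return row_major ? (size_t)(k + i - j) * (size_t)ld + (size_t)j
                     : (size_t)(k + i - j) + (size_t)j * (size_t)ld;
}
```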

Output Parameters

ab On exit, the contents of ab are overwritten.

bb On exit, contains the factor S from the split Cholesky factorization B = S^H*S, as returned by ?pbstf.

w Array, size at least max(1, n).

If info = 0, contains the eigenvalues in ascending order.

z Array z (size at least max(1, ldz*n)).

If jobz = 'V', then if info = 0, z contains the matrix Z of eigenvectors, with the i-th column of z holding the eigenvector associated with w[i - 1]. The eigenvectors are normalized so that Z^H*B*Z = I.

If jobz = 'N', then z is not referenced.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info > 0, and

if info = i ≤ n, the algorithm failed to converge, and i off-diagonal elements of an intermediate tridiagonal form did not converge to zero;
if info = n + i, for 1 ≤ i ≤ n, then ?pbstf returned info = i and B is not positive-definite. The factorization of B could not be completed and no eigenvalues or eigenvectors were computed.

?sbgvd
Computes all eigenvalues and, optionally,
eigenvectors of a real generalized symmetric definite
eigenproblem with banded matrices. If eigenvectors
are desired, it uses a divide and conquer method.

Syntax
lapack_int LAPACKE_ssbgvd (int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int ka, lapack_int kb, float* ab, lapack_int ldab, float* bb, lapack_int ldbb,
float* w, float* z, lapack_int ldz);
lapack_int LAPACKE_dsbgvd (int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int ka, lapack_int kb, double* ab, lapack_int ldab, double* bb, lapack_int ldbb,
double* w, double* z, lapack_int ldz);

Include Files
• mkl.h

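Description

The routine computes all the eigenvalues and, optionally, the eigenvectors of a real generalized symmetric-definite banded eigenproblem of the form A*x = λ*B*x. Here A and B are assumed to be symmetric and banded, and B is also positive definite.
If eigenvectors are desired, it uses a divide and conquer algorithm.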

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then compute eigenvalues only.

If jobz = 'V', then compute eigenvalues and eigenvectors.

uplo Must be 'U' or 'L'.

If uplo = 'U', arrays ab and bb store the upper triangles of A and B;

If uplo = 'L', arrays ab and bb store the lower triangles of A and B.

n The order of the matrices A and B (n≥ 0).

ka The number of super- or sub-diagonals in A (ka ≥ 0).

kb The number of super- or sub-diagonals in B (kb≥ 0).

ab, bb Arrays:
ab (size at least max(1, ldab*n) for column major layout and max(1, ldab*(ka + 1)) for row major layout) is an array containing either upper or lower triangular part of the symmetric matrix A (as specified by uplo) in band storage format.
bb (size at least max(1, ldbb*n) for column major layout and max(1, ldbb*(kb + 1)) for row major layout) is an array containing either upper or lower triangular part of the symmetric matrix B (as specified by uplo) in band storage format.

ldab The leading dimension of the array ab; must be at least ka+1 for column
major layout and at least max(1, n) for row major layout.


ldbb The leading dimension of the array bb; must be at least kb+1 for column
major layout and at least max(1, n) for row major layout.

ldz The leading dimension of the output array z; ldz≥ 1. If jobz = 'V', ldz≥
max(1, n).

Output Parameters

ab On exit, the contents of ab are overwritten.

bb On exit, contains the factor S from the split Cholesky factorization B = S^T*S, as returned by ?pbstf.

w, z Arrays:
w, size at least max(1, n).
If info = 0, contains the eigenvalues in ascending order.

z (size at least max(1, ldz*n)).

If jobz = 'V', then if info = 0, z contains the matrix Z of eigenvectors, with the i-th column of z holding the eigenvector associated with w[i - 1]. The eigenvectors are normalized so that Z^T*B*Z = I.
If jobz = 'N', then z is not referenced.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

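If info > 0, and

if info = i ≤ n, the algorithm failed to converge, and i off-diagonal elements of an intermediate tridiagonal form did not converge to zero;
if info = n + i, for 1 ≤ i ≤ n, then ?pbstf returned info = i and B is not positive-definite. The factorization of B could not be completed and no eigenvalues or eigenvectors were computed.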

?hbgvd
Computes all eigenvalues and, optionally,
eigenvectors of a complex generalized Hermitian
positive-definite eigenproblem with banded matrices.
If eigenvectors are desired, it uses a divide and
conquer method.

Syntax
lapack_int LAPACKE_chbgvd( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int ka, lapack_int kb, lapack_complex_float* ab, lapack_int ldab,
lapack_complex_float* bb, lapack_int ldbb, float* w, lapack_complex_float* z,
lapack_int ldz );
lapack_int LAPACKE_zhbgvd( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int ka, lapack_int kb, lapack_complex_double* ab, lapack_int ldab,
lapack_complex_double* bb, lapack_int ldbb, double* w, lapack_complex_double* z,
lapack_int ldz );

Include Files
• mkl.h

Description

The routine computes all the eigenvalues and, optionally, the eigenvectors of a complex generalized Hermitian positive-definite banded eigenproblem of the form A*x = λ*B*x. Here A and B are assumed to be Hermitian and banded, and B is also positive definite.
If eigenvectors are desired, it uses a divide and conquer algorithm.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then compute eigenvalues only.

If jobz = 'V', then compute eigenvalues and eigenvectors.

uplo Must be 'U' or 'L'.

If uplo = 'U', arrays ab and bb store the upper triangles of A and B;

If uplo = 'L', arrays ab and bb store the lower triangles of A and B.

n The order of the matrices A and B (n≥ 0).

ka The number of super- or sub-diagonals in A (ka ≥ 0).

kb The number of super- or sub-diagonals in B (kb≥ 0).

ab, bb Arrays:
ab(size at least max(1, ldab*n) for column major layout and max(1,
ldab*(ka + 1)) for row major layout) is an array containing either upper or
lower triangular part of the Hermitian matrix A (as specified by uplo) in
band storage format.
bb(size at least max(1, ldbb*n) for column major layout and max(1,
ldbb*(kb + 1)) for row major layout) is an array containing either upper or
lower triangular part of the Hermitian matrix B (as specified by uplo) in
band storage format.

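ldab The leading dimension of the array ab; must be at least ka+1 for column major layout and at least max(1, n) for row major layout.

ldbb The leading dimension of the array bb; must be at least kb+1 for column major layout and at least max(1, n) for row major layout.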

ldz The leading dimension of the output array z; ldz≥ 1. If jobz = 'V', ldz≥
max(1, n).

Output Parameters

ab On exit, the contents of ab are overwritten.


bb On exit, contains the factor S from the split Cholesky factorization B = S^H*S, as returned by ?pbstf.

w Array, size at least max(1, n).

If info = 0, contains the eigenvalues in ascending order.

z Array z (size at least max(1, ldz*n)).

If jobz = 'V', then if info = 0, z contains the matrix Z of eigenvectors, with the i-th column of z holding the eigenvector associated with w[i - 1]. The eigenvectors are normalized so that Z^H*B*Z = I.

If jobz = 'N', then z is not referenced.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

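If info > 0, and

if info = i ≤ n, the algorithm failed to converge, and i off-diagonal elements of an intermediate tridiagonal form did not converge to zero;
if info = n + i, for 1 ≤ i ≤ n, then ?pbstf returned info = i and B is not positive-definite. The factorization of B could not be completed and no eigenvalues or eigenvectors were computed.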

?sbgvx
Computes selected eigenvalues and, optionally,
eigenvectors of a real generalized symmetric definite
eigenproblem with banded matrices.

Syntax
lapack_int LAPACKE_ssbgvx (int matrix_layout, char jobz, char range, char uplo,
lapack_int n, lapack_int ka, lapack_int kb, float* ab, lapack_int ldab, float* bb,
lapack_int ldbb, float* q, lapack_int ldq, float vl, float vu, lapack_int il, lapack_int
iu, float abstol, lapack_int* m, float* w, float* z, lapack_int ldz, lapack_int* ifail);
lapack_int LAPACKE_dsbgvx (int matrix_layout, char jobz, char range, char uplo,
lapack_int n, lapack_int ka, lapack_int kb, double* ab, lapack_int ldab, double* bb,
lapack_int ldbb, double* q, lapack_int ldq, double vl, double vu, lapack_int il,
lapack_int iu, double abstol, lapack_int* m, double* w, double* z, lapack_int ldz,
lapack_int* ifail);

Include Files
• mkl.h

Description

The routine computes selected eigenvalues and, optionally, the eigenvectors of a real generalized symmetric-definite banded eigenproblem of the form A*x = λ*B*x. Here A and B are assumed to be symmetric and banded, and B is also positive definite. Eigenvalues and eigenvectors can be selected by specifying either all eigenvalues, a range of values, or a range of indices for the desired eigenvalues.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then compute eigenvalues only.

If jobz = 'V', then compute eigenvalues and eigenvectors.

range Must be 'A' or 'V' or 'I'.

If range = 'A', the routine computes all eigenvalues.

If range = 'V', the routine computes eigenvalues w[i] in the half-open interval vl < w[i] ≤ vu.
If range = 'I', the routine computes eigenvalues with indices il to iu.

uplo Must be 'U' or 'L'.

If uplo = 'U', arrays ab and bb store the upper triangles of A and B;

If uplo = 'L', arrays ab and bb store the lower triangles of A and B.

n The order of the matrices A and B (n≥ 0).

ka The number of super- or sub-diagonals in A (ka ≥ 0).

kb The number of super- or sub-diagonals in B (kb≥ 0).

ab, bb Arrays:
ab (size at least max(1, ldab*n) for column major layout and max(1, ldab*(ka + 1)) for row major layout) is an array containing either upper or lower triangular part of the symmetric matrix A (as specified by uplo) in band storage format.
bb (size at least max(1, ldbb*n) for column major layout and max(1, ldbb*(kb + 1)) for row major layout) is an array containing either upper or lower triangular part of the symmetric matrix B (as specified by uplo) in band storage format.

ldab The leading dimension of the array ab; must be at least ka+1 for column
major layout and at least max(1, n) for row major layout.

ldbb The leading dimension of the array bb; must be at least kb+1 for column
major layout and at least max(1, n) for row major layout.

vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
Constraint: vl< vu.

If range = 'A' or 'I', vl and vu are not referenced.

il, iu If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned.


Constraint: 1 ≤ il ≤ iu ≤ n, if n > 0; il = 1 and iu = 0 if n = 0.

If range = 'A' or 'V', il and iu are not referenced.

abstol The absolute error tolerance for the eigenvalues. See Application Notes for
more information.

ldz The leading dimension of the output array z; ldz≥ 1. If jobz = 'V', ldz≥
max(1, n).

ldq The leading dimension of the output array q; ldq ≥ 1. If jobz = 'V', ldq ≥ max(1, n).

Output Parameters

ab On exit, the contents of ab are overwritten.

bb On exit, contains the factor S from the split Cholesky factorization B = S^T*S, as returned by ?pbstf.

m The total number of eigenvalues found, 0 ≤ m ≤ n. If range = 'A', m = n, and if range = 'I', m = iu - il + 1.

w, z, q Arrays:
w, size at least max(1, n).
If info = 0, contains the eigenvalues in ascending order.

z (size max(1, ldz*m) for column major layout and max(1, ldz*n) for row major layout).
If jobz = 'V', then if info = 0, z contains the matrix Z of eigenvectors, with the i-th column of z holding the eigenvector associated with w[i - 1]. The eigenvectors are normalized so that Z^T*B*Z = I.

If jobz = 'N', then z is not referenced.

q (size max(1, ldq*n)).

If jobz = 'V', then q contains the n-by-n matrix used in the reduction of A*x = λ*B*x to standard form, that is, C*x = λ*x, and consequently C to tridiagonal form.
If jobz = 'N', then q is not referenced.

ifail Array, size m.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

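If info > 0, and

if info = i ≤ n, then i eigenvectors failed to converge. Their indices are stored in the array ifail;
if info = n + i, for 1 ≤ i ≤ n, then ?pbstf returned info = i and B is not positive-definite. The factorization of B could not be completed and no eigenvalues or eigenvectors were computed.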

If this routine returns with info > 0, indicating that some eigenvectors did not converge, try setting abstol
to 2*?lamch('S').

?hbgvx
Computes selected eigenvalues and, optionally,
eigenvectors of a complex generalized Hermitian
positive-definite eigenproblem with banded matrices.

Syntax
lapack_int LAPACKE_chbgvx( int matrix_layout, char jobz, char range, char uplo,
lapack_int n, lapack_int ka, lapack_int kb, lapack_complex_float* ab, lapack_int ldab,
lapack_complex_float* bb, lapack_int ldbb, lapack_complex_float* q, lapack_int ldq,
float vl, float vu, lapack_int il, lapack_int iu, float abstol, lapack_int* m, float* w,
lapack_complex_float* z, lapack_int ldz, lapack_int* ifail );
lapack_int LAPACKE_zhbgvx( int matrix_layout, char jobz, char range, char uplo,
lapack_int n, lapack_int ka, lapack_int kb, lapack_complex_double* ab, lapack_int ldab,
lapack_complex_double* bb, lapack_int ldbb, lapack_complex_double* q, lapack_int ldq,
double vl, double vu, lapack_int il, lapack_int iu, double abstol, lapack_int* m,
double* w, lapack_complex_double* z, lapack_int ldz, lapack_int* ifail );

Include Files
• mkl.h

Description

The routine computes selected eigenvalues and, optionally, the eigenvectors of a complex generalized Hermitian positive-definite banded eigenproblem of the form A*x = λ*B*x. Here A and B are assumed to be Hermitian and banded, and B is also positive definite. Eigenvalues and eigenvectors can be selected by specifying either all eigenvalues, a range of values, or a range of indices for the desired eigenvalues.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).


jobz Must be 'N' or 'V'.

If jobz = 'N', then compute eigenvalues only.

If jobz = 'V', then compute eigenvalues and eigenvectors.

range Must be 'A' or 'V' or 'I'.

If range = 'A', the routine computes all eigenvalues.

If range = 'V', the routine computes eigenvalues w[i] in the half-open interval vl < w[i] ≤ vu.
If range = 'I', the routine computes eigenvalues with indices il to iu.

uplo Must be 'U' or 'L'.

If uplo = 'U', arrays ab and bb store the upper triangles of A and B;

If uplo = 'L', arrays ab and bb store the lower triangles of A and B.

n The order of the matrices A and B (n≥ 0).

ka The number of super- or sub-diagonals in A (ka ≥ 0).

kb The number of super- or sub-diagonals in B (kb≥ 0).

ab, bb Arrays:
ab(size at least max(1, ldab*n) for column major layout and max(1,
ldab*(ka + 1)) for row major layout) is an array containing either upper or
lower triangular part of the Hermitian matrix A (as specified by uplo) in
band storage format.
bb(size at least max(1, ldbb*n) for column major layout and max(1,
ldbb*(kb + 1)) for row major layout) is an array containing either upper or
lower triangular part of the Hermitian matrix B (as specified by uplo) in
band storage format.

ldab The leading dimension of the array ab; must be at least ka+1 for column
major layout and at least max(1, n) for row major layout.

ldbb The leading dimension of the array bb; must be at least kb+1 for column
major layout and at least max(1, n) for row major layout.

vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
Constraint: vl< vu.

If range = 'A' or 'I', vl and vu are not referenced.

il, iu If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned.
Constraint: 1 ≤ il ≤ iu ≤ n, if n > 0; il = 1 and iu = 0 if n = 0.

If range = 'A' or 'V', il and iu are not referenced.

abstol The absolute error tolerance for the eigenvalues. See Application Notes for
more information.

ldz The leading dimension of the output array z; ldz≥ 1. If jobz = 'V', ldz≥
max(1, n) for column major layout and at least max(1, m) for row major
layout.

ldq The leading dimension of the output array q; ldq≥ 1. If jobz = 'V', ldq≥
max(1, n).

Output Parameters

ab On exit, the contents of ab are overwritten.

bb On exit, contains the factor S from the split Cholesky factorization B = S^H*S, as returned by ?pbstf.

m The total number of eigenvalues found, 0 ≤ m ≤ n. If range = 'A', m = n, and if range = 'I', m = iu - il + 1.

w Array w, size at least max(1, n).

If info = 0, contains the eigenvalues in ascending order.

z, q Arrays:
z (size max(1, ldz*m) for column major layout and max(1, ldz*n) for row major layout).
If jobz = 'V', then if info = 0, z contains the matrix Z of eigenvectors, with the i-th column of z holding the eigenvector associated with w[i - 1]. The eigenvectors are normalized so that Z^H*B*Z = I.
If jobz = 'N', then z is not referenced.

q (size max(1, ldq*n)).

If jobz = 'V', then q contains the n-by-n matrix used in the reduction of
Ax = λBx to standard form, that is, Cx = λx and consequently C to
tridiagonal form.
If jobz = 'N', then q is not referenced.

ifail Array, size at least max(1, n).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.


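If info > 0, and

if info = i ≤ n, then i eigenvectors failed to converge. Their indices are stored in the array ifail;
if info = n + i, for 1 ≤ i ≤ n, then ?pbstf returned info = i and B is not positive-definite. The factorization of B could not be completed and no eigenvalues or eigenvectors were computed.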

If this routine returns with info > 0, indicating that some eigenvectors did not converge, try setting abstol
to 2*?lamch('S').

Generalized Nonsymmetric Eigenvalue Problems: LAPACK Driver Routines

This topic describes LAPACK driver routines used for solving generalized nonsymmetric eigenproblems. See
also computational routines that can be called to solve these problems. Table "Driver Routines for Solving
Generalized Nonsymmetric Eigenproblems" lists all such driver routines.
Driver Routines for Solving Generalized Nonsymmetric Eigenproblems
Routine Name Operation performed

gges Computes the generalized eigenvalues, Schur form, and the left and/or right Schur
vectors for a pair of nonsymmetric matrices.

ggesx Computes the generalized eigenvalues, Schur form, and, optionally, the left and/or
right matrices of Schur vectors.

gges3 Computes generalized Schur factorization for a pair of matrices.

ggev Computes the generalized eigenvalues, and the left and/or right generalized
eigenvectors for a pair of nonsymmetric matrices.

ggevx Computes the generalized eigenvalues, and, optionally, the left and/or right
generalized eigenvectors.

ggev3 Computes the generalized eigenvalues, and the left and/or right generalized eigenvectors for a pair of matrices.

?gges
Computes the generalized eigenvalues, Schur form,
and the left and/or right Schur vectors for a pair of
nonsymmetric matrices.

Syntax
lapack_int LAPACKE_sgges( int matrix_layout, char jobvsl, char jobvsr, char sort,
LAPACK_S_SELECT3 select, lapack_int n, float* a, lapack_int lda, float* b, lapack_int
ldb, lapack_int* sdim, float* alphar, float* alphai, float* beta, float* vsl, lapack_int
ldvsl, float* vsr, lapack_int ldvsr );
lapack_int LAPACKE_dgges( int matrix_layout, char jobvsl, char jobvsr, char sort,
LAPACK_D_SELECT3 select, lapack_int n, double* a, lapack_int lda, double* b, lapack_int
ldb, lapack_int* sdim, double* alphar, double* alphai, double* beta, double* vsl,
lapack_int ldvsl, double* vsr, lapack_int ldvsr );

lapack_int LAPACKE_cgges( int matrix_layout, char jobvsl, char jobvsr, char sort,
LAPACK_C_SELECT2 select, lapack_int n, lapack_complex_float* a, lapack_int lda,
lapack_complex_float* b, lapack_int ldb, lapack_int* sdim, lapack_complex_float* alpha,
lapack_complex_float* beta, lapack_complex_float* vsl, lapack_int ldvsl,
lapack_complex_float* vsr, lapack_int ldvsr );
lapack_int LAPACKE_zgges( int matrix_layout, char jobvsl, char jobvsr, char sort,
LAPACK_Z_SELECT2 select, lapack_int n, lapack_complex_double* a, lapack_int lda,
lapack_complex_double* b, lapack_int ldb, lapack_int* sdim, lapack_complex_double*
alpha, lapack_complex_double* beta, lapack_complex_double* vsl, lapack_int ldvsl,
lapack_complex_double* vsr, lapack_int ldvsr );

Include Files
• mkl.h

Description

The ?gges routine computes the generalized eigenvalues, the generalized real/complex Schur form (S,T), and, optionally, the left and/or right matrices of Schur vectors (vsl and vsr) for a pair of n-by-n real/complex nonsymmetric matrices (A,B). This gives the generalized Schur factorization
(A,B) = (vsl*S*vsr^H, vsl*T*vsr^H)
Optionally, it also orders the eigenvalues so that a selected cluster of eigenvalues appears in the leading
diagonal blocks of the upper quasi-triangular matrix S and the upper triangular matrix T. The leading
columns of vsl and vsr then form an orthonormal/unitary basis for the corresponding left and right
eigenspaces (deflating subspaces).
If only the generalized eigenvalues are needed, use the driver ggev instead, which is faster.
A generalized eigenvalue for a pair of matrices (A,B) is a scalar w or a ratio alpha / beta = w, such that A -
w*B is singular. It is usually represented as the pair (alpha, beta), as there is a reasonable interpretation
for beta=0 or for both being zero. A pair of matrices (S,T) is in the generalized real Schur form if T is upper
triangular with non-negative diagonal and S is block upper triangular with 1-by-1 and 2-by-2 blocks. 1-by-1
blocks correspond to real generalized eigenvalues, while 2-by-2 blocks of S are "standardized" by making the
corresponding elements of T have the form:

[ a 0 ]
[ 0 b ]

and the pair of corresponding 2-by-2 blocks in S and T will have a complex conjugate pair of generalized eigenvalues. A pair of matrices (S,T) is in generalized complex Schur form if S and T are upper triangular and, in addition, the diagonal elements of T are non-negative real numbers.
The ?gges routine replaces the deprecated ?gegs routine.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

jobvsl Must be 'N' or 'V'.

If jobvsl = 'N', then the left Schur vectors are not computed.

If jobvsl = 'V', then the left Schur vectors are computed.

jobvsr Must be 'N' or 'V'.

If jobvsr = 'N', then the right Schur vectors are not computed.

If jobvsr = 'V', then the right Schur vectors are computed.

sort Must be 'N' or 'S'. Specifies whether or not to order the eigenvalues on
the diagonal of the generalized Schur form.

If sort = 'N', then eigenvalues are not ordered.

If sort = 'S', eigenvalues are ordered (see select).

select The select parameter is a pointer to a function returning a value of lapack_logical type. For different flavors the function has different arguments:
LAPACKE_sgges: lapack_logical (*LAPACK_S_SELECT3) ( const
float*, const float*, const float* );
LAPACKE_dgges: lapack_logical (*LAPACK_D_SELECT3) ( const
double*, const double*, const double* );
LAPACKE_cgges: lapack_logical (*LAPACK_C_SELECT2) ( const
lapack_complex_float*, const lapack_complex_float* );
LAPACKE_zgges: lapack_logical (*LAPACK_Z_SELECT2) ( const
lapack_complex_double*, const lapack_complex_double* );
If sort = 'S', select is used to select eigenvalues to sort to the top left
of the Schur form.
If sort = 'N', select is not referenced.

For real flavors:

An eigenvalue (alphar[j] + alphai[j]*i)/beta[j] is selected if select(alphar[j], alphai[j], beta[j]) is true; that is, if either one of a complex conjugate pair of eigenvalues is selected, then both complex eigenvalues are selected.
Note that in the ill-conditioned case, a selected complex eigenvalue may no longer satisfy select(alphar[j], alphai[j], beta[j]) = 1 after ordering. In this case info is set to n+2.
For complex flavors:
An eigenvalue alpha[j] / beta[j] is selected if select(alpha[j], beta[j])
is true.
Note that a selected complex eigenvalue may no longer satisfy
select(alpha[j], beta[j]) = 1 after ordering, since ordering may
change the value of complex eigenvalues (especially if the eigenvalue is ill-
conditioned); in this case info is set to n+2 (see info below).

n The order of the matrices A, B, vsl, and vsr (n≥ 0).

a, b Arrays:
a (size at least max(1, lda*n)) is an array containing the n-by-n matrix A
(first of the pair of matrices).
b (size at least max(1, ldb*n)) is an array containing the n-by-n matrix B
(second of the pair of matrices).

lda The leading dimension of the array a. Must be at least max(1, n).

ldb The leading dimension of the array b. Must be at least max(1, n).

ldvsl, ldvsr The leading dimensions of the output matrices vsl and vsr, respectively.
Constraints:
ldvsl≥ 1. If jobvsl = 'V', ldvsl≥ max(1, n).


ldvsr≥ 1. If jobvsr = 'V', ldvsr≥ max(1, n).
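As an illustration of the select argument for the real flavors, here is a hypothetical callback that picks eigenvalues with a strictly negative real part. It takes the same argument list as LAPACK_D_SELECT3. This standalone sketch returns int; code built against mkl.h should declare the return type as lapack_logical. The sign of the real part, alphar/beta, is decided from the signs of alphar and beta, so no quotient is formed. An infinite eigenvalue (beta == 0) is never selected.

```c
/* Hypothetical selection callback (not an MKL routine) with the argument
   list of LAPACK_D_SELECT3: pointers to alphar[j], alphai[j], beta[j].
   Returns nonzero when (alphar + alphai*i)/beta has a strictly negative
   real part, i.e. alphar and beta are nonzero with opposite signs. */
int select_negative_real_part(const double *alphar, const double *alphai,
                              const double *beta)
{
    (void)alphai; /* the real part does not depend on alphai */
    return (*alphar < 0.0 && *beta > 0.0) || (*alphar > 0.0 && *beta < 0.0);
}
```

With sort = 'S', a function like this would be passed as the select argument of LAPACKE_dgges. The eigenvalues it selects are then moved to the top left of the Schur form, and sdim reports how many there are.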

Output Parameters

a On exit, this array has been overwritten by its generalized Schur form S.

b On exit, this array has been overwritten by its generalized Schur form T.

sdim If sort = 'N', sdim = 0.

If sort = 'S', sdim is equal to the number of eigenvalues (after sorting) for which select is true.

Note that for real flavors complex conjugate pairs for which select is true
for either eigenvalue count as 2.

alphar, alphai Arrays, size at least max(1, n) each. Contain values that form generalized
eigenvalues in real flavors.
See beta.

alpha Array, size at least max(1, n). Contain values that form generalized
eigenvalues in complex flavors. See beta.

beta Array, size at least max(1, n).

For real flavors:
On exit, (alphar[j] + alphai[j]*i)/beta[j], j=0,..., n - 1, will be the
generalized eigenvalues.
alphar[j] + alphai[j]*i and beta[j], j=0,..., n - 1 are the diagonals of the
complex Schur form (S,T) that would result if the 2-by-2 diagonal blocks of
the real generalized Schur form of (A,B) were further reduced to triangular
form using complex unitary transformations. If alphai[j] is zero, then the j-
th eigenvalue is real; if positive, then the j-th and (j+1)-st eigenvalues are
a complex conjugate pair, with alphai[j+1] negative.
For complex flavors:
On exit, alpha[j]/beta[j], j=0,..., n - 1, will be the generalized eigenvalues.
alpha[j] and beta[j], j=0,..., n - 1 are the diagonals of the complex Schur
form (S,T) output by cgges/zgges. Each beta[j] is a non-negative real number.

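vsl, vsr Arrays:
vsl (size at least max(1, ldvsl*n)).

If jobvsl = 'V', this array will contain the left Schur vectors.

If jobvsl = 'N', vsl is not referenced.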

vsr (size at least max(1, ldvsr*n)).

If jobvsr = 'V', this array will contain the right Schur vectors.

If jobvsr = 'N', vsr is not referenced.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, and

i ≤ n: the QZ iteration failed. (A, B) is not in Schur form, but alphar[j], alphai[j] (for real flavors), or alpha[j] (for complex flavors), and beta[j], j = info,..., n - 1 should be correct.

i > n: errors that usually indicate LAPACK problems:

i = n+1: an error other than a QZ iteration failure occurred in ?hgeqz;
i = n+2: after reordering, roundoff changed the values of some complex eigenvalues so that the leading eigenvalues in the generalized Schur form no longer satisfy select = 1. This could also be caused by scaling;
i = n+3: reordering failed in ?tgsen.

Application Notes
The quotients alphar[j]/beta[j] and alphai[j]/beta[j] may easily over- or underflow, and beta[j] may even be
zero. Thus, you should avoid simply computing the ratio. However, alphar and alphai are always less than
and usually comparable with norm(A) in magnitude, and beta is always less than and usually comparable with
norm(B).
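Because the quotient may overflow or beta[j] may be zero, one way to follow this advice is to classify each eigenvalue before dividing. The sketch below is a minimal illustration under stated assumptions: it takes the real-flavor outputs alphar[j], alphai[j], and beta[j] of a single eigenvalue, and the names gev_classify and gev_kind are hypothetical (they are not part of Intel MKL). A quotient that would overflow is reported as infinite rather than computed.

```c
#include <float.h>
#include <math.h>

/* Hypothetical classification of one generalized eigenvalue (not an MKL type). */
typedef enum { GEV_FINITE, GEV_INFINITE, GEV_INDETERMINATE } gev_kind;

/* Classify (alphar + alphai*i)/beta without dividing blindly.
   On GEV_FINITE, *re and *im receive the eigenvalue; otherwise they are untouched. */
gev_kind gev_classify(double alphar, double alphai, double beta,
                      double *re, double *im)
{
    double ar = fabs(alphar), ai = fabs(alphai), b = fabs(beta);
    double amax = ar > ai ? ar : ai;    /* largest component of alpha */

    if (b == 0.0)                       /* beta = 0: infinite, or (0,0) indeterminate */
        return amax == 0.0 ? GEV_INDETERMINATE : GEV_INFINITE;

    /* For b < 1, b * DBL_MAX cannot overflow; if amax exceeds it,
       amax / b would overflow, so report the eigenvalue as infinite. */
    if (b < 1.0 && amax > b * DBL_MAX)
        return GEV_INFINITE;

    *re = alphar / beta;
    *im = alphai / beta;
    return GEV_FINITE;
}
```

For complex flavors, the same test can be applied to the real and imaginary parts of alpha[j] with the real-valued beta[j].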

?ggesx
Computes the generalized eigenvalues, Schur form,
and, optionally, the left and/or right matrices of Schur
vectors.

Syntax
lapack_int LAPACKE_sggesx( int matrix_layout, char jobvsl, char jobvsr, char sort,
LAPACK_S_SELECT3 select, char sense, lapack_int n, float* a, lapack_int lda, float* b,
lapack_int ldb, lapack_int* sdim, float* alphar, float* alphai, float* beta, float* vsl,
lapack_int ldvsl, float* vsr, lapack_int ldvsr, float* rconde, float* rcondv );
lapack_int LAPACKE_dggesx( int matrix_layout, char jobvsl, char jobvsr, char sort,
LAPACK_D_SELECT3 select, char sense, lapack_int n, double* a, lapack_int lda, double* b,
lapack_int ldb, lapack_int* sdim, double* alphar, double* alphai, double* beta, double*
vsl, lapack_int ldvsl, double* vsr, lapack_int ldvsr, double* rconde, double* rcondv );
lapack_int LAPACKE_cggesx( int matrix_layout, char jobvsl, char jobvsr, char sort,
LAPACK_C_SELECT2 select, char sense, lapack_int n, lapack_complex_float* a, lapack_int
lda, lapack_complex_float* b, lapack_int ldb, lapack_int* sdim, lapack_complex_float*
alpha, lapack_complex_float* beta, lapack_complex_float* vsl, lapack_int ldvsl,
lapack_complex_float* vsr, lapack_int ldvsr, float* rconde, float* rcondv );
lapack_int LAPACKE_zggesx( int matrix_layout, char jobvsl, char jobvsr, char sort,
LAPACK_Z_SELECT2 select, char sense, lapack_int n, lapack_complex_double* a, lapack_int
lda, lapack_complex_double* b, lapack_int ldb, lapack_int* sdim, lapack_complex_double*
alpha, lapack_complex_double* beta, lapack_complex_double* vsl, lapack_int ldvsl,
lapack_complex_double* vsr, lapack_int ldvsr, double* rconde, double* rcondv );

Include Files
• mkl.h


Description

For a pair of n-by-n real/complex nonsymmetric matrices (A,B), the routine computes the generalized
eigenvalues, the generalized real/complex Schur form (S,T), and, optionally, the left and/or right matrices of
Schur vectors (vsl and vsr). This gives the generalized Schur factorization
$(A,B) = (vsl \cdot S \cdot vsr^H,\; vsl \cdot T \cdot vsr^H)$
where $vsr^H$ is the conjugate transpose of vsr (for real flavors, simply its transpose).
Optionally, it also orders the eigenvalues so that a selected cluster of eigenvalues appears in the leading
diagonal blocks of the upper quasi-triangular matrix S and the upper triangular matrix T; computes a
reciprocal condition number for the average of the selected eigenvalues (rconde); and computes a reciprocal
condition number for the right and left deflating subspaces corresponding to the selected eigenvalues
(rcondv). The leading columns of vsl and vsr then form an orthonormal/unitary basis for the corresponding
left and right eigenspaces (deflating subspaces).
A generalized eigenvalue for a pair of matrices (A,B) is a scalar w or a ratio alpha / beta = w, such that
A - w*B is singular. It is usually represented as the pair (alpha, beta), as there is a reasonable interpretation
for beta=0 or for both being zero. A pair of matrices (S,T) is in generalized real Schur form if T is upper
triangular with non-negative diagonal and S is block upper triangular with 1-by-1 and 2-by-2 blocks. 1-by-1
blocks correspond to real generalized eigenvalues, while 2-by-2 blocks of S are "standardized" by making
the corresponding elements of T have the form:
$\begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}$
and the pair of corresponding 2-by-2 blocks in S and T has a complex conjugate pair of generalized
eigenvalues. A pair of matrices (S,T) is in generalized complex Schur form if S and T are upper triangular
and, in addition, the diagonal elements of T are non-negative real numbers.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

jobvsl Must be 'N' or 'V'.

If jobvsl = 'N', then the left Schur vectors are not computed.

If jobvsl = 'V', then the left Schur vectors are computed.

jobvsr Must be 'N' or 'V'.

If jobvsr = 'N', then the right Schur vectors are not computed.

If jobvsr = 'V', then the right Schur vectors are computed.

sort Must be 'N' or 'S'. Specifies whether or not to order the eigenvalues on
the diagonal of the generalized Schur form.
If sort = 'N', then eigenvalues are not ordered.

If sort = 'S', eigenvalues are ordered (see select).

select The select parameter is a pointer to a function returning a value of
lapack_logical type. For different flavors the function has different
arguments:
LAPACKE_sggesx: lapack_logical (*LAPACK_S_SELECT3) ( const
float*, const float*, const float* );
LAPACKE_dggesx: lapack_logical (*LAPACK_D_SELECT3) ( const
double*, const double*, const double* );
LAPACKE_cggesx: lapack_logical (*LAPACK_C_SELECT2) ( const
lapack_complex_float*, const lapack_complex_float* );
LAPACKE_zggesx: lapack_logical (*LAPACK_Z_SELECT2) ( const
lapack_complex_double*, const lapack_complex_double* );
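If sort = 'S', select is used to select eigenvalues to sort to the top left
of the Schur form.
If sort = 'N', select is not referenced.

For real flavors:

An eigenvalue (alphar[j] + alphai[j]*i)/beta[j] is selected if
select(alphar[j], alphai[j], beta[j]) is true; that is, if either one of a
complex conjugate pair of eigenvalues is selected, then both complex
eigenvalues are selected.

For complex flavors:

An eigenvalue alpha[j]/beta[j] is selected if select(alpha[j], beta[j]) is
true.

Note that a selected complex eigenvalue may no longer satisfy
select(alpha[j], beta[j]) = 1 after ordering, since ordering may
change the value of complex eigenvalues (especially if the eigenvalue is
ill-conditioned); in this case info is set to n+2 (see Return Values below).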

sense Must be 'N', 'E', 'V', or 'B'. Determines which reciprocal condition
numbers are computed.
If sense = 'N', none are computed;

If sense = 'E', computed for average of selected eigenvalues only;

If sense = 'V', computed for selected deflating subspaces only;

If sense = 'B', computed for both.

If sense is 'E', 'V', or 'B', then sort must equal 'S'.

n The order of the matrices A, B, vsl, and vsr (n≥ 0).

lda The leading dimension of the array a.

Must be at least max(1, n).

ldb The leading dimension of the array b.

Must be at least max(1, n).

ldvsl, ldvsr The leading dimensions of the output matrices vsl and vsr, respectively.
Constraints:
ldvsl≥ 1. If jobvsl = 'V', ldvsl≥ max(1, n).
ldvsr≥ 1. If jobvsr = 'V', ldvsr≥ max(1, n).
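To illustrate the select argument used with sort = 'S', here is a minimal sketch of a real double-precision callback with the LAPACK_D_SELECT3 argument list. It selects eigenvalues with negative real part and decides from signs alone, so it never divides by beta[j]. The function name select_left_half_plane is hypothetical, and the fallback definition of lapack_logical as int is an assumption for a standalone LP64 build; when mkl.h is included first, its own definition is used.

```c
/* Assumption: standalone LP64 build. mkl.h normally defines lapack_logical. */
#ifndef lapack_logical
#define lapack_logical int
#endif

/* Hypothetical select callback (real double flavor): selects
   lambda = (alphar + alphai*i)/beta when Re(lambda) < 0.
   Re(lambda) = alphar/beta is negative exactly when alphar and beta have
   opposite signs, so no division is performed. beta == 0 (an infinite
   eigenvalue) is never selected. */
lapack_logical select_left_half_plane(const double *alphar,
                                      const double *alphai,
                                      const double *beta)
{
    (void)alphai;  /* the imaginary part does not affect Re(lambda) */
    return (*alphar < 0.0 && *beta > 0.0) || (*alphar > 0.0 && *beta < 0.0);
}
```

Such a function would be passed as select together with sort = 'S'; on return, sdim holds the number of selected eigenvalues, with a selected complex conjugate pair counting as 2.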

Output Parameters

a On exit, this array has been overwritten by its generalized Schur form S.

b On exit, this array has been overwritten by its generalized Schur form T.

sdim If sort = 'N', sdim = 0.

If sort = 'S', sdim is equal to the number of eigenvalues (after sorting) for which select is true.

Note that for real flavors, a complex conjugate pair for which select is true for either eigenvalue counts as 2.

alphar, alphai Arrays, size at least max(1, n) each. Contain values that form generalized
eigenvalues in real flavors.
See beta.

alpha Array, size at least max(1, n). Contains values that form generalized
eigenvalues in complex flavors. See beta.

beta Array, size at least max(1, n).

For real flavors:
On exit, (alphar[j] + alphai[j]*i)/beta[j], j=0,..., n - 1 will be the
generalized eigenvalues.
alphar[j] + alphai[j]*i and beta[j], j=0,..., n - 1 are the diagonals of the
complex Schur form (S,T) that would result if the 2-by-2 diagonal blocks of
the real generalized Schur form of (A,B) were further reduced to triangular
form using complex unitary transformations. If alphai[j] is zero, then the
j-th eigenvalue is real; if positive, then the j-th and (j+1)-st eigenvalues are
a complex conjugate pair, with alphai[j+1] negative.
For complex flavors:
On exit, alpha[j]/beta[j], j=0,..., n - 1 will be the generalized eigenvalues.
alpha[j] and beta[j], j=0,..., n - 1 are the diagonals of the complex Schur
form (S,T) output by cggesx/zggesx. Each beta[j] is a non-negative real number.

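vsl, vsr Arrays:
vsl (size at least max(1, ldvsl*n)).

If jobvsl = 'V', this array will contain the left Schur vectors.

If jobvsl = 'N', vsl is not referenced.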

vsr (size at least max(1, ldvsr*n)).

If jobvsr = 'V', this array will contain the right Schur vectors.

If jobvsr = 'N', vsr is not referenced.

rconde, rcondv Arrays, size 2 each.

If sense = 'E' or 'B', rconde[0] and rconde[1] contain the reciprocal
condition numbers for the average of the selected eigenvalues.
Not referenced if sense = 'N' or 'V'.

If sense = 'V' or 'B', rcondv[0] and rcondv[1] contain the reciprocal
condition numbers for the selected deflating subspaces.
Not referenced if sense = 'N' or 'E'.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, and

i ≤ n: the QZ iteration failed. (A, B) is not in Schur form, but alphar[j], alphai[j] (for real flavors), or alpha[j] (for complex flavors), and beta[j], j = info,..., n - 1 should be correct.

i > n: errors that usually indicate LAPACK problems:

i = n+1: an error other than a QZ iteration failure occurred in ?hgeqz;

i = n+2: after reordering, roundoff changed the values of some complex eigenvalues so that the leading eigenvalues in the generalized Schur form no longer satisfy select = 1. This could also be caused by scaling;
i = n+3: reordering failed in ?tgsen.

?gges3
Computes generalized Schur factorization for a pair of
matrices.

Syntax
lapack_int LAPACKE_sgges3 (int matrix_layout, char jobvsl, char jobvsr, char sort,
LAPACK_S_SELECT3 selctg, lapack_int n, float * a, lapack_int lda, float * b, lapack_int
ldb, lapack_int * sdim, float * alphar, float * alphai, float * beta, float * vsl,
lapack_int ldvsl, float * vsr, lapack_int ldvsr);
lapack_int LAPACKE_dgges3 (int matrix_layout, char jobvsl, char jobvsr, char sort,
LAPACK_D_SELECT3 selctg, lapack_int n, double * a, lapack_int lda, double * b,
lapack_int ldb, lapack_int * sdim, double * alphar, double * alphai, double * beta,
double * vsl, lapack_int ldvsl, double * vsr, lapack_int ldvsr);
lapack_int LAPACKE_cgges3 (int matrix_layout, char jobvsl, char jobvsr, char sort,
LAPACK_C_SELECT2 selctg, lapack_int n, lapack_complex_float * a, lapack_int lda,
lapack_complex_float * b, lapack_int ldb, lapack_int * sdim, lapack_complex_float *
alpha, lapack_complex_float * beta, lapack_complex_float * vsl, lapack_int ldvsl,
lapack_complex_float * vsr, lapack_int ldvsr);
lapack_int LAPACKE_zgges3 (int matrix_layout, char jobvsl, char jobvsr, char sort,
LAPACK_Z_SELECT2 selctg, lapack_int n, lapack_complex_double * a, lapack_int lda,
lapack_complex_double * b, lapack_int ldb, lapack_int * sdim, lapack_complex_double *
alpha, lapack_complex_double * beta, lapack_complex_double * vsl, lapack_int ldvsl,
lapack_complex_double * vsr, lapack_int ldvsr);

Include Files
• mkl.h

Description
For a pair of n-by-n real or complex nonsymmetric matrices (A,B), ?gges3 computes the generalized
eigenvalues, the generalized real or complex Schur form (S,T), and optionally the left or right matrices of
Schur vectors (VSL and VSR). This gives the generalized Schur factorization
$(A,B) = (VSL \cdot S \cdot VSR^T,\; VSL \cdot T \cdot VSR^T)$ for real (A,B)
or
$(A,B) = (VSL \cdot S \cdot VSR^H,\; VSL \cdot T \cdot VSR^H)$ for complex (A,B)
where $VSR^H$ is the conjugate transpose of VSR.

Optionally, it also orders the eigenvalues so that a selected cluster of eigenvalues appears in the leading
diagonal blocks of the upper quasi-triangular matrix S and the upper triangular matrix T. The leading
columns of VSL and VSR then form an orthonormal basis for the corresponding left and right eigenspaces
(deflating subspaces).

NOTE
If only the generalized eigenvalues are needed, use the driver ?ggev instead, which is faster.

A generalized eigenvalue for a pair of matrices (A,B) is a scalar w or a ratio alpha/beta = w, such that A -
w*B is singular. It is usually represented as the pair (alpha,beta), as there is a reasonable interpretation for
beta=0 or both being zero.
For real flavors:
A pair of matrices (S,T) is in generalized real Schur form if T is upper triangular with non-negative diagonal
and S is block upper triangular with 1-by-1 and 2-by-2 blocks. 1-by-1 blocks correspond to real generalized
eigenvalues, while 2-by-2 blocks of S are "standardized" by making the corresponding elements of T have
the form:
$\begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}$
and the pair of corresponding 2-by-2 blocks in S and T have a complex conjugate pair of generalized
eigenvalues.
For complex flavors:
A pair of matrices (S,T) is in generalized complex Schur form if S and T are upper triangular and, in addition,
the diagonal elements of T are non-negative real numbers.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

jobvsl = 'N': do not compute the left Schur vectors;
= 'V': compute the left Schur vectors.

jobvsr = 'N': do not compute the right Schur vectors;
= 'V': compute the right Schur vectors.

sort Specifies whether or not to order the eigenvalues on the diagonal of the
generalized Schur form.
= 'N': Eigenvalues are not ordered;
= 'S': Eigenvalues are ordered (see selctg).

selctg selctg is a pointer to a function of three arguments for real flavors or two arguments
for complex flavors that returns lapack_logical. If sort = 'N', selctg is not referenced. If sort = 'S',
selctg is used to select eigenvalues to sort to the top left of the Schur form.
For real flavors:
An eigenvalue (alphar[j - 1] + alphai[j - 1]*i)/beta[j - 1] is
selected if selctg(alphar[j - 1],alphai[j - 1],beta[j - 1]) is true.
That is, if either one of a complex conjugate pair of eigenvalues is
selected, then both complex eigenvalues are selected.

Note that in the ill-conditioned case, a selected complex eigenvalue may no
longer satisfy selctg(alphar[j - 1],alphai[j - 1], beta[j - 1]) ≠ 0
after ordering. In this case, info is set to n+2.

For complex flavors:

An eigenvalue alpha[j - 1]/beta[j - 1] is selected if selctg(alpha[j
- 1],beta[j - 1]) is true.
Note that a selected complex eigenvalue may no longer satisfy
selctg(alpha[j - 1],beta[j - 1])≠ 0 after ordering, since ordering
may change the value of complex eigenvalues (especially if the eigenvalue
is ill-conditioned), in this case ?gges3 returns n + 2.

n The order of the matrices A, B, VSL, and VSR. n≥ 0.

a Array, size (lda*n). On entry, the first of the pair of matrices.

lda The leading dimension of a. lda≥ max(1,n).

b Array, size (ldb*n). On entry, the second of the pair of matrices.

ldb The leading dimension of b. ldb≥ max(1,n).

ldvsl The leading dimension of the matrix VSL. ldvsl≥ 1, and if jobvsl = 'V',
ldvsl≥ n.

ldvsr The leading dimension of the matrix VSR. ldvsr≥ 1, and if jobvsr = 'V',
ldvsr≥ n.

Output Parameters

a On exit, a is overwritten by its generalized Schur form S.

b On exit, b is overwritten by its generalized Schur form T.

sdim If sort = 'N', sdim = 0. If sort = 'S', sdim = number of eigenvalues
(after sorting) for which selctg is true.

alpha Array, size (n).

alphar Array, size (n).

alphai Array, size (n).

beta Array, size (n).

For real flavors:

On exit, (alphar[j - 1] + alphai[j - 1]*i)/beta[j - 1],
j=1,...,n, are the generalized eigenvalues. alphar[j - 1] +
alphai[j - 1]*i, and beta[j - 1],j=1,...,n are the diagonals of
the complex Schur form (S,T) that would result if the 2-by-2 diagonal
blocks of the real Schur form of (a,b) were further reduced to
triangular form using 2-by-2 complex unitary transformations. If
alphai[j - 1] is zero, then the j-th eigenvalue is real; if positive,
then the j-th and (j+1)-st eigenvalues are a complex conjugate pair,
with alphai[j] negative.

Note: the quotients alphar[j - 1]/beta[j - 1] and alphai[j - 1]/beta[j - 1]
can easily over- or underflow, and beta[j - 1] might even be zero. Thus,
you should avoid computing the ratio alpha/beta by simply dividing alpha
by beta. However, alphar and alphai are always less than and usually
comparable with norm(a) in magnitude, and beta is always less than and
usually comparable with norm(b).

For complex flavors:

On exit, alpha[j - 1]/beta[j - 1], j=1,...,n, are the
generalized eigenvalues. alpha[j - 1], j=1,...,n and beta[j - 1],
j=1,...,n are the diagonals of the complex Schur form (a,b) output
by ?gges3. Each beta[j - 1] is a non-negative real number.

Note: the quotient alpha[j - 1]/beta[j - 1] can easily over- or
underflow, and beta[j - 1] might even be zero. Thus, you should
avoid computing the ratio alpha/beta by simply dividing alpha by
beta. However, alpha is always less than and usually comparable
with norm(a) in magnitude, and beta is always less than and usually
comparable with norm(b).

vsl Array, size (ldvsl*n).

If jobvsl = 'V', vsl contains the left Schur vectors. Not referenced if
jobvsl = 'N'.

vsr Array, size (ldvsr*n).

If jobvsr = 'V', vsr contains the right Schur vectors. Not referenced
if jobvsr = 'N'.

Return Values
This function returns a value info.

= 0: successful exit.

< 0: if info = -i, the i-th argument had an illegal value.

= 1,...,n:

for real flavors:

The QZ iteration failed. (a,b) are not in Schur form, but alphar[j], alphai[j], and beta[j] should be
correct for j=info,...,n - 1.

for complex flavors:

The QZ iteration failed. (a,b) are not in Schur form, but alpha[j] and beta[j] should be correct for
j=info,...,n - 1.

> n:

= n+1: an error other than a QZ iteration failure occurred in ?hgeqz.

= n+2: after reordering, roundoff changed the values of some complex eigenvalues so that the leading eigenvalues in
the generalized Schur form no longer satisfy selctg ≠ 0. This could also be caused by scaling.

= n+3: reordering failed in ?tgsen.
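The info ranges above can be decoded mechanically after the call returns. The sketch below maps info and n to a status value; the enum gges_status and the function gges_decode_info are hypothetical (not MKL identifiers), and long long is used only as an assumption so that both 32-bit and 64-bit lapack_int values fit. The same ranges are documented above for ?gges and ?ggesx.

```c
/* Hypothetical status codes for decoding info (not MKL identifiers). */
typedef enum {
    GGES_OK,              /* info = 0 */
    GGES_ILLEGAL_ARG,     /* info = -i: the i-th argument had an illegal value */
    GGES_QZ_FAILED,       /* 1 <= info <= n: QZ iteration failed */
    GGES_HGEQZ_ERROR,     /* info = n+1: other error in ?hgeqz */
    GGES_REORDER_CHANGED, /* info = n+2: reordered eigenvalues fail the selection test */
    GGES_TGSEN_FAILED,    /* info = n+3: reordering failed in ?tgsen */
    GGES_UNKNOWN          /* any other value */
} gges_status;

/* Map the info returned for a problem of order n to a gges_status. */
gges_status gges_decode_info(long long info, long long n)
{
    if (info == 0)     return GGES_OK;
    if (info < 0)      return GGES_ILLEGAL_ARG;
    if (info <= n)     return GGES_QZ_FAILED;
    if (info == n + 1) return GGES_HGEQZ_ERROR;
    if (info == n + 2) return GGES_REORDER_CHANGED;
    if (info == n + 3) return GGES_TGSEN_FAILED;
    return GGES_UNKNOWN;
}
```

In the GGES_QZ_FAILED case, only entries info through n - 1 of the eigenvalue arrays should be relied on.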


?ggev
Computes the generalized eigenvalues, and the left
and/or right generalized eigenvectors for a pair of
nonsymmetric matrices.

Syntax
lapack_int LAPACKE_sggev( int matrix_layout, char jobvl, char jobvr, lapack_int n,
float* a, lapack_int lda, float* b, lapack_int ldb, float* alphar, float* alphai, float*
beta, float* vl, lapack_int ldvl, float* vr, lapack_int ldvr );
lapack_int LAPACKE_dggev( int matrix_layout, char jobvl, char jobvr, lapack_int n,
double* a, lapack_int lda, double* b, lapack_int ldb, double* alphar, double* alphai,
double* beta, double* vl, lapack_int ldvl, double* vr, lapack_int ldvr );
lapack_int LAPACKE_cggev( int matrix_layout, char jobvl, char jobvr, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* b, lapack_int ldb,
lapack_complex_float* alpha, lapack_complex_float* beta, lapack_complex_float* vl,
lapack_int ldvl, lapack_complex_float* vr, lapack_int ldvr );
lapack_int LAPACKE_zggev( int matrix_layout, char jobvl, char jobvr, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* b, lapack_int ldb,
lapack_complex_double* alpha, lapack_complex_double* beta, lapack_complex_double* vl,
lapack_int ldvl, lapack_complex_double* vr, lapack_int ldvr );

Include Files
• mkl.h

Description

The ?ggev routine computes the generalized eigenvalues, and optionally, the left and/or right generalized
eigenvectors for a pair of n-by-n real/complex nonsymmetric matrices (A,B).
A generalized eigenvalue for a pair of matrices (A,B) is a scalar λ or a ratio alpha / beta = λ, such that A -
λ*B is singular. It is usually represented as the pair (alpha, beta), as there is a reasonable interpretation for
beta =0 and even for both being zero.
The right generalized eigenvector v(j) corresponding to the generalized eigenvalue λ(j) of (A,B) satisfies

A*v(j) = λ(j)*B*v(j).
The left generalized eigenvector u(j) corresponding to the generalized eigenvalue λ(j) of (A,B) satisfies

u(j)^H*A = λ(j)*u(j)^H*B
where u(j)^H denotes the conjugate transpose of u(j).

The ?ggev routine replaces the deprecated ?gegv routine.
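Because each eigenvalue is returned as the pair (alpha, beta), the defining relation can be checked without dividing: multiplying A*v(j) = λ(j)*B*v(j) by beta[j] gives beta[j]*A*v(j) = alpha[j]*B*v(j). The sketch below measures the residual of this homogeneous form for a real eigenvalue (alphai[j] = 0) of a real pair stored in column major layout. The function gev_residual_real is hypothetical, not an MKL routine. Since ?ggev overwrites a and b, the check assumes copies of the original A and B were kept.

```c
#include <stddef.h>
#include <math.h>

/* Hypothetical check (not an MKL routine): for real n-by-n matrices a and b in
   column major layout with leading dimensions lda and ldb, return
   max_k |(beta*A*v - alphar*B*v)[k]| for a real eigenpair (alphar/beta, v). */
double gev_residual_real(int n, const double *a, int lda,
                         const double *b, int ldb, const double *v,
                         double alphar, double beta)
{
    double worst = 0.0;
    for (int k = 0; k < n; ++k) {
        double av = 0.0, bv = 0.0;            /* row k of A*v and B*v */
        for (int c = 0; c < n; ++c) {
            av += a[k + (size_t)c * lda] * v[c];
            bv += b[k + (size_t)c * ldb] * v[c];
        }
        double r = fabs(beta * av - alphar * bv);
        if (r > worst)
            worst = r;
    }
    return worst;
}
```

A residual that is small relative to norm(A) and norm(B) (scaled by the magnitudes of alpha and beta) indicates a consistent eigenpair.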

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

jobvl Must be 'N' or 'V'.

If jobvl = 'N', the left generalized eigenvectors are not computed;

If jobvl = 'V', the left generalized eigenvectors are computed.

jobvr Must be 'N' or 'V'.

If jobvr = 'N', the right generalized eigenvectors are not computed;

If jobvr = 'V', the right generalized eigenvectors are computed.

n The order of the matrices A, B, vl, and vr (n≥ 0).

lda The leading dimension of the array a. Must be at least max(1, n).

ldb The leading dimension of the array b. Must be at least max(1, n).

ldvl, ldvr The leading dimensions of the output matrices vl and vr, respectively.
Constraints:
ldvl≥ 1. If jobvl = 'V', ldvl≥ max(1, n).
ldvr≥ 1. If jobvr = 'V', ldvr≥ max(1, n).
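The leading-dimension constraints depend on jobvl and jobvr. As a small sketch, the helper below returns the smallest value these constraints accept; the name ggev_min_ldv is hypothetical, and int stands in for lapack_int (an assumption for an LP64 build).

```c
/* Hypothetical helper: smallest ldvl (or ldvr) allowed by the constraints
   for job = jobvl (or jobvr) and matrix order n. */
int ggev_min_ldv(char job, int n)
{
    if (job == 'V')                 /* vectors requested: at least max(1, n) */
        return n > 1 ? n : 1;
    return 1;                       /* 'N': the array is not referenced */
}
```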

Output Parameters

a, b On exit, these arrays have been overwritten.

alphar, alphai Arrays, size at least max(1, n) each. Contain values that form generalized
eigenvalues in real flavors.
See beta.

alpha Array, size at least max(1, n). Contains values that form generalized
eigenvalues in complex flavors. See beta.

beta Array, size at least max(1, n).

For real flavors:
On exit, (alphar[j] + alphai[j]*i)/beta[j], j=0,..., n - 1, are the generalized
eigenvalues.
If alphai[j] is zero, then the j-th eigenvalue is real; if positive, then the j-th
and (j+1)-st eigenvalues are a complex conjugate pair, with alphai[j+1]
negative.
For complex flavors:
On exit, alpha[j]/beta[j], j=0,..., n - 1, are the generalized eigenvalues.
See also Application Notes below.

vl, vr Arrays:
vl (size at least max(1, ldvl*n)). Contains the matrix of left generalized
eigenvectors VL.


If jobvl = 'V', the left generalized eigenvectors u(j) are stored one after
another in the columns of VL, in the same order as their eigenvalues. Each
eigenvector is scaled so that its largest component has abs(Re) + abs(Im) = 1.
If jobvl = 'N', vl is not referenced.

For real flavors:

If the j-th eigenvalue is real, then the k-th component of the j-th left
eigenvector u(j) is stored in vl[(k - 1) + (j - 1)*ldvl] for column
major layout and in vl[(k - 1)*ldvl + (j - 1)] for row major layout.

If the j-th and (j+1)-st eigenvalues form a complex conjugate pair, then for
i = sqrt(-1), the k-th components of the j-th left eigenvector u(j) are
vl[(k - 1) + (j - 1)*ldvl] + i*vl[(k - 1) + j*ldvl] for column
major layout and vl[(k - 1)*ldvl + (j - 1)] + i*vl[(k - 1)*ldvl + j]
for row major layout. Similarly, the k-th components of the left
eigenvector u(j+1) are vl[(k - 1) + (j - 1)*ldvl] - i*vl[(k - 1) + j*ldvl]
for column major layout and vl[(k - 1)*ldvl + (j - 1)] - i*vl[(k - 1)*ldvl + j]
for row major layout.

For complex flavors:

The k-th component of the j-th left eigenvector u(j) is stored in
vl[(k - 1) + (j - 1)*ldvl] for column major layout and in
vl[(k - 1)*ldvl + (j - 1)] for row major layout.
vr (size at least max(1, ldvr*n)). Contains the matrix of right generalized
eigenvectors VR.
If jobvr = 'V', the right generalized eigenvectors v(j) are stored one after
another in the columns of VR, in the same order as their eigenvalues. Each
eigenvector is scaled so that its largest component has abs(Re) + abs(Im) = 1.
If jobvr = 'N', vr is not referenced.

For real flavors:

If the j-th eigenvalue is real, then the k-th component of the j-th right
eigenvector v(j) is stored in vr[(k - 1) + (j - 1)*ldvr] for column
major layout and in vr[(k - 1)*ldvr + (j - 1)] for row major layout.

If the j-th and (j+1)-st eigenvalues form a complex conjugate pair, then the
k-th components of the j-th right eigenvector v(j) can be computed as
vr[(k - 1) + (j - 1)*ldvr] + i*vr[(k - 1) + j*ldvr] for column major
layout and vr[(k - 1)*ldvr + (j - 1)] + i*vr[(k - 1)*ldvr + j]
for row major layout. Similarly, the k-th components of the right
eigenvector v(j+1) can be computed as vr[(k - 1) + (j - 1)*ldvr]
- i*vr[(k - 1) + j*ldvr] for column major layout and
vr[(k - 1)*ldvr + (j - 1)] - i*vr[(k - 1)*ldvr + j] for row major layout.
For complex flavors:
The k-th component of the j-th right eigenvector v(j) is stored in
vr[(k - 1) + (j - 1)*ldvr] for column major layout and in
vr[(k - 1)*ldvr + (j - 1)] for row major layout.
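The real-flavor storage rules for vl and vr can be wrapped in one accessor. The sketch below returns the k-th component of the j-th eigenvector as a real and an imaginary part, using alphai to tell a real eigenvalue from the first or second member of a conjugate pair. Unlike the text above, j and k are 0-based here. The function names are hypothetical (not MKL routines); the layout constants repeat the values 101 and 102 that mkl.h defines for LAPACK_ROW_MAJOR and LAPACK_COL_MAJOR, only so the sketch compiles on its own.

```c
#include <stddef.h>

/* Layout constants as defined in mkl.h / lapacke.h; repeated here only so
   the sketch compiles on its own. */
#ifndef LAPACK_ROW_MAJOR
#define LAPACK_ROW_MAJOR 101
#define LAPACK_COL_MAJOR 102
#endif

/* Element (row, col) of an array with leading dimension ld in the given layout. */
static double ggev_elem(const double *v, int layout, int ld, int row, int col)
{
    return layout == LAPACK_COL_MAJOR ? v[row + (size_t)col * ld]
                                      : v[(size_t)row * ld + col];
}

/* Hypothetical helper (not an MKL routine): k-th component (re + im*i) of the
   j-th eigenvector stored by the real flavors of ?ggev in vl or vr.
   j and k are 0-based here, unlike the 1-based k and j in the text above. */
void ggev_real_eigvec_component(const double *v, int layout, int ld,
                                const double *alphai, int j, int k,
                                double *re, double *im)
{
    if (alphai[j] == 0.0) {            /* real eigenvalue: column j is the vector */
        *re = ggev_elem(v, layout, ld, k, j);
        *im = 0.0;
    } else if (alphai[j] > 0.0) {      /* first of a pair: columns j and j+1 */
        *re = ggev_elem(v, layout, ld, k, j);
        *im = ggev_elem(v, layout, ld, k, j + 1);
    } else {                           /* second of a pair: conjugate of vector j-1 */
        *re = ggev_elem(v, layout, ld, k, j - 1);
        *im = -ggev_elem(v, layout, ld, k, j);
    }
}
```

The same accessor applies to vl with ldvl and to vr with ldvr, since both arrays follow the same storage convention.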

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, and

i ≤ n: the QZ iteration failed. No eigenvectors have been calculated, but alphar[j], alphai[j] (for real flavors),
or alpha[j] (for complex flavors), and beta[j], j=info,..., n - 1 should be correct.

i > n: errors that usually indicate LAPACK problems:

i = n+1: an error other than a QZ iteration failure occurred in ?hgeqz;
i = n+2: error return from ?tgevc.

Application Notes
The quotients alphar[j]/beta[j] and alphai[j]/beta[j] may easily over- or underflow, and beta[j] may even be
zero. Thus, you should avoid simply computing the ratio. However, alphar and alphai (for real flavors) or
alpha (for complex flavors) are always less than and usually comparable with norm(A) in magnitude, and
beta is always less than and usually comparable with norm(B).

?ggevx
Computes the generalized eigenvalues, and,
optionally, the left and/or right generalized
eigenvectors.

Syntax
lapack_int LAPACKE_sggevx( int matrix_layout, char balanc, char jobvl, char jobvr, char
sense, lapack_int n, float* a, lapack_int lda, float* b, lapack_int ldb, float* alphar,
float* alphai, float* beta, float* vl, lapack_int ldvl, float* vr, lapack_int ldvr,
lapack_int* ilo, lapack_int* ihi, float* lscale, float* rscale, float* abnrm, float*
bbnrm, float* rconde, float* rcondv );
lapack_int LAPACKE_dggevx( int matrix_layout, char balanc, char jobvl, char jobvr, char
sense, lapack_int n, double* a, lapack_int lda, double* b, lapack_int ldb, double*
alphar, double* alphai, double* beta, double* vl, lapack_int ldvl, double* vr,
lapack_int ldvr, lapack_int* ilo, lapack_int* ihi, double* lscale, double* rscale,
double* abnrm, double* bbnrm, double* rconde, double* rcondv );
lapack_int LAPACKE_cggevx( int matrix_layout, char balanc, char jobvl, char jobvr, char
sense, lapack_int n, lapack_complex_float* a, lapack_int lda, lapack_complex_float* b,
lapack_int ldb, lapack_complex_float* alpha, lapack_complex_float* beta,
lapack_complex_float* vl, lapack_int ldvl, lapack_complex_float* vr, lapack_int ldvr,
lapack_int* ilo, lapack_int* ihi, float* lscale, float* rscale, float* abnrm, float*
bbnrm, float* rconde, float* rcondv );
lapack_int LAPACKE_zggevx( int matrix_layout, char balanc, char jobvl, char jobvr, char
sense, lapack_int n, lapack_complex_double* a, lapack_int lda, lapack_complex_double*
b, lapack_int ldb, lapack_complex_double* alpha, lapack_complex_double* beta,
lapack_complex_double* vl, lapack_int ldvl, lapack_complex_double* vr, lapack_int ldvr,
lapack_int* ilo, lapack_int* ihi, double* lscale, double* rscale, double* abnrm,
double* bbnrm, double* rconde, double* rcondv );


Include Files
• mkl.h

Description

For a pair of n-by-n real/complex nonsymmetric matrices (A,B), the routine computes the generalized
eigenvalues and, optionally, the left and/or right generalized eigenvectors.
Optionally, it also computes a balancing transformation to improve the conditioning of the eigenvalues and
eigenvectors (ilo, ihi, lscale, rscale, abnrm, and bbnrm), reciprocal condition numbers for the eigenvalues
(rconde), and reciprocal condition numbers for the right eigenvectors (rcondv).
A generalized eigenvalue for a pair of matrices (A,B) is a scalar λ or a ratio alpha / beta = λ, such that A -
λ*B is singular. It is usually represented as the pair (alpha, beta), as there is a reasonable interpretation for
beta=0 and even for both being zero. The right generalized eigenvector v(j) corresponding to the
generalized eigenvalue λ(j) of (A,B) satisfies

A*v(j) = λ(j)*B*v(j).
The left generalized eigenvector u(j) corresponding to the generalized eigenvalue λ(j) of (A,B) satisfies

u(j)^H*A = λ(j)*u(j)^H*B
where u(j)^H denotes the conjugate transpose of u(j).

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

balanc Must be 'N', 'P', 'S', or 'B'. Specifies the balance option to be
performed.
If balanc = 'N', do not diagonally scale or permute;

If balanc = 'P', permute only;

If balanc = 'S', scale only;

If balanc = 'B', both permute and scale.

Computed reciprocal condition numbers will be for the matrices after
balancing and/or permuting. Permuting does not change condition numbers
(in exact arithmetic), but balancing does.

jobvl Must be 'N' or 'V'.

If jobvl = 'N', the left generalized eigenvectors are not computed;

If jobvl = 'V', the left generalized eigenvectors are computed.

jobvr Must be 'N' or 'V'.

If jobvr = 'N', the right generalized eigenvectors are not computed;

If jobvr = 'V', the right generalized eigenvectors are computed.

sense Must be 'N', 'E', 'V', or 'B'. Determines which reciprocal condition
numbers are computed.
If sense = 'N', none are computed;

If sense = 'E', computed for eigenvalues only;

If sense = 'V', computed for eigenvectors only;

If sense = 'B', computed for eigenvalues and eigenvectors.

n The order of the matrices A, B, vl, and vr (n≥ 0).

lda The leading dimension of the array a.

Must be at least max(1, n).

ldb The leading dimension of the array b.

Must be at least max(1, n).

ldvl, ldvr The leading dimensions of the output matrices vl and vr, respectively.
Constraints:
ldvl≥ 1. If jobvl = 'V', ldvl≥ max(1, n).
ldvr≥ 1. If jobvr = 'V', ldvr≥ max(1, n).

Output Parameters

a, b On exit, these arrays have been overwritten.

If jobvl = 'V' or jobvr = 'V' or both, then a contains the first part of
the real Schur form of the "balanced" versions of the input A and B, and b
contains its second part.

alphar, alphai Arrays, size at least max(1, n) each. Contain values that form generalized
eigenvalues in real flavors.
See beta.

alpha Array, size at least max(1, n). Contains values that form generalized
eigenvalues in complex flavors. See beta.

beta Array, size at least max(1, n).

For real flavors:
On exit, (alphar[j] + alphai[j]*i)/beta[j], j=0,..., n - 1, will be the
generalized eigenvalues.
If alphai[j] is zero, then the j-th eigenvalue is real; if positive, then the j-th
and (j+1)-st eigenvalues are a complex conjugate pair, with alphai[j+1]
negative.
For complex flavors:
On exit, alpha[j]/beta[j], j=0,..., n - 1, will be the generalized eigenvalues.
See also Application Notes below.
vl, vr Arrays:

vl (size at least max(1, ldvl*n)).

If jobvl = 'V', the left generalized eigenvectors u(j) are stored one after
another in the columns of vl, in the same order as their eigenvalues. Each
eigenvector is scaled so that its largest component has abs(Re) +
abs(Im) = 1.
If jobvl = 'N', vl is not referenced.

For real flavors:

If the j-th eigenvalue is real, then the k-th component of the j-th left eigenvector u(j)
is stored in vl[(k - 1) + (j - 1)*ldvl] for column major layout and in
vl[(k - 1)*ldvl + (j - 1)] for row major layout.
If the j-th and (j+1)-st eigenvalues form a complex conjugate pair, then, with
i = sqrt(-1), the k-th component of the j-th left eigenvector u(j) is
vl[(k - 1) + (j - 1)*ldvl] + i*vl[(k - 1) + j*ldvl] for column major layout and
vl[(k - 1)*ldvl + (j - 1)] + i*vl[(k - 1)*ldvl + j] for row major layout.
Similarly, the k-th component of the (j+1)-st left eigenvector u(j+1) is
vl[(k - 1) + (j - 1)*ldvl] - i*vl[(k - 1) + j*ldvl] for column major layout and
vl[(k - 1)*ldvl + (j - 1)] - i*vl[(k - 1)*ldvl + j] for row major layout.

For complex flavors:

The k-th component of the j-th left eigenvector u(j) is stored in
vl[(k - 1) + (j - 1)*ldvl] for column major layout and in
vl[(k - 1)*ldvl + (j - 1)] for row major layout.

vr (size at least max(1, ldvr*n)).

If jobvr = 'V', the right generalized eigenvectors v(j) are stored one after
another in the columns of vr, in the same order as their eigenvalues. Each
eigenvector is scaled so that its largest component has abs(Re) +
abs(Im) = 1.
If jobvr = 'N', vr is not referenced.

For real flavors:

If the j-th eigenvalue is real, then the k-th component of the j-th right
eigenvector v(j) is stored in vr[(k - 1) + (j - 1)*ldvr] for column
major layout and in vr[(k - 1)*ldvr + (j - 1)] for row major layout.

If the j-th and (j+1)-st eigenvalues form a complex conjugate pair, then
the k-th component of the j-th right eigenvector v(j) is
vr[(k - 1) + (j - 1)*ldvr] + i*vr[(k - 1) + j*ldvr] for column major layout and
vr[(k - 1)*ldvr + (j - 1)] + i*vr[(k - 1)*ldvr + j] for row major layout.
Similarly, the k-th component of the (j+1)-st right eigenvector v(j+1) is
vr[(k - 1) + (j - 1)*ldvr] - i*vr[(k - 1) + j*ldvr] for column major layout and
vr[(k - 1)*ldvr + (j - 1)] - i*vr[(k - 1)*ldvr + j] for row major layout.
For complex flavors:
The k-th component of the j-th right eigenvector v(j) is stored in
vr[(k - 1) + (j - 1)*ldvr] for column major layout and in
vr[(k - 1)*ldvr + (j - 1)] for row major layout.
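For real flavors, the storage formulas above can be exercised with a short plain-C sketch that does not call MKL. The helper names `elem` and `eigvec_component` are hypothetical; the sketch assumes a real-flavor `vr` (or `vl`) array together with the `alphai` array returned by the same call, and uses 1-based `j` and `k` as in the text.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical helper (not part of MKL): element (r, c), both 1-based,
   of a real array with leading dimension ld in either layout. */
static double elem(const double *v, int ld, int row_major, int r, int c)
{
    return row_major ? v[(size_t)(r - 1) * ld + (c - 1)]
                     : v[(r - 1) + (size_t)(c - 1) * ld];
}

/* Hypothetical helper (not part of MKL): k-th component of the j-th
   eigenvector stored in a real-flavor vl or vr array (1-based j, k). */
static double complex eigvec_component(const double *v, int ld, int row_major,
                                       const double *alphai, int j, int k)
{
    if (alphai[j - 1] == 0.0)        /* real eigenvalue: column j is the vector */
        return elem(v, ld, row_major, k, j);
    if (alphai[j - 1] > 0.0)         /* first of a pair: Re in column j, Im in j+1 */
        return elem(v, ld, row_major, k, j)
               + I * elem(v, ld, row_major, k, j + 1);
    /* second of a pair: conjugate of the first, built from columns j-1 and j */
    return elem(v, ld, row_major, k, j - 1)
           - I * elem(v, ld, row_major, k, j);
}
```

The second member of a conjugate pair (alphai[j - 1] < 0) reuses the columns of the first member, which is why the sketch branches on the sign of alphai.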

ilo, ihi ilo and ihi are integer values such that on exit A(i, j) = 0 and B(i, j) = 0 if i >
j and j = 1,..., ilo-1 or i = ihi+1,..., n.
If balanc = 'N' or 'S', ilo = 1 and ihi = n.

lscale, rscale Arrays, size at least max(1, n) each.

lscale contains details of the permutations and scaling factors applied to the
left side of A and B.
If PL(j) is the index of the row interchanged with row j, and DL(j) is the
scaling factor applied to row j, then
lscale[j - 1] = PL(j) for j = 1,..., ilo-1,
lscale[j - 1] = DL(j) for j = ilo,..., ihi,
lscale[j - 1] = PL(j) for j = ihi+1,..., n.
The order in which the interchanges are made is n to ihi+1, then 1 to ilo-1.
rscale contains details of the permutations and scaling factors applied to the
right side of A and B.
If PR(j) is the index of the column interchanged with column j, and DR(j)
is the scaling factor applied to column j, then
rscale[j - 1] = PR(j) for j = 1,..., ilo-1,
rscale[j - 1] = DR(j) for j = ilo,..., ihi,
rscale[j - 1] = PR(j) for j = ihi+1,..., n.
The order in which the interchanges are made is n to ihi+1, then 1 to ilo-1.
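The interchange order can be illustrated by replaying the recorded swaps on an index array. This is a hypothetical plain-C helper, not an MKL routine; it assumes the 1-based ilo and ihi returned by the routine and that the permutation entries of lscale hold exact integer values. The same decoding applies to rscale for columns.

```c
/* Hypothetical helper (not part of MKL): swap two entries of an index array. */
static void swap_idx(int *perm, int a, int b)
{
    int t = perm[a];
    perm[a] = perm[b];
    perm[b] = t;
}

/* Hypothetical helper: replays the row interchanges recorded in lscale.
   On return, perm[p] is the 0-based index of the original row that ends up
   in row position p after balancing. lscale[j - 1] for j = ilo..ihi holds
   the scaling factor DL(j) and is not used here. */
static void lscale_row_order(int n, int ilo, int ihi, const double *lscale,
                             int *perm)
{
    for (int p = 0; p < n; ++p)
        perm[p] = p;
    for (int j = n; j >= ihi + 1; --j)      /* first j = n down to ihi+1 */
        swap_idx(perm, j - 1, (int)lscale[j - 1] - 1);
    for (int j = 1; j <= ilo - 1; ++j)      /* then j = 1 up to ilo-1 */
        swap_idx(perm, j - 1, (int)lscale[j - 1] - 1);
}
```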

abnrm, bbnrm The one-norms of the balanced matrices A and B, respectively.

rconde, rcondv Arrays, size at least max(1, n) each.

If sense = 'E' or 'B', rconde contains the reciprocal condition numbers
of the eigenvalues, stored in consecutive elements of the array. For a
complex conjugate pair of eigenvalues, two consecutive elements of rconde
are set to the same value. Thus rconde[j], rcondv[j], and the j-th columns
of vl and vr all correspond to the same eigenpair (but not in general the j-th
eigenpair, unless all eigenpairs are selected).
If sense = 'N' or 'V', rconde is not referenced.

If sense = 'V' or 'B', rcondv contains the estimated reciprocal condition
numbers of the eigenvectors, stored in consecutive elements of the array.
For a complex eigenvector, two consecutive elements of rcondv are set to
the same value.
If the eigenvalues cannot be reordered to compute rcondv[j], rcondv[j] is set to 0;
this can only occur when the true value would be very small anyway.
If sense = 'N' or 'E', rcondv is not referenced.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, and

i≤n: the QZ iteration failed. No eigenvectors have been calculated, but alphar[j], alphai[j] (for real flavors)
or alpha[j] (for complex flavors), and beta[j], j=info,..., n - 1, should be correct.

i > n: errors that usually indicate LAPACK problems:

i = n+1: an error other than a QZ iteration failure occurred in ?hgeqz;
i = n+2: error return from ?tgevc.

Application Notes
The quotients alphar[j]/beta[j] and alphai[j]/beta[j] may easily over- or underflow, and beta[j] may even be
zero. Thus, you should avoid simply computing the ratio. However, alphar and alphai (for real flavors) or
alpha (for complex flavors) are always less than and usually comparable with norm(A) in magnitude, and
beta is always less than and usually comparable with norm(B).
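One way to follow this advice is to classify each (alpha, beta) pair before dividing. The sketch below is plain C and hypothetical (the enum and function names are not part of MKL); it covers real flavors only, and its overflow test is one conservative choice among several. Underflow is not treated specially: a tiny quotient simply becomes zero or subnormal.

```c
#include <float.h>
#include <math.h>

/* Hypothetical names, not part of MKL. */
typedef enum {
    EIG_FINITE,        /* quotient computed */
    EIG_INFINITE,      /* beta == 0 and alpha != 0 */
    EIG_INDETERMINATE, /* alpha == 0 and beta == 0 */
    EIG_OVERFLOW       /* beta != 0, but alpha/beta is not representable */
} eig_kind;

/* Classify the generalized eigenvalue (alphar + i*alphai)/beta of a real
   flavor and divide only when the quotient is representable. */
static eig_kind generalized_eig(double alphar, double alphai, double beta,
                                double *re, double *im)
{
    double amag = fabs(alphar) + fabs(alphai);  /* bounds both components */
    if (beta == 0.0)
        return amag == 0.0 ? EIG_INDETERMINATE : EIG_INFINITE;
    /* Conservative test (ignoring rounding at the boundary): for |beta| < 1
       the quotient can overflow only if |alpha| > |beta|*DBL_MAX, and that
       product cannot itself overflow. */
    if (fabs(beta) < 1.0 && amag > fabs(beta) * DBL_MAX)
        return EIG_OVERFLOW;
    *re = alphar / beta;
    *im = alphai / beta;
    return EIG_FINITE;
}
```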

?ggev3
Computes the generalized eigenvalues and the left
and right generalized eigenvectors for a pair of
matrices.

Syntax
lapack_int LAPACKE_sggev3 (int matrix_layout, char jobvl, char jobvr, lapack_int n,
float * a, lapack_int lda, float * b, lapack_int ldb, float * alphar, float * alphai,
float * beta, float * vl, lapack_int ldvl, float * vr, lapack_int ldvr);
lapack_int LAPACKE_dggev3 (int matrix_layout, char jobvl, char jobvr, lapack_int n,
double * a, lapack_int lda, double * b, lapack_int ldb, double * alphar, double *
alphai, double * beta, double * vl, lapack_int ldvl, double * vr, lapack_int ldvr);
lapack_int LAPACKE_cggev3 (int matrix_layout, char jobvl, char jobvr, lapack_int n,
lapack_complex_float * a, lapack_int lda, lapack_complex_float * b, lapack_int ldb,
lapack_complex_float * alpha, lapack_complex_float * beta, lapack_complex_float * vl,
lapack_int ldvl, lapack_complex_float * vr, lapack_int ldvr);
lapack_int LAPACKE_zggev3 (int matrix_layout, char jobvl, char jobvr, lapack_int n,
lapack_complex_double * a, lapack_int lda, lapack_complex_double * b, lapack_int ldb,
lapack_complex_double * alpha, lapack_complex_double * beta, lapack_complex_double *
vl, lapack_int ldvl, lapack_complex_double * vr, lapack_int ldvr);

Include Files
• mkl.h

Description
For a pair of n-by-n real or complex nonsymmetric matrices (A, B), ?ggev3 computes the generalized
eigenvalues, and optionally, the left and right generalized eigenvectors.
A generalized eigenvalue for a pair of matrices (A, B) is a scalar λ or a ratio alpha/beta = λ, such that A -
λ*B is singular. It is usually represented as the pair (alpha,beta), as there is a reasonable interpretation for
beta=0, and even for both being zero.
For real flavors:
The right eigenvector $v_j$ corresponding to the eigenvalue $\lambda_j$ of (A, B) satisfies
$A v_j = \lambda_j B v_j$.
The left eigenvector $u_j$ corresponding to the eigenvalue $\lambda_j$ of (A, B) satisfies
$u_j^H A = \lambda_j u_j^H B$,
where $u_j^H$ is the conjugate transpose of $u_j$.
For complex flavors:
The right generalized eigenvector $v_j$ corresponding to the generalized eigenvalue $\lambda_j$ of (A, B) satisfies
$A v_j = \lambda_j B v_j$.
The left generalized eigenvector $u_j$ corresponding to the generalized eigenvalue $\lambda_j$ of (A, B) satisfies
$u_j^H A = \lambda_j u_j^H B$,
where $u_j^H$ is the conjugate transpose of $u_j$.
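Because each eigenvalue is returned as the pair (alpha, beta), the defining relation can be checked without dividing: multiplying $A v_j = \lambda_j B v_j$ by beta gives beta*A*v - alpha*B*v = 0. The sketch below is hypothetical plain C (not part of MKL); it assumes column-major n-by-n matrices, a real eigenvalue, and a real eigenvector. Complex conjugate pairs would need complex arithmetic.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper (not part of MKL): max-norm of beta*A*v - alpha*B*v.
   A and B are n-by-n, column-major, with leading dimension ld. Using the
   pair (alpha, beta) avoids the division, so the check also covers an
   infinite eigenvalue (beta == 0). */
static double pair_residual(int n, const double *a, const double *b, int ld,
                            double alpha, double beta, const double *v)
{
    double worst = 0.0;
    for (int i = 0; i < n; ++i) {
        double av = 0.0, bv = 0.0;      /* row i of A*v and of B*v */
        for (int j = 0; j < n; ++j) {
            av += a[i + (size_t)j * ld] * v[j];
            bv += b[i + (size_t)j * ld] * v[j];
        }
        double r = fabs(beta * av - alpha * bv);
        if (r > worst)
            worst = r;
    }
    return worst;
}
```

For example, with A = diag(2, 3), B = I, and v = e1, both (alpha, beta) = (2, 1) and (4, 2) give a zero residual, since only the ratio matters.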

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

jobvl = 'N': do not compute the left generalized eigenvectors;

= 'V': compute the left generalized eigenvectors.

jobvr = 'N': do not compute the right generalized eigenvectors;

= 'V': compute the right generalized eigenvectors.

n The order of the matrices A, B, VL, and VR.

n≥ 0.

a Array, size (lda*n).

On entry, the matrix A in the pair (A, B).

lda The leading dimension of a.

lda≥ max(1,n).

b Array, size (ldb*n).

On entry, the matrix B in the pair (A, B).

ldb The leading dimension of b.

ldb≥ max(1,n).

ldvl The leading dimension of the matrix VL.

ldvl≥ 1, and if jobvl = 'V', ldvl≥n.

ldvr The leading dimension of the matrix VR.

ldvr≥ 1, and if jobvr = 'V', ldvr≥n.

Output Parameters

a On exit, a is overwritten.

b On exit, b is overwritten.

alphar Array, size (n).

alphai Array, size (n).

alpha Array, size (n).

beta Array, size (n).

For real flavors:

On exit, (alphar[j] + alphai[j]*i)/beta[j], j=0,...,n - 1, are the
generalized eigenvalues. For j = 1,..., n, if alphai[j - 1] is zero, then the j-th
eigenvalue is real; if positive, then the j-th and (j+1)-st eigenvalues
are a complex conjugate pair, with alphai[j] negative.

Note: the quotients alphar[j - 1]/beta[j - 1] and alphai[j - 1]/beta[j - 1]
can easily over- or underflow, and beta[j - 1] might
even be zero. Thus, you should avoid computing the ratio alpha/beta
by simply dividing alpha by beta. However, alphar and alphai are
always less than and usually comparable with norm(A) in magnitude,
and beta is always less than and usually comparable with norm(B).

For complex flavors:

On exit, alpha[j]/beta[j], j=0,...,n - 1, are the generalized
eigenvalues.
Note: the quotients alpha[j]/beta[j] may easily over- or
underflow, and beta[j] can even be zero. Thus, you should avoid
computing the ratio alpha/beta by simply dividing alpha by beta.
However, alpha is always less than and usually comparable with
norm(A) in magnitude, and beta is always less than and usually
comparable with norm(B).

vl Array, size (ldvl*n).

For real flavors:

If jobvl = 'V', the left eigenvectors uj are stored one after another in
the columns of vl, in the same order as their eigenvalues. If the j-th
eigenvalue is real, then uj = the j-th column of vl. If the j-th and
(j+1)-st eigenvalues form a complex conjugate pair, then the real part
of uj = the j-th column of vl and the imaginary part of uj = the
(j+1)-st column of vl.

Each eigenvector is scaled so the largest component has
abs(real part) + abs(imag. part) = 1.
Not referenced if jobvl = 'N'.

For complex flavors:

If jobvl = 'V', the left generalized eigenvectors uj are stored one
after another in the columns of vl, in the same order as their
eigenvalues.
Each eigenvector is scaled so the largest component has abs(real
part) + abs(imag. part) = 1.
Not referenced if jobvl = 'N'.

vr Array, size (ldvr*n).

For real flavors:

If jobvr = 'V', the right eigenvectors vj are stored one after another
in the columns of vr, in the same order as their eigenvalues. If the j-
th eigenvalue is real, then vj = the j-th column of vr. If the j-th and (j
+ 1)-st eigenvalues form a complex conjugate pair, then the real part
of vj = the j-th column of vr and the imaginary part of vj = the (j +
1)-st column of vr.

Each eigenvector is scaled so the largest component has
abs(real part) + abs(imag. part) = 1.
Not referenced if jobvr = 'N'.

For complex flavors:

If jobvr = 'V', the right generalized eigenvectors vj are stored one
after another in the columns of vr, in the same order as their
eigenvalues. Each eigenvector is scaled so the largest component has
abs(real part) + abs(imag. part) = 1.
Not referenced if jobvr = 'N'.

Return Values
This function returns a value info.

= 0: successful exit
< 0: if info = -i, the i-th argument had an illegal value.

=1,...,n:

for real flavors:

The QZ iteration failed. No eigenvectors have been calculated, but alphar[j], alphai[j] and beta[j] should
be correct for j=info,...,n - 1.

for complex flavors:

The QZ iteration failed. No eigenvectors have been calculated, but alpha[j] and beta[j] should be correct for
j=info,...,n - 1.

> n:

=n + 1: an error other than a QZ iteration failure occurred in ?hgeqz,

=n + 2: error return from ?tgevc.

LAPACK Auxiliary Routines

Routine naming conventions, mathematical notation, and matrix storage schemes used for LAPACK auxiliary
routines are the same as for the driver and computational routines described in previous chapters.

?lacgv
Conjugates a complex vector.

Syntax
lapack_int LAPACKE_clacgv (lapack_int n, lapack_complex_float* x, lapack_int incx);
lapack_int LAPACKE_zlacgv (lapack_int n, lapack_complex_double* x, lapack_int incx);

Include Files
• mkl.h

Description

The routine conjugates a complex vector x of length n and increment incx (see "Vector Arguments in BLAS"
in Appendix B).

Input Parameters

n The length of the vector x (n≥ 0).

x Array, dimension (1+(n-1)* |incx|).

Contains the vector of length n to be conjugated.

incx The spacing between successive elements of x.

Output Parameters

x On exit, overwritten with conjg(x).
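The effect of the routine on a strided vector can be sketched in plain C99 with <complex.h>. This illustrates the operation only; it is not the MKL implementation, the function name is hypothetical, and incx is assumed to be nonzero.

```c
#include <complex.h>
#include <stdlib.h>

/* Hypothetical sketch of the ?lacgv operation (not the MKL code): conjugate
   the n elements of x spaced |incx| apart. Each element is conjugated
   independently, so for a negative incx the traversal direction does not
   matter; the same elements x[0], x[|incx|], ... are touched. */
static void conj_strided(int n, double complex *x, int incx)
{
    size_t step = (size_t)abs(incx);
    for (int i = 0; i < n; ++i)
        x[(size_t)i * step] = conj(x[(size_t)i * step]);
}
```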

?lacrm
Multiplies a complex matrix by a square real matrix.

Syntax
call clacrm( m, n, a, lda, b, ldb, c, ldc, rwork )
call zlacrm( m, n, a, lda, b, ldb, c, ldc, rwork )

Include Files
• mkl.h

Description

The routine performs a simple matrix-matrix multiplication of the form

C = A*B,
where A is m-by-n and complex, B is n-by-n and real, C is m-by-n and complex.
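A straightforward triple loop shows the operation for column-major storage. This is a hypothetical plain-C99 reference sketch, not the MKL code. It does not need the rwork workspace; as we understand the reference LAPACK version, rwork holds the real and imaginary parts of A separately so that the product can be formed with real matrix multiplications.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical reference sketch of C = A*B (not the MKL code): A is m-by-n
   complex, B is n-by-n real, C is m-by-n complex, all column-major. */
static void complex_times_real(int m, int n,
                               const double complex *a, int lda,
                               const double *b, int ldb,
                               double complex *c, int ldc)
{
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            double complex s = 0.0;
            for (int l = 0; l < n; ++l)   /* C(i,j) = sum over l of A(i,l)*B(l,j) */
                s += a[i + (size_t)l * lda] * b[l + (size_t)j * ldb];
            c[i + (size_t)j * ldc] = s;
        }
}
```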

Input Parameters

m INTEGER. The number of rows of the matrix A and of the matrix C (m≥ 0).

n INTEGER. The number of columns and rows of the matrix B and the number
of columns of the matrix C (n≥ 0).

a COMPLEX for clacrm

DOUBLE COMPLEX for zlacrm

Array, DIMENSION(lda, n). Contains the m-by-n matrix A.

lda INTEGER. The leading dimension of the array a, lda≥max(1, m).

b REAL for clacrm

DOUBLE PRECISION for zlacrm
Array, DIMENSION(ldb, n). Contains the n-by-n matrix B.

ldb INTEGER. The leading dimension of the array b, ldb≥max(1, n).

ldc INTEGER. The leading dimension of the output array c, ldc≥max(1, m).

rwork REAL for clacrm

DOUBLE PRECISION for zlacrm
Workspace array, DIMENSION(2*m*n).

Output Parameters

c COMPLEX for clacrm

DOUBLE COMPLEX for zlacrm
Array, DIMENSION (ldc, n). Contains the m-by-n matrix C.

?syconv
Converts a symmetric matrix given by a triangular
matrix factorization into two matrices and vice versa.

Syntax
lapack_int LAPACKE_ssyconv (int matrix_layout, char uplo, char way, lapack_int n, float
* a, lapack_int lda, const lapack_int * ipiv, float * e);
lapack_int LAPACKE_dsyconv (int matrix_layout, char uplo, char way, lapack_int n,
double* a, lapack_int lda, const lapack_int * ipiv, double * e);
lapack_int LAPACKE_csyconv (int matrix_layout, char uplo, char way, lapack_int n,
lapack_complex_float * a, lapack_int lda, const lapack_int * ipiv, lapack_complex_float
* e);
lapack_int LAPACKE_zsyconv (int matrix_layout, char uplo, char way, lapack_int n,
lapack_complex_double* a, lapack_int lda, const lapack_int * ipiv,
lapack_complex_double * e);

Include Files
• mkl.h

Description
The routine converts matrix A, which results from a triangular matrix factorization, into matrices L and D and
vice versa. The routine returns the off-diagonal elements of D and applies or reverses the permutation done
by the triangular matrix factorization.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the details of the factorization are stored as an upper or
lower triangular matrix:
If uplo = 'U': the upper triangular, $A = U D U^T$.

If uplo = 'L': the lower triangular, $A = L D L^T$.

way Must be 'C' or 'R'.

If way = 'C', the factored form is converted into U (or L) and D; if way = 'R', this conversion is reverted.

n The order of matrix A; n≥ 0.

a Array of size max(1, lda*n).

The block diagonal matrix D and the multipliers used to obtain the factor U
or L as computed by ?sytrf.

lda The leading dimension of a; lda≥ max(1, n).

ipiv Array, size at least max(1, n).

Details of the interchanges and the block structure of D, as returned
by ?sytrf.

Output Parameters

e Array of size max(1, n) containing the superdiagonal/subdiagonal of the
symmetric 1-by-1 or 2-by-2 block diagonal matrix D in $U D U^T$ or $L D L^T$.

Return Values

info If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = -1011, memory allocation error occurred.

The routine performs the symmetric rank-1 operation defined as

$a := \alpha x x^T + a$,
where:
• alpha is a complex scalar.
• x is an n-element complex vector.
• a is an n-by-n complex symmetric matrix.
These routines have their real equivalents in BLAS (see ?syr in Chapter "BLAS and Sparse BLAS Routines").
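For the upper-triangle case, the update can be sketched as follows. This is a hypothetical plain-C99 illustration in column-major storage, not the MKL implementation, and it assumes incx > 0. The elements of x are not conjugated, which is what keeps the updated matrix symmetric rather than Hermitian.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical sketch (not the MKL code): a := alpha*x*x^T + a on the upper
   triangle of a column-major complex symmetric matrix; assumes incx > 0.
   The strictly lower triangle of a is not touched. */
static void csyr_upper_sketch(int n, double complex alpha,
                              const double complex *x, int incx,
                              double complex *a, int lda)
{
    for (int j = 0; j < n; ++j) {
        double complex t = alpha * x[(size_t)j * incx];
        for (int i = 0; i <= j; ++i)      /* rows 0..j of column j */
            a[i + (size_t)j * lda] += x[(size_t)i * incx] * t;
    }
}
```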

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

uplo Specifies whether the upper or lower triangular part of the array a is used:
If uplo = 'U' or 'u', then the upper triangular part of the array a is used.

If uplo = 'L' or 'l', then the lower triangular part of the array a is used.

n Specifies the order of the matrix a. The value of n must be at least zero.

alpha Specifies the scalar alpha.

x Array, size at least (1 + (n - 1)*abs(incx)). Before entry, the
incremented array x must contain the n-element vector x.

incx Specifies the increment for the elements of x. The value of incx must not be
zero.

a Array, size max(1, lda*n). Before entry with uplo = 'U' or 'u', the
leading n-by-n upper triangular part of the array a must contain the upper
triangular part of the symmetric matrix and the strictly lower triangular part
of a is not referenced.
Before entry with uplo = 'L' or 'l', the leading n-by-n lower triangular
part of the array a must contain the lower triangular part of the symmetric
matrix and the strictly upper triangular part of a is not referenced.

lda Specifies the leading dimension of a as declared in the calling
(sub)program. The value of lda must be at least max(1,n).

Output Parameters

a With uplo = 'U' or 'u', the upper triangular part of the array a is
overwritten by the upper triangular part of the updated matrix.
With uplo = 'L' or 'l', the lower triangular part of the array a is
overwritten by the lower triangular part of the updated matrix.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = -1011, memory allocation error occurred.

i?max1
Finds the index of the first vector element of maximum
absolute value.

Syntax
MKL_INT icmax1(const MKL_INT*n, const MKL_Complex8*cx, const MKL_INT*incx)
MKL_INT izmax1(const MKL_INT*n, const MKL_Complex16*cx, const MKL_INT*incx)

Include Files
• mkl.h

Description

Given a complex vector cx, the i?max1 functions return the index of the first vector element of maximum
absolute value. These functions are based on the BLAS functions icamax/izamax, but use the true absolute
value (modulus) of each element instead of abs(Re) + abs(Im). They are designed for use with clacon/zlacon.

Input Parameters

n Specifies the number of elements in the vector cx.

cx Array, size at least (1+(n-1)*abs(incx)).

Contains the input vector.

incx Specifies the spacing between successive elements of cx.

Return Values
Index (1-based) of the first vector element of maximum absolute value.

?sum1
Forms the 1-norm of the complex vector using the
true absolute value.

Syntax
float scsum1(const MKL_INT*n, const MKL_Complex8*cx, const MKL_INT*incx)
double dzsum1(const MKL_INT*n, const MKL_Complex16*cx, const MKL_INT*incx)

Include Files
• mkl.h

Description
Given a complex vector cx, scsum1/dzsum1 functions take the sum of the absolute values of vector elements
and return a single/double precision result, respectively. These functions are based on scasum/dzasum from
Level 1 BLAS, but use the true absolute value and were designed for use with clacon/zlacon.

Input Parameters

n Specifies the number of elements in the vector cx.

cx Array, size at least (1+(n-1)*abs(incx)).

Contains the input vector whose elements will be summed.

incx Specifies the spacing between successive elements of cx (incx > 0).

Return Values
Sum of absolute values.

?gelq2
Computes the LQ factorization of a general
rectangular matrix using an unblocked algorithm.

Syntax
lapack_int LAPACKE_sgelq2 (int matrix_layout, lapack_int m, lapack_int n, float* a,
lapack_int lda, float* tau);
lapack_int LAPACKE_dgelq2 (int matrix_layout, lapack_int m, lapack_int n, double* a,
lapack_int lda, double * tau);
lapack_int LAPACKE_cgelq2 (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* tau);
lapack_int LAPACKE_zgelq2 (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* tau);

Include Files
• mkl.h

Description

The routine computes an LQ factorization of a real/complex m-by-n matrix A as A = L*Q.

The routine does not form the matrix Q explicitly. Instead, Q is represented as a product of min(m, n)
elementary reflectors:
$Q = H(k) \cdots H(2) H(1)$ (or $Q = H(k)^H \cdots H(2)^H H(1)^H$ for complex flavors), where k = min(m, n).
Each H(i) has the form
$H(i) = I - \tau v v^T$ for real flavors, or
$H(i) = I - \tau v v^H$ for complex flavors,
where tau is a real/complex scalar stored in tau[i - 1], and v is a real/complex vector with $v_{1:i-1} = 0$ and
$v_i = 1$.

On exit, the j-th (i+1 ≤j≤n) component of vector v (for real functions) or its conjugate (for complex functions)
is stored in a[i - 1 + lda*(j - 1)] for column major layout or in a[j - 1 + lda*(i - 1)] for row
major layout.
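Each reflector can be applied without forming H(i) explicitly, since $H x = x - \tau v (v^T x)$. A minimal real-flavor sketch in plain C (hypothetical name, not the MKL code):

```c
/* Hypothetical sketch (not the MKL code): x := H*x with H = I - tau*v*v^T,
   for vectors of length len (real flavors). */
static void apply_reflector(int len, double tau, const double *v, double *x)
{
    double w = 0.0;
    for (int k = 0; k < len; ++k)   /* w = v^T * x */
        w += v[k] * x[k];
    w *= tau;
    for (int k = 0; k < len; ++k)   /* x = x - (tau * v^T x) * v */
        x[k] -= w * v[k];
}
```

In the LQ case the reflectors act on the rows of A from the right; for real flavors H(i) is symmetric, so the same update applies to a row vector.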

Input Parameters

m The number of rows in the matrix A (m≥ 0).

n The number of columns in A (n≥ 0).

a Array, size at least max(1, lda*n) for column major and max(1, lda*m)
for row major layout. Array a contains the m-by-n matrix A.

lda The leading dimension of a; at least max(1, m) for column major layout and
max(1,n) for row major layout.

Output Parameters

a Overwritten by the factorization data as follows:

on exit, the elements on and below the diagonal of the array a contain the
m-by-min(n,m) lower trapezoidal matrix L (L is lower triangular if n≥m); the
elements above the diagonal, with the array tau, represent the orthogonal/
unitary matrix Q as a product of min(n,m) elementary reflectors.

tau Array, size at least max(1, min(m, n)).

Contains scalar factors of the elementary reflectors.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = -1011, memory allocation error occurred.

?geqr2
Computes the QR factorization of a general
rectangular matrix using an unblocked algorithm.

Syntax
lapack_int LAPACKE_sgeqr2 (int matrix_layout, lapack_int m, lapack_int n, float* a,
lapack_int lda, float* tau);
lapack_int LAPACKE_dgeqr2 (int matrix_layout, lapack_int m, lapack_int n, double* a,
lapack_int lda, double* tau);
lapack_int LAPACKE_cgeqr2 (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* tau);
lapack_int LAPACKE_zgeqr2 (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* tau);

Include Files
• mkl.h

Description

The routine computes a QR factorization of a real/complex m-by-n matrix A as A = Q*R.

The routine does not form the matrix Q explicitly. Instead, Q is represented as a product of min(m, n)
elementary reflectors:
$Q = H(1) H(2) \cdots H(k)$, where k = min(m, n).
Each H(i) has the form
$H(i) = I - \tau v v^T$ for real flavors, or
$H(i) = I - \tau v v^H$ for complex flavors,
where tau is a real/complex scalar stored in tau[i - 1], and v is a real/complex vector with $v_{1:i-1} = 0$ and
$v_i = 1$.
On exit, $v_{i+1:m}$ is stored in a(i+1:m, i).

Input Parameters

m The number of rows in the matrix A (m≥ 0).

n The number of columns in A (n≥ 0).

a Array, size at least max(1, lda*n) for column major and max(1, lda*m)
for row major layout. Array a contains the m-by-n matrix A.

lda The leading dimension of a; at least max(1, m) for column major layout and
max(1,n) for row major layout.

Output Parameters

a Overwritten by the factorization data as follows:

on exit, the elements on and above the diagonal of the array a contain the
min(n,m)-by-n upper trapezoidal matrix R (R is upper triangular if m≥n); the
elements below the diagonal, with the array tau, represent the orthogonal/
unitary matrix Q as a product of elementary reflectors.

tau Array, size at least max(1, min(m, n)).

Contains scalar factors of the elementary reflectors.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = -1011, memory allocation error occurred.

?geqrt2
Computes a QR factorization of a general real or
complex matrix using the compact WY representation
of Q.

Syntax
lapack_int LAPACKE_sgeqrt2 (int matrix_layout, lapack_int m, lapack_int n, float * a,
lapack_int lda, float * t, lapack_int ldt );
lapack_int LAPACKE_dgeqrt2 (int matrix_layout, lapack_int m, lapack_int n, double * a,
lapack_int lda, double * t, lapack_int ldt );
lapack_int LAPACKE_cgeqrt2 (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_float * a, lapack_int lda, lapack_complex_float * t, lapack_int ldt );
lapack_int LAPACKE_zgeqrt2 (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_double * a, lapack_int lda, lapack_complex_double * t, lapack_int ldt );

Include Files
• mkl.h

Description

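The routine computes a QR factorization of a real or complex m-by-n matrix A (m ≥ n) using the compact WY
representation of Q.

The strictly lower triangular part of the matrix V contains the elementary reflectors H(i) in the ith column
below the diagonal. For example, if m=5 and n=3, the matrix V is

$$V = \begin{pmatrix} 1 & & \\ v_1 & 1 & \\ v_1 & v_2 & 1 \\ v_1 & v_2 & v_3 \\ v_1 & v_2 & v_3 \end{pmatrix}$$

where $v_i$ represents the vector that defines H(i). The vectors are returned in the lower triangular part of array
a.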

NOTE
The 1s along the diagonal of V are not stored in a.

The block reflector H is then given by

$H = I - V T V^T$ for real flavors, and
$H = I - V T V^H$ for complex flavors,
where $V^T$ is the transpose and $V^H$ is the conjugate transpose of V.
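Given V and the scalar factors tau of the individual reflectors, the upper triangular T can be accumulated column by column using T(1:i-1, i) = -tau(i)*T(1:i-1, 1:i-1)*V(:, 1:i-1)^T*v(i) and T(i, i) = tau(i), so that H(1)*H(2)*...*H(k) = I - V*T*V^T. The sketch below is hypothetical plain C for real flavors in column-major storage, modeled on the forward, column-wise case of reference LAPACK ?larft; it is not necessarily how ?geqrt2 itself computes T.

```c
#include <stddef.h>

/* Hypothetical sketch: form the k-by-k upper triangular T such that
   H(1)*H(2)*...*H(k) = I - V*T*V^T (real flavors, column-major).
   V is m-by-k with an implicit unit diagonal; entries on and above the
   diagonal of the v array are never read, so an array holding R there
   can be passed as is. Only the upper triangle of t is written. */
static void build_wy_t(int m, int k, const double *v, int ldv,
                       const double *tau, double *t, int ldt)
{
    for (int i = 0; i < k; ++i) {
        /* w(l) = V(:,l)^T * v_i for l < i, kept temporarily in T(l, i);
           v_i is zero above row i and one in row i */
        for (int l = 0; l < i; ++l) {
            double w = v[i + (size_t)l * ldv];
            for (int r = i + 1; r < m; ++r)
                w += v[r + (size_t)l * ldv] * v[r + (size_t)i * ldv];
            t[l + (size_t)i * ldt] = w;
        }
        /* T(0:i-1, i) = -tau_i * T(0:i-1, 0:i-1) * w, in place from top to
           bottom, so each row reads only entries not yet overwritten */
        for (int r = 0; r < i; ++r) {
            double s = 0.0;
            for (int l = r; l < i; ++l)
                s += t[r + (size_t)l * ldt] * t[l + (size_t)i * ldt];
            t[r + (size_t)i * ldt] = -tau[i] * s;
        }
        t[i + (size_t)i * ldt] = tau[i];
    }
}
```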

Input Parameters

m The number of rows in the matrix A (m ≥ n).

n The number of columns in A (n ≥ 0).

a Array, size at least max(1, lda*n) for column major and max(1, lda*m)
for row major layout. Array a contains the m-by-n matrix A.

lda The leading dimension of a; at least max(1, m) for column major layout and
max(1,n) for row major layout.

ldt The leading dimension of t; at least max(1, n).

Output Parameters

a Overwritten by the factorization data as follows:

The elements on and above the diagonal of the array contain the n-by-n
upper triangular matrix R. The elements below the diagonal are the
columns of V.

t Array, size at least max(1, ldt*n).

The n-by-n upper triangular factor of the block reflector. The elements on
and above the diagonal contain the block reflector T. The elements below
the diagonal are not used.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info < 0 and info = -i, the ith argument had an illegal value.

If info = -1011, memory allocation error occurred.

?geqrt3
Recursively computes a QR factorization of a general
real or complex matrix using the compact WY
representation of Q.

Syntax
lapack_int LAPACKE_sgeqrt3 (int matrix_layout , lapack_int m , lapack_int n , float *
a , lapack_int lda , float * t , lapack_int ldt );
lapack_int LAPACKE_dgeqrt3 (int matrix_layout , lapack_int m , lapack_int n , double *
a , lapack_int lda , double * t , lapack_int ldt );
lapack_int LAPACKE_cgeqrt3 (int matrix_layout , lapack_int m , lapack_int n ,
lapack_complex_float * a , lapack_int lda , lapack_complex_float * t , lapack_int
ldt );
lapack_int LAPACKE_zgeqrt3 (int matrix_layout , lapack_int m , lapack_int n ,
lapack_complex_double * a , lapack_int lda , lapack_complex_double * t , lapack_int
ldt );

Include Files
• mkl.h

Description

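The routine recursively computes a QR factorization of a real or complex m-by-n matrix A (m ≥ n) using the
compact WY representation of Q.

The strictly lower triangular part of the matrix V contains the elementary reflectors H(i) in the ith column
below the diagonal. For example, if m=5 and n=3, the matrix V is

$$V = \begin{pmatrix} 1 & & \\ v_1 & 1 & \\ v_1 & v_2 & 1 \\ v_1 & v_2 & v_3 \\ v_1 & v_2 & v_3 \end{pmatrix}$$

where $v_i$ represents the vector that defines H(i). The vectors are returned in the lower triangular part of
array a.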

NOTE
The 1s along the diagonal of V are not stored in a.

The block reflector H is then given by

$H = I - V T V^T$ for real flavors, and
$H = I - V T V^H$ for complex flavors,
where $V^T$ is the transpose and $V^H$ is the conjugate transpose of V.

Input Parameters

m The number of rows in the matrix A (m ≥ n).

n The number of columns in A (n ≥ 0).

a Array, size at least max(1, lda*n) for column major and max(1, lda*m)
for row major layout. Array a contains the m-by-n matrix A.

lda The leading dimension of a; at least max(1, m) for column major layout and
max(1,n) for row major layout.

ldt The leading dimension of t; at least max(1, n).

Output Parameters

a The elements on and above the diagonal of the array contain the n-by-n
upper triangular matrix R. The elements below the diagonal are the
columns of V.

t Array, size ldt by n.

The n-by-n upper triangular factor of the block reflector. The elements on
and above the diagonal contain the block reflector T. The elements below
the diagonal are not used.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info < 0 and info = -i, the ith argument had an illegal value.

If info = -1011, memory allocation error occurred.

?getf2
Computes the LU factorization of a general m-by-n
matrix using partial pivoting with row interchanges
(unblocked algorithm).

Syntax
lapack_int LAPACKE_sgetf2 (int matrix_layout, lapack_int m, lapack_int n, float* a,
lapack_int lda, lapack_int * ipiv);
lapack_int LAPACKE_dgetf2 (int matrix_layout, lapack_int m, lapack_int n, double* a,
lapack_int lda, lapack_int * ipiv);

lapack_int LAPACKE_cgetf2 (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_int * ipiv);
lapack_int LAPACKE_zgetf2 (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_int * ipiv);

Include Files
• mkl.h

Description

The routine computes the LU factorization of a general m-by-n matrix A using partial pivoting with row
interchanges. The factorization has the form
A = P*L*U
where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m >
n) and U is upper triangular (upper trapezoidal if m < n).
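The factorization can be sketched as a right-looking loop for real flavors in column-major storage. This is a hypothetical plain-C illustration, not the MKL code. Like the routine described here, it continues after an exactly zero pivot and reports the first one, and it stores 1-based pivot indices as LAPACK does.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical sketch: unblocked LU with partial pivoting, A = P*L*U,
   for an m-by-n column-major matrix a. Returns 0, or i > 0 if U(i,i) is
   exactly zero (1-based, first occurrence). ipiv receives 1-based rows. */
static int lu_unblocked(int m, int n, double *a, int lda, int *ipiv)
{
    int info = 0, k = m < n ? m : n;
    for (int j = 0; j < k; ++j) {
        int p = j;                                   /* pivot search in column j */
        for (int r = j + 1; r < m; ++r)
            if (fabs(a[r + (size_t)j * lda]) > fabs(a[p + (size_t)j * lda]))
                p = r;
        ipiv[j] = p + 1;
        if (a[p + (size_t)j * lda] != 0.0) {
            if (p != j)
                for (int c = 0; c < n; ++c) {        /* swap entire rows j and p */
                    double tmp = a[j + (size_t)c * lda];
                    a[j + (size_t)c * lda] = a[p + (size_t)c * lda];
                    a[p + (size_t)c * lda] = tmp;
                }
            for (int r = j + 1; r < m; ++r)          /* multipliers = column of L */
                a[r + (size_t)j * lda] /= a[j + (size_t)j * lda];
        } else if (info == 0) {
            info = j + 1;                            /* zero pivot: record, go on */
        }
        for (int c = j + 1; c < n; ++c)              /* rank-1 trailing update */
            for (int r = j + 1; r < m; ++r)
                a[r + (size_t)c * lda] -= a[r + (size_t)j * lda] * a[j + (size_t)c * lda];
    }
    return info;
}
```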

Input Parameters

m The number of rows in the matrix A (m≥ 0).

n The number of columns in A (n≥ 0).

a Array, size at least max(1, lda*n) for column major and max(1, lda*m)
for row major layout. Array a contains the m-by-n matrix A.

lda The leading dimension of a; at least max(1, m) for column major layout and
max(1,n) for row major layout.

Output Parameters

a Overwritten by L and U. The unit diagonal elements of L are not stored.

ipiv Array, size at least max(1,min(m,n)).

The pivot indices: for 1 ≤ i ≤ min(m, n), row i was interchanged with row ipiv[i - 1].

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i > 0, U(i,i) is exactly zero. The factorization has been completed, but U is exactly singular, so division
by zero will occur if you use the factor U to solve a system of linear equations.

If info = -1011, a memory allocation error occurred.
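To make the algorithm concrete, here is a minimal sketch of unblocked LU factorization with partial pivoting for a column-major double matrix. It does not call the library and is not its implementation: the function name getf2_sketch is hypothetical, only column-major storage is handled, and ipiv is 1-based to match the pivot-index convention described above.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical sketch of unblocked LU with partial pivoting (column-major).
   On exit, a holds L (unit diagonal not stored) and U; ipiv is 1-based.
   Returns 0, or i > 0 if U(i,i) is exactly zero (first such i). */
int getf2_sketch(int m, int n, double *a, int lda, int *ipiv)
{
    int info = 0;
    int kmax = m < n ? m : n;
    for (int j = 0; j < kmax; ++j) {
        /* Pivot search: largest |a(i,j)| for i >= j. */
        int p = j;
        for (int i = j + 1; i < m; ++i)
            if (fabs(a[i + (size_t)j * lda]) > fabs(a[p + (size_t)j * lda]))
                p = i;
        ipiv[j] = p + 1;
        if (a[p + (size_t)j * lda] != 0.0) {
            /* Interchange rows j and p across all n columns. */
            if (p != j)
                for (int c = 0; c < n; ++c) {
                    double t = a[j + (size_t)c * lda];
                    a[j + (size_t)c * lda] = a[p + (size_t)c * lda];
                    a[p + (size_t)c * lda] = t;
                }
            /* Multipliers: column j of L below the diagonal. */
            for (int i = j + 1; i < m; ++i)
                a[i + (size_t)j * lda] /= a[j + (size_t)j * lda];
        } else if (info == 0) {
            info = j + 1; /* zero pivot: record it and keep going */
        }
        /* Rank-1 update of the trailing submatrix. */
        for (int c = j + 1; c < n; ++c)
            for (int i = j + 1; i < m; ++i)
                a[i + (size_t)c * lda] -= a[i + (size_t)j * lda] * a[j + (size_t)c * lda];
    }
    return info;
}
```

For example, factoring the column-major 2-by-2 array {2, 4, 1, 3} selects row 2 as the first pivot and yields ipiv = {2, 2} with info = 0.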

?lacn2
Estimates the 1-norm of a square matrix, using
reverse communication for evaluating matrix-vector
products.


Syntax
lapack_int LAPACKE_slacn2 (lapack_int n, float * v, float * x, lapack_int * isgn, float
* est, lapack_int * kase, lapack_int * isave);
lapack_int LAPACKE_clacn2 (lapack_int n, lapack_complex_float * v, lapack_complex_float
* x, float * est, lapack_int * kase, lapack_int * isave);
lapack_int LAPACKE_dlacn2 (lapack_int n, double * v, double * x, lapack_int * isgn,
double * est, lapack_int * kase, lapack_int * isave);
lapack_int LAPACKE_zlacn2 (lapack_int n, lapack_complex_double * v,
lapack_complex_double * x, double * est, lapack_int * kase, lapack_int * isave);

Include Files
• mkl.h

Description

The routine estimates the 1-norm of a square, real or complex matrix A. Reverse communication is used for
evaluating matrix-vector products.

Input Parameters

n The order of the matrix A (n≥ 1).

v, x Arrays, size (n) each.

v is a workspace array.
x is used as input after an intermediate return.

isgn Workspace array, size (n), used with real flavors only.

est On entry with kase set to 1 or 2, and isave[0] = 1, est must be unchanged from the previous call to the routine.

kase On the initial call to the routine, kase must be set to 0.

isave Array, size (3).

Contains variables from the previous call to the routine.

Output Parameters

est An estimate (a lower bound) for norm(A).

kase On an intermediate return, kase is set to 1 or 2, indicating whether x should be overwritten by A*x or AT*x
for real flavors, and by A*x or AH*x for complex flavors.
On the final return, kase is set to 0.

v On the final return, v = A*w, where est = norm(v)/norm(w) (w is not returned).

x On an intermediate return, the caller must overwrite x with

A*x, if kase = 1,
AT*x, if kase = 2 (for real flavors),
AH*x, if kase = 2 (for complex flavors),

and then call the routine again with all the other parameters unchanged.

isave This parameter is used to save variables between calls to the routine.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

?lacpy
Copies all or part of one two-dimensional array to
another.

Syntax
lapack_int LAPACKE_slacpy (int matrix_layout, char uplo, lapack_int m, lapack_int n,
const float* a, lapack_int lda, float* b, lapack_int ldb);
lapack_int LAPACKE_dlacpy (int matrix_layout, char uplo, lapack_int m, lapack_int n,
const double* a, lapack_int lda, double* b, lapack_int ldb);
lapack_int LAPACKE_clacpy (int matrix_layout, char uplo, lapack_int m, lapack_int n,
const lapack_complex_float* a, lapack_int lda, lapack_complex_float* b, lapack_int
ldb);
lapack_int LAPACKE_zlacpy (int matrix_layout, char uplo, lapack_int m, lapack_int n,
const lapack_complex_double* a, lapack_int lda, lapack_complex_double* b, lapack_int
ldb);

Include Files
• mkl.h

Description

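The routine copies all or part of a two-dimensional matrix A to another matrix B.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).
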
uplo Specifies the part of the matrix A to be copied to B.

If uplo = 'U', the upper triangular part of A;

if uplo = 'L', the lower triangular part of A.

Otherwise, all of the matrix A is copied.

m The number of rows in the matrix A (m≥ 0).

n The number of columns in A (n≥ 0).


a Array, size at least max(1, lda*n) for column major and max(1, lda*m)
for row major layout. Array a contains the m-by-n matrix A.

If uplo = 'U', only the upper triangle or trapezoid is accessed; if uplo = 'L', only the lower triangle or
trapezoid is accessed.

lda The leading dimension of a; lda ≥ max(1,m) for column major layout and lda ≥ max(1,n) for row major layout.

ldb The leading dimension of the output array b; ldb ≥ max(1,m) for column major layout and ldb ≥ max(1,n)
for row major layout.

Output Parameters

b Array, size at least max(1, ldb*n) for column major and max(1, ldb*m)
for row major layout. Array b contains the m-by-n matrix B.

On exit, B = A in the locations specified by uplo.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.
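The following sketch illustrates the uplo semantics for column-major storage. The name lacpy_sketch is hypothetical and the code does not call the library; accepting lowercase 'u'/'l' is an assumption borrowed from reference LAPACK's case-insensitive option checks.

```c
#include <stddef.h>

/* Hypothetical sketch of ?lacpy semantics for column-major double matrices.
   uplo 'U': copy the upper triangle/trapezoid (rows i <= j);
   uplo 'L': copy the lower triangle/trapezoid (rows i >= j);
   any other value: copy the whole m-by-n matrix.
   Elements of b outside the selected part are left untouched. */
void lacpy_sketch(char uplo, int m, int n, const double *a, int lda,
                  double *b, int ldb)
{
    for (int j = 0; j < n; ++j) {
        int lo = 0, hi = m;                /* copy rows [lo, hi) of column j */
        if (uplo == 'U' || uplo == 'u')
            hi = (j + 1 < m) ? j + 1 : m;  /* rows 0..min(j, m-1) */
        else if (uplo == 'L' || uplo == 'l')
            lo = j;                        /* rows j..m-1 (empty if j >= m) */
        for (int i = lo; i < hi; ++i)
            b[i + (size_t)j * ldb] = a[i + (size_t)j * lda];
    }
}
```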

?lakf2
Forms a matrix containing Kronecker products
between the given matrices.

Syntax
void slakf2 (lapack_int *m, lapack_int *n, float *a, lapack_int *lda, float *b, float
*d, float *e, float *z, lapack_int *ldz);
void dlakf2 (lapack_int *m, lapack_int *n, double *a, lapack_int *lda, double *b, double
*d, double *e, double *z, lapack_int *ldz);
void clakf2 (lapack_int *m, lapack_int *n, lapack_complex *a, lapack_int *lda,
lapack_complex *b, lapack_complex *d, lapack_complex *e, lapack_complex *z, lapack_int
*ldz);
void zlakf2 (lapack_int *m, lapack_int *n, lapack_complex_double *a, lapack_int *lda,
lapack_complex_double *b, lapack_complex_double *d, lapack_complex_double *e,
lapack_complex_double *z, lapack_int *ldz);

Include Files
• mkl.h

Description

The routine ?lakf2 forms the 2mn-by-2mn matrix

$$ Z = \begin{pmatrix} \mathrm{kron}(I_n, A) & -\mathrm{kron}(B^T, I_m) \\ \mathrm{kron}(I_n, D) & -\mathrm{kron}(E^T, I_m) \end{pmatrix} $$

where In is the identity matrix of size n and XT is the transpose of X. kron(X, Y) is the Kronecker product
between the matrices X and Y.

Input Parameters

m Size of matrix, m≥ 1

n Size of matrix, n≥ 1

a Array, size lda-by-m. The matrix A in the output matrix Z.

lda The leading dimension of a, b, d, and e. lda≥m+n.

b Array, size lda by n. Matrix used in forming the output matrix Z.

d Array, size lda by m. Matrix used in forming the output matrix Z.

e Array, size lda by n. Matrix used in forming the output matrix Z.

ldz The leading dimension of Z. ldz≥ 2* m*n.

Output Parameters

z Array, size ldz-by-2mn. The resultant 2mn-by-2mn Kronecker matrix.
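As an illustration, the sketch below assumes the block layout Z = [kron(In, A), -kron(BT, Im); kron(In, D), -kron(ET, Im)] used by reference LAPACK's dlakf2, with A and D of order m and B and E of order n. The function lakf2_sketch is hypothetical, handles double-precision column-major data only, and needs only lda ≥ max(m, n), whereas the routine documents lda ≥ m+n.

```c
#include <stddef.h>

/* Column-major index of element (i, j) in an array with leading dimension ld. */
#define IDX(i, j, ld) ((size_t)(i) + (size_t)(j) * (size_t)(ld))

/* Hypothetical sketch: form the 2mn-by-2mn matrix
       Z = [ kron(In, A)  -kron(B^T, Im) ]
           [ kron(In, D)  -kron(E^T, Im) ]
   A and D are m-by-m, B and E are n-by-n, all column-major with
   leading dimension lda; Z uses ldz >= 2*m*n. */
void lakf2_sketch(int m, int n, const double *a, int lda, const double *b,
                  const double *d, const double *e, double *z, int ldz)
{
    int mn = m * n;
    for (int j = 0; j < 2 * mn; ++j)          /* start from Z = 0 */
        for (int i = 0; i < 2 * mn; ++i)
            z[IDX(i, j, ldz)] = 0.0;

    /* kron(In, X) places n copies of X on the block diagonal. */
    for (int l = 0; l < n; ++l)
        for (int j = 0; j < m; ++j)
            for (int i = 0; i < m; ++i) {
                z[IDX(l * m + i, l * m + j, ldz)] = a[IDX(i, j, lda)];      /* top-left */
                z[IDX(mn + l * m + i, l * m + j, ldz)] = d[IDX(i, j, lda)]; /* bottom-left */
            }

    /* kron(X^T, Im): block (i, j) is X(j, i) * Im; these blocks enter Z negated. */
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i)
            for (int k = 0; k < m; ++k) {
                z[IDX(i * m + k, mn + j * m + k, ldz)] = -b[IDX(j, i, lda)];      /* top-right */
                z[IDX(mn + i * m + k, mn + j * m + k, ldz)] = -e[IDX(j, i, lda)]; /* bottom-right */
            }
}
```

With m = n = 1 the result reduces to the 2-by-2 matrix [a, -b; d, -e].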

?lange
Returns the value of the 1-norm, Frobenius norm,
infinity-norm, or the largest absolute value of any
element of a general rectangular matrix.

Syntax
float LAPACKE_slange (int matrix_layout, char norm, lapack_int m, lapack_int n, const
float * a, lapack_int lda);
double LAPACKE_dlange (int matrix_layout, char norm, lapack_int m, lapack_int n, const
double * a, lapack_int lda);
float LAPACKE_clange (int matrix_layout, char norm, lapack_int m, lapack_int n, const
lapack_complex_float * a, lapack_int lda);
double LAPACKE_zlange (int matrix_layout, char norm, lapack_int m, lapack_int n, const
lapack_complex_double * a, lapack_int lda);

Include Files
• mkl.h

Description

The function ?lange returns the value of the 1-norm, or the Frobenius norm, or the infinity norm, or the
element of largest absolute value of a real/complex matrix A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

norm Specifies the value to be returned by the routine:

= 'M' or 'm': val = max(abs(Aij)), largest absolute value of the matrix
A.
= '1' or 'O' or 'o': val = norm1(A), 1-norm of the matrix A
(maximum column sum),
= 'I' or 'i': val = normI(A), infinity norm of the matrix A (maximum
row sum),
= 'F', 'f', 'E' or 'e': val = normF(A), Frobenius norm of the matrix
A (square root of sum of squares).

m The number of rows of the matrix A.

m ≥ 0. When m = 0, ?lange returns zero.

n The number of columns of the matrix A.

n ≥ 0. When n = 0, ?lange returns zero.

a Array, size at least max(1, lda*n) for column major and max(1, lda*m)
for row major layout. Array a contains the m-by-n matrix A.

lda The leading dimension of the array a.

lda ≥ max(1,m) for column major layout and lda ≥ max(1,n) for row major
layout.

?lansy
Returns the value of the 1-norm, or the Frobenius
norm, or the infinity norm, or the element of largest
absolute value of a real/complex symmetric matrix.

Syntax
float LAPACKE_slansy (int matrix_layout, char norm, char uplo, lapack_int n, const
float * a, lapack_int lda);
double LAPACKE_dlansy (int matrix_layout, char norm, char uplo, lapack_int n, const
double * a, lapack_int lda);
float LAPACKE_clansy (int matrix_layout, char norm, char uplo, lapack_int n, const
lapack_complex_float * a, lapack_int lda);
double LAPACKE_zlansy (int matrix_layout, char norm, char uplo, lapack_int n, const
lapack_complex_double * a, lapack_int lda);

Include Files
• mkl.h

Description

The function ?lansy returns the value of the 1-norm, or the Frobenius norm, or the infinity norm, or the
element of largest absolute value of a real/complex symmetric matrix A.

Input Parameters
A <datatype> placeholder, if present, is used for the C interface data types in the C interface section above.
See C Interface Conventions for the C interface principal conventions and type definitions.

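matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

norm Specifies the value to be returned by the routine:
= 'M' or 'm': val = max(abs(Aij)), largest absolute value of the matrix A.
= '1' or 'O' or 'o': val = norm1(A), 1-norm of the matrix A (maximum column sum),
= 'I' or 'i': val = normI(A), infinity norm of the matrix A (maximum row sum),
= 'F', 'f', 'E' or 'e': val = normF(A), Frobenius norm of the matrix A (square root of sum of squares).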

uplo Specifies whether the upper or lower triangular part of the symmetric
matrix A is to be referenced.
= 'U': Upper triangular part of A is referenced.

= 'L': Lower triangular part of A is referenced

n The order of the matrix A. n ≥ 0. When n = 0, ?lansy returns zero.

a Array, size at least max(1,lda*n). The symmetric matrix A.

If uplo = 'U', the leading n-by-n upper triangular part of a contains the
upper triangular part of the matrix A, and the strictly lower triangular part
of a is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of a contains the
lower triangular part of the matrix A, and the strictly upper triangular part
of a is not referenced.

lda The leading dimension of the array a.

lda≥ max(n,1).

?lanhe
Returns the value of the 1-norm, or the Frobenius
norm, or the infinity norm, or the element of largest
absolute value of a complex Hermitian matrix.

Syntax
float LAPACKE_clanhe (int matrix_layout, char norm, char uplo, lapack_int n, const
lapack_complex_float * a, lapack_int lda);
double LAPACKE_zlanhe (int matrix_layout, char norm, char uplo, lapack_int n, const
lapack_complex_double * a, lapack_int lda);

Include Files
• mkl.h

Description


The function ?lanhe returns the value of the 1-norm, or the Frobenius norm, or the infinity norm, or the
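element of largest absolute value of a complex Hermitian matrix A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).
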
norm Specifies the value to be returned by the routine:

= 'M' or 'm': val = max(abs(Aij)), largest absolute value of the matrix A.

= '1' or 'O' or 'o': val = norm1(A), 1-norm of the matrix A (maximum column sum),
= 'I' or 'i': val = normI(A), infinity norm of the matrix A (maximum
row sum),
= 'F', 'f', 'E' or 'e': val = normF(A), Frobenius norm of the matrix A
(square root of sum of squares).

uplo Specifies whether the upper or lower triangular part of the Hermitian matrix
A is to be referenced.
= 'U': Upper triangular part of A is referenced.

= 'L': Lower triangular part of A is referenced

n The order of the matrix A. n ≥ 0. When n = 0, ?lanhe returns zero.

a Array, size at least max(1, lda*n). The Hermitian matrix A.

lda The leading dimension of the array a.

lda≥ max(n,1).

?lantr
Returns the value of the 1-norm, or the Frobenius
norm, or the infinity norm, or the element of largest
absolute value of a trapezoidal or triangular matrix.

Syntax
float LAPACKE_slantr (int matrix_layout, char norm, char uplo, char diag, lapack_int m,
lapack_int n, const float * a, lapack_int lda);
double LAPACKE_dlantr (int matrix_layout, char norm, char uplo, char diag, lapack_int m,
lapack_int n, const double * a, lapack_int lda);
float LAPACKE_clantr (int matrix_layout, char norm, char uplo, char diag, lapack_int m,
lapack_int n, const lapack_complex_float * a, lapack_int lda);
double LAPACKE_zlantr (int matrix_layout, char norm, char uplo, char diag, lapack_int m,
lapack_int n, const lapack_complex_double * a, lapack_int lda);

Include Files
• mkl.h

Description

The function ?lantr returns the value of the 1-norm, or the Frobenius norm, or the infinity norm, or the
element of largest absolute value of a trapezoidal or triangular matrix A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

norm Specifies the value to be returned by the routine:
= 'M' or 'm': val = max(abs(Aij)), largest absolute value of the matrix A.
= '1' or 'O' or 'o': val = norm1(A), 1-norm of the matrix A (maximum column sum),
= 'I' or 'i': val = normI(A), infinity norm of the matrix A (maximum row sum),
= 'F', 'f', 'E' or 'e': val = normF(A), Frobenius norm of the matrix A (square root of sum of squares).

uplo Specifies whether the matrix A is upper or lower trapezoidal.

= 'U': Upper trapezoidal

= 'L': Lower trapezoidal.

Note that A is triangular instead of trapezoidal if m = n.

diag Specifies whether or not the matrix A has unit diagonal.

= 'N': Non-unit diagonal

= 'U': Unit diagonal.

m The number of rows of the matrix A. m ≥ 0, and if uplo = 'U', m ≤ n.

When m = 0, ?lantr returns zero.

n The number of columns of the matrix A. n ≥ 0, and if uplo = 'L', n ≤ m.

When n = 0, ?lantr returns zero.

a Array, size at least max(1, lda*n) for column major and max(1, lda*m)
for row major layout.
The trapezoidal matrix A (A is triangular if m = n).

If uplo = 'U', the leading m-by-n upper trapezoidal part of the array a
contains the upper trapezoidal matrix, and the strictly lower triangular part
of A is not referenced.


If uplo = 'L', the leading m-by-n lower trapezoidal part of the array a
contains the lower trapezoidal matrix, and the strictly upper triangular part
of A is not referenced. Note that when diag = 'U', the diagonal elements
of A are not referenced and are assumed to be one.

lda The leading dimension of the array a.

lda ≥ max(1,m) for column major layout and lda ≥ max(1,n) for row major
layout.

LAPACKE_set_nancheck
Turns NaN checking off or on

void LAPACKE_set_nancheck(int flag);

Description
The routine sets a value for the LAPACKE NaN checking flag, which indicates whether or not LAPACKE
routines check input matrices for NaNs.

Input Parameters

flag If flag= 0, NaN checking is turned OFF. Otherwise, it is turned ON.

LAPACKE_get_nancheck
Gets the current NaN checking flag, which indicates
whether NaN checking has been turned off or on.
int flag = LAPACKE_get_nancheck ();

Description
The function returns the current value for the LAPACKE NaN checking flag, which indicates whether or not
LAPACKE routines check input matrices for NaNs.

Return Value
An integer value is returned which indicates the current NaN checking status.
The returned flag value is either 0 (OFF) or 1 (ON), even though any integer value can be used as an input
parameter for LAPACKE_set_nancheck.

For example, the following code turns on NaN checking:

LAPACKE_set_nancheck(100);
int flag = LAPACKE_get_nancheck(); // flag==1, not 100.

?lapmr
Rearranges rows of a matrix as specified by a
permutation vector.

Syntax
lapack_int LAPACKE_slapmr (int matrix_layout, lapack_logical forwrd, lapack_int m,
lapack_int n, float* x, lapack_int ldx, lapack_int * k);
lapack_int LAPACKE_dlapmr (int matrix_layout, lapack_logical forwrd, lapack_int m,
lapack_int n, double* x, lapack_int ldx, lapack_int * k);

lapack_int LAPACKE_clapmr (int matrix_layout, lapack_logical forwrd, lapack_int m,
lapack_int n, lapack_complex_float* x, lapack_int ldx, lapack_int * k);
lapack_int LAPACKE_zlapmr (int matrix_layout, lapack_logical forwrd, lapack_int m,
lapack_int n, lapack_complex_double* x, lapack_int ldx, lapack_int * k);

Include Files
• mkl.h

Description
The ?lapmr routine rearranges the rows of the m-by-n matrix X as specified by the permutation k[0],
k[1], ..., k[m-1] of the integers 1, ..., m.
If forwrd is true, forward permutation:
X(k[i-1],:) is moved to X(i,:) for i = 1,2,...,m.
If forwrd is false, backward permutation:

X(i,:) is moved to X(k[i-1],:) for i = 1,2,...,m.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

forwrd If forwrd is true, forward permutation.

If forwrd is false, backward permutation.

m The number of rows of the matrix X. m≥ 0.

n The number of columns of the matrix X. n≥ 0.

x Array, size at least max(1, ldx*n) for column major and max(1, ldx*m)
for row major layout. On entry, the m-by-n matrix X.

ldx The leading dimension of the array X, ldx≥ max(1,m)for column major
layout and ldx≥ max(1,n) for row major layout.

k Array, size (m). On entry, k contains the permutation vector and is used as
internal workspace.

Output Parameters

x On exit, x contains the permuted matrix X.

k On exit, k is reset to its original value.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = -1011, a memory allocation error occurred.
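The following sketch applies the forward and backward row permutations defined above to a column-major double matrix. The name lapmr_sketch is hypothetical. It uses a temporary copy instead of working in place with k as workspace, as the documented routine does; k is 1-based and is not modified.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of ?lapmr semantics for a column-major m-by-n matrix.
   k holds a permutation of 1..m (1-based).
   forwrd != 0: row k[i]-1 of the input becomes row i of the output.
   forwrd == 0: row i of the input becomes row k[i]-1 of the output.
   Returns 0 on success, -1 if the temporary copy cannot be allocated. */
int lapmr_sketch(int forwrd, int m, int n, double *x, int ldx, const int *k)
{
    if (m <= 1 || n == 0)
        return 0;                             /* nothing to permute */
    size_t total = (size_t)ldx * (size_t)n;
    double *copy = malloc(total * sizeof *copy);
    if (copy == NULL)
        return -1;
    memcpy(copy, x, total * sizeof *copy);
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            size_t src = forwrd ? (size_t)(k[i] - 1) : (size_t)i;
            size_t dst = forwrd ? (size_t)i : (size_t)(k[i] - 1);
            x[dst + (size_t)j * ldx] = copy[src + (size_t)j * ldx];
        }
    free(copy);
    return 0;
}
```

Applying the backward permutation after the forward one with the same k restores the original matrix.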

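?lapmt
Performs a forward or backward permutation of the columns of a matrix.

Description
The ?lapmt routine rearranges the columns of the m-by-n matrix X as specified by the permutation k(1), k(2),
..., k(n) of the integers 1, ..., n.
If forwrd ≠ 0, forward permutation:

X(:,k(j)) is moved to X(:,j) for j = 1,2,...,n.

If forwrd = 0, backward permutation:

X(:,j) is moved to X(:,k(j)) for j = 1,2,...,n.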

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)

or column major (LAPACK_COL_MAJOR).

forwrd If forwrd≠ 0, forward permutation

If forwrd = 0, backward permutation

m The number of rows of the matrix X. m≥ 0.

n The number of columns of the matrix X. n≥ 0.

x Array, size ldx*n. On entry, the m-by-n matrix X.

ldx The leading dimension of the array x, ldx≥ max(1,m).

k Array, size (n). On entry, k contains the permutation vector and is used as
internal workspace.

Output Parameters

x On exit, x contains the permuted matrix X.

k On exit, k is reset to its original value.

See Also
?lapmr

?lapy2
Returns sqrt(x**2+y**2).

Syntax
float LAPACKE_slapy2 (float x, float y);
double LAPACKE_dlapy2 (double x, double y);

Include Files
• mkl.h

Description

The function ?lapy2 returns sqrt(x**2+y**2), avoiding unnecessary overflow or harmful underflow.

x, y Specify the input values x and y.

Return Values
The function returns a value val.

If val = -1.0, the first argument was NaN.

If val = -2.0, the second argument was NaN.

?lapy3
Returns sqrt(x**2+y**2+z**2).

Syntax
float LAPACKE_slapy3 (float x, float y, float z);
double LAPACKE_dlapy3 (double x, double y, double z);

Include Files
• mkl.h

Description

The function ?lapy3 returns sqrt(x**2+y**2+z**2), avoiding unnecessary overflow or harmful underflow.

x, y, z Specify the input values x, y and z.


Return Values
This function returns a value val.

If val = -1.0, the first argument was NaN.

If val = -2.0, the second argument was NaN.

If val = -3.0, the third argument was NaN.

?laran
Returns a random real number from a uniform
distribution.

Syntax
float slaran (lapack_int *iseed);
double dlaran (lapack_int *iseed);

Description

The ?laran routine returns a random real number from a uniform (0,1) distribution. This routine uses a
multiplicative congruential method with modulus 2**48 and multiplier 33952834046453. 48-bit integers are
stored in four integer array elements with 12 bits per element. Hence the routine is portable across machines
with integers of 32 bits or more.

Input Parameters

iseed Array, size 4. On entry, the seed of the random number generator. The
array elements must be between 0 and 4095, and iseed[3] must be odd.

Output Parameters

iseed On exit, the seed is updated.

Return Values
The function returns a random number.
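Because 2**48 divides 2**64, the recurrence can be evaluated with wrapping unsigned 64-bit multiplication followed by a 48-bit mask. The hypothetical laran_sketch below does so, assuming (as in reference LAPACK) that iseed[0] holds the most significant 12 bits and iseed[3] the least significant; it uses int in place of lapack_int and is an illustration, not the library routine.

```c
#include <stdint.h>

/* Hypothetical sketch of a multiplicative congruential generator with
   modulus 2^48 and multiplier 33952834046453, seeded by four 12-bit digits
   (iseed[0] most significant, iseed[3] least significant and odd). */
double laran_sketch(int iseed[4])
{
    const uint64_t mult = 33952834046453ULL;    /* fits in 48 bits */
    const uint64_t mask48 = (1ULL << 48) - 1;   /* reduction modulo 2^48 */
    uint64_t s = ((uint64_t)iseed[0] << 36) | ((uint64_t)iseed[1] << 24)
               | ((uint64_t)iseed[2] << 12) | (uint64_t)iseed[3];
    /* Unsigned arithmetic wraps modulo 2^64; since 2^48 divides 2^64,
       masking afterwards gives the product modulo 2^48 exactly. */
    s = (s * mult) & mask48;
    iseed[0] = (int)((s >> 36) & 4095);
    iseed[1] = (int)((s >> 24) & 4095);
    iseed[2] = (int)((s >> 12) & 4095);
    iseed[3] = (int)(s & 4095);
    /* A 48-bit integer is exact in a double; an odd seed never maps to 0. */
    return (double)s / 281474976710656.0;       /* divide by 2^48 */
}
```

Starting from iseed = {0, 0, 0, 1}, one call leaves the multiplier's own 12-bit digits {494, 322, 2508, 2549} in iseed.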

?larfb
Applies a block reflector or its transpose/conjugate-
transpose to a general rectangular matrix.

Syntax
lapack_int LAPACKE_slarfb (int matrix_layout , char side , char trans , char direct ,
char storev , lapack_int m , lapack_int n , lapack_int k , const float * v , lapack_int
ldv , const float * t , lapack_int ldt , float * c , lapack_int ldc );
lapack_int LAPACKE_dlarfb (int matrix_layout , char side , char trans , char direct ,
char storev , lapack_int m , lapack_int n , lapack_int k , const double * v ,
lapack_int ldv , const double * t , lapack_int ldt , double * c , lapack_int
ldc );
lapack_int LAPACKE_clarfb (int matrix_layout , char side , char trans , char
direct , char storev , lapack_int m , lapack_int n , lapack_int k , const
lapack_complex_float * v , lapack_int ldv , const lapack_complex_float * t , lapack_int
ldt , lapack_complex_float * c , lapack_int ldc );

lapack_int LAPACKE_zlarfb (int matrix_layout , char side , char trans , char direct ,
char storev , lapack_int m , lapack_int n , lapack_int k , const lapack_complex_double
* v , lapack_int ldv , const lapack_complex_double * t , lapack_int ldt ,
lapack_complex_double * c , lapack_int ldc );

Include Files
• mkl.h

Description
The real flavors of the routine ?larfb apply a real block reflector H or its transpose HT to a real m-by-n
matrix C from either left or right.
The complex flavors of the routine ?larfb apply a complex block reflector H or its conjugate transpose HH to
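a complex m-by-n matrix C from either left or right.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).
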
side If side = 'L': apply H or HT for real flavors and H or HH for complex
flavors from the left.
If side = 'R': apply H or HT for real flavors and H or HH for complex
flavors from the right.

trans If trans = 'N': apply H (No transpose).

If trans = 'C': apply HH (Conjugate transpose).

If trans = 'T': apply HT (Transpose).

direct Indicates how H is formed from a product of elementary reflectors

If direct = 'F': H = H(1)*H(2)*. . . *H(k) (forward)

If direct = 'B': H = H(k)* . . . H(2)*H(1) (backward)

storev Indicates how the vectors which define the elementary reflectors are
stored:
If storev = 'C': Column-wise

If storev = 'R': Row-wise

m The number of rows of the matrix C.

n The number of columns of the matrix C.

k The order of the matrix T (equal to the number of elementary reflectors

whose product defines the block reflector).

v The size limitations depend on values of parameters storev and side as described in the following table:

Layout       | storev = 'C', side = 'L' | storev = 'C', side = 'R' | storev = 'R', side = 'L' | storev = 'R', side = 'R'
Column major | max(1,ldv*k)             | max(1,ldv*k)             | max(1,ldv*m)             | max(1,ldv*n)
Row major    | max(1,ldv*m)             | max(1,ldv*n)             | max(1,ldv*k)             | max(1,ldv*k)

The matrix v. See Application Notes below.

ldv The leading dimension of the array v. It should satisfy the following conditions:

Layout       | storev = 'C', side = 'L' | storev = 'C', side = 'R' | storev = 'R', side = 'L' | storev = 'R', side = 'R'
Column major | max(1,m)                 | max(1,n)                 | max(1,k)                 | max(1,k)
Row major    | max(1,k)                 | max(1,k)                 | max(1,m)                 | max(1,n)

t Array, size at least max(1,ldt * k).

Contains the triangular k-by-k matrix T in the representation of the block
reflector.

ldt The leading dimension of the array t.

ldt≥k.

c Array, size at least max(1, ldc * n) for column major layout and max(1, ldc
* m) for row major layout.
On entry, the m-by-n matrix C.

ldc The leading dimension of the array c.

ldc≥ max(1,m) for column major layout and ldc≥ max(1,n) for row
major layout.

Output Parameters

c On exit, c is overwritten by the product of the following:

• H*C, or HT*C, or C*H, or C*HT for real flavors
• H*C, or HH*C, or C*H, or C*HH for complex flavors

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = -1011, a memory allocation error occurred.

Application Notes
The shape of the matrix V and the storage of the vectors which define the H(i) is best illustrated by the
following example with n = 5 and k = 3. The elements equal to 1 are not stored; the corresponding array
elements are modified but restored on exit. The rest of the array is not used.


?larfg
Generates an elementary reflector (Householder
matrix).

Syntax
lapack_int LAPACKE_slarfg (lapack_int n , float * alpha , float * x , lapack_int incx ,
float * tau );
lapack_int LAPACKE_dlarfg (lapack_int n , double * alpha , double * x , lapack_int
incx , double * tau );
lapack_int LAPACKE_clarfg (lapack_int n , lapack_complex_float * alpha ,
lapack_complex_float * x , lapack_int incx , lapack_complex_float * tau );
lapack_int LAPACKE_zlarfg (lapack_int n , lapack_complex_double * alpha ,
lapack_complex_double * x , lapack_int incx , lapack_complex_double * tau );

Include Files
• mkl.h

Description

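The routine ?larfg generates a real/complex elementary reflector H of order n, such that

$$ H^T \begin{pmatrix} \alpha \\ x \end{pmatrix} = \begin{pmatrix} \beta \\ 0 \end{pmatrix}, \qquad H^T H = I $$

for real flavors and

$$ H^H \begin{pmatrix} \alpha \\ x \end{pmatrix} = \begin{pmatrix} \beta \\ 0 \end{pmatrix}, \qquad H^H H = I $$

for complex flavors,

where alpha and beta are scalars (with beta real for all flavors), and x is an (n-1)-element real/complex
vector. H is represented in the form

$$ H = I - \tau \begin{pmatrix} 1 \\ v \end{pmatrix} \begin{pmatrix} 1 & v^T \end{pmatrix} $$

for real flavors and

$$ H = I - \tau \begin{pmatrix} 1 \\ v \end{pmatrix} \begin{pmatrix} 1 & v^H \end{pmatrix} $$

for complex flavors,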

where tau is a real/complex scalar and v is a real/complex (n-1)-element vector, respectively. Note that for
clarfg/zlarfg, H is not Hermitian.
If the elements of x are all zero (and, for complex flavors, alpha is real), then tau = 0 and H is taken to be
the unit matrix.
Otherwise, 1 ≤ tau ≤ 2 (for real flavors), or

1 ≤ Re(tau) ≤ 2 and abs(tau-1) ≤ 1 (for complex flavors).

Input Parameters

n The order of the elementary reflector.

alpha On entry, the value alpha.

x Array, size (1+(n-2)*abs(incx)).
On entry, the vector x.

incx The increment between elements of x. incx > 0.

Output Parameters

alpha On exit, it is overwritten with the value beta.

x On exit, it is overwritten with the vector v.

tau The value tau.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info = -2, alpha is NaN

If info = -3, array x contains NaN components.

?larft
Forms the triangular factor T of a block reflector H = I
- V*T*V**H.

Syntax
lapack_int LAPACKE_slarft (int matrix_layout , char direct , char storev , lapack_int
n , lapack_int k , const float * v , lapack_int ldv , const float * tau , float * t ,
lapack_int ldt );
lapack_int LAPACKE_dlarft (int matrix_layout , char direct , char storev , lapack_int
n , lapack_int k , const double * v , lapack_int ldv , const double * tau , double * t ,
lapack_int ldt );
lapack_int LAPACKE_clarft (int matrix_layout , char direct , char storev , lapack_int
n , lapack_int k , const lapack_complex_float * v , lapack_int ldv , const
lapack_complex_float * tau , lapack_complex_float * t , lapack_int ldt );
lapack_int LAPACKE_zlarft (int matrix_layout , char direct , char storev , lapack_int
n , lapack_int k , const lapack_complex_double * v , lapack_int ldv , const
lapack_complex_double * tau , lapack_complex_double * t , lapack_int ldt );

Include Files
• mkl.h

Description
The routine ?larft forms the triangular factor T of a real/complex block reflector H of order n, which is
defined as a product of k elementary reflectors.
If direct = 'F', H = H(1)*H(2)* . . .*H(k) and T is upper triangular;

If direct = 'B', H = H(k). . .H(2)*H(1) and T is lower triangular.

If storev = 'C', the vector which defines the elementary reflector H(i) is stored in the i-th column of the
array v, and H = I - V*T*VT (for real flavors) or H = I - V*T*VH (for complex flavors) .

If storev = 'R', the vector which defines the elementary reflector H(i) is stored in the i-th row of the array
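v, and H = I - VT*T*V (for real flavors) or H = I - VH*T*V (for complex flavors).

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).
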
direct Specifies the order in which the elementary reflectors are multiplied to form
the block reflector:
= 'F': H = H(1)*H(2)*. . . *H(k) (forward)

= 'B': H = H(k). . .H(2)*H(1) (backward)

storev Specifies how the vectors which define the elementary reflectors are stored
(see also Application Notes below):
= 'C': column-wise

= 'R': row-wise.

n The order of the block reflector H. n≥ 0.

k The order of the triangular factor T (equal to the number of elementary

reflectors). k≥ 1.

v The size limitations depend on the value of parameter storev and on the matrix layout, as described in the
following table:

Layout       | storev = 'C'  | storev = 'R'
Column major | max(1,ldv*k)  | max(1,ldv*n)
Row major    | max(1,ldv*n)  | max(1,ldv*k)

The matrix v. See Application Notes below.

ldv The leading dimension of the array v.

If storev = 'C', ldv≥ max(1,n) for column major and ldv≥max(1,k) for
row major;
if storev = 'R', ldv≥k for column major and ldv≥max(1,n) for row
major.

tau Array, size (k). tau[i-1] must contain the scalar factor of the elementary
reflector H(i).

ldt The leading dimension of the output array t. ldt≥k.

Output Parameters

t Array, size ldt * k. The k-by-k triangular factor T of the block reflector. If
direct = 'F', T is upper triangular; if direct = 'B', T is lower
triangular. The rest of the array is not used.

v The matrix V.


?larfx
Applies an elementary reflector to a general
rectangular matrix, with loop unrolling when the
reflector has order less than or equal to 10.

Syntax
lapack_int LAPACKE_slarfx (int matrix_layout , char side , lapack_int m , lapack_int
n , const float * v , float tau , float * c , lapack_int ldc , float * work );
lapack_int LAPACKE_dlarfx (int matrix_layout , char side , lapack_int m , lapack_int
n , const double * v , double tau , double * c , lapack_int ldc , double * work );
lapack_int LAPACKE_clarfx (int matrix_layout , char side , lapack_int m , lapack_int
n , const lapack_complex_float * v , lapack_complex_float tau , lapack_complex_float *
c , lapack_int ldc , lapack_complex_float * work );
lapack_int LAPACKE_zlarfx (int matrix_layout , char side , lapack_int m , lapack_int
n , const lapack_complex_double * v , lapack_complex_double tau , lapack_complex_double
* c , lapack_int ldc , lapack_complex_double * work );

Include Files
• mkl.h

Description

The routine ?larfx applies a real/complex elementary reflector H to a real/complex m-by-n matrix C, from
either the left or the right.
H is represented in the following forms:
• H = I - tau*v*vT, where tau is a real scalar and v is a real vector.
• H = I - tau*v*vH, where tau is a complex scalar and v is a complex vector.
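If tau = 0, then H is taken to be the unit matrix.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).
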
side If side = 'L': form H*C

If side = 'R': form C*H.


m The number of rows of the matrix C.

n The number of columns of the matrix C.

v Array, size
(m) if side = 'L' or

(n) if side = 'R'.

The vector v in the representation of H.

tau The value tau in the representation of H.

c Array, size at least max(1, ldc*n) for column major layout and max (1,
ldc*m) for row major layout. On entry, the m-by-n matrix C.

ldc The leading dimension of the array c. ldc ≥ max(1,m) for column major layout and ldc ≥ max(1,n) for row major layout.

work Workspace array, size

(n) if side = 'L' or

(m) if side = 'R'.

work is not referenced if H has order < 11.

Output Parameters

c On exit, C is overwritten by the matrix H*C if side = 'L', or C*H if side = 'R'.
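For a single reflector, H*C can be formed without building H: with w = v**T*C, H*C = C - tau*v*w, and similarly C*H = C - tau*(C*v)*v**T. The hypothetical larfx_sketch below applies this to real column-major double data from either side. It has no loop unrolling, needs no workspace, and does not call the library.

```c
#include <stddef.h>

/* Hypothetical sketch of real ?larfx semantics: apply H = I - tau*v*v^T to an
   m-by-n column-major matrix C, from the left (side 'L': C := H*C) or from the
   right (any other side value: C := C*H). */
void larfx_sketch(char side, int m, int n, const double *v, double tau,
                  double *c, int ldc)
{
    if (tau == 0.0)
        return;                                   /* H is the unit matrix */
    if (side == 'L' || side == 'l') {
        for (int j = 0; j < n; ++j) {
            double w = 0.0;                       /* w = v^T * C(:,j) */
            for (int i = 0; i < m; ++i)
                w += v[i] * c[i + (size_t)j * ldc];
            for (int i = 0; i < m; ++i)
                c[i + (size_t)j * ldc] -= tau * v[i] * w;
        }
    } else {
        for (int i = 0; i < m; ++i) {
            double w = 0.0;                       /* w = C(i,:) * v */
            for (int j = 0; j < n; ++j)
                w += c[i + (size_t)j * ldc] * v[j];
            for (int j = 0; j < n; ++j)
                c[i + (size_t)j * ldc] -= tau * w * v[j];
        }
    }
}
```

With v = (1, 0.5) and tau = 1.6, applying H from the left to the column (3; 4) gives (-5; 0).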

?large
Pre- and post-multiplies a real general matrix with a
random orthogonal matrix.

Syntax
void slarge (lapack_int *n, float *a, lapack_int *lda, lapack_int *iseed, float * work,
lapack_int *info);
void dlarge (lapack_int *n, double *a, lapack_int *lda, lapack_int *iseed, double *
work, lapack_int *info);
void clarge (lapack_int *n, lapack_complex *a, lapack_int *lda, lapack_int *iseed,
lapack_complex * work, lapack_int *info);
void zlarge (lapack_int *n, lapack_complex_double *a, lapack_int *lda, lapack_int
*iseed, lapack_complex_double * work, lapack_int *info);

Include Files
• mkl.h

Description

The routine ?large pre- and post-multiplies a general n-by-n matrix A with a random orthogonal or unitary
matrix: A = U*D*UT .

Input Parameters

n The order of the matrix A. n≥0

a Array, size lda by n.

On entry, the original n-by-n matrix A.

lda The leading dimension of the array a. lda≥n.

iseed Array, size 4.

On entry, the seed of the random number generator. The array elements
must be between 0 and 4095, and iseed[3] must be odd.

work Workspace array, size 2*n.

Output Parameters

a On exit, A is overwritten by U*A*U^T (U*A*U^H for clarge and zlarge) for some random orthogonal (unitary) matrix U.

iseed On exit, the seed is updated.

info If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.
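Because ?large, ?larnd, and ?larnv all expect a well-formed seed, a caller may want to check iseed before the call. A minimal sketch, assuming lapack_int is a plain int (the LP64 interface); iseed_is_valid is a hypothetical helper, not an MKL function.

```c
#include <stdbool.h>

/* Hypothetical helper: true if iseed meets the documented requirements:
 * all four entries in 0..4095 and iseed[3] odd.
 * Assumes lapack_int is int (LP64 interface). */
bool iseed_is_valid(const int iseed[4])
{
    for (int k = 0; k < 4; ++k)
        if (iseed[k] < 0 || iseed[k] > 4095)
            return false;
    return (iseed[3] % 2) == 1;
}
```

Because the routines update iseed on exit, passing the same array to the next call continues the same random sequence.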

?larnd
Returns a random real number from a uniform or
normal distribution.

Syntax
float slarnd (lapack_int *idist, lapack_int *iseed);
double dlarnd (lapack_int *idist, lapack_int *iseed);
The data types for complex variations depend on whether or not the application links with GNU Fortran
(gfortran) libraries.
For non-gfortran (libmkl_intel_*) interface libraries:
void clarnd (lapack_complex_float *res, lapack_int *idist, lapack_int *iseed);
void zlarnd (lapack_complex_double *res, lapack_int *idist, lapack_int *iseed);
For gfortran (libmkl_gf_*) interface libraries:
lapack_complex_float clarnd (lapack_int *idist, lapack_int *iseed);
lapack_complex_double zlarnd (lapack_int *idist, lapack_int *iseed);
To understand the difference between the non-gfortran and gfortran interfaces and when to use each of
them, see Dynamic Libraries in the lib/intel64 Directory in the oneAPI Math Kernel Library Developer Guide.

Include Files
• mkl.h

Description

The routine ?larnd returns a random number from a uniform or normal distribution.


Input Parameters

idist Specifies the distribution of the random numbers. For slarnd and dlarnd:

= 1: uniform (0,1)
= 2: uniform (-1,1)
= 3: normal (0,1).
For clarnd and zlarnd:

= 1: real and imaginary parts each uniform (0,1)

= 2: real and imaginary parts each uniform (-1,1)
= 3: real and imaginary parts each normal (0,1)
= 4: uniformly distributed on the disc abs(z) ≤ 1
= 5: uniformly distributed on the circle abs(z) = 1

iseed Array, size 4.

On entry, the seed of the random number generator. The array elements
must be between 0 and 4095, and iseed[3] must be odd.

Output Parameters

iseed On exit, the seed is updated.

Return Values
The function returns a random number. For the complex variations used with the non-gfortran (libmkl_intel_*)
interface libraries, the result is returned in the parameter res instead.

?larnv
Returns a vector of random numbers from a uniform
or normal distribution.

Syntax
lapack_int LAPACKE_slarnv (lapack_int idist , lapack_int * iseed , lapack_int n , float
* x );
lapack_int LAPACKE_dlarnv (lapack_int idist , lapack_int * iseed , lapack_int n ,
double * x );
lapack_int LAPACKE_clarnv (lapack_int idist , lapack_int * iseed , lapack_int n ,
lapack_complex_float * x );
lapack_int LAPACKE_zlarnv (lapack_int idist , lapack_int * iseed , lapack_int n ,
lapack_complex_double * x );

Include Files
• mkl.h

Description

The routine ?larnv returns a vector of n random real/complex numbers from a uniform or normal
distribution.

This routine calls the auxiliary routine ?laruv to generate random real numbers from a uniform (0,1)
distribution, in batches of up to 128 using vectorisable code. The Box-Muller method is used to transform
numbers from a uniform to a normal distribution.

Input Parameters

idist Specifies the distribution of the random numbers. For slarnv and dlarnv:

= 1: uniform (0,1)
= 2: uniform (-1,1)
= 3: normal (0,1).
For clarnv and zlarnv:

= 1: real and imaginary parts each uniform (0,1)

= 2: real and imaginary parts each uniform (-1,1)
= 3: real and imaginary parts each normal (0,1)
= 4: uniformly distributed on the disc abs(z) < 1
= 5: uniformly distributed on the circle abs(z) = 1

iseed Array, size (4).

On entry, the seed of the random number generator; the array elements
must be between 0 and 4095, and iseed[3] must be odd.

n The number of random numbers to be generated.

Output Parameters

x Array, size (n). The generated random numbers.

iseed On exit, the seed is updated.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

?laror
Pre- or post-multiplies an m-by-n matrix by a random
orthogonal/unitary matrix.

Syntax
void slaror (char *side, char *init, lapack_int *m, lapack_int *n, float *a, lapack_int
*lda, lapack_int *iseed, float *x, lapack_int *info);
void dlaror (char *side, char *init, lapack_int *m, lapack_int *n, double *a, lapack_int
*lda, lapack_int *iseed, double *x, lapack_int *info);
void claror (char *side, char *init, lapack_int *m, lapack_int *n, lapack_complex_float *a,
lapack_int *lda, lapack_int *iseed, lapack_complex_float *x, lapack_int *info);


void zlaror (char *side, char *init, lapack_int *m, lapack_int *n,
lapack_complex_double *a, lapack_int *lda, lapack_int *iseed, lapack_complex_double *x,
lapack_int *info);

Include Files
• mkl.h

Description

The routine ?laror pre- or post-multiplies an m-by-n matrix A by a random orthogonal or unitary matrix U,
overwriting A. A may optionally be initialized to the identity matrix before multiplying by U. U is generated
using the method of G.W. Stewart (SIAM J. Numer. Anal. 17, 1980, 403-409).

Input Parameters

side Specifies whether A is multiplied by U on the left or right.

For slaror and dlaror:

If side = 'L', multiply A on the left (premultiply) by U.

If side = 'R', multiply A on the right (postmultiply) by U^T.

If side = 'C' or 'T', multiply A on the left by U and on the right by U^T.

For claror and zlaror:

If side = 'L', multiply A on the left (premultiply) by U.

If side = 'R', multiply A on the right (postmultiply) by U^H.

If side = 'C', multiply A on the left by U and on the right by U^H.

If side = 'T', multiply A on the left by U and on the right by U^T.

init Specifies whether or not a should be initialized to the identity matrix.

If init = 'I', initialize a to (a section of) the identity matrix before
applying U.
If init = 'N', no initialization. Apply U to the input matrix A.

init = 'I' generates square or rectangular orthogonal matrices:

For m = n and side = 'L' or 'R', the rows and the columns are
orthogonal to each other.
For rectangular matrices where m < n:

• If side = 'R', ?laror produces a dense matrix in which rows are
orthogonal and columns are not.
• If side = 'L', ?laror produces a matrix in which rows are orthogonal,
first m columns are orthogonal, and remaining columns are zero.

For rectangular matrices where m > n:

• If side = 'L', ?laror produces a dense matrix in which columns are
orthogonal and rows are not.
• If side = 'R', ?laror produces a matrix in which columns are
orthogonal, first n rows are orthogonal, and remaining rows are zero.

m The number of rows of A.

n The number of columns of A.

a Array, size lda by n.

lda The leading dimension of the array a.

lda≥ max(1, m).

iseed Array, size (4).

On entry, specifies the seed of the random number generator. The array
elements must be between 0 and 4095; if not they are reduced mod 4096.
Also, iseed[3] must be odd.

x Workspace array, size 3*max(m, n). The length actually used depends on side:

Value of side    Length of workspace
'L'              2*m + n
'R'              2*n + m
'C' or 'T'       3*n

Output Parameters

a On exit, overwritten
by U*A (if side = 'L'),
by A*U (if side = 'R'),
by U*A*U^T (if side = 'C' or 'T'; for claror and zlaror with side = 'C', by U*A*U^H).

iseed The values of iseed are changed on exit, and can be used in the next call
to continue the same random number sequence.

info For slaror and dlaror:

If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = 1, the random numbers generated by ?larnd are bad.

For claror and zlaror:

If info = 0, the execution is successful.

If info = -1, side is not 'L', 'R', 'C', or 'T'.

If info = -3, m is negative.

If info = -4, n is negative, or side is 'C' or 'T' and n is not equal to m.

If info = -6, lda is less than m.
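The workspace table for x translates directly into a sizing helper. A minimal sketch; laror_work_len is a hypothetical name, and allocating 3*max(m, n) elements, as the size of x states, always covers every case.

```c
#include <stddef.h>

/* Hypothetical helper: length of the ?laror workspace x used for a given
 * side, following the workspace table. Returns 0 for an unrecognized side. */
size_t laror_work_len(char side, size_t m, size_t n)
{
    switch (side) {
    case 'L': return 2 * m + n;
    case 'R': return 2 * n + m;
    case 'C':
    case 'T': return 3 * n;
    default:  return 0;
    }
}
```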

?larot
Applies a Givens rotation to two adjacent rows or
columns.


Syntax
void slarot (lapack_logical *lrows, lapack_logical *lleft, lapack_logical *lright,
lapack_int *nl, float *c, float *s, float *a, lapack_int *lda, float *xleft, float
*xright);
void dlarot (lapack_logical *lrows, lapack_logical *lleft, lapack_logical *lright,
lapack_int *nl, double *c, double *s, double *a, lapack_int *lda, double *xleft, double
*xright);
void clarot (lapack_logical *lrows, lapack_logical *lleft, lapack_logical *lright,
lapack_int *nl, lapack_complex_float *c, lapack_complex_float *s, lapack_complex_float *a,
lapack_int *lda, lapack_complex_float *xleft, lapack_complex_float *xright);
void zlarot (lapack_logical *lrows, lapack_logical *lleft, lapack_logical *lright,
lapack_int *nl, lapack_complex_double *c, lapack_complex_double *s,
lapack_complex_double *a, lapack_int *lda, lapack_complex_double *xleft,
lapack_complex_double *xright);

Include Files
• mkl.h

Description

The routine ?larot applies a Givens rotation to two adjacent rows or columns, where one element of the
first or last column or row is stored in some format other than GE so that elements of the matrix may be
used or modified for which no array element is provided.
One example is a symmetric matrix in SB format (bandwidth = 4), for which uplo = 'L'. Two adjacent rows
have the format:

row j:      *  *  *  *  *  .  .  .  .
row j+1:       *  *  *  *  *  .  .  .  .

'*' indicates elements for which storage is provided.
'.' indicates elements for which no storage is provided, but are not necessarily zero; their values are
determined by symmetry.
' ' indicates elements which are required to be zero, and have no storage provided.
Those columns which have two '*' entries can be handled by srot (for slarot and clarot), or by
drot (for dlarot and zlarot).
Those columns which have no '*' entries can be ignored, since as long as the Givens rotations are carefully
applied to preserve symmetry, their values are determined.
Those columns which have one '*' have to be handled separately, by using separate variables p and q:

row j:      *  *  *  *  *  p  .  .  .
row j+1:    q  *  *  *  *  *  .  .  .  .

If element p is set correctly, ?larot rotates the column and sets p to its new value. The next call to ?larot
rotates columns j and j+1 and restores symmetry. The element q is zero at the beginning and nonzero
after the rotation. Later rotations would presumably be chosen to zero q out.
Typical Calling Sequences: rotating the i-th and (i+1)-st rows.

Input Parameters
Input Parameters

lrows If lrows = 1, ?larot rotates two rows.

If lrows = 0, ?larot rotates two columns.

lleft If lleft = 1, xleft is used instead of the corresponding element of a for
the first element in the second row (if lrows = 0) or column (if lrows = 1).

If lleft = 0, the corresponding element of a is used.

lright If lright = 1, xright is used instead of the corresponding element of a for
the last element in the first row (if lrows = 0) or column (if lrows = 1).

If lright = 0, the corresponding element of a is used.

nl The length of the rows (if lrows = 1) or columns (if lrows = 0) to be rotated.

If xleft or xright are used, the columns or rows they are in should be
included in nl, e.g., if lleft = lright = 1, then nl must be at least 2.

The number of rows or columns to be rotated exclusive of those involving
xleft and/or xright may not be negative, that is, nl minus how many of
lleft and lright are 1 must be at least zero; if not, xerbla is called.

c, s Specify the Givens rotation to be applied.

If lrows = 1, then the matrix

(  c  s )
( -s  c )

is applied from the left (for clarot and zlarot, the second row is ( -conj(s)  conj(c) )).

If lrows = 0, then the transpose thereof is applied from the right.

a The array containing the rows or columns to be rotated. The first element of
a should be the upper left element to be rotated.

lda The "effective" leading dimension of a.

If a contains a matrix stored in GE or SY format, then this is just the
leading dimension of A.
If a contains a matrix stored in band (GB or SB) format, then this should be
one less than the leading dimension used in the calling routine. Thus, if a
in ?larot is of size lda*n, then a[(j - 1)*lda] would be the j-th
element in the first of the two rows to be rotated, and a[(j - 1)*lda + 1]
would be the j-th in the second, regardless of how the array may be
stored in the calling routine. a cannot actually be dimensioned this way, because for band
format the row number may exceed lda, which is not legal Fortran.

If lrows = 1, then lda must be at least 1; otherwise it must be at least nl
minus the number of 1 values in lleft and lright.

xleft If lleft = 1, xleft is used and modified instead of a[1] (if lrows = 1)
or a[lda] (if lrows = 0).


xright If lright = 1, xright is used and modified instead of a[(nl - 1)*lda]
(if lrows = 1) or a[nl - 1] (if lrows = 0).

Output Parameters

a On exit, modified array A.
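Setting aside the xleft/xright bookkeeping and band storage, the arithmetic ?larot applies to each pair of entries is an ordinary plane rotation. The sketch below shows that core update for the real case with lrows = 1, passing the two rows as strided vectors (for rows j and j+1 of a column-major array, stride lda). rotate_pair is a hypothetical name, not part of MKL.

```c
#include <stddef.h>

/* Hypothetical sketch: apply the real plane rotation
 *     (  c  s )
 *     ( -s  c )
 * from the left to the vectors x and y (the two rows being rotated),
 * each of length nl with strides incx and incy. */
void rotate_pair(size_t nl, double c, double s,
                 double *x, size_t incx, double *y, size_t incy)
{
    for (size_t k = 0; k < nl; ++k) {
        double xk = x[k * incx];
        double yk = y[k * incy];
        x[k * incx] = c * xk + s * yk;
        y[k * incy] = -s * xk + c * yk;
    }
}
```

Usage: for a 2-by-3 column-major array a with lda = 2, rotate_pair(3, c, s, a, 2, a + 1, 2) rotates its two rows.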

?lartgp
Generates a plane rotation.

Syntax
lapack_int LAPACKE_slartgp (float f, float g, float* cs, float* sn, float* r);
lapack_int LAPACKE_dlartgp (double f, double g, double* cs, double* sn, double* r);

Include Files
• mkl.h

Description
The routine generates a plane rotation so that

[  cs  sn ]   [ f ]   [ r ]
[ -sn  cs ] * [ g ] = [ 0 ]

where cs^2 + sn^2 = 1.

This is a slower, more accurate version of the BLAS Level 1 routine ?rotg, except for the following
differences:

• f and g are unchanged on return.

• If g=0, then cs=(+/-)1 and sn=0.
• If f=0 and g≠ 0, then cs=0 and sn=(+/-)1.
The sign is chosen so that r≥ 0.

Input Parameters

f, g The first and second components of the vector to be rotated.

Output Parameters

cs The cosine of the rotation.

sn The sine of the rotation.

r The nonzero component of the rotated vector.

Return Values
If info = 0, the execution is successful.

If info = -1, f is NaN.

If info = -2, g is NaN.

x, y The (1,1) and (1,2) entries of an upper bidiagonal matrix, respectively.

sigma The shift.

Output Parameters

cs The cosine of the rotation.


sn The sine of the rotation.

Return Values
If info = 0, the execution is successful.

If info = -1, x is NaN.
If info = -2, y is NaN.
If info = -3, sigma is NaN.

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)

or column major (LAPACK_COL_MAJOR).

type This parameter specifies the storage type of the input matrix.
= 'G': A is a full matrix.

= 'L': A is a lower triangular matrix.

= 'U': A is an upper triangular matrix.

= 'H': A is an upper Hessenberg matrix.

= 'B': A is a symmetric band matrix with lower bandwidth kl and upper
bandwidth ku and with only the lower half stored.
= 'Q': A is a symmetric band matrix with lower bandwidth kl and upper
bandwidth ku and with only the upper half stored.
= 'Z': A is a band matrix with lower bandwidth kl and upper bandwidth ku.
See description of the ?gbtrf function for storage details.

kl The lower bandwidth of A. Referenced only if type = 'B', 'Q' or 'Z'.

ku The upper bandwidth of A. Referenced only if type = 'B', 'Q' or 'Z'.

cfrom, cto The matrix A is multiplied by cto/cfrom. A(i,j) is computed without
over/underflow if the final result cto*A(i,j)/cfrom can be represented without
over/underflow. cfrom must be nonzero.

m The number of rows of the matrix A. m≥ 0.

n The number of columns of the matrix A. n≥ 0.

a Array, size (lda*n). The matrix to be multiplied by cto/cfrom. See type for
the storage type.

lda The leading dimension of the array a.

lda≥ max(1,m).

Output Parameters

a The multiplied matrix A.

info If info = 0, the execution is successful.

If info = -i < 0, the i-th argument had an illegal value.
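The cfrom/cto description hides a subtle point: forming the ratio cto/cfrom directly can overflow or underflow even when every scaled entry is representable. The sketch below applies the factor in safe steps of smlnum or bignum until the remaining ratio is representable, modeled on the loop in reference LAPACK's dlascl. It covers only type = 'G' with column-major storage; scale_by_ratio is a hypothetical name and this is not MKL's code.

```c
#include <float.h>
#include <math.h>
#include <stddef.h>

/* Sketch: multiply the full m-by-n column-major matrix a by cto/cfrom
 * without forming the ratio when it would over/underflow.
 * cfrom must be nonzero. */
void scale_by_ratio(double cfrom, double cto, int m, int n,
                    double *a, int lda)
{
    const double smlnum = DBL_MIN;        /* smallest normalized number */
    const double bignum = 1.0 / smlnum;
    double cfromc = cfrom, ctoc = cto;
    int done = 0;

    while (!done) {
        double mul;
        double cfrom1 = cfromc * smlnum;
        if (cfrom1 == cfromc) {           /* cfromc is infinite */
            mul = ctoc / cfromc;
            done = 1;
        } else {
            double cto1 = ctoc / bignum;
            if (cto1 == ctoc) {           /* ctoc is zero or infinite */
                mul = ctoc;
                done = 1;
                cfromc = 1.0;
            } else if (fabs(cfrom1) > fabs(ctoc) && ctoc != 0.0) {
                mul = smlnum;             /* shrink by a safe step */
                cfromc = cfrom1;
            } else if (fabs(cto1) > fabs(cfromc)) {
                mul = bignum;             /* grow by a safe step */
                ctoc = cto1;
            } else {
                mul = ctoc / cfromc;      /* remaining ratio is safe */
                done = 1;
            }
        }
        for (int j = 0; j < n; ++j)
            for (int i = 0; i < m; ++i)
                a[(size_t)j * lda + i] *= mul;
    }
}
```

For example, cfrom = 1e300 and cto = 1e-300 give a ratio of 1e-600, which underflows to zero in double precision; the stepwise loop still maps an entry of 1e300 to approximately 1e-300.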

sqre Specifies the column dimension of the bidiagonal matrix.

If sqre = 0: the bidiagonal matrix has column dimension m = n.

If sqre = 1: the bidiagonal matrix has column dimension m = n+1.

d Array, DIMENSION (n). On entry, d contains the main diagonal of the

bidiagonal matrix.

e Array, DIMENSION (m-1). Contains the subdiagonal entries of the bidiagonal

matrix. On exit, e is destroyed.

ldu On entry, leading dimension of the output array u.

ldvt On entry, leading dimension of the output array vt.

smlsiz On entry, maximum size of the subproblems at the bottom of the

computation tree.

iwork Workspace array, dimension must be at least 8*n.

work Workspace array, dimension must be at least 3*m^2 + 2*m.

Output Parameters

d On exit, if info = 0, d contains the singular values of the bidiagonal matrix.

u Array, DIMENSION at least (ldu, n). On exit, u contains the left singular
vectors.

vt Array, DIMENSION at least (ldvt, m). On exit, vt^T contains the right singular
vectors.

info If info = 0: successful exit.

If info = -i < 0, the i-th argument had an illegal value.

If info = 1, a singular value did not converge.

?lasd1
Computes the SVD of an upper bidiagonal matrix B of
the specified size. Used by ?bdsdc.

Syntax
void slasd1( lapack_int *nl, lapack_int *nr, lapack_int *sqre, float *d, float *alpha,
float *beta, float *u, lapack_int *ldu, float *vt, lapack_int *ldvt, lapack_int *idxq,
lapack_int *iwork, float *work, lapack_int *info );
void dlasd1( lapack_int *nl, lapack_int *nr, lapack_int *sqre, double *d, double *alpha,
double *beta, double *u, lapack_int *ldu, double *vt, lapack_int *ldvt, lapack_int
*idxq, lapack_int *iwork, double *work, lapack_int *info );

Include Files
• mkl.h

Description

The routine computes the SVD of an upper bidiagonal n-by-m matrix B, where n = nl + nr + 1 and
m = n + sqre.
The routine ?lasd1 is called from ?lasd0.

A related subroutine ?lasd7 handles the case in which the singular values (and the singular vectors in
factored form) are desired.
?lasd1 computes the SVD as follows:

                ( D1(in)    0       0       0 )
    B = U(in) * (  Z1^T     a      Z2^T     b ) * VT(in)
                (   0       0     D2(in)    0 )

      = U(out) * ( D(out) 0 ) * VT(out)

where Z^T = ( Z1^T a Z2^T b ) = u^T * VT^T, and u is a vector of dimension m with alpha and beta in the
(nl+1)-th and (nl+2)-th entries and zeros elsewhere; the entry b is empty if sqre = 0.

The left singular vectors of the original matrix are stored in u, and the transpose of the right singular vectors
are stored in vt, and the singular values are in d. The algorithm consists of three stages:

1. The first stage consists of deflating the size of the problem when there are multiple singular values or
when there are zeros in the Z vector. For each such occurrence the dimension of the secular equation
problem is reduced by one. This stage is performed by the routine ?lasd2.
2. The second stage consists of calculating the updated singular values. This is done by finding the square
roots of the roots of the secular equation via the routine ?lasd4 (as called by ?lasd3). This routine
also calculates the singular vectors of the current problem.
3. The final stage consists of computing the updated singular vectors directly using the updated singular
values. The singular vectors for the current problem are multiplied with the singular vectors from the
overall problem.

Input Parameters

nl The row dimension of the upper block.

nl≥ 1.


nr The row dimension of the lower block.

nr≥ 1.

sqre If sqre = 0: the lower block is an nr-by-nr square matrix.

If sqre = 1: the lower block is an nr-by-(nr+1) rectangular matrix. The
bidiagonal matrix has row dimension n = nl + nr + 1, and column
dimension m = n + sqre.

d Array, DIMENSION (nl+nr+1), where n = nl+nr+1. On entry, d(1:nl)
contains the singular values of the upper block, and d(nl+2:n) contains
the singular values of the lower block.

alpha Contains the diagonal element associated with the added row.

beta Contains the off-diagonal element associated with the added row.

u Array, DIMENSION (ldu, n). On entry u(1:nl, 1:nl) contains the left
singular vectors of the upper block; u(nl+2:n, nl+2:n) contains the left
singular vectors of the lower block.

ldu The leading dimension of the array u.

ldu≥ max(1, n).

vt Array, DIMENSION (ldvt, m), where m = n + sqre.

On entry, vt(1:nl+1, 1:nl+1)^T contains the right singular vectors of the
upper block; vt(nl+2:m, nl+2:m)^T contains the right singular vectors of
the lower block.

ldvt The leading dimension of the array vt.

ldvt ≥ max(1, m).

iwork Workspace array, DIMENSION (4*n).

work Workspace array, DIMENSION (3*m^2 + 2*m).

Output Parameters

d On exit d(1:n) contains the singular values of the modified matrix.

alpha On exit, the diagonal element associated with the added row deflated by
max( abs( alpha ), abs( beta ), abs( D(I) ) ), I = 1,n.

beta On exit, the off-diagonal element associated with the added row deflated by
max( abs( alpha ), abs( beta ), abs( D(I) ) ), I = 1,n.

u On exit u contains the left singular vectors of the bidiagonal matrix.

vt On exit, vt^T contains the right singular vectors of the bidiagonal matrix.

idxq Array, DIMENSION (n). Contains the permutation which will reintegrate the
subproblem just solved back into sorted order, that is, d(idxq( i = 1,
n )) will be in ascending order.

info If info = 0: successful exit.

If info = -i < 0, the i-th argument had an illegal value.

If info = 1, a singular value did not converge.
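The idxq output is a permutation rather than a sorted copy, so a caller that wants to confirm the documented property (d(idxq(i)), i = 1..n, in ascending order) can do so with a short loop. A minimal sketch, assuming idxq holds 1-based indices as in the d(idxq(i)) notation; idxq_sorts_ascending is a hypothetical helper.

```c
#include <stdbool.h>

/* Hypothetical check: with the 1-based permutation idxq, the values
 * d[idxq[i] - 1] for i = 0..n-1 must be in ascending order. */
bool idxq_sorts_ascending(int n, const double *d, const int *idxq)
{
    for (int i = 1; i < n; ++i)
        if (d[idxq[i] - 1] < d[idxq[i - 1] - 1])
            return false;
    return true;
}
```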

?lasd2
Merges the two sets of singular values together into a
single sorted set. Used by ?bdsdc.

Syntax
void slasd2( lapack_int *nl, lapack_int *nr, lapack_int *sqre, lapack_int *k, float *d,
float *z, float *alpha, float *beta, float *u, lapack_int *ldu, float *vt, lapack_int
*ldvt, float *dsigma, float *u2, lapack_int *ldu2, float *vt2, lapack_int *ldvt2,
lapack_int *idxp, lapack_int *idx, lapack_int *idxq, lapack_int *coltyp, lapack_int
*info );
void dlasd2( lapack_int *nl, lapack_int *nr, lapack_int *sqre, lapack_int *k, double *d,
double *z, double *alpha, double *beta, double *u, lapack_int *ldu, double *vt,
lapack_int *ldvt, double *dsigma, double *u2, lapack_int *ldu2, double *vt2, lapack_int
*ldvt2, lapack_int *idxp, lapack_int *idx, lapack_int *idxq, lapack_int *coltyp,
lapack_int *info );

Include Files
• mkl.h

Description

The routine ?lasd2 merges the two sets of singular values together into a single sorted set. Then it tries to
deflate the size of the problem. There are two ways in which deflation can occur: when two or more singular
values are close together or if there is a tiny entry in the Z vector. For each such occurrence the order of the
related secular equation problem is reduced by one.
The routine ?lasd2 is called from ?lasd1.

Input Parameters

nl The row dimension of the upper block.

nl≥ 1.

nr The row dimension of the lower block.

nr≥ 1.

sqre If sqre = 0: the lower block is an nr-by-nr square matrix.

If sqre = 1: the lower block is an nr-by-(nr+1) rectangular matrix. The
bidiagonal matrix has n = nl + nr + 1 rows and m = n + sqre ≥ n
columns.

d Array, DIMENSION (n). On entry d contains the singular values of the two
submatrices to be combined.

alpha Contains the diagonal element associated with the added row.

beta Contains the off-diagonal element associated with the added row.


u Array, DIMENSION (ldu, n). On entry u contains the left singular vectors of
two submatrices in the two square blocks with corners at (1,1), (nl, nl), and
(nl+2, nl+2), (n,n).

ldu The leading dimension of the array u.

ldu≥n.

ldu2 The leading dimension of the output array u2. ldu2≥n.

vt Array, DIMENSION (ldvt, m). On entry, vt^T contains the right singular
vectors of two submatrices in the two square blocks with corners at (1,1),
(nl+1, nl+1), and (nl+2, nl+2), (m, m).

ldvt The leading dimension of the array vt. ldvt≥m.

ldvt2 The leading dimension of the output array vt2. ldvt2≥m.

idxp Workspace array, DIMENSION (n). This will contain the permutation used to
place deflated values of D at the end of the array. On output idxp(2:k)
points to the nondeflated d-values and idxp(k+1:n) points to the deflated
singular values.

idx Workspace array, DIMENSION (n). This will contain the permutation used to
sort the contents of d into ascending order.

coltyp Workspace array, DIMENSION (n). As workspace, this array contains a label
that indicates which of the following types a column in the u2 matrix or a
row in the vt2 matrix is:
1 : non-zero in the upper half only
2 : non-zero in the lower half only
3 : dense
4 : deflated.

idxq Array, DIMENSION (n). This parameter contains the permutation that
separately sorts the two sub-problems in d into ascending order. Note
that entries in the first half of this permutation must first be moved one
position backwards and entries in the second half must have nl+1 added to
their values.

Output Parameters

k Contains the dimension of the non-deflated matrix. This is the order of the
related secular equation, 1 ≤ k ≤ n.

d On exit, d contains the trailing (n-k) updated singular values (those which
were deflated) sorted into increasing order.

u On exit u contains the trailing (n-k) updated left singular vectors (those
which were deflated) in its last n-k columns.

z Array, DIMENSION (n). On exit, z contains the updating row vector in the
secular equation.

dsigma Array, DIMENSION (n). Contains a copy of the diagonal elements (k-1
singular values and one zero) in the secular equation.

u2 Array, DIMENSION (ldu2, n). Contains a copy of the first k-1 left singular
vectors which will be used by ?lasd3 in a matrix multiply (?gemm) to solve
for the new left singular vectors. u2 is arranged into four blocks. The first
block contains a column with 1 at nl+1 and zero everywhere else; the
second block contains non-zero entries only at and above nl; the third
contains non-zero entries only below nl+1; and the fourth is dense.

vt On exit, vt^T contains the trailing (n-k) updated right singular vectors (those
which were deflated) in its last n-k columns. In case sqre = 1, the last row
of vt spans the right null space.

vt2 Array, DIMENSION (ldvt2, n). vt2^T contains a copy of the first k right
singular vectors which will be used by ?lasd3 in a matrix multiply (?gemm)
to solve for the new right singular vectors. vt2 is arranged into three blocks.
The first block contains a row that corresponds to the special 0 diagonal
element in sigma; the second block contains non-zeros only at and before
nl+1; the third block contains non-zeros only at and after nl+2.

idxc Array, DIMENSION (n). This will contain the permutation used to arrange the
columns of the deflated u matrix into three groups: the first group contains
non-zero entries only at and above nl, the second contains non-zero entries
only below nl+2, and the third is dense.

coltyp On exit, it is an array of dimension 4, with coltyp(i) being the number
of columns of the i-th type.

info If info = 0: successful exit.

If info = -i < 0, the i-th argument had an illegal value.
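coltyp plays two roles: on entry it labels each column with a type from 1 to 4, and on exit its first four entries hold how many columns there are of each type (the counts that ?lasd3 receives as ctot). The relation between the two is a simple tally, sketched below; count_column_types is a hypothetical name and the code only illustrates the bookkeeping, not how ?lasd2 is implemented.

```c
/* Hypothetical sketch: tally column types 1..4 (as listed for coltyp)
 * into ctot[0..3]. Entries outside 1..4 are ignored. */
void count_column_types(int n, const int *coltyp, int ctot[4])
{
    for (int t = 0; t < 4; ++t)
        ctot[t] = 0;
    for (int j = 0; j < n; ++j)
        if (coltyp[j] >= 1 && coltyp[j] <= 4)
            ++ctot[coltyp[j] - 1];
}
```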

?lasd3
Finds all square roots of the roots of the secular
equation, as defined by the values in D and Z, and
then updates the singular vectors by matrix
multiplication. Used by ?bdsdc.

Syntax
void slasd3( lapack_int *nl, lapack_int *nr, lapack_int *sqre, lapack_int *k, float *d,
float *q, lapack_int *ldq, float *dsigma, float *u, lapack_int *ldu, float *u2,
lapack_int *ldu2, float *vt, lapack_int *ldvt, float *vt2, lapack_int *ldvt2,
lapack_int *idxc, lapack_int *ctot, float *z, lapack_int *info );
void dlasd3( lapack_int *nl, lapack_int *nr, lapack_int *sqre, lapack_int *k, double *d,
double *q, lapack_int *ldq, double *dsigma, double *u, lapack_int *ldu, double *u2,
lapack_int *ldu2, double *vt, lapack_int *ldvt, double *vt2, lapack_int *ldvt2,
lapack_int *idxc, lapack_int *ctot, double *z, lapack_int *info );

Include Files
• mkl.h

Description

The routine ?lasd3 finds all the square roots of the roots of the secular equation, as defined by the values in
D and Z.


It makes the appropriate calls to ?lasd4 and then updates the singular vectors by matrix multiplication.

The routine ?lasd3 is called from ?lasd1.

Input Parameters

nl The row dimension of the upper block.

nl≥ 1.

nr The row dimension of the lower block.

nr≥ 1.

sqre If sqre = 0: the lower block is an nr-by-nr square matrix.

If sqre = 1: the lower block is an nr-by-(nr+1) rectangular matrix. The
bidiagonal matrix has n = nl + nr + 1 rows and m = n + sqre ≥ n
columns.

k The size of the secular equation, 1 ≤ k ≤ n.

q Workspace array, DIMENSION at least (ldq, k).

ldq The leading dimension of the array q.

ldq≥k.

dsigma Array, DIMENSION (k). The first k elements of this array contain the old
roots of the deflated updating problem. These are the poles of the secular
equation.

ldu The leading dimension of the array u.

ldu≥n.

u2 Array, DIMENSION (ldu2, n).

The first k columns of this matrix contain the non-deflated left singular
vectors for the split problem.

ldu2 The leading dimension of the array u2.

ldu2≥n.

ldvt The leading dimension of the array vt.

ldvt≥n.

vt2 Array, DIMENSION (ldvt2, n).

The first k columns of vt2^T contain the non-deflated right singular vectors
for the split problem.

ldvt2 The leading dimension of the array vt2.

ldvt2≥n.

idxc Array, DIMENSION (n).

The permutation used to arrange the columns of u (and rows of vt) into
three groups: the first group contains non-zero entries only at and above
(or before) nl+1; the second contains non-zero entries only at and below
(or after) nl+2; and the third is dense. The first column of u and the row of
vt are treated separately, however. The rows of the singular vectors found
by ?lasd4 must be likewise permuted before the matrix multiplies can take
place.

ctot Array, DIMENSION (4). A count of the total number of the various types of
columns in u (or rows in vt), as described in idxc.
The fourth column type is any column which has been deflated.

z Array, DIMENSION (k). The first k elements of this array contain the
components of the deflation-adjusted updating row vector.

Output Parameters

d Array, DIMENSION (k). On exit the square roots of the roots of the secular
equation, in ascending order.

u Array, DIMENSION (ldu, n).

The last n - k columns of this matrix contain the deflated left singular
vectors.

vt Array, DIMENSION (ldvt, m).

The last m - k columns of vt^T contain the deflated right singular vectors.

vt2 Destroyed on exit.

z Destroyed on exit.

info If info = 0: successful exit.

If info = -i < 0, the i-th argument had an illegal value.

If info = 1, a singular value did not converge.

Application Notes
This code makes very mild assumptions about floating point arithmetic. It will work on machines with a guard
digit in add/subtract, or on those binary machines without guard digits which subtract like the Cray XMP, Cray
YMP, Cray C 90, or Cray 2. It could conceivably fail on hexadecimal or decimal machines without guard digits,
but we know of none.

?lasd4
Computes the square root of the i-th updated
eigenvalue of a positive symmetric rank-one
modification to a positive diagonal matrix. Used
by ?bdsdc.

Syntax
void slasd4( lapack_int *n, lapack_int *i, float *d, float *z, float *delta, float *rho,
float *sigma, float *work, lapack_int *info);
void dlasd4( lapack_int *n, lapack_int *i, double *d, double *z, double *delta, double
*rho, double *sigma, double *work, lapack_int *info);

Include Files
• mkl.h


Description

The routine computes the square root of the i-th updated eigenvalue of a positive symmetric rank-one
modification to a positive diagonal matrix whose entries are given as the squares of the corresponding
entries in the array d. It is assumed that 0 ≤ d(i) < d(j) for i < j and that rho > 0; this is arranged by the
calling routine and involves no loss of generality. The rank-one modified system is thus
diag(d)*diag(d) + rho*Z*Z^T,
where the Euclidean norm of Z is equal to 1. The method consists of approximating the rational functions in
the secular equation by simpler interpolating rational functions.

Input Parameters

n The length of all arrays.

i The index of the eigenvalue to be computed. 1 ≤ i ≤ n.

d Array, DIMENSION (n).

The original eigenvalues. They must be in order, 0 ≤ d(i) < d(j) for i <
j.

z Array, DIMENSION (n).

The components of the updating vector.

rho The scalar in the symmetric updating formula.

work Workspace array, DIMENSION (n).

If n ≠ 1, work contains (d(j) + sigma_i) in its j-th component.

If n = 1, then work(1) = 1.

Output Parameters

delta Array, DIMENSION (n).

If n ≠ 1, delta contains (d(j) - sigma_i) in its j-th component.

If n = 1, then delta(1) = 1. The vector delta contains the information
necessary to construct the (singular) eigenvectors.

sigma The computed sigma_i, the i-th updated eigenvalue.

info = 0: successful exit

> 0: If info = 1, the updating process failed.

?lasd5
Computes the square root of the i-th eigenvalue of a
positive symmetric rank-one modification of a 2-by-2
diagonal matrix. Used by ?bdsdc.

Syntax
void slasd5( lapack_int *i, float *d, float *z, float *delta, float *rho, float *dsigma,
float *work );

void dlasd5( lapack_int *i, double *d, double *z, double *delta, double *rho, double
*dsigma, double *work );

Include Files
• mkl.h

Description

The routine computes the square root of the i-th eigenvalue of a positive symmetric rank-one modification of
a 2-by-2 diagonal matrix, diag(d)*diag(d) + rho*Z*Z^T.

The diagonal entries in the array d must satisfy 0 ≤ d(i) < d(j) for i < j, rho must be greater than 0, and
the Euclidean norm of the vector Z must be equal to 1.

Input Parameters

i The index of the eigenvalue to be computed. i = 1 or i = 2.

d Array, dimension (2 ).

The original eigenvalues, 0 ≤ d(1) < d(2).

z Array, dimension ( 2 ).

The components of the updating vector.

rho The scalar in the symmetric updating formula.

work Workspace array, dimension (2). Contains (d(j) + sigma_i) in its j-th component.

Output Parameters

delta Array, dimension (2).

Contains (d(j) - sigma_i) in its j-th component. The vector delta
contains the information necessary to construct the eigenvectors.

dsigma The computed sigma_i, the i-th updated eigenvalue.

?lasd6
Computes the SVD of an updated upper bidiagonal
matrix obtained by merging two smaller ones by
appending a row. Used by ?bdsdc.

Syntax
void slasd6( lapack_int *icompq, lapack_int *nl, lapack_int *nr, lapack_int *sqre,
float *d, float *vf, float *vl, float *alpha, float *beta, lapack_int *idxq, lapack_int
*perm, lapack_int *givptr, lapack_int *givcol, lapack_int *ldgcol, float *givnum,
lapack_int *ldgnum, float *poles, float *difl, float *difr, float *z, lapack_int *k,
float *c, float *s, float *work, lapack_int *iwork, lapack_int *info );
void dlasd6( lapack_int *icompq, lapack_int *nl, lapack_int *nr, lapack_int *sqre,
double *d, double *vf, double *vl, double *alpha, double *beta, lapack_int *idxq,
lapack_int *perm, lapack_int *givptr, lapack_int *givcol, lapack_int *ldgcol, double
*givnum, lapack_int *ldgnum, double *poles, double *difl, double *difr, double *z,
lapack_int *k, double *c, double *s, double *work, lapack_int *iwork, lapack_int
*info );

Include Files
• mkl.h

Description

The routine ?lasd6 computes the SVD of an updated upper bidiagonal matrix B obtained by merging two
smaller ones by appending a row. This routine is used only for the problem which requires all singular values
and optionally singular vector matrices in factored form. B is an n-by-m matrix with n = nl + nr + 1 and m
= n + sqre. A related subroutine, ?lasd1, handles the case in which all singular values and singular vectors
of the bidiagonal matrix are desired. ?lasd6 computes the SVD as follows:

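$$
B = U(\mathrm{in}) \begin{pmatrix} D_1(\mathrm{in}) & 0 & 0 & 0 \\ Z_1^T & a & Z_2^T & b \\ 0 & 0 & D_2(\mathrm{in}) & 0 \end{pmatrix} VT(\mathrm{in})
  = U(\mathrm{out}) \begin{pmatrix} D(\mathrm{out}) & 0 \end{pmatrix} VT(\mathrm{out})
$$
where $Z^T = (Z_1^T \; a \; Z_2^T \; b) = u^T \, VT^T$, and u is a vector of dimension m with alpha and beta in the
(nl+1)-th and (nl+2)-th entries and zeros elsewhere; the entry b is empty if sqre = 0.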

The singular values of B can be computed using D1, D2, the first components of all the right singular vectors
of the lower block, and the last components of all the right singular vectors of the upper block. These
components are stored and updated in vf and vl, respectively, in ?lasd6. Hence U and VT are not explicitly
referenced.
The singular values are stored in D. The algorithm consists of two stages:

1. The first stage consists of deflating the size of the problem when there are multiple singular values or if
there is a zero in the Z vector. For each such occurrence the dimension of the secular equation problem
is reduced by one. This stage is performed by the routine ?lasd7.
2. The second stage consists of calculating the updated singular values. This is done by finding the roots
of the secular equation via the routine ?lasd4 (as called by ?lasd8). This routine also updates vf and
vl and computes the distances between the updated singular values and the old singular
values. ?lasd6 is called from ?lasda.

Input Parameters

icompq Specifies whether singular vectors are to be computed in factored form:

= 0: Compute singular values only
= 1: Compute singular vectors in factored form as well.

nl The row dimension of the upper block.

nl≥ 1.

nr The row dimension of the lower block.

nr≥ 1.

sqre = 0: the lower block is an nr-by-nr square matrix.

= 1: the lower block is an nr-by-(nr+1) rectangular matrix.
The bidiagonal matrix has row dimension n=nl+nr+1, and column
dimension m = n + sqre.

d Array, dimension (nl+nr+1). On entry, d(1:nl) contains the singular
values of the upper block, and d(nl+2:n) contains the singular values of the
lower block.

vf Array, dimension (m).

On entry, vf(1:nl+1) contains the first components of all right singular
vectors of the upper block; and vf(nl+2:m) contains the first components of
all right singular vectors of the lower block.

vl Array, dimension (m).

On entry, vl(1:nl+1) contains the last components of all right singular
vectors of the upper block; and vl(nl+2:m) contains the last components of
all right singular vectors of the lower block.

alpha Contains the diagonal element associated with the added row.

beta Contains the off-diagonal element associated with the added row.

ldgcol The leading dimension of the output array givcol, must be at least n.

ldgnum The leading dimension of the output arrays givnum and poles, must be at
least n.

work Workspace array, dimension ( 4m ).

iwork Workspace array, dimension ( 3n ).

Output Parameters

d On exit d(1:n) contains the singular values of the modified matrix.

vf On exit, vf contains the first components of all right singular vectors of the
bidiagonal matrix.

vl On exit, vl contains the last components of all right singular vectors of the
bidiagonal matrix.

alpha On exit, the diagonal element associated with the added row, scaled (divided) by
max(abs(alpha), abs(beta), abs(d(i))), i = 1,n.

beta On exit, the off-diagonal element associated with the added row, scaled (divided) by
max(abs(alpha), abs(beta), abs(d(i))), i = 1,n.

idxq Array, dimension (n). This contains the permutation which will reintegrate
the subproblem just solved back into sorted order, that is, d( idxq( i =
1, n ) ) will be in ascending order.

perm Array, dimension (n). The permutations (from deflation and sorting) to be
applied to each block. Not referenced if icompq = 0.


givptr The number of Givens rotations which took place in this subproblem. Not
referenced if icompq = 0.

givcol Array, dimension (ldgcol, 2). Each pair of numbers indicates a pair of
columns involved in a Givens rotation. Not referenced if icompq = 0.

givnum Array, dimension (ldgnum, 2). Each number indicates the C or S value to
be used in the corresponding Givens rotation. Not referenced if icompq = 0.

poles Array, dimension (ldgnum, 2). On exit, poles(1,*) is an array containing
the new singular values obtained from solving the secular equation, and
poles(2,*) is an array containing the poles in the secular equation. Not
referenced if icompq = 0.

difl Array, dimension (n). On exit, difl(i) is the distance between the i-th updated
(undeflated) singular value and the i-th (undeflated) old singular value.

difr Array, dimension (ldgnum, 2) if icompq = 1 and dimension (n) if
icompq = 0.
On exit, difr(i, 1) is the distance between the i-th updated (undeflated) singular
value and the (i+1)-th (undeflated) old singular value. If icompq = 1,
difr(1:k, 2) is an array containing the normalizing factors for the right
singular vector matrix.
See ?lasd8 for details on difl and difr.

z Array, dimension (m).

The first k elements of this array contain the components of the
deflation-adjusted updating row vector.

k Contains the dimension of the non-deflated matrix. This is the order of the
related secular equation. 1 ≤ k ≤ n.

c If sqre = 0, c is undefined on exit; if sqre = 1, c contains the C-value of a
Givens rotation related to the right null space.

s If sqre = 0, s is undefined on exit; if sqre = 1, s contains the S-value of a
Givens rotation related to the right null space.

info = 0: successful exit.

< 0: if info = -i, the i-th argument had an illegal value.

> 0: if info = 1, a singular value did not converge.

?lasd7
Merges the two sets of singular values together into a
single sorted set. Then it tries to deflate the size of
the problem. Used by ?bdsdc.

Syntax
void slasd7( lapack_int *icompq, lapack_int *nl, lapack_int *nr, lapack_int *sqre,
lapack_int *k, float *d, float *z, float *zw, float *vf, float *vfw, float *vl, float
*vlw, float *alpha, float *beta, float *dsigma, lapack_int *idx, lapack_int *idxp,
lapack_int *idxq, lapack_int *perm, lapack_int *givptr, lapack_int *givcol, lapack_int
*ldgcol, float *givnum, lapack_int *ldgnum, float *c, float *s, lapack_int *info );
void dlasd7( lapack_int *icompq, lapack_int *nl, lapack_int *nr, lapack_int *sqre,
lapack_int *k, double *d, double *z, double *zw, double *vf, double *vfw, double *vl,
double *vlw, double *alpha, double *beta, double *dsigma, lapack_int *idx, lapack_int
*idxp, lapack_int *idxq, lapack_int *perm, lapack_int *givptr, lapack_int *givcol,
lapack_int *ldgcol, double *givnum, lapack_int *ldgnum, double *c, double *s,
lapack_int *info );

Include Files
• mkl.h

Description

The routine ?lasd7 merges the two sets of singular values together into a single sorted set. Then it tries to
deflate the size of the problem. There are two ways in which deflation can occur: when two or more singular
values are close together or if there is a tiny entry in the Z vector. For each such occurrence the order of the
related secular equation problem is reduced by one. ?lasd7 is called from ?lasd6.

Input Parameters

icompq Specifies whether singular vectors are to be computed in compact form, as
follows:
= 0: Compute singular values only.
= 1: Compute singular vectors of upper bidiagonal matrix in compact form.

nl The row dimension of the upper block.

nl≥ 1.

nr The row dimension of the lower block.

nr≥ 1.

sqre = 0: the lower block is an nr-by-nr square matrix.

= 1: the lower block is an nr-by-(nr+1) rectangular matrix. The bidiagonal
matrix has n = nl + nr + 1 rows and m = n + sqre ≥ n columns.

d Array, DIMENSION (n). On entry d contains the singular values of the two
submatrices to be combined.

zw Array, DIMENSION ( m ).

Workspace for z.

vf Array, DIMENSION ( m ). On entry, vf(1:nl+1) contains the first
components of all right singular vectors of the upper block; and vf(nl+2:m)
contains the first components of all right singular vectors of the
lower block.


vfw Array, DIMENSION ( m ).

Workspace for vf.

vl Array, DIMENSION ( m ).

On entry, vl(1:nl+1) contains the last components of all right singular
vectors of the upper block; and vl(nl+2:m) contains the last components
of all right singular vectors of the lower block.

vlw Array, DIMENSION ( m ).

Workspace for vl.

alpha Contains the diagonal element associated with the added row (a float for
slasd7 and a double for dlasd7).

beta Contains the off-diagonal element associated with the added row.

idx Workspace array, DIMENSION (n). This will contain the permutation used to
sort the contents of d into ascending order.

idxp Workspace array, DIMENSION (n). This will contain the permutation used to
place deflated values of d at the end of the array.

idxq Array, DIMENSION (n).

This contains the permutation which separately sorts the two sub-problems
in d into ascending order. Note that entries in the first half of this
permutation must first be moved one position backward; and entries in the
second half must first have nl+1 added to their values.

ldgcol The leading dimension of the output array givcol, must be at least n.

ldgnum The leading dimension of the output array givnum, must be at least n.

Output Parameters

k Contains the dimension of the non-deflated matrix, this is the order of the
related secular equation.
1 ≤ k ≤ n.

d On exit, d contains the trailing (n-k) updated singular values (those which
were deflated) sorted into increasing order.

z Array, DIMENSION (m).

On exit, z contains the updating row vector in the secular equation.

vf On exit, vf contains the first components of all right singular vectors of the
bidiagonal matrix.

vl On exit, vl contains the last components of all right singular vectors of the
bidiagonal matrix.

dsigma Array, DIMENSION (n). Contains a copy of the diagonal elements (k-1
singular values and one zero) in the secular equation.

idxp On output, idxp(2:k) points to the nondeflated d-values and idxp(k+1:n)
points to the deflated singular values.

perm Array, DIMENSION (n).

The permutations (from deflation and sorting) to be applied to each singular
block. Not referenced if icompq = 0.

givptr The number of Givens rotations which took place in this subproblem. Not
referenced if icompq = 0.

givcol Array, DIMENSION ( ldgcol, 2 ). Each pair of numbers indicates a pair of
columns involved in a Givens rotation. Not referenced if icompq = 0.

givnum Array, DIMENSION ( ldgnum, 2 ). Each number indicates the C or S value to
be used in the corresponding Givens rotation. Not referenced if icompq = 0.

c If sqre = 0, c is undefined on exit; if sqre = 1, c contains the C-value
of a Givens rotation related to the right null space.

s If sqre = 0, s is undefined on exit; if sqre = 1, s contains the S-value
of a Givens rotation related to the right null space.

info = 0: successful exit.

< 0: if info = -i, the i-th argument had an illegal value.

?lasd8
Finds the square roots of the roots of the secular
equation, and stores, for each element in D, the
distance to its two nearest poles. Used by ?bdsdc.

Syntax
void slasd8( lapack_int *icompq, lapack_int *k, float *d, float *z, float *vf, float
*vl, float *difl, float *difr, lapack_int *lddifr, float *dsigma, float *work,
lapack_int *info );
void dlasd8( lapack_int *icompq, lapack_int *k, double *d, double *z, double *vf, double
*vl, double *difl, double *difr, lapack_int *lddifr, double *dsigma, double *work,
lapack_int *info );

Include Files
• mkl.h

Description

The routine ?lasd8 finds the square roots of the roots of the secular equation, as defined by the values in
dsigma and z. It makes the appropriate calls to ?lasd4, and stores, for each element in d, the distance to its
two nearest poles (elements in dsigma). It also updates the arrays vf and vl, the first and last components of
all the right singular vectors of the original bidiagonal matrix. ?lasd8 is called from ?lasd6.
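The distances stored for each root can be pictured with the following sketch (hypothetical name pole_distances). It evaluates the formulas given for difl and for the first column of difr in the parameter descriptions of this routine, storing difr column-major with leading dimension lddifr. Direct subtraction is used only for clarity: when a root lies very close to a pole it can suffer cancellation, and an actual implementation may instead derive these distances from the delta output of ?lasd4.

```c
#include <assert.h>

/* Hypothetical sketch of the distance formulas:
   difl(i) = d(i) - dsigma(i) and difr(i,1) = d(i) - dsigma(i+1).
   Arrays are 0-based here; difr is column-major with leading dimension
   lddifr, and only its first column is written. */
static void pole_distances(int k, const double *d, const double *dsigma,
                           double *difl, double *difr, int lddifr)
{
    assert(k >= 1 && lddifr >= k);
    for (int i = 0; i < k; ++i) {
        difl[i] = d[i] - dsigma[i];                      /* difl(i)   */
        if (i + 1 < k)
            difr[i + 0 * lddifr] = d[i] - dsigma[i + 1]; /* difr(i,1) */
    }
    /* difr(k,1) is documented as undefined, so it is left untouched. */
}
```

Since each updated value d(i) lies between the poles dsigma(i) and dsigma(i+1), difl(i) comes out positive and difr(i,1) negative.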


Input Parameters

icompq Specifies whether singular vectors are to be computed in factored form in
the calling routine:
= 0: Compute singular values only.
= 1: Compute singular vectors in factored form as well.

k The number of terms in the rational function to be solved by ?lasd4. k≥ 1.

z Array, DIMENSION ( k ).

The first k elements of this array contain the components of the deflation-
adjusted updating row vector.

vf Array, DIMENSION ( k ).

On entry, vf contains information passed through dbede8.

vl Array, DIMENSION ( k ). On entry, vl contains information passed through
dbede8.

lddifr The leading dimension of the output array difr, must be at least k.

dsigma Array, DIMENSION ( k ).

The first k elements of this array contain the old roots of the deflated
updating problem. These are the poles of the secular equation.

work Workspace array, DIMENSION at least (3k).

Output Parameters

d Array, DIMENSION ( k ).

On output, d contains the updated singular values.

z Updated on exit.

vf On exit, vf contains the first k components of the first components of all
right singular vectors of the bidiagonal matrix.

vl On exit, vl contains the first k components of the last components of all
right singular vectors of the bidiagonal matrix.

difl Array, DIMENSION ( k ). On exit, difl(i) = d(i) - dsigma(i).

difr Array,
DIMENSION ( lddifr, 2 ) if icompq = 1 and
DIMENSION ( k ) if icompq = 0.
On exit, difr(i,1) = d(i) - dsigma(i+1), difr(k,1) is not defined
and will not be referenced. If icompq = 1, difr(1:k,2) is an array
containing the normalizing factors for the right singular vector matrix.

dsigma The elements of this array may be very slightly altered in value.

info = 0: successful exit.

< 0: if info = -i, the i-th argument had an illegal value.

> 0: If info = 1, a singular value did not converge.

?lasd9
Finds the square roots of the roots of the secular
equation, and stores, for each element in D, the
distance to its two nearest poles. Used by ?bdsdc.

Syntax
void slasd9( lapack_int *icompq, lapack_int *k, float *d, float *z, float *vf, float
*vl, float *difl, float *difr, float *dsigma, float *work, lapack_int *info );
void dlasd9( lapack_int *icompq, lapack_int *k, double *d, double *z, double *vf, double
*vl, double *difl, double *difr, double *dsigma, double *work, lapack_int *info );

Include Files
• mkl.h

Description

The routine ?lasd9 finds the square roots of the roots of the secular equation, as defined by the values in
dsigma and z. It makes the appropriate calls to ?lasd4, and stores, for each element in d, the distance to its
two nearest poles (elements in dsigma). It also updates the arrays vf and vl, the first and last components of
all the right singular vectors of the original bidiagonal matrix. ?lasd9 is called from ?lasd7.

Input Parameters

icompq Specifies whether singular vectors are to be computed in factored form in
the calling routine:
If icompq = 0, compute singular values only;

If icompq = 1, compute singular vector matrices in factored form also.

k The number of terms in the rational function to be solved by ?lasd4. k ≥ 1.

dsigma Array, DIMENSION(k).

The first k elements of this array contain the old roots of the deflated
updating problem. These are the poles of the secular equation.

z Array, DIMENSION (k). The first k elements of this array contain the
components of the deflation-adjusted updating row vector.

vf Array, DIMENSION(k). On entry, vf contains information passed through
sbede8.

vl Array, DIMENSION(k). On entry, vl contains information passed through
sbede8.

work Workspace array, DIMENSION at least (3k).

Output Parameters

d Array, DIMENSION(k). d(i) contains the updated singular values.


vf On exit, vf contains the first k components of the first components of all
right singular vectors of the bidiagonal matrix.

vl On exit, vl contains the first k components of the last components of all
right singular vectors of the bidiagonal matrix.

difl Array, DIMENSION (k).

On exit, difl(i) = d(i) - dsigma(i).

difr Array,
DIMENSION (ldu, 2) if icompq =1 and
DIMENSION (k) if icompq = 0.
On exit, difr(i, 1) = d(i) - dsigma(i+1), difr(k, 1) is not defined
and will not be referenced.
If icompq = 1, difr(1:k, 2) is an array containing the normalizing
factors for the right singular vector matrix.

info = 0: successful exit.

< 0: if info = -i, the i-th argument had an illegal value.

> 0: If info = 1, a singular value did not converge.

?lasda
Computes the singular value decomposition (SVD) of a
real upper bidiagonal matrix with diagonal d and off-
diagonal e. Used by ?bdsdc.

Syntax
void slasda( lapack_int *icompq, lapack_int *smlsiz, lapack_int *n, lapack_int *sqre,
float *d, float *e, float *u, lapack_int *ldu, float *vt, lapack_int *k, float *difl,
float *difr, float *z, float *poles, lapack_int *givptr, lapack_int *givcol, lapack_int
*ldgcol, lapack_int *perm, float *givnum, float *c, float *s, float *work, lapack_int
*iwork, lapack_int *info );
void dlasda( lapack_int *icompq, lapack_int *smlsiz, lapack_int *n, lapack_int *sqre,
double *d, double *e, double *u, lapack_int *ldu, double *vt, lapack_int *k, double
*difl, double *difr, double *z, double *poles, lapack_int *givptr, lapack_int *givcol,
lapack_int *ldgcol, lapack_int *perm, double *givnum, double *c, double *s, double
*work, lapack_int *iwork, lapack_int *info );

Include Files
• mkl.h

Description

Using a divide and conquer approach, ?lasda computes the singular value decomposition (SVD) of a real
upper bidiagonal n-by-m matrix B with diagonal d and off-diagonal e, where m = n + sqre.

The algorithm computes the singular values in the SVD B = U*S*VT. The orthogonal matrices U and VT are
optionally computed in compact form. A related subroutine ?lasd0 computes the singular values and the
singular vectors in explicit form.

Input Parameters

icompq Specifies whether singular vectors are to be computed in compact form, as
follows:
= 0: Compute singular values only.
= 1: Compute singular vectors of upper bidiagonal matrix in compact form.

smlsiz The maximum size of the subproblems at the bottom of the computation
tree.

n The row dimension of the upper bidiagonal matrix. This is also the
dimension of the main diagonal array d.

sqre Specifies the column dimension of the bidiagonal matrix.

If sqre = 0: the bidiagonal matrix has column dimension m = n

If sqre = 1: the bidiagonal matrix has column dimension m = n + 1.

d Array, DIMENSION (n). On entry, d contains the main diagonal of the
bidiagonal matrix.

e Array, DIMENSION ( m - 1 ). Contains the subdiagonal entries of the
bidiagonal matrix. On exit, e is destroyed.

ldu The leading dimension of arrays u, vt, difl, difr, poles, givnum, and z.
ldu≥n.

ldgcol The leading dimension of arrays givcol and perm. ldgcol≥n.

work Workspace array, DIMENSION (6n + (smlsiz+1)^2).

iwork Workspace array; its dimension must be at least (7n).

Output Parameters

d On exit, if info = 0, d contains the singular values of the bidiagonal
matrix.

u Array, DIMENSION (ldu, smlsiz) if icompq = 1.

Not referenced if icompq = 0.

If icompq = 1, on exit, u contains the left singular vector matrices of all
subproblems at the bottom level.

vt Array, DIMENSION ( ldu, smlsiz+1 ) if icompq = 1, and not referenced if
icompq = 0. If icompq = 1, on exit, vt' contains the right singular vector
matrices of all subproblems at the bottom level.

k Array, DIMENSION (n) if icompq = 1 and
DIMENSION (1) if icompq = 0.

If icompq = 1, on exit, k(i) is the dimension of the i-th secular equation on
the computation tree.

difl Array, DIMENSION ( ldu, nlvl ) (float for slasda, double for dlasda),
where nlvl = floor(log2(n/smlsiz)).

difr Array,
DIMENSION ( ldu, 2*nlvl ) if icompq = 1 and
DIMENSION (n) if icompq = 0.
If icompq = 1, on exit, difl(1:n, i) and difr(1:n, 2i-1) record distances
between singular values on the i-th level and singular values on the
(i-1)-th level, and difr(1:n, 2i) contains the normalizing factors for the right
singular vector matrix. See ?lasd8 for details.

z Array,
DIMENSION ( ldu, nlvl ) if icompq = 1 and
DIMENSION (n) if icompq = 0. The first k elements of z(1, i) contain the
components of the deflation-adjusted updating row vector for subproblems
on the i-th level.

poles Array, DIMENSION(ldu, 2*nlvl) if icompq = 1, and not referenced if
icompq = 0. If icompq = 1, on exit, poles(1, 2i - 1) and poles(1, 2i) contain
the new and old singular values involved in the secular equations on the
i-th level.

givptr Array, DIMENSION (n) if icompq = 1, and not referenced if icompq = 0. If
icompq = 1, on exit, givptr(i) records the number of Givens rotations
performed on the i-th problem on the computation tree.

givcol Array, DIMENSION(ldgcol, 2*nlvl) if icompq = 1, and not referenced if
icompq = 0. If icompq = 1, on exit, for each i, givcol(1, 2i - 1) and
givcol(1, 2i) record the locations of Givens rotations performed on the i-th
level on the computation tree.

perm Array, DIMENSION ( ldgcol, nlvl ) if icompq = 1, and not referenced if
icompq = 0. If icompq = 1, on exit, perm(1, i) records permutations
done on the i-th level of the computation tree.

givnum Array, DIMENSION ( ldu, 2*nlvl ) if icompq = 1, and not referenced if
icompq = 0. If icompq = 1, on exit, for each i, givnum(1, 2i - 1) and
givnum(1, 2i) record the C- and S-values of Givens rotations performed on
the i-th level on the computation tree.

c Array,
DIMENSION (n) if icompq = 1, and
DIMENSION (1) if icompq = 0.
If icompq = 1 and the i-th subproblem is not square, on exit, c(i) contains
the C-value of a Givens rotation related to the right null space of the i-th
subproblem.

s Array,
DIMENSION (n) if icompq = 1, and
DIMENSION (1) if icompq = 0.
If icompq = 1 and the i-th subproblem is not square, on exit, s(i) contains
the S-value of a Givens rotation related to the right null space of the i-th
subproblem.

info = 0: successful exit.

< 0: if info = -i, the i-th argument had an illegal value.
> 0: If info = 1, a singular value did not converge.

?lasdq
Computes the SVD of a real bidiagonal matrix with
diagonal d and off-diagonal e. Used by ?bdsdc.

Syntax
void slasdq( char *uplo, lapack_int *sqre, lapack_int *n, lapack_int *ncvt, lapack_int
*nru, lapack_int *ncc, float *d, float *e, float *vt, lapack_int *ldvt, float *u,
lapack_int *ldu, float *c, lapack_int *ldc, float *work, lapack_int *info );
void dlasdq( char *uplo, lapack_int *sqre, lapack_int *n, lapack_int *ncvt, lapack_int
*nru, lapack_int *ncc, double *d, double *e, double *vt, lapack_int *ldvt, double *u,
lapack_int *ldu, double *c, lapack_int *ldc, double *work, lapack_int *info );

Include Files
• mkl.h

Description

The routine ?lasdq computes the singular value decomposition (SVD) of a real (upper or lower) bidiagonal
matrix with diagonal d and off-diagonal e, accumulating the transformations if desired. If B is the input
bidiagonal matrix, the algorithm computes orthogonal matrices Q and P such that B = Q*S*P^T. The singular
values S are overwritten on d.
The input matrix U is changed to U*Q if desired.

The input matrix VT is changed to P^T*VT if desired.

The input matrix C is changed to Q^T*C if desired.
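The shape rules for uplo and sqre can be checked by expanding d and e into a dense matrix. The helper below (hypothetical name build_bidiagonal) is only a sketch for inspecting the input layout and is not part of ?lasdq; the column-major storage with leading dimension ldb is an assumption made for the example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: expand d and e into a dense column-major matrix b
   (leading dimension ldb) with the shape described for ?lasdq:
   n-by-n if sqre = 0; n-by-(n+1) (upper) or (n+1)-by-n (lower) if sqre = 1. */
static void build_bidiagonal(char uplo, int sqre, int n, const double *d,
                             const double *e, double *b, int ldb,
                             int *rows, int *cols)
{
    int upper = (uplo == 'U' || uplo == 'u');
    assert((sqre == 0 || sqre == 1) && n >= 1);
    *rows = n + ((!upper && sqre) ? 1 : 0);
    *cols = n + ((upper && sqre) ? 1 : 0);
    assert(ldb >= *rows);
    memset(b, 0, sizeof(double) * (size_t)ldb * (size_t)*cols);
    for (int i = 0; i < n; ++i)
        b[i + i * ldb] = d[i];                    /* diagonal */
    for (int i = 0; i < n - 1 + sqre; ++i) {      /* e has n-1+sqre entries */
        if (upper)
            b[i + (i + 1) * ldb] = e[i];          /* superdiagonal */
        else
            b[(i + 1) + i * ldb] = e[i];          /* subdiagonal */
    }
}
```

For uplo = 'U', sqre = 1, d = (1, 2) and e = (3, 4), the result is the 2-by-3 matrix [[1, 3, 0], [0, 2, 4]]; with uplo = 'L' the same data gives the 3-by-2 matrix [[1, 0], [3, 2], [0, 4]].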

Input Parameters

uplo On entry, uplo specifies whether the input bidiagonal matrix is upper or
lower bidiagonal.
If uplo = 'U' or 'u', B is upper bidiagonal;

If uplo = 'L' or 'l', B is lower bidiagonal.

sqre = 0: the input matrix is n-by-n.

= 1: the input matrix is n-by-(n+1) if uplo = 'U' and (n+1)-by-n if
uplo = 'L'.


n On entry, n specifies the number of rows and columns in the matrix. n must
be at least 0.

ncvt On entry, ncvt specifies the number of columns of the matrix VT. ncvt must
be at least 0.

nru On entry, nru specifies the number of rows of the matrix U. nru must be at
least 0.

ncc On entry, ncc specifies the number of columns of the matrix C. ncc must be
at least 0.

d Array, DIMENSION (n). On entry, d contains the diagonal entries of the
bidiagonal matrix.

e Array, DIMENSION (n-1) if sqre = 0 and (n) if sqre = 1. On entry, the
entries of e contain the off-diagonal entries of the bidiagonal matrix.

vt Array, DIMENSION (ldvt, ncvt). On entry, contains a matrix which on exit
has been premultiplied by P^T, dimension n-by-ncvt if sqre = 0 and
(n+1)-by-ncvt if sqre = 1 (not referenced if ncvt = 0).

ldvt On entry, ldvt specifies the leading dimension of vt as declared in the calling
(sub) program. ldvt must be at least 1. If ncvt is nonzero, ldvt must also be
at least n.

u Array, DIMENSION (ldu, n). On entry, contains a matrix which on exit has
been postmultiplied by Q, dimension nru-by-n if sqre = 0 and
nru-by-(n+1) if sqre = 1 (not referenced if nru = 0).

ldu On entry, ldu specifies the leading dimension of u as declared in the calling
(sub) program. ldu must be at least max(1, nru ) .

c Array, DIMENSION (ldc, ncc). On entry, contains an n-by-ncc matrix which
on exit has been premultiplied by Q', dimension n-by-ncc if sqre = 0 and
(n+1)-by-ncc if sqre = 1 (not referenced if ncc = 0).

ldc On entry, ldc specifies the leading dimension of C as declared in the calling
(sub) program. ldc must be at least 1. If ncc is non-zero, ldc must also be
at least n.

work Array, DIMENSION (4n). This is a workspace array. Only referenced if one of
ncvt, nru, or ncc is nonzero, and if n is at least 2.

Output Parameters

d On normal exit, d contains the singular values in ascending order.

e On normal exit, e will contain 0. If the algorithm does not converge, d and e
will contain the diagonal and superdiagonal entries of a bidiagonal matrix
orthogonally equivalent to the one given as input.

vt On exit, the matrix has been premultiplied by P'.

u On exit, the matrix has been postmultiplied by Q.

c On exit, the matrix has been premultiplied by Q'.

info On exit, a value of 0 indicates a successful exit. If info < 0, argument
number -info is illegal. If info > 0, the algorithm did not converge, and
info specifies how many superdiagonals did not converge.

?lasdt
Creates a tree of subproblems for bidiagonal divide
and conquer. Used by ?bdsdc.

Syntax
void slasdt( lapack_int *n, lapack_int *lvl, lapack_int *nd, lapack_int *inode,
lapack_int *ndiml, lapack_int *ndimr, lapack_int *msub );
void dlasdt( lapack_int *n, lapack_int *lvl, lapack_int *nd, lapack_int *inode,
lapack_int *ndiml, lapack_int *ndimr, lapack_int *msub );

Include Files
• mkl.h

Description

The routine creates a tree of subproblems for bidiagonal divide and conquer.

Input Parameters

n On entry, the number of diagonal elements of the bidiagonal matrix.

msub On entry, the maximum row dimension of each subproblem at the bottom of
the tree.

Output Parameters

lvl On exit, the number of levels on the computation tree.

nd On exit, the number of nodes on the tree.

inode Array, DIMENSION (n). On exit, centers of subproblems.

ndiml Array, DIMENSION (n). On exit, row dimensions of left children.

ndimr Array, DIMENSION (n). On exit, row dimensions of right children.

?laset
Initializes the off-diagonal elements and the diagonal
elements of a matrix to given values.

Syntax
lapack_int LAPACKE_slaset (int matrix_layout , char uplo , lapack_int m , lapack_int
n , float alpha , float beta , float * a , lapack_int lda );
lapack_int LAPACKE_dlaset (int matrix_layout , char uplo , lapack_int m , lapack_int
n , double alpha , double beta , double * a , lapack_int lda );


lapack_int LAPACKE_claset (int matrix_layout , char uplo , lapack_int m , lapack_int
n , lapack_complex_float alpha , lapack_complex_float beta , lapack_complex_float * a ,
lapack_int lda );
lapack_int LAPACKE_zlaset (int matrix_layout , char uplo , lapack_int m , lapack_int
n , lapack_complex_double alpha , lapack_complex_double beta , lapack_complex_double *
a , lapack_int lda );

Include Files
• mkl.h

Description

The routine initializes an m-by-n matrix A to beta on the diagonal and alpha on the off-diagonals.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

uplo Specifies the part of the matrix A to be set.

If uplo = 'U', the upper triangular part is set; the strictly lower triangular
part of A is not changed.
If uplo = 'L', the lower triangular part is set; the strictly upper triangular
part of A is not changed.
Otherwise, all of the matrix A is set.

m The number of rows of the matrix A. m ≥ 0.

n The number of columns of the matrix A. n ≥ 0.

alpha, beta The constants to which the off-diagonal and diagonal elements are to be
set, respectively.

a Array, size at least max(1, lda*n) for column major and max(1, lda*m)
for row major layout.
The array a contains the m-by-n matrix A.

lda The leading dimension of the array a.

lda ≥ max(1, m) for column major layout and lda ≥ max(1, n) for row major
layout.

Output Parameters

a On exit, the leading m-by-n submatrix of A is set as follows:

if uplo = 'U', $A_{ij} = \alpha$, $1 \le i \le j-1$, $1 \le j \le n$,

if uplo = 'L', $A_{ij} = \alpha$, $j+1 \le i \le m$, $1 \le j \le n$,

otherwise, $A_{ij} = \alpha$, $1 \le i \le m$, $1 \le j \le n$, $i \ne j$,

and, for all uplo, $A_{ii} = \beta$, $1 \le i \le \min(m, n)$.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info = -i < 0, the i-th parameter had an illegal value.

If info = -1011, memory allocation error occurred.
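The effect of uplo, alpha, and beta on the stored array can be modeled in a few lines of plain C. laset_colmajor is a hypothetical helper for illustration, not an MKL function; it assumes column-major storage, uses int and double in place of lapack_int and the four data types, and does not cover row-major layout.

```c
#include <stddef.h>

/* Plain-C model of the documented ?laset semantics for a column-major
   m-by-n matrix with leading dimension lda (hypothetical helper). */
void laset_colmajor(char uplo, int m, int n, double alpha, double beta,
                    double *a, int lda)
{
    int upper = (uplo == 'U' || uplo == 'u');
    int lower = (uplo == 'L' || uplo == 'l');
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            if (i == j) {
                a[i + (size_t)j * lda] = beta;   /* diagonal, any uplo */
            } else if ((upper && i < j) || (lower && i > j) ||
                       (!upper && !lower)) {
                a[i + (size_t)j * lda] = alpha;  /* selected off-diagonal */
            }
            /* any other entry is left unchanged */
        }
    }
}
```

With MKL available, the corresponding call for a 3-by-3 column-major array would use the signature from the Syntax section, for example LAPACKE_dlaset(LAPACK_COL_MAJOR, 'U', 3, 3, 0.0, 1.0, a, 3).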

?lasrt
Sorts numbers in increasing or decreasing order.

Syntax
lapack_int LAPACKE_slasrt (char id , lapack_int n , float * d );
lapack_int LAPACKE_dlasrt (char id , lapack_int n , double * d );

Include Files
• mkl.h

Description

The routine ?lasrt sorts the numbers in d in increasing order (if id = 'I') or in decreasing order (if id =
'D'). It uses Quick Sort, reverting to Insertion Sort on arrays of size ≤ 20. The dimension of its internal
stack limits n to about $2^{32}$.

Input Parameters

id If id = 'I', sort d in increasing order.
If id = 'D', sort d in decreasing order.

n The length of the array d.

d On entry, the array to be sorted.

Output Parameters

d On exit, d has been sorted into increasing order
(d[0] ≤ d[1] ≤ ... ≤ d[n-1]) or into decreasing order
(d[0] ≥ d[1] ≥ ... ≥ d[n-1]), depending on id.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.
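The hybrid strategy described above (Quick Sort with an Insertion Sort cutoff of 20) can be sketched in plain C. lasrt_sketch and its helpers are hypothetical names, not MKL functions; the sketch handles double data only, uses int for n, picks the middle element as pivot, and recurses on the smaller partition while looping on the larger one instead of managing the explicit stack that the routine itself uses.

```c
/* Returns nonzero if x must be placed before y for sort direction id. */
static int comes_before(double x, double y, char id)
{
    return (id == 'I' || id == 'i') ? (x < y) : (x > y);
}

/* Straight insertion sort on d[lo..hi], inclusive. */
static void insertion_sort(double *d, int lo, int hi, char id)
{
    for (int i = lo + 1; i <= hi; ++i) {
        double key = d[i];
        int j = i - 1;
        while (j >= lo && comes_before(key, d[j], id)) {
            d[j + 1] = d[j];
            --j;
        }
        d[j + 1] = key;
    }
}

#define SELECT 20  /* partitions of at most this size use insertion sort */

/* Quicksort on d[lo..hi]: partition around the middle element, recurse on
   the smaller part and loop on the larger one. */
static void quick_sort(double *d, int lo, int hi, char id)
{
    while (hi - lo + 1 > SELECT) {
        double pivot = d[lo + (hi - lo) / 2];
        int i = lo, j = hi;
        while (i <= j) {
            while (comes_before(d[i], pivot, id)) ++i;
            while (comes_before(pivot, d[j], id)) --j;
            if (i <= j) {
                double t = d[i];
                d[i] = d[j];
                d[j] = t;
                ++i;
                --j;
            }
        }
        if (j - lo < hi - i) {
            quick_sort(d, lo, j, id);
            lo = i;
        } else {
            quick_sort(d, i, hi, id);
            hi = j;
        }
    }
    insertion_sort(d, lo, hi, id);
}

/* Hypothetical stand-in for ?lasrt on double data.
   Returns 0, or -i if argument i is illegal. */
int lasrt_sketch(char id, int n, double *d)
{
    if (id != 'I' && id != 'i' && id != 'D' && id != 'd') return -1;
    if (n < 0) return -2;
    if (n > 1) quick_sort(d, 0, n - 1, id);
    return 0;
}
```

The return codes follow the info convention described under Return Values: -1 for an invalid id (argument 1) and -2 for a negative n (argument 2).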


?laswp
Performs a series of row interchanges on a general
rectangular matrix.

Syntax
lapack_int LAPACKE_slaswp (int matrix_layout , lapack_int n , float * a , lapack_int
lda , lapack_int k1 , lapack_int k2 , const lapack_int * ipiv , lapack_int incx );
lapack_int LAPACKE_dlaswp (int matrix_layout , lapack_int n , double * a , lapack_int
lda , lapack_int k1 , lapack_int k2 , const lapack_int * ipiv , lapack_int incx );
lapack_int LAPACKE_claswp (int matrix_layout , lapack_int n , lapack_complex_float *
a , lapack_int lda , lapack_int k1 , lapack_int k2 , const lapack_int * ipiv ,
lapack_int incx );
lapack_int LAPACKE_zlaswp (int matrix_layout , lapack_int n , lapack_complex_double *
a , lapack_int lda , lapack_int k1 , lapack_int k2 , const lapack_int * ipiv ,
lapack_int incx );

Include Files
• mkl.h

Description

The routine performs a series of row interchanges on the matrix A. One row interchange is initiated for each
of rows k1 through k2 of A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

n The number of columns of the matrix A.

a Array, size max(1, lda*n) for column major and max(1, lda*mm) for row
major layout. Here mm must be at least the maximum of the values
ipiv[k1-1+j*|incx|], 0 ≤ j ≤ k2-k1.
Array a contains the m-by-n matrix A.

lda The leading dimension of the array a.

k1 The first element of ipiv for which a row interchange will be done.

k2 The last element of ipiv for which a row interchange will be done.

ipiv Array, size k1+(k2-k1)*|incx|.

The vector of pivot indices. Only the elements in positions k1 through k2 of
ipiv are accessed.
ipiv[k - 1] = l implies rows k and l are to be interchanged.

incx The increment between successive values of ipiv. If incx is negative, the
pivots are applied in reverse order.

Output Parameters

a On exit, the permuted matrix.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = -1011, memory allocation error occurred.
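A plain-C model of the interchange loop shows how k1, k2, ipiv, and incx interact. laswp_colmajor is a hypothetical helper, not an MKL function: it assumes column-major storage, 1-based pivot values and row numbers (the convention LAPACKE uses for ipiv), k1 ≤ k2, and int and double in place of lapack_int and the four data types. Treating incx = 0 as a no-op is an assumption of the sketch.

```c
/* Hypothetical plain-C model of ?laswp for a column-major matrix.
   ipiv holds 1-based row indices; k1 and k2 are 1-based positions. */
void laswp_colmajor(int n, double *a, int lda, int k1, int k2,
                    const int *ipiv, int incx)
{
    if (incx == 0) return;                   /* sketch: nothing to do */
    int step   = incx > 0 ? 1 : -1;          /* forward or reverse order */
    int first  = incx > 0 ? k1 : k2;
    int last   = incx > 0 ? k2 : k1;
    int stride = incx > 0 ? incx : -incx;    /* |incx| */

    for (int i = first; ; i += step) {
        /* The pivot for row i is ipiv[k1 - 1 + (i - k1) * |incx|]. */
        int ip = ipiv[(k1 - 1) + (i - k1) * stride];
        if (ip != i) {
            for (int j = 0; j < n; ++j) {    /* swap rows i and ip */
                double t = a[(i - 1) + j * lda];
                a[(i - 1) + j * lda] = a[(ip - 1) + j * lda];
                a[(ip - 1) + j * lda] = t;
            }
        }
        if (i == last) break;
    }
}
```

Because a negative incx replays the same interchanges in reverse order, applying ipiv with incx = 1 and then with incx = -1 restores the original matrix; the test below does exactly that with ipiv = {3, 3} on a 3-by-2 matrix.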

?latm1
Computes the entries of a matrix as specified.

Syntax
void slatm1 (lapack_int *mode, float *cond, lapack_int *irsign, lapack_int *idist, lapack_int
*iseed, float *d, lapack_int *n, lapack_int *info);
void dlatm1 (lapack_int *mode, double *cond, lapack_int *irsign, lapack_int *idist, lapack_int
*iseed, double *d, lapack_int *n, lapack_int *info);
void clatm1 (lapack_int *mode, float *cond, lapack_int *irsign, lapack_int *idist, lapack_int
*iseed, lapack_complex_float *d, lapack_int *n, lapack_int *info);
void zlatm1 (lapack_int *mode, double *cond, lapack_int *irsign, lapack_int *idist, lapack_int
*iseed, lapack_complex_double *d, lapack_int *n, lapack_int *info);

Include Files
• mkl.h

Description

The ?latm1 routine computes the entries of D(1..n) as specified by mode, cond, and irsign. idist and
iseed determine the generation of random numbers.
?latm1 is called by the corresponding ?latmr routine (slatm1 by slatmr, dlatm1 by dlatmr, clatm1 by clatmr,
and zlatm1 by zlatmr) to generate random test matrices for LAPACK programs.

Input Parameters

mode On entry describes how d is to be computed:

mode = 0 means do not change d.

mode = 1 sets d[0] = 1 and d[1:n - 1] = 1.0/cond
mode = 2 sets d[0:n - 2] = 1 and d[n - 1]=1.0/cond
mode = 3 sets d[i - 1]=cond**(-(i-1)/(n-1))
mode = 4 sets d[i - 1]= 1 - (i-1)/(n-1)*(1 - 1/cond)


mode = 5 sets d to random numbers in the range (1/cond, 1) such
that their logarithms are uniformly distributed.
mode = 6 sets d to random numbers from the same distribution as the rest of
the matrix.
mode < 0 has the same meaning as abs(mode), except that the order of
the elements of d is reversed.

Thus, if mode is positive, d has entries ranging from 1 to 1/cond; if mode is
negative, from 1/cond to 1.

cond On entry, used as described under mode above. If used, it must be ≥ 1.

irsign On entry, if mode is not -6, 0, or 6, determines the sign of the entries of d.

If irsign = 0, entries of d are unchanged.

If irsign = 1, each entry of d is multiplied by a random sign (+1 or -1) for
slatm1 and dlatm1, or by a random complex number uniformly distributed with
absolute value 1 for clatm1 and zlatm1.

idist Specifies the distribution of the random numbers.

For slatm1 and dlatm1:

= 1: uniform (0,1)
= 2: uniform (-1,1)
= 3: normal (0,1)
For clatm1 and zlatm1:
= 1: real and imaginary parts each uniform (0,1)
= 2: real and imaginary parts each uniform (-1,1)
= 3: real and imaginary parts each normal (0,1)
= 4: complex number uniform in disk (0, 1)

iseed Array, size (4).

Specifies the seed of the random number generator. The random number
generator uses a linear congruential sequence limited to small integers, and
so should produce machine-independent random numbers. The values of
iseed are changed on exit, and can be used in the next call to ?latm1
to continue the same random number sequence.

d Array, size n.

n Number of entries of d.

Output Parameters

iseed On exit, the seed is updated.

d On exit, d is updated, unless mode = 0.

info If info = 0, the execution is successful.

If info = -1, mode is not in range -6 to 6.

If info = -2, mode is neither -6, 0 nor 6, and irsign is neither 0 nor 1.

If info = -3, mode is neither -6, 0 nor 6 and cond is less than 1.

If info = -4, mode equals 6 or -6 and idist is not in range 1 to 4.

If info = -7, n is negative.
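The deterministic mode formulas listed under mode can be made concrete with a small plain-C sketch. latm1_det_sketch is a hypothetical helper, not the MKL routine: it handles real data only, covers modes 0, 1, 2, 4 and their negatives, leaves out mode 3 (to avoid math-library calls) and the random modes 5 and 6, and does not model irsign, idist, or iseed. It returns the info codes listed above for a bad mode, cond, or n, and 1 (a sketch-only code) for the unmodeled modes.

```c
/* Hypothetical sketch of the deterministic ?latm1 modes for real data. */
int latm1_det_sketch(int mode, double cond, double *d, int n)
{
    int amode = mode < 0 ? -mode : mode;
    if (amode > 6) return -1;                                /* info = -1 */
    if (amode != 0 && amode != 6 && cond < 1.0) return -3;   /* info = -3 */
    if (n < 0) return -7;                                    /* info = -7 */
    if (amode == 0 || n == 0) return 0;                      /* d unchanged */
    if (amode == 3 || amode == 5 || amode == 6) return 1;    /* not modeled */

    switch (amode) {
    case 1:  /* d[0] = 1, all other entries 1/cond */
        for (int i = 0; i < n; ++i) d[i] = 1.0 / cond;
        d[0] = 1.0;
        break;
    case 2:  /* all entries 1 except d[n-1] = 1/cond */
        for (int i = 0; i < n; ++i) d[i] = 1.0;
        d[n - 1] = 1.0 / cond;
        break;
    case 4:  /* arithmetic progression from 1 down to 1/cond */
        d[0] = 1.0;
        for (int i = 2; i <= n; ++i)
            d[i - 1] = 1.0 - (double)(i - 1) / (n - 1) * (1.0 - 1.0 / cond);
        break;
    default:
        break;
    }

    if (mode < 0) {  /* negative mode: same values, reversed order */
        for (int i = 0, j = n - 1; i < j; ++i, --j) {
            double t = d[i];
            d[i] = d[j];
            d[j] = t;
        }
    }
    return 0;
}
```

For example, mode = 4 with cond = 2 and n = 3 yields d = {1.0, 0.75, 0.5}, and mode = -2 with cond = 4 and n = 4 yields d = {0.25, 1, 1, 1}.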

?latm2
Returns an entry of a random matrix.

Syntax
float slatm2 (lapack_int *m, lapack_int *n, lapack_int *i, lapack_int *j, lapack_int
*kl, lapack_int *ku, lapack_int *idist, lapack_int *iseed, float *d, lapack_int *igrade,
float *dl, float *dr, lapack_int *ipvtng, lapack_int *iwork, float *sparse);
double dlatm2 (lapack_int *m, lapack_int *n, lapack_int *i, lapack_int *j, lapack_int
*kl, lapack_int *ku, lapack_int *idist, lapack_int *iseed, double *d, lapack_int
*igrade, double *dl, double *dr, lapack_int *ipvtng, lapack_int *iwork, double *sparse);
The prototypes of the complex variants depend on whether or not the application links with GNU Fortran
(gfortran) libraries.
For non-gfortran (libmkl_intel_*) interface libraries:
void clatm2 (lapack_complex_float *res, lapack_int *m, lapack_int *n, lapack_int *i,
lapack_int *j, lapack_int *kl, lapack_int *ku, lapack_int *idist, lapack_int *iseed,
lapack_complex_float *d, lapack_int *igrade, lapack_complex_float *dl,
lapack_complex_float *dr, lapack_int *ipvtng, lapack_int *iwork, float *sparse);
void zlatm2 (lapack_complex_double *res, lapack_int *m, lapack_int *n, lapack_int *i,
lapack_int *j, lapack_int *kl, lapack_int *ku, lapack_int *idist, lapack_int *iseed,
lapack_complex_double *d, lapack_int *igrade, lapack_complex_double *dl,
lapack_complex_double *dr, lapack_int *ipvtng, lapack_int *iwork, double *sparse);
For gfortran (libmkl_gf_*) interface libraries:
lapack_complex_float clatm2 (lapack_int *m, lapack_int *n, lapack_int *i, lapack_int
*j, lapack_int *kl, lapack_int *ku, lapack_int *idist, lapack_int *iseed,
lapack_complex_float *d, lapack_int *igrade, lapack_complex_float *dl,
lapack_complex_float *dr, lapack_int *ipvtng, lapack_int *iwork, float *sparse);
lapack_complex_double zlatm2 (lapack_int *m, lapack_int *n, lapack_int *i, lapack_int
*j, lapack_int *kl, lapack_int *ku, lapack_int *idist, lapack_int *iseed,
lapack_complex_double *d, lapack_int *igrade, lapack_complex_double *dl,
lapack_complex_double *dr, lapack_int *ipvtng, lapack_int *iwork, double *sparse);
To understand the difference between the non-gfortran and gfortran interfaces and when to use each of
them, see Dynamic Libraries in the lib/intel64 Directory in the oneAPI Math Kernel Library Developer Guide.

Include Files
• mkl.h

Description

The ?latm2 routine returns entry (i, j) of a random matrix of dimension (m, n). It is called by the ?latmr
routine in order to build random test matrices. No error checking on parameters is done, because this routine
is called in a tight loop by ?latmr, which has already checked the parameters.


Use of ?latm2 differs from ?latm3 in the order in which the random number generator is called to fill in
random matrix entries. With ?latm2, the generator is called to fill in the pivoted matrix columnwise.
With ?latm3, the generator is called to fill in the matrix columnwise, after which it is pivoted. Thus, ?latm3
can be used to construct random matrices which differ only in their order of rows and/or columns. ?latm2 is
used to construct band matrices while avoiding calling the random number generator for entries outside the
band (and therefore generating random numbers in different orders for different pivot orders).
The matrix whose (i, j) entry is returned is constructed as follows (this routine only computes one entry):

• If i is outside (1..m) or j is outside (1..n), returns zero (this is convenient for generating matrices in
band format).
• Generate a matrix A with random entries of distribution idist.
• Set the diagonal to D.
• Grade the matrix, if desired, from the left (by dl) and/or from the right (by dr or dl) as specified by
igrade.
• Permute, if desired, the rows and/or columns as specified by ipvtng and iwork.
• Band the matrix to have lower bandwidth kl and upper bandwidth ku.
• Set random entries to zero as specified by sparse.
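The per-entry construction above can be sketched for real data in plain C. latm2_sketch, sample_fn, and fixed_source are hypothetical names: the random source is a caller-supplied callback standing in for the routine's internal generator (which is driven by iseed), and the order of checks follows our reading of the reference LAPACK ?latm2, so treat this as an illustration rather than MKL's implementation. It shows why entries outside the matrix or band cost no random draw, and how iwork maps a pivoted position back to an original row or column before the diagonal and grading are applied.

```c
/* Hypothetical random source: returns one sample from distribution idist.
   It stands in for the routine's internal generator, driven by iseed. */
typedef double (*sample_fn)(int idist, void *state);

/* Demo source for tracing control flow: always returns *(double *)state. */
double fixed_source(int idist, void *state)
{
    (void)idist;
    return *(const double *)state;
}

/* Sketch of the per-entry logic for real data; i, j and iwork entries are
   1-based as in the routine. */
double latm2_sketch(int m, int n, int i, int j, int kl, int ku, int idist,
                    const double *d, int igrade, const double *dl,
                    const double *dr, int ipvtng, const int *iwork,
                    double sparse, sample_fn rnd, void *state)
{
    /* Outside the matrix or outside the band: zero, no random draw. */
    if (i < 1 || i > m || j < 1 || j > n) return 0.0;
    if (j > i + ku || j < i - kl) return 0.0;

    /* Sparsity: a uniform (0,1) draw below sparse zeroes the entry. */
    if (sparse > 0.0 && rnd(1, state) < sparse) return 0.0;

    /* iwork[k-1] is the original position of the row/column now at k. */
    int isub = (ipvtng == 1 || ipvtng == 3) ? iwork[i - 1] : i;
    int jsub = (ipvtng == 2 || ipvtng == 3) ? iwork[j - 1] : j;

    /* Diagonal of the unpivoted matrix comes from d; the rest is random. */
    double t = (isub == jsub) ? d[isub - 1] : rnd(idist, state);

    switch (igrade) {                     /* real-data grading options */
    case 1: t *= dl[isub - 1]; break;
    case 2: t *= dr[jsub - 1]; break;
    case 3: t *= dl[isub - 1] * dr[jsub - 1]; break;
    case 4: if (isub != jsub) t = t * dl[isub - 1] / dl[jsub - 1]; break;
    case 5: t *= dl[isub - 1] * dl[jsub - 1]; break;
    default: break;                       /* 0: no grading */
    }
    return t;
}
```

With row pivoting and iwork = {3, 2, 1}, position (1, 3) holds original row 3, so the sketch returns the diagonal value d[2] there.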

Input Parameters

m Number of rows of the matrix.

n Number of columns of the matrix.

i Row of the entry to be returned.

j Column of the entry to be returned.

kl Lower bandwidth.

ku Upper bandwidth.

idist On entry, idist specifies the type of distribution to be used to generate a
random matrix.
For slatm2 and dlatm2:
= 1: uniform (0,1)
= 2: uniform (-1,1)
= 3: normal (0,1)
For clatm2 and zlatm2:
= 1: real and imaginary parts each uniform (0,1)
= 2: real and imaginary parts each uniform (-1,1)
= 3: real and imaginary parts each normal (0,1)
= 4: complex number uniform in disk (0, 1)

iseed Array, size 4.

Seed for the random number generator.

d Array, size (min(i, j)). Diagonal entries of matrix.

igrade Specifies grading of matrix as follows:

= 0: no grading
= 1: matrix premultiplied by diag( dl )
= 2: matrix postmultiplied by diag( dr )
= 3: matrix premultiplied by diag( dl ) and postmultiplied by diag( dr )
= 4: matrix premultiplied by diag( dl ) and postmultiplied by
inv( diag( dl ) )
For slatm2 and dlatm2:
= 5: matrix premultiplied by diag( dl ) and postmultiplied by diag( dl )
For clatm2 and zlatm2:
= 5: matrix premultiplied by diag( dl ) and postmultiplied by
diag( conjg( dl ) )
= 6: matrix premultiplied by diag( dl ) and postmultiplied by diag( dl )

dl Array, size (i or j), as appropriate.

Left scale factors for grading matrix.

dr Array, size (i or j), as appropriate.

Right scale factors for grading matrix.

ipvtng On entry specifies pivoting permutations as follows:

= 0: none
= 1: row pivoting
= 2: column pivoting
= 3: full pivoting, i.e., on both sides

iwork Array, size (i or j), as appropriate. This array specifies the permutation
used. The row (or column) in position k was originally in position iwork[k - 1].
This differs from iwork for ?latm3.

sparse Specifies the sparsity of the matrix. If a sparse matrix is to be generated,
sparse should lie between 0 and 1. A uniform (0, 1) random number x is
generated and compared to sparse. If x is larger, the matrix entry is
unchanged; if x is smaller, the entry is set to zero. Thus, on average,
a fraction sparse of the entries is set to zero.

Output Parameters

iseed On exit, the seed is updated.

Return Values
The function returns an entry of a random matrix (for the complex variants, the libmkl_intel_* interface
libraries return the result in the parameter res).

?latm3
Returns the (isub, jsub) entry of a random matrix.


Syntax
float slatm3 (lapack_int *m, lapack_int *n, lapack_int *i, lapack_int *j, lapack_int
*isub, lapack_int *jsub, lapack_int *kl, lapack_int *ku, lapack_int *idist, lapack_int
*iseed, float *d, lapack_int *igrade, float *dl, float *dr, lapack_int *ipvtng,
lapack_int *iwork, float *sparse);
double dlatm3 (lapack_int *m, lapack_int *n, lapack_int *i, lapack_int *j, lapack_int
*isub, lapack_int *jsub, lapack_int *kl, lapack_int *ku, lapack_int *idist, lapack_int
*iseed, double *d, lapack_int *igrade, double *dl, double *dr, lapack_int *ipvtng,
lapack_int *iwork, double *sparse);
The prototypes of the complex variants depend on whether or not the application links with GNU Fortran
(gfortran) libraries.
For non-gfortran (libmkl_intel_*) interface libraries:
void clatm3 (lapack_complex_float *res, lapack_int *m, lapack_int *n, lapack_int *i,
lapack_int *j, lapack_int *isub, lapack_int *jsub, lapack_int *kl, lapack_int *ku,
lapack_int *idist, lapack_int *iseed, lapack_complex_float *d, lapack_int *igrade,
lapack_complex_float *dl, lapack_complex_float *dr, lapack_int *ipvtng, lapack_int
*iwork, float *sparse);
void zlatm3 (lapack_complex_double *res, lapack_int *m, lapack_int *n, lapack_int *i,
lapack_int *j, lapack_int *isub, lapack_int *jsub, lapack_int *kl, lapack_int *ku,
lapack_int *idist, lapack_int *iseed, lapack_complex_double *d, lapack_int *igrade,
lapack_complex_double *dl, lapack_complex_double *dr, lapack_int *ipvtng, lapack_int
*iwork, double *sparse);
For gfortran (libmkl_gf_*) interface libraries:
lapack_complex_float clatm3 (lapack_int *m, lapack_int *n, lapack_int *i, lapack_int
*j, lapack_int *isub, lapack_int *jsub, lapack_int *kl, lapack_int *ku, lapack_int
*idist, lapack_int *iseed, lapack_complex_float *d, lapack_int *igrade,
lapack_complex_float *dl, lapack_complex_float *dr, lapack_int *ipvtng, lapack_int
*iwork, float *sparse);
lapack_complex_double zlatm3 (lapack_int *m, lapack_int *n, lapack_int *i, lapack_int
*j, lapack_int *isub, lapack_int *jsub, lapack_int *kl, lapack_int *ku, lapack_int
*idist, lapack_int *iseed, lapack_complex_double *d, lapack_int *igrade,
lapack_complex_double *dl, lapack_complex_double *dr, lapack_int *ipvtng, lapack_int
*iwork, double *sparse);
To understand the difference between the non-gfortran and gfortran interfaces and when to use each of
them, see Dynamic Libraries in the lib/intel64 Directory in the oneAPI Math Kernel Library Developer Guide.

Include Files
• mkl.h

Description

The ?latm3 routine returns the (isub, jsub) entry of a random matrix of dimension (m, n) described by the
other parameters. (isub, jsub) is the final position of the (i, j) entry after pivoting according to ipvtng and
iwork. ?latm3 is called by the ?latmr routine in order to build random test matrices. No error checking on
parameters is done, because this routine is called in a tight loop by ?latmr, which has already checked the
parameters.

Use of ?latm3 differs from ?latm2 in the order in which the random number generator is called to fill in
random matrix entries. With ?latm2, the generator is called to fill in the pivoted matrix columnwise.
With ?latm3, the generator is called to fill in the matrix columnwise, after which it is pivoted. Thus, ?latm3
can be used to construct random matrices which differ only in their order of rows and/or columns. ?latm2 is
used to construct band matrices while avoiding calling the random number generator for entries outside the
band (and therefore generating random numbers in different orders for different pivot orders).
The matrix whose (isub, jsub) entry is returned is constructed as follows (this routine only computes one
entry):

• If isub is outside (1..m) or jsub is outside (1..n), returns zero (this is convenient for generating
matrices in band format).
• Generate a matrix A with random entries of distribution idist.
• Set the diagonal to D.
• Grade the matrix, if desired, from the left (by dl) and/or from the right (by dr or dl) as specified by
igrade.
• Permute, if desired, the rows and/or columns as specified by ipvtng and iwork.
• Band the matrix to have lower bandwidth kl and upper bandwidth ku.
• Set random entries to zero as specified by sparse.

Input Parameters

m Number of rows of matrix.

n Number of columns of matrix.

i Row of unpivoted entry to be returned.

j Column of unpivoted entry to be returned.

isub Row of pivoted entry to be returned.

jsub Column of pivoted entry to be returned.

kl Lower bandwidth.

ku Upper bandwidth.

idist On entry, idist specifies the type of distribution to be used to generate a
random matrix.
For slatm3 and dlatm3:
= 1: uniform (0,1)
= 2: uniform (-1,1)
= 3: normal (0,1)
For clatm3 and zlatm3:
= 1: real and imaginary parts each uniform (0,1)
= 2: real and imaginary parts each uniform (-1,1)
= 3: real and imaginary parts each normal (0,1)
= 4: complex number uniform in disk (0, 1)

iseed Array, size 4.

Seed for random number generator.

d Array, size (min(i, j)). Diagonal entries of matrix.


igrade Specifies grading of matrix as follows:

= 0: no grading
= 1: matrix premultiplied by diag( dl )
= 2: matrix postmultiplied by diag( dr )
= 3: matrix premultiplied by diag( dl ) and postmultiplied by diag( dr )
= 4: matrix premultiplied by diag( dl ) and postmultiplied by
inv( diag( dl ) )
For slatm3 and dlatm3:
= 5: matrix premultiplied by diag( dl ) and postmultiplied by diag( dl )
For clatm3 and zlatm3:
= 5: matrix premultiplied by diag( dl ) and postmultiplied by
diag( conjg( dl ) )
= 6: matrix premultiplied by diag( dl ) and postmultiplied by diag( dl )

dl Array, size (i or j, as appropriate).

Left scale factors for grading matrix.

dr Array, size (i or j, as appropriate).

Right scale factors for grading matrix.

ipvtng On entry specifies pivoting permutations as follows:

If ipvtng = 0: none.

If ipvtng = 1: row pivoting.

If ipvtng = 2: column pivoting.

If ipvtng = 3: full pivoting, i.e., on both sides.

sparse On entry, specifies the sparsity of the matrix if a sparse matrix is to be
generated. sparse should lie between 0 and 1. A uniform (0, 1) random
number x is generated and compared to sparse; if x is larger, the matrix
entry is unchanged, and if x is smaller, the entry is set to zero. Thus, on
average, a fraction sparse of the entries is set to zero.

iwork Array, size (i or j, as appropriate). This array specifies the permutation
used. The row (or column) originally in position k is in position iwork[k - 1]
after pivoting. This differs from iwork for ?latm2.
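The iwork conventions of ?latm2 and ?latm3 are inverses of each other: for ?latm2, iwork[k - 1] names the original position of the row (or column) now at k, while for ?latm3 it names the position after pivoting of the row originally at k. A small helper makes the relationship concrete; invert_permutation is a hypothetical name, not part of MKL, and it uses int with 1-based values.

```c
/* Converts between the ?latm2 and ?latm3 iwork conventions (hypothetical
   helper). p[k-1] is the original position of the row now at k; on return,
   q[r-1] is the position after pivoting of the row originally at r. */
void invert_permutation(int n, const int *p, int *q)
{
    for (int k = 1; k <= n; ++k)
        q[p[k - 1] - 1] = k;   /* original row p[k-1] ended up at k */
}
```

Under these conventions, a permutation p given to ?latm2 and its inverse q given to ?latm3 should describe the same pivoting; inverting twice returns the original array.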

Output Parameters

isub On exit, row of pivoted entry is updated.

jsub On exit, column of pivoted entry is updated.

iseed On exit, the seed is updated.

Return Values
The function returns an entry of a random matrix (for the complex variants, the libmkl_intel_* interface
libraries return the result in the parameter res).


?latm5
Generates matrices involved in the Generalized
Sylvester equation.

Syntax
void slatm5 (lapack_int *prtype, lapack_int *m, lapack_int *n, float *a, lapack_int *lda, float *b,
lapack_int *ldb, float *c, lapack_int *ldc, float *d, lapack_int *ldd, float *e,
lapack_int *lde, float *f, lapack_int *ldf, float *r, lapack_int *ldr, float *l,
lapack_int *ldl, float *alpha, lapack_int *qblcka, lapack_int *qblckb);
void dlatm5 (lapack_int *prtype, lapack_int *m, lapack_int *n, double *a, lapack_int *lda, double
*b, lapack_int *ldb, double *c, lapack_int *ldc, double *d, lapack_int *ldd, double *e,
lapack_int *lde, double *f, lapack_int *ldf, double *r, lapack_int *ldr, double *l,
lapack_int *ldl, double *alpha, lapack_int *qblcka, lapack_int *qblckb);
void clatm5 (lapack_int *prtype, lapack_int *m, lapack_int *n, lapack_complex_float *a, lapack_int
*lda, lapack_complex_float *b, lapack_int *ldb, lapack_complex_float *c, lapack_int
*ldc, lapack_complex_float *d, lapack_int *ldd, lapack_complex_float *e, lapack_int
*lde, lapack_complex_float *f, lapack_int *ldf, lapack_complex_float *r, lapack_int
*ldr, lapack_complex_float *l, lapack_int *ldl, float *alpha, lapack_int *qblcka,
lapack_int *qblckb);
void zlatm5 (lapack_int *prtype, lapack_int *m, lapack_int *n, lapack_complex_double *a,
lapack_int *lda, lapack_complex_double *b, lapack_int *ldb, lapack_complex_double *c,
lapack_int *ldc, lapack_complex_double *d, lapack_int *ldd, lapack_complex_double *e,
lapack_int *lde, lapack_complex_double *f, lapack_int *ldf, lapack_complex_double *r,
lapack_int *ldr, lapack_complex_double *l, lapack_int *ldl, double *alpha, lapack_int
*qblcka, lapack_int *qblckb);

Include Files
• mkl.h

Description

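The ?latm5 routine generates matrices involved in the Generalized Sylvester equation:

A * R - L * B = C
D * R - L * E = F
They also satisfy the diagonalization condition:

$$\begin{bmatrix} I & -L \\ 0 & I \end{bmatrix} \left( \begin{bmatrix} A & -C \\ 0 & B \end{bmatrix}, \begin{bmatrix} D & -F \\ 0 & E \end{bmatrix} \right) \begin{bmatrix} I & R \\ 0 & I \end{bmatrix} = \left( \begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix}, \begin{bmatrix} D & 0 \\ 0 & E \end{bmatrix} \right)$$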
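Multiplying out the block product for the (A, B, C) pair shows why the Sylvester equation A * R - L * B = C makes the result block diagonal; the (D, E, F) pair follows in the same way from D * R - L * E = F:

```latex
\begin{bmatrix} I & -L \\ 0 & I \end{bmatrix}
\begin{bmatrix} A & -C \\ 0 & B \end{bmatrix}
\begin{bmatrix} I & R \\ 0 & I \end{bmatrix}
= \begin{bmatrix} I & -L \\ 0 & I \end{bmatrix}
  \begin{bmatrix} A & AR - C \\ 0 & B \end{bmatrix}
= \begin{bmatrix} A & AR - LB - C \\ 0 & B \end{bmatrix}
= \begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix}
```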

Input Parameters

prtype Specifies the type of matrices to generate.

• If prtype = 1, A and B are Jordan blocks, and D and E are identity
matrices.

A:
If i = j, then $A_{i,j} = 1.0$.
If j = i + 1, then $A_{i,j} = -1.0$.
Otherwise, $A_{i,j} = 0.0$, for i, j = 1...m.

B:
If i = j, then $B_{i,j} = 1.0 - \alpha$.
If j = i + 1, then $B_{i,j} = 1.0$.
Otherwise, $B_{i,j} = 0.0$, for i, j = 1...n.

D:
If i = j, then $D_{i,j} = 1.0$.
Otherwise, $D_{i,j} = 0.0$, for i, j = 1...m.

E:
If i = j, then $E_{i,j} = 1.0$.
Otherwise, $E_{i,j} = 0.0$, for i, j = 1...n.

L = R, with entries chosen from [-10...10]; they specify the right-hand sides
(C, F).

• If prtype = 2 or 3, the matrices are triangular and/or quasi-triangular.

A:
If i ≤ j, then $A_{i,j}$ takes a value in [-1...1].
Otherwise, $A_{i,j} = 0.0$, for i, j = 1...m.
If prtype = 3, then in addition $A_{k+1,k+1} = A_{k,k}$, $A_{k+1,k}$ takes a value in [-1...1],
and $\mathrm{sign}(A_{k,k+1}) = -\mathrm{sign}(A_{k+1,k})$, for k running from 1 to m - 1 in steps of qblcka.

B:
If i ≤ j, then $B_{i,j}$ takes a value in [-1...1].
Otherwise, $B_{i,j} = 0.0$, for i, j = 1...n.
If prtype = 3, then in addition $B_{k+1,k+1} = B_{k,k}$, $B_{k+1,k}$ takes a value in [-1...1],
and $\mathrm{sign}(B_{k,k+1}) = -\mathrm{sign}(B_{k+1,k})$, for k running from 1 to n - 1 in steps of qblckb.

D:
If i ≤ j, then $D_{i,j}$ takes a value in [-1...1].
Otherwise, $D_{i,j} = 0.0$, for i, j = 1...m.

E:
If i ≤ j, then $E_{i,j}$ takes a value in [-1...1].
Otherwise, $E_{i,j} = 0.0$, for i, j = 1...n.

L and R are chosen from [-10...10]; they specify the right-hand sides (C, F).

• If prtype = 4, the matrices are full:
$A_{i,j}$ in [-10...10] and $D_{i,j}$ in [-1...1], for i, j = 1...m;
$B_{i,j}$ in [-10...10] and $E_{i,j}$ in [-1...1], for i, j = 1...n;
$R_{i,j}$ in [-10...10] and $L_{i,j}$ in [-1...1], for i = 1...m, j = 1...n.
L and R specify the right-hand sides (C, F).

• If prtype = 5, a special case with common and/or close eigenvalues is generated.

m Specifies the order of A and D and the number of rows in C, F, R and L.

n Specifies the order of B and E and the number of columns in C, F, R and L.

lda The leading dimension of a.

ldb The leading dimension of b.

ldc The leading dimension of c.

ldd The leading dimension of d.

lde The leading dimension of e.

ldf The leading dimension of f.

ldr The leading dimension of r.

ldl The leading dimension of l.

alpha Parameter used in generating prtype = 1 and 5 matrices.

qblcka When prtype = 3, specifies the distance between 2-by-2 blocks on the
diagonal in A. Otherwise, qblcka is not referenced. qblcka > 1.

qblckb When prtype = 3, specifies the distance between 2-by-2 blocks on the
diagonal in B. Otherwise, qblckb is not referenced. qblckb > 1.

Output Parameters

a Array, size lda*m. On exit, a contains the m-by-m array A initialized
according to prtype.

b Array, size ldb*n. On exit, b contains the n-by-n array B initialized
according to prtype.

c Array, size ldc*n. On exit, c contains the m-by-n array C initialized
according to prtype.

d Array, size ldd*m. On exit, d contains the m-by-m array D initialized
according to prtype.

e Array, size lde*n. On exit, e contains the n-by-n array E initialized
according to prtype.

f Array, size ldf*n. On exit, f contains the m-by-n array F initialized
according to prtype.

r Array, size ldr*n. On exit, r contains the m-by-n array R initialized
according to prtype.

l Array, size ldl*n. On exit, l contains the m-by-n array L initialized
according to prtype.

?latm6
Generates test matrices for the generalized eigenvalue
problem, their corresponding right and left
eigenvector matrices, and also reciprocal condition
numbers for all eigenvalues and the reciprocal
condition numbers of eigenvectors corresponding to
the 1st and 5th eigenvalues.

Syntax
void slatm6 (lapack_int *type, lapack_int *n, float *a, lapack_int *lda, float *b, float
*x, lapack_int *ldx, float *y, lapack_int *ldy, float *alpha, float *beta, float *wx,
float *wy, float *s, float *dif);
void dlatm6 (lapack_int *type, lapack_int *n, double *a, lapack_int *lda, double *b,
double *x, lapack_int *ldx, double *y, lapack_int *ldy, double *alpha, double *beta,
double *wx, double *wy, double *s, double *dif);
void clatm6 (lapack_int *type, lapack_int *n, lapack_complex_float *a, lapack_int *lda,
lapack_complex_float *b, lapack_complex_float *x, lapack_int *ldx, lapack_complex_float
*y, lapack_int *ldy, lapack_complex_float *alpha, lapack_complex_float *beta,
lapack_complex_float *wx, lapack_complex_float *wy, float *s, float *dif);
void zlatm6 (lapack_int *type, lapack_int *n, lapack_complex_double *a, lapack_int
*lda, lapack_complex_double *b, lapack_complex_double *x, lapack_int *ldx,
lapack_complex_double *y, lapack_int *ldy, lapack_complex_double *alpha,
lapack_complex_double *beta, lapack_complex_double *wx, lapack_complex_double *wy,
double *s, double *dif);

Include Files
• mkl.h

Description

The ?latm6 routine generates test matrices for the generalized eigenvalue problem, their corresponding right
and left eigenvector matrices, and also reciprocal condition numbers for all eigenvalues and the reciprocal
condition numbers of eigenvectors corresponding to the 1st and 5th eigenvalues.
There are two kinds of test matrix pairs:
(A, B)= inverse(YH) * (Da, Db) * inverse(X)

Type 1:

Type 2:

In both cases the same inverse(YH) and inverse(X) are used to compute (A, B), giving the exact eigenvectors
of (A, B) as (YH, X):

where a, b, x, and y can take values independently of each other.
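Writing YH as $Y^H$, multiplying out the definition (A, B) = inverse(YH) * (Da, Db) * inverse(X) shows why X and YH carry exact eigenvectors. The second part assumes Da and Db are diagonal, so that they commute; $x_i$ denotes column i of X and $y_i^H$ row i of $Y^H$:

```latex
\begin{aligned}
AX &= (Y^H)^{-1} D_a, & BX &= (Y^H)^{-1} D_b, \\
Y^H A &= D_a X^{-1}, & Y^H B &= D_b X^{-1}, \\[4pt]
A X D_b &= (Y^H)^{-1} D_a D_b = (Y^H)^{-1} D_b D_a = B X D_a
  &&\Rightarrow\; (D_b)_{ii}\, A x_i = (D_a)_{ii}\, B x_i, \\
D_b Y^H A &= D_b D_a X^{-1} = D_a D_b X^{-1} = D_a Y^H B
  &&\Rightarrow\; (D_b)_{ii}\, y_i^H A = (D_a)_{ii}\, y_i^H B.
\end{aligned}
```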

Input Parameters

type Specifies the problem type.

n Size of the matrices A and B.

lda The leading dimension of a and of b.

ldx The leading dimension of x.

ldy The leading dimension of y.

alpha, beta Weighting constants for matrix A.

wx Constant for right eigenvector matrix.

wy Constant for left eigenvector matrix.

Output Parameters

a Array, size lda*n. On exit, a contains the n-by-n matrix initialized
according to type.

b Array, size lda*n. On exit, b contains the n-by-n matrix initialized
according to type.

x Array, size ldx*n. On exit, x contains the n-by-n matrix of right
eigenvectors.

y Array, size ldy*n. On exit, y is the n-by-n matrix of left eigenvectors.

s Array, size (n). s[i - 1] is the reciprocal condition number for eigenvalue
i.

dif Array, size (n). dif[i - 1] is the reciprocal condition number for
eigenvector i.

?latme
Generates random non-symmetric square matrices
with specified eigenvalues.

Syntax
void slatme (lapack_int *n, char *dist, lapack_int *iseed, float *d, lapack_int *mode,
float *cond, float *dmax, char *ei, char *rsign, char *upper, char *sim, float *ds,
lapack_int *modes, float *conds, lapack_int *kl, lapack_int *ku, float *anorm, float *a,
lapack_int *lda, float *work, lapack_int *info);
void dlatme (lapack_int *n, char *dist, lapack_int *iseed, double *d, lapack_int *mode,
double *cond, double *dmax, char *ei, char *rsign, char *upper, char *sim, double *ds,
lapack_int *modes, double *conds, lapack_int *kl, lapack_int *ku, double *anorm,
double *a, lapack_int *lda, double *work, lapack_int *info);
void clatme (lapack_int *n, char *dist, lapack_int *iseed, lapack_complex_float *d,
lapack_int *mode, float *cond, lapack_complex_float *dmax, char *ei, char *rsign,
char *upper, char *sim, float *ds, lapack_int *modes, float *conds, lapack_int *kl,
lapack_int *ku, float *anorm, lapack_complex_float *a, lapack_int *lda,
lapack_complex_float *work, lapack_int *info);
void zlatme (lapack_int *n, char *dist, lapack_int *iseed, lapack_complex_double *d,
lapack_int *mode, double *cond, lapack_complex_double *dmax, char *ei, char *rsign,
char *upper, char *sim, double *ds, lapack_int *modes, double *conds, lapack_int *kl,
lapack_int *ku, double *anorm, lapack_complex_double *a, lapack_int *lda,
lapack_complex_double *work, lapack_int *info);

Include Files
• mkl.h

Description

The ?latme routine generates random non-symmetric square matrices with specified eigenvalues. ?latme
operates by applying the following sequence of operations:

1. Set the diagonal to d, where d may be input or computed according to mode, cond, dmax, and rsign as
described below.
2. If upper = 'T', the upper triangle of a is set to random values out of distribution dist.
3. If sim='T', a is multiplied on the left by a random matrix X, whose singular values are specified by ds,
modes, and conds, and on the right by X inverse.
4. If kl < n-1, the lower bandwidth is reduced to kl using Householder transformations. If ku < n-1,
the upper bandwidth is reduced to ku.

5. If anorm is not negative, the matrix is scaled to have maximum-element-norm anorm.

NOTE
Since the matrix cannot be reduced beyond Hessenberg form, no packing options are
available.
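Steps 1 and 3 above depend on one fact: a similarity transform X*D*X^{-1} leaves the eigenvalues of D unchanged. This is why the transform in step 3 preserves the eigenvalues set in step 1. The minimal plain-C sketch below checks the fact for a 2-by-2 case, where the trace and determinant fix both eigenvalues. It does not call MKL. The helper name similarity_2x2 is hypothetical, and it takes a fixed X from the caller instead of the random X that ?latme builds from ds, modes, and conds.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper (not an MKL routine): forms A = X * diag(d0, d1) * inv(X)
   for 2-by-2 matrices stored row by row, m[r*2 + c]. ?latme draws X at
   random from ds, modes, and conds; here the caller supplies X. */
static void similarity_2x2(const double x[4], double d0, double d1,
                           double a[4])
{
    double det = x[0] * x[3] - x[1] * x[2];
    assert(det != 0.0);                       /* X must be invertible */

    /* Explicit inverse of a 2-by-2 matrix. */
    double xi[4] = { x[3] / det, -x[1] / det, -x[2] / det, x[0] / det };

    /* X * diag(d0, d1): scale column 0 by d0 and column 1 by d1. */
    double xd[4] = { x[0] * d0, x[1] * d1, x[2] * d0, x[3] * d1 };

    /* A = (X * D) * inv(X). */
    for (int r = 0; r < 2; ++r)
        for (int c = 0; c < 2; ++c)
            a[r * 2 + c] = xd[r * 2 + 0] * xi[0 * 2 + c]
                         + xd[r * 2 + 1] * xi[1 * 2 + c];
}
```

For example, X = [[2, 1], [1, 1]] and d = (3, 0.5) give A = [[5.5, -5], [2.5, -2]]. This matrix is nonsymmetric, yet its trace is 3.5 and its determinant is 1.5, so its eigenvalues are again 3 and 0.5.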

Input Parameters

n The number of columns (or rows) of A.

dist On entry, dist specifies the type of distribution to be used to generate the
random eigen-/singular values and the entries of the upper triangle (see upper).

If dist = 'U': uniform( 0, 1 )

If dist = 'S': uniform( -1, 1 )

If dist = 'N': normal( 0, 1 )

If dist = 'D': uniform on the complex disc |z| < 1.

iseed Array, size 4.

On entry iseed specifies the seed of the random number generator. The
elements should lie between 0 and 4095 inclusive, and iseed[3] should be
odd. The random number generator uses a linear congruential sequence
limited to small integers, and so should produce machine independent
random numbers.

d Array, size (n). This array is used to specify the eigenvalues of A.

If mode = 0, then d is assumed to contain the eigenvalues. Otherwise they
are computed according to mode, cond, dmax, and rsign and placed in d.

mode On entry mode describes how the eigenvalues are to be specified:

mode = 0 means use d (with ei for slatme and dlatme) as input.

mode = 1 sets d[0] = 1 and d[1:n - 1] = 1.0/cond.
mode = 2 sets d[0:n - 2] = 1 and d[n - 1]=1.0/cond.
mode = 3 sets d[i - 1] = cond**(-(i-1)/(n-1)).
mode = 4 sets d[i - 1] = 1 - (i-1)/(n-1)*(1 - 1/cond).
mode = 5 sets d to random numbers in the range ( 1/cond , 1 ) such
that their logarithms are uniformly distributed.
mode = 6 sets d to random numbers from same distribution as the rest of
the matrix.
mode < 0 has the same meaning as abs(mode), except that the order of
the elements of d is reversed.

Thus, if mode is between 1 and 4, d has entries ranging from 1 to 1/cond; if
mode is between -1 and -4, d has entries ranging from 1/cond to 1.

cond On entry, this is used as described under mode above. If used, it must be ≥ 1.


dmax If mode is not -6, 0, or 6, the contents of d as computed according to mode
and cond are scaled by dmax / max(abs(d[i - 1])). Note that dmax
need not be positive or real: if dmax is negative or complex (or zero), d is
scaled by a negative or complex number (or zero). If rsign = 'F', then
the largest (absolute) eigenvalue is equal to dmax.

ei Used by slatme and dlatme only.

Array, size (n).

If mode = 0 and ei[0] is not ' ' (space character), this array specifies
which elements of d (on input) are real eigenvalues and which are the real
and imaginary parts of a complex conjugate pair of eigenvalues. The
elements of ei may then only have the values 'R' and 'I'.

If ei[j - 1] = 'R' and ei[j] = 'I', then the j-th eigenvalue is
cmplx( d[j - 1] , d[j] ), and the (j+1)-th is the complex conjugate
thereof.

If ei[j - 1] = ei[j] = 'R', then the j-th eigenvalue is d[j - 1] (that is,
real). ei[0] may not be 'I', nor may two adjacent elements of ei both
have the value 'I'.

If mode is not 0, then ei is ignored. If mode is 0 and ei[0] = ' ', then
the eigenvalues will all be real.
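The sketch below makes the d/ei encoding concrete. It is plain C99 with <complex.h>, and decode_eigenvalues is a hypothetical name, not an MKL function. For the real flavors with mode = 0, it expands d and ei into the list of eigenvalues. It also rejects ei strings that break the rules stated above.

```c
#include <complex.h>

/* Hypothetical decoder for the mode = 0 encoding used by slatme/dlatme.
   Writes n eigenvalues to lambda and returns 0, or returns -1 if ei breaks
   the rules: only 'R'/'I', ei[0] != 'I', every 'I' directly after an 'R'. */
static int decode_eigenvalues(int n, const double *d, const char *ei,
                              double complex *lambda)
{
    if (n > 0 && ei[0] == ' ') {             /* all eigenvalues real */
        for (int j = 0; j < n; ++j)
            lambda[j] = d[j];
        return 0;
    }
    for (int j = 0; j < n; ++j) {
        if (ei[j] != 'R')
            return -1;                       /* stray 'I' or illegal character */
        if (j + 1 < n && ei[j + 1] == 'I') {
            /* complex pair: d[j] + i*d[j+1] and its conjugate */
            lambda[j]     = d[j] + d[j + 1] * I;
            lambda[j + 1] = d[j] - d[j + 1] * I;
            ++j;                             /* skip the 'I' entry */
        } else {
            lambda[j] = d[j];                /* real eigenvalue */
        }
    }
    return 0;
}
```

For d = {2, 3, -1} and ei = "RIR", the eigenvalues are 2 + 3i, 2 - 3i, and -1.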

rsign If mode is not 0, 6, or -6, and rsign = 'T', then the elements of d, as
computed according to mode and cond, are multiplied by a random sign (+1
or -1) for slatme and dlatme or by a complex number from the unit circle
|z| = 1 for clatme and zlatme.

If rsign = 'F', the elements of d are not multiplied. rsign may only have
the values 'T' or 'F'.

upper If upper = 'T', then the elements of a above the diagonal will be set to
random numbers out of dist.

If upper = 'F', they will not. upper may only have the values 'T' or 'F'.

sim If sim = 'T', then a is operated on by a "similarity transform", that is,
multiplied on the left by a matrix X and on the right by X inverse. X = U*S*V,
where U and V are random unitary matrices and S is a (diagonal) matrix of
singular values specified by ds, modes, and conds.

If sim = 'F', then a will not be transformed.

ds Array, size (n). This array is used to specify the singular values of X, in the
same way that d specifies the eigenvalues of a. If modes = 0, ds contains the
singular values, which may not be zero.

modes Similar to mode, but for specifying the diagonal of S. modes = -6 and +6
are not allowed (since they would result in randomly ill-conditioned
eigenvalues).

conds Similar to cond, but for specifying the diagonal of S.

kl This specifies the lower bandwidth of the matrix. kl = 1 specifies upper
Hessenberg form. If kl is at least n-1, then A will have full lower
bandwidth.

ku This specifies the upper bandwidth of the matrix. ku = 1 specifies lower
Hessenberg form. If ku is at least n-1, then a will have full upper bandwidth.

If kl and ku are both at least n-1, then a will be dense. Only one of ku and
kl may be less than n-1.

anorm If anorm is not negative, then a is scaled by a non-negative real number so
that the maximum-element-norm of a equals anorm.

lda The leading dimension of the array a; lda must be at least n.

work Array, size (3*n). Workspace.

Output Parameters

iseed On exit, the seed is updated.

d Modified if mode is nonzero.

ds Modified if modes is nonzero.

a Array, size lda*n. On exit, a is the desired test matrix.

info If info = 0, execution is successful.

If info = -1, n is negative.

If info = -2, dist is an illegal string.

If info = -5, mode is not in range -6 to 6.

If info = -6, cond is less than 1.0, and mode is not -6, 0, or 6.

If info = -9, rsign is not 'T' or 'F'.

If info = -10, upper is not 'T' or 'F'.

If info = -11, sim is not 'T' or 'F'.

If info = -12, modes = 0 and ds has a zero singular value.

If info = -13, modes is not in the range -5 to 5.

If info = -14, modes is nonzero and conds is less than 1.

If info = -15, kl is less than 1.

If info = -16, ku is less than 1, or kl and ku are both less than n-1.

If info = -19, lda is less than n.

If info = 1, error return from ?latm1 (computing d).

If info = 2, cannot scale to dmax (max. eigenvalue is 0).

If info = 3, error return from slatm1 (for slatme and clatme) or dlatm1
(for dlatme and zlatme).

If info = 4, error return from ?large.


If info = 5, zero singular value from slatm1 (for slatme and clatme) or
dlatm1 (for dlatme and zlatme).

?latmr
Generates random matrices of various types.

Syntax
void slatmr (lapack_int *m, lapack_int *n, char *dist, lapack_int *iseed, char *sym,
float *d, lapack_int *mode, float *cond, float *dmax, char *rsign, char *grade, float
*dl, lapack_int *model, float *condl, float *dr, lapack_int *moder, float *condr, char
*pivtng, lapack_int *ipivot, lapack_int *kl, lapack_int *ku, float *sparse, float
*anorm, char *pack, float *a, lapack_int *lda, lapack_int *iwork, lapack_int *info);
void dlatmr (lapack_int *m, lapack_int *n, char *dist, lapack_int *iseed, char *sym,
double *d, lapack_int *mode, double *cond, double *dmax, char *rsign, char *grade,
double *dl, lapack_int *model, double *condl, double *dr, lapack_int *moder, double
*condr, char *pivtng, lapack_int *ipivot, lapack_int *kl, lapack_int *ku, double
*sparse, double *anorm, char *pack, double *a, lapack_int *lda, lapack_int *iwork,
lapack_int *info);
void clatmr (lapack_int *m, lapack_int *n, char *dist, lapack_int *iseed, char *sym,
lapack_complex_float *d, lapack_int *mode, float *cond, lapack_complex_float *dmax,
char *rsign, char *grade, lapack_complex_float *dl, lapack_int *model, float *condl,
lapack_complex_float *dr, lapack_int *moder, float *condr, char *pivtng,
lapack_int *ipivot, lapack_int *kl, lapack_int *ku, float *sparse, float *anorm,
char *pack, lapack_complex_float *a, lapack_int *lda, lapack_int *iwork,
lapack_int *info);
void zlatmr (lapack_int *m, lapack_int *n, char *dist, lapack_int *iseed, char *sym,
lapack_complex_double *d, lapack_int *mode, double *cond, lapack_complex_double *dmax,
char *rsign, char *grade, lapack_complex_double *dl, lapack_int *model, double *condl,
lapack_complex_double *dr, lapack_int *moder, double *condr, char *pivtng,
lapack_int *ipivot, lapack_int *kl, lapack_int *ku, double *sparse, double *anorm,
char *pack, lapack_complex_double *a, lapack_int *lda, lapack_int *iwork,
lapack_int *info);

Description

The ?latmr routine operates by applying the following sequence of operations:

1. Generate a matrix A with random entries of distribution dist. If sym = 'S', the matrix is
symmetric; if sym = 'H', the matrix is Hermitian; if sym = 'N', the matrix is nonsymmetric.

2. Set the diagonal to D, where D may be input or computed according to mode, cond, dmax and rsign as
described below.
3. Grade the matrix, if desired, from the left or right as specified by grade. The inputs dl, model, condl,
dr, moder and condr also determine the grading as described below.
4. Permute, if desired, the rows and/or columns as specified by pivtng and ipivot.
5. Set random entries to zero, if desired, to get a random sparse matrix as specified by sparse.
6. Make A a band matrix, if desired, by zeroing out the matrix outside a band of lower bandwidth kl and
upper bandwidth ku.
7. Scale A, if desired, to have maximum entry anorm.

8. Pack the matrix if desired. See options specified by the pack parameter.

NOTE
If two calls to ?latmr differ only in the pack parameter, they generate mathematically equivalent
matrices. If two calls to ?latmr both have full bandwidth (kl = m-1 and ku = n-1), and differ only in
the pivtng and pack parameters, then the matrices generated differ only in the order of the rows and
columns, and otherwise contain the same data. This consistency cannot be and is not maintained with
less than full bandwidth.

Input Parameters

m Number of rows of A.

n Number of columns of A.

dist On entry, dist specifies the type of distribution to be used to generate a
random matrix.
If dist = 'U', real and imaginary parts are independent uniform( 0, 1 ).

If dist = 'S', real and imaginary parts are independent uniform( -1, 1 ).

If dist = 'N', real and imaginary parts are independent normal( 0, 1 ).

If dist = 'D', distribution is uniform on interior of unit disk.

iseed Array, size 4.

On entry, iseed specifies the seed of the random number generator. The elements
should lie between 0 and 4095 inclusive, and iseed[3] should be odd. The
random number generator uses a linear congruential sequence limited to
small integers, and so should produce machine independent random
numbers.

sym If sym = 'S', generated matrix is symmetric.

If sym = 'H', generated matrix is Hermitian.

If sym = 'N', generated matrix is nonsymmetric.

d On entry, this array specifies the diagonal entries of the diagonal of A. d
may either be specified on entry, or set according to mode and cond as
described below. If the matrix is Hermitian, the real part of d is taken. May
be changed on exit if mode is nonzero.

mode On entry describes how d is to be used:

mode = 0 means use d as input.

mode = 1 sets d[0]=1 and d[1:n - 1]=1.0/cond.
mode = 2 sets d[0:n - 2]=1 and d[n - 1]=1.0/cond.
mode = 3 sets d[i - 1]=cond**(-(i-1)/(n-1)).
mode = 4 sets d[i - 1]=1 - (i-1)/(n-1)*(1 - 1/cond).
mode = 5 sets d to random numbers in the range ( 1/cond , 1 ) such
that their logarithms are uniformly distributed.


mode = 6 sets d to random numbers from the same distribution as the rest of
the matrix.
mode < 0 has the same meaning as abs(mode), except that the order of
the elements of d is reversed.

Thus, if mode is between 1 and 4, d has entries ranging from 1 to 1/cond; if
mode is between -1 and -4, d has entries ranging from 1/cond to 1.

cond On entry, used as described under mode above. If used, cond must be ≥ 1.

dmax If mode is not -6, 0, or 6, the diagonal is scaled by dmax /
max(abs(d[i])), so that the maximum absolute entry of the diagonal is
abs(dmax). If dmax is complex (or zero), the diagonal is scaled by a
complex number (or zero).

rsign If mode is not -6, 0, or 6, specifies the sign of the diagonal as follows:

For slatmr and dlatmr, if rsign = 'T', diagonal entries are multiplied by 1
or -1, each with probability 0.5.
For clatmr and zlatmr, if rsign = 'T', diagonal entries are multiplied by
a random complex number uniformly distributed with absolute value 1.
If rsign = 'F', diagonal entries are unchanged.

grade Specifies grading of matrix as follows:

If grade = 'N', there is no grading.

If grade = 'L', matrix is premultiplied by diag( dl ) (only if matrix is
nonsymmetric).

If grade = 'R', matrix is postmultiplied by diag( dr ) (only if matrix is
nonsymmetric).

If grade = 'B', matrix is premultiplied by diag( dl ) and postmultiplied by
diag( dr ) (only if matrix is nonsymmetric).

If grade = 'H', matrix is premultiplied by diag( dl ) and postmultiplied by
diag( conjg(dl) ) (only if matrix is Hermitian or nonsymmetric).

If grade = 'S', matrix is premultiplied by diag( dl ) and postmultiplied by
diag( dl ) (only if matrix is symmetric or nonsymmetric).

If grade = 'E', matrix is premultiplied by diag( dl ) and postmultiplied by
inv( diag( dl ) ) (only if matrix is nonsymmetric).

NOTE
If grade = 'E', then m must equal n.
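The grading options can be sketched in plain C for a real, column-major matrix. This is an illustration only. The helper apply_grade is hypothetical and covers just grade = 'N', 'L', 'R', 'B', and 'E'. It also skips the symmetry restrictions that ?latmr enforces.

```c
#include <assert.h>

/* Hypothetical helper, not an MKL routine: applies the real-flavor grading
   options grade = 'N', 'L', 'R', 'B', 'E' to an m-by-n column-major matrix
   (element (i,j) at a[i + j*lda]). Returns 0 on success and -1 for an
   unsupported grade, a non-square matrix with 'E', or a zero in dl with 'E'.
   The options 'H' and 'S' are not sketched. */
static int apply_grade(char grade, int m, int n, double *a, int lda,
                       const double *dl, const double *dr)
{
    assert(lda >= m);
    if (grade != 'N' && grade != 'L' && grade != 'R' &&
        grade != 'B' && grade != 'E')
        return -1;
    if (grade == 'E') {
        if (m != n)
            return -1;                       /* 'E' needs m == n             */
        for (int i = 0; i < m; ++i)
            if (dl[i] == 0.0)
                return -1;                   /* inv(diag(dl)) must exist     */
    }
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            double s = 1.0;
            if (grade == 'L' || grade == 'B' || grade == 'E')
                s *= dl[i];                  /* premultiply by diag(dl)      */
            if (grade == 'R' || grade == 'B')
                s *= dr[j];                  /* postmultiply by diag(dr)     */
            if (grade == 'E')
                s /= dl[j];                  /* postmultiply by inv(diag(dl)) */
            a[i + j * lda] *= s;
        }
    return 0;
}
```

grade = 'E' is a similarity transform, so for a square matrix it leaves the diagonal (and the eigenvalues) unchanged. For example, A = [[1, 3], [2, 4]] with dl = (2, 4) becomes [[1, 1.5], [4, 4]].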

dl Array, size (m).

If model = 0, then on entry this array specifies the diagonal entries of a
diagonal matrix used as described under grade above.
If model is not zero, then dl is set according to model and condl,
analogous to the way d is set according to mode and cond (except there is
no dmax parameter for dl).

If grade = 'E', then dl cannot have zero entries.

Not referenced if grade = 'N' or 'R'. Changed on exit.

model This specifies how the diagonal array dl is computed, just as mode specifies
how D is computed.

condl When model is not zero, this specifies the condition number of the
computed dl.

dr Array, size (n).

If moder = 0, then on entry this array specifies the diagonal entries of a
diagonal matrix used as described under grade above.

If moder is not zero, then dr is set according to moder and condr,
analogous to the way d is set according to mode and cond (except there is
no dmax parameter for dr).

Not referenced if grade = 'N', 'L', 'H', 'S', or 'E'.

moder This specifies how the diagonal array dr is to be computed, just as mode
specifies how d is to be computed.

condr When moder is not zero, this specifies the condition number of the
computed dr.

pivtng On entry specifies pivoting permutations as follows:

If pivtng = 'N' or ' ': no pivoting permutation.

If pivtng = 'L': left or row pivoting (matrix must be nonsymmetric).

If pivtng = 'R': right or column pivoting (matrix must be nonsymmetric).

If pivtng = 'B' or 'F': both or full pivoting, i.e., on both sides. In this
case, m must equal n.

If two calls to ?latmr both have full bandwidth (kl = m - 1 and ku =
n - 1), and differ only in the pivtng and pack parameters, then the matrices
generated differ only in the order of the rows and columns, and otherwise
contain the same data. This consistency cannot be maintained with less
than full bandwidth.

ipivot Array, size (n or m). This array specifies the permutation used. After the
basic matrix is generated, the rows, columns, or both are permuted.
If row pivoting is selected, ?latmr starts with the last row and interchanges
row m and row ipivot[m - 1], then moves to the next-to-last row,
interchanging row m - 1 and row ipivot[m - 2], and so on. In terms
of "2-cycles", the permutation is (1 ipivot[0]) (2 ipivot[1]) ...
(m ipivot[m - 1]), where the rightmost cycle is applied first. This is the
inverse of the effect of pivoting in LINPACK. The idea is that factoring (with
pivoting) an identity matrix which has been inverse-pivoted in this way
should result in a pivot vector identical to ipivot. Not referenced if pivtng
= 'N'.
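The row-pivoting rule for ipivot can be written as a short loop. This plain-C sketch assumes a column-major real matrix. It also assumes 1-based row numbers in ipivot, matching the 2-cycle notation above. The name apply_row_pivots is hypothetical, and plain int stands in for lapack_int.

```c
#include <assert.h>

/* Hypothetical sketch of the row-pivoting rule: starting with the last row,
   row i (1-based) is interchanged with row ipivot[i - 1], then row i - 1
   with row ipivot[i - 2], and so on down to row 1. a is column-major,
   m-by-n, element (i,j) at a[i + j*lda] (0-based indices in memory). */
static void apply_row_pivots(int m, int n, double *a, int lda,
                             const int *ipivot)
{
    assert(lda >= m);
    for (int i = m; i >= 1; --i) {
        int p = ipivot[i - 1];
        assert(p >= 1 && p <= m);            /* compare info = -19 */
        if (p == i)
            continue;
        for (int j = 0; j < n; ++j) {        /* swap rows i and p */
            double t = a[(i - 1) + j * lda];
            a[(i - 1) + j * lda] = a[(p - 1) + j * lda];
            a[(p - 1) + j * lda] = t;
        }
    }
}
```

Take m = 3 and ipivot = {2, 3, 3}, with rows holding 10, 20, 30. Row 3 stays put, then rows 2 and 3 are swapped, then rows 1 and 2. The rows end up holding 30, 10, 20.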

sparse On entry, specifies the sparsity of the matrix if a sparse matrix is to be
generated. sparse should lie between 0 and 1. To generate a sparse
matrix, for each matrix entry a uniform( 0, 1 ) random number x is
generated and compared to sparse; if x is larger, the matrix entry is
unchanged, and if x is smaller, the entry is set to zero. Thus, on average,
a fraction sparse of the entries is set to zero.
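The sparsification rule can be sketched as follows. For illustration only, the random numbers come from a simple 64-bit linear congruential generator. This is an assumption: it is not the generator driven by iseed, so it does not reproduce ?latmr output. The helper sparsify is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative generator only (an assumption, not the iseed-driven LAPACK
   generator): a 64-bit linear congruential step with Knuth's MMIX
   constants, returning a double in [0, 1) built from the top 53 bits. */
static uint64_t lcg_state = 12345u;

static double uniform01(void)
{
    lcg_state = lcg_state * 6364136223846793005ULL + 1442695040888963407ULL;
    return (double)(lcg_state >> 11) * (1.0 / 9007199254740992.0); /* 2^-53 */
}

/* Hypothetical helper: for each entry of the column-major m-by-n matrix a,
   draw x and set the entry to zero when x < sparse; otherwise leave it.
   Returns the number of entries that were zeroed. */
static int sparsify(int m, int n, double *a, int lda, double sparse)
{
    assert(lda >= m && sparse >= 0.0 && sparse <= 1.0);
    int zeroed = 0;
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i)
            if (uniform01() < sparse) {
                a[i + j * lda] = 0.0;
                ++zeroed;
            }
    return zeroed;
}
```

With sparse = 0, no entry is zeroed. With sparse = 1, every entry is zeroed, since the draws lie in [0, 1). Intermediate values zero roughly that fraction of the entries.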


kl On entry, specifies the lower bandwidth of the matrix. For example, kl = 0
implies upper triangular, kl = 1 implies upper Hessenberg, and kl at least
m-1 implies the matrix is not banded. Must equal ku if matrix is symmetric
or Hermitian.

ku On entry, specifies the upper bandwidth of the matrix. For example, ku = 0
implies lower triangular, ku = 1 implies lower Hessenberg, and ku at least
n-1 implies the matrix is not banded. Must equal kl if matrix is symmetric
or Hermitian.
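In terms of indices, kl and ku keep entry (i, j) only when i - j ≤ kl and j - i ≤ ku. A minimal plain-C sketch of this banding step for a column-major real matrix follows. zero_outside_band is a hypothetical helper, not an MKL routine.

```c
#include <assert.h>

/* Hypothetical sketch of the banding step: entry (i,j) (0-based, stored at
   a[i + j*lda]) is kept only when i - j <= kl and j - i <= ku; every other
   entry of the m-by-n column-major matrix is set to zero. */
static void zero_outside_band(int m, int n, double *a, int lda,
                              int kl, int ku)
{
    assert(lda >= m && kl >= 0 && ku >= 0);
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i)
            if (i - j > kl || j - i > ku)
                a[i + j * lda] = 0.0;
}
```

For example, kl = 1 and ku = 0 on a 3-by-3 matrix leave a lower bidiagonal matrix.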

anorm On entry, specifies the maximum entry of the output matrix (the output matrix is
multiplied by a constant so that its largest absolute entry equals anorm) if
anorm is nonnegative. If anorm is negative, no scaling is done.

pack On entry, specifies packing of matrix as follows:

If pack = 'N': no packing.

If pack = 'U': zero out all subdiagonal entries (if symmetric or Hermitian).

If pack = 'L': zero out all superdiagonal entries (if symmetric or
Hermitian).

If pack = 'C': store the upper triangle columnwise (only if matrix is
symmetric or Hermitian or square upper triangular).

If pack = 'R': store the lower triangle columnwise (only if matrix is
symmetric or Hermitian or square lower triangular); this is the same as the
upper half rowwise if symmetric, and the same as the conjugate upper half
rowwise if Hermitian.

If pack = 'B': store the lower triangle in band storage scheme (only if
matrix is symmetric or Hermitian).

If pack = 'Q': store the upper triangle in band storage scheme (only if
matrix is symmetric or Hermitian).

If pack = 'Z': store the entire matrix in band storage scheme (pivoting
can be provided for by using this option to store A in the trailing rows of the
allocated storage).

Using these options, the various LAPACK packed and banded storage
schemes can be obtained:

LAPACK storage scheme    Value of pack
GB                       'Z'
PB, HB, or TB            'B' or 'Q'
PP, HP, or TP            'C' or 'R'

If two calls to ?latmr differ only in the pack parameter, they generate
mathematically equivalent matrices.
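For pack = 'C', the upper triangle is stored column by column. This matches the LAPACK packed (PP, HP, TP) layout with uplo = 'U'. With 0-based indices, entry (i, j) with i ≤ j lands at position i + j*(j+1)/2. For clarity, the plain-C sketch below writes the packed values to a separate array ap, whereas ?latmr stores them in a itself. pack_upper_columnwise is hypothetical.

```c
#include <assert.h>

/* Hypothetical sketch of pack = 'C': copy the upper triangle of an n-by-n
   column-major matrix (element (i,j) at a[i + j*lda]) column by column into
   ap, which must hold n*(n+1)/2 entries. Entry (i,j), i <= j, goes to
   ap[i + j*(j+1)/2]. */
static void pack_upper_columnwise(int n, const double *a, int lda,
                                  double *ap)
{
    assert(lda >= n);
    for (int j = 0; j < n; ++j)
        for (int i = 0; i <= j; ++i)
            ap[i + j * (j + 1) / 2] = a[i + j * lda];
}
```

Consider the 3-by-3 column-major matrix with columns (1, 2, 3), (4, 5, 6), and (7, 8, 9). For this matrix, ap becomes {1, 4, 5, 7, 8, 9}.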

lda On entry, lda specifies the first dimension of a as declared in the calling
program.
If pack = 'N', 'U' or 'L', lda must be at least max( 1, m ).

If pack = 'C' or 'R', lda must be at least 1.

If pack = 'B', or 'Q', lda must be min( ku + 1, n ).

If pack = 'Z', lda must be at least kuu + kll + 1, where kuu =
min( ku, n-1 ) and kll = min( kl, n-1 ).

iwork Array, size (n or m). Workspace. Not referenced if pivtng = 'N'. Changed
on exit.

Output Parameters

iseed On exit, the seed is changed.

d May be changed on exit if mode is nonzero.

dl On exit, array is changed.

dr On exit, array is changed.

a On exit, a is the desired test matrix. Only those entries of a which are
significant on output are referenced (even if a is in packed or band
storage format). The unoccupied corners of a in band format are
zeroed out.

info If info = 0, the execution is successful.

If info = -1, m is negative, or m is unequal to n and sym = 'S' or 'H'.

If info = -2, n is negative.

If info = -3, dist is an illegal string.

If info = -5, sym is an illegal string.

If info = -7, mode is not in range -6 to 6.

If info = -8, cond is less than 1.0, and mode is neither -6, 0 nor 6.

If info = -10, mode is neither -6, 0 nor 6 and rsign is an illegal
string.

If info = -11, grade is an illegal string, or grade = 'E' and m is
not equal to n, or grade = 'L', 'R', 'B', 'S' or 'E' and sym =
'H', or grade = 'L', 'R', 'B', 'H' or 'E' and sym = 'S'.

If info = -12, grade = 'E' and dl contains zero.

If info = -13, model is not in range -6 to 6 and grade = 'L',
'B', 'H', 'S' or 'E'.

If info = -14, condl is less than 1.0, grade = 'L', 'B', 'H',
'S' or 'E', and model is neither -6, 0 nor 6.

If info = -16, moder is not in range -6 to 6 and grade = 'R' or
'B'.

If info = -17, condr is less than 1.0, grade = 'R' or 'B', and
moder is neither -6, 0 nor 6.

If info = -18, pivtng is an illegal string, or pivtng = 'B' or 'F'
and m is not equal to n, or pivtng = 'L' or 'R' and sym = 'S' or
'H'.

If info = -19, ipivot contains an out-of-range number and pivtng is
not equal to 'N'.


If info = -20, kl is negative.

If info = -21, ku is negative, or sym = 'S' or 'H' and ku is not
equal to kl.

If info = -22, sparse is not in range 0 to 1.

If info = -24, pack is an illegal string, or pack = 'U', 'L', 'B'
or 'Q' and sym = 'N', or pack = 'C' and sym = 'N' and either
kl is not equal to 0 or n is not equal to m, or pack = 'R' and sym =
'N', and either ku is not equal to 0 or n is not equal to m.

If info = -26, lda is too small.

If info = 1, error return from ?latm1 (computing d).

If info = 2, cannot scale to dmax (max. entry is 0).

If info = 3, error return from ?latm1 (computing dl).

If info = 4, error return from ?latm1 (computing dr).

If info = 5, anorm is positive, but the matrix constructed prior to
attempting to scale it to have norm anorm is zero.

?lauum
Computes the product U*U^T (U*U^H) or L^T*L (L^H*L),
where U and L are upper or lower triangular matrices
(blocked algorithm).

Syntax
lapack_int LAPACKE_slauum (int matrix_layout , char uplo , lapack_int n , float * a ,
lapack_int lda );
lapack_int LAPACKE_dlauum (int matrix_layout , char uplo , lapack_int n , double * a ,
lapack_int lda );
lapack_int LAPACKE_clauum (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * a , lapack_int lda );
lapack_int LAPACKE_zlauum (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * a , lapack_int lda );

Include Files
• mkl.h

Description
The routine ?lauum computes the product U*U^T or L^T*L for real flavors, and U*U^H or L^H*L for complex
flavors. Here the triangular factor U or L is stored in the upper or lower triangular part of the array a.
If uplo = 'U' or 'u', then the upper triangle of the result is stored, overwriting the factor U in A.

If uplo = 'L' or 'l', then the lower triangle of the result is stored, overwriting the factor L in A.

This is the blocked form of the algorithm, calling BLAS Level 3 Routines.
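The result of ?lauum can be checked against a simple unblocked loop. The plain-C sketch below handles only the real uplo = 'U' case on column-major storage, and lauum_upper_ref is a hypothetical name. It is not the blocked, BLAS Level 3 algorithm the routine uses, only a reference for the result.

```c
#include <assert.h>

/* Unblocked reference sketch (hypothetical helper) of what ?lauum computes
   for uplo = 'U', real case: the upper triangle of U*U^T overwrites U in a
   column-major array (element (i,j) at a[i + j*lda]). The strictly lower
   part of a is not touched. */
static void lauum_upper_ref(int n, double *a, int lda)
{
    assert(lda >= n);
    /* Rows in ascending order, columns ascending within a row: entry (i,j),
       i <= j, needs U(i,k) and U(j,k) only for k >= j, and none of those
       has been overwritten yet, so the update can be done in place. */
    for (int i = 0; i < n; ++i)
        for (int j = i; j < n; ++j) {
            double s = 0.0;
            for (int k = j; k < n; ++k)
                s += a[i + k * lda] * a[j + k * lda];
            a[i + j * lda] = s;
        }
}
```

For U = [[1, 2], [0, 3]], the upper triangle of U*U^T is 5, 6, 9, and the strictly lower part of a is left untouched.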

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or
column major (LAPACK_COL_MAJOR).
uplo Specifies whether the triangular factor stored in the array a is upper or
lower triangular:
= 'U': Upper triangular

= 'L': Lower triangular

n The order of the triangular factor U or L. n ≥ 0.

a Array of size max(1, lda*n).

On entry, the triangular factor U or L.

lda The leading dimension of the array a. lda ≥ max(1, n).

Output Parameters

a On exit,
if uplo = 'U', then the upper triangle of a is overwritten with the upper
triangle of the product U*U^T (U*U^H);

if uplo = 'L', then the lower triangle of a is overwritten with the lower
triangle of the product L^T*L (L^H*L).

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info = -k, the k-th parameter had an illegal value.

If info = -1011, memory allocation error occurred.

?syswapr
Applies an elementary permutation on the rows and
columns of a symmetric matrix.

Syntax
lapack_int LAPACKE_ssyswapr (int matrix_layout , char uplo , lapack_int n , float * a ,
lapack_int i1 , lapack_int i2 );
lapack_int LAPACKE_dsyswapr (int matrix_layout , char uplo , lapack_int n , double *
a , lapack_int i1 , lapack_int i2 );
lapack_int LAPACKE_csyswapr (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * a , lapack_int i1 , lapack_int i2 );
lapack_int LAPACKE_zsyswapr (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * a , lapack_int i1 , lapack_int i2 );

Include Files
• mkl.h

Description
The routine applies an elementary permutation on the rows and columns of a symmetric matrix.
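The elementary permutation swaps rows i1 and i2 and also columns i1 and i2. For a symmetric matrix, this is P*A*P^T with P an interchange matrix. The plain-C sketch below shows the operation on a symmetric matrix held in full column-major storage. It assumes 0-based indices, and sym_swap_full is hypothetical. Unlike ?syswapr, which works on the triangle selected by uplo, the sketch updates both triangles.

```c
#include <assert.h>

/* Hypothetical sketch of P*A*P^T for a symmetric n-by-n matrix held in FULL
   column-major storage (element (i,j) at a[i + j*lda]): swap rows i1 and i2,
   then columns i1 and i2 (0-based indices). */
static void sym_swap_full(int n, double *a, int lda, int i1, int i2)
{
    assert(lda >= n && i1 >= 0 && i1 < n && i2 >= 0 && i2 < n);
    for (int j = 0; j < n; ++j) {            /* swap rows i1 and i2 */
        double t = a[i1 + j * lda];
        a[i1 + j * lda] = a[i2 + j * lda];
        a[i2 + j * lda] = t;
    }
    for (int i = 0; i < n; ++i) {            /* swap columns i1 and i2 */
        double t = a[i + i1 * lda];
        a[i + i1 * lda] = a[i + i2 * lda];
        a[i + i2 * lda] = t;
    }
}
```

Swapping indices 0 and 2 of [[1, 2, 3], [2, 4, 5], [3, 5, 6]] gives [[6, 5, 3], [5, 4, 2], [3, 2, 1]], which is still symmetric.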

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or

column major ( LAPACK_COL_MAJOR ).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:

If uplo = 'U', the array a stores the upper triangular factor U of the
factorization A = U*D*U^T.

If uplo = 'L', the array a stores the lower triangular factor L of the
factorization A = L*D*L^T.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

a Array of size at least max(1,lda*n).

The array a contains the block diagonal matrix D and the multipliers used to
obtain the factor U or L as computed by ?sytrf.

i1 Index of the first row to swap.

i2 Index of the second row to swap.

Output Parameters

a If info = 0, the symmetric inverse of the original matrix.

If uplo = 'U', the upper triangular part of the inverse is formed and the part of
A below the diagonal is not referenced.
If uplo = 'L', the lower triangular part of the inverse is formed and the part of
A above the diagonal is not referenced.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = -1011, memory allocation error occurred.

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or

column major ( LAPACK_COL_MAJOR ).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:

If uplo = 'U', the array a stores the upper triangular factor U of the
factorization A = U*D*U^H.

If uplo = 'L', the array a stores the lower triangular factor L of the
factorization A = L*D*L^H.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

a Array of size at least max(1,lda*n).

The array a contains the block diagonal matrix D and the multipliers used to
obtain the factor U or L as computed by ?hetrf.

i1 Index of the first row to swap.

i2 Index of the second row to swap.

Output Parameters

a If info = 0, the inverse of the original matrix.

If uplo = 'U', the upper triangular part of the inverse is formed and the part of
A below the diagonal is not referenced.
If uplo = 'L', the lower triangular part of the inverse is formed and the part of
A above the diagonal is not referenced.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = -1011, memory allocation error occurred.


matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)

or column major ( LAPACK_COL_MAJOR ).

transr if transr = 'N' or 'n', the normal form of RFP C is stored;

if transr= 'T' or 't', the transpose form of RFP C is stored.

uplo Specifies whether the upper or lower triangular part of the array c is used.

If uplo = 'U' or 'u', then the upper triangular part of the array c is used.

If uplo = 'L' or 'l', then the lower triangular part of the array c is used.

trans Specifies the operation:

if trans = 'N' or 'n', then C := alpha*A*A^T + beta*C;

if trans = 'T' or 't', then C := alpha*A^T*A + beta*C.

n Specifies the order of the matrix C. The value of n must be at least zero.

k On entry with trans = 'N' or 'n', k specifies the number of columns of
the matrix A, and on entry with trans = 'T' or 't', k specifies the
number of rows of the matrix A.
The value of k must be at least zero.

alpha Specifies the scalar alpha.

a Array, size max(1,lda*ka), where ka is in the following table:

              Col_major   Row_major
trans = 'N'   k           n
trans = 'T'   n           k

Before entry with trans = 'N' or 'n', the leading n-by-k part of the array
a must contain the matrix A, otherwise the leading k-by-n part of the array
a must contain the matrix A.

lda Specifies the leading dimension of a as declared in the calling
(sub)program. lda is defined by the following table:

              Col_major   Row_major
trans = 'N'   max(1,n)    max(1,k)
trans = 'T'   max(1,k)    max(1,n)

beta Specifies the scalar beta.

c Array, size (n*(n+1)/2). Before entry, contains the symmetric matrix C in
RFP format.

Output Parameters

c If trans = 'N' or 'n', then c contains C := alpha*A*A^T + beta*C;

if trans = 'T' or 't', then c contains C := alpha*A^T*A + beta*C.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = -1011, memory allocation error occurred.
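The arithmetic of this rank-k update can be checked with a full-storage reference loop. The plain-C sketch below keeps C as an ordinary n-by-n column-major array instead of RFP, and syrk_full_ref is a hypothetical name. It shows only the operation selected by trans, not the RFP storage layout.

```c
#include <assert.h>

/* Full-storage reference (hypothetical helper, not RFP):
     trans = 'N':  C := alpha*A*A^T + beta*C,  A is n-by-k
     trans = 'T':  C := alpha*A^T*A + beta*C,  A is k-by-n
   All matrices are column-major; C is a plain n-by-n array. */
static void syrk_full_ref(char trans, int n, int k, double alpha,
                          const double *a, int lda, double beta,
                          double *c, int ldc)
{
    assert(trans == 'N' || trans == 'T');
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i) {
            double s = 0.0;
            for (int p = 0; p < k; ++p)
                s += (trans == 'N')
                   ? a[i + p * lda] * a[j + p * lda]    /* (A*A^T)(i,j) */
                   : a[p + i * lda] * a[p + j * lda];   /* (A^T*A)(i,j) */
            c[i + j * ldc] = alpha * s + beta * c[i + j * ldc];
        }
}
```

Take n = 2, k = 1, A = (1, 2)^T, alpha = 2, beta = 1, and C = I. Then trans = 'N' gives C = [[3, 4], [4, 9]].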

?hfrk
Performs a Hermitian rank-k operation for matrix in
RFP format.

Syntax
lapack_int LAPACKE_chfrk( int matrix_layout, char transr, char uplo, char trans,
lapack_int n, lapack_int k, float alpha, const lapack_complex_float* a, lapack_int lda,
float beta, lapack_complex_float* c );


lapack_int LAPACKE_zhfrk( int matrix_layout, char transr, char uplo, char trans,
lapack_int n, lapack_int k, double alpha, const lapack_complex_double* a, lapack_int
lda, double beta, lapack_complex_double* c );

Include Files
• mkl.h

Description

The ?hfrk routines perform a matrix-matrix operation using Hermitian matrices. The operation is defined as

C := alpha*A*A^H + beta*C,
or

C := alpha*A^H*A + beta*C,
where:
alpha and beta are real scalars,
C is an n-by-n Hermitian matrix in RFP format,
A is an n-by-k matrix in the first case and a k-by-n matrix in the second case.
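A full-storage reference for the Hermitian variant makes the role of the conjugate transpose explicit. It is written in C99 with <complex.h>. As with the other sketches, herk_full_ref is hypothetical, and it keeps C in ordinary column-major storage rather than RFP.

```c
#include <assert.h>
#include <complex.h>

/* Full-storage reference (hypothetical helper, not RFP):
     trans = 'N':  C := alpha*A*A^H + beta*C,  A is n-by-k
     trans = 'C':  C := alpha*A^H*A + beta*C,  A is k-by-n
   alpha and beta are real; all matrices are column-major. */
static void herk_full_ref(char trans, int n, int k, double alpha,
                          const double complex *a, int lda, double beta,
                          double complex *c, int ldc)
{
    assert(trans == 'N' || trans == 'C');
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i) {
            double complex s = 0.0;
            for (int p = 0; p < k; ++p)
                s += (trans == 'N')
                   ? a[i + p * lda] * conj(a[j + p * lda])    /* (A*A^H)(i,j) */
                   : conj(a[p + i * lda]) * a[p + j * lda];   /* (A^H*A)(i,j) */
            c[i + j * ldc] = alpha * s + beta * c[i + j * ldc];
        }
}
```

For k = 1, n = 2, trans = 'C', and A = (1 + i, 2), the result is [[2, 2 - 2i], [2 + 2i, 4]]. Its diagonal is real, as expected for a Hermitian matrix.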

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)

or column major ( LAPACK_COL_MAJOR ).

transr if transr = 'N' or 'n', the normal form of RFP C is stored;

if transr = 'C' or 'c', the conjugate-transpose form of RFP C is stored.

uplo Specifies whether the upper or lower triangular part of the array c is used.

If uplo = 'U' or 'u', then the upper triangular part of the array c is used.

If uplo = 'L' or 'l', then the lower triangular part of the array c is used.

trans Specifies the operation:

if trans = 'N' or 'n', then C := alpha*A*A^H + beta*C;

if trans = 'C' or 'c', then C := alpha*A^H*A + beta*C.

n Specifies the order of the matrix C. The value of n must be at least zero.

k On entry with trans = 'N' or 'n', k specifies the number of columns of
the matrix a, and on entry with trans = 'T' or 't' or 'C' or 'c', k
specifies the number of rows of the matrix a.
The value of k must be at least zero.

alpha Specifies the scalar alpha.

a Array, size max(1,lda*ka), where ka is in the following table:

              Col_major   Row_major
trans = 'N'   k           n
trans = 'T'   n           k

Before entry with trans = 'N' or 'n', the leading n-by-k part of the array
a must contain the matrix A, otherwise the leading k-by-n part of the array
a must contain the matrix A.

lda Specifies the leading dimension of a as declared in the calling
(sub)program. lda is defined by the following table:

              Col_major   Row_major
trans = 'N'   max(1,n)    max(1,k)
trans = 'T'   max(1,k)    max(1,n)

beta Specifies the scalar beta.

c Array, size (n*(n+1)/2). Before entry, contains the Hermitian matrix C in
RFP format.

Output Parameters

c If trans = 'N' or 'n', then c contains C := alpha*A*A^H + beta*C;

if trans = 'C' or 'c', then c contains C := alpha*A^H*A + beta*C.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = -1011, memory allocation error occurred.

?tfsm
Solves a matrix equation (one operand is a triangular
matrix in RFP format).

Syntax
lapack_int LAPACKE_stfsm (int matrix_layout , char transr , char side , char uplo ,
char trans , char diag , lapack_int m , lapack_int n , float alpha , const float * a ,
float * b , lapack_int ldb );
lapack_int LAPACKE_dtfsm (int matrix_layout , char transr , char side , char uplo ,
char trans , char diag , lapack_int m , lapack_int n , double alpha , const double * a ,
double * b , lapack_int ldb );
lapack_int LAPACKE_ctfsm (int matrix_layout , char transr , char side , char uplo ,
char trans , char diag , lapack_int m , lapack_int n , lapack_complex_float alpha ,
const lapack_complex_float * a , lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_ztfsm (int matrix_layout , char transr , char side , char uplo ,
char trans , char diag , lapack_int m , lapack_int n , lapack_complex_double alpha ,
const lapack_complex_double * a , lapack_complex_double * b , lapack_int ldb );


Include Files
• mkl.h

Description

The ?tfsm routines solve one of the following matrix equations:

op(A)*X = alpha*B,
or

X*op(A) = alpha*B,
where:
alpha is a scalar,
X and B are m-by-n matrices,
A is a unit, or non-unit, upper or lower triangular matrix in rectangular full packed (RFP) format.
op(A) can be one of the following:
• op(A) = A or op(A) = A^T for real flavors
• op(A) = A or op(A) = A^H for complex flavors
The matrix B is overwritten by the solution matrix X.
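For intuition about what is being solved, the sketch below is a naive reference for one option combination (side = 'L', uplo = 'U', trans = 'N', diag = 'N'). It assumes A is held in ordinary column-major full storage rather than RFP, so it mirrors the arithmetic of ?tfsm, not its interface; trsm_lunn_ref is a hypothetical name.

```c
#include <assert.h>

/* Hypothetical reference for A*X = alpha*B with A upper triangular,
 * non-unit diagonal, no transpose. Assumption: A is m-by-m in
 * column-major full storage (not RFP). B (m-by-n, leading dimension ldb)
 * is overwritten by X, as ?tfsm does. */
static void trsm_lunn_ref(int m, int n, double alpha, const double *a,
                          int lda, double *b, int ldb)
{
    assert(lda >= m && ldb >= m);
    for (int j = 0; j < n; ++j) {
        double *x = b + j * ldb;          /* column j of B, becomes column j of X */
        for (int i = 0; i < m; ++i)
            x[i] *= alpha;
        /* Back substitution: solve from the last row upwards. */
        for (int i = m - 1; i >= 0; --i) {
            for (int p = i + 1; p < m; ++p)
                x[i] -= a[i + p * lda] * x[p];
            x[i] /= a[i + i * lda];
        }
    }
}
```

For A = [2 1; 0 4], B = [5; 8] and alpha = 1, back substitution gives X = [1.5; 2], which overwrites B.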

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

transr if transr = 'N' or 'n', the normal form of RFP A is stored;

if transr = 'T' or 't', the transpose form of RFP A is stored;

if transr = 'C' or 'c', the conjugate-transpose form of RFP A is stored.

side Specifies whether op(A) appears on the left or right of X in the equation:

if side = 'L' or 'l', then op(A)*X = alpha*B;

if side = 'R' or 'r', then X*op(A) = alpha*B.

uplo Specifies whether the RFP matrix A is upper or lower triangular:

if uplo = 'U' or 'u', then the matrix is upper triangular;

if uplo = 'L' or 'l', then the matrix is lower triangular.

trans Specifies the form of op(A) used in the matrix equation:

if trans = 'N' or 'n', then op(A) = A;

if trans = 'T' or 't', then op(A) = A^T;

if trans = 'C' or 'c', then op(A) = A^H.

diag Specifies whether the RFP matrix A is unit triangular:

if diag = 'U' or 'u', then the matrix is unit triangular;

if diag = 'N' or 'n', then the matrix is not unit triangular.

m Specifies the number of rows of B. The value of m must be at least zero.

n Specifies the number of columns of B. The value of n must be at least zero.

alpha Specifies the scalar alpha.

When alpha is zero, then a is not referenced and b need not be set before
entry.

a Array, size (n*(n+1)/2). Contains the matrix A in RFP format.

b Array, size max(1, ldb*n) for column major and max(1, ldb*m) for row
major.
Before entry, the leading m-by-n part of the array b must contain the
right-hand side matrix B.

ldb Specifies the leading dimension of b as declared in the calling
(sub)program. The value of ldb must be at least max(1, m) for column
major and max(1, n) for row major.

Output Parameters

b Overwritten by the solution matrix X.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = -1011, memory allocation error occurred.

?tfttp
Copies a triangular matrix from the rectangular full
packed format (TF) to the standard packed format
(TP).

Syntax
lapack_int LAPACKE_stfttp (int matrix_layout , char transr , char uplo , lapack_int n ,
const float * arf , float * ap );
lapack_int LAPACKE_dtfttp (int matrix_layout , char transr , char uplo , lapack_int n ,
const double * arf , double * ap );
lapack_int LAPACKE_ctfttp (int matrix_layout , char transr , char uplo , lapack_int n ,
const lapack_complex_float * arf , lapack_complex_float * ap );
lapack_int LAPACKE_ztfttp (int matrix_layout , char transr , char uplo , lapack_int n ,
const lapack_complex_double * arf , lapack_complex_double * ap );

Include Files
• mkl.h

Description


The routine copies a triangular matrix A from the Rectangular Full Packed (RFP) format to the standard
packed format. For the description of the RFP format, see Matrix Storage Schemes.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

transr = 'N': arf is in the Normal format,

= 'T': arf is in the Transpose format (for stfttp and dtfttp),

= 'C': arf is in the Conjugate-transpose format (for ctfttp and ztfttp).

uplo Specifies whether A is upper or lower triangular:

= 'U': A is upper triangular,
= 'L': A is lower triangular.

n The order of the matrix A. n≥ 0.

arf Array, size at least max (1, n*(n+1)/2).

On entry, the upper or lower triangular matrix A stored in the RFP format.

Output Parameters

ap Array, size at least max (1, n*(n+1)/2).

On exit, the upper or lower triangular matrix A, packed columnwise in a
linear array.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = -1011, memory allocation error occurred.

?tfttr
Copies a triangular matrix from the rectangular full
packed format (TF) to the standard full format (TR).

Syntax
lapack_int LAPACKE_stfttr (int matrix_layout , char transr , char uplo , lapack_int n ,
const float * arf , float * a , lapack_int lda );
lapack_int LAPACKE_dtfttr (int matrix_layout , char transr , char uplo , lapack_int n ,
const double * arf , double * a , lapack_int lda );
lapack_int LAPACKE_ctfttr (int matrix_layout , char transr , char uplo , lapack_int n ,
const lapack_complex_float * arf , lapack_complex_float * a , lapack_int lda );
lapack_int LAPACKE_ztfttr (int matrix_layout , char transr , char uplo , lapack_int n ,
const lapack_complex_double * arf , lapack_complex_double * a , lapack_int lda );

Include Files
• mkl.h

Description

The routine copies a triangular matrix A from the Rectangular Full Packed (RFP) format to the standard full
format. For the description of the RFP format, see Matrix Storage Schemes.
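To make the RFP layout concrete, the sketch below handles a single case only: n even, transr = 'N', uplo = 'U', column-major storage. The mapping is an assumption taken from the reference LAPACK description of RFP. The RFP array is viewed as an (n+1)-by-(n/2) matrix: the last n/2 columns of the upper triangle occupy its top part unchanged, and the first n/2 columns are stored transposed in the triangle below them. The helper names are hypothetical; ?tfttr itself covers all cases.

```c
#include <assert.h>

/* Hypothetical helper: 0-based position in the RFP array of the upper
 * triangular element A(i,j), i <= j, for n even, transr = 'N',
 * uplo = 'U' (assumed layout, column-major, RFP leading dimension n+1). */
static int rfp_index_nu_even(int n, int i, int j)
{
    int k = n / 2;
    int ld = n + 1;                     /* leading dimension of the RFP array */
    assert(n % 2 == 0 && 0 <= i && i <= j && j < n);
    if (j >= k)                         /* last k columns of A: stored as-is */
        return i + (j - k) * ld;
    return (k + 1 + j) + i * ld;        /* first k columns: stored transposed */
}

/* Hypothetical RFP -> full conversion for the same single case. Only the
 * upper triangle of a (column-major, lda >= n) is written. */
static void tfttr_nu_even(int n, const double *arf, double *a, int lda)
{
    assert(lda >= n);
    for (int j = 0; j < n; ++j)
        for (int i = 0; i <= j; ++i)
            a[i + j * lda] = arf[rfp_index_nu_even(n, i, j)];
}
```

For n = 6, this layout puts A(0,3) at arf[0] and A(0,0) at arf[4], and the 21 upper triangular elements fill the 7-by-3 RFP array exactly once each.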

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

transr = 'N': arf is in the Normal format,

= 'T': arf is in the Transpose format (for stfttr and dtfttr),

= 'C': arf is in the Conjugate-transpose format (for ctfttr and ztfttr).

uplo Specifies whether A is upper or lower triangular:

= 'U': A is upper triangular,
= 'L': A is lower triangular.

n The order of the matrices arf and a. n≥ 0.

arf Array, size at least max (1, n*(n+1)/2).

On entry, the upper or lower triangular matrix A stored in the RFP format.

lda The leading dimension of the array a. lda ≥ max(1,n).

Output Parameters

a Array, size max(1,lda *n).

On exit, the triangular matrix A. If uplo = 'U', the leading n-by-n upper
triangular part of the array a contains the upper triangular matrix, and the
strictly lower triangular part of a is not referenced. If uplo = 'L', the leading
n-by-n lower triangular part of the array a contains the lower triangular
matrix, and the strictly upper triangular part of a is not referenced.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = -1011, memory allocation error occurred.


?tpqrt2
Computes a QR factorization of a real or complex
"triangular-pentagonal" matrix, which is composed of
a triangular block and a pentagonal block, using the
compact WY representation for Q.

Syntax
lapack_int LAPACKE_stpqrt2 (int matrix_layout, lapack_int m, lapack_int n, lapack_int
l, float * a, lapack_int lda, float * b, lapack_int ldb, float * t, lapack_int ldt);
lapack_int LAPACKE_dtpqrt2 (int matrix_layout, lapack_int m, lapack_int n, lapack_int
l, double * a, lapack_int lda, double * b, lapack_int ldb, double * t, lapack_int ldt);
lapack_int LAPACKE_ctpqrt2 (int matrix_layout, lapack_int m, lapack_int n, lapack_int
l, lapack_complex_float * a, lapack_int lda, lapack_complex_float * b, lapack_int ldb,
lapack_complex_float * t, lapack_int ldt );
lapack_int LAPACKE_ztpqrt2 (int matrix_layout, lapack_int m, lapack_int n, lapack_int
l, lapack_complex_double * a, lapack_int lda, lapack_complex_double * b, lapack_int
ldb, lapack_complex_double * t, lapack_int ldt );

Include Files
• mkl.h

Description

The input matrix C is an (n+m)-by-n matrix

$$C = \begin{bmatrix} A \\ B \end{bmatrix},$$

where A is an n-by-n upper triangular matrix, and B is an m-by-n pentagonal matrix consisting of an
(m-l)-by-n rectangular matrix B1 on top of an l-by-n upper trapezoidal matrix B2:

$$B = \begin{bmatrix} B_1 \\ B_2 \end{bmatrix}.$$

The upper trapezoidal matrix B2 consists of the first l rows of an n-by-n upper triangular matrix, where
0 ≤ l ≤ min(m,n). If l = 0, B is an m-by-n rectangular matrix. If m = l = n, B is upper triangular. The
matrix W contains the elementary reflectors H(i) in the i-th column below the diagonal (of A) in the
(n+m)-by-n input matrix C, so that W can be represented as

$$W = \begin{bmatrix} I \\ V \end{bmatrix},$$

where I is the n-by-n identity matrix and V is m-by-n. Thus, V contains all of the information needed
for W, and is returned in array b.

NOTE
V has the same form as B:

$$V = \begin{bmatrix} V_1 \\ V_2 \end{bmatrix},$$

where V1 is (m-l)-by-n rectangular and V2 is l-by-n upper trapezoidal.

The columns of V represent the vectors which define the H(i)s.

The (m+n)-by-(m+n) block reflector H is then given by

H = I - W*T*W^T for real flavors, and

H = I - W*T*W^H for complex flavors,

where W^T is the transpose of W, W^H is the conjugate transpose of W, and T is the upper triangular factor of
the block reflector.
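When B is assembled by hand, it can help to confirm that it really has the pentagonal form described above before the call. The sketch below assumes real data and column-major storage; is_pentagonal is a hypothetical helper, not part of MKL. It tests only the structural zeros of B2, that is B2(p, j) = 0 for j < p, since B1 and A are unconstrained.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical check of the pentagonal form: B is m-by-n, column-major,
 * leading dimension ldb. The last l rows (B2) must be the first l rows
 * of an n-by-n upper triangular matrix, so row p of B2 (row m-l+p of B)
 * has zeros in columns 0..p-1. */
static bool is_pentagonal(int m, int n, int l, const double *b, int ldb)
{
    assert(0 <= l && l <= m && l <= n && ldb >= m);
    for (int p = 0; p < l; ++p)
        for (int j = 0; j < p; ++j)
            if (b[(m - l + p) + j * ldb] != 0.0)
                return false;
    return true;
}
```

With l = 0 every m-by-n matrix passes, which matches the rectangular case described above.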

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

m The total number of rows in the matrix B (m ≥ 0).

n The number of columns in B and the order of the triangular matrix A
(n ≥ 0).

l The number of rows of the upper trapezoidal part of B (min(m, n) ≥ l ≥ 0).

a, b Arrays: a, size max(1, lda *n) contains the n-by-n upper triangular matrix
A.
b, size max(1,ldb* n) for column major and max(1,ldb*m) for row major,
the pentagonal m-by-n matrix B. The first (m-l) rows contain the
rectangular B1 matrix, and the next l rows contain the upper trapezoidal
B2 matrix.

lda The leading dimension of a; at least max(1, n).

ldb The leading dimension of b; at least max(1, m) for column major and
max(1,n) for row major.


ldt The leading dimension of t; at least max(1, n).

Output Parameters

a The elements on and above the diagonal of the array contain the upper
triangular matrix R.

b The pentagonal matrix V.

t Array, size max(1, ldt *n).

The n-by-n upper triangular factor T of the block reflector.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = -1011, memory allocation error occurred.

?tprfb
Applies a real or complex "triangular-pentagonal"
blocked reflector to a real or complex matrix, which is
composed of two blocks.

Syntax
lapack_int LAPACKE_stprfb (int matrix_layout, char side, char trans, char direct, char
storev, lapack_int m, lapack_int n, lapack_int k, lapack_int l, const float * v,
lapack_int ldv, const float * t, lapack_int ldt, float * a, lapack_int lda, float * b,
lapack_int ldb);
lapack_int LAPACKE_dtprfb (int matrix_layout, char side, char trans, char direct, char
storev, lapack_int m, lapack_int n, lapack_int k, lapack_int l, const double * v,
lapack_int ldv, const double * t, lapack_int ldt, double * a, lapack_int lda, double *
b, lapack_int ldb);
lapack_int LAPACKE_ctprfb (int matrix_layout, char side, char trans, char direct, char
storev, lapack_int m, lapack_int n, lapack_int k, lapack_int l, const
lapack_complex_float * v, lapack_int ldv, const lapack_complex_float * t, lapack_int
ldt, lapack_complex_float * a, lapack_int lda, lapack_complex_float * b, lapack_int
ldb);
lapack_int LAPACKE_ztprfb (int matrix_layout, char side, char trans, char direct, char
storev, lapack_int m, lapack_int n, lapack_int k, lapack_int l, const
lapack_complex_double * v, lapack_int ldv, const lapack_complex_double * t, lapack_int
ldt, lapack_complex_double * a, lapack_int lda, lapack_complex_double * b, lapack_int
ldb);

Include Files
• mkl.h

Description

The ?tprfb routine applies a real or complex "triangular-pentagonal" block reflector H, H^T, or H^H from either
the left or the right to a real or complex matrix C, which is composed of two blocks A and B.
The block B is m-by-n. If side = 'R', A is m-by-k, and if side = 'L', A is of size k-by-n.

The pentagonal matrix V is composed of a rectangular block V1 and a trapezoidal block V2. The size of the
trapezoidal block is determined by the parameter l, where 0 ≤ l ≤ k. If l = k, the V2 block of V is
triangular; if l = 0, there is no trapezoidal block, thus V = V1 is rectangular.

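The form of V2 depends on direct and storev:

• direct = 'F', storev = 'C': V2 is upper trapezoidal (the first l rows of a k-by-k upper triangular matrix);
• direct = 'B', storev = 'C': V2 is lower trapezoidal (the last l rows of a k-by-k lower triangular matrix);
• direct = 'F', storev = 'R': V2 is lower trapezoidal (the first l columns of a k-by-k lower triangular matrix);
• direct = 'B', storev = 'R': V2 is upper trapezoidal (the last l columns of a k-by-k upper triangular matrix).

The dimensions of V and V2 depend on side and storev:

• side = 'L', storev = 'C': V is m-by-k and V2 is l-by-k;
• side = 'R', storev = 'C': V is n-by-k and V2 is l-by-k;
• side = 'L', storev = 'R': V is k-by-m and V2 is k-by-l;
• side = 'R', storev = 'R': V is k-by-n and V2 is k-by-l.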
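The effect of a block reflector is easiest to see in the smallest case. The sketch below assumes side = 'L', trans = 'N', direct = 'F', storev = 'C', k = 1 and l = 0, with real data in column-major storage. Then W = [1; v], T is the scalar tau, and H*C = C - W*(tau*(W^T*C)) updates A (1-by-n, lda = 1 assumed) and B (m-by-n) together. tprfb_k1_left is a hypothetical name; ?tprfb itself handles general k and l.

```c
#include <assert.h>

/* Hypothetical k = 1 case of a triangular-pentagonal reflector applied
 * from the left: C = [A; B], W = [1; v], H = I - tau*W*W^T.
 * A is 1-by-n (stored contiguously, lda = 1), B is m-by-n (ldb >= m). */
static void tprfb_k1_left(int m, int n, const double *v, double tau,
                          double *a, double *b, int ldb)
{
    assert(ldb >= m);
    for (int j = 0; j < n; ++j) {
        /* w = W^T * C(:,j) = A(0,j) + v^T * B(:,j) */
        double w = a[j];
        for (int i = 0; i < m; ++i)
            w += v[i] * b[i + j * ldb];
        /* C(:,j) := C(:,j) - tau * W * w, split into the A and B blocks */
        a[j] -= tau * w;
        for (int i = 0; i < m; ++i)
            b[i + j * ldb] -= tau * v[i] * w;
    }
}
```

With tau = 2/(1 + v^T*v), H is symmetric and orthogonal, so applying it twice restores the original blocks. For v = [1] and tau = 1, the column [3; 5] is mapped to [-5; -3] and back.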


Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

side = 'L': apply H, H^T, or H^H from the left,

= 'R': apply H, H^T, or H^H from the right.

trans = 'N': apply H (no transpose),

= 'T': apply H^T (transpose),
= 'C': apply H^H (conjugate transpose).

direct Indicates how H is formed from a product of elementary reflectors:

= 'F': H = H(1)*H(2)*...*H(k) (Forward),

= 'B': H = H(k)*...*H(2)*H(1) (Backward).

storev Indicates how the vectors that define the elementary reflectors are stored:
= 'C': Columns,
= 'R': Rows.

m The total number of rows in the matrix B (m ≥ 0).

n The number of columns in B (n ≥ 0).

k The order of the matrix T, which is the number of elementary reflectors
whose product defines the block reflector (k ≥ 0).

l The order of the trapezoidal part of V. (k ≥ l ≥ 0).

v An array containing the pentagonal matrix V (the elementary reflectors
H(1), H(2), …, H(k)). The size limitations depend on storev and side:

• column major: max(1,ldv*k) if storev = 'C'; max(1,ldv*m) if storev = 'R' and side = 'L'; max(1,ldv*n) if storev = 'R' and side = 'R';
• row major: max(1,ldv*m) if storev = 'C' and side = 'L'; max(1,ldv*n) if storev = 'C' and side = 'R'; max(1,ldv*k) if storev = 'R'.

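ldv The leading dimension of the array v. It should satisfy the following
conditions:

• column major: ldv ≥ max(1,m) if storev = 'C' and side = 'L'; ldv ≥ max(1,n) if storev = 'C' and side = 'R'; ldv ≥ max(1,k) if storev = 'R';
• row major: ldv ≥ max(1,k) if storev = 'C'; ldv ≥ max(1,m) if storev = 'R' and side = 'L'; ldv ≥ max(1,n) if storev = 'R' and side = 'R'.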

t Array, size max(1,ldt*k). The triangular k-by-k matrix T in the
representation of the block reflector.

ldt The leading dimension of the array t (ldt ≥ k).

a Array containing the k-by-n matrix A if side = 'L', or the m-by-k matrix A
if side = 'R'. Its size should satisfy the following conditions:

• column major: max(1,lda*n) if side = 'L'; max(1,lda*k) if side = 'R';
• row major: max(1,lda*k) if side = 'L'; max(1,lda*m) if side = 'R'.

lda The leading dimension of the array a. It should satisfy the following conditions:

• column major: lda ≥ max(1,k) if side = 'L'; lda ≥ max(1,m) if side = 'R';
• row major: lda ≥ max(1,n) if side = 'L'; lda ≥ max(1,k) if side = 'R'.

b Array size at least max(1, ldb *n) for column major layout and max(1, ldb
*m) for row major layout, the m-by-n matrix B.

ldb The leading dimension of the array b (ldb ≥ max(1, m) for column major
layout and ldb ≥ max(1, n) for row major layout).

Output Parameters

a Contains the corresponding block of H*C, H^T*C, H^H*C, C*H, C*H^T, or C*H^H.

b Contains the corresponding block of H*C, H^T*C, H^H*C, C*H, C*H^T, or C*H^H.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = -1011, memory allocation error occurred.

?tpttf
Copies a triangular matrix from the standard packed
format (TP) to the rectangular full packed format (TF).

Syntax
lapack_int LAPACKE_stpttf (int matrix_layout , char transr , char uplo , lapack_int n ,
const float * ap , float * arf );
lapack_int LAPACKE_dtpttf (int matrix_layout , char transr , char uplo , lapack_int n ,
const double * ap , double * arf );
lapack_int LAPACKE_ctpttf (int matrix_layout , char transr , char uplo , lapack_int n ,
const lapack_complex_float * ap , lapack_complex_float * arf );


lapack_int LAPACKE_ztpttf (int matrix_layout , char transr , char uplo , lapack_int n ,
const lapack_complex_double * ap , lapack_complex_double * arf );

Include Files
• mkl.h

Description

The routine copies a triangular matrix A from the standard packed format to the Rectangular Full Packed
(RFP) format. For the description of the RFP format, see Matrix Storage Schemes.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

transr = 'N': arf must be in the Normal format,

= 'T': arf must be in the Transpose format (for stpttf and dtpttf),

= 'C': arf must be in the Conjugate-transpose format (for ctpttf and ztpttf).

uplo Specifies whether A is upper or lower triangular:

= 'U': A is upper triangular,
= 'L': A is lower triangular.

n The order of the matrix A. n≥ 0.

ap Array, size at least max (1, n*(n+1)/2).

On entry, the upper or lower triangular matrix A, packed in a linear array.

See Matrix Storage Schemes for more information.

Output Parameters

arf Array, size at least max (1, n*(n+1)/2).

On exit, the upper or lower triangular matrix A stored in the RFP format.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = -1011, memory allocation error occurred.

?tpttr
Copies a triangular matrix from the standard packed
format (TP) to the standard full format (TR).

Syntax
lapack_int LAPACKE_stpttr (int matrix_layout , char uplo , lapack_int n , const float *
ap , float * a , lapack_int lda );
lapack_int LAPACKE_dtpttr (int matrix_layout , char uplo , lapack_int n , const double
* ap , double * a , lapack_int lda );
lapack_int LAPACKE_ctpttr (int matrix_layout , char uplo , lapack_int n , const
lapack_complex_float * ap , lapack_complex_float * a , lapack_int lda );
lapack_int LAPACKE_ztpttr (int matrix_layout , char uplo , lapack_int n , const
lapack_complex_double * ap , lapack_complex_double * a , lapack_int lda );

Include Files
• mkl.h

Description

The routine copies a triangular matrix A from the standard packed format to the standard full format.
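The standard packed layout can be written as an index formula. The sketch below assumes column-major packing (LAPACK_COL_MAJOR) and 0-based indices. For uplo = 'U', A(i,j) with i ≤ j is at ap[i + j*(j+1)/2]; for uplo = 'L', A(i,j) with i ≥ j is at ap[i + (2n-j-1)*j/2]. tpttr_ref is a hypothetical name illustrating what ?tpttr computes, not its implementation.

```c
#include <assert.h>

/* Hypothetical packed -> full conversion, column-major, 0-based.
 * Only the triangle selected by uplo is written into a (lda >= n). */
static void tpttr_ref(char uplo, int n, const double *ap, double *a, int lda)
{
    assert(lda >= n);
    for (int j = 0; j < n; ++j) {
        if (uplo == 'U' || uplo == 'u') {
            /* column j of the upper triangle starts at j*(j+1)/2 */
            for (int i = 0; i <= j; ++i)
                a[i + j * lda] = ap[i + j * (j + 1) / 2];
        } else {
            /* column j of the lower triangle starts at j*n - j*(j-1)/2 */
            for (int i = j; i < n; ++i)
                a[i + j * lda] = ap[i + (2 * n - j - 1) * j / 2];
        }
    }
}
```

For n = 3 and ap = {0, 1, 2, 3, 4, 5}, uplo = 'U' gives A(1,1) = 2, while uplo = 'L' gives A(1,1) = 3: the same array means different matrices depending on uplo.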

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

uplo Specifies whether A is upper or lower triangular:

= 'U': A is upper triangular,
= 'L': A is lower triangular.

n The order of the matrices ap and a. n≥ 0.

ap Array, size at least max(1, n*(n+1)/2). On entry, the upper or lower triangular matrix A in packed format (see Matrix Storage Schemes).

lda The leading dimension of the array a. lda ≥ max(1,n).

Output Parameters

a Array, size max(1,lda*n).

On exit, the triangular matrix A. If uplo = 'U', the leading n-by-n upper
triangular part of the array a contains the upper triangular part of the
matrix A, and the strictly lower triangular part of a is not referenced. If
uplo = 'L', the leading n-by-n lower triangular part of the array a contains
the lower triangular part of the matrix A, and the strictly upper triangular
part of a is not referenced.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = -1011, memory allocation error occurred.


?trttf
Copies a triangular matrix from the standard full
format (TR) to the rectangular full packed format (TF).

Syntax
lapack_int LAPACKE_strttf (int matrix_layout , char transr , char uplo , lapack_int n ,
const float * a , lapack_int lda , float * arf );
lapack_int LAPACKE_dtrttf (int matrix_layout , char transr , char uplo , lapack_int n ,
const double * a , lapack_int lda , double * arf );
lapack_int LAPACKE_ctrttf (int matrix_layout , char transr , char uplo , lapack_int n ,
const lapack_complex_float * a , lapack_int lda , lapack_complex_float * arf );
lapack_int LAPACKE_ztrttf (int matrix_layout , char transr , char uplo , lapack_int n ,
const lapack_complex_double * a , lapack_int lda , lapack_complex_double * arf );

Include Files
• mkl.h

Description

The routine copies a triangular matrix A from the standard full format to the Rectangular Full Packed (RFP)
format. For the description of the RFP format, see Matrix Storage Schemes.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

transr = 'N': arf must be in the Normal format,

= 'T': arf must be in the Transpose format (for strttf and dtrttf),

= 'C': arf must be in the Conjugate-transpose format (for ctrttf and ztrttf).

uplo Specifies whether A is upper or lower triangular:

= 'U': A is upper triangular,
= 'L': A is lower triangular.

n The order of the matrix A. n≥ 0.

a Array, size max(1,(lda*n)).

On entry, the triangular matrix A. If uplo = 'U', the leading n-by-n upper
triangular part of the array a contains the upper triangular matrix, and the
strictly lower triangular part of a is not referenced. If uplo = 'L', the leading
n-by-n lower triangular part of the array a contains the lower triangular
matrix, and the strictly upper triangular part of a is not referenced.

lda The leading dimension of the array a. lda ≥ max(1,n).

Output Parameters

arf Array, size at least max (1, n*(n+1)/2).

On exit, the upper or lower triangular matrix A stored in the RFP format.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = -1011, memory allocation error occurred.

?trttp
Copies a triangular matrix from the standard full
format (TR) to the standard packed format (TP).

Syntax
lapack_int LAPACKE_strttp (int matrix_layout , char uplo , lapack_int n , const float *
a , lapack_int lda , float * ap );
lapack_int LAPACKE_dtrttp (int matrix_layout , char uplo , lapack_int n , const double
* a , lapack_int lda , double * ap );
lapack_int LAPACKE_ctrttp (int matrix_layout , char uplo , lapack_int n , const
lapack_complex_float * a , lapack_int lda , lapack_complex_float * ap );
lapack_int LAPACKE_ztrttp (int matrix_layout , char uplo , lapack_int n , const
lapack_complex_double * a , lapack_int lda , lapack_complex_double * ap );

Include Files
• mkl.h

Description

The routine copies a triangular matrix A from the standard full format to the standard packed format.

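Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

uplo Specifies whether A is upper or lower triangular: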

= 'U': A is upper triangular,
= 'L': A is lower triangular.

n The order of the matrix A, n≥ 0.

a Array, size max(1, lda*n). On entry, the triangular matrix A.


lda The leading dimension of the array a. lda ≥ max(1,n).

Output Parameters

ap Array, size at least max (1, n*(n+1)/2).

On exit, the upper or lower triangular matrix A, packed columnwise in a
linear array (see Matrix Storage Schemes).

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = -1011, memory allocation error occurred.

?lacp2
Copies all or part of a real two-dimensional array to a
complex array.

Syntax
lapack_int LAPACKE_clacp2 (int matrix_layout , char uplo , lapack_int m , lapack_int
n , const float * a , lapack_int lda , lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_zlacp2 (int matrix_layout , char uplo , lapack_int m , lapack_int
n , const double * a , lapack_int lda , lapack_complex_double * b , lapack_int ldb );

Include Files
• mkl.h

Description

The routine copies all or part of a real matrix A to a complex matrix B.
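The element-wise behavior can be sketched as follows, assuming column-major storage; lacp2_ref is a hypothetical name. Copied entries receive a zero imaginary part, and entries of B outside the selected part are left untouched.

```c
#include <assert.h>
#include <complex.h>

/* Hypothetical reference for the copy done by ?lacp2 (column-major):
 * uplo = 'U' copies the upper triangle or trapezoid, uplo = 'L' the
 * lower one, and any other value copies the whole m-by-n matrix. */
static void lacp2_ref(char uplo, int m, int n, const double *a, int lda,
                      double complex *b, int ldb)
{
    assert(lda >= m && ldb >= m);
    for (int j = 0; j < n; ++j) {
        int lo = 0, hi = m;                       /* rows [lo, hi) of column j */
        if (uplo == 'U' || uplo == 'u')
            hi = (j + 1 < m) ? j + 1 : m;         /* rows 0..min(j, m-1) */
        else if (uplo == 'L' || uplo == 'l')
            lo = j;                               /* rows j..m-1 */
        for (int i = lo; i < hi; ++i)
            b[i + j * ldb] = a[i + j * lda];      /* imaginary part becomes 0 */
    }
}
```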

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

uplo Specifies the part of the matrix A to be copied to B.

If uplo = 'U', the upper triangular part of A;

if uplo = 'L', the lower triangular part of A.

Otherwise, all of the matrix A is copied.

m The number of rows in the matrix A (m≥ 0).

n The number of columns in A (n≥ 0).

a Array, size at least max(1, lda*n) for column major and max(1, lda*m)
for row major, contains the m-by-n matrix A.

If uplo = 'U', only the upper triangle or trapezoid is accessed; if uplo =
'L', only the lower triangle or trapezoid is accessed.

lda The leading dimension of a; lda≥ max(1, m) for column major and lda≥
max(1, n) for row major.

ldb The leading dimension of the output array b; ldb≥ max(1, m) for column
major and ldb≥ max(1, n) for row major.

Output Parameters

b Array, size at least max(1,ldb*n) for column major layout and
max(1,ldb*m) for row major layout, contains the m-by-n matrix B.
On exit, B = A in the locations specified by uplo.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = -1011, memory allocation error occurred.

?larcm
Multiplies a square real matrix by a complex matrix.

Syntax
lapack_int LAPACKE_clarcm(int matrix_layout,lapack_int m,lapack_int n,const float
*a,lapack_int lda,const lapack_complex_float * b,lapack_int ldb,lapack_complex_float *
c,lapack_int ldc);
lapack_int LAPACKE_zlarcm(int matrix_layout,lapack_int m,lapack_int n,const double *
a,lapack_int lda,const lapack_complex_double *b,lapack_int ldb,lapack_complex_double
*c ,lapack_int ldc);

Description

The routine performs a simple matrix-matrix multiplication of the form

C = A*B,
where A is m-by-m and real, B is m-by-n and complex, and C is m-by-n and complex.
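Because A is real, each entry of C can be formed by applying A separately to the real and imaginary parts of B, so no complex multiplications are needed. The sketch assumes column-major storage; larcm_ref is a hypothetical name, not the MKL implementation.

```c
#include <assert.h>
#include <complex.h>

/* Hypothetical reference for C = A*B with A real m-by-m and B, C complex
 * m-by-n, all column-major. Real and imaginary parts are accumulated
 * separately using only real arithmetic. */
static void larcm_ref(int m, int n, const double *a, int lda,
                      const double complex *b, int ldb,
                      double complex *c, int ldc)
{
    assert(lda >= m && ldb >= m && ldc >= m);
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            double re = 0.0, im = 0.0;
            for (int p = 0; p < m; ++p) {
                re += a[i + p * lda] * creal(b[p + j * ldb]);
                im += a[i + p * lda] * cimag(b[p + j * ldb]);
            }
            c[i + j * ldc] = re + im * I;
        }
}
```

For A = [1 2; 3 4] and B = [1+i; 2-i], the result is C = [5-i; 11-i].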

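Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).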

m The number of rows and columns of matrix A and the number of rows of
matrix C (m≥ 0).

n The number of columns of matrix B and the number of columns of matrix C
(n ≥ 0).

a Array, size max(1, lda*m). Contains the m-by-m matrix A.


lda The leading dimension of the array a, lda≥max(1, m).

b Array, size max(1, ldb*n) for column major and max(1, ldb*m) for row major. Contains the m-by-n matrix B.

ldb The leading dimension of the array b, ldb ≥ max(1, m) for column-major
layout; ldb ≥ max(1, n) for row-major layout.

ldc The leading dimension of the array c, ldc ≥ max(1, m) for column-major
layout; ldc ≥ max(1, n) for row-major layout.

Output Parameters

c Array, size max(1, ldc*n) for column major and max(1, ldc*m) for row major. Contains the m-by-n matrix C.

Return Values
This function returns a value info. If info = 0, the execution is successful. If info = -i, parameter i had
an illegal value.

mkl_?tppack
Copies a triangular/symmetric matrix or submatrix
from standard full format to standard packed format.

Syntax
lapack_int LAPACKE_mkl_stppack (int matrix_layout, char uplo, char trans, lapack_int n,
float* ap, lapack_int i, lapack_int j, lapack_int rows, lapack_int cols, const float* a,
lapack_int lda);
lapack_int LAPACKE_mkl_dtppack (int matrix_layout, char uplo, char trans, lapack_int n,
double* ap, lapack_int i, lapack_int j, lapack_int rows, lapack_int cols, const double*
a, lapack_int lda);
lapack_int LAPACKE_mkl_ctppack (int matrix_layout, char uplo, char trans, lapack_int n,
MKL_Complex8* ap, lapack_int i, lapack_int j, lapack_int rows, lapack_int cols, const
MKL_Complex8* a, lapack_int lda);
lapack_int LAPACKE_mkl_ztppack (int matrix_layout, char uplo, char trans, lapack_int n,
MKL_Complex16* ap, lapack_int i, lapack_int j, lapack_int rows, lapack_int cols, const
MKL_Complex16* a, lapack_int lda);

Include Files
• mkl.h

Description
The routine copies a triangular or symmetric matrix or its submatrix from standard full format to packed
format:

$$AP_{i:i+\mathit{rows}-1,\; j:j+\mathit{cols}-1} := \mathrm{op}(A)$$

Standard packed formats include:

• TP: triangular packed storage

• SP: symmetric indefinite packed storage
• HP: Hermitian indefinite packed storage
• PP: symmetric or Hermitian positive definite packed storage

Full formats include:

• GE: general
• TR: triangular
• SY: symmetric indefinite
• HE: Hermitian indefinite
• PO: symmetric or Hermitian positive definite

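NOTE
Any elements of the copied rectangular submatrix that lie outside the triangular part of the
matrix AP are skipped.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).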

uplo Specifies whether the matrix AP is upper or lower triangular.

If uplo = 'U', AP is upper triangular.

If uplo = 'L', AP is lower triangular.

trans Specifies whether the copied block of A is transposed.

If trans = 'N', no transpose: op(A) = A.

If trans = 'T', transpose: op(A) = A^T.

If trans = 'C', conjugate transpose: op(A) = A^H. For real data this is the
same as trans = 'T'.

n The order of the matrix AP; n ≥ 0.

i, j Coordinates of the left upper corner of the destination submatrix in AP.

If uplo = 'U', 1 ≤ i ≤ j ≤ n.

If uplo = 'L', 1 ≤ j ≤ i ≤ n.

rows Number of rows in the destination submatrix. 0 ≤ rows ≤ n - i + 1.

cols Number of columns in the destination submatrix. 0 ≤ cols ≤ n - j + 1.

a Pointer to the source submatrix.

Array a contains the rows-by-cols submatrix stored as unpacked rows-by-cols
if trans = 'N', or unpacked cols-by-rows if trans = 'T' or trans = 'C'.
The size of a is:

• matrix_layout = LAPACK_COL_MAJOR: lda*cols if trans = 'N'; lda*rows if trans = 'T' or trans = 'C';
• matrix_layout = LAPACK_ROW_MAJOR: lda*rows if trans = 'N'; lda*cols if trans = 'T' or trans = 'C'.


NOTE
If there are elements outside of the triangular part of AP, they
are skipped and are not copied from a.

lda The leading dimension of the array a:

• matrix_layout = LAPACK_COL_MAJOR: lda ≥ max(1, rows) if trans = 'N'; lda ≥ max(1, cols) if trans = 'T' or trans = 'C';
• matrix_layout = LAPACK_ROW_MAJOR: lda ≥ max(1, cols) if trans = 'N'; lda ≥ max(1, rows) if trans = 'T' or trans = 'C'.

Output Parameters

ap Array of size at least max(1, n*(n+1)/2). The array ap contains either
the upper or the lower triangular part of the matrix AP (as specified by
uplo) in packed storage (see Matrix Storage Schemes). The submatrix
of ap from row i to row i + rows - 1 and column j to column j +
cols - 1 is overwritten with a copy of the source matrix.

Return Values
This function returns a value info. If info=0, the execution is successful. If info = -i, the i-th parameter
had an illegal value.
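The indexing involved can be sketched for a single case, assuming uplo = 'U', trans = 'N', real data and column-major packed storage, with the 1-based i and j used in the parameter list. tppack_un_ref is a hypothetical name. Block positions that fall below the diagonal of AP are skipped, in line with the NOTE in this section.

```c
#include <assert.h>

/* Hypothetical sketch: AP(i:i+rows-1, j:j+cols-1) := A for uplo = 'U',
 * trans = 'N', column-major. i and j are 1-based; a is rows-by-cols with
 * leading dimension lda. Upper packed index of AP(r,c), r <= c (0-based):
 * r + c*(c+1)/2. */
static void tppack_un_ref(int n, double *ap, int i, int j, int rows, int cols,
                          const double *a, int lda)
{
    assert(1 <= i && i <= j && j <= n);
    assert(rows <= n - i + 1 && cols <= n - j + 1 && lda >= 1 && lda >= rows);
    for (int q = 0; q < cols; ++q) {
        int col = j - 1 + q;                  /* 0-based column in AP */
        for (int p = 0; p < rows; ++p) {
            int row = i - 1 + p;              /* 0-based row in AP */
            if (row > col)
                continue;                     /* outside the upper triangle */
            ap[row + col * (col + 1) / 2] = a[p + q * lda];
        }
    }
}
```

For example, with n = 3, copying the 2-by-2 block [1 2; 3 4] to i = 1, j = 2 writes ap[1], ap[2], ap[3], ap[4] and leaves the diagonal entries AP(1,1) and AP(3,3) alone.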

mkl_?tpunpack
Copies a triangular/symmetric matrix or submatrix
from standard packed format to full format.

Syntax
lapack_int LAPACKE_mkl_stpunpack ( int matrix_layout, char uplo, char trans,
lapack_int n, const float* ap, lapack_int i, lapack_int j, lapack_int rows,
lapack_int cols, float* a, lapack_int lda );
lapack_int LAPACKE_mkl_dtpunpack ( int matrix_layout, char uplo, char trans,
lapack_int n, const double* ap, lapack_int i, lapack_int j, lapack_int rows,
lapack_int cols, double* a, lapack_int lda );
lapack_int LAPACKE_mkl_ctpunpack ( int matrix_layout, char uplo, char trans,
lapack_int n, const MKL_Complex8* ap, lapack_int i, lapack_int j, lapack_int rows,
lapack_int cols, MKL_Complex8* a, lapack_int lda );
lapack_int LAPACKE_mkl_ztpunpack ( int matrix_layout, char uplo, char trans,
lapack_int n, const MKL_Complex16* ap, lapack_int i, lapack_int j, lapack_int rows,
lapack_int cols, MKL_Complex16* a, lapack_int lda );

Include Files
• mkl.h

Description
The routine copies a triangular or symmetric matrix or its submatrix from standard packed format to full
format.

A := op(AP(i:i+rows-1, j:j+cols-1))

Standard packed formats include:

• TP: triangular packed storage

• SP: symmetric indefinite packed storage
• HP: Hermitian indefinite packed storage
• PP: symmetric or Hermitian positive definite packed storage

Full formats include:

• GE: general
• TR: triangular
• SY: symmetric indefinite
• HE: Hermitian indefinite
• PO: symmetric or Hermitian positive definite

NOTE
Any elements of the copied rectangular submatrix that lie outside the triangular part of AP are
skipped.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Specifies whether matrix AP is upper or lower triangular.

If uplo = 'U', AP is upper triangular.

If uplo = 'L', AP is lower triangular.

trans Specifies whether or not the copied block of AP is transposed.

If trans = 'N', no transpose: op(AP) = AP.

If trans = 'T', transpose: op(AP) = AP^T.

If trans = 'C', conjugate transpose: op(AP) = AP^H. For real data this
is the same as trans = 'T'.

n The order of the matrix AP; n ≥ 0.

ap Array, size at least max(1, n(n+1)/2). The array ap contains either
the upper or the lower triangular part of the matrix AP (as specified by
uplo) in packed storage (see Matrix Storage Schemes). It is the
source for the submatrix of AP from row i to row i + rows - 1 and
column j to column j + cols - 1 to be copied.

i, j Coordinates of the upper-left corner of the submatrix in AP to copy.

If uplo = 'U', 1 ≤ i ≤ j ≤ n.

If uplo = 'L', 1 ≤ j ≤ i ≤ n.

rows Number of rows to copy. 0 ≤ rows ≤ n - i + 1.

cols Number of columns to copy. 0 ≤ cols ≤ n - j + 1.

lda The leading dimension of array a:
• matrix_layout = LAPACK_COL_MAJOR: lda ≥ max(1, rows) if trans = 'N'; lda ≥ max(1, cols) if trans = 'T' or trans = 'C'.
• matrix_layout = LAPACK_ROW_MAJOR: lda ≥ max(1, cols) if trans = 'N'; lda ≥ max(1, rows) if trans = 'T' or trans = 'C'.

Output Parameters

a Pointer to the destination matrix. On exit, array a is overwritten with a
copy of the rows-by-cols submatrix of ap, unpacked rows-by-columns if
trans = 'N', or unpacked columns-by-rows if trans = 'T' or trans = 'C'.

The size of a is:
• matrix_layout = LAPACK_COL_MAJOR: lda*cols if trans = 'N'; lda*rows if trans = 'T' or trans = 'C'.
• matrix_layout = LAPACK_ROW_MAJOR: lda*rows if trans = 'N'; lda*cols if trans = 'T' or trans = 'C'.

NOTE
If there are elements outside of the triangular part of ap
indicated by uplo, they are skipped and are not copied to
a.

Return Values
This function returns a value info. If info=0, the execution is successful. If info = -i, the i-th parameter
had an illegal value.
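The reverse direction can be sketched the same way, again only for column-major layout, trans = 'N', and double data; tpunpack_sketch and packed_index are hypothetical names, not MKL functions. As the NOTE above states, positions of a that fall outside the stored triangle are not written, so the sketch leaves them untouched.

```c
#include <assert.h>

/* Hypothetical helper: 0-based offset in ap of element (i, j), 1-based, of
   an n-by-n matrix in column-major standard packed storage; -1 when (i, j)
   lies outside the triangle selected by uplo. */
long packed_index(char uplo, long n, long i, long j)
{
    if (uplo == 'U' || uplo == 'u')
        return (i <= j) ? (i - 1) + j * (j - 1) / 2 : -1;
    return (i >= j) ? (i - 1) + (2 * n - j) * (j - 1) / 2 : -1;
}

/* Sketch of the column-major, trans = 'N' case only:
   A := AP(i:i+rows-1, j:j+cols-1), where a is rows-by-cols with leading
   dimension lda. Positions outside the triangle are left unchanged in a. */
void tpunpack_sketch(char uplo, long n, const double *ap, long i, long j,
                     long rows, long cols, double *a, long lda)
{
    assert(lda >= (rows > 1 ? rows : 1));
    for (long c = 0; c < cols; ++c) {
        for (long r = 0; r < rows; ++r) {
            long k = packed_index(uplo, n, i + r, j + c);
            if (k >= 0)                 /* inside the triangle: copy */
                a[r + c * lda] = ap[k];
        }
    }
}
```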

LAPACK Utility Functions and Routines

This section describes LAPACK utility functions and routines.
Summary information about these routines is given in the following table:

LAPACK Utility Routines

• ilaver: Returns the version of the LAPACK library.
• ilaenv: Environmental enquiry function that returns values for tuning algorithmic performance.
• ?lamch (data types s, d): Determines machine parameters for floating-point arithmetic.

See Also
lsame Tests two characters for equality regardless of the case.
lsamen Tests two character strings for equality regardless of the case.
second/dsecnd Returns elapsed time in seconds. Use to estimate real time between two calls to
this function.
xerbla Error handling function called by BLAS, LAPACK, Vector Math, and Vector Statistics
functions.

ilaver
Returns the version of the LAPACK library.

Syntax
void LAPACKE_ilaver (lapack_int * vers_major, lapack_int * vers_minor, lapack_int *
vers_patch);

Include Files
• mkl.h

Description
This routine returns the version of the LAPACK library.

Output Parameters

vers_major Returns the major version of the LAPACK library.

vers_minor Returns the minor version of the LAPACK library.

vers_patch Returns the patch version of the LAPACK library.

ilaenv
Environmental enquiry function that returns values for
tuning algorithmic performance.

Syntax
MKL_INT ilaenv (const MKL_INT *ispec, const char *name, const char *opts, const MKL_INT
*n1, const MKL_INT *n2, const MKL_INT *n3, const MKL_INT *n4);

Include Files
• mkl.h

Description
The enquiry function ilaenv is called from the LAPACK routines to choose problem-dependent parameters
for the local environment. See ispec below for a description of the parameters.
This version provides a set of parameters that should give good, but not optimal, performance on many of
the currently available computers.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.
Notice revision #20201201

Input Parameters

ispec Specifies the parameter to be returned as the value of ilaenv:

= 1: the optimal block size; if this value is 1, an unblocked algorithm will
give the best performance.
= 2: the minimum block size for which the block routine should be used; if
the usable block size is less than this value, an unblocked routine should be
used.
= 3: the crossover point (in a block routine, for n less than this value, an
unblocked routine should be used)
= 4: the number of shifts, used in the nonsymmetric eigenvalue routines
(deprecated)
= 5: the minimum column dimension for blocking to be used; rectangular
blocks must have dimension at least k-by-m, where k is given by
ilaenv(2,...) and m by ilaenv(5,...)
= 6: the crossover point for the SVD (when reducing an m-by-n matrix to
bidiagonal form, if max(m,n)/min(m,n) exceeds this value, a QR
factorization is used first to reduce the matrix to a triangular form.)
= 7: the number of processors
= 8: the crossover point for the multishift QR and QZ methods for
nonsymmetric eigenvalue problems (deprecated).
= 9: maximum size of the subproblems at the bottom of the computation
tree in the divide-and-conquer algorithm (used by ?gelsd and ?gesdd)

= 10: IEEE NaN arithmetic can be trusted not to trap.

= 11: infinity arithmetic can be trusted not to trap.
12 ≤ ispec ≤ 16: parameters used by ?hseqr or one of its subroutines; see iparmq for a
detailed explanation.

name The name of the calling subroutine, in either upper case or lower case.

opts The character options to the subroutine name, concatenated into a single
character string. For example, uplo = 'U', trans = 'T', and diag =
'N' for a triangular routine would be specified as opts = 'UTN'.

NOTE
Use only uppercase characters for the opts string.

n1, n2, n3, n4 Problem dimensions for the subroutine name; these may not all be
required.

Output Parameters

value If value≥ 0: the value of the parameter specified by ispec;

If value = -k < 0: the k-th argument had an illegal value.

Return Values
ilaenv returns value.
If value≥ 0: the value of the parameter specified by ispec;

If value = -k < 0: the k-th argument had an illegal value.

Application Notes
The following conventions have been used when calling ilaenv from the LAPACK routines:

1. opts is a concatenation of all of the character options to subroutine name, in the same order that they
appear in the argument list for name, even if they are not used in determining the value of the
parameter specified by ispec.
2. The problem dimensions n1, n2, n3, n4 are specified in the order that they appear in the argument list
for name. n1 is used first, n2 second, and so on, and unused problem dimensions are passed a value of
-1.
3. The parameter value returned by ilaenv is checked for validity in the calling subroutine. For example,
ilaenv is used to retrieve the optimal blocksize for strtri as follows:

nb := ilaenv( 1, 'strtri', strcat (uplo, diag), n, -1, -1, -1 );

if( nb <= 1 ) {
nb := max( 1, n );
}
The following example calls ilaenv from C:

#include <stdio.h>
#include "mkl.h"

int main(void)
{
/* MKL_INT matches the ilaenv prototype in both LP64 and ILP64 builds */
MKL_INT size = 1000;
MKL_INT ispec = 1;
MKL_INT dummy = -1;
MKL_INT blockSize1 = ilaenv(&ispec, "dsytrd", "U", &size, &dummy, &dummy, &dummy);
MKL_INT blockSize2 = ilaenv(&ispec, "dormtr", "LUN", &size, &size, &dummy, &dummy);
printf("DSYTRD blocksize = %lld\n", (long long)blockSize1);
printf("DORMTR blocksize = %lld\n", (long long)blockSize2);
return 0;
}
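Calling routines typically combine several ilaenv queries before choosing a code path, much as in the strtri fragment above. The helper below is a hypothetical illustration, not an MKL or LAPACK routine: it takes values that a caller would obtain with ispec = 1 (nb), 2 (nbmin), and 3 (nx) and applies the rules listed under ispec. Treating nb ≥ n as "use unblocked code" is an assumption modeled on common reference LAPACK practice.

```c
#include <assert.h>

/* Hypothetical helper (not an MKL routine): decide whether blocked code
   should be used, given nb = ilaenv(1, ...), nbmin = ilaenv(2, ...) and
   nx = ilaenv(3, ...) for a problem of order n. Returns 1 for blocked. */
int use_blocked_code(int n, int nb, int nbmin, int nx)
{
    assert(n >= 0);
    if (nb <= 1 || nb >= n)  /* ispec = 1 favors unblocked, or one block spans n */
        return 0;
    if (n < nx)              /* ispec = 3: below the crossover point */
        return 0;
    if (nb < nbmin)          /* ispec = 2: usable block size too small */
        return 0;
    return 1;
}
```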

?lamch
Determines machine parameters for floating-point arithmetic.

Input Parameters

cmach Specifies the value to be returned by ?lamch:

= 'E' or 'e', val = eps

= 'S' or 's', val = sfmin

= 'B' or 'b', val = base

= 'P' or 'p', val = prec

= 'N' or 'n', val = t

= 'R' or 'r', val = rnd

= 'M' or 'm', val = emin

= 'U' or 'u', val = rmin

= 'L' or 'l', val = emax

= 'O' or 'o', val = rmax

where
eps = relative machine precision;
sfmin = safe minimum, such that 1/sfmin does not overflow;
base = base of the machine;
prec = eps*base;
t = number of (base) digits in the mantissa;
rnd = 1.0 when rounding occurs in addition, 0.0 otherwise;
emin = minimum exponent before (gradual) underflow;
rmin = underflow threshold = base**(emin-1);
emax = largest exponent before overflow;
rmax = overflow threshold = (base**emax)*(1-eps).

NOTE
You can use a character string for cmach instead of a single
character in order to make your code more readable. The first
character of the string determines the value to be returned. For
example, 'Precision' is interpreted as 'p'.

Output Parameters

val Value returned by the function.
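As a cross-check of the cmach definitions, the double-precision values can be derived from the C <float.h> constants. This is a sketch, assuming IEEE double arithmetic with rounding (rnd = 1.0) and the reference LAPACK conventions; lamch_sketch is a hypothetical name, not an MKL function. Like the NOTE above, it looks only at the first character of its argument.

```c
#include <assert.h>
#include <ctype.h>
#include <float.h>
#include <stddef.h>

/* Hypothetical sketch of the double-precision ?lamch queries, assuming IEEE
   double arithmetic with rounding (rnd = 1.0) and reference LAPACK
   conventions. Only the first character of cmach is examined. */
double lamch_sketch(const char *cmach)
{
    assert(cmach != NULL);
    const double eps = DBL_EPSILON * 0.5;       /* relative machine precision */
    switch (tolower((unsigned char)cmach[0])) {
    case 'e': return eps;
    case 's': return DBL_MIN;                   /* sfmin: 1/DBL_MIN does not overflow */
    case 'b': return FLT_RADIX;                 /* base */
    case 'p': return eps * FLT_RADIX;           /* prec = eps*base */
    case 'n': return DBL_MANT_DIG;              /* t: base digits in the mantissa */
    case 'r': return 1.0;                       /* rnd: rounding is assumed */
    case 'm': return DBL_MIN_EXP;               /* emin */
    case 'u': return DBL_MIN;                   /* rmin = base**(emin-1) */
    case 'l': return DBL_MAX_EXP;               /* emax */
    case 'o': return DBL_MAX;                   /* rmax = (base**emax)*(1-eps) */
    default:  return 0.0;                       /* unrecognized query */
    }
}
```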

LAPACK Test Functions and Routines

This section describes LAPACK test functions and routines.

?lagge
Generates a general m-by-n matrix.

Syntax
lapack_int LAPACKE_slagge (int matrix_layout , lapack_int m , lapack_int n , lapack_int
kl , lapack_int ku , const float * d , float * a , lapack_int lda , lapack_int *
iseed );
lapack_int LAPACKE_dlagge (int matrix_layout , lapack_int m , lapack_int n , lapack_int
kl , lapack_int ku , const double * d , double * a , lapack_int lda , lapack_int *
iseed );
lapack_int LAPACKE_clagge (int matrix_layout , lapack_int m , lapack_int n , lapack_int
kl , lapack_int ku , const float * d , lapack_complex_float * a , lapack_int lda ,
lapack_int * iseed );
lapack_int LAPACKE_zlagge (int matrix_layout , lapack_int m , lapack_int n , lapack_int
kl , lapack_int ku , const double * d , lapack_complex_double * a , lapack_int lda ,
lapack_int * iseed );

Include Files
• mkl.h

Description

The routine generates a general m-by-n matrix A by pre- and post-multiplying a real diagonal matrix D with
random matrices U and V:
A := U*D*V,
where U and V are orthogonal for real flavors and unitary for complex flavors. The lower and upper
bandwidths may then be reduced to kl and ku by additional orthogonal transformations.

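Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

m The number of rows of the matrix A (m ≥ 0).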

n The number of columns of the matrix A (n≥ 0).

kl The number of nonzero subdiagonals within the band of A (0 ≤ kl ≤ m-1).

ku The number of nonzero superdiagonals within the band of A (0 ≤ ku ≤ n-1).

d The array d with the dimension of (min(m, n)) contains the diagonal
elements of the diagonal matrix D.

lda The leading dimension of the array a (lda≥m) for column major layout and
(lda≥n) for row major layout.

iseed The array iseed with the dimension of 4 contains the seed of the random
number generator. The elements must be between 0 and 4095, and iseed[3]
must be odd.

Output Parameters

a The array a with size at least max(1,lda*n) for column major layout and
max(1,lda*m) for row major layout contains the generated m-by-n matrix
A.

iseed The array iseed contains the updated seed on exit.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info = -i < 0, the i-th parameter had an illegal value.

If info = -1011, memory allocation error occurred.
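The iseed constraints (four values in 0 to 4095, iseed[3] odd) follow from how the seed is used: the four entries are the base-4096 digits of one 48-bit state of a multiplicative linear congruential generator. The sketch below assumes MKL behaves like reference LAPACK's ?laran; the multiplier digits 494, 322, 2508, 2549 come from reference LAPACK, and lcg48_sketch is a hypothetical name. Because the multiplier is odd, an odd state stays odd, so the generator never collapses to zero.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of a 48-bit multiplicative linear congruential generator whose
   state is four 12-bit integers, as in reference LAPACK's ?laran. The
   multiplier digits are taken from reference LAPACK and are an assumption
   about MKL, whose internal generator is not documented here. */
double lcg48_sketch(int iseed[4])
{
    const uint64_t mult = ((494ULL * 4096 + 322) * 4096 + 2508) * 4096 + 2549;
    const uint64_t mask = (1ULL << 48) - 1;      /* reduce modulo 2^48 */
    uint64_t x = 0;
    for (int k = 0; k < 4; ++k) {                /* iseed[0] is the most significant digit */
        assert(iseed[k] >= 0 && iseed[k] <= 4095);
        x = x * 4096 + (uint64_t)iseed[k];
    }
    assert(iseed[3] % 2 == 1);                   /* odd state: x stays odd, never 0 */
    x = (x * mult) & mask;                       /* wrap-around mod 2^64 is harmless mod 2^48 */
    double r = (double)x / (double)(1ULL << 48); /* in (0, 1) because x is odd */
    for (int k = 3; k >= 0; --k) {               /* store the new state back into iseed */
        iseed[k] = (int)(x & 4095);
        x >>= 12;
    }
    return r;
}
```

Starting from iseed = {0, 0, 0, 1}, one call leaves iseed = {494, 322, 2508, 2549}, the multiplier's own digits.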

?laghe
Generates a complex Hermitian matrix.

Syntax
lapack_int LAPACKE_claghe (int matrix_layout , lapack_int n , lapack_int k , const
float * d , lapack_complex_float * a , lapack_int lda , lapack_int * iseed );
lapack_int LAPACKE_zlaghe (int matrix_layout , lapack_int n , lapack_int k , const
double * d , lapack_complex_double * a , lapack_int lda , lapack_int * iseed );

Include Files
• mkl.h

Description

The routine generates a complex Hermitian matrix A by pre- and post-multiplying a real diagonal matrix D
with a random unitary matrix U:
A := U*D*U^H
The semi-bandwidth may then be reduced to k by additional unitary transformations.

Input Parameters
A <datatype> placeholder, if present, is used for the C interface data types in the C interface section above.
See C Interface Conventions for the C interface principal conventions and type definitions.

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)

or column major ( LAPACK_COL_MAJOR ).

n The order of the matrix A (n≥ 0).

k The number of nonzero subdiagonals within the band of A (0 ≤k≤n-1).

d The array d with the dimension of (n) contains the diagonal elements of the
diagonal matrix D.

lda The leading dimension of the array a (lda≥n).

iseed The array iseed with the dimension of 4 contains the seed of the random
number generator. The elements must be between 0 and 4095 and
iseed[3] must be odd.

Output Parameters

a The array a of size at least max(1, lda*n) contains the generated n-by-n
Hermitian matrix A.

iseed The array iseed contains the updated seed on exit.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info = -i < 0, the i-th parameter had an illegal value.

If info = -1011, memory allocation error occurred.
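A test harness that uses ?laghe may want to confirm the two properties promised above: A is Hermitian and its semi-bandwidth does not exceed k. The checker below is a hypothetical helper, not part of MKL. It assumes column-major storage with each complex entry held as an interleaved (real, imaginary) pair of doubles, which avoids depending on <complex.h>.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical test helper (not part of MKL): returns 1 if the n-by-n
   column-major matrix a is Hermitian, a(r,c) == conj(a(c,r)), and has
   semi-bandwidth at most k, a(r,c) == 0 when |r - c| > k, within tol.
   Entry (r,c) is stored as a[2*(r + c*lda)] (real) followed by its
   imaginary part. */
int is_hermitian_band(int n, int k, const double *a, int lda, double tol)
{
    assert(n >= 0 && k >= 0 && lda >= (n > 1 ? n : 1));
    for (int c = 0; c < n; ++c) {
        for (int r = 0; r < n; ++r) {
            const double *v = a + 2 * (r + c * lda);   /* a(r,c) */
            const double *w = a + 2 * (c + r * lda);   /* a(c,r) */
            double dre = v[0] - w[0];                  /* real part of a(r,c) - conj(a(c,r)) */
            double dim = v[1] + w[1];                  /* imaginary part of the same */
            if (dre * dre + dim * dim > tol * tol)
                return 0;                              /* not Hermitian */
            if (abs(r - c) > k && v[0] * v[0] + v[1] * v[1] > tol * tol)
                return 0;                              /* nonzero outside the band */
        }
    }
    return 1;
}
```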

?lagsy
Generates a symmetric matrix by pre- and post-multiplying
a real diagonal matrix with a random orthogonal or
unitary matrix.

Syntax
lapack_int LAPACKE_slagsy (int matrix_layout , lapack_int n , lapack_int k , const
float * d , float * a , lapack_int lda , lapack_int * iseed );
lapack_int LAPACKE_dlagsy (int matrix_layout , lapack_int n , lapack_int k , const
double * d , double * a , lapack_int lda , lapack_int * iseed );
lapack_int LAPACKE_clagsy (int matrix_layout , lapack_int n , lapack_int k , const
float * d , lapack_complex_float * a , lapack_int lda , lapack_int * iseed );
lapack_int LAPACKE_zlagsy (int matrix_layout , lapack_int n , lapack_int k , const
double * d , lapack_complex_double * a , lapack_int lda , lapack_int * iseed );

Include Files
• mkl.h

Description

The ?lagsy routine generates a symmetric matrix A by pre- and post-multiplying a real diagonal matrix D
with a random matrix U:
A := U*D*U^T,
where U is orthogonal for real flavors and unitary for complex flavors. The semi-bandwidth may then be
reduced to k by additional orthogonal or unitary transformations.

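Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

n The order of the matrix A (n ≥ 0).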

k The number of nonzero subdiagonals within the band of A (0 ≤k≤n-1).

d The array d with the dimension of (n) contains the diagonal elements of the
diagonal matrix D.

lda The leading dimension of the array a (lda≥n).

iseed The array iseed with the dimension of 4 contains the seed of the random
number generator. The elements must be between 0 and 4095 and
iseed[3] must be odd.

Output Parameters

a The array a of size at least max(1, lda*n) contains the generated symmetric n-by-n
matrix A.

iseed The array iseed contains the updated seed on exit.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info = -i < 0, the i-th parameter had an illegal value.

If info = -1011, memory allocation error occurred.

?latms
Generates a general m-by-n matrix with specific
singular values.

Syntax
lapack_int LAPACKE_slatms (int matrix_layout, lapack_int m, lapack_int n, char dist,
lapack_int * iseed, char sym, float * d, lapack_int mode, float cond, float dmax,
lapack_int kl, lapack_int ku, char pack, float * a, lapack_int lda);
lapack_int LAPACKE_dlatms (int matrix_layout, lapack_int m, lapack_int n, char dist,
lapack_int * iseed, char sym, double * d, lapack_int mode, double cond, double dmax,
lapack_int kl, lapack_int ku, char pack, double * a, lapack_int lda);

lapack_int LAPACKE_clatms (int matrix_layout, lapack_int m, lapack_int n, char dist,
lapack_int * iseed, char sym, float * d, lapack_int mode, float cond, float dmax,
lapack_int kl, lapack_int ku, char pack, lapack_complex_float * a, lapack_int lda);
lapack_int LAPACKE_zlatms (int matrix_layout, lapack_int m, lapack_int n, char dist,
lapack_int * iseed, char sym, double * d, lapack_int mode, double cond, double dmax,
lapack_int kl, lapack_int ku, char pack, lapack_complex_double * a, lapack_int lda);

Include Files
• mkl.h

Description

The ?latms routine generates random matrices with specified singular values, or symmetric/Hermitian
matrices with specified eigenvalues for testing LAPACK programs.
It applies this sequence of operations:

1. Set the diagonal to d, where d is input or computed according to mode, cond, dmax, and sym as
described in Input Parameters.
2. Generate a matrix with the appropriate band structure, by one of two methods:

Method A: Generate a dense m-by-n matrix by multiplying d on the left and the right by random
unitary matrices, then reduce the bandwidth according to kl and ku, using Householder
transformations.

Method B: Convert the bandwidth-0 (i.e., diagonal) matrix to a bandwidth-1 matrix using Givens
rotations, "chasing" out-of-band elements back, much as in QR; then convert the bandwidth-1
matrix to a bandwidth-2 matrix, and so on. Note that for reasonably small bandwidths (relative
to m and n) this requires less storage, as a dense matrix is not generated. Also, for symmetric
or Hermitian matrices, only one triangle is generated.

Method A is chosen if the bandwidth is a large fraction of the order of the matrix, and lda is at least m (so a
dense matrix can be stored). Method B is chosen if the bandwidth is small (less than (1/2)*n for symmetric
or Hermitian, or less than 0.3*n+m for nonsymmetric), or if lda is less than m and not less than the bandwidth.

3. Pack the matrix if desired, using one of the methods specified by the pack parameter.

If Method B is chosen and band format is specified, then the matrix is generated in the band format and no
repacking is necessary.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

m The number of rows of the matrix A (m≥ 0).

n The number of columns of the matrix A (n≥ 0).

dist Specifies the type of distribution to be used to generate the random

singular values or eigenvalues:

• 'U': uniform distribution (0, 1)

• 'S': symmetric uniform distribution (-1, 1)
• 'N': normal distribution (0, 1)

iseed Array with size 4.

Specifies the seed of the random number generator. Values should lie
between 0 and 4095 inclusive, and iseed[3] should be odd. The random
number generator uses a linear congruential sequence limited to small
integers, and so should produce machine independent random numbers.
The values of the array are modified, and can be used in the next call
to ?latms to continue the same random number sequence.

sym If sym='S' or 'H', the generated matrix is symmetric or Hermitian, with

eigenvalues specified by d, cond, mode, and dmax; they can be positive,
negative, or zero.
If sym='P', the generated matrix is symmetric or Hermitian, with
non-negative eigenvalues (equal to its singular values) specified by d,
cond, mode, and dmax.
If sym='N', the generated matrix is nonsymmetric, with non-negative
singular values specified by d, cond, mode, and dmax.

d Array, size (MIN(m , n))

This array is used to specify the singular values or eigenvalues of A (see the
description of sym). If mode=0, then d is assumed to contain the
eigenvalues or singular values, otherwise elements of d are computed
according to mode, cond, and dmax.

mode Describes how the singular/eigenvalues are specified.

• mode= 0: use d as input

• mode= 1: set d[0] = 1 and d[1:n - 1] = 1.0/cond
• mode= 2: set d[0:n - 2] = 1 and d[n - 1] = 1.0/cond
• mode= 3: set d[i] = cond^(-i/(n - 1))
• mode= 4: set d[i] = 1 - i/(n - 1)*(1 - 1/cond)
• mode= 5: set elements of d to random numbers in the range (1/cond ,
1) such that their logarithms are uniformly distributed.
• mode = 6: set elements of d to random numbers from same distribution
as the rest of the matrix.

mode < 0 has the same meaning as ABS(mode), except that the order of the
elements of d is reversed. Thus, if mode is positive, d has entries ranging
from 1 to 1/cond; if negative, from 1/cond to 1.

If sym='S' or 'H', and mode is not 0, 6, or -6, then the elements of d are
also given a random sign (multiplied by +1 or -1).

cond Used in setting d as described for the mode parameter. If used, cond≥ 1.

dmax If mode is not -6, 0, or 6, the contents of d, as computed according to mode
and cond, are scaled by dmax / max(abs(d[i])); thus, the maximum
absolute eigenvalue or singular value (the norm) is abs(dmax).

NOTE
dmax need not be positive: if dmax is negative (or zero), d will be
scaled by a negative number (or zero).

kl Specifies the lower bandwidth of the matrix. For example, kl=0 implies
upper triangular, kl=1 implies upper Hessenberg, and kl being at least m -
1 means that the matrix has full lower bandwidth. kl must equal ku if the
matrix is symmetric or Hermitian.

ku Specifies the upper bandwidth of the matrix. For example, ku=0 implies
lower triangular, ku=1 implies lower Hessenberg, and ku being at least n -
1 means that the matrix has full upper bandwidth. kl must equal ku if the
matrix is symmetric or Hermitian.

pack Specifies packing of matrix:

• 'N': no packing
• 'U': zero out all subdiagonal entries (if symmetric or Hermitian)
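• 'L': zero out all superdiagonal entries (if symmetric or Hermitian)
• 'C': store the upper triangle columnwise (only if matrix symmetric,
Hermitian, or upper triangular)
• 'R': store the lower triangle columnwise (only if matrix symmetric,
Hermitian, or lower triangular)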
• 'B': store the lower triangle in band storage scheme (only if matrix
symmetric, Hermitian, or lower triangular)
• 'Q': store the upper triangle in band storage scheme (only if matrix
symmetric, Hermitian, or upper triangular)
• 'Z': store the entire matrix in band storage scheme (pivoting can be
provided for by using this option to store A in the trailing rows of the
allocated storage)

Using these options, the various LAPACK packed and banded storage
schemes can be obtained:

• GB: general band: 'Z'
• PB: symmetric positive definite band: 'B' or 'Q'
• SB: symmetric band: 'B' or 'Q'
• HB: Hermitian band: 'B' or 'Q'
• TB: triangular band: 'B' or 'Q'
• PP: symmetric positive definite packed: 'C' or 'R'
• SP: symmetric packed: 'C' or 'R'
• HP: Hermitian packed: 'C' or 'R'
• TP: triangular packed: 'C' or 'R'

If two calls to ?latms differ only in the pack parameter, they generate
mathematically equivalent matrices.

lda The leading dimension of the array a.

If pack='N', 'U', 'L', 'C', or 'R', then lda must be at least m for column major
or at least n for row major.

If pack='B' or 'Q', then lda must be at least MIN(kl, m - 1) (which is
equal to MIN(ku, n - 1)).

If pack='Z', lda must be large enough to hold the packed array: MIN( ku,
n - 1) + MIN( kl, m - 1) + 1.

Output Parameters

iseed The array iseed contains the updated seed.

d On exit, the array d contains the singular values or eigenvalues computed according to mode, cond, and dmax.

NOTE
The array d is not modified if mode = 0.

a Array of size lda by n.

The array a contains the generated m-by-n matrix A.

a is first generated in full (unpacked) form, and then packed, if so specified
by pack. Thus, the first m elements of the first n columns are always
modified. If pack specifies a packed or banded storage scheme, all lda
elements of the first n columns are modified; the elements of the array
which do not correspond to elements of the generated matrix are set to
zero.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info = -i < 0, the i-th parameter had an illegal value.

If info = -1011, memory allocation error occurred.

If info = 2, cannot scale to dmax (maximum singular value is 0).

If info = 3, error return from ?lagge, ?laghe, or ?lagsy.
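The mode and dmax rules can be traced with a small sketch. It is not the MKL implementation: latms_d_sketch is a hypothetical name, it handles only mode = ±1, ±2, ±4, and it omits the random signs applied when sym = 'S' or 'H'. The parameter nd stands for the length of d, min(m, n), which is assumed to be what n denotes in the mode formulas. A return value of 2 mirrors info = 2.

```c
#include <assert.h>

/* Hypothetical sketch (not the MKL implementation) of how ?latms fills d
   for mode = +/-1, +/-2, +/-4 and then scales it by dmax. nd is the length
   of d, min(m, n). Random signs for sym = 'S' or 'H' are omitted.
   Returns 0 on success, or 2 if d cannot be scaled to dmax. */
int latms_d_sketch(int mode, double cond, double dmax, int nd, double *d)
{
    int m = mode < 0 ? -mode : mode;
    assert(nd >= 1 && cond >= 1.0 && (m == 1 || m == 2 || m == 4));
    for (int i = 0; i < nd; ++i) {
        if (m == 1)                        /* one large value, the rest 1/cond */
            d[i] = (i == 0) ? 1.0 : 1.0 / cond;
        else if (m == 2)                   /* all 1 except the last, 1/cond */
            d[i] = (i == nd - 1) ? 1.0 / cond : 1.0;
        else                               /* m == 4: arithmetic from 1 to 1/cond */
            d[i] = (nd == 1) ? 1.0 : 1.0 - (double)i / (nd - 1) * (1.0 - 1.0 / cond);
    }
    if (mode < 0)                          /* negative mode: reverse the order */
        for (int i = 0, j = nd - 1; i < j; ++i, --j) {
            double t = d[i]; d[i] = d[j]; d[j] = t;
        }
    double dmx = 0.0;                      /* largest |d[i]| */
    for (int i = 0; i < nd; ++i) {
        double a = d[i] < 0 ? -d[i] : d[i];
        if (a > dmx) dmx = a;
    }
    if (dmx == 0.0)
        return 2;                          /* cannot scale to dmax */
    for (int i = 0; i < nd; ++i)
        d[i] *= dmax / dmx;                /* now max |d[i]| == |dmax| */
    return 0;
}
```

Mode ±3 (d[i] = cond^(-i/(n - 1))) would need pow from <math.h> and is left out to keep the sketch minimal.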

Additional LAPACK Routines (Included for Compatibility with Netlib LAPACK)

LAPACK_DECL lapack_int LAPACKE_chesv_aa_2stage (int matrix_layout , char uplo ,
lapack_int n , lapack_int nrhs , lapack_complex_float * a , lapack_int lda ,
lapack_complex_float * tb , lapack_int ltb , lapack_int * ipiv , lapack_int * ipiv2 ,
lapack_complex_float * b , lapack_int ldb );
LAPACK_DECL lapack_int LAPACKE_dsysv_aa_2stage (int matrix_layout , char uplo ,
lapack_int n , lapack_int nrhs , double * a , lapack_int lda , double * tb , lapack_int
ltb , lapack_int * ipiv , lapack_int * ipiv2 , double * b , lapack_int ldb );
LAPACK_DECL lapack_int LAPACKE_ssysv_aa_2stage (int matrix_layout , char uplo ,
lapack_int n , lapack_int nrhs , float * a , lapack_int lda , float * tb , lapack_int
ltb , lapack_int * ipiv , lapack_int * ipiv2 , float * b , lapack_int ldb );

LAPACK_DECL lapack_int LAPACKE_zhesv_aa_2stage (int matrix_layout , char uplo ,
lapack_int n , lapack_int nrhs , lapack_complex_double * a , lapack_int lda ,
lapack_complex_double * tb , lapack_int ltb , lapack_int * ipiv , lapack_int * ipiv2 ,
lapack_complex_double * b , lapack_int ldb );
LAPACK_DECL lapack_int LAPACKE_chetrf_aa_2stage (int matrix_layout , char uplo ,
lapack_int n , lapack_complex_float * a , lapack_int lda , lapack_complex_float * tb ,
lapack_int ltb , lapack_int * ipiv , lapack_int * ipiv2 );
LAPACK_DECL lapack_int LAPACKE_dsytrf_aa_2stage (int matrix_layout , char uplo ,
lapack_int n , double * a , lapack_int lda , double * tb , lapack_int ltb , lapack_int
* ipiv , lapack_int * ipiv2 );
LAPACK_DECL lapack_int LAPACKE_ssytrf_aa_2stage (int matrix_layout , char uplo ,
lapack_int n , float * a , lapack_int lda , float * tb , lapack_int ltb , lapack_int *
ipiv , lapack_int * ipiv2 );
LAPACK_DECL lapack_int LAPACKE_zhetrf_aa_2stage (int matrix_layout , char uplo ,
lapack_int n , lapack_complex_double * a , lapack_int lda , lapack_complex_double *
tb , lapack_int ltb , lapack_int * ipiv , lapack_int * ipiv2 );
LAPACK_DECL lapack_int LAPACKE_chetrs_aa_2stage (int matrix_layout , char uplo ,
lapack_int n , lapack_int nrhs , lapack_complex_float * a , lapack_int lda ,
lapack_complex_float * tb , lapack_int ltb , lapack_int * ipiv , lapack_int * ipiv2 ,
lapack_complex_float * b , lapack_int ldb );
LAPACK_DECL lapack_int LAPACKE_dsytrs_aa_2stage (int matrix_layout , char uplo ,
lapack_int n , lapack_int nrhs , double * a , lapack_int lda , double * tb , lapack_int
ltb , lapack_int * ipiv , lapack_int * ipiv2 , double * b , lapack_int ldb );
LAPACK_DECL lapack_int LAPACKE_ssytrs_aa_2stage (int matrix_layout , char uplo ,
lapack_int n , lapack_int nrhs , float * a , lapack_int lda , float * tb , lapack_int
ltb , lapack_int * ipiv , lapack_int * ipiv2 , float * b , lapack_int ldb );
LAPACK_DECL lapack_int LAPACKE_zhetrs_aa_2stage (int matrix_layout , char uplo ,
lapack_int n , lapack_int nrhs , lapack_complex_double * a , lapack_int lda ,
lapack_complex_double * tb , lapack_int ltb , lapack_int * ipiv , lapack_int * ipiv2 ,
lapack_complex_double * b , lapack_int ldb );
LAPACK_DECL lapack_int LAPACKE_csysv_aa_2stage (int matrix_layout , char uplo ,
lapack_int n , lapack_int nrhs , lapack_complex_float * a , lapack_int lda ,
lapack_complex_float * tb , lapack_int ltb , lapack_int * ipiv , lapack_int * ipiv2 ,
lapack_complex_float * b , lapack_int ldb );
LAPACK_DECL lapack_int LAPACKE_zsysv_aa_2stage (int matrix_layout , char uplo ,
lapack_int n , lapack_int nrhs , lapack_complex_double * a , lapack_int lda ,
lapack_complex_double * tb , lapack_int ltb , lapack_int * ipiv , lapack_int * ipiv2 ,
lapack_complex_double * b , lapack_int ldb );
LAPACK_DECL lapack_int LAPACKE_csytrf_aa_2stage (int matrix_layout , char uplo ,
lapack_int n , lapack_complex_float * a , lapack_int lda , lapack_complex_float * tb ,
lapack_int ltb , lapack_int * ipiv , lapack_int * ipiv2 );
LAPACK_DECL lapack_int LAPACKE_zsytrf_aa_2stage (int matrix_layout , char uplo ,
lapack_int n , lapack_complex_double * a , lapack_int lda , lapack_complex_double *
tb , lapack_int ltb , lapack_int * ipiv , lapack_int * ipiv2 );

LAPACK_DECL lapack_int LAPACKE_csytrs_aa_2stage (int matrix_layout , char uplo ,

lapack_int n , lapack_int nrhs , lapack_complex_float * a , lapack_int lda ,
lapack_complex_float * tb , lapack_int ltb , lapack_int * ipiv , lapack_int * ipiv2 ,
lapack_complex_float * b , lapack_int ldb );
LAPACK_DECL lapack_int LAPACKE_zsytrs_aa_2stage (int matrix_layout , char uplo ,
lapack_int n , lapack_int nrhs , lapack_complex_double * a , lapack_int lda ,
lapack_complex_double * tb , lapack_int ltb , lapack_int * ipiv , lapack_int * ipiv2 ,
lapack_complex_double * b , lapack_int ldb );
LAPACK_DECL lapack_int LAPACKE_ssyev_2stage (int matrix_layout, char jobz, char uplo,
lapack_int n, float * a, lapack_int lda, float * w);
LAPACK_DECL lapack_int LAPACKE_dsyev_2stage (int matrix_layout, char jobz, char uplo,
lapack_int n, double * a, lapack_int lda, double * w);
LAPACK_DECL lapack_int LAPACKE_ssyevd_2stage (int matrix_layout, char jobz, char uplo,
lapack_int n, float * a, lapack_int lda, float * w);
LAPACK_DECL lapack_int LAPACKE_dsyevd_2stage (int matrix_layout, char jobz, char uplo,
lapack_int n, double * a, lapack_int lda, double * w);
LAPACK_DECL lapack_int LAPACKE_ssyevr_2stage (int matrix_layout, char jobz, char range,
char uplo, lapack_int n, float * a, lapack_int lda, float vl, float vu, lapack_int il,
lapack_int iu, float abstol, lapack_int * m, float * w, float * z, lapack_int ldz,
lapack_int * isuppz);
LAPACK_DECL lapack_int LAPACKE_dsyevr_2stage (int matrix_layout, char jobz, char range,
char uplo, lapack_int n, double * a, lapack_int lda, double vl, double vu, lapack_int
il, lapack_int iu, double abstol, lapack_int * m, double * w, double * z, lapack_int
ldz, lapack_int * isuppz);
LAPACK_DECL lapack_int LAPACKE_ssyevx_2stage (int matrix_layout, char jobz, char range,
char uplo, lapack_int n, float * a, lapack_int lda, float vl, float vu, lapack_int il,
lapack_int iu, float abstol, lapack_int * m, float * w, float * z, lapack_int ldz,
lapack_int * ifail);
LAPACK_DECL lapack_int LAPACKE_dsyevx_2stage (int matrix_layout, char jobz, char range,
char uplo, lapack_int n, double * a, lapack_int lda, double vl, double vu, lapack_int
il, lapack_int iu, double abstol, lapack_int * m, double * w, double * z, lapack_int
ldz, lapack_int * ifail);
LAPACK_DECL lapack_int LAPACKE_ssygv_2stage (int matrix_layout, lapack_int itype, char
jobz, char uplo, lapack_int n, float * a, lapack_int lda, float * b, lapack_int ldb,
float * w);
LAPACK_DECL lapack_int LAPACKE_dsygv_2stage (int matrix_layout, lapack_int itype, char
jobz, char uplo, lapack_int n, double * a, lapack_int lda, double * b, lapack_int ldb,
double * w);
LAPACK_DECL lapack_int LAPACKE_cheev_2stage (int matrix_layout, char jobz, char uplo,
lapack_int n, lapack_complex_float * a, lapack_int lda, float * w);
LAPACK_DECL lapack_int LAPACKE_zheev_2stage (int matrix_layout, char jobz, char uplo,
lapack_int n, lapack_complex_double * a, lapack_int lda, double * w);
LAPACK_DECL lapack_int LAPACKE_cheevd_2stage (int matrix_layout, char jobz, char uplo,
lapack_int n, lapack_complex_float * a, lapack_int lda, float * w);
LAPACK_DECL lapack_int LAPACKE_zheevd_2stage (int matrix_layout, char jobz, char uplo,
lapack_int n, lapack_complex_double * a, lapack_int lda, double * w);
LAPACK_DECL lapack_int LAPACKE_cheevr_2stage (int matrix_layout, char jobz, char range,
char uplo, lapack_int n, lapack_complex_float * a, lapack_int lda, float vl, float vu,
lapack_int il, lapack_int iu, float abstol, lapack_int * m, float * w,
lapack_complex_float * z, lapack_int ldz, lapack_int * isuppz);
LAPACK_DECL lapack_int LAPACKE_zheevr_2stage (int matrix_layout, char jobz, char range,
char uplo, lapack_int n, lapack_complex_double * a, lapack_int lda, double vl, double
vu, lapack_int il, lapack_int iu, double abstol, lapack_int * m, double * w,
lapack_complex_double * z, lapack_int ldz, lapack_int * isuppz);
LAPACK_DECL lapack_int LAPACKE_cheevx_2stage (int matrix_layout, char jobz, char range,
char uplo, lapack_int n, lapack_complex_float * a, lapack_int lda, float vl, float vu,
lapack_int il, lapack_int iu, float abstol, lapack_int * m, float * w,
lapack_complex_float * z, lapack_int ldz, lapack_int * ifail);
LAPACK_DECL lapack_int LAPACKE_zheevx_2stage (int matrix_layout, char jobz, char range,
char uplo, lapack_int n, lapack_complex_double * a, lapack_int lda, double vl, double
vu, lapack_int il, lapack_int iu, double abstol, lapack_int * m, double * w,
lapack_complex_double * z, lapack_int ldz, lapack_int * ifail);
LAPACK_DECL lapack_int LAPACKE_chegv_2stage (int matrix_layout, lapack_int itype, char
jobz, char uplo, lapack_int n, lapack_complex_float * a, lapack_int lda,
lapack_complex_float * b, lapack_int ldb, float * w);
LAPACK_DECL lapack_int LAPACKE_zhegv_2stage (int matrix_layout, lapack_int itype, char
jobz, char uplo, lapack_int n, lapack_complex_double * a, lapack_int lda,
lapack_complex_double * b, lapack_int ldb, double * w);
LAPACK_DECL lapack_int LAPACKE_ssbev_2stage (int matrix_layout, char jobz, char uplo,
lapack_int n, lapack_int kd, float * ab, lapack_int ldab, float * w, float * z,
lapack_int ldz);
LAPACK_DECL lapack_int LAPACKE_dsbev_2stage (int matrix_layout, char jobz, char uplo,
lapack_int n, lapack_int kd, double * ab, lapack_int ldab, double * w, double * z,
lapack_int ldz);
LAPACK_DECL lapack_int LAPACKE_ssbevd_2stage (int matrix_layout, char jobz, char uplo,
lapack_int n, lapack_int kd, float * ab, lapack_int ldab, float * w, float * z,
lapack_int ldz);
LAPACK_DECL lapack_int LAPACKE_dsbevd_2stage (int matrix_layout, char jobz, char uplo,
lapack_int n, lapack_int kd, double * ab, lapack_int ldab, double * w, double * z,
lapack_int ldz);
LAPACK_DECL lapack_int LAPACKE_ssbevx_2stage (int matrix_layout, char jobz, char range,
char uplo, lapack_int n, lapack_int kd, float * ab, lapack_int ldab, float * q,
lapack_int ldq, float vl, float vu, lapack_int il, lapack_int iu, float abstol,
lapack_int * m, float * w, float * z, lapack_int ldz, lapack_int * ifail);
LAPACK_DECL lapack_int LAPACKE_dsbevx_2stage (int matrix_layout, char jobz, char range,
char uplo, lapack_int n, lapack_int kd, double * ab, lapack_int ldab, double * q,
lapack_int ldq, double vl, double vu, lapack_int il, lapack_int iu, double abstol,
lapack_int * m, double * w, double * z, lapack_int ldz, lapack_int * ifail);
LAPACK_DECL lapack_int LAPACKE_chbev_2stage (int matrix_layout, char jobz, char uplo,
lapack_int n, lapack_int kd, lapack_complex_float * ab, lapack_int ldab, float * w,
lapack_complex_float * z, lapack_int ldz);
LAPACK_DECL lapack_int LAPACKE_zhbev_2stage (int matrix_layout, char jobz, char uplo,
lapack_int n, lapack_int kd, lapack_complex_double * ab, lapack_int ldab, double * w,
lapack_complex_double * z, lapack_int ldz);
LAPACK_DECL lapack_int LAPACKE_chbevd_2stage (int matrix_layout, char jobz, char uplo,
lapack_int n, lapack_int kd, lapack_complex_float * ab, lapack_int ldab, float * w,
lapack_complex_float * z, lapack_int ldz);
LAPACK_DECL lapack_int LAPACKE_zhbevd_2stage (int matrix_layout, char jobz, char uplo,
lapack_int n, lapack_int kd, lapack_complex_double * ab, lapack_int ldab, double * w,
lapack_complex_double * z, lapack_int ldz);
LAPACK_DECL lapack_int LAPACKE_chbevx_2stage (int matrix_layout, char jobz, char range,
char uplo, lapack_int n, lapack_int kd, lapack_complex_float * ab, lapack_int ldab,
lapack_complex_float * q, lapack_int ldq, float vl, float vu, lapack_int il, lapack_int
iu, float abstol, lapack_int * m, float * w, lapack_complex_float * z, lapack_int ldz,
lapack_int * ifail);
LAPACK_DECL lapack_int LAPACKE_zhbevx_2stage (int matrix_layout, char jobz, char range,
char uplo, lapack_int n, lapack_int kd, lapack_complex_double * ab, lapack_int ldab,
lapack_complex_double * q, lapack_int ldq, double vl, double vu, lapack_int il,
lapack_int iu, double abstol, lapack_int * m, double * w, lapack_complex_double * z,
lapack_int ldz, lapack_int * ifail);
For descriptions of these functions, see https://www.netlib.org/lapack/explore-html/files.html.

ScaLAPACK Routines
Intel® oneAPI Math Kernel Library implements routines from the ScaLAPACK package for distributed-memory
architectures. Routines are supported for both real and complex dense and band matrices to perform the
tasks of solving systems of linear equations, solving linear least-squares problems, eigenvalue and singular
value problems, as well as performing a number of related computational tasks.
Intel® oneAPI Math Kernel Library (oneMKL) ScaLAPACK routines are written in FORTRAN 77, with the exception
of a few utility routines written in C to exploit IEEE arithmetic. All routines are available in all precision
types: single precision, double precision, complex, and double complex precision. See
the mkl_scalapack.h header file for C declarations of ScaLAPACK routines.

NOTE
ScaLAPACK routines are provided only for Intel® 64 or Intel® Many Integrated Core architectures.

See descriptions of ScaLAPACK computational routines that perform distinct computational tasks, as well as
driver routines for solving standard types of problems in one call. Additionally, Intel® oneAPI Math Kernel
Library implements ScaLAPACK Auxiliary Routines, Utility Functions and Routines, and Matrix Redistribution/Copy
Routines. The library includes routines for both real and complex data.
The <install_directory>/examples/scalapackf directory contains sample code demonstrating the use
of ScaLAPACK routines.
Generally, ScaLAPACK runs on a network of computers using MPI as a message-passing layer and a set of
prebuilt communication subprograms (BLACS), as well as a set of BLAS optimized for the target architecture.
The oneMKL version of ScaLAPACK is optimized for Intel® processors. For detailed system and environment
requirements, see the Intel® oneAPI Math Kernel Library (oneMKL) Release Notes and the Intel® oneAPI Math
Kernel Library (oneMKL) Developer Guide.
For full reference on ScaLAPACK routines and related information, see [SLUG].

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/
PerformanceIndex.
Notice revision #20201201

Overview of ScaLAPACK Routines

The model of the computing environment for ScaLAPACK is a one-dimensional array of processes (for
operations on band or tridiagonal matrices) or a two-dimensional process grid (for operations on dense
matrices). To use ScaLAPACK, all global matrices and vectors should be distributed on this array or grid
before calling the ScaLAPACK routines.
ScaLAPACK is closely tied to other components, including BLAS, BLACS, LAPACK, and PBLAS.

ScaLAPACK Array Descriptors

ScaLAPACK uses two-dimensional block-cyclic data distribution as a layout for dense matrix computations.
This distribution provides good work balance between available processors and also allows the use of BLAS
Level 3 routines for optimal local computations. The information about the data distribution that is required
to establish the mapping between each global matrix and its corresponding process and memory location is
contained in an array, called the array descriptor, associated with each global matrix. The size of the array
descriptor is denoted as dlen_.
Let A be a two-dimensional block-cyclically distributed matrix with the array descriptor desca. The
meaning of each array descriptor element depends on the type of the matrix A. The tables "Array descriptor
for dense matrices" and "Array descriptor for narrow-band and tridiagonal matrices" describe the meaning of
each element for the different types of matrices.

Array descriptor for dense matrices (dlen_=9)

Element name    Stored in      Description                                     Element index number
dtype_a         desca[dtype_]  Descriptor type (=1 for dense matrices).        0
ctxt_a          desca[ctxt_]   BLACS context handle for the process grid.      1
m_a             desca[m_]      Number of rows in the global matrix A.          2
n_a             desca[n_]      Number of columns in the global matrix A.       3
mb_a            desca[mb_]     Row blocking factor.                            4
nb_a            desca[nb_]     Column blocking factor.                         5
rsrc_a          desca[rsrc_]   Process row over which the first row of the     6
                               global matrix A is distributed.
csrc_a          desca[csrc_]   Process column over which the first column of   7
                               the global matrix A is distributed.
lld_a           desca[lld_]    Leading dimension of the local matrix A.        8

Array descriptor for narrow-band and tridiagonal matrices (dlen_=7)

Element name    Stored in      Description                                     Element index number
dtype_a         desca[dtype_]  Descriptor type:                                0
                               • dtype_a=501: 1-by-P grid,
                               • dtype_a=502: P-by-1 grid.
ctxt_a          desca[ctxt_]   BLACS context handle indicating the BLACS       1
                               process grid over which the global matrix A is
                               distributed. The context itself is global, but
                               the handle (the integer value) can vary.
n_a             desca[n_]      The size of the matrix dimension being          2
                               distributed.
nb_a            desca[nb_]     The blocking factor used to distribute the      3
                               distributed dimension of the matrix A.
src_a           desca[src_]    The process row or column over which the first  4
                               row or column of the matrix A is distributed.
lld_a           desca[lld_]    The leading dimension of the local matrix       5
                               storing the local blocks of the distributed
                               matrix A. The minimum value of lld_a depends on
                               dtype_a:
                               • dtype_a=501: lld_a ≥ max(size of undistributed
                                 dimension, 1),
                               • dtype_a=502: lld_a ≥ max(nb_a, 1).
Not applicable  -              Reserved for future use.                        6

Similar notation is used for other matrices. For example, lld_b is the leading dimension of the local
matrix storing the local blocks of the distributed matrix B, and dtype_z is the type of the global matrix Z.
The numbers of rows and columns of a global dense matrix that a particular process in a grid receives after
the data is distributed are denoted by LOCr() and LOCc(), respectively. To compute these numbers, you can use
the ScaLAPACK tool routine numroc.

After the block-cyclic distribution of global data is done, you may choose to perform an operation on a
submatrix sub(A) of the global matrix A defined by the following 6 values (for dense matrices):

m The number of rows of sub(A)

n The number of columns of sub(A)

a A pointer to the local matrix containing the entire global matrix A

ia The row index of sub(A) in the global matrix A

ja The column index of sub(A) in the global matrix A
desca The array descriptor for the global matrix A

Naming Conventions for ScaLAPACK Routines

For each routine introduced in this chapter, you can use the ScaLAPACK name. The naming convention for
ScaLAPACK routines is similar to that used for LAPACK routines. As a general rule, the name of each
ScaLAPACK routine that has an LAPACK equivalent is simply the LAPACK name prefixed by the letter p.
ScaLAPACK names have the structure p?yyzzz or p?yyzz, which is described below.
The initial letter p is a distinctive prefix of ScaLAPACK routines and is present in each such routine.
The second symbol ? indicates the data type:

s real, single precision

d real, double precision

c complex, single precision

z complex, double precision

The next two letters, yy, indicate the matrix type:

ge general

gb general band

gg a pair of general matrices (for a generalized problem)

dt general tridiagonal (diagonally dominant-like)

db general band (diagonally dominant-like)

po symmetric or Hermitian positive-definite

pb symmetric or Hermitian positive-definite band

pt symmetric or Hermitian positive-definite tridiagonal

sy symmetric

st symmetric tridiagonal (real)

he Hermitian

or orthogonal

tr triangular (or quasi-triangular)

tz trapezoidal

un unitary

For computational routines, the last three letters zzz indicate the computation performed and have the same
meaning as for LAPACK routines.
For driver routines, the last two letters zz or three letters zzz have the following meaning:

sv a simple driver for solving a linear system

svx an expert driver for solving a linear system

ls a driver for solving a linear least squares problem

ev a simple driver for solving a symmetric eigenvalue problem

evd a simple driver for solving an eigenvalue problem using a divide-and-conquer
algorithm

evx an expert driver for solving a symmetric eigenvalue problem

svd a driver for computing a singular value decomposition

gvx an expert driver for solving a generalized symmetric definite eigenvalue problem

A simple driver solves only the general problem, whereas an expert driver is more versatile and can also
optionally perform related computations, such as refining the solution and computing error bounds after the
linear system is solved.

ScaLAPACK Computational Routines

The following sections describe the ScaLAPACK computational routines. These routines perform distinct
computational tasks that can be used for:
• Solving Systems of Linear Equations
• Orthogonal Factorizations and LLS Problems
• Symmetric Eigenproblems
• Nonsymmetric Eigenproblems
• Singular Value Decomposition
• Generalized Symmetric-Definite Eigenproblems
See also the respective driver routines.

Systems of Linear Equations: ScaLAPACK Computational Routines

ScaLAPACK provides routines for solving systems of equations with the following types of matrices:
• general
• general banded
• general diagonally dominant-like banded (including general tridiagonal)
• symmetric or Hermitian positive-definite
• symmetric or Hermitian positive-definite banded
• symmetric or Hermitian positive-definite tridiagonal
A diagonally dominant-like matrix is defined as a matrix for which it is known in advance that pivoting is not
required in the LU factorization of this matrix.
For the above matrix types, the library includes routines for performing the following computations: factoring
the matrix; equilibrating the matrix; solving a system of linear equations; estimating the condition number of
a matrix; refining the solution of linear equations and computing its error bounds; inverting the matrix. Note
that for some of the listed matrix types only some of the computational routines are provided (for example,
routines that refine the solution are not provided for band or tridiagonal matrices). See Table “Computational
Routines for Systems of Linear Equations” for the full list of available routines.
To solve a particular problem, you can either call two or more computational routines or call a corresponding
driver routine that combines several tasks in one call. For example, to solve a system of linear equations with
a general matrix, you can first call p?getrf (LU factorization) and then p?getrs (computing the solution).
Then, you might wish to call p?gerfs to refine the solution and get the error bounds. Alternatively, you can
just use the driver routine p?gesvx, which performs all these tasks in one call.

Table “Computational Routines for Systems of Linear Equations” lists the ScaLAPACK computational routines
for factorizing, equilibrating, and inverting matrices, estimating their condition numbers, solving systems of
equations with real or complex matrices, refining the solution, and estimating its error.
Computational Routines for Systems of Linear Equations

Matrix type, storage scheme                         Factorize  Equilibrate  Solve    Condition  Estimate  Invert
                                                    matrix     matrix       system   number     error     matrix
general (partial pivoting)                          p?getrf    p?geequ      p?getrs  p?gecon    p?gerfs   p?getri
general band (partial pivoting)                     p?gbtrf    -            p?gbtrs  -          -         -
general band (no pivoting)                          p?dbtrf    -            p?dbtrs  -          -         -
general tridiagonal (no pivoting)                   p?dttrf    -            p?dttrs  -          -         -
symmetric/Hermitian positive-definite               p?potrf    p?poequ      p?potrs  p?pocon    p?porfs   p?potri
symmetric/Hermitian positive-definite, band         p?pbtrf    -            p?pbtrs  -          -         -
symmetric/Hermitian positive-definite, tridiagonal  p?pttrf    -            p?pttrs  -          -         -
triangular                                          -          -            p?trtrs  p?trcon    p?trrfs   p?trtri
In this table ? stands for s (single precision real), d (double precision real), c (single precision complex), or z
(double precision complex).

Matrix Factorization: ScaLAPACK Computational Routines

This section describes the ScaLAPACK routines for matrix factorization. The following factorizations are
supported:
• LU factorization of general matrices
• LU factorization of diagonally dominant-like matrices
• Cholesky factorization of real symmetric or complex Hermitian positive-definite matrices
You can compute the factorizations using full and band storage of matrices.

p?getrf
Computes the LU factorization of a general m-by-n
distributed matrix.

Syntax
void psgetrf (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , MKL_INT *ipiv , MKL_INT *info );
void pdgetrf (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , MKL_INT *ipiv , MKL_INT *info );
void pcgetrf (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *ipiv , MKL_INT *info );
void pzgetrf (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *ipiv , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description

The p?getrf function forms the LU factorization of a general m-by-n distributed matrix
sub(A) = A(ia:ia+m-1, ja:ja+n-1) as

A = P*L*U
where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m>n)
and U is upper triangular (upper trapezoidal if m < n). L and U are stored in sub(A).

The function uses partial pivoting, with row interchanges.

NOTE
This function supports the Progress Routine feature. See mkl_progress for details.

Input Parameters

m (global) The number of rows in the distributed matrix sub(A); m≥0.

n (global) The number of columns in the distributed matrix sub(A); n≥0.

a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).

Contains the local pieces of the distributed matrix sub(A) to be factored.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

Output Parameters

a Overwritten by local pieces of the factors L and U from the factorization
A = P*L*U. The unit diagonal elements of L are not stored.

ipiv (local) Array of size LOCr(m_a)+ mb_a.

Contains the pivoting information: local row i was interchanged with global
row ipiv[i-1]. This array is tied to the distributed matrix A.

info (global)
If info=0, the execution is successful.

info < 0: if the i-th argument is an array and the j-th entry, indexed j - 1,
had an illegal value, then info = -(i*100+j); if the i-th argument is a
scalar and had an illegal value, then info = -i.

If info = i > 0, u(ia+i-1, ja+i-1) is exactly zero. The factorization has been completed,
but the factor U is exactly singular. Division by zero will occur if you use the
factor U for solving a system of linear equations.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?gbtrf
Computes the LU factorization of a general n-by-n
banded distributed matrix.

Syntax
void psgbtrf (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , float *a , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *ipiv , float *af , MKL_INT *laf , float *work , MKL_INT
*lwork , MKL_INT *info );
void pdgbtrf (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , double *a , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *ipiv , double *af , MKL_INT *laf , double *work , MKL_INT
*lwork , MKL_INT *info );
void pcgbtrf (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_Complex8 *a , MKL_INT
*ja , MKL_INT *desca , MKL_INT *ipiv , MKL_Complex8 *af , MKL_INT *laf , MKL_Complex8
*work , MKL_INT *lwork , MKL_INT *info );
void pzgbtrf (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_Complex16 *a , MKL_INT
*ja , MKL_INT *desca , MKL_INT *ipiv , MKL_Complex16 *af , MKL_INT *laf , MKL_Complex16
*work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?gbtrf function computes the LU factorization of a general n-by-n real/complex banded distributed
matrix A(1:n, ja:ja+n-1) using partial pivoting with row interchanges.

The resulting factorization is not the same factorization as returned from the LAPACK function ?gbtrf.
Additional permutations are performed on the matrix for the sake of parallelism.
The factorization has the form
A(1:n, ja:ja+n-1) = P*L*U*Q

where P and Q are permutation matrices, and L and U are banded lower and upper triangular matrices,
respectively. The matrix Q represents reordering of columns for the sake of parallelism, while P represents
reordering of rows for numerical stability using classic partial pivoting.

Input Parameters

n (global) The number of rows and columns in the distributed submatrix

A(1:n, ja:ja+n-1); n≥ 0.

bwl (global) The number of sub-diagonals within the band of A

( 0 ≤ bwl ≤ n-1 ).

bwu (global) The number of super-diagonals within the band of A

( 0 ≤ bwu ≤ n-1 ).

a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1),
where lld_a ≥ 2*bwl + 2*bwu + 1.

Contains the local pieces of the n-by-n distributed banded matrix A(1:n, ja:ja+n-1) to be factored.

ja (global) The index in the global matrix A indicating the start of the matrix to
be operated on (which may be either all of A or a submatrix of A).

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
If dtype_a = 501, then dlen_≥ 7;

else if dtype_a = 1, then dlen_≥ 9.

laf (local) The size of the array af.

Must be laf≥ (nb_a+bwu)*(bwl+bwu)+6*(bwl+bwu)*(bwl+2*bwu).

If laf is not large enough, an error code will be returned and the minimum
acceptable size will be returned in af[0].

work (local) Same type as a. Workspace array of size lwork.

lwork (local or global) The size of the work array (lwork≥ 1). If lwork is too
small, the minimal acceptable size will be returned in work[0] and an error
code is returned.

Output Parameters

a On exit, this array contains details of the factorization. Note that additional
permutations are performed on the matrix, so that the factors returned are
different from those returned by LAPACK.

ipiv (local) array.

The size of ipiv must be ≥nb_a.

Contains pivot indices for local factorizations. Note that you should not alter
the contents of this array between factorization and solve.

af (local)
Array of size laf.

Auxiliary fill-in space. The fill-in space is created in a call to the factorization
function p?gbtrf and is stored in af.

Note that if a linear system is to be solved using p?gbtrs after the
factorization function, af must not be altered after the factorization.

work[0] On exit, work[0] contains the minimum value of lwork required.

info (global)
If info=0, the execution is successful.

info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.

info > 0:
If info = k ≤ NPROCS, the submatrix stored on processor k and factored
locally was singular, and the factorization was not completed.
If info = k > NPROCS, the submatrix stored on processor k-NPROCS
representing interactions with other processors was singular, and the
factorization was not completed.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?dbtrf
Computes the LU factorization of an n-by-n diagonally
dominant-like banded distributed matrix.

Syntax
void psdbtrf (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , float *a , MKL_INT *ja ,
MKL_INT *desca , float *af , MKL_INT *laf , float *work , MKL_INT *lwork , MKL_INT
*info );
void pddbtrf (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , double *a , MKL_INT *ja ,
MKL_INT *desca , double *af , MKL_INT *laf , double *work , MKL_INT *lwork , MKL_INT
*info );
void pcdbtrf (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_Complex8 *a , MKL_INT
*ja , MKL_INT *desca , MKL_Complex8 *af , MKL_INT *laf , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzdbtrf (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_Complex16 *a , MKL_INT
*ja , MKL_INT *desca , MKL_Complex16 *af , MKL_INT *laf , MKL_Complex16 *work , MKL_INT
*lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?dbtrf function computes the LU factorization of an n-by-n real/complex diagonally dominant-like
banded distributed matrix A(1:n, ja:ja+n-1) without pivoting.

NOTE
A matrix is called diagonally dominant-like if pivoting is not required for LU to be
numerically stable.

Note that the resulting factorization is not the same factorization as returned from LAPACK. Additional
permutations are performed on the matrix for the sake of parallelism.

Input Parameters

n (global) The number of rows and columns in the distributed submatrix

A(1:n, ja:ja+n-1); n≥ 0.

bwl (global) The number of sub-diagonals within the band of A

(0 ≤ bwl ≤ n-1).

bwu (global) The number of super-diagonals within the band of A

(0 ≤ bwu ≤ n-1).

a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).

Contains the local pieces of the n-by-n distributed banded matrix A(1:n,
ja:ja+n-1) to be factored.

ja (global) The index in the global matrix A indicating the start of the matrix to
be operated on (which may be either all of A or a submatrix of A).

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
If dtype_a = 501, then dlen_≥ 7;

else if dtype_a = 1, then dlen_≥ 9.

laf (local) The size of the array af.

Must be laf ≥ NB*(bwl+bwu) + 6*(max(bwl,bwu))^2.

If laf is not large enough, an error code will be returned and the minimum
acceptable size will be returned in af[0].

work (local) Workspace array of size lwork.

lwork (local or global) The size of the work array; must be lwork ≥
(max(bwl,bwu))^2. If lwork is too small, the minimal acceptable size will
be returned in work[0] and an error code is returned.

Output Parameters

a On exit, this array contains details of the factorization. Note that additional
permutations are performed on the matrix, so that the factors returned are
different from those returned by LAPACK.

af (local)
Array of size laf.

Auxiliary fill-in space. The fill-in space is created in a call to the factorization
function p?dbtrf and is stored in af.

Note that if a linear system is to be solved using p?dbtrs after the
factorization function, af must not be altered after the factorization.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

info (global)
If info=0, the execution is successful.

info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.

info > 0:
If info = k ≤ NPROCS, the submatrix stored on processor k and factored
locally was not diagonally dominant-like, and the factorization was not
completed.
If info = k > NPROCS, the submatrix stored on processor k-NPROCS
representing interactions with other processors was singular, and the
factorization was not completed.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?dttrf
Computes the LU factorization of a diagonally
dominant-like tridiagonal distributed matrix.

Syntax
void psdttrf (MKL_INT *n , float *dl , float *d , float *du , MKL_INT *ja , MKL_INT
*desca , float *af , MKL_INT *laf , float *work , MKL_INT *lwork , MKL_INT *info );
void pddttrf (MKL_INT *n , double *dl , double *d , double *du , MKL_INT *ja , MKL_INT
*desca , double *af , MKL_INT *laf , double *work , MKL_INT *lwork , MKL_INT *info );
void pcdttrf (MKL_INT *n , MKL_Complex8 *dl , MKL_Complex8 *d , MKL_Complex8 *du ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *af , MKL_INT *laf , MKL_Complex8 *work ,
MKL_INT *lwork , MKL_INT *info );
void pzdttrf (MKL_INT *n , MKL_Complex16 *dl , MKL_Complex16 *d , MKL_Complex16 *du ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *af , MKL_INT *laf , MKL_Complex16 *work ,
MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?dttrf function computes the LU factorization of an n-by-n real/complex diagonally dominant-like
tridiagonal distributed matrix A(1:n, ja:ja+n-1) without pivoting for stability.

The resulting factorization is not the same factorization as returned from LAPACK. Additional permutations
are performed on the matrix for the sake of parallelism.
The factorization has the form:
A(1:n, ja:ja+n-1) = P*L*U*P^T,

where P is a permutation matrix, and L and U are banded lower and upper triangular matrices, respectively.

Input Parameters

n (global) The number of rows and columns to be operated on, that is, the
order of the distributed submatrix A(1:n, ja:ja+n-1) (n≥ 0).

dl, d, du (local)
Pointers to the local arrays of size nb_a each.

On entry, the array dl contains the local part of the global vector storing
the subdiagonal elements of the matrix. Globally, dl[0] is not referenced,
and dl must be aligned with d.

On entry, the array d contains the local part of the global vector storing the
diagonal elements of the matrix.
On entry, the array du contains the local part of the global vector storing
the super-diagonal elements of the matrix. du[n-1] is not referenced, and
du must be aligned with d.

ja (global) The index in the global matrix A indicating the start of the matrix to
be operated on (which may be either all of A or a submatrix of A).

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
If dtype_a = 501, then dlen_≥ 7;

else if dtype_a = 1, then dlen_≥ 9.

laf (local) The size of the array af.

Must be laf ≥ 2*(NB+2).

If laf is not large enough, an error code will be returned and the minimum
acceptable size will be returned in af[0].

work (local) Same type as d. Workspace array of size lwork.

lwork (local or global) The size of the work array, must be at least lwork≥
8*NPCOL.

Output Parameters

dl, d, du On exit, overwritten with details of the factorization.

af (local)
Array of size laf.

Auxiliary fill-in space. The fill-in space is created in a call to the factorization
function p?dttrf and is stored in af.

Note that if a linear system is to be solved using p?dttrs after the
factorization function, af must not be altered.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

info (global)
If info=0, the execution is successful.

info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.
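The negative info encoding can be decoded mechanically. The minimal sketch below assumes that argument positions and array indices are both below 100. That holds for these routines, whose descriptors have at most 9 entries. decode_info is a hypothetical helper. Real code would receive info as MKL_INT rather than int.

```c
/* Hypothetical helper: decode a negative info value.
 * info = -(i*100 + j): entry j (1-based, i.e. index j-1) of array argument i.
 * info = -i:           scalar argument i.
 * Returns the 1-based argument position i and stores j in *entry,
 * or 0 in *entry when the illegal argument was a scalar. Requires info < 0. */
int decode_info(int info, int *entry)
{
    int code = -info;
    if (code >= 100) {          /* array argument: code = i*100 + j */
        *entry = code % 100;
        return code / 100;
    }
    *entry = 0;                 /* scalar argument: code = i */
    return code;
}
```

For example, desca is argument 6 of p?dttrf, so info = -602 would point at entry 2 of desca, that is, desca[1].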

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?potrf
Computes the Cholesky factorization of a symmetric
(Hermitian) positive-definite distributed matrix.

Syntax
void pspotrf (char *uplo , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , MKL_INT *info );
void pdpotrf (char *uplo , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , MKL_INT *info );
void pcpotrf (char *uplo , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *info );
void pzpotrf (char *uplo , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?potrf function computes the Cholesky factorization of a real symmetric or complex Hermitian positive-definite
distributed n-by-n matrix A(ia:ia+n-1, ja:ja+n-1), denoted below as sub(A).

The factorization has the form

sub(A) = U^H*U if uplo='U', or

sub(A) = L*L^H if uplo='L',

where L is a lower triangular matrix and U is upper triangular.


Input Parameters

uplo (global)
Indicates whether the upper or lower triangular part of sub(A) is stored.
Must be 'U' or 'L'.

If uplo = 'U', the array a stores the upper triangular part of the matrix
sub(A) that is factored as U^H*U.
If uplo = 'L', the array a stores the lower triangular part of the
matrix sub(A) that is factored as L*L^H.

n (global) The order of the distributed matrix sub(A) (n≥0).

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1).

On entry, this array contains the local pieces of the n-by-n symmetric/
Hermitian distributed matrix sub(A) to be factored.
Depending on uplo, the array a contains either the upper or the lower
triangular part of the matrix sub(A) (see uplo).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
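The local array sizes above are written with LOCr() and LOCc(). These count how many rows or columns of a block-cyclically distributed matrix the calling process owns (see the Overview). The sketch below is a minimal C version of that count and follows the logic of ScaLAPACK's NUMROC tool routine. local_count is a hypothetical name; in practice you would use the library's own routine.

```c
/* Hypothetical sketch of the block-cyclic ownership count behind
 * LOCr()/LOCc(): n global rows (or columns) are split into blocks of nb
 * and dealt round-robin to nprocs processes, starting at process src.
 * Returns how many of them land on process iproc (0-based). */
int local_count(int n, int nb, int iproc, int src, int nprocs)
{
    int mydist  = (nprocs + iproc - src) % nprocs; /* distance from source */
    int nblocks = n / nb;                          /* number of full blocks */
    int count   = (nblocks / nprocs) * nb;         /* full rounds of blocks */
    int extra   = nblocks % nprocs;                /* leftover full blocks */
    if (mydist < extra)
        count += nb;                               /* one more full block */
    else if (mydist == extra)
        count += n % nb;                           /* trailing partial block */
    return count;
}
```

For example, with n = 10, nb = 3, and two process columns starting at process 0, process 0 owns global columns 1-3 and 7-9 (6 columns) and process 1 owns columns 4-6 and 10 (4 columns). Under this reading, LOCc(ja+n-1) corresponds to local_count(ja+n-1, nb_a, mycol, csrc_a, npcol).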

Output Parameters

a The upper or lower triangular part of a is overwritten by the Cholesky factor
U or L, as specified by uplo.

info (global)
If info=0, the execution is successful;

info < 0: if the i-th argument is an array, and the j-th entry, indexed j -
1, had an illegal value, then info = -(i*100+j); if the i-th argument is a
scalar and had an illegal value, then info = -i.

If info = k > 0, the leading minor of order k, A(ia:ia+k-1, ja:ja+k-1), is
not positive-definite, and the factorization could not be completed.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?pbtrf
Computes the Cholesky factorization of a symmetric
(Hermitian) positive-definite banded distributed
matrix.

Syntax
void pspbtrf (char *uplo , MKL_INT *n , MKL_INT *bw , float *a , MKL_INT *ja , MKL_INT
*desca , float *af , MKL_INT *laf , float *work , MKL_INT *lwork , MKL_INT *info );
void pdpbtrf (char *uplo , MKL_INT *n , MKL_INT *bw , double *a , MKL_INT *ja , MKL_INT
*desca , double *af , MKL_INT *laf , double *work , MKL_INT *lwork , MKL_INT *info );

void pcpbtrf (char *uplo , MKL_INT *n , MKL_INT *bw , MKL_Complex8 *a , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex8 *af , MKL_INT *laf , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzpbtrf (char *uplo , MKL_INT *n , MKL_INT *bw , MKL_Complex16 *a , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex16 *af , MKL_INT *laf , MKL_Complex16 *work , MKL_INT
*lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?pbtrf function computes the Cholesky factorization of an n-by-n real symmetric or complex Hermitian
positive-definite banded distributed matrix A(1:n, ja:ja+n-1). The factorization has the form

A(1:n, ja:ja+n-1) = P*U^H*U*P^T, if uplo='U', or

A(1:n, ja:ja+n-1) = P*L*L^H*P^T, if uplo='L',

where P is a permutation matrix and U and L are banded upper and lower triangular matrices, respectively.


Input Parameters

uplo (global) Must be 'U' or 'L'.

If uplo = 'U', upper triangle of A(1:n, ja:ja+n-1) is stored;

If uplo = 'L', lower triangle of A(1:n, ja:ja+n-1) is stored.

n (global) The order of the distributed submatrix A(1:n, ja:ja+n-1) (n≥0).

bw (global)
The number of superdiagonals of the distributed matrix if uplo = 'U', or
the number of subdiagonals if uplo = 'L' (bw≥0).

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1).

On entry, this array contains the local pieces of the upper or lower triangle
of the symmetric/Hermitian band distributed matrix A(1:n, ja:ja+n-1) to
be factored.

ja (global) The index in the global matrix A indicating the start of the matrix to
be operated on (which may be either all of A or a submatrix of A).


desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
If dtype_a = 501, then dlen_≥ 7;

else if dtype_a = 1, then dlen_≥ 9.

laf (local) The size of the array af.

Must be laf ≥ (NB+2*bw)*bw.

If laf is not large enough, an error code will be returned and the minimum
acceptable size will be returned in af[0].

work (local) Workspace array of size lwork.

lwork (local or global) The size of the work array, must be lwork ≥ bw^2.
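The minimum af and work sizes for p?pbtrf are simple formulas, so they can be computed before allocating. The sketch below transcribes them. It assumes NB is the block size nb_a of the 1D distribution taken from desca, and it uses long instead of MKL_INT so that it stays self-contained. The function names are hypothetical.

```c
/* Hypothetical helpers transcribing the p?pbtrf size requirements.
 * nb: block size NB of the distribution of A (assumed to be nb_a);
 * bw: number of super- or subdiagonals. */
long pbtrf_min_laf(long nb, long bw)
{
    return (nb + 2 * bw) * bw;     /* laf >= (NB + 2*bw)*bw */
}

long pbtrf_min_lwork(long bw)
{
    return bw * bw;                /* lwork >= bw^2 */
}
```

If laf still turns out to be too small, treat the value that the routine returns in af[0] as authoritative.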

Output Parameters

a On exit, if info=0, contains the permuted triangular factor U or L from the
Cholesky factorization of the band matrix A(1:n, ja:ja+n-1), as specified
by uplo.

af (local)
Array of size laf. Auxiliary fill-in space. The fill-in space is created in a call
to the factorization function p?pbtrf and stored in af. Note that if a linear
system is to be solved using p?pbtrs after the factorization function, af
must not be altered.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

info (global)
If info=0, the execution is successful.

info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.

info>0:
If info = k ≤ NPROCS, the submatrix stored on processor info and
factored locally was not positive definite, and the factorization was not
completed.
If info = k > NPROCS, the submatrix stored on processor info-NPROCS
representing interactions with other processors was not nonsingular, and
the factorization was not completed.
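The positive info values fall into two ranges, and a small helper can tell them apart. This is a sketch only. failing_process is a hypothetical name, and nprocs is assumed to be the NPROCS value used in the text above.

```c
/* Hypothetical helper: interpret info > 0 from the banded or tridiagonal
 * factorizations. Returns the process number named by info. Sets
 * *interaction to 1 when the failure was in the submatrix representing
 * interactions with other processes (info > nprocs), or to 0 when the
 * locally factored submatrix failed (info <= nprocs). Requires info > 0. */
int failing_process(int info, int nprocs, int *interaction)
{
    if (info > nprocs) {
        *interaction = 1;
        return info - nprocs;
    }
    *interaction = 0;
    return info;
}
```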

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?pttrf
Computes the Cholesky factorization of a symmetric
(Hermitian) positive-definite tridiagonal distributed
matrix.

Syntax
void pspttrf (MKL_INT *n , float *d , float *e , MKL_INT *ja , MKL_INT *desca , float
*af , MKL_INT *laf , float *work , MKL_INT *lwork , MKL_INT *info );
void pdpttrf (MKL_INT *n , double *d , double *e , MKL_INT *ja , MKL_INT *desca , double
*af , MKL_INT *laf , double *work , MKL_INT *lwork , MKL_INT *info );
void pcpttrf (MKL_INT *n , float *d , MKL_Complex8 *e , MKL_INT *ja , MKL_INT *desca ,
MKL_Complex8 *af , MKL_INT *laf , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT
*info );
void pzpttrf (MKL_INT *n , double *d , MKL_Complex16 *e , MKL_INT *ja , MKL_INT
*desca , MKL_Complex16 *af , MKL_INT *laf , MKL_Complex16 *work , MKL_INT *lwork ,
MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?pttrf function computes the Cholesky factorization of an n-by-n real symmetric or complex Hermitian
positive-definite tridiagonal distributed matrix A(1:n, ja:ja+n-1). The factorization has the form

A(1:n, ja:ja+n-1) = P*U^H*D*U*P^T,

where P is a permutation matrix, D is a diagonal matrix, and U is a tridiagonal upper triangular matrix.
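The serial sketch below shows the core recurrence for real data, A = U^T*D*U. It is the LAPACK-style ?pttrf scheme without the permutation P that ScaLAPACK introduces for parallelism. tridiag_ldlt is a hypothetical name. Its d and e match the roles of the d and e arguments described below, but on a single process.

```c
#include <stddef.h>

/* Hypothetical serial sketch: factor a real SPD tridiagonal matrix as
 * A = U^T*D*U with U unit upper bidiagonal (equivalently L*D*L^T, L = U^T).
 * On entry: d[0..n-1] diagonal, e[0..n-2] off-diagonal.
 * On exit:  d holds D, e holds the superdiagonal of U.
 * Returns 0 on success, or k > 0 if the k-th pivot d[k-1] is not positive. */
int tridiag_ldlt(size_t n, double *d, double *e)
{
    for (size_t i = 0; i + 1 < n; ++i) {
        if (!(d[i] > 0.0))
            return (int)(i + 1);
        double ei = e[i];
        e[i] = ei / d[i];           /* U(i, i+1) */
        d[i + 1] -= e[i] * ei;      /* update the next pivot */
    }
    if (n > 0 && !(d[n - 1] > 0.0))
        return (int)n;
    return 0;
}
```

For A with diagonal (4, 5, 5) and off-diagonal (2, 2), this gives D = diag(4, 4, 4) and a superdiagonal of U equal to (0.5, 0.5).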


Input Parameters

n (global) The order of the distributed submatrix A(1:n, ja:ja+n-1) (n≥ 0).

d, e (local)
Pointers into the local memory to arrays of size nb_a each.

On entry, the array d contains the local part of the global vector storing the
main diagonal of the distributed matrix A.


On entry, the array e contains the local part of the global vector storing the
upper diagonal of the distributed matrix A.

ja (global) The index in the global matrix A indicating the start of the matrix to
be operated on (which may be either all of A or a submatrix of A).

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
If dtype_a = 501, then dlen_≥ 7;

else if dtype_a = 1, then dlen_≥ 9.

laf (local) The size of the array af.

Must be laf≥nb_a+2.

If laf is not large enough, an error code will be returned and the minimum
acceptable size will be returned in af[0].

work (local) Workspace array of size lwork.

lwork (local or global) The size of the work array, must be at least
lwork≥ 8*NPCOL.

Output Parameters

d, e On exit, overwritten by the details of the factorization.

af (local)
Array of size laf.

Auxiliary fill-in space. The fill-in space is created in a call to the factorization
function p?pttrf and stored in af.

Note that if a linear system is to be solved using p?pttrs after the
factorization function, af must not be altered.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

info (global)
If info=0, the execution is successful.

info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.

info> 0:
If info = k ≤ NPROCS, the submatrix stored on processor info and
factored locally was not positive definite, and the factorization was not
completed.
If info = k > NPROCS, the submatrix stored on processor info-NPROCS
representing interactions with other processors was not nonsingular, and
the factorization was not completed.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

Solving Systems of Linear Equations: ScaLAPACK Computational Routines

This section describes the ScaLAPACK routines for solving systems of linear equations. Before calling most of
these routines, you need to factorize the matrix of your system of equations (see Routines for Matrix
Factorization in this chapter). However, the factorization is not necessary if your system of equations has a
triangular matrix.

p?getrs
Solves a system of distributed linear equations with a
general square matrix, using the LU factorization
computed by p?getrf.

Syntax
void psgetrs (char *trans , MKL_INT *n , MKL_INT *nrhs , float *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , float *b , MKL_INT *ib , MKL_INT *jb ,
MKL_INT *descb , MKL_INT *info );
void pdgetrs (char *trans , MKL_INT *n , MKL_INT *nrhs , double *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , double *b , MKL_INT *ib , MKL_INT *jb ,
MKL_INT *descb , MKL_INT *info );
void pcgetrs (char *trans , MKL_INT *n , MKL_INT *nrhs , MKL_Complex8 *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , MKL_Complex8 *b , MKL_INT *ib ,
MKL_INT *jb , MKL_INT *descb , MKL_INT *info );
void pzgetrs (char *trans , MKL_INT *n , MKL_INT *nrhs , MKL_Complex16 *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , MKL_Complex16 *b , MKL_INT *ib ,
MKL_INT *jb , MKL_INT *descb , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?getrs function solves a system of distributed linear equations with a general n-by-n distributed matrix
sub(A) = A(ia:ia+n-1, ja:ja+n-1) using the LU factorization computed by p?getrf.

The system has one of the following forms specified by trans:

sub(A)*X = sub(B) (no transpose),
sub(A)^T*X = sub(B) (transpose),
sub(A)^H*X = sub(B) (conjugate transpose),
where sub(B) = B(ib:ib+n-1, jb:jb+nrhs-1).

Before calling this function, you must call p?getrf to compute the LU factorization of sub(A).
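The sketch below is a single-process analogue of the p?getrf/p?getrs pair. It performs an LU factorization with partial pivoting on a column-major matrix, followed by the trans = 'N' solve. Pivots are recorded 1-based in ipiv, in the spirit of the ipiv description below. It is not the distributed algorithm, and lu_factor and lu_solve are hypothetical names. Unlike LAPACK's getrf, this sketch stops at the first exactly zero pivot.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical serial sketch: factor A = P*L*U in place (column-major,
 * leading dimension lda). The unit diagonal of L is not stored.
 * ipiv[k] is the 1-based row interchanged with row k+1.
 * Returns 0, or k > 0 if the k-th pivot is exactly zero. */
int lu_factor(size_t n, double *a, size_t lda, int *ipiv)
{
    for (size_t k = 0; k < n; ++k) {
        size_t p = k;                           /* largest |a(i,k)|, i >= k */
        for (size_t i = k + 1; i < n; ++i)
            if (fabs(a[i + k * lda]) > fabs(a[p + k * lda]))
                p = i;
        ipiv[k] = (int)p + 1;                   /* store 1-based */
        if (a[p + k * lda] == 0.0)
            return (int)k + 1;
        if (p != k)                             /* swap full rows k and p */
            for (size_t j = 0; j < n; ++j) {
                double t = a[k + j * lda];
                a[k + j * lda] = a[p + j * lda];
                a[p + j * lda] = t;
            }
        for (size_t i = k + 1; i < n; ++i) {
            a[i + k * lda] /= a[k + k * lda];   /* multiplier L(i,k) */
            for (size_t j = k + 1; j < n; ++j)
                a[i + j * lda] -= a[i + k * lda] * a[k + j * lda];
        }
    }
    return 0;
}

/* Solves A*x = b in place using lu_factor's output (trans = 'N'). */
void lu_solve(size_t n, const double *a, size_t lda, const int *ipiv, double *b)
{
    for (size_t k = 0; k < n; ++k) {            /* b := P^T*b */
        size_t p = (size_t)ipiv[k] - 1;
        double t = b[k]; b[k] = b[p]; b[p] = t;
    }
    for (size_t i = 1; i < n; ++i)              /* L*y = b, unit diagonal */
        for (size_t j = 0; j < i; ++j)
            b[i] -= a[i + j * lda] * b[j];
    for (size_t i = n; i-- > 0;) {              /* U*x = y */
        for (size_t j = i + 1; j < n; ++j)
            b[i] -= a[i + j * lda] * b[j];
        b[i] /= a[i + i * lda];
    }
}
```

For A = [[0, 1], [2, 3]], the first column forces a row interchange, so ipiv becomes {2, 2}. Solving with b = (1, 5) then returns x = (1, 1).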

Input Parameters

trans (global) Must be 'N' or 'T' or 'C'.

Indicates the form of the equations:

If trans = 'N', then sub(A)*X = sub(B) is solved for X.

If trans = 'T', then sub(A)^T*X = sub(B) is solved for X.

If trans = 'C', then sub(A)^H*X = sub(B) is solved for X.

n (global) The number of linear equations; the order of the matrix sub(A)
(n≥0).

nrhs (global) The number of right hand sides; the number of columns of the
distributed matrix sub(B) (nrhs≥0).

a, b (local)
Pointers into the local memory to arrays of local sizes lld_a*LOCc(ja+n-1)
and lld_b*LOCc(jb+nrhs-1), respectively.

On entry, the array a contains the local pieces of the factors L and U from
the factorization sub(A) = P*L*U; the unit diagonal elements of L are not
stored. On entry, the array b contains the right hand sides sub(B).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

ipiv (local) Array of size LOCr(m_a) + mb_a. Contains the pivoting
information: local row i of the matrix was interchanged with the global row
ipiv[i-1].
This array is tied to the distributed matrix A.

ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of the matrix sub(B), respectively.

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.

Output Parameters

b On exit, overwritten by the solution distributed matrix X.

info If info=0, the execution is successful. info < 0:

If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?gbtrs
Solves a system of distributed linear equations with a
general band matrix, using the LU factorization
computed by p?gbtrf.

Syntax
void psgbtrs (char *trans , MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs ,
float *a , MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , float *b , MKL_INT *ib ,
MKL_INT *descb , float *af , MKL_INT *laf , float *work , MKL_INT *lwork , MKL_INT
*info );

void pdgbtrs (char *trans , MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs ,
double *a , MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , double *b , MKL_INT *ib ,
MKL_INT *descb , double *af , MKL_INT *laf , double *work , MKL_INT *lwork , MKL_INT
*info );
void pcgbtrs (char *trans , MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs ,
MKL_Complex8 *a , MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , MKL_Complex8 *b ,
MKL_INT *ib , MKL_INT *descb , MKL_Complex8 *af , MKL_INT *laf , MKL_Complex8 *work ,
MKL_INT *lwork , MKL_INT *info );
void pzgbtrs (char *trans , MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs ,
MKL_Complex16 *a , MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , MKL_Complex16 *b ,
MKL_INT *ib , MKL_INT *descb , MKL_Complex16 *af , MKL_INT *laf , MKL_Complex16 *work ,
MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?gbtrs function solves a system of distributed linear equations with a general band distributed matrix
sub(A) = A(1:n, ja:ja+n-1) using the LU factorization computed by p?gbtrf.

The system has one of the following forms specified by trans:

sub(A)*X = sub(B) (no transpose),
sub(A)^T*X = sub(B) (transpose),
sub(A)^H*X = sub(B) (conjugate transpose),
where sub(B) = B(ib:ib+n-1, 1:nrhs).

Before calling this function, you must call p?gbtrf to compute the LU factorization of sub(A).

Input Parameters

trans (global) Must be 'N' or 'T' or 'C'.

Indicates the form of the equations:

If trans = 'N', then sub(A)*X = sub(B) is solved for X.

If trans = 'T', then sub(A)^T*X = sub(B) is solved for X.

If trans = 'C', then sub(A)^H*X = sub(B) is solved for X.

n (global) The number of linear equations; the order of the distributed matrix
sub(A) (n≥ 0).

bwl (global) The number of sub-diagonals within the band of A (0 ≤ bwl ≤ n-1).

bwu (global) The number of super-diagonals within the band of A (0 ≤ bwu ≤ n-1).

nrhs (global) The number of right hand sides; the number of columns of the
distributed matrix sub(B) (nrhs≥ 0).

a, b (local)
Pointers into the local memory to arrays of local sizes lld_a*LOCc(ja+n-1)
and lld_b*LOCc(nrhs), respectively.

The array a contains details of the LU factorization of the distributed band
matrix A.
On entry, the array b contains the local pieces of the right hand sides
B(ib:ib+n-1, 1:nrhs).

ja (global) The index in the global matrix A indicating the start of the matrix to
be operated on (which may be either all of A or a submatrix of A).

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
If dtype_a = 501, then dlen_≥ 7;

else if dtype_a = 1, then dlen_≥ 9.

ib (global) The row index in the global matrix B indicating the first row of the
matrix to be operated on (which may be either all of B or a submatrix of B).

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.
If dtype_b = 502, then dlen_≥ 7;

else if dtype_b = 1, then dlen_≥ 9.

laf (local) The size of the array af.

Must be laf≥nb_a*(bwl+bwu)+6*(bwl+bwu)*(bwl+2*bwu).

If laf is not large enough, an error code will be returned and the minimum
acceptable size will be returned in af[0].

work (local) Same type as a. Workspace array of size lwork.

lwork (local or global) The size of the work array, must be at least
lwork≥nrhs*(nb_a+2*bwl+4*bwu).
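The two minimum sizes for p?gbtrs can be computed up front in the same way. The sketch below transcribes the formulas above, using long in place of MKL_INT. Both function names are hypothetical.

```c
/* Hypothetical helpers transcribing the p?gbtrs size requirements.
 * nb_a: block size from desca; bwl, bwu: lower and upper bandwidths. */
long gbtrs_min_laf(long nb_a, long bwl, long bwu)
{
    /* laf >= nb_a*(bwl+bwu) + 6*(bwl+bwu)*(bwl+2*bwu) */
    return nb_a * (bwl + bwu) + 6 * (bwl + bwu) * (bwl + 2 * bwu);
}

long gbtrs_min_lwork(long nrhs, long nb_a, long bwl, long bwu)
{
    /* lwork >= nrhs*(nb_a + 2*bwl + 4*bwu) */
    return nrhs * (nb_a + 2 * bwl + 4 * bwu);
}
```

With nb_a = 64, bwl = 2, bwu = 1, and one right-hand side, these give laf ≥ 264 and lwork ≥ 72.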

Output Parameters

ipiv (local) array.

The size of ipiv must be ≥nb_a.

Contains pivot indices for local factorizations. Note that you should not alter
the contents of this array between factorization and solve.

b On exit, overwritten by the local pieces of the solution distributed matrix X.

af (local)
Array of size laf.

Auxiliary fill-in space. The fill-in space is created in a call to the
factorization function p?gbtrf and is stored in af.

Note that if a linear system is to be solved using p?gbtrs after the
factorization function, af must not be altered.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

info If info=0, the execution is successful.

info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?dbtrs
Solves a system of linear equations with a diagonally
dominant-like banded distributed matrix using the
factorization computed by p?dbtrf.

Syntax
void psdbtrs (char *trans , MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs ,
float *a , MKL_INT *ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *descb ,
float *af , MKL_INT *laf , float *work , MKL_INT *lwork , MKL_INT *info );
void pddbtrs (char *trans , MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs ,
double *a , MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *descb ,
double *af , MKL_INT *laf , double *work , MKL_INT *lwork , MKL_INT *info );
void pcdbtrs (char *trans , MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs ,
MKL_Complex8 *a , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib ,
MKL_INT *descb , MKL_Complex8 *af , MKL_INT *laf , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzdbtrs (char *trans , MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs ,
MKL_Complex16 *a , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib ,
MKL_INT *descb , MKL_Complex16 *af , MKL_INT *laf , MKL_Complex16 *work , MKL_INT
*lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?dbtrs function solves for X one of the systems of equations:

sub(A)*X = sub(B),
(sub(A))^T*X = sub(B), or
(sub(A))^H*X = sub(B),
where sub(A) = A(1:n, ja:ja+n-1) is a diagonally dominant-like banded distributed matrix, and sub(B)
denotes the distributed matrix B(ib:ib+n-1, 1:nrhs).

This function uses the LU factorization computed by p?dbtrf.

Input Parameters

trans (global) Must be 'N' or 'T' or 'C'.


Indicates the form of the equations:

If trans = 'N', then sub(A)*X = sub(B) is solved for X.

If trans = 'T', then (sub(A))^T*X = sub(B) is solved for X.

If trans = 'C', then (sub(A))^H*X = sub(B) is solved for X.

n (global) The order of the distributed matrix sub(A) (n≥ 0).

bwl (global) The number of subdiagonals within the band of A (0 ≤ bwl ≤ n-1).

bwu (global) The number of superdiagonals within the band of A (0 ≤ bwu ≤ n-1).

nrhs (global) The number of right hand sides; the number of columns of the
distributed matrix sub(B) (nrhs≥ 0).

a, b (local)
Pointers into the local memory to arrays of local sizes lld_a*LOCc(ja+n-1)
and lld_b*LOCc(nrhs), respectively.

On entry, the array a contains details of the LU factorization of the band
matrix A, as computed by p?dbtrf.

On entry, the array b contains the local pieces of the right hand side
distributed matrix sub(B).

ja (global) The index in the global matrix A indicating the start of the matrix to
be operated on (which may be either all of A or a submatrix of A).

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
If dtype_a = 501, then dlen_≥ 7;

else if dtype_a = 1, then dlen_≥ 9.

ib (global) The row index in the global matrix B indicating the first row of the
matrix to be operated on (which may be either all of B or a submatrix of B).

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.
If dtype_b = 502, then dlen_≥ 7;

else if dtype_b = 1, then dlen_≥ 9.

af, work (local)

Arrays of size laf and lwork, respectively. The array af contains auxiliary
fill-in space. The fill-in space is created in a call to the factorization function
p?dbtrf and is stored in af.
The array work is a workspace array.

laf (local) The size of the array af.

Must be laf ≥ NB*(bwl+bwu)+6*(max(bwl,bwu))^2.

If laf is not large enough, an error code will be returned and the minimum
acceptable size will be returned in af[0].

lwork (local or global) The size of the array work, must be at least
lwork ≥ (max(bwl,bwu))^2.
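For p?dbtrs, both minimum sizes depend on max(bwl, bwu). The sketch below transcribes the two formulas above. It assumes NB is the block size nb_a from desca and uses long instead of MKL_INT. The function names are hypothetical.

```c
/* Hypothetical helpers transcribing the p?dbtrs size requirements. */
static long max_bw(long bwl, long bwu)
{
    return bwl > bwu ? bwl : bwu;        /* max(bwl, bwu) */
}

long dbtrs_min_laf(long nb, long bwl, long bwu)
{
    long m = max_bw(bwl, bwu);
    return nb * (bwl + bwu) + 6 * m * m; /* NB*(bwl+bwu) + 6*max^2 */
}

long dbtrs_min_lwork(long bwl, long bwu)
{
    long m = max_bw(bwl, bwu);
    return m * m;                        /* max(bwl,bwu)^2 */
}
```

As with the other banded routines, if laf is too small, the minimum that the routine reports in af[0] takes precedence over this transcription.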

Output Parameters

b On exit, this array contains the local pieces of the solution distributed
matrix X.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

info If info=0, the execution is successful. info < 0:

If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?dttrs
Solves a system of linear equations with a diagonally
dominant-like tridiagonal distributed matrix using the
factorization computed by p?dttrf.

Syntax
void psdttrs (char *trans , MKL_INT *n , MKL_INT *nrhs , float *dl , float *d , float
*du , MKL_INT *ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *descb , float
*af , MKL_INT *laf , float *work , MKL_INT *lwork , MKL_INT *info );
void pddttrs (char *trans , MKL_INT *n , MKL_INT *nrhs , double *dl , double *d , double
*du , MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *descb , double
*af , MKL_INT *laf , double *work , MKL_INT *lwork , MKL_INT *info );
void pcdttrs (char *trans , MKL_INT *n , MKL_INT *nrhs , MKL_Complex8 *dl ,
MKL_Complex8 *d , MKL_Complex8 *du , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b ,
MKL_INT *ib , MKL_INT *descb , MKL_Complex8 *af , MKL_INT *laf , MKL_Complex8 *work ,
MKL_INT *lwork , MKL_INT *info );
void pzdttrs (char *trans , MKL_INT *n , MKL_INT *nrhs , MKL_Complex16 *dl ,
MKL_Complex16 *d , MKL_Complex16 *du , MKL_INT *ja , MKL_INT *desca , MKL_Complex16
*b , MKL_INT *ib , MKL_INT *descb , MKL_Complex16 *af , MKL_INT *laf , MKL_Complex16
*work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?dttrs function solves for X one of the systems of equations:

sub(A)*X = sub(B),
(sub(A))^T*X = sub(B), or
(sub(A))^H*X = sub(B),
where sub(A) = A(1:n, ja:ja+n-1) is a diagonally dominant-like tridiagonal distributed matrix, and sub(B)
denotes the distributed matrix B(ib:ib+n-1, 1:nrhs).

This function uses the LU factorization computed by p?dttrf.
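To show what the trans argument selects, the following sketch solves op(A)*x = b for a single tridiagonal block whose LU factors are already available. It assumes a simple serial layout: a unit lower bidiagonal L with multipliers in dl[1..n-1], and an upper bidiagonal U with diagonal d and superdiagonal du. This is not the permuted layout that p?dttrf actually leaves in dl, d, du, and af, and tridiag_lu_solve_op is a hypothetical name. For the transposed case, the sketch uses A^T = U^T*L^T, so the forward and backward sweeps swap roles.

```c
#include <stddef.h>

/* Hypothetical serial sketch. Solves op(A)*x = b in place, where A = L*U,
 * L(i, i-1) = dl[i], U(i, i) = d[i], U(i, i+1) = du[i].
 * trans == 'N': A*x = b.  Otherwise ('T', or 'C' for real data): A^T*x = b. */
void tridiag_lu_solve_op(char trans, size_t n, const double *dl,
                         const double *d, const double *du, double *b)
{
    if (n == 0)
        return;
    if (trans == 'N') {
        for (size_t i = 1; i < n; ++i)          /* L*y = b (forward) */
            b[i] -= dl[i] * b[i - 1];
        b[n - 1] /= d[n - 1];                   /* U*x = y (backward) */
        for (size_t i = n - 1; i-- > 0;)
            b[i] = (b[i] - du[i] * b[i + 1]) / d[i];
    } else {
        b[0] /= d[0];                           /* U^T*y = b (forward) */
        for (size_t i = 1; i < n; ++i)
            b[i] = (b[i] - du[i - 1] * b[i - 1]) / d[i];
        for (size_t i = n - 1; i-- > 0;)        /* L^T*x = y (backward) */
            b[i] -= dl[i + 1] * b[i + 1];
    }
}
```

For example, A = [[2, 1, 0], [4, 5, 1], [0, 3, 4]] has factors dl = (-, 2, 1), d = (2, 3, 3), and du = (1, 1, -). Both A*x = (3, 10, 7) and A^T*x = (6, 9, 5) then give x = (1, 1, 1).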

Input Parameters

trans (global) Must be 'N' or 'T' or 'C'.

Indicates the form of the equations:

If trans = 'N', then sub(A)*X = sub(B) is solved for X.

If trans = 'T', then (sub(A))^T*X = sub(B) is solved for X.

If trans = 'C', then (sub(A))^H*X = sub(B) is solved for X.

n (global) The order of the distributed matrix sub(A) (n≥ 0).

nrhs (global) The number of right hand sides; the number of columns of the
distributed matrix sub(B) (nrhs≥ 0).

dl, d, du (local)
Pointers to the local arrays of size nb_a each.

On entry, these arrays contain details of the factorization. Globally, dl[0]
and du[n-1] are not referenced; dl and du must be aligned with d.

ja (global) The index in the global matrix A indicating the start of the matrix to
be operated on (which may be either all of A or a submatrix of A).

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
If dtype_a = 501 or dtype_a = 502, then dlen_≥ 7;

else if dtype_a = 1, then dlen_≥ 9.

b (local) Same type as d.

Pointer into the local memory to an array of local size lld_b*LOCc(nrhs).

On entry, the array b contains the local pieces of the n-by-nrhs right hand
side distributed matrix sub(B).

ib (global) The row index in the global matrix B indicating the first row of the
matrix to be operated on (which may be either all of B or a submatrix of B).

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.
If dtype_b = 502, then dlen_≥ 7;

else if dtype_b = 1, then dlen_≥ 9.

af, work (local)

Arrays of size laf and lwork, respectively.

The array af contains auxiliary fill-in space. The fill-in space is created in a
call to the factorization function p?dttrf and is stored in af. If a linear
system is to be solved using p?dttrs after the factorization function, af
must not be altered.
The array work is a workspace array.

laf (local) The size of the array af.

Must be laf≥NB*(bwl+bwu)+6*(bwl+bwu)*(bwl+2*bwu).

If laf is not large enough, an error code will be returned and the minimum
acceptable size will be returned in af[0].

lwork (local or global) The size of the array work, must be at least lwork≥
10*NPCOL+4*nrhs.

Output Parameters

b On exit, this array contains the local pieces of the solution distributed
matrix X.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

info If info=0, the execution is successful. info < 0:

If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?potrs
Solves a system of linear equations with a Cholesky-
factored symmetric/Hermitian distributed positive-
definite matrix.

Syntax
void pspotrs (char *uplo , MKL_INT *n , MKL_INT *nrhs , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_INT
*info );
void pdpotrs (char *uplo , MKL_INT *n , MKL_INT *nrhs , double *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb ,
MKL_INT *info );
void pcpotrs (char *uplo , MKL_INT *n , MKL_INT *nrhs , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb , MKL_INT *info );
void pzpotrs (char *uplo , MKL_INT *n , MKL_INT *nrhs , MKL_Complex16 *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *jb ,
MKL_INT *descb , MKL_INT *info );


Include Files
• mkl_scalapack.h

Description
The p?potrs function solves for X a system of distributed linear equations in the form:

sub(A)*X = sub(B),
where sub(A) = A(ia:ia+n-1, ja:ja+n-1) is an n-by-n real symmetric or complex Hermitian positive
definite distributed matrix, and sub(B) denotes the distributed matrix B(ib:ib+n-1, jb:jb+nrhs-1).

This function uses the Cholesky factorization

sub(A) = U^H*U or sub(A) = L*L^H,
computed by p?potrf.
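Once the factor is known, each solve is a forward substitution followed by a backward substitution. The serial sketch below covers the real uplo = 'L' case, A = L*L^T, for one column-major matrix. A complex version would use L^H. cholesky_solve_lower is a hypothetical name, not the p?potrs implementation.

```c
#include <stddef.h>

/* Hypothetical serial sketch: solves A*x = b in place, given the lower
 * Cholesky factor L of A = L*L^T stored column-major in the lower
 * triangle of l (leading dimension ldl). Real data only. */
void cholesky_solve_lower(size_t n, const double *l, size_t ldl, double *b)
{
    for (size_t i = 0; i < n; ++i) {        /* L*y = b (forward) */
        for (size_t j = 0; j < i; ++j)
            b[i] -= l[i + j * ldl] * b[j];
        b[i] /= l[i + i * ldl];
    }
    for (size_t i = n; i-- > 0;) {          /* L^T*x = y (backward) */
        for (size_t j = i + 1; j < n; ++j)
            b[i] -= l[j + i * ldl] * b[j];  /* L^T(i,j) = L(j,i) */
        b[i] /= l[i + i * ldl];
    }
}
```

For L = [[2, 0], [1, 1]] (so A = [[4, 2], [2, 2]]) and b = (8, 6), the solve returns x = (1, 2).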

Input Parameters

uplo (global) Must be 'U' or 'L'.

If uplo = 'U', the upper triangle of sub(A) is stored.

If uplo = 'L', the lower triangle of sub(A) is stored.

n (global) The order of the distributed matrix sub(A) (n≥0).

nrhs (global) The number of right-hand sides; the number of columns of the
distributed matrix sub(B) (nrhs≥0).

a, b (local)
Pointers into the local memory to arrays of local sizes
lld_a*LOCc(ja+n-1) and lld_b*LOCc(jb+nrhs-1), respectively.

The array a contains the factors L or U from the Cholesky factorization

sub(A) = L*L^H or sub(A) = U^H*U, as computed by p?potrf.

On entry, the array b contains the local pieces of the right-hand sides
sub(B).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of the matrix sub(B), respectively.

descb (local) array of size dlen_. The array descriptor for the distributed matrix B.

Output Parameters

b Overwritten by the local pieces of the solution matrix X.

info If info=0, the execution is successful.

info < 0: if the i-th argument is an array and the j-th entry, indexed j - 1,
had an illegal value, then info = -(i*100+j); if the i-th argument is a
scalar and had an illegal value, then info = -i.
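
Because this info convention recurs in every routine in this section, a small decoder can help when reporting argument errors. decode_info is a hypothetical helper, not part of MKL. It assumes, as the encoding implies, that argument positions i and entry numbers j both start at 1 and that j stays below 100 (ScaLAPACK array descriptors have 9 entries).

```c
/* Hypothetical helper: split a negative info value into the 1-based
   argument position (*arg) and, for array arguments, the 1-based entry
   number (*entry); the offending C element is then index *entry - 1.
   Returns 1 if info reported an illegal argument, 0 otherwise. */
int decode_info(int info, int *arg, int *entry)
{
    if (info >= 0) {            /* success, or a computational failure */
        *arg = 0;
        *entry = 0;
        return 0;
    }
    int v = -info;
    if (v >= 100) {             /* array argument: info = -(i*100 + j) */
        *arg = v / 100;
        *entry = v % 100;
    } else {                    /* scalar argument: info = -i */
        *arg = v;
        *entry = 0;
    }
    return 1;
}
```

For p?potrs, desca is the 7th argument in the Syntax above, so info = -702 would point at desca[1].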

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?pbtrs
Solves a system of linear equations with a Cholesky-
factored symmetric/Hermitian positive-definite band
matrix.

Syntax
void pspbtrs (char *uplo , MKL_INT *n , MKL_INT *bw , MKL_INT *nrhs , float *a , MKL_INT
*ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *descb , float *af , MKL_INT
*laf , float *work , MKL_INT *lwork , MKL_INT *info );
void pdpbtrs (char *uplo , MKL_INT *n , MKL_INT *bw , MKL_INT *nrhs , double *a ,
MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *descb , double *af ,
MKL_INT *laf , double *work , MKL_INT *lwork , MKL_INT *info );
void pcpbtrs (char *uplo , MKL_INT *n , MKL_INT *bw , MKL_INT *nrhs , MKL_Complex8 *a ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *descb ,
MKL_Complex8 *af , MKL_INT *laf , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT
*info );
void pzpbtrs (char *uplo , MKL_INT *n , MKL_INT *bw , MKL_INT *nrhs , MKL_Complex16
*a , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *descb ,
MKL_Complex16 *af , MKL_INT *laf , MKL_Complex16 *work , MKL_INT *lwork , MKL_INT
*info );

Include Files
• mkl_scalapack.h

Description
The p?pbtrs function solves for X a system of distributed linear equations in the form:

sub(A)*X = sub(B) ,
where sub(A) = A(1:n, ja:ja+n-1) is an n-by-n real symmetric or complex Hermitian positive definite
distributed band matrix, and sub(B) denotes the distributed matrix B(ib:ib+n-1, 1:nrhs).

This function uses the Cholesky factorization

sub(A) = P*U^H*U*P^T, or sub(A) = P*L*L^H*P^T
computed by p?pbtrf.
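
To show why a band solve costs only O(n*bw) operations per right-hand side, here is a serial sketch for the real case with uplo = 'L'. It assumes LAPACK-style lower band storage (L(i,j) kept at ab[(i-j) + j*ldab], with ldab ≥ bw+1) and deliberately omits the distribution, the permutation P, and the fill-in that ScaLAPACK keeps in af. The name band_chol_solve_lower is hypothetical.

```c
#include <stddef.h>

/* Hypothetical serial sketch: solve (L*L^T)*x = b in place, where the
   lower band factor L (bandwidth bw) is stored LAPACK-style:
   L(i,j) = ab[(i - j) + j*ldab] for j <= i <= min(n-1, j+bw). */
void band_chol_solve_lower(int n, int bw, const double *ab, int ldab,
                           double *b)
{
    /* Forward substitution L*y = b, one column of L at a time. */
    for (int j = 0; j < n; ++j) {
        b[j] /= ab[(size_t)j * ldab];                 /* divide by L(j,j) */
        int last = (j + bw < n - 1) ? j + bw : n - 1; /* last row in band */
        for (int i = j + 1; i <= last; ++i)
            b[i] -= ab[(i - j) + (size_t)j * ldab] * b[j];
    }
    /* Back substitution L^T*x = y: row i of L^T is column i of L. */
    for (int i = n - 1; i >= 0; --i) {
        int last = (i + bw < n - 1) ? i + bw : n - 1;
        double s = b[i];
        for (int j = i + 1; j <= last; ++j)
            s -= ab[(j - i) + (size_t)i * ldab] * b[j];
        b[i] = s / ab[(size_t)i * ldab];
    }
}
```

With bw = 1 and L = [2 0 0; 1 2 0; 0 1 2] (so A = L*L^T = [4 2 0; 2 5 2; 0 2 5]), the right-hand side (6, 9, 7)^T is mapped to X = (1, 1, 1)^T.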

Input Parameters

uplo (global) Must be 'U' or 'L'.

If uplo = 'U', the upper triangle of sub(A) is stored.

If uplo = 'L', the lower triangle of sub(A) is stored.

n (global) The order of the distributed matrix sub(A) (n≥0).

bw (global) The number of superdiagonals of the distributed matrix if uplo =
'U', or the number of subdiagonals if uplo = 'L' (bw≥0).

nrhs (global) The number of right-hand sides; the number of columns of the
distributed matrix sub(B) (nrhs≥0).


a, b (local)
Pointers into the local memory to arrays of local sizes lld_a*LOCc(ja+n-1)
and lld_b*LOCc(nrhs-1), respectively.

The array a contains the permuted triangular factor U or L from the
Cholesky factorization sub(A) = P*U^H*U*P^T, or sub(A) = P*L*L^H*P^T of the
band matrix A, as returned by p?pbtrf.

On entry, the array b contains the local pieces of the n-by-nrhs right-hand
side distributed matrix sub(B).

ja (global) The index in the global matrix A indicating the start of the matrix to
be operated on (which may be either all of A or a submatrix of A).

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
If dtype_a = 501, then dlen_≥ 7;

else if dtype_a = 1, then dlen_≥ 9.

ib (global) The row index in the global matrix B indicating the first row of the
matrix sub(B).

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.
If dtype_b = 502, then dlen_≥ 7;

else if dtype_b = 1, then dlen_≥ 9.

af, work (local) Arrays, same type as a.

The array af is of size laf. It contains auxiliary fill-in space. The fill-in
space is created in a call to the factorization function p?pbtrf and is stored
in af.

The array work is a workspace array of size lwork.

laf (local) The size of the array af.

Must be laf≥nrhs*bw.

If laf is not large enough, an error code will be returned and the minimum
acceptable size will be returned in af[0].

lwork (local or global) The size of the array work; must be at least lwork≥bw^2.

Output Parameters

b On exit, if info=0, this array contains the local pieces of the n-by-nrhs
solution distributed matrix X.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

info If info=0, the execution is successful.

info < 0:

If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?pttrs
Solves a system of linear equations with a symmetric
(Hermitian) positive-definite tridiagonal distributed
matrix using the factorization computed by p?pttrf.

Syntax
void pspttrs (MKL_INT *n , MKL_INT *nrhs , float *d , float *e , MKL_INT *ja , MKL_INT
*desca , float *b , MKL_INT *ib , MKL_INT *descb , float *af , MKL_INT *laf , float
*work , MKL_INT *lwork , MKL_INT *info );
void pdpttrs (MKL_INT *n , MKL_INT *nrhs , double *d , double *e , MKL_INT *ja , MKL_INT
*desca , double *b , MKL_INT *ib , MKL_INT *descb , double *af , MKL_INT *laf , double
*work , MKL_INT *lwork , MKL_INT *info );
void pcpttrs (char *uplo , MKL_INT *n , MKL_INT *nrhs , float *d , MKL_Complex8 *e ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *descb ,
MKL_Complex8 *af , MKL_INT *laf , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT
*info );
void pzpttrs (char *uplo , MKL_INT *n , MKL_INT *nrhs , double *d , MKL_Complex16 *e ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *descb ,
MKL_Complex16 *af , MKL_INT *laf , MKL_Complex16 *work , MKL_INT *lwork , MKL_INT
*info );

Include Files
• mkl_scalapack.h

Description
The p?pttrs function solves for X a system of distributed linear equations in the form:

sub(A)*X = sub(B) ,
where sub(A) = A(1:n, ja:ja+n-1) is an n-by-n real symmetric or complex Hermitian positive definite
tridiagonal distributed matrix, and sub(B) denotes the distributed matrix B(ib:ib+n-1, 1:nrhs).

This function uses the factorization

sub(A) = P*L*D*L^H*P^T, or sub(A) = P*U^H*D*U*P^T
computed by p?pttrf.
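
Ignoring the distribution and the permutation P, the factor-then-solve sequence for a real tridiagonal matrix follows the pattern of LAPACK's serial ?pttrf/?pttrs, sketched below. pttrf_serial and pttrs_serial are hypothetical helpers. They assume that on exit d holds D and e holds the subdiagonal of the unit bidiagonal L, which is the LAPACK convention and not necessarily how ScaLAPACK lays out d, e, and af.

```c
/* Hypothetical serial analog of ?pttrf for a real SPD tridiagonal matrix.
   On entry d[0..n-1] is the diagonal and e[0..n-2] the off-diagonal; on
   exit d holds D and e holds the subdiagonal of the unit lower bidiagonal
   L, so that A = L*D*L^T. Returns 0, or k > 0 if the k-th pivot is not
   positive. */
int pttrf_serial(int n, double *d, double *e)
{
    for (int i = 0; i < n - 1; ++i) {
        if (d[i] <= 0.0)
            return i + 1;
        double ei = e[i];
        e[i] = ei / d[i];          /* multiplier L(i+1, i) */
        d[i + 1] -= e[i] * ei;     /* update the next pivot */
    }
    if (n > 0 && d[n - 1] <= 0.0)
        return n;
    return 0;
}

/* Solve (L*D*L^T)*x = b in place for one right-hand side. */
void pttrs_serial(int n, const double *d, const double *e, double *b)
{
    for (int i = 1; i < n; ++i)        /* L*y = b (forward) */
        b[i] -= e[i - 1] * b[i - 1];
    for (int i = 0; i < n; ++i)        /* D*z = y */
        b[i] /= d[i];
    for (int i = n - 2; i >= 0; --i)   /* L^T*x = z (backward) */
        b[i] -= e[i] * b[i + 1];
}
```

For the matrix with diagonal (4, 5, 5) and off-diagonal (2, 2), the factorization gives D = diag(4, 4, 4) and multipliers (0.5, 0.5), and the right-hand side (6, 9, 7)^T yields X = (1, 1, 1)^T.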

Input Parameters

uplo (global, used in complex flavors only)

Must be 'U' or 'L'.

If uplo = 'U', the upper triangle of sub(A) is stored.

If uplo = 'L', the lower triangle of sub(A) is stored.


n (global) The order of the distributed matrix sub(A) (n≥0).

nrhs (global) The number of right-hand sides; the number of columns of the
distributed matrix sub(B) (nrhs≥0).

d, e (local)
Pointers into the local memory to arrays of size nb_a each.

These arrays contain details of the factorization as returned by p?pttrf.

ja (global) The index in the global matrix A indicating the start of the matrix to
be operated on (which may be either all of A or a submatrix of A).

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
If dtype_a = 501 or dtype_a = 502, then dlen_≥ 7;

else if dtype_a = 1, then dlen_≥ 9.

b (local) Same type as d, e.

Pointer into the local memory to an array of local size
lld_b*LOCc(nrhs).

On entry, the array b contains the local pieces of the n-by-nrhs right-hand
side distributed matrix sub(B).

ib (global) The row index in the global matrix B indicating the first row of the
matrix to be operated on (which may be either all of B or a submatrix of B).

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.
If dtype_b = 502, then dlen_≥ 7;

else if dtype_b = 1, then dlen_≥ 9.

af, work (local)

Arrays of size laf and lwork, respectively. The array af contains
auxiliary fill-in space. The fill-in space is created in a call to the factorization
function p?pttrf and is stored in af.

The array work is a workspace array.

laf (local) The size of the array af.

Must be laf≥nb_a+2.

If laf is not large enough, an error code is returned and the minimum
acceptable size will be returned in af[0].

lwork (local or global) The size of the array work; must be at least
lwork≥ (10+2*min(100,nrhs))*NPCOL+4*nrhs.

Output Parameters

b On exit, this array contains the local pieces of the solution distributed
matrix X.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

info If info=0, the execution is successful.

info < 0:
if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a
scalar and had an illegal value, then info = -i.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?trtrs
Solves a system of linear equations with a triangular
distributed matrix.

Syntax
void pstrtrs (char *uplo , char *trans , char *diag , MKL_INT *n , MKL_INT *nrhs , float
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *jb ,
MKL_INT *descb , MKL_INT *info );
void pdtrtrs (char *uplo , char *trans , char *diag , MKL_INT *n , MKL_INT *nrhs ,
double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib ,
MKL_INT *jb , MKL_INT *descb , MKL_INT *info );
void pctrtrs (char *uplo , char *trans , char *diag , MKL_INT *n , MKL_INT *nrhs ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b ,
MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_INT *info );
void pztrtrs (char *uplo , char *trans , char *diag , MKL_INT *n , MKL_INT *nrhs ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b ,
MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?trtrs function solves for X one of the following systems of linear equations:

sub(A)*X = sub(B),
(sub(A))^T*X = sub(B), or
(sub(A))^H*X = sub(B),
where sub(A) = A(ia:ia+n-1, ja:ja+n-1) is a triangular distributed matrix of order n, and sub(B) denotes
the distributed matrix B(ib:ib+n-1, jb:jb+nrhs-1).

A check is made to verify that sub(A) is nonsingular.
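
A serial sketch of these semantics, including the singularity check, for one right-hand side in the real case (so trans = 'C' behaves like 'T'). trtrs_sketch is a hypothetical name, not an MKL routine. The sketch checks the diagonal for exact zeros before solving, so b is left unchanged when it reports singularity, consistent with the info > 0 description of this routine.

```c
#include <stddef.h>

/* Element (i, j) of op(A) for column-major A with leading dimension lda:
   op(A) = A when t == 'N', and op(A) = A^T otherwise. */
static double op_elem(const double *a, int lda, char t, int i, int j)
{
    return (t == 'N') ? a[i + (size_t)j * lda] : a[j + (size_t)i * lda];
}

/* Hypothetical serial analog: solve op(A)*x = b in place for triangular A.
   Returns 0 on success, or i > 0 if diag == 'N' and A(i,i) (1-based) is
   exactly zero; in that case b is not modified. */
int trtrs_sketch(char uplo, char trans, char diag, int n,
                 const double *a, int lda, double *b)
{
    char t = (trans == 'N') ? 'N' : 'T';   /* real case: 'C' acts as 'T' */

    if (diag == 'N')
        for (int i = 0; i < n; ++i)
            if (a[i + (size_t)i * lda] == 0.0)
                return i + 1;

    /* op(A) is lower triangular for (L, N) and (U, T), upper otherwise. */
    int lower = ((uplo == 'L') == (t == 'N'));
    if (lower) {
        for (int i = 0; i < n; ++i) {          /* forward substitution */
            double s = b[i];
            for (int j = 0; j < i; ++j)
                s -= op_elem(a, lda, t, i, j) * b[j];
            b[i] = (diag == 'U') ? s : s / op_elem(a, lda, t, i, i);
        }
    } else {
        for (int i = n - 1; i >= 0; --i) {     /* back substitution */
            double s = b[i];
            for (int j = i + 1; j < n; ++j)
                s -= op_elem(a, lda, t, i, j) * b[j];
            b[i] = (diag == 'U') ? s : s / op_elem(a, lda, t, i, i);
        }
    }
    return 0;
}
```

With diag = 'U' the stored diagonal is never read, which matches the rule that unit-diagonal elements are not referenced.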

Input Parameters

uplo (global) Must be 'U' or 'L'.

Indicates whether sub(A) is upper or lower triangular:

If uplo = 'U', then sub(A) is upper triangular.


If uplo = 'L', then sub(A) is lower triangular.

trans (global) Must be 'N' or 'T' or 'C'.

Indicates the form of the equations:

If trans = 'N', then sub(A)*X = sub(B) is solved for X.

If trans = 'T', then sub(A)^T*X = sub(B) is solved for X.

If trans = 'C', then sub(A)^H*X = sub(B) is solved for X.

diag (global) Must be 'N' or 'U'.

If diag = 'N', then sub(A) is not a unit triangular matrix.

If diag = 'U', then sub(A) is unit triangular.

n (global) The order of the distributed matrix sub(A) (n≥0).

nrhs (global) The number of right-hand sides; i.e., the number of columns of the
distributed matrix sub(B) (nrhs≥0).

a, b (local)
Pointers into the local memory to arrays of local sizes lld_a*LOCc(ja+n-1)
and lld_b*LOCc(jb+nrhs-1), respectively.

The array a contains the local pieces of the distributed triangular matrix
sub(A).
If uplo = 'U', the leading n-by-n upper triangular part of sub(A) contains
the upper triangular matrix, and the strictly lower triangular part of sub(A)
is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of sub(A) contains
the lower triangular matrix, and the strictly upper triangular part of sub(A)
is not referenced.
If diag = 'U', the diagonal elements of sub(A) are also not referenced
and are assumed to be 1.
On entry, the array b contains the local pieces of the right-hand side
distributed matrix sub(B).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of the matrix sub(B), respectively.

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.

Output Parameters

b On exit, if info=0, sub(B) is overwritten by the solution matrix X.

info If info=0, the execution is successful.

info < 0:
if the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.

info > 0:
if info = i, the i-th diagonal element of sub(A) is zero, indicating that the
submatrix is singular and the solutions X have not been computed.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

Estimating the Condition Number: ScaLAPACK Computational Routines

This section describes the ScaLAPACK routines for estimating the condition number of a matrix. The condition
number is used for analyzing the errors in the solution of a system of linear equations. Since the condition
number may be arbitrarily large when the matrix is nearly singular, the routines actually compute the
reciprocal condition number.

p?gecon
Estimates the reciprocal of the condition number of a
general distributed matrix in either the 1-norm or the
infinity-norm.

Syntax
void psgecon (char *norm , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *anorm , float *rcond , float *work , MKL_INT *lwork , MKL_INT *iwork ,
MKL_INT *liwork , MKL_INT *info );
void pdgecon (char *norm , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , double *anorm , double *rcond , double *work , MKL_INT *lwork , MKL_INT
*iwork , MKL_INT *liwork , MKL_INT *info );
void pcgecon (char *norm , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , float *anorm , float *rcond , MKL_Complex8 *work , MKL_INT *lwork ,
float *rwork , MKL_INT *lrwork , MKL_INT *info );
void pzgecon (char *norm , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , double *anorm , double *rcond , MKL_Complex16 *work , MKL_INT *lwork ,
double *rwork , MKL_INT *lrwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?gecon function estimates the reciprocal of the condition number of a general distributed real/complex
matrix sub(A) = A(ia:ia+n-1, ja:ja+n-1) in either the 1-norm or infinity-norm, using the LU factorization
computed by p?getrf.

An estimate is obtained for ||(sub(A))^-1||, and the reciprocal of the condition number is computed as

rcond = 1/(||sub(A)|| * ||(sub(A))^-1||)
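
To make the formula concrete, the sketch below evaluates rcond exactly for a 2-by-2 matrix by forming the inverse explicitly. This is only an illustration of the definition: p?gecon does not form the inverse, it estimates ||(sub(A))^-1|| from the LU factors. The names mat_norm and rcond_2x2 are hypothetical.

```c
#include <math.h>

/* 1-norm (norm = '1' or 'O': largest column sum) or infinity-norm
   (norm = 'I': largest row sum) of an n-by-n column-major matrix. */
static double mat_norm(char norm, int n, const double *a)
{
    double best = 0.0;
    for (int k = 0; k < n; ++k) {
        double s = 0.0;
        for (int m = 0; m < n; ++m)
            s += (norm == 'I') ? fabs(a[k + m * n])   /* row k */
                               : fabs(a[m + k * n]);  /* column k */
        if (s > best)
            best = s;
    }
    return best;
}

/* rcond = 1/(||A|| * ||A^-1||) for a 2-by-2 column-major matrix, using
   the exact inverse. Returns 0 for a singular matrix. */
double rcond_2x2(char norm, const double a[4])
{
    double det = a[0] * a[3] - a[2] * a[1];
    if (det == 0.0)
        return 0.0;
    double inv[4] = { a[3] / det, -a[1] / det, -a[2] / det, a[0] / det };
    return 1.0 / (mat_norm(norm, 2, a) * mat_norm(norm, 2, inv));
}
```

For A = [4 1; 2 3], ||A||_1 = 6 and ||A^-1||_1 = 0.5, so rcond_2x2('1', a) returns approximately 1/3; a value near 0 would signal a nearly singular matrix.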

Input Parameters

norm (global) Must be '1' or 'O' or 'I'.

Specifies whether the 1-norm condition number or the infinity-norm
condition number is required.
If norm = '1' or 'O', then the 1-norm is used;

If norm = 'I', then the infinity-norm is used.

n (global) The order of the distributed matrix sub(A) (n≥ 0).

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1).

The array a contains the local pieces of the factors L and U from the
factorization sub(A) = P*L*U; the unit diagonal elements of L are not
stored.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

anorm (global)
If norm = '1' or 'O', the 1-norm of the original distributed matrix sub(A);

If norm = 'I', the infinity-norm of the original distributed matrix sub(A).

work (local)
The array work of size lwork is a workspace array.

lwork (local or global) The size of the array work.

For real flavors:

lwork must be at least
lwork≥ 2*LOCr(n+mod(ia-1,mb_a))+2*LOCc(n+mod(ja-1,nb_a))
+max(2, max(nb_a*max(1, iceil(NPROW-1, NPCOL)), LOCc(n
+mod(ja-1,nb_a)) + nb_a*max(1, iceil(NPCOL-1, NPROW)))).
For complex flavors:
lwork must be at least
lwork≥ 2*LOCr(n+mod(ia-1,mb_a))+max(2,
max(nb_a*iceil(NPROW-1, NPCOL), LOCc(n+mod(ja-1,nb_a))+
nb_a*iceil(NPCOL-1, NPROW))).

LOCr and LOCc values can be computed using the ScaLAPACK tool function
numroc; NPROW and NPCOL can be determined by calling the function
blacs_gridinfo.

NOTE
iceil(x,y) is the ceiling of x/y, and mod(x,y) is the integer
remainder of x/y.
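
LOCr and LOCc follow the block-cyclic distribution. The sketch below re-implements the usual numroc counting logic locally as numroc_sketch (a hypothetical name; a real program would call numroc itself) and uses it to evaluate the real-flavor lwork bound of p?gecon stated above. It assumes iarow and iacol are the process row and column that own global row ia and column ja (these equal rsrc_a and csrc_a when ia = ja = 1).

```c
/* Hypothetical local copy of the numroc counting logic: how many of the n
   rows (or columns), split into blocks of nb and dealt cyclically over
   nprocs processes starting at isrcproc, land on process iproc. */
int numroc_sketch(int n, int nb, int iproc, int isrcproc, int nprocs)
{
    int mydist = (nprocs + iproc - isrcproc) % nprocs; /* distance from source */
    int nblocks = n / nb;                              /* whole blocks */
    int num = (nblocks / nprocs) * nb;                 /* full rounds */
    int extrablks = nblocks % nprocs;                  /* leftover whole blocks */
    if (mydist < extrablks)
        num += nb;                                     /* one extra whole block */
    else if (mydist == extrablks)
        num += n % nb;                                 /* trailing partial block */
    return num;
}

/* iceil(x, y): ceiling of x/y, valid for x >= 0 and y > 0. */
static int iceil_sketch(int x, int y) { return (x + y - 1) / y; }
static int imax(int a, int b) { return a > b ? a : b; }

/* Minimum lwork for the real flavors of p?gecon, transcribed from the
   bound above. iarow/iacol: process row/column owning row ia/column ja. */
int gecon_lwork_real(int n, int ia, int ja, int mb_a, int nb_a,
                     int myrow, int mycol, int iarow, int iacol,
                     int nprow, int npcol)
{
    int locr = numroc_sketch(n + (ia - 1) % mb_a, mb_a, myrow, iarow, nprow);
    int locc = numroc_sketch(n + (ja - 1) % nb_a, nb_a, mycol, iacol, npcol);
    int t1 = nb_a * imax(1, iceil_sketch(nprow - 1, npcol));
    int t2 = locc + nb_a * imax(1, iceil_sketch(npcol - 1, nprow));
    return 2 * locr + 2 * locc + imax(2, imax(t1, t2));
}
```

For example, with n = 10, ia = ja = 1, and mb_a = nb_a = 2 on a 2-by-2 grid, process (0, 0) owns 6 rows and 6 columns, and the bound evaluates to 2*6 + 2*6 + max(2, max(2, 8)) = 32.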

iwork (local) Workspace array of size liwork. Used in real flavors only.

liwork (local or global) The size of the array iwork; used in real flavors only. Must
be at least
liwork≥LOCr(n+mod(ia-1,mb_a)).

rwork (local)
Workspace array of size lrwork. Used in complex flavors only.

lrwork (local or global) The size of the array rwork; used in complex flavors only.
Must be at least
lrwork≥ max(1, 2*LOCc(n+mod(ja-1,nb_a))).

Output Parameters

rcond (global)
The reciprocal of the condition number of the distributed matrix sub(A). See
Description.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

iwork[0] On exit, iwork[0] contains the minimum value of liwork required for
optimum performance (for real flavors).

rwork[0] On exit, rwork[0] contains the minimum value of lrwork required for
optimum performance (for complex flavors).

info (global) If info=0, the execution is successful.

info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?pocon
Estimates the reciprocal of the condition number (in
the 1-norm) of a symmetric/Hermitian positive-definite
distributed matrix.


Syntax
void pspocon (char *uplo , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *anorm , float *rcond , float *work , MKL_INT *lwork , MKL_INT *iwork ,
MKL_INT *liwork , MKL_INT *info );
void pdpocon (char *uplo , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , double *anorm , double *rcond , double *work , MKL_INT *lwork , MKL_INT
*iwork , MKL_INT *liwork , MKL_INT *info );
void pcpocon (char *uplo , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , float *anorm , float *rcond , MKL_Complex8 *work , MKL_INT *lwork ,
float *rwork , MKL_INT *lrwork , MKL_INT *info );
void pzpocon (char *uplo , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , double *anorm , double *rcond , MKL_Complex16 *work , MKL_INT *lwork ,
double *rwork , MKL_INT *lrwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?pocon function estimates the reciprocal of the condition number (in the 1-norm) of a real symmetric
or complex Hermitian positive definite distributed matrix sub(A) = A(ia:ia+n-1, ja:ja+n-1), using the
Cholesky factorization sub(A) = U^H*U or sub(A) = L*L^H computed by p?potrf.

An estimate is obtained for ||(sub(A))^-1||, and the reciprocal of the condition number is computed as

rcond = 1/(||sub(A)|| * ||(sub(A))^-1||)

Input Parameters

uplo (global) Must be 'U' or 'L'.

Specifies whether the factor stored in sub(A) is upper or lower triangular.

If uplo = 'U', sub(A) stores the upper triangular factor U of the Cholesky
factorization sub(A) = U^H*U.
If uplo = 'L', sub(A) stores the lower triangular factor L of the Cholesky
factorization sub(A) = L*L^H.

n (global) The order of the distributed matrix sub(A) (n≥0).

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1).

The array a contains the local pieces of the factors L or U from the Cholesky
factorization sub(A) = U^H*U, or sub(A) = L*L^H, as computed by p?potrf.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

anorm (global)
The 1-norm of the symmetric/Hermitian distributed matrix sub(A).
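
Because a holds the Cholesky factor rather than the original matrix, anorm has to be computed before p?potrf overwrites sub(A); ScaLAPACK provides p?lansy and p?lanhe for the distributed case. For a symmetric matrix the 1-norm can be read from one stored triangle, as in this hypothetical serial helper sym_norm1 (real case, column-major storage assumed).

```c
#include <math.h>
#include <stddef.h>

/* 1-norm of a real symmetric n-by-n matrix when only the uplo triangle of
   the column-major array a (leading dimension lda) is valid. An entry
   (i, j) outside that triangle is read from its mirror (j, i). */
double sym_norm1(char uplo, int n, const double *a, int lda)
{
    double best = 0.0;
    for (int j = 0; j < n; ++j) {
        double s = 0.0;                   /* sum of |A(i,j)| over column j */
        for (int i = 0; i < n; ++i) {
            int r = i, c = j;
            if ((uplo == 'U' && i > j) || (uplo == 'L' && i < j)) {
                r = j;                    /* use the stored mirror element */
                c = i;
            }
            s += fabs(a[r + (size_t)c * lda]);
        }
        if (s > best)
            best = s;
    }
    return best;
}
```

Because the matrix is symmetric, this value also equals its infinity-norm; entries of the unused triangle are never read.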

work (local)
The array work of size lwork is a workspace array.

lwork (local or global) The size of the array work.

For real flavors:

lwork must be at least
lwork≥ 2*LOCr(n+mod(ia-1,mb_a))+2*LOCc(n+mod(ja-1,nb_a))
+max(2, max(nb_a*iceil(NPROW-1, NPCOL), LOCc(n
+mod(ja-1,nb_a))+nb_a*iceil(NPCOL-1, NPROW))).
For complex flavors:
lwork must be at least
lwork≥ 2*LOCr(n+mod(ia-1,mb_a))+max(2,
max(nb_a*max(1,iceil(NPROW-1, NPCOL)), LOCc(n+mod(ja-1,nb_a))
+nb_a*max(1,iceil(NPCOL-1, NPROW)))).
If lwork = -1, then lwork is a global input and a workspace query is
assumed. The routine only calculates the minimum and optimal size for all
work arrays. Each value is returned in the first entry of the corresponding
work array, and no error message is issued by pxerbla.

NOTE
iceil(x,y) is the ceiling of x/y, and mod(x,y) is the integer
remainder of x/y.

iwork (local) Workspace array of size liwork. Used in real flavors only.

liwork (local or global) The size of the array iwork; used in real flavors only. Must
be at least liwork≥LOCr(n+mod(ia-1,mb_a)).

If liwork = -1, then liwork is a global input and a workspace query is
assumed. The routine only calculates the minimum and optimal size for all
work arrays. Each value is returned in the first entry of the corresponding
work array, and no error message is issued by pxerbla.

rwork (local)
Workspace array of size lrwork. Used in complex flavors only.

lrwork (local or global) The size of the array rwork; used in complex flavors only.
Must be at least lrwork≥ 2*LOCc(n+mod(ja-1,nb_a)).

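If lrwork = -1, then lrwork is a global input and a workspace query is
assumed. The routine only calculates the minimum and optimal size for all
work arrays. Each value is returned in the first entry of the corresponding
work array, and no error message is issued by pxerbla.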

Output Parameters

rcond (global)
The reciprocal of the condition number of the distributed matrix sub(A).

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

iwork[0] On exit, iwork[0] contains the minimum value of liwork required for
optimum performance (for real flavors).

rwork[0] On exit, rwork[0] contains the minimum value of lrwork required for
optimum performance (for complex flavors).

info (global) If info=0, the execution is successful.

info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?trcon
Estimates the reciprocal of the condition number of a
triangular distributed matrix in either 1-norm or
infinity-norm.

Syntax
void pstrcon (char *norm , char *uplo , char *diag , MKL_INT *n , float *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , float *rcond , float *work , MKL_INT *lwork ,
MKL_INT *iwork , MKL_INT *liwork , MKL_INT *info );
void pdtrcon (char *norm , char *uplo , char *diag , MKL_INT *n , double *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , double *rcond , double *work , MKL_INT *lwork ,
MKL_INT *iwork , MKL_INT *liwork , MKL_INT *info );
void pctrcon (char *norm , char *uplo , char *diag , MKL_INT *n , MKL_Complex8 *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *rcond , MKL_Complex8 *work ,
MKL_INT *lwork , float *rwork , MKL_INT *lrwork , MKL_INT *info );
void pztrcon (char *norm , char *uplo , char *diag , MKL_INT *n , MKL_Complex16 *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *rcond , MKL_Complex16 *work ,
MKL_INT *lwork , double *rwork , MKL_INT *lrwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?trcon function estimates the reciprocal of the condition number of a triangular distributed matrix
sub(A) = A(ia:ia+n-1, ja:ja+n-1), in either the 1-norm or the infinity-norm.

The norm of sub(A) is computed and an estimate is obtained for ||(sub(A))^-1||; the reciprocal of the
condition number is then computed as

rcond = 1/(||sub(A)|| * ||(sub(A))^-1||)
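
The first step, computing the norm of sub(A), must respect uplo and diag: only one triangle is read, and a unit diagonal counts as 1 without being referenced. A serial sketch of that norm, under the hypothetical name tri_norm (real case, column-major storage assumed):

```c
#include <math.h>
#include <stddef.h>

/* 1-norm (norm = '1' or 'O') or infinity-norm (norm = 'I') of a triangular
   n-by-n column-major matrix. Only the uplo triangle is read; when
   diag = 'U' the diagonal is taken as 1 and the stored values are ignored. */
double tri_norm(char norm, char uplo, char diag, int n,
                const double *a, int lda)
{
    double best = 0.0;
    for (int k = 0; k < n; ++k) {        /* k = column (1-norm) or row (inf) */
        double s = 0.0;
        for (int m = 0; m < n; ++m) {
            int i = (norm == 'I') ? k : m;   /* row index */
            int j = (norm == 'I') ? m : k;   /* column index */
            if ((uplo == 'U' && i > j) || (uplo == 'L' && i < j))
                continue;                    /* outside the triangle: zero */
            if (i == j && diag == 'U')
                s += 1.0;                    /* implicit unit diagonal */
            else
                s += fabs(a[i + (size_t)j * lda]);
        }
        if (s > best)
            best = s;
    }
    return best;
}
```

For the upper triangular matrix [5 -3; 0 7], the 1-norm is 10 and the infinity-norm is 8; treated as unit triangular, its 1-norm drops to 4.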

Input Parameters

norm (global) Must be '1' or 'O' or 'I'.

Specifies whether the 1-norm condition number or the infinity-norm
condition number is required.
If norm = '1' or 'O', then the 1-norm is used;

If norm = 'I', then the infinity-norm is used.

uplo (global) Must be 'U' or 'L'.

If uplo = 'U', sub(A) is upper triangular. If uplo = 'L', sub(A) is lower
triangular.

diag (global) Must be 'N' or 'U'.

If diag = 'N', sub(A) is non-unit triangular. If diag = 'U', sub(A) is unit
triangular.

n (global) The order of the distributed matrix sub(A) (n≥0).

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1).

The array a contains the local pieces of the triangular distributed matrix
sub(A).
If uplo = 'U', the leading n-by-n upper triangular part of this distributed
matrix contains the upper triangular matrix, and its strictly lower triangular
part is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of this distributed
matrix contains the lower triangular matrix, and its strictly upper triangular
part is not referenced.
If diag = 'U', the diagonal elements of sub(A) are also not referenced
and are assumed to be 1.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

work (local)
The array work of size lwork is a workspace array.

lwork (local or global) The size of the array work.

For real flavors:


lwork must be at least
lwork≥ 2*LOCr(n+mod(ia-1,mb_a))+LOCc(n+mod(ja-1,nb_a))
+max(2, max(nb_a*max(1,iceil(NPROW-1, NPCOL)),
LOCc(n+mod(ja-1,nb_a))+nb_a*max(1,iceil(NPCOL-1, NPROW)))).
For complex flavors:
lwork must be at least
lwork≥ 2*LOCr(n+mod(ia-1,mb_a))+max(2,
max(nb_a*iceil(NPROW-1, NPCOL),
LOCc(n+mod(ja-1,nb_a))+nb_a*iceil(NPCOL-1, NPROW))).

NOTE
iceil(x,y) is the ceiling of x/y, and mod(x,y) is the integer
remainder of x/y.

iwork (local) Workspace array of size liwork. Used in real flavors only.

liwork (local or global) The size of the array iwork; used in real flavors only. Must
be at least
liwork≥LOCr(n+mod(ia-1,mb_a)).

rwork (local)
Workspace array of size lrwork. Used in complex flavors only.

lrwork (local or global) The size of the array rwork; used in complex flavors only.
Must be at least
lrwork≥LOCc(n+mod(ja-1,nb_a)).

Output Parameters

rcond (global)
The reciprocal of the condition number of the distributed matrix sub(A).

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

iwork[0] On exit, iwork[0] contains the minimum value of liwork required for
optimum performance (for real flavors).

rwork[0] On exit, rwork[0] contains the minimum value of lrwork required for
optimum performance (for complex flavors).

info (global) If info=0, the execution is successful.

info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.


Refining the Solution and Estimating Its Error: ScaLAPACK Computational Routines
This section describes the ScaLAPACK routines for refining the computed solution of a system of linear
equations and estimating the solution error. You can call these routines after factorizing the matrix of the
system of equations and computing the solution (see Routines for Matrix Factorization and Solving Systems
of Linear Equations).

p?gerfs
Improves the computed solution to a system of linear
equations and provides error bounds and backward
error estimates for the solution.

Syntax
void psgerfs (char *trans , MKL_INT *n , MKL_INT *nrhs , float *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , float *af , MKL_INT *iaf , MKL_INT *jaf , MKL_INT
*descaf , MKL_INT *ipiv , float *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , float
*x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , float *ferr , float *berr , float
*work , MKL_INT *lwork , MKL_INT *iwork , MKL_INT *liwork , MKL_INT *info );
void pdgerfs (char *trans , MKL_INT *n , MKL_INT *nrhs , double *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *af , MKL_INT *iaf , MKL_INT *jaf , MKL_INT
*descaf , MKL_INT *ipiv , double *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb ,
double *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , double *ferr , double *berr ,
double *work , MKL_INT *lwork , MKL_INT *iwork , MKL_INT *liwork , MKL_INT *info );
void pcgerfs (char *trans , MKL_INT *n , MKL_INT *nrhs , MKL_Complex8 *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *af , MKL_INT *iaf , MKL_INT *jaf ,
MKL_INT *descaf , MKL_INT *ipiv , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb , MKL_Complex8 *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , float *ferr ,
float *berr , MKL_Complex8 *work , MKL_INT *lwork , float *rwork , MKL_INT *lrwork ,
MKL_INT *info );
void pzgerfs (char *trans , MKL_INT *n , MKL_INT *nrhs , MKL_Complex16 *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *af , MKL_INT *iaf , MKL_INT *jaf ,
MKL_INT *descaf , MKL_INT *ipiv , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *jb ,
MKL_INT *descb , MKL_Complex16 *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , double
*ferr , double *berr , MKL_Complex16 *work , MKL_INT *lwork , double *rwork , MKL_INT
*lrwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?gerfs function improves the computed solution to one of the systems of linear equations

sub(A)*sub(X) = sub(B),
sub(A)^T*sub(X) = sub(B), or
sub(A)^H*sub(X) = sub(B),

and provides error bounds and backward error estimates for the solution.
Here sub(A) = A(ia:ia+n-1, ja:ja+n-1), sub(B) = B(ib:ib+n-1, jb:jb+nrhs-1), and
sub(X) = X(ix:ix+n-1, jx:jx+nrhs-1).
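
The backward error reported in berr is, in the LAPACK convention, the componentwise relative backward error max_i |b - A*x|_i / (|A|*|x| + |b|)_i, taken per right-hand side. The serial sketch below evaluates it for one column. backward_error is a hypothetical name, and the sketch simply skips rows with a zero denominator instead of applying the small safety threshold used by the library routines.

```c
#include <math.h>
#include <stddef.h>

/* Componentwise relative backward error of x as a solution of A*x = b:
   max over i of |b - A*x|_i / (|A|*|x| + |b|)_i, for an n-by-n
   column-major A with leading dimension lda. */
double backward_error(int n, const double *a, int lda,
                      const double *x, const double *b)
{
    double berr = 0.0;
    for (int i = 0; i < n; ++i) {
        double r = b[i];              /* residual component (b - A*x)_i */
        double denom = fabs(b[i]);    /* (|A|*|x| + |b|)_i */
        for (int j = 0; j < n; ++j) {
            double aij = a[i + (size_t)j * lda];
            r -= aij * x[j];
            denom += fabs(aij) * fabs(x[j]);
        }
        if (denom > 0.0 && fabs(r) / denom > berr)
            berr = fabs(r) / denom;
    }
    return berr;
}
```

An exact solution gives berr = 0; for A = diag(2, 4), b = (2, 4)^T and the perturbed x = (1, 1.5)^T, the second row dominates with |4 - 6| / (6 + 4) = 0.2.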


Input Parameters

trans (global) Must be 'N' or 'T' or 'C'.

Specifies the form of the system of equations:

If trans = 'N', the system has the form sub(A)*sub(X) = sub(B) (No
transpose);
If trans = 'T', the system has the form sub(A)^T*sub(X) = sub(B)
(Transpose);
If trans = 'C', the system has the form sub(A)^H*sub(X) = sub(B)
(Conjugate transpose).

n (global) The order of the distributed matrix sub(A) (n≥ 0).

nrhs (global) The number of right-hand sides, i.e., the number of columns of the
matrices sub(B) and sub(X) (nrhs≥ 0).

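a, af, b, x (local)
Pointers into the local memory to arrays of local sizes
a: lld_a * LOCc(ja+n-1),
af: lld_af * LOCc(jaf+n-1),
b: lld_b * LOCc(jb+nrhs-1),
x: lld_x * LOCc(jx+nrhs-1).
The array a contains the local pieces of the distributed matrix sub(A).
The array af contains the local pieces of the distributed factors of the
matrix sub(A) = P*L*U as computed by p?getrf.

The array b contains the local pieces of the distributed matrix of right-hand
sides sub(B).
On entry, the array x contains the local pieces of the distributed solution
matrix sub(X).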

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

iaf, jaf (global) The row and column indices in the global matrix AF indicating the
first row and the first column of the matrix sub(AF), respectively.

descaf (global and local) array of size dlen_. The array descriptor for the
distributed matrix AF.

ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of the matrix sub(B), respectively.

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.

ix, jx (global) The row and column indices in the global matrix X indicating the
first row and the first column of the matrix sub(X), respectively.

descx (global and local) array of size dlen_. The array descriptor for the
distributed matrix X.

ipiv (local)
Array of size LOCr(m_af) + mb_af.

This array contains pivoting information as computed by p?getrf. If
ipiv[i] = j, then the local row i+1 was swapped with the global row j, where
i = 0, ... , LOCr(m_af) + mb_af - 1.

This array is tied to the distributed matrix A.

work (local)
The array work of size lwork is a workspace array.

lwork (local or global) The size of the array work.

For real flavors:

lwork must be at least 3*LOCr(n+mod(ia-1,mb_a)).
For complex flavors:
lwork must be at least 2*LOCr(n+mod(ia-1,mb_a)).

NOTE
mod(x,y) is the integer remainder of x/y.

iwork (local) Workspace array, size liwork. Used in real flavors only.

liwork (local or global) The size of the array iwork; used in real flavors only. Must
be at least
liwork≥LOCr(n+mod(ib-1,mb_b)).

rwork (local)
Workspace array, size lrwork. Used in complex flavors only.

lrwork (local or global) The size of the array rwork; used in complex flavors only.
Must be at least lrwork ≥ LOCr(n+mod(ib-1,mb_b)).
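The minimum workspace sizes above depend on LOCr, the number of rows of a distributed dimension owned by the calling process row (ScaLAPACK's NUMROC tool routine computes it). The sketch below re-implements the NUMROC algorithm in plain C and applies the real-flavor formulas. Assumptions: int is used instead of MKL_INT, the helper names are hypothetical, and the process row that owns the first row of the submatrix (rsrc) is passed in by the caller; in a real program these values come from the descriptors and the BLACS grid.

```c
/* Plain-C version of the NUMROC algorithm: how many of n rows (or columns),
 * distributed in blocks of nb over nprocs processes starting at process
 * isrcproc, are stored on process iproc. */
static int numroc_sketch(int n, int nb, int iproc, int isrcproc, int nprocs)
{
    int mydist = (nprocs + iproc - isrcproc) % nprocs;
    int nblocks = n / nb;
    int num = (nblocks / nprocs) * nb;      /* full rounds of blocks */
    int extrablks = nblocks % nprocs;
    if (mydist < extrablks)
        num += nb;                          /* one extra full block */
    else if (mydist == extrablks)
        num += n % nb;                      /* the trailing partial block */
    return num;
}

/* Minimum lwork and liwork for the real flavors of p?gerfs:
 *   lwork  >= 3*LOCr(n + mod(ia-1, mb_a))
 *   liwork >=   LOCr(n + mod(ib-1, mb_b)) */
static void gerfs_real_workspace(int n, int ia, int mb_a, int ib, int mb_b,
                                 int myrow, int rsrc, int nprow,
                                 int *lwork, int *liwork)
{
    int loc_a = numroc_sketch(n + (ia - 1) % mb_a, mb_a, myrow, rsrc, nprow);
    int loc_b = numroc_sketch(n + (ib - 1) % mb_b, mb_b, myrow, rsrc, nprow);
    *lwork = 3 * loc_a;
    *liwork = loc_b;
}
```

With n = 1000, blocks of 64 rows and two process rows, process row 0 owns 512 rows and process row 1 owns 488, so the minimum lwork differs between processes (1536 versus 1464); this is why lwork is described as local.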

Output Parameters

x On exit, contains the improved solution vectors.

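ferr, berr Arrays of size LOCc(jb+nrhs-1) each.

The array ferr contains the estimated forward error bound for each
solution vector of sub(X).
If XTRUE is the true solution corresponding to sub(X), ferr is an estimated
upper bound for the magnitude of the largest element in (sub(X) - XTRUE)
divided by the magnitude of the largest element in sub(X). The estimate is
as reliable as the estimate for rcond, and is almost always a slight
overestimate of the true error.
This array is tied to the distributed matrix X.

The array berr contains the component-wise relative backward error of
each solution vector (that is, the smallest relative change in any entry of
sub(A) or sub(B) that makes sub(X) an exact solution). This array is tied to
the distributed matrix X.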

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

iwork[0] On exit, iwork[0] contains the minimum value of liwork required for
optimum performance (for real flavors).

rwork[0] On exit, rwork[0] contains the minimum value of lrwork required for
optimum performance (for complex flavors).

info (global) If info=0, the execution is successful.

info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.
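The negative info encoding can be decoded mechanically. The sketch below is a hypothetical helper (not part of MKL); it assumes, as holds for every routine in this section, that a routine has fewer than 100 arguments, so -info ≥ 100 always denotes an array entry.

```c
/* Which argument a negative info value points at (hypothetical helper). */
typedef struct {
    int arg;    /* 1-based position of the offending argument, 0 if none */
    int entry;  /* 1-based entry j within that array (index j-1), 0 for a scalar */
} bad_arg;

static bad_arg decode_info(int info)
{
    bad_arg r = {0, 0};
    if (info >= 0)
        return r;               /* success or a computational failure, not an argument error */
    int v = -info;
    if (v >= 100) {             /* info = -(i*100 + j): array argument i, entry j */
        r.arg = v / 100;
        r.entry = v % 100;
    } else {                    /* info = -i: scalar argument i */
        r.arg = v;
    }
    return r;
}
```

For example, with p?gerfs an info of -705 decodes to argument 7 (desca), entry 5, that is, desca[4].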

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?porfs
Improves the computed solution to a system of linear
equations with symmetric/Hermitian positive definite
distributed matrix and provides error bounds and
backward error estimates for the solution.

Syntax
void psporfs (char *uplo , MKL_INT *n , MKL_INT *nrhs , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *af , MKL_INT *iaf , MKL_INT *jaf , MKL_INT *descaf , float
*b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , float *x , MKL_INT *ix , MKL_INT *jx ,
MKL_INT *descx , float *ferr , float *berr , float *work , MKL_INT *lwork , MKL_INT
*iwork , MKL_INT *liwork , MKL_INT *info );
void pdporfs (char *uplo , MKL_INT *n , MKL_INT *nrhs , double *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *af , MKL_INT *iaf , MKL_INT *jaf , MKL_INT
*descaf , double *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , double *x , MKL_INT
*ix , MKL_INT *jx , MKL_INT *descx , double *ferr , double *berr , double *work ,
MKL_INT *lwork , MKL_INT *iwork , MKL_INT *liwork , MKL_INT *info );
void pcporfs (char *uplo , MKL_INT *n , MKL_INT *nrhs , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *af , MKL_INT *iaf , MKL_INT *jaf , MKL_INT
*descaf , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_Complex8
*x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , float *ferr , float *berr ,
MKL_Complex8 *work , MKL_INT *lwork , float *rwork , MKL_INT *lrwork , MKL_INT *info );
void pzporfs (char *uplo , MKL_INT *n , MKL_INT *nrhs , MKL_Complex16 *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *af , MKL_INT *iaf , MKL_INT *jaf ,
MKL_INT *descaf , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb ,
MKL_Complex16 *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , double *ferr , double
*berr , MKL_Complex16 *work , MKL_INT *lwork , double *rwork , MKL_INT *lrwork ,
MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?porfs function improves the computed solution to the system of linear equations

sub(A)*sub(X) = sub(B),
where sub(A) = A(ia:ia+n-1, ja:ja+n-1) is a real symmetric or complex Hermitian positive definite
distributed matrix and
sub(B) = B(ib:ib+n-1, jb:jb+nrhs-1),

sub(X) = X(ix:ix+n-1, jx:jx+nrhs-1)

are right-hand side and solution submatrices, respectively. This function also provides error bounds and
backward error estimates for the solution.

Input Parameters

uplo (global) Must be 'U' or 'L'.

Specifies whether the upper or lower triangular part of the symmetric/Hermitian
matrix sub(A) is stored.
If uplo = 'U', the upper triangle of sub(A) is stored. If uplo = 'L', the lower
triangle of sub(A) is stored.

n (global) The order of the distributed matrix sub(A) (n≥0).

nrhs (global) The number of right-hand sides, i.e., the number of columns of the
matrices sub(B) and sub(X) (nrhs≥0).

a, af, b, x (local)
Pointers into the local memory to arrays of local sizes
a: lld_a * LOCc(ja+n-1),
af: lld_af * LOCc(jaf+n-1),
b: lld_b * LOCc(jb+nrhs-1),
x: lld_x * LOCc(jx+nrhs-1).
The array a contains the local pieces of the n-by-n symmetric/Hermitian
distributed matrix sub(A).
If uplo = 'U', the leading n-by-n upper triangular part of sub(A) contains
the upper triangular part of the matrix, and its strictly lower triangular part
is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of sub(A) contains
the lower triangular part of the distributed matrix, and its strictly upper
triangular part is not referenced.
The array af contains the factors L or U from the Cholesky factorization
sub(A) = L*L^H or sub(A) = U^H*U, as computed by p?potrf.

On entry, the array b contains the local pieces of the distributed matrix of
right-hand sides sub(B).


On entry, the array x contains the local pieces of the solution vectors
sub(X).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

iaf, jaf (global) The row and column indices in the global matrix AF indicating the
first row and the first column of the matrix sub(AF), respectively.

descaf (global and local) array of size dlen_. The array descriptor for the
distributed matrix AF.

ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of the matrix sub(B), respectively.

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.

ix, jx (global) The row and column indices in the global matrix X indicating the
first row and the first column of the matrix sub(X), respectively.

descx (global and local) array of size dlen_. The array descriptor for the
distributed matrix X.

work (local)
The array work of size lwork is a workspace array.

lwork (local) The size of the array work.

For real flavors:

lwork must be at least 3*LOCr(n+mod(ia-1,mb_a)).
For complex flavors:
lwork must be at least 2*LOCr(n+mod(ia-1,mb_a)).

NOTE
mod(x,y) is the integer remainder of x/y.

iwork (local) Workspace array of size liwork. Used in real flavors only.

liwork (local or global) The size of the array iwork; used in real flavors only. Must
be at least
liwork≥LOCr(n+mod(ib-1,mb_b)).

rwork (local)
Workspace array of size lrwork. Used in complex flavors only.

lrwork (local or global) The size of the array rwork; used in complex flavors only.
Must be at least lrwork ≥ LOCr(n+mod(ib-1,mb_b)).

Output Parameters

x On exit, contains the improved solution vectors.

ferr, berr Arrays of size LOCc(jb+nrhs-1) each.

The array ferr contains the estimated forward error bound for each
solution vector of sub(X).
If XTRUE is the true solution corresponding to sub(X), ferr is an estimated
upper bound for the magnitude of the largest element in (sub(X) - XTRUE)
divided by the magnitude of the largest element in sub(X). The estimate is
as reliable as the estimate for rcond, and is almost always a slight
overestimate of the true error.
This array is tied to the distributed matrix X.
The array berr contains the component-wise relative backward error of
each solution vector (that is, the smallest relative change in any entry of
sub(A) or sub(B) that makes sub(X) an exact solution). This array is tied to
the distributed matrix X.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

iwork[0] On exit, iwork[0] contains the minimum value of liwork required for
optimum performance (for real flavors).

rwork[0] On exit, rwork[0] contains the minimum value of lrwork required for
optimum performance (for complex flavors).

info (global) If info=0, the execution is successful.

info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.
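To make the two error measures concrete, the serial sketch below computes, for one dense right-hand side, the component-wise relative backward error that berr reports, max_i |b - A*x|_i / (|A|*|x| + |b|)_i, and the relative forward error max|x - XTRUE| / max|x| that ferr bounds. The library estimates ferr without knowing XTRUE; here XTRUE is known only because the test problem is built from it. The function names are hypothetical, and rows with a zero denominator are simply skipped, which is a simplification.

```c
#include <math.h>

/* Component-wise relative backward error of x for A*x = b
 * (dense, row-major n-by-n): the quantity reported in berr. */
static double backward_error(int n, const double *A, const double *x, const double *b)
{
    double berr = 0.0;
    for (int i = 0; i < n; ++i) {
        double r = b[i], den = fabs(b[i]);
        for (int j = 0; j < n; ++j) {
            r -= A[i * n + j] * x[j];                  /* residual component */
            den += fabs(A[i * n + j]) * fabs(x[j]);    /* (|A|*|x| + |b|)_i */
        }
        if (den > 0.0 && fabs(r) / den > berr)
            berr = fabs(r) / den;
    }
    return berr;
}

/* Relative forward error max|x - xtrue| / max|x|: what ferr bounds. */
static double forward_error(int n, const double *x, const double *xtrue)
{
    double num = 0.0, den = 0.0;
    for (int i = 0; i < n; ++i) {
        if (fabs(x[i] - xtrue[i]) > num) num = fabs(x[i] - xtrue[i]);
        if (fabs(x[i]) > den) den = fabs(x[i]);
    }
    return den > 0.0 ? num / den : 0.0;
}
```

For the symmetric positive definite matrix A = {4, 1, 1, 3} with XTRUE = {1, 2} and b = A*XTRUE = {6, 7}, the perturbed solution x = {1, 2.5} has a backward error of 1.5/15.5 (about 0.097) and a relative forward error of 0.2.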

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?trrfs
Provides error bounds and backward error estimates
for the solution to a system of linear equations with a
distributed triangular coefficient matrix.

Syntax
void pstrrfs (char *uplo , char *trans , char *diag , MKL_INT *n , MKL_INT *nrhs , float
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *jb ,
MKL_INT *descb , float *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , float *ferr ,
float *berr , float *work , MKL_INT *lwork , MKL_INT *iwork , MKL_INT *liwork , MKL_INT
*info );
void pdtrrfs (char *uplo , char *trans , char *diag , MKL_INT *n , MKL_INT *nrhs ,
double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib ,
MKL_INT *jb , MKL_INT *descb , double *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx ,
double *ferr , double *berr , double *work , MKL_INT *lwork , MKL_INT *iwork , MKL_INT
*liwork , MKL_INT *info );


void pctrrfs (char *uplo , char *trans , char *diag , MKL_INT *n , MKL_INT *nrhs ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b ,
MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_Complex8 *x , MKL_INT *ix , MKL_INT
*jx , MKL_INT *descx , float *ferr , float *berr , MKL_Complex8 *work , MKL_INT *lwork ,
float *rwork , MKL_INT *lrwork , MKL_INT *info );
void pztrrfs (char *uplo , char *trans , char *diag , MKL_INT *n , MKL_INT *nrhs ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b ,
MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_Complex16 *x , MKL_INT *ix , MKL_INT
*jx , MKL_INT *descx , double *ferr , double *berr , MKL_Complex16 *work , MKL_INT
*lwork , double *rwork , MKL_INT *lrwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?trrfs function provides error bounds and backward error estimates for the solution to one of the
systems of linear equations
sub(A)*sub(X) = sub(B),
sub(A)^T*sub(X) = sub(B), or
sub(A)^H*sub(X) = sub(B),
where sub(A) = A(ia:ia+n-1, ja:ja+n-1) is a triangular matrix,

sub(B) = B(ib:ib+n-1, jb:jb+nrhs-1), and

sub(X) = X(ix:ix+n-1, jx:jx+nrhs-1).

The solution matrix X must be computed by p?trtrs or some other means before entering this function. The
function p?trrfs does not do iterative refinement because doing so cannot improve the backward error.
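Because p?trrfs expects sub(X) to have been computed already, it may help to see what such a triangular solve does with the uplo and diag conventions described under Input Parameters. The serial sketch below solves a dense triangular system by substitution for the trans = 'N' case only. It illustrates the conventions and is not a replacement for p?trtrs; the function name and the row-major layout are assumptions.

```c
/* Serial triangular solve A*x = b (dense, row-major n-by-n), trans = 'N':
 *   uplo 'U': only the upper triangle of A is read; 'L': only the lower.
 *   diag 'U': the diagonal is not read and is taken to be 1.
 * x holds b on entry and the solution on exit.
 * Returns k > 0 if the k-th diagonal element (1-based) is exactly zero. */
static int tri_solve(char uplo, char diag, int n, const double *A, double *x)
{
    int unit = (diag == 'U');
    if (uplo == 'U') {
        for (int i = n - 1; i >= 0; --i) {          /* back substitution */
            for (int j = i + 1; j < n; ++j) x[i] -= A[i * n + j] * x[j];
            if (!unit) {
                if (A[i * n + i] == 0.0) return i + 1;
                x[i] /= A[i * n + i];
            }
        }
    } else {
        for (int i = 0; i < n; ++i) {               /* forward substitution */
            for (int j = 0; j < i; ++j) x[i] -= A[i * n + j] * x[j];
            if (!unit) {
                if (A[i * n + i] == 0.0) return i + 1;
                x[i] /= A[i * n + i];
            }
        }
    }
    return 0;
}
```

Entries outside the selected triangle are never read, which mirrors the "not referenced" wording for a: in the upper case, the value stored below the diagonal has no effect on the result.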

Input Parameters

uplo (global) Must be 'U' or 'L'.

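If uplo = 'U', sub(A) is upper triangular. If uplo = 'L', sub(A) is lower
triangular.

trans (global) Must be 'N' or 'T' or 'C'.

Specifies the form of the system of equations:

If trans = 'N', the system has the form sub(A)*sub(X) = sub(B) (No
transpose);
If trans = 'T', the system has the form sub(A)^T*sub(X) = sub(B)
(Transpose);
If trans = 'C', the system has the form sub(A)^H*sub(X) = sub(B)
(Conjugate transpose).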

diag Must be 'N' or 'U'.

If diag = 'N', then sub(A) is non-unit triangular.

If diag = 'U', then sub(A) is unit triangular.

n (global) The order of the distributed matrix sub(A) (n≥0).

nrhs (global) The number of right-hand sides, that is, the number of columns of
the matrices sub(B) and sub(X) (nrhs≥0).

a, b, x (local)
Pointers into the local memory to arrays of local sizes
a: lld_a * LOCc(ja+n-1),
b: lld_b * LOCc(jb+nrhs-1),
x: lld_x * LOCc(jx+nrhs-1).
The array a contains the local pieces of the original triangular distributed
matrix sub(A).
If uplo = 'U', the leading n-by-n upper triangular part of sub(A) contains
the upper triangular part of the matrix, and its strictly lower triangular part
is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of sub(A) contains
the lower triangular part of the distributed matrix, and its strictly upper
triangular part is not referenced.
If diag = 'U', the diagonal elements of sub(A) are also not referenced
and are assumed to be 1.
On entry, the array b contains the local pieces of the distributed matrix of
right-hand sides sub(B).
On entry, the array x contains the local pieces of the solution vectors
sub(X).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of the matrix sub(B), respectively.

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.

ix, jx (global) The row and column indices in the global matrix X indicating the
first row and the first column of the matrix sub(X), respectively.

descx (global and local) array of size dlen_. The array descriptor for the
distributed matrix X.

work (local)
The array work of size lwork is a workspace array.

lwork (local) The size of the array work.

For real flavors:

lwork must be at least 3*LOCr(n+mod(ia-1,mb_a)).
For complex flavors:
lwork must be at least 2*LOCr(n+mod(ia-1,mb_a)).

NOTE
mod(x,y) is the integer remainder of x/y.

iwork (local) Workspace array of size liwork. Used in real flavors only.

liwork (local or global) The size of the array iwork; used in real flavors only. Must
be at least
liwork≥LOCr(n+mod(ib-1,mb_b)).

rwork (local)
Workspace array of size lrwork. Used in complex flavors only.

lrwork (local or global) The size of the array rwork; used in complex flavors only.
Must be at least lrwork ≥ LOCr(n+mod(ib-1,mb_b)).

Output Parameters

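ferr, berr Arrays of size LOCc(jb+nrhs-1) each.

The array ferr contains the estimated forward error bound for each
solution vector of sub(X). The array berr contains the component-wise
relative backward error of each solution vector. Both arrays are tied to
the distributed matrix X.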

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

iwork[0] On exit, iwork[0] contains the minimum value of liwork required for
optimum performance (for real flavors).

rwork[0] On exit, rwork[0] contains the minimum value of lrwork required for
optimum performance (for complex flavors).

info (global) If info=0, the execution is successful.

info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.


Matrix Inversion: ScaLAPACK Computational Routines

This section describes ScaLAPACK routines that compute the inverse of a matrix based on a previously
obtained factorization. Note that it is not recommended to solve a system of equations Ax = b by first
computing A^-1 and then forming the matrix-vector product x = A^-1*b. Call a solver routine instead (see
Solving Systems of Linear Equations); this is more efficient and more accurate.

p?getri
Computes the inverse of an LU-factored distributed
matrix.

Syntax
void psgetri (MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca ,
MKL_INT *ipiv , float *work , MKL_INT *lwork , MKL_INT *iwork , MKL_INT *liwork ,
MKL_INT *info );
void pdgetri (MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca ,
MKL_INT *ipiv , double *work , MKL_INT *lwork , MKL_INT *iwork , MKL_INT *liwork ,
MKL_INT *info );
void pcgetri (MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , MKL_INT *ipiv , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT *iwork , MKL_INT
*liwork , MKL_INT *info );
void pzgetri (MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , MKL_INT *ipiv , MKL_Complex16 *work , MKL_INT *lwork , MKL_INT *iwork ,
MKL_INT *liwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?getri function computes the inverse of a general distributed matrix sub(A) = A(ia:ia+n-1, ja:ja+n-1)
using the LU factorization computed by p?getrf. This method inverts U and then computes the
inverse of sub(A) by solving the system
inv(sub(A))*L = inv(U)
for inv(sub(A)).
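The system above follows directly from the factorization. In the short derivation below, P is read as the row permutation recorded in ipiv; the final step, which applies the permutation as column interchanges afterwards, is our reading of the method (it is how LAPACK's ?getri proceeds) rather than something stated in this description.

```latex
\begin{aligned}
\operatorname{sub}(A) &= P\,L\,U \\
I &= \operatorname{sub}(A)^{-1} P\,L\,U \\
\left(\operatorname{sub}(A)^{-1} P\right) L &= U^{-1} \\
\operatorname{sub}(A)^{-1} &= W P^{T}, \quad \text{where } W L = U^{-1}
\end{aligned}
```

Solving W*L = inv(U) therefore yields inv(sub(A)) up to a permutation of its columns, which is why only the triangular factors and the pivot information are needed.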

Input Parameters

n (global) The number of rows and columns to be operated on, that is, the
order of the distributed matrix sub(A) (n≥0).

a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).

On entry, the array a contains the local pieces of the factors L and U obtained by
the factorization sub(A) = P*L*U computed by p?getrf.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.


work (local)
The array work of size lwork is a workspace array.

lwork (local) The size of the array work. lwork must be at least

lwork≥LOCr(n+mod(ia-1,mb_a))*nb_a.

NOTE
mod(x,y) is the integer remainder of x/y.

The array work is used to keep at most an entire column block of sub(A).

iwork (local) Workspace array used for physically transposing the pivots, size
liwork.

liwork (local or global) The size of the array iwork.

The minimal value of liwork is determined by the following code:

if NPROW == NPCOL then
    liwork = LOCc(n_a + mod(ja-1,nb_a)) + nb_a
else
    liwork = LOCc(n_a + mod(ja-1,nb_a)) +
             max(ceil(ceil(LOCr(m_a)/mb_a)/(lcm/NPROW)), nb_a)
end if

where lcm is the least common multiple of the numbers of process rows and columns
(NPROW and NPCOL).
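The liwork rule above can be transcribed directly into C. The sketch re-implements the NUMROC algorithm for LOCr/LOCc and uses integer ceiling division. Assumptions: int stands in for MKL_INT, all helper names are hypothetical, and the source process row and column (rsrc, csrc) are passed in by the caller; in a real program they, like mb_a and nb_a, come from desca and the BLACS grid.

```c
/* Plain-C version of the NUMROC algorithm (local share of a block-cyclic dimension). */
static int numroc_sketch(int n, int nb, int iproc, int isrcproc, int nprocs)
{
    int mydist = (nprocs + iproc - isrcproc) % nprocs;
    int nblocks = n / nb;
    int num = (nblocks / nprocs) * nb;
    int extrablks = nblocks % nprocs;
    if (mydist < extrablks)
        num += nb;
    else if (mydist == extrablks)
        num += n % nb;
    return num;
}

static int gcd_int(int a, int b) { while (b != 0) { int t = a % b; a = b; b = t; } return a; }
static int lcm_int(int a, int b) { return a / gcd_int(a, b) * b; }
static int ceil_div(int a, int b) { return (a + b - 1) / b; }   /* a >= 0, b > 0 */

/* Minimal liwork for p?getri on process (myrow, mycol), following the rule above. */
static int getri_min_liwork(int m_a, int n_a, int mb_a, int nb_a, int ja,
                            int myrow, int mycol, int rsrc, int csrc,
                            int nprow, int npcol)
{
    int locc = numroc_sketch(n_a + (ja - 1) % nb_a, nb_a, mycol, csrc, npcol);
    if (nprow == npcol)
        return locc + nb_a;
    int locr = numroc_sketch(m_a, mb_a, myrow, rsrc, nprow);
    int lcm = lcm_int(nprow, npcol);
    int t = ceil_div(ceil_div(locr, mb_a), lcm / nprow);
    return locc + (t > nb_a ? t : nb_a);
}
```

For a 1000-by-1000 matrix with 64-by-64 blocks, process (0,0) needs liwork ≥ 576 on a 2-by-2 grid and liwork ≥ 424 on a 2-by-3 grid.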

Output Parameters

ipiv (local)
Array of size LOCr(m_a)+ mb_a.

This array contains the pivoting information.

If ipiv[i] = j, then the local row i+1 was swapped with the global row
j, where i = 0, ... , LOCr(m_a) + mb_a - 1.

This array is tied to the distributed matrix A.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

iwork[0] On exit, iwork[0] contains the minimum value of liwork required for
optimum performance.

info (global) If info=0, the execution is successful.

info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.

info > 0:
If info = i, the matrix element U(i,i) is exactly zero; the matrix is singular
and its inverse could not be computed.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?potri
Computes the inverse of a symmetric/Hermitian
positive definite distributed matrix.

Syntax
void pspotri (char *uplo , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , MKL_INT *info );
void pdpotri (char *uplo , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , MKL_INT *info );
void pcpotri (char *uplo , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *info );
void pzpotri (char *uplo , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?potri function computes the inverse of a real symmetric or complex Hermitian positive definite
distributed matrix sub(A) = A(ia:ia+n-1, ja:ja+n-1) using the Cholesky factorization sub(A) = U^H*U or
sub(A) = L*L^H computed by p?potrf.
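Only the triangular Cholesky factor is needed because the inverse can be written in terms of the inverse of that factor:

```latex
\operatorname{sub}(A) = U^{H} U \;\Rightarrow\; \operatorname{sub}(A)^{-1} = U^{-1}\left(U^{-1}\right)^{H},
\qquad
\operatorname{sub}(A) = L L^{H} \;\Rightarrow\; \operatorname{sub}(A)^{-1} = \left(L^{-1}\right)^{H} L^{-1}.
```

The result is again symmetric/Hermitian, so only one triangle needs to be stored. In the reference ScaLAPACK source this is done by inverting the factor with p?trtri and then forming the product with p?lauum; treat that as background on the reference implementation, not as a documented property of the MKL routine.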

Input Parameters

uplo (global) Must be 'U' or 'L'.

Specifies whether the upper or lower triangular part of the symmetric/Hermitian
matrix sub(A) is stored.
If uplo = 'U', the upper triangle of sub(A) is stored. If uplo = 'L', the lower
triangle of sub(A) is stored.

n (global) The number of rows and columns to be operated on, that is, the
order of the distributed matrix sub(A) (n≥0).

a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).

On entry, the array a contains the local pieces of the triangular factor U or L
from the Cholesky factorization sub(A) = U^H*U, or sub(A) = L*L^H, as
computed by p?potrf.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.


desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

Output Parameters

a On exit, overwritten by the local pieces of the upper or lower triangle of the
(symmetric/Hermitian) inverse of sub(A).

info (global) If info=0, the execution is successful.

info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.

info> 0:
If info = i, the element (i, i) of the factor U or L is zero, and the inverse
could not be computed.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?trtri
Computes the inverse of a triangular distributed
matrix.

Syntax
void pstrtri (char *uplo , char *diag , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , MKL_INT *info );
void pdtrtri (char *uplo , char *diag , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , MKL_INT *info );
void pctrtri (char *uplo , char *diag , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_INT *info );
void pztrtri (char *uplo , char *diag , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?trtri function computes the inverse of a real or complex upper or lower triangular distributed matrix
sub(A) = A(ia:ia+n-1, ja:ja+n-1).
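For intuition about what p?trtri computes, the serial sketch below inverts a small dense upper triangular matrix column by column from U*X = I. It returns the 1-based index of the first exactly zero diagonal element, mirroring the info > 0 convention documented under Output Parameters. Only the upper, non-unit case is shown; the function name and the row-major layout are assumptions.

```c
/* Invert a dense row-major n-by-n upper triangular matrix U into X.
 * Entries of X below the diagonal are set to zero.
 * Returns k > 0 if U(k,k) (1-based) is exactly zero, 0 on success. */
static int upper_tri_inverse(int n, const double *U, double *X)
{
    for (int k = 0; k < n; ++k)
        if (U[k * n + k] == 0.0) return k + 1;      /* singular: no inverse */
    for (int i = 0; i < n * n; ++i) X[i] = 0.0;
    for (int j = 0; j < n; ++j) {
        X[j * n + j] = 1.0 / U[j * n + j];
        for (int i = j - 1; i >= 0; --i) {          /* row i of U times column j of X = 0 */
            double s = 0.0;
            for (int k = i + 1; k <= j; ++k) s += U[i * n + k] * X[k * n + j];
            X[i * n + j] = -s / U[i * n + i];
        }
    }
    return 0;
}
```

For U = {2, 1, 0, 4} the sketch produces X = {0.5, -0.125, 0, 0.25}, which is again upper triangular, as the inverse of an upper triangular matrix always is.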

Input Parameters

uplo (global) Must be 'U' or 'L'.

Specifies whether the distributed matrix sub(A) is upper or lower triangular.

If uplo = 'U', sub(A) is upper triangular.

If uplo = 'L', sub(A) is lower triangular.

diag Must be 'N' or 'U'.

Specifies whether or not the distributed matrix sub(A) is unit triangular.

If diag = 'N', then sub(A) is non-unit triangular.

If diag = 'U', then sub(A) is unit triangular.

n (global) The number of rows and columns to be operated on, that is, the
order of the distributed matrix sub(A) (n≥0).

a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).

The array a contains the local pieces of the triangular distributed matrix
sub(A).
If uplo = 'U', the leading n-by-n upper triangular part of sub(A) contains
the upper triangular matrix to be inverted, and the strictly lower triangular
part of sub(A) is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of sub(A) contains
the lower triangular matrix, and the strictly upper triangular part of sub(A)
is not referenced.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

Output Parameters

a On exit, overwritten by the (triangular) inverse of the original matrix.

info (global) If info=0, the execution is successful.

info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.

info> 0:
If info = k, the matrix element A(ia+k-1, ja+k-1) is exactly zero. The
triangular matrix sub(A) is singular and its inverse cannot be computed.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

Matrix Equilibration: ScaLAPACK Computational Routines

ScaLAPACK routines described in this section are used to compute scaling factors needed to equilibrate a
matrix. Note that these routines do not actually scale the matrices.

p?geequ
Computes row and column scaling factors intended to
equilibrate a general rectangular distributed matrix
and reduce its condition number.


Syntax
void psgeequ (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *r , float *c , float *rowcnd , float *colcnd , float *amax , MKL_INT
*info );
void pdgeequ (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , double *r , double *c , double *rowcnd , double *colcnd , double *amax ,
MKL_INT *info );
void pcgeequ (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , float *r , float *c , float *rowcnd , float *colcnd , float *amax ,
MKL_INT *info );
void pzgeequ (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , double *r , double *c , double *rowcnd , double *colcnd , double
*amax , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?geequ function computes row and column scalings intended to equilibrate an m-by-n distributed matrix
sub(A) = A(ia:ia+m-1, ja:ja+n-1) and reduce its condition number. The output array r returns the row
scale factors r_i, and the array c returns the column scale factors c_j. These factors are chosen to try to make
the largest element in each row and column of the matrix B with elements b_ij = r_i*a_ij*c_j have absolute value 1.
r_i and c_j are restricted to be between SMLNUM = smallest safe number and BIGNUM = largest safe number.
Use of these scaling factors is not guaranteed to reduce the condition number of sub(A) but works well in
practice.
SMLNUM and BIGNUM are machine-dependent parameters determined by the floating-point precision in use. You can use the ?lamch routines to
compute them. For example, compute single precision values of SMLNUM and BIGNUM as follows:

SMLNUM = slamch ('s')

BIGNUM = 1 / SMLNUM
The auxiliary function p?laqge uses scaling factors computed by p?geequ to scale a general rectangular
matrix.

Input Parameters

m (global) The number of rows to be operated on, that is, the number of rows
of the distributed matrix sub(A) (m≥ 0).

n (global) The number of columns to be operated on, that is, the number of
columns of the distributed matrix sub(A) (n≥ 0).

a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).

The array a contains the local pieces of the m-by-n distributed matrix whose
equilibration factors are to be computed.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

Output Parameters

r, c (local)
Arrays of sizes LOCr(m_a) and LOCc(n_a), respectively.

If info = 0 or info > ia+m-1, r[i] contains the row scale factors for sub(A)
for ia-1 ≤ i < ia+m-1. r is aligned with the distributed matrix A and
replicated across every process column. r is tied to the distributed matrix
A.
If info = 0, c[i] contains the column scale factors for sub(A) for ja-1 ≤
i < ja+n-1. c is aligned with the distributed matrix A and replicated down
every process row. c is tied to the distributed matrix A.

rowcnd, colcnd (global)

If info = 0 or info > ia+m-1, rowcnd contains the ratio of the smallest r_i
to the largest r_i (ia ≤ i ≤ ia+m-1). If rowcnd ≥ 0.1 and amax is neither too
large nor too small, it is not worth scaling by r_i.

If info = 0, colcnd contains the ratio of the smallest c_j to the largest c_j
(ja ≤ j ≤ ja+n-1).

If colcnd ≥ 0.1, it is not worth scaling by c_j.

amax (global)
Absolute value of the largest matrix element. If amax is very close to
overflow or very close to underflow, the matrix should be scaled.

info (global) If info=0, the execution is successful.

info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.

info > 0:
If info = i and
i ≤ m, the i-th row of the distributed matrix sub(A) is exactly zero;
i > m, the (i - m)-th column of the distributed matrix sub(A) is exactly zero.
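The guidance on rowcnd, colcnd, and amax can be turned into a small decision helper. The sketch below mirrors the logic of the LAPACK auxiliary routine ?laqge as we understand it: the threshold 0.1 comes from the text above, while the "too small / too large" bounds (safe minimum divided by precision, and its reciprocal) are an assumption taken from LAPACK; check the p?laqge documentation before relying on the exact values. The function name is hypothetical.

```c
#include <float.h>

/* Which scalings are worth applying:
 * 'N' none, 'R' rows only, 'C' columns only, 'B' both. */
static char equilibration_choice(double rowcnd, double colcnd, double amax)
{
    const double thresh = 0.1;
    const double small_val = DBL_MIN / DBL_EPSILON;   /* assumed "too small" bound */
    const double large_val = 1.0 / small_val;         /* assumed "too large" bound */
    int rows_ok = rowcnd >= thresh && amax >= small_val && amax <= large_val;
    int cols_ok = colcnd >= thresh;
    if (rows_ok)
        return cols_ok ? 'N' : 'C';
    return cols_ok ? 'R' : 'B';
}
```

For instance, rowcnd = 0.04 with colcnd = 0.5 and a moderate amax means only the rows are worth scaling, so the helper returns 'R'.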

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?poequ
Computes row and column scaling factors intended to
equilibrate a symmetric (Hermitian) positive definite
distributed matrix and reduce its condition number.


Syntax
void pspoequ (MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float
*sr , float *sc , float *scond , float *amax , MKL_INT *info );
void pdpoequ (MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca ,
double *sr , double *sc , double *scond , double *amax , MKL_INT *info );
void pcpoequ (MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *sr , float *sc , float *scond , float *amax , MKL_INT *info );
void pzpoequ (MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , double *sr , double *sc , double *scond , double *amax , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?poequ function computes row and column scalings intended to equilibrate a real symmetric or
complex Hermitian positive definite distributed matrix sub(A) = A(ia:ia+n-1, ja:ja+n-1) and reduce its
condition number (with respect to the two-norm). The output arrays sr and sc return the row and column
scale factors.

These factors are chosen so that the scaled distributed matrix B with elements bij=s(i)*aij*s(j) has ones on
the diagonal, that is, s(i) = 1/sqrt(aii).
This choice of sr and sc puts the condition number of B within a factor n of the smallest possible condition
number over all possible diagonal scalings.
The auxiliary function p?laqsy uses scaling factors computed by p?poequ to scale a symmetric/Hermitian
distributed matrix.

Input Parameters

n (global) The number of rows and columns to be operated on, that is, the
order of the distributed matrix sub(A) (n≥0).

a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).

The array a contains the n-by-n symmetric/Hermitian positive definite
distributed matrix sub(A) whose scaling factors are to be computed. Only
the diagonal elements of sub(A) are referenced.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

Output Parameters

sr, sc (local)

Arrays of sizes LOCr(m_a) and LOCc(n_a), respectively.

If info = 0, the array sr(ia:ia+n-1) contains the row scale factors for
sub(A). sr is aligned with the distributed matrix A, and replicated across
every process column. sr is tied to the distributed matrix A.

If info = 0, the array sc(ja:ja+n-1) contains the column scale factors
for sub(A). sc is aligned with the distributed matrix A, and replicated down
every process row. sc is tied to the distributed matrix A.

scond (global)

If info = 0, scond contains the ratio of the smallest sr[i] (or sc[j]) to
the largest sr[i] (or sc[j]), with ia-1 ≤ i < ia+n-1 and ja-1 ≤ j < ja+n-1.

If scond ≥ 0.1 and amax is neither too large nor too small, it is not worth
scaling by sr (or sc).
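The "not worth scaling" rule can be expressed as a small predicate. This is a minimal sketch assuming the thresholds used by reference LAPACK's ?laqsy: scond ≥ 0.1, and amax between small = (safe minimum)/(precision) and its reciprocal, which for double precision is taken here to be DBL_MIN / DBL_EPSILON. MKL's own cutoffs for "too large" and "too small" are not stated in this reference, and equilibration_needed is a hypothetical name.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>

/* Hypothetical predicate: true when scaling by sr (or sc) looks worthwhile.
   Thresholds are assumed to mirror reference LAPACK's ?laqsy. */
static bool equilibration_needed(double scond, double amax)
{
    assert(scond >= 0.0 && amax >= 0.0);
    const double thresh = 0.1;                        /* ratio threshold */
    const double small_amax = DBL_MIN / DBL_EPSILON;  /* "too small" bound */
    const double large_amax = 1.0 / small_amax;       /* "too large" bound */
    return !(scond >= thresh && amax >= small_amax && amax <= large_amax);
}
```

A well-conditioned diagonal (scond = 0.5, amax = 1) needs no scaling, while scond = 0.05, or an amax near the underflow or overflow range, does.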

amax (global)
Absolute value of the largest matrix element. If amax is very close to
overflow or very close to underflow, the matrix should be scaled.

info (global)
If info=0, the execution is successful.

info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.

info> 0:
If info = k, the k-th diagonal entry of sub(A) is nonpositive.
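The negative info encoding can be decoded mechanically. This is a minimal sketch using the hypothetical helper name decode_illegal_arg; it assumes the entry number j is below 100, which holds for array descriptor entries.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an MKL routine) that decodes the convention
   info = -(i*100 + j) for entry j of array argument i, and info = -i for
   scalar argument i. Positions are 1-based; *entry is 0 for a scalar.
   Returns 0, or -1 if info does not report an illegal argument. */
static int decode_illegal_arg(long info, long *arg, long *entry)
{
    assert(arg != NULL && entry != NULL);
    if (info >= 0)
        return -1;
    long code = -info;
    if (code >= 100) {
        *arg = code / 100;      /* array argument i */
        *entry = code % 100;    /* entry j, stored at index j - 1 */
    } else {
        *arg = code;            /* scalar argument i */
        *entry = 0;
    }
    return 0;
}
```

For example, desca is the fifth argument of p?poequ, so info = -502 would point to desca[1] (entry j = 2), whereas info = -3 would flag the scalar ia.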

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

Orthogonal Factorizations: ScaLAPACK Computational Routines

This section describes the ScaLAPACK routines for the QR(RQ) and LQ(QL) factorization of matrices. Routines
for the RZ factorization as well as for generalized QR and RQ factorizations are also included. For the
mathematical definition of the factorizations, see the respective LAPACK sections or refer to [SLUG].
Table "Computational Routines for Orthogonal Factorizations" lists ScaLAPACK routines that perform
orthogonal factorization of matrices.
Computational Routines for Orthogonal Factorizations
Matrix type, factorization                      Factorize  Factorize  Generate         Apply
                                                without    with       matrix Q         matrix Q
                                                pivoting   pivoting
general matrices, QR factorization              p?geqrf    p?geqpf    p?orgqr/p?ungqr  p?ormqr/p?unmqr
general matrices, RQ factorization              p?gerqf    -          p?orgrq/p?ungrq  p?ormrq/p?unmrq
general matrices, LQ factorization              p?gelqf    -          p?orglq/p?unglq  p?ormlq/p?unmlq
general matrices, QL factorization              p?geqlf    -          p?orgql/p?ungql  p?ormql/p?unmql
trapezoidal matrices, RZ factorization          p?tzrzf    -          -                p?ormrz/p?unmrz
pair of matrices, generalized QR factorization  p?ggqrf    -          -                -
pair of matrices, generalized RQ factorization  p?ggrqf    -          -                -

p?geqrf
Computes the QR factorization of a general m-by-n
matrix.

Syntax
void psgeqrf (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdgeqrf (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );
void pcgeqrf (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT
*info );
void pzgeqrf (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT *lwork , MKL_INT
*info );

Include Files
• mkl_scalapack.h

Description
The p?geqrf function forms the QR factorization of a general m-by-n distributed matrix sub(A) = A(ia:ia+m-1,
ja:ja+n-1) as

sub(A) = Q*R.

Input Parameters

m (global) The number of rows in the distributed matrix sub(A); (m≥ 0).

n (global) The number of columns in the distributed matrix sub(A); (n≥ 0).

a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).

Contains the local pieces of the distributed matrix sub(A) to be factored.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A(ia:ia+m-1, ja:ja+n-1),
respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

work (local).
Workspace array of size lwork.

lwork (local or global) Size of work; must be at least lwork ≥ nb_a*(mp0+nq0+nb_a), where
iroff = mod(ia-1, mb_a), icoff = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
mp0 = numroc(m+iroff, mb_a, MYROW, iarow, NPROW),
nq0 = numroc(n+icoff, nb_a, MYCOL, iacol, NPCOL), and numroc and
indxg2p are ScaLAPACK tool functions; MYROW, MYCOL, NPROW and NPCOL
can be determined by calling the function blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.
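The workspace bound above can be evaluated locally on each process. To stay self-contained, the sketch below re-implements numroc and indxg2p from their reference ScaLAPACK definitions under the hypothetical names numroc_ref and indxg2p_ref (the unused MYROW/MYCOL argument of indxg2p is dropped); a real program would call the library's tool functions and obtain MYROW, MYCOL, NPROW and NPCOL from blacs_gridinfo. geqrf_min_lwork is likewise a hypothetical name.

```c
#include <assert.h>

/* Stand-ins for the ScaLAPACK tool functions (hypothetical names).
   Global indices are 1-based; process coordinates are 0-based. */
static long numroc_ref(long n, long nb, long iproc, long isrcproc, long nprocs)
{
    assert(nb > 0 && nprocs > 0);
    long mydist = (nprocs + iproc - isrcproc) % nprocs; /* distance from source */
    long nblocks = n / nb;                              /* whole blocks */
    long num = (nblocks / nprocs) * nb;                 /* share of every process */
    long extrablks = nblocks % nprocs;
    if (mydist < extrablks)
        num += nb;                                      /* one extra whole block */
    else if (mydist == extrablks)
        num += n % nb;                                  /* trailing partial block */
    return num;
}

/* Process row/column owning global index indxglob. */
static long indxg2p_ref(long indxglob, long nb, long isrcproc, long nprocs)
{
    return (isrcproc + (indxglob - 1) / nb) % nprocs;
}

/* Minimum lwork for p?geqrf on process (myrow, mycol): nb_a*(mp0+nq0+nb_a). */
static long geqrf_min_lwork(long m, long n, long ia, long ja,
                            long mb_a, long nb_a, long rsrc_a, long csrc_a,
                            long myrow, long mycol, long nprow, long npcol)
{
    long iroff = (ia - 1) % mb_a, icoff = (ja - 1) % nb_a;
    long iarow = indxg2p_ref(ia, mb_a, rsrc_a, nprow);
    long iacol = indxg2p_ref(ja, nb_a, csrc_a, npcol);
    long mp0 = numroc_ref(m + iroff, mb_a, myrow, iarow, nprow);
    long nq0 = numroc_ref(n + icoff, nb_a, mycol, iacol, npcol);
    return nb_a * (mp0 + nq0 + nb_a);
}
```

For a 100-by-100 sub(A) starting at ia = ja = 1 with 32-by-32 blocks on a 2-by-2 grid, process (0,0) owns 64 rows and 64 columns, so the minimum is 32*(64+64+32) = 5120. Alternatively, the lwork = -1 workspace query returns the size directly.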

Output Parameters

a The elements on and above the diagonal of sub(A) contain the min(m,n)-by-
n upper trapezoidal matrix R (R is upper triangular if m≥n); the elements
below the diagonal, with the array tau, represent the orthogonal/unitary
matrix Q as a product of elementary reflectors (see Application Notes
below).

tau (local)
Array of size LOCc(ja+min(m,n)-1).

Contains the scalar factors of the elementary reflectors. tau is tied to the
distributed matrix A.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

info (global)
= 0, the execution is successful.


< 0, if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.

Application Notes
The matrix Q is represented as a product of elementary reflectors
Q = H(ja)*H(ja+1)*...*H(ja+k-1),

where k = min(m,n).

Each H(i) has the form

H(i) = I - tau*v*v'
where tau is a real/complex scalar, and v is a real/complex vector with v(1:i-1) = 0 and v(i) = 1; v(i+1:m) is
stored on exit in A(ia+i:ia+m-1, ja+i-1), and tau in tau[ja+i-2].

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?geqpf
Computes the QR factorization of a general m-by-n
matrix with pivoting.

Syntax
void psgeqpf (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , MKL_INT *ipiv , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdgeqpf (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , MKL_INT *ipiv , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );
void pcgeqpf (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *ipiv , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT
*lwork , float *rwork , MKL_INT *lrwork , MKL_INT *info );
void pzgeqpf (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *ipiv , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT
*lwork , double *rwork , MKL_INT *lrwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?geqpf function forms the QR factorization with column pivoting of a general m-by-n distributed matrix
sub(A)= A(ia:ia+m-1, ja:ja+n-1) as

sub(A)*P=Q*R.

Input Parameters

m (global) The number of rows in the matrix sub(A) (m≥ 0).

n (global) The number of columns in the matrix sub(A) (n≥ 0).

a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).

Contains the local pieces of the distributed matrix sub(A) to be factored.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A(ia:ia+m-1, ja:ja+n-1),
respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

work (local).
Workspace array of size lwork.

lwork (local or global) Size of work; must be at least:
For real flavors:
lwork ≥ max(3, mp0+nq0) + LOCc(ja+n-1) + nq0.
For complex flavors:
lwork ≥ max(3, mp0+nq0).
Here
iroff = mod(ia-1, mb_a), icoff = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
mp0 = numroc(m+iroff, mb_a, MYROW, iarow, NPROW ),
nq0 = numroc(n+icoff, nb_a, MYCOL, iacol, NPCOL),
LOCc (ja+n-1) = numroc(ja+n-1, nb_a, MYCOL,csrc_a, NPCOL),
and numroc, indxg2p are ScaLAPACK tool functions.

You can determine MYROW, MYCOL, NPROW and NPCOL by calling the
blacs_gridinfo function.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

rwork (local).
Workspace array of size lrwork (complex flavors only).

lrwork (local or global) Size of rwork (complex flavors only). The value of lrwork
must be at least
lrwork ≥ LOCc(ja+n-1) + nq0.
Here
iroff = mod(ia-1, mb_a), icoff = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
mp0 = numroc(m+iroff, mb_a, MYROW, iarow, NPROW ),
nq0 = numroc(n+icoff, nb_a, MYCOL, iacol, NPCOL),


LOCc (ja+n-1) = numroc(ja+n-1, nb_a, MYCOL,csrc_a, NPCOL),

and numroc, indxg2p are ScaLAPACK tool functions.

You can determine MYROW, MYCOL, NPROW and NPCOL by calling the
blacs_gridinfo function.
If lrwork = -1, then lrwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

a The elements on and above the diagonal of sub(A) contain the min(m,n)-by-n
upper trapezoidal matrix R (R is upper triangular if m≥n); the elements
below the diagonal, with the array tau, represent the orthogonal/unitary
matrix Q as a product of elementary reflectors (see Application Notes
below).

ipiv (local) Array of size LOCc(ja+n-1).

If ipiv[i] = k, then the local (i+1)-th column of sub(A)*P was the global k-th
column of sub(A) (0 ≤ i < LOCc(ja+n-1)). ipiv is tied to the distributed
matrix A.

tau (local)
Array of size LOCc(ja+min(m, n)-1).

Contains the scalar factor tau of elementary reflectors. tau is tied to the
distributed matrix A.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

rwork[0] On exit, rwork[0] contains the minimum value of lrwork required for
optimum performance.

info (global)
= 0, the execution is successful.
< 0, if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.

Application Notes
The matrix Q is represented as a product of elementary reflectors
Q = H(1)*H(2)*...*H(k)
where k = min(m,n).

Each H(i) has the form

H(i) = I - tau*v*v'
where tau is a real/complex scalar, and v is a real/complex vector with v(1:i-1) = 0 and v(i) = 1; v(i+1:m) is
stored on exit in A(ia+i:ia+m-1, ja+i-1).

The matrix P is represented in ipiv as follows: if ipiv[j] = i, then the (j+1)-th column of P is the i-th
canonical unit vector (0 ≤ j < LOCc(ja+n-1)).
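On a single process, local and global column numbers coincide, so the ipiv encoding can be applied directly to form sub(A)*P. This is a minimal sketch assuming ia = ja = 1 and 1-based column numbers in ipiv (as the "global k-th column" wording suggests); apply_column_pivots is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical serial sketch: b = a*P, where column j (0-based) of b is
   global column ipiv[j] (assumed 1-based) of a. Both arrays are m-by-n,
   column-major, with leading dimension lda. */
static void apply_column_pivots(int m, int n, const double *a, int lda,
                                const int *ipiv, double *b)
{
    assert(m >= 0 && lda >= (m > 1 ? m : 1));
    for (int j = 0; j < n; ++j) {
        assert(ipiv[j] >= 1 && ipiv[j] <= n);
        memcpy(b + (size_t)j * lda, a + (size_t)(ipiv[j] - 1) * lda,
               (size_t)m * sizeof(double));
    }
}
```

With ipiv = {3, 1, 2}, column 1 of sub(A)*P is column 3 of sub(A), column 2 is column 1, and column 3 is column 2.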

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?orgqr
Generates the orthogonal matrix Q of the QR
factorization formed by p?geqrf.

Syntax
void psorgqr (MKL_INT *m , MKL_INT *n , MKL_INT *k , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdorgqr (MKL_INT *m , MKL_INT *n , MKL_INT *k , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?orgqr function generates the whole or part of the m-by-n real distributed matrix Q, denoting
A(ia:ia+m-1, ja:ja+n-1), with orthonormal columns. Q is defined as the first n columns of a product of k
elementary reflectors of order m

Q = H(1)*H(2)*...*H(k)

as returned by p?geqrf.
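The generation step can be illustrated serially for the real case with ia = ja = 1 on one process. This minimal sketch uses the backward accumulation of LAPACK's unblocked ?org2r, which is an assumption about technique and not a description of the parallel implementation of p?orgqr; generate_q is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical serial sketch (real case). On entry, column i (0-based, i < k)
   of the m-by-n column-major array a holds reflector H(i+1) below its
   diagonal, as left by a QR factorization, and tau[i] holds its scalar.
   On exit, a holds the first n columns of Q = H(1)*H(2)*...*H(k). */
static void generate_q(int m, int n, int k, double *a, int lda, const double *tau)
{
    assert(m >= n && n >= k && k >= 0 && lda >= (m > 1 ? m : 1));
    /* columns k..n-1 start as the corresponding columns of the identity */
    for (int j = k; j < n; ++j) {
        for (int r = 0; r < m; ++r)
            a[r + (size_t)j * lda] = 0.0;
        a[j + (size_t)j * lda] = 1.0;
    }
    /* accumulate backwards: Q := H(i+1)*Q for i = k-1, ..., 0 */
    for (int i = k - 1; i >= 0; --i) {
        double *v = a + (size_t)i * lda;       /* v[i] = 1 is implicit */
        for (int j = i + 1; j < n; ++j) {      /* update columns to the right */
            double *c = a + (size_t)j * lda;
            double w = c[i];
            for (int r = i + 1; r < m; ++r)
                w += v[r] * c[r];
            w *= tau[i];
            c[i] -= w;
            for (int r = i + 1; r < m; ++r)
                c[r] -= w * v[r];
        }
        /* column i becomes H(i+1)*e_i = e_i - tau[i]*v */
        for (int r = i + 1; r < m; ++r)
            v[r] *= -tau[i];
        v[i] = 1.0 - tau[i];
        for (int r = 0; r < i; ++r)
            v[r] = 0.0;
    }
}
```

For example, with one reflector v = (1, 1) and tau[0] = 1 (so H(1) = I - v*v'), generate_q(2, 2, 1, ...) returns Q = [0 -1; -1 0].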

Input Parameters

m (global) The number of rows in the matrix sub(Q) (m≥ 0).

n (global) The number of columns in the matrix sub(Q) (m≥n≥ 0).

k (global) The number of elementary reflectors whose product defines the
matrix Q (n≥k≥0).

a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).
The j-th column of the matrix stored in a must contain the vector that
defines the elementary reflector H(j), ja≤ j ≤ ja+k-1, as returned by
p?geqrf in the k columns of its distributed matrix argument A(ia:*, ja:ja+k-1).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A(ia:ia+m-1, ja:ja+n-1),
respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCc(ja+k-1).


Contains the scalar factor tau[j] of elementary reflectors H(j+1) as
returned by p?geqrf (0 ≤ j < LOCc(ja+k-1)). tau is tied to the
distributed matrix A.

work (local)
Workspace array of size lwork.

lwork (local or global) Size of work; must be at least lwork ≥ nb_a*(nqa0 + mpa0 + nb_a), where
iroffa = mod(ia-1, mb_a), icoffa = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
mpa0 = numroc(m+iroffa, mb_a, MYROW, iarow, NPROW),
nqa0 = numroc(n+icoffa, nb_a, MYCOL, iacol, NPCOL);
indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.

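If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.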

Output Parameters

a Contains the local pieces of the m-by-n distributed matrix Q.

work[0] On exit, work[0] contains the minimum value of lwork required for optimum
performance.

info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?ungqr
Generates the complex unitary matrix Q of the QR
factorization formed by p?geqrf.

Syntax
void pcungqr (MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzungqr (MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT
*lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
This function generates the whole or part of the m-by-n complex distributed matrix Q, denoting
A(ia:ia+m-1, ja:ja+n-1), with orthonormal columns. Q is defined as the first n columns of a product of k
elementary reflectors of order m

Q = H(1)*H(2)*...*H(k)

as returned by p?geqrf.

Input Parameters

m (global) The number of rows in the matrix sub(Q); (m≥0).

n (global) The number of columns in the matrix sub(Q) (m≥n≥0).

k (global) The number of elementary reflectors whose product defines the
matrix Q (n≥k≥0).

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1). The
j-th column of the matrix stored in a must contain the vector that defines
the elementary reflector H(j), ja≤ j≤ ja+k-1, as returned by p?geqrf in
the k columns of its distributed matrix argument A(ia:*, ja:ja+k-1).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCc(ja+k-1).

Contains the scalar factor tau[j] of elementary reflectors H(j+1) as
returned by p?geqrf (0 ≤ j < LOCc(ja+k-1)). tau is tied to the
distributed matrix A.

work (local)
Workspace array of size lwork.

lwork (local or global) Size of work; must be at least lwork ≥ nb_a*(nqa0 + mpa0 + nb_a), where
iroffa = mod(ia-1, mb_a),
icoffa = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
mpa0 = numroc(m+iroffa, mb_a, MYROW, iarow, NPROW),
nqa0 = numroc(n+icoffa, nb_a, MYCOL, iacol, NPCOL)


indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.

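If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.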

Output Parameters

a Contains the local pieces of the m-by-n distributed matrix Q.

work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?ormqr
Multiplies a general matrix by the orthogonal matrix Q
of the QR factorization formed by p?geqrf.

Syntax
void psormqr (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , float
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *tau , float *c , MKL_INT *ic ,
MKL_INT *jc , MKL_INT *descc , float *work , MKL_INT *lwork , MKL_INT *info );
void pdormqr (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , double
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *tau , double *c , MKL_INT
*ic , MKL_INT *jc , MKL_INT *descc , double *work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?ormqr function overwrites the general real m-by-n distributed matrix sub(C) = C(ic:ic+m-1, jc:jc+n-1)
with

                 side = 'L'        side = 'R'
trans = 'N':     Q*sub(C)          sub(C)*Q
trans = 'T':     Q^T*sub(C)        sub(C)*Q^T

where Q is a real orthogonal distributed matrix defined as the product of k elementary reflectors

Q = H(1)*H(2)*...*H(k)

as returned by p?geqrf. Q is of order m if side = 'L' and of order n if side = 'R'.
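Every case in the table reduces to applying elementary reflectors from the left (side = 'L') or from the right (side = 'R'). The minimal serial sketch below, patterned on LAPACK's ?larf for the real case, applies one reflector H = I - tau*v*v'; apply_reflector is a hypothetical name. Because a real H is symmetric, trans = 'T' changes only the order in which H(1), ..., H(k) are applied, not how each one is applied.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical serial sketch (real case): overwrite the m-by-n column-major
   matrix c with H*c (side = 'L', v has length m) or c*H (side = 'R', v has
   length n), where H = I - tau*v*v'. */
static void apply_reflector(char side, int m, int n, const double *v,
                            double tau, double *c, int ldc)
{
    assert((side == 'L' || side == 'R') && ldc >= (m > 1 ? m : 1));
    if (side == 'L') {
        for (int j = 0; j < n; ++j) {          /* column j: c_j -= tau*(v'*c_j)*v */
            double *cj = c + (size_t)j * ldc;
            double w = 0.0;
            for (int i = 0; i < m; ++i)
                w += v[i] * cj[i];
            for (int i = 0; i < m; ++i)
                cj[i] -= tau * w * v[i];
        }
    } else {
        for (int i = 0; i < m; ++i) {          /* row i: r_i -= tau*(r_i*v)*v' */
            double w = 0.0;
            for (int j = 0; j < n; ++j)
                w += c[i + (size_t)j * ldc] * v[j];
            for (int j = 0; j < n; ++j)
                c[i + (size_t)j * ldc] -= tau * w * v[j];
        }
    }
}
```

Q*sub(C) = H(1)*(H(2)*(...*(H(k)*sub(C)))) applies H(k) first, while Q^T*sub(C) applies H(1) first.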

Input Parameters

side (global)
= 'L': Q or Q^T is applied from the left.
= 'R': Q or Q^T is applied from the right.

trans (global)
= 'N': no transpose, Q is applied.
= 'T': transpose, Q^T is applied.

m (global) The number of rows in the distributed matrix sub(C) (m≥0).

n (global) The number of columns in the distributed matrix sub(C) (n≥0).

k (global) The number of elementary reflectors whose product defines the
matrix Q. Constraints:
If side = 'L', m≥k≥0.
If side = 'R', n≥k≥0.

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+k-1). The
j-th column of the matrix stored in a must contain the vector that defines
the elementary reflector H(j), ja≤j≤ja+k-1, as returned by p?geqrf in the
k columns of its distributed matrix argument A(ia:*, ja:ja+k-1). A(ia:*,
ja:ja+k-1) is modified by the function but restored on exit.
If side = 'L', lld_a ≥ max(1, LOCr(ia+m-1)).
If side = 'R', lld_a ≥ max(1, LOCr(ia+n-1)).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCc(ja+k-1).

Contains the scalar factor tau[j] of elementary reflectors H(j+1) as
returned by p?geqrf (0 ≤ j < LOCc(ja+k-1)). tau is tied to the
distributed matrix A.

c (local)
Pointer into the local memory to an array of local size lld_c*LOCc(jc+n-1).

On entry, contains the local pieces of the distributed matrix sub(C) to be multiplied by Q or Q^T.

ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the matrix sub(C), respectively.

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local)


Workspace array of size lwork.

lwork (local or global) Size of work; must be at least:
if side = 'L',
lwork ≥ max((nb_a*(nb_a-1))/2, (nqc0+mpc0)*nb_a) + nb_a*nb_a
else if side = 'R',
lwork ≥ max((nb_a*(nb_a-1))/2, (nqc0+max(npa0+numroc(numroc(n+icoffc, nb_a, 0, 0, NPCOL), nb_a, 0, 0, lcmq), mpc0))*nb_a) + nb_a*nb_a
end if
where
lcmq = lcm/NPCOL with lcm = ilcm(NPROW, NPCOL),
iroffa = mod(ia-1, mb_a),
icoffa = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),
npa0= numroc(n+iroffa, mb_a, MYROW, iarow, NPROW),
iroffc = mod(ic-1, mb_c),
icoffc = mod(jc-1, nb_c),
icrow = indxg2p(ic, mb_c, MYROW, rsrc_c, NPROW),
iccol = indxg2p(jc, nb_c, MYCOL, csrc_c, NPCOL),
mpc0= numroc(m+iroffc, mb_c, MYROW, icrow, NPROW),
nqc0= numroc(n+icoffc, nb_c, MYCOL, iccol, NPCOL),
ilcm, indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL,
NPROW and NPCOL can be determined by calling the function
blacs_gridinfo.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

c Overwritten by the product Q*sub(C), Q^T*sub(C), sub(C)*Q^T, or sub(C)*Q.

work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?unmqr
Multiplies a complex matrix by the unitary matrix Q of
the QR factorization formed by p?geqrf.

Syntax
void pcunmqr (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau ,
MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex8 *work ,
MKL_INT *lwork , MKL_INT *info );
void pzunmqr (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau ,
MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex16 *work ,
MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
This function overwrites the general complex m-by-n distributed matrix sub(C) = C(ic:ic+m-1, jc:jc+n-1)
with

                 side = 'L'        side = 'R'
trans = 'N':     Q*sub(C)          sub(C)*Q
trans = 'C':     Q^H*sub(C)        sub(C)*Q^H

where Q is a complex unitary distributed matrix defined as the product of k elementary reflectors

Q = H(1)*H(2)*...*H(k)

as returned by p?geqrf. Q is of order m if side = 'L' and of order n if side = 'R'.

Input Parameters

side (global)
= 'L': Q or Q^H is applied from the left.
= 'R': Q or Q^H is applied from the right.

trans (global)
= 'N': no transpose, Q is applied.
= 'C': conjugate transpose, Q^H is applied.

m (global) The number of rows in the distributed matrix sub(C) (m≥0).

n (global) The number of columns in the distributed matrix sub(C) (n≥0).

k (global) The number of elementary reflectors whose product defines the
matrix Q. Constraints:
If side = 'L', m≥k≥0.
If side = 'R', n≥k≥0.


a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+k-1). The
j-th column of the matrix stored in a must contain the vector that defines
the elementary reflector H(j), ja≤j≤ja+k-1, as returned by p?geqrf in the
k columns of its distributed matrix argument A(ia:*, ja:ja+k-1). A(ia:*,
ja:ja+k-1) is modified by the function but restored on exit.
If side = 'L', lld_a ≥ max(1, LOCr(ia+m-1)).
If side = 'R', lld_a ≥ max(1, LOCr(ia+n-1)).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCc(ja+k-1).

Contains the scalar factor tau[j] of elementary reflectors H(j+1) as
returned by p?geqrf (0 ≤ j < LOCc(ja+k-1)). tau is tied to the
distributed matrix A.

c (local)
Pointer into the local memory to an array of local size lld_c*LOCc(jc+n-1).

On entry, contains the local pieces of the distributed matrix sub(C) to be multiplied by Q or Q^H.

ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the submatrix C, respectively.

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local)
Workspace array of size lwork.

lwork (local or global) Size of work; must be at least:
If side = 'L',
lwork ≥ max((nb_a*(nb_a-1))/2, (nqc0 + mpc0)*nb_a) + nb_a*nb_a
else if side = 'R',
lwork ≥ max((nb_a*(nb_a-1))/2, (nqc0 + max(npa0 + numroc(numroc(n+icoffc, nb_a, 0, 0, NPCOL), nb_a, 0, 0, lcmq), mpc0))*nb_a) + nb_a*nb_a
end if
where
lcmq = lcm/NPCOL with lcm = ilcm (NPROW, NPCOL),
iroffa = mod(ia-1, mb_a),
icoffa = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),

npa0 = numroc(n+iroffa, mb_a, MYROW, iarow, NPROW),
iroffc = mod(ic-1, mb_c),
icoffc = mod(jc-1, nb_c),
icrow = indxg2p(ic, mb_c, MYROW, rsrc_c, NPROW),
iccol = indxg2p(jc, nb_c, MYCOL, csrc_c, NPCOL),
mpc0 = numroc(m+iroffc, mb_c, MYROW, icrow, NPROW),
nqc0 = numroc(n+icoffc, nb_c, MYCOL, iccol, NPCOL),
ilcm, indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL,
NPROW and NPCOL can be determined by calling the function
blacs_gridinfo.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

c Overwritten by the product Q*sub(C), Q^H*sub(C), sub(C)*Q^H, or sub(C)*Q.

work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?gelqf
Computes the LQ factorization of a general
rectangular matrix.

Syntax
void psgelqf (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdgelqf (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );
void pcgelqf (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT
*info );
void pzgelqf (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT *lwork , MKL_INT
*info );


Include Files
• mkl_scalapack.h

Description
The p?gelqf function computes the LQ factorization of a real/complex distributed m-by-n matrix sub(A)=
A(ia:ia+m-1,ja:ja+n-1) = L*Q.

Input Parameters

m (global) The number of rows in the distributed submatrix sub(A) (m≥ 0).

n (global) The number of columns in the distributed submatrix sub(A) (n≥ 0).

a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).

Contains the local pieces of the distributed matrix sub(A) to be factored.

ia, ja (global) The row and column indices in the global array A indicating the first
row and the first column of the submatrix A(ia:ia+m-1,ja:ja+n-1),
respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

work (local)
Workspace array of size lwork.

lwork (local or global) Size of work; must be at least lwork ≥ mb_a*(mp0 + nq0 + mb_a), where
iroff = mod(ia-1, mb_a),
icoff = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
mp0 = numroc(m+iroff, mb_a, MYROW, iarow, NPROW),
nq0 = numroc(n+icoff, nb_a, MYCOL, iacol, NPCOL)
indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.

NOTE
mod(x,y) is the integer remainder of x/y.

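If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.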

Output Parameters

a The elements on and below the diagonal of sub(A) contain the m-by-min(m,n)
lower trapezoidal matrix L (L is lower triangular if m ≤ n); the
elements above the diagonal, with the array tau, represent the orthogonal/
unitary matrix Q as a product of elementary reflectors (see Application
Notes below).

tau (local)
Array of size LOCr(ia+min(m, n)-1).

Contains the scalar factors of the elementary reflectors. tau is tied to the
distributed matrix A.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

Application Notes
The matrix Q is represented as a product of elementary reflectors
Q = H(ia+k-1)*H(ia+k-2)*...*H(ia),

where k = min(m,n)

Each H(i) has the form

H(i) = I - tau*v*v'
where tau is a real/complex scalar, and v is a real/complex vector with v(1:i-1) = 0 and v(i) = 1; v(i+1:n) is
stored on exit in A(ia+i-1,ja+i:ja+n-1), and tau in tau[ia+i-2].

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?orglq
Generates the real orthogonal matrix Q of the LQ
factorization formed by p?gelqf.

Syntax
void psorglq (MKL_INT *m , MKL_INT *n , MKL_INT *k , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdorglq (MKL_INT *m , MKL_INT *n , MKL_INT *k , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h


Description
The p?orglq function generates the whole or part of the m-by-n real distributed matrix Q, denoting
A(ia:ia+m-1, ja:ja+n-1), with orthonormal rows, which is defined as the first m rows of a product of k
elementary reflectors of order n

Q = H(k)*...*H(2)*H(1)

as returned by p?gelqf.

Input Parameters

m (global) The number of rows in the matrix sub(Q) (m≥0).

n (global) The number of columns in the matrix sub(Q) (n≥m≥0).

k (global) The number of elementary reflectors whose product defines the
matrix Q (m≥k≥0).

a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).
On entry, the i-th row of the matrix stored in a must contain the vector that
defines the elementary reflector H(i), ia≤i≤ia+k-1, as returned by
p?gelqf in the k rows of its distributed matrix argument A(ia:ia+k-1, ja:*).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A(ia:ia+m-1,ja:ja+n-1),
respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

work (local)
Workspace array of size of lwork.

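lwork (local or global) size of work, must be at least
lwork≥mb_a*(mpa0 + nqa0 + mb_a), where
iroffa = mod(ia-1, mb_a),
icoffa = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
mpa0 = numroc(m+iroffa, mb_a, MYROW, iarow, NPROW),
nqa0 = numroc(n+icoffa, nb_a, MYCOL, iacol, NPCOL)

NOTE
mod(x,y) is the integer remainder of x/y.

indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.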

If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

a Contains the local pieces of the m-by-n distributed matrix Q.

tau (local)
Array of size LOCr(ia+k-1).

Contains the scalar factors tau[j] of elementary reflectors H(j+1), 0 ≤ j <
LOCr(ia+k-1). tau is tied to the distributed matrix A.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?unglq
Generates the unitary matrix Q of the LQ factorization
formed by p?gelqf.

Syntax
void pcunglq (MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzunglq (MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT
*lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
This function generates the whole or part of the m-by-n complex distributed matrix Q, denoting
A(ia:ia+m-1, ja:ja+n-1), with orthonormal rows, which is defined as the first m rows of a product of k
elementary reflectors of order n

Q = H(k)'*...*H(2)'*H(1)'

as returned by p?gelqf, where H(i)' denotes the conjugate transpose of H(i).

Input Parameters

m (global) The number of rows in the matrix sub(Q) (m≥0).


n (global) The number of columns in the matrix sub(Q) (n≥m≥0).

k (global) The number of elementary reflectors whose product defines the
matrix Q (m≥k≥0).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A(ia:ia+m-1,ja:ja+n-1),
respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCr(ia+k-1).

Contains the scalar factors tau[j] of elementary reflectors H(j+1), 0 ≤ j <
LOCr(ia+k-1). tau is tied to the distributed matrix A.

work (local)
Workspace array of size of lwork.

lwork (local or global) size of work, must be at least
lwork≥mb_a*(mpa0+nqa0+mb_a), where
iroffa = mod(ia-1, mb_a),
icoffa = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
mpa0 = numroc(m+iroffa, mb_a, MYROW, iarow, NPROW),
nqa0 = numroc(n+icoffa, nb_a, MYCOL, iacol, NPCOL)
indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.

NOTE
mod(x,y) is the integer remainder of x/y.

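If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.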

Output Parameters

a Contains the local pieces of the m-by-n distributed matrix Q.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?ormlq
Multiplies a general matrix by the orthogonal matrix Q
of the LQ factorization formed by p?gelqf.

Syntax
void psormlq (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , float
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *tau , float *c , MKL_INT *ic ,
MKL_INT *jc , MKL_INT *descc , float *work , MKL_INT *lwork , MKL_INT *info );
void pdormlq (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , double
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *tau , double *c , MKL_INT
*ic , MKL_INT *jc , MKL_INT *descc , double *work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?ormlq function overwrites the general real m-by-n distributed matrix sub(C) = C(ic:ic+m-1, jc:jc+n-1)
with

                side = 'L'      side = 'R'
trans = 'N':    Q*sub(C)        sub(C)*Q
trans = 'T':    Q^T*sub(C)      sub(C)*Q^T

where Q is a real orthogonal distributed matrix defined as the product of k elementary reflectors
Q = H(k)...H(2) H(1)

as returned by p?gelqf. Q is of order m if side = 'L' and of order n if side = 'R'.

Input Parameters

side (global)
='L': Q or Q^T is applied from the left.
='R': Q or Q^T is applied from the right.

trans (global)


='N', no transpose, Q is applied.

='T', transpose, Q^T is applied.

m (global) The number of rows in the distributed matrix sub(C) (m≥0).

n (global) The number of columns in the distributed matrix sub(C) (n≥0).

k (global) The number of elementary reflectors whose product defines the
matrix Q. Constraints:
If side = 'L', m≥k≥0;
If side = 'R', n≥k≥0.

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+m-1) if
side = 'L', or lld_a*LOCc(ja+n-1) if side = 'R'. The i-th row of the
matrix stored in a must contain the vector that defines the elementary
reflector H(i), ia≤i≤ia+k-1, as returned by p?gelqf in the k rows of its
distributed matrix argument A(ia:ia+k-1, ja:*).

A(ia:ia+k-1, ja:*) is modified by the function but restored on exit.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCc(ja+k-1).

Contains the scalar factors tau[j] of elementary reflectors H(j+1) as
returned by p?gelqf (0 ≤ j < LOCc(ja+k-1)). tau is tied to the
distributed matrix A.

c (local)
Pointer into the local memory to an array of local size lld_c*LOCc(jc+n-1).

On entry, contains the local pieces of the distributed matrix sub(C).

ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the submatrix C, respectively.

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local)
Workspace array of size of lwork.

lwork (local or global) size of the array work; must be at least:

If side = 'L',

lwork≥max((mb_a*(mb_a-1))/2, (mpc0 + max(mqa0 + numroc(numroc(m + iroffc, mb_a, 0, 0, NPROW),
mb_a, 0, 0, lcmp), nqc0))*mb_a) + mb_a*mb_a

else if side = 'R',

lwork≥max((mb_a*(mb_a-1))/2, (mpc0 + nqc0)*mb_a) + mb_a*mb_a

end if
where
lcmp = lcm/NPROW with lcm = ilcm (NPROW, NPCOL),
iroffa = mod(ia-1, mb_a),
icoffa = mod(ja-1, nb_a),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
mqa0 = numroc(m+icoffa, nb_a, MYCOL, iacol, NPCOL),
iroffc = mod(ic-1, mb_c),
icoffc = mod(jc-1, nb_c),
icrow = indxg2p(ic, mb_c, MYROW, rsrc_c, NPROW),
iccol = indxg2p(jc, nb_c, MYCOL, csrc_c, NPCOL),
mpc0 = numroc(m+iroffc, mb_c, MYROW, icrow, NPROW),
nqc0 = numroc(n+icoffc, nb_c, MYCOL, iccol, NPCOL),

NOTE
mod(x,y) is the integer remainder of x/y.

ilcm, indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL,
NPROW and NPCOL can be determined by calling the function
blacs_gridinfo.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

c Overwritten by the product Q*sub(C), Q^T*sub(C), sub(C)*Q^T, or sub(C)*Q.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.


p?unmlq
Multiplies a general matrix by the unitary matrix Q of
the LQ factorization formed by p?gelqf.

Syntax
void pcunmlq (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau ,
MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex8 *work ,
MKL_INT *lwork , MKL_INT *info );
void pzunmlq (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau ,
MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex16 *work ,
MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
This function overwrites the general complex m-by-n distributed matrix sub(C) = C(ic:ic+m-1, jc:jc+n-1)
with

                side = 'L'      side = 'R'
trans = 'N':    Q*sub(C)        sub(C)*Q
trans = 'C':    Q^H*sub(C)      sub(C)*Q^H

where Q is a complex unitary distributed matrix defined as the product of k elementary reflectors
Q = H(k)' ... H(2)' H(1)'

as returned by p?gelqf. Q is of order m if side = 'L' and of order n if side = 'R'.

Input Parameters

side (global)
='L': Q or Q^H is applied from the left.
='R': Q or Q^H is applied from the right.

trans (global)
='N', no transpose, Q is applied.
='C', conjugate transpose, Q^H is applied.

m (global) The number of rows in the distributed matrix sub(C) (m≥0).

n (global) The number of columns in the distributed matrix sub(C) (n≥0).

k (global) The number of elementary reflectors whose product defines the
matrix Q. Constraints:
If side = 'L', m≥k≥0;
If side = 'R', n≥k≥0.

a (local)

Pointer into the local memory to an array of size lld_a*LOCc(ja+m-1) if
side = 'L', or lld_a*LOCc(ja+n-1) if side = 'R', where
lld_a≥max(1, LOCr(ia+k-1)). The i-th row of the matrix stored in a must
contain the vector that defines the elementary reflector H(i), ia≤i≤ia+k-1,
as returned by p?gelqf in the k rows of its distributed matrix argument
A(ia:ia+k-1, ja:*). A(ia:ia+k-1, ja:*) is modified by the function but
restored on exit.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCc(ia+k-1).

Contains the scalar factors tau[j] of elementary reflectors H(j+1) as
returned by p?gelqf (0 ≤ j < LOCc(ia+k-1)). tau is tied to the
distributed matrix A.

c (local)
Pointer into the local memory to an array of local size lld_c*LOCc(jc+n-1).

On entry, contains the local pieces of the distributed matrix sub(C).

ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the submatrix C, respectively.

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local)
Workspace array of size of lwork.

lwork (local or global) size of the array work; must be at least:

If side = 'L',

lwork≥max((mb_a*(mb_a-1))/2, (mpc0 + max(mqa0 + numroc(numroc(m + iroffc, mb_a, 0, 0, NPROW),
mb_a, 0, 0, lcmp), nqc0))*mb_a) + mb_a*mb_a

else if side = 'R',

lwork≥max((mb_a*(mb_a-1))/2, (mpc0 + nqc0)*mb_a) + mb_a*mb_a

end if
where
lcmp = lcm/NPROW with lcm = ilcm (NPROW, NPCOL),
iroffa = mod(ia-1, mb_a),
icoffa = mod(ja-1, nb_a),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
mqa0 = numroc(m + icoffa, nb_a, MYCOL, iacol, NPCOL),
iroffc = mod(ic-1, mb_c),
icoffc = mod(jc-1, nb_c),
icrow = indxg2p(ic, mb_c, MYROW, rsrc_c, NPROW),
iccol = indxg2p(jc, nb_c, MYCOL, csrc_c, NPCOL),
mpc0 = numroc(m+iroffc, mb_c, MYROW, icrow, NPROW),
nqc0 = numroc(n+icoffc, nb_c, MYCOL, iccol, NPCOL),

NOTE
mod(x,y) is the integer remainder of x/y.

Output Parameters

c Overwritten by the product Q*sub(C), Q^H*sub(C), sub(C)*Q^H, or sub(C)*Q.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?geqlf
Computes the QL factorization of a general matrix.

Syntax
void psgeqlf (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdgeqlf (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );
void pcgeqlf (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT
*info );
void pzgeqlf (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT *lwork , MKL_INT
*info );

Include Files
• mkl_scalapack.h

Description
The p?geqlf function forms the QL factorization of a real/complex distributed m-by-n matrix
sub(A) = A(ia:ia+m-1, ja:ja+n-1) = Q*L.

Input Parameters

m (global) The number of rows in the distributed matrix sub(A) (m≥0).

n (global) The number of columns in the distributed matrix sub(A) (n≥0).

a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).
Contains the local pieces of the distributed matrix sub(A) to be factored.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A(ia:ia+m-1, ja:ja+n-1),
respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

work (local)
Workspace array of size of lwork.

lwork (local or global) size of work, must be at least lwork≥nb_a*(mp0 + nq0 + nb_a), where
iroff = mod(ia-1, mb_a),
icoff = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
mp0 = numroc(m+iroff, mb_a, MYROW, iarow, NPROW),
nq0 = numroc(n+icoff, nb_a, MYCOL, iacol, NPCOL)

NOTE
mod(x,y) is the integer remainder of x/y.

numroc and indxg2p are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.

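If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.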


Output Parameters

a On exit, if m≥n, the lower triangle of the distributed submatrix
A(ia+m-n:ia+m-1, ja:ja+n-1) contains the n-by-n lower triangular matrix L; if
m≤n, the elements on and below the (n-m)-th superdiagonal contain the m-by-n
lower trapezoidal matrix L; the remaining elements, with the array tau,
represent the orthogonal/unitary matrix Q as a product of elementary
reflectors (see Application Notes below).

tau (local)
Array of size LOCc(ja+n-1).

Contains the scalar factors of elementary reflectors. tau is tied to the
distributed matrix A.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

Application Notes
The matrix Q is represented as a product of elementary reflectors
Q = H(ja+k-1)*...*H(ja+1)*H(ja)

where k = min(m,n)

Each H(i) has the form

H(i) = I - tau*v*v'
where tau is a real/complex scalar, and v is a real/complex vector with v(m-k+i+1:m) = 0 and v(m-k+i) = 1;
v(1:m-k+i-1) is stored on exit in A(ia:ia+m-k+i-2, ja+n-k+i-1), and tau in tau[ja+n-k+i-2].

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?orgql
Generates the orthogonal matrix Q of the QL
factorization formed by p?geqlf.

Syntax
void psorgql (MKL_INT *m , MKL_INT *n , MKL_INT *k , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdorgql (MKL_INT *m , MKL_INT *n , MKL_INT *k , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?orgql function generates the whole or part of the m-by-n real distributed matrix Q, denoting
A(ia:ia+m-1, ja:ja+n-1), with orthonormal columns, which is defined as the last n columns of a product of k
elementary reflectors of order m

Q = H(k)*...*H(2)*H(1)

as returned by p?geqlf.

Input Parameters

m (global) The number of rows in the matrix sub(Q) (m≥0).

n (global) The number of columns in the matrix sub(Q) (m≥n≥0).

k (global) The number of elementary reflectors whose product defines the
matrix Q (n≥k≥0).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A(ia:ia+m-1,ja:ja+n-1),
respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCc(ja+n-1).

Contains the scalar factors tau[j] of elementary reflectors H(j+1), 0 ≤ j <
LOCc(ja+n-1). tau is tied to the distributed matrix A.

work (local)
Workspace array of size of lwork.

lwork (local or global) size of work, must be at least
lwork≥nb_a*(nqa0+mpa0+nb_a), where
iroffa = mod(ia-1, mb_a),
icoffa = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
mpa0 = numroc(m+iroffa, mb_a, MYROW, iarow, NPROW),
nqa0 = numroc(n+icoffa, nb_a, MYCOL, iacol, NPCOL)


NOTE
mod(x,y) is the integer remainder of x/y.

indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.

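If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.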

Output Parameters

a Contains the local pieces of the m-by-n distributed matrix Q.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?ungql
Generates the unitary matrix Q of the QL factorization
formed by p?geqlf.

Syntax
void pcungql (const MKL_INT *m , const MKL_INT *n , const MKL_INT *k , MKL_Complex8
*a , const MKL_INT *ia , const MKL_INT *ja , const MKL_INT *desca , const MKL_Complex8
*tau , MKL_Complex8 *work , const MKL_INT *lwork , MKL_INT *info );
void pzungql (const MKL_INT *m , const MKL_INT *n , const MKL_INT *k , MKL_Complex16
*a , const MKL_INT *ia , const MKL_INT *ja , const MKL_INT *desca , const MKL_Complex16
*tau , MKL_Complex16 *work , const MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
This function generates the whole or part of the m-by-n complex distributed matrix Q, denoting
A(ia:ia+m-1, ja:ja+n-1), with orthonormal columns, which is defined as the last n columns of a product of k
elementary reflectors of order m

Q = H(k)'*...*H(2)'*H(1)'

as returned by p?geqlf, where H(i)' denotes the conjugate transpose of H(i).

Input Parameters

m (global) The number of rows in the matrix sub(Q) (m≥0).

n (global) The number of columns in the matrix sub(Q) (m≥n≥0).

k (global) The number of elementary reflectors whose product defines the
matrix Q (n≥k≥0).

a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).
On entry, the j-th column of the matrix stored in a must contain the vector
that defines the elementary reflector H(j), ja+n-k≤j≤ja+n-1, as returned by
p?geqlf in the k columns of its distributed matrix argument
A(ia:*, ja+n-k:ja+n-1).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A(ia:ia+m-1,ja:ja+n-1),
respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCc(ja+n-1).

Contains the scalar factors tau[j] of elementary reflectors H(j+1), 0 ≤ j <
LOCc(ja+n-1). tau is tied to the distributed matrix A.

work (local)
Workspace array of size of lwork.

lwork (local or global) size of work, must be at least lwork≥nb_a*(nqa0 + mpa0 + nb_a), where
iroffa = mod(ia-1, mb_a),
icoffa = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
mpa0 = numroc(m+iroffa, mb_a, MYROW, iarow, NPROW),
nqa0 = numroc(n+icoffa, nb_a, MYCOL, iacol, NPCOL)

indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.

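If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.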

Output Parameters

a Contains the local pieces of the m-by-n distributed matrix Q.


work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?ormql
Multiplies a general matrix by the orthogonal matrix Q
of the QL factorization formed by p?geqlf.

Syntax
void psormql (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , float
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *tau , float *c , MKL_INT *ic ,
MKL_INT *jc , MKL_INT *descc , float *work , MKL_INT *lwork , MKL_INT *info );
void pdormql (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , double
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *tau , double *c , MKL_INT
*ic , MKL_INT *jc , MKL_INT *descc , double *work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?ormql function overwrites the general real m-by-n distributed matrix sub(C) = C(ic:ic+m-1, jc:jc+n-1)
with

                side = 'L'      side = 'R'
trans = 'N':    Q*sub(C)        sub(C)*Q
trans = 'T':    Q^T*sub(C)      sub(C)*Q^T

where Q is a real orthogonal distributed matrix defined as the product of k elementary reflectors

Q = H(k)*...*H(2)*H(1)

as returned by p?geqlf. Q is of order m if side = 'L' and of order n if side = 'R'.

Input Parameters

side (global)
='L': Q or Q^T is applied from the left.
='R': Q or Q^T is applied from the right.

trans (global)
='N', no transpose, Q is applied.
='T', transpose, Q^T is applied.

m (global) The number of rows in the distributed matrix sub(C) (m≥0).

n (global) The number of columns in the distributed matrix sub(C) (n≥0).

k (global) The number of elementary reflectors whose product defines the
matrix Q. Constraints:
If side = 'L', m≥k≥0;
If side = 'R', n≥k≥0.

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+k-1). The
j-th column of the matrix stored in a must contain the vector that defines
the elementary reflector H(j), ja≤j≤ja+k-1, as returned by p?geqlf in the
k columns of its distributed matrix argument A(ia:*, ja:ja+k-1). A(ia:*,
ja:ja+k-1) is modified by the function but restored on exit.
If side = 'L', lld_a ≥ max(1, LOCr(ia+m-1));
If side = 'R', lld_a ≥ max(1, LOCr(ia+n-1)).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCc(ja+n-1).

Contains the scalar factors tau[j] of elementary reflectors H(j+1) as
returned by p?geqlf (0 ≤ j < LOCc(ja+k-1)). tau is tied to the
distributed matrix A.

c (local)
Pointer into the local memory to an array of local size lld_c*LOCc(jc+n-1).

On entry, contains the local pieces of the distributed matrix sub(C).

ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the submatrix C, respectively.

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local)
Workspace array of size of lwork.

lwork (local or global) size of the array work; must be at least:

If side = 'L',

lwork≥max((nb_a*(nb_a-1))/2, (nqc0 + mpc0)*nb_a) + nb_a*nb_a

else if side = 'R',

lwork≥max((nb_a*(nb_a-1))/2, (nqc0 + max(npa0 + numroc(numroc(n + icoffc, nb_a, 0, 0, NPCOL),
nb_a, 0, 0, lcmq), mpc0))*nb_a) + nb_a*nb_a

end if
where
lcmq = lcm/NPCOL with lcm = ilcm (NPROW, NPCOL),
iroffa = mod(ia-1, mb_a),
icoffa = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),
npa0= numroc(n + iroffa, mb_a, MYROW, iarow, NPROW),
iroffc = mod(ic-1, mb_c),
icoffc = mod(jc-1, nb_c),
icrow = indxg2p(ic, mb_c, MYROW, rsrc_c, NPROW),
iccol = indxg2p(jc, nb_c, MYCOL, csrc_c, NPCOL),
mpc0 = numroc(m+iroffc, mb_c, MYROW, icrow, NPROW),
nqc0 = numroc(n+icoffc, nb_c, MYCOL, iccol, NPCOL),

NOTE
mod(x,y) is the integer remainder of x/y.

Output Parameters

c Overwritten by the product Q*sub(C), Q^T*sub(C), sub(C)*Q^T, or sub(C)*Q.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?unmql
Multiplies a general matrix by the unitary matrix Q of
the QL factorization formed by p?geqlf.

Syntax
void pcunmql (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau ,
MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex8 *work ,
MKL_INT *lwork , MKL_INT *info );
void pzunmql (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau ,
MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex16 *work ,
MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
This function overwrites the general complex m-by-n distributed matrix sub(C) = C(ic:ic+m-1, jc:jc+n-1)
with

                side = 'L'      side = 'R'
trans = 'N':    Q*sub(C)        sub(C)*Q
trans = 'C':    Q^H*sub(C)      sub(C)*Q^H

where Q is a complex unitary distributed matrix defined as the product of k elementary reflectors
Q = H(k)' ... H(2)' H(1)'

as returned by p?geqlf. Q is of order m if side = 'L' and of order n if side = 'R'.

Input Parameters

side (global)
='L': Q or Q^H is applied from the left.
='R': Q or Q^H is applied from the right.

trans (global)
='N', no transpose, Q is applied.
='C', conjugate transpose, Q^H is applied.

m (global) The number of rows in the distributed matrix sub(C) (m≥0).

n (global) The number of columns in the distributed matrix sub(C) (n≥0).

k (global) The number of elementary reflectors whose product defines the
matrix Q. Constraints:
If side = 'L', m≥k≥0;
If side = 'R', n≥k≥0.

a (local)


Pointer into the local memory to an array of size lld_a*LOCc(ja+k-1). The
j-th column of the matrix stored in a must contain the vector that defines
the elementary reflector H(j), ja≤j≤ja+k-1, as returned by p?geqlf in the
k columns of its distributed matrix argument A(ia:*, ja:ja+k-1). A(ia:*,
ja:ja+k-1) is modified by the function but restored on exit.
If side = 'L', lld_a ≥ max(1, LOCr(ia+m-1));
If side = 'R', lld_a ≥ max(1, LOCr(ia+n-1)).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCc(ja+n-1).

Contains the scalar factors tau[j] of elementary reflectors H(j+1) as
returned by p?geqlf (0 ≤ j < LOCc(ja+n-1)). tau is tied to the
distributed matrix A.

c (local)
Pointer into the local memory to an array of local size lld_c*LOCc(jc+n-1).

On entry, contains the local pieces of the distributed matrix sub(C).

ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the submatrix C, respectively.

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local)
Workspace array of size of lwork.

lwork (local or global) size of work, must be at least:

If side = 'L',

lwork≥max((nb_a*(nb_a-1))/2, (nqc0 + mpc0)*nb_a) + nb_a*nb_a

else if side = 'R',

lwork≥max((nb_a*(nb_a-1))/2, (nqc0 + max(npa0 + numroc(numroc(n + icoffc, nb_a, 0, 0, NPCOL),
nb_a, 0, 0, lcmq), mpc0))*nb_a) + nb_a*nb_a

end if
where
lcmq = lcm/NPCOL with lcm = ilcm (NPROW, NPCOL),
iroffa = mod(ia-1, mb_a),
icoffa = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),
npa0 = numroc (n + iroffa, mb_a, MYROW, iarow, NPROW),

iroffc = mod(ic-1, mb_c),
icoffc = mod(jc-1, nb_c),
icrow = indxg2p(ic, mb_c, MYROW, rsrc_c, NPROW),
iccol = indxg2p(jc, nb_c, MYCOL, csrc_c, NPCOL),
mpc0 = numroc(m+iroffc, mb_c, MYROW, icrow, NPROW),
nqc0 = numroc(n+icoffc, nb_c, MYCOL, iccol, NPCOL),

NOTE
mod(x,y) is the integer remainder of x/y.

ilcm, indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL,
NPROW and NPCOL can be determined by calling the function
blacs_gridinfo.

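If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.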

Output Parameters

c Overwritten by the product Q*sub(C), Q^H*sub(C), sub(C)*Q^H, or sub(C)*Q.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?gerqf
Computes the RQ factorization of a general
rectangular matrix.

Syntax
void psgerqf (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdgerqf (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );


void pcgerqf (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT
*info );
void pzgerqf (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT *lwork , MKL_INT
*info );

Include Files
• mkl_scalapack.h

Description
The p?gerqf function forms the RQ factorization of a general m-by-n distributed matrix sub(A) = A(ia:ia+m-1, ja:ja+n-1) as

sub(A) = R*Q

Input Parameters

m (global) The number of rows in the distributed matrix sub(A); (m≥0).

n (global) The number of columns in the distributed matrix sub(A); (n≥0).

a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).

Contains the local pieces of the distributed matrix sub(A) to be factored.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A(ia:ia+m-1, ja:ja+n-1),
respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

work (local)
Workspace array of size lwork.

lwork (local or global) size of work, must be at least

lwork≥mb_a*(mp0+nq0+mb_a), where
iroff = mod(ia-1, mb_a),
icoff = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
mp0 = numroc(m+iroff, mb_a, MYROW, iarow, NPROW),
nq0 = numroc(n+icoff, nb_a, MYCOL, iacol, NPCOL),

NOTE
mod(x,y) is the integer remainder of x/y.

indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL, NPROW and NPCOL
can be determined by calling the function blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.
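As a sketch of how the p?gerqf bound mb_a*(mp0 + nq0 + mb_a) is evaluated on one process, the code below follows the definitions above. The function names are hypothetical, and numroc and indxg2p are re-implemented locally from their reference ScaLAPACK definitions; in an application the lwork = -1 workspace query is usually the simpler route. The same expression, written with mpa0 and nqa0, gives the bound for p?orgrq and p?ungrq.

```c
#include <assert.h>

/* Hypothetical local copies of ScaLAPACK's numroc and indxg2p. */
static long loc_numroc(long n, long nb, long iproc, long isrcproc, long nprocs)
{
    long mydist = (nprocs + iproc - isrcproc) % nprocs;
    long nblocks = n / nb;
    long count = (nblocks / nprocs) * nb;
    long extrablks = nblocks % nprocs;
    if (mydist < extrablks)
        count += nb;
    else if (mydist == extrablks)
        count += n % nb;
    return count;
}

static long loc_indxg2p(long indxglob, long nb, long isrcproc, long nprocs)
{
    return (isrcproc + (indxglob - 1) / nb) % nprocs;
}

/* Minimum lwork for p?gerqf on process (myrow, mycol) of an nprow-by-npcol grid. */
static long gerqf_lwork(long m, long n, long ia, long ja,
                        long mb_a, long nb_a, long rsrc_a, long csrc_a,
                        long myrow, long mycol, long nprow, long npcol)
{
    assert(mb_a > 0 && nb_a > 0 && nprow > 0 && npcol > 0);
    long iroff = (ia - 1) % mb_a;                         /* mod(ia-1, mb_a) */
    long icoff = (ja - 1) % nb_a;                         /* mod(ja-1, nb_a) */
    long iarow = loc_indxg2p(ia, mb_a, rsrc_a, nprow);
    long iacol = loc_indxg2p(ja, nb_a, csrc_a, npcol);
    long mp0 = loc_numroc(m + iroff, mb_a, myrow, iarow, nprow);
    long nq0 = loc_numroc(n + icoff, nb_a, mycol, iacol, npcol);
    return mb_a * (mp0 + nq0 + mb_a);
}
```

For a 100-by-100 matrix with 16-by-16 blocks on a 2-by-2 grid, this gives 1920 on process (0,0), which owns 52 rows and 52 columns, and 1792 on process (1,1), which owns 48 of each.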

Output Parameters

a On exit, if m≤n, the upper triangle of A(ia:ia+m-1, ja+n-m:ja+n-1) contains the
m-by-m upper triangular matrix R; if m≥n, the elements on and above the (m-n)-th
subdiagonal contain the m-by-n upper trapezoidal matrix R; the remaining elements,
with the array tau, represent the orthogonal/unitary matrix Q as a product of
elementary reflectors (see Application Notes below).

tau (local)
Array of size LOCr(ia+m-1).

Contains the scalar factors of the elementary reflectors. tau is tied to the
distributed matrix A.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

Application Notes
The matrix Q is represented as a product of elementary reflectors
Q = H(ia)*H(ia+1)*...*H(ia+k-1),

where k = min(m,n).

Each H(i) has the form

H(i) = I - tau*v*v'
where tau is a real/complex scalar, and v is a real/complex vector with v(n-k+i+1:n) = 0 and v(n-k+i) = 1;
v(1:n-k+i-1) is stored on exit in A(ia+m-k+i-1,ja:ja+n-k+i-2), and tau in tau[ia+m-k+i-2].
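To make the reflector formula concrete, the sketch below applies H = I - tau*v*v' to a single row, the way the RQ reflectors act on rows of sub(A). It is a serial, real-only illustration with a hypothetical function name, not the distributed kernel. In the usage values, v has its unit entry in the last position, mirroring v(n-k+i) = 1 above.

```c
#include <assert.h>

/* x := x*H with H = I - tau*v*v' (real case), for a row x of length n.
   H is symmetric, so the same update also computes H*x for a column x. */
static void apply_reflector_row(int n, const double *v, double tau, double *x)
{
    assert(n >= 0);
    double w = 0.0;
    for (int j = 0; j < n; ++j)
        w += x[j] * v[j];                  /* w = x*v */
    for (int j = 0; j < n; ++j)
        x[j] -= tau * w * v[j];            /* x := x - tau*w*v' */
}
```

With v = (0.5, 1), tau = 1.6 and x = (4, 3), the result is (0, -5): the leading entry is annihilated and the row's norm, 5, moves into the position where v has its unit entry.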

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?orgrq
Generates the orthogonal matrix Q of the RQ
factorization formed by p?gerqf.

Syntax
void psorgrq (MKL_INT *m , MKL_INT *n , MKL_INT *k , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdorgrq (MKL_INT *m , MKL_INT *n , MKL_INT *k , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );


Include Files
• mkl_scalapack.h

Description
The p?orgrq function generates the whole or part of the m-by-n real distributed matrix Q, denoting
A(ia:ia+m-1, ja:ja+n-1), with orthonormal rows. Q is defined as the last m rows of a product of k
elementary reflectors of order n

Q = H(1)*H(2)*...*H(k)

as returned by p?gerqf.

Input Parameters

m (global) The number of rows in the matrix sub(Q), (m≥0).

n (global) The number of columns in the matrix sub(Q), (n≥m≥0).

k (global) The number of elementary reflectors whose product defines the
matrix Q (m≥k≥0).

a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).
The i-th row of the matrix stored in a must contain the vector that defines
the elementary reflector H(i), ia+m-k≤i≤ia+m-1, as returned by p?gerqf in the
k rows of its distributed matrix argument A(ia+m-k:ia+m-1, ja:*).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCr(ia+m-1).

Contains the scalar factor tau[i] of elementary reflectors H(i+1) as
returned by p?gerqf, 0 ≤ i < LOCr(ia+m-1). tau is tied to the distributed
matrix A.

work (local)
Workspace array of size lwork.

lwork (local or global) size of work, must be at least
lwork≥mb_a*(mpa0 + nqa0 + mb_a), where
iroffa = mod(ia-1, mb_a),
icoffa = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
mpa0 = numroc(m+iroffa, mb_a, MYROW, iarow, NPROW),
nqa0 = numroc(n+icoffa, nb_a, MYCOL, iacol, NPCOL)

indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.

NOTE
mod(x,y) is the integer remainder of x/y.

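If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.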

Output Parameters

a Contains the local pieces of the m-by-n distributed matrix Q.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.
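One way to sanity-check the output of p?orgrq on a small problem is to gather Q onto one process and confirm that its rows are orthonormal, that is, Q*Q' = I. The helper below is a hypothetical serial check on a column-major array, not part of Intel MKL; gathering the distributed matrix is outside its scope, and the tolerance is an assumption to be chosen for the precision in use.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical serial check (not part of Intel MKL): returns true when the rows
   of the m-by-n column-major matrix q (leading dimension ldq) are orthonormal,
   i.e. every entry of q*q' is within tol of the identity. */
static bool rows_orthonormal(int m, int n, const double *q, int ldq, double tol)
{
    assert(m >= 0 && n >= m && ldq >= (m > 1 ? m : 1));
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < m; ++j) {
            double dot = 0.0;                      /* (q*q')(i, j) */
            for (int c = 0; c < n; ++c)
                dot += q[i + c * ldq] * q[j + c * ldq];
            double expected = (i == j) ? 1.0 : 0.0;
            if (fabs(dot - expected) > tol)
                return false;
        }
    }
    return true;
}
```

The rows (0.6, 0.8, 0) and (-0.8, 0.6, 0) pass this check; the rows (1, 1, 0) and (0, 1, 0) do not, because the first has norm sqrt(2).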

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?ungrq
Generates the unitary matrix Q of the RQ factorization
formed by p?gerqf.

Syntax
void pcungrq (MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzungrq (MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT
*lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
This function generates the m-by-n complex distributed matrix Q denoting A(ia:ia+m-1,ja:ja+n-1) with
orthonormal rows, which is defined as the last m rows of a product of k elementary reflectors of order n

Q = (H(1))^H*(H(2))^H*...*(H(k))^H as returned by p?gerqf.

Input Parameters

m (global) The number of rows in the matrix sub(Q); (m≥0).


n (global) The number of columns in the matrix sub(Q) (n≥m≥0).

k (global) The number of elementary reflectors whose product defines the
matrix Q (m≥k≥0).

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1). The
i-th row of the matrix stored in a must contain the vector that defines the
elementary reflector H(i), ia+m-k≤i≤ia+m-1, as returned by p?gerqf in
the k rows of its distributed matrix argument A(ia+m-k:ia+m-1, ja:*).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCr(ia+m-1).

Contains the scalar factor tau[i] of elementary reflectors H(i+1) as
returned by p?gerqf, 0 ≤ i < LOCr(ia+m-1). tau is tied to the distributed
matrix A.

work (local)
Workspace array of size lwork.

lwork (local or global) size of work, must be at least
lwork≥mb_a*(mpa0 + nqa0 + mb_a), where
iroffa = mod(ia-1, mb_a),
icoffa = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
mpa0 = numroc(m+iroffa, mb_a, MYROW, iarow, NPROW),
nqa0 = numroc(n+icoffa, nb_a, MYCOL, iacol, NPCOL)

NOTE
mod(x,y) is the integer remainder of x/y.

indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.

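If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.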

Output Parameters

a Contains the local pieces of the m-by-n distributed matrix Q.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?ormr3
Applies an orthogonal distributed matrix to a general
m-by-n distributed matrix.

Syntax
void psormr3 (const char* side, const char* trans, const MKL_INT* m, const MKL_INT* n,
const MKL_INT* k, const MKL_INT* l, const float* a, const MKL_INT* ia, const MKL_INT*
ja, const MKL_INT* desca, const float* tau, float* c, const MKL_INT* ic, const MKL_INT*
jc, const MKL_INT* descc, float* work, const MKL_INT* lwork, MKL_INT* info);
void pdormr3 (const char* side, const char* trans, const MKL_INT* m, const MKL_INT* n,
const MKL_INT* k, const MKL_INT* l, const double* a, const MKL_INT* ia, const MKL_INT*
ja, const MKL_INT* desca, const double* tau, double* c, const MKL_INT* ic, const
MKL_INT* jc, const MKL_INT* descc, double* work, const MKL_INT* lwork, MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
p?ormr3 overwrites the general real m-by-n distributed matrix sub( C ) = C(ic:ic+m-1,jc:jc+n-1) with

              side = 'L'          side = 'R'
trans = 'N'   Q * sub( C )        sub( C ) * Q
trans = 'T'   Q^T * sub( C )      sub( C ) * Q^T

where Q is a real orthogonal distributed matrix defined as the product of k elementary reflectors

Q = H(1) H(2) . . . H(k)

as returned by p?tzrzf. Q is of order m if side = 'L' and of order n if side = 'R'.

Input Parameters

side (global)
= 'L': apply Q or Q^T from the left;
= 'R': apply Q or Q^T from the right.

trans (global)
= 'N': no transpose, apply Q;
= 'T': transpose, apply Q^T.

m (global)
The number of rows to be operated on, i.e., the number of rows of the
distributed submatrix sub( C ). m >= 0.

n (global)
The number of columns to be operated on, i.e., the number of columns of the
distributed submatrix sub( C ). n >= 0.

k (global)
The number of elementary reflectors whose product defines the matrix Q.
If side = 'L', m >= k >= 0;
if side = 'R', n >= k >= 0.

l (global)
The number of columns of the distributed submatrix sub( A ) containing the
meaningful part of the Householder reflectors.
If side = 'L', m >= l >= 0;
if side = 'R', n >= l >= 0.

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+m-1) if
side='L', and lld_a*LOCc(ja+n-1) if side='R', where lld_a >=
MAX(1,LOCr(ia+k-1)).

On entry, the i-th row must contain the vector which defines the elementary
reflector H(i), ia <= i <= ia+k-1, as returned by p?tzrzf in the k rows of
its distributed matrix argument A(ia:ia+k-1,ja:*).

A(ia:ia+k-1,ja:*) is modified by the routine but restored on exit.

ia (global)
The row index in the global array a indicating the first row of sub( A ).

ja (global)
The column index in the global array a indicating the first column of
sub( A ).

desca (global and local)

Array of size dlen_.
The array descriptor for the distributed matrix A.

tau (local)
Array, size LOCc(ia+k-1).

This array contains the scalar factors tau(i) of the elementary reflectors
H(i) as returned by p?tzrzf. tau is tied to the distributed matrix A.

c (local)
Pointer into the local memory to an array of size lld_c*LOCc(jc+n-1).

On entry, the local pieces of the distributed matrix sub( C ).

ic (global)
The row index in the global array c indicating the first row of sub( C ).

jc (global)
The column index in the global array c indicating the first column of
sub( C ).

descc (global and local)

Array of size dlen_.
The array descriptor for the distributed matrix C.

work (local)
Array, size (lwork)

lwork (local)
The size of the array work.

lwork is local input and must be at least

If side = 'L', lwork >= MpC0 + MAX( MAX( 1, NqC0 ), numroc( numroc( m+IROFFC, mb_a, 0, 0, NPROW ), mb_a, 0, 0, LCMP ) );
if side = 'R', lwork >= NqC0 + MAX( 1, MpC0 );

where LCMP = LCM / NPROW,

LCM = ilcm( NPROW, NPCOL ),

IROFFC = MOD( ic-1, mb_c ),

ICOFFC = MOD( jc-1, nb_c ),

ICROW = indxg2p( ic, mb_c, MYROW, rsrc_c, NPROW ),

ICCOL = indxg2p( jc, nb_c, MYCOL, csrc_c, NPCOL ),

MpC0 = numroc( m+IROFFC, mb_c, MYROW, ICROW, NPROW ),

NqC0 = numroc( n+ICOFFC, nb_c, MYCOL, ICCOL, NPCOL ),

ilcm, indxg2p, and numroc are ScaLAPACK tool functions;

MYROW, MYCOL, NPROW and NPCOL can be determined by calling the
function blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is
assumed; the routine only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.
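The p?ormr3 bound can be evaluated the same way on each process. In the sketch below, the local copies of numroc, indxg2p and ilcm follow the reference ScaLAPACK definitions, every name is hypothetical, and MYROW, MYCOL, NPROW and NPCOL are passed in rather than obtained from blacs_gridinfo. The p?unmr3 description gives the same bound.

```c
#include <assert.h>

/* Hypothetical local copies of ScaLAPACK's numroc, indxg2p and ilcm. */
static long loc_numroc(long n, long nb, long iproc, long isrcproc, long nprocs)
{
    long mydist = (nprocs + iproc - isrcproc) % nprocs;
    long nblocks = n / nb;
    long count = (nblocks / nprocs) * nb;
    long extrablks = nblocks % nprocs;
    if (mydist < extrablks)
        count += nb;
    else if (mydist == extrablks)
        count += n % nb;
    return count;
}

static long loc_indxg2p(long indxglob, long nb, long isrcproc, long nprocs)
{
    return (isrcproc + (indxglob - 1) / nb) % nprocs;
}

static long loc_ilcm(long a, long b)
{
    long x = a, y = b;
    while (y != 0) { long t = x % y; x = y; y = t; }   /* x becomes gcd(a, b) */
    return a / x * b;
}

static long max_l(long a, long b) { return a > b ? a : b; }

/* Minimum lwork for p?ormr3 on process (myrow, mycol). */
static long ormr3_lwork(char side, long m, long n, long ic, long jc, long mb_a,
                        long mb_c, long nb_c, long rsrc_c, long csrc_c,
                        long myrow, long mycol, long nprow, long npcol)
{
    assert(mb_a > 0 && mb_c > 0 && nb_c > 0 && nprow > 0 && npcol > 0);
    long lcmp = loc_ilcm(nprow, npcol) / nprow;          /* LCMP = LCM / NPROW */
    long iroffc = (ic - 1) % mb_c;
    long icoffc = (jc - 1) % nb_c;
    long icrow = loc_indxg2p(ic, mb_c, rsrc_c, nprow);
    long iccol = loc_indxg2p(jc, nb_c, csrc_c, npcol);
    long mpc0 = loc_numroc(m + iroffc, mb_c, myrow, icrow, nprow);
    long nqc0 = loc_numroc(n + icoffc, nb_c, mycol, iccol, npcol);
    if (side == 'L' || side == 'l') {
        long t = loc_numroc(loc_numroc(m + iroffc, mb_a, 0, 0, nprow), mb_a, 0, 0, lcmp);
        return mpc0 + max_l(max_l(1, nqc0), t);
    }
    return nqc0 + max_l(1, mpc0);                         /* side = 'R' */
}
```

For m = 100, n = 40, 16-by-16 blocks and process (0,0) of a 2-by-2 grid, MpC0 = 52 and NqC0 = 24, so the sketch returns 104 for side = 'L' and 76 for side = 'R'.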

Output Parameters

c On exit, sub( C ) is overwritten by Q*sub( C ), Q'*sub( C ),
sub( C )*Q', or sub( C )*Q.


work On exit, work[0] returns the minimal and optimal lwork.

info (local)
= 0: successful exit
< 0: if the i-th argument is an array and the j-th entry had an illegal
value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.
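The info convention packs two numbers into one negative integer. A hypothetical decoder is sketched below. It assumes that a routine has fewer than 100 arguments and that j is below 100, so that a magnitude of 100 or more always denotes an array entry; the type and function names are illustrative only.

```c
#include <assert.h>

/* Hypothetical decoded form of a negative info value. */
typedef struct {
    int arg;    /* 1-based position i of the offending argument */
    int entry;  /* 1-based entry j of an array argument (C index j-1); 0 for a scalar */
} info_error;

/* Decodes info < 0 under the convention above, assuming fewer than 100
   arguments, so that |info| >= 100 always means "entry j of array argument i". */
static info_error decode_info(int info)
{
    assert(info < 0);
    int code = -info;
    info_error e;
    if (code >= 100) {
        e.arg = code / 100;      /* info = -(i*100 + j) */
        e.entry = code % 100;
    } else {
        e.arg = code;            /* info = -i for a scalar argument */
        e.entry = 0;
    }
    return e;
}
```

For p?ormr3, info = -1005 would point at entry 5 (C index 4) of desca, the tenth argument, while info = -3 would point at the scalar m.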

Application Notes
Alignment requirements
The distributed submatrices A(ia:*, ja:*) and C(ic:ic+m-1, jc:jc+n-1) must satisfy the following alignment
properties:
If side = 'L',

( nb_a = mb_c and ICOFFA = IROFFC )

If side = 'R',

( nb_a = nb_c and ICOFFA = ICOFFC and IACOL = ICCOL )
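The alignment conditions can be checked before the call. The sketch below, with hypothetical names, evaluates them from the global indices and block sizes. ICOFFA and IACOL are not defined in this section, so the sketch assumes the definitions used for the other routines here, ICOFFA = mod(ja-1, nb_a) and IACOL = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL); indxg2p is a local copy of the reference ScaLAPACK definition.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical local copy of ScaLAPACK's indxg2p (unused iproc omitted). */
static int loc_indxg2p(int indxglob, int nb, int isrcproc, int nprocs)
{
    return (isrcproc + (indxglob - 1) / nb) % nprocs;
}

/* Returns true when sub(A) and sub(C) satisfy the p?ormr3/p?unmr3 alignment
   requirements for the given side. ja, ic, jc are 1-based global indices. */
static bool r3_aligned(char side, int ja, int nb_a, int csrc_a,
                       int ic, int jc, int mb_c, int nb_c, int csrc_c, int npcol)
{
    assert(nb_a > 0 && mb_c > 0 && nb_c > 0 && npcol > 0);
    int icoffa = (ja - 1) % nb_a;                       /* assumed ICOFFA */
    int iroffc = (ic - 1) % mb_c;                       /* IROFFC = mod(ic-1, mb_c) */
    int icoffc = (jc - 1) % nb_c;                       /* ICOFFC = mod(jc-1, nb_c) */
    if (side == 'L' || side == 'l')
        return nb_a == mb_c && icoffa == iroffc;
    int iacol = loc_indxg2p(ja, nb_a, csrc_a, npcol);   /* assumed IACOL */
    int iccol = loc_indxg2p(jc, nb_c, csrc_c, npcol);   /* ICCOL */
    return nb_a == nb_c && icoffa == icoffc && iacol == iccol;
}
```

For example, with 16-wide blocks on two process columns, ja = jc = 17 is aligned for side = 'R' when both matrices start on the same process column (csrc_a = csrc_c), and misaligned when csrc_c differs, because column 17 then lands on different process columns.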

p?unmr3
Applies a unitary distributed matrix to a general
m-by-n distributed matrix.

Syntax
void pcunmr3 (const char* side, const char* trans, const MKL_INT* m, const MKL_INT* n,
const MKL_INT* k, const MKL_INT* l, const MKL_Complex8* a, const MKL_INT* ia, const
MKL_INT* ja, const MKL_INT* desca, const MKL_Complex8* tau, MKL_Complex8* c, const
MKL_INT* ic, const MKL_INT* jc, const MKL_INT* descc, MKL_Complex8* work, const
MKL_INT* lwork, MKL_INT* info);
void pzunmr3 (const char* side, const char* trans, const MKL_INT* m, const MKL_INT* n,
const MKL_INT* k, const MKL_INT* l, const MKL_Complex16* a, const MKL_INT* ia, const
MKL_INT* ja, const MKL_INT* desca, const MKL_Complex16* tau, MKL_Complex16* c, const
MKL_INT* ic, const MKL_INT* jc, const MKL_INT* descc, MKL_Complex16* work, const
MKL_INT* lwork, MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
p?unmr3 overwrites the general complex m-by-n distributed matrix sub( C ) = C(ic:ic+m-1,jc:jc+n-1) with

              side = 'L'          side = 'R'
trans = 'N'   Q * sub( C )        sub( C ) * Q
trans = 'C'   Q^H * sub( C )      sub( C ) * Q^H

where Q is a complex unitary distributed matrix defined as the product of k elementary reflectors

Q = H(1)' H(2)' . . . H(k)'

as returned by p?tzrzf. Q is of order m if side = 'L' and of order n if side = 'R'.

Input Parameters

side (global)
= 'L': apply Q or Q^H from the left;
= 'R': apply Q or Q^H from the right.

trans (global)
= 'N': no transpose, apply Q;
= 'C': conjugate transpose, apply Q^H.

m (global)
The number of rows to be operated on, i.e., the number of rows of the
distributed submatrix sub( C ). m >= 0.

n (global)
The number of columns to be operated on, i.e., the number of columns of the
distributed submatrix sub( C ). n >= 0.

k (global)
The number of elementary reflectors whose product defines the matrix Q.
If side = 'L', m >= k >= 0; if side = 'R', n >= k >= 0.

l (global)
The number of columns of the distributed submatrix sub( A ) containing the
meaningful part of the Householder reflectors.
If side = 'L', m >= l >= 0; if side = 'R', n >= l >= 0.

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+m-1) if
side='L', and lld_a*LOCc(ja+n-1) if side='R', where lld_a >=
MAX(1,LOCr(ia+k-1)).

A(ia:ia+k-1,ja:*) is modified by the routine but restored on exit.

ia (global)
The row index in the global array a indicating the first row of sub( A ).

ja (global)
The column index in the global array a indicating the first column of
sub( A ).

desca (global and local)

Array of size dlen_.
The array descriptor for the distributed matrix A.

tau (local)
Array, size LOCc(ia+k-1).


This array contains the scalar factors tau(i) of the elementary reflectors
H(i) as returned by p?tzrzf. tau is tied to the distributed matrix A.

c (local)
Pointer into the local memory to an array of size lld_c*LOCc(jc+n-1).

On entry, the local pieces of the distributed matrix sub( C ).

ic (global)
The row index in the global array c indicating the first row of sub( C ).

jc (global)
The column index in the global array c indicating the first column of
sub( C ).

descc (global and local)

Array of size dlen_.
The array descriptor for the distributed matrix C.

work (local)
Array, size (lwork)

On exit, work[0] returns the minimal and optimal lwork.

lwork (local or global)

The size of the array work.

lwork is local input and must be at least

If side = 'L', lwork >= MpC0 + MAX( MAX( 1, NqC0 ), numroc( numroc( m+IROFFC, mb_a, 0, 0, NPROW ), mb_a, 0, 0, LCMP ) );
if side = 'R', lwork >= NqC0 + MAX( 1, MpC0 );

where LCMP = LCM / NPROW with LCM = ilcm( NPROW, NPCOL ),

IROFFC = MOD( ic-1, mb_c ), ICOFFC = MOD( jc-1, nb_c ),

ICROW = indxg2p( ic, mb_c, MYROW, rsrc_c, NPROW ),

ICCOL = indxg2p( jc, nb_c, MYCOL, csrc_c, NPCOL ),

MpC0 = numroc( m+IROFFC, mb_c, MYROW, ICROW, NPROW ),

NqC0 = numroc( n+ICOFFC, nb_c, MYCOL, ICCOL, NPCOL ),

ilcm, indxg2p, and numroc are ScaLAPACK tool functions;

MYROW, MYCOL, NPROW and NPCOL can be determined by calling the
function blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is
assumed; the routine only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

c On exit, sub( C ) is overwritten by Q*sub( C ), Q'*sub( C ),
sub( C )*Q', or sub( C )*Q.

work (local)
Array, size (lwork)

On exit, work[0] returns the minimal and optimal lwork.

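Application Notes
Alignment requirements
The distributed submatrices A(ia:*, ja:*) and C(ic:ic+m-1, jc:jc+n-1) must satisfy the following alignment
properties:
If side = 'L', ( nb_a = mb_c and ICOFFA = IROFFC )
If side = 'R', ( nb_a = nb_c and ICOFFA = ICOFFC and IACOL = ICCOL )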

p?ormrq
Multiplies a general matrix by the orthogonal matrix Q
of the RQ factorization formed by p?gerqf.

Syntax
void psormrq (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , float
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *tau , float *c , MKL_INT *ic ,
MKL_INT *jc , MKL_INT *descc , float *work , MKL_INT *lwork , MKL_INT *info );
void pdormrq (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , double
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *tau , double *c , MKL_INT
*ic , MKL_INT *jc , MKL_INT *descc , double *work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?ormrq function overwrites the general real m-by-n distributed matrix sub(C) = C(ic:ic+m-1,jc:jc+n-1) with

              side = 'L'      side = 'R'
trans = 'N':  Q*sub(C)        sub(C)*Q
trans = 'T':  Q^T*sub(C)      sub(C)*Q^T

where Q is a real orthogonal distributed matrix defined as the product of k elementary reflectors

Q = H(1) H(2)... H(k)

as returned by p?gerqf. Q is of order m if side = 'L' and of order n if side = 'R'.


Input Parameters

side (global)
='L': Q or Q^T is applied from the left.
='R': Q or Q^T is applied from the right.

trans (global)
='N', no transpose, Q is applied.
='T', transpose, Q^T is applied.

m (global) The number of rows in the distributed matrix sub(C) (m≥0).

n (global) The number of columns in the distributed matrix sub(C) (n≥0).

k (global) The number of elementary reflectors whose product defines the
matrix Q. Constraints:
If side = 'L', m≥k≥0;
if side = 'R', n≥k≥0.

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+m-1) if
side = 'L', and lld_a*LOCc(ja+n-1) if side = 'R'.
The i-th row of the matrix stored in a must contain the vector that defines
the elementary reflector H(i), ia≤i≤ia+k-1, as returned by p?gerqf in the
k rows of its distributed matrix argument A(ia:ia+k-1, ja:*). A(ia:ia+k-1, ja:*)
is modified by the function but restored on exit.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCc(ja+k-1).

Contains the scalar factor tau[i] of elementary reflectors H(i+1) as
returned by p?gerqf (0 ≤ i < LOCc(ja+k-1)). tau is tied to the distributed
matrix A.

c (local)
Pointer into the local memory to an array of local size lld_c*LOCc(jc+n-1).

Contains the local pieces of the distributed matrix sub(C).

ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the matrix sub(C), respectively.

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local)
Workspace array of size lwork.

lwork (local or global) size of work, must be at least:

If side = 'L',

lwork≥max((mb_a*(mb_a-1))/2, (mpc0 + max(mqa0 + numroc(numroc(n+iroffc, mb_a, 0, 0, NPROW), mb_a, 0, 0, lcmp), nqc0))*mb_a) + mb_a*mb_a

else if side = 'R',

lwork≥max((mb_a*(mb_a-1))/2, (mpc0 + nqc0)*mb_a) + mb_a*mb_a

end if
where
lcmp = lcm/NPROW with lcm = ilcm (NPROW, NPCOL),
iroffa = mod(ia-1, mb_a),
icoffa = mod(ja-1, nb_a),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
mqa0 = numroc(n+icoffa, nb_a, MYCOL, iacol, NPCOL),
iroffc = mod(ic-1, mb_c),
icoffc = mod(jc-1, nb_c),
icrow = indxg2p(ic, mb_c, MYROW, rsrc_c, NPROW),
iccol = indxg2p(jc, nb_c, MYCOL, csrc_c, NPCOL),
mpc0 = numroc(m+iroffc, mb_c, MYROW, icrow, NPROW),
nqc0 = numroc(n+icoffc, nb_c, MYCOL, iccol, NPCOL),

NOTE
mod(x,y) is the integer remainder of x/y.

Output Parameters

c Overwritten by the product Q*sub(C), Q'*sub(C), sub(C)*Q', or sub(C)*Q.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

info (global)
= 0: the execution is successful.


< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?unmrq
Multiplies a general matrix by the unitary matrix Q of
the RQ factorization formed by p?gerqf.

Syntax
void pcunmrq (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau ,
MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex8 *work ,
MKL_INT *lwork , MKL_INT *info );
void pzunmrq (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau ,
MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex16 *work ,
MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
This function overwrites the general complex m-by-n distributed matrix sub(C) = C(ic:ic+m-1,jc:jc+n-1)
with

              side = 'L'      side = 'R'
trans = 'N':  Q*sub(C)        sub(C)*Q
trans = 'C':  Q^H*sub(C)      sub(C)*Q^H

where Q is a complex unitary distributed matrix defined as the product of k elementary reflectors

Q = H(1)' H(2)'... H(k)'

as returned by p?gerqf. Q is of order m if side = 'L' and of order n if side = 'R'.

Input Parameters

side (global)
='L': Q or Q^H is applied from the left.
='R': Q or Q^H is applied from the right.

trans (global)
='N', no transpose, Q is applied.
='C', conjugate transpose, Q^H is applied.

m (global) The number of rows in the distributed matrix sub(C) , (m≥0).

n (global) The number of columns in the distributed matrix sub(C), (n≥0).

k (global) The number of elementary reflectors whose product defines the
matrix Q. Constraints:
If side = 'L', m≥k≥0;
if side = 'R', n≥k≥0.

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+m-1) if
side = 'L', and lld_a*LOCc(ja+n-1) if side = 'R'. The i-th row of the
matrix stored in a must contain the vector that defines the elementary
reflector H(i), ia≤i≤ia+k-1, as returned by p?gerqf in the k rows of its
distributed matrix argument A(ia:ia+k-1, ja:*). A(ia:ia+k-1, ja:*) is
modified by the function but restored on exit.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCc(ja+k-1).

Contains the scalar factor tau[i] of elementary reflectors H(i+1) as
returned by p?gerqf (0 ≤ i < LOCc(ja+k-1)). tau is tied to the distributed
matrix A.

c (local)
Pointer into the local memory to an array of local size lld_c*LOCc(jc+n-1).

Contains the local pieces of the distributed matrix sub(C).

ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the submatrix C, respectively.

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local)
Workspace array of size lwork.

lwork (local or global) size of work, must be at least:

If side = 'L',

lwork≥max((mb_a*(mb_a-1))/2, (mpc0 + max(mqa0 + numroc(numroc(n+iroffc, mb_a, 0, 0, NPROW), mb_a, 0, 0, lcmp), nqc0))*mb_a) + mb_a*mb_a

else if side = 'R',

lwork≥max((mb_a*(mb_a-1))/2, (mpc0 + nqc0)*mb_a) + mb_a*mb_a

end if
where
lcmp = lcm/NPROW with lcm = ilcm(NPROW, NPCOL),
iroffa = mod(ia-1, mb_a),
icoffa = mod(ja-1, nb_a),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
mqa0 = numroc(m+icoffa, nb_a, MYCOL, iacol, NPCOL),
iroffc = mod(ic-1, mb_c),
icoffc = mod(jc-1, nb_c),
icrow = indxg2p(ic, mb_c, MYROW, rsrc_c, NPROW),
iccol = indxg2p(jc, nb_c, MYCOL, csrc_c, NPCOL),
mpc0 = numroc(m+iroffc, mb_c, MYROW, icrow, NPROW),
nqc0 = numroc(n+icoffc, nb_c, MYCOL, iccol, NPCOL),

NOTE
mod(x,y) is the integer remainder of x/y.

Output Parameters

c Overwritten by the product Q*sub(C), Q'*sub(C), sub(C)*Q', or sub(C)*Q.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?tzrzf
Reduces the upper trapezoidal matrix A to upper
triangular form.

Syntax
void pstzrzf (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdtzrzf (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );

void pctzrzf (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT
*info );
void pztzrzf (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT *lwork , MKL_INT
*info );

Include Files
• mkl_scalapack.h

Description
The p?tzrzf function reduces the m-by-n (m ≤ n) real/complex upper trapezoidal matrix sub(A) = A(ia:ia+m-1, ja:ja+n-1)
to upper triangular form by means of orthogonal/unitary transformations. The upper
trapezoidal matrix sub(A) is factored as
sub(A) = (R 0)*Z,
where Z is an n-by-n orthogonal/unitary matrix and R is an m-by-m upper triangular matrix.

Input Parameters

m (global) The number of rows in the matrix sub(A); (m≥0).

n (global) The number of columns in the matrix sub(A) (n≥0).

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1).
Contains the local pieces of the m-by-n distributed matrix sub(A) to be factored.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

work (local)
Workspace array of size lwork.

lwork (local or global) size of work, must be at least

NOTE
mod(x,y) is the integer remainder of x/y.


indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.

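If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.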

Output Parameters

a On exit, the leading m-by-m upper triangular part of sub(A) contains the
upper triangular matrix R, and elements m+1 to n of the first m rows of sub(A),
with the array tau, represent the orthogonal/unitary matrix Z as a
product of m elementary reflectors.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

tau (local)
Array of size LOCr(ia+m-1).

Contains the scalar factors of the elementary reflectors. tau is tied to the
distributed matrix A.

info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.

Application Notes
The factorization is obtained by Householder's method. The k-th transformation matrix, Z(k), which (or, in the
complex case, whose conjugate transpose) is used to introduce zeros into the (m - k + 1)-th row of sub(A), is given in
the form

$$ Z(k) = \begin{pmatrix} I & 0 \\ 0 & T(k) \end{pmatrix}, $$

where

$$ T(k) = I - \tau\, u(k)\, u(k)', \qquad u(k) = \begin{pmatrix} 1 \\ 0 \\ z(k) \end{pmatrix}, $$

tau is a scalar and z(k) is an (n - m)-element vector. tau and z(k) are chosen to annihilate the elements of
the k-th row of sub(A). The scalar tau is returned in the k-th element of tau, indexed k-1, and the vector
u(k) in the k-th row of sub(A), such that the elements of z(k) are in a(k, m + 1), ..., a(k, n). The
elements of R are returned in the upper triangular part of sub(A). Z is given by

Z = Z(1) * Z(2) * ... * Z(m).

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?ormrz
Multiplies a general matrix by the orthogonal matrix
from a reduction to upper triangular form formed by
p?tzrzf.

Syntax
void psormrz (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_INT
*l , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *tau , float *c ,
MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , float *work , MKL_INT *lwork , MKL_INT
*info );
void pdormrz (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_INT
*l , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *tau , double *c ,
MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , double *work , MKL_INT *lwork , MKL_INT
*info );

Include Files
• mkl_scalapack.h

Description
This function overwrites the general real m-by-n distributed matrix sub(C) = C(ic:ic+m-1,jc:jc+n-1) with

              side = 'L'      side = 'R'
trans = 'N':  Q*sub(C)        sub(C)*Q
trans = 'T':  Q^T*sub(C)      sub(C)*Q^T

where Q is a real orthogonal distributed matrix defined as the product of k elementary reflectors

Q = H(1) H(2)... H(k)

as returned by p?tzrzf. Q is of order m if side = 'L' and of order n if side = 'R'.

Input Parameters

side (global)
='L': Q or Q^T is applied from the left.
='R': Q or Q^T is applied from the right.

trans (global)
='N', no transpose, Q is applied.
='T', transpose, Q^T is applied.

m (global) The number of rows in the distributed matrix sub(C) (m≥0).

n (global) The number of columns in the distributed matrix sub(C) (n≥0).

k (global) The number of elementary reflectors whose product defines the
matrix Q. Constraints:

If side = 'L', m ≥ k ≥ 0

If side = 'R', n ≥ k ≥ 0.

l (global)
The number of columns of the distributed matrix sub(A) containing the
meaningful part of the Householder reflectors.
If side = 'L', m ≥ l ≥0

If side = 'R', n ≥ l ≥0.

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+m-1) if
side = 'L', and lld_a*LOCc(ja+n-1) if side = 'R', where lld_a ≥
max(1,LOCr(ia+k-1)).
The i-th row of the matrix stored in a must contain the vector that defines
the elementary reflector H(i), ia≤i≤ia+k-1, as returned by p?tzrzf in the
k rows of its distributed matrix argument A(ia:ia+k-1, ja:*).
A(ia:ia+k-1, ja:*) is modified by the function but restored on exit.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCc(ia+k-1).

Contains the scalar factor tau[i] of elementary reflectors H(i+1) as
returned by p?tzrzf (0 ≤ i < LOCc(ia+k-1)). tau is tied to the distributed
matrix A.

c (local)
Pointer into the local memory to an array of local size lld_c*LOCc(jc+n-1).

On entry, contains the local pieces of the distributed matrix sub(C).

ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the submatrix C, respectively.

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local)
Workspace array of size lwork.

lwork (local or global) Size of work, must be at least:

If side = 'L',

lwork≥max((mb_a*(mb_a-1))/2, (mpc0 + max(mqa0 +
numroc(numroc(n+iroffc, mb_a, 0, 0, NPROW), mb_a, 0, 0,
lcmp), nqc0))*mb_a) + mb_a*mb_a

else if side = 'R',

lwork≥max((mb_a*(mb_a-1))/2, (mpc0 + nqc0)*mb_a) + mb_a*mb_a

end if
where
lcmp = lcm/NPROW with lcm = ilcm(NPROW, NPCOL),
iroffa = mod(ia-1, mb_a), icoffa = mod(ja-1, nb_a),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
mqa0 = numroc(n+icoffa, nb_a, MYCOL, iacol, NPCOL),
iroffc = mod(ic-1, mb_c),
icoffc = mod(jc-1, nb_c),
icrow = indxg2p(ic, mb_c, MYROW, rsrc_c, NPROW),
iccol = indxg2p(jc, nb_c, MYCOL, csrc_c, NPCOL),
mpc0 = numroc(m+iroffc, mb_c, MYROW, icrow, NPROW),
nqc0 = numroc(n+icoffc, nb_c, MYCOL, iccol, NPCOL).

ilcm, indxg2p, and numroc are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.

NOTE
mod(x,y) is the integer remainder of x/y.
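The formula can be evaluated for a concrete grid with a short C sketch. Everything here is hypothetical scaffolding rather than MKL API: numroc_ref, indxg2p_ref, and ilcm_ref are local reimplementations of the ScaLAPACK tool functions (indxg2p_ref drops the unused process argument), and grid_info and block_layout only bundle blacs_gridinfo output and descriptor fields. The expressions, including mqa0 = numroc(n+icoffa, ...), are transcribed as written above; the value p?ormrz itself returns in work[0] remains the authoritative figure.

```c
#include <assert.h>

/* Hypothetical helper types: process-grid info (as from blacs_gridinfo)
 * and the blocking fields mb_, nb_, rsrc_, csrc_ of an array descriptor. */
typedef struct { int myrow, mycol, nprow, npcol; } grid_info;
typedef struct { int mb, nb, rsrc, csrc; } block_layout;

/* Local reimplementation of numroc: how many of n rows/columns, split in
 * blocks of nb starting on process isrcproc, are owned by process iproc. */
static int numroc_ref(int n, int nb, int iproc, int isrcproc, int nprocs) {
    assert(nb > 0 && nprocs > 0);
    int mydist = (nprocs + iproc - isrcproc) % nprocs;
    int nblocks = n / nb;
    int num = (nblocks / nprocs) * nb;
    int extrablks = nblocks % nprocs;
    if (mydist < extrablks) num += nb;
    else if (mydist == extrablks) num += n % nb;
    return num;
}

/* Local reimplementation of indxg2p: process coordinate that owns the
 * 1-based global index indxglob. */
static int indxg2p_ref(int indxglob, int nb, int isrcproc, int nprocs) {
    return (isrcproc + (indxglob - 1) / nb) % nprocs;
}

static int gcd_int(int a, int b) { while (b != 0) { int t = a % b; a = b; b = t; } return a; }
static int ilcm_ref(int a, int b) { return a / gcd_int(a, b) * b; }
static int max_int(int a, int b) { return a > b ? a : b; }

/* Minimum lwork for p?ormrz, transcribed from the formula above. */
static int ormrz_min_lwork(char side, int m, int n, int ja, block_layout a,
                           int ic, int jc, block_layout c, grid_info g) {
    int lcmp   = ilcm_ref(g.nprow, g.npcol) / g.nprow;
    int icoffa = (ja - 1) % a.nb;
    int iacol  = indxg2p_ref(ja, a.nb, a.csrc, g.npcol);
    int mqa0   = numroc_ref(n + icoffa, a.nb, g.mycol, iacol, g.npcol);
    int iroffc = (ic - 1) % c.mb;
    int icoffc = (jc - 1) % c.nb;
    int icrow  = indxg2p_ref(ic, c.mb, c.rsrc, g.nprow);
    int iccol  = indxg2p_ref(jc, c.nb, c.csrc, g.npcol);
    int mpc0   = numroc_ref(m + iroffc, c.mb, g.myrow, icrow, g.nprow);
    int nqc0   = numroc_ref(n + icoffc, c.nb, g.mycol, iccol, g.npcol);
    int tri    = (a.mb * (a.mb - 1)) / 2;
    if (side == 'L') {
        int extra = numroc_ref(numroc_ref(n + iroffc, a.mb, 0, 0, g.nprow),
                               a.mb, 0, 0, lcmp);
        return max_int(tri, (mpc0 + max_int(mqa0 + extra, nqc0)) * a.mb)
               + a.mb * a.mb;
    }
    return max_int(tri, (mpc0 + nqc0) * a.mb) + a.mb * a.mb;
}
```

On a 1-by-1 grid with 32-by-32 blocks, m = n = 100, and ja = ic = jc = 1, this gives 10624 for side = 'L' and 7424 for side = 'R'.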

Output Parameters

c Overwritten by the product Q*sub(C), Q^T*sub(C), sub(C)*Q^T, or
sub(C)*Q.

work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?unmrz
Multiplies a general matrix by the unitary
transformation matrix from a reduction to upper
triangular form determined by p?tzrzf.

Syntax
void pcunmrz (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_INT
*l , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau ,
MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex8 *work ,
MKL_INT *lwork , MKL_INT *info );
void pzunmrz (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_INT
*l , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16
*tau , MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex16
*work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
This function overwrites the general complex m-by-n distributed matrix sub(C) = C(ic:ic+m-1, jc:jc+n-1)
with

                 side = 'L'      side = 'R'
trans = 'N':     Q*sub(C)        sub(C)*Q
trans = 'C':     Q^H*sub(C)      sub(C)*Q^H

where Q is a complex unitary distributed matrix defined as the product of k elementary reflectors

Q = H(1)' H(2)' ... H(k)'

as returned by pctzrzf/pztzrzf. Q is of order m if side = 'L' and of order n if side = 'R'.

Input Parameters

side (global)
='L': Q or Q^H is applied from the left.
='R': Q or Q^H is applied from the right.

trans (global)
='N', no transpose, Q is applied.
='C', conjugate transpose, Q^H is applied.

m (global) The number of rows in the distributed matrix sub(C), (m≥0).

n (global) The number of columns in the distributed matrix sub(C), (n≥0).

k (global) The number of elementary reflectors whose product defines the
matrix Q. Constraints:
If side = 'L', m≥k≥0

If side = 'R', n≥k≥0.

l (global) The number of columns of the distributed matrix sub(A) containing the
meaningful part of the Householder reflectors.
If side = 'L', m≥l≥0

If side = 'R', n≥l≥0.

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+m-1) if
side = 'L', and lld_a*LOCc(ja+n-1) if side = 'R', where lld_a ≥
max(1, LOCr(ia+k-1)). The i-th row of the matrix stored in a must
contain the vector that defines the elementary reflector H(i), ia≤i≤ia+k-1,
as returned by p?tzrzf in the k rows of its distributed matrix argument
A(ia:ia+k-1, ja:*). A(ia:ia+k-1, ja:*) is modified by the function but
restored on exit.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCc(ia+k-1).

Contains the scalar factor tau[i] of elementary reflectors H(i+1) as
returned by p?tzrzf (0 ≤ i < LOCc(ia+k-1)). tau is tied to the distributed
matrix A.

c (local)
Pointer into the local memory to an array of local size lld_c*LOCc(jc+n-1).

On entry, contains the local pieces of the distributed matrix sub(C).

ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the submatrix C, respectively.

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local)
Workspace array of size lwork.

lwork (local or global) Size of work, must be at least:

If side = 'L',

lwork≥max((mb_a*(mb_a-1))/2, (mpc0 + max(mqa0 +
numroc(numroc(n+iroffc, mb_a, 0, 0, NPROW), mb_a, 0, 0,
lcmp), nqc0))*mb_a) + mb_a*mb_a

else if side = 'R',

lwork≥max((mb_a*(mb_a-1))/2, (mpc0 + nqc0)*mb_a) + mb_a*mb_a

end if
where
lcmp = lcm/NPROW with lcm = ilcm(NPROW, NPCOL),
iroffa = mod(ia-1, mb_a),
icoffa = mod(ja-1, nb_a),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
mqa0 = numroc(m+icoffa, nb_a, MYCOL, iacol, NPCOL),
iroffc = mod(ic-1, mb_c),
icoffc = mod(jc-1, nb_c),
icrow = indxg2p(ic, mb_c, MYROW, rsrc_c, NPROW),
iccol = indxg2p(jc, nb_c, MYCOL, csrc_c, NPCOL),
mpc0 = numroc(m+iroffc, mb_c, MYROW, icrow, NPROW),
nqc0 = numroc(n+icoffc, nb_c, MYCOL, iccol, NPCOL).

ilcm, indxg2p, and numroc are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.

NOTE
mod(x,y) is the integer remainder of x/y.

Output Parameters

c Overwritten by the product Q*sub(C), Q^H*sub(C), sub(C)*Q^H, or
sub(C)*Q.

work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?ggqrf
Computes the generalized QR factorization.

Syntax
void psggqrf (MKL_INT *n , MKL_INT *m , MKL_INT *p , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *taua , float *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb , float *taub , float *work , MKL_INT *lwork , MKL_INT *info );
void pdggqrf (MKL_INT *n , MKL_INT *m , MKL_INT *p , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *taua , double *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb , double *taub , double *work , MKL_INT *lwork , MKL_INT *info );
void pcggqrf (MKL_INT *n , MKL_INT *m , MKL_INT *p , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *taua , MKL_Complex8 *b , MKL_INT *ib ,
MKL_INT *jb , MKL_INT *descb , MKL_Complex8 *taub , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzggqrf (MKL_INT *n , MKL_INT *m , MKL_INT *p , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *taua , MKL_Complex16 *b , MKL_INT *ib ,
MKL_INT *jb , MKL_INT *descb , MKL_Complex16 *taub , MKL_Complex16 *work , MKL_INT
*lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
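The p?ggqrf function forms the generalized QR factorization of an n-by-m matrix

sub(A) = A(ia:ia+n-1, ja:ja+m-1)

and an n-by-p matrix

sub(B) = B(ib:ib+n-1, jb:jb+p-1)

as
sub(A) = Q*R, sub(B) = Q*T*Z,
where Q is an n-by-n orthogonal/unitary matrix, Z is a p-by-p orthogonal/unitary matrix, and R and T
assume one of the forms:
If n ≥ m,

$$R = \begin{pmatrix} R_{11} \\ 0 \end{pmatrix}, \quad R_{11}\ \text{is}\ m \times m,\ 0\ \text{is}\ (n-m) \times m,$$

or if n < m,

$$R = \begin{pmatrix} R_{11} & R_{12} \end{pmatrix}, \quad R_{11}\ \text{is}\ n \times n,\ R_{12}\ \text{is}\ n \times (m-n),$$

where R11 is upper triangular, and if n ≤ p,

$$T = \begin{pmatrix} 0 & T_{12} \end{pmatrix}, \quad 0\ \text{is}\ n \times (p-n),\ T_{12}\ \text{is}\ n \times n,$$

or if n > p,

$$T = \begin{pmatrix} T_{11} \\ T_{21} \end{pmatrix}, \quad T_{11}\ \text{is}\ (n-p) \times p,\ T_{21}\ \text{is}\ p \times p,$$

where T12 or T21 is an upper triangular matrix.

In particular, if sub(B) is square and nonsingular, the GQR factorization of sub(A) and sub(B) implicitly gives
the QR factorization of inv(sub(B))*sub(A):
inv(sub(B))*sub(A) = Z^H*(inv(T)*R)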

Input Parameters

n (global) The number of rows in the distributed matrices sub(A) and sub(B)
(n≥0).

m (global) The number of columns in the distributed matrix sub(A) (m≥0).

p (global) The number of columns in the distributed matrix sub(B) (p≥0).

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+m-1).
Contains the local pieces of the n-by-m matrix sub(A) to be factored.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

b (local)
Pointer into the local memory to an array of size lld_b*LOCc(jb+p-1).
Contains the local pieces of the n-by-p matrix sub(B) to be factored.

ib, jb (global) The row and column indices in the global matrix B
indicating the first row and the first column of the submatrix B,
respectively.

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.

work (local)
Workspace array of size lwork.

lwork (local or global) Size of work, must be at least

lwork≥max(nb_a*(npa0+mqa0+nb_a), max((nb_a*(nb_a-1))/2,
(pqb0+npb0)*nb_a)+nb_a*nb_a, mb_b*(npb0+pqb0+mb_b)),
where
iroffa = mod(ia-1, mb_a),
icoffa = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
npa0 = numroc(n+iroffa, mb_a, MYROW, iarow, NPROW),
mqa0 = numroc(m+icoffa, nb_a, MYCOL, iacol, NPCOL),
iroffb = mod(ib-1, mb_b),
icoffb = mod(jb-1, nb_b),
ibrow = indxg2p(ib, mb_b, MYROW, rsrc_b, NPROW),
ibcol = indxg2p(jb, nb_b, MYCOL, csrc_b, NPCOL),
npb0 = numroc(n+iroffb, mb_b, MYROW, ibrow, NPROW),
pqb0 = numroc(p+icoffb, nb_b, MYCOL, ibcol, NPCOL)

numroc and indxg2p are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.

NOTE
mod(x,y) is the integer remainder of x/y.

If lwork = -1, then lwork is global input and a workspace query is
assumed; the routine only calculates the optimal size for all work arrays.
Each of these values is returned in the first entry of the corresponding work
array, and no error message is issued by pxerbla.
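The p?ggqrf bound can be evaluated for a concrete grid with a short C sketch. It is hypothetical and self-contained: numroc_ref and indxg2p_ref are local reimplementations of the ScaLAPACK tool functions, not MKL calls, and grid_info and block_layout are illustrative structs. It computes npb0 from iroffb and pqb0 from p + icoffb, that is, from sub(B)'s own row offset and its p columns.

```c
#include <assert.h>

typedef struct { int myrow, mycol, nprow, npcol; } grid_info;   /* from blacs_gridinfo */
typedef struct { int mb, nb, rsrc, csrc; } block_layout;        /* descriptor blocking fields */

/* Local reimplementation of numroc (rows/columns owned by process iproc). */
static int numroc_ref(int n, int nb, int iproc, int isrcproc, int nprocs) {
    assert(nb > 0 && nprocs > 0);
    int mydist = (nprocs + iproc - isrcproc) % nprocs;
    int nblocks = n / nb;
    int num = (nblocks / nprocs) * nb;
    int extrablks = nblocks % nprocs;
    if (mydist < extrablks) num += nb;
    else if (mydist == extrablks) num += n % nb;
    return num;
}

/* Local reimplementation of indxg2p for a 1-based global index. */
static int indxg2p_ref(int indxglob, int nb, int isrcproc, int nprocs) {
    return (isrcproc + (indxglob - 1) / nb) % nprocs;
}

static int max_int(int a, int b) { return a > b ? a : b; }

/* Minimum lwork for p?ggqrf following the formula above. */
static int ggqrf_min_lwork(int n, int m, int p, int ia, int ja, block_layout a,
                           int ib, int jb, block_layout b, grid_info g) {
    int iroffa = (ia - 1) % a.mb, icoffa = (ja - 1) % a.nb;
    int iarow = indxg2p_ref(ia, a.mb, a.rsrc, g.nprow);
    int iacol = indxg2p_ref(ja, a.nb, a.csrc, g.npcol);
    int npa0 = numroc_ref(n + iroffa, a.mb, g.myrow, iarow, g.nprow);
    int mqa0 = numroc_ref(m + icoffa, a.nb, g.mycol, iacol, g.npcol);
    int iroffb = (ib - 1) % b.mb, icoffb = (jb - 1) % b.nb;
    int ibrow = indxg2p_ref(ib, b.mb, b.rsrc, g.nprow);
    int ibcol = indxg2p_ref(jb, b.nb, b.csrc, g.npcol);
    int npb0 = numroc_ref(n + iroffb, b.mb, g.myrow, ibrow, g.nprow);
    int pqb0 = numroc_ref(p + icoffb, b.nb, g.mycol, ibcol, g.npcol);
    int t1 = a.nb * (npa0 + mqa0 + a.nb);
    int t2 = max_int((a.nb * (a.nb - 1)) / 2, (pqb0 + npb0) * a.nb) + a.nb * a.nb;
    int t3 = b.mb * (npb0 + pqb0 + b.mb);
    return max_int(t1, max_int(t2, t3));
}
```

For a 1-by-1 grid, 32-by-32 blocks, n = 100, m = 50, p = 80, and all starting indices equal to 1, the three terms are 5824, 6784, and 6784, so the sketch returns 6784.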

Output Parameters

a On exit, the elements on and above the diagonal of sub (A) contain the
min(n, m)-by-m upper trapezoidal matrix R (R is upper triangular if n≥m); the
elements below the diagonal, with the array taua, represent the
orthogonal/unitary matrix Q as a product of min(n, m) elementary
reflectors. (See Application Notes below).

taua, taub (local)

Arrays of size LOCc(ja+min(n,m)-1) for taua and LOCr(ib+n-1) for
taub.
The array taua contains the scalar factors of the elementary reflectors
which represent the orthogonal/unitary matrix Q. taua is tied to the
distributed matrix A. (See Application Notes below).
The array taub contains the scalar factors of the elementary reflectors
which represent the orthogonal/unitary matrix Z. taub is tied to the
distributed matrix B. (See Application Notes below).

work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.

Application Notes
The matrix Q is represented as a product of elementary reflectors
Q = H(ja)*H(ja+1)*...*H(ja+k-1),

where k= min(n,m).

Each H(i) has the form

H(i) = I - taua*v*v'
where taua is a real/complex scalar, and v is a real/complex vector with v(1:i-1) = 0 and v(i) = 1; v(i+1:n)
is stored on exit in A(ia+i:ia+n-1, ja+i-1), and taua in taua[ja+i-2]. To form Q explicitly, use ScaLAPACK
function p?orgqr/p?ungqr. To use Q to update another matrix, use ScaLAPACK function p?ormqr/p?unmqr.
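The storage of Q described above can be made concrete with a serial C sketch, assuming real data, ia = ja = 1, and an ordinary column-major array in place of the distributed matrix. apply_qr_reflectors is a hypothetical helper for illustration; in practice p?ormqr/p?unmqr does this job.

```c
#include <assert.h>

/* Hypothetical serial sketch (real data, ia = ja = 1, column-major a with
 * leading dimension lda): overwrite x with Q*x (trans = 'N') or Q^T*x
 * (trans = 'T'), where Q = H(1) H(2) ... H(k). Reflector i (0-based) has
 * an implicit 1 at position i, zeros above it, and its remaining entries
 * stored below the diagonal in column i of a; its scalar is tau[i]. */
static void apply_qr_reflectors(char trans, int n, int k, const double *a,
                                int lda, const double *tau, double *x) {
    assert(0 <= k && k <= n && (trans == 'N' || trans == 'T'));
    for (int t = 0; t < k; ++t) {
        /* Q*x applies H(k) first; Q^T*x applies H(1) first. */
        int i = (trans == 'N') ? k - 1 - t : t;
        const double *col = &a[i * lda];
        double dot = x[i];                          /* implicit v(i) = 1 */
        for (int r = i + 1; r < n; ++r) dot += col[r] * x[r];
        x[i] -= tau[i] * dot;
        for (int r = i + 1; r < n; ++r) x[r] -= tau[i] * dot * col[r];
    }
}
```

Entries on and above the diagonal of a, where R is stored, are never read, which is why v(i) = 1 can be left implicit.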

The matrix Z is represented as a product of elementary reflectors

Z = H(ib)*H(ib+1)*...*H(ib+k-1), where k= min(n,p).

Each H(i) has the form

H(i) = I - taub*v*v'
where taub is a real/complex scalar, and v is a real/complex vector with v(p-k+i+1:p) = 0 and v(p-k+i) = 1;
v(1:p-k+i-1) is stored on exit in B(ib+n-k+i-1,jb:jb+p-k+i-2), and taub in taub[ib+n-k+i-2]. To form Z
explicitly, use ScaLAPACK function p?orgrq/p?ungrq. To use Z to update another matrix, use ScaLAPACK
function p?ormrq/p?unmrq.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?ggrqf
Computes the generalized RQ factorization.

Syntax
void psggrqf (MKL_INT *m , MKL_INT *p , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *taua , float *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb , float *taub , float *work , MKL_INT *lwork , MKL_INT *info );
void pdggrqf (MKL_INT *m , MKL_INT *p , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *taua , double *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb , double *taub , double *work , MKL_INT *lwork , MKL_INT *info );
void pcggrqf (MKL_INT *m , MKL_INT *p , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *taua , MKL_Complex8 *b , MKL_INT *ib ,
MKL_INT *jb , MKL_INT *descb , MKL_Complex8 *taub , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzggrqf (MKL_INT *m , MKL_INT *p , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *taua , MKL_Complex16 *b , MKL_INT *ib ,
MKL_INT *jb , MKL_INT *descb , MKL_Complex16 *taub , MKL_Complex16 *work , MKL_INT
*lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
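The p?ggrqf function forms the generalized RQ factorization of an m-by-n matrix sub(A) = A(ia:ia+m-1,
ja:ja+n-1) and a p-by-n matrix sub(B) = B(ib:ib+p-1, jb:jb+n-1):

sub(A) = R*Q, sub(B) = Z*T*Q,
where Q is an n-by-n orthogonal/unitary matrix, Z is a p-by-p orthogonal/unitary matrix, and R and T
assume one of the forms:
If m ≤ n,

$$R = \begin{pmatrix} 0 & R_{12} \end{pmatrix}, \quad 0\ \text{is}\ m \times (n-m),\ R_{12}\ \text{is}\ m \times m,$$

or if m > n,

$$R = \begin{pmatrix} R_{11} \\ R_{21} \end{pmatrix}, \quad R_{11}\ \text{is}\ (m-n) \times n,\ R_{21}\ \text{is}\ n \times n,$$

where R12 or R21 is upper triangular, and if p ≥ n,

$$T = \begin{pmatrix} T_{11} \\ 0 \end{pmatrix}, \quad T_{11}\ \text{is}\ n \times n,\ 0\ \text{is}\ (p-n) \times n,$$

or if p < n,

$$T = \begin{pmatrix} T_{11} & T_{12} \end{pmatrix}, \quad T_{11}\ \text{is}\ p \times p,\ T_{12}\ \text{is}\ p \times (n-p),$$

where T11 is upper triangular.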

In particular, if sub(B) is square and nonsingular, the GRQ factorization of sub(A) and sub(B) implicitly gives
the RQ factorization of sub(A)*inv(sub(B)):
sub(A)*inv(sub(B)) = (R*inv(T))*Z'
where inv(sub(B)) denotes the inverse of the matrix sub(B), and Z' denotes the transpose (conjugate
transpose) of matrix Z.

Input Parameters

m (global) The number of rows in the distributed matrix sub(A) (m≥0).

p (global) The number of rows in the distributed matrix sub(B) (p≥0).

n (global) The number of columns in the distributed matrices sub(A) and
sub(B) (n≥0).

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1).
Contains the local pieces of the m-by-n distributed matrix sub(A) to be
factored.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

b (local)
Pointer into the local memory to an array of size lld_b*LOCc(jb+n-1).

Contains the local pieces of the p-by-n matrix sub(B) to be factored.

ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of the submatrix B, respectively.

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.

work (local)
Workspace array of size lwork.

lwork (local or global)

Size of work, must be at least lwork≥max(mb_a*(mpa0+nqa0+mb_a),
max((mb_a*(mb_a-1))/2, (ppb0+nqb0)*mb_a) + mb_a*mb_a,
nb_b*(ppb0+nqb0+nb_b)), where
iroffa = mod(ia-1, mb_a),
icoffa = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
mpa0 = numroc(m+iroffa, mb_a, MYROW, iarow, NPROW),
nqa0 = numroc(n+icoffa, nb_a, MYCOL, iacol, NPCOL),
iroffb = mod(ib-1, mb_b),
icoffb = mod(jb-1, nb_b),
ibrow = indxg2p(ib, mb_b, MYROW, rsrc_b, NPROW),
ibcol = indxg2p(jb, nb_b, MYCOL, csrc_b, NPCOL),
ppb0 = numroc(p+iroffb, mb_b, MYROW, ibrow, NPROW),
nqb0 = numroc(n+icoffb, nb_b, MYCOL, ibcol, NPCOL)

numroc and indxg2p are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.

NOTE
mod(x,y) is the integer remainder of x/y.

If lwork = -1, then lwork is global input and a workspace query is
assumed; the routine only calculates the optimal size for all work arrays.
Each of these values is returned in the first entry of the corresponding work
array, and no error message is issued by pxerbla.
Output Parameters

a On exit, if m≤n, the upper triangle of A(ia:ia+m-1, ja+n-m:ja+n-1)
contains the m-by-m upper triangular matrix R; if m≥n, the elements on and
above the (m-n)-th subdiagonal contain the m-by-n upper trapezoidal matrix
R; the remaining elements, with the array taua, represent the orthogonal/
unitary matrix Q as a product of min(n,m) elementary reflectors (see
Application Notes below).

taua, taub (local)

Arrays of size LOCr(ia+m-1) for taua and LOCc(jb+min(p,n)-1) for
taub.
The array taua contains the scalar factors of the elementary reflectors
which represent the orthogonal/unitary matrix Q. taua is tied to the
distributed matrix A. (See Application Notes below).
The array taub contains the scalar factors of the elementary reflectors
which represent the orthogonal/unitary matrix Z. taub is tied to the
distributed matrix B. (See Application Notes below).

work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.

Application Notes
The matrix Q is represented as a product of elementary reflectors
Q = H(ia)*H(ia+1)*...*H(ia+k-1),

where k= min(m,n).

Each H(i) has the form

H(i) = I - taua*v*v'
where taua is a real/complex scalar, and v is a real/complex vector with v(n-k+i+1:n) = 0 and v(n-k+i) = 1;
v(1:n-k+i-1) is stored on exit in A(ia+m-k+i-1, ja:ja+n-k+i-2), and taua in taua[ia+m-k+i-2]. To form Q
explicitly, use ScaLAPACK function p?orgrq/p?ungrq. To use Q to update another matrix, use ScaLAPACK
function p?ormrq/p?unmrq.

The matrix Z is represented as a product of elementary reflectors

Z = H(jb)*H(jb+1)*...*H(jb+k-1), where k= min(p,n).

Each H(i) has the form

H(i) = I - taub*v*v'
where taub is a real/complex scalar, and v is a real/complex vector with v(1:i-1) = 0 and v(i) = 1; v(i+1:p) is
stored on exit in B(ib+i:ib+p-1,jb+i-1), and taub in taub[jb+i-2]. To form Z explicitly, use ScaLAPACK
function p?orgqr/p?ungqr. To use Z to update another matrix, use ScaLAPACK function p?ormqr/p?unmqr.
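The RQ-style storage of Q, with the unit entry at position n-k+i and the stored part running along a row, is easy to misindex. This serial C sketch, assuming ia = ja = 1 and a plain column-major m-by-n array, rebuilds the full vector v for reflector i; rq_reflector_vector is a hypothetical helper, not an MKL function.

```c
#include <assert.h>

/* Hypothetical serial sketch (ia = ja = 1, 0-based C indices): rebuild the
 * full length-n vector v of reflector i (1 <= i <= k, k = min(m,n)) stored
 * in the RQ layout: v(n-k+i) = 1, v(n-k+i+1:n) = 0, and v(1:n-k+i-1) read
 * from row m-k+i of the column-major m-by-n array a. */
static void rq_reflector_vector(const double *a, int lda, int m, int n, int k,
                                int i, double *v) {
    assert(1 <= i && i <= k && k <= m && k <= n);
    int row = m - k + i - 1;        /* 0-based row holding the stored part */
    int pivot = n - k + i - 1;      /* 0-based position of the implicit 1 */
    for (int j = 0; j < pivot; ++j) v[j] = a[row + j * lda];
    v[pivot] = 1.0;
    for (int j = pivot + 1; j < n; ++j) v[j] = 0.0;
}
```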

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

Symmetric Eigenvalue Problems: ScaLAPACK Computational Routines

To solve a symmetric eigenproblem with ScaLAPACK, you usually need to reduce the matrix to real
tridiagonal form T and then find the eigenvalues and eigenvectors of the tridiagonal matrix T. ScaLAPACK
includes routines for reducing the matrix to a tridiagonal form by an orthogonal (or unitary) similarity
transformation A = Q*T*Q^H as well as for solving tridiagonal symmetric eigenvalue problems. These routines
are listed in Table "Computational Routines for Solving Symmetric Eigenproblems".
There are different routines for symmetric eigenproblems, depending on whether you need eigenvalues only
or eigenvectors as well, and on the algorithm used (either the QTQ algorithm, or bisection followed by
inverse iteration).

Computational Routines for Solving Symmetric Eigenproblems

Operation                                       Matrix type                 Routine
Reduce to tridiagonal form A = Q*T*Q^H          Dense symmetric/Hermitian   p?sytrd/p?hetrd
Multiply matrix after reduction                 Orthogonal/unitary          p?ormtr/p?unmtr
Find all eigenvalues and eigenvectors of a      Symmetric tridiagonal       steqr2*
  tridiagonal matrix T by a QTQ method
Find selected eigenvalues of a tridiagonal      Symmetric tridiagonal       p?stebz
  matrix T via bisection
Find selected eigenvectors of a tridiagonal     Symmetric tridiagonal       p?stein
  matrix T by inverse iteration
* This routine is described as part of auxiliary ScaLAPACK routines.

p?syngst
Reduces a real symmetric-definite generalized
eigenproblem to standard form.

Syntax
void pssyngst (const MKL_INT* ibtype, const char* uplo, const MKL_INT* n, float* a,
const MKL_INT* ia, const MKL_INT* ja, const MKL_INT* desca, const float* b, const
MKL_INT* ib, const MKL_INT* jb, const MKL_INT* descb, float* scale, float* work, const
MKL_INT* lwork, MKL_INT* info);
void pdsyngst (const MKL_INT* ibtype, const char* uplo, const MKL_INT* n, double* a,
const MKL_INT* ia, const MKL_INT* ja, const MKL_INT* desca, const double* b, const
MKL_INT* ib, const MKL_INT* jb, const MKL_INT* descb, double* scale, double* work,
const MKL_INT* lwork, MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
p?syngst reduces a real symmetric-definite generalized eigenproblem to standard form.
p?syngst performs the same function as p?sygst, but is based on rank 2K updates, which are faster and
more scalable than triangular solves (the basis of p?sygst).

p?syngst calls p?sygst when uplo='U', hence p?syngst provides improved performance only when
uplo='L' and ibtype=1.

p?syngst also calls p?sygst when insufficient workspace is provided, hence p?syngst provides improved
performance only when lwork >= 2 * NP0 * NB + NQ0 * NB + NB * NB.

In the following, sub( A ) denotes A( ia:ia+n-1, ja:ja+n-1 ) and sub( B ) denotes B( ib:ib+n-1, jb:jb+n-1 ).

If ibtype = 1, the problem is sub( A )*x = lambda*sub( B )*x, and sub( A ) is overwritten by
inv(U^T)*sub( A )*inv(U) or inv(L)*sub( A )*inv(L^T).
If ibtype = 2 or 3, the problem is sub( A )*sub( B )*x = lambda*x or sub( B )*sub( A )*x = lambda*x, and
sub( A ) is overwritten by U*sub( A )*U^T or L^T*sub( A )*L.
sub( B ) must have been previously factorized as U^T*U or L*L^T by p?potrf.
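For ibtype = 1 and uplo = 'L', the transformation can be checked on a tiny example. The serial C sketch below (reduce_to_standard and forward_solve are hypothetical helpers working on dense column-major arrays, with no distribution and no scaling) forms inv(L)*A*inv(L^T) by two rounds of forward substitution, not by the rank-2K updates p?syngst uses; for real data L^T is the same as L^H. With A = [4 2; 2 3] and L = [2 0; 1 1], so that B = L*L^T = [4 2; 2 2], the result is diag(1, 2), whose entries 1 and 2 are exactly the roots of det(A - lambda*B) = 4(1 - lambda)(2 - lambda).

```c
#include <assert.h>

/* Solve L*x = b in place; L is n-by-n lower triangular, column-major. */
static void forward_solve(int n, const double *l, double *b) {
    for (int i = 0; i < n; ++i) {
        double s = b[i];
        for (int j = 0; j < i; ++j) s -= l[i + j * n] * b[j];
        b[i] = s / l[i + i * n];
    }
}

/* C = inv(L) * A * inv(L^T) for symmetric A (all column-major, n <= 8). */
static void reduce_to_standard(int n, const double *a, const double *l,
                               double *c) {
    double w[64];
    assert(n > 0 && n <= 8);
    /* W = inv(L) * A, one column at a time. */
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < n; ++i) w[i + j * n] = a[i + j * n];
        forward_solve(n, l, &w[j * n]);
    }
    /* C = inv(L) * W^T = inv(L) * A * inv(L^T): column j of W^T is row j of W. */
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < n; ++i) c[i + j * n] = w[j + i * n];
        forward_solve(n, l, &c[j * n]);
    }
}
```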

Input Parameters

ibtype (global)
= 1: compute inv(U^T)*sub( A )*inv(U) or inv(L)*sub( A )*inv(L^T);
= 2 or 3: compute U*sub( A )*U^T or L^T*sub( A )*L.

uplo (global)
= 'U': Upper triangle of sub( A ) is stored and sub( B ) is factored as U^T*U;
= 'L': Lower triangle of sub( A ) is stored and sub( B ) is factored as L*L^T.

n (global)
The order of the matrices sub( A ) and sub( B ). n >= 0.

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1).

On entry, this array contains the local pieces of the n-by-n symmetric
distributed matrix sub( A ). If uplo = 'U', the leading n-by-n upper
triangular part of sub( A ) contains the upper triangular part of the matrix,
and its strictly lower triangular part is not referenced. If uplo = 'L', the
leading n-by-n lower triangular part of sub( A ) contains the lower
triangular part of the matrix, and its strictly upper triangular part is not
referenced.

ia (global)
A's global row index, which points to the beginning of the submatrix which
is to be operated on.

ja (global)
A's global column index, which points to the beginning of the submatrix
which is to be operated on.

desca (global and local)

Array of size dlen_.
The array descriptor for the distributed matrix A.

b (local)
Pointer into the local memory to an array of size lld_b*LOCc(jb+n-1).

On entry, this array contains the local pieces of the triangular factor from
the Cholesky factorization of sub( B ), as returned by p?potrf.

ib (global)
B's global row index, which points to the beginning of the submatrix which
is to be operated on.

jb (global)
B's global column index, which points to the beginning of the submatrix
which is to be operated on.

descb (global and local)

Array of size dlen_.
The array descriptor for the distributed matrix B.

work (local)
Array, size (lwork)

lwork (local or global)

The size of the array work.

lwork is local input and must be at least lwork >= MAX( NB * ( NP0 +1 ),
3 * NB )
When ibtype = 1 and uplo = 'L', p?syngst provides improved
performance when lwork >= 2 * NP0 * NB + NQ0 * NB + NB * NB,

where NB = mb_a = nb_a,

NP0 = numroc( n, NB, 0, 0, NPROW ),

NQ0 = numroc( n, NB, 0, 0, NPROW ),

numroc is a ScaLAPACK tool function.

MYROW, MYCOL, NPROW and NPCOL can be determined by calling the
subroutine blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is
assumed; the routine only calculates the optimal size for all work arrays.
Each of these values is returned in the first entry of the corresponding work
array, and no error message is issued by pxerbla.
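The two workspace thresholds above can be checked before allocating work. A hypothetical C sketch, taking NB, NP0, and NQ0 as already computed by the caller from the formulas above; syngst_min_lwork and syngst_improved_path are illustrative names, not MKL API.

```c
#include <assert.h>
#include <stdbool.h>

static int max_int(int a, int b) { return a > b ? a : b; }

/* Minimum lwork accepted by p?syngst: max(NB*(NP0+1), 3*NB). */
static int syngst_min_lwork(int nb, int np0) {
    assert(nb > 0 && np0 >= 0);
    return max_int(nb * (np0 + 1), 3 * nb);
}

/* True when, per the description above, p?syngst can use its faster
 * rank-2K path: ibtype = 1, uplo = 'L', and
 * lwork >= 2*NP0*NB + NQ0*NB + NB*NB. */
static bool syngst_improved_path(int ibtype, char uplo, int lwork,
                                 int nb, int np0, int nq0) {
    return ibtype == 1 && uplo == 'L' &&
           lwork >= 2 * np0 * nb + nq0 * nb + nb * nb;
}
```

With NB = 32 and NP0 = NQ0 = 100, the minimum is 3232, while the improved path needs at least 10624.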

Output Parameters

a On exit, if info = 0, the transformed matrix, stored in the same
format as sub( A ).

scale (global)
Amount by which the eigenvalues should be scaled to compensate for
the scaling performed in this routine. At present, scale is always
returned as 1.0; it is returned here to allow for future enhancement.

work (local)
Array, size (lwork)
On exit, work[0] returns the minimal and optimal lwork.

info (global)
= 0: successful exit
< 0: If the i-th argument is an array and the j-th entry had an illegal
value, then info = -(i*100+j), if the i-th argument is a scalar and
had an illegal value, then info = -i.
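Because the info code packs the argument position and, for array arguments, the offending entry into one negative number, a small decoder is handy when reporting errors. This C sketch is hypothetical (decode_info and info_error are not MKL names), and it assumes that scalar argument positions stay below 100, which holds for argument lists of this length.

```c
#include <assert.h>

/* entry == 0 means the offending argument is a scalar. */
typedef struct { int arg; int entry; } info_error;

/* Decode a negative info: -(i*100 + j) is entry j of array argument i,
 * and -i is scalar argument i. */
static info_error decode_info(int info) {
    assert(info < 0);
    info_error e = {0, 0};
    int code = -info;
    if (code >= 100) {
        e.arg = code / 100;
        e.entry = code % 100;
    } else {
        e.arg = code;
    }
    return e;
}
```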

p?syntrd
Reduces a real symmetric matrix to symmetric
tridiagonal form.

Syntax
void pssyntrd (const char* uplo, const MKL_INT* n, float* a, const MKL_INT* ia, const
MKL_INT* ja, const MKL_INT* desca, float* d, float* e, float* tau, float* work, const
MKL_INT* lwork, MKL_INT* info);
void pdsyntrd (const char* uplo, const MKL_INT* n, double* a, const MKL_INT* ia, const
MKL_INT* ja, const MKL_INT* desca, double* d, double* e, double* tau, double* work,
const MKL_INT* lwork, MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
p?syntrd is a prototype version of p?sytrd which uses tailored codes (either the serial, ?sytrd, or the
parallel code, p?syttrd) when the workspace provided by the user is adequate.

p?syntrd reduces a real symmetric matrix sub( A ) to symmetric tridiagonal form T by an orthogonal
similarity transformation:
Q' * sub( A ) * Q = T, where sub( A ) = A(ia:ia+n-1,ja:ja+n-1).

Features
p?syntrd is faster than p?sytrd on almost all matrices, particularly small ones (that is, n < 500 * sqrt(P),
where P is the number of processes), provided that enough workspace is available to use the tailored codes.
The tailored codes provide performance that is essentially independent of the input data layout.
The tailored codes place no restrictions on ia, ja, MB or NB. At present, ia, ja, MB and NB are restricted to
those values allowed by p?sytrd to keep the interface simple (see the Application Notes section for more
information about the restrictions).

Input Parameters

uplo (global)
Specifies whether the upper or lower triangular part of the symmetric
matrix sub( A ) is stored:
= 'U': Upper triangular
= 'L': Lower triangular

n (global)
The number of rows and columns to be operated on, i.e. the order of the
distributed submatrix sub( A ). n >= 0.

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1).

On entry, this array contains the local pieces of the symmetric distributed
matrix sub( A ). If uplo = 'U', the leading n-by-n upper triangular part of
sub( A ) contains the upper triangular part of the matrix, and its strictly
lower triangular part is not referenced. If uplo = 'L', the leading n-by-n
lower triangular part of sub( A ) contains the lower triangular part of the
matrix, and its strictly upper triangular part is not referenced.

ia (global)
The row index in the global array a indicating the first row of sub( A ).

ja (global)
The column index in the global array a indicating the first column of
sub( A ).

desca (global and local)

Array of size dlen_.
The array descriptor for the distributed matrix A.

work (local)
Array, size (lwork)

lwork (local or global)

The size of the array work.

lwork is local input and must be at least lwork >= MAX( NB * ( NP +1 ), 3 * NB )
For optimal performance, greater workspace is needed, i.e.
lwork >= 2*( ANB+1 )*( 4*NPS+2 ) + ( NPS + 4 ) * NPS
ANB = pjlaenv( ICTXT, 3, 'p?syttrd', 'L', 0, 0, 0, 0 )

ICTXT = desca( ctxt_ )

SQNPC = INT( sqrt( REAL( NPROW * NPCOL ) ) )

numroc is a ScaLAPACK tool function.

pjlaenv is a ScaLAPACK environmental inquiry function.
NPROW and NPCOL can be determined by calling the subroutine
blacs_gridinfo.

Output Parameters

a On exit, if uplo = 'U', the diagonal and first superdiagonal of sub( A )
are overwritten by the corresponding elements of the tridiagonal
matrix T, and the elements above the first superdiagonal, with the
array tau, represent the orthogonal matrix Q as a product of
elementary reflectors; if uplo = 'L', the diagonal and first subdiagonal
of sub( A ) are overwritten by the corresponding elements of the
tridiagonal matrix T, and the elements below the first subdiagonal,
with the array tau, represent the orthogonal matrix Q as a product of
elementary reflectors. See Application Notes below.

d (local)
Array, size LOCc(ja+n-1)

The diagonal elements of the tridiagonal matrix T: d(i) = A(i,i). d is
tied to the distributed matrix A.

e (local)
Array, size LOCc(ja+n-1) if uplo = 'U', LOCc(ja+n-2) otherwise.

The off-diagonal elements of the tridiagonal matrix T: e(i) = A(i,i+1) if
uplo = 'U', e(i) = A(i+1,i) if uplo = 'L'. e is tied to the distributed
matrix A.

tau (local)
Array, size LOCc(ja+n-1).

This array contains the scalar factors tau of the elementary reflectors.
tau is tied to the distributed matrix A.

work (local)
Array, size (lwork)

On exit, work[0] returns the optimal lwork.

Application Notes
If uplo = 'U', the matrix Q is represented as a product of elementary reflectors

Q = H(n-1) . . . H(2) H(1).

Each H(i) has the form

H(i) = I - tau * v * v', where tau is a real scalar, and v is a real vector with v(i+1:n) = 0 and v(i) =
1; v(1:i-1) is stored on exit in A(ia:ia+i-2,ja+i), and tau in tau(ja+i-1).

If uplo = 'L', the matrix Q is represented as a product of elementary reflectors

Q = H(1) H(2) . . . H(n-1).

Each H(i) has the form

H(i) = I - tau * v * v', where tau is a real scalar, and v is a real vector with v(1:i) = 0 and v(i+1) =
1; v(i+2:n) is stored on exit in A(ia+i+1:ia+n-1,ja+i-1), and tau in tau(ja+i-1).

The contents of sub( A ) on exit are illustrated by the following examples with n = 5:

if uplo = 'U':

( d  e  v2 v3 v4 )
(    d  e  v3 v4 )
(       d  e  v4 )
(          d  e  )
(             d  )

if uplo = 'L':

( d              )
( e  d           )
( v1 e  d        )
( v1 v2 e  d     )
( v1 v2 v3 e  d  )

where d and e denote diagonal and off-diagonal elements of T, and vi denotes an element of the vector
defining H(i).
Alignment requirements
The distributed submatrix sub( A ) must satisfy the following alignment condition:
( mb_a = nb_a and IROFFA = ICOFFA and IROFFA = 0 ), where IROFFA = mod( ia-1, mb_a ) and
ICOFFA = mod( ja-1, nb_a ).
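The alignment condition can be checked with a small predicate before the call. This is a minimal sketch in plain C, assuming 1-based global indices ia, ja and the block sizes mb_a, nb_a read from desca; the helper name subA_aligned is hypothetical and not part of Intel MKL.

```c
#include <stdbool.h>

/* Hypothetical helper (not an MKL function): evaluates the alignment
   condition for sub(A) stated above. ia, ja are 1-based global indices;
   mb_a, nb_a are the row and column block sizes taken from desca. */
static bool subA_aligned(int ia, int ja, int mb_a, int nb_a)
{
    int iroffa = (ia - 1) % mb_a;   /* IROFFA = mod(ia-1, mb_a) */
    int icoffa = (ja - 1) % nb_a;   /* ICOFFA = mod(ja-1, nb_a) */
    return mb_a == nb_a && iroffa == icoffa && iroffa == 0;
}
```

For example, with 64-by-64 blocks, ia = 65 and ja = 129 satisfy the condition (both start on a block boundary), while ia = ja = 2 does not, and neither does any choice of ia, ja when mb_a differs from nb_a.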

p?sytrd
Reduces a symmetric matrix to real symmetric
tridiagonal form by an orthogonal similarity
transformation.

Syntax
void pssytrd (char *uplo , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *d , float *e , float *tau , float *work , MKL_INT *lwork , MKL_INT
*info );
void pdsytrd (char *uplo , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , double *d , double *e , double *tau , double *work , MKL_INT *lwork , MKL_INT
*info );

Include Files
• mkl_scalapack.h

Description
The p?sytrd function reduces a real symmetric matrix sub(A) to symmetric tridiagonal form T by an
orthogonal similarity transformation:
Q'*sub(A)*Q = T,
where sub(A) = A(ia:ia+n-1,ja:ja+n-1).

Input Parameters

uplo (global)
Specifies whether the upper or lower triangular part of the symmetric
matrix sub(A) is stored:
If uplo = 'U', upper triangular
If uplo = 'L', lower triangular

n (global) The order of the distributed matrix sub(A) (n≥0).

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1). On
entry, this array contains the local pieces of the symmetric distributed
matrix sub(A).
If uplo = 'U', the leading n-by-n upper triangular part of sub(A) contains
the upper triangular part of the matrix, and its strictly lower triangular part
is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of sub(A) contains
the lower triangular part of the matrix, and its strictly upper triangular part
is not referenced. See Application Notes below.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

work (local)
Workspace array of size lwork.

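lwork (local or global) The size of work; must be at least:

lwork ≥ max(NB*(np+1), 3*NB),

where NB = mb_a = nb_a,

np = numroc(n, NB, MYROW, iarow, NPROW),

iarow = indxg2p(ia, NB, MYROW, rsrc_a, NPROW).
indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.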
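To size work before calling p?sytrd, the minimum lwork can be evaluated on each process. The sketch below re-implements numroc and indxg2p following the published ScaLAPACK tool-function algorithms only so that the example is self-contained; in a real program you would call the ScaLAPACK tool functions, or simply use the lwork = -1 query. The function sytrd_min_lwork is hypothetical.

```c
/* Number of rows/columns of a block-cyclically distributed dimension of
   size n (block size nb) owned by process iproc, when the first block lives
   on process isrcproc. Same logic as the ScaLAPACK tool function NUMROC. */
static int numroc(int n, int nb, int iproc, int isrcproc, int nprocs)
{
    int mydist    = (nprocs + iproc - isrcproc) % nprocs; /* distance from source */
    int nblocks   = n / nb;                               /* whole blocks */
    int num       = (nblocks / nprocs) * nb;              /* full rounds */
    int extrablks = nblocks % nprocs;                     /* leftover whole blocks */
    if (mydist < extrablks)
        num += nb;           /* this process gets one more whole block */
    else if (mydist == extrablks)
        num += n % nb;       /* this process gets the trailing partial block */
    return num;
}

/* Process row/column owning the 1-based global index indxglob.
   Same formula as INDXG2P; iproc is unused, as in ScaLAPACK. */
static int indxg2p(int indxglob, int nb, int iproc, int isrcproc, int nprocs)
{
    (void)iproc;
    return (isrcproc + (indxglob - 1) / nb) % nprocs;
}

/* Hypothetical helper: minimum lwork for p?sytrd on the calling process,
   lwork >= max(NB*(np+1), 3*NB), with NB = mb_a = nb_a. */
static int sytrd_min_lwork(int n, int ia, int nb, int myrow, int rsrc_a, int nprow)
{
    int iarow = indxg2p(ia, nb, myrow, rsrc_a, nprow);
    int np    = numroc(n, nb, myrow, iarow, nprow);
    int a = nb * (np + 1), b = 3 * nb;
    return a > b ? a : b;
}
```

For n = 10, NB = 2 and NPROW = 2, process row 0 owns 6 rows of sub(A) and needs lwork ≥ 14, while process row 1 owns 4 rows and needs lwork ≥ 10.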

Output Parameters

a On exit, if uplo = 'U', the diagonal and first superdiagonal of sub(A) are
overwritten by the corresponding elements of the tridiagonal matrix T, and
the elements above the first superdiagonal, with the array tau, represent
the orthogonal matrix Q as a product of elementary reflectors; if uplo =
'L', the diagonal and first subdiagonal of sub(A) are overwritten by the
corresponding elements of the tridiagonal matrix T, and the elements below
the first subdiagonal, with the array tau, represent the orthogonal matrix Q
as a product of elementary reflectors. See Application Notes below.

d (local)
Array of size LOCc(ja+n-1). The diagonal elements of the tridiagonal
matrix T:

d[i] = A(i+1,i+1), 0 ≤ i < LOCc(ja+n-1).

d is tied to the distributed matrix A.

e (local)
Array of size LOCc(ja+n-1) if uplo = 'U', LOCc(ja+n-2) otherwise.

The off-diagonal elements of the tridiagonal matrix T:

e[i] = A(i+1,i+2), 0 ≤ i < LOCc(ja+n-1) if uplo = 'U',
e[i] = A(i+2,i+1) if uplo = 'L'.
e is tied to the distributed matrix A.

tau (local)
Array of size LOCc(ja+n-1). This array contains the scalar factors of the
elementary reflectors. tau is tied to the distributed matrix A.

work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.

Application Notes
If uplo = 'U', the matrix Q is represented as a product of elementary reflectors

Q = H(n-1)... H(2) H(1).

Each H(i) has the form

H(i) = I - tau * v * v',
where tau is a real scalar, and v is a real vector with v(i+1:n) = 0 and v(i) = 1; v(1:i-1) is stored on exit in
A(ia:ia+i-2, ja+i), and tau in tau[ja+i-2].

If uplo = 'L', the matrix Q is represented as a product of elementary reflectors

Q = H(1) H(2)... H(n-1).

Each H(i) has the form

H(i) = I - tau * v * v',
where tau is a real scalar, and v is a real vector with v(1:i) = 0 and v(i+1) = 1; v(i+2:n) is stored on exit in
A(ia+i+1:ia+n-1,ja+i-1), and tau in tau[ja+i-2].
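Each H(i) can be applied without forming it as a matrix: H(i)*x = x - tau*v*(v'*x), which needs only a dot product and a vector update. The following serial sketch for the real case is illustrative only; apply_reflector is a hypothetical name, and the caller is assumed to have assembled the full vector v, including the implicit zeros and the unit entry that are not stored in A.

```c
/* Hypothetical helper: overwrite x (length n) with H*x, where
   H = I - tau * v * v' (real case). H is never formed explicitly. */
static void apply_reflector(int n, double tau, const double *v, double *x)
{
    double s = 0.0;
    for (int k = 0; k < n; ++k)
        s += v[k] * x[k];          /* s = v' * x */
    for (int k = 0; k < n; ++k)
        x[k] -= tau * v[k] * s;    /* x = x - tau * v * s */
}
```

For uplo = 'L', n = 5 and i = 2, for example, v = (0, 0, 1, v4, v5), where v4 and v5 are read from rows 4 and 5 of column 2 of sub(A).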

The contents of sub(A) on exit are illustrated by the following examples with n = 5:

If uplo = 'U':

( d   e   v2  v3  v4 )
(     d   e   v3  v4 )
(         d   e   v4 )
(             d   e  )
(                 d  )

If uplo = 'L':

( d                  )
( e   d              )
( v1  e   d          )
( v1  v2  e   d      )
( v1  v2  v3  e   d  )

where d and e denote diagonal and off-diagonal elements of T, and vi denotes an element of the vector
defining H(i).
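The illustration can also be read as an index rule: in the upper case, column j of sub(A) holds the stored part of the vector of H(j-1); in the lower case, column j holds the vector of H(j). The hypothetical helper below encodes that rule for 1-based (i, j) positions within sub(A). It is a reading aid derived from the storage description in the Application Notes, not an MKL function.

```c
/* Hypothetical helper: classify entry (i, j) of sub(A) (1-based) after
   p?sytrd. Returns 'd' (diagonal of T), 'e' (off-diagonal of T), 'v'
   (part of a reflector vector, with *k set to the reflector index of H(k)),
   or '-' (entry in the triangle that is not referenced). */
static char trd_entry(char uplo, int i, int j, int *k)
{
    *k = 0;
    if (i == j)
        return 'd';
    if (uplo == 'U') {
        if (j == i + 1) return 'e';
        if (j > i + 1) { *k = j - 1; return 'v'; } /* column j holds H(j-1) */
    } else {
        if (i == j + 1) return 'e';
        if (i > j + 1) { *k = j; return 'v'; }     /* column j holds H(j) */
    }
    return '-';
}
```

For example, in the uplo = 'L' picture, row 5 reads v1 v2 v3 e d, and trd_entry('L', 5, 3, &k) returns 'v' with k = 3.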

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?ormtr
Multiplies a general matrix by the orthogonal
transformation matrix from a reduction to tridiagonal
form determined by p?sytrd.

Syntax
void psormtr (char *side , char *uplo , char *trans , MKL_INT *m , MKL_INT *n , float
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *tau , float *c , MKL_INT *ic ,
MKL_INT *jc , MKL_INT *descc , float *work , MKL_INT *lwork , MKL_INT *info );
void pdormtr (char *side , char *uplo , char *trans , MKL_INT *m , MKL_INT *n , double
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *tau , double *c , MKL_INT
*ic , MKL_INT *jc , MKL_INT *descc , double *work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
This function overwrites the general real distributed m-by-n matrix sub(C) = C(ic:ic+m-1,jc:jc+n-1) with

              side = 'L'     side = 'R'
trans = 'N':  Q*sub(C)       sub(C)*Q
trans = 'T':  Q^T*sub(C)     sub(C)*Q^T

where Q is a real orthogonal distributed matrix of order nq, with nq = m if side = 'L' and nq = n if side =
'R'.

Q is defined as the product of nq-1 elementary reflectors, as returned by p?sytrd.

If uplo = 'U', Q = H(nq-1)... H(2) H(1);

If uplo = 'L', Q = H(1) H(2)... H(nq-1).
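Because Q is a product of reflectors, applying Q or Q^T only changes the order in which the factors are applied. This serial sketch shows the uplo = 'L' form Q = H(1) H(2) ... H(k) acting on a vector from the left (side = 'L'). The row-major layout of the reflector vectors in V and the function names are assumptions made for the example; nothing here reflects how p?ormtr distributes the work.

```c
/* H*x = x - tau * v * (v' * x), real case */
static void reflect(int n, double tau, const double *v, double *x)
{
    double s = 0.0;
    for (int k = 0; k < n; ++k)
        s += v[k] * x[k];
    for (int k = 0; k < n; ++k)
        x[k] -= tau * v[k] * s;
}

/* Hypothetical helper: apply Q = H(1) H(2) ... H(k) (trans = 'N') or
   Q^T = H(k) ... H(2) H(1) (trans = 'T') to x from the left.
   Reflector j (0-based) is row j of the k-by-n row-major array V,
   with scalar factor tau[j]. */
static void apply_q(char trans, int n, int k, const double *V,
                    const double *tau, double *x)
{
    if (trans == 'N') {
        /* Q*x = H(1)*(H(2)*(...*(H(k)*x))): rightmost factor first */
        for (int j = k - 1; j >= 0; --j)
            reflect(n, tau[j], V + j * n, x);
    } else {
        /* Q^T*x = H(k)*(...*(H(1)*x)): H(1) first */
        for (int j = 0; j < k; ++j)
            reflect(n, tau[j], V + j * n, x);
    }
}
```

With V holding v = (0, 1, 1) and v = (0, 0, 1), and tau = 2/(v'*v) for each, applying 'N' and then 'T' returns the original vector, since each real reflector is orthogonal and therefore Q^T*Q = I.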

Input Parameters

side (global)
='L': Q or Q^T is applied from the left.
='R': Q or Q^T is applied from the right.

trans (global)
='N', no transpose, Q is applied.
='T', transpose, Q^T is applied.

uplo (global)
= 'U': Upper triangle of A(ia:*, ja:*) contains elementary reflectors
from p?sytrd;

= 'L': Lower triangle of A(ia:*, ja:*) contains elementary reflectors
from p?sytrd.

m (global) The number of rows in the distributed matrix sub(C) (m≥0).

n (global) The number of columns in the distributed matrix sub(C) (n≥0).

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+m-1) if
side = 'L', and lld_a*LOCc(ja+n-1) if side = 'R'.
Contains the vectors that define the elementary reflectors, as returned by
p?sytrd.
If side='L', lld_a ≥ max(1,LOCr(ia+m-1));

If side ='R', lld_a ≥ max(1, LOCr(ia+n-1)).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size ltau, where
if side = 'L' and uplo = 'U', ltau = LOCc(m_a),

if side = 'L' and uplo = 'L', ltau = LOCc(ja+m-2),

if side = 'R' and uplo = 'U', ltau = LOCc(n_a),

if side = 'R' and uplo = 'L', ltau = LOCc(ja+n-2).

tau[i] must contain the scalar factor of the elementary reflector H(i+1), as
returned by p?sytrd (0 ≤ i < ltau). tau is tied to the distributed matrix A.

c (local)
Pointer into the local memory to an array of size lld_c*LOCc(jc+n-1).
Contains the local pieces of the distributed matrix sub (C).

ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the submatrix C, respectively.

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local)
Workspace array of size lwork.

lwork (local or global) The size of work; must be at least:

If uplo = 'U',
    iaa = ia; jaa = ja+1; icc = ic; jcc = jc;
else (uplo = 'L'),
    iaa = ia+1; jaa = ja;
    If side = 'L',
        icc = ic+1; jcc = jc;
    else
        icc = ic; jcc = jc+1;
    end if
end if

If side = 'L',
    mi = m-1; ni = n;
    lwork ≥ max((nb_a*(nb_a-1))/2, (nqc0 + mpc0)*nb_a) + nb_a*nb_a
else (side = 'R'),
    mi = m; ni = n-1;
    lwork ≥ max((nb_a*(nb_a-1))/2, (nqc0 + max(npa0 +
        numroc(numroc(ni+icoffc, nb_a, 0, 0, NPCOL), nb_a, 0, 0, lcmq),
        mpc0))*nb_a) + nb_a*nb_a
end if

where lcmq = lcm/NPCOL with lcm = ilcm(NPROW, NPCOL),

iroffa = mod(iaa-1, mb_a),
icoffa = mod(jaa-1, nb_a),
iarow = indxg2p(iaa, mb_a, MYROW, rsrc_a, NPROW),
npa0 = numroc(ni+iroffa, mb_a, MYROW, iarow, NPROW),

iroffc = mod(icc-1, mb_c),
icoffc = mod(jcc-1, nb_c),
icrow = indxg2p(icc, mb_c, MYROW, rsrc_c, NPROW),
iccol = indxg2p(jcc, nb_c, MYCOL, csrc_c, NPCOL),
mpc0 = numroc(mi+iroffc, mb_c, MYROW, icrow, NPROW),
nqc0 = numroc(ni+icoffc, nb_c, MYCOL, iccol, NPCOL),

NOTE
mod(x,y) is the integer remainder of x/y.

ilcm, indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL,
NPROW and NPCOL can be determined by calling the function
blacs_gridinfo. If lwork = -1, then lwork is global input and a
workspace query is assumed; the function only calculates the minimum and
optimal size for all work arrays. Each of these values is returned in the first
entry of the corresponding work array, and no error message is issued by
pxerbla.
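The case analysis at the top of the lwork description determines where the reflectors start in A and which part of sub(C) is actually updated. The sketch below restates it as a C function; the struct and function names are hypothetical, and the remaining quantities (npa0, mpc0, nqc0) would then be computed with numroc and indxg2p as listed above.

```c
/* Hypothetical result type: shifted start indices in A and C, and the
   dimensions mi-by-ni of the part of sub(C) that the reflectors update. */
typedef struct { int iaa, jaa, icc, jcc, mi, ni; } mtr_offsets;

/* Hypothetical helper mirroring the case analysis for p?ormtr/p?unmtr. */
static mtr_offsets mtr_setup(char side, char uplo, int m, int n,
                             int ia, int ja, int ic, int jc)
{
    mtr_offsets o;
    if (uplo == 'U') {
        o.iaa = ia;     o.jaa = ja + 1;   /* reflectors start one column right */
        o.icc = ic;     o.jcc = jc;
    } else {
        o.iaa = ia + 1; o.jaa = ja;       /* reflectors start one row down */
        if (side == 'L') { o.icc = ic + 1; o.jcc = jc;     }
        else             { o.icc = ic;     o.jcc = jc + 1; }
    }
    if (side == 'L') { o.mi = m - 1; o.ni = n;     }  /* m-1 rows are updated */
    else             { o.mi = m;     o.ni = n - 1; }  /* n-1 columns are updated */
    return o;
}
```

Since Q is a product of nq-1 reflectors, it acts on only m-1 rows of sub(C) for side = 'L' and on only n-1 columns for side = 'R', which is why mi or ni is reduced by one.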

Output Parameters

c Overwritten by the product Q*sub(C), Q^T*sub(C), sub(C)*Q^T, or
sub(C)*Q.

work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?hengst
Reduces a complex Hermitian-definite generalized
eigenproblem to standard form.

Syntax
void pchengst (const MKL_INT* ibtype, const char* uplo, const MKL_INT* n, MKL_Complex8*
a, const MKL_INT* ia, const MKL_INT* ja, const MKL_INT* desca, const MKL_Complex8* b,
const MKL_INT* ib, const MKL_INT* jb, const MKL_INT* descb, float* scale, MKL_Complex8*
work, const MKL_INT* lwork, MKL_INT* info);
void pzhengst (const MKL_INT* ibtype, const char* uplo, const MKL_INT* n,
MKL_Complex16* a, const MKL_INT* ia, const MKL_INT* ja, const MKL_INT* desca, const
MKL_Complex16* b, const MKL_INT* ib, const MKL_INT* jb, const MKL_INT* descb, double*
scale, MKL_Complex16* work, const MKL_INT* lwork, MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
p?hengst reduces a complex Hermitian-definite generalized eigenproblem to standard form.
p?hengst performs the same function as p?hegst, but is based on rank 2K updates, which are faster and
more scalable than triangular solves (the basis of p?hegst).

p?hengst calls p?hegst when uplo='U', hence p?hengst provides improved performance only when
uplo='L' and ibtype=1.
p?hengst also calls p?hegst when insufficient workspace is provided, hence p?hengst provides improved
performance only when lwork is sufficient (as described in the parameter descriptions).

In the following sub( A ) denotes the submatrix A( ia:ia+n-1, ja:ja+n-1 ) and sub( B ) denotes the
submatrix B( ib:ib+n-1, jb:jb+n-1 ).
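For ibtype = 1 and uplo = 'L' (the case in which p?hengst can be faster), sub(A) is overwritten with inv(L)*sub(A)*inv(L^H), where sub(B) = L*L^H. The serial sketch below shows that arithmetic for a small real symmetric matrix using two triangular solves, with row-major storage and a hypothetical function name. It illustrates the mathematics only; it does not reproduce the rank-2K-update algorithm of p?hengst or its distributed data layout. In the complex Hermitian case the transposes become conjugate transposes.

```c
/* Hypothetical serial sketch: C = inv(L) * A * inv(L') for an n-by-n real
   symmetric A and lower triangular L (row-major). W is n*n scratch space. */
static void gst_lower(int n, const double *A, const double *L,
                      double *C, double *W)
{
    /* Step 1: solve L * W = A by forward substitution, column by column. */
    for (int c = 0; c < n; ++c)
        for (int r = 0; r < n; ++r) {
            double s = A[r * n + c];
            for (int k = 0; k < r; ++k)
                s -= L[r * n + k] * W[k * n + c];
            W[r * n + c] = s / L[r * n + r];
        }
    /* Step 2: solve C * L' = W, i.e. C = W * inv(L'), row by row:
       W[r][c] = sum over k <= c of C[r][k] * L[c][k]. */
    for (int r = 0; r < n; ++r)
        for (int c = 0; c < n; ++c) {
            double s = W[r * n + c];
            for (int k = 0; k < c; ++k)
                s -= C[r * n + k] * L[c * n + k];
            C[r * n + c] = s / L[c * n + c];
        }
}
```

With A = [[4, 2], [2, 3]] and L = [[2, 0], [1, 1]] (so B = L*L' = [[4, 2], [2, 2]]), the result is diag(1, 2); its eigenvalues 1 and 2 are exactly the eigenvalues of the generalized problem A*x = lambda*B*x.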

Input Parameters

ibtype (global)
= 1: compute inv(U^H)*sub( A )*inv(U) or inv(L)*sub( A )*inv(L^H);
= 2 or 3: compute U*sub( A )*U^H or L^H*sub( A )*L.

uplo (global)
= 'U': Upper triangle of sub( A ) is stored and sub( B ) is factored as U^H*U;
= 'L': Lower triangle of sub( A ) is stored and sub( B ) is factored as L*L^H.

n (global)
The order of the matrices sub( A ) and sub( B ). n >= 0.

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1).

ia (global)
Global row index of matrix A, which points to the beginning of the
submatrix on which to operate.

ja (global)
Global column index of matrix A, which points to the beginning of the
submatrix on which to operate.

desca (global and local)

Array of size dlen_.
The array descriptor for the distributed matrix A.

b (local)
Pointer into the local memory to an array of size lld_b*LOCc(jb+n-1).

ib (global)
Global row index of matrix B, which points to the beginning of the
submatrix on which to operate.

jb (global)
Global column index of matrix B, which points to the beginning of the
submatrix on which to operate.

descb (global and local)

Array of size dlen_.
The array descriptor for the distributed matrix B.

work (local)
Array, size (lwork)

On exit, work[0] returns the minimal and optimal lwork.

lwork (local)
The size of the array work.

lwork is local input and must satisfy lwork >= MAX( NB * ( NP0 + 1 ), 3 * NB ).
When ibtype = 1 and uplo = 'L', p?hengst provides improved
performance when lwork >= 2 * NP0 * NB + NQ0 * NB + NB * NB, where
NB = mb_a = nb_a, NP0 = numroc( n, NB, 0, 0, NPROW ), NQ0 =
numroc( n, NB, 0, 0, NPROW ), and numroc is a ScaLAPACK tool function.
MYROW, MYCOL, NPROW and NPCOL can be determined by calling the
function blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

a On exit, if info = 0, the transformed matrix, stored in the same
format as sub( A ).

scale (global)
Amount by which the eigenvalues should be scaled to compensate for
the scaling performed in this routine.
scale is always returned as 1.0.

work On exit, work[0] returns the minimal and optimal lwork.

info (global)

= 0: successful exit
< 0: If the i-th argument is an array and its j-th entry had an illegal
value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.
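The negative info convention can be decoded mechanically. This sketch assumes the routine has fewer than 100 arguments, so any value of -info above 100 must come from the array form; decode_info is a hypothetical helper, not part of Intel MKL.

```c
/* Hypothetical helper: split a negative info into the 1-based argument
   position and, for array arguments, the 1-based entry within that array
   (entry is 0 for scalar arguments). Call only when info < 0. */
static void decode_info(int info, int *arg, int *entry)
{
    int code = -info;
    if (code > 100) {            /* array form: info = -(i*100 + j) */
        *arg = code / 100;
        *entry = code % 100;
    } else {                     /* scalar form: info = -i */
        *arg = code;
        *entry = 0;
    }
}
```

For p?hengst, whose seventh argument is desca, info = -705 would point at the fifth entry of desca.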

p?hentrd
Reduces a complex Hermitian matrix to Hermitian
tridiagonal form.

Syntax
void pchentrd (const char* uplo, const MKL_INT* n, MKL_Complex8* a, const MKL_INT* ia,
const MKL_INT* ja, const MKL_INT* desca, float* d, float* e, MKL_Complex8* tau,
MKL_Complex8* work, const MKL_INT* lwork, float* rwork, const MKL_INT* lrwork, MKL_INT*
info);
void pzhentrd (const char* uplo, const MKL_INT* n, MKL_Complex16* a, const MKL_INT* ia,
const MKL_INT* ja, const MKL_INT* desca, double* d, double* e, MKL_Complex16* tau,
MKL_Complex16* work, const MKL_INT* lwork, double* rwork, const MKL_INT* lrwork,
MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
p?hentrd is a prototype version of p?hetrd that uses tailored codes (either the serial code, ?hetrd, or the
parallel code, p?hettrd) when adequate workspace is provided.

p?hentrd reduces a complex Hermitian matrix sub( A ) to Hermitian tridiagonal form T by a unitary
similarity transformation:
Q' * sub( A ) * Q = T, where sub( A ) = A(ia:ia+n-1,ja:ja+n-1).

p?hentrd is faster than p?hetrd on almost all matrices, particularly small ones (that is, n < 500 * sqrt(P),
where P is the number of processes), provided that enough workspace is available to use the tailored codes.
The tailored codes provide performance that is essentially independent of the input data layout.
The tailored codes place no restrictions on ia, ja, MB or NB. At present, ia, ja, MB and NB are restricted to
those values allowed by p?hetrd to keep the interface simple (see the Application Notes section for more
information about the restrictions).

Input Parameters

uplo (global)
Specifies whether the upper or lower triangular part of the Hermitian matrix
sub( A ) is stored:
= 'U': Upper triangular
= 'L': Lower triangular

n (global)
The number of rows and columns to be operated on, i.e. the order of the
distributed submatrix sub( A ). n >= 0.

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1).

On entry, this array contains the local pieces of the Hermitian distributed
matrix sub( A ). If uplo = 'U', the leading n-by-n upper triangular part of
sub( A ) contains the upper triangular part of the matrix, and its strictly
lower triangular part is not referenced. If uplo = 'L', the leading n-by-n
lower triangular part of sub( A ) contains the lower triangular part of the
matrix, and its strictly upper triangular part is not referenced.

ia (global)
The row index in the global array a indicating the first row of sub( A ).

ja (global)

The column index in the global array a indicating the first column of
sub( A ).

desca (global and local)

Array of size dlen_.
The array descriptor for the distributed matrix A.

work (local)
Array, size (lwork)

lwork (local or global)

The size of the array work.

lwork is local input and must be at least lwork >= MAX( NB * ( NP + 1 ), 3 * NB ).
For optimal performance, greater workspace is needed:
lwork >= 2*( ANB+1 )*( 4*NPS+2 ) + ( NPS + 4 ) * NPS
where
ANB = pjlaenv( ICTXT, 3, 'p?hettrd', 'L', 0, 0, 0, 0 ),
ICTXT = desca( ctxt_ ),
SQNPC = INT( sqrt( REAL( NPROW * NPCOL ) ) ),
NPS = MAX( numroc( n, 1, 0, 0, SQNPC ), 2*ANB ).

numroc is a ScaLAPACK tool function.
pjlaenv is a ScaLAPACK environmental inquiry function.
NPROW and NPCOL can be determined by calling the function
blacs_gridinfo.

rwork (local)
Array, size (lrwork)

lrwork (local or global)

The size of the array rwork.

lrwork is local input and must be at least lrwork >= 1.
For optimal performance, greater workspace is needed, that is, lrwork >= 2*n.
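The optimal-lwork formula for p?hentrd can be evaluated in a few lines. In this sketch anb is passed in directly, standing in for the value that pjlaenv would return, and numroc(n, 1, 0, 0, SQNPC) is replaced by the equivalent ceil(n/SQNPC), which holds for block size 1 on process 0. The function names are hypothetical.

```c
/* INT(SQRT(REAL(x))) for small non-negative x, without floating point. */
static int isqrt(int x)
{
    int s = 0;
    while ((s + 1) * (s + 1) <= x)
        ++s;
    return s;
}

/* Hypothetical helper: optimal lwork for p?hentrd,
   2*(ANB+1)*(4*NPS+2) + (NPS+4)*NPS, with anb supplied by the caller. */
static int hentrd_opt_lwork(int n, int anb, int nprow, int npcol)
{
    int sqnpc = isqrt(nprow * npcol);
    /* numroc(n, 1, 0, 0, sqnpc): with block size 1, process 0 owns
       ceil(n / sqnpc) entries */
    int loc = (n + sqnpc - 1) / sqnpc;
    int nps = loc > 2 * anb ? loc : 2 * anb;
    return 2 * (anb + 1) * (4 * nps + 2) + (nps + 4) * nps;
}
```

For example, with n = 100, a 2-by-2 grid and ANB = 32, NPS = max(50, 64) = 64 and the optimal lwork is 21380.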

Output Parameters

a On exit, if uplo = 'U', the diagonal and first superdiagonal of sub( A )
are overwritten by the corresponding elements of the tridiagonal
matrix T, and the elements above the first superdiagonal, with the
array tau, represent the unitary matrix Q as a product of elementary
reflectors; if uplo = 'L', the diagonal and first subdiagonal of sub( A )
are overwritten by the corresponding elements of the tridiagonal
matrix T, and the elements below the first subdiagonal, with the array
tau, represent the unitary matrix Q as a product of elementary
reflectors. See Application Notes.

d (local)
Array, size LOCc(ja+n-1)

The diagonal elements of the tridiagonal matrix T: d[i - 1] = A(i,i).

d is tied to the distributed matrix A.

e (local)
Array, size LOCc(ja+n-1) if uplo = 'U', LOCc(ja+n-2) otherwise.

The off-diagonal elements of the tridiagonal matrix T: e[i - 1] =
A(i,i+1) if uplo = 'U', e[i - 1] = A(i+1,i) if uplo = 'L'. e is tied to
the distributed matrix A.

tau (local)
Array, size LOCc(ja+n-1).

This array contains the scalar factors tau of the elementary reflectors.
tau is tied to the distributed matrix A.

work On exit, work[0] returns the optimal lwork.

rwork On exit, rwork[0] returns the optimal lrwork.

Application Notes
If uplo = 'U', the matrix Q is represented as a product of elementary reflectors

Q = H(n-1) . . . H(2) H(1).

Each H(i) has the form

H(i) = I - tau * v * v', where tau is a complex scalar, and v is a complex vector with v(i+1:n) = 0 and v(i) =
1; v(1:i-1) is stored on exit in A(ia:ia+i-2,ja+i), and tau in tau(ja+i-1).

If uplo = 'L', the matrix Q is represented as a product of elementary reflectors

Q = H(1) H(2) . . . H(n-1).

Each H(i) has the form

H(i) = I - tau * v * v', where tau is a complex scalar, and v is a complex vector with v(1:i) = 0 and v(i+1) =
1; v(i+2:n) is stored on exit in A(ia+i+1:ia+n-1,ja+i-1), and tau in tau(ja+i-1).

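The contents of sub( A ) on exit are illustrated by the following examples with n = 5:

if uplo = 'U':

( d   e   v2  v3  v4 )
(     d   e   v3  v4 )
(         d   e   v4 )
(             d   e  )
(                 d  )

if uplo = 'L':

( d                  )
( e   d              )
( v1  e   d          )
( v1  v2  e   d      )
( v1  v2  v3  e   d  )

where d and e denote diagonal and off-diagonal elements of T, and vi denotes an element of the vector
defining H(i).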

p?hetrd
Reduces a Hermitian matrix to Hermitian tridiagonal
form by a unitary similarity transformation.

Syntax
void pchetrd (char *uplo , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , float *d , float *e , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzhetrd (char *uplo , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , double *d , double *e , MKL_Complex16 *tau , MKL_Complex16 *work ,
MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?hetrd function reduces a complex Hermitian matrix sub(A) to Hermitian tridiagonal form T by a
unitary similarity transformation:
Q'*sub(A)*Q = T,
where sub(A) = A(ia:ia+n-1,ja:ja+n-1).

Input Parameters

uplo (global)
Specifies whether the upper or lower triangular part of the Hermitian matrix
sub(A) is stored:
If uplo = 'U', upper triangular

If uplo = 'L', lower triangular

n (global) The order of the distributed matrix sub(A) (n≥0).

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1). On
entry, this array contains the local pieces of the Hermitian distributed
matrix sub(A).
If uplo = 'U', the leading n-by-n upper triangular part of sub(A) contains
the upper triangular part of the matrix, and its strictly lower triangular part
is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of sub(A) contains
the lower triangular part of the matrix, and its strictly upper triangular part
is not referenced (see Application Notes below).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

work (local)
Workspace array of size lwork.

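lwork (local or global) The size of work; must be at least:

lwork ≥ max(NB*(np+1), 3*NB)

where NB = mb_a = nb_a,

np = numroc(n, NB, MYROW, iarow, NPROW),

iarow = indxg2p(ia, NB, MYROW, rsrc_a, NPROW).
indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.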

Output Parameters

a On exit, if uplo = 'U', the diagonal and first superdiagonal of sub(A) are
overwritten by the corresponding elements of the tridiagonal matrix T, and
the elements above the first superdiagonal, with the array tau, represent
the unitary matrix Q as a product of elementary reflectors; if uplo = 'L',
the diagonal and first subdiagonal of sub(A) are overwritten by the
corresponding elements of the tridiagonal matrix T, and the elements below
the first subdiagonal, with the array tau, represent the unitary matrix Q as
a product of elementary reflectors (see Application Notes below).

d (local)
Array of size LOCc(ja+n-1). The diagonal elements of the tridiagonal
matrix T:
d[i] = A(i+1,i+1), 0 ≤ i < LOCc(ja+n-1).
d is tied to the distributed matrix A.

e (local)
Array of size LOCc(ja+n-1) if uplo = 'U', LOCc(ja+n-2) otherwise.

The off-diagonal elements of the tridiagonal matrix T:

e[i] = A(i+1,i+2), 0 ≤ i < LOCc(ja+n-1) if uplo = 'U',
e[i] = A(i+2,i+1) if uplo = 'L'.
e is tied to the distributed matrix A.

tau (local)
Array of size LOCc(ja+n-1). This array contains the scalar factors of the
elementary reflectors. tau is tied to the distributed matrix A.

work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.

Application Notes
If uplo = 'U', the matrix Q is represented as a product of elementary reflectors

Q = H(n-1)*...*H(2)*H(1).

Each H(i) has the form

H(i) = I - tau*v*v',
where tau is a complex scalar, and v is a complex vector with v(i+1:n) = 0 and v(i) = 1; v(1:i-1) is stored on
exit in A(ia:ia+i-2, ja+i), and tau in tau[ja+i-2].

If uplo = 'L', the matrix Q is represented as a product of elementary reflectors

Q = H(1)*H(2)*...*H(n-1).

Each H(i) has the form

H(i) = I - tau*v*v',
where tau is a complex scalar, and v is a complex vector with v(1:i) = 0 and v(i+1) = 1; v(i+2:n) is stored
on exit in A(ia+i+1:ia+n-1,ja+i-1), and tau in tau[ja+i-2].
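In the complex case v' denotes the conjugate transpose, so applying H(i) needs a conjugated dot product: H(i)*x = x - tau*v*(v'*x). The serial sketch below uses a small stand-in struct (zcplx, hypothetical) holding a real part followed by an imaginary part, so that the example does not depend on MKL headers; apply_zreflector is likewise a hypothetical name.

```c
/* Hypothetical stand-in for a double-precision complex number. */
typedef struct { double re, im; } zcplx;

/* Hypothetical helper: overwrite x (length n) with H*x, where
   H = I - tau * v * v' and v' is the conjugate transpose of v. */
static void apply_zreflector(int n, zcplx tau, const zcplx *v, zcplx *x)
{
    /* s = v' * x = sum over k of conj(v[k]) * x[k] */
    double sr = 0.0, si = 0.0;
    for (int k = 0; k < n; ++k) {
        sr += v[k].re * x[k].re + v[k].im * x[k].im;
        si += v[k].re * x[k].im - v[k].im * x[k].re;
    }
    /* t = tau * s */
    double tr = tau.re * sr - tau.im * si;
    double ti = tau.re * si + tau.im * sr;
    /* x[k] = x[k] - v[k] * t */
    for (int k = 0; k < n; ++k) {
        x[k].re -= v[k].re * tr - v[k].im * ti;
        x[k].im -= v[k].re * ti + v[k].im * tr;
    }
}
```

With v = (1, i) and tau = 1, H maps (1, 0) to (0, -i) and leaves (i, 1), which is orthogonal to v, unchanged.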

The contents of sub(A) on exit are illustrated by the following examples with n = 5:

If uplo = 'U':

( d   e   v2  v3  v4 )
(     d   e   v3  v4 )
(         d   e   v4 )
(             d   e  )
(                 d  )

If uplo = 'L':

( d                  )
( e   d              )
( v1  e   d          )
( v1  v2  e   d      )
( v1  v2  v3  e   d  )

where d and e denote diagonal and off-diagonal elements of T, and vi denotes an element of the vector
defining H(i).

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?unmtr
Multiplies a general matrix by the unitary
transformation matrix from a reduction to tridiagonal
form determined by p?hetrd.

Syntax
void pcunmtr (char *side , char *uplo , char *trans , MKL_INT *m , MKL_INT *n ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau ,
MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex8 *work ,
MKL_INT *lwork , MKL_INT *info );
void pzunmtr (char *side , char *uplo , char *trans , MKL_INT *m , MKL_INT *n ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau ,
MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex16 *work ,
MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
This function overwrites the general complex distributed m-by-n matrix sub(C) = C(ic:ic+m-1,jc:jc+n-1)
with

              side = 'L'     side = 'R'
trans = 'N':  Q*sub(C)       sub(C)*Q
trans = 'C':  Q^H*sub(C)     sub(C)*Q^H

where Q is a complex unitary distributed matrix of order nq, with nq =m if side = 'L' and nq =n if side =
'R'.
Q is defined as the product of nq-1 elementary reflectors, as returned by p?hetrd.

If uplo = 'U', Q = H(nq-1)... H(2) H(1);

If uplo = 'L', Q = H(1) H(2)... H(nq-1).

Input Parameters

side (global)
='L': Q or Q^H is applied from the left.
='R': Q or Q^H is applied from the right.

trans (global)
='N', no transpose, Q is applied.
='C', conjugate transpose, Q^H is applied.

uplo (global)
= 'U': Upper triangle of A(ia:*, ja:*) contains elementary reflectors
from p?hetrd;

= 'L': Lower triangle of A(ia:*, ja:*) contains elementary reflectors
from p?hetrd.

m (global) The number of rows in the distributed matrix sub(C) (m≥0).

n (global) The number of columns in the distributed matrix sub(C) (n≥0).

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+m-1) if
side = 'L', and lld_a*LOCc(ja+n-1) if side = 'R'.
Contains the vectors which define the elementary reflectors, as returned by
p?hetrd.
If side='L', lld_a ≥ max(1,LOCr(ia+m-1));

If side ='R', lld_a ≥ max(1,LOCr(ia+n-1)).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size ltau, where
if side = 'L' and uplo = 'U', ltau = LOCc(m_a),

if side = 'L' and uplo = 'L', ltau = LOCc(ja+m-2),

if side = 'R' and uplo = 'U', ltau = LOCc(n_a),

if side = 'R' and uplo = 'L', ltau = LOCc(ja+n-2).

tau[i] must contain the scalar factor of the elementary reflector H(i+1), as
returned by p?hetrd (0 ≤ i < ltau). tau is tied to the distributed matrix A.

c (local)
Pointer into the local memory to an array of size lld_c*LOCc(jc+n-1).
Contains the local pieces of the distributed matrix sub (C).

ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the submatrix C, respectively.

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local)
Workspace array of size lwork.

lwork (local or global) The size of work; must be at least:

If uplo = 'U',
    iaa = ia; jaa = ja+1; icc = ic; jcc = jc;
else (uplo = 'L'),
    iaa = ia+1; jaa = ja;
    If side = 'L',
        icc = ic+1; jcc = jc;
    else
        icc = ic; jcc = jc+1;
    end if
end if

If side = 'L',
    mi = m-1; ni = n;
    lwork ≥ max((nb_a*(nb_a-1))/2, (nqc0 + mpc0)*nb_a) + nb_a*nb_a
else (side = 'R'),
    mi = m; ni = n-1;
    lwork ≥ max((nb_a*(nb_a-1))/2, (nqc0 + max(npa0 +
        numroc(numroc(ni+icoffc, nb_a, 0, 0, NPCOL), nb_a, 0, 0, lcmq),
        mpc0))*nb_a) + nb_a*nb_a
end if

where lcmq = lcm/NPCOL with lcm = ilcm(NPROW, NPCOL),

iroffa = mod(iaa-1, mb_a),
icoffa = mod(jaa-1, nb_a),
iarow = indxg2p(iaa, mb_a, MYROW, rsrc_a, NPROW),
npa0 = numroc(ni+iroffa, mb_a, MYROW, iarow, NPROW),

iroffc = mod(icc-1, mb_c),
icoffc = mod(jcc-1, nb_c),
icrow = indxg2p(icc, mb_c, MYROW, rsrc_c, NPROW),
iccol = indxg2p(jcc, nb_c, MYCOL, csrc_c, NPCOL),
mpc0 = numroc(mi+iroffc, mb_c, MYROW, icrow, NPROW),
nqc0 = numroc(ni+icoffc, nb_c, MYCOL, iccol, NPCOL),

NOTE
mod(x,y) is the integer remainder of x/y.

ilcm, indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL,
NPROW and NPCOL can be determined by calling the function
blacs_gridinfo. If lwork = -1, then lwork is global input and a
workspace query is assumed; the function only calculates the minimum and
optimal size for all work arrays. Each of these values is returned in the first
entry of the corresponding work array, and no error message is issued by
pxerbla.

Output Parameters

c Overwritten by the product Q*sub(C), Q^H*sub(C), sub(C)*Q^H, or
sub(C)*Q.

work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?stebz
Computes the eigenvalues of a symmetric tridiagonal
matrix by bisection.

Syntax
void psstebz (MKL_INT *ictxt , char *range , char *order , MKL_INT *n , float *vl ,
float *vu , MKL_INT *il , MKL_INT *iu , float *abstol , float *d , float *e , MKL_INT
*m , MKL_INT *nsplit , float *w , MKL_INT *iblock , MKL_INT *isplit , float *work ,
MKL_INT *lwork , MKL_INT *iwork , MKL_INT *liwork , MKL_INT *info );

void pdstebz (MKL_INT *ictxt , char *range , char *order , MKL_INT *n , double *vl ,
double *vu , MKL_INT *il , MKL_INT *iu , double *abstol , double *d , double *e ,
MKL_INT *m , MKL_INT *nsplit , double *w , MKL_INT *iblock , MKL_INT *isplit , double
*work , MKL_INT *lwork , MKL_INT *iwork , MKL_INT *liwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?stebz function computes the eigenvalues of a symmetric tridiagonal matrix in parallel. These may be
all eigenvalues, all eigenvalues in the interval [vl, vu], or the eigenvalues il through iu. A static
partitioning of work is done at the beginning of p?stebz, which results in all processes finding an (almost)
equal number of eigenvalues.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.
Notice revision #20201201

Input Parameters

ictxt (global) The BLACS context handle.

range (global) Must be 'A' or 'V' or 'I'.

If range = 'A', the function computes all eigenvalues.

If range = 'V', the function computes eigenvalues in the interval [vl, vu].
If range ='I', the function computes eigenvalues il through iu.

order (global) Must be 'B' or 'E'.

If order = 'B', the eigenvalues are to be ordered from smallest to largest
within each split-off block.
If order = 'E', the eigenvalues for the entire matrix are to be ordered
from smallest to largest.

n (global) The order of the tridiagonal matrix T (n≥0).

vl, vu (global)
If range = 'V', the lower and upper bounds of the interval [vl, vu] to be
searched for eigenvalues.

If range = 'A' or 'I', vl and vu are not referenced.

il, iu (global)
Constraint: 1≤il≤iu≤n.

If range = 'I', the indices (in ascending order) of the smallest and the
largest eigenvalues to be returned.
If range = 'A' or 'V', il and iu are not referenced.


abstol (global)
The absolute tolerance to which each eigenvalue is required. An eigenvalue
(or cluster) is considered to have converged if it lies in an interval of width
abstol. If abstol≤0, then the tolerance is taken as ulp*||T||, where ulp is
the machine precision and ||T|| is the 1-norm of T.
Eigenvalues are computed most accurately when abstol is set to the
underflow threshold ?lamch('U'), not 0. If eigenvectors are to be computed
later by inverse iteration (p?stein), set abstol to
2*p?lamch('S').

d (global)

Array of size n.

Contains the n diagonal elements of the tridiagonal matrix T. To avoid overflow,
the matrix must be scaled so that its largest entry is no greater than
overflow^(1/2) * underflow^(1/4) in absolute value; for greatest
accuracy, it should not be much smaller than that.

e (global)
Array of size n - 1.

Contains the (n-1) off-diagonal elements of the tridiagonal matrix T. To avoid
overflow, the matrix must be scaled so that its largest entry is no greater
than overflow^(1/2) * underflow^(1/4) in absolute value; for greatest
accuracy, it should not be much smaller than that.

work (local)
Array of size max(5n, 7). This is a workspace array.

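lwork (local) The size of the work array; it must be ≥ max(5n, 7).

If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.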

iwork (local) Array of size max(4n, 14). This is a workspace array.

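liwork (local) The size of the iwork array; it must be ≥ max(4n, 14, NPROCS).

If liwork = -1, then liwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.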
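The default tolerance used when abstol≤0 can be illustrated with a small C sketch that evaluates ulp*||T|| directly from d and e. It assumes that ulp corresponds to DBL_EPSILON for double precision (p?stebz obtains its machine constants through p?lamch, which may use a slightly different value), and the helper names tridiag_norm1 and default_abstol are hypothetical, not MKL functions.

```c
#include <float.h>
#include <math.h>

/* Hypothetical helper (not part of MKL): 1-norm of the symmetric
 * tridiagonal matrix T with diagonal d[0..n-1] and off-diagonal
 * e[0..n-2]. The 1-norm is the largest absolute column sum; column j
 * holds e[j-1], d[j] and e[j] where those entries exist. */
double tridiag_norm1(int n, const double *d, const double *e)
{
    double norm = 0.0;
    for (int j = 0; j < n; ++j) {
        double col = fabs(d[j]);
        if (j > 0)
            col += fabs(e[j - 1]);   /* entry above the diagonal */
        if (j < n - 1)
            col += fabs(e[j]);       /* entry below the diagonal */
        if (col > norm)
            norm = col;
    }
    return norm;
}

/* Hypothetical helper: tolerance applied when abstol <= 0, assuming
 * ulp == DBL_EPSILON. */
double default_abstol(int n, const double *d, const double *e)
{
    return DBL_EPSILON * tridiag_norm1(n, d, e);
}
```

For example, with d = {2, 2, 2} and e = {1, -1}, the largest absolute column sum is 4, so this sketch returns 4*DBL_EPSILON.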

Output Parameters

m (global) The actual number of eigenvalues found. 0≤m≤n

nsplit (global) The number of diagonal blocks detected in T. 1≤nsplit≤n

w (global)
Array of size n. On exit, the first m elements of w contain the eigenvalues on
all processes.

iblock (global)
Array of size n. At each row/column j where e[j-1] is zero or small, the
matrix T is considered to split into a block diagonal matrix. On exit
iblock[i] specifies which block (from 1 to the number of blocks) the
eigenvalue w[i] belongs to.

NOTE
In the (theoretically impossible) event that bisection does not
converge for some or all eigenvalues, info is set to 1 and the
ones for which it did not are identified by a negative block
number.

isplit (global)
Array of size n.

Contains the splitting points at which T breaks up into submatrices. The
first submatrix consists of rows/columns 1 to isplit[0], the second of
rows/columns isplit[0]+1 through isplit[1], and so on, and the
nsplit-th submatrix consists of rows/columns isplit[nsplit-2]+1
through isplit[nsplit-1]=n. (Only the first nsplit elements are used,
but because nsplit is not known in advance, n elements must be reserved for
isplit.)

info (global)
If info = 0, the execution is successful.

If info < 0, if info = -i, the i-th argument has an illegal value.

If info > 0, some or all of the eigenvalues failed to converge or were not
computed:
If info = 1, bisection failed to converge for some eigenvalues; these
eigenvalues are flagged by a negative block number. The effect is that these
eigenvalues may not be as accurate as the absolute and relative tolerances.
If info = 2, there is a mismatch between the number of eigenvalues output and the
number desired.
If info = 3, range = 'I' and the Gershgorin interval initially used was
incorrect. No eigenvalues are computed. Probable cause: the machine has
sloppy floating-point arithmetic. Increase the fudge parameter, recompile,
and try again.
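The 0-based layout of isplit and the block numbers stored in iblock can be traversed as in the following C sketch. It is a minimal illustration that uses int in place of MKL_INT; the helper names stebz_block_rows and stebz_count_in_block are hypothetical.

```c
/* Hypothetical helper (not part of MKL): first and last 1-based
 * row/column of the k-th diagonal block of T (1 <= k <= nsplit),
 * computed from the 0-based splitting points in isplit. */
void stebz_block_rows(const int *isplit, int k, int *first, int *last)
{
    *first = (k == 1) ? 1 : isplit[k - 2] + 1; /* block 1 starts at row 1 */
    *last  = isplit[k - 1];                    /* last row of block k */
}

/* Hypothetical helper: number of the m computed eigenvalues w[0..m-1]
 * that belong to block k. Eigenvalues flagged with a negative block
 * number (bisection did not converge, info = 1) are not counted. */
int stebz_count_in_block(const int *iblock, int m, int k)
{
    int count = 0;
    for (int i = 0; i < m; ++i)
        if (iblock[i] == k)
            ++count;
    return count;
}
```

For instance, with nsplit = 3 and isplit = {2, 5, 6}, the blocks are rows 1-2, 3-5, and 6.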

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?stedc
Computes all eigenvalues and eigenvectors of a
symmetric tridiagonal matrix in parallel.

Syntax
void psstedc (const char* compz, const MKL_INT* n, float* d, float* e, float* q, const
MKL_INT* iq, const MKL_INT* jq, const MKL_INT* descq, float* work, MKL_INT* lwork,
MKL_INT* iwork, const MKL_INT* liwork, MKL_INT* info);


void pdstedc (const char* compz, const MKL_INT* n, double* d, double* e, double* q,
const MKL_INT* iq, const MKL_INT* jq, const MKL_INT* descq, double* work, MKL_INT*
lwork, MKL_INT* iwork, const MKL_INT* liwork, MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
p?stedc computes all eigenvalues and eigenvectors of a symmetric tridiagonal matrix in parallel, using the
divide and conquer algorithm.

Input Parameters

compz
= 'N': Compute eigenvalues only. (NOT IMPLEMENTED YET)
= 'I': Compute eigenvectors of the tridiagonal matrix also.
= 'V': Compute eigenvectors of the original dense symmetric matrix also. On
entry, q contains the orthogonal matrix used to reduce the original matrix
to tridiagonal form. (NOT IMPLEMENTED YET)

n (global)
The order of the tridiagonal matrix T. n >= 0.

d (global)
Array, size (n)

On entry, the diagonal elements of the tridiagonal matrix.

e (global)
Array, size (n-1).

On entry, the subdiagonal elements of the tridiagonal matrix.

iq (global)
Q's global row index, which points to the beginning of the submatrix which
is to be operated on.

jq (global)
Q's global column index, which points to the beginning of the submatrix
which is to be operated on.

descq (global and local)

Array of size dlen_.
The array descriptor for the distributed matrix Q.

work (local)
Array, size (lwork)

lwork (local)
The size of the array work.

lwork ≥ 6n + 2*NP*NQ, where

NP = numroc(n, NB, MYROW, descq[rsrc_], NPROW),
NQ = numroc(n, NB, MYCOL, descq[csrc_], NPCOL),

and numroc is a ScaLAPACK tool function.

If lwork = -1, then lwork is global input and a workspace query is
assumed; the routine only calculates the minimum size for the work array.
The required workspace is returned as the first element of work and no
error message is issued by pxerbla.

iwork (local)
Array, size (liwork)

liwork The size of the array iwork.

liwork ≥ 2 + 7n + 8*NPCOL
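The lwork and liwork formulas above can be evaluated with a local version of numroc, as in the C sketch below. my_numroc re-implements the standard ScaLAPACK definition of numroc (0-based process coordinates) for illustration only, NB is assumed to be the block size of Q, and stedc_workspace is a hypothetical helper, not an MKL function. In practice, a workspace query with lwork = -1 is the simpler way to obtain the size.

```c
/* Local re-implementation of the ScaLAPACK tool function numroc
 * (illustration only): how many of the n global rows/columns,
 * distributed in blocks of nb starting at process isrcproc, are owned
 * by process iproc out of nprocs. */
int my_numroc(int n, int nb, int iproc, int isrcproc, int nprocs)
{
    int mydist    = (nprocs + iproc - isrcproc) % nprocs;
    int nblocks   = n / nb;                  /* whole blocks */
    int num       = (nblocks / nprocs) * nb; /* full rounds of blocks */
    int extrablks = nblocks % nprocs;        /* leftover whole blocks */
    if (mydist < extrablks)
        num += nb;                           /* one extra whole block */
    else if (mydist == extrablks)
        num += n % nb;                       /* trailing partial block */
    return num;
}

/* Hypothetical helper: minimum lwork and liwork for p?stedc on process
 * (myrow, mycol), following lwork >= 6n + 2*NP*NQ and
 * liwork >= 2 + 7n + 8*NPCOL. */
void stedc_workspace(int n, int nb, int myrow, int mycol, int rsrc, int csrc,
                     int nprow, int npcol, int *lwork, int *liwork)
{
    int np = my_numroc(n, nb, myrow, rsrc, nprow);
    int nq = my_numroc(n, nb, mycol, csrc, npcol);
    *lwork  = 6 * n + 2 * np * nq;
    *liwork = 2 + 7 * n + 8 * npcol;
}
```

For example, for n = 10, NB = 2 and a 2-by-2 process grid, process (0, 0) owns NP = NQ = 6 rows and columns, giving lwork = 60 + 72 = 132 and liwork = 2 + 70 + 16 = 88.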

Output Parameters

d On exit, if info = 0, the eigenvalues in descending order.

q (local)
Array, local size (lld_q, LOCc(jq+n-1)).

q contains the orthonormal eigenvectors of the symmetric tridiagonal
matrix. On output, q is distributed across the P processes in block cyclic
format.

work On output, work[0] returns the workspace needed.

iwork On exit, if liwork > 0, iwork[0] returns the optimal liwork.

info (global)
= 0: successful exit.
< 0: If the i-th argument is an array and its j-th entry had an illegal
value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.
> 0: The algorithm failed to compute the info/(n+1)-th eigenvalue
while working on the submatrix lying in global rows and columns
mod(info,n+1).

p?stein
Computes the eigenvectors of a tridiagonal matrix
using inverse iteration.

Syntax
void psstein (MKL_INT *n , float *d , float *e , MKL_INT *m , float *w , MKL_INT
*iblock , MKL_INT *isplit , float *orfac , float *z , MKL_INT *iz , MKL_INT *jz ,
MKL_INT *descz , float *work , MKL_INT *lwork , MKL_INT *iwork , MKL_INT *liwork ,
MKL_INT *ifail , MKL_INT *iclustr , float *gap , MKL_INT *info );


void pdstein (MKL_INT *n , double *d , double *e , MKL_INT *m , double *w , MKL_INT
*iblock , MKL_INT *isplit , double *orfac , double *z , MKL_INT *iz , MKL_INT *jz ,
MKL_INT *descz , double *work , MKL_INT *lwork , MKL_INT *iwork , MKL_INT *liwork ,
MKL_INT *ifail , MKL_INT *iclustr , double *gap , MKL_INT *info );
void pcstein (MKL_INT *n , float *d , float *e , MKL_INT *m , float *w , MKL_INT
*iblock , MKL_INT *isplit , float *orfac , MKL_Complex8 *z , MKL_INT *iz , MKL_INT *jz ,
MKL_INT *descz , float *work , MKL_INT *lwork , MKL_INT *iwork , MKL_INT *liwork ,
MKL_INT *ifail , MKL_INT *iclustr , float *gap , MKL_INT *info );
void pzstein (MKL_INT *n , double *d , double *e , MKL_INT *m , double *w , MKL_INT
*iblock , MKL_INT *isplit , double *orfac , MKL_Complex16 *z , MKL_INT *iz , MKL_INT
*jz , MKL_INT *descz , double *work , MKL_INT *lwork , MKL_INT *iwork , MKL_INT
*liwork , MKL_INT *ifail , MKL_INT *iclustr , double *gap , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?stein function computes the eigenvectors of a symmetric tridiagonal matrix T corresponding to
specified eigenvalues, by inverse iteration. p?stein does not orthogonalize vectors that are on different
processes. The extent of orthogonalization is controlled by the input parameter lwork. Eigenvectors that are
to be orthogonalized are computed by the same process. p?stein decides on the allocation of work among
the processes and then calls ?stein2 (modified LAPACK function) on each individual process. If insufficient
workspace is allocated, the expected orthogonalization may not be done.

NOTE
If the eigenvectors obtained are not orthogonal, increase lwork and run the code again.

p = NPROW*NPCOL is the total number of processes.

Input Parameters

n (global) The order of the matrix T (n≥0).

m (global) The number of eigenvectors to be returned.

d, e, w (global)
Arrays:

d of size n contains the diagonal elements of T.

e of size n-1 contains the off-diagonal elements of T.
w of size m contains all the eigenvalues grouped by split-off block. The
eigenvalues are supplied from smallest to largest within the block. (Here
the output array w from p?stebz with order = 'B' is expected. The array
should be replicated in all processes.)

iblock (global)

Array of size n. The submatrix indices associated with the corresponding
eigenvalues in w: 1 for eigenvalues belonging to the first submatrix from
the top, 2 for those belonging to the second submatrix, etc. (The output
array iblock from p?stebz is expected here).

isplit (global)
Array of size n. The splitting points at which T breaks up into submatrices.
The first submatrix consists of rows/columns 1 to isplit[0], the second of
rows/columns isplit[0]+1 through isplit[1], and so on, and the
nsplit-th submatrix consists of rows/columns isplit[nsplit-2]+1
through isplit[nsplit-1]=n. (The output array isplit from p?stebz is
expected here.)

orfac (global)
orfac specifies which eigenvectors should be orthogonalized. Eigenvectors
that correspond to eigenvalues within orfac*||T|| of each other are to be
orthogonalized. However, if the workspace is insufficient (see lwork), this
tolerance may be decreased until all eigenvectors can be stored in one
process. No orthogonalization is done if orfac is equal to zero. A default
value of 1000 is used if orfac is negative. orfac should be identical on all
processes.

iz, jz (global) The row and column indices in the global matrix Z indicating the
first row and the first column of the submatrix Z, respectively.

descz (global and local) array of size dlen_. The array descriptor for the
distributed matrix Z.

work (local).
Workspace array of size lwork.

lwork (local)
lwork controls the extent of orthogonalization which can be done. The
number of eigenvectors for which storage is allocated on each process is
nvec = floor((lwork-max(5*n,np00*mq00))/n). Eigenvectors
corresponding to eigenvalue clusters of size (nvec - ceil(m/p) + 1) are
guaranteed to be orthogonal (the orthogonality is similar to that obtained
from ?stein2).

NOTE
lwork must be no smaller than max(5*n,np00*mq00) + ceil(m/p)*n
and should have the same input value on all processes.

It is the minimum value of lwork input on different processes that is
significant.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

iwork (local)


Workspace array of size 3n+p+1.

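liwork (local) The size of the array iwork. It must be at least 3*n+p+1.

If liwork = -1, then liwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.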
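The relationship between lwork, nvec, and the guaranteed cluster size described for lwork can be evaluated with the following C sketch. It takes np00 and mq00 as inputs because this section does not define them, assumes all arguments are positive with lwork ≥ max(5*n, np00*mq00), and uses hypothetical helper names that are not part of MKL.

```c
/* Hypothetical helpers (not MKL APIs) evaluating the p?stein workspace
 * formulas. np00 and mq00 are supplied by the caller. */
static int imax(int a, int b) { return a > b ? a : b; }
static int ceil_div(int a, int b) { return (a + b - 1) / b; } /* a, b > 0 */

/* Minimum lwork: max(5*n, np00*mq00) + ceil(m/p)*n. */
int stein_min_lwork(int n, int m, int p, int np00, int mq00)
{
    return imax(5 * n, np00 * mq00) + ceil_div(m, p) * n;
}

/* Eigenvectors that can be stored on each process:
 * nvec = floor((lwork - max(5*n, np00*mq00)) / n).
 * Integer division equals floor here because the numerator is >= 0. */
int stein_nvec(int lwork, int n, int np00, int mq00)
{
    return (lwork - imax(5 * n, np00 * mq00)) / n;
}

/* Cluster size for which orthogonality is guaranteed:
 * nvec - ceil(m/p) + 1. */
int stein_guaranteed_cluster(int nvec, int m, int p)
{
    return nvec - ceil_div(m, p) + 1;
}
```

With n = 100, m = 20, p = 4 and np00*mq00 = 500, the minimum lwork of 1000 gives nvec = 5, so only clusters of size 1 are guaranteed to be orthogonal; raising lwork to 1500 gives nvec = 10 and guaranteed clusters of size 6. This is why increasing lwork can restore orthogonality.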

Output Parameters

z (local)
Array of size (descz[dlen_-1], n/NPCOL + NB). z contains the computed
eigenvectors associated with the specified eigenvalues. Any vector that
fails to converge is set to its current iterate after MAXIT iterations
(see ?stein2). On output, z is distributed across the p processes in block
cyclic format.

work On exit, work[0] gives a lower bound on the workspace (lwork) that
guarantees the user-desired orthogonalization (see orfac). Note that this
may overestimate the minimum workspace needed.

iwork On exit, iwork[0] contains the amount of integer workspace required.

On exit, iwork[1] through iwork[p+1] indicate the eigenvectors
computed by each process. Process i computes eigenvectors indexed
iwork[i+1]+1 through iwork[i+2].

ifail (global) Array of size m. On normal exit, all elements of ifail are zero. If
one or more eigenvectors fail to converge after MAXIT iterations (as
in ?stein), then info > 0 is returned. If mod(info, m+1)>0, then for i=1
to mod(info,m+1), the eigenvector corresponding to the eigenvalue
w[ifail[i-1]-1] failed to converge (w refers to the array of eigenvalues
on output).

NOTE
mod(x,y) is the integer remainder of x/y.

iclustr (global) Array of size 2*p.

This output array contains indices of eigenvectors corresponding to clusters
of eigenvalues that could not be orthogonalized due to insufficient
workspace (see lwork, orfac and info). Eigenvectors corresponding to
clusters of eigenvalues indexed iclustr[2*i-2] to iclustr[2*i-1], i = 1
to info/(m+1), could not be orthogonalized due to lack of workspace.
Hence the eigenvectors corresponding to these clusters may not be
orthogonal. iclustr is a zero-terminated array: iclustr[2*k-1] ≠ 0 and
iclustr[2*k] = 0 if and only if k is the number of clusters.

gap (global)
This output array contains the gap between eigenvalues whose
eigenvectors could not be orthogonalized. The first info/(m+1) output values
in this array correspond to the info/(m+1) clusters indicated by the
array iclustr. As a result, the dot product between eigenvectors
corresponding to the i-th cluster may be as high as
(O(n)*macheps)/gap[i-1].
info (global)
If info = 0, the execution is successful.

If info < 0: if the i-th argument is an array and its j-th entry, indexed
j-1, had an illegal value, then info = -(i*100+j); if the i-th argument is a
scalar and had an illegal value, then info = -i.

If info > 0: if mod(info, m+1) = i, then i eigenvectors failed to converge
in MAXIT iterations; their indices are stored in the array ifail. If
info/(m+1) = i, then eigenvectors corresponding to i clusters of eigenvalues
could not be orthogonalized due to insufficient workspace; the indices of the
clusters are stored in the array iclustr.
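The following C sketch shows one way to interpret the iwork, iclustr, and info outputs of p?stein, following the index conventions described above. The helper names are hypothetical, and int stands in for MKL_INT to keep the example self-contained.

```c
/* Hypothetical helpers (not MKL APIs) for reading p?stein outputs. */

/* 1-based range of eigenvector indices computed by process i
 * (0 <= i < p): iwork[i+1]+1 through iwork[i+2]. An empty range
 * (first > last) means that process computed no eigenvectors. */
void stein_process_range(const int *iwork, int i, int *first, int *last)
{
    *first = iwork[i + 1] + 1;
    *last  = iwork[i + 2];
}

/* Number of clusters recorded in the zero-terminated array iclustr of
 * size 2*p. Cluster k (1-based) spans iclustr[2*k-2]..iclustr[2*k-1]. */
int stein_count_clusters(const int *iclustr, int p)
{
    int k = 0;
    while (k < p && iclustr[2 * k] != 0)
        ++k;
    return k;
}

/* Split a positive info into the number of eigenvectors that failed to
 * converge, mod(info, m+1), and the number of clusters that could not
 * be orthogonalized, info/(m+1). */
void stein_decode_info(int info, int m, int *nfailed, int *nclusters)
{
    *nfailed   = info % (m + 1);
    *nclusters = info / (m + 1);
}
```

For example, with m = 10, info = 25 means that mod(25, 11) = 3 eigenvectors failed to converge and 25/11 = 2 clusters could not be orthogonalized.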

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

Nonsymmetric Eigenvalue Problems: ScaLAPACK Computational Routines

This section describes ScaLAPACK routines for solving nonsymmetric eigenvalue problems, computing the
Schur factorization of general matrices, as well as performing a number of related computational tasks.
To solve a nonsymmetric eigenvalue problem with ScaLAPACK, you usually need to reduce the matrix to the
upper Hessenberg form and then solve the eigenvalue problem with the Hessenberg matrix obtained.
Table "Computational Routines for Solving Nonsymmetric Eigenproblems" lists ScaLAPACK routines for
reducing the matrix to the upper Hessenberg form by an orthogonal (or unitary) similarity transformation
A = Q*H*Q^H, as well as routines for solving eigenproblems with Hessenberg matrices, and multiplying the matrix
after reduction.

Computational Routines for Solving Nonsymmetric Eigenproblems

Operation performed | General matrix | Orthogonal/Unitary matrix | Hessenberg matrix
Reduce to Hessenberg form A = Q*H*Q^H | p?gehrd | |
Multiply the matrix after reduction | | p?ormhr/p?unmhr |
Find eigenvalues and Schur factorization | | | p?lahqr

p?gehrd
Reduces a general matrix to upper Hessenberg form.

Syntax
void psgehrd (MKL_INT *n , MKL_INT *ilo , MKL_INT *ihi , float *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , float *tau , float *work , MKL_INT *lwork , MKL_INT
*info );
void pdgehrd (MKL_INT *n , MKL_INT *ilo , MKL_INT *ihi , double *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *tau , double *work , MKL_INT *lwork , MKL_INT
*info );
void pcgehrd (MKL_INT *n , MKL_INT *ilo , MKL_INT *ihi , MKL_Complex8 *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );


void pzgehrd (MKL_INT *n , MKL_INT *ilo , MKL_INT *ihi , MKL_Complex16 *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT
*lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?gehrd function reduces a real/complex general distributed matrix sub(A) to upper Hessenberg form H
by an orthogonal or unitary similarity transformation
Q'*sub(A)*Q = H,
where sub(A) = A(ia:ia+n-1, ja:ja+n-1).

Input Parameters

n (global). The order of the distributed matrix sub(A) (n≥0).

ilo, ihi (global).

It is assumed that sub(A) is already upper triangular in rows ia:ia+ilo-2
and ia+ihi:ia+n-1 and columns ja:ja+ilo-2 and ja+ihi:ja+n-1. (See
Application Notes below).
If n > 0, 1≤ilo≤ihi≤n; otherwise set ilo = 1, ihi = n.

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1). On
entry, this array contains the local pieces of the n-by-n general distributed
matrix sub(A) to be reduced.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

work (local)
Workspace array of size lwork.

lwork (local or global) The size of the array work. lwork is local input and must
satisfy

lwork ≥ NB*NB + NB*max(ihip+1, ihlp+inlq)

where NB = mb_a = nb_a,
iroffa = mod(ia-1, NB),
icoffa = mod(ja-1, NB),
ioff = mod(ia+ilo-2, NB),
iarow = indxg2p(ia, NB, MYROW, rsrc_a, NPROW),
ihip = numroc(ihi+iroffa, NB, MYROW, iarow, NPROW),
ilrow = indxg2p(ia+ilo-1, NB, MYROW, rsrc_a, NPROW),
ihlp = numroc(ihi-ilo+ioff+1, NB, MYROW, ilrow, NPROW),
ilcol = indxg2p(ja+ilo-1, NB, MYCOL, csrc_a, NPCOL),
inlq = numroc(n-ilo+ioff+1, NB, MYCOL, ilcol, NPCOL).

NOTE
mod(x,y) is the integer remainder of x/y.

indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.
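The lwork bound can be evaluated with local versions of the ScaLAPACK tool functions, as in the following C sketch. my_numroc and my_indxg2p re-implement the standard ScaLAPACK definitions of numroc and indxg2p (1-based global indices, 0-based process coordinates) for illustration only, and gehrd_min_lwork is a hypothetical helper. In a real program, the workspace query with lwork = -1 is usually simpler and also reports the optimal size.

```c
/* Local re-implementation of numroc (illustration only): how many of the
 * n global rows/columns, distributed in blocks of nb starting at process
 * isrcproc, are owned by process iproc out of nprocs. */
int my_numroc(int n, int nb, int iproc, int isrcproc, int nprocs)
{
    int mydist    = (nprocs + iproc - isrcproc) % nprocs;
    int nblocks   = n / nb;
    int num       = (nblocks / nprocs) * nb;
    int extrablks = nblocks % nprocs;
    if (mydist < extrablks)
        num += nb;
    else if (mydist == extrablks)
        num += n % nb;
    return num;
}

/* Local re-implementation of indxg2p: process coordinate that owns the
 * 1-based global index indxglob. iproc is a dummy argument. */
int my_indxg2p(int indxglob, int nb, int iproc, int isrcproc, int nprocs)
{
    (void)iproc;
    return (isrcproc + (indxglob - 1) / nb) % nprocs;
}

/* Hypothetical helper: minimum lwork for p?gehrd on process
 * (myrow, mycol) with NB = mb_a = nb_a. icoffa does not appear in the
 * bound, so it is not computed here. */
int gehrd_min_lwork(int n, int ilo, int ihi, int ia, int ja, int nb,
                    int myrow, int mycol, int rsrc_a, int csrc_a,
                    int nprow, int npcol)
{
    int iroffa = (ia - 1) % nb;
    int ioff   = (ia + ilo - 2) % nb;
    int iarow  = my_indxg2p(ia, nb, myrow, rsrc_a, nprow);
    int ihip   = my_numroc(ihi + iroffa, nb, myrow, iarow, nprow);
    int ilrow  = my_indxg2p(ia + ilo - 1, nb, myrow, rsrc_a, nprow);
    int ihlp   = my_numroc(ihi - ilo + ioff + 1, nb, myrow, ilrow, nprow);
    int ilcol  = my_indxg2p(ja + ilo - 1, nb, mycol, csrc_a, npcol);
    int inlq   = my_numroc(n - ilo + ioff + 1, nb, mycol, ilcol, npcol);
    int big    = (ihip + 1 > ihlp + inlq) ? ihip + 1 : ihlp + inlq;
    return nb * nb + nb * big;
}
```

For n = 8, ilo = 1, ihi = 8, ia = ja = 1 and NB = 2, process (0, 0) of a 2-by-2 grid needs lwork ≥ 4 + 2*max(5, 8) = 20; on a single process the same call gives 36.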

Output Parameters

a On exit, the upper triangle and the first subdiagonal of sub(A) are
overwritten with the upper Hessenberg matrix H, and the elements below
the first subdiagonal, with the array tau, represent the orthogonal/unitary
matrix Q as a product of elementary reflectors (see Application Notes
below).

tau (local).
Array of size at least LOCc(ja+n-2).

The scalar factors of the elementary reflectors (see Application Notes
below). Elements ja:ja+ilo-2 and ja+ihi:ja+n-2 of the global vector
tau are set to zero. tau is tied to the distributed matrix A.

work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.

Application Notes
The matrix Q is represented as a product of (ihi-ilo) elementary reflectors

Q = H(ilo)*H(ilo+1)*...*H(ihi-1).

Each H(i) has the form

H(i) = I - tau*v*v'
where I is the identity matrix, tau is a real/complex scalar, and v is a real/complex vector with
v(1:i) = 0, v(i+1) = 1 and v(ihi+1:n) = 0; v(i+2:ihi) is stored on exit in
A(ia+ilo+i:ia+ihi-1, ja+ilo+i-2), and tau in tau[ja+ilo+i-3]. The contents of
A(ia:ia+n-1, ja:ja+n-1) are illustrated by the following example, with n = 7, ilo = 2 and ihi = 6:

on entry

( a   a   a   a   a   a   a )
(     a   a   a   a   a   a )
(     a   a   a   a   a   a )
(     a   a   a   a   a   a )
(     a   a   a   a   a   a )
(     a   a   a   a   a   a )
(                         a )

on exit

( a   a   h   h   h   h   a )
(     a   h   h   h   h   a )
(     h   h   h   h   h   h )
(     v2  h   h   h   h   h )
(     v2  v3  h   h   h   h )
(     v2  v3  v4  h   h   h )
(                         a )

where a denotes an element of the original matrix sub(A), h denotes a modified element of the upper
Hessenberg matrix H, and vi denotes an element of the vector defining H(ja+ilo+i-2).
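The definition H(i) = I - tau*v*v' can be illustrated by applying a single real reflector to a local vector. The C sketch below is illustrative only: it uses plain loops rather than the distributed kernels that p?gehrd uses, and the name apply_reflector is hypothetical. Because v(1:i) = 0, the leading entries of x are left unchanged, matching the zero pattern of v described above.

```c
/* Hypothetical helper (not part of MKL): apply the real elementary
 * reflector H = I - tau*v*v' to a vector x of length n in place,
 * computing x := x - tau * v * (v' * x). */
void apply_reflector(int n, double tau, const double *v, double *x)
{
    double dot = 0.0;
    for (int k = 0; k < n; ++k)
        dot += v[k] * x[k];          /* v' * x */
    for (int k = 0; k < n; ++k)
        x[k] -= tau * v[k] * dot;    /* x - tau * v * (v' * x) */
}
```

Choosing tau = 2/(v'*v) makes the real reflector orthogonal and its own inverse; for example, with v = {0, 1, 1, 0} and tau = 1, applying it twice to x = {1, 2, 3, 4} restores x.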

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?ormhr
Multiplies a general matrix by the orthogonal
transformation matrix from a reduction to Hessenberg
form determined by p?gehrd.

Syntax
void psormhr (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *ilo ,
MKL_INT *ihi , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *tau ,
float *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , float *work , MKL_INT *lwork ,
MKL_INT *info );
void pdormhr (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *ilo ,
MKL_INT *ihi , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *tau ,
double *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , double *work , MKL_INT *lwork ,
MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?ormhr function overwrites the general real distributed m-by-n matrix sub(C) = C(ic:ic+m-1, jc:jc+n-1)
with

                 side = 'L'      side = 'R'
trans = 'N':     Q*sub(C)        sub(C)*Q
trans = 'T':     Q^T*sub(C)      sub(C)*Q^T

where Q is a real orthogonal distributed matrix of order nq, with nq = m if side = 'L' and nq = n if side =
'R'.
Q is defined as the product of ihi-ilo elementary reflectors, as returned by p?gehrd.

Q = H(ilo) H(ilo+1)... H(ihi-1).

Input Parameters

side (global)
='L': Q or Q^T is applied from the left.
='R': Q or Q^T is applied from the right.

trans (global)
='N', no transpose, Q is applied.
='T', transpose, Q^T is applied.

m (global) The number of rows in the distributed matrix sub (C) (m≥0).

n (global) The number of columns in the distributed matrix sub (C) (n≥0).

ilo, ihi (global)

ilo and ihi must have the same values as in the previous call of p?gehrd.
Q is equal to the unit matrix except for the distributed submatrix
Q(ia+ilo:ia+ihi-1, ja+ilo:ja+ihi-1).

If side = 'L', 1≤ilo≤ihi≤max(1,m);

If side = 'R', 1≤ilo≤ihi≤max(1,n);

ilo and ihi are relative indexes.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCc(ja+m-2) if side = 'L', and LOCc(ja+n-2) if side = 'R'.
tau[j] contains the scalar factor of the elementary reflector H(j+1) as
returned by p?gehrd (0 ≤ j < size(tau)). tau is tied to the distributed
matrix A.

c (local)
Pointer into the local memory to an array of size lld_c*LOCc(jc+n-1).

Contains the local pieces of the distributed matrix sub(C).

ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the submatrix C, respectively.

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local)
Workspace array with size lwork.

lwork (local or global)

The size of the array work.

Let iaa = ia + ilo and jaa = ja + ilo - 1. Then lwork must satisfy the following:

If side = 'L',
mi = ihi-ilo; ni = n; icc = ic + ilo; jcc = jc;
lwork ≥ max((nb_a*(nb_a-1))/2, (nqc0+mpc0)*nb_a) + nb_a*nb_a
else if side = 'R',
mi = m; ni = ihi-ilo; icc = ic; jcc = jc + ilo;
lwork ≥ max((nb_a*(nb_a-1))/2, (nqc0+max(npa0+numroc(numroc(ni+icoffc, nb_a, 0, 0, NPCOL),
nb_a, 0, 0, lcmq), mpc0))*nb_a) + nb_a*nb_a
end if
where lcmq = lcm/NPCOL with lcm = ilcm(NPROW, NPCOL),

iroffa = mod(iaa-1, mb_a),
icoffa = mod(jaa-1, nb_a),
iarow = indxg2p(iaa, mb_a, MYROW, rsrc_a, NPROW),
npa0 = numroc(ni+iroffa, mb_a, MYROW, iarow, NPROW),
iroffc = mod(icc-1, mb_c),
icoffc = mod(jcc-1, nb_c),
icrow = indxg2p(icc, mb_c, MYROW, rsrc_c, NPROW),
iccol = indxg2p(jcc, nb_c, MYCOL, csrc_c, NPCOL),
mpc0 = numroc(mi+iroffc, mb_c, MYROW, icrow, NPROW),
nqc0 = numroc(ni+icoffc, nb_c, MYCOL, iccol, NPCOL).

NOTE
mod(x,y) is the integer remainder of x/y.

Output Parameters

c sub(C) is overwritten by Q*sub(C), Q^T*sub(C), sub(C)*Q^T, or sub(C)*Q.

work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?unmhr
Multiplies a general matrix by the unitary
transformation matrix from a reduction to Hessenberg
form determined by p?gehrd.

Syntax
void pcunmhr (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *ilo ,
MKL_INT *ihi , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca ,
MKL_Complex8 *tau , MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc ,
MKL_Complex8 *work , MKL_INT *lwork , MKL_INT *info );
void pzunmhr (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *ilo ,
MKL_INT *ihi , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca ,
MKL_Complex16 *tau , MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc ,
MKL_Complex16 *work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
This function overwrites the general complex distributed m-by-n matrix sub(C) = C(ic:ic+m-1, jc:jc+n-1)
with

                 side = 'L'      side = 'R'
trans = 'N':     Q*sub(C)        sub(C)*Q
trans = 'C':     Q^H*sub(C)      sub(C)*Q^H

where Q is a complex unitary distributed matrix of order nq, with nq = m if side = 'L' and nq = n if side
= 'R'.
Q is defined as the product of ihi-ilo elementary reflectors, as returned by p?gehrd.

Q = H(ilo) H(ilo+1)... H(ihi-1).

Input Parameters

side (global)
='L': Q or Q^H is applied from the left.
='R': Q or Q^H is applied from the right.

trans (global)
='N', no transpose, Q is applied.
='C', conjugate transpose, Q^H is applied.

m (global) The number of rows in the distributed matrix sub (C) (m≥0).

n (global) The number of columns in the distributed matrix sub (C) (n≥0).

ilo, ihi (global)

These must be the same parameters ilo and ihi, respectively, as supplied
to p?gehrd. Q is equal to the unit matrix except in the distributed
submatrix Q(ia+ilo:ia+ihi-1, ja+ilo:ja+ihi-1).

If side = 'L', then 1≤ilo≤ihi≤max(1,m).

If side = 'R', then 1≤ilo≤ihi≤max(1,n).

ilo and ihi are relative indexes.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCc(ja+m-2), if side = 'L', and LOCc(ja+n-2) if side =
'R'.
tau[j] contains the scalar factor of the elementary reflector H(j+1) as
returned by p?gehrd (0 ≤ j < size(tau)). tau is tied to the distributed
matrix A.

c (local)
Pointer into the local memory to an array of size lld_c*LOCc(jc+n-1).

Contains the local pieces of the distributed matrix sub(C).

ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the submatrix C, respectively.

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local)
Workspace array with size lwork.

lwork (local or global)

The size of the array work.

Let iaa = ia + ilo and jaa = ja + ilo - 1. Then lwork must satisfy the following:

If side = 'L',
mi = ihi-ilo; ni = n; icc = ic + ilo; jcc = jc;
lwork ≥ max((nb_a*(nb_a-1))/2, (nqc0+mpc0)*nb_a) + nb_a*nb_a
else if side = 'R',
mi = m; ni = ihi-ilo; icc = ic; jcc = jc + ilo;
lwork ≥ max((nb_a*(nb_a-1))/2, (nqc0 + max(npa0+numroc(numroc(ni+icoffc, nb_a, 0, 0, NPCOL),
nb_a, 0, 0, lcmq), mpc0))*nb_a) + nb_a*nb_a
end if
where lcmq = lcm/NPCOL with lcm = ilcm(NPROW, NPCOL),

iroffa = mod(iaa-1, mb_a),
icoffa = mod(jaa-1, nb_a),
iarow = indxg2p(iaa, mb_a, MYROW, rsrc_a, NPROW),
npa0 = numroc(ni+iroffa, mb_a, MYROW, iarow, NPROW),
iroffc = mod(icc-1, mb_c),
icoffc = mod(jcc-1, nb_c),
icrow = indxg2p(icc, mb_c, MYROW, rsrc_c, NPROW),
iccol = indxg2p(jcc, nb_c, MYCOL, csrc_c, NPCOL),
mpc0 = numroc(mi+iroffc, mb_c, MYROW, icrow, NPROW),
nqc0 = numroc(ni+icoffc, nb_c, MYCOL, iccol, NPCOL).

NOTE
mod(x,y) is the integer remainder of x/y.

ilcm, indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL,
NPROW and NPCOL can be determined by calling the function
blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

c sub(C) is overwritten by Q*sub(C), Q^H*sub(C), sub(C)*Q^H, or sub(C)*Q.

work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?lahqr
Computes the Schur decomposition and/or
eigenvalues of a matrix already in Hessenberg form.

Syntax
void pslahqr (MKL_INT *wantt, MKL_INT *wantz, MKL_INT *n, MKL_INT *ilo, MKL_INT *ihi,
float *a, MKL_INT *desca, float *wr, float *wi, MKL_INT *iloz, MKL_INT *ihiz, float *z,
MKL_INT *descz, float *work, MKL_INT *lwork, MKL_INT *iwork, MKL_INT *ilwork, MKL_INT
*info );
void pdlahqr (MKL_INT *wantt, MKL_INT *wantz, MKL_INT *n, MKL_INT *ilo, MKL_INT *ihi,
double *a, MKL_INT *desca, double *wr, double *wi, MKL_INT *iloz, MKL_INT *ihiz, double
*z, MKL_INT *descz, double *work, MKL_INT *lwork, MKL_INT *iwork, MKL_INT *ilwork,
MKL_INT *info );
void pclahqr (const MKL_INT *wantt, const MKL_INT *wantz, const MKL_INT *n, const
MKL_INT *ilo, const MKL_INT *ihi, MKL_Complex8 *a, const MKL_INT *desca, MKL_Complex8
*w, const MKL_INT *iloz, const MKL_INT *ihiz, MKL_Complex8 *z, const MKL_INT *descz,
MKL_Complex8 *work, const MKL_INT *lwork, const MKL_INT *iwork, const MKL_INT *ilwork,
MKL_INT *info );
void pzlahqr (const MKL_INT *wantt, const MKL_INT *wantz, const MKL_INT *n, const
MKL_INT *ilo, const MKL_INT *ihi, MKL_Complex16 *a, const MKL_INT *desca, MKL_Complex16
*w, const MKL_INT *iloz, const MKL_INT *ihiz, MKL_Complex16 *z, const MKL_INT *descz,
MKL_Complex16 *work, const MKL_INT *lwork, const MKL_INT *iwork, const MKL_INT *ilwork,
MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
This is an auxiliary function used to find the Schur decomposition and/or eigenvalues of a matrix already in
Hessenberg form, working on columns ilo to ihi.

NOTE
These restrictions apply to the use of p?lahqr:
• The code requires the distributed block size to be square and at least 6.
• The code requires A and Z to be distributed identically and have identical contexts.
• The matrix A must be in upper Hessenberg form. If elements below the subdiagonal are non-zero,
the resulting transformations can be nonsimilar.
• All eigenvalues are distributed to all the nodes.

Input Parameters

wantt (global)
If wantt≠ 0, the full Schur form T is required;

If wantt = 0, only eigenvalues are required.

wantz (global)
If wantz≠ 0, the matrix of Schur vectors Z is required;

If wantz = 0, Schur vectors are not required.

n (global) The order of the Hessenberg matrix A (and of z if wantz is non-zero);
n≥0.

ilo, ihi (global)

It is assumed that A is already upper quasi-triangular in rows and columns
ihi+1:n, and that A(ilo, ilo-1) = 0 (unless ilo = 1). p?lahqr works
primarily with the Hessenberg submatrix in rows and columns ilo to ihi,
but applies transformations to all of A if wantt is non-zero.
1≤ilo≤max(1,ihi); ihi ≤ n.

a (global)
Array, of size lld_a*LOCc(n). On entry, the upper Hessenberg matrix A.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

iloz, ihiz (global) Specify the rows of the matrix Z to which transformations must be
applied if wantz is non-zero. 1≤iloz≤ilo; ihi≤ihiz≤n.

z (global)
Array. If wantz is non-zero, on entry z must contain the current matrix Z of
transformations accumulated by pdhseqr. If wantz is zero, z is not
referenced.

descz (global and local) array of size dlen_. The array descriptor for the
distributed matrix Z.

work (local)
Workspace array with size lwork.

lwork (local) The size of work. lwork is assumed to be large enough that lwork ≥ 3*n
+ max(2*max(lld_z,lld_a) + 2*LOCq(n), 7*ceil(n/hbl)/lcm(NPROW,NPCOL)),
where hbl is the distributed block size.


If lwork = -1, then work[0] is set to the above value and the function
returns immediately.

iwork (global and local) array of size ilwork. Not referenced and can be a NULL
pointer.

ilwork (local) Intended to hold some of the iblk integer arrays; reserved for a future
release. Not referenced and can be a NULL pointer.
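In practice the lwork = -1 query is the simplest way to size work. To make the p?lahqr bound given above concrete, the following C sketch evaluates it for one process. It is a minimal sketch under stated assumptions: the helper names are hypothetical (not MKL or ScaLAPACK entry points), hbl is taken to be the square block size of A, LOCq(n) is computed with a C port of the ScaLAPACK tool function numroc, and the division by lcm(NPROW, NPCOL) is assumed to truncate like Fortran integer division.

```c
/* Illustration only: numroc_sketch, gcd_l, lcm_l, max_l and lahqr_lwork_min
   are hypothetical helpers, not MKL or ScaLAPACK entry points. */

/* C port of the ScaLAPACK tool function numroc: how many rows/columns of a
   block-cyclically distributed dimension n are owned by process iproc. */
static long numroc_sketch(long n, long nb, long iproc, long isrcproc, long nprocs)
{
    long mydist = (nprocs + iproc - isrcproc) % nprocs;
    long nblocks = n / nb;
    long count = (nblocks / nprocs) * nb;
    long extrablks = nblocks % nprocs;
    if (mydist < extrablks)
        count += nb;            /* one extra full block */
    else if (mydist == extrablks)
        count += n % nb;        /* the trailing partial block */
    return count;
}

static long gcd_l(long a, long b)
{
    while (b != 0) { long t = a % b; a = b; b = t; }
    return a;
}

static long lcm_l(long a, long b) { return a / gcd_l(a, b) * b; }
static long max_l(long a, long b) { return a > b ? a : b; }

/* Lower bound on lwork for p?lahqr:
   3*n + max(2*max(lld_z, lld_a) + 2*LOCq(n), 7*ceil(n/hbl)/lcm(NPROW, NPCOL)).
   Assumes LOCq(n) = numroc(n, hbl, MYCOL, csrc_a, NPCOL) and integer division. */
static long lahqr_lwork_min(long n, long hbl, long lld_a, long lld_z,
                            long mycol, long csrc_a, long nprow, long npcol)
{
    long locq_n = numroc_sketch(n, hbl, mycol, csrc_a, npcol);
    long ceil_n_hbl = (n + hbl - 1) / hbl;
    return 3 * n + max_l(2 * max_l(lld_z, lld_a) + 2 * locq_n,
                         7 * ceil_n_hbl / lcm_l(nprow, npcol));
}
```

For example, with n = 100 and hbl = 8 on a 2 x 2 process grid, process column 0 owns LOCq(n) = 52 columns; with lld_a = lld_z = 52 the bound evaluates to 3*100 + max(208, 45) = 508.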

Output Parameters

a On exit, if wantt is non-zero, A is upper quasi-triangular in rows and
columns ilo:ihi, with any 2-by-2 or larger diagonal blocks not yet in
standard form. If wantt is zero, the contents of A are unspecified on exit.

work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.

wr, wi (global replicated output)

Arrays of size n each (real flavors). The real and imaginary parts, respectively, of the
computed eigenvalues ilo to ihi are stored in the corresponding elements
of wr and wi. If two eigenvalues are computed as a complex conjugate pair,
they are stored in consecutive elements of wr and wi, say the i-th and
(i+1)-th, with wi[i-1] > 0 and wi[i] < 0. If wantt is non-zero, the eigenvalues are
stored in the same order as on the diagonal of the Schur form returned in
A. A may be returned with larger diagonal blocks until the next release.

w (global replicated output)

Array of size n (complex flavors). The computed eigenvalues ilo to ihi are stored in the
corresponding elements of w. If two eigenvalues are computed as a complex
conjugate pair, they are stored in consecutive elements of w, say the i-th
and (i+1)-th, with w[i-1] > 0 and w[i] < 0. If wantt is non-zero, the eigenvalues
are stored in the same order as on the diagonal of the Schur form returned
in A. A may be returned with larger diagonal blocks until the next release.

z On exit z has been updated; transformations are applied only to the
submatrix Z(iloz:ihiz, ilo:ihi).

info (global)
= 0: the execution is successful.
< 0: if info = -i, the i-th parameter is incorrect or inconsistent.
> 0: p?lahqr failed to compute all the eigenvalues ilo to ihi in a total of
30*(ihi-ilo+1) iterations; if info = i, elements i+1:ihi of wr and wi (or of w)
contain the eigenvalues that have been successfully computed.
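Because info reports a 1-based position while C arrays are 0-based, a small hypothetical helper such as the one below (not part of MKL) can translate a positive info into the range of w (or wr and wi) that holds the eigenvalues computed successfully, that is, 1-based elements info+1 to ihi.

```c
/* Hypothetical helper (not an MKL function). For info > 0 from p?lahqr,
   1-based elements info+1..ihi of w (or wr, wi) hold converged eigenvalues;
   in 0-based C indexing that is w[info] .. w[ihi-1].
   Returns 1 and sets *first/*last if that range is non-empty, else 0. */
static int converged_range(long info, long ihi, long *first, long *last)
{
    if (info <= 0 || info >= ihi)
        return 0;   /* not a convergence failure, or nothing converged */
    *first = info;
    *last = ihi - 1;
    return 1;
}
```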

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?trevc
Computes right and/or left eigenvectors of a complex
upper triangular matrix in parallel.

Syntax
void pctrevc (const char* side, const char* howmny, const MKL_INT* select, const
MKL_INT* n, MKL_Complex8* t, const MKL_INT* desct, MKL_Complex8* vl, const MKL_INT*
descvl, MKL_Complex8* vr, const MKL_INT* descvr, const MKL_INT* mm, MKL_INT* m,
MKL_Complex8* work, float* rwork, MKL_INT* info);
void pztrevc (const char* side, const char* howmny, const MKL_INT* select, const
MKL_INT* n, MKL_Complex16* t, const MKL_INT* desct, MKL_Complex16* vl, const MKL_INT*
descvl, MKL_Complex16* vr, const MKL_INT* descvr, const MKL_INT* mm, MKL_INT* m,
MKL_Complex16* work, double* rwork, MKL_INT* info);
void pdtrevc (const char* side, const char* howmny, const MKL_INT* select, const
MKL_INT* n, double* t, const MKL_INT* desct, double* vl, const MKL_INT* descvl, double*
vr, const MKL_INT* descvr, const MKL_INT* mm, MKL_INT* m, double* work, MKL_INT* info);
void pstrevc (const char* side, const char* howmny, const MKL_INT* select, const
MKL_INT* n, float* t, const MKL_INT* desct, float* vl, const MKL_INT* descvl, float* vr,
const MKL_INT* descvr, const MKL_INT* mm, MKL_INT* m, float* work, MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
p?trevc computes some or all of the right and/or left eigenvectors of a complex upper triangular matrix T in
parallel.
The right eigenvector x and the left eigenvector y of T corresponding to an eigenvalue w are defined by:
T*x = w*x,
y'*T = w*y'
where y' denotes the conjugate transpose of the vector y.
If all eigenvectors are requested, the routine may either return the matrices X and/or Y of right or left
eigenvectors of T, or the products Q*X and/or Q*Y, where Q is an input unitary matrix. If T was obtained
from the Schur factorization of an original matrix A = Q*T*Q', then Q*X and Q*Y are the matrices of right or
left eigenvectors of A.

Input Parameters

side (global)
= 'R': compute right eigenvectors only;
= 'L': compute left eigenvectors only;
= 'B': compute both right and left eigenvectors.

howmny (global)
= 'A': compute all right and/or left eigenvectors;
= 'B': compute all right and/or left eigenvectors, and backtransform them
using the input matrices supplied in vr and/or vl;

= 'S': compute selected right and/or left eigenvectors, specified by the
logical array select.

select (global)


Array, size (n)

If howmny = 'S', select specifies the eigenvectors to be computed.

If howmny = 'A' or 'B', select is not referenced. To select the eigenvector
corresponding to the j-th eigenvalue, select[j-1] must be set to non-zero.

n (global)
The order of the matrix T. n >= 0.

t (local)
Array, size lld_t*LOCc(n).

The upper triangular matrix T. T is modified, but restored on exit.

desct (global and local)

Array of size dlen_.
The array descriptor for the distributed matrix T.

vl (local)
Array, size descvl(lld_)*mm.

On entry, if side = 'L' or 'B' and howmny = 'B', vl must contain an n-by-n
matrix Q (usually the unitary matrix Q of Schur vectors returned
by ?hseqr).

descvl (global and local)

Array of size dlen_.
The array descriptor for the distributed matrix VL.

vr (local)
Array, size descvr(lld_)*mm.

On entry, if side = 'R' or 'B' and howmny = 'B', vr must contain an n-by-n
matrix Q (usually the unitary matrix Q of Schur vectors returned
by ?hseqr).

descvr (global and local)

Array of size dlen_.
The array descriptor for the distributed matrix VR.

mm (global)
The number of columns in the arrays vl and/or vr. mm >= m.

work (local)
Array, size ( 2*desct(lld_) )

Additional workspace may be required if p?lattrs is updated to use work.

rwork (local) Array, size ( desct(lld_) ). Used only by the complex flavors (pctrevc, pztrevc).

Output Parameters

t The upper triangular matrix T. T is modified, but restored on exit.

vl On exit, if side = 'L' or 'B', vl contains:

if howmny = 'A', the matrix Y of left eigenvectors of T;

if howmny = 'B', the matrix Q*Y;

if howmny = 'S', the left eigenvectors of T specified by select, stored
consecutively in the columns of vl, in the same order as their
eigenvalues. If side = 'R', vl is not referenced.

vr On exit, if side = 'R' or 'B', vr contains:

if howmny = 'A', the matrix X of right eigenvectors of T;

if howmny = 'B', the matrix Q*X;

if howmny = 'S', the right eigenvectors of T specified by select,
stored consecutively in the columns of vr, in the same order as their
eigenvalues. If side = 'L', vr is not referenced.

m (global)
The number of columns in the arrays vl and/or vr actually used to
store the eigenvectors. If howmny = 'A' or 'B', m is set to n. Each
selected eigenvector occupies one column.

info (global)
= 0: successful exit
< 0: if info = -i, the i-th argument had an illegal value

Application Notes
The algorithm is essentially backward (forward) substitution. Scaling should be used to make the code
robust against possible overflow, but scaling has not yet been implemented in p?lattrs, which this
routine calls to solve the triangular systems; p?lattrs simply calls p?trsv.

Each eigenvector is normalized so that the element of largest magnitude has magnitude 1; here the
magnitude of a complex number (x,y) is taken to be |x| + |y|.

Singular Value Decomposition: ScaLAPACK Driver Routines

This section describes ScaLAPACK routines for computing the singular value decomposition (SVD) of a
general m-by-n matrix A (see LAPACK "Singular Value Decomposition").
To find the SVD of a general matrix A, this matrix is first reduced to a bidiagonal matrix B by a unitary
(orthogonal) transformation, and then the SVD of the bidiagonal matrix is computed. Note that the SVD of B is
computed using the LAPACK routine ?bdsqr.
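The two stages fit together as shown in this short derivation for the real case (for complex matrices replace ^T with ^H). U_B, V_B and Σ denote the SVD factors of B and are notation introduced here, not arguments of any routine.

```latex
% Real case: p?gebrd gives Q^T A P = B, i.e. A = Q B P^T.
% Let B = U_B \Sigma V_B^T be the SVD of the bidiagonal matrix B.
\begin{aligned}
A &= Q B P^T = Q \, (U_B \Sigma V_B^T) \, P^T \\
  &= (Q U_B) \, \Sigma \, (P V_B)^T = U \Sigma V^T,
  \qquad U = Q U_B,\; V = P V_B .
\end{aligned}
```

U and V are orthogonal because they are products of orthogonal matrices, so this is an SVD of A with the same singular values as B.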
Table "Computational Routines for Singular Value Decomposition (SVD)" lists ScaLAPACK computational
routines for performing this decomposition.
Computational Routines for Singular Value Decomposition (SVD)
Operation                          General matrix    Orthogonal/unitary matrix
Reduce A to a bidiagonal matrix    p?gebrd
Multiply matrix after reduction                      p?ormbr/p?unmbr


p?gebrd
Reduces a general matrix to bidiagonal form.

Syntax
void psgebrd (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *d , float *e , float *tauq , float *taup , float *work , MKL_INT
*lwork , MKL_INT *info );
void pdgebrd (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , double *d , double *e , double *tauq , double *taup , double *work , MKL_INT
*lwork , MKL_INT *info );
void pcgebrd (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , float *d , float *e , MKL_Complex8 *tauq , MKL_Complex8 *taup ,
MKL_Complex8 *work , MKL_INT *lwork , MKL_INT *info );
void pzgebrd (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , double *d , double *e , MKL_Complex16 *tauq , MKL_Complex16 *taup ,
MKL_Complex16 *work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?gebrd function reduces a real/complex general m-by-n distributed matrix sub(A)= A(ia:ia+m-1,
ja:ja+n-1) to upper or lower bidiagonal form B by an orthogonal/unitary transformation:
Q'*sub(A)*P = B.
If m≥ n, B is upper bidiagonal; if m < n, B is lower bidiagonal.

Input Parameters

m (global) The number of rows in the distributed matrix sub(A) (m≥0).

n (global) The number of columns in the distributed matrix sub(A) (n≥0).

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1).
On entry, this array contains the distributed matrix sub (A).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

work (local)
Workspace array of size lwork.

lwork (local or global) size of work, must be at least:

lwork≥nb*(mpa0 + nqa0+1)+ nqa0
where nb = mb_a = nb_a,

iroffa = mod(ia-1, nb),

icoffa = mod(ja-1, nb),

iarow = indxg2p(ia, nb, MYROW, rsrc_a, NPROW),
iacol = indxg2p (ja, nb, MYCOL, csrc_a, NPCOL),
mpa0 = numroc(m +iroffa, nb, MYROW, iarow, NPROW),
nqa0 = numroc(n +icoffa, nb, MYCOL, iacol, NPCOL),

NOTE
mod(x,y) is the integer remainder of x/y.

indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.

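If lwork = -1, then lwork is global input and a workspace query is assumed; the function only
calculates the minimum and optimal size of work, returning the value in work[0].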
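The NOTE above refers to the ScaLAPACK tool functions numroc and indxg2p. As a sketch of how the p?gebrd minimum lwork follows from them, the C code below ports both functions (under hypothetical names, so they do not clash with the real library symbols) and evaluates nb*(mpa0 + nqa0 + 1) + nqa0. In a real program MYROW, MYCOL, NPROW and NPCOL would come from blacs_gridinfo, and the lwork = -1 query is usually the simpler route.

```c
/* Hypothetical C ports of the ScaLAPACK tool functions numroc and indxg2p
   (illustration only; not MKL entry points). Global indices are 1-based,
   as in the formulas above. */
static long numroc_sketch(long n, long nb, long iproc, long isrcproc, long nprocs)
{
    long mydist = (nprocs + iproc - isrcproc) % nprocs;
    long nblocks = n / nb;
    long count = (nblocks / nprocs) * nb;
    long extrablks = nblocks % nprocs;
    if (mydist < extrablks)
        count += nb;
    else if (mydist == extrablks)
        count += n % nb;
    return count;
}

/* Process coordinate owning global index indxglob (iproc is unused,
   as in ScaLAPACK's indxg2p). */
static long indxg2p_sketch(long indxglob, long nb, long iproc,
                           long isrcproc, long nprocs)
{
    (void)iproc;
    return (isrcproc + (indxglob - 1) / nb) % nprocs;
}

/* Minimum lwork for p?gebrd on process (myrow, mycol):
   nb*(mpa0 + nqa0 + 1) + nqa0, with nb = mb_a = nb_a. */
static long gebrd_lwork_min(long m, long n, long ia, long ja, long nb,
                            long rsrc_a, long csrc_a, long myrow, long mycol,
                            long nprow, long npcol)
{
    long iroffa = (ia - 1) % nb;
    long icoffa = (ja - 1) % nb;
    long iarow = indxg2p_sketch(ia, nb, myrow, rsrc_a, nprow);
    long iacol = indxg2p_sketch(ja, nb, mycol, csrc_a, npcol);
    long mpa0 = numroc_sketch(m + iroffa, nb, myrow, iarow, nprow);
    long nqa0 = numroc_sketch(n + icoffa, nb, mycol, iacol, npcol);
    return nb * (mpa0 + nqa0 + 1) + nqa0;
}
```

For an 8-by-8 matrix with nb = 2 starting at ia = ja = 1 on a 2 x 2 grid, process (0, 0) gets mpa0 = nqa0 = 4 and a minimum lwork of 2*(4 + 4 + 1) + 4 = 22.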

Output Parameters

a On exit, if m≥n, the diagonal and the first superdiagonal of sub(A) are
overwritten with the upper bidiagonal matrix B; the elements below the
diagonal, with the array tauq, represent the orthogonal/unitary matrix Q as
a product of elementary reflectors, and the elements above the first
superdiagonal, with the array taup, represent the orthogonal/unitary matrix P as a
product of elementary reflectors. If m < n, the diagonal and the first
subdiagonal are overwritten with the lower bidiagonal matrix B; the
elements below the first subdiagonal, with the array tauq, represent the
orthogonal/unitary matrix Q as a product of elementary reflectors, and the
elements above the diagonal, with the array taup, represent the orthogonal/unitary
matrix P as a product of elementary reflectors. See Application Notes below.

d (local)
Array of size LOCc(ja+min(m,n)-1) if m≥n and LOCr(ia+min(m,n)-1)
otherwise. The distributed diagonal elements of the bidiagonal matrix B:
d[i] = A(i+1,i+1), 0 ≤ i < size (d).
d is tied to the distributed matrix A.

e (local)
Array of size LOCr(ia+min(m,n)-1) if m≥n; LOCc(ja+min(m,n)-2)
otherwise. The distributed off-diagonal elements of the bidiagonal
distributed matrix B:
If m≥n, e[i] = A(i+1,i+2) for i = 0,1,..., n-2; if m < n, e[i] = A(i+2,i+1)
for i = 0,1,..., m-2. e is tied to the distributed matrix A.

tauq, taup (local)

Arrays of size LOCc(ja+min(m,n)-1) for tauq and LOCr(ia+min(m,n)-1)
for taup. Contain the scalar factors of the elementary
reflectors that represent the orthogonal/unitary matrices Q and P,
respectively. tauq and taup are tied to the distributed matrix A. See
Application Notes below.


work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.

Application Notes
The matrices Q and P are represented as products of elementary reflectors:
If m≥n,

Q = H(1)H(2)...H(n), and P = G(1)G(2)...G(n-1).

Each H(i) and G(i) has the form:
H(i) = I - tauq*v*v' and G(i) = I - taup*u*u',
where I is the identity matrix, tauq and taup are real/complex scalars, and v and u are real/complex vectors;
v(1:i-1) = 0, v(i) = 1, and v(i+1:m) is stored on exit in A(ia+i:ia+m-1,ja+i-1);

u(1:i) = 0, u(i+1) = 1, and u(i+2:n) is stored on exit in A (ia+i-1,ja+i+1:ja+n-1);

tauq is stored in tauq[ja+i-2] and taup in taup[ia+i-2].

If m < n,
Q = H(1)*H(2)*...*H(m-1), and P = G(1)*G(2)*...*G(m).
Each H(i) and G(i) has the form:
H(i) = I - tauq*v*v' and G(i) = I - taup*u*u',
where I is the identity matrix, tauq and taup are real/complex scalars, and v and u are real/complex vectors;
v(1:i) = 0, v(i+1) = 1, and v(i+2:m) is stored on exit in A(ia+i+1:ia+m-1,ja+i-1); u(1:i-1) = 0, u(i) = 1, and
u(i+1:n) is stored on exit in A(ia+i-1,ja+i:ja+n-1);

tauq is stored in tauq[ja+i-2] and taup in taup[ia+i-2].

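The contents of sub(A) on exit are illustrated by the following examples:

m = 6 and n = 5 (m > n):

(  d   e   u1  u1  u1 )
(  v1  d   e   u2  u2 )
(  v1  v2  d   e   u3 )
(  v1  v2  v3  d   e  )
(  v1  v2  v3  v4  d  )
(  v1  v2  v3  v4  v5 )

m = 5 and n = 6 (m < n):

(  d   u1  u1  u1  u1  u1 )
(  e   d   u2  u2  u2  u2 )
(  v1  e   d   u3  u3  u3 )
(  v1  v2  e   d   u4  u4 )
(  v1  v2  v3  e   d   u5 )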


where d and e denote diagonal and off-diagonal elements of B, vi denotes an element of the vector defining
H(i), and ui an element of the vector defining G(i).

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?ormbr
Multiplies a general matrix by one of the orthogonal
matrices from a reduction to bidiagonal form
determined by p?gebrd.

Syntax
void psormbr (char *vect , char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT
*k , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *tau , float *c ,
MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , float *work , MKL_INT *lwork , MKL_INT
*info );
void pdormbr (char *vect , char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT
*k , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *tau , double *c ,
MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , double *work , MKL_INT *lwork , MKL_INT
*info );

Include Files
• mkl_scalapack.h

Description
If vect = 'Q', the p?ormbr function overwrites the general real distributed m-by-n matrix sub(C) =
C(ic:ic+m-1,jc:jc+n-1) with

                 side = 'L'       side = 'R'
trans = 'N':     Q*sub(C)         sub(C)*Q
trans = 'T':     Q^T*sub(C)       sub(C)*Q^T

If vect = 'P', the function overwrites sub(C) with

                 side = 'L'       side = 'R'
trans = 'N':     P*sub(C)         sub(C)*P
trans = 'T':     P^T*sub(C)       sub(C)*P^T

Here Q and P^T are the orthogonal distributed matrices determined by p?gebrd when reducing a real
distributed matrix A(ia:*, ja:*) to bidiagonal form: A(ia:*, ja:*) = Q*B*P^T. Q and P^T are defined as
products of elementary reflectors H(i) and G(i), respectively.


Let nq = m if side = 'L' and nq = n if side = 'R'. Thus nq is the order of the orthogonal matrix Q or
P^T that is applied.
If vect = 'Q', A(ia:*, ja:*) is assumed to have been an nq-by-k matrix:

If nq ≥ k, Q = H(1) H(2)...H(k);

If nq < k, Q = H(1) H(2)...H(nq-1).

If vect = 'P', A(ia:*, ja:*) is assumed to have been a k-by-nq matrix:

If k < nq, P = G(1) G(2)...G(k);

If k ≥ nq, P = G(1) G(2)...G(nq-1).
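The case analysis above determines how many reflectors make up the applied matrix. A hypothetical C helper (not an MKL function) expresses it directly; nq is m when side = 'L' and n when side = 'R'.

```c
/* Hypothetical helper (not an MKL function): number of elementary
   reflectors in the matrix applied by p?ormbr/p?unmbr.
   vect = 'Q': Q = H(1)...H(k) if nq >= k, else H(1)...H(nq-1).
   vect = 'P': P = G(1)...G(k) if k < nq,  else G(1)...G(nq-1). */
static long mbr_num_reflectors(char vect, long nq, long k)
{
    if (vect == 'Q' || vect == 'q')
        return nq >= k ? k : nq - 1;
    return k < nq ? k : nq - 1;
}
```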

Input Parameters

vect (global)
If vect ='Q', then Q or Q^T is applied.

If vect ='P', then P or P^T is applied.

side (global)
If side ='L', then Q or Q^T, P or P^T is applied from the left.

If side ='R', then Q or Q^T, P or P^T is applied from the right.

trans (global)
If trans = 'N', no transpose, Q or P is applied.

If trans = 'T', then Q^T or P^T is applied.

m (global) The number of rows in the distributed matrix sub (C).

n (global) The number of columns in the distributed matrix sub (C).

k (global)
If vect = 'Q', the number of columns in the original distributed matrix
reduced by p?gebrd;

If vect = 'P', the number of rows in the original distributed matrix
reduced by p?gebrd.

Constraints: k≥ 0.

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+min(nq,k)-1)
if vect = 'Q', and lld_a*LOCc(ja+nq-1) if vect = 'P'.

nq = m if side = 'L', and nq = n otherwise.

The vectors that define the elementary reflectors H(i) and G(i), whose
products determine the matrices Q and P, as returned by p?gebrd.

If vect = 'Q', lld_a≥max(1, LOCr(ia+nq-1));

If vect = 'P', lld_a≥max(1, LOCr(ia+min(nq, k)-1)).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCc(ja+min(nq, k)-1) if vect = 'Q', and LOCr(ia+min(nq, k)-1)
if vect = 'P'.
tau[i] must contain the scalar factor of the elementary reflector H(i+1) or
G(i+1), which determines Q or P, as returned by p?gebrd in its array argument
tauq or taup. tau is tied to the distributed matrix A.

c (local)
Pointer into the local memory to an array of size lld_c*LOCc(jc+n-1).

Contains the local pieces of the distributed matrix sub (C).

ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the submatrix C, respectively.

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local)
Workspace array of size lwork.

lwork (local or global) size of work, must be at least:

If side = 'L'

nq = m;
if ((vect = 'Q' and nq≥k) or (vect is not equal to 'Q' and nq>k)),
iaa=ia; jaa=ja; mi=m; ni=n; icc=ic; jcc=jc;
else
iaa= ia+1; jaa=ja; mi=m-1; ni=n; icc=ic+1; jcc= jc;
end if
else
If side = 'R', nq = n;

if ((vect = 'Q' and nq≥k) or (vect is not equal to 'Q' and nq>k)),
iaa=ia; jaa=ja; mi=m; ni=n; icc=ic; jcc=jc;

else
iaa= ia; jaa= ja+1; mi= m; ni= n-1; icc= ic; jcc= jc+1;

end if
end if
If vect = 'Q',

If side = 'L', lwork ≥ max((nb_a*(nb_a-1))/2, (nqc0 + mpc0)*nb_a) +
nb_a*nb_a


else if side = 'R',

lwork≥max((nb_a*(nb_a-1))/2, (nqc0 + max(npa0 +

numroc(numroc(ni+icoffc, nb_a, 0, 0, NPCOL), nb_a, 0, 0,
lcmq), mpc0))*nb_a) + nb_a*nb_a
end if
else if vect is not equal to 'Q', if side = 'L',

lwork≥max((mb_a*(mb_a-1))/2, (mpc0 + max(mqa0 +

numroc(numroc(mi+iroffc, mb_a, 0, 0, NPROW), mb_a, 0, 0,
lcmp), nqc0))*mb_a) + mb_a*mb_a
else if side = 'R',

lwork ≥ max((mb_a*(mb_a-1))/2, (mpc0 + nqc0)*mb_a) + mb_a*mb_a

end if
end if
where lcmp = lcm/NPROW, lcmq = lcm/NPCOL, with lcm =
ilcm(NPROW, NPCOL),
iroffa = mod(iaa-1, mb_a),
icoffa = mod(jaa-1, nb_a),
iarow = indxg2p(iaa, mb_a, MYROW, rsrc_a, NPROW),
iacol = indxg2p(jaa, nb_a, MYCOL, csrc_a, NPCOL),
mqa0 = numroc(mi+icoffa, nb_a, MYCOL, iacol, NPCOL),
npa0 = numroc(ni+iroffa, mb_a, MYROW, iarow, NPROW),
iroffc = mod(icc-1, mb_c),
icoffc = mod(jcc-1, nb_c),
icrow = indxg2p(icc, mb_c, MYROW, rsrc_c, NPROW),
iccol = indxg2p(jcc, nb_c, MYCOL, csrc_c, NPCOL),
mpc0 = numroc(mi+iroffc, mb_c, MYROW, icrow, NPROW),
nqc0 = numroc(ni+icoffc, nb_c, MYCOL, iccol, NPCOL),

NOTE
mod(x,y) is the integer remainder of x/y.

indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.

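If lwork = -1, then lwork is global input and a workspace query is assumed; the function only
calculates the minimum and optimal size of work, returning the value in work[0].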
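As one worked branch of the lwork formula above, the C sketch below evaluates the vect = 'Q', side = 'L' case, including the mi/icc adjustment when nq < k. The numroc and indxg2p tool functions are ported under hypothetical names, and only this branch is covered; the other branches additionally need lcm, lcmp, lcmq and the A-side quantities mqa0 and npa0.

```c
/* Hypothetical C ports of the ScaLAPACK tool functions (illustration only). */
static long numroc_sketch(long n, long nb, long iproc, long isrcproc, long nprocs)
{
    long mydist = (nprocs + iproc - isrcproc) % nprocs;
    long nblocks = n / nb;
    long count = (nblocks / nprocs) * nb;
    long extrablks = nblocks % nprocs;
    if (mydist < extrablks)
        count += nb;
    else if (mydist == extrablks)
        count += n % nb;
    return count;
}

static long indxg2p_sketch(long indxglob, long nb, long iproc,
                           long isrcproc, long nprocs)
{
    (void)iproc;  /* unused, as in ScaLAPACK */
    return (isrcproc + (indxglob - 1) / nb) % nprocs;
}

static long max_l(long a, long b) { return a > b ? a : b; }

/* Minimum lwork for p?ormbr/p?unmbr, branch vect = 'Q', side = 'L':
   max((nb_a*(nb_a-1))/2, (nqc0 + mpc0)*nb_a) + nb_a*nb_a. */
static long ormbr_lwork_min_QL(long m, long n, long k, long ic, long jc,
                               long nb_a, long mb_c, long nb_c,
                               long rsrc_c, long csrc_c,
                               long myrow, long mycol, long nprow, long npcol)
{
    long nq = m;                       /* side = 'L' */
    long mi, icc, ni = n, jcc = jc;
    if (nq >= k) {                     /* vect = 'Q' and nq >= k */
        mi = m;
        icc = ic;
    } else {                           /* skip the first row of sub(C) */
        mi = m - 1;
        icc = ic + 1;
    }
    long iroffc = (icc - 1) % mb_c;
    long icoffc = (jcc - 1) % nb_c;
    long icrow = indxg2p_sketch(icc, mb_c, myrow, rsrc_c, nprow);
    long iccol = indxg2p_sketch(jcc, nb_c, mycol, csrc_c, npcol);
    long mpc0 = numroc_sketch(mi + iroffc, mb_c, myrow, icrow, nprow);
    long nqc0 = numroc_sketch(ni + icoffc, nb_c, mycol, iccol, npcol);
    return max_l((nb_a * (nb_a - 1)) / 2, (nqc0 + mpc0) * nb_a) + nb_a * nb_a;
}
```

For m = n = k = 8 with 2-by-2 blocks for A and C on a 2 x 2 grid, process (0, 0) has mpc0 = nqc0 = 4, giving max(1, 16) + 4 = 20.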

Output Parameters

c On exit, if vect='Q', sub(C) is overwritten by Q*sub(C), or Q^T*sub(C), or
sub(C)*Q^T, or sub(C)*Q; if vect='P', sub(C) is overwritten by P*sub(C),
or P^T*sub(C), or sub(C)*P, or sub(C)*P^T.

work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?unmbr
Multiplies a general matrix by one of the unitary
transformation matrices from a reduction to bidiagonal
form determined by p?gebrd.

Syntax
void pcunmbr (char *vect , char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT
*k , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau ,
MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex8 *work ,
MKL_INT *lwork , MKL_INT *info );
void pzunmbr (char *vect , char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT
*k , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16
*tau , MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex16
*work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
If vect = 'Q', the p?unmbr function overwrites the general complex distributed m-by-n matrix sub(C) =
C(ic:ic+m-1,jc:jc+n-1) with

                 side = 'L'       side = 'R'
trans = 'N':     Q*sub(C)         sub(C)*Q
trans = 'C':     Q^H*sub(C)       sub(C)*Q^H

If vect = 'P', the function overwrites sub(C) with

                 side = 'L'       side = 'R'
trans = 'N':     P*sub(C)         sub(C)*P
trans = 'C':     P^H*sub(C)       sub(C)*P^H

Here Q and P^H are the unitary distributed matrices determined by p?gebrd when reducing a complex
distributed matrix A(ia:*, ja:*) to bidiagonal form: A(ia:*, ja:*) = Q*B*P^H.


Q and P^H are defined as products of elementary reflectors H(i) and G(i), respectively.
Let nq = m if side = 'L' and nq = n if side = 'R'. Thus nq is the order of the unitary matrix Q or P^H
that is applied.
If vect = 'Q', A(ia:*, ja:*) is assumed to have been an nq-by-k matrix:

If nq ≥ k, Q = H(1) H(2)... H(k);

If nq < k, Q = H(1) H(2)... H(nq-1).

If vect = 'P', A(ia:*, ja:*) is assumed to have been a k-by-nq matrix:

If k < nq, P = G(1) G(2)... G(k);

If k ≥ nq, P = G(1) G(2)... G(nq-1).

Input Parameters

vect (global)
If vect ='Q', then Q or Q^H is applied.

If vect ='P', then P or P^H is applied.

side (global)
If side ='L', then Q or Q^H, P or P^H is applied from the left.

If side ='R', then Q or Q^H, P or P^H is applied from the right.

trans (global)
If trans = 'N', no transpose, Q or P is applied.

If trans = 'C', conjugate transpose, Q^H or P^H is applied.

m (global) The number of rows in the distributed matrix sub(C) (m≥0).

n (global) The number of columns in the distributed matrix sub(C) (n≥0).

k (global)
If vect = 'Q', the number of columns in the original distributed matrix
reduced by p?gebrd;

If vect = 'P', the number of rows in the original distributed matrix
reduced by p?gebrd.

Constraints: k≥ 0.

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+min(nq,k)-1)
if vect = 'Q', and lld_a*LOCc(ja+nq-1) if vect = 'P'.

nq = m if side = 'L', and nq = n otherwise.

The vectors that define the elementary reflectors H(i) and G(i), whose
products determine the matrices Q and P, as returned by p?gebrd.

If vect = 'Q', lld_a ≥ max(1, LOCr(ia+nq-1));

If vect = 'P', lld_a ≥ max(1, LOCr(ia+min(nq, k)-1)).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCc(ja+min(nq, k)-1) if vect = 'Q', and LOCr(ia+min(nq, k)-1)
if vect = 'P'.
tau[i] must contain the scalar factor of the elementary reflector H(i+1) or
G (i+1), which determines Q or P, as returned by p?gebrd in its array
argument tauq or taup. tau is tied to the distributed matrix A.

c (local)
Pointer into the local memory to an array of size lld_c*LOCc(jc+n-1).

Contains the local pieces of the distributed matrix sub (C).

ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the submatrix C, respectively.

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local)
Workspace array of size lwork.

lwork (local or global) size of work, must be at least:

If side = 'L'

nq = m;
if ((vect = 'Q' and nq ≥ k) or (vect is not equal to 'Q' and
nq>k)), iaa= ia; jaa= ja; mi= m; ni= n; icc= ic; jcc= jc;
else
iaa= ia+1; jaa= ja; mi= m-1; ni= n; icc= ic+1; jcc= jc;

end if
else
If side = 'R', nq = n;

if ((vect = 'Q' and nq ≥ k) or (vect is not equal to 'Q' and nq > k)),
iaa= ia; jaa= ja; mi= m; ni= n; icc= ic; jcc= jc;

else
iaa= ia; jaa= ja+1; mi= m; ni= n-1; icc= ic; jcc= jc+1;

end if
end if
If vect = 'Q',

If side = 'L', lwork ≥ max((nb_a*(nb_a-1))/2,

(nqc0+mpc0)*nb_a) + nb_a*nb_a
else if side = 'R',


lwork ≥ max((nb_a*(nb_a-1))/2, (nqc0 +

max(npa0+numroc(numroc(ni+icoffc, nb_a, 0, 0, NPCOL), nb_a,
0, 0, lcmq), mpc0))*nb_a) + nb_a*nb_a
end if
else if vect is not equal to 'Q',

if side = 'L',

lwork ≥ max((mb_a*(mb_a-1))/2, (mpc0 +

max(mqa0+numroc(numroc(mi+iroffc, mb_a, 0, 0, NPROW), mb_a,
0, 0, lcmp), nqc0))*mb_a) + mb_a*mb_a
else if side = 'R',

lwork ≥ max((mb_a*(mb_a-1))/2, (mpc0 + nqc0)*mb_a) +
mb_a*mb_a
end if
end if
where lcmp = lcm/NPROW, lcmq = lcm/NPCOL, with lcm =
ilcm(NPROW, NPCOL),
iroffa = mod(iaa-1, mb_a),
icoffa = mod(jaa-1, nb_a),
iarow = indxg2p(iaa, mb_a, MYROW, rsrc_a, NPROW),
iacol = indxg2p(jaa, nb_a, MYCOL, csrc_a, NPCOL),
mqa0 = numroc(mi+icoffa, nb_a, MYCOL, iacol, NPCOL),

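npa0 = numroc(ni+iroffa, mb_a, MYROW, iarow, NPROW),
iroffc = mod(icc-1, mb_c),
icoffc = mod(jcc-1, nb_c),
icrow = indxg2p(icc, mb_c, MYROW, rsrc_c, NPROW),
iccol = indxg2p(jcc, nb_c, MYCOL, csrc_c, NPCOL),
mpc0 = numroc(mi+iroffc, mb_c, MYROW, icrow, NPROW),
nqc0 = numroc(ni+icoffc, nb_c, MYCOL, iccol, NPCOL),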

NOTE
mod(x,y) is the integer remainder of x/y.

indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.

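If lwork = -1, then lwork is global input and a workspace query is assumed; the function only
calculates the minimum and optimal size of work, returning the value in work[0].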

Output Parameters

c On exit, if vect='Q', sub(C) is overwritten by Q*sub(C), or Q^H*sub(C), or
sub(C)*Q^H, or sub(C)*Q; if vect='P', sub(C) is overwritten by P*sub(C), or
P^H*sub(C), or sub(C)*P, or sub(C)*P^H.

work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

Generalized Symmetric-Definite Eigenvalue Problems: ScaLAPACK Computational Routines

This section describes ScaLAPACK routines that allow you to reduce the generalized symmetric-definite
eigenvalue problems (see LAPACK "Generalized Symmetric-Definite Eigenvalue Problems") to the standard
symmetric eigenvalue problem Cy = λy, which you can solve by calling ScaLAPACK routines (see Symmetric
Eigenproblems).
Table "Computational Routines for Reducing Generalized Eigenproblems to Standard Problems" lists these
routines.
Computational Routines for Reducing Generalized Eigenproblems to Standard Problems
Operation                      Real symmetric matrices    Complex Hermitian matrices
Reduce to standard problems    p?sygst                    p?hegst

p?sygst
Reduces a real symmetric-definite generalized
eigenvalue problem to the standard form.

Syntax
void pssygst (MKL_INT *ibtype , char *uplo , MKL_INT *n , float *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb ,
float *scale , MKL_INT *info );
void pdsygst (MKL_INT *ibtype , char *uplo , MKL_INT *n , double *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb ,
double *scale , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?sygst function reduces real symmetric-definite generalized eigenproblems to the standard form.

In the following sub(A) denotes A(ia:ia+n-1, ja:ja+n-1) and sub(B) denotes B(ib:ib+n-1, jb:jb+n-1).

If ibtype = 1, the problem is

sub(A)*x = λ*sub(B)*x,


and sub(A) is overwritten by inv(U^T)*sub(A)*inv(U), or inv(L)*sub(A)*inv(L^T).

If ibtype = 2 or 3, the problem is

sub(A)*sub(B)*x = λ*x, or sub(B)*sub(A)*x = λ*x,

and sub(A) is overwritten by U*sub(A)*U^T, or L^T*sub(A)*L.
sub(B) must have been previously factorized as U^T*U or L*L^T by p?potrf.

Input Parameters

ibtype (global) Must be 1 or 2 or 3.

If ibtype = 1, compute inv(U^T)*sub(A)*inv(U), or inv(L)*sub(A)*inv(L^T);

If ibtype = 2 or 3, compute U*sub(A)*U^T, or L^T*sub(A)*L.

uplo (global) Must be 'U' or 'L'.

If uplo = 'U', the upper triangle of sub(A) is stored and sub(B) is
factored as U^T*U.
If uplo = 'L', the lower triangle of sub(A) is stored and sub(B) is
factored as L*L^T.

n (global) The order of the matrices sub (A) and sub (B) (n≥ 0).

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1). On
entry, the array contains the local pieces of the n-by-n symmetric
distributed matrix sub(A).
If uplo = 'U', the leading n-by-n upper triangular part of sub(A) contains
the upper triangular part of the matrix, and its strictly lower triangular part
is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of sub(A) contains
the lower triangular part of the matrix, and its strictly upper triangular part
is not referenced.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

b (local)
Pointer into the local memory to an array of size lld_b*LOCc(jb+n-1). On
entry, the array contains the local pieces of the triangular factor from the
Cholesky factorization of sub (B) as returned by p?potrf.

ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of the submatrix B, respectively.

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.

Output Parameters

a On exit, if info = 0, the transformed matrix, stored in the same format as
sub(A).

scale (global)
Amount by which the eigenvalues should be scaled to compensate for the
scaling performed in this function. At present, scale is always returned as
1.0; it is returned here to allow for future enhancement.

info (global)
If info = 0, the execution is successful. If info < 0: if the i-th argument
is an array and its j-th entry (index j - 1) had an illegal value, then info
= -(i*100+j); if the i-th argument is a scalar and had an illegal value, then
info = -i.
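The negative info encoding recurs throughout the ScaLAPACK routines in this chapter. The hypothetical helper below (not part of oneMKL) splits such a value back into the argument position i and, for array arguments, the entry j. It assumes j < 100, which holds for descriptor entries because dlen_ is at most 9, and it uses int where the real interface uses MKL_INT.

```c
#include <assert.h>

/* Result of decoding a negative info value. */
typedef struct {
    int arg;    /* i: 1-based position of the offending argument       */
    int entry;  /* j: 1-based offending entry, or 0 for a scalar argument */
} bad_arg_t;

/* Hypothetical helper: inverts info = -(i*100 + j) for array arguments
   and info = -i for scalar arguments (assumes j < 100 and i < 100). */
static bad_arg_t decode_negative_info(int info)
{
    bad_arg_t r = {0, 0};
    assert(info < 0);
    int code = -info;
    if (code >= 100) {          /* array argument: -(i*100 + j) */
        r.arg = code / 100;
        r.entry = code % 100;
    } else {                    /* scalar argument: -i */
        r.arg = code;
    }
    return r;
}
```

In the p?hegst argument list later in this section, desca is the seventh argument, so info = -702 would point at its second entry (index 1, ctxt_).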

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?hegst
Reduces a Hermitian positive-definite generalized
eigenvalue problem to the standard form.

Syntax
void pchegst (MKL_INT *ibtype , char *uplo , MKL_INT *n , MKL_Complex8 *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *jb ,
MKL_INT *descb , float *scale , MKL_INT *info );
void pzhegst (MKL_INT *ibtype , char *uplo , MKL_INT *n , MKL_Complex16 *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *jb ,
MKL_INT *descb , double *scale , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?hegst function reduces complex Hermitian positive-definite generalized eigenproblems to the
standard form.
In the following, sub(A) denotes A(ia:ia+n-1, ja:ja+n-1) and sub(B) denotes B(ib:ib+n-1, jb:jb+n-1).

If ibtype = 1, the problem is

sub(A)*x = λ*sub(B)*x,
and sub(A) is overwritten by inv(U^H)*sub(A)*inv(U) or inv(L)*sub(A)*inv(L^H).
If ibtype = 2 or 3, the problem is

sub(A)*sub(B)*x = λ*x or sub(B)*sub(A)*x = λ*x,

and sub(A) is overwritten by U*sub(A)*U^H or L^H*sub(A)*L.
sub(B) must have been previously factorized as U^H*U or L*L^H by p?potrf.
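For ibtype = 1 and uplo = 'L', the transformation amounts to two triangular solves: Y = inv(L)*A, then C = Y*inv(L^H), computed here as C = (inv(L)*Y^H)^H. The serial C99 sketch below illustrates only this algebra on a small dense matrix. It is not the distributed p?hegst algorithm, and the fixed order N, the row-major storage, and the function names are assumptions made for illustration.

```c
#include <assert.h>
#include <complex.h>

#define N 2  /* hypothetical fixed matrix order for the sketch */

/* Overwrites R with X = inv(L)*R for lower-triangular L
   (forward substitution, one column at a time). */
static void forward_subst(double complex L[N][N], double complex R[N][N])
{
    for (int col = 0; col < N; ++col)
        for (int i = 0; i < N; ++i) {
            double complex s = R[i][col];
            for (int k = 0; k < i; ++k)
                s -= L[i][k] * R[k][col];
            R[i][col] = s / L[i][i];
        }
}

/* Serial reference for ibtype = 1, uplo = 'L':
   A is overwritten by C = inv(L) * A * inv(L^H). */
static void hegst_ibtype1_lower(double complex L[N][N], double complex A[N][N])
{
    double complex T[N][N];
    forward_subst(L, A);              /* A <- Y = inv(L)*A           */
    for (int i = 0; i < N; ++i)       /* T <- Y^H                    */
        for (int j = 0; j < N; ++j)
            T[i][j] = conj(A[j][i]);
    forward_subst(L, T);              /* T <- inv(L)*Y^H = C^H       */
    for (int i = 0; i < N; ++i)       /* A <- (C^H)^H = C            */
        for (int j = 0; j < N; ++j)
            A[i][j] = conj(T[j][i]);
}
```

With L = [[2, 0], [1, 1]] (so B = L*L^H = [[4, 2], [2, 2]]) and A = [[1, i], [-i, 3]], the result is C = [[0.25, -0.25+0.5i], [-0.25-0.5i, 3.25]], which is Hermitian again; its eigenvectors y relate to those of the original problem by y = L^H*x.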

Input Parameters

ibtype (global) Must be 1, 2, or 3.

If ibtype = 1, compute inv(U^H)*sub(A)*inv(U) or inv(L)*sub(A)*inv(L^H);

If ibtype = 2 or 3, compute U*sub(A)*U^H or L^H*sub(A)*L.

uplo (global) Must be 'U' or 'L'.

If uplo = 'U', the upper triangle of sub(A) is stored and sub(B) is
factored as U^H*U.
If uplo = 'L', the lower triangle of sub(A) is stored and sub(B) is
factored as L*L^H.

n (global) The order of the matrices sub(A) and sub(B) (n ≥ 0).

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1). On
entry, the array contains the local pieces of the n-by-n Hermitian distributed
matrix sub(A). If uplo = 'U', the leading n-by-n upper triangular part of
sub(A) contains the upper triangular part of the matrix, and its strictly
lower triangular part is not referenced. If uplo = 'L', the leading n-by-n
lower triangular part of sub(A) contains the lower triangular part of the
matrix, and its strictly upper triangular part is not referenced.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

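desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

b (local)
Pointer into the local memory to an array of size lld_b*LOCc(jb+n-1). On
entry, the array contains the local pieces of the triangular factor from the
Cholesky factorization of sub(B) as returned by p?potrf.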

ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of the submatrix B, respectively.

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.

Output Parameters

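a On exit, if info = 0, the transformed matrix, stored in the same format as sub(A).

scale (global)
Amount by which the eigenvalues should be scaled to compensate for the
scaling performed in this function. At present, scale is always returned as
1.0; it is returned here to allow for future enhancement.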

info (global)
If info = 0, the execution is successful. If info < 0: if the i-th argument is
an array and its j-th entry (index j - 1) had an illegal value, then info =
-(i*100+j); if the i-th argument is a scalar and had an illegal value, then
info = -i.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

ScaLAPACK Driver Routines

Table "ScaLAPACK Driver Routines" lists ScaLAPACK driver routines available for solving systems of linear
equations, linear least-squares problems, standard eigenvalue and singular value problems, and generalized
symmetric definite eigenproblems.
ScaLAPACK Driver Routines
Type of Problem | Matrix type, storage scheme | Driver
Linear equations | general (partial pivoting) | p?gesv (simple driver) / p?gesvx (expert driver)
Linear equations | general band (partial pivoting) | p?gbsv (simple driver)
Linear equations | general band (no pivoting) | p?dbsv (simple driver)
Linear equations | general tridiagonal (no pivoting) | p?dtsv (simple driver)
Linear equations | symmetric/Hermitian positive-definite | p?posv (simple driver) / p?posvx (expert driver)
Linear equations | symmetric/Hermitian positive-definite, band | p?pbsv (simple driver)
Linear equations | symmetric/Hermitian positive-definite, tridiagonal | p?ptsv (simple driver)
Linear least squares problem | general m-by-n | p?gels
Non-symmetric eigenvalue problem | general | p?geevx (expert driver)
Symmetric eigenvalue problem | symmetric/Hermitian | p?syev / p?heev (simple driver); p?syevd / p?heevd (simple driver with a divide and conquer algorithm); p?syevx / p?heevx (expert driver); p?syevr / p?heevr (simple driver with MRRR algorithm)
Singular value decomposition | general m-by-n | p?gesvd
Generalized symmetric definite eigenvalue problem | symmetric/Hermitian, one matrix also positive-definite | p?sygvx / p?hegvx (expert driver)

p?geevx
Computes the eigenvalues and, optionally, the left
and/or right eigenvectors of an n-by-n real/complex
non-symmetric matrix A.

Syntax
void psgeevx (const char *balanc, const char *jobvl, const char *jobvr, const char
*sense, const MKL_INT *n, float *a, const MKL_INT *desca, float *wr, float *wi, float
*vl, const MKL_INT *descvl, float *vr, const MKL_INT *descvr, MKL_INT *ilo, MKL_INT
*ihi, float *scale, float *abnrm, float *rconde, float *rcondv, float *work, const
MKL_INT *lwork, MKL_INT *info);

void pdgeevx (const char *balanc, const char *jobvl, const char *jobvr, const char
*sense, const MKL_INT *n, double *a, const MKL_INT *desca, double *wr, double *wi,
double *vl, const MKL_INT *descvl, double *vr, const MKL_INT *descvr, MKL_INT *ilo,
MKL_INT *ihi, double *scale, double *abnrm, double *rconde, double *rcondv, double
*work, const MKL_INT *lwork, MKL_INT *info);
void pcgeevx (const char *balanc, const char *jobvl, const char *jobvr, const char
*sense, const MKL_INT *n, MKL_Complex8 *a, const MKL_INT *desca, MKL_Complex8 *w,
MKL_Complex8 *vl, const MKL_INT *descvl, MKL_Complex8 *vr, const MKL_INT *descvr,
MKL_INT *ilo, MKL_INT *ihi, float *scale, float *abnrm, float *rconde, float *rcondv,
MKL_Complex8 *work, const MKL_INT *lwork, MKL_INT *info);
void pzgeevx (const char *balanc, const char *jobvl, const char *jobvr, const char
*sense, const MKL_INT *n, MKL_Complex16 *a, const MKL_INT *desca, MKL_Complex16 *w,
MKL_Complex16 *vl, const MKL_INT *descvl, MKL_Complex16 *vr, const MKL_INT *descvr,
MKL_INT *ilo, MKL_INT *ihi, double *scale, double *abnrm, double *rconde, double
*rcondv, MKL_Complex16 *work, const MKL_INT *lwork, MKL_INT *info);

Include Files
• mkl_scalapack.h

Description
The p?geevx function computes for an n-by-n real/complex non-symmetric matrix A, the eigenvalues and,
optionally, the left and/or right eigenvectors.
Optionally also, it computes a balancing transformation to improve the conditioning of the eigenvalues and
eigenvectors (ilo, ihi, scale, and abnrm), reciprocal condition numbers for the eigenvalues (rconde).
The right eigenvector v of A satisfies

A*v = λ*v,
where λ is its eigenvalue.
The left eigenvector u of A satisfies

u^H*A = λ*u^H,
where u^H denotes the conjugate transpose of u. The computed eigenvectors are normalized to have
Euclidean norm equal to 1 and largest component real.
Balancing a matrix means permuting the rows and columns to make it more nearly upper triangular, and
applying a diagonal similarity transformation D*A*inv(D), where D is a diagonal matrix, to make its rows and
columns closer in norm and the condition number of its eigenvalues smaller. The computed reciprocal
condition numbers correspond to the balanced matrix. Permuting rows and columns will not change the
condition numbers in exact arithmetic, but diagonal scaling will.

NOTE
The current version doesn’t support computation of the reciprocal condition numbers for the
right eigenvectors.

Current Notes and Restrictions

All the p?geevx interfaces call p?lahqr for computing eigenvalues and eigenvectors of the Hessenberg
matrices. There are several restrictions for the usage of p?lahqr, which include:

• The current implementation of p?lahqr requires the distributed block size to be square and at least six
(6); unlike simpler codes like LU, this algorithm is extremely sensitive to block size.

• The current implementation of p?lahqr requires that the input matrix A and the left and/or right eigenvector
matrices VL and VR be distributed identically and have identical contexts.

Parameters

balanc (global). Must be 'N', 'P', 'S', or 'B'. Indicates how the input matrix should
be diagonally scaled and/or permuted to improve the conditioning of its
eigenvalues.
If balanc = 'N', do not diagonally scale or permute;

If balanc = 'P', perform permutations to make the matrix more nearly upper
triangular. Do not diagonally scale;
If balanc = 'S', diagonally scale the matrix, that is, replace A by
D*A*inv(D), where D is a diagonal matrix chosen to make the rows and
columns of A more equal in norm. Do not permute;
If balanc = 'B', both diagonally scale and permute A.

Computed reciprocal condition numbers will be for the matrix after
balancing and/or permuting. Permuting does not change condition numbers
(in exact arithmetic), but balancing does.

jobvl (global). Must be 'N' or 'V'.

If jobvl = 'N', left eigenvectors of A are not computed;

If jobvl = 'V', left eigenvectors of A are computed.

If sense = 'E', then jobvl must be 'V'.

jobvr (global). Must be 'N' or 'V'.

If jobvr = 'N', right eigenvectors of A are not computed;

If jobvr = 'V', right eigenvectors of A are computed.

If sense = 'E', then jobvr must be 'V'.

sense (global). Must be 'N' or 'E'. Determines which reciprocal condition numbers
are computed.
If sense = 'N', none are computed.

If sense = 'E', reciprocal condition numbers are computed for the eigenvalues only.

n (global) The order of the distributed matrix A (n≥0).

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(n). On entry,
this array contains the local pieces of the n-by-n general distributed matrix
A to be reduced.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

wr, wi (global output, psgeevx/pdgeevx only) Arrays, size at least max(1, n) each. Contain the real and
imaginary parts, respectively, of the computed eigenvalues. Complex
conjugate pairs of eigenvalues appear consecutively with the eigenvalue
having positive imaginary part first.

w (global output, pcgeevx/pzgeevx only) Array, size at least max(1, n). Contains the computed
eigenvalues.

vl (local output)
Pointer into the local memory to an array of size descvl[lld_ - 1]*LOCc(n).
If jobvl = 'N', vl is not referenced. If jobvl = 'V', the vl parameter contains
the local pieces of the left eigenvectors of the matrix A.

descvl (global and local input) array of size dlen_. The array descriptor for the
distributed matrix vl.

vr (local output)
Pointer into the local memory to an array of size descvr[lld_ - 1]*LOCc(n).
If jobvr = 'N', vr is not referenced. If jobvr = 'V', the vr parameter contains
the local pieces of the right eigenvectors of the matrix A.

descvr (global and local input) array of size dlen_. The array descriptor for the
distributed matrix vr.

ilo, ihi (global output)
ilo and ihi are integer values determined when A was balanced.
The balanced A(i,j) = 0 if i > j and j = 1,...,ilo-1 or i = ihi+1,...,n.

If balanc = 'N' or 'S', ilo = 1 and ihi = n.

scale (global output)
Array, size at least max(1, n). Details of the permutations and scaling
factors applied when balancing A.
If P[j - 1] is the index of the row and column interchanged with row and
column j, and D[j - 1] is the scaling factor applied to row and column j, then

scale[j - 1] = P[j - 1], for j = 1,...,ilo-1
             = D[j - 1], for j = ilo,...,ihi
             = P[j - 1], for j = ihi+1,...,n.

The order in which the interchanges are made is n to ihi+1, then 1 to ilo-1.

abnrm The one-norm of the balanced matrix (the maximum of the sum of absolute
values of elements of any column).
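The following sketch shows one way to read the scale and abnrm outputs. decode_balance and one_norm are hypothetical helpers; the code assumes 1-based j as in the text, a serial column-major matrix for the norm, and that the permutation indices stored in the floating-point scale array are exact integers.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: interprets scale[j - 1] (j is 1-based).
   Returns 1 and stores D[j - 1] in *d when ilo <= j <= ihi; otherwise
   returns 0 and stores P[j - 1], the 1-based index of the row/column
   interchanged with j, in *p. */
static int decode_balance(int j, int ilo, int ihi, const double *scale,
                          double *d, int *p)
{
    assert(j >= 1);
    if (j >= ilo && j <= ihi) {
        *d = scale[j - 1];           /* diagonal scaling factor */
        return 1;
    }
    *p = (int)scale[j - 1];          /* index stored as a floating value */
    return 0;
}

/* One-norm of an n-by-n column-major matrix with leading dimension lda:
   the maximum over columns of the sum of absolute values, which is the
   quantity abnrm reports for the balanced matrix. */
static double one_norm(int n, const double *a, int lda)
{
    double best = 0.0;
    for (int j = 0; j < n; ++j) {
        double sum = 0.0;
        for (int i = 0; i < n; ++i)
            sum += fabs(a[i + j * lda]);
        if (sum > best)
            best = sum;
    }
    return best;
}
```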

rconde Array, size at least max(1, n).

rconde[j - 1] is the reciprocal condition number of the j-th eigenvalue.

rcondv Not supported in the current version. It can be a null pointer.

work (local)
Workspace array of size lwork.

lwork (local or global) The size of the array work.

If lwork = -1, then lwork is global input and a workspace query is assumed;
the function only calculates the minimum size of the work array. This
value is returned in the first entry of the work array, work[0], and no error
message is issued by pxerbla.
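The lwork = -1 convention leads to a query-then-allocate pattern. Because calling p?geevx requires an MPI/BLACS setup, the sketch below uses a hypothetical stand-in, toy_routine, that follows the same convention. In real code both calls would go to p?geevx with identical arguments apart from work and lwork, and every process would pass lwork = -1 in the query, because lwork is then global input.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in (not a oneMKL routine) following the
   lwork = -1 convention; it needs 3*n + 5 workspace entries. */
static void toy_routine(int n, double *work, int lwork, int *info)
{
    assert(n >= 0);
    int need = 3 * n + 5;
    if (lwork == -1) {                 /* workspace query: report size only */
        work[0] = (double)need;
        *info = 0;
        return;
    }
    /* lwork is the 3rd argument, so a too-small value is reported as -3 */
    *info = (lwork < need) ? -3 : 0;
    /* ... the actual computation would use work[0 .. lwork-1] here ... */
}

/* Query, allocate, and call again; returns the lwork used, or -1. */
static int query_then_call(int n)
{
    double query = 0.0;
    int info = 0;
    toy_routine(n, &query, -1, &info);    /* 1st call: size query */
    if (info != 0)
        return -1;
    int lwork = (int)query;               /* work[0] is a floating value */
    if (lwork < query)
        ++lwork;                          /* round up defensively */
    double *work = malloc((size_t)lwork * sizeof *work);
    if (work == NULL)
        return -1;
    toy_routine(n, work, lwork, &info);   /* 2nd call: real work */
    free(work);
    return info == 0 ? lwork : -1;
}
```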

info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and had
an illegal value, then info = -i.

p?gesv
Computes the solution to the system of linear
equations with a square distributed matrix and
multiple right-hand sides.

Syntax
void psgesv (MKL_INT *n , MKL_INT *nrhs , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , MKL_INT *ipiv , float *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_INT
*info );
void pdgesv (MKL_INT *n , MKL_INT *nrhs , double *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *ipiv , double *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb , MKL_INT *info );
void pcgesv (MKL_INT *n , MKL_INT *nrhs , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *ipiv , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb , MKL_INT *info );
void pzgesv (MKL_INT *n , MKL_INT *nrhs , MKL_Complex16 *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , MKL_INT *ipiv , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *jb ,
MKL_INT *descb , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?gesv function computes the solution to a real or complex system of linear equations sub(A)*X =
sub(B), where sub(A) = A(ia:ia+n-1, ja:ja+n-1) is an n-by-n distributed matrix and X and sub(B) =
B(ib:ib+n-1, jb:jb+nrhs-1) are n-by-nrhs distributed matrices.
The LU decomposition with partial pivoting and row interchanges is used to factor sub(A) as sub(A) =
P*L*U, where P is a permutation matrix, L is unit lower triangular, and U is upper triangular. L and U are
stored in sub(A). The factored form of sub(A) is then used to solve the system of equations sub(A)*X =
sub(B).
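As a serial point of reference for the description above, the following sketch factors a small column-major matrix as P*L*U with partial pivoting and then solves with the factors. It is a plain Gaussian-elimination illustration, not the blocked, distributed p?gesv algorithm; dense_gesv is a hypothetical name, ipiv uses 1-based row indices, and the return value mimics the meaning of info > 0.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Serial sketch: factors the n-by-n column-major matrix a (lda = n) as
   P*L*U in place and overwrites b (n-by-nrhs) with the solution X.
   Returns 0, or k > 0 if U(k,k) is exactly zero (no solve is done). */
static int dense_gesv(int n, int nrhs, double *a, int *ipiv, double *b)
{
#define A(i, j) a[(i) + (size_t)(j) * n]
#define B(i, j) b[(i) + (size_t)(j) * n]
    assert(n >= 0 && nrhs >= 0);
    int info = 0;
    for (int k = 0; k < n; ++k) {
        int p = k;                          /* row with largest |a(i,k)| */
        for (int i = k + 1; i < n; ++i)
            if (fabs(A(i, k)) > fabs(A(p, k)))
                p = i;
        ipiv[k] = p + 1;
        if (A(p, k) == 0.0) {               /* exactly singular column */
            if (info == 0) info = k + 1;
            continue;
        }
        if (p != k) {                       /* swap rows k and p of A and B */
            for (int j = 0; j < n; ++j)    { double t = A(k, j); A(k, j) = A(p, j); A(p, j) = t; }
            for (int j = 0; j < nrhs; ++j) { double t = B(k, j); B(k, j) = B(p, j); B(p, j) = t; }
        }
        for (int i = k + 1; i < n; ++i) {   /* eliminate below the pivot */
            A(i, k) /= A(k, k);             /* multiplier stored in L */
            for (int j = k + 1; j < n; ++j)
                A(i, j) -= A(i, k) * A(k, j);
        }
    }
    if (info == 0) {
        for (int j = 0; j < nrhs; ++j) {
            for (int i = 0; i < n; ++i)     /* L*y = P^T*b, unit diagonal */
                for (int k = 0; k < i; ++k)
                    B(i, j) -= A(i, k) * B(k, j);
            for (int i = n - 1; i >= 0; --i) {  /* U*x = y */
                for (int k = i + 1; k < n; ++k)
                    B(i, j) -= A(i, k) * B(k, j);
                B(i, j) /= A(i, i);
            }
        }
    }
    return info;
#undef A
#undef B
}
```

For A = [[0, 1], [2, 3]] the zero in the (1,1) position forces a row interchange, so ipiv[0] = 2, and the right-hand side (1, 5) gives X = (1, 1).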

Input Parameters

n (global) The number of rows and columns to be operated on, that is, the
order of the distributed submatrix sub(A) (n≥ 0).

nrhs (global) The number of right hand sides, that is, the number of columns of
the distributed submatrices B and X (nrhs ≥ 0).

a, b (local)
Pointers into the local memory to arrays of local size a: lld_a*LOCc(ja+n-1)
and b: lld_b*LOCc(jb+nrhs-1), respectively.
On entry, the array a contains the local pieces of the n-by-n distributed
matrix sub(A) to be factored.
On entry, the array b contains the right hand side distributed matrix sub(B).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of sub(B), respectively.

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.

Output Parameters

a Overwritten by the factors L and U from the factorization sub(A) = P*L*U;
the unit diagonal elements of L are not stored.

b Overwritten by the solution distributed matrix X.

ipiv (local) Array of size LOCr(m_a)+mb_a. This array contains the pivoting
information. The (local) row i of the matrix was interchanged with the
(global) row ipiv[i - 1].

This array is tied to the distributed matrix A.

info (global) If info=0, the execution is successful.

info < 0:
If the i-th argument is an array and the j-th entry had an illegal value, then
info = -(i*100+j); if the i-th argument is a scalar and had an illegal
value, then info = -i.

info > 0:
If info = k, U(ia+k-1,ja+k-1) is exactly zero. The factorization has been
completed, but the factor U is exactly singular, so the solution could not be
computed.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?gesvx
Uses the LU factorization to compute the solution to
the system of linear equations with a square matrix A
and multiple right-hand sides, and provides error
bounds on the solution.

Syntax
void psgesvx (char *fact , char *trans , MKL_INT *n , MKL_INT *nrhs , float *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , float *af , MKL_INT *iaf , MKL_INT *jaf , MKL_INT
*descaf , MKL_INT *ipiv , char *equed , float *r , float *c , float *b , MKL_INT *ib ,
MKL_INT *jb , MKL_INT *descb , float *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx ,
float *rcond , float *ferr , float *berr , float *work , MKL_INT *lwork , MKL_INT
*iwork , MKL_INT *liwork , MKL_INT *info );
void pdgesvx (char *fact , char *trans , MKL_INT *n , MKL_INT *nrhs , double *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *af , MKL_INT *iaf , MKL_INT *jaf ,
MKL_INT *descaf , MKL_INT *ipiv , char *equed , double *r , double *c , double *b ,
MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , double *x , MKL_INT *ix , MKL_INT *jx ,
MKL_INT *descx , double *rcond , double *ferr , double *berr , double *work , MKL_INT
*lwork , MKL_INT *iwork , MKL_INT *liwork , MKL_INT *info );
void pcgesvx (char *fact , char *trans , MKL_INT *n , MKL_INT *nrhs , MKL_Complex8 *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *af , MKL_INT *iaf , MKL_INT
*jaf , MKL_INT *descaf , MKL_INT *ipiv , char *equed , float *r , float *c ,
MKL_Complex8 *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_Complex8 *x ,
MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , float *rcond , float *ferr , float *berr ,
MKL_Complex8 *work , MKL_INT *lwork , float *rwork , MKL_INT *lrwork , MKL_INT *info );
void pzgesvx (char *fact , char *trans , MKL_INT *n , MKL_INT *nrhs , MKL_Complex16
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *af , MKL_INT *iaf ,
MKL_INT *jaf , MKL_INT *descaf , MKL_INT *ipiv , char *equed , double *r , double *c ,
MKL_Complex16 *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_Complex16 *x ,
MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , double *rcond , double *ferr , double
*berr , MKL_Complex16 *work , MKL_INT *lwork , double *rwork , MKL_INT *lrwork ,
MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?gesvx function uses the LU factorization to compute the solution to a real or complex system of linear
equations A*X = B, where A denotes the n-by-n submatrix A(ia:ia+n-1, ja:ja+n-1), B denotes the n-by-
nrhs submatrix B(ib:ib+n-1, jb:jb+nrhs-1) and X denotes the n-by-nrhs submatrix X(ix:ix+n-1,
jx:jx+nrhs-1).
Error bounds on the solution and a condition estimate are also provided.
In the following description, af stands for the subarray of af from row iaf and column jaf to row iaf+n-1 and
column jaf+n-1.
The function p?gesvx performs the following steps:

1. If fact = 'E', real scaling factors R and C are computed to equilibrate the system:

trans = 'N': (diag(R)*A*diag(C))*inv(diag(C))*X = diag(R)*B
trans = 'T': (diag(R)*A*diag(C))^T*inv(diag(R))*X = diag(C)*B
trans = 'C': (diag(R)*A*diag(C))^H*inv(diag(R))*X = diag(C)*B

Whether or not the system will be equilibrated depends on the scaling of the matrix A, but if
equilibration is used, A is overwritten by diag(R)*A*diag(C) and B by diag(R)*B (if trans = 'N') or
diag(C)*B (if trans = 'T' or 'C').
2. If fact = 'N' or 'E', the LU decomposition is used to factor the matrix A (after equilibration if fact
= 'E') as A = P*L*U, where P is a permutation matrix, L is a unit lower triangular matrix, and U is
upper triangular.
3. The factored form of A is used to estimate the condition number of the matrix A. If the reciprocal of the
condition number is less than relative machine precision, steps 4 - 6 are skipped.

4. The system of equations is solved for X using the factored form of A.

5. Iterative refinement is applied to improve the computed solution matrix and calculate error bounds and
backward error estimates for it.
6. If equilibration was used, the matrix X is premultiplied by diag(C) (if trans = 'N') or diag(R) (if
trans = 'T' or 'C') so that it solves the original system before equilibration.
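Steps 1 and 6 can be illustrated serially for trans = 'N'. The scaling below follows the classic LAPACK ?geequ recipe (r[i] = 1/max_j |a(i,j)|, then c[j] = 1/max_i r[i]*|a(i,j)|). Whether p?gesvx computes its factors in exactly this way, and which thresholds it uses to decide whether to equilibrate, is not stated here, so treat these helpers (hypothetical names) as a sketch. Unlike p?gesvx, the sketch always applies both scalings.

```c
#include <assert.h>
#include <math.h>

/* Sketch modeled on LAPACK ?geequ (serial, column-major, lda = n).
   Returns 0, or i (1-based) if row i is zero, or n + j if column j is zero. */
static int compute_scaling(int n, const double *a, double *r, double *c)
{
    assert(n >= 0);
    for (int i = 0; i < n; ++i) {           /* row factors */
        double m = 0.0;
        for (int j = 0; j < n; ++j) {
            double v = fabs(a[i + j * n]);
            if (v > m) m = v;
        }
        if (m == 0.0) return i + 1;
        r[i] = 1.0 / m;
    }
    for (int j = 0; j < n; ++j) {           /* column factors after rows */
        double m = 0.0;
        for (int i = 0; i < n; ++i) {
            double v = r[i] * fabs(a[i + j * n]);
            if (v > m) m = v;
        }
        if (m == 0.0) return n + j + 1;
        c[j] = 1.0 / m;
    }
    return 0;
}

/* Step 1 for trans = 'N': A <- diag(r)*A*diag(c), B <- diag(r)*B. */
static void equilibrate(int n, int nrhs, double *a, double *b,
                        const double *r, const double *c)
{
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i)
            a[i + j * n] *= r[i] * c[j];
    for (int j = 0; j < nrhs; ++j)
        for (int i = 0; i < n; ++i)
            b[i + j * n] *= r[i];
}

/* Step 6 for trans = 'N': the scaled system yields Y = inv(diag(c))*X,
   so the original solution is X = diag(c)*Y. */
static void unscale_solution(int n, int nrhs, double *y, const double *c)
{
    for (int j = 0; j < nrhs; ++j)
        for (int i = 0; i < n; ++i)
            y[i + j * n] *= c[i];
}
```

For A = diag(4, 0.5) and B = (8, 1), the scaled matrix is the identity, so the scaled solution is Y = (2, 2), and X = diag(c)*Y = (2, 2) solves the original system.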

Input Parameters

fact (global) Must be 'F', 'N', or 'E'.

Specifies whether or not the factored form of the matrix A is supplied on
entry, and if not, whether the matrix A should be equilibrated before it is
factored.
If fact = 'F' then, on entry, af and ipiv contain the factored form of A. If
equed is not 'N', the matrix A has been equilibrated with scaling factors
given by r and c. Arrays a, af, and ipiv are not modified.
If fact = 'N', the matrix A is copied to af and factored.

If fact = 'E', the matrix A is equilibrated if necessary, then copied to af
and factored.

trans (global) Must be 'N', 'T', or 'C'.

Specifies the form of the system of equations:

If trans = 'N', the system has the form A*X = B (No transpose);

If trans = 'T', the system has the form A^T*X = B (Transpose);

If trans = 'C', the system has the form A^H*X = B (Conjugate transpose).

n (global) The number of linear equations; the order of the submatrix A (n ≥ 0).

nrhs (global) The number of right hand sides; the number of columns of the
distributed submatrices B and X (nrhs ≥ 0).

a, af, b, work (local)

Pointers into the local memory to arrays of local size a: lld_a*LOCc(ja+n-1),
af: lld_af*LOCc(ja+n-1), b: lld_b*LOCc(jb+nrhs-1), work: lwork.
The array a contains the matrix A. If fact = 'F' and equed is not 'N',
then A must have been equilibrated by the scaling factors in r and/or c.
The array af is an input argument if fact = 'F'. In this case it contains on
entry the factored form of the matrix A, that is, the factors L and U from
the factorization A = P*L*U as computed by p?getrf. If equed is not 'N',
then af is the factored form of the equilibrated matrix A.
The array b contains on entry the matrix B whose columns are the right-
hand sides for the systems of equations.
work is a workspace array. The size of work is (lwork).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A(ia:ia+n-1, ja:ja+n-1), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

iaf, jaf (global) The row and column indices in the global matrix AF indicating the
first row and the first column of the subarray af, respectively.

descaf (global and local) array of size dlen_. The array descriptor for the
distributed matrix AF.

ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of the submatrix B(ib:ib+n-1, jb:jb+nrhs-1), respectively.

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.

ipiv (local) Array of size LOCr(m_a)+mb_a.

The array ipiv is an input argument if fact = 'F'.

On entry, it contains the pivot indices from the factorization A = P*L*U as
computed by p?getrf; (local) row i of the matrix was interchanged with
the (global) row ipiv[i - 1].

This array must be aligned with A(ia:ia+n-1, *).

equed (global) Must be 'N', 'R', 'C', or 'B'. equed is an input argument if fact = 'F'.
It specifies the form of equilibration that was done:

If equed = 'N', no equilibration was done (always true if fact = 'N');

If equed = 'R', row equilibration was done, that is, A has been
premultiplied by diag(r);
If equed = 'C', column equilibration was done, that is, A has been
postmultiplied by diag(c);
If equed = 'B', both row and column equilibration were done; A has been
replaced by diag(r)*A*diag(c).

r, c (local)
Arrays of size LOCr(m_a) and LOCc(n_a), respectively.

The array r contains the row scale factors for A, and the array c contains
the column scale factors for A. These arrays are input arguments if fact =
'F' only; otherwise they are output arguments. If equed = 'R' or 'B', A
is multiplied on the left by diag(r); if equed = 'N' or 'C', r is not
accessed.
If fact = 'F' and equed = 'R' or 'B', each element of r must be
positive.
If equed = 'C' or 'B', A is multiplied on the right by diag(c); if equed =
'N' or 'R', c is not accessed.
If fact = 'F' and equed = 'C' or 'B', each element of c must be
positive. Array r is replicated in every process column, and is aligned with
the distributed matrix A. Array c is replicated in every process row, and is
aligned with the distributed matrix A.

ix, jx (global) The row and column indices in the global matrix X indicating the
first row and the first column of the submatrix X(ix:ix+n-1, jx:jx+nrhs-1), respectively.

descx (global and local) array of size dlen_. The array descriptor for the
distributed matrix X.

lwork (local or global) The size of the array work; must be at least
max(p?gecon(lwork), p?gerfs(lwork))+LOCr(n_a).

iwork (local, psgesvx/pdgesvx only). Workspace array. The size of iwork is
liwork.

liwork (local, psgesvx/pdgesvx only). The size of the array iwork; must be at
least LOCr(n_a).

rwork (local)
Workspace array, used in complex flavors only.
The size of rwork is (lrwork).

lrwork (local or global, pcgesvx/pzgesvx only). The size of the array rwork; must
be at least 2*LOCc(n_a).

Output Parameters

x (local)
Pointer into the local memory to an array of local size lld_x*LOCc(jx+nrhs-1).
If info = 0, the array x contains the solution matrix X to the original
system of equations. Note that A and B are modified on exit if equed ≠ 'N',
and the solution to the equilibrated system is:
inv(diag(C))*X, if trans = 'N' and equed = 'C' or 'B'; and
inv(diag(R))*X, if trans = 'T' or 'C' and equed = 'R' or 'B'.

a Array a is not modified on exit if fact = 'F' or 'N', or if fact = 'E' and
equed = 'N'.
If equed ≠ 'N', A is scaled on exit as follows:
equed = 'R': A = diag(R)*A
equed = 'C': A = A*diag(C)
equed = 'B': A = diag(R)*A*diag(C)

af If fact = 'N' or 'E', then af is an output argument and on exit returns
the factors L and U from the factorization A = P*L*U of the original matrix
A (if fact = 'N') or of the equilibrated matrix A (if fact = 'E'). See the
description of a for the form of the equilibrated matrix.

b Overwritten by diag(R)*B if trans = 'N' and equed = 'R' or 'B';
overwritten by diag(C)*B if trans = 'T' or 'C' and equed = 'C' or 'B';
not changed if equed = 'N'.

r, c These arrays are output arguments if fact ≠ 'F'.
See the description of r, c in the Input Parameters section.

rcond (global)
An estimate of the reciprocal condition number of the matrix A after
equilibration (if done). The function sets rcond = 0 if the estimate
underflows; in this case the matrix is singular (to working precision).
However, whenever rcond is small compared to 1.0 in the working
precision, the matrix may be poorly conditioned or even singular.

ferr, berr (local)

Arrays of size LOCc(n_b) each. Contain the component-wise forward and
relative backward errors, respectively, for each solution vector.
Arrays ferr and berr are both replicated in every process row, and are
aligned with the matrices B and X.
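The backward error berr reported for each solution vector is conventionally the componentwise relative backward error max_i |B - A*X|_i / (|A|*|X| + |B|)_i used by LAPACK-style refinement routines. The serial sketch below computes that formula for one right-hand side. The function name is hypothetical, and rows with a zero denominator are simply skipped, whereas library implementations apply more careful safeguards.

```c
#include <assert.h>
#include <math.h>

/* Componentwise relative backward error of an approximate solution x of
   A*x = b (n-by-n, column-major, lda = n):
   max_i |b - A*x|_i / (|A|*|x| + |b|)_i, skipping zero denominators. */
static double componentwise_berr(int n, const double *a, const double *x,
                                 const double *b)
{
    assert(n >= 0);
    double berr = 0.0;
    for (int i = 0; i < n; ++i) {
        double r = b[i];              /* residual component */
        double denom = fabs(b[i]);    /* (|A|*|x| + |b|)_i  */
        for (int j = 0; j < n; ++j) {
            r -= a[i + j * n] * x[j];
            denom += fabs(a[i + j * n]) * fabs(x[j]);
        }
        if (denom > 0.0 && fabs(r) / denom > berr)
            berr = fabs(r) / denom;
    }
    return berr;
}
```

For A = I, b = (1, 2) and x = (1, 1), the second row dominates and berr = 1/3; for the exact solution x = (1, 2), berr = 0.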

ipiv If fact = 'N' or 'E', then ipiv is an output argument and on exit contains
the pivot indices from the factorization A = P*L*U of the original matrix A
(if fact = 'N') or of the equilibrated matrix A (if fact = 'E').

equed If fact ≠ 'F', then equed is an output argument. It specifies the form of
equilibration that was done (see the description of equed in the Input
Parameters section).

work[0] If info=0, on exit work[0] returns the minimum value of lwork required
for optimum performance.

iwork[0] If info=0, on exit iwork[0] returns the minimum value of liwork required
for optimum performance.

rwork[0] If info=0, on exit rwork[0] returns the minimum value of lrwork required
for optimum performance.

info If info = 0, the execution is successful.

info < 0: if the i-th argument is an array and the j-th entry had an illegal
value, then info = -(i*100+j); if the i-th argument is a scalar and had an
illegal value, then info = -i.

info > 0: if info = i and i ≤ n, then U(i,i) is exactly zero. The
factorization has been completed, but the factor U is exactly singular, so
the solution and error bounds could not be computed. If info = n+1, then U
is nonsingular, but rcond is less than machine precision. The factorization
has been completed, but the matrix is singular to working precision, and the
solution and error bounds have not been computed.
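When handling errors, the info cases for p?gesvx can be mapped onto a small status enumeration. The enum and helper below are hypothetical, and int stands in for MKL_INT.

```c
#include <assert.h>

/* Hypothetical status codes for the info cases listed above. */
typedef enum {
    GESVX_OK,              /* info = 0                                  */
    GESVX_ILLEGAL_ARG,     /* info < 0: -(i*100+j) or -i                */
    GESVX_SINGULAR,        /* 1 <= info <= n: U(info,info) exactly zero */
    GESVX_ILL_CONDITIONED  /* info = n+1: rcond below machine precision */
} gesvx_status;

/* Hypothetical helper: classifies an info value for a system of order n. */
static gesvx_status classify_gesvx_info(int info, int n)
{
    assert(n >= 0);
    if (info == 0) return GESVX_OK;
    if (info < 0)  return GESVX_ILLEGAL_ARG;
    if (info <= n) return GESVX_SINGULAR;
    return GESVX_ILL_CONDITIONED;   /* documented value is info = n + 1 */
}
```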

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?gbsv
Computes the solution to the system of linear
equations with a general banded distributed matrix
and multiple right-hand sides.

Syntax
void psgbsv (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs , float *a ,
MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , float *b , MKL_INT *ib , MKL_INT *descb ,
float *work , MKL_INT *lwork , MKL_INT *info );
void pdgbsv (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs , double *a ,
MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , double *b , MKL_INT *ib , MKL_INT
*descb , double *work , MKL_INT *lwork , MKL_INT *info );
void pcgbsv (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs , MKL_Complex8
*a , MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , MKL_Complex8 *b , MKL_INT *ib ,
MKL_INT *descb , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT *info );
void pzgbsv (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs , MKL_Complex16
*a , MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , MKL_Complex16 *b , MKL_INT *ib ,
MKL_INT *descb , MKL_Complex16 *work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?gbsv function computes the solution to a real or complex system of linear equations

sub(A)*X = sub(B),
where sub(A) = A(1:n, ja:ja+n-1) is an n-by-n real/complex general banded distributed matrix with bwl
subdiagonals and bwu superdiagonals, and X and sub(B) = B(ib:ib+n-1, 1:nrhs) are n-by-nrhs distributed
matrices.
The LU decomposition with partial pivoting and row interchanges is used to factor sub(A) as sub(A) =
P*L*U*Q, where P and Q are permutation matrices, and L and U are banded lower and upper triangular
matrices, respectively. The matrix Q represents reordering of columns for the sake of parallelism, while P
represents reordering of rows for numerical stability using classic partial pivoting.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.
Notice revision #20201201

Input Parameters

n (global) The number of rows and columns to be operated on, that is, the
order of the distributed matrix sub(A) (n≥ 0).

bwl (global) The number of subdiagonals within the band of A (0 ≤ bwl ≤ n-1).

bwu (global) The number of superdiagonals within the band of A (0 ≤ bwu ≤ n-1).

nrhs (global) The number of right hand sides; the number of columns of the
distributed matrix sub(B) (nrhs≥ 0).

a, b (local)
Pointers into the local memory to arrays of local size a: lld_a*LOCc(ja+n-1)
and b: lld_b*LOCc(nrhs).
On entry, the array a contains the local pieces of the global array A.
On entry, the array b contains the right hand side distributed matrix sub(B).

ja (global) The index in the global matrix A indicating the start of the matrix to
be operated on (which may be either all of A or a submatrix of A).

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
If desca[dtype_ - 1] = 501, then dlen_≥ 7;

else if desca[dtype_ - 1] = 1, then dlen_≥ 9.
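As an illustration of the 1D descriptor type, the sketch below fills a 7-entry block-column descriptor (dtype 501). The field order (type, BLACS context handle, size of the distributed dimension, block size, source process column, local leading dimension, reserved) is assumed from the ScaLAPACK band-solver documentation, the helper make_desc_1d_col is hypothetical, and int stands in for MKL_INT as in the LP64 interface. The Overview remains the authoritative description of the layout.

```c
/* Hypothetical helper: fill a 1D block-column (dtype 501) descriptor.
 * Assumed layout: {dtype, ctxt, n, nb, csrc, lld, reserved}. */
enum { DLEN_1D = 7 };

void make_desc_1d_col(int desc[DLEN_1D], int ctxt, int n, int nb,
                      int csrc, int lld)
{
    desc[0] = 501;   /* dtype_: 1D block-column (1-by-P process grid) */
    desc[1] = ctxt;  /* BLACS context handle */
    desc[2] = n;     /* size of the distributed (column) dimension */
    desc[3] = nb;    /* column block size */
    desc[4] = csrc;  /* process column holding the first block of columns */
    desc[5] = lld;   /* local leading dimension, e.g. bwl+bwu+1 for p?dbsv */
    desc[6] = 0;     /* reserved */
}
```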

ib (global) The row index in the global matrix B indicating the first row of the
matrix to be operated on (which may be either all of B or a submatrix of B).

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.
If descb[dtype_-1] = 502, then dlen_≥ 7;

else if descb[dtype_-1] = 1, then dlen_≥ 9.

work (local)
Workspace array of size lwork.

lwork (local or global) The size of the array work; it must satisfy
lwork ≥ (NB+bwu)*(bwl+bwu) + 6*(bwl+bwu)*(bwl+2*bwu) + max(nrhs*(NB+2*bwl+4*bwu), 1).

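The workspace bound above can be evaluated before allocating work. The following minimal sketch transcribes the formula literally; the function name is hypothetical, and nb stands for the block size NB of A's distribution.

```c
/* Hypothetical helper: minimum lwork for p?gbsv, transcribed from
 * lwork >= (NB+bwu)*(bwl+bwu) + 6*(bwl+bwu)*(bwl+2*bwu)
 *          + max(nrhs*(NB+2*bwl+4*bwu), 1)                       */
static long max_long(long a, long b) { return a > b ? a : b; }

long gbsv_min_lwork(long nb, long bwl, long bwu, long nrhs)
{
    long bw = bwl + bwu;  /* total number of off-diagonals in the band */
    return (nb + bwu) * bw
         + 6 * bw * (bwl + 2 * bwu)
         + max_long(nrhs * (nb + 2 * bwl + 4 * bwu), 1);
}
```

For example, with NB = 4, bwl = bwu = 1, and nrhs = 1, the bound evaluates to 10 + 36 + 10 = 56.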
Output Parameters

a On exit, contains details of the factorization. Note that the resulting
factorization is not the same factorization as returned from LAPACK.
Additional permutations are performed on the matrix for the sake of
parallelism.

b On exit, this array contains the local pieces of the solution distributed
matrix X.

ipiv (local) array.

The size of ipiv must be at least desca[nb_ - 1]. This array contains pivot
indices for local factorizations. You should not alter the contents between
factorization and solve.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

info If info = 0, the execution is successful.

info < 0: If the i-th argument is an array and the j-th entry had an illegal value, then
info = -(i*100+j); if the i-th argument is a scalar and had an illegal
value, then info = -i.

info > 0: If info = k ≤ NPROCS, the submatrix stored on processor info and
factored locally was singular, and the factorization was not
completed. If info = k > NPROCS, the submatrix stored on processor
info-NPROCS representing interactions with other processors was
singular, and the factorization was not completed.

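The info convention above can be decoded mechanically. The helper below is a hypothetical sketch (not part of MKL) that separates the four cases; it assumes argument positions and array entry indices are both below 100, so that -(i*100+j) can be split unambiguously.

```c
/* Hypothetical decoder for the info value of the banded/tridiagonal solvers. */
typedef enum {
    INFO_OK,               /* info == 0 */
    INFO_BAD_SCALAR,       /* info == -i */
    INFO_BAD_ARRAY_ENTRY,  /* info == -(i*100 + j) */
    INFO_LOCAL_FAIL,       /* 0 < info <= nprocs */
    INFO_INTERACTION_FAIL  /* info > nprocs */
} info_kind;

typedef struct { info_kind kind; int arg; int entry; int proc; } info_report;

info_report decode_info(int info, int nprocs)
{
    info_report r = { INFO_OK, 0, 0, -1 };
    if (info < 0) {
        int code = -info;
        if (code >= 100) {              /* array argument i, entry j */
            r.kind = INFO_BAD_ARRAY_ENTRY;
            r.arg = code / 100;
            r.entry = code % 100;
        } else {                        /* scalar argument i */
            r.kind = INFO_BAD_SCALAR;
            r.arg = code;
        }
    } else if (info > 0 && info <= nprocs) {
        r.kind = INFO_LOCAL_FAIL;       /* locally factored block on process info */
        r.proc = info;
    } else if (info > nprocs) {
        r.kind = INFO_INTERACTION_FAIL; /* interaction block on process info-nprocs */
        r.proc = info - nprocs;
    }
    return r;
}
```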
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?dbsv
Solves a general band system of linear equations.

Syntax
void psdbsv (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs , float *a ,
MKL_INT *ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *descb , float *work ,
MKL_INT *lwork , MKL_INT *info );
void pddbsv (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs , double *a ,
MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *descb , double *work ,
MKL_INT *lwork , MKL_INT *info );
void pcdbsv (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs , MKL_Complex8
*a , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *descb ,
MKL_Complex8 *work , MKL_INT *lwork , MKL_INT *info );
void pzdbsv (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs , MKL_Complex16
*a , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *descb ,
MKL_Complex16 *work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?dbsv function solves the following system of linear equations:

A(1:n, ja:ja+n-1)* X = B(ib:ib+n-1, 1:nrhs),

where A(1:n, ja:ja+n-1) is an n-by-n real/complex banded diagonally dominant-like distributed matrix
with bandwidth bwl, bwu.
Gaussian elimination without pivoting is used to factor a reordering of the matrix into LU.


Input Parameters

n (global) The order of the distributed submatrix A (n ≥ 0).

bwl (global) Number of subdiagonals. 0 ≤ bwl ≤ n-1.

bwu (global) Number of superdiagonals. 0 ≤ bwu ≤ n-1.

nrhs (global) The number of right-hand sides; the number of columns of the
distributed submatrix B, (nrhs ≥ 0).

a (local)
Pointer into the local memory to an array with leading dimension lld_a ≥ (bwl+bwu+1)
(stored in desca). On entry, this array contains the local pieces of
the distributed matrix.

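To illustrate what a leading dimension of bwl+bwu+1 holds, the sketch below packs a dense column-major matrix into band storage. It assumes the LAPACK-style band layout, in which A(i,j) is stored in row bwu+i-j of column j, so the main diagonal occupies row bwu (0-based); check the band storage description in the ScaLAPACK overview before relying on it. The function name pack_band is hypothetical, and on a distributed grid each process would pack only its own block of columns.

```c
#include <stddef.h>

/* Pack a dense n-by-n column-major matrix (leading dimension ldd) into
 * LAPACK-style band storage with ldb >= bwl + bwu + 1 rows per column.
 * Assumed layout: A(i,j) -> band[(bwu + i - j) + j*ldb]. Entries of band
 * outside the band are left untouched. */
void pack_band(int n, int bwl, int bwu, const double *dense, int ldd,
               double *band, int ldb)
{
    for (int j = 0; j < n; ++j) {
        int ilo = j - bwu > 0 ? j - bwu : 0;          /* first row in band */
        int ihi = j + bwl < n - 1 ? j + bwl : n - 1;  /* last row in band */
        for (int i = ilo; i <= ihi; ++i)
            band[(bwu + i - j) + (size_t)j * ldb] = dense[i + (size_t)j * ldd];
    }
}
```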
ja (global) The index in the global matrix A indicating the start of the matrix to
be operated on (which may be either all of A or a submatrix of A).

desca (global and local) array of size dlen.

If 1d type (dtype_a=501 or 502), dlen ≥ 7;

If 2d type (dtype_a=1), dlen ≥ 9.

The array descriptor for the distributed matrix A.

It describes how A is mapped to memory.

b (local)
Pointer into the local memory to an array with local leading dimension lld_b ≥ nb. On
entry, this array contains the local pieces of the right-hand sides B(ib:ib+n-1, 1:nrhs).

ib (global) The row index in the global matrix B indicating the first row of the
matrix to be operated on (which may be either all of B or a submatrix of B).

descb (global and local) array of size dlen.

If 1d type (dtype_b =502), dlen ≥ 7;

If 2d type (dtype_b =1), dlen ≥ 9.

The array descriptor for the distributed matrix B.

It describes how B is mapped to memory.

work (local).
Temporary workspace. This space may be overwritten in between calls to
functions. work must be the size given in lwork.

lwork (local or global) Size of user-input workspace work. If lwork is too small,
the minimal acceptable size is returned in work[0] and an error code
is returned.
lwork ≥ nb*(bwl+bwu) + 6*max(bwl,bwu)*max(bwl,bwu)
+ max(max(bwl,bwu)*nrhs, max(bwl,bwu)*max(bwl,bwu))

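The bound above mixes nb with the larger of the two bandwidths. A literal transcription, with a hypothetical function name, makes the operator precedence explicit:

```c
/* Hypothetical helper: minimum lwork for p?dbsv, transcribed from
 * lwork >= nb*(bwl+bwu) + 6*m*m + max(m*nrhs, m*m), m = max(bwl,bwu) */
static long max_l(long a, long b) { return a > b ? a : b; }

long dbsv_min_lwork(long nb, long bwl, long bwu, long nrhs)
{
    long m = max_l(bwl, bwu);  /* larger of the two bandwidths */
    return nb * (bwl + bwu) + 6 * m * m + max_l(m * nrhs, m * m);
}
```

For example, nb = 4, bwl = 1, bwu = 2, and nrhs = 3 give 12 + 24 + 6 = 42.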
Output Parameters

a On exit, this array contains details of the factorization.
Note that permutations are performed on the matrix, so the factors
returned differ from those returned by LAPACK.

b On exit, this array contains the local pieces of the distributed solution matrix X.


work On exit, work[0] contains the minimal lwork.

info (local) If info = 0, the execution is successful.

< 0: If the i-th argument is an array and the j-th entry had an illegal value,
then info = -(i*100+j); if the i-th argument is a scalar and had an
illegal value, then info = -i.

> 0: If info = k ≤ NPROCS, the submatrix stored on processor info and
factored locally was not positive definite, and the factorization was not
completed.
If info = k > NPROCS, the submatrix stored on processor info-NPROCS
representing interactions with other processors was not positive definite,
and the factorization was not completed.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?dtsv
Solves a general tridiagonal system of linear
equations.

Syntax
void psdtsv (MKL_INT *n , MKL_INT *nrhs , float *dl , float *d , float *du , MKL_INT
*ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *descb , float *work , MKL_INT
*lwork , MKL_INT *info );
void pddtsv (MKL_INT *n , MKL_INT *nrhs , double *dl , double *d , double *du , MKL_INT
*ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *descb , double *work , MKL_INT
*lwork , MKL_INT *info );
void pcdtsv (MKL_INT *n , MKL_INT *nrhs , MKL_Complex8 *dl , MKL_Complex8 *d ,
MKL_Complex8 *du , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib ,
MKL_INT *descb , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT *info );
void pzdtsv (MKL_INT *n , MKL_INT *nrhs , MKL_Complex16 *dl , MKL_Complex16 *d ,
MKL_Complex16 *du , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib ,
MKL_INT *descb , MKL_Complex16 *work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?dtsv function solves a system of linear equations
A(1:n, ja:ja+n-1) * X = B(ib:ib+n-1, 1:nrhs),
where A(1:n, ja:ja+n-1) is an n-by-n real/complex tridiagonal diagonally dominant-like distributed matrix.

Gaussian elimination without pivoting is used to factor a reordering of the matrix into LU.

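On a single process, Gaussian elimination without pivoting on a diagonally dominant tridiagonal matrix reduces to the classic Thomas algorithm. The serial sketch below illustrates only that arithmetic, not the reordered parallel algorithm p?dtsv uses; it follows the dl/d/du convention of the dl, d, and du parameters (dl[0] and du[n-1] are not referenced), and the name tridiag_solve is hypothetical.

```c
#include <stdlib.h>

/* Serial Thomas algorithm (LU without pivoting) for a tridiagonal system.
 * dl[i] = A(i,i-1) (dl[0] unused), d[i] = A(i,i), du[i] = A(i,i+1)
 * (du[n-1] unused). b is overwritten with the solution x.
 * Returns 0 on success, k > 0 if the k-th pivot is exactly zero,
 * or -1 if allocation fails. */
int tridiag_solve(int n, const double *dl, const double *d,
                  const double *du, double *b)
{
    if (n <= 0) return 0;
    double *piv = malloc((size_t)n * sizeof *piv);  /* diagonal of U */
    if (piv == NULL) return -1;

    piv[0] = d[0];
    for (int i = 1; i < n; ++i) {
        if (piv[i - 1] == 0.0) { free(piv); return i; }
        double m = dl[i] / piv[i - 1];   /* multiplier L(i,i-1) */
        piv[i] = d[i] - m * du[i - 1];   /* eliminate A(i,i-1) */
        b[i] -= m * b[i - 1];            /* forward substitution with L */
    }
    if (piv[n - 1] == 0.0) { free(piv); return n; }

    b[n - 1] /= piv[n - 1];              /* back substitution with U */
    for (int i = n - 2; i >= 0; --i)
        b[i] = (b[i] - du[i] * b[i + 1]) / piv[i];
    free(piv);
    return 0;
}
```

Because no rows are interchanged, the method relies on diagonal dominance to keep the pivots away from zero; the sketch only detects an exactly zero pivot.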

Input Parameters

n (global) The order of the distributed submatrix A (n ≥ 0).

nrhs The number of right-hand sides; the number of columns of the distributed
matrix B (nrhs ≥ 0).

dl (local)
Pointer to the local part of the global vector storing the lower diagonal of the
matrix. Globally, dl[0] is not referenced, and dl must be aligned with d.
Must be of size ≥ desca[nb_ - 1].

d (local).
Pointer to local part of global vector storing the main diagonal of the matrix.

du (local).
Pointer to local part of global vector storing the upper diagonal of the
matrix. Globally, du[n - 1] is not referenced, and du must be aligned with
d.

ja (global) The index in the global matrix A indicating the start of the matrix to
be operated on (which may be either all of A or a submatrix of A).

desca (global and local) array of size dlen.

If 1d type (dtype_a=501 or 502), dlen ≥ 7;

If 2d type (dtype_a=1), dlen ≥ 9.

The array descriptor for the distributed matrix A.

It describes how A is mapped to memory.

b (local)
Pointer into the local memory to an array with local leading dimension lld_b ≥ nb. On
entry, this array contains the local pieces of the right-hand sides B(ib:ib+n-1, 1:nrhs).

ib (global) The row index in the global matrix B indicating the first row of the
matrix to be operated on (which may be either all of B or a submatrix of B).

descb (global and local) array of size dlen.

If 1d type (dtype_b =502), dlen ≥ 7;

If 2d type (dtype_b =1), dlen ≥ 9.

The array descriptor for the distributed matrix B.

It describes how B is mapped to memory.

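work (local)
Temporary workspace. This space may be overwritten in between calls to
functions. work must be the size given in lwork.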


lwork (local or global) Size of user-input workspace work. If lwork is too small,
the minimal acceptable size is returned in work[0] and an error code
is returned.
lwork ≥ (12*NPCOL+3*nb) + max((10+2*min(100, nrhs))*NPCOL+4*nrhs, 8*NPCOL)

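Unlike the banded solvers, the p?dtsv bound depends on the number of process columns NPCOL and caps the right-hand-side term at 100. A literal transcription, with a hypothetical function name:

```c
/* Hypothetical helper: minimum lwork for p?dtsv, transcribed from
 * lwork >= (12*NPCOL+3*nb) + max((10+2*min(100,nrhs))*NPCOL+4*nrhs, 8*NPCOL)
 * npcol is the number of process columns in the BLACS grid. */
static long imax(long a, long b) { return a > b ? a : b; }
static long imin(long a, long b) { return a < b ? a : b; }

long dtsv_min_lwork(long npcol, long nb, long nrhs)
{
    return (12 * npcol + 3 * nb)
         + imax((10 + 2 * imin(100, nrhs)) * npcol + 4 * nrhs, 8 * npcol);
}
```

For example, NPCOL = 2, nb = 4, and nrhs = 1 give 36 + max(28, 16) = 64.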
Output Parameters

dl On exit, this array contains details of the factorization of the matrix.

d On exit, this array contains details of the factorization of the matrix.
Must be of size ≥ desca[nb_ - 1].

du On exit, this array contains details of the factorization of the matrix.
Must be of size ≥ desca[nb_ - 1].

b On exit, this array contains the local pieces of the distributed solution matrix X.

work On exit, work[0] contains the minimal lwork.

info (local) If info = 0, the execution is successful.

< 0: If the i-th argument is an array and the j-th entry had an illegal value,
then info = -(i*100+j); if the i-th argument is a scalar and had an
illegal value, then info = -i.

> 0: If info = k ≤ NPROCS, the submatrix stored on processor info and
factored locally was not positive definite, and the factorization was not
completed.
If info = k > NPROCS, the submatrix stored on processor info-NPROCS
representing interactions with other processors was not positive definite,
and the factorization was not completed.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?posv
Solves a symmetric positive definite system of linear
equations.

Syntax
void psposv (char *uplo , MKL_INT *n , MKL_INT *nrhs , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_INT
*info );
void pdposv (char *uplo , MKL_INT *n , MKL_INT *nrhs , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_INT
*info );
void pcposv (char *uplo , MKL_INT *n , MKL_INT *nrhs , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb , MKL_INT *info );
void pzposv (char *uplo , MKL_INT *n , MKL_INT *nrhs , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?posv function computes the solution to a real/complex system of linear equations

sub(A)*X = sub(B),
where sub(A) denotes A(ia:ia+n-1, ja:ja+n-1) and is an n-by-n symmetric/Hermitian positive
definite distributed matrix, and X and sub(B), denoting B(ib:ib+n-1, jb:jb+nrhs-1), are n-by-nrhs
distributed matrices. The Cholesky decomposition is used to factor sub(A) as
sub(A) = U^T*U, if uplo = 'U', or
sub(A) = L*L^T, if uplo = 'L',
where U is an upper triangular matrix and L is a lower triangular matrix (for the complex flavors, the
transpose ^T is replaced by the conjugate transpose ^H). The factored form of sub(A) is then
used to solve the system of equations.

Input Parameters

uplo (global) Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of sub(A) is stored.

n (global) The order of the distributed matrix sub(A) (n≥ 0).

nrhs The number of right-hand sides; the number of columns of the distributed
matrix sub(B) (nrhs≥ 0).

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1). On
entry, this array contains the local pieces of the n-by-n symmetric
distributed matrix sub(A) to be factored.
If uplo = 'U', the leading n-by-n upper triangular part of sub(A) contains
the upper triangular part of the matrix, and its strictly lower triangular part
is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of sub(A) contains
the lower triangular part of the distributed matrix, and its strictly upper
triangular part is not referenced.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

b (local)
Pointer into the local memory to an array of size lld_b*LOCc(jb+nrhs-1).
On entry, the local pieces of the right hand sides distributed matrix sub(B).

ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of the submatrix B, respectively.

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.


Output Parameters

a On exit, if info = 0, this array contains the local pieces of the factor U or L
from the Cholesky factorization sub(A) = U^H*U or L*L^H.

b On exit, if info = 0, sub(B) is overwritten by the solution distributed
matrix X.

info (global)
If info =0, the execution is successful.

If info < 0: If the i-th argument is an array and the j-th entry, indexed
j-1, had an illegal value, then info = -(i*100+j), if the i-th argument is
a scalar and had an illegal value, then info = -i.

If info > 0: If info = k, the leading minor of order k, A(ia:ia+k-1,
ja:ja+k-1), is not positive definite, so the factorization could not be
completed and the solution has not been computed.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?posvx
Solves a symmetric or Hermitian positive definite
system of linear equations.

Syntax
void psposvx (char *fact , char *uplo , MKL_INT *n , MKL_INT *nrhs , float *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , float *af , MKL_INT *iaf , MKL_INT *jaf , MKL_INT
*descaf , char *equed , float *sr , float *sc , float *b , MKL_INT *ib , MKL_INT *jb ,
MKL_INT *descb , float *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , float *rcond ,
float *ferr , float *berr , float *work , MKL_INT *lwork , MKL_INT *iwork , MKL_INT
*liwork , MKL_INT *info );
void pdposvx (char *fact , char *uplo , MKL_INT *n , MKL_INT *nrhs , double *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , double *af , MKL_INT *iaf , MKL_INT *jaf , MKL_INT
*descaf , char *equed , double *sr , double *sc , double *b , MKL_INT *ib , MKL_INT
*jb , MKL_INT *descb , double *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , double
*rcond , double *ferr , double *berr , double *work , MKL_INT *lwork , MKL_INT *iwork ,
MKL_INT *liwork , MKL_INT *info );
void pcposvx (char *fact , char *uplo , MKL_INT *n , MKL_INT *nrhs , MKL_Complex8 *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *af , MKL_INT *iaf , MKL_INT
*jaf , MKL_INT *descaf , char *equed , float *sr , float *sc , MKL_Complex8 *b , MKL_INT
*ib , MKL_INT *jb , MKL_INT *descb , MKL_Complex8 *x , MKL_INT *ix , MKL_INT *jx ,
MKL_INT *descx , float *rcond , float *ferr , float *berr , MKL_Complex8 *work ,
MKL_INT *lwork , float *rwork , MKL_INT *lrwork , MKL_INT *info );
void pzposvx (char *fact , char *uplo , MKL_INT *n , MKL_INT *nrhs , MKL_Complex16 *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *af , MKL_INT *iaf , MKL_INT
*jaf , MKL_INT *descaf , char *equed , double *sr , double *sc , MKL_Complex16 *b ,
MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_Complex16 *x , MKL_INT *ix , MKL_INT
*jx , MKL_INT *descx , double *rcond , double *ferr , double *berr , MKL_Complex16
*work , MKL_INT *lwork , double *rwork , MKL_INT *lrwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?posvx function uses the Cholesky factorization A = U^T*U or A = L*L^T to compute the solution to a real or
complex system of linear equations
A(ia:ia+n-1, ja:ja+n-1)*X = B(ib:ib+n-1, jb:jb+nrhs-1),
where A(ia:ia+n-1, ja:ja+n-1) is an n-by-n symmetric or Hermitian positive definite matrix and X and
B(ib:ib+n-1, jb:jb+nrhs-1) are n-by-nrhs matrices.
Error bounds on the solution and a condition estimate are also provided.
In the following, y denotes Y(iy:iy+m-1, jy:jy+k-1), an m-by-k matrix, where y can be a, af, b,
or x.
The function p?posvx performs the following steps:

1. If fact = 'E', real scaling factors sr and sc are computed to equilibrate the system:

diag(sr)*A*diag(sc)*inv(diag(sc))*X = diag(sr)*B
Whether or not the system will be equilibrated depends on the scaling of the matrix A, but if
equilibration is used, A is overwritten by diag(sr)*A*diag(sc) and B by diag(sr)*B.
2. If fact = 'N' or 'E', the Cholesky decomposition is used to factor the matrix A (after equilibration if
fact = 'E') as
A = U^T*U, if uplo = 'U', or
A = L*L^T, if uplo = 'L',
where U is an upper triangular matrix and L is a lower triangular matrix.
3. The factored form of A is used to estimate the condition number of the matrix A. If the reciprocal of the
condition number is less than machine precision, steps 4-6 are skipped.
4. The system of equations is solved for X using the factored form of A.
5. Iterative refinement is applied to improve the computed solution matrix and to calculate error bounds and
backward error estimates for it.
6. If equilibration was used, the matrix X is premultiplied by diag(sr) so that it solves the original system
before equilibration.

Input Parameters

fact (global) Must be 'F', 'N', or 'E'.

Specifies whether or not the factored form of the matrix A is supplied on
entry, and if not, whether the matrix A should be equilibrated before it is
factored.
If fact = 'F': on entry, af contains the factored form of A. If equed =
'Y', the matrix A has been equilibrated with scaling factors given by sr and sc. a
and af will not be modified.
If fact = 'N', the matrix A will be copied to af and factored.

If fact = 'E', the matrix A will be equilibrated if necessary, then copied to
af and factored.

uplo (global) Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored.


n (global) The order of the distributed matrix sub(A) (n≥ 0).

nrhs (global) The number of right-hand sides; the number of columns of the
distributed submatrices B and X (nrhs ≥ 0).

a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).
On entry, the symmetric/Hermitian matrix A, except that if fact = 'F'
and equed = 'Y', A must contain the equilibrated matrix
diag(sr)*A*diag(sc).
If uplo = 'U', the leading n-by-n upper triangular part of A contains the
upper triangular part of the matrix A, and the strictly lower triangular part
of A is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of A contains the
lower triangular part of the matrix A, and the strictly upper triangular part
of A is not referenced. A is not modified if fact = 'F' or 'N', or if fact =
'E' and equed = 'N' on exit.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

af (local)
Pointer into the local memory to an array of local size lld_af*LOCc(ja+n-1).
If fact = 'F', then af is an input argument and on entry contains the
triangular factor U or L from the Cholesky factorization A = U^T*U or A =
L*L^T, in the same storage format as A. If equed ≠ 'N', then af is the
factored form of the equilibrated matrix diag(sr)*A*diag(sc).

iaf, jaf (global) The row and column indices in the global matrix AF indicating the
first row and the first column of the submatrix AF, respectively.

descaf (global and local) array of size dlen_. The array descriptor for the
distributed matrix AF.

equed (global) Must be 'N' or 'Y'.

equed is an input argument if fact = 'F'. It specifies the form of
equilibration that was done:
If equed = 'N', no equilibration was done (always true if fact = 'N');

If equed = 'Y', equilibration was done and A has been replaced by
diag(sr)*A*diag(sc).

sr (local)
Array of size lld_a.
The array sr contains the scale factors for A. This array is an input argument
if fact = 'F' only; otherwise it is an output argument.

If equed = 'N', sr is not accessed.

If fact = 'F' and equed = 'Y', each element of sr must be positive.

b (local)
Pointer into the local memory to an array of local size lld_b*LOCc(jb+nrhs-1).
On entry, the n-by-nrhs right-hand side matrix B.

ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of the submatrix B, respectively.

descb (global and local) Array of size dlen_. The array descriptor for the
distributed matrix B.

x (local)
Pointer into the local memory to an array of local size lld_x*LOCc(jx+nrhs-1).

ix, jx (global) The row and column indices in the global matrix X indicating the
first row and the first column of the submatrix X, respectively.

descx (global and local) array of size dlen_. The array descriptor for the
distributed matrix X.

work (local)
Workspace array of size lwork.

lwork (local or global)

The size of the array work. lwork is local input and must be at least lwork
= max(p?pocon(lwork), p?porfs(lwork)) + LOCr(n_a).
lwork = 3*desca[lld_ - 1].
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

iwork (local) Workspace array of size liwork.

liwork (local or global)

The size of the array iwork. liwork is local input and must be at least
liwork = desca[lld_ - 1].
liwork = LOCr(n_a).
If liwork = -1, then liwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

a On exit, if fact = 'E' and equed = 'Y', a is overwritten by
diag(sr)*a*diag(sc).

af If fact = 'N', then af is an output argument and on exit returns the
triangular factor U or L from the Cholesky factorization A = U^T*U or A =
L*L^T of the original matrix A.


If fact = 'E', then af is an output argument and on exit returns the
triangular factor U or L from the Cholesky factorization A = U^T*U or A =
L*L^T of the equilibrated matrix A (see the description of a for the form of
the equilibrated matrix).

equed If fact ≠ 'F', then equed is an output argument. It specifies the form of
equilibration that was done (see the description of equed in the Input
Parameters section).

sr This array is an output argument if fact ≠ 'F'.

See the description of sr in the Input Parameters section.

sc This array is an output argument if fact ≠ 'F'.

See the description of sc in the Input Parameters section.

b On exit, if equed = 'N', b is not modified; if equed = 'Y', b is
overwritten by diag(sr)*b.

x (local)
If info = 0, the n-by-nrhs solution matrix X to the original system of
equations.
Note that A and B are modified on exit if equed ≠ 'N', and the solution to
the equilibrated system is inv(diag(sc))*X.

rcond (global)
An estimate of the reciprocal condition number of the matrix A after
equilibration (if done). If rcond is less than the machine precision (in
particular, if rcond=0), the matrix is singular to working precision. This
condition is indicated by a return code of info > 0.

ferr Arrays of size at least max(LOC,n_b). The estimated forward error bounds
for each solution vector X(j) (the j-th column of the solution matrix X). If
xtrue is the true solution, ferr[j - 1] bounds the magnitude of the largest
entry in (X(j) - xtrue) divided by the magnitude of the largest entry in
X(j). The quality of the error bound depends on the quality of the estimate
of norm(inv(A)) computed in the code; if the estimate of norm(inv(A))
is accurate, the error bound is guaranteed.

berr (local)
Arrays of size at least max(LOC,n_b). The componentwise relative
backward error of each solution vector X(j) (the smallest relative change in
any entry of A or B that makes X(j) an exact solution).

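The componentwise relative backward error reported in berr is the standard Oettli-Prager measure, the maximum over rows i of |b - A*x|(i) / (|A|*|x| + |b|)(i). A serial sketch for one right-hand side follows; the function name is hypothetical, and the treatment of zero denominators is simplified to skipping those rows.

```c
#include <math.h>

/* Componentwise relative backward error of one solution vector x for
 * A*x = b (column-major, n-by-n):
 *   berr = max_i |b - A*x|_i / (|A|*|x| + |b|)_i
 * Rows whose denominator is zero are skipped in this simplified sketch. */
double backward_error(int n, const double *a, int lda,
                      const double *x, const double *b)
{
    double berr = 0.0;
    for (int i = 0; i < n; ++i) {
        double r = b[i];             /* residual entry (b - A*x)_i */
        double denom = fabs(b[i]);   /* (|A|*|x| + |b|)_i */
        for (int j = 0; j < n; ++j) {
            r -= a[i + j * lda] * x[j];
            denom += fabs(a[i + j * lda]) * fabs(x[j]);
        }
        if (denom > 0.0 && fabs(r) / denom > berr)
            berr = fabs(r) / denom;
    }
    return berr;
}
```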
work[0] (local) On exit, work[0] returns the minimal and optimal lwork.

info (global)
If info = 0, the execution is successful.

< 0: if info = -i, the i-th argument had an illegal value.

> 0: if info = i ≤ n, the leading minor of order i of a is not positive
definite, so the factorization could not be completed, and the solution and
error bounds could not be computed.
If info = n+1, rcond is less than machine precision. The factorization has been
completed, but the matrix is singular to working precision, and the solution
and error bounds have not been computed.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?pbsv
Solves a symmetric/Hermitian positive definite banded
system of linear equations.

Syntax
void pspbsv (char *uplo , MKL_INT *n , MKL_INT *bw , MKL_INT *nrhs , float *a , MKL_INT
*ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *descb , float *work , MKL_INT
*lwork , MKL_INT *info );
void pdpbsv (char *uplo , MKL_INT *n , MKL_INT *bw , MKL_INT *nrhs , double *a , MKL_INT
*ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *descb , double *work , MKL_INT
*lwork , MKL_INT *info );
void pcpbsv (char *uplo , MKL_INT *n , MKL_INT *bw , MKL_INT *nrhs , MKL_Complex8 *a ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *descb ,
MKL_Complex8 *work , MKL_INT *lwork , MKL_INT *info );
void pzpbsv (char *uplo , MKL_INT *n , MKL_INT *bw , MKL_INT *nrhs , MKL_Complex16 *a ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *descb ,
MKL_Complex16 *work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?pbsv function solves a system of linear equations

A(1:n, ja:ja+n-1)*X = B(ib:ib+n-1, 1:nrhs),

where A(1:n, ja:ja+n-1) is an n-by-n real/complex banded symmetric/Hermitian positive definite distributed
matrix with bandwidth bw.
Cholesky factorization is used to factor a reordering of the matrix into L*L'.


Input Parameters

uplo (global) Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored.

If uplo = 'U', the upper triangular part of A is stored.

If uplo = 'L', the lower triangular part of A is stored.

n (global) The order of the distributed matrix A (n ≥ 0).

bw (global) The number of subdiagonals in L or U. 0 ≤ bw ≤ n-1.

nrhs (global) The number of right-hand sides; the number of columns in
B (nrhs ≥ 0).

a (local)
Pointer into the local memory to an array with leading dimension lld_a ≥ (bw+1)
(stored in desca). On entry, this array contains the local pieces of the
distributed matrix sub(A) to be factored.

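For the symmetric band case, a leading dimension of bw+1 holds one triangle of the band. The sketch below assumes the LAPACK-style symmetric band layout (for uplo = 'U', A(i,j) goes to row bw+i-j of column j; for uplo = 'L', to row i-j), with a hypothetical function name; verify against the ScaLAPACK band storage description before use.

```c
#include <stddef.h>

/* Pack the uplo triangle of a dense symmetric n-by-n column-major matrix
 * into LAPACK-style symmetric band storage with ldab >= bw + 1.
 * Assumed layout:
 *   uplo 'U': A(i,j) -> ab[(bw + i - j) + j*ldab], max(0,j-bw) <= i <= j
 *   uplo 'L': A(i,j) -> ab[(i - j) + j*ldab],      j <= i <= min(n-1,j+bw)
 * Entries of ab outside the band are left untouched. */
void pack_sym_band(char uplo, int n, int bw, const double *a, int lda,
                   double *ab, int ldab)
{
    for (int j = 0; j < n; ++j) {
        if (uplo == 'U') {
            int ilo = j - bw > 0 ? j - bw : 0;
            for (int i = ilo; i <= j; ++i)
                ab[(bw + i - j) + (size_t)j * ldab] = a[i + (size_t)j * lda];
        } else {
            int ihi = j + bw < n - 1 ? j + bw : n - 1;
            for (int i = j; i <= ihi; ++i)
                ab[(i - j) + (size_t)j * ldab] = a[i + (size_t)j * lda];
        }
    }
}
```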
ja (global) The index in the global matrix A indicating the start of the matrix to
be operated on (which may be either all of A or a submatrix of A).

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

b (local)
Pointer into the local memory to an array with local leading dimension lld_b ≥ nb. On
entry, this array contains the local pieces of the right-hand sides B(ib:ib+n-1, 1:nrhs).

ib (global) The row index in the global matrix B indicating the first row of the
matrix to be operated on (which may be either all of B or a submatrix of B).

descb (global and local) array of size dlen.

If 1D type (dtype_b =502), dlen ≥ 7;

If 2D type (dtype_b =1), dlen ≥ 9.

The array descriptor for the distributed matrix B.

It describes how B is mapped to memory.

work (local).
Temporary workspace. This space may be overwritten in between calls to
functions. work must be the size given in lwork.

lwork (local or global) Size of user-input workspace work. If lwork is too small,
the minimal acceptable size will be returned in work[0] and an error code
is returned. lwork ≥ (nb+2*bw)*bw +max((bw*nrhs), bw*bw)

Output Parameters

a On exit, this array contains details of the factorization. Note that
permutations are performed on the matrix, so the factors returned differ
from those returned by LAPACK.

b On exit, this array contains the local pieces of the distributed solution matrix X.

work On exit, work[0] contains the minimal lwork.

info (global) If info=0, the execution is successful.

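< 0: If the i-th argument is an array and the j-th entry had an illegal value,
then info = -(i*100+j); if the i-th argument is a scalar and had an
illegal value, then info = -i.

> 0: If info = k ≤ NPROCS, the submatrix stored on processor info and
factored locally was not positive definite, and the factorization was not
completed.
If info = k > NPROCS, the submatrix stored on processor info-NPROCS
representing interactions with other processors was not positive definite,
and the factorization was not completed.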

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?ptsv
Solves a symmetric or Hermitian positive definite tridiagonal system of linear equations.

Syntax
void psptsv (MKL_INT *n , MKL_INT *nrhs , float *d , float *e , MKL_INT *ja , MKL_INT
*desca , float *b , MKL_INT *ib , MKL_INT *descb , float *work , MKL_INT *lwork ,
MKL_INT *info );
void pdptsv (MKL_INT *n , MKL_INT *nrhs , double *d , double *e , MKL_INT *ja , MKL_INT
*desca , double *b , MKL_INT *ib , MKL_INT *descb , double *work , MKL_INT *lwork ,
MKL_INT *info );
void pcptsv (char *uplo , MKL_INT *n , MKL_INT *nrhs , float *d , MKL_Complex8 *e ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *descb ,
MKL_Complex8 *work , MKL_INT *lwork , MKL_INT *info );
void pzptsv (char *uplo , MKL_INT *n , MKL_INT *nrhs , double *d , MKL_Complex16 *e ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *descb ,
MKL_Complex16 *work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?ptsv function solves a system of linear equations

A(1:n, ja:ja+n-1)*X = B(ib:ib+n-1, 1:nrhs),

where A(1:n, ja:ja+n-1) is an n-by-n tridiagonal positive definite distributed matrix: real symmetric for
psptsv and pdptsv, complex Hermitian for pcptsv and pzptsv.

Cholesky factorization is used to factor a reordering of the matrix into L*L'.
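To make the underlying arithmetic concrete, here is a serial sketch of a tridiagonal solve. It uses the square-root-free L*D*L^T variant of Cholesky factorization, which avoids sqrt. The Cholesky factor is recovered by scaling the columns of this unit L by sqrt(D). The sketch does not reproduce p?ptsv: it performs no reordering and no data distribution, it handles one right-hand side, and it covers only the real symmetric case. The name `tridiag_ldlt_solve` is hypothetical.

```c
/* Hypothetical serial illustration, not the p?ptsv algorithm.
 * Solves A*x = b in place, where A is n-by-n real symmetric positive definite
 * tridiagonal with diagonal d[0..n-1] and off-diagonal e[0..n-2].
 * On return d holds D, e holds the subdiagonal of the unit lower bidiagonal L
 * (A = L*D*L^T), and b holds x. Returns 0 on success, or k (1-based) if the
 * leading k-by-k minor is not positive definite. */
int tridiag_ldlt_solve(int n, double *d, double *e, double *b)
{
    /* Factorization: d[i] becomes D(i), e[i-1] becomes L(i, i-1). */
    for (int i = 0; i < n; ++i) {
        if (i > 0) {
            double l = e[i - 1] / d[i - 1];
            d[i] -= l * e[i - 1];
            e[i - 1] = l;
        }
        if (d[i] <= 0.0)
            return i + 1;
    }
    /* Forward substitution with L. */
    for (int i = 1; i < n; ++i)
        b[i] -= e[i - 1] * b[i - 1];
    /* Scaling by D. */
    for (int i = 0; i < n; ++i)
        b[i] /= d[i];
    /* Back substitution with L^T. */
    for (int i = n - 2; i >= 0; --i)
        b[i] -= e[i] * b[i + 1];
    return 0;
}
```

With d = {4, 4, 4}, e = {1, 1} and b = {6, 12, 14} (that is, b = A*[1, 2, 3]^T), the call returns 0 and leaves b approximately equal to {1, 2, 3}.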

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/
PerformanceIndex.
Notice revision #20201201


Input Parameters

n (global) The order of the matrix A (n ≥ 0).

nrhs (global) The number of right-hand sides; the number of columns of the
distributed submatrix B (nrhs ≥ 0).

d (local)
Pointer to local part of global vector storing the main diagonal of the matrix.

e (local)
Pointer to local part of global vector storing the upper diagonal of the
matrix. Globally, the n-th element of e is not referenced, and e must be aligned with d.

ja (global) The index in the global matrix A indicating the start of the matrix to
be operated on (which may be either all of A or a submatrix of A).

desca (global and local) array of size dlen.

If 1d type (dtype_a=501 or 502), dlen ≥ 7;

If 2d type (dtype_a=1), dlen ≥ 9.

The array descriptor for the distributed matrix A.

Contains information about the mapping of A to memory.

b (local)
Pointer into the local memory to an array of local leading dimension lld_b ≥ nb.

On entry, this array contains the local pieces of the right hand sides
B(ib:ib+n-1, 1:nrhs).

ib (global) The row index in the global matrix B indicating the first row of the
matrix to be operated on (which may be either all of B or a submatrix of B).

descb (global and local) array of size dlen.

If 1d type (dtype_b = 502), dlen ≥ 7;

If 2d type (dtype_b = 1), dlen ≥ 9.

The array descriptor for the distributed matrix B.

Contains information about the mapping of B to memory.

work (local).
Temporary workspace. This space may be overwritten in between calls to
functions. work must be the size given in lwork.

Output Parameters

d On exit, this array contains details of the factors of the matrix.
Must be of size greater than or equal to desca[nb_ - 1].

e On exit, this array contains details of the factors of the matrix.
Must be of size greater than or equal to desca[nb_ - 1].

b On exit, this array contains the local piece of the distributed solution matrix X.

work On exit, work[0] contains the minimal lwork.

info (local) If info=0, the execution is successful.

< 0: If the i-th argument is an array and the j-th entry had an illegal value,
then info = -(i*100+j); if the i-th argument is a scalar and had an
illegal value, then info = -i.

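> 0: If info = k ≤ NPROCS, the submatrix stored on processor info and
factored locally was not positive definite, and the factorization was not
completed. If info = k > NPROCS, the submatrix stored on processor
info-NPROCS representing interactions with other processors was not
positive definite, and the factorization was not completed.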

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?gels
Solves overdetermined or underdetermined linear
systems involving a matrix of full rank.

Syntax
void psgels (char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *nrhs , float *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb , float *work , MKL_INT *lwork , MKL_INT *info );
void pdgels (char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *nrhs , double *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb , double *work , MKL_INT *lwork , MKL_INT *info );
void pcgels (char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *nrhs , MKL_Complex8 *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib , MKL_INT
*jb , MKL_INT *descb , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT *info );
void pzgels (char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *nrhs , MKL_Complex16 *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib , MKL_INT
*jb , MKL_INT *descb , MKL_Complex16 *work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?gels function solves overdetermined or underdetermined real or complex linear systems involving an
m-by-n matrix sub(A) = A(ia:ia+m-1,ja:ja+n-1), or its transpose or conjugate transpose, using a QR or
LQ factorization of sub(A). It is assumed that sub(A) has full rank.
The following options are provided:

1. If trans = 'N' and m ≥ n: find the least squares solution of an overdetermined system, that is, solve
the least squares problem

minimize ||sub(B) - sub(A)*X||

2. If trans = 'N' and m < n: find the minimum norm solution of an underdetermined system
sub(A)*X = sub(B).
3. If trans = 'T' and m ≥ n: find the minimum norm solution of an underdetermined system
sub(A)^T*X = sub(B).
4. If trans = 'T' and m < n: find the least squares solution of an overdetermined system, that is, solve
the least squares problem
minimize ||sub(B) - sub(A)^T*X||.

Here sub(B) denotes B(ib:ib+m-1, jb:jb+nrhs-1) when trans = 'N' and B(ib:ib+n-1, jb:jb+nrhs-1)
otherwise. Several right-hand side vectors b and solution vectors x can be handled in a single call;
they are stored as the columns of sub(B). When trans = 'N', the n-by-nrhs solution is returned in
sub(B); otherwise, the m-by-nrhs solution is returned there.
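The four cases listed above depend only on trans and on whether m ≥ n. The sketch below encodes that table for the real flavors. It also records two related facts: the row count of sub(B) is given in the text above, and the QR or LQ choice comes from the description of the a output parameter. The enum and function names are hypothetical, and any trans value other than 'N' is treated as 'T'.

```c
/* Hypothetical classification of the problem p?gels solves (real flavors). */
typedef enum {
    GELS_LEAST_SQUARES,  /* overdetermined: minimize ||sub(B) - op(sub(A))*X|| */
    GELS_MINIMUM_NORM    /* underdetermined: minimum norm solution of op(sub(A))*X = sub(B) */
} gels_problem;

/* Any trans other than 'N' is treated as 'T' in this sketch. */
gels_problem gels_classify(char trans, long long m, long long n)
{
    if (trans == 'N')
        return (m >= n) ? GELS_LEAST_SQUARES : GELS_MINIMUM_NORM;  /* cases 1 and 2 */
    return (m >= n) ? GELS_MINIMUM_NORM : GELS_LEAST_SQUARES;      /* cases 3 and 4 */
}

/* Rows of sub(B) on entry: m when trans = 'N', n otherwise. */
long long gels_rows_of_b(char trans, long long m, long long n)
{
    return (trans == 'N') ? m : n;
}

/* sub(A) is factored by QR (p?geqrf) when m >= n and by LQ (p?gelqf) otherwise. */
int gels_uses_qr(long long m, long long n)
{
    return m >= n;
}
```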

Input Parameters

trans (global) Must be 'N', or 'T'.

If trans = 'N', the linear system involves matrix sub(A);

If trans = 'T', the linear system involves the transposed matrix A^T (for
real flavors only).

m (global) The number of rows in the distributed matrix sub(A) (m ≥ 0).

n (global) The number of columns in the distributed matrix sub(A) (n ≥ 0).

nrhs (global) The number of right-hand sides; the number of columns in the
distributed submatrices sub(B) and X (nrhs ≥ 0).

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1). On
entry, contains the m-by-n matrix A.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

b (local)
Pointer into the local memory to an array of local size lld_b*LOCc(jb
+nrhs-1). On entry, this array contains the local pieces of the distributed
matrix B of right-hand side vectors, stored columnwise; sub(B) is m-by-
nrhs if trans='N', and n-by-nrhs otherwise.

ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of the submatrix B, respectively.

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.

work (local)
Workspace array with size lwork.

lwork (local or global) The size of the array work. lwork is local input and must be at least

lwork ≥ ltau + max(lwf, lws),

where, if m ≥ n:
ltau = numroc(ja+min(m,n)-1, nb_a, MYCOL, csrc_a, NPCOL),
lwf = nb_a*(mpa0 + nqa0 + nb_a),
lws = max((nb_a*(nb_a-1))/2, (nrhsqb0 + mpb0)*nb_a) + nb_a*nb_a;

otherwise (m < n):
ltau = numroc(ia+min(m,n)-1, mb_a, MYROW, rsrc_a, NPROW),
lwf = mb_a*(mpa0 + nqa0 + mb_a),
lws = max((mb_a*(mb_a-1))/2, (npb0 + max(nqa0 +
numroc(numroc(n+iroffb, mb_a, 0, 0, NPROW), mb_a, 0, 0, lcmp), nrhsqb0))*mb_a)
+ mb_a*mb_a;

and where
lcmp = lcm/NPROW, with lcm = ilcm(NPROW, NPCOL),
iroffa = mod(ia-1, mb_a),
icoffa = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
mpa0 = numroc(m+iroffa, mb_a, MYROW, iarow, NPROW),
nqa0 = numroc(n+icoffa, nb_a, MYCOL, iacol, NPCOL),
iroffb = mod(ib-1, mb_b),
icoffb = mod(jb-1, nb_b),
ibrow = indxg2p(ib, mb_b, MYROW, rsrc_b, NPROW),
ibcol = indxg2p(jb, nb_b, MYCOL, csrc_b, NPCOL),
mpb0 = numroc(m+iroffb, mb_b, MYROW, ibrow, NPROW),
nqb0 = numroc(n+icoffb, nb_b, MYCOL, ibcol, NPCOL).

NOTE
mod(x,y) is the integer remainder of x/y.

ilcm, indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL,
NPROW, and NPCOL can be determined by calling the function
blacs_gridinfo.
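The ilcm, indxg2p and numroc tool functions appear throughout the workspace formulas. The following re-implementations of their standard block-cyclic definitions are given only to make the formulas concrete. In an application, call the library routines themselves. The `_sketch` names are hypothetical, and `long long` stands in for `MKL_INT`.

```c
/* Hypothetical re-implementation of numroc: the number of rows (or columns) of a
 * dimension of global size n, split into blocks of size nb, that process iproc owns
 * when the first block lives on process isrcproc among nprocs processes. */
long long numroc_sketch(long long n, long long nb, long long iproc,
                        long long isrcproc, long long nprocs)
{
    long long mydist = (nprocs + iproc - isrcproc) % nprocs; /* distance from source */
    long long nblocks = n / nb;                              /* whole blocks */
    long long count = (nblocks / nprocs) * nb;               /* share every process gets */
    long long extrablks = nblocks % nprocs;                  /* leftover whole blocks */
    if (mydist < extrablks)
        count += nb;        /* one extra whole block */
    else if (mydist == extrablks)
        count += n % nb;    /* the trailing partial block */
    return count;
}

/* Hypothetical re-implementation of indxg2p: process coordinate that owns the
 * 1-based global index indxglob. */
long long indxg2p_sketch(long long indxglob, long long nb, long long iproc,
                         long long isrcproc, long long nprocs)
{
    (void)iproc;  /* unused; kept to mirror the argument order in the formulas */
    return (isrcproc + (indxglob - 1) / nb) % nprocs;
}

/* Hypothetical re-implementation of ilcm: least common multiple of m, n > 0. */
long long ilcm_sketch(long long m, long long n)
{
    long long a = m, b = n;
    while (b != 0) {        /* Euclid: a ends as gcd(m, n) */
        long long t = a % b;
        a = b;
        b = t;
    }
    return (m / a) * n;
}
```

For a dimension of 10 split into blocks of 3 over 2 processes, numroc_sketch gives 6 rows to process 0 and 4 to process 1. indxg2p_sketch(4, 3, 0, 0, 2) returns 1 because global index 4 begins the second block.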
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.
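The lwork = -1 convention leads to a two-call pattern: query, allocate, then call again. The sketch below shows only that control flow. `mock_solver` is a hypothetical stand-in that mimics the convention rather than a call into p?gels, and its 1000-element requirement and error codes are made up. With a single precision routine, the size read back from work[0] is a float, so rounding it up before allocating may be a sensible precaution.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a driver such as p?gels: it treats lwork = -1 as a
 * workspace query but performs no computation. The 1000-element requirement
 * and the error code -1 are made up for this sketch. */
void mock_solver(double *work, long long *lwork, long long *info)
{
    const long long required = 1000;
    if (*lwork == -1) {               /* workspace query: report size in work[0] */
        work[0] = (double)required;
        *info = 0;
        return;
    }
    *info = (*lwork < required) ? -1 : 0;
}

/* Query, allocate, then call again with the real workspace.
 * Returns the final info and stores the lwork that was used. */
long long solve_with_query(long long *lwork_used)
{
    double query = 0.0;
    long long lwork = -1, info = 0;

    mock_solver(&query, &lwork, &info);   /* first call: size only */
    if (info != 0)
        return info;

    lwork = (long long)query;             /* size reported in work[0] */
    double *work = malloc((size_t)lwork * sizeof *work);
    if (work == NULL)
        return -9999;                     /* made-up allocation-failure code */

    mock_solver(work, &lwork, &info);     /* second call: the real work */
    free(work);
    *lwork_used = lwork;
    return info;
}
```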


Output Parameters

a On exit, if m ≥ n, sub(A) is overwritten by the details of its QR factorization as
returned by p?geqrf; if m < n, sub(A) is overwritten by the details of its LQ
factorization as returned by p?gelqf.

b On exit, sub(B) is overwritten by the solution vectors, stored columnwise:
If trans = 'N' and m ≥ n, rows 1 to n of sub(B) contain the least squares
solution vectors; the residual sum of squares for the solution in each
column is given by the sum of squares of elements n+1 to m in that
column.
If trans = 'N' and m < n, rows 1 to n of sub(B) contain the minimum
norm solution vectors.
If trans = 'T' and m ≥ n, rows 1 to m of sub(B) contain the minimum norm
solution vectors.
If trans = 'T' and m < n, rows 1 to m of sub(B) contain the least squares
solution vectors; the residual sum of squares for the solution in each
column is given by the sum of squares of elements m+1 to n in that column.
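The residual sum of squares described above can be read directly from sub(B). The following is a serial sketch for one column of a column-major array, assuming that the whole column is available locally. In a distributed run, those rows are spread over a process column, so the partial sums would still need to be combined across processes. `gels_residual_ss` is a hypothetical name.

```c
/* Hypothetical helper: residual sum of squares for column col (0-based) of a
 * column-major array b with leading dimension ldb, after a trans = 'N', m >= n
 * solve. Elements n+1..m (1-based) are b[n..m-1] (0-based) within the column. */
double gels_residual_ss(const double *b, long long ldb, long long m, long long n,
                        long long col)
{
    const double *c = b + col * ldb;   /* start of the column */
    double s = 0.0;
    for (long long i = n; i < m; ++i)
        s += c[i] * c[i];
    return s;
}
```

For the trans = 'T', m < n case, the same loop applies with the roles of m and n swapped.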

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry had an illegal value,
then info = -(i*100+j); if the i-th argument is a scalar and had an
illegal value, then info = -i.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?syev
Computes all eigenvalues and, optionally,
eigenvectors of a symmetric matrix.

Syntax
void pssyev (char *jobz , char *uplo , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *w , float *z , MKL_INT *iz , MKL_INT *jz , MKL_INT
*descz , float *work , MKL_INT *lwork , MKL_INT *info );
void pdsyev (char *jobz , char *uplo , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *w , double *z , MKL_INT *iz , MKL_INT *jz , MKL_INT
*descz , double *work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?syev function computes all eigenvalues and, optionally, eigenvectors of a real symmetric matrix A by
calling the recommended sequence of ScaLAPACK functions.

In its present form, the function assumes a homogeneous system and makes no checks for consistency of
the eigenvalues or eigenvectors across the different processes. Because of this, it is possible that a
heterogeneous system may return incorrect results without any error messages.

Input Parameters
np = the number of rows local to a given process.
nq = the number of columns local to a given process.

jobz (global) Must be 'N' or 'V'. Specifies whether it is necessary to compute the
eigenvectors:
If jobz ='N', then only eigenvalues are computed.

If jobz ='V', then eigenvalues and eigenvectors are computed.

uplo (global) Must be 'U' or 'L'. Specifies whether the upper or lower
triangular part of the symmetric matrix A is stored:
If uplo = 'U', a stores the upper triangular part of A.

If uplo = 'L', a stores the lower triangular part of A.

n (global) The number of rows and columns of the matrix A (n ≥ 0).

a (local)
Block cyclic array of global size n*n and local size lld_a*LOCc(ja+n-1).
On entry, the symmetric matrix A.
If uplo = 'U', only the upper triangular part of A is used to define the
elements of the symmetric matrix.
If uplo = 'L', only the lower triangular part of A is used to define the
elements of the symmetric matrix.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

iz, jz (global) The row and column indices in the global matrix Z indicating the
first row and the first column of the submatrix Z, respectively.

descz (global and local) array of size dlen_. The array descriptor for the
distributed matrix Z.

work (local)
Array of size lwork.

lwork (local) See below for definitions of variables used to define lwork.
If no eigenvectors are requested (jobz = 'N'), then lwork ≥ 5*n +
sizesytrd + 1,
where sizesytrd is the workspace for p?sytrd and is max(NB*(np + 1),
3*NB).
If eigenvectors are requested (jobz = 'V') then the amount of workspace
required to guarantee that all eigenvectors are computed is:
qrmem = 2*n-2


lwmin = 5*n + n*ldc + max(sizemqrleft, qrmem) + 1

Variable definitions:
nb = desca[mb_ - 1] = desca[nb_ - 1] = descz[mb_ - 1] = descz[nb_ - 1];
nn = max(n, nb, 2);
desca[rsrc_ - 1] = desca[csrc_ - 1] = descz[rsrc_ - 1] = descz[csrc_ - 1] = 0
np = numroc(nn, nb, 0, 0, NPROW)
nq = numroc(max(n, nb, 2), nb, 0, 0, NPCOL)
nrc = numroc(n, nb, myprowc, 0, NPROCS)
ldc = max(1, nrc)
sizemqrleft is the workspace for p?ormtr when its side argument is 'L'.
myprowc is defined when a new context is created as follows:
call blacs_get(desca[ctxt_ - 1], 0, contextc)
call blacs_gridinit(contextc, 'R', NPROCS, 1)
call blacs_gridinfo(contextc, nprowc, npcolc, myprowc,
mypcolc)
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.
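Using the definitions above, the jobz = 'N' bound can be evaluated in a few lines. The following is a sketch. It assumes square nb-by-nb blocks with source process 0, as the variable definitions require, and it uses a re-implementation of numroc (`numroc_sketch`, hypothetical) in place of the library routine. The name `syev_lwork_novec` is also hypothetical.

```c
/* Hypothetical re-implementation of numroc, for illustration only. */
long long numroc_sketch(long long n, long long nb, long long iproc,
                        long long isrcproc, long long nprocs)
{
    long long mydist = (nprocs + iproc - isrcproc) % nprocs;
    long long nblocks = n / nb;
    long long count = (nblocks / nprocs) * nb;
    long long extrablks = nblocks % nprocs;
    if (mydist < extrablks)
        count += nb;
    else if (mydist == extrablks)
        count += n % nb;
    return count;
}

static long long max3(long long a, long long b, long long c)
{
    long long m = a > b ? a : b;
    return m > c ? m : c;
}

/* Hypothetical helper: minimum lwork for p?syev with jobz = 'N':
 *   nn = max(n, nb, 2), np = numroc(nn, nb, 0, 0, nprow),
 *   sizesytrd = max(nb*(np + 1), 3*nb), lwork >= 5*n + sizesytrd + 1. */
long long syev_lwork_novec(long long n, long long nb, long long nprow)
{
    long long nn = max3(n, nb, 2);
    long long np = numroc_sketch(nn, nb, 0, 0, nprow);
    long long a = nb * (np + 1), b = 3 * nb;
    long long sizesytrd = a > b ? a : b;
    return 5 * n + sizesytrd + 1;
}
```

For n = 100, nb = 32 and NPROW = 2, np = 64 and sizesytrd = 2080, so lwork ≥ 500 + 2080 + 1 = 2581.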

Output Parameters

a On exit, the lower triangle (if uplo='L') or the upper triangle (if
uplo='U') of A, including the diagonal, is destroyed.

w (global).
Array of size n.
On normal exit, the first entries contain the selected eigenvalues in
ascending order.

z (local).
Array, global size n*n, local size lld_z*LOCc(jz+n-1). If jobz = 'V',
then on normal exit the first columns of z contain the orthonormal
eigenvectors of the matrix corresponding to the selected eigenvalues.
If jobz = 'N', then z is not referenced.

work[0] On output, work[0] returns the workspace needed to guarantee
completion. If the input parameters are incorrect, work[0] may also be
incorrect.
If jobz = 'N', work[0] = minimal (optimal) amount of workspace.
If jobz = 'V', work[0] = minimal workspace required to generate all the
eigenvectors.

info (global)
If info = 0, the execution is successful.

If info < 0: If the i-th argument is an array and the j-th entry had an illegal
value, then info = -(i*100+j); if the i-th argument is a scalar and had
an illegal value, then info = -i.

If info > 0:

If info = 1 through n, the i-th eigenvalue did not converge in ?steqr2
after a total of 30*n iterations.
If info = n+1, then p?syev has detected heterogeneity by finding that
eigenvalues were not identical across the process grid. In this case, the
accuracy of the results from p?syev cannot be guaranteed.
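A small helper that maps info from p?syev to the cases listed above can make error handling more readable. This is a sketch only. The enum and function names are hypothetical, and positive values outside the documented range are reported as unlisted rather than interpreted.

```c
/* Hypothetical classification of info returned by p?syev. */
typedef enum {
    SYEV_OK,             /* info = 0 */
    SYEV_ILLEGAL_ARG,    /* info < 0 */
    SYEV_NOT_CONVERGED,  /* 1 <= info <= n: eigenvalue info did not converge */
    SYEV_HETEROGENEOUS,  /* info = n + 1 */
    SYEV_UNLISTED        /* any other positive value: not covered above */
} syev_status;

/* For SYEV_NOT_CONVERGED, *eig_index receives the 1-based eigenvalue index;
 * otherwise it is set to 0. */
syev_status syev_classify_info(long long info, long long n, long long *eig_index)
{
    *eig_index = 0;
    if (info == 0)
        return SYEV_OK;
    if (info < 0)
        return SYEV_ILLEGAL_ARG;
    if (info <= n) {
        *eig_index = info;
        return SYEV_NOT_CONVERGED;
    }
    if (info == n + 1)
        return SYEV_HETEROGENEOUS;
    return SYEV_UNLISTED;
}
```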

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?syevd
Computes all eigenvalues and eigenvectors of a real
symmetric matrix by using a divide and conquer
algorithm.

Syntax
void pssyevd (char *jobz , char *uplo , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *w , float *z , MKL_INT *iz , MKL_INT *jz , MKL_INT
*descz , float *work , MKL_INT *lwork , MKL_INT *iwork , MKL_INT *liwork , MKL_INT
*info );
void pdsyevd (char *jobz , char *uplo , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *w , double *z , MKL_INT *iz , MKL_INT *jz , MKL_INT
*descz , double *work , MKL_INT *lwork , MKL_INT *iwork , MKL_INT *liwork , MKL_INT
*info );

Include Files
• mkl_scalapack.h

Description
The p?syevd function computes all eigenvalues and eigenvectors of a real symmetric matrix A by using a
divide and conquer algorithm.

Input Parameters
np = the number of rows local to a given process.
nq = the number of columns local to a given process.

jobz (global) Must be 'N' or 'V'.

Specifies whether it is necessary to compute the eigenvectors:

If jobz = 'N', then only eigenvalues are computed (not yet
implemented).
If jobz = 'V', then eigenvalues and eigenvectors are computed.


uplo (global) Must be 'U' or 'L'.

Specifies whether the upper or lower triangular part of the symmetric matrix
A is stored:
If uplo = 'U', a stores the upper triangular part of A.

If uplo = 'L', a stores the lower triangular part of A.

n (global) The number of rows and columns of the matrix A (n ≥ 0).

a (local).
Block cyclic array of global size n*n and local size lld_a*LOCc(ja+n-1).
On entry, the symmetric matrix A.
If uplo = 'U', only the upper triangular part of A is used to define the
elements of the symmetric matrix.
If uplo = 'L', only the lower triangular part of A is used to define the
elements of the symmetric matrix.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A. If desca[ctxt_ - 1] is incorrect, p?syevd cannot
guarantee correct error reporting.

iz, jz (global) The row and column indices in the global matrix Z indicating the
first row and the first column of the submatrix Z, respectively.

descz (global and local) array of size dlen_. The array descriptor for the
distributed matrix Z. descz[ctxt_ - 1] must equal desca[ctxt_ - 1].

work (local).
Array of size lwork.

lwork (local) The size of the array work.

If eigenvalues are requested:
lwork ≥ max(1 + 6*n + 2*np*nq, trilwmin) + 2*n,
with
trilwmin = 3*n + max(nb*(np + 1), 3*nb),
np = numroc(n, nb, myrow, iarow, NPROW),
nq = numroc(n, nb, mycol, iacol, NPCOL).
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the size required for optimal
performance for all work arrays. The required workspace is returned as the
first element of the corresponding work arrays, and no error message is
issued by pxerbla.

iwork (local) Workspace array of size liwork.

liwork (local) The size of iwork.

liwork = 7*n + 8*npcol + 2.
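The p?syevd workspace formulas translate directly into code. The following sketch evaluates lwork and liwork for one process. Because np and nq depend on the process coordinates, the lwork value can differ from one process to the next. `numroc_sketch` re-implements numroc for illustration, and all names are hypothetical.

```c
/* Hypothetical re-implementation of numroc, for illustration only. */
long long numroc_sketch(long long n, long long nb, long long iproc,
                        long long isrcproc, long long nprocs)
{
    long long mydist = (nprocs + iproc - isrcproc) % nprocs;
    long long nblocks = n / nb;
    long long count = (nblocks / nprocs) * nb;
    long long extrablks = nblocks % nprocs;
    if (mydist < extrablks)
        count += nb;
    else if (mydist == extrablks)
        count += n % nb;
    return count;
}

/* Hypothetical helper: lwork >= max(1 + 6*n + 2*np*nq, trilwmin) + 2*n for the
 * process at grid coordinates (myrow, mycol). */
long long syevd_lwork(long long n, long long nb, long long myrow, long long mycol,
                      long long iarow, long long iacol, long long nprow, long long npcol)
{
    long long np = numroc_sketch(n, nb, myrow, iarow, nprow);
    long long nq = numroc_sketch(n, nb, mycol, iacol, npcol);
    long long t = nb * (np + 1);
    long long trilwmin = 3 * n + (t > 3 * nb ? t : 3 * nb);
    long long a = 1 + 6 * n + 2 * np * nq;
    return (a > trilwmin ? a : trilwmin) + 2 * n;
}

/* Hypothetical helper: liwork = 7*n + 8*npcol + 2. */
long long syevd_liwork(long long n, long long npcol)
{
    return 7 * n + 8 * npcol + 2;
}
```

For n = 100 and nb = 32 on a 2 x 2 grid with iarow = iacol = 0, process (0,0) needs lwork ≥ 8993, process (1,1) needs lwork ≥ 3393, and liwork = 718 everywhere.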

Output Parameters

a On exit, the lower triangle (if uplo = 'L'), or the upper triangle (if uplo =
'U') of A, including the diagonal, is overwritten.

w (global).
Array of size n. If info = 0, w contains the eigenvalues in the ascending
order.

z (local).
Array, global size (n, n), local size lld_z*LOCc(jz+n-1).

The z parameter contains the orthonormal eigenvectors of the matrix A.

work[0] On exit, returns adequate workspace to allow optimal performance.

iwork[0] (local).
On exit, if liwork > 0, iwork[0] returns the optimal liwork.

info (global)
If info = 0, the execution is successful.

If info < 0:

If the i-th argument is an array and the j-entry had an illegal value, then
info = -(i*100+j). If the i-th argument is a scalar and had an illegal
value, then info = -i.

If info > 0:

The algorithm failed to compute the info/(n+1)-th eigenvalue while
working on the submatrix lying in global rows and columns mod(info, n+1).

NOTE
mod(x,y) is the integer remainder of x/y.
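Decoding a positive info from p?syevd takes one division and one remainder. Here is a sketch that uses a hypothetical function name.

```c
/* Hypothetical helper: for info > 0 from p?syevd with matrix order n, returns the
 * index of the eigenvalue that failed, info/(n+1), and the global row/column
 * index of the submatrix being worked on, mod(info, n+1). */
void syevd_decode_info(long long info, long long n,
                       long long *eig_index, long long *submatrix)
{
    *eig_index = info / (n + 1);   /* info/(n+1)-th eigenvalue */
    *submatrix = info % (n + 1);   /* mod(info, n+1) */
}
```

For example, with n = 10 and info = 38, the failure concerns eigenvalue 3, in rows and columns 5.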

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?syevr
Computes selected eigenvalues and, optionally,
eigenvectors of a real symmetric matrix using
Relatively Robust Representation.

Syntax
void pssyevr(char* jobz, char* range, char* uplo, MKL_INT* n, float* a, MKL_INT* ia,
MKL_INT* ja, MKL_INT* desca, float* vl, float* vu, MKL_INT* il, MKL_INT* iu, MKL_INT* m,
MKL_INT* nz, float* w, float* z, MKL_INT* iz, MKL_INT* jz, MKL_INT* descz, float* work,
MKL_INT* lwork, MKL_INT* iwork, MKL_INT* liwork, MKL_INT* info);
void pdsyevr(char* jobz, char* range, char* uplo, MKL_INT* n, double* a, MKL_INT* ia,
MKL_INT* ja, MKL_INT* desca, double* vl, double* vu, MKL_INT* il, MKL_INT* iu, MKL_INT*
m, MKL_INT* nz, double* w, double* z, MKL_INT* iz, MKL_INT* jz, MKL_INT* descz, double*
work, MKL_INT* lwork, MKL_INT* iwork, MKL_INT* liwork, MKL_INT* info);


Include Files
• mkl_scalapack.h

Description
p?syevr computes selected eigenvalues and, optionally, eigenvectors of a real symmetric matrix A
distributed in 2D block-cyclic format by calling the recommended sequence of ScaLAPACK functions.
First, the matrix A is reduced to real symmetric tridiagonal form. Then, the eigenproblem is solved using the
parallel MRRR algorithm. Last, if eigenvectors have been computed, a back-transformation is done.
Upon successful completion, each processor stores a copy of all computed eigenvalues in w. The eigenvector
matrix z is stored in 2D block-cyclic format distributed over all processors.

Note that subsets of eigenvalues/vectors can be selected by specifying a range of values or a range of indices
for the desired eigenvalues.


Input Parameters

jobz (global)
Specifies whether or not to compute the eigenvectors:
= 'N': Compute eigenvalues only.
= 'V': Compute eigenvalues and eigenvectors.

range (global)
= 'A': all eigenvalues will be found.
= 'V': all eigenvalues in the interval [vl,vu] will be found.

= 'I': the il-th through iu-th eigenvalues will be found.

uplo (global)
Specifies whether the upper or lower triangular part of the symmetric
matrix A is stored:
= 'U': Upper triangular
= 'L': Lower triangular

n (global )
The number of rows and columns of the matrix a. n≥ 0

a Block-cyclic array of global size n*n, local size lld_a*LOCc(ja+n-1).

This array contains the local pieces of the symmetric distributed matrix A. If
uplo = 'U', only the upper triangular part of a is used to define the
elements of the symmetric matrix. If uplo = 'L', only the lower triangular
part of a is used to define the elements of the symmetric matrix.

On exit, the lower triangle (if uplo='L') or the upper triangle (if uplo='U')
of a, including the diagonal, is destroyed.

ia (global )
Global row index in the global matrix A that points to the beginning of the
submatrix which is to be operated on. It should be set to 1 when operating
on a full matrix.

ja (global )
Global column index in the global matrix A that points to the beginning of
the submatrix which is to be operated on. It should be set to 1 when
operating on a full matrix.

desca (global and local) array of size dlen_=9.

The array descriptor for the distributed matrix a.

vl (global )
If range='V', the lower bound of the interval to be searched for
eigenvalues. Not referenced if range = 'A' or 'I'.

vu (global )
If range='V', the upper bound of the interval to be searched for
eigenvalues. Not referenced if range = 'A' or 'I'.

il (global )
If range='I', the index (from smallest to largest) of the smallest eigenvalue
to be returned. il≥ 1.

Not referenced if range = 'A'.

iu (global )
If range='I', the index (from smallest to largest) of the largest eigenvalue
to be returned. min(il,n) ≤ iu ≤ n.

Not referenced if range = 'A'.

iz (global )
Global row index in the global matrix Z that points to the beginning of the
submatrix which is to be operated on. It should be set to 1 when operating
on a full matrix.

jz (global )
Global column index in the global matrix Z that points to the beginning of
the submatrix which is to be operated on. It should be set to 1 when
operating on a full matrix.

descz array of size dlen_.

The array descriptor for the distributed matrix z.

The context descz[ctxt_ - 1] must equal desca[ctxt_ - 1]. Also note the
array alignment requirements specified below.

work (local workspace) array of size lwork

lwork (local )
Size of work, must be at least 3.


See below for definitions of variables used to define lwork.

If no eigenvectors are requested (jobz = 'N') then

lwork ≥ 2 + 5*n + max(12*nn, nb*(np0 + 1))

If eigenvectors are requested (jobz = 'V') then the amount of workspace
required is:
lwork ≥ 2 + 5*n + max(18*nn, np0*mq0 + 2*nb*nb) + (2 + iceil(neig, nprow*npcol))*nn
Variable definitions:
neig = number of eigenvectors requested
nb = desca[mb_ - 1] = desca[nb_ - 1] = descz[mb_ - 1] = descz[nb_ - 1]
nn = max(n, nb, 2)
desca[rsrc_ - 1] = desca[csrc_ - 1] = descz[rsrc_ - 1] = descz[csrc_ - 1] = 0
np0 = numroc(nn, nb, 0, 0, nprow)
mq0 = numroc(max(neig, nb, 2), nb, 0, 0, npcol)

iceil(x, y) is a ScaLAPACK function returning ceiling(x/y), and nprow and
npcol can be determined by calling the function blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the size required for optimal
performance for all work arrays. Each of these values is returned in the first
entry of the corresponding work arrays, and no error message is issued by
pxerbla.

liwork (local )
size of iwork

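Let nnp = max(n, nprow*npcol + 1, 4). Then:

liwork ≥ 12*nnp + 2*n when the eigenvectors are desired

liwork ≥ 10*nnp + 2*n when only the eigenvalues have to be computed

If liwork = -1, then liwork is global input and a workspace query is
assumed; the function only calculates the size required for optimal
performance for all work arrays. Each of these values is returned in the first
entry of the corresponding work arrays, and no error message is issued by
pxerbla.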
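The liwork bound above can be written as a one-line function. `syevr_liwork` is a hypothetical name, and `long long` stands in for `MKL_INT`.

```c
/* Hypothetical helper: minimum liwork for p?syevr.
 * nnp = max(n, nprow*npcol + 1, 4); liwork >= 12*nnp + 2*n with eigenvectors,
 * liwork >= 10*nnp + 2*n for eigenvalues only. */
long long syevr_liwork(long long n, long long nprow, long long npcol, int want_vectors)
{
    long long nnp = n;
    if (nprow * npcol + 1 > nnp)
        nnp = nprow * npcol + 1;
    if (4 > nnp)
        nnp = 4;
    return (want_vectors ? 12 : 10) * nnp + 2 * n;
}
```

For n = 100 on a 2 x 2 grid, nnp = 100, so liwork ≥ 1400 with eigenvectors and ≥ 1200 without.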

Output Parameters

m (global )
Total number of eigenvalues found. 0 ≤ m ≤ n.

nz (global )
Total number of eigenvectors computed. 0 ≤ nz ≤ m.

The number of columns of z that are filled.

If jobz≠ 'V', nz is not referenced.

If jobz = 'V', nz = m

w (global ) array of size n

Upon successful exit, the first m entries contain the selected eigenvalues in
ascending order.

z Block-cyclic array, global size n*n, local size lld_z*LOCc(jz+n-1).

On exit, contains local pieces of distributed matrix Z.

work On return, work[0] contains the optimal amount of workspace required for
efficient execution. If jobz = 'N', work[0] = optimal amount of workspace
required to compute the eigenvalues. If jobz = 'V', work[0] = optimal
amount of workspace required to compute eigenvalues and eigenvectors.

iwork (local workspace) array

On return, iwork[0] contains the amount of integer workspace required.

info (global )
= 0: successful exit
< 0: If the i-th argument is an array and the j-th entry had an illegal value,
then info = -(i*100+j); if the i-th argument is a scalar and had an illegal
value, then info = -i.

Application Notes
The distributed submatrices a(ia:*, ja:*) and z(iz:iz+m-1,jz:jz+n-1) must satisfy the following
alignment properties:

1. Identical (quadratic) dimension: desca[m_ - 1] = descz[m_ - 1] = desca[n_ - 1] = descz[n_ - 1]

2. Quadratic conformal blocking: desca[mb_ - 1] = desca[nb_ - 1] = descz[mb_ - 1] = descz[nb_ - 1],
desca[rsrc_ - 1] = descz[rsrc_ - 1]
3. mod( ia-1, mb_a ) = mod( iz-1, mb_z ) = 0

NOTE
mod(x,y) is the integer remainder of x/y.
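The three alignment properties can be checked before calling p?syevr. The sketch below uses 0-based C indexing together with the document's 1-based descriptor names (dtype_ = 1 through lld_ = 9, the standard 9-entry ScaLAPACK layout), so that desca[m_ - 1] reads exactly as in the list above. `syevr_aligned` is hypothetical, and it checks only the three listed properties.

```c
/* 1-based positions in a 9-entry ScaLAPACK array descriptor. */
enum { dtype_ = 1, ctxt_, m_, n_, mb_, nb_, rsrc_, csrc_, lld_ };

/* Hypothetical helper: returns 1 when the three alignment properties hold, else 0. */
int syevr_aligned(const long long *desca, const long long *descz, long long ia, long long iz)
{
    /* 1. Identical (quadratic) dimension. */
    if (desca[m_ - 1] != descz[m_ - 1] || desca[m_ - 1] != desca[n_ - 1] ||
        desca[n_ - 1] != descz[n_ - 1])
        return 0;
    /* 2. Quadratic conformal blocking and the same row source process. */
    if (desca[mb_ - 1] != desca[nb_ - 1] || desca[nb_ - 1] != descz[mb_ - 1] ||
        descz[mb_ - 1] != descz[nb_ - 1] || desca[rsrc_ - 1] != descz[rsrc_ - 1])
        return 0;
    /* 3. ia and iz start on block boundaries. */
    if ((ia - 1) % desca[mb_ - 1] != 0 || (iz - 1) % descz[mb_ - 1] != 0)
        return 0;
    return 1;
}
```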

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?syevx
Computes selected eigenvalues and, optionally,
eigenvectors of a symmetric matrix.

Syntax
void pssyevx (char *jobz , char *range , char *uplo , MKL_INT *n , float *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , float *vl , float *vu , MKL_INT *il , MKL_INT *iu ,
float *abstol , MKL_INT *m , MKL_INT *nz , float *w , float *orfac , float *z , MKL_INT
*iz , MKL_INT *jz , MKL_INT *descz , float *work , MKL_INT *lwork , MKL_INT *iwork ,
MKL_INT *liwork , MKL_INT *ifail , MKL_INT *iclustr , float *gap , MKL_INT *info );


void pdsyevx (char *jobz , char *range , char *uplo , MKL_INT *n , double *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , double *vl , double *vu , MKL_INT *il , MKL_INT
*iu , double *abstol , MKL_INT *m , MKL_INT *nz , double *w , double *orfac , double
*z , MKL_INT *iz , MKL_INT *jz , MKL_INT *descz , double *work , MKL_INT *lwork ,
MKL_INT *iwork , MKL_INT *liwork , MKL_INT *ifail , MKL_INT *iclustr , double *gap ,
MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?syevx function computes selected eigenvalues and, optionally, eigenvectors of a real symmetric matrix
A by calling the recommended sequence of ScaLAPACK functions. Eigenvalues and eigenvectors can be
selected by specifying either a range of values or a range of indices for the desired eigenvalues.


Input Parameters
np = the number of rows local to a given process.
nq = the number of columns local to a given process.

jobz (global) Must be 'N' or 'V'. Specifies whether it is necessary to compute the
eigenvectors:
If jobz ='N', then only eigenvalues are computed.

If jobz ='V', then eigenvalues and eigenvectors are computed.

range (global) Must be 'A', 'V', or 'I'.

If range = 'A', all eigenvalues will be found.

If range = 'V', all eigenvalues in the half-open interval [vl, vu] will be
found.
If range = 'I', the eigenvalues with indices il through iu will be found.

uplo (global) Must be 'U' or 'L'.

Specifies whether the upper or lower triangular part of the symmetric
matrix A is stored:
If uplo = 'U', a stores the upper triangular part of A.

If uplo = 'L', a stores the lower triangular part of A.

n (global) The number of rows and columns of the matrix A (n ≥ 0).

If uplo = 'L', only the lower triangular part of A is used to define the
elements of the symmetric matrix.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

vl, vu (global)
If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues; vl ≤ vu. Not referenced if range = 'A' or 'I'.

il, iu (global)
If range ='I', the indices of the smallest and largest eigenvalues to be
returned.
Constraints: il ≥ 1

min(il,n) ≤ iu ≤ n
Not referenced if range = 'A' or 'V'.

abstol (global) The absolute error tolerance for the eigenvalues. An approximate eigenvalue is
accepted as converged when it is determined to lie in an interval [a, b] of width less than or equal to

abstol + eps * max(|a|,|b|),

where eps is the machine precision. If abstol is less than or equal to zero,
then eps*norm(T) is used in its place, where norm(T) is the 1-norm of
the tridiagonal matrix obtained by reducing A to tridiagonal form.
Eigenvalues are computed most accurately when abstol is set to twice
the underflow threshold, 2*p?lamch('S'), not to zero. If this function returns
with mod(info,2) ≠ 0 or mod(info/8,2) ≠ 0, indicating that some
eigenvalues or eigenvectors did not converge, try setting abstol to
2*p?lamch('S').
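The abstol convergence criterion can be expressed as a predicate. The following sketch makes several assumptions. It takes eps to be DBL_EPSILON, standing in for the machine precision that p?lamch would report. It treats [a, b] as the interval bracketing one approximate eigenvalue, following the usual LAPACK meaning of this test. And it requires the caller to supply norm(T). `interval_converged` and `norm_t` are hypothetical names.

```c
#include <float.h>
#include <math.h>

/* Hypothetical predicate: returns 1 when the bracketing interval [a, b] is narrow
 * enough, that is, b - a <= abstol + eps*max(|a|, |b|). When abstol <= 0,
 * eps*norm_t (norm_t: 1-norm of the tridiagonal T) is used instead of abstol.
 * eps is assumed to be DBL_EPSILON here. */
int interval_converged(double a, double b, double abstol, double norm_t)
{
    double tol = (abstol > 0.0) ? abstol : DBL_EPSILON * norm_t;
    double scale = fabs(a) > fabs(b) ? fabs(a) : fabs(b);
    return (b - a) <= tol + DBL_EPSILON * scale;
}
```

With abstol = 1e-9, an interval of width 1e-10 around 1.0 counts as converged, but an interval of width 0.1 does not.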

orfac (global).
Specifies which eigenvectors should be reorthogonalized. Eigenvectors that
correspond to eigenvalues which are within tol = orfac*norm(A) of each
other are to be reorthogonalized. However, if the workspace is insufficient
(see lwork), tol may be decreased until all eigenvectors to be
reorthogonalized can be stored in one process. No reorthogonalization will
be done if orfac equals zero. A default value of 1.0e-3 is used if orfac is
negative. orfac should be identical on all processes.

iz, jz (global) The row and column indices in the global matrix Z indicating the
first row and the first column of the submatrix Z, respectively.

descz (global and local) array of size dlen_. The array descriptor for the
distributed matrix Z. descz[ctxt_ - 1] must equal desca[ctxt_ - 1].


work (local)
Array of size lwork.

lwork (local) The size of the array work.

See below for definitions of variables used to define lwork.
If no eigenvectors are requested (jobz = 'N'), then lwork ≥ 5*n +
max(5*nn, NB*(np0 + 1)).
If eigenvectors are requested (jobz = 'V'), then the amount of workspace
required to guarantee that all eigenvectors are computed is:
lwork ≥ 5*n + max(5*nn, np0*mq0 + 2*NB*NB) + iceil(neig,
NPROW*NPCOL)*nn
The computed eigenvectors may not be orthogonal if the minimal
workspace is supplied and orfac is too small. If you want to guarantee
orthogonality (at the cost of potentially poor performance) you should add
the following to lwork:
(clustersize-1)*n,
where clustersize is the number of eigenvalues in the largest cluster, where
a cluster is defined as a set of close eigenvalues:
{w[k - 1],..., w[k+clustersize-2] | w[j] ≤ w[j-1] +
orfac*2*norm(A)},
where
neig = number of eigenvectors requested
nb = desca[mb_ - 1] = desca[nb_ - 1] = descz[mb_ - 1] =
descz[nb_ - 1];
nn = max(n, nb, 2);
desca[rsrc_ - 1] = desca[csrc_ - 1] = descz[rsrc_ - 1] =
descz[csrc_ - 1] = 0;
np0 = numroc(nn, nb, 0, 0, NPROW);
mq0 = numroc(max(neig, nb, 2), nb, 0, 0, NPCOL)
iceil(x, y) is a ScaLAPACK function returning ceiling(x/y)
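The minimal lwork rules above can be evaluated before the call, for example to size a buffer without a workspace query. The sketch below makes several assumptions. It takes nb, NPROW and NPCOL as already known, for instance from the descriptor and blacs_gridinfo. The names numroc_local and syevx_min_lwork are hypothetical. numroc_local re-implements the standard block-cyclic count that the ScaLAPACK numroc tool function returns. The type long stands in for MKL_INT. The sketch covers only the minimal workspace, not the extra (clustersize-1)*n needed to guarantee orthogonality.

```c
/* Hypothetical helpers; long stands in for MKL_INT. */

/* Rows (or columns) of an n-long dimension, split into blocks of nb over
   nprocs processes, owned by process iproc when process isrcproc holds the
   first block (the standard ScaLAPACK numroc count). */
static long numroc_local(long n, long nb, long iproc, long isrcproc, long nprocs)
{
    long mydist = (nprocs + iproc - isrcproc) % nprocs;
    long nblocks = n / nb;
    long num = (nblocks / nprocs) * nb;
    long extrablks = nblocks % nprocs;

    if (mydist < extrablks)
        num += nb;          /* this process gets one extra full block */
    else if (mydist == extrablks)
        num += n % nb;      /* this process gets the trailing partial block */
    return num;
}

static long max_l(long a, long b) { return a > b ? a : b; }

/* iceil(x, y) = ceiling(x / y) for non-negative x and positive y. */
static long iceil_l(long x, long y) { return (x + y - 1) / y; }

/* Minimal lwork for p?syevx, following the formulas in the text. */
static long syevx_min_lwork(char jobz, long n, long nb, long neig,
                            long nprow, long npcol)
{
    long nn  = max_l(max_l(n, nb), 2);
    long np0 = numroc_local(nn, nb, 0, 0, nprow);

    if (jobz == 'N')
        return 5 * n + max_l(5 * nn, nb * (np0 + 1));

    long mq0 = numroc_local(max_l(max_l(neig, nb), 2), nb, 0, 0, npcol);
    return 5 * n + max_l(5 * nn, np0 * mq0 + 2 * nb * nb)
         + iceil_l(neig, nprow * npcol) * nn;
}
```

For example, take n = 1000, nb = 64 and a 2 x 2 grid with all 1000 eigenvectors requested. Then np0 = mq0 = 512, and the minimal lwork evaluates to 525336.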
If lwork is too small to guarantee orthogonality, p?syevx attempts to
maintain orthogonality in the clusters with the smallest spacing between the
eigenvalues.
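To decide how much of the extra (clustersize-1)*n workspace to add, you need the size of the largest cluster under the cluster definition given for lwork. This sketch assumes three things. First, the eigenvalues are already available in ascending order, for example from a first run with jobz = 'N'. Second, the caller supplies an estimate of norm(A) and the orfac value actually in effect. Third, the definition is read as a chain: each eigenvalue lies within orfac*2*norm(A) of its predecessor. The names largest_cluster and extra_lwork_for_orthogonality are hypothetical, not MKL functions.

```c
/* Hypothetical helpers; long stands in for MKL_INT. */

/* Size of the largest cluster in w[0..m-1] (ascending order), where
   consecutive eigenvalues belong to the same cluster when
   w[j] <= w[j-1] + orfac*2*norm(A). */
static long largest_cluster(const double *w, long m, double orfac, double anorm)
{
    if (m <= 0)
        return 0;

    double tol = orfac * 2.0 * anorm;
    long best = 1, run = 1;

    for (long j = 1; j < m; ++j) {
        /* extend the current cluster or start a new one */
        run = (w[j] <= w[j - 1] + tol) ? run + 1 : 1;
        if (run > best)
            best = run;
    }
    return best;
}

/* Extra lwork that, per the text, guarantees orthogonality. */
static long extra_lwork_for_orthogonality(long clustersize, long n)
{
    return clustersize > 1 ? (clustersize - 1) * n : 0;
}
```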
If lwork is too small to compute all the eigenvectors requested, no
computation is performed and info= -23 is returned.
Note that when range='V', the number of requested eigenvectors is not
known until the eigenvalues are computed. In this case, if lwork is large
enough to compute the eigenvalues, p?syevx computes the eigenvalues
and as many eigenvectors as possible.
Relationship between workspace, orthogonality & performance:
Greater performance can be achieved if adequate workspace is provided. In
some situations, performance can decrease as the provided workspace
increases above the workspace amount shown below:

lwork ≥ max(lwork, 5*n + nsytrd_lwopt),
where lwork, as defined previously, depends upon the number of
eigenvectors requested, and
nsytrd_lwopt = n + 2*(anb+1)*(4*nps+2) + (nps + 3)*nps;
anb = pjlaenv(desca[ctxt_ - 1], 3, 'p?syttrd', 'L', 0, 0, 0,
0);
sqnpc = int(sqrt(dble(NPROW * NPCOL)));
nps = max(numroc(n, 1, 0, 0, sqnpc), 2*anb);
numroc is a ScaLAPACK tool function;
pjlaenv is a ScaLAPACK environmental inquiry function
MYROW, MYCOL, NPROW and NPCOL can be determined by calling the function
blacs_gridinfo.
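The optimal-performance bound can be computed in the same way. In this sketch the caller passes in anb, the value that pjlaenv would return, rather than querying it. The sketch also replaces numroc(n, 1, 0, 0, sqnpc) with the equivalent ceiling(n/sqnpc), which is what that call returns for process 0 with a block size of 1. The names isqrt_floor and syevx_opt_lwork are hypothetical, and long stands in for MKL_INT.

```c
/* Hypothetical helpers; long stands in for MKL_INT. */

/* floor(sqrt(x)) for small non-negative x, i.e. int(sqrt(dble(x))). */
static long isqrt_floor(long x)
{
    long r = 0;
    while ((r + 1) * (r + 1) <= x)
        ++r;
    return r;
}

/* max(lwork, 5*n + nsytrd_lwopt) from the formulas in the text.
   min_lwork: minimal lwork for the chosen jobz;
   anb: block size reported by pjlaenv for p?syttrd (caller-supplied). */
static long syevx_opt_lwork(long min_lwork, long n, long anb,
                            long nprow, long npcol)
{
    long sqnpc = isqrt_floor(nprow * npcol);
    long nps = (n + sqnpc - 1) / sqnpc;   /* numroc(n, 1, 0, 0, sqnpc) */
    if (nps < 2 * anb)
        nps = 2 * anb;

    long nsytrd_lwopt = n + 2 * (anb + 1) * (4 * nps + 2) + (nps + 3) * nps;
    long candidate = 5 * n + nsytrd_lwopt;
    return min_lwork > candidate ? min_lwork : candidate;
}
```

With n = 1000 and anb = 32 on a 2 x 2 grid, nps = 500 and 5*n + nsytrd_lwopt = 389632.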
For large n, no extra workspace is needed; however, the biggest boost in
performance comes for small n, so it is wise to provide the extra workspace
(typically less than a megabyte per process).
If clustersize > n/sqrt(NPROW*NPCOL), then providing enough space
to compute all the eigenvectors orthogonally will cause serious degradation
in performance. At the limit (that is, clustersize = n-1) p?stein will
perform no better than ?stein on a single processor.

For clustersize = n/sqrt(NPROW*NPCOL), reorthogonalizing all
eigenvectors will increase the total execution time by a factor of 2 or more.
For clustersize>n/sqrt(NPROW*NPCOL) execution time will grow as the
square of the cluster size, all other factors remaining equal and assuming
enough workspace. Less workspace means less reorthogonalization but
faster execution.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the size required for optimal
performance for all work arrays. Each of these values is returned in the first
entry of the corresponding work arrays, and no error message is issued by
pxerbla.

iwork (local) Workspace array.

liwork (local) The size of iwork. liwork ≥ 6*nnp,

where nnp = max(n, NPROW*NPCOL + 1, 4).

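If liwork = -1, then liwork is global input and a workspace query is
assumed; the function only calculates the size required for optimal
performance for all work arrays. Each of these values is returned in the first
entry of the corresponding work arrays, and no error message is issued by
pxerbla.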

Output Parameters

a On exit, the lower triangle (if uplo = 'L') or the upper triangle (if uplo =
'U') of A, including the diagonal, is overwritten.

m (global) The total number of eigenvalues found; 0 ≤ m ≤ n.


nz (global) Total number of eigenvectors computed. 0 ≤ nz ≤ m.

The number of columns of z that are filled.

If jobz ≠ 'V', nz is not referenced.

If jobz = 'V', nz = m unless the user supplies insufficient space and
p?syevx is not able to detect this before beginning computation. To get all
the eigenvectors requested, the user must supply both sufficient space to
hold the eigenvectors in z (m ≤ descz[n_ - 1]) and sufficient workspace to
compute them (see lwork). p?syevx is always able to detect insufficient
space without computation unless range = 'V'.

w (global).
Array of size n. The first m elements contain the selected eigenvalues in
ascending order.

z (local).
Array, global size n*n, local size lld_z*LOCc(jz+n-1).

If jobz = 'V', then on normal exit the first m columns of z contain the
orthonormal eigenvectors of the matrix corresponding to the selected
eigenvalues. If an eigenvector fails to converge, then that column of z
contains the latest approximation to the eigenvector, and the index of the
eigenvector is returned in ifail.
If jobz = 'N', then z is not referenced.

work[0] On exit, returns the workspace size adequate to allow optimal
performance.

iwork[0] On return, iwork[0] contains the amount of integer workspace required.

ifail (global).
Array of size n.
If jobz = 'V', then on normal exit, the first m elements of ifail are zero. If
(mod(info,2) ≠ 0) on exit, then ifail contains the indices of the
eigenvectors that failed to converge.
If jobz = 'N', then ifail is not referenced.

iclustr (global) Array of size 2*NPROW*NPCOL.

This array contains indices of eigenvectors corresponding to a cluster of
eigenvalues that could not be reorthogonalized due to insufficient
workspace (see lwork, orfac and info). Eigenvectors corresponding to
clusters of eigenvalues indexed iclustr[2*i - 2] to iclustr[2*i - 1] could
not be reorthogonalized due to lack of workspace. Hence the eigenvectors
corresponding to these clusters may not be orthogonal. iclustr is a
zero-terminated array. iclustr[2*k - 1] ≠ 0 and iclustr[2*k] = 0 if and
only if k is the number of clusters.
iclustr is not referenced if jobz = 'N'.
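After the call, a caller can scan iclustr to see which eigenvectors may have lost orthogonality. This sketch makes three assumptions. iclustr holds 2*NPROW*NPCOL entries. The stored values are 1-based (Fortran-style) eigenvector indices. Cluster i, counting from 1, spans iclustr[2*i - 2] through iclustr[2*i - 1] in C indexing. The names count_clusters and in_unorthogonalized_cluster are hypothetical, and long stands in for MKL_INT.

```c
/* Hypothetical helpers; long stands in for MKL_INT. */

/* Number of clusters recorded in the zero-terminated array iclustr,
   which has len = 2*NPROW*NPCOL entries. */
static long count_clusters(const long *iclustr, long len)
{
    long k = 0;
    while (2 * k < len && iclustr[2 * k] != 0)
        ++k;
    return k;
}

/* Returns 1 if eigenvector index idx (1-based) falls inside one of the
   clusters that could not be reorthogonalized, 0 otherwise. */
static int in_unorthogonalized_cluster(const long *iclustr, long len, long idx)
{
    long k = count_clusters(iclustr, len);
    for (long i = 0; i < k; ++i)
        if (idx >= iclustr[2 * i] && idx <= iclustr[2 * i + 1])
            return 1;
    return 0;
}
```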

gap (global)
Array of size NPROW*NPCOL

This array contains the gap between eigenvalues whose eigenvectors could
not be reorthogonalized. The output values in this array correspond to the
clusters indicated by the array iclustr. As a result, the dot product between
eigenvectors corresponding to the i-th cluster may be as high as
(C*n)/gap[i - 1], where C is a small constant.

info (global)
If info = 0, the execution is successful.

If info < 0:

If the i-th argument is an array and the j-th entry had an illegal value, then
info = -(i*100+j); if the i-th argument is a scalar and had an illegal
value, then info = -i.

If info > 0: if (mod(info,2) ≠ 0), then one or more eigenvectors failed to
converge. Their indices are stored in ifail. Ensure
abstol=2.0*p?lamch('U').
If (mod(info/2,2) ≠ 0), then eigenvectors corresponding to one or more
clusters of eigenvalues could not be reorthogonalized because of insufficient
workspace. The indices of the clusters are stored in the array iclustr.
If (mod(info/4,2) ≠ 0), then the space limit prevented p?syevx from
computing all of the eigenvectors between vl and vu. The number of
eigenvectors computed is returned in nz.
If (mod(info/8,2) ≠ 0), then p?stebz failed to compute eigenvalues.
Ensure abstol=2.0*p?lamch('U').
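A positive info from p?syevx packs several independent conditions into one value, so each condition can be tested separately. The helper names below are hypothetical. They apply the mod(x, y) tests from the text, which for non-negative x coincide with C's % operator, and they treat info ≤ 0 as carrying none of these flags. The type long stands in for MKL_INT.

```c
/* Hypothetical helpers; long stands in for MKL_INT. */

/* mod(info,2) != 0: some eigenvectors did not converge (see ifail). */
static int syevx_vectors_failed(long info)
{
    return info > 0 && info % 2 != 0;
}

/* mod(info/2,2) != 0: some clusters were not reorthogonalized (see iclustr). */
static int syevx_clusters_not_reorth(long info)
{
    return info > 0 && (info / 2) % 2 != 0;
}

/* mod(info/4,2) != 0: the space limit prevented computing all
   eigenvectors between vl and vu (see nz). */
static int syevx_space_limited(long info)
{
    return info > 0 && (info / 4) % 2 != 0;
}

/* mod(info/8,2) != 0: p?stebz failed to compute eigenvalues. */
static int syevx_stebz_failed(long info)
{
    return info > 0 && (info / 8) % 2 != 0;
}
```

For example, info = 10 (8 + 2) means that p?stebz failed and that some clusters could not be reorthogonalized.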

NOTE
mod(x,y) is the integer remainder of x/y.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?heev
Computes all eigenvalues and, optionally,
eigenvectors of a complex Hermitian matrix.

Syntax
void pcheev (char *jobz , char *uplo , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , float *w , MKL_Complex8 *z , MKL_INT *iz , MKL_INT *jz ,
MKL_INT *descz , MKL_Complex8 *work , MKL_INT *lwork , float *rwork , MKL_INT *lrwork ,
MKL_INT *info );
void pzheev (char *jobz , char *uplo , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *w , MKL_Complex16 *z , MKL_INT *iz , MKL_INT
*jz , MKL_INT *descz , MKL_Complex16 *work , MKL_INT *lwork , double *rwork , MKL_INT
*lrwork , MKL_INT *info );

Include Files
• mkl_scalapack.h


Description
The p?heev function computes all eigenvalues and, optionally, eigenvectors of a complex Hermitian matrix A
by calling the recommended sequence of ScaLAPACK functions. The function assumes a homogeneous
system and makes spot checks of the consistency of the eigenvalues across the different processes. A
heterogeneous system may return incorrect results without any error messages.

Input Parameters
np = the number of rows local to a given process.
nq = the number of columns local to a given process.

jobz (global) Must be 'N' or 'V'.

Specifies if it is necessary to compute the eigenvectors:

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

uplo (global) Must be 'U' or 'L'.

Specifies whether the upper or lower triangular part of the Hermitian matrix
A is stored:
If uplo = 'U', a stores the upper triangular part of A.

If uplo = 'L', a stores the lower triangular part of A.

n (global) The number of rows and columns of the matrix A (n ≥ 0).

a (local).
Block cyclic array of global size n*n and local size lld_a*LOCc(ja+n-1).
On entry, the Hermitian matrix A.
If uplo = 'U', only the upper triangular part of A is used to define the
elements of the Hermitian matrix.
If uplo = 'L', only the lower triangular part of A is used to define the
elements of the Hermitian matrix.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A. If desca[ctxt_ - 1] is incorrect, p?heev cannot
guarantee correct error reporting.

iz, jz (global) The row and column indices in the global matrix Z indicating the
first row and the first column of the submatrix Z, respectively.

descz (global and local) array of size dlen_. The array descriptor for the
distributed matrix Z. descz[ctxt_ - 1] must equal desca[ctxt_ - 1].

work (local).
Array of size lwork.

lwork (local) The size of the array work.

If only eigenvalues are requested (jobz = 'N'):

lwork ≥ max(nb*(np0 + 1), 3) + 3*n

If eigenvectors are requested (jobz = 'V'), then the amount of workspace
required:
lwork ≥ (np0+nq0+nb)*nb + 3*n + n*n
with nb = desca[mb_ - 1] = desca[nb_ - 1] = descz[mb_ - 1] =
descz[nb_ - 1],
nn = max(n, nb, 2),
np0 = numroc(nn, nb, 0, 0, NPROW),
nq0 = numroc(max(n, nb, 2), nb, 0, 0, NPCOL).
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the size required for optimal
performance for all work arrays. The required workspace is returned as the
first element of the corresponding work arrays, and no error message is
issued by pxerbla.

rwork (local).
Workspace array of size lrwork.

lrwork (local) The size of the array rwork.

See below for definitions of variables used to define lrwork.

If no eigenvectors are requested (jobz = 'N'), then lrwork≥ 2*n.

If eigenvectors are requested (jobz = 'V'), then lrwork ≥ 2*n + 2*n - 2.

If lrwork = -1, then lrwork is global input and a workspace query is
assumed; the function only calculates the minimum size required for the
rwork array. The required workspace is returned as the first element of
rwork, and no error message is issued by pxerbla.

Output Parameters

a On exit, the lower triangle (if uplo = 'L'), or the upper triangle (if uplo =
'U') of A, including the diagonal, is overwritten.

w (global).
Array of size n. If info = 0, w contains the eigenvalues in ascending
order.

z (local).
Array, global size n*n, local size lld_z*LOCc(jz+n-1).

If jobz ='V', then on normal exit the first n columns of z contain the
orthonormal eigenvectors of the matrix corresponding to the selected
eigenvalues. If an eigenvector fails to converge, then that column of z
contains the latest approximation to the eigenvector, and the index of the
eigenvector is returned in ifail.
If jobz = 'N', then z is not referenced.

work[0] On exit, returns adequate workspace to allow optimal performance.

If jobz ='N', then work[0] = minimal workspace only for eigenvalues.

If jobz ='V', then work[0] = minimal workspace required to generate all
the eigenvectors.


rwork[0] (local)
On output, rwork[0] returns workspace required to guarantee completion.

info (global)
If info = 0, the execution is successful.

If info < 0:

If the i-th argument is an array and the j-th entry had an illegal value, then
info = -(i*100+j). If the i-th argument is a scalar and had an illegal
value, then info = -i.

If info> 0:

If info = 1 through n, the i-th eigenvalue did not converge in ?steqr2
after a total of 30*n iterations.
If info = n+1, then p?heev detected heterogeneity, and the accuracy of
the results cannot be guaranteed.
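The info conventions for p?heev can be decoded mechanically. The helpers below (heev_outcome, classify_heev_info, decode_illegal_arg) are hypothetical and follow the rules above. The sketch assumes, as the encoding implies, that the entry number j is always below 100. It reports j = 0 when the illegal argument is a scalar. The type long stands in for MKL_INT.

```c
/* Hypothetical helpers; long stands in for MKL_INT. */

enum heev_outcome {
    HEEV_SUCCESS,        /* info == 0 */
    HEEV_ILLEGAL_ARG,    /* info < 0 */
    HEEV_NOT_CONVERGED,  /* 1 <= info <= n: info-th eigenvalue did not converge */
    HEEV_HETEROGENEOUS,  /* info == n + 1: heterogeneity detected */
    HEEV_UNEXPECTED      /* any other value */
};

static enum heev_outcome classify_heev_info(long info, long n)
{
    if (info == 0) return HEEV_SUCCESS;
    if (info < 0) return HEEV_ILLEGAL_ARG;
    if (info <= n) return HEEV_NOT_CONVERGED;
    if (info == n + 1) return HEEV_HETEROGENEOUS;
    return HEEV_UNEXPECTED;
}

struct illegal_arg { long arg; long entry; };

/* For info < 0: info = -(i*100 + j) for entry j of array argument i,
   or info = -i for scalar argument i. */
static struct illegal_arg decode_illegal_arg(long info)
{
    struct illegal_arg r;
    long v = -info;            /* caller passes info < 0 */
    if (v >= 100) {
        r.arg = v / 100;       /* i: position of the array argument */
        r.entry = v % 100;     /* j: offending entry, counted from 1 */
    } else {
        r.arg = v;             /* i: position of the scalar argument */
        r.entry = 0;           /* 0 marks a scalar argument */
    }
    return r;
}
```

In the p?heev argument list, desca is the seventh argument. So info = -705 would point at its fifth entry, desca[4], which is the mb_ field.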

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?heevd
Computes all eigenvalues and eigenvectors of a
complex Hermitian matrix by using a divide and
conquer algorithm.

Syntax
void pcheevd (char *jobz , char *uplo , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , float *w , MKL_Complex8 *z , MKL_INT *iz , MKL_INT *jz ,
MKL_INT *descz , MKL_Complex8 *work , MKL_INT *lwork , float *rwork , MKL_INT *lrwork ,
MKL_INT *iwork , MKL_INT *liwork , MKL_INT *info );
void pzheevd (char *jobz , char *uplo , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *w , MKL_Complex16 *z , MKL_INT *iz , MKL_INT
*jz , MKL_INT *descz , MKL_Complex16 *work , MKL_INT *lwork , double *rwork , MKL_INT
*lrwork , MKL_INT *iwork , MKL_INT *liwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?heevd function computes all eigenvalues and eigenvectors of a complex Hermitian matrix A by using a
divide and conquer algorithm.

Input Parameters
np = the number of rows local to a given process.
nq = the number of columns local to a given process.

jobz (global) Must be 'N' or 'V'.

Specifies whether it is necessary to compute the eigenvectors:

If jobz = 'N', then only eigenvalues are computed (not yet
implemented).
If jobz = 'V', then eigenvalues and eigenvectors are computed.

uplo (global) Must be 'U' or 'L'.

Specifies whether the upper or lower triangular part of the Hermitian matrix
A is stored:
If uplo = 'U', a stores the upper triangular part of A.

If uplo = 'L', a stores the lower triangular part of A.

n (global) The number of rows and columns of the matrix A (n ≥ 0).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A. If desca[ctxt_ - 1] is incorrect, p?heevd cannot
guarantee correct error reporting.

iz, jz (global) The row and column indices in the global matrix Z indicating the
first row and the first column of the submatrix Z, respectively.

descz (global and local) array of size dlen_. The array descriptor for the
distributed matrix Z. descz[ctxt_ - 1] must equal desca[ctxt_ - 1].

work (local).
Array of size lwork.

lwork (local) The size of the array work.

If eigenvectors are requested:
lwork ≥ n + (np0 + mq0 + nb)*nb
with np0 = numroc( max( n, nb, 2 ), nb, 0, 0, NPROW)

mq0 = numroc( max( n, nb, 2 ), nb, 0, 0, NPCOL)

If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the size required for optimal
performance for all work arrays. The required workspace is returned as the
first element of the corresponding work arrays, and no error message is
issued by pxerbla.

rwork (local).
Workspace array of size lrwork.


lrwork (local) The size of the array rwork.

lrwork ≥ 1 + 9*n + 3*np*nq,

with np = numroc( n, nb, myrow, iarow, NPROW)

nq = numroc( n, nb, mycol, iacol, NPCOL)

iwork (local) Workspace array of size liwork.

liwork (local) The size of iwork.

liwork ≥ 7*n + 8*npcol + 2.
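The p?heevd workspace formulas can be evaluated in the same way. This is a sketch with hypothetical helper names, and long stands in for MKL_INT. It makes these assumptions:
- The lwork expression is read as n + (np0 + mq0 + nb)*nb.
- The grid shape and coordinates (NPROW, NPCOL, myrow, mycol) come from blacs_gridinfo.
- iarow and iacol are the process row and column that own the first row and column of the submatrix.

The sketch re-implements the block-cyclic count returned by the ScaLAPACK numroc function.

```c
/* Hypothetical helpers; long stands in for MKL_INT. */

/* Standard block-cyclic count, as returned by ScaLAPACK's numroc. */
static long numroc_local(long n, long nb, long iproc, long isrcproc, long nprocs)
{
    long mydist = (nprocs + iproc - isrcproc) % nprocs;
    long nblocks = n / nb;
    long num = (nblocks / nprocs) * nb;
    long extrablks = nblocks % nprocs;
    if (mydist < extrablks)
        num += nb;
    else if (mydist == extrablks)
        num += n % nb;
    return num;
}

struct heevd_sizes { long lwork, lrwork, liwork; };

static struct heevd_sizes heevd_workspace(long n, long nb,
                                          long nprow, long npcol,
                                          long myrow, long mycol,
                                          long iarow, long iacol)
{
    struct heevd_sizes s;
    long nn = n > nb ? n : nb;                          /* max(n, nb, 2) */
    if (nn < 2)
        nn = 2;

    long np0 = numroc_local(nn, nb, 0, 0, nprow);
    long mq0 = numroc_local(nn, nb, 0, 0, npcol);
    long np  = numroc_local(n, nb, myrow, iarow, nprow);  /* local rows */
    long nq  = numroc_local(n, nb, mycol, iacol, npcol);  /* local columns */

    s.lwork  = n + (np0 + mq0 + nb) * nb;
    s.lrwork = 1 + 9 * n + 3 * np * nq;
    s.liwork = 7 * n + 8 * npcol + 2;
    return s;
}
```

Because np and nq are local quantities, lrwork can differ from process to process.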

Output Parameters

a On exit, the lower triangle (if uplo = 'L'), or the upper triangle (if uplo =
'U') of A, including the diagonal, is overwritten.

w (global).
Array of size n. If info = 0, w contains the eigenvalues in the ascending
order.

z (local).
Array, global size n*n, local size lld_z*LOCc(jz+n-1).

The z parameter contains the orthonormal eigenvectors of the matrix A.

work[0] On exit, returns adequate workspace to allow optimal performance.

rwork[0] (local)
On output, rwork[0] returns workspace required to guarantee completion.

iwork[0] (local).
On return, iwork[0] contains the amount of integer workspace required.

info (global)
If info = 0, the execution is successful.

If info < 0:

If the i-th argument is an array and the j-th entry had an illegal value, then
info = -(i*100+j). If the i-th argument is a scalar and had an illegal
value, then info = -i.

If info> 0:

If info = 1 through n, the i-th eigenvalue did not converge.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?heevr
Computes selected eigenvalues and, optionally,
eigenvectors of a Hermitian matrix using Relatively
Robust Representation.

Syntax
void pcheevr(char* jobz, char* range, char* uplo, MKL_INT* n, MKL_Complex8* a, MKL_INT*
ia, MKL_INT* ja, MKL_INT* desca, float* vl, float* vu, MKL_INT* il, MKL_INT* iu,
MKL_INT* m, MKL_INT* nz, float* w, MKL_Complex8* z, MKL_INT* iz, MKL_INT* jz, MKL_INT*
descz, MKL_Complex8* work, MKL_INT* lwork, float* rwork, MKL_INT* lrwork, MKL_INT*
iwork, MKL_INT* liwork, MKL_INT* info);
void pzheevr(char* jobz, char* range, char* uplo, MKL_INT* n, MKL_Complex16* a, MKL_INT*
ia, MKL_INT* ja, MKL_INT* desca, double* vl, double* vu, MKL_INT* il, MKL_INT* iu,
MKL_INT* m, MKL_INT* nz, double* w, MKL_Complex16* z, MKL_INT* iz, MKL_INT* jz, MKL_INT*
descz, MKL_Complex16* work, MKL_INT* lwork, double* rwork, MKL_INT* lrwork, MKL_INT*
iwork, MKL_INT* liwork, MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
p?heevr computes selected eigenvalues and, optionally, eigenvectors of a complex Hermitian matrix A
distributed in 2D block-cyclic format by calling the recommended sequence of ScaLAPACK functions.
First, the matrix A is reduced to complex Hermitian tridiagonal form. Then, the eigenproblem is solved using
the parallel MRRR algorithm. Last, if eigenvectors have been computed, a back-transformation is done.
Upon successful completion, each processor stores a copy of all computed eigenvalues in w. The eigenvector
matrix Z is stored in 2D block-cyclic format distributed over all processors.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/
PerformanceIndex.
Notice revision #20201201

Input Parameters

jobz (global)
Specifies whether or not to compute the eigenvectors:
= 'N': Compute eigenvalues only.
= 'V': Compute eigenvalues and eigenvectors.

range (global)
= 'A': all eigenvalues will be found.
= 'V': all eigenvalues in the interval [vl,vu] will be found.

= 'I': the il-th through iu-th eigenvalues will be found.

uplo (global)
Specifies whether the upper or lower triangular part of the Hermitian matrix
A is stored:
= 'U': Upper triangular
= 'L': Lower triangular

n (global )


The number of rows and columns of the matrix A. n≥ 0

a Block-cyclic array, global size n*n, local size lld_a*LOCc(ja+n-1).

Contains the local pieces of the Hermitian distributed matrix A. If uplo =
'U', only the upper triangular part of a is used to define the elements of the
Hermitian matrix. If uplo = 'L', only the lower triangular part of a is used to
define the elements of the Hermitian matrix.

ia (global )
Global row index in the global matrix A that points to the beginning of the
submatrix which is to be operated on. It should be set to 1 when operating
on a full matrix.

ja (global )
Global column index in the global matrix A that points to the beginning of
the submatrix which is to be operated on. It should be set to 1 when
operating on a full matrix.

desca (global and local) array of size dlen_. (The ScaLAPACK descriptor length is
dlen_ = 9.)
The array descriptor for the distributed matrix a. The descriptor stores
details about the 2D block-cyclic storage, see the notes below. If desca is
incorrect, p?heevr cannot work correctly.

Also note the array alignment requirements specified below.

vl (global)
If range='V', the lower bound of the interval to be searched for
eigenvalues. Not referenced if range = 'A' or 'I'.

vu (global)
If range='V', the upper bound of the interval to be searched for
eigenvalues. Not referenced if range = 'A' or 'I'.

il (global )
If range='I', the index (from smallest to largest) of the smallest eigenvalue
to be returned. il≥ 1.

Not referenced if range = 'A'.

iu (global )
If range='I', the index (from smallest to largest) of the largest eigenvalue
to be returned. min(il,n) ≤iu≤n.

Not referenced if range = 'A'.

iz (global )
Global row index in the global matrix Z that points to the beginning of the
submatrix which is to be operated on. It should be set to 1 when operating
on a full matrix.

jz (global )

Global column index in the global matrix Z that points to the beginning of
the submatrix which is to be operated on. It should be set to 1 when
operating on a full matrix.

descz (global and local) array of size dlen_.

The array descriptor for the distributed matrix z. descz[ctxt_ - 1] must
equal desca[ctxt_ - 1].

work (local workspace) array of size lwork

lwork (local )
Size of work array, must be at least 3.

If only eigenvalues are requested:

lwork≥n + max( nb * ( np00 + 1 ), nb * 3 )
If eigenvectors are requested:
lwork≥n + ( np00 + mq00 + nb ) * nb
For definitions of np00 and mq00, see lrwork.

For optimal performance, greater workspace is needed, i.e.

lwork≥ max( lwork, nhetrd_lwork )
Where lwork is as defined above, and

nhetrd_lwork = n + 2*(anb+1)*(4*nps+2) + (nps+1)*nps

ictxt = desca[ctxt_ - 1]

anb = pjlaenv( ictxt, 3, 'PCHETTRD', 'L', 0, 0, 0, 0 )

sqnpc = sqrt( real( nprow * npcol ) )

nps = max( numroc( n, 1, 0, 0, sqnpc ), 2*anb )

If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the optimal size for all work arrays.
Each of these values is returned in the first entry of the corresponding work
array, and no error message is issued by pxerbla.

rwork (local workspace) array of size lrwork

lrwork (local )
Size of rwork, must be at least 3.

See below for definitions of variables used to define lrwork.

If no eigenvectors are requested (jobz = 'N') then

lrwork≥ 2 + 5 * n + max( 12 * n, nb * ( np00 + 1 ) )

If eigenvectors are requested (jobz = 'V' ) then the amount of workspace
required is:
lrwork≥ 2 + 5 * n + max( 18*n, np00 * mq00 + 2 * nb * nb ) +
(2 + iceil( neig, nprow*npcol))*n


NOTE
iceil(x,y) is the ceiling of x/y.

Variable definitions:
neig = number of eigenvectors requested
nb = desca[ mb_ - 1] = desca[ nb_ - 1] = descz[ mb_ - 1] = descz[nb_
- 1]
nn = max( n, nb, 2 )

desca[rsrc_ - 1] = desca[csrc_ - 1] = descz[rsrc_ - 1] = descz[csrc_ - 1] = 0
np00 = numroc( nn, nb, 0, 0, nprow )

mq00 = numroc( max( neig, nb, 2 ), nb, 0, 0, npcol )

iceil( x, y ) is a ScaLAPACK function returning ceiling(x/y), and nprow and
npcol can be determined by calling the function blacs_gridinfo.

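If lrwork = -1, then lrwork is global input and a workspace query is
assumed; the function only calculates the optimal size for all work arrays.
Each of these values is returned in the first entry of the corresponding work
array, and no error message is issued by pxerbla.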

iwork (local workspace) array of size liwork

liwork (local )
size of iwork

Let nnp = max( n, nprow*npcol + 1, 4 ). Then:

liwork ≥ 12*nnp + 2*n when the eigenvectors are desired

liwork≥ 10*nnp + 2*n when only the eigenvalues have to be computed

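If liwork = -1, then liwork is global input and a workspace query is
assumed; the function only calculates the optimal size for all work arrays.
Each of these values is returned in the first entry of the corresponding work
array, and no error message is issued by pxerbla.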
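The minimal p?heevr workspace rules can be collected into one helper, including the requirement that lwork and lrwork be at least 3. This sketch uses hypothetical names, and long stands in for MKL_INT. It assumes that desca and descz share the same square block size nb with source process (0, 0), as the text requires. It again re-implements the numroc block-cyclic count. It returns only the minimal sizes, not the larger optimal lwork.

```c
/* Hypothetical helpers; long stands in for MKL_INT. */

/* Standard block-cyclic count, as returned by ScaLAPACK's numroc. */
static long numroc_local(long n, long nb, long iproc, long isrcproc, long nprocs)
{
    long mydist = (nprocs + iproc - isrcproc) % nprocs;
    long nblocks = n / nb;
    long num = (nblocks / nprocs) * nb;
    long extrablks = nblocks % nprocs;
    if (mydist < extrablks)
        num += nb;
    else if (mydist == extrablks)
        num += n % nb;
    return num;
}

static long max_l(long a, long b) { return a > b ? a : b; }

struct heevr_sizes { long lwork, lrwork, liwork; };

/* jobz: 'N' or 'V'; neig: number of eigenvectors requested. */
static struct heevr_sizes heevr_min_workspace(char jobz, long n, long nb, long neig,
                                              long nprow, long npcol)
{
    struct heevr_sizes s;
    long nprocs = nprow * npcol;
    long nn   = max_l(max_l(n, nb), 2);
    long np00 = numroc_local(nn, nb, 0, 0, nprow);
    long mq00 = numroc_local(max_l(max_l(neig, nb), 2), nb, 0, 0, npcol);
    long nnp  = max_l(max_l(n, nprocs + 1), 4);

    if (jobz == 'N') {
        s.lwork  = n + max_l(nb * (np00 + 1), nb * 3);
        s.lrwork = 2 + 5 * n + max_l(12 * n, nb * (np00 + 1));
        s.liwork = 10 * nnp + 2 * n;
    } else {
        long per_proc = (neig + nprocs - 1) / nprocs;   /* iceil(neig, nprow*npcol) */
        s.lwork  = n + (np00 + mq00 + nb) * nb;
        s.lrwork = 2 + 5 * n + max_l(18 * n, np00 * mq00 + 2 * nb * nb)
                 + (2 + per_proc) * n;
        s.liwork = 12 * nnp + 2 * n;
    }
    s.lwork  = max_l(s.lwork, 3);    /* both must be at least 3 */
    s.lrwork = max_l(s.lrwork, 3);
    return s;
}
```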

Output Parameters

a The lower triangle (if uplo='L') or the upper triangle (if uplo='U') of a,
including the diagonal, is destroyed.

m (global )
Total number of eigenvalues found. 0 ≤m≤n.

nz (global )
Total number of eigenvectors computed. 0 ≤nz≤m.

The number of columns of z that are filled.

If jobz≠ 'V', nz is not referenced.

If jobz = 'V', nz = m

w (global ) array of size n

On normal exit, the first m entries contain the selected eigenvalues in
ascending order.

z (local) array, global size n*n, local size lld_z*LOCc(jz+n-1)

If jobz = 'V', then on normal exit the first m columns of z contain the
orthonormal eigenvectors of the matrix corresponding to the selected
eigenvalues.
If jobz = 'N', then z is not referenced.

work work[0] returns the workspace size adequate to allow optimal
performance.

rwork On return, rwork[0] contains the optimal amount of workspace required
for efficient execution. If jobz='N', rwork[0] = optimal amount of
workspace required to compute the eigenvalues. If jobz='V', rwork[0] =
optimal amount of workspace required to compute eigenvalues and
eigenvectors.

iwork On return, iwork[0] contains the amount of integer workspace required.

info (global )
= 0: successful exit
< 0: If the i-th argument is an array and the j-th entry had an illegal value,
then info = -(i*100+j); if the i-th argument is a scalar and had an illegal
value, then info = -i.

Application Notes
The distributed submatrices a(ia:*, ja:*) and z(iz:iz+m-1,jz:jz+n-1) must satisfy the following
alignment properties:

1. Identical (quadratic) dimension: desca[m_ - 1] = descz[m_ - 1] = desca[n_ - 1] = descz[n_ - 1]

2. Quadratic conformal blocking: desca[mb_ - 1] = desca[nb_ - 1] = descz[mb_ - 1] = descz[nb_ - 1],
desca[rsrc_ - 1] = descz[rsrc_ - 1]
3. mod( ia-1, mb_a ) = mod( iz-1, mb_z ) = 0
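The three alignment conditions can be checked on the descriptors before calling p?heevr. This sketch defines the descriptor positions locally. It uses the 1-based ScaLAPACK dense-descriptor layout that the text's desca[x_ - 1] notation refers to: dtype_ = 1, ctxt_ = 2, m_ = 3, n_ = 4, mb_ = 5, nb_ = 6, rsrc_ = 7, csrc_ = 8, lld_ = 9. The name heevr_alignment_ok is hypothetical. ia and iz are the 1-based global indices passed to the function, and long stands in for MKL_INT.

```c
/* Hypothetical helper; long stands in for MKL_INT. */

/* 1-based positions of the fields in a ScaLAPACK dense array descriptor. */
enum { dtype_ = 1, ctxt_ = 2, m_ = 3, n_ = 4, mb_ = 5,
       nb_ = 6, rsrc_ = 7, csrc_ = 8, lld_ = 9 };

/* Returns 1 if desca/ia and descz/iz satisfy the three alignment properties. */
static int heevr_alignment_ok(const long *desca, long ia,
                              const long *descz, long iz)
{
    /* 1. Identical square dimensions. */
    if (desca[m_ - 1] != descz[m_ - 1] || desca[n_ - 1] != descz[n_ - 1] ||
        desca[m_ - 1] != desca[n_ - 1])
        return 0;

    /* 2. Square, conformal blocking and the same source process row. */
    if (desca[mb_ - 1] != desca[nb_ - 1] || descz[mb_ - 1] != descz[nb_ - 1] ||
        desca[mb_ - 1] != descz[mb_ - 1] || desca[rsrc_ - 1] != descz[rsrc_ - 1])
        return 0;

    /* 3. Both submatrices start on a block boundary. */
    if ((ia - 1) % desca[mb_ - 1] != 0 || (iz - 1) % descz[mb_ - 1] != 0)
        return 0;

    return 1;
}
```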

NOTE
mod(x,y) is the integer remainder of x/y.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?heevx
Computes selected eigenvalues and, optionally,
eigenvectors of a Hermitian matrix.


Syntax
void pcheevx (char *jobz , char *range , char *uplo , MKL_INT *n , MKL_Complex8 *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *vl , float *vu , MKL_INT *il ,
MKL_INT *iu , float *abstol , MKL_INT *m , MKL_INT *nz , float *w , float *orfac ,
MKL_Complex8 *z , MKL_INT *iz , MKL_INT *jz , MKL_INT *descz , MKL_Complex8 *work ,
MKL_INT *lwork , float *rwork , MKL_INT *lrwork , MKL_INT *iwork , MKL_INT *liwork ,
MKL_INT *ifail , MKL_INT *iclustr , float *gap , MKL_INT *info );
void pzheevx (char *jobz , char *range , char *uplo , MKL_INT *n , MKL_Complex16 *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *vl , double *vu , MKL_INT *il ,
MKL_INT *iu , double *abstol , MKL_INT *m , MKL_INT *nz , double *w , double *orfac ,
MKL_Complex16 *z , MKL_INT *iz , MKL_INT *jz , MKL_INT *descz , MKL_Complex16 *work ,
MKL_INT *lwork , double *rwork , MKL_INT *lrwork , MKL_INT *iwork , MKL_INT *liwork ,
MKL_INT *ifail , MKL_INT *iclustr , double *gap , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?heevx function computes selected eigenvalues and, optionally, eigenvectors of a complex Hermitian
matrix A by calling the recommended sequence of ScaLAPACK functions. Eigenvalues and eigenvectors can
be selected by specifying either a range of values or a range of indices for the desired eigenvalues.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/
PerformanceIndex.
Notice revision #20201201

Input Parameters
np = the number of rows local to a given process.
nq = the number of columns local to a given process.

jobz (global) Must be 'N' or 'V'.

Specifies if it is necessary to compute the eigenvectors:

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

range (global) Must be 'A', 'V', or 'I'.

If range = 'A', all eigenvalues will be found.

If range = 'V', all eigenvalues in the half-open interval (vl, vu] will be
found.
If range = 'I', the eigenvalues with indices il through iu will be found.

uplo (global) Must be 'U' or 'L'.

Specifies whether the upper or lower triangular part of the Hermitian matrix
A is stored:
If uplo = 'U', a stores the upper triangular part of A.

If uplo = 'L', a stores the lower triangular part of A.

n (global) The number of rows and columns of the matrix A (n ≥ 0).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A. If desca[ctxt_ - 1] is incorrect, p?heevx cannot
guarantee correct error reporting.

vl, vu (global)
If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues; not referenced if range = 'A' or 'I'.

il, iu (global)
If range ='I', the indices of the smallest and largest eigenvalues to be
returned.
Constraints:
il ≥ 1; min(il,n) ≤ iu ≤ n.
Not referenced if range = 'A' or 'V'.

abstol (global).
If jobz='V', setting abstol to p?lamch(context, 'U') yields the most
orthogonal eigenvectors.
The absolute error tolerance for the eigenvalues. An approximate
eigenvalue is accepted as converged when it is determined to lie in an
interval [a, b] of width less than or equal to abstol+eps*max(|a|,|b|),
where eps is the machine precision. If abstol is less than or equal to zero,
then eps*norm(T) will be used in its place, where norm(T) is the 1-norm
of the tridiagonal matrix obtained by reducing A to tridiagonal form.
Eigenvalues are computed most accurately when abstol is set to twice the
underflow threshold 2*p?lamch('S'), not zero. If this function returns
with (mod(info,2) ≠ 0) or (mod(info/8,2) ≠ 0), indicating that some
eigenvalues or eigenvectors did not converge, try setting abstol to
2*p?lamch('S').

NOTE
mod(x,y) is the integer remainder of x/y.
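The acceptance test for abstol can be written out directly. In this sketch the caller supplies eps, for example the value returned by p?lamch for the relevant precision. The arrays d and e hold the diagonal and off-diagonal of the real tridiagonal matrix T. All function names are hypothetical. The sketch illustrates the criterion only and is not how MKL evaluates it internally.

```c
/* Hypothetical helpers illustrating the abstol criterion. */

static double abs_d(double x) { return x < 0.0 ? -x : x; }

/* 1-norm (largest absolute column sum) of the symmetric tridiagonal T with
   diagonal d[0..n-1] and off-diagonal e[0..n-2]. */
static double tridiag_norm1(const double *d, const double *e, long n)
{
    double best = 0.0;
    for (long j = 0; j < n; ++j) {
        double s = abs_d(d[j]);
        if (j > 0)
            s += abs_d(e[j - 1]);
        if (j < n - 1)
            s += abs_d(e[j]);
        if (s > best)
            best = s;
    }
    return best;
}

/* abstol is used as given when positive; otherwise eps*norm(T) replaces it. */
static double effective_abstol(double abstol, double eps, double tnorm)
{
    return abstol > 0.0 ? abstol : eps * tnorm;
}

/* An interval [a, b] containing an eigenvalue is accepted as converged when
   b - a <= tol + eps*max(|a|, |b|). */
static int interval_converged(double a, double b, double tol, double eps)
{
    double m = abs_d(a) > abs_d(b) ? abs_d(a) : abs_d(b);
    return b - a <= tol + eps * m;
}
```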

orfac (global).


Specifies which eigenvectors should be reorthogonalized. Eigenvectors that
correspond to eigenvalues which are within tol=orfac*norm(A) of each
other are to be reorthogonalized. However, if the workspace is insufficient
(see lwork), tol may be decreased until all eigenvectors to be
reorthogonalized can be stored in one process. No reorthogonalization will
be done if orfac equals zero. A default value of 1.0e-3 is used if orfac is
negative.
orfac should be identical on all processes.

iz, jz (global) The row and column indices in the global matrix Z indicating the
first row and the first column of the submatrix Z, respectively.

descz (global and local) array of size dlen_. The array descriptor for the
distributed matrix Z. descz[ctxt_ - 1] must equal desca[ctxt_ - 1].

work (local).
Array of size lwork.

lwork (local) The size of the array work.

If only eigenvalues are requested:
lwork≥n + max(nb*(np0 + 1), 3)
If eigenvectors are requested:
lwork≥n + (np0+mq0+nb)*nb
with nq0 = numroc(nn, nb, 0, 0, NPCOL).

lwork ≥ 5*n + max(5*nn, np0*mq0 + 2*nb*nb) + iceil(neig, NPROW*NPCOL)*nn
For optimal performance, greater workspace is needed, that is
lwork≥max(lwork, nhetrd_lwork)
where lwork is as defined above, and nhetrd_lwork = n + 2*(anb+1)*(4*nps+2) + (nps+1)*nps
ictxt = desca[ctxt_ - 1]
anb = pjlaenv(ictxt, 3, 'pchettrd', 'L', 0, 0, 0, 0)
sqnpc = sqrt(dble(NPROW * NPCOL))
nps = max(numroc(n, 1, 0, 0, sqnpc), 2*anb)
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the size required for optimal
performance for all work arrays. Each of these values is returned in the first
entry of the corresponding work arrays, and no error message is issued by
pxerbla.
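The two minimal lwork bounds for p?heevx can be evaluated as follows. This is a sketch under stated assumptions. numroc_local re-implements the documented behavior of ScaLAPACK's NUMROC tool function so that the example compiles without MKL; in a real program you would call the library's own numroc. Both helper names are hypothetical, and the nhetrd_lwork term for optimal performance is not included.

```c
/* Local re-implementation of ScaLAPACK NUMROC: the number of rows (or
 * columns) of an n-long, block-cyclically distributed dimension with block
 * size nb that are owned by process iproc, when the first block is owned by
 * process isrcproc among nprocs processes. */
static long numroc_local(long n, long nb, long iproc, long isrcproc,
                         long nprocs)
{
    long mydist    = (nprocs + iproc - isrcproc) % nprocs;
    long nblocks   = n / nb;                  /* number of full blocks   */
    long result    = (nblocks / nprocs) * nb; /* blocks every process owns */
    long extrablks = nblocks % nprocs;
    if (mydist < extrablks)
        result += nb;                         /* one extra full block    */
    else if (mydist == extrablks)
        result += n % nb;                     /* the trailing partial block */
    return result;
}

static long max2(long x, long y) { return x > y ? x : y; }

/* Minimal lwork for p?heevx (hypothetical helper).
 * want_vectors: 0 for jobz = 'N', nonzero for jobz = 'V'.
 * neig: number of eigenvectors requested (only used for mq0). */
static long heevx_min_lwork(long n, long nb, long neig,
                            long nprow, long npcol, int want_vectors)
{
    long nn  = max2(max2(n, nb), 2);
    long np0 = numroc_local(nn, nb, 0, 0, nprow);
    if (!want_vectors)
        return n + max2(nb * (np0 + 1), 3);
    long mq0 = numroc_local(max2(max2(neig, nb), 2), nb, 0, 0, npcol);
    return n + (np0 + mq0 + nb) * nb;
}
```

For n = 100, nb = 32, neig = 100 on a 2-by-2 grid, this gives np0 = mq0 = 64, so lwork ≥ 2180 for jobz = 'N' and lwork ≥ 5220 for jobz = 'V'.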

rwork (local)
Workspace array of size lrwork.

lrwork (local) The size of the array rwork.

See below for definitions of variables used to define lrwork.
If no eigenvectors are requested (jobz = 'N'), then lrwork≥ 5*nn+4*n.

If eigenvectors are requested (jobz = 'V'), then the amount of workspace
required to guarantee that all eigenvectors are computed is:
lrwork≥ 4*n + max(5*nn, np0*mq0+2*nb*nb) + iceil(neig,
NPROW*NPCOL)*nn
The computed eigenvectors may not be orthogonal if the minimal
workspace is supplied and orfac is too small. If you want to guarantee
orthogonality (at the cost of potentially poor performance) you should add
the following values to lrwork:
(clustersize-1)*n,
where clustersize is the number of eigenvalues in the largest cluster, where
a cluster is defined as a set of close eigenvalues:
{w[k - 1],..., w[k+clustersize-2]|w[j] ≤
w[j-1]+orfac*2*norm(A)}.
Variable definitions:
neig = number of eigenvectors requested;
nb = desca[mb_ - 1] = desca[nb_ - 1] = descz[mb_ - 1] =
descz[nb_ - 1];
nn = max(n, nb, 2);
desca[rsrc_ - 1] = desca[csrc_ - 1] = descz[rsrc_ - 1] =
descz[csrc_ - 1] = 0;
np0 = numroc(nn, nb, 0, 0, NPROW);
mq0 = numroc(max(neig, nb, 2), nb, 0, 0, NPCOL);
iceil(x, y) is a ScaLAPACK function returning ceiling(x/y)
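Using the variable definitions above, the minimal lrwork can be sketched as follows. np0 and mq0 are assumed to have been computed by the caller with numroc exactly as defined. The helper names are hypothetical, and iceil_local mimics the documented ceiling(x/y) behavior of iceil for positive y. Passing a clustersize greater than 1 adds the (clustersize-1)*n term that the text recommends for guaranteed orthogonality.

```c
static long max2(long x, long y) { return x > y ? x : y; }

/* ceiling(x / y) for y > 0, as described for the ScaLAPACK iceil function. */
static long iceil_local(long x, long y) { return (x + y - 1) / y; }

/* Minimal lrwork for p?heevx (hypothetical helper).
 * want_vectors: 0 for jobz = 'N', nonzero for jobz = 'V'.
 * clustersize: size of the largest eigenvalue cluster, or 0 to skip the
 * extra space that guarantees orthogonality. */
static long heevx_min_lrwork(long n, long nb, long neig, long np0, long mq0,
                             long nprow, long npcol, int want_vectors,
                             long clustersize)
{
    long nn = max2(max2(n, nb), 2);
    if (!want_vectors)
        return 5 * nn + 4 * n;
    long lrwork = 4 * n + max2(5 * nn, np0 * mq0 + 2 * nb * nb)
                + iceil_local(neig, nprow * npcol) * nn;
    if (clustersize > 1)
        lrwork += (clustersize - 1) * n;   /* extra reorthogonalization space */
    return lrwork;
}
```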
When lrwork is too small:
If lrwork is too small to guarantee orthogonality, p?heevx attempts to
maintain orthogonality in the clusters with the smallest spacing between the
eigenvalues. If lrwork is too small to compute all the eigenvectors requested,
no computation is performed and info= -25 is returned. Note that when
range='V', p?heevx does not know how many eigenvectors are requested
until the eigenvalues are computed. Therefore, when range='V' and as
long as lrwork is large enough to allow p?heevx to compute the eigenvalues,
p?heevx will compute the eigenvalues and as many eigenvectors as it can.
Relationship between workspace, orthogonality and performance:
If clustersize ≥ n/sqrt(NPROW*NPCOL), then providing enough space
to compute all the eigenvectors orthogonally will cause serious degradation
in performance. In the limit (that is, clustersize = n-1), p?stein will
perform no better than ?stein on 1 processor.

For clustersize = n/sqrt(NPROW*NPCOL), reorthogonalizing all
eigenvectors will increase the total execution time by a factor of 2 or more.


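If lrwork = -1, then lrwork is global input and a workspace query is
assumed; the function only calculates the size required for optimal
performance for all work arrays. Each of these values is returned in the first
entry of the corresponding work arrays, and no error message is issued by
pxerbla.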

iwork (local) Workspace array.

liwork (local), size of iwork.

liwork ≥ 6*nnp
Where: nnp = max(n, NPROW*NPCOL+1, 4)

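If liwork = -1, then liwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.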

Output Parameters

a On exit, the lower triangle (if uplo = 'L'), or the upper triangle (if uplo =
'U') of A, including the diagonal, is overwritten.

m (global) The total number of eigenvalues found; 0 ≤ m ≤ n.

nz (global) Total number of eigenvectors computed, 0 ≤ nz ≤ m; that is,
the number of columns of z that are filled.

If jobz ≠ 'V', nz is not referenced.

If jobz = 'V', nz = m unless the user supplies insufficient space and
p?heevx is not able to detect this before beginning computation. To get all
the eigenvectors requested, the user must supply both sufficient space to
hold the eigenvectors in z (m ≤ descz[n_ - 1]) and sufficient workspace to
compute them (see lwork). p?heevx is always able to detect insufficient
space without computation unless range='V'.

w (global).
Array of size n. The first m elements contain the selected eigenvalues in
ascending order.

z (local).
Array, global size n*n, local size lld_z*LOCc(jz+n-1).

If jobz ='V', then on normal exit the first m columns of z contain the
orthonormal eigenvectors of the matrix corresponding to the selected
eigenvalues. If an eigenvector fails to converge, then that column of z
contains the latest approximation to the eigenvector, and the index of the
eigenvector is returned in ifail.
If jobz = 'N', then z is not referenced.

work[0] On exit, returns adequate workspace to allow optimal performance.

rwork (local).

Array of size lrwork. On return, rwork[0] contains the optimal amount of
workspace required for efficient execution.
If jobz = 'N', rwork[0] is the optimal amount of workspace required to compute
eigenvalues efficiently.
If jobz = 'V', rwork[0] is the optimal amount of workspace required to compute
eigenvalues and eigenvectors efficiently with no guarantee on orthogonality.
If range='V', it is assumed that all eigenvectors may be required.

iwork[0] (local)
On return, iwork[0] contains the amount of integer workspace required.

ifail (global)
Array of size n.
If jobz ='V', then on normal exit, the first m elements of ifail are zero. If
(mod(info,2)≠0) on exit, then ifail contains the indices of the eigenvectors
that failed to converge.
If jobz = 'N', then ifail is not referenced.

iclustr (global)
Array of size 2*NPROW*NPCOL.

This array contains indices of eigenvectors corresponding to a cluster of
eigenvalues that could not be reorthogonalized due to insufficient
workspace (see lwork, orfac and info). Eigenvectors corresponding to
clusters of eigenvalues indexed iclustr[2*i - 2] to iclustr[2*i - 1]
could not be reorthogonalized due to lack of workspace. Hence the
eigenvectors corresponding to these clusters may not be orthogonal.
iclustr is a zero-terminated array: (iclustr[2*k - 1]≠0 and
iclustr[2*k]=0) if and only if k is the number of clusters. iclustr is not
referenced if jobz = 'N'.

gap (global)
Array of size (NPROW*NPCOL)

This array contains the gap between eigenvalues whose eigenvectors could
not be reorthogonalized. The output values in this array correspond to the
clusters indicated by the array iclustr. As a result, the dot product between
eigenvectors corresponding to the i-th cluster may be as high as
(C*n)/gap[i - 1], where C is a small constant.
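The zero-terminated iclustr layout and the matching gap entries can be walked as in the following sketch. Assumptions: long stands in for MKL_INT, max_clusters is NPROW*NPCOL (the capacity of gap), and c_small is a caller-chosen value for the unspecified "small constant" C. The function name is hypothetical.

```c
#include <stdio.h>

/* Cluster i (1-based) covers eigenvector indices iclustr[2*i-2]..iclustr[2*i-1],
 * and gap[i-1] is the gap reported for it. The list ends at the first zero
 * start index, or when all max_clusters slots are used.
 * Returns the number of clusters found. */
static int report_unorthogonalized_clusters(const long *iclustr,
                                            const double *gap,
                                            int max_clusters, long n,
                                            double c_small)
{
    int k = 0;
    while (k < max_clusters && iclustr[2 * k] != 0) {
        /* Upper estimate of |dot product| between vectors in this cluster. */
        double bound = c_small * (double)n / gap[k];
        printf("cluster %d: eigenvectors %ld..%ld, |dot| may reach %g\n",
               k + 1, iclustr[2 * k], iclustr[2 * k + 1], bound);
        k++;
    }
    return k;
}
```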

info (global)
If info = 0, the execution is successful.

If info < 0:

If the i-th argument is an array and the j-entry had an illegal value, then
info = -(i*100+j). If the i-th argument is a scalar and had an illegal
value, then info = -i.

If info> 0:

If (mod(info,2)≠0), then one or more eigenvectors failed to converge.
Their indices are stored in ifail. Ensure abstol=2.0*p?lamch('U').


If (mod(info/2,2)≠0), then eigenvectors corresponding to one or more
clusters of eigenvalues could not be reorthogonalized because of insufficient
workspace. The indices of the clusters are stored in the array iclustr.
If (mod(info/4,2)≠0), then space limit prevented p?heevx from
computing all of the eigenvectors between vl and vu. The number of
eigenvectors computed is returned in nz.
If (mod(info/8,2)≠0), then p?stebz failed to compute eigenvalues.
Ensure abstol=2.0*p?lamch('U').
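Because each condition is tested with mod(info/k,2)≠0, a positive info is a sum of independent flags, and several conditions can be reported at once. A minimal decoding sketch follows; the type and function names are hypothetical, and long stands in for MKL_INT.

```c
#include <stdbool.h>

/* Conditions encoded in a positive info returned by p?heevx. */
typedef struct {
    bool eigvec_not_converged;   /* mod(info,2)   != 0: indices in ifail     */
    bool cluster_not_reorth;     /* mod(info/2,2) != 0: clusters in iclustr  */
    bool space_limit;            /* mod(info/4,2) != 0: see nz               */
    bool stebz_failed;           /* mod(info/8,2) != 0: p?stebz failed       */
} heevx_status;

static heevx_status decode_heevx_info(long info)
{
    heevx_status s = {false, false, false, false};
    if (info <= 0)
        return s;                /* 0: success; < 0: illegal argument */
    s.eigvec_not_converged = (info % 2) != 0;
    s.cluster_not_reorth   = ((info / 2) % 2) != 0;
    s.space_limit          = ((info / 4) % 2) != 0;
    s.stebz_failed         = ((info / 8) % 2) != 0;
    return s;
}
```

For instance, info = 9 reports both non-converged eigenvectors and a p?stebz failure; per the text, both suggest retrying with a different abstol.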

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?gesvd
Computes the singular value decomposition of a
general matrix, optionally computing the left and/or
right singular vectors.

Syntax
void psgesvd (char *jobu , char *jobvt , MKL_INT *m , MKL_INT *n , float *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , float *s , float *u , MKL_INT *iu , MKL_INT *ju ,
MKL_INT *descu , float *vt , MKL_INT *ivt , MKL_INT *jvt , MKL_INT *descvt , float
*work , MKL_INT *lwork , float *rwork , MKL_INT *info );
void pdgesvd (char *jobu , char *jobvt , MKL_INT *m , MKL_INT *n , double *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , double *s , double *u , MKL_INT *iu , MKL_INT *ju ,
MKL_INT *descu , double *vt , MKL_INT *ivt , MKL_INT *jvt , MKL_INT *descvt , double
*work , MKL_INT *lwork , double *rwork , MKL_INT *info );
void pcgesvd (char *jobu , char *jobvt , MKL_INT *m , MKL_INT *n , MKL_Complex8 *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *s , MKL_Complex8 *u , MKL_INT *iu ,
MKL_INT *ju , MKL_INT *descu , MKL_Complex8 *vt , MKL_INT *ivt , MKL_INT *jvt , MKL_INT
*descvt , MKL_Complex8 *work , MKL_INT *lwork , float *rwork , MKL_INT *info );
void pzgesvd (char *jobu , char *jobvt , MKL_INT *m , MKL_INT *n , MKL_Complex16 *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *s , MKL_Complex16 *u , MKL_INT
*iu , MKL_INT *ju , MKL_INT *descu , MKL_Complex16 *vt , MKL_INT *ivt , MKL_INT *jvt ,
MKL_INT *descvt , MKL_Complex16 *work , MKL_INT *lwork , double *rwork , MKL_INT
*info );

Include Files
• mkl_scalapack.h

Description
The p?gesvd function computes the singular value decomposition (SVD) of an m-by-n matrix A, optionally
computing the left and/or right singular vectors. The SVD is written
A = U*Σ*VT,
where Σ is an m-by-n matrix that is zero except for its min(m, n) diagonal elements, U is an m-by-m
orthogonal (unitary in the complex case) matrix, and V is an n-by-n orthogonal (unitary) matrix. The diagonal
elements of Σ are the singular values of A, and the columns of U and V are the corresponding left and right
singular vectors, respectively. The singular values are returned in array s in decreasing order, and only the
first min(m,n) columns of U and rows of vt = VT are computed.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.
Notice revision #20201201

NOTE
The distributed submatrix sub(A) must satisfy certain alignment properties. These
expressions must be true:
• mb_a = nb_a = nb
• iroffa = icoffa
where:
• iroffa = mod(ia-1, nb )
• icoffa = mod(ja-1, nb )
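The alignment requirement can be checked before calling p?gesvd. In the following sketch, mb_a and nb_a are the row and column block sizes from desca; with the standard 9-element dense descriptor these are desca[mb_ - 1] and desca[nb_ - 1]. The helper name is hypothetical.

```c
#include <stdbool.h>

/* Hypothetical check of the p?gesvd alignment conditions on sub(A):
 *   mb_a == nb_a (== nb)  and  mod(ia-1, nb) == mod(ja-1, nb),
 * where ia and ja are the 1-based global starting row and column. */
static bool gesvd_alignment_ok(long mb_a, long nb_a, long ia, long ja)
{
    if (mb_a != nb_a || nb_a <= 0)
        return false;            /* square, positive blocks required */
    long nb = nb_a;
    long iroffa = (ia - 1) % nb; /* row offset inside the first block    */
    long icoffa = (ja - 1) % nb; /* column offset inside the first block */
    return iroffa == icoffa;
}
```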

Input Parameters
mp = number of local rows in A and U
nq = number of local columns in A and VT
size = min(m, n)
sizeq = number of local columns in U
sizep = number of local rows in VT

jobu (global) Specifies options for computing all or part of the matrix U.
If jobu = 'V', the first size columns of U (the left singular vectors) are
returned in the array u;
If jobu = 'N', no columns of U (no left singular vectors) are computed.

jobvt (global)
Specifies options for computing all or part of the matrix VT.
If jobvt = 'V', the first size rows of VT (the right singular vectors) are
returned in the array vt;
If jobvt = 'N', no rows of VT (no right singular vectors) are computed.

m (global) The number of rows of the matrix A (m ≥ 0).

n (global) The number of columns of A (n ≥ 0).

a (local).
Block cyclic array, global size (m, n), local size (mp, nq).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.


iu, ju (global) The row and column indices in the global matrix U indicating the
first row and the first column of the submatrix U, respectively.

descu (global and local) array of size dlen_. The array descriptor for the
distributed matrix U.

ivt, jvt (global) The row and column indices in the global matrix VT indicating the
first row and the first column of the submatrix VT, respectively.

descvt (global and local) array of size dlen_. The array descriptor for the
distributed matrix VT.

work (local).
Workspace array of size lwork

lwork (local) The size of the array work;

lwork > 2 + 6*sizeb + max(watobd, wbdtosvd),
where sizeb = max(m, n), and watobd and wbdtosvd refer, respectively,
to the workspace required to bidiagonalize the matrix A and to go from the
bidiagonal matrix to the singular value decomposition USVT.
For watobd, the following holds:
watobd = max(max(wp?lange, wp?gebrd), max(wp?lared2d, wp?lared1d)),
where wp?lange, wp?lared1d, wp?lared2d, wp?gebrd are the workspaces
required respectively for the subprograms p?lange, p?lared1d,
p?lared2d, p?gebrd. Using the standard notation
mp = numroc(m, mb, MYROW, desca[rsrc_ - 1], NPROW),
nq = numroc(n, nb, MYCOL, desca[csrc_ - 1], NPCOL),
the workspaces required for the above subprograms are
wp?lange = mp,
wp?lared1d = nq0,
wp?lared2d = mp0,
wp?gebrd = nb*(mp + nq + 1) + nq,
where nq0 and mp0 refer, respectively, to the values obtained at MYCOL =
0 and MYROW = 0. In general, the upper limit for the workspace is the
workspace required on process (0,0):
watobd ≤ nb*(mp0 + nq0 + 1) + nq0.
In case of a homogeneous process grid this upper limit can be used as an
estimate of the minimum workspace for every processor.
For wbdtosvd, the following holds:
wbdtosvd = size*(wantu*nru + wantvt*ncvt) + max(w?bdsqr,
max(wantu*wp?ormbrqln, wantvt*wp?ormbrprt)),
where

wantu(wantvt) = 1, if left/right singular vectors are wanted, and
wantu(wantvt) = 0, otherwise. w?bdsqr, wp?ormbrqln, and wp?ormbrprt
refer respectively to the workspace required for the subprograms ?bdsqr,
p?ormbr(qln), and p?ormbr(prt), where qln and prt are the values of the
arguments vect, side, and trans in the call to p?ormbr. nru is equal to the
local number of rows of the matrix U when distributed across a 1-dimensional
"column" of processes. Analogously, ncvt is equal to the local number of
columns of the matrix VT when distributed across a 1-dimensional "row" of
processes. Calling the LAPACK procedure ?bdsqr requires

w?bdsqr = max(1, 2*size + (2*size - 4)*max(wantu, wantvt))

on every processor. Finally,
wp?ormbrqln = max((nb*(nb-1))/2, (sizeq+mp)*nb)+nb*nb,
wp?ormbrprt = max((mb*(mb-1))/2, (sizep+nq)*mb)+mb*mb,
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum size for the work array.
The required workspace is returned as the first element of work and no
error message is issued by pxerbla.
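The workspace terms above can be transcribed into C to estimate lwork. This sketch assumes the caller has already computed mp0, nq0, mp, nq, nru, ncvt, sizeq and sizep as described in the text, and it uses the process (0,0) upper bound for watobd. All helper names are hypothetical. Because the text states a strict inequality, gesvd_min_lwork returns the bound plus one.

```c
static long max2(long x, long y) { return x > y ? x : y; }

/* Upper bound on watobd: the workspace required on process (0,0). */
static long watobd_bound(long nb, long mp0, long nq0)
{
    return nb * (mp0 + nq0 + 1) + nq0;
}

/* w?bdsqr = max(1, 2*size + (2*size - 4)*max(wantu, wantvt)) */
static long w_bdsqr(long size, int wantu, int wantvt)
{
    return max2(1, 2 * size + (2 * size - 4) * max2(wantu, wantvt));
}

/* wp?ormbrqln and wp?ormbrprt from the formulas above. */
static long w_ormbr_qln(long nb, long sizeq, long mp)
{
    return max2((nb * (nb - 1)) / 2, (sizeq + mp) * nb) + nb * nb;
}

static long w_ormbr_prt(long mb, long sizep, long nq)
{
    return max2((mb * (mb - 1)) / 2, (sizep + nq) * mb) + mb * mb;
}

/* wbdtosvd = size*(wantu*nru + wantvt*ncvt)
 *          + max(w?bdsqr, max(wantu*wp?ormbrqln, wantvt*wp?ormbrprt)) */
static long wbdtosvd(long size, int wantu, int wantvt, long nru, long ncvt,
                     long wqln, long wprt, long wbdsqr)
{
    return size * (wantu * nru + wantvt * ncvt)
         + max2(wbdsqr, max2(wantu * wqln, wantvt * wprt));
}

/* Smallest integer with lwork > 2 + 6*sizeb + max(watobd, wbdtosvd). */
static long gesvd_min_lwork(long m, long n, long watobd, long wbdtosvd_val)
{
    long sizeb = max2(m, n);
    return 2 + 6 * sizeb + max2(watobd, wbdtosvd_val) + 1;
}
```

As a check by hand: on a 1-by-1 grid with m = n = 4, nb = mb = 2 and both vector sets requested, the terms are watobd ≤ 22, w?bdsqr = 12, wp?ormbrqln = wp?ormbrprt = 20 and wbdtosvd = 52, so lwork must be at least 79.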

rwork Workspace array of size 1 + 4*sizeb. Not used for psgesvd and pdgesvd.

Output Parameters

a On exit, the contents of a are destroyed.

s (global).
Array of size size.
Contains the singular values of A sorted so that s[i] ≥ s[i+1].

u (local).
local size mp*sizeq, global size m*size.
If jobu = 'V', u contains the first min(m, n) columns of U.

If jobu = 'N' or 'O', u is not referenced.

vt (local).
local size (sizep, nq), global size (size, n)
If jobvt = 'V', vt contains the first size rows of VT. If jobvt = 'N', vt is
not referenced.

work On exit, if info = 0, then work[0] returns the required minimal size of
lwork.

rwork On exit, if info = 0, then rwork[0] returns the required size of rwork.

info (global)
If info = 0, the execution is successful.

If info < 0: if info = -i, the i-th parameter had an illegal value.

If info > 0: ?bdsqr did not converge.

If info = min(m,n) + 1, then p?gesvd has detected heterogeneity by
finding that eigenvalues were not identical across the process grid. In this
case, the accuracy of the results from p?gesvd cannot be guaranteed.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?sygvx
Computes selected eigenvalues and, optionally,
eigenvectors of a real generalized symmetric definite
eigenproblem.

Syntax
void pssygvx (MKL_INT *ibtype , char *jobz , char *range , char *uplo , MKL_INT *n ,
float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT
*jb , MKL_INT *descb , float *vl , float *vu , MKL_INT *il , MKL_INT *iu , float
*abstol , MKL_INT *m , MKL_INT *nz , float *w , float *orfac , float *z , MKL_INT *iz ,
MKL_INT *jz , MKL_INT *descz , float *work , MKL_INT *lwork , MKL_INT *iwork , MKL_INT
*liwork , MKL_INT *ifail , MKL_INT *iclustr , float *gap , MKL_INT *info );
void pdsygvx (MKL_INT *ibtype , char *jobz , char *range , char *uplo , MKL_INT *n ,
double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib ,
MKL_INT *jb , MKL_INT *descb , double *vl , double *vu , MKL_INT *il , MKL_INT *iu ,
double *abstol , MKL_INT *m , MKL_INT *nz , double *w , double *orfac , double *z ,
MKL_INT *iz , MKL_INT *jz , MKL_INT *descz , double *work , MKL_INT *lwork , MKL_INT
*iwork , MKL_INT *liwork , MKL_INT *ifail , MKL_INT *iclustr , double *gap , MKL_INT
*info );

Include Files
• mkl_scalapack.h

Description
The p?sygvx function computes selected eigenvalues and, optionally, the eigenvectors of a real generalized
symmetric-definite eigenproblem of the form
sub(A)*x = λ*sub(B)*x, sub(A)*sub(B)*x = λ*x, or sub(B)*sub(A)*x = λ*x.
Here x denotes eigenvectors, λ (lambda) denotes eigenvalues, sub(A) denotes A(ia:ia+n-1, ja:ja+n-1), and
sub(B) denotes B(ib:ib+n-1, jb:jb+n-1). sub(A) and sub(B) are assumed to be symmetric, and sub(B) is also
positive definite.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.
Notice revision #20201201

Input Parameters

ibtype (global) Must be 1 or 2 or 3.

Specifies the problem type to be solved:
If ibtype = 1, the problem type is sub(A)*x = lambda*sub(B)*x;

If ibtype = 2, the problem type is sub(A)*sub(B)*x = lambda*x;

If ibtype = 3, the problem type is sub(B)*sub(A)*x = lambda*x.
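To make the three problem types concrete, the following sequential sketch computes the residual of a candidate eigenpair (lambda, x) for each ibtype on a small dense column-major matrix. It does not operate on distributed arrays or call MKL, and the helper names are hypothetical.

```c
#include <stddef.h>

/* y = M*x for a dense n-by-n column-major matrix M. */
static void matvec(size_t n, const double *M, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++) {
        double s = 0.0;
        for (size_t j = 0; j < n; j++)
            s += M[i + j * n] * x[j];
        y[i] = s;
    }
}

/* Residual for each problem type (tmp and r must each hold n doubles):
 *   ibtype 1: r = A*x - lambda*B*x
 *   ibtype 2: r = A*(B*x) - lambda*x
 *   ibtype 3: r = B*(A*x) - lambda*x
 * Returns 0 on success, -1 for an invalid ibtype. */
static int gen_residual(int ibtype, size_t n, const double *A,
                        const double *B, const double *x, double lambda,
                        double *tmp, double *r)
{
    switch (ibtype) {
    case 1:
        matvec(n, A, x, r);              /* r = A*x   */
        matvec(n, B, x, tmp);            /* tmp = B*x */
        for (size_t i = 0; i < n; i++)
            r[i] -= lambda * tmp[i];
        return 0;
    case 2:
        matvec(n, B, x, tmp);            /* tmp = B*x   */
        matvec(n, A, tmp, r);            /* r = A*(B*x) */
        break;
    case 3:
        matvec(n, A, x, tmp);            /* tmp = A*x   */
        matvec(n, B, tmp, r);            /* r = B*(A*x) */
        break;
    default:
        return -1;
    }
    for (size_t i = 0; i < n; i++)
        r[i] -= lambda * x[i];
    return 0;
}
```

With A = diag(2, 3), B = diag(1, 2) and x = (0, 1), the residual vanishes for lambda = 1.5 when ibtype = 1 and for lambda = 6 when ibtype = 2 or 3. This shows that the same pair of matrices has different eigenvalues depending on the problem type.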

jobz (global) Must be 'N' or 'V'.

If jobz ='N', then compute eigenvalues only.

If jobz ='V', then compute eigenvalues and eigenvectors.

range (global) Must be 'A' or 'V' or 'I'.

If range = 'A', the function computes all eigenvalues.

If range = 'V', the function computes eigenvalues in the interval [vl, vu].
If range = 'I', the function computes eigenvalues with indices il through
iu.

uplo (global) Must be 'U' or 'L'.

If uplo = 'U', arrays a and b store the upper triangles of sub(A) and sub(B);
If uplo = 'L', arrays a and b store the lower triangles of sub(A) and sub(B).

n (global) The order of the matrices sub(A) and sub(B), n ≥ 0.

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1). On
entry, this array contains the local pieces of the n-by-n symmetric
distributed matrix sub(A).
If uplo = 'U', the leading n-by-n upper triangular part of sub(A) contains
the upper triangular part of the matrix.
If uplo = 'L', the leading n-by-n lower triangular part of sub(A) contains
the lower triangular part of the matrix.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A. If desca[ctxt_ - 1] is incorrect, p?sygvx cannot
guarantee correct error reporting.

b (local).
Pointer into the local memory to an array of size lld_b*LOCc(jb+n-1). On
entry, this array contains the local pieces of the n-by-n symmetric
distributed matrix sub(B).
If uplo = 'U', the leading n-by-n upper triangular part of sub(B) contains
the upper triangular part of the matrix.
If uplo = 'L', the leading n-by-n lower triangular part of sub(B) contains
the lower triangular part of the matrix.

ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of the submatrix B, respectively.


descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B. descb[ctxt_ - 1] must be equal to desca[ctxt_ -
1].

vl, vu (global)
If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
If range = 'A' or 'I', vl and vu are not referenced.

il, iu (global)
If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned. Constraint: il ≥ 1, min(il, n) ≤ iu ≤ n.

If range = 'A' or 'V', il and iu are not referenced.

abstol (global)
If jobz='V', setting abstol to p?lamch(context, 'U') yields the most
orthogonal eigenvectors.
The absolute error tolerance for the eigenvalues. An approximate
eigenvalue is accepted as converged when it is determined to lie in an
interval [a,b] of width less than or equal to
abstol + eps*max(|a|,|b|),
where eps is the machine precision. If abstol is less than or equal to zero,
then eps*norm(T) will be used in its place, where norm(T) is the 1-norm
of the tridiagonal matrix obtained by reducing A to tridiagonal form.
Eigenvalues will be computed most accurately when abstol is set to twice
the underflow threshold 2*p?lamch('S'), not zero. If this function returns
with mod(info,2)≠0 or mod(info/8,2)≠0, indicating that some
eigenvalues or eigenvectors did not converge, try setting abstol to
2*p?lamch('S').

NOTE
mod(x,y) is the integer remainder of x/y.

orfac (global).
Specifies which eigenvectors should be reorthogonalized. Eigenvectors that
correspond to eigenvalues which are within tol=orfac*norm(A) of each
other are to be reorthogonalized. However, if the workspace is insufficient
(see lwork), tol may be decreased until all eigenvectors to be
reorthogonalized can be stored in one process. No reorthogonalization will
be done if orfac equals zero. A default value of 1.0e-3 is used if orfac is
negative. orfac should be identical on all processes.

iz, jz (global) The row and column indices in the global matrix Z indicating the
first row and the first column of the submatrix Z, respectively.

descz (global and local) array of size dlen_. The array descriptor for the
distributed matrix Z. descz[ctxt_ - 1] must equal desca[ctxt_ - 1].

work (local)

Workspace array of size lwork

lwork (local)
Size of the array work. See below for definitions of variables used to define
lwork.
If no eigenvectors are requested (jobz = 'N'), then lwork ≥ 5*n +
max(5*nn, NB*(np0 + 1)).
If eigenvectors are requested (jobz = 'V'), then the amount of workspace
required to guarantee that all eigenvectors are computed is:
lwork ≥ 5*n + max(5*nn, np0*mq0 + 2*nb*nb) + iceil(neig,
NPROW*NPCOL)*nn.
The computed eigenvectors may not be orthogonal if the minimal
workspace is supplied and orfac is too small. If you want to guarantee
orthogonality at the cost of potentially poor performance you should add
the following to lwork:
(clustersize-1)*n,
where clustersize is the number of eigenvalues in the largest cluster, where
a cluster is defined as a set of close eigenvalues:
{w[k - 1],..., w[k+clustersize - 2]|w[j] ≤ w[j - 1] +
orfac*2*norm(A)}
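The cluster definition can be applied to a sorted eigenvalue array to estimate the (clustersize-1)*n extra workspace. In this sketch, norm_a is assumed to be supplied by the caller, and a negative orfac is replaced by the documented default 1.0e-3. The function name is hypothetical.

```c
/* Size of the largest cluster in ascending eigenvalues w[0..m-1]:
 * consecutive values stay in one cluster while
 *     w[j] <= w[j-1] + orfac*2*norm(A). */
static long largest_cluster(const double *w, long m, double orfac,
                            double norm_a)
{
    if (m <= 0)
        return 0;
    if (orfac < 0.0)
        orfac = 1.0e-3;                  /* documented default for orfac < 0 */
    double tol = orfac * 2.0 * norm_a;
    long best = 1, run = 1;
    for (long j = 1; j < m; j++) {
        run = (w[j] <= w[j - 1] + tol) ? run + 1 : 1;  /* extend or restart */
        if (run > best)
            best = run;
    }
    return best;
}
```

The extra lwork that the text suggests for guaranteed orthogonality is then (largest_cluster(...) - 1) * n.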
Variable definitions:
neig = number of eigenvectors requested,
nb = desca[mb_ - 1] = desca[nb_ - 1] = descz[mb_ - 1] =
descz[nb_ - 1],
nn = max(n, nb, 2),
desca[rsrc_ - 1] = desca[csrc_ - 1] = descz[rsrc_ - 1] =
descz[csrc_ - 1] = 0,
np0 = numroc(nn, nb, 0, 0, NPROW),

mq0 = numroc(max(neig, nb, 2), nb, 0, 0, NPCOL)

iceil(x, y) is a ScaLAPACK function returning ceiling(x/y)
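The two minimal lwork bounds for p?sygvx can be transcribed as follows. np0 and mq0 are assumed to have been computed by the caller with numroc as in the variable definitions above. The helper names are hypothetical, and iceil_local mimics iceil for positive y.

```c
static long max2(long x, long y) { return x > y ? x : y; }

/* ceiling(x / y) for y > 0, as described for iceil. */
static long iceil_local(long x, long y) { return (x + y - 1) / y; }

/* Minimal lwork for p?sygvx (hypothetical helper).
 * want_vectors: 0 for jobz = 'N', nonzero for jobz = 'V'. */
static long sygvx_min_lwork(long n, long nb, long neig, long np0, long mq0,
                            long nprow, long npcol, int want_vectors)
{
    long nn = max2(max2(n, nb), 2);
    if (!want_vectors)
        return 5 * n + max2(5 * nn, nb * (np0 + 1));
    return 5 * n + max2(5 * nn, np0 * mq0 + 2 * nb * nb)
         + iceil_local(neig, nprow * npcol) * nn;
}
```

For n = 100, nb = 32, neig = 100 on a 2-by-2 grid (np0 = mq0 = 64), this gives 2580 for jobz = 'N' and 9144 for jobz = 'V'.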

If lwork is too small to guarantee orthogonality, p?sygvx attempts to
maintain orthogonality in the clusters with the smallest spacing between the
eigenvalues.
If lwork is too small to compute all the eigenvectors requested, no
computation is performed and info= -23 is returned.
Note that when range='V', the number of requested eigenvectors is not
known until the eigenvalues are computed. In this case, if lwork is large
enough to compute the eigenvalues, p?sygvx computes the eigenvalues
and as many eigenvectors as possible.
Greater performance can be achieved if adequate workspace is provided. In
some situations, performance can decrease as the provided workspace
increases above the workspace amount shown below:
lwork ≥ max(lwork, 5*n + nsytrd_lwopt, nsygst_lwopt), where


lwork, as defined previously, depends upon the number of eigenvectors
requested, and
nsytrd_lwopt = n + 2*(anb+1)*(4*nps+2) + (nps+3)*nps
nsygst_lwopt = 2*np0*nb + nq0*nb + nb*nb
anb = pjlaenv(desca[ctxt_ - 1], 3, 'p?syttrd', 'L', 0, 0, 0, 0)
sqnpc = int(sqrt(dble(NPROW * NPCOL)))
nps = max(numroc(n, 1, 0, 0, sqnpc), 2*anb)
NB = desca[mb_ - 1]
np0 = numroc(n, nb, 0, 0, NPROW)
nq0 = numroc(n, nb, 0, 0, NPCOL)
numroc is a ScaLAPACK tool function;
pjlaenv is a ScaLAPACK environmental inquiry function
MYROW, MYCOL, NPROW and NPCOL can be determined by calling the function
blacs_gridinfo.
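The workspace size above which extra lwork is not expected to help can be evaluated as in the following sketch. Assumptions: anb is passed in by the caller instead of being obtained from pjlaenv, lwork_min is the minimal lwork for the requested eigenvectors, and numroc_local re-implements the documented NUMROC behavior so that the example compiles without MKL. All helper names are hypothetical.

```c
static long max2(long x, long y) { return x > y ? x : y; }

/* Local re-implementation of ScaLAPACK NUMROC (see the library's numroc). */
static long numroc_local(long n, long nb, long iproc, long isrcproc,
                         long nprocs)
{
    long mydist = (nprocs + iproc - isrcproc) % nprocs;
    long nblocks = n / nb;
    long result = (nblocks / nprocs) * nb;
    long extrablks = nblocks % nprocs;
    if (mydist < extrablks)
        result += nb;
    else if (mydist == extrablks)
        result += n % nb;
    return result;
}

/* int(sqrt(p)) for p >= 0 without <math.h>. */
static long isqrt_local(long p)
{
    long r = 0;
    while ((r + 1) * (r + 1) <= p)
        r++;
    return r;
}

/* lwork >= max(lwork_min, 5*n + nsytrd_lwopt, nsygst_lwopt) */
static long sygvx_opt_lwork(long n, long nb, long anb, long nprow,
                            long npcol, long lwork_min)
{
    long sqnpc = isqrt_local(nprow * npcol);
    long nps   = max2(numroc_local(n, 1, 0, 0, sqnpc), 2 * anb);
    long np0   = numroc_local(n, nb, 0, 0, nprow);
    long nq0   = numroc_local(n, nb, 0, 0, npcol);
    long nsytrd_lwopt = n + 2 * (anb + 1) * (4 * nps + 2) + (nps + 3) * nps;
    long nsygst_lwopt = 2 * np0 * nb + nq0 * nb + nb * nb;
    return max2(lwork_min, max2(5 * n + nsytrd_lwopt, nsygst_lwopt));
}
```

With n = 100, nb = anb = 32 on a 2-by-2 grid, nps = 64 and the tridiagonal-reduction term dominates, giving 21916.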
For large n, no extra workspace is needed; however, the biggest boost in
performance comes for small n, so it is wise to provide the extra workspace
(typically less than a megabyte per process).
If clustersize ≥ n/sqrt(NPROW*NPCOL), then providing enough space
to compute all the eigenvectors orthogonally will cause serious degradation
in performance. At the limit (that is, clustersize = n-1), p?stein will
perform no better than ?stein on a single processor.

For clustersize = n/sqrt(NPROW*NPCOL), reorthogonalizing all
eigenvectors will increase the total execution time by a factor of 2 or more.

iwork (local) Workspace array.

liwork (local), size of iwork.

liwork ≥ 6*nnp,
where nnp = max(n, NPROW*NPCOL + 1, 4).
If liwork = -1, then liwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

a On exit,
if jobz = 'V' and info = 0, sub(A) contains the distributed matrix Z
of eigenvectors. The eigenvectors are normalized as follows:
for ibtype = 1 or 2, ZT*sub(B)*Z = I;
for ibtype = 3, ZT*inv(sub(B))*Z = I.

If jobz = 'N', then on exit the upper triangle (if uplo='U') or the lower
triangle (if uplo='L') of sub(A), including the diagonal, is destroyed.
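The normalization for ibtype = 1 or 2 can be verified on a small example. The following sequential sketch checks ZT*B*Z = I within a tolerance for dense column-major arrays (B is n-by-n, Z is n-by-k). It is illustrative only: it does not read distributed arrays. For ibtype = 3, the check would use inv(B) in place of B. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true when every entry of ZT*B*Z differs from the identity by at
 * most tol. Entry (p, q) is Z(:,p)^T * (B * Z(:,q)). */
static bool zt_b_z_is_identity(size_t n, size_t k, const double *B,
                               const double *Z, double tol)
{
    for (size_t p = 0; p < k; p++) {
        for (size_t q = 0; q < k; q++) {
            double s = 0.0;
            for (size_t i = 0; i < n; i++) {
                double bz = 0.0;                 /* (B * Z(:,q))[i] */
                for (size_t j = 0; j < n; j++)
                    bz += B[i + j * n] * Z[j + q * n];
                s += Z[i + p * n] * bz;
            }
            double d = s - ((p == q) ? 1.0 : 0.0);
            if (d < -tol || d > tol)
                return false;
        }
    }
    return true;
}
```

For B = diag(4, 1), the B-normalized eigenvectors are (0.5, 0) and (0, 1). The plain unit vectors fail the check because their B-norms are 4 and 1.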

b On exit, if info ≤ n, the part of sub(B) containing the matrix is overwritten
by the triangular factor U or L from the Cholesky factorization sub(B) =
UT*U or sub(B) = L*LT.

m (global) The total number of eigenvalues found, 0 ≤ m ≤ n.

nz (global)
Total number of eigenvectors computed, 0 ≤ nz ≤ m; that is, the number of
columns of z that are filled.
If jobz ≠ 'V', nz is not referenced.

If jobz = 'V', nz = m unless the user supplies insufficient space and
p?sygvx is not able to detect this before beginning computation. To get all
the eigenvectors requested, the user must supply both sufficient space to
hold the eigenvectors in z (m ≤ descz[n_ - 1]) and sufficient workspace to
compute them (see lwork). p?sygvx is always able to detect
insufficient space without computation unless range='V'.

w (global)
Array of size n. On normal exit, the first m entries contain the selected
eigenvalues in ascending order.

z (local).
Array, global size n*n, local size lld_z*LOCc(jz+n-1).

work If jobz = 'N', work[0] is the optimal amount of workspace required to compute
eigenvalues efficiently.
If jobz = 'V', work[0] is the optimal amount of workspace required to
compute eigenvalues and eigenvectors efficiently with no guarantee on
orthogonality.
If range='V', it is assumed that all eigenvectors may be required.

ifail (global)
Array of size n.

ifail provides additional information when info≠0.
If (mod(info/16,2)≠0), then ifail[0] indicates the order of the smallest
minor which is not positive definite. If (mod(info,2)≠0) on exit, then ifail
contains the indices of the eigenvectors that failed to converge.
If neither of the above error conditions holds and jobz = 'V', then the first
m elements of ifail are set to zero.

iclustr (global)
Array of size (2*NPROW*NPCOL). This array contains indices of eigenvectors
corresponding to a cluster of eigenvalues that could not be reorthogonalized
due to insufficient workspace (see lwork, orfac and info). Eigenvectors
corresponding to clusters of eigenvalues indexed iclustr[2*i - 2] to
iclustr[2*i - 1], could not be reorthogonalized due to lack of
workspace. Hence the eigenvectors corresponding to these clusters may not
be orthogonal. iclustr is a zero-terminated array:
(iclustr[2*k - 1]≠0 and iclustr[2*k]=0) if and only if k is the
number of clusters. iclustr is not referenced if jobz = 'N'.

gap (global)
Array of size NPROW*NPCOL. This array contains the gap between
eigenvalues whose eigenvectors could not be reorthogonalized. The output
values in this array correspond to the clusters indicated by the array iclustr.
As a result, the dot product between eigenvectors corresponding to the i-th
cluster may be as high as (C*n)/gap[i - 1], where C is a small constant.

info (global)
If info = 0, the execution is successful.

If info < 0: if the i-th argument is an array and the j-th entry had an illegal
value, then info = -(i*100+j); if the i-th argument is a scalar and had
an illegal value, then info = -i.

If info > 0:

If (mod(info,2)≠0), then one or more eigenvectors failed to converge.
Their indices are stored in ifail.
If (mod(info/2,2)≠0), then eigenvectors corresponding to one or more
clusters of eigenvalues could not be reorthogonalized because of insufficient
workspace. The indices of the clusters are stored in the array iclustr.
If (mod(info/4,2)≠0), then space limit prevented p?sygvx from
computing all of the eigenvectors between vl and vu. The number of
eigenvectors computed is returned in nz.
If (mod(info/8,2)≠0), then p?stebz failed to compute eigenvalues.
If (mod(info/16,2)≠0), then B was not positive definite. ifail[0] indicates
the order of the smallest minor which is not positive definite.
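For p?sygvx, the extra mod(info/16,2) flag must be read together with ifail[0]. The following decoding sketch makes that explicit. The type and function names are hypothetical, and long stands in for MKL_INT.

```c
#include <stdbool.h>

/* Conditions encoded in a positive info returned by p?sygvx. */
typedef struct {
    bool eigvec_not_converged;   /* mod(info,2)   != 0: indices in ifail    */
    bool cluster_not_reorth;     /* mod(info/2,2) != 0: see iclustr, gap    */
    bool space_limit;            /* mod(info/4,2) != 0: see nz              */
    bool stebz_failed;           /* mod(info/8,2) != 0: p?stebz failed      */
    long failed_minor;           /* mod(info/16,2) != 0: ifail[0], else 0   */
} sygvx_status;

static sygvx_status decode_sygvx_info(long info, const long *ifail)
{
    sygvx_status s = {false, false, false, false, 0};
    if (info <= 0)
        return s;                /* 0: success; < 0: illegal argument */
    s.eigvec_not_converged = (info % 2) != 0;
    s.cluster_not_reorth   = ((info / 2) % 2) != 0;
    s.space_limit          = ((info / 4) % 2) != 0;
    s.stebz_failed         = ((info / 8) % 2) != 0;
    if (((info / 16) % 2) != 0)
        s.failed_minor = ifail[0];   /* order of the failing minor of sub(B) */
    return s;
}
```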

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.


p?hegvx
Computes selected eigenvalues and, optionally,
eigenvectors of a complex generalized Hermitian
positive-definite eigenproblem.

Syntax
void pchegvx (MKL_INT *ibtype , char *jobz , char *range , char *uplo , MKL_INT *n ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b ,
MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , float *vl , float *vu , MKL_INT *il ,
MKL_INT *iu , float *abstol , MKL_INT *m , MKL_INT *nz , float *w , float *orfac ,
MKL_Complex8 *z , MKL_INT *iz , MKL_INT *jz , MKL_INT *descz , MKL_Complex8 *work ,
MKL_INT *lwork , float *rwork , MKL_INT *lrwork , MKL_INT *iwork , MKL_INT *liwork ,
MKL_INT *ifail , MKL_INT *iclustr , float *gap , MKL_INT *info );
void pzhegvx (MKL_INT *ibtype , char *jobz , char *range , char *uplo , MKL_INT *n ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b ,
MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , double *vl , double *vu , MKL_INT *il ,
MKL_INT *iu , double *abstol , MKL_INT *m , MKL_INT *nz , double *w , double *orfac ,
MKL_Complex16 *z , MKL_INT *iz , MKL_INT *jz , MKL_INT *descz , MKL_Complex16 *work ,
MKL_INT *lwork , double *rwork , MKL_INT *lrwork , MKL_INT *iwork , MKL_INT *liwork ,
MKL_INT *ifail , MKL_INT *iclustr , double *gap , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?hegvx function computes selected eigenvalues and, optionally, the eigenvectors of a complex
generalized Hermitian positive-definite eigenproblem of the form
sub(A)*x = λ*sub(B)*x, sub(A)*sub(B)*x = λ*x, or sub(B)*sub(A)*x = λ*x.
Here sub(A), denoting A(ia:ia+n-1, ja:ja+n-1), and sub(B), denoting B(ib:ib+n-1, jb:jb+n-1), are assumed
to be Hermitian, and sub(B) is also positive definite.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/
PerformanceIndex.
Notice revision #20201201

Input Parameters

ibtype (global) Must be 1 or 2 or 3.

Specifies the problem type to be solved:
If ibtype = 1, the problem type is

sub(A)*x = lambda*sub(B)*x;
If ibtype = 2, the problem type is

sub(A)*sub(B)*x = lambda*x;
If ibtype = 3, the problem type is

sub(B)*sub(A)*x = lambda*x.
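As background for ibtype, each of the three problem types can be reduced to a standard Hermitian eigenproblem using the Cholesky factor of sub(B). The derivation below assumes the upper form sub(B) = U^H*U (uplo = 'U'; the lower form is analogous) and writes A for sub(A). It follows the conventional LAPACK-style reduction and is shown as a worked derivation, not as a description of how p?hegvx is implemented internally.

```latex
\begin{aligned}
&\text{ibtype}=1:\; A x = \lambda\, U^{H}U x \;\Rightarrow\; \bigl(U^{-H} A\, U^{-1}\bigr) y = \lambda y,\quad x = U^{-1} y\\
&\text{ibtype}=2:\; A\, U^{H}U x = \lambda x \;\Rightarrow\; \bigl(U A\, U^{H}\bigr) y = \lambda y,\quad x = U^{-1} y\\
&\text{ibtype}=3:\; U^{H}U A x = \lambda x \;\Rightarrow\; \bigl(U A\, U^{H}\bigr) y = \lambda y,\quad x = U^{H} y
\end{aligned}
```

If the eigenvectors Y of the Hermitian matrix in parentheses are orthonormal (Y^H*Y = I), then Z^H*sub(B)*Z = I for types 1 and 2 and Z^H*inv(sub(B))*Z = I for type 3, which matches the normalization stated for the output array a.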


jobz (global) Must be 'N' or 'V'.

If jobz ='N', then compute eigenvalues only.

If jobz ='V', then compute eigenvalues and eigenvectors.

range (global) Must be 'A' or 'V' or 'I'.

If range = 'A', the function computes all eigenvalues.

If range = 'V', the function computes eigenvalues in the interval [vl, vu].

If range = 'I', the function computes eigenvalues with indices il through iu.

uplo (global) Must be 'U' or 'L'.

If uplo = 'U', arrays a and b store the upper triangles of sub(A) and sub(B);
If uplo = 'L', arrays a and b store the lower triangles of sub(A) and sub(B).

n (global)
The order of the matrices sub(A) and sub(B) (n ≥ 0).

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1). On
entry, this array contains the local pieces of the n-by-n Hermitian
distributed matrix sub(A). If uplo = 'U', the leading n-by-n upper
triangular part of sub(A) contains the upper triangular part of the matrix. If
uplo = 'L', the leading n-by-n lower triangular part of sub(A) contains
the lower triangular part of the matrix.

ia, ja (global)
The row and column indices in the global matrix A indicating the first row
and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_.

The array descriptor for the distributed matrix A. If desca[ctxt_ - 1] is
incorrect, p?hegvx cannot guarantee correct error reporting.

b (local).
Pointer into the local memory to an array of size lld_b*LOCc(jb+n-1). On
entry, this array contains the local pieces of the n-by-n Hermitian
distributed matrix sub(B).
If uplo = 'U', the leading n-by-n upper triangular part of sub(B) contains
the upper triangular part of the matrix.
If uplo = 'L', the leading n-by-n lower triangular part of sub(B) contains
the lower triangular part of the matrix.

ib, jb (global)
The row and column indices in the global matrix B indicating the first row
and the first column of the submatrix B, respectively.

descb (global and local) array of size dlen_.
The array descriptor for the distributed matrix B. descb[ctxt_ - 1] must be equal to desca[ctxt_ - 1].

vl, vu (global)
If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
If range = 'A' or 'I', vl and vu are not referenced.

il, iu (global)

If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned. Constraint: il ≥ 1, min(il, n) ≤ iu ≤ n.

If range = 'A' or 'V', il and iu are not referenced.

abstol (global)
The absolute error tolerance for the eigenvalues. An approximate eigenvalue is accepted as converged when it is determined to lie in an interval [a, b] of width less than or equal to
abstol + eps*max(|a|,|b|),
where eps is the machine precision. If abstol is less than or equal to zero, then eps*norm(T) is used in its place, where norm(T) is the 1-norm of the tridiagonal matrix obtained by reducing A to tridiagonal form.
Eigenvalues are computed most accurately when abstol is set to twice the underflow threshold, 2*p?lamch('S'), not zero. If this function returns with mod(info,2)≠0 or mod(info/8,2)≠0, indicating that some eigenvalues or eigenvectors did not converge, try setting abstol to 2*p?lamch('S').

NOTE
mod(x,y) is the integer remainder of x/y.
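The abstol acceptance rule can be sketched in plain C. This is an illustration, not MKL code: it assumes DBL_EPSILON stands in for the machine precision eps (p?lamch may return a value that differs by a factor of 2) and DBL_MIN stands in for the underflow threshold p?lamch('S'); the helper names are hypothetical.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Hypothetical helper: returns nonzero if an eigenvalue known to lie in [a, b]
   would be accepted as converged under the abstol rule. eps is approximated
   by DBL_EPSILON (an assumption). */
static int eigenvalue_interval_converged(double a, double b, double abstol, double norm_t)
{
    assert(a <= b);
    const double eps = DBL_EPSILON;
    /* abstol <= 0 means "use eps*norm(T) instead", where norm(T) is the
       1-norm of the tridiagonal matrix obtained from A. */
    const double tol = (abstol <= 0.0) ? eps * norm_t : abstol;
    const double scale = (fabs(a) > fabs(b)) ? fabs(a) : fabs(b);
    return (b - a) <= tol + eps * scale;
}

/* Hypothetical helper: the recommended setting, twice the underflow threshold.
   DBL_MIN stands in for p?lamch('S') here. */
static double recommended_abstol(void)
{
    return 2.0 * DBL_MIN;
}
```

A width-zero interval is always accepted, while a wide interval is rejected unless abstol is at least as large as the width.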

orfac (global).
Specifies which eigenvectors should be reorthogonalized. Eigenvectors that
correspond to eigenvalues which are within tol=orfac*norm(A) of each
other are to be reorthogonalized. However, if the workspace is insufficient
(see lwork), tol may be decreased until all eigenvectors to be
reorthogonalized can be stored in one process. No reorthogonalization will
be done if orfac equals zero. A default value of 1.0E-3 is used if orfac is
negative. orfac should be identical on all processes.
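The orfac rule can be illustrated with a small single-process sketch. It assumes the eigenvalues are already sorted in ascending order (as they are in w) and treats a cluster as a run of consecutive eigenvalues whose neighboring gaps are at most tol = orfac*norm(A). The function names are hypothetical, and ScaLAPACK's internal clustering may differ in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: tolerance used to decide which eigenvectors to
   reorthogonalize. A negative orfac selects the documented default 1.0E-3. */
static double reorth_tolerance(double orfac, double norm_a)
{
    if (orfac < 0.0)
        orfac = 1.0e-3;
    return orfac * norm_a;
}

/* Hypothetical helper: size of the largest run of consecutive eigenvalues in
   the ascending array w[0..m-1] whose neighboring gaps are within tol.
   Returns 1 when orfac == 0, since no reorthogonalization is done then. */
static size_t largest_cluster(const double *w, size_t m, double orfac, double norm_a)
{
    assert(w != NULL || m == 0);
    if (m == 0)
        return 0;
    if (orfac == 0.0)
        return 1;
    const double tol = reorth_tolerance(orfac, norm_a);
    size_t best = 1, run = 1;
    for (size_t j = 1; j < m; ++j) {
        run = (w[j] - w[j - 1] <= tol) ? run + 1 : 1; /* extend or restart the run */
        if (run > best)
            best = run;
    }
    return best;
}
```

With the default tolerance, the eigenvalues 1.0, 1.0005 and 1.0009 form one cluster of three, while 2.0 and 3.0 stay isolated.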

iz, jz (global) The row and column indices in the global matrix Z indicating the
first row and the first column of the submatrix Z, respectively.

descz (global and local) array of size dlen_. The array descriptor for the distributed matrix Z. descz[ctxt_ - 1] must equal desca[ctxt_ - 1].


work (local)
Workspace array of size lwork

lwork (local).
The size of the array work.
If only eigenvalues are requested:
lwork ≥ n + max(NB*(np0 + 1), 3)
If eigenvectors are requested:
lwork ≥ n + (np0 + mq0 + NB)*NB
with nq0 = numroc(nn, NB, 0, 0, NPCOL).

For optimal performance, greater workspace is needed, that is

lwork ≥ max(lwork, n, nhetrd_lwopt, nhegst_lwopt)
where lwork is as defined above, and
nhetrd_lwopt = 2*(anb+1)*(4*nps+2) + (nps + 1)*nps;
nhegst_lwopt = 2*np0*nb + nq0*nb + nb*nb
nb = desca[mb_ - 1]
np0 = numroc(n, nb, 0, 0, NPROW)
nq0 = numroc(n, nb, 0, 0, NPCOL)
ictxt = desca[ctxt_ - 1]
anb = pjlaenv(ictxt, 3, 'p?hettrd', 'L', 0, 0, 0, 0)
sqnpc = sqrt(dble(NPROW * NPCOL))
nps = max(numroc(n, 1, 0, 0, sqnpc), 2*anb)
numroc is a ScaLAPACK tool function; pjlaenv is a ScaLAPACK environmental inquiry function. MYROW, MYCOL, NPROW and NPCOL can be determined by calling the function blacs_gridinfo.
If lwork = -1, then lwork is global input and a workspace query is assumed; the function only calculates the size required for optimal performance for all work arrays. Each of these values is returned in the first entry of the corresponding work arrays, and no error message is issued by pxerbla.
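To make the minimal lwork bound concrete, the sketch below re-implements the block-cyclic counting rule of numroc in plain C and evaluates the eigenvalues-only formula. It is an illustration under assumptions: np0 is taken as numroc(nn, nb, 0, 0, NPROW) with nn = max(n, nb, 2), matching the variable definitions given under lrwork, and the function names are hypothetical. In a real program, a workspace query (lwork = -1) is the safer way to size work.

```c
#include <assert.h>

/* Hypothetical re-implementation of ScaLAPACK's numroc: the number of rows
   (or columns) of an n-element dimension, split into blocks of nb, owned by
   process iproc when the first block lives on process isrcproc. */
static int numroc_sketch(int n, int nb, int iproc, int isrcproc, int nprocs)
{
    assert(nb > 0 && nprocs > 0);
    int mydist = (nprocs + iproc - isrcproc) % nprocs; /* distance from source */
    int nblocks = n / nb;                              /* number of full blocks */
    int count = (nblocks / nprocs) * nb;               /* blocks every process gets */
    int extrablks = nblocks % nprocs;                  /* leftover full blocks */
    if (mydist < extrablks)
        count += nb;          /* one more full block */
    else if (mydist == extrablks)
        count += n % nb;      /* the trailing partial block */
    return count;
}

static int max3(int a, int b, int c)
{
    int m = a > b ? a : b;
    return m > c ? m : c;
}

/* Minimal lwork when only eigenvalues are requested (jobz = 'N'):
   lwork >= n + max(nb*(np0 + 1), 3). */
static int min_lwork_eigenvalues_only(int n, int nb, int nprow)
{
    int nn = max3(n, nb, 2);
    int np0 = numroc_sketch(nn, nb, 0, 0, nprow);
    int t = nb * (np0 + 1);
    return n + (t > 3 ? t : 3);
}
```

For n = 10 and nb = 2 on three process rows, the blocks are dealt out as 4, 4 and 2 rows, so np0 = 4 and the minimal lwork is 10 + 2*5 = 20.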

rwork (local)
Workspace array of size lrwork.

lrwork (local) The size of the array rwork.

See below for definitions of variables used to define lrwork.
If no eigenvectors are requested (jobz = 'N'), then lrwork ≥ 5*nn+4*n

If eigenvectors are requested (jobz = 'V'), then the amount of workspace required to guarantee that all eigenvectors are computed is:
lrwork ≥ 4*n + max(5*nn, np0*mq0) + iceil(neig, NPROW*NPCOL)*nn

The computed eigenvectors may not be orthogonal if the minimal
workspace is supplied and orfac is too small. If you want to guarantee
orthogonality (at the cost of potentially poor performance) you should add
the following value to lrwork:
(clustersize-1)*n,
where clustersize is the number of eigenvalues in the largest cluster, where
a cluster is defined as a set of close eigenvalues:
{w[k-1], ..., w[k+clustersize-2] | w[j] ≤ w[j-1] + orfac*2*norm(A)}
Variable definitions:
neig = number of eigenvectors requested;
nb = desca[mb_ - 1] = desca[nb_ - 1] = descz[mb_ - 1] =
descz[nb_ - 1];
nn = max(n, nb, 2);
desca[rsrc_ - 1] = desca[csrc_ - 1] = descz[rsrc_ - 1] = descz[csrc_ - 1] = 0;
np0 = numroc(nn, nb, 0, 0, NPROW);
mq0 = numroc(max(neig, nb, 2), nb, 0, 0, NPCOL);
iceil(x, y) is a ScaLAPACK function returning ceiling(x/y).
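The lrwork bound for jobz = 'V', including the optional (clustersize-1)*n term that guarantees orthogonality, can be evaluated the same way. The sketch below is self-contained, so it carries its own hypothetical numroc re-implementation; it assumes the descriptor conditions listed above (square nb-by-nb blocks, source process 0) and that the caller already knows neig and clustersize.

```c
#include <assert.h>

/* Hypothetical re-implementation of ScaLAPACK's numroc block-cyclic count. */
static int numroc_sketch(int n, int nb, int iproc, int isrcproc, int nprocs)
{
    assert(nb > 0 && nprocs > 0);
    int mydist = (nprocs + iproc - isrcproc) % nprocs;
    int nblocks = n / nb;
    int count = (nblocks / nprocs) * nb;
    int extrablks = nblocks % nprocs;
    if (mydist < extrablks)
        count += nb;
    else if (mydist == extrablks)
        count += n % nb;
    return count;
}

static int imax(int a, int b) { return a > b ? a : b; }

/* iceil(x, y) = ceiling(x / y) for positive integers. */
static int iceil_sketch(int x, int y) { return (x + y - 1) / y; }

/* Minimal lrwork with jobz = 'V':
   4*n + max(5*nn, np0*mq0) + iceil(neig, nprow*npcol)*nn,
   plus (clustersize - 1)*n when orthogonality must be guaranteed. */
static int min_lrwork_vectors(int n, int nb, int neig, int nprow, int npcol,
                              int clustersize, int guarantee_orthogonality)
{
    int nn  = imax(imax(n, nb), 2);
    int np0 = numroc_sketch(nn, nb, 0, 0, nprow);
    int mq0 = numroc_sketch(imax(imax(neig, nb), 2), nb, 0, 0, npcol);
    int lrwork = 4 * n + imax(5 * nn, np0 * mq0)
               + iceil_sketch(neig, nprow * npcol) * nn;
    if (guarantee_orthogonality && clustersize > 1)
        lrwork += (clustersize - 1) * n;
    return lrwork;
}
```

For n = neig = 10, nb = 2 on a 2-by-2 grid, np0 = mq0 = 6, so the bound is 40 + 50 + 3*10 = 120; a largest cluster of 3 eigenvalues adds 2*10 = 20 more if orthogonality must be guaranteed.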
When lrwork is too small:
If lrwork is too small to guarantee orthogonality, p?hegvx attempts to maintain orthogonality in the clusters with the smallest spacing between the eigenvalues.
If lrwork is too small to compute all the eigenvectors requested, no computation is performed and info = -25 is returned. Note that when range = 'V', p?hegvx does not know how many eigenvectors are requested until the eigenvalues are computed. Therefore, when range = 'V' and as long as lrwork is large enough to allow p?hegvx to compute the eigenvalues, p?hegvx computes the eigenvalues and as many eigenvectors as it can.
Relationship between workspace, orthogonality & performance:
If clustersize > n/sqrt(NPROW*NPCOL), then providing enough space
to compute all the eigenvectors orthogonally will cause serious degradation
in performance. In the limit (that is, clustersize = n-1) p?stein will
perform no better than ?stein on 1 processor.

For clustersize = n/sqrt(NPROW*NPCOL), reorthogonalizing all eigenvectors will increase the total execution time by a factor of 2 or more.


If lrwork = -1, then lrwork is global input and a workspace query is assumed; see lwork.

iwork (local) Workspace array.

liwork (local) The size of iwork.

liwork ≥ 6*nnp
where nnp = max(n, NPROW*NPCOL + 1, 4)

If liwork = -1, then liwork is global input and a workspace query is assumed; see lwork.

Output Parameters

a On exit, if jobz = 'V', then if info = 0, sub(A) contains the distributed matrix Z of eigenvectors.
The eigenvectors are normalized as follows:
If ibtype = 1 or 2, then Z^H*sub(B)*Z = I;

If ibtype = 3, then Z^H*inv(sub(B))*Z = I.

If jobz = 'N', then on exit the upper triangle (if uplo='U') or the lower
triangle (if uplo='L') of sub(A), including the diagonal, is destroyed.

b On exit, if info ≤ n, the part of sub(B) containing the matrix is overwritten by the triangular factor U or L from the Cholesky factorization sub(B) = U^H*U, or sub(B) = L*L^H.

m (global) The total number of eigenvalues found, 0 ≤ m ≤ n.

nz (global) Total number of eigenvectors computed, 0 ≤ nz ≤ m; this is the number of columns of z that are filled.
If jobz ≠ 'V', nz is not referenced.

If jobz = 'V', nz = m unless the user supplies insufficient space and p?hegvx is not able to detect this before beginning computation. To get all the eigenvectors requested, the user must supply both sufficient space to hold the eigenvectors in z (m ≤ descz[n_ - 1]) and sufficient workspace to compute them (see lwork and lrwork above). The function p?hegvx is always able to detect insufficient space without computation unless range = 'V'.

w (global)
Array of size n. On normal exit, the first m entries contain the selected
eigenvalues in ascending order.

z (local).
global size n*n, local size lld_z*LOCc(jz+n-1).

If jobz = 'V', then on normal exit the first m columns of z contain the
orthonormal eigenvectors of the matrix corresponding to the selected
eigenvalues. If an eigenvector fails to converge, then that column of z
contains the latest approximation to the eigenvector, and the index of the
eigenvector is returned in ifail.
If jobz = 'N', then z is not referenced.

work On exit, work[0] returns the optimal amount of workspace.

rwork On exit, rwork[0] contains the amount of workspace required for optimal efficiency.
If jobz = 'N', rwork[0] is the optimal amount of workspace required to compute eigenvalues efficiently.
If jobz = 'V', rwork[0] is the optimal amount of workspace required to compute eigenvalues and eigenvectors efficiently with no guarantee on orthogonality.
If range = 'V', it is assumed that all eigenvectors may be required when computing optimal workspace.

ifail (global)
Array of size n.
ifail provides additional information when info ≠ 0.

If mod(info/16,2) ≠ 0, then ifail[0] indicates the order of the smallest minor which is not positive definite.
If mod(info,2) ≠ 0 on exit, then ifail contains the indices of the eigenvectors that failed to converge.
If neither of the above error conditions holds, and jobz = 'V', then the first m elements of ifail are set to zero.

iclustr (global)
Array of size (2*NPROW*NPCOL). This array contains indices of eigenvectors corresponding to a cluster of eigenvalues that could not be reorthogonalized due to insufficient workspace (see lwork, orfac and info). Eigenvectors corresponding to clusters of eigenvalues indexed iclustr(2*i-1) to iclustr(2*i) (1-based positions) could not be reorthogonalized due to lack of workspace. Hence the eigenvectors corresponding to these clusters may not be orthogonal.
iclustr is a zero-terminated array: iclustr(2*k) ≠ 0 and iclustr(2*k+1) = 0 if and only if k is the number of clusters.
iclustr is not referenced if jobz = 'N'.
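Because iclustr is zero-terminated, a caller can walk it pairwise after the call. The sketch below uses C (0-based) positions, so cluster c occupies iclustr[2*c] and iclustr[2*c+1] and a zero start entry ends the list; it assumes MKL_INT is int (the LP64 interface), and the helper name is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: counts the clusters reported in iclustr (length len,
   normally 2*NPROW*NPCOL) and prints their eigenvector index ranges. */
static int report_unorthogonalized_clusters(const int *iclustr, int len)
{
    int k = 0;
    /* Stop at the zero terminator or when no complete pair remains. */
    while (2 * k + 1 < len && iclustr[2 * k] != 0) {
        printf("eigenvectors %d..%d may not be orthogonal\n",
               iclustr[2 * k], iclustr[2 * k + 1]);
        ++k;
    }
    return k;
}
```

The returned count is the k of the 1-based statement above, since the entry following the last pair is the zero terminator.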

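gap (global)
Array of size NPROW*NPCOL. This array contains the gap between eigenvalues whose eigenvectors could not be reorthogonalized. The output values in this array correspond to the clusters indicated by the array iclustr.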


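info (global)
If info = 0, the execution is successful.

If info < 0: if the i-th argument is an array and the j-th entry had an illegal value, then info = -(i*100+j); if the i-th argument is a scalar and had an illegal value, then info = -i.

If info > 0:

If mod(info,2) ≠ 0, then one or more eigenvectors failed to converge.

If mod(info/2,2) ≠ 0, then eigenvectors corresponding to one or more clusters of eigenvalues could not be reorthogonalized because of insufficient workspace. The indices of the clusters are stored in the array iclustr.

If mod(info/4,2) ≠ 0, then space limit prevented p?hegvx from computing all of the eigenvectors between vl and vu. The number of eigenvectors computed is returned in nz.

If mod(info/8,2) ≠ 0, then p?stebz failed to compute eigenvalues.

If mod(info/16,2) ≠ 0, then B was not positive definite. ifail[0] indicates the order of the smallest minor which is not positive definite.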
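The info checks can be decoded mechanically. The sketch below is a hypothetical helper (not an MKL function) that splits an info value into the conditions that use the bit values 1, 8 and 16 from this routine's description and the -(i*100+j) convention for illegal arguments; int stands in for MKL_INT, which is int under the LP64 interface.

```c
/* Hypothetical decoded form of the info value returned by p?hegvx. */
struct hegvx_status {
    int bad_argument;    /* position of the illegal argument, or 0 */
    int bad_entry;       /* illegal entry within an array argument, or 0 */
    int eigvec_not_conv; /* mod(info,2) != 0 */
    int stebz_failed;    /* mod(info/8,2) != 0 */
    int b_not_posdef;    /* mod(info/16,2) != 0 */
};

static struct hegvx_status decode_hegvx_info(int info)
{
    struct hegvx_status s = {0, 0, 0, 0, 0};
    if (info < 0) {
        int code = -info;
        if (code >= 100) {           /* array argument: -(i*100 + j) */
            s.bad_argument = code / 100;
            s.bad_entry = code % 100;
        } else {                     /* scalar argument: -i */
            s.bad_argument = code;
        }
    } else if (info > 0) {
        s.eigvec_not_conv = (info % 2) != 0;
        s.stebz_failed = ((info / 8) % 2) != 0;
        s.b_not_posdef = ((info / 16) % 2) != 0;
    }
    return s;
}
```

For example, info = 17 sets both the "eigenvectors failed to converge" and the "B not positive definite" conditions, and info = -905 points at entry 5 of argument 9 (desca).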

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

ScaLAPACK Auxiliary Routines

Routine Name Data Types Description

p?lacgv c,z Conjugates a complex vector.

p?max1 c,z Finds the index of the element whose real part has maximum absolute value (similar to the Level 1 PBLAS p?amax, but using the absolute value of the real part).

pmpcol s,d Finds the collaborators of a process.

pmpim2 s,d Computes the eigenpair range assignments for all processes.

?combamax1 c,z Finds the element with maximum real part absolute value and its corresponding global index.

p?sum1 sc,dz Forms the 1-norm of a complex vector similar to Level 1 PBLAS p?asum, but using the true absolute value.

p?dbtrsv s,d,c,z Solves a banded triangular system of linear equations using the factorization computed by p?dbtrf. The routine is called by p?dbtrs.

p?dttrsv s,d,c,z Solves a tridiagonal triangular system of linear equations using the factorization computed by p?dttrf. The routine is called by p?dttrs.

p?gebal s,d Balances a general real/complex matrix.


p?gebd2 s,d,c,z Reduces a general rectangular matrix to real bidiagonal form by an orthogonal/unitary transformation (unblocked algorithm).

p?gehd2 s,d,c,z Reduces a general matrix to upper Hessenberg form by an orthogonal/unitary similarity transformation (unblocked algorithm).

p?gelq2 s,d,c,z Computes an LQ factorization of a general rectangular matrix (unblocked algorithm).

p?geql2 s,d,c,z Computes a QL factorization of a general rectangular matrix (unblocked algorithm).

p?geqr2 s,d,c,z Computes a QR factorization of a general rectangular matrix (unblocked algorithm).

p?gerq2 s,d,c,z Computes an RQ factorization of a general rectangular matrix (unblocked algorithm).

p?getf2 s,d,c,z Computes an LU factorization of a general matrix, using partial pivoting with row interchanges (local unblocked algorithm).

p?labrd s,d,c,z Reduces the first nb rows and columns of a general rectangular matrix A to real bidiagonal form by an orthogonal/unitary transformation, and returns auxiliary matrices that are needed to apply the transformation to the unreduced part of A.

p?lacon s,d,c,z Estimates the 1-norm of a square matrix, using reverse communication for evaluating matrix-vector products.

p?laconsb s,d Looks for two consecutive small subdiagonal elements.

p?lacp2 s,d,c,z Copies all or part of a distributed matrix to another distributed matrix.

p?lacp3 s,d Copies from a global parallel array into a local replicated array or vice versa.

p?lacpy s,d,c,z Copies all or part of one two-dimensional array to another.

p?laevswp s,d,c,z Moves the eigenvectors from where they are computed to the ScaLAPACK standard block-cyclic array.

p?lahrd s,d,c,z Reduces the first nb columns of a general rectangular matrix A so that elements below the kth subdiagonal are zero, by an orthogonal/unitary transformation, and returns auxiliary matrices that are needed to apply the transformation to the unreduced part of A.

p?laiect s,d,c,z Exploits IEEE arithmetic to accelerate the computations of eigenvalues.

p?lamve s,d Copies all or part of one two-dimensional distributed array to another.

p?lange s,d,c,z Returns the value of the 1-norm, Frobenius norm, infinity-norm, or the largest absolute value of any element, of a general rectangular matrix.

p?lanhs s,d,c,z Returns the value of the 1-norm, Frobenius norm, infinity-norm, or the largest absolute value of any element, of an upper Hessenberg matrix.


p?lansy, p?lanhe s,d,c,z / c,z Returns the value of the 1-norm, Frobenius norm, infinity-norm, or the largest absolute value of any element of a real symmetric or complex Hermitian matrix.

p?lantr s,d,c,z Returns the value of the 1-norm, Frobenius norm, infinity-norm, or the largest absolute value of any element, of a triangular matrix.

p?lapiv s,d,c,z Applies a permutation matrix to a general distributed matrix, resulting in row or column pivoting.

p?laqge s,d,c,z Scales a general rectangular matrix, using row and column scaling factors computed by p?geequ.

p?laqr0 s,d Computes the eigenvalues of a Hessenberg matrix and optionally returns the matrices from the Schur decomposition.

p?laqr1 s,d Sets a scalar multiple of the first column of the product of a 2-by-2 or 3-by-3 matrix and specified shifts.

p?laqr2 s,d Performs the orthogonal/unitary similarity transformation of a Hessenberg matrix to detect and deflate fully converged eigenvalues from a trailing principal submatrix (aggressive early deflation).

p?laqr3 s,d Performs the orthogonal/unitary similarity transformation of a Hessenberg matrix to detect and deflate fully converged eigenvalues from a trailing principal submatrix (aggressive early deflation).

p?laqr5 s,d Performs a single small-bulge multi-shift QR sweep.

p?laqsy s,d,c,z Scales a symmetric/Hermitian matrix, using scaling factors computed by p?poequ.

p?lared1d s,d Redistributes an array assuming that the input array bycol is distributed across rows and that all process columns contain the same copy of bycol.

p?lared2d s,d Redistributes an array assuming that the input array byrow is distributed across columns and that all process rows contain the same copy of byrow.

p?larf s,d,c,z Applies an elementary reflector to a general rectangular matrix.

p?larfb s,d,c,z Applies a block reflector or its transpose/conjugate-transpose to a general rectangular matrix.

p?larfc c,z Applies the conjugate transpose of an elementary reflector to a general matrix.

p?larfg s,d,c,z Generates an elementary reflector (Householder matrix).

p?larft s,d,c,z Forms the triangular factor T of a block reflector H = I - V*T*V^H.

p?larz s,d,c,z Applies an elementary reflector as returned by p?tzrzf to a general matrix.

p?larzb s,d,c,z Applies a block reflector or its transpose/conjugate-transpose as returned by p?tzrzf to a general matrix.


p?larzc c,z Applies (multiplies by) the conjugate transpose of an elementary reflector as returned by p?tzrzf to a general matrix.

p?larzt s,d,c,z Forms the triangular factor T of a block reflector H = I - V*T*V^H as returned by p?tzrzf.

p?lascl s,d,c,z Multiplies a general rectangular matrix by a real scalar defined as cto/cfrom.

p?laset s,d,c,z Initializes the off-diagonal elements of a matrix to α and the diagonal elements to β.

p?lasmsub s,d Looks for a small subdiagonal element from the bottom of the matrix that it can safely set to zero.

p?lassq s,d,c,z Updates a sum of squares represented in scaled form.

p?laswp s,d,c,z Performs a series of row interchanges on a general rectangular matrix.

p?latra s,d,c,z Computes the trace of a general square distributed matrix.

p?latrd s,d,c,z Reduces the first nb rows and columns of a symmetric/Hermitian matrix A to real tridiagonal form by an orthogonal/unitary similarity transformation.

p?latrz s,d,c,z Reduces an upper trapezoidal matrix to upper triangular form by means of orthogonal/unitary transformations.

p?lauu2 s,d,c,z Computes the product U*U^H or L^H*L, where U and L are upper or lower triangular matrices (local unblocked algorithm).

p?lauum s,d,c,z Computes the product U*U^H or L^H*L, where U and L are upper or lower triangular matrices.

p?lawil s,d Forms the Wilkinson transform.

p?org2l/p?ung2l s,d,c,z Generates all or part of the orthogonal/unitary matrix Q from a QL factorization determined by p?geqlf (unblocked algorithm).

p?org2r/p?ung2r s,d,c,z Generates all or part of the orthogonal/unitary matrix Q from a QR factorization determined by p?geqrf (unblocked algorithm).

p?orgl2/p?ungl2 s,d,c,z Generates all or part of the orthogonal/unitary matrix Q from an LQ factorization determined by p?gelqf (unblocked algorithm).

p?orgr2/p?ungr2 s,d,c,z Generates all or part of the orthogonal/unitary matrix Q from an RQ factorization determined by p?gerqf (unblocked algorithm).

p?orm2l/p?unm2l s,d,c,z Multiplies a general matrix by the orthogonal/unitary matrix from a QL factorization determined by p?geqlf (unblocked algorithm).

p?orm2r/p?unm2r s,d,c,z Multiplies a general matrix by the orthogonal/unitary matrix from a QR factorization determined by p?geqrf (unblocked algorithm).

p?orml2/p?unml2 s,d,c,z Multiplies a general matrix by the orthogonal/unitary matrix from an LQ factorization determined by p?gelqf (unblocked algorithm).

p?ormr2/p?unmr2 s,d,c,z Multiplies a general matrix by the orthogonal/unitary matrix from an RQ factorization determined by p?gerqf (unblocked algorithm).


p?pbtrsv s,d,c,z Solves a single triangular linear system via frontsolve or backsolve where the triangular matrix is a factor of a banded matrix computed by p?pbtrf.

p?pttrsv s,d,c,z Solves a single triangular linear system via frontsolve or backsolve where the triangular matrix is a factor of a tridiagonal matrix computed by p?pttrf.

p?potf2 s,d,c,z Computes the Cholesky factorization of a symmetric/Hermitian positive definite matrix (local unblocked algorithm).

p?rot s,d Applies a planar rotation to two distributed vectors.

p?rscl s,d,cs,zd Multiplies a vector by the reciprocal of a real scalar.

p?sygs2/p?hegs2 s,d,c,z Reduces a symmetric/Hermitian positive-definite generalized eigenproblem to standard form, using the factorization results obtained from p?potrf (local unblocked algorithm).

p?sytd2/p?hetd2 s,d,c,z Reduces a symmetric/Hermitian matrix to real symmetric tridiagonal form by an orthogonal/unitary similarity transformation (local unblocked algorithm).

p?trord s,d Reorders the Schur factorization of a general matrix.

p?trsen s,d Reorders the Schur factorization of a matrix and (optionally) computes the reciprocal condition numbers and invariant subspace for the selected cluster of eigenvalues.

p?trti2 s,d,c,z Computes the inverse of a triangular matrix (local unblocked algorithm).

?lamsh s,d Sends multiple shifts through a small (single node) matrix to maximize the number of bulges that can be sent through.

?laqr6 s,d Performs a single small-bulge multi-shift QR sweep collecting the transformations.

?lar1va s,d Computes the scaled eigenvector corresponding to a given eigenvalue.

?laref s,d Applies Householder reflectors to matrices on either their rows or columns.

?larrb2 s,d Provides limited bisection to locate eigenvalues for more accuracy.

?larrd2 s,d Computes the eigenvalues of a symmetric tridiagonal matrix to suitable accuracy.

?larre2 s,d Given a tridiagonal matrix, sets small off-diagonal elements to zero and for each unreduced block, finds base representations and eigenvalues.

?larre2a s,d Given a tridiagonal matrix, sets small off-diagonal elements to zero and for each unreduced block, finds base representations and eigenvalues.

?larrf2 s,d Finds a new relatively robust representation such that at least one of the eigenvalues is relatively isolated.

?larrv2 s,d Computes the eigenvectors of the tridiagonal matrix T = L*D*L^T given L, D and the eigenvalues of L*D*L^T.


?lasorte s,d Sorts eigenpairs so that real eigenpairs are grouped together and complex eigenpairs are grouped together.

?lasrt2 s,d Sorts numbers in increasing or decreasing order.

?stegr2 s,d Computes selected eigenvalues and eigenvectors of a real symmetric tridiagonal matrix.

?stegr2a s,d Computes selected eigenvalues and initial representations needed for eigenvector computations.

?stegr2b s,d From eigenvalues and initial representations, computes the selected eigenvalues and eigenvectors of the real symmetric tridiagonal matrix in parallel on multiple processors.

?stein2 s,d Computes the eigenvectors corresponding to specified eigenvalues of a real symmetric tridiagonal matrix, using inverse iteration.

?dbtf2 s,d,c,z Computes an LU factorization of a general band matrix with no pivoting (local unblocked algorithm).

?dbtrf s,d,c,z Computes an LU factorization of a general band matrix with no pivoting (local blocked algorithm).

?dttrf s,d,c,z Computes an LU factorization of a general tridiagonal matrix with no pivoting (local blocked algorithm).

?dttrsv s,d,c,z Solves a general tridiagonal system of linear equations using the LU factorization computed by ?dttrf.

?pttrsv s,d,c,z Solves a symmetric (Hermitian) positive-definite tridiagonal system of linear equations, using the L*D*L^H factorization computed by ?pttrf.

?steqr2 s,d Computes all eigenvalues and, optionally, eigenvectors of a symmetric tridiagonal matrix using the implicit QL or QR method.

?trmvt s,d,c,z Performs matrix-vector operations.

pilaenv NA Returns the positive integer value of the logical blocking size.

pilaenvx NA Called from the ScaLAPACK routines to choose problem-dependent parameters for the local environment.

pjlaenv NA Called from the ScaLAPACK symmetric and Hermitian tailored eigen-routines to choose problem-dependent parameters for the local environment.


p?lacgv
Conjugates a complex vector.


Syntax
void pclacgv (MKL_INT *n , MKL_Complex8 *x , MKL_INT *ix , MKL_INT *jx , MKL_INT
*descx , MKL_INT *incx );
void pzlacgv (MKL_INT *n , MKL_Complex16 *x , MKL_INT *ix , MKL_INT *jx , MKL_INT
*descx , MKL_INT *incx );

Include Files
• mkl_scalapack.h

Description
The p?lacgv function conjugates a complex vector sub(X) of length n, where sub(X) denotes X(ix, jx:jx+n-1) if incx = m_x, and X(ix:ix+n-1, jx) if incx = 1.

Input Parameters

n (global) The length of the distributed vector sub(X).

x (local).
Pointer into the local memory to an array of size lld_x * LOCc(n_x). On entry, the vector to be conjugated: x[i] = X(ix+(jx-1)*m_x+i*incx), 0 ≤ i < n.

ix (global) The row index in the global matrix X indicating the first row of
sub(X).

jx (global) The column index in the global matrix X indicating the first column
of sub(X).

descx (global and local) Array of size dlen_=9. The array descriptor for the
distributed matrix X.

incx (global) The global increment for the elements of X. Only two values of
incx are supported in this version, namely 1 and m_x. incx must not be
zero.

Output Parameters

x (local).
On exit, the local pieces of conjugated distributed vector sub(X).
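On a single process, the addressing rule x[i] = X(ix+(jx-1)*m_x+i*incx) reduces to ordinary column-major indexing. The sketch below applies it, converted to 0-based C offsets, to a small local matrix. It is not the distributed p?lacgv: it uses a hypothetical struct mirroring the default layout of MKL_Complex16 (real and imag members), and, like the routine, it accepts only incx = 1 or incx = m_x.

```c
#include <assert.h>

typedef struct { double real, imag; } complex16_sketch; /* mirrors MKL_Complex16 layout */

/* Conjugates sub(X) in a local column-major m_x-by-n_x matrix X.
   ix and jx are 1-based as in the reference; incx = 1 walks down column jx,
   incx = m_x walks along row ix. */
static void lacgv_local(int n, complex16_sketch *X, int m_x, int ix, int jx, int incx)
{
    assert(incx == 1 || incx == m_x);
    int start = (ix - 1) + (jx - 1) * m_x; /* 0-based offset of X(ix, jx) */
    for (int i = 0; i < n; ++i)
        X[start + i * incx].imag = -X[start + i * incx].imag;
}
```

Conjugating row 2 of a 3-by-3 matrix (ix = 2, jx = 1, incx = m_x = 3) touches offsets 1, 4 and 7 and leaves every other element unchanged.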

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?max1
Finds the index of the element whose real part has
maximum absolute value (similar to the Level 1 PBLAS
p?amax, but using the absolute value of the real part).

Syntax
void pcmax1 (MKL_INT *n , MKL_Complex8 *amax , MKL_INT *indx , MKL_Complex8 *x ,
MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , MKL_INT *incx );
void pzmax1 (MKL_INT *n , MKL_Complex16 *amax , MKL_INT *indx , MKL_Complex16 *x ,
MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , MKL_INT *incx );

Include Files
• mkl_scalapack.h

Description
The p?max1 function computes the global index of the element of a distributed vector sub(X) whose real part has maximum absolute value. The global index is returned in indx and the value is returned in amax, where sub(X) denotes X(ix:ix+n-1, jx) if incx = 1, and X(ix, jx:jx+n-1) if incx = m_x.

Input Parameters

n (global). The number of components of the distributed vector sub(X). n ≥ 0.

x (local)
Pointer into the local memory to an array of size lld_x * LOCc(jx+n-1). On
entry this array contains the local pieces of the distributed vector sub(X).

ix (global) The row index in the global matrix X indicating the first row of
sub(X).

jx (global) The column index in the global matrix X indicating the first column
of sub(X).

descx (global and local) Array of size dlen_. The array descriptor for the
distributed matrix X.

incx (global). The global increment for the elements of X. Only two values of incx are supported in this version, namely 1 and m_x. incx must not be zero.

Output Parameters

amax (global output). The absolute value of the largest entry of the distributed vector sub(X) only in the scope of sub(X).

indx (global output). The global index of the element of the distributed vector sub(X) whose real part has maximum absolute value.
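p?max1 differs from p?amax only in its selection key. The local sketch below (single process, no descriptor) returns the 1-based position of the first element with the largest |Re(x)|; the struct mirrors the default MKL_Complex8 layout, and both names are hypothetical.

```c
#include <assert.h>
#include <math.h>

typedef struct { float real, imag; } complex8_sketch; /* mirrors MKL_Complex8 layout */

/* Returns the 1-based position of the first element of x (positive stride incx,
   n elements) whose real part has the largest absolute value, or 0 if n <= 0. */
static int max1_local(int n, const complex8_sketch *x, int incx)
{
    assert(incx > 0);
    if (n <= 0)
        return 0;
    int best = 0;
    float best_val = fabsf(x[0].real);
    for (int i = 1; i < n; ++i) {
        float v = fabsf(x[i * incx].real); /* only the real part is compared */
        if (v > best_val) {
            best_val = v;
            best = i;
        }
    }
    return best + 1;
}
```

For x = (1+5i, -3+0i, 2+10i), this picks position 2 because |-3| is the largest real part, even though 2+10i has the largest modulus.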

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

pilaver
Returns the ScaLAPACK version.

Syntax
void pilaver (MKL_INT* vers_major, MKL_INT* vers_minor, MKL_INT* vers_patch);

Include Files
• mkl_scalapack.h

Description
This function returns the ScaLAPACK version.

Output Parameters

vers_major Returns the ScaLAPACK major version.


vers_minor Returns the ScaLAPACK minor version.

vers_patch Returns the ScaLAPACK patch version.

pmpcol
Finds the collaborators of a process.

Syntax
void pmpcol(MKL_INT* myproc, MKL_INT* nprocs, MKL_INT* iil, MKL_INT* needil, MKL_INT*
neediu, MKL_INT* pmyils, MKL_INT* pmyius, MKL_INT* colbrt, MKL_INT* frstcl, MKL_INT*
lastcl);

Include Files
• mkl_scalapack.h

Description
Using the output from pmpim2 and given the information on eigenvalue clusters, pmpcol finds the
collaborators of myproc.


Input Parameters

myproc The processor number, 0 ≤ myproc < nprocs.

nprocs The total number of processors available.

iil The index of the leftmost eigenvalue in the eigenvalue cluster.

needil The leftmost position in the eigenvalue cluster needed by myproc.

neediu The rightmost position in the eigenvalue cluster needed by myproc.

pmyils array
For each processor p, 0 < p ≤ nprocs, pmyils[p-1] is the index of the first
eigenvalue in the eigenvalue cluster to be computed.
pmyils[p-1] equals zero if p stays idle.

pmyius array
For each processor p, pmyius[p-1] is the index of the last eigenvalue in the
eigenvalue cluster to be computed.
pmyius[p-1] equals zero if p stays idle.

Output Parameters

colbrt Non-zero if myproc collaborates.

frstcl, lastcl First and last collaborator of myproc.

myproc collaborates with:
frstcl, ..., myproc-1, myproc+1, ..., lastcl
If myproc = frstcl, there are no collaborators on the left. If myproc = lastcl, there are no collaborators on the right.
If frstcl = 0 and lastcl = nprocs-1, then myproc collaborates with everybody.
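The output convention (a contiguous range frstcl..lastcl around myproc, plus a colbrt flag) can be illustrated with a hypothetical sketch. It is not the pmpcol algorithm: it simply treats neighboring processors whose assigned eigenvalue ranges overlap [needil, neediu] as collaborators, scanning outward until the first non-overlapping processor, and it ignores iil. Processors are numbered 0..nprocs-1, so pmyils[p]/pmyius[p] here correspond to pmyils[p-1]/pmyius[p-1] for the 1-based p of the reference; an idle processor has 0 in both.

```c
#include <assert.h>

/* Nonzero if a non-idle range [ils, ius] overlaps [lo, hi]. */
static int range_overlaps(int ils, int ius, int lo, int hi)
{
    return ils != 0 && ils <= hi && ius >= lo;
}

/* Hypothetical collaborator search in the pmpcol output format. */
static void find_collaborators(int myproc, int nprocs, int needil, int neediu,
                               const int *pmyils, const int *pmyius,
                               int *colbrt, int *frstcl, int *lastcl)
{
    assert(0 <= myproc && myproc < nprocs);
    *frstcl = myproc;
    *lastcl = myproc;
    /* Walk left, then right, while neighbors share needed eigenvalues. */
    for (int p = myproc - 1; p >= 0 && range_overlaps(pmyils[p], pmyius[p], needil, neediu); --p)
        *frstcl = p;
    for (int p = myproc + 1; p < nprocs && range_overlaps(pmyils[p], pmyius[p], needil, neediu); ++p)
        *lastcl = p;
    *colbrt = (*frstcl < myproc) || (*lastcl > myproc);
}
```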

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

pmpim2
Computes the eigenpair range assignments for all
processes.

Syntax
void pmpim2(MKL_INT* il, MKL_INT* iu, MKL_INT* nprocs, MKL_INT* pmyils, MKL_INT*
pmyius);

Include Files
• mkl_scalapack.h

Description
pmpim2 is the scheduling function. It computes the eigenpair range assignments for all processors.


Input Parameters

il, iu The range of eigenpairs to be computed.

nprocs The total number of processors available.

Output Parameters

pmyils array
For each processor p, pmyils[p-1] is the index of the first eigenvalue
in a cluster to be computed.
pmyils[p-1] equals zero if p stays idle.

pmyius array
For each processor p, pmyius[p-1] is the index of the last eigenvalue
in a cluster to be computed.
pmyius[p-1] equals zero if p stays idle.
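The scheduling policy of pmpim2 is not described here. As a hedged illustration of what an eigenpair range assignment can look like, the sketch below splits il..iu into contiguous blocks whose sizes differ by at most one and marks surplus processors as idle with zeros, following the output convention above. The function name assign_ranges is hypothetical, ranks are 0-based (rank r corresponds to processor p = r+1), and int stands in for MKL_INT.

```c
/* Hypothetical block-partition scheduler; not necessarily the policy
 * used by pmpim2. Rank r receives eigenpairs pmyils[r]..pmyius[r]
 * (1-based eigenpair indices); 0..0 marks an idle rank. */
static void assign_ranges(int il, int iu, int nprocs, int *pmyils, int *pmyius)
{
    int numeig = iu - il + 1;
    int base = numeig / nprocs;   /* eigenpairs every rank gets */
    int extra = numeig % nprocs;  /* the first `extra` ranks get one more */
    int next = il;                /* next unassigned eigenpair */
    for (int r = 0; r < nprocs; ++r) {
        int count = base + (r < extra ? 1 : 0);
        if (count == 0) {
            pmyils[r] = 0;        /* idle: fewer eigenpairs than ranks */
            pmyius[r] = 0;
        } else {
            pmyils[r] = next;
            pmyius[r] = next + count - 1;
            next += count;
        }
    }
}
```

For il = 1, iu = 10, and nprocs = 4, this sketch yields the ranges 1..3, 4..6, 7..8, and 9..10.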

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.


?combamax1
Finds the element with maximum real part absolute
value and its corresponding global index.

Syntax
void ccombamax1 (MKL_Complex8 *v1 , MKL_Complex8 *v2 );
void zcombamax1 (MKL_Complex16 *v1 , MKL_Complex16 *v2 );

Include Files
• mkl_scalapack.h

Description
The ?combamax1 function finds the element with the maximum absolute value of its real part, together with
that element's global index.

Input Parameters

v1 (local)
Array of size 2. The first maximum absolute value element and its global
index. v1[0]=amax, v1[1]=indx.

v2 (local)
Array of size 2. The second maximum absolute value element and its global
index. v2[0]=amax, v2[1]=indx.

Output Parameters

v1 (local).
The first maximum absolute value element and its global index.
v1[0]=amax, v1[1]=indx.
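The description suggests a pairwise combine step, presumably of the kind used in a global max-reduction: keep whichever (value, index) pair has the larger absolute real part. Below is a minimal sketch of that rule for the double-complex case. It assumes the global index is carried in the real part of v[1] and that ties keep v1 (neither detail is stated above), and it uses C99 double complex in place of MKL_Complex16; combamax1_sketch is a hypothetical name.

```c
#include <complex.h>
#include <math.h>

/* Sketch of the ?combamax1 combine rule: v[0] is a candidate value and
 * v[1] holds its global index (in the real part). On return, v1 holds
 * the candidate whose value has the larger |Re|; on a tie v1 is kept. */
static void combamax1_sketch(double complex v1[2], const double complex v2[2])
{
    if (fabs(creal(v1[0])) < fabs(creal(v2[0]))) {
        v1[0] = v2[0];
        v1[1] = v2[1];
    }
}
```

Note that only the real part takes part in the comparison: a value such as 1 + 9i loses to -2 even though its modulus is larger.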

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?sum1
Forms the 1-norm of a complex vector similar to Level
1 PBLAS p?asum, but using the true absolute value.

Syntax
void pscsum1 (MKL_INT *n , float *asum , MKL_Complex8 *x , MKL_INT *ix , MKL_INT *jx ,
MKL_INT *descx , MKL_INT *incx );
void pdzsum1 (MKL_INT *n , double *asum , MKL_Complex16 *x , MKL_INT *ix , MKL_INT
*jx , MKL_INT *descx , MKL_INT *incx );

Include Files
• mkl_scalapack.h

Description
The p?sum1 function returns in asum the sum of the absolute values of a complex distributed vector sub(x),
where sub(x) denotes X(ix:ix+n-1, jx:jx) if incx = 1, and X(ix:ix, jx:jx+n-1) if incx = m_x.

Based on p?asum from the Level 1 PBLAS. The change is to use the 'genuine' absolute value.

Input Parameters

n (global). The number of components of the distributed vector sub(x). n ≥ 0.

x (local)
Pointer into the local memory to an array of size lld_x * LOCc(jx+n-1). This
array contains the local pieces of the distributed vector sub(X).

ix (global) The row index in the global matrix X indicating the first row of
sub(X).

jx (global) The column index in the global matrix X indicating the first column
of sub(X).

descx (local) Array of size dlen_=9. The array descriptor for the distributed matrix
X.

incx (global) The global increment for the elements of X. Only two values of
incx are supported in this version, namely 1 and m_x.

Output Parameters

asum (local)
The sum of absolute values of the distributed vector sub(X) only in its
scope.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?dbtrsv
Solves a banded triangular system of linear equations
using the factorization computed by p?dbtrf (no pivoting).
The function is called by p?dbtrs.

Syntax
void psdbtrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu ,
MKL_INT *nrhs , float *a , MKL_INT *ja , MKL_INT *desca , float *b , MKL_INT *ib ,
MKL_INT *descb , float *af , MKL_INT *laf , float *work , MKL_INT *lwork , MKL_INT
*info );
void pddbtrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu ,
MKL_INT *nrhs , double *a , MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib ,
MKL_INT *descb , double *af , MKL_INT *laf , double *work , MKL_INT *lwork , MKL_INT
*info );
void pcdbtrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu ,
MKL_INT *nrhs , MKL_Complex8 *a , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b ,
MKL_INT *ib , MKL_INT *descb , MKL_Complex8 *af , MKL_INT *laf , MKL_Complex8 *work ,
MKL_INT *lwork , MKL_INT *info );
void pzdbtrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu ,
MKL_INT *nrhs , MKL_Complex16 *a , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b ,
MKL_INT *ib , MKL_INT *descb , MKL_Complex16 *af , MKL_INT *laf , MKL_Complex16 *work ,
MKL_INT *lwork , MKL_INT *info );


Include Files
• mkl_scalapack.h

Description
The p?dbtrsv function solves a banded triangular system of linear equations

A(1:n, ja:ja+n-1) * X = B(ib:ib+n-1, 1:nrhs)

or

A(1:n, ja:ja+n-1)^T * X = B(ib:ib+n-1, 1:nrhs) (for real flavors); A(1:n, ja:ja+n-1)^H * X = B(ib:ib+n-1,
1:nrhs) (for complex flavors),
where A(1:n, ja:ja+n-1) is a banded triangular matrix factor produced by the Gaussian elimination code of
p?dbtrf and is stored in A(1:n, ja:ja+n-1) and af. The matrix stored in A(1:n, ja:ja+n-1) is either
upper or lower triangular according to uplo, and whether the system is solved with A(1:n, ja:ja+n-1) or
with its (conjugate) transpose is selected by the parameter trans.
The function p?dbtrf must be called first.

Input Parameters

uplo (global)
If uplo='U', the upper triangle of A(1:n, ja:ja+n-1) is stored,

if uplo = 'L', the lower triangle of A(1:n, ja:ja+n-1) is stored.

trans (global)
If trans = 'N', solve with A(1:n, ja:ja+n-1),

if trans = 'C', solve with conjugate transpose A(1:n, ja:ja+n-1).

n (global) The order of the distributed submatrix A (n ≥ 0).

bwl (global) Number of subdiagonals. 0 ≤ bwl ≤ n-1.

bwu (global) Number of superdiagonals. 0 ≤ bwu ≤ n-1.

nrhs (global) The number of right-hand sides; the number of columns of the
distributed submatrix B (nrhs ≥ 0).

a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1),
where lld_a ≥ (bwl+bwu+1). On entry, this array contains the local pieces of
the n-by-n unsymmetric banded distributed Cholesky factor L or L^T,
represented in global A as A(1:n, ja:ja+n-1). This local portion is stored
in the packed banded format used in LAPACK. See the Application Notes
below and the ScaLAPACK manual for more detail on the format of
distributed matrices.

ja (global) The index in the global matrix A that points to the start of the
matrix to be operated on (which may be either all of A or a submatrix of A).

desca (global and local) array of size dlen_.

if 1d type (dtype_a = 501 or 502), dlen_ ≥ 7;
if 2d type (dtype_a = 1), dlen_ ≥ 9. The array descriptor for the distributed
matrix A. Contains information on the mapping of A to memory.

b (local)
Pointer into the local memory to an array of local leading dimension lld_b ≥ nb.
On entry, this array contains the local pieces of the right-hand sides
B(ib:ib+n-1, 1:nrhs).

ib (global) The row index in the global matrix B that points to the first row of
the matrix to be operated on (which may be either all of B or a submatrix of
B).

descb (global and local) array of size dlen_.

if 1d type (dtype_b = 502), dlen_ ≥ 7;

if 2d type (dtype_b = 1), dlen_ ≥ 9. The array descriptor for the distributed
matrix B. Contains information on the mapping of B to memory.

laf (local)
Size of user-input auxiliary fill-in space af.

laf ≥ nb*(bwl+bwu) + 6*max(bwl, bwu)*max(bwl, bwu). If laf is not
large enough, an error code is returned and the minimum acceptable size
will be returned in af[0].

work (local).
Temporary workspace. This space may be overwritten in between function
calls.
work must be the size given in lwork.

lwork (local or global)

Size of user-input workspace work. If lwork is too small, the minimal
acceptable size will be returned in work[0] and an error code is returned.

lwork ≥ max(bwl, bwu)*nrhs.
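The two minimum sizes above can be evaluated before allocating af and work. A small sketch, assuming nb is the block size of the distribution read from the descriptor, and using long instead of MKL_INT; the helper names are hypothetical.

```c
/* Minimum sizes for p?dbtrsv, transcribed from the constraints above. */
static long max_l(long a, long b) { return a > b ? a : b; }

/* laf >= nb*(bwl+bwu) + 6*max(bwl,bwu)*max(bwl,bwu) */
static long dbtrsv_min_laf(long nb, long bwl, long bwu)
{
    long bw = max_l(bwl, bwu);
    return nb * (bwl + bwu) + 6 * bw * bw;
}

/* lwork >= max(bwl,bwu)*nrhs */
static long dbtrsv_min_lwork(long bwl, long bwu, long nrhs)
{
    return max_l(bwl, bwu) * nrhs;
}
```

For nb = 64, bwl = 2, bwu = 3, and nrhs = 5, this gives laf ≥ 374 and lwork ≥ 15.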

Output Parameters

a (local).
This local portion is stored in the packed banded format used in LAPACK.
Please see the ScaLAPACK manual for more detail on the format of
distributed matrices.

b On exit, this contains the local piece of the solution distributed matrix X.

af (local).
Auxiliary fill-in space. The fill-in space is created in a call to the factorization
function p?dbtrf and is stored in af. If a linear system is to be solved
using p?dbtrs after the factorization function, af must not be altered after
the factorization.

work On exit, work[0] contains the minimal lwork.

info (local).
If info = 0, the execution is successful.


< 0: If the i-th argument is an array and the j-th entry, indexed j-1, had an
illegal value, then info= - (i*100+j), if the i-th argument is a scalar and
had an illegal value, then info = -i.
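The info encoding can be unpacked mechanically. A hedged sketch, assuming that the scalar form -i is used only for argument positions below 100 and that the array entry number j stays below 100, so that the two forms cannot collide; decode_info is a hypothetical helper, and long stands in for MKL_INT.

```c
/* Decodes the negative-info convention described above.
 * Returns 0 if info >= 0 (no illegal-argument error),
 *         1 if info = -(i*100 + j): entry j of array argument i,
 *         2 if info = -i:           scalar argument i. */
static int decode_info(long info, long *arg, long *entry)
{
    *arg = 0;
    *entry = 0;
    if (info >= 0)
        return 0;
    long v = -info;
    if (v >= 100) {       /* array argument: v = i*100 + j */
        *arg = v / 100;
        *entry = v % 100;
        return 1;
    }
    *arg = v;             /* scalar argument */
    return 2;
}
```

For example, info = -705 decodes to entry 5 of argument 7, and info = -3 decodes to the third argument.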

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?dttrsv
Solves a tridiagonal triangular system of linear equations
using the factorization computed by p?dttrf.
The function is called by p?dttrs.

Syntax
void psdttrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *nrhs , float *dl , float
*d , float *du , MKL_INT *ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT
*descb , float *af , MKL_INT *laf , float *work , MKL_INT *lwork , MKL_INT *info );
void pddttrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *nrhs , double *dl ,
double *d , double *du , MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib ,
MKL_INT *descb , double *af , MKL_INT *laf , double *work , MKL_INT *lwork , MKL_INT
*info );
void pcdttrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *nrhs , MKL_Complex8
*dl , MKL_Complex8 *d , MKL_Complex8 *du , MKL_INT *ja , MKL_INT *desca , MKL_Complex8
*b , MKL_INT *ib , MKL_INT *descb , MKL_Complex8 *af , MKL_INT *laf , MKL_Complex8
*work , MKL_INT *lwork , MKL_INT *info );
void pzdttrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *nrhs , MKL_Complex16
*dl , MKL_Complex16 *d , MKL_Complex16 *du , MKL_INT *ja , MKL_INT *desca ,
MKL_Complex16 *b , MKL_INT *ib , MKL_INT *descb , MKL_Complex16 *af , MKL_INT *laf ,
MKL_Complex16 *work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?dttrsv function solves a tridiagonal triangular system of linear equations

A(1:n, ja:ja+n-1) * X = B(ib:ib+n-1, 1:nrhs)

or

A(1:n, ja:ja+n-1)^T * X = B(ib:ib+n-1, 1:nrhs) for real flavors; A(1:n, ja:ja+n-1)^H * X =
B(ib:ib+n-1, 1:nrhs) for complex flavors,
where A(1:n, ja:ja+n-1) is a tridiagonal matrix factor produced by the Gaussian elimination code of
p?dttrf and is stored in A(1:n, ja:ja+n-1) and af.
The matrix stored in A(1:n, ja:ja+n-1) is either upper or lower triangular according to uplo, and
whether the system is solved with A(1:n, ja:ja+n-1) or with its (conjugate) transpose is selected by the
parameter trans.
The function p?dttrf must be called first.

Input Parameters

uplo (global)
If uplo='U', the upper triangle of A(1:n, ja:ja+n-1) is stored,

if uplo = 'L', the lower triangle of A(1:n, ja:ja+n-1) is stored.

trans (global)
If trans = 'N', solve with A(1:n, ja:ja+n-1),

if trans = 'C', solve with conjugate transpose A(1:n, ja:ja+n-1).

n (global) The order of the distributed submatrix A (n ≥ 0).

nrhs (global) The number of right-hand sides; the number of columns of the
distributed submatrix B(ib:ib+n-1, 1:nrhs) (nrhs ≥ 0).

dl (local).
Pointer to local part of global vector storing the lower diagonal of the
matrix.
Globally, dl[0] is not referenced, and dl must be aligned with d.

Must be of size ≥nb_a.

d (local).
Pointer to local part of global vector storing the main diagonal of the matrix.

du (local).
Pointer to local part of global vector storing the upper diagonal of the
matrix.
Globally, du[n-1] is not referenced, and du must be aligned with d.

ja (global) The index in the global matrix A that points to the start of the
matrix to be operated on (which may be either all of A or a submatrix of A).

desca (global and local) array of size dlen_.

if 1d type (dtype_a = 501 or 502), dlen_ ≥ 7;

if 2d type (dtype_a = 1), dlen_ ≥ 9.

The array descriptor for the distributed matrix A. Contains information on
the mapping of A to memory.

b (local)
Pointer into the local memory to an array of local leading dimension lld_b ≥ nb.
On entry, this array contains the local pieces of the right-hand sides
B(ib:ib+n-1, 1:nrhs).

ib (global) The row index in the global matrix B that points to the first row of
the matrix to be operated on (which may be either all of B or a submatrix of
B).

descb (global and local) array of size dlen_.

if 1d type (dtype_b = 502), dlen_ ≥ 7;

if 2d type (dtype_b = 1), dlen_ ≥ 9.

The array descriptor for the distributed matrix B. Contains information on
the mapping of B to memory.

laf (local).
Size of user-input auxiliary fill-in space af.

laf ≥ 2*(nb+2). If laf is not large enough, an error code is returned and
the minimum acceptable size will be returned in af[0].

work (local).
Temporary workspace. This space may be overwritten in between function
calls.
work must be the size given in lwork.

lwork (local or global)

Size of user-input workspace work. If lwork is too small, the minimal
acceptable size will be returned in work[0] and an error code is returned.

lwork ≥ 10*npcol+4*nrhs.
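A caller can check its buffers against these minimums before the call instead of relying on the error path. A minimal sketch, assuming nb is the distribution block size and npcol comes from the process grid, and using long instead of MKL_INT; dttrsv_sizes_ok is a hypothetical helper.

```c
/* Checks caller-supplied laf and lwork against the p?dttrsv minimums
 * laf >= 2*(nb+2) and lwork >= 10*npcol + 4*nrhs. Returns 1 if both
 * suffice, 0 otherwise; the required minimums are always stored. */
static int dttrsv_sizes_ok(long nb, long npcol, long nrhs,
                           long laf, long lwork,
                           long *min_laf, long *min_lwork)
{
    *min_laf = 2 * (nb + 2);
    *min_lwork = 10 * npcol + 4 * nrhs;
    return laf >= *min_laf && lwork >= *min_lwork;
}
```

With nb = 64, npcol = 4, and nrhs = 5, the minimums are laf = 132 and lwork = 60.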

Output Parameters

dl (local).
On exit, this array contains information about the factors of the matrix.

d On exit, this array contains information about the factors of the matrix.
Must be of size ≥ nb_a.

b On exit, this contains the local piece of the solution distributed matrix X.

af (local).
Auxiliary fill-in space. The fill-in space is created in a call to the factorization
function p?dttrf and is stored in af. If a linear system is to be solved
using p?dttrs after the factorization function, af must not be altered after
the factorization.

work On exit, work[0] contains the minimal lwork.

info (local).
If info = 0, the execution is successful.

if info < 0: If the i-th argument is an array and the j-th entry,
indexed j-1, had an illegal value, then info = - (i*100+j), if the i-th
argument is a scalar and had an illegal value, then info = -i.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?gebal
Balances a general real/complex matrix.

Syntax
void psgebal(char* job, MKL_INT* n, float* a, MKL_INT* desca, MKL_INT* ilo, MKL_INT*
ihi, float* scale, MKL_INT* info);
void pdgebal(char* job, MKL_INT* n, double* a, MKL_INT* desca, MKL_INT* ilo, MKL_INT*
ihi, double* scale, MKL_INT* info);
void pcgebal(char* job, MKL_INT* n, complex float* a, MKL_INT* desca, MKL_INT* ilo,
MKL_INT* ihi, float* scale, MKL_INT* info);

void pzgebal(char* job, MKL_INT* n, complex double* a, MKL_INT* desca, MKL_INT* ilo,
MKL_INT* ihi, double* scale, MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
p?gebal balances a general real/complex matrix A. This involves, first, permuting A by a similarity
transformation to isolate eigenvalues in diagonal positions 1 to ilo-1 and ihi+1 to n;
and second, applying a diagonal similarity transformation to rows and columns ilo to ihi to make the rows
and columns as close in norm as possible. Both steps are optional.
Balancing may reduce the 1-norm of the matrix and improve the accuracy of the computed eigenvalues
and/or eigenvectors.

Input Parameters

job (global)
Specifies the operations to be performed on a:

= 'N': none: simply set ilo = 1, ihi = n, scale[i] = 1.0 for i = 0,...,n-1;

= 'P': permute only;

= 'S': scale only;
= 'B': both permute and scale.

n (global)
The order of the matrix A (n ≥ 0).

a (local) Pointer into the local memory to an array of size lld_a * LOCc(n).

This array contains the local pieces of global input matrix A.

desca (global and local) array of size dlen_.

The array descriptor for the distributed matrix A.

Output Parameters

a On exit, a is overwritten by the balanced matrix A.

If job = 'N', a is not referenced.

See Application Notes for further details.

ilo, ihi (global)

ilo and ihi are set to integers such that on exit matrix elements A(i,j) are
zero if i > j and j = 1,...,ilo-1 or i = ihi+1,...,n.

If job = 'N' or 'S', ilo = 1 and ihi = n.

scale (global) array of size n.

Details of the permutations and scaling factors applied to a. If pj is the
index of the row and column interchanged with row and column j and dj is
the scaling factor applied to row and column j, then
scale[j-1] = pj for j = 1,...,ilo-1, ihi+1,..., n
scale[j-1] = dj for j = ilo,...,ihi

The order in which the interchanges are made is n to ihi+1, then 1 to
ilo-1.

info (global)
= 0: successful exit.
< 0: if info = -i, the i-th argument had an illegal value.

Application Notes
The permutations consist of row and column interchanges which put the matrix in the form

$$
P A P^{T} = \begin{pmatrix} T_1 & X & Y \\ 0 & B & Z \\ 0 & 0 & T_2 \end{pmatrix}
$$

where T1 and T2 are upper triangular matrices whose eigenvalues lie along the diagonal. The column indices
ilo and ihi mark the starting and ending columns of the submatrix B. Balancing consists of applying a
diagonal similarity transformation D^-1*B*D to make the 1-norms of each row of B and its corresponding column
nearly equal. The output matrix is

$$
\begin{pmatrix} T_1 & X D & Y \\ 0 & D^{-1} B D & D^{-1} Z \\ 0 & 0 & T_2 \end{pmatrix}
$$

Information about the permutations P and the diagonal matrix D is returned in the vector scale.
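To show how ilo, ihi, and scale are consumed, here is a serial sketch of the back-transformation of a right eigenvector, modeled on what LAPACK ?gebak does; it is not an MKL ScaLAPACK API. If x is an eigenvector of the balanced matrix D^-1*P*A*P^T*D, then P^T*D*x is an eigenvector of the original A: first apply D, then undo the interchanges in the reverse of the order in which they were made. The sketch assumes that the pj stored in scale are 1-based indices, as the 1-based j in the description suggests; unbalance_vector is a hypothetical name.

```c
/* Hypothetical helper (modeled on LAPACK ?gebak for right eigenvectors):
 * maps an eigenvector x of the balanced matrix back to one of the
 * original A, using ilo, ihi and scale as returned by p?gebal.
 * j is 1-based; stored p_j are assumed to be 1-based row indices. */
static void unbalance_vector(int n, int ilo, int ihi, const double *scale,
                             double *x)
{
    /* Step 1: apply D, i.e. scale rows ilo..ihi by d_j. */
    for (int j = ilo; j <= ihi; ++j)
        x[j - 1] *= scale[j - 1];

    /* Step 2: undo the interchanges in reverse order. They were made
     * n down to ihi+1, then 1 up to ilo-1, so undo ilo-1 down to 1,
     * then ihi+1 up to n. Each swap is its own inverse. */
    for (int j = ilo - 1; j >= 1; --j) {
        int k = (int)scale[j - 1];
        double t = x[j - 1];
        x[j - 1] = x[k - 1];
        x[k - 1] = t;
    }
    for (int j = ihi + 1; j <= n; ++j) {
        int k = (int)scale[j - 1];
        double t = x[j - 1];
        x[j - 1] = x[k - 1];
        x[k - 1] = t;
    }
}
```

With n = 3, ilo = 2, ihi = 3, and scale = {3, 2, 0.5}, the vector (1, 1, 1) becomes (0.5, 2, 1): rows 2 and 3 are scaled by 2 and 0.5, then rows 1 and 3 are swapped. For job = 'N' (ilo = 1, ihi = n, all scale entries 1.0) the vector is unchanged.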

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?gebd2
Reduces a general rectangular matrix to real
bidiagonal form by an orthogonal/unitary
transformation (unblocked algorithm).

Syntax
void psgebd2 (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *d , float *e , float *tauq , float *taup , float *work , MKL_INT
*lwork , MKL_INT *info );
void pdgebd2 (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , double *d , double *e , double *tauq , double *taup , double *work , MKL_INT
*lwork , MKL_INT *info );
void pcgebd2 (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , float *d , float *e , MKL_Complex8 *tauq , MKL_Complex8 *taup ,
MKL_Complex8 *work , MKL_INT *lwork , MKL_INT *info );
void pzgebd2 (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , double *d , double *e , MKL_Complex16 *tauq , MKL_Complex16 *taup ,
MKL_Complex16 *work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?gebd2 function reduces a real/complex general m-by-n distributed matrix sub(A) = A(ia:ia+m-1,
ja:ja+n-1) to upper or lower bidiagonal form B by an orthogonal/unitary transformation:

Q'*sub(A)*P = B.
If m ≥ n, B is upper bidiagonal; if m < n, B is lower bidiagonal.

Input Parameters

m (global)
The number of rows of the distributed matrix sub(A) (m ≥ 0).

n (global)
The number of columns in the distributed matrix sub(A) (n ≥ 0).

a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).

On entry, this array contains the local pieces of the general distributed
matrix sub(A).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

work (local).
This is a workspace array of size lwork.

lwork (local or global)

The size of the array work.
lwork is local input and must be at least lwork ≥ max(mpa0, nqa0),

where nb = mb_a = nb_a, iroffa = mod(ia-1, nb), icoffa = mod(ja-1, nb),

iarow = indxg2p(ia, nb, myrow, rsrc_a, nprow),

iacol = indxg2p(ja, nb, mycol, csrc_a, npcol),
mpa0 = numroc(m+iroffa, nb, myrow, iarow, nprow),
nqa0 = numroc(n+icoffa, nb, mycol, iacol, npcol).
indxg2p and numroc are ScaLAPACK tool functions; myrow, mycol, nprow,
and npcol can be determined by calling the function blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size
for all work arrays. Each of these values is returned in the first entry
of the corresponding work array, and no error message is issued by
pxerbla.

Output Parameters

a (local).
On exit, if m ≥ n, the diagonal and the first superdiagonal of sub(A) are
overwritten with the upper bidiagonal matrix B; the elements below the
diagonal, with the array tauq, represent the orthogonal/unitary matrix Q as
a product of elementary reflectors, and the elements above the first
superdiagonal, with the array taup, represent the orthogonal/unitary matrix P
as a product of elementary reflectors. If m < n, the diagonal and the first
subdiagonal are overwritten with the lower bidiagonal matrix B; the
elements below the first subdiagonal, with the array tauq, represent the
orthogonal/unitary matrix Q as a product of elementary reflectors, and the
elements above the diagonal, with the array taup, represent the
orthogonal/unitary matrix P as a product of elementary reflectors. See
Application Notes below.

d (local)
Array of size LOCc(ja+min(m,n)-1) if m ≥ n; LOCr(ia+min(m,n)-1)
otherwise. The distributed diagonal elements of the bidiagonal matrix B:
d[i] = A(i+1,i+1), i=0, 1,..., size (d) - 1 . d is tied to the distributed matrix
A.

e (local)
Array of size LOCc(ja+min(m,n)-1) if m ≥ n; LOCr(ia+min(m,n)-2)
otherwise. The distributed off-diagonal elements of the bidiagonal matrix B:
if m ≥ n, e[i] = A(i+1,i+2) for i = 0, 1, ..., n-2;
if m < n, e[i] = A(i+2,i+1) for i = 0, 1, ..., m-2. e is tied to the distributed
matrix A.

tauq (local).
Array of size LOCc(ja+min(m,n)-1). The scalar factors of the elementary
reflectors which represent the orthogonal/unitary matrix Q. tauq is tied to
the distributed matrix A.

taup (local).
Array of size LOCr(ia+min(m,n)-1). The scalar factors of the elementary
reflectors which represent the orthogonal/unitary matrix P. taup is tied to
the distributed matrix A.

work On exit, work[0] returns the minimal and optimal lwork.

info (local)

If info = 0, the execution is successful.

if info < 0: If the i-th argument is an array and the j-th entry, indexed
j-1, had an illegal value, then info = - (i*100+j), if the i-th argument is a
scalar and had an illegal value, then info = -i.
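The lwork bound listed under Input Parameters can be evaluated on each process before calling p?gebd2. The sketch below re-implements numroc and indxg2p locally from their ScaLAPACK reference definitions, under the hypothetical names local_numroc and local_indxg2p so that they do not clash with the library symbols, and combines them as in that formula. It uses int in place of MKL_INT; in a real program myrow, mycol, nprow, and npcol would come from blacs_gridinfo.

```c
/* Illustrative re-implementations of two ScaLAPACK tool functions.
 * Global indices are 1-based; process coordinates are 0-based. */

/* Rows (or columns) of an n-long dimension, distributed in blocks of nb
 * starting at process isrcproc, that are owned by process iproc. */
static int local_numroc(int n, int nb, int iproc, int isrcproc, int nprocs)
{
    int mydist = (nprocs + iproc - isrcproc) % nprocs; /* distance from source */
    int nblocks = n / nb;                              /* full blocks */
    int count = (nblocks / nprocs) * nb;               /* blocks every process gets */
    int extrablks = nblocks % nprocs;                  /* leftover full blocks */
    if (mydist < extrablks)
        count += nb;          /* one extra full block */
    else if (mydist == extrablks)
        count += n % nb;      /* the trailing partial block */
    return count;
}

/* Process coordinate owning global index indxglob (iproc is unused,
 * as in the reference routine). */
static int local_indxg2p(int indxglob, int nb, int iproc, int isrcproc,
                         int nprocs)
{
    (void)iproc;
    return (isrcproc + (indxglob - 1) / nb) % nprocs;
}

/* Minimum lwork for p?gebd2: max(mpa0, nqa0), with nb = mb_a = nb_a. */
static int gebd2_min_lwork(int m, int n, int ia, int ja, int nb,
                           int myrow, int mycol, int rsrc_a, int csrc_a,
                           int nprow, int npcol)
{
    int iroffa = (ia - 1) % nb;
    int icoffa = (ja - 1) % nb;
    int iarow = local_indxg2p(ia, nb, myrow, rsrc_a, nprow);
    int iacol = local_indxg2p(ja, nb, mycol, csrc_a, npcol);
    int mpa0 = local_numroc(m + iroffa, nb, myrow, iarow, nprow);
    int nqa0 = local_numroc(n + icoffa, nb, mycol, iacol, npcol);
    return mpa0 > nqa0 ? mpa0 : nqa0;
}
```

On a 2 x 2 grid with m = n = 1000, ia = ja = 1, nb = 64, and rsrc_a = csrc_a = 0, process (0, 0) needs lwork ≥ 512 and process (1, 1) needs lwork ≥ 488.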

Application Notes
The matrices Q and P are represented as products of elementary reflectors:
If m ≥ n,

Q = H(1)H(2)...H(n), and P = G(1)G(2)...G(n-1)

Each H(i) and G(i) has the form:
H(i) = I - tauq*v*v', and G(i) = I - taup*u*u',
where tauq and taup are real/complex scalars, and v and u are real/complex vectors. v(1:i-1) = 0, v(i)
= 1, and v(i+1:m) is stored on exit in
A(ia+i:ia+m-1, ja+i-1);
u(1:i) = 0, u(i+1) = 1, and u(i+2:n) is stored on exit in A(ia+i-1, ja+i+1:ja+n-1);
tauq is stored in tauq[ja+i-2] and taup in taup[ia+i-2].

If m < n,

Q = H(1)H(2)...H(m-1), and P = G(1)G(2)...G(m)

v(1:i) = 0, v(i+1) = 1, and v(i+2:m) is stored on exit in A(ia+i+1:ia+m-1, ja+i-1);

u(1:i-1) = 0, u(i) = 1, and u(i+1:n) is stored on exit in A(ia+i-1, ja+i:ja+n-1);
tauq is stored in tauq[ja+i-2] and taup in taup[ia+i-2].

The contents of sub(A) on exit are illustrated by the following examples:

m = 6 and n = 5 (m > n):

( d   e   u1  u1  u1 )
( v1  d   e   u2  u2 )
( v1  v2  d   e   u3 )
( v1  v2  v3  d   e  )
( v1  v2  v3  v4  d  )
( v1  v2  v3  v4  v5 )

m = 5 and n = 6 (m < n):

( d   u1  u1  u1  u1  u1 )
( e   d   u2  u2  u2  u2 )
( v1  e   d   u3  u3  u3 )
( v1  v2  e   d   u4  u4 )
( v1  v2  v3  e   d   u5 )

where d and e denote diagonal and off-diagonal elements of B, vi denotes an element of the vector defining
H(i), and ui an element of the vector defining G(i).

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?gehd2
Reduces a general matrix to upper Hessenberg form
by an orthogonal/unitary similarity transformation
(unblocked algorithm).

Syntax
void psgehd2 (MKL_INT *n , MKL_INT *ilo , MKL_INT *ihi , float *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , float *tau , float *work , MKL_INT *lwork , MKL_INT
*info );
void pdgehd2 (MKL_INT *n , MKL_INT *ilo , MKL_INT *ihi , double *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *tau , double *work , MKL_INT *lwork , MKL_INT
*info );
void pcgehd2 (MKL_INT *n , MKL_INT *ilo , MKL_INT *ihi , MKL_Complex8 *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzgehd2 (MKL_INT *n , MKL_INT *ilo , MKL_INT *ihi , MKL_Complex16 *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT
*lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?gehd2 function reduces a real/complex general distributed matrix sub(A) to upper Hessenberg form H
by an orthogonal/unitary similarity transformation: Q'*sub(A)*Q = H, where sub(A) = A(ia:ia+n-1,
ja:ja+n-1).

Input Parameters

n (global) The order of the distributed submatrix A (n ≥ 0).

ilo, ihi (global) It is assumed that the matrix sub(A) is already upper triangular in
rows ia:ia+ilo-2 and ia+ihi:ia+n-1 and columns ja:ja+ilo-2 and
ja+ihi:ja+n-1. See Application Notes for further information.
If n > 0, 1 ≤ ilo ≤ ihi ≤ n; otherwise set ilo = 1, ihi = n.

a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).

On entry, this array contains the local pieces of the n-by-n general
distributed matrix sub(A) to be reduced.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

work (local).
This is a workspace array of size lwork.

lwork (local or global)

The size of the array work.
lwork is local input and must be at least lwork ≥ nb + max(npa0, nb),
where nb = mb_a = nb_a, iroffa = mod(ia-1, nb), iarow =
indxg2p(ia, nb, myrow, rsrc_a, nprow), npa0 = numroc(ihi+iroffa, nb, myrow, iarow, nprow).
indxg2p and numroc are ScaLAPACK tool functions; myrow, mycol, nprow,
and npcol can be determined by calling the function blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size
for all work arrays. Each of these values is returned in the first entry
of the corresponding work array, and no error message is issued by
pxerbla.

Output Parameters

a (local). On exit, the upper triangle and the first subdiagonal of sub(A) are
overwritten with the upper Hessenberg matrix H, and the elements below
the first subdiagonal, with the array tau, represent the orthogonal/unitary
matrix Q as a product of elementary reflectors. (see Application Notes
below).

tau (local).
Array of size LOCc(ja+n-2). The scalar factors of the elementary reflectors
(see Application Notes below). Elements ja:ja+ilo-2 and ja+ihi:ja+n-2
of the global vector tau are set to zero. tau is tied to the distributed matrix
A.

work On exit, work[0] returns the minimal and optimal lwork.

info (local)
If info = 0, the execution is successful.

if info < 0: If the i-th argument is an array and the j-th entry, indexed j-1,
had an illegal value, then info = - (i*100+j), if the i-th argument is a
scalar and had an illegal value, then info = -i.

Application Notes
The matrix Q is represented as a product of (ihi-ilo) elementary reflectors

Q = H(ilo)*H(ilo+1)*...*H(ihi-1).
Each H(i) has the form

H(i) = I - tau*v*v',
where tau is a real/complex scalar, and v is a real/complex vector with v(1:i) = 0, v(i+1) = 1 and
v(ihi+1:n) = 0; v(i+2:ihi) is stored on exit in A(ia+ilo+i:ia+ihi-1, ja+ilo+i-2), and tau in
tau[ja+ilo+i-3].
The contents of A(ia:ia+n-1, ja:ja+n-1) are illustrated by the following example, with n = 7, ilo = 2
and ihi = 6:

where a denotes an element of the original matrix sub(A), h denotes a modified element of the upper
Hessenberg matrix H, and vi denotes an element of the vector defining H(ja+ilo+i-2).
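Each H(i) above is applied, never formed: multiplying a vector by I - tau*v*v' costs one dot product and one vector update. A minimal serial sketch for the real case follows; apply_reflector is a hypothetical helper, and this is not how p?gehd2 organizes its distributed computation.

```c
#include <stddef.h>

/* Applies the real elementary reflector H = I - tau*v*v' to x in place,
 * computing x := x - tau * v * (v' * x) without forming H. */
static void apply_reflector(size_t n, double tau, const double *v, double *x)
{
    double dot = 0.0;
    for (size_t k = 0; k < n; ++k)
        dot += v[k] * x[k];         /* v' * x */
    for (size_t k = 0; k < n; ++k)
        x[k] -= tau * v[k] * dot;   /* subtract tau * (v' * x) * v */
}
```

With v = (1, 0.5), normalized so that v(1) = 1 as in the notes, and tau = 2/(v'*v) = 1.6, the reflector maps x = (3, 4) to (-5, 0), preserving the Euclidean norm 5.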

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?gelq2
Computes an LQ factorization of a general rectangular
matrix (unblocked algorithm).

Syntax
void psgelq2 (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdgelq2 (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );
void pcgelq2 (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT
*info );
void pzgelq2 (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT *lwork , MKL_INT
*info );

Include Files
• mkl_scalapack.h

Description
The p?gelq2 function computes an LQ factorization of a real/complex distributed m-by-n matrix sub(A) =
A(ia:ia+m-1, ja:ja+n-1) = L*Q.

Input Parameters

m (global)
The number of rows of the distributed matrix sub(A). (m≥0).

n (global)
The number of columns of the distributed matrix sub(A). (n≥0).

a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).

On entry, this array contains the local pieces of the m-by-n distributed
matrix sub(A) which is to be factored.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

work (local).
This is a workspace array of size lwork.

lwork (local or global)

The size of the array work.

lwork is local input and must be at least lwork ≥ nq0 + max(1, mp0),

where iroff = mod(ia-1, mb_a), icoff = mod(ja-1, nb_a),

iarow = indxg2p(ia, mb_a, myrow, rsrc_a, nprow),

iacol = indxg2p(ja, nb_a, mycol, csrc_a, npcol),
mp0 = numroc(m+iroff, mb_a, myrow, iarow, nprow),
nq0 = numroc(n+icoff, nb_a, mycol, iacol, npcol),

indxg2p and numroc are ScaLAPACK tool functions; myrow, mycol, nprow,
and npcol can be determined by calling the function blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size
for all work arrays. Each of these values is returned in the first entry
of the corresponding work array, and no error message is issued by
pxerbla.
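The bound nq0 + max(1, mp0) can be computed per process ahead of the call. A self-contained sketch, with numroc re-implemented locally from its ScaLAPACK reference definition under the hypothetical name local_numroc, indxg2p written inline, and int in place of MKL_INT; gelq2_min_lwork is also a hypothetical name.

```c
/* Illustrative re-implementation of numroc: rows (or columns) of an
 * n-long dimension, distributed in blocks of nb starting at process
 * isrcproc, owned by process iproc (0-based coordinates). */
static int local_numroc(int n, int nb, int iproc, int isrcproc, int nprocs)
{
    int mydist = (nprocs + iproc - isrcproc) % nprocs;
    int nblocks = n / nb;
    int count = (nblocks / nprocs) * nb;
    int extrablks = nblocks % nprocs;
    if (mydist < extrablks)
        count += nb;
    else if (mydist == extrablks)
        count += n % nb;
    return count;
}

/* Minimum lwork for p?gelq2: nq0 + max(1, mp0). */
static int gelq2_min_lwork(int m, int n, int ia, int ja, int mb_a, int nb_a,
                           int myrow, int mycol, int rsrc_a, int csrc_a,
                           int nprow, int npcol)
{
    int iroff = (ia - 1) % mb_a;
    int icoff = (ja - 1) % nb_a;
    int iarow = (rsrc_a + (ia - 1) / mb_a) % nprow;   /* indxg2p(ia, ...) */
    int iacol = (csrc_a + (ja - 1) / nb_a) % npcol;   /* indxg2p(ja, ...) */
    int mp0 = local_numroc(m + iroff, mb_a, myrow, iarow, nprow);
    int nq0 = local_numroc(n + icoff, nb_a, mycol, iacol, npcol);
    return nq0 + (mp0 > 1 ? mp0 : 1);
}
```

On a 2 x 2 grid with m = 500, n = 1000, ia = ja = 1, mb_a = nb_a = 64, and rsrc_a = csrc_a = 0, process (0, 0) needs lwork ≥ 768. A process that owns no rows of sub(A) still needs nq0 + 1 because of the max(1, mp0) term.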

Output Parameters

a (local).
On exit, the elements on and below the diagonal of sub(A) contain the m by
min(m,n) lower trapezoidal matrix L (L is lower triangular if m ≤ n); the
elements above the diagonal, with the array tau, represent the orthogonal/
unitary matrix Q as a product of elementary reflectors (see Application
Notes below).

tau (local).
Array of size LOCr(ia+min(m, n)-1). This array contains the scalar
factors of the elementary reflectors. tau is tied to the distributed matrix A.

work On exit, work[0] returns the minimal and optimal lwork.

info (local) If info = 0, the execution is successful. if info < 0: If the i-th
argument is an array and the j-th entry, indexed j-1, had an illegal value,
then info = - (i*100+j), if the i-th argument is a scalar and had an illegal
value, then info = -i.

Application Notes
The matrix Q is represented as a product of elementary reflectors
Q = H(ia+k-1)*H(ia+k-2)*...*H(ia) for real flavors, Q = (H(ia+k-1))^H*(H(ia+k-2))^H*...*(H(ia))^H
for complex flavors,
where k = min(m,n).

Each H(i) has the form

H(i) = I - tau*v*v'
where tau is a real/complex scalar, and v is a real/complex vector with v(1:i-1) = 0 and v(i) = 1;
v(i+1:n) (for real flavors) or conjg(v(i+1:n)) (for complex flavors) is stored on exit in
A(ia+i-1, ja+i:ja+n-1), and tau in tau[ia+i-2].

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?geql2
Computes a QL factorization of a general rectangular
matrix (unblocked algorithm).

Syntax
void psgeql2 (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );

void pdgeql2 (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );
void pcgeql2 (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT
*info );
void pzgeql2 (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT *lwork , MKL_INT
*info );

Include Files
• mkl_scalapack.h

Description
The p?geql2 function computes a QL factorization of a real/complex distributed m-by-n matrix sub(A) =
A(ia:ia+m-1, ja:ja+n-1) = Q*L.

Input Parameters

m (global)
The number of rows in the distributed matrix sub(A). (m≥ 0).

n (global)
The number of columns in the distributed matrix sub(A). (n≥ 0).

a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).

On entry, this array contains the local pieces of the m-by-n distributed
matrix sub(A) which is to be factored.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

work (local).
This is a workspace array of size lwork.

lwork (local or global)

The size of the array work.

lwork is local input and must be at least lwork≥mp0 + max(1, nq0),

where iroff = mod(ia-1, mb_a), icoff = mod(ja-1, nb_a),

iarow = indxg2p(ia, mb_a, myrow, rsrc_a, nprow),

iacol = indxg2p(ja, nb_a, mycol, csrc_a, npcol),
mp0 = numroc(m+iroff, mb_a, myrow, iarow, nprow),
nq0 = numroc(n+icoff, nb_a, mycol, iacol, npcol),
indxg2p and numroc are ScaLAPACK tool functions; myrow, mycol, nprow,
and npcol can be determined by calling the function blacs_gridinfo.


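If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size
for all work arrays. Each of these values is returned in the first entry
of the corresponding work array, and no error message is issued by
pxerbla.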
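To make the lwork formula concrete, the sketch below re-implements indxg2p and numroc locally as `my_indxg2p` and `my_numroc` (hypothetical names following the usual ScaLAPACK definitions; the real indxg2p also takes an iproc argument that it does not use) and evaluates mp0 + max(1, nq0) for one process. It assumes 1-based global indices, as in the formulas above.

```c
#include <assert.h>

/* Process coordinate that owns global row/column indxglob (1-based). */
static int my_indxg2p(int indxglob, int nb, int isrcproc, int nprocs) {
    return (isrcproc + (indxglob - 1) / nb) % nprocs;
}

/* Number of rows/columns of an n-long, nb-blocked dimension owned by iproc. */
static int my_numroc(int n, int nb, int iproc, int isrcproc, int nprocs) {
    assert(nb > 0 && nprocs > 0);
    int mydist = (nprocs + iproc - isrcproc) % nprocs; /* distance from the source process */
    int nblocks = n / nb;                               /* number of full blocks */
    int num = (nblocks / nprocs) * nb;                  /* full blocks every process owns */
    int extrablks = nblocks % nprocs;                   /* leftover full blocks */
    if (mydist < extrablks)
        num += nb;                                      /* one extra full block */
    else if (mydist == extrablks)
        num += n % nb;                                  /* the trailing partial block */
    return num;
}

/* Minimum lwork of p?geql2 (same formula as p?geqr2) on process (myrow, mycol). */
static int geql2_min_lwork(int m, int n, int ia, int ja, int mb_a, int nb_a,
                           int rsrc_a, int csrc_a, int myrow, int mycol,
                           int nprow, int npcol) {
    int iroff = (ia - 1) % mb_a, icoff = (ja - 1) % nb_a;
    int iarow = my_indxg2p(ia, mb_a, rsrc_a, nprow);
    int iacol = my_indxg2p(ja, nb_a, csrc_a, npcol);
    int mp0 = my_numroc(m + iroff, mb_a, myrow, iarow, nprow);
    int nq0 = my_numroc(n + icoff, nb_a, mycol, iacol, npcol);
    return mp0 + (nq0 > 1 ? nq0 : 1);
}
```

For m = 10, n = 7, ia = ja = 1, mb_a = nb_a = 2 on a 2 x 2 grid with rsrc_a = csrc_a = 0, process (0, 0) needs lwork ≥ 6 + 4 = 10 and process (1, 1) needs lwork ≥ 4 + 3 = 7. p?gerq2 swaps the roles of the two terms: nq0 + max(1, mp0).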

Output Parameters

a (local).
On exit,
if m ≥ n, the lower triangle of the distributed submatrix A(ia+m-n:ia+m-1,
ja:ja+n-1) contains the n-by-n lower triangular matrix L;
if m ≤ n, the elements on and below the (n-m)-th superdiagonal contain
the m-by-n lower trapezoidal matrix L; the remaining elements, with the
array tau, represent the orthogonal/ unitary matrix Q as a product of
elementary reflectors (see Application Notes below).

tau (local).
Array of size LOCc(ja+n-1). This array contains the scalar factors of the
elementary reflectors. tau is tied to the distributed matrix A.

work On exit, work[0] returns the minimal and optimal lwork.

info (local).
If info = 0, the execution is successful. If info < 0: if the i-th argument
is an array and the j-th entry, indexed j-1, had an illegal value, then info
= -(i*100+j); if the i-th argument is a scalar and had an illegal value,
then info = -i.

Application Notes
The matrix Q is represented as a product of elementary reflectors
Q = H(ja+k-1)*...*H(ja+1)*H(ja), where k = min(m,n).
Each H(i) has the form

H(i) = I - tau*v*v'

where tau is a real/complex scalar, and v is a real/complex vector with v(m-k+i+1: m) = 0 and v(m-k+i) =
1; v(1: m-k+i-1) is stored on exit in A(ia:ia+m-k+i-2, ja+n-k+i-1), and tau in tau[ja+n-k+i-2].

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?geqr2
Computes a QR factorization of a general rectangular
matrix (unblocked algorithm).

Syntax
void psgeqr2 (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdgeqr2 (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );

void pcgeqr2 (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT
*info );
void pzgeqr2 (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT *lwork , MKL_INT
*info );

Include Files
• mkl_scalapack.h

Description
The p?geqr2 function computes a QR factorization of a real/complex distributed m-by-n matrix sub(A) =
A(ia:ia+m-1, ja:ja+n-1) = Q*R.

Input Parameters

m (global)
The number of rows in the distributed matrix sub(A). (m≥0).

n (global) The number of columns in the distributed matrix sub(A). (n≥0).

a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).

On entry, this array contains the local pieces of the m-by-n distributed
matrix sub(A) which is to be factored.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

work (local).
This is a workspace array of size lwork.

lwork (local or global)

The size of the array work.
lwork is local input and must be at least lwork≥mp0+max(1, nq0),

where iroff = mod(ia-1, mb_a), icoff = mod(ja-1, nb_a),

iarow = indxg2p(ia, mb_a, myrow, rsrc_a, nprow),

iacol = indxg2p(ja, nb_a, mycol, csrc_a, npcol),
mp0 = numroc(m+iroff, mb_a, myrow, iarow, nprow),
nq0 = numroc(n+icoff, nb_a, mycol, iacol, npcol).
indxg2p and numroc are ScaLAPACK tool functions; myrow, mycol, nprow,
and npcol can be determined by calling the function blacs_gridinfo.


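If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size
for all work arrays. Each of these values is returned in the first entry
of the corresponding work array, and no error message is issued by
pxerbla.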

Output Parameters

a (local).
On exit, the elements on and above the diagonal of sub(A) contain the
min(m,n) by n upper trapezoidal matrix R (R is upper triangular if m≥n); the
elements below the diagonal, with the array tau, represent the orthogonal/
unitary matrix Q as a product of elementary reflectors (see Application
Notes below).

tau (local).
Array of size LOCc(ja+min(m,n)-1). This array contains the scalar factors of
the elementary reflectors. tau is tied to the distributed matrix A.

work On exit, work[0] returns the minimal and optimal lwork.

info (local)
If info = 0, the execution is successful. If info < 0:

If the i-th argument is an array and the j-th entry, indexed j-1, had an
illegal value, then info = - (i*100+j),

if the i-th argument is a scalar and had an illegal value, then info = -i.

Application Notes
The matrix Q is represented as a product of elementary reflectors
Q = H(ja)*H(ja+1)*. . .* H(ja+k-1), where k = min(m,n).
Each H(i) has the form

H(i) = I - tau*v*v',
where tau is a real/complex scalar, and v is a real/complex vector with v(1: i-1) = 0 and v(i) = 1; v(i+1: m)
is stored on exit in A(ia+i:ia+m-1, ja+i-1), and tau in tau[ja+i-2].

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?gerq2
Computes an RQ factorization of a general rectangular
matrix (unblocked algorithm).

Syntax
void psgerq2 (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdgerq2 (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );
void pcgerq2 (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT
*info );

void pzgerq2 (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT *lwork , MKL_INT
*info );

Include Files
• mkl_scalapack.h

Description
The p?gerq2 function computes an RQ factorization of a real/complex distributed m-by-n matrix sub(A) =
A(ia:ia+m-1, ja:ja+n-1) = R*Q.

Input Parameters

m (global) The number of rows in the distributed matrix sub(A). (m≥0).

n (global) The number of columns in the distributed matrix sub(A). (n≥0).

a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).

On entry, this array contains the local pieces of the m-by-n distributed
matrix sub(A) which is to be factored.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

work (local).
This is a workspace array of size lwork.

lwork (local or global)

The size of the array work.

lwork is local input and must be at least lwork≥nq0 + max(1, mp0),

where
iroff = mod(ia-1, mb_a), icoff = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, myrow, rsrc_a, nprow),
iacol = indxg2p(ja, nb_a, mycol, csrc_a, npcol),
mp0 = numroc(m+iroff, mb_a, myrow, iarow, nprow),
nq0 = numroc(n+icoff, nb_a, mycol, iacol, npcol).
indxg2p and numroc are ScaLAPACK tool functions; myrow, mycol, nprow,
and npcol can be determined by calling the function blacs_gridinfo.

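If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size
for all work arrays. Each of these values is returned in the first entry
of the corresponding work array, and no error message is issued by
pxerbla.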

Output Parameters

a (local).


On exit,
if m ≤ n, the upper triangle of A(ia:ia+m-1, ja+n-m:ja+n-1) contains the
m-by-m upper triangular matrix R;
if m ≥ n, the elements on and above the (m-n)-th subdiagonal contain the
m-by-n upper trapezoidal matrix R; the remaining elements, with the array
tau, represent the orthogonal/unitary matrix Q as a product of elementary
reflectors (see Application Notes below).

tau (local).
Array of size LOCr(ia+m-1). This array contains the scalar factors of the
elementary reflectors. tau is tied to the distributed matrix A.

work On exit, work[0] returns the minimal and optimal lwork.

info (local)
If info = 0, the execution is successful.

If info < 0: if the i-th argument is an array and the j-th entry, indexed j-1,
had an illegal value, then info = -(i*100+j); if the i-th argument is a
scalar and had an illegal value, then info = -i.

Application Notes
The matrix Q is represented as a product of elementary reflectors
Q = H(ia)*H(ia+1)*...*H(ia+k-1) for real flavors,
Q = (H(ia))^H*(H(ia+1))^H*...*(H(ia+k-1))^H for complex flavors,
where k = min(m, n).

Each H(i) has the form

H(i) = I - tau*v*v',
where tau is a real/complex scalar, and v is a real/complex vector with v(n-k+i+1:n) = 0 and v(n-k+i) = 1;
v(1:n-k+i-1) for real flavors or conjg(v(1:n-k+i-1)) for complex flavors is stored on exit in
A(ia+m-k+i-1, ja:ja+n-k+i-2), and tau in tau[ia+m-k+i-2].

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?getf2
Computes an LU factorization of a general matrix,
using partial pivoting with row interchanges (local
blocked algorithm).

Syntax
void psgetf2 (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , MKL_INT *ipiv , MKL_INT *info );
void pdgetf2 (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , MKL_INT *ipiv , MKL_INT *info );
void pcgetf2 (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *ipiv , MKL_INT *info );

void pzgetf2 (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *ipiv , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?getf2 function computes an LU factorization of a general m-by-n distributed matrix
sub(A) = A(ia:ia+m-1, ja:ja+n-1) using partial pivoting with row interchanges.
The factorization has the form sub(A) = P * L* U, where P is a permutation matrix, L is lower triangular
with unit diagonal elements (lower trapezoidal if m>n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Parallel Level 2 BLAS version of the algorithm.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.
Notice revision #20201201

Input Parameters

m (global)
The number of rows in the distributed matrix sub(A). (m≥0).

n (global) The number of columns in the distributed matrix sub(A).
(nb_a - mod(ja-1, nb_a) ≥ n ≥ 0).

a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).

On entry, this array contains the local pieces of the m-by-n distributed
matrix sub(A).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

Output Parameters

ipiv (local)
Array of size LOCr(m_a) + mb_a. This array contains the pivoting
information: ipiv[i] is the global row that local row (i+1) was swapped
with, i = 0, 1, ..., LOCr(m_a) + mb_a - 1. This array is tied to the
distributed matrix A.

info (local).
If info = 0: successful exit.

If info < 0:


• if the i-th argument is an array and the j-th entry, indexed j-1, had an
illegal value, then info = -(i*100+j),
• if the i-th argument is a scalar and had an illegal value, then info = -i.

If info > 0: If info = k, the matrix element U(ia+k-1, ja+k-1) is
exactly zero. The factorization has been completed, but the factor U is
exactly singular, and division by zero will occur if it is used to solve a
system of equations.
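A serial, real sketch of right-looking LU with partial pivoting, with no data distribution (hypothetical function `lu_unblocked`, column-major storage, so local and global row numbers coincide). It mirrors the conventions above: ipiv holds 1-based row indices, and a zero pivot sets info = k without stopping the factorization.

```c
#include <assert.h>
#include <math.h>

/* Returns info: 0 on success, k > 0 if U(k,k) (1-based) is exactly zero. */
static int lu_unblocked(int m, int n, double *a, int lda, int *ipiv) {
    assert(lda >= m);
    int info = 0, k = m < n ? m : n;
    for (int j = 0; j < k; ++j) {
        int p = j;                                        /* pivot search in column j */
        for (int r = j + 1; r < m; ++r)
            if (fabs(a[r + j * lda]) > fabs(a[p + j * lda])) p = r;
        ipiv[j] = p + 1;                                  /* 1-based row swapped with row j+1 */
        if (a[p + j * lda] == 0.0) { if (info == 0) info = j + 1; continue; }
        if (p != j)                                       /* interchange whole rows */
            for (int c = 0; c < n; ++c) {
                double t = a[j + c * lda];
                a[j + c * lda] = a[p + c * lda];
                a[p + c * lda] = t;
            }
        for (int r = j + 1; r < m; ++r) a[r + j * lda] /= a[j + j * lda]; /* column of L */
        for (int c = j + 1; c < n; ++c)                   /* rank-1 update of trailing block */
            for (int r = j + 1; r < m; ++r)
                a[r + c * lda] -= a[r + j * lda] * a[j + c * lda];
    }
    return info;
}
```

For A = [0 1; 2 3] the rows are swapped (ipiv = {2, 2}) and U = [2 3; 0 1]; for the singular A = [1 2; 2 4] the function returns info = 2.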

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?labrd
Reduces the first nb rows and columns of a general
rectangular matrix A to real bidiagonal form by an
orthogonal/unitary transformation, and returns
auxiliary matrices that are needed to apply the
transformation to the unreduced part of A.

Syntax
void pslabrd (MKL_INT *m , MKL_INT *n , MKL_INT *nb , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *d , float *e , float *tauq , float *taup , float *x ,
MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , float *y , MKL_INT *iy , MKL_INT *jy ,
MKL_INT *descy , float *work );
void pdlabrd (MKL_INT *m , MKL_INT *n , MKL_INT *nb , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *d , double *e , double *tauq , double *taup , double *x ,
MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , double *y , MKL_INT *iy , MKL_INT *jy ,
MKL_INT *descy , double *work );
void pclabrd (MKL_INT *m , MKL_INT *n , MKL_INT *nb , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , float *d , float *e , MKL_Complex8 *tauq , MKL_Complex8
*taup , MKL_Complex8 *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , MKL_Complex8 *y ,
MKL_INT *iy , MKL_INT *jy , MKL_INT *descy , MKL_Complex8 *work );
void pzlabrd (MKL_INT *m , MKL_INT *n , MKL_INT *nb , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *d , double *e , MKL_Complex16 *tauq ,
MKL_Complex16 *taup , MKL_Complex16 *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx ,
MKL_Complex16 *y , MKL_INT *iy , MKL_INT *jy , MKL_INT *descy , MKL_Complex16 *work );

Include Files
• mkl_scalapack.h

Description
The p?labrd function reduces the first nb rows and columns of a real/complex general m-by-n distributed
matrix sub(A) = A(ia:ia+m-1, ja:ja+n-1) to upper or lower bidiagonal form by an orthogonal/unitary
transformation Q'*A*P, and returns the matrices X and Y necessary to apply the transformation to the
unreduced part of sub(A).
If m ≥n, sub(A) is reduced to upper bidiagonal form; if m < n, sub(A) is reduced to lower bidiagonal form.

This is an auxiliary function called by p?gebrd.

Input Parameters

m (global) The number of rows in the distributed matrix sub(A). (m≥ 0).

n (global) The number of columns in the distributed matrix sub(A). (n ≥ 0).

nb (global)
The number of leading rows and columns of sub(A) to be reduced.

a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).

On entry, this array contains the local pieces of the general distributed
matrix sub(A).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

ix, jx (global) The row and column indices in the global matrix X indicating the
first row and the first column of the matrix sub(X), respectively.

descx (global and local) array of size dlen_. The array descriptor for the
distributed matrix X.

iy, jy (global) The row and column indices in the global matrix Y indicating the
first row and the first column of the matrix sub(Y), respectively.

descy (global and local) array of size dlen_. The array descriptor for the
distributed matrix Y.

work (local).
Workspace array of size lwork, where
lwork ≥ nb_a + nq,
with nq = numroc(n+mod(ia-1, nb_y), nb_y, mycol, iacol, npcol) and
iacol = indxg2p(ja, nb_a, mycol, csrc_a, npcol).
indxg2p and numroc are ScaLAPACK tool functions; myrow, mycol, nprow,
and npcol can be determined by calling the function blacs_gridinfo.

Output Parameters

a (local)
On exit, the first nb rows and columns of the matrix are overwritten; the
rest of the distributed matrix sub(A) is unchanged.

If m ≥ n, elements on and below the diagonal in the first nb columns, with
the array tauq, represent the orthogonal/unitary matrix Q as a product of
elementary reflectors; and elements above the diagonal in the first nb rows,
with the array taup, represent the orthogonal/unitary matrix P as a product
of elementary reflectors.


If m < n, elements below the diagonal in the first nb columns, with the
array tauq, represent the orthogonal/unitary matrix Q as a product of
elementary reflectors, and elements on and above the diagonal in the first
nb rows, with the array taup, represent the orthogonal/unitary matrix P as
a product of elementary reflectors. See Application Notes below.

d (local).
Array of size LOCr(ia+min(m,n)-1) if m ≥ n; LOCc(ja+min(m,n)-1)
otherwise. The distributed diagonal elements of the bidiagonal distributed
matrix B:
d[i] = A(ia+i, ja+i), i = 0, 1, ..., size(d)-1
d is tied to the distributed matrix A.

e (local).
Array of size LOCr(ia+min(m,n)-1) if m ≥ n; LOCc(ja+min(m,n)-2)
otherwise. The distributed off-diagonal elements of the bidiagonal
distributed matrix B:
if m ≥ n, e[i] = A(ia+i, ja+i+1) for i = 0, 1, ..., n-2;

if m<n, e[i] = A(ia+i+1, ja+i) for i = 0, 1, ..., m-2.

e is tied to the distributed matrix A.

tauq, taup (local).

Array of size LOCc(ja+min(m, n)-1) for tauq, size LOCr(ia+min(m, n)-1) for
taup. The scalar factors of the elementary reflectors which represent the
orthogonal/unitary matrix Q for tauq, P for taup. tauq and taup are tied to
the distributed matrix A. See Application Notes below.

x (local)
Pointer into the local memory to an array of size lld_x* nb. On exit, the
local pieces of the distributed m-by-nb matrix X(ix:ix+m-1, jx:jx+nb-1)
required to update the unreduced part of sub(A).

y (local).
Pointer into the local memory to an array of size lld_y* nb. On exit, the
local pieces of the distributed n-by-nb matrix Y(iy:iy+n-1, jy:jy+nb-1)
required to update the unreduced part of sub(A).

Application Notes
The matrices Q and P are represented as products of elementary reflectors:
Q = H(1)*H(2)*...*H(nb), and P = G(1)*G(2)*...*G(nb)
Each H(i) and G(i) has the form:

H(i) = I - tauq*v*v', and G(i) = I - taup*u*u',

where tauq and taup are real/complex scalars, and v and u are real/complex vectors.
If m ≥ n, v(1:i-1) = 0, v(i) = 1, and v(i:m) is stored on exit in
A(ia+i-1:ia+m-1, ja+i-1); u(1:i) = 0, u(i+1) = 1, and u(i+1:n) is stored on exit in
A(ia+i-1, ja+i:ja+n-1); tauq is stored in tauq[ja+i-2] and taup in taup[ia+i-2].

If m < n, v(1:i) = 0, v(i+1) = 1, and v(i+1:m) is stored on exit in
A(ia+i+1:ia+m-1, ja+i-1); u(1:i-1) = 0, u(i) = 1, and u(i:n) is stored on exit in
A(ia+i-1, ja+i:ja+n-1); tauq is stored in tauq[ja+i-2] and taup in taup[ia+i-2]. The elements of the
vectors v and u together form the m-by-nb matrix V and the nb-by-n matrix U', which are necessary, with
X and Y, to apply the transformation to the unreduced part of the matrix, using a block update of the form
sub(A) := sub(A) - V*Y' - X*U'. The contents of sub(A) on exit are illustrated by the following examples
with nb = 2:

m = 6 and n = 5 (m > n):            m = 5 and n = 6 (m < n):

( 1   1   u1  u1  u1 )              ( 1   u1  u1  u1  u1  u1 )
( v1  1   1   u2  u2 )              ( 1   1   u2  u2  u2  u2 )
( v1  v2  a   a   a  )              ( v1  1   a   a   a   a  )
( v1  v2  a   a   a  )              ( v1  v2  a   a   a   a  )
( v1  v2  a   a   a  )              ( v1  v2  a   a   a   a  )
( v1  v2  a   a   a  )

where a denotes an element of the original matrix which is unchanged, vi denotes an element of the vector
defining H(i), and ui an element of the vector defining G(i).
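The block update sub(A) := sub(A) - V*Y' - X*U' is an ordinary pair of matrix products. The serial, real sketch below writes it with plain loops (hypothetical function `labrd_block_update`; V and X are m-by-nb, Y and U are n-by-nb, all column-major). In practice one would typically issue two (parallel) GEMM calls instead of looping by hand.

```c
#include <assert.h>

static void labrd_block_update(int m, int n, int nb, double *a, int lda,
                               const double *v, int ldv, const double *y, int ldy,
                               const double *x, int ldx, const double *u, int ldu) {
    assert(lda >= m && ldv >= m && ldx >= m && ldy >= n && ldu >= n);
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            double s = 0.0;
            for (int k = 0; k < nb; ++k)   /* (V*Y')(i,j) + (X*U')(i,j) */
                s += v[i + k * ldv] * y[j + k * ldy] + x[i + k * ldx] * u[j + k * ldu];
            a[i + j * lda] -= s;
        }
}
```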

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?lacon
Estimates the 1-norm of a square matrix, using
reverse communication for evaluating matrix-vector
products.

Syntax
void pslacon (MKL_INT *n , float *v , MKL_INT *iv , MKL_INT *jv , MKL_INT *descv , float
*x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , MKL_INT *isgn , float *est , MKL_INT
*kase );
void pdlacon (MKL_INT *n , double *v , MKL_INT *iv , MKL_INT *jv , MKL_INT *descv ,
double *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , MKL_INT *isgn , double *est ,
MKL_INT *kase );
void pclacon (MKL_INT *n , MKL_Complex8 *v , MKL_INT *iv , MKL_INT *jv , MKL_INT
*descv , MKL_Complex8 *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , float *est ,
MKL_INT *kase );
void pzlacon (MKL_INT *n , MKL_Complex16 *v , MKL_INT *iv , MKL_INT *jv , MKL_INT
*descv , MKL_Complex16 *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , double *est ,
MKL_INT *kase );

Include Files
• mkl_scalapack.h


Description
The p?lacon function estimates the 1-norm of a square, real/complex distributed matrix A. Reverse
communication is used for evaluating matrix-vector products. x and v are aligned with the distributed matrix
A; this information is implicitly contained within iv, ix, descv, and descx.

Input Parameters

n (global) The length of the distributed vectors v and x. n ≥ 0.

v (local).
Pointer into the local memory to an array of size LOCr(n+mod(iv-1, mb_v)).
On the final return, v = A*w, where est = norm(v)/norm(w) (w is not
returned).

iv, jv (global) The row and column indices in the global matrix V indicating the
first row and the first column of the submatrix V, respectively.

descv (global and local) array of size dlen_. The array descriptor for the
distributed matrix V.

x (local).
Pointer into the local memory to an array of size LOCr(n+mod(ix-1, mb_x)).

ix, jx (global) The row and column indices in the global matrix X indicating the
first row and the first column of the submatrix X, respectively.

descx (global and local) array of size dlen_. The array descriptor for the
distributed matrix X.

isgn (local).
Array of size LOCr(n+mod(ix-1, mb_x)). isgn is aligned with x and v.

kase (local).
On the initial call to p?lacon, kase should be 0.

Output Parameters

x (local).
On an intermediate return, X should be overwritten by A*X if kase = 1, or
by A'*X if kase = 2, and p?lacon must be called again with all the other
parameters unchanged.

est (global). An estimate (a lower bound) for the 1-norm of A.

kase (local)
On an intermediate return, kase is 1 or 2, indicating whether X should be
overwritten by A*X, or A'*X. On the final return from p?lacon, kase is
again 0.
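The reverse-communication protocol (kase = 0 to start, 1 or 2 to request A*X or A'*X, 0 again when done) can be illustrated serially. `hager_step` is a simplified version of Hager's 1-norm estimator written for this example; it is not the algorithm p?lacon implements (which has further safeguards), and it omits v and isgn. The caller owns A and performs every matrix-vector product, which is the point of reverse communication.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

typedef struct { int jump, iter, j; } est_state;   /* kept between calls (hypothetical) */

static double norm1(int n, const double *x) {
    double s = 0.0;
    for (int i = 0; i < n; ++i) s += fabs(x[i]);
    return s;
}

/* Returns the new kase: 1 -> overwrite x with A*x, 2 -> with A'*x, 0 -> *est is final. */
static int hager_step(int n, double *x, double *est, est_state *s, int kase) {
    if (kase == 0) {                                /* first call: x = (1/n, ..., 1/n) */
        for (int i = 0; i < n; ++i) x[i] = 1.0 / n;
        s->jump = 1; s->iter = 0; s->j = 0;
        return 1;
    }
    if (s->jump != 2) {                             /* x now holds A*x */
        double e = norm1(n, x);
        if (s->jump == 1 || e > *est) *est = e;     /* keep the largest ||A*x||_1 seen */
        if (s->iter >= 5) return 0;                 /* iteration cap */
        for (int i = 0; i < n; ++i) x[i] = x[i] >= 0.0 ? 1.0 : -1.0;  /* x := sign(x) */
        s->jump = 2;
        return 2;
    }
    int jmax = 0;                                   /* x now holds z = A'*sign(A*x) */
    for (int i = 1; i < n; ++i) if (fabs(x[i]) > fabs(x[jmax])) jmax = i;
    if (s->iter > 0 && fabs(x[jmax]) <= x[s->j]) return 0;  /* Hager's stopping test */
    s->j = jmax; s->iter++;
    for (int i = 0; i < n; ++i) x[i] = (i == jmax) ? 1.0 : 0.0;  /* x := e_jmax */
    s->jump = 3;
    return 1;
}

/* Caller side: owns a dense n-by-n, column-major A and does every product. */
static double estimate_norm1(int n, const double *a) {
    double *x = malloc((size_t)n * sizeof *x);
    double *t = malloc((size_t)n * sizeof *t);
    double est = 0.0;
    est_state s = {0, 0, 0};
    int kase = 0;
    assert(x != NULL && t != NULL);
    for (;;) {
        kase = hager_step(n, x, &est, &s, kase);
        if (kase == 0) break;
        for (int i = 0; i < n; ++i) {               /* t = A*x (kase 1) or A'*x (kase 2) */
            t[i] = 0.0;
            for (int j = 0; j < n; ++j)
                t[i] += (kase == 1 ? a[i + j * n] : a[j + i * n]) * x[j];
        }
        for (int i = 0; i < n; ++i) x[i] = t[i];
    }
    free(x);
    free(t);
    return est;
}
```

For A = diag(1, -3, 2) the loop returns 3, which here equals the 1-norm; in general the result is only a lower bound.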

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.


p?laconsb
Looks for two consecutive small subdiagonal elements.

Syntax
void pslaconsb (const float *a, const MKL_INT *desca, const MKL_INT *i, const MKL_INT
*l, MKL_INT *m, const float *h44, const float *h33, const float *h43h34, float *buf,
const MKL_INT *lwork );
void pdlaconsb (const double *a, const MKL_INT *desca, const MKL_INT *i, const MKL_INT
*l, MKL_INT *m, const double *h44, const double *h33, const double *h43h34, double *buf,
const MKL_INT *lwork );
void pclaconsb (const MKL_Complex8 *a , const MKL_INT *desca , const MKL_INT *i , const
MKL_INT *l , MKL_INT *m , const MKL_Complex8 *h44 , const MKL_Complex8 *h33 , const
MKL_Complex8 *h43h34 , MKL_Complex8 *buf , const MKL_INT *lwork );
void pzlaconsb (const MKL_Complex16 *a , const MKL_INT *desca , const MKL_INT *i ,
const MKL_INT *l , MKL_INT *m , const MKL_Complex16 *h44 , const MKL_Complex16 *h33 ,
const MKL_Complex16 *h43h34 , MKL_Complex16 *buf , const MKL_INT *lwork );

Include Files
• mkl_scalapack.h

Description
The p?laconsb function looks for two consecutive small subdiagonal elements by analyzing the effect of
starting a double shift QR iteration given by h44, h33, and h43h34 to see if this process makes a subdiagonal
negligible.

Input Parameters

a (local)
Array of size lld_a*LOCc(n_a). On entry, the Hessenberg matrix whose
tridiagonal part is being scanned. Unchanged on exit.

desca (global and local)

Array of size dlen_. The array descriptor for the distributed matrix A.

i (global)
The global location of the bottom of the unreduced submatrix of A.
Unchanged on exit.

l (global)
The global location of the top of the unreduced submatrix of A. Unchanged
on exit.

h44, h33, h43h34 (global).

These three values are for the double shift QR iteration.

lwork (local)
This must be at least 7*ceil(ceil( (i-l)/mb_a )/lcm(nprow,
npcol)). Here lcm is the least common multiple and nprow*npcol is the
logical grid size.
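The lwork bound for p?laconsb combines nested ceilings with a least common multiple. A small helper makes the integer arithmetic explicit (hypothetical function names; assumes i ≥ l and mb_a > 0):

```c
#include <assert.h>

static int gcd_int(int a, int b) {          /* Euclid's algorithm */
    while (b != 0) { int t = a % b; a = b; b = t; }
    return a;
}

static int ceil_div(int a, int b) { return (a + b - 1) / b; }  /* a >= 0, b > 0 */

/* 7*ceil(ceil((i-l)/mb_a) / lcm(nprow, npcol)) */
static int laconsb_min_lwork(int i, int l, int mb_a, int nprow, int npcol) {
    assert(i >= l && mb_a > 0 && nprow > 0 && npcol > 0);
    int lcm = nprow / gcd_int(nprow, npcol) * npcol;
    return 7 * ceil_div(ceil_div(i - l, mb_a), lcm);
}
```

For i = 100, l = 1, mb_a = 8 on a 2 x 3 grid: ceil(99/8) = 13, lcm(2, 3) = 6, ceil(13/6) = 3, so lwork ≥ 21.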


Output Parameters

m (global). On exit, this yields the starting location of the QR double shift.
This will satisfy:

l ≤ m ≤ i-2.

buf (local).
Array of size lwork.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?lacp2
Copies all or part of a distributed matrix to another
distributed matrix.

Syntax
void pslacp2 (char *uplo , MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb );
void pdlacp2 (char *uplo , MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb );
void pclacp2 (char *uplo , MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb );
void pzlacp2 (char *uplo , MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb );

Include Files
• mkl_scalapack.h

Description
The p?lacp2 function copies all or part of a distributed matrix A to another distributed matrix B. No
communication is performed; p?lacp2 performs a local copy sub(B) := sub(A), where sub(A) denotes
A(ia:ia+m-1, ja:ja+n-1) and sub(B) denotes B(ib:ib+m-1, jb:jb+n-1).
p?lacp2 requires that only one dimension of the matrix operands is distributed.

Input Parameters

uplo (global) Specifies the part of the distributed matrix sub(A) to be copied:
= 'U': Upper triangular part is copied; the strictly lower triangular part of
sub(A) is not referenced;
= 'L': Lower triangular part is copied; the strictly upper triangular part of
sub(A) is not referenced.
Otherwise: all of the matrix sub(A) is copied.

m (global)
The number of rows in the distributed matrix sub(A). (m ≥ 0).

n (global)
The number of columns in the distributed matrix sub(A). (n ≥ 0).

a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).

On entry, this array contains the local pieces of the m-by-n distributed
matrix sub(A).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of sub(B), respectively.

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.

Output Parameters

b (local).
Pointer into the local memory to an array of size lld_b * LOCc(jb+n-1).
This array contains on exit the local pieces of the distributed matrix sub(B)
set as follows:
if uplo = 'U', B(ib+i-1, jb+j-1) = A(ia+i-1, ja+j-1), 1≤i≤j, 1≤j≤n;

if uplo = 'L', B(ib+i-1, jb+j-1) = A(ia+i-1, ja+j-1), j≤i≤m, 1≤j≤n;

otherwise, B(ib+i-1, jb+j-1) = A(ia+i-1, ja+j-1), 1≤i≤m, 1≤j≤n.
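The uplo rule above can be sketched on a single process with 0-based indices (hypothetical function `copy_part`, column-major storage; accepting lowercase uplo is an assumption of this sketch):

```c
#include <assert.h>

/* 'U' copies entries with i <= j, 'L' entries with i >= j, anything else copies all. */
static void copy_part(char uplo, int m, int n, const double *a, int lda,
                      double *b, int ldb) {
    assert(lda >= m && ldb >= m);
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            int take = (uplo == 'U' || uplo == 'u') ? (i <= j)
                     : (uplo == 'L' || uplo == 'l') ? (i >= j)
                     : 1;
            if (take) b[i + j * ldb] = a[i + j * lda];
        }
}
```

Entries outside the selected triangle are left untouched in B, matching the statement that the other triangle of sub(A) is not referenced.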

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?lacp3
Copies from a global parallel array into a local
replicated array or vice versa.

Syntax
void pslacp3 (const MKL_INT *m, const MKL_INT *i, float *a, const MKL_INT *desca, float
*b, const MKL_INT *ldb, const MKL_INT *ii, const MKL_INT *jj, const MKL_INT *rev );
void pdlacp3 (const MKL_INT *m, const MKL_INT *i, double *a, const MKL_INT *desca,
double *b, const MKL_INT *ldb, const MKL_INT *ii, const MKL_INT *jj, const MKL_INT
*rev );
void pclacp3 (const MKL_INT *m, const MKL_INT *i, MKL_Complex8 *a, const MKL_INT
*desca, MKL_Complex8 *b, const MKL_INT *ldb, const MKL_INT *ii, const MKL_INT *jj,
const MKL_INT *rev);
void pzlacp3 (const MKL_INT *m, const MKL_INT *i, MKL_Complex16 *a, const MKL_INT
*desca, MKL_Complex16 *b, const MKL_INT *ldb, const MKL_INT *ii, const MKL_INT *jj,
const MKL_INT *rev);


Include Files
• mkl_scalapack.h

Description
This is an auxiliary function that copies from a global parallel array into a local replicated array or vice versa.
Note that the entire submatrix that is copied is placed on one or more nodes. The receiving node can be
specified precisely, or all nodes can receive the data, or just one row or column of nodes.


Input Parameters

m (global)
m is the order of the square submatrix that is copied.
m≥ 0. Unchanged on exit.

i (global) The matrix element A(i, i) is the global location that the copying
starts from. Unchanged on exit.

a (local)
Array of size lld_a*LOCc(n_a). On entry, the parallel matrix to be copied
into or from.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

b (local)
Array of size ldb*LOCc(m). If rev = 0, this is the global portion of the
matrix A(i:i+m-1, i:i+m-1). If rev = 1, this is unchanged on exit.

ldb (local)

The leading dimension of B.

ii, jj (global) By using rev 0 and 1, data can be sent out and returned again. If
rev = 0, then ii is the destination row index and jj is the destination column
index for the node(s) receiving the replicated matrix B. If ii ≥ 0 and jj ≥ 0, then
node (ii, jj) receives the data. If ii = -1 and jj ≥ 0, then all rows in column jj
receive the data. If ii ≥ 0 and jj = -1, then all columns in row ii receive the data.
If ii = -1 and jj = -1, then all nodes receive the data. If rev != 0, then ii is the
source row index for the node(s) sending the replicated B.

rev (global) Use rev = 0 to send the global matrix A into the locally replicated matrix B
(on node (ii, jj)). Use rev != 0 to send the locally replicated B from node (ii, jj)
to its owner (which changes depending on its location in A) into the global
A.
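With rev = 0, the ii/jj rules above reduce to a simple predicate in which -1 acts as a wildcard for a grid row or column. A sketch (hypothetical function `lacp3_receives`; myrow and mycol are process coordinates such as those returned by blacs_gridinfo):

```c
#include <assert.h>
#include <stdbool.h>

/* rev = 0: does process (myrow, mycol) receive the replicated B? */
static bool lacp3_receives(int ii, int jj, int myrow, int mycol) {
    assert(ii >= -1 && jj >= -1);
    bool row_ok = (ii == -1) || (myrow == ii);   /* -1: every grid row */
    bool col_ok = (jj == -1) || (mycol == jj);   /* -1: every grid column */
    return row_ok && col_ok;
}
```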

Output Parameters

a On exit, if rev = 1, the copied data. Unchanged on exit if rev = 0.

b If rev = 1, this is unchanged on exit.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?lacpy
Copies all or part of one two-dimensional array to
another.

Syntax
void pslacpy (char *uplo , MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb );
void pdlacpy (char *uplo , MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb );
void pclacpy (char *uplo , MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb );
void pzlacpy (char *uplo , MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb );

Include Files
• mkl_scalapack.h

Description
The p?lacpy function copies all or part of a distributed matrix A to another distributed matrix B. No
communication is performed; p?lacpy performs a local copy sub(B) := sub(A), where sub(A) denotes
A(ia:ia+m-1, ja:ja+n-1) and sub(B) denotes B(ib:ib+m-1, jb:jb+n-1).

Input Parameters

uplo (global) Specifies the part of the distributed matrix sub(A) to be copied:
= 'U': Upper triangular part; the strictly lower triangular part of sub(A) is
not referenced;
= 'L': Lower triangular part; the strictly upper triangular part of sub(A) is
not referenced.
Otherwise: all of the matrix sub(A) is copied.

m (global)
The number of rows in the distributed matrix sub(A). (m≥0).

n (global)
The number of columns in the distributed matrix sub(A). (n≥0).

a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).
On entry, this array contains the local pieces of the distributed matrix
sub(A).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of sub(B) respectively.

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.

Output Parameters

b (local).
Pointer into the local memory to an array of size lld_b * LOCc(jb+n-1).
This array contains on exit the local pieces of the distributed matrix sub(B)
set as follows:
if uplo = 'U', B(ib+i-1, jb+j-1) = A(ia+i-1, ja+j-1), 1≤i≤j, 1≤j≤n;

if uplo = 'L', B(ib+i-1, jb+j-1) = A(ia+i-1, ja+j-1), j≤i≤m, 1≤j≤n;

otherwise, B(ib+i-1, jb+j-1) = A(ia+i-1, ja+j-1), 1≤i≤m, 1≤j≤n.
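On each process, p?lacpy applies the element rule above to its local pieces. A serial C sketch of that rule for a single column-major array follows; lacpy_serial is a hypothetical name, indices in the code are 0-based (the formulas above are 1-based), and for uplo = 'U' the row range is clipped to m when n > m, which is how LAPACK's lacpy treats trapezoidal matrices.

```c
/* Serial sketch of the per-element rule of p?lacpy (hypothetical helper,
   not an MKL routine): copies the m-by-n column-major matrix a (leading
   dimension lda) into b (leading dimension ldb). uplo = 'U' copies the upper
   trapezoid, 'L' the lower trapezoid, anything else the whole matrix. */
static void lacpy_serial(char uplo, int m, int n,
                         const double *a, int lda, double *b, int ldb)
{
    for (int j = 0; j < n; ++j) {
        int first = 0, last = m;              /* rows [first, last) of column j */
        if (uplo == 'U' || uplo == 'u') {
            last = (j + 1 < m) ? j + 1 : m;   /* 1 <= i <= min(j, m), 1-based */
        } else if (uplo == 'L' || uplo == 'l') {
            first = j;                        /* j <= i <= m, 1-based */
        }
        for (int i = first; i < last; ++i)
            b[i + j * ldb] = a[i + j * lda];
    }
}
```

Elements of b outside the selected triangle are left untouched, matching the "not referenced" wording for the other part of sub(A).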

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?laevswp
Moves the eigenvectors from where they are
computed to ScaLAPACK standard block cyclic array.

Syntax
void pslaevswp (MKL_INT *n , float *zin , MKL_INT *ldzi , float *z , MKL_INT *iz ,
MKL_INT *jz , MKL_INT *descz , MKL_INT *nvs , MKL_INT *key , float *work , MKL_INT
*lwork );
void pdlaevswp (MKL_INT *n , double *zin , MKL_INT *ldzi , double *z , MKL_INT *iz ,
MKL_INT *jz , MKL_INT *descz , MKL_INT *nvs , MKL_INT *key , double *work , MKL_INT
*lwork );
void pclaevswp (MKL_INT *n , float *zin , MKL_INT *ldzi , MKL_Complex8 *z , MKL_INT
*iz , MKL_INT *jz , MKL_INT *descz , MKL_INT *nvs , MKL_INT *key , float *rwork ,
MKL_INT *lrwork );
void pzlaevswp (MKL_INT *n , double *zin , MKL_INT *ldzi , MKL_Complex16 *z , MKL_INT
*iz , MKL_INT *jz , MKL_INT *descz , MKL_INT *nvs , MKL_INT *key , double *rwork ,
MKL_INT *lrwork );

Include Files
• mkl_scalapack.h

Description
The p?laevswp function moves the eigenvectors (potentially unsorted) from where they are computed to a
ScaLAPACK standard block cyclic array, sorted so that the corresponding eigenvalues are in sorted order.

Input Parameters
np = the number of rows local to a given process.
nq = the number of columns local to a given process.

n (global)
The order of the matrix A. n ≥ 0.

zin (local).
Array of size ldzi * nvs[iam+1]. The eigenvectors on input. iam is a process
rank from [0, nprocs) interval. Each eigenvector resides entirely in one
process. Each process holds a contiguous set of nvs[iam+1] eigenvectors.
The global number of the first eigenvector that the process holds is: ((sum
for i=[0, iam] of nvs[i])+1).

ldzi (local)
The leading dimension of the zin array.

iz, jz (global) The row and column indices in the global matrix Z indicating the
first row and the first column of the submatrix Z, respectively.

descz (global and local)

Array of size dlen_. The array descriptor for the distributed matrix Z.

nvs (global)
Array of size nprocs+1
nvs[i] = number of eigenvectors held by processes [0, i)
nvs[0] = number of eigenvectors held by processes [0, 0) = 0
nvs[nprocs]= number of eigenvectors held by [0, nprocs)= total number of
eigenvectors.

key (global)
Array of size n. Indicates the actual index (after sorting) for each of the
eigenvectors.

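rwork (local).
Array of size lrwork. In the real flavors pslaevswp and pdlaevswp, this
argument is named work and has size lwork.

lrwork (local)
Size of rwork (lwork in the real flavors).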

Output Parameters

z (local).
Array of global size n*n and of local size lld_z*nq. The eigenvectors on
output. The eigenvectors are distributed in a block cyclic manner in both
dimensions, with a block size of nb.
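Reading the nvs entry literally (nvs[i] counts the eigenvectors held by processes 0 through i-1), the number of eigenvectors on process iam and the global index of its first one follow from two adjacent entries. The C sketch below assumes that cumulative reading; evec_range and nvs_range are hypothetical names. The zin entry words the per-process count differently, so treat this convention as an assumption and check it against your ScaLAPACK version.

```c
/* Hypothetical helper, not part of MKL: interprets nvs as cumulative,
   so nvs[0] == 0 and nvs[nprocs] is the total number of eigenvectors. */
typedef struct {
    int count;        /* eigenvectors held by process iam */
    int first_global; /* 1-based global index of its first eigenvector */
} evec_range;

static evec_range nvs_range(const int *nvs, int iam)
{
    evec_range r;
    r.count = nvs[iam + 1] - nvs[iam];  /* difference of running totals */
    r.first_global = nvs[iam] + 1;      /* all lower ranks come first */
    return r;
}
```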


See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?lahrd
Reduces the first nb columns of a general rectangular
matrix A so that elements below the k-th subdiagonal
are zero, by an orthogonal/unitary transformation,
and returns auxiliary matrices that are needed to
apply the transformation to the unreduced part of A.

Syntax
void pslahrd (MKL_INT *n , MKL_INT *k , MKL_INT *nb , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *tau , float *t , float *y , MKL_INT *iy , MKL_INT *jy ,
MKL_INT *descy , float *work );
void pdlahrd (MKL_INT *n , MKL_INT *k , MKL_INT *nb , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *tau , double *t , double *y , MKL_INT *iy , MKL_INT *jy ,
MKL_INT *descy , double *work );
void pclahrd (MKL_INT *n , MKL_INT *k , MKL_INT *nb , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *t , MKL_Complex8 *y ,
MKL_INT *iy , MKL_INT *jy , MKL_INT *descy , MKL_Complex8 *work );
void pzlahrd (MKL_INT *n , MKL_INT *k , MKL_INT *nb , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *t , MKL_Complex16
*y , MKL_INT *iy , MKL_INT *jy , MKL_INT *descy , MKL_Complex16 *work );

Include Files
• mkl_scalapack.h

Description
The p?lahrd function reduces the first nb columns of a general n-by-(n-k+1) distributed matrix
A(ia:ia+n-1, ja:ja+n-k) so that elements below the k-th subdiagonal are zero. The reduction is performed by
an orthogonal/unitary similarity transformation Q'*A*Q. The function returns the matrices V and T, which
determine Q as a block reflector I - V*T*V', and also the matrix Y = A*V*T.

This is an auxiliary function called by p?gehrd. In the following comments sub(A) denotes A(ia:ia+n-1,
ja:ja+n-1).

Input Parameters

n (global)
The order of the distributed matrix sub(A). n ≥ 0.

k (global)
The offset for the reduction. Elements below the k-th subdiagonal in the
first nb columns are reduced to zero.

nb (global)
The number of columns to be reduced.

a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-k). On
entry, this array contains the local pieces of the n-by-(n-k+1) general
distributed matrix A(ia:ia+n-1, ja:ja+n-k).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

iy, jy (global) The row and column indices in the global matrix Y indicating the
first row and the first column of the matrix sub(Y), respectively.

descy (global and local) array of size dlen_. The array descriptor for the
distributed matrix Y.

work (local).
Array of size nb.

Output Parameters

a (local).
On exit, the elements on and above the k-th subdiagonal in the first nb
columns are overwritten with the corresponding elements of the reduced
distributed matrix; the elements below the k-th subdiagonal, with the array
tau, represent the matrix Q as a product of elementary reflectors. The other
columns of the matrix A(ia:ia+n-1, ja:ja+n-k) are unchanged. (See
Application Notes below.)

tau (local)
Array of size LOCc(ja+n-2). The scalar factors of the elementary reflectors
(see Application Notes below). tau is tied to the distributed matrix A.

t (local)
Array of size nb_a*nb_a. The upper triangular matrix T.

y (local).
Pointer into the local memory to an array of size lld_y*nb_a. On exit, this
array contains the local pieces of the n-by-nb distributed matrix Y. lld_y ≥
LOCr(ia+n-1).

Application Notes
The matrix Q is represented as a product of nb elementary reflectors
Q = H(1)*H(2)*...*H(nb).
Each H(i) has the form

H(i) = I - tau*v*v',
where tau is a real/complex scalar, and v is a real/complex vector with v(1:i+k-1) = 0, v(i+k) = 1;
v(i+k+1:n) is stored on exit in A(ia+i+k:ia+n-1, ja+i-1), and tau in tau[ja+i-2].


The elements of the vectors v together form the (n-k+1)-by-nb matrix V, which is needed, with T and Y, to
apply the transformation to the unreduced part of the matrix, using an update of the form: A(ia:ia+n-1,
ja:ja+n-k) := (I-V*T*V')*(A(ia:ia+n-1, ja:ja+n-k)-Y*V'). The contents of A(ia:ia+n-1, ja:ja+n-k) on exit
are illustrated by the following example with n = 7, k = 3, and nb = 2:

( a   h   a   a   a )
( a   h   a   a   a )
( a   h   a   a   a )
( h   h   a   a   a )
( v1  h   a   a   a )
( v1  v2  a   a   a )
( v1  v2  a   a   a )

where a denotes an element of the original matrix A(ia:ia+n-1, ja:ja+n-k), h denotes a modified element
of the upper Hessenberg matrix H, and vi denotes an element of the vector defining H(i).
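Each factor H(i) = I - tau*v*v' can be applied to a vector without forming the matrix, because H*x = x - tau*v*(v'*x). The serial C sketch below shows this for the real case only; apply_reflector is a hypothetical helper, and it expects the full vector v (in p?lahrd the leading zeros and the unit entry v(i+k) = 1 are implicit rather than stored).

```c
/* Serial sketch (hypothetical helper): applies one real elementary reflector
   H = I - tau*v*v' to a vector x of length n in place, i.e. x := H*x.
   A complex version would use the conjugate transpose of v. */
static void apply_reflector(int n, double tau, const double *v, double *x)
{
    double dot = 0.0;                 /* v' * x */
    for (int i = 0; i < n; ++i)
        dot += v[i] * x[i];
    for (int i = 0; i < n; ++i)
        x[i] -= tau * v[i] * dot;     /* x - tau * v * (v' * x) */
}
```

The cost is two passes over x (one dot product, one update), which is why reflectors are stored as (v, tau) pairs rather than as explicit matrices.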

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?laiect
Exploits IEEE arithmetic to accelerate the
computations of eigenvalues.

Syntax
void pslaiect (float *sigma , MKL_INT *n , float *d , MKL_INT *count );
void pdlaiectb (double *sigma , MKL_INT *n , double *d , MKL_INT *count );
void pdlaiectl (double *sigma , MKL_INT *n , double *d , MKL_INT *count );

Include Files
• mkl_scalapack.h

Description
The p?laiect function computes the number of negative eigenvalues of (A - σI). This implementation of the
Sturm sequence loop exploits IEEE arithmetic and has no conditionals in the innermost loop. The sign bit for
the single-precision function pslaiect is assumed to be bit 32. The double-precision functions pdlaiectb and
pdlaiectl differ in the order of the double-precision word storage and, consequently, in the sign bit location.
For pdlaiectb, the double-precision word is stored in big-endian word order and the sign bit is assumed to be
bit 32. For pdlaiectl, the double-precision word is stored in little-endian word order and the sign bit is
assumed to be bit 64.
This is a ScaLAPACK internal function, and its arguments are not checked for unreasonable values.

Input Parameters

sigma The shift. p?laiect finds the number of eigenvalues less than or equal to
sigma.

n The order of the tridiagonal matrix T. n≥ 1.

d Array of size 2n-1.

On entry, this array contains the diagonals and the squares of the
off-diagonal elements of the tridiagonal matrix T. These elements are assumed
to be interleaved in memory for better cache performance. The diagonal
entries of T are in the entries d[0], d[2], ..., d[2n-2], while the
squares of the off-diagonal entries are d[1], d[3], ..., d[2n-3]. To
avoid overflow, the matrix must be scaled so that its largest entry is no
greater than overflow^(1/2) * underflow^(1/4) in absolute value, and for
greatest accuracy, it should not be much smaller than that.

Output Parameters

count The number of eigenvalues of T less than or equal to sigma.
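The count rests on the Sturm sequence (LDL' pivot) recurrence q(1) = a(1) - sigma, q(i) = a(i) - sigma - b(i-1)^2/q(i-1), where a(i) are the diagonal entries and b(i) the off-diagonal entries of T; the number of pivots q(i) with the sign bit set counts the eigenvalues below sigma. The portable C99 sketch below uses signbit() instead of reading a fixed bit position, so unlike pslaiect, pdlaiectb, and pdlaiectl it does not depend on word order. sturm_count is a hypothetical name, and d uses the interleaved layout described above. As in the IEEE approach, a zero pivot is not special-cased: the next division yields an infinity that the recurrence absorbs.

```c
#include <math.h>

/* Hypothetical helper, not the MKL routine: counts eigenvalues of the
   symmetric tridiagonal T below sigma with a Sturm sequence.
   d is interleaved: d[2*i] = T(i,i), d[2*i+1] = T(i,i+1)^2 (0-based). */
static int sturm_count(double sigma, int n, const double *d)
{
    double q = d[0] - sigma;              /* first pivot */
    int count = signbit(q) != 0;
    for (int i = 1; i < n; ++i) {
        /* q_i = (a_i - sigma) - b_{i-1}^2 / q_{i-1}; a zero pivot makes the
           division produce an infinity, so no test for zero is needed */
        q = (d[2 * i] - sigma) - d[2 * i - 1] / q;
        count += signbit(q) != 0;
    }
    return count;
}
```

For T with diagonal 2 and off-diagonal 1 (n = 3, eigenvalues 2 - √2, 2, 2 + √2), sigma = 1 yields 1 and sigma = 2.5 yields 2.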

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?lamve
Copies all or part of one two-dimensional distributed
array to another.

Syntax
void pslamve(char* uplo, MKL_INT* m, MKL_INT* n, float* a, MKL_INT* ia, MKL_INT* ja,
MKL_INT* desca, float* b, MKL_INT* ib, MKL_INT* jb, MKL_INT* descb, float* dwork);
void pdlamve(char* uplo, MKL_INT* m, MKL_INT* n, double* a, MKL_INT* ia, MKL_INT* ja,
MKL_INT* desca, double* b, MKL_INT* ib, MKL_INT* jb, MKL_INT* descb, double* dwork);

Include Files
• mkl_scalapack.h

Description
p?lamve copies all or part of a distributed matrix A to another distributed matrix B. There are no alignment
assumptions, except that sub(A) and sub(B) must have the same size (m-by-n).

Input Parameters

uplo (global )
Specifies the part of the distributed matrix sub( A ) to be copied:
= 'U': Upper triangular part is copied; the strictly lower triangular part of
sub( A ) is not referenced;
= 'L': Lower triangular part is copied; the strictly upper triangular part of
sub( A ) is not referenced;
Otherwise: All of the matrix sub( A ) is copied.

m (global )
The number of rows to be operated on, which is the number of rows of the
distributed matrix sub( A ). m≥ 0.

n (global )
The number of columns to be operated on, which is the number of columns
of the distributed matrix sub( A ). n≥ 0.

a (local ) pointer into the local memory to an array of size lld_a * LOCc(ja
+n-1) . This array contains the local pieces of the distributed matrix
sub( A ) to be copied from.

ia (global )
The row index in the global matrix A indicating the first row of sub( A ).

ja (global )
The column index in the global matrix A indicating the first column of
sub( A ).

desca (global and local) array of size dlen_.

The array descriptor for the distributed matrix A.

ib (global )
The row index in the global matrix B indicating the first row of sub( B ).

jb (global )
The column index in the global matrix B indicating the first column of
sub( B ).

descb (global and local) array of size dlen_.

The array descriptor for the distributed matrix B.

dwork (local workspace) array

If uplo = 'U' or uplo = 'L' and the number of processes is greater than 1,
dwork must be at least as long as b.
Otherwise, dwork is not referenced.

Output Parameters

b (local ) pointer into the local memory to an array of size lld_b * LOCc(jb
+n-1) . This array contains on exit the local pieces of the distributed matrix
sub( B ).

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?lange
Returns the value of the 1-norm, Frobenius norm,
infinity-norm, or the largest absolute value of any
element, of a general rectangular matrix.

Syntax
float pslange (char *norm , MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *work );
double pdlange (char *norm , MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *work );
float pclange (char *norm , MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , float *work );
double pzlange (char *norm , MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *work );

Include Files
• mkl_scalapack.h

Description
The p?lange function returns the value of the 1-norm, or the Frobenius norm, or the infinity norm, or the
element of largest absolute value of a distributed matrix sub(A) = A(ia:ia+m-1, ja:ja+n-1).

Input Parameters

norm (global) Specifies what value is returned by the function:

= 'M' or 'm': val = max(abs(Aij)), largest absolute value of the matrix
A; note that this is not a consistent matrix norm.
= '1' or 'O' or 'o': val = norm1(A), 1-norm of the matrix A
(maximum column sum),
= 'I' or 'i': val = normI(A), infinity norm of the matrix A (maximum
row sum),
= 'F', 'f', 'E' or 'e': val = normF(A), Frobenius norm of the matrix
A (square root of sum of squares).

m (global)
The number of rows in the distributed matrix sub(A). When m = 0,
p?lange returns zero. m ≥ 0.

n (global)
The number of columns in the distributed matrix sub(A). When n = 0,
p?lange returns zero. n ≥ 0.

a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1)
containing the local pieces of the distributed matrix sub(A).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

work (local).
Array of size lwork, where
lwork ≥ 0 if norm = 'M' or 'm' (not referenced),
lwork ≥ nq0 if norm = '1', 'O' or 'o',
lwork ≥ mp0 if norm = 'I' or 'i',
lwork ≥ 0 if norm = 'F', 'f', 'E' or 'e' (not referenced),

where
iroffa = mod(ia-1, mb_a), icoffa = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, myrow, rsrc_a, nprow),
iacol = indxg2p(ja, nb_a, mycol, csrc_a, npcol),
mp0 = numroc(m+iroffa, mb_a, myrow, iarow, nprow),
nq0 = numroc(n+icoffa, nb_a, mycol, iacol, npcol).
indxg2p and numroc are ScaLAPACK tool functions; myrow, mycol, nprow,
and npcol can be determined by calling the function blacs_gridinfo.
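To size work before the call, mp0 and nq0 can be evaluated from the descriptor entries. The C sketch below re-implements numroc and indxg2p from their usual definitions instead of calling the ScaLAPACK tool functions, so that it runs without MPI or BLACS; numroc_ref, indxg2p_ref, and lange_lwork are hypothetical names, and indxg2p_ref drops the unused process argument of indxg2p.

```c
/* Hypothetical re-implementations of the ScaLAPACK tool functions numroc
   and indxg2p (1-based global indices), used to evaluate the p?lange
   workspace formula without MPI or BLACS. */

/* Elements of an n-long dimension, split in blocks of nb, owned by process
   iproc when the first block lives on process isrcproc. */
static int numroc_ref(int n, int nb, int iproc, int isrcproc, int nprocs)
{
    int mydist = (nprocs + iproc - isrcproc) % nprocs;
    int nblocks = n / nb;                  /* complete blocks */
    int num = (nblocks / nprocs) * nb;     /* full rounds over all processes */
    int extrablks = nblocks % nprocs;
    if (mydist < extrablks)
        num += nb;                         /* one extra complete block */
    else if (mydist == extrablks)
        num += n % nb;                     /* the trailing partial block */
    return num;
}

/* Process coordinate that owns global index indxglob. */
static int indxg2p_ref(int indxglob, int nb, int isrcproc, int nprocs)
{
    return (isrcproc + (indxglob - 1) / nb) % nprocs;
}

/* Minimum lwork for p?lange on process (myrow, mycol). */
static int lange_lwork(char norm, int m, int n, int ia, int ja,
                       int mb_a, int nb_a, int rsrc_a, int csrc_a,
                       int myrow, int mycol, int nprow, int npcol)
{
    int iroffa = (ia - 1) % mb_a;
    int icoffa = (ja - 1) % nb_a;
    int iarow = indxg2p_ref(ia, mb_a, rsrc_a, nprow);
    int iacol = indxg2p_ref(ja, nb_a, csrc_a, npcol);
    if (norm == '1' || norm == 'O' || norm == 'o')
        return numroc_ref(n + icoffa, nb_a, mycol, iacol, npcol);  /* nq0 */
    if (norm == 'I' || norm == 'i')
        return numroc_ref(m + iroffa, mb_a, myrow, iarow, nprow);  /* mp0 */
    return 0;  /* 'M', 'F', 'E': work is not referenced */
}
```

For example, a 5-by-5 sub(A) starting at ia = ja = 1 with 2-by-2 blocks on a 2-by-2 grid gives mp0 = 3 on process row 0 and mp0 = 2 on process row 1, so those rows need lwork ≥ 3 and lwork ≥ 2 respectively for norm = 'I'.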

Output Parameters

val The value returned by the function.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?lanhs
Returns the value of the 1-norm, Frobenius norm,
infinity-norm, or the largest absolute value of any
element, of an upper Hessenberg matrix.

Syntax
float pslanhs (char *norm , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *work );
double pdlanhs (char *norm , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , double *work );
float pclanhs (char *norm , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , float *work );
double pzlanhs (char *norm , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *work );

Include Files
• mkl_scalapack.h

Description
The p?lanhs function returns the value of the 1-norm, or the Frobenius norm, or the infinity norm, or the
element of largest absolute value of an upper Hessenberg distributed matrix sub(A) = A(ia:ia+n-1,
ja:ja+n-1).

Input Parameters

norm (global) Specifies the value to be returned by the function:

= 'M' or 'm': val = max(abs(Aij)), largest absolute value of the matrix
A.
= '1' or 'O' or 'o': val = norm1(A), 1-norm of the matrix A
(maximum column sum),
= 'I' or 'i': val = normI(A), infinity norm of the matrix A (maximum
row sum),
= 'F', 'f', 'E' or 'e': val = normF(A), Frobenius norm of the matrix
A (square root of sum of squares).

n (global)
The order of the distributed matrix sub(A), that is, its number of rows and
columns. When n = 0, p?lanhs returns zero. n ≥ 0.

a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1)
containing the local pieces of the distributed matrix sub(A).

ia, ja (global)
The row and column indices in the global matrix A indicating the first row
and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

work (local).
Array of size lwork, where
lwork ≥ 0 if norm = 'M' or 'm' (not referenced),
lwork ≥ nq0 if norm = '1', 'O' or 'o',
lwork ≥ mp0 if norm = 'I' or 'i',
lwork ≥ 0 if norm = 'F', 'f', 'E' or 'e' (not referenced),

where
iroffa = mod(ia-1, mb_a), icoffa = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, myrow, rsrc_a, nprow),
iacol = indxg2p(ja, nb_a, mycol, csrc_a, npcol),
mp0 = numroc(n+iroffa, mb_a, myrow, iarow, nprow),
nq0 = numroc(n+icoffa, nb_a, mycol, iacol, npcol).
indxg2p and numroc are ScaLAPACK tool functions; myrow, mycol, nprow,
and npcol can be determined by calling the function blacs_gridinfo.

Output Parameters

val The value returned by the function.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.


p?lansy, p?lanhe
Returns the value of the 1-norm, Frobenius norm,
infinity-norm, or the largest absolute value of any
element, of a real symmetric or a complex Hermitian
matrix.

Syntax
float pslansy (char *norm , char *uplo , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *work );
double pdlansy (char *norm , char *uplo , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *work );
float pclansy (char *norm , char *uplo , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , float *work );
double pzlansy (char *norm , char *uplo , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *work );
float pclanhe (char *norm , char *uplo , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , float *work );
double pzlanhe (char *norm , char *uplo , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *work );

Include Files
• mkl_scalapack.h

Description
The p?lansy and p?lanhe functions return the value of the 1-norm, or the Frobenius norm, or the infinity
norm, or the element of largest absolute value of a symmetric (p?lansy) or Hermitian (p?lanhe) distributed
matrix sub(A) = A(ia:ia+n-1, ja:ja+n-1).

Input Parameters

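norm (global) Specifies what value is returned by the function:

= 'M' or 'm': val = max(abs(Aij)), largest absolute value of the matrix
A.
= '1' or 'O' or 'o': val = norm1(A), 1-norm of the matrix A
(maximum column sum),
= 'I' or 'i': val = normI(A), infinity norm of the matrix A (maximum
row sum),
= 'F', 'f', 'E' or 'e': val = normF(A), Frobenius norm of the matrix
A (square root of sum of squares).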

uplo (global) Specifies whether the upper or lower triangular part of the
symmetric matrix sub(A) is to be referenced.
= 'U': Upper triangular part of sub(A) is referenced,

= 'L': Lower triangular part of sub(A) is referenced.

n (global)
The order of the distributed matrix sub(A), that is, its number of rows and
columns. When n = 0, p?lansy and p?lanhe return zero. n ≥ 0.

a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1)
containing the local pieces of the distributed matrix sub(A).
If uplo = 'U', the leading n-by-n upper triangular part of sub(A) contains
the upper triangular matrix whose norm is to be computed, and the strictly
lower triangular part of this matrix is not referenced. If uplo = 'L', the
leading n-by-n lower triangular part of sub(A) contains the lower triangular
matrix whose norm is to be computed, and the strictly upper triangular part
of sub(A) is not referenced.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

work (local).
Array of size lwork, where
lwork ≥ 0 if norm = 'M' or 'm' (not referenced),
lwork ≥ 2*nq0+mp0+ldw if norm = '1', 'O', 'o', 'I' or 'i',
lwork ≥ 0 if norm = 'F', 'f', 'E' or 'e' (not referenced),

where ldw is given by:
if( nprow≠npcol ) then
ldw = mb_a*iceil(iceil(mp0,mb_a),(lcm/nprow))
else
ldw = 0
end if

and lcm is the least common multiple of nprow and npcol, lcm =
ilcm( nprow, npcol ), iceil(x,y) is a ScaLAPACK function that
returns ceiling(x/y), and

iroffa = mod(ia-1, mb_a), icoffa = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, myrow, rsrc_a, nprow),
iacol = indxg2p(ja, nb_a, mycol, csrc_a, npcol),
mp0 = numroc(n+iroffa, mb_a, myrow, iarow, nprow),
nq0 = numroc(n+icoffa, nb_a, mycol, iacol, npcol).

ilcm, iceil, indxg2p, and numroc are ScaLAPACK tool functions; myrow,
mycol, nprow, and npcol can be determined by calling the function
blacs_gridinfo.

Output Parameters

val The value returned by the function.
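Because sub(A) is symmetric (or Hermitian), its 1-norm and infinity norm coincide, and each stored off-diagonal element contributes to two column sums. The serial C sketch below shows this for uplo = 'U' on real data; lansy_upper_1norm is a hypothetical name, and returning -1.0 on allocation failure is a convention chosen only for the sketch.

```c
#include <math.h>
#include <stdlib.h>

/* Hypothetical helper: 1-norm (equal to the infinity norm) of a real
   symmetric n-by-n column-major matrix whose upper triangle is stored
   (uplo = 'U'). The strictly lower triangle of a is never read. */
static double lansy_upper_1norm(int n, const double *a, int lda)
{
    double *colsum = calloc(n > 0 ? (size_t)n : 1, sizeof *colsum);
    double val = 0.0;
    if (colsum == NULL)
        return -1.0;                                  /* allocation failure */
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < j; ++i) {
            /* stored a(i,j) also stands for the unstored a(j,i) */
            double t = fabs(a[i + (size_t)j * lda]);
            colsum[j] += t;
            colsum[i] += t;
        }
        colsum[j] += fabs(a[j + (size_t)j * lda]);    /* diagonal */
    }
    for (int j = 0; j < n; ++j)
        if (colsum[j] > val)
            val = colsum[j];
    free(colsum);
    return val;
}
```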


See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?lantr
Returns the value of the 1-norm, Frobenius norm,
infinity-norm, or the largest absolute value of any
element, of a triangular matrix.

Syntax
float pslantr (char *norm , char *uplo , char *diag , MKL_INT *m , MKL_INT *n , float
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *work );
double pdlantr (char *norm , char *uplo , char *diag , MKL_INT *m , MKL_INT *n , double
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *work );
float pclantr (char *norm , char *uplo , char *diag , MKL_INT *m , MKL_INT *n ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *work );
double pzlantr (char *norm , char *uplo , char *diag , MKL_INT *m , MKL_INT *n ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *work );

Include Files
• mkl_scalapack.h

Description
The p?lantr function returns the value of the 1-norm, or the Frobenius norm, or the infinity norm, or the
element of largest absolute value of a trapezoidal or triangular distributed matrix sub(A) = A(ia:ia+m-1,
ja:ja+n-1).

Input Parameters

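norm (global) Specifies what value is returned by the function:

= 'M' or 'm': val = max(abs(Aij)), largest absolute value of the matrix
A.
= '1' or 'O' or 'o': val = norm1(A), 1-norm of the matrix A
(maximum column sum),
= 'I' or 'i': val = normI(A), infinity norm of the matrix A (maximum
row sum),
= 'F', 'f', 'E' or 'e': val = normF(A), Frobenius norm of the matrix
A (square root of sum of squares).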

uplo (global)
Specifies whether the distributed matrix sub(A) is upper or lower
trapezoidal:
= 'U': Upper trapezoidal,

= 'L': Lower trapezoidal.

Note that sub(A) is triangular instead of trapezoidal if m = n.

diag (global)
Specifies whether the distributed matrix sub(A) has unit diagonal.

= 'N': Non-unit diagonal.

= 'U': Unit diagonal.

m (global)
The number of rows in the distributed matrix sub(A). When m = 0,
p?lantr returns zero. m ≥ 0.

n (global)
The number of columns in the distributed matrix sub(A). When n = 0,
p?lantr returns zero. n ≥ 0.

a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1)
containing the local pieces of the distributed matrix sub(A).

ia, ja (global)
The row and column indices in the global matrix A indicating the first row
and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

work (local).
Array of size lwork, where
lwork ≥ 0 if norm = 'M' or 'm' (not referenced),
lwork ≥ nq0 if norm = '1', 'O' or 'o',
lwork ≥ mp0 if norm = 'I' or 'i',
lwork ≥ 0 if norm = 'F', 'f', 'E' or 'e' (not referenced),

where
iroffa = mod(ia-1, mb_a), icoffa = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, myrow, rsrc_a, nprow),
iacol = indxg2p(ja, nb_a, mycol, csrc_a, npcol),
mp0 = numroc(m+iroffa, mb_a, myrow, iarow, nprow),
nq0 = numroc(n+icoffa, nb_a, mycol, iacol, npcol).
indxg2p and numroc are ScaLAPACK tool functions; myrow, mycol, nprow,
and npcol can be determined by calling the function blacs_gridinfo.

Output Parameters

val The value returned by the function.
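The diag and trapezoidal rules interact in the column sums: with diag = 'U', the stored diagonal is skipped and counted as 1, but only in columns that actually contain a diagonal element (columns j ≤ m, 1-based). A serial C sketch for uplo = 'U' and norm = '1' on real data follows; lantr_upper_1norm is a hypothetical name, and its column rule is intended to follow LAPACK's ?lantr rather than reproduce MKL code.

```c
#include <math.h>

/* Hypothetical helper: 1-norm of an m-by-n upper trapezoidal real
   column-major matrix. With diag = 'U' the stored diagonal is ignored and
   taken as 1; entries below the diagonal are never read. */
static double lantr_upper_1norm(char diag, int m, int n,
                                const double *a, int lda)
{
    int unit = (diag == 'U' || diag == 'u');
    double val = 0.0;
    for (int j = 0; j < n; ++j) {
        double s = 0.0;
        int last = (j + 1 < m) ? j + 1 : m;  /* rows 0..min(j, m-1), 0-based */
        if (unit && j < m) {
            s = 1.0;                         /* implicit unit diagonal */
            last = j;                        /* strictly above the diagonal */
        }
        for (int i = 0; i < last; ++i)
            s += fabs(a[i + j * lda]);
        if (s > val)
            val = s;
    }
    return val;
}
```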

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?lapiv
Applies a permutation matrix to a general distributed
matrix, resulting in row or column pivoting.


Syntax
void pslapiv (char *direc , char *rowcol , char *pivroc , MKL_INT *m , MKL_INT *n ,
float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , MKL_INT *ip ,
MKL_INT *jp , MKL_INT *descip , MKL_INT *iwork );
void pdlapiv (char *direc , char *rowcol , char *pivroc , MKL_INT *m , MKL_INT *n ,
double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , MKL_INT *ip ,
MKL_INT *jp , MKL_INT *descip , MKL_INT *iwork );
void pclapiv (char *direc , char *rowcol , char *pivroc , MKL_INT *m , MKL_INT *n ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , MKL_INT
*ip , MKL_INT *jp , MKL_INT *descip , MKL_INT *iwork );
void pzlapiv (char *direc , char *rowcol , char *pivroc , MKL_INT *m , MKL_INT *n ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , MKL_INT
*ip , MKL_INT *jp , MKL_INT *descip , MKL_INT *iwork );

Include Files
• mkl_scalapack.h

Description
The p?lapiv function applies either P (permutation matrix indicated by ipiv) or inv(P) to a general m-by-n
distributed matrix sub(A) = A(ia:ia+m-1, ja:ja+n-1), resulting in row or column pivoting. The pivot
vector may be distributed across a process row or a process column. The pivot vector should be aligned with
the distributed matrix A. This function transposes the pivot vector, if necessary.
For example, if the row pivots should be applied to the columns of sub(A), pass rowcol='C' and
pivroc='C'.

Input Parameters

direc (global)
Specifies in which order the permutation is applied:
= 'F' (Forward): Applies pivots forward from top of matrix. Computes
P*sub(A).
= 'B' (Backward): Applies pivots backward from bottom of matrix.
Computes inv(P)*sub(A).

rowcol (global)
Specifies if the rows or columns are to be permuted:
= 'R': Rows will be permuted,

= 'C': Columns will be permuted.

pivroc (global)
Specifies whether ipiv is distributed over a process row or column:
= 'R': ipiv is distributed over a process row,

= 'C': ipiv is distributed over a process column.

m (global)
The number of rows in the distributed matrix sub(A). m ≥ 0.

n (global)
The number of columns in the distributed matrix sub(A). n ≥ 0.

a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1)
containing the local pieces of the distributed matrix sub(A).

ia, ja (global)
The row and column indices in the global matrix A indicating the first row
and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

ipiv (local)
Array of size lipiv ;
when rowcol='R' or 'r':

lipiv≥LOCr(ia+m-1) + mb_a if pivroc='C' or 'c',

lipiv≥LOCc(m + mod(jp-1, nb_p)) if pivroc='R' or 'r', and,
when rowcol='C' or 'c':

lipiv≥LOCr(n + mod(ip-1, mb_p)) if pivroc='C' or 'c',

lipiv≥LOCc(ja+n-1) + nb_a if pivroc='R' or 'r'.
This array contains the pivoting information. ipiv(i) is the global row
(column) with which local row (column) i was swapped. When rowcol='R' or
'r' and pivroc='C' or 'c', or rowcol='C' or 'c' and pivroc='R' or
'r', the last piece of this array of size mb_a (resp. nb_a) is used as
workspace. In those cases, this array is tied to the distributed matrix A.

ip, jp (global) The row and column indices in the global matrix P indicating the
first row and the first column of the matrix sub(P), respectively.

descip (global and local) array of size dlen_. The array descriptor for the
distributed vector ipiv.

iwork (local).
Array of size ldw, where ldw is equal to the workspace necessary for
transposition and the storage of the transposed ipiv.
Let lcm be the least common multiple of nprow and npcol. In the formulas
below, ceil() rounds up the exact (non-integer) quotient.

if( rowcol == 'r' && pivroc == 'r') {
    if( nprow == npcol) {
        ldw = LOCr( n_p + (*jp-1)%nb_p ) + nb_p;
    } else {
        ldw = LOCr( n_p + (*jp-1)%nb_p ) +
              nb_p * ceil( ceil(LOCc(n_p)/nb_p) / (lcm/npcol) );
    }
} else if( rowcol == 'c' && pivroc == 'c') {
    if( nprow == npcol ) {
        ldw = LOCc( m_p + (*ip-1)%mb_p ) + mb_p;
    } else {
        ldw = LOCc( m_p + (*ip-1)%mb_p ) +
              mb_p * ceil( ceil(LOCr(m_p)/mb_p) / (lcm/nprow) );
    }
} else {
    // iwork is not referenced.
}

Output Parameters

a (local).
On exit, the local pieces of the permuted distributed submatrix.
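In serial terms, direc = 'F' replays the recorded interchanges from the first row down, and direc = 'B' replays them in reverse order, which undoes them. The C sketch below assumes rowcol = 'R', a single process holding the whole m-by-n column-major matrix, and 1-based pivot indices as produced by ScaLAPACK factorizations; apply_row_pivots is a hypothetical name, and none of the transposition or communication that p?lapiv performs is modeled.

```c
/* Serial sketch (hypothetical helper) of applying row interchanges to an
   m-by-n column-major matrix a. ipiv[i] (1-based) is the row that row i was
   swapped with. direc = 'F' applies the swaps top-down (P*A); any other
   value applies them bottom-up, which computes inv(P)*A. */
static void apply_row_pivots(char direc, int m, int n, double *a, int lda,
                             const int *ipiv)
{
    int forward = (direc == 'F' || direc == 'f');
    for (int k = 0; k < m; ++k) {
        int i = forward ? k : m - 1 - k;
        int p = ipiv[i] - 1;                 /* convert to 0-based */
        if (p == i)
            continue;                        /* row stayed in place */
        for (int j = 0; j < n; ++j) {        /* swap rows i and p */
            double t = a[i + j * lda];
            a[i + j * lda] = a[p + j * lda];
            a[p + j * lda] = t;
        }
    }
}
```

Applying the same ipiv with 'F' and then with 'B' restores the original matrix.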

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?lapv2
Applies a permutation to an m-by-n distributed matrix.

Syntax
void pslapv2 (const char* direc, const char* rowcol, const MKL_INT* m, const MKL_INT*
n, float* a, const MKL_INT* ia, const MKL_INT* ja, const MKL_INT* desca, const MKL_INT*
ipiv, const MKL_INT* ip, const MKL_INT* jp, const MKL_INT* descip);
void pdlapv2 (const char* direc, const char* rowcol, const MKL_INT* m, const MKL_INT*
n, double* a, const MKL_INT* ia, const MKL_INT* ja, const MKL_INT* desca, const
MKL_INT* ipiv, const MKL_INT* ip, const MKL_INT* jp, const MKL_INT* descip);
void pclapv2 (const char* direc, const char* rowcol, const MKL_INT* m, const MKL_INT*
n, MKL_Complex8* a, const MKL_INT* ia, const MKL_INT* ja, const MKL_INT* desca, const
MKL_INT* ipiv, const MKL_INT* ip, const MKL_INT* jp, const MKL_INT* descip);
void pzlapv2 (const char* direc, const char* rowcol, const MKL_INT* m, const MKL_INT*
n, MKL_Complex16* a, const MKL_INT* ia, const MKL_INT* ja, const MKL_INT* desca, const
MKL_INT* ipiv, const MKL_INT* ip, const MKL_INT* jp, const MKL_INT* descip);

Include Files
• mkl_scalapack.h

Description
p?lapv2 applies either P (permutation matrix indicated by ipiv) or inv( P ) to an m-by-n distributed matrix
sub( A ) denoting A(ia:ia+m-1,ja:ja+n-1), resulting in row or column pivoting. The pivot vector should be
aligned with the distributed matrix A. For pivoting the rows of sub( A ), ipiv should be distributed along a
process column and replicated over all process rows. Similarly, ipiv should be distributed along a process
row and replicated over all process columns for column pivoting.

Input Parameters

direc (global)
Specifies in which order the permutation is applied:
= 'F' (Forward): applies the pivots forward, starting from the top of the matrix;
computes P * sub( A ).
= 'B' (Backward): applies the pivots backward, starting from the bottom of the matrix;
computes inv( P ) * sub( A ).

rowcol (global)
Specifies whether the rows or the columns are to be permuted:
= 'R' Rows will be permuted,
= 'C' Columns will be permuted.

m (global)
The number of rows to be operated on, i.e. the number of rows of the
distributed submatrix sub( A ). m >= 0.

n (global)
The number of columns to be operated on, i.e. the number of columns of
the distributed submatrix sub( A ). n >= 0.

a Pointer into local memory to an array of size lld_a*LOCc(ja+n-1).

On entry, this local array contains the local pieces of the distributed matrix
sub( A ) to which the row or column interchanges will be applied.

ia (global)
The row index in the global array a indicating the first row of sub( A ).

ja (global)
The column index in the global array a indicating the first column of
sub( A ).

desca (global and local)

Array of size dlen_.
The array descriptor for the distributed matrix A.

ipiv Array of size ≥ LOCr(m_a)+mb_a if rowcol = 'R', LOCc(n_a)+nb_a
otherwise.
It contains the pivoting information. ipiv[i - 1] is the global row (column)
with which local row (column) i was swapped. The last piece of the array, of size
mb_a or nb_a, is used as workspace. ipiv is tied to the distributed matrix
A.

ip (global)
The global row index of ipiv, which points to the beginning of the
submatrix on which to operate.

jp (global)
The global column index of ipiv, which points to the beginning of the
submatrix on which to operate.

descip (global and local)

Array of size 8.


The array descriptor for the distributed matrix ipiv.

Output Parameters

a On exit, this array contains the local pieces of the permuted
distributed matrix.
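To make the interchange semantics concrete, here is a serial, single-process C sketch for rowcol = 'R'. It assumes, as with LAPACK ?laswp-style pivot vectors, that ipiv records a sequence of interchanges (1-based values: row i+1 is swapped with row ipiv[i]), that 'F' applies them top to bottom, and that 'B' replays them bottom to top, which undoes the forward pass. It ignores the ia/ja offsets, the block-cyclic distribution, and the workspace tail of ipiv that the distributed routine handles; apply_row_pivots is a hypothetical name and int stands in for MKL_INT.

```c
#include <stddef.h>

/* Swaps rows r1 and r2 (0-based) across all n columns of a column-major array. */
static void swap_rows(double *a, int lda, int n, int r1, int r2)
{
    for (int j = 0; j < n; ++j) {
        double t = a[r1 + (size_t)j * lda];
        a[r1 + (size_t)j * lda] = a[r2 + (size_t)j * lda];
        a[r2 + (size_t)j * lda] = t;
    }
}

/* Hypothetical serial analogue of p?lapv2 with rowcol = 'R'.
   a:    column-major m-by-n array, leading dimension lda
   ipiv: 1-based row numbers; row i+1 is interchanged with row ipiv[i]
   direc 'F': apply the interchanges top to bottom (P * A)
   direc 'B': apply them bottom to top, undoing 'F' (inv(P) * A) */
void apply_row_pivots(char direc, int m, int n, double *a, int lda, const int *ipiv)
{
    if (direc == 'F' || direc == 'f') {
        for (int i = 0; i < m; ++i)
            if (ipiv[i] - 1 != i)
                swap_rows(a, lda, n, i, ipiv[i] - 1);
    } else {
        for (int i = m - 1; i >= 0; --i)
            if (ipiv[i] - 1 != i)
                swap_rows(a, lda, n, i, ipiv[i] - 1);
    }
}
```

Applying 'F' and then 'B' with the same ipiv restores the original rows, mirroring P followed by inv( P ).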

p?laqge
Scales a general rectangular matrix, using row and
column scaling factors computed by p?geequ.

Syntax
void pslaqge (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *r , float *c , float *rowcnd , float *colcnd , float *amax , char
*equed );
void pdlaqge (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , double *r , double *c , double *rowcnd , double *colcnd , double *amax , char
*equed );
void pclaqge (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , float *r , float *c , float *rowcnd , float *colcnd , float *amax ,
char *equed );
void pzlaqge (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , double *r , double *c , double *rowcnd , double *colcnd , double
*amax , char *equed );

Include Files
• mkl_scalapack.h

Description
The p?laqge function equilibrates a general m-by-n distributed matrix sub(A) = A(ia:ia+m-1, ja:ja+n-1)
using the row and column scaling factors in the vectors r and c computed by p?geequ.

Input Parameters

m (global)
The number of rows in the distributed matrix sub(A). (m ≥0).

n (global)
The number of columns in the distributed matrix sub(A). (n ≥0).

a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).

On entry, this array contains the distributed matrix sub(A).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

r (local).
Array of size LOCr(m_a). The row scale factors for sub(A). r is aligned with
the distributed matrix A, and replicated across every process column. r is
tied to the distributed matrix A.

c (local).
Array of size LOCc(n_a). The column scale factors for sub(A). c is aligned with
the distributed matrix A, and replicated down every process row. c is
tied to the distributed matrix A.

rowcnd (local).
The global ratio of the smallest r[i] to the largest r[i], ia-1 ≤ i ≤ ia+m-2.

colcnd (local).
The global ratio of the smallest c[i] to the largest c[i], ja-1 ≤ i ≤ ja+n-2.

amax (global).
Absolute value of the largest entry of the distributed submatrix.

Output Parameters

a (local).
On exit, the equilibrated distributed matrix. See equed for the form of the
equilibrated distributed submatrix.

equed (global)
Specifies the form of equilibration that was done.
= 'N': No equilibration.
= 'R': Row equilibration, that is, sub(A) has been pre-multiplied by
diag(r[ia-1:ia+m-2]).
= 'C': Column equilibration, that is, sub(A) has been post-multiplied by
diag(c[ja-1:ja+n-2]).
= 'B': Both row and column equilibration, that is, sub(A) has been
replaced by diag(r[ia-1:ia+m-2]) * sub(A) * diag(c[ja-1:ja+n-2]).
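The effect of each equed value can be sketched serially. The C function below is a hypothetical, single-process illustration, not part of the library: it scales a column-major m-by-n block a (leading dimension lda) in place, where r and c hold the m row and n column factors for that block, indexed from 0. The distributed routine works on the local pieces of sub(A) instead, and it is p?laqge itself that decides which form of equilibration to apply.

```c
#include <stddef.h>

/* Hypothetical serial illustration of the equed forms.
   a: column-major m-by-n block, leading dimension lda (lda >= m)
   r: m row factors; c: n column factors (0-based, local to this block) */
void apply_equilibration(char equed, int m, int n, double *a, int lda,
                         const double *r, const double *c)
{
    int scale_rows = (equed == 'R' || equed == 'B');   /* diag(r) * A */
    int scale_cols = (equed == 'C' || equed == 'B');   /* A * diag(c) */

    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            double s = 1.0;
            if (scale_rows) s *= r[i];
            if (scale_cols) s *= c[j];
            a[i + (size_t)j * lda] *= s;               /* 'N' leaves every entry unchanged */
        }
    }
}
```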

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?laqr0
Computes the eigenvalues of a Hessenberg matrix and
optionally returns the matrices from the Schur
decomposition.

Syntax
void pslaqr0(MKL_INT* wantt, MKL_INT* wantz, MKL_INT* n, MKL_INT* ilo, MKL_INT* ihi,
float* h, MKL_INT* desch, float* wr, float* wi, MKL_INT* iloz, MKL_INT* ihiz, float* z,
MKL_INT* descz, float* work, MKL_INT* lwork, MKL_INT* iwork, MKL_INT* liwork, MKL_INT*
info, MKL_INT* reclevel);


void pdlaqr0(MKL_INT* wantt, MKL_INT* wantz, MKL_INT* n, MKL_INT* ilo, MKL_INT* ihi,
double* h, MKL_INT* desch, double* wr, double* wi, MKL_INT* iloz, MKL_INT* ihiz, double*
z, MKL_INT* descz, double* work, MKL_INT* lwork, MKL_INT* iwork, MKL_INT* liwork,
MKL_INT* info, MKL_INT* reclevel);

Include Files
• mkl_scalapack.h

Description
p?laqr0 computes the eigenvalues of a Hessenberg matrix H and, optionally, the matrices T and Z from the
Schur decomposition H = Z*T*Z^T, where T is an upper quasi-triangular matrix (the Schur form), and Z is the
orthogonal matrix of Schur vectors.
Optionally Z may be postmultiplied into an input orthogonal matrix Q so that this function can give the Schur
factorization of a matrix A which has been reduced to the Hessenberg form H by the orthogonal matrix Q:
A = Q * H * Q^T = (QZ) * T * (QZ)^T.

Input Parameters

wantt (global)
Non-zero: the full Schur form T is required;
Zero: only eigenvalues are required.

wantz (global)
Non-zero: the matrix of Schur vectors Z is required;
Zero: Schur vectors are not required.

n (global)
The order of the Hessenberg matrix H (and Z if wantz is non-zero). n≥ 0.

ilo, ihi (global)

It is assumed that the matrix H is already upper triangular in rows and
columns 1:ilo-1 and ihi+1:n. ilo and ihi are normally set by a previous
call to p?gebal, and then passed to p?gehrd when the matrix output by
p?gebal is reduced to Hessenberg form. Otherwise ilo and ihi should be set
to 1 and n, respectively. If n > 0, then 1 ≤ilo≤ihi≤n.

If n = 0, then ilo = 1 and ihi = 0.

h (global) array of size lld_h * LOCc(n)

The upper Hessenberg matrix H.

desch (global and local)

Array of size dlen_.
The array descriptor for the distributed matrix H.

iloz, ihiz Specify the rows of the matrix Z to which transformations must be applied if
wantz is non-zero. 1 ≤iloz≤ilo; ihi≤ihiz≤n.

z Array of size lld_z * LOCc(n).

If wantz is non-zero, contains the matrix Z.
If wantz equals zero, z is not referenced.

descz (global and local) array of size dlen_.

The array descriptor for the distributed matrix Z.

work (local workspace) array of size lwork

lwork (local)
The length of the workspace array work.

iwork (local workspace) array of size liwork

liwork (local)
The length of the workspace array iwork.

reclevel (local)
Level of recursion. reclevel = 0 must hold on entry.

Output Parameters

h On exit, if wantt is non-zero, the matrix H is upper quasi-triangular in rows
and columns ilo:ihi, with 1-by-1 and 2-by-2 blocks on the main diagonal.
The 2-by-2 diagonal blocks (corresponding to complex conjugate pairs of
eigenvalues) are returned in standard form, with H(i,i) = H(i+1,i+1) and
H(i+1,i)*H(i,i+1) < 0. If info = 0 and wantt equals zero, the contents of h
are unspecified on exit.

wr, wi The real and imaginary parts, respectively, of the computed eigenvalues
ilo to ihi are stored in the corresponding elements of wr and wi. If two
eigenvalues are computed as a complex conjugate pair, they are stored in
consecutive elements of wr and wi, say the i-th and (i+1)th, with wi[i-1] >
0 and wi[i] < 0. If wantt is non-zero, the eigenvalues are stored in the
same order as on the diagonal of the Schur form returned in h.

z Updated matrix with transformations applied only to the submatrix
Z(ilo:ihi,ilo:ihi).

If COMPZ = 'I', on exit, if info = 0, z contains the orthogonal matrix Z of
the Schur vectors of H.
If wantz is non-zero, then Z(ilo:ihi,iloz:ihiz) is replaced by
Z(ilo:ihi,iloz:ihiz)*U, where U is the orthogonal/unitary Schur factor of
H(ilo:ihi,ilo:ihi).

If wantz equals zero, then z is not defined.

work[0] On exit, if info = 0, work[0] returns the optimal lwork.

iwork[0] On exit, if info = 0, iwork[0] returns the optimal liwork.

info > 0: if info = i, then the function failed to compute all the eigenvalues.
Elements 0:ilo-2 and i:n-1 of wr and wi contain those eigenvalues which
have been successfully computed.

> 0: if wantt equals zero, then the remaining unconverged eigenvalues are
the eigenvalues of the upper Hessenberg matrix in rows and columns ilo
through ihi of the final output value of H.

> 0: if wantt is non-zero, then (initial value of H)*U = U*(final value of H),
where U is an orthogonal/unitary matrix. The final value of H is upper
Hessenberg and quasi-triangular/triangular in rows and columns info+1
through ihi.

> 0: if wantz is non-zero, then (final value of
Z(ilo:ihi,iloz:ihiz)) = (initial value of Z(ilo:ihi,iloz:ihiz))*U, where
U is the orthogonal/unitary matrix in the previous expression (regardless of
the value of wantt).

> 0: if wantz equals zero, then z is not accessed.
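The storage convention for wr and wi can be checked with a short serial sketch. The C function below is a hypothetical helper, not part of the library: it walks eigenvalues ilo through ihi (1-based, as in the text), counts real eigenvalues (zero imaginary part) and complex conjugate pairs, and returns false if a pair does not follow the documented layout of a positive imaginary part immediately followed by a negative one. It assumes the eigenvalues were computed successfully (info = 0).

```c
#include <stdbool.h>

/* Hypothetical helper: classifies eigenvalues ilo..ihi (1-based), whose
   imaginary parts are stored in wi[ilo-1..ihi-1]. */
bool classify_eigenvalues(int ilo, int ihi, const double *wi,
                          int *nreal, int *npairs)
{
    *nreal = 0;
    *npairs = 0;
    int k = ilo - 1;                           /* 0-based position of eigenvalue ilo */
    while (k <= ihi - 1) {
        if (wi[k] == 0.0) {                    /* real eigenvalue */
            ++*nreal;
            k += 1;
        } else if (wi[k] > 0.0 && k + 1 <= ihi - 1 && wi[k + 1] < 0.0) {
            ++*npairs;                         /* conjugate pair: wi[k] > 0, wi[k+1] < 0 */
            k += 2;
        } else {
            return false;                      /* does not match the documented layout */
        }
    }
    return true;
}
```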

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?laqr1
Finds the Schur decomposition and/or eigenvalues of a
matrix already in Hessenberg form.

Syntax
void pslaqr1(MKL_INT* wantt, MKL_INT* wantz, MKL_INT* n, MKL_INT* ilo, MKL_INT* ihi,
float* a, MKL_INT* desca, float* wr, float* wi, MKL_INT* iloz, MKL_INT* ihiz, float* z,
MKL_INT* descz, float* work, MKL_INT* lwork, MKL_INT* iwork, MKL_INT* ilwork, MKL_INT*
info);
void pdlaqr1(MKL_INT* wantt, MKL_INT* wantz, MKL_INT* n, MKL_INT* ilo, MKL_INT* ihi,
double* a, MKL_INT* desca, double* wr, double* wi, MKL_INT* iloz, MKL_INT* ihiz, double*
z, MKL_INT* descz, double* work, MKL_INT* lwork, MKL_INT* iwork, MKL_INT* ilwork,
MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
p?laqr1 is an auxiliary function used to find the Schur decomposition and/or eigenvalues of a matrix already
in Hessenberg form from columns ilo to ihi.

This is a modified version of p?lahqr from ScaLAPACK version 1.7.3. The following modifications were
made:

• Workspace query functionality was added.

• Aggressive early deflation is implemented.
• Aggressive deflation (looking for two consecutive small subdiagonal elements by PSLACONSB) is
abandoned.
• The returned Schur form is now in canonical form, i.e., the returned 2-by-2 blocks really correspond to
complex conjugate pairs of eigenvalues.
• For some reason, the original version of p?lahqr sometimes did not read out the converged eigenvalues
correctly. This is now fixed.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.
Notice revision #20201201

Input Parameters

wantt (global)
Non-zero: the full Schur form T is required;
Zero: only eigenvalues are required.

wantz Non-zero: the matrix of Schur vectors Z is required;
Zero: Schur vectors are not required.

n (global)
The order of the Hessenberg matrix A (and Z if wantz is non-zero). n≥ 0.

ilo, ihi (global)

It is assumed that the matrix A is already upper quasi-triangular in rows
and columns ihi+1:n, and that A(ilo,ilo-1) = 0 (unless ilo = 1).
p?laqr1 works primarily with the Hessenberg submatrix in rows and
columns ilo to ihi, but applies transformations to all of A if wantt is
non-zero.
1 ≤ilo≤ max(1,ihi); ihi≤n.

a (global) array of size lld_a * LOCc(n)

On entry, the upper Hessenberg matrix A.

desca (global and local) array of size dlen_.

The array descriptor for the distributed matrix A.

iloz, ihiz (global)

Specify the rows of the matrix Z to which transformations must be applied if
wantz is non-zero.
1 ≤iloz≤ilo; ihi≤ihiz≤n.

z (global) array of size lld_z * LOCc(n).

If wantz is non-zero, on entry z must contain the current matrix Z of
transformations accumulated by p?hseqr.

If wantz is zero, z is not referenced.

descz (global and local) array of size dlen_.

The array descriptor for the distributed matrix Z.

work (local output) array of size lwork

lwork (local)
The size of the work array (lwork>=1).
If lwork=-1, then a workspace query is assumed.

iwork (global and local) array of size ilwork

This holds some of the IBLK integer arrays.

ilwork (local)
The size of the iwork array (ilwork≥ 3).

Output Parameters

a If wantt is non-zero, the matrix A is upper quasi-triangular in rows and
columns ilo:ihi, with any 2-by-2 or larger diagonal blocks not yet in
standard form. If wantt equals zero, the contents of a are unspecified on
exit.

wr, wi (global replicated) array of size n

The real and imaginary parts, respectively, of the computed eigenvalues
ilo to ihi are stored in the corresponding elements of wr and wi. If two
eigenvalues are computed as a complex conjugate pair, they are stored in
consecutive elements of wr and wi, say the i-th and (i+1)th, with wi[i-1] >
0 and wi[i] < 0. If wantt is non-zero, the eigenvalues are stored in the
same order as on the diagonal of the Schur form returned in a. Until a future
release, a may be returned with larger diagonal blocks.

z On exit z is updated; transformations are applied only to the submatrix
Z(iloz:ihiz,ilo:ihi).

If wantz equals zero, z is not referenced.

work[0] On exit, if info = 0, work[0] returns the optimal lwork.

info (global)
< 0: parameter number -info incorrect or inconsistent
= 0: successful exit
> 0: p?laqr1 failed to compute all the eigenvalues ilo to ihi in a total of
30*(ihi-ilo+1) iterations; if info = i, elements i:ihi-1 of wr and wi
contain those eigenvalues which have been successfully computed.

Application Notes
This algorithm is very similar to p?lahqr. Unlike p?lahqr, instead of sending one double shift through the
largest unreduced submatrix, this algorithm sends multiple double shifts and spaces them apart so that there
can be parallelism across several processor rows/columns. Another critical difference is that this algorithm
aggregates multiple transforms together in order to apply them in a block fashion.
Current Notes and/or Restrictions:

• This code requires the distributed block size to be square and at least six (6); unlike simpler codes like
LU, this algorithm is extremely sensitive to block size. Choosing too small a block size can lead to
poor performance.
• This code requires a and z to be distributed identically and to have identical contexts.
• This release currently does not have a function for resolving the Schur blocks into regular 2x2 form after
this code is completed. Because of this, performance can suffer significantly, since the deflation is
sometimes done by a single column of processors.
• This code does not currently block the initial transforms so that none of the rows or columns for any bulge
are completed until all are started. To offset pipeline start-up it is recommended that at least
2*LCM(NPROW,NPCOL) bulges are used (if possible).
• The maximum number of bulges currently supported is fixed at 32. In future versions this will be limited
only by the incoming work array.
• The matrix A must be in upper Hessenberg form. If elements below the subdiagonal are nonzero, the
resulting transforms may be nonsimilar. This is also true with the LAPACK function.
• For this release, it is assumed that rsrc_ = csrc_ = 0.
• Currently, all the eigenvalues are distributed to all the nodes. Future releases will probably distribute the
eigenvalues by the column partitioning.
• The internals of this function are subject to change.
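Two of these restrictions can be checked while planning a run. The C sketch below is hypothetical caller-side code, not a library function: it tests the square, at-least-six block-size requirement and computes the recommended minimum of 2*LCM(nprow, npcol) bulges, capped at the currently supported maximum of 32. The signature above has no bulge-count argument, so this is only a planning aid (for example, for choosing a process grid whose recommended bulge count stays within the cap).

```c
#include <stdbool.h>

/* Hypothetical planning helpers for the p?laqr1 restrictions. */

/* The distributed block size must be square (mb == nb) and at least 6. */
bool laqr1_block_size_ok(int mb, int nb)
{
    return mb == nb && nb >= 6;
}

static int gcd_int(int a, int b)
{
    while (b != 0) {                 /* Euclid's algorithm */
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Recommended minimum bulge count 2*LCM(nprow, npcol), capped at 32.
   *capped is set to true when the recommendation exceeds the cap. */
int laqr1_recommended_bulges(int nprow, int npcol, bool *capped)
{
    int lcm = nprow / gcd_int(nprow, npcol) * npcol;
    int want = 2 * lcm;
    *capped = want > 32;
    return *capped ? 32 : want;
}
```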

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?laqr2
Performs the orthogonal/unitary similarity
transformation of a Hessenberg matrix to detect and
deflate fully converged eigenvalues from a trailing
principal submatrix (aggressive early deflation).

Syntax
void pslaqr2(MKL_INT* wantt, MKL_INT* wantz, MKL_INT* n, MKL_INT* ktop, MKL_INT* kbot,
MKL_INT* nw, float* a, MKL_INT* desca, MKL_INT* iloz, MKL_INT* ihiz, float* z, MKL_INT*
descz, MKL_INT* ns, MKL_INT* nd, float* sr, float* si, float* t, MKL_INT* ldt, float* v,
MKL_INT* ldv, float* wr, float* wi, float* work, MKL_INT* lwork);
void pdlaqr2(MKL_INT* wantt, MKL_INT* wantz, MKL_INT* n, MKL_INT* ktop, MKL_INT* kbot,
MKL_INT* nw, double* a, MKL_INT* desca, MKL_INT* iloz, MKL_INT* ihiz, double* z,
MKL_INT* descz, MKL_INT* ns, MKL_INT* nd, double* sr, double* si, double* t, MKL_INT*
ldt, double* v, MKL_INT* ldv, double* wr, double* wi, double* work, MKL_INT* lwork);

Include Files
• mkl_scalapack.h

Description
p?laqr2 accepts as input an upper Hessenberg matrix A and performs an orthogonal similarity
transformation designed to detect and deflate fully converged eigenvalues from a trailing principal submatrix.
On output A is overwritten by a new Hessenberg matrix that is a perturbation of an orthogonal similarity
transformation of A. It is hoped that the final version of A has many zero subdiagonal entries.
This function handles small deflation windows that a single processor can process. Normally, it is called by
p?laqr1. All the inputs are assumed to be valid without checking.


Input Parameters

wantt (global)
If wantt is non-zero, then the Hessenberg matrix A is fully updated so that
the quasi-triangular Schur factor may be computed (in cooperation with the
calling function).
If wantt equals zero, then only enough of A is updated to preserve the
eigenvalues.

wantz (global)
If wantz is non-zero, then the orthogonal matrix Z is updated so that the
orthogonal Schur factor may be computed (in cooperation with the calling
function).
If wantz equals zero, then z is not referenced.

n (global)
The order of the matrix A and (if wantz is non-zero) the order of the
orthogonal matrix Z.

ktop, kbot (global)

It is assumed without a check that either kbot = n or A(kbot+1,kbot)=0.
kbot and ktop together determine an isolated block along the diagonal of
the Hessenberg matrix. However, A(ktop,ktop-1)=0 is not strictly
necessary if wantt is non-zero.

nw (global)
Deflation window size. 1 ≤nw≤ (kbot-ktop+1). Normally nw≥ 3 if p?laqr2 is
called by p?laqr1.

a (local) array of size lld_a * LOCc(n)

The initial n-by-n section of a stores the Hessenberg matrix undergoing
aggressive early deflation.

desca (global and local) array of size dlen_.

The array descriptor for the distributed matrix A.

iloz, ihiz (global)

Specify the rows of the matrix Z to which transformations must be applied if
wantz is non-zero. 1 ≤iloz≤ihiz≤n.

z Array of size lld_z * LOCc(n)

If wantz is non-zero, then on output, the orthogonal similarity
transformation mentioned above has been accumulated into the matrix
Z(iloz:ihiz,kbot:ktop), stored in z, from the right.

If wantz is zero, then z is unreferenced.

descz (global and local) array of size dlen_.

The array descriptor for the distributed matrix Z.

t (local workspace) array of size ldt * nw.

ldt (local)
The leading dimension of the array t. ldt≥nw.

v (local workspace) array of size ldv * nw.

ldv (local)
The leading dimension of the array v. ldv≥nw.

wr, wi (local workspace) array of size kbot.

work (local workspace) array of size lwork.

lwork (local)
work is a local array of size lwork, and lwork is assumed to be large enough that
lwork≥nw*nw.
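Because p?laqr2 assumes its inputs are valid without checking them, a caller may want to validate the documented constraints first. The C function below is a hypothetical sketch of such a check, not a library routine; indices are 1-based as in the parameter descriptions, and int stands in for MKL_INT. The bounds 1 ≤ ktop ≤ kbot ≤ n are an assumption read from "isolated block along the diagonal"; the remaining tests restate the constraints listed above.

```c
#include <stdbool.h>

/* Hypothetical caller-side validation of the p?laqr2 size constraints. */
bool laqr2_args_ok(int n, int ktop, int kbot, int nw, bool wantz,
                   int iloz, int ihiz, int ldt, int ldv, int lwork)
{
    if (ktop < 1 || ktop > kbot || kbot > n)           /* assumed: isolated block inside 1..n */
        return false;
    if (nw < 1 || nw > kbot - ktop + 1)                /* 1 <= nw <= kbot-ktop+1 */
        return false;
    if (ldt < nw || ldv < nw)                          /* ldt >= nw, ldv >= nw */
        return false;
    if (lwork < nw * nw)                               /* lwork >= nw*nw */
        return false;
    if (wantz && (iloz < 1 || iloz > ihiz || ihiz > n)) /* 1 <= iloz <= ihiz <= n */
        return false;
    return true;
}
```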

Output Parameters

a On output a has been transformed by an orthogonal similarity
transformation, perturbed, and returned to Hessenberg form that (it is to be
hoped) has some zero subdiagonal entries.

z If wantz is non-zero, z has been updated as described for z under Input
Parameters; otherwise z is unreferenced.

ns (global)
The number of unconverged (that is, approximate) eigenvalues returned in
sr and si that may be used as shifts by the calling function.

nd (global)
The number of converged eigenvalues uncovered by this function.

sr, si (global) array of size kbot

On output, the real and imaginary parts of approximate eigenvalues that
may be used for shifts are stored in sr[kbot-nd-ns] through sr[kbot-nd-1]
and si[kbot-nd-ns] through si[kbot-nd-1], respectively.
On processor #0, the real and imaginary parts of converged eigenvalues
are stored in sr[kbot-nd] through sr[kbot-1] and si[kbot-nd] through
si[kbot-1], respectively. On other processors, these entries are set to
zero.
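The index arithmetic for sr and si can be expressed as two small ranges. The C sketch below uses hypothetical helper names (idx_range, shift_slots, converged_slots, copy_shifts) and simply restates the 0-based, inclusive positions given above; a range whose first index exceeds its last is empty. The same layout is documented for p?laqr3.

```c
/* Hypothetical helpers: 0-based, inclusive index ranges in sr/si. */
typedef struct {
    int first;
    int last;            /* empty range when first > last */
} idx_range;

/* Shift candidates: sr[kbot-nd-ns] .. sr[kbot-nd-1]. */
idx_range shift_slots(int kbot, int ns, int nd)
{
    idx_range r = { kbot - nd - ns, kbot - nd - 1 };
    return r;
}

/* Converged eigenvalues: sr[kbot-nd] .. sr[kbot-1]. */
idx_range converged_slots(int kbot, int nd)
{
    idx_range r = { kbot - nd, kbot - 1 };
    return r;
}

/* Copies the ns shift candidates into re[0..ns-1] and im[0..ns-1]. */
void copy_shifts(const double *sr, const double *si, int kbot, int ns, int nd,
                 double *re, double *im)
{
    idx_range r = shift_slots(kbot, ns, nd);
    for (int k = r.first; k <= r.last; ++k) {
        re[k - r.first] = sr[k];
        im[k - r.first] = si[k];
    }
}
```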

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?laqr3
Performs the orthogonal/unitary similarity
transformation of a Hessenberg matrix to detect and
deflate fully converged eigenvalues from a trailing
principal submatrix (aggressive early deflation).

Syntax
void pslaqr3(MKL_INT* wantt, MKL_INT* wantz, MKL_INT* n, MKL_INT* ktop, MKL_INT* kbot,
MKL_INT* nw, float* h, MKL_INT* desch, MKL_INT* iloz, MKL_INT* ihiz, float* z, MKL_INT*
descz, MKL_INT* ns, MKL_INT* nd, float* sr, float* si, float* v, MKL_INT* descv,
MKL_INT* nh, float* t, MKL_INT* desct, MKL_INT* nv, float* wv, MKL_INT* descw, float*
work, MKL_INT* lwork, MKL_INT* iwork, MKL_INT* liwork, MKL_INT* reclevel);

1663
1 Developer Reference for Intel® oneAPI Math Kernel Library for C

void pdlaqr3(MKL_INT* wantt, MKL_INT* wantz, MKL_INT* n, MKL_INT* ktop, MKL_INT* kbot,
MKL_INT* nw, double* h, MKL_INT* desch, MKL_INT* iloz, MKL_INT* ihiz, double* z,
MKL_INT* descz, MKL_INT* ns, MKL_INT* nd, double* sr, double* si, double* v, MKL_INT*
descv, MKL_INT* nh, double* t, MKL_INT* desct, MKL_INT* nv, double* wv, MKL_INT* descw,
double* work, MKL_INT* lwork, MKL_INT* iwork, MKL_INT* liwork, MKL_INT* reclevel);

Include Files
• mkl_scalapack.h

Description
This function accepts as input an upper Hessenberg matrix H and performs an orthogonal similarity
transformation designed to detect and deflate fully converged eigenvalues from a trailing principal submatrix.
On output H is overwritten by a new Hessenberg matrix that is a perturbation of an orthogonal similarity
transformation of H. It is hoped that the final version of H has many zero subdiagonal entries.

Input Parameters

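wantt (global)
If wantt is non-zero, then the Hessenberg matrix H is fully updated so that
the quasi-triangular Schur factor may be computed (in cooperation with the
calling function).
If wantt equals zero, then only enough of H is updated to preserve the
eigenvalues.

wantz (global)
If wantz is non-zero, then the orthogonal matrix Z is updated so that the
orthogonal Schur factor may be computed (in cooperation with the calling
function).
If wantz equals zero, then z is not referenced.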

n (global)
The order of the matrix H and (if wantz is non-zero) the order of the
orthogonal matrix Z.

ktop (global)
It is assumed that either ktop = 1 or H(ktop,ktop-1)=0. kbot and ktop
together determine an isolated block along the diagonal of the Hessenberg
matrix.

kbot (global)
It is assumed without a check that either kbot = n or H(kbot+1,kbot)=0.
kbot and ktop together determine an isolated block along the diagonal of
the Hessenberg matrix.

nw (global)
Deflation window size. 1 ≤nw≤ (kbot-ktop+1).

h (local) array of size lld_h * LOCc(n)

The initial n-by-n section of H stores the Hessenberg matrix undergoing
aggressive early deflation.

desch (global and local) array of size dlen_.
The array descriptor for the distributed matrix H.

iloz, ihiz (global)

Specify the rows of the matrix Z to which transformations must be applied if
wantz is non-zero. 1 ≤iloz≤ihiz≤n.

z Array of size lld_z * LOCc(n)

If wantz is non-zero, then on output, the orthogonal similarity
transformation mentioned above has been accumulated into the matrix
Z(iloz:ihiz,kbot:ktop) from the right.

If wantz is zero, then z is unreferenced.

descz (global and local) array of size dlen_.

The array descriptor for the distributed matrix Z.

v (global workspace) array of size lld_v * LOCc(nw)

An nw-by-nw distributed work array.

descv (global and local) array of size dlen_.

The array descriptor for the distributed matrix V.

nh The number of columns of t. nh≥nw.

t (global workspace) array of size lld_t * LOCc(nh)

desct (global and local) array of size dlen_.

The array descriptor for the distributed matrix T.

nv (global)
The number of rows of work array wv available for workspace. nv≥nw.

wv (global workspace) array of size lld_w * LOCc(nw)

descw (global and local) array of size dlen_.

The array descriptor for the distributed matrix wv.

work (local workspace) array of size lwork.

lwork (local)
The size of the work array work (lwork≥1). lwork = 2*nw suffices, but
greater efficiency may result from larger values of lwork.

If lwork = -1, then a workspace query is assumed; p?laqr3 only
estimates the optimal workspace size for the given values of n, nw, ktop
and kbot. The estimate is returned in work[0]. No error message related to
lwork is issued by xerbla. Neither h nor z is accessed.

iwork (local workspace) array of size liwork

liwork (local)
The length of the workspace array iwork (liwork≥1).
If liwork=-1, then a workspace query is assumed.

Output Parameters

h On output h has been transformed by an orthogonal similarity
transformation, perturbed, and returned to Hessenberg form that (it is
to be hoped) has some zero subdiagonal entries.

z If wantz is non-zero, then on output, the orthogonal similarity
transformation mentioned above has been accumulated into the matrix
Z(iloz:ihiz,kbot:ktop) from the right.

If wantz is zero, then z is unreferenced.

ns (global)
The number of unconverged (that is, approximate) eigenvalues returned in
sr and si that may be used as shifts by the calling function.

nd (global)
The number of converged eigenvalues uncovered by this function.

sr, si (global) array of size kbot. The real and imaginary parts of approximate
eigenvalues that may be used for shifts are stored in sr[kbot-nd-ns]
through sr[kbot-nd-1] and si[kbot-nd-ns] through si[kbot-nd-1],
respectively. The real and imaginary parts of converged eigenvalues are
stored in sr[kbot-nd] through sr[kbot-1] and si[kbot-nd] through
si[kbot-1], respectively.

work[0] On exit, if info = 0, work[0] returns the optimal lwork.

iwork[0] On exit, if info = 0, iwork[0] returns the optimal liwork.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?laqr5
Performs a single small-bulge multi-shift QR sweep.

Syntax
void pslaqr5(MKL_INT* wantt, MKL_INT* wantz, MKL_INT* kacc22, MKL_INT* n, MKL_INT*
ktop, MKL_INT* kbot, MKL_INT* nshfts, float* sr, float* si, float* h, MKL_INT* desch,
MKL_INT* iloz, MKL_INT* ihiz, float* z, MKL_INT* descz, float* work, MKL_INT* lwork,
MKL_INT* iwork, MKL_INT* liwork);
void pdlaqr5(MKL_INT* wantt, MKL_INT* wantz, MKL_INT* kacc22, MKL_INT* n, MKL_INT*
ktop, MKL_INT* kbot, MKL_INT* nshfts, double* sr, double* si, double* h, MKL_INT* desch,
MKL_INT* iloz, MKL_INT* ihiz, double* z, MKL_INT* descz, double* work, MKL_INT* lwork,
MKL_INT* iwork, MKL_INT* liwork);

Include Files
• mkl_scalapack.h

Description
This auxiliary function called by p?laqr0 performs a single small-bulge multi-shift QR sweep by chasing
separated groups of bulges along the main block diagonal of a Hessenberg matrix H.

Input Parameters

wantt (global) scalar

wantt is non-zero if the quasi-triangular Schur factor is being
computed. wantt is set to zero otherwise.

wantz (global) scalar

wantz is non-zero if the orthogonal Schur factor is being computed.
wantz is set to zero otherwise.

kacc22 (global)
Value 0, 1, or 2. Specifies the computation mode of far-from-diagonal
orthogonal updates.
= 0: p?laqr5 does not accumulate reflections and does not use
matrix-matrix multiply to update far-from-diagonal matrix entries.
= 1: p?laqr5 accumulates reflections and uses matrix-matrix multiply
to update the far-from-diagonal matrix entries.
= 2: p?laqr5 accumulates reflections, uses matrix-matrix multiply to
update the far-from-diagonal matrix entries, and takes advantage of
2-by-2 block structure during matrix multiplies.

n (global) scalar
The order of the Hessenberg matrix H and, if wantz is non-zero, the
order of the orthogonal matrix Z.

ktop, kbot (global) scalar

These are the first and last rows and columns of an isolated diagonal
block upon which the QR sweep is to be applied. It is assumed without
a check that either ktop = 1 or H(ktop,ktop-1) = 0 and either kbot
= n or H(kbot+1,kbot) = 0.

nshfts (global) scalar

nshfts gives the number of simultaneous shifts. nshfts must be
positive and even.

sr, si (global) Array of size nshfts

sr contains the real parts and si contains the imaginary parts of the
nshfts shifts of origin that define the multi-shift QR sweep.

h (local) Array of size lld_h * LOCc(n)

On input h contains a Hessenberg matrix H.

desch (global and local)

array of size dlen_.
The array descriptor for the distributed matrix H.


iloz, ihiz (global)

Specify the rows of the matrix Z to which transformations must be
applied if wantzis non-zero. 1 ≤iloz≤ihiz≤n

z (local) array of size lld_z * LOCc(n)

If wantz is non-zero, then the QR sweep orthogonal similarity
transformation is accumulated into the matrix
Z(iloz:ihiz,ktop:kbot) from the right. If wantz equals zero, then z
is not referenced.

descz (global and local) array of size dlen_.

The array descriptor for the distributed matrix Z.

work (local workspace) array of size lwork

lwork (local)
The size of the work array (lwork ≥ 1).

If lwork = -1, then a workspace query is assumed: the optimal size of
work is returned in work[0].

iwork (local workspace) array of size liwork

liwork (local)
The size of the iwork array (liwork ≥ 1).

If liwork = -1, then a workspace query is assumed: the optimal size of
iwork is returned in iwork[0].

Output Parameters

h A multi-shift QR sweep with shifts sr(j)+i*si(j) is applied to the
isolated diagonal block in rows and columns ktop through kbot of the
matrix H.

z If wantz is non-zero, z is updated with transformations applied only to
the submatrix Z(iloz:ihiz,ktop:kbot).

work[0] On exit, if info = 0, work[0] returns the optimal lwork.

iwork[0] On exit, if info = 0, iwork[0] returns the optimal liwork.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?laqsy
Scales a symmetric/Hermitian matrix, using scaling
factors computed by p?poequ .

Syntax
void pslaqsy (char *uplo , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *sr , float *sc , float *scond , float *amax , char *equed );
void pdlaqsy (char *uplo , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , double *sr , double *sc , double *scond , double *amax , char *equed );
void pclaqsy (char *uplo , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , float *sr , float *sc , float *scond , float *amax , char *equed );

void pzlaqsy (char *uplo , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , double *sr , double *sc , double *scond , double *amax , char *equed );

Include Files
• mkl_scalapack.h

Description
The p?laqsy function equilibrates a symmetric distributed matrix sub(A) = A(ia:ia+n-1, ja:ja+n-1) using
the scaling factors in the vectors sr and sc. The scaling factors are computed by p?poequ.

Input Parameters

uplo (global) Specifies whether the upper or lower triangular part of the symmetric
distributed matrix sub(A) is referenced:

= 'U': Upper triangular part;

= 'L': Lower triangular part.

n (global)
The order of the distributed matrix sub(A). n ≥ 0.

a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).
On entry, this array contains the local pieces of the distributed symmetric
matrix sub(A).
If uplo = 'U', the leading n-by-n upper triangular part of sub(A) contains
the upper triangular part of the matrix, and the strictly lower triangular part
of sub(A) is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of sub(A) contains
the lower triangular part of the matrix, and the strictly upper triangular part
of sub(A) is not referenced.

ia, ja (global)
The row and column indices in the global matrix A indicating the first row
and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

sr (local)
Array of size LOCr(m_a). The scale factors for the rows of sub(A) =
A(ia:ia+n-1, ja:ja+n-1). sr is aligned with the distributed matrix A and
replicated across every process column. sr is tied to the distributed matrix A.

sc (local)
Array of size LOCc(n_a). The scale factors for the columns of sub(A) =
A(ia:ia+n-1, ja:ja+n-1). sc is aligned with the distributed matrix A and
replicated down every process row. sc is tied to the distributed matrix A.

scond (global).
Ratio of the smallest sr[i] (respectively sc[j]) to the largest sr[i]
(respectively sc[j]), with ia-1 ≤ i < ia+n-1 and ja-1 ≤ j < ja+n-1.

amax (global).
Absolute value of the largest entry of the distributed submatrix sub(A).

Output Parameters

a On exit,
if equed = 'Y', the equilibrated matrix:

diag(sr(ia), ..., sr(ia+n-1)) * sub(A) * diag(sc(ja), ..., sc(ja+n-1)).

equed (global).
Specifies whether or not equilibration was done.
= 'N': No equilibration.

= 'Y': Equilibration was done, that is, sub(A) has been replaced by:

diag(sr(ia), ..., sr(ia+n-1)) * sub(A) * diag(sc(ja), ..., sc(ja+n-1)).
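To make the scaling rule concrete, here is a serial C sketch that works on an ordinary column-major n-by-n array instead of a distributed one. It scales only the triangle selected by uplo, multiplying A(i,j) by sr[i]*sc[j]. The decision thresholds (0.1 for scond, and a safe range for amax) are an assumption borrowed from the serial LAPACK ?laqsy routine; this page does not state the values p?laqsy uses. The function name laqsy_sketch is hypothetical.

```c
#include <float.h>

/* Hypothetical serial analogue of p?laqsy. a is an n-by-n column-major array
   (leading dimension lda); only the triangle selected by uplo is touched.
   Returns the equed value: 'Y' if a was scaled, 'N' otherwise. */
static char laqsy_sketch(char uplo, int n, double *a, int lda,
                         const double *sr, const double *sc,
                         double scond, double amax) {
    /* Assumed thresholds, taken from the serial LAPACK ?laqsy convention. */
    const double thresh = 0.1;
    const double smallnum = DBL_MIN / DBL_EPSILON;
    const double bignum = 1.0 / smallnum;

    if (n <= 0)
        return 'N';
    /* Skip scaling when the factors are well balanced and amax is in range. */
    if (scond >= thresh && amax >= smallnum && amax <= bignum)
        return 'N';

    for (int j = 0; j < n; ++j) {
        int ilo = (uplo == 'U') ? 0 : j;      /* first referenced row of column j */
        int ihi = (uplo == 'U') ? j : n - 1;  /* last referenced row of column j */
        for (int i = ilo; i <= ihi; ++i)
            a[i + j * lda] *= sr[i] * sc[j];  /* A(i,j) = sr(i) * A(i,j) * sc(j) */
    }
    return 'Y';
}
```

With uplo = 'U', the strictly lower triangle is left exactly as it was, matching the statement that it is not referenced.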

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?lared1d
Redistributes an array assuming that the input array,
bycol, is distributed across rows and that all process
columns contain the same copy of bycol.

Syntax
void pslared1d (MKL_INT *n , MKL_INT *ia , MKL_INT *ja , MKL_INT *desc , float *bycol ,
float *byall , float *work , MKL_INT *lwork );
void pdlared1d (MKL_INT *n , MKL_INT *ia , MKL_INT *ja , MKL_INT *desc , double
*bycol , double *byall , double *work , MKL_INT *lwork );

Include Files
• mkl_scalapack.h

Description
The p?lared1d function redistributes a 1D array. It assumes that the input array bycol is distributed across
process rows and that all process columns contain the same copy of bycol. The output array byall is identical on all
processes and contains the entire array.

Input Parameters
In the descriptions below, np denotes the number of local rows of bycol.

n (global)
The number of elements to be redistributed. n ≥ 0.

ia, ja (global) ia, ja must be equal to 1.

desc (local) array of size 9. A 2D array descriptor, which describes bycol.

bycol (local).
Distributed block cyclic array of global size n and of local size np. bycol is
distributed across the process rows. All process columns are assumed to
contain the same value.

work (local).
Array of size lwork, used to hold the buffers sent from one process to another.

lwork (local)
The size of the work array. lwork ≥ numroc(n, desc[nb_], 0, 0,
npcol).

Output Parameters

byall (global).
Global size n, local size n. byall is exactly duplicated on all processes. It
contains the same values as bycol, but it is replicated across all processes
rather than being distributed.
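Conceptually, p?lared1d gathers the block-cyclically distributed pieces of bycol into one full copy. The serial C sketch below simulates that assembly inside a single process: local[p] holds the entries owned by process row p, and each entry is placed at its global position by a 0-based version of the ScaLAPACK INDXL2G index map. It performs no communication, and the names indxl2g0 and gather_block_cyclic are hypothetical.

```c
/* 0-based local-to-global index map for a block-cyclic distribution
   (sketch of the ScaLAPACK INDXL2G logic, shifted to 0-based indices). */
static int indxl2g0(int l, int nb, int iproc, int isrcproc, int nprocs) {
    int mydist = (nprocs + iproc - isrcproc) % nprocs;
    return nprocs * nb * (l / nb) + l % nb + mydist * nb;
}

/* Serial simulation: local[p] holds the np[p] entries of a length-n array
   owned by process row p (block size nb, first block on isrcproc).
   Every entry is copied to its global position in byall. */
static void gather_block_cyclic(int n, int nb, int nprocs, int isrcproc,
                                const double *const *local, const int *np,
                                double *byall) {
    for (int p = 0; p < nprocs; ++p)
        for (int l = 0; l < np[p]; ++l) {
            int g = indxl2g0(l, nb, p, isrcproc, nprocs);
            if (g < n)
                byall[g] = local[p][l];
        }
}
```

For n = 7, nb = 2, and 3 process rows, process 0 owns global entries 0, 1, and 6, process 1 owns 2 and 3, and process 2 owns 4 and 5.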

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?lared2d
Redistributes an array assuming that the input array
byrow is distributed across columns and that all
process rows contain the same copy of byrow.

Syntax
void pslared2d (MKL_INT *n , MKL_INT *ia , MKL_INT *ja , MKL_INT *desc , float *byrow ,
float *byall , float *work , MKL_INT *lwork );
void pdlared2d (MKL_INT *n , MKL_INT *ia , MKL_INT *ja , MKL_INT *desc , double
*byrow , double *byall , double *work , MKL_INT *lwork );

Include Files
• mkl_scalapack.h

Description
The p?lared2d function redistributes a 1D array. It assumes that the input array byrow is distributed across
process columns and that all process rows contain the same copy of byrow. The output array byall is identical
on all processes and contains the entire array.

Input Parameters
In the descriptions below, np denotes the number of local entries of byrow.

n (global)
The number of elements to be redistributed. n ≥ 0.

ia, ja (global) ia, ja must be equal to 1.

desc (local) array of size dlen_. A 2D array descriptor, which describes byrow.

byrow (local).
Distributed block cyclic array of global size n and of local size np.
byrow is distributed across the process columns. All process rows
are assumed to contain the same value.

work (local).
Array of size lwork, used to hold the buffers sent from one process to another.

lwork (local) The size of the work array. lwork ≥ numroc(n, desc[nb_], 0, 0,
npcol).
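The workspace bound above is the local length that numroc returns for process 0. The C sketch below transcribes the standard NUMROC computation; the name numroc_sketch is hypothetical, and a real program should call the ScaLAPACK tool function instead. For n = 7 entries in blocks of nb = 2 over 3 processes, process 0 holds 3 entries and processes 1 and 2 hold 2 each. With isrcproc = 0, process 0 never holds fewer entries than any other process, which appears to be why the bound evaluates numroc at process 0.

```c
/* Number of entries of an n-element block-cyclic dimension (block size nb)
   owned by process iproc, when the first block sits on isrcproc and nprocs
   processes share the dimension. Sketch of the NUMROC logic. */
static int numroc_sketch(int n, int nb, int iproc, int isrcproc, int nprocs) {
    int mydist = (nprocs + iproc - isrcproc) % nprocs; /* distance from source */
    int nblocks = n / nb;                              /* number of full blocks */
    int num = (nblocks / nprocs) * nb;                 /* complete rounds */
    int extrablks = nblocks % nprocs;                  /* leftover full blocks */
    if (mydist < extrablks)
        num += nb;                                     /* one extra full block */
    else if (mydist == extrablks)
        num += n % nb;                                 /* trailing partial block */
    return num;
}
```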

Output Parameters

byall (global).
Global size n, local size n. byall is exactly duplicated on all processes. It
contains the same values as byrow, but it is replicated across all processes
rather than being distributed.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?larf
Applies an elementary reflector to a general
rectangular matrix.

Syntax
void pslarf (char *side , MKL_INT *m , MKL_INT *n , float *v , MKL_INT *iv , MKL_INT
*jv , MKL_INT *descv , MKL_INT *incv , float *tau , float *c , MKL_INT *ic , MKL_INT
*jc , MKL_INT *descc , float *work );
void pdlarf (char *side , MKL_INT *m , MKL_INT *n , double *v , MKL_INT *iv , MKL_INT
*jv , MKL_INT *descv , MKL_INT *incv , double *tau , double *c , MKL_INT *ic , MKL_INT
*jc , MKL_INT *descc , double *work );
void pclarf (char *side , MKL_INT *m , MKL_INT *n , MKL_Complex8 *v , MKL_INT *iv ,
MKL_INT *jv , MKL_INT *descv , MKL_INT *incv , MKL_Complex8 *tau , MKL_Complex8 *c ,
MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex8 *work );
void pzlarf (char *side , MKL_INT *m , MKL_INT *n , MKL_Complex16 *v , MKL_INT *iv ,
MKL_INT *jv , MKL_INT *descv , MKL_INT *incv , MKL_Complex16 *tau , MKL_Complex16 *c ,
MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex16 *work );

Include Files
• mkl_scalapack.h

Description
The p?larf function applies a real/complex elementary reflector Q (or Q^T) to a real/complex m-by-n
distributed matrix sub(C) = C(ic:ic+m-1, jc:jc+n-1), from either the left or the right. Q is represented
in the form

Q = I - tau*v*v',
where tau is a real/complex scalar and v is a real/complex vector.
If tau = 0, then Q is taken to be the unit matrix.
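The effect of Q on one side of sub(C) can be illustrated serially. The C sketch below applies Q = I - tau*v*v' from the left to an ordinary column-major m-by-n real matrix, one column at a time: w = v'*C(:,j), then C(:,j) = C(:,j) - tau*v*w. It ignores the distribution, the incv options, and the complex flavors, and the function name apply_reflector_left is hypothetical.

```c
/* Hypothetical serial sketch: C := (I - tau*v*v') * C for a real m-by-n
   column-major matrix c with leading dimension ldc. */
static void apply_reflector_left(int m, int n, const double *v, double tau,
                                 double *c, int ldc) {
    if (tau == 0.0)
        return;                                /* Q is the unit matrix */
    for (int j = 0; j < n; ++j) {
        double w = 0.0;                        /* w = v' * C(:,j) */
        for (int i = 0; i < m; ++i)
            w += v[i] * c[i + j * ldc];
        for (int i = 0; i < m; ++i)
            c[i + j * ldc] -= tau * v[i] * w;  /* C(:,j) -= tau * v * w */
    }
}
```

With v = (1, 1)' and tau = 1, Q swaps and negates the two rows, so the column (3, 1)' becomes (-1, -3)'.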

Input Parameters

side (global).
= 'L': form Q*sub(C),

= 'R': form sub(C)*Q, Q = Q^T.

m (global)
The number of rows in the distributed submatrix sub(C). (m ≥ 0).

n (global)
The number of columns in the distributed submatrix sub(C). (n ≥ 0).

v (local).
Pointer into the local memory to an array of size lld_v * LOCc(n_v),
containing the local pieces of the global distributed matrix V representing
the Householder transformation Q,
V(iv:iv+m-1, jv) if side = 'L' and incv = 1,

V(iv, jv:jv+m-1) if side = 'L' and incv = m_v,

V(iv:iv+n-1, jv) if side = 'R' and incv = 1,

V(iv, jv:jv+n-1) if side = 'R' and incv = m_v.

The array v is the representation of Q. v is not used if tau = 0.

iv, jv (global) The row and column indices in the global matrix V indicating the
first row and the first column of the matrix sub(V), respectively.

descv (global and local) array of size dlen_. The array descriptor for the
distributed matrix V.

incv (global)
The global increment for the elements of V. Only two values of incv are
supported in this version, namely 1 and m_v.
incv must not be zero.

tau (local).
Array of size LOCc(jv) if incv = 1, and LOCr(iv) otherwise. This array
contains the Householder scalars related to the Householder vectors.
tau is tied to the distributed matrix V.

c (local).
Pointer into the local memory to an array of size lld_c * LOCc(jc+n-1),
containing the local pieces of sub(C).

ic, jc (global)
The row and column indices in the global matrix C indicating the first row
and the first column of the matrix sub(C), respectively.

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local).

Array of size lwork.

If incv = 1,
    if side = 'L',
        if ivcol = iccol,
            lwork ≥ nqc0
        else
            lwork ≥ mpc0 + max( 1, nqc0 )
        end if
    else if side = 'R',
        lwork ≥ nqc0 + max( max( 1, mpc0 ), numroc( numroc( n+icoffc, nb_v, 0, 0, npcol ), nb_v, 0, 0, lcmq ) )
    end if
else if incv = m_v,
    if side = 'L',
        lwork ≥ mpc0 + max( max( 1, nqc0 ), numroc( numroc( m+iroffc, mb_v, 0, 0, nprow ), mb_v, 0, 0, lcmp ) )
    else if side = 'R',
        if ivrow = icrow,
            lwork ≥ mpc0
        else
            lwork ≥ nqc0 + max( 1, mpc0 )
        end if
    end if
end if,
where lcm is the least common multiple of nprow and npcol, and
lcm = ilcm( nprow, npcol ), lcmp = lcm/nprow, lcmq = lcm/npcol,
iroffc = mod( ic-1, mb_c ), icoffc = mod( jc-1, nb_c ),
icrow = indxg2p( ic, mb_c, myrow, rsrc_c, nprow ),
iccol = indxg2p( jc, nb_c, mycol, csrc_c, npcol ),
ivrow = indxg2p( iv, mb_v, myrow, rsrc_v, nprow ),
ivcol = indxg2p( jv, nb_v, mycol, csrc_v, npcol ),
mpc0 = numroc( m+iroffc, mb_c, myrow, icrow, nprow ),
nqc0 = numroc( n+icoffc, nb_c, mycol, iccol, npcol );
ilcm, indxg2p, and numroc are ScaLAPACK tool functions; myrow, mycol,
nprow, and npcol can be determined by calling the function
blacs_gridinfo.
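The workspace rule above can be evaluated mechanically. The C sketch below transcribes it, together with minimal stand-ins for the ScaLAPACK tool functions ilcm, indxg2p, and numroc. All names ending in _sketch, the struct larf_lwork_args, its field incv_is_one, and max_int are hypothetical. ivrow and ivcol are computed from iv and jv with indxg2p, in the same way as icrow and iccol. In a real program, use the tool functions and the grid values returned by blacs_gridinfo.

```c
/* Minimal stand-ins for ScaLAPACK tool functions (sketches, not the library). */
static int numroc_sketch(int n, int nb, int iproc, int isrcproc, int nprocs) {
    int mydist = (nprocs + iproc - isrcproc) % nprocs;
    int nblocks = n / nb;
    int num = (nblocks / nprocs) * nb;
    int extrablks = nblocks % nprocs;
    if (mydist < extrablks) num += nb;
    else if (mydist == extrablks) num += n % nb;
    return num;
}

/* Process coordinate that owns 1-based global index g (INDXG2P logic). */
static int indxg2p_sketch(int g, int nb, int isrcproc, int nprocs) {
    return (isrcproc + (g - 1) / nb) % nprocs;
}

static int gcd_sketch(int a, int b) {
    while (b != 0) { int r = a % b; a = b; b = r; }
    return a;
}

static int ilcm_sketch(int a, int b) { return a / gcd_sketch(a, b) * b; }

static int max_int(int a, int b) { return a > b ? a : b; }

/* Hypothetical bundle of the scalars the p?larf workspace rule depends on. */
typedef struct {
    char side;        /* 'L' or 'R' */
    int incv_is_one;  /* 1 if incv = 1, 0 if incv = m_v */
    int m, n, iv, jv, ic, jc;
    int mb_v, nb_v, rsrc_v, csrc_v;
    int mb_c, nb_c, rsrc_c, csrc_c;
    int myrow, mycol, nprow, npcol;
} larf_lwork_args;

/* Minimum lwork for p?larf on the calling process, following the rule above. */
static int larf_min_lwork(const larf_lwork_args *a) {
    int lcm = ilcm_sketch(a->nprow, a->npcol);
    int lcmp = lcm / a->nprow, lcmq = lcm / a->npcol;
    int iroffc = (a->ic - 1) % a->mb_c, icoffc = (a->jc - 1) % a->nb_c;
    int icrow = indxg2p_sketch(a->ic, a->mb_c, a->rsrc_c, a->nprow);
    int iccol = indxg2p_sketch(a->jc, a->nb_c, a->csrc_c, a->npcol);
    int ivrow = indxg2p_sketch(a->iv, a->mb_v, a->rsrc_v, a->nprow);
    int ivcol = indxg2p_sketch(a->jv, a->nb_v, a->csrc_v, a->npcol);
    int mpc0 = numroc_sketch(a->m + iroffc, a->mb_c, a->myrow, icrow, a->nprow);
    int nqc0 = numroc_sketch(a->n + icoffc, a->nb_c, a->mycol, iccol, a->npcol);

    if (a->incv_is_one) {
        if (a->side == 'L')
            return ivcol == iccol ? nqc0 : mpc0 + max_int(1, nqc0);
        return nqc0 + max_int(max_int(1, mpc0),
            numroc_sketch(numroc_sketch(a->n + icoffc, a->nb_v, 0, 0, a->npcol),
                          a->nb_v, 0, 0, lcmq));
    }
    if (a->side == 'L')
        return mpc0 + max_int(max_int(1, nqc0),
            numroc_sketch(numroc_sketch(a->m + iroffc, a->mb_v, 0, 0, a->nprow),
                          a->mb_v, 0, 0, lcmp));
    return ivrow == icrow ? mpc0 : nqc0 + max_int(1, mpc0);
}
```

The same rule is listed for p?larfc and p?larz, so the sketch applies to those routines as well.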

Output Parameters

c (local).
On exit, sub(C) is overwritten by Q*sub(C) if side = 'L',
or by sub(C)*Q if side = 'R'.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?larfb
Applies a block reflector or its transpose/conjugate-
transpose to a general rectangular matrix.

Syntax
void pslarfb (char *side , char *trans , char *direct , char *storev , MKL_INT *m ,
MKL_INT *n , MKL_INT *k , float *v , MKL_INT *iv , MKL_INT *jv , MKL_INT *descv , float
*t , float *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , float *work );
void pdlarfb (char *side , char *trans , char *direct , char *storev , MKL_INT *m ,
MKL_INT *n , MKL_INT *k , double *v , MKL_INT *iv , MKL_INT *jv , MKL_INT *descv ,
double *t , double *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , double *work );
void pclarfb (char *side , char *trans , char *direct , char *storev , MKL_INT *m ,
MKL_INT *n , MKL_INT *k , MKL_Complex8 *v , MKL_INT *iv , MKL_INT *jv , MKL_INT *descv ,
MKL_Complex8 *t , MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc ,
MKL_Complex8 *work );
void pzlarfb (char *side , char *trans , char *direct , char *storev , MKL_INT *m ,
MKL_INT *n , MKL_INT *k , MKL_Complex16 *v , MKL_INT *iv , MKL_INT *jv , MKL_INT
*descv , MKL_Complex16 *t , MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT
*descc , MKL_Complex16 *work );

Include Files
• mkl_scalapack.h

Description
The p?larfb function applies a real/complex block reflector Q or its transpose Q^T/conjugate transpose Q^H to a
real/complex distributed m-by-n matrix sub(C) = C(ic:ic+m-1, jc:jc+n-1) from the left or the right.

Input Parameters

side (global)
if side = 'L': apply Q or Q^T for real flavors (Q^H for complex flavors) from
the left;
if side = 'R': apply Q or Q^T for real flavors (Q^H for complex flavors) from
the right.

trans (global)
if trans = 'N': no transpose, apply Q;

for real flavors, if trans = 'T': transpose, apply Q^T;

for complex flavors, if trans = 'C': conjugate transpose, apply Q^H.

direct (global) Indicates how Q is formed from a product of elementary reflectors.

if direct = 'F': Q = H(1)*H(2)*...*H(k) (Forward)

if direct = 'B': Q = H(k)*...*H(2)*H(1) (Backward)

storev (global)
Indicates how the vectors that define the elementary reflectors are stored:
if storev = 'C': Columnwise

if storev = 'R': Rowwise.

m (global)
The number of rows in the distributed matrix sub(C). (m ≥ 0).

n (global)
The number of columns in the distributed matrix sub(C). (n ≥ 0).

k (global)
The order of the matrix T, which equals the number of elementary reflectors whose product defines Q.

v (local).
Pointer into the local memory to an array of size

lld_v * LOCc(jv+k-1) if storev = 'C',

lld_v * LOCc(jv+m-1) if storev = 'R' and side = 'L',

lld_v * LOCc(jv+n-1) if storev = 'R' and side = 'R'.

It contains the local pieces of the distributed vectors V representing the
Householder transformation.
if storev = 'C' and side = 'L', lld_v ≥ max(1,LOCr(iv+m-1));

if storev = 'C' and side = 'R', lld_v ≥ max(1,LOCr(iv+n-1));

if storev = 'R', lld_v ≥ LOCr(iv+k-1).

iv, jv (global)
The row and column indices in the global matrix V indicating the first row
and the first column of the matrix sub(V), respectively.

descv (global and local) array of size dlen_. The array descriptor for the
distributed matrix V.

c (local).
Pointer into the local memory to an array of size lld_c * LOCc(jc+n-1),
containing the local pieces of sub(C).

ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the matrix sub(C), respectively.

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local).
Workspace array of size lwork.
If storev = 'C',
    if side = 'L',
        lwork ≥ ( nqc0 + mpc0 ) * k
    else if side = 'R',
        lwork ≥ ( nqc0 + max( npv0 + numroc( numroc( n+icoffc, nb_v, 0, 0, npcol ), nb_v, 0, 0, lcmq ), mpc0 ) ) * k
    end if
else if storev = 'R',
    if side = 'L',
        lwork ≥ ( mpc0 + max( mqv0 + numroc( numroc( m+iroffc, mb_v, 0, 0, nprow ), mb_v, 0, 0, lcmp ), nqc0 ) ) * k
    else if side = 'R',
        lwork ≥ ( mpc0 + nqc0 ) * k
    end if
end if,
where
lcm = ilcm( nprow, npcol ), lcmp = lcm / nprow, lcmq = lcm / npcol,
iroffv = mod( iv-1, mb_v ), icoffv = mod( jv-1, nb_v ),
ivrow = indxg2p( iv, mb_v, myrow, rsrc_v, nprow ),
ivcol = indxg2p( jv, nb_v, mycol, csrc_v, npcol ),
mqv0 = numroc( m+icoffv, nb_v, mycol, ivcol, npcol ),
npv0 = numroc( n+iroffv, mb_v, myrow, ivrow, nprow ),
iroffc = mod( ic-1, mb_c ), icoffc = mod( jc-1, nb_c ),
icrow = indxg2p( ic, mb_c, myrow, rsrc_c, nprow ),
iccol = indxg2p( jc, nb_c, mycol, csrc_c, npcol ),
mpc0 = numroc( m+iroffc, mb_c, myrow, icrow, nprow ),
npc0 = numroc( n+icoffc, mb_c, myrow, icrow, nprow ),
nqc0 = numroc( n+icoffc, nb_c, mycol, iccol, npcol );
ilcm, indxg2p, and numroc are ScaLAPACK tool functions; myrow, mycol,
nprow, and npcol can be determined by calling the function
blacs_gridinfo.

Output Parameters

t (local).
Array of size mb_v * mb_v if storev = 'R', and nb_v * nb_v if storev =
'C'. The triangular matrix t is the representation of the block reflector.

c (local).
On exit, sub(C) is overwritten by Q*sub(C), Q'*sub(C), sub(C)*Q, or
sub(C)*Q', where Q' is the transpose (conjugate transpose for complex flavors) of Q.
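For side = 'L' with a forward, columnwise V (direct = 'F', storev = 'C'), applying the block reflector Q = I - V*T*V' reduces to three small products per column of C: w = V'*C(:,j), u = op(T)*w, and C(:,j) = C(:,j) - V*u, where op(T) is T for trans = 'N' and T' for trans = 'T'. The serial real C sketch below shows this with an explicitly stored V (unit diagonal included) and an upper triangular T. It is not the distributed algorithm, and the function name larfb_left_sketch is hypothetical.

```c
#include <stdlib.h>

/* Serial sketch for side = 'L', direct = 'F', storev = 'C' (real case):
   C := Q*C (trans = 'N') or Q'*C (trans = 'T'), with Q = I - V*T*V'.
   v: m-by-k column-major (ldv), unit diagonal stored explicitly;
   t: k-by-k upper triangular (ldt); c: m-by-n column-major (ldc).
   Returns 0 on success and -1 if the scratch allocation fails. */
static int larfb_left_sketch(char trans, int m, int n, int k,
                             const double *v, int ldv,
                             const double *t, int ldt,
                             double *c, int ldc) {
    if (m <= 0 || n <= 0 || k <= 0)
        return 0;
    double *w = malloc(2 * (size_t)k * sizeof *w);
    if (w == NULL)
        return -1;
    double *u = w + k;
    for (int j = 0; j < n; ++j) {
        double *col = c + (size_t)j * ldc;
        for (int p = 0; p < k; ++p) {          /* w = V' * C(:,j) */
            double s = 0.0;
            for (int i = 0; i < m; ++i)
                s += v[i + p * ldv] * col[i];
            w[p] = s;
        }
        for (int p = 0; p < k; ++p) {          /* u = op(T) * w */
            double s = 0.0;
            if (trans == 'N') {
                for (int q = p; q < k; ++q) s += t[p + q * ldt] * w[q];
            } else {
                for (int q = 0; q <= p; ++q) s += t[q + p * ldt] * w[q];
            }
            u[p] = s;
        }
        for (int i = 0; i < m; ++i) {          /* C(:,j) -= V * u */
            double s = 0.0;
            for (int p = 0; p < k; ++p)
                s += v[i + p * ldv] * u[p];
            col[i] -= s;
        }
    }
    free(w);
    return 0;
}
```

With V having columns (1, 1, 2)' and (0, 1, 3)' and T = [0.5, -0.875; 0, 0.25], applying Q to (1, 0, 0)' gives (0.5, -0.5, -1)', the same result as applying H(2) and then H(1) one at a time with tau = (0.5, 0.25).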

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?larfc
Applies the conjugate transpose of an elementary
reflector to a general matrix.

Syntax
void pclarfc (char *side , MKL_INT *m , MKL_INT *n , MKL_Complex8 *v , MKL_INT *iv ,
MKL_INT *jv , MKL_INT *descv , MKL_INT *incv , MKL_Complex8 *tau , MKL_Complex8 *c ,
MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex8 *work );
void pzlarfc (char *side , MKL_INT *m , MKL_INT *n , MKL_Complex16 *v , MKL_INT *iv ,
MKL_INT *jv , MKL_INT *descv , MKL_INT *incv , MKL_Complex16 *tau , MKL_Complex16 *c ,
MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex16 *work );

Include Files
• mkl_scalapack.h

Description
The p?larfc function applies a complex elementary reflector Q^H to a complex m-by-n distributed matrix
sub(C) = C(ic:ic+m-1, jc:jc+n-1), from either the left or the right. Q is represented in the form

Q = I - tau*v*v',
where tau is a complex scalar and v is a complex vector.
If tau = 0, then Q is taken to be the unit matrix.
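Because Q = I - tau*v*v' with v' the conjugate transpose of v, its conjugate transpose is I - conj(tau)*v*v'. The serial C99 sketch below applies that matrix from the left to a column-major complex matrix using <complex.h>. It illustrates only the side = 'L' arithmetic, not the distributed routine, and the name apply_reflector_conj_left is hypothetical.

```c
#include <complex.h>

/* Hypothetical serial sketch: C := Q^H * C with Q = I - tau*v*v^H, that is
   Q^H = I - conj(tau)*v*v^H. c is m-by-n, column-major, leading dim ldc. */
static void apply_reflector_conj_left(int m, int n, const double complex *v,
                                      double complex tau,
                                      double complex *c, int ldc) {
    if (tau == 0.0)
        return;                                      /* Q is the unit matrix */
    for (int j = 0; j < n; ++j) {
        double complex w = 0.0;                      /* w = v^H * C(:,j) */
        for (int i = 0; i < m; ++i)
            w += conj(v[i]) * c[i + j * ldc];
        for (int i = 0; i < m; ++i)
            c[i + j * ldc] -= conj(tau) * v[i] * w;  /* C(:,j) -= conj(tau)*v*w */
    }
}
```

With v = (1, i)' and tau = i, the column (1, 0)' becomes (1 + i, -1)'.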

Input Parameters

side (global)
if side = 'L': form Q^H*sub(C);

if side = 'R': form sub(C)*Q^H.

m (global)
The number of rows in the distributed matrix sub(C). (m ≥ 0).

n (global)
The number of columns in the distributed matrix sub(C). (n ≥ 0).

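v (local).
Pointer into the local memory to an array of size lld_v * LOCc(n_v),
containing the local pieces of the global distributed matrix V representing
the Householder transformation Q,
V(iv:iv+m-1, jv) if side = 'L' and incv = 1,

V(iv, jv:jv+m-1) if side = 'L' and incv = m_v,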

V(iv:iv+n-1, jv) if side = 'R' and incv = 1,

V(iv, jv:jv+n-1) if side = 'R' and incv = m_v.

The array v is the representation of Q. v is not used if tau = 0.

iv, jv (global)
The row and column indices in the global matrix V indicating the first row
and the first column of the matrix sub(V), respectively.

descv (global and local) array of size dlen_. The array descriptor for the
distributed matrix V.

incv (global)
The global increment for the elements of v. Only two values of incv are
supported in this version, namely 1 and m_v.
incv must not be zero.

tau (local)
Array of size LOCc(jv) if incv = 1, and LOCr(iv) otherwise. This array
contains the Householder scalars related to the Householder vectors.
tau is tied to the distributed matrix V.

c (local).
Pointer into the local memory to an array of size lld_c * LOCc(jc+n-1),
containing the local pieces of sub(C).

ic, jc (global)
The row and column indices in the global matrix C indicating the first row
and the first column of the matrix sub(C), respectively.

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local).
Workspace array of size lwork.
If incv = 1,
    if side = 'L',
        if ivcol = iccol,
            lwork ≥ nqc0
        else
            lwork ≥ mpc0 + max( 1, nqc0 )
        end if
    else if side = 'R',
        lwork ≥ nqc0 + max( max( 1, mpc0 ), numroc( numroc( n+icoffc, nb_v, 0, 0, npcol ), nb_v, 0, 0, lcmq ) )
    end if
else if incv = m_v,
    if side = 'L',
        lwork ≥ mpc0 + max( max( 1, nqc0 ), numroc( numroc( m+iroffc, mb_v, 0, 0, nprow ), mb_v, 0, 0, lcmp ) )
    else if side = 'R',
        if ivrow = icrow,
            lwork ≥ mpc0
        else
            lwork ≥ nqc0 + max( 1, mpc0 )
        end if
    end if
end if,
where lcm is the least common multiple of nprow and npcol, and
lcm = ilcm( nprow, npcol ), lcmp = lcm/nprow, lcmq = lcm/npcol,
iroffc = mod( ic-1, mb_c ), icoffc = mod( jc-1, nb_c ),
icrow = indxg2p( ic, mb_c, myrow, rsrc_c, nprow ),
iccol = indxg2p( jc, nb_c, mycol, csrc_c, npcol ),
ivrow = indxg2p( iv, mb_v, myrow, rsrc_v, nprow ),
ivcol = indxg2p( jv, nb_v, mycol, csrc_v, npcol ),
mpc0 = numroc( m+iroffc, mb_c, myrow, icrow, nprow ),
nqc0 = numroc( n+icoffc, nb_c, mycol, iccol, npcol );
ilcm, indxg2p, and numroc are ScaLAPACK tool functions; myrow, mycol,
nprow, and npcol can be determined by calling the function
blacs_gridinfo.

Output Parameters

c (local).
On exit, sub(C) is overwritten by Q^H*sub(C) if side = 'L', or by
sub(C)*Q^H if side = 'R'.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?larfg
Generates an elementary reflector (Householder
matrix).

Syntax
void pslarfg (MKL_INT *n , float *alpha , MKL_INT *iax , MKL_INT *jax , float *x ,
MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , MKL_INT *incx , float *tau );
void pdlarfg (MKL_INT *n , double *alpha , MKL_INT *iax , MKL_INT *jax , double *x ,
MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , MKL_INT *incx , double *tau );
void pclarfg (MKL_INT *n , MKL_Complex8 *alpha , MKL_INT *iax , MKL_INT *jax ,
MKL_Complex8 *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , MKL_INT *incx ,
MKL_Complex8 *tau );
void pzlarfg (MKL_INT *n , MKL_Complex16 *alpha , MKL_INT *iax , MKL_INT *jax ,
MKL_Complex16 *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , MKL_INT *incx ,
MKL_Complex16 *tau );

Include Files
• mkl_scalapack.h

Description
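The p?larfg function generates a real/complex elementary reflector H of order n, such that

H' * [alpha; sub(X)] = [beta; 0],   H' * H = I,

where H' is the transpose (conjugate transpose for complex flavors) of H, [a; b] denotes the column vector
formed by stacking a above b, alpha and beta are scalars (beta is always real), and sub(X) is an (n-1)-element real/complex
distributed vector X(ix:ix+n-2, jx) if incx = 1 and X(ix, jx:jx+n-2) if incx = m_x. H is represented
in the form

H = I - tau * [1; v] * [1; v]',

where tau is a real/complex scalar and v is a real/complex (n-1)-element vector. Note that for complex flavors H is not
Hermitian.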
If the elements of sub(X) are all zero (and X(iax, jax) is real for complex flavors), then tau = 0 and H is
taken to be the unit matrix.
Otherwise 1 ≤ real(tau) ≤ 2 and abs(tau-1) ≤ 1.

Input Parameters

n (global)
The global order of the elementary reflector. n ≥ 0.

iax, jax (global)

The global row and column indices of X(iax, jax) in the global matrix X.

x (local).
Pointer into the local memory to an array of size lld_x * LOCc(n_x). This
array contains the local pieces of the distributed vector sub(X). Before
entry, the incremented array sub(X) must contain vector x.

ix, jx (global)
The row and column indices in the global matrix X indicating the first row
and the first column of sub(X), respectively.

descx (global and local)

Array of size dlen_. The array descriptor for the distributed matrix X.

incx (global)
The global increment for the elements of x. Only two values of incx are
supported in this version, namely 1 and m_x. incx must not be zero.

Output Parameters

alpha (local)
On exit, alpha is overwritten by the value beta. It is computed in the process scope that owns the vector sub(X).

x (local).
On exit, it is overwritten with the vector v.

tau (local).
Array of size LOCc(jx) if incx = 1, and LOCr(ix) otherwise. This array
contains the Householder scalars related to the Householder vectors.
tau is tied to the distributed matrix X.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?larft
Forms the triangular factor T of a block reflector
H = I - V*T*V^H.

Syntax
void pslarft (char *direct , char *storev , MKL_INT *n , MKL_INT *k , float *v , MKL_INT
*iv , MKL_INT *jv , MKL_INT *descv , float *tau , float *t , float *work );
void pdlarft (char *direct , char *storev , MKL_INT *n , MKL_INT *k , double *v ,
MKL_INT *iv , MKL_INT *jv , MKL_INT *descv , double *tau , double *t , double *work );
void pclarft (char *direct , char *storev , MKL_INT *n , MKL_INT *k , MKL_Complex8 *v ,
MKL_INT *iv , MKL_INT *jv , MKL_INT *descv , MKL_Complex8 *tau , MKL_Complex8 *t ,
MKL_Complex8 *work );
void pzlarft (char *direct , char *storev , MKL_INT *n , MKL_INT *k , MKL_Complex16
*v , MKL_INT *iv , MKL_INT *jv , MKL_INT *descv , MKL_Complex16 *tau , MKL_Complex16
*t , MKL_Complex16 *work );

Include Files
• mkl_scalapack.h

Description
The p?larft function forms the triangular factor T of a real/complex block reflector H of order n, which is
defined as a product of k elementary reflectors.
If direct = 'F', H = H(1)*H(2)*...*H(k), and T is upper triangular;

if direct = 'B', H = H(k)*...*H(2)*H(1), and T is lower triangular.

If storev = 'C', the vector which defines the elementary reflector H(i) is stored in the i-th column of the
distributed matrix V, and
H = I - V*T*V'

If storev = 'R', the vector which defines the elementary reflector H(i) is stored in the i-th row of the
distributed matrix V, and
H = I - V'*T*V.

Input Parameters

direct (global)
Specifies the order in which the elementary reflectors are multiplied to form
the block reflector:
if direct = 'F': H = H(1)*H(2)*...*H(k) (forward)

if direct = 'B': H = H(k)*...*H(2)*H(1) (backward).

storev (global)
Specifies how the vectors that define the elementary reflectors are stored
(See Application Notes below):
if storev = 'C': columnwise;

if storev = 'R': rowwise.

n (global)
The order of the block reflector H. n ≥ 0.

k (global)
The order of the triangular factor T, which is equal to the number of elementary
reflectors.

1 ≤ k ≤ mb_v (= nb_v).

v Pointer into the local memory to an array of local size

LOCr(iv+n-1) * LOCc(jv+k-1) if storev = 'C', and

LOCr(iv+k-1) * LOCc(jv+n-1) if storev = 'R'.
The distributed matrix V contains the Householder vectors. (See Application
Notes below).

iv, jv (global)
The row and column indices in the global matrix V indicating the first row
and the first column of the matrix sub(V), respectively.

descv (local) array of size dlen_. The array descriptor for the distributed matrix V.

tau (local)
Array of size LOCr(iv+k-1) if storev = 'R', and LOCc(jv+k-1) otherwise.
This array contains the Householder scalars related to the Householder
vectors.
tau is tied to the distributed matrix V.

work (local).
Workspace array of size k*(k-1)/2.

Output Parameters

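v (local)
On exit, v is unchanged: the array elements corresponding to the unit elements of
the Householder vectors, which are not stored, are modified during the computation
but restored (see Application Notes).
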
t (local)
Array of size nb_v * nb_v if storev = 'C', and mb_v * mb_v otherwise. It
contains the k-by-k triangular factor of the block reflector associated with v.
If direct = 'F', t is upper triangular;

if direct = 'B', t is lower triangular.

Application Notes
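The shape of the matrix V and the storage of the vectors that define the H(i) is best illustrated by the
following example with n = 5 and k = 3. The elements equal to 1 are not stored; the corresponding array
elements are modified but restored on exit. The rest of the array is not used.

direct = 'F' and storev = 'C':

V = (  1       )
    ( v1  1    )
    ( v1 v2  1 )
    ( v1 v2 v3 )
    ( v1 v2 v3 )

direct = 'F' and storev = 'R':

V = (  1 v1 v1 v1 v1 )
    (     1 v2 v2 v2 )
    (        1 v3 v3 )

direct = 'B' and storev = 'C':

V = ( v1 v2 v3 )
    ( v1 v2 v3 )
    (  1 v2 v3 )
    (     1 v3 )
    (        1 )

direct = 'B' and storev = 'R':

V = ( v1 v1  1       )
    ( v2 v2 v2  1    )
    ( v3 v3 v3 v3  1 )

where vi denotes an element of the vector that defines H(i).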
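For direct = 'F' and storev = 'C', T can be built one column at a time: T(i,i) = tau(i), and the part of column i above the diagonal is T(1:i-1,1:i-1) * (-tau(i) * V(:,1:i-1)' * V(:,i)). The serial real C sketch below follows that recurrence (the one used by the LAPACK routine dlarft), with 0-based indices and, unlike the distributed routine, with the unit diagonal of V stored explicitly. The function name larft_forward_columnwise is hypothetical.

```c
/* Hypothetical serial sketch of T for direct = 'F', storev = 'C' (real case).
   v: n-by-k column-major (ldv); column i holds reflector i with v(i,i) = 1
   stored explicitly and zeros above it. t: k-by-k column-major (ldt); only
   the upper triangle is written. */
static void larft_forward_columnwise(int n, int k, const double *v, int ldv,
                                     const double *tau, double *t, int ldt) {
    for (int i = 0; i < k; ++i) {
        /* t(0:i-1, i) = -tau[i] * V(i:n-1, 0:i-1)' * V(i:n-1, i);
           rows above i are skipped because V(r, i) = 0 for r < i. */
        for (int j = 0; j < i; ++j) {
            double s = 0.0;
            for (int r = i; r < n; ++r)
                s += v[r + j * ldv] * v[r + i * ldv];
            t[j + i * ldt] = -tau[i] * s;
        }
        /* t(0:i-1, i) = T(0:i-1, 0:i-1) * t(0:i-1, i), in place; going down
           the rows is safe because row j only reads entries j..i-1. */
        for (int j = 0; j < i; ++j) {
            double s = 0.0;
            for (int p = j; p < i; ++p)
                s += t[j + p * ldt] * t[p + i * ldt];
            t[j + i * ldt] = s;
        }
        t[i + i * ldt] = tau[i];
    }
}
```

With V having columns (1, 1, 2)' and (0, 1, 3)' and tau = (0.5, 0.25), the off-diagonal entry is -0.5*0.25*7 = -0.875, so I - V*T*V' reproduces the cross term of H(1)*H(2).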

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?larz
Applies an elementary reflector as returned by
p?tzrzf to a general matrix.

Syntax
void pslarz (char *side , MKL_INT *m , MKL_INT *n , MKL_INT *l , float *v , MKL_INT
*iv , MKL_INT *jv , MKL_INT *descv , MKL_INT *incv , float *tau , float *c , MKL_INT
*ic , MKL_INT *jc , MKL_INT *descc , float *work );
void pdlarz (char *side , MKL_INT *m , MKL_INT *n , MKL_INT *l , double *v , MKL_INT
*iv , MKL_INT *jv , MKL_INT *descv , MKL_INT *incv , double *tau , double *c , MKL_INT
*ic , MKL_INT *jc , MKL_INT *descc , double *work );

void pclarz (char *side , MKL_INT *m , MKL_INT *n , MKL_INT *l , MKL_Complex8 *v ,
MKL_INT *iv , MKL_INT *jv , MKL_INT *descv , MKL_INT *incv , MKL_Complex8 *tau ,
MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex8 *work );
void pzlarz (char *side , MKL_INT *m , MKL_INT *n , MKL_INT *l , MKL_Complex16 *v ,
MKL_INT *iv , MKL_INT *jv , MKL_INT *descv , MKL_INT *incv , MKL_Complex16 *tau ,
MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex16 *work );

Include Files
• mkl_scalapack.h

Description
The p?larz function applies a real/complex elementary reflector Q (or Q^T) to a real/complex m-by-n
distributed matrix sub(C) = C(ic:ic+m-1, jc:jc+n-1), from either the left or the right. Q is represented
in the form
Q = I - tau*v*v',
where tau is a real/complex scalar and v is a real/complex vector.
If tau = 0, then Q is taken to be the unit matrix.

Q is a product of k elementary reflectors as returned by p?tzrzf.
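In the RZ layout produced by p?tzrzf, the full reflector vector is assumed to have a unit entry in the first position, zeros in the middle, and the l meaningful entries of v in its last l positions. The serial real C sketch below applies Q = I - tau*u*u' with u = (1, 0, ..., 0, v(1:l))' from the left, which is how the serial LAPACK routine ?larz treats side = 'L'. It assumes l < m, and the name larz_left_sketch is hypothetical.

```c
#include <stddef.h>

/* Serial real sketch of side = 'L': C := (I - tau*u*u')*C, where
   u = (1, 0, ..., 0, v[0], ..., v[l-1])' has length m, its unit entry in
   row 0 and v in the last l rows (assumes 0 <= l < m). c is m-by-n,
   column-major with leading dimension ldc. */
static void larz_left_sketch(int m, int n, int l, const double *v, double tau,
                             double *c, int ldc) {
    if (tau == 0.0)
        return;                                /* Q is the unit matrix */
    for (int j = 0; j < n; ++j) {
        double *col = c + (size_t)j * ldc;
        double w = col[0];                     /* w = u' * C(:,j) */
        for (int i = 0; i < l; ++i)
            w += v[i] * col[m - l + i];
        col[0] -= tau * w;                     /* row 0: unit entry of u */
        for (int i = 0; i < l; ++i)
            col[m - l + i] -= tau * v[i] * w;  /* last l rows: v part */
    }
}
```

Only row 0 and the last l rows of sub(C) change; the rows in between, which face the zero block of u, are left untouched.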

Input Parameters

side (global)
if side = 'L': form Q*sub(C),

if side = 'R': form sub(C)*Q, Q = Q^T (for real flavors).

m (global)
The number of rows in the distributed matrix sub(C). (m ≥ 0).

n (global)
The number of columns in the distributed matrix sub(C). (n ≥ 0).

l (global)
The number of columns of the distributed matrix sub(A) containing the meaningful
part of the Householder reflectors. If side = 'L', m ≥ l ≥ 0;

if side = 'R', n ≥ l ≥ 0.

v (local).
Pointer into the local memory to an array of size lld_v * LOCc(n_v)
containing the local pieces of the global distributed matrix V representing
the Householder transformation Q,
V(iv:iv+l-1, jv) if side = 'L' and incv = 1,

V(iv, jv:jv+l-1) if side = 'L' and incv = m_v,

V(iv:iv+l-1, jv) if side = 'R' and incv = 1,

V(iv, jv:jv+l-1) if side = 'R' and incv = m_v.

The vector v in the representation of Q. v is not used if tau = 0.

iv, jv (global) The row and column indices in the global distributed matrix V
indicating the first row and the first column of the matrix sub(V),
respectively.

descv (global and local) array of size dlen_. The array descriptor for the
distributed matrix V.

incv (global)
The global increment for the elements of V. Only two values of incv are
supported in this version, namely 1 and m_v.
incv must not be zero.

tau (local)
Array of size LOCc(jv) if incv = 1, and LOCr(iv) otherwise. This array
contains the Householder scalars related to the Householder vectors.
tau is tied to the distributed matrix V.

c (local).
Pointer into the local memory to an array of size lld_c * LOCc(jc+n-1),
containing the local pieces of sub(C).

ic, jc (global)
The row and column indices in the global matrix C indicating the first row
and the first column of the matrix sub(C), respectively.

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local).
Array of size lwork.
If incv = 1,
    if side = 'L',
        if ivcol = iccol,
            lwork ≥ nqc0
        else
            lwork ≥ mpc0 + max( 1, nqc0 )
        end if
    else if side = 'R',
        lwork ≥ nqc0 + max( max( 1, mpc0 ), numroc( numroc( n+icoffc, nb_v, 0, 0, npcol ), nb_v, 0, 0, lcmq ) )
    end if
else if incv = m_v,
    if side = 'L',
        lwork ≥ mpc0 + max( max( 1, nqc0 ), numroc( numroc( m+iroffc, mb_v, 0, 0, nprow ), mb_v, 0, 0, lcmp ) )
    else if side = 'R',
        if ivrow = icrow,
            lwork ≥ mpc0
        else
            lwork ≥ nqc0 + max( 1, mpc0 )
        end if
    end if
end if.
Here lcm is the least common multiple of nprow and npcol, and
lcm = ilcm( nprow, npcol ), lcmp = lcm / nprow, lcmq = lcm / npcol,
iroffc = mod( ic-1, mb_c ), icoffc = mod( jc-1, nb_c ),
icrow = indxg2p( ic, mb_c, myrow, rsrc_c, nprow ),
iccol = indxg2p( jc, nb_c, mycol, csrc_c, npcol ),
ivrow = indxg2p( iv, mb_v, myrow, rsrc_v, nprow ),
ivcol = indxg2p( jv, nb_v, mycol, csrc_v, npcol ),
mpc0 = numroc( m+iroffc, mb_c, myrow, icrow, nprow ),
nqc0 = numroc( n+icoffc, nb_c, mycol, iccol, npcol );
ilcm, indxg2p, and numroc are ScaLAPACK tool functions; myrow, mycol,
nprow, and npcol can be determined by calling the function
blacs_gridinfo.

Output Parameters

c (local).
On exit, sub(C) is overwritten by Q*sub(C) if side = 'L', or by
sub(C)*Q if side = 'R'.
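The workspace bound for p?larz is easier to evaluate with the tool functions spelled out. The following C sketch re-implements the standard numroc, indxg2p, and ilcm formulas under the hypothetical names my_numroc, my_indxg2p, and my_ilcm (0-based process coordinates, 1-based global indices, as in ScaLAPACK; the unused process argument of indxg2p is omitted) and uses them to evaluate the incv = 1 bound. It is an illustration under those assumptions, not the library's code; a real program should call the ScaLAPACK tool functions and obtain myrow, mycol, nprow, and npcol from blacs_gridinfo.

```c
/* Hypothetical serial re-implementations of the ScaLAPACK tool functions
 * numroc, indxg2p, and ilcm. Process coordinates are 0-based and global
 * indices are 1-based, as in ScaLAPACK. */

/* Number of rows (or columns) of an n-long dimension, distributed in blocks
 * of nb over nprocs processes starting at isrcproc, owned by process iproc. */
int my_numroc(int n, int nb, int iproc, int isrcproc, int nprocs)
{
    int mydist = (nprocs + iproc - isrcproc) % nprocs;
    int nblocks = n / nb;
    int count = (nblocks / nprocs) * nb;  /* full rounds of blocks */
    int extrablks = nblocks % nprocs;

    if (mydist < extrablks)
        count += nb;                      /* one extra full block */
    else if (mydist == extrablks)
        count += n % nb;                  /* the trailing partial block */
    return count;
}

/* Process coordinate that owns global index indxglob (1-based). */
int my_indxg2p(int indxglob, int nb, int isrcproc, int nprocs)
{
    return (isrcproc + (indxglob - 1) / nb) % nprocs;
}

/* Least common multiple of two positive integers. */
int my_ilcm(int a, int b)
{
    int x = a, y = b;
    while (y != 0) {                      /* Euclid's algorithm: x ends as gcd */
        int r = x % y;
        x = y;
        y = r;
    }
    return a / x * b;
}

static int max_int(int a, int b) { return a > b ? a : b; }

/* Minimum lwork for p?larz when incv = 1, following the bound above.
 * side is 'L' or 'R'; ic, jc, jv are 1-based global indices. */
int larz_lwork_incv1(char side, int m, int n, int ic, int jc,
                     int mb_c, int nb_c, int rsrc_c, int csrc_c,
                     int jv, int nb_v, int csrc_v,
                     int myrow, int mycol, int nprow, int npcol)
{
    int lcmq = my_ilcm(nprow, npcol) / npcol;
    int iroffc = (ic - 1) % mb_c;
    int icoffc = (jc - 1) % nb_c;
    int icrow = my_indxg2p(ic, mb_c, rsrc_c, nprow);
    int iccol = my_indxg2p(jc, nb_c, csrc_c, npcol);
    int ivcol = my_indxg2p(jv, nb_v, csrc_v, npcol);
    int mpc0 = my_numroc(m + iroffc, mb_c, myrow, icrow, nprow);
    int nqc0 = my_numroc(n + icoffc, nb_c, mycol, iccol, npcol);

    if (side == 'L')
        return (ivcol == iccol) ? nqc0 : mpc0 + max_int(1, nqc0);

    /* side == 'R' */
    return nqc0 + max_int(max_int(1, mpc0),
                          my_numroc(my_numroc(n + icoffc, nb_v, 0, 0, npcol),
                                    nb_v, 0, 0, lcmq));
}
```

For example, on a 1-by-1 grid with m = n = 10, mb_c = nb_c = nb_v = 4, and ic = jc = jv = 1, the function returns 10 for side = 'L' and 20 for side = 'R'.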

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?larzb
Applies a block reflector or its transpose/conjugate-
transpose as returned by p?tzrzf to a general
matrix.

Syntax
void pslarzb (char *side , char *trans , char *direct , char *storev , MKL_INT *m ,
MKL_INT *n , MKL_INT *k , MKL_INT *l , float *v , MKL_INT *iv , MKL_INT *jv , MKL_INT
*descv , float *t , float *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , float
*work );
void pdlarzb (char *side , char *trans , char *direct , char *storev , MKL_INT *m ,
MKL_INT *n , MKL_INT *k , MKL_INT *l , double *v , MKL_INT *iv , MKL_INT *jv , MKL_INT
*descv , double *t , double *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , double
*work );


void pclarzb (char *side , char *trans , char *direct , char *storev , MKL_INT *m ,
MKL_INT *n , MKL_INT *k , MKL_INT *l , MKL_Complex8 *v , MKL_INT *iv , MKL_INT *jv ,
MKL_INT *descv , MKL_Complex8 *t , MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc ,
MKL_INT *descc , MKL_Complex8 *work );
void pzlarzb (char *side , char *trans , char *direct , char *storev , MKL_INT *m ,
MKL_INT *n , MKL_INT *k , MKL_INT *l , MKL_Complex16 *v , MKL_INT *iv , MKL_INT *jv ,
MKL_INT *descv , MKL_Complex16 *t , MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc ,
MKL_INT *descc , MKL_Complex16 *work );

Include Files
• mkl_scalapack.h

Description
The p?larzb function applies a real/complex block reflector Q or its transpose Q^T (conjugate transpose Q^H for
complex flavors) to a real/complex distributed m-by-n matrix sub(C) = C(ic:ic+m-1, jc:jc+n-1) from the
left or the right.
Q is a product of k elementary reflectors as returned by p?tzrzf.

Currently, only storev = 'R' and direct = 'B' are supported.

Input Parameters

side (global)
if side = 'L': apply Q or Q^T (Q^H for complex flavors) from the left;

if side = 'R': apply Q or Q^T (Q^H for complex flavors) from the right.

trans (global)
if trans = 'N': no transpose, apply Q;

if trans = 'T': transpose, apply Q^T (real flavors);

if trans = 'C': conjugate transpose, apply Q^H (complex flavors).

direct (global)
Indicates how H is formed from a product of elementary reflectors:
if direct = 'F': H = H(1)*H(2)*...*H(k) (forward, not supported);

if direct = 'B': H = H(k)*...*H(2)*H(1) (backward).

storev (global)
Indicates how the vectors that define the elementary reflectors are stored:
if storev = 'C': columnwise (not supported);

if storev = 'R': rowwise.

m (global)
The number of rows in the distributed submatrix sub(C). (m ≥ 0).

n (global)
The number of columns in the distributed submatrix sub(C). (n ≥ 0).

k (global)
The order of the matrix T, that is, the number of elementary reflectors whose
product defines the block reflector.

l (global)
The number of columns of the distributed submatrix sub(A) containing the meaningful
part of the Householder reflectors.
If side = 'L', m ≥ l ≥ 0;

if side = 'R', n ≥ l ≥ 0.

v (local).
Pointer into the local memory to an array of size lld_v * LOCc(jv+m-1) if
side = 'L', or lld_v * LOCc(jv+n-1) if side = 'R'.
It contains the local pieces of the distributed vectors V representing the
Householder transformation as returned by p?tzrzf.

lld_v ≥ LOCr(iv+k-1).

iv, jv (global)
The row and column indices in the global matrix V indicating the first row
and the first column of the submatrix sub(V), respectively.

descv (global and local) array of size dlen_. The array descriptor for the
distributed matrix V.

t (local)
Array of size mb_v* mb_v.
The lower triangular matrix T in the representation of the block reflector.

c (local).
Pointer into the local memory to an array of size lld_c * LOCc(jc+n-1).

On entry, the m-by-n distributed matrix sub(C).

ic, jc (global)
The row and column indices in the global matrix C indicating the first row
and the first column of the submatrix sub(C), respectively.

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local).

Array of size lwork, where:

If storev = 'C',
    if side = 'L',
        lwork ≥ (nqc0 + mpc0)*k
    else if side = 'R',
        lwork ≥ (nqc0 + max(npv0 + numroc(numroc(n+icoffc, nb_v, 0, 0, npcol), nb_v, 0, 0, lcmq), mpc0))*k
    end if
else if storev = 'R',
    if side = 'L',
        lwork ≥ (mpc0 + max(mqv0 + numroc(numroc(m+iroffc, mb_v, 0, 0, nprow), mb_v, 0, 0, lcmp), nqc0))*k
    else if side = 'R',
        lwork ≥ (mpc0 + nqc0)*k
    end if
end if.
Here lcm is the least common multiple of nprow and npcol, and
lcm = ilcm(nprow, npcol), lcmp = lcm/nprow, lcmq = lcm/npcol,
iroffv = mod(iv-1, mb_v), icoffv = mod(jv-1, nb_v),
ivrow = indxg2p(iv, mb_v, myrow, rsrc_v, nprow),
ivcol = indxg2p(jv, nb_v, mycol, csrc_v, npcol),
mqv0 = numroc(m+icoffv, nb_v, mycol, ivcol, npcol),
npv0 = numroc(n+iroffv, mb_v, myrow, ivrow, nprow),
iroffc = mod(ic-1, mb_c), icoffc = mod(jc-1, nb_c),
icrow = indxg2p(ic, mb_c, myrow, rsrc_c, nprow),
iccol = indxg2p(jc, nb_c, mycol, csrc_c, npcol),
mpc0 = numroc(m+iroffc, mb_c, myrow, icrow, nprow),
npc0 = numroc(n+icoffc, mb_c, myrow, icrow, nprow),
nqc0 = numroc(n+icoffc, nb_c, mycol, iccol, npcol).
ilcm, indxg2p, and numroc are ScaLAPACK tool functions; myrow, mycol,
nprow, and npcol can be determined by calling the function
blacs_gridinfo.

Output Parameters

c (local).

On exit, sub(C) is overwritten by Q*sub(C), Q'*sub(C), sub(C)*Q, or
sub(C)*Q', where Q' is the transpose (conjugate transpose for complex flavors)
of Q.
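As a serial illustration of the supported storev = 'R', direct = 'B' case, the block reflector H = I - V'*T*V can be applied from the left in three steps: W = V*sub(C), then W = T*W (or T'*W for the transposed operator), then sub(C) = sub(C) - V'*W. The sketch below assumes real data, side = 'L', full row vectors, column-major storage, and a hypothetical function name. p?larzb itself works on distributed data and on the structured vectors returned by p?tzrzf, so its internal organization may differ.

```c
#include <stdlib.h>

/* Hypothetical serial sketch: C := H*C (trans = 'N') or H'*C (trans = 'T')
 * with H = I - V'*T*V, where V is k-by-m (one reflector per row), T is
 * k-by-k lower triangular, and C is m-by-nc. Real data, column-major
 * storage, side = 'L' only. Returns 0 on success, -1 if allocation fails. */
int apply_block_reflector_left(char trans, int m, int nc, int k,
                               const double *v, int ldv,
                               const double *t, int ldt,
                               double *c, int ldc)
{
    if (m == 0 || nc == 0 || k == 0)
        return 0;                               /* nothing to do */
    double *w = malloc((size_t)k * nc * sizeof *w);
    double *tw = malloc((size_t)k * nc * sizeof *tw);
    if (w == NULL || tw == NULL) {
        free(w);
        free(tw);
        return -1;
    }

    /* Step 1: W = V*C (k-by-nc). */
    for (int j = 0; j < nc; ++j)
        for (int r = 0; r < k; ++r) {
            double s = 0.0;
            for (int p = 0; p < m; ++p)
                s += v[r + p * ldv] * c[p + j * ldc];
            w[r + j * k] = s;
        }

    /* Step 2: TW = T*W, or T'*W when trans = 'T' (T is lower triangular). */
    for (int j = 0; j < nc; ++j)
        for (int r = 0; r < k; ++r) {
            double s = 0.0;
            if (trans == 'T')
                for (int q = r; q < k; ++q)
                    s += t[q + r * ldt] * w[q + j * k];
            else
                for (int q = 0; q <= r; ++q)
                    s += t[r + q * ldt] * w[q + j * k];
            tw[r + j * k] = s;
        }

    /* Step 3: C = C - V'*TW. */
    for (int j = 0; j < nc; ++j)
        for (int p = 0; p < m; ++p) {
            double s = 0.0;
            for (int r = 0; r < k; ++r)
                s += v[r + p * ldv] * tw[r + j * k];
            c[p + j * ldc] -= s;
        }

    free(w);
    free(tw);
    return 0;
}
```

Applying the function to the identity matrix returns H itself, which is a convenient way to check a hand-built T against the product H(k)*...*H(1).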

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?larzc
Applies (multiplies by) the conjugate transpose of an
elementary reflector as returned by p?tzrzf to a
general matrix.

Syntax
void pclarzc (char *side , MKL_INT *m , MKL_INT *n , MKL_INT *l , MKL_Complex8 *v ,
MKL_INT *iv , MKL_INT *jv , MKL_INT *descv , MKL_INT *incv , MKL_Complex8 *tau ,
MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex8 *work );
void pzlarzc (char *side , MKL_INT *m , MKL_INT *n , MKL_INT *l , MKL_Complex16 *v ,
MKL_INT *iv , MKL_INT *jv , MKL_INT *descv , MKL_INT *incv , MKL_Complex16 *tau ,
MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex16 *work );

Include Files
• mkl_scalapack.h

Description
The p?larzc function applies a complex elementary reflector Q^H to a complex m-by-n distributed matrix
sub(C) = C(ic:ic+m-1, jc:jc+n-1), from either the left or the right. Q is represented in the form
Q = I - tau*v*v',
where tau is a complex scalar and v is a complex vector.
If tau = 0, then Q is taken to be the identity matrix.

Q is a product of k elementary reflectors as returned by p?tzrzf.
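For a single complex reflector Q = I - tau*v*v', the conjugate transpose is Q^H = I - conj(tau)*v*v', so Q^H*sub(C) can be formed one column at a time as C(:,j) = C(:,j) - conj(tau)*v*(v'*C(:,j)). The serial C99 sketch below shows this for side = 'L' with a full vector v and column-major storage. The function name is hypothetical; p?larzc itself operates on distributed data and on vectors whose meaningful part has length l.

```c
#include <complex.h>

/* Hypothetical serial sketch: C := Q^H * C with Q = I - tau*v*v^H, so that
 * Q^H = I - conj(tau)*v*v^H. v has m entries, C is m-by-nc, column-major.
 * If tau == 0, Q is the identity and C is left unchanged. */
void apply_reflector_conj_left(int m, int nc, const double complex *v,
                               double complex tau, double complex *c, int ldc)
{
    if (tau == 0)
        return;
    for (int j = 0; j < nc; ++j) {
        double complex s = 0;
        for (int p = 0; p < m; ++p)
            s += conj(v[p]) * c[p + j * ldc];   /* s = v^H * C(:,j) */
        s *= conj(tau);                         /* s = conj(tau) * v^H * C(:,j) */
        for (int p = 0; p < m; ++p)
            c[p + j * ldc] -= v[p] * s;         /* C(:,j) -= v * s */
    }
}
```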

Input Parameters

side (global)
if side = 'L': form Q^H*sub(C);

if side = 'R': form sub(C)*Q^H.

m (global)
The number of rows in the distributed matrix sub(C). (m ≥ 0).

n (global)
The number of columns in the distributed matrix sub(C). (n ≥ 0).

l (global)
The number of columns of the distributed matrix sub(A) containing the meaningful
part of the Householder reflectors.
If side = 'L', m ≥ l ≥ 0;
if side = 'R', n ≥ l ≥ 0.

v (local).
Pointer into the local memory to an array of size lld_v * LOCc(n_v)
containing the local pieces of the global distributed matrix V representing
the Householder transformation Q:
V(iv:iv+l-1, jv) if side = 'L' and incv = 1,
V(iv, jv:jv+l-1) if side = 'L' and incv = m_v,
V(iv:iv+l-1, jv) if side = 'R' and incv = 1,
V(iv, jv:jv+l-1) if side = 'R' and incv = m_v.

The vector v in the representation of Q. v is not used if tau = 0.

iv, jv (global)
The row and column indices in the global matrix V indicating the first row
and the first column of the matrix sub(V), respectively.

descv (global and local) array of size dlen_. The array descriptor for the
distributed matrix V.

incv (global)
The global increment for the elements of V. Only two values of incv are
supported in this version, namely 1 and m_v.
incv must not be zero.

tau (local)
Array of size LOCc(jv) if incv = 1, and LOCr(iv) otherwise. This array
contains the Householder scalars related to the Householder vectors.
tau is tied to the distributed matrix V.

c (local).
Pointer into the local memory to an array of size lld_c * LOCc(jc+n-1),
containing the local pieces of sub(C).

ic, jc (global)
The row and column indices in the global matrix C indicating the first row
and the first column of the matrix sub(C), respectively.

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local).
Array of size lwork, where:
If incv = 1,
    if side = 'L',
        if ivcol = iccol,
            lwork ≥ nqc0
        else
            lwork ≥ mpc0 + max(1, nqc0)
        end if
    else if side = 'R',
        lwork ≥ nqc0 + max(max(1, mpc0), numroc(numroc(n+icoffc, nb_v, 0, 0, npcol), nb_v, 0, 0, lcmq))
    end if
else if incv = m_v,
    if side = 'L',
        lwork ≥ mpc0 + max(max(1, nqc0), numroc(numroc(m+iroffc, mb_v, 0, 0, nprow), mb_v, 0, 0, lcmp))
    else if side = 'R',
        if ivrow = icrow,
            lwork ≥ mpc0
        else
            lwork ≥ nqc0 + max(1, mpc0)
        end if
    end if
end if.
Here lcm is the least common multiple of nprow and npcol;
lcm = ilcm(nprow, npcol), lcmp = lcm/nprow, lcmq = lcm/npcol,
iroffc = mod(ic-1, mb_c), icoffc = mod(jc-1, nb_c),
ivrow = indxg2p(iv, mb_v, myrow, rsrc_v, nprow),
ivcol = indxg2p(jv, nb_v, mycol, csrc_v, npcol),
icrow = indxg2p(ic, mb_c, myrow, rsrc_c, nprow),
iccol = indxg2p(jc, nb_c, mycol, csrc_c, npcol),
mpc0 = numroc(m+iroffc, mb_c, myrow, icrow, nprow),
nqc0 = numroc(n+icoffc, nb_c, mycol, iccol, npcol).
ilcm, indxg2p, and numroc are ScaLAPACK tool functions;
myrow, mycol, nprow, and npcol can be determined by calling the function
blacs_gridinfo.

Output Parameters

c (local).
On exit, sub(C) is overwritten by Q^H*sub(C) if side = 'L', or by
sub(C)*Q^H if side = 'R'.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?larzt
Forms the triangular factor T of a block reflector H = I -
V*T*V^H as returned by p?tzrzf.

Syntax
void pslarzt (char *direct , char *storev , MKL_INT *n , MKL_INT *k , float *v , MKL_INT
*iv , MKL_INT *jv , MKL_INT *descv , float *tau , float *t , float *work );
void pdlarzt (char *direct , char *storev , MKL_INT *n , MKL_INT *k , double *v ,
MKL_INT *iv , MKL_INT *jv , MKL_INT *descv , double *tau , double *t , double *work );
void pclarzt (char *direct , char *storev , MKL_INT *n , MKL_INT *k , MKL_Complex8 *v ,
MKL_INT *iv , MKL_INT *jv , MKL_INT *descv , MKL_Complex8 *tau , MKL_Complex8 *t ,
MKL_Complex8 *work );

void pzlarzt (char *direct , char *storev , MKL_INT *n , MKL_INT *k , MKL_Complex16
*v , MKL_INT *iv , MKL_INT *jv , MKL_INT *descv , MKL_Complex16 *tau , MKL_Complex16
*t , MKL_Complex16 *work );

Include Files
• mkl_scalapack.h

Description
The p?larzt function forms the triangular factor T of a real/complex block reflector H of order greater than
n, which is defined as a product of k elementary reflectors as returned by p?tzrzf.

If direct = 'F', H = H(1)*H(2)*...*H(k), and T is upper triangular;

if direct = 'B', H = H(k)*...*H(2)*H(1), and T is lower triangular.

If storev = 'C', the vector that defines the elementary reflector H(i) is stored in the i-th column of the
array v, and
H = I - V*T*V'.
If storev = 'R', the vector that defines the elementary reflector H(i) is stored in the i-th row of the
array v, and
H = I - V'*T*V.
Currently, only storev = 'R' and direct = 'B' are supported.
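With direct = 'B' and storev = 'R', the lower triangular T can be built column by column from the last reflector to the first: T(i,i) = tau(i) and T(i+1:k, i) = -tau(i)*T(i+1:k, i+1:k)*V(i+1:k,:)*V(i,:)'. The serial C sketch below applies this standard recurrence to real, full row vectors stored in a column-major k-by-n array. The routine name is hypothetical, and the sketch is not the p?larzt implementation, which works on distributed data and relies on the structure of the vectors returned by p?tzrzf.

```c
/* Hypothetical serial sketch: build the k-by-k lower triangular T such that
 * H(k)*...*H(2)*H(1) = I - V'*T*V, where H(i) = I - tau[i]*v_i'*v_i and v_i
 * is row i of the k-by-n column-major matrix V (real data, storev = 'R',
 * direct = 'B'). The strictly upper part of T is set to zero. */
void form_t_backward_rowwise(int n, int k, const double *v, int ldv,
                             const double *tau, double *t, int ldt)
{
    for (int i = k - 1; i >= 0; --i) {
        for (int r = 0; r < i; ++r)
            t[r + i * ldt] = 0.0;               /* strictly upper part */
        t[i + i * ldt] = tau[i];

        /* T(i+1:k, i) = -tau(i) * V(i+1:k, :) * V(i, :)' */
        for (int r = i + 1; r < k; ++r) {
            double s = 0.0;
            for (int p = 0; p < n; ++p)
                s += v[r + p * ldv] * v[i + p * ldv];
            t[r + i * ldt] = -tau[i] * s;
        }

        /* T(i+1:k, i) = T(i+1:k, i+1:k) * T(i+1:k, i). The block is lower
         * triangular, so rows are updated bottom-up, which only reads entries
         * that have not been overwritten yet. */
        for (int r = k - 1; r > i; --r) {
            double s = 0.0;
            for (int q = i + 1; q <= r; ++q)
                s += t[r + q * ldt] * t[q + i * ldt];
            t[r + i * ldt] = s;
        }
    }
}
```

For k = 2 the recurrence gives T(2,1) = -tau(1)*tau(2)*(v_2 . v_1), which matches the product H(2)*H(1) expanded by hand.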

Input Parameters

direct (global)
Specifies the order in which the elementary reflectors are multiplied to form
the block reflector:
if direct = 'F': H = H(1)*H(2)*...*H(k) (forward, not supported);

if direct = 'B': H = H(k)*...*H(2)*H(1) (backward).

storev (global)
Specifies how the vectors that define the elementary reflectors are
stored:
if storev = 'C': columnwise (not supported);

if storev = 'R': rowwise.

n (global)
The order of the block reflector H. n ≥ 0.

k (global)
The order of the triangular factor T (= the number of elementary
reflectors).
1 ≤ k ≤ mb_v (= nb_v).

v Pointer into the local memory to an array of local size LOCr(iv+k-1) * LOCc(jv+n-1).
The distributed matrix V contains the Householder vectors. See Application
Notes below.

iv, jv (global) The row and column indices in the global matrix V indicating the
first row and the first column of the matrix sub(V), respectively.

descv (local) array of size dlen_. The array descriptor for the distributed matrix V.

work (local).
Workspace array of size k*(k-1)/2.

Output Parameters

v
t (local)
Array of size mb_v* mb_v. It contains the k-by-k triangular factor of the
block reflector associated with v. t is lower triangular.


See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?lascl
Multiplies a general rectangular matrix by a real scalar
defined as cto/cfrom.

Syntax
void pslascl (char *type , float *cfrom , float *cto , MKL_INT *m , MKL_INT *n , float
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_INT *info );
void pdlascl (char *type , double *cfrom , double *cto , MKL_INT *m , MKL_INT *n ,
double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_INT *info );
void pclascl (char *type , float *cfrom , float *cto , MKL_INT *m , MKL_INT *n ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_INT *info );
void pzlascl (char *type , double *cfrom , double *cto , MKL_INT *m , MKL_INT *n ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?lascl function multiplies the m-by-n real/complex distributed matrix sub(A), denoting A(ia:ia+m-1,
ja:ja+n-1), by the real scalar cto/cfrom. This is done without overflow or underflow as long as the final
result cto*A(i,j)/cfrom does not overflow or underflow. type specifies that sub(A) may be full, upper triangular,
lower triangular, or upper Hessenberg.
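One way to multiply by cto/cfrom without intermediate overflow or underflow is to apply the factor in safe steps of smlnum or bignum until the remaining ratio is representable. The serial C sketch below follows the iterative strategy of the reference LAPACK ?lascl for a full (type = 'G') real matrix. The function name is hypothetical, other type values are not handled, and the distributed p?lascl may differ in detail.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical serial sketch: A := A*(cto/cfrom) for a full m-by-n
 * column-major matrix, applying the factor in steps that cannot overflow
 * or underflow on their own. cfrom must be nonzero. */
void scale_cto_cfrom(double cfrom, double cto, int m, int n, double *a, int lda)
{
    const double smlnum = DBL_MIN;          /* smallest normalized double */
    const double bignum = 1.0 / smlnum;
    double cfromc = cfrom, ctoc = cto;
    bool done = false;

    while (!done) {
        double mul;
        double cfrom1 = cfromc * smlnum;
        if (cfrom1 == cfromc) {             /* cfromc is infinite */
            mul = ctoc / cfromc;
            done = true;
        } else {
            double cto1 = ctoc / bignum;
            if (cto1 == ctoc) {             /* ctoc is zero or infinite */
                mul = ctoc;
                done = true;
                cfromc = 1.0;
            } else if (fabs(cfrom1) > fabs(ctoc) && ctoc != 0.0) {
                mul = smlnum;               /* shrink by a safe step */
                cfromc = cfrom1;
            } else if (fabs(cto1) > fabs(cfromc)) {
                mul = bignum;               /* grow by a safe step */
                ctoc = cto1;
            } else {
                mul = ctoc / cfromc;        /* remaining ratio is safe */
                done = true;
            }
        }
        for (int j = 0; j < n; ++j)
            for (int i = 0; i < m; ++i)
                a[i + j * lda] *= mul;
    }
}
```

For example, scaling an entry equal to 1e-200 with cfrom = 1e-200 and cto = 1e200 yields approximately 1e200, even though the ratio cto/cfrom itself overflows in double precision.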

Input Parameters

type (global)
type indicates the storage type of the input distributed matrix.
if type = 'G': sub(A) is a full matrix,

if type = 'L': sub(A) is a lower triangular matrix,

if type = 'U': sub(A) is an upper triangular matrix,

if type = 'H': sub(A) is an upper Hessenberg matrix.

cfrom, cto (global)

The distributed matrix sub(A) is multiplied by cto/cfrom. A(i,j) is
computed without over/underflow if the final result cto*A(i,j)/cfrom can be
represented without over/underflow. cfrom must be nonzero.

m (global)
The number of rows in the distributed matrix sub(A). (m≥0).

n (global)
The number of columns in the distributed matrix sub(A). (n≥0).

a (local input/local output)

Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).

This array contains the local pieces of the distributed matrix sub(A).

ia, ja (global)
The row and column indices in the global matrix A indicating the first row
and column of the matrix sub(A), respectively.

desca (global and local)

Array of size dlen_. The array descriptor for the distributed matrix A.

Output Parameters

a (local).
On exit, this array contains the local pieces of the distributed matrix
multiplied by cto/cfrom.

info (local)
if info = 0: the execution is successful.

if info < 0: If the i-th argument is an array and the j-th entry, indexed
j-1, had an illegal value, then info = -(i*100+j),

if the i-th argument is a scalar and had an illegal value, then info = -i.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?lase2
Initializes an m-by-n distributed matrix.

Syntax
void pslase2 (const char* uplo, const MKL_INT* m, const MKL_INT* n, const float* alpha,
const float* beta, float* a, const MKL_INT* ia, const MKL_INT* ja, const MKL_INT*
desca);
void pdlase2 (const char* uplo, const MKL_INT* m, const MKL_INT* n, const double*
alpha, const double* beta, double* a, const MKL_INT* ia, const MKL_INT* ja, const
MKL_INT* desca);
void pclase2 (const char* uplo, const MKL_INT* m, const MKL_INT* n, const MKL_Complex8*
alpha, const MKL_Complex8* beta, MKL_Complex8* a, const MKL_INT* ia, const MKL_INT* ja,
const MKL_INT* desca);
void pzlase2 (const char* uplo, const MKL_INT* m, const MKL_INT* n, const
MKL_Complex16* alpha, const MKL_Complex16* beta, MKL_Complex16* a, const MKL_INT* ia,
const MKL_INT* ja, const MKL_INT* desca);

Include Files
• mkl_scalapack.h

Description
p?lase2 initializes an m-by-n distributed matrix sub( A ) denoting A(ia:ia+m-1,ja:ja+n-1) to beta on the
diagonal and alpha on the off-diagonals. p?lase2 requires that only the dimension of the matrix operand is
distributed.

Input Parameters

uplo (global)
Specifies the part of the distributed matrix sub( A ) to be set:
= 'U': Upper triangular part is set; the strictly lower triangular part of
sub( A ) is not changed;

= 'L': Lower triangular part is set; the strictly upper triangular part of
sub( A ) is not changed;
Otherwise: All of the matrix sub( A ) is set.

m (global)
The number of rows to be operated on, that is, the number of rows of the
distributed submatrix sub( A ). m >= 0.

n (global)
The number of columns to be operated on, that is, the number of columns of the
distributed submatrix sub( A ). n >= 0.

alpha (global)
The constant to which the off-diagonal elements are to be set.

beta (global)
The constant to which the diagonal elements are to be set.

ia (global)
The row index in the global array a indicating the first row of sub( A ).

ja (global)
The column index in the global array a indicating the first column of
sub( A ).

desca (global and local)

Array of size dlen_.
The array descriptor for the distributed matrix A.

Output Parameters

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1).

This array contains the local pieces of the distributed matrix sub( A )
to be set.
On exit, the leading m-by-n submatrix sub( A ) is set as follows:

if uplo = 'U', A(ia+i-1,ja+j-1) = alpha, 1<=i<=j-1, 1<=j<=n,

if uplo = 'L', A(ia+i-1,ja+j-1) = alpha, j+1<=i<=m, 1<=j<=n,

otherwise, A(ia+i-1,ja+j-1) = alpha, 1<=i<=m, 1<=j<=n, i != j,
and, for all uplo, A(ia+i-1,ja+i-1) = beta, 1<=i<=min(m,n).

p?laset
Initializes the offdiagonal elements of a matrix to alpha
and the diagonal elements to beta.


Syntax
void pslaset (char *uplo , MKL_INT *m , MKL_INT *n , float *alpha , float *beta , float
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca );
void pdlaset (char *uplo , MKL_INT *m , MKL_INT *n , double *alpha , double *beta ,
double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca );
void pclaset (char *uplo , MKL_INT *m , MKL_INT *n , MKL_Complex8 *alpha , MKL_Complex8
*beta , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca );
void pzlaset (char *uplo , MKL_INT *m , MKL_INT *n , MKL_Complex16 *alpha ,
MKL_Complex16 *beta , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca );

Include Files
• mkl_scalapack.h

Description
The p?laset function initializes an m-by-n distributed matrix sub(A) denoting A(ia:ia+m-1, ja:ja+n-1) to
beta on the diagonal and alpha on the off-diagonals.

Input Parameters

uplo (global)
Specifies the part of the distributed matrix sub(A) to be set:
if uplo = 'U': upper triangular part; the strictly lower triangular part of
sub(A) is not changed;
if uplo = 'L': lower triangular part; the strictly upper triangular part of
sub(A) is not changed.
Otherwise: all of the matrix sub(A) is set.

m (global)
The number of rows in the distributed matrix sub(A). (m≥0).

n (global)
The number of columns in the distributed matrix sub(A). (n≥0).

alpha (global).
The constant to which the offdiagonal elements are to be set.

beta (global).
The constant to which the diagonal elements are to be set.

ia, ja (global)
The row and column indices in the distributed matrix A indicating the first
row and column of the matrix sub(A), respectively.

desca (global and local)

Array of size dlen_. The array descriptor for the distributed matrix A.

Output Parameters

a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).

This array contains the local pieces of the distributed matrix sub(A) to be
set. On exit, the leading m-by-n matrix sub(A) is set as follows:
if uplo = 'U', A(ia+i-1, ja+j-1) = alpha, 1≤i≤j-1, 1≤j≤n,

if uplo = 'L', A(ia+i-1, ja+j-1) = alpha, j+1≤i≤m, 1≤j≤n,

otherwise, A(ia+i-1, ja+j-1) = alpha, 1≤i≤m, 1≤j≤n, i≠j, and, for
all uplo, A(ia+i-1, ja+i-1) = beta, 1≤i≤min(m,n).
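The fill rule for uplo can be checked on a single process. This serial C sketch (hypothetical function name, column-major storage, 0-based indices) applies the rule that p?laset and p?lase2 describe for sub(A): alpha in the selected strictly triangular part, or in every off-diagonal entry when uplo is neither 'U' nor 'L', and beta on the diagonal. It does not model the block-cyclic distribution.

```c
/* Hypothetical serial sketch of the p?laset / p?lase2 fill rule on an m-by-n
 * column-major matrix a (0-based i, j). Entries outside the selected part
 * keep their previous values. */
void laset_serial(char uplo, int m, int n, double alpha, double beta,
                  double *a, int lda)
{
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            if (i == j)
                continue;                           /* diagonal handled below */
            int selected = (uplo == 'U') ? (i < j)  /* strictly upper */
                         : (uplo == 'L') ? (i > j)  /* strictly lower */
                         : 1;                       /* all off-diagonal */
            if (selected)
                a[i + j * lda] = alpha;
        }
    int mn = m < n ? m : n;
    for (int i = 0; i < mn; ++i)
        a[i + i * lda] = beta;                      /* diagonal always set */
}
```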

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?lasmsub
Looks for a small subdiagonal element from the
bottom of the matrix that it can safely set to zero.

Syntax
void pslasmsub (const float *a, const MKL_INT *desca, const MKL_INT *i, const MKL_INT
*l, MKL_INT *k, const float *smlnum, float *buf, const MKL_INT *lwork );
void pdlasmsub (const double *a, const MKL_INT *desca, const MKL_INT *i, const MKL_INT
*l, MKL_INT *k, const double *smlnum, double *buf, const MKL_INT *lwork );
void pclasmsub (const MKL_Complex8 *a , const MKL_INT *desca , const MKL_INT *i , const
MKL_INT *l , MKL_INT *k , const float *smlnum , MKL_Complex8 *buf , const MKL_INT
*lwork );
void pzlasmsub (const MKL_Complex16 *a , const MKL_INT *desca , const MKL_INT *i ,
const MKL_INT *l , MKL_INT *k , const double *smlnum , MKL_Complex16 *buf , const
MKL_INT *lwork );

Include Files
• mkl_scalapack.h

Description
The p?lasmsub function looks for a small subdiagonal element from the bottom of the matrix that it can
safely set to zero. This function performs a global maximum and must be called by all processes.

Input Parameters

a (local)
Array of size lld_a*LOCc(n_a).
On entry, the Hessenberg matrix whose tridiagonal part is being scanned.
Unchanged on exit.

desca (global and local)

Array of size dlen_. The array descriptor for the distributed matrix A.

i (global)
The global location of the bottom of the unreduced submatrix of A.
Unchanged on exit.


l (global)
The global location of the top of the unreduced submatrix of A.
Unchanged on exit.

smlnum (global)
On entry, a "small number" for the given matrix, that is, the machine-dependent
constant used in the stopping criterion. Unchanged on exit.

lwork (local)
This must be at least 2*ceil(ceil((i-l)/mb_a)/lcm(nprow, npcol)).
Here lcm is the least common multiple, and nprow x npcol is the logical grid size.

Output Parameters

k (global)
On exit, this yields the bottom portion of the unreduced submatrix. This will
satisfy: l ≤ k ≤ i-1.

buf (local).
Array of size lwork.

Application Notes
This routine parallelizes the code from ?lahqr that looks for a single small subdiagonal element.
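The scan that p?lasmsub distributes can be written serially as a loop from row i up to row l+1. The sketch below uses 1-based indices, as in the text, and a negligibility test modeled on the classic ?lahqr criterion |h(k,k-1)| ≤ max(smlnum, ulp*(|h(k-1,k-1)| + |h(k,k)|)), with ulp taken as DBL_EPSILON. Both the exact test and the function name are assumptions, not the p?lasmsub code.

```c
#include <float.h>
#include <math.h>

/* |h(r,c)| for 1-based (r, c) in a column-major array with leading dimension ldh. */
static double habs(const double *h, int ldh, int r, int c)
{
    return fabs(h[(r - 1) + (c - 1) * ldh]);
}

/* Hypothetical serial sketch: scan k = i, i-1, ..., l+1 and return the first k
 * whose subdiagonal entry h(k,k-1) is negligible; return l if there is none. */
int find_small_subdiag(const double *h, int ldh, int l, int i, double smlnum)
{
    const double ulp = DBL_EPSILON;   /* assumed relative machine precision */
    for (int k = i; k > l; --k) {
        double bound = ulp * (habs(h, ldh, k - 1, k - 1) + habs(h, ldh, k, k));
        if (bound < smlnum)
            bound = smlnum;           /* never test against less than smlnum */
        if (habs(h, ldh, k, k - 1) <= bound)
            return k;
    }
    return l;
}
```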

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?lasrt
Sorts the numbers in an array and the corresponding
vectors in increasing order.

Syntax
void pslasrt (const char* id, const MKL_INT* n, float* d, const float* q, const
MKL_INT* iq, const MKL_INT* jq, const MKL_INT* descq, float* work, const MKL_INT*
lwork, MKL_INT* iwork, const MKL_INT* liwork, MKL_INT* info);
void pdlasrt (const char* id, const MKL_INT* n, double* d, const double* q, const
MKL_INT* iq, const MKL_INT* jq, const MKL_INT* descq, double* work, const MKL_INT*
lwork, MKL_INT* iwork, const MKL_INT* liwork, MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
p?lasrt sorts the numbers in d and the corresponding vectors in q in increasing order.

Input Parameters

id (global)
= 'I': sort d in increasing order;

= 'D': sort d in decreasing order (not implemented yet).

n (global)
The number of columns to be operated on, that is, the number of columns of the
distributed submatrix sub( Q ). n >= 0.

d (global)
Array, size (n)

q (local)
Pointer into the local memory to an array of size lld_q*LOCc(jq+n-1) .

This array contains the local pieces of the distributed matrix sub( Q ).

iq (global)
The row index in the global array Q indicating the first row of sub( Q ).

jq (global)
The column index in the global array Q indicating the first column of
sub( Q ).

descq (global and local)

Array of size dlen_.
The array descriptor for the distributed matrix Q.

work (local)
Array, size (lwork)

lwork (local)
The size of the array work.

lwork = max(n, np*(nb + nq)), where np = numroc(n, nb, myrow, iarow, nprow) and
nq = numroc(n, nb, mycol, descq(csrc_), npcol).

numroc is a ScaLAPACK tool function.

iwork (local)
Array, size (liwork)

liwork (local)
The size of the array iwork.

liwork = n + 2*nb + 2*npcol

Output Parameters

d On exit, the numbers in d are sorted in increasing order.

info (global)
= 0: successful exit


< 0: If the i-th argument is an array and the j-th entry had an illegal
value, then info = -(i*100+j), if the i-th argument is a scalar and
had an illegal value, then info = -i.
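Serially, sorting d together with the corresponding vectors in q amounts to sorting (value, column) pairs and then gathering the columns in the new order. The C sketch below (hypothetical name, column-major q, increasing order only, like the id = 'I' case) shows that idea. p?lasrt itself performs the permutation on distributed data using the work and iwork workspaces described above, so its internal steps differ.

```c
#include <stdlib.h>
#include <string.h>

/* A value of d paired with its original column index in q. */
struct keyed_value {
    double value;
    int index;
};

/* Orders by value; ties keep the original column order. */
static int compare_keyed(const void *pa, const void *pb)
{
    const struct keyed_value *a = pa, *b = pb;
    if (a->value < b->value) return -1;
    if (a->value > b->value) return 1;
    return (a->index > b->index) - (a->index < b->index);
}

/* Hypothetical serial sketch: sort d[0..n-1] in increasing order and reorder
 * the n columns of the m-by-n column-major matrix q the same way.
 * Returns 0 on success, -1 if allocation fails. */
int sort_values_and_columns(int n, double *d, int m, double *q, int ldq)
{
    if (n <= 0)
        return 0;
    struct keyed_value *keys = malloc((size_t)n * sizeof *keys);
    double *qcopy = malloc((size_t)ldq * n * sizeof *qcopy);
    if (keys == NULL || qcopy == NULL) {
        free(keys);
        free(qcopy);
        return -1;
    }
    for (int j = 0; j < n; ++j) {
        keys[j].value = d[j];
        keys[j].index = j;
    }
    qsort(keys, (size_t)n, sizeof *keys, compare_keyed);

    memcpy(qcopy, q, (size_t)ldq * n * sizeof *q);    /* original columns */
    for (int j = 0; j < n; ++j) {
        d[j] = keys[j].value;
        memcpy(q + (size_t)j * ldq, qcopy + (size_t)keys[j].index * ldq,
               (size_t)m * sizeof *q);                /* gather column */
    }
    free(keys);
    free(qcopy);
    return 0;
}
```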

p?lassq
Updates a sum of squares represented in scaled form.

Syntax
void pslassq (MKL_INT *n , float *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx ,
MKL_INT *incx , float *scale , float *sumsq );
void pdlassq (MKL_INT *n , double *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx ,
MKL_INT *incx , double *scale , double *sumsq );
void pclassq (MKL_INT *n , MKL_Complex8 *x , MKL_INT *ix , MKL_INT *jx , MKL_INT
*descx , MKL_INT *incx , float *scale , float *sumsq );
void pzlassq (MKL_INT *n , MKL_Complex16 *x , MKL_INT *ix , MKL_INT *jx , MKL_INT
*descx , MKL_INT *incx , double *scale , double *sumsq );

Include Files
• mkl_scalapack.h

Description
The p?lassq function returns the values scl and smsq such that

scl^2 * smsq = x_1^2 + ... + x_n^2 + scale^2 * sumsq,

where
x_i = sub(X) = X(ix + (jx-1)*m_x + (i-1)*incx) for pslassq/pdlassq,

x_i = sub(X) = abs(X(ix + (jx-1)*m_x + (i-1)*incx)) for pclassq/pzlassq.

For the real functions pslassq/pdlassq, the value of sumsq is assumed to be non-negative and scl returns the
value
scl = max(scale, abs(x_i)), taken over all i.
For the complex functions pclassq/pzlassq, the value of sumsq is assumed to be at least unity and the value of
smsq will then satisfy
1.0 ≤ smsq ≤ sumsq + 2*n.

The value scale is assumed to be non-negative and scl returns the value
scl = max(scale, abs(real(X_i)), abs(imag(X_i))), taken over all i, where X_i is the i-th element of sub(X).

For all p?lassq functions, the values scale and sumsq must be supplied in scale and sumsq, respectively, and
scale and sumsq are overwritten by scl and smsq, respectively.
All functions p?lassq make only one pass through the vector sub(X).
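The scaled update avoids forming squares of very large or very small entries: whenever an entry exceeds the current scale, the running sum is re-expressed relative to the new scale. The serial C sketch below shows this update for real data with a positive stride, together with a hypothetical combine_ssq helper that merges two (scale, sumsq) pairs, which is the kind of step needed to combine partial results held by several processes. Both functions are illustrations under these assumptions, not the library's implementation.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical serial sketch of the real scaled sum-of-squares update:
 * afterwards, scale^2 * sumsq equals (up to rounding) its previous value
 * plus the sum of x[i*incx]^2 for 0 <= i < n. incx > 0 is assumed. */
void lassq_update(int n, const double *x, int incx, double *scale, double *sumsq)
{
    for (int i = 0; i < n; ++i) {
        double absxi = fabs(x[(size_t)i * incx]);
        if (absxi == 0.0)
            continue;                          /* zeros add nothing */
        if (*scale < absxi) {
            double r = *scale / absxi;
            *sumsq = 1.0 + *sumsq * r * r;     /* re-express sum at the new scale */
            *scale = absxi;
        } else {
            double r = absxi / *scale;
            *sumsq += r * r;
        }
    }
}

/* Merges (scale2, sumsq2) into (scale1, sumsq1) so that the result represents
 * scale1^2*sumsq1 + scale2^2*sumsq2, e.g. partial results from two processes. */
void combine_ssq(double *scale1, double *sumsq1, double scale2, double sumsq2)
{
    if (scale2 == 0.0)
        return;                                /* second pair contributes zero */
    if (*scale1 >= scale2) {
        double r = scale2 / *scale1;
        *sumsq1 += sumsq2 * r * r;
    } else {
        double r = *scale1 / scale2;
        *sumsq1 = sumsq2 + *sumsq1 * r * r;
        *scale1 = scale2;
    }
}
```

Starting from scale = 0 and sumsq = 1, the vector (3, 4) gives scale = 4 and sumsq = 1.5625, so scale*sqrt(sumsq) = 5, the Euclidean norm.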

Input Parameters

n (global)
The length of the distributed vector sub(x ).

x The array that stores the vector for which a scaled sum of squares is
computed:
x[ix + (jx-1)*m_x + i*incx], 0 ≤ i < n.

ix (global)
The row index in the global matrix X indicating the first row of sub(X).

jx (global)
The column index in the global matrix X indicating the first column of
sub(X).

descx (global and local) array of size dlen_.

The array descriptor for the distributed matrix X.

incx (global)
The global increment for the elements of X. Only two values of incx are
supported in this version, namely 1 and m_x. The argument incx must not
equal zero.

scale (local).
On entry, the value scale in the equation above.

sumsq (local)
On entry, the value sumsq in the equation above.

Output Parameters

scale (local).
On exit, scale is overwritten with scl , the scaling factor for the sum of
squares.

sumsq (local).
On exit, sumsq is overwritten with the value smsq, the basic sum of squares
from which scl has been factored out.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?laswp
Performs a series of row interchanges on a general
rectangular matrix.

Syntax
void pslaswp (char *direc , char *rowcol , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , MKL_INT *k1 , MKL_INT *k2 , MKL_INT *ipiv );
void pdlaswp (char *direc , char *rowcol , MKL_INT *n , double *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_INT *k1 , MKL_INT *k2 , MKL_INT *ipiv );
void pclaswp (char *direc , char *rowcol , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_INT *k1 , MKL_INT *k2 , MKL_INT *ipiv );
void pzlaswp (char *direc , char *rowcol , MKL_INT *n , MKL_Complex16 *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , MKL_INT *k1 , MKL_INT *k2 , MKL_INT *ipiv );


Include Files
• mkl_scalapack.h

Description
The p?laswp function performs a series of row or column interchanges on the distributed matrix
sub(A) = A(ia:ia+n-1, ja:ja+n-1). One interchange is initiated for each of rows or columns k1 through k2
of sub(A). This function assumes that the pivoting information has already been broadcast along the process
row or column. Note also that this function works only when rows (or columns) k1 through k2 lie in the same
mb (or nb) block. To pivot a full matrix, use p?lapiv.

Input Parameters

direc (global)
Specifies in which order the permutation is applied:
= 'F' - forward,

= 'B' - backward.

rowcol (global)
Specifies if the rows or columns are permuted:
= 'R' - rows,

= 'C' - columns.

n (global)
If rowcol = 'R', the length of the rows of the distributed matrix A(*, ja:ja+n-1) to be permuted;
if rowcol = 'C', the length of the columns of the distributed matrix A(ia:ia+n-1, *) to be permuted.

a (local)
Pointer into the local memory to an array of size lld_a * LOCc(n_a). On
entry, this array contains the local pieces of the distributed matrix to which
the row/column interchanges will be applied.

ia (global)
The row index in the global matrix A indicating the first row of sub(A).

ja (global)
The column index in the global matrix A indicating the first column of
sub(A).

desca (global and local) array of size dlen_.

The array descriptor for the distributed matrix A.

k1 (global)
The first element of ipiv for which a row or column interchange will be done.

k2 (global)
The last element of ipiv for which a row or column interchange will be done.

ipiv (local)
Array of size LOCr(m_a)+mb_a for row pivoting and LOCr(n_a)+nb_a for
column pivoting. This array is tied to the matrix A: ipiv[k] = l implies that rows
(or columns) k+1 and l are to be interchanged, k = 0, 1, ..., size(ipiv)-1.

Output Parameters

a (local)
On exit, the permuted distributed matrix.
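The meaning of ipiv, k1, k2, and direc can be illustrated on a local, undistributed matrix. In the C sketch below (hypothetical name, int pivots instead of MKL_INT, rowcol = 'R' only), positions in ipiv are 1-based as in the text, so row k is swapped with row ipiv[k-1]. direc = 'B' walks the same positions in reverse order, which undoes a forward pass because each swap is its own inverse.

```c
/* Hypothetical serial sketch of the interchange rule with rowcol = 'R':
 * for each 1-based position k in k1..k2 (in reverse order when direc = 'B'),
 * swap rows k and ipiv[k-1] across the n columns of the column-major matrix a. */
void laswp_rows(char direc, int n, double *a, int lda,
                int k1, int k2, const int *ipiv)
{
    if (k1 > k2)
        return;
    int first = (direc == 'B') ? k2 : k1;
    int step = (direc == 'B') ? -1 : 1;
    for (int k = first; k >= k1 && k <= k2; k += step) {
        int p = ipiv[k - 1];                  /* 1-based partner row */
        if (p == k)
            continue;                         /* no interchange needed */
        for (int j = 0; j < n; ++j) {
            double tmp = a[(k - 1) + j * lda];
            a[(k - 1) + j * lda] = a[(p - 1) + j * lda];
            a[(p - 1) + j * lda] = tmp;
        }
    }
}
```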

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?latra
Computes the trace of a general square distributed
matrix.

Syntax
float pslatra (MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca );
double pdlatra (MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca );
void pclatra (MKL_Complex8 * , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca );
void pzlatra (MKL_Complex16 * , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca );

Include Files
• mkl_scalapack.h

Description
This function computes the trace of an n-by-n distributed matrix sub(A) = A(ia:ia+n-1, ja:ja+n-1). The
result is left on every process of the grid.

Input Parameters

n (global)
The number of rows and columns to be operated on, that is, the order of
the distributed matrix sub(A). n ≥ 0.

a (local)
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1)
containing the local pieces of the distributed matrix, the trace of which is to
be computed.

ia, ja (global) The row and column indices in the global matrix A indicating the first row and the
first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

Output Parameters

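val The trace of sub(A). For pslatra and pdlatra it is the function's return value; pclatra and pzlatra
return void and write the trace through their first, unnamed pointer argument.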
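As a serial illustration of what p?latra computes, the sketch below sums the diagonal of the n-by-n submatrix starting at the 1-based position (ia, ja) of an ordinary column-major array. It is a hypothetical helper, not the MKL routine: it ignores the block-cyclic distribution, whereas the distributed function must combine the partial sums held by different processes so that every process ends up with the result.

```c
#include <stddef.h>

/* Hypothetical serial helper: trace of A(ia:ia+n-1, ja:ja+n-1) for a
   column-major matrix with leading dimension lda; ia and ja are 1-based. */
double submatrix_trace(int n, const double *a, int lda, int ia, int ja)
{
    double tr = 0.0;
    for (int i = 0; i < n; ++i) {
        int row = ia - 1 + i;   /* 0-based global row    */
        int col = ja - 1 + i;   /* 0-based global column */
        tr += a[(size_t)col * lda + row];
    }
    return tr;
}
```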

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?latrd
Reduces the first nb rows and columns of a
symmetric/Hermitian matrix A to real tridiagonal form
by an orthogonal/unitary similarity transformation.

Syntax
void pslatrd (char *uplo , MKL_INT *n , MKL_INT *nb , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *d , float *e , float *tau , float *w , MKL_INT *iw ,
MKL_INT *jw , MKL_INT *descw , float *work );
void pdlatrd (char *uplo , MKL_INT *n , MKL_INT *nb , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *d , double *e , double *tau , double *w , MKL_INT *iw ,
MKL_INT *jw , MKL_INT *descw , double *work );
void pclatrd (char *uplo , MKL_INT *n , MKL_INT *nb , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , float *d , float *e , MKL_Complex8 *tau , MKL_Complex8
*w , MKL_INT *iw , MKL_INT *jw , MKL_INT *descw , MKL_Complex8 *work );
void pzlatrd (char *uplo , MKL_INT *n , MKL_INT *nb , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *d , double *e , MKL_Complex16 *tau ,
MKL_Complex16 *w , MKL_INT *iw , MKL_INT *jw , MKL_INT *descw , MKL_Complex16 *work );

Include Files
• mkl_scalapack.h

Description
The p?latrd function reduces nb rows and columns of a real symmetric or complex Hermitian matrix
sub(A) = A(ia:ia+n-1, ja:ja+n-1) to real tridiagonal form by an orthogonal/unitary similarity
transformation Q'*sub(A)*Q, and returns the matrices V and W, which are needed to apply the
transformation to the unreduced part of sub(A).
If uplo = 'U', p?latrd reduces the last nb rows and columns of a matrix whose upper triangle is
supplied;
if uplo = 'L', p?latrd reduces the first nb rows and columns of a matrix whose lower triangle is
supplied.
This is an auxiliary function called by p?sytrd/p?hetrd.

Input Parameters

uplo (global)
Specifies whether the upper or lower triangular part of the symmetric/
Hermitian matrix sub(A) is stored:
= 'U': Upper triangular

= 'L': Lower triangular.

n (global)
The number of rows and columns to be operated on, that is, the order of
the distributed matrix sub(A). n ≥ 0.

nb (global)
The number of rows and columns to be reduced.

a (local)
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).
On entry, this array contains the local pieces of the symmetric/Hermitian
distributed matrix sub(A).
If uplo = 'U', the leading n-by-n upper triangular part of sub(A) contains
the upper triangular part of the matrix, and its strictly lower triangular part
is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of sub(A) contains the
lower triangular part of the matrix, and its strictly upper triangular part is
not referenced.

ia (global)
The row index in the global matrix A indicating the first row of sub(A).

ja (global)
The column index in the global matrix A indicating the first column of
sub(A).

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

iw (global)
The row index in the global matrix W indicating the first row of sub(W).

jw (global)
The column index in the global matrix W indicating the first column of
sub(W).

descw (global and local) array of size dlen_. The array descriptor for the
distributed matrix W.

work (local)
Workspace array of size nb_a.

Output Parameters

a (local)
On exit, if uplo = 'U', the last nb columns have been reduced to
tridiagonal form, with the diagonal elements overwriting the diagonal
elements of sub(A); the elements above the diagonal with the array tau
represent the orthogonal/unitary matrix Q as a product of elementary
reflectors;

if uplo = 'L', the first nb columns have been reduced to tridiagonal form,
with the diagonal elements overwriting the diagonal elements of sub(A);
the elements below the diagonal with the array tau represent the
orthogonal/unitary matrix Q as a product of elementary reflectors.

d (local)
Array of size LOCc(ja+n-1).

The diagonal elements of the tridiagonal matrix T: d[i] = A(i+1,i+1), i = 0, 1, ..., LOCc(ja+n-1)-1.
d is tied to the distributed matrix A.

e (local)
Array of size LOCc(ja+n-1) if uplo = 'U', LOCc(ja+n-2) otherwise.

The off-diagonal elements of the tridiagonal matrix T:

e[i] = A(i + 1, i + 2) if uplo = 'U',
e[i] = A(i + 2, i + 1) if uplo = 'L',
i = 0, 1, ..., LOCc(ja+n-1)-1.

e is tied to the distributed matrix A.

tau (local)
Array of size LOCc(ja+n-1). This array contains the scalar factors of the
elementary reflectors. tau is tied to the distributed matrix A.

w (local)
Pointer into the local memory to an array of size lld_w * nb_w. This array
contains the local pieces of the n-by-nb_w matrix W required to update the
unreduced part of sub(A).

Application Notes
If uplo = 'U', the matrix Q is represented as a product of elementary reflectors:

Q = H(n)*H(n-1)*...*H(n-nb+1).
Each H(i) has the form

H(i) = I - tau*v*v',
where tau is a real/complex scalar, and v is a real/complex vector with v(i:n) = 0 and v(i-1) = 1;
v(1:i-1) is stored on exit in A(ia:ia+i-1, ja+i), and tau in tau[ja+i-2].
If uplo = 'L', the matrix Q is represented as a product of elementary reflectors:

Q = H(1)*H(2)*...*H(nb).
Each H(i) has the form

H(i) = I - tau*v*v',
where tau is a real/complex scalar, and v is a real/complex vector with v(1:i) = 0 and v(i+1) = 1;
v(i+2:n) is stored on exit in A(ia+i+1:ia+n-1, ja+i-1), and tau in tau[ja+i-2].
The elements of the vectors v together form the n-by-nb matrix V which is needed, with W, to apply the
transformation to the unreduced part of the matrix, using a symmetric/Hermitian rank-2k update of the
form:
sub(A) := sub(A) - V*W' - W*V'.

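The contents of a on exit are illustrated by the following examples with
n = 5 and nb = 2:

if uplo = 'U':                        if uplo = 'L':

  (  a   a   a   v4  v5 )               (  d                  )
  (      a   a   v4  v5 )               (  1   d              )
  (          a   1   v5 )               (  v1  1   a          )
  (              d   1  )               (  v1  v2  a   a      )
  (                  d  )               (  v1  v2  a   a   a  )

where d denotes a diagonal element of the reduced matrix, a denotes an element of the original matrix that
is unchanged, and vi denotes an element of the vector defining H(i).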
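The rank-2k update of the unreduced part, sub(A) := sub(A) - V*W' - W*V', can be sketched serially for the real case as follows. syr2k_update is a hypothetical helper, not an MKL routine; it assumes plain column-major arrays on one process and updates only the stored triangle, since the symmetric matrix is kept in one triangle. For complex flavors the transposes become conjugate transposes.

```c
#include <stddef.h>

/* Hypothetical serial helper (real case): A := A - V*W' - W*V' on the
   triangle selected by uplo. A is n-by-n, V and W are n-by-nb, column-major. */
void syr2k_update(char uplo, int n, int nb, double *a, int lda,
                  const double *v, int ldv, const double *w, int ldw)
{
    int upper = (uplo == 'U' || uplo == 'u');
    for (int j = 0; j < n; ++j) {
        int ilo = upper ? 0 : j;       /* rows of column j in the stored triangle */
        int ihi = upper ? j : n - 1;
        for (int i = ilo; i <= ihi; ++i) {
            double s = 0.0;
            for (int p = 0; p < nb; ++p)   /* (V*W')(i,j) + (W*V')(i,j) */
                s += v[(size_t)p * ldv + i] * w[(size_t)p * ldw + j]
                   + w[(size_t)p * ldw + i] * v[(size_t)p * ldv + j];
            a[(size_t)j * lda + i] -= s;
        }
    }
}
```

Restricting the loop to one triangle halves the arithmetic, which is the same reason the library uses a symmetric/Hermitian rank-2k update rather than two general matrix products.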

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?latrs
Solves a triangular system of equations with the scale
factor set to prevent overflow.

Syntax
void pslatrs (char *uplo , char *trans , char *diag , char *normin , MKL_INT *n , float
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *x , MKL_INT *ix , MKL_INT *jx ,
MKL_INT *descx , float *scale , float *cnorm , float *work );
void pdlatrs (char *uplo , char *trans , char *diag , char *normin , MKL_INT *n , double
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *x , MKL_INT *ix , MKL_INT
*jx , MKL_INT *descx , double *scale , double *cnorm , double *work );
void pclatrs (char *uplo , char *trans , char *diag , char *normin , MKL_INT *n ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *x ,
MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , float *scale , float *cnorm , MKL_Complex8
*work );
void pzlatrs (char *uplo , char *trans , char *diag , char *normin , MKL_INT *n ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *x ,
MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , double *scale , double *cnorm ,
MKL_Complex16 *work );

Include Files
• mkl_scalapack.h

Description
The p?latrs function solves a triangular system of equations A*x = s*b, A^T*x = s*b, or A^H*x = s*b,
where s is a scale factor set to prevent overflow. The description of the function will be extended in
future releases.

Input Parameters

uplo Specifies whether the matrix A is upper or lower triangular.

= 'U': Upper triangular

= 'L': Lower triangular

trans Specifies the operation applied to A.

= 'N': Solve A*x = s*b (no transpose)

= 'T': Solve A^T*x = s*b (transpose)

= 'C': Solve A^H*x = s*b (conjugate transpose),

where s is a scale factor.

diag Specifies whether or not the matrix A is unit triangular.

= 'N': Non-unit triangular

= 'U': Unit triangular

normin Specifies whether cnorm has been set or not.

= 'Y': cnorm contains the column norms on entry;

= 'N': cnorm is not set on entry. On exit, the norms will be computed and
stored in cnorm.

n The order of the matrix A. n ≥ 0.

a Array of size lda* n. Contains the triangular matrix A.

If uplo = 'U', the leading n-by-n upper triangular part of the array a
contains the upper triangular matrix, and the strictly lower triangular part
of a is not referenced.

If uplo = 'L', the leading n-by-n lower triangular part of the array a
contains the lower triangular matrix, and the strictly upper triangular part
of a is not referenced.

If diag = 'U', the diagonal elements of a are also not referenced and are
assumed to be 1.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

x Array of size n. On entry, the right-hand side b of the triangular system.

ix (global) The row index in the global matrix X indicating the first row of
sub(X).

jx (global)
The column index in the global matrix X indicating the first column of
sub(X).

descx (global and local)
Array of size dlen_. The array descriptor for the distributed matrix X.

cnorm Array of size n. If normin = 'Y', cnorm is an input argument and cnorm[j]
contains the norm of the off-diagonal part of the (j+1)-th column of the
matrix A, j=0, 1, ..., n-1. If trans = 'N', cnorm[j] must be greater than or
equal to the infinity-norm, and if trans = 'T' or 'C', cnorm[j] must be
greater than or equal to the 1-norm.

work (local)
Temporary workspace.

Output Parameters

x On exit, x is overwritten by the solution vector x.

scale The scaling factor s for the triangular system as described above.
If scale = 0, the matrix A is singular or badly scaled, and the vector x is
an exact or approximate solution to A*x = 0.

cnorm If normin = 'N', cnorm is an output argument and cnorm[j] returns the 1-norm
of the off-diagonal part of the (j+1)-th column of A, j=0, 1, ..., n-1.
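For the case uplo = 'U', diag = 'N', trans = 'N', the sketch below shows what cnorm holds when normin = 'N' (the 1-norm of the strictly upper part of each column) and a plain back substitution for A*x = b. Both functions are hypothetical serial helpers on an ordinary column-major array, not MKL routines. The important simplification: scale is always left at 1, so this sketch omits the overflow-guarding logic that allows p?latrs to return a smaller scale for badly scaled systems.

```c
#include <math.h>
#include <stddef.h>

/* cnorm[j] = sum of |A(i,j)| for i < j (off-diagonal part of column j). */
void upper_cnorm(int n, const double *a, int lda, double *cnorm)
{
    for (int j = 0; j < n; ++j) {
        double s = 0.0;
        for (int i = 0; i < j; ++i)
            s += fabs(a[(size_t)j * lda + i]);
        cnorm[j] = s;
    }
}

/* Column-oriented back substitution for upper triangular, non-unit A.
   x holds b on entry and the solution on exit; no rescaling is done. */
void upper_solve_unscaled(int n, const double *a, int lda, double *x,
                          double *scale)
{
    *scale = 1.0;
    for (int j = n - 1; j >= 0; --j) {
        x[j] /= a[(size_t)j * lda + j];
        for (int i = 0; i < j; ++i)      /* remove x[j]'s contribution above */
            x[i] -= a[(size_t)j * lda + i] * x[j];
    }
}
```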

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?latrz
Reduces an upper trapezoidal matrix to upper
triangular form by means of orthogonal/unitary
transformations.

Syntax
void pslatrz (MKL_INT *m , MKL_INT *n , MKL_INT *l , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *tau , float *work );
void pdlatrz (MKL_INT *m , MKL_INT *n , MKL_INT *l , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *tau , double *work );
void pclatrz (MKL_INT *m , MKL_INT *n , MKL_INT *l , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work );
void pzlatrz (MKL_INT *m , MKL_INT *n , MKL_INT *l , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work );

Include Files
• mkl_scalapack.h

Description
The p?latrz function reduces the m-by-n (m ≤ n) real/complex upper trapezoidal matrix sub(A) =
[A(ia:ia+m-1, ja:ja+m-1) A(ia:ia+m-1, ja+n-l:ja+n-1)] to upper triangular form by means of
orthogonal/unitary transformations.
The upper trapezoidal matrix sub(A) is factored as

sub(A) = ( R 0 )*Z,
where Z is an n-by-n orthogonal/unitary matrix and R is an m-by-m upper triangular matrix.

Input Parameters

m (global)
The number of rows in the distributed matrix sub(A). m ≥ 0.

n (global)
The number of columns in the distributed matrix sub(A). n ≥ 0.

l (global)
The number of columns of the distributed matrix sub(A) containing the
meaningful part of the Householder reflectors. l > 0.

a (local)
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1). On
entry, the local pieces of the m-by-n distributed matrix sub(A), which is to
be factored.

ia (global)
The row index in the global matrix A indicating the first row of sub(A).

ja (global)
The column index in the global matrix A indicating the first column of
sub(A).

desca (global and local) array of size dlen_.

The array descriptor for the distributed matrix A.

work (local)
Workspace array of size lwork.
lwork ≥ nq0 + max(1, mp0), where
iroff = mod(ia-1, mb_a),
icoff = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, myrow, rsrc_a, nprow),
iacol = indxg2p(ja, nb_a, mycol, csrc_a, npcol),
mp0 = numroc(m+iroff, mb_a, myrow, iarow, nprow),
nq0 = numroc(n+icoff, nb_a, mycol, iacol, npcol),
indxg2p and numroc are ScaLAPACK tool functions; myrow,
mycol, nprow, and npcol can be determined by calling the function
blacs_gridinfo.
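The workspace bound above is plain integer arithmetic once numroc and indxg2p are known. The sketch below re-implements both following the standard ScaLAPACK TOOLS definitions so that the arithmetic can run without the library; numroc_sketch, indxg2p_sketch, and latrz_lwork are hypothetical names, and indxg2p_sketch drops the iproc argument that the real indxg2p accepts but does not use. In a real program you would call the library's own numroc and indxg2p, with myrow and mycol taken from blacs_gridinfo.

```c
/* Number of rows/columns of an n-long dimension, split into nb-sized
   blocks, owned by process iproc when the first block sits on isrcproc. */
int numroc_sketch(int n, int nb, int iproc, int isrcproc, int nprocs)
{
    int mydist = (nprocs + iproc - isrcproc) % nprocs;
    int nblocks = n / nb;               /* whole blocks                 */
    int num = (nblocks / nprocs) * nb;  /* blocks every process gets    */
    int extrablks = nblocks % nprocs;
    if (mydist < extrablks)
        num += nb;                      /* one extra whole block        */
    else if (mydist == extrablks)
        num += n % nb;                  /* the trailing partial block   */
    return num;
}

/* Process coordinate owning global (1-based) index indxglob. */
int indxg2p_sketch(int indxglob, int nb, int isrcproc, int nprocs)
{
    return (isrcproc + (indxglob - 1) / nb) % nprocs;
}

/* lwork >= nq0 + max(1, mp0), as stated above for p?latrz. */
int latrz_lwork(int m, int n, int ia, int ja, int mb_a, int nb_a,
                int rsrc_a, int csrc_a, int myrow, int mycol,
                int nprow, int npcol)
{
    int iroff = (ia - 1) % mb_a;
    int icoff = (ja - 1) % nb_a;
    int iarow = indxg2p_sketch(ia, mb_a, rsrc_a, nprow);
    int iacol = indxg2p_sketch(ja, nb_a, csrc_a, npcol);
    int mp0 = numroc_sketch(m + iroff, mb_a, myrow, iarow, nprow);
    int nq0 = numroc_sketch(n + icoff, nb_a, mycol, iacol, npcol);
    return nq0 + (mp0 > 1 ? mp0 : 1);
}
```

For example, with m = 10, n = 12, ia = ja = 1, 2-by-2 blocks on a 2-by-2 grid, process (0, 0) owns 6 rows and 6 columns of sub(A) and needs lwork ≥ 12, while process (1, 0) owns 4 rows and needs lwork ≥ 10.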

Output Parameters

a On exit, the leading m-by-m upper triangular part of sub(A) contains the
upper triangular matrix R, and elements n-l+1 to n of the first m rows of
sub(A), with the array tau, represent the orthogonal/unitary matrix Z as a
product of m elementary reflectors.

tau (local)
Array of size LOCr(ja+m-1). This array contains the scalar factors of the
elementary reflectors. tau is tied to the distributed matrix A.

Application Notes
The factorization is obtained by Householder's method. The k-th transformation matrix, Z(k), which is used
(or, in case of complex functions, whose conjugate transpose is used) to introduce zeros into the
(m - k + 1)-th row of sub(A), is given in the form

$$Z(k) = \begin{pmatrix} I & 0 \\ 0 & T(k) \end{pmatrix},$$

where

$$T(k) = I - \tau\, u(k)\, u(k)^{H}, \qquad u(k) = \begin{pmatrix} 1 \\ 0 \\ z(k) \end{pmatrix}$$

(u(k)^H is the plain transpose u(k)^T for real flavors), tau is a scalar, and z(k) is an (n-m)-element
vector. tau and z(k) are chosen to annihilate the elements of the k-th row of sub(A). The scalar tau is
returned in the k-th element of tau, indexed k-1, and the vector u(k) in the k-th row of sub(A), such that
the elements of z(k) are in A(k, m+1), ..., A(k, n). The elements of R are returned in the upper
triangular part of sub(A).
Z is given by

Z = Z(1)Z(2)...Z(m).
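Each transformation in these notes acts through an elementary reflector of the form I - tau*u*u'. Applying one never requires forming the matrix: it costs a dot product and a vector update, x := x - tau*(u'*x)*u. The sketch below shows this for the real case; apply_reflector is a hypothetical helper, not an MKL routine. In p?latrz the zero block of u(k) means only its leading 1 and the trailing z(k) entries actually take part in the sums.

```c
#include <stddef.h>

/* Hypothetical helper: x := (I - tau*u*u') * x for real vectors of length len. */
void apply_reflector(int len, double tau, const double *u, double *x)
{
    double dot = 0.0;
    for (int i = 0; i < len; ++i)   /* dot = u' * x */
        dot += u[i] * x[i];
    for (int i = 0; i < len; ++i)   /* x -= tau * dot * u */
        x[i] -= tau * dot * u[i];
}
```

With u = (1, 1) and tau = 2/(u'*u) = 1, the reflector maps (3, 5) to (-5, -3), and applying it a second time restores (3, 5), since such a reflector is its own inverse.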

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?lauu2
Computes the product U*U' or L'*L, where U and L
are upper or lower triangular matrices (local
unblocked algorithm).

Syntax
void pslauu2 (char *uplo , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca );
void pdlauu2 (char *uplo , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca );
void pclauu2 (char *uplo , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca );

void pzlauu2 (char *uplo , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca );

Include Files
• mkl_scalapack.h

Description
The p?lauu2 function computes the product U*U' or L'*L, where the triangular factor U or L is stored in the
upper or lower triangular part of the distributed matrix
sub(A) = A(ia:ia+n-1, ja:ja+n-1).
If uplo = 'U' or 'u', then the upper triangle of the result is stored, overwriting the factor U in sub(A).

If uplo = 'L' or 'l', then the lower triangle of the result is stored, overwriting the factor L in sub(A).

This is the unblocked form of the algorithm, calling BLAS Level 2 Routines. No communication is performed
by this function; the matrix to operate on should be strictly local to one process.

Input Parameters

uplo (global)
Specifies whether the triangular factor stored in the matrix sub(A) is upper
or lower triangular:
= 'U': upper triangular

= 'L': lower triangular.

n (global)
The number of rows and columns to be operated on, that is, the order of
the triangular factor U or L. n ≥ 0.

a (local)
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1). On
entry, the local pieces of the triangular factor U or L.

ia (global)
The row index in the global matrix A indicating the first row of sub(A).

ja (global)
The column index in the global matrix A indicating the first column of
sub(A).

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

Output Parameters

a (local)
On exit, if uplo = 'U', the upper triangle of the distributed matrix sub(A)
is overwritten with the upper triangle of the product U*U'; if uplo = 'L',
the lower triangle of sub(A) is overwritten with the lower triangle of the
product L'*L.
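The uplo = 'U' case can be illustrated with a short serial sketch that overwrites the upper triangle of U with the upper triangle of U*U' in place, using (U*U')(i,j) = sum over k ≥ j of U(i,k)*U(j,k) for i ≤ j. lauu2_upper is a hypothetical helper on a plain column-major array, not the MKL routine, which instead calls BLAS Level 2 operations. The loop order is the design point: processing rows top to bottom and columns left to right guarantees that every entry read has not yet been overwritten.

```c
#include <stddef.h>

/* Hypothetical serial helper: upper triangle of a := upper triangle of U*U',
   where U is the upper triangle of the n-by-n column-major array a. */
void lauu2_upper(int n, double *a, int lda)
{
    for (int i = 0; i < n; ++i)
        for (int j = i; j < n; ++j) {
            /* Reads U(i,k) and U(j,k) with k >= j only; in this order those
               entries are still the original U values. */
            double s = 0.0;
            for (int k = j; k < n; ++k)
                s += a[(size_t)k * lda + i] * a[(size_t)k * lda + j];
            a[(size_t)j * lda + i] = s;
        }
}
```

For U = [1 2; 0 3], the result is the upper triangle of [5 6; 6 9], and the strictly lower part of the array is left untouched.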

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?lauum
Computes the product U*U' or L'*L, where U and L
are upper or lower triangular matrices.

Syntax
void pslauum (char *uplo , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca );
void pdlauum (char *uplo , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca );
void pclauum (char *uplo , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca );
void pzlauum (char *uplo , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca );

Include Files
• mkl_scalapack.h

Description
The p?lauum function computes the product U*U' or L'*L, where the triangular factor U or L is stored in the
upper or lower triangular part of the matrix sub(A) = A(ia:ia+n-1, ja:ja+n-1).

If uplo = 'U' or 'u', then the upper triangle of the result is stored, overwriting the factor U in sub(A). If
uplo = 'L' or 'l', then the lower triangle of the result is stored, overwriting the factor L in sub(A).

This is the blocked form of the algorithm, calling Level 3 PBLAS.

Input Parameters

uplo (global)
Specifies whether the triangular factor stored in the matrix sub(A) is upper
or lower triangular:
= 'U': upper triangular

= 'L': lower triangular.

n (global)
The number of rows and columns to be operated on, that is, the order of
the triangular factor U or L. n ≥ 0.

a (local)
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1). On
entry, the local pieces of the triangular factor U or L.

ia (global)
The row index in the global matrix A indicating the first row of sub(A).

ja (global)
The column index in the global matrix A indicating the first column of
sub(A).

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

Output Parameters

a (local)
On exit, if uplo = 'U', the upper triangle of the distributed matrix sub(A)
is overwritten with the upper triangle of the product U*U'; if uplo = 'L',
the lower triangle of sub(A) is overwritten with the lower triangle of the
product L'*L.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?lawil
Forms the Wilkinson transform.

Syntax
void pslawil (const MKL_INT *ii, const MKL_INT *jj, const MKL_INT *m, const float *a,
const MKL_INT *desca, const float *h44, const float *h33, const float *h43h34, float
*v );
void pdlawil (const MKL_INT *ii, const MKL_INT *jj, const MKL_INT *m, const double *a,
const MKL_INT *desca, const double *h44, const double *h33, const double *h43h34,
double *v );
void pclawil (const MKL_INT *ii , const MKL_INT *jj , const MKL_INT *m , const
MKL_Complex8 *a , const MKL_INT *desca , const MKL_Complex8 *h44 , const MKL_Complex8
*h33 , const MKL_Complex8 *h43h34 , MKL_Complex8 *v );
void pzlawil (const MKL_INT *ii , const MKL_INT *jj , const MKL_INT *m , const
MKL_Complex16 *a , const MKL_INT *desca , const MKL_Complex16 *h44 , const
MKL_Complex16 *h33 , const MKL_Complex16 *h43h34 , MKL_Complex16 *v );

Include Files
• mkl_scalapack.h

Description
The p?lawil function forms the transform given by h44, h33, and h43h34 in v, starting at row m.

Input Parameters

ii (global)
Number of the process row which owns the matrix element A(m+2, m+2).

jj (global)
Number of the process column which owns the matrix element A(m+2, m+2).

m (global)
On entry, the location from where the transform starts (row m). Unchanged
on exit.

a (local)
Array of size lld_a*LOCc(n_a).
On entry, the Hessenberg matrix. Unchanged on exit.

desca (global and local)

Array of size dlen_. The array descriptor for the distributed matrix A.
Unchanged on exit.

h44, h33, h43h34 (global)
These three values are for the double-shift QR iteration. Unchanged on exit.

Output Parameters

v (global)
Array of size 3 that contains the transform on output.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?org2l/p?ung2l
Generates all or part of the orthogonal/unitary matrix
Q from a QL factorization determined by p?geqlf
(unblocked algorithm).

Syntax
void psorg2l (MKL_INT *m , MKL_INT *n , MKL_INT *k , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdorg2l (MKL_INT *m , MKL_INT *n , MKL_INT *k , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );
void pcung2l (MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzung2l (MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT
*lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?org2l/p?ung2l function generates an m-by-n real/complex distributed matrix Q denoting
A(ia:ia+m-1, ja:ja+n-1) with orthonormal columns, which is defined as the last n columns of a product of k
elementary reflectors of order m:
Q = H(k)*...*H(2)*H(1) as returned by p?geqlf.

Input Parameters

m (global)
The number of rows in the distributed submatrix Q. m ≥ 0.

n (global)
The number of columns in the distributed submatrix Q. m ≥ n ≥ 0.

k (global)
The number of elementary reflectors whose product defines the matrix Q.
n ≥ k ≥ 0.

a (local)
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).

On entry, the j-th column of the matrix stored in a must contain the vector
that defines the elementary reflector H(j), ja+n-k ≤ j ≤ ja+n-1, as
returned by p?geqlf in the k columns of its distributed matrix argument
A(ia:*,ja+n-k:ja+n-1).

ia (global)
The row index in the global matrix A indicating the first row of sub(A).

ja (global)
The column index in the global matrix A indicating the first column of
sub(A).

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCc(ja+n-1).

tau[j] contains the scalar factor of the elementary reflector H(j+1), j = 0, 1, ..., LOCc(ja+n-1)-1, as
returned by p?geqlf.

work (local)
Workspace array of size lwork.

lwork (local or global)

The size of the array work.
lwork is local input and must be at least lwork ≥ mpa0 + max(1, nqa0),
where
iroffa = mod(ia-1, mb_a),
icoffa = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, myrow, rsrc_a, nprow),
iacol = indxg2p(ja, nb_a, mycol, csrc_a, npcol),
mpa0 = numroc(m+iroffa, mb_a, myrow, iarow, nprow),
nqa0 = numroc(n+icoffa, nb_a, mycol, iacol, npcol).
indxg2p and numroc are ScaLAPACK tool functions; myrow, mycol, nprow,
and npcol can be determined by calling the function blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

a On exit, this array contains the local pieces of the m-by-n distributed matrix
Q.

work On exit, work[0] returns the minimal and optimal lwork.

info (local)
= 0: successful exit
< 0: if the i-th argument is an array and the j-th entry, indexed j-1, had an
illegal value, then info = -(i*100+j);
if the i-th argument is a scalar and had an illegal value, then info = -i.
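The negative info codes above can be decoded mechanically. The sketch below is a hypothetical helper (decode_info is not an MKL function); it assumes the array entry index j stays below 100, which holds for array descriptors since they have dlen_ = 9 entries. A code of -702, for example, decodes to entry 2 of argument 7.

```c
/* Hypothetical helper: split info into the offending argument number and,
   for array arguments, the 1-based entry. entry = 0 means a scalar argument;
   arg = 0 means info was not negative (nothing to decode). */
void decode_info(int info, int *arg, int *entry)
{
    *arg = 0;
    *entry = 0;
    if (info >= 0)
        return;                 /* 0 = successful exit */
    int code = -info;
    if (code >= 100) {          /* array argument: code = i*100 + j */
        *arg = code / 100;
        *entry = code % 100;
    } else {                    /* scalar argument: code = i */
        *arg = code;
    }
}
```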

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?org2r/p?ung2r
Generates all or part of the orthogonal/unitary matrix
Q from a QR factorization determined by p?geqrf
(unblocked algorithm).

Syntax
void psorg2r (MKL_INT *m , MKL_INT *n , MKL_INT *k , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdorg2r (MKL_INT *m , MKL_INT *n , MKL_INT *k , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );
void pcung2r (MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzung2r (MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT
*lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?org2r/p?ung2r function generates an m-by-n real/complex matrix Q denoting A(ia:ia+m-1,
ja:ja+n-1) with orthonormal columns, which is defined as the first n columns of a product of k elementary
reflectors of order m:
Q = H(1)*H(2)*...*H(k)
as returned by p?geqrf.
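To show how Q = H(1)*H(2)*...*H(k) can be formed in place from the reflectors left by a QR factorization, here is a serial sketch for the real case modeled on the unblocked scheme used by LAPACK's ?org2r. org2r_serial is a hypothetical helper on a single column-major array, not the MKL routine, and it ignores the block-cyclic distribution that p?org2r handles. The reflectors are applied from the last to the first, so each H(i) only has to touch the trailing columns.

```c
#include <stddef.h>

/* Hypothetical serial helper (real case). On entry, column i (0-based, i < k)
   holds the reflector vector v_i below the diagonal (implicit 1 on the
   diagonal) and tau[i] its scalar factor. On exit, a holds the first n
   columns of Q = H(1)*H(2)*...*H(k). */
void org2r_serial(int m, int n, int k, double *a, int lda, const double *tau)
{
#define A(r, c) a[(size_t)(c) * lda + (r)]
    /* Columns k..n-1 start as columns of the identity matrix. */
    for (int j = k; j < n; ++j) {
        for (int r = 0; r < m; ++r)
            A(r, j) = 0.0;
        A(j, j) = 1.0;
    }
    for (int i = k - 1; i >= 0; --i) {
        /* Apply H(i) = I - tau*v*v' to A(i:m-1, i+1:n-1) from the left. */
        A(i, i) = 1.0;
        for (int j = i + 1; j < n; ++j) {
            double dot = 0.0;
            for (int r = i; r < m; ++r)
                dot += A(r, i) * A(r, j);
            for (int r = i; r < m; ++r)
                A(r, j) -= tau[i] * dot * A(r, i);
        }
        /* Column i becomes H(i)*e_i = e_i - tau*v, with zeros above row i. */
        for (int r = i + 1; r < m; ++r)
            A(r, i) *= -tau[i];
        A(i, i) = 1.0 - tau[i];
        for (int r = 0; r < i; ++r)
            A(r, i) = 0.0;
    }
#undef A
}
```

For instance, with m = n = 2, k = 1, v = (1, 1), and tau = 1, the result is Q = [0 -1; -1 0], which is exactly the reflector I - v*v'.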

Input Parameters

m (global)
The number of rows in the distributed submatrix Q. m ≥ 0.

n (global)
The number of columns in the distributed submatrix Q. m ≥ n ≥ 0.

k (global)
The number of elementary reflectors whose product defines the matrix Q. n
≥ k ≥ 0.

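a (local)
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).
On entry, the j-th column of the matrix stored in a must contain the vector that defines the elementary
reflector H(j), ja ≤ j ≤ ja+k-1, as returned by p?geqrf in the k columns of its distributed matrix argument
A(ia:*, ja:ja+k-1).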

ia (global)
The row index in the global matrix A indicating the first row of sub(A).

ja (global)
The column index in the global matrix A indicating the first column of
sub(A).

desca (global and local) array of size dlen_.

The array descriptor for the distributed matrix A.

tau (local)
Array of size LOCc(ja+k-1).

tau[j] contains the scalar factor of the elementary reflector H(j+1), j = 0, 1, ..., LOCc(ja+k-1)-1, as
returned by p?geqrf. This array is tied to the distributed matrix A.

work (local)
Workspace array of size lwork.

lwork (local or global)

The size of the array work.
lwork is local input and must be at least lwork ≥ mpa0 + max(1, nqa0),
where
iroffa = mod(ia-1, mb_a),
icoffa = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, myrow, rsrc_a, nprow),
iacol = indxg2p(ja, nb_a, mycol, csrc_a, npcol),
mpa0 = numroc(m+iroffa, mb_a, myrow, iarow, nprow),
nqa0 = numroc(n+icoffa, nb_a, mycol, iacol, npcol).
indxg2p and numroc are ScaLAPACK tool functions; myrow, mycol, nprow,
and npcol can be determined by calling the function blacs_gridinfo.

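If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.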

Output Parameters

a On exit, this array contains the local pieces of the m-by-n distributed matrix
Q.

work On exit, work[0] returns the minimal and optimal lwork.

info (local)
= 0: successful exit
< 0: if the i-th argument is an array and the j-th entry, indexed j-1, had an
illegal value, then info = -(i*100+j);
if the i-th argument is a scalar and had an illegal value, then info = -i.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?orgl2/p?ungl2
Generates all or part of the orthogonal/unitary matrix
Q from an LQ factorization determined by p?gelqf
(unblocked algorithm).

Syntax
void psorgl2 (MKL_INT *m , MKL_INT *n , MKL_INT *k , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdorgl2 (MKL_INT *m , MKL_INT *n , MKL_INT *k , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );
void pcungl2 (MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzungl2 (MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT
*lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?orgl2/p?ungl2 function generates an m-by-n real/complex matrix Q denoting A(ia:ia+m-1,
ja:ja+n-1) with orthonormal rows, which is defined as the first m rows of a product of k elementary
reflectors of order n:

Q = H(k)*...*H(2)*H(1) (for real flavors),

Q = (H(k))^H*...*(H(2))^H*(H(1))^H (for complex flavors),

as returned by p?gelqf.

Input Parameters

m (global)
The number of rows in the distributed submatrix Q. m ≥ 0.

n (global)
The number of columns in the distributed submatrix Q. n ≥ m ≥ 0.

k (global)
The number of elementary reflectors whose product defines the matrix Q. m
≥ k ≥ 0.

a (local)
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).

On entry, the i-th row of the matrix stored in a must contain the vector that
defines the elementary reflector H(i), ia ≤ i ≤ ia+k-1, as returned by
p?gelqf in the k rows of its distributed matrix argument A(ia:ia+k-1,
ja:*).

ia (global)
The row index in the global matrix A indicating the first row of sub(A).

ja (global)
The column index in the global matrix A indicating the first column of
sub(A).

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCr(ja+k-1). tau[j] contains the scalar factor of the
elementary reflector H(j+1), j = 0, 1, ..., LOCr(ja+k-1)-1, as returned by
p?gelqf. This array is tied to the distributed matrix A.

work (local)
Workspace array of size lwork.

lwork (local or global)

The size of the array work.
lwork is local input and must be at least lwork ≥ nqa0 + max(1, mpa0),
where
iroffa = mod(ia-1, mb_a),
icoffa = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, myrow, rsrc_a, nprow),
iacol = indxg2p(ja, nb_a, mycol, csrc_a, npcol),
mpa0 = numroc(m+iroffa, mb_a, myrow, iarow, nprow),
nqa0 = numroc(n+icoffa, nb_a, mycol, iacol, npcol).
indxg2p and numroc are ScaLAPACK tool functions; myrow, mycol, nprow,
and npcol can be determined by calling the function blacs_gridinfo.

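If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.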

Output Parameters

a On exit, this array contains the local pieces of the m-by-n distributed matrix
Q.

work On exit, work[0] returns the minimal and optimal lwork.

info (local)
= 0: successful exit
< 0: if the i-th argument is an array and its j-th entry (index j-1) had an
illegal value, then info = -(i*100 + j); if the i-th argument is a scalar
and had an illegal value, then info = -i.
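
The info encoding can be decoded mechanically. The sketch below (decode_info and info_error are hypothetical names, not part of MKL) recovers the 1-based argument position i and, for array arguments, the 1-based entry j, so the offending element is entry j-1 of the C array. It assumes, as holds for the argument lists in this section, that no function has 100 or more arguments.

```c
#include <assert.h>

/* Result of decoding a negative info value. */
typedef struct {
    int arg;    /* 1-based position i of the offending argument */
    int entry;  /* 1-based entry j of an array argument; 0 for a scalar */
} info_error;

/* Decodes info < 0 according to the convention above:
   info = -(i*100 + j) for entry j of array argument i,
   info = -i           for scalar argument i. */
static info_error decode_info(int info)
{
    assert(info < 0);
    info_error e = { 0, 0 };
    int code = -info;
    if (code >= 100) {          /* array argument: code = i*100 + j */
        e.arg = code / 100;
        e.entry = code % 100;
    } else {                    /* scalar argument: code = i */
        e.arg = code;
    }
    return e;
}
```

For example, desca is the 7th argument of p?orgr2, so info = -702 would point at desca[1], while info = -3 would point at the scalar k.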

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?orgr2/p?ungr2
Generates all or part of the orthogonal/unitary matrix
Q from an RQ factorization determined by p?gerqf
(unblocked algorithm).

Syntax
void psorgr2 (MKL_INT *m , MKL_INT *n , MKL_INT *k , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdorgr2 (MKL_INT *m , MKL_INT *n , MKL_INT *k , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );
void pcungr2 (MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzungr2 (MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT
*lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?orgr2/p?ungr2 function generates an m-by-n real/complex matrix Q denoting A(ia:ia+m-1, ja:ja+n-1)
with orthonormal rows, which is defined as the last m rows of a product of k elementary reflectors of
order n:

Q = H(1)*H(2)*...*H(k) (for real flavors);

Q = H(1)^H*H(2)^H*...*H(k)^H (for complex flavors), as returned by p?gerqf.
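
To make the product ordering concrete, here is a small serial sketch for the real flavor. It forms the full order-n matrix H(1)*H(2)*...*H(k) explicitly, with each H(i) = I - tau(i)*v(i)*v(i)^T, by multiplying the reflectors onto the identity from the right in the order 1, ..., k. This is only an illustration of the ordering under the assumption of a dense, row-major, single-process layout (the function names are hypothetical); p?orgr2 itself works in place on the distributed reflectors and returns only the last m rows.

```c
#include <assert.h>

/* Q := Q*H for an n-by-n row-major Q, where H = I - tau*v*v^T is a real
   elementary reflector of order n. */
static void apply_reflector_right(int n, double *q, const double *v, double tau)
{
    for (int r = 0; r < n; ++r) {
        double dot = 0.0;                       /* (Q*v)[r] */
        for (int c = 0; c < n; ++c)
            dot += q[r * n + c] * v[c];
        for (int c = 0; c < n; ++c)
            q[r * n + c] -= tau * dot * v[c];   /* Q(r,:) -= tau*(Q*v)[r]*v^T */
    }
}

/* Forms Q = H(1)*H(2)*...*H(k) explicitly. Reflector i (0-based) is stored
   in v[i*n .. i*n+n-1] with scalar factor tau[i]. */
static void form_reflector_product(int n, int k, const double *v,
                                   const double *tau, double *q)
{
    assert(n > 0 && k >= 0);
    for (int i = 0; i < n * n; ++i)
        q[i] = 0.0;
    for (int i = 0; i < n; ++i)
        q[i * n + i] = 1.0;                     /* start from the identity */
    for (int i = 0; i < k; ++i)                 /* H(1) first, H(k) last */
        apply_reflector_right(n, q, v + i * n, tau[i]);
}
```

With n = 2, v(1) = (1, 1), tau(1) = 1 and v(2) = (1, 0), tau(2) = 2, the result is [[0, -1], [1, 0]], whereas the reversed product H(2)*H(1) would be [[0, 1], [-1, 0]].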

Input Parameters

m (global)
The number of rows in the distributed submatrix Q. m ≥ 0.

n (global)
The number of columns in the distributed submatrix Q. n ≥ m ≥ 0.

k (global)
The number of elementary reflectors whose product defines the matrix Q. m
≥ k ≥ 0.

a Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).

On entry, the i-th row of the matrix stored in a must contain the vector that
defines the elementary reflector H(i), ia+m-k ≤ i ≤ ia+m-1, as returned by
p?gerqf in the k rows of its distributed matrix argument A(ia+m-k:ia+m-1,
ja:*).

ia (global)
The row index in the global matrix A indicating the first row of sub(A).

ja (global)
The column index in the global matrix A indicating the first column of
sub(A).

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCr(ia+m-1). tau[j] contains the scalar factor of the
elementary reflector H(j+1), j = 0, 1, ..., LOCr(ia+m-1)-1, as returned by
p?gerqf. This array is tied to the distributed matrix A.

work (local)
Workspace array of size lwork.

lwork (local or global)

The size of the array work.
lwork is local input and must satisfy lwork ≥ nqa0 + max(1, mpa0),
where

iroffa = mod(ia-1, mb_a),
icoffa = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, myrow, rsrc_a, nprow),
iacol = indxg2p(ja, nb_a, mycol, csrc_a, npcol),
mpa0 = numroc(m+iroffa, mb_a, myrow, iarow, nprow),
nqa0 = numroc(n+icoffa, nb_a, mycol, iacol, npcol).

indxg2p and numroc are ScaLAPACK tool functions; myrow, mycol, nprow,
and npcol can be determined by calling the function blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

a On exit, this array contains the local pieces of the m-by-n distributed matrix
Q.

work On exit, work[0] returns the minimal and optimal lwork.

info (local)
= 0: successful exit
< 0: if the i-th argument is an array and its j-th entry (index j-1) had an
illegal value, then info = -(i*100 + j); if the i-th argument is a scalar
and had an illegal value, then info = -i.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?orm2l/p?unm2l
Multiplies a general matrix by the orthogonal/unitary
matrix from a QL factorization determined by p?geqlf
(unblocked algorithm).

Syntax
void psorm2l (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , float
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *tau , float *c , MKL_INT *ic ,
MKL_INT *jc , MKL_INT *descc , float *work , MKL_INT *lwork , MKL_INT *info );
void pdorm2l (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , double
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *tau , double *c , MKL_INT
*ic , MKL_INT *jc , MKL_INT *descc , double *work , MKL_INT *lwork , MKL_INT *info );
void pcunm2l (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau ,
MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex8 *work ,
MKL_INT *lwork , MKL_INT *info );
void pzunm2l (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau ,
MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex16 *work ,
MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?orm2l/p?unm2l function overwrites the general real/complex m-by-n distributed matrix
sub(C) = C(ic:ic+m-1, jc:jc+n-1) with

Q*sub(C) if side = 'L' and trans = 'N', or

Q^T*sub(C) / Q^H*sub(C) if side = 'L' and trans = 'T' (for real flavors) or trans = 'C' (for complex
flavors), or
sub(C)*Q if side = 'R' and trans = 'N', or

sub(C)*Q^T / sub(C)*Q^H if side = 'R' and trans = 'T' (for real flavors) or trans = 'C' (for complex
flavors),
where Q is a real orthogonal or complex unitary distributed matrix defined as the product of k elementary
reflectors
Q = H(k)*...*H(2)*H(1), as returned by p?geqlf. Q is of order m if side = 'L' and of order n if side =
'R'.
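
The ordering of the reflectors determines the order in which they are applied. The serial sketch below covers only the real flavor with side = 'L', on a dense row-major m-by-n matrix held by one process (layout and function names are assumptions for illustration, not the distributed implementation). Because Q = H(k)*...*H(2)*H(1), computing Q*C applies H(1) first, while Q^T*C = H(1)*H(2)*...*H(k)*C applies H(k) first, since each real H(i) is symmetric.

```c
#include <assert.h>

/* C := H*C for an m-by-n row-major C, where H = I - tau*v*v^T has order m. */
static void apply_reflector_left(int m, int n, double *c, const double *v,
                                 double tau)
{
    for (int col = 0; col < n; ++col) {
        double dot = 0.0;                         /* v^T * C(:, col) */
        for (int r = 0; r < m; ++r)
            dot += v[r] * c[r * n + col];
        for (int r = 0; r < m; ++r)
            c[r * n + col] -= tau * v[r] * dot;
    }
}

/* Real QL flavor, side = 'L': Q = H(k)*...*H(2)*H(1).
   trans == 'N' computes Q*C   (H(1) applied first);
   trans == 'T' computes Q^T*C (H(k) applied first).
   Reflector i (0-based) is stored in v[i*m .. i*m+m-1]. */
static void apply_ql_q_left(char trans, int m, int n, int k, const double *v,
                            const double *tau, double *c)
{
    assert(trans == 'N' || trans == 'T');
    for (int step = 0; step < k; ++step) {
        int i = (trans == 'N') ? step : k - 1 - step;
        apply_reflector_left(m, n, c, v + i * m, tau[i]);
    }
}
```

Applying trans = 'N' and then trans = 'T' to the same C restores it, since Q^T*Q = I; this is a cheap way to check that the loop direction is right.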

Input Parameters

side (global)
= 'L': apply Q or Q^T for real flavors (Q^H for complex flavors) from the left,

= 'R': apply Q or Q^T for real flavors (Q^H for complex flavors) from the
right.

trans (global)
= 'N': apply Q (no transpose)

= 'T': apply Q^T (transpose, for real flavors)

= 'C': apply Q^H (conjugate transpose, for complex flavors)

m (global)
The number of rows in the distributed matrix sub(C). m ≥ 0.

n (global)
The number of columns in the distributed matrix sub(C). n ≥ 0.

k (global)
The number of elementary reflectors whose product defines the matrix Q.
If side = 'L', m ≥ k ≥ 0;

if side = 'R', n ≥ k ≥ 0.

a (local)
Pointer into the local memory to an array of size lld_a * LOCc(ja+k-1).
On entry, the j-th column of the matrix stored in a must contain the vector that
defines the elementary reflector H(j), ja ≤ j ≤ ja+k-1, as returned by
p?geqlf in the k columns of its distributed matrix argument A(ia:*, ja:ja+k-1).
The argument A(ia:*, ja:ja+k-1) is modified by the function but
restored on exit.
If side = 'L', lld_a ≥ max(1, LOCr(ia+m-1)),

if side = 'R', lld_a ≥ max(1, LOCr(ia+n-1)).

ia (global)
The row index in the global matrix A indicating the first row of sub(A).

ja (global)
The column index in the global matrix A indicating the first column of
sub(A).

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCc(ja+n-1). tau[j] contains the scalar factor of the
elementary reflector H(j+1), j = 0, 1, ..., LOCc(ja+n-1)-1, as returned by
p?geqlf. This array is tied to the distributed matrix A.

c (local)
Pointer into the local memory to an array of size lld_c * LOCc(jc+n-1). On
entry, the local pieces of the distributed matrix sub(C).

ic (global)
The row index in the global matrix C indicating the first row of sub(C).

jc (global)
The column index in the global matrix C indicating the first column of
sub(C).

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local)
Workspace array of size lwork.
On exit, work[0] returns the minimal and optimal lwork.

lwork (local or global)

The size of the array work.
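lwork is local input and must be at least
if side = 'L', lwork ≥ mpc0 + max(1, nqc0),

if side = 'R', lwork ≥ nqc0 + max(max(1, mpc0), numroc(numroc(n+icoffc, nb_a, 0, 0, npcol), nb_a, 0, 0, lcmq)),
where

lcmq = lcm/npcol,
lcm = ilcm(nprow, npcol),
iroffc = mod(ic-1, mb_c),
icoffc = mod(jc-1, nb_c),
icrow = indxg2p(ic, mb_c, myrow, rsrc_c, nprow),
iccol = indxg2p(jc, nb_c, mycol, csrc_c, npcol),
mpc0 = numroc(m+iroffc, mb_c, myrow, icrow, nprow),
nqc0 = numroc(n+icoffc, nb_c, mycol, iccol, npcol).

ilcm, indxg2p, and numroc are ScaLAPACK tool functions; myrow, mycol,
nprow, and npcol can be determined by calling the function
blacs_gridinfo.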
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.
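
For the p?orm2l/p?unm2l bounds, the only new ingredient compared with the p?orgr2 bound is the least common multiple of the grid dimensions. The serial sketch below evaluates both bounds. It re-implements NUMROC and ILCM locally (numroc_ref, ilcm_ref and orm2l_min_lwork are hypothetical names) and assumes the standard ScaLAPACK definitions lcmq = ilcm(nprow, npcol)/npcol, iroffc = mod(ic-1, mb_c), icoffc = mod(jc-1, nb_c), mpc0 = numroc(m+iroffc, mb_c, myrow, icrow, nprow) and nqc0 = numroc(n+icoffc, nb_c, mycol, iccol, npcol), with 1-based ic and jc and long in place of MKL_INT.

```c
#include <assert.h>

/* Serial re-implementation of NUMROC (see the ScaLAPACK tool functions). */
static long numroc_ref(long n, long nb, long iproc, long isrcproc, long nprocs)
{
    assert(nb > 0 && nprocs > 0);
    long mydist = (nprocs + iproc - isrcproc) % nprocs;
    long nblocks = n / nb;
    long count = (nblocks / nprocs) * nb;
    long extrablks = nblocks % nprocs;
    if (mydist < extrablks)
        count += nb;
    else if (mydist == extrablks)
        count += n % nb;
    return count;
}

/* Least common multiple, as computed by the ScaLAPACK tool function ILCM. */
static long ilcm_ref(long a, long b)
{
    assert(a > 0 && b > 0);
    long x = a, y = b;
    while (y != 0) {            /* Euclid: x ends as gcd(a, b) */
        long t = x % y;
        x = y;
        y = t;
    }
    return a / x * b;
}

static long max_l(long a, long b) { return a > b ? a : b; }

/* Minimal lwork for p?orm2l/p?unm2l on process (myrow, mycol). */
static long orm2l_min_lwork(char side, long m, long n, long ic, long jc,
                            long mb_c, long nb_c, long rsrc_c, long csrc_c,
                            long nb_a, long myrow, long mycol,
                            long nprow, long npcol)
{
    long iroffc = (ic - 1) % mb_c;
    long icoffc = (jc - 1) % nb_c;
    long icrow = (rsrc_c + (ic - 1) / mb_c) % nprow;   /* indxg2p */
    long iccol = (csrc_c + (jc - 1) / nb_c) % npcol;   /* indxg2p */
    long mpc0 = numroc_ref(m + iroffc, mb_c, myrow, icrow, nprow);
    long nqc0 = numroc_ref(n + icoffc, nb_c, mycol, iccol, npcol);
    if (side == 'L')
        return mpc0 + max_l(1, nqc0);
    long lcmq = ilcm_ref(nprow, npcol) / npcol;
    long inner = numroc_ref(n + icoffc, nb_a, 0, 0, npcol);
    return nqc0 + max_l(max_l(1, mpc0), numroc_ref(inner, nb_a, 0, 0, lcmq));
}
```

For a 5-by-7 sub(C) starting at ic = jc = 1 with 2-by-2 blocks (nb_a = 2) on a 2-by-2 grid, process (0, 0) needs lwork ≥ 7 for side = 'L' and lwork ≥ 8 for side = 'R'. In practice, a workspace query with lwork = -1 avoids computing these bounds by hand.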

Output Parameters

c On exit, c is overwritten by Q*sub(C), or Q^T*sub(C) / Q^H*sub(C), or

sub(C)*Q, or sub(C)*Q^T / sub(C)*Q^H.

work On exit, work[0] returns the minimal and optimal lwork.

info (local)
= 0: successful exit
< 0: if the i-th argument is an array and its j-th entry (index j-1) had an
illegal value, then info = -(i*100 + j); if the i-th argument is a scalar
and had an illegal value, then info = -i.

NOTE
The distributed submatrices A(ia:*, ja:*) and C(ic:ic+m-1, jc:jc+n-1) must satisfy the following
alignment conditions:
If side = 'L', (mb_a == mb_c) && (iroffa == iroffc) && (iarow == icrow).

If side = 'R', (mb_a == nb_c) && (iroffa == icoffc).

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?orm2r/p?unm2r
Multiplies a general matrix by the orthogonal/unitary
matrix from a QR factorization determined by
p?geqrf (unblocked algorithm).

Syntax
void psorm2r (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , float
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *tau , float *c , MKL_INT *ic ,
MKL_INT *jc , MKL_INT *descc , float *work , MKL_INT *lwork , MKL_INT *info );
void pdorm2r (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , double
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *tau , double *c , MKL_INT
*ic , MKL_INT *jc , MKL_INT *descc , double *work , MKL_INT *lwork , MKL_INT *info );

void pcunm2r (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau ,
MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex8 *work ,
MKL_INT *lwork , MKL_INT *info );
void pzunm2r (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau ,
MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex16 *work ,
MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?orm2r/p?unm2r function overwrites the general real/complex m-by-n distributed matrix
sub(C) = C(ic:ic+m-1, jc:jc+n-1) with
Q*sub(C) if side = 'L' and trans = 'N', or

Q^T*sub(C) / Q^H*sub(C) if side = 'L' and trans = 'T' (for real flavors) or trans = 'C' (for complex
flavors), or
sub(C)*Q if side = 'R' and trans = 'N', or

sub(C)*Q^T / sub(C)*Q^H if side = 'R' and trans = 'T' (for real flavors) or trans = 'C' (for complex
flavors),
where Q is a real orthogonal or complex unitary matrix defined as the product of k elementary reflectors

Q = H(1)*H(2)*...*H(k), as returned by p?geqrf. Q is of order m if side = 'L' and of order n if side =
'R'.

Input Parameters

side (global)
= 'L': apply Q or Q^T for real flavors (Q^H for complex flavors) from the left,

= 'R': apply Q or Q^T for real flavors (Q^H for complex flavors) from the
right.

trans (global)
= 'N': apply Q (no transpose)

= 'T': apply Q^T (transpose, for real flavors)

= 'C': apply Q^H (conjugate transpose, for complex flavors)

m (global)
The number of rows in the distributed matrix sub(C). m ≥ 0.

n (global)
The number of columns in the distributed matrix sub(C). n ≥ 0.

k (global)
The number of elementary reflectors whose product defines the matrix Q.
If side = 'L', m ≥ k ≥ 0;

if side = 'R', n ≥ k ≥ 0.

a (local)
Pointer into the local memory to an array of size lld_a * LOCc(ja+k-1).

On entry, the j-th column of the matrix stored in a must contain the vector
that defines the elementary reflector H(j), ja ≤ j ≤ ja+k-1, as returned by
p?geqrf in the k columns of its distributed matrix argument A(ia:*, ja:ja+k-1).
The argument A(ia:*, ja:ja+k-1) is modified by the function but
restored on exit.
If side = 'L', lld_a ≥ max(1, LOCr(ia+m-1)),

if side = 'R', lld_a ≥ max(1, LOCr(ia+n-1)).

ia (global)
The row index in the global matrix A indicating the first row of sub(A).

ja (global)
The column index in the global matrix A indicating the first column of
sub(A).

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCc(ja+k-1). tau[j] contains the scalar factor of the
elementary reflector H(j+1), j = 0, 1, ..., LOCc(ja+k-1)-1, as returned by
p?geqrf. This array is tied to the distributed matrix A.

c (local)
Pointer into the local memory to an array of size lld_c * LOCc(jc+n-1).

On entry, the local pieces of the distributed matrix sub(C).

ic (global)
The row index in the global matrix C indicating the first row of sub(C).

jc (global)
The column index in the global matrix C indicating the first column of
sub(C).

descc (global and local) array of size dlen_.

The array descriptor for the distributed matrix C.

work (local)
Workspace array of size lwork.

lwork (local or global)

The size of the array work.
lwork is local input and must be at least
if side = 'L', lwork ≥ mpc0 + max(1, nqc0),
if side = 'R', lwork ≥ nqc0 + max(max(1, mpc0), numroc(numroc(n+icoffc, nb_a, 0, 0, npcol), nb_a, 0, 0, lcmq)),
where

lcmq = lcm/npcol,
lcm = ilcm(nprow, npcol),
iroffc = mod(ic-1, mb_c),
icoffc = mod(jc-1, nb_c),
icrow = indxg2p(ic, mb_c, myrow, rsrc_c, nprow),
iccol = indxg2p(jc, nb_c, mycol, csrc_c, npcol),
mpc0 = numroc(m+iroffc, mb_c, myrow, icrow, nprow),
nqc0 = numroc(n+icoffc, nb_c, mycol, iccol, npcol).

ilcm, indxg2p and numroc are ScaLAPACK tool functions; myrow, mycol,
nprow, and npcol can be determined by calling the function
blacs_gridinfo.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

c On exit, c is overwritten by Q*sub(C), or Q^T*sub(C) / Q^H*sub(C), or

sub(C)*Q, or sub(C)*Q^T / sub(C)*Q^H.

work On exit, work[0] returns the minimal and optimal lwork.

info (local)
= 0: successful exit
< 0: if the i-th argument is an array and its j-th entry (index j-1) had an
illegal value, then info = -(i*100 + j); if the i-th argument is a scalar
and had an illegal value, then info = -i.

NOTE
The distributed submatrices A(ia:*, ja:*) and C(ic:ic+m-1, jc:jc+n-1) must satisfy the following
alignment conditions:
If side = 'L', (mb_a == mb_c) && (iroffa == iroffc) && (iarow == icrow).

If side = 'R', (mb_a == nb_c) && (iroffa == icoffc).

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?orml2/p?unml2
Multiplies a general matrix by the orthogonal/unitary
matrix from an LQ factorization determined by
p?gelqf (unblocked algorithm).

Syntax
void psorml2 (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , float
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *tau , float *c , MKL_INT *ic ,
MKL_INT *jc , MKL_INT *descc , float *work , MKL_INT *lwork , MKL_INT *info );
void pdorml2 (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , double
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *tau , double *c , MKL_INT
*ic , MKL_INT *jc , MKL_INT *descc , double *work , MKL_INT *lwork , MKL_INT *info );
void pcunml2 (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau ,
MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex8 *work ,
MKL_INT *lwork , MKL_INT *info );
void pzunml2 (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau ,
MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex16 *work ,
MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?orml2/p?unml2 function overwrites the general real/complex m-by-n distributed matrix
sub(C) = C(ic:ic+m-1, jc:jc+n-1) with

Q*sub(C) if side = 'L' and trans = 'N', or

Q^T*sub(C) / Q^H*sub(C) if side = 'L' and trans = 'T' (for real flavors) or trans = 'C' (for complex
flavors), or
sub(C)*Q if side = 'R' and trans = 'N', or

sub(C)*Q^T / sub(C)*Q^H if side = 'R' and trans = 'T' (for real flavors) or trans = 'C' (for complex
flavors),
where Q is a real orthogonal or complex unitary distributed matrix defined as the product of k elementary
reflectors
Q = H(k)*...*H(2)*H(1) (for real flavors)
Q = H(k)^H*...*H(2)^H*H(1)^H (for complex flavors)
as returned by p?gelqf. Q is of order m if side = 'L' and of order n if side = 'R'.

Input Parameters

side (global)
= 'L': apply Q or Q^T for real flavors (Q^H for complex flavors) from the left,

= 'R': apply Q or Q^T for real flavors (Q^H for complex flavors) from the
right.

trans (global)
= 'N': apply Q (no transpose)

= 'T': apply Q^T (transpose, for real flavors)

= 'C': apply Q^H (conjugate transpose, for complex flavors)

m (global)
The number of rows in the distributed matrix sub(C). m ≥ 0.

n (global)
The number of columns in the distributed matrix sub(C). n ≥ 0.

k (global)
The number of elementary reflectors whose product defines the matrix Q.
If side = 'L', m ≥ k ≥ 0;

if side = 'R', n ≥ k ≥ 0.

a (local)
Pointer into the local memory to an array of size
lld_a * LOCc(ja+m-1) if side='L',

lld_a * LOCc(ja+n-1) if side='R',

where lld_a ≥ max (1, LOCr(ia+k-1)).

ia (global)
The row index in the global matrix A indicating the first row of sub(A).

ja (global)
The column index in the global matrix A indicating the first column of
sub(A).

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCc(ia+k-1). tau[i] contains the scalar factor of the
elementary reflector H(i+1), i = 0, 1, ..., LOCc(ia+k-1)-1, as returned by
p?gelqf. This array is tied to the distributed matrix A.

c (local)
Pointer into the local memory to an array of size lld_c * LOCc(jc+n-1). On
entry, the local pieces of the distributed matrix sub(C).

ic (global)
The row index in the global matrix C indicating the first row of sub(C).

jc (global)
The column index in the global matrix C indicating the first column of
sub(C).

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local)
Workspace array of size lwork.

lwork (local or global)

The size of the array work.
lwork is local input and must be at least
if side = 'L', lwork ≥ mqc0 + max(max(1, npc0), numroc(numroc(m+iroffc, mb_a, 0, 0, nprow), mb_a, 0, 0, lcmp)),
if side = 'R', lwork ≥ npc0 + max(1, mqc0),

where
lcmp = lcm/nprow,
lcm = ilcm(nprow, npcol),
iroffc = mod(ic-1, mb_c),
icoffc = mod(jc-1, nb_c),
icrow = indxg2p(ic, mb_c, myrow, rsrc_c, nprow),
iccol = indxg2p(jc, nb_c, mycol, csrc_c, npcol),
mpc0 = numroc(m+iroffc, mb_c, myrow, icrow, nprow),
nqc0 = numroc(n+icoffc, nb_c, mycol, iccol, npcol).
ilcm, indxg2p and numroc are ScaLAPACK tool functions; myrow, mycol,
nprow, and npcol can be determined by calling the function
blacs_gridinfo.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

c On exit, c is overwritten by Q*sub(C), or Q^T*sub(C) / Q^H*sub(C), or

sub(C)*Q, or sub(C)*Q^T / sub(C)*Q^H.

work On exit, work[0] returns the minimal and optimal lwork.

info (local)
= 0: successful exit
< 0: if the i-th argument is an array and its j-th entry (index j-1) had an
illegal value, then info = -(i*100 + j); if the i-th argument is a scalar
and had an illegal value, then info = -i.

NOTE
The distributed submatrices A(ia:*, ja:*) and C(ic:ic+m-1, jc:jc+n-1) must satisfy the following
alignment conditions:
If side = 'L', (nb_a == mb_c) && (icoffa == iroffc).

If side = 'R', (nb_a == nb_c) && (icoffa == icoffc) && (iacol == iccol).
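
These alignment conditions can be checked on the caller's side before the call, which gives a clearer diagnostic than a negative info. The sketch below is a hypothetical helper, not an MKL function. It assumes the usual 9-entry ScaLAPACK 2D descriptor layout (dtype, ctxt, m, n, mb, nb, rsrc, csrc, lld), 1-based global indices, and long in place of MKL_INT; iacol and iccol are computed with the indxg2p formula.

```c
#include <assert.h>
#include <stdbool.h>

/* 0-based positions of the fields of a ScaLAPACK 2D array descriptor. */
enum { DTYPE_ = 0, CTXT_, M_, N_, MB_, NB_, RSRC_, CSRC_, LLD_ };

/* Process column owning the 1-based global column jglob (as INDXG2P). */
static long owner_col(long jglob, long nb, long csrc, long npcol)
{
    assert(nb > 0 && npcol > 0);
    return (csrc + (jglob - 1) / nb) % npcol;
}

/* Returns true if the alignment conditions in the NOTE for p?orml2/p?unml2
   hold for A(ia:*, ja:*) and C(ic:ic+m-1, jc:jc+n-1). */
static bool orml2_aligned(char side, const long *desca, long ja,
                          const long *descc, long ic, long jc, long npcol)
{
    long icoffa = (ja - 1) % desca[NB_];
    if (side == 'L') {
        long iroffc = (ic - 1) % descc[MB_];
        return desca[NB_] == descc[MB_] && icoffa == iroffc;
    }
    long icoffc = (jc - 1) % descc[NB_];
    long iacol = owner_col(ja, desca[NB_], desca[CSRC_], npcol);
    long iccol = owner_col(jc, descc[NB_], descc[CSRC_], npcol);
    return desca[NB_] == descc[NB_] && icoffa == icoffc && iacol == iccol;
}
```

For side = 'R' the process-column check matters even when the offsets agree: with nb = 2 on two process columns, ja = 3 starts on process column 1 while jc = 1 starts on process column 0, so the pair is rejected.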

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?ormr2/p?unmr2
Multiplies a general matrix by the orthogonal/unitary
matrix from an RQ factorization determined by
p?gerqf (unblocked algorithm).

Syntax
void psormr2 (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , float
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *tau , float *c , MKL_INT *ic ,
MKL_INT *jc , MKL_INT *descc , float *work , MKL_INT *lwork , MKL_INT *info );
void pdormr2 (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , double
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *tau , double *c , MKL_INT
*ic , MKL_INT *jc , MKL_INT *descc , double *work , MKL_INT *lwork , MKL_INT *info );
void pcunmr2 (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau ,
MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex8 *work ,
MKL_INT *lwork , MKL_INT *info );
void pzunmr2 (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau ,
MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex16 *work ,
MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?ormr2/p?unmr2 function overwrites the general real/complex m-by-n distributed matrix
sub(C) = C(ic:ic+m-1, jc:jc+n-1) with

Q*sub(C) if side = 'L' and trans = 'N', or

Q^T*sub(C) / Q^H*sub(C) if side = 'L' and trans = 'T' (for real flavors) or trans = 'C' (for complex
flavors), or
sub(C)*Q if side = 'R' and trans = 'N', or

sub(C)*Q^T / sub(C)*Q^H if side = 'R' and trans = 'T' (for real flavors) or trans = 'C' (for complex
flavors),

where Q is a real orthogonal or complex unitary distributed matrix defined as the product of k elementary
reflectors
Q = H(1)*H(2)*...*H(k) (for real flavors)
Q = H(1)^H*H(2)^H*...*H(k)^H (for complex flavors)
as returned by p?gerqf. Q is of order m if side = 'L' and of order n if side = 'R'.
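
When Q multiplies from the right, the ordering works the other way round from the left-sided case. The serial sketch below covers only the real flavor with side = 'R', on a dense row-major m-by-n matrix held by one process (layout and function names are assumptions for illustration). With Q = H(1)*H(2)*...*H(k), sub(C)*Q = C*H(1)*...*H(k) applies H(1) first, and sub(C)*Q^T = C*H(k)*...*H(1) applies H(k) first.

```c
#include <assert.h>

/* C := C*H for an m-by-n row-major C, where H = I - tau*v*v^T has order n. */
static void apply_reflector_right(int m, int n, double *c, const double *v,
                                  double tau)
{
    for (int r = 0; r < m; ++r) {
        double dot = 0.0;                         /* C(r, :) * v */
        for (int col = 0; col < n; ++col)
            dot += c[r * n + col] * v[col];
        for (int col = 0; col < n; ++col)
            c[r * n + col] -= tau * dot * v[col];
    }
}

/* Real RQ flavor, side = 'R': Q = H(1)*H(2)*...*H(k).
   trans == 'N' computes C*Q   (H(1) applied first);
   trans == 'T' computes C*Q^T (H(k) applied first).
   Reflector i (0-based) is stored in v[i*n .. i*n+n-1]. */
static void apply_rq_q_right(char trans, int m, int n, int k, const double *v,
                             const double *tau, double *c)
{
    assert(trans == 'N' || trans == 'T');
    for (int step = 0; step < k; ++step) {
        int i = (trans == 'N') ? step : k - 1 - step;
        apply_reflector_right(m, n, c, v + i * n, tau[i]);
    }
}
```

With the 1-by-2 row C = (1, 2), v(1) = (1, 1), tau(1) = 1 and v(2) = (1, 0), tau(2) = 2, C*Q is (2, -1); applying trans = 'T' afterwards returns (1, 2).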

Input Parameters

side (global)
= 'L': apply Q or Q^T for real flavors (Q^H for complex flavors) from the left,

= 'R': apply Q or Q^T for real flavors (Q^H for complex flavors) from the
right.

trans (global)
= 'N': apply Q (no transpose)

= 'T': apply Q^T (transpose, for real flavors)

= 'C': apply Q^H (conjugate transpose, for complex flavors)

m (global)
The number of rows in the distributed matrix sub(C). m ≥ 0.

n (global)
The number of columns in the distributed matrix sub(C). n ≥ 0.

k (global)
The number of elementary reflectors whose product defines the matrix Q.
If side = 'L', m ≥ k ≥ 0;

if side = 'R', n ≥ k ≥ 0.

a (local)
Pointer into the local memory to an array of size
lld_a * LOCc(ja+m-1) if side='L',

lld_a * LOCc(ja+n-1) if side='R',

where lld_a ≥ max (1, LOCr(ia+k-1)).

On entry, the i-th row of the matrix stored in a must contain the vector that
defines the elementary reflector H(i), ia ≤ i ≤ ia+k-1, as returned by
p?gerqf in the k rows of its distributed matrix argument A(ia:ia+k-1,
ja:*).
The argument A(ia:ia+k-1, ja:*) is modified by the function but
restored on exit.

ia (global)
The row index in the global matrix A indicating the first row of sub(A).

ja (global)
The column index in the global matrix A indicating the first column of
sub(A).

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCc(ia+k-1). tau[j] contains the scalar factor of the
elementary reflector H(j+1), j = 0, 1, ..., LOCc(ia+k-1)-1, as returned by
p?gerqf. This array is tied to the distributed matrix A.

c (local)
Pointer into the local memory to an array of size lld_c * LOCc(jc+n-1). On
entry, the local pieces of the distributed matrix sub(C).

ic (global)
The row index in the global matrix C indicating the first row of sub(C).

jc (global)
The column index in the global matrix C indicating the first column of
sub(C).

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local)
Workspace array of size lwork.

lwork (local or global)

The size of the array work.
lwork is local input and must be at least
if side = 'L', lwork ≥ mpc0 + max(max(1, nqc0), numroc(numroc(m+iroffc, mb_a, 0, 0, nprow), mb_a, 0, 0, lcmp)),
if side = 'R', lwork ≥ nqc0 + max(1, mpc0),

where
lcmp = lcm/nprow,
lcm = ilcm(nprow, npcol),
iroffc = mod(ic-1, mb_c),
icoffc = mod(jc-1, nb_c),
icrow = indxg2p(ic, mb_c, myrow, rsrc_c, nprow),
iccol = indxg2p(jc, nb_c, mycol, csrc_c, npcol),
mpc0 = numroc(m+iroffc, mb_c, myrow, icrow, nprow),
nqc0 = numroc(n+icoffc, nb_c, mycol, iccol, npcol).
ilcm, indxg2p and numroc are ScaLAPACK tool functions; myrow, mycol,
nprow, and npcol can be determined by calling the function
blacs_gridinfo.

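If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.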

Output Parameters

c On exit, c is overwritten by Q*sub(C), or Q^T*sub(C) / Q^H*sub(C), or

sub(C)*Q, or sub(C)*Q^T / sub(C)*Q^H.

work On exit, work[0] returns the minimal and optimal lwork.

info (local)
= 0: successful exit
< 0: if the i-th argument is an array and its j-th entry (index j-1) had an
illegal value, then info = -(i*100 + j); if the i-th argument is a scalar
and had an illegal value, then info = -i.

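NOTE
The distributed submatrices A(ia:*, ja:*) and C(ic:ic+m-1, jc:jc+n-1) must satisfy the following
alignment conditions:
If side = 'L', (nb_a == mb_c) && (icoffa == iroffc).

If side = 'R', (nb_a == nb_c) && (icoffa == icoffc) && (iacol == iccol).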

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?pbtrsv
Solves a single triangular linear system via frontsolve
or backsolve where the triangular matrix is a factor of
a banded matrix computed by p?pbtrf.

Syntax
void pspbtrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *bw , MKL_INT *nrhs ,
float *a , MKL_INT *ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *descb ,
float *af , MKL_INT *laf , float *work , MKL_INT *lwork , MKL_INT *info );
void pdpbtrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *bw , MKL_INT *nrhs ,
double *a , MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *descb ,
double *af , MKL_INT *laf , double *work , MKL_INT *lwork , MKL_INT *info );
void pcpbtrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *bw , MKL_INT *nrhs ,
MKL_Complex8 *a , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib ,
MKL_INT *descb , MKL_Complex8 *af , MKL_INT *laf , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );

void pzpbtrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *bw , MKL_INT *nrhs ,
MKL_Complex16 *a , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib ,
MKL_INT *descb , MKL_Complex16 *af , MKL_INT *laf , MKL_Complex16 *work , MKL_INT
*lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?pbtrsv function solves a banded triangular system of linear equations

A(1:n, ja:ja+n-1)*X = B(ib:ib+n-1, 1:nrhs)

or
A(1:n, ja:ja+n-1)^T*X = B(ib:ib+n-1, 1:nrhs) for real flavors,

A(1:n, ja:ja+n-1)^H*X = B(ib:ib+n-1, 1:nrhs) for complex flavors,

where A(1:n, ja:ja+n-1) is a banded triangular matrix factor produced by the Cholesky factorization code
p?pbtrf and is stored in A(1:n, ja:ja+n-1) and af. The matrix stored in A(1:n, ja:ja+n-1) is either
upper or lower triangular according to uplo.

The function p?pbtrf must be called first.
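
p?pbtrsv distributes the solve across processes with a divide-and-conquer scheme (see the Application Notes). On a single process, the frontsolve with a lower-triangular factor reduces to forward substitution restricted to the band. The sketch below shows that serial kernel for the real case and one right-hand side; it assumes the LAPACK column-major band layout for a lower-triangular matrix (element L(i, j), 0-based, stored at ab[(i - j) + j*ldab] for j ≤ i ≤ min(n-1, j+bw)) and is an illustration, not the distributed algorithm.

```c
#include <assert.h>

/* Solves L*x = b in place, where L is an n-by-n lower-triangular band matrix
   with bw subdiagonals held in LAPACK band storage (column-major, leading
   dimension ldab >= bw+1): L(i, j) is ab[(i - j) + j*ldab] for
   j <= i <= min(n-1, j+bw), using 0-based i and j. */
static void band_lower_forward(int n, int bw, const double *ab, int ldab,
                               double *b)
{
    assert(bw >= 0 && ldab >= bw + 1);
    for (int j = 0; j < n; ++j) {
        b[j] /= ab[j * ldab];                      /* divide by L(j, j) */
        int last = (j + bw < n - 1) ? j + bw : n - 1;
        for (int i = j + 1; i <= last; ++i)        /* eliminate x[j] below */
            b[i] -= ab[(i - j) + j * ldab] * b[j];
    }
}
```

Only the bw entries below each diagonal element are touched, so the cost is proportional to n*bw rather than n^2, which is why the narrow-band case is the one worth specializing.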

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/
PerformanceIndex.
Notice revision #20201201

Input Parameters

uplo (global) Must be 'U' or 'L'.

If uplo = 'U', upper triangle of A(1:n, ja:ja+n-1) is stored;

If uplo = 'L', lower triangle of A(1:n, ja:ja+n-1) is stored.

trans (global) Must be 'N', 'T', or 'C'.

If trans = 'N', solve with A(1:n, ja:ja+n-1);

If trans = 'T' or 'C' for real flavors, solve with A(1:n, ja:ja+n-1)^T.

If trans = 'C' for complex flavors, solve with the conjugate transpose
A(1:n, ja:ja+n-1)^H.

n (global)
The number of rows and columns to be operated on, that is, the order of
the distributed submatrix A(1:n, ja:ja+n-1). n ≥ 0.

bw (global)
The number of subdiagonals in L or U; 0 ≤ bw ≤ n-1.

nrhs (global)
The number of right-hand sides, that is, the number of columns of the distributed
submatrix B(ib:ib+n-1, 1:nrhs); nrhs ≥ 0.

a (local)
Pointer into the local memory to an array with leading dimension lld_a ≥ (bw+1)
(stored in desca).
On entry, this array contains the local pieces of the Cholesky factor L or L^T
(L^H for complex flavors) of the n-by-n symmetric banded distributed matrix A(1:n, ja:ja+n-1).

This local portion is stored in the packed banded format used in LAPACK.
See the Application Notes below and the ScaLAPACK manual for more detail
on the format of distributed matrices.

ja (global) The index in the global matrix A that points to the
start of the matrix to be operated on (which may be either all of A or a
submatrix of A).

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
If 1D type (dtype_a = 501), then dlen_ ≥ 7;

If 2D type (dtype_a = 1), then dlen_ ≥ 9.

Contains information on mapping of A to memory (see the ScaLAPACK manual
for full description and options).

b (local)
Pointer into the local memory to an array with local leading dimension lld_b ≥ nb.

On entry, this array contains the local pieces of the right-hand sides
B(ib:ib+n-1, 1:nrhs).

ib (global) The row index in the global matrix B that points to the first row of
the matrix to be operated on (which may be either all of B or a submatrix of
B).

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.
If 1D type (dtype_b = 502), then dlen_ ≥ 7;

If 2D type (dtype_b = 1), then dlen_ ≥ 9.

Contains information on mapping of B to memory (see the ScaLAPACK manual
for full description and options).

laf (local)
The size of the user-input auxiliary fill-in space af. Must be laf ≥ (nb+2*bw)*bw.
If laf is not large enough, an error code is returned and
the minimum acceptable size is returned in af[0].

work (local)
The array work is a temporary workspace array of size lwork. This space
may be overwritten in between function calls.

lwork (local or global) The size of the user-input workspace work; it must be at
least lwork ≥ bw*nrhs. If lwork is too small, the minimal acceptable size
is returned in work[0] and an error code is returned.
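The minimum sizes of af and work listed above can be computed before allocation. The following C sketch is illustrative only: the helper names are hypothetical (not Intel MKL functions), `long` stands in for MKL_INT, and `grow_workspace` mirrors the documented behavior of reporting the minimum size in af[0] or work[0] after a failed call, so that a caller can reallocate and call again.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helpers; long stands in for MKL_INT. */

/* Minimum laf for p?pbtrsv: laf >= (nb + 2*bw)*bw. */
static long pbtrsv_min_laf(long nb, long bw) {
    assert(nb > 0 && bw >= 0);
    return (nb + 2 * bw) * bw;
}

/* Minimum lwork for p?pbtrsv: lwork >= bw*nrhs. */
static long pbtrsv_min_lwork(long bw, long nrhs) {
    assert(bw >= 0 && nrhs >= 0);
    return bw * nrhs;
}

/* Grow *buf so that it holds at least `required` doubles. After a failed
   call, `required` could be the minimum size reported in af[0] or work[0].
   Returns 0 on success, -1 if the allocation fails. */
static int grow_workspace(double **buf, long *len, long required) {
    if (*len >= required) return 0;          /* already large enough */
    double *p = realloc(*buf, (size_t)required * sizeof **buf);
    if (p == NULL) return -1;
    *buf = p;
    *len = required;
    return 0;
}
```

For example, with nb = 64, bw = 3 and nrhs = 2, the formulas give laf ≥ 210 and lwork ≥ 6.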

Output Parameters

af (local)
The array af is of size laf. It contains auxiliary fill-in space. The fill-in
space is created in a call to the factorization function p?pbtrf and is stored
in af. If a linear system is to be solved using p?pbtrs after the
factorization function, af must not be altered after the factorization.

b On exit, this array contains the local pieces of the distributed solution
matrix X.

work[0] On exit, work[0] contains the minimum value of lwork.

info (local)
= 0: successful exit
< 0: if the i-th argument is an array and the j-th entry, indexed j-1, had an
illegal value,
then info = - (i*100 +j),

if the i-th argument is a scalar and had an illegal value,

then info = -i.
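The negative info convention can be decoded mechanically. A minimal C sketch follows; `decode_info` and `info_error` are hypothetical names, and the sketch assumes the i*100 multiplier used here and that no routine has 100 or more arguments (some routines, such as p?trord, document a multiplier of 1000 instead).

```c
#include <assert.h>

/* Decoded form of a negative info value (hypothetical type). */
typedef struct {
    int arg;    /* 1-based position i of the offending argument */
    int entry;  /* 1-based entry j of an array argument (stored at index j-1); 0 for a scalar */
} info_error;

/* Decode info < 0 using info = -(i*100 + j) for array arguments and
   info = -i for scalar arguments. */
static info_error decode_info(int info) {
    assert(info < 0);
    int v = -info;
    info_error e;
    if (v >= 100) {          /* array argument: i*100 + j */
        e.arg = v / 100;
        e.entry = v % 100;
    } else {                 /* scalar argument: i */
        e.arg = v;
        e.entry = 0;
    }
    return e;
}
```

For instance, info = -806 would point at entry 6 (C index 5) of argument 8, while info = -3 points at the scalar argument 3.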

Application Notes
If the factorization function and the solve function are to be called separately to solve various sets of right-
hand sides using the same coefficient matrix, the auxiliary space af must not be altered between calls to the
factorization function and the solve function.
The best algorithm for solving banded and tridiagonal linear systems depends on a variety of parameters,
especially the bandwidth. Currently, only algorithms designed for the case N/P>>bw are implemented. These
algorithms go by many names, including Divide and Conquer, Partitioning, domain decomposition-type, etc.

The Divide and Conquer algorithm assumes the matrix is narrowly banded compared with the number of
equations. In this situation, it is best to distribute the input matrix A one-dimensionally, with columns atomic
and rows divided amongst the processes. The basic algorithm divides the banded matrix up into P pieces with
one stored on each processor, and then proceeds in 2 phases for the factorization or 3 for the solution of a
linear system.

1. Local Phase: The individual pieces are factored independently and in parallel. These factors are
applied to the matrix, which creates fill-in that is stored in a non-inspectable way in auxiliary space af.
Mathematically, this is equivalent to reordering the matrix A as P*A*P^T and then factoring the principal
leading submatrix of size equal to the sum of the sizes of the matrices factored on each processor. The
factors of these submatrices overwrite the corresponding parts of A in memory.
2. Reduced System Phase: A small system of order bw*(P-1) is formed representing the interaction of the
larger blocks and is stored (as are its factors) in the space af. A parallel Block Cyclic Reduction algorithm is
used. For a linear system, a parallel front solve followed by an analogous backsolve, both using the
structure of the factored matrix, are performed.
3. Back Substitution Phase: For a linear system, a local backsubstitution is performed on each processor
in parallel.
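To make the N/P >> bw condition concrete, the following C sketch compares the size of each local piece with the order bw*(P-1) of the reduced system. It assumes the rows are split into contiguous pieces of nb rows, one piece per process; the helper names are hypothetical and not part of Intel MKL.

```c
#include <assert.h>

/* Hypothetical helper: number of rows of an n-row band matrix held by
   process `rank` when the rows are split into contiguous pieces of nb. */
static int dc_local_rows(int n, int nb, int rank) {
    assert(n >= 0 && nb > 0 && rank >= 0);
    int start = rank * nb;
    if (start >= n) return 0;                 /* this process holds no rows */
    return (n - start < nb) ? n - start : nb; /* last piece may be shorter */
}

/* Order of the reduced (interface) system formed in phase 2: bw*(P-1). */
static int dc_reduced_order(int bw, int nprocs) {
    assert(bw >= 0 && nprocs >= 1);
    return bw * (nprocs - 1);
}
```

With n = 1000, nb = 250, P = 4 and bw = 2, each process factors 250 rows locally, while the reduced system has order 6, which is the narrow-band regime these algorithms target.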

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.


p?pttrsv
Solves a single triangular linear system via frontsolve
or backsolve where the triangular matrix is a factor of
a tridiagonal matrix computed by p?pttrf.

Syntax
void pspttrsv (char *uplo , MKL_INT *n , MKL_INT *nrhs , float *d , float *e , MKL_INT
*ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *descb , float *af , MKL_INT
*laf , float *work , MKL_INT *lwork , MKL_INT *info );
void pdpttrsv (char *uplo , MKL_INT *n , MKL_INT *nrhs , double *d , double *e , MKL_INT
*ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *descb , double *af , MKL_INT
*laf , double *work , MKL_INT *lwork , MKL_INT *info );
void pcpttrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *nrhs , float *d ,
MKL_Complex8 *e , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib ,
MKL_INT *descb , MKL_Complex8 *af , MKL_INT *laf , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzpttrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *nrhs , double *d ,
MKL_Complex16 *e , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib ,
MKL_INT *descb , MKL_Complex16 *af , MKL_INT *laf , MKL_Complex16 *work , MKL_INT
*lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?pttrsv function solves a tridiagonal triangular system of linear equations

A(1:n, ja:ja+n-1)*X = B(ib:ib+n-1, 1:nrhs)

or
A(1:n, ja:ja+n-1)^T*X = B(ib:ib+n-1, 1:nrhs) for real flavors,

A(1:n, ja:ja+n-1)^H*X = B(ib:ib+n-1, 1:nrhs) for complex flavors,

where A(1:n, ja:ja+n-1) is a tridiagonal triangular matrix factor produced by the Cholesky factorization
code p?pttrf and is stored in A(1:n, ja:ja+n-1) and af. The matrix stored in A(1:n, ja:ja+n-1) is
either upper or lower triangular according to uplo.

The function p?pttrf must be called first.
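The two triangular solves can be illustrated serially. This C sketch assumes that the factor is unit lower bidiagonal with its subdiagonal held in e, as in an L*D*L^T factorization; it ignores the reordering and the distribution that p?pttrf and p?pttrsv apply, and it does not perform the diagonal scaling by D. The function name is hypothetical.

```c
#include <assert.h>

/* Serial sketch: solve L*x = b (transpose == 0) or L^T*x = b
   (transpose != 0) in place, where L is unit lower bidiagonal with
   subdiagonal e[0..n-2]. On return, b holds x. */
static void unit_bidiag_solve(int n, const double *e, double *b, int transpose) {
    assert(n >= 0);
    if (!transpose) {
        /* forward solve: x[i] = b[i] - e[i-1]*x[i-1] */
        for (int i = 1; i < n; ++i) b[i] -= e[i - 1] * b[i - 1];
    } else {
        /* backward solve with L^T (superdiagonal e): x[i] = b[i] - e[i]*x[i+1] */
        for (int i = n - 2; i >= 0; --i) b[i] -= e[i] * b[i + 1];
    }
}
```

With e = {2, 3}, the right-hand side b = {1, 4, 9} equals L*x for x = {1, 2, 3}, so the forward solve recovers x; similarly b = {5, 11, 3} equals L^T*x.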

Input Parameters

uplo (global) Must be 'U' or 'L'.

If uplo = 'U', the upper triangle of A(1:n, ja:ja+n-1) is stored;

If uplo = 'L', the lower triangle of A(1:n, ja:ja+n-1) is stored.

trans (global) Must be 'N' or 'C'. This parameter is present only in the complex flavors.

If trans = 'N', solve with A(1:n, ja:ja+n-1);

If trans = 'C', solve with the conjugate transpose (A(1:n, ja:ja+n-1))^H.

n (global)
The number of rows and columns to be operated on, that is, the order of
the distributed submatrix A(1:n, ja:ja+n-1). n ≥ 0.

nrhs (global)
The number of right-hand sides; the number of columns of the distributed
submatrix B(ib:ib+n-1, 1:nrhs); nrhs ≥ 0.

d (local)
Pointer to the local part of the global vector storing the main diagonal of the
matrix; must be of size ≥nb_a.

e (local)
Pointer to the local part of the global vector e storing the upper diagonal of
the matrix; must be of size ≥ nb_a. Globally, e(n) is not referenced, and e
must be aligned with d.

ja (global) The index in the global matrix A that points to the start of the
matrix to be operated on (which may be either all of A or a submatrix of A).

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
If 1D type (dtype_a = 501 or 502), then dlen ≥ 7;

If 2D type (dtype_a = 1), then dlen ≥ 9.

Contains information on mapping of A to memory. See the ScaLAPACK manual
for a full description and options.

b (local)
Pointer into the local memory to an array with local leading dimension lld_b ≥ nb.

On entry, this array contains the local pieces of the right-hand sides
B(ib:ib+n-1, 1:nrhs).

ib (global) The row index in the global matrix B that points to the first row of
the matrix to be operated on (which may be either all of B or a submatrix of
B).

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.
If 1D type (dtype_b = 502), then dlen ≥ 7;

If 2D type (dtype_b = 1), then dlen ≥ 9.

Contains information on mapping of B to memory. See the ScaLAPACK manual
for a full description and options.

laf (local)
The size of the user-input auxiliary fill-in space af. Must be
laf ≥ (nb+2*bw)*bw, where bw = 1 for a tridiagonal matrix, that is, laf ≥ nb+2.
If laf is not large enough, an error code is returned and the minimum
acceptable size is returned in af[0].

work (local)


The array work is a temporary workspace array of size lwork. This space
may be overwritten in between function calls.

lwork (local or global) The size of the user-input workspace work; it must be at
least lwork ≥ (10+2*min(100, nrhs))*npcol+4*nrhs. If lwork is too
small, the minimal acceptable size is returned in work[0] and an error
code is returned.
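The workspace bounds for p?pttrsv can be evaluated the same way. This C sketch uses hypothetical helper names and `long` in place of MKL_INT; the laf bound assumes bw = 1 in the (nb+2*bw)*bw formula because the matrix is tridiagonal.

```c
#include <assert.h>

/* Minimum laf, assuming bw = 1 in (nb + 2*bw)*bw: laf >= nb + 2. */
static long pttrsv_min_laf(long nb) {
    assert(nb > 0);
    return nb + 2;
}

/* Minimum lwork: (10 + 2*min(100, nrhs))*npcol + 4*nrhs. */
static long pttrsv_min_lwork(long nrhs, long npcol) {
    assert(nrhs >= 0 && npcol > 0);
    long m = nrhs < 100 ? nrhs : 100;   /* min(100, nrhs) */
    return (10 + 2 * m) * npcol + 4 * nrhs;
}
```

Note that the min(100, nrhs) term caps the per-column part of the workspace, so only the 4*nrhs term keeps growing for very many right-hand sides.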

Output Parameters

d, e (local)
On exit, these arrays contain information on the factors of the matrix.

af (local)
The array af is of size laf. It contains auxiliary fill-in space. The fill-in
space is created in a call to the factorization function p?pttrf and is stored
in af. If a linear system is to be solved using p?pttrs after the
factorization function, af must not be altered after the factorization.

b On exit, this array contains the local pieces of the distributed solution
matrix X.

work[0] On exit, work[0] contains the minimum value of lwork.

info (local)
= 0: successful exit
< 0: if the i-th argument is an array and the j-th entry, indexed j-1, had an
illegal value,
then info = - (i*100 +j),

if the i-th argument is a scalar and had an illegal value,

then info = -i.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?potf2
Computes the Cholesky factorization of a symmetric/
Hermitian positive definite matrix (local unblocked
algorithm).

Syntax
void pspotf2 (char *uplo , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , MKL_INT *info );
void pdpotf2 (char *uplo , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , MKL_INT *info );
void pcpotf2 (char *uplo , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *info );
void pzpotf2 (char *uplo , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?potf2 function computes the Cholesky factorization of a real symmetric or complex Hermitian positive
definite distributed matrix sub(A) = A(ia:ia+n-1, ja:ja+n-1).

The factorization has the form

sub(A) = U'*U, if uplo = 'U', or sub(A) = L*L', if uplo = 'L',

where U is an upper triangular matrix and L is a lower triangular matrix. X' denotes the transpose of X for real flavors and the conjugate transpose of X for complex flavors.

Input Parameters

uplo (global)
Specifies whether the upper or lower triangular part of the symmetric/
Hermitian matrix A is stored.
= 'U': upper triangle of sub(A) is stored;

= 'L': lower triangle of sub(A) is stored.

n (global)
The number of rows and columns to be operated on, that is, the order of
the distributed matrix sub (A). n ≥ 0.

a (local)
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1)
containing the local pieces of the n-by-n symmetric/Hermitian distributed matrix
sub(A) to be factored.
If uplo = 'U', the leading n-by-n upper triangular part of sub(A) contains
the upper triangular part of the matrix, and the strictly lower triangular part
of sub(A) is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of sub(A) contains
the lower triangular part of the matrix, and the strictly upper triangular part
of sub(A) is not referenced.

ia, ja (global)
The row and column indices in the global matrix A indicating the first row
and the first column of the sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

Output Parameters

a (local)
On exit,
if uplo = 'U', the upper triangular part of the distributed matrix contains
the Cholesky factor U;

if uplo = 'L', the lower triangular part of the distributed matrix contains
the Cholesky factor L.


info (local)
= 0: successful exit
< 0: if the i-th argument is an array and the j-th entry, indexed j-1, had an
illegal value,
then info = - (i*100 +j),

if the i-th argument is a scalar and had an illegal value,

then info = -i.

> 0: if info = k, the leading minor of order k is not positive definite, and
the factorization could not be completed.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?rot
Applies a planar rotation to two distributed vectors.

Syntax
void psrot(MKL_INT* n, float* x, MKL_INT* ix, MKL_INT* jx, MKL_INT* descx, MKL_INT*
incx, float* y, MKL_INT* iy, MKL_INT* jy, MKL_INT* descy, MKL_INT* incy, float* cs,
float* sn, float* work, MKL_INT* lwork, MKL_INT* info);
void pdrot(MKL_INT* n, double* x, MKL_INT* ix, MKL_INT* jx, MKL_INT* descx, MKL_INT*
incx, double* y, MKL_INT* iy, MKL_INT* jy, MKL_INT* descy, MKL_INT* incy, double* cs,
double* sn, double* work, MKL_INT* lwork, MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
p?rot applies a planar rotation defined by cs and sn to the two distributed vectors sub(x) and sub(y).
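As a serial illustration of what happens to the local entries, the following C sketch applies the rotation element by element with unit stride. It assumes the same convention as the BLAS ?rot routines (x_i <- cs*x_i + sn*y_i, y_i <- cs*y_i - sn*x_i), which this section does not state explicitly; rot_serial is a hypothetical name, and the distribution, incx/incy handling, and workspace are omitted.

```c
#include <assert.h>
#include <math.h>

/* Serial sketch of a planar rotation on n entries with unit stride,
   assuming the BLAS ?rot convention. */
static void rot_serial(int n, double *x, double *y, double cs, double sn) {
    /* loose check of sn^2 + cs^2 = 1; exact equality is not expected
       in floating-point arithmetic */
    assert(fabs(cs * cs + sn * sn - 1.0) < 1e-6);
    for (int i = 0; i < n; ++i) {
        double xi = x[i], yi = y[i];   /* keep the old values */
        x[i] = cs * xi + sn * yi;
        y[i] = cs * yi - sn * xi;
    }
}
```

With cs = 0.6 and sn = 0.8, the pair (3, 4) is rotated to approximately (5, 0), so the Euclidean length of each (x_i, y_i) pair is preserved.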

Input Parameters

n (global)
The number of elements to operate on when applying the planar rotation to
x and y (n ≥ 0).

x (local) array of size ((jx-1)*m_x + ix + (n-1)*abs(incx))

This array contains the entries of the distributed vector sub( x ).

ix (global)
The global row index of the submatrix of the distributed matrix x to operate
on. If incx = 1, then it is required that ix = iy. 1 ≤ ix ≤ m_x.

jx (global)
The global column index of the submatrix of the distributed matrix x to
operate on. If incx = m_x, then it is required that jx = jy. 1 ≤ jx ≤ n_x.

descx (global and local) array of size 9

The array descriptor of the distributed matrix x.

incx (global)
The global increment for the elements of x. Only two values of incx are
supported in this version, namely 1 and m_x. Moreover, it must hold that
incx = m_x if incy = m_y and that incx = 1 if incy = 1.

y (local) array of size ((jy-1)*m_y + iy + (n-1)*abs(incy))

This array contains the entries of the distributed vector sub( y ).

iy (global)
The global row index of the submatrix of the distributed matrix y to operate
on. If incy = 1, then it is required that iy = ix. 1 ≤ iy ≤ m_y.

jy (global)
The global column index of the submatrix of the distributed matrix y to
operate on. If incy = m_y, then it is required that jy = jx. 1 ≤ jy ≤ n_y.

descy (global and local) array of size 9

The array descriptor of the distributed matrix y.

incy (global )
The global increment for the elements of y. Only two values of incy are
supported in this version, namely 1 and m_y. Moreover, it must hold that
incy = m_y if incx = m_x and that incy = 1 if incx = 1.

cs, sn (global)
The parameters defining the planar rotation. It must hold
that 0 ≤ cs, sn ≤ 1 and that sn^2 + cs^2 = 1. The latter condition can hardly
be verified exactly in finite-precision arithmetic.

work (local workspace) array of size lwork

lwork (local)
The length of the workspace array work.

If incx = 1 and incy = 1, then lwork = 2*m_x.

If lwork = -1, then a workspace query is assumed; the function only
calculates the optimal size of the work array, returns this value as the first
entry of the work array, and no error message related to lwork is
issued by pxerbla.

Output Parameters

x, y On exit, these arrays contain the rotated distributed vectors sub(x) and sub(y).

work[0] On exit, if info = 0, work[0] returns the optimal lwork.

info (global)
= 0: successful exit
< 0: if info = -i, the i-th argument had an illegal value.


If the i-th argument is an array and the j-th entry, indexed j-1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?rscl
Multiplies a vector by the reciprocal of a real scalar.

Syntax
void psrscl (MKL_INT *n , float *sa , float *sx , MKL_INT *ix , MKL_INT *jx , MKL_INT
*descx , MKL_INT *incx );
void pdrscl (MKL_INT *n , double *sa , double *sx , MKL_INT *ix , MKL_INT *jx , MKL_INT
*descx , MKL_INT *incx );
void pcsrscl (MKL_INT *n , float *sa , MKL_Complex8 *sx , MKL_INT *ix , MKL_INT *jx ,
MKL_INT *descx , MKL_INT *incx );
void pzdrscl (MKL_INT *n , double *sa , MKL_Complex16 *sx , MKL_INT *ix , MKL_INT *jx ,
MKL_INT *descx , MKL_INT *incx );

Include Files
• mkl_scalapack.h

Description
The p?rscl function multiplies an n-element real/complex vector sub(X) by the real scalar 1/a. This is done
without overflow or underflow as long as the final result sub(X)/a does not overflow or underflow.
sub(X) denotes X(ix:ix+n-1, jx:jx) if incx = 1, and X(ix:ix, jx:jx+n-1) if incx = m_x.
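The overflow-safe division can be sketched serially for real data. This C sketch follows the iterative scaling approach of the LAPACK ?rscl routines, using DBL_MIN as the safe minimum; the name rscl_serial is hypothetical, and sa is assumed to be nonzero, since a zero sa would lead to a division by zero.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Serial sketch: x <- x / sa without forming 1/sa directly, so that the
   result is correct whenever x/sa itself is representable. */
static void rscl_serial(int n, double sa, double *x) {
    assert(sa != 0.0);
    const double smlnum = DBL_MIN;        /* smallest normalized double */
    const double bignum = 1.0 / smlnum;
    double cden = sa, cnum = 1.0;
    int done = 0;
    while (!done) {
        double cden1 = cden * smlnum;
        double cnum1 = cnum / bignum;
        double mul;
        if (fabs(cden1) > fabs(cnum) && cnum != 0.0) {
            mul = smlnum;                 /* |sa| so large that 1/sa would underflow */
            cden = cden1;
        } else if (fabs(cnum1) > fabs(cden)) {
            mul = bignum;                 /* |sa| so small that 1/sa would overflow */
            cnum = cnum1;
        } else {
            mul = cnum / cden;            /* remaining factor is safe */
            done = 1;
        }
        for (int i = 0; i < n; ++i) x[i] *= mul;
    }
}
```

For example, with the subnormal divisor sa = 1e-310, the naive reciprocal 1/sa overflows to infinity, while the sketch still scales x = 1e-10 to approximately 1e300.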

Input Parameters

n (global)
The number of components of the distributed vector sub(X). n ≥ 0.

sa The scalar a that is used to divide each component of the vector sub(X).
This parameter must be ≥ 0.

sx Array containing the local pieces of a distributed matrix of size at least
((jx-1)*m_x + ix + (n-1)*abs(incx)). This array contains the entries
of the distributed vector sub(X).

ix (global) The row index of the submatrix of the distributed matrix X to
operate on.

jx (global)
The column index of the submatrix of the distributed matrix X to operate
on.

descx (global and local)
Array of size 9. The array descriptor for the distributed matrix X.

incx (global)
The increment for the elements of X. This version supports only two values
of incx, namely 1 and m_x.

Output Parameters

sx On exit, the result sub(X)/a.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?sygs2/p?hegs2
Reduces a symmetric/Hermitian positive-definite
generalized eigenproblem to standard form, using the
factorization results obtained from p?potrf (local
unblocked algorithm).

Syntax
void pssygs2 (MKL_INT *ibtype , char *uplo , MKL_INT *n , float *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb ,
MKL_INT *info );
void pdsygs2 (MKL_INT *ibtype , char *uplo , MKL_INT *n , double *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb ,
MKL_INT *info );
void pchegs2 (MKL_INT *ibtype , char *uplo , MKL_INT *n , MKL_Complex8 *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *jb ,
MKL_INT *descb , MKL_INT *info );
void pzhegs2 (MKL_INT *ibtype , char *uplo , MKL_INT *n , MKL_Complex16 *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *jb ,
MKL_INT *descb , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?sygs2/p?hegs2 function reduces a real symmetric-definite or a complex Hermitian positive-definite
generalized eigenproblem to standard form.
Here sub(A) denotes A(ia:ia+n-1, ja:ja+n-1), and sub(B) denotes B(ib:ib+n-1, jb:jb+n-1).

If ibtype = 1, the problem is

sub(A)*x = λ*sub(B)*x,

and sub(A) is overwritten by

inv(U^T)*sub(A)*inv(U) or inv(L)*sub(A)*inv(L^T) for real flavors, and
inv(U^H)*sub(A)*inv(U) or inv(L)*sub(A)*inv(L^H) for complex flavors.

If ibtype = 2 or 3, the problem is

sub(A)*sub(B)*x = λ*x or sub(B)*sub(A)*x = λ*x,

and sub(A) is overwritten by

U*sub(A)*U^T or L^T*sub(A)*L for real flavors, and
U*sub(A)*U^H or L^H*sub(A)*L for complex flavors.

The matrix sub(B) must have been previously factorized as U^T*U or L*L^T (for real flavors), or as U^H*U or
L*L^H (for complex flavors) by p?potrf.
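The ibtype = 1, uplo = 'L' transformation can be checked on small dense matrices. This serial C sketch (hypothetical function names, real data, full column-major storage with both triangles of A stored) forms inv(L)*A*inv(L^T) with two forward substitutions and a transpose; it does not reproduce the distributed, triangle-only algorithm of p?sygs2.

```c
#include <assert.h>

/* m <- inv(L)*m for an n-by-n column-major m, where L is the lower
   triangle of l (leading dimension n). Forward substitution per column. */
static void forward_solve_cols(int n, const double *l, double *m) {
    for (int c = 0; c < n; ++c)
        for (int i = 0; i < n; ++i) {
            double s = m[i + c * n];
            for (int k = 0; k < i; ++k) s -= l[i + k * n] * m[k + c * n];
            m[i + c * n] = s / l[i + i * n];
        }
}

/* a <- inv(L)*A*inv(L^T) for symmetric A stored in full. */
static void sygs2_type1_lower(int n, double *a, const double *l) {
    assert(n >= 0);
    forward_solve_cols(n, l, a);           /* a <- Y = inv(L)*A */
    for (int j = 0; j < n; ++j)            /* a <- Y^T = A*inv(L^T) */
        for (int i = j + 1; i < n; ++i) {
            double t = a[i + j * n];
            a[i + j * n] = a[j + i * n];
            a[j + i * n] = t;
        }
    forward_solve_cols(n, l, a);           /* a <- inv(L)*A*inv(L^T) */
}
```

If A equals B = L*L^T, the result is the identity, which is a quick sanity check: the transformed problem then has all eigenvalues equal to 1, as does the original problem sub(A)*x = λ*sub(B)*x.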

Input Parameters

ibtype (global)
= 1:
compute inv(U^T)*sub(A)*inv(U), or inv(L)*sub(A)*inv(L^T) for real
functions,
and inv(U^H)*sub(A)*inv(U), or inv(L)*sub(A)*inv(L^H) for complex
functions;
= 2 or 3:
compute U*sub(A)*U^T, or L^T*sub(A)*L for real functions,
and U*sub(A)*U^H, or L^H*sub(A)*L for complex functions.

uplo (global)
Specifies whether the upper or lower triangular part of the symmetric/
Hermitian matrix sub(A) is stored, and how sub(B) is factorized.
= 'U': the upper triangle of sub(A) is stored and sub(B) is factorized as U^T*U
(for real functions) or as U^H*U (for complex functions).
= 'L': the lower triangle of sub(A) is stored and sub(B) is factorized as L*L^T
(for real functions) or as L*L^H (for complex functions).

n (global)
The order of the matrices sub(A) and sub(B). n ≥ 0.

a (local)
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).

On entry, this array contains the local pieces of the n-by-n symmetric/
Hermitian distributed matrix sub(A).
If uplo = 'U', the leading n-by-n upper triangular part of sub(A) contains
the upper triangular part of the matrix, and the strictly lower triangular part
of sub(A) is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of sub(A) contains
the lower triangular part of the matrix, and the strictly upper triangular part
of sub(A) is not referenced.

ia, ja (global)
The row and column indices in the global matrix A indicating the first row
and the first column of the sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

b (local)
Pointer into the local memory to an array of size lld_b * LOCc(jb+n-1).

On entry, this array contains the local pieces of the triangular factor from
the Cholesky factorization of sub(B) as returned by p?potrf.

ib, jb (global)
The row and column indices in the global matrix B indicating the first row
and the first column of the sub(B), respectively.

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.

Output Parameters

a (local)
On exit, if info = 0, the transformed matrix is stored in the same format
as sub(A).

info
= 0: successful exit.

< 0: if the i-th argument is an array and the j-th entry, indexed j-1, had an
illegal value,
then info = - (i*100+ j),

if the i-th argument is a scalar and had an illegal value,

then info = -i.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?sytd2/p?hetd2
Reduces a symmetric/Hermitian matrix to real
symmetric tridiagonal form by an orthogonal/unitary
similarity transformation (local unblocked algorithm).

Syntax
void pssytd2 (char *uplo, MKL_INT *n, float *a, MKL_INT *ia, MKL_INT *ja, MKL_INT
*desca, float *d, float *e, float *tau, float *work, MKL_INT *lwork, MKL_INT *info);
void pdsytd2 (char *uplo, MKL_INT *n, double *a, MKL_INT *ia, MKL_INT *ja, MKL_INT
*desca, double *d, double *e, double *tau, double *work, MKL_INT *lwork, MKL_INT *info);
void pchetd2 (char *uplo, MKL_INT *n, MKL_Complex8 *a, MKL_INT *ia, MKL_INT *ja, MKL_INT
*desca, float *d, float *e, MKL_Complex8 *tau, MKL_Complex8 *work, MKL_INT *lwork,
MKL_INT *info);
void pzhetd2 (char *uplo, MKL_INT *n, MKL_Complex16 *a, MKL_INT *ia, MKL_INT *ja,
MKL_INT *desca, double *d, double *e, MKL_Complex16 *tau, MKL_Complex16 *work, MKL_INT
*lwork, MKL_INT *info);

Include Files
• mkl_scalapack.h

Description
The p?sytd2/p?hetd2 function reduces a real symmetric/complex Hermitian matrix sub(A) to symmetric/
Hermitian tridiagonal form T by an orthogonal/unitary similarity transformation:

Q'*sub(A)*Q = T, where sub(A) = A(ia:ia+n-1, ja:ja+n-1).

Input Parameters

uplo (global)
Specifies whether the upper or lower triangular part of the symmetric/
Hermitian matrix sub(A) is stored:
= 'U': upper triangular

= 'L': lower triangular

n (global)
The number of rows and columns to be operated on, that is, the order of
the distributed matrix sub(A). n ≥ 0.

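a (local)
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).
On entry, this array contains the local pieces of the n-by-n symmetric/Hermitian
distributed matrix sub(A); only the triangle specified by uplo is referenced.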

ia, ja (global)
The row and column indices in the global matrix A indicating the first row
and the first column of the sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

work (local)
The array work is a temporary workspace array of size lwork.

Output Parameters

a On exit, if uplo = 'U', the diagonal and first superdiagonal of sub(A) are
overwritten by the corresponding elements of the tridiagonal matrix T, and
the elements above the first superdiagonal, with the array tau, represent
the orthogonal/unitary matrix Q as a product of elementary reflectors;
if uplo = 'L', the diagonal and first subdiagonal of sub(A) are overwritten by
the corresponding elements of the tridiagonal matrix T, and the elements
below the first subdiagonal, with the array tau, represent the orthogonal/
unitary matrix Q as a product of elementary reflectors. See the Application
Notes below.

d (local)
Array of size LOCc(ja+n-1). The diagonal elements of the tridiagonal matrix
T:

d[i] = A(i+1,i+1), where i = 0, 1, ..., LOCc(ja+n-1)-1; d is tied to the
distributed matrix A.

e (local)
Array of size LOCc(ja+n-1) if uplo = 'U', LOCc(ja+n-2) otherwise.

The off-diagonal elements of the tridiagonal matrix T:

e[i] = A(i+1,i+2) if uplo = 'U',
e[i] = A(i+2,i+1) if uplo = 'L',
where i=0,1, ..., LOCc(ja+n-1) -1.

e is tied to the distributed matrix A.

tau (local)
Array of size LOCc(ja+n-1).

The scalar factors of the elementary reflectors. tau is tied to the distributed
matrix A.

work[0] On exit, work[0] returns the minimal and optimal value of lwork.

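lwork (local or global)

The size of the workspace array work.
lwork is local input and must be at least lwork ≥ 3*n.

If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimal and optimal size of the
work array, returns this value in work[0], and no error message related to
lwork is issued by pxerbla.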

info (local)
= 0: successful exit
< 0: if the i-th argument is an array and the j-th entry, indexed j-1, had an
illegal value,
then info = -(i*100+j),

if the i-th argument is a scalar and had an illegal value,

then info = -i.

Application Notes
If uplo = 'U', the matrix Q is represented as a product of elementary reflectors

Q = H(n-1)*...*H(2)*H(1)
Each H(i) has the form

H(i) = I - tau*v*v',
where tau is a real/complex scalar, and v is a real/complex vector with v(i+1:n) = 0 and v(i) = 1;
v(1:i-1) is stored on exit in A(ia:ia+i-2, ja+i), and tau in tau[ja+i-2].
If uplo = 'L', the matrix Q is represented as a product of elementary reflectors


Q = H(1)*H(2)*...*H(n-1).
Each H(i) has the form
H(i) = I - tau*v*v' ,
where tau is a real/complex scalar, and v is a real/complex vector with v(1:i) = 0 and v(i+1) = 1;
v(i+2:n) is stored on exit in A(ia+i+1:ia+n-1, ja+i-1), and tau in tau[ja+i-2].
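The contents of sub(A) on exit are illustrated by the following examples with n = 5:

if uplo = 'U':                       if uplo = 'L':

  (  d   e   v2  v3  v4 )              (  d                  )
  (      d   e   v3  v4 )              (  e   d              )
  (          d   e   v4 )              (  v1  e   d          )
  (              d   e  )              (  v1  v2  e   d      )
  (                  d  )              (  v1  v2  v3  e   d  )

where d and e denote the diagonal and off-diagonal elements of T, and vi denotes an element of the vector
defining H(i).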
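Each elementary reflector H(i) = I - tau*v*v' is applied without forming the matrix. The following C sketch shows this for the real case; apply_reflector is a hypothetical name. In the real case, H is orthogonal when tau = 2/(v'*v), so applying it twice restores the original vector.

```c
#include <assert.h>

/* Apply H = I - tau*v*v^T (real case) to x in place:
   x <- x - tau * v * (v^T x). Costs O(n) instead of forming H. */
static void apply_reflector(int n, const double *v, double tau, double *x) {
    assert(n >= 0);
    double dot = 0.0;
    for (int i = 0; i < n; ++i) dot += v[i] * x[i];   /* v^T x */
    for (int i = 0; i < n; ++i) x[i] -= tau * v[i] * dot;
}
```

With v = (1, 1) and tau = 1 = 2/(v'*v), the vector (1, 0) is mapped to (0, -1) and back again.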

NOTE
The distributed matrix sub(A) must verify some alignment properties, namely the following
expression should be true:
( mb_a==nb_a && iroffa==icoffa ) where iroffa = mod(ia-1, mb_a) and icoffa =
mod(ja-1, nb_a).
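The alignment condition above can be checked before calling the function. A minimal C sketch with a hypothetical helper name, where ia and ja are 1-based global indices as in the descriptor conventions:

```c
#include <assert.h>

/* Returns 1 if (mb_a == nb_a && iroffa == icoffa), where
   iroffa = mod(ia-1, mb_a) and icoffa = mod(ja-1, nb_a). */
static int sytd2_aligned(int ia, int ja, int mb_a, int nb_a) {
    assert(ia >= 1 && ja >= 1 && mb_a > 0 && nb_a > 0);
    int iroffa = (ia - 1) % mb_a;   /* row offset inside the first block */
    int icoffa = (ja - 1) % nb_a;   /* column offset inside the first block */
    return mb_a == nb_a && iroffa == icoffa;
}
```

For example, with square 4-by-4 blocks, ia = 5 and ja = 9 both start at a block boundary and are aligned, while ia = 2 and ja = 1 are not.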

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?trord
Reorders the Schur factorization of a general matrix.

Syntax
void pstrord( char* compq, MKL_INT* select, MKL_INT* para, MKL_INT* n, float* t,
MKL_INT* it, MKL_INT* jt, MKL_INT* desct, float* q, MKL_INT* iq, MKL_INT* jq, MKL_INT*
descq, float* wr, float* wi, MKL_INT* m, float* work, MKL_INT* lwork, MKL_INT* iwork,
MKL_INT* liwork, MKL_INT* info);
void pdtrord(char* compq, MKL_INT* select, MKL_INT* para, MKL_INT* n, double* t,
MKL_INT* it, MKL_INT* jt, MKL_INT* desct, double* q, MKL_INT* iq, MKL_INT* jq, MKL_INT*
descq, double* wr, double* wi, MKL_INT* m, double* work, MKL_INT* lwork, MKL_INT* iwork,
MKL_INT* liwork, MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
p?trord reorders the real Schur factorization of a real matrix A = Q*T*Q^T, so that a selected cluster of
eigenvalues appears in the leading diagonal blocks of the upper quasi-triangular matrix T, and the leading
columns of Q form an orthonormal basis of the corresponding right invariant subspace.

T must be in Schur form (as returned by p?lahqr), that is, block upper triangular with 1-by-1 and 2-by-2
diagonal blocks.
This function uses a delay-and-accumulate procedure for performing the off-diagonal updates.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.
Notice revision #20201201

Input Parameters

compq (global)
= 'V': update the matrix q of Schur vectors;

= 'N': do not update q.

select (global) array of size n

select specifies the eigenvalues in the selected cluster. To select a real
eigenvalue w(j), select[j-1] must be set to 1. To select a complex
conjugate pair of eigenvalues w(j) and w(j+1), corresponding to a 2-by-2
diagonal block, either select[j-1] or select[j] or both must be set to 1; a
complex conjugate pair of eigenvalues must be either both included in the
cluster or both excluded.

para (global)
Block parameters:

para[0] maximum number of concurrent computational windows allowed in the
algorithm; 0 < para[0] ≤ min(nprow, npcol) must hold;

para[1] number of eigenvalues in each window; 0 < para[1] < para[2] must hold;

para[2] window size; para[1] < para[2] < mb_t must hold;

para[3] minimal percentage of FLOPS required for performing matrix-matrix
multiplications instead of pipelined orthogonal transformations;
0 ≤ para[3] ≤ 100 must hold;

para[4] width of block column slabs for row-wise application of pipelined
orthogonal transformations in their factorized form; 0 < para[4] ≤ mb_t
must hold;

para[5] the maximum number of eigenvalues moved together over a process
border; in practice, this will be approximately half of the cross-border
window size; 0 < para[5] ≤ para[1] must hold.

n (global)


The order of the globally distributed matrix t. n≥ 0.

t (local) array of size lld_t * LOCc(n).

The local pieces of the global distributed upper quasi-triangular matrix T, in
Schur form.

it, jt (global)
The row and column indices in the global matrix T indicating the first row and
column of T. it = jt = 1 must hold (see Application Notes).

desct (global and local) array of size dlen_.

The array descriptor for the global distributed matrix T.
q (local) array of size lld_q * LOCc(n).

On entry, if compq = 'V', the local pieces of the global distributed matrix Q
of Schur vectors.
If compq = 'N', q is not referenced.

iq, jq (global)
The row and column indices in the global matrix Q indicating the first row and
column of Q. iq = jq = 1 must hold (see Application Notes).

descq (global and local) array of size dlen_.

The array descriptor for the global distributed matrix Q.

work (local workspace) array of size lwork

lwork (local)
The size of the array work.

If lwork = -1, then a workspace query is assumed; the function only
calculates the optimal size of the work array, returns this value as the first
entry of the work array, and no error message related to lwork is issued by
pxerbla.

iwork (local workspace) array of size liwork

liwork (local)
The size of the array iwork.

If liwork = -1, then a workspace query is assumed; the function only
calculates the optimal size of the iwork array, returns this value as the first
entry of the iwork array, and no error message related to liwork is issued
by pxerbla.
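The constraints on para listed above can be validated before calling p?trord. The following C sketch uses a hypothetical helper name and plain int in place of MKL_INT (which is 64-bit in ILP64 builds); nprow and npcol are the process grid dimensions and mb_t is the row block size of T. The example parameter values in the usage note are made up for illustration, not recommended settings.

```c
#include <assert.h>

/* Returns 1 if all documented constraints on para[0..5] hold. */
static int trord_para_ok(const int para[6], int nprow, int npcol, int mb_t) {
    assert(para != 0 && nprow > 0 && npcol > 0);
    int grid_min = nprow < npcol ? nprow : npcol;
    return 0 < para[0] && para[0] <= grid_min       /* concurrent windows */
        && 0 < para[1] && para[1] < para[2]          /* eigenvalues per window */
        && para[2] < mb_t                            /* window size */
        && 0 <= para[3] && para[3] <= 100            /* FLOPS percentage */
        && 0 < para[4] && para[4] <= mb_t            /* slab width */
        && 0 < para[5] && para[5] <= para[1];        /* moved across a border */
}
```

For instance, on a 2-by-2 grid with mb_t = 32, para = {1, 8, 16, 50, 4, 4} satisfies every constraint, whereas a window size para[2] = 32 would violate para[2] < mb_t.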

Output Parameters

select (global) array of size n

The (partial) reordering is displayed.

t On exit, t is overwritten by the local pieces of the reordered matrix T, again
in Schur form, with the selected eigenvalues in the globally leading diagonal
blocks.

q On exit, if compq = 'V', q has been postmultiplied by the global orthogonal
transformation matrix which reorders t; the leading m columns of q form an
orthonormal basis for the specified invariant subspace.
If compq = 'N', q is not referenced.

wr, wi (global) array of size n

The real and imaginary parts, respectively, of the reordered eigenvalues of
the matrix T. The eigenvalues are in principle stored in the same order as
on the diagonal of T, with wr[i] = T(i+1,i+1) and, if T(i:i+1,i:i+1) is a
2-by-2 diagonal block, wi[i-1] > 0 and wi[i] = -wi[i-1].

Note also that if a complex eigenvalue is sufficiently ill-conditioned, then its
value may differ significantly from its value before reordering.

m (global )
The size of the specified invariant subspace.
0 ≤m≤n.

work[0] On exit, if info = 0, work[0] returns the optimal lwork.

iwork[0] On exit, if info = 0, iwork[0] returns the optimal liwork.

info (global)
= 0: successful exit
< 0: if info = -i, the i-th argument had an illegal value. If the i-th argument is an array and its j-th entry (indexed j-1) had an illegal value, then info = -(i*1000+j); if the i-th argument is a scalar and had an illegal value, then info = -i.

> 0: one of the following occurred:

• Reordering of t failed because some eigenvalues are too close to separate (the problem is very ill-conditioned); t may have been partially reordered, and wr and wi contain the eigenvalues in the same order as in t. On exit, info is the index in t where the swap failed (indexing starts at 1).
• A 2-by-2 block to be reordered split into two 1-by-1 blocks, and the second block failed to swap with an adjacent block. On exit, info is the index in t where the swap failed.
• If info = n+1, there is no valid BLACS context (see the BLACS documentation for details).
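Because array and scalar arguments share a single negative info value, a caller may want to split it back into an argument position and an entry. The helper below is hypothetical (decode_info and bad_arg are illustrative names, and plain int stands in for MKL_INT); it assumes the offending entry index j is below 1000, which the encoding needs in order to stay unambiguous.

```c
#include <assert.h>

/* Hypothetical helper, not part of MKL: splits a negative info value into
   the 1-based argument position and, for array arguments, the 1-based entry.
   entry == 0 means the argument is a scalar. Assumes entries j < 1000. */
typedef struct {
    int arg;
    int entry;
} bad_arg;

bad_arg decode_info(int info)
{
    assert(info < 0);
    int code = -info;
    bad_arg r;
    if (code >= 1000) {            /* info = -(i*1000 + j) */
        r.arg = code / 1000;
        r.entry = code % 1000;
    } else {                       /* info = -i */
        r.arg = code;
        r.entry = 0;
    }
    return r;
}
```

For example, info = -9006 decodes to argument 9, entry 6, while info = -7 decodes to scalar argument 7.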

Application Notes
The following alignment requirements must hold:

• mb_t = nb_t = mb_q = nb_q
• rsrc_t = rsrc_q
• csrc_t = csrc_q
All matrices must be blocked by a block factor larger than or equal to two (3). This is to simplify reordering
across processor borders in the presence of 2-by-2 blocks.

This algorithm cannot work on submatrices of t and q, that is, it = jt = iq = jq = 1 must hold. This is, however, not a limitation, because p?lahqr does not compute Schur forms of submatrices anyway.

Parallel execution recommendations:

• Use a square grid, if possible, for maximum performance. The block parameters in para should be kept
well below the data distribution block size.
• In general, the parallel algorithm strives to perform as much work as possible without crossing the block
borders on the main block diagonal.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?trsen
Reorders the Schur factorization of a matrix and
(optionally) computes the reciprocal condition
numbers and invariant subspace for the selected
cluster of eigenvalues.

Syntax
void pstrsen(char* job, char* compq, MKL_INT* select, MKL_INT* para, MKL_INT* n, float*
t, MKL_INT* it, MKL_INT* jt, MKL_INT* desct, float* q, MKL_INT* iq, MKL_INT* jq,
MKL_INT* descq, float* wr, float* wi, MKL_INT* m, float* s, float* sep, float* work,
MKL_INT* lwork, MKL_INT* iwork, MKL_INT* liwork, MKL_INT* info);
void pdtrsen(char* job, char* compq, MKL_INT* select, MKL_INT* para, MKL_INT* n, double*
t, MKL_INT* it, MKL_INT* jt, MKL_INT* desct, double* q, MKL_INT* iq, MKL_INT* jq,
MKL_INT* descq, double* wr, double* wi, MKL_INT* m, double* s, double* sep, double*
work, MKL_INT* lwork, MKL_INT* iwork, MKL_INT* liwork, MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
p?trsen reorders the real Schur factorization of a real matrix $A = Q T Q^T$, so that a selected cluster of
eigenvalues appears in the leading diagonal blocks of the upper quasi-triangular matrix T, and the leading
columns of Q form an orthonormal basis of the corresponding right invariant subspace. The reordering is
performed by p?trord.

Optionally, the function computes the reciprocal condition numbers of the cluster of eigenvalues and/or the invariant subspace.
T must be in Schur form (as returned by p?lahqr), that is, block upper triangular with 1-by-1 and 2-by-2 diagonal blocks.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.
Notice revision #20201201

Input Parameters

job (global)

Specifies whether condition numbers are required for the cluster of
eigenvalues (s) or the invariant subspace (sep):

= 'N': no condition numbers are required;

= 'E': only the condition number for the cluster of eigenvalues is computed
(s);

= 'V': only the condition number for the invariant subspace is computed
(sep);

= 'B': condition numbers for both the cluster and the invariant subspace are
computed (s and sep).

compq (global)
= 'V': update the matrix q of Schur vectors;

= 'N': do not update q.

select (global) array of size n

select specifies the eigenvalues in the selected cluster. To select a real eigenvalue w(j), select[j-1] must be set to a non-zero number. To select a complex conjugate pair of eigenvalues w(j) and w(j+1), corresponding to a 2-by-2 diagonal block, either select[j-1] or select[j] or both must be set to a non-zero number; a complex conjugate pair of eigenvalues must be either both included in the cluster or both excluded.
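The pairing rule for select can be made explicit before the call. The sketch below is hypothetical (not an MKL routine, with int in place of MKL_INT): it takes wi from the current Schur form, marks both entries of a complex conjugate pair when either one is set, and returns the number of selected eigenvalues counting a pair as two, which is the value one would expect in m after a successful reordering.

```c
#include <assert.h>

/* Hypothetical helper, not part of MKL. wi[k] > 0 marks the first row of a
   2-by-2 diagonal block (0-based k); wi[k+1] < 0 is its partner.
   Returns the number of selected eigenvalues, counting a pair as two. */
int normalize_select(int n, const double *wi, int *select)
{
    assert(n >= 0);
    int m = 0;
    for (int k = 0; k < n; ++k) {
        if (wi[k] > 0.0 && k + 1 < n) {
            /* Complex conjugate pair: include both or neither. */
            int pick = (select[k] != 0) || (select[k + 1] != 0);
            select[k] = pick;
            select[k + 1] = pick;
            m += 2 * pick;
            ++k;                      /* skip the second row of the block */
        } else {
            m += (select[k] != 0);    /* real eigenvalue */
        }
    }
    return m;
}
```

With wi = {0, 1.5, -1.5, 0} and select = {0, 0, 1, 1}, the helper sets select to {0, 1, 1, 1} and returns 3.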

para (global)
Block parameters:

para[0] maximum number of concurrent computational windows allowed in the algorithm; 0 < para[0] ≤ min(NPROW,NPCOL) must hold;

para[1] number of eigenvalues in each window; 0 < para[1] < para[2] must hold;

para[2] window size; para[1] < para[2] < mb_t must hold;

para[3] minimal percentage of flops required for performing matrix-matrix multiplications instead of pipelined orthogonal transformations; 0 ≤ para[3] ≤ 100 must hold;

para[4] width of block column slabs for row-wise application of pipelined orthogonal transformations in their factorized form; 0 < para[4] ≤ mb_t must hold;

para[5] the maximum number of eigenvalues moved together over a process border; in practice, this will be approximately half of the cross-border window size; 0 < para[5] ≤ para[1] must hold.
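The six constraints on para are easy to check before calling the routine. The following is a hypothetical pre-flight check (para_ok is not an MKL routine); nprow and npcol are the process-grid dimensions and mb_t is the row block size of T, all supplied by the caller as plain int in place of MKL_INT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pre-flight check, not part of MKL: tests the p?trsen
   constraints on para[0..5] listed above. nprow and npcol are the process
   grid dimensions (NPROW, NPCOL); mb_t is the row block size of T. */
bool para_ok(const int para[6], int nprow, int npcol, int mb_t)
{
    assert(para != NULL);
    int minpq = nprow < npcol ? nprow : npcol;
    return para[0] > 0 && para[0] <= minpq        /* concurrent windows */
        && para[1] > 0 && para[1] < para[2]       /* eigenvalues per window */
        && para[2] < mb_t                         /* window size */
        && para[3] >= 0 && para[3] <= 100         /* flop percentage */
        && para[4] > 0 && para[4] <= mb_t         /* slab width */
        && para[5] > 0 && para[5] <= para[1];     /* eigenvalues moved across a border */
}
```

On a 2-by-2 grid with mb_t = 64, para = {1, 16, 32, 50, 32, 8} passes, while a window size of 64 (not below mb_t) fails.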

n (global)
The order of the globally distributed matrix t. n ≥ 0.

t (local) array of size lld_t * LOCc(n).

The local pieces of the global distributed upper quasi-triangular matrix T, in Schur form.

it, jt (global)
The row and column indices in the global matrix T indicating the first row and the first column of T, respectively. it = jt = 1 must hold (see Application Notes).

desct (global and local) array of size dlen_.

The array descriptor for the global distributed matrix T.

q (local) array of size lld_q * LOCc(n).

On entry, if compq = 'V', the local pieces of the global distributed matrix Q
of Schur vectors.
If compq = 'N', q is not referenced.

iq, jq (global)
The row and column indices in the global matrix Q indicating the first row and the first column of Q, respectively. iq = jq = 1 must hold (see Application Notes).

descq (global and local) array of size dlen_.

The array descriptor for the global distributed matrix Q.

work (local workspace) array of size lwork

lwork (local)
The size of the array work.

If lwork = -1, then a workspace query is assumed; the function only calculates the optimal size of the work array, returns this value as the first entry of the work array, and no error message related to lwork is issued by pxerbla.

iwork (local workspace) array of size liwork

liwork (local)
The size of the array iwork.

If liwork = -1, then a workspace query is assumed; the function only calculates the optimal size of the iwork array, returns this value as the first entry of the iwork array, and no error message related to liwork is issued by pxerbla.

Output Parameters

t On exit, t is overwritten by the local pieces of the reordered matrix T, again in Schur form, with the selected eigenvalues in the globally leading diagonal blocks.

q On exit, if compq = 'V', q has been postmultiplied by the global orthogonal transformation matrix which reorders t; the leading m columns of q form an orthonormal basis for the specified invariant subspace.

If compq = 'N', q is not referenced.

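wr, wi (global) array of size n

The real and imaginary parts, respectively, of the reordered eigenvalues of the matrix T. The eigenvalues are stored in the same order as on the diagonal of T, with wr[i-1] = T(i,i) and, if T(i:i+1,i:i+1) is a 2-by-2 diagonal block, wi[i-1] > 0 and wi[i] = -wi[i-1] (T indices are 1-based).

Note also that if a complex eigenvalue is sufficiently ill-conditioned, its value may differ significantly from its value before reordering.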

m (global)
The size of the specified invariant subspace. 0 ≤ m ≤ n.

s (global)
If job = 'E' or 'B', s is a lower bound on the reciprocal condition number for
the selected cluster of eigenvalues. s cannot underestimate the true
reciprocal condition number by more than a factor of sqrt(n). If m = 0 or n,
s = 1.
If job = 'N' or 'V', s is not referenced.

sep (global)
If job = 'V' or 'B', sep is the estimated reciprocal condition number of the specified invariant subspace. If m = 0 or n, sep = norm(t).
If job = 'N' or 'E', sep is not referenced.

work[0] On exit, if info = 0, work[0] returns the optimal lwork.

iwork[0] On exit, if info = 0, iwork[0] returns the optimal liwork.

info (global)
= 0: successful exit
< 0: if info = -i, the i-th argument had an illegal value. If the i-th argument is an array and its j-th entry (indexed j-1) had an illegal value, then info = -(i*1000+j); if the i-th argument is a scalar and had an illegal value, then info = -i.

> 0: one of the following occurred:

• Reordering of t failed because some eigenvalues are too close to separate (the problem is very ill-conditioned); t may have been partially reordered, and wr and wi contain the eigenvalues in the same order as in t.

Application Notes
The following alignment requirements must hold:

• mb_t = nb_t = mb_q = nb_q
• rsrc_t = rsrc_q
• csrc_t = csrc_q
All matrices must be blocked by a block factor larger than or equal to two (3). This is to simplify reordering across processor borders in the presence of 2-by-2 blocks.
This algorithm cannot work on submatrices of t and q, that is, it = jt = iq = jq = 1 must hold. This is, however, not a limitation, because p?lahqr does not compute Schur forms of submatrices anyway.

For parallel execution, use a square grid, if possible, for maximum performance. The block parameters in para should be kept well below the data distribution block size.
In general, the parallel algorithm strives to perform as much work as possible without crossing the block borders on the main block diagonal.
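The alignment requirements can likewise be checked from the array descriptors before calling p?trsen. The sketch below is hypothetical (trsen_layout_ok is not an MKL routine); it assumes the standard 9-entry ScaLAPACK dense-matrix descriptor layout (DTYPE_, CTXT_, M_, N_, MB_, NB_, RSRC_, CSRC_, LLD_ at C indices 0 to 8) and uses int where MKL uses MKL_INT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* 0-based positions in a ScaLAPACK dense-matrix descriptor (dlen_ = 9). */
enum { DTYPE_ = 0, CTXT_, M_, N_, MB_, NB_, RSRC_, CSRC_, LLD_, DLEN_ };

/* Hypothetical check, not part of MKL: mb_t = nb_t = mb_q = nb_q,
   rsrc_t = rsrc_q, csrc_t = csrc_q, a block factor of at least two, and
   it = jt = iq = jq = 1, as listed in the Application Notes. */
bool trsen_layout_ok(const int desct[DLEN_], const int descq[DLEN_],
                     int it, int jt, int iq, int jq)
{
    assert(desct != NULL && descq != NULL);
    return desct[MB_] == desct[NB_]
        && descq[MB_] == descq[NB_]
        && desct[MB_] == descq[MB_]
        && desct[RSRC_] == descq[RSRC_]
        && desct[CSRC_] == descq[CSRC_]
        && desct[MB_] >= 2
        && it == 1 && jt == 1 && iq == 1 && jq == 1;
}
```

Two identical 64-by-64-blocked descriptors pass; changing the block size of Q alone, or passing a submatrix offset such as jt = 2, makes the check fail.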

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?trti2
Computes the inverse of a triangular matrix (local
unblocked algorithm).

Syntax
void pstrti2(char *uplo, char *diag, MKL_INT *n, float *a, MKL_INT *ia, MKL_INT *ja,
MKL_INT *desca, MKL_INT *info);
void pdtrti2(char *uplo, char *diag, MKL_INT *n, double *a, MKL_INT *ia, MKL_INT *ja,
MKL_INT *desca, MKL_INT *info);
void pctrti2(char *uplo, char *diag, MKL_INT *n, MKL_Complex8 *a, MKL_INT *ia,
MKL_INT *ja, MKL_INT *desca, MKL_INT *info);
void pztrti2(char *uplo, char *diag, MKL_INT *n, MKL_Complex16 *a, MKL_INT *ia,
MKL_INT *ja, MKL_INT *desca, MKL_INT *info);

Include Files
• mkl_scalapack.h

Description
The p?trti2 function computes the inverse of a real/complex upper or lower triangular block matrix sub(A) = A(ia:ia+n-1, ja:ja+n-1).
This matrix should be contained in one and only one process memory space (local operation).
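Since the whole submatrix lives in one process, the computation amounts to the classic unblocked column-by-column inversion used by LAPACK's ?trti2. The following is a minimal real double-precision sketch under that assumption; trti2_sketch is a hypothetical name, it works on an ordinary column-major array rather than a distributed descriptor, and it does not test for a zero diagonal entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-process sketch, not the MKL p?trti2 routine: in-place
   unblocked inverse of an n-by-n triangular matrix, column-major, 0-based.
   uplo: 'U' or 'L'; diag: 'N' (non-unit) or 'U' (unit, diagonal not read). */
#define A_(i, j) a[(i) + (size_t)(j) * (size_t)lda]

void trti2_sketch(char uplo, char diag, int n, double *a, int lda)
{
    assert((uplo == 'U' || uplo == 'L') && (diag == 'N' || diag == 'U'));
    assert(n >= 0 && lda >= (n > 1 ? n : 1));
    int nounit = (diag == 'N');
    if (uplo == 'U') {
        /* Column j above the diagonal becomes -inv(U11) * u12 / ujj, where
           inv(U11) already occupies the leading j-by-j block. */
        for (int j = 0; j < n; ++j) {
            double ajj = -1.0;
            if (nounit) { A_(j, j) = 1.0 / A_(j, j); ajj = -A_(j, j); }
            for (int i = 0; i < j; ++i) {        /* x := inv(U11) * x, top down */
                double sum = (nounit ? A_(i, i) : 1.0) * A_(i, j);
                for (int k = i + 1; k < j; ++k) sum += A_(i, k) * A_(k, j);
                A_(i, j) = sum;
            }
            for (int i = 0; i < j; ++i) A_(i, j) *= ajj;
        }
    } else {
        /* Mirror image: sweep right to left, using inv(L22) below/right. */
        for (int j = n - 1; j >= 0; --j) {
            double ajj = -1.0;
            if (nounit) { A_(j, j) = 1.0 / A_(j, j); ajj = -A_(j, j); }
            for (int i = n - 1; i > j; --i) {    /* x := inv(L22) * x, bottom up */
                double sum = (nounit ? A_(i, i) : 1.0) * A_(i, j);
                for (int k = j + 1; k < i; ++k) sum += A_(i, k) * A_(k, j);
                A_(i, j) = sum;
            }
            for (int i = j + 1; i < n; ++i) A_(i, j) *= ajj;
        }
    }
}

#undef A_
```

Processing the inner loop top down (upper) or bottom up (lower) lets the triangular matrix-vector product overwrite the column in place without a temporary vector.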

Input Parameters

uplo (global)
Specifies whether the matrix sub(A) is upper or lower triangular.
= 'U': sub(A) is upper triangular

= 'L': sub(A) is lower triangular.

diag (global)

Specifies whether or not the matrix sub(A) is unit triangular.
= 'N': sub(A) is non-unit triangular

= 'U': sub(A) is unit triangular.

n (global)
The number of rows and columns to be operated on, i.e., the order of the
distributed submatrix sub(A). n ≥ 0.

a (local)
Pointer into the local memory to an array, size lld_a * LOCc(ja+n-1).

On entry, this array contains the local pieces of the triangular matrix
sub(A).
If uplo = 'U', the leading n-by-n upper triangular part of the matrix
sub(A) contains the upper triangular part of the matrix, and the strictly
lower triangular part of sub(A) is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of the matrix
sub(A) contains the lower triangular part of the matrix, and the strictly
upper triangular part of sub(A) is not referenced. If diag = 'U', the
diagonal elements of sub(A) are not referenced either and are assumed to
be 1.

ia, ja (global)
The row and column indices in the global matrix A indicating the first row
and the first column of the sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

Output Parameters

a On exit, the (triangular) inverse of the original matrix, in the same storage
format.

info = 0: successful exit

< 0: if the i-th argument is an array and its j-th entry (indexed j-1) had an illegal value, then info = -(i*100+j); if the i-th argument is a scalar and had an illegal value, then info = -i.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

?lahqr2
Updates the eigenvalues and Schur decomposition.

Syntax
void clahqr2 (const MKL_INT* wantt, const MKL_INT* wantz, const MKL_INT* n, const
MKL_INT* ilo, const MKL_INT* ihi, MKL_Complex8* h, const MKL_INT* ldh, MKL_Complex8* w,
const MKL_INT* iloz, const MKL_INT* ihiz, MKL_Complex8* z, const MKL_INT* ldz, MKL_INT*
info);
void zlahqr2 (const MKL_INT* wantt, const MKL_INT* wantz, const MKL_INT* n, const
MKL_INT* ilo, const MKL_INT* ihi, MKL_Complex16* h, const MKL_INT* ldh, MKL_Complex16*
w, const MKL_INT* iloz, const MKL_INT* ihiz, MKL_Complex16* z, const MKL_INT* ldz,
MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
?lahqr2 is an auxiliary routine called by ?hseqr to update the eigenvalues and Schur decomposition already computed by ?hseqr, by dealing with the Hessenberg submatrix in rows and columns ilo to ihi. This version of ?lahqr (not the standard LAPACK version) uses a double-shift algorithm (like LAPACK's ?lahqr). Unlike the standard LAPACK convention, it does not assume that the subdiagonal is real, nor does it attempt to preserve this property if given.

Input Parameters

wantt ≠ 0: the full Schur form T is required;

= 0: only eigenvalues are required.

wantz ≠ 0: the matrix of Schur vectors Z is required;

= 0: Schur vectors are not required.

n The order of the matrix H. n >= 0.

ilo, ihi It is assumed that the matrix H is upper triangular in rows and columns ihi+1:n, and that matrix element H(ilo,ilo-1) = 0 (unless ilo = 1). ?lahqr2 works primarily with the Hessenberg submatrix in rows and columns ilo to ihi, but applies transformations to all of h if wantt is nonzero.
1 <= ilo <= max(1,ihi); ihi <= n.

h Array, size ldh*n.

On entry, the upper Hessenberg matrix H.

ldh The leading dimension of the array h. ldh >= max(1,n).

iloz, ihiz Specify the rows of Z to which transformations must be applied if wantz ≠ 0.

1 <= iloz <= ilo; ihi <= ihiz <= n.

z Array, size ldz*n.

If wantz ≠ 0, on entry z must contain the current matrix Z of transformations. If wantz = 0, z is not referenced.

ldz The leading dimension of the array z. ldz >= max(1,n).

Output Parameters

h On exit, if wantt ≠ 0, h is upper triangular in rows and columns ilo:ihi. If wantt = 0, the contents of h are unspecified on exit.

w Array, size n.

The computed eigenvalues ilo to ihi are stored in the corresponding elements of w. If wantt ≠ 0, the eigenvalues are stored in the same order as on the diagonal of the Schur form returned in h, with w[i-1] = H(i,i) (H indices are 1-based).

z If wantz ≠ 0, on exit z has been updated; transformations are applied only to the submatrix Z(iloz:ihiz,ilo:ihi). If wantz = 0, z is not referenced.

info = 0: successful exit

> 0: if info = i, ?lahqr2 failed to compute all the eigenvalues ilo to ihi in a total of 30*(ihi-ilo+1) iterations; elements w[i] through w[ihi-1] contain those eigenvalues that have been successfully computed.

?lamsh
Sends multiple shifts through a small (single node)
matrix to maximize the number of bulges that can be
sent through.

Syntax
void slamsh(float *s, const MKL_INT *lds, MKL_INT *nbulge, const MKL_INT *jblk, float
*h, const MKL_INT *ldh, const MKL_INT *n, const float *ulp);
void dlamsh(double *s, const MKL_INT *lds, MKL_INT *nbulge, const MKL_INT *jblk,
double *h, const MKL_INT *ldh, const MKL_INT *n, const double *ulp);
void clamsh(MKL_Complex8 *s, const MKL_INT *lds, MKL_INT *nbulge, const MKL_INT
*jblk, MKL_Complex8 *h, const MKL_INT *ldh, const MKL_INT *n, const float *ulp);
void zlamsh(MKL_Complex16 *s, const MKL_INT *lds, MKL_INT *nbulge, const MKL_INT
*jblk, MKL_Complex16 *h, const MKL_INT *ldh, const MKL_INT *n, const double *ulp);

Include Files
• mkl_scalapack.h

Description
The ?lamsh function sends multiple shifts through a small (single node) matrix to see how small consecutive
subdiagonal elements are modified by subsequent shifts in an effort to maximize the number of bulges that
can be sent through. The function should only be called when there are multiple shifts/bulges (nbulge > 1)
and the first shift is starting in the middle of an unreduced Hessenberg matrix because of two or more small
consecutive subdiagonal elements.

Input Parameters

s (local)
Array of size lds*2*jblk.

On entry, the matrix of shifts. Only the 2x2 diagonal of s is referenced. It is assumed that s has jblk double shifts (size 2).

lds (local)
On entry, the leading dimension of S; unchanged on exit. 1 < nbulge ≤ jblk ≤ lds/2.

nbulge (local)
On entry, the number of bulges to send through h (> 1). nbulge should be less than the maximum determined (jblk). 1 < nbulge ≤ jblk ≤ lds/2.

jblk (local)
On entry, the number of double shifts determined for S; unchanged on exit.

h (local)
Array of size ldh*n.

On entry, the local matrix to apply the shifts on.

h should be aligned so that the starting row is 2.

ldh (local)

On entry, the leading dimension of H; unchanged on exit.

n (local)
On entry, the size of H. If all the bulges are expected to go through, n should be at least 4*nbulge+2. Otherwise, nbulge may be reduced by this function.

ulp (local)
On entry, machine precision. Unchanged on exit.

Output Parameters

s On exit, the data is rearranged in the best order for applying.

nbulge On exit, the maximum number of bulges that can be sent through.

h On exit, the data is destroyed.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

?lapst
Sorts the numbers in increasing or decreasing order.

Syntax
void slapst (const char* id, const MKL_INT* n, const float* d, MKL_INT* indx, MKL_INT*
info);

void dlapst (const char* id, const MKL_INT* n, const double* d, MKL_INT* indx, MKL_INT*
info);

Include Files
• mkl_scalapack.h

Description
?lapst is a modified version of the LAPACK routine ?lasrt.
It defines a permutation indx that sorts the numbers in d in increasing order (if id = 'I') or in decreasing order (if id = 'D').

It uses Quick Sort, reverting to Insertion Sort on arrays of size <= 20. The dimension of the internal STACK array limits n to about $2^{32}$.
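The sorting strategy described above (Quick Sort with an Insertion Sort cutoff of 20, producing a permutation rather than moving the data) can be sketched as follows. This is a hypothetical double-precision illustration, not the ?lapst source: lapst_sketch returns 0-based indices, recurses into the smaller partition instead of managing an explicit stack (which keeps the recursion depth logarithmic in n), and assumes d contains no NaNs.

```c
#include <assert.h>

/* Hypothetical sketch, not the MKL ?lapst routine: builds a 0-based
   permutation indx so that d[indx[0]], d[indx[1]], ... is increasing
   (id == 'I') or decreasing (id == 'D'). Assumes d contains no NaNs. */
static int precedes(double a, double b, char id)
{
    return id == 'D' ? a > b : a < b;
}

static void insertion_sort(const double *d, int *indx, int lo, int hi, char id)
{
    for (int i = lo + 1; i <= hi; ++i) {
        int t = indx[i], j = i - 1;
        while (j >= lo && precedes(d[t], d[indx[j]], id)) {
            indx[j + 1] = indx[j];
            --j;
        }
        indx[j + 1] = t;
    }
}

static void quick_sort(const double *d, int *indx, int lo, int hi, char id)
{
    while (hi - lo + 1 > 20) {            /* ranges of <= 20: insertion sort */
        double pv = d[indx[lo + (hi - lo) / 2]];
        int i = lo, j = hi;
        while (i <= j) {                  /* Hoare-style partition on values */
            while (precedes(d[indx[i]], pv, id)) ++i;
            while (precedes(pv, d[indx[j]], id)) --j;
            if (i <= j) {
                int tmp = indx[i]; indx[i] = indx[j]; indx[j] = tmp;
                ++i; --j;
            }
        }
        /* Recurse into the smaller part, loop on the larger one. */
        if (j - lo < hi - i) { quick_sort(d, indx, lo, j, id); lo = i; }
        else                 { quick_sort(d, indx, i, hi, id); hi = j; }
    }
    insertion_sort(d, indx, lo, hi, id);
}

void lapst_sketch(char id, int n, const double *d, int *indx)
{
    assert((id == 'I' || id == 'D') && n >= 0);
    for (int k = 0; k < n; ++k) indx[k] = k;
    if (n > 1) quick_sort(d, indx, 0, n - 1, id);
}
```

For d = {3, 1, 2}, id = 'I' gives indx = {1, 2, 0} and id = 'D' gives indx = {0, 2, 1}; d itself is never modified.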

Input Parameters

id = 'I': sort d in increasing order;

= 'D': sort d in decreasing order.

n The length of the array d.

d Array, size (n)

The array to be sorted.

Output Parameters

indx Array, size (n).

The permutation which sorts the array d.

info = 0: successful exit

< 0: if info = -i, the i-th argument had an illegal value.

?laqr6
Performs a single small-bulge multi-shift QR sweep
collecting the transformations.

Syntax
void slaqr6(char* job, MKL_INT* wantt, MKL_INT* wantz, MKL_INT* kacc22, MKL_INT* n,
MKL_INT* ktop, MKL_INT* kbot, MKL_INT* nshfts, float* sr, float* si, float* h, MKL_INT*
ldh, MKL_INT* iloz, MKL_INT* ihiz, float* z, MKL_INT* ldz, float* v, MKL_INT* ldv,
float* u, MKL_INT* ldu, MKL_INT* nv, float* wv, MKL_INT* ldwv, MKL_INT* nh, float* wh,
MKL_INT* ldwh);
void dlaqr6(char* job, MKL_INT* wantt, MKL_INT* wantz, MKL_INT* kacc22, MKL_INT* n,
MKL_INT* ktop, MKL_INT* kbot, MKL_INT* nshfts, double* sr, double* si, double* h,
MKL_INT* ldh, MKL_INT* iloz, MKL_INT* ihiz, double* z, MKL_INT* ldz, double* v, MKL_INT*
ldv, double* u, MKL_INT* ldu, MKL_INT* nv, double* wv, MKL_INT* ldwv, MKL_INT* nh,
double* wh, MKL_INT* ldwh);

Include Files
• mkl_scalapack.h

Description
This auxiliary function performs a single small-bulge multi-shift QR sweep, moving the chain of bulges from
top to bottom in the submatrix H(ktop:kbot,ktop:kbot), collecting the transformations in the matrix V or
accumulating the transformations in the matrix Z (see below).
This is a modified version of ?laqr5 from LAPACK 3.1.

Input Parameters

job Set the kind of job to do in ?laqr6, as follows:

job = 'I': Introduce and chase bulges in the submatrix.
job = 'C': Chase bulges from top to bottom of the submatrix.
job = 'O': Chase bulges off the submatrix.

wantt wantt is non-zero if the quasi-triangular Schur factor is being computed; wantt is set to zero otherwise.

wantz wantz is non-zero if the orthogonal Schur factor is being computed; wantz is set to zero otherwise.

kacc22 Specifies the computation mode of far-from-diagonal orthogonal updates.

= 0: ?laqr6 does not accumulate reflections and does not use matrix-
matrix multiply to update far-from-diagonal matrix entries.
= 1: ?laqr6 accumulates reflections and uses matrix-matrix multiply to
update the far-from-diagonal matrix entries.
= 2: ?laqr6 accumulates reflections, uses matrix-matrix multiply to update
the far-from-diagonal matrix entries, and takes advantage of 2-by-2 block
structure during matrix multiplies.

n n is the order of the Hessenberg matrix H upon which this function operates.

ktop, kbot These are the first and last rows and columns of an isolated diagonal block upon which the QR sweep is to be applied. It is assumed without a check that either ktop = 1 or H(ktop,ktop-1) = 0, and either kbot = n or H(kbot+1,kbot) = 0.

nshfts nshfts gives the number of simultaneous shifts. nshfts must be positive
and even.

sr, si Array of size nshfts

sr contains the real parts and si contains the imaginary parts of the
nshfts shifts of origin that define the multi-shift QR sweep.

h Array of size ldh * n

On input h contains a Hessenberg matrix H.

ldh ldh is the leading dimension of H just as declared in the calling function.
ldh ≥ max(1,n).

iloz, ihiz Specify the rows of the matrix Z to which transformations must be applied if wantz is non-zero. 1 ≤ iloz ≤ ihiz ≤ n.

z Array of size ldz * ktop

If wantz is non-zero, then the QR sweep orthogonal similarity transformation is accumulated into the matrix Z(iloz:ihiz,ktop:kbot), stored in the array z, from the right.

If wantz equals zero, then z is unreferenced.

ldz ldz is the leading dimension of z just as declared in the calling function. ldz ≥ n.

v (workspace) array of size ldv * nshfts/2

ldv ldv is the leading dimension of v as declared in the calling function. ldv ≥ 3.

u (workspace) array of size ldu * (3*nshfts-3)

ldu ldu is the leading dimension of u just as declared in the calling function.
ldu ≥ 3*nshfts-3.

nh nh is the number of columns in array wh available for workspace. nh ≥ 1 is required for usage of this workspace; otherwise, the far-from-diagonal elements are updated without Level 3 BLAS.

wh (workspace) array of size ldwh * nh

ldwh Leading dimension of wh just as declared in the calling function.
ldwh ≥ 3*nshfts-3.

nv nv is the number of rows in wv available for workspace. nv ≥ 1 is required for usage of this workspace; otherwise, the far-from-diagonal elements are updated without Level 3 BLAS.

wv (workspace) array of size ldwv * 3*nshfts

ldwv scalar
ldwv is the leading dimension of wv as declared in the calling function. ldwv ≥ nv.
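The dimension rules for the ?laqr6 workspace arrays translate directly into allocation sizes. This helper is hypothetical (laqr6_workspace and laqr6_ws are not part of MKL, and int stands in for MKL_INT); nh and nv are whatever workspace widths the caller chooses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of MKL: minimum leading dimensions and
   element counts for the ?laqr6 workspace arrays, following the rules
   above, for a positive even nshfts and caller-chosen nh, nv >= 1. */
typedef struct {
    int ldv, ldu, ldwh, ldwv;               /* minimum leading dimensions */
    size_t v_len, u_len, wh_len, wv_len;    /* elements to allocate */
} laqr6_ws;

laqr6_ws laqr6_workspace(int nshfts, int nh, int nv)
{
    assert(nshfts > 0 && nshfts % 2 == 0 && nh >= 1 && nv >= 1);
    int k = 3 * nshfts - 3;
    laqr6_ws w;
    w.ldv  = 3;                             /* ldv >= 3 */
    w.ldu  = k;                             /* ldu >= 3*nshfts-3 */
    w.ldwh = k;                             /* ldwh >= 3*nshfts-3 */
    w.ldwv = nv;                            /* ldwv >= nv */
    w.v_len  = (size_t)w.ldv * (size_t)(nshfts / 2);
    w.u_len  = (size_t)w.ldu * (size_t)k;
    w.wh_len = (size_t)w.ldwh * (size_t)nh;
    w.wv_len = (size_t)w.ldwv * (size_t)(3 * nshfts);
    return w;
}
```

For nshfts = 4, nh = 16 and nv = 8, this gives ldu = ldwh = 9 and arrays of 6, 81, 144, and 96 elements for v, u, wh, and wv.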

Output Parameters

h A multi-shift QR sweep with shifts sr[j]+i*si[j] is applied to the isolated diagonal block in matrix rows and columns ktop through kbot.

z If wantz is non-zero, then the QR sweep orthogonal/unitary similarity transformation is accumulated into the matrix Z(iloz:ihiz,ktop:kbot) from the right.
If wantz equals zero, then z is unreferenced.

Application Notes
Based on contributions by Karen Braman and Ralph Byers, Department of Mathematics, University of Kansas, USA, and Robert Granat, Department of Computing Science and HPC2N, Umea University, Sweden.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

?lar1va
Computes scaled eigenvector corresponding to given
eigenvalue.

Syntax
void slar1va(MKL_INT* n, MKL_INT* b1, MKL_INT* bn, float* lambda, float* d, float* l,
float* ld, float* lld, float* pivmin, float* gaptol, float* z, MKL_INT* wantnc, MKL_INT*
negcnt, float* ztz, float* mingma, MKL_INT* r, MKL_INT* isuppz, float* nrminv, float*
resid, float* rqcorr, float* work);
void dlar1va(MKL_INT* n, MKL_INT* b1, MKL_INT* bn, double* lambda, double* d, double* l,
double* ld, double* lld, double* pivmin, double* gaptol, double* z, MKL_INT* wantnc,
MKL_INT* negcnt, double* ztz, double* mingma, MKL_INT* r, MKL_INT* isuppz, double*
nrminv, double* resid, double* rqcorr, double* work);

Include Files
• mkl_scalapack.h

Description
?lar1va computes the (scaled) r-th column of the inverse of the submatrix in rows b1 through bn of the tridiagonal matrix $LDL^T - \lambda I$. When λ is close to an eigenvalue, the computed vector is an accurate eigenvector. Usually, r corresponds to the index where the eigenvector is largest in magnitude. The following steps accomplish this computation:

1. Stationary qd transform, $LDL^T - \lambda I = L_+ D_+ L_+^T$,
2. Progressive qd transform, $LDL^T - \lambda I = U_- D_- U_-^T$,
3. Computation of the diagonal elements of the inverse of $LDL^T - \lambda I$ by combining the above transforms, and choosing r as the index where the diagonal of the inverse is (one of the) largest in magnitude.
4. Computation of the (scaled) r-th column of the inverse using the twisted factorization obtained by combining the top part of the stationary and the bottom part of the progressive transform.
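Steps 1 to 4 can be traced in a compact double-precision sketch. It is hypothetical (twisted_vector is not the ?lar1va implementation): it works on the full range b1 = 1, bn = n with 0-based C indexing, computes every diagonal element of the inverse instead of a restricted range, and omits the pivmin safeguards, negcnt, and the Rayleigh quotient correction. The quantity gamma[k] = s[k] + p[k] + λ is the reciprocal of the k-th diagonal element of $(LDL^T - \lambda I)^{-1}$.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* Hypothetical sketch, not the MKL ?lar1va routine. Full range b1 = 1,
   bn = n; 0-based C indexing. d[0..n-1] is the diagonal of D, l[0..n-2]
   the subdiagonal of the unit bidiagonal L. Writes z with z[r] == 1 and
   returns the 0-based twist index r, or -1 if allocation fails.
   No safeguards against zero pivots (pivmin handling is omitted). */
int twisted_vector(int n, const double *d, const double *l,
                   double lambda, double *z)
{
    assert(n >= 1);
    size_t bytes = sizeof(double) * (size_t)n;
    double *lplus = malloc(bytes);  /* multipliers of L+ */
    double *umin  = malloc(bytes);  /* multipliers of U- */
    double *s     = malloc(bytes);  /* s[k] = D+[k] - d[k] */
    double *p     = malloc(bytes);  /* p[k] = D-[k] - d[k-1]*l[k-1]^2 */
    if (!lplus || !umin || !s || !p) {
        free(lplus); free(umin); free(s); free(p);
        return -1;
    }

    /* Step 1: stationary qd transform, top to bottom. */
    s[0] = -lambda;
    for (int i = 0; i + 1 < n; ++i) {
        double dplus = d[i] + s[i];                     /* D+[i] */
        lplus[i] = l[i] * d[i] / dplus;
        s[i + 1] = lplus[i] * l[i] * s[i] - lambda;
    }

    /* Step 2: progressive qd transform, bottom to top. */
    p[n - 1] = d[n - 1] - lambda;
    for (int i = n - 2; i >= 0; --i) {
        double dminus = d[i] * l[i] * l[i] + p[i + 1];  /* D-[i+1] */
        double t = d[i] / dminus;
        umin[i] = l[i] * t;
        p[i] = p[i + 1] * t - lambda;
    }

    /* Step 3: the twist index r minimizes |gamma[k]|, i.e. maximizes the
       magnitude of the k-th diagonal entry of the inverse. */
    int r = 0;
    double best = fabs(s[0] + p[0] + lambda);
    for (int k = 1; k < n; ++k) {
        double g = fabs(s[k] + p[k] + lambda);
        if (g < best) { best = g; r = k; }
    }

    /* Step 4: solve with the twisted factor, z[r] = 1; L+ above r, U- below. */
    z[r] = 1.0;
    for (int i = r - 1; i >= 0; --i) z[i] = -lplus[i] * z[i + 1];
    for (int i = r + 1; i < n; ++i)  z[i] = -umin[i - 1] * z[i - 1];

    free(lplus); free(umin); free(s); free(p);
    return r;
}
```

For the tridiagonal matrix with 2 on the diagonal and 1 off the diagonal (d = {2, 1.5, 4/3}, l = {0.5, 2/3}) and a shift slightly above the eigenvalue 2 - sqrt(2), the sketch picks r = 1 and returns approximately z = (-0.7071, 1, -0.7071).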

Input Parameters

n The order of the matrix $LDL^T$.

b1 First index of the submatrix of $LDL^T$.

bn Last index of the submatrix of $LDL^T$.

lambda The shift λ. In order to compute an accurate eigenvector, lambda should be a good approximation to an eigenvalue of $LDL^T$.

l Array of size n-1

The (n-1) subdiagonal elements of the unit bidiagonal matrix L, in elements 0 to n-2.

d Array of size n

The n diagonal elements of the diagonal matrix D.

ld Array of size n-1

The n-1 elements l[i]*d[i], i=0,...,n-2.

lld Array of size n-1

The n-1 elements l[i]*l[i]*d[i], i=0,...,n-2.

pivmin The minimum pivot in the Sturm sequence.

gaptol Tolerance that indicates when eigenvector entries are negligible with respect
to their contribution to the residual.

z Array of size n

On input, all entries of z must be set to 0.

wantnc Specifies whether negcnt has to be computed.

r The twist index for the twisted factorization used to compute z.

On input, 0 ≤ r ≤ n. If r is input as 0, r is set to the index where $(LDL^T - \lambda I)^{-1}$ is largest in magnitude. If 1 ≤ r ≤ n, r is unchanged.

Ideally, r designates the position of the maximum entry in the eigenvector.

work (Workspace) array of size 4*n

Output Parameters

z On output, z contains the (scaled) r-th column of the inverse. The scaling is
such that z[r-1] equals 1.

negcnt If wantnc is non-zero, then negcnt = the number of pivots < pivmin in the matrix factorization $LDL^T$, and negcnt = -1 otherwise.

ztz The square of the 2-norm of z.

mingma The reciprocal of the largest (in magnitude) diagonal element of the inverse of $LDL^T - \lambda I$.

r On output, r contains the twist index used to compute z.

isuppz array of size 2

The support of the vector in z, i.e., the vector z is non-zero only in
elements isuppz[0] and isuppz[1].

nrminv nrminv = 1/SQRT( ztz )

resid The residual of the FP vector.

resid = ABS( mingma )/SQRT( ztz )

rqcorr The Rayleigh Quotient correction to lambda.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

?laref
Applies Householder reflectors to matrices on their
rows or columns.

Syntax
void slaref (const char* type, float* a, const MKL_INT* lda, const MKL_INT* wantz,
float* z, const MKL_INT* ldz, const MKL_INT* block, MKL_INT* irow1, MKL_INT* icol1,
const MKL_INT* istart, const MKL_INT* istop, const MKL_INT* itmp1, const MKL_INT*
itmp2, const MKL_INT* liloz, const MKL_INT* lihiz, const float* vecs, float* v2, float*
v3, float* t1, float* t2, float* t3);
void dlaref (const char* type, double* a, const MKL_INT* lda, const MKL_INT* wantz,
double* z, const MKL_INT* ldz, const MKL_INT* block, MKL_INT* irow1, MKL_INT* icol1,
const MKL_INT* istart, const MKL_INT* istop, const MKL_INT* itmp1, const MKL_INT*
itmp2, const MKL_INT* liloz, const MKL_INT* lihiz, const double* vecs, double* v2,
double* v3, double* t1, double* t2, double* t3);
void claref (const char* type, MKL_Complex8* a, const MKL_INT* lda, const MKL_INT*
wantz, MKL_Complex8* z, const MKL_INT* ldz, const MKL_INT* block, MKL_INT* irow1,
MKL_INT* icol1, const MKL_INT* istart, const MKL_INT* istop, const MKL_INT* itmp1,
const MKL_INT* itmp2, const MKL_INT* liloz, const MKL_INT* lihiz, const MKL_Complex8*
vecs, MKL_Complex8* v2, MKL_Complex8* v3, MKL_Complex8* t1, MKL_Complex8* t2,
MKL_Complex8* t3);
void zlaref (const char* type, MKL_Complex16* a, const MKL_INT* lda, const MKL_INT*
wantz, MKL_Complex16* z, const MKL_INT* ldz, const MKL_INT* block, MKL_INT* irow1,
MKL_INT* icol1, const MKL_INT* istart, const MKL_INT* istop, const MKL_INT* itmp1,
const MKL_INT* itmp2, const MKL_INT* liloz, const MKL_INT* lihiz, const MKL_Complex16*
vecs, MKL_Complex16* v2, MKL_Complex16* v3, MKL_Complex16* t1, MKL_Complex16* t2,
MKL_Complex16* t3);

Include Files
• mkl_scalapack.h

Description
?laref applies one or several Householder reflectors of size 3 to one or two matrices (if column is specified)
on either their rows or columns.

Input Parameters

type (local)

If 'R': Apply reflectors to the rows of the matrix (apply from left)
Otherwise: Apply reflectors to the columns of the matrix
Unchanged on exit.

a (local)
Array, lld_a*LOCc(ja+n-1)

On entry, the matrix to receive the reflections.

lda (local)

On entry, the leading dimension of a.

Unchanged on exit.

wantz (local)

If wantz≠ 0, then apply any column reflections to z as well.

If wantz = 0, then do no additional work on z.

z (local)
Array, ldz*ncols, where the value ncols depends on other arguments. If wantz ≠ 0 and type ≠ 'R', then ncols = icol1 + 3*(lihiz - liloz + 1). Otherwise, ncols is unused.
On entry, the second matrix to receive column reflections.
This is changed only if wantz is set.

ldz (local)

On entry, the leading dimension of z.

Unchanged on exit.

block (local)

If nonzero, then apply several reflectors at once and read their data from
the vecs array.

If zero, apply the single reflector given by v2, v3, t1, t2, and t3.

irow1 (local)

On entry, the local row element of a.

icol1 (local)

On entry, the local column element of a.

istart (local)

Specifies the "number" of the first reflector. This is used as an index into
vecs if block is set. istart is ignored if block is zero.

istop (local)

Specifies the "number" of the last reflector. This is used as an index into
vecs if block is set. istop is ignored if block is zero.

itmp1 (local)

Starting range into a. For rows, this is the local first column. For columns,
this is the local first row.

itmp2 (local)

Ending range into a. For rows, this is the local last column. For columns,
this is the local last row.

liloz, lihiz (local)

These serve the same purpose as itmp1, itmp2 but for z when wantz is
set.

vecs (local)

Array of size 3*N, where N is the order of the matrix.

Holds the size-3 reflectors one after another. vecs is accessed only when block is nonzero.

v2, v3, t1, t2, t3 (local)

Hold a single size-3 Householder reflector. These values are read when block is zero and overwritten when block is nonzero.

Output Parameters

a The updated matrix on exit.

z This is changed only if wantz is set.

irow1 Undefined on output.

icol1 Undefined on output.

v2, v3, t1, t2, t3 Overwritten when block is nonzero.
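For illustration, the following C sketch shows the arithmetic behind applying one size-3 reflector (the block = 0 case) to three consecutive rows of a column-major array. It assumes the convention of LAPACK's ?lahqr, in which the reflector is H = I - t1*v*v^T with v = (1, v2, v3)^T, t2 = t1*v2 and t3 = t1*v3. The function name apply_reflector3_rows and its argument list are hypothetical and not part of MKL; ?laref itself may organize the computation differently.

```c
#include <stddef.h>

/* Hypothetical helper, not an MKL routine: applies H = I - t1*v*v^T with
   v = (1, v2, v3)^T from the left to rows r, r+1, r+2 of the column-major
   array a (leading dimension lda), for columns c0 through c1 (0-based,
   inclusive). Assumes t2 = t1*v2 and t3 = t1*v3. */
void apply_reflector3_rows(double *a, size_t lda, size_t r,
                           size_t c0, size_t c1,
                           double v2, double v3,
                           double t1, double t2, double t3)
{
    for (size_t j = c0; j <= c1; ++j) {
        double *col = a + j * lda;   /* first element of column j */
        /* sum = v^T * x, where x holds the three affected entries */
        double sum = col[r] + v2 * col[r + 1] + v3 * col[r + 2];
        /* x := x - t1 * v * sum, using t2 = t1*v2 and t3 = t1*v3 */
        col[r]     -= sum * t1;
        col[r + 1] -= sum * t2;
        col[r + 2] -= sum * t3;
    }
}
```

With t1 = 2/(1 + v2*v2 + v3*v3), H is symmetric and orthogonal, so applying the same reflector twice restores the original entries; this is a convenient sanity check for such a sketch.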

?larrb2
Provides limited bisection to locate eigenvalues for
more accuracy.

Syntax
void slarrb2(MKL_INT* n, float* d, float* lld, MKL_INT* ifirst, MKL_INT* ilast, float*
rtol1, float* rtol2, MKL_INT* offset, float* w, float* wgap, float* werr, float* work,
MKL_INT* iwork, float* pivmin, float* lgpvmn, float* lgspdm, MKL_INT* twist, MKL_INT*
info);
void dlarrb2(MKL_INT* n, double* d, double* lld, MKL_INT* ifirst, MKL_INT* ilast,
double* rtol1, double* rtol2, MKL_INT* offset, double* w, double* wgap, double* werr,
double* work, MKL_INT* iwork, double* pivmin, double* lgpvmn, double* lgspdm, MKL_INT*
twist, MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
Given the relatively robust representation (RRR) L*D*L^T, ?larrb2 performs "limited" bisection to refine, to higher accuracy, the eigenvalues of L*D*L^T whose indices lie in a given range. Initial guesses for these eigenvalues are input in w; the corresponding estimates of the errors in these guesses and of the gaps between them are input in werr and wgap, respectively. During bisection, intervals [left, right] are maintained by storing their midpoints and semi-widths in the arrays w and werr, respectively. The range of indices is specified by the ifirst, ilast, and offset parameters, as explained in Input Parameters.

NOTE
There are only minor differences between ?larrb from LAPACK and ?larrb2. The main reason for creating this nearly identical copy is profiling: in the ScaLAPACK MRRR algorithm, ?larrb2 is used to refine eigenvalues during the construction of the representation tree, whereas the initial computation of the eigenvalues of the root RRR uses ?larrb. When profiling, this makes it easy to quantify the refinement work separately from the computation of the eigenvalues of the root.

Input Parameters

n The order of the matrix.

d Array of size n.

The n diagonal elements of the diagonal matrix D.

lld Array of size n-1.

The (n-1) elements l_{i+1}*l_{i+1}*d[i], i = 0, ..., n-2.

ifirst The index of the first eigenvalue to be computed.

ilast The index of the last eigenvalue to be computed.

rtol1, rtol2 Tolerances for the convergence of the bisection intervals.

An interval [left, right] has converged if right - left < max (rtol1 * gap,
rtol2 * max(|left|, |right|)) where gap is the (estimated) distance to the
nearest eigenvalue.

offset Offset for the arrays w, wgap and werr; that is, the elements indexed ifirst -
offset - 1 through ilast - offset - 1 of these arrays are used.

w Array of size n

On input, w[ifirst - offset - 1] through w[ilast - offset - 1] are estimates of the eigenvalues of L*D*L^T indexed ifirst through ilast.

wgap Array of size n-1.

On input, the (estimated) gaps between consecutive eigenvalues of L*D*L^T; that is, wgap[i - offset - 1] is the gap between eigenvalues i and i + 1. Note that if ifirst = ilast, then wgap[ifirst - offset - 1] must be set to zero.

werr Array of size n.

On input, werr[ifirst - offset - 1] through werr[ilast - offset - 1] are the errors in the estimates of the corresponding elements in w.

work (workspace) array of size 4*n.

Workspace.

iwork (workspace) array of size 2*n.

Workspace.

pivmin The minimum pivot in the Sturm sequence.


lgpvmn Logarithm of pivmin, precomputed.

lgspdm Logarithm of the spectral diameter, precomputed.

twist The twist index for the twisted factorization that is used to compute the negcount (the number of eigenvalues less than λ):
twist = n: compute the negcount from L*D*L^T - λI = L+*D+*L+^T
twist = 1: compute the negcount from L*D*L^T - λI = U-*D-*U-^T
twist = r, 1 < r < n: compute the negcount from L*D*L^T - λI = N_r*Δ_r*N_r^T
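The twist = n case can be sketched in C as follows. The sketch forms the pivots of L+*D+*L+^T with the stationary qd recurrence s_1 = -λ, D+(j) = D(j) + s_j, s_{j+1} = lld(j)*s_j/D+(j) - λ, and counts the negative ones; by Sylvester's law of inertia this equals the number of eigenvalues of L*D*L^T less than λ. The function name negcount_twist_n, the use of int instead of MKL_INT, and the pivmin safeguard are assumptions of this sketch, not the MKL implementation.

```c
#include <math.h>

/* Hypothetical sketch (not the MKL implementation): number of eigenvalues
   of L*D*L^T less than lambda, from the stationary qd transform
   L*D*L^T - lambda*I = L+*D+*L+^T (the twist = n case).
   d has n entries; lld has n-1 entries lld[j] = l_{j+1}^2 * d[j].
   Tiny pivots are replaced by -pivmin, a simple safeguard assumed here;
   LAPACK's ?laneg handles zero pivots differently. */
int negcount_twist_n(int n, const double *d, const double *lld,
                     double lambda, double pivmin)
{
    int neg = 0;
    double t = -lambda;                   /* s_1 = -lambda */
    for (int j = 0; j < n; ++j) {
        double dplus = d[j] + t;          /* D+(j) = D(j) + s_j */
        if (fabs(dplus) < pivmin)
            dplus = -pivmin;
        if (dplus < 0.0)
            ++neg;                        /* negative pivot */
        if (j < n - 1)                    /* s_{j+1} = lld(j)*s_j/D+(j) - lambda */
            t = (t / dplus) * lld[j] - lambda;
    }
    return neg;
}
```

For example, d = {2, 1.5} and lld = {0.5} describe the matrix [[2, 1], [1, 2]] with eigenvalues 1 and 3, so the count is 0 for λ = 0.5, 1 for λ = 2.5, and 2 for λ = 4.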

Output Parameters

w On output, the eigenvalue estimates in w are refined.

wgap On output, the eigenvalue gaps in wgap are refined.

werr On output, the errors in werr are refined.

info Error flag.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

?larrd2
Computes the eigenvalues of a symmetric tridiagonal
matrix to suitable accuracy.

Syntax
void slarrd2(char* range, char* order, MKL_INT* n, float* vl, float* vu, MKL_INT* il,
MKL_INT* iu, float* gers, float* reltol, float* d, float* e, float* e2, float* pivmin,
MKL_INT* nsplit, MKL_INT* isplit, MKL_INT* m, float* w, float* werr, float* wl, float*
wu, MKL_INT* iblock, MKL_INT* indexw, float* work, MKL_INT* iwork, MKL_INT* dol,
MKL_INT* dou, MKL_INT* info);
void dlarrd2(char* range, char* order, MKL_INT* n, double* vl, double* vu, MKL_INT* il,
MKL_INT* iu, double* gers, double* reltol, double* d, double* e, double* e2, double*
pivmin, MKL_INT* nsplit, MKL_INT* isplit, MKL_INT* m, double* w, double* werr, double*
wl, double* wu, MKL_INT* iblock, MKL_INT* indexw, double* work, MKL_INT* iwork, MKL_INT*
dol, MKL_INT* dou, MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
?larrd2 computes the eigenvalues of a symmetric tridiagonal matrix T to limited initial accuracy. It is an auxiliary routine called from ?larre2a.
?larrd2 was derived from the LAPACK routine ?larrd, which itself stems from ?stebz. The motivation for creating ?larrd2 is efficiency: when eigenvalues are computed in parallel and the input tridiagonal matrix splits into blocks, ?larrd2 can skip blocks that contain none of the eigenvalues dol to dou for which the processor is responsible. In extreme cases (such as large matrices consisting of many small blocks, for example 2x2), the gain can be substantial.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/
PerformanceIndex.
Notice revision #20201201

Input Parameters

range = 'A': ("All") all eigenvalues will be found.

= 'V': ("Value") all eigenvalues in the half-open interval (vl, vu] will be
found.
= 'I': ("Index") eigenvalues of the entire matrix with the indices in a given
range will be found.

order = 'B': ("By Block") the eigenvalues will be grouped by split-off block (see
iblock, isplit) and ordered from smallest to largest within the block.
= 'E': ("Entire matrix") the eigenvalues for the entire matrix will be ordered
from smallest to largest.

n The order of the tridiagonal matrix T. n >= 0.

vl, vu If range='V', the lower and upper bounds of the interval to be searched for
eigenvalues. Eigenvalues less than or equal to vl, or greater than vu, will
not be returned. vl < vu.

Not referenced if range = 'A' or 'I'.

il, iu If range='I', the indices (in ascending order) of the smallest eigenvalue, to
be returned in w[il-1], and largest eigenvalue, to be returned in w[iu-1].

1 ≤ il ≤ iu ≤ n, if n > 0; il = 1 and iu = 0 if n = 0.

Not referenced if range = 'A' or 'V'.

gers Array of size 2*n

The n Gerschgorin intervals (the i-th Gerschgorin interval is (gers[2*i-2], gers[2*i-1])).

reltol The minimum relative width of an interval. When an interval is narrower than reltol times the larger (in magnitude) endpoint, it is considered to be sufficiently small, that is, converged. Note: this should always be at least radix*machine epsilon.

d Array of size n

The n diagonal elements of the tridiagonal matrix T.

e Array of size n-1

The (n-1) off-diagonal elements of the tridiagonal matrix T.

e2 Array of size n-1

The (n-1) squared off-diagonal elements of the tridiagonal matrix T.

pivmin The minimum pivot allowed in the Sturm sequence for T.


nsplit The number of diagonal blocks in the matrix T.

1 ≤ nsplit ≤ n.

isplit Array of size n

The splitting points, at which T breaks up into submatrices.

The first submatrix consists of rows/columns 1 to isplit[0], the second of
rows/columns isplit[0]+1 through isplit[1], etc., and the nsplit-th
submatrix consists of rows/columns isplit[nsplit-2]+1 through
isplit[nsplit-1]=n.
(Only the first nsplit elements will actually be used, but since the user
cannot know a priori what value nsplit will have, n words must be
reserved for isplit.)

work (workspace) Array of size 4*n

iwork (workspace) Array of size 3*n

dol, dou Specifying an index range dol:dou allows the user to work on only a
selected part of the representation tree.
Otherwise, set dol = 1 and dou = n.

Note that dol and dou refer to the order in which the eigenvalues are
stored in w.
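As an illustration of the gers layout described above, the following hypothetical C helper computes the Gerschgorin intervals of a symmetric tridiagonal matrix from its diagonal d and off-diagonal e. The name gerschgorin_intervals and the use of int for n are assumptions; ?larrd2 itself receives gers precomputed by its caller.

```c
#include <math.h>

/* Hypothetical helper: fill gers (size 2*n) with the Gerschgorin intervals
   of the symmetric tridiagonal matrix with diagonal d (n entries) and
   off-diagonal e (n-1 entries). For 0-based row k, the interval is
   (gers[2*k], gers[2*k+1]); with i = k+1 this is (gers[2*i-2], gers[2*i-1]). */
void gerschgorin_intervals(int n, const double *d, const double *e,
                           double *gers)
{
    for (int k = 0; k < n; ++k) {
        double r = 0.0;                    /* radius: sum of |off-diagonals| in row k */
        if (k > 0)     r += fabs(e[k - 1]);
        if (k < n - 1) r += fabs(e[k]);
        gers[2 * k]     = d[k] - r;        /* left end */
        gers[2 * k + 1] = d[k] + r;        /* right end */
    }
}
```

The minimum of the left ends and the maximum of the right ends give global bounds on the spectrum, which is how the range='A' bounds wl and wu are described in Output Parameters.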

Output Parameters

m The actual number of eigenvalues found. 0 ≤ m ≤ n.

(See also the description of info=2,3.)

w Array of size n

On exit, the first m elements of w contain the eigenvalue approximations. ?larrd2 computes an interval I_j = (a_j, b_j] that includes eigenvalue j. The eigenvalue approximation is given as the interval midpoint w[j-1] = (a_j + b_j)/2. The corresponding error is bounded by werr[j-1] = abs(a_j - b_j)/2.

werr Array of size n

The error bound on the corresponding eigenvalue approximation in w.

wl, wu The interval (wl, wu] contains all the wanted eigenvalues.

If range='V', then wl=vl and wu=vu.

If range='A', then wl and wu are the global Gerschgorin bounds on the spectrum.
If range='I', then wl and wu are computed by ?laebz from the index range specified.

iblock Array of size n

At each row/column j where e[j-1] is zero or small, the matrix T is
considered to split into a block diagonal matrix. On exit, if info = 0,
iblock[i] specifies to which block (from 0 to the number of blocks minus
one) the eigenvalue w[i] belongs. (?larrd2 may use the remaining n-m
elements as workspace.)

indexw Array of size n

The indices of the eigenvalues within each block (submatrix); for example,
indexw[i]= j and iblock[i]=k imply that the (i+1)-th eigenvalue w[i] is the
j-th eigenvalue in block k.

info = 0: successful exit

< 0: if info = -i, the i-th argument had an illegal value

> 0: some or all of the eigenvalues failed to converge or were not computed:

• =1 or 3: Bisection failed to converge for some eigenvalues; these eigenvalues are flagged by a negative block number. The effect is that the eigenvalues may not be as accurate as the absolute and relative tolerances.
• =2 or 3: range='I' only: Not all of the eigenvalues il:iu were found.
• = 4: range='I', and the Gershgorin interval initially used was too small.
No eigenvalues were computed.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

?larre2
Given a tridiagonal matrix, sets small off-diagonal
elements to zero and for each unreduced block, finds
base representations and eigenvalues.

Syntax
void slarre2(char* range, MKL_INT* n, float* vl, float* vu, MKL_INT* il, MKL_INT* iu,
float* d, float* e, float* e2, float* rtol1, float* rtol2, float* spltol, MKL_INT*
nsplit, MKL_INT* isplit, MKL_INT* m, MKL_INT* dol, MKL_INT* dou, float* w, float* werr,
float* wgap, MKL_INT* iblock, MKL_INT* indexw, float* gers, float* pivmin, float* work,
MKL_INT* iwork, MKL_INT* info);
void dlarre2(char* range, MKL_INT* n, double* vl, double* vu, MKL_INT* il, MKL_INT* iu,
double* d, double* e, double* e2, double* rtol1, double* rtol2, double* spltol, MKL_INT*
nsplit, MKL_INT* isplit, MKL_INT* m, MKL_INT* dol, MKL_INT* dou, double* w, double*
werr, double* wgap, MKL_INT* iblock, MKL_INT* indexw, double* gers, double* pivmin,
double* work, MKL_INT* iwork, MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
To find the desired eigenvalues of a given real symmetric tridiagonal matrix T, ?larre2 sets "small" off-diagonal elements to zero (via ?larra). For each block T_i, it finds

• a suitable shift at one end of the block's spectrum,

• the root RRR, T_i - σ_i*I = L_i*D_i*L_i^T, and

• the eigenvalues of each L_i*D_i*L_i^T.
The representations and eigenvalues found are then returned to ?stegr2 to compute the eigenvectors of T.
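The root representation T_i - σ_i*I = L_i*D_i*L_i^T for a given shift can be illustrated with the standard LDL^T recurrence for a symmetric tridiagonal matrix. The C sketch below assumes the shift σ has already been chosen and omits the shift selection and the relative robustness checks that ?larre2 performs; the name ldl_shifted_tridiag is hypothetical.

```c
/* Hypothetical sketch: factor T - sigma*I = L*D*L^T for a symmetric
   tridiagonal T with diagonal d[0..n-1] and off-diagonal e[0..n-2].
   On return, dd holds the n diagonal entries of D and l the n-1
   subdiagonal entries of the unit bidiagonal L.
   Returns 0 on success, -1 if a zero pivot occurs. */
int ldl_shifted_tridiag(int n, const double *d, const double *e,
                        double sigma, double *dd, double *l)
{
    dd[0] = d[0] - sigma;
    for (int i = 0; i < n - 1; ++i) {
        if (dd[i] == 0.0)
            return -1;                        /* breakdown: zero pivot */
        l[i] = e[i] / dd[i];                  /* L(i+1, i) */
        dd[i + 1] = d[i + 1] - sigma - l[i] * e[i];
    }
    return 0;
}
```

Multiplying the factors back out, diagonal entry i+1 of L*D*L^T is dd[i+1] + l[i]*l[i]*dd[i] and off-diagonal entry i is l[i]*dd[i], which reproduces T - σI.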

?larre2 is more suitable for parallel computation than the original LAPACK code for computing the root RRR
and its eigenvalues. When computing eigenvalues in parallel and the input tridiagonal matrix splits into
blocks, ?larre2 can skip over blocks which contain none of the eigenvalues from dol to dou for which the
processor is responsible. In extreme cases (such as large matrices consisting of many blocks of small size,
e.g. 2x2), the gain can be substantial.


Input Parameters

range = 'A': ("All") all eigenvalues will be found.

= 'V': ("Value") all eigenvalues in the half-open interval (vl, vu] will be
found.
= 'I': ("Index") eigenvalues of the entire matrix with the indices in a given
range will be found.

n The order of the matrix. n > 0.

vl, vu If range='V', the lower and upper bounds for the eigenvalues.

Eigenvalues less than or equal to vl, or greater than vu, will not be
returned. vl < vu.

il, iu If range='I', the indices (in ascending order) of the smallest eigenvalue, to
be returned in w[il-1], and largest eigenvalue, to be returned in w[iu-1].

1 ≤ il ≤ iu ≤ n.

d Array of size n

The n diagonal elements of the tridiagonal matrix T.

e Array of size n

The first (n-1) entries contain the subdiagonal elements of the tridiagonal
matrix T; e[n-1] need not be set.

e2 Array of size n

The first (n-1) entries contain the squares of the subdiagonal elements of
the tridiagonal matrix T; e2[n-1] need not be set.

rtol1, rtol2 Parameters for bisection.

An interval [left, right] has converged if right-left<max( rtol1*gap,
rtol2*max(|left|,|right|) )

spltol The threshold for splitting.

dol, dou Specifying an index range dol:dou allows the user to work on only a
selected part of the representation tree. Otherwise, the setting dol=1,
dou=n should be applied.
Note that dol and dou refer to the order in which the eigenvalues are
stored in w.

work Workspace array of size 6*n

iwork Workspace array of size 5*n

Output Parameters

vl, vu If range='I' or range='A', ?larre2 returns in vl and vu bounds on the desired part of the
spectrum.

d The n diagonal elements of the diagonal matrices D_i.

e e contains the subdiagonal elements of the unit bidiagonal matrices L_i. On output, the entries e[isplit[i]-1], 0 ≤ i < nsplit, contain the base points σ_{i+1}.

e2 The entries e2[isplit[i]-1], 0 ≤ i < nsplit, are set to zero.

nsplit The number of blocks T splits into. 1 ≤ nsplit ≤ n.

isplit Array of size n

The splitting points, at which T breaks up into blocks.

The first block consists of rows/columns 1 to isplit[0], the second of
rows/columns isplit[0]+1 through isplit[1], etc., and the nsplit-th
block consists of rows/columns isplit[nsplit-2]+1 through
isplit[nsplit-1]=n.

m The total number of eigenvalues (of all L_i*D_i*L_i^T) found.

w Array of size n

The first m elements contain the eigenvalues. The eigenvalues of each of the blocks, L_i*D_i*L_i^T, are sorted in ascending order (?larre2 may use the remaining n-m elements as workspace).

Note that immediately after exiting this function, only the eigenvalues in w with indices in the range dol-1:dou-1 may be reliable on this processor when the eigenvalue computation is done in parallel.

werr Array of size n

The error bound on the corresponding eigenvalue in w.

Note that immediately after exiting this function, only the uncertainties in werr with indices in the range dol-1:dou-1 may be reliable on this processor when the eigenvalue computation is done in parallel.

wgap Array of size n

The separation from the right neighbor eigenvalue in w.

The gap is only with respect to the eigenvalues of the same block, as each block has its own representation tree.

Exception: at the right end of a block, the left gap is stored.

Note that immediately after exiting this function, only the gaps in wgap with indices in the range dol-1:dou-1 may be reliable on this processor when the eigenvalue computation is done in parallel.

iblock Array of size n

The indices of the blocks (submatrices) associated with the corresponding eigenvalues in w; iblock[i]=1 if eigenvalue w[i] belongs to the first block from the top, iblock[i]=2 if w[i] belongs to the second block, and so on.

indexw Array of size n

The indices of the eigenvalues within each block (submatrix); for example,
indexw[i]= 10 and iblock[i]=2 imply that the (i+1)-th eigenvalue w[i] is
the 10th eigenvalue in block 2.

gers Array of size 2*n

The n Gerschgorin intervals (the i-th Gerschgorin interval is (gers[2*i-2], gers[2*i-1])).

pivmin The minimum pivot in the Sturm sequence for T.

info = 0: successful exit

> 0: A problem occurred in ?larre2.

< 0: One of the called functions signaled an internal problem. Further information requires inspecting the info parameter of the corresponding function.
=-1: Problem in ?larrd.

=-2: Not enough internal iterations to find the base representation.

=-3: Problem in ?larrb when computing the refined root representation for ?lasq2.

=-4: Problem in ?larrb when performing bisection on the desired part of the spectrum.
=-5: Problem in ?lasq2.

=-6: Problem in ?lasq2.
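Because isplit stores 1-based row numbers, C code that loops over the blocks may find it convenient to convert the splitting points to 0-based ranges first. A hypothetical helper for that conversion, using int in place of MKL_INT, might look like this:

```c
/* Hypothetical helper: convert the 1-based splitting points
   isplit[0..nsplit-1] into 0-based half-open row ranges
   [start[b], end[b]) for each block b. Block b covers rows/columns
   isplit[b-1]+1 .. isplit[b] (1-based), with isplit[-1] taken as 0. */
void split_ranges(int nsplit, const int *isplit, int *start, int *end)
{
    int prev = 0;                 /* last row (1-based) of the previous block */
    for (int b = 0; b < nsplit; ++b) {
        start[b] = prev;          /* 1-based first row prev+1 -> 0-based prev */
        end[b]   = isplit[b];     /* 1-based last row -> exclusive 0-based end */
        prev     = isplit[b];
    }
}
```

For example, isplit = {2, 5, 6} with n = 6 yields the row ranges [0, 2), [2, 5) and [5, 6).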

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

?larre2a
Given a tridiagonal matrix, sets small off-diagonal
elements to zero and for each unreduced block, finds
base representations and eigenvalues.

Syntax
void slarre2a(char* range, MKL_INT* n, float* vl, float* vu, MKL_INT* il, MKL_INT* iu,
float* d, float* e, float* e2, float* rtol1, float* rtol2, float* spltol, MKL_INT*
nsplit, MKL_INT* isplit, MKL_INT* m, MKL_INT* dol, MKL_INT* dou, MKL_INT* needil,
MKL_INT* neediu, float* w, float* werr, float* wgap, MKL_INT* iblock, MKL_INT* indexw,
float* gers, float* sdiam, float* pivmin, float* work, MKL_INT* iwork, float* minrgp,
MKL_INT* info);
void dlarre2a(char* range, MKL_INT* n, double* vl, double* vu, MKL_INT* il, MKL_INT* iu,
double* d, double* e, double* e2, double* rtol1, double* rtol2, double* spltol, MKL_INT*
nsplit, MKL_INT* isplit, MKL_INT* m, MKL_INT* dol, MKL_INT* dou, MKL_INT* needil,
MKL_INT* neediu, double* w, double* werr, double* wgap, MKL_INT* iblock, MKL_INT*
indexw, double* gers, double* sdiam, double* pivmin, double* work, MKL_INT* iwork,
double* minrgp, MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
To find the desired eigenvalues of a given real symmetric tridiagonal matrix T, ?larre2a sets any "small" off-diagonal elements to zero, and for each unreduced block T_i, it finds

• a suitable shift at one end of the block's spectrum,

• the base representation, T_i - σ_i*I = L_i*D_i*L_i^T, and
• the eigenvalues of each L_i*D_i*L_i^T.

NOTE
The algorithm obtains a crude picture of all the wanted eigenvalues (as selected by range).
However, to reduce work and improve scalability, only the eigenvalues dol to dou are
refined. Furthermore, if the matrix splits into blocks, RRRs for blocks that do not contain
eigenvalues from dol to dou are skipped. Unlike in the sequential case, the DQDS algorithm (function ?lasq2) is not used. Instead, eigenvalues are computed in parallel, to a limited number of significant figures, using bisection.
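The block-skipping idea in the note above can be sketched as follows: given the block number of each eigenvalue (in the 1-based iblock convention used by ?larre2a), mark the blocks that contain at least one eigenvalue with index in dol:dou. The function blocks_needed and its int arguments are hypothetical; the sketch shows only the selection logic, not how ?larre2a is implemented.

```c
/* Hypothetical sketch: mark which blocks contain at least one eigenvalue
   with index in dol..dou (1-based positions in w). iblock[k] is the
   1-based block number of w[k]; need[b-1] is set to 1 if block b must be
   processed and to 0 if it can be skipped. Assumes dol >= 1. */
void blocks_needed(int m, const int *iblock, int nsplit,
                   int dol, int dou, int *need)
{
    for (int b = 0; b < nsplit; ++b)
        need[b] = 0;                          /* skip by default */
    for (int k = dol - 1; k < dou && k < m; ++k)
        need[iblock[k] - 1] = 1;              /* block owns a wanted eigenvalue */
}
```

With iblock = {1, 1, 2, 2, 3}, the range dol:dou = 2:3 needs blocks 1 and 2 only, so the RRR of block 3 need not be computed on this processor.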


Input Parameters

range = 'A': ("All") all eigenvalues will be found.

= 'V': ("Value") all eigenvalues in the half-open interval (vl, vu] will be
found.
= 'I': ("Index") eigenvalues of the entire matrix with the indices in a given
range will be found.

n The order of the matrix. n > 0.

vl, vu If range='V', the lower and upper bounds for the eigenvalues. Eigenvalues
less than or equal to vl, or greater than vu, will not be returned. vl < vu.

If range='I' or range='A', ?larre2a computes bounds on the desired part of the spectrum.


il, iu If range='I', the indices (in ascending order) of the smallest eigenvalue, to
be returned in w[il-1], and largest eigenvalue, to be returned in w[iu-1].

1 ≤ il ≤ iu ≤ n.

d Array of size n

On entry, the n diagonal elements of the tridiagonal matrix T.

e Array of size n

The first (n-1) entries contain the subdiagonal elements of the tridiagonal
matrix T; e[n-1] need not be set.

e2 Array of size n

The first (n-1) entries contain the squares of the subdiagonal elements of
the tridiagonal matrix T; e2[n-1] need not be set.

rtol1, rtol2 Parameters for bisection.

An interval [left,right] has converged if right - left < max( rtol1*gap,
rtol2*max(|left|,|right|) )

spltol The threshold for splitting.

dol, dou If you want to work on only a selected part of the representation tree,
you can specify an index range dol:dou.

Otherwise, set dol = 1 and dou = n.

Note that dol and dou refer to the order in which the eigenvalues are
stored in w.

work Workspace array of size 6*n

iwork Workspace array of size 5*n

minrgp The minimum relative gap threshold used to decide whether an eigenvalue or a cluster boundary is reached.

Output Parameters

vl, vu If range='V', the lower and upper bounds for the eigenvalues. Eigenvalues
less than or equal to vl, or greater than vu, are not returned. vl < vu.

If range='I' or range='A', ?larre2a computes bounds on the desired part of the spectrum.

d The n diagonal elements of the diagonal matrices D_i.

e e contains the subdiagonal elements of the unit bidiagonal matrices L_i. On output, the entries e[isplit[i]-1], 0 ≤ i < nsplit, contain the base points σ_{i+1}.

e2 The entries e2[isplit[i]-1], 0 ≤ i < nsplit, have been set to zero.

nsplit The number of blocks T splits into. 1 ≤ nsplit ≤ n.

isplit Array of size n

The splitting points, at which T breaks up into blocks.

The first block consists of rows/columns 1 to isplit[0], the second of
rows/columns isplit[0]+1 through isplit[1], etc., and the nsplit-th
block consists of rows/columns isplit[nsplit-2]+1 through
isplit[nsplit-1]=n.

m The total number of eigenvalues (of all L_i*D_i*L_i^T) found.

needil, neediu The indices of the leftmost and rightmost eigenvalues of the root node RRR
which are needed to accurately compute the relevant part of the
representation tree.

w Array of size n

The first m elements contain the eigenvalues. The eigenvalues of each of the blocks, L_i*D_i*L_i^T, are sorted in ascending order (?larre2a may use the remaining n-m elements as workspace).

Note that immediately after exiting this function, only the eigenvalues in w with indices in the range dol-1:dou-1 are reliable on this processor, because the eigenvalue computation is done in parallel.

werr Array of size n

The error bound on the corresponding eigenvalue in w.

Note that immediately after exiting this function, only the uncertainties in werr with indices in the range dol-1:dou-1 are reliable on this processor, because the eigenvalue computation is done in parallel.

wgap Array of size n

The separation from the right neighbor eigenvalue in w. The gap is only with respect to the eigenvalues of the same block, as each block has its own representation tree.
Exception: at the right end of a block, the left gap is stored.
Note that immediately after exiting this function, only the gaps in wgap with indices in the range dol-1:dou-1 are reliable on this processor, because the eigenvalue computation is done in parallel.

iblock Array of size n

The indices of the blocks (submatrices) associated with the corresponding eigenvalues in w; iblock[i]=1 if eigenvalue w[i] belongs to the first block from the top, iblock[i]=2 if w[i] belongs to the second block, and so on.

indexw Array of size n

The indices of the eigenvalues within each block (submatrix); for example,
indexw[i]= 10 and iblock[i]=2 imply that the (i+1)-th eigenvalue w[i] is
the 10th eigenvalue in block 2.

gers Array of size 2*n

The n Gerschgorin intervals (the i-th Gerschgorin interval is (gers[2*i-2], gers[2*i-1])).

pivmin The minimum pivot in the Sturm sequence for T.

info = 0: successful exit

> 0: A problem occurred in ?larre2a.

< 0: One of the called functions signaled an internal problem. Further information requires inspecting the info parameter of the corresponding function.

=-1: Problem in ?larrd2.

=-2: Not enough internal iterations to find the base representation.

=-3: Problem in ?larrb2 when computing the refined root representation.

=-4: Problem in ?larrb2 when performing bisection on the desired part of the spectrum.
=-9: m < dou-dol+1; that is, the code found fewer eigenvalues than it was supposed to.
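The gap convention described for wgap (gap to the right neighbor within the same block, left gap at the right end of a block) can be sketched in C as follows. This is a simplified, hypothetical helper: it uses only the eigenvalue approximations in w and the 1-based block numbers in iblock, and ignores the error bounds in werr, which the library routines may also take into account. Giving a block that holds a single eigenvalue a gap of 0 is likewise an assumption of the sketch.

```c
/* Hypothetical, simplified sketch: compute gaps from eigenvalue
   approximations w[0..m-1] (ascending within each block) and their
   1-based block numbers iblock[0..m-1]. The gap is the distance to the
   right neighbor in the same block; at the right end of a block the left
   gap is stored instead. Error bounds (werr) are ignored. */
void block_gaps(int m, const double *w, const int *iblock, double *wgap)
{
    for (int i = 0; i < m; ++i) {
        int has_right = (i + 1 < m) && (iblock[i + 1] == iblock[i]);
        int has_left  = (i > 0) && (iblock[i - 1] == iblock[i]);
        if (has_right)
            wgap[i] = w[i + 1] - w[i];    /* right gap */
        else if (has_left)
            wgap[i] = w[i] - w[i - 1];    /* right end of block: left gap */
        else
            wgap[i] = 0.0;                /* single-eigenvalue block (assumed) */
    }
}
```

For w = {1, 2, 4, 10, 11} with iblock = {1, 1, 1, 2, 2}, the gaps are {1, 2, 2, 1, 1}; the jump from 4 to 10 is never used because those eigenvalues lie in different blocks.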

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

?larrf2
Finds a new relatively robust representation such that
at least one of the eigenvalues is relatively isolated.

Syntax
void slarrf2(MKL_INT* n, float* d, float* l, float* ld, MKL_INT* clstrt, MKL_INT* clend,
MKL_INT* clmid1, MKL_INT* clmid2, float* w, float* wgap, float* werr, MKL_INT* trymid,
float* spdiam, float* clgapl, float* clgapr, float* pivmin, float* sigma, float* dplus,
float* lplus, float* work, MKL_INT* info);
void dlarrf2(MKL_INT* n, double* d, double* l, double* ld, MKL_INT* clstrt, MKL_INT*
clend, MKL_INT* clmid1, MKL_INT* clmid2, double* w, double* wgap, double* werr, MKL_INT*
trymid, double* spdiam, double* clgapl, double* clgapr, double* pivmin, double* sigma,
double* dplus, double* lplus, double* work, MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
Given the initial representation L*D*L^T and its cluster of close eigenvalues (in a relative measure), defined by the indices of the first and last eigenvalues in the cluster, ?larrf2 finds a new relatively robust representation L*D*L^T - σI = L+*D+*L+^T such that at least one of the eigenvalues of L+*D+*L+^T is relatively isolated.
This is an enhanced version of ?larrf that, when there is a large gap inside the cluster, also tries shifts in the middle of the cluster, in order to break large clusters into at least two pieces.
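For a given shift σ, the new representation L*D*L^T - σI = L+*D+*L+^T can be computed with the stationary qd transform. The following C sketch shows only that transform, using the d, l and ld arrays described under Input Parameters; the name stationary_qd is hypothetical, and the choice of σ and the robustness checks that ?larrf2 performs are omitted.

```c
/* Hypothetical sketch of the stationary qd transform: given L*D*L^T
   (d: n diagonal entries of D; l: n-1 subdiagonal entries of L;
   ld: n-1 products l[i]*d[i]) and a shift sigma, compute dplus (n) and
   lplus (n-1) such that L*D*L^T - sigma*I = L+*D+*L+^T.
   Returns 0 on success, -1 on a zero pivot. */
int stationary_qd(int n, const double *d, const double *l, const double *ld,
                  double sigma, double *dplus, double *lplus)
{
    double s = -sigma;                        /* s_1 = -sigma */
    dplus[0] = d[0] + s;
    for (int i = 0; i < n - 1; ++i) {
        if (dplus[i] == 0.0)
            return -1;                        /* breakdown: zero pivot */
        lplus[i] = ld[i] / dplus[i];          /* L+(i) = L(i)*D(i)/D+(i) */
        s = s * lplus[i] * l[i] - sigma;      /* s_{i+1} = s_i*L+(i)*L(i) - sigma */
        dplus[i + 1] = d[i + 1] + s;
    }
    return 0;
}
```

The transform is "stationary" because the factorization keeps the same lower bidiagonal shape; comparing the tridiagonal entries of L+*D+*L+^T with those of L*D*L^T - σI is a simple way to check an implementation.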

Input Parameters

n The order of the matrix (subblock, if the matrix was split).

d Array of size n

The n diagonal elements of the diagonal matrix D.

l Array of size n-1

The (n-1) subdiagonal elements of the unit bidiagonal matrix L.

ld Array of size n-1

The (n-1) elements l[i]*d[i].

clstrt The index of the first eigenvalue in the cluster.

clend The index of the last eigenvalue in the cluster.

clmid1, clmid2 The indices of a pair of eigenvalues in the middle of the cluster that are separated by a large gap.

w Array of size ≥ (clend-clstrt+1)

The eigenvalue approximations of L*D*L^T in ascending order. w[clstrt - 1] through w[clend - 1] form the cluster of relatively close eigenvalues.

wgap Array of size ≥ (clend-clstrt+1)

The separation from the right neighbor eigenvalue in w.

werr Array of size ≥ (clend-clstrt+1)

werr contains the semiwidth of the uncertainty interval of the corresponding eigenvalue approximation in w.

spdiam Estimate of the spectral diameter obtained from the Gerschgorin intervals.

clgapl, clgapr Absolute gap on each end of the cluster.

Set by the calling function to protect against shifts too close to eigenvalues
outside the cluster.

pivmin The minimum pivot allowed in the Sturm sequence.

work Workspace array of size 2*n

Output Parameters

wgap Contains refined values of its input approximations. Very small gaps are
unchanged.

sigma The shift (σ) used to form L+*D+*L+^T.

dplus Array of size n

The n diagonal elements of the diagonal matrix D+.

lplus Array of size n-1

The first (n-1) elements of lplus contain the subdiagonal elements of the
unit bidiagonal matrix L+.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

?larrv2
Computes the eigenvectors of the tridiagonal matrix T = L*D*L^T given L, D and the eigenvalues of L*D*L^T.

Syntax
void slarrv2(MKL_INT* n, float* vl, float* vu, float* d, float* l, float* pivmin,
MKL_INT* isplit, MKL_INT* m, MKL_INT* dol, MKL_INT* dou, MKL_INT* needil, MKL_INT*
neediu, float* minrgp, float* rtol1, float* rtol2, float* w, float* werr, float* wgap,
MKL_INT* iblock, MKL_INT* indexw, float* gers, float* sdiam, float* z, MKL_INT* ldz,
MKL_INT* isuppz, float* work, MKL_INT* iwork, MKL_INT* vstart, MKL_INT* finish,
MKL_INT* maxcls, MKL_INT* ndepth, MKL_INT* parity, MKL_INT* zoffset, MKL_INT* info);
void dlarrv2(MKL_INT* n, double* vl, double* vu, double* d, double* l, double* pivmin,
MKL_INT* isplit, MKL_INT* m, MKL_INT* dol, MKL_INT* dou, MKL_INT* needil, MKL_INT*
neediu, double* minrgp, double* rtol1, double* rtol2, double* w, double* werr, double*
wgap, MKL_INT* iblock, MKL_INT* indexw, double* gers, double* sdiam, double* z, MKL_INT*
ldz, MKL_INT* isuppz, double* work, MKL_INT* iwork, MKL_INT* vstart, MKL_INT* finish,
MKL_INT* maxcls, MKL_INT* ndepth, MKL_INT* parity, MKL_INT* zoffset, MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
?larrv2 computes the eigenvectors of the tridiagonal matrix T = L*D*L^T given L, D and approximations to the eigenvalues of L*D*L^T. The input eigenvalues should have been computed by ?larre2a or by previous calls to ?larrv2.

The major difference between the parallel and the sequential construction of the representation tree is that, in the parallel case, not all eigenvalues of a given cluster might be computed locally. Other processors might "own" and refine part of an eigenvalue cluster. This is crucial for scalability. Thus, communication might be necessary before the current level of the representation tree can be parsed.
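Whether such communication is needed can be read from the needil and neediu parameters described under Input Parameters: when dol ≤ needil ≤ neediu ≤ dou, all required eigenvalues are local to the processor. A hypothetical C predicate stating this condition (the name tree_is_local is an assumption, and int stands in for MKL_INT):

```c
#include <stdbool.h>

/* Hypothetical helper: true when all eigenvalues needed for the local part
   of the representation tree (indices needil..neediu) lie in the locally
   computed range dol..dou, so that no eigenvalues have to be obtained
   from other processors. */
bool tree_is_local(int dol, int dou, int needil, int neediu)
{
    return dol <= needil && needil <= neediu && neediu <= dou;
}
```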
Please note:

• The calling sequence has two additional integer parameters, dol and dou, that should satisfy m ≥ dou ≥ dol ≥ 1. These parameters are only relevant when both eigenvalues and eigenvectors are computed (?stegr2b parameter jobz = 'V'). ?larrv2 only computes the eigenvectors corresponding to eigenvalues dol through dou in w. (That is, instead of computing the eigenvectors belonging to w[0] through w[m-1], only the eigenvectors belonging to eigenvalues w[dol - 1] through w[dou - 1] are computed.) In this case, only the eigenvalues dol:dou are guaranteed to be accurately refined to all figures by Rayleigh quotient iteration.
• The additional arguments vstart, finish, ndepth, parity, and zoffset are included as a thread-safe equivalent of Fortran SAVE variables. These variables store details about the local representation tree, which is computed layer by layer. For scalability reasons, eigenvalues belonging to the locally relevant representation tree might be computed on other processors. These need to be communicated before the inspection of the RRRs can proceed on any given layer. Note that the computation has ended only when finish is nonzero; at that point, all eigenpairs between dol and dou have been computed and m is set to dou - dol + 1.
• ?larrv2 needs more workspace in z than the sequential ?larrv. The extra space is used to store the conformal embedding of the local representation tree.

Input Parameters

n The order of the matrix. n≥ 0.

vl, vu Lower and upper bounds of the interval that contains the desired
eigenvalues. vl < vu. Needed to compute gaps on the left or right end of
the extremal eigenvalues in the desired range. vu is currently not used but
is kept as a parameter in case it is needed.

d Array of size n

The n diagonal elements of the diagonal matrix D. On exit, d is overwritten.

l Array of size n

The (n-1) subdiagonal elements of the unit bidiagonal matrix L are stored in
elements 0 to n-2 of l (if the matrix is not split). At the end of each block,
the corresponding shift as given by ?larre is stored. On exit, l is
overwritten.

pivmin The minimum pivot allowed in the Sturm sequence.

isplit Array of size n

The splitting points, at which the matrix T breaks up into blocks. The first
block consists of rows/columns 1 to isplit[ 0 ], the second of rows/
columns isplit[ 0 ] + 1 through isplit[ 1 ], etc.

m The total number of input eigenvalues. 0 ≤m≤n.

dol, dou If you want to compute only selected eigenvectors from all the eigenvalues
supplied, you can specify an index range dol:dou. Otherwise, set dol = 1
and dou = m. Note that dol and dou refer to the order in which the
eigenvalues are stored in w. If you want to compute only selected
eigenpairs, the columns dol-1 to dou+1 of the eigenvector space Z contain
the computed eigenvectors. All other columns of Z are set to zero.
If dol > 1, then Z(:,dol-1-zoffset) is used.

If dou < m, then Z(:,dou+1-zoffset) is used.

needil, neediu Indices of the leftmost and rightmost eigenvalues that still need to be
included in the computation. These indices indicate whether eigenvalues
from other processors are needed to correctly compute the conformally
embedded representation tree.
When dol ≤ needil ≤ neediu ≤ dou, all required eigenvalues are local to the
processor, and no communication is required to compute its part of the
representation tree.

minrgp The minimum relative gap threshold to decide whether an eigenvalue or a
cluster boundary is reached.

rtol1, rtol2 Parameters for bisection. An interval [left,right] has converged if right-left <
max( rtol1*gap, rtol2*max(|left|,|right|) )

w Array of size n


The first m elements of w contain the approximate eigenvalues for which
eigenvectors are to be computed. The eigenvalues should be grouped by
split-off block and ordered from smallest to largest within the block. (The
output array w from ?stegr2a is expected here.) Furthermore, they are
relative to the shift of the corresponding root representation for their
block.

werr Array of size n

The first m elements contain the semiwidth of the uncertainty interval of the
corresponding eigenvalue in w.

wgap Array of size n

The separation from the right neighbor eigenvalue in w.

iblock Array of size n

The indices of the blocks (submatrices) associated with the corresponding
eigenvalues in w; iblock[i]=1 if eigenvalue w[i] belongs to the first block
from the top, iblock[i]=2 if w[i] belongs to the second block, and so on.

indexw Array of size n

The indices of the eigenvalues within each block (submatrix). For example:
indexw[i]= 10 and iblock[i]=2 imply that the (i+1)-th eigenvalue w[i] is
the 10th eigenvalue in block 2.

gers Array of size 2*n

The n Gerschgorin intervals (for i = 1, ..., n, the i-th Gerschgorin interval is
(gers[2*i-2], gers[2*i-1])). The Gerschgorin intervals should be computed
from the original unshifted matrix.
Not used but kept as a parameter for possible future use.

sdiam Array of size n

The spectral diameters for all unreduced blocks.

ldz The leading dimension of the array z. ldz ≥ 1, and if ?stegr2b parameter
jobz = 'V', ldz ≥ max(1,n).

work (workspace) Array of size 12*n

iwork (workspace) Array of size 7*n

vstart Non-zero on initialization, set to zero afterwards.

finish A flag that indicates whether all eigenpairs have been computed.

maxcls The largest cluster worked on by this processor in the representation tree.

ndepth The current depth of the representation tree. Set to zero on initial pass,
changed when the deeper levels of the representation tree are generated.

parity An internal parameter needed for the storage of the clusters on the current
level of the representation tree.

zoffset Offset for storing the eigenpairs when z is distributed in 1D-cyclic fashion.
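The bisection criterion given for rtol1 and rtol2, the Sturm-sequence pivot safeguard pivmin, and the Gerschgorin intervals can be illustrated together with a self-contained sketch. It is a simplification under stated assumptions: it works on a symmetric tridiagonal T given by its diagonal d and off-diagonal e, not on the L D L^T representation that ?larrv2 receives, and it uses a plain Sturm count rather than MKL internals. The functions gerschgorin, sturm_count, and bisect_eigenvalue are hypothetical names chosen for this illustration.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical sketch, not Intel MKL code. T is a symmetric tridiagonal
   matrix with diagonal d[0..n-1] and off-diagonal e[0..n-2]. */

static double max2(double a, double b) { return a > b ? a : b; }

/* Gerschgorin intervals in the layout described for gers: for
   i = 1..n, the i-th interval is (gers[2*i-2], gers[2*i-1]). */
void gerschgorin(int n, const double *d, const double *e, double *gers)
{
    for (int i = 0; i < n; ++i) {
        double r = 0.0;                       /* off-diagonal row sum */
        if (i > 0)     r += fabs(e[i - 1]);
        if (i < n - 1) r += fabs(e[i]);
        gers[2 * i]     = d[i] - r;
        gers[2 * i + 1] = d[i] + r;
    }
}

/* Sturm count: number of negative pivots of T - x*I = L D L^T, which is
   the number of eigenvalues of T below x (up to the pivmin safeguard).
   Pivots smaller than pivmin in magnitude are replaced by -pivmin. */
int sturm_count(int n, const double *d, const double *e,
                double pivmin, double x)
{
    assert(pivmin > 0.0);
    int count = 0;
    double q = 1.0;
    for (int i = 0; i < n; ++i) {
        q = d[i] - x - (i > 0 ? e[i - 1] * e[i - 1] / q : 0.0);
        if (fabs(q) < pivmin) q = -pivmin;
        if (q < 0.0) ++count;
    }
    return count;
}

/* Bisection for the k-th smallest eigenvalue (1-based k). Requires
   sturm_count(left) < k <= sturm_count(right). Stops as soon as
   right - left < max(rtol1*gap, rtol2*max(|left|, |right|)). */
double bisect_eigenvalue(int n, const double *d, const double *e,
                         double pivmin, int k, double left, double right,
                         double gap, double rtol1, double rtol2)
{
    assert(1 <= k && k <= n && left < right);
    for (int it = 0; it < 200; ++it) {        /* safety cap on iterations */
        double tol = max2(rtol1 * gap, rtol2 * max2(fabs(left), fabs(right)));
        if (right - left < tol) break;
        double mid = 0.5 * (left + right);
        if (sturm_count(n, d, e, pivmin, mid) >= k)
            right = mid;                      /* k-th eigenvalue lies below mid */
        else
            left = mid;                       /* k-th eigenvalue is at or above mid */
    }
    return 0.5 * (left + right);
}
```

For example, with d = {2, 2, 2} and e = {1, 1}, the eigenvalues are 2 - √2, 2, and 2 + √2; the union of the Gerschgorin intervals is [0, 4], and bisecting that interval for k = 2 converges to 2.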

Output Parameters

needil, neediu Modified in the course of the algorithm.

w Unshifted eigenvalues for which eigenvectors have already been computed.

werr Contains refined values of its input approximations.

wgap Contains refined values of its input approximations. Very small gaps are
changed.

z Array of size ldz * max(1,m)

If info = 0, the first m columns of the matrix Z, stored in the array z,
contain the orthonormal eigenvectors of the matrix T corresponding to the
input eigenvalues, with the i-th column of Z holding the eigenvector
associated with w[i - 1].

In the distributed version, only a subset of columns is accessed; see dol,
dou, and zoffset.

isuppz Array of size 2*max(1,m)

The support of the eigenvectors in z, i.e., the indices indicating the non-
zero elements in z. The i-th eigenvector is non-zero only in elements
isuppz[ 2*i-2 ] through isuppz[ 2*i-1 ].

vstart Non-zero on initialization, set to zero afterwards.

finish A flag that indicates whether all eigenpairs have been computed.

maxcls The largest cluster worked on by this processor in the representation tree.

ndepth The current depth of the representation tree. Set to zero on initial pass,
changed when the deeper levels of the representation tree are generated.

parity An internal parameter needed for the storage of the clusters on the current
level of the representation tree.

info = 0: successful exit

> 0: A problem occurred in ?larrv2.

< 0: One of the called functions signaled an internal problem.

Inspect the corresponding info parameter of that function for further
information.
=-1: Problem in ?larrb2 when refining a child's eigenvalues.

=-2: Problem in ?larrf2 when computing the RRR of a child. When a child
is inside a tight cluster, it can be difficult to find an RRR. A partial remedy
from the user's point of view is to make the parameter minrgp smaller and
recompile. However, as the orthogonality of the computed vectors is
proportional to 1/minrgp, be aware that decreasing minrgp might
reduce precision.
=-3: Problem in ?larrb2 when refining a single eigenvalue after the
Rayleigh correction was rejected.
= 5: The Rayleigh Quotient Iteration failed to converge to full accuracy.
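Because isuppz records, for each computed eigenvector, the first and last rows in which it can be nonzero, checks on z can skip the zero entries. The following sketch assumes column-major storage of Z in z with leading dimension ldz and the 1-based row indices described for isuppz, and computes the inner product of two columns over the overlap of their supports; for i ≠ j a result far from zero would indicate a loss of orthogonality. The function support_dot is hypothetical, and int stands in for MKL_INT.

```c
#include <assert.h>

/* Hypothetical helper, not an Intel MKL routine. Columns i and j of Z
   (1-based) are stored column-major in z with leading dimension ldz.
   Column i is nonzero only in rows isuppz[2*i-2] .. isuppz[2*i-1]
   (1-based row indices). */
double support_dot(const double *z, int ldz, const int *isuppz, int i, int j)
{
    assert(i >= 1 && j >= 1 && ldz >= 1);
    int lo_i = isuppz[2 * i - 2], hi_i = isuppz[2 * i - 1];
    int lo_j = isuppz[2 * j - 2], hi_j = isuppz[2 * j - 1];
    int lo = lo_i > lo_j ? lo_i : lo_j;           /* overlap of supports */
    int hi = hi_i < hi_j ? hi_i : hi_j;
    const double *zi = z + (long)(i - 1) * ldz;   /* start of column i */
    const double *zj = z + (long)(j - 1) * ldz;   /* start of column j */
    double s = 0.0;
    for (int r = lo; r <= hi; ++r)                /* empty if no overlap */
        s += zi[r - 1] * zj[r - 1];               /* 1-based row -> 0-based */
    return s;
}
```

Columns whose supports do not overlap are skipped entirely, which is where the support information pays off when the matrix is split into blocks.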


See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

?lasorte
Sorts eigenpairs so that real and complex eigenpairs are grouped together.

Syntax
void slasorte (float *s , MKL_INT *lds , MKL_INT *j , float *out , MKL_INT *info );
void dlasorte (double *s , MKL_INT *lds , MKL_INT *j , double *out , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The ?lasorte function sorts eigenpairs so that real eigenpairs are together and complex eigenpairs are
together. This makes it easy to employ 2x2 shifts, since every second subdiagonal is guaranteed to be zero.
This function does no parallel work and makes no calls.
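In a real Schur form, each real eigenvalue occupies a 1x1 diagonal block and each complex-conjugate pair occupies a 2x2 block with a nonzero subdiagonal entry. The following sketch walks the diagonal of a column-major quasi-triangular matrix S with leading dimension lds and counts the two kinds of blocks, illustrating the block structure that ?lasorte rearranges. The function count_schur_blocks is hypothetical and is not the ?lasorte implementation.

```c
#include <assert.h>

/* Hypothetical sketch, not the ?lasorte implementation. s holds a j-by-j
   matrix in real Schur form, column-major with leading dimension lds, so
   S(r,c) (0-based) is s[r + c*lds]. A nonzero subdiagonal entry S(k+1,k)
   marks a 2x2 block (complex-conjugate pair); otherwise S(k,k) is a 1x1
   block (real eigenvalue). */
void count_schur_blocks(const double *s, int lds, int j,
                        int *nreal, int *ncomplex_pairs)
{
    assert(j >= 0 && lds >= j);
    *nreal = 0;
    *ncomplex_pairs = 0;
    int k = 0;
    while (k < j) {
        if (k + 1 < j && s[(k + 1) + k * lds] != 0.0) {
            ++*ncomplex_pairs;   /* 2x2 block in rows/columns k, k+1 */
            k += 2;
        } else {
            ++*nreal;            /* 1x1 block at (k, k) */
            k += 1;
        }
    }
}
```

An odd value of nreal corresponds to the situation in which, according to the description of info, the real eigenvalues cannot be paired.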


Input Parameters

s (local)
Array of size lds*j.
On entry, a matrix already in Schur form.

lds (local)
On entry, the leading dimension of the array s; unchanged on exit.

j (local)
On entry, the order of the matrix S; unchanged on exit.

out (local)
Array of size 2*j. The work buffer required by the function.

Output Parameters

s On exit, the diagonal blocks of S have been rewritten to pair the
eigenvalues. The resulting matrix is no longer similar to the input.

out Work buffer.

info (local)
0 indicates successful completion. A nonzero value is returned if the input
matrix had an odd number of real eigenvalues, so that they could not be
paired, or if the input matrix S was not originally in Schur form.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

?lasrt2
Sorts numbers in increasing or decreasing order.

Syntax
void slasrt2 (char *id , MKL_INT *n , float *d , MKL_INT *key , MKL_INT *info );
void dlasrt2 (char *id , MKL_INT *n , double *d , MKL_INT *key , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The ?lasrt2 function is a modified version of the LAPACK function ?lasrt, which sorts the numbers in d in
increasing order (if id = 'I') or in decreasing order (if id = 'D'). It uses Quick Sort, reverting to Insertion
Sort on arrays of size ≤ 20. The size of the internal array STACK limits n to about 2^32.

Input Parameters

id = 'I': sort d in increasing order;

= 'D': sort d in decreasing order.

n The length of the array d.

d Array of size n.
On entry, the array to be sorted.

key Array of size n.

On entry, key contains a key to each of the entries in d.

Typically, key[i]= i+1 for all i = 0, ..., n-1.

Output Parameters

d On exit, d has been sorted into increasing order
(d[0] ≤ ... ≤ d[n-1]) or into decreasing order
(d[0] ≥ ... ≥ d[n-1]), depending on id.

info = 0: successful exit

< 0: if info = -i, the i-th argument had an illegal value.

key On exit, key is permuted in exactly the same manner as d was permuted
from input to output. Therefore, if key[i] = i+1 for all i = 0, ..., n-1 on input,
then d[i] on output equals d[key[i]-1] on input.
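The relationship between d and key can be reproduced with a short sketch. For brevity, it uses insertion sort for the whole array, whereas the function described above uses Quick Sort and switches to Insertion Sort only for subarrays of size ≤ 20. The function sort_with_key and its return codes are hypothetical; the codes merely mirror the "info = -i means the i-th argument was illegal" convention.

```c
#include <assert.h>

/* Hypothetical sketch, not the ?lasrt2 implementation: sorts d in
   increasing (id = 'I') or decreasing (id = 'D') order with insertion
   sort and applies exactly the same permutation to key.
   Returns 0 on success, -1 for an illegal id, -2 for n < 0. */
int sort_with_key(char id, int n, double *d, int *key)
{
    if (id != 'I' && id != 'D') return -1;
    if (n < 0) return -2;
    assert(n == 0 || (d != 0 && key != 0));
    for (int i = 1; i < n; ++i) {
        double dv = d[i];
        int kv = key[i];
        int p = i - 1;
        /* shift entries that belong after dv one slot to the right */
        while (p >= 0 && (id == 'I' ? d[p] > dv : d[p] < dv)) {
            d[p + 1] = d[p];
            key[p + 1] = key[p];
            --p;
        }
        d[p + 1] = dv;
        key[p + 1] = kv;
    }
    return 0;
}
```

Initializing key[i] = i + 1 before the call lets the caller recover the original position of every sorted value as key[i] - 1.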

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.


?stegr2
Computes selected eigenvalues and eigenvectors of a
real symmetric tridiagonal matrix.

Syntax
void sstegr2(char* jobz, char* range, MKL_INT* n, float* d, float* e, float* vl, float*
vu, MKL_INT* il, MKL_INT* iu, MKL_INT* m, float* w, float* z, MKL_INT* ldz, MKL_INT*
nzc, MKL_INT* isuppz, float* work, MKL_INT* lwork, MKL_INT* iwork, MKL_INT* liwork,
MKL_INT* dol, MKL_INT* dou, MKL_INT* zoffset, MKL_INT* info);
void dstegr2(char* jobz, char* range, MKL_INT* n, double* d, double* e, double* vl,
double* vu, MKL_INT* il, MKL_INT* iu, MKL_INT* m, double* w, double* z, MKL_INT* ldz,
MKL_INT* nzc, MKL_INT* isuppz, double* work, MKL_INT* lwork, MKL_INT* iwork, MKL_INT*
liwork, MKL_INT* dol, MKL_INT* dou, MKL_INT* zoffset, MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
?stegr2 computes selected eigenvalues and, optionally, eigenvectors of a real symmetric tridiagonal matrix
T. It is invoked in the ScaLAPACK MRRR driver p?syevr and the corresponding Hermitian version either when
only eigenvalues are to be computed, or when only a single processor is used (the sequential-like case).
?stegr2 has been adapted from LAPACK's ?stegr. Please note the following crucial changes.

1. The calling sequence has two additional integer parameters, dol and dou, that should satisfy
m ≥ dou ≥ dol ≥ 1. ?stegr2 only computes the eigenpairs corresponding to eigenvalues dol through dou in
w, indexed dol-1 through dou-1. (That is, instead of computing the eigenpairs belonging to w[0]
through w[m-1], only the eigenvectors belonging to eigenvalues w[dol-1] through w[dou-1] are
computed.) In this case, only the eigenvalues dol through dou are guaranteed to be fully accurate.
2. m is not the number of eigenvalues specified by range, but is m = dou - dol + 1. This concerns the
case where only eigenvalues are computed, but on more than one processor. Thus, in this case m refers
to the number of eigenvalues computed on this processor.
3. The arrays w and z might not contain all the wanted eigenpairs locally; instead, this information is
distributed over other processors.
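How dol and dou are assigned to the individual processors is decided by the calling ScaLAPACK driver and is not described here. As a purely hypothetical illustration (the even block split below is an assumption, not the distribution used by p?syevr), the following sketch gives each of nproc processors a contiguous index range dol:dou, so that the local values m = dou - dol + 1 add up to the global m. The function split_indices is hypothetical.

```c
#include <assert.h>

/* Hypothetical even block split of eigenvalue indices 1..m over nproc
   processors; NOT the distribution used by the ScaLAPACK driver.
   Processor rank (0-based) receives the 1-based range dol..dou.
   A processor that receives nothing gets dou = dol - 1 (an empty range),
   which violates dou >= dol, so such a processor must skip the call. */
void split_indices(int m, int nproc, int rank, int *dol, int *dou)
{
    assert(m >= 0 && nproc >= 1 && 0 <= rank && rank < nproc);
    int base = m / nproc, extra = m % nproc;
    /* the first `extra` ranks get one additional index */
    int start = rank * base + (rank < extra ? rank : extra);
    int len = base + (rank < extra ? 1 : 0);
    *dol = start + 1;
    *dou = start + len;
}
```

For m = 10 and nproc = 3, this assignment yields the ranges 1:4, 5:7, and 8:10.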


Input Parameters

jobz = 'N': Compute eigenvalues only;

= 'V': Compute eigenvalues and eigenvectors.

range = 'A': all eigenvalues will be found.

= 'V': all eigenvalues in the half-open interval (vl,vu] will be found.

= 'I': the eigenvalues of the entire matrix with indices il through iu will
be found.

n The order of the matrix. n≥ 0.

d Array of size n

On entry, the n diagonal elements of the tridiagonal matrix T. Overwritten
on exit.

e Array of size n

On entry, the (n-1) subdiagonal elements of the tridiagonal matrix T in
elements 0 to n-2 of e. e[n-1] need not be set on input, but is used
internally as workspace. Overwritten on exit.

vl, vu If range='V', the lower and upper bounds of the interval to be searched for
eigenvalues. vl < vu.

Not referenced if range = 'A' or 'I'.

il, iu If range='I', the indices (in ascending order) of the smallest eigenvalue, to
be returned in w[il-1], and largest eigenvalue, to be returned in w[iu-1].

1 ≤il≤iu≤n, if n > 0.

Not referenced if range = 'A' or 'V'.

ldz The leading dimension of the array z. ldz≥ 1, and if jobz = 'V', then ldz≥
max(1,n).

nzc The number of eigenvectors to be held in the array z, storing the matrix Z.

If range = 'A', then nzc≥ max(1,n).

If range = 'V', then nzc≥ the number of eigenvalues in (vl,vu].

If range = 'I', then nzc≥iu-il+1.

If nzc = -1, then a workspace query is assumed; the function calculates the
number of columns of the matrix Z that are needed to hold the
eigenvectors. This value is returned as the first entry of the z array, and no
error message related to nzc is issued.

lwork The size of the array work. lwork ≥ max(1,18*n) if jobz = 'V', and
lwork ≥ max(1,12*n) if jobz = 'N'. If lwork = -1, then a
workspace query is assumed; the function only calculates the optimal size
of the work array, returns this value as the first entry of the work array,
and no error message related to lwork is issued.

liwork The size of the array iwork. liwork≥ max(1,10*n) if the eigenvectors are
desired, and liwork≥ max(1,8*n) if only the eigenvalues are to be
computed.
If liwork = -1, then a workspace query is assumed; the function only
calculates the optimal size of the iwork array, returns this value as the first
entry of the iwork array, and no error message related to liwork is issued.

dol, dou From the eigenvalues w[0] through w[m-1], only eigenvectors Z(:,dol) to
Z(:,dou) are computed.

If dol > 1, then Z(:,dol-1-zoffset) is used and overwritten.


If dou < m, then Z(:,dou+1-zoffset) is used and overwritten.

zoffset Offset for storing the eigenpairs when z is distributed in 1D-cyclic fashion.
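The documented lower bounds for lwork, liwork, and nzc can be collected in a small helper. This sketch encodes only those minimums; the helper names are hypothetical, and a workspace query (lwork = -1, liwork = -1, or nzc = -1) remains the way to obtain the optimal sizes. For range = 'V', the number of eigenvalues in (vl, vu] is not known in advance, so the sketch reports -1 for nzc in that case.

```c
#include <assert.h>

/* Hypothetical helpers encoding the documented minimum sizes for
   ?stegr2; they are not part of Intel MKL. Any jobz other than 'V' is
   treated as 'N' (eigenvalues only). */
static int imax(int a, int b) { return a > b ? a : b; }

int stegr2_min_lwork(char jobz, int n)
{
    assert(n >= 0);
    return jobz == 'V' ? imax(1, 18 * n) : imax(1, 12 * n);
}

int stegr2_min_liwork(char jobz, int n)
{
    assert(n >= 0);
    return jobz == 'V' ? imax(1, 10 * n) : imax(1, 8 * n);
}

/* Minimum nzc; -1 means "unknown in advance, query with nzc = -1". */
int stegr2_min_nzc(char range, int n, int il, int iu)
{
    if (range == 'A') return imax(1, n);
    if (range == 'I') return iu - il + 1;
    return -1;   /* range = 'V' */
}
```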

Output Parameters

m Globally summed over all processors, m equals the total number of
eigenvalues found. 0 ≤ m ≤ n. If range = 'A', m = n, and if range = 'I',
m = iu-il+1. The local output equals m = dou - dol + 1.

w Array of size n

The first m elements contain the selected eigenvalues in ascending order.

Note that immediately after exiting this function, only the eigenvalues
indexed dol-1 through dou-1 are reliable on this processor because the
eigenvalue computation is done in parallel. Other processors will hold
reliable information on other parts of the w array. This information is
communicated in the ScaLAPACK driver.

z Array of size ldz * max(1,m).

If jobz = 'V', and if info = 0, then the first m columns of the matrix Z
stored in z contain some of the orthonormal eigenvectors of the matrix T
corresponding to the selected eigenvalues, with the i-th column of Z holding
the eigenvector associated with w[i-1].

If jobz = 'N', then z is not referenced.

Note: the user must ensure that at least max(1,m) columns of the matrix
are supplied in the array z; if range = 'V', the exact value of m is not known
in advance and can be computed with a workspace query by setting nzc =
-1 (see the description of nzc).

isuppz array of size 2*max(1,m)

The support of the eigenvectors in z, i.e., the indices indicating the nonzero
elements in z. The i-th computed eigenvector is nonzero only in elements
isuppz[ 2*i-2 ] through isuppz[ 2*i -1]. This is relevant in the case when
the matrix is split. isuppz is only set if n>2.

work On exit, if info = 0, work[0] returns the optimal (and minimal) lwork.

iwork On exit, if info = 0, iwork[0] returns the optimal liwork.

info On exit, info

= 0: successful exit
other: if info = -i, the i-th argument had an illegal value;

if info = 10X, internal error in ?larre2,

if info = 20X, internal error in ?larrv.

Here, the digit X = ABS( iinfo ) < 10, where iinfo is the nonzero error
code returned by ?larre2 or ?larrv, respectively.
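The info encoding above (10X for ?larre2, 20X for ?larrv, with X = |iinfo| < 10) can be decoded mechanically. The following sketch is a hypothetical helper, not an Intel MKL function; the enum names are invented for this illustration.

```c
#include <assert.h>

/* Hypothetical decoder for the documented ?stegr2 info values. */
typedef enum {
    STEGR2_OK,       /* info = 0 */
    STEGR2_BAD_ARG,  /* info = -i: the i-th argument was illegal */
    STEGR2_LARRE2,   /* info = 10X: internal error in ?larre2 */
    STEGR2_LARRV,    /* info = 20X: internal error in ?larrv */
    STEGR2_UNKNOWN   /* anything else */
} stegr2_status;

/* On return, *detail holds the argument index i or the digit X. */
stegr2_status decode_stegr2_info(int info, int *detail)
{
    assert(detail != 0);
    *detail = 0;
    if (info == 0) return STEGR2_OK;
    if (info < 0) { *detail = -info; return STEGR2_BAD_ARG; }
    if (info > 100 && info < 110) { *detail = info - 100; return STEGR2_LARRE2; }
    if (info > 200 && info < 210) { *detail = info - 200; return STEGR2_LARRV; }
    return STEGR2_UNKNOWN;
}
```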

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.


?stegr2a
Computes selected eigenvalues and initial
representations needed for eigenvector computations.

Syntax
void sstegr2a(char* jobz, char* range, MKL_INT* n, float* d, float* e, float* vl, float*
vu, MKL_INT* il, MKL_INT* iu, MKL_INT* m, float* w, float* z, MKL_INT* ldz, MKL_INT*
nzc, float* work, MKL_INT* lwork, MKL_INT* iwork, MKL_INT* liwork, MKL_INT* dol,
MKL_INT* dou, MKL_INT* needil, MKL_INT* neediu, MKL_INT* inderr, MKL_INT* nsplit,
float* pivmin, float* scale, float* wl, float* wu, MKL_INT* info);
void dstegr2a(char* jobz, char* range, MKL_INT* n, double* d, double* e, double* vl,
double* vu, MKL_INT* il, MKL_INT* iu, MKL_INT* m, double* w, double* z, MKL_INT* ldz,
MKL_INT* nzc, double* work, MKL_INT* lwork, MKL_INT* iwork, MKL_INT* liwork, MKL_INT*
dol, MKL_INT* dou, MKL_INT* needil, MKL_INT* neediu, MKL_INT* inderr, MKL_INT* nsplit,
double* pivmin, double* scale, double* wl, double* wu, MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
?stegr2a computes selected eigenvalues and initial representations needed for eigenvector computations
in ?stegr2b. It is invoked in the ScaLAPACK MRRR driver p?syevr and the corresponding Hermitian version
when both eigenvalues and eigenvectors are computed in parallel on multiple processors. For this
case, ?stegr2a implements the first part of the MRRR algorithm, parallel eigenvalue computation and finding
the root RRR. At the end of ?stegr2a, other processors might have a part of the spectrum that is needed to
continue the computation locally. Once this eigenvalue information has been received by the processor, the
computation can then proceed by calling the second part of the parallel MRRR algorithm, ?stegr2b.

Please note:

• Compared to LAPACK's ?stegr, the calling sequence has two additional integer parameters, dol and dou,
which should satisfy m ≥ dou ≥ dol ≥ 1. These parameters are only relevant for the case jobz = 'V'.
Globally invoked over all processors, ?stegr2a computes all the eigenvalues specified by range.

?stegr2a locally computes only the eigenvalues dol through dou in w, indexed dol-1 through dou-1.
(That is, instead of computing all the eigenvalues w[0] through w[m-1] accurately, only the eigenvalues
w[dol-1] through w[dou-1] are computed accurately; the other entries of w are only crude approximations.)
In this case, only the eigenvalues dol through dou are guaranteed to be fully accurate.
• m is not the number of eigenvalues specified by range, but it is m = dou - dol + 1. Instead, m refers to
the number of eigenvalues computed on this processor.
• While no eigenvectors are computed in ?stegr2a itself (this is done later in ?stegr2b), the interface is
kept consistent with the eigenvector computation: if jobz = 'V', then, depending on range and dol, dou,
?stegr2a might need more workspace in z than the original ?stegr. In particular, the arrays w and z might
not contain all the wanted eigenpairs locally; instead, this information is distributed over other processors.



Input Parameters

jobz = 'N': Compute eigenvalues only;

= 'V': Compute eigenvalues and eigenvectors.

range = 'A': all eigenvalues will be found.

= 'V': all eigenvalues in the half-open interval (vl,vu] will be found.

= 'I': the eigenvalues of the entire matrix with indices il through iu will
be found.

n The order of the matrix. n≥ 0.

d Array of size n

The n diagonal elements of the tridiagonal matrix T. Overwritten on exit.

e Array of size n

On entry, the (n-1) subdiagonal elements of the tridiagonal matrix T in
elements 0 to n-2 of e. e[n-1] need not be set on input, but is used
internally as workspace. Overwritten on exit.

vl, vu If range='V', the lower and upper bounds of the interval to be searched for
eigenvalues. vl < vu.

Not referenced if range = 'A' or 'I'.

il, iu If range='I', the indices (in ascending order) of the smallest eigenvalue, to
be returned in w[il-1], and largest eigenvalue, to be returned in w[iu-1].
1 ≤ il ≤ iu ≤ n, if n > 0.

Not referenced if range = 'A' or 'V'.

ldz The leading dimension of the array z. ldz≥ 1, and if jobz = 'V', then ldz≥
max(1,n).

nzc The number of eigenvectors to be held in the array z.

If range = 'A', then nzc≥ max(1,n).

If range = 'V', then nzc≥ the number of eigenvalues in (vl,vu].

If range = 'I', then nzc≥iu-il+1.

If nzc = -1, then a workspace query is assumed; the function calculates the
number of columns of the matrix stored in array z that are needed to hold
the eigenvectors. This value is returned as the first entry of the z array, and
no error message related to nzc is issued.

lwork The size of the array work. lwork ≥ max(1,18*n) if jobz = 'V', and lwork ≥
max(1,12*n) if jobz = 'N'.

If lwork = -1, then a workspace query is assumed; the function only
calculates the optimal size of the work array, returns this value as the first
entry of the work array, and no error message related to lwork is issued.

liwork The size of the array iwork. liwork≥ max(1,10*n) if the eigenvectors are
desired, and liwork≥ max(1,8*n) if only the eigenvalues are to be
computed.

If liwork = -1, then a workspace query is assumed; the function only
calculates the optimal size of the iwork array, returns this value as the first
entry of the iwork array, and no error message related to liwork is issued.

dol, dou From all the eigenvalues w[0] through w[m-1], only eigenvalues w[dol-1]
through w[dou-1] are computed.

Output Parameters

m Globally summed over all processors, m equals the total number of
eigenvalues found. 0 ≤ m ≤ n.

If range = 'A', m = n, and if range = 'I', m = iu-il+1.

The local output equals m = dou - dol + 1.

w Array of size n

The first m elements contain approximations to the selected eigenvalues in
ascending order. Note that immediately after exiting this function, only the
eigenvalues indexed dol-1 through dou-1 are reliable on this processor
because the eigenvalue computation is done in parallel. The other entries
are very crude preliminary approximations. Other processors hold reliable
information on these other parts of the w array.

This information is communicated in the ScaLAPACK driver.

z Array of size ldz * max(1,m).

?stegr2a does not compute eigenvectors; this is done in ?stegr2b. The
argument z, as well as all other related arguments, only appears in order to
keep the interface consistent and to signal to the user that this function is
meant to be used when eigenvectors are computed.

work On exit, if info = 0, work[0] returns the optimal (and minimal) lwork.

iwork On exit, if info = 0, iwork[0] returns the optimal liwork.

needil, neediu The indices of the leftmost and rightmost eigenvalues needed to accurately
compute the relevant part of the representation tree. This information can
be used to find out which processors have the relevant eigenvalue
information needed so that it can be communicated.

inderr Position in the workspace array work at which the eigenvalue
uncertainties (errors) are stored.

nsplit The number of blocks into which T splits. 1 ≤nsplit≤n.

pivmin The minimum pivot in the Sturm sequence for T.

scale The scaling factor for the tridiagonal T.

wl, wu The interval (wl, wu] contains all the wanted eigenvalues.
It is either given by the user or computed in ?larre2a.

info On exit, info

= 0: successful exit
other: if info = -i, the i-th argument had an illegal value;

if info = 10x, internal error in ?larre2a.


Here, the digit x = abs( iinfo ) < 10, where iinfo is the nonzero error
code returned by ?larre2a.
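needil and neediu tell a processor which eigenvalues outside its own range dol:dou it needs; when dol ≤ needil ≤ neediu ≤ dou, no communication is required. The following sketch finds the processors that own the needed indices, assuming (for this illustration only) a hypothetical even block distribution of the indices 1..m over nproc processors; the actual distribution is determined by the ScaLAPACK driver. Both functions are hypothetical.

```c
#include <assert.h>

/* Hypothetical even block distribution (NOT the ScaLAPACK driver's):
   the first m % nproc ranks own m / nproc + 1 indices, the others own
   m / nproc. Returns the 0-based rank owning the 1-based index idx. */
int owner_of_index(int m, int nproc, int idx)
{
    assert(nproc >= 1 && 1 <= idx && idx <= m);
    int base = m / nproc, extra = m % nproc;
    int k = idx - 1;                    /* 0-based index */
    int big = extra * (base + 1);       /* indices held by the larger blocks */
    if (k < big)
        return k / (base + 1);
    return extra + (k - big) / base;    /* base >= 1 whenever k >= big */
}

/* Sets need[r] = 1 for every rank r other than self that owns one of the
   indices needil..neediu, and returns the number of such ranks. */
int ranks_to_contact(int m, int nproc, int self,
                     int needil, int neediu, int *need)
{
    int count = 0;
    for (int r = 0; r < nproc; ++r) need[r] = 0;
    for (int idx = needil; idx <= neediu; ++idx) {
        int r = owner_of_index(m, nproc, idx);
        if (r != self && !need[r]) { need[r] = 1; ++count; }
    }
    return count;
}
```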

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

?stegr2b
From eigenvalues and initial representations computes
the selected eigenvalues and eigenvectors of the real
symmetric tridiagonal matrix in parallel on multiple
processors.

Syntax
void sstegr2b(char* jobz, MKL_INT* n, float* d, float* e, MKL_INT* m, float* w, float*
z, MKL_INT* ldz, MKL_INT* nzc, MKL_INT* isuppz, float* work, MKL_INT* lwork, MKL_INT*
iwork, MKL_INT* liwork, MKL_INT* dol, MKL_INT* dou, MKL_INT* needil, MKL_INT* neediu,
MKL_INT* indwlc, float* pivmin, float* scale, float* wl, float* wu, MKL_INT* vstart,
MKL_INT* finish, MKL_INT* maxcls, MKL_INT* ndepth, MKL_INT* parity, MKL_INT* zoffset,
MKL_INT* info);
void dstegr2b(char* jobz, MKL_INT* n, double* d, double* e, MKL_INT* m, double* w,
double* z, MKL_INT* ldz, MKL_INT* nzc, MKL_INT* isuppz, double* work, MKL_INT* lwork,
MKL_INT* iwork, MKL_INT* liwork, MKL_INT* dol, MKL_INT* dou, MKL_INT* needil, MKL_INT*
neediu, MKL_INT* indwlc, double* pivmin, double* scale, double* wl, double* wu, MKL_INT*
vstart, MKL_INT* finish, MKL_INT* maxcls, MKL_INT* ndepth, MKL_INT* parity, MKL_INT*
zoffset, MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
?stegr2b should only be called after a call to ?stegr2a. From eigenvalues and initial representations
computed by ?stegr2a, ?stegr2b computes the selected eigenvalues and eigenvectors of the real
symmetric tridiagonal matrix in parallel on multiple processors. It is potentially invoked multiple times on a
given processor because the locally relevant representation tree might depend on spectral information that is
"owned" by other processors and might need to be communicated.
Please note:

• The calling sequence has two additional integer parameters, dol and dou, that should satisfy
m ≥ dou ≥ dol ≥ 1. These parameters are only relevant for the case jobz = 'V'. ?stegr2b only computes the
eigenvectors corresponding to eigenvalues dol through dou in w, indexed dol-1 through dou-1. (That is,
instead of computing the eigenvectors belonging to w[0] through w[m-1], only the eigenvectors belonging
to eigenvalues w[dol-1] through w[dou-1] are computed.) In this case, only the eigenvalues dol through
dou are guaranteed to be accurately refined to all figures by Rayleigh-Quotient iteration.
• The additional arguments vstart, finish, ndepth, parity, and zoffset are included as a thread-safe
implementation equivalent to save variables. These variables store details about the local representation
tree, which is computed layerwise. For scalability reasons, eigenvalues belonging to the locally relevant
representation tree might be computed on other processors. These need to be communicated before the
inspection of the RRRs can proceed on any given layer. The computation has ended only when finish is
nonzero; at that point, all eigenpairs between dol and dou have been computed, and m is set to
dou - dol + 1.
• ?stegr2b needs more workspace in z than the sequential ?stegr. It is used to store the conformal
embedding of the local representation tree.
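Passing vstart, finish, ndepth, and parity explicitly means that the state survives between calls without hidden static variables, so independent computations do not interfere. The sketch below shows only that calling pattern: stegr2b_step is a hypothetical stand-in that pretends to process one level of the representation tree per call and is not the MKL routine, and the communication between calls is indicated by a comment only.

```c
#include <assert.h>

/* Caller-owned state mirroring the save-variable arguments of ?stegr2b. */
typedef struct {
    int vstart;  /* nonzero on the first call, zero afterwards */
    int finish;  /* nonzero once all eigenpairs dol..dou are done */
    int ndepth;  /* current depth of the representation tree */
    int parity;  /* internal bookkeeping; toggled here only for show */
} tree_state;

/* Hypothetical stand-in for one ?stegr2b call (not Intel MKL code):
   processes one tree level and reports completion after `levels`. */
void stegr2b_step(tree_state *st, int levels)
{
    if (st->vstart) {            /* initialization on the first call */
        st->ndepth = 0;
        st->parity = 0;
        st->vstart = 0;
    }
    st->ndepth += 1;             /* one more level processed */
    st->parity = 1 - st->parity;
    st->finish = (st->ndepth >= levels);
}

/* Calls the step until finish is nonzero; returns the number of calls. */
int run_until_finished(tree_state *st, int levels)
{
    assert(levels >= 1);
    int calls = 0;
    st->vstart = 1;
    st->finish = 0;
    while (!st->finish) {
        stegr2b_step(st, levels);
        ++calls;
        /* In the real driver, eigenvalues owned by other processors
           would be exchanged here before the next call. */
    }
    return calls;
}
```

Because each computation owns its tree_state, two of them can be advanced in any interleaving without affecting each other.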


Input Parameters

jobz = 'N': Compute eigenvalues only;

= 'V': Compute eigenvalues and eigenvectors.

n The order of the matrix. n≥ 0.

d Array of size n

The n diagonal elements of the tridiagonal matrix T. Overwritten on exit.

e Array of size n

The (n-1) subdiagonal elements of the tridiagonal matrix T in elements 0 to
n-2 of e. e[n-1] need not be set on input, but is used internally as
workspace. Overwritten on exit.

m The total number of eigenvalues found in ?stegr2a. 0 ≤m≤n.

w Array of size n

The first m elements contain approximations to the selected eigenvalues in
ascending order. Note that only the eigenvalues from the locally relevant
part of the representation tree, that is, all the clusters that include
eigenvalues from dol through dou, are reliable on this processor. (It does
not need to know about any others anyway.)

ldz The leading dimension of the array z. ldz≥ 1, and if jobz = 'V', then ldz≥
max(1,n).

nzc The number of eigenvectors to be held in the array z, storing the matrix Z.

lwork The size of the array work. lwork ≥ max(1,18*n) if jobz = 'V', and
lwork ≥ max(1,12*n) if jobz = 'N'.

If lwork = -1, then a workspace query is assumed; the function only
calculates the optimal size of the work array, returns this value as the first
entry of the work array, and no error message related to lwork is issued.

dol, dou From the eigenvalues w[0] through w[m-1], only eigenvectors Z(:,dol) to
Z(:,dou) are computed.

If dol > 1, then Z(:,dol-1-zoffset) is used and overwritten.


If dou < m, then Z(:,dou+1-zoffset) is used and overwritten.

needil, neediu Indices of the leftmost and rightmost eigenvalues still to be computed.
Initially computed by ?larre2a and modified in the course of the
algorithm.

pivmin The minimum pivot in the Sturm sequence for T.

scale The scaling factor for T. Used for unscaling the eigenvalues at the very end
of the algorithm.

wl, wu The interval (wl, wu] contains all the wanted eigenvalues.

vstart Non-zero on initialization, set to zero afterwards.

finish Indicates whether all eigenpairs have been computed.

maxcls The largest cluster worked on by this processor in the representation tree.

ndepth The current depth of the representation tree. Set to zero on initial pass,
changed when the deeper levels of the representation tree are generated.

parity An internal parameter needed for the storage of the clusters on the current
level of the representation tree.

zoffset Offset for storing the eigenpairs when z is distributed in 1D-cyclic fashion.

Output Parameters

z Array of size ldz * max(1,m)

If jobz = 'V', and if info = 0, then a subset of the first m columns of the
matrix Z, stored in z, contains the orthonormal eigenvectors of the matrix T
corresponding to the selected eigenvalues, with the i-th column of Z holding
the eigenvector associated with w[i-1].

See dol, dou for more information.

isuppz array of size 2*max(1,m).

work On exit, if info = 0, work[0] returns the optimal (and minimal) lwork.

iwork On exit, if info = 0, iwork[0] returns the optimal liwork.

needil, neediu Modified in the course of the algorithm.

indwlc Pointer into the workspace location where the local eigenvalue
representations are stored. ("Local eigenvalues" are those relative to the
individual shifts of the RRRs.)

vstart Non-zero on initialization, set to zero afterwards.

finish Indicates whether all eigenpairs have been computed.

maxcls The largest cluster worked on by this processor in the representation tree.

ndepth The current depth of the representation tree. Set to zero on initial pass,
changed when the deeper levels of the representation tree are generated.

parity An internal parameter needed for the storage of the clusters on the current
level of the representation tree.

info On exit, info

= 0: successful exit
other: if info = -i, the i-th argument had an illegal value;

if info = 20x, an internal error occurred in ?larrv2.

Here, the digit x = abs( iinfo ) < 10, where iinfo is the nonzero error
code returned by ?larrv2.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

?stein2
Computes the eigenvectors corresponding to specified
eigenvalues of a real symmetric tridiagonal matrix,
using inverse iteration.

Syntax
void sstein2 (MKL_INT *n , float *d , float *e , MKL_INT *m , float *w , MKL_INT
*iblock , MKL_INT *isplit , float *orfac , float *z , MKL_INT *ldz , float *work ,
MKL_INT *iwork , MKL_INT *ifail , MKL_INT *info );
void dstein2 (MKL_INT *n , double *d , double *e , MKL_INT *m , double *w , MKL_INT
*iblock , MKL_INT *isplit , double *orfac , double *z , MKL_INT *ldz , double *work ,
MKL_INT *iwork , MKL_INT *ifail , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The ?stein2 function is a modified version of the LAPACK function ?stein. It computes the eigenvectors of a
real symmetric tridiagonal matrix T corresponding to specified eigenvalues, using inverse iteration.
The maximum number of iterations allowed for each eigenvector is specified by an internal parameter maxits
(currently set to 5).

Input Parameters

n The order of the matrix T (n ≥ 0).

m The number of eigenvectors to be found (0 ≤ m ≤ n).

d, e, w Arrays:
d, of size n. The n diagonal elements of the tridiagonal matrix T.
e, of size n.
The (n-1) subdiagonal elements of the tridiagonal matrix T, in elements
e[0] to e[n-2]; e[n-1] need not be set.


w, of size n.
The first m elements of w contain the eigenvalues for which eigenvectors are
to be computed. The eigenvalues should be grouped by split-off block and
ordered from smallest to largest within the block. (The output array w
from ?stebz with ORDER = 'B' is expected here).

The size of w must be at least max(1, n).

iblock Array of size n.

The submatrix indices associated with the corresponding eigenvalues in w:
iblock[i] = 1 if eigenvalue w[i] belongs to the first submatrix from the
top, iblock[i] = 2 if w[i] belongs to the second submatrix, and so on.
(The output array iblock from ?stebz is expected here.)

isplit Array of size n.

The splitting points at which T breaks up into submatrices. The first
submatrix consists of rows/columns 1 to isplit[0], the second submatrix
consists of rows/columns isplit[0]+1 through isplit[1], and so on. (The
output array isplit from ?stebz is expected here.)

orfac Specifies which eigenvectors should be orthogonalized. Eigenvectors
that correspond to eigenvalues within orfac*||T|| of each other are
orthogonalized.

ldz The leading dimension of the output array z; ldz ≥ max(1, n).

work Workspace array of size 5n.

iwork Workspace array of size n.

Output Parameters

z Array of size ldz * m.

The computed eigenvectors. The eigenvector associated with the eigenvalue
w[i] is stored in the (i+1)-th column of the matrix Z represented by z,
i = 0, ..., m-1. Any vector that fails to converge is set to its current
iterate after maxits iterations.

ifail Array of size m.

On normal exit, all elements of ifail are zero. If one or more eigenvectors
fail to converge after maxits iterations, then their indices are stored in the
array ifail.

info info = 0: the exit is successful.

info < 0: if info = -i, the i-th argument had an illegal value.
info > 0: if info = i, then i eigenvectors failed to converge in maxits
iterations. Their indices are stored in the array ifail.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

?dbtf2
Computes an LU factorization of a general band matrix
with no pivoting (local unblocked algorithm).

Syntax
void sdbtf2 (MKL_INT *m , MKL_INT *n , MKL_INT *kl , MKL_INT *ku , float *ab , MKL_INT
*ldab , MKL_INT *info );
void ddbtf2 (MKL_INT *m , MKL_INT *n , MKL_INT *kl , MKL_INT *ku , double *ab , MKL_INT
*ldab , MKL_INT *info );
void cdbtf2 (MKL_INT *m , MKL_INT *n , MKL_INT *kl , MKL_INT *ku , MKL_Complex8 *ab ,
MKL_INT *ldab , MKL_INT *info );
void zdbtf2 (MKL_INT *m , MKL_INT *n , MKL_INT *kl , MKL_INT *ku , MKL_Complex16 *ab ,
MKL_INT *ldab , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The ?dbtf2 function computes an LU factorization of a general real or complex m-by-n band matrix A without
partial pivoting (no row interchanges).
This is the unblocked version of the algorithm, calling BLAS Routines and Functions.

Input Parameters

m The number of rows of the matrix A (m ≥ 0).

n The number of columns in A (n ≥ 0).

kl The number of sub-diagonals within the band of A (kl ≥ 0).

ku The number of super-diagonals within the band of A (ku ≥ 0).

ab Array of size ldab * n.

The matrix A in band storage, in rows kl+1 to 2kl+ku+1; rows 1 to kl
of the array need not be set. The j-th column of A is stored in the
array ab as follows: ab[kl+ku+i-j+(j-1)*ldab] = A(i,j) for
max(1,j-ku) ≤ i ≤ min(m,j+kl).

ldab The leading dimension of the array ab
(ldab ≥ 2kl + ku + 1).

Output Parameters

ab On exit, details of the factorization: U is stored as an upper triangular band
matrix with kl+ku superdiagonals in rows 1 to kl+ku+1, and the multipliers
used during the factorization are stored in rows kl+ku+2 to 2*kl+ku+1.
See the Application Notes below for further details.

info = 0: successful exit

< 0: if info = -i, the i-th argument had an illegal value.

> 0: if info = i, the matrix element U(i,i) is 0. The factorization has been
completed, but the factor U is exactly singular. Division by 0 will occur if
you use the factor U for solving a system of linear equations.

Application Notes
The band storage scheme is illustrated by the following example, when m = n = 6, kl = 2, ku = 1:

The function does not use array elements marked *; elements marked + need not be set on entry, but the
function requires them to store elements of U, because of fill-in resulting from the row interchanges.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

?dbtrf
Computes an LU factorization of a general band matrix
with no pivoting (local blocked algorithm).

Syntax
void sdbtrf (MKL_INT *m , MKL_INT *n , MKL_INT *kl , MKL_INT *ku , float *ab , MKL_INT
*ldab , MKL_INT *info );
void ddbtrf (MKL_INT *m , MKL_INT *n , MKL_INT *kl , MKL_INT *ku , double *ab , MKL_INT
*ldab , MKL_INT *info );
void cdbtrf (MKL_INT *m , MKL_INT *n , MKL_INT *kl , MKL_INT *ku , MKL_Complex8 *ab ,
MKL_INT *ldab , MKL_INT *info );
void zdbtrf (MKL_INT *m , MKL_INT *n , MKL_INT *kl , MKL_INT *ku , MKL_Complex16 *ab ,
MKL_INT *ldab , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
This function computes an LU factorization of a real or complex m-by-n band matrix A without partial pivoting
(no row interchanges).
This is the blocked version of the algorithm, calling BLAS Routines and Functions.

Input Parameters

m The number of rows of the matrix A (m ≥ 0).

n The number of columns in A (n ≥ 0).

kl The number of sub-diagonals within the band of A (kl ≥ 0).

ku The number of super-diagonals within the band of A (ku ≥ 0).

ab Array of size ldab * n.

The matrix A in band storage, in rows kl+1 to 2kl+ku+1; rows 1 to kl need
not be set. The j-th column of A is stored in the array ab as follows:
ab[kl+ku+i-j+(j-1)*ldab] = A(i,j) for max(1,j-ku) ≤ i ≤ min(m,j+kl).

ldab The leading dimension of the array ab.

(ldab ≥ 2kl + ku + 1)

Output Parameters

ab On exit, details of the factorization: U is stored as an upper triangular band
matrix with kl+ku superdiagonals in rows 1 to kl+ku+1, and the multipliers
used during the factorization are stored in rows kl+ku+2 to 2*kl+ku+1.
See the Application Notes below for further details.

info = 0: successful exit

< 0: if info = -i, the i-th argument had an illegal value.
> 0: if info = i, the matrix element U(i,i) is 0. The factorization
has been completed, but the factor U is exactly singular. Division by
0 will occur if you use the factor U for solving a system of linear
equations.

Application Notes
The band storage scheme is illustrated by the following example, when m = n = 6, kl = 2, ku = 1:

The function does not use array elements marked *.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

?dttrf
Computes an LU factorization of a general tridiagonal
matrix with no pivoting (local blocked algorithm).

Syntax
void sdttrf (MKL_INT *n , float *dl , float *d , float *du , MKL_INT *info );


void ddttrf (MKL_INT *n , double *dl , double *d , double *du , MKL_INT *info );
void cdttrf (MKL_INT *n , MKL_Complex8 *dl , MKL_Complex8 *d , MKL_Complex8 *du ,
MKL_INT *info );
void zdttrf (MKL_INT *n , MKL_Complex16 *dl , MKL_Complex16 *d , MKL_Complex16 *du ,
MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The ?dttrf function computes an LU factorization of a real or complex tridiagonal matrix A using elimination
without partial pivoting.
The factorization has the form A = L*U, where L is a product of unit lower bidiagonal matrices and U is upper
triangular with nonzeros only in the main diagonal and first superdiagonal.

Input Parameters

n The order of the matrix A (n ≥ 0).

dl, d, du Arrays containing elements of A.

The array dl of size (n-1) contains the sub-diagonal elements of A.

The array d of size n contains the diagonal elements of A.

The array du of size (n-1) contains the super-diagonal elements of A.

Output Parameters

dl Overwritten by the (n-1) multipliers that define the matrix L from the LU
factorization of A.

d Overwritten by the n diagonal elements of the upper triangular matrix U
from the LU factorization of A.

du Overwritten by the (n-1) elements of the first super-diagonal of U.

info = 0: successful exit

< 0: if info = -i, the i-th argument had an illegal value.

> 0: if info = i, the matrix element U(i,i) is exactly 0. The factorization has
been completed, but the factor U is exactly singular. Division by 0 will occur
if you use the factor U for solving a system of linear equations.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

?dttrsv
Solves a general tridiagonal system of linear equations
using the LU factorization computed by ?dttrf.

Syntax
void sdttrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *nrhs , float *dl , float
*d , float *du , float *b , MKL_INT *ldb , MKL_INT *info );

void ddttrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *nrhs , double *dl ,
double *d , double *du , double *b , MKL_INT *ldb , MKL_INT *info );
void cdttrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *nrhs , MKL_Complex8
*dl , MKL_Complex8 *d , MKL_Complex8 *du , MKL_Complex8 *b , MKL_INT *ldb , MKL_INT
*info );
void zdttrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *nrhs , MKL_Complex16
*dl , MKL_Complex16 *d , MKL_Complex16 *du , MKL_Complex16 *b , MKL_INT *ldb , MKL_INT
*info );

Include Files
• mkl_scalapack.h

Description
The ?dttrsv function solves one of the following systems of linear equations:

L*X = B, L^T*X = B, or L^H*X = B,

U*X = B, U^T*X = B, or U^H*X = B

with factors of the tridiagonal matrix A from the LU factorization computed by ?dttrf.

Input Parameters

uplo Must be 'L' or 'U'. Specifies whether to solve with L (uplo = 'L') or with U (uplo = 'U').

trans Must be 'N' or 'T' or 'C'.

Indicates the form of the equations:

If trans = 'N', then A*X = B is solved for X (no transpose).

If trans = 'T', then A^T*X = B is solved for X (transpose).

If trans = 'C', then A^H*X = B is solved for X (conjugate transpose).

n The order of the matrix A (n ≥ 0).

nrhs The number of right-hand sides, that is, the number of columns in the
matrix B (nrhs ≥ 0).

dl, d, du, b The array dl of size (n - 1) contains the (n - 1) multipliers that define the
matrix L from the LU factorization of A.
The array d of size n contains the n diagonal elements of the upper triangular
matrix U from the LU factorization of A.
The array du of size (n - 1) contains the (n - 1) elements of the first
superdiagonal of U.
On entry, the array b of size ldb * nrhs contains the right-hand side
matrix B.

ldb The leading dimension of the array b; ldb ≥ max(1, n).

Output Parameters

b Overwritten by the solution matrix X.

info If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

?pttrsv
Solves a symmetric (Hermitian) positive-definite
tridiagonal system of linear equations, using the
L*D*LH factorization computed by ?pttrf.

Syntax
void spttrsv (char *trans , MKL_INT *n , MKL_INT *nrhs , float *d , float *e , float
*b , MKL_INT *ldb , MKL_INT *info );
void dpttrsv (char *trans , MKL_INT *n , MKL_INT *nrhs , double *d , double *e , double
*b , MKL_INT *ldb , MKL_INT *info );
void cpttrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *nrhs , float *d ,
MKL_Complex8 *e , MKL_Complex8 *b , MKL_INT *ldb , MKL_INT *info );
void zpttrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *nrhs , double *d ,
MKL_Complex16 *e , MKL_Complex16 *b , MKL_INT *ldb , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The ?pttrsv function solves one of the triangular systems:

L^T*X = B or L*X = B for real flavors,

or
L*X = B or L^H*X = B,
U*X = B or U^H*X = B for complex flavors,
where L (or U for complex flavors) is the Cholesky factor of a symmetric (Hermitian) positive-definite
tridiagonal matrix A such that
A = L*D*L^H (computed by spttrf/dpttrf)
or
A = U^H*D*U or A = L*D*L^H (computed by cpttrf/zpttrf).

Input Parameters

uplo Must be 'U' or 'L'. Used by the complex flavors only.

Specifies whether the superdiagonal or the subdiagonal of the tridiagonal
matrix A is stored and the form of the factorization:
If uplo = 'U', e is the superdiagonal of U, and A = U^H*D*U or A =
L*D*L^H;
if uplo = 'L', e is the subdiagonal of L, and A = L*D*L^H.

The two forms are equivalent if A is real.

trans Specifies the form of the system of equations:
for real flavors:
if trans = 'N': L*X = B (no transpose)

if trans = 'T': L^T*X = B (transpose)

for complex flavors:

if trans = 'N': U*X = B or L*X = B (no transpose)

if trans = 'C': U^H*X = B or L^H*X = B (conjugate transpose).

n The order of the tridiagonal matrix A. n ≥ 0.

nrhs The number of right hand sides, that is, the number of columns of the
matrix B. nrhs ≥ 0.

d array of size n. The n diagonal elements of the diagonal matrix D from the
factorization computed by ?pttrf.

e array of size (n-1). The (n-1) off-diagonal elements of the unit bidiagonal
factor U or L from the factorization computed by ?pttrf. See uplo.

b array of size ldb*nrhs.

On entry, the right hand side matrix B.

ldb The leading dimension of the array b.

ldb ≥ max(1, n).

Output Parameters

b On exit, the solution matrix X.

info = 0: successful exit

< 0: if info = -i, the i-th argument had an illegal value.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

?steqr2
Computes all eigenvalues and, optionally,
eigenvectors of a symmetric tridiagonal matrix using
the implicit QL or QR method.

Syntax
void ssteqr2 (char *compz , MKL_INT *n , float *d , float *e , float *z , MKL_INT *ldz ,
MKL_INT *nr , float *work , MKL_INT *info );
void dsteqr2 (char *compz , MKL_INT *n , double *d , double *e , double *z , MKL_INT
*ldz , MKL_INT *nr , double *work , MKL_INT *info );

Include Files
• mkl_scalapack.h


Description
The ?steqr2 function is a modified version of the LAPACK function ?steqr. It computes all
eigenvalues and, optionally, eigenvectors of a symmetric tridiagonal matrix using the implicit QL or QR
method. ?steqr2 is modified from ?steqr to allow each ScaLAPACK process running ?steqr2 to perform
updates on a distributed matrix Q. For an example of proper usage of ?steqr2, see the ScaLAPACK
function p?syev.

Input Parameters

compz Must be 'N' or 'I'.

If compz = 'N', the function computes eigenvalues only. If compz = 'I',
the function computes the eigenvalues and eigenvectors of the tridiagonal
matrix T; z must be initialized to the identity matrix by p?laset or ?laset
prior to entering this function.

n The order of the matrix T (n ≥ 0).

d, e, work Arrays:
d contains the diagonal elements of T. The size of d must be at least
max(1, n).
e contains the (n-1) subdiagonal elements of T. The size of e must be at
least max(1, n-1).

work is a workspace array. The size of work is max(1, 2*n-2). If compz =
'N', then work is not referenced.

z (local)
Array of global size n*n and of local size ldz*nr.

On entry, if compz = 'V', then z contains the orthogonal matrix used in the
reduction to tridiagonal form.

ldz The leading dimension of the array z. Constraints:

ldz ≥ 1,
ldz ≥ max(1, n), if eigenvectors are desired.

nr nr = max(1, numroc(n, nb, myprow, 0, nprocs)).

If compz = 'N', then nr is not referenced.

Output Parameters

d On exit, the eigenvalues in ascending order, if info = 0.

e On exit, e has been destroyed.

z On exit, if info = 0, then
if compz = 'V', z contains the orthonormal eigenvectors of the original
symmetric matrix, and if compz = 'I', z contains the orthonormal
eigenvectors of the symmetric tridiagonal matrix. If compz = 'N', then z is
not referenced.

info info = 0: the exit is successful.

info < 0: if info = -i, the i-th argument had an illegal value.
info > 0: the algorithm has failed to find all the eigenvalues in a total of
30*n iterations;

if info = i, then i elements of e have not converged to zero; on exit, d
and e contain the elements of a symmetric tridiagonal matrix, which is
orthogonally similar to the original matrix.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

?trmvt
Performs matrix-vector operations.

Syntax
void strmvt (const char* uplo, const MKL_INT* n, const float* t, const MKL_INT* ldt,
float* x, const MKL_INT* incx, const float* y, const MKL_INT* incy, float* w, const
MKL_INT* incw, const float* z, const MKL_INT* incz);
void dtrmvt (const char* uplo, const MKL_INT* n, const double* t, const MKL_INT* ldt,
double* x, const MKL_INT* incx, const double* y, const MKL_INT* incy, double* w, const
MKL_INT* incw, const double* z, const MKL_INT* incz);
void ctrmvt (const char* uplo, const MKL_INT* n, const MKL_Complex8* t, const MKL_INT*
ldt, MKL_Complex8* x, const MKL_INT* incx, const MKL_Complex8* y, const MKL_INT* incy,
MKL_Complex8* w, const MKL_INT* incw, const MKL_Complex8* z, const MKL_INT* incz);
void ztrmvt (const char* uplo, const MKL_INT* n, const MKL_Complex16* t, const MKL_INT*
ldt, MKL_Complex16* x, const MKL_INT* incx, const MKL_Complex16* y, const MKL_INT*
incy, MKL_Complex16* w, const MKL_INT* incw, const MKL_Complex16* z, const MKL_INT*
incz);

Include Files
• mkl_scalapack.h

Description
?trmvt performs the following matrix-vector operations:
strmvt and dtrmvt: x := T^T*y and w := T*z,

ctrmvt and ztrmvt: x := T^H*y (that is, conjg( T' )*y) and w := T*z,

where x, y, w, and z are n-element vectors and T is an n-by-n upper or lower triangular matrix.

Input Parameters

uplo On entry, uplo specifies whether the matrix T is an upper or lower triangular
matrix as follows:

uplo = 'U' or 'u'

T is an upper triangular matrix.
uplo = 'L' or 'l'
T is a lower triangular matrix.
Unchanged on exit.

n On entry, n specifies the order of the matrix T. n must be at least zero.

Unchanged on exit.

t Array of size ldt * n.

Before entry with uplo = 'U' or 'u', the leading n-by-n upper triangular part
of the array t must contain the upper triangular matrix and the strictly
lower triangular part of t is not referenced.

Before entry with uplo = 'L' or 'l', the leading n-by-n lower triangular part
of the array t must contain the lower triangular matrix and the strictly
upper triangular part of t is not referenced.

ldt On entry, ldt specifies the first dimension of T as declared in the calling
(sub)program. ldt must be at least max( 1, n ).

Unchanged on exit.

incx On entry, incx specifies the increment for the elements of x. incx must
not be zero.
Unchanged on exit.

y Array of size at least ( 1 + ( n - 1 )*abs( incy ) ).

Before entry, the incremented array y must contain the n element vector y.

Unchanged on exit.

incy On entry, incy specifies the increment for the elements of y. incy must
not be zero.
Unchanged on exit.

incw On entry, incw specifies the increment for the elements of w. incw must
not be zero.
Unchanged on exit.

z Array of size at least ( 1 + ( n - 1 )*abs( incz ) ).

Before entry, the incremented array z must contain the n element vector z.

Unchanged on exit.

incz On entry, incz specifies the increment for the elements of z. incz must
not be zero.
Unchanged on exit.

Output Parameters

t Before entry with uplo = 'U' or 'u', the leading n-by-n upper
triangular part of the array t must contain the upper triangular matrix
and the strictly lower triangular part of t is not referenced.

Before entry with uplo = 'L' or 'l', the leading n-by-n lower triangular
part of the array t must contain the lower triangular matrix and the
strictly upper triangular part of t is not referenced.

x Array of size at least ( 1 + ( n - 1 )*abs( incx ) ).

On exit, x = T^T*y (x = T^H*y for ctrmvt and ztrmvt).

w Array of size at least ( 1 + ( n - 1 )*abs( incw ) ).

On exit, w = T*z.

pilaenv
Returns the positive integer value of the logical
blocking size.

Syntax
MKL_INT pilaenv (const MKL_INT *ictxt , const char *prec);

Include Files
• mkl_pblas.h

Description
pilaenv returns the positive integer value of the logical blocking size. This value is machine- and
precision-specific. This version provides a logical blocking size that should give good, though not optimal,
performance on many of the currently available distributed-memory concurrent computers. You are encouraged
to modify this subroutine to set this tuning parameter for your particular machine.

Input Parameters

ictxt On entry, ictxt specifies the BLACS context handle, indicating the global
context of the operation. The context itself is global, but the value of ictxt
is local.

prec On input, prec specifies the precision for which the logical block size should
be returned as follows:
prec = 'S' or 's' single precision real,
prec = 'D' or 'd' double precision real,
prec = 'C' or 'c' single precision complex,
prec = 'Z' or 'z' double precision complex,
prec = 'I' or 'i' integer.

Application Notes
Before modifying this routine to tune the library performance on your system, be aware of the following:

1. The value this function returns must be strictly larger than zero.
2. If you are planning to link your program with different instances of the library (for example, on a
heterogeneous machine), you must compile each instance of the library with exactly the same version
of this routine to ensure interoperability.

pilaenvx
Called from the ScaLAPACK routines to choose
problem-dependent parameters for the local
environment.

Syntax
MKL_INT pilaenvx (const MKL_INT* ictxt, const MKL_INT* ispec, const char* name, const
char* opts, const MKL_INT* n1, const MKL_INT* n2, const MKL_INT* n3, const MKL_INT*

Include Files
• mkl.h

Description
pilaenvx is called from the ScaLAPACK routines to choose problem-dependent parameters for the local
environment. See ispec for a description of the parameters. This version provides a set of parameters which
should give good, though not optimal, performance on many of the currently available computers. You are
encouraged to modify this subroutine to set the tuning parameters for your particular machine using the
option and problem size information in the arguments.

Input Parameters

ictxt (local input) On entry, ictxt specifies the BLACS context handle, indicating
the global context of the operation. The context itself is global, but the
value of ictxt is local.

ispec (global input)

Specifies the parameter to be returned as the value of pilaenvx.

= 1: the optimal blocksize; if this value is 1, an unblocked algorithm will
give the best performance (unlikely).
= 2: the minimum block size for which the block routine should be used; if
the usable block size is less than this value, an unblocked routine should be
used.
= 3: the crossover point (in a block routine, for N less than this value, an
unblocked routine should be used).
= 4: the number of shifts, used in the nonsymmetric eigenvalue routines
(DEPRECATED).
= 5: the minimum column dimension for blocking to be used; rectangular
blocks must have dimension at least k by m, where k is given by
pilaenvx(2,...) and m by pilaenvx(5,...).
= 6: the crossover point for the SVD (when reducing an m by n matrix to
bidiagonal form, if max(m,n)/min(m,n) exceeds this value, a QR
factorization is used first to reduce the matrix to a triangular form).
= 7: the number of processors.
= 8: the crossover point for the multishift QR method for nonsymmetric
eigenvalue problems (DEPRECATED).

= 9: maximum size of the subproblems at the bottom of the computation
tree in the divide-and-conquer algorithm (used by ?gelsd and ?gesdd).

= 10: IEEE NaN arithmetic can be trusted not to trap.

= 11: infinity arithmetic can be trusted not to trap.

12 <= ispec <= 16: parameters for p?hseqr or one of its subroutines; see piparmq for a detailed explanation.

17 <= ispec <= 22: parameters for pb?trord/p?hseqr (not all), as follows:

= 17: maximum number of concurrent computational windows;
= 18: number of eigenvalues/bulges in each window;
= 19: computational window size;
= 20: minimal percentage of FLOPS required for performing matrix-matrix
multiplications instead of pipelined orthogonal transformations;
= 21: width of block column slabs for row-wise application of pipelined
orthogonal transformations in their factorized form;
= 22: the maximum number of eigenvalues moved together over a process
border;
= 23: the number of processors involved in Aggressive Early Deflation
(AED);
= 99: maximum iteration chunksize in OpenMP parallelization.

name (global input)

The name of the calling subroutine, in either upper case or lower case.

opts (global input) The character options to the subroutine name, concatenated
into a single character string. For example, uplo = 'U', trans = 'T',
and diag = 'N' for a triangular routine would be specified as opts =
'UTN'.

n1, n2, n3, and n4 (global input) Problem dimensions for the subroutine name; these may not
all be required.

Output Parameters

result (global output)

>= 0: the value of the parameter specified by ispec.

< 0: if pilaenvx = -k, the k-th argument had an illegal value.

Application Notes
The following conventions have been used when calling ilaenv from the LAPACK routines:

1. opts is a concatenation of all of the character options to subroutine name, in the same order that they
appear in the argument list for name, even if they are not used in determining the value of the
parameter specified by ispec.
2. The problem dimensions n1, n2, n3, and n4 are specified in the order that they appear in the argument
list for name. n1 is used first, n2 second, and so on, and unused problem dimensions are passed a value
of -1.
3. The parameter value returned by ilaenv is checked for validity in the calling subroutine. For example,
ilaenv is used to retrieve the optimal block size for strtri as follows:

NB = ilaenv( 1, 'STRTRI', UPLO // DIAG, N, -1, -1, -1 );

if( NB<=1 ) {
NB = MAX( 1, N );
}
The same conventions hold for this ScaLAPACK-style variant.

pjlaenv
Called from the ScaLAPACK symmetric and Hermitian
tailored eigen-routines to choose problem-dependent
parameters for the local environment.

Syntax
MKL_INT pjlaenv (const MKL_INT* ictxt, const MKL_INT* ispec, const char* name, const
char* opts, const MKL_INT* n1, const MKL_INT* n2, const MKL_INT* n3, const MKL_INT*
n4);

Include Files
• mkl.h

Description
pjlaenv is called from the ScaLAPACK symmetric and Hermitian tailored eigen-routines to choose problem-
dependent parameters for the local environment. See ispec for a description of the parameters. This version
provides a set of parameters which should give good, though not optimal, performance on many of the
currently available computers. You are encouraged to modify this subroutine to set the tuning parameters for
your particular machine using the option and problem size information in the arguments.

Input Parameters

ispec (global input) Specifies the parameter to be returned as the value of
pjlaenv.
= 1: the data layout blocksize;
= 2: the panel blocking factor;
= 3: the algorithmic blocking factor;
= 4: execution path control;
= 5: maximum size for direct call to the LAPACK routine.

name (global input) The name of the calling subroutine, in either upper case or
lower case.

n1, n2, n3, and n4 (global input) Problem dimensions for the subroutine name; these may not
all be required. At present, only n1 is used, and only for 'TTRD'.

Output Parameters

result (global or local output)

>= 0: the value of the parameter specified by ispec.

< 0: if pjlaenv = -k, the k-th argument had an illegal value.

Most parameters set via a call to pjlaenv must be identical on all
processors, so pjlaenv returns the same value to all processors (global
output). Some parameters, in particular the panel blocking factor, can
differ between processors, so pjlaenv can return different values on
different processors (local output).

Application Notes
The following conventions have been used when calling pjlaenv from the ScaLAPACK routines:

1. opts is a concatenation of all of the character options to subroutine name, in the same order that they
appear in the argument list for name, even if they are not used in determining the value of the
parameter specified by ispec.
2. The problem dimensions n1, n2, n3, and n4 are specified in the order that they appear in the argument
list for name. n1 is used first, n2 second, and so on, and unused problem dimensions are passed a
value of -1.
3. The parameter value returned by pjlaenv is checked for validity in the calling subroutine. For
example, pjlaenv is used to retrieve the optimal blocksize for STRTRI as follows:

NB = pjlaenv( ICTXT, 1, 'STRTRI', UPLO // DIAG, N, -1, -1, -1 );

IF( NB<=1 ) {
NB = MAX( 1, N );
}
pjlaenv is patterned after ilaenv and keeps the same interface in anticipation of future needs, even though
pjlaenv is only sparsely used at present in ScaLAPACK. Most ScaLAPACK codes use the input data layout
blocking factor as the algorithmic blocking factor; hence there is no need or opportunity to set the
algorithmic or data decomposition blocking factor. pXYYtevx.f, pXYYtgvx.f, and pXYYttrd.f are the
only codes that call pjlaenv. pXYYtevx.f and pXYYtgvx.f redistribute the data to the best data layout
for each transformation. pXYYttrd.f uses a data layout blocking factor of 1.

Additional ScaLAPACK Routines

void pchettrd (const char *uplo , const MKL_INT *n , MKL_Complex8 *a , const MKL_INT
*ia , const MKL_INT *ja , const MKL_INT *desca , float *d , float *e , MKL_Complex8
*tau , MKL_Complex8 *work , const MKL_INT *lwork , MKL_INT *info );
void pzhettrd (const char *uplo , const MKL_INT *n , MKL_Complex16 *a , const MKL_INT
*ia , const MKL_INT *ja , const MKL_INT *desca , double *d , double *e , MKL_Complex16
*tau , MKL_Complex16 *work , const MKL_INT *lwork , MKL_INT *info );
void pslaed0 (const MKL_INT *n , float *d , float *e , float *q , const MKL_INT *iq ,
const MKL_INT *jq , const MKL_INT *descq , float *work , MKL_INT *iwork , MKL_INT
*info );

void pdlaed0 (const MKL_INT *n , double *d , double *e , double *q , const MKL_INT
*iq , const MKL_INT *jq , const MKL_INT *descq , double *work , MKL_INT *iwork ,
MKL_INT *info );
void pslaed1 (const MKL_INT *n , const MKL_INT *n1 , float *d , const MKL_INT *id ,
float *q , const MKL_INT *iq , const MKL_INT *jq , const MKL_INT *descq , const float
*rho , float *work , MKL_INT *iwork , MKL_INT *info );
void pdlaed1 (const MKL_INT *n , const MKL_INT *n1 , double *d , const MKL_INT *id ,
double *q , const MKL_INT *iq , const MKL_INT *jq , const MKL_INT *descq , const double
*rho , double *work , MKL_INT *iwork , MKL_INT *info );
void pslaed2 (const MKL_INT *ictxt , MKL_INT *k , const MKL_INT *n , const MKL_INT
*n1 , const MKL_INT *nb , float *d , const MKL_INT *drow , const MKL_INT *dcol , float
*q , const MKL_INT *ldq , float *rho , const float *z , float *w , float *dlamda , float
*q2 , const MKL_INT *ldq2 , float *qbuf , MKL_INT *ctot , MKL_INT *psm , const MKL_INT
*npcol , MKL_INT *indx , MKL_INT *indxc , MKL_INT *indxp , MKL_INT *indcol , MKL_INT
*coltyp , MKL_INT *nn , MKL_INT *nn1 , MKL_INT *nn2 , MKL_INT *ib1 , MKL_INT *ib2 );
void pdlaed2 (const MKL_INT *ictxt , MKL_INT *k , const MKL_INT *n , const MKL_INT
*n1 , const MKL_INT *nb , double *d , const MKL_INT *drow , const MKL_INT *dcol ,
double *q , const MKL_INT *ldq , double *rho , const double *z , double *w , double
*dlamda , double *q2 , const MKL_INT *ldq2 , double *qbuf , MKL_INT *ctot , MKL_INT
*psm , const MKL_INT *npcol , MKL_INT *indx , MKL_INT *indxc , MKL_INT *indxp , MKL_INT
*indcol , MKL_INT *coltyp , MKL_INT *nn , MKL_INT *nn1 , MKL_INT *nn2 , MKL_INT *ib1 ,
MKL_INT *ib2 );
void pslaed3 (const MKL_INT *ictxt , MKL_INT *k , const MKL_INT *n , const MKL_INT
*nb , float *d , const MKL_INT *drow , const MKL_INT *dcol , float *rho , float
*dlamda , float *w , const float *z , float *u , const MKL_INT *ldu , float *buf ,
MKL_INT *indx , MKL_INT *indcol , MKL_INT *indrow , MKL_INT *indxr , MKL_INT *indxc ,
MKL_INT *ctot , const MKL_INT *npcol , MKL_INT *info );
void pdlaed3 (const MKL_INT *ictxt , MKL_INT *k , const MKL_INT *n , const MKL_INT
*nb , double *d , const MKL_INT *drow , const MKL_INT *dcol , double *rho , double
*dlamda , double *w , const double *z , double *u , const MKL_INT *ldu , double *buf ,
MKL_INT *indx , MKL_INT *indcol , MKL_INT *indrow , MKL_INT *indxr , MKL_INT *indxc ,
MKL_INT *ctot , const MKL_INT *npcol , MKL_INT *info );
void pslaedz (const MKL_INT *n , const MKL_INT *n1 , const MKL_INT *id , const float
*q , const MKL_INT *iq , const MKL_INT *jq , const MKL_INT *ldq , const MKL_INT
*descq , float *z , float *work );
void pdlaedz (const MKL_INT *n , const MKL_INT *n1 , const MKL_INT *id , const double
*q , const MKL_INT *iq , const MKL_INT *jq , const MKL_INT *ldq , const MKL_INT
*descq , double *z , double *work );
void pdlaiectb (const double *sigma , const MKL_INT *n , const double *d , MKL_INT
*count );
void pdlaiectl (const double *sigma , const MKL_INT *n , const double *d , MKL_INT
*count );
void slamov (const char *UPLO , const MKL_INT *M , const MKL_INT *N , const float *A ,
const MKL_INT *LDA , float *B , const MKL_INT *LDB );
void dlamov (const char *UPLO , const MKL_INT *M , const MKL_INT *N , const double *A ,
const MKL_INT *LDA , double *B , const MKL_INT *LDB );
void clamov (const char *UPLO , const MKL_INT *M , const MKL_INT *N , const
MKL_Complex8 *A , const MKL_INT *LDA , MKL_Complex8 *B , const MKL_INT *LDB );
void zlamov (const char *UPLO , const MKL_INT *M , const MKL_INT *N , const
MKL_Complex16 *A , const MKL_INT *LDA , MKL_Complex16 *B , const MKL_INT *LDB );

void pslamr1d (const MKL_INT *n , float *a , const MKL_INT *ia , const MKL_INT *ja ,
const MKL_INT *desca , float *b , const MKL_INT *ib , const MKL_INT *jb , const MKL_INT
*descb );
void pdlamr1d (const MKL_INT *n , double *a , const MKL_INT *ia , const MKL_INT *ja ,
const MKL_INT *desca , double *b , const MKL_INT *ib , const MKL_INT *jb , const
MKL_INT *descb );
void pclamr1d (const MKL_INT *n , MKL_Complex8 *a , const MKL_INT *ia , const MKL_INT
*ja , const MKL_INT *desca , MKL_Complex8 *b , const MKL_INT *ib , const MKL_INT *jb ,
const MKL_INT *descb );
void pzlamr1d (const MKL_INT *n , MKL_Complex16 *a , const MKL_INT *ia , const MKL_INT
*ja , const MKL_INT *desca , MKL_Complex16 *b , const MKL_INT *ib , const MKL_INT *jb ,
const MKL_INT *descb );
void clanv2 (MKL_Complex8 *a , MKL_Complex8 *b , MKL_Complex8 *c , MKL_Complex8 *d ,
MKL_Complex8 *rt1 , MKL_Complex8 *rt2 , float *cs , MKL_Complex8 *sn );
void zlanv2 (MKL_Complex16 *a , MKL_Complex16 *b , MKL_Complex16 *c , MKL_Complex16
*d , MKL_Complex16 *rt1 , MKL_Complex16 *rt2 , double *cs , MKL_Complex16 *sn );
void pclattrs (const char *uplo , const char *trans , const char *diag , const char
*normin , const MKL_INT *n , const MKL_Complex8 *a , const MKL_INT *ia , const MKL_INT
*ja , const MKL_INT *desca , MKL_Complex8 *x , const MKL_INT *ix , const MKL_INT *jx ,
const MKL_INT *descx , float *scale , float *cnorm , MKL_INT *info );
void pzlattrs (const char *uplo , const char *trans , const char *diag , const char
*normin , const MKL_INT *n , const MKL_Complex16 *a , const MKL_INT *ia , const MKL_INT
*ja , const MKL_INT *desca , MKL_Complex16 *x , const MKL_INT *ix , const MKL_INT *jx ,
const MKL_INT *descx , double *scale , double *cnorm , MKL_INT *info );
void pssyttrd (const char *uplo , const MKL_INT *n , float *a , const MKL_INT *ia ,
const MKL_INT *ja , const MKL_INT *desca , float *d , float *e , float *tau , float
*work , const MKL_INT *lwork , MKL_INT *info );
void pdsyttrd (const char *uplo , const MKL_INT *n , double *a , const MKL_INT *ia ,
const MKL_INT *ja , const MKL_INT *desca , double *d , double *e , double *tau , double
*work , const MKL_INT *lwork , MKL_INT *info );
MKL_INT piparmq (const MKL_INT *ictxt , const MKL_INT *ispec , const char *name , const
char *opts , const MKL_INT *n , const MKL_INT *ilo , const MKL_INT *ihi , const MKL_INT
*lworknb );
For descriptions of these functions, please see http://www.netlib.org/scalapack/explore-html/files.html.

ScaLAPACK Utility Functions and Routines

This section describes ScaLAPACK utility functions and routines. Summary information about these routines is
given in the following table:

ScaLAPACK Utility Functions and Routines

Routine Name Data Types Description

p?labad s,d Returns the square root of the underflow and overflow thresholds if the
exponent-range is very large.

p?lachkieee s,d Performs a simple check for the features of the IEEE standard.

p?lamch s,d Determines machine parameters for floating-point arithmetic.

p?lasnbt s,d Computes the position of the sign bit of a floating-point number.

descinit N/A Initializes the array descriptor for a distributed matrix.

numroc N/A Computes the number of rows or columns of a distributed matrix owned by the process.

See Also
pxerbla Error handling routine called by ScaLAPACK routines.

p?labad
Returns the square root of the underflow and overflow
thresholds if the exponent-range is very large.

Syntax
void pslabad (MKL_INT *ictxt , float *small , float *large );
void pdlabad (MKL_INT *ictxt , double *small , double *large );

Include Files
• mkl_scalapack.h

Description
The p?labad function takes as input the values computed by p?lamch for underflow and overflow, and
returns the square root of each of these values if the log of large is sufficiently large. This function is
intended to identify machines with a large exponent range, such as the Crays, and redefine the underflow
and overflow limits to be the square roots of the values computed by p?lamch. This function is needed
because p?lamch does not compensate for poor arithmetic in the upper half of the exponent range, as is
found on a Cray.
In addition, this function performs a global minimization and maximization on these values, to support
heterogeneous computing networks.
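Assuming p?labad follows reference LAPACK's sequential ?labad, the local test is log10(large) > 2000, which IEEE single and double precision never satisfy (log10 of DBL_MAX is about 308), so on such hardware only the global step changes anything. The sketch below simulates that global step without BLACS. The thresholds type and combine_thresholds are hypothetical, and the direction of the reduction (largest underflow threshold, smallest overflow threshold) is an assumption about the conservative choice, not something this page states.

```c
#include <stddef.h>

/* Per-process thresholds, standing in for the values each process obtains
   from p?lamch (hypothetical type for this sketch). */
typedef struct {
    double underflow;   /* "small" on entry to p?labad */
    double overflow;    /* "large" on entry to p?labad */
} thresholds;

/* Simulates the global step without BLACS: keep the largest underflow
   threshold and the smallest overflow threshold, which are safe on every
   process (assumed direction of the reduction). nprocs must be >= 1. */
static thresholds combine_thresholds(const thresholds *t, size_t nprocs)
{
    thresholds g = t[0];
    for (size_t i = 1; i < nprocs; ++i) {
        if (t[i].underflow > g.underflow) g.underflow = t[i].underflow;
        if (t[i].overflow < g.overflow) g.overflow = t[i].overflow;
    }
    return g;
}
```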

Input Parameters

ictxt (global)
The BLACS context handle in which the computation takes place.

small (local).
On entry, the underflow threshold as computed by p?lamch.

large (local).
On entry, the overflow threshold as computed by p?lamch.

Output Parameters

small (local).
On exit, if log10(large) is sufficiently large, the square root of small,
otherwise unchanged.

large (local).
On exit, if log10(large) is sufficiently large, the square root of large,
otherwise unchanged.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?lachkieee
Performs a simple check for the features of the IEEE
standard.

Syntax
void pslachkieee (MKL_INT *isieee , float *rmax , float *rmin );
void pdlachkieee (MKL_INT *isieee , double *rmax , double *rmin );

Include Files
• mkl_scalapack.h

Description
The p?lachkieee function performs a simple check to make sure that the features of the IEEE standard are
implemented. In some implementations, p?lachkieee may not return.

This is a ScaLAPACK internal function and arguments are not checked for unreasonable values.
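The kind of test p?lachkieee performs can be sketched as follows. The exact features the library checks are not listed on this page, so the checks here (overflow to signed infinity, NaN from inf + (-inf) comparing unequal to itself, and a finite value divided by infinity giving zero) are illustrative assumptions. ieee_basic_check is a hypothetical double-precision helper whose 1/0 result mirrors isieee; results can differ under compiler options such as -ffast-math that relax IEEE semantics.

```c
#include <float.h>   /* DBL_MAX and DBL_MIN are natural arguments */

/* Returns 1 if the basic IEEE 754 behaviors assumed here hold, else 0.
   rmax plays the role of the overflow threshold, rmin the underflow one. */
static int ieee_basic_check(double rmax, double rmin)
{
    volatile double big = rmax;       /* volatile: keep the operations at run time */
    double pinf = big * 2.0;          /* should overflow to +infinity */
    double ninf = -big * 2.0;         /* should overflow to -infinity */
    double nan_val = pinf + ninf;     /* inf + (-inf) should give NaN */

    if (!(pinf > rmax)) return 0;
    if (!(ninf < -rmax)) return 0;
    if (nan_val == nan_val) return 0; /* NaN must compare unequal to itself */
    if (!(rmin / pinf == 0.0)) return 0; /* finite / infinity is zero */
    return 1;
}
```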

Input Parameters

rmax (local).
The overflow threshold (= ?lamch('O')).

rmin (local).
The underflow threshold (= ?lamch('U')).

Output Parameters

isieee (local).
On exit, isieee = 1 implies that all the features of the IEEE standard that
we rely on are implemented. On exit, isieee = 0 implies that some of the
features of the IEEE standard that we rely on are missing.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?lamch
Determines machine parameters for floating-point
arithmetic.

Syntax
float pslamch (MKL_INT *ictxt , char *cmach );
double pdlamch (MKL_INT *ictxt , char *cmach );

Include Files
• mkl_scalapack.h

Description
The p?lamch function determines single precision (pslamch) or double precision (pdlamch) machine parameters.

Input Parameters

ictxt (global). The BLACS context handle in which the computation takes place.

cmach (global)
Specifies the value to be returned by p?lamch:

= 'E' or 'e', p?lamch := eps

= 'S' or 's' , p?lamch := sfmin

= 'B' or 'b', p?lamch := base

= 'P' or 'p', p?lamch := eps*base

= 'N' or 'n', p?lamch := t

= 'R' or 'r', p?lamch := rnd

= 'M' or 'm', p?lamch := emin

= 'U' or 'u', p?lamch := rmin

= 'L' or 'l', p?lamch := emax

= 'O' or 'o', p?lamch := rmax,

where
eps = relative machine precision
sfmin = safe minimum, such that 1/sfmin does not overflow
base = base of the machine
prec = eps*base
t = number of (base) digits in the mantissa
rnd = 1.0 when rounding occurs in addition, 0.0 otherwise
emin = minimum exponent before (gradual) underflow
rmin = underflow threshold, $base^{emin-1}$
emax = largest exponent before overflow
rmax = overflow threshold, $base^{emax}(1-eps)$

Output Parameters

val Value returned by the function.
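The cmach codes correspond closely to constants in C's <float.h>. The sketch below is a sequential, double-precision illustration assuming IEEE 754 binary64 with round-to-nearest, so rnd = 1 and eps = DBL_EPSILON/2 (the convention of reference LAPACK's dlamch). lamch_sketch is a hypothetical helper, not a oneMKL function, and it does not model ictxt.

```c
#include <ctype.h>
#include <float.h>

/* Hypothetical sequential analogue of pdlamch: maps a cmach code to the
   corresponding IEEE binary64 parameter. Assumes round-to-nearest. */
static double lamch_sketch(char cmach)
{
    const double eps = DBL_EPSILON * 0.5;    /* relative machine precision */
    switch (toupper((unsigned char)cmach)) {
    case 'E': return eps;
    case 'S': {                              /* safe minimum: 1/sfmin must not overflow */
        double sfmin = DBL_MIN;
        double recip_max = 1.0 / DBL_MAX;
        if (recip_max >= sfmin)
            sfmin = recip_max * (1.0 + eps);
        return sfmin;
    }
    case 'B': return FLT_RADIX;              /* base */
    case 'P': return eps * FLT_RADIX;        /* eps*base */
    case 'N': return DBL_MANT_DIG;           /* t: base digits in the mantissa */
    case 'R': return 1.0;                    /* rnd: rounding assumed */
    case 'M': return DBL_MIN_EXP;            /* emin */
    case 'U': return DBL_MIN;                /* rmin = base^(emin-1) */
    case 'L': return DBL_MAX_EXP;            /* emax */
    case 'O': return DBL_MAX;                /* rmax = base^emax * (1-eps) */
    default:  return 0.0;                    /* unknown code (sketch choice) */
    }
}
```

With these values, 'U' gives 2^-1022 = base^(emin-1) with emin = -1021, and 'O' gives DBL_MAX = 2^1024(1 - 2^-53) = base^emax(1 - eps), matching the definitions above.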

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?lasnbt
Computes the position of the sign bit of a floating-
point number.

Syntax
void pslasnbt (MKL_INT *ieflag );
void pdlasnbt (MKL_INT *ieflag );

Include Files
• mkl_scalapack.h

Description
The p?lasnbt function finds the position of the sign bit of a single/double precision floating-point number. This
function assumes IEEE arithmetic and hence tests only the 32nd bit (for single precision) or the 32nd and 64th
bits (for double precision) as possible positions of the sign bit. sizeof(int) is assumed to equal 4 bytes.

If a compile-time flag (NO_IEEE) indicates that the machine does not have IEEE arithmetic, ieflag = 0 is
returned.

Output Parameters

ieflag This flag indicates the position of the sign bit of any single/double precision
floating-point number.
ieflag = 0 if the compile-time flag NO_IEEE indicates that the machine
does not have IEEE arithmetic, or if sizeof(int) is different from 4 bytes.

ieflag = 1 indicates that the sign bit is the 32nd bit for a single precision
function.
In the case of a double precision function:
ieflag = 1 indicates that the sign bit is the 32nd bit (Big Endian).
ieflag = 2 indicates that the sign bit is the 64th bit (Little Endian).
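The endianness test behind ieflag can be sketched portably by inspecting the bit pattern of negative zero, whose only set bit under IEEE 754 is the sign bit. This is an illustration, not the library's code: sign_word_double and sign_bit_is_bit32_float are hypothetical helpers, and they use uint32_t with memcpy instead of assuming sizeof(int) is 4. The return value of sign_word_double mirrors ieflag: 1 when the sign bit is in the first 32-bit word in memory (big-endian), 2 when it is in the second (little-endian), and 0 otherwise.

```c
#include <stdint.h>
#include <string.h>

/* Locates the 32-bit word of a double that holds the sign bit. */
static int sign_word_double(void)
{
    double x = -0.0;            /* IEEE 754: only the sign bit is set */
    uint32_t w[2];
    memcpy(w, &x, sizeof w);    /* reinterpret the 8 bytes as two words */
    if (w[0] == 0x80000000u && w[1] == 0u) return 1;   /* big-endian */
    if (w[1] == 0x80000000u && w[0] == 0u) return 2;   /* little-endian */
    return 0;
}

/* For single precision the sign bit is the top (32nd) bit of the word. */
static int sign_bit_is_bit32_float(void)
{
    float x = -0.0f;
    uint32_t w;
    memcpy(&w, &x, sizeof w);
    return w == 0x80000000u;
}
```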

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

descinit
Initializes the array descriptor for a distributed matrix.

Syntax
void descinit (MKL_INT *desc, const MKL_INT *m, const MKL_INT *n, const MKL_INT *mb,
const MKL_INT *nb, const MKL_INT *irsrc, const MKL_INT *icsrc, const MKL_INT *ictxt,
const MKL_INT *lld, MKL_INT *info);

Description
The descinit function initializes the array descriptor for a distributed matrix.

Input Parameters

m (global input) The number of rows in the distributed matrix. M >= 0.

n (global input) The number of columns in the distributed matrix. N >= 0.

mb (global input) The blocking factor used to distribute the rows of the matrix.
MB >= 1.

nb (global input) The blocking factor used to distribute the columns of the
matrix. NB >= 1.

irsrc (global input) The process row over which the first row of the matrix is
distributed. 0 <= IRSRC < NPROW.

icsrc (global input) The process column over which the first column of the matrix
is distributed. 0 <= ICSRC < NPCOL.

ictxt (global input) The BLACS context handle, indicating the global context of
the operation on the matrix. The context itself is global.

lld (local input) The leading dimension of the local array storing the local
blocks of the distributed matrix. LLD >= MAX(1,LOCr(M)). LOCr() denotes
the number of rows of a global dense matrix that the process in a grid
receives after data distribution.

Output Parameters

desc (global output) Array of dimension DLEN_. On exit, the array descriptor of
the distributed matrix.

info (output)
= 0: successful exit
< 0: if INFO = -i, the i-th argument had an illegal value
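A sketch of what descinit stores and checks, assuming the standard ScaLAPACK layout for dtype_ = 1: entries DTYPE_, CTXT_, M_, N_, MB_, NB_, RSRC_, CSRC_, LLD_ in that order, so DLEN_ = 9. Because it does not call BLACS, descinit_sketch (a hypothetical helper) takes the grid shape nprow, npcol and the calling process row myrow as extra arguments, uses long in place of MKL_INT, and leaves desc untouched on error. The info codes follow the "i-th argument" rule above; the real routine also validates ictxt (argument 8), which the sketch skips.

```c
/* 0-based C positions of the standard descriptor entries (assumed layout). */
enum { DTYPE_ = 0, CTXT_, M_, N_, MB_, NB_, RSRC_, CSRC_, LLD_, DLEN_ };

/* Rows owned by process row myrow under the block-cyclic rule (LOCr). */
static long local_rows(long m, long mb, long myrow, long irsrc, long nprow)
{
    long mydist = (nprow + myrow - irsrc) % nprow;
    long nblocks = m / mb;
    long count = (nblocks / nprow) * mb;
    long extra = nblocks % nprow;
    if (mydist < extra) count += mb;
    else if (mydist == extra) count += m % mb;
    return count;
}

static void descinit_sketch(long desc[DLEN_], long m, long n, long mb, long nb,
                            long irsrc, long icsrc, long ictxt, long lld,
                            long nprow, long npcol, long myrow, long *info)
{
    *info = 0;
    if (m < 0) *info = -2;                           /* m is argument 2 */
    else if (n < 0) *info = -3;
    else if (mb < 1) *info = -4;
    else if (nb < 1) *info = -5;
    else if (irsrc < 0 || irsrc >= nprow) *info = -6;
    else if (icsrc < 0 || icsrc >= npcol) *info = -7;
    else {
        long locr = local_rows(m, mb, myrow, irsrc, nprow);
        if (lld < (locr > 1 ? locr : 1)) *info = -9; /* LLD >= MAX(1,LOCr(M)) */
    }
    if (*info != 0)
        return;                  /* sketch choice: desc left untouched */
    desc[DTYPE_] = 1;            /* dense block-cyclic matrix */
    desc[CTXT_] = ictxt;
    desc[M_] = m;
    desc[N_] = n;
    desc[MB_] = mb;
    desc[NB_] = nb;
    desc[RSRC_] = irsrc;
    desc[CSRC_] = icsrc;
    desc[LLD_] = lld;
}
```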

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

numroc
Computes the number of rows or columns of a
distributed matrix owned by the process.

Syntax
MKL_INT numroc (const MKL_INT *n, const MKL_INT *nb, const MKL_INT *iproc, const
MKL_INT *srcproc, const MKL_INT *nprocs);

Description
The numroc function computes the number of rows or columns of a distributed matrix owned by the process.

Input Parameters

n (global input) The number of rows/columns in the distributed matrix.

nb (global input) Block size, the size of the blocks into which the distributed matrix is
split.

iproc (local input) The coordinate of the process whose local array row or column
is to be determined.

srcproc (global input) The coordinate of the process that possesses the first row or
column of the distributed matrix.

nprocs (global input) The total number of processes over which the matrix is
distributed.

Output Parameters

val (output) Value returned by the function: the number of rows or columns of the
distributed matrix owned by process iproc.
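The count follows from the block-cyclic rule: blocks of nb rows (or columns) are dealt round-robin starting at srcproc, so each process receives its share of full rounds, possibly one extra full block, and possibly the trailing partial block. A minimal sketch of that rule follows, using long in place of MKL_INT and assuming 0 <= iproc, srcproc < nprocs; numroc_sketch is an illustrative name, not the library routine. For example, with n = 10, nb = 3 and two processes starting at process 0, process 0 owns rows 1-3 and 7-9 (6 rows) and process 1 owns rows 4-6 and 10 (4 rows).

```c
/* Number of rows/columns of an n-element dimension, split into blocks of
   size nb, that process iproc owns when block 0 starts on srcproc. */
static long numroc_sketch(long n, long nb, long iproc, long srcproc, long nprocs)
{
    /* Distance of this process from the one holding the first block. */
    long mydist = (nprocs + iproc - srcproc) % nprocs;
    long nblocks = n / nb;                  /* number of full blocks */
    long count = (nblocks / nprocs) * nb;   /* full rounds every process gets */
    long extra = nblocks % nprocs;          /* leftover full blocks */
    if (mydist < extra)
        count += nb;                        /* one more full block */
    else if (mydist == extra)
        count += n % nb;                    /* the trailing partial block */
    return count;
}
```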

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

ScaLAPACK Redistribution/Copy Routines

This section describes ScaLAPACK redistribution/copy routines. Summary information about these routines is
given in the following table:

ScaLAPACK Redistribution/Copy Routines

Routine Name Data Types Description

p?gemr2d s,d,c,z,i Copies a submatrix from one general rectangular matrix to another.

p?trmr2d s,d,c,z,i Copies a submatrix from one trapezoidal matrix to another.

See Also
pxerbla Error handling routine called by ScaLAPACK routines.

p?gemr2d
Copies a submatrix from one general rectangular
matrix to another.

Syntax
void psgemr2d (MKL_INT *m, MKL_INT *n, float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_INT *ictxt );
void pdgemr2d (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , double *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_INT *ictxt );
void pcgemr2d (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_INT
*ictxt );
void pzgemr2d (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb ,
MKL_INT *ictxt );
void pigemr2d (MKL_INT *m , MKL_INT *n , MKL_INT *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_INT
*ictxt );

Include Files
• mkl_scalapack.h

Description
The p?gemr2d function copies the indicated matrix or submatrix of A to the indicated matrix or submatrix of
B. It provides a truly general copy from any block-cyclically distributed matrix or submatrix to any other
block-cyclically distributed matrix or submatrix. Together with p?trmr2d, these functions are the only ones in the
ScaLAPACK library that provide inter-context operations: they can take a matrix or submatrix A in context A
(distributed over process grid A) and copy it to a matrix or submatrix B in context B (distributed over process
grid B).
The two operand matrices or submatrices need not be related in any way other than their global size and the
fact that they are both legal block-cyclically distributed matrices or submatrices. This means that they can,
for example, be distributed across different process grids, have varying block sizes and differing matrix
starting points, or be contained in distributed matrices of different sizes.

Take care when context A is disjoint from context B. The general rules for which parameters need to be set
are:

• All calling processes must have the correct m and n.

• Processes in context A must correctly define all parameters describing A.
• Processes in context B must correctly define all parameters describing B.
• Processes which are not members of context A must pass ctxt_a = -1 and need not set other parameters
describing A.
• Processes which are not members of context B must pass ctxt_b = -1 and need not set other parameters
describing B.
Because of its generality, p?gemr2d can be used for many operations not usually associated with copy
functions. For instance, it can be used to take a matrix on one process and distribute it across a process
grid, or the reverse. If a supercomputer is grouped into a virtual parallel machine with a workstation, for
instance, this function can be used to move the matrix from the workstation to the supercomputer and back.
In ScaLAPACK, it is called to copy matrices from a two-dimensional process grid to a one-dimensional
process grid. It can also be used to redistribute matrices so that various component libraries can each use
the distribution that gives them maximal performance.
Note that this function requires an array descriptor with dtype_ = 1.
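Every such copy reduces to knowing, for each global row (and likewise each column), which process owns it and where it sits locally. A minimal sketch of the block-cyclic mapping for 1-based global indices follows. global_to_local and rows_to_send are hypothetical helpers (long stands in for MKL_INT), and the brute-force count only illustrates which data a redistribution has to move, not how p?gemr2d schedules its messages.

```c
#include <stddef.h>

/* Block-cyclic mapping of a 1-based global index ig: blocks of nb indices
   are dealt round-robin over nprocs processes starting at srcproc.
   Stores the owning process (0-based) in *owner and, if il is not NULL,
   the 1-based local index on that process in *il. */
static void global_to_local(long ig, long nb, long srcproc, long nprocs,
                            long *owner, long *il)
{
    long blk = (ig - 1) / nb;                /* 0-based global block number */
    *owner = (srcproc + blk) % nprocs;
    if (il != NULL)
        *il = nb * (blk / nprocs) + (ig - 1) % nb + 1;
}

/* Counts the global rows 1..m owned by process pa in layout A and by
   process pb in layout B, i.e. how many rows pa must send to pb when the
   rows are redistributed from layout A to layout B (brute force). */
static long rows_to_send(long m,
                         long nb_a, long src_a, long np_a, long pa,
                         long nb_b, long src_b, long np_b, long pb)
{
    long count = 0;
    for (long ig = 1; ig <= m; ++ig) {
        long oa, ob;
        global_to_local(ig, nb_a, src_a, np_a, &oa, NULL);
        global_to_local(ig, nb_b, src_b, np_b, &ob, NULL);
        if (oa == pa && ob == pb)
            ++count;
    }
    return count;
}
```

For example, gathering a 10-row matrix distributed with block size 3 over two process rows onto a single process means that process 0 sends 6 rows and process 1 sends 4, which is the "reverse" case mentioned above.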

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.
Notice revision #20201201

Input Parameters

m (global) The number of rows of matrix A to be copied (m≥0).

n (global) The number of columns of matrix A to be copied (n≥0).

a (local)
Pointer into the local memory to array of size lld_a* LOCc(ja+n-1)
containing the source matrix A.

ia, ja (global) The row and column indices in the array A indicating the first row
and the first column, respectively, of the submatrix of A to copy.
1 ≤ ia ≤ total_rows_in_a - m + 1, 1 ≤ ja ≤ total_columns_in_a - n + 1.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
Only dtype_a = 1 is supported, so dlen_ = 9.
If the calling process is not part of the context of A, ctxt_a must be equal to
-1.

ib, jb (global) The row and column indices in the array B indicating the first row
and the first column, respectively, of the submatrix B to which to copy the
matrix. 1 ≤ ib ≤ total_rows_in_b - m + 1, 1 ≤ jb ≤ total_columns_in_b - n + 1.

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.
Only dtype_b = 1 is supported, so dlen_ = 9.
If the calling process is not part of the context of B, ctxt_b must be equal to
-1.

ictxt (global).
The context encompassing at least the union of all processes in context A
and context B. All processes in the context ictxt must call this function,
even if they do not own a piece of either matrix.

Output Parameters

b Pointer into the local memory to array of size lld_b*LOCc(jb+n-1).

Overwritten by the submatrix from A.

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

p?trmr2d
Copies a submatrix from one trapezoidal matrix to
another.

Syntax
void pstrmr2d (char *uplo , char *diag , MKL_INT *m , MKL_INT *n , float *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb , MKL_INT *ictxt );
void pdtrmr2d (char *uplo , char *diag , MKL_INT *m , MKL_INT *n , double *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb , MKL_INT *ictxt );
void pctrmr2d (char *uplo , char *diag , MKL_INT *m , MKL_INT *n , MKL_Complex8 *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *jb ,
MKL_INT *descb , MKL_INT *ictxt );
void pztrmr2d (char *uplo , char *diag , MKL_INT *m , MKL_INT *n , MKL_Complex16 *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *jb ,
MKL_INT *descb , MKL_INT *ictxt );
void pitrmr2d (char *uplo , char *diag , MKL_INT *m , MKL_INT *n , MKL_INT *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , MKL_INT *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb , MKL_INT *ictxt );

Include Files
• mkl_scalapack.h

Description
The p?trmr2d function copies the indicated matrix or submatrix of A to the indicated matrix or submatrix of
B. It provides a truly general copy from any block-cyclically distributed matrix or submatrix to any other
block-cyclically distributed matrix or submatrix. Together with p?gemr2d, these functions are the only ones in the
ScaLAPACK library that provide inter-context operations: they can take a matrix or submatrix A in context A
(distributed over process grid A) and copy it to a matrix or submatrix B in context B (distributed over process
grid B).
The p?trmr2d function assumes the matrix or submatrix to be trapezoidal. Only the upper or lower part is
copied, and the other part is unchanged.

The two operand matrices or submatrices need not be related in any way other than their global size and the
fact that they are both legal block-cyclically distributed matrices or submatrices. This means that they can,
for example, be distributed across different process grids, have varying block sizes and differing matrix
starting points, or be contained in distributed matrices of different sizes.
Take care when context A is disjoint from context B. The general rules for which parameters need to be set
are:

• All calling processes must have the correct m and n.

Because of its generality, p?trmr2d can be used for many operations not usually associated with copy
functions. For instance, it can be used to take a matrix on one process and distribute it across a process
grid, or the reverse. If a supercomputer is grouped into a virtual parallel machine with a workstation, for
instance, this function can be used to move the matrix from the workstation to the supercomputer and back.
In ScaLAPACK, it is called to copy matrices from a two-dimensional process grid to a one-dimensional
process grid. It can also be used to redistribute matrices so that various component libraries can each use
the distribution that gives them maximal performance.
Note that this function requires an array descriptor with dtype_ = 1.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.
Notice revision #20201201

Input Parameters

uplo (global) Specifies whether to copy the upper or lower part of the matrix or
submatrix.

uplo = 'U' Copy the upper triangular part.

uplo = 'L' Copy the lower triangular part.

diag (global) Specifies whether to copy the diagonal of the matrix or submatrix.

diag = 'U' Do not copy the diagonal.

diag = 'N' Copy the diagonal.

m (global) The number of rows of matrix A to be copied (m≥0).

n (global) The number of columns of matrix A to be copied (n≥0).

a (local)
Pointer into the local memory to array of size lld_a* LOCc(ja+n-1)
containing the source matrix A.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
Only dtype_a = 1 is supported, so dlen_ = 9.
If the calling process is not part of the context of A, ctxt_a must be equal to
-1.

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.
Only dtype_b = 1 is supported, so dlen_ = 9.
If the calling process is not part of the context of B, ctxt_b must be equal to
-1.

Output Parameters

b Pointer into the local memory to array of size lld_b*LOCc(jb+n-1).

Overwritten by the submatrix from A.
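The effect of uplo and diag on which entries are copied can be illustrated on a single, non-distributed column-major array. In this sketch, trapezoid_copy is a hypothetical helper rather than the ScaLAPACK routine, and the diagonal is taken relative to the first element of the array (the case ia = ja = 1); how p?trmr2d positions the diagonal for general submatrix offsets is not specified here.

```c
#include <stddef.h>

/* Copies the upper (uplo = 'U') or lower (uplo = 'L') trapezoid of the
   m-by-n column-major array a (leading dimension lda) into b (leading
   dimension ldb). diag = 'U' leaves the diagonal out, diag = 'N' copies it.
   Entries of b outside the selected part are left unchanged. */
static void trapezoid_copy(char uplo, char diag, long m, long n,
                           const double *a, long lda, double *b, long ldb)
{
    int upper = (uplo == 'U' || uplo == 'u');
    long skip = (diag == 'U' || diag == 'u') ? 1 : 0;   /* 1: exclude i == j */
    for (long j = 0; j < n; ++j) {
        for (long i = 0; i < m; ++i) {
            /* upper: i <= j (i < j without diagonal); lower: i >= j (i > j) */
            int selected = upper ? (i + skip <= j) : (i >= j + skip);
            if (selected)
                b[i + (size_t)j * (size_t)ldb] = a[i + (size_t)j * (size_t)lda];
        }
    }
}
```

For a 3-by-3 array, uplo = 'U' with diag = 'N' copies the six entries on and above the diagonal, while uplo = 'L' with diag = 'U' copies only the three entries strictly below it.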

See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.

Sparse Solver Routines

Intel® oneAPI Math Kernel Library (oneMKL) provides sparse solver algorithms for solving real or complex,
symmetric, structurally symmetric or nonsymmetric, positive definite, indefinite, or Hermitian square sparse
linear systems of algebraic equations.
The terms and concepts required to understand the use of the Intel® oneAPI Math Kernel Library (oneMKL)
sparse solver routines are discussed in the Appendix "Linear Solvers Basics". If you are familiar with linear
sparse solvers and sparse matrix storage schemes, you can skip these sections and go directly to the
interface descriptions.
See the description of
• the direct sparse solver based on PARDISO*, which is referred to here as Intel MKL PARDISO;
• the alternative interface for the direct sparse solver, which is referred to here as the DSS interface;
• iterative sparse solvers (ISS) based on the reverse communication interface (RCI);
• preconditioners based on the incomplete LU factorization technique;
• a direct sparse solver based on QR decomposition.

oneMKL PARDISO - Parallel Direct Sparse Solver Interface

This section describes the interface to the shared-memory multiprocessing parallel direct sparse solver
known as the Intel® oneAPI Math Kernel Library (oneMKL) PARDISO solver.
The Intel® oneAPI Math Kernel Library (oneMKL) PARDISO package is a high-performance, robust, memory-
efficient, and easy-to-use software package for solving large sparse linear systems of equations on shared-
memory multiprocessors. The solver uses a combination of left- and right-looking Level-3 BLAS supernode
techniques [Schenk00-2]. To improve sequential and parallel sparse numerical factorization performance, the
algorithms are based on a Level-3 BLAS update, and pipelining parallelism is used with a combination of left-
and right-looking supernode techniques [Schenk00, Schenk01, Schenk02, Schenk03]. The parallel pivoting
methods allow complete supernode pivoting to balance numerical stability and scalability during the
factorization process. For sufficiently large problem sizes, numerical experiments demonstrate that the
scalability of the parallel algorithm is nearly independent of the shared-memory multiprocessing architecture.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.
Notice revision #20201201

The following table lists the names of the Intel® oneAPI Math Kernel Library (oneMKL) PARDISO routines and
describes their general use.
oneMKL PARDISO Routines

Routine Description

pardisoinit Initializes Intel® oneAPI Math Kernel Library (oneMKL) PARDISO with default parameters depending on the matrix type.

pardiso Calculates the solution of a set of sparse linear equations with single or multiple right-hand sides.

pardiso_64 Calculates the solution of a set of sparse linear equations with single or multiple right-hand sides, 64-bit integer version.

mkl_pardiso_pivot Replaces the routine that handles Intel® oneAPI Math Kernel Library (oneMKL) PARDISO pivots with a user-defined routine.

pardiso_getdiag Returns diagonal elements of the initial and factorized matrix.

pardiso_export Places pointers dedicated to the sparse representation of the requested matrix into MKL PARDISO.

pardiso_handle_store Stores internal structures from pardiso to a file.

pardiso_handle_restore Restores pardiso internal structures from a file.

pardiso_handle_delete Deletes files with pardiso internal structure data.

pardiso_handle_store_64 Stores internal structures from pardiso_64 to a file.

pardiso_handle_restore_64 Restores pardiso_64 internal structures from a file.

pardiso_handle_delete_64 Deletes files with pardiso_64 internal structure data.

The Intel® oneAPI Math Kernel Library (oneMKL) PARDISO solver supports a wide range of real and complex
sparse matrix types (see the figure below).

Sparse Matrices That Can Be Solved with the oneMKL PARDISO Solver

The Intel® oneAPI Math Kernel Library (oneMKL) PARDISO solver performs four tasks:
• analysis and symbolic factorization
• numerical factorization
• forward and backward substitution including iterative refinement
• termination to release all internal solver memory.
To find code examples that use Intel® oneAPI Math Kernel Library (oneMKL) PARDISO routines to solve
systems of linear equations, unzip the C archive file in the examples folder of the Intel® oneAPI Math Kernel
Library (oneMKL) installation directory. Code examples will be in the examples/solverc/source folder.
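Each of the four tasks listed above is selected through the phase argument of pardiso. The values in this sketch (11, 22, 33, and -1) are assumed from the description of the phase parameter in the pardiso reference and should be verified there; the enum, basic_sequence, and phase_name are hypothetical names for illustration, and no solver call is made.

```c
/* Assumed pardiso phase values for the four tasks (see the phase parameter). */
enum pardiso_task_phase {
    PHASE_ANALYSIS    = 11,   /* analysis and symbolic factorization */
    PHASE_FACTORIZE   = 22,   /* numerical factorization */
    PHASE_SOLVE       = 33,   /* forward/backward substitution + iterative refinement */
    PHASE_RELEASE_ALL = -1    /* termination: release all internal solver memory */
};

/* Order in which a basic driver would issue the phases, one pardiso call each. */
static const int basic_sequence[] = {
    PHASE_ANALYSIS, PHASE_FACTORIZE, PHASE_SOLVE, PHASE_RELEASE_ALL
};

/* Human-readable label for a phase value, e.g. for logging in a driver. */
static const char *phase_name(int phase)
{
    switch (phase) {
    case PHASE_ANALYSIS:    return "analysis and symbolic factorization";
    case PHASE_FACTORIZE:   return "numerical factorization";
    case PHASE_SOLVE:       return "solve with iterative refinement";
    case PHASE_RELEASE_ALL: return "release all internal memory";
    default:                return "other or combined phase";
    }
}
```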

Supported Matrix Types

The analysis steps performed by Intel® oneAPI Math Kernel Library (oneMKL) PARDISO depend on the
structure of the input matrix A.

Symmetric Matrices The solver first computes a symmetric fill-in reducing permutation P based on
either the minimum degree algorithm [Liu85] or the nested dissection algorithm
from the METIS package [Karypis98] (both included with Intel® oneAPI Math
Kernel Library (oneMKL)), followed by the parallel left-right looking numerical
Cholesky factorization [Schenk00-2] of $PAP^T = LL^T$ for symmetric positive-definite
matrices, or $PAP^T = LDL^T$ for symmetric indefinite matrices. The solver
uses diagonal pivoting, or 1x1 and 2x2 Bunch-Kaufman pivoting for symmetric
indefinite matrices. An approximation of X is found by forward and backward
substitution and optional iterative refinement.
Whenever numerically acceptable 1x1 and 2x2 pivots cannot be found within the
diagonal supernode block, the coefficient matrix is perturbed. One or two passes
of iterative refinement may be required to correct the effect of the perturbations.
This restricted notion of pivoting with iterative refinement is effective for highly
indefinite symmetric systems. Furthermore, for a large set of matrices from
different application areas, this method is as accurate as a direct factorization
method that uses complete sparse pivoting techniques [Schenk04].
Another method of improving the pivoting accuracy is to use symmetric weighted
matching algorithms. These algorithms identify large entries in the coefficient
matrix A that, if permuted close to the diagonal, permit the factorization process
to identify more acceptable pivots and proceed with fewer pivot perturbations.
These algorithms are based on maximum weighted matchings and improve the
quality of the factor in a complementary way to the alternative of using more
complete pivoting techniques.
The inertia is also computed for real symmetric indefinite matrices.

Structurally Symmetric Matrices
The solver first computes a symmetric fill-in reducing permutation $P$ followed by the parallel numerical
factorization of $PAP^T = QLU^T$. The solver uses partial pivoting in the supernodes, and an approximation of
$X$ is found by forward and backward substitution and optional iterative refinement.

Nonsymmetric Matrices
The solver first computes a nonsymmetric permutation $P_{MPS}$ and scaling matrices $D_r$ and $D_c$ with the
aim of placing large entries on the diagonal to enhance the reliability of the numerical factorization
process [Duff99]. In the next step, the solver computes a fill-in reducing permutation $P$ based on the matrix
$P_{MPS}A + (P_{MPS}A)^T$, followed by the parallel numerical factorization

$$Q L U R = P P_{MPS} D_r A D_c P$$

with supernode pivoting matrices $Q$ and $R$. When the factorization algorithm reaches a point where it cannot
factor the supernodes with this pivoting strategy, it uses a pivoting perturbation strategy similar to [Li99].
The magnitude of the potential pivot is tested against a constant threshold of

$$\alpha = \epsilon \, \|A_2\|_\infty,$$

where $\epsilon$ is the machine precision, $A_2 = P P_{MPS} D_r A D_c P$, and $\|A_2\|_\infty$ is the infinity
norm of $A_2$. Any tiny pivots $l_{II}$ encountered during elimination are set to
$\operatorname{sign}(l_{II}) \, \epsilon \, \|A_2\|_\infty$, which trades off some numerical stability for the
ability to keep pivots from getting too small. Although many such perturbations could render the factorization
well defined but essentially useless, in practice the diagonal elements are rarely modified for a large class
of matrices. As a result of this pivoting approach, the factorization is, in general, not exact, and iterative
refinement may be needed.
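The threshold test above can be illustrated with a short C sketch. It is not oneMKL's internal code: the function names are hypothetical, the matrix is assumed to be the already permuted and scaled $A_2$ held in zero-based CSR3 with full (nonsymmetric) storage, and eps is supplied by the caller (for example, DBL_EPSILON from <float.h>). MKL_INT is defined as int only when mkl.h has not already defined it.

```c
#include <assert.h>

#ifndef MKL_INT
#define MKL_INT int /* LP64 stand-in for oneMKL's integer type when mkl.h is absent */
#endif

static double abs_value(double x) { return x < 0.0 ? -x : x; }

/* Infinity norm of an n-by-n matrix in zero-based CSR3 storage:
   the largest row sum of absolute values. */
double csr_inf_norm(MKL_INT n, const MKL_INT *ia, const double *a)
{
    double norm = 0.0;
    for (MKL_INT i = 0; i < n; ++i) {
        double row_sum = 0.0;
        for (MKL_INT k = ia[i]; k < ia[i + 1]; ++k)
            row_sum += abs_value(a[k]);
        if (row_sum > norm)
            norm = row_sum;
    }
    return norm;
}

/* Hypothetical helper mirroring the rule above: a pivot whose magnitude is
   below alpha = eps * ||A2||_inf is replaced by sign(pivot) * alpha
   (a zero pivot is treated as positive); larger pivots are returned as is. */
double perturb_pivot(double pivot, double eps, double a2_inf_norm)
{
    const double alpha = eps * a2_inf_norm;
    if (abs_value(pivot) < alpha)
        return pivot < 0.0 ? -alpha : alpha;
    return pivot;
}
```

Keeping the sign of the tiny pivot, as the text describes, avoids flipping the sign of the corresponding factor entry while still bounding it away from zero.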

Sparse Data Storage

Intel® oneAPI Math Kernel Library (oneMKL) PARDISO stores sparse data in several formats:
• CSR3: The three-array variation of the compressed sparse row format described in Three Array Variation of
CSR Format.
• BSR3: The three-array variation of the block compressed sparse row format described in Three Array
Variation of BSR Format. Use iparm[36] to specify the block size.
• VBSR: Variable BSR format. Intel® oneAPI Math Kernel Library (oneMKL) PARDISO analyzes the matrix
provided in CSR3 format and converts it into an internal structure that can improve performance for
matrices with a block structure. Use iparm[36] = -t (0 < t ≤ 100) to specify use of the internal VBSR format
and to set the degree of similarity required to combine elements of the matrix. For example, if you set
iparm[36] = -80, two rows of the input matrix are combined when their non-zero patterns are 80% or
more similar.
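To make the CSR3 layout concrete, the following C sketch converts a small dense, row-major matrix into the three arrays that pardiso receives as a (values), ja (columns), and ia (rowIndex). The helper name and its upper_only/base arguments are hypothetical. For symmetric matrix types it keeps only the upper triangle and always stores the diagonal entry, even when it is zero, following the storage recommendations in this section; base = 1 produces the one-based indexing that pardiso uses by default.

```c
#include <assert.h>

#ifndef MKL_INT
#define MKL_INT int /* LP64 stand-in when mkl.h is not included */
#endif

/* Hypothetical helper: convert a dense n-by-n row-major matrix to CSR3.
   upper_only != 0 keeps only the upper triangle (symmetric mtype values)
   and always stores the diagonal entry, even when it is zero.
   base is 1 for one-based indexing (pardiso's default) or 0 for zero-based.
   ia needs n + 1 entries; ja and a need room for every stored entry
   (at most n*n). Returns the number of stored entries. */
MKL_INT dense_to_csr3(MKL_INT n, const double *dense, int upper_only, int base,
                      MKL_INT *ia, MKL_INT *ja, double *a)
{
    MKL_INT nnz = 0;
    for (MKL_INT i = 0; i < n; ++i) {
        ia[i] = nnz + base;                       /* start of row i */
        /* Scanning j upward keeps column indices increasing within the row. */
        for (MKL_INT j = upper_only ? i : 0; j < n; ++j) {
            double v = dense[i * n + j];
            if (v != 0.0 || (upper_only && j == i)) {
                ja[nnz] = j + base;
                a[nnz] = v;
                ++nnz;
            }
        }
    }
    ia[n] = nnz + base;                           /* one past the last entry */
    return nnz;
}
```

For example, the symmetric matrix with rows (4, 1, 0), (1, 0, 2), (0, 2, 5) yields ia = {1, 3, 5, 6}, ja = {1, 2, 2, 3, 3}, and a = {4, 1, 0, 2, 5} with upper_only = 1 and base = 1.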

NOTE
Intel® oneAPI Math Kernel Library (oneMKL) supports the VBSR format only for real symmetric positive
definite or indefinite matrices (mtype = 2 or mtype = -2).

Intel® oneAPI Math Kernel Library (oneMKL) supports these features for all matrix types as long as
iparm[23] = 1:

• iparm[30] > 0: Partial solution
• iparm[35] > 0: Schur complement
• iparm[59] > 0: OOC Intel® oneAPI Math Kernel Library (oneMKL) PARDISO

For all storage formats, the Intel® oneAPI Math Kernel Library (oneMKL) PARDISO parameter ja is used for
the columns array, ia is used for rowIndex, and a is used for values. The algorithms in Intel® oneAPI Math
Kernel Library (oneMKL) PARDISO require the column indices ja to be in increasing order per row and, for any
structurally symmetric matrix, the diagonal element in each row to be present. For symmetric or
nonsymmetric matrices, diagonal elements that are equal to zero need not be stored.

Caution
Intel® oneAPI Math Kernel Library (oneMKL) PARDISO column indices ja must be in increasing order
per row. You can validate the sparse matrix structure with the matrix checker (iparm[26]).

NOTE
While storing zero diagonal elements is not required for symmetric matrices, you should store them
explicitly. Otherwise, Intel® oneAPI Math Kernel Library (oneMKL) PARDISO creates internal copies of the
arrays ia, ja, and a that include all diagonal elements, which requires additional memory and computation
time. However, the memory and time needed for the diagonal elements in the internal arrays are usually
insignificant compared to the memory and time required to factor and solve the matrix.
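Before calling pardiso, the structural requirements described above can be checked with a few lines of C. This is only a minimal sketch in the spirit of the matrix checker (iparm[26]), not the sparse_matrix_checker routine: the function name is hypothetical, and treating repeated column indices as an error is an assumption of this sketch. It returns -1 when the structure looks valid, or the zero-based index of the first offending row.

```c
#include <assert.h>

#ifndef MKL_INT
#define MKL_INT int /* LP64 stand-in when mkl.h is not included */
#endif

/* Hypothetical pre-check for CSR3 structure with index base 0 or 1.
   need_diagonal != 0 additionally requires a stored diagonal entry per row. */
MKL_INT check_csr3(MKL_INT n, const MKL_INT *ia, const MKL_INT *ja,
                   int base, int need_diagonal)
{
    for (MKL_INT i = 0; i < n; ++i) {
        MKL_INT begin = ia[i] - base, end = ia[i + 1] - base;
        int has_diag = 0;
        if (end < begin)
            return i;                          /* rowIndex must not decrease */
        for (MKL_INT k = begin; k < end; ++k) {
            MKL_INT col = ja[k] - base;
            if (col < 0 || col >= n)
                return i;                      /* column index out of range */
            if (k > begin && ja[k] <= ja[k - 1])
                return i;                      /* not strictly increasing */
            if (col == i)
                has_diag = 1;
        }
        if (need_diagonal && !has_diag)
            return i;                          /* diagonal entry missing */
    }
    return -1;                                 /* structure looks valid */
}
```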

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.
Notice revision #20201201

Storage of Matrices
By default, Intel® oneAPI Math Kernel Library (oneMKL) PARDISO stores data in RAM. This is referred to as
In-Core (IC) mode. However, you can specify that Intel® oneAPI Math Kernel Library (oneMKL) PARDISO
store matrices on disk by setting iparm[59]. This mode is called the Out-of-Core (OOC) mode.

You can set the following parameters for the OOC mode.

Parameter/Environment Variable Name   Description
MKL_PARDISO_OOC_PATH                  Directory for storing data created in the OOC mode.
MKL_PARDISO_OOC_FILE_NAME             Full file name (including path) used for the OOC files.
MKL_PARDISO_OOC_MAX_CORE_SIZE         Maximum size of RAM (in megabytes) available for Intel® oneAPI Math Kernel Library (oneMKL) PARDISO.
MKL_PARDISO_OOC_MAX_SWAP_SIZE         Maximum swap size (in megabytes) available for Intel® oneAPI Math Kernel Library (oneMKL) PARDISO.
MKL_PARDISO_OOC_KEEP_FILE             Flag that determines whether temporary data files are deleted or kept.

By default, the current working directory is used as the directory path for storing data in OOC mode. All
work arrays are stored in files named ooc_temp with different extensions. When
MKL_PARDISO_OOC_FILE_NAME is not set and MKL_PARDISO_OOC_PATH is set, the names of the created files
contain <path>/mkl_pardiso or <path>\mkl_pardiso, depending on the OS. Setting
MKL_PARDISO_OOC_FILE_NAME=<filename> overrides the path that may have been set in
MKL_PARDISO_OOC_PATH; in this case, <filename> is used for naming the OOC files.
By default, MKL_PARDISO_OOC_MAX_CORE_SIZE is 2000 (MB) and MKL_PARDISO_OOC_MAX_SWAP_SIZE is 0.

NOTE
Do not set the sum of MKL_PARDISO_OOC_MAX_CORE_SIZE and MKL_PARDISO_OOC_MAX_SWAP_SIZE
greater than the size of the RAM plus the size of the swap memory. Be sure to allow enough free
memory for the operating system and any other processes which need to be running.

By default, all temporary data files are deleted. To keep them, set MKL_PARDISO_OOC_KEEP_FILE to 0.

OOC parameters can be set either with environment variables or in a configuration file. When a
configuration file is used, its default name is pardiso_ooc.cfg, and it should be placed in the working
directory. If needed, you can set the path to the configuration file and its name with the environment
variables MKL_PARDISO_OOC_CFG_PATH and MKL_PARDISO_OOC_CFG_FILE_NAME. These variables specify the path and
filename as follows:
• Linux* OS and OS X*: <MKL_PARDISO_OOC_CFG_PATH>/<MKL_PARDISO_OOC_CFG_FILE_NAME>
• Windows* OS: <MKL_PARDISO_OOC_CFG_PATH>\<MKL_PARDISO_OOC_CFG_FILE_NAME>

An example of the configuration file:

MKL_PARDISO_OOC_PATH = <path>
MKL_PARDISO_OOC_MAX_CORE_SIZE = N
MKL_PARDISO_OOC_MAX_SWAP_SIZE = K
MKL_PARDISO_OOC_KEEP_FILE = 0 (or 1)

Caution
The maximum length of the path lines in the configuration files is 1000 characters.
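The configuration file shown above can also be generated from a program. The sketch below is a hypothetical helper that formats the documented keys into a caller-provided buffer, which you could then write to pardiso_ooc.cfg with fputs. It conservatively applies the 1000-character limit to the whole path line, since the text does not say whether the key counts toward the limit.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format an OOC configuration in the "KEY = value"
   layout shown above. Returns the number of characters written, or -1 if
   the path line would exceed 1000 characters or the buffer is too small. */
int format_ooc_config(char *buf, size_t size, const char *path,
                      long max_core_mb, long max_swap_mb, int keep_file)
{
    /* Conservative reading of the limit: the whole path line must fit. */
    if (strlen("MKL_PARDISO_OOC_PATH = ") + strlen(path) > 1000)
        return -1;
    int written = snprintf(buf, size,
                           "MKL_PARDISO_OOC_PATH = %s\n"
                           "MKL_PARDISO_OOC_MAX_CORE_SIZE = %ld\n"
                           "MKL_PARDISO_OOC_MAX_SWAP_SIZE = %ld\n"
                           "MKL_PARDISO_OOC_KEEP_FILE = %d\n",
                           path, max_core_mb, max_swap_mb, keep_file);
    if (written < 0 || (size_t)written >= size)
        return -1;                 /* encoding error or truncated output */
    return written;
}
```

Sizes are in megabytes; per the text above, a MKL_PARDISO_OOC_KEEP_FILE value of 0 keeps the temporary files.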

Alternatively, the OOC parameters can be set as environment variables from the command line.

For Linux* OS and OS X*:

export MKL_PARDISO_OOC_PATH=<path>
export MKL_PARDISO_OOC_MAX_CORE_SIZE=N
export MKL_PARDISO_OOC_MAX_SWAP_SIZE=K
export MKL_PARDISO_OOC_KEEP_FILE=0

For Windows* OS:

set MKL_PARDISO_OOC_PATH=<path>
set MKL_PARDISO_OOC_MAX_CORE_SIZE=N
set MKL_PARDISO_OOC_MAX_SWAP_SIZE=K
set MKL_PARDISO_OOC_KEEP_FILE=0

where <path> should follow the OS naming convention, N and K are sizes in megabytes, and
MKL_PARDISO_OOC_KEEP_FILE is set to 0 or 1. Do not put spaces around the = sign: both shells would otherwise
misinterpret the assignment.

Direct-Iterative Preconditioning for Nonsymmetric Linear Systems

The solver uses a combination of direct and iterative methods [Sonn89] to accelerate the linear solution
process for transient simulation. Most applications of sparse solvers require solutions of systems whose
coefficient values change gradually while the sparsity pattern stays identical. In these applications, the
analysis phase of the solver has to be performed only once, and the numerical factorizations are the most
time-consuming steps during the simulation. Intel® oneAPI Math Kernel Library (oneMKL) PARDISO uses a
numerical factorization and applies the factors in a preconditioned Krylov subspace iteration. If the
iteration does not converge, the solver automatically switches back to the numerical factorization. This
method can be applied to nonsymmetric matrices in Intel® oneAPI Math Kernel Library (oneMKL) PARDISO. You
can select the method using the iparm[3] input parameter. The iparm[19] parameter returns the error status
after running Intel® oneAPI Math Kernel Library (oneMKL) PARDISO.
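The iparm[3] value packs two settings into one integer. The sketch below assumes the encoding iparm[3] = 10*L + K described under pardiso iparm Parameter (not reproduced in this section), where K = 1 selects the LU-preconditioned CGS iteration for nonsymmetric matrices, K = 2 a CG iteration for symmetric positive definite matrices, K = 0 leaves the iteration off, and 10^(-L) is the stopping tolerance. The enum and function names are hypothetical; verify the encoding against the iparm reference for your oneMKL version.

```c
#include <assert.h>

#ifndef MKL_INT
#define MKL_INT int /* LP64 stand-in when mkl.h is not included */
#endif

/* Assumed values of K in iparm[3] = 10*L + K (hypothetical names). */
enum { CGS_OFF = 0, CGS_NONSYMMETRIC = 1, CG_SPD = 2 };

/* Hypothetical helper: request a Krylov stopping tolerance of 10^(-digits)
   with iteration kind k. Returns -1 for out-of-range input. */
MKL_INT encode_iparm3(int digits, int k)
{
    if (digits < 0 || k < CGS_OFF || k > CG_SPD)
        return -1;
    return (MKL_INT)(10 * digits + k);
}
```

Under this assumption, iparm[3] = encode_iparm3(6, CGS_NONSYMMETRIC), that is 61, would request CGS with a stopping criterion of 1.0E-6 for a nonsymmetric matrix in later factorization calls that reuse the same structure.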

Single and Double Precision Computations

Intel® oneAPI Math Kernel Library (oneMKL) PARDISO can solve tasks in single or double precision. Each
precision has its benefits and drawbacks. Double precision variables store values with more digits, so the
solver uses more memory for keeping data. However, this mode solves systems with better accuracy, which is
especially important for input matrices with large condition numbers.
Single precision variables store values with fewer digits, so the solver uses less memory than in double
precision mode, and this mode usually takes less time. However, because computations are less precise, only
some systems of equations can be solved accurately enough in single precision.

Separate Forward and Backward Substitution

The solver execution step (see parameter phase = 33 below) can be divided into two or three separate
substitutions: forward, backward, and possibly diagonal. The following examples with different matrix types
illustrate this separation.
A real symmetric positive definite matrix A (mtype = 2) is factored by Intel® oneAPI Math Kernel Library
(oneMKL) PARDISO as $A = LL^T$. In this case, the solution of the system $Ax = b$ can be found as a sequence
of substitutions: $Ly = b$ (forward substitution, phase = 331) and $L^T x = y$ (backward substitution,
phase = 333).

A real nonsymmetric matrix A (mtype = 11) is factored by Intel® oneAPI Math Kernel Library (oneMKL)
PARDISO as $A = LU$. In this case, the solution of the system $Ax = b$ can be found by the following
sequence: $Ly = b$ (forward substitution, phase = 331) and $Ux = y$ (backward substitution, phase = 333).
Solving a system with a real symmetric indefinite matrix A (mtype = -2) is slightly different from the cases
above. Intel® oneAPI Math Kernel Library (oneMKL) PARDISO factors this matrix as $A = LDL^T$, and the
solution of the system $Ax = b$ can be calculated as the following sequence of substitutions: $Ly = b$
(forward substitution, phase = 331), $Dv = y$ (diagonal substitution, phase = 332), and finally $L^T x = v$
(backward substitution, phase = 333). Diagonal substitution makes sense only for symmetric indefinite
matrices (mtype = -2, -4, 6). For matrices of other types, a solution can be found as described in the first
two examples.
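The arithmetic behind phases 331, 332, and 333 for $A = LDL^T$ can be sketched with dense arrays. This illustrates the three substitutions only and does not call pardiso: the function names are hypothetical, L is assumed to be unit lower triangular and stored row-major, D is assumed to have only 1x1 pivots (no 2x2 Bunch-Kaufman blocks), and the fill-in reducing permutation is ignored.

```c
#include <assert.h>

/* Forward substitution L*y = b with unit lower triangular L (cf. phase = 331). */
void forward_subst(int n, const double *L, const double *b, double *y)
{
    for (int i = 0; i < n; ++i) {
        double s = b[i];
        for (int j = 0; j < i; ++j)
            s -= L[i * n + j] * y[j];
        y[i] = s;                        /* unit diagonal: no division needed */
    }
}

/* Diagonal substitution D*v = y with 1x1 pivots d[i] (cf. phase = 332). */
void diagonal_subst(int n, const double *d, const double *y, double *v)
{
    for (int i = 0; i < n; ++i)
        v[i] = y[i] / d[i];
}

/* Backward substitution L^T*x = v (cf. phase = 333). */
void backward_subst(int n, const double *L, const double *v, double *x)
{
    for (int i = n - 1; i >= 0; --i) {
        double s = v[i];
        for (int j = i + 1; j < n; ++j)
            s -= L[j * n + i] * x[j];    /* (L^T)[i][j] == L[j][i] */
        x[i] = s;
    }
}
```

For L = [[1, 0], [2, 1]] and D = diag(4, -1), A = [[4, 8], [8, 15]]; with b = (12, 23) the three steps produce y = (12, -1), v = (3, 1), and x = (1, 1), the same result a single solve gives.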

Caution
The number of refinement steps (iparm[7]) must be set to zero if a solution is calculated with
separate substitutions (phase = 331, 332, 333); otherwise, Intel® oneAPI Math Kernel Library
(oneMKL) PARDISO produces wrong results.

NOTE
Different pivoting strategies (iparm[20]) produce different $LDL^T$ factorizations. Therefore, the results of
forward, diagonal, and backward substitution with diagonal pivoting can differ from the results of the same
steps with Bunch-Kaufman pivoting. However, the final result of executing forward, diagonal, and backward
substitution in sequence equals the result of the full solving step (phase = 33) regardless of the pivoting
used.

Callback Function for Pivoting Control

In-core Intel® oneAPI Math Kernel Library (oneMKL) PARDISO allows you to control pivoting with a callback
routine, mkl_pardiso_pivot. You can then use the pardiso_getdiag routine to access the diagonal elements.
Set iparm[55] to 1 to use the callback functionality.

Low Rank Update

Use low rank update to accelerate the factorization step in Intel® oneAPI Math Kernel Library (oneMKL)
PARDISO when you work with multiple matrices that have identical structure and similar values. After calling
pardiso in the usual manner for factorization (phase = 12, 13, 22, or 23) for some matrix A1, low rank update
can be applied to the factorization step (phase = 22 or 23) of some matrix A2 with identical structure.

To use the low rank update feature, set iparm[38] = 1 and also set iparm[23] = 10. Additionally, use the
perm parameter to supply an array that lists the values of A2 that differ from those of A1, as outlined in
the pardiso perm parameter description.

Important
Low rank update can only be called for matrices with exactly the same pattern of nonzero values. As
such, the values of the mtype, ia, ja, and iparm[23] parameters should also be identical. In general,
the low rank factorization should be called with the same parameters as the preceding factorization
step for the same internal data structure handle (except for array a, iparm[38], and perm).

Low rank update does not currently support Intel TBB threading. With Intel TBB threading, Intel® oneAPI
Math Kernel Library (oneMKL) PARDISO performs a full factorization instead.
Low rank update cannot be used in combination with a user-supplied permutation vector; in other
words, you must use the default values iparm[4] = 0, iparm[30] = 0, and iparm[35] = 0.
Additionally, iparm[3], iparm[5], iparm[27], iparm[36], iparm[55], and iparm[59] must all be
set to the default value of 0.
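The perm array for low rank update can be assembled by comparing the value arrays of A1 and A2, which share the same ia and ja. The helper below is a hypothetical sketch: it accepts one- or zero-based ia/ja (base), but always writes zero-based row and column indices, as the perm description requires. For symmetric matrix types it records the positions of the stored upper-triangular entries, which is an assumption of this sketch. The caller must size perm to at least 2*ndiff + 1 entries; 2*nnz + 1 is always enough.

```c
#include <assert.h>

#ifndef MKL_INT
#define MKL_INT int /* LP64 stand-in when mkl.h is not included */
#endif

/* Hypothetical helper: A1 and A2 share the CSR3 structure (ia, ja) with index
   base 0 or 1. Writes perm = {ndiff, row_1, col_1, ..., row_ndiff, col_ndiff}
   using zero-based indices regardless of base, and returns ndiff. */
MKL_INT build_low_rank_perm(MKL_INT n, const MKL_INT *ia, const MKL_INT *ja,
                            int base, const double *a1, const double *a2,
                            MKL_INT *perm)
{
    MKL_INT ndiff = 0;
    for (MKL_INT i = 0; i < n; ++i) {
        for (MKL_INT k = ia[i] - base; k < ia[i + 1] - base; ++k) {
            if (a1[k] != a2[k]) {                     /* value changed */
                perm[1 + 2 * ndiff] = i;              /* zero-based row */
                perm[2 + 2 * ndiff] = ja[k] - base;   /* zero-based column */
                ++ndiff;
            }
        }
    }
    perm[0] = ndiff;
    return ndiff;
}
```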

pardiso
Calculates the solution of a set of sparse linear
equations with single or multiple right-hand sides.

Syntax
void pardiso (_MKL_DSS_HANDLE_t pt, const MKL_INT *maxfct, const MKL_INT *mnum, const
MKL_INT *mtype, const MKL_INT *phase, const MKL_INT *n, const void *a, const MKL_INT
*ia, const MKL_INT *ja, MKL_INT *perm, const MKL_INT *nrhs, MKL_INT *iparm, const
MKL_INT *msglvl, void *b, void *x, MKL_INT *error);

Include Files
• mkl.h

Description
The pardiso routine calculates the solution of a set of sparse linear equations

A*X = B
with single or multiple right-hand sides, using a parallel LU, $LDL^T$, or $LL^T$ factorization, where A is
an n-by-n matrix, and X and B are n-by-nrhs vectors or matrices.

Notes
• This routine supports the mkl_progress function with OpenMP, TBB, and sequential threading. See
mkl_progress for details. This feature is not supported when iparm[23] = 10.
• If iparm[26] is set to 1 (matrix checker), Intel® oneAPI Math Kernel Library PARDISO uses the
auxiliary routine sparse_matrix_checker to check the integer arrays ia and ja.
sparse_matrix_checker has its own set of error values (from 21 to 24) that are returned in the
event of an unsuccessful matrix check. For more details, refer to the sparse_matrix_checker
documentation.


Input Parameters

pt Array of size 64.

Handle to the internal data structure. The entries must be set to zero before the first call to pardiso.
Each handle is unique to a factorization.

Caution
After the first call to pardiso, do not modify pt directly, as that could cause a serious memory leak.

Use the pardiso_handle_store or pardiso_handle_store_64 routine to store the contents of pt to a file.
Restore the contents of pt from the file using pardiso_handle_restore or pardiso_handle_restore_64. Use
pardiso_handle_store and pardiso_handle_restore with pardiso, and pardiso_handle_store_64 and
pardiso_handle_restore_64 with pardiso_64.


maxfct Maximum number of factors with identical sparsity structure that must be
kept in memory at the same time. In most applications this value is equal
to 1. It is possible to store several different factorizations with the same
nonzero structure at the same time in the internal data structure
management of the solver.
pardiso can process several matrices with an identical matrix sparsity
pattern and it can store the factors of these matrices at the same time.
Matrices with a different sparsity structure can be kept in memory with
different memory address pointers pt.

mnum Indicates the actual matrix for the solution phase. With this scalar you can
define which matrix to factorize. The value must satisfy 1 ≤ mnum ≤ maxfct.

In most applications this value is 1.

mtype Defines the matrix type, which influences the pivoting method. The Intel®
oneAPI Math Kernel Library (oneMKL) PARDISO solver supports the
following matrices:

1 real and structurally symmetric

2 real and symmetric positive definite

-2 real and symmetric indefinite

3 complex and structurally symmetric

4 complex and Hermitian positive definite

-4 complex and Hermitian indefinite

6 complex and symmetric

11 real and nonsymmetric

13 complex and nonsymmetric
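Named constants can make mtype values easier to read in calling code. The enum and predicate names below are hypothetical; the predicates only encode facts stated in this section: the symmetric types (including the Hermitian ones) pass only the upper triangle, the diagonal substitution (phase = 332) applies to mtype = -2, -4, and 6, and types 3, 4, -4, 6, and 13 are complex.

```c
#include <assert.h>

#ifndef MKL_INT
#define MKL_INT int /* LP64 stand-in when mkl.h is not included */
#endif

/* Hypothetical names for the mtype values in the table above. */
enum pardiso_mtype {
    MTYPE_REAL_STRUCT_SYM    = 1,
    MTYPE_REAL_SPD           = 2,
    MTYPE_REAL_SYM_INDEF     = -2,
    MTYPE_COMPLEX_STRUCT_SYM = 3,
    MTYPE_COMPLEX_HPD        = 4,
    MTYPE_COMPLEX_HERM_INDEF = -4,
    MTYPE_COMPLEX_SYM        = 6,
    MTYPE_REAL_NONSYM        = 11,
    MTYPE_COMPLEX_NONSYM     = 13
};

/* Symmetric/Hermitian types: ia, ja, and a hold only the upper triangle. */
int mtype_upper_triangle_only(MKL_INT mtype)
{
    return mtype == MTYPE_REAL_SPD || mtype == MTYPE_REAL_SYM_INDEF ||
           mtype == MTYPE_COMPLEX_HPD || mtype == MTYPE_COMPLEX_HERM_INDEF ||
           mtype == MTYPE_COMPLEX_SYM;
}

/* Types for which a separate diagonal substitution (phase = 332) makes sense. */
int mtype_has_diagonal_step(MKL_INT mtype)
{
    return mtype == MTYPE_REAL_SYM_INDEF || mtype == MTYPE_COMPLEX_HERM_INDEF ||
           mtype == MTYPE_COMPLEX_SYM;
}

/* Complex types: the array a holds complex values. */
int mtype_is_complex(MKL_INT mtype)
{
    return mtype == MTYPE_COMPLEX_STRUCT_SYM || mtype == MTYPE_COMPLEX_HPD ||
           mtype == MTYPE_COMPLEX_HERM_INDEF || mtype == MTYPE_COMPLEX_SYM ||
           mtype == MTYPE_COMPLEX_NONSYM;
}
```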

phase Controls the execution of the solver. Usually it is a two- or three-digit integer. The first digit
indicates the starting phase of execution and the second digit indicates the ending phase. Intel® oneAPI
Math Kernel Library (oneMKL) PARDISO has the following phases of execution:
• Phase 1: Fill-reduction analysis and symbolic factorization
• Phase 2: Numerical factorization
• Phase 3: Forward and backward solve including optional iterative refinement
This phase can be divided into two or three separate substitutions: forward, backward, and diagonal
(see Separate Forward and Backward Substitution).
• Memory release phase (phase = 0 or phase = -1)

If a previous call to the routine has computed information from previous phases, execution may start at
any phase. The phase parameter can have the following values:

phase   Solver Execution Steps
11      Analysis
12      Analysis, numerical factorization
13      Analysis, numerical factorization, solve, iterative refinement
22      Numerical factorization
23      Numerical factorization, solve, iterative refinement
33      Solve, iterative refinement
331     like phase=33, but only forward substitution
332     like phase=33, but only diagonal substitution (if available)
333     like phase=33, but only backward substitution
0       Release internal memory for L and U matrix number mnum
-1      Release all internal memory for all matrices
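As a reading aid for the table above, the following hypothetical C helper splits a phase value into its starting and ending phases and, for 331 to 333, the substitution step. It accepts only the values listed in the table; the struct and function names are illustrative, not part of the oneMKL API.

```c
#include <assert.h>

/* Hypothetical decoded form of the phase parameter. */
typedef struct {
    int start;    /* first phase executed: 1, 2 or 3 (0 for memory release) */
    int end;      /* last phase executed */
    int substep;  /* for phase 33x: 1 forward, 2 diagonal, 3 backward; else 0 */
} phase_info;

/* Returns 1 and fills info for a phase value from the table, 0 otherwise. */
int decode_phase(int phase, phase_info *info)
{
    info->start = info->end = info->substep = 0;
    if (phase == 0 || phase == -1)
        return 1;                               /* memory release */
    if (phase >= 331 && phase <= 333) {
        info->start = info->end = 3;
        info->substep = phase - 330;            /* 1, 2 or 3 */
        return 1;
    }
    if (phase >= 11 && phase <= 33) {
        int first = phase / 10, last = phase % 10;
        if (first >= 1 && first <= last && last <= 3) {
            info->start = first;                /* first digit: starting phase */
            info->end = last;                   /* second digit: ending phase */
            return 1;
        }
    }
    return 0;                                   /* not a listed phase value */
}
```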

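If iparm[35] = 0, phases 331, 332, and 333 perform this decomposition:

$$A = \begin{pmatrix} L_{11} & 0 \\ L_{12} & L_{22} \end{pmatrix} \begin{pmatrix} D_{11} & 0 \\ 0 & D_{22} \end{pmatrix} \begin{pmatrix} U_{11} & U_{21} \\ 0 & U_{22} \end{pmatrix}$$

If iparm[35] = 2, phases 331, 332, and 333 perform a different decomposition:

$$A = \begin{pmatrix} L_{11} & 0 \\ L_{12} & I \end{pmatrix} \begin{pmatrix} I & 0 \\ 0 & S \end{pmatrix} \begin{pmatrix} U_{11} & U_{21} \\ 0 & I \end{pmatrix}$$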

You can supply a custom implementation for phase 332 instead of calling
pardiso. For example, it can be implemented with dense LAPACK
functionality. Custom implementation also allows you to substitute the
matrix S with your own.

NOTE
For very large Schur complement matrices use LAPACK
functionality to compute the Schur complement vector instead
of the Intel® oneAPI Math Kernel Library (oneMKL) PARDISO
phase 332 implementation.

n Number of equations in the sparse linear systems of equations A*X = B.

Constraint: n > 0.

a Array. Contains the non-zero elements of the coefficient matrix A corresponding to the indices in ja.
The coefficient matrix can be either real or complex. The matrix must be stored in the three-array variant
of the compressed sparse row (CSR3) or in the three-array variant of the block compressed sparse row (BSR3)
format, and the matrix must be stored with increasing values of ja for each row.


For CSR3 format, the size of a is the same as that of ja. Refer to the
values array description in Three Array Variation of CSR Format for more
details.
For BSR3 format the size of a is the size of ja multiplied by the square of
the block size. Refer to the values array description in Three Array
Variation of BSR Format for more details.

NOTE
If you set iparm[36] to a negative value, Intel® oneAPI Math
Kernel Library (oneMKL) PARDISO converts the data from CSR3
format to an internal variable BSR (VBSR) format. See Sparse
Data Storage.

ia Array, size (n+1).

For CSR3 format, ia[i] (i<n) points to the first column index of row i in the array ja. That is, ia[i] gives
the index of the element in array a that contains the first non-zero element from row i of A. The last
element ia[n] is taken to be equal to the number of non-zero elements in A, plus one (with the default
one-based indexing; with zero-based indexing it equals the number of non-zero elements). Refer to the
rowIndex array description in Three Array Variation of CSR Format for more details.
For BSR3 format, ia[i] (i<n) points to the first column index of row i in the array ja. That is, ia[i] gives
the index of the element in array a that contains the first non-zero block from row i of A. The last element
ia[n] is taken to be equal to the number of non-zero blocks in A, plus one (with one-based indexing). Refer
to the rowIndex array description in Three Array Variation of BSR Format for more details.
The array ia is accessed in all phases of the solution process.

Indexing of ia is one-based by default, but it can be changed to zero-based by setting the appropriate value
to the parameter iparm[34].

ja For CSR3 format, array ja contains column indices of the sparse matrix A.
It is important that the indices are in increasing order per row. For
structurally symmetric matrices it is assumed that all diagonal elements are
stored (even if they are zeros) in the list of non-zero elements in a and ja.
For symmetric matrices, the solver needs only the upper triangular part of
the system as is shown for columns array in Three Array Variation of CSR
Format.
For BSR3 format, array ja contains column indices of the sparse matrix A.
It is important that the indices are in increasing order per row. For
structurally symmetric matrices it is assumed that all diagonal blocks are
stored (even if they are zeros) in the list of non-zero blocks in a and ja. For
symmetric matrices, the solver needs only the upper triangular part of the
system as is shown for columns array in Three Array Variation of BSR
Format.
The array ja is accessed in all phases of the solution process.

Indexing of ja is one-based by default, but it can be changed to zero-based by setting the appropriate value
to the parameter iparm[34].
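Switching between the default one-based indexing and zero-based indexing (selected through iparm[34]; per the iparm reference, a value of 1 selects zero-based indexing) only requires shifting every entry of ia and ja by one. The helper below is a hypothetical sketch; note that ia[n] changes from nnz + 1 to nnz when moving to zero-based indexing.

```c
#include <assert.h>

#ifndef MKL_INT
#define MKL_INT int /* LP64 stand-in when mkl.h is not included */
#endif

/* Hypothetical helper: shift CSR3 index arrays in place.
   delta = -1 converts one-based to zero-based, delta = +1 the reverse. */
void shift_csr3_base(MKL_INT n, MKL_INT *ia, MKL_INT *ja, MKL_INT delta)
{
    MKL_INT nnz = ia[n] - ia[0];        /* stored entries, independent of base */
    for (MKL_INT k = 0; k < nnz; ++k)
        ja[k] += delta;                 /* column indices */
    for (MKL_INT i = 0; i <= n; ++i)
        ia[i] += delta;                 /* row pointers, including ia[n] */
}
```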

perm Array, size (n). Depending on the value of iparm[4] and iparm[30], holds
the permutation vector of size n, specifies elements used for computing a
partial solution, or specifies differing values of the input matrices for low
rank update.

• If iparm[4] = 1, iparm[30] = 0, and iparm[35] = 0, perm specifies the fill-in reducing ordering to the
solver. Let A be the original matrix and $C = PAP^T$ be the permuted matrix. Row (column) i of C is the
perm[i] row (column) of A. The array perm is also used to return the permutation vector calculated during
the fill-in reducing ordering stage.

NOTE
Be aware that setting iparm[4] = 1 prevents use of a parallel
algorithm for the solve step.

• If iparm[4] = 2, iparm[30] = 0, and iparm[35] = 0, the permutation vector computed in phase 11 is
returned in the perm array.
• If iparm[4] = 0, iparm[30] > 0, and iparm[35] = 0, perm specifies
elements of the right-hand side to use or of the solution to compute for
a partial solution.
• If iparm[4] = 0, iparm[30] = 0, and iparm[35] > 0, perm specifies
elements for a Schur complement.
• If iparm[38] = 1, perm specifies values that differ in A for low rank
update (see Low Rank Update). The size of the array must be at least
2*ndiff + 1, where ndiff is the number of values of A that are different.
The values of perm should be:
perm = {ndiff, row_index1, column_index1, row_index2,
column_index2, ..., row_index_ndiff, column_index_ndiff}
where row_index_m and column_index_m are the row and column
indices of the m-th differing non-zero value in matrix A. The row and
column index pairs can be in any order, but must use zero-based
indexing regardless of the value of iparm[34].

See iparm[4], iparm[30], and iparm[38] for more details.

Indexing of perm is one-based by default, but unless iparm[38] = 1, it can be changed to zero-based by
setting the appropriate value to the parameter iparm[34].
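For interpreting the permutation returned in perm (for example, with iparm[4] = 2), the rule that row i of $C = PAP^T$ is row perm[i] of A means that permuting a vector gathers entries as y[i] = x[perm[i]]. The hypothetical helpers below apply that gather and build the inverse permutation, accepting a one- or zero-based perm through base. pardiso performs these operations internally, so this sketch is only for inspecting the ordering.

```c
#include <assert.h>

#ifndef MKL_INT
#define MKL_INT int /* LP64 stand-in when mkl.h is not included */
#endif

/* Gather y = P*x, i.e. y[i] = x[perm[i]] (perm has index base 0 or 1). */
void apply_perm(MKL_INT n, const MKL_INT *perm, int base,
                const double *x, double *y)
{
    for (MKL_INT i = 0; i < n; ++i)
        y[i] = x[perm[i] - base];
}

/* Zero-based inverse permutation: inv[perm[i]] = i, so x[j] = y[inv[j]]. */
void invert_perm(MKL_INT n, const MKL_INT *perm, int base, MKL_INT *inv)
{
    for (MKL_INT i = 0; i < n; ++i)
        inv[perm[i] - base] = i;
}
```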

nrhs Number of right-hand sides that need to be solved for.

iparm Array, size (64). This array is used to pass various parameters to Intel®
oneAPI Math Kernel Library (oneMKL) PARDISO and to return some useful
information after execution of the solver.
See pardiso iparm Parameter for more details about the iparm parameters.

msglvl Message level information. If msglvl = 0, pardiso generates no output; if msglvl = 1, the solver
prints statistical information to the screen.

b Array, size (n*nrhs). On entry, contains the right-hand side vector/matrix B, which is placed in memory
contiguously. The b[i + k*n] element must hold the i-th component of the k-th right-hand side vector. Note
that b is only accessed in the solution phase.
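The right-hand sides are stored one after another, so component i of right-hand side k sits at offset i + k*n, the same layout given below for the solution array x. The sketch packs separately stored vectors into that layout; pack_rhs and rhs_offset are hypothetical names.

```c
#include <assert.h>

#ifndef MKL_INT
#define MKL_INT int /* LP64 stand-in when mkl.h is not included */
#endif

/* Offset of component i of right-hand side (or solution) k in b or x,
   where each of the nrhs vectors of length n is stored contiguously. */
static MKL_INT rhs_offset(MKL_INT i, MKL_INT k, MKL_INT n)
{
    return i + k * n;
}

/* Hypothetical helper: copy nrhs separately stored vectors into b. */
void pack_rhs(MKL_INT n, MKL_INT nrhs, const double *const *vectors, double *b)
{
    for (MKL_INT k = 0; k < nrhs; ++k)
        for (MKL_INT i = 0; i < n; ++i)
            b[rhs_offset(i, k, n)] = vectors[k][i];
}
```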


Output Parameters
(See also Intel MKL PARDISO Parameters in Tabular Form.)

pt Handle to internal data structure.

perm See the Input Parameter description of the perm array.

iparm On output, some iparm values report information such as the numbers of
non-zero elements in the factors.
See pardiso iparm Parameter for more details about the iparm parameters.

b On output, the array is replaced with the solution if iparm[5] = 1.

x Array, size (n*nrhs). If iparm[5] = 0, it contains the solution vector/matrix X, which is placed
contiguously in memory. The x[i + k*n] element must hold the i-th component of the k-th solution vector.
Note that x is only accessed in the solution phase.

error The error indicator according to the table below:

error   Information
0       no error
-1      input inconsistent
-2      not enough memory
-3      reordering problem
-4      zero pivot, numerical factorization or iterative refinement problem. If the error appears during the
        solution phase, try changing the pivoting perturbation (iparm[9]) and also increase the number of
        iterative refinement steps. If that does not help, consider changing the scaling, matching, and
        pivoting options (iparm[10], iparm[12], iparm[20]).
-5      unclassified (internal) error
-6      reordering failed (matrix types 11 and 13 only)
-7      diagonal matrix is singular
-8      32-bit integer overflow problem
-9      not enough memory for OOC
-10     error opening OOC files
-11     read/write error with OOC files
-12     (pardiso_64 only) pardiso_64 called from 32-bit library
-13     interrupted by the (user-defined) mkl_progress function
-15     internal error that can appear for iparm[23]=10 and iparm[12]=1. Try switching matching off (set
        iparm[12]=0) and rerun.
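When reporting failures, it can help to translate error into the text of the table above. The function below is a hypothetical convenience wrapper; it also recognizes the matrix checker values 21 to 24 mentioned in the Notes for pardiso.

```c
#include <assert.h>
#include <string.h>

#ifndef MKL_INT
#define MKL_INT int /* LP64 stand-in when mkl.h is not included */
#endif

/* Hypothetical helper: map a pardiso error value to a short description. */
const char *pardiso_error_string(MKL_INT error)
{
    if (error >= 21 && error <= 24)
        return "matrix checker (iparm[26]) reported a structure error";
    switch (error) {
    case 0:   return "no error";
    case -1:  return "input inconsistent";
    case -2:  return "not enough memory";
    case -3:  return "reordering problem";
    case -4:  return "zero pivot, numerical factorization or iterative refinement problem";
    case -5:  return "unclassified (internal) error";
    case -6:  return "reordering failed (matrix types 11 and 13 only)";
    case -7:  return "diagonal matrix is singular";
    case -8:  return "32-bit integer overflow problem";
    case -9:  return "not enough memory for OOC";
    case -10: return "error opening OOC files";
    case -11: return "read/write error with OOC files";
    case -12: return "pardiso_64 called from 32-bit library";
    case -13: return "interrupted by the (user-defined) mkl_progress function";
    case -15: return "internal error (iparm[23]=10 with iparm[12]=1); try iparm[12]=0";
    default:  return "unknown error value";
    }
}
```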

pardisoinit
Initialize Intel® oneAPI Math Kernel Library (oneMKL)
PARDISO with default parameters in accordance with
the matrix type.

Syntax
void pardisoinit (_MKL_DSS_HANDLE_t pt, const MKL_INT *mtype, MKL_INT *iparm );

Include Files
• mkl.h

Description

This function initializes the solver handle pt for Intel® oneAPI Math Kernel Library (oneMKL) PARDISO with
zero values (as needed for the very first call of pardiso) and sets default iparm values in accordance with
the matrix type mtype.

The recommended approach is to avoid pardisoinit and instead initialize pt and set the values of the iparm
array manually, because the default parameters might not be the best for a particular use case.
An alternative method to set default iparm values is to call pardiso in the analysis phase with iparm[0] = 0.
In this case, the solver handle pt must be initialized with zero values.
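The zero-initialization described above can be written as follows. This hypothetical helper clears every pt entry and every iparm entry, so iparm[0] = 0 and the first analysis-phase call to pardiso fills in default iparm values. If you set iparm manually instead, set iparm[0] to a nonzero value and fill in the other entries as described in pardiso iparm Parameter.

```c
#include <assert.h>
#include <stddef.h>

#ifndef MKL_INT
#define MKL_INT int /* LP64 stand-in when mkl.h is not included */
#endif

/* Hypothetical helper: prepare pt and iparm before the first pardiso call. */
void init_pardiso_defaults(void *pt[64], MKL_INT iparm[64])
{
    for (int i = 0; i < 64; ++i) {
        pt[i] = NULL;   /* handle must be all zeros; never modify it after the first call */
        iparm[i] = 0;   /* iparm[0] = 0: the solver fills in default values */
    }
}
```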

The pardisoinit routine initializes only the in-core version of Intel® oneAPI Math Kernel Library (oneMKL)
PARDISO. Switching to the out-of-core version of Intel® oneAPI Math Kernel Library (oneMKL) PARDISO as
well as changing default iparm values can be done after the call to pardisoinit but before the first call to
pardiso.
The pardisoinit routine cannot be used together with the pardiso_64 routine.

Input Parameters

mtype Matrix type. Based on this value pardisoinit chooses default values for
the iparm array. Refer to the section oneMKL PARDISO Parameters in
Tabular Form for more details about the default values of Intel® oneAPI Math
Kernel Library (oneMKL) PARDISO.

Output Parameters

pt Array of size 64. Handle to internal data structure. The pardisoinit routine nullifies the array pt.


NOTE
It is very important that pt is initialized with zero before the
first call of Intel® oneAPI Math Kernel Library (oneMKL)
PARDISO. After that first call you must never modify the array,
because it could cause a serious memory leak or a crash.

iparm Array of size 64. This array is used to set various options for Intel® oneAPI
Math Kernel Library (oneMKL) PARDISO and to return some useful
information after execution of the solver. The pardisoinit routine fills in
the iparm array with the default values. Refer to the section oneMKL
PARDISO Parameters in Tabular Form for more details about the default
values of Intel® oneAPI Math Kernel Library (oneMKL) PARDISO.

pardiso_64
Calculates the solution of a set of sparse linear
equations with single or multiple right-hand sides, 64-
bit integer version.

Syntax
void pardiso_64 (_MKL_DSS_HANDLE_t pt, const long long int *maxfct, const long long int
*mnum, const long long int *mtype, const long long int *phase, const long long int *n,
const void *a, const long long int *ia, const long long int *ja, long long int *perm,
const long long int *nrhs, long long int *iparm, const long long int *msglvl, void *b,
void *x, long long int *error);

Include Files
• mkl.h

Description
pardiso_64 is an alternative ILP64 (64-bit integer) version of the pardiso routine (see the Description section
for more details). The interface of pardiso_64 is the same as the interface of pardiso, but it accepts and
returns all integer data as long long int.

Use pardiso_64 instead of pardiso for solving large matrices (with the number of non-zero elements on the
order of 500 million or more). You can use it together with the usual LP64 interfaces for the rest of Intel®
oneAPI Math Kernel Library (oneMKL) functionality. In other words, if you use the 64-bit integer version
(pardiso_64), you do not need to re-link your applications with ILP64 libraries. Note that pardiso_64 may be
slower than the regular pardiso in the reordering and symbolic factorization phase.
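Because pardiso_64 takes every integer argument as long long int, an application whose CSR3 arrays are built with 32-bit int (the LP64 MKL_INT) needs 64-bit copies of them. The helper below is a minimal sketch of that widening step, assuming int source arrays; widen_index_array is a hypothetical name, not a oneMKL routine.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not part of oneMKL): copies a 32-bit index array,
   such as an LP64 ia, ja, or perm array, into a newly allocated
   long long int array that can be passed to pardiso_64.
   Returns NULL if allocation fails; the caller frees the result. */
long long int *widen_index_array(const int *src, size_t len)
{
    long long int *dst = malloc((len > 0 ? len : 1) * sizeof *dst);
    if (dst == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++)
        dst[i] = (long long int)src[i];   /* value-preserving widening */
    return dst;
}
```

You might widen ia (n + 1 entries) and ja (one entry per non-zero) once and free the copies after the final memory-release phase. iparm and the integer scalars must likewise be long long int. The copies double the memory spent on indices, which is worth keeping in mind at the matrix sizes where pardiso_64 is recommended.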

NOTE
pardiso_64 is supported only in the 64-bit libraries. If pardiso_64 is called from the 32-bit libraries,
it returns error = -12.

NOTE
This routine supports the Progress Routine feature. See Progress Function for details.

Input Parameters
The input parameters of pardiso_64 are the same as the input parameters of pardiso, but pardiso_64
accepts all integer data as long long int.

Output Parameters
The output parameters of pardiso_64 are the same as the output parameters of pardiso, but pardiso_64
returns all integer data as long long int.

mkl_pardiso_pivot
Replaces routine which handles Intel® oneAPI Math
Kernel Library (oneMKL) PARDISO pivots with user-
defined routine.

Syntax
void mkl_pardiso_pivot (const void *ai, void *bi, const void *eps);

Include Files
• mkl.h

Description
The mkl_pardiso_pivot routine lets you handle zero or near-zero diagonal elements that arise during numerical
factorization. By default, Intel® oneAPI Math Kernel Library (oneMKL) PARDISO determines that a diagonal
element bi is a pivot if bi < eps, and if so, replaces it with eps. You can instead provide your own routine to
modify the resulting factorized matrix when small elements appear on the diagonal during the factorization
step.

NOTE
To use this routine, you must set iparm[55] to 1 before the main pardiso loop.

NOTE
The matrix types mtype=2 (symmetric positive-definite matrix) and mtype=4 (complex and
Hermitian positive definite) are not supported, because the Cholesky factorization without
pivoting is used for these matrix types.

Input Parameters

ai Diagonal element of initial matrix corresponding to pivot element.

bi Diagonal element of factorized matrix that could be chosen as a pivot element.

eps Scalar to compare with diagonal of factorized matrix. On input equal to the parameter described by iparm[9].

Output Parameters

bi In case element is chosen as a pivot, value with which to replace the pivot.
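The description above says you can supply your own routine in place of the default pivot handling. Below is a minimal sketch of such a routine. It assumes a real double-precision matrix, so ai, bi, and eps all point to double. It also assumes that defining mkl_pardiso_pivot in your own code with the documented signature is how the replacement takes effect; check the oneMKL linking documentation for your build. The sign-preserving policy is illustrative only and differs from the default, which replaces bi with eps. Remember to set iparm[55] to 1 as the NOTE requires.

```c
/* Sketch of a user-defined pivot handler, assuming a real double-precision
   matrix so that ai, bi, and eps all point to double. The replacement
   policy is hypothetical: when |bi| < eps, bi becomes eps carrying the sign
   of bi, or the sign of ai when bi is exactly zero. */
void mkl_pardiso_pivot(const void *ai, void *bi, const void *eps)
{
    const double a = *(const double *)ai;
    double *b = (double *)bi;
    const double e = *(const double *)eps;
    const double mag = (*b < 0.0) ? -*b : *b;

    if (mag < e) {
        const double s = (*b != 0.0) ? *b : a;  /* source of the sign */
        *b = (s < 0.0) ? -e : e;
    }
}
```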


pardiso_getdiag
Returns diagonal elements of initial and factorized
matrix.

Syntax
void pardiso_getdiag (const _MKL_DSS_HANDLE_t pt, void *df, void *da, const MKL_INT
*mnum, MKL_INT *error);

Include Files
• mkl.h

Description
This routine returns the diagonal elements of the initial and factorized matrix for a real or Hermitian matrix.

NOTE
In order to use this routine, you must set iparm[55] to 1 before the main pardiso loop.
If iparm[23] is set to 10 (an improved two-level factorization algorithm for nonsymmetric matrices),
Intel® oneAPI Math Kernel Library PARDISO will automatically use the classic algorithm for
factorization.

Input Parameters

pt Array with a size of 64. Handle to internal data structure for the Intel®
oneAPI Math Kernel Library (oneMKL) PARDISO solver. The entries must be
set to zero prior to the first call to pardiso. Unique for factorization.

mnum Indicates the actual matrix for the solution phase of the Intel® oneAPI Math
Kernel Library (oneMKL) PARDISO solver. With this scalar you can define the
diagonal elements of the factorized matrix that you want to obtain. The
value must be: 1 ≤ mnum ≤ maxfct. In most applications this value is 1.

Output Parameters

df Array with a dimension of n. Contains diagonal elements of the factorized matrix after factorization.

NOTE
Elements of df correspond to diagonal elements of matrix
L computed during phase 22. Because during phase 22 Intel®
oneAPI Math Kernel Library (oneMKL) PARDISO makes
additional permutations to improve stability, it is possible that
array df is not in line with the perm array computed during phase
11.

da Array with a dimension of n. Contains diagonal elements of the initial matrix.

error The error indicator.

error Information
0 no error

-1 Diagonal information not turned on before pardiso main loop (iparm[55]=0).
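Once pardiso_getdiag succeeds, you can scan df on the application side, for example to see how close the factorization came to a zero pivot. The helper below is a minimal sketch that assumes a real double-precision matrix, so df and da are double arrays of length n. smallest_abs_diag is a hypothetical name. Per the NOTE above, df may not follow the ordering of perm, so the returned index is a position in df, not necessarily a row of A.

```c
#include <stddef.h>

/* Hypothetical helper: returns the index of the entry of d with the
   smallest magnitude (or -1 if n <= 0) and, if value is not NULL,
   stores that entry in *value. */
long smallest_abs_diag(const double *d, long n, double *value)
{
    long best = -1;
    double best_abs = 0.0;
    for (long i = 0; i < n; i++) {
        double v = d[i] < 0.0 ? -d[i] : d[i];
        if (best < 0 || v < best_abs) {
            best = i;
            best_abs = v;
        }
    }
    if (best >= 0 && value != NULL)
        *value = d[best];
    return best;
}
```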

pardiso_export
Places pointers dedicated for sparse representation of
a requested matrix (values, rows, and columns) into
MKL PARDISO

Syntax
void pardiso_export (const _MKL_DSS_HANDLE_t pt, void* values, MKL_INT* rows, MKL_INT*
columns, MKL_INT* step, MKL_INT* iparm, MKL_INT* error);

Include Files
• mkl.h

Description
This auxiliary routine passes to MKL PARDISO the pointers to the arrays (values, rows, and columns) that will
receive the sparse representation of a requested matrix. The matrix will be stored in the three-array variant of
the compressed sparse row (CSR3) format with 0-based indexing.

NOTE
Currently, this routine can be used only for a sparse Schur complement matrix. All
parameters related to the Schur complement matrix (perm, iparm) must be set before the
reordering stage of MKL PARDISO (phase = 11) is called.

Input Parameters
pt Array with a size of 64. Handle to internal data structure for the Intel®
MKL PARDISO solver. The entries must be set to zero prior to the first
call to pardiso. Unique for factorization.

iparm This array is used to pass various parameters to Intel® MKL PARDISO
and to return some useful information after execution of the solver.

step Stage indicator. These are the currently supported values:

Step value Notes
1 Used to place pointers related to a Schur complement
matrix in MKL PARDISO. The routine with step equal to
1 must be called between the reordering and
factorization phases of MKL PARDISO.
-1 Used to clean the internal handle.

Input/Output Parameters

values Parameter type: input/output parameter.

This array contains the non-zero elements of the requested matrix.

rows Parameter type: input/output parameter.

Array of size (size + 1), where size is the number of rows of the requested matrix.
For CSR3 format, rows[i] (i < size) points to the first column
index of row i in the array columns; that is, rows[i] gives the index
of the element in the array values that contains the first non-zero
element from row i of the sparse matrix. The last element,
rows[size], is equal to the number of non-zero elements in the
sparse matrix.

columns Parameter type: input/output parameter.

This array contains the column indices for the non-zero elements of
the requested matrix.

error Parameter type: output parameter.

The error status:

• 0 indicates no error.
• 1 indicates inconsistent input data.
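To make the rows/columns/values convention concrete, the sketch below builds the same zero-based CSR3 layout from a small dense matrix. It is illustrative only: dense_to_csr3 is a hypothetical helper, int stands in for MKL_INT (LP64), and double for the value type.

```c
#include <stddef.h>

/* Hypothetical helper: converts a dense size x size row-major matrix into
   zero-based CSR3 (values, rows, columns). rows must have size + 1 entries;
   values and columns must have room for every non-zero. Returns nnz. */
int dense_to_csr3(const double *dense, int size,
                  double *values, int *rows, int *columns)
{
    int nnz = 0;
    for (int i = 0; i < size; i++) {
        rows[i] = nnz;            /* index in values of row i's first non-zero */
        for (int j = 0; j < size; j++) {
            double v = dense[(size_t)i * size + j];
            if (v != 0.0) {
                values[nnz] = v;
                columns[nnz] = j; /* column indices increase within a row */
                nnz++;
            }
        }
    }
    rows[size] = nnz;             /* last entry equals the number of non-zeros */
    return nnz;
}
```

For the 3 x 3 matrix {{1, 0, 2}, {0, 3, 0}, {4, 0, 5}} it produces rows = {0, 2, 3, 5}, columns = {0, 2, 1, 0, 2}, and values = {1, 2, 3, 4, 5}; rows[3] equals the five non-zeros.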

Usage Example
The following C-style example demonstrates how to use the pardiso_export routine to get the sparse
representation (that is, three-array CSR format) of a Schur complement matrix.

#include "mkl.h"

/*
 * Assumes that pt, maxfct, mnum, mtype, n, a, ia, ja, perm, nrhs, iparm,
 * msglvl, b, x, error, i, and schur_size are declared and initialized as
 * for a regular pardiso call. DATA_TYPE (for example, double) and
 * ALIGNMENT (for example, 64) are placeholders to adapt to your code.
 */
MKL_INT phase, step, schur_nnz;
MKL_INT *schur_rows, *schur_columns;
DATA_TYPE *schur_values;

/*
 * Call the reordering phase of MKL PARDISO with iparm[35] set to -1 in
 * order to compute the Schur complement matrix only, or -2 to compute all
 * factorization arrays. perm array entries related to the Schur complement
 * matrix must be set to 1 (the remaining entries are expected to be 0).
 */
phase = 11;
for ( i = 0; i < schur_size; i++ ) { perm[i] = 1; }
iparm[35] = -1;
pardiso(pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja, perm, &nrhs,
        iparm, &msglvl, b, x, &error);

/*
 * After the reordering phase, iparm[35] will contain the number of non-zero
 * elements for the Schur complement matrix. Arrays dedicated to the sparse
 * representation of the Schur complement matrix must be allocated before
 * the factorization stage of MKL PARDISO is called. Note that mkl_malloc
 * takes a size in bytes.
 */
schur_nnz = iparm[35];
schur_rows    = (MKL_INT *)   mkl_malloc(sizeof(MKL_INT) * (schur_size + 1), ALIGNMENT);
schur_columns = (MKL_INT *)   mkl_malloc(sizeof(MKL_INT) * schur_nnz, ALIGNMENT);
schur_values  = (DATA_TYPE *) mkl_malloc(sizeof(DATA_TYPE) * schur_nnz, ALIGNMENT);

/*
 * Call the pardiso_export routine with step equal to 1 in order to put
 * pointers related to the three-array CSR format into MKL PARDISO:
 */
step = 1;
pardiso_export(pt, schur_values, schur_rows, schur_columns, &step, iparm, &error);

/*
 * Call the factorization phase of PARDISO with iparm[35] equal to -1 or -2
 * to compute the Schur complement matrix:
 */
phase = 22;
iparm[35] = -1;
pardiso(pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja, perm, &nrhs,
        iparm, &msglvl, b, x, &error);

/*
 * After the factorization stage, schur_values, schur_rows, and
 * schur_columns will contain the Schur complement matrix in CSR3 format.
 * When it is no longer needed, clean the internal handle (step = -1)
 * and free the arrays.
 */
step = -1;
pardiso_export(pt, schur_values, schur_rows, schur_columns, &step, iparm, &error);
mkl_free(schur_values);
mkl_free(schur_columns);
mkl_free(schur_rows);
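After the factorization stage has filled schur_values, schur_rows, and schur_columns, one simple way to inspect a small Schur complement is to expand it into a dense array. This is a minimal sketch that assumes double values, int indices (LP64 MKL_INT), and the zero-based indexing stated in the pardiso_export description. csr3_to_dense is a hypothetical helper that copies exactly the stored entries. Whether only one triangle is stored for symmetric matrix types is not stated here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: expands a zero-based CSR3 matrix into a dense
   size x size row-major array. The array is zero-filled first, then every
   stored entry is copied to its (row, column) position. */
void csr3_to_dense(int size, const double *values, const int *rows,
                   const int *columns, double *dense)
{
    memset(dense, 0, sizeof(double) * (size_t)size * (size_t)size);
    for (int i = 0; i < size; i++)
        for (int k = rows[i]; k < rows[i + 1]; k++)
            dense[(size_t)i * size + columns[k]] = values[k];
}
```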

pardiso_handle_store
Store internal structures from pardiso to a file.

Syntax
void pardiso_handle_store (_MKL_DSS_HANDLE_t pt, const char *dirname, MKL_INT *error);

Include Files
• mkl.h

Description
This function stores Intel® oneAPI Math Kernel Library (oneMKL) PARDISO structures to a file, allowing you to
store Intel® oneAPI Math Kernel Library (oneMKL) PARDISO internal structures between the stages of
the pardiso routine. The pardiso_handle_restore routine can restore the Intel® oneAPI Math Kernel Library
(oneMKL) PARDISO internal structures from the file.

Input Parameters

pt Array with a size of 64. Handle to internal data structure.

dirname String containing the name of the directory to which to write the files with
the content of the internal structures. Use an empty string ("") to specify
the current directory. The routine creates a file named handle.pds in the
directory.

Output Parameters

pt Handle to internal data structure.

error The error indicator.

error Information
0 No error.

-2 Not enough memory.

-10 Cannot open file for writing.

-11 Error while writing to file.

-13 Wrong file format.

pardiso_handle_restore
Restore pardiso internal structures from a file.

Syntax
void pardiso_handle_restore (_MKL_DSS_HANDLE_t pt, const char *dirname, MKL_INT
*error);

Include Files
• mkl.h

Description
This function restores Intel® oneAPI Math Kernel Library (oneMKL) PARDISO structures from a file. This
allows you to restore Intel® oneAPI Math Kernel Library (oneMKL) PARDISO internal structures stored
by pardiso_handle_store after a phase of the pardiso routine and continue execution of the next phase.

Input Parameters

dirname String containing the name of the directory in which the file with the
content of the internal structures is located. Use an empty string ("") to
specify the current directory.

Output Parameters

pt Array with a dimension of 64. Handle to internal data structure.

error The error indicator.

error Information
0 No error.

-2 Not enough memory.

-10 Cannot open file for reading.

-11 Error while reading from file.

-13 Wrong file format.

pardiso_handle_delete
Delete files with pardiso internal structure data.

Syntax
void pardiso_handle_delete (const char *dirname, MKL_INT *error);

Include Files
• mkl.h

Description
This function deletes files generated with pardiso_handle_store that contain Intel® oneAPI Math Kernel
Library (oneMKL) PARDISO internal structures.

Input Parameters

dirname String containing the name of the directory in which the file with the
content of the internal structures is located. Use an empty string ("") to
specify the current directory.

Output Parameters

error The error indicator.

error Information
0 No error.

-10 Cannot delete files.

pardiso_handle_store_64
Store internal structures from pardiso_64 to a file.

Syntax
void pardiso_handle_store_64 (_MKL_DSS_HANDLE_t pt, const char *dirname, MKL_INT
*error);

Include Files
• mkl.h

Description
This function stores Intel® oneAPI Math Kernel Library (oneMKL) PARDISO structures to a file, allowing you to
store Intel® oneAPI Math Kernel Library (oneMKL) PARDISO internal structures between the stages of
the pardiso_64 routine. The pardiso_handle_restore_64 routine can restore the Intel® oneAPI Math Kernel
Library (oneMKL) PARDISO internal structures from the file.

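Input Parameters

pt Array with a dimension of 64. Handle to internal data structure.

dirname String containing the name of the directory to which to write the files with
the content of the internal structures. Use an empty string ("") to specify
the current directory.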

Output Parameters

pt Handle to internal data structure.

error The error indicator.

error Information
0 No error.

-2 Not enough memory.

-10 Cannot open file for writing.

-11 Error while writing to file.

-12 Not supported in 32-bit library - routine is only supported in 64-bit libraries.

-13 Wrong file format.

pardiso_handle_restore_64
Restore pardiso_64 internal structures from a file.

Syntax
void pardiso_handle_restore_64 (_MKL_DSS_HANDLE_t pt, const char *dirname, MKL_INT
*error);

Include Files
• mkl.h

Description
This function restores Intel® oneAPI Math Kernel Library (oneMKL) PARDISO structures from a file. This
allows you to restore Intel® oneAPI Math Kernel Library (oneMKL) PARDISO internal structures stored
by pardiso_handle_store_64 after a phase of the pardiso_64 routine and continue execution of the next
phase.

Input Parameters

dirname String containing the name of the directory in which the file with the
content of the internal structures is located. Use an empty string ("") to
specify the current directory.

Output Parameters

pt Array with a dimension of 64. Handle to internal data structure.

error The error indicator.

error Information
0 No error.

-2 Not enough memory.

-10 Cannot open file for reading.

-11 Error while reading from file.

-13 Wrong file format.


pardiso_handle_delete_64
Delete files with pardiso_64 internal structure data.

Syntax
void pardiso_handle_delete_64 (const char *dirname, MKL_INT *error);

Include Files
• mkl.h

Description
This function deletes files generated with pardiso_handle_store_64 that contain Intel® oneAPI Math Kernel
Library (oneMKL) PARDISO internal structures.

Input Parameters

dirname String containing the name of the directory in which the file with the
content of the internal structures is located. Use an empty string ("") to
specify the current directory.

Output Parameters

error The error indicator.

error Information
0 No error.

-10 Cannot delete files.

-12 Not supported in 32-bit library - routine is only supported in 64-bit libraries.
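The store, restore, and delete routines (and their _64 variants) share most of their error values. The exception is -10, and -11 for store and restore, whose meaning depends on the operation. A small lookup like the following keeps that distinction in one place when reporting failures. It is a sketch only: handle_error_message and handle_op are hypothetical names. The code is taken as long long so that values from either interface fit, and the helper does not check which codes a given routine can actually return.

```c
/* Hypothetical helper (not a oneMKL routine): maps the error values listed
   for pardiso_handle_store, pardiso_handle_restore, pardiso_handle_delete,
   and their _64 variants to short messages. */
typedef enum { HANDLE_STORE, HANDLE_RESTORE, HANDLE_DELETE } handle_op;

const char *handle_error_message(handle_op op, long long error)
{
    switch (error) {
    case 0:
        return "No error";
    case -2:
        return "Not enough memory";
    case -10:                      /* meaning depends on the operation */
        if (op == HANDLE_STORE)
            return "Cannot open file for writing";
        if (op == HANDLE_RESTORE)
            return "Cannot open file for reading";
        return "Cannot delete files";
    case -11:                      /* listed for store and restore only */
        if (op == HANDLE_STORE)
            return "Error while writing to file";
        if (op == HANDLE_RESTORE)
            return "Error while reading from file";
        break;
    case -12:
        return "Not supported in 32-bit library; 64-bit libraries only";
    case -13:
        return "Wrong file format";
    }
    return "Unknown error";
}
```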

oneMKL PARDISO Parameters in Tabular Form

The following table lists all parameters of Intel® oneAPI Math Kernel Library (oneMKL) PARDISO and gives
their brief descriptions.
pt
Type: void*; Values: 0; In/Out: in/out
Description: Solver internal data address pointer.
Comments: Must be initialized with zeros and never be modified later.

maxfct
Type: MKL_INT*; Values: >0; In/Out: in
Description: Maximal number of factors in memory.
Comments: Generally used value is 1.

mnum
Type: MKL_INT*; Values: [1: maxfct]; In/Out: in
Description: The number of the matrix (from 1 to maxfct) to solve.
Comments: Generally used value is 1.

mtype
Type: MKL_INT*; In/Out: in
Description: Matrix type.
Values:
1 Real and structurally symmetric
2 Real and symmetric positive definite
-2 Real and symmetric indefinite
3 Complex and structurally symmetric
4 Complex and Hermitian positive definite
-4 Complex and Hermitian indefinite
6 Complex and symmetric matrix
11 Real and nonsymmetric matrix
13 Complex and nonsymmetric matrix

phase
Type: MKL_INT*; In/Out: in
Description: Controls the execution of the solver. For iparm[35] > 0, phases 331, 332, and 333 perform a
different decomposition. See the phase parameter of pardiso for details.
Values:
11 Analysis
12 Analysis, numerical factorization
13 Analysis, numerical factorization, solve
22 Numerical factorization
23 Numerical factorization, solve
33 Solve, iterative refinement
331 Like phase=33, but only forward substitution
332 Like phase=33, but only diagonal substitution
333 Like phase=33, but only backward substitution
0 Release internal memory for L and U of the matrix number mnum
-1 Release all internal memory for all matrices

n
Type: MKL_INT*; Values: >0; In/Out: in
Description: Number of equations in the sparse linear system A*X = B.

a
Type: void*; Values: *; In/Out: in
Description: Contains the non-zero elements of the coefficient matrix A.
Comments: The size of a is the same as that of ja, and the coefficient matrix can be either real or complex.
The matrix must be stored in the 3-array variation of compressed sparse row (CSR3) format with increasing
values of ja for each row.

ia[n+1]
Type: MKL_INT*; Values: >=0; In/Out: in
Description: rowIndex array in CSR3 format.
Comments: ia[i] gives the index of the element in array a that contains the first non-zero element from row i
of A. The last element ia[n] is taken to be equal to the number of non-zero elements in A (plus one with
one-based indexing).
Note: iparm[34] indicates whether row/column indexing starts from 1 or 0.

ja
Type: MKL_INT*; Values: >=0; In/Out: in
Description: columns array in CSR3 format.
Comments: The column indices for each row of A must be sorted in increasing order. For structurally
symmetric matrices, zero diagonal elements must be stored in a and ja. Zero diagonal elements should be
stored for symmetric matrices, although they are not required. For symmetric matrices, the solver needs only
the upper triangular part of the system.
Note: iparm[34] indicates whether row/column indexing starts from 1 or 0.

perm[n]
Type: MKL_INT*; Values: >=0; In/Out: in/out
Description: Holds the permutation vector of size n, specifies elements used for computing a partial
solution, or specifies differing values of the input matrices for low rank update.
Comments: You can apply your own fill-in reducing ordering (iparm[4] = 1) or return the permutation from
the solver (iparm[4] = 2). Let C = P*A*P^T be the permuted matrix. Row (column) i of C is the perm[i] row
(column) of A. The numbering of the array must describe a permutation.
To specify elements for a partial solution, set iparm[4] = 0, iparm[30] > 0, and iparm[35] = 0.
To specify elements for a Schur complement, set iparm[4] = 0, iparm[30] = 0, and iparm[35] > 0.
To specify values that differ in A for low rank update (see Low Rank Update), set iparm[38] = 1. The size of
the array must be at least 2*ndiff + 1, where ndiff is the number of values of A that are different. The values
of perm should be:
perm = {ndiff, row_index1, column_index1, row_index2, column_index2, ..., row_index_ndiff, column_index_ndiff}
where row_index_m and column_index_m are the row and column indices of the m-th differing non-zero value
in matrix A. The row and column index pairs can be in any order, but must use zero-based indexing regardless
of the value of iparm[34].

NOTE
Unless you have specified low rank update, iparm[34] indicates whether row/column indexing starts from 1 or 0.

nrhs
Type: MKL_INT*; Values: >=0; In/Out: in
Description: Number of right-hand sides that need to be solved for.
Comments: Generally used value is 1. To obtain better Intel® oneAPI Math Kernel Library (oneMKL) PARDISO
performance, during the numerical factorization phase you can provide the maximum number of right-hand
sides, which can be used further during the solving phase.

iparm[64]
Type: MKL_INT*; Values: *; In/Out: in/out
Description: This array is used to pass various parameters to Intel® oneAPI Math Kernel Library (oneMKL)
PARDISO and to return some useful information after execution of the solver (see pardiso iparm Parameter
for more details).
Comments: If iparm[0] = 0, Intel® oneAPI Math Kernel Library (oneMKL) PARDISO fills iparm[1] through
iparm[63] with default values and uses them.

msglvl
Type: MKL_INT*; In/Out: in
Description: Message level information.
Values:
0 Intel® oneAPI Math Kernel Library (oneMKL) PARDISO generates no output
1 Intel® oneAPI Math Kernel Library (oneMKL) PARDISO prints statistical information

b[n*nrhs]
Type: void*; Values: *; In/Out: in/out
Description: Right-hand side vectors.
Comments: On entry, contains the right-hand side vector/matrix B, which is placed contiguously in memory.
The b[i+k*n] element must hold the i-th component of the k-th right-hand side vector. Note that b is only
accessed in the solution phase. On output, the array is replaced with the solution if iparm[5] = 1.

x[n*nrhs]
Type: void*; Values: *; In/Out: out
Description: Solution vectors.
Comments: On output, if iparm[5] = 0, contains the solution vector/matrix X, which is placed contiguously in
memory. The x[i+k*n] element must hold the i-th component of the k-th solution vector. Note that x is only
accessed in the solution phase.

error
Type: MKL_INT*; In/Out: out
Description: Error indicator.
Values:
0 No error
-1 Input inconsistent
-2 Not enough memory
-3 Reordering problem
-4 Zero pivot, numerical factorization or iterative refinement problem
-5 Unclassified (internal) error
-6 Reordering failed (matrix types 11 and 13 only)
-7 Diagonal matrix is singular
-8 32-bit integer overflow problem
-9 Not enough memory for OOC
-10 Problems with opening OOC temporary files
-11 Read/write problems with the OOC data file

1) See the description of PARDISO_DATA_TYPE in PARDISO_DATA_TYPE.
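For low rank update, the perm layout described for the perm parameter is easy to get wrong because the row and column indices are interleaved and always zero-based. The sketch below packs it. It assumes int for MKL_INT and that the caller already has the zero-based row and column indices of the ndiff changed values. fill_low_rank_perm is a hypothetical helper.

```c
/* Hypothetical helper: fills perm with the low-rank-update layout
     perm = {ndiff, row_index1, column_index1, ..., row_index_ndiff, column_index_ndiff}
   rows/cols hold the zero-based indices of the changed non-zeros of A.
   perm must have room for at least 2*ndiff + 1 entries.
   Returns the number of entries written. */
int fill_low_rank_perm(int ndiff, const int *rows, const int *cols, int *perm)
{
    perm[0] = ndiff;
    for (int m = 0; m < ndiff; m++) {
        perm[1 + 2 * m] = rows[m];   /* zero-based regardless of iparm[34] */
        perm[2 + 2 * m] = cols[m];
    }
    return 2 * ndiff + 1;
}
```

If your indices are one-based, subtract 1 from each before packing, because this layout uses zero-based indices regardless of iparm[34].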

pardiso iparm Parameter

This table describes all individual components of the Intel® oneAPI Math Kernel Library (oneMKL)
PARDISO iparm parameter. Components which are not used must be initialized with 0. Default values are
denoted with an asterisk (*).
Component Description

iparm[0] Use default values.

input 0 iparm[1] - iparm[63] are filled with default values.
≠0 You must supply all values in components iparm[1] - iparm[63].

iparm[1] Fill-in reducing ordering for the input matrix.

input
Caution
You can control the parallel execution of the solver by explicitly setting the
MKL_NUM_THREADS environment variable. If fewer OpenMP threads are available than
specified, the execution may slow down instead of speeding up. If MKL_NUM_THREADS is
not defined, then the solver uses all available processors.

NOTE If a two-level factorization algorithm is chosen (that is, iparm[23]=1), then only
nested dissection algorithms are available (iparm[1]=2 or iparm[1]=3).

0 The minimum degree algorithm [Li99].

2* The nested dissection algorithm from the METIS package [Karypis98].
3 The parallel (OpenMP) version of the nested dissection algorithm. It can decrease the
time of computations on multi-core computers, especially when Intel® oneAPI Math
Kernel Library (oneMKL) PARDISO Phase 1 takes significant time.


NOTE
Setting iparm[1] = 3 prevents the use of CNR mode (iparm[33] > 0)
because Intel® oneAPI Math Kernel Library (oneMKL) PARDISO uses dynamic
parallelism.

iparm[2]
Reserved. Set to zero.
iparm[3]
Preconditioned CGS/CG.
input This parameter controls preconditioned CGS [Sonn89] for nonsymmetric or structurally
symmetric matrices and Conjugate-Gradients for symmetric matrices. iparm[3] has
the form iparm[3]= 10*L+K.
K=0
The factorization is always computed as required by phase.
K=1
CGS iteration replaces the computation of LU. The preconditioner is LU that
was computed at a previous step (the first step or last step with a failure) in a
sequence of solutions needed for identical sparsity patterns.
K=2
CGS iteration for symmetric positive definite matrices replaces the computation
of LL^T. The preconditioner is LL^T that was computed at a previous step (the
first step or last step with a failure) in a sequence of solutions needed for
identical sparsity patterns.

The value L controls the stopping criterion of the Krylov Subspace iteration:
$\mathrm{eps}_{CGS} = 10^{-L}$ is used in the stopping criterion
$\|dx_i\| / \|dx_0\| < \mathrm{eps}_{CGS}$,
where $\|dx_i\| = \|(LU)^{-1} r_i\|$ for K = 1 or $\|dx_i\| = \|(LL^T)^{-1} r_i\|$ for
K = 2, and $r_i$ is the residue at iteration i of the preconditioned Krylov Subspace
iteration.
A maximum number of 150 iterations is fixed with the assumption that the iteration
will converge before consuming half the factorization time. Intermediate convergence
rates and residue excursions are checked and can terminate the iteration process. If
phase = 23, then the factorization for a given A is automatically recomputed in cases
where the Krylov Subspace iteration failed, and the corresponding direct solution is
returned. Otherwise the solution from the preconditioned Krylov Subspace iteration is
returned. Using phase = 33 results in an error message (error = -4) if the stopping
criteria for the Krylov Subspace iteration cannot be reached. More information on the
failure can be obtained from iparm[19].
The default is iparm[3]=0, and other values are only recommended for an advanced
user. iparm[3] must be greater than or equal to zero.
Examples:
iparm[3] Description
31 LU-preconditioned CGS iteration with a stopping criterion of 1.0E-3 for nonsymmetric matrices
61 LU-preconditioned CGS iteration with a stopping criterion of 1.0E-6 for nonsymmetric matrices
62 LL^T-preconditioned CGS iteration with a stopping criterion of 1.0E-6 for symmetric positive definite matrices
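The iparm[3] = 10*L + K encoding is plain decimal packing. The sketch below shows it in both directions and reproduces the examples listed for iparm[3]: for instance, L = 6 and K = 1 give 61, a stopping criterion of 1.0E-6. The function names are hypothetical, and int is assumed for MKL_INT.

```c
/* Hypothetical helpers for the iparm[3] = 10*L + K encoding:
   K selects the mode (0, 1, or 2) and L sets epsCGS = 10^-L. */
int cgs_iparm3(int L, int K)
{
    return 10 * L + K;
}

void cgs_decode(int iparm3, int *L, int *K, double *eps_cgs)
{
    double eps = 1.0;
    *K = iparm3 % 10;             /* iparm[3] must be >= 0 */
    *L = iparm3 / 10;
    for (int i = 0; i < *L; i++)
        eps /= 10.0;              /* epsCGS = 10^-L, without libm */
    *eps_cgs = eps;
}
```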

iparm[4]
User permutation.
input


This parameter controls whether user supplied fill-in reducing permutation is used
instead of the integrated multiple-minimum degree or nested dissection algorithms.
Another use of this parameter is to control obtaining the fill-in reducing permutation
vector calculated during the reordering stage of Intel® oneAPI Math Kernel Library
(oneMKL) PARDISO.
This option is useful for testing reordering algorithms, adapting the code to special
application problems (for instance, to move zero diagonal elements to the end of
P*A*P^T), or for using the permutation vector more than once for matrices with
identical sparsity structures. For the definition of the permutation, see the description of
the perm parameter.

Caution
You can only set one of iparm[4], iparm[30], and iparm[35], so be sure that the
iparm[30] (partial solution) and the iparm[35] (Schur complement) parameters are 0
if you set iparm[4].

0*
User permutation in the perm array is ignored.
1
Intel® oneAPI Math Kernel Library (oneMKL) PARDISO uses the user supplied
fill-in reducing permutation from the perm array. iparm[1] is ignored.

NOTE
Setting iparm[4] = 1 prevents use of a parallel algorithm for the solve step.

2
Intel® oneAPI Math Kernel Library (oneMKL) PARDISO returns the permutation
vector computed at phase 1 in the perm array.
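Before passing a user permutation with iparm[4] = 1, it can be worth checking that perm really is a permutation, as the perm parameter description requires. This is a minimal sketch that assumes int for MKL_INT and a base of 0 or 1 matching the indexing selected through iparm[34]. is_permutation is a hypothetical helper.

```c
#include <stdlib.h>

/* Hypothetical check that perm[0..n-1] is a permutation of base..base+n-1.
   Returns 1 if so, 0 if not (or if the scratch allocation fails). */
int is_permutation(const int *perm, int n, int base)
{
    unsigned char *seen = calloc((size_t)(n > 0 ? n : 1), 1);
    if (seen == NULL)
        return 0;
    int ok = 1;
    for (int i = 0; i < n && ok; i++) {
        int p = perm[i] - base;          /* shift to zero-based */
        if (p < 0 || p >= n || seen[p])
            ok = 0;                      /* out of range or repeated */
        else
            seen[p] = 1;
    }
    free(seen);
    return ok;
}
```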
iparm[5]
Write solution on x.
input
NOTE
The array x is always used.

0*
The array x contains the solution; right-hand side vector b is kept unchanged.
1
The solver stores the solution on the right-hand side b.
iparm[6]
Number of iterative refinement steps performed.
output Reports the number of iterative refinement steps that were actually performed during
the solve step.
iparm[7]
Iterative refinement step.
input On entry to the solve and iterative refinement step, iparm[7] must be set to the
maximum number of iterative refinement steps that the solver performs.
2*
The solver automatically performs two steps of iterative refinement when
iparm[0] is set to 0.
>0
Maximum number of iterative refinement steps that the solver performs. The
solver will stop the iterative refinement process if:
• a satisfactory level of accuracy of the solution in terms of backward error
(see the iparm[8] description) is achieved,
• the convergence is too slow for this matrix, or
• |iparm[7]| steps of iterative refinement have been performed.
The number of executed iterations is reported in iparm[6].
<0
Maximum number of iterative refinement steps, given with a negative sign. Unlike the
case above, the residual is accumulated using extended-precision real
and complex data types. The same stopping criteria mentioned above for
iparm[7] > 0 are used in this case.

NOTE Currently, this feature is supported only for sequential and OpenMP
threading.

iparm[8] input
Tolerance level for the backward error in the iterative refinement process. If set to a
non-zero value, the following criterion is used for stopping the iterative refinement:
$\max_i \frac{|b - Ax|_i}{(|A||x| + |b|)_i} < 10^{-\mathrm{iparm}[8]}$
where x is the computed solution and b is the right-hand side. A modulus sign on a
vector or a matrix indicates the vector or matrix obtained by replacing all
entries by their moduli (absolute values). The i-th component of the scaled residual
$\frac{|b - Ax|_i}{(|A||x| + |b|)_i}$
is computed for all equations for which the denominator is nonzero.
If set to zero, the backward error is computed but is not used as a stopping criterion,
and the two other default checks (see the iparm[7] description) are used to determine
when to stop the iterations. If msglvl = 1, the solver prints the number of performed
iterative refinement steps, the tolerance, and the backward error values.
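To see what the backward-error test measures, the sketch below computes the componentwise quantity max_i |b - Ax|_i / (|A||x| + |b|)_i on the application side. It assumes a real double-precision matrix stored in zero-based CSR3 with every non-zero present. That is general storage, not the upper-triangle-only input used for symmetric matrix types. It also assumes int for MKL_INT. backward_error is a hypothetical helper and not how oneMKL computes the value internally.

```c
static double abs_d(double v) { return v < 0.0 ? -v : v; }

/* Componentwise backward error max_i |b - A x|_i / (|A| |x| + |b|)_i for a
   real n x n matrix in zero-based CSR3 with all non-zeros stored.
   Rows whose denominator is zero are skipped. */
double backward_error(int n, const double *a, const int *ia, const int *ja,
                      const double *x, const double *b)
{
    double worst = 0.0;
    for (int i = 0; i < n; i++) {
        double r = b[i];               /* accumulates (b - A x)_i */
        double denom = abs_d(b[i]);    /* accumulates (|A| |x| + |b|)_i */
        for (int k = ia[i]; k < ia[i + 1]; k++) {
            r -= a[k] * x[ja[k]];
            denom += abs_d(a[k]) * abs_d(x[ja[k]]);
        }
        if (denom != 0.0) {
            double e = abs_d(r) / denom;
            if (e > worst)
                worst = e;
        }
    }
    return worst;
}
```

For A = diag(2, 4), x = (1, 1), and b = (3, 4), the first row contributes |3 - 2| / (2 + 3) = 0.2 and the second row contributes 0, so the result is 0.2.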
iparm[9]
Pivoting perturbation.
input This parameter instructs Intel® oneAPI Math Kernel Library (oneMKL) PARDISO how to
handle small pivots or zero pivots for nonsymmetric matrices (mtype =11 or mtype
=13) and symmetric matrices (mtype =-2, mtype =-4, or mtype =6). For these
matrices the solver uses a complete supernode pivoting approach. When the
factorization algorithm reaches a point where it cannot factor the supernodes with this
pivoting strategy, it uses a pivoting perturbation strategy similar to [Li99],
[Schenk04].
Small pivots are perturbed with $\mathrm{eps} = 10^{-\mathrm{iparm}[9]}$.
The magnitude of the potential pivot is tested against a constant threshold of
$\alpha = \mathrm{eps} \cdot \|A_2\|_\infty$,
where $\mathrm{eps} = 10^{-\mathrm{iparm}[9]}$, $A_2 = P \, P_{MPS} \, D_r \, A \, D_c \, P$, and $\|A_2\|_\infty$ is the infinity
norm of the scaled and permuted matrix A. Any tiny pivots encountered during
elimination are set to $\mathrm{sign}(l_{ii}) \cdot \mathrm{eps} \cdot \|A_2\|_\infty$, which trades off some
numerical stability for the ability to keep pivots from getting too small.
13*
The default value for nonsymmetric matrices (mtype = 11, mtype = 13), eps = $10^{-13}$.
8*
The default value for symmetric indefinite matrices (mtype = -2, mtype = -4,
mtype = 6), eps = $10^{-8}$.
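The pivot perturbation rule for iparm[9] can be written out for a single pivot. This minimal sketch assumes the infinity norm of the scaled and permuted matrix A2 is already known, since computing it requires the internal permutations and scalings and is out of scope here. It treats sign(0) as positive, which the description does not specify. perturb_pivot is a hypothetical name.

```c
/* Illustration of the iparm[9] rule: with eps = 10^-iparm[9] and
   alpha = eps * ||A2||_inf, a pivot whose magnitude is below alpha is
   replaced by sign(pivot) * alpha. norm_a2_inf is supplied by the caller. */
double perturb_pivot(double pivot, int iparm9, double norm_a2_inf)
{
    double eps = 1.0;
    for (int i = 0; i < iparm9; i++)
        eps /= 10.0;                        /* eps = 10^-iparm[9] */
    double alpha = eps * norm_a2_inf;
    double mag = pivot < 0.0 ? -pivot : pivot;
    if (mag >= alpha)
        return pivot;                       /* large enough: unchanged */
    return pivot < 0.0 ? -alpha : alpha;    /* sign(0) taken as + here */
}
```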
iparm[10]
Scaling vectors.
input Intel® oneAPI Math Kernel Library (oneMKL) PARDISO uses a maximum weight
matching algorithm to permute large elements on the diagonal and to scale so that the
diagonal elements are equal to 1 and the absolute values of the off-diagonal entries
are less than or equal to 1. This scaling method is applied only to nonsymmetric
matrices (mtype = 11 or mtype = 13). The scaling can also be used for symmetric
indefinite matrices (mtype = -2, mtype =-4, or mtype = 6) when the symmetric
weighted matchings are applied (iparm[12] = 1).
Use iparm[10] = 1 (scaling) and iparm[12] = 1 (matching) for highly indefinite
symmetric matrices, for example, from interior point optimizations or saddle point
problems. Note that in the analysis phase (phase=11) you must provide the numerical
values of the matrix A in array a in case of scaling and symmetric weighted matching.
0*
Disable scaling. Default for symmetric indefinite matrices.
1*
Enable scaling. Default for nonsymmetric matrices.
Scale the matrix so that the diagonal elements are equal to 1 and the absolute
values of the off-diagonal entries are less than or equal to 1. This scaling method is
applied to nonsymmetric matrices (mtype = 11, mtype = 13). The scaling can
also be used for symmetric indefinite matrices (mtype = -2, mtype = -4,
mtype = 6) when the symmetric weighted matchings are applied (iparm[12]
= 1).
Note that in the analysis phase (phase=11) you must provide the numerical
values of the matrix A in case of scaling.
iparm[11]
Solve with transposed or conjugate transposed matrix A.
input
NOTE
For real matrices, the terms transposed and conjugate transposed are equivalent.

0*
Solve a linear system AX = B.
1
Solve a conjugate transposed system A^H X = B based on the factorization of the
matrix A.
2
Solve a transposed system A^T X = B based on the factorization of the matrix A.
iparm[12]
Improved accuracy using (non-) symmetric weighted matching.
input Intel® oneAPI Math Kernel Library (oneMKL) PARDISO can use a maximum weighted
matching algorithm to permute large elements close to the diagonal. This strategy adds
an additional level of reliability to the factorization methods and complements the
alternative of using more complete pivoting techniques during the numerical
factorization.

0*
Disable matching. Default for symmetric indefinite matrices.
1*
Enable matching. Default for nonsymmetric matrices.
Maximum weighted matching algorithm to permute large elements close to the
diagonal.

It is recommended to use iparm[10] = 1 (scaling) and iparm[12] = 1
(matching) for highly indefinite symmetric matrices, for example, from interior
point optimizations or saddle point problems.
Note that in the analysis phase (phase=11) you must provide the numerical
values of the matrix A in case of symmetric weighted matching.
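The objective behind a maximum weighted matching can be illustrated with a brute-force search on a tiny matrix: find the row-to-column assignment that maximizes the product of the magnitudes of the matched entries (one common choice of weight, used for example by MC64-style matchings), so that permuting columns accordingly moves large entries onto the diagonal. This is a teaching sketch under stated assumptions, not the algorithm oneMKL PARDISO uses; production solvers use polynomial-time matching algorithms instead of enumerating all n! permutations. The names max_product_matching and matching_search and the limit MAXN are hypothetical.

```c
#include <stddef.h>

#define MAXN 8  /* hypothetical size limit for the brute-force search */

/* Magnitude of a double without needing libm. */
static double magnitude(double v) { return v < 0 ? -v : v; }

/* Recursively assign a distinct column to each row, keeping the best
   product of |a(row, col)| seen so far. */
static void matching_search(size_t n, const double *a, size_t row, int *used,
                            size_t *cur, double prod, double *best,
                            size_t *best_perm)
{
    if (row == n) {
        if (prod > *best) {
            *best = prod;
            for (size_t i = 0; i < n; ++i)
                best_perm[i] = cur[i];
        }
        return;
    }
    for (size_t c = 0; c < n; ++c) {
        if (used[c])
            continue;
        used[c] = 1;
        cur[row] = c;
        matching_search(n, a, row + 1, used, cur,
                        prod * magnitude(a[row * n + c]), best, best_perm);
        used[c] = 0;
    }
}

/* Row-major n-by-n matrix a. On return perm[i] is the column matched to
   row i; moving column perm[i] to position i puts the matched entries on
   the diagonal. Returns the best product, or -1.0 if n > MAXN. */
static double max_product_matching(size_t n, const double *a, size_t *perm)
{
    int used[MAXN] = {0};
    size_t cur[MAXN];
    double best = -1.0;
    if (n > MAXN)
        return -1.0;
    matching_search(n, a, 0, used, cur, 1.0, &best, perm);
    return best;
}
```

For the matrix with rows (0.1, 5.0) and (3.0, 0.2), the identity gives a diagonal product of only 0.02, while swapping the columns gives 15.0, so the matching selects the swap and brings the large entries onto the diagonal.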
iparm[13]
Number of perturbed pivots.
output After factorization, contains the number of perturbed pivots for the matrix types: 1, 3,
11, 13, -2, -4 and 6.
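A perturbed pivot comes from the static-pivoting idea: when a pivot is too small in magnitude and cannot be avoided, the solver can replace it with a small value of the same sign rather than stopping. The simplified dense LU sketch below counts such replacements, which is the kind of quantity iparm[13] reports. The function lu_static_pivoting, the threshold eps, and the exact replacement rule are hypothetical illustration choices; PARDISO's own perturbation rule and threshold are controlled by its own parameters and are not reproduced here.

```c
#include <stddef.h>

/* Simplified illustration (not PARDISO's algorithm): in-place LU of a
   dense row-major n-by-n matrix WITHOUT row interchanges. A pivot whose
   magnitude is below eps is replaced by +/-eps (zero becomes +eps) and
   counted. Returns the number of perturbed pivots. */
static int lu_static_pivoting(size_t n, double *a, double eps)
{
    int perturbed = 0;
    for (size_t k = 0; k < n; ++k) {
        double piv = a[k * n + k];
        double mag = piv < 0 ? -piv : piv;
        if (mag < eps) {
            a[k * n + k] = piv < 0 ? -eps : eps;  /* keep the sign */
            ++perturbed;
        }
        for (size_t i = k + 1; i < n; ++i) {
            double l = a[i * n + k] / a[k * n + k];  /* multiplier */
            a[i * n + k] = l;                        /* store L below diagonal */
            for (size_t j = k + 1; j < n; ++j)
                a[i * n + j] -= l * a[k * n + j];    /* update trailing block */
        }
    }
    return perturbed;
}
```

Without row interchanges, the matrix with rows (0, 1) and (1, 0) has a zero first pivot, so the sketch perturbs it and reports one perturbed pivot, whereas a diagonally dominant matrix such as rows (2, 1) and (1, 2) reports none.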
iparm[14]
Peak memory on symbolic factorization.
output The total peak memory in kilobytes that the solver needs during the analysis and
symbolic factorization phase.
This value is only computed in phase 1.
iparm[15]
Permanent memory on symbolic factorization.
output Permanent memory from the analysis and symbolic factorization phase in kilobytes
that the solver needs in the factorization and solve phases.
This value is only computed in phase 1.
iparm[16]
Size of factors/Peak memory on numerical factorization and solution.
output This parameter provides the size in kilobytes of the total memory consumed by in-core
Intel® oneAPI Math Kernel Library (oneMKL) PARDISO for internal floating point arrays.
This parameter is computed in phase 1. See iparm[62] for the out-of-core (OOC) mode.
The total peak memory consumed by Intel® oneAPI Math Kernel Library (oneMKL)
PARDISO is max(iparm[14], iparm[15] + iparm[16]).
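The total peak memory formula can be written as a small helper, assuming in-core mode (out-of-core memory is reported through iparm[62]) and that iparm has been filled in by phase 1. The name pardiso_peak_kb is hypothetical, and a plain int array again stands in for MKL_INT under the LP64 interface; the sum is taken in long long so that large kilobyte counts do not overflow.

```c
/* Hypothetical helper: total peak memory in kilobytes of in-core PARDISO,
   computed as max(iparm[14], iparm[15] + iparm[16]) from the phase 1
   output parameters. */
static long long pardiso_peak_kb(const int iparm[64])
{
    long long analysis = iparm[14];                              /* phase 1 peak */
    long long factor_solve = (long long)iparm[15] + iparm[16];   /* permanent + factors */
    return analysis > factor_solve ? analysis : factor_solve;
}
```

For example, iparm[14] = 100, iparm[15] = 30 and iparm[16] = 50 give a peak of 100 KB because the analysis phase dominates; lowering iparm[14] to 60 makes the factorization and solve phases dominate, giving 80 KB.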
iparm[17]
Report the number of non-zero elements in the factors.
input/output <0
Enable reporting if iparm[17] < 0 on entry. The default value is -1.
>=0
Disable reporting.
iparm[18]
Re
