oneAPI Math Kernel Library for C Developer Reference
Contents
Chapter 1: Developer Reference for Intel® oneAPI Math Kernel Library - C
Getting Help and Support ......................................................................... 17
What's New ............................................................................................ 18
Notational Conventions ............................................................................ 18
Overview................................................................................................ 19
Performance Enhancements.............................................................. 24
Parallelism ..................................................................................... 24
C Datatypes Specific to Intel MKL ...................................................... 25
OpenMP* Offload..................................................................................... 26
OpenMP* Offload for Intel® oneAPI Math Kernel Library ........................ 26
BLAS and Sparse BLAS Routines................................................................ 33
BLAS Routines ................................................................................ 33
Naming Conventions for BLAS Routines...................................... 33
C Interface Conventions for BLAS Routines ................................. 35
Matrix Storage Schemes for BLAS Routines ................................ 36
BLAS Level 1 Routines and Functions ......................................... 36
BLAS Level 2 Routines ............................................................. 51
BLAS Level 3 Routines ............................................................. 95
Sparse BLAS Level 1 Routines ......................................................... 121
Vector Arguments ................................................................. 122
Naming Conventions for Sparse BLAS Routines ......................... 122
Routines and Data Types........................................................ 122
BLAS Level 1 Routines That Can Work With Sparse Vectors......... 123
cblas_?axpyi ........................................................................ 123
cblas_?doti .......................................................................... 124
cblas_?dotci ......................................................................... 125
cblas_?dotui......................................................................... 125
cblas_?gthr .......................................................................... 126
cblas_?gthrz......................................................................... 127
cblas_?roti ........................................................................... 128
cblas_?sctr........................................................................... 128
Sparse BLAS Level 2 and Level 3 Routines ........................................ 129
Naming Conventions in Sparse BLAS Level 2 and Level 3............ 130
Sparse Matrix Storage Formats for Sparse BLAS Routines........... 130
Routines and Supported Operations......................................... 131
Interface Consideration.......................................................... 132
Sparse BLAS Level 2 and Level 3 Routines................................ 137
Sparse QR Routines....................................................................... 243
mkl_sparse_set_qr_hint ........................................................ 243
mkl_sparse_?_qr .................................................................. 244
mkl_sparse_qr_reorder.......................................................... 246
mkl_sparse_?_qr_factorize..................................................... 247
mkl_sparse_?_qr_solve ......................................................... 248
mkl_sparse_?_qr_qmult ........................................................ 250
mkl_sparse_?_qr_rsolve ........................................................ 252
Compact BLAS and LAPACK Functions .............................................. 253
mkl_?gemm_compact............................................................ 257
cblas_?gemv_batch............................................................... 444
cblas_?dgmm_batch_strided .................................................. 446
cblas_?dgmm_batch.............................................................. 448
mkl_jit_create_?gemm .......................................................... 450
mkl_jit_get_?gemm_ptr ........................................................ 452
mkl_jit_destroy .................................................................... 455
LAPACK Routines ................................................................................... 456
C Interface Conventions for LAPACK Routines.................................... 456
Matrix Layout for LAPACK Routines .................................................. 458
Matrix Storage Schemes for LAPACK Routines ................................... 460
Mathematical Notation for LAPACK Routines...................................... 467
Error Analysis ............................................................................... 468
LAPACK Linear Equation Routines .................................................... 469
LAPACK Linear Equation Computational Routines....................... 469
LAPACK Linear Equation Driver Routines .................................. 679
LAPACK Least Squares and Eigenvalue Problem Routines .................... 783
LAPACK Least Squares and Eigenvalue Problem Computational Routines .. 784
LAPACK Least Squares and Eigenvalue Problem Driver Routines .1002
LAPACK Auxiliary Routines.............................................................1177
?lacgv ................................................................................1177
?lacrm................................................................................1178
?syconv ..............................................................................1179
?syr ...................................................................................1180
i?max1 ...............................................................................1182
?sum1 ................................................................................1182
?gelq2 ................................................................................1183
?geqr2 ...............................................................................1184
?geqrt2 ..............................................................................1186
?geqrt3 ..............................................................................1188
?getf2 ................................................................................1190
?lacn2 ................................................................................1191
?lacpy ................................................................................1193
?lakf2.................................................................................1194
?lange ................................................................................1195
?lansy ................................................................................1196
?lanhe ................................................................................1197
?lantr .................................................................................1198
LAPACKE_set_nancheck ........................................................1200
LAPACKE_get_nancheck........................................................1200
?lapmr ...............................................................................1200
?lapmt................................................................................1202
?lapy2 ................................................................................1203
?lapy3 ................................................................................1203
?laran.................................................................................1204
?larfb .................................................................................1204
?larfg .................................................................................1207
?larft ..................................................................................1208
?larfx .................................................................................1211
?large ................................................................................1212
?larnd ................................................................................1213
?larnv.................................................................................1214
?laror .................................................................................1215
?larot .................................................................................1217
?lartgp ...............................................................................1220
?lartgs................................................................................1221
?lascl .................................................................................1222
?lasd0 ................................................................................1223
?lasd1 ................................................................................1224
?lasd2 ................................................................................1227
?lasd3 ................................................................................1229
?lasd4 ................................................................................1231
?lasd5 ................................................................................1232
?lasd6 ................................................................................1233
?lasd7 ................................................................................1236
?lasd8 ................................................................................1239
?lasd9 ................................................................................1241
?lasda ................................................................................1242
?lasdq ................................................................................1245
?lasdt .................................................................................1247
?laset .................................................................................1247
?lasrt .................................................................................1249
?laswp................................................................................1250
?latm1................................................................................1251
?latm2................................................................................1253
?latm3................................................................................1255
?latm5................................................................................1259
?latm6................................................................................1262
?latme................................................................................1264
?latmr ................................................................................1268
?lauum ...............................................................................1274
?syswapr ............................................................................1275
?heswapr ............................................................................1276
?sfrk ..................................................................................1278
?hfrk ..................................................................................1279
?tfsm .................................................................................1281
?tfttp .................................................................................1283
?tfttr ..................................................................................1284
?tpqrt2 ...............................................................................1286
?tprfb .................................................................................1288
?tpttf .................................................................................1291
?tpttr .................................................................................1292
?trttf ..................................................................................1294
?trttp .................................................................................1295
?lacp2 ................................................................................1296
?larcm................................................................................1297
mkl_?tppack .......................................................................1298
mkl_?tpunpack ....................................................................1300
LAPACK Utility Functions and Routines ............................................1302
ilaver .................................................................................1303
ilaenv .................................................................................1303
?lamch ...............................................................................1306
LAPACK Test Functions and Routines ...............................................1307
?lagge ................................................................................1307
?laghe ................................................................................1308
?lagsy ................................................................................1309
?latms ................................................................................1310
Additional LAPACK Routines (Included for Compatibility with Netlib LAPACK) .1314
ScaLAPACK Routines .............................................................................1318
p?gebal ..............................................................................1608
p?gebd2 .............................................................................1611
p?gehd2 .............................................................................1614
p?gelq2 ..............................................................................1616
p?geql2 ..............................................................................1618
p?geqr2..............................................................................1620
p?gerq2..............................................................................1622
p?getf2...............................................................................1624
p?labrd...............................................................................1626
p?lacon ..............................................................................1629
p?laconsb ...........................................................................1631
p?lacp2 ..............................................................................1632
p?lacp3 ..............................................................................1633
p?lacpy...............................................................................1635
p?laevswp...........................................................................1636
p?lahrd...............................................................................1638
p?laiect ..............................................................................1640
p?lamve .............................................................................1641
p?lange ..............................................................................1642
p?lanhs ..............................................................................1644
p?lansy, p?lanhe ..................................................................1646
p?lantr ...............................................................................1648
p?lapiv ...............................................................................1649
p?lapv2 ..............................................................................1652
p?laqge ..............................................................................1654
p?laqr0...............................................................................1655
p?laqr1...............................................................................1658
p?laqr2...............................................................................1661
p?laqr3...............................................................................1663
p?laqr5...............................................................................1666
p?laqsy...............................................................................1668
p?lared1d ...........................................................................1670
p?lared2d ...........................................................................1671
p?larf .................................................................................1672
p?larfb ...............................................................................1675
p?larfc................................................................................1678
p?larfg ...............................................................................1680
p?larft ................................................................................1682
p?larz.................................................................................1684
p?larzb ...............................................................................1687
p?larzc ...............................................................................1691
p?larzt................................................................................1693
p?lascl................................................................................1696
p?lase2 ..............................................................................1698
p?laset ...............................................................................1699
p?lasmsub ..........................................................................1701
p?lasrt................................................................................1702
p?lassq...............................................................................1704
p?laswp ..............................................................................1705
p?latra ...............................................................................1707
p?latrd ...............................................................................1708
p?latrs................................................................................1711
p?latrz................................................................................1713
p?lauu2 ..............................................................................1715
p?lauum .............................................................................1717
p?lawil................................................................................1718
p?org2l/p?ung2l...................................................................1719
p?org2r/p?ung2r..................................................................1721
p?orgl2/p?ungl2...................................................................1723
p?orgr2/p?ungr2..................................................................1725
p?orm2l/p?unm2l.................................................................1727
p?orm2r/p?unm2r................................................................1730
p?orml2/p?unml2.................................................................1734
p?ormr2/p?unmr2................................................................1737
p?pbtrsv .............................................................................1740
p?pttrsv..............................................................................1744
p?potf2...............................................................................1746
p?rot ..................................................................................1748
p?rscl .................................................................................1750
p?sygs2/p?hegs2 .................................................................1751
p?sytd2/p?hetd2..................................................................1753
p?trord ...............................................................................1756
p?trsen...............................................................................1760
p?trti2 ................................................................................1764
?lahqr2...............................................................................1765
?lamsh ...............................................................................1767
?lapst .................................................................................1768
?laqr6 ................................................................................1769
?lar1va ...............................................................................1772
?laref .................................................................................1773
?larrb2 ...............................................................................1776
?larrd2 ...............................................................................1778
?larre2 ...............................................................................1781
?larre2a..............................................................................1784
?larrf2 ................................................................................1788
?larrv2 ...............................................................................1789
?lasorte ..............................................................................1794
?lasrt2................................................................................1795
?stegr2...............................................................................1796
?stegr2a .............................................................................1799
?stegr2b .............................................................................1802
?stein2 ...............................................................................1805
?dbtf2 ................................................................................1807
?dbtrf .................................................................................1808
?dttrf .................................................................................1809
?dttrsv ...............................................................................1810
?pttrsv ...............................................................................1812
?steqr2...............................................................................1813
?trmvt ................................................................................1815
pilaenv ...............................................................................1817
pilaenvx .............................................................................1818
pjlaenv ...............................................................................1820
Additional ScaLAPACK Routines..............................................1821
ScaLAPACK Utility Functions and Routines .......................................1823
p?labad ..............................................................................1824
p?lachkieee .........................................................................1825
p?lamch .............................................................................1825
p?lasnbt .............................................................................1826
descinit ..............................................................................1827
numroc ..............................................................................1828
p?scal ................................................................................2382
p?swap...............................................................................2383
PBLAS Level 2 Routines.................................................................2384
p?gemv ..............................................................................2385
p?agemv ............................................................................2387
p?ger .................................................................................2390
p?gerc................................................................................2391
p?geru ...............................................................................2393
p?hemv ..............................................................................2395
p?ahemv ............................................................................2396
p?her .................................................................................2398
p?her2 ...............................................................................2400
p?symv ..............................................................................2402
p?asymv.............................................................................2404
p?syr .................................................................................2405
p?syr2................................................................................2407
p?trmv ...............................................................................2409
p?atrmv..............................................................................2411
p?trsv ................................................................................2413
PBLAS Level 3 Routines.................................................................2415
p?geadd .............................................................................2416
p?tradd ..............................................................................2417
p?gemm .............................................................................2419
p?hemm .............................................................................2421
p?herk................................................................................2423
p?her2k ..............................................................................2425
p?symm .............................................................................2427
p?syrk ................................................................................2429
p?syr2k ..............................................................................2431
p?tran ................................................................................2434
p?tranu ..............................................................................2435
p?tranc...............................................................................2436
p?trmm ..............................................................................2437
p?trsm ...............................................................................2440
Partial Differential Equations Support ......................................................2442
Trigonometric Transform Routines...................................................2442
Trigonometric Transforms Implemented ..................................2443
Sequence of Invoking TT Routines ..........................................2444
Trigonometric Transform Interface Description .........................2445
TT Routines.........................................................................2446
Common Parameters of the Trigonometric Transforms ...............2453
Trigonometric Transform Implementation Details ......................2456
Fast Poisson Solver Routines .........................................................2457
Poisson Solver Implementation ..............................................2457
Sequence of Invoking Poisson Solver Routines .........................2463
Fast Poisson Solver Interface Description ................................2465
Routines for the Cartesian Solver ...........................................2466
Routines for the Spherical Solver ...........................................2475
Common Parameters for the Poisson Solver .............................2482
Poisson Solver Implementation Details....................................2491
Nonlinear Optimization Problem Solvers ..................................................2492
Nonlinear Solver Organization and Implementation ...........................2492
Nonlinear Solver Routine Naming Conventions .................................2494
Nonlinear Least Squares Problem without Constraints .......................2494
?trnlsp_init .........................................................................2495
?trnlsp_check ......................................................................2497
?trnlsp_solve.......................................................................2498
?trnlsp_get .........................................................................2500
?trnlsp_delete .....................................................................2501
Nonlinear Least Squares Problem with Linear (Bound) Constraints ......2502
?trnlspbc_init ......................................................................2502
?trnlspbc_check...................................................................2504
?trnlspbc_solve....................................................................2506
?trnlspbc_get ......................................................................2507
?trnlspbc_delete ..................................................................2509
Jacobian Matrix Calculation Routines...............................................2509
?jacobi_init .........................................................................2510
?jacobi_solve ......................................................................2511
?jacobi_delete .....................................................................2512
?jacobi ...............................................................................2512
?jacobix..............................................................................2513
Support Functions ................................................................................2515
Version Information......................................................................2518
mkl_get_version ..................................................................2518
mkl_get_version_string ........................................................2519
Threading Control ........................................................................2520
mkl_set_num_threads ..........................................................2521
mkl_domain_set_num_threads ..............................................2522
mkl_set_num_threads_local ..................................................2523
mkl_set_dynamic.................................................................2525
mkl_get_max_threads ..........................................................2526
mkl_domain_get_max_threads ..............................................2526
mkl_get_dynamic ................................................................2527
mkl_set_num_stripes ...........................................................2528
mkl_get_num_stripes ...........................................................2529
Error Handling .............................................................................2530
Error Handling for Linear Algebra Routines ..............................2530
Handling Fatal Errors ............................................................2533
Character Equality Testing .............................................................2534
lsame.................................................................................2534
lsamen ...............................................................................2534
Timing ........................................................................................2535
second/dsecnd ....................................................................2535
mkl_get_cpu_clocks .............................................................2536
mkl_get_cpu_frequency........................................................2536
mkl_get_max_cpu_frequency ................................................2537
mkl_get_clocks_frequency ....................................................2537
Memory Management ...................................................................2538
mkl_free_buffers .................................................................2538
mkl_thread_free_buffers.......................................................2539
mkl_disable_fast_mm ..........................................................2539
mkl_mem_stat ....................................................................2540
mkl_peak_mem_usage .........................................................2541
mkl_malloc .........................................................................2542
mkl_calloc ..........................................................................2542
mkl_realloc .........................................................................2543
mkl_free.............................................................................2544
mkl_set_memory_limit .........................................................2544
Usage Examples for the Memory Functions ..............................2545
Single Dynamic Library Control ......................................................2546
mkl_set_interface_layer........................................................2546
mkl_set_threading_layer ......................................................2547
mkl_set_xerbla....................................................................2548
mkl_set_progress ................................................................2549
mkl_set_pardiso_pivot..........................................................2550
Conditional Numerical Reproducibility Control...................................2550
mkl_cbwr_set......................................................................2551
mkl_cbwr_get .....................................................................2552
mkl_cbwr_get_auto_branch ..................................................2553
Named Constants for CNR Control ..........................................2554
Reproducibility Conditions .....................................................2555
Usage Examples for CNR Support Functions.............................2556
Miscellaneous ..............................................................................2557
mkl_progress ......................................................................2557
mkl_enable_instructions .......................................................2558
mkl_set_env_mode ..............................................................2561
mkl_verbose .......................................................................2561
mkl_verbose_output_file.......................................................2562
mkl_set_mpi .......................................................................2563
mkl_finalize ........................................................................2564
BLACS Routines ...................................................................................2565
Matrix Shapes..............................................................................2566
Repeatability and Coherence..........................................................2567
BLACS Combine Operations ...........................................................2570
?gamx2d ............................................................................2571
?gamn2d ............................................................................2572
?gsum2d ............................................................................2574
BLACS Point To Point Communication ..............................................2575
?gesd2d .............................................................................2577
?trsd2d...............................................................................2578
?gerv2d ..............................................................................2578
?trrv2d ...............................................................................2579
BLACS Broadcast Routines.............................................................2579
?gebs2d .............................................................................2581
?trbs2d...............................................................................2581
?gebr2d..............................................................................2582
?trbr2d ...............................................................................2583
BLACS Support Routines ...............................................................2584
Initialization Routines ...........................................................2584
Destruction Routines ............................................................2590
Informational Routines .........................................................2592
Miscellaneous Routines .........................................................2594
BLACS Routines Usage Examples....................................................2595
Data Fitting Functions ...........................................................................2595
Data Fitting Function Naming Conventions .......................................2595
Data Fitting Function Data Types ....................................................2596
Mathematical Conventions for Data Fitting Functions.........................2596
Data Fitting Usage Model...............................................................2599
Data Fitting Usage Examples .........................................................2599
Data Fitting Function Task Status and Error Reporting .......................2605
Data Fitting Task Creation and Initialization Routines ........................2607
df?NewTask1D ......................................................................2607
Task Configuration Routines...........................................................2609
df?EditPPSpline1D ..............................................................2610
df?EditPtr .........................................................................2617
dfiEditVal .........................................................................2618
df?EditIdxPtr.....................................................................2620
df?QueryPtr ........................................................................2622
dfiQueryVal ........................................................................2622
df?QueryIdxPtr ...................................................................2623
Data Fitting Computational Routines ...............................................2624
df?Construct1D ...................................................................2625
df?Interpolate1D/df?InterpolateEx1D ..................................2626
df?Integrate1D/df?IntegrateEx1D ........................................2634
df?SearchCells1D/df?SearchCellsEx1D ..................................2638
df?InterpCallBack ..............................................................2640
df?IntegrCallBack ..............................................................2641
df?SearchCellsCallBack ......................................................2643
Data Fitting Task Destructors .........................................................2644
dfDeleteTask ......................................................................2644
Appendix A: Linear Solvers Basics ..........................................................2645
Sparse Linear Systems..................................................................2645
Matrix Fundamentals ............................................................2646
Direct Method......................................................................2647
Sparse Matrix Storage Formats ......................................................2653
DSS Symmetric Matrix Storage ..............................................2654
DSS Nonsymmetric Matrix Storage .........................................2655
DSS Structurally Symmetric Matrix Storage .............................2655
DSS Distributed Symmetric Matrix Storage..............................2656
Sparse BLAS CSR Matrix Storage Format.................................2657
Sparse BLAS CSC Matrix Storage Format.................................2659
Sparse BLAS Coordinate Matrix Storage Format .......................2660
Sparse BLAS Diagonal Matrix Storage Format ..........................2661
Sparse BLAS Skyline Matrix Storage Format ............................2662
Sparse BLAS BSR Matrix Storage Format.................................2663
Appendix B: Routine and Function Arguments ..........................................2665
Vector Arguments in BLAS .............................................................2665
Vector Arguments in Vector Math ...................................................2666
Matrix Arguments.........................................................................2667
Appendix C: FFTW Interface to Intel® oneAPI Math Kernel Library (oneMKL) .2672
Notational Conventions .................................................................2672
FFTW2 Interface to Intel® oneAPI Math Kernel Library (oneMKL) .........2672
Wrappers Reference .............................................................2673
Limitations of the FFTW2 Interface to Intel® oneAPI Math Kernel
Library (oneMKL) .............................................................2675
Installing FFTW2 Interface Wrappers ......................................2676
MPI FFTW2 Wrappers ...........................................................2676
FFTW3 Interface to Intel® oneAPI Math Kernel Library (oneMKL) .........2679
Using FFTW3 Wrappers .........................................................2679
Building Your Own Wrapper Library.........................................2680
Building an Application With FFTW3 Interface Wrappers ............2681
Running FFTW3 Interface Wrapper Examples ...........................2681
MPI FFTW3 Wrappers ...........................................................2682
Appendix D: Code Examples ..................................................................2683
BLAS Code Examples ....................................................................2683
Fourier Transform Functions Code Examples ....................................2689
FFT Code Examples ..............................................................2689
Examples for Cluster FFT Functions ........................................2695
Developer Reference for Intel® oneAPI Math Kernel Library - C
Basic Linear Algebra Subprograms (BLAS)
  The BLAS routines provide vector, matrix-vector, and matrix-matrix operations.
Sparse BLAS
  The Sparse BLAS routines provide basic operations on sparse vectors and matrices.
LAPACK
  The LAPACK routines solve systems of linear equations, least squares problems, eigenvalue and
  singular value problems, and Sylvester's equations.
Statistical Functions
  The Statistical Functions provide a set of routines implementing commonly used pseudorandom
  number generators (RNGs) with continuous distribution.
Direct and Iterative Sparse Solvers
  Among several options for solving sparse linear systems of equations, oneMKL offers a direct
  sparse solver based on PARDISO*, which is referred to here as Intel MKL PARDISO.
Vector Mathematics Functions
  The Vector Mathematics (VM) functions compute core mathematical functions on vector arguments.
Vector Statistics Functions
  The Vector Statistics (VS) functions generate vectors of pseudorandom numbers with different
  types of statistical distributions and perform convolution and correlation computations.
Fourier Transform Functions
  The Fourier Transform Functions offer several options for computing Fast Fourier Transforms
  (FFTs).
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.
Notice revision #20201201
What's New
This Developer Reference documents Intel® oneAPI Math Kernel Library (oneMKL) release for the C interface.
Intel® Math Kernel Library is now Intel® oneAPI Math Kernel Library (oneMKL). Documentation for older
versions of Intel® Math Kernel Library is available for download only. For a list of available documentation
downloads by product version, see these pages:
• Download Documentation for Intel® Parallel Studio XE
• Download Documentation for Intel® System Studio
The manual has been updated to reflect product enhancements, along with improvements and error
corrections.
Notational Conventions
This manual uses the following terms to refer to operating systems:
Windows* OS
  This term refers to information that is valid on all supported Windows* operating systems.
Linux* OS
  This term refers to information that is valid on all supported Linux* operating systems.
macOS*
  This term refers to information that is valid on Intel®-based systems running the macOS*
  operating system.
?swap
  Refers to all four data types of the vector-vector ?swap routine: sswap, dswap, cswap, and zswap.
Font Conventions
The following font conventions are used:
lowercase courier mixed with UpperCase courier
  Function names; for example, vmlSetMode
lowercase courier italic
  Variables in argument and parameter descriptions; for example, incx.
Overview
Intel® oneAPI Math Kernel Library (oneMKL) is optimized for performance on Intel processors. oneMKL also
runs on non-Intel x86-compatible processors.
NOTE
oneMKL provides limited input validation to minimize performance overhead. It is your
responsibility when using oneMKL to ensure that input data has the required format and does not
contain invalid characters, which can cause unexpected behavior of the library. Examples of inputs
that may result in unexpected behavior:
• Not-a-number (NaN) and other special floating point values
• Large inputs that may lead to accumulator overflow
As the oneMKL API accepts raw pointers, it is your application's responsibility to validate the buffer
sizes before passing them to the library. The library requires subroutine and function parameters to be
valid before being passed. While some oneMKL routines do limited checking of parameter errors, your
application should still check for conditions such as NULL pointers.
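The checks described in this note could be sketched as a small pre-call helper. This is only an illustration: vector_is_valid is a hypothetical name, not a oneMKL function, and which checks make sense depends on the routine being called.

```c
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of oneMKL): returns true only when the
   buffer is non-NULL and each of its n elements is finite, that is,
   neither NaN nor +/-infinity. */
bool vector_is_valid(const double *x, size_t n)
{
    if (x == NULL) {
        return false;
    }
    for (size_t i = 0; i < n; ++i) {
        if (!isfinite(x[i])) {
            return false;
        }
    }
    return true;
}
```

An application might call such a helper on each input buffer in debug builds and skip it in performance-critical release paths, since the scan itself costs one pass over the data.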
The Intel® oneAPI Math Kernel Library includes Fortran routines and functions optimized for Intel® processor-
based computers running operating systems that support multiprocessing. In addition to the Fortran
interface, Intel® oneAPI Math Kernel Library (oneMKL) includes a C-language interface for the Discrete
Fourier transform functions, as well as for the Vector Mathematics, Vector Statistics, and many other
functions. For hardware and software requirements to use Intel® oneAPI Math Kernel Library (oneMKL),
see the Intel® oneAPI Math Kernel Library (oneMKL) Release Notes.
NOTE
Function calls at runtime for Intel® oneAPI Math Kernel Library (oneMKL) libraries on the Microsoft
Windows* operating system can utilize the function LoadLibrary() and related loading functions in
static, dynamic, and single-dynamic library linking models. These functions attempt to acquire the
loader lock, which can lead to a deadlock when used within, or at the same time as, another DllMain
function call. If possible, avoid making your calls to Intel® oneAPI Math Kernel Library (oneMKL) in
a DllMain function or at the same time as other calls to DllMain, even on separate threads. Refer to
the Microsoft documentation about DllMain and Dynamic-Link Library Best Practices for more details.
BLAS Routines
The BLAS routines and functions are divided into the following groups according to the operations they
perform:
• BLAS Level 1 Routines perform operations of both addition and reduction on vectors of data. Typical
operations include scaling and dot products.
• BLAS Level 2 Routines perform matrix-vector operations, such as matrix-vector multiplication, rank-1 and
rank-2 matrix updates, and solution of triangular systems.
• BLAS Level 3 Routines perform matrix-matrix operations, such as matrix-matrix multiplication, rank-k
update, and solution of triangular systems.
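To make the three levels concrete, the unoptimized reference loops below sketch one representative operation per level. The names ref_ddot, ref_dgemv, and ref_dgemm are hypothetical stand-ins written for this illustration; they assume contiguous row-major storage and omit the stride, layout, and transpose arguments that the library routines accept.

```c
#include <stddef.h>

/* Level 1 (vector-vector): returns the dot product of x and y. */
double ref_ddot(size_t n, const double *x, const double *y)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}

/* Level 2 (matrix-vector): y := alpha*A*x + beta*y,
   where A is m-by-n and stored row-major. */
void ref_dgemv(size_t m, size_t n, double alpha, const double *a,
               const double *x, double beta, double *y)
{
    for (size_t i = 0; i < m; ++i) {
        y[i] = alpha * ref_ddot(n, &a[i * n], x) + beta * y[i];
    }
}

/* Level 3 (matrix-matrix): C := alpha*A*B + beta*C,
   where A is m-by-k, B is k-by-n, C is m-by-n, all row-major. */
void ref_dgemm(size_t m, size_t n, size_t k, double alpha, const double *a,
               const double *b, double beta, double *c)
{
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (size_t p = 0; p < k; ++p) {
                sum += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = alpha * sum + beta * c[i * n + j];
        }
    }
}
```

The amount of arithmetic per data element grows from level to level, which is why the Level 3 routines offer the most room for cache-oriented optimization.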
Starting from release 8.0, Intel® oneAPI Math Kernel Library (oneMKL) also supports the Fortran 95 interface
to the BLAS routines.
Starting from release 10.1, a number of BLAS-like Extensions are added to enable the user to perform
certain data manipulation, including matrix in-place and out-of-place transposition operations combined with
simple matrix arithmetic operations.
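As an illustration of what transposition combined with simple matrix arithmetic means, the loop below sketches an out-of-place scaled transpose, B := alpha * A^T, for row-major storage. The name ref_scaled_transpose is hypothetical and used only here; it is not the library's API, whose extension routines take further arguments.

```c
#include <stddef.h>

/* Hypothetical reference routine: B := alpha * transpose(A).
   A is rows-by-cols and B is cols-by-rows, both row-major and contiguous.
   A and B must not overlap. */
void ref_scaled_transpose(size_t rows, size_t cols, double alpha,
                          const double *a, double *b)
{
    for (size_t i = 0; i < rows; ++i) {
        for (size_t j = 0; j < cols; ++j) {
            b[j * rows + i] = alpha * a[i * cols + j];
        }
    }
}
```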
Sparse QR
Sparse QR in Intel® oneAPI Math Kernel Library (oneMKL) is a set of routines used to solve systems with
sparse matrices that have real coefficients and general structure. The Sparse QR workflow is divided
into three steps: reordering, factorization, and solving. Currently, only the CSR format is supported
for the input matrix, and Sparse QR operates on the matrix handle used in all SpBLAS IE routines. (For
details on how to create a matrix handle, refer to mkl_sparse_?_create_csr.)
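Because the input matrix must be in CSR format, it may help to see how a small matrix maps to the three CSR arrays. The sketch below assumes zero-based indexing and uses a hypothetical csr_matvec helper to multiply the stored matrix by a vector; it illustrates the storage layout only and does not call oneMKL.

```c
/* The 3-by-3 matrix
       [ 1 0 2 ]
       [ 0 3 0 ]
       [ 4 0 5 ]
   in CSR form with zero-based indices. */
const double csr_values[]  = {1.0, 2.0, 3.0, 4.0, 5.0}; /* nonzeros, row by row  */
const int    csr_columns[] = {0, 2, 1, 0, 2};           /* column of each nonzero */
const int    csr_row_ptr[] = {0, 2, 3, 5};              /* start of each row, plus end */

/* Hypothetical helper: y := A*x for a CSR matrix with nrows rows. */
void csr_matvec(int nrows, const int *row_ptr, const int *columns,
                const double *values, const double *x, double *y)
{
    for (int i = 0; i < nrows; ++i) {
        double sum = 0.0;
        for (int p = row_ptr[i]; p < row_ptr[i + 1]; ++p) {
            sum += values[p] * x[columns[p]];
        }
        y[i] = sum;
    }
}
```

Row i owns the entries from row_ptr[i] up to, but not including, row_ptr[i + 1], so the last element of the row pointer array equals the number of stored nonzeros.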
LAPACK Routines
The Intel® oneAPI Math Kernel Library fully supports the LAPACK 3.7 set of computational, driver, auxiliary
and utility routines.
The original versions of LAPACK from which that part of Intel® oneAPI Math Kernel Library (oneMKL) was
derived can be obtained from http://www.netlib.org/lapack/index.html. The authors of LAPACK are E.
Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S.
Hammarling, A. McKenney, and D. Sorensen.
The LAPACK routines can be divided into the following groups according to the operations they perform:
• Routines for solving systems of linear equations, factoring and inverting matrices, and estimating
condition numbers (see LAPACK Routines: Linear Equations).
• Routines for solving least squares problems, eigenvalue and singular value problems, and Sylvester's
equations (see LAPACK Routines: Least Squares and Eigenvalue Problems).
Starting from release 8.0, Intel® oneAPI Math Kernel Library (oneMKL) also supports the Fortran 95 interface
to LAPACK computational and driver routines. This interface provides an opportunity for simplified calls of
LAPACK routines with fewer required arguments.
Extended Eigensolver Routines
The Extended Eigensolver RCI Routines are a set of high-performance numerical routines for solving standard
(Ax = λx) and generalized (Ax = λBx) eigenvalue problems, where A and B are symmetric or Hermitian. The
routines yield all the eigenvalues and eigenvectors within a given search interval. They are based on the Feast algorithm,
an innovative fast and stable numerical algorithm presented in [Polizzi09], which deviates fundamentally
from the traditional Krylov subspace iteration based techniques (Arnoldi and Lanczos algorithms [Bai00]) or
other Davidson-Jacobi techniques [Sleijpen96]. The Feast algorithm is inspired by the density-matrix
representation and contour integration technique in quantum mechanics.
It is free from orthogonalization procedures. Its main computational tasks consist of solving very few inner
independent linear systems with multiple right-hand sides and one reduced eigenvalue problem orders of
magnitude smaller than the original one. The Feast algorithm combines simplicity and efficiency and offers
many important capabilities for achieving high performance, robustness, accuracy, and scalability on parallel
architectures. This algorithm is expected to significantly augment numerical performance in large-scale
modern applications.
Some of the characteristics of the Feast algorithm [Polizzi09] are:
• Converges quickly in 2-3 iterations with very high accuracy
• Naturally captures all eigenvalue multiplicities
• No explicit orthogonalization procedure
• Can reuse the basis of a pre-computed subspace as a suitable initial guess for performing outer-refinement
iterations
This capability can also be used for solving a series of eigenvalue problems that are close to one another.
• The number of internal iterations is independent of the size of the system and the number of eigenpairs in
the search interval
• The inner linear systems can be solved either iteratively (even with modest relative residual error) or
directly
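For orientation, the contour-integration idea mentioned above can be written as the standard spectral-projector identity below. It is a textbook sketch under stated assumptions, not a formula taken from this manual: A is real symmetric, B is symmetric positive definite, the eigenvectors x_i are normalized so that x_i^T B x_j = δ_ij, and the closed contour C is traversed counterclockwise and encloses exactly the eigenvalues in the search interval [λ_min, λ_max]. [Polizzi09] remains the reference for the actual algorithm.

```latex
\[
P \;=\; \frac{1}{2\pi i}\oint_{\mathcal{C}} (zB - A)^{-1} B \, dz
  \;=\; \sum_{\lambda_i \in [\lambda_{\min},\,\lambda_{\max}]} x_i x_i^{T} B ,
\qquad A x_i = \lambda_i B x_i .
\]
```

Applying P to a set of trial vectors removes the components that lie outside the search interval. Approximating the integral by numerical quadrature turns each quadrature node z into one linear system with the matrix (zB - A) and multiple right-hand sides, which matches the independent inner systems described above.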
VM Functions
The Vector Mathematics functions (see Vector Mathematical Functions) include a set of highly optimized
implementations of certain computationally expensive core mathematical functions (power, trigonometric,
exponential, hyperbolic, etc.) that operate on vectors of real and complex numbers.
Applications whose performance might improve significantly with VM include nonlinear programming
software, computation of integrals, and many others. VM provides interfaces for both Fortran and C.
Statistical Functions
Vector Statistics (VS) contains three sets of functions (see Statistical Functions) providing:
• Pseudorandom, quasi-random, and non-deterministic random number generator subroutines
implementing basic continuous and discrete distributions. To provide best performance, the VS
subroutines use calls to highly optimized Basic Random Number Generators (BRNGs) and a set of vector
mathematical functions.
• A wide variety of convolution and correlation operations.
• Initial statistical analysis of raw single and double precision multi-dimensional datasets.
Support Functions
The Intel® oneAPI Math Kernel Library (oneMKL) support functions (see Support Functions) support the
operation of the Intel® oneAPI Math Kernel Library (oneMKL) software and provide basic
information on the library and library operation, such as the current library version, timing, setting and
measuring of CPU frequency, error handling, and memory allocation.
Starting from release 10.0, the Intel® oneAPI Math Kernel Library (oneMKL) support functions provide
additional threading control.
Starting from release 10.1, Intel® oneAPI Math Kernel Library (oneMKL) selectively supports a Progress
Routine feature to track progress of a lengthy computation and/or interrupt the computation using a callback
function mechanism. The user application can define a function called mkl_progress that is regularly called
from the Intel® oneAPI Math Kernel Library (oneMKL) routine supporting the progress routine feature.
See Progress Routine in Support Functions for reference. Refer to a specific LAPACK or DSS/PARDISO function
description to see whether the function supports this feature.
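A user-defined progress routine might look like the sketch below. It assumes the C prototype int mkl_progress(int* thread, int* step, char* stage, int lstage) and a return-value convention in which zero lets the computation continue and a nonzero value asks the library to stop; check the mkl_progress reference page for the exact contract. The cancel_requested flag is a hypothetical application variable.

```c
#include <stdio.h>

/* Hypothetical application flag: set to nonzero to request cancellation. */
volatile int cancel_requested = 0;

/* User-supplied progress callback. stage points to lstage characters and is
   not assumed to be null-terminated, so it is printed with an explicit
   length. */
int mkl_progress(int *thread, int *step, char *stage, int lstage)
{
    printf("thread %d, step %d, stage %.*s\n", *thread, *step, lstage, stage);
    return cancel_requested ? 1 : 0; /* nonzero asks the caller to stop */
}
```

Because the routine can be called often, it is usually worth keeping the body short; the printing here is for demonstration only.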
Using oneMKL Verbose Mode (Linux)
Using oneMKL Verbose Mode (Windows)
The oneMKL Verbose feature is enabled only for certain domains such as BLAS (and BLAS-like
extensions), LAPACK, selected functionality in ScaLAPACK and FFT, and (in the DPC++ API only) RNG.
• The next item in the list is the oneMKL dispatcher. The dispatcher checks the hardware on which the
application runs and the available instruction sets. Based on the dispatcher's results, the function
implementation optimized for that hardware and instruction set is called. For details, see:
Instruction Set–Specific Dispatching (Linux)
Instruction Set–Specific Dispatching (Windows)
• During the function run (or even before), you may need to allocate the memory. oneMKL has a memory
manager that provides a list of support functions, the ability to redefine memory functions, and internal
fast memory allocations with memory reuse. See the following for more information:
Memory Management
Redefining Memory Functions (Linux)
Redefining Memory Functions (Windows)
• In threaded mode, oneMKL also calls its own threading manager, which checks several
environment variables and sets the number of threads. For details, see:
Improving Performance with Threading (Linux)
Improving Performance with Threading (Windows)
As an example, BLAS dgemm was run on a 4th Gen Intel® Xeon® Scalable processor system. Matrices A and B
were each 10000x10000. Running the dgemm function in sequential mode took 32.5 seconds
(32500 milliseconds), of which:
• Setting oneMKL xerbla took 0.001 milliseconds.
• Setting/checking oneMKL verbose mode took 0.009 milliseconds.
• Checking for MKL_CBWR settings and detecting the CPU using the MKL dispatcher took 0.004 milliseconds.
• Additional internal memory allocations in dgemm took 0.009 milliseconds, followed by 0.002 milliseconds of
deallocation.
In this example, several mkl_malloc calls allocate memory for the A, B, and C matrices before the dgemm
function runs. Overall, memory allocation took around 0.084 milliseconds. After the dgemm function
completes, several mkl_free calls free the A, B, and C matrix memory. This took around 5.159 milliseconds.
If you run dgemm with Intel OpenMP threading, you spend 24 milliseconds in the oneMKL threading manager.
If you run dgemm with TBB threading, you spend around 5 milliseconds in the oneMKL threading manager.
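To put the sequential-mode figures in proportion, the short calculation below sums the per-step overheads listed in the bullets and expresses them as a fraction of the total run time. The numbers are copied from the example; the function names are hypothetical.

```c
/* Sum of the sequential-mode overheads listed above, in milliseconds:
   xerbla setup, verbose check, CBWR/dispatcher check, internal allocation,
   and internal deallocation. */
double dgemm_overhead_ms(void)
{
    const double parts_ms[] = {0.001, 0.009, 0.004, 0.009, 0.002};
    double total = 0.0;
    for (unsigned i = 0; i < sizeof parts_ms / sizeof parts_ms[0]; ++i) {
        total += parts_ms[i];
    }
    return total;
}

/* Overhead as a fraction of the measured run time, both in milliseconds. */
double dgemm_overhead_fraction(double run_time_ms)
{
    return dgemm_overhead_ms() / run_time_ms;
}
```

With the values above the overhead totals about 0.025 milliseconds, which is less than one millionth of the 32500 millisecond run.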
Performance Enhancements
The Intel® oneAPI Math Kernel Library has been optimized by exploiting both processor and system features
and capabilities. Special care has been given to those routines that most profit from cache-management
techniques. These especially include matrix-matrix operation routines such as dgemm().
In addition, code optimization techniques have been applied to minimize dependencies of scheduling integer
and floating-point units on the results within the processor.
The major optimization techniques used throughout the library include:
• Loop unrolling to minimize loop management costs
• Blocking of data to improve data reuse opportunities
• Copying to reduce chances of data eviction from cache
• Data prefetching to help hide memory latency
• Multiple simultaneous operations (for example, dot products in dgemm) to eliminate stalls due to
arithmetic unit pipelines
• Use of hardware features such as the SIMD arithmetic units, where appropriate
These are techniques from which the arithmetic code benefits the most.
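As a rough illustration of the blocking technique listed above, the sketch below multiplies two square row-major matrices tile by tile so that each tile is reused while it is likely still in cache. It is a teaching example with a hypothetical name and an arbitrary tile size, not the library's implementation.

```c
#include <stddef.h>

enum { TILE = 32 }; /* arbitrary tile edge chosen for illustration */

static size_t min_size(size_t a, size_t b)
{
    return a < b ? a : b;
}

/* C := C + A*B for n-by-n row-major matrices, processed in TILE-sized
   blocks. The caller initializes C (for example, to zero). */
void blocked_matmul_acc(size_t n, const double *a, const double *b, double *c)
{
    for (size_t ii = 0; ii < n; ii += TILE) {
        for (size_t kk = 0; kk < n; kk += TILE) {
            for (size_t jj = 0; jj < n; jj += TILE) {
                size_t i_end = min_size(ii + TILE, n);
                size_t k_end = min_size(kk + TILE, n);
                size_t j_end = min_size(jj + TILE, n);
                for (size_t i = ii; i < i_end; ++i) {
                    for (size_t k = kk; k < k_end; ++k) {
                        double aik = a[i * n + k]; /* reused across the j loop */
                        for (size_t j = jj; j < j_end; ++j) {
                            c[i * n + j] += aik * b[k * n + j];
                        }
                    }
                }
            }
        }
    }
}
```

The innermost loop walks both B and C with unit stride, which also gives the compiler a straightforward candidate for SIMD vectorization.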
Parallelism
Intel® oneAPI Math Kernel Library (oneMKL) offers performance gains through parallelism provided by the
symmetric multiprocessing performance (SMP) feature. You can obtain improvements from SMP in the
following ways:
• One way is based on user-managed threads in the program and further distribution of the operations over
the threads based on data decomposition, domain decomposition, control decomposition, or some other
parallelizing technique. Each thread can use any of the Intel® oneAPI Math Kernel Library (oneMKL)
functions (except for the deprecated ?lacon LAPACK routine) because the library has been designed to be
thread-safe.
• Another method is to use the FFT and BLAS level 3 routines. They have been parallelized and require no
alterations of your application to gain the performance enhancements of multiprocessing. Performance
using multiple processors on the level 3 BLAS shows excellent scaling. Since the threads are called and
managed within the library, the application does not need to be recompiled thread-safe.
• Yet another method is to use tuned LAPACK routines. Currently these include the single- and
double-precision flavors of routines for QR factorization of general matrices, triangular factorization of general
and symmetric positive-definite matrices, solving systems of equations with such matrices, as well as
solving symmetric eigenvalue problems.
For instructions on setting the number of available processors for the BLAS level 3 and LAPACK routines, see
Intel® oneAPI Math Kernel Library (oneMKL) Developer Guide.
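For the first approach, user-managed threads with data decomposition, each thread typically works on its own contiguous slice of the data and calls library routines on that slice. The helper below sketches one common way to compute such slices; chunk_range is a hypothetical name, and creating and joining the threads is left to the application.

```c
#include <stddef.h>

/* Splits n items among nthreads workers as evenly as possible.
   Worker tid (0-based) gets the half-open range [*begin, *end);
   the first n % nthreads workers receive one extra item.
   nthreads must be greater than zero. */
void chunk_range(size_t n, size_t nthreads, size_t tid,
                 size_t *begin, size_t *end)
{
    size_t base  = n / nthreads;
    size_t extra = n % nthreads;
    *begin = tid * base + (tid < extra ? tid : extra);
    *end   = *begin + base + (tid < extra ? 1 : 0);
}
```

For example, 10 rows split over 3 workers yields the ranges [0, 4), [4, 7), and [7, 10).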
C Datatypes Specific to Intel MKL
You can redefine datatypes specific to Intel® oneAPI Math Kernel Library (oneMKL). One reason to do this is if
you have your own types which are binary-compatible with Intel® oneAPI Math Kernel Library (oneMKL)
datatypes, with the same representation or memory layout. To redefine a datatype, use one of these
methods:
• Insert the #define statement redefining the datatype before the mkl.h header file #include statement.
For example,
#define MKL_INT size_t
#include "mkl.h"
• Use the compiler -D option to redefine the datatype. For example,
...-DMKL_INT=size_t...
NOTE
As the user, if you redefine Intel® oneAPI Math Kernel Library (oneMKL) datatypes you are responsible
for making sure that your definition is compatible with that of Intel® oneAPI Math Kernel Library
(oneMKL). If not, it might cause unpredictable results or crash the application.
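One way to guard against an incompatible redefinition is a compile-time check, sketched below in C11. The sketch is self-contained for illustration: lib_int_t stands in for the library's integer type (assumed here to be a plain int) and app_index_t is a hypothetical application type. In a real build you would compare against the type that mkl.h actually defines for your interface layer.

```c
#include <stdint.h>

typedef int     lib_int_t;   /* stand-in for the library integer type (assumption) */
typedef int32_t app_index_t; /* hypothetical application-defined type */

/* Both checks are evaluated at compile time; a mismatch stops the build. */
_Static_assert(sizeof(app_index_t) == sizeof(lib_int_t),
               "app_index_t must have the same size as the library integer type");
_Static_assert(((app_index_t)-1 < 0) == ((lib_int_t)-1 < 0),
               "app_index_t must have the same signedness as the library integer type");

/* Returns the size in bytes of the checked application type. */
unsigned app_index_size(void)
{
    return (unsigned)sizeof(app_index_t);
}
```

Size and signedness are necessary conditions for binary compatibility, not a complete proof of it, so a check like this complements rather than replaces a review of the type definitions.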
OpenMP* Offload
This section describes how to perform OpenMP offload computations using Intel® oneAPI Math Kernel Library.
• ?syev, ?heev
• ssyevd, cheevd
• ssyevx, cheevx
• ssygvd, chegvd
• ssygvx, chegvx
• ?sytrd, ?hetrd
• Vector Statistics
• Random number generators
NOTE
All distributions are supported. See
https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-c/2025-0/distribution-generators.html
Important Check the oneMKL DPC++ developer reference for the BRNG data type used in the
distributions in case the offload device doesn't have sycl::aspect::fp64 support.
• Summary statistics
Supports the vsl?SSCompute routine for the following estimates:
• VSL_SS_MEAN
• VSL_SS_SUM
• VSL_SS_2R_MOM
• VSL_SS_2R_SUM
• VSL_SS_3R_MOM
• VSL_SS_3R_SUM
• VSL_SS_4R_MOM
• VSL_SS_4R_SUM
• VSL_SS_2C_MOM
• VSL_SS_2C_SUM
• VSL_SS_3C_MOM
• VSL_SS_3C_SUM
• VSL_SS_4C_MOM
• VSL_SS_4C_SUM
• VSL_SS_KURTOSIS
• VSL_SS_SKEWNESS
• VSL_SS_MIN
• VSL_SS_MAX
• VSL_SS_VARIATION
Supported methods:
• VSL_SS_METHOD_FAST
• VSL_SS_METHOD_FAST_USER_MEAN
• FFTs through both DFTI and FFTW3 interfaces in one, two, and three dimensions.
• For COMPLEX_STORAGE, only the DFTI_COMPLEX_COMPLEX format is currently supported on CPU and
GPU devices.
• Both synchronous and asynchronous computations are supported.
• Arbitrary strides and batch distances are not supported for multi-dimensional R2C transforms offloaded
to the GPU. Considering the last dimension of the data, every element must be separated from its two
nearest peers (along another dimension and/or in another batch) by a constant distance. For example,
to compute a batched, two-dimensional R2C FFT of size [N2, N1] with input strides [0, S2, 1]
(row-major layout with unit elementary stride and no offset), INPUT_DISTANCE must be equal to
N2*S2 so that every element is separated from its nearest last-dimension counterpart(s) by a distance
S2 (in this example), even across batches.
• Due to the variadic implementation of DftiComputeForward and DftiComputeBackward, out-of-place
compute calls using the DFTI API with the OpenMP 5.1 dispatch construct differ from common dispatch
construct usage by requiring a "need_device_ptr" clause. The oneMKL examples provided on
installation demonstrate this usage.
• Transforms on GPU devices may overwrite FFT-irrelevant, padding entries in the output data.
• Sparse BLAS
• mkl_sparse_{s, d}_create_csr
• mkl_sparse_{s, d}_export_csr
• mkl_sparse_destroy
• mkl_sparse_order
• Currently supports only CSR matrix format.
• mkl_sparse_set_mv_hint
• Currently supports only SPARSE_OPERATION_NON_TRANSPOSE with CSR matrix format for general
MV (SPARSE_MATRIX_TYPE_GENERAL) and triangular MV (SPARSE_MATRIX_TYPE_TRIANGULAR with
fill modes SPARSE_FILL_MODE_LOWER/SPARSE_FILL_MODE_UPPER).
• mkl_sparse_set_sv_hint
• mkl_sparse_set_sm_hint
• Currently supports only CSR matrix format and SPARSE_MATRIX_TYPE_TRIANGULAR type.
• mkl_sparse_optimize
• Supports optimization for mkl_sparse_{s, d}_mv functionality based on supported hints added
through mkl_sparse_set_mv_hint offload.
• Supports optimization for mkl_sparse_{s, d}_trsv functionality based on supported hints added
through mkl_sparse_set_sv_hint offload.
• Supports optimization for mkl_sparse_{s, d}_trsm functionality based on supported hints added
through mkl_sparse_set_sm_hint offload.
• Both synchronous and asynchronous executions are supported.
NOTE Although you can run the mkl_sparse_optimize offload function asynchronously,
you are responsible for the data dependency between the optimization routine and the execution
routines.
• mkl_sparse_{s, d}_mv:
• Currently supports only SPARSE_OPERATION_NON_TRANSPOSE with the following combinations of
matrix types:
• SPARSE_MATRIX_TYPE_GENERAL
• SPARSE_MATRIX_TYPE_TRIANGULAR with fill modes SPARSE_FILL_MODE_LOWER/
SPARSE_FILL_MODE_UPPER and diagonal types SPARSE_DIAG_UNIT/SPARSE_DIAG_NON_UNIT
• SPARSE_MATRIX_TYPE_SYMMETRIC with fill modes SPARSE_FILL_MODE_LOWER/
SPARSE_FILL_MODE_UPPER and diagonal type SPARSE_DIAG_NON_UNIT (currently,
SPARSE_DIAG_UNIT is not supported)
• Both synchronous and asynchronous computations are supported.
• mkl_sparse_{s, d}_mm:
• Currently supported only with SPARSE_MATRIX_TYPE_GENERAL and
SPARSE_OPERATION_NON_TRANSPOSE.
• Both SPARSE_LAYOUT_ROW_MAJOR and SPARSE_LAYOUT_COLUMN_MAJOR are supported.
• Both synchronous and asynchronous computations are supported.
• mkl_sparse_{s, d}_trsv
• Currently supports only CSR matrix format with SPARSE_MATRIX_TYPE_TRIANGULAR and
SPARSE_OPERATION_NON_TRANSPOSE.
• Both synchronous and asynchronous computations are supported.
• mkl_sparse_{s, d}_trsm
• Currently supports only CSR matrix format with SPARSE_MATRIX_TYPE_TRIANGULAR and
SPARSE_OPERATION_NON_TRANSPOSE.
• Both SPARSE_LAYOUT_ROW_MAJOR and SPARSE_LAYOUT_COLUMN_MAJOR are supported.
• Both synchronous and asynchronous computations are supported.
• mkl_sparse_sp2m
• Currently supported only with SPARSE_MATRIX_TYPE_GENERAL.
• Both synchronous and asynchronous computations are supported with Level Zero backend, and
currently only synchronous computations are supported with OpenCL backend.
• Note that you can run the mkl_sparse_sp2m offload function asynchronously, but you are
responsible for the data dependency between the first stage and the second stage of
mkl_sparse_sp2m.
• mkl_sparse_sp2m internally creates arrays for the sparse C matrix output. As they may be
expected to be used subsequently on both host and device, they are created internally using USM
shared memory. The arrays are managed by the library and will be cleaned up when the
corresponding C matrix handle is destroyed; however, direct access to the arrays is provided by the
mkl_sparse_{s,d}_export_csr() OpenMP offload function. If you need the data beyond the lifetime
of the C matrix handle, copy it to your own arrays.
The choice of USM shared memory for C arrays is made for functional support of the OpenMP
Offload paradigm and has a performance impact over choosing USM device memory, which would
be more performant but not functional in all subsequent use cases.
• The created C matrix in the provided handle is not guaranteed to be sorted, so the
mkl_sparse_order() OpenMP offload API is provided for user convenience if that property is
needed.
• The input matrix handle A is not required to be sorted on input, but the input matrix handle B is
required to be sorted on input.
• In Sparse BLAS, the usage model consists of the creation stage, the inspection stage, the execution
stage, and the destruction stage. For Sparse BLAS with C OpenMP Offload, all stages can be
asynchronously executed, provided any data dependencies are already respected.
The OpenMP offload feature from Intel® oneAPI Math Kernel Library (oneMKL) enables you to run oneMKL
computations on Intel GPUs through the standard oneMKL APIs within an omp dispatch section. For
example, the standard CBLAS API for single precision real data type matrix multiply is:
void cblas_sgemm(const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE TransA,
const CBLAS_TRANSPOSE TransB, const MKL_INT M, const MKL_INT N,
const MKL_INT K, const float alpha, const float *A, const MKL_INT lda,
const float *B, const MKL_INT ldb, const float beta, float *C,
const MKL_INT ldc);
If the oneMKL function (for example, cblas_sgemm) is called outside of an omp dispatch section, or if
offload is disabled, then the CPU implementation is dispatched. If the same function is called within an omp
dispatch section and offload is possible, then the GPU implementation is dispatched. By default, the
execution of the oneMKL function within a dispatch construct is synchronous. OpenMP offload computations
may be done asynchronously by adding the nowait clause to the dispatch construct. This ensures that the
host thread encountering the task region generated by this construct will not be blocked by the oneMKL call.
Rather, the host thread is returned to the caller for further use. To finish the asynchronous (nowait)
computations and ensure memory and execution model consistency (for example, that the results of a
computation will be ready in memory to map), the last such nowait computation is followed by the stand-
alone construct #pragma omp taskwait.
From the OpenMP Application Programming Interface version 5.0 specification: "The taskwait region binds to
the current task region [i.e., in this case, the last nowait computation]. The current task region is suspended
at an implicit task scheduling point associated with the construct. The current task region remains suspended
until all child tasks that it generated before the taskwait region complete execution [currently, depend clause
is not supported]."
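The synchronization pattern described above can be illustrated without oneMKL. The following is a minimal sketch, not oneMKL code: two plain OpenMP tasks stand in for two nowait dispatches, and the function name run_two_deferred_tasks is hypothetical. It only shows that results written by deferred work are safe to read after the taskwait.

```c
/* Hypothetical stand-in: two child tasks play the role of two asynchronous
 * (nowait) computations. If the compiler ignores OpenMP pragmas, the code
 * runs serially and produces the same result. */
int run_two_deferred_tasks(void)
{
    int r1 = 0, r2 = 0;

    #pragma omp task shared(r1)
    r1 = 6 * 7;              /* first deferred computation */

    #pragma omp task shared(r2)
    r2 = 10 + 5;             /* second, independent computation */

    /* Suspend the current task until both child tasks have completed. */
    #pragma omp taskwait

    return r1 + r2;          /* results are read only after the taskwait */
}
```

Reading r1 or r2 before the taskwait would be a data race when the tasks are actually deferred, which is the same hazard as mapping results back before an asynchronous oneMKL call has finished.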
Example
Examples for using the OpenMP offload for oneMKL are located in the Intel® oneAPI Math Kernel Library
(oneMKL) installation directory, under:
examples/c_offload
The following code snippet shows how to use OpenMP offload for single-call oneMKL features such as most
dense linear algebra functionality.
#include <omp.h>
#include "mkl.h"
#include "mkl_omp_offload.h" // MKL header file for OpenMP offload

int dnum = 0;

int main() {
    float *a, *b, *c, alpha = 1.0, beta = 1.0;
    MKL_INT m = 150, n = 200, k = 128, lda = m, ldb = k, ldc = m;
    MKL_INT sizea = lda * k, sizeb = ldb * n, sizec = ldc * n;

    // allocate matrices and check pointers
    a = (float *)mkl_malloc(sizea * sizeof(float), 64);
    ...

    // initialize matrices
    #pragma omp target map(c[0:sizec])
    {
        for (MKL_INT i = 0; i < sizec; i++) {
            c[i] = 42;
        }
        ...
    }

    // run gemm on host, use standard MKL interface
    cblas_sgemm(CblasColMajor, CblasNoTrans, CblasNoTrans, m, n, k, alpha, a,
                lda, b, ldb, beta, c, ldc);

    // map the a, b, and c matrices on the device memory
    // (the backslash continues the pragma onto the next line)
    #pragma omp target data map(to:a[0:sizea],b[0:sizeb]) map(tofrom:c[0:sizec]) \
        device(dnum)
    {
        // run gemm on gpu, use standard MKL interface within a dispatch construct
        // if offload is not possible, default to cpu
        #pragma omp dispatch device(dnum)
        cblas_sgemm(
            CblasColMajor, CblasNoTrans, CblasNoTrans, m, n, k,
            alpha, a, lda, b, ldb, beta, c, ldc
        );
    }

    // Free matrices
    mkl_free(a);
    ...
}
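To make the array sizes in the snippet above concrete (sizea = lda * k, and so on), the following reference loop spells out what a column-major, non-transposed single-precision matrix multiply computes. It is a plain-C sketch for illustration only; the name ref_sgemm_colmajor is hypothetical, and the optimized oneMKL implementation is organized very differently.

```c
#include <stddef.h>

/* Reference loop for C := alpha*A*B + beta*C with column-major storage and
 * no transposition. A is m-by-k (lda >= m), B is k-by-n (ldb >= k),
 * C is m-by-n (ldc >= m). Element (i, j) of A is stored at a[i + j*lda]. */
void ref_sgemm_colmajor(int m, int n, int k, float alpha,
                        const float *a, int lda,
                        const float *b, int ldb,
                        float beta, float *c, int ldc)
{
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < m; i++) {
            float acc = 0.0f;
            for (int p = 0; p < k; p++)
                acc += a[i + (size_t)p * lda] * b[p + (size_t)j * ldb];
            c[i + (size_t)j * ldc] = alpha * acc + beta * c[i + (size_t)j * ldc];
        }
    }
}
```

Because the last column of A starts at offset (k-1)*lda, an array of lda*k elements is enough to hold A, which matches the allocation in the example.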
Some oneMKL functionality requires calling a set of functions to perform the corresponding
computation. This is the case, for example, for the Discrete Fourier Transform, where a typical computation
involves calling the following functions:
DFTI_EXTERN MKL_LONG DftiCreateDescriptor(DFTI_DESCRIPTOR_HANDLE*,
enum DFTI_CONFIG_VALUE,
enum DFTI_CONFIG_VALUE,
MKL_LONG, ...);
DFTI_EXTERN MKL_LONG DftiCommitDescriptor(DFTI_DESCRIPTOR_HANDLE);
DFTI_EXTERN MKL_LONG DftiComputeForward(DFTI_DESCRIPTOR_HANDLE, void*, ...);
DFTI_EXTERN MKL_LONG DftiComputeBackward(DFTI_DESCRIPTOR_HANDLE, void*, ...);
DFTI_EXTERN MKL_LONG DftiFreeDescriptor(DFTI_DESCRIPTOR_HANDLE*);
In that case, only a subset of the calls must be wrapped in an omp dispatch construct as shown in the
following code snippet for DFTI.
#include <stdio.h> // printf
#include <omp.h>
#include "mkl.h"
#include "mkl_omp_offload.h"

int main(void)
{
    const int devNum = 0;
    const MKL_LONG N = 64; // Size of 1D transform
    MKL_LONG status = 0;
    MKL_LONG statusGPU = 0;
    DFTI_DESCRIPTOR_HANDLE descHandle = NULL;
    DFTI_DESCRIPTOR_HANDLE descHandleGPU = NULL;
    MKL_Complex8 *x = NULL;
    MKL_Complex8 *xGPU = NULL;

    printf("Create DFTI descriptor\n");
    status = DftiCreateDescriptor(&descHandle, DFTI_SINGLE, DFTI_COMPLEX, 1, N);
    printf("Create GPU DFTI descriptor\n");
    statusGPU = DftiCreateDescriptor(&descHandleGPU, DFTI_SINGLE, DFTI_COMPLEX,
                                     1, N);

    printf("Commit DFTI descriptor\n");
    status = DftiCommitDescriptor(descHandle);
    printf("Commit GPU DFTI descriptor\n");
    #pragma omp dispatch device(devNum)
    statusGPU = DftiCommitDescriptor(descHandleGPU);

    printf("Allocate memory for input array\n");
    x = (MKL_Complex8 *)mkl_malloc(N*sizeof(MKL_Complex8), 64);
    printf("Allocate memory for GPU input array\n");
    xGPU = (MKL_Complex8 *)mkl_malloc(N*sizeof(MKL_Complex8), 64);

    printf("Initialize input for forward FFT\n");
    // init x and xGPU ...

    printf("Compute forward FFT in-place\n");
    status = DftiComputeForward(descHandle, x);
    printf("Compute GPU forward FFT in-place\n");
    #pragma omp target data map(tofrom:xGPU[0:N]) device(devNum)
    {
        #pragma omp dispatch device(devNum)
        statusGPU = DftiComputeForward(descHandleGPU, xGPU);
    }
    // use results now in x and xGPU ...

cleanup:
    DftiFreeDescriptor(&descHandle);
    DftiFreeDescriptor(&descHandleGPU);
    mkl_free(x);
    mkl_free(xGPU);
}
For asynchronous execution of multi-call oneMKL computation, the nowait clause needs to be used only on
the call to the function performing the actual computation (for example,
DftiCompute{Forward,Backward}). For instance, the following snippet shows how the DFTI example above
could be changed to have two, back-to-back, asynchronous (nowait) computations dispatched, with a
taskwait at the end of the second to ensure the completion of both computations before their results are
accessed:
printf("Compute Intel GPU forward FFT 1 in-place\n");
#pragma omp target data map(tofrom:x1GPU[0:N1], x2GPU[0:N2]) device(devNum)
{
    #pragma omp dispatch device(devNum) nowait
    status1GPU = DftiComputeForward(descHandle1GPU, x1GPU);

    printf("Compute Intel GPU forward FFT 2 in-place\n");
    #pragma omp dispatch device(devNum) nowait
    status2GPU = DftiComputeForward(descHandle2GPU, x2GPU);

    #pragma omp taskwait
}
if (status1GPU != DFTI_NO_ERROR) goto failed;
if (status2GPU != DFTI_NO_ERROR) goto failed;
For sparse BLAS computations, the workflow ‘create a CSR matrix handle’ → ‘compute’ → ‘destroy the CSR
matrix handle’ must be done so that the offloaded data arrays are alive through the full workflow. For
instance, if you are using a target data map, then the workflow must be contained in a single target data
region. On the other hand, if the arrays were allocated directly using omp_target_alloc() or the Intel
Extensions omp_target_alloc_host/omp_target_alloc_device/omp_target_alloc_shared, then the
workflow must be contained at least in a subset of the scope where those arrays are usable; that is, before
the corresponding calls to omp_target_free. The following snippet shows how the Sparse BLAS OpenMP
Offload example for mkl_sparse_s_mv() could be run using a target data map region, where N is the
number of rows, M is the number of columns, and NNZ is the number of non-zero entries of the sparse
matrix csrA_gpu, x is the input vector, and the output is stored in the z array:
#pragma omp target data map(to:ia[0:N+1],ja[0:NNZ],a[0:NNZ],x[0:M]) map(tofrom:z[0:N]) \
    device(devNum)
{
    #pragma omp dispatch device(devNum)
    status_gpu1 = mkl_sparse_s_create_csr(&csrA_gpu, SPARSE_INDEX_BASE_ZERO, N, M,
                                          ia, ia + 1, ja, a);

    #pragma omp dispatch device(devNum)
    status_gpu2 = mkl_sparse_s_mv(SPARSE_OPERATION_NON_TRANSPOSE, alpha, csrA_gpu, descrA,
                                  x, beta, z);

    #pragma omp dispatch device(devNum)
    status_gpu3 = mkl_sparse_destroy(csrA_gpu);
}
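For readers unfamiliar with the CSR arrays used above, the following plain-C sketch shows the operation that the snippet offloads for a general, non-transposed matrix with zero-based indexing. It is an illustrative reference loop, not the oneMKL implementation, and the name ref_csr_mv is hypothetical.

```c
/* Reference loop for z := alpha*A*x + beta*z, where A is stored in
 * zero-based CSR format. Row i owns entries rows_start[i] .. rows_end[i]-1
 * of col_indx and values. With a single row-pointer array ia of length
 * nrows+1, passing rows_start = ia and rows_end = ia + 1 gives this layout. */
void ref_csr_mv(int nrows, float alpha,
                const int *rows_start, const int *rows_end,
                const int *col_indx, const float *values,
                const float *x, float beta, float *z)
{
    for (int i = 0; i < nrows; i++) {
        float acc = 0.0f;
        for (int p = rows_start[i]; p < rows_end[i]; p++)
            acc += values[p] * x[col_indx[p]];
        z[i] = alpha * acc + beta * z[i];
    }
}
```

This also explains the pair of arguments ia and ia + 1 in the call to mkl_sparse_s_create_csr: the first marks where each row begins and the second where it ends.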
For asynchronous execution of multi-call oneMKL Sparse BLAS computation, the nowait clause can be added
to the call of the function performing the actual computation (for example, calls to the
mkl_sparse_{s,d}_mv() function).
As an example, the following snippet shows how the Sparse BLAS example above could be changed to have
two asynchronous (nowait) computations using the same matrix handle, csrA_gpu, but unrelated vector
data so there is no read/write dependency between them. Add a taskwait at the end of the second
execution to ensure the completion of both computations before the mkl_sparse_destroy() function is
called:
#pragma omp target data map(to:ia[0:N+1],ja[0:NNZ],a[0:NNZ],x[0:M],w[0:M]) \
    map(tofrom:y[0:N],z[0:N]) device(devNum)
{
BLAS Routines
NOTE Different arrays used as parameters to Intel® MKL BLAS routines must not overlap.
Some routines and functions can have combined character codes, such as sc or dz.
For example, the function scasum uses a complex input array and returns a real value.
The <name> field, in BLAS level 1, indicates the operation type. For example, the BLAS level 1
routines ?dot, ?rot, ?swap compute a vector dot product, vector rotation, and vector swap, respectively.
In BLAS level 2 and 3, <name> reflects the matrix argument type:
ge general matrix
sy symmetric matrix
he Hermitian matrix
tr triangular matrix
The <mod> field, if present, provides additional details of the operation. BLAS level 1 names can have the
following characters in the <mod> field:
c conjugated vector
u unconjugated vector
BLAS level 2 names can have the following characters in the <mod> field:
mv matrix-vector product
BLAS level 3 names can have the following characters in the <mod> field:
mm matrix-matrix product
On 64-bit platforms, routines with the _64 suffix support large data arrays in the LP64 interface library and
enable you to mix integer types in one application. For example, when an application is linked with the LP64
interface library, SGEMM indexes arrays with the 32-bit integer type, while SGEMM_64 indexes arrays with the
64-bit integer type. For more interface library details, see "Using the ILP64 Interface vs. LP64 Interface" in
the developer guide.
The examples below illustrate how to interpret BLAS routine names:
ddot <d> <dot>: real and double precision, vector-vector dot product
cdotc <c> <dot> <c>: complex and single precision, vector-vector dot product,
conjugated
cdotu <c> <dot> <u>: complex and single precision, vector-vector dot product,
unconjugated
scasum <sc> <asum>: real and single-precision output, complex and single-precision
input, sum of magnitudes of vector elements
sgemv <s> <ge> <mv>: real and single precision, general matrix, matrix-vector product
ztrmm_64 <z> <tr> <mm> <_64>: complex and double precision, triangular matrix,
matrix-matrix product, 64-bit integer type
Sparse BLAS level 1 naming conventions are similar to those of BLAS level 1. For more information, see
Naming Conventions.
NOTE
This reference contains syntax in C for both the CBLAS interface and the Fortran BLAS routines.
In CBLAS, the Fortran routine names are prefixed with cblas_ (for example, dasum becomes cblas_dasum).
Names of all CBLAS functions are in lowercase letters. Like BLAS routines, Intel® oneAPI Math Kernel Library
provides CBLAS routines with the _64 suffix (for example, cblas_dasum_64) to support large data arrays in
the LP64 interface library on 64-bit platforms. For more interface library details, see "Using the ILP64
Interface vs. LP64 Interface" in the developer guide.
Complex functions ?dotc and ?dotu become CBLAS subroutines (void functions); they return the complex
result via a void pointer, added as the last parameter. CBLAS names of these functions are suffixed with
_sub. For example, the BLAS function cdotc corresponds to cblas_cdotc_sub.
WARNING
Users of the CBLAS interface should be aware that the CBLAS are just a C interface to the BLAS, which
is based on the FORTRAN standard and subject to the FORTRAN standard restrictions. In particular, the
output parameters should not be referenced through more than one argument.
NOTE
This interface is not implemented in the Sparse BLAS Level 2 and Level 3 routines.
Enumerated Types
The CBLAS interface uses the following enumerated types:
enum CBLAS_LAYOUT {
CblasRowMajor=101, /* row-major arrays */
CblasColMajor=102}; /* column-major arrays */
enum CBLAS_TRANSPOSE {
CblasNoTrans=111, /* trans='N' */
CblasTrans=112, /* trans='T' */
CblasConjTrans=113}; /* trans='C' */
enum CBLAS_UPLO {
CblasUpper=121, /* uplo ='U' */
CblasLower=122}; /* uplo ='L' */
enum CBLAS_DIAG {
CblasNonUnit=131, /* diag ='N' */
CblasUnit=132}; /* diag ='U' */
enum CBLAS_SIDE {
CblasLeft=141, /* side ='L' */
CblasRight=142}; /* side ='R' */
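As an illustration of what the layout values mean for indexing, the following sketch computes the offset of matrix element (i, j) for each layout. The enum and function names are hypothetical stand-ins so that the example compiles without mkl.h; only the numeric values 101 and 102 are taken from the definitions above.

```c
#include <stddef.h>

/* Hypothetical mirror of CBLAS_LAYOUT, for illustration only. */
enum sketch_layout { SketchRowMajor = 101, SketchColMajor = 102 };

/* Offset of element (i, j) in an array with leading dimension ld.
 * Row-major: consecutive elements of a row are adjacent in memory.
 * Column-major: consecutive elements of a column are adjacent in memory. */
size_t sketch_elem_offset(enum sketch_layout layout,
                          size_t i, size_t j, size_t ld)
{
    return (layout == SketchRowMajor) ? i * ld + j : i + j * ld;
}
```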
BLAS Level 1 Routine and Function Groups and Their Data Types
Routine or Function Group    Data Types    Description
cblas_?asum
Computes the sum of magnitudes of the vector
elements.
Syntax
float cblas_sasum (const MKL_INT n, const float *x, const MKL_INT incx);
float cblas_scasum (const MKL_INT n, const void *x, const MKL_INT incx);
double cblas_dasum (const MKL_INT n, const double *x, const MKL_INT incx);
double cblas_dzasum (const MKL_INT n, const void *x, const MKL_INT incx);
Include Files
• mkl.h
Description
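The ?asum routine computes the sum of the magnitudes of elements of a real vector, or the sum of
magnitudes of the real and imaginary parts of elements of a complex vector:
res = |Re(x1)| + |Im(x1)| + |Re(x2)| + |Im(x2)| + ... + |Re(xn)| + |Im(xn)|,
where x is a vector with n elements.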
Input Parameters
Output Parameters
res Contains the sum of magnitudes of real and imaginary parts of all elements
of the vector.
Return Values
Contains the sum of magnitudes of real and imaginary parts of all elements of the vector.
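The complex case can be sketched in plain C as follows. This is an illustrative reference loop under simplifying assumptions (n >= 0 and incx > 0), not the oneMKL implementation; the name ref_scasum is hypothetical, and complex elements are assumed to be stored as interleaved (real, imaginary) float pairs.

```c
/* Sum of |Re| + |Im| over n complex elements, visiting every incx-th
 * element of an array of interleaved (real, imaginary) float pairs. */
float ref_scasum(int n, const float *x, int incx)
{
    float sum = 0.0f;
    for (int i = 0; i < n; i++) {
        float re = x[2 * i * incx];
        float im = x[2 * i * incx + 1];
        sum += (re < 0 ? -re : re) + (im < 0 ? -im : im);
    }
    return sum;
}
```

Note that this is not the sum of complex moduli: each element contributes |Re| + |Im| rather than sqrt(Re^2 + Im^2).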
cblas_?axpy
Computes a vector-scalar product and adds the result
to a vector.
Syntax
void cblas_saxpy (const MKL_INT n, const float a, const float *x, const MKL_INT incx,
float *y, const MKL_INT incy);
void cblas_daxpy (const MKL_INT n, const double a, const double *x, const MKL_INT incx,
double *y, const MKL_INT incy);
void cblas_caxpy (const MKL_INT n, const void *a, const void *x, const MKL_INT incx,
void *y, const MKL_INT incy);
void cblas_zaxpy (const MKL_INT n, const void *a, const void *x, const MKL_INT incx,
void *y, const MKL_INT incy);
Include Files
• mkl.h
Description
y := a*x + y
where:
a is a scalar
x and y are vectors each with a number of elements that equals n.
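A minimal plain-C sketch of the real single-precision case, assuming positive strides, is shown below. The name ref_saxpy is hypothetical and the loop is for illustration only.

```c
/* y := a*x + y over n elements, reading x with stride incx and
 * updating y with stride incy (both assumed positive here). */
void ref_saxpy(int n, float a, const float *x, int incx, float *y, int incy)
{
    for (int i = 0; i < n; i++)
        y[i * incy] += a * x[i * incx];
}
```

With incx = 2, for example, only every second element of x takes part in the update.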
Input Parameters
Output Parameters
cblas_?copy
Copies a vector to another vector.
Syntax
void cblas_scopy (const MKL_INT n, const float *x, const MKL_INT incx, float *y, const
MKL_INT incy);
void cblas_dcopy (const MKL_INT n, const double *x, const MKL_INT incx, double *y,
const MKL_INT incy);
void cblas_ccopy (const MKL_INT n, const void *x, const MKL_INT incx, void *y, const
MKL_INT incy);
void cblas_zcopy (const MKL_INT n, const void *x, const MKL_INT incx, void *y, const
MKL_INT incy);
Include Files
• mkl.h
Description
y = x,
where x and y are vectors.
Input Parameters
Output Parameters
cblas_?dot
Computes a vector-vector dot product.
Syntax
float cblas_sdot (const MKL_INT n, const float *x, const MKL_INT incx, const float *y,
const MKL_INT incy);
double cblas_ddot (const MKL_INT n, const double *x, const MKL_INT incx, const double
*y, const MKL_INT incy);
Include Files
• mkl.h
Description
Input Parameters
Return Values
The result of the dot product of x and y, if n is positive. Otherwise, returns 0.
cblas_?sdot
Computes a vector-vector dot product with double
precision.
Syntax
float cblas_sdsdot (const MKL_INT n, const float sb, const float *sx, const MKL_INT
incx, const float *sy, const MKL_INT incy);
double cblas_dsdot (const MKL_INT n, const float *sx, const MKL_INT incx, const float
*sy, const MKL_INT incy);
Include Files
• mkl.h
Description
The ?sdot routines compute the inner product of two vectors with double precision. Both routines use double
precision accumulation of the intermediate results, but the sdsdot routine outputs the final result in single
precision, whereas the dsdot routine outputs the double precision result. The function sdsdot also adds
scalar value sb to the inner product.
Input Parameters
Output Parameters
res Contains the result of the dot product of sx and sy (with sb added for
sdsdot), if n is positive. Otherwise, res contains sb for sdsdot and 0 for
dsdot.
Return Values
The result of the dot product of sx and sy (with sb added for sdsdot), if n is positive. Otherwise, returns sb
for sdsdot and 0 for dsdot.
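The difference between the two routines can be sketched in plain C as follows. These are illustrative reference loops with hypothetical names (ref_dsdot, ref_sdsdot), assuming positive strides; they are not the oneMKL implementation.

```c
/* Dot product of two float vectors, accumulated and returned in double. */
double ref_dsdot(int n, const float *sx, int incx, const float *sy, int incy)
{
    double acc = 0.0;
    for (int i = 0; i < n; i++)
        acc += (double)sx[i * incx] * (double)sy[i * incy];
    return acc;   /* 0 when n is not positive */
}

/* Same accumulation in double, plus the scalar sb, rounded once to float. */
float ref_sdsdot(int n, float sb, const float *sx, int incx,
                 const float *sy, int incy)
{
    return (float)((double)sb + ref_dsdot(n, sx, incx, sy, incy));
}
```

Rounding to single precision only once, at the end, is what distinguishes sdsdot from a dot product accumulated entirely in single precision.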
cblas_?dotc
Computes a dot product of a conjugated vector with
another vector.
Syntax
void cblas_cdotc_sub (const MKL_INT n, const void *x, const MKL_INT incx, const void
*y, const MKL_INT incy, void *dotc);
void cblas_zdotc_sub (const MKL_INT n, const void *x, const MKL_INT incx, const void
*y, const MKL_INT incy, void *dotc);
Include Files
• mkl.h
Description
Input Parameters
Output Parameters
dotc Contains the result of the dot product of the conjugated x and unconjugated
y, if n is positive. Otherwise, it contains 0.
cblas_?dotu
Computes a complex vector-vector dot product.
Syntax
void cblas_cdotu_sub (const MKL_INT n, const void *x, const MKL_INT incx, const void
*y, const MKL_INT incy, void *dotu);
void cblas_zdotu_sub (const MKL_INT n, const void *x, const MKL_INT incx, const void
*y, const MKL_INT incy, void *dotu);
Include Files
• mkl.h
Description
NOTE The _sub suffix on cblas_cdotu_sub and cblas_zdotu_sub is to emphasize that these
are subroutines rather than functions (the return value is stored into the dotu pointer).
Input Parameters
Output Parameters
dotu Contains the result of the dot product of x and y, if n is positive. Otherwise,
it contains 0.
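The following plain-C sketch illustrates both the _sub calling style (the complex result is written through a void pointer) and the difference between the conjugated and unconjugated products. The function ref_cdot_sub and its conj_x flag are hypothetical and exist only for this illustration; complex values are assumed to be stored as interleaved (real, imaginary) float pairs, and strides are assumed positive.

```c
/* Writes the dot product of x and y into result[0] (real part) and
 * result[1] (imaginary part). If conj_x is nonzero, x is conjugated first
 * (as in ?dotc); otherwise it is used as is (as in ?dotu). */
void ref_cdot_sub(int n, const void *x, int incx, const void *y, int incy,
                  int conj_x, void *result)
{
    const float *xf = (const float *)x;
    const float *yf = (const float *)y;
    float re = 0.0f, im = 0.0f;

    for (int i = 0; i < n; i++) {
        float xr = xf[2 * i * incx], xi = xf[2 * i * incx + 1];
        float yr = yf[2 * i * incy], yi = yf[2 * i * incy + 1];
        if (conj_x)
            xi = -xi;                 /* conj(x) = Re(x) - i*Im(x) */
        re += xr * yr - xi * yi;
        im += xr * yi + xi * yr;
    }
    ((float *)result)[0] = re;
    ((float *)result)[1] = im;
}
```

For x = 1+2i and y = 3+4i, the unconjugated product is -5+10i and the conjugated product is 11-2i.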
cblas_?nrm2
Computes the Euclidean norm of a vector.
Syntax
float cblas_snrm2 (const MKL_INT n, const float *x, const MKL_INT incx);
double cblas_dnrm2 (const MKL_INT n, const double *x, const MKL_INT incx);
float cblas_scnrm2 (const MKL_INT n, const void *x, const MKL_INT incx);
double cblas_dznrm2 (const MKL_INT n, const void *x, const MKL_INT incx);
Include Files
• mkl.h
Description
res = ||x||,
where:
x is a vector,
res is a value containing the Euclidean norm of the elements of x.
Input Parameters
Return Values
The Euclidean norm of the vector x.
cblas_?rot
Performs rotation of points in the plane.
Syntax
void cblas_srot (const MKL_INT n, float *x, const MKL_INT incx, float *y, const MKL_INT incy,
const float c, const float s);
void cblas_drot (const MKL_INT n, double *x, const MKL_INT incx, double *y, const MKL_INT incy,
const double c, const double s);
void cblas_crot (const MKL_INT n, void *x, const MKL_INT incx, void *y, const MKL_INT incy,
const float c, const void* s);
void cblas_zrot (const MKL_INT n, void *x, const MKL_INT incx, void *y, const MKL_INT incy,
const double c, const void* s);
void cblas_csrot (const MKL_INT n, void *x, const MKL_INT incx, void *y, const MKL_INT incy,
const float c, const float s);
void cblas_zdrot (const MKL_INT n, void *x, const MKL_INT incx, void *y, const MKL_INT incy,
const double c, const double s);
Description
Given two vectors x and y, each vector element of these vectors is replaced as follows:
xi = c*xi + s*yi
yi = c*yi - s*xi
If s is a complex type, each vector element is replaced as follows:
xi = c*xi + s*yi
yi = c*yi - conj(s)*xi
In both cases, the right-hand sides use the values of xi and yi from before the update.
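A plain-C sketch of the real single-precision case is shown below. The name ref_srot is hypothetical, and positive strides are assumed. The point to notice is that both new values are computed from the old pair, which is why the old values are saved in temporaries first.

```c
/* Applies the plane rotation defined by c and s to n pairs (x[i], y[i]). */
void ref_srot(int n, float *x, int incx, float *y, int incy, float c, float s)
{
    for (int i = 0; i < n; i++) {
        float xi = x[i * incx];      /* old values, saved before overwriting */
        float yi = y[i * incy];
        x[i * incx] = c * xi + s * yi;
        y[i * incy] = c * yi - s * xi;
    }
}
```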
Input Parameters
c A scalar.
s A scalar.
Output Parameters
cblas_?rotg
Computes the parameters for a Givens rotation.
Syntax
void cblas_srotg (float *a, float *b, float *c, float *s);
void cblas_drotg (double *a, double *b, double *c, double *s);
void cblas_crotg (void *a, const void *b, float *c, void *s);
void cblas_zrotg (void *a, const void *b, double *c, void *s);
Include Files
• mkl.h
Description
Given the Cartesian coordinates (a, b) of a point, these routines return the parameters c, s, r, and z
associated with the Givens rotation. The parameters c and s define a unitary matrix such that:
[  c  s ]   [ a ]   [ r ]
[ -s  c ] * [ b ] = [ 0 ]
(for the complex flavors, the entry -s is replaced by -conj(s)).
The parameter z is defined such that if |a| > |b|, z is s; otherwise, if c is not 0, z is 1/c; otherwise, z is 1.
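The rule for z can be written directly as a small function. This sketch only encodes the selection rule stated above for the real double-precision case; it does not compute c, s, or r, and the name sketch_rotg_z is hypothetical.

```c
/* Returns the parameter z for given a, b and already computed c, s. */
double sketch_rotg_z(double a, double b, double c, double s)
{
    double abs_a = a < 0 ? -a : a;
    double abs_b = b < 0 ? -b : b;

    if (abs_a > abs_b)
        return s;          /* |a| > |b|: z is s */
    if (c != 0.0)
        return 1.0 / c;    /* otherwise, c nonzero: z is 1/c */
    return 1.0;            /* otherwise: z is 1 */
}
```

For the point (3, 4), where c = 0.6 and s = 0.8, the second branch applies and z is 1/0.6.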
Input Parameters
Output Parameters
cblas_?rotm
Performs modified Givens rotation of points in the
plane.
Syntax
void cblas_srotm (const MKL_INT n, float *x, const MKL_INT incx, float *y, const
MKL_INT incy, const float *param);
void cblas_drotm (const MKL_INT n, double *x, const MKL_INT incx, double *y, const
MKL_INT incy, const double *param);
Include Files
• mkl.h
Description
Given two vectors x and y, each vector element of these vectors is replaced as follows:
[ xi ]       [ xi ]
[ yi ] = H * [ yi ]
for i=1 to n, where H is a modified Givens transformation matrix whose values are stored in the param[1]
through param[4] array. See discussion on the param argument.
Input Parameters
flag = 0.0:  H = [  1.0  h12 ]
                 [  h21  1.0 ]
flag = 1.0:  H = [  h11  1.0 ]
                 [ -1.0  h22 ]
flag = -2.0: H = [  1.0  0.0 ]
                 [  0.0  1.0 ]
In the last three cases, the matrix entries of 1.0, -1.0, and 0.0 are assumed
based on the value of flag and are not required to be set in the param
vector.
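The way flag selects the entries of H can be sketched in plain C as follows. This is an illustrative loop with a hypothetical name (ref_srotm), assuming positive strides. Two details are assumptions taken from the reference BLAS convention rather than from the text above: the storage order param[1] = h11, param[2] = h21, param[3] = h12, param[4] = h22, and the flag = -1.0 case, in which all four entries are read from param.

```c
/* Applies the modified Givens matrix H described by param to n pairs
 * (x[i], y[i]). param[0] is flag; param[1..4] hold h11, h21, h12, h22
 * (assumed reference BLAS order). */
void ref_srotm(int n, float *x, int incx, float *y, int incy,
               const float *param)
{
    float flag = param[0];
    float h11, h12, h21, h22;

    if (flag == -2.0f)
        return;                                   /* H is the identity */

    if (flag == -1.0f) {                          /* full matrix (assumed) */
        h11 = param[1]; h21 = param[2]; h12 = param[3]; h22 = param[4];
    } else if (flag == 0.0f) {                    /* unit diagonal */
        h11 = 1.0f;     h21 = param[2]; h12 = param[3]; h22 = 1.0f;
    } else {                                      /* treated as flag = 1.0 */
        h11 = param[1]; h21 = -1.0f;    h12 = 1.0f;     h22 = param[4];
    }

    for (int i = 0; i < n; i++) {
        float xi = x[i * incx], yi = y[i * incy];
        x[i * incx] = h11 * xi + h12 * yi;
        y[i * incy] = h21 * xi + h22 * yi;
    }
}
```

Entries that are implied by flag are never read from param, which is why they do not have to be set by the caller.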
Output Parameters
cblas_?rotmg
Computes the parameters for a modified Givens
rotation.
Syntax
void cblas_srotmg (float *d1, float *d2, float *x1, const float y1, float *param);
void cblas_drotmg (double *d1, double *d2, double *x1, const double y1, double *param);
Include Files
• mkl.h
Description
Given Cartesian coordinates (x1, y1) of an input vector, these routines compute the components of a
modified Givens transformation matrix H that zeros the y-component of the resulting vector:
[ x1 ]       [ x1*sqrt(d1) ]
[ 0  ] = H * [ y1*sqrt(d2) ]
Input Parameters
d1 Provides the scaling factor for the x-coordinate of the input vector.
d2 Provides the scaling factor for the y-coordinate of the input vector.
Output Parameters
flag = 0.0:  H = [  1.0  h12 ]
                 [  h21  1.0 ]
flag = 1.0:  H = [  h11  1.0 ]
                 [ -1.0  h22 ]
flag = -2.0: H = [  1.0  0.0 ]
                 [  0.0  1.0 ]
In the last three cases, the matrix entries of 1.0, -1.0, and 0.0 are assumed
based on the value of flag and are not required to be set in the param
vector.
cblas_?scal
Computes the product of a vector by a scalar.
Syntax
void cblas_sscal (const MKL_INT n, const float a, float *x, const MKL_INT incx);
void cblas_dscal (const MKL_INT n, const double a, double *x, const MKL_INT incx);
void cblas_cscal (const MKL_INT n, const void *a, void *x, const MKL_INT incx);
void cblas_zscal (const MKL_INT n, const void *a, void *x, const MKL_INT incx);
void cblas_csscal (const MKL_INT n, const float a, void *x, const MKL_INT incx);
void cblas_zdscal (const MKL_INT n, const double a, void *x, const MKL_INT incx);
Include Files
• mkl.h
Description
x = a*x
where:
a is a scalar, x is an n-element vector.
Input Parameters
Output Parameters
x Updated vector x.
cblas_?swap
Swaps a vector with another vector.
Syntax
void cblas_sswap (const MKL_INT n, float *x, const MKL_INT incx, float *y, const
MKL_INT incy);
void cblas_dswap (const MKL_INT n, double *x, const MKL_INT incx, double *y, const
MKL_INT incy);
void cblas_cswap (const MKL_INT n, void *x, const MKL_INT incx, void *y, const MKL_INT
incy);
void cblas_zswap (const MKL_INT n, void *x, const MKL_INT incx, void *y, const MKL_INT
incy);
Include Files
• mkl.h
Description
Given two vectors x and y, the ?swap routines return vectors y and x swapped, each replacing the other.
Input Parameters
Output Parameters
cblas_i?amax
Finds the index of the element with maximum
absolute value.
Syntax
CBLAS_INDEX cblas_isamax (const MKL_INT n, const float *x, const MKL_INT incx);
CBLAS_INDEX cblas_idamax (const MKL_INT n, const double *x, const MKL_INT incx);
CBLAS_INDEX cblas_icamax (const MKL_INT n, const void *x, const MKL_INT incx);
CBLAS_INDEX cblas_izamax (const MKL_INT n, const void *x, const MKL_INT incx);
Include Files
• mkl.h
Description
Given a vector x, the i?amax functions return the position of the vector element x[i] that has the largest
absolute value for real flavors, or the largest sum |Re(x[i])|+|Im(x[i])| for complex flavors.
If more than one vector element is found with the same largest absolute value, the index of the first one
encountered is returned.
If the vector contains NaN values, then the routine returns the index of the first NaN.
Input Parameters
Return Values
Returns the position of the vector element that has the largest absolute value. The index returned is
zero-based, so for incx = 1 that element is x[index].
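The selection rule for the complex flavors, including the first-occurrence tie-break and the zero-based result, can be sketched in plain C as follows. The name ref_icamax is hypothetical, a positive stride is assumed, and the NaN behavior described above is deliberately not modeled.

```c
#include <stddef.h>

/* Returns the zero-based position of the element with the largest
 * |Re| + |Im| among n complex elements stored as interleaved float pairs.
 * Returns 0 when n is not positive. NaN inputs are not handled here. */
size_t ref_icamax(int n, const float *x, int incx)
{
    size_t best = 0;
    float best_val = -1.0f;

    for (int i = 0; i < n; i++) {
        float re = x[2 * i * incx];
        float im = x[2 * i * incx + 1];
        float val = (re < 0 ? -re : re) + (im < 0 ? -im : im);
        if (val > best_val) {     /* strict '>' keeps the first of equal maxima */
            best_val = val;
            best = (size_t)i;
        }
    }
    return best;
}
```

The returned position counts visited elements, so with incx = 2 a result of 1 refers to the third complex element of the array.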
cblas_i?amin
Finds the index of the element with the smallest
absolute value.
Syntax
CBLAS_INDEX cblas_isamin (const MKL_INT n, const float *x, const MKL_INT incx);
CBLAS_INDEX cblas_idamin (const MKL_INT n, const double *x, const MKL_INT incx);
CBLAS_INDEX cblas_icamin (const MKL_INT n, const void *x, const MKL_INT incx);
CBLAS_INDEX cblas_izamin (const MKL_INT n, const void *x, const MKL_INT incx);
Include Files
• mkl.h
Description
Given a vector x, the i?amin functions return the position of the vector element x[i] that has the smallest
absolute value for real flavors, or the smallest sum |Re(x[i])|+|Im(x[i])| for complex flavors.
If more than one vector element is found with the same smallest absolute value, the index of the first one
encountered is returned.
If the vector contains NaN values, then the routine returns the index of the first NaN.
Input Parameters
Return Values
Returns the position of the vector element that has the smallest absolute value. The index returned is
zero-based, so for incx = 1 that element is x[index].
cblas_?cabs1
Computes absolute value of complex number.
Syntax
float cblas_scabs1 (const void *z);
double cblas_dcabs1 (const void *z);
Include Files
• mkl.h
Description
The ?cabs1 routine is an auxiliary routine for a few BLAS Level 1 routines. It performs an operation
defined as
res=|Re(z)|+|Im(z)|,
where z is a complex scalar, and res is the sum of the absolute values of the real and imaginary parts of z
(not the Euclidean modulus of z).
Input Parameters
z Scalar.
Return Values
The absolute value of a complex number z.
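As a minimal sketch of the formula above, the following plain C function computes the same quantity. The name ref_dcabs1 is hypothetical, and the sketch assumes the complex number is stored as two consecutive doubles (real part first), which is how a const void * complex argument is commonly laid out.

```c
#include <math.h>

/* Hypothetical reference routine, not an MKL function.
   z points to two consecutive doubles: z[0] = Re(z), z[1] = Im(z). */
double ref_dcabs1(const double *z)
{
    return fabs(z[0]) + fabs(z[1]);
}
```

For z = 3 - 4i the sketch gives 7, whereas the Euclidean modulus of the same number is 5.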
cblas_?gbmv
Computes a matrix-vector product with a general
band matrix.
Syntax
void cblas_sgbmv (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE trans, const MKL_INT
m, const MKL_INT n, const MKL_INT kl, const MKL_INT ku, const float alpha, const float
*a, const MKL_INT lda, const float *x, const MKL_INT incx, const float beta, float *y,
const MKL_INT incy);
void cblas_dgbmv (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE trans, const MKL_INT
m, const MKL_INT n, const MKL_INT kl, const MKL_INT ku, const double alpha, const
double *a, const MKL_INT lda, const double *x, const MKL_INT incx, const double beta,
double *y, const MKL_INT incy);
void cblas_cgbmv (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE trans, const MKL_INT
m, const MKL_INT n, const MKL_INT kl, const MKL_INT ku, const void *alpha, const void
*a, const MKL_INT lda, const void *x, const MKL_INT incx, const void *beta, void *y,
const MKL_INT incy);
void cblas_zgbmv (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE trans, const MKL_INT
m, const MKL_INT n, const MKL_INT kl, const MKL_INT ku, const void *alpha, const void
*a, const MKL_INT lda, const void *x, const MKL_INT incx, const void *beta, void *y,
const MKL_INT incy);
Include Files
• mkl.h
Description
The ?gbmv routines perform a matrix-vector operation defined as
y := alpha*A*x + beta*y,
or
y := alpha*A'*x + beta*y,
or
y := alpha*conjg(A')*x + beta*y,
where:
alpha and beta are scalars,
x and y are vectors,
A is an m-by-n band matrix, with kl sub-diagonals and ku super-diagonals.
Input Parameters
incx Specifies the increment for the elements of x. incx must not be zero.
beta Specifies the scalar beta. When beta is equal to zero, then y need not be
set on input.
Output Parameters
cblas_?gemv
Computes a matrix-vector product using a general
matrix.
Syntax
void cblas_sgemv (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE trans, const MKL_INT
m, const MKL_INT n, const float alpha, const float *a, const MKL_INT lda, const float
*x, const MKL_INT incx, const float beta, float *y, const MKL_INT incy);
void cblas_dgemv (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE trans, const MKL_INT
m, const MKL_INT n, const double alpha, const double *a, const MKL_INT lda, const
double *x, const MKL_INT incx, const double beta, double *y, const MKL_INT incy);
void cblas_cgemv (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE trans, const MKL_INT
m, const MKL_INT n, const void *alpha, const void *a, const MKL_INT lda, const void *x,
const MKL_INT incx, const void *beta, void *y, const MKL_INT incy);
void cblas_zgemv (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE trans, const MKL_INT
m, const MKL_INT n, const void *alpha, const void *a, const MKL_INT lda, const void *x,
const MKL_INT incx, const void *beta, void *y, const MKL_INT incy);
Include Files
• mkl.h
Description
y := alpha*A*x + beta*y,
or
y := alpha*A'*x + beta*y,
or
y := alpha*conjg(A')*x + beta*y,
where:
alpha and beta are scalars,
x and y are vectors,
A is an m-by-n matrix.
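The first of the three operations can be written out as a plain C loop. This is only a sketch of the arithmetic, not the library implementation: the name ref_dgemv_n is hypothetical, and it assumes column-major storage, no transposition, and unit increments for x and y.

```c
#include <stddef.h>

/* Hypothetical reference routine, not an MKL function:
   y := alpha*A*x + beta*y for a column-major m-by-n matrix A. */
void ref_dgemv_n(int m, int n, double alpha, const double *a, int lda,
                 const double *x, double beta, double *y)
{
    for (int i = 0; i < m; i++) {
        double acc = 0.0;
        for (int j = 0; j < n; j++) {
            /* element A(i,j) lives at a[i + j*lda] in column-major storage */
            acc += a[(size_t)i + (size_t)j * (size_t)lda] * x[j];
        }
        /* when beta is zero the old contents of y are never read */
        y[i] = (beta == 0.0) ? alpha * acc : alpha * acc + beta * y[i];
    }
}
```

Skipping the read of y when beta is zero mirrors the rule that y need not be set on input in that case.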
Input Parameters
beta Specifies the scalar beta. When beta is set to zero, then y need not be set
on input.
Output Parameters
y Updated vector y.
cblas_?ger
Performs a rank-1 update of a general matrix.
Syntax
void cblas_sger (const CBLAS_LAYOUT Layout, const MKL_INT m, const MKL_INT n, const
float alpha, const float *x, const MKL_INT incx, const float *y, const MKL_INT incy,
float *a, const MKL_INT lda);
void cblas_dger (const CBLAS_LAYOUT Layout, const MKL_INT m, const MKL_INT n, const
double alpha, const double *x, const MKL_INT incx, const double *y, const MKL_INT incy,
double *a, const MKL_INT lda);
Include Files
• mkl.h
Description
A := alpha*x*y' + A,
where:
alpha is a scalar,
x is an m-element vector,
y is an n-element vector,
A is an m-by-n general matrix.
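The rank-1 update above adds alpha*x[i]*y[j] to every element A(i,j). A minimal sketch in plain C follows; the name ref_dger is hypothetical, and column-major storage with unit increments for x and y is assumed.

```c
#include <stddef.h>

/* Hypothetical reference routine, not an MKL function:
   A := alpha*x*y' + A for a column-major m-by-n matrix A. */
void ref_dger(int m, int n, double alpha, const double *x, const double *y,
              double *a, int lda)
{
    for (int j = 0; j < n; j++) {
        double t = alpha * y[j];        /* scale once per column */
        for (int i = 0; i < m; i++) {
            a[(size_t)i + (size_t)j * (size_t)lda] += x[i] * t;
        }
    }
}
```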
Input Parameters
Output Parameters
cblas_?gerc
Performs a rank-1 update (conjugated) of a general
matrix.
Syntax
void cblas_cgerc (const CBLAS_LAYOUT Layout, const MKL_INT m, const MKL_INT n, const
void *alpha, const void *x, const MKL_INT incx, const void *y, const MKL_INT incy, void
*a, const MKL_INT lda);
void cblas_zgerc (const CBLAS_LAYOUT Layout, const MKL_INT m, const MKL_INT n, const
void *alpha, const void *x, const MKL_INT incx, const void *y, const MKL_INT incy, void
*a, const MKL_INT lda);
Include Files
• mkl.h
Description
A := alpha*x*conjg(y') + A,
where:
alpha is a scalar,
x is an m-element vector,
y is an n-element vector,
A is an m-by-n matrix.
Input Parameters
y Array, size at least (1 + (n - 1)*abs(incy)). Before entry, the
incremented array y must contain the n-element vector y.
Output Parameters
cblas_?geru
Performs a rank-1 update (unconjugated) of a general
matrix.
Syntax
void cblas_cgeru (const CBLAS_LAYOUT Layout, const MKL_INT m, const MKL_INT n, const
void *alpha, const void *x, const MKL_INT incx, const void *y, const MKL_INT incy, void
*a, const MKL_INT lda);
void cblas_zgeru (const CBLAS_LAYOUT Layout, const MKL_INT m, const MKL_INT n, const
void *alpha, const void *x, const MKL_INT incx, const void *y, const MKL_INT incy, void
*a, const MKL_INT lda);
Include Files
• mkl.h
Description
A := alpha*x*y' + A,
where:
alpha is a scalar,
x is an m-element vector,
y is an n-element vector,
A is an m-by-n matrix.
Input Parameters
Output Parameters
cblas_?hbmv
Computes a matrix-vector product using a Hermitian
band matrix.
Syntax
void cblas_chbmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const MKL_INT k, const void *alpha, const void *a, const MKL_INT lda, const void *x,
const MKL_INT incx, const void *beta, void *y, const MKL_INT incy);
void cblas_zhbmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const MKL_INT k, const void *alpha, const void *a, const MKL_INT lda, const void *x,
const MKL_INT incx, const void *beta, void *y, const MKL_INT incy);
Include Files
• mkl.h
Description
The ?hbmv routines perform a matrix-vector operation defined as
y := alpha*A*x + beta*y,
where:
alpha and beta are scalars,
x and y are n-element vectors,
A is an n-by-n Hermitian band matrix, with k super-diagonals.
Input Parameters
uplo Specifies whether the upper or lower triangular part of the Hermitian band
matrix A is used:
If uplo = CblasUpper, then the upper triangular part of the matrix A is
used.
If uplo = CblasLower, then the lower triangular part of the matrix A is used.
n Specifies the order of the matrix A. The value of n must be at least zero.
Layout = CblasColMajor:
Before entry with uplo = CblasUpper, the leading (k + 1) by n part of
the array a must contain the upper triangular band part of the Hermitian
matrix. The matrix must be supplied column-by-column, with the leading
diagonal of the matrix in row k of the array, the first super-diagonal starting
at position 1 in row (k - 1), and so on. The top left k by k triangle of the
array a is not referenced.
The following program segment transfers the lower triangular part of a
Hermitian row-major band matrix from row-major full matrix storage
(matrix, with leading dimension ldm) to row-major band storage (a, with
leading dimension lda):
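The listing for this transfer is not reproduced here. The following is a minimal sketch, assuming the usual CBLAS row-major band convention for uplo = CblasLower: the diagonal sits in column k of each row of a, and the first sub-diagonal sits in column (k - 1). The function name pack_lower_band_rowmajor and the use of the C99 double complex type are illustrative choices, not part of the library.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical helper, not an MKL function: copies the lower triangular
   band (k sub-diagonals) of a row-major full matrix into row-major band
   storage. Requires ldm >= n and lda >= k + 1. */
void pack_lower_band_rowmajor(int n, int k,
                              const double complex *matrix, int ldm,
                              double complex *a, int lda)
{
    for (int i = 0; i < n; i++) {
        int m = k - i;                          /* column shift for row i */
        int jmin = (i - k > 0) ? i - k : 0;     /* first column inside the band */
        for (int j = jmin; j <= i; j++) {
            a[(size_t)(m + j) + (size_t)i * (size_t)lda] =
                matrix[(size_t)j + (size_t)i * (size_t)ldm];
        }
    }
}
```

Entries of a that fall outside the band (the top left k by k triangle) are left untouched by the sketch.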
Output Parameters
cblas_?hemv
Computes a matrix-vector product using a Hermitian
matrix.
Syntax
void cblas_chemv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const void *alpha, const void *a, const MKL_INT lda, const void *x, const MKL_INT incx,
const void *beta, void *y, const MKL_INT incy);
void cblas_zhemv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const void *alpha, const void *a, const MKL_INT lda, const void *x, const MKL_INT incx,
const void *beta, void *y, const MKL_INT incy);
Include Files
• mkl.h
Description
y := alpha*A*x + beta*y,
where:
alpha and beta are scalars,
x and y are n-element vectors,
A is an n-by-n Hermitian matrix.
Input Parameters
uplo Specifies whether the upper or lower triangular part of the array a is used.
n Specifies the order of the matrix A. The value of n must be at least zero.
Before entry with uplo = CblasUpper, the leading n-by-n upper triangular
part of the array a must contain the upper triangular part of the Hermitian
matrix and the strictly lower triangular part of a is not referenced. Before
entry with uplo = CblasLower, the leading n-by-n lower triangular part of
the array a must contain the lower triangular part of the Hermitian matrix
and the strictly upper triangular part of a is not referenced.
The imaginary parts of the diagonal elements need not be set and are
assumed to be zero.
beta Specifies the scalar beta. When beta is supplied as zero then y need not be
set on input.
Output Parameters
cblas_?her
Performs a rank-1 update of a Hermitian matrix.
Syntax
void cblas_cher (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const float alpha, const void *x, const MKL_INT incx, void *a, const MKL_INT lda);
void cblas_zher (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const double alpha, const void *x, const MKL_INT incx, void *a, const MKL_INT lda);
Include Files
• mkl.h
Description
A := alpha*x*conjg(x') + A,
where:
alpha is a real scalar,
x is an n-element vector,
A is an n-by-n Hermitian matrix.
Input Parameters
uplo Specifies whether the upper or lower triangular part of the array a is used.
n Specifies the order of the matrix A. The value of n must be at least zero.
Before entry with uplo = CblasUpper, the leading n-by-n upper triangular
part of the array a must contain the upper triangular part of the Hermitian
matrix and the strictly lower triangular part of a is not referenced.
Before entry with uplo = CblasLower, the leading n-by-n lower triangular
part of the array a must contain the lower triangular part of the Hermitian
matrix and the strictly upper triangular part of a is not referenced.
The imaginary parts of the diagonal elements need not be set and are
assumed to be zero.
Output Parameters
cblas_?her2
Performs a rank-2 update of a Hermitian matrix.
Syntax
void cblas_cher2 (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const void *alpha, const void *x, const MKL_INT incx, const void *y, const MKL_INT
incy, void *a, const MKL_INT lda);
void cblas_zher2 (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const void *alpha, const void *x, const MKL_INT incx, const void *y, const MKL_INT
incy, void *a, const MKL_INT lda);
Include Files
• mkl.h
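Description
The ?her2 routines perform a matrix-vector operation defined as
A := alpha*x*conjg(y') + conjg(alpha)*y*conjg(x') + A,
where:
alpha is a scalar,
x and y are n-element vectors,
A is an n-by-n Hermitian matrix.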
Input Parameters
uplo Specifies whether the upper or lower triangular part of the array a is used.
n Specifies the order of the matrix A. The value of n must be at least zero.
x Array, size at least (1 + (n - 1)*abs(incx)). Before entry, the
incremented array x must contain the n-element vector x.
Before entry with uplo = CblasUpper, the leading n-by-n upper triangular
part of the array a must contain the upper triangular part of the Hermitian
matrix and the strictly lower triangular part of a is not referenced.
Before entry with uplo = CblasLower, the leading n-by-n lower triangular
part of the array a must contain the lower triangular part of the Hermitian
matrix and the strictly upper triangular part of a is not referenced.
The imaginary parts of the diagonal elements need not be set and are
assumed to be zero.
Output Parameters
cblas_?hpmv
Computes a matrix-vector product using a Hermitian
packed matrix.
Syntax
void cblas_chpmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const void *alpha, const void *ap, const void *x, const MKL_INT incx, const void *beta,
void *y, const MKL_INT incy);
void cblas_zhpmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const void *alpha, const void *ap, const void *x, const MKL_INT incx, const void *beta,
void *y, const MKL_INT incy);
Include Files
• mkl.h
Description
y := alpha*A*x + beta*y,
where:
alpha and beta are scalars,
x and y are n-element vectors,
A is an n-by-n Hermitian matrix, supplied in packed form.
Input Parameters
uplo Specifies whether the upper or lower triangular part of the matrix A is
supplied in the packed array ap.
n Specifies the order of the matrix A. The value of n must be at least zero.
ap For Layout = CblasColMajor:
Before entry with uplo = CblasUpper, the array ap must contain the upper
triangular part of the Hermitian matrix packed sequentially, column-by-column,
so that ap[0] contains A(1,1), ap[1] and ap[2] contain A(1,2) and
A(2,2) respectively, and so on. Before entry with uplo = CblasLower, the
array ap must contain the lower triangular part of the Hermitian matrix
packed sequentially, column-by-column, so that ap[0] contains A(1,1),
ap[1] and ap[2] contain A(2,1) and A(3,1) respectively, and so on.
For Layout = CblasRowMajor:
Before entry with uplo = CblasUpper, the array ap must contain the upper
triangular part of the Hermitian matrix packed sequentially, row-by-row,
so that ap[0] contains A(1,1), ap[1] and ap[2] contain A(1,2) and A(1,3)
respectively, and so on. Before entry with uplo = CblasLower, the array ap must
contain the lower triangular part of the Hermitian matrix packed
sequentially, row-by-row, so that ap[0] contains A(1,1), ap[1] and ap[2]
contain A(2,1) and A(2,2) respectively, and so on.
The imaginary parts of the diagonal elements need not be set and are
assumed to be zero.
When beta is equal to zero then y need not be set on input.
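The four packed layouts described for ap can be summarized as one index computation. The sketch below is illustrative only: the enum and function names are hypothetical, i and j are zero-based row and column numbers, and the caller is assumed to pass a position inside the stored triangle (i <= j for upper, i >= j for lower).

```c
#include <stddef.h>

/* Hypothetical names, not MKL types or functions. */
typedef enum { PACK_COL_MAJOR, PACK_ROW_MAJOR } pack_layout;
typedef enum { PACK_UPPER, PACK_LOWER } pack_uplo;

/* Returns the position in ap of element (i, j) of an n-by-n matrix. */
size_t packed_index(pack_layout layout, pack_uplo uplo,
                    size_t n, size_t i, size_t j)
{
    if (layout == PACK_COL_MAJOR) {
        if (uplo == PACK_UPPER) {
            return i + j * (j + 1) / 2;         /* column j holds j + 1 entries */
        }
        return i + j * (2 * n - j - 1) / 2;     /* column j holds n - j entries */
    }
    if (uplo == PACK_UPPER) {
        return j + i * (2 * n - i - 1) / 2;     /* row i holds n - i entries */
    }
    return j + i * (i + 1) / 2;                 /* row i holds i + 1 entries */
}
```

With n = 3 the sketch reproduces the positions quoted in the text: for column-major upper storage, the element in row 1, column 2 (one-based) maps to ap[1] and the element in row 2, column 2 maps to ap[2].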
Output Parameters
cblas_?hpr
Performs a rank-1 update of a Hermitian packed
matrix.
Syntax
void cblas_chpr (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const float alpha, const void *x, const MKL_INT incx, void *ap);
void cblas_zhpr (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const double alpha, const void *x, const MKL_INT incx, void *ap);
Include Files
• mkl.h
Description
A := alpha*x*conjg(x') + A,
where:
alpha is a real scalar,
x is an n-element vector,
A is an n-by-n Hermitian matrix, supplied in packed form.
Input Parameters
uplo Specifies whether the upper or lower triangular part of the matrix A is
supplied in the packed array ap.
n Specifies the order of the matrix A. The value of n must be at least zero.
incx Specifies the increment for the elements of x. incx must not be zero.
ap For Layout = CblasColMajor:
Before entry with uplo = CblasUpper, the array ap must contain the upper
triangular part of the Hermitian matrix packed sequentially, column-by-column,
so that ap[0] contains A(1,1), ap[1] and ap[2] contain A(1,2) and
A(2,2) respectively, and so on.
Before entry with uplo = CblasLower, the array ap must contain the lower
triangular part of the Hermitian matrix packed sequentially, column-by-column,
so that ap[0] contains A(1,1), ap[1] and ap[2] contain A(2,1) and
A(3,1) respectively, and so on.
For Layout = CblasRowMajor:
Before entry with uplo = CblasUpper, the array ap must contain the upper
triangular part of the Hermitian matrix packed sequentially, row-by-row,
so that ap[0] contains A(1,1), ap[1] and ap[2] contain A(1,2) and A(1,3)
respectively, and so on.
Before entry with uplo = CblasLower, the array ap must contain the lower
triangular part of the Hermitian matrix packed sequentially, row-by-row, so
that ap[0] contains A(1,1), ap[1] and ap[2] contain A(2,1) and A(2,2)
respectively, and so on.
The imaginary parts of the diagonal elements need not be set and are
assumed to be zero.
Output Parameters
cblas_?hpr2
Performs a rank-2 update of a Hermitian packed
matrix.
Syntax
void cblas_chpr2 (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const void *alpha, const void *x, const MKL_INT incx, const void *y, const MKL_INT
incy, void *ap);
void cblas_zhpr2 (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const void *alpha, const void *x, const MKL_INT incx, const void *y, const MKL_INT
incy, void *ap);
Include Files
• mkl.h
Description
A := alpha*x*conjg(y') + conjg(alpha)*y*conjg(x') + A,
where:
alpha is a scalar,
x and y are n-element vectors,
A is an n-by-n Hermitian matrix, supplied in packed form.
Input Parameters
uplo Specifies whether the upper or lower triangular part of the matrix A is
supplied in the packed array ap.
n Specifies the order of the matrix A. The value of n must be at least zero.
ap For Layout = CblasColMajor:
Before entry with uplo = CblasUpper, the array ap must contain the upper
triangular part of the Hermitian matrix packed sequentially, column-by-column,
so that ap[0] contains A(1,1), ap[1] and ap[2] contain A(1,2) and
A(2,2) respectively, and so on.
Before entry with uplo = CblasLower, the array ap must contain the lower
triangular part of the Hermitian matrix packed sequentially, column-by-column,
so that ap[0] contains A(1,1), ap[1] and ap[2] contain A(2,1) and
A(3,1) respectively, and so on.
For Layout = CblasRowMajor:
Before entry with uplo = CblasUpper, the array ap must contain the upper
triangular part of the Hermitian matrix packed sequentially, row-by-row,
so that ap[0] contains A(1,1), ap[1] and ap[2] contain A(1,2) and A(1,3)
respectively, and so on.
Before entry with uplo = CblasLower, the array ap must contain the lower
triangular part of the Hermitian matrix packed sequentially, row-by-row, so
that ap[0] contains A(1,1), ap[1] and ap[2] contain A(2,1) and A(2,2)
respectively, and so on.
The imaginary parts of the diagonal elements need not be set and are
assumed to be zero.
Output Parameters
cblas_?sbmv
Computes a matrix-vector product with a symmetric
band matrix.
Syntax
void cblas_ssbmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const MKL_INT k, const float alpha, const float *a, const MKL_INT lda, const float *x,
const MKL_INT incx, const float beta, float *y, const MKL_INT incy);
void cblas_dsbmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const MKL_INT k, const double alpha, const double *a, const MKL_INT lda, const double
*x, const MKL_INT incx, const double beta, double *y, const MKL_INT incy);
Include Files
• mkl.h
Description
y := alpha*A*x + beta*y,
where:
alpha and beta are scalars,
x and y are n-element vectors,
A is an n-by-n symmetric band matrix, with k super-diagonals.
Input Parameters
uplo Specifies whether the upper or lower triangular part of the band matrix A is
used:
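If uplo = CblasUpper, then the upper triangular part of the matrix A is used.
If uplo = CblasLower, then the lower triangular part of the matrix A is used.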
n Specifies the order of the matrix A. The value of n must be at least zero.
a Array, size lda*n. Before entry with uplo = CblasUpper, the leading (k +
1) by n part of the array a must contain the upper triangular band part of
the symmetric matrix, supplied column-by-column, with the leading
diagonal of the matrix in row k of the array, the first super-diagonal starting
at position 1 in row (k - 1), and so on. The top left k by k triangle of the
array a is not referenced.
The following program segment transfers the upper triangular part of a
symmetric band matrix from conventional full matrix storage (matrix, with
leading dimension ldm) to band storage (a, with leading dimension lda):
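The listing for this transfer is not reproduced here. A minimal sketch that follows the layout just described (diagonal in row k of a, first super-diagonal in row (k - 1), column-major storage) could look as follows. The function name pack_upper_band_colmajor is hypothetical and is not part of the library.

```c
#include <stddef.h>

/* Hypothetical helper, not an MKL function: copies the upper triangular
   band (k super-diagonals) of a column-major full matrix into column-major
   band storage. Requires ldm >= n and lda >= k + 1. */
void pack_upper_band_colmajor(int n, int k,
                              const double *matrix, int ldm,
                              double *a, int lda)
{
    for (int j = 0; j < n; j++) {
        int m = k - j;                          /* row shift for column j */
        int imin = (j - k > 0) ? j - k : 0;     /* first row inside the band */
        for (int i = imin; i <= j; i++) {
            a[(size_t)(m + i) + (size_t)j * (size_t)lda] =
                matrix[(size_t)i + (size_t)j * (size_t)ldm];
        }
    }
}
```

The top left k by k triangle of a is never written by the sketch, which matches the statement that this part of the array is not referenced.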
Output Parameters
cblas_?spmv
Computes a matrix-vector product with a symmetric
packed matrix.
Syntax
void cblas_sspmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const float alpha, const float *ap, const float *x, const MKL_INT incx, const float
beta, float *y, const MKL_INT incy);
void cblas_dspmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const double alpha, const double *ap, const double *x, const MKL_INT incx, const double
beta, double *y, const MKL_INT incy);
Include Files
• mkl.h
Description
y := alpha*A*x + beta*y,
where:
alpha and beta are scalars,
x and y are n-element vectors,
A is an n-by-n symmetric matrix, supplied in packed form.
Input Parameters
uplo Specifies whether the upper or lower triangular part of the matrix A is
supplied in the packed array ap.
n Specifies the order of the matrix A. The value of n must be at least zero.
ap For Layout = CblasColMajor:
Before entry with uplo = CblasUpper, the array ap must contain the upper
triangular part of the symmetric matrix packed sequentially, column-by-column,
so that ap[0] contains A(1,1), ap[1] and ap[2] contain A(1,2) and
A(2,2) respectively, and so on. Before entry with uplo = CblasLower, the
array ap must contain the lower triangular part of the symmetric matrix
packed sequentially, column-by-column, so that ap[0] contains A(1,1),
ap[1] and ap[2] contain A(2,1) and A(3,1) respectively, and so on.
For Layout = CblasRowMajor:
Before entry with uplo = CblasUpper, the array ap must contain the upper
triangular part of the symmetric matrix packed sequentially, row-by-row,
so that ap[0] contains A(1,1), ap[1] and ap[2] contain A(1,2) and A(1,3)
respectively, and so on. Before entry with uplo = CblasLower, the array ap must
contain the lower triangular part of the symmetric matrix packed
sequentially, row-by-row, so that ap[0] contains A(1,1), ap[1] and ap[2]
contain A(2,1) and A(2,2) respectively, and so on.
Output Parameters
cblas_?spr
Performs a rank-1 update of a symmetric packed
matrix.
Syntax
void cblas_sspr (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const float alpha, const float *x, const MKL_INT incx, float *ap);
void cblas_dspr (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const double alpha, const double *x, const MKL_INT incx, double *ap);
Include Files
• mkl.h
Description
A := alpha*x*x' + A,
where:
alpha is a real scalar,
x is an n-element vector,
A is an n-by-n symmetric matrix, supplied in packed form.
Input Parameters
uplo Specifies whether the upper or lower triangular part of the matrix A is
supplied in the packed array ap.
n Specifies the order of the matrix A. The value of n must be at least zero.
ap For Layout = CblasColMajor:
Before entry with uplo = CblasUpper, the array ap must contain the upper
triangular part of the symmetric matrix packed sequentially, column-by-column,
so that ap[0] contains A(1,1), ap[1] and ap[2] contain A(1,2) and
A(2,2) respectively, and so on.
Before entry with uplo = CblasLower, the array ap must contain the lower
triangular part of the symmetric matrix packed sequentially, column-by-column,
so that ap[0] contains A(1,1), ap[1] and ap[2] contain A(2,1) and
A(3,1) respectively, and so on.
For Layout = CblasRowMajor:
Before entry with uplo = CblasUpper, the array ap must contain the upper
triangular part of the symmetric matrix packed sequentially, row-by-row,
so that ap[0] contains A(1,1), ap[1] and ap[2] contain A(1,2) and A(1,3)
respectively, and so on.
Before entry with uplo = CblasLower, the array ap must contain the lower
triangular part of the symmetric matrix packed sequentially, row-by-row, so
that ap[0] contains A(1,1), ap[1] and ap[2] contain A(2,1) and A(2,2)
respectively, and so on.
Output Parameters
cblas_?spr2
Computes a rank-2 update of a symmetric packed
matrix.
Syntax
void cblas_sspr2 (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const float alpha, const float *x, const MKL_INT incx, const float *y, const MKL_INT
incy, float *ap);
void cblas_dspr2 (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const double alpha, const double *x, const MKL_INT incx, const double *y, const MKL_INT
incy, double *ap);
Include Files
• mkl.h
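Description
The ?spr2 routines perform a matrix-vector operation defined as
A := alpha*x*y' + alpha*y*x' + A,
where:
alpha is a scalar,
x and y are n-element vectors,
A is an n-by-n symmetric matrix, supplied in packed form.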
Input Parameters
uplo Specifies whether the upper or lower triangular part of the matrix A is
supplied in the packed array ap.
n Specifies the order of the matrix A. The value of n must be at least zero.
incy Specifies the increment for the elements of y. The value of incy must not be
zero.
ap For Layout = CblasColMajor:
Before entry with uplo = CblasUpper, the array ap must contain the upper
triangular part of the symmetric matrix packed sequentially, column-by-column,
so that ap[0] contains A(1,1), ap[1] and ap[2] contain A(1,2) and
A(2,2) respectively, and so on.
Before entry with uplo = CblasLower, the array ap must contain the lower
triangular part of the symmetric matrix packed sequentially, column-by-column,
so that ap[0] contains A(1,1), ap[1] and ap[2] contain A(2,1) and
A(3,1) respectively, and so on.
For Layout = CblasRowMajor:
Before entry with uplo = CblasUpper, the array ap must contain the upper
triangular part of the symmetric matrix packed sequentially, row-by-row,
so that ap[0] contains A(1,1), ap[1] and ap[2] contain A(1,2) and A(1,3)
respectively, and so on.
Before entry with uplo = CblasLower, the array ap must contain the lower
triangular part of the symmetric matrix packed sequentially, row-by-row, so
that ap[0] contains A(1,1), ap[1] and ap[2] contain A(2,1) and A(2,2)
respectively, and so on.
Output Parameters
cblas_?symv
Computes a matrix-vector product for a symmetric
matrix.
Syntax
void cblas_ssymv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const float alpha, const float *a, const MKL_INT lda, const float *x, const MKL_INT
incx, const float beta, float *y, const MKL_INT incy);
void cblas_dsymv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const double alpha, const double *a, const MKL_INT lda, const double *x, const MKL_INT
incx, const double beta, double *y, const MKL_INT incy);
Include Files
• mkl.h
Description
y := alpha*A*x + beta*y,
where:
alpha and beta are scalars,
x and y are n-element vectors,
A is an n-by-n symmetric matrix.
Input Parameters
uplo Specifies whether the upper or lower triangular part of the array a is used.
n Specifies the order of the matrix A. The value of n must be at least zero.
Before entry with uplo = CblasUpper, the leading n-by-n upper triangular
part of the array a must contain the upper triangular part of the symmetric
matrix A and the strictly lower triangular part of a is not referenced. Before
entry with uplo = CblasLower, the leading n-by-n lower triangular part of
the array a must contain the lower triangular part of the symmetric matrix
A and the strictly upper triangular part of a is not referenced.
Output Parameters
cblas_?syr
Performs a rank-1 update of a symmetric matrix.
Syntax
void cblas_ssyr (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const float alpha, const float *x, const MKL_INT incx, float *a, const MKL_INT lda);
void cblas_dsyr (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const double alpha, const double *x, const MKL_INT incx, double *a, const MKL_INT lda);
Include Files
• mkl.h
Description
A := alpha*x*x' + A,
where:
alpha is a real scalar,
x is an n-element vector,
A is an n-by-n symmetric matrix.
Input Parameters
uplo Specifies whether the upper or lower triangular part of the array a is used.
n Specifies the order of the matrix A. The value of n must be at least zero.
Before entry with uplo = CblasUpper, the leading n-by-n upper triangular
part of the array a must contain the upper triangular part of the symmetric
matrix A and the strictly lower triangular part of a is not referenced.
Before entry with uplo = CblasLower, the leading n-by-n lower triangular
part of the array a must contain the lower triangular part of the symmetric
matrix A and the strictly upper triangular part of a is not referenced.
Output Parameters
cblas_?syr2
Performs a rank-2 update of a symmetric matrix.
Syntax
void cblas_ssyr2 (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const float alpha, const float *x, const MKL_INT incx, const float *y, const MKL_INT
incy, float *a, const MKL_INT lda);
void cblas_dsyr2 (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const double alpha, const double *x, const MKL_INT incx, const double *y, const MKL_INT
incy, double *a, const MKL_INT lda);
Include Files
• mkl.h
Description
A := alpha*x*y' + alpha*y*x' + A,
where:
alpha is a scalar,
x and y are n-element vectors,
A is an n-by-n symmetric matrix.
Input Parameters
uplo Specifies whether the upper or lower triangular part of the array a is used.
n Specifies the order of the matrix A. The value of n must be at least zero.
incy Specifies the increment for the elements of y. The value of incy must not be
zero.
Before entry with uplo = CblasUpper, the leading n-by-n upper triangular
part of the array a must contain the upper triangular part of the symmetric
matrix and the strictly lower triangular part of a is not referenced.
Before entry with uplo = CblasLower, the leading n-by-n lower triangular
part of the array a must contain the lower triangular part of the symmetric
matrix and the strictly upper triangular part of a is not referenced.
Output Parameters
cblas_?tbmv
Computes a matrix-vector product using a triangular
band matrix.
Syntax
void cblas_stbmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const MKL_INT k, const
float *a, const MKL_INT lda, float *x, const MKL_INT incx);
void cblas_dtbmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const MKL_INT k, const
double *a, const MKL_INT lda, double *x, const MKL_INT incx);
void cblas_ctbmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const MKL_INT k, const
void *a, const MKL_INT lda, void *x, const MKL_INT incx);
void cblas_ztbmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const MKL_INT k, const
void *a, const MKL_INT lda, void *x, const MKL_INT incx);
Include Files
• mkl.h
Description
The ?tbmv routines perform a matrix-vector operation defined as
x := A*x, or x := A'*x, or x := conjg(A')*x,
where:
x is an n-element vector,
A is an n-by-n unit, or non-unit, upper or lower triangular band matrix, with (k +1) diagonals.
Input Parameters
n Specifies the order of the matrix A. The value of n must be at least zero.
Layout = CblasColMajor:
Before entry with uplo = CblasUpper, the leading (k + 1) by n part of
the array a must contain the upper triangular band part of the matrix of
coefficients, supplied column-by-column, with the leading diagonal of the
matrix in row k of the array, the first super-diagonal starting at position 1 in
row (k - 1), and so on. The top left k by k triangle of the array a is not
referenced. The following program segment transfers an upper triangular
band matrix from conventional full matrix storage (matrix, with leading
dimension ldm) to band storage (a, with leading dimension lda):
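A minimal sketch of such a transfer in plain C, assuming 0-based indices, column-major storage, and double-precision data. The function name full_to_band_upper is hypothetical and is not part of oneMKL; int stands in for MKL_INT.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a oneMKL routine.
 * Copies the upper triangular band (k super-diagonals) of an n-by-n
 * column-major matrix into band storage a with lda >= k + 1.
 * Element A(i, j) of the band lands in a[(k - j + i) + j*lda], so the
 * main diagonal occupies row k and the first super-diagonal row k - 1. */
void full_to_band_upper(int n, int k, const double *matrix, int ldm,
                        double *a, int lda)
{
    assert(n >= 0 && k >= 0 && ldm >= n && lda >= k + 1);
    for (int j = 0; j < n; j++) {
        int m = k - j;                      /* row shift for column j */
        int first = (j > k) ? j - k : 0;    /* first row inside the band */
        for (int i = first; i <= j; i++)
            a[(size_t)(m + i) + (size_t)j * lda] =
                matrix[(size_t)i + (size_t)j * ldm];
    }
}
```

Entries in the top left k by k triangle of a are never written by this loop, which matches the statement that they are not referenced.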
referenced. The following program segment transfers a lower triangular
band matrix from conventional full matrix storage (matrix, with leading
dimension ldm) to band storage (a, with leading dimension lda):
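A minimal sketch of the lower triangular transfer in plain C, under the same assumptions (0-based indices, column-major storage, double-precision data). The name full_to_band_lower is hypothetical and is not part of oneMKL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a oneMKL routine.
 * Copies the lower triangular band (k sub-diagonals) of an n-by-n
 * column-major matrix into band storage a with lda >= k + 1.
 * Element A(i, j) of the band lands in a[(i - j) + j*lda], so the
 * main diagonal occupies row 0 and the first sub-diagonal row 1. */
void full_to_band_lower(int n, int k, const double *matrix, int ldm,
                        double *a, int lda)
{
    assert(n >= 0 && k >= 0 && ldm >= n && lda >= k + 1);
    for (int j = 0; j < n; j++) {
        int last = (j + k < n - 1) ? j + k : n - 1;  /* last row in band */
        for (int i = j; i <= last; i++)
            a[(size_t)(i - j) + (size_t)j * lda] =
                matrix[(size_t)i + (size_t)j * ldm];
    }
}
```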
Output Parameters
cblas_?tbsv
Solves a system of linear equations whose coefficients
are in a triangular band matrix.
Syntax
void cblas_stbsv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const MKL_INT k, const
float *a, const MKL_INT lda, float *x, const MKL_INT incx);
void cblas_dtbsv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const MKL_INT k, const
double *a, const MKL_INT lda, double *x, const MKL_INT incx);
void cblas_ctbsv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const MKL_INT k, const
void *a, const MKL_INT lda, void *x, const MKL_INT incx);
void cblas_ztbsv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const MKL_INT k, const
void *a, const MKL_INT lda, void *x, const MKL_INT incx);
Include Files
• mkl.h
Description
Input Parameters
trans Specifies the system of equations:
if trans=CblasNoTrans, then A*x = b;
n Specifies the order of the matrix A. The value of n must be at least zero.
Layout = CblasColMajor:
Before entry with uplo = CblasUpper, the leading (k + 1) by n part of
the array a must contain the upper triangular band part of the matrix of
coefficients, supplied column-by-column, with the leading diagonal of the
matrix in row k of the array, the first super-diagonal starting at position 1 in
row (k - 1), and so on. The top left k by k triangle of the array a is not
referenced.
The following program segment transfers an upper triangular band matrix
from conventional full matrix storage (matrix, with leading dimension ldm)
to band storage (a, with leading dimension lda):
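A sketch of the transfer in plain C, paired with a reference back substitution that shows how the band array is read when solving A*x = b. It assumes 0-based indices, column-major storage, a non-unit diagonal, unit stride in x, and double-precision data. Both function names are hypothetical; this is not how oneMKL implements cblas_?tbsv.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers, not oneMKL routines; 0-based, column-major. */

/* Band transfer: A(i, j) goes to a[(k - j + i) + j*lda]. */
void band_upper_from_full(int n, int k, const double *matrix, int ldm,
                          double *a, int lda)
{
    assert(n >= 0 && k >= 0 && ldm >= n && lda >= k + 1);
    for (int j = 0; j < n; j++) {
        int first = (j > k) ? j - k : 0;
        for (int i = first; i <= j; i++)
            a[(size_t)(k - j + i) + (size_t)j * lda] =
                matrix[(size_t)i + (size_t)j * ldm];
    }
}

/* Back substitution for A*x = b with a non-unit upper band matrix.
 * On entry x holds b; on exit it holds the solution. */
void band_upper_solve(int n, int k, const double *a, int lda, double *x)
{
    assert(n >= 0 && k >= 0 && lda >= k + 1);
    for (int j = n - 1; j >= 0; j--) {
        const double *col = a + (size_t)j * lda;
        int first = (j > k) ? j - k : 0;
        x[j] /= col[k];                      /* A(j, j) sits in row k */
        for (int i = first; i < j; i++)
            x[i] -= x[j] * col[k - j + i];   /* remove A(i, j)*x[j] */
    }
}
```

Only the k + 1 stored diagonals are touched, so the work per column is bounded by k rather than by n.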
Output Parameters
cblas_?tpmv
Computes a matrix-vector product using a triangular
packed matrix.
Syntax
void cblas_stpmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const float *ap, float
*x, const MKL_INT incx);
void cblas_dtpmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const double *ap, double
*x, const MKL_INT incx);
void cblas_ctpmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const void *ap, void *x,
const MKL_INT incx);
void cblas_ztpmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const void *ap, void *x,
const MKL_INT incx);
Include Files
• mkl.h
Description
Input Parameters
n Specifies the order of the matrix A. The value of n must be at least zero.
Before entry with uplo = CblasUpper, the array ap must contain the upper
triangular matrix packed sequentially, column-by-column, so that ap[0]
contains A1, 1, ap[1] and ap[2] contain A1, 2 and A2, 2 respectively, and
so on. Before entry with uplo = CblasLower, the array ap must contain the
lower triangular matrix packed sequentially, column-by-column, so that
ap[0] contains A1, 1, ap[1] and ap[2] contain A2, 1 and A3, 1
respectively, and so on. When diag = CblasUnit, the diagonal elements of
A are not referenced, but are assumed to be unity.
For Layout = CblasRowMajor:
Before entry with uplo = CblasUpper, the array ap must contain the upper
triangular matrix packed sequentially, row-by-row, so that ap[0] contains A1, 1,
ap[1] and ap[2] contain A1, 2 and A1, 3 respectively, and so on.
Before entry with uplo = CblasLower, the array ap must contain the lower
triangular matrix packed sequentially, row-by-row, so that ap[0] contains
A1, 1, ap[1] and ap[2] contain A2, 1 and A2, 2 respectively, and so on.
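The four packing orders described above can be summarized as index formulas. The sketch below uses 0-based indices, so A1, 1 in the text corresponds to (0, 0). The helper names are hypothetical and are not part of oneMKL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers, not oneMKL routines. Each returns the position in
 * ap of element (i, j) of an n-by-n triangular matrix, 0-based. */

/* Layout = CblasColMajor, uplo = CblasUpper (requires i <= j). */
size_t packed_col_upper(size_t i, size_t j)
{
    assert(i <= j);
    return i + j * (j + 1) / 2;
}

/* Layout = CblasColMajor, uplo = CblasLower (requires j <= i < n). */
size_t packed_col_lower(size_t n, size_t i, size_t j)
{
    assert(j <= i && i < n);
    return i + j * (2 * n - j - 1) / 2;
}

/* Layout = CblasRowMajor, uplo = CblasUpper (requires i <= j < n). */
size_t packed_row_upper(size_t n, size_t i, size_t j)
{
    assert(i <= j && j < n);
    return j + i * (2 * n - i - 1) / 2;
}

/* Layout = CblasRowMajor, uplo = CblasLower (requires j <= i). */
size_t packed_row_lower(size_t i, size_t j)
{
    assert(j <= i);
    return j + i * (i + 1) / 2;
}
```

Note that the row-major upper formula is the column-major lower formula with i and j swapped, and likewise for the other pair.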
Output Parameters
cblas_?tpsv
Solves a system of linear equations whose coefficients
are in a triangular packed matrix.
Syntax
void cblas_stpsv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const float *ap, float
*x, const MKL_INT incx);
void cblas_dtpsv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const double *ap, double
*x, const MKL_INT incx);
void cblas_ctpsv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const void *ap, void *x,
const MKL_INT incx);
void cblas_ztpsv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const void *ap, void *x,
const MKL_INT incx);
Include Files
• mkl.h
Description
Input Parameters
n Specifies the order of the matrix A. The value of n must be at least zero.
Before entry with uplo = CblasUpper, the array ap must contain the upper
triangular part of the triangular matrix packed sequentially, column-by-
column, so that ap[0] contains A1, 1, ap[1] and ap[2] contain A1, 2 and
A2, 2 respectively, and so on.
Before entry with uplo = CblasLower, the array ap must contain the lower
triangular part of the triangular matrix packed sequentially, column-by-
column, so that ap[0] contains A1, 1, ap[1] and ap[2] contain A2, 1 and
A3, 1 respectively, and so on.
For Layout = CblasRowMajor:
Before entry with uplo = CblasUpper, the array ap must contain the upper
triangular part of the triangular matrix packed sequentially, row-by-row, so that
ap[0] contains A1, 1, ap[1] and ap[2] contain A1, 2 and A1, 3 respectively,
and so on. Before entry with uplo = CblasLower, the array ap must
contain the lower triangular part of the triangular matrix packed
sequentially, row-by-row, so that ap[0] contains A1, 1, ap[1] and ap[2]
contain A2, 1 and A2, 2 respectively, and so on.
When diag = CblasUnit, the diagonal elements of A are not referenced,
but are assumed to be unity.
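As an illustration of how the packed array and the diag setting interact, the following sketch performs forward substitution for A*x = b with a lower triangular matrix packed column-by-column. It assumes 0-based indices, unit stride in x, and double-precision data. The name packed_lower_solve and the unit_diag flag are hypothetical; this is not the oneMKL implementation of cblas_?tpsv.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a oneMKL routine.
 * Solves A*x = b where A is n-by-n lower triangular, packed
 * column-by-column in ap. On entry x holds b; on exit, the solution.
 * If unit_diag is nonzero, the stored diagonal is skipped and treated
 * as 1, mirroring diag = CblasUnit. */
void packed_lower_solve(int n, const double *ap, int unit_diag, double *x)
{
    assert(n >= 0);
    size_t col = 0;                       /* position of A(j, j) in ap */
    for (int j = 0; j < n; j++) {
        if (!unit_diag)
            x[j] /= ap[col];
        for (int i = j + 1; i < n; i++)   /* column j holds n - j entries */
            x[i] -= x[j] * ap[col + (size_t)(i - j)];
        col += (size_t)(n - j);
    }
}
```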
Output Parameters
cblas_?trmv
Computes a matrix-vector product using a triangular
matrix.
Syntax
void cblas_strmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const float *a, const
MKL_INT lda, float *x, const MKL_INT incx);
void cblas_dtrmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const double *a, const
MKL_INT lda, double *x, const MKL_INT incx);
void cblas_ctrmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const void *a, const
MKL_INT lda, void *x, const MKL_INT incx);
void cblas_ztrmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const void *a, const
MKL_INT lda, void *x, const MKL_INT incx);
Include Files
• mkl.h
Description
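The ?trmv routines perform one of the following matrix-vector operations:
x := A*x, or x := A'*x, or x := conjg(A')*x,
where x is an n-element vector and A is an n-by-n unit, or non-unit, upper or lower triangular matrix.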
Input Parameters
n Specifies the order of the matrix A. The value of n must be at least zero.
a Array, size lda*n. Before entry with uplo = CblasUpper, the leading n-by-
n upper triangular part of the array a must contain the upper triangular
matrix and the strictly lower triangular part of a is not referenced. Before
entry with uplo = CblasLower, the leading n-by-n lower triangular part of
the array a must contain the lower triangular matrix and the strictly upper
triangular part of a is not referenced.
When diag = CblasUnit, the diagonal elements of a are not referenced
either, but are assumed to be unity.
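The sketch below illustrates which entries of a are read for uplo = CblasUpper without transposition, and how diag = CblasUnit changes the result. It assumes 0-based indices, column-major storage, unit stride in x, and double-precision data. The name tri_upper_matvec and the unit_diag flag are hypothetical; this is not the oneMKL implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a oneMKL routine.
 * Computes x := A*x in place for an n-by-n upper triangular matrix
 * stored column-major in a. The strictly lower part of a is never read;
 * with unit_diag nonzero the diagonal is not read either. */
void tri_upper_matvec(int n, const double *a, int lda, int unit_diag,
                      double *x)
{
    assert(n >= 0 && lda >= n);
    for (int j = 0; j < n; j++) {
        const double *col = a + (size_t)j * lda;
        double t = x[j];                 /* x[j] is still the input value */
        for (int i = 0; i < j; i++)
            x[i] += t * col[i];          /* add A(i, j)*x[j] to row i */
        if (!unit_diag)
            x[j] *= col[j];              /* scale by A(j, j) */
    }
}
```

Processing columns in ascending order lets the update run in place: x[j] is read before any later column adds to it.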
Output Parameters
cblas_?trsv
Solves a system of linear equations whose coefficients
are in a triangular matrix.
Syntax
void cblas_strsv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const float *a, const
MKL_INT lda, float *x, const MKL_INT incx);
Include Files
• mkl.h
Description
Input Parameters
n Specifies the order of the matrix A. The value of n must be at least zero.
a Array, size lda*n. Before entry with uplo = CblasUpper, the leading n-by-n
upper triangular part of the array a must contain the upper triangular
matrix and the strictly lower triangular part of a is not referenced. Before
entry with uplo = CblasLower, the leading n-by-n lower triangular part of
the array a must contain the lower triangular matrix and the strictly upper
triangular part of a is not referenced.
When diag = CblasUnit, the diagonal elements of a are not referenced
either, but are assumed to be unity.
Output Parameters
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/
PerformanceIndex.
Notice revision #20201201
cblas_?gemm
Computes a matrix-matrix product with general
matrices.
Syntax
void cblas_hgemm (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE transa, const
CBLAS_TRANSPOSE transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const
MKL_F16 alpha, const MKL_F16 *a, const MKL_INT lda, const MKL_F16 *b, const MKL_INT
ldb, const MKL_F16 beta, MKL_F16 *c, const MKL_INT ldc);
void cblas_sgemm (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE transa, const
CBLAS_TRANSPOSE transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const float
alpha, const float *a, const MKL_INT lda, const float *b, const MKL_INT ldb, const
float beta, float *c, const MKL_INT ldc);
void cblas_dgemm (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE transa, const
CBLAS_TRANSPOSE transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const double
alpha, const double *a, const MKL_INT lda, const double *b, const MKL_INT ldb, const
double beta, double *c, const MKL_INT ldc);
void cblas_cgemm (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE transa, const
CBLAS_TRANSPOSE transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const void
*alpha, const void *a, const MKL_INT lda, const void *b, const MKL_INT ldb, const void
*beta, void *c, const MKL_INT ldc);
void cblas_zgemm (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE transa, const
CBLAS_TRANSPOSE transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const void
*alpha, const void *a, const MKL_INT lda, const void *b, const MKL_INT ldb, const void
*beta, void *c, const MKL_INT ldc);
Include Files
• mkl.h
Description
The ?gemm routines compute a scalar-matrix-matrix product and add the result to a scalar-matrix product,
with general matrices. The operation is defined as
C := alpha*op(A)*op(B) + beta*C
where:
op(X) is one of op(X) = X, or op(X) = X^T, or op(X) = X^H,
alpha and beta are scalars,
A, B and C are matrices:
op(A) is an m-by-k matrix,
op(B) is a k-by-n matrix,
C is an m-by-n matrix.
See also:
• ?gemm3m, BLAS-like extension routines, that use matrix multiplication for similar matrix-matrix operations
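To make the operation concrete, here is a reference sketch of C := alpha*A*B + beta*C for the case op(A) = A, op(B) = B with column-major storage. It is a plain triple loop for illustration only and says nothing about how oneMKL computes the product. The name ref_gemm_colmajor is hypothetical and int stands in for MKL_INT.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference loop, not a oneMKL routine.
 * A is m-by-k, B is k-by-n, C is m-by-n, all column-major.
 * When beta is zero, c is written but never read. */
void ref_gemm_colmajor(int m, int n, int k, double alpha,
                       const double *a, int lda,
                       const double *b, int ldb,
                       double beta, double *c, int ldc)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    assert(lda >= m && ldb >= k && ldc >= m);
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < m; i++) {
            double acc = 0.0;
            for (int p = 0; p < k; p++)
                acc += a[(size_t)i + (size_t)p * lda] *
                       b[(size_t)p + (size_t)j * ldb];
            double *cij = &c[(size_t)i + (size_t)j * ldc];
            *cij = (beta == 0.0) ? alpha * acc
                                 : alpha * acc + beta * *cij;
        }
    }
}
```

Such a loop can serve as a small-size cross-check for results obtained from cblas_dgemm with Layout = CblasColMajor and both transpose arguments set to CblasNoTrans.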
Input Parameters
m Specifies the number of rows of the matrix op(A) and of the matrix C. The
value of m must be at least zero.
n Specifies the number of columns of the matrix op(B) and the number of
columns of the matrix C. The value of n must be at least zero.
k Specifies the number of columns of the matrix op(A) and the number of
rows of the matrix op(B). The value of k must be at least zero.
a
transa=CblasNoTrans transa=CblasTrans or
transa=CblasConjTrans
transa=CblasNoTrans transa=CblasTrans or
transa=CblasConjTrans
b
transb=CblasNoTrans transb=CblasTrans or
transb=CblasConjTrans
transb=CblasNoTrans transb=CblasTrans or
transb=CblasConjTrans
beta Specifies the scalar beta. When beta is equal to zero, then c need not be
set on input.
c
Layout = CblasColMajor: Array, size ldc by n. Before entry, the leading
m-by-n part of the array c must contain the matrix C, except when beta
is equal to zero, in which case c need not be set on entry.
ldc Specifies the leading dimension of c as declared in the calling
(sub)program.
Output Parameters
Example
For examples of routine usage, see these code examples in the Intel® oneAPI Math Kernel Library (oneMKL)
installation directory:
• cblas_hgemm: examples\cblas\source\cblas_hgemmx.c
• cblas_sgemm: examples\cblas\source\cblas_sgemmx.c
• cblas_dgemm: examples\cblas\source\cblas_dgemmx.c
• cblas_cgemm: examples\cblas\source\cblas_cgemmx.c
• cblas_zgemm: examples\cblas\source\cblas_zgemmx.c
cblas_?hemm
Computes a matrix-matrix product where one input
matrix is Hermitian.
Syntax
void cblas_chemm (const CBLAS_LAYOUT Layout, const CBLAS_SIDE side, const CBLAS_UPLO
uplo, const MKL_INT m, const MKL_INT n, const void *alpha, const void *a, const MKL_INT
lda, const void *b, const MKL_INT ldb, const void *beta, void *c, const MKL_INT ldc);
void cblas_zhemm (const CBLAS_LAYOUT Layout, const CBLAS_SIDE side, const CBLAS_UPLO
uplo, const MKL_INT m, const MKL_INT n, const void *alpha, const void *a, const MKL_INT
lda, const void *b, const MKL_INT ldb, const void *beta, void *c, const MKL_INT ldc);
Include Files
• mkl.h
Description
The ?hemm routines compute a scalar-matrix-matrix product using a Hermitian matrix A and a general matrix
B and add the result to a scalar-matrix product using a general matrix C. The operation is defined as
C := alpha*A*B + beta*C
or
C := alpha*B*A + beta*C
where:
alpha and beta are scalars,
A is a Hermitian matrix,
B and C are m-by-n matrices.
Input Parameters
side Specifies whether the Hermitian matrix A appears on the left or right in the
operation as follows:
if side = CblasLeft, then C := alpha*A*B + beta*C;
uplo Specifies whether the upper or lower triangular part of the Hermitian matrix
A is used:
If uplo = CblasUpper, then the upper triangular part of the Hermitian
matrix A is used.
If uplo = CblasLower, then the lower triangular part of the Hermitian matrix
A is used.
b For Layout = CblasColMajor: array, size ldb*n. The leading m-by-n part
of the array b must contain the matrix B.
For Layout = CblasRowMajor: array, size ldb*m. The leading n-by-m part
of the array b must contain the matrix B.
ldb Specifies the leading dimension of b as declared in the calling
(sub)program. When Layout = CblasColMajor, ldb must be at least
max(1, m); otherwise, ldb must be at least max(1, n).
c For Layout = CblasColMajor: array, size ldc*n. Before entry, the leading
m-by-n part of the array c must contain the matrix C, except when beta is
zero, in which case c need not be set on entry.
For Layout = CblasRowMajor: array, size ldc*m. Before entry, the leading
n-by-m part of the array c must contain the matrix C, except when beta is
zero, in which case c need not be set on entry.
Output Parameters
cblas_?herk
Performs a Hermitian rank-k update.
Syntax
void cblas_cherk (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const MKL_INT n, const MKL_INT k, const float alpha, const void
*a, const MKL_INT lda, const float beta, void *c, const MKL_INT ldc);
void cblas_zherk (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const MKL_INT n, const MKL_INT k, const double alpha, const void
*a, const MKL_INT lda, const double beta, void *c, const MKL_INT ldc);
Include Files
• mkl.h
Description
The ?herk routines perform a rank-k matrix-matrix operation using a general matrix A and a Hermitian
matrix C. The operation is defined as:
C := alpha*A*A^H + beta*C,
or
C := alpha*A^H*A + beta*C,
where:
alpha and beta are real scalars,
C is an n-by-n Hermitian matrix,
A is an n-by-k matrix in the first case and a k-by-n matrix in the second case.
Input Parameters
uplo Specifies whether the upper or lower triangular part of the array c is used.
n Specifies the order of the matrix C. The value of n must be at least zero.
a
trans=CblasNoTrans trans=CblasConjTrans
lda
trans=CblasNoTrans trans=CblasConjTrans
Before entry with uplo = CblasUpper, the leading n-by-n upper triangular
part of the array c must contain the upper triangular part of the Hermitian
matrix and the strictly lower triangular part of c is not referenced.
Before entry with uplo = CblasLower, the leading n-by-n lower triangular
part of the array c must contain the lower triangular part of the Hermitian
matrix and the strictly upper triangular part of c is not referenced.
The imaginary parts of the diagonal elements need not be set; they are
assumed to be zero.
Output Parameters
cblas_?her2k
Performs a Hermitian rank-2k update.
Syntax
void cblas_cher2k (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const MKL_INT n, const MKL_INT k, const void *alpha, const void
*a, const MKL_INT lda, const void *b, const MKL_INT ldb, const float beta, void *c,
const MKL_INT ldc);
void cblas_zher2k (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const MKL_INT n, const MKL_INT k, const void *alpha, const void
*a, const MKL_INT lda, const void *b, const MKL_INT ldb, const double beta, void *c,
const MKL_INT ldc);
Include Files
• mkl.h
Description
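The ?her2k routines perform a rank-2k matrix-matrix operation using general matrices A and B and a
Hermitian matrix C. The operation is defined as
C := alpha*A*B^H + conjg(alpha)*B*A^H + beta*C,
or
C := alpha*A^H*B + conjg(alpha)*B^H*A + beta*C,
where:
alpha is a scalar and beta is a real scalar,
C is an n-by-n Hermitian matrix,
A and B are n-by-k matrices in the first case and k-by-n matrices in the second case.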
Input Parameters
uplo Specifies whether the upper or lower triangular part of the array c is used.
n Specifies the order of the matrix C. The value of n must be at least zero.
a
trans=CblasNoTrans trans=CblasConjTrans
trans=CblasNoTrans trans=CblasConjTrans
b
trans=CblasNoTrans trans=CblasConjTrans
trans=CblasNoTrans trans=CblasConjTrans
Output Parameters
cblas_?symm
Computes a matrix-matrix product where one input
matrix is symmetric.
Syntax
void cblas_ssymm (const CBLAS_LAYOUT Layout, const CBLAS_SIDE side, const CBLAS_UPLO
uplo, const MKL_INT m, const MKL_INT n, const float alpha, const float *a, const
MKL_INT lda, const float *b, const MKL_INT ldb, const float beta, float *c, const
MKL_INT ldc);
void cblas_dsymm (const CBLAS_LAYOUT Layout, const CBLAS_SIDE side, const CBLAS_UPLO
uplo, const MKL_INT m, const MKL_INT n, const double alpha, const double *a, const
MKL_INT lda, const double *b, const MKL_INT ldb, const double beta, double *c, const
MKL_INT ldc);
void cblas_csymm (const CBLAS_LAYOUT Layout, const CBLAS_SIDE side, const CBLAS_UPLO
uplo, const MKL_INT m, const MKL_INT n, const void *alpha, const void *a, const MKL_INT
lda, const void *b, const MKL_INT ldb, const void *beta, void *c, const MKL_INT ldc);
void cblas_zsymm (const CBLAS_LAYOUT Layout, const CBLAS_SIDE side, const CBLAS_UPLO
uplo, const MKL_INT m, const MKL_INT n, const void *alpha, const void *a, const MKL_INT
lda, const void *b, const MKL_INT ldb, const void *beta, void *c, const MKL_INT ldc);
Include Files
• mkl.h
Description
The ?symm routines compute a scalar-matrix-matrix product with one symmetric matrix and add the result to
a scalar-matrix product. The operation is defined as
C := alpha*A*B + beta*C,
or
C := alpha*B*A + beta*C,
where:
alpha and beta are scalars,
A is a symmetric matrix,
B and C are m-by-n matrices.
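The following sketch illustrates the first form, C := alpha*A*B + beta*C, for column-major storage when only the upper triangular part of the symmetric matrix A is stored. It assumes double-precision data; the names sym_upper_at and ref_symm_left_upper are hypothetical and this is not the oneMKL implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers, not oneMKL routines; 0-based, column-major. */

/* Reads A(i, j) of a symmetric matrix using only its upper triangle:
 * an entry below the diagonal is fetched from its mirror position. */
double sym_upper_at(const double *a, int lda, int i, int j)
{
    return (i <= j) ? a[(size_t)i + (size_t)j * lda]
                    : a[(size_t)j + (size_t)i * lda];
}

/* C := alpha*A*B + beta*C with A m-by-m symmetric (upper part stored),
 * B and C m-by-n. The strictly lower part of a is never read. */
void ref_symm_left_upper(int m, int n, double alpha,
                         const double *a, int lda,
                         const double *b, int ldb,
                         double beta, double *c, int ldc)
{
    assert(m >= 0 && n >= 0 && lda >= m && ldb >= m && ldc >= m);
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < m; i++) {
            double acc = 0.0;
            for (int p = 0; p < m; p++)
                acc += sym_upper_at(a, lda, i, p) *
                       b[(size_t)p + (size_t)j * ldb];
            double *cij = &c[(size_t)i + (size_t)j * ldc];
            *cij = (beta == 0.0) ? alpha * acc
                                 : alpha * acc + beta * *cij;
        }
    }
}
```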
Input Parameters
side Specifies whether the symmetric matrix A appears on the left or right in the
operation:
if side = CblasLeft, then C := alpha*A*B + beta*C;
uplo Specifies whether the upper or lower triangular part of the symmetric
matrix A is used:
if uplo = CblasUpper, then the upper triangular part is used;
b For Layout = CblasColMajor: array, size ldb*n. The leading m-by-n part
of the array b must contain the matrix B.
For Layout = CblasRowMajor: array, size ldb*m. The leading n-by-m part
of the array b must contain the matrix B.
c For Layout = CblasColMajor: array, size ldc*n. Before entry, the leading
m-by-n part of the array c must contain the matrix C, except when beta is
zero, in which case c need not be set on entry.
For Layout = CblasRowMajor: array, size ldc*m. Before entry, the leading
n-by-m part of the array c must contain the matrix C, except when beta is
zero, in which case c need not be set on entry.
Output Parameters
cblas_?syrk
Performs a symmetric rank-k update.
Syntax
void cblas_ssyrk (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const MKL_INT n, const MKL_INT k, const float alpha, const float
*a, const MKL_INT lda, const float beta, float *c, const MKL_INT ldc);
void cblas_dsyrk (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const MKL_INT n, const MKL_INT k, const double alpha, const
double *a, const MKL_INT lda, const double beta, double *c, const MKL_INT ldc);
void cblas_csyrk (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const MKL_INT n, const MKL_INT k, const void *alpha, const void
*a, const MKL_INT lda, const void *beta, void *c, const MKL_INT ldc);
void cblas_zsyrk (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const MKL_INT n, const MKL_INT k, const void *alpha, const void
*a, const MKL_INT lda, const void *beta, void *c, const MKL_INT ldc);
Include Files
• mkl.h
Description
The ?syrk routines perform a rank-k matrix-matrix operation for a symmetric matrix C using a general
matrix A. The operation is defined as:
C := alpha*A*A' + beta*C,
or
C := alpha*A'*A + beta*C,
where:
alpha and beta are scalars,
C is an n-by-n symmetric matrix,
A is an n-by-k matrix in the first case and a k-by-n matrix in the second case.
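A reference sketch of the first form, C := alpha*A*A' + beta*C, that updates only the upper triangle of C. It assumes column-major storage and double-precision data. The name ref_syrk_upper_notrans is hypothetical and this is not the oneMKL implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference loop, not a oneMKL routine.
 * A is n-by-k, C is n-by-n symmetric, both column-major.
 * Only entries C(i, j) with i <= j are read or written, so the strictly
 * lower triangle of c is left untouched. */
void ref_syrk_upper_notrans(int n, int k, double alpha,
                            const double *a, int lda,
                            double beta, double *c, int ldc)
{
    assert(n >= 0 && k >= 0 && lda >= n && ldc >= n);
    for (int j = 0; j < n; j++) {
        for (int i = 0; i <= j; i++) {
            double acc = 0.0;
            for (int p = 0; p < k; p++)   /* (A*A')(i, j) = sum A(i,p)*A(j,p) */
                acc += a[(size_t)i + (size_t)p * lda] *
                       a[(size_t)j + (size_t)p * lda];
            double *cij = &c[(size_t)i + (size_t)j * ldc];
            *cij = (beta == 0.0) ? alpha * acc
                                 : alpha * acc + beta * *cij;
        }
    }
}
```

Because A*A' is symmetric, computing one triangle is sufficient, which is why a single uplo triangle of c is used.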
Input Parameters
uplo Specifies whether the upper or lower triangular part of the array c is used.
If uplo = CblasUpper, then the upper triangular part of the array c is
used.
If uplo = CblasLower, then the lower triangular part of the array c is used.
n Specifies the order of the matrix C. The value of n must be at least zero.
trans=CblasNoTrans trans=CblasConjTrans
lda
trans=CblasNoTrans trans=CblasConjTrans
c Array, size ldc*n. Before entry with uplo = CblasUpper, the leading n-by-n
upper triangular part of the array c must contain the upper triangular
part of the symmetric matrix and the strictly lower triangular part of c is not
referenced.
Before entry with uplo = CblasLower, the leading n-by-n lower triangular
part of the array c must contain the lower triangular part of the symmetric
matrix and the strictly upper triangular part of c is not referenced.
Output Parameters
cblas_?syr2k
Performs a symmetric rank-2k update.
Syntax
void cblas_ssyr2k (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const MKL_INT n, const MKL_INT k, const float alpha, const float
*a, const MKL_INT lda, const float *b, const MKL_INT ldb, const float beta, float *c,
const MKL_INT ldc);
void cblas_dsyr2k (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const MKL_INT n, const MKL_INT k, const double alpha, const
double *a, const MKL_INT lda, const double *b, const MKL_INT ldb, const double beta,
double *c, const MKL_INT ldc);
void cblas_csyr2k (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const MKL_INT n, const MKL_INT k, const void *alpha, const void
*a, const MKL_INT lda, const void *b, const MKL_INT ldb, const void *beta, void *c,
const MKL_INT ldc);
void cblas_zsyr2k (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const MKL_INT n, const MKL_INT k, const void *alpha, const void
*a, const MKL_INT lda, const void *b, const MKL_INT ldb, const void *beta, void *c,
const MKL_INT ldc);
Include Files
• mkl.h
Description
The ?syr2k routines perform a rank-2k matrix-matrix operation for a symmetric matrix C using general
matrices A and B. The operation is defined as:
C := alpha*A*B' + alpha*B*A' + beta*C,
or
C := alpha*A'*B + alpha*B'*A + beta*C,
where:
alpha and beta are scalars,
C is an n-by-n symmetric matrix,
A and B are n-by-k matrices in the first case, and k-by-n matrices in the second case.
Input Parameters
uplo Specifies whether the upper or lower triangular part of the array c is used.
n Specifies the order of the matrix C. The value of n must be at least zero.
a
trans=CblasNoTrans trans=CblasConjTrans
trans=CblasNoTrans trans=CblasConjTrans
b
trans=CblasNoTrans trans=CblasConjTrans
trans=CblasNoTrans trans=CblasConjTrans
c Array, size ldc*n. Before entry with uplo = CblasUpper, the leading n-by-n
upper triangular part of the array c must contain the upper triangular
part of the symmetric matrix and the strictly lower triangular part of c is not
referenced.
Before entry with uplo = CblasLower, the leading n-by-n lower triangular
part of the array c must contain the lower triangular part of the symmetric
matrix and the strictly upper triangular part of c is not referenced.
Output Parameters
cblas_?trmm
Computes a matrix-matrix product where one input
matrix is triangular.
Syntax
void cblas_strmm (const CBLAS_LAYOUT Layout, const CBLAS_SIDE side, const CBLAS_UPLO
uplo, const CBLAS_TRANSPOSE transa, const CBLAS_DIAG diag, const MKL_INT m, const
MKL_INT n, const float alpha, const float *a, const MKL_INT lda, float *b, const
MKL_INT ldb);
void cblas_dtrmm (const CBLAS_LAYOUT Layout, const CBLAS_SIDE side, const CBLAS_UPLO
uplo, const CBLAS_TRANSPOSE transa, const CBLAS_DIAG diag, const MKL_INT m, const
MKL_INT n, const double alpha, const double *a, const MKL_INT lda, double *b, const
MKL_INT ldb);
void cblas_ctrmm (const CBLAS_LAYOUT Layout, const CBLAS_SIDE side, const CBLAS_UPLO
uplo, const CBLAS_TRANSPOSE transa, const CBLAS_DIAG diag, const MKL_INT m, const
MKL_INT n, const void *alpha, const void *a, const MKL_INT lda, void *b, const MKL_INT
ldb);
void cblas_ztrmm (const CBLAS_LAYOUT Layout, const CBLAS_SIDE side, const CBLAS_UPLO
uplo, const CBLAS_TRANSPOSE transa, const CBLAS_DIAG diag, const MKL_INT m, const
MKL_INT n, const void *alpha, const void *a, const MKL_INT lda, void *b, const MKL_INT
ldb);
Include Files
• mkl.h
Description
The ?trmm routines compute a scalar-matrix-matrix product with one triangular matrix. The operation is
defined as
B := alpha*op(A)*B
or
B := alpha*B*op(A)
where:
alpha is a scalar,
B is an m-by-n matrix,
A is a unit, or non-unit, upper or lower triangular matrix, and
op(A) is one of op(A) = A, or op(A) = A', or op(A) = conjg(A').
Input Parameters
side Specifies whether op(A) appears on the left or right of B in the operation:
b For Layout = CblasColMajor: array, size ldb*n. Before entry, the leading
m-by-n part of the array b must contain the matrix B.
For Layout = CblasRowMajor: array, size ldb*m. Before entry, the leading
n-by-m part of the array b must contain the matrix B.
Output Parameters
cblas_?trsm
Solves a triangular matrix equation.
Syntax
void cblas_strsm (const CBLAS_LAYOUT Layout, const CBLAS_SIDE side, const CBLAS_UPLO
uplo, const CBLAS_TRANSPOSE transa, const CBLAS_DIAG diag, const MKL_INT m, const
MKL_INT n, const float alpha, const float *a, const MKL_INT lda, float *b, const
MKL_INT ldb);
void cblas_dtrsm (const CBLAS_LAYOUT Layout, const CBLAS_SIDE side, const CBLAS_UPLO
uplo, const CBLAS_TRANSPOSE transa, const CBLAS_DIAG diag, const MKL_INT m, const
MKL_INT n, const double alpha, const double *a, const MKL_INT lda, double *b, const
MKL_INT ldb);
void cblas_ctrsm (const CBLAS_LAYOUT Layout, const CBLAS_SIDE side, const CBLAS_UPLO
uplo, const CBLAS_TRANSPOSE transa, const CBLAS_DIAG diag, const MKL_INT m, const
MKL_INT n, const void *alpha, const void *a, const MKL_INT lda, void *b, const MKL_INT
ldb);
void cblas_ztrsm (const CBLAS_LAYOUT Layout, const CBLAS_SIDE side, const CBLAS_UPLO
uplo, const CBLAS_TRANSPOSE transa, const CBLAS_DIAG diag, const MKL_INT m, const
MKL_INT n, const void *alpha, const void *a, const MKL_INT lda, void *b, const MKL_INT
ldb);
Include Files
• mkl.h
Description
The ?trsm routines solve one of the following matrix equations:
op(A)*X = alpha*B,
or
X*op(A) = alpha*B,
where:
alpha is a scalar,
X and B are m-by-n matrices,
A is a unit, or non-unit, upper or lower triangular matrix, and
op(A) is one of op(A) = A, or op(A) = A', or op(A) = conjg(A').
The matrix B is overwritten by the solution matrix X.
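The in-place behavior can be illustrated with a reference sketch of the first form, op(A)*X = alpha*B, for op(A) = A with a lower triangular A and column-major storage. It assumes double-precision data; the name ref_trsm_left_lower and the unit_diag flag are hypothetical and this is not the oneMKL implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference loop, not a oneMKL routine.
 * Solves A*X = alpha*B for X, where A is m-by-m lower triangular and
 * B is m-by-n, both column-major. B is overwritten by X.
 * With unit_diag nonzero the diagonal of a is not read. */
void ref_trsm_left_lower(int m, int n, double alpha,
                         const double *a, int lda, int unit_diag,
                         double *b, int ldb)
{
    assert(m >= 0 && n >= 0 && lda >= m && ldb >= m);
    for (int j = 0; j < n; j++) {
        double *col = b + (size_t)j * ldb;   /* right-hand side j */
        for (int i = 0; i < m; i++)
            col[i] *= alpha;
        for (int p = 0; p < m; p++) {        /* forward substitution */
            if (!unit_diag)
                col[p] /= a[(size_t)p + (size_t)p * lda];
            for (int i = p + 1; i < m; i++)
                col[i] -= col[p] * a[(size_t)i + (size_t)p * lda];
        }
    }
}
```

Each column of B is an independent right-hand side, so the columns can be processed in any order.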
Input Parameters
side Specifies whether op(A) appears on the left or right of X in the equation:
if transa=CblasTrans;
b For Layout = CblasColMajor: array, size ldb*n. Before entry, the leading
m-by-n part of the array b must contain the matrix B.
For Layout = CblasRowMajor: array, size ldb*m. Before entry, the leading
n-by-m part of the array b must contain the matrix B.
Output Parameters
cblas_?trmm_oop
Computes a matrix-matrix product where one input
matrix is triangular and the other matrix is general,
putting output into a different matrix.
Syntax
void cblas_strmm_oop (
const CBLAS_LAYOUT layout, const CBLAS_SIDE side, const CBLAS_UPLO uplo,
const CBLAS_TRANSPOSE transa, const CBLAS_DIAG diag, const MKL_INT m,
const MKL_INT n, const float alpha, const float *a, const MKL_INT lda,
const float *b, const MKL_INT ldb, const float beta, float *c,
const MKL_INT ldc);
void cblas_dtrmm_oop (
const CBLAS_LAYOUT layout, const CBLAS_SIDE side, const CBLAS_UPLO uplo,
const CBLAS_TRANSPOSE transa, const CBLAS_DIAG diag, const MKL_INT m,
const MKL_INT n, const double alpha, const double *a, const MKL_INT lda,
const double *b, const MKL_INT ldb, const double beta, double *c,
const MKL_INT ldc);
void cblas_ctrmm_oop (
const CBLAS_LAYOUT layout, const CBLAS_SIDE side,
const CBLAS_UPLO uplo, const CBLAS_TRANSPOSE transa,
const CBLAS_DIAG diag, const MKL_INT m, const MKL_INT n,
const void* alpha, const void *a, const MKL_INT lda, const void *b,
const MKL_INT ldb, const void* beta, void *c, const MKL_INT ldc);
void cblas_ztrmm_oop (
const CBLAS_LAYOUT layout, const CBLAS_SIDE side,
const CBLAS_UPLO uplo, const CBLAS_TRANSPOSE transa,
const CBLAS_DIAG diag, const MKL_INT m, const MKL_INT n,
const void* alpha, const void *a, const MKL_INT lda, const void *b,
const MKL_INT ldb, const void* beta, void *c, const MKL_INT ldc);
Include Files
mkl.h
Description
The cblas_?trmm_oop routines compute a scalar-matrix-matrix product where one of the matrices in the
multiplication is triangular, and then add the result to a scalar-matrix product. The operation is defined as
C := alpha*op(A)*B + beta*C
or
C := alpha*B*op(A) + beta*C
where:
Input Parameters
side Specifies whether op(A) appears on the left or right of B in the operation.
Before entry with uplo = CblasLower the lower triangular part of the
array a must contain the lower triangular matrix and the strictly upper
triangular part of a is not referenced.
lda Specifies the leading dimension of a. When side = CblasLeft, then lda
must be at least max(1, m). When side = CblasRight, then lda must
be at least max(1, n).
For layout = CblasRowMajor, array of size ldc*m. Before entry, the
leading n-by-m part of the array c must contain the matrix C.
Output Parameters
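As a rough illustration of the operation C := alpha*op(A)*B + beta*C, the following plain C sketch covers a single case only: side = CblasLeft, uplo = CblasUpper, transa = CblasNoTrans, diag = CblasNonUnit, column-major layout. The name ref_dtrmm_oop_lun is hypothetical, int stands in for MKL_INT, and the loops are a readable reference, not the way oneMKL implements the routine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference loop (not a oneMKL routine).
   Computes C := alpha*A*B + beta*C where A is m-by-m upper triangular,
   B and C are m-by-n, and all arrays are column-major. */
void ref_dtrmm_oop_lun(int m, int n, double alpha,
                       const double *a, int lda,
                       const double *b, int ldb,
                       double beta, double *c, int ldc)
{
    assert(lda >= (m > 1 ? m : 1));
    assert(ldb >= (m > 1 ? m : 1));
    assert(ldc >= (m > 1 ? m : 1));
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            double sum = 0.0;
            /* Only the upper triangle of A (k >= i) is read. */
            for (int k = i; k < m; ++k)
                sum += a[i + k * lda] * b[k + j * ldb];
            c[i + j * ldc] = alpha * sum + beta * c[i + j * ldc];
        }
    }
}
```

Because the output goes to c, the input b is left unchanged, which is the point of the out-of-place variant.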
cblas_?trsm_oop
Solves a triangular matrix equation and adds the
result to another scaled matrix.
Syntax
void cblas_strsm_oop (
const CBLAS_LAYOUT layout, const CBLAS_SIDE side, const CBLAS_UPLO uplo,
const CBLAS_TRANSPOSE transa, const CBLAS_DIAG diag, const MKL_INT m,
const MKL_INT n, const float alpha, const float *a, const MKL_INT lda,
const float *b, const MKL_INT ldb, const float beta, float *c,
const MKL_INT ldc);
void cblas_dtrsm_oop (
const CBLAS_LAYOUT layout, const CBLAS_SIDE side, const CBLAS_UPLO uplo,
const CBLAS_TRANSPOSE transa, const CBLAS_DIAG diag, const MKL_INT m,
const MKL_INT n, const double alpha, const double *a, const MKL_INT lda,
const double *b, const MKL_INT ldb, const double beta, double *c,
const MKL_INT ldc);
void cblas_ctrsm_oop (
const CBLAS_LAYOUT layout, const CBLAS_SIDE side,
const CBLAS_UPLO uplo, const CBLAS_TRANSPOSE transa,
const CBLAS_DIAG diag, const MKL_INT m, const MKL_INT n,
const void* alpha, const void *a, const MKL_INT lda, const void *b,
const MKL_INT ldb, const void* beta, void *c, const MKL_INT ldc);
void cblas_ztrsm_oop (
const CBLAS_LAYOUT layout, const CBLAS_SIDE side,
const CBLAS_UPLO uplo, const CBLAS_TRANSPOSE transa,
const CBLAS_DIAG diag, const MKL_INT m, const MKL_INT n,
const void* alpha, const void *a, const MKL_INT lda, const void *b,
const MKL_INT ldb, const void* beta, void *c, const MKL_INT ldc);
Include Files
mkl.h
Description
The cblas_?trsm_oop routines perform a triangular matrix solve followed by a scaled matrix addition.
For a left-side solve, the routine solves
op(A)*X = alpha*B
followed by the scaled addition
C := X + beta*C
For a right-side solve, the routine solves
X*op(A) = alpha*B
followed by the same scaled addition
C := X + beta*C
where:
Input Parameters
side Specifies whether op(A) appears on the left or right of B in the triangular
solve.
If side = CblasLeft, then we solve op(A)*X = alpha*B before
performing C := X + beta*C.
Before entry with uplo = CblasUpper, the leading k by k upper triangular
part of the array a must contain the upper triangular matrix and the strictly
lower triangular part of a is not referenced.
Before entry with uplo = CblasLower the lower triangular part of the
array a must contain the lower triangular matrix and the strictly upper
triangular part of a is not referenced.
lda Specifies the leading dimension of a. When side = CblasLeft, then lda
must be at least max(1, m). When side = CblasRight, then lda must
be at least max(1, n).
b For layout = CblasColMajor, array of size ldb*n. Before entry, the leading
m-by-n part of the array b must contain the matrix B.
For layout = CblasRowMajor, array of size ldb*m. Before entry, the leading
n-by-m part of the array b must contain the matrix B.
Output Parameters
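The two-step operation (triangular solve, then C := X + beta*C) can be sketched in plain C for one case: side = CblasLeft, uplo = CblasLower, transa = CblasNoTrans, diag = CblasNonUnit, column-major layout. The name ref_dtrsm_oop_lln and the caller-supplied workspace work (length m) are assumptions made for this illustration; they are not part of the oneMKL interface, and int stands in for MKL_INT.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference loop (not a oneMKL routine).
   Solves A*X = alpha*B by forward substitution, one column at a time,
   then forms C := X + beta*C. A is m-by-m lower triangular with a
   non-zero diagonal; B and C are m-by-n; all arrays are column-major.
   work must provide room for m doubles. */
void ref_dtrsm_oop_lln(int m, int n, double alpha,
                       const double *a, int lda,
                       const double *b, int ldb,
                       double beta, double *c, int ldc,
                       double *work)
{
    assert(m <= 0 || work != NULL);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            double xi = alpha * b[i + j * ldb];
            /* Subtract contributions of already-solved unknowns. */
            for (int k = 0; k < i; ++k)
                xi -= a[i + k * lda] * work[k];
            work[i] = xi / a[i + i * lda];
        }
        for (int i = 0; i < m; ++i)
            c[i + j * ldc] = work[i] + beta * c[i + j * ldc];
    }
}
```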
Vector Arguments
Compressed sparse vectors. Let a be a vector stored in an array, and assume that the only non-zero
elements of a are the following:
a[k1], a[k2], a[k3] . . . a[knz],
where nz is the total number of non-zero elements in a.
In Sparse BLAS, this vector can be represented in compressed form by two arrays, x (values) and indx
(indices). Each array has nz elements:
x[0]=a[k1], x[1]=a[k2], . . . x[nz-1]= a[knz],
indx[0]=k1, indx[1]=k2, . . . indx[nz-1]= knz.
Thus, a sparse vector is fully determined by the triple (nz, x, indx). If you pass a negative or zero value of nz
to Sparse BLAS, the subroutines do not modify any arrays or variables.
Full-storage vectors. Sparse BLAS routines can also use a vector argument fully stored in a single array (a
full-storage vector). If y is a full-storage vector, its elements must be stored contiguously: the first element
in y[0], the second in y[1], and so on. This corresponds to an increment incy = 1 in BLAS Level 1. No
increment value for full-storage vectors is passed as an argument to Sparse BLAS routines or functions.
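To make the (nz, x, indx) triple concrete, the following plain C sketch builds the compressed form from a full-storage array. The helper compress_vector is hypothetical and not part of oneMKL; int stands in for MKL_INT, and indices are zero-based, matching the C array notation used above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a oneMKL routine): scans the full-storage
   array a of length n, copies its non-zero elements to x and their
   zero-based positions to indx, and returns nz, the number of
   non-zero elements. x and indx must have room for up to n entries. */
int compress_vector(int n, const double *a, double *x, int *indx)
{
    assert(n <= 0 || (a != NULL && x != NULL && indx != NULL));
    int nz = 0;
    for (int k = 0; k < n; ++k) {
        if (a[k] != 0.0) {
            x[nz] = a[k];
            indx[nz] = k;
            ++nz;
        }
    }
    return nz;
}
```

The returned nz together with x and indx is the triple that the Sparse BLAS Level 1 routines expect for a compressed vector argument.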
cblas_i?amax index of the element with the largest absolute value for real flavors, or the
largest sum |Re(x[i])|+|Im(x[i])| for complex flavors.
cblas_i?amin index of the element with the smallest absolute value for real flavors, or the
smallest sum |Re(x[i])|+|Im(x[i])| for complex flavors.
The result i returned by i?amax and i?amin should be interpreted as index in the compressed-form array, so
that the largest (smallest) value is x[i-1]; the corresponding index in full-storage array is indx[i-1].
You can also call cblas_?rotg to compute the parameters of a Givens rotation and then pass these
parameters to the Sparse BLAS routine cblas_?roti.
cblas_?axpyi
Adds a scalar multiple of a compressed sparse vector to
a full-storage vector.
Syntax
void cblas_saxpyi (const MKL_INT nz, const float a, const float *x, const MKL_INT
*indx, float *y);
void cblas_daxpyi (const MKL_INT nz, const double a, const double *x, const MKL_INT
*indx, double *y);
void cblas_caxpyi (const MKL_INT nz, const void *a, const void *x, const MKL_INT *indx,
void *y);
void cblas_zaxpyi (const MKL_INT nz, const void *a, const void *x, const MKL_INT *indx,
void *y);
Include Files
• mkl.h
Description
y := a*x + y
where:
a is a scalar,
x is a sparse vector stored in compressed form,
y is a vector in full storage form.
The ?axpyi routines reference or modify only the elements of y whose indices are listed in the array indx.
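The update y := a*x + y touches only the positions listed in indx. A minimal reference loop in plain C, assuming zero-based indices and with int standing in for MKL_INT (ref_daxpyi is a hypothetical name, not a oneMKL routine):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference loop for y := a*x + y, where x is given in
   compressed form (nz, x, indx) and y is a full-storage vector. */
void ref_daxpyi(int nz, double a, const double *x, const int *indx,
                double *y)
{
    if (nz <= 0)
        return; /* non-positive nz: nothing is modified */
    assert(x != NULL && indx != NULL && y != NULL);
    for (int i = 0; i < nz; ++i)
        y[indx[i]] += a * x[i];
}
```

Elements of y whose indices do not appear in indx keep their values.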
Input Parameters
Output Parameters
cblas_?doti
Computes the dot product of a compressed sparse real
vector by a full-storage real vector.
Syntax
float cblas_sdoti (const MKL_INT nz, const float *x, const MKL_INT *indx, const float
*y);
double cblas_ddoti (const MKL_INT nz, const double *x, const MKL_INT *indx, const
double *y);
Include Files
• mkl.h
Description
Input Parameters
Output Parameters
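A sketch of the ?doti computation in plain C, assuming the usual definition of the sparse dot product as the sum of x[i]*y[indx[i]] over the nz stored elements, with zero-based indices. The name ref_ddoti is hypothetical and int stands in for MKL_INT.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference loop (not a oneMKL routine): dot product of a
   compressed sparse vector (nz, x, indx) with a full-storage vector y.
   Returns 0 when nz is not positive. */
double ref_ddoti(int nz, const double *x, const int *indx,
                 const double *y)
{
    double res = 0.0;
    assert(nz <= 0 || (x != NULL && indx != NULL && y != NULL));
    for (int i = 0; i < nz; ++i)
        res += x[i] * y[indx[i]];
    return res;
}
```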
cblas_?dotci
Computes the conjugated dot product of a
compressed sparse complex vector with a full-storage
complex vector.
Syntax
void cblas_cdotci_sub (const MKL_INT nz, const void *x, const MKL_INT *indx, const void
*y, void *dotui);
void cblas_zdotci_sub (const MKL_INT nz, const void *x, const MKL_INT *indx, const void
*y, void *dotui);
Include Files
• mkl.h
Description
Input Parameters
Output Parameters
cblas_?dotui
Computes the dot product of a compressed sparse
complex vector by a full-storage complex vector.
Syntax
void cblas_cdotui_sub (const MKL_INT nz, const void *x, const MKL_INT *indx, const void
*y, void *dotui);
void cblas_zdotui_sub (const MKL_INT nz, const void *x, const MKL_INT *indx, const void
*y, void *dotui);
Include Files
• mkl.h
Description
Input Parameters
Output Parameters
cblas_?gthr
Gathers a full-storage sparse vector's elements into
compressed form.
Syntax
void cblas_sgthr (const MKL_INT nz, const float *y, float *x, const MKL_INT *indx);
void cblas_dgthr (const MKL_INT nz, const double *y, double *x, const MKL_INT *indx);
void cblas_cgthr (const MKL_INT nz, const void *y, void *x, const MKL_INT *indx);
void cblas_zgthr (const MKL_INT nz, const void *y, void *x, const MKL_INT *indx);
Include Files
• mkl.h
Description
The ?gthr routines gather the specified elements of a full-storage sparse vector y into compressed form (nz,
x, indx). The routines reference only the elements of y whose indices are listed in the array indx:
x[i] = y[indx[i]], for i=0,1,... ,nz-1.
Input Parameters
Output Parameters
cblas_?gthrz
Gathers a sparse vector's elements into compressed
form, replacing them by zeros.
Syntax
void cblas_sgthrz (const MKL_INT nz, float *y, float *x, const MKL_INT *indx);
void cblas_dgthrz (const MKL_INT nz, double *y, double *x, const MKL_INT *indx);
void cblas_cgthrz (const MKL_INT nz, void *y, void *x, const MKL_INT *indx);
void cblas_zgthrz (const MKL_INT nz, void *y, void *x, const MKL_INT *indx);
Include Files
• mkl.h
Description
The ?gthrz routines gather the elements with indices specified by the array indx from a full-storage vector y
into compressed form (nz, x, indx) and overwrite the gathered elements of y by zeros. Other elements of y
are not referenced or modified (see also ?gthr).
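The gather-and-zero behavior can be sketched in plain C as follows. The name ref_dgthrz is hypothetical, indices are zero-based, and int stands in for MKL_INT.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference loop (not a oneMKL routine): copies the
   elements of y selected by indx into x, then overwrites those
   elements of y with zeros. Other elements of y are untouched. */
void ref_dgthrz(int nz, double *y, double *x, const int *indx)
{
    assert(nz <= 0 || (y != NULL && x != NULL && indx != NULL));
    for (int i = 0; i < nz; ++i) {
        x[i] = y[indx[i]];
        y[indx[i]] = 0.0;
    }
}
```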
Input Parameters
Output Parameters
cblas_?roti
Applies Givens rotation to sparse vectors one of which
is in compressed form.
Syntax
void cblas_sroti (const MKL_INT nz, float *x, const MKL_INT *indx, float *y, const
float c, const float s);
void cblas_droti (const MKL_INT nz, double *x, const MKL_INT *indx, double *y, const
double c, const double s);
Include Files
• mkl.h
Description
The ?roti routines apply the Givens rotation to elements of two real vectors, x (in compressed form nz, x,
indx) and y (in full storage form):
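Assuming the standard BLAS plane-rotation convention (x := c*x + s*y, y := c*y - s*x), applied only at the positions listed in indx, the operation can be sketched in plain C. The name ref_droti is hypothetical, indices are zero-based, and int stands in for MKL_INT.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference loop (not a oneMKL routine): applies a Givens
   rotation with cosine c and sine s to the pairs (x[i], y[indx[i]]).
   Both old values are read before either one is overwritten. */
void ref_droti(int nz, double *x, const int *indx, double *y,
               double c, double s)
{
    assert(nz <= 0 || (x != NULL && indx != NULL && y != NULL));
    for (int i = 0; i < nz; ++i) {
        double xi = x[i];
        double yi = y[indx[i]];
        x[i] = c * xi + s * yi;
        y[indx[i]] = c * yi - s * xi;
    }
}
```

The values c and s would typically come from cblas_?rotg, as noted earlier in this section.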
Input Parameters
c A scalar.
s A scalar.
Output Parameters
cblas_?sctr
Converts compressed sparse vectors into full storage
form.
Syntax
void cblas_ssctr (const MKL_INT nz, const float *x, const MKL_INT *indx, float *y);
void cblas_dsctr (const MKL_INT nz, const double *x, const MKL_INT *indx, double *y);
void cblas_csctr (const MKL_INT nz, const void *x, const MKL_INT *indx, void *y);
void cblas_zsctr (const MKL_INT nz, const void *x, const MKL_INT *indx, void *y);
Include Files
• mkl.h
Description
The ?sctr routines scatter the elements of the compressed sparse vector (nz, x, indx) to a full-storage
vector y. The routines modify only the elements of y whose indices are listed in the array indx:
y[indx[i]] = x[i], for i=0,1,... ,nz-1.
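The scatter operation is the inverse of a gather over the same indx array. A plain C sketch, with the hypothetical name ref_dsctr, zero-based indices, and int standing in for MKL_INT:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference loop (not a oneMKL routine): writes the
   compressed values x[i] to the positions indx[i] of the full-storage
   vector y. Positions not listed in indx are left unchanged. */
void ref_dsctr(int nz, const double *x, const int *indx, double *y)
{
    assert(nz <= 0 || (x != NULL && indx != NULL && y != NULL));
    for (int i = 0; i < nz; ++i)
        y[indx[i]] = x[i];
}
```

Note that y is not cleared first, so a caller who wants a pure expansion of the sparse vector needs to zero y before the call.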
Input Parameters
Output Parameters
NOTE The Intel® oneAPI Math Kernel Library (oneMKL) Sparse BLAS Level 2 and Level 3 routines are
deprecated. Use the corresponding routine from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface as indicated in the description for each routine.
This section describes Sparse BLAS Level 2 and Level 3 routines included in the Intel® oneAPI Math Kernel
Library (oneMKL). Sparse BLAS Level 2 is a group of routines and functions that perform operations between
a sparse matrix and dense vectors. Sparse BLAS Level 3 is a group of routines and functions that perform
operations between a sparse matrix and dense matrices.
The terms and concepts required to understand the use of the Intel® oneAPI Math Kernel Library (oneMKL)
Sparse BLAS Level 2 and Level 3 routines are discussed in the Linear Solvers Basics appendix.
The Sparse BLAS routines can be useful to implement iterative methods for solving large sparse systems of
equations or eigenvalue problems. For example, these routines can be considered as building blocks for
Iterative Sparse Solvers based on Reverse Communication Interface (RCI ISS).
Intel® oneAPI Math Kernel Library (oneMKL) provides Sparse BLAS Level 2 and Level 3 routines with a typical
(or conventional) interface similar to the interface used in the NIST* Sparse BLAS library [Rem05].
Some software packages and libraries (the PARDISO* Solver used in Intel® oneAPI Math Kernel Library
(oneMKL), Sparskit 2 [Saad94], the Compaq* Extended Math Library (CXML) [CXML01]) use a different (early)
variation of the compressed sparse row (CSR) format and support only Level 2 operations with simplified
interfaces. Intel® oneAPI Math Kernel Library (oneMKL) provides an additional set of Sparse BLAS Level 2
routines with similar simplified interfaces. Each of these routines operates only on a matrix of a fixed type.
The routines described in this section support both one-based indexing and zero-based indexing of the input
data (see details in the section One-based and Zero-based Indexing).
The <data> field indicates the sparse matrix storage format (see section Sparse Matrix Storage Formats):
• compressed sparse row format (CSR) and its variations;
• compressed sparse column format (CSC);
• coordinate format;
• diagonal format;
• skyline storage format;
and one block entry storage format:
• block sparse row format (BSR) and its variations.
For more information, see "Sparse Matrix Storage Formats" in the Appendix "Linear Solvers Basics".
Intel® oneAPI Math Kernel Library (oneMKL) provides auxiliary routines (matrix converters) that convert
a sparse matrix from one storage format to another.
y := alpha*op(A)*x + beta*y
• solving a single triangular system:
y := alpha*inv(op(A))*x
• computing a product between sparse matrix and dense matrix:
C := alpha*op(A)*B + beta*C
• solving a sparse triangular system with multiple right-hand sides:
C := alpha*inv(op(A))*B
Intel® oneAPI Math Kernel Library (oneMKL) provides an additional set of Sparse BLAS Level 2 routines
with simplified interfaces. Each of these routines operates on a matrix of a fixed type. The following
operations are supported:
• computing the vector product between a sparse matrix and a dense vector (for general and symmetric
matrices):
y := op(A)*x
• solving a single triangular system (for triangular matrices):
y := inv(op(A))*x
Matrix type is indicated by the field <mtype> in the routine name (see section Naming Conventions in Sparse
BLAS Level 2 and Level 3).
NOTE
The routines with simplified interfaces support only four sparse matrix storage formats, specifically:
CSR format in the 3-array variation accepted in the direct sparse solvers and in the CXML;
diagonal format accepted in the CXML;
coordinate format;
BSR format in the 3-array variation.
Note that routines with both typical (conventional) and simplified interfaces use the same computational
kernels that work with certain internal data structures.
The Intel® oneAPI Math Kernel Library (oneMKL) Sparse BLAS Level 2 and Level 3 routines do not support in-
place operations.
The complete list of routines is given in the table “Sparse BLAS Level 2 and Level 3 Routines”.
Interface Consideration
The detailed descriptions of the one-based and zero-based variants of the sparse data storage formats are
given in the "Sparse Matrix Storage Formats" in the Appendix "Linear Solvers Basics".
Most parameters of the routines are identical for both one-based and zero-based indexing, but some of them
have certain differences. The following table lists all these differences.
Parameter differences between one-based and zero-based indexing:
val
    One-based indexing: Array containing non-zero elements of the matrix A; its length is pntre[m] - pntrb[1].
    Zero-based indexing: Array containing non-zero elements of the matrix A; its length is pntre[m-1] - pntrb[0].
pntrb
    One-based indexing: Array of length m. This array contains row indices, such that pntrb[i] - pntrb[1]+1 is the first index of row i in the arrays val and indx.
    Zero-based indexing: Array of length m. This array contains row indices, such that pntrb[i] - pntrb[0] is the first index of row i in the arrays val and indx.
pntre
    One-based indexing: Array of length m. This array contains row indices, such that pntre[i] - pntrb[1] is the last index of row i in the arrays val and indx.
    Zero-based indexing: Array of length m. This array contains row indices, such that pntre[i] - pntrb[0]-1 is the last index of row i in the arrays val and indx.
ia
    One-based indexing: Array of length m + 1, containing indices of elements in the array a, such that ia[i] is the index in the array a of the first non-zero element from the row i. The value of the last element ia[m + 1] is equal to the number of non-zeros plus one.
    Zero-based indexing: Array of length m + 1, containing indices of elements in the array a, such that ia[i] is the index in the array a of the first non-zero element from the row i. The value of the last element ia[m] is equal to the number of non-zeros.
ldb
    One-based indexing: Specifies the leading dimension of b as declared in the calling (sub)program.
    Zero-based indexing: Specifies the second dimension of b as declared in the calling (sub)program.
ldc
    One-based indexing: Specifies the leading dimension of c as declared in the calling (sub)program.
    Zero-based indexing: Specifies the second dimension of c as declared in the calling (sub)program.
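To illustrate the ia difference listed above, the following plain C sketch shifts the index arrays of a 3-array CSR matrix from zero-based to one-based form in place. The helper csr_to_one_based is hypothetical (not a oneMKL routine) and int stands in for MKL_INT. After the shift, the last element of ia equals the number of non-zeros plus one, as stated for one-based indexing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a oneMKL routine): converts the row-pointer
   array ia (length m + 1) and the column-index array ja of a
   zero-based 3-array CSR matrix to one-based indexing, in place. */
void csr_to_one_based(int m, int *ia, int *ja)
{
    assert(m >= 0 && ia != NULL);
    assert(ia[0] == 0); /* input must be zero-based */
    int nnz = ia[m];    /* zero-based: last element is the non-zero count */
    for (int k = 0; k < nnz; ++k)
        ja[k] += 1;
    for (int i = 0; i <= m; ++i)
        ia[i] += 1;
}
```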
The analogous NIST* Sparse BLAS (NSB) library routines have the following interfaces:
xyyymm(transa, m, n, k, alpha, descra, arg(A), b, ldb, beta, c, ldc, work, lwork), for
matrix-matrix product;
xyyysm(transa, m, n, unitd, dv, alpha, descra, arg(A), b, ldb, beta, c, ldc, work,
lwork), for triangular solvers with multiple right-hand sides.
Some similar arguments are used in both libraries. The argument transa indicates what operation is
performed and is slightly different in the NSB library (see Table "Parameter transa"). The arguments m and k
are the number of rows and columns in the matrix A, respectively, and n is the number of columns in the matrix C.
The arguments alpha and beta are the scalars alpha and beta, respectively (beta is not used in the Intel® oneAPI
Math Kernel Library (oneMKL) triangular solvers). The arguments b and c are rectangular arrays with the
leading dimensions ldb and ldc, respectively. arg(A) denotes the list of arguments that describe the sparse
representation of A.
Parameter transa
oneMKL value    NSB value    Operation
N or n          0            op(A) = A
T or t          1            op(A) = A^T
C or c          2            op(A) = A^T or op(A) = A^H
Parameter matdescra
The parameter matdescra describes the relevant characteristic of the matrix A. This manual describes
matdescra as an array of six elements in line with the NIST* implementation. However, only the first four
elements of the array are used in the current versions of the Intel® oneAPI Math Kernel Library (oneMKL)
Sparse BLAS routines. Elements matdescra[4] and matdescra[5] are reserved for future use. Note that
whether matdescra is described in your application as an array of length 6 or 4 is of no importance because
the array is declared as a pointer in the Intel® oneAPI Math Kernel Library (oneMKL) routines. To learn more
about declaration of the matdescra array, see the Sparse BLAS examples located in the Intel® oneAPI Math
Kernel Library (oneMKL) installation directory: examples/spblasc/ for C. The table below lists elements of
the parameter matdescra, their Fortran values, and their meanings. The parameter matdescra corresponds
to the argument descra from NSB library.
Element values of matdescra (columns: one-based indexing, zero-based indexing, NSB library, meaning)
Matrix structure:
    G    G    0    general
    S    S    1    symmetric (A = A^T)
    H    H    2    Hermitian (A = A^H)
    T    T    3    triangular
    A    A    4    skew(anti)-symmetric (A = -A^T)
    D    D    5    diagonal
Upper/lower triangle indicator:
    L    L    1    lower
    U    U    2    upper
Main diagonal type:
    N    N    0    non-unit
    U    U    1    unit
4th element (matdescra[4], matdescra[3], descra[3]): type of indexing
    F    -    1    one-based indexing
    -    C    0    zero-based indexing
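As a small illustration of the table, the sketch below fills a six-element matdescra array for a call that uses zero-based indexing, where the first four elements sit at positions 0 to 3 and the fourth is 'C'. The helper set_matdescra_zero_based is hypothetical (not a oneMKL routine); the two reserved elements are simply set to blanks here, which is an arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a oneMKL routine): fills matdescra for
   zero-based indexing.
   structure: 'G', 'S', 'H', 'T', 'A', or 'D'
   uplo:      'L' or 'U'
   diag:      'N' or 'U' */
void set_matdescra_zero_based(char matdescra[6], char structure,
                              char uplo, char diag)
{
    assert(matdescra != NULL);
    assert(structure == 'G' || structure == 'S' || structure == 'H' ||
           structure == 'T' || structure == 'A' || structure == 'D');
    assert(uplo == 'L' || uplo == 'U');
    assert(diag == 'N' || diag == 'U');
    matdescra[0] = structure;
    matdescra[1] = uplo;
    matdescra[2] = diag;
    matdescra[3] = 'C'; /* zero-based indexing */
    matdescra[4] = ' '; /* reserved */
    matdescra[5] = ' '; /* reserved */
}
```

For example, a lower triangular matrix with a non-unit diagonal would be described with the arguments 'T', 'L', 'N'.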
In some cases possible element values of the parameter matdescra depend on the values of other elements.
The Table "Possible Combinations of Element Values of the Parameter matdescra" lists all possible
combinations of element values for both multiplication routines and triangular solvers.
For a matrix in the skyline format with the main diagonal declared to be a unit, diagonal elements must be
stored in the sparse representation even if they are zero. In all other formats, diagonal elements can be
stored (if needed) in the sparse representation if they are not zero.
A = L + D + U
where L is the strict lower triangle of A, U is the strict upper triangle of A, D is the main diagonal.
Table "Output Matrices for Multiplication Routines" shows correspondence between the output matrices and
values of the parameter matdescra for the sparse matrix A for multiplication routines.
matdescra[0]    matdescra[1]    matdescra[2]    Output Matrix
S or H          L               N               alpha*op(L+D+L')*x + beta*y
                                                alpha*op(L+D+L')*B + beta*C
S or H          L               U               alpha*op(L+I+L')*x + beta*y
                                                alpha*op(L+I+L')*B + beta*C
S or H          U               N               alpha*op(U'+D+U)*x + beta*y
                                                alpha*op(U'+D+U)*B + beta*C
S or H          U               U               alpha*op(U'+I+U)*x + beta*y
                                                alpha*op(U'+I+U)*B + beta*C
T               L               U               alpha*op(L+I)*x + beta*y
                                                alpha*op(L+I)*B + beta*C
T               L               N               alpha*op(L+D)*x + beta*y
                                                alpha*op(L+D)*B + beta*C
T               U               U               alpha*op(U+I)*x + beta*y
                                                alpha*op(U+I)*B + beta*C
T               U               N               alpha*op(U+D)*x + beta*y
                                                alpha*op(U+D)*B + beta*C
Table "Output Matrices for Triangular Solvers" shows correspondence between the output matrices and values
of the parameter matdescra for the sparse matrix A for triangular solvers.
Output Matrices for Triangular Solvers
matdescra[0] matdescra[1] matdescra[2] Output Matrix
T L N alpha*inv(op(L))*x
alpha*inv(op(L))*B
T L U alpha*inv(op(L))*x
alpha*inv(op(L))*B
T U N alpha*inv(op(U))*x
alpha*inv(op(U))*B
T U U alpha*inv(op(U))*x
alpha*inv(op(U))*B
D ignored N alpha*inv(D)*x
alpha*inv(D)*B
D ignored U alpha*x
alpha*B
NOTE The Intel® oneAPI Math Kernel Library (oneMKL) Sparse BLAS Level 2 and Level 3 routines are
deprecated. Use the corresponding routine from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface as indicated in the description for each routine.
Table “Sparse BLAS Level 2 and Level 3 Routines” lists the sparse BLAS Level 2 and Level 3 routines
described in more detail later in this section.
Routine/Function Description
Matrix converters
mkl_?csradd Computes the sum of two sparse matrices stored in the CSR
format (3-array variation) with one-based indexing.
mkl_?csrgemv
Computes matrix - vector product of a sparse general
matrix stored in the CSR format (3-array variation)
with one-based indexing (deprecated).
Syntax
void mkl_scsrgemv (const char *transa , const MKL_INT *m , const float *a , const
MKL_INT *ia , const MKL_INT *ja , const float *x , float *y );
void mkl_dcsrgemv (const char *transa , const MKL_INT *m , const double *a , const
MKL_INT *ia , const MKL_INT *ja , const double *x , double *y );
void mkl_ccsrgemv (const char *transa , const MKL_INT *m , const MKL_Complex8 *a ,
const MKL_INT *ia , const MKL_INT *ja , const MKL_Complex8 *x , MKL_Complex8 *y );
void mkl_zcsrgemv (const char *transa , const MKL_INT *m , const MKL_Complex16 *a ,
const MKL_INT *ia , const MKL_INT *ja , const MKL_Complex16 *x , MKL_Complex16 *y );
Include Files
• mkl.h
Description
This routine is deprecated. Use mkl_sparse_?_mv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?csrgemv routine performs a matrix-vector operation defined as
y := A*x
or
y := A^T*x,
where:
x and y are vectors,
A is an m-by-m sparse square matrix in the CSR format (3-array variation), A^T is the transpose of A.
NOTE
This routine supports only one-based indexing of the input arrays.
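The arithmetic behind this operation can be sketched in plain C for the 3-array CSR format with one-based ia and ja. The name ref_dcsrgemv is hypothetical, arguments are passed by value for brevity (the oneMKL routine takes pointers), int stands in for MKL_INT, and only 'N'/'n' and 'T'/'t' are distinguished. This is a readable reference, not the library's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference loop (not a oneMKL routine).
   transa = 'N' or 'n': y := A*x; any other value: y := A^T*x.
   A is m-by-m in 3-array CSR with one-based ia (length m + 1) and ja. */
void ref_dcsrgemv(char transa, int m, const double *a, const int *ia,
                  const int *ja, const double *x, double *y)
{
    assert(m <= 0 || ia[0] == 1); /* one-based row pointers */
    int notrans = (transa == 'N' || transa == 'n');
    for (int i = 0; i < m; ++i)
        y[i] = 0.0;
    for (int i = 0; i < m; ++i) {
        /* Entries of row i occupy positions ia[i]-1 .. ia[i+1]-2 of a. */
        for (int k = ia[i] - 1; k < ia[i + 1] - 1; ++k) {
            int j = ja[k] - 1; /* zero-based column */
            if (notrans)
                y[i] += a[k] * x[j];
            else
                y[j] += a[k] * x[i];
        }
    }
}
```

The transposed product reuses the same traversal and only swaps which index is accumulated into, so no explicit transpose of A is formed.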
Input Parameters
ja Array containing the column indices plus one for each non-zero element of
the matrix A.
Its length is equal to the length of the array a. Refer to columns array
description in Sparse Matrix Storage Formats for more details.
x Array, size is m.
Output Parameters
mkl_?bsrgemv
Computes matrix - vector product of a sparse general
matrix stored in the BSR format (3-array variation)
with one-based indexing (deprecated).
Syntax
void mkl_sbsrgemv (const char *transa , const MKL_INT *m , const MKL_INT *lb , const
float *a , const MKL_INT *ia , const MKL_INT *ja , const float *x , float *y );
void mkl_dbsrgemv (const char *transa , const MKL_INT *m , const MKL_INT *lb , const
double *a , const MKL_INT *ia , const MKL_INT *ja , const double *x , double *y );
void mkl_cbsrgemv (const char *transa , const MKL_INT *m , const MKL_INT *lb , const
MKL_Complex8 *a , const MKL_INT *ia , const MKL_INT *ja , const MKL_Complex8 *x ,
MKL_Complex8 *y );
void mkl_zbsrgemv (const char *transa , const MKL_INT *m , const MKL_INT *lb , const
MKL_Complex16 *a , const MKL_INT *ia , const MKL_INT *ja , const MKL_Complex16 *x ,
MKL_Complex16 *y );
Include Files
• mkl.h
Description
This routine is deprecated. Use mkl_sparse_?_mv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?bsrgemv routine performs a matrix-vector operation defined as
y := A*x
or
y := A^T*x,
where:
x and y are vectors,
A is an m-by-m block sparse square matrix in the BSR format (3-array variation), A^T is the transpose of A.
NOTE
This routine supports only one-based indexing of the input arrays.
Input Parameters
ja Array containing the column indices plus one for each non-zero block in the
matrix A.
Its length is equal to the number of non-zero blocks of the matrix A. Refer
to columns array description in BSR Format for more details.
Output Parameters
mkl_?coogemv
Computes matrix-vector product of a sparse general
matrix stored in the coordinate format with one-based
indexing (deprecated).
Syntax
void mkl_scoogemv (const char *transa , const MKL_INT *m , const float *val , const
MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const float *x , float
*y );
void mkl_dcoogemv (const char *transa , const MKL_INT *m , const double *val , const
MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const double *x , double
*y );
void mkl_ccoogemv (const char *transa , const MKL_INT *m , const MKL_Complex8 *val ,
const MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const MKL_Complex8
*x , MKL_Complex8 *y );
void mkl_zcoogemv (const char *transa , const MKL_INT *m , const MKL_Complex16 *val ,
const MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const
MKL_Complex16 *x , MKL_Complex16 *y );
Include Files
• mkl.h
Description
This routine is deprecated. Use mkl_sparse_?_mv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?coogemv routine performs a matrix-vector operation defined as
y := A*x
or
y := A^T*x,
where:
x and y are vectors,
A is an m-by-m sparse square matrix in the coordinate format, A^T is the transpose of A.
NOTE
This routine supports only one-based indexing of the input arrays.
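For the coordinate format the same product reduces to a single pass over the nnz stored triples. The sketch below is plain C with the hypothetical name ref_dcoogemv; arguments are passed by value for brevity (the oneMKL routine takes pointers), rowind and colind are one-based, and int stands in for MKL_INT.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference loop (not a oneMKL routine).
   transa = 'N' or 'n': y := A*x; any other value: y := A^T*x.
   A is m-by-m in coordinate format with one-based rowind and colind;
   the nnz entries may appear in any order. */
void ref_dcoogemv(char transa, int m, const double *val,
                  const int *rowind, const int *colind, int nnz,
                  const double *x, double *y)
{
    int notrans = (transa == 'N' || transa == 'n');
    for (int i = 0; i < m; ++i)
        y[i] = 0.0;
    for (int k = 0; k < nnz; ++k) {
        int i = rowind[k] - 1;
        int j = colind[k] - 1;
        assert(i >= 0 && i < m && j >= 0 && j < m);
        if (notrans)
            y[i] += val[k] * x[j];
        else
            y[j] += val[k] * x[i];
    }
}
```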
Input Parameters
val Array of length nnz, contains non-zero elements of the matrix A in
arbitrary order.
Refer to values array description in Coordinate Format for more details.
rowind Array of length nnz, contains the row indices plus one for each non-zero
element of the matrix A.
Refer to rows array description in Coordinate Format for more details.
colind Array of length nnz, contains the column indices plus one for each non-zero
element of the matrix A. Refer to columns array description in Coordinate
Format for more details.
x Array, size is m.
Output Parameters
mkl_?diagemv
Computes matrix - vector product of a sparse general
matrix stored in the diagonal format with one-based
indexing (deprecated).
Syntax
void mkl_sdiagemv (const char *transa , const MKL_INT *m , const float *val , const
MKL_INT *lval , const MKL_INT *idiag , const MKL_INT *ndiag , const float *x , float
*y );
void mkl_ddiagemv (const char *transa , const MKL_INT *m , const double *val , const
MKL_INT *lval , const MKL_INT *idiag , const MKL_INT *ndiag , const double *x , double
*y );
void mkl_cdiagemv (const char *transa , const MKL_INT *m , const MKL_Complex8 *val ,
const MKL_INT *lval , const MKL_INT *idiag , const MKL_INT *ndiag , const MKL_Complex8
*x , MKL_Complex8 *y );
void mkl_zdiagemv (const char *transa , const MKL_INT *m , const MKL_Complex16 *val ,
const MKL_INT *lval , const MKL_INT *idiag , const MKL_INT *ndiag , const MKL_Complex16
*x , MKL_Complex16 *y );
Include Files
• mkl.h
Description
This routine is deprecated, but no replacement is available yet in the Inspector-Executor Sparse BLAS API
interfaces. You can continue using this routine until a replacement is provided and this routine is fully removed.
The mkl_?diagemv routine performs a matrix-vector operation defined as
y := A*x
or
y := A^T*x,
where:
x and y are vectors,
A is an m-by-m sparse square matrix in the diagonal storage format, A^T is the transpose of A.
NOTE
This routine supports only one-based indexing of the input arrays.
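A plain C sketch of the product for the diagonal format follows. It rests on an assumption about the layout described in Diagonal Storage Scheme: val is treated as an lval-by-ndiag array stored column by column, so that val[i + lval*d] holds the element of row i (zero-based) lying on the diagonal whose distance from the main diagonal is idiag[d], with negative distances for diagonals below the main one. The name ref_ddiagemv is hypothetical, arguments are passed by value for brevity, and int stands in for MKL_INT.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference loop (not a oneMKL routine), under the layout
   assumption stated above.
   transa = 'N' or 'n': y := A*x; any other value: y := A^T*x. */
void ref_ddiagemv(char transa, int m, const double *val, int lval,
                  const int *idiag, int ndiag,
                  const double *x, double *y)
{
    assert(lval >= m);
    int notrans = (transa == 'N' || transa == 'n');
    for (int i = 0; i < m; ++i)
        y[i] = 0.0;
    for (int d = 0; d < ndiag; ++d) {
        int dist = idiag[d];
        for (int i = 0; i < m; ++i) {
            int j = i + dist; /* column of the element in row i */
            if (j < 0 || j >= m)
                continue;     /* padding slot, not part of A */
            double aij = val[i + lval * d];
            if (notrans)
                y[i] += aij * x[j];
            else
                y[j] += aij * x[i];
        }
    }
}
```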
Input Parameters
idiag Array of length ndiag, contains the distances between the main diagonal and
each non-zero diagonal in the matrix A.
Refer to distance array description in Diagonal Storage Scheme for more
details.
x Array, size is m.
Output Parameters
mkl_?csrsymv
Computes matrix - vector product of a sparse
symmetrical matrix stored in the CSR format (3-array
variation) with one-based indexing (deprecated).
Syntax
void mkl_scsrsymv (const char *uplo , const MKL_INT *m , const float *a , const MKL_INT
*ia , const MKL_INT *ja , const float *x , float *y );
void mkl_dcsrsymv (const char *uplo , const MKL_INT *m , const double *a , const
MKL_INT *ia , const MKL_INT *ja , const double *x , double *y );
void mkl_ccsrsymv (const char *uplo , const MKL_INT *m , const MKL_Complex8 *a , const
MKL_INT *ia , const MKL_INT *ja , const MKL_Complex8 *x , MKL_Complex8 *y );
void mkl_zcsrsymv (const char *uplo , const MKL_INT *m , const MKL_Complex16 *a , const
MKL_INT *ia , const MKL_INT *ja , const MKL_Complex16 *x , MKL_Complex16 *y );
Include Files
• mkl.h
Description
This routine is deprecated. Use mkl_sparse_?_mv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?csrsymv routine performs a matrix-vector operation defined as
y := A*x
where:
x and y are vectors,
A is an upper or lower triangle of the symmetrical sparse matrix in the CSR format (3-array variation).
NOTE
This routine supports only one-based indexing of the input arrays.
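Because only one triangle is stored, each off-diagonal entry contributes to two elements of y. The plain C sketch below shows this for one-based 3-array CSR. The name ref_dcsrsymv is hypothetical, arguments are passed by value for brevity (the oneMKL routine takes pointers), and int stands in for MKL_INT; entries found outside the selected triangle are skipped, which is a choice made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference loop (not a oneMKL routine).
   Computes y := A*x for a symmetric m-by-m matrix A of which only the
   upper (uplo = 'U' or 'u') or lower triangle is stored in 3-array CSR
   with one-based ia and ja. */
void ref_dcsrsymv(char uplo, int m, const double *a, const int *ia,
                  const int *ja, const double *x, double *y)
{
    assert(m <= 0 || ia[0] == 1); /* one-based row pointers */
    int upper = (uplo == 'U' || uplo == 'u');
    for (int i = 0; i < m; ++i)
        y[i] = 0.0;
    for (int i = 0; i < m; ++i) {
        for (int k = ia[i] - 1; k < ia[i + 1] - 1; ++k) {
            int j = ja[k] - 1;
            if (upper ? (j < i) : (j > i))
                continue;             /* outside the selected triangle */
            y[i] += a[k] * x[j];
            if (j != i)
                y[j] += a[k] * x[i];  /* mirrored entry A(j,i) = A(i,j) */
        }
    }
}
```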
Input Parameters
uplo Specifies whether the upper or lower triangle of the matrix A is used.
If uplo = 'U' or 'u', then the upper triangle of the matrix A is used.
If uplo = 'L' or 'l', then the lower triangle of the matrix A is used.
ia Array of length m + 1, containing indices of elements in the array a, such
that ia[i] - ia[0] is the index in the array a of the first non-zero
element from the row i. The value of the last element ia[m] - ia[0] is
equal to the number of non-zeros. Refer to rowIndex array description in
Sparse Matrix Storage Formats for more details.
ja Array containing the column indices plus one for each non-zero element of
the matrix A.
Its length is equal to the length of the array a. Refer to columns array
description in Sparse Matrix Storage Formats for more details.
x Array, size is m.
Output Parameters
mkl_?bsrsymv
Computes matrix-vector product of a sparse
symmetrical matrix stored in the BSR format (3-array
variation) with one-based indexing (deprecated).
Syntax
void mkl_sbsrsymv (const char *uplo , const MKL_INT *m , const MKL_INT *lb , const
float *a , const MKL_INT *ia , const MKL_INT *ja , const float *x , float *y );
void mkl_dbsrsymv (const char *uplo , const MKL_INT *m , const MKL_INT *lb , const
double *a , const MKL_INT *ia , const MKL_INT *ja , const double *x , double *y );
void mkl_cbsrsymv (const char *uplo , const MKL_INT *m , const MKL_INT *lb , const
MKL_Complex8 *a , const MKL_INT *ia , const MKL_INT *ja , const MKL_Complex8 *x ,
MKL_Complex8 *y );
void mkl_zbsrsymv (const char *uplo , const MKL_INT *m , const MKL_INT *lb , const
MKL_Complex16 *a , const MKL_INT *ia , const MKL_INT *ja , const MKL_Complex16 *x ,
MKL_Complex16 *y );
Include Files
• mkl.h
Description
This routine is deprecated. Use mkl_sparse_?_mv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?bsrsymv routine performs a matrix-vector operation defined as
y := A*x
where:
x and y are vectors,
A is an upper or lower triangle of the symmetrical sparse matrix in the BSR format (3-array variation).
NOTE
This routine supports only one-based indexing of the input arrays.
Input Parameters
uplo Specifies whether the upper or lower triangle of the matrix A is considered.
If uplo = 'U' or 'u', then the upper triangle of the matrix A is used.
If uplo = 'L' or 'l', then the lower triangle of the matrix A is used.
ja Array containing the column indices plus one for each non-zero block in the
matrix A.
Its length is equal to the number of non-zero blocks of the matrix A. Refer
to columns array description in BSR Format for more details.
Output Parameters
mkl_?coosymv
Computes matrix-vector product of a sparse
symmetrical matrix stored in the coordinate format
with one-based indexing (deprecated).
Syntax
void mkl_scoosymv (const char *uplo , const MKL_INT *m , const float *val , const
MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const float *x , float
*y );
void mkl_dcoosymv (const char *uplo , const MKL_INT *m , const double *val , const
MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const double *x , double
*y );
void mkl_ccoosymv (const char *uplo , const MKL_INT *m , const MKL_Complex8 *val ,
const MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const MKL_Complex8
*x , MKL_Complex8 *y );
void mkl_zcoosymv (const char *uplo , const MKL_INT *m , const MKL_Complex16 *val ,
const MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const
MKL_Complex16 *x , MKL_Complex16 *y );
Include Files
• mkl.h
Description
This routine is deprecated. Use mkl_sparse_?_mv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?coosymv routine performs a matrix-vector operation defined as
y := A*x
where:
x and y are vectors,
A is an upper or lower triangle of the symmetrical sparse matrix in the coordinate format.
NOTE
This routine supports only one-based indexing of the input arrays.
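A plain C loop can illustrate the same operation for the coordinate format. This is a sketch under stated assumptions, not the library code: ref_coosymv_1based is a hypothetical name, int and double replace MKL_INT and the ? type, and arguments are passed by value. Because the entries may appear in any order, the loop visits each of the nnz triplets once and mirrors off-diagonal entries.

```c
#include <assert.h>

/* Hypothetical reference helper (not an MKL routine): y := A*x for a
   symmetric matrix with one triangle stored as one-based COO triplets. */
void ref_coosymv_1based(char uplo, int m, const double *val,
                        const int *rowind, const int *colind, int nnz,
                        const double *x, double *y)
{
    int upper = (uplo == 'U' || uplo == 'u');
    assert(upper || uplo == 'L' || uplo == 'l');

    for (int i = 0; i < m; ++i)
        y[i] = 0.0;

    for (int k = 0; k < nnz; ++k) {
        int i = rowind[k] - 1;            /* one-based row to C index */
        int j = colind[k] - 1;            /* one-based column to C index */
        if (i == j) {
            y[i] += val[k] * x[i];        /* diagonal is counted once */
        } else if (upper ? (j > i) : (j < i)) {
            y[i] += val[k] * x[j];        /* stored entry A(i,j) */
            y[j] += val[k] * x[i];        /* mirrored entry A(j,i) */
        }
    }
}
```

The triplets do not need to be sorted; the result depends only on the set of stored entries.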
Input Parameters
uplo Specifies whether the upper or lower triangle of the matrix A is used.
If uplo = 'U' or 'u', then the upper triangle of the matrix A is used.
If uplo = 'L' or 'l', then the lower triangle of the matrix A is used.
val Array of length nnz, contains non-zero elements of the matrix A in
arbitrary order.
Refer to values array description in Coordinate Format for more details.
rowind Array of length nnz, contains the row indices plus one for each non-zero
element of the matrix A.
Refer to rows array description in Coordinate Format for more details.
colind Array of length nnz, contains the column indices plus one for each non-zero
element of the matrix A. Refer to columns array description in Coordinate
Format for more details.
x Array, size is m.
Output Parameters
mkl_?diasymv
Computes matrix-vector product of a sparse
symmetrical matrix stored in the diagonal format with
one-based indexing (deprecated).
Syntax
void mkl_sdiasymv (const char *uplo , const MKL_INT *m , const float *val , const
MKL_INT *lval , const MKL_INT *idiag , const MKL_INT *ndiag , const float *x , float
*y );
void mkl_ddiasymv (const char *uplo , const MKL_INT *m , const double *val , const
MKL_INT *lval , const MKL_INT *idiag , const MKL_INT *ndiag , const double *x , double
*y );
void mkl_cdiasymv (const char *uplo , const MKL_INT *m , const MKL_Complex8 *val ,
const MKL_INT *lval , const MKL_INT *idiag , const MKL_INT *ndiag , const MKL_Complex8
*x , MKL_Complex8 *y );
void mkl_zdiasymv (const char *uplo , const MKL_INT *m , const MKL_Complex16 *val ,
const MKL_INT *lval , const MKL_INT *idiag , const MKL_INT *ndiag , const MKL_Complex16
*x , MKL_Complex16 *y );
Include Files
• mkl.h
Description
This routine is deprecated, but no replacement is available yet in the Inspector-Executor Sparse BLAS API
interfaces. You can continue using this routine until a replacement is provided and the routine is removed.
The mkl_?diasymv routine performs a matrix-vector operation defined as
y := A*x
where:
x and y are vectors,
A is an upper or lower triangle of the symmetrical sparse matrix.
NOTE
This routine supports only one-based indexing of the input arrays.
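The following plain C loop sketches the operation for the diagonal format. It is not the oneMKL implementation and rests on assumptions that this section does not spell out: val is treated as an lval-by-ndiag array in column-major order, so that val[i + d*lval] holds A(i, i + idiag[d]) for zero-based i, with padding wherever the column index falls outside the matrix. The name ref_diasymv is hypothetical, and int and double replace MKL_INT and the ? type.

```c
#include <assert.h>

/* Hypothetical reference helper (not an MKL routine): y := A*x for a
   symmetric matrix with one triangle stored in the diagonal format.
   Assumed layout: val[i + d*lval] = A(i, i + idiag[d]), column-major. */
void ref_diasymv(char uplo, int m, const double *val, int lval,
                 const int *idiag, int ndiag,
                 const double *x, double *y)
{
    int upper = (uplo == 'U' || uplo == 'u');
    assert(upper || uplo == 'L' || uplo == 'l');
    assert(lval >= m);

    for (int i = 0; i < m; ++i)
        y[i] = 0.0;

    for (int d = 0; d < ndiag; ++d) {
        int dist = idiag[d];
        /* skip diagonals that belong to the triangle not selected */
        if (upper ? (dist < 0) : (dist > 0))
            continue;
        for (int i = 0; i < m; ++i) {
            int j = i + dist;
            if (j < 0 || j >= m)
                continue;                 /* padding slot, not a matrix entry */
            double v = val[i + d * lval];
            y[i] += v * x[j];
            if (dist != 0)
                y[j] += v * x[i];         /* mirrored entry A(j,i) */
        }
    }
}
```

Check the layout assumption against the Diagonal Storage Scheme description before relying on this sketch for real data.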
Input Parameters
uplo Specifies whether the upper or lower triangle of the matrix A is used.
If uplo = 'U' or 'u', then the upper triangle of the matrix A is used.
If uplo = 'L' or 'l', then the lower triangle of the matrix A is used.
idiag Array of length ndiag, contains the distances between the main diagonal and
each non-zero diagonal in the matrix A.
Refer to distance array description in Diagonal Storage Scheme for more
details.
x Array, size is m.
Output Parameters
mkl_?csrtrsv
Triangular solvers with simplified interface for a sparse
matrix in the CSR format (3-array variation) with one-
based indexing (deprecated).
Syntax
void mkl_scsrtrsv (const char *uplo , const char *transa , const char *diag , const
MKL_INT *m , const float *a , const MKL_INT *ia , const MKL_INT *ja , const float *x ,
float *y );
void mkl_dcsrtrsv (const char *uplo , const char *transa , const char *diag , const
MKL_INT *m , const double *a , const MKL_INT *ia , const MKL_INT *ja , const double
*x , double *y );
void mkl_ccsrtrsv (const char *uplo , const char *transa , const char *diag , const
MKL_INT *m , const MKL_Complex8 *a , const MKL_INT *ia , const MKL_INT *ja , const
MKL_Complex8 *x , MKL_Complex8 *y );
void mkl_zcsrtrsv (const char *uplo , const char *transa , const char *diag , const
MKL_INT *m , const MKL_Complex16 *a , const MKL_INT *ia , const MKL_INT *ja , const
MKL_Complex16 *x , MKL_Complex16 *y );
Include Files
• mkl.h
Description
This routine is deprecated. Use mkl_sparse_?_trsv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?csrtrsv routine solves a system of linear equations with matrix-vector operations for a sparse
matrix stored in the CSR format (3-array variation):
A*y = x
or
A^T*y = x,
where:
x and y are vectors,
A is a sparse upper or lower triangular matrix with unit or non-unit main diagonal, A^T is the transpose of A.
NOTE
This routine supports only one-based indexing of the input arrays.
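The non-transposed solve A*y = x can be illustrated with ordinary forward or back substitution over the CSR rows. The sketch below is not the oneMKL implementation: ref_csrtrsv_1based is a hypothetical name, int and double replace MKL_INT and the ? type, the transa argument is left out, and diag is assumed to take 'U' or 'u' for a unit diagonal and any other value for a non-unit diagonal.

```c
#include <assert.h>

/* Hypothetical reference helper (not an MKL routine): solves A*y = x for a
   triangular matrix in 3-array CSR with one-based ia and ja. */
void ref_csrtrsv_1based(char uplo, char diag, int m, const double *a,
                        const int *ia, const int *ja,
                        const double *x, double *y)
{
    int upper = (uplo == 'U' || uplo == 'u');
    int unit = (diag == 'U' || diag == 'u');
    assert(upper || uplo == 'L' || uplo == 'l');

    for (int s = 0; s < m; ++s) {
        /* lower: rows 0..m-1 (forward); upper: rows m-1..0 (backward) */
        int i = upper ? m - 1 - s : s;
        double sum = x[i];
        double pivot = 1.0;
        for (int k = ia[i] - ia[0]; k < ia[i + 1] - ia[0]; ++k) {
            int j = ja[k] - 1;            /* one-based column to C index */
            if (j == i) {
                if (!unit)
                    pivot = a[k];         /* stored diagonal element */
            } else if (upper ? (j > i) : (j < i)) {
                sum -= a[k] * y[j];       /* y[j] is already solved */
            }
        }
        y[i] = sum / pivot;
    }
}
```

With a non-unit diagonal, the stored diagonal element is the pivot, which is why it cannot be omitted from the storage in that case.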
Input Parameters
uplo Specifies whether the upper or lower triangle of the matrix A is used.
If uplo = 'U' or 'u', then the upper triangle of the matrix A is used.
If uplo = 'L' or 'l', then the lower triangle of the matrix A is used.
NOTE
The non-zero elements of the given row of the matrix must be
stored in the same order as they appear in the row (from left to
right).
No diagonal element can be omitted from a sparse storage if the solver
is called with the non-unit indicator.
ja Array containing the column indices plus one for each non-zero element of
the matrix A.
Its length is equal to the length of the array a. Refer to columns array
description in Sparse Matrix Storage Formats for more details.
NOTE
Column indices must be sorted in increasing order for each row.
x Array, size is m.
Output Parameters
mkl_?bsrtrsv
Triangular solver with simplified interface for a sparse
matrix stored in the BSR format (3-array variation)
with one-based indexing (deprecated).
Syntax
void mkl_sbsrtrsv (const char *uplo , const char *transa , const char *diag , const
MKL_INT *m , const MKL_INT *lb , const float *a , const MKL_INT *ia , const MKL_INT
*ja , const float *x , float *y );
void mkl_dbsrtrsv (const char *uplo , const char *transa , const char *diag , const
MKL_INT *m , const MKL_INT *lb , const double *a , const MKL_INT *ia , const MKL_INT
*ja , const double *x , double *y );
void mkl_cbsrtrsv (const char *uplo , const char *transa , const char *diag , const
MKL_INT *m , const MKL_INT *lb , const MKL_Complex8 *a , const MKL_INT *ia , const
MKL_INT *ja , const MKL_Complex8 *x , MKL_Complex8 *y );
void mkl_zbsrtrsv (const char *uplo , const char *transa , const char *diag , const
MKL_INT *m , const MKL_INT *lb , const MKL_Complex16 *a , const MKL_INT *ia , const
MKL_INT *ja , const MKL_Complex16 *x , MKL_Complex16 *y );
Include Files
• mkl.h
Description
This routine is deprecated. Use mkl_sparse_?_trsv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?bsrtrsv routine solves a system of linear equations with matrix-vector operations for a sparse
matrix stored in the BSR format (3-array variation):
A*y = x
or
A^T*y = x,
where:
x and y are vectors,
A is a sparse upper or lower triangular matrix with unit or non-unit main diagonal, A^T is the transpose of A.
NOTE
This routine supports only one-based indexing of the input arrays.
Input Parameters
If uplo = 'L' or 'l', then the lower triangle of the matrix A is used.
NOTE
The non-zero elements of the given row of the matrix must be
stored in the same order as they appear in the row (from left to
right).
No diagonal element can be omitted from a sparse storage if the solver
is called with the non-unit indicator.
ja Array containing the column indices plus one for each non-zero block in the
matrix A.
Its length is equal to the number of non-zero blocks of the matrix A. Refer
to columns array description in BSR Format for more details.
Output Parameters
mkl_?cootrsv
Triangular solvers with simplified interface for a sparse
matrix in the coordinate format with one-based
indexing (deprecated).
Syntax
void mkl_scootrsv (const char *uplo , const char *transa , const char *diag , const
MKL_INT *m , const float *val , const MKL_INT *rowind , const MKL_INT *colind , const
MKL_INT *nnz , const float *x , float *y );
void mkl_dcootrsv (const char *uplo , const char *transa , const char *diag , const
MKL_INT *m , const double *val , const MKL_INT *rowind , const MKL_INT *colind , const
MKL_INT *nnz , const double *x , double *y );
void mkl_ccootrsv (const char *uplo , const char *transa , const char *diag , const
MKL_INT *m , const MKL_Complex8 *val , const MKL_INT *rowind , const MKL_INT *colind ,
const MKL_INT *nnz , const MKL_Complex8 *x , MKL_Complex8 *y );
void mkl_zcootrsv (const char *uplo , const char *transa , const char *diag , const
MKL_INT *m , const MKL_Complex16 *val , const MKL_INT *rowind , const MKL_INT *colind ,
const MKL_INT *nnz , const MKL_Complex16 *x , MKL_Complex16 *y );
Include Files
• mkl.h
Description
This routine is deprecated. Use mkl_sparse_?_trsv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?cootrsv routine solves a system of linear equations with matrix-vector operations for a sparse
matrix stored in the coordinate format:
A*y = x
or
A^T*y = x,
where:
x and y are vectors,
A is a sparse upper or lower triangular matrix with unit or non-unit main diagonal, A^T is the transpose of A.
NOTE
This routine supports only one-based indexing of the input arrays.
Input Parameters
uplo Specifies whether the upper or lower triangle of the matrix A is considered.
If uplo = 'U' or 'u', then the upper triangle of the matrix A is used.
If uplo = 'L' or 'l', then the lower triangle of the matrix A is used.
val Array of length nnz, contains non-zero elements of the matrix A in
arbitrary order.
Refer to values array description in Coordinate Format for more details.
rowind Array of length nnz, contains the row indices plus one for each non-zero
element of the matrix A.
Refer to rows array description in Coordinate Format for more details.
colind Array of length nnz, contains the column indices plus one for each non-zero
element of the matrix A. Refer to columns array description in Coordinate
Format for more details.
Refer to nnz description in Coordinate Format for more details.
x Array, size is m.
Output Parameters
mkl_?diatrsv
Triangular solvers with simplified interface for a sparse
matrix in the diagonal format with one-based indexing
(deprecated).
Syntax
void mkl_sdiatrsv (const char *uplo , const char *transa , const char *diag , const
MKL_INT *m , const float *val , const MKL_INT *lval , const MKL_INT *idiag , const
MKL_INT *ndiag , const float *x , float *y );
void mkl_ddiatrsv (const char *uplo , const char *transa , const char *diag , const
MKL_INT *m , const double *val , const MKL_INT *lval , const MKL_INT *idiag , const
MKL_INT *ndiag , const double *x , double *y );
void mkl_cdiatrsv (const char *uplo , const char *transa , const char *diag , const
MKL_INT *m , const MKL_Complex8 *val , const MKL_INT *lval , const MKL_INT *idiag ,
const MKL_INT *ndiag , const MKL_Complex8 *x , MKL_Complex8 *y );
void mkl_zdiatrsv (const char *uplo , const char *transa , const char *diag , const
MKL_INT *m , const MKL_Complex16 *val , const MKL_INT *lval , const MKL_INT *idiag ,
const MKL_INT *ndiag , const MKL_Complex16 *x , MKL_Complex16 *y );
Include Files
• mkl.h
Description
This routine is deprecated, but no replacement is available yet in the Inspector-Executor Sparse BLAS API
interfaces. You can continue using this routine until a replacement is provided and the routine is removed.
The mkl_?diatrsv routine solves a system of linear equations with matrix-vector operations for a sparse
matrix stored in the diagonal format:
A*y = x
or
A^T*y = x,
where:
x and y are vectors,
A is a sparse upper or lower triangular matrix with unit or non-unit main diagonal, A^T is the transpose of A.
NOTE
This routine supports only one-based indexing of the input arrays.
Input Parameters
uplo Specifies whether the upper or lower triangle of the matrix A is used.
If uplo = 'U' or 'u', then the upper triangle of the matrix A is used.
If uplo = 'L' or 'l', then the lower triangle of the matrix A is used.
idiag Array of length ndiag, contains the distances between the main diagonal and
each non-zero diagonal in the matrix A.
NOTE
All elements of this array must be sorted in increasing order.
x Array, size is m.
Output Parameters
mkl_cspblas_?csrgemv
Computes matrix-vector product of a sparse general
matrix stored in the CSR format (3-array variation)
with zero-based indexing (deprecated).
Syntax
void mkl_cspblas_scsrgemv (const char *transa , const MKL_INT *m , const float *a ,
const MKL_INT *ia , const MKL_INT *ja , const float *x , float *y );
void mkl_cspblas_dcsrgemv (const char *transa , const MKL_INT *m , const double *a ,
const MKL_INT *ia , const MKL_INT *ja , const double *x , double *y );
void mkl_cspblas_ccsrgemv (const char *transa , const MKL_INT *m , const MKL_Complex8
*a , const MKL_INT *ia , const MKL_INT *ja , const MKL_Complex8 *x , MKL_Complex8 *y );
void mkl_cspblas_zcsrgemv (const char *transa , const MKL_INT *m , const MKL_Complex16
*a , const MKL_INT *ia , const MKL_INT *ja , const MKL_Complex16 *x , MKL_Complex16
*y );
Include Files
• mkl.h
Description
This routine is deprecated. Use mkl_sparse_?_mv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_cspblas_?csrgemv routine performs a matrix-vector operation defined as
y := A*x
or
y := A^T*x,
where:
x and y are vectors,
A is an m-by-m sparse square matrix in the CSR format (3-array variation) with zero-based indexing, A^T is
the transpose of A.
NOTE
This routine supports only zero-based indexing of the input arrays.
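Both forms of the operation can be illustrated with one loop over the CSR rows. The sketch below is not the oneMKL implementation: ref_csrgemv_0based is a hypothetical name, int and double replace MKL_INT and the ? type, and transa is assumed to take 'N' or 'n' for y := A*x and 'T' or 't' for y := A^T*x. For the transposed product the loop scatters each stored entry into y[j] instead of gathering into y[i], so no explicit transpose is built.

```c
#include <assert.h>

/* Hypothetical reference helper (not an MKL routine): y := A*x or
   y := A^T*x for a square matrix in 3-array CSR with zero-based ia, ja. */
void ref_csrgemv_0based(char transa, int m, const double *a,
                        const int *ia, const int *ja,
                        const double *x, double *y)
{
    int trans = (transa == 'T' || transa == 't');
    assert(trans || transa == 'N' || transa == 'n');

    for (int i = 0; i < m; ++i)
        y[i] = 0.0;

    for (int i = 0; i < m; ++i) {
        for (int k = ia[i]; k < ia[i + 1]; ++k) {
            int j = ja[k];                /* zero-based column index */
            if (trans)
                y[j] += a[k] * x[i];      /* A(i,j) acts as A^T(j,i) */
            else
                y[i] += a[k] * x[j];
        }
    }
}
```

For the matrix with rows (1 2 0), (0 3 0), (4 0 5), the arrays are a = {1, 2, 3, 4, 5}, ia = {0, 2, 3, 5}, ja = {0, 1, 1, 0, 2}.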
Input Parameters
ja Array containing the column indices for each non-zero element of the
matrix A.
Its length is equal to the length of the array a. Refer to columns array
description in Sparse Matrix Storage Formats for more details.
x Array, size is m.
Output Parameters
mkl_cspblas_?bsrgemv
Computes matrix-vector product of a sparse general
matrix stored in the BSR format (3-array variation)
with zero-based indexing (deprecated).
Syntax
void mkl_cspblas_sbsrgemv (const char *transa , const MKL_INT *m , const MKL_INT *lb ,
const float *a , const MKL_INT *ia , const MKL_INT *ja , const float *x , float *y );
void mkl_cspblas_dbsrgemv (const char *transa , const MKL_INT *m , const MKL_INT *lb ,
const double *a , const MKL_INT *ia , const MKL_INT *ja , const double *x , double
*y );
void mkl_cspblas_cbsrgemv (const char *transa , const MKL_INT *m , const MKL_INT *lb ,
const MKL_Complex8 *a , const MKL_INT *ia , const MKL_INT *ja , const MKL_Complex8 *x ,
MKL_Complex8 *y );
void mkl_cspblas_zbsrgemv (const char *transa , const MKL_INT *m , const MKL_INT *lb ,
const MKL_Complex16 *a , const MKL_INT *ia , const MKL_INT *ja , const MKL_Complex16
*x , MKL_Complex16 *y );
Include Files
• mkl.h
Description
This routine is deprecated. Use mkl_sparse_?_mv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_cspblas_?bsrgemv routine performs a matrix-vector operation defined as
y := A*x
or
y := A^T*x,
where:
x and y are vectors,
A is an m-by-m block sparse square matrix in the BSR format (3-array variation) with zero-based indexing,
A^T is the transpose of A.
NOTE
This routine supports only zero-based indexing of the input arrays.
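A block-wise loop can illustrate the BSR product. This is a sketch under assumptions that this section does not state: m is taken to be the number of block rows, lb the block size, x and y are taken to have m*lb elements, each lb-by-lb block is assumed to be stored contiguously in row-major order, and transa is assumed to take 'N' or 'T'. The name ref_bsrgemv_0based is hypothetical and is not an MKL routine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference helper (not an MKL routine): y := A*x or
   y := A^T*x for a block sparse matrix in 3-array BSR, zero-based.
   Assumed: row-major lb-by-lb blocks, x and y of length m*lb. */
void ref_bsrgemv_0based(char transa, int m, int lb, const double *a,
                        const int *ia, const int *ja,
                        const double *x, double *y)
{
    int trans = (transa == 'T' || transa == 't');
    assert(trans || transa == 'N' || transa == 'n');

    for (int i = 0; i < m * lb; ++i)
        y[i] = 0.0;

    for (int ib = 0; ib < m; ++ib) {              /* block row */
        for (int k = ia[ib]; k < ia[ib + 1]; ++k) {
            int jb = ja[k];                       /* block column */
            const double *blk = a + (size_t)k * lb * lb;
            for (int r = 0; r < lb; ++r) {
                for (int c = 0; c < lb; ++c) {
                    double v = blk[r * lb + c];
                    if (trans)
                        y[jb * lb + c] += v * x[ib * lb + r];
                    else
                        y[ib * lb + r] += v * x[jb * lb + c];
                }
            }
        }
    }
}
```

Refer to the BSR Format description for the actual block layout before adapting this sketch.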
Input Parameters
ja Array containing the column indices for each non-zero block in the matrix A.
Its length is equal to the number of non-zero blocks of the matrix A. Refer
to columns array description in BSR Format for more details.
Output Parameters
mkl_cspblas_?coogemv
Computes matrix-vector product of a sparse general
matrix stored in the coordinate format with zero-
based indexing (deprecated).
Syntax
void mkl_cspblas_scoogemv (const char *transa , const MKL_INT *m , const float *val ,
const MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const float *x ,
float *y );
void mkl_cspblas_dcoogemv (const char *transa , const MKL_INT *m , const double *val ,
const MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const double *x ,
double *y );
void mkl_cspblas_ccoogemv (const char *transa , const MKL_INT *m , const MKL_Complex8
*val , const MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const
MKL_Complex8 *x , MKL_Complex8 *y );
void mkl_cspblas_zcoogemv (const char *transa , const MKL_INT *m , const MKL_Complex16
*val , const MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const
MKL_Complex16 *x , MKL_Complex16 *y );
Include Files
• mkl.h
Description
This routine is deprecated. Use mkl_sparse_?_mv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_cspblas_?coogemv routine performs a matrix-vector operation defined as
y := A*x
or
y := A^T*x,
where:
x and y are vectors,
A is an m-by-m sparse square matrix in the coordinate format with zero-based indexing, A^T is the transpose
of A.
NOTE
This routine supports only zero-based indexing of the input arrays.
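For the coordinate format, the transposed product only swaps the roles of the row and column index of each triplet. The sketch below shows this in plain C. It is not the oneMKL implementation: ref_coogemv_0based is a hypothetical name, int and double replace MKL_INT and the ? type, and transa is assumed to take 'N' or 'T'.

```c
#include <assert.h>

/* Hypothetical reference helper (not an MKL routine): y := A*x or
   y := A^T*x for a square matrix stored as zero-based COO triplets. */
void ref_coogemv_0based(char transa, int m, const double *val,
                        const int *rowind, const int *colind, int nnz,
                        const double *x, double *y)
{
    int trans = (transa == 'T' || transa == 't');
    assert(trans || transa == 'N' || transa == 'n');

    for (int i = 0; i < m; ++i)
        y[i] = 0.0;

    for (int k = 0; k < nnz; ++k) {
        /* for the transposed product, swap row and column of the entry */
        int i = trans ? colind[k] : rowind[k];
        int j = trans ? rowind[k] : colind[k];
        y[i] += val[k] * x[j];
    }
}
```

The triplets can be supplied in any order, since each one contributes an independent term to the result.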
Input Parameters
val Array of length nnz, contains non-zero elements of the matrix A in
arbitrary order.
Refer to values array description in Coordinate Format for more details.
rowind Array of length nnz, contains the row indices for each non-zero element of
the matrix A.
Refer to rows array description in Coordinate Format for more details.
colind Array of length nnz, contains the column indices for each non-zero element
of the matrix A. Refer to columns array description in Coordinate Format
for more details.
x Array, size is m.
Output Parameters
mkl_cspblas_?csrsymv
Computes matrix-vector product of a sparse
symmetrical matrix stored in the CSR format (3-array
variation) with zero-based indexing (deprecated).
Syntax
void mkl_cspblas_scsrsymv (const char *uplo , const MKL_INT *m , const float *a , const
MKL_INT *ia , const MKL_INT *ja , const float *x , float *y );
void mkl_cspblas_dcsrsymv (const char *uplo , const MKL_INT *m , const double *a ,
const MKL_INT *ia , const MKL_INT *ja , const double *x , double *y );
void mkl_cspblas_ccsrsymv (const char *uplo , const MKL_INT *m , const MKL_Complex8
*a , const MKL_INT *ia , const MKL_INT *ja , const MKL_Complex8 *x , MKL_Complex8 *y );
void mkl_cspblas_zcsrsymv (const char *uplo , const MKL_INT *m , const MKL_Complex16
*a , const MKL_INT *ia , const MKL_INT *ja , const MKL_Complex16 *x , MKL_Complex16
*y );
Include Files
• mkl.h
Description
This routine is deprecated. Use mkl_sparse_?_mv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_cspblas_?csrsymv routine performs a matrix-vector operation defined as
y := A*x
where:
x and y are vectors,
A is an upper or lower triangle of the symmetrical sparse matrix in the CSR format (3-array variation) with
zero-based indexing.
NOTE
This routine supports only zero-based indexing of the input arrays.
Input Parameters
uplo Specifies whether the upper or lower triangle of the matrix A is used.
If uplo = 'U' or 'u', then the upper triangle of the matrix A is used.
If uplo = 'L' or 'l', then the lower triangle of the matrix A is used.
ja Array containing the column indices for each non-zero element of the
matrix A.
Its length is equal to the length of the array a. Refer to columns array
description in Sparse Matrix Storage Formats for more details.
x Array, size is m.
Output Parameters
mkl_cspblas_?bsrsymv
Computes matrix-vector product of a sparse
symmetrical matrix stored in the BSR format (3-array
variation) with zero-based indexing (deprecated).
Syntax
void mkl_cspblas_sbsrsymv (const char *uplo , const MKL_INT *m , const MKL_INT *lb ,
const float *a , const MKL_INT *ia , const MKL_INT *ja , const float *x , float *y );
void mkl_cspblas_dbsrsymv (const char *uplo , const MKL_INT *m , const MKL_INT *lb ,
const double *a , const MKL_INT *ia , const MKL_INT *ja , const double *x , double
*y );
void mkl_cspblas_cbsrsymv (const char *uplo , const MKL_INT *m , const MKL_INT *lb ,
const MKL_Complex8 *a , const MKL_INT *ia , const MKL_INT *ja , const MKL_Complex8 *x ,
MKL_Complex8 *y );
void mkl_cspblas_zbsrsymv (const char *uplo , const MKL_INT *m , const MKL_INT *lb ,
const MKL_Complex16 *a , const MKL_INT *ia , const MKL_INT *ja , const MKL_Complex16
*x , MKL_Complex16 *y );
Include Files
• mkl.h
Description
This routine is deprecated. Use mkl_sparse_?_mv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_cspblas_?bsrsymv routine performs a matrix-vector operation defined as
y := A*x
where:
x and y are vectors,
A is an upper or lower triangle of the symmetrical sparse matrix in the BSR format (3-array variation) with
zero-based indexing.
NOTE
This routine supports only zero-based indexing of the input arrays.
Input Parameters
uplo Specifies whether the upper or lower triangle of the matrix A is used.
If uplo = 'U' or 'u', then the upper triangle of the matrix A is used.
If uplo = 'L' or 'l', then the lower triangle of the matrix A is used.
ja Array containing the column indices for each non-zero block in the matrix A.
Its length is equal to the number of non-zero blocks of the matrix A. Refer
to columns array description in BSR Format for more details.
Output Parameters
mkl_cspblas_?coosymv
Computes matrix-vector product of a sparse
symmetrical matrix stored in the coordinate format
with zero-based indexing (deprecated).
Syntax
void mkl_cspblas_scoosymv (const char *uplo , const MKL_INT *m , const float *val ,
const MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const float *x ,
float *y );
void mkl_cspblas_dcoosymv (const char *uplo , const MKL_INT *m , const double *val ,
const MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const double *x ,
double *y );
void mkl_cspblas_ccoosymv (const char *uplo , const MKL_INT *m , const MKL_Complex8
*val , const MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const
MKL_Complex8 *x , MKL_Complex8 *y );
void mkl_cspblas_zcoosymv (const char *uplo , const MKL_INT *m , const MKL_Complex16
*val , const MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const
MKL_Complex16 *x , MKL_Complex16 *y );
Include Files
• mkl.h
Description
This routine is deprecated. Use mkl_sparse_?_mv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_cspblas_?coosymv routine performs a matrix-vector operation defined as
y := A*x
where:
x and y are vectors,
A is an upper or lower triangle of the symmetrical sparse matrix in the coordinate format with zero-based
indexing.
NOTE
This routine supports only zero-based indexing of the input arrays.
Input Parameters
uplo Specifies whether the upper or lower triangle of the matrix A is used.
If uplo = 'U' or 'u', then the upper triangle of the matrix A is used.
If uplo = 'L' or 'l', then the lower triangle of the matrix A is used.
val Array of length nnz, contains non-zero elements of the matrix A in
arbitrary order.
Refer to values array description in Coordinate Format for more details.
rowind Array of length nnz, contains the row indices for each non-zero element of
the matrix A.
Refer to rows array description in Coordinate Format for more details.
colind Array of length nnz, contains the column indices for each non-zero element
of the matrix A. Refer to columns array description in Coordinate Format
for more details.
x Array, size is m.
Output Parameters
mkl_cspblas_?csrtrsv
Triangular solvers with simplified interface for a sparse
matrix in the CSR format (3-array variation) with
zero-based indexing (deprecated).
Syntax
void mkl_cspblas_scsrtrsv (const char *uplo , const char *transa , const char *diag ,
const MKL_INT *m , const float *a , const MKL_INT *ia , const MKL_INT *ja , const float
*x , float *y );
void mkl_cspblas_dcsrtrsv (const char *uplo , const char *transa , const char *diag ,
const MKL_INT *m , const double *a , const MKL_INT *ia , const MKL_INT *ja , const
double *x , double *y );
void mkl_cspblas_ccsrtrsv (const char *uplo , const char *transa , const char *diag ,
const MKL_INT *m , const MKL_Complex8 *a , const MKL_INT *ia , const MKL_INT *ja ,
const MKL_Complex8 *x , MKL_Complex8 *y );
void mkl_cspblas_zcsrtrsv (const char *uplo , const char *transa , const char *diag ,
const MKL_INT *m , const MKL_Complex16 *a , const MKL_INT *ia , const MKL_INT *ja ,
const MKL_Complex16 *x , MKL_Complex16 *y );
Include Files
• mkl.h
Description
This routine is deprecated. Use mkl_sparse_?_trsv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_cspblas_?csrtrsv routine solves a system of linear equations with matrix-vector operations for a
sparse matrix stored in the CSR format (3-array variation) with zero-based indexing:
A*y = x
or
A^T*y = x,
where:
x and y are vectors,
A is a sparse upper or lower triangular matrix with unit or non-unit main diagonal, A^T is the transpose of A.
NOTE
This routine supports only zero-based indexing of the input arrays.
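The transposed system A^T*y = x can be solved without forming A^T by walking the rows of A and eliminating the solved unknown from the remaining right-hand side (a column-oriented substitution on A^T). The sketch below illustrates this in plain C. It is not the oneMKL implementation: ref_csrtrsv_trans_0based is a hypothetical name, int and double replace MKL_INT and the ? type, the transa argument is fixed to the transposed case, and diag is assumed to take 'U' or 'u' for a unit diagonal.

```c
#include <assert.h>

/* Hypothetical reference helper (not an MKL routine): solves A^T*y = x for
   a triangular matrix A in 3-array CSR with zero-based ia and ja. */
void ref_csrtrsv_trans_0based(char uplo, char diag, int m, const double *a,
                              const int *ia, const int *ja,
                              const double *x, double *y)
{
    int upper = (uplo == 'U' || uplo == 'u');
    int unit = (diag == 'U' || diag == 'u');
    assert(upper || uplo == 'L' || uplo == 'l');

    for (int i = 0; i < m; ++i)
        y[i] = x[i];                      /* y holds the running residual */

    for (int s = 0; s < m; ++s) {
        /* A upper means A^T is lower: go forward; otherwise go backward */
        int i = upper ? s : m - 1 - s;
        if (!unit) {
            /* the diagonal must be stored for a non-unit solve */
            for (int k = ia[i]; k < ia[i + 1]; ++k) {
                if (ja[k] == i) {
                    y[i] /= a[k];
                    break;
                }
            }
        }
        /* row i of A is column i of A^T: update the unknowns still open */
        for (int k = ia[i]; k < ia[i + 1]; ++k) {
            int j = ja[k];
            if (upper ? (j > i) : (j < i))
                y[j] -= a[k] * y[i];
        }
    }
}
```

The loop order is reversed relative to the non-transposed solve: an upper triangular A leads to forward substitution because A^T is lower triangular.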
Input Parameters
uplo Specifies whether the upper or lower triangle of the matrix A is used.
If uplo = 'U' or 'u', then the upper triangle of the matrix A is used.
If uplo = 'L' or 'l', then the lower triangle of the matrix A is used.
NOTE
The non-zero elements of the given row of the matrix must be
stored in the same order as they appear in the row (from left to
right).
No diagonal element can be omitted from a sparse storage if the solver
is called with the non-unit indicator.
ia Array of length m+1, containing indices of elements in the array a, such that
ia[i] is the index in the array a of the first non-zero element from the row
i. The value of the last element ia[m] is equal to the number of non-zeros.
Refer to rowIndex array description in Sparse Matrix Storage Formats for
more details.
ja Array containing the column indices for each non-zero element of the
matrix A.
Its length is equal to the length of the array a. Refer to columns array
description in Sparse Matrix Storage Formats for more details.
NOTE
Column indices must be sorted in increasing order for each row.
x Array, size is m.
Output Parameters
mkl_cspblas_?bsrtrsv
Triangular solver with simplified interface for a sparse
matrix stored in the BSR format (3-array variation)
with zero-based indexing (deprecated).
Syntax
void mkl_cspblas_sbsrtrsv (const char *uplo , const char *transa , const char *diag ,
const MKL_INT *m , const MKL_INT *lb , const float *a , const MKL_INT *ia , const
MKL_INT *ja , const float *x , float *y );
void mkl_cspblas_dbsrtrsv (const char *uplo , const char *transa , const char *diag ,
const MKL_INT *m , const MKL_INT *lb , const double *a , const MKL_INT *ia , const
MKL_INT *ja , const double *x , double *y );
void mkl_cspblas_cbsrtrsv (const char *uplo , const char *transa , const char *diag ,
const MKL_INT *m , const MKL_INT *lb , const MKL_Complex8 *a , const MKL_INT *ia ,
const MKL_INT *ja , const MKL_Complex8 *x , MKL_Complex8 *y );
void mkl_cspblas_zbsrtrsv (const char *uplo , const char *transa , const char *diag ,
const MKL_INT *m , const MKL_INT *lb , const MKL_Complex16 *a , const MKL_INT *ia ,
const MKL_INT *ja , const MKL_Complex16 *x , MKL_Complex16 *y );
Include Files
• mkl.h
Description
This routine is deprecated. Use mkl_sparse_?_trsv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_cspblas_?bsrtrsv routine solves a system of linear equations with matrix-vector operations for a
sparse matrix stored in the BSR format (3-array variation) with zero-based indexing:
A*y = x
or
A^T*y = x,
where:
x and y are vectors,
A is a sparse upper or lower triangular matrix with unit or non-unit main diagonal, A^T is the transpose of A.
NOTE
This routine supports only zero-based indexing of the input arrays.
Input Parameters
If uplo = 'L' or 'l', then the lower triangle of the matrix A is used.
NOTE
The non-zero elements of the given row of the matrix must be
stored in the same order as they appear in the row (from left to
right).
No diagonal element can be omitted from a sparse storage if the solver
is called with the non-unit indicator.
ja Array containing the column indices for each non-zero block in the matrix A.
Its length is equal to the number of non-zero blocks of the matrix A. Refer
to columns array description in BSR Format for more details.
Output Parameters
mkl_cspblas_?cootrsv
Triangular solvers with simplified interface for a sparse
matrix in the coordinate format with zero-based
indexing (deprecated).
Syntax
void mkl_cspblas_scootrsv (const char *uplo , const char *transa , const char *diag ,
const MKL_INT *m , const float *val , const MKL_INT *rowind , const MKL_INT *colind ,
const MKL_INT *nnz , const float *x , float *y );
void mkl_cspblas_dcootrsv (const char *uplo , const char *transa , const char *diag ,
const MKL_INT *m , const double *val , const MKL_INT *rowind , const MKL_INT *colind ,
const MKL_INT *nnz , const double *x , double *y );
void mkl_cspblas_ccootrsv (const char *uplo , const char *transa , const char *diag ,
const MKL_INT *m , const MKL_Complex8 *val , const MKL_INT *rowind , const MKL_INT
*colind , const MKL_INT *nnz , const MKL_Complex8 *x , MKL_Complex8 *y );
void mkl_cspblas_zcootrsv (const char *uplo , const char *transa , const char *diag ,
const MKL_INT *m , const MKL_Complex16 *val , const MKL_INT *rowind , const MKL_INT
*colind , const MKL_INT *nnz , const MKL_Complex16 *x , MKL_Complex16 *y );
Include Files
• mkl.h
Description
This routine is deprecated. Use mkl_sparse_?_trsv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_cspblas_?cootrsv routine solves a system of linear equations with matrix-vector operations for a
sparse matrix stored in the coordinate format with zero-based indexing:
A*y = x
or
A^T*y = x,
where:
x and y are vectors,
A is a sparse upper or lower triangular matrix with unit or non-unit main diagonal, A^T is the transpose of A.
NOTE
This routine supports only zero-based indexing of the input arrays.
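The solve A*y = x can be sketched in plain C for one case: lower triangle, no transposition, non-unit diagonal. This is an illustrative reference loop, not the oneMKL implementation. The function name coo_lower_solve and its return codes are assumptions made for this example. Because coordinate-format entries may appear in arbitrary order, the sketch rescans all nnz entries for every row, which is simple but slow.

```c
/* Hypothetical helper, not an MKL routine: solves L*y = x where L is the
   lower triangle of a matrix given in zero-based coordinate format with a
   non-unit diagonal (every diagonal entry must be stored).
   Returns 0 on success, -1 if a diagonal entry is missing or zero. */
int coo_lower_solve(int m, const double *val, const int *rowind,
                    const int *colind, int nnz, const double *x, double *y)
{
    for (int i = 0; i < m; ++i) {
        double sum = x[i];
        double diag = 0.0;
        /* Entries are unordered, so scan all of them for row i. */
        for (int p = 0; p < nnz; ++p) {
            if (rowind[p] != i)
                continue;
            if (colind[p] < i)
                sum -= val[p] * y[colind[p]];   /* already solved unknowns */
            else if (colind[p] == i)
                diag += val[p];
            /* colind[p] > i belongs to the upper triangle and is ignored */
        }
        if (diag == 0.0)
            return -1;
        y[i] = sum / diag;
    }
    return 0;
}
```

For the matrix with rows (2, 0) and (1, 4) and x = (2, 9), forward substitution gives y = (1, 2).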
Input Parameters
uplo Specifies whether the upper or lower triangle of the matrix A is considered.
If uplo = 'U' or 'u', then the upper triangle of the matrix A is used.
If uplo = 'L' or 'l', then the lower triangle of the matrix A is used.
val Array of length nnz, contains non-zero elements of the matrix A in
arbitrary order.
Refer to values array description in Coordinate Format for more details.
rowind Array of length nnz, contains the row indices for each non-zero element of
the matrix A.
Refer to rows array description in Coordinate Format for more details.
colind Array of length nnz, contains the column indices for each non-zero element
of the matrix A. Refer to columns array description in Coordinate Format
for more details.
Refer to nnz description in Coordinate Format for more details.
x Array, size is m.
Output Parameters
mkl_?csrmv
Computes matrix-vector product of a sparse matrix
stored in the CSR format (deprecated).
Syntax
void mkl_scsrmv (const char *transa , const MKL_INT *m , const MKL_INT *k , const float
*alpha , const char *matdescra , const float *val , const MKL_INT *indx , const MKL_INT
*pntrb , const MKL_INT *pntre , const float *x , const float *beta , float *y );
void mkl_dcsrmv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
double *alpha , const char *matdescra , const double *val , const MKL_INT *indx , const
MKL_INT *pntrb , const MKL_INT *pntre , const double *x , const double *beta , double
*y );
void mkl_ccsrmv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8 *val , const MKL_INT
*indx , const MKL_INT *pntrb , const MKL_INT *pntre , const MKL_Complex8 *x , const
MKL_Complex8 *beta , MKL_Complex8 *y );
void mkl_zcsrmv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16 *val , const MKL_INT
*indx , const MKL_INT *pntrb , const MKL_INT *pntre , const MKL_Complex16 *x , const
MKL_Complex16 *beta , MKL_Complex16 *y );
Include Files
• mkl.h
Description
This routine is deprecated. Use mkl_sparse_?_mv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?csrmv routine performs a matrix-vector operation defined as
y := alpha*A*x + beta*y
or
y := alpha*A^T*x + beta*y,
where:
alpha and beta are scalars,
x and y are vectors,
A is an m-by-k sparse matrix in the CSR format, A^T is the transpose of A.
NOTE
This routine supports a CSR format both with one-based indexing and zero-based indexing.
Input Parameters
matdescra Array of six elements, specifies properties of the matrix used for operation.
Only the first four array elements are used; their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.
indx For one-based indexing, array containing the column indices plus one for
each non-zero element of the matrix A. For zero-based indexing, array
containing the column indices for each non-zero element of the matrix A.
Its length is equal to length of the val array.
pntrb This array contains row indices, such that pntrb[i] - pntrb[0] is the
first index of row i in the arrays val and indx.
pntre This array contains row indices, such that pntre[i] - pntrb[0] - 1 is the
last index of row i in the arrays val and indx.
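The way pntrb and pntre delimit each row can be sketched in plain C. This is an illustrative reference loop for the non-transposed case, not the oneMKL implementation. The function name csr_mv_ref and the base argument (0 or 1, standing in for the indexing choice that the routine itself takes from matdescra) are assumptions made for this example.

```c
/* Hypothetical reference loop, not the MKL implementation:
   y := alpha*A*x + beta*y for an m-row CSR matrix given by the four
   arrays val, indx, pntrb, pntre. base is 0 for zero-based index
   values and 1 for one-based index values. */
void csr_mv_ref(int m, double alpha, const double *val, const int *indx,
                const int *pntrb, const int *pntre, int base,
                const double *x, double beta, double *y)
{
    for (int i = 0; i < m; ++i) {
        double acc = 0.0;
        /* Row i occupies positions pntrb[i]-pntrb[0] through
           pntre[i]-pntrb[0]-1 of val and indx. */
        for (int p = pntrb[i] - pntrb[0]; p < pntre[i] - pntrb[0]; ++p)
            acc += val[p] * x[indx[p] - base];
        y[i] = alpha * acc + beta * y[i];
    }
}
```

Subtracting pntrb[0] is what lets the same loop serve both indexing conventions: only the column indices in indx need the explicit base correction.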
Output Parameters
mkl_?bsrmv
Computes matrix-vector product of a sparse matrix
stored in the BSR format (deprecated).
Syntax
void mkl_sbsrmv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
MKL_INT *lb , const float *alpha , const char *matdescra , const float *val , const
MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const float *x , const
float *beta , float *y );
void mkl_dbsrmv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
MKL_INT *lb , const double *alpha , const char *matdescra , const double *val , const
MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const double *x , const
double *beta , double *y );
void mkl_cbsrmv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
MKL_INT *lb , const MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8
*val , const MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const
MKL_Complex8 *x , const MKL_Complex8 *beta , MKL_Complex8 *y );
void mkl_zbsrmv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
MKL_INT *lb , const MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16
*val , const MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const
MKL_Complex16 *x , const MKL_Complex16 *beta , MKL_Complex16 *y );
Include Files
• mkl.h
Description
This routine is deprecated. Use mkl_sparse_?_mv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?bsrmv routine performs a matrix-vector operation defined as
y := alpha*A*x + beta*y
or
y := alpha*A^T*x + beta*y,
where:
alpha and beta are scalars,
x and y are vectors,
A is an m-by-k block sparse matrix in the BSR format, A^T is the transpose of A.
NOTE
This routine supports a BSR format both with one-based indexing and zero-based indexing.
Input Parameters
matdescra Array of six elements, specifies properties of the matrix used for operation.
Only the first four array elements are used; their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.
val Array containing elements of non-zero blocks of the matrix A. Its length is
equal to the number of non-zero blocks in the matrix A multiplied by lb*lb.
indx For one-based indexing, array containing the column indices plus one for
each non-zero block of the matrix A. For zero-based indexing, array
containing the column indices for each non-zero block of the matrix A.
Its length is equal to the number of non-zero blocks in the matrix A.
Refer to columns array description in BSR Format for more details.
pntrb This array contains row indices, such that pntrb[i] - pntrb[0] is the
first index of block row i in the array indx.
pntre For zero-based indexing this array contains row indices, such that
pntre[i] - pntrb[0] - 1 is the last index of block row i in the array
indx.
Refer to pointerE array description in BSR Format for more details.
x Array, size at least (k*lb) if transa = 'N' or 'n', and at least (m*lb)
otherwise. On entry, the array x must contain the vector x.
y Array, size at least (m*lb) if transa = 'N' or 'n', and at least (k*lb)
otherwise. On entry, the array y must contain the vector y.
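The block structure can be sketched in plain C for the non-transposed, zero-based case. This is an illustrative reference loop, not the oneMKL implementation. The function name bsr_mv_ref is hypothetical, and the sketch assumes that each lb-by-lb block is stored contiguously in row-major order inside val; check the BSR Format description for the layout that applies to your indexing mode.

```c
#include <stddef.h>

/* Hypothetical reference loop, not the MKL implementation:
   y := alpha*A*x + beta*y for a BSR matrix with m block rows and
   lb-by-lb blocks, zero-based indexing.
   Assumption: each block is stored row-major inside val. */
void bsr_mv_ref(int m, int lb, double alpha, const double *val,
                const int *indx, const int *pntrb, const int *pntre,
                const double *x, double beta, double *y)
{
    for (int i = 0; i < m; ++i) {                    /* block rows */
        for (int r = 0; r < lb; ++r)
            y[i * lb + r] *= beta;
        for (int p = pntrb[i] - pntrb[0]; p < pntre[i] - pntrb[0]; ++p) {
            const double *blk = val + (size_t)p * lb * lb;
            int j = indx[p];                         /* block column */
            for (int r = 0; r < lb; ++r)
                for (int c = 0; c < lb; ++c)
                    y[i * lb + r] += alpha * blk[r * lb + c] * x[j * lb + c];
        }
    }
}
```

The vector lengths match the sizes stated above: x is read at offsets up to k*lb - 1 and y is written at offsets up to m*lb - 1.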
Output Parameters
mkl_?cscmv
Computes matrix-vector product for a sparse matrix in
the CSC format (deprecated).
Syntax
void mkl_scscmv (const char *transa , const MKL_INT *m , const MKL_INT *k , const float
*alpha , const char *matdescra , const float *val , const MKL_INT *indx , const MKL_INT
*pntrb , const MKL_INT *pntre , const float *x , const float *beta , float *y );
void mkl_dcscmv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
double *alpha , const char *matdescra , const double *val , const MKL_INT *indx , const
MKL_INT *pntrb , const MKL_INT *pntre , const double *x , const double *beta , double
*y );
void mkl_ccscmv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8 *val , const MKL_INT
*indx , const MKL_INT *pntrb , const MKL_INT *pntre , const MKL_Complex8 *x , const
MKL_Complex8 *beta , MKL_Complex8 *y );
void mkl_zcscmv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16 *val , const MKL_INT
*indx , const MKL_INT *pntrb , const MKL_INT *pntre , const MKL_Complex16 *x , const
MKL_Complex16 *beta , MKL_Complex16 *y );
Include Files
• mkl.h
Description
This routine is deprecated. Use mkl_sparse_?_mv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?cscmv routine performs a matrix-vector operation defined as
y := alpha*A*x + beta*y
or
y := alpha*A^T*x + beta*y,
where:
alpha and beta are scalars,
x and y are vectors,
A is an m-by-k sparse matrix in compressed sparse column (CSC) format, A^T is the transpose of A.
NOTE
This routine supports CSC format both with one-based indexing and zero-based indexing.
Input Parameters
matdescra Array of six elements, specifies properties of the matrix used for operation.
Only the first four array elements are used; their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.
indx For one-based indexing, array containing the row indices plus one for
each non-zero element of the matrix A. For zero-based indexing, array
containing the row indices for each non-zero element of the matrix A.
Its length is equal to length of the val array.
pntre For one-based indexing this array contains column indices, such that
pntre[i] - pntrb[1] is the last index of column i in the arrays val and
indx.
For zero-based indexing this array contains column indices, such that
pntre[i] - pntrb[0] - 1 is the last index of column i in the arrays val
and indx.
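In the CSC format the pointer arrays delimit columns rather than rows, so the product is naturally written as a scatter down each column. The following plain-C sketch covers the non-transposed, zero-based case. It is illustrative only, not the oneMKL implementation, and the function name csc_mv_ref is hypothetical.

```c
/* Hypothetical reference loop, not the MKL implementation:
   y := alpha*A*x + beta*y for an m-by-k CSC matrix, zero-based indexing.
   indx holds the row index of each stored element. */
void csc_mv_ref(int m, int k, double alpha, const double *val,
                const int *indx, const int *pntrb, const int *pntre,
                const double *x, double beta, double *y)
{
    for (int i = 0; i < m; ++i)
        y[i] *= beta;
    for (int j = 0; j < k; ++j) {                    /* columns */
        /* Column j occupies positions pntrb[j]-pntrb[0] through
           pntre[j]-pntrb[0]-1 of val and indx. */
        for (int p = pntrb[j] - pntrb[0]; p < pntre[j] - pntrb[0]; ++p)
            y[indx[p]] += alpha * val[p] * x[j];
    }
}
```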
Output Parameters
mkl_?coomv
Computes matrix-vector product for a sparse matrix
in the coordinate format (deprecated).
Syntax
void mkl_scoomv (const char *transa , const MKL_INT *m , const MKL_INT *k , const float
*alpha , const char *matdescra , const float *val , const MKL_INT *rowind , const
MKL_INT *colind , const MKL_INT *nnz , const float *x , const float *beta , float *y );
void mkl_dcoomv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
double *alpha , const char *matdescra , const double *val , const MKL_INT *rowind ,
const MKL_INT *colind , const MKL_INT *nnz , const double *x , const double *beta ,
double *y );
void mkl_ccoomv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8 *val , const MKL_INT
*rowind , const MKL_INT *colind , const MKL_INT *nnz , const MKL_Complex8 *x , const
MKL_Complex8 *beta , MKL_Complex8 *y );
void mkl_zcoomv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16 *val , const MKL_INT
*rowind , const MKL_INT *colind , const MKL_INT *nnz , const MKL_Complex16 *x , const
MKL_Complex16 *beta , MKL_Complex16 *y );
Include Files
• mkl.h
Description
This routine is deprecated. Use mkl_sparse_?_mv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?coomv routine performs a matrix-vector operation defined as
y := alpha*A*x + beta*y
or
y := alpha*A^T*x + beta*y,
where:
alpha and beta are scalars,
x and y are vectors,
A is an m-by-k sparse matrix in the coordinate format, A^T is the transpose of A.
NOTE
This routine supports a coordinate format both with one-based indexing and zero-based indexing.
Input Parameters
matdescra Array of six elements, specifies properties of the matrix used for operation.
Only the first four array elements are used; their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.
val Array of length nnz, contains non-zero elements of the matrix A in
arbitrary order.
Refer to values array description in Coordinate Format for more details.
rowind For one-based indexing, contains the row indices plus one for each non-zero
element of the matrix A.
For zero-based indexing, contains the row indices for each non-zero
element of the matrix A.
Refer to rows array description in Coordinate Format for more details.
colind For one-based indexing, contains the column indices plus one for each
non-zero element of the matrix A.
For zero-based indexing, contains the column indices for each non-zero
element of the matrix A.
Refer to columns array description in Coordinate Format for more details.
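Because the coordinate format stores an explicit (row, column) pair for every element, both the plain and the transposed product reduce to a single pass over the nnz entries. The plain-C sketch below is illustrative only, not the oneMKL implementation. The function name coo_mv_ref, the integer trans flag, and the base argument (0 or 1) are assumptions made for this example.

```c
/* Hypothetical reference loop, not the MKL implementation:
   y := alpha*op(A)*x + beta*y for an m-by-k matrix in coordinate format.
   trans == 0 selects op(A) = A, any other value selects op(A) = A^T.
   base is 0 for zero-based index values and 1 for one-based values. */
void coo_mv_ref(int trans, int m, int k, double alpha, const double *val,
                const int *rowind, const int *colind, int nnz, int base,
                const double *x, double beta, double *y)
{
    int ylen = trans ? k : m;        /* y has k entries for A^T, m for A */
    for (int i = 0; i < ylen; ++i)
        y[i] *= beta;
    for (int p = 0; p < nnz; ++p) {
        int r = rowind[p] - base;
        int c = colind[p] - base;
        if (trans)
            y[c] += alpha * val[p] * x[r];   /* element (r,c) of A is (c,r) of A^T */
        else
            y[r] += alpha * val[p] * x[c];
    }
}
```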
Output Parameters
mkl_?csrsv
Solves a system of linear equations for a sparse
matrix in the CSR format (deprecated).
Syntax
void mkl_scsrsv (const char *transa , const MKL_INT *m , const float *alpha , const
char *matdescra , const float *val , const MKL_INT *indx , const MKL_INT *pntrb , const
MKL_INT *pntre , const float *x , float *y );
void mkl_dcsrsv (const char *transa , const MKL_INT *m , const double *alpha , const
char *matdescra , const double *val , const MKL_INT *indx , const MKL_INT *pntrb ,
const MKL_INT *pntre , const double *x , double *y );
void mkl_ccsrsv (const char *transa , const MKL_INT *m , const MKL_Complex8 *alpha ,
const char *matdescra , const MKL_Complex8 *val , const MKL_INT *indx , const MKL_INT
*pntrb , const MKL_INT *pntre , const MKL_Complex8 *x , MKL_Complex8 *y );
void mkl_zcsrsv (const char *transa , const MKL_INT *m , const MKL_Complex16 *alpha ,
const char *matdescra , const MKL_Complex16 *val , const MKL_INT *indx , const MKL_INT
*pntrb , const MKL_INT *pntre , const MKL_Complex16 *x , MKL_Complex16 *y );
Include Files
• mkl.h
Description
This routine is deprecated. Use mkl_sparse_?_trsv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?csrsv routine solves a system of linear equations with matrix-vector operations for a sparse
matrix in the CSR format:
y := alpha*inv(A)*x
or
y := alpha*inv(A^T)*x,
where:
alpha is scalar, x and y are vectors, A is a sparse upper or lower triangular matrix with unit or non-unit main
diagonal, A^T is the transpose of A.
NOTE
This routine supports a CSR format both with one-based indexing and zero-based indexing.
Input Parameters
matdescra Array of six elements, specifies properties of the matrix used for operation.
Only the first four array elements are used; their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.
NOTE
The non-zero elements of the given row of the matrix must be
stored in the same order as they appear in the row (from left to
right).
No diagonal element can be omitted from a sparse storage if the solver
is called with the non-unit indicator.
indx For one-based indexing, array containing the column indices plus one for
each non-zero element of the matrix A. For zero-based indexing, array
containing the column indices for each non-zero element of the matrix A.
Its length is equal to length of the val array.
NOTE
Column indices must be sorted in increasing order for each row.
pntrb This array contains row indices, such that pntrb[i] - pntrb[0] is the
first index of row i in the arrays val and indx.
x On entry, the array x must contain the vector x. The elements are accessed
with unit increment.
y On entry, the array y must contain the vector y. The elements are accessed
with unit increment.
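The operation y := alpha*inv(A)*x is equivalent to solving A*y = alpha*x, which for a lower triangular CSR matrix is a row-by-row forward substitution. The plain-C sketch below covers the lower triangular, non-transposed, non-unit diagonal, zero-based case. It is illustrative only, not the oneMKL implementation. The function name csr_lower_solve and its return codes are assumptions made for this example.

```c
/* Hypothetical helper, not an MKL routine: computes y := alpha*inv(L)*x,
   where L is the lower triangle of a zero-based CSR matrix with a
   non-unit diagonal. Returns 0 on success, -1 if a diagonal entry is
   missing or zero. */
int csr_lower_solve(int m, double alpha, const double *val, const int *indx,
                    const int *pntrb, const int *pntre,
                    const double *x, double *y)
{
    for (int i = 0; i < m; ++i) {
        double sum = alpha * x[i];
        double diag = 0.0;
        for (int p = pntrb[i] - pntrb[0]; p < pntre[i] - pntrb[0]; ++p) {
            int j = indx[p];
            if (j < i)
                sum -= val[p] * y[j];   /* y[j] was computed in an earlier row */
            else if (j == i)
                diag = val[p];
            /* j > i belongs to the upper triangle and is ignored */
        }
        if (diag == 0.0)
            return -1;
        y[i] = sum / diag;
    }
    return 0;
}
```

The check on diag mirrors the requirement above that no diagonal element may be omitted when the non-unit indicator is used.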
Output Parameters
mkl_?bsrsv
Solves a system of linear equations for a sparse
matrix in the BSR format (deprecated).
Syntax
void mkl_sbsrsv (const char *transa , const MKL_INT *m , const MKL_INT *lb , const
float *alpha , const char *matdescra , const float *val , const MKL_INT *indx , const
MKL_INT *pntrb , const MKL_INT *pntre , const float *x , float *y );
void mkl_dbsrsv (const char *transa , const MKL_INT *m , const MKL_INT *lb , const
double *alpha , const char *matdescra , const double *val , const MKL_INT *indx , const
MKL_INT *pntrb , const MKL_INT *pntre , const double *x , double *y );
void mkl_cbsrsv (const char *transa , const MKL_INT *m , const MKL_INT *lb , const
MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8 *val , const MKL_INT
*indx , const MKL_INT *pntrb , const MKL_INT *pntre , const MKL_Complex8 *x ,
MKL_Complex8 *y );
void mkl_zbsrsv (const char *transa , const MKL_INT *m , const MKL_INT *lb , const
MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16 *val , const MKL_INT
*indx , const MKL_INT *pntrb , const MKL_INT *pntre , const MKL_Complex16 *x ,
MKL_Complex16 *y );
Include Files
• mkl.h
Description
This routine is deprecated. Use mkl_sparse_?_trsv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?bsrsv routine solves a system of linear equations with matrix-vector operations for a sparse
matrix in the BSR format:
y := alpha*inv(A)*x
or
y := alpha*inv(A^T)*x,
where:
alpha is scalar, x and y are vectors, A is a sparse upper or lower triangular matrix with unit or non-unit main
diagonal, A^T is the transpose of A.
NOTE
This routine supports a BSR format both with one-based indexing and zero-based indexing.
Input Parameters
matdescra Array of six elements, specifies properties of the matrix used for operation.
Only the first four array elements are used; their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.
val Array containing elements of non-zero blocks of the matrix A. Its length is
equal to the number of non-zero blocks in the matrix A multiplied by lb*lb.
Refer to the values array description in BSR Format for more details.
NOTE
The non-zero elements of the given row of the matrix must be
stored in the same order as they appear in the row (from left to
right).
No diagonal element can be omitted from a sparse storage if the solver
is called with the non-unit indicator.
indx For one-based indexing, array containing the column indices plus one for
each non-zero block of the matrix A. For zero-based indexing, array
containing the column indices for each non-zero block of the matrix A.
Its length is equal to the number of non-zero blocks in the matrix A.
Refer to the columns array description in BSR Format for more details.
pntrb This array contains row indices, such that pntrb[i] - pntrb[0] is the
first index of block row i in the array indx.
pntre For one-based indexing this array contains row indices, such that pntre[i]
- pntrb[1] is the last index of block row i in the array indx.
For zero-based indexing this array contains row indices, such that
pntre[i] - pntrb[0] - 1 is the last index of block row i in the array
indx.
Refer to pointerE array description in BSR Format for more details.
x On entry, the array x must contain the vector x. The elements are accessed
with unit increment.
y On entry, the array y must contain the vector y. The elements are accessed
with unit increment.
Output Parameters
mkl_?cscsv
Solves a system of linear equations for a sparse
matrix in the CSC format (deprecated).
Syntax
void mkl_scscsv (const char *transa , const MKL_INT *m , const float *alpha , const
char *matdescra , const float *val , const MKL_INT *indx , const MKL_INT *pntrb , const
MKL_INT *pntre , const float *x , float *y );
void mkl_dcscsv (const char *transa , const MKL_INT *m , const double *alpha , const
char *matdescra , const double *val , const MKL_INT *indx , const MKL_INT *pntrb ,
const MKL_INT *pntre , const double *x , double *y );
void mkl_ccscsv (const char *transa , const MKL_INT *m , const MKL_Complex8 *alpha ,
const char *matdescra , const MKL_Complex8 *val , const MKL_INT *indx , const MKL_INT
*pntrb , const MKL_INT *pntre , const MKL_Complex8 *x , MKL_Complex8 *y );
void mkl_zcscsv (const char *transa , const MKL_INT *m , const MKL_Complex16 *alpha ,
const char *matdescra , const MKL_Complex16 *val , const MKL_INT *indx , const MKL_INT
*pntrb , const MKL_INT *pntre , const MKL_Complex16 *x , MKL_Complex16 *y );
Include Files
• mkl.h
Description
This routine is deprecated. Use mkl_sparse_?_trsv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?cscsv routine solves a system of linear equations with matrix-vector operations for a sparse
matrix in the CSC format:
y := alpha*inv(A)*x
or
y := alpha*inv(A^T)*x,
where:
alpha is scalar, x and y are vectors, A is a sparse upper or lower triangular matrix with unit or non-unit main
diagonal, A^T is the transpose of A.
NOTE
This routine supports a CSC format both with one-based indexing and zero-based indexing.
Input Parameters
matdescra Array of six elements, specifies properties of the matrix used for operation.
Only the first four array elements are used; their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.
NOTE
The non-zero elements of the given row of the matrix must be
stored in the same order as they appear in the row (from left to
right).
No diagonal element can be omitted from a sparse storage if the solver
is called with the non-unit indicator.
indx For one-based indexing, array containing the row indices plus one for each
non-zero element of the matrix A.
For zero-based indexing, array containing the row indices for each non-zero
element of the matrix A.
Its length is equal to length of the val array.
NOTE
Row indices must be sorted in increasing order for each column.
pntrb This array contains column indices, such that pntrb[i] - pntrb[0] is the
first index of column i in the arrays val and indx.
x On entry, the array x must contain the vector x. The elements are accessed
with unit increment.
y On entry, the array y must contain the vector y. The elements are accessed
with unit increment.
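With column storage, forward substitution is most naturally written column by column: once y[j] is known, its contribution is subtracted from all later entries. The plain-C sketch below covers the lower triangular, non-transposed, non-unit diagonal, zero-based case and relies on the row indices being sorted within each column, as required above. It is illustrative only, not the oneMKL implementation. The function name csc_lower_solve and its return codes are assumptions made for this example.

```c
/* Hypothetical helper, not an MKL routine: computes y := alpha*inv(L)*x,
   where L is the lower triangle of a zero-based CSC matrix with a
   non-unit diagonal and row indices sorted within each column.
   Returns 0 on success, -1 if a diagonal entry is missing or zero. */
int csc_lower_solve(int m, double alpha, const double *val, const int *indx,
                    const int *pntrb, const int *pntre,
                    const double *x, double *y)
{
    for (int i = 0; i < m; ++i)
        y[i] = alpha * x[i];
    for (int j = 0; j < m; ++j) {
        int p = pntrb[j] - pntrb[0];
        int end = pntre[j] - pntrb[0];
        while (p < end && indx[p] < j)    /* skip upper-triangle entries */
            ++p;
        if (p == end || indx[p] != j || val[p] == 0.0)
            return -1;
        y[j] /= val[p];                   /* y[j] is now final */
        for (++p; p < end; ++p)
            y[indx[p]] -= val[p] * y[j];  /* update the rows below j */
    }
    return 0;
}
```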
Output Parameters
mkl_?coosv
Solves a system of linear equations for a sparse
matrix in the coordinate format (deprecated).
Syntax
void mkl_scoosv (const char *transa , const MKL_INT *m , const float *alpha , const
char *matdescra , const float *val , const MKL_INT *rowind , const MKL_INT *colind ,
const MKL_INT *nnz , const float *x , float *y );
void mkl_dcoosv (const char *transa , const MKL_INT *m , const double *alpha , const
char *matdescra , const double *val , const MKL_INT *rowind , const MKL_INT *colind ,
const MKL_INT *nnz , const double *x , double *y );
void mkl_ccoosv (const char *transa , const MKL_INT *m , const MKL_Complex8 *alpha ,
const char *matdescra , const MKL_Complex8 *val , const MKL_INT *rowind , const MKL_INT
*colind , const MKL_INT *nnz , const MKL_Complex8 *x , MKL_Complex8 *y );
void mkl_zcoosv (const char *transa , const MKL_INT *m , const MKL_Complex16 *alpha ,
const char *matdescra , const MKL_Complex16 *val , const MKL_INT *rowind , const
MKL_INT *colind , const MKL_INT *nnz , const MKL_Complex16 *x , MKL_Complex16 *y );
Include Files
• mkl.h
Description
This routine is deprecated. Use mkl_sparse_?_trsv from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?coosv routine solves a system of linear equations with matrix-vector operations for a sparse
matrix in the coordinate format:
y := alpha*inv(A)*x
or
y := alpha*inv(A^T)*x,
where:
alpha is scalar, x and y are vectors, A is a sparse upper or lower triangular matrix with unit or non-unit main
diagonal, A^T is the transpose of A.
NOTE
This routine supports a coordinate format both with one-based indexing and zero-based indexing.
Input Parameters
matdescra Array of six elements, specifies properties of the matrix used for operation.
Only the first four array elements are used; their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.
val Array of length nnz, contains non-zero elements of the matrix A in
arbitrary order.
Refer to values array description in Coordinate Format for more details.
rowind For one-based indexing, contains the row indices plus one for each non-zero
element of the matrix A.
For zero-based indexing, contains the row indices for each non-zero
element of the matrix A.
Refer to rows array description in Coordinate Format for more details.
colind For one-based indexing, contains the column indices plus one for each
non-zero element of the matrix A.
For zero-based indexing, contains the column indices for each non-zero
element of the matrix A.
Refer to columns array description in Coordinate Format for more details.
x On entry, the array x must contain the vector x. The elements are accessed
with unit increment.
y On entry, the array y must contain the vector y. The elements are accessed
with unit increment.
Output Parameters
mkl_?csrmm
Computes matrix-matrix product of a sparse matrix
stored in the CSR format (deprecated).
Syntax
void mkl_scsrmm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const float *alpha , const char *matdescra , const float *val , const
MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const float *b , const
MKL_INT *ldb , const float *beta , float *c , const MKL_INT *ldc );
void mkl_dcsrmm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const double *alpha , const char *matdescra , const double *val , const
MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const double *b , const
MKL_INT *ldb , const double *beta , double *c , const MKL_INT *ldc );
void mkl_ccsrmm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8
*val , const MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const
MKL_Complex8 *b , const MKL_INT *ldb , const MKL_Complex8 *beta , MKL_Complex8 *c ,
const MKL_INT *ldc );
void mkl_zcsrmm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16
*val , const MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const
MKL_Complex16 *b , const MKL_INT *ldb , const MKL_Complex16 *beta , MKL_Complex16 *c ,
const MKL_INT *ldc );
Include Files
• mkl.h
Description
This routine is deprecated. Use mkl_sparse_?_mm from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?csrmm routine performs a matrix-matrix operation defined as
C := alpha*A*B + beta*C
or
C := alpha*A^T*B + beta*C
or
C := alpha*A^H*B + beta*C,
where:
alpha and beta are scalars,
B and C are dense matrices, A is an m-by-k sparse matrix in compressed sparse row (CSR) format, A^T is the
transpose of A, and A^H is the conjugate transpose of A.
NOTE
This routine supports a CSR format both with one-based indexing and zero-based indexing.
Input Parameters
matdescra Array of six elements, specifies properties of the matrix used for operation.
Only the first four array elements are used; their possible values are given in
Table "Possible Values of the Parameter matdescra (descra)". Possible
combinations of element values of this parameter are given in Table
"Possible Combinations of Element Values of the Parameter matdescra".
indx For one-based indexing, array containing the column indices plus one for
each non-zero element of the matrix A.
For zero-based indexing, array containing the column indices for each non-
zero element of the matrix A.
Its length is equal to length of the val array.
pntrb This array contains row indices, such that pntrb[i] - pntrb[0] is the
first index of row i in the arrays val and indx.
b Array, size ldb by at least n for non-transposed matrix A and at least m for
transposed for one-based indexing, and (at least k for non-transposed
matrix A and at least m for transposed, ldb) for zero-based indexing.
On entry with transa='N' or 'n', the leading k-by-n part of the array b
must contain the matrix B, otherwise the leading m-by-n part of the array b
must contain the matrix B.
ldb Specifies the leading dimension of b for one-based indexing, and the second
dimension of b for zero-based indexing, as declared in the calling
(sub)program.
c Array, size ldc by n for one-based indexing, and (m, ldc) for zero-based
indexing.
On entry with transa='N' or 'n', the leading m-by-n part of the array c must contain the matrix C,
otherwise the leading k-by-n part of the array c must contain the matrix C.
ldc Specifies the leading dimension of c for one-based indexing, and the second
dimension of c for zero-based indexing, as declared in the calling
(sub)program.
Output Parameters
mkl_?bsrmm
Computes matrix-matrix product of a sparse matrix
stored in the BSR format (deprecated).
Syntax
void mkl_sbsrmm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const MKL_INT *lb , const float *alpha , const char *matdescra , const
float *val , const MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const
float *b , const MKL_INT *ldb , const float *beta , float *c , const MKL_INT *ldc );
void mkl_dbsrmm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const MKL_INT *lb , const double *alpha , const char *matdescra , const
double *val , const MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const
double *b , const MKL_INT *ldb , const double *beta , double *c , const MKL_INT *ldc );
void mkl_cbsrmm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const MKL_INT *lb , const MKL_Complex8 *alpha , const char *matdescra ,
const MKL_Complex8 *val , const MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT
*pntre , const MKL_Complex8 *b , const MKL_INT *ldb , const MKL_Complex8 *beta ,
MKL_Complex8 *c , const MKL_INT *ldc );
void mkl_zbsrmm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const MKL_INT *lb , const MKL_Complex16 *alpha , const char *matdescra ,
const MKL_Complex16 *val , const MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT
*pntre , const MKL_Complex16 *b , const MKL_INT *ldb , const MKL_Complex16 *beta ,
MKL_Complex16 *c , const MKL_INT *ldc );
Include Files
• mkl.h
Description
This routine is deprecated. Use mkl_sparse_?_mm from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?bsrmm routine performs a matrix-matrix operation defined as
C := alpha*A*B + beta*C
or
C := alpha*A^T*B + beta*C
or
C := alpha*A^H*B + beta*C,
where:
alpha and beta are scalars,
B and C are dense matrices, A is an m-by-k sparse matrix in block sparse row (BSR) format, A^T is the
transpose of A, and A^H is the conjugate transpose of A.
NOTE
This routine supports a BSR format both with one-based indexing and zero-based indexing.
Input Parameters
matdescra Array of six elements that specifies properties of the matrix used for the operation.
Only the first four array elements are used; their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.
val Array containing elements of non-zero blocks of the matrix A. Its length is
equal to the number of non-zero blocks in the matrix A multiplied by lb*lb.
Refer to the values array description in BSR Format for more details.
indx For one-based indexing, array containing the column indices plus one for
each non-zero block in the matrix A.
For zero-based indexing, array containing the column indices for each non-
zero block in the matrix A.
Its length is equal to the number of non-zero blocks in the matrix A. Refer
to the columns array description in BSR Format for more details.
pntrb This array contains row indices, such that pntrb[i] - pntrb[0] is the
first index of block row i in the array indx.
b Array, size ldb by at least n for non-transposed matrix A and at least m for
transposed for one-based indexing, and (at least k for non-transposed
matrix A and at least m for transposed, ldb) for zero-based indexing.
On entry with transa='N' or 'n', the leading k-by-n block part of the
array b must contain the matrix B, otherwise the leading m-by-n block part
of the array b must contain the matrix B.
ldb Specifies the leading dimension (in blocks) of b as declared in the calling
(sub)program.
c Array, size ldc* n for one-based indexing, size k* ldc for zero-based
indexing.
On entry with transa='N' or 'n', the leading m-by-n block part of the array c
must contain the matrix C, otherwise the leading k-by-n block part of the
array c must contain the matrix C.
ldc Specifies the leading dimension (in blocks) of c as declared in the calling
(sub)program.
Output Parameters
mkl_?cscmm
Computes matrix-matrix product of a sparse matrix
stored in the CSC format (deprecated).
Syntax
void mkl_scscmm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const float *alpha , const char *matdescra , const float *val , const
MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const float *b , const
MKL_INT *ldb , const float *beta , float *c , const MKL_INT *ldc );
void mkl_dcscmm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const double *alpha , const char *matdescra , const double *val , const
MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const double *b , const
MKL_INT *ldb , const double *beta , double *c , const MKL_INT *ldc );
void mkl_ccscmm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8
*val , const MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const
MKL_Complex8 *b , const MKL_INT *ldb , const MKL_Complex8 *beta , MKL_Complex8 *c ,
const MKL_INT *ldc );
void mkl_zcscmm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16
*val , const MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const
MKL_Complex16 *b , const MKL_INT *ldb , const MKL_Complex16 *beta , MKL_Complex16 *c ,
const MKL_INT *ldc );
Include Files
• mkl.h
Description
This routine is deprecated. Use mkl_sparse_?_mm from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?cscmm routine performs a matrix-matrix operation defined as
C := alpha*A*B + beta*C
or
C := alpha*A^T*B + beta*C
or
C := alpha*A^H*B + beta*C,
where:
alpha and beta are scalars,
B and C are dense matrices, A is an m-by-k sparse matrix in compressed sparse column (CSC) format, A^T is
the transpose of A, and A^H is the conjugate transpose of A.
NOTE
This routine supports CSC format both with one-based indexing and zero-based indexing.
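The same operation over CSC storage walks the columns of A and scatters into rows of C. The plain-C sketch below covers only the non-transposed, zero-based case. It is not the oneMKL implementation, and the name ref_cscmm_n, the int index type, and the row-major layout of b and c are assumptions chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference helper (not an MKL routine):
   C := alpha*A*B + beta*C, where A is m-by-k in zero-based CSC with
   column-begin (pntrb) and column-end (pntre) arrays. B (k-by-n) and
   C (m-by-n) are dense and row-major. */
void ref_cscmm_n(int m, int n, int k, double alpha,
                 const double *val, const int *indx,
                 const int *pntrb, const int *pntre,
                 const double *b, int ldb,
                 double beta, double *c, int ldc)
{
    assert(ldb >= n && ldc >= n);
    /* Scale C once up front; beta == 0 overwrites it. */
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j) {
            double *cij = &c[(size_t)i * ldc + j];
            *cij = (beta == 0.0) ? 0.0 : beta * (*cij);
        }
    for (int col = 0; col < k; ++col) {
        /* Entries of column col occupy [first, last) of val/indx. */
        int first = pntrb[col] - pntrb[0];
        int last = pntre[col] - pntrb[0];
        const double *brow = b + (size_t)col * ldb;
        for (int p = first; p < last; ++p) {
            double a = alpha * val[p];
            double *crow = c + (size_t)indx[p] * ldc; /* indx holds row indices */
            for (int j = 0; j < n; ++j)
                crow[j] += a * brow[j];
        }
    }
}
```

Because the CSC arrays of A are the CSR arrays of its transpose, swapping the roles of the row and column index in this loop gives the transposed product.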
Input Parameters
matdescra Array of six elements that specifies properties of the matrix used for the operation.
Only the first four array elements are used; their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.
indx For one-based indexing, array containing the row indices plus one for each
non-zero element of the matrix A.
For zero-based indexing, array containing the row indices for each non-
zero element of the matrix A.
Its length is equal to the length of the val array.
pntrb This array contains column indices, such that pntrb[i] - pntrb[0] is the
first index of column i in the arrays val and indx.
b Array, size ldb by at least n for non-transposed matrix A and at least m for
transposed for one-based indexing, and (at least k for non-transposed
matrix A and at least m for transposed, ldb) for zero-based indexing.
On entry with transa = 'N' or 'n', the leading k-by-n part of the array b
must contain the matrix B, otherwise the leading m-by-n part of the array b
must contain the matrix B.
ldb Specifies the leading dimension of b for one-based indexing, and the second
dimension of b for zero-based indexing, as declared in the calling
(sub)program.
c Array, size ldc by n for one-based indexing, and (m, ldc) for zero-based
indexing.
On entry with transa = 'N' or 'n', the leading m-by-n part of the array c must contain the matrix C,
otherwise the leading k-by-n part of the array c must contain the matrix C.
ldc Specifies the leading dimension of c for one-based indexing, and the second
dimension of c for zero-based indexing, as declared in the calling
(sub)program.
Output Parameters
mkl_?coomm
Computes matrix-matrix product of a sparse matrix
stored in the coordinate format (deprecated).
Syntax
void mkl_scoomm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const float *alpha , const char *matdescra , const float *val , const
MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const float *b , const
MKL_INT *ldb , const float *beta , float *c , const MKL_INT *ldc );
void mkl_dcoomm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const double *alpha , const char *matdescra , const double *val , const
MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const double *b , const
MKL_INT *ldb , const double *beta , double *c , const MKL_INT *ldc );
void mkl_ccoomm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8
*val , const MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const
MKL_Complex8 *b , const MKL_INT *ldb , const MKL_Complex8 *beta , MKL_Complex8 *c ,
const MKL_INT *ldc );
void mkl_zcoomm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16
*val , const MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const
MKL_Complex16 *b , const MKL_INT *ldb , const MKL_Complex16 *beta , MKL_Complex16 *c ,
const MKL_INT *ldc );
Include Files
• mkl.h
Description
This routine is deprecated. Use mkl_sparse_?_mm from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?coomm routine performs a matrix-matrix operation defined as
C := alpha*A*B + beta*C
or
C := alpha*A^T*B + beta*C
or
C := alpha*A^H*B + beta*C,
where:
alpha and beta are scalars,
B and C are dense matrices, A is an m-by-k sparse matrix in the coordinate format, A^T is the transpose of A,
and A^H is the conjugate transpose of A.
NOTE
This routine supports a coordinate format both with one-based indexing and zero-based indexing.
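To make the one-based convention concrete, the sketch below evaluates the non-transposed product from coordinate triplets whose indices are stored plus one, with b and c held column-major so that ldb and ldc act as leading dimensions. It is a plain-C illustration rather than the oneMKL implementation; the name ref_coomm_n_1based and the int index type are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference helper (not an MKL routine):
   C := alpha*A*B + beta*C, where A is m-by-k in coordinate format with
   one-based rowind/colind. B (k-by-n) and C (m-by-n) are dense,
   column-major, with leading dimensions ldb and ldc. */
void ref_coomm_n_1based(int m, int n, int k, double alpha,
                        const double *val, const int *rowind,
                        const int *colind, int nnz,
                        const double *b, int ldb,
                        double beta, double *c, int ldc)
{
    assert(ldb >= k && ldc >= m);
    /* Scale C first; beta == 0 overwrites it. */
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            double *cij = &c[i + (size_t)j * ldc];
            *cij = (beta == 0.0) ? 0.0 : beta * (*cij);
        }
    /* Triplets may appear in any order; each one updates a row of C. */
    for (int p = 0; p < nnz; ++p) {
        int i = rowind[p] - 1; /* convert one-based to C offsets */
        int l = colind[p] - 1;
        assert(i >= 0 && i < m && l >= 0 && l < k);
        double a = alpha * val[p];
        for (int j = 0; j < n; ++j)
            c[i + (size_t)j * ldc] += a * b[l + (size_t)j * ldb];
    }
}
```

Since the loop accumulates one triplet at a time, the order of entries in val, rowind, and colind does not affect the result beyond floating-point rounding.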
Input Parameters
matdescra Array of six elements that specifies properties of the matrix used for the operation.
Only the first four array elements are used; their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.
val Array of length nnz, contains non-zero elements of the matrix A in
arbitrary order.
Refer to values array description in Coordinate Format for more details.
rowind For one-based indexing, contains the row indices plus one for each non-zero
element of the matrix A.
For zero-based indexing, contains the row indices for each non-zero
element of the matrix A.
Refer to rows array description in Coordinate Format for more details.
colind For one-based indexing, contains the column indices plus one for each non-
zero element of the matrix A.
For zero-based indexing, contains the column indices for each non-zero
element of the matrix A.
Refer to columns array description in Coordinate Format for more details.
b Array, size ldb by at least n for non-transposed matrix A and at least m for
transposed for one-based indexing, and (at least k for non-transposed
matrix A and at least m for transposed, ldb) for zero-based indexing.
On entry with transa = 'N' or 'n', the leading k-by-n part of the array b
must contain the matrix B, otherwise the leading m-by-n part of the array b
must contain the matrix B.
ldb Specifies the leading dimension of b for one-based indexing, and the second
dimension of b for zero-based indexing, as declared in the calling
(sub)program.
c Array, size ldc by n for one-based indexing, and (m, ldc) for zero-based
indexing.
On entry with transa = 'N' or 'n', the leading m-by-n part of the array c must contain the matrix C,
otherwise the leading k-by-n part of the array c must contain the matrix C.
ldc Specifies the leading dimension of c for one-based indexing, and the second
dimension of c for zero-based indexing, as declared in the calling
(sub)program.
Output Parameters
mkl_?csrsm
Solves a system of linear matrix equations for a
sparse matrix in the CSR format (deprecated).
Syntax
void mkl_scsrsm (const char *transa , const MKL_INT *m , const MKL_INT *n , const float
*alpha , const char *matdescra , const float *val , const MKL_INT *indx , const MKL_INT
*pntrb , const MKL_INT *pntre , const float *b , const MKL_INT *ldb , float *c , const
MKL_INT *ldc );
void mkl_dcsrsm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
double *alpha , const char *matdescra , const double *val , const MKL_INT *indx , const
MKL_INT *pntrb , const MKL_INT *pntre , const double *b , const MKL_INT *ldb , double
*c , const MKL_INT *ldc );
void mkl_ccsrsm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8 *val , const MKL_INT
*indx , const MKL_INT *pntrb , const MKL_INT *pntre , const MKL_Complex8 *b , const
MKL_INT *ldb , MKL_Complex8 *c , const MKL_INT *ldc );
void mkl_zcsrsm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16 *val , const MKL_INT
*indx , const MKL_INT *pntrb , const MKL_INT *pntre , const MKL_Complex16 *b , const
MKL_INT *ldb , MKL_Complex16 *c , const MKL_INT *ldc );
Include Files
• mkl.h
Description
This routine is deprecated. Use mkl_sparse_?_trsm from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?csrsm routine solves a system of linear equations with matrix-matrix operations for a sparse
matrix in the CSR format:
C := alpha*inv(A)*B
or
C := alpha*inv(A^T)*B,
where:
alpha is a scalar, B and C are dense matrices, A is a sparse upper or lower triangular matrix with unit or non-
unit main diagonal, A^T is the transpose of A.
NOTE
This routine supports a CSR format both with one-based indexing and zero-based indexing.
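As an illustration of what C := alpha*inv(A)*B means for triangular storage, the sketch below performs forward substitution for one case only: a lower triangular A with a non-unit diagonal, zero-based CSR, and row-major b and c. It is not the oneMKL implementation; the name ref_csrsm_lower_nonunit, the int index type, and the integer status return are assumptions introduced for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference helper (not an MKL routine):
   solves A*C = alpha*B by forward substitution, where A is m-by-m,
   lower triangular with a stored non-unit diagonal, in zero-based CSR.
   B and C are dense, row-major, n columns wide.
   Returns 0 on success, -1 if a diagonal entry is missing or zero. */
int ref_csrsm_lower_nonunit(int m, int n, double alpha,
                            const double *val, const int *indx,
                            const int *pntrb, const int *pntre,
                            const double *b, int ldb,
                            double *c, int ldc)
{
    assert(ldb >= n && ldc >= n);
    for (int i = 0; i < m; ++i) {
        double *crow = c + (size_t)i * ldc;
        double diag = 0.0;
        for (int j = 0; j < n; ++j)
            crow[j] = alpha * b[(size_t)i * ldb + j];
        int first = pntrb[i] - pntrb[0];
        int last = pntre[i] - pntrb[0];
        for (int p = first; p < last; ++p) {
            int col = indx[p];
            if (col < i) {
                /* Subtract contributions of rows already solved. */
                const double *solved = c + (size_t)col * ldc;
                for (int j = 0; j < n; ++j)
                    crow[j] -= val[p] * solved[j];
            } else if (col == i) {
                diag = val[p];
            }
            /* Entries with col > i are ignored in the lower case. */
        }
        if (diag == 0.0)
            return -1;
        for (int j = 0; j < n; ++j)
            crow[j] /= diag;
    }
    return 0;
}
```

The early return reflects the requirement that no diagonal element be omitted when the matrix is declared non-unit; a unit-diagonal variant would skip the division instead.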
Input Parameters
matdescra Array of six elements that specifies properties of the matrix used for the operation.
Only the first four array elements are used; their possible values are given in
Table "Possible Values of the Parameter matdescra (descra)". Possible
combinations of element values of this parameter are given in Table
"Possible Combinations of Element Values of the Parameter matdescra".
NOTE
The non-zero elements of the given row of the matrix must be
stored in the same order as they appear in the row (from left to
right).
No diagonal element can be omitted from a sparse storage if the solver
is called with the non-unit indicator.
indx For one-based indexing, array containing the column indices plus one for
each non-zero element of the matrix A.
For zero-based indexing, array containing the column indices for each non-
zero element of the matrix A.
NOTE
Column indices must be sorted in increasing order for each row.
pntrb This array contains row indices, such that pntrb[i] - pntrb[0] is the
first index of row i in the arrays val and indx.
pntre For zero-based indexing this array contains row indices, such that
pntre[i] - pntrb[0] - 1 is the last index of row i in the arrays val and
indx.
Refer to pointerE array description in CSR Format for more details.
b Array, size ldb* n for one-based indexing, and (m, ldb) for zero-based
indexing.
On entry the leading m-by-n part of the array b must contain the matrix B.
ldb Specifies the leading dimension of b for one-based indexing, and the second
dimension of b for zero-based indexing, as declared in the calling
(sub)program.
ldc Specifies the leading dimension of c for one-based indexing, and the second
dimension of c for zero-based indexing, as declared in the calling
(sub)program.
Output Parameters
c Array, size ldc by n for one-based indexing, and (m, ldc) for zero-based
indexing.
The leading m-by-n part of the array c contains the output matrix C.
mkl_?cscsm
Solves a system of linear matrix equations for a
sparse matrix in the CSC format (deprecated).
Syntax
void mkl_scscsm (const char *transa , const MKL_INT *m , const MKL_INT *n , const float
*alpha , const char *matdescra , const float *val , const MKL_INT *indx , const MKL_INT
*pntrb , const MKL_INT *pntre , const float *b , const MKL_INT *ldb , float *c , const
MKL_INT *ldc );
void mkl_dcscsm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
double *alpha , const char *matdescra , const double *val , const MKL_INT *indx , const
MKL_INT *pntrb , const MKL_INT *pntre , const double *b , const MKL_INT *ldb , double
*c , const MKL_INT *ldc );
void mkl_ccscsm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8 *val , const MKL_INT
*indx , const MKL_INT *pntrb , const MKL_INT *pntre , const MKL_Complex8 *b , const
MKL_INT *ldb , MKL_Complex8 *c , const MKL_INT *ldc );
void mkl_zcscsm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16 *val , const MKL_INT
*indx , const MKL_INT *pntrb , const MKL_INT *pntre , const MKL_Complex16 *b , const
MKL_INT *ldb , MKL_Complex16 *c , const MKL_INT *ldc );
Include Files
• mkl.h
Description
This routine is deprecated. Use mkl_sparse_?_trsm from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?cscsm routine solves a system of linear equations with matrix-matrix operations for a sparse
matrix in the CSC format:
C := alpha*inv(A)*B
or
C := alpha*inv(A^T)*B,
where:
alpha is a scalar, B and C are dense matrices, A is a sparse upper or lower triangular matrix with unit or non-
unit main diagonal, A^T is the transpose of A.
NOTE
This routine supports a CSC format both with one-based indexing and zero-based indexing.
Input Parameters
matdescra Array of six elements that specifies properties of the matrix used for the operation.
Only the first four array elements are used; their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.
NOTE
The non-zero elements of the given row of the matrix must be
stored in the same order as they appear in the row (from left to
right).
No diagonal element can be omitted from a sparse storage if the solver
is called with the non-unit indicator.
indx For one-based indexing, array containing the row indices plus one for each
non-zero element of the matrix A. For zero-based indexing, array containing
the row indices for each non-zero element of the matrix A.
Refer to rows array description in CSC Format for more details.
NOTE
Row indices must be sorted in increasing order for each column.
pntrb This array contains column indices, such that pntrb[i] - pntrb[0] is the
first index of column i in the arrays val and indx.
b Array, size ldb by n for one-based indexing, and (m, ldb) for zero-based
indexing.
On entry the leading m-by-n part of the array b must contain the matrix B.
ldb Specifies the leading dimension of b for one-based indexing, and the second
dimension of b for zero-based indexing, as declared in the calling
(sub)program.
ldc Specifies the leading dimension of c for one-based indexing, and the second
dimension of c for zero-based indexing, as declared in the calling
(sub)program.
Output Parameters
c Array, size ldc by n for one-based indexing, and (m, ldc) for zero-based
indexing.
The leading m-by-n part of the array c contains the output matrix C.
mkl_?coosm
Solves a system of linear matrix equations for a
sparse matrix in the coordinate format (deprecated).
Syntax
void mkl_scoosm (const char *transa , const MKL_INT *m , const MKL_INT *n , const float
*alpha , const char *matdescra , const float *val , const MKL_INT *rowind , const
MKL_INT *colind , const MKL_INT *nnz , const float *b , const MKL_INT *ldb , float *c ,
const MKL_INT *ldc );
void mkl_dcoosm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
double *alpha , const char *matdescra , const double *val , const MKL_INT *rowind ,
const MKL_INT *colind , const MKL_INT *nnz , const double *b , const MKL_INT *ldb ,
double *c , const MKL_INT *ldc );
void mkl_ccoosm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8 *val , const MKL_INT
*rowind , const MKL_INT *colind , const MKL_INT *nnz , const MKL_Complex8 *b , const
MKL_INT *ldb , MKL_Complex8 *c , const MKL_INT *ldc );
void mkl_zcoosm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16 *val , const MKL_INT
*rowind , const MKL_INT *colind , const MKL_INT *nnz , const MKL_Complex16 *b , const
MKL_INT *ldb , MKL_Complex16 *c , const MKL_INT *ldc );
Include Files
• mkl.h
Description
This routine is deprecated. Use mkl_sparse_?_trsm from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?coosm routine solves a system of linear equations with matrix-matrix operations for a sparse
matrix in the coordinate format:
C := alpha*inv(A)*B
or
C := alpha*inv(A^T)*B,
where:
alpha is a scalar, B and C are dense matrices, A is a sparse upper or lower triangular matrix with unit or non-
unit main diagonal, A^T is the transpose of A.
NOTE
This routine supports a coordinate format both with one-based indexing and zero-based indexing.
Input Parameters
matdescra Array of six elements that specifies properties of the matrix used for the operation.
Only the first four array elements are used; their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.
val Array of length nnz, contains non-zero elements of the matrix A in
arbitrary order.
Refer to values array description in Coordinate Format for more details.
rowind For one-based indexing, contains the row indices plus one for each non-zero
element of the matrix A.
For zero-based indexing, contains the row indices for each non-zero
element of the matrix A.
Refer to rows array description in Coordinate Format for more details.
colind For one-based indexing, contains the column indices plus one for each non-
zero element of the matrix A.
For zero-based indexing, contains the column indices for each non-zero
element of the matrix A.
Refer to columns array description in Coordinate Format for more details.
b Array, size ldb by n for one-based indexing, and (m, ldb) for zero-based
indexing.
Before entry the leading m-by-n part of the array b must contain the matrix
B.
ldb Specifies the leading dimension of b for one-based indexing, and the second
dimension of b for zero-based indexing, as declared in the calling
(sub)program.
ldc Specifies the leading dimension of c for one-based indexing, and the second
dimension of c for zero-based indexing, as declared in the calling
(sub)program.
Output Parameters
c Array, size ldc by n for one-based indexing, and (m, ldc) for zero-based
indexing.
The leading m-by-n part of the array c contains the output matrix C.
mkl_?bsrsm
Solves a system of linear matrix equations for a
sparse matrix in the BSR format (deprecated).
Syntax
void mkl_sbsrsm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *lb , const float *alpha , const char *matdescra , const float *val , const
MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const float *b , const
MKL_INT *ldb , float *c , const MKL_INT *ldc );
void mkl_dbsrsm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *lb , const double *alpha , const char *matdescra , const double *val , const
MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const double *b , const
MKL_INT *ldb , double *c , const MKL_INT *ldc );
void mkl_cbsrsm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *lb , const MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8
*val , const MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const
MKL_Complex8 *b , const MKL_INT *ldb , MKL_Complex8 *c , const MKL_INT *ldc );
void mkl_zbsrsm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *lb , const MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16
*val , const MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const
MKL_Complex16 *b , const MKL_INT *ldb , MKL_Complex16 *c , const MKL_INT *ldc );
Include Files
• mkl.h
Description
This routine is deprecated. Use mkl_sparse_?_trsm from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?bsrsm routine solves a system of linear equations with matrix-matrix operations for a sparse
matrix in the BSR format:
C := alpha*inv(A)*B
or
C := alpha*inv(A^T)*B,
where:
alpha is a scalar, B and C are dense matrices, A is a sparse upper or lower triangular matrix with unit or non-
unit main diagonal, A^T is the transpose of A.
NOTE
This routine supports a BSR format both with one-based indexing and zero-based indexing.
Input Parameters
matdescra Array of six elements that specifies properties of the matrix used for the operation.
Only the first four array elements are used; their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.
val Array containing elements of non-zero blocks of the matrix A. Its length is
equal to the number of non-zero blocks in the matrix A multiplied by lb*lb.
Refer to the values array description in BSR Format for more details.
NOTE
The non-zero elements of the given row of the matrix must be
stored in the same order as they appear in the row (from left to
right).
No diagonal element can be omitted from a sparse storage if the solver
is called with the non-unit indicator.
indx For one-based indexing, array containing the column indices plus one for
each non-zero block in the matrix A. For zero-based indexing, array
containing the column indices for each non-zero block in the matrix A.
Its length is equal to the number of non-zero blocks in the matrix A.
Refer to the columns array description in BSR Format for more details.
pntrb This array contains row indices, such that pntrb[i] - pntrb[0] is the
first index of block row i in the array indx.
b Array, size ldb* n for one-based indexing, size m* ldb for zero-based
indexing.
On entry the leading m-by-n part of the array b must contain the matrix B.
ldb Specifies the leading dimension (in blocks) of b as declared in the calling
(sub)program.
ldc Specifies the leading dimension (in blocks) of c as declared in the calling
(sub)program.
Output Parameters
c Array, size ldc* n for one-based indexing, size m* ldc for zero-based
indexing.
The leading m-by-n part of the array c contains the output matrix C.
mkl_?diamv
Computes matrix-vector product for a sparse matrix
in the diagonal format with one-based indexing
(deprecated).
Syntax
void mkl_sdiamv (const char *transa , const MKL_INT *m , const MKL_INT *k , const float
*alpha , const char *matdescra , const float *val , const MKL_INT *lval , const MKL_INT
*idiag , const MKL_INT *ndiag , const float *x , const float *beta , float *y );
void mkl_ddiamv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
double *alpha , const char *matdescra , const double *val , const MKL_INT *lval , const
MKL_INT *idiag , const MKL_INT *ndiag , const double *x , const double *beta , double
*y );
void mkl_cdiamv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8 *val , const MKL_INT
*lval , const MKL_INT *idiag , const MKL_INT *ndiag , const MKL_Complex8 *x , const
MKL_Complex8 *beta , MKL_Complex8 *y );
void mkl_zdiamv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16 *val , const MKL_INT
*lval , const MKL_INT *idiag , const MKL_INT *ndiag , const MKL_Complex16 *x , const
MKL_Complex16 *beta , MKL_Complex16 *y );
Include Files
• mkl.h
Description
This routine is deprecated, but no replacement is available yet in the Inspector-Executor Sparse BLAS API
interfaces. You can continue using this routine until a replacement is provided and the routine is fully removed.
The mkl_?diamv routine performs a matrix-vector operation defined as
y := alpha*A*x + beta*y
or
y := alpha*A^T*x + beta*y,
where:
alpha and beta are scalars,
NOTE
This routine supports only one-based indexing of the input arrays.
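The plain-C sketch below shows how a non-transposed product can be formed from diagonal storage. It is not the oneMKL implementation, and it rests on an assumed layout: val is treated as an lval-by-ndiag column-major array in which entry val[i + j*lval] holds the element of row i on the diagonal at distance idiag[j] from the main diagonal, with lval >= m. The name ref_diamv_n and the int index type are likewise assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference helper (not an MKL routine):
   y := alpha*A*x + beta*y for an m-by-k matrix A stored by diagonals.
   Assumed layout: val[i + j*lval] is A(i, i + idiag[j]) using C offsets
   for i; positions that fall outside the matrix are padding. */
void ref_diamv_n(int m, int k, double alpha,
                 const double *val, int lval,
                 const int *idiag, int ndiag,
                 const double *x, double beta, double *y)
{
    assert(lval >= m);
    /* Scale y first; beta == 0 overwrites it. */
    for (int i = 0; i < m; ++i)
        y[i] = (beta == 0.0) ? 0.0 : beta * y[i];
    for (int j = 0; j < ndiag; ++j) {
        int d = idiag[j]; /* negative: below main diagonal, positive: above */
        for (int i = 0; i < m; ++i) {
            int col = i + d;
            if (col >= 0 && col < k) /* skip padded positions */
                y[i] += alpha * val[i + (size_t)j * lval] * x[col];
        }
    }
}
```

The distances in idiag are differences between column and row numbers, so they are the same whether rows and columns are counted from zero or from one.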
Input Parameters
matdescra Array of six elements that specifies properties of the matrix used for the operation.
Only the first four array elements are used; their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.
idiag Array of length ndiag, contains the distances between the main diagonal and
each non-zero diagonal in the matrix A.
Refer to distance array description in Diagonal Storage Scheme for more
details.
Output Parameters
mkl_?skymv
Computes matrix-vector product for a sparse matrix
in the skyline storage format with one-based indexing
(deprecated).
Syntax
void mkl_sskymv (const char *transa , const MKL_INT *m , const MKL_INT *k , const float
*alpha , const char *matdescra , const float *val , const MKL_INT *pntr , const float
*x , const float *beta , float *y );
void mkl_dskymv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
double *alpha , const char *matdescra , const double *val , const MKL_INT *pntr , const
double *x , const double *beta , double *y );
void mkl_cskymv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8 *val , const MKL_INT
*pntr , const MKL_Complex8 *x , const MKL_Complex8 *beta , MKL_Complex8 *y );
void mkl_zskymv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16 *val , const MKL_INT
*pntr , const MKL_Complex16 *x , const MKL_Complex16 *beta , MKL_Complex16 *y );
Include Files
• mkl.h
Description
This routine is deprecated, but no replacement is available yet in the Inspector-Executor Sparse BLAS API
interfaces. You can continue using this routine until a replacement is provided and this can be fully removed.
The mkl_?skymv routine performs a matrix-vector operation defined as
y := alpha*A*x + beta*y
or
y := alpha*A^T*x + beta*y,
where:
alpha and beta are scalars,
x and y are vectors,
A is an m-by-k sparse matrix stored using the skyline storage scheme, A^T is the transpose of A.
NOTE
This routine supports only one-based indexing of the input arrays.
Input Parameters
matdescra Array of six elements that specifies properties of the matrix used for the operation.
Only the first four array elements are used; their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.
NOTE
General matrices (matdescra[0]='G') are not supported.
val Array containing the set of elements of the matrix A in the skyline profile
form.
If matdescra[1]= 'L', then val contains elements from the lower triangle
of the matrix A.
If matdescra[1]= 'U', then val contains elements from the upper
triangle of the matrix A.
Refer to values array description in Skyline Storage Scheme for more
details.
pntr Array of length (m + 1) for lower triangle, and (k + 1) for upper triangle.
It contains the indices specifying the positions in val of the first
element in each row (column) of the matrix A. Refer to pointers array
description in Skyline Storage Scheme for more details.
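The way pntr addresses val can be sketched in plain C for the lower triangular case. This is only an illustration under stated assumptions, not the library implementation: sky_lower_mv_ref is a hypothetical helper for double precision, and it assumes that each row is stored from its first stored column up to and including the diagonal, with pntr holding one-based positions.

```c
#include <assert.h>

/* Hypothetical reference helper (not an MKL routine):
 * y := alpha*L*x + beta*y for an m-by-m lower triangular matrix L
 * in skyline storage.
 * Assumed layout: row i (zero-based) occupies val[pntr[i]-1 .. pntr[i+1]-2]
 * and ends with the diagonal element; pntr has m + 1 one-based entries. */
void sky_lower_mv_ref(int m, double alpha, const double *val,
                      const int *pntr, const double *x,
                      double beta, double *y)
{
    assert(m >= 0);
    for (int i = 0; i < m; ++i) {
        int first = pntr[i] - 1;          /* zero-based start of row i in val */
        int len = pntr[i + 1] - pntr[i];  /* number of stored entries in row i */
        int col0 = i - (len - 1);         /* column of the first stored entry */
        double sum = 0.0;
        assert(len >= 1 && col0 >= 0);
        for (int t = 0; t < len; ++t)
            sum += val[first + t] * x[col0 + t];
        y[i] = alpha * sum + beta * y[i];
    }
}
```

For the 3-by-3 matrix with rows (1), (2 3), (0 4 5), the arrays would be val = {1, 2, 3, 4, 5} and pntr = {1, 2, 4, 6}.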
Output Parameters
mkl_?diasv
Solves a system of linear equations for a sparse
matrix in the diagonal format with one-based indexing
(deprecated).
Syntax
void mkl_sdiasv (const char *transa , const MKL_INT *m , const float *alpha , const
char *matdescra , const float *val , const MKL_INT *lval , const MKL_INT *idiag , const
MKL_INT *ndiag , const float *x , float *y );
void mkl_ddiasv (const char *transa , const MKL_INT *m , const double *alpha , const
char *matdescra , const double *val , const MKL_INT *lval , const MKL_INT *idiag ,
const MKL_INT *ndiag , const double *x , double *y );
void mkl_cdiasv (const char *transa , const MKL_INT *m , const MKL_Complex8 *alpha ,
const char *matdescra , const MKL_Complex8 *val , const MKL_INT *lval , const MKL_INT
*idiag , const MKL_INT *ndiag , const MKL_Complex8 *x , MKL_Complex8 *y );
void mkl_zdiasv (const char *transa , const MKL_INT *m , const MKL_Complex16 *alpha ,
const char *matdescra , const MKL_Complex16 *val , const MKL_INT *lval , const MKL_INT
*idiag , const MKL_INT *ndiag , const MKL_Complex16 *x , MKL_Complex16 *y );
Include Files
• mkl.h
Description
This routine is deprecated, but no replacement is available yet in the Inspector-Executor Sparse BLAS API
interfaces. You can continue using this routine until a replacement is provided and this can be fully removed.
The mkl_?diasv routine solves a system of linear equations with matrix-vector operations for a sparse
matrix stored in the diagonal format:
y := alpha*inv(A)*x
or
y := alpha*inv(A^T)*x,
where:
alpha is a scalar, x and y are vectors, A is a sparse upper or lower triangular matrix with unit or non-unit main
diagonal, A^T is the transpose of A.
NOTE
This routine supports only one-based indexing of the input arrays.
Input Parameters
matdescra Array of six elements that specifies properties of the matrix used for the operation.
Only the first four array elements are used; their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.
idiag Array of length ndiag that contains the distances between the main diagonal and
each non-zero diagonal in the matrix A.
NOTE
All elements of this array must be sorted in increasing order.
x On entry, the array x must contain the vector x. The elements are accessed
with unit increment.
y On entry, the array y must contain the vector y. The elements are accessed
with unit increment.
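For a lower triangular matrix with a non-unit diagonal, the solve y := alpha*inv(A)*x amounts to forward substitution on A*y = alpha*x. The following plain C sketch illustrates that idea only; dia_lower_sv_ref is a hypothetical helper, its integer return value is an assumption of this sketch (the library routine returns nothing), and the column-major val layout with leading dimension lval is assumed as in Diagonal Storage Scheme.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference helper (not an MKL routine):
 * solves A*y = alpha*x for a lower triangular, non-unit A in diagonal storage.
 * Assumed layout: row i of column d of val holds A(i, i + idiag[d]).
 * Returns 0 on success, or i + 1 if row i has a zero diagonal element. */
int dia_lower_sv_ref(int m, double alpha, const double *val, int lval,
                     const int *idiag, int ndiag,
                     const double *x, double *y)
{
    assert(m >= 0 && lval >= m && ndiag >= 0);
    for (int i = 0; i < m; ++i) {
        double rhs = alpha * x[i];
        double diag = 0.0;
        for (int d = 0; d < ndiag; ++d) {
            int j = i + idiag[d];
            if (idiag[d] == 0)
                diag = val[(size_t)d * lval + i];
            else if (idiag[d] < 0 && j >= 0)
                rhs -= val[(size_t)d * lval + i] * y[j]; /* y[j] already known */
        }
        if (diag == 0.0)
            return i + 1;
        y[i] = rhs / diag;
    }
    return 0;
}
```

This sketch ignores any diagonals above the main one and does not depend on the ordering of idiag.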
Output Parameters
mkl_?skysv
Solves a system of linear equations for a sparse
matrix in the skyline format with one-based indexing
(deprecated).
Syntax
void mkl_sskysv (const char *transa , const MKL_INT *m , const float *alpha , const
char *matdescra , const float *val , const MKL_INT *pntr , const float *x , float *y );
void mkl_dskysv (const char *transa , const MKL_INT *m , const double *alpha , const
char *matdescra , const double *val , const MKL_INT *pntr , const double *x , double
*y );
void mkl_cskysv (const char *transa , const MKL_INT *m , const MKL_Complex8 *alpha ,
const char *matdescra , const MKL_Complex8 *val , const MKL_INT *pntr , const
MKL_Complex8 *x , MKL_Complex8 *y );
void mkl_zskysv (const char *transa , const MKL_INT *m , const MKL_Complex16 *alpha ,
const char *matdescra , const MKL_Complex16 *val , const MKL_INT *pntr , const
MKL_Complex16 *x , MKL_Complex16 *y );
Include Files
• mkl.h
Description
This routine is deprecated, but no replacement is available yet in the Inspector-Executor Sparse BLAS API
interfaces. You can continue using this routine until a replacement is provided and this can be fully removed.
The mkl_?skysv routine solves a system of linear equations with matrix-vector operations for a sparse
matrix in the skyline storage format:
y := alpha*inv(A)*x
or
y := alpha*inv(A^T)*x,
where:
alpha is a scalar, x and y are vectors, A is a sparse upper or lower triangular matrix with unit or non-unit main
diagonal, A^T is the transpose of A.
NOTE
This routine supports only one-based indexing of the input arrays.
Input Parameters
matdescra Array of six elements that specifies properties of the matrix used for the operation.
Only the first four array elements are used; their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.
NOTE
General matrices (matdescra[0]='G') are not supported.
val Array containing the set of elements of the matrix A in the skyline profile
form.
If matdescra[1]= 'L', then val contains elements from the lower triangle
of the matrix A.
If matdescra[1]= 'U', then val contains elements from the upper
triangle of the matrix A.
Refer to values array description in Skyline Storage Scheme for more
details.
pntr Array of length (m + 1) for lower triangle, and (k + 1) for upper triangle.
It contains the indices specifying the positions in val of the first
element in each row (column) of the matrix A. Refer to pointers array
description in Skyline Storage Scheme for more details.
x On entry, the array x must contain the vector x. The elements are accessed
with unit increment.
y On entry, the array y must contain the vector y. The elements are accessed
with unit increment.
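A forward substitution over skyline storage can be sketched as follows for the lower triangular, non-unit case. This is an illustration under assumptions rather than the library implementation: sky_lower_sv_ref is a hypothetical helper, its return value is an assumption of this sketch, and each row is assumed to be stored up to and including its diagonal element, addressed through one-based pntr entries.

```c
#include <assert.h>

/* Hypothetical reference helper (not an MKL routine):
 * solves L*y = alpha*x for a lower triangular, non-unit L in skyline storage.
 * Assumed layout: row i (zero-based) occupies val[pntr[i]-1 .. pntr[i+1]-2],
 * with the diagonal element stored last.
 * Returns 0 on success, or i + 1 if row i has a zero diagonal element. */
int sky_lower_sv_ref(int m, double alpha, const double *val,
                     const int *pntr, const double *x, double *y)
{
    assert(m >= 0);
    for (int i = 0; i < m; ++i) {
        int first = pntr[i] - 1;          /* zero-based start of row i in val */
        int len = pntr[i + 1] - pntr[i];  /* stored entries, diagonal included */
        int col0 = i - (len - 1);         /* column of the first stored entry */
        double rhs = alpha * x[i];
        double diag;
        assert(len >= 1 && col0 >= 0);
        for (int t = 0; t < len - 1; ++t)
            rhs -= val[first + t] * y[col0 + t];  /* off-diagonal part */
        diag = val[first + len - 1];
        if (diag == 0.0)
            return i + 1;
        y[i] = rhs / diag;
    }
    return 0;
}
```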
Output Parameters
mkl_?diamm
Computes matrix-matrix product of a sparse matrix
stored in the diagonal format with one-based indexing
(deprecated).
Syntax
void mkl_sdiamm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const float *alpha , const char *matdescra , const float *val , const
MKL_INT *lval , const MKL_INT *idiag , const MKL_INT *ndiag , const float *b , const
MKL_INT *ldb , const float *beta , float *c , const MKL_INT *ldc );
void mkl_ddiamm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const double *alpha , const char *matdescra , const double *val , const
MKL_INT *lval , const MKL_INT *idiag , const MKL_INT *ndiag , const double *b , const
MKL_INT *ldb , const double *beta , double *c , const MKL_INT *ldc );
void mkl_cdiamm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8
*val , const MKL_INT *lval , const MKL_INT *idiag , const MKL_INT *ndiag , const
MKL_Complex8 *b , const MKL_INT *ldb , const MKL_Complex8 *beta , MKL_Complex8 *c ,
const MKL_INT *ldc );
void mkl_zdiamm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16
*val , const MKL_INT *lval , const MKL_INT *idiag , const MKL_INT *ndiag , const
MKL_Complex16 *b , const MKL_INT *ldb , const MKL_Complex16 *beta , MKL_Complex16 *c ,
const MKL_INT *ldc );
Include Files
• mkl.h
Description
This routine is deprecated, but no replacement is available yet in the Inspector-Executor Sparse BLAS API
interfaces. You can continue using this routine until a replacement is provided and this can be fully removed.
The mkl_?diamm routine performs a matrix-matrix operation defined as
C := alpha*A*B + beta*C
or
C := alpha*A^T*B + beta*C,
or
C := alpha*A^H*B + beta*C,
where:
alpha and beta are scalars,
B and C are dense matrices, A is an m-by-k sparse matrix in the diagonal format, A^T is the transpose of A,
and A^H is the conjugate transpose of A.
NOTE
This routine supports only one-based indexing of the input arrays.
Input Parameters
matdescra Array of six elements that specifies properties of the matrix used for the operation.
Only the first four array elements are used; their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.
idiag Array of length ndiag that contains the distances between the main diagonal and
each non-zero diagonal in the matrix A.
Refer to distance array description in Diagonal Storage Scheme for more
details.
b On entry with transa = 'N' or 'n', the leading k-by-n part of the array b
must contain the matrix B; otherwise, the leading m-by-n part of the array b
must contain the matrix B.
c On entry with transa = 'N' or 'n', the leading m-by-n part of the array c must contain the matrix C;
otherwise, the leading k-by-n part of the array c must contain the matrix C.
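The dimension rules for b and c in the non-transposed case (B is k-by-n, C is m-by-n) can be seen in a plain C reference loop. This is a sketch, not the library implementation: dia_mm_ref is a hypothetical helper for double precision, and it assumes that b and c are stored column by column with leading dimensions ldb and ldc, and that val uses the column-major diagonal layout with leading dimension lval.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference helper (not an MKL routine):
 * C := alpha*A*B + beta*C with A (m-by-k) in diagonal storage,
 * B (k-by-n) and C (m-by-n) dense.
 * Assumptions: b and c are column-major with leading dimensions ldb >= k
 * and ldc >= m; row i of column d of val holds A(i, i + idiag[d]). */
void dia_mm_ref(int m, int n, int k, double alpha,
                const double *val, int lval,
                const int *idiag, int ndiag,
                const double *b, int ldb,
                double beta, double *c, int ldc)
{
    assert(lval >= m && ldb >= k && ldc >= m);
    for (int col = 0; col < n; ++col) {
        const double *bcol = b + (size_t)col * ldb;  /* column col of B */
        double *ccol = c + (size_t)col * ldc;        /* column col of C */
        for (int i = 0; i < m; ++i)
            ccol[i] *= beta;
        for (int d = 0; d < ndiag; ++d) {
            for (int i = 0; i < m; ++i) {
                int j = i + idiag[d];
                if (j >= 0 && j < k)
                    ccol[i] += alpha * val[(size_t)d * lval + i] * bcol[j];
            }
        }
    }
}
```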
Output Parameters
mkl_?skymm
Computes matrix-matrix product of a sparse matrix
stored using the skyline storage scheme with one-
based indexing (deprecated).
Syntax
void mkl_sskymm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const float *alpha , const char *matdescra , const float *val , const
MKL_INT *pntr , const float *b , const MKL_INT *ldb , const float *beta , float *c ,
const MKL_INT *ldc );
void mkl_dskymm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const double *alpha , const char *matdescra , const double *val , const
MKL_INT *pntr , const double *b , const MKL_INT *ldb , const double *beta , double *c ,
const MKL_INT *ldc );
void mkl_cskymm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8
*val , const MKL_INT *pntr , const MKL_Complex8 *b , const MKL_INT *ldb , const
MKL_Complex8 *beta , MKL_Complex8 *c , const MKL_INT *ldc );
void mkl_zskymm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16
*val , const MKL_INT *pntr , const MKL_Complex16 *b , const MKL_INT *ldb , const
MKL_Complex16 *beta , MKL_Complex16 *c , const MKL_INT *ldc );
Include Files
• mkl.h
Description
This routine is deprecated, but no replacement is available yet in the Inspector-Executor Sparse BLAS API
interfaces. You can continue using this routine until a replacement is provided and this can be fully removed.
The mkl_?skymm routine performs a matrix-matrix operation defined as
C := alpha*A*B + beta*C
or
C := alpha*A^T*B + beta*C,
or
C := alpha*A^H*B + beta*C,
where:
alpha and beta are scalars,
B and C are dense matrices, A is an m-by-k sparse matrix in the skyline storage format, A^T is the transpose
of A, and A^H is the conjugate transpose of A.
NOTE
This routine supports only one-based indexing of the input arrays.
Input Parameters
matdescra Array of six elements that specifies properties of the matrix used for the operation.
Only the first four array elements are used; their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.
NOTE
General matrices (matdescra[0]='G') are not supported.
val Array containing the set of elements of the matrix A in the skyline profile
form.
If matdescra[1]= 'L', then val contains elements from the lower triangle
of the matrix A.
If matdescra[1]= 'U', then val contains elements from the upper
triangle of the matrix A.
Refer to values array description in Skyline Storage Scheme for more
details.
pntr Array of length (m + 1) for lower triangle, and (k + 1) for upper triangle.
It contains the indices specifying the positions of the first element of the
matrix A in each row (for the lower triangle) or column (for upper triangle)
in the val array such that val[pntr[i] - 1] is the first element in row or
column i + 1. Refer to pointers array description in Skyline Storage
Scheme for more details.
b On entry with transa = 'N' or 'n', the leading k-by-n part of the array b
must contain the matrix B; otherwise, the leading m-by-n part of the array b
must contain the matrix B.
c On entry with transa = 'N' or 'n', the leading m-by-n part of the array c must contain the matrix C;
otherwise, the leading k-by-n part of the array c must contain the matrix C.
Output Parameters
mkl_?diasm
Solves a system of linear matrix equations for a
sparse matrix in the diagonal format with one-based
indexing (deprecated).
Syntax
void mkl_sdiasm (const char *transa , const MKL_INT *m , const MKL_INT *n , const float
*alpha , const char *matdescra , const float *val , const MKL_INT *lval , const MKL_INT
*idiag , const MKL_INT *ndiag , const float *b , const MKL_INT *ldb , float *c , const
MKL_INT *ldc );
void mkl_ddiasm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
double *alpha , const char *matdescra , const double *val , const MKL_INT *lval , const
MKL_INT *idiag , const MKL_INT *ndiag , const double *b , const MKL_INT *ldb , double
*c , const MKL_INT *ldc );
void mkl_cdiasm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8 *val , const MKL_INT
*lval , const MKL_INT *idiag , const MKL_INT *ndiag , const MKL_Complex8 *b , const
MKL_INT *ldb , MKL_Complex8 *c , const MKL_INT *ldc );
void mkl_zdiasm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16 *val , const MKL_INT
*lval , const MKL_INT *idiag , const MKL_INT *ndiag , const MKL_Complex16 *b , const
MKL_INT *ldb , MKL_Complex16 *c , const MKL_INT *ldc );
Include Files
• mkl.h
Description
This routine is deprecated, but no replacement is available yet in the Inspector-Executor Sparse BLAS API
interfaces. You can continue using this routine until a replacement is provided and this can be fully removed.
The mkl_?diasm routine solves a system of linear equations with matrix-matrix operations for a sparse
matrix in the diagonal format:
C := alpha*inv(A)*B
or
C := alpha*inv(A^T)*B,
where:
alpha is a scalar, B and C are dense matrices, A is a sparse upper or lower triangular matrix with unit or non-
unit main diagonal, A^T is the transpose of A.
NOTE
This routine supports only one-based indexing of the input arrays.
Input Parameters
matdescra Array of six elements that specifies properties of the matrix used for the operation.
Only the first four array elements are used; their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.
idiag Array of length ndiag that contains the distances between the main diagonal and
each non-zero diagonal in the matrix A.
NOTE
All elements of this array must be sorted in increasing order.
b On entry, the leading m-by-n part of the array b must contain the matrix B.
Output Parameters
mkl_?skysm
Solves a system of linear matrix equations for a
sparse matrix stored using the skyline storage scheme
with one-based indexing (deprecated).
Syntax
void mkl_sskysm (const char *transa , const MKL_INT *m , const MKL_INT *n , const float
*alpha , const char *matdescra , const float *val , const MKL_INT *pntr , const float
*b , const MKL_INT *ldb , float *c , const MKL_INT *ldc );
void mkl_dskysm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
double *alpha , const char *matdescra , const double *val , const MKL_INT *pntr , const
double *b , const MKL_INT *ldb , double *c , const MKL_INT *ldc );
void mkl_cskysm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8 *val , const MKL_INT
*pntr , const MKL_Complex8 *b , const MKL_INT *ldb , MKL_Complex8 *c , const MKL_INT
*ldc );
void mkl_zskysm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16 *val , const MKL_INT
*pntr , const MKL_Complex16 *b , const MKL_INT *ldb , MKL_Complex16 *c , const MKL_INT
*ldc );
Include Files
• mkl.h
Description
This routine is deprecated, but no replacement is available yet in the Inspector-Executor Sparse BLAS API
interfaces. You can continue using this routine until a replacement is provided and this can be fully removed.
The mkl_?skysm routine solves a system of linear equations with matrix-matrix operations for a sparse
matrix in the skyline storage format:
C := alpha*inv(A)*B
or
C := alpha*inv(A^T)*B,
where:
alpha is a scalar, B and C are dense matrices, A is a sparse upper or lower triangular matrix with unit or non-
unit main diagonal, A^T is the transpose of A.
NOTE
This routine supports only one-based indexing of the input arrays.
Input Parameters
matdescra Array of six elements that specifies properties of the matrix used for the operation.
Only the first four array elements are used; their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.
NOTE
General matrices (matdescra[0]='G') are not supported.
val Array containing the set of elements of the matrix A in the skyline profile
form.
If matdescra[1]= 'L', then val contains elements from the lower triangle
of the matrix A.
If matdescra[1]= 'U', then val contains elements from the upper
triangle of the matrix A.
Refer to values array description in Skyline Storage Scheme for more
details.
pntr Array of length (m + 1) for lower triangle, and (n + 1) for upper triangle.
It contains the indices specifying the positions of the first element of the
matrix A in each row (for the lower triangle) or column (for upper triangle)
in the val array such that val[pntr[i] - 1] is the first element in row or
column i + 1. Refer to pointers array description in Skyline Storage
Scheme for more details.
b On entry, the leading m-by-n part of the array b must contain the matrix B.
Output Parameters
mkl_?dnscsr
Converts a sparse matrix in uncompressed
representation to the CSR format and vice versa
(deprecated).
Syntax
void mkl_ddnscsr (const MKL_INT *job , const MKL_INT *m , const MKL_INT *n , double
*adns , const MKL_INT *lda , double *acsr , MKL_INT *ja , MKL_INT *ia , MKL_INT *info );
void mkl_sdnscsr (const MKL_INT *job , const MKL_INT *m , const MKL_INT *n , float
*adns , const MKL_INT *lda , float *acsr , MKL_INT *ja , MKL_INT *ia , MKL_INT *info );
void mkl_cdnscsr (const MKL_INT *job , const MKL_INT *m , const MKL_INT *n ,
MKL_Complex8 *adns , const MKL_INT *lda , MKL_Complex8 *acsr , MKL_INT *ja , MKL_INT
*ia , MKL_INT *info );
void mkl_zdnscsr (const MKL_INT *job , const MKL_INT *m , const MKL_INT *n ,
MKL_Complex16 *adns , const MKL_INT *lda , MKL_Complex16 *acsr , MKL_INT *ja , MKL_INT
*ia , MKL_INT *info );
Include Files
• mkl.h
Description
This routine is deprecated, but no replacement is available yet in the Inspector-Executor Sparse BLAS API
interfaces. Either write your own (see the examples/c/sparse_blas/source/sparse_converters.c
example for hints) or continue using this routine until a replacement is provided and this can be fully
removed.
This routine converts a sparse matrix A between formats: stored as a rectangular array (dense
representation) and stored using compressed sparse row (CSR) format (3-array variation).
Input Parameters
adns (input/output)
If the conversion type is from uncompressed to CSR, on input adns
contains an uncompressed (dense) representation of matrix A.
acsr (input/output)
If conversion type is from CSR to uncompressed, on input acsr contains
the non-zero elements of the matrix A. Its length is equal to the number of
non-zero elements in the matrix A. Refer to values array description in
Sparse Matrix Storage Formats for more details.
Output Parameters
acsr, ja, ia If conversion type is from uncompressed to CSR, on output acsr, ja, and
ia contain the compressed sparse row (CSR) format (3-array variation) of
matrix A (see Sparse Matrix Storage Formats for a description of the
storage format).
info Integer info indicator only for restoring the matrix A from the CSR format.
If info=0, the execution is successful.
If info=i, the routine is interrupted while processing the i-th row because there
is no space left in the arrays acsr and ja according to the value nzmax.
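Since the description suggests writing your own converter as one option, a minimal dense-to-CSR sketch in plain C is given here. It is not the library implementation: dense_to_csr_ref is a hypothetical helper, it assumes a row-major dense array with leading dimension lda and zero-based CSR output, and its return value only imitates the info convention described above (0 on success, i when the arrays run out of space in the i-th row).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference helper (not an MKL routine):
 * converts an m-by-n dense matrix to 3-array CSR.
 * Assumptions: adns is row-major with leading dimension lda >= n;
 * ja and ia use zero-based indexing; acsr and ja can hold nzmax entries.
 * Returns 0 on success, or the one-based number of the row at which
 * acsr and ja ran out of space. */
int dense_to_csr_ref(int m, int n, const double *adns, int lda,
                     int nzmax, double *acsr, int *ja, int *ia)
{
    int nnz = 0;
    assert(m >= 0 && n >= 0 && lda >= n && nzmax >= 0);
    ia[0] = 0;
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            double a = adns[(size_t)i * lda + j];
            if (a != 0.0) {
                if (nnz == nzmax)
                    return i + 1;   /* no room left while processing row i+1 */
                acsr[nnz] = a;
                ja[nnz] = j;
                ++nnz;
            }
        }
        ia[i + 1] = nnz;            /* end of row i, start of row i + 1 */
    }
    return 0;
}
```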
mkl_?csrcoo
Converts a sparse matrix in the CSR format to the
coordinate format and vice versa (deprecated).
Syntax
void mkl_scsrcoo (const MKL_INT *job , const MKL_INT *n , float *acsr , MKL_INT *ja ,
MKL_INT *ia , MKL_INT *nnz , float *acoo , MKL_INT *rowind , MKL_INT *colind , MKL_INT
*info );
void mkl_dcsrcoo (const MKL_INT *job , const MKL_INT *n , double *acsr , MKL_INT *ja ,
MKL_INT *ia , MKL_INT *nnz , double *acoo , MKL_INT *rowind , MKL_INT *colind , MKL_INT
*info );
void mkl_ccsrcoo (const MKL_INT *job , const MKL_INT *n , MKL_Complex8 *acsr , MKL_INT
*ja , MKL_INT *ia , MKL_INT *nnz , MKL_Complex8 *acoo , MKL_INT *rowind , MKL_INT
*colind , MKL_INT *info );
void mkl_zcsrcoo (const MKL_INT *job , const MKL_INT *n , MKL_Complex16 *acsr , MKL_INT
*ja , MKL_INT *ia , MKL_INT *nnz , MKL_Complex16 *acoo , MKL_INT *rowind , MKL_INT
*colind , MKL_INT *info );
Include Files
• mkl.h
Description
This routine is deprecated. Use the matrix manipulation routines from the Intel® oneAPI Math Kernel Library
(oneMKL) Inspector-Executor Sparse BLAS interface instead.
This routine converts a sparse matrix A stored in the compressed sparse row (CSR) format (3-array
variation) to coordinate format and vice versa.
Input Parameters
job[2]
If job[2]=0, zero-based indexing for the matrix in coordinate format is
used;
if job[2]=1, one-based indexing for the matrix in coordinate format is
used.
job[4]
job[4]=nzmax - maximum number of the non-zero elements allowed if
job[0]=0.
job[5] - job indicator.
For conversion to the coordinate format:
If job[5]=1, only array rowind is filled in for the output storage.
If job[5]=2, arrays rowind, colind are filled in for the output storage.
If job[5]=3, all arrays rowind, colind, acoo are filled in for the output
storage.
For conversion to the CSR format:
If job[5]=0, all arrays acsr, ja, ia are filled in for the output storage.
If job[5]=2, then it is assumed that the routine has already been called
with job[5]=1, and the user allocated the required space for storing
the output arrays acsr and ja.
nnz Specifies the number of non-zero elements of the matrix A for job[0]≠0.
acsr (input/output)
Array containing non-zero elements of the matrix A. Its length is equal to
the number of non-zero elements in the matrix A. Refer to values array
description in Sparse Matrix Storage Formats for more details.
acoo (input/output)
Array containing non-zero elements of the matrix A. Its length is equal to
the number of non-zero elements in the matrix A. Refer to values array
description in Sparse Matrix Storage Formats for more details.
rowind (input/output). Array of length nnz, contains the row indices for each non-
zero element of the matrix A.
Refer to rows array description in Coordinate Format for more details.
colind (input/output). Array of length nnz, contains the column indices for each
non-zero element of the matrix A. Refer to columns array description in
Coordinate Format for more details.
Output Parameters
nnz Returns the number of converted elements of the matrix A for job[0]=0.
info Integer info indicator only for converting the matrix A from the CSR format.
If info=0, the execution is successful.
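The CSR-to-coordinate direction can be sketched in a few lines of plain C. This is an illustration only, not the library implementation: csr_to_coo_ref is a hypothetical helper, it assumes zero-based CSR input, and its coo_base argument is an assumption of this sketch that plays the role of the indexing choice described for job[2].

```c
#include <assert.h>

/* Hypothetical reference helper (not an MKL routine):
 * expands an n-row CSR matrix (zero-based ja and ia) into coordinate format.
 * coo_base is 0 or 1 and selects the indexing of rowind and colind.
 * The output arrays must hold at least ia[n] entries.
 * Returns the number of converted elements. */
int csr_to_coo_ref(int n, const double *acsr, const int *ja, const int *ia,
                   int coo_base, double *acoo, int *rowind, int *colind)
{
    int nnz = 0;
    assert(n >= 0 && (coo_base == 0 || coo_base == 1));
    for (int i = 0; i < n; ++i) {
        for (int p = ia[i]; p < ia[i + 1]; ++p) {
            rowind[nnz] = i + coo_base;      /* row of every entry in this range */
            colind[nnz] = ja[p] + coo_base;
            acoo[nnz] = acsr[p];
            ++nnz;
        }
    }
    return nnz;
}
```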
mkl_?csrbsr
Converts a square sparse matrix in the CSR format to
the BSR format and vice versa (deprecated).
Syntax
void mkl_scsrbsr (const MKL_INT *job , const MKL_INT *m , const MKL_INT *mblk , const
MKL_INT *ldabsr , float *acsr , MKL_INT *ja , MKL_INT *ia , float *absr , MKL_INT *jab ,
MKL_INT *iab , MKL_INT *info );
void mkl_dcsrbsr (const MKL_INT *job , const MKL_INT *m , const MKL_INT *mblk , const
MKL_INT *ldabsr , double *acsr , MKL_INT *ja , MKL_INT *ia , double *absr , MKL_INT
*jab , MKL_INT *iab , MKL_INT *info );
void mkl_ccsrbsr (const MKL_INT *job , const MKL_INT *m , const MKL_INT *mblk , const
MKL_INT *ldabsr , MKL_Complex8 *acsr , MKL_INT *ja , MKL_INT *ia , MKL_Complex8 *absr ,
MKL_INT *jab , MKL_INT *iab , MKL_INT *info );
void mkl_zcsrbsr (const MKL_INT *job , const MKL_INT *m , const MKL_INT *mblk , const
MKL_INT *ldabsr , MKL_Complex16 *acsr , MKL_INT *ja , MKL_INT *ia , MKL_Complex16
*absr , MKL_INT *jab , MKL_INT *iab , MKL_INT *info );
Include Files
• mkl.h
Description
This routine is deprecated. Use the matrix manipulation routines from the Intel® oneAPI Math Kernel Library
(oneMKL) Inspector-Executor Sparse BLAS interface instead.
This routine converts a square sparse matrix A stored in the compressed sparse row (CSR) format (3-array
variation) to the block sparse row (BSR) format and vice versa.
Input Parameters
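job[0]
If job[0]=0, the matrix in the CSR format is converted to the BSR format;
if job[0]=1, the matrix in the BSR format is converted to the CSR format.
job[1]
If job[1]=0, zero-based indexing for the matrix in CSR format is used;
if job[1]=1, one-based indexing for the matrix in CSR format is used.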
job[2]
If job[2]=0, zero-based indexing for the matrix in the BSR format is used;
if job[2]=1, one-based indexing for the matrix in the BSR format is used.
job[3] is only used for conversion to CSR format. By default, the converter
saves the blocks without checking whether an element is zero or not. If
job[3]=1, then the converter only saves non-zero elements in blocks.
job[5] - job indicator.
For conversion to the BSR format:
If job[5]=0, only arrays jab, iab are generated for the output storage.
If job[5]>0, all output arrays absr, jab, and iab are filled in for the
output storage.
If job[5]=-1, iab[m] - iab[0] returns the number of non-zero blocks.
m Actual row dimension of the matrix A for conversion to the BSR format; block
row dimension of the matrix A for conversion to the CSR format.
ldabsr Leading dimension of the array absr as declared in the calling program.
ldabsr must be greater than or equal to mblk*mblk.
acsr (input/output)
Array containing non-zero elements of the matrix A. Its length is equal to
the number of non-zero elements in the matrix A. Refer to values array
description in Sparse Matrix Storage Formats for more details.
absr (input/output)
Array containing elements of non-zero blocks of the matrix A. Its length is
equal to the number of non-zero blocks in the matrix A multiplied by
mblk*mblk. Refer to values array description in BSR Format for more
details.
jab (input/output). Array containing the column indices for each non-zero block
of the matrix A.
Its length is equal to the number of non-zero blocks of the matrix A. Refer
to columns array description in BSR Format for more details.
Output Parameters
info Integer info indicator only for converting the matrix A from the CSR format.
If info=0, the execution is successful.
If info=1, it means that mblk is equal to 0.
mkl_?csrcsc
Converts a square sparse matrix in the CSR format to
the CSC format and vice versa (deprecated).
Syntax
void mkl_dcsrcsc (const MKL_INT *job , const MKL_INT *n , double *acsr , MKL_INT *ja ,
MKL_INT *ia , double *acsc , MKL_INT *ja1 , MKL_INT *ia1 , MKL_INT *info );
void mkl_scsrcsc (const MKL_INT *job , const MKL_INT *n , float *acsr , MKL_INT *ja ,
MKL_INT *ia , float *acsc , MKL_INT *ja1 , MKL_INT *ia1 , MKL_INT *info );
void mkl_ccsrcsc (const MKL_INT *job , const MKL_INT *n , MKL_Complex8 *acsr , MKL_INT
*ja , MKL_INT *ia , MKL_Complex8 *acsc , MKL_INT *ja1 , MKL_INT *ia1 , MKL_INT *info );
void mkl_zcsrcsc (const MKL_INT *job , const MKL_INT *n , MKL_Complex16 *acsr , MKL_INT
*ja , MKL_INT *ia , MKL_Complex16 *acsc , MKL_INT *ja1 , MKL_INT *ia1 , MKL_INT *info );
Include Files
• mkl.h
Description
This routine is deprecated. Use the matrix manipulation routines from the Intel® oneAPI Math Kernel Library
(oneMKL) Inspector-Executor Sparse BLAS interface instead.
This routine converts a square sparse matrix A stored in the compressed sparse row (CSR) format (3-array
variation) to the compressed sparse column (CSC) format and vice versa.
Input Parameters
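job[0]
If job[0]=0, the matrix in the CSR format is converted to the CSC format;
if job[0]=1, the matrix in the CSC format is converted to the CSR format.
job[1]
If job[1]=0, zero-based indexing for the matrix in CSR format is used;
if job[1]=1, one-based indexing for the matrix in CSR format is used.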
job[2]
If job[2]=0, zero-based indexing for the matrix in the CSC format is used;
if job[2]=1, one-based indexing for the matrix in the CSC format is used.
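job[5] - job indicator.
For conversion to the CSC format:
If job[5]=0, only arrays ja1, ia1 are filled in for the output storage.
If job[5]≠0, all output arrays acsc, ja1, and ia1 are filled in for the
output storage.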
For conversion to the CSR format:
If job[5]=0, only arrays ja, ia are filled in for the output storage.
If job[5]≠0, all output arrays acsr, ja, and ia are filled in for the output
storage.
acsr (input/output)
Array containing non-zero elements of the square matrix A. Its length is
equal to the number of non-zero elements in the matrix A. Refer to values
array description in Sparse Matrix Storage Formats for more details.
acsc (input/output)
Array containing non-zero elements of the square matrix A. Its length is
equal to the number of non-zero elements in the matrix A. Refer to values
array description in Sparse Matrix Storage Formats for more details.
ja1 (input/output). Array containing the row indices for each non-zero element
of the matrix A.
Its length is equal to the length of the array acsc. Refer to columns array
description in Sparse Matrix Storage Formats for more details.
Output Parameters
mkl_?csrdia
Converts a sparse matrix in the CSR format to the
diagonal format and vice versa (deprecated).
Syntax
void mkl_dcsrdia (const MKL_INT *job , const MKL_INT *n , double *acsr , MKL_INT *ja ,
MKL_INT *ia , double *adia , const MKL_INT *ndiag , MKL_INT *distance , MKL_INT
*idiag , double *acsr_rem , MKL_INT *ja_rem , MKL_INT *ia_rem , MKL_INT *info );
void mkl_scsrdia (const MKL_INT *job , const MKL_INT *n , float *acsr , MKL_INT *ja ,
MKL_INT *ia , float *adia , const MKL_INT *ndiag , MKL_INT *distance , MKL_INT *idiag ,
float *acsr_rem , MKL_INT *ja_rem , MKL_INT *ia_rem , MKL_INT *info );
void mkl_ccsrdia (const MKL_INT *job , const MKL_INT *n , MKL_Complex8 *acsr , MKL_INT
*ja , MKL_INT *ia , MKL_Complex8 *adia , const MKL_INT *ndiag , MKL_INT *distance ,
MKL_INT *idiag , MKL_Complex8 *acsr_rem , MKL_INT *ja_rem , MKL_INT *ia_rem , MKL_INT
*info );
void mkl_zcsrdia (const MKL_INT *job , const MKL_INT *n , MKL_Complex16 *acsr , MKL_INT
*ja , MKL_INT *ia , MKL_Complex16 *adia , const MKL_INT *ndiag , MKL_INT *distance ,
MKL_INT *idiag , MKL_Complex16 *acsr_rem , MKL_INT *ja_rem , MKL_INT *ia_rem , MKL_INT
*info );
Include Files
• mkl.h
Description
This routine is deprecated, but no replacement is available yet in the Inspector-Executor Sparse BLAS API
interfaces. Either write your own (see the examples/c/sparse_blas/source/sparse_converters.c
example for hints) or continue using this routine until a replacement is provided and this can be fully
removed.
This routine converts a sparse matrix A stored in the compressed sparse row (CSR) format (3-array
variation) to the diagonal format and vice versa.
Input Parameters
job[2]
If job[2]=0, zero-based indexing for the matrix in the diagonal format is
used;
if job[2]=1, one-based indexing for the matrix in the diagonal format is
used.
job[5] - job indicator.
If job[5]≠0, each entry in the array adia is not checked whether it is zero.
acsr (input/output)
Array containing non-zero elements of the matrix A. Its length is equal to
the number of non-zero elements in the matrix A. Refer to values array
description in Sparse Matrix Storage Formats for more details.
adia (input/output)
Array of size (ndiag*idiag) containing diagonals of the matrix A.
The key point of the storage is that each element in the array adia retains
the row number of the original matrix. To achieve this, diagonals in the
lower triangular part of the matrix are padded from the top, and those in
the upper triangular part are padded from the bottom.
ndiag Specifies the leading dimension of the array adia as declared in the calling
(sub)program, must be at least max(1, m).
distance Array of length idiag, containing the distances between the main diagonal
and each non-zero diagonal to be extracted. The distance is positive if the
diagonal is above the main diagonal, and negative if the diagonal is below
the main diagonal. The main diagonal has a distance equal to zero.
acsr_rem, ja_rem, ia_rem Remainder of the matrix in the CSR format if it is needed for conversion to
the diagonal format.
Output Parameters
mkl_?csrsky
Converts a sparse matrix in CSR format to the skyline
format and vice versa (deprecated).
Syntax
void mkl_dcsrsky (const MKL_INT *job , const MKL_INT *m , double *acsr , MKL_INT *ja ,
MKL_INT *ia , double *asky , MKL_INT *pointers , MKL_INT *info );
void mkl_scsrsky (const MKL_INT *job , const MKL_INT *m , float *acsr , MKL_INT *ja ,
MKL_INT *ia , float *asky , MKL_INT *pointers , MKL_INT *info );
void mkl_ccsrsky (const MKL_INT *job , const MKL_INT *m , MKL_Complex8 *acsr , MKL_INT
*ja , MKL_INT *ia , MKL_Complex8 *asky , MKL_INT *pointers , MKL_INT *info );
void mkl_zcsrsky (const MKL_INT *job , const MKL_INT *m , MKL_Complex16 *acsr , MKL_INT
*ja , MKL_INT *ia , MKL_Complex16 *asky , MKL_INT *pointers , MKL_INT *info );
Include Files
• mkl.h
Description
This routine is deprecated, but no replacement is available yet in the Inspector-Executor Sparse BLAS API
interfaces. Either write your own (see the examples/c/sparse_blas/source/sparse_converters.c
example for hints) or continue using this routine until a replacement is provided and this can be fully
removed.
This routine converts a sparse matrix A stored in the compressed sparse row (CSR) format (3-array
variation) to the skyline format and vice versa.
Input Parameters
job[2]
If job[3]=1, the lower part of the matrix A in the CSR format is converted.
If job[5]=1, all output arrays asky and pointers are filled in for the
output storage.
acsr (input/output)
Array containing non-zero elements of the matrix A. Its length is equal to
the number of non-zero elements in the matrix A. Refer to values array
description in Sparse Matrix Storage Formats for more details.
asky (input/output)
Array. For the lower triangular part of A, it contains the set of elements from
each row starting from the first non-zero element to and including the
diagonal element. For an upper triangular matrix it contains the set of
elements from each column of the matrix starting with the first non-zero
element down to and including the diagonal element. Encountered zero
elements are included in the sets. Refer to values array description in
Skyline Storage Format for more details.
pointers (input/output).
Array with dimension (m+1), where m is the number of rows for the lower triangle
(columns for the upper triangle). pointers[i-1] - pointers[0] gives the
index of the element in the array asky that is the first non-zero element in row
(column) i. The value of pointers[m] is set to nnz + pointers[0],
where nnz is the number of elements in the array asky. Refer to pointers
array description in Skyline Storage Format for more details.
Output Parameters
info Integer info indicator only for converting the matrix A from the CSR format.
If info=0, the execution is successful.
mkl_?csradd
Computes the sum of two matrices stored in the CSR
format (3-array variation) with one-based indexing
(deprecated).
Syntax
void mkl_dcsradd (const char *trans , const MKL_INT *request , const MKL_INT *sort ,
const MKL_INT *m , const MKL_INT *n , double *a , MKL_INT *ja , MKL_INT *ia , const
double *beta , double *b , MKL_INT *jb , MKL_INT *ib , double *c , MKL_INT *jc , MKL_INT
*ic , const MKL_INT *nzmax , MKL_INT *info );
void mkl_scsradd (const char *trans , const MKL_INT *request , const MKL_INT *sort ,
const MKL_INT *m , const MKL_INT *n , float *a , MKL_INT *ja , MKL_INT *ia , const
float *beta , float *b , MKL_INT *jb , MKL_INT *ib , float *c , MKL_INT *jc , MKL_INT
*ic , const MKL_INT *nzmax , MKL_INT *info );
void mkl_ccsradd (const char *trans , const MKL_INT *request , const MKL_INT *sort ,
const MKL_INT *m , const MKL_INT *n , MKL_Complex8 *a , MKL_INT *ja , MKL_INT *ia ,
const MKL_Complex8 *beta , MKL_Complex8 *b , MKL_INT *jb , MKL_INT *ib , MKL_Complex8
*c , MKL_INT *jc , MKL_INT *ic , const MKL_INT *nzmax , MKL_INT *info );
void mkl_zcsradd (const char *trans , const MKL_INT *request , const MKL_INT *sort ,
const MKL_INT *m , const MKL_INT *n , MKL_Complex16 *a , MKL_INT *ja , MKL_INT *ia ,
const MKL_Complex16 *beta , MKL_Complex16 *b , MKL_INT *jb , MKL_INT *ib ,
MKL_Complex16 *c , MKL_INT *jc , MKL_INT *ic , const MKL_INT *nzmax , MKL_INT *info );
Include Files
• mkl.h
Description
This routine is deprecated. Use mkl_sparse_?_add from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
C := A+beta*op(B)
where:
A, B, C are the sparse matrices in the CSR format (3-array variation).
op(B) is one of op(B) = B, or op(B) = B^T, or op(B) = B^H.
beta is a scalar.
The routine works correctly if and only if the column indices in sparse matrix representations of matrices A
and B are arranged in the increasing order for each row. If not, use the parameter sort (see below) to
reorder column indices and the corresponding elements of the input matrices.
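The requirement for sorted column indices follows from the way two sorted rows can be merged in a single pass. The sketch below illustrates this for the non-transposed case C := A + beta*B with one-based indexing. It is a plain C illustration under stated assumptions, not the library's implementation: the function name csr_add is hypothetical, only double precision is handled, and the return value imitates the info convention in which a positive value I reports that row I would exceed nzmax.

```c
/* C := A + beta*B for m-row CSR matrices with one-based indexing and
 * column indices sorted in increasing order within each row.
 * Returns 0 on success, or the one-based number of the row at which
 * the result would need more than nzmax entries. */
int csr_add(int m, const double *a, const int *ja, const int *ia,
            double beta, const double *b, const int *jb, const int *ib,
            double *c, int *jc, int *ic, int nzmax)
{
    int nz = 0;
    ic[0] = 1;
    for (int i = 0; i < m; ++i) {
        int pa = ia[i] - 1, ea = ia[i + 1] - 1;   /* entries of row i in A */
        int pb = ib[i] - 1, eb = ib[i + 1] - 1;   /* entries of row i in B */
        while (pa < ea || pb < eb) {
            if (nz >= nzmax) return i + 1;
            if (pb >= eb || (pa < ea && ja[pa] < jb[pb])) {
                /* Column present only in A. */
                jc[nz] = ja[pa];
                c[nz] = a[pa++];
            } else if (pa >= ea || jb[pb] < ja[pa]) {
                /* Column present only in B. */
                jc[nz] = jb[pb];
                c[nz] = beta * b[pb++];
            } else {
                /* Same column in both matrices. */
                jc[nz] = ja[pa];
                c[nz] = a[pa++] + beta * b[pb++];
            }
            ++nz;
        }
        ic[i + 1] = nz + 1;
    }
    return 0;
}
```

If the column indices were not sorted, the merge would emit duplicate columns instead of summing matching entries, which is why the sort parameter exists.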
NOTE
This routine supports only one-based indexing of the input arrays.
Input Parameters
request If request=0, the routine performs addition. The memory for the output
arrays ic, jc, c must be allocated beforehand.
sort Specifies the type of reordering. If this parameter is not set (default), the
routine does not perform reordering.
If sort=1, the routine arranges the column indices ja for each row in the
increasing order and reorders the corresponding values of the matrix A in
the array a.
If sort=2, the routine arranges the column indices jb for each row in the
increasing order and reorders the corresponding values of the matrix B in
the array b.
If sort=3, the routine performs reordering for both input matrices A and B.
a Array containing non-zero elements of the matrix A. Its length is equal to
the number of non-zero elements in the matrix A. Refer to values array
description in Sparse Matrix Storage Formats for more details.
ja Array containing the column indices plus one for each non-zero element of
the matrix A. For each row the column indices must be arranged in the
increasing order.
The length of this array is equal to the length of the array a. Refer to
columns array description in Sparse Matrix Storage Formats for more
details.
jb Array containing the column indices plus one for each non-zero element of
the matrix B. For each row the column indices must be arranged in the
increasing order.
The length of this array is equal to the length of the array b. Refer to
columns array description in Sparse Matrix Storage Formats for more
details.
This array contains indices of elements in the array b, such that ib[i] -
ib[0] is the index in the array b of the first non-zero element from the row
i. The value of the last element ib[m] or ib[n] is equal to the number of
non-zero elements of the matrix B plus one. Refer to rowIndex array
description in Sparse Matrix Storage Formats for more details.
Output Parameters
jc Array containing the column indices plus one for each non-zero element of
the matrix C.
The length of this array is equal to the length of the array c. Refer to
columns array description in Sparse Matrix Storage Formats for more
details.
If info=I>0, the routine stops calculation in the I-th row of the matrix C
because the number of elements in C exceeds nzmax.
If info=-1, the routine calculates only the size of the arrays c and jc and
returns this value plus 1 as the last element of the array ic.
mkl_?csrmultcsr
Computes product of two sparse matrices stored in
the CSR format (3-array variation) with one-based
indexing (deprecated).
Syntax
void mkl_dcsrmultcsr (const char *trans , const MKL_INT *request , const MKL_INT
*sort , const MKL_INT *m , const MKL_INT *n , const MKL_INT *k , double *a , MKL_INT
*ja , MKL_INT *ia , double *b , MKL_INT *jb , MKL_INT *ib , double *c , MKL_INT *jc ,
MKL_INT *ic , const MKL_INT *nzmax , MKL_INT *info );
void mkl_scsrmultcsr (const char *trans , const MKL_INT *request , const MKL_INT
*sort , const MKL_INT *m , const MKL_INT *n , const MKL_INT *k , float *a , MKL_INT
*ja , MKL_INT *ia , float *b , MKL_INT *jb , MKL_INT *ib , float *c , MKL_INT *jc ,
MKL_INT *ic , const MKL_INT *nzmax , MKL_INT *info );
void mkl_ccsrmultcsr (const char *trans , const MKL_INT *request , const MKL_INT
*sort , const MKL_INT *m , const MKL_INT *n , const MKL_INT *k , MKL_Complex8 *a ,
MKL_INT *ja , MKL_INT *ia , MKL_Complex8 *b , MKL_INT *jb , MKL_INT *ib , MKL_Complex8
*c , MKL_INT *jc , MKL_INT *ic , const MKL_INT *nzmax , MKL_INT *info );
void mkl_zcsrmultcsr (const char *trans , const MKL_INT *request , const MKL_INT
*sort , const MKL_INT *m , const MKL_INT *n , const MKL_INT *k , MKL_Complex16 *a ,
MKL_INT *ja , MKL_INT *ia , MKL_Complex16 *b , MKL_INT *jb , MKL_INT *ib ,
MKL_Complex16 *c , MKL_INT *jc , MKL_INT *ic , const MKL_INT *nzmax , MKL_INT *info );
Include Files
• mkl.h
Description
This routine is deprecated. Use mkl_sparse_spmm from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?csrmultcsr routine performs a matrix-matrix operation defined as
C := op(A)*B
where:
A, B, C are the sparse matrices in the CSR format (3-array variation);
op(A) is one of op(A) = A, or op(A) = A^T, or op(A) = A^H.
You can use the parameter sort to perform or not perform reordering of non-zero entries in input and output
sparse matrices. The purpose of reordering is to rearrange non-zero entries in compressed sparse row matrix
so that column indices in compressed sparse representation are sorted in the increasing order for each row.
The following table shows correspondence between the value of the parameter sort and the type of
reordering performed by this routine for each sparse matrix involved:
Value of the parameter sort           Reordering of A       Reordering of B       Reordering of C
                                      (arrays a, ja, ia)    (arrays b, jb, ib)    (arrays c, jc, ic)
1                                     yes                   no                    yes
2                                     no                    yes                   yes
3                                     yes                   yes                   yes
4                                     yes                   no                    no
5                                     no                    yes                   no
6                                     yes                   yes                   no
7                                     no                    no                    no
arbitrary value not equal to          no                    no                    yes
1, 2,..., 7
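The table can be read as a mapping from the value of sort to three yes/no decisions. The following sketch encodes that mapping in plain C. The type sort_plan and the function decode_sort are hypothetical names introduced only for illustration and are not part of oneMKL.

```c
/* Which matrices the sort parameter asks the routine to reorder. */
struct sort_plan {
    int reorder_a;
    int reorder_b;
    int reorder_c;
};

struct sort_plan decode_sort(int sort)
{
    /* Any value outside 1..7: reorder only the output matrix C. */
    struct sort_plan plan = {0, 0, 1};

    if (sort >= 1 && sort <= 7) {
        plan.reorder_a = (sort == 1 || sort == 3 || sort == 4 || sort == 6);
        plan.reorder_b = (sort == 2 || sort == 3 || sort == 5 || sort == 6);
        plan.reorder_c = (sort <= 3);
    }
    return plan;
}
```

For example, decode_sort(4) requests reordering of A only, while decode_sort(7) requests no reordering at all.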
NOTE
This routine supports only one-based indexing of the input arrays.
Input Parameters
request If request=0, the routine performs multiplication. The memory for the
output arrays ic, jc, c must be allocated beforehand.
ja Array containing the column indices plus one for each non-zero element of
the matrix A. For each row the column indices must be arranged in the
increasing order.
The length of this array is equal to the length of the array a. Refer to
columns array description in Sparse Matrix Storage Formats for more
details.
ia Array of length m + 1.
This array contains indices of elements in the array a, such that ia[i] -
ia[0] is the index in the array a of the first non-zero element from the row
i. The value of the last element ia[m] is equal to the number of non-zero
elements of the matrix A plus one. Refer to rowIndex array description in
Sparse Matrix Storage Formats for more details.
jb Array containing the column indices plus one for each non-zero element of
the matrix B. For each row the column indices must be arranged in the
increasing order.
The length of this array is equal to the length of the array b. Refer to
columns array description in Sparse Matrix Storage Formats for more
details.
This array contains indices of elements in the array b, such that ib[i] -
ib[0] is the index in the array b of the first non-zero element from the row
i. The value of the last element ib[n] or ib[m] is equal to the number of
non-zero elements of the matrix B plus one. Refer to rowIndex array
description in Sparse Matrix Storage Formats for more details.
Output Parameters
jc Array containing the column indices plus one for each non-zero element of
the matrix C.
The length of this array is equal to the length of the array c. Refer to
columns array description in Sparse Matrix Storage Formats for more
details.
This array contains indices of elements in the array c, such that ic[i] -
ic[0] is the index in the array c of the first non-zero element from the row
i. The value of the last element ic[m] or ic[n] is equal to the number of
non-zero elements of the matrix C plus one. Refer to rowIndex array
description in Sparse Matrix Storage Formats for more details.
If info=I>0, the routine stops calculation in the I-th row of the matrix C
because the number of elements in C exceeds nzmax.
If info=-1, the routine calculates only the size of the arrays c and jc and
returns this value plus 1 as the last element of the array ic.
mkl_?csrmultd
Computes product of two sparse matrices stored in
the CSR format (3-array variation) with one-based
indexing. The result is stored in the dense matrix
(deprecated).
Syntax
void mkl_dcsrmultd (const char *trans , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , double *a , MKL_INT *ja , MKL_INT *ia , double *b , MKL_INT *jb , MKL_INT
*ib , double *c , MKL_INT *ldc );
void mkl_scsrmultd (const char *trans , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , float *a , MKL_INT *ja , MKL_INT *ia , float *b , MKL_INT *jb , MKL_INT
*ib , float *c , MKL_INT *ldc );
void mkl_ccsrmultd (const char *trans , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , MKL_Complex8 *a , MKL_INT *ja , MKL_INT *ia , MKL_Complex8 *b , MKL_INT
*jb , MKL_INT *ib , MKL_Complex8 *c , MKL_INT *ldc );
void mkl_zcsrmultd (const char *trans , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , MKL_Complex16 *a , MKL_INT *ja , MKL_INT *ia , MKL_Complex16 *b , MKL_INT
*jb , MKL_INT *ib , MKL_Complex16 *c , MKL_INT *ldc );
Include Files
• mkl.h
Description
This routine is deprecated. Use mkl_sparse_?_spmmd from the Intel® oneAPI Math Kernel Library (oneMKL)
Inspector-executor Sparse BLAS interface instead.
The mkl_?csrmultd routine performs a matrix-matrix operation defined as
C := op(A)*B
where:
A, B are the sparse matrices in the CSR format (3-array variation), C is a dense matrix;
op(A) is one of op(A) = A, or op(A) = A^T, or op(A) = A^H.
The routine works correctly if and only if the column indices in sparse matrix representations of matrices A
and B are arranged in the increasing order for each row.
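As an illustration of the operation for the non-transposed case, the sketch below computes C := A*B from two one-based CSR matrices into a dense array. It is a plain C illustration, not the library's implementation: the function name csr_mult_dense is hypothetical, only double precision is handled, and the dense result is assumed to be stored column by column with leading dimension ldc, which is consistent with ldc being at least m when trans = 'N'.

```c
/* C := A*B, where A has m rows, B has k columns, both are CSR matrices
 * with one-based indexing, and C is a dense m-by-k array stored column
 * by column with leading dimension ldc >= m (an assumption of this sketch). */
void csr_mult_dense(int m, int k,
                    const double *a, const int *ja, const int *ia,
                    const double *b, const int *jb, const int *ib,
                    double *c, int ldc)
{
    /* Clear the m-by-k result. */
    for (int j = 0; j < k; ++j)
        for (int i = 0; i < m; ++i)
            c[i + j * ldc] = 0.0;

    for (int i = 0; i < m; ++i) {
        for (int pa = ia[i] - 1; pa < ia[i + 1] - 1; ++pa) {
            int row_b = ja[pa] - 1;   /* a column of A selects a row of B */
            for (int pb = ib[row_b] - 1; pb < ib[row_b + 1] - 1; ++pb)
                c[i + (jb[pb] - 1) * ldc] += a[pa] * b[pb];
        }
    }
}
```

Each non-zero a(i, l) scales row l of B and accumulates it into row i of C, so the work is proportional to the number of non-zero products rather than to m*n*k.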
NOTE
This routine supports only one-based indexing of the input arrays.
Input Parameters
ja Array containing the column indices plus one for each non-zero element of
the matrix A. For each row the column indices must be arranged in the
increasing order.
The length of this array is equal to the length of the array a. Refer to
columns array description in Sparse Matrix Storage Formats for more
details.
This array contains indices of elements in the array a, such that ia[i] -
ia[0] is the index in the array a of the first non-zero element from the row
i. The value of the last element ia[m] or ia[n] is equal to the number of
non-zero elements of the matrix A plus one. Refer to rowIndex array
description in Sparse Matrix Storage Formats for more details.
jb Array containing the column indices plus one for each non-zero element of
the matrix B. For each row the column indices must be arranged in the
increasing order.
The length of this array is equal to the length of the array b. Refer to
columns array description in Sparse Matrix Storage Formats for more
details.
ib Array of length m + 1.
This array contains indices of elements in the array b, such that ib[i] -
ib[0] is the index in the array b of the first non-zero element from the row
i. The value of the last element ib[m] is equal to the number of non-zero
elements of the matrix B plus one. Refer to rowIndex array description in
Sparse Matrix Storage Formats for more details.
Output Parameters
ldc Specifies the leading dimension of the dense matrix C as declared in the
calling (sub)program. Must be at least max(m, 1) when trans = 'N' or
'n', or max(1, n) otherwise.
Sparse QR Routines
Sparse QR routines and their data types
NOTE Underdetermined systems of equations are not supported. The number of columns must
be less than or equal to the number of rows.
For more information about the workflow of sparse QR functionality, refer to oneMKL Sparse QR solver.
Multifrontal Sparse QR Factorization Method for Solving a Sparse System of Linear Equations.
mkl_sparse_set_qr_hint
Define the pivot strategy for further calls of
mkl_sparse_?_qr.
Syntax
sparse_status_t mkl_sparse_set_qr_hint (sparse_matrix_t A, sparse_qr_hint_t hint);
Include Files
• mkl_sparse_qr.h
Description
You can use this routine to enable a pivot strategy in the case of an ill-conditioned matrix.
Input Parameters
Return Values
mkl_sparse_?_qr
Computes the QR decomposition for the matrix of a
sparse linear system and calculates the solution.
Syntax
sparse_status_t mkl_sparse_d_qr ( sparse_operation_t operation, sparse_matrix_t A,
struct matrix_descr descr, sparse_layout_t layout, MKL_INT columns, double *x, MKL_INT
ldx, const double *b, MKL_INT ldb );
sparse_status_t mkl_sparse_s_qr ( sparse_operation_t operation, sparse_matrix_t A,
struct matrix_descr descr, sparse_layout_t layout, MKL_INT columns, float *x, MKL_INT
ldx, const float *b, MKL_INT ldb );
Include Files
• mkl_sparse_qr.h
Description
The mkl_sparse_?_qr routine computes the QR decomposition for the matrix of a sparse linear system A*x
= b, so that A = Q*R where Q is the orthogonal matrix and R is upper triangular, and calculates the solution.
NOTE
Currently, mkl_sparse_?_qr supports only square and overdetermined systems. For
underdetermined systems you can manually transpose the system matrix and use QR
decomposition for A^T to get the minimum-norm solution for the original underdetermined
system.
NOTE Currently, mkl_sparse_?_qr supports only CSR format for the input matrix, non-
transpose operation, and single right-hand side.
Input Parameters
descr Structure specifying sparse matrix properties. Only the parameters listed here
are currently supported.
                                 layout =                      layout =
                                 SPARSE_LAYOUT_COLUMN_MAJOR    SPARSE_LAYOUT_ROW_MAJOR
rows (number of rows in x)       ldx                           Number of columns in A
cols (number of columns in x)    columns                       ldx
                                 layout =                      layout =
                                 SPARSE_LAYOUT_COLUMN_MAJOR    SPARSE_LAYOUT_ROW_MAJOR
rows (number of rows in b)       ldb                           Number of columns in A
Output Parameters
Return Values
mkl_sparse_qr_reorder
Reordering step of SPARSE QR solver.
Syntax
sparse_status_t mkl_sparse_qr_reorder (sparse_matrix_t A, struct matrix_descr descr);
Include Files
• mkl_sparse_qr.h
Description
The mkl_sparse_qr_reorder routine performs ordering and symbolic analysis of matrix A.
NOTE Currently, mkl_sparse_qr_reorder supports only general structure and CSR format
for the input matrix.
Input Parameters
descr Structure specifying sparse matrix properties. Only the parameters listed here
are currently supported.
Return Values
mkl_sparse_?_qr_factorize
Factorization step of the SPARSE QR solver.
Syntax
sparse_status_t mkl_sparse_d_qr_factorize (sparse_matrix_t A, double *alt_values);
sparse_status_t mkl_sparse_s_qr_factorize (sparse_matrix_t A, float *alt_values);
Include Files
• mkl_sparse_qr.h
Description
The mkl_sparse_?_qr_factorize routine performs numerical factorization of matrix A. Prior to calling this
routine, the mkl_sparse_qr_reorder routine must be called for the matrix handle A. For more
information about the workflow of sparse QR functionality, refer to oneMKL Sparse QR solver. Multifrontal
Sparse QR Factorization Method for Solving a Sparse System of Linear Equations.
NOTE Currently, mkl_sparse_?_qr_factorize supports only CSR format for the input matrix.
Input Parameters
alt_values Array with alternative values. Its size must equal the number of non-zero elements in the
initial input matrix. When passed to the routine, these values will be used during the
factorization step instead of the values stored in handle A.
Return Values
mkl_sparse_?_qr_solve
Solving step of the SPARSE QR solver.
Syntax
sparse_status_t mkl_sparse_d_qr_solve ( sparse_operation_t operation, sparse_matrix_t
A, double *alt_values, sparse_layout_t layout, MKL_INT columns, double *x, MKL_INT ldx,
const double *b, MKL_INT ldb );
sparse_status_t mkl_sparse_s_qr_solve ( sparse_operation_t operation, sparse_matrix_t
A, float *alt_values, sparse_layout_t layout, MKL_INT columns, float *x, MKL_INT ldx,
const float *b, MKL_INT ldb );
Include Files
• mkl_sparse_qr.h
Description
The mkl_sparse_?_qr_solve routine computes the solution of sparse systems of linear equations A*x =
b. Prior to calling this routine, the mkl_sparse_?_qr_factorize routine must be called for the matrix
handle A. For more information about the workflow of sparse QR functionality, refer to oneMKL Sparse QR
solver. Multifrontal Sparse QR Factorization Method for Solving a Sparse System of Linear Equations.
NOTE
Currently, mkl_sparse_?_qr_solve supports only CSR format for the input matrix, non-
transpose operation, and single right-hand side.
Alternative values are not supported and must be set to NULL.
Input Parameters
                                 layout =                      layout =
                                 SPARSE_LAYOUT_COLUMN_MAJOR    SPARSE_LAYOUT_ROW_MAJOR
rows (number of rows in x)       ldx                           Number of columns in A
cols (number of columns in x)    columns                       ldx
                                 layout =                      layout =
                                 SPARSE_LAYOUT_COLUMN_MAJOR    SPARSE_LAYOUT_ROW_MAJOR
rows (number of rows in b)       ldb                           Number of columns in A
cols (number of columns in b)    columns                       ldb
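The layout tables can be made concrete with a small indexing sketch. It shows how the leading dimension locates an element in each layout and how the rows and cols entries of the tables combine into an array length. The enum dense_layout and both function names are hypothetical stand-ins that do not come from the oneMKL headers, and the size formula assumes that the array must hold rows*cols elements.

```c
#include <stddef.h>

/* Stand-ins for the two sparse_layout_t values named in the tables. */
enum dense_layout { COLUMN_MAJOR, ROW_MAJOR };

/* Offset of element (i, j), zero-based, in a dense array whose leading
 * dimension is ld. */
size_t dense_offset(enum dense_layout layout, size_t i, size_t j, size_t ld)
{
    return layout == COLUMN_MAJOR ? i + j * ld : i * ld + j;
}

/* rows*cols as given by the tables:
 *   column major: rows = ld,     cols = columns
 *   row major:    rows = a_cols, cols = ld
 * where a_cols is the number of columns in A. */
size_t dense_min_size(enum dense_layout layout, size_t a_cols,
                      size_t columns, size_t ld)
{
    return layout == COLUMN_MAJOR ? ld * columns : a_cols * ld;
}
```

With a single right-hand side (columns = 1) in column-major layout, the array is a plain vector of length ld.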
Output Parameters
Return Values
mkl_sparse_?_qr_qmult
First stage of the solving step of the SPARSE QR
solver.
Syntax
sparse_status_t mkl_sparse_d_qr_qmult ( sparse_operation_t operation, sparse_matrix_t
A, sparse_layout_t layout, MKL_INT columns, double *x, MKL_INT ldx, const double *b,
MKL_INT ldb );
sparse_status_t mkl_sparse_s_qr_qmult ( sparse_operation_t operation, sparse_matrix_t
A, sparse_layout_t layout, MKL_INT columns, float *x, MKL_INT ldx, const float *b,
MKL_INT ldb );
Include Files
• mkl_sparse_qr.h
Description
The mkl_sparse_?_qr_qmult routine multiplies the inverse of matrix Q by the right-hand side
matrix b. This routine can be used to perform the solving step in two separate calls as an alternative to a
single call of mkl_sparse_?_qr_solve.
NOTE Currently, mkl_sparse_?_qr_qmult supports only CSR format for the input matrix,
non-transpose operation, and single right-hand side.
Input Parameters
layout Describes the storage scheme for the dense matrix:
                                 layout =                      layout =
                                 SPARSE_LAYOUT_COLUMN_MAJOR    SPARSE_LAYOUT_ROW_MAJOR
rows (number of rows in x)       ldx                           Number of columns in A
cols (number of columns in x)    columns                       ldx
                                 layout =                      layout =
                                 SPARSE_LAYOUT_COLUMN_MAJOR    SPARSE_LAYOUT_ROW_MAJOR
rows (number of rows in b)       ldb                           Number of columns in A
cols (number of columns in b)    columns                       ldb
Output Parameters
Return Values
mkl_sparse_?_qr_rsolve
Second stage of the solving step of the SPARSE QR
solver.
Syntax
sparse_status_t mkl_sparse_d_qr_rsolve ( sparse_operation_t operation, sparse_matrix_t
A, sparse_layout_t layout, MKL_INT columns, double *x, MKL_INT ldx, const double *b,
MKL_INT ldb );
sparse_status_t mkl_sparse_s_qr_rsolve ( sparse_operation_t operation, sparse_matrix_t
A, sparse_layout_t layout, MKL_INT columns, float *x, MKL_INT ldx, const float *b,
MKL_INT ldb );
Include Files
• mkl_sparse_qr.h
Description
The mkl_sparse_?_qr_rsolve routine computes the solution of A*x = b.
NOTE Currently, mkl_sparse_?_qr_rsolve supports only CSR format for the input matrix,
non-transpose operation, and single right-hand side.
Input Parameters
                                 layout =                      layout =
                                 SPARSE_LAYOUT_COLUMN_MAJOR    SPARSE_LAYOUT_ROW_MAJOR
rows (number of rows in x)       ldx                           Number of columns in A
cols (number of columns in x)    columns                       ldx
                                 layout =                      layout =
                                 SPARSE_LAYOUT_COLUMN_MAJOR    SPARSE_LAYOUT_ROW_MAJOR
rows (number of rows in b)       ldb                           Number of columns in A
cols (number of columns in b)    columns                       ldb
Output Parameters
Return Values
Overview
Many HPC applications rely on the application of BLAS and LAPACK operations on groups of very small
matrices. While existing batch Intel® oneAPI Math Kernel Library (oneMKL) BLAS routines already provide
meaningful speedup over OpenMP* loops around BLAS operations for these sizes, a further customization
called Compact BLAS and LAPACK offers additional potential speedup. It allocates matrices in a
SIMD-friendly format, which allows cross-matrix vectorization in the oneMKL BLAS and LAPACK
routines.
The main idea behind these compact methods is to create true SIMD computations in which subgroups of
matrices are operated on with kernels that abstractly appear as scalar kernels, while registers are filled by
cross-matrix vectorization.
These are the BLAS/LAPACK compact functions:
• mkl_?gemm_compact
• mkl_?trsm_compact
• mkl_?potrf_compact
• mkl_?getrfnp_compact
• mkl_?geqrf_compact
• mkl_?getrinp_compact
The compact API provides additional service functions to refactor data. Because this capability is not specific
to any particular BLAS or LAPACK operation, this data manipulation can be executed once for an application's
data, allowing the entire program -- consisting of any number of BLAS and LAPACK operations for which
compact kernels have been written -- to be performed on the compact data without any refactoring. For
applications working on data in compact format, the packing function need not be used.
See "About the Compact Format" below for more details.
Along with this new data format, the API consists of two components:
• BLAS and LAPACK Compact Kernels: The first component of the API is a compact kernel that works on
matrices stored in compact format.
• Service Functions for the Compact Format: The second component of the API is a compact service
function allowing for data to be factored into and out of compact format. These are:
• mkl_?gepack_compact
• mkl_?geunpack_compact
• mkl_get_format_compact
• mkl_?get_size_compact
Note that there are some Numerical Limitations for the routines mentioned above.
The figure below demonstrates the packing of a set of four 3 x 3 real-precision matrices into compact format.
The pack length for this example is V = 2, resulting in 2 compact packs.
For calculations involving complex precision, the real and imaginary parts of each matrix are packed
separately. In the figure below, the group of four 3 x 3 complex matrices is packed into compact format with
pack length V = 2. The first pack consists of the real parts of the first two matrices, and the second pack
consists of the imaginary parts of the first two matrices. Real and imaginary packs alternate in memory. This
storage format means that all compact arrays can be handled as a real type.
The particular specifications (size and number) of the compact packs for the architecture and problem-
precision definition are specified by an MKL_COMPACT_PACK enum type. For example: given a double-
precision problem involving a group of 128 matrices working on an architecture with a 256-bit SIMD vector
length, the optimal pack length is V = 4, and the number of packs is 32.
The initially-permitted values for the enum are:
• MKL_COMPACT_SSE - pack length 2 for double precision, pack length 4 for single precision.
• MKL_COMPACT_AVX - pack length 4 for double precision, pack length 8 for single precision.
• MKL_COMPACT_AVX512 - pack length 8 for double precision, pack length 16 for single precision.
For calculations involving complex precision, the pack length is the same; however, half of the packs store
the real parts of matrices, and half store the imaginary parts. This means that it takes double the number of
packs to store the same number of matrices.
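The relationship between vector length, precision, pack length, and number of packs can be checked with simple arithmetic. The following is a minimal sketch, not an oneMKL routine: pack_length and pack_count are hypothetical helper names, and element_bytes is assumed to be the size of one real value (4 for single precision, 8 for double precision), also for complex data.

```c
#include <assert.h>

/* Pack length V: number of real values of the given size that fit in one
   SIMD vector of simd_bits bits. Hypothetical helper, not an oneMKL call. */
static int pack_length(int simd_bits, int element_bytes)
{
    assert(simd_bits > 0 && element_bytes > 0);
    return simd_bits / (8 * element_bytes);
}

/* Number of packs needed for nm matrices with pack length V.
   Complex data stores real and imaginary parts in separate packs,
   so it needs twice as many packs. */
static int pack_count(int nm, int V, int is_complex)
{
    assert(nm >= 0 && V > 0);
    int packs = (nm + V - 1) / V;  /* round up: the last pack may be padded */
    return is_complex ? 2 * packs : packs;
}
```

With these helpers, pack_length(256, 8) gives 4 and pack_count(128, 4, 0) gives 32, matching the double-precision example above.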
The above examples illustrate the case when the number of matrices is evenly-divisible by the pack length.
When this is not the case, there will be partially-unfilled packs at the end of the memory segment, and the
compact-packing routine will pad these partially unfilled packs with identity matrices, so that compact
routines use only the completely-filled registers in their calculations. The next figure illustrates this padding
for a group of three 3 x 3 real-precision matrices with a pack length of 2.
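The padding behavior can be sketched in plain C. This is an illustration only, not the oneMKL packing routine: the function names are hypothetical, and the memory layout inside a pack (the V values for one element position stored contiguously, column-major n x n matrices) is an assumption made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Index of element (i, j) of matrix k in a real compact array.
   ASSUMED layout (illustration only): column-major n x n matrices, with the
   V values of one element position stored contiguously within a pack. */
static size_t compact_index(int i, int j, int k, int n, int V)
{
    size_t pack = (size_t)(k / V);
    size_t lane = (size_t)(k % V);
    return pack * (size_t)n * n * V + ((size_t)j * n + i) * V + lane;
}

/* Packs nm column-major n x n matrices into ap and pads the last pack with
   identity matrices. ap must hold ceil(nm / V) * n * n * V doubles. */
static void pack_with_identity_padding(const double *const *a, int n, int nm,
                                       int V, double *ap)
{
    assert(n > 0 && nm > 0 && V > 0);
    int padded = ((nm + V - 1) / V) * V;  /* matrix slots, including padding */
    for (int k = 0; k < padded; ++k)
        for (int j = 0; j < n; ++j)
            for (int i = 0; i < n; ++i) {
                double v = (k < nm) ? a[k][(size_t)j * n + i]
                                    : (i == j ? 1.0 : 0.0);
                ap[compact_index(i, j, k, n, V)] = v;
            }
}
```

For three 3 x 3 matrices and V = 2, the second pack holds the third matrix in lane 0 and an identity matrix in lane 1.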
Before calling a BLAS or LAPACK compact function, the input data must be packed in compact format. After
execution, the output data should be unpacked from this compact format, unless another compact routine
will be called immediately following the first. Two service functions, mkl_?gepack_compact and
mkl_?geunpack_compact, facilitate the process of storing matrices in compact format. It is recommended that the
user call the function mkl_get_format_compact before calling the mkl_?gepack_compact routine to obtain the
optimal format for performance. Advanced users can pack and unpack the matrices themselves and still use
Intel® oneAPI Math Kernel Library (oneMKL) compact functions on the packed set.
Compact routines can only be called for groups of matrices that have the same dimensions, leading
dimension, and storage format. For example, the routine mkl_?getrfnp_compact, which calculates the LU
factorization of a group of m x n matrices without pivoting, can only be called for a group of matrices with
the same number of rows (m) and the same number of columns (n). All of the matrices must also be stored
in arrays with the same leading dimension, and all must be stored in the same storage format (column-major
or row-major).
mkl_?gemm_compact
Computes a matrix-matrix product of a set of compact
format general matrices.
Syntax
void mkl_sgemm_compact (MKL_LAYOUT layout, MKL_TRANSPOSE transa, MKL_TRANSPOSE transb,
MKL_INT m, MKL_INT n, MKL_INT k, float alpha, const float *ap, MKL_INT ldap, const float
*bp, MKL_INT ldbp, float beta, float *cp, MKL_INT ldcp, MKL_COMPACT_PACK format, MKL_INT
nm);
Description
The mkl_?gemm_compact routine computes a scalar-matrix-matrix product and adds the result to a scalar-
matrix product for a group of nm general matrices Ac that have been stored in compact format. The operation
is defined for each matrix as:
Cc := alpha*op(Ac)*op(Bc) + beta*Cc
Where
Input Parameters
ap Points to the beginning of the array that stores the nm Ac matrices. See
Compact Format for more details.
transa=MKL_NOTRANS transa=MKL_TRANS or
transa=MKL_CONJTRANS
bp Points to the beginning of the array that stores the nm Bc matrices. See
Compact Format for more details.
transb=MKL_NOTRANS transb=MKL_TRANS or
transb=MKL_CONJTRANS
cp Before entry, cp points to the beginning of the array that stores the nm Cc
matrices, except when beta is equal to zero, in which case cp need not be
set on entry.
format Specifies the format of the compact matrices. See Compact Format or
mkl_get_format_compact for details.
NOTE
The values of ldap, ldbp, and ldcp used in mkl_?gemm_compact must be consistent with the
values used in mkl_?get_size_compact, mkl_?gepack_compact, and mkl_?geunpack_compact.
Output Parameters
mkl_?trsm_compact
Solves a triangular matrix equation for a set of
general, m x n matrices that have been stored in
Compact format.
Syntax
mkl_strsm_compact (MKL_LAYOUT layout, MKL_SIDE side, MKL_UPLO uplo, MKL_TRANSPOSE
transa, MKL_DIAG diag, MKL_INT m, MKL_INT n, float alpha, const float *ap, MKL_INT
a_stride, float *bp, MKL_INT b_stride, MKL_COMPACT_PACK format, MKL_INT nm);
mkl_dtrsm_compact (MKL_LAYOUT layout, MKL_SIDE side, MKL_UPLO uplo, MKL_TRANSPOSE
transa, MKL_DIAG diag, MKL_INT m, MKL_INT n, double alpha, const double *ap, MKL_INT
a_stride, double *bp, MKL_INT b_stride, MKL_COMPACT_PACK format, MKL_INT nm);
mkl_ctrsm_compact (MKL_LAYOUT layout, MKL_SIDE side, MKL_UPLO uplo, MKL_TRANSPOSE
transa, MKL_DIAG diag, MKL_INT m, MKL_INT n, mkl_compact_complex_float *alpha, const
float *ap, MKL_INT a_stride, float *bp, MKL_INT b_stride, MKL_COMPACT_PACK format,
MKL_INT nm);
mkl_ztrsm_compact (MKL_LAYOUT layout, MKL_SIDE side, MKL_UPLO uplo, MKL_TRANSPOSE
transa, MKL_DIAG diag, MKL_INT m, MKL_INT n, mkl_compact_complex_double *alpha, const
double *ap, MKL_INT a_stride, double *bp, MKL_INT b_stride, MKL_COMPACT_PACK format,
MKL_INT nm);
Description
The routine solves one of the following matrix equations for a group of nm matrices:
op(Ac)*Xc = alpha*Bc,
or
Xc*op(Ac) = alpha*Bc
where:
alpha is a scalar, Xc and Bc are m-by-n matrices that have been stored in compact format, and Ac is an m-by-m
unit, or non-unit, upper or lower triangular matrix that has been stored in compact format.
op(Ac) is one of op(Ac) = Ac, or op(Ac) = Ac^T, or op(Ac) = Ac^H.
Bc is overwritten by the solution matrix Xc.
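To make the operation concrete for a single matrix, the sketch below solves L*X = alpha*B by forward substitution for one specific case (side = left, lower triangular, no transpose, non-unit diagonal) on ordinary column-major arrays. It is a reference illustration under those assumptions, not the compact kernel: trsm_left_lower_ref is a hypothetical name, and the compact routine applies the same mathematics to all nm matrices in compact format.

```c
#include <assert.h>

/* Reference solve of L * X = alpha * B for one matrix (side = left,
   lower triangular, no transpose, non-unit diagonal), column-major storage.
   L is m x m with leading dimension lda; B is m x n with leading dimension
   ldb and is overwritten by the solution X. */
static void trsm_left_lower_ref(int m, int n, double alpha,
                                const double *l, int lda,
                                double *b, int ldb)
{
    assert(m >= 0 && n >= 0 && lda >= m && ldb >= m);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            double s = alpha * b[j * ldb + i];
            for (int p = 0; p < i; ++p)
                s -= l[p * lda + i] * b[j * ldb + p];  /* b[p] already holds x[p] */
            b[j * ldb + i] = s / l[i * lda + i];
        }
    }
}
```

As in the description above, the right-hand side array is overwritten by the solution.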
Input Parameters
side Specifies whether op(Ac) appears on the left or right of Xc in the equation:
if side = MKL_LEFT, then op(Ac)*Xc = alpha*Bc, if side = MKL_RIGHT, then
Xc*op(Ac) = alpha*Bc
NOTE
The values of ldap and ldbp used in mkl_?trsm_compact must be consistent with the values
used in mkl_?get_size_compact, mkl_?gepack_compact, and mkl_?geunpack_compact.
Output Parameters
mkl_?potrf_compact
Computes the Cholesky factorization of a set of
symmetric (Hermitian), positive-definite matrices,
stored in Compact format (see Compact Format for
details).
Syntax
void mkl_spotrf_compact (MKL_LAYOUT layout, MKL_UPLO uplo, MKL_INT n, float * ap,
MKL_INT ldap, MKL_INT * info, MKL_COMPACT_PACK format, MKL_INT nm);
void mkl_cpotrf_compact (MKL_LAYOUT layout, MKL_UPLO uplo, MKL_INT n, float * ap,
MKL_INT ldap, MKL_INT * info, MKL_COMPACT_PACK format, MKL_INT nm);
void mkl_dpotrf_compact (MKL_LAYOUT layout, MKL_UPLO uplo, MKL_INT n, double * ap,
MKL_INT ldap, MKL_INT * info, MKL_COMPACT_PACK format, MKL_INT nm);
void mkl_zpotrf_compact (MKL_LAYOUT layout, MKL_UPLO uplo, MKL_INT n, double * ap,
MKL_INT ldap, MKL_INT * info, MKL_COMPACT_PACK format, MKL_INT nm);
Description
The routine forms the Cholesky factorization of a set of symmetric, positive definite (or, for complex data,
Hermitian, positive-definite), n x n matrices Ac, stored in Compact format, as:
• Ac = Uc^T*Uc (for real data), Ac = Uc^H*Uc (for complex data), if uplo = MKL_UPPER
• Ac = Lc*Lc^T (for real data), Ac = Lc*Lc^H (for complex data), if uplo = MKL_LOWER
where Lc is a lower triangular matrix, and Uc is upper triangular. The factorization (output) data will also be
stored in Compact format.
Before calling this routine, call mkl_?gepack_compact to store the matrices in the Compact format.
NOTE
Compact routines have some limitations; see Numerical Limitations.
Input Parameters
uplo Indicates whether the upper or lower triangular part of Ac has been stored
and will be factored.
If uplo = MKL_UPPER, the upper triangular part of Ac is stored, and the
strictly lower triangular part of Ac is not referenced.
If uplo = MKL_LOWER, the lower triangular part of Ac is stored, and the
strictly upper triangular part of Ac is not referenced.
Application Notes:
Before calling this routine, mkl_?gepack_compact must be called. After calling this routine,
mkl_?geunpack_compact should be called, unless another compact routine will be called for the Compact
format matrices.
The total number of floating-point operations is approximately nm*(1/3)*n^3 for real flavors and
nm*(4/3)*n^3 for complex flavors.
Output Parameters
mkl_?getrfnp_compact
The routine computes the LU factorization, without
pivoting, of a set of general, m x n matrices that have
been stored in Compact format (see Compact
Format).
Syntax
void mkl_sgetrfnp_compact (MKL_LAYOUT layout, MKL_INT m, MKL_INT n, float * ap, MKL_INT
ldap, MKL_INT * info, MKL_COMPACT_PACK format, MKL_INT nm);
void mkl_dgetrfnp_compact (MKL_LAYOUT layout, MKL_INT m, MKL_INT n, double * ap,
MKL_INT ldap, MKL_INT * info, MKL_COMPACT_PACK format, MKL_INT nm);
void mkl_cgetrfnp_compact (MKL_LAYOUT layout, MKL_INT m, MKL_INT n, float * ap, MKL_INT
ldap, MKL_INT * info, MKL_COMPACT_PACK format, MKL_INT nm);
void mkl_zgetrfnp_compact (MKL_LAYOUT layout, MKL_INT m, MKL_INT n, double * ap,
MKL_INT ldap, MKL_INT * info, MKL_COMPACT_PACK format, MKL_INT nm);
Description
The mkl_?getrfnp_compact routine calculates the LU factorizations of a set of nm general (m x n) matrices
A, stored in Compact format, as Ac = Lc*Uc. The factorization (output) data will also be stored in Compact
format.
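The factorization Ac = Lc*Uc without pivoting can be illustrated for a single matrix in ordinary column-major storage. The sketch below is a textbook reference loop, not the compact kernel: lu_nopiv_ref and its return convention are hypothetical, chosen only for this example.

```c
#include <assert.h>

/* In-place LU factorization without pivoting of one column-major m x n
   matrix. On exit the strict lower triangle holds L (unit diagonal implied)
   and the upper triangle holds U. Returns 0 on success, or p + 1 if the
   pivot in column p is exactly zero (hypothetical convention). */
static int lu_nopiv_ref(int m, int n, double *a, int lda)
{
    assert(m >= 0 && n >= 0 && lda >= m);
    int steps = m < n ? m : n;
    for (int p = 0; p < steps; ++p) {
        double pivot = a[p * lda + p];
        if (pivot == 0.0)
            return p + 1;
        for (int i = p + 1; i < m; ++i) {
            a[p * lda + i] /= pivot;  /* multiplier l(i, p) */
            for (int j = p + 1; j < n; ++j)
                a[j * lda + i] -= a[p * lda + i] * a[j * lda + p];
        }
    }
    return 0;
}
```

Because no rows are exchanged, a zero or very small pivot is not corrected, which is one reason the input matrices must be suitable for factorization without pivoting.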
NOTE
Compact routines have some limitations; see Numerical Limitations.
Input Parameters
format Specifies the format of the compact matrices. See Compact Format
or mkl_get_format_compact for details.
nm Total number of matrices stored in Compact format.
Application Notes:
Before calling this routine, mkl_?gepack_compact must be called. After calling this routine,
mkl_?geunpack_compact should be called, unless another compact routine will be subsequently called for the
Compact format matrices.
The approximate number of floating-point operations for real flavors is:
nm*(2/3)*n^3, if m = n,
nm*(1/3)*n^2*(3m-n), if m > n,
nm*(1/3)*m^2*(3n-m), if m < n.
The number of operations for complex flavors is four times greater. Directly after calling this routine, you can
call the following:
mkl_?getrinp_compact, for computing the inverse of the nm input matrices in Compact format
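The operation counts given in these application notes can be evaluated with a small helper, for example to estimate achieved performance from a measured run time. This is a sketch only; getrfnp_flops_real and getrfnp_flops_complex are hypothetical names and are not provided by oneMKL.

```c
#include <assert.h>

/* Approximate floating-point operation count of the LU factorization
   without pivoting for nm real m x n matrices, using the three cases
   listed in the application notes. */
static double getrfnp_flops_real(int m, int n, int nm)
{
    assert(m >= 0 && n >= 0 && nm >= 0);
    double dm = m, dn = n, per_matrix;
    if (m == n)
        per_matrix = 2.0 * dn * dn * dn / 3.0;
    else if (m > n)
        per_matrix = dn * dn * (3.0 * dm - dn) / 3.0;
    else
        per_matrix = dm * dm * (3.0 * dn - dm) / 3.0;
    return nm * per_matrix;
}

/* Complex flavors need four times as many operations. */
static double getrfnp_flops_complex(int m, int n, int nm)
{
    return 4.0 * getrfnp_flops_real(m, n, nm);
}
```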
Output Parameters
mkl_?geqrf_compact
Computes the QR factorization of a set of general, m x
n matrices, stored in Compact format (see Compact
Format for details).
Syntax
void mkl_sgeqrf_compact (MKL_LAYOUT layout, MKL_INT m, MKL_INT n, float * ap, MKL_INT
ldap, float * taup, float * work, MKL_INT lwork, MKL_INT * info, MKL_COMPACT_PACK
format, MKL_INT nm);
void mkl_cgeqrf_compact (MKL_LAYOUT layout, MKL_INT m, MKL_INT n, float * ap, MKL_INT
ldap, float * taup, float * work, MKL_INT lwork, MKL_INT * info, MKL_COMPACT_PACK
format, MKL_INT nm);
void mkl_dgeqrf_compact (MKL_LAYOUT layout, MKL_INT m, MKL_INT n, double * ap, MKL_INT
ldap, double * taup, double * work, MKL_INT lwork, MKL_INT * info, MKL_COMPACT_PACK
format, MKL_INT nm);
void mkl_zgeqrf_compact (MKL_LAYOUT layout, MKL_INT m, MKL_INT n, double * ap, MKL_INT
ldap, double * taup, double * work, MKL_INT lwork, MKL_INT * info, MKL_COMPACT_PACK
format, MKL_INT nm);
Description
The routine forms the QR factorization of a set of general, m x n matrices A, stored in Compact format. The
routine does not form the Q factors explicitly. Instead, Q is represented as a product of min(m,n) elementary
reflectors. The factorization (output) data will also be stored in Compact format.
NOTE
Compact routines have some limitations; see Numerical Limitations.
Input Parameters
lwork The size of the work array. If lwork = -1, a workspace query is
assumed; the routine only calculates the optimal size of the work
array and returns this value as the first entry of the work array.
format Specifies the format of the compact matrices. See Compact Format
or mkl_get_format_compact for details.
nm Total number of matrices stored in Compact format.
Application Notes:
The compact array that will store the elementary reflectors needs to be allocated before the routine is called
and unpacked after. First, the routine mkl_?get_size_compact should be called, to determine the size of taup,
and memory for taup should be allocated. After calling mkl_?geqrf_compact, taup stores the elementary
reflectors in compact form, so should be unpacked using mkl_?geunpack_compact. See Compact Format for
more details, or reference the example below. (Note: the following example is meant to demonstrate the
calling sequence to allocate memory and unpack taup. All other parameters are assumed to be already set
up before the sequence below is executed.)
MKL_R_TYPE *tau_array[nm];
// ...
tau_buffer_size = mkl_?get_size_compact(min(m, n), 1, format, nm);
Output Parameters
work[0] On exit contains the minimum value of lwork required for optimum
performance. Use this lwork for subsequent runs.
info This parameter is not currently used in this routine. It is reserved for
future use.
mkl_?getrinp_compact
Computes the inverse of a set of LU-factorized general
matrices, without pivoting, stored in the compact
format (see Compact Format for details).
Syntax
void mkl_sgetrinp_compact (MKL_LAYOUT layout, MKL_INT n, float * ap, MKL_INT ldap,
float * work, MKL_INT lwork, MKL_INT * info, MKL_COMPACT_PACK format, MKL_INT nm);
void mkl_dgetrinp_compact (MKL_LAYOUT layout, MKL_INT n, double * ap, MKL_INT ldap,
double * work, MKL_INT lwork, MKL_INT * info, MKL_COMPACT_PACK format, MKL_INT nm);
void mkl_cgetrinp_compact (MKL_LAYOUT layout, MKL_INT n, float * ap, MKL_INT ldap,
float * work, MKL_INT lwork, MKL_INT * info, MKL_COMPACT_PACK format, MKL_INT nm);
void mkl_zgetrinp_compact (MKL_LAYOUT layout, MKL_INT n, double * ap, MKL_INT ldap,
double * work, MKL_INT lwork, MKL_INT * info, MKL_COMPACT_PACK format, MKL_INT nm);
Description
This routine computes the inverse inv(Ac) of a set of general, n x n matrices Ac that have been stored in
Compact format. The inverse (output) data will also be stored in Compact format.
NOTE
Compact routines have some limitations; see Numerical Limitations.
Input Parameters
lwork The size of the work array. If lwork = -1, a workspace query is
assumed; the routine calculates only the optimal size of the work
array and returns this value as the first entry of the work array.
format Specifies the format of the compact matrices. See Compact Format
or mkl_get_format_compact for details.
nm Total number of matrices stored in Compact format.
Application Notes:
Before calling this routine, mkl_?gepack_compact must be called. After calling this routine,
mkl_?geunpack_compact should be called, unless another compact routine will be subsequently called on
the Compact format matrices.
The total number of floating-point operations is approximately nm*(4/3)*n^3 for real flavors and
nm*(16/3)*n^3 for complex flavors.
Output Parameters
Matrices scaled near underflow/overflow: the LAPACK compact routines do not provide safe handling for
values near underflow/overflow. This means that Compact routines may return incorrect results for such
matrices. This limitation applies to the compact routine for QR: mkl_?geqrf_compact.
It is the responsibility of the user to ensure that the input matrices can be factorized, inverted, and/or solved
given these numerical limitations.
mkl_?get_size_compact
Returns the buffer size, in bytes, needed to pack data
in Compact format.
Syntax
MKL_INT mkl_sget_size_compact (MKL_INT ld, MKL_INT sd, MKL_COMPACT_PACK format, MKL_INT
nm);
MKL_INT mkl_dget_size_compact (MKL_INT ld, MKL_INT sd, MKL_COMPACT_PACK format, MKL_INT
nm);
MKL_INT mkl_cget_size_compact (MKL_INT ld, MKL_INT sd, MKL_COMPACT_PACK format, MKL_INT
nm);
MKL_INT mkl_zget_size_compact (MKL_INT ld, MKL_INT sd, MKL_COMPACT_PACK format, MKL_INT
nm);
Description
The routine returns the buffer size, in bytes, required for mkl_?gepack_compact.
Input Parameters
Application Notes:
Before calling this routine, mkl_get_format_compact can be called to determine the optimal format.
After calling this routine and allocating the amount of memory indicated by size, the user can call
mkl_?gepack_compact to pack the nm input matrices in Compact format.
Return Values
This function returns a value size.
size The buffer size, in bytes, required by the packing function
mkl_?gepack_compact.
mkl_get_format_compact
Returns the optimal compact packing format for the
architecture, needed for all compact routines.
Syntax
MKL_COMPACT_PACK mkl_get_format_compact ();
Description
The routine returns the optimal compact packing format, which is an MKL_COMPACT_PACK type, for the
current architecture. The optimal value of format is determined by the architecture's vector-register length.
format is a required parameter for any packing, unpacking, or BLAS/LAPACK compact routine. See Compact
Format for details.
Return Values
The function returns a value format.
format format can be returned as any of the following three
values. MKL_COMPACT_AVX512 is the optimal format
value for:
Application Notes:
After calling this routine, mkl_?get_size_compact can be called to calculate the buffer size needed for
mkl_?gepack_compact.
mkl_?gepack_compact
Packs matrices from standard (row or column-major)
format to Compact format.
Syntax
mkl_sgepack_compact(MKL_LAYOUT layout, MKL_INT rows, MKL_INT columns, const float *
const *a, MKL_INT lda, float *ap, MKL_INT ldap, MKL_COMPACT_PACK format, MKL_INT nm);
mkl_dgepack_compact(MKL_LAYOUT layout, MKL_INT rows, MKL_INT columns, const double *
const *a, MKL_INT lda, double *ap, MKL_INT ldap, MKL_COMPACT_PACK format, MKL_INT nm);
mkl_cgepack_compact (MKL_LAYOUT layout, MKL_INT rows, MKL_INT columns, const
mkl_compact_complex_float * const *a, MKL_INT lda, float *ap, MKL_INT ldap,
MKL_COMPACT_PACK format, const MKL_INT nm);
Description
The routine packs nm matrices A from standard format (row or column-major, pointer to pointer) in a into
Compact format, storing the new compact format matrices Ac in array ap.
Input Parameters
NOTE
The values of ldap used in mkl_?gepack_compact must be
consistent with the values used in mkl_?get_size_compact and
mkl_?geunpack_compact.
format Specifies the format of the compact matrices. See Compact Format
or mkl_get_format_compact for details.
nm Total number of matrices that will be stored in Compact format.
Application Notes:
Directly after calling this routine, any BLAS or LAPACK compact routine can be called. Unpacking matrices
from Compact format can be done by calling mkl_?geunpack_compact.
Output Parameters
ap Array storing the compact format input matrices Ac. ap must have
size at least size = mkl_?get_size_compact.
mkl_?geunpack_compact
Unpacks matrices from Compact format to standard
(row- or column-major, pointer-to-pointer) format.
Syntax
mkl_sgeunpack_compact (MKL_LAYOUT layout, MKL_INT rows, MKL_INT columns, float * const
*a, MKL_INT lda, const float *ap, MKL_INT ldap, MKL_COMPACT_PACK format, MKL_INT nm);
mkl_dgeunpack_compact (MKL_LAYOUT layout, MKL_INT rows, MKL_INT columns, double * const
*a, MKL_INT lda, const double *ap, MKL_INT ldap, MKL_COMPACT_PACK format, MKL_INT nm);
mkl_cgeunpack_compact (MKL_LAYOUT layout, MKL_INT rows, MKL_INT columns,
mkl_compact_complex_float * const *a, MKL_INT lda, const float *ap, MKL_INT ldap,
MKL_COMPACT_PACK format, MKL_INT nm);
mkl_zgeunpack_compact (MKL_LAYOUT layout, MKL_INT rows, MKL_INT columns,
mkl_compact_complex_double * const *a, MKL_INT lda, const double *ap, MKL_INT ldap,
MKL_COMPACT_PACK format, MKL_INT nm);
Description
The routine unpacks nm Compact format matrices Ac from array ap into standard (row- or column-major,
pointer-to-pointer) format in array a.
Input Parameters
ap Array storing the compact format of input matrices Ac. See Compact
Format or mkl_get_format_compact for details.
NOTE
The values of ldap used in mkl_?geunpack_compact must be
consistent with the values used in mkl_?get_size_compact and
mkl_?gepack_compact.
format Specifies the format of the compact matrices. See Compact Format
or mkl_get_format_compact for details.
nm Total number of matrices that will be stored in Compact format.
Output Parameters
The data type is included in the name only if the function accepts dense matrix or scalar floating point
parameters.
The <operation> field indicates the type of operation:
optimize analyze the matrix using hints and store optimization information in matrix
handle
spmm/spmmd compute sparse matrix by sparse matrix product and store the result as a
sparse/dense matrix
sypr compute the symmetric or Hermitian product of sparse matrices and store the
result as a sparse matrix
syprd compute the symmetric or Hermitian product of sparse and dense matrices and
store the result as a dense matrix
syrk compute the product of sparse matrix with its transposed matrix and store the
result as a sparse matrix
syrkd compute the product of sparse matrix with its transposed matrix and store the
result as a dense matrix
bsr block sparse row format plus variations. Fill out either rows_start and rows_end
(for 4-array representation) or rowIndex array (for 3-array BSR/CSR).
csr compressed sparse row format plus variations. Fill out either rows_start and
rows_end (for 4-array representation) or rowIndex array (for 3-array BSR/
CSR).
csc compressed sparse column format plus variations. Fill out either cols_start
and cols_end (for 4-array representation) or colIndex array (for 3-array
CSC).
The format is included in the function name only if the function parameters include an explicit sparse matrix
in one of the conventional sparse matrix formats.
NOTE The multistage approach currently does not allow you to allocate memory for the output matrix
outside oneMKL.
NOTE The format of the output is decided internally but can be checked using the export functionality
mkl_sparse_?_export_<format>.
2. The second stage allocates data and computes column or row indices (depending on the format) of
non-zero elements and/or values of the output matrix.
Specifying the stage for execution is supported through the sparse_request_t parameter in the API with
the following options:
Values for sparse_request_t parameter
Value                                Description
SPARSE_STAGE_NNZ_COUNT               Allocates and computes only the rows_start/rows_end (CSR/BSR format) or
                                     cols_start/cols_end (CSC format) arrays for the output matrix. After this
                                     stage, by calling mkl_sparse_?_export_<format>, you can obtain the
                                     number of non-zeros in the output matrix and calculate the amount of
                                     memory required for the output matrix.
SPARSE_STAGE_FINALIZE_MULT_NO_VAL    Allocates and computes row/column indices provided that rows_start/
                                     rows_end or cols_start/cols_end have already been computed in a prior call
                                     with the request SPARSE_STAGE_NNZ_COUNT. The values of the output
                                     matrix are not computed.
SPARSE_STAGE_FINALIZE_MULT           Depending on the state of the output matrix C on entry to the routine, this
                                     stage does one of the following:
                                     • Allocates and computes row/column indices and values of non-zero
                                       elements, if only rows_start/rows_end or cols_start/cols_end are present
                                     • Allocates and computes values of non-zero elements, if rows_start/
                                       rows_end or cols_start/cols_end and row/column indices of non-zero
                                       elements are present
SPARSE_STAGE_FULL_MULT_NO_VAL        Allocates and computes the output matrix structure in a single step. The
                                     values of the output matrix are not computed.
SPARSE_STAGE_FULL_MULT               Allocates and computes the entire output matrix (structure and values) in a
                                     single step.
The example below shows how you can use the two-stage approach for estimating the memory requirements
for the output matrix in CSR format:
First stage (sparse_request_t = SPARSE_STAGE_NNZ_COUNT)
1. The routine mkl_sparse_sp2m is called with the request parameter SPARSE_STAGE_NNZ_COUNT.
2. The arrays rows_start and rows_end are exported using the mkl_sparse_?_export_csr routine.
3. These arrays are used to calculate the number of non-zeros (nnz) of the resulting output matrix.
Note that by the end of the first stage, the arrays associated with column indices and values of the output
matrix have not been allocated or computed yet.
/* optional calculation of nnz in the output matrix for getting a memory estimate */
status = mkl_sparse_sp2m (opA, descrA, csrA, opB, descrB, csrB, SPARSE_STAGE_FULL_MULT, &csrC);
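Step 3 of the first stage, computing nnz from the exported rows_start and rows_end arrays, can be sketched in plain C. The helpers below are hypothetical and use int in place of MKL_INT so that the example stands alone; the memory estimate assumes a 4-array CSR output and counts only the four arrays.

```c
#include <assert.h>
#include <stddef.h>

/* Number of stored non-zeros of a CSR matrix in 4-array representation.
   The result is the same for zero-based and one-based indexing because only
   the differences rows_end[i] - rows_start[i] are used. */
static size_t csr_nnz(int rows, const int *rows_start, const int *rows_end)
{
    assert(rows >= 0);
    size_t nnz = 0;
    for (int i = 0; i < rows; ++i) {
        assert(rows_end[i] >= rows_start[i]);
        nnz += (size_t)(rows_end[i] - rows_start[i]);
    }
    return nnz;
}

/* Rough size in bytes of the four CSR arrays: col_indx and values hold nnz
   entries each, rows_start and rows_end hold rows entries each. */
static size_t csr_bytes_estimate(size_t nnz, int rows,
                                 size_t index_bytes, size_t value_bytes)
{
    return nnz * (index_bytes + value_bytes) + 2 * (size_t)rows * index_bytes;
}
```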
Routine or Data Types Description
Function Group
mkl_sparse_?_create_csr
Creates a handle for a CSR-format matrix.
Syntax
sparse_status_t mkl_sparse_s_create_csr (sparse_matrix_t *A, const sparse_index_base_t
indexing, const MKL_INT rows, const MKL_INT cols, MKL_INT *rows_start, MKL_INT
*rows_end, MKL_INT *col_indx, float *values);
sparse_status_t mkl_sparse_d_create_csr (sparse_matrix_t *A, const sparse_index_base_t
indexing, const MKL_INT rows, const MKL_INT cols, MKL_INT *rows_start, MKL_INT
*rows_end, MKL_INT *col_indx, double *values);
sparse_status_t mkl_sparse_c_create_csr (sparse_matrix_t *A, const sparse_index_base_t
indexing, const MKL_INT rows, const MKL_INT cols, MKL_INT *rows_start, MKL_INT
*rows_end, MKL_INT *col_indx, MKL_Complex8 *values);
sparse_status_t mkl_sparse_z_create_csr (sparse_matrix_t *A, const sparse_index_base_t
indexing, const MKL_INT rows, const MKL_INT cols, MKL_INT *rows_start, MKL_INT
*rows_end, MKL_INT *col_indx, MKL_Complex16 *values);
Include Files
• mkl_spblas.h
Description
The mkl_sparse_?_create_csr routine creates a handle for an m-by-k matrix A in CSR format.
NOTE
The input arrays provided are left unchanged except for the call to mkl_sparse_order, which
performs ordering of column indexes of the matrix. To avoid any changes to the input data,
use mkl_sparse_copy.
Input Parameters
rows_start Array of length at least rows. This array contains row indices, such that
rows_start[i] - indexing is the first index of row i in the arrays values
and col_indx. The value of indexing is 0 for zero-based indexing and 1
for one-based indexing.
Refer to pointerB array description in CSR Format for more details.
rows_end Array of at least length rows. This array contains row indices, such that
rows_end[i] - indexing - 1 is the last index of row i in the arrays
values and col_indx. The value of indexing is 0 for zero-based indexing
and 1 for one-based indexing.
Refer to pointerE array description in CSR Format for more details.
col_indx For one-based indexing, array containing the column indices plus one for
each non-zero element of the matrix A. For zero-based indexing, array
containing the column indices for each non-zero element of the matrix A.
Its length is at least rows_end[rows - 1] - indexing.
values Array containing non-zero elements of the matrix A. Its length is equal to
length of the col_indx array.
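The roles of rows_start, rows_end, col_indx, values, and indexing can be illustrated by an element lookup. The function csr_get below is a hypothetical helper written for this example (it is not an oneMKL routine) and uses int in place of MKL_INT.

```c
#include <assert.h>

/* Returns element (i, j) of a CSR matrix in 4-array representation, or 0.0
   if the element is not stored. i and j are zero-based; indexing is 0 or 1
   and describes the base used inside rows_start, rows_end, and col_indx. */
static double csr_get(int indexing, const int *rows_start, const int *rows_end,
                      const int *col_indx, const double *values, int i, int j)
{
    assert(indexing == 0 || indexing == 1);
    int first = rows_start[i] - indexing;    /* first entry of row i */
    int last = rows_end[i] - indexing - 1;   /* last entry of row i  */
    for (int p = first; p <= last; ++p)
        if (col_indx[p] - indexing == j)
            return values[p];
    return 0.0;
}
```

For the 3 x 3 matrix with rows (1 0 2), (0 3 0), (4 0 5) and zero-based indexing, the arrays are rows_start = {0, 2, 3}, rows_end = {2, 3, 5}, col_indx = {0, 2, 1, 0, 2}, and values = {1, 2, 3, 4, 5}.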
Output Parameters
Return Values
The function returns a value indicating whether the operation was successful or not, and why.
SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.
mkl_sparse_?_create_csc
Creates a handle for a CSC format matrix.
Syntax
sparse_status_t mkl_sparse_s_create_csc (sparse_matrix_t *A, const sparse_index_base_t
indexing, const MKL_INT rows, const MKL_INT cols, MKL_INT *cols_start, MKL_INT
*cols_end, MKL_INT *row_indx, float *values);
sparse_status_t mkl_sparse_d_create_csc (sparse_matrix_t *A, const sparse_index_base_t
indexing, const MKL_INT rows, const MKL_INT cols, MKL_INT *cols_start, MKL_INT
*cols_end, MKL_INT *row_indx, double *values);
sparse_status_t mkl_sparse_c_create_csc (sparse_matrix_t *A, const sparse_index_base_t
indexing, const MKL_INT rows, const MKL_INT cols, MKL_INT *cols_start, MKL_INT
*cols_end, MKL_INT *row_indx, MKL_Complex8 *values);
sparse_status_t mkl_sparse_z_create_csc (sparse_matrix_t *A, const sparse_index_base_t
indexing, const MKL_INT rows, const MKL_INT cols, MKL_INT *cols_start, MKL_INT
*cols_end, MKL_INT *row_indx, MKL_Complex16 *values);
Include Files
• mkl_spblas.h
Description
The mkl_sparse_?_create_csc routine creates a handle for an m-by-k matrix A in CSC format.
NOTE
The input arrays provided are left unchanged except for the call to mkl_sparse_order, which
performs ordering of column indexes of the matrix. To avoid any changes to the input data,
use mkl_sparse_copy.
Input Parameters
cols_start Array of length at least m. This array contains col indices, such that
cols_start[i] - ind is the first index of col i in the arrays values and
row_indx. ind takes 0 for zero-based indexing and 1 for one-based
indexing.
cols_end Array of at least length m. This array contains col indices, such that
cols_end[i] - ind - 1 is the last index of col i in the arrays values and
row_indx. ind takes 0 for zero-based indexing and 1 for one-based
indexing.
Refer to pointerE array description in CSC Format for more details.
row_indx For one-based indexing, array containing the row indices plus one for each
non-zero element of the matrix A. For zero-based indexing, array containing
the row indices for each non-zero element of the matrix A. Its length is at
least cols_end[cols - 1] - ind. ind takes 0 for zero-based indexing and
1 for one-based indexing.
values Array containing non-zero elements of the matrix A. Its length is equal to
length of the row_indx array.
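As an illustration of how the CSC arrays are traversed, the sketch below computes y = A*x column by column. The function csc_matvec is a hypothetical helper for this example only (oneMKL provides its own sparse matrix-vector routines) and uses int in place of MKL_INT.

```c
#include <assert.h>

/* Computes y = A * x for a CSC matrix in 4-array representation.
   indexing is 0 or 1 and describes the base used inside cols_start,
   cols_end, and row_indx. x has cols entries, y has rows entries. */
static void csc_matvec(int indexing, int rows, int cols,
                       const int *cols_start, const int *cols_end,
                       const int *row_indx, const double *values,
                       const double *x, double *y)
{
    assert(indexing == 0 || indexing == 1);
    for (int i = 0; i < rows; ++i)
        y[i] = 0.0;
    for (int j = 0; j < cols; ++j) {
        int first = cols_start[j] - indexing;    /* first entry of column j */
        int last = cols_end[j] - indexing - 1;   /* last entry of column j  */
        for (int p = first; p <= last; ++p)
            y[row_indx[p] - indexing] += values[p] * x[j];
    }
}
```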
Output Parameters
Return Values
The function returns a value indicating whether the operation was successful or not, and why.
mkl_sparse_?_create_coo
Creates a handle for a matrix in COO format.
Syntax
sparse_status_t mkl_sparse_s_create_coo (sparse_matrix_t *A, const sparse_index_base_t
indexing, const MKL_INT rows, const MKL_INT cols, const MKL_INT nnz, MKL_INT *row_indx,
MKL_INT * col_indx, float *values);
sparse_status_t mkl_sparse_d_create_coo (sparse_matrix_t *A, const sparse_index_base_t
indexing, const MKL_INT rows, const MKL_INT cols, const MKL_INT nnz, MKL_INT *row_indx,
MKL_INT * col_indx, double *values);
sparse_status_t mkl_sparse_c_create_coo (sparse_matrix_t *A, const sparse_index_base_t
indexing, const MKL_INT rows, const MKL_INT cols, const MKL_INT nnz, MKL_INT *row_indx,
MKL_INT * col_indx, MKL_Complex8 *values);
sparse_status_t mkl_sparse_z_create_coo (sparse_matrix_t *A, const sparse_index_base_t
indexing, const MKL_INT rows, const MKL_INT cols, const MKL_INT nnz, MKL_INT *row_indx,
MKL_INT * col_indx, MKL_Complex16 *values);
Include Files
• mkl_spblas.h
Description
The mkl_sparse_?_create_coo routine creates a handle for an m-by-k matrix A in COO format.
NOTE
The input arrays provided are left unchanged except for the call to mkl_sparse_order, which
performs ordering of column indexes of the matrix. To avoid any changes to the input data,
use mkl_sparse_copy.
Input Parameters
row_indx Array of length nnz, containing the row indices for each non-zero element
of matrix A.
Refer to rows array description in Coordinate Format for more details.
col_indx Array of length nnz, containing the column indices for each non-zero
element of matrix A.
Refer to columns array description in Coordinate Format for more details.
Output Parameters
Return Values
The function returns a value indicating whether the operation was successful or not, and why.
mkl_sparse_?_create_bsr
Creates a handle for a matrix in BSR format.
Syntax
sparse_status_t mkl_sparse_s_create_bsr (sparse_matrix_t *A, const sparse_index_base_t
indexing, const sparse_layout_t block_layout, const MKL_INT rows, const MKL_INT cols,
const MKL_INT block_size, MKL_INT *rows_start, MKL_INT *rows_end, MKL_INT *col_indx,
float *values);
sparse_status_t mkl_sparse_d_create_bsr (sparse_matrix_t *A, const sparse_index_base_t
indexing, const sparse_layout_t block_layout, const MKL_INT rows, const MKL_INT cols,
const MKL_INT block_size, MKL_INT *rows_start, MKL_INT *rows_end, MKL_INT *col_indx,
double *values);
sparse_status_t mkl_sparse_c_create_bsr (sparse_matrix_t *A, const sparse_index_base_t
indexing, const sparse_layout_t block_layout, const MKL_INT rows, const MKL_INT cols,
const MKL_INT block_size, MKL_INT *rows_start, MKL_INT *rows_end, MKL_INT *col_indx,
MKL_Complex8 *values);
sparse_status_t mkl_sparse_z_create_bsr (sparse_matrix_t *A, const sparse_index_base_t
indexing, const sparse_layout_t block_layout, const MKL_INT rows, const MKL_INT cols,
const MKL_INT block_size, MKL_INT *rows_start, MKL_INT *rows_end, MKL_INT *col_indx,
MKL_Complex16 *values);
Include Files
• mkl_spblas.h
Description
The mkl_sparse_?_create_bsr routine creates a handle for an m-by-k matrix A in BSR format.
NOTE
The input arrays provided are left unchanged except for the call to mkl_sparse_order, which
performs ordering of column indexes of the matrix. To avoid any changes to the input data,
use mkl_sparse_copy.
Input Parameters
SPARSE_INDEX_BASE_ONE One-based (Fortran-style) indexing: indices
start at 1.
rows_start Array of length m. This array contains row indices, such that
rows_start[i] - ind is the first index of block row i in the arrays values
and col_indx. ind takes 0 for zero-based indexing and 1 for one-based
indexing.
Refer to pointerB array description in CSR Format for more details.
rows_end Array of length m. This array contains row indices, such that rows_end[i]
- ind - 1 is the last index of block row i in the arrays values and
col_indx. ind takes 0 for zero-based indexing and 1 for one-based
indexing.
Refer to pointerE array description in CSR Format for more details.
col_indx For one-based indexing, array containing the column indices plus one for
each non-zero block of the matrix A. For zero-based indexing, array
containing the column indices for each non-zero block of the matrix A. Its
length is rows_end[rows - 1] - ind. ind takes 0 for zero-based indexing
and 1 for one-based indexing.
values Array containing non-zero elements of the matrix A. Its length is equal to
length of the col_indx array multiplied by block_size*block_size.
Refer to the values array description in BSR Format for more details.
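The relationship between rows_start, rows_end, col_indx, and values described above can be sketched in plain C. The helper below is hypothetical (bsr_get is not an MKL routine) and assumes that each stored block occupies block_size*block_size consecutive entries of values, in the same order as col_indx, with the elements inside a block laid out row by row or column by column depending on a row_major flag that stands in for block_layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not an MKL routine: returns element (i, j) of a BSR
   matrix, or 0.0 when the enclosing block is not stored.
   i and j are zero-based element coordinates; ind is the index base (0 or 1)
   used by rows_start, rows_end, and col_indx. */
double bsr_get(int ind, int row_major, int block_size,
               const int *rows_start, const int *rows_end,
               const int *col_indx, const double *values,
               int i, int j)
{
    assert(ind == 0 || ind == 1);
    assert(block_size > 0 && i >= 0 && j >= 0);

    int bi = i / block_size, bj = j / block_size;   /* block coordinates */
    int ri = i % block_size, rj = j % block_size;   /* offsets inside block */

    /* Stored blocks of block row bi occupy positions first .. last - 1. */
    int first = rows_start[bi] - ind;
    int last  = rows_end[bi] - ind;

    for (int p = first; p < last; ++p) {
        if (col_indx[p] - ind == bj) {
            /* Assumed layout: block p starts at p * block_size^2. */
            const double *blk = values + (size_t)p * block_size * block_size;
            return row_major ? blk[ri * block_size + rj]
                             : blk[rj * block_size + ri];
        }
    }
    return 0.0;
}
```

The same arrays can be read with either index base by changing only ind, which is why the parameter descriptions express every position as an offset from ind.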
Output Parameters
Return Values
The function returns a value indicating whether the operation was successful or not, and why.
mkl_sparse_copy
Creates a copy of a matrix handle.
Syntax
sparse_status_t mkl_sparse_copy (const sparse_matrix_t source, const struct
matrix_descr descr, sparse_matrix_t *dest);
Include Files
• mkl_spblas.h
Description
The mkl_sparse_copy routine creates a copy of a matrix handle.
NOTE
Currently, the mkl_sparse_copy routine does not support the descriptor argument and
creates an exact (deep) copy of the input matrix.
Input Parameters
SPARSE_FILL_MODE_LOWER The lower triangular matrix part is processed.
Output Parameters
Return Values
The function returns a value indicating whether the operation was successful or not, and why.
mkl_sparse_destroy
Frees memory allocated for a matrix handle.
Syntax
sparse_status_t mkl_sparse_destroy (sparse_matrix_t A);
Include Files
• mkl_spblas.h
Description
The mkl_sparse_destroy routine frees memory allocated for a matrix handle.
NOTE
You must free memory allocated for matrices after completing use of them. The mkl_sparse_destroy
routine provides a utility to do so.
Input Parameters
Return Values
The function returns a value indicating whether the operation was successful or not, and why.
mkl_sparse_convert_csr
Converts internal matrix representation to CSR
format.
Syntax
sparse_status_t mkl_sparse_convert_csr (const sparse_matrix_t source, const
sparse_operation_t operation, sparse_matrix_t *dest);
Include Files
• mkl_spblas.h
Description
The mkl_sparse_convert_csr routine converts internal matrix representation to CSR format.
When the source matrix is in COO format, the routine performs a sum reduction on duplicate elements.
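As an illustration of what a sum reduction on duplicate elements means, the following plain C sketch converts zero-based COO triplets to CSR and adds together entries that share the same row and column. The function coo_to_csr_sum is hypothetical and is not how the library implements the conversion; sorting each row by column is simply the way this sketch finds duplicates.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not an MKL routine. Converts zero-based COO triplets
   into CSR, summing entries that share the same (row, column).
   row_ptr must hold rows + 1 entries; col and val must hold nnz entries.
   Returns the number of stored entries after merging, or -1 if the
   temporary allocation fails. */
int coo_to_csr_sum(int rows, int nnz,
                   const int *coo_row, const int *coo_col,
                   const double *coo_val,
                   int *row_ptr, int *col, double *val)
{
    assert(rows >= 0 && nnz >= 0);

    /* Count entries per row, then turn the counts into row offsets. */
    for (int i = 0; i <= rows; ++i) row_ptr[i] = 0;
    for (int p = 0; p < nnz; ++p) row_ptr[coo_row[p] + 1]++;
    for (int i = 0; i < rows; ++i) row_ptr[i + 1] += row_ptr[i];

    /* Scatter the triplets into row order. */
    int *next = malloc((size_t)(rows > 0 ? rows : 1) * sizeof *next);
    if (next == NULL) return -1;
    for (int i = 0; i < rows; ++i) next[i] = row_ptr[i];
    for (int p = 0; p < nnz; ++p) {
        int q = next[coo_row[p]]++;
        col[q] = coo_col[p];
        val[q] = coo_val[p];
    }
    free(next);

    /* Per row: insertion-sort by column, then merge equal columns. */
    int out = 0;
    for (int i = 0; i < rows; ++i) {
        int begin = row_ptr[i], end = row_ptr[i + 1];
        for (int a = begin + 1; a < end; ++a) {
            int c = col[a];
            double v = val[a];
            int b = a - 1;
            while (b >= begin && col[b] > c) {
                col[b + 1] = col[b];
                val[b + 1] = val[b];
                --b;
            }
            col[b + 1] = c;
            val[b + 1] = v;
        }
        row_ptr[i] = out;                    /* new start of row i */
        for (int a = begin; a < end; ++a) {
            if (out > row_ptr[i] && col[out - 1] == col[a]) {
                val[out - 1] += val[a];      /* duplicate: sum reduction */
            } else {
                col[out] = col[a];
                val[out] = val[a];
                ++out;
            }
        }
    }
    row_ptr[rows] = out;
    return out;
}
```

For example, the triplets (0, 0, 1.0) and (0, 0, 2.0) collapse into a single CSR entry with value 3.0.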
Input Parameters
Output Parameters
Return Values
The function returns a value indicating whether the operation was successful or not, and why.
SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.
mkl_sparse_convert_bsr
Converts internal matrix representation to BSR format
or changes BSR block size.
Syntax
sparse_status_t mkl_sparse_convert_bsr (const sparse_matrix_t source, const MKL_INT
block_size, const sparse_layout_t block_layout, const sparse_operation_t operation,
sparse_matrix_t *dest);
Include Files
• mkl_spblas.h
Description
The mkl_sparse_convert_bsr routine converts internal matrix representation to BSR format or changes
BSR block size.
When the source matrix is in COO format, the routine performs a sum reduction on duplicate elements.
Input Parameters
Output Parameters
Return Values
The function returns a value indicating whether the operation was successful or not, and why.
mkl_sparse_?_export_csr
Exports CSR matrix from internal representation.
Syntax
sparse_status_t mkl_sparse_s_export_csr (const sparse_matrix_t source,
sparse_index_base_t *indexing, MKL_INT *rows, MKL_INT *cols, MKL_INT **rows_start,
MKL_INT **rows_end, MKL_INT **col_indx, float **values);
sparse_status_t mkl_sparse_d_export_csr (const sparse_matrix_t source,
sparse_index_base_t *indexing, MKL_INT *rows, MKL_INT *cols, MKL_INT **rows_start,
MKL_INT **rows_end, MKL_INT **col_indx, double **values);
sparse_status_t mkl_sparse_c_export_csr (const sparse_matrix_t source,
sparse_index_base_t *indexing, MKL_INT *rows, MKL_INT *cols, MKL_INT **rows_start,
MKL_INT **rows_end, MKL_INT **col_indx, MKL_Complex8 **values);
sparse_status_t mkl_sparse_z_export_csr (const sparse_matrix_t source,
sparse_index_base_t *indexing, MKL_INT *rows, MKL_INT *cols, MKL_INT **rows_start,
MKL_INT **rows_end, MKL_INT **col_indx, MKL_Complex16 **values);
Include Files
• mkl_spblas.h
Description
If the matrix specified by the source handle is in CSR format, the mkl_sparse_?_export_csr routine
exports an m-by-k matrix A in CSR format from the internal representation. The routine returns
pointers to the internal representation and does not allocate additional memory.
If the matrix is not already in CSR format, the routine returns SPARSE_STATUS_INVALID_VALUE.
Input Parameters
Output Parameters
rows_start Pointer to array of length m. This array contains row indices, such that
rows_start[i] - ind is the first index of row i in the arrays values and
col_indx. ind takes 0 for zero-based indexing and 1 for one-based
indexing.
Refer to pointerB array description in CSR Format for more details.
rows_end Pointer to array of length m. This array contains row indices, such that
rows_end[i] - ind - 1 is the last index of row i in the arrays values and
col_indx. ind takes 0 for zero-based indexing and 1 for one-based
indexing.
Refer to pointerE array description in CSR Format for more details.
col_indx For one-based indexing, pointer to array containing the column indices plus
one for each non-zero element of the matrix source. For zero-based
indexing, pointer to array containing the column indices for each non-zero
element of the matrix source. Its length is rows_end[rows - 1] - ind.
ind takes 0 for zero-based indexing and 1 for one-based indexing.
values Pointer to array containing non-zero elements of the matrix A. Its length is
equal to length of the col_indx array.
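The index arithmetic in these descriptions can be sketched as a loop over the four CSR arrays. The helper csr_to_dense below is hypothetical (it is not an MKL routine) and takes the index base as a plain integer ind, 0 or 1, rather than a sparse_index_base_t value. It assumes the arrays have the shape described above and expands them into a dense row-major array.

```c
#include <assert.h>

/* Hypothetical helper, not an MKL routine: expands a CSR matrix given in the
   four-array form into a dense row-major array of size rows * cols. */
void csr_to_dense(int ind, int rows, int cols,
                  const int *rows_start, const int *rows_end,
                  const int *col_indx, const double *values,
                  double *dense)
{
    assert(ind == 0 || ind == 1);
    assert(rows >= 0 && cols >= 0);

    for (int p = 0; p < rows * cols; ++p) dense[p] = 0.0;

    for (int i = 0; i < rows; ++i) {
        int first = rows_start[i] - ind;      /* first entry of row i */
        int last  = rows_end[i] - ind - 1;    /* last entry of row i  */
        for (int p = first; p <= last; ++p) {
            int j = col_indx[p] - ind;        /* zero-based column */
            dense[i * cols + j] = values[p];
        }
    }
}
```

Because the export routine returns pointers to the internal representation and does not allocate memory, a caller that uses exported arrays this way would only read them and would not free them.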
Return Values
The function returns a value indicating whether the operation was successful or not, and why.
mkl_sparse_?_export_csc
Exports CSC matrix from internal representation.
Syntax
sparse_status_t mkl_sparse_s_export_csc (const sparse_matrix_t source,
sparse_index_base_t *indexing, MKL_INT *rows, MKL_INT *cols, MKL_INT **cols_start,
MKL_INT **cols_end, MKL_INT **row_indx, float **values);
sparse_status_t mkl_sparse_d_export_csc (const sparse_matrix_t source,
sparse_index_base_t *indexing, MKL_INT *rows, MKL_INT *cols, MKL_INT **cols_start,
MKL_INT **cols_end, MKL_INT **row_indx, double **values);
sparse_status_t mkl_sparse_c_export_csc (const sparse_matrix_t source,
sparse_index_base_t *indexing, MKL_INT *rows, MKL_INT *cols, MKL_INT **cols_start,
MKL_INT **cols_end, MKL_INT **row_indx, MKL_Complex8 **values);
sparse_status_t mkl_sparse_z_export_csc (const sparse_matrix_t source,
sparse_index_base_t *indexing, MKL_INT *rows, MKL_INT *cols, MKL_INT **cols_start,
MKL_INT **cols_end, MKL_INT **row_indx, MKL_Complex16 **values);
Include Files
• mkl_spblas.h
Description
If the matrix specified by the source handle is in CSC format, the mkl_sparse_?_export_csc routine
exports an m-by-k matrix A in CSC format from the internal representation. The routine returns
pointers to the internal representation and does not allocate additional memory.
If the matrix is not already in CSC format, the routine returns SPARSE_STATUS_INVALID_VALUE.
Input Parameters
Output Parameters
cols_start Pointer to array of length m. This array contains column indices, such that
cols_start[i] - cols_start[0] is the first index of column i in the
arrays values and row_indx.
cols_end Pointer to array of length m. This array contains column indices, such that
cols_end[i] - cols_start[0] - 1 is the last index of column i in the
arrays values and row_indx.
row_indx For one-based indexing, pointer to array containing the row indices plus one
for each non-zero element of the matrix source. For zero-based indexing,
pointer to array containing the row indices for each non-zero element of the
matrix source. Its length is cols_end[cols - 1] - cols_start[0].
values Pointer to array containing non-zero elements of the matrix A. Its length is
equal to length of the row_indx array.
Return Values
The function returns a value indicating whether the operation was successful or not, and why.
mkl_sparse_?_export_bsr
Exports BSR matrix from internal representation.
Syntax
sparse_status_t mkl_sparse_s_export_bsr (const sparse_matrix_t source,
sparse_index_base_t *indexing, sparse_layout_t *block_layout, MKL_INT *rows, MKL_INT
*cols, MKL_INT *block_size, MKL_INT **rows_start, MKL_INT **rows_end, MKL_INT
**col_indx, float **values);
sparse_status_t mkl_sparse_d_export_bsr (const sparse_matrix_t source,
sparse_index_base_t *indexing, sparse_layout_t *block_layout, MKL_INT *rows, MKL_INT
*cols, MKL_INT *block_size, MKL_INT **rows_start, MKL_INT **rows_end, MKL_INT
**col_indx, double **values);
sparse_status_t mkl_sparse_c_export_bsr (const sparse_matrix_t source,
sparse_index_base_t *indexing, sparse_layout_t *block_layout, MKL_INT *rows, MKL_INT
*cols, MKL_INT *block_size, MKL_INT **rows_start, MKL_INT **rows_end, MKL_INT
**col_indx, MKL_Complex8 **values);
sparse_status_t mkl_sparse_z_export_bsr (const sparse_matrix_t source,
sparse_index_base_t *indexing, sparse_layout_t *block_layout, MKL_INT *rows, MKL_INT
*cols, MKL_INT *block_size, MKL_INT **rows_start, MKL_INT **rows_end, MKL_INT
**col_indx, MKL_Complex16 **values);
Include Files
• mkl_spblas.h
Description
If the matrix specified by the source handle is in BSR format, the mkl_sparse_?_export_bsr routine
exports an (block_size * rows)-by-(block_size * cols) matrix A in BSR format from the internal
representation. The routine returns pointers to the internal representation and does not allocate additional
memory.
If the matrix is not already in BSR format, the routine returns SPARSE_STATUS_INVALID_VALUE.
Input Parameters
Output Parameters
rows_start Pointer to array of length rows. This array contains row indices, such that
rows_start[i] - ind is the first index of block row i in the arrays values
and col_indx. ind takes 0 for zero-based indexing and 1 for one-based
indexing.
Refer to pointerB array description in BSR Format for more details.
rows_end Pointer to array of length rows. This array contains row indices, such that
rows_end[i] - ind - 1 is the last index of block row i in the arrays values
and col_indx. ind takes 0 for zero-based indexing and 1 for one-based
indexing.
Refer to pointerE array description in BSR Format for more details.
col_indx For one-based indexing, pointer to array containing the column indices plus
one for each non-zero block of the matrix source. For zero-based indexing,
pointer to array containing the column indices for each non-zero block of
the matrix source. Its length is rows_end[rows - 1] - ind. ind takes
0 for zero-based indexing and 1 for one-based indexing.
values Pointer to array containing non-zero elements of matrix source. Its length is
equal to length of the col_indx array multiplied by
block_size*block_size.
Refer to the values array description in BSR Format for more details.
Return Values
The function returns a value indicating whether the operation was successful or not, and why.
mkl_sparse_?_set_value
Changes a single value of matrix in internal
representation.
Syntax
sparse_status_t mkl_sparse_s_set_value (const sparse_matrix_t A, const MKL_INT row,
const MKL_INT col, const float value);
sparse_status_t mkl_sparse_d_set_value (const sparse_matrix_t A, const MKL_INT row,
const MKL_INT col, const double value);
sparse_status_t mkl_sparse_c_set_value (const sparse_matrix_t A, const MKL_INT row,
const MKL_INT col, const MKL_Complex8 value);
sparse_status_t mkl_sparse_z_set_value (const sparse_matrix_t A, const MKL_INT row,
const MKL_INT col, const MKL_Complex16 value);
Include Files
• mkl_spblas.h
Description
Use the mkl_sparse_?_set_value routine to change a single value of a matrix in the internal Inspector-executor
Sparse BLAS format. The value must already be present in the matrix structure.
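The requirement that the position already exists in the matrix structure can be pictured with a plain C sketch over zero-based CSR arrays. The helper csr_set_existing is hypothetical and not an MKL routine; in particular, its 0/1 return value is an assumption of this sketch, since how the library reports a position that is not stored is not stated here.

```c
#include <assert.h>

/* Hypothetical helper, not an MKL routine: overwrites entry (row, col) of a
   zero-based CSR matrix if that position is already stored.
   Returns 1 on success and 0 when the position is not part of the
   structure. */
int csr_set_existing(const int *rows_start, const int *rows_end,
                     const int *col_indx, double *values,
                     int row, int col, double value)
{
    assert(row >= 0 && col >= 0);

    for (int p = rows_start[row]; p < rows_end[row]; ++p) {
        if (col_indx[p] == col) {
            values[p] = value;   /* only the stored value changes */
            return 1;
        }
    }
    return 0;                    /* the sparsity pattern is left untouched */
}
```

The sketch never inserts a new entry, which mirrors the idea that a value update does not change the sparsity pattern.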
Input Parameters
Output Parameters
Return Values
The function returns a value indicating whether the operation was successful or not, and why.
mkl_sparse_?_update_values
Changes all or selected matrix values in internal
representation.
Syntax
NOTE
This routine is supported for sparse matrices in BSR format only.
Include Files
• mkl_spblas.h
Description
Use the mkl_sparse_?_update_values routine to change all or selected values of a matrix in the internal
Inspector-Executor Sparse BLAS format.
The values to be updated should already be present in the matrix structure.
• To change selected values, you must provide an array values (with new values) and also the
corresponding row and column indices for each value via indx and indy arrays as well as the overall
number of changed elements nvalues.
For example, to change A(0, 0) to 1 and A(0, 1) to 2, pass the following input parameters:
nvalues = 2, indx = {0, 0}, indy = {0, 1} and values = {1, 2}.
• To change all the values in the matrix, provide the values array and explicitly set nvalues to 0 or the
actual number of non-zero elements. There is no need to supply the indx and indy arrays.
Input Parameters
NOTE
Currently, only updating the full matrix is supported. Set indx
and indy as NULL.
Output Parameters
Return Values
The function returns a value indicating whether the operation was successful or not, and why.
mkl_sparse_order
Performs ordering of column indexes of the matrix in
CSR format
Syntax
sparse_status_t mkl_sparse_order (const sparse_matrix_t csrA);
Include Files
• mkl_spblas.h
Description
Use the mkl_sparse_order routine to perform ordering of column indexes of the matrix in CSR format.
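A minimal sketch of what ordering the column indices of a CSR matrix involves is given below. The helper csr_sort_rows is hypothetical and not an MKL routine; it assumes that ordering means ascending column order within each row and uses an insertion sort for brevity, whereas the library's actual algorithm is not described here.

```c
#include <assert.h>

/* Hypothetical helper, not an MKL routine: sorts the column indices of every
   row of a CSR matrix into ascending order, moving values along with them.
   ind is the index base (0 or 1) used by rows_start and rows_end. */
void csr_sort_rows(int ind, int rows,
                   const int *rows_start, const int *rows_end,
                   int *col_indx, double *values)
{
    assert(ind == 0 || ind == 1);
    assert(rows >= 0);

    for (int i = 0; i < rows; ++i) {
        int begin = rows_start[i] - ind;
        int end   = rows_end[i] - ind;
        for (int a = begin + 1; a < end; ++a) {   /* insertion sort */
            int c = col_indx[a];
            double v = values[a];
            int b = a - 1;
            while (b >= begin && col_indx[b] > c) {
                col_indx[b + 1] = col_indx[b];
                values[b + 1] = values[b];
                --b;
            }
            col_indx[b + 1] = c;
            values[b + 1] = v;
        }
    }
}
```

The sketch rewrites col_indx and values in place, which is consistent with the notes on the create routines stating that mkl_sparse_order is the one call that changes the user-provided input arrays.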
Input Parameters
Output Parameters
Return Values
The function returns a value indicating whether the operation was successful or not, and why.
mkl_sparse_set_lu_smoother_hint Provides an estimate of the number and type of upcoming calls to LU
smoother functionality.
mkl_sparse_set_sv_hint Provides an estimate of the number and type of upcoming triangular system solver
operations.
mkl_sparse_set_sm_hint Provides an estimate of the number and type of upcoming triangular matrix solve
operations with multiple right-hand sides.
mkl_sparse_set_dotmv_hint Sets an estimate of the number and type of upcoming matrix-vector operations.
mkl_sparse_optimize Analyzes matrix structure and performs optimizations using the hints
provided in the handle.
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/
PerformanceIndex.
Notice revision #20201201
mkl_sparse_set_lu_smoother_hint
Provides an estimate of the number and type of
upcoming calls to LU smoother functionality.
Syntax
sparse_status_t mkl_sparse_set_lu_smoother_hint (sparse_matrix_t A, const
sparse_operation_t operation, struct matrix_descr descr, MKL_INT expected_calls);
Include Files
• mkl_spblas.h
Description
The mkl_sparse_set_lu_smoother_hint function provides subsequent Inspector-Executor Sparse BLAS
calls an estimate of the number of upcoming calls to the lu_smoother routine that ultimately may influence
the optimizations applied and specifies whether or not to perform an operation on the matrix.
Input Parameters
Return Values
The function returns a value indicating whether the operation was successful or not, and why.
mkl_sparse_set_mv_hint
Provides estimate of number and type of upcoming
matrix-vector operations.
Syntax
sparse_status_t mkl_sparse_set_mv_hint (const sparse_matrix_t A, const
sparse_operation_t operation, const struct matrix_descr descr, const MKL_INT
expected_calls);
Include Files
• mkl_spblas.h
Description
Use the mkl_sparse_set_mv_hint routine to provide the Inspector-executor Sparse BLAS API an estimate
of the number of upcoming matrix-vector multiplication operations for performance optimization, and specify
whether or not to perform an operation on the matrix.
Input Parameters
Output Parameters
Return Values
The function returns a value indicating whether the operation was successful or not, and why.
mkl_sparse_set_sv_hint
Provides estimate of number and type of upcoming
triangular system solver operations.
Syntax
sparse_status_t mkl_sparse_set_sv_hint (const sparse_matrix_t A, const
sparse_operation_t operation, const struct matrix_descr descr, const MKL_INT
expected_calls);
Include Files
• mkl_spblas.h
Description
The mkl_sparse_set_sv_hint routine provides an estimate of the number of upcoming triangular system
solver operations and type of these operations for performance optimization.
Input Parameters
SPARSE_OPERATION_TRANSPOSE Transpose, op(A) = A^T.
SPARSE_OPERATION_CONJUGATE_TRANSPOSE Conjugate transpose, op(A) = A^H.
Output Parameters
Return Values
The function returns a value indicating whether the operation was successful or not, and why.
mkl_sparse_set_mm_hint
Provides estimate of number and type of upcoming
matrix-matrix multiplication operations.
Syntax
sparse_status_t mkl_sparse_set_mm_hint (const sparse_matrix_t A, const
sparse_operation_t operation, const struct matrix_descr descr, const sparse_layout_t
layout, const MKL_INT dense_matrix_size, const MKL_INT expected_calls);
Include Files
• mkl_spblas.h
Description
The mkl_sparse_set_mm_hint routine provides an estimate of the number of upcoming matrix-matrix
multiplication operations and type of these operations for performance optimization purposes.
Input Parameters
SPARSE_MATRIX_TYPE_HERMITIAN The matrix is Hermitian (only the requested
triangle is processed).
Output Parameters
Return Values
The function returns a value indicating whether the operation was successful or not, and why.
mkl_sparse_set_sm_hint
Provides estimate of number and type of upcoming
triangular matrix solve with multiple right hand sides
operations.
Syntax
sparse_status_t mkl_sparse_set_sm_hint (const sparse_matrix_t A, const
sparse_operation_t operation, const struct matrix_descr descr, const sparse_layout_t
layout, const MKL_INT dense_matrix_size, const MKL_INT expected_calls);
Include Files
• mkl_spblas.h
Description
The mkl_sparse_set_sm_hint routine provides an estimate of the number of upcoming triangular matrix
solve with multiple right hand sides operations and type of these operations for performance optimization
purposes.
Input Parameters
SPARSE_MATRIX_TYPE_TRIANGULAR The matrix is triangular (only the requested
triangle is processed).
Output Parameters
Return Values
The function returns a value indicating whether the operation was successful or not, and why.
mkl_sparse_set_dotmv_hint
Sets estimate of the number and type of upcoming
matrix-vector operations.
Syntax
sparse_status_t mkl_sparse_set_dotmv_hint (const sparse_matrix_t A, const
sparse_operation_t operation, const struct matrix_descr descr, const MKL_INT
expected_calls);
Include Files
• mkl_spblas.h
Description
Use the mkl_sparse_set_dotmv_hint routine to provide the Inspector-executor Sparse BLAS API an
estimate of the number of upcoming matrix-vector multiplication operations for performance optimization,
and specify whether or not to perform an operation on the matrix.
Input Parameters
sparse_fill_mode_t mode - Specifies the triangular matrix part for
symmetric, Hermitian, triangular, and block-triangular matrices:
Output Parameters
Return Values
The function returns a value indicating whether the operation was successful or not, and why.
mkl_sparse_set_symgs_hint
Sets estimate of number and type of upcoming mkl_sparse_?_symgs operations.
Syntax
sparse_status_t mkl_sparse_set_symgs_hint (const sparse_matrix_t A, const
sparse_operation_t operation, const struct matrix_descr descr, const MKL_INT
expected_calls);
Include Files
• mkl_spblas.h
Description
Use the mkl_sparse_set_symgs_hint routine to provide the Inspector-executor Sparse BLAS API an
estimate of the number of upcoming symmetric Gauss-Seidel preconditioner operations for performance
optimization, and specify whether or not to perform an operation on the matrix.
Input Parameters
mode Specifies the triangular matrix part for symmetric, Hermitian, triangular,
and block-triangular matrices.
Output Parameters
Return Values
The function returns a value indicating whether the operation was successful or not, and why.
mkl_sparse_set_sorv_hint
Sets an estimate of the number and type of upcoming
mkl_sparse_?_sorv operations.
Syntax
sparse_status_t mkl_sparse_set_sorv_hint(
const sparse_sor_type_t type,
const sparse_matrix_t A,
const struct matrix_descr descr,
const MKL_INT expected_calls
);
Include Files
• mkl_spblas.h
Description
Use the mkl_sparse_set_sorv_hint routine to provide the Inspector-Executor Sparse BLAS API an
estimate of the number of upcoming forward/backward sweeps or symmetric SOR preconditioner operations
for performance optimization.
Input Parameters
• SPARSE_FILL_MODE_LOWER
The lower triangular matrix part is processed.
• SPARSE_FILL_MODE_UPPER
The upper triangular matrix part is processed.
• SPARSE_DIAG_NON_UNIT
Diagonal elements might not be equal to one.
• SPARSE_DIAG_UNIT
Diagonal elements are equal to one.
Output Parameters
Return Values
The function returns a value indicating whether the operation was successful or not, and why.
mkl_sparse_set_memory_hint
Provides memory requirements for performance
optimization purposes.
Syntax
sparse_status_t mkl_sparse_set_memory_hint (const sparse_matrix_t A, const
sparse_memory_usage_t policy);
Include Files
• mkl_spblas.h
Description
The mkl_sparse_set_memory_hint routine allocates additional memory for further performance
optimization purposes.
Input Parameters
policy Specify memory utilization policy for optimization routine using these types:
SPARSE_MEMORY_AGGRESSIVE Default.
Routine can allocate memory up to the size of
matrix A for converting into the appropriate
sparse format.
Output Parameters
Return Values
The function returns a value indicating whether the operation was successful or not, and why.
mkl_sparse_optimize
Analyzes matrix structure and performs optimizations
using the hints provided in the handle.
Syntax
sparse_status_t mkl_sparse_optimize (sparse_matrix_t A);
Include Files
• mkl_spblas.h
Description
The mkl_sparse_optimize routine analyzes matrix structure and performs optimizations using the hints
provided in the handle. Generally, specifying a higher number of expected operations allows for more
aggressive and time-consuming optimizations.
Input Parameters
Return Values
The function returns a value indicating whether the operation was successful or not, and why.
mkl_sparse_?_lu_smoother
Computes an action of a preconditioner which
corresponds to the approximate matrix decomposition
A ≈ (L + D) × E × (U + D) for the system Ax = b (see
description below).
Syntax
sparse_status_t mkl_sparse_s_lu_smoother (const sparse_operation_t op, const
sparse_matrix_t A, const struct matrix_descr descr, const float *diag, const float
*approx_diag_inverse, float *x, const float *b);
sparse_status_t mkl_sparse_d_lu_smoother (const sparse_operation_t op, const
sparse_matrix_t A, const struct matrix_descr descr, const double *diag, const double
*approx_diag_inverse, double *x, const double *b);
sparse_status_t mkl_sparse_c_lu_smoother (const sparse_operation_t op, const
sparse_matrix_t A, const struct matrix_descr descr, const MKL_Complex8 *diag, const
MKL_Complex8 *approx_diag_inverse, MKL_Complex8 *x, const MKL_Complex8 *b);
sparse_status_t mkl_sparse_z_lu_smoother (const sparse_operation_t op, const
sparse_matrix_t A, const struct matrix_descr descr, const MKL_Complex16 *diag, const
MKL_Complex16 *approx_diag_inverse, MKL_Complex16 *x, const MKL_Complex16 *b);
Include Files
• mkl_spblas.h
Description
This routine computes an update for an iterative solution x of the system Ax=b by means of applying one
iteration of an approximate preconditioner which is based on the following approximation:
A ≈ (L + D) * E * (U + D), where E is an approximate inverse of the diagonal (using the exact inverse results in the
Gauss-Seidel preconditioner), L and U are the lower and upper triangular parts of A, and D is the diagonal (block diagonal
in case of BSR format) of A.
The mkl_sparse_?_lu_smoother routine performs these operations:
NOTE
This routine is supported only for non-transpose operation, real data types, and CSR/BSR
sparse formats. In a BSR format, both diagonal values and approximate diagonal inverse
arrays should be passed explicitly. For CSR format, diagonal values should be passed
explicitly.
Input Parameters
SPARSE_OPERATION_NON_TRANSPOSE Non-transpose, op(A) = A.
NOTE
Transpose and conjugate transpose
(SPARSE_OPERATION_TRANSPOSE and
SPARSE_OPERATION_CONJUGATE_TRANSPOSE)
are not supported.
NOTE
Only SPARSE_MATRIX_TYPE_GENERAL is supported.
diag Array of size at least m, where m is the number of rows (or nrows *
block_size * block_size in case of BSR format) of matrix A.
The array diag must contain the diagonal values of matrix A.
approx_diag_inverse Array of size at least m, where m is the number of rows (or the number of
rows * block_size * block_size in case of BSR format) of matrix A.
The array approx_diag_inverse will be used as E, approximate inverse of
the diagonal of the matrix A.
Output Parameters
Return Values
The function returns a value indicating whether the operation was successful or not, and why.
SPARSE_STATUS_SUCCESS The operation was successful.
mkl_sparse_?_mv
Computes a sparse matrix-vector product.
Syntax
sparse_status_t mkl_sparse_s_mv (const sparse_operation_t operation, const float alpha,
const sparse_matrix_t A, const struct matrix_descr descr, const float *x, const float
beta, float *y);
sparse_status_t mkl_sparse_d_mv (const sparse_operation_t operation, const double
alpha, const sparse_matrix_t A, const struct matrix_descr descr, const double *x, const
double beta, double *y);
sparse_status_t mkl_sparse_c_mv (const sparse_operation_t operation, const MKL_Complex8
alpha, const sparse_matrix_t A, const struct matrix_descr descr, const MKL_Complex8 *x,
const MKL_Complex8 beta, MKL_Complex8 *y);
sparse_status_t mkl_sparse_z_mv (const sparse_operation_t operation, const
MKL_Complex16 alpha, const sparse_matrix_t A, const struct matrix_descr descr, const
MKL_Complex16 *x, const MKL_Complex16 beta, MKL_Complex16 *y);
Include Files
• mkl_spblas.h
Description
The mkl_sparse_?_mv routine computes a sparse matrix-dense vector product defined as
y := alpha*op(A)*x + beta*y
where:
alpha and beta are scalars, x and y are vectors, and A is a sparse matrix handle of a matrix with m rows and
k columns, and op is a matrix modifier for matrix A.
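As a reading aid, the formula above can be written out in plain C for the non-transpose case. This is a minimal sketch, not the Intel MKL implementation: the helper name ref_csr_mv and its argument list are hypothetical, and it assumes a real matrix held in zero-based three-array CSR storage instead of a sparse_matrix_t handle.

```c
/* Hypothetical reference helper (not an MKL routine).
 * Computes y := alpha*A*x + beta*y for an m-row matrix A stored in
 * zero-based CSR: row_ptr has m+1 entries, col_idx/val hold the nonzeros. */
void ref_csr_mv(int m, const int *row_ptr, const int *col_idx,
                const double *val, double alpha, const double *x,
                double beta, double *y)
{
    for (int i = 0; i < m; ++i) {
        double sum = 0.0;
        /* Dot product of sparse row i with the dense vector x. */
        for (int p = row_ptr[i]; p < row_ptr[i + 1]; ++p)
            sum += val[p] * x[col_idx[p]];
        y[i] = alpha * sum + beta * y[i];
    }
}
```

Note that y is both an input and an output, which is why it must be initialized on entry whenever beta is nonzero.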
Input Parameters
SPARSE_OPERATION_CONJUGATE_TRANSPOSE    Conjugate transpose, op(A) = A^H.
y    Array of size at least the number of rows, m, of A if operation =
SPARSE_OPERATION_NON_TRANSPOSE and at least the number of columns,
k, of A otherwise. On entry, the array y must contain the vector y.
Output Parameters
Return Values
The function returns a value indicating whether the operation was successful or not, and why.
mkl_sparse_?_trsv
Solves a system of linear equations for a triangular
sparse matrix.
Syntax
sparse_status_t mkl_sparse_s_trsv (const sparse_operation_t operation, const float
alpha, const sparse_matrix_t A, const struct matrix_descr descr, const float *x, float
*y);
sparse_status_t mkl_sparse_d_trsv (const sparse_operation_t operation, const double
alpha, const sparse_matrix_t A, const struct matrix_descr descr, const double *x,
double *y);
sparse_status_t mkl_sparse_c_trsv (const sparse_operation_t operation, const
MKL_Complex8 alpha, const sparse_matrix_t A, const struct matrix_descr descr, const
MKL_Complex8 *x, MKL_Complex8 *y);
sparse_status_t mkl_sparse_z_trsv (const sparse_operation_t operation, const
MKL_Complex16 alpha, const sparse_matrix_t A, const struct matrix_descr descr, const
MKL_Complex16 *x, MKL_Complex16 *y);
Include Files
• mkl_spblas.h
Description
The mkl_sparse_?_trsv routine solves a system of linear equations for a matrix:
op(A)*y = alpha * x
where A is a triangular sparse matrix, op is a matrix modifier for matrix A, alpha is a scalar, and x and y are
vectors.
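The solve above amounts to a substitution sweep over the rows. The following plain-C sketch shows the lower-triangular, non-transpose case with a stored non-unit diagonal. It is an illustration under stated assumptions, not the library code: ref_csr_lower_trsv is a hypothetical name, the matrix is assumed to be in zero-based CSR, and the error convention (returning -1) is invented for this example.

```c
/* Hypothetical reference helper (not an MKL routine).
 * Solves L*y = alpha*x by forward substitution, where L is the lower
 * triangle (including the diagonal) of a zero-based CSR matrix.
 * Returns 0 on success, -1 if a diagonal entry is missing or zero. */
int ref_csr_lower_trsv(int m, const int *row_ptr, const int *col_idx,
                       const double *val, double alpha, const double *x,
                       double *y)
{
    for (int i = 0; i < m; ++i) {
        double sum = alpha * x[i];
        double diag = 0.0;
        for (int p = row_ptr[i]; p < row_ptr[i + 1]; ++p) {
            int j = col_idx[p];
            if (j < i)
                sum -= val[p] * y[j];   /* already-solved unknowns */
            else if (j == i)
                diag = val[p];
            /* Entries with j > i are ignored: only the lower triangle is used. */
        }
        if (diag == 0.0)
            return -1;
        y[i] = sum / diag;
    }
    return 0;
}
```

Each row depends on all earlier rows, which is why triangular solves are harder to parallelize than matrix-vector products.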
NOTE
For sparse matrices in the BSR format, the supported combinations of
(indexing,block_layout) are:
• (SPARSE_INDEX_BASE_ZERO, SPARSE_LAYOUT_ROW_MAJOR)
• (SPARSE_INDEX_BASE_ONE, SPARSE_LAYOUT_COLUMN_MAJOR)
Input Parameters
sparse_diag_type_t diag - Specifies diagonal type for non-general
matrices:
Output Parameters
Return Values
The function returns a value indicating whether the operation was successful or not, and why.
mkl_sparse_?_mm
Computes the product of a sparse matrix and a dense
matrix and stores the result as a dense matrix.
Syntax
sparse_status_t mkl_sparse_s_mm (const sparse_operation_t operation, const float alpha,
const sparse_matrix_t A, const struct matrix_descr descr, const sparse_layout_t layout,
const float *B, const MKL_INT columns, const MKL_INT ldb, const float beta, float *C,
const MKL_INT ldc);
sparse_status_t mkl_sparse_d_mm (const sparse_operation_t operation, const double
alpha, const sparse_matrix_t A, const struct matrix_descr descr, const sparse_layout_t
layout, const double *B, const MKL_INT columns, const MKL_INT ldb, const double beta,
double *C, const MKL_INT ldc);
sparse_status_t mkl_sparse_c_mm (const sparse_operation_t operation, const MKL_Complex8
alpha, const sparse_matrix_t A, const struct matrix_descr descr, const sparse_layout_t
layout, const MKL_Complex8 *B, const MKL_INT columns, const MKL_INT ldb, const
MKL_Complex8 beta, MKL_Complex8 *C, const MKL_INT ldc);
sparse_status_t mkl_sparse_z_mm (const sparse_operation_t operation, const
MKL_Complex16 alpha, const sparse_matrix_t A, const struct matrix_descr descr, const
sparse_layout_t layout, const MKL_Complex16 *B, const MKL_INT columns, const MKL_INT
ldb, const MKL_Complex16 beta, MKL_Complex16 *C, const MKL_INT ldc);
Include Files
• mkl_spblas.h
Description
The mkl_sparse_?_mm routine performs a matrix-matrix operation:
C := alpha*op(A)*B + beta*C
where alpha and beta are scalars, A is a sparse matrix, op is a matrix modifier for matrix A, and B and C are
dense matrices.
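A plain-C rendering of this product may help when reasoning about the roles of ldb and ldc. The sketch below covers only the non-transpose case with row-major dense matrices. It is not the MKL implementation: ref_csr_mm_row_major is a hypothetical helper, and it assumes a real matrix in zero-based CSR storage.

```c
/* Hypothetical reference helper (not an MKL routine).
 * Computes C := alpha*A*B + beta*C, where A is m-by-k in zero-based CSR,
 * B is k-by-n and C is m-by-n, both dense and row-major with leading
 * dimensions ldb and ldc (each at least n). */
void ref_csr_mm_row_major(int m, int n, const int *row_ptr,
                          const int *col_idx, const double *val,
                          double alpha, const double *B, int ldb,
                          double beta, double *C, int ldc)
{
    for (int i = 0; i < m; ++i) {
        double *c_row = C + (long)i * ldc;
        for (int j = 0; j < n; ++j)
            c_row[j] *= beta;
        /* Each nonzero A(i,k) adds a scaled copy of row k of B to row i of C. */
        for (int p = row_ptr[i]; p < row_ptr[i + 1]; ++p) {
            double a = alpha * val[p];
            const double *b_row = B + (long)col_idx[p] * ldb;
            for (int j = 0; j < n; ++j)
                c_row[j] += a * b_row[j];
        }
    }
}
```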
The mkl_sparse_?_mm and mkl_sparse_?_trsm routines support these configurations:
NOTE
For sparse matrices in the BSR format, the supported combinations of
(indexing,block_layout) are:
• (SPARSE_INDEX_BASE_ZERO, SPARSE_LAYOUT_ROW_MAJOR)
• (SPARSE_INDEX_BASE_ONE, SPARSE_LAYOUT_COLUMN_MAJOR)
Input Parameters
SPARSE_MATRIX_TYPE_SYMMETRIC    The matrix is symmetric (only the requested triangle is processed).
layout = SPARSE_LAYOUT_COLUMN_MAJOR    layout = SPARSE_LAYOUT_ROW_MAJOR
layout = SPARSE_LAYOUT_COLUMN_MAJOR    layout = SPARSE_LAYOUT_ROW_MAJOR
Output Parameters
Return Values
The function returns a value indicating whether the operation was successful or not, and why.
mkl_sparse_?_trsm
Solves a system of linear equations with multiple right
hand sides for a triangular sparse matrix.
Syntax
sparse_status_t mkl_sparse_s_trsm (const sparse_operation_t operation, const float
alpha, const sparse_matrix_t A, const struct matrix_descr descr, const sparse_layout_t
layout, const float *x, const MKL_INT columns, const MKL_INT ldx, float *y, const
MKL_INT ldy);
sparse_status_t mkl_sparse_d_trsm (const sparse_operation_t operation, const double
alpha, const sparse_matrix_t A, const struct matrix_descr descr, const sparse_layout_t
layout, const double *x, const MKL_INT columns, const MKL_INT ldx, double *y, const
MKL_INT ldy);
sparse_status_t mkl_sparse_c_trsm (const sparse_operation_t operation, const
MKL_Complex8 alpha, const sparse_matrix_t A, const struct matrix_descr descr, const
sparse_layout_t layout, const MKL_Complex8 *x, const MKL_INT columns, const MKL_INT
ldx, MKL_Complex8 *y, const MKL_INT ldy);
sparse_status_t mkl_sparse_z_trsm (const sparse_operation_t operation, const
MKL_Complex16 alpha, const sparse_matrix_t A, const struct matrix_descr descr, const
sparse_layout_t layout, const MKL_Complex16 *x, const MKL_INT columns, const MKL_INT
ldx, MKL_Complex16 *y, const MKL_INT ldy);
Include Files
• mkl_spblas.h
Description
The mkl_sparse_?_trsm routine solves a system of linear equations with multiple right hand sides for a
triangular sparse matrix:
Y := alpha*inv(op(A))*X
where:
alpha is a scalar, X and Y are dense matrices, A is a sparse matrix, and op is a matrix modifier for matrix A.
NOTE
For sparse matrices in the BSR format, the supported combinations of
(indexing,block_layout) are:
• (SPARSE_INDEX_BASE_ZERO, SPARSE_LAYOUT_ROW_MAJOR)
• (SPARSE_INDEX_BASE_ONE, SPARSE_LAYOUT_COLUMN_MAJOR)
Input Parameters
layout = SPARSE_LAYOUT_COLUMN_MAJOR    layout = SPARSE_LAYOUT_ROW_MAJOR
rows (number of rows in x)    ldx    number of rows in A
layout = SPARSE_LAYOUT_COLUMN_MAJOR    layout = SPARSE_LAYOUT_ROW_MAJOR
Output Parameters
Return Values
The function returns a value indicating whether the operation was successful or not, and why.
mkl_sparse_?_add
Computes the sum of two sparse matrices. The result
is stored in a newly allocated sparse matrix.
Syntax
sparse_status_t mkl_sparse_s_add (const sparse_operation_t operation, const
sparse_matrix_t A, const float alpha, const sparse_matrix_t B, sparse_matrix_t *C);
sparse_status_t mkl_sparse_d_add (const sparse_operation_t operation, const
sparse_matrix_t A, const double alpha, const sparse_matrix_t B, sparse_matrix_t *C);
Include Files
• mkl_spblas.h
Description
The mkl_sparse_?_add routine performs a matrix-matrix operation:
C := alpha*op(A) + B
where alpha is a scalar, op is a matrix modifier, and A, B, and C are sparse matrices.
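To make the operation concrete, the sketch below adds two CSR matrices row by row using a sorted merge. It is a plain-C illustration for the non-transpose case, not the MKL implementation. The name ref_csr_add is hypothetical, and the sketch assumes zero-based CSR with column indices sorted within each row and caller-provided output arrays large enough for nnz(A) + nnz(B) entries. The MKL routine, by contrast, allocates the result matrix itself.

```c
/* Hypothetical reference helper (not an MKL routine).
 * Computes C := alpha*A + B for two m-row CSR matrices with sorted column
 * indices. c_ptr needs m+1 entries; c_col and c_val need room for
 * nnz(A) + nnz(B) entries. Returns the number of entries written to C. */
int ref_csr_add(int m, double alpha,
                const int *a_ptr, const int *a_col, const double *a_val,
                const int *b_ptr, const int *b_col, const double *b_val,
                int *c_ptr, int *c_col, double *c_val)
{
    int nnz = 0;
    c_ptr[0] = 0;
    for (int i = 0; i < m; ++i) {
        int pa = a_ptr[i], ea = a_ptr[i + 1];
        int pb = b_ptr[i], eb = b_ptr[i + 1];
        /* Merge the two sorted rows, summing entries that share a column. */
        while (pa < ea || pb < eb) {
            if (pb == eb || (pa < ea && a_col[pa] < b_col[pb])) {
                c_col[nnz] = a_col[pa];
                c_val[nnz] = alpha * a_val[pa];
                ++pa;
            } else if (pa == ea || b_col[pb] < a_col[pa]) {
                c_col[nnz] = b_col[pb];
                c_val[nnz] = b_val[pb];
                ++pb;
            } else {
                c_col[nnz] = a_col[pa];
                c_val[nnz] = alpha * a_val[pa] + b_val[pb];
                ++pa;
                ++pb;
            }
            ++nnz;
        }
        c_ptr[i + 1] = nnz;
    }
    return nnz;
}
```

The number of nonzeros in C is not known in advance, which is one reason the result is returned in a newly allocated matrix.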
NOTE
This routine is only supported for sparse matrices in CSR and BSR formats. It is not
supported for COO or CSC formats.
Input Parameters
Output Parameters
Return Values
The function returns a value indicating whether the operation was successful or not, and why.
SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.
mkl_sparse_spmm
Computes the product of two sparse matrices. The
result is stored in a newly allocated sparse matrix.
Syntax
sparse_status_t mkl_sparse_spmm (const sparse_operation_t operation, const
sparse_matrix_t A, const sparse_matrix_t B, sparse_matrix_t *C);
Include Files
• mkl_spblas.h
Description
The mkl_sparse_spmm routine performs a matrix-matrix operation:
C := op(A)*B
where A, B, and C are sparse matrices and op is a matrix modifier for matrix A.
Notes
• This routine is supported only for sparse matrices in CSC, CSR, and BSR formats. It is not
supported for sparse matrices in COO format.
• The column indices of the output matrix (if in CSR format) can appear unsorted due to the
algorithm chosen internally. To ensure sorted column indices (if that is important), call
mkl_sparse_order().
Input Parameters
Output Parameters
Return Values
The function returns a value indicating whether the operation was successful or not, and why.
mkl_sparse_?_spmmd
Computes the product of two sparse matrices and
stores the result as a dense matrix.
Syntax
sparse_status_t mkl_sparse_s_spmmd (const sparse_operation_t operation, const
sparse_matrix_t A, const sparse_matrix_t B, const sparse_layout_t layout, float *C,
const MKL_INT ldc);
sparse_status_t mkl_sparse_d_spmmd (const sparse_operation_t operation, const
sparse_matrix_t A, const sparse_matrix_t B, const sparse_layout_t layout, double *C,
const MKL_INT ldc);
sparse_status_t mkl_sparse_c_spmmd (const sparse_operation_t operation, const
sparse_matrix_t A, const sparse_matrix_t B, const sparse_layout_t layout, MKL_Complex8
*C, const MKL_INT ldc);
sparse_status_t mkl_sparse_z_spmmd (const sparse_operation_t operation, const
sparse_matrix_t A, const sparse_matrix_t B, const sparse_layout_t layout, MKL_Complex16
*C, const MKL_INT ldc);
Include Files
• mkl_spblas.h
Description
The mkl_sparse_?_spmmd routine performs a matrix-matrix operation:
C := op(A)*B
where A and B are sparse matrices, op is a matrix modifier for matrix A, and C is a dense matrix.
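The product of two sparse matrices with a dense result can be sketched in plain C using a row-by-row accumulation. This is an illustration for the non-transpose case only, not the MKL implementation: ref_csr_spmmd_row_major is a hypothetical name, and both inputs are assumed to be real matrices in zero-based CSR storage, with C stored row-major.

```c
/* Hypothetical reference helper (not an MKL routine).
 * Computes the dense m-by-n matrix C := A*B (row-major, leading dimension
 * ldc >= n), where A is m-by-k and B is k-by-n, both in zero-based CSR. */
void ref_csr_spmmd_row_major(int m, int n,
                             const int *a_ptr, const int *a_col,
                             const double *a_val,
                             const int *b_ptr, const int *b_col,
                             const double *b_val,
                             double *C, int ldc)
{
    for (int i = 0; i < m; ++i) {
        double *c_row = C + (long)i * ldc;
        for (int j = 0; j < n; ++j)
            c_row[j] = 0.0;
        /* Row i of C is a combination of the rows of B selected by row i of A. */
        for (int pa = a_ptr[i]; pa < a_ptr[i + 1]; ++pa) {
            int k = a_col[pa];
            for (int pb = b_ptr[k]; pb < b_ptr[k + 1]; ++pb)
                c_row[b_col[pb]] += a_val[pa] * b_val[pb];
        }
    }
}
```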
NOTE
This routine is not supported for sparse matrices in the COO format. For sparse matrices in
BSR format, these combinations of (indexing, block_layout) are supported:
• (SPARSE_INDEX_BASE_ZERO, SPARSE_LAYOUT_ROW_MAJOR)
• (SPARSE_INDEX_BASE_ONE, SPARSE_LAYOUT_COLUMN_MAJOR)
Input Parameters
SPARSE_OPERATION_NON_TRANSPOSE    Non-transpose, op(A) = A.
SPARSE_OPERATION_TRANSPOSE    Transpose, op(A) = A^T.
SPARSE_OPERATION_CONJUGATE_TRANSPOSE    Conjugate transpose, op(A) = A^H.
Output Parameters
Return Values
The function returns a value indicating whether the operation was successful or not, and why.
mkl_sparse_sp2m
Computes the product of two sparse matrices. The
result is stored in a newly allocated sparse matrix.
Syntax
sparse_status_t mkl_sparse_sp2m (const sparse_operation_t transA, const struct
matrix_descr descrA, const sparse_matrix_t A, const sparse_operation_t transB, const
struct matrix_descr descrB, const sparse_matrix_t B, const sparse_request_t request,
sparse_matrix_t *C);
Include Files
• mkl_spblas.h
Description
The mkl_sparse_sp2m routine performs a matrix-matrix operation:
C := opA(A)*opB(B)
where A, B, and C are sparse matrices, and opA and opB are matrix modifiers for matrices A and B, respectively.
NOTE
The column indices of the output matrix (if in CSR format) can appear unsorted due to the
algorithm chosen internally. To ensure sorted column indices (if that is important), call
mkl_sparse_order().
Input Parameters
SPARSE_MATRIX_TYPE_BLOCK_TRIANGULAR    The matrix is block-triangular (only the requested triangle is processed). This applies to BSR format only.
SPARSE_MATRIX_TYPE_BLOCK_DIAGONAL    The matrix is block-diagonal (only the requested triangle is processed). This applies to BSR format only.
request Specifies whether the full computations are performed at once or using the
two-stage algorithm. See Two-stage Algorithm for Inspector-executor
Sparse BLAS Routines.
Output Parameters
Return Values
The function returns a value indicating whether the operation was successful, or the reason why it failed.
mkl_sparse_?_sp2md
Computes the product of two sparse matrices (supports
operations on both matrices) and stores the result as
a dense matrix.
Syntax
sparse_status_t mkl_sparse_s_sp2md ( const sparse_operation_t transA, const struct
matrix_descr descrA, const sparse_matrix_t A, const sparse_operation_t transB, const
struct matrix_descr descrB, const sparse_matrix_t B, const float alpha, const float
beta, float *C, const sparse_layout_t layout, const MKL_INT ldc );
sparse_status_t mkl_sparse_d_sp2md ( const sparse_operation_t transA, const struct
matrix_descr descrA, const sparse_matrix_t A, const sparse_operation_t transB, const
struct matrix_descr descrB, const sparse_matrix_t B, const double alpha, const double
beta, double *C, const sparse_layout_t layout, const MKL_INT ldc );
sparse_status_t mkl_sparse_c_sp2md ( const sparse_operation_t transA, const struct
matrix_descr descrA, const sparse_matrix_t A, const sparse_operation_t transB, const
struct matrix_descr descrB, const sparse_matrix_t B, const MKL_Complex8 alpha, const
MKL_Complex8 beta, MKL_Complex8 *C, const sparse_layout_t layout, const MKL_INT ldc );
sparse_status_t mkl_sparse_z_sp2md ( const sparse_operation_t transA, const struct
matrix_descr descrA, const sparse_matrix_t A, const sparse_operation_t transB, const
struct matrix_descr descrB, const sparse_matrix_t B, const MKL_Complex16 alpha, const
MKL_Complex16 beta, MKL_Complex16 *C, const sparse_layout_t layout, const MKL_INT
ldc );
Include Files
• mkl_spblas.h
Description
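The mkl_sparse_?_sp2md routine performs a matrix-matrix operation:
C := alpha*opA(A)*opB(B) + beta*C
where A and B are sparse matrices, opA and opB are matrix modifiers for matrices A and B, respectively, C is a
dense matrix, and alpha and beta are scalars.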
NOTE
This routine is not supported for sparse matrices in the COO format. For sparse matrices in
BSR format, these combinations of (indexing, block_layout) are supported:
• (SPARSE_INDEX_BASE_ZERO, SPARSE_LAYOUT_ROW_MAJOR)
• (SPARSE_INDEX_BASE_ONE, SPARSE_LAYOUT_COLUMN_MAJOR)
Input Parameters
SPARSE_FILL_MODE_UPPER    The upper triangular matrix is processed.
NOTE
Currently, only SPARSE_MATRIX_TYPE_GENERAL is supported.
Output Parameters
Return Values
The function returns a value indicating whether the operation was successful, or the reason why it failed.
mkl_sparse_sypr
Computes the symmetric product of three sparse
matrices and stores the result in a newly allocated
sparse matrix.
Syntax
sparse_status_t mkl_sparse_sypr (const sparse_operation_t operation, const
sparse_matrix_t A, const sparse_matrix_t B, const struct matrix_descr descrB,
sparse_matrix_t *C, const sparse_request_t request);
Include Files
• mkl_spblas.h
Description
The mkl_sparse_sypr routine performs a multiplication of three sparse matrices that results in a symmetric
or Hermitian matrix, C.
C:=A*B*opA(A)
or
C:=opA(A)*B*A
depending on the matrix modifier operation.
Here, A, B, and C are sparse matrices, where A has a general structure while B and C are symmetric (for real
data types) or Hermitian (for complex data types) matrices. opA is the transpose (real data types) or
conjugate transpose (complex data types) operator.
NOTE
This routine is not supported for sparse matrices in COO or CSC formats. This routine
supports only CSR and BSR formats. In addition, it supports only the sorted CSR and sorted
BSR formats for the input matrix. If the data is unsorted, call the mkl_sparse_order routine
before either mkl_sparse_sypr or mkl_sparse_?_syprd.
Input Parameters
SPARSE_MATRIX_TYPE_SYMMETRIC    The matrix is symmetric (only the specified triangle is processed).
SPARSE_MATRIX_TYPE_HERMITIAN    The matrix is Hermitian (only the specified triangle is processed).
SPARSE_FILL_MODE_LOWER    The lower triangular matrix part is processed.
SPARSE_FILL_MODE_UPPER    The upper triangular matrix part is processed.
SPARSE_DIAG_NON_UNIT    Diagonal elements might not be equal to one.
NOTE
This routine also supports C = A*A^T (or C = A*A^H for complex data types) with these parameters:
descrB.type=SPARSE_MATRIX_TYPE_DIAGONAL
descrB.diag=SPARSE_DIAG_UNIT
In this case, you do not need to allocate structure B. Use the
routine as a 2-stage version of mkl_sparse_syrk.
SPARSE_STAGE_FINALIZE_MULT_NO_VAL. Can also be used when the matrix structure remains unchanged and only values of the resulting matrix C need to be recomputed.
SPARSE_STAGE_FULL_MULT_NO_VAL    Perform computations of the matrix structure.
SPARSE_STAGE_FULL_MULT    Perform the entire computation in a single step.
Output Parameters
C Handle which contains the resulting sparse matrix. Only the upper-triangular
part of the matrix is computed.
Return Values
The function returns a value indicating whether the operation was successful, or the reason why it failed.
mkl_sparse_?_syprd
Computes the symmetric triple product of a sparse
matrix and a dense matrix and stores the result as a
dense matrix.
Syntax
sparse_status_t mkl_sparse_s_syprd (const sparse_operation_t op, const sparse_matrix_t
A, const float *B, const sparse_layout_t layoutB, const MKL_INT ldb, const float alpha,
const float beta, float *C, const sparse_layout_t layoutC, const MKL_INT ldc);
sparse_status_t mkl_sparse_d_syprd (const sparse_operation_t op, const sparse_matrix_t
A, const double *B, const sparse_layout_t layoutB, const MKL_INT ldb, const double
alpha, const double beta, double *C, const sparse_layout_t layoutC, const MKL_INT ldc);
sparse_status_t mkl_sparse_c_syprd (const sparse_operation_t op, const sparse_matrix_t
A, const MKL_Complex8 *B, const sparse_layout_t layoutB, const MKL_INT ldb, const
MKL_Complex8 alpha, const MKL_Complex8 beta, MKL_Complex8 *C, const sparse_layout_t
layoutC, const MKL_INT ldc);
sparse_status_t mkl_sparse_z_syprd (const sparse_operation_t op, const sparse_matrix_t
A, const MKL_Complex16 *B, const sparse_layout_t layoutB, const MKL_INT ldb, const
MKL_Complex16 alpha, const MKL_Complex16 beta, MKL_Complex16 *C, const sparse_layout_t
layoutC, const MKL_INT ldc);
Include Files
• mkl_spblas.h
Description
The mkl_sparse_?_syprd routine performs a triple product of a sparse matrix and a dense matrix that results
in a symmetric or Hermitian matrix, C.
C:=alpha*A*B*op(A) + beta*C
or
C:=alpha*op(A)*B*A + beta*C
depending on the matrix modifier operation. Here A is a sparse matrix, B and C are dense and symmetric
(or Hermitian) matrices.
op is the transpose (real precision) or conjugate transpose (complex precision) operator.
NOTE
This routine is not supported for sparse matrices in COO or CSC formats. It supports only
CSR and BSR formats. In addition, this routine supports only the sorted CSR and sorted BSR
formats for the input matrix. If the data is unsorted, call the mkl_sparse_order routine
before either mkl_sparse_sypr or mkl_sparse_?_syprd.
Input Parameters
B Input dense matrix. Only the upper triangular part of the matrix is
used for computation.
layoutB    Structure that describes the storage scheme for the dense matrix B.
SPARSE_LAYOUT_COLUMN_MAJOR    Store elements in a column-major layout.
SPARSE_LAYOUT_ROW_MAJOR    Store elements in a row-major layout.
NOTE
Since the upper triangular part of matrix C is the only
portion that is processed, set real values of alpha and beta
in the complex case to obtain the Hermitian matrix.
layoutC    Structure that describes the storage scheme for the dense matrix C.
SPARSE_LAYOUT_COLUMN_MAJOR    Store elements in a column-major layout.
SPARSE_LAYOUT_ROW_MAJOR    Store elements in a row-major layout.
Output Parameters
C Resulting dense matrix. Only the upper-triangular part of the matrix is
computed.
Return Values
The function returns a value indicating whether the operation was successful, or the reason why it failed.
mkl_sparse_?_symgs
Computes a symmetric Gauss-Seidel preconditioner.
Syntax
sparse_status_t mkl_sparse_s_symgs (const sparse_operation_t operation, const
sparse_matrix_t A, const struct matrix_descr descr, const float alpha, const float *b,
float *x);
sparse_status_t mkl_sparse_d_symgs (const sparse_operation_t operation, const
sparse_matrix_t A, const struct matrix_descr descr, const double alpha, const double
*b, double *x);
sparse_status_t mkl_sparse_c_symgs (const sparse_operation_t operation, const
sparse_matrix_t A, const struct matrix_descr descr, const MKL_Complex8 alpha, const
MKL_Complex8 *b, MKL_Complex8 *x);
sparse_status_t mkl_sparse_z_symgs (const sparse_operation_t operation, const
sparse_matrix_t A, const struct matrix_descr descr, const MKL_Complex16 alpha, const
MKL_Complex16 *b, MKL_Complex16 *x);
Include Files
• mkl_spblas.h
Description
The mkl_sparse_?_symgs routine performs this operation:
x0 := x*alpha;
(L + D)*x1 = b - U*x0;
(U + D)*x = b - L*x1;
where A = L + D + U.
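The three steps above are a forward sweep followed by a backward sweep, and both can run in place on x. The following plain-C sketch illustrates that. It is not the MKL implementation: the names gs_sweep and ref_csr_symgs are hypothetical, the error convention is invented for this example, and the sketch assumes that both triangles of the symmetric matrix are stored in zero-based CSR.

```c
/* Hypothetical helper: one Gauss-Seidel sweep over a CSR matrix, updating x
 * in place. forward != 0 walks rows 0..m-1, otherwise m-1..0.
 * Returns -1 if a diagonal entry is missing or zero. */
static int gs_sweep(int m, const int *row_ptr, const int *col_idx,
                    const double *val, const double *b, double *x,
                    int forward)
{
    for (int s = 0; s < m; ++s) {
        int i = forward ? s : m - 1 - s;
        double sum = b[i];
        double diag = 0.0;
        for (int p = row_ptr[i]; p < row_ptr[i + 1]; ++p) {
            int j = col_idx[p];
            if (j == i)
                diag = val[p];
            else
                sum -= val[p] * x[j];   /* mixes old and updated entries */
        }
        if (diag == 0.0)
            return -1;
        x[i] = sum / diag;
    }
    return 0;
}

/* Hypothetical reference helper (not an MKL routine):
 *   x0 := x*alpha;  (L + D)*x1 = b - U*x0;  (U + D)*x = b - L*x1. */
int ref_csr_symgs(int m, const int *row_ptr, const int *col_idx,
                  const double *val, double alpha, const double *b,
                  double *x)
{
    for (int i = 0; i < m; ++i)
        x[i] *= alpha;
    if (gs_sweep(m, row_ptr, col_idx, val, b, x, 1) != 0)
        return -1;
    return gs_sweep(m, row_ptr, col_idx, val, b, x, 0);
}
```

Passing alpha = 0 discards the initial guess, so the result depends only on b.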
NOTE
This routine is not supported for sparse matrices in BSR, COO, or CSC formats. It supports
only the CSR format. Additionally, only symmetric matrices are supported, so descr.type
must be SPARSE_MATRIX_TYPE_SYMMETRIC.
Input Parameters
NOTE
Transpose (SPARSE_OPERATION_TRANSPOSE) and conjugate
transpose (SPARSE_OPERATION_CONJUGATE_TRANSPOSE) are not
supported.
SPARSE_MATRIX_TYPE_GENERAL    The matrix is processed as is.
Output Parameters
Return Values
The function returns a value indicating whether the operation was successful or not, and why.
mkl_sparse_?_symgs_mv
Computes a symmetric Gauss-Seidel preconditioner
followed by a matrix-vector multiplication.
Syntax
sparse_status_t mkl_sparse_s_symgs_mv (const sparse_operation_t operation, const
sparse_matrix_t A, const struct matrix_descr descr, const float alpha, const float *b,
float *x, float *y);
sparse_status_t mkl_sparse_d_symgs_mv (const sparse_operation_t operation, const
sparse_matrix_t A, const struct matrix_descr descr, const double alpha, const double
*b, double *x, double *y);
sparse_status_t mkl_sparse_c_symgs_mv (const sparse_operation_t operation, const
sparse_matrix_t A, const struct matrix_descr descr, const MKL_Complex8 alpha, const
MKL_Complex8 *b, MKL_Complex8 *x, MKL_Complex8 *y);
sparse_status_t mkl_sparse_z_symgs_mv (const sparse_operation_t operation, const
sparse_matrix_t A, const struct matrix_descr descr, const MKL_Complex16 alpha, const
MKL_Complex16 *b, MKL_Complex16 *x, MKL_Complex16 *y);
Include Files
• mkl_spblas.h
Description
The mkl_sparse_?_symgs_mv routine performs this operation:
x0 := x*alpha;
(L + D)*x1 = b - U*x0;
(U + D)*x = b - L*x1;
y := A*x
where A = L + D + U.
NOTE
This routine is not supported for sparse matrices in BSR, COO, or CSC formats. It supports
only the CSR format. Additionally, only symmetric matrices are supported, so descr.type
must be SPARSE_MATRIX_TYPE_SYMMETRIC.
Input Parameters
NOTE
Transpose (SPARSE_OPERATION_TRANSPOSE) and conjugate
transpose (SPARSE_OPERATION_CONJUGATE_TRANSPOSE) are not
supported.
Output Parameters
Return Values
The function returns a value indicating whether the operation was successful or not, and why.
mkl_sparse_syrk
Computes the product of sparse matrix with its
transpose (or conjugate transpose) and stores the
result in a newly allocated sparse matrix.
Syntax
sparse_status_t mkl_sparse_syrk (const sparse_operation_t operation, const
sparse_matrix_t A, sparse_matrix_t *C);
Include Files
• mkl_spblas.h
Description
The mkl_sparse_syrk routine performs a sparse matrix-matrix operation which results in a sparse matrix C
that is either symmetric (real case) or Hermitian (complex case):
C := A*op(A)
or
C := op(A)*A
depending on the matrix modifier op, which can be the transpose for real matrices or the conjugate transpose
for complex matrices.
Here, A and C are sparse matrices.
NOTE
This routine is not supported for sparse matrices in COO or CSC formats. It supports
only CSR and BSR formats. Additionally, this routine supports only the sorted CSR and
sorted BSR formats for the input matrix. If data is unsorted, call the mkl_sparse_order
routine before either mkl_sparse_syrk or mkl_sparse_?_syrkd.
Input Parameters
Output Parameters
C Handle which contains the resulting sparse matrix. Only the upper-triangular
part of the matrix is computed.
Return Values
The function returns a value indicating whether the operation was successful or not, and why.
mkl_sparse_?_syrkd
Computes the product of sparse matrix with its
transpose (or conjugate transpose) and stores the
result as a dense matrix.
Syntax
sparse_status_t mkl_sparse_s_syrkd (sparse_operation_t operation, const sparse_matrix_t
A, float alpha, float beta, float *C, sparse_layout_t layout, MKL_INT ldc);
sparse_status_t mkl_sparse_d_syrkd (sparse_operation_t operation, const sparse_matrix_t
A, double alpha, double beta, double *C, sparse_layout_t layout, MKL_INT ldc);
sparse_status_t mkl_sparse_c_syrkd (sparse_operation_t operation, const sparse_matrix_t
A, const MKL_Complex8 alpha, MKL_Complex8 beta, MKL_Complex8 *C, sparse_layout_t
layout, MKL_INT ldc);
sparse_status_t mkl_sparse_z_syrkd (sparse_operation_t operation, const sparse_matrix_t
A, MKL_Complex16 alpha, MKL_Complex16 beta, MKL_Complex16 *C, sparse_layout_t layout,
const MKL_INT ldc);
Include Files
• mkl_spblas.h
Description
The mkl_sparse_?_syrkd routine performs a sparse matrix-matrix operation which results in a dense
matrix C that is either symmetric (real case) or Hermitian (complex case):
C := beta*C + alpha*A*op(A)
or
C := beta*C + alpha*op(A)*A
depending on the matrix modifier op which can be the transpose for real matrices or conjugate transpose for
complex matrices. Here, A is a sparse matrix and C is a dense matrix.
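For the real case C := beta*C + alpha*A*op(A) with op the transpose, each entry of A*A^T is a dot product of two sparse rows. The plain-C sketch below shows this and updates only the upper triangle of C. It is an illustration under stated assumptions, not the MKL implementation: sparse_row_dot and ref_csr_syrkd_upper are hypothetical names, A is assumed to be in zero-based CSR with sorted column indices, and C is assumed to be row-major.

```c
/* Hypothetical helper: dot product of two sparse rows of the same CSR
 * matrix, given as index ranges [pa, ea) and [pb, eb) with sorted columns. */
static double sparse_row_dot(const int *col_idx, const double *val,
                             int pa, int ea, int pb, int eb)
{
    double s = 0.0;
    while (pa < ea && pb < eb) {
        if (col_idx[pa] < col_idx[pb]) {
            ++pa;
        } else if (col_idx[pb] < col_idx[pa]) {
            ++pb;
        } else {
            s += val[pa] * val[pb];
            ++pa;
            ++pb;
        }
    }
    return s;
}

/* Hypothetical reference helper (not an MKL routine).
 * Updates the upper triangle of the m-by-m row-major matrix C:
 *   C(i,j) := beta*C(i,j) + alpha*(A*A^T)(i,j)  for j >= i.
 * Entries below the diagonal are left untouched. */
void ref_csr_syrkd_upper(int m, const int *row_ptr, const int *col_idx,
                         const double *val, double alpha, double beta,
                         double *C, int ldc)
{
    for (int i = 0; i < m; ++i) {
        for (int j = i; j < m; ++j) {
            double d = sparse_row_dot(col_idx, val,
                                      row_ptr[i], row_ptr[i + 1],
                                      row_ptr[j], row_ptr[j + 1]);
            C[(long)i * ldc + j] = beta * C[(long)i * ldc + j] + alpha * d;
        }
    }
}
```

The merge in sparse_row_dot relies on sorted column indices, which mirrors the sorted-input requirement stated for this routine.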
NOTE
This routine is not supported for sparse matrices in COO or CSC formats. It supports
only CSR and BSR formats. Additionally, this routine supports only the sorted CSR and
sorted BSR formats for the input matrix. If data is unsorted, call the mkl_sparse_order
routine before either mkl_sparse_syrk or mkl_sparse_?_syrkd.
Input Parameters
NOTE
Only the upper triangular part of matrix C is processed. Therefore, you must set real values
of alpha and beta for complex matrices in order to obtain a Hermitian matrix.
Output Parameters
C Resulting dense matrix. Only the upper triangular part of the matrix is
computed.
Return Values
The function returns a value indicating whether the operation was successful or not, and why.
mkl_sparse_?_dotmv
Computes a sparse matrix-vector product followed by
a dot product.
Syntax
sparse_status_t mkl_sparse_s_dotmv (const sparse_operation_t operation, const float
alpha, const sparse_matrix_t A, const struct matrix_descr descr, const float *x, const
float beta, float *y, float *d);
sparse_status_t mkl_sparse_d_dotmv (const sparse_operation_t operation, const double
alpha, const sparse_matrix_t A, const struct matrix_descr descr, const double *x, const
double beta, double *y, double *d);
sparse_status_t mkl_sparse_c_dotmv (const sparse_operation_t operation, const
MKL_Complex8 alpha, const sparse_matrix_t A, const struct matrix_descr descr, const
MKL_Complex8 *x, const MKL_Complex8 beta, MKL_Complex8 *y, MKL_Complex8 *d);
sparse_status_t mkl_sparse_z_dotmv (const sparse_operation_t operation, const
MKL_Complex16 alpha, const sparse_matrix_t A, const struct matrix_descr descr, const
MKL_Complex16 *x, const MKL_Complex16 beta, MKL_Complex16 *y, MKL_Complex16 *d);
Include Files
• mkl_spblas.h
Description
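The mkl_sparse_?_dotmv routine computes a sparse matrix-vector product and dot product:
y := alpha*op(A)*x + beta*y
d := sum over i of x(i)*y(i) for real data types, or of conj(x(i))*y(i) for complex data types
where alpha and beta are scalars, x and y are vectors, d is the resulting dot product, A is a sparse matrix,
and op is a matrix modifier for matrix A.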
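A fused matrix-vector product and dot product can be sketched in plain C as a single pass over the rows. This is not the MKL implementation: ref_csr_dotmv is a hypothetical name, and the sketch assumes the real, non-transpose case, a square matrix in zero-based CSR, and that the dot product is taken between x and the updated y.

```c
/* Hypothetical reference helper (not an MKL routine).
 * Computes y := alpha*A*x + beta*y and *d := sum_i x[i]*y[i] in one pass,
 * for an m-by-m real matrix A in zero-based CSR. */
void ref_csr_dotmv(int m, const int *row_ptr, const int *col_idx,
                   const double *val, double alpha, const double *x,
                   double beta, double *y, double *d)
{
    double dot = 0.0;
    for (int i = 0; i < m; ++i) {
        double sum = 0.0;
        for (int p = row_ptr[i]; p < row_ptr[i + 1]; ++p)
            sum += val[p] * x[col_idx[p]];
        y[i] = alpha * sum + beta * y[i];
        dot += x[i] * y[i];   /* uses the freshly updated y[i] */
    }
    *d = dot;
}
```

Fusing the two operations avoids a second pass over x and y, which is useful in iterative solvers such as conjugate gradients.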
NOTE
For sparse matrices in the BSR format, the supported combinations of
(indexing,block_layout) are:
• (SPARSE_INDEX_BASE_ZERO, SPARSE_LAYOUT_ROW_MAJOR)
• (SPARSE_INDEX_BASE_ONE, SPARSE_LAYOUT_COLUMN_MAJOR)
Input Parameters
SPARSE_FILL_MODE_UPPER    The upper triangular matrix part is processed.
sparse_diag_type_t diag - Specifies diagonal type for non-general
matrices:
Output Parameters
Return Values
The function returns a value indicating whether the operation was successful or not, and why.
mkl_sparse_?_sorv
Computes forward, backward sweeps or a symmetric
successive over-relaxation preconditioner operation.
Syntax
sparse_status_t mkl_sparse_s_sorv(
const sparse_sor_type_t type,
const struct matrix_descr descrA,
const sparse_matrix_t A,
float omega,
float alpha,
float* x,
float* b
);
sparse_status_t mkl_sparse_d_sorv(
const sparse_sor_type_t type,
const struct matrix_descr descrA,
const sparse_matrix_t A,
double omega,
double alpha,
double* x,
double* b
);
Include Files
• mkl_spblas.h
Description
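The mkl_sparse_?_sorv routine performs one of the following operations:
SPARSE_SOR_FORWARD: (omega*L + D)*x^1 = (D - omega*D - omega*U)*alpha*x^0 + omega*b
SPARSE_SOR_BACKWARD: (omega*U + D)*x^1 = (D - omega*D - omega*L)*alpha*x^0 + omega*b
SPARSE_SOR_SYMMETRIC: performs the forward and backward sweeps in sequence, resulting in a symmetric SOR
preconditioner.
where A = L + D + U, x^0 is the input vector x scaled by the input parameter alpha, and x^1 is the
output stored in vector x.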
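A forward SOR sweep can be sketched in plain C as an in-place row update. The sketch assumes the standard forward relation (omega*L + D)*x^1 = (D - omega*D - omega*U)*alpha*x^0 + omega*b, which is the textbook form and is an assumption here, not a statement about the library internals. The name ref_csr_sor_forward and its error convention are hypothetical, and the matrix is assumed to be a general matrix in zero-based CSR.

```c
/* Hypothetical reference helper (not an MKL routine).
 * One forward SOR sweep, in place on x:
 *   x := alpha*x, then for each row i in ascending order
 *   x[i] := (1 - omega)*x[i] + omega*(b[i] - sum_{j != i} A(i,j)*x[j]) / A(i,i).
 * Returns -1 if a diagonal entry is missing or zero. */
int ref_csr_sor_forward(int m, const int *row_ptr, const int *col_idx,
                        const double *val, double omega, double alpha,
                        double *x, const double *b)
{
    for (int i = 0; i < m; ++i)
        x[i] *= alpha;
    for (int i = 0; i < m; ++i) {
        double sum = b[i];
        double diag = 0.0;
        for (int p = row_ptr[i]; p < row_ptr[i + 1]; ++p) {
            int j = col_idx[p];
            if (j == i)
                diag = val[p];
            else
                sum -= val[p] * x[j];   /* j < i: updated, j > i: previous */
        }
        if (diag == 0.0)
            return -1;
        x[i] = (1.0 - omega) * x[i] + omega * sum / diag;
    }
    return 0;
}
```

With omega = 1 the update reduces to a Gauss-Seidel forward sweep.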
NOTE
Currently this routine only supports the following configuration:
• CSR format of the input matrix
• SPARSE_SOR_FORWARD operation
• General matrix (descr.type is SPARSE_MATRIX_TYPE_GENERAL) or symmetric matrix with full
portrait and unit diagonal (descr.type is SPARSE_MATRIX_TYPE_SYMMETRIC, descr.mode is
SPARSE_FILL_MODE_FULL, and descr.diag is SPARSE_DIAG_UNIT)
NOTE
Currently, this routine is optimized only for sequential threading execution mode.
Warning
Do not place a sorv call in a parallel section (for example, under #pragma omp parallel): the routine is
currently not thread-safe in this scenario. This limitation will be addressed in one of the upcoming releases.
Input Parameters
• SPARSE_FILL_MODE_LOWER
The lower triangular matrix part is processed.
• SPARSE_FILL_MODE_UPPER
The upper triangular matrix part is
processed.
• SPARSE_DIAG_NON_UNIT
Diagonal elements might not be equal to
one.
• SPARSE_DIAG_UNIT
Diagonal elements are equal to one.
alpha Parameter that could be used to normalize or set to zero the vector x that
holds the initial guess.
b Right-hand side.
Output Parameters
Return Values
The function returns a value indicating whether the operation was successful or not, and why.
BLAS-like Extensions
Intel® oneAPI Math Kernel Library provides C and Fortran routines to extend the functionality of the BLAS
routines. These include routines to compute vector products, matrix-vector products, and matrix-matrix
products.
Intel® oneAPI Math Kernel Library also provides routines to perform certain data manipulation, including
matrix in-place and out-of-place transposition operations combined with simple matrix arithmetic operations.
Transposition operations are Copy As Is, Conjugate transpose, Transpose, and Conjugate. Each routine adds
the possibility of scaling during the transposition operation by giving some alpha and/or beta parameters.
Each routine supports both row-major orderings and column-major orderings.
Table “BLAS-like Extensions” lists these routines.
The <?> symbol in the routine short names is a precision prefix that indicates the data type:
s float
d double
c MKL_Complex8
z MKL_Complex16
BLAS-like Extensions
Routine                          Data Types          Description
cblas_?axpby                     s, d, c, z          Scales two vectors, adds them to one another and stores
                                                     the result in the vector.
cblas_?dgmm_batch_strided,       s, d, c, z          Computes groups of diagonal matrix-general matrix
cblas_?dgmm_batch                                    products.
cblas_gemm_bf16bf16f32           bfloat16            Computes a matrix-matrix product with general matrices
                                                     of bfloat16 data type.
cblas_gemm_bf16bf16f32_compute   bfloat16            Computes a matrix-matrix product with general matrices
                                                     of bfloat16 data type where one or both input matrices
                                                     are stored in a packed data structure, and adds the
                                                     result to a scalar-matrix product.
cblas_gemm_f16f16f32_compute     half precision      Computes a matrix-matrix product with general matrices
                                                     of half precision data type where one or both input
                                                     matrices are stored in a packed data structure, and
                                                     adds the result to a scalar-matrix product.
cblas_gemm_*_pack                Integer, bfloat16   Packs the matrix into the buffer allocated previously.
cblas_?gemm_pack_get_size        h, s, d             Returns the number of bytes required to store the
                                                     packed matrix.
cblas_gemm_*_pack_get_size       Integer, bfloat16   Returns the number of bytes required to store the
                                                     packed matrix.
cblas_?gemv_batch_strided,       s, d, c, z          Computes groups of matrix-vector products using
cblas_?gemv_batch                                    general matrices.
cblas_?trsm_batch,               s, d, c, z          Solves a triangular matrix equation for a group of
cblas_?trsm_batch_strided                            matrices.
mkl_?imatcopy_batch_strided,     s, d, c, z          Computes groups of in-place matrix copy/transposition
mkl_?imatcopy_batch                                  with scaling using general matrices.
cblas_?axpy_batch
Computes a group of vector-scalar products added to
a vector.
Syntax
void cblas_saxpy_batch (const MKL_INT *n_array, const float *alpha_array, const float
**x_array, const MKL_INT *incx_array, float **y_array, const MKL_INT *incy_array, const
MKL_INT group_count, const MKL_INT *group_size_array);
void cblas_daxpy_batch (const MKL_INT *n_array, const double *alpha_array, const double
**x_array, const MKL_INT *incx_array, double **y_array, const MKL_INT *incy_array,
const MKL_INT group_count, const MKL_INT *group_size_array);
void cblas_caxpy_batch (const MKL_INT *n_array, const void *alpha_array, const void
**x_array, const MKL_INT *incx_array, void **y_array, const MKL_INT *incy_array, const
MKL_INT group_count, const MKL_INT *group_size_array);
void cblas_zaxpy_batch (const MKL_INT *n_array, const void *alpha_array, const void
**x_array, const MKL_INT *incx_array, void **y_array, const MKL_INT *incy_array, const
MKL_INT group_count, const MKL_INT *group_size_array);
Description
The cblas_?axpy_batch routines perform a series of scalar-vector products, each added to a vector. They are
similar to the cblas_?axpy routine counterparts, but the cblas_?axpy_batch routines perform vector
operations with a group of vectors. The groups contain vectors with the same parameters.
The operation is defined as
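idx = 0
for i = 0 … group_count - 1
    n, alpha, incx, incy and group_size at position i in n_array, alpha_array, incx_array,
    incy_array and group_size_array
    for j = 0 … group_size - 1
        x and y are vectors of size n at position idx in x_array and y_array
        y := alpha * x + y
        idx := idx + 1
    end for
end for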
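The grouped loop above can be sketched in plain C as follows. This is a reference loop for illustration, not the oneMKL implementation: the name ref_daxpy_batch is hypothetical, int stands in for MKL_INT, and only positive increments are handled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference loop mirroring the group semantics described for
 * cblas_daxpy_batch. Sketch limitation: positive increments only. */
void ref_daxpy_batch(const int *n_array, const double *alpha_array,
                     const double **x_array, const int *incx_array,
                     double **y_array, const int *incy_array,
                     int group_count, const int *group_size_array)
{
    size_t idx = 0;                       /* running index over all vectors */
    for (int i = 0; i < group_count; ++i) {
        /* Parameters shared by every vector in group i. */
        const int n = n_array[i];
        const double alpha = alpha_array[i];
        const int incx = incx_array[i];
        const int incy = incy_array[i];
        assert(n >= 0 && incx > 0 && incy > 0);

        for (int j = 0; j < group_size_array[i]; ++j) {
            const double *x = x_array[idx];
            double *y = y_array[idx];
            for (int e = 0; e < n; ++e)
                y[(size_t)e * incy] += alpha * x[(size_t)e * incx];
            ++idx;                        /* next vector pair, possibly next group */
        }
    }
}
```

Note that the per-group arrays (n_array, alpha_array, incx_array, incy_array) are indexed by the group number i, while the pointer arrays are indexed by the running counter idx.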
Input Parameters
incx_array Array of size group_count. For the group i, incx_i = incx_array[i] is the
stride of vector x.
incy_array Array of size group_count. For the group i, incy_i = incy_array[i] is the
stride of vector y.
Output Parameters
cblas_?axpy_batch_strided
Computes a group of vector-scalar products added to
a vector.
Syntax
void cblas_saxpy_batch_strided (const MKL_INT n, const float alpha, const float *x,
const MKL_INT incx, const MKL_INT stridex, float *y, const MKL_INT incy, const MKL_INT
stridey, const MKL_INT batch_size);
void cblas_daxpy_batch_strided (const MKL_INT n, const double alpha, const double *x,
const MKL_INT incx, const MKL_INT stridex, double *y, const MKL_INT incy, const MKL_INT
stridey, const MKL_INT batch_size);
void cblas_caxpy_batch_strided (const MKL_INT n, const void *alpha, const void *x, const
MKL_INT incx, const MKL_INT stridex, void *y, const MKL_INT incy, const MKL_INT
stridey, const MKL_INT batch_size);
void cblas_zaxpy_batch_strided (const MKL_INT n, const void *alpha, const void *x, const
MKL_INT incx, const MKL_INT stridex, void *y, const MKL_INT incy, const MKL_INT
stridey, const MKL_INT batch_size);
Include Files
• mkl.h
Description
The cblas_?axpy_batch_strided routines perform a series of scalar-vector products, each added to a vector.
They are similar to the cblas_?axpy routine counterparts, but the cblas_?axpy_batch_strided routines
perform vector operations with a group of vectors.
All vectors x (respectively, y) have the same parameters (size, increments) and are stored at a constant
stride stridex (respectively, stridey) from each other. The operation is defined as
For i = 0 … batch_size – 1
X and Y are vectors at offset i * stridex and i * stridey in x and y
Y = alpha * X + Y
end for
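A minimal plain-C sketch of the strided loop above, for illustration only. The name ref_daxpy_batch_strided is hypothetical, int stands in for MKL_INT, and the sketch assumes positive increments and non-negative strides.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference loop for the strided batch: vector i of the batch
 * starts at x + i*stridex and y + i*stridey. Not the oneMKL implementation. */
void ref_daxpy_batch_strided(int n, double alpha, const double *x, int incx,
                             int stridex, double *y, int incy, int stridey,
                             int batch_size)
{
    assert(n >= 0 && batch_size >= 0);
    assert(incx > 0 && incy > 0 && stridex >= 0 && stridey >= 0);
    for (int i = 0; i < batch_size; ++i) {
        const double *X = x + (size_t)i * stridex;
        double *Y = y + (size_t)i * stridey;
        for (int e = 0; e < n; ++e)
            Y[(size_t)e * incy] += alpha * X[(size_t)e * incx];
    }
}
```

In this sketch the last element read from x has index (batch_size - 1)*stridex + (n - 1)*incx, so a stride smaller than 1 + (n - 1)*incx makes consecutive vectors overlap. That is harmless for the read-only x here but would make the result order-dependent for y.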
Input Parameters
Output Parameters
cblas_?axpby
Scales two vectors, adds them to one another and
stores result in the vector.
Syntax
void cblas_saxpby (const MKL_INT n, const float a, const float *x, const MKL_INT incx,
const float b, float *y, const MKL_INT incy);
void cblas_daxpby (const MKL_INT n, const double a, const double *x, const MKL_INT
incx, const double b, double *y, const MKL_INT incy);
void cblas_caxpby (const MKL_INT n, const void *a, const void *x, const MKL_INT incx,
const void *b, void *y, const MKL_INT incy);
void cblas_zaxpby (const MKL_INT n, const void *a, const void *x, const MKL_INT incx,
const void *b, void *y, const MKL_INT incy);
Include Files
• mkl.h
Description
The ?axpby routines perform a vector-vector operation defined as
y := a*x + b*y
where:
a and b are scalars
x and y are vectors each with n elements.
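The complex variants take a and b by pointer rather than by value. The sketch below shows the arithmetic for that case in plain C. It is a hypothetical reference loop (ref_zaxpby is not a oneMKL function), it assumes each complex number is stored as two consecutive doubles (real part, then imaginary part), and it handles positive increments only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference for y := a*x + b*y with complex data.
 * a, b, x and y point to interleaved (real, imaginary) double pairs;
 * incx and incy count complex elements, not doubles. */
void ref_zaxpby(int n, const void *a, const void *x, int incx,
                const void *b, void *y, int incy)
{
    const double *pa = a;
    const double *pb = b;
    const double *px = x;
    double *py = y;
    assert(n >= 0 && incx > 0 && incy > 0);
    for (int e = 0; e < n; ++e) {
        const double *xe = px + 2 * (size_t)e * incx;
        double *ye = py + 2 * (size_t)e * incy;
        /* (a*x + b*y), expanded into real and imaginary parts */
        double re = pa[0] * xe[0] - pa[1] * xe[1] + pb[0] * ye[0] - pb[1] * ye[1];
        double im = pa[0] * xe[1] + pa[1] * xe[0] + pb[0] * ye[1] + pb[1] * ye[0];
        ye[0] = re;
        ye[1] = im;
    }
}
```

For example, with a = i, b = 2, x = (1, i) and y = (1 + i, 3), the sketch leaves y = (2 + 3i, 5).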
Input Parameters
Output Parameters
Example
For examples of routine usage, see these code examples in the Intel® oneAPI Math Kernel Library (oneMKL)
installation directory:
• cblas_saxpby: examples\cblas\source\cblas_saxpbyx.c
• cblas_daxpby: examples\cblas\source\cblas_daxpbyx.c
• cblas_caxpby: examples\cblas\source\cblas_caxpbyx.c
• cblas_zaxpby: examples\cblas\source\cblas_zaxpbyx.c
cblas_?copy_batch
Computes a group of vector copies.
Syntax
void cblas_scopy_batch (const MKL_INT *n_array, const float **x_array, const MKL_INT
*incx_array, float **y_array, const MKL_INT *incy_array, const MKL_INT group_count,
const MKL_INT *group_size_array);
void cblas_dcopy_batch (const MKL_INT *n_array, const double **x_array, const MKL_INT
*incx_array, double **y_array, const MKL_INT *incy_array, const MKL_INT group_count,
const MKL_INT *group_size_array);
void cblas_ccopy_batch (const MKL_INT *n_array, const void **x_array, const MKL_INT
*incx_array, void **y_array, const MKL_INT *incy_array, const MKL_INT group_count,
const MKL_INT *group_size_array);
void cblas_zcopy_batch (const MKL_INT *n_array, const void **x_array, const MKL_INT
*incx_array, void **y_array, const MKL_INT *incy_array, const MKL_INT group_count,
const MKL_INT *group_size_array);
Description
The cblas_?copy_batch routines perform a series of vector copies. They are similar to their cblas_?copy
routine counterparts, but the cblas_?copy_batch routines perform vector operations with a group of
vectors. Each group contains vectors with the same parameters (size and increment), while those
parameters may vary between groups.
The operation is defined as follows:
idx = 0
for i = 0 … group_count – 1
n, incx, incy and group_size at position i in n_array, incx_array, incy_array
and group_size_array
for j = 0 … group_size – 1
x and y are vectors of size n at position idx in x_array and y_array
y := x
idx := idx + 1
end for
end for
The number of entries in x_array and y_array is total_batch_count, which is the sum of all the
group_size entries.
Input Parameters
n_array Array of size group_count. For the group i, n_i = n_array[i] is the
number of elements in the vectors x and y.
x_array Array of size total_batch_count of pointers used to store x vectors.
The array allocated for the x vectors of the group i must be of size
at least (1 + (n_i - 1)*abs(incx_i)).
incx_array Array of size group_count. For the group i, incx_i = incx_array[i]
is the increment (or stride) between two consecutive elements of
the vector x.
y_array Array of size total_batch_count of pointers used to store the output
vectors y. The array allocated for the y vectors of the group i must
be of size at least (1 + (n_i - 1)*abs(incy_i)).
incy_array Array of size group_count. For the group i, incy_i = incy_array[i]
is the increment (or stride) between two consecutive elements of
the vector y.
group_count Number of groups. Must be at least 0.
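The two size rules above (the length of the pointer arrays and the minimum length of each vector buffer) are easy to get wrong when allocating. The helpers below are a small sketch of that arithmetic in plain C. Both function names are hypothetical, int stands in for MKL_INT, and returning 0 for n <= 0 is a convention of this sketch rather than something the text specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Number of entries needed in x_array and y_array:
 * the sum of all group sizes. Hypothetical helper. */
size_t total_batch_count(int group_count, const int *group_size_array)
{
    size_t total = 0;
    assert(group_count >= 0);
    for (int i = 0; i < group_count; ++i) {
        assert(group_size_array[i] >= 0);
        total += (size_t)group_size_array[i];
    }
    return total;
}

/* Minimum number of elements in one vector buffer:
 * 1 + (n - 1)*abs(inc), as stated for x_array and y_array. */
size_t min_vector_len(int n, int inc)
{
    if (n <= 0)
        return 0;                         /* sketch convention for empty vectors */
    return 1 + (size_t)(n - 1) * (size_t)abs(inc);
}
```

For instance, three groups of sizes 2, 1 and 3 need pointer arrays with 6 entries, and a vector with n = 4 and an increment of -2 needs a buffer of at least 7 elements.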
Output Parameters
cblas_?copy_batch_strided
Computes a group of vector copies.
Syntax
void cblas_scopy_batch_strided (const MKL_INT n, const float *x, const MKL_INT incx,
const MKL_INT stridex, float *y, const MKL_INT incy, const MKL_INT stridey, const
MKL_INT batch_size);
void cblas_dcopy_batch_strided (const MKL_INT n, const double *x, const MKL_INT incx,
const MKL_INT stridex, double *y, const MKL_INT incy, const MKL_INT stridey, const
MKL_INT batch_size);
void cblas_ccopy_batch_strided (const MKL_INT n, const void *x, const MKL_INT incx,
const MKL_INT stridex, void *y, const MKL_INT incy, const MKL_INT stridey, const
MKL_INT batch_size);
void cblas_zcopy_batch_strided (const MKL_INT n, const void *x, const MKL_INT incx,
const MKL_INT stridex, void *y, const MKL_INT incy, const MKL_INT stridey, const
MKL_INT batch_size);
Description
The cblas_?copy_batch_strided routines perform a series of vector copies. They are similar to their
cblas_?copy routine counterparts, but the cblas_?copy_batch_strided routines perform vector
operations with a group of vectors.
All vectors x and y have the same parameters (size, increments) and are stored at constant distance
stridex (respectively, stridey) from each other. The operation is defined as follows:
for i = 0 … batch_size – 1
X and Y are vectors at offset i * stridex and i * stridey in x and y
Y = X
end for
Input Parameters
Output Parameters
cblas_?gemmt
Computes a matrix-matrix product with general
matrices but updates only the upper or lower
triangular part of the result matrix.
Syntax
void cblas_sgemmt (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE transa, const CBLAS_TRANSPOSE transb, const MKL_INT n, const MKL_INT k,
const float alpha, const float *a, const MKL_INT lda, const float *b, const MKL_INT
ldb, const float beta, float *c, const MKL_INT ldc);
void cblas_dgemmt (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE transa, const CBLAS_TRANSPOSE transb, const MKL_INT n, const MKL_INT k,
const double alpha, const double *a, const MKL_INT lda, const double *b, const MKL_INT
ldb, const double beta, double *c, const MKL_INT ldc);
void cblas_cgemmt (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE transa, const CBLAS_TRANSPOSE transb, const MKL_INT n, const MKL_INT k,
const void *alpha, const void *a, const MKL_INT lda, const void *b, const MKL_INT ldb,
const void *beta, void *c, const MKL_INT ldc);
void cblas_zgemmt (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE transa, const CBLAS_TRANSPOSE transb, const MKL_INT n, const MKL_INT k,
const void *alpha, const void *a, const MKL_INT lda, const void *b, const MKL_INT ldb,
const void *beta, void *c, const MKL_INT ldc);
Include Files
• mkl.h
Description
The ?gemmt routines compute a scalar-matrix-matrix product with general matrices and add the result to the
upper or lower part of a scalar-matrix product. These routines are similar to the ?gemm routines, but they
only access and update a triangular part of the square result matrix (see Application Notes below).
The operation is defined as
C := alpha*op(A)*op(B) + beta*C,
where:
op(X) is one of op(X) = X, or op(X) = X^T, or op(X) = X^H,
alpha and beta are scalars,
A, B and C are matrices:
op(A) is an n-by-k matrix,
op(B) is a k-by-n matrix,
C is an n-by-n upper or lower triangular matrix.
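To make the "only one triangle is touched" behavior concrete, here is a plain-C reference loop for the simplest case: row-major storage, op(A) = A and op(B) = B, real double data. The function and enum names are hypothetical, int stands in for MKL_INT, and this is a sketch of the semantics, not the oneMKL implementation.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { REF_UPPER, REF_LOWER } ref_uplo;  /* hypothetical stand-in for CBLAS_UPLO */

/* Hypothetical reference: C := alpha*A*B + beta*C, restricted to the upper
 * or lower triangle of the n-by-n matrix C. Row-major, no transposition. */
void ref_dgemmt_rowmajor_nn(ref_uplo uplo, int n, int k, double alpha,
                            const double *a, int lda,
                            const double *b, int ldb,
                            double beta, double *c, int ldc)
{
    assert(n >= 0 && k >= 0);
    for (int i = 0; i < n; ++i) {
        /* Column range of row i that lies in the selected triangle. */
        const int j_begin = (uplo == REF_UPPER) ? i : 0;
        const int j_end = (uplo == REF_UPPER) ? n : i + 1;
        for (int j = j_begin; j < j_end; ++j) {
            double acc = 0.0;
            for (int p = 0; p < k; ++p)
                acc += a[(size_t)i * lda + p] * b[(size_t)p * ldb + j];
            double *cij = &c[(size_t)i * ldc + j];
            /* When beta is zero, c is not read, so it need not be set on input. */
            *cij = (beta == 0.0) ? alpha * acc : alpha * acc + beta * *cij;
        }
    }
}
```

Entries outside the selected triangle are neither read nor written, so whatever the caller stored there is preserved.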
Input Parameters
uplo Specifies whether the upper or lower triangular part of the array c is used.
If uplo = 'CblasUpper', then the upper triangular part of the array c is
used. If uplo = 'CblasLower', then the lower triangular part of the array
c is used.
n Specifies the order of the matrix C. The value of n must be at least zero.
k Specifies the number of columns of the matrix op(A) and the number of
rows of the matrix op(B). The value of k must be at least zero.
a Array whose size and required contents depend on Layout and transa:
• Layout='CblasColMajor', transa='CblasNoTrans': Array, size lda * k. Before entry, the leading
n-by-k part of the array a must contain the matrix A.
• Layout='CblasColMajor', transa='CblasTrans' or 'CblasConjTrans': Array, size lda * n. Before
entry, the leading k-by-n part of the array a must contain the matrix A.
• Layout='CblasRowMajor', transa='CblasNoTrans': Array, size lda * n. Before entry, the leading
k-by-n part of the array a must contain the matrix A.
• Layout='CblasRowMajor', transa='CblasTrans' or 'CblasConjTrans': Array, size lda * k. Before
entry, the leading n-by-k part of the array a must contain the matrix A.
lda The minimum value depends on Layout and transa:
• Layout='CblasColMajor', transa='CblasNoTrans': lda must be at least max(1, n).
• Layout='CblasColMajor', transa='CblasTrans' or 'CblasConjTrans': lda must be at least max(1, k).
• Layout='CblasRowMajor', transa='CblasNoTrans': lda must be at least max(1, k).
• Layout='CblasRowMajor', transa='CblasTrans' or 'CblasConjTrans': lda must be at least max(1, n).
b Array whose size and required contents depend on Layout and transb:
• Layout='CblasColMajor', transb='CblasNoTrans': Array, size ldb * n. Before entry, the leading
k-by-n part of the array b must contain the matrix B.
• Layout='CblasColMajor', transb='CblasTrans' or 'CblasConjTrans': Array, size ldb * k. Before
entry, the leading n-by-k part of the array b must contain the matrix B.
• Layout='CblasRowMajor', transb='CblasNoTrans': Array, size ldb * k. Before entry, the leading
n-by-k part of the array b must contain the matrix B.
• Layout='CblasRowMajor', transb='CblasTrans' or 'CblasConjTrans': Array, size ldb * n. Before
entry, the leading k-by-n part of the array b must contain the matrix B.
ldb The minimum value depends on Layout and transb:
• Layout='CblasColMajor', transb='CblasNoTrans': ldb must be at least max(1, k).
• Layout='CblasColMajor', transb='CblasTrans' or 'CblasConjTrans': ldb must be at least max(1, n).
• Layout='CblasRowMajor', transb='CblasNoTrans': ldb must be at least max(1, n).
• Layout='CblasRowMajor', transb='CblasTrans' or 'CblasConjTrans': ldb must be at least max(1, k).
beta Specifies the scalar beta. When beta is equal to zero, then c need not be
set on input.
Output Parameters
Application Notes
These routines only access and update the upper or lower triangular part of the result matrix. This can be
useful when the result is known to be symmetric; for example, when computing a product of the form
C := alpha*B*S*B^T + beta*C, where S and C are symmetric matrices and B is a general matrix. In this case,
first compute A := B*S (which can be done using the corresponding ?symm routine), then compute
C := alpha*A*B^T + beta*C using the ?gemmt routine.
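The symmetry that this use case relies on can be checked in one line. The derivation below assumes real matrices with S and C symmetric, as in the text; for complex Hermitian data the same argument would use the conjugate transpose instead.

```latex
\begin{aligned}
(\alpha B S B^{T} + \beta C)^{T}
  &= \alpha (B^{T})^{T} S^{T} B^{T} + \beta C^{T} \\
  &= \alpha B S B^{T} + \beta C
  \qquad \text{since } S^{T} = S,\; C^{T} = C .
\end{aligned}
```

Because the result equals its own transpose, computing only one triangle loses no information.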
cblas_?gemm3m
Computes a scalar-matrix-matrix product using matrix
multiplications and adds the result to a scalar-matrix
product.
Syntax
void cblas_cgemm3m (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE transa, const
CBLAS_TRANSPOSE transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const void
*alpha, const void *a, const MKL_INT lda, const void *b, const MKL_INT ldb, const void
*beta, void *c, const MKL_INT ldc);
void cblas_zgemm3m (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE transa, const
CBLAS_TRANSPOSE transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const void
*alpha, const void *a, const MKL_INT lda, const void *b, const MKL_INT ldb, const void
*beta, void *c, const MKL_INT ldc);
Include Files
• mkl.h
Description
The ?gemm3m routines perform a matrix-matrix operation with general complex matrices. These routines are
similar to the ?gemm routines, but they use fewer matrix multiplication operations (see Application Notes
below).
The operation is defined as
C := alpha*op(A)*op(B) + beta*C,
where:
op(x) is one of op(x) = x, or op(x) = x', or op(x) = conjg(x'),
alpha and beta are scalars,
A, B and C are matrices:
op(A) is an m-by-k matrix,
op(B) is a k-by-n matrix,
C is an m-by-n matrix.
Input Parameters
if transa=CblasNoTrans, then op(A) = A;
m Specifies the number of rows of the matrix op(A) and of the matrix C. The
value of m must be at least zero.
n Specifies the number of columns of the matrix op(B) and the number of
columns of the matrix C.
The value of n must be at least zero.
k Specifies the number of columns of the matrix op(A) and the number of
rows of the matrix op(B).
a
transa=CblasNoTrans transa=CblasTrans or
transa=CblasConjTrans
transa=CblasNoTrans transa=CblasTrans or
transa=CblasConjTrans
b
transb=CblasNoTrans transb=CblasTrans or
transb=CblasConjTrans
transb=CblasNoTrans transb=CblasTrans or
transb=CblasConjTrans
c
Layout = CblasColMajor: Array, size ldc by n. Before entry, the leading m-by-n part of the array c
must contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry.
Output Parameters
Application Notes
These routines perform a complex matrix multiplication by forming the real and imaginary parts of the input
matrices. This uses three real matrix multiplications and five real matrix additions instead of the conventional
four real matrix multiplications and two real matrix additions. The use of three real matrix multiplications
reduces the time spent in matrix operations by 25%, resulting in significant savings in compute time for
large matrices.
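The operation count quoted above (three real multiplications and five real additions) can be seen in a short sketch. The code below shows one common way to arrange the computation for square row-major matrices split into real parts (a1, b1) and imaginary parts (a2, b2). The function names and the workspace convention are hypothetical, and the text does not state that oneMKL uses exactly this arrangement.

```c
#include <assert.h>
#include <stddef.h>

/* Real n-by-n row-major product: c = a * b. */
static void matmul(int n, const double *a, const double *b, double *c)
{
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            double acc = 0.0;
            for (int p = 0; p < n; ++p)
                acc += a[(size_t)i * n + p] * b[(size_t)p * n + j];
            c[(size_t)i * n + j] = acc;
        }
}

/* Hypothetical sketch: (A1 + i*A2)(B1 + i*B2) = C1 + i*C2 using three real
 * matrix products. work must hold 5*n*n doubles. */
void complex_matmul_3m(int n, const double *a1, const double *a2,
                       const double *b1, const double *b2,
                       double *c1, double *c2, double *work)
{
    assert(n >= 0);
    const size_t nn = (size_t)n * (size_t)n;
    double *t1 = work, *t2 = work + nn, *t3 = work + 2 * nn;
    double *sa = work + 3 * nn, *sb = work + 4 * nn;

    matmul(n, a1, b1, t1);                /* multiplication 1: A1*B1 */
    matmul(n, a2, b2, t2);                /* multiplication 2: A2*B2 */
    for (size_t e = 0; e < nn; ++e) {
        sa[e] = a1[e] + a2[e];            /* addition 1 */
        sb[e] = b1[e] + b2[e];            /* addition 2 */
    }
    matmul(n, sa, sb, t3);                /* multiplication 3: (A1+A2)*(B1+B2) */
    for (size_t e = 0; e < nn; ++e) {
        c1[e] = t1[e] - t2[e];            /* addition 3: real part */
        c2[e] = t3[e] - t1[e] - t2[e];    /* additions 4 and 5: imaginary part */
    }
}
```

The imaginary part is recovered from (A1 + A2)(B1 + B2) - A1*B1 - A2*B2 = A1*B2 + A2*B1, which is where the extra subtractions (and the slightly weaker error bound for the imaginary part) come from.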
If the errors in the floating point calculations satisfy the following conditions:
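$fl(x \mathbin{op} y) = (x \mathbin{op} y)(1+\delta),\ |\delta| \le u,\ op = \times, /$, and $fl(x \pm y) = x(1+\alpha) \pm y(1+\beta),\ |\alpha|, |\beta| \le u$,
then for an n-by-n matrix $\hat{C} = fl(C_1 + iC_2) = fl((A_1 + iA_2)(B_1 + iB_2)) = \hat{C}_1 + i\hat{C}_2$, the following bounds are
satisfied:
$\|\hat{C}_1 - C_1\| \le 2(n+1)\,u\,\|A\|_\infty \|B\|_\infty + O(u^2)$,
$\|\hat{C}_2 - C_2\| \le 4(n+4)\,u\,\|A\|_\infty \|B\|_\infty + O(u^2)$,
where $\|A\|_\infty = \max(\|A_1\|_\infty, \|A_2\|_\infty)$ and $\|B\|_\infty = \max(\|B_1\|_\infty, \|B_2\|_\infty)$.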
cblas_?gemm_batch
Computes scalar-matrix-matrix products and adds the
results to scalar matrix products for groups of general
matrices.
Syntax
void cblas_sgemm_batch (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE* transa_array,
const CBLAS_TRANSPOSE* transb_array, const MKL_INT* m_array, const MKL_INT* n_array,
const MKL_INT* k_array, const float* alpha_array, const float **a_array, const MKL_INT*
lda_array, const float **b_array, const MKL_INT* ldb_array, const float* beta_array,
float **c_array, const MKL_INT* ldc_array, const MKL_INT group_count, const MKL_INT*
group_size);
void cblas_dgemm_batch (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE* transa_array,
const CBLAS_TRANSPOSE* transb_array, const MKL_INT* m_array, const MKL_INT* n_array,
const MKL_INT* k_array, const double* alpha_array, const double **a_array, const
MKL_INT* lda_array, const double **b_array, const MKL_INT* ldb_array, const double*
beta_array, double **c_array, const MKL_INT* ldc_array, const MKL_INT group_count,
const MKL_INT* group_size);
void cblas_cgemm_batch (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE* transa_array,
const CBLAS_TRANSPOSE* transb_array, const MKL_INT* m_array, const MKL_INT* n_array,
const MKL_INT* k_array, const void *alpha_array, const void **a_array, const MKL_INT*
lda_array, const void **b_array, const MKL_INT* ldb_array, const void *beta_array, void
**c_array, const MKL_INT* ldc_array, const MKL_INT group_count, const MKL_INT*
group_size);
void cblas_zgemm_batch (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE* transa_array,
const CBLAS_TRANSPOSE* transb_array, const MKL_INT* m_array, const MKL_INT* n_array,
const MKL_INT* k_array, const void *alpha_array, const void **a_array, const MKL_INT*
lda_array, const void **b_array, const MKL_INT* ldb_array, const void *beta_array, void
**c_array, const MKL_INT* ldc_array, const MKL_INT group_count, const MKL_INT*
group_size);
Include Files
• mkl.h
Description
The ?gemm_batch routines perform a series of matrix-matrix operations with general matrices. They are
similar to the ?gemm routine counterparts, but the ?gemm_batch routines perform matrix-matrix operations
with groups of matrices, processing a number of groups at once. The groups contain matrices with the same
parameters.
The operation is defined as
idx = 0
for i = 0..group_count - 1
alpha and beta in alpha_array[i] and beta_array[i]
for j = 0..group_size[i] - 1
A, B, and C matrix in a_array[idx], b_array[idx], and c_array[idx]
C := alpha*op(A)*op(B) + beta*C,
idx = idx + 1
end for
end for
where:
op(X) is one of op(X) = X, or op(X) = X^T, or op(X) = X^H,
alpha and beta are scalar elements of alpha_array and beta_array,
A, B and C are matrices such that for m, n, and k which are elements of m_array, n_array, and k_array:
op(A) is an m-by-k matrix,
op(B) is a k-by-n matrix,
C is an m-by-n matrix.
See also gemm for a detailed description of multiplication for general matrices and ?gemm3m_batch,
BLAS-like extension routines for similar matrix-matrix operations.
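The grouped operation defined above can be written out as a plain-C reference loop. The sketch below covers only the column-major case with op(A) = A and op(B) = B. The function name is hypothetical, int stands in for MKL_INT, and the code illustrates the indexing rather than how oneMKL implements or schedules the batch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference loop for a grouped batch of column-major,
 * non-transposed products C := alpha*A*B + beta*C. */
void ref_dgemm_batch_nn(const int *m_array, const int *n_array,
                        const int *k_array, const double *alpha_array,
                        const double **a_array, const int *lda_array,
                        const double **b_array, const int *ldb_array,
                        const double *beta_array, double **c_array,
                        const int *ldc_array, int group_count,
                        const int *group_size)
{
    size_t idx = 0;                       /* running index over all matrices */
    assert(group_count >= 0);
    for (int i = 0; i < group_count; ++i) {
        /* Parameters shared by every matrix triple in group i. */
        const int m = m_array[i], n = n_array[i], k = k_array[i];
        const size_t lda = (size_t)lda_array[i];
        const size_t ldb = (size_t)ldb_array[i];
        const size_t ldc = (size_t)ldc_array[i];
        const double alpha = alpha_array[i], beta = beta_array[i];

        for (int j = 0; j < group_size[i]; ++j, ++idx) {
            const double *A = a_array[idx];
            const double *B = b_array[idx];
            double *C = c_array[idx];
            for (int col = 0; col < n; ++col)
                for (int row = 0; row < m; ++row) {
                    double acc = 0.0;
                    for (int p = 0; p < k; ++p)
                        acc += A[row + p * lda] * B[p + col * ldb];
                    double *cij = &C[row + col * ldc];
                    /* beta == 0: C is not read, so it need not be set on input. */
                    *cij = (beta == 0.0) ? alpha * acc : alpha * acc + beta * *cij;
                }
        }
    }
}
```

As with the other group APIs, the per-group arrays have group_count entries, while a_array, b_array and c_array have one entry per matrix, that is, the sum of all group_size entries.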
NOTE
Error checking is not performed for oneMKL Windows* single dynamic libraries for
the ?gemm_batch routines.
Input Parameters
if transa_i = CblasConjTrans, then op(A) = A^H.
m_array Array of size group_count. For the group i, m_i = m_array[i] specifies the
number of rows of the matrix op(A) and of the matrix C.
n_array Array of size group_count. For the group i, n_i = n_array[i] specifies the
number of columns of the matrix op(B) and the number of columns of the
matrix C.
The value of each element of n_array must be at least zero.
k_array Array of size group_count. For the group i, k_i = k_array[i] specifies the
number of columns of the matrix op(A) and the number of rows of the
matrix op(B).
alpha_array Array of size group_count. For the group i, alpha_array[i] specifies the
scalar alpha_i.
transai=CblasNoTrans transai=CblasTrans or
transai=CblasConjTrans
transbi=CblasNoTrans transbi=CblasTrans or
transbi=CblasConjTrans
beta_array Array of size group_count. For the group i, beta_array[i] specifies the
scalar beta_i.
When beta_i is equal to zero, then C matrices in group i need not be set on
input.
Output Parameters
cblas_?gemm_batch_strided
Computes groups of matrix-matrix products with
general matrices.
Syntax
void cblas_sgemm_batch_strided (const CBLAS_LAYOUT layout, const CBLAS_TRANSPOSE
transa, const CBLAS_TRANSPOSE transb, const MKL_INT m, const MKL_INT n, const MKL_INT
k, const float alpha, const float *a, const MKL_INT lda, const MKL_INT stridea, const
float *b, const MKL_INT ldb, const MKL_INT strideb, const float beta, float *c, const
MKL_INT ldc, const MKL_INT stridec, const MKL_INT batch_size);
void cblas_dgemm_batch_strided (const CBLAS_LAYOUT layout, const CBLAS_TRANSPOSE
transa, const CBLAS_TRANSPOSE transb, const MKL_INT m, const MKL_INT n, const MKL_INT
k, const double alpha, const double *a, const MKL_INT lda, const MKL_INT stridea, const
double *b, const MKL_INT ldb, const MKL_INT strideb, const double beta, double *c,
const MKL_INT ldc, const MKL_INT stridec, const MKL_INT batch_size);
void cblas_cgemm_batch_strided (const CBLAS_LAYOUT layout, const CBLAS_TRANSPOSE
transa, const CBLAS_TRANSPOSE transb, const MKL_INT m, const MKL_INT n, const MKL_INT
k, const void *alpha, const void *a, const MKL_INT lda, const MKL_INT stridea, const
void *b, const MKL_INT ldb, const MKL_INT strideb, const void *beta, void *c, const
MKL_INT ldc, const MKL_INT stridec, const MKL_INT batch_size);
void cblas_zgemm_batch_strided (const CBLAS_LAYOUT layout, const CBLAS_TRANSPOSE
transa, const CBLAS_TRANSPOSE transb, const MKL_INT m, const MKL_INT n, const MKL_INT
k, const void *alpha, const void *a, const MKL_INT lda, const MKL_INT stridea, const
void *b, const MKL_INT ldb, const MKL_INT strideb, const void *beta, void *c, const
MKL_INT ldc, const MKL_INT stridec, const MKL_INT batch_size);
Include Files
• mkl.h
Description
The cblas_?gemm_batch_strided routines perform a series of matrix-matrix operations with general
matrices. They are similar to the cblas_?gemm routine counterparts, but the cblas_?gemm_batch_strided
routines perform matrix-matrix operations with groups of matrices. The groups contain matrices with the
same parameters.
All matrices a (respectively, b or c) have the same parameters (size, leading dimension, transpose operation,
alpha, beta scaling) and are stored at a constant stride stridea (respectively, strideb or stridec) from each
other. The operation is defined as
For i = 0 … batch_size – 1
A_i, B_i and C_i are matrices at offset i * stridea, i * strideb and i * stridec in a, b and c
C_i = alpha * op(A_i) * op(B_i) + beta * C_i
end for
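In the strided form a single base pointer per operand replaces the pointer arrays, and matrix i is located by pointer arithmetic. The plain-C sketch below shows this for row-major storage without transposition. The function name is hypothetical, int stands in for MKL_INT, and non-negative strides are assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference loop: batch_size products C_i := alpha*A_i*B_i + beta*C_i,
 * row-major, no transposition, matrices spaced stridea/strideb/stridec apart. */
void ref_dgemm_batch_strided_nn(int m, int n, int k, double alpha,
                                const double *a, int lda, int stridea,
                                const double *b, int ldb, int strideb,
                                double beta, double *c, int ldc, int stridec,
                                int batch_size)
{
    assert(m >= 0 && n >= 0 && k >= 0 && batch_size >= 0);
    assert(stridea >= 0 && strideb >= 0 && stridec >= 0);
    for (int i = 0; i < batch_size; ++i) {
        const double *Ai = a + (size_t)i * stridea;
        const double *Bi = b + (size_t)i * strideb;
        double *Ci = c + (size_t)i * stridec;
        for (int row = 0; row < m; ++row)
            for (int col = 0; col < n; ++col) {
                double acc = 0.0;
                for (int p = 0; p < k; ++p)
                    acc += Ai[(size_t)row * lda + p] * Bi[(size_t)p * ldb + col];
                double *cij = &Ci[(size_t)row * ldc + col];
                *cij = (beta == 0.0) ? alpha * acc : alpha * acc + beta * *cij;
            }
    }
}
```

A typical layout stores the batch contiguously, so that for row-major m-by-k matrices with lda = k the stride is stridea = m * k. Larger strides simply leave unused gaps between matrices.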
Input Parameters
k Number of columns of the op(A) matrix and number of rows of the op(B)
matrix. Must be at least 0.
transa=CblasNoTrans transa=CblasTrans or
CblasConjTrans
transa=CblasNoTrans transa=CblasTrans
or CblasConjTrans
transa=CblasNoTrans transa=CblasTrans or
CblasConjTrans
transb=CblasNoTrans transb=CblasTrans or
CblasConjTrans
transb=CblasNoTrans transb=CblasTrans
or CblasConjTrans
transa=CblasNoTrans transa=CblasTrans or
CblasConjTrans
Output Parameters
cblas_?gemm3m_batch_strided
Computes groups of matrix-matrix products with
general matrices.
Syntax
void cblas_cgemm3m_batch_strided (const CBLAS_LAYOUT layout, const CBLAS_TRANSPOSE
transa, const CBLAS_TRANSPOSE transb, const MKL_INT m, const MKL_INT n, const MKL_INT
k, const void *alpha, const void *a, const MKL_INT lda, const MKL_INT stridea, const
void *b, const MKL_INT ldb, const MKL_INT strideb, const void *beta, void *c, const
MKL_INT ldc, const MKL_INT stridec, const MKL_INT batch_size);
void cblas_zgemm3m_batch_strided (const CBLAS_LAYOUT layout, const CBLAS_TRANSPOSE
transa, const CBLAS_TRANSPOSE transb, const MKL_INT m, const MKL_INT n, const MKL_INT
k, const void *alpha, const void *a, const MKL_INT lda, const MKL_INT stridea, const
void *b, const MKL_INT ldb, const MKL_INT strideb, const void *beta, void *c, const
MKL_INT ldc, const MKL_INT stridec, const MKL_INT batch_size);
Include Files
• mkl.h
Description
The cblas_?gemm3m_batch_strided routines perform a series of matrix-matrix operations with general
matrices. They are similar to the cblas_?gemm routine counterparts, but the
cblas_?gemm3m_batch_strided routines perform matrix-matrix operations with groups of matrices. The
groups contain matrices with the same parameters.
All matrices a (respectively, b or c) have the same parameters (size, leading dimension, transpose operation,
alpha, beta scaling) and are stored at a constant stride stridea (respectively, strideb or stridec) from each
other. The operation is defined as
For i = 0 … batch_size – 1
A_i, B_i and C_i are matrices at offset i * stridea, i * strideb and i * stridec in a, b and c
C_i = alpha * op(A_i) * op(B_i) + beta * C_i
end for
The cblas_?gemm3m_batch_strided routines use fewer matrix multiplications than the cblas_?gemm
routines, as described in the Application Notes below.
Input Parameters
k Number of columns of the op(A) matrix and number of rows of the op(B)
matrix. Must be at least 0.
transa=CblasNoTrans transa=CblasTrans or
CblasConjTrans
transa=CblasNoTrans transa=CblasTrans
or CblasConjTrans
transa=CblasNoTrans transa=CblasTrans or
CblasConjTrans
transb=CblasNoTrans transb=CblasTrans or
CblasConjTrans
transb=CblasNoTrans transb=CblasTrans
or CblasConjTrans
transa=CblasNoTrans transa=CblasTrans or
CblasConjTrans
Output Parameters
Application Notes
These routines perform a complex matrix multiplication by forming the real and imaginary parts of the input
matrices. This uses three real matrix multiplications and five real matrix additions instead of the conventional
four real matrix multiplications and two real matrix additions. The use of three real matrix multiplications
reduces the time spent in matrix operations by 25%, resulting in significant savings in compute time for
large matrices.
If the errors in the floating point calculations satisfy the following conditions:
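$fl(x \mathbin{op} y) = (x \mathbin{op} y)(1+\delta),\ |\delta| \le u,\ op = \times, /$, and $fl(x \pm y) = x(1+\alpha) \pm y(1+\beta),\ |\alpha|, |\beta| \le u$,
then for an n-by-n matrix $\hat{C} = fl(C_1 + iC_2) = fl((A_1 + iA_2)(B_1 + iB_2)) = \hat{C}_1 + i\hat{C}_2$, the following bounds are
satisfied:
$\|\hat{C}_1 - C_1\| \le 2(n+1)\,u\,\|A\|_\infty \|B\|_\infty + O(u^2)$,
$\|\hat{C}_2 - C_2\| \le 4(n+4)\,u\,\|A\|_\infty \|B\|_\infty + O(u^2)$,
where $\|A\|_\infty = \max(\|A_1\|_\infty, \|A_2\|_\infty)$ and $\|B\|_\infty = \max(\|B_1\|_\infty, \|B_2\|_\infty)$.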
cblas_?gemm3m_batch
Computes scalar-matrix-matrix products and adds the
results to scalar matrix products for groups of general
matrices.
Syntax
void cblas_cgemm3m_batch (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE*
transa_array, const CBLAS_TRANSPOSE* transb_array, const MKL_INT* m_array, const
MKL_INT* n_array, const MKL_INT* k_array, const void *alpha_array, const void
**a_array, const MKL_INT* lda_array, const void **b_array, const MKL_INT* ldb_array,
const void *beta_array, void **c_array, const MKL_INT* ldc_array, const MKL_INT
group_count, const MKL_INT* group_size);
void cblas_zgemm3m_batch (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE*
transa_array, const CBLAS_TRANSPOSE* transb_array, const MKL_INT* m_array, const
MKL_INT* n_array, const MKL_INT* k_array, const void *alpha_array, const void
**a_array, const MKL_INT* lda_array, const void **b_array, const MKL_INT* ldb_array,
const void *beta_array, void **c_array, const MKL_INT* ldc_array, const MKL_INT
group_count, const MKL_INT* group_size);
Include Files
• mkl.h
Description
The ?gemm3m_batch routines perform a series of matrix-matrix operations with general matrices. They are
similar to the ?gemm3m routine counterparts, but the ?gemm3m_batch routines perform matrix-matrix
operations with groups of matrices, processing a number of groups at once. The groups contain matrices with
the same parameters. The ?gemm3m_batch routines use fewer matrix multiplications than the ?gemm_batch
routines, as described in the Application Notes.
The operation is defined as
idx = 0
for i = 0..group_count - 1
    alpha and beta in alpha_array[i] and beta_array[i]
    for j = 0..group_size[i] - 1
        A, B, and C matrix in a_array[idx], b_array[idx], and c_array[idx]
        C := alpha*op(A)*op(B) + beta*C,
        idx = idx + 1
    end for
end for
where:
op(X) is one of op(X) = X, op(X) = X^T, or op(X) = X^H,
alpha and beta are scalar elements of alpha_array and beta_array,
A, B and C are matrices such that for m, n, and k which are elements of m_array, n_array, and k_array:
op(A) is an m-by-k matrix, op(B) is a k-by-n matrix, and C is an m-by-n matrix.
See also gemm for a detailed description of multiplication for general matrices and gemm_batch, BLAS-like
extension routines for similar matrix-matrix operations.
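The group loop defined above can be written out as a plain C reference. This is a sketch only: it does not call oneMKL, it covers just the column-major, non-transposed case, and it uses a naive triple loop rather than the three-multiplication scheme. The type cplx is a hypothetical stand-in for MKL_Complex16, size_t replaces MKL_INT, and the name gemm_batch_ref is an assumption made for the illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double re, im; } cplx; /* stand-in for MKL_Complex16 */

static cplx cx_add(cplx x, cplx y)
{
    cplx r = { x.re + y.re, x.im + y.im };
    return r;
}

static cplx cx_mul(cplx x, cplx y)
{
    cplx r = { x.re * y.re - x.im * y.im, x.re * y.im + x.im * y.re };
    return r;
}

/* C := alpha*A*B + beta*C for every matrix of every group (column-major). */
void gemm_batch_ref(const size_t *m_array, const size_t *n_array,
                    const size_t *k_array, const cplx *alpha_array,
                    const cplx *const *a_array, const size_t *lda_array,
                    const cplx *const *b_array, const size_t *ldb_array,
                    const cplx *beta_array, cplx *const *c_array,
                    const size_t *ldc_array, size_t group_count,
                    const size_t *group_size)
{
    size_t idx = 0; /* running index into a_array, b_array, c_array */
    for (size_t i = 0; i < group_count; ++i) {
        size_t m = m_array[i], n = n_array[i], k = k_array[i];
        size_t lda = lda_array[i], ldb = ldb_array[i], ldc = ldc_array[i];
        assert(lda >= m && ldb >= k && ldc >= m);
        for (size_t j = 0; j < group_size[i]; ++j, ++idx) {
            const cplx *a = a_array[idx];
            const cplx *b = b_array[idx];
            cplx *c = c_array[idx];
            for (size_t col = 0; col < n; ++col)
                for (size_t row = 0; row < m; ++row) {
                    cplx acc = { 0.0, 0.0 };
                    for (size_t p = 0; p < k; ++p)
                        acc = cx_add(acc, cx_mul(a[row + p * lda],
                                                 b[p + col * ldb]));
                    c[row + col * ldc] =
                        cx_add(cx_mul(alpha_array[i], acc),
                               cx_mul(beta_array[i], c[row + col * ldc]));
                }
        }
    }
}
```

Note that the per-group arrays (sizes, leading dimensions, scalars) are indexed by the group number i, while the pointer arrays are indexed by the running matrix index idx.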
NOTE
Error checking is not performed for Intel® oneAPI Math Kernel Library (oneMKL) Windows*
single dynamic libraries for the ?gemm3m_batch routines.
Input Parameters
m_array Array of size group_count. For the group i, m_i = m_array[i] specifies the
number of rows of the matrix op(A) and of the matrix C.
n_array Array of size group_count. For the group i, n_i = n_array[i] specifies the
number of columns of the matrix op(B) and the number of columns of the
matrix C.
The value of each element of n_array must be at least zero.
k_array Array of size group_count. For the group i, k_i = k_array[i] specifies the
number of columns of the matrix op(A) and the number of rows of the
matrix op(B).
alpha_array Array of size group_count. For the group i, alpha_array[i] specifies the
scalar alpha_i.
transa_i=CblasNoTrans transa_i=CblasTrans or
transa_i=CblasConjTrans
transb_i=CblasNoTrans transb_i=CblasTrans or
transb_i=CblasConjTrans
When beta_i is equal to zero, the C matrices in group i need not be set on
input.
Output Parameters
Application Notes
These routines perform a complex matrix multiplication by forming the real and imaginary parts of the input
matrices. This uses three real matrix multiplications and five real matrix additions instead of the conventional
four real matrix multiplications and two real matrix additions. The use of three real matrix multiplications
reduces the time spent in matrix operations by 25%, resulting in significant savings in compute time for
large matrices.
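If the errors in the floating point calculations satisfy the following conditions:
$\operatorname{fl}(x \mathbin{op} y) = (x \mathbin{op} y)(1+\delta),\ |\delta| \le u,\ op = \times, /$, and $\operatorname{fl}(x \pm y) = x(1+\alpha) \pm y(1+\beta),\ |\alpha|, |\beta| \le u$
then for an n-by-n matrix $\hat{C} = \operatorname{fl}(C_1 + iC_2) = \operatorname{fl}((A_1 + iA_2)(B_1 + iB_2)) = \hat{C}_1 + i\hat{C}_2$, the following bounds are
satisfied:
$\|\hat{C}_1 - C_1\| \le 2(n+1)\,u\,\|A\|_\infty \|B\|_\infty + O(u^2)$,
$\|\hat{C}_2 - C_2\| \le 4(n+4)\,u\,\|A\|_\infty \|B\|_\infty + O(u^2)$,
where $\|A\|_\infty = \max(\|A_1\|_\infty, \|A_2\|_\infty)$, and $\|B\|_\infty = \max(\|B_1\|_\infty, \|B_2\|_\infty)$.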
cblas_?trsm_batch
Solves a triangular matrix equation for a group of
matrices.
Syntax
void cblas_strsm_batch (const CBLAS_LAYOUT Layout, const CBLAS_SIDE *Side_Array, const
CBLAS_UPLO *Uplo_Array, const CBLAS_TRANSPOSE *TransA_Array, const CBLAS_DIAG
*Diag_Array, const MKL_INT *M_Array, const MKL_INT *N_Array, const float *alpha_Array,
const float * *A_Array, const MKL_INT *lda_Array, float * *B_Array, const MKL_INT
*ldb_Array, const MKL_INT group_count, const MKL_INT *group_size );
void cblas_dtrsm_batch (const CBLAS_LAYOUT Layout, const CBLAS_SIDE *Side_Array, const
CBLAS_UPLO *Uplo_Array, const CBLAS_TRANSPOSE *Transa_Array, const CBLAS_DIAG
*Diag_Array, const MKL_INT *M_Array, const MKL_INT *N_Array, const double *alpha_Array,
const double * *A_Array, const MKL_INT *lda_Array, double * *B_Array, const MKL_INT
*ldb_Array, const MKL_INT group_count, const MKL_INT *group_size );
void cblas_ctrsm_batch (const CBLAS_LAYOUT Layout, const CBLAS_SIDE *Side_Array, const
CBLAS_UPLO *Uplo_Array, const CBLAS_TRANSPOSE *Transa_Array, const CBLAS_DIAG
*Diag_Array, const MKL_INT *M_Array, const MKL_INT *N_Array, const void *alpha_Array,
const void * *A_Array, const MKL_INT *lda_Array, void * *B_Array, const MKL_INT
*ldb_Array, const MKL_INT group_count, const MKL_INT *group_size );
void cblas_ztrsm_batch (const CBLAS_LAYOUT Layout, const CBLAS_SIDE *Side_Array, const
CBLAS_UPLO *Uplo_Array, const CBLAS_TRANSPOSE *Transa_Array, const CBLAS_DIAG
*Diag_Array, const MKL_INT *M_Array, const MKL_INT *N_Array, const void *alpha_Array,
const void * *A_Array, const MKL_INT *lda_Array, void * *B_Array, const MKL_INT
*ldb_Array, const MKL_INT group_count, const MKL_INT *group_size );
Include Files
• mkl.h
Description
The ?trsm_batch routines solve a series of matrix equations. They are similar to the ?trsm routines except
that they operate on groups of matrices which have the same parameters. The ?trsm_batch routines
process a number of groups at once.
idx = 0
for i = 0..group_count - 1
    alpha in alpha_array[i]
    for j = 0..group_size[i] - 1
        A and B matrix in a_array[idx] and b_array[idx]
        Solve op(A)*X = alpha*B
        or
        Solve X*op(A) = alpha*B
        idx = idx + 1
    end for
end for
where:
alpha is a scalar element of alpha_array,
X and B are m-by-n matrices for m and n which are elements of m_array and n_array, respectively,
A and B represent matrices stored at addresses pointed to by a_array and b_array, respectively. There are
total_batch_count entries in each of a_array and b_array, where total_batch_count is the sum of all the
group_size entries.
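The relationship between group_size, total_batch_count, and the position of each group inside a_array and b_array can be sketched with two small helpers. These are illustrative only and are not part of oneMKL; the names total_batch_count and group_first_index are hypothetical, and size_t is used in place of MKL_INT.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of the group_size entries: the number of pointers expected in
   each of a_array and b_array. */
size_t total_batch_count(const size_t *group_size, size_t group_count)
{
    size_t total = 0;
    for (size_t i = 0; i < group_count; ++i)
        total += group_size[i];
    return total;
}

/* Index in a_array / b_array of the first matrix that belongs to group i,
   that is, group_size[0] + ... + group_size[i - 1]. */
size_t group_first_index(const size_t *group_size, size_t group_count,
                         size_t i)
{
    assert(i < group_count);
    return total_batch_count(group_size, i);
}
```

With group sizes 2, 3, and 1, the pointer arrays hold six entries, and the groups start at indices 0, 2, and 5.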
Input Parameters
if transai=CblasTrans;
b_array Array, size total_batch_count, of pointers to arrays used to store B
matrices.
For group i, 0 ≤ i ≤ group_count - 1, b is any of the group_size[i] arrays
starting with b_array[group_size[0] + group_size[1] + ... +
group_size[i - 1]]:
For Layout = CblasColMajor: before entry, the leading m_i-by-n_i part of
the array b must contain the matrix B.
For Layout = CblasRowMajor: before entry, the leading n_i-by-m_i part of
the array b must contain the matrix B.
Output Parameters
cblas_?trsm_batch_strided
Solves groups of triangular matrix equations.
Syntax
void cblas_strsm_batch_strided(const CBLAS_LAYOUT layout, const CBLAS_SIDE side, const
CBLAS_UPLO uplo, const CBLAS_TRANSPOSE transa, const CBLAS_DIAG diag, const MKL_INT m,
const MKL_INT n, const float alpha, const float *a, const MKL_INT lda, const MKL_INT
stridea, float *b, const MKL_INT ldb, const MKL_INT strideb, const MKL_INT batch_size);
void cblas_dtrsm_batch_strided(const CBLAS_LAYOUT layout, const CBLAS_SIDE side, const
CBLAS_UPLO uplo, const CBLAS_TRANSPOSE transa, const CBLAS_DIAG diag, const MKL_INT m,
const MKL_INT n, const double alpha, const double *a, const MKL_INT lda, const MKL_INT
stridea, double *b, const MKL_INT ldb, const MKL_INT strideb, const MKL_INT
batch_size);
void cblas_ctrsm_batch_strided(const CBLAS_LAYOUT layout, const CBLAS_SIDE side, const
CBLAS_UPLO uplo, const CBLAS_TRANSPOSE transa, const CBLAS_DIAG diag, const MKL_INT m,
const MKL_INT n, const void *alpha, const void *a, const MKL_INT lda, const MKL_INT
stridea, void *b, const MKL_INT ldb, const MKL_INT strideb, const MKL_INT batch_size);
void cblas_ztrsm_batch_strided(const CBLAS_LAYOUT layout, const CBLAS_SIDE side, const
CBLAS_UPLO uplo, const CBLAS_TRANSPOSE transa, const CBLAS_DIAG diag, const MKL_INT m,
const MKL_INT n, const void *alpha, const void *a, const MKL_INT lda, const MKL_INT
stridea, void *b, const MKL_INT ldb, const MKL_INT strideb, const MKL_INT batch_size);
Include Files
• mkl.h
Description
The cblas_?trsm_batch_strided routines solve a series of triangular matrix equations. They are similar to
the cblas_?trsm routines, but solve the equations for a batch of matrices. All A matrices have the same
parameters (size, leading dimension, side, uplo, diag, transpose operation) and are stored at a constant
stride, stridea, from each other. Similarly, all B matrices have the same parameters (size, leading
dimension, alpha scaling) and are stored at a constant stride, strideb, from each other.
The operation is defined as
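As an illustration of the strided layout, the following plain C sketch solves A_i*X_i = alpha*B_i for each matrix pair, where A_i starts at offset i * stridea in a and B_i starts at offset i * strideb in b. It is a reference under stated assumptions, not the library implementation: it handles only column-major storage with side = left, uplo = lower, no transposition, and a non-unit diagonal, it uses real double data, and the name trsm_batch_strided_ref is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Forward substitution on each (A_i, B_i) pair; X_i overwrites B_i. */
void trsm_batch_strided_ref(size_t m, size_t n, double alpha,
                            const double *a, size_t lda, size_t stridea,
                            double *b, size_t ldb, size_t strideb,
                            size_t batch_size)
{
    assert(lda >= m && ldb >= m);
    for (size_t i = 0; i < batch_size; ++i) {
        const double *ai = a + i * stridea; /* lower triangular A_i */
        double *bi = b + i * strideb;       /* right-hand sides B_i */
        for (size_t col = 0; col < n; ++col) {
            double *x = bi + col * ldb;
            for (size_t r = 0; r < m; ++r) {
                double s = alpha * x[r];
                /* Subtract contributions of the already solved entries. */
                for (size_t p = 0; p < r; ++p)
                    s -= ai[r + p * lda] * x[p];
                x[r] = s / ai[r + r * lda];
            }
        }
    }
}
```

The strictly upper triangular part of each A_i is never read, which matches the convention that it is not referenced when uplo selects the lower triangle.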
Input Parameters
side Specifies whether op(A) appears on the left or right of X in the equation.
Before entry with uplo = CblasLower, the lower triangular part of the array A
must contain the lower triangular matrix, and the strictly upper triangular
part of A is not referenced.
When diag = CblasUnit, the diagonal elements of A are not referenced
either, but are assumed to be unity.
lda Specifies the leading dimension of the A matrices. When side = CblasLeft,
lda must be at least max(1, m); when side = CblasRight,
lda must be at least max(1, n).
Output Parameters
mkl_?imatcopy
Performs scaling and in-place transposition/copying of
matrices.
Syntax
void mkl_simatcopy (const char ordering, const char trans, size_t rows, size_t cols,
const float alpha, float * AB, size_t lda, size_t ldb);
void mkl_dimatcopy (const char ordering, const char trans, size_t rows, size_t cols,
const double alpha, double * AB, size_t lda, size_t ldb);
void mkl_cimatcopy (const char ordering, const char trans, size_t rows, size_t cols,
const MKL_Complex8 alpha, MKL_Complex8 * AB, size_t lda, size_t ldb);
void mkl_zimatcopy (const char ordering, const char trans, size_t rows, size_t cols,
const MKL_Complex16 alpha, MKL_Complex16 * AB, size_t lda, size_t ldb);
Include Files
• mkl.h
Description
The mkl_?imatcopy routine performs scaling and in-place transposition/copying of matrices. A transposition
operation can be a normal matrix copy, a transposition, a conjugate transposition, or just a conjugation. The
operation is defined as follows:
AB := alpha*op(AB).
NOTE
Different arrays must not overlap.
Input Parameters
If the data is real, then trans = 'R' is the same as trans = 'N', and
trans = 'C' is the same as trans = 'T'.
ab Array.
lda Distance between the first elements in adjacent columns (in the case of the
column-major order) or rows (in the case of the row-major order) in the
source matrix; measured in the number of elements.
This parameter must be at least rows if ordering = 'C' or 'c', and
max(1,cols) otherwise.
ldb Distance between the first elements in adjacent columns (in the case of the
column-major order) or rows (in the case of the row-major order) in the
destination matrix; measured in the number of elements.
To determine the minimum value of ldb on output, consider the following
guideline:
If ordering = 'C' or 'c', then
Output Parameters
ab Array.
Contains the matrix AB.
Application Notes
For threading to be active in mkl_?imatcopy, the pointer AB must be aligned on the 64-byte boundary. This
requirement can be met by allocating AB with mkl_malloc.
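The alignment requirement can be checked before calling the routine. The sketch below uses the C11 function aligned_alloc as a stand-in for mkl_malloc so that it compiles without the library; the helper names alloc_matrix_64 and is_aligned_64 are hypothetical. Memory obtained this way is released with free, whereas memory obtained from mkl_malloc is released with mkl_free.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate room for count floats on a 64-byte boundary. */
float *alloc_matrix_64(size_t count)
{
    assert(count <= SIZE_MAX / sizeof(float)); /* guard against overflow */
    size_t bytes = count * sizeof(float);
    /* aligned_alloc expects the size to be a multiple of the alignment. */
    size_t padded = (bytes + 63) / 64 * 64;
    if (padded == 0)
        padded = 64;
    return aligned_alloc(64, padded);
}

/* Nonzero when p lies on a 64-byte boundary. */
int is_aligned_64(const void *p)
{
    return ((uintptr_t)p % 64) == 0;
}
```

A caller could test is_aligned_64(AB) once after allocation to confirm that the buffer meets the stated condition.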
Interfaces
mkl_?imatcopy_batch
Computes a group of in-place scaled matrix copy or
transposition operations on general matrices.
Syntax
void mkl_simatcopy_batch (char layout, const char * trans_array, const size_t *
rows_array, const size_t * cols_array, const float * alpha_array, float ** ab_array,
const size_t * lda_array, const size_t * ldb_array, size_t group_count, const size_t *
group_size);
void mkl_dimatcopy_batch (char layout, const char * trans_array, const size_t *
rows_array, const size_t * cols_array, const double * alpha_array, double ** ab_array,
const size_t * lda_array, const size_t * ldb_array, size_t group_count, const size_t *
group_size);
void mkl_cimatcopy_batch (char layout, const char * trans_array, const size_t *
rows_array, const size_t * cols_array, const MKL_Complex8 * alpha_array, MKL_Complex8 **
ab_array, const size_t * lda_array, const size_t * ldb_array, size_t group_count, const
size_t * group_size);
void mkl_zimatcopy_batch (char layout, const char * trans_array, const size_t *
rows_array, const size_t * cols_array, const MKL_Complex16 * alpha_array, MKL_Complex16
** ab_array, const size_t * lda_array, const size_t * ldb_array, size_t group_count,
const size_t * group_size);
Description
The mkl_?imatcopy_batch routines perform a series of in-place scaled matrix copies or transpositions. They
are similar to the mkl_?imatcopy routines, but operate on groups of matrices. All matrices in a group share
the same parameters (matrix size, leading dimension, and scaling parameter). A single call to
mkl_?imatcopy_batch can process multiple groups, and different groups can have different parameters,
unlike the related mkl_?imatcopy_batch_strided routines.
idx = 0
for i = 0..group_count - 1
    m in rows_array[i], n in cols_array[i], and alpha in alpha_array[i]
    for j = 0..group_size[i] - 1
        AB matrices in ab_array[idx]
        AB := alpha*op(AB)
        idx = idx + 1
    end for
end for
Where op(X) is one of op(X)=X, op(X)=X', op(X)=conjg(X'), or op(X)=conjg(X). On entry, AB is an
m-by-n matrix such that m and n are elements of rows_array and cols_array.
AB represents a matrix stored at addresses pointed to by ab_array. The number of entries in ab_array is
total_batch_count = the sum of all of the group_size entries.
Input Parameters
rows_array Array of size group_count. Specifies the number of rows of the input
matrix AB. The value of each element must be at least zero.
cols_array Array of size group_count. Specifies the number of columns of the input
matrix AB. The value of each element must be at least zero.
lda_array Array of size group_count. The leading dimension of the input matrix AB.
It must be positive and at least m if column major layout is used or at least
n if row major layout is used.
ldb_array Array of size group_count. The leading dimension of the output matrix AB.
It must be positive and at least
m if column major layout is used and op(AB) = AB or conjg(AB)
m if row major layout is used and op(AB) = AB' or conjg(AB')
n otherwise
Output Parameters
mkl_?imatcopy_batch_strided
Computes a group of in-place scaled matrix copy or
transposition using general matrices.
Syntax
void mkl_simatcopy_batch_strided (const char layout, const char trans, size_t row,
size_t col, const float alpha, float * ab, size_t lda, size_t ldb, size_t stride, size_t
batch_size);
void mkl_dimatcopy_batch_strided (const char layout, const char trans, size_t row,
size_t col, const double alpha, double * ab, size_t lda, size_t ldb, size_t stride,
size_t batch_size);
void mkl_cimatcopy_batch_strided (const char layout, const char trans, size_t row,
size_t col, MKL_Complex8 alpha, MKL_Complex8 * ab, size_t lda, size_t ldb, size_t
stride, size_t batch_size);
void mkl_zimatcopy_batch_strided (const char layout, const char trans, size_t row,
size_t col, MKL_Complex16 alpha, MKL_Complex16 * ab, size_t lda, size_t ldb, size_t
stride, size_t batch_size);
Description
The mkl_?imatcopy_batch_strided routines perform a series of in-place scaled matrix copies or
transpositions. They are similar to the mkl_?imatcopy routines, but operate on a batch of matrices.
All matrices AB have the same parameters (size, transposition operation, and so on) and are stored at a
constant stride from each other. The operation is defined as
for i = 0 … batch_size – 1
    AB is a matrix at offset i * stride in ab
    AB = alpha * op(AB)
end for
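The loop above can be illustrated with a plain C reference for the simplest in-place case. This sketch assumes square n-by-n real matrices, for which the input and output leading dimensions coincide, and supports only trans = 'N' (scale) and trans = 'T' (transpose and scale). It does not call oneMKL, and the name imatcopy_batch_strided_ref is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* AB_i := alpha * op(AB_i) for square matrices stored stride apart. */
void imatcopy_batch_strided_ref(char trans, size_t n, double alpha,
                                double *ab, size_t ld, size_t stride,
                                size_t batch_size)
{
    int transpose = (trans == 'T' || trans == 't');
    assert(ld >= n);
    for (size_t i = 0; i < batch_size; ++i) {
        double *m = ab + i * stride; /* matrix i of the batch */
        for (size_t c = 0; c < n; ++c)
            for (size_t r = 0; r <= c; ++r) {
                /* Read the mirrored pair before writing either entry. */
                double upper = m[r + c * ld];
                double lower = m[c + r * ld];
                if (transpose) {
                    double t = upper;
                    upper = lower;
                    lower = t;
                }
                m[r + c * ld] = alpha * upper;
                m[c + r * ld] = alpha * lower;
            }
    }
}
```

Rectangular matrices need a more involved in-place permutation because the output layout differs from the input layout, which is why the routine takes separate lda and ldb arguments.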
Input Parameters
row Specifies the number of rows of the matrices AB. The value of row must be
at least zero.
col Specifies the number of columns of the matrices AB. The value of col must
be at least zero.
ab Array holding all the input matrices AB. Must be of size at least batch_size
* stride.
lda The leading dimension of the input matrices AB. It must be positive and at
least row if column major layout is used or at least col if row major layout
is used.
ldb The leading dimension of the output matrices AB. It must be positive and at
least
row if column major layout is used and op(AB) = AB or conjg(AB)
row if row major layout is used and op(AB) = AB' or conjg(AB')
col otherwise
Output Parameters
mkl_?omatadd_batch_strided
Computes a group of out-of-place scaled matrix
additions using general matrices.
Syntax
void mkl_somatadd_batch_strided(char ordering, char transa, char transb, size_t rows,
size_t cols, float alpha, const float * A, size_t lda, size_t stridea, float beta,
const float * B, size_t ldb, size_t strideb, float * C, size_t ldc, size_t stridec,
size_t batch_size);
void mkl_domatadd_batch_strided(char ordering, char transa, char transb, size_t rows,
size_t cols, double alpha, const double * A, size_t lda, size_t stridea, double beta,
const double * B, size_t ldb, size_t strideb, double * C, size_t ldc, size_t stridec,
size_t batch_size);
void mkl_comatadd_batch_strided(char ordering, char transa, char transb, size_t rows,
size_t cols, MKL_Complex8 alpha, const MKL_Complex8 * A, size_t lda, size_t stridea,
MKL_Complex8 beta, const MKL_Complex8 * B, size_t ldb, size_t strideb, MKL_Complex8 *
C, size_t ldc, size_t stridec, size_t batch_size);
void mkl_zomatadd_batch_strided(char ordering, char transa, char transb, size_t rows,
size_t cols, MKL_Complex16 alpha, const MKL_Complex16 * A, size_t lda, size_t stridea,
MKL_Complex16 beta, const MKL_Complex16 * B, size_t ldb, size_t strideb, MKL_Complex16
* C, size_t ldc, size_t stridec, size_t batch_size);
Description
The mkl_?omatadd_batch_strided routines perform a series of scaled matrix additions. They are similar to
the mkl_?omatadd routines, but the mkl_?omatadd_batch_strided routines perform matrix operations with a
group of matrices.
The matrices A, B, and C are stored at a constant stride from each other in memory, given by the parameters
stridea, strideb, and stridec. The operation is defined as:
for i = 0 … batch_size – 1
    A is a matrix at offset i * stridea in the array a
    B is a matrix at offset i * strideb in the array b
    C is a matrix at offset i * stridec in the array c
    C = alpha * op(A) + beta * op(B)
end for
where:
In general, the a, b, and c arrays must not overlap in memory, with the exception of the following in-place
operations:
• a and c can point to the same memory if transa is non-transpose and all the A matrices within a have
the same parameters as all the respective C matrices within c.
• b and c can point to the same memory if transb is non-transpose and all the B matrices within b have
the same parameters as all the respective C matrices within c.
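The strided addition and the permitted in-place use can be sketched in plain C. The reference below covers only the column-major case with no transposition of either operand and real double data; it does not call oneMKL, and the name omatadd_batch_strided_nn is hypothetical. Because each output element depends only on the input elements at the same position, passing the same array as a and c is safe here when the leading dimensions and strides match.

```c
#include <assert.h>
#include <stddef.h>

/* C_i = alpha*A_i + beta*B_i, column-major, no transposition. */
void omatadd_batch_strided_nn(size_t rows, size_t cols,
                              double alpha, const double *a, size_t lda,
                              size_t stridea,
                              double beta, const double *b, size_t ldb,
                              size_t strideb,
                              double *c, size_t ldc, size_t stridec,
                              size_t batch_size)
{
    assert(lda >= rows && ldb >= rows && ldc >= rows);
    for (size_t i = 0; i < batch_size; ++i) {
        const double *ai = a + i * stridea;
        const double *bi = b + i * strideb;
        double *ci = c + i * stridec;
        for (size_t col = 0; col < cols; ++col)
            for (size_t row = 0; row < rows; ++row)
                ci[row + col * ldc] = alpha * ai[row + col * lda]
                                    + beta * bi[row + col * ldb];
    }
}
```

Calling the function with c equal to a, alpha = 1, and beta = 1 accumulates B into A in place, which is the first of the two exceptions listed above.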
Input Parameters
cols Number of columns for the result matrix C. Must be at least zero.
Output Parameters
mkl_?omatcopy
Performs scaling and out-of-place transposition/copying
of matrices.
Syntax
void mkl_somatcopy (char ordering, char trans, size_t rows, size_t cols, const float
alpha, const float * A, size_t lda, float * B, size_t ldb);
void mkl_domatcopy (char ordering, char trans, size_t rows, size_t cols, const double
alpha, const double * A, size_t lda, double * B, size_t ldb);
void mkl_comatcopy (char ordering, char trans, size_t rows, size_t cols, const
MKL_Complex8 alpha, const MKL_Complex8 * A, size_t lda, MKL_Complex8 * B, size_t ldb);
void mkl_zomatcopy (char ordering, char trans, size_t rows, size_t cols, const
MKL_Complex16 alpha, const MKL_Complex16 * A, size_t lda, MKL_Complex16 * B, size_t
ldb);
Include Files
• mkl.h
Description
NOTE
Different arrays must not overlap.
Input Parameters
If the data is real, then trans = 'R' is the same as trans = 'N', and
trans = 'C' is the same as trans = 'T'.
a Input array.
If ordering = 'R' or 'r', the size of a is lda*rows.
lda If ordering = 'R' or 'r', lda represents the number of elements in array
a between adjacent rows of matrix A; lda must be at least equal to the
number of columns of matrix A.
If ordering = 'C' or 'c', lda represents the number of elements in array
a between adjacent columns of matrix A; lda must be at least equal to the
number of rows in matrix A.
b Output array.
If ordering = 'R' or 'r';
ldb If ordering = 'R' or 'r', ldb represents the number of elements in array
b between adjacent rows of matrix B.
• If trans = 'T' or 't' or 'C' or 'c', ldb must be at least equal to
rows.
• If trans = 'N' or 'n' or 'R' or 'r', ldb must be at least equal to
cols.
If ordering = 'C' or 'c', ldb represents the number of elements in array
b between adjacent columns of matrix B.
• If trans = 'T' or 't' or 'C' or 'c', ldb must be at least equal to
cols.
• If trans = 'N' or 'n' or 'R' or 'r', ldb must be at least equal to
rows.
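The four ldb cases above reduce to one rule: ldb must cover the extent of the destination matrix along its contiguous direction. A small helper makes this explicit. It is an illustrative sketch rather than a oneMKL function, and the names omatcopy_min_ldb and is_transposing are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Nonzero for the trans values that swap rows and columns. */
static int is_transposing(char trans)
{
    return trans == 'T' || trans == 't' || trans == 'C' || trans == 'c';
}

/* Smallest ldb accepted for a rows-by-cols source matrix A. */
size_t omatcopy_min_ldb(char ordering, char trans, size_t rows, size_t cols)
{
    int row_major = (ordering == 'R' || ordering == 'r');
    assert(row_major || ordering == 'C' || ordering == 'c');
    if (row_major) /* ldb spans one row of B */
        return is_transposing(trans) ? rows : cols;
    /* column major: ldb spans one column of B */
    return is_transposing(trans) ? cols : rows;
}
```

For a 3-by-5 source matrix, the helper returns 3 for row-major transposition and 5 for row-major copy, and the reverse for column-major storage.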
Output Parameters
b Output array.
Contains the destination matrix.
Interfaces
mkl_?omatcopy_batch
Computes a group of out-of-place scaled matrix copy
or transposition operations on general matrices.
Syntax
void mkl_somatcopy_batch (char layout, const char * trans_array, const size_t *
rows_array, const size_t * cols_array, const float * alpha_array, float ** A_array,
const size_t * lda_array, float ** B_array, const size_t * ldb_array, size_t
group_count, const size_t * group_size);
void mkl_domatcopy_batch (char layout, const char * trans_array, const size_t *
rows_array, const size_t * cols_array, const double * alpha_array, double ** A_array,
const size_t * lda_array, double ** B_array, const size_t * ldb_array, size_t
group_count, const size_t * group_size);
void mkl_comatcopy_batch (char layout, const char * trans_array, const size_t *
rows_array, const size_t * cols_array, const MKL_Complex8 * alpha_array, MKL_Complex8 **
A_array, const size_t * lda_array, MKL_Complex8 ** B_array, const size_t * ldb_array,
size_t group_count, const size_t * group_size);
void mkl_zomatcopy_batch (char layout, const char * trans_array, const size_t *
rows_array, const size_t * cols_array, const MKL_Complex16 * alpha_array, MKL_Complex16
** A_array, const size_t * lda_array, MKL_Complex16 ** B_array, const size_t *
ldb_array, size_t group_count, const size_t * group_size);
Description
The mkl_?omatcopy_batch routines perform a series of out-of-place scaled matrix copies or transpositions.
They are similar to the mkl_?omatcopy routines, but operate on groups of matrices. All matrices in a group
share the same parameters (matrix size, leading dimension, and scaling parameter). A single call to
mkl_?omatcopy_batch can process multiple groups, and different groups can have different parameters,
unlike the related mkl_?omatcopy_batch_strided routines.
The operation is defined as
idx = 0
for i = 0..group_count - 1
    m in rows_array[i], n in cols_array[i], and alpha in alpha_array[i]
    for j = 0..group_size[i] - 1
        A and B matrices in A_array[idx] and B_array[idx], respectively
        B := alpha*op(A)
        idx = idx + 1
    end for
end for
Where op(X) is one of op(X)=X, op(X)=X', op(X)=conjg(X'), or op(X)=conjg(X). A is an m-by-n matrix
such that m and n are elements of rows_array and cols_array.
A and B represent matrices stored at addresses pointed to by A_array and B_array. The number of entries in
A_array and B_array is total_batch_count = the sum of all of the group_size entries.
Input Parameters
rows_array Array of size group_count. Specifies the number of rows of the matrix A.
The value of each element must be at least zero.
cols_array Array of size group_count. Specifies the number of columns of the matrix
A. The value of each element must be at least zero.
lda_array Array of size group_count. The leading dimension of the input matrix A. It
must be positive and at least m if column major layout is used or at least n
if row major layout is used.
ldb_array Array of size group_count. The leading dimension of the output matrix B.
It must be positive and at least
m if column major layout is used and op(A) = A or conjg(A)
m if row major layout is used and op(A) = A' or conjg(A')
n otherwise
Output Parameters
mkl_?omatcopy_batch_strided
Computes a group of out-of-place scaled matrix copy
or transposition using general matrices.
Syntax
void mkl_somatcopy_batch_strided (const char layout, const char trans, size_t row,
size_t col, const float alpha, const float * a, size_t lda, size_t stridea, float * b,
size_t ldb, size_t strideb, size_t batch_size);
void mkl_domatcopy_batch_strided (const char layout, const char trans, size_t row,
size_t col, const double alpha, const double * a, size_t lda, size_t stridea, double *
b, size_t ldb, size_t strideb, size_t batch_size);
void mkl_comatcopy_batch_strided (const char layout, const char trans, size_t row,
size_t col, const MKL_Complex8 alpha, const MKL_Complex8 * a, size_t lda, size_t
stridea, MKL_Complex8 * b, size_t ldb, size_t strideb, size_t batch_size);
void mkl_zomatcopy_batch_strided (const char layout, const char trans, size_t row,
size_t col, const MKL_Complex16 alpha, const MKL_Complex16 * a, size_t lda, size_t
stridea, MKL_Complex16 * b, size_t ldb, size_t strideb, size_t batch_size);
Description
The mkl_?omatcopy_batch_strided routines perform a series of out-of-place scaled matrix copies or
transpositions. They are similar to the mkl_?omatcopy routines, but operate on a batch of matrices.
All matrices a and b have the same parameters (size, transposition operation, and so on) and are stored at
constant strides from each other, given by stridea and strideb respectively. The operation is defined as
for i = 0 … batch_size – 1
    A and B are matrices at offset i * stridea in a and i * strideb in b
    B = alpha * op(A)
end for
Input Parameters
row Specifies the number of rows of the matrices A and B. The value of row
must be at least zero.
col Specifies the number of columns of the matrices A and B. The value of col
must be at least zero.
a Array holding all the input matrices A. Must be of size at least lda * k +
stridea * (batch_size - 1), where k is col if column major layout is used
and row otherwise.
lda The leading dimension of the input matrix A. It must be positive and at
least row if column major layout is used or at least col if row major layout
is used.
b Array holding all the output matrices B. Must be of size at least batch_size
* strideb. The b array must be independent from the a array.
ldb The leading dimension of the output matrix B. It must be positive and at
least:
• row if column major layout is used and op(A) = A or conjg(A)
• row if row major layout is used and op(A) = A' or conjg(A')
• col otherwise
batch_size
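The size and leading-dimension requirements can be illustrated with a plain C reference. The sketch assumes that the minimum size of a is lda * k + stridea * (batch_size - 1), with k as defined for the a parameter, and handles only column-major storage of real double data with trans = 'N' or 'T'. It does not call oneMKL; the names omatcopy_strided_min_a_size and omatcopy_batch_strided_ref are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Minimum element count of a: the last matrix starts at
   stridea * (batch_size - 1) and occupies lda * k elements. */
size_t omatcopy_strided_min_a_size(char layout, size_t row, size_t col,
                                   size_t lda, size_t stridea,
                                   size_t batch_size)
{
    size_t k = (layout == 'C' || layout == 'c') ? col : row;
    assert(batch_size >= 1);
    return lda * k + stridea * (batch_size - 1);
}

/* B_i = alpha * op(A_i), column-major, op is identity or transpose. */
void omatcopy_batch_strided_ref(char trans, size_t row, size_t col,
                                double alpha, const double *a, size_t lda,
                                size_t stridea, double *b, size_t ldb,
                                size_t strideb, size_t batch_size)
{
    int transpose = (trans == 'T' || trans == 't');
    assert(lda >= row && ldb >= (transpose ? col : row));
    for (size_t i = 0; i < batch_size; ++i) {
        const double *ai = a + i * stridea;
        double *bi = b + i * strideb;
        for (size_t c = 0; c < col; ++c)
            for (size_t r = 0; r < row; ++r) {
                double v = alpha * ai[r + c * lda];
                if (transpose)
                    bi[c + r * ldb] = v; /* B(c, r) = alpha * A(r, c) */
                else
                    bi[r + c * ldb] = v; /* B(r, c) = alpha * A(r, c) */
            }
    }
}
```

For example, four 2-by-3 column-major matrices with lda = 2 and stridea = 10 need at least 2 * 3 + 10 * 3 = 36 elements in a.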
Output Parameters
mkl_?omatcopy2
Performs two-strided scaling and out-of-place
transposition/copying of matrices.
Syntax
void mkl_somatcopy2 (char ordering, char trans, size_t rows, size_t cols, const float
alpha, const float * A, size_t lda, size_t stridea, float * B, size_t ldb, size_t
strideb);
void mkl_domatcopy2 (char ordering, char trans, size_t rows, size_t cols, const double
alpha, const double * A, size_t lda, size_t stridea, double * B, size_t ldb, size_t
strideb);
void mkl_comatcopy2 (char ordering, char trans, size_t rows, size_t cols, const
MKL_Complex8 alpha, const MKL_Complex8 * A, size_t lda, size_t stridea, MKL_Complex8 *
B, size_t ldb, size_t strideb);
void mkl_zomatcopy2 (char ordering, char trans, size_t rows, size_t cols, const
MKL_Complex16 alpha, const MKL_Complex16 * A, size_t lda, size_t stridea, MKL_Complex16
* B, size_t ldb, size_t strideb);
Include Files
• mkl.h
Description
NOTE
Different arrays must not overlap.
Input Parameters
If the data is real, then trans = 'R' is the same as trans = 'N', and
trans = 'C' is the same as trans = 'T'.
rows Number of rows for the input matrix A. Must be at least zero.
cols Number of columns for the input matrix A. Must be at least zero.
a Array holding the input matrix A. Must have size at least lda * n for column
major ordering and at least lda * m for row major ordering, where m = rows and n = cols.
lda Leading dimension of the matrix A. If matrices are stored using column
major layout, lda is the number of elements in the array between adjacent
columns of the matrix and must be at least stridea * (m-1) + 1. If
using row major layout, lda is the number of elements between adjacent
rows of the matrix and must be at least stridea * (n-1) + 1.
stridea The second stride of the matrix A. For column major layout, stridea is the
number of elements in the array between adjacent rows of the matrix. For
row major layout stridea is the number of elements between adjacent
columns of the matrix. In both cases stridea must be at least 1.
trans = trans =
transpose::nontrans transpose::trans, or
trans =
transpose::conjtrans
trans = trans =
transpose::nontrans transpose::trans, or
trans =
transpose::conjtrans
strideb The second stride of the matrix B. For column major layout, strideb is the
number of elements in the array between adjacent rows of the matrix. For
row major layout, strideb is the number of elements between adjacent
columns of the matrix. In both cases strideb must be at least 1.
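The two-stride addressing described for lda, stridea, ldb, and strideb can be sketched in plain C. Under column-major ordering, element (r, c) of A is assumed to sit at a[r * stridea + c * lda], and likewise for B. The reference below handles real double data with trans = 'N' or 'T' only, does not call oneMKL, and uses the hypothetical name omatcopy2_ref.

```c
#include <assert.h>
#include <stddef.h>

/* B = alpha * op(A) with two strides per matrix, column-major ordering. */
void omatcopy2_ref(char trans, size_t rows, size_t cols, double alpha,
                   const double *a, size_t lda, size_t stridea,
                   double *b, size_t ldb, size_t strideb)
{
    int transpose = (trans == 'T' || trans == 't');
    assert(stridea >= 1 && strideb >= 1);
    /* Adjacent columns of A must not interleave. */
    assert(rows == 0 || lda >= stridea * (rows - 1) + 1);
    for (size_t c = 0; c < cols; ++c)
        for (size_t r = 0; r < rows; ++r) {
            double v = alpha * a[r * stridea + c * lda];
            if (transpose)
                b[c * strideb + r * ldb] = v; /* B(c, r) */
            else
                b[r * strideb + c * ldb] = v; /* B(r, c) */
        }
}
```

With stridea = 2 the routine reads every other element of each column of a, so it can extract, for instance, one field from interleaved data while copying into a densely packed B (strideb = 1).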
Output Parameters
Interfaces
mkl_?omatadd
Scales and sums two matrices in addition to
performing out-of-place transposition operations.
Syntax
void mkl_somatadd (char ordering, char transa, char transb, size_t m, size_t n, const
float alpha, const float * A, size_t lda, const float beta, const float * B, size_t ldb,
float * C, size_t ldc);
void mkl_domatadd (char ordering, char transa, char transb, size_t m, size_t n, const
double alpha, const double * A, size_t lda, const double beta, const double * B, size_t
ldb, double * C, size_t ldc);
void mkl_comatadd (char ordering, char transa, char transb, size_t m, size_t n, const
MKL_Complex8 alpha, const MKL_Complex8 * A, size_t lda, const MKL_Complex8 beta, const
MKL_Complex8 * B, size_t ldb, MKL_Complex8 * C, size_t ldc);
void mkl_zomatadd (char ordering, char transa, char transb, size_t m, size_t n, const
MKL_Complex16 alpha, const MKL_Complex16 * A, size_t lda, const MKL_Complex16 beta,
const MKL_Complex16 * B, size_t ldb, MKL_Complex16 * C, size_t ldc);
Include Files
• mkl.h
Description
The mkl_?omatadd routine scales and adds two matrices in addition to performing out-of-place transposition
operations. A transposition operation can be no operation, a transposition, a conjugate transposition, or a
conjugation (without transposition). The following out-of-place memory movement is done:
C := alpha*op(A) + beta*op(B)
where the op(A) and op(B) operations are transpose, conjugate-transpose, conjugate (no transpose), or no
transpose, depending on the values of transa and transb. If no transposition of the source matrices is
required, m is the number of rows and n is the number of columns in the source matrices A and B. In this
case, the output matrix C is m-by-n.
In general, a, b, and c must not overlap in memory, with the exception of the following in-place operations:
• a and c can point to the same memory if transa is non-transpose and lda = ldc.
• b and c can point to the same memory if transb is non-transpose and ldb = ldc.
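The operation above can be sketched as a plain C reference loop. This is only an illustration of the arithmetic for the real single-precision case with 'N' and 'T' operations; `ref_somatadd` and `op_elem` are hypothetical helper names, not library routines, and the sketch makes no claim about how the library implements the operation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an MKL routine): element (i, j) of op(X),
   where X is stored with leading dimension ld in the given ordering. */
static float op_elem(char ordering, char trans, const float *x, size_t ld,
                     size_t i, size_t j)
{
    /* For a transpose, element (i, j) of op(X) is element (j, i) of X. */
    int t = (trans == 'T' || trans == 't');
    size_t r = t ? j : i;
    size_t c = t ? i : j;
    return (ordering == 'C' || ordering == 'c') ? x[r + c * ld]
                                                : x[r * ld + c];
}

/* Reference loop for C := alpha*op(A) + beta*op(B), where C is m-by-n.
   Only trans values 'N' and 'T' are handled in this sketch. */
static void ref_somatadd(char ordering, char transa, char transb,
                         size_t m, size_t n,
                         float alpha, const float *a, size_t lda,
                         float beta, const float *b, size_t ldb,
                         float *c, size_t ldc)
{
    int colmajor = (ordering == 'C' || ordering == 'c');
    /* ldc rule from the parameter description: max(1, m) or max(1, n). */
    assert(ldc >= (colmajor ? (m ? m : 1) : (n ? n : 1)));
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float v = 0.0f;
            /* A zero scalar means the matching input is never read. */
            if (alpha != 0.0f)
                v += alpha * op_elem(ordering, transa, a, lda, i, j);
            if (beta != 0.0f)
                v += beta * op_elem(ordering, transb, b, ldb, i, j);
            c[colmajor ? i + j * ldc : i * ldc + j] = v;
        }
    }
}
```

With the same argument order as the Syntax section, the corresponding library call would presumably be mkl_somatadd('C', 'N', 'T', m, n, alpha, a, lda, beta, b, ldb, c, ldc).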
Input Parameters
a Pointer to array for input matrix A. If alpha is zero, a is never accessed and
may be a null pointer.
lda Distance between the first elements in adjacent columns (in the case of the
column-major order) or rows (in the case of the row-major order) in the
source matrix A; measured in the number of elements.
For ordering = 'C' or 'c': when transa = 'N', 'n', 'R', or 'r', lda
must be at least max(1,m); otherwise lda must be at least max(1,n).
For ordering = 'R' or 'r': when transa = 'N', 'n', 'R', or 'r', lda
must be at least max(1,n); otherwise lda must be at least max(1,m).
b Pointer to array for input matrix B. If beta is zero, b is never accessed and
may be a null pointer.
ldb Distance between the first elements in adjacent columns (in the case of the
column-major order) or rows (in the case of the row-major order) in the
source matrix B; measured in the number of elements.
For ordering = 'C' or 'c': when transb = 'N', 'n', 'R', or 'r', ldb
must be at least max(1,m); otherwise ldb must be at least max(1,n).
For ordering = 'R' or 'r': when transb = 'N', 'n', 'R', or 'r', ldb
must be at least max(1,n); otherwise ldb must be at least max(1,m).
ldc Distance between the first elements in adjacent columns (in the case of the
column-major order) or rows (in the case of the row-major order) in the
destination matrix C; measured in the number of elements.
If ordering = 'C' or 'c', then ldc must be at least max(1, m),
otherwise ldc must be at least max(1, n).
Output Parameters
c Array.
Interfaces
cblas_?gemm_pack_get_size, cblas_gemm_*_pack_get_size
Returns the number of bytes required to store the
packed matrix.
Syntax
size_t cblas_hgemm_pack_get_size (const CBLAS_IDENTIFIER identifier, const MKL_INT m,
const MKL_INT n, const MKL_INT k)
size_t cblas_sgemm_pack_get_size (const CBLAS_IDENTIFIER identifier, const MKL_INT m,
const MKL_INT n, const MKL_INT k)
size_t cblas_dgemm_pack_get_size (const CBLAS_IDENTIFIER identifier, const MKL_INT m,
const MKL_INT n, const MKL_INT k)
size_t cblas_gemm_s8u8s32_pack_get_size (const CBLAS_IDENTIFIER identifier, const
MKL_INT m, const MKL_INT n, const MKL_INT k)
size_t cblas_gemm_s16s16s32_pack_get_size (const CBLAS_IDENTIFIER identifier, const
MKL_INT m, const MKL_INT n, const MKL_INT k)
size_t cblas_gemm_bf16bf16f32_pack_get_size (const CBLAS_IDENTIFIER identifier, const
MKL_INT m, const MKL_INT n, const MKL_INT k)
size_t cblas_gemm_f16f16f32_pack_get_size (const CBLAS_IDENTIFIER identifier, const
MKL_INT m, const MKL_INT n, const MKL_INT k)
Include Files
• mkl.h
Description
The cblas_?gemm_pack_get_size and cblas_gemm_*_pack_get_size routines belong to a set of related
routines that enable the use of an internal packed storage. Call the cblas_?gemm_pack_get_size and
cblas_gemm_*_pack_get_size routines first to query the size of storage required for a packed matrix
structure to be used in subsequent calls. Ultimately, the packed matrix structure is used to compute
C := alpha*op(A)*op(B) + beta*C for bfloat16, half, single and double precision or
C := alpha*(op(A)+ A_offset)*(op(B)+ B_offset) + beta*C + C_offset for integer type.
where:
op(X) is one of the operations op(X) = X or op(X) = X^T,
alpha and beta are scalars,
A, A_offset, B, B_offset, C, and C_offset are matrices,
op(A) is an m-by-k matrix,
op(B) is a k-by-n matrix,
C is an m-by-n matrix.
A_offset is an m-by-k matrix.
B_offset is a k-by-n matrix.
C_offset is an m-by-n matrix.
Input Parameters
Parameter Type Description
identifier CBLAS_IDENTIFIER
Specifies which matrix is to be packed:
If identifier = CblasAMatrix, the size
returned is the size required to store matrix A
in an internal format.
If identifier = CblasBMatrix, the size
returned is the size required to store matrix B
in an internal format.
m MKL_INT
Specifies the number of rows of matrix op(A)
and of the matrix C. The value of m must be
at least zero.
n MKL_INT
Specifies the number of columns of matrix
op(B) and the number of columns of matrix
C. The value of n must be at least zero.
k MKL_INT
Specifies the number of columns of matrix
op(A) and the number of rows of matrix
op(B). The value of k must be at least zero.
Return Values
Parameter Type Description
size size_t
Returns the size (in bytes) required to store
the matrix when packed into the internal
format of Intel® oneAPI Math Kernel Library
(oneMKL).
Example
See the following examples in the MKL installation directory to understand the use of these routines:
cblas_hgemm_pack_get_size: examples\cblas\source\cblas_hgemm_computex.c
cblas_sgemm_pack_get_size: examples\cblas\source\cblas_sgemm_computex.c
cblas_dgemm_pack_get_size: examples\cblas\source\cblas_dgemm_computex.c
cblas_gemm_s8u8s32_pack_get_size: examples\cblas\source\cblas_gemm_s8u8s32_computex.c
cblas_gemm_s16s16s32_pack_get_size: examples\cblas\source\cblas_gemm_s16s16s32_computex.c
cblas_gemm_bf16bf16f32_pack_get_size: examples\cblas\source\cblas_gemm_bf16bf16f32_computex.c
cblas_gemm_f16f16f32_pack_get_size: examples\cblas\source\cblas_gemm_f16f16f32_computex.c
See Also
cblas_?gemm_pack and cblas_gemm_*_pack
to pack the matrix into a buffer allocated previously.
cblas_?gemm_compute and cblas_gemm_*_compute
to compute a matrix-matrix product with general matrices (where one or both input matrices are stored in
a packed data structure) and add the result to a scalar-matrix product.
cblas_?gemm_pack
Performs scaling and packing of the matrix into the
previously allocated buffer.
Syntax
void cblas_hgemm_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER identifier,
const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const MKL_INT k, const
MKL_F16 alpha, const MKL_F16 *src, const MKL_INT ld, MKL_F16 *dest);
void cblas_sgemm_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER identifier,
const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const MKL_INT k, const
float alpha, const float *src, const MKL_INT ld, float *dest);
void cblas_dgemm_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER identifier,
const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const MKL_INT k, const
double alpha, const double *src, const MKL_INT ld, double *dest);
Include Files
• mkl.h
Description
The cblas_?gemm_pack routine is one of a set of related routines that enable use of an internal packed
storage. Call cblas_?gemm_pack after you allocate a buffer whose size is given by
cblas_?gemm_pack_get_size. The cblas_?gemm_pack routine scales the identified matrix by alpha and
packs it into the buffer allocated previously.
NOTE
Do not copy the packed matrix to a different address because the internal implementation
depends on the alignment of internally-stored metadata.
NOTE
You must use the same value of the Layout parameter for the entire sequence of related
cblas_?gemm_pack and cblas_?gemm_compute calls.
For best performance, use the same number of threads for packing and for computing.
If packing for both A and B matrices, you must use the same number of threads for packing A as for
packing B.
Input Parameters
m Specifies the number of rows of the matrix op(A) and of the matrix C. The
value of m must be at least zero.
n Specifies the number of columns of the matrix op(B) and the number of
columns of the matrix C. The value of n must be at least zero.
k Specifies the number of columns of the matrix op(A) and the number of
rows of the matrix op(B). The value of k must be at least zero.
src Array:
identifier = CblasAMatrix    identifier = CblasBMatrix
Output Parameters
See Also
cblas_?gemm_pack_get_size Returns the number of bytes required to store the packed matrix.
cblas_?gemm_compute Computes a matrix-matrix product with general matrices where one or both
input matrices are stored in a packed data structure and adds the result to a scalar-matrix
product.
cblas_?gemm
for a detailed description of general matrix multiplication.
cblas_gemm_*_pack
Pack the matrix into the buffer allocated previously.
Syntax
void cblas_gemm_s8u8s32_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER
identifier, const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const
MKL_INT k, const void *src, const MKL_INT ld, void *dest);
void cblas_gemm_s16s16s32_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER
identifier, const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const
MKL_INT k, const MKL_INT16 *src, const MKL_INT ld, MKL_INT16 *dest);
void cblas_gemm_bf16bf16f32_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER
identifier, const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const
MKL_INT k, const MKL_BF16 *src, const MKL_INT ld, MKL_BF16 *dest);
Include Files
• mkl.h
Description
The cblas_gemm_*_pack routine is one of a set of related routines that enable the use of an internal packed
storage. Call cblas_gemm_*_pack after you allocate a buffer whose size is given by
cblas_gemm_*_pack_get_size. The cblas_gemm_*_pack routine packs the identified matrix into the
buffer allocated previously.
The cblas_gemm_*_pack routine performs this operation:
NOTE
You must use the same value of the Layout parameter for the entire sequence of related
cblas_gemm_*_pack and cblas_gemm_*_compute calls.
For best performance, use the same number of threads for packing and for computing.
If packing for both A and B matrices, you must use the same number of threads for packing A as for
packing B.
Input Parameters
Layout CBLAS_LAYOUT
Specifies whether two-dimensional array storage is row-major
(CblasRowMajor) or column-major (CblasColMajor).
identifier CBLAS_IDENTIFIER
Specifies which matrix is to be packed:
If identifier = CblasAMatrix, the A matrix is packed.
If identifier = CblasBMatrix, the B matrix is packed.
trans CBLAS_TRANSPOSE
Specifies the form of op(src) used in the packing:
m MKL_INT
Specifies the number of rows of matrix op(A) and of the matrix C. The value
of m must be at least zero.
n MKL_INT
Specifies the number of columns of matrix op(B) and the number of
columns of matrix C. The value of n must be at least zero.
k MKL_INT
Specifies the number of columns of matrix op(A) and the number of rows of
matrix op(B). The value of k must be at least zero.
identifier = CblasAMatrix    identifier = CblasBMatrix
Output Parameters
Example
See the following examples in the MKL installation directory to understand the use of these routines:
cblas_gemm_s8u8s32_pack: examples\cblas\source\cblas_gemm_s8u8s32_computex.c
cblas_gemm_s16s16s32_pack: examples\cblas\source\cblas_gemm_s16s16s32_computex.c
cblas_gemm_bf16bf16f32_pack: examples\cblas\source\cblas_gemm_bf16bf16f32_computex.c
cblas_gemm_f16f16f32_pack: examples\cblas\source\cblas_gemm_f16f16f32_computex.c
Application Notes
When using cblas_gemm_s8u8s32_pack with row-major layout, the data types of A and B must be
swapped. That is, you must provide an 8-bit unsigned integer array for matrix A and an 8-bit signed integer
array for matrix B.
See Also
cblas_gemm_*_pack_get_size
to return the number of bytes needed to store the packed matrix.
cblas_gemm_*_compute
to compute a matrix-matrix product with general integer matrices (where one or both input matrices are
stored in a packed data structure) and add the result to a scalar-matrix product.
cblas_?gemm_compute
Computes a matrix-matrix product with general
matrices where one or both input matrices are stored
in a packed data structure and adds the result to a
scalar-matrix product.
Syntax
void cblas_hgemm_compute (const CBLAS_LAYOUT Layout, const MKL_INT transa, const
MKL_INT transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const MKL_F16 *a,
const MKL_INT lda, const MKL_F16 *b, const MKL_INT ldb, const MKL_F16 beta, MKL_F16 *c,
const MKL_INT ldc);
void cblas_sgemm_compute (const CBLAS_LAYOUT Layout, const MKL_INT transa, const
MKL_INT transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const float *a,
const MKL_INT lda, const float *b, const MKL_INT ldb, const float beta, float *c, const
MKL_INT ldc);
void cblas_dgemm_compute (const CBLAS_LAYOUT Layout, const MKL_INT transa, const
MKL_INT transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const double *a,
const MKL_INT lda, const double *b, const MKL_INT ldb, const double beta, double *c,
const MKL_INT ldc);
Include Files
• mkl.h
Description
The cblas_?gemm_compute routine is one of a set of related routines that enable the use of internal packed
storage. After calling cblas_?gemm_pack, call cblas_?gemm_compute to compute
C := op(A)*op(B) + beta*C,
where:
op(X) is one of the operations op(X) = X, op(X) = X^T, or op(X) = X^H,
beta is a scalar,
A, B, and C are matrices:
op(A) is an m-by-k matrix,
op(B) is a k-by-n matrix,
C is an m-by-n matrix.
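The compute formula has no alpha term because the scaling is applied when the matrix is packed. The following toy model illustrates that division of work for a column-major, non-transposed A. The functions `toy_pack_a` and `toy_compute` are hypothetical stand-ins written for this sketch; the real packed format is internal to the library and is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for packing A: copies the m-by-k column-major matrix a into
   dest (dense, leading dimension m) and scales it by alpha on the way. */
static void toy_pack_a(size_t m, size_t k, float alpha,
                       const float *a, size_t lda, float *dest)
{
    assert(lda >= (m ? m : 1));
    for (size_t p = 0; p < k; ++p)
        for (size_t i = 0; i < m; ++i)
            dest[i + p * m] = alpha * a[i + p * lda];
}

/* Toy stand-in for the compute step: C := packedA*B + beta*C, column-major.
   No alpha argument is needed because it was applied during packing. */
static void toy_compute(size_t m, size_t n, size_t k, const float *packed_a,
                        const float *b, size_t ldb, float beta,
                        float *c, size_t ldc)
{
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p)
                acc += packed_a[i + p * m] * b[p + j * ldb];
            /* When beta is zero, c is not read, so it need not be set. */
            c[i + j * ldc] = (beta == 0.0f) ? acc
                                            : acc + beta * c[i + j * ldc];
        }
    }
}
```

In the library itself, the buffer passed as dest would be sized with cblas_?gemm_pack_get_size rather than m*k elements as assumed in this toy.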
NOTE
You must use the same value of the Layout parameter for the entire sequence of related
cblas_?gemm_pack and cblas_?gemm_compute calls.
For best performance, use the same number of threads for packing and for computing.
If packing for both A and B matrices, you must use the same number of threads for packing A as for
packing B.
Input Parameters
transa Specifies the form of op(A) used in the matrix multiplication, one of the
CBLAS_TRANSPOSE or CBLAS_STORAGE enumerated types:
If transa = CblasNoTrans, op(A) = A.
transb Specifies the form of op(B) used in the matrix multiplication, one of the
CBLAS_TRANSPOSE or CBLAS_STORAGE enumerated types:
If transb = CblasNoTrans, op(B) = B.
m Specifies the number of rows of the matrix op(A) and of the matrix C. The
value of m must be at least zero.
n Specifies the number of columns of the matrix op(B) and the number of
columns of the matrix C. The value of n must be at least zero.
k Specifies the number of columns of the matrix op(A) and the number of
rows of the matrix op(B). The value of k must be at least zero.
a Array:
b Array:
beta Specifies the scalar beta. When beta is equal to zero, then c need not be
set on input.
c Array:
Output Parameters
See Also
cblas_?gemm_pack_get_size Returns the number of bytes required to store the packed matrix.
cblas_?gemm_pack Performs scaling and packing of the matrix into the previously allocated buffer.
cblas_?gemm
for a detailed description of general matrix multiplication.
cblas_gemm_*_compute
Computes a matrix-matrix product with general
integer matrices (where one or both input matrices
are stored in a packed data structure) and adds the
result to a scalar-matrix product.
Syntax
void cblas_gemm_s8u8s32_compute(const CBLAS_LAYOUT Layout, const MKL_INT transa, const
MKL_INT transb, const CBLAS_OFFSET offsetc, const MKL_INT m, const MKL_INT n, const
MKL_INT k, const float alpha, const void *a, const MKL_INT lda, const MKL_INT8 oa,
const void *b, const MKL_INT ldb, const MKL_INT8 ob, const float beta, MKL_INT32 *c,
const MKL_INT ldc, const MKL_INT32 *oc);
void cblas_gemm_s16s16s32_compute(const CBLAS_LAYOUT Layout, const MKL_INT transa,
const MKL_INT transb, const CBLAS_OFFSET offsetc, const MKL_INT m, const MKL_INT n,
const MKL_INT k, const float alpha, const MKL_INT16 *a, const MKL_INT lda, const
MKL_INT16 oa, const MKL_INT16 *b, const MKL_INT ldb, const MKL_INT16 ob, const float
beta, MKL_INT32 *c, const MKL_INT ldc, const MKL_INT32 *oc);
Include Files
• mkl.h
Description
The cblas_gemm_*_compute routine is one of a set of related routines that enable the use of internal packed
storage. After calling cblas_gemm_*_pack, call cblas_gemm_*_compute to compute
C := alpha*(op(A) + A_offset)*(op(B) + B_offset) + beta*C + C_offset.
NOTE
You must use the same value of the Layout parameter for the entire sequence of related
cblas_gemm_*_pack and cblas_gemm_*_compute calls.
For best performance, use the same number of threads for packing and for computing.
If you are packing for both A and B matrices, you must use the same number of threads for packing A
as for packing B.
Input Parameters
Layout CBLAS_LAYOUT
Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or
column-major (CblasColMajor).
If transa = CblasPacked, the matrix in array a is packed into a format internal to Intel® oneAPI
Math Kernel Library (oneMKL) and lda is ignored.
If transb = CblasPacked, the matrix in array b is packed into a format internal to Intel® oneAPI
Math Kernel Library (oneMKL) and ldb is ignored.
offsetc CBLAS_OFFSET Specifies the form of C_offset used in the matrix multiplication.
If offsetc = CblasFixOffset: oc has a single element and every element of C_offset is equal to
this element.
If offsetc = CblasColOffset: oc has a size of m and every column of C_offset is equal to oc.
If offsetc = CblasRowOffset: oc has a size of n and every row of C_offset is equal to oc.
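The three offsetc cases can be read as a rule for entry (i, j) of the m-by-n matrix C_offset. The sketch below assumes that a size-m oc is replicated across the columns of C_offset and a size-n oc across its rows, which is inferred from the stated sizes. The `toy_offset` enum is a hypothetical stand-in so the example does not depend on mkl.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the CBLAS_OFFSET values named in the text;
   the real enumeration is declared in mkl.h. */
typedef enum { TOY_FIX_OFFSET, TOY_COL_OFFSET, TOY_ROW_OFFSET } toy_offset;

/* Entry (i, j) of the m-by-n matrix C_offset implied by offsetc and oc. */
static int32_t c_offset_at(toy_offset offsetc, const int32_t *oc,
                           size_t i, size_t j)
{
    switch (offsetc) {
    case TOY_FIX_OFFSET: return oc[0]; /* one value for every entry       */
    case TOY_COL_OFFSET: return oc[i]; /* oc has m entries, indexed by row */
    case TOY_ROW_OFFSET: return oc[j]; /* oc has n entries, indexed by col */
    }
    assert(0 && "unknown offset kind");
    return 0;
}
```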
m MKL_INT Specifies the number of rows of the matrix op(A) and of the matrix C. The value of m
must be at least zero.
n MKL_INT Specifies the number of columns of the matrix op(B) and the number of columns of the
matrix C. The value of n must be at least zero.
k MKL_INT Specifies the number of columns of the matrix op(A) and the number of rows of the
matrix op(B). The value of k must be at least zero.
                         transa = CblasNoTrans              transa = CblasTrans
Layout = CblasColMajor   lda must be at least max(1, m).    lda must be at least max(1, k).
Layout = CblasRowMajor   lda must be at least max(1, k).    lda must be at least max(1, m).
ldb MKL_INT Specifies the leading dimension of b as declared in the calling (sub)program.
                         transb = CblasNoTrans              transb = CblasTrans
Layout = CblasColMajor   ldb must be at least max(1, k).    ldb must be at least max(1, n).
Layout = CblasRowMajor   ldb must be at least max(1, n).    ldb must be at least max(1, k).
beta float
Specifies the scalar beta.
c MKL_INT32*
Array:
Before entry, the leading n-by-m part of the array c must contain the
matrix C, except when beta is equal to zero, in which case c need not
be set on entry.
ldc MKL_INT Specifies the leading dimension of c as declared in the calling (sub)program.
oc MKL_INT32*
Array, size len. Specifies the scalar offset value for the matrix C.
Output Parameters
c MKL_INT32*
Overwritten by the matrix alpha*(op(A) + A_offset)*(op(B) +
B_offset) + beta*C + C_offset.
Example
See the following examples in the MKL installation directory to understand the use of these routines:
cblas_gemm_s8u8s32_compute: examples\cblas\source\cblas_gemm_s8u8s32_computex.c
cblas_gemm_s16s16s32_compute: examples\cblas\source\cblas_gemm_s16s16s32_computex.c
Application Notes
You can expand the matrix-matrix product in this manner:
(op(A) + A_offset)*(op(B) + B_offset) = op(A)*op(B) + op(A)*B_offset + A_offset*op(B) +
A_offset*B_offset
These four multiplication terms are computed separately and then summed from left to right. The results
from the matrix-matrix product and the C matrix are scaled by the floating-point values alpha and beta,
respectively, using double-precision arithmetic. Before the results are stored to the output c array, the
floating-point values are rounded to the nearest integers.
In the event of overflow or underflow, the results depend on the architecture. The results are either
unsaturated (wrapped) or saturated to maximum or minimum representable integer values for the data type
of the output matrix.
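The expansion and the scaling step can be checked on a single output entry with a small C sketch. It assumes signed 8-bit values for op(A), unsigned 8-bit values for op(B), and scalar offsets oa and ob applied to every element. The tie-breaking rule for rounding is not stated in the text, so rounding half away from zero is an assumption here, the addition of C_offset is left out, and out-of-range values are simply rejected instead of wrapped or saturated. All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry of (op(A) + A_offset)*(op(B) + B_offset), evaluated directly.
   arow is row i of op(A); bcol is column j of op(B); both have k entries. */
static int32_t dot_direct(const int8_t *arow, const uint8_t *bcol, size_t k,
                          int32_t oa, int32_t ob)
{
    int32_t acc = 0;
    for (size_t p = 0; p < k; ++p)
        acc += ((int32_t)arow[p] + oa) * ((int32_t)bcol[p] + ob);
    return acc;
}

/* The same entry as the sum of the four expanded terms, left to right. */
static int32_t dot_expanded(const int8_t *arow, const uint8_t *bcol, size_t k,
                            int32_t oa, int32_t ob)
{
    int32_t ab = 0, a_ob = 0, oa_b = 0, oa_ob = 0;
    for (size_t p = 0; p < k; ++p) {
        ab    += (int32_t)arow[p] * (int32_t)bcol[p]; /* op(A)*op(B)       */
        a_ob  += (int32_t)arow[p] * ob;               /* op(A)*B_offset    */
        oa_b  += oa * (int32_t)bcol[p];               /* A_offset*op(B)    */
        oa_ob += oa * ob;                             /* A_offset*B_offset */
    }
    return ab + a_ob + oa_b + oa_ob;
}

/* Scale the product and the old C entry in double precision, then round.
   Ties round away from zero (an assumption of this sketch). */
static int32_t scale_and_round(double alpha, int32_t prod,
                               double beta, int32_t c_old)
{
    double v = alpha * (double)prod + beta * (double)c_old;
    /* Overflow behavior is architecture dependent; this sketch refuses it. */
    assert(v > -2147483648.0 && v < 2147483647.0);
    return (int32_t)(v < 0.0 ? v - 0.5 : v + 0.5);
}
```

For arow = {1, -2, 3}, bcol = {4, 5, 6}, oa = 2 and ob = -1, both evaluations give 34, and scaling with alpha = 0.5, beta = 0.25 and an old C entry of 3 gives 17.75, which rounds to 18.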
When using cblas_gemm_s8u8s32_compute with row-major layout, the data types of A and B must be
swapped. That is, you must provide an 8-bit unsigned integer array for matrix A and an 8-bit signed integer
array for matrix B.
See Also
cblas_gemm_*_pack_get_size
to return the number of bytes needed to store the packed matrix.
cblas_gemm_*_pack
to pack the matrix into the buffer allocated previously.
cblas_gemm_bf16bf16f32_compute
Computes a matrix-matrix product with general
bfloat16 matrices (where one or both input matrices
are stored in a packed data structure) and adds the
result to a scalar-matrix product.
Syntax
C:
void cblas_gemm_bf16bf16f32_compute (const CBLAS_LAYOUT Layout, const MKL_INT transa,
const MKL_INT transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const float
alpha, const MKL_BF16 *a, const MKL_INT lda, const MKL_BF16 *b, const MKL_INT ldb,
const float beta, float *c, const MKL_INT ldc);
Include Files
• mkl.h
Description
The cblas_gemm_bf16bf16f32_compute routine is one of a set of related routines that enable the use of
internal packed storage. After calling cblas_gemm_bf16bf16f32_pack, call
cblas_gemm_bf16bf16f32_compute to compute
C := alpha*op(A)*op(B) + beta*C,
where:
op(X) is either op(X) = X or op(X) = X^T,
alpha and beta are scalars,
A, B, and C are matrices:
op(A) is an m-by-k matrix,
op(B) is a k-by-n matrix,
C is an m-by-n matrix.
NOTE
You must use the same value of the Layout parameter for the entire sequence of related
cblas_gemm_bf16bf16f32_pack and cblas_gemm_bf16bf16f32_compute calls.
For best performance, use the same number of threads for packing and for computing.
If packing for both A and B matrices, you must use the same number of threads for packing A as for
packing B.
Input Parameters
Layout CBLAS_LAYOUT
Specifies whether two-dimensional array storage is row-major
(CblasRowMajor) or column-major (CblasColMajor).
transa MKL_INT
Specifies the form of op(A) used in the packing:
transb MKL_INT
Specifies the form of op(B) used in the packing:
m MKL_INT
Specifies the number of rows of the matrix op(A) and of the matrix C.
The value of m must be at least zero.
n MKL_INT
Specifies the number of columns of the matrix op(B) and the number
of columns of the matrix C. The value of n must be at least zero.
k MKL_INT
Specifies the number of columns of the matrix op(A) and the number
of rows of the matrix op(B). The value of k must be at least zero.
alpha float
Specifies the scalar alpha.
a MKL_BF16*
lda MKL_INT
Specifies the leading dimension of a as declared in the calling
(sub)program.
transa = CblasNoTrans    transa = CblasTrans
b MKL_BF16*
ldb MKL_INT
Specifies the leading dimension of b as declared in the calling
(sub)program.
transb = CblasNoTrans    transb = CblasTrans
beta float
Specifies the scalar beta.
c float*
ldc MKL_INT
Specifies the leading dimension of c as declared in the calling
(sub)program.
Output Parameters
c float*
Overwritten by the matrix alpha * op(A)*op(B) + beta*C.
Example
See the following examples in the Intel® oneAPI Math Kernel Library (oneMKL) installation directory to
understand the use of these routines:
cblas_gemm_bf16bf16f32_compute:
examples\cblas\source\cblas_gemm_bf16bf16f32_computex.c
Application Notes
On architectures without native bfloat16 hardware instructions, matrices A and B are upconverted to single
precision and SGEMM is called to compute the matrix multiplication.
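The upconversion mentioned above is cheap because a bfloat16 value is the upper 16 bits of an IEEE 754 single-precision value. The sketch below shows that widening step only; it uses uint16_t as a stand-in for the raw storage of a bfloat16 element (an assumption about the element type), and the function names are hypothetical rather than part of the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Widen one bfloat16 value, held as its raw 16-bit pattern, to float.
   The 16 bits become the high half of the 32-bit float pattern and the
   low half is filled with zeros, so the conversion is exact. */
static float bf16_to_float(uint16_t raw)
{
    uint32_t bits = (uint32_t)raw << 16;
    float f;
    assert(sizeof f == sizeof bits);
    memcpy(&f, &bits, sizeof f); /* bit copy, avoids pointer aliasing */
    return f;
}

/* Upconvert n elements, as a fallback path might do before an SGEMM call. */
static void bf16_array_to_float(const uint16_t *src, float *dst, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = bf16_to_float(src[i]);
}
```

For example, the pattern 0x3F80 widens to 1.0f and 0x4049 widens to 3.140625f, the nearest bfloat16 value below pi.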
cblas_gemm_bf16bf16f32
Computes a matrix-matrix product with general
bfloat16 matrices.
Syntax
void cblas_gemm_bf16bf16f32 (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE transa,
const CBLAS_TRANSPOSE transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const
float alpha, const MKL_BF16 *a, const MKL_INT lda, const MKL_BF16 *b, const MKL_INT
ldb, const float beta, float *c, const MKL_INT ldc);
Include Files
• mkl.h
Description
The cblas_gemm_bf16bf16f32 routine computes a scalar-matrix-matrix product and adds the result to a
scalar-matrix product. The operation is defined as:
C := alpha*op(A)*op(B) + beta*C
Input Parameters
m Specifies the number of rows of the matrix op(A) and of the matrix C.
The value of m must be at least zero.
n Specifies the number of columns of the matrix op(B) and the number
of columns of the matrix C. The value of n must be at least zero.
k Specifies the number of columns of the matrix op(A) and the number
of rows of the matrix op(B). The value of k must be at least zero.
a
transa=CblasNoTrans transa=CblasTrans
transa=CblasNoTrans transa=CblasTrans
b
transb=CblasNoTrans transb=CblasTrans
transb=CblasNoTrans transb=CblasTrans
beta Specifies the scalar beta. When beta is equal to zero, then c need not
be set on input.
c
Layout = CblasColMajor: Array, size ldc by n. Before entry, the leading
m-by-n part of the array c must contain the
matrix C, except when beta is equal to zero,
in which case c need not be set on entry.
Output Parameters
Example
For examples of routine usage, see these code examples in the Intel® oneAPI Math Kernel Library (oneMKL)
installation directory:
• cblas_gemm_bf16bf16f32: examples\cblas\source\cblas_gemm_bf16bf16f32x.c
Application Notes
On architectures without native bfloat16 hardware instructions, matrices A and B are upconverted to single
precision and SGEMM is called to compute the matrix multiplication.
cblas_gemm_f16f16f32_compute
Computes a matrix-matrix product with general
matrices of half-precision data type (where one or
both input matrices are stored in a packed data
structure) and adds the result to a scalar-matrix
product.
Syntax
C:
Include Files
• mkl.h
Description
The cblas_gemm_f16f16f32_compute routine is one of a set of related routines that enable the use of
internal packed storage. After calling cblas_gemm_f16f16f32_pack, call cblas_gemm_f16f16f32_compute
to compute
C := alpha*op(A)*op(B) + beta*C,
where:
op(X) is either op(X) = X or op(X) = X^T,
alpha and beta are scalars,
A, B, and C are matrices:
op(A) is an m-by-k matrix,
op(B) is a k-by-n matrix,
C is an m-by-n matrix.
NOTE
You must use the same value of the Layout parameter for the entire sequence of related
cblas_gemm_f16f16f32_pack and cblas_gemm_f16f16f32_compute calls.
For best performance, use the same number of threads for packing and for computing.
If packing for both A and B matrices, you must use the same number of threads for packing A as for
packing B.
Input Parameters
Layout CBLAS_LAYOUT
Specifies whether two-dimensional array storage is row-major
(CblasRowMajor) or column-major (CblasColMajor).
transa MKL_INT
Specifies the form of op(A) used in the packing:
transb MKL_INT
Specifies the form of op(B) used in the packing:
If transb = CblasTrans, op(B) = B^T.
m MKL_INT
Specifies the number of rows of the matrix op(A) and of the matrix C.
The value of m must be at least zero.
n MKL_INT
Specifies the number of columns of the matrix op(B) and the number
of columns of the matrix C. The value of n must be at least zero.
k MKL_INT
Specifies the number of columns of the matrix op(A) and the number
of rows of the matrix op(B). The value of k must be at least zero.
alpha float
Specifies the scalar alpha.
a MKL_F16*
lda MKL_INT
Specifies the leading dimension of a as declared in the calling
(sub)program.
transa = CblasNoTrans    transa = CblasTrans
b MKL_F16*
ldb MKL_INT
Specifies the leading dimension of b as declared in the calling
(sub)program.
transb = CblasNoTrans    transb = CblasTrans
beta float
Specifies the scalar beta.
c float*
ldc MKL_INT
Specifies the leading dimension of c as declared in the calling
(sub)program.
Output Parameters
c float*
Overwritten by the matrix alpha * op(A)*op(B) + beta*C.
Example
See the following examples in the Intel® oneAPI Math Kernel Library (oneMKL) installation directory to
understand the use of these routines:
cblas_gemm_f16f16f32_compute:
examples\cblas\source\cblas_gemm_f16f16f32_computex.c
Application Notes
On architectures without native half-precision hardware instructions, matrices A and B are upconverted to
single precision and SGEMM is called to compute the matrix multiplication.
cblas_gemm_f16f16f32
Computes a matrix-matrix product with general
matrices of half precision data type.
Syntax
void cblas_gemm_f16f16f32 (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE transa,
const CBLAS_TRANSPOSE transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const
float alpha, const MKL_F16 *a, const MKL_INT lda, const MKL_F16 *b, const MKL_INT ldb,
const float beta, float *c, const MKL_INT ldc);
Include Files
• mkl.h
Description
The cblas_gemm_f16f16f32 routine computes a scalar-matrix-matrix product and adds the result to a
scalar-matrix product. The operation is defined as:
C := alpha*op(A)*op(B) + beta*C
Input Parameters
m Specifies the number of rows of the matrix op(A) and of the matrix C.
The value of m must be at least zero.
n Specifies the number of columns of the matrix op(B) and the number
of columns of the matrix C. The value of n must be at least zero.
k Specifies the number of columns of the matrix op(A) and the number
of rows of the matrix op(B). The value of k must be at least zero.
a
transa=CblasNoTrans transa=CblasTrans
transa=CblasNoTrans transa=CblasTrans
b
transb=CblasNoTrans transb=CblasTrans
transb=CblasNoTrans transb=CblasTrans
beta Specifies the scalar beta. When beta is equal to zero, then c need not
be set on input.
c
Layout = CblasColMajor: Array, size ldc by n. Before entry, the leading
m-by-n part of the array c must contain the
matrix C, except when beta is equal to zero,
in which case c need not be set on entry.
Output Parameters
Example
For examples of routine usage, see these code examples in the Intel® oneAPI Math Kernel Library (oneMKL)
installation directory:
• cblas_gemm_f16f16f32: examples\cblas\source\cblas_gemm_f16f16f32x.c
Application Notes
On architectures without native half-precision hardware instructions, matrices A and B are upconverted to
single precision and SGEMM is called to compute the matrix multiplication.
cblas_?gemm_free
Frees the storage previously allocated for the packed
matrix (deprecated).
Syntax
void cblas_sgemm_free (float *dest);
void cblas_dgemm_free (double *dest);
Include Files
• mkl.h
Description
The cblas_?gemm_free routine is one of a set of related routines that enable use of an internal packed
storage. Call the cblas_?gemm_free routine last to release storage for the packed matrix structure allocated
with cblas_?gemm_alloc (deprecated).
Input Parameters
Output Parameters
See Also
cblas_?gemm_pack Performs scaling and packing of the matrix into the previously allocated buffer.
cblas_?gemm_compute Computes a matrix-matrix product with general matrices where one or both
input matrices are stored in a packed data structure and adds the result to a scalar-matrix product.
cblas_?gemm For a detailed description of general matrix multiplication.
cblas_gemm_*
Computes a matrix-matrix product with general
integer matrices.
Syntax
void cblas_gemm_s8u8s32 (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE transa, const
CBLAS_TRANSPOSE transb, const CBLAS_OFFSET offsetc, const MKL_INT m, const MKL_INT n,
const MKL_INT k, const float alpha, const void *a, const MKL_INT lda, const MKL_INT8
oa, const void *b, const MKL_INT ldb, const MKL_INT8 ob, const float beta, MKL_INT32 *c,
const MKL_INT ldc, const MKL_INT32 *oc);
Include Files
• mkl.h
Description
The cblas_gemm_* routines compute a scalar-matrix-matrix product and add the result to a scalar-matrix
product. To get the final result, a vector is added to each row or column of the output matrix. The operation
is defined as:
Input Parameters
m Specifies the number of rows of the matrix op(A) and of the matrix C. The
value of m must be at least zero.
n Specifies the number of columns of the matrix op(B) and the number of
columns of the matrix C. The value of n must be at least zero.
k Specifies the number of columns of the matrix op(A) and the number of
rows of the matrix op(B). The value of k must be at least zero.
a
transa=CblasNoTrans transa=CblasTrans
transa=CblasNoTrans transa=CblasTrans
b
transb=CblasNoTrans transb=CblasTrans
transb=CblasNoTrans transb=CblasTrans
beta Specifies the scalar beta. When beta is equal to zero, then c need not be
set on input.
c
Layout = CblasColMajor: Array, size ldc by n. Before entry, the leading m-by-n
part of the array c must contain the matrix C, except when beta is equal to
zero, in which case c need not be set on entry.
Output Parameters
Example
For examples of routine usage, see the following code examples in the Intel® oneAPI Math Kernel
Library (oneMKL) installation directory:
• cblas_gemm_s8u8s32: examples\cblas\source\cblas_gemm_s8u8s32x.c
• cblas_gemm_s16s16s32: examples\cblas\source\cblas_gemm_s16s16s32x.c
Application Notes
The matrix-matrix product can be expanded:
(op(A) + A_offset)*(op(B) + B_offset)
= op(A)*op(B) + op(A)*B_offset + A_offset*op(B) + A_offset*B_offset
These four multiplication terms are computed separately and then summed from left to right. The results
from the matrix-matrix product and the C matrix are scaled by the floating-point values alpha and beta,
respectively, using double-precision arithmetic. Before the results are stored to the output c array, the
floating-point values are rounded to the nearest integers. In the event of overflow or underflow, the results
depend on the architecture: they are either unsaturated (wrapped) or saturated to the maximum or minimum
representable integer values for the data type of the output matrix.
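The four-term expansion can be checked numerically. The following is a minimal sketch in plain C, not the oneMKL implementation. It assumes column-major storage, no transposition, and scalar offsets ao and bo that are added to every element of A and B. The function names offset_gemm_direct and offset_gemm_expanded are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Direct form: C(i,j) = sum over p of (A(i,p) + ao) * (B(p,j) + bo).
   A is m-by-k and B is k-by-n, both column-major without padding. */
void offset_gemm_direct(int m, int n, int k, const int8_t *a, int8_t ao,
                        const uint8_t *b, int8_t bo, int32_t *c)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            int32_t acc = 0;
            for (int p = 0; p < k; ++p)
                acc += ((int32_t)a[i + p * m] + ao) * ((int32_t)b[p + j * k] + bo);
            c[i + j * m] = acc;
        }
    }
}

/* Expanded form: the four terms are accumulated separately and then
   summed from left to right. */
void offset_gemm_expanded(int m, int n, int k, const int8_t *a, int8_t ao,
                          const uint8_t *b, int8_t bo, int32_t *c)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            int32_t ab = 0, a_bo = 0, ao_b = 0, ao_bo = 0;
            for (int p = 0; p < k; ++p) {
                int32_t av = a[i + p * m];
                int32_t bv = b[p + j * k];
                ab    += av * bv;   /* op(A)*op(B)       */
                a_bo  += av * bo;   /* op(A)*B_offset    */
                ao_b  += ao * bv;   /* A_offset*op(B)    */
                ao_bo += ao * bo;   /* A_offset*B_offset */
            }
            c[i + j * m] = ((ab + a_bo) + ao_b) + ao_bo;
        }
    }
}
```

Both functions produce the same 32-bit result for inputs small enough that no intermediate value overflows. Scaling by alpha and beta, rounding, and the output offset are deliberately left out of this sketch.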
When using cblas_gemm_s8u8s32 with row-major layout, the data types of A and B must be swapped. That
is, you must provide an 8-bit unsigned integer array for matrix A and an 8-bit signed integer array for matrix
B.
Intermediate integer computations in cblas_gemm_s8u8s32 on 64-bit Intel® Advanced Vector Extensions 2
(Intel® AVX2) and Intel® Advanced Vector Extensions 512 (Intel® AVX-512) architectures without Vector
Neural Network Instructions (VNNI) extensions can saturate, because only 16 bits are available for
the accumulation of intermediate results. You can avoid integer saturation by keeping all integer
elements of the A or B matrices under 8 bits.
cblas_?gemv_batch_strided
Computes groups of matrix-vector products with
general matrices.
Syntax
void cblas_sgemv_batch_strided (const CBLAS_LAYOUT layout, const CBLAS_TRANSPOSE trans,
const MKL_INT m, const MKL_INT n, const float alpha, const float *a, const MKL_INT lda,
const MKL_INT stridea, const float *x, const MKL_INT incx, const MKL_INT stridex, const
float beta, float *y, const MKL_INT incy, const MKL_INT stridey, const MKL_INT
batch_size);
void cblas_dgemv_batch_strided (const CBLAS_LAYOUT layout, const CBLAS_TRANSPOSE trans,
const MKL_INT m, const MKL_INT n, const double alpha, const double *a, const MKL_INT
lda, const MKL_INT stridea, const double *x, const MKL_INT incx, const MKL_INT stridex,
const double beta, double *y, const MKL_INT incy, const MKL_INT stridey, const MKL_INT
batch_size);
void cblas_cgemv_batch_strided (const CBLAS_LAYOUT layout, const CBLAS_TRANSPOSE trans,
const MKL_INT m, const MKL_INT n, const void *alpha, const void *a, const MKL_INT lda,
const MKL_INT stridea, const void *x, const MKL_INT incx, const MKL_INT stridex, const
void *beta, void *y, const MKL_INT incy, const MKL_INT stridey, const MKL_INT
batch_size);
void cblas_zgemv_batch_strided (const CBLAS_LAYOUT layout, const CBLAS_TRANSPOSE trans,
const MKL_INT m, const MKL_INT n, const void *alpha, const void *a, const MKL_INT lda,
const MKL_INT stridea, const void *x, const MKL_INT incx, const MKL_INT stridex, const
void *beta, void *y, const MKL_INT incy, const MKL_INT stridey, const MKL_INT
batch_size);
Include Files
• mkl.h
Description
The cblas_?gemv_batch_strided routines perform a series of matrix-vector products, each added to a scaled
vector. They are similar to their cblas_?gemv counterparts, but the cblas_?gemv_batch_strided
routines perform matrix-vector operations with groups of matrices and vectors.
All matrices a and vectors x and y have the same parameters (size, increments) and are stored at constant
strides stridea, stridex, and stridey from each other. The operation is defined as:
for i = 0 … batch_size – 1
A is a matrix at offset i * stridea in a
X and Y are vectors at offset i * stridex and i * stridey in x and y
Y = alpha * op(A) * X + beta * Y
end for
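The loop above can be written out as a plain C reference routine. This is a sketch of the documented semantics only, not the library implementation. It assumes single precision, column-major layout, op(A) = A, and positive incx and incy. The name gemv_batch_strided_ref is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Reference loop for Y = alpha*A*X + beta*Y over a strided batch.
   Assumptions of this sketch: column-major layout, no transposition,
   positive increments. */
void gemv_batch_strided_ref(int m, int n, float alpha,
                            const float *a, int lda, size_t stridea,
                            const float *x, int incx, size_t stridex,
                            float beta,
                            float *y, int incy, size_t stridey,
                            int batch_size)
{
    assert(lda >= m && incx > 0 && incy > 0);
    for (int i = 0; i < batch_size; ++i) {
        /* Matrix and vectors of batch entry i sit at fixed offsets. */
        const float *A = a + (size_t)i * stridea;
        const float *X = x + (size_t)i * stridex;
        float *Y = y + (size_t)i * stridey;
        for (int r = 0; r < m; ++r) {
            float acc = 0.0f;
            for (int c = 0; c < n; ++c)
                acc += A[r + (size_t)c * lda] * X[(size_t)c * incx];
            /* When beta is zero, the old value of Y is not read. */
            float old = (beta == 0.0f) ? 0.0f : beta * Y[(size_t)r * incy];
            Y[(size_t)r * incy] = alpha * acc + old;
        }
    }
}
```

A routine like this is mainly useful for checking the results of a batched call on small inputs, because it makes the role of each stride explicit.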
Input Parameters
a Array holding all the input matrices A. Must be of size at least lda*k + stridea
* (batch_size - 1), where k is n if column major layout is used or m if row
major layout is used.
lda Specifies the leading dimension of the matrices A. It must be positive and at
least m if column major layout is used or at least n if row major layout is
used.
incx Stride between two consecutive elements of the x vectors. Must not be
zero.
y Array holding all the input vectors y. Must be of size at least batch_size *
stridey.
incy Stride between two consecutive elements of the y vectors. Must not be
zero.
Output Parameters
cblas_?gemv_batch
Computes groups of matrix-vector products with
general matrices.
Syntax
void cblas_sgemv_batch (const CBLAS_LAYOUT layout, const CBLAS_TRANSPOSE *trans_array,
const MKL_INT *m_array, const MKL_INT *n_array, const float *alpha_array, const float
**a_array, const MKL_INT *lda_array, const float **x_array, const MKL_INT *incx_array,
const float *beta_array, float **y_array, const MKL_INT *incy_array, const MKL_INT
group_count, const MKL_INT *group_size);
void cblas_dgemv_batch (const CBLAS_LAYOUT layout, const CBLAS_TRANSPOSE *trans_array,
const MKL_INT *m_array, const MKL_INT *n_array, const double *alpha_array, const double
**a_array, const MKL_INT *lda_array, const double **x_array, const MKL_INT *incx_array,
const double *beta_array, double **y_array, const MKL_INT *incy_array, const MKL_INT
group_count, const MKL_INT *group_size);
void cblas_cgemv_batch (const CBLAS_LAYOUT layout, const CBLAS_TRANSPOSE *trans_array,
const MKL_INT *m_array, const MKL_INT *n_array, const void *alpha_array, const void
**a_array, const MKL_INT *lda_array, const void **x_array, const MKL_INT *incx_array,
const void *beta_array, void **y_array, const MKL_INT *incy_array, const MKL_INT
group_count, const MKL_INT *group_size);
void cblas_zgemv_batch (const CBLAS_LAYOUT layout, const CBLAS_TRANSPOSE *trans_array,
const MKL_INT *m_array, const MKL_INT *n_array, const void *alpha_array, const void
**a_array, const MKL_INT *lda_array, const void **x_array, const MKL_INT *incx_array,
const void *beta_array, void **y_array, const MKL_INT *incy_array, const MKL_INT
group_count, const MKL_INT *group_size);
Include Files
• mkl.h
Description
The cblas_?gemv_batch routines perform a series of matrix-vector products, each added to a scaled vector.
They are similar to their cblas_?gemv counterparts, but the cblas_?gemv_batch routines perform
matrix-vector operations with groups of matrices and vectors.
Each group contains matrices and vectors with the same parameters (size, increments). The operation is
defined as:
idx = 0
For i = 0 … group_count – 1
trans, m, n, alpha, lda, incx, beta, incy and group_size at position i in trans_array,
m_array, n_array, alpha_array, lda_array, incx_array, beta_array, incy_array and group_size_array
for j = 0 … group_size – 1
a is a matrix of size mxn at position idx in a_array
x and y are vectors of size m or n depending on trans, at position idx in x_array and
y_array
y := alpha * op(a) * x + beta * y
idx := idx + 1
end for
end for
The number of entries in a_array, x_array, and y_array is total_batch_count = the sum of all of the
group_size entries.
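The group indexing can be illustrated with a plain C reference routine. This sketch is not the library implementation. It assumes single precision, column-major layout, op(a) = a for every group, and positive increments. The name gemv_batch_ref is hypothetical. The return value is total_batch_count, the number of entries consumed from a_array, x_array, and y_array.

```c
#include <assert.h>
#include <stddef.h>

/* Walks the groups in order. Every problem in group i shares the
   parameters stored at position i of the *_array arguments, while idx
   runs over all problems of all groups. */
int gemv_batch_ref(const int *m_array, const int *n_array,
                   const float *alpha_array,
                   const float **a_array, const int *lda_array,
                   const float **x_array, const int *incx_array,
                   const float *beta_array,
                   float **y_array, const int *incy_array,
                   int group_count, const int *group_size)
{
    int idx = 0;
    for (int i = 0; i < group_count; ++i) {
        int m = m_array[i], n = n_array[i], lda = lda_array[i];
        int incx = incx_array[i], incy = incy_array[i];
        float alpha = alpha_array[i], beta = beta_array[i];
        assert(lda >= m && incx > 0 && incy > 0);
        for (int j = 0; j < group_size[i]; ++j, ++idx) {
            const float *a = a_array[idx];
            const float *x = x_array[idx];
            float *y = y_array[idx];
            for (int r = 0; r < m; ++r) {
                float acc = 0.0f;
                for (int c = 0; c < n; ++c)
                    acc += a[r + (size_t)c * lda] * x[(size_t)c * incx];
                float old = (beta == 0.0f) ? 0.0f : beta * y[(size_t)r * incy];
                y[(size_t)r * incy] = alpha * acc + old;
            }
        }
    }
    return idx; /* equals the sum of all group_size entries */
}
```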
Input Parameters
trans_array Array of size group_count. For the group i, trans_i = trans_array[i] specifies
the transposition operation applied to A.
if trans_i = CblasNoTrans, then op(A) = A;
if trans_i = CblasTrans, then op(A) = A';
if trans_i = CblasConjTrans, then op(A) = conjg(A').
m_array Array of size group_count. For the group i, m_i = m_array[i] is the number
of rows of the matrix A.
n_array Array of size group_count. For the group i, n_i = n_array[i] is the number of
columns in the matrix A.
alpha_array Array of size group_count. For the group i, alpha_i = alpha_array[i] is the
scalar alpha.
lda_array Array of size group_count. For the group i, lda_i = lda_array[i] is the leading
dimension of the matrix A. It must be positive and at least m_i if column
major layout is used or at least n_i if row major layout is used.
incx_array Array of size group_count. For the group i, incx_i = incx_array[i] is the stride
of vector x. Must not be zero.
beta_array Array of size group_count. For the group i, beta_i = beta_array[i] is the
scalar beta.
incy_array Array of size group_count. For the group i, incy_i = incy_array[i] is the stride
of vector y. Must not be zero.
Output Parameters
cblas_?dgmm_batch_strided
Computes groups of products of a diagonal matrix and a
general matrix.
Syntax
void cblas_sdgmm_batch_strided (const CBLAS_LAYOUT layout, const CBLAS_SIDE left_right,
const MKL_INT m, const MKL_INT n, const float *a, const MKL_INT lda, const MKL_INT
stridea, const float *x, const MKL_INT incx, const MKL_INT stridex, const float *c,
const MKL_INT ldc, const MKL_INT stridec, const MKL_INT batch_size);
void cblas_ddgmm_batch_strided (const CBLAS_LAYOUT layout, const CBLAS_SIDE left_right,
const MKL_INT m, const MKL_INT n, const double *a, const MKL_INT lda, const MKL_INT
stridea, const double *x, const MKL_INT incx, const MKL_INT stridex, const double *c,
const MKL_INT ldc, const MKL_INT stridec, const MKL_INT batch_size);
void cblas_cdgmm_batch_strided (const CBLAS_LAYOUT layout, const CBLAS_SIDE left_right,
const MKL_INT m, const MKL_INT n, const void *a, const MKL_INT lda, const MKL_INT
stridea, const void *x, const MKL_INT incx, const MKL_INT stridex, const void *c, const
MKL_INT ldc, const MKL_INT stridec, const MKL_INT batch_size);
void cblas_zdgmm_batch_strided (const CBLAS_LAYOUT layout, const CBLAS_SIDE left_right,
const MKL_INT m, const MKL_INT n, const void *a, const MKL_INT lda, const MKL_INT
stridea, const void *x, const MKL_INT incx, const MKL_INT stridex, const void *c, const
MKL_INT ldc, const MKL_INT stridec, const MKL_INT batch_size);
Include Files
• mkl.h
Description
The cblas_?dgmm_batch_strided routines perform a series of diagonal matrix-matrix products. The
diagonal matrices are stored as dense vectors and the operations are performed with groups of matrices and
vectors.
All matrices a and c and vectors x have the same parameters (size, increments) and are stored at constant
strides, given by stridea, stridec, and stridex respectively. The operation is defined as:
for i = 0 … batch_size – 1
A and C are matrices at offset i * stridea in a and i * stridec in c
X is a vector at offset i * stridex in x
C = diag(X) * A or C = A * diag(X)
end for
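As a plain C illustration of the loop above: the sketch below is not the library implementation. It assumes single precision, column-major layout, and a positive incx. The name dgmm_batch_strided_ref and the integer flag left (nonzero for C = diag(X) * A, zero for C = A * diag(X)) are hypothetical stand-ins for the routine and its left_right argument.

```c
#include <assert.h>
#include <stddef.h>

/* Reference loop for the strided-batch diagonal product.
   diag(X) * A scales row r of A by X[r];
   A * diag(X) scales column c of A by X[c]. */
void dgmm_batch_strided_ref(int left, int m, int n,
                            const float *a, int lda, size_t stridea,
                            const float *x, int incx, size_t stridex,
                            float *c, int ldc, size_t stridec,
                            int batch_size)
{
    assert(lda >= m && ldc >= m && incx > 0);
    for (int i = 0; i < batch_size; ++i) {
        const float *A = a + (size_t)i * stridea;
        const float *X = x + (size_t)i * stridex;
        float *C = c + (size_t)i * stridec;
        for (int col = 0; col < n; ++col) {
            for (int row = 0; row < m; ++row) {
                float s = X[(size_t)(left ? row : col) * incx];
                C[row + (size_t)col * ldc] = s * A[row + (size_t)col * lda];
            }
        }
    }
}
```

The diagonal matrix is never formed explicitly. Only one element of X is read per output element, which is why X needs m elements on the left and n elements on the right.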
Input Parameters
left_right Specifies the position of the diagonal matrix in the matrix product
if left_right = CblasLeft, then C = diag(X) * A;
if left_right = CblasRight, then C = A * diag(X).
a Array holding all the input matrices A. Must be of size at least lda*k + stridea
* (batch_size - 1), where k is n if column major layout is used or m if row
major layout is used.
lda Specifies the leading dimension of the matrices A. It must be positive and at
least m if column major layout is used or at least n if row major layout is
used.
x Array holding all the input vectors x. Must be of size at least (1 + (len
- 1)*abs(incx)) + stridex * (batch_size - 1), where len is n if the diagonal
matrix is on the right of the product or m otherwise.
c Array holding all the matrices C. Must be of size at least batch_size *
stridec.
ldc Specifies the leading dimension of the matrices C. It must be positive and at
least m if column major layout is used or at least n if row major layout is
used.
stridec Stride between two consecutive C matrices. Must be at least ldc * n if
column major layout is used or ldc * m if row major layout is used.
Output Parameters
cblas_?dgmm_batch
Computes groups of products of a diagonal matrix and a
general matrix.
Syntax
void cblas_sdgmm_batch (const CBLAS_LAYOUT layout, const CBLAS_SIDE *left_right_array,
const MKL_INT *m_array, const MKL_INT *n_array, const float **a_array, const MKL_INT
*lda_array, const float **x_array, const MKL_INT *incx_array, float **c_array, const
MKL_INT *ldc_array, const MKL_INT group_count, const MKL_INT *group_size);
void cblas_ddgmm_batch (const CBLAS_LAYOUT layout, const CBLAS_SIDE *left_right_array,
const MKL_INT *m_array, const MKL_INT *n_array, const double **a_array, const MKL_INT
*lda_array, const double **x_array, const MKL_INT *incx_array, double **c_array, const
MKL_INT *ldc_array, const MKL_INT group_count, const MKL_INT *group_size);
void cblas_cdgmm_batch (const CBLAS_LAYOUT layout, const CBLAS_SIDE *left_right_array,
const MKL_INT *m_array, const MKL_INT *n_array, const void **a_array, const MKL_INT
*lda_array, const void **x_array, const MKL_INT *incx_array, void **c_array, const
MKL_INT *ldc_array, const MKL_INT group_count, const MKL_INT *group_size);
void cblas_zdgmm_batch (const CBLAS_LAYOUT layout, const CBLAS_SIDE *left_right_array,
const MKL_INT *m_array, const MKL_INT *n_array, const void **a_array, const MKL_INT
*lda_array, const void **x_array, const MKL_INT *incx_array, void **c_array, const
MKL_INT *ldc_array, const MKL_INT group_count, const MKL_INT *group_size);
Include Files
• mkl.h
Description
The cblas_?dgmm_batch routines perform a series of diagonal matrix-matrix products. The diagonal matrices
are stored as dense vectors and the operations are performed with groups of matrices and vectors.
Each group contains matrices and vectors with the same parameters (size, increments). The operation is
defined as:
idx = 0
For i = 0 … group_count – 1
left_right, m, n, lda, incx, ldc and group_size at position i in left_right_array, m_array,
n_array, lda_array, incx_array, ldc_array and group_size_array
for j = 0 … group_size – 1
a and c are matrices of size mxn at position idx in a_array and c_array
x is a vector of size m or n depending on left_right, at position idx in x_array
if (left_right == CblasLeft) c := diag(x) * a
else c := a * diag(x)
idx := idx + 1
end for
end for
The number of entries in a_array, x_array, and c_array is total_batch_count = the sum of all of the
group_size entries.
Input Parameters
m_array Array of size group_count. For the group i, m_i = m_array[i] is the number
of rows of the matrices A and C.
n_array Array of size group_count. For the group i, n_i = n_array[i] is the number of
columns in the matrices A and C.
lda_array Array of size group_count. For the group i, lda_i = lda_array[i] is the leading
dimension of the matrix A. It must be positive and at least m_i if column
major layout is used or at least n_i if row major layout is used.
incx_array Array of size group_count. For the group i, incx_i = incx_array[i] is the stride
of vector x.
ldc_array Array of size group_count. For the group i, ldc_i = ldc_array[i] is the leading
dimension of the matrix C. It must be positive and at least m_i if column
major layout is used or at least n_i if row major layout is used.
Output Parameters
mkl_jit_create_?gemm
Create a GEMM kernel that computes a scalar-matrix-
matrix product and adds the result to a scalar-matrix
product.
Syntax
mkl_jit_status_t mkl_jit_create_sgemm(void** jitter, const MKL_LAYOUT layout, const
MKL_TRANSPOSE transa, const MKL_TRANSPOSE transb, const MKL_INT m, const MKL_INT n,
const MKL_INT k, const float alpha, const MKL_INT lda, const MKL_INT ldb, const float
beta, const MKL_INT ldc);
mkl_jit_status_t mkl_jit_create_dgemm(void** jitter, const MKL_LAYOUT layout, const
MKL_TRANSPOSE transa, const MKL_TRANSPOSE transb, const MKL_INT m, const MKL_INT n,
const MKL_INT k, const double alpha, const MKL_INT lda, const MKL_INT ldb, const double
beta, const MKL_INT ldc);
mkl_jit_status_t mkl_jit_create_cgemm(void** jitter, const MKL_LAYOUT layout, const
MKL_TRANSPOSE transa, const MKL_TRANSPOSE transb, const MKL_INT m, const MKL_INT n,
const MKL_INT k, const void* alpha, const MKL_INT lda, const MKL_INT ldb, const void*
beta, const MKL_INT ldc);
mkl_jit_status_t mkl_jit_create_zgemm(void** jitter, const MKL_LAYOUT layout, const
MKL_TRANSPOSE transa, const MKL_TRANSPOSE transb, const MKL_INT m, const MKL_INT n,
const MKL_INT k, const void* alpha, const MKL_INT lda, const MKL_INT ldb, const void*
beta, const MKL_INT ldc);
Include Files
• mkl.h
Description
The mkl_jit_create_?gemm functions belong to a set of related routines that enable use of just-in-time
code generation.
The mkl_jit_create_?gemm functions create a handle to a just-in-time code generator (a jitter) and
generate a GEMM kernel that computes a scalar-matrix-matrix product and adds the result to a scalar-matrix
product, with general matrices. The operation of the generated GEMM kernel is defined as follows:
C := alpha*op(A)*op(B) + beta*C
Where:
• A, B, and C are matrices
• op(A) is an m-by-k matrix
• op(B) is a k-by-n matrix
• C is an m-by-n matrix
NOTE
Generating a new kernel with mkl_jit_create_?gemm involves moderate runtime overhead.
To benefit from JIT code generation, use this feature when you need to call the generated
kernel many times (for example, several hundred calls).
Input Parameters
transa Specifies the form of op(A) used in the generated matrix multiplication:
transb Specifies the form of op(B) used in the generated matrix multiplication:
m Specifies the number of rows of the matrix op(A) and of the matrix C. The
value of m must be at least zero.
n Specifies the number of columns of the matrix op(B) and of the matrix C.
The value of n must be at least zero.
k Specifies the number of columns of the matrix op(A) and the number of
rows of the matrix op(B). The value of k must be at least zero.
NOTE
alpha is passed by pointer for mkl_jit_create_cgemm and
mkl_jit_create_zgemm.
                       transb=MKL_NOTRANS              transb=MKL_TRANS or transb=MKL_CONJTRANS
layout=MKL_ROW_MAJOR   ldb must be at least max(1,n)   ldb must be at least max(1,k)
layout=MKL_COL_MAJOR   ldb must be at least max(1,k)   ldb must be at least max(1,n)
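The ldb requirements in the table above can be expressed as a small helper. This is a sketch for validating arguments before a kernel is created. The enum types and the function min_ldb are hypothetical and are not part of the oneMKL API. They are defined locally so that the example compiles without the library headers.

```c
#include <assert.h>

/* Local stand-ins for the layout and transposition arguments. */
typedef enum { SKETCH_ROW_MAJOR, SKETCH_COL_MAJOR } sketch_layout_t;
typedef enum { SKETCH_NOTRANS, SKETCH_TRANS, SKETCH_CONJTRANS } sketch_trans_t;

static int max_int(int a, int b) { return a > b ? a : b; }

/* Smallest ldb allowed by the table for a k-by-n matrix op(B). */
int min_ldb(sketch_layout_t layout, sketch_trans_t transb, int k, int n)
{
    assert(k >= 0 && n >= 0);
    int notrans = (transb == SKETCH_NOTRANS);
    if (layout == SKETCH_ROW_MAJOR)
        return max_int(1, notrans ? n : k);
    return max_int(1, notrans ? k : n);
}
```

The max(1, ...) term keeps the leading dimension positive even when k or n is zero.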
NOTE
beta is passed by pointer for mkl_jit_create_cgemm and
mkl_jit_create_zgemm.
Output Parameters
Return Values
mkl_jit_get_?gemm_ptr
Return the GEMM kernel associated with a jitter
previously created with mkl_jit_create_?gemm.
Syntax
sgemm_jit_kernel_t mkl_jit_get_sgemm_ptr(const void* jitter);
dgemm_jit_kernel_t mkl_jit_get_dgemm_ptr(const void* jitter);
cgemm_jit_kernel_t mkl_jit_get_cgemm_ptr(const void* jitter);
zgemm_jit_kernel_t mkl_jit_get_zgemm_ptr(const void* jitter);
Include Files
• mkl.h
Description
The mkl_jit_get_?gemm_ptr functions belong to a set of related routines that enable use of just-in-time
code generation.
The mkl_jit_get_?gemm_ptr functions take as input a jitter previously created with
mkl_jit_create_?gemm, and return the GEMM kernel associated with that jitter. The returned GEMM kernel
computes a scalar-matrix-matrix product and adds the result to a scalar-matrix product, with general
matrices. The operation is defined as follows:
C := alpha*op(A)*op(B) + beta*C
Where:
NOTE
Generating a new kernel with mkl_jit_create_?gemm involves moderate runtime overhead.
To benefit from JIT code generation, use this feature when you need to call the generated
kernel many times (for example, several hundred calls).
Input Parameter
Return Values
typedef void(*dgemm_jit_kernel_t)(void*,double*,double*,double*);
• cgemm_jit_kernel_t – A function pointer type expecting four
inputs of type void*, MKL_Complex8*, MKL_Complex8*, and
MKL_Complex8*
typedef void(*cgemm_jit_kernel_t)(void*,MKL_Complex8*,MKL_Complex8*,MKL_Complex8*);
typedef void(*zgemm_jit_kernel_t)(void*,MKL_Complex16*,MKL_Complex16*,MKL_Complex16*);
If the jitter input is not NULL, returns a function pointer to a GEMM
kernel. The GEMM kernel is called with four parameters: the jitter and
the three matrices a, b, and c. Otherwise, returns NULL.
If layout, transa, transb, m, n, k, lda, ldb, and ldc are the parameters used during the creation of the
input jitter, then:
a layout = MKL_COL_MAJOR layout = MKL_ROW_MAJOR
b layout = MKL_COL_MAJOR layout = MKL_ROW_MAJOR
c layout = MKL_COL_MAJOR layout = MKL_ROW_MAJOR
mkl_jit_destroy
Delete the jitter previously created with
mkl_jit_create_?gemm as well as the GEMM kernel
that it contains.
Syntax
mkl_jit_status_t mkl_jit_destroy (void* jitter);
Include Files
• mkl.h
Description
The mkl_jit_destroy function belongs to a set of related routines that enable use of just-in-time code
generation.
The mkl_jit_destroy function takes as input a jitter previously created with mkl_jit_create_?gemm and
deletes the jitter as well as the GEMM kernel that it contains.
NOTE
Generating a new kernel with mkl_jit_create_?gemm involves moderate runtime overhead.
To benefit from JIT code generation, use this feature when you need to call the generated
kernel many times (for example, several hundred calls).
Input Parameter
Return Values
—or—
• MKL_JIT_SUCCESS if the jitter has been successfully destroyed
LAPACK Routines
Intel® oneAPI Math Kernel Library (oneMKL) implements routines from the LAPACK package that are used for
solving systems of linear equations, linear least squares problems, eigenvalue and singular value problems,
and performing a number of related computational tasks. The library includes LAPACK routines for both real
and complex data. Routines are supported for systems of equations with the following types of matrices:
• General
• Banded
• Symmetric or Hermitian positive-definite (full, packed, and rectangular full packed (RFP) storage)
• Symmetric or Hermitian positive-definite banded
• Symmetric or Hermitian indefinite (both full and packed storage)
• Symmetric or Hermitian indefinite banded
• Triangular (full, packed, and RFP storage)
• Triangular banded
• Tridiagonal
• Diagonally dominant tridiagonal.
NOTE
Different arrays used as parameters to Intel® MKL LAPACK routines must not overlap.
Warning
LAPACK routines assume that input matrices do not contain IEEE 754 special values such as INF or
NaN values. Using these special values may cause LAPACK to return unexpected results or become
unstable.
Function Prototypes
Intel® oneAPI Math Kernel Library (oneMKL) supports four distinct floating-point precisions. Each
corresponding prototype looks similar, usually differing only in the data type. C interface LAPACK function
names follow the form <?><name>[_64], where <?> is:
On 64-bit platforms, Intel® oneAPI Math Kernel Library (oneMKL) provides LAPACK C interfaces with the _64
suffix to support large data arrays in the LP64 interface library. For more interface library details, see "Using
the ILP64 Interface vs. LP64 Interface" in the developer guide.
A specific example follows. To solve a system of linear equations with a packed Cholesky-factored Hermitian
positive-definite matrix with complex precision, use the following:
lapack_int LAPACKE_cpptrs(int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const lapack_complex_float* ap, lapack_complex_float* b, lapack_int ldb);
For matrices whose dimensions are greater than 2^31 - 1, you can use either LAPACKE_cpptrs in the ILP64
interface library or LAPACKE_cpptrs_64 in the LP64 interface library.
Workspace Arrays
In contrast to the Fortran interface, the LAPACK C interface omits workspace parameters because workspace
is allocated during runtime and released upon completion of the function operation.
If you prefer to allocate workspace arrays yourself, the LAPACK C interface provides alternate interfaces with
work parameters. The name of the alternate interface is the same as the LAPACK C interface with _work
appended. For example, the syntax for the singular value decomposition of a real bidiagonal matrix is:
See the install_dir/include/mkl_lapacke.h file for the full list of alternative C LAPACK interfaces.
The Intel® oneAPI Math Kernel Library (oneMKL) Fortran-specific documentation contains details about
workspace arrays.
INTEGER lapack_int
LOGICAL lapack_logical
REAL float
COMPLEX lapack_complex_float
CHARACTER char
C Type Definitions
You can find type definitions specific to Intel® oneAPI Math Kernel Library (oneMKL) such as MKL_INT,
MKL_Complex8, and MKL_Complex16 in install_dir/mkl_types.h.
#ifndef lapack_logical
#define lapack_logical lapack_int
#endif
#ifndef lapack_complex_float
#define lapack_complex_float MKL_Complex8
#endif
Complex type for double precision:
#ifndef lapack_complex_double
#define lapack_complex_double MKL_Complex16
#endif
Column Major Layout
In column major layout the first index, i, of matrix elements a_{i,j} changes faster than the second index when
accessing sequential memory locations. In other words, for 1 ≤ i < M, if the element a_{i,j} is stored in a specific
location in memory, the element a_{i+1,j} is stored in the next location, and, for 1 ≤ j < N, the element a_{M,j} is
stored in the location previous to element a_{1,j+1}. So the matrix elements are located in memory according to
this sequence:
{a_{1,1}, a_{2,1}, ..., a_{M,1}, a_{1,2}, a_{2,2}, ..., a_{M,2}, ..., a_{1,N}, a_{2,N}, ..., a_{M,N}}
B is formed from rows with indices i_0 + 1 to i_0 + K and columns j_0 + 1 to j_0 + L of matrix A. To specify matrix
B, LAPACK routines require four parameters:
• the number of rows K;
• the number of columns L;
• a pointer to the start of the array containing elements of B;
• the leading dimension of the array containing elements of B.
The leading dimension depends on the layout of the matrix:
• Column major layout
Leading dimension ldb=M, the number of rows of matrix A.
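For column major layout, the four parameters can be derived from the parent array as sketched below. This is an illustration in plain C rather than a library routine. It assumes double precision and that A is stored without padding, so its leading dimension equals M. The names submatrix_col_major and submatrix_get are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Pointer to the first element of submatrix B, which starts at row i0 + 1
   and column j0 + 1 of the M-row column-major matrix stored in a. */
const double *submatrix_col_major(const double *a, int M, int i0, int j0)
{
    assert(M > 0 && i0 >= 0 && i0 < M && j0 >= 0);
    return a + i0 + (size_t)j0 * M;
}

/* Element in zero-based row r and column c of B. The leading dimension
   ldb is the row count M of the parent matrix, not the row count K of B. */
double submatrix_get(const double *b, int ldb, int r, int c)
{
    return b[r + (size_t)c * ldb];
}
```

Because ldb stays equal to M, stepping from one column of B to the next skips over the rows of A that lie outside the submatrix.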
Full Storage
Consider an m-by-n matrix A:
\[
A = \begin{bmatrix}
a_{1,1} & a_{1,2} & a_{1,3} & \cdots & a_{1,n} \\
a_{2,1} & a_{2,2} & a_{2,3} & \cdots & a_{2,n} \\
a_{3,1} & a_{3,2} & a_{3,3} & \cdots & a_{3,n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_{m,1} & a_{m,2} & a_{m,3} & \cdots & a_{m,n}
\end{bmatrix}
\]
It is stored in a one-dimensional array a of length at least lda*n for column major layout or m*lda for row
major layout. Element a_{i,j} is stored as array element a[k] where the mapping of k(i, j) is defined as
NOTE
Although LAPACK accepts parameter values of zero for matrix size, in general the size of the array
used to store an m-by-n matrix A with leading dimension lda should be greater than or equal to
max(1, n*lda) for column major layout and max (1, m*lda) for row major layout.
NOTE
Even though the array used to store a matrix is one-dimensional, for simplicity the documentation
sometimes refers to parts of the array such as rows, columns, upper and lower triangular parts, and
diagonals. These refer to the parts of the matrix stored within the array. For example, the lower
triangle of array a is defined as the subset of elements a[k(i,j)] with i ≥ j.
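The full-storage mapping k(i, j) can be sketched in C as follows. The mapping table itself is not reproduced in this section, so the formulas here are an assumption: they are the conventional ones for one-based matrix indices i, j and a zero-based C array. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Column major: consecutive rows of one column are adjacent in memory.
   Assumed mapping: k = (i - 1) + (j - 1)*lda, with lda >= m. */
size_t full_index_col_major(int i, int j, int lda)
{
    assert(i >= 1 && j >= 1 && lda >= i);
    return (size_t)(i - 1) + (size_t)(j - 1) * lda;
}

/* Row major: consecutive columns of one row are adjacent in memory.
   Assumed mapping: k = (i - 1)*lda + (j - 1), with lda >= n. */
size_t full_index_row_major(int i, int j, int lda)
{
    assert(i >= 1 && j >= 1 && lda >= j);
    return (size_t)(i - 1) * lda + (size_t)(j - 1);
}
```

With these mappings the largest index used is lda*n - 1 for column major layout and m*lda - 1 for row major layout, which matches the array lengths stated above.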
Packed Storage
The packed storage format compactly stores matrix elements when only one part of the matrix, the upper or
lower triangle, is necessary to determine all of the elements of the matrix. This is the case when the matrix
is upper triangular, lower triangular, symmetric, or Hermitian. For an n-by-n matrix of one of these types, a
linear array ap of length n*(n + 1)/2 is adequate. Two parameters define the storage scheme:
matrix_layout, which specifies column major (with the value LAPACK_COL_MAJOR) or row major (with the
value LAPACK_ROW_MAJOR) matrix layout, and uplo, which specifies that the upper triangle (with the value
'U') or the lower triangle (with the value 'L') is stored.
Element a_{i,j} is stored as array element a[k] where the mapping of k(i, j) is defined as
NOTE
Although LAPACK accepts parameter values of zero for matrix size, in general the size of the array
should be greater than or equal to max(1, n*(n + 1)/2).
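A packed index mapping can be sketched in C as follows. The mapping table is not reproduced in this section, so the formulas below are an assumption: they follow the conventional LAPACK column-major packed scheme, with one-based i, j and a zero-based array index. Only the column major case is shown, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Upper triangle, column major: column j holds rows 1..j, and the columns
   are stored one after another. Valid for 1 <= i <= j. */
size_t packed_index_col_upper(int i, int j)
{
    assert(i >= 1 && i <= j);
    return (size_t)(i - 1) + (size_t)j * (j - 1) / 2;
}

/* Lower triangle, column major: column j holds rows j..n.
   Valid for 1 <= j <= i <= n. */
size_t packed_index_col_lower(int i, int j, int n)
{
    assert(j >= 1 && j <= i && i <= n);
    return (size_t)(i - 1) + (size_t)(j - 1) * (2 * n - j) / 2;
}
```

For n = 3 both mappings cover exactly the indices 0 through 5, which agrees with the length n*(n + 1)/2 = 6 given for the array ap.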
Band Storage
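When the non-zero elements of a matrix are confined to diagonal bands, it is possible to store the elements
more efficiently using band storage. For example, consider an m-by-n band matrix A with kl subdiagonals
and ku superdiagonals (written k_l and k_u in the matrix):
\[
A = \begin{bmatrix}
a_{1,1} & a_{1,2} & \cdots & a_{1,k_u+1} & & & \\
\vdots & \vdots & \ddots & \ddots & \ddots & & \\
a_{k_l+1,1} & a_{k_l+1,2} & \cdots & a_{k_l+1,k_u+1} & \ddots & a_{k_l+1,k_l+k_u+1} & \\
 & a_{k_l+2,2} & \ddots & \ddots & \ddots & \ddots & \ddots \\
 & & \ddots & \ddots & \ddots & \ddots & \ddots \\
 & & a_{k_l+j,j} & a_{k_l+j,j+1} & \cdots & \cdots & a_{k_l+j,k_l+k_u+j} \\
 & & & \ddots & \ddots & \ddots & \ddots
\end{bmatrix}
\]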
This matrix can be stored compactly in a one dimensional array ab. There are two operations involved in
storing the matrix: packing the band matrix into matrix AB, and converting the packed matrix to a one-
dimensional array.
• Packing the Band Matrix: How the band matrix is packed depends on the matrix layout.
• Column major layout: matrix A is packed in an ldab-by-n matrix AB column-wise so that the diagonals
of A become rows of array AB.
NOTE
For both column major and row major layout, elements of the upper left triangle of AB are not used.
Depending on the relationship of the dimensions m, n, kl, and ku, the lower right triangle might not be
used.
• Converting the Packed Matrix to a One-Dimensional Array: The packed matrix AB is stored in a linear
array ab as described in Full Storage. The size of ab should be greater than or equal to the total number
of elements of matrix AB: ldab*n for column major layout or ldab*m for row major layout. The leading
dimension of ab, ldab, must be greater than or equal to kl + ku + 1 (and some routines require it to be
even larger).
Element a_{i,j} is stored as array element a[k(i, j)] where the mapping of k(i, j) is defined as
• column major layout: k(i, j) = i + ku - j + (j - 1)*ldab; 1 ≤ j ≤ n, max(1, j - ku) ≤ i ≤ min(m, j + kl)
• row major layout: k(i, j) = j - i + kl + (i - 1)*(kl + ku + 1); 1 ≤ i ≤ m, max(1, i - kl) ≤ j ≤ min(n, i + ku)
NOTE
Although LAPACK accepts parameter values of zero for matrix size, in general the size of the array
should be greater than or equal to max(1, n*ldab) for column major layout and max(1, m*ldab) for
row major layout.
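The column major mapping for band storage can be sketched in plain C. This is only an illustration: the helper names band_index_col_major and pack_band_col_major are hypothetical and are not part of oneMKL, and plain int is assumed in place of lapack_int. Indices i and j are 1-based, as in the formula, while the returned position k is a 0-based C index into ab.

```c
#include <assert.h>
#include <stddef.h>

/* Position in ab of band element a(i, j), column major band storage.
 * i and j are 1-based; the result is a 0-based index into ab. */
size_t band_index_col_major(int i, int j, int kl, int ku, int ldab)
{
    /* ldab must hold the whole band, and (i, j) must lie inside the band. */
    assert(ldab >= kl + ku + 1);
    assert(i >= 1 && j >= 1 && i >= j - ku && i <= j + kl);
    return (size_t)(i + ku - j) + (size_t)(j - 1) * (size_t)ldab;
}

/* Copy the band of a dense column major m-by-n matrix a (leading dimension lda)
 * into ab. Entries of ab outside the band are left untouched. */
void pack_band_col_major(int m, int n, int kl, int ku,
                         const double *a, int lda, double *ab, int ldab)
{
    for (int j = 1; j <= n; ++j) {
        int i_lo = (j - ku > 1) ? j - ku : 1;   /* max(1, j - ku) */
        int i_hi = (j + kl < m) ? j + kl : m;   /* min(m, j + kl) */
        for (int i = i_lo; i <= i_hi; ++i)
            ab[band_index_col_major(i, j, kl, ku, ldab)] =
                a[(size_t)(i - 1) + (size_t)(j - 1) * (size_t)lda];
    }
}
```

For a 3-by-3 tridiagonal matrix (kl = ku = 1, ldab = 3), the first and last entries of ab are never written, which matches the note about the unused upper left and lower right triangles of AB.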
Rectangular Full Packed Storage
A combination of full and packed storage, rectangular full packed storage can be used to store the upper or
lower triangle of a matrix which is upper triangular, lower triangular, symmetric, or Hermitian. It offers the
storage savings of packed storage plus the efficiency of using full storage Level 3 BLAS and LAPACK routines.
Three parameters define the storage scheme: matrix_layout, which specifies column major (with the value
LAPACK_COL_MAJOR) or row major (with the value LAPACK_ROW_MAJOR) matrix layout; uplo, which specifies
that the upper triangle (with the value 'U') or the lower triangle (with the value 'L') is stored; and transr,
which specifies normal (with the value 'N'), transpose (with the value 'T'), or conjugate transpose (with the
value 'C') operation on the matrix.
Consider an N-by-N matrix A:
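\[
A = \begin{bmatrix}
a_{0,0} & a_{0,1} & a_{0,2} & \cdots & a_{0,N-1} \\
a_{1,0} & a_{1,1} & a_{1,2} & \cdots & a_{1,N-1} \\
a_{2,0} & a_{2,1} & a_{2,2} & \cdots & a_{2,N-1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_{N-1,0} & a_{N-1,1} & a_{N-1,2} & \cdots & a_{N-1,N-1}
\end{bmatrix}
\]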
The upper or lower triangle of A can be stored in the array ap of length N*(N + 1)/2.
Additionally, define k as the integer part of N/2, such that N=2*k if N is even, and N=2*k + 1 if N is odd.
Storing the matrix involves packing the matrix into a rectangular matrix, and then storing the matrix in a
one-dimensional array. The size of rectangular matrix AP required for the N-by-N matrix A is N + 1 by N/2 for
even N, and N by (N + 1)/2 for odd N.
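As a quick check of these sizes, the following C sketch computes the dimensions of the rectangular matrix AP for a given N. The type rfp_shape and the function rfp_dimensions are hypothetical names used only for illustration; plain int is assumed in place of lapack_int.

```c
#include <assert.h>

/* Shape of the rectangular matrix AP that holds one triangle of an N-by-N matrix. */
typedef struct {
    int rows;
    int cols;
} rfp_shape;

rfp_shape rfp_dimensions(int n)
{
    rfp_shape s;
    assert(n >= 1);
    if (n % 2 == 0) {          /* even N: (N + 1) by N/2 */
        s.rows = n + 1;
        s.cols = n / 2;
    } else {                   /* odd N: N by (N + 1)/2 */
        s.rows = n;
        s.cols = (n + 1) / 2;
    }
    /* Either way the rectangle has exactly N*(N + 1)/2 entries. */
    assert(s.rows * s.cols == n * (n + 1) / 2);
    return s;
}
```

In both cases the rectangle holds exactly N*(N + 1)/2 elements, the same count as packed storage.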
These examples illustrate the rectangular full packed storage method.
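• Upper triangular - uplo = 'U'
Even N (N = 6), normal form (transr = 'N'): the elements of the upper triangle of A can be packed in a
matrix with the dimensions (N + 1)-by-(N/2) = 7 by 3:
\[
\begin{bmatrix}
a_{0,3} & a_{0,4} & a_{0,5} \\
a_{1,3} & a_{1,4} & a_{1,5} \\
a_{2,3} & a_{2,4} & a_{2,5} \\
a_{3,3} & a_{3,4} & a_{3,5} \\
a_{0,0} & a_{4,4} & a_{4,5} \\
a_{0,1} & a_{1,1} & a_{5,5} \\
a_{0,2} & a_{1,2} & a_{2,2}
\end{bmatrix}
\]
Even N (N = 6), transposed form (transr = 'T'): the elements of the upper triangle of A can be packed in
the transpose of this matrix, with the dimensions (N/2) by (N + 1) = 3 by 7.
Odd N (N = 5), normal form (transr = 'N'): the elements of the upper triangle of A can be packed in a
matrix with the dimensions (N)-by-((N + 1)/2) = 5 by 3:
\[
\begin{bmatrix}
a_{0,2} & a_{0,3} & a_{0,4} \\
a_{1,2} & a_{1,3} & a_{1,4} \\
a_{2,2} & a_{2,3} & a_{2,4} \\
a_{0,0} & a_{3,3} & a_{3,4} \\
a_{0,1} & a_{1,1} & a_{4,4}
\end{bmatrix}
\]
Odd N (N = 5), transposed form (transr = 'T'): the elements of the upper triangle of A can be packed in
the transpose of this matrix, with the dimensions ((N + 1)/2) by (N) = 3 by 5.
• Lower triangular - uplo = 'L'
Even N (N = 6), normal form (transr = 'N'): the elements of the lower triangle of A can be packed in a
matrix with the dimensions (N + 1)-by-(N/2) = 7 by 3:
\[
\begin{bmatrix}
a_{3,3} & a_{4,3} & a_{5,3} \\
a_{0,0} & a_{4,4} & a_{5,4} \\
a_{1,0} & a_{1,1} & a_{5,5} \\
a_{2,0} & a_{2,1} & a_{2,2} \\
a_{3,0} & a_{3,1} & a_{3,2} \\
a_{4,0} & a_{4,1} & a_{4,2} \\
a_{5,0} & a_{5,1} & a_{5,2}
\end{bmatrix}
\]
Even N (N = 6), transposed form (transr = 'T'): the elements of the lower triangle of A can be packed in
the transpose of this matrix, with the dimensions (N/2) by (N + 1) = 3 by 7.
Odd N (N = 5), normal form (transr = 'N'): the elements of the lower triangle of A can be packed in a
matrix with the dimensions (N)-by-((N + 1)/2) = 5 by 3:
\[
\begin{bmatrix}
a_{0,0} & a_{3,3} & a_{4,3} \\
a_{1,0} & a_{1,1} & a_{4,4} \\
a_{2,0} & a_{2,1} & a_{2,2} \\
a_{3,0} & a_{3,1} & a_{3,2} \\
a_{4,0} & a_{4,1} & a_{4,2}
\end{bmatrix}
\]
Odd N (N = 5), transposed form (transr = 'T'): the elements of the lower triangle of A can be packed in
the transpose of this matrix, with the dimensions ((N + 1)/2) by (N) = 3 by 5.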
The packed matrix AP can be stored using column major layout or row major layout.
NOTE
The matrix_layout and transr parameters can specify the same storage scheme: for example, the
storage scheme for matrix_layout = LAPACK_COL_MAJOR and transr = 'N' is the same as that for
matrix_layout = LAPACK_ROW_MAJOR and transr = 'T'.
Element ai,j is stored as array element ap[l] where the mapping of l(i, j) is defined in the following tables.
[Mapping tables not recovered. Column headings: transr, uplo, N, l(i, j), i, j.]
NOTE
Although LAPACK accepts parameter values of zero for matrix size, in general the size of the array
should be greater than or equal to max(1, N*(N + 1)/2).
σ_i Singular values of the matrix A. They are equal to the square roots of the
eigenvalues of A^H*A. (For more information, see Singular Value
Decomposition.)
Error Analysis
In practice, most computations are performed with rounding errors. Besides, you often need to solve a
system Ax = b, where the data (the elements of A and b) are not known exactly. Therefore, it is important
to understand how the data errors and rounding errors can affect the solution x.
Data perturbations. If x is the exact solution of Ax = b, and x + δx is the exact solution of a perturbed
problem (A + δA)(x + δx) = (b + δb), then this estimate, given up to linear terms of perturbations,
holds:
\[
\frac{\|\delta x\|}{\|x\|} \le \kappa(A) \left( \frac{\|\delta A\|}{\|A\|} + \frac{\|\delta b\|}{\|b\|} \right)
\]
In other words, relative errors in A or b may be amplified in the solution vector x by a factor
κ(A) = ||A|| ||A^-1||, called the condition number of A.
Rounding errors have the same effect as relative perturbations c(n)ε in the original data. Here ε is the
machine precision, defined as the smallest positive number x such that 1 + x > 1; and c(n) is a modest
function of the matrix order n. The corresponding solution error is
||δx||/||x|| ≤ c(n)κ(A)ε. (The value of c(n) is seldom greater than 10n.)
NOTE
Machine precision depends on the data type used. For example, it is usually defined in the float.h
file as FLT_EPSILON for the float datatype and DBL_EPSILON for the double datatype.
Thus, if your matrix A is ill-conditioned (that is, its condition number κ(A) is very large), then the error in
the solution x can also be large; you might even encounter a complete loss of precision. LAPACK provides
routines that allow you to estimate κ(A) (see Routines for Estimating the Condition Number) and also give
you a more precise estimate for the actual solution error (see Refining the Solution and Estimating Its Error).
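The rounding error bound c(n)κ(A)ε can be evaluated with a few lines of C to get a feel for how many digits survive. This is a rough sketch, not a library facility: the function name rounding_error_bound is hypothetical, and c(n) = 10n is assumed as a pessimistic choice based on the remark that c(n) is seldom greater than 10n.

```c
#include <assert.h>
#include <float.h>

/* Rough bound c(n)*kappa*eps on the relative error ||dx||/||x||.
 * Assumption: c(n) is taken to be 10*n, an upper guess rather than an exact value. */
double rounding_error_bound(int n, double kappa, double eps)
{
    assert(n >= 1 && kappa >= 1.0 && eps > 0.0);
    return 10.0 * (double)n * kappa * eps;
}
```

For example, with n = 100 and κ(A) = 1e6, rounding_error_bound(100, 1e6, DBL_EPSILON) is on the order of 1e-7, while the same call with FLT_EPSILON gives a bound greater than 1, which signals that a single precision solution may have no correct digits.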
[Table of routines not recovered. Columns: Matrix type and storage scheme; Factorize matrix; Equilibrate
matrix; Solve system; Condition number; Estimate error; Invert matrix. Surviving entries: ?geequb, ?gerfsx.]
?getrf
Computes the LU factorization of a general m-by-n
matrix.
Syntax
lapack_int LAPACKE_sgetrf (int matrix_layout, lapack_int m, lapack_int n, float * a,
lapack_int lda, lapack_int * ipiv);
lapack_int LAPACKE_dgetrf (int matrix_layout, lapack_int m, lapack_int n, double * a,
lapack_int lda, lapack_int * ipiv);
lapack_int LAPACKE_cgetrf (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_float * a, lapack_int lda, lapack_int * ipiv);
lapack_int LAPACKE_zgetrf (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_double * a, lapack_int lda, lapack_int * ipiv);
Include Files
• mkl.h
Description
The routine computes the LU factorization of a general m-by-n matrix A as
A = P*L*U,
where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m >
n) and U is upper triangular (upper trapezoidal if m < n). The routine uses partial pivoting, with row
interchanges.
NOTE
This routine supports the Progress Routine feature. See Progress Function for details.
Input Parameters
m The number of rows in the matrix A (m≥ 0).
Output Parameters
ipiv Array, size at least max(1, min(m, n)). Contains the pivot indices; for
1 ≤ i ≤ min(m, n), row i was interchanged with row ipiv[i-1].
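The way the pivot indices encode row interchanges can be illustrated by applying them to a right-hand side vector in plain C. This is a sketch only: apply_pivots is a hypothetical helper, not a oneMKL routine, and it assumes that ipiv holds 1-based row numbers stored in a 0-based C array of plain int (lapack_int may be a wider type in some configurations).

```c
#include <assert.h>

/* Apply the row interchanges recorded in ipiv to the vector b, in order.
 * k is the number of pivots, min(m, n); ipiv[i] is a 1-based row number. */
void apply_pivots(int k, const int *ipiv, double *b)
{
    for (int i = 0; i < k; ++i) {
        int p = ipiv[i] - 1;   /* convert the 1-based row number to a C index */
        double tmp;
        assert(p >= i);        /* partial pivoting swaps row i with a row at or below it */
        tmp = b[i];
        b[i] = b[p];
        b[p] = tmp;
    }
}
```

The interchanges must be applied in increasing order of i, because each swap acts on the vector as already permuted by the previous ones.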
Return Values
This function returns a value info.
If info = i, u_ii is 0. The factorization has been completed, but U is exactly singular. Division by 0 will
occur if you use the factor U for solving a system of linear equations.
Application Notes
The computed L and U are the exact factors of a perturbed matrix A + E, where
|E| ≤ c(min(m, n))εP|L||U|,
where c(n) is a modest linear function of n, and ε is the machine precision.
The approximate number of floating-point operations for real flavors is
(2/3)n^3            if m = n,
(1/3)n^2(3m - n)    if m > n,
(1/3)m^2(3n - m)    if m < n.
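The three cases of the operation count can be folded into one small C function, for instance to estimate achieved floating-point throughput from a timing. The name getrf_flops is hypothetical, and the result is only the approximate count for real flavors given here.

```c
#include <assert.h>

/* Approximate floating-point operation count of an m-by-n LU factorization
 * (real flavors), following the three documented cases. */
double getrf_flops(int m, int n)
{
    double dm = (double)m;
    double dn = (double)n;
    assert(m >= 0 && n >= 0);
    if (m == n)
        return 2.0 * dn * dn * dn / 3.0;          /* (2/3)n^3 */
    if (m > n)
        return dn * dn * (3.0 * dm - dn) / 3.0;   /* (1/3)n^2(3m - n) */
    return dm * dm * (3.0 * dn - dm) / 3.0;       /* (1/3)m^2(3n - m) */
}
```

Note that the two rectangular formulas agree with the square one when m = n, since (1/3)n^2(3n - n) = (2/3)n^3.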
See Also
mkl_progress
mkl_?getrfnp
Computes the LU factorization of a general m-by-n
matrix without pivoting.
Syntax
lapack_int LAPACKE_mkl_sgetrfnp (int matrix_layout, lapack_int m, lapack_int n,
float * a, lapack_int lda);
lapack_int LAPACKE_mkl_dgetrfnp (int matrix_layout, lapack_int m, lapack_int n,
double * a, lapack_int lda);
lapack_int LAPACKE_mkl_cgetrfnp (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_float * a, lapack_int lda);
lapack_int LAPACKE_mkl_zgetrfnp (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_double * a, lapack_int lda);
Include Files
• mkl.h
Description
The routine computes the LU factorization of a general m-by-n matrix A as
A = L*U,
where L is lower triangular with unit-diagonal elements (lower trapezoidal if m > n) and U is upper triangular
(upper trapezoidal if m < n). The routine does not use pivoting.
Input Parameters
Output Parameters
Return Values
This function returns a value info.
If info = i, u_ii is 0. The factorization has been completed, but U is exactly singular. Division by 0 will
occur if you use the factor U for solving a system of linear equations.
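To make the structure of a factorization without pivoting concrete, here is a minimal unblocked sketch in plain C for a row major matrix. It is not the oneMKL implementation: lu_nopiv is a hypothetical name, only double precision is covered, and as a simplification the sketch stops at the first zero pivot instead of completing the factorization.

```c
#include <assert.h>

/* Unblocked LU factorization without pivoting of a row major m-by-n matrix a
 * with leading dimension lda >= max(1, n). On return, the strictly lower part
 * of a holds L (the unit diagonal is not stored) and the upper part holds U.
 * Returns 0 on success, or i (1-based) if u_ii is exactly zero.
 * Simplification: the sketch returns as soon as a zero pivot is found. */
int lu_nopiv(int m, int n, double *a, int lda)
{
    int k = (m < n) ? m : n;
    assert(m >= 0 && n >= 0 && lda >= (n > 1 ? n : 1));
    for (int i = 0; i < k; ++i) {
        double pivot = a[i * lda + i];
        if (pivot == 0.0)
            return i + 1;
        for (int r = i + 1; r < m; ++r) {
            double l = a[r * lda + i] / pivot;       /* multiplier l_ri */
            a[r * lda + i] = l;
            for (int c = i + 1; c < n; ++c)
                a[r * lda + c] -= l * a[i * lda + c]; /* update trailing block */
        }
    }
    return 0;
}
```

A matrix such as [[0, 1], [1, 0]] is nonsingular but makes this sketch return 1 immediately, which shows why a pivoting routine is the safer choice when the matrix structure is not known in advance.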
Application Notes
The approximate number of floating-point operations for real flavors is
(2/3)n^3            if m = n,
(1/3)n^2(3m - n)    if m > n,
(1/3)m^2(3n - m)    if m < n.
See Also
mkl_progress
mkl_?getrfnpi
Performs LU factorization (complete or incomplete) of
a general matrix without pivoting.
Syntax
lapack_int LAPACKE_mkl_sgetrfnpi (int matrix_layout, lapack_int m, lapack_int n,
lapack_int nfact, float* a, lapack_int lda);
lapack_int LAPACKE_mkl_dgetrfnpi (int matrix_layout, lapack_int m, lapack_int n,
lapack_int nfact, double* a, lapack_int lda);
lapack_int LAPACKE_mkl_cgetrfnpi (int matrix_layout, lapack_int m, lapack_int n,
lapack_int nfact, lapack_complex_float* a, lapack_int lda);
lapack_int LAPACKE_mkl_zgetrfnpi (int matrix_layout, lapack_int m, lapack_int n,
lapack_int nfact, lapack_complex_double* a, lapack_int lda);
Include Files
• mkl.h
Description
The routine computes the LU factorization of a general m-by-n matrix A without using pivoting. It supports
incomplete factorization. The factorization has the form:
A = L*U,
where L is lower triangular with unit diagonal elements (lower trapezoidal if m > n) and U is upper triangular
(upper trapezoidal if m < n).
Incomplete factorization has the form:
A = L*U + Ã,
where L is lower trapezoidal with unit diagonal elements, U is upper trapezoidal, and Ã is the unfactored
part of matrix A. See the application notes section for further details.
NOTE
Use ?getrf if it is possible that the matrix is not diagonally dominant.
Input Parameters
A <datatype> placeholder, if present, is used for the C interface data types in the C interface section above.
See C Interface Conventions for the C interface principal conventions and type definitions.
nfact The number of rows and columns to factor; 0 ≤ nfact ≤ min(m, n). Note that
if nfact < min(m, n), incomplete factorization is performed.
a Array of size at least lda*n for column major layout and at least lda*m for
row major layout. Contains the matrix A.
lda The leading dimension of array a. lda ≥ max(1, m) for column major layout
and lda ≥ max(1, n) for row major layout.
Output Parameters
Return Values
This function returns a value info.
If info = i, u_ii is 0. The requested factorization has been completed, but U is exactly singular. Division by 0
will occur if factorization is completed and factor U is used for solving a system of linear equations.
Application Notes
The computed L and U are the exact factors of a perturbed matrix A + E, with
The approximate number of floating-point operations for real flavors is
(2/3)n^3                      if m = n = nfact,
(1/3)m^2(3n - m)              if m = nfact < n,
(2/3)(n^3 - (n - nfact)^3)    if m = n, nfact < min(m, n).
Partition the matrix A as
\[
A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix},
\]
where A11 is a square matrix of order nfact. The result is
\[
A = \begin{bmatrix} L_1 \\ L_2 \end{bmatrix} \begin{bmatrix} U_1 & U_2 \end{bmatrix}
  + \begin{bmatrix} 0 & 0 \\ 0 & \tilde{A}_{22} \end{bmatrix}.
\]
L1 is a lower triangular square matrix of order nfact with unit diagonal and U1 is an upper triangular square
matrix of order nfact. L1 and U1 result from LU factorization of matrix A11: A11 = L1U1.
On exit, elements of the upper triangle U1 are stored in place of the upper triangle of block A11 in array a;
elements of the lower triangle L1 are stored in the lower triangle of block A11 in array a (unit diagonal
elements are not stored). Elements of L2 replace elements of A21; U2 replaces elements of A12 and
the unfactored part Ã22 replaces elements of A22.
?getrf2
Computes LU factorization using partial pivoting with
row interchanges.
Syntax
lapack_int LAPACKE_sgetrf2 (int matrix_layout, lapack_int m, lapack_int n, float * a,
lapack_int lda, lapack_int * ipiv);
lapack_int LAPACKE_dgetrf2 (int matrix_layout, lapack_int m, lapack_int n, double * a,
lapack_int lda, lapack_int * ipiv);
lapack_int LAPACKE_cgetrf2 (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_float * a, lapack_int lda, lapack_int * ipiv);
lapack_int LAPACKE_zgetrf2 (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_double * a, lapack_int lda, lapack_int * ipiv);
Include Files
• mkl.h
Description
?getrf2 computes an LU factorization of a general m-by-n matrix A using partial pivoting with row
interchanges.
The factorization has the form
A=P*L*U
where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m >
n), and U is upper triangular (upper trapezoidal if m < n).
This is the recursive version of the algorithm. It divides the matrix into four submatrices:
\[
A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix},
\]
where A11 is n1 by n1 and A22 is n2 by n2 with n1 = min(m, n)/2, and n2 = n - n1.
The subroutine calls itself to factor the left panel [A11; A21], does the swaps on the right panel
[A12; A22], solves for A12, updates A22, then calls itself to factor A22 and does the swaps on A21.
Input Parameters
Output Parameters
ipiv The pivot indices; for 1 ≤ i ≤ min(m, n), row i of the matrix was
interchanged with row ipiv[i - 1].
Return Values
This function returns a value info.
= 0: successful exit
< 0: if info = -i, the i-th argument had an illegal value.
> 0: if info = i, U(i, i) is exactly zero. The factorization has been completed, but the factor U is exactly singular,
and division by zero will occur if it is used to solve a system of equations.
?gbtrf
Computes the LU factorization of a general m-by-n
band matrix.
Syntax
lapack_int LAPACKE_sgbtrf (int matrix_layout, lapack_int m, lapack_int n, lapack_int kl,
lapack_int ku, float * ab, lapack_int ldab, lapack_int * ipiv);
lapack_int LAPACKE_dgbtrf (int matrix_layout, lapack_int m, lapack_int n, lapack_int kl,
lapack_int ku, double * ab, lapack_int ldab, lapack_int * ipiv);
lapack_int LAPACKE_cgbtrf (int matrix_layout, lapack_int m, lapack_int n, lapack_int kl,
lapack_int ku, lapack_complex_float * ab, lapack_int ldab, lapack_int * ipiv);
lapack_int LAPACKE_zgbtrf (int matrix_layout, lapack_int m, lapack_int n, lapack_int kl,
lapack_int ku, lapack_complex_double * ab, lapack_int ldab, lapack_int * ipiv);
Include Files
• mkl.h
Description
The routine forms the LU factorization of a general m-by-n band matrix A with kl non-zero subdiagonals and
ku non-zero superdiagonals, that is,
A = P*L*U,
where P is a permutation matrix; L is lower triangular with unit diagonal elements and at most kl non-zero
elements in each column; U is an upper triangular band matrix with kl + ku superdiagonals. The routine uses
partial pivoting, with row interchanges (which creates the additional kl superdiagonals in U).
NOTE
This routine supports the Progress Routine feature. See Progress Function for details.
Input Parameters
Output Parameters
ipiv Array, size at least max(1, min(m, n)). The pivot indices; for 1 ≤ i ≤
min(m, n), row i was interchanged with row ipiv[i-1].
Return Values
This function returns a value info.
If info = 0, the execution is successful.
If info = i, u_ii is 0. The factorization has been completed, but U is exactly singular. Division by 0 will occur
if you use the factor U for solving a system of linear equations.
Application Notes
The computed L and U are the exact factors of a perturbed matrix A + E, where
As described in Band Storage, storage of a band matrix can be considered in two steps: packing band matrix
elements into a matrix AB, then storing the elements in a linear array ab using a full storage scheme. The
effect of the ?gbtrf routine on matrix AB is illustrated by this example, for m = n = 6, kl = 2, ku = 1.
[Figures not recovered: contents of matrix AB on entry and on exit, shown for matrix_layout =
LAPACK_COL_MAJOR and for matrix_layout = LAPACK_ROW_MAJOR.]
Elements marked * are not used; elements marked + need not be set on entry, but are required by the
routine to store elements of U because of fill-in resulting from the row interchanges.
After calling this routine with m = n, you can call the following routines:
See Also
mkl_progress
?gttrf
Computes the LU factorization of a tridiagonal matrix.
Syntax
lapack_int LAPACKE_sgttrf (lapack_int n, float * dl, float * d, float * du, float * du2,
lapack_int * ipiv);
lapack_int LAPACKE_dgttrf (lapack_int n, double * dl, double * d, double * du,
double * du2, lapack_int * ipiv);
lapack_int LAPACKE_cgttrf (lapack_int n, lapack_complex_float * dl,
lapack_complex_float * d, lapack_complex_float * du, lapack_complex_float * du2,
lapack_int * ipiv);
lapack_int LAPACKE_zgttrf (lapack_int n, lapack_complex_double * dl,
lapack_complex_double * d, lapack_complex_double * du, lapack_complex_double * du2,
lapack_int * ipiv);
Include Files
• mkl.h
Description
The routine computes the LU factorization of a real or complex tridiagonal matrix A using elimination with
partial pivoting and row interchanges.
The factorization has the form
A = L*U,
where L is a product of permutation and unit lower bidiagonal matrices and U is upper triangular with
nonzeroes in only the main diagonal and first two superdiagonals.
Input Parameters
Output Parameters
dl Overwritten by the (n-1) multipliers that define the matrix L from the
LU factorization of A.
d Overwritten by the n diagonal elements of the upper triangular matrix
U from the LU factorization of A.
ipiv Array, dimension (n). The pivot indices: for 1 ≤ i ≤ n, row i was
interchanged with row ipiv[i-1]. ipiv[i-1] is always i or i+1;
ipiv[i-1] = i indicates a row interchange was not required.
Return Values
This function returns a value info.
If info = i, u_ii is 0. The factorization has been completed, but U is exactly singular. Division by zero will
occur if you use the factor U for solving a system of linear equations.
Application Notes
?dttrfb
Computes the factorization of a diagonally dominant
tridiagonal matrix.
Syntax
void sdttrfb (const MKL_INT * n, float * dl, float * d, const float * du, MKL_INT * info);
void ddttrfb (const MKL_INT * n, double * dl, double * d, const double * du,
MKL_INT * info);
void cdttrfb (const MKL_INT * n, MKL_Complex8 * dl, MKL_Complex8 * d,
const MKL_Complex8 * du, MKL_INT * info);
void zdttrfb (const MKL_INT * n, MKL_Complex16 * dl, MKL_Complex16 * d,
const MKL_Complex16 * du, MKL_INT * info);
Include Files
• mkl.h
Description
The ?dttrfb routine computes the factorization of a real or complex tridiagonal matrix A with the BABE
(Burning At Both Ends) algorithm without pivoting. The factorization has the form
A = L1*U*L2
where
• L1 and L2 are unit lower bidiagonal with k and n - k - 1 subdiagonal elements, respectively, where k =
n/2, and
• U is an upper bidiagonal matrix with nonzeroes in only the main diagonal and first superdiagonal.
Input Parameters
Output Parameters
Application Notes
A diagonally dominant tridiagonal system is one for which |d_i| > |dl_(i-1)| + |du_i| for every i.
The underlying BABE algorithm is designed for diagonally dominant systems. Such systems do not have the
numerical stability issues that make elimination with partial pivoting necessary in the general case
(see ?gttrf), so they can be factored without pivoting, which is much faster.
NOTE
• The current implementation of BABE has a potential accuracy issue on very small or large data
close to the underflow or overflow threshold respectively. Scale the matrix before applying the
solver in the case of such input data.
• Applying the ?dttrfb factorization to non-diagonally dominant systems may lead to an accuracy
loss, or false singularity detected due to no pivoting.
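Because the accuracy of this factorization depends on diagonal dominance, it can be worth testing the condition before calling the routine. The check below is a sketch for real double precision data; is_diagonally_dominant is a hypothetical helper, not a oneMKL function, and it assumes the usual layout where d has n entries and dl and du have n - 1 entries each.

```c
#include <assert.h>

static double abs_val(double x)
{
    return x < 0.0 ? -x : x;
}

/* Returns 1 if |d[i]| > |dl[i-1]| + |du[i]| holds in every row, 0 otherwise.
 * The first row has no subdiagonal entry and the last row has no superdiagonal entry. */
int is_diagonally_dominant(int n, const double *dl, const double *d, const double *du)
{
    assert(n >= 1);
    for (int i = 0; i < n; ++i) {
        double off = 0.0;
        if (i > 0)
            off += abs_val(dl[i - 1]);   /* subdiagonal entry in row i */
        if (i < n - 1)
            off += abs_val(du[i]);       /* superdiagonal entry in row i */
        if (!(abs_val(d[i]) > off))
            return 0;
    }
    return 1;
}
```

When the check fails, a pivoting factorization such as ?gttrf is the safer choice.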
?potrf
Computes the Cholesky factorization of a symmetric
(Hermitian) positive-definite matrix.
Syntax
lapack_int LAPACKE_spotrf (int matrix_layout, char uplo, lapack_int n, float * a,
lapack_int lda);
lapack_int LAPACKE_dpotrf (int matrix_layout, char uplo, lapack_int n, double * a,
lapack_int lda);
lapack_int LAPACKE_cpotrf (int matrix_layout, char uplo, lapack_int n,
lapack_complex_float * a, lapack_int lda);
lapack_int LAPACKE_zpotrf (int matrix_layout, char uplo, lapack_int n,
lapack_complex_double * a, lapack_int lda);
Include Files
• mkl.h
Description
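The routine forms the Cholesky factorization of a symmetric positive-definite or, for complex data, Hermitian
positive-definite matrix A:
A = U^T*U for real data, A = U^H*U for complex data    if uplo = 'U',
A = L*L^T for real data, A = L*L^H for complex data    if uplo = 'L',
where L is a lower triangular matrix and U is upper triangular.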
NOTE
This routine supports the Progress Routine feature. See Progress Function for details.
Input Parameters
uplo Indicates whether the upper or lower triangular part of A is stored and how
A is factored:
If uplo = 'U', the array a stores the upper triangular part of the matrix A,
and the strictly lower triangular part of the matrix is not referenced.
If uplo = 'L', the array a stores the lower triangular part of the matrix A,
and the strictly upper triangular part of the matrix is not referenced.
n Specifies the order of the matrix A. The value of n must be at least zero.
a Array, size max(1, lda*n). The array a contains either the upper or the
lower triangular part of the matrix A (see uplo).
Output Parameters
Return Values
This function returns a value info.
If info = i, the leading minor of order i (and therefore the matrix A itself) is not positive-definite, and the
factorization could not be completed. This may indicate an error in forming the matrix A.
Application Notes
If uplo = 'U', the computed factor U is the exact factor of a perturbed matrix A + E, where
The total number of floating-point operations is approximately (1/3)n^3 for real flavors or (4/3)n^3 for
complex flavors.
After calling this routine, you can call the following routines:
See Also
mkl_progress
?potrf2
Computes Cholesky factorization using a recursive
algorithm.
Syntax
lapack_int LAPACKE_spotrf2 (int matrix_layout, char uplo, lapack_int n, float * a,
lapack_int lda);
lapack_int LAPACKE_dpotrf2 (int matrix_layout, char uplo, lapack_int n, double * a,
lapack_int lda);
lapack_int LAPACKE_cpotrf2 (int matrix_layout, char uplo, lapack_int n,
lapack_complex_float * a, lapack_int lda);
lapack_int LAPACKE_zpotrf2 (int matrix_layout, char uplo, lapack_int n,
lapack_complex_double * a, lapack_int lda);
Include Files
• mkl.h
Description
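?potrf2 computes the Cholesky factorization of a real symmetric or complex Hermitian positive-definite
matrix A using the recursive algorithm.
The factorization has the form
for real flavors:
A = U^T * U, if uplo = 'U', or
A = L * L^T, if uplo = 'L',
for complex flavors:
A = U^H * U, if uplo = 'U', or
A = L * L^H, if uplo = 'L',
where U is an upper triangular matrix and L is lower triangular.
This is the recursive version of the algorithm. It divides the matrix into four submatrices:
\[
A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix},
\]
where A11 is n1 by n1 and A22 is n2 by n2, with n1 = n/2 and n2 = n - n1.
The subroutine calls itself to factor A11, updates and scales A21 or A12, updates A22, and then calls
itself to factor A22.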
Input Parameters
If uplo = 'L', the leading n-by-n lower triangular part of a contains the
lower triangular part of the matrix A, and the strictly upper triangular part
of a is not referenced.
lda≥ max(1,n).
Output Parameters
Return Values
This function returns a value info.
= 0: successful exit
< 0: if info = -i, the i-th argument had an illegal value
> 0: if info = i, the leading minor of order i is not positive definite, and the factorization could not be
completed.
?pstrf
Computes the Cholesky factorization with complete
pivoting of a real symmetric (complex Hermitian)
positive semidefinite matrix.
Syntax
lapack_int LAPACKE_spstrf( int matrix_layout, char uplo, lapack_int n, float* a,
lapack_int lda, lapack_int* piv, lapack_int* rank, float tol );
lapack_int LAPACKE_dpstrf( int matrix_layout, char uplo, lapack_int n, double* a,
lapack_int lda, lapack_int* piv, lapack_int* rank, double tol );
lapack_int LAPACKE_cpstrf( int matrix_layout, char uplo, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_int* piv, lapack_int* rank, float tol );
lapack_int LAPACKE_zpstrf( int matrix_layout, char uplo, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_int* piv, lapack_int* rank, double
tol );
Include Files
• mkl.h
Description
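The routine computes the Cholesky factorization with complete pivoting of a real symmetric (complex
Hermitian) positive semidefinite matrix. The form of the factorization is:
P^T*A*P = U^T*U, if uplo = 'U' for real flavors,
P^T*A*P = U^H*U, if uplo = 'U' for complex flavors,
P^T*A*P = L*L^T, if uplo = 'L' for real flavors,
P^T*A*P = L*L^H, if uplo = 'L' for complex flavors,
where P is a permutation matrix stored as vector piv, and U and L are upper and lower triangular matrices,
respectively.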
This algorithm does not attempt to check that A is positive semidefinite. This version of the algorithm calls
level 3 BLAS.
Input Parameters
tol User defined tolerance. If tol < 0, then n*ε*max(Ak,k), where ε is the
machine precision, will be used (see Error Analysis for the definition of
machine precision). The algorithm terminates at the (k-1)-st step, if
the pivot ≤tol.
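The default tolerance n*ε*max(A_kk) that applies when tol < 0 can be reproduced in user code, for example to log the threshold or to pass a scaled version of it explicitly. The sketch below is for double precision data; default_pstrf_tol is a hypothetical name, and ε is assumed to be DBL_EPSILON as suggested by the Error Analysis note, so the value used internally by the library may differ by a constant factor.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Default tolerance n * eps * max(A[k][k]) for an n-by-n matrix stored in a
 * with leading dimension lda. The diagonal element k sits at a[k*lda + k]
 * in both row major and column major layouts.
 * Assumption: eps is taken to be DBL_EPSILON. */
double default_pstrf_tol(int n, const double *a, int lda)
{
    double max_diag;
    assert(n >= 1 && lda >= n);
    max_diag = a[0];
    for (int k = 1; k < n; ++k) {
        double akk = a[(size_t)k * (size_t)lda + (size_t)k];
        if (akk > max_diag)
            max_diag = akk;
    }
    return (double)n * DBL_EPSILON * max_diag;
}
```

For a 2-by-2 matrix with diagonal entries 1 and 4, this returns 8*DBL_EPSILON.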
Output Parameters
piv Array, size at least max(1, n). The array piv is such that the nonzero
entries of P are P(piv[k-1], k) = 1 (1 ≤ k ≤ n).
rank The rank of A, given by the number of steps the algorithm completed.
Return Values
This function returns a value info.
If info > 0, the matrix A is either rank deficient with a computed rank as returned in rank, or is not
positive semidefinite.
See Also
Matrix Storage Schemes
?pftrf
Computes the Cholesky factorization of a symmetric
(Hermitian) positive-definite matrix using the
Rectangular Full Packed (RFP) format.
Syntax
lapack_int LAPACKE_spftrf (int matrix_layout, char transr, char uplo, lapack_int n,
float * a);
lapack_int LAPACKE_dpftrf (int matrix_layout, char transr, char uplo, lapack_int n,
double * a);
lapack_int LAPACKE_cpftrf (int matrix_layout, char transr, char uplo, lapack_int n,
lapack_complex_float * a);
lapack_int LAPACKE_zpftrf (int matrix_layout, char transr, char uplo, lapack_int n,
lapack_complex_double * a);
Include Files
• mkl.h
Description
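The routine forms the Cholesky factorization of a symmetric positive-definite or, for complex data, a
Hermitian positive-definite matrix A:
A = U^T*U for real data, A = U^H*U for complex data    if uplo = 'U',
A = L*L^T for real data, A = L*L^H for complex data    if uplo = 'L',
where L is a lower triangular matrix and U is upper triangular.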
Input Parameters
transr Must be 'N', 'T' (for real data) or 'C' (for complex data).
a Array, size (n*(n+1)/2). The array a contains the matrix A in the RFP
format.
Output Parameters
Return Values
This function returns a value info.
If info = i, the leading minor of order i (and therefore the matrix A itself) is not positive-definite, and the
factorization could not be completed. This may indicate an error in forming the matrix A.
See Also
Matrix Storage Schemes
?pptrf
Computes the Cholesky factorization of a symmetric
(Hermitian) positive-definite matrix using packed
storage.
Syntax
lapack_int LAPACKE_spptrf (int matrix_layout, char uplo, lapack_int n, float * ap);
lapack_int LAPACKE_dpptrf (int matrix_layout, char uplo, lapack_int n, double * ap);
lapack_int LAPACKE_cpptrf (int matrix_layout, char uplo, lapack_int n,
lapack_complex_float * ap);
lapack_int LAPACKE_zpptrf (int matrix_layout, char uplo, lapack_int n,
lapack_complex_double * ap);
Include Files
• mkl.h
Description
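The routine forms the Cholesky factorization of a symmetric positive-definite or, for complex data, Hermitian
positive-definite packed matrix A:
A = U^T*U for real data, A = U^H*U for complex data    if uplo = 'U',
A = L*L^T for real data, A = L*L^H for complex data    if uplo = 'L',
where L is a lower triangular matrix and U is upper triangular.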
NOTE
This routine supports the Progress Routine feature. See Progress Function for details.
Input Parameters
ap Array, size at least max(1, n*(n+1)/2). The array ap contains either the
upper or the lower triangular part of the matrix A (as specified by
uplo) in packed storage (see Matrix Storage Schemes).
Output Parameters
Return Values
This function returns a value info.
If info = i, the leading minor of order i (and therefore the matrix A itself) is not positive-definite, and the
factorization could not be completed. This may indicate an error in forming the matrix A.
Application Notes
If uplo = 'U', the computed factor U is the exact factor of a perturbed matrix A + E, where
The total number of floating-point operations is approximately (1/3)n^3 for real flavors and (4/3)n^3 for
complex flavors.
After calling this routine, you can call the following routines:
See Also
mkl_progress
?pbtrf
Computes the Cholesky factorization of a symmetric
(Hermitian) positive-definite band matrix.
Syntax
lapack_int LAPACKE_spbtrf (int matrix_layout, char uplo, lapack_int n, lapack_int kd,
float * ab, lapack_int ldab);
lapack_int LAPACKE_dpbtrf (int matrix_layout, char uplo, lapack_int n, lapack_int kd,
double * ab, lapack_int ldab);
lapack_int LAPACKE_cpbtrf (int matrix_layout, char uplo, lapack_int n, lapack_int kd,
lapack_complex_float * ab, lapack_int ldab);
lapack_int LAPACKE_zpbtrf (int matrix_layout, char uplo, lapack_int n, lapack_int kd,
lapack_complex_double * ab, lapack_int ldab);
Include Files
• mkl.h
Description
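The routine forms the Cholesky factorization of a symmetric positive-definite or, for complex data, Hermitian
positive-definite band matrix A:
A = U^T*U for real data, A = U^H*U for complex data    if uplo = 'U',
A = L*L^T for real data, A = L*L^H for complex data    if uplo = 'L',
where L is a lower triangular matrix and U is upper triangular.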
NOTE
This routine supports the Progress Routine feature. See Progress Function for details.
Input Parameters
ab Array, size max(1, ldab*n). The array ab contains either the upper or
the lower triangular part of the matrix A (as specified by uplo) in band
storage (see Matrix Storage Schemes).
Output Parameters
Return Values
This function returns a value info.
If info = i, the leading minor of order i (and therefore the matrix A itself) is not positive-definite, and the
factorization could not be completed. This may indicate an error in forming the matrix A.
Application Notes
If uplo = 'U', the computed factor U is the exact factor of a perturbed matrix A + E, where
The total number of floating-point operations for real flavors is approximately n*(kd+1)^2. The number of
operations for complex flavors is 4 times greater. All these estimates assume that kd is much less than n.
After calling this routine, you can call the following routines:
See Also
mkl_progress
?pttrf
Computes the factorization of a symmetric (Hermitian)
positive-definite tridiagonal matrix.
Syntax
lapack_int LAPACKE_spttrf( lapack_int n, float* d, float* e );
lapack_int LAPACKE_dpttrf( lapack_int n, double* d, double* e );
lapack_int LAPACKE_cpttrf( lapack_int n, float* d, lapack_complex_float* e );
lapack_int LAPACKE_zpttrf( lapack_int n, double* d, lapack_complex_double* e );
Include Files
• mkl.h
Description
The routine forms the factorization of a symmetric positive-definite or, for complex data, Hermitian positive-
definite tridiagonal matrix A:
A = L*D*L^T for real flavors, or
A = L*D*L^H for complex flavors,
where D is diagonal and L is unit lower bidiagonal. The factorization may also be regarded as having the form
A = U^T*D*U for real flavors, or A = U^H*D*U for complex flavors, where U is unit upper bidiagonal.
Input Parameters
Output Parameters
Return Values
This function returns a value info.
If info = i, the leading minor of order i (and therefore the matrix A itself) is not positive-definite; if i < n,
the factorization could not be completed, while if i = n, the factorization was completed, but d[n - 1] ≤
0.
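The recurrence behind A = L*D*L^T for a tridiagonal matrix, together with the info convention described above, can be sketched in plain C. This is a textbook unblocked loop for the real case, written for illustration. It is not the library's implementation, and the name ldlt_tridiag is hypothetical. On entry d holds the n diagonal elements and e the n-1 subdiagonal elements. On a successful exit d holds D and e holds the subdiagonal of L.

```c
#include <assert.h>

/* Unblocked L*D*L^T factorization of a symmetric tridiagonal matrix.
 * Returns 0 on success, or i (1-based) if the leading minor of order i
 * is not positive-definite. For i < n the loop stops early; for i == n
 * the factorization is complete but d[n-1] <= 0. */
int ldlt_tridiag(int n, double *d, double *e)
{
    assert(n >= 0);
    if (n == 0)
        return 0;
    for (int i = 0; i < n - 1; ++i) {
        if (d[i] <= 0.0)
            return i + 1;          /* factorization could not be completed */
        double ei = e[i];
        e[i] = ei / d[i];          /* multiplier l_i */
        d[i + 1] -= e[i] * ei;     /* update the next diagonal entry */
    }
    if (d[n - 1] <= 0.0)
        return n;                  /* completed, but the last pivot is not positive */
    return 0;
}
```

For the matrix with diagonal (4, 5, 6) and subdiagonal (2, 2), the loop produces D = diag(4, 4, 5) and multipliers (0.5, 0.5).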
?sytrf
Computes the Bunch-Kaufman factorization of a
symmetric matrix.
Syntax
lapack_int LAPACKE_ssytrf (int matrix_layout , char uplo , lapack_int n , float * a ,
lapack_int lda , lapack_int * ipiv );
lapack_int LAPACKE_dsytrf (int matrix_layout , char uplo , lapack_int n , double * a ,
lapack_int lda , lapack_int * ipiv );
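lapack_int LAPACKE_csytrf (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * a , lapack_int lda , lapack_int * ipiv );
lapack_int LAPACKE_zsytrf (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * a , lapack_int lda , lapack_int * ipiv );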
Include Files
• mkl.h
Description
The routine computes the factorization of a real/complex symmetric matrix A using the Bunch-Kaufman
diagonal pivoting method. The form of the factorization is:
if uplo='U', A = U*D*U^T
if uplo='L', A = L*D*L^T,
where A is the input matrix, U and L are products of permutation and triangular matrices with unit diagonal
(upper triangular for U and lower triangular for L), and D is a symmetric block-diagonal matrix with 1-by-1
and 2-by-2 diagonal blocks. U and L have 2-by-2 unit diagonal blocks corresponding to the 2-by-2 blocks of
D.
NOTE This routine supports the Progress Routine feature. See Progress Routine for details.
Input Parameters
If uplo = 'L', the array a stores the lower triangular part of the
matrix A, and A is factored as L*D*L^T.
a Array, size max(1, lda*n). The array a contains either the upper or
the lower triangular part of the matrix A (see uplo).
Output Parameters
ipiv Array, size at least max(1, n). Contains details of the interchanges
and the block structure of D. If ipiv[i-1] = k > 0, then d_ii is a
1-by-1 block, and the i-th row and column of A were interchanged with
the k-th row and column.
If uplo = 'U' and ipiv[i] = ipiv[i-1] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and the i-th row and column of A were
interchanged with the m-th row and column.
If uplo = 'L' and ipiv[i] = ipiv[i-1] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and the (i+1)-th row and column of A
were interchanged with the m-th row and column.
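The sign convention for ipiv described above is enough to recover the block structure of D without touching the factored matrix. The following sketch walks the array and counts the 1-by-1 and 2-by-2 blocks. It relies only on the rule stated here (a positive entry marks a 1-by-1 block, two equal negative neighbours mark a 2-by-2 block). The function name count_pivot_blocks is hypothetical, and plain int stands in for lapack_int.

```c
#include <assert.h>

/* Counts the diagonal blocks of D encoded in ipiv (0-based array of length n).
 * A positive entry is a 1-by-1 block; two consecutive equal negative entries
 * are one 2-by-2 block. Returns 0 on success, -1 if the array does not follow
 * that pattern. */
int count_pivot_blocks(int n, const int *ipiv, int *n1, int *n2)
{
    assert(n >= 0);
    *n1 = 0;
    *n2 = 0;
    int i = 0;
    while (i < n) {
        if (ipiv[i] > 0) {
            ++*n1;                 /* 1-by-1 block at position i */
            i += 1;
        } else if (ipiv[i] < 0 && i + 1 < n && ipiv[i + 1] == ipiv[i]) {
            ++*n2;                 /* 2-by-2 block covering positions i and i+1 */
            i += 2;
        } else {
            return -1;             /* zero entry or unpaired negative entry */
        }
    }
    return 0;
}
```

For ipiv = {1, -2, -2, 4} the helper reports two 1-by-1 blocks and one 2-by-2 block.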
Return Values
This function returns a value info.
If info = i, d_ii is 0. The factorization has been completed, but D is exactly singular. Division by 0 will occur
if you use D for solving a system of linear equations.
Application Notes
The 2-by-2 unit diagonal blocks and the unit diagonal elements of U and L are not stored. The remaining
elements of U and L are stored in the corresponding columns of the array a, but additional row interchanges
are required to recover U or L explicitly (which is seldom necessary).
If ipiv[i-1] = i for all i =1...n, then all off-diagonal elements of U (L) are stored explicitly in the
corresponding elements of the array a.
If uplo = 'U', the computed factors U and D are the exact factors of a perturbed matrix A + E, where
|E| ≤ c(n)ε P|U||D||U^T|P^T
c(n) is a modest linear function of n, and ε is the machine precision. A similar estimate holds for the
computed L and D when uplo = 'L'.
The total number of floating-point operations is approximately (1/3)n^3 for real flavors or (4/3)n^3 for
complex flavors.
After calling this routine, you can call the following routines:
If s = 2, the upper triangle of D(k) overwrites A(k-1,k-1), A(k-1,k), and A(k,k), and v overwrites A(1:k-2,k-1:k).
If s = 2, the lower triangle of D(k) overwrites A(k,k), A(k+1,k), and A(k+1,k+1), and v overwrites A(k+2:n,k:k+1).
See Also
mkl_progress
?sytrf_aa
Computes the factorization of a symmetric matrix
using Aasen's algorithm.
lapack_int LAPACKE_ssytrf_aa (int matrix_layout, char uplo, lapack_int n, float * A,
lapack_int lda, lapack_int * ipiv);
lapack_int LAPACKE_dsytrf_aa (int matrix_layout, char uplo, lapack_int n, double * A,
lapack_int lda, lapack_int * ipiv);
lapack_int LAPACKE_csytrf_aa (int matrix_layout, char uplo, lapack_int n,
lapack_complex_float * A, lapack_int lda, lapack_int * ipiv);
lapack_int LAPACKE_zsytrf_aa (int matrix_layout, char uplo, lapack_int n,
lapack_complex_double * A, lapack_int lda, lapack_int * ipiv);
Description
?sytrf_aa computes the factorization of a symmetric matrix A using Aasen's algorithm. The form of the
factorization is A = U*T*U^T or A = L*T*L^T where U (or L) is a product of permutation and unit upper (lower)
triangular matrices, and T is a complex symmetric tridiagonal matrix.
This is the blocked version of the algorithm, calling Level 3 BLAS.
Input Parameters
A Array of size max(1, lda*n). The array A contains either the upper or the
lower triangular part of the matrix A (see uplo).
Output Parameters
ipiv Array of size n. On exit, it contains the details of the interchanges; that is,
the row and column k of A were interchanged with the row and column
ipiv(k).
Return Values
This function returns a value info.
= 0: Successful exit.
< 0: If info = -i, the ith argument had an illegal value.
> 0: If info = i, D(i,i) is exactly zero. The factorization has been completed, but the block diagonal matrix D
is exactly singular, and division by zero will occur if it is used to solve a system of equations.
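The three-way meaning of info listed above recurs across the factorization routines in this section, so callers often centralize the check. A minimal sketch follows. The enum and function names are hypothetical, and plain int stands in for lapack_int.

```c
/* Hypothetical classification of a factorization return value. */
enum factor_status { FACTOR_OK, FACTOR_BAD_ARGUMENT, FACTOR_SINGULAR };

/* Maps info to a status and stores the 1-based index it refers to:
 * the offending argument for info < 0, the zero pivot position for info > 0,
 * and 0 when the call succeeded. */
enum factor_status classify_info(int info, int *index)
{
    if (info == 0) {
        *index = 0;
        return FACTOR_OK;
    }
    if (info < 0) {
        *index = -info;            /* position of the illegal argument */
        return FACTOR_BAD_ARGUMENT;
    }
    *index = info;                 /* factorization finished, factor is singular */
    return FACTOR_SINGULAR;
}
```

A FACTOR_SINGULAR result still leaves a completed factorization in the output arrays, so the caller decides whether to stop or to continue without solving.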
?sytrf_rook
Computes the bounded Bunch-Kaufman factorization
of a symmetric matrix.
Syntax
lapack_int LAPACKE_ssytrf_rook (int matrix_layout, char uplo, lapack_int n, float * a,
lapack_int lda, lapack_int * ipiv);
Include Files
• mkl.h
Description
The routine computes the factorization of a real/complex symmetric matrix A using the bounded Bunch-
Kaufman ("rook") diagonal pivoting method. The form of the factorization is:
if uplo='U', A = U*D*U^T
if uplo='L', A = L*D*L^T,
where A is the input matrix, U and L are products of permutation and triangular matrices with unit diagonal
(upper triangular for U and lower triangular for L), and D is a symmetric block-diagonal matrix with 1-by-1
and 2-by-2 diagonal blocks. U and L have 2-by-2 unit diagonal blocks corresponding to the 2-by-2 blocks of
D.
Input Parameters
matrix_layout Specifies whether matrix storage layout for array a is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).
If uplo = 'L', the array a stores the lower triangular part of the
matrix A, and A is factored as L*D*L^T.
a Array, size lda*n. The array a contains either the upper or the lower
triangular part of the matrix A (see uplo).
Output Parameters
ipiv If ipiv(k) > 0, then rows and columns k and ipiv(k) were
interchanged and D(k,k) is a 1-by-1 diagonal block.
If uplo = 'U' and ipiv(k) < 0 and ipiv(k - 1) < 0, then rows
and columns k and -ipiv(k) were interchanged, rows and columns k - 1
and -ipiv(k - 1) were interchanged, and D(k-1:k, k-1:k) is a 2-by-2
diagonal block.
If uplo = 'L' and ipiv(k) < 0 and ipiv(k + 1) < 0, then rows
and columns k and -ipiv(k) were interchanged, rows and columns k + 1
and -ipiv(k + 1) were interchanged, and D(k:k+1, k:k+1) is a 2-by-2
diagonal block.
Return Values
This function returns a value info.
If info = i, D(i,i) is 0. The factorization has been completed, but D is exactly singular. Division by 0 will occur
if you use D for solving a system of linear equations.
Application Notes
The total number of floating-point operations is approximately (1/3)n^3 for real flavors or (4/3)n^3 for
complex flavors.
After calling this routine, you can call the following routines:
If s = 2, the upper triangle of D(k) overwrites A(k-1,k-1), A(k-1,k), and A(k,k), and v overwrites A(1:k-2,k-1:k).
If s = 2, the lower triangle of D(k) overwrites A(k,k), A(k+1,k), and A(k+1,k+1), and v overwrites A(k+2:n,k:k+1).
See Also
Matrix Storage Schemes
?sytrf_rk
Computes the factorization of a real or complex
symmetric indefinite matrix using the bounded Bunch-
Kaufman (rook) diagonal pivoting method (BLAS3
blocked algorithm).
lapack_int LAPACKE_ssytrf_rk (int matrix_layout, char uplo, lapack_int n, float * A,
lapack_int lda, float * e, lapack_int * ipiv);
lapack_int LAPACKE_dsytrf_rk (int matrix_layout, char uplo, lapack_int n, double * A,
lapack_int lda, double * e, lapack_int * ipiv);
lapack_int LAPACKE_csytrf_rk (int matrix_layout, char uplo, lapack_int n,
lapack_complex_float * A, lapack_int lda, lapack_complex_float * e, lapack_int * ipiv);
lapack_int LAPACKE_zsytrf_rk (int matrix_layout, char uplo, lapack_int n,
lapack_complex_double * A, lapack_int lda, lapack_complex_double * e, lapack_int *
ipiv);
Description
?sytrf_rk computes the factorization of a real or complex symmetric matrix A using the bounded
Bunch-Kaufman (rook) diagonal pivoting method: A = P*U*D*(U^T)*(P^T) or A = P*L*D*(L^T)*(P^T), where U (or L) is
a unit upper (or lower) triangular matrix, U^T (or L^T) is the transpose of U (or L), P is a permutation matrix, P^T
is the transpose of P, and D is symmetric and block diagonal with 1-by-1 and 2-by-2 diagonal blocks.
This is the blocked version of the algorithm, calling Level-3 BLAS.
Input Parameters
uplo Specifies whether the upper or lower triangular part of the symmetric
matrix A is stored:
Output Parameters
A On exit, contains:
ipiv Array of size n. ipiv describes the permutation matrix P in the factorization
of matrix A as follows: The absolute value of ipiv(k) represents the index of
the row and column that were interchanged with the kth row and column.
The value of uplo describes the order in which the interchanges were
applied. Also, the sign of ipiv represents the block structure of the
symmetric block diagonal matrix D with 1-by-1 or 2-by-2 diagonal blocks,
which correspond to 1 or 2 interchanges at each factorization step. If uplo
= 'U' (in factorization order, k decreases from n to 1):
Return Values
This function returns a value info.
= 0: Successful exit.
< 0: If info = -k, the kth argument had an illegal value.
> 0: If info = k, the matrix A is singular. If uplo = 'U', column k in the upper triangular part of A contains
all zeros. If uplo = 'L', column k in the lower triangular part of A contains all zeros. Therefore, D(k,k) is
exactly zero, and superdiagonal elements of column k of U (or subdiagonal elements of column k of L) are all
zeros. The factorization has been completed, but the block diagonal matrix D is exactly singular, and division
by zero will occur if it is used to solve a system of equations.
?hetrf
Computes the Bunch-Kaufman factorization of a
complex Hermitian matrix.
Syntax
lapack_int LAPACKE_chetrf (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * a , lapack_int lda , lapack_int * ipiv );
lapack_int LAPACKE_zhetrf (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * a , lapack_int lda , lapack_int * ipiv );
Include Files
• mkl.h
Description
The routine computes the factorization of a complex Hermitian matrix A using the Bunch-Kaufman diagonal
pivoting method:
if uplo='U', A = U*D*U^H
if uplo='L', A = L*D*L^H,
where A is the input matrix, U and L are products of permutation and triangular matrices with unit diagonal
(upper triangular for U and lower triangular for L), and D is a Hermitian block-diagonal matrix with 1-by-1
and 2-by-2 diagonal blocks. U and L have 2-by-2 unit diagonal blocks corresponding to the 2-by-2 blocks of
D.
NOTE
This routine supports the Progress Routine feature. See Progress Routine for details.
Input Parameters
If uplo = 'L', the array a stores the lower triangular part of the
matrix A, and A is factored as L*D*L^H.
The array a contains the upper or the lower triangular part of the
matrix A (see uplo).
Output Parameters
ipiv Array, size at least max(1, n). Contains details of the interchanges
and the block structure of D. If ipiv[i-1] = k > 0, then d_ii is a
1-by-1 block, and the i-th row and column of A were interchanged with
the k-th row and column.
If uplo = 'U' and ipiv[i] = ipiv[i-1] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and the i-th row and column of A were
interchanged with the m-th row and column.
If uplo = 'L' and ipiv[i] = ipiv[i-1] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and the (i+1)-th row and column of A
were interchanged with the m-th row and column.
Return Values
This function returns a value info.
If info = i, d_ii is 0. The factorization has been completed, but D is exactly singular. Division by 0 will occur
if you use D for solving a system of linear equations.
Application Notes
This routine is suitable for Hermitian matrices that are not known to be positive-definite. If A is in fact
positive-definite, the routine does not perform interchanges, and no 2-by-2 diagonal blocks occur in D.
The 2-by-2 unit diagonal blocks and the unit diagonal elements of U and L are not stored. The remaining
elements of U and L are stored in the corresponding columns of the array a, but additional row interchanges
are required to recover U or L explicitly (which is seldom necessary).
If ipiv[i-1] = i for all i = 1...n, then all off-diagonal elements of U (L) are stored explicitly in the
corresponding elements of the array a.
If uplo = 'U', the computed factors U and D are the exact factors of a perturbed matrix A + E, where
|E| ≤ c(n)ε P|U||D||U^T|P^T
c(n) is a modest linear function of n, and ε is the machine precision.
A similar estimate holds for the computed L and D when uplo = 'L'.
After calling this routine, you can call the following routines:
See Also
mkl_progress
Matrix Storage Schemes
?hetrf_aa
Computes the factorization of a complex Hermitian
matrix using Aasen's algorithm.
LAPACK_DECL lapack_int LAPACKE_chetrf_aa (int matrix_layout, char uplo, lapack_int n,
lapack_complex_float * a, lapack_int lda, lapack_int * ipiv );
LAPACK_DECL lapack_int LAPACKE_zhetrf_aa (int matrix_layout, char uplo, lapack_int n,
lapack_complex_double * a, lapack_int lda, lapack_int * ipiv );
Description
?hetrf_aa computes the factorization of a complex Hermitian matrix A using Aasen's algorithm. The form of
the factorization is A = U*T*U^H or A = L*T*L^H where U (or L) is a product of permutation and unit upper
(lower) triangular matrices, and T is a Hermitian tridiagonal matrix. This is the blocked version of the
algorithm, calling Level 3 BLAS.
Input Parameters
If uplo = 'U', the leading n-by-n upper triangular part of a contains the
upper triangular part of the matrix A, and the strictly lower triangular part
of a is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of a contains the
lower triangular part of the matrix A, and the strictly upper triangular part
of a is not referenced.
lwork See Syntax - Workspace. The length of work. lwork≥ 2*n. For optimum
performance lwork≥n*(1 + nb), where nb is the optimal block size. If
lwork = -1, then a workspace query is assumed; the routine only
calculates the optimal size of the work array, returns this value as the first
entry of the work array, and no error message related to lwork is issued by
xerbla.
Output Parameters
ipiv Array of size n. On exit, it contains the details of the interchanges: the
row and column k of a were interchanged with the row and column
ipiv[k].
work See Syntax - Workspace. Array of size (max(1, lwork)). On exit, if info =
0, work[0] returns the optimal lwork.
Return Values
This function returns a value info.
= 0: Successful exit.
< 0: If info = -i, the i-th argument had an illegal value.
> 0: If info = i, D(i,i) is exactly zero. The factorization has been completed, but the block diagonal
matrix D is exactly singular, and division by zero will occur if it is used to solve a system of equations.
Syntax - Workspace
Use this interface if you want to explicitly provide the workspace array.
LAPACK_DECL lapack_int LAPACKE_chetrf_aa_work (int matrix_layout, char uplo, lapack_int
n, lapack_complex_float * a, lapack_int lda, lapack_int * ipiv, lapack_complex_float *
work, lapack_int lwork );
LAPACK_DECL lapack_int LAPACKE_zhetrf_aa_work (int matrix_layout, char uplo, lapack_int
n, lapack_complex_double * a, lapack_int lda, lapack_int * ipiv, lapack_complex_double
* work, lapack_int lwork );
?hetrf_rook
Computes the bounded Bunch-Kaufman factorization
of a complex Hermitian matrix.
Syntax
lapack_int LAPACKE_chetrf_rook (int matrix_layout, char uplo, lapack_int n,
lapack_complex_float * a, lapack_int lda, lapack_int * ipiv);
lapack_int LAPACKE_zhetrf_rook (int matrix_layout, char uplo, lapack_int n,
lapack_complex_double * a, lapack_int lda, lapack_int * ipiv);
Include Files
• mkl.h
Description
The routine computes the factorization of a complex Hermitian matrix A using the bounded Bunch-Kaufman
diagonal pivoting method:
if uplo='U', A = U*D*U^H
if uplo='L', A = L*D*L^H,
where A is the input matrix, U (or L) is a product of permutation and unit upper (or lower) triangular
matrices, and D is a Hermitian block-diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks.
This is the blocked version of the algorithm, calling Level 3 BLAS.
Input Parameters
matrix_layout Specifies whether matrix storage layout for array a is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).
If uplo = 'L', the array a stores the lower triangular part of the
matrix A.
The array a contains the upper or the lower triangular part of the
matrix A (see uplo).
If uplo = 'U', the leading n-by-n upper triangular part of a contains
the upper triangular part of the matrix A, and the strictly lower
triangular part of a is not referenced. If uplo = 'L', the leading n-by-n
lower triangular part of a contains the lower triangular part of the
matrix A, and the strictly upper triangular part of a is not referenced.
Output Parameters
a The block diagonal matrix D and the multipliers used to obtain the
factor U or L (see Application Notes for further details).
Return Values
This function returns a value info.
If info = i, D(i,i) is exactly 0. The factorization has been completed, but the block diagonal matrix D is
exactly singular, and division by 0 will occur if you use D for solving a system of linear equations.
Application Notes
i.e., U is a product of terms P(k)*U(k), where k decreases from n to 1 in steps of 1 or 2, and D is a block
diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks D(k). P(k) is a permutation matrix as defined by
ipiv(k), and U(k) is a unit upper triangular matrix, such that if the diagonal block D(k) is of order s (s = 1
or 2), then
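$$
U(k) = \begin{pmatrix} I & v & 0 \\ 0 & I & 0 \\ 0 & 0 & I \end{pmatrix}
$$
where the block rows and block columns have sizes k−s, s, and n−k, in that order.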
If s = 1, D(k) overwrites A(k,k), and v overwrites A(1:k-1,k).
If s = 2, the upper triangle of D(k) overwrites A(k-1,k-1), A(k-1,k), and A(k,k), and v overwrites
A(1:k-2,k-1:k).
If uplo = 'L', then A = L*D*L^H, where
$$
L(k) = \begin{pmatrix} I & 0 & 0 \\ 0 & I & 0 \\ 0 & v & I \end{pmatrix}
$$
and the block rows and block columns have sizes k−1, s, and n−k−s+1, in that order.
If s = 1, D(k) overwrites A(k,k), and v overwrites A(k+1:n,k).
If s = 2, the lower triangle of D(k) overwrites A(k,k), A(k+1,k), and A(k+1,k+1), and v overwrites
A(k+2:n,k:k+1).
See Also
mkl_progress
?hetrf_rk
Computes the factorization of a complex Hermitian
indefinite matrix using the bounded Bunch-Kaufman
(rook) diagonal pivoting method (BLAS3 blocked
algorithm).
lapack_int LAPACKE_chetrf_rk (int matrix_layout, char uplo, lapack_int n,
lapack_complex_float * A, lapack_int lda, lapack_complex_float * e, lapack_int * ipiv);
lapack_int LAPACKE_zhetrf_rk (int matrix_layout, char uplo, lapack_int n,
lapack_complex_double * A, lapack_int lda, lapack_complex_double * e, lapack_int *
ipiv);
Description
?hetrf_rk computes the factorization of a complex Hermitian matrix A using the bounded Bunch-Kaufman
(rook) diagonal pivoting method: A = P*U*D*(U^H)*(P^T) or A = P*L*D*(L^H)*(P^T), where U (or L) is a unit upper
(or lower) triangular matrix, U^H (or L^H) is the conjugate transpose of U (or L), P is a permutation matrix, P^T is the
transpose of P, and D is Hermitian and block diagonal with 1-by-1 and 2-by-2 diagonal blocks.
This is the blocked version of the algorithm, calling Level 3 BLAS.
Input Parameters
uplo Specifies whether the upper or lower triangular part of the Hermitian matrix
A is stored:
Output Parameters
A On exit, contains:
—and—
• If uplo = 'U', factor U in the superdiagonal part of A. If uplo = 'L',
factor L in the subdiagonal part of A.
ipiv Array of size n. ipiv describes the permutation matrix P in the factorization
of matrix A as follows: The absolute value of ipiv[k-1] represents the
index of row and column that were interchanged with the kth row and
column. The value of uplo describes the order in which the interchanges
were applied. Also, the sign of ipiv represents the block structure of the
Hermitian block diagonal matrix D with 1-by-1 or 2-by-2 diagonal blocks
that correspond to 1 or 2 interchanges at each factorization step. If uplo =
'U' (in factorization order, k decreases from n to 1):
Return Values
This function returns a value info.
= 0: Successful exit.
< 0: If info = -k, the kth argument had an illegal value.
> 0: If info = k, the matrix A is singular. If uplo = 'U', the column k in the upper triangular part of A
contains all zeros. If uplo = 'L', the column k in the lower triangular part of A contains all zeros. Therefore
D(k,k) is exactly zero, and superdiagonal elements of column k of U (or subdiagonal elements of column k of
L ) are all zeros. The factorization has been completed, but the block diagonal matrix D is exactly singular,
and division by zero will occur if it is used to solve a system of equations.
?sptrf
Computes the Bunch-Kaufman factorization of a
symmetric matrix using packed storage.
Syntax
lapack_int LAPACKE_ssptrf (int matrix_layout , char uplo , lapack_int n , float * ap ,
lapack_int * ipiv );
lapack_int LAPACKE_dsptrf (int matrix_layout , char uplo , lapack_int n , double * ap ,
lapack_int * ipiv );
lapack_int LAPACKE_csptrf (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * ap , lapack_int * ipiv );
lapack_int LAPACKE_zsptrf (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * ap , lapack_int * ipiv );
Include Files
• mkl.h
Description
The routine computes the factorization of a real/complex symmetric matrix A stored in the packed format
using the Bunch-Kaufman diagonal pivoting method. The form of the factorization is:
if uplo='U', A = U*D*U^T
if uplo='L', A = L*D*L^T,
where U and L are products of permutation and triangular matrices with unit diagonal (upper triangular for U
and lower triangular for L), and D is a symmetric block-diagonal matrix with 1-by-1 and 2-by-2 diagonal
blocks. U and L have 2-by-2 unit diagonal blocks corresponding to the 2-by-2 blocks of D.
NOTE
This routine supports the Progress Routine feature. See Progress Function for details.
Input Parameters
If uplo = 'L', the array ap stores the lower triangular part of the
matrix A, and A is factored as L*D*L^T.
ap Array, size at least max(1, n(n+1)/2). The array ap contains the upper
or the lower triangular part of the matrix A (as specified by uplo) in
packed storage (see Matrix Storage Schemes).
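Packed storage keeps only one triangle of A, column by column, in n(n+1)/2 elements. The helper below sketches the index arithmetic under the assumption of column-major layout and the standard LAPACK packed convention, with 0-based i and j. The name packed_offset is hypothetical. Row-major layout uses a different ordering and is not covered here.

```c
#include <assert.h>

/* Offset of A(i,j) inside ap for column-major packed storage (0-based i, j).
 * Assumed convention: uplo='U' stores columns of the upper triangle one after
 * another, uplo='L' stores columns of the lower triangle.
 * Returns -1 when (i,j) is outside the stored triangle. */
long packed_offset(char uplo, int n, int i, int j)
{
    assert(n >= 0);
    if (i < 0 || j < 0 || i >= n || j >= n)
        return -1;
    if (uplo == 'U') {
        if (i > j)
            return -1;
        return i + (long)j * (j + 1) / 2;          /* j full columns of lengths 1..j precede */
    }
    if (i < j)
        return -1;
    return i + (long)j * (2L * n - j - 1) / 2;     /* columns of lengths n..n-j+1 precede */
}
```

For n = 3 the last stored element A(2,2) lands at offset 5 in both triangles, which matches the array size n(n+1)/2 = 6.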
Output Parameters
ipiv Array, size at least max(1, n). Contains details of the interchanges
and the block structure of D. If ipiv[i-1] = k > 0, then d_ii is a
1-by-1 block, and the i-th row and column of A were interchanged with
the k-th row and column.
If uplo = 'U' and ipiv[i] = ipiv[i-1] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and the i-th row and column of A were
interchanged with the m-th row and column.
If uplo = 'L' and ipiv[i] = ipiv[i-1] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and the (i+1)-th row and column of A
were interchanged with the m-th row and column.
Return Values
This function returns a value info.
If info = i, d_ii is 0. The factorization has been completed, but D is exactly singular. Division by 0 will occur
if you use D for solving a system of linear equations.
Application Notes
The 2-by-2 unit diagonal blocks and the unit diagonal elements of U and L are not stored. The remaining
elements of U and L overwrite elements of the corresponding columns of the array ap, but additional row
interchanges are required to recover U or L explicitly (which is seldom necessary).
If ipiv[i-1] = i for all i = 1...n, then all off-diagonal elements of U (L) are stored explicitly in packed form.
If uplo = 'U', the computed factors U and D are the exact factors of a perturbed matrix A + E, where
|E| ≤ c(n)ε P|U||D||U^T|P^T
c(n) is a modest linear function of n, and ε is the machine precision. A similar estimate holds for the
computed L and D when uplo = 'L'.
The total number of floating-point operations is approximately (1/3)n^3 for real flavors or (4/3)n^3 for
complex flavors.
After calling this routine, you can call the following routines:
See Also
mkl_progress
?hptrf
Computes the Bunch-Kaufman factorization of a
complex Hermitian matrix using packed storage.
Syntax
lapack_int LAPACKE_chptrf (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * ap , lapack_int * ipiv );
lapack_int LAPACKE_zhptrf (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * ap , lapack_int * ipiv );
Include Files
• mkl.h
Description
The routine computes the factorization of a complex Hermitian packed matrix A using the Bunch-Kaufman
diagonal pivoting method:
if uplo='U', A = U*D*U^H
if uplo='L', A = L*D*L^H,
where A is the input matrix, U and L are products of permutation and triangular matrices with unit diagonal
(upper triangular for U and lower triangular for L), and D is a Hermitian block-diagonal matrix with 1-by-1
and 2-by-2 diagonal blocks. U and L have 2-by-2 unit diagonal blocks corresponding to the 2-by-2 blocks of
D.
NOTE
This routine supports the Progress Routine feature. See Progress Function for details.
Input Parameters
If uplo = 'L', the array ap stores the lower triangular part of the
matrix A, and A is factored as L*D*L^H.
ap Array, size at least max(1, n(n+1)/2). The array ap contains the upper
or the lower triangular part of the matrix A (as specified by uplo) in
packed storage (see Matrix Storage Schemes).
Output Parameters
ipiv Array, size at least max(1, n). Contains details of the interchanges
and the block structure of D. If ipiv[i-1] = k > 0, then d_ii is a
1-by-1 block, and the i-th row and column of A were interchanged with
the k-th row and column.
If uplo = 'U' and ipiv[i] = ipiv[i-1] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and the i-th row and column of A were
interchanged with the m-th row and column.
If uplo = 'L' and ipiv[i] = ipiv[i-1] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and the (i+1)-th row and column of A
were interchanged with the m-th row and column.
Return Values
This function returns a value info.
If info = i, d_ii is 0. The factorization has been completed, but D is exactly singular. Division by 0 will occur
if you use D for solving a system of linear equations.
Application Notes
The 2-by-2 unit diagonal blocks and the unit diagonal elements of U and L are not stored. The remaining
elements of U and L are stored in the array ap, but additional row interchanges are required to recover U or L
explicitly (which is seldom necessary).
If ipiv[i-1] = i for all i = 1...n, then all off-diagonal elements of U (L) are stored explicitly in the
corresponding elements of the array ap.
If uplo = 'U', the computed factors U and D are the exact factors of a perturbed matrix A + E, where
|E| ≤ c(n)ε P|U||D||U^T|P^T
c(n) is a modest linear function of n, and ε is the machine precision.
A similar estimate holds for the computed L and D when uplo = 'L'.
See Also
mkl_progress
mkl_?spffrt2, mkl_?spffrtx
Computes the partial L*D*L^T factorization of a
symmetric matrix using packed storage.
Syntax
void mkl_sspffrt2 (float *ap , const MKL_INT *n , const MKL_INT *ncolm , float *work ,
float *work2 );
void mkl_dspffrt2 (double *ap , const MKL_INT *n , const MKL_INT *ncolm , double
*work , double *work2 );
void mkl_cspffrt2 (MKL_Complex8 *ap , const MKL_INT *n , const MKL_INT *ncolm ,
MKL_Complex8 *work , MKL_Complex8 *work2 );
void mkl_zspffrt2 (MKL_Complex16 *ap , const MKL_INT *n , const MKL_INT *ncolm ,
MKL_Complex16 *work , MKL_Complex16 *work2 );
void mkl_sspffrtx (float *ap , const MKL_INT *n , const MKL_INT *ncolm , float *work ,
float *work2 );
void mkl_dspffrtx (double *ap , const MKL_INT *n , const MKL_INT *ncolm , double
*work , double *work2 );
void mkl_cspffrtx (MKL_Complex8 *ap , const MKL_INT *n , const MKL_INT *ncolm ,
MKL_Complex8 *work , MKL_Complex8 *work2 );
void mkl_zspffrtx (MKL_Complex16 *ap , const MKL_INT *n , const MKL_INT *ncolm ,
MKL_Complex16 *work , MKL_Complex16 *work2 );
Include Files
• mkl.h
Description
The routine computes the partial factorization A = L*D*L^T, where L is a lower triangular matrix and D is a
diagonal matrix.
Caution
The routine assumes that the matrix A is factorizable. The routine does not perform pivoting
and does not handle diagonal elements which are zero, which cause the routine to produce
incorrect results without any indication.
Consider the matrix
$$
A = \begin{pmatrix} a & b^T \\ b & C \end{pmatrix},
$$
where a is the element in the first row and first column of A, b is a column
vector of size n - 1 containing the elements from the second through n-th column of A, C is the lower-right
square submatrix of A, and I is the identity matrix.
The mkl_?spffrt2 routine performs ncolm successive factorizations of the form
$$
A = \begin{pmatrix} a & b^T \\ b & C \end{pmatrix}
  = \begin{pmatrix} a & 0 \\ b & I \end{pmatrix}
    \begin{pmatrix} a^{-1} & 0 \\ 0 & C - b\,a^{-1}b^T \end{pmatrix}
    \begin{pmatrix} a & b^T \\ 0 & I \end{pmatrix}.
$$
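The repeated elimination step described above can be sketched on a small dense matrix. The code below is an illustration only: it works on a full row-major n-by-n array rather than packed storage, touches only the lower triangle, and uses an output layout chosen for this sketch (D on the diagonal, multipliers of L below it in the first ncolm columns, and the remaining Schur complement C - b*a^{-1}*b^T in the trailing block). The name partial_ldlt_dense is hypothetical. Unlike the library routine as described in the Caution, the sketch reports a zero pivot.

```c
#include <assert.h>

/* Performs ncolm steps of L*D*L^T elimination without pivoting on the lower
 * triangle of a dense row-major matrix a (n-by-n).
 * Returns 0 on success or j+1 if the pivot in column j (0-based) is zero. */
int partial_ldlt_dense(int n, int ncolm, double *a)
{
    assert(0 <= ncolm && ncolm <= n);
    for (int j = 0; j < ncolm; ++j) {
        double piv = a[j * n + j];
        if (piv == 0.0)
            return j + 1;
        /* Trailing update: C <- C - b * (1/piv) * b^T, lower triangle only. */
        for (int i = j + 1; i < n; ++i) {
            double l = a[i * n + j] / piv;
            for (int k = j + 1; k <= i; ++k)
                a[i * n + k] -= l * a[k * n + j];
        }
        /* Scale the column afterwards so the update above saw the original b. */
        for (int i = j + 1; i < n; ++i)
            a[i * n + j] /= piv;
    }
    return 0;
}
```

With ncolm equal to n the loop runs to completion and yields a full factorization; with a smaller ncolm the trailing block is left as the Schur complement for a later stage.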
The approximate number of floating-point operations performed by real flavors of these routines is
(1/6)*ncolm*(2*ncolm^2 - 6*ncolm*n + 3*ncolm + 6*n^2 - 6*n + 7).
The approximate number of floating-point operations performed by complex flavors of these routines is
(1/3)*ncolm*(4*ncolm^2 - 12*ncolm*n + 9*ncolm + 12*n^2 - 18*n + 8).
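As a consistency check on the two estimates above, setting ncolm = n (complete factorization) collapses each polynomial to a cubic leading term. The following is a worked simplification of the stated formulas, not an additional statement from the library documentation:

```latex
\begin{aligned}
\tfrac{1}{6}\,n\,\bigl(2n^2 - 6n\cdot n + 3n + 6n^2 - 6n + 7\bigr)
  &= \tfrac{1}{6}\,n\,\bigl(2n^2 - 3n + 7\bigr) \approx \tfrac{1}{3}n^3, \\
\tfrac{1}{3}\,n\,\bigl(4n^2 - 12n\cdot n + 9n + 12n^2 - 18n + 8\bigr)
  &= \tfrac{1}{3}\,n\,\bigl(4n^2 - 9n + 8\bigr) \approx \tfrac{4}{3}n^3 .
\end{aligned}
```

These leading terms agree with the (1/3)n^3 and (4/3)n^3 operation counts quoted for ?sptrf.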
Input Parameters
ap Array, size at least max(1, n(n+1)/2). The array ap contains the lower
triangular part of the matrix A in packed storage (see Matrix Storage
Schemes for uplo = 'L').
Output Parameters
NOTE
Specifying ncolm = n results in complete factorization A = L*D*L^T.
See Also
mkl_progress
?getrs
Solves a system of linear equations with an LU-
factored square coefficient matrix, with multiple right-
hand sides.
Syntax
lapack_int LAPACKE_sgetrs (int matrix_layout , char trans , lapack_int n , lapack_int
nrhs , const float * a , lapack_int lda , const lapack_int * ipiv , float * b ,
lapack_int ldb );
lapack_int LAPACKE_dgetrs (int matrix_layout , char trans , lapack_int n , lapack_int
nrhs , const double * a , lapack_int lda , const lapack_int * ipiv , double * b ,
lapack_int ldb );
lapack_int LAPACKE_cgetrs (int matrix_layout , char trans , lapack_int n , lapack_int
nrhs , const lapack_complex_float * a , lapack_int lda , const lapack_int * ipiv ,
lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_zgetrs (int matrix_layout , char trans , lapack_int n , lapack_int
nrhs , const lapack_complex_double * a , lapack_int lda , const lapack_int * ipiv ,
lapack_complex_double * b , lapack_int ldb );
Include Files
• mkl.h
Description
The routine solves for X the following systems of linear equations:
A*X = B if trans='N',
A^T*X = B if trans='T',
A^H*X = B if trans='C'.
Before calling this routine, you must call ?getrf to compute the LU factorization of A.
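The solve stage for the trans='N' case amounts to a row permutation followed by two triangular substitutions. The sketch below shows that idea in plain C for one right-hand side. It is not the library's implementation, and it assumes the usual LAPACK conventions for the output of ?getrf (L with unit diagonal stored below the diagonal, U on and above it, 1-based pivot indices), which are not restated in this section. The name lu_solve_sketch is hypothetical.

```c
#include <stddef.h>

/*
 * Illustrative only: solves A*x = b given an LU factorization with
 * row interchanges.
 *   lu   - n-by-n column-major array, leading dimension n, holding
 *          L (unit diagonal, strictly lower part) and U (upper part)
 *   ipiv - assumed 1-based: row i was interchanged with row ipiv[i]-1
 *   b    - right-hand side on entry, solution x on exit
 */
void lu_solve_sketch(size_t n, const double *lu, const int *ipiv, double *b)
{
    /* Apply the row interchanges to b in the order they were made. */
    for (size_t i = 0; i < n; ++i) {
        size_t p = (size_t)(ipiv[i] - 1);
        if (p != i) {
            double t = b[i];
            b[i] = b[p];
            b[p] = t;
        }
    }

    /* Forward substitution with unit lower triangular L. */
    for (size_t j = 0; j < n; ++j)
        for (size_t i = j + 1; i < n; ++i)
            b[i] -= lu[i + j * n] * b[j];

    /* Back substitution with upper triangular U. */
    for (size_t j = n; j-- > 0; ) {
        b[j] /= lu[j + j * n];
        for (size_t i = 0; i < j; ++i)
            b[i] -= lu[i + j * n] * b[j];
    }
}
```

With several right-hand sides the same two substitutions are repeated for every column of B, which is why the cost scales linearly with nrhs.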
Input Parameters
The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.
ldb The leading dimension of b; ldb ≥ max(1, n) for column major layout
and ldb ≥ nrhs for row major layout.
ipiv Array, size at least max(1, n). The ipiv array, as returned by ?getrf.
Output Parameters
Return Values
This function returns a value info.
Application Notes
For each right-hand side b, the computed solution is the exact solution of a perturbed system of equations (A
+ E)x = b, where
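|E| ≤ c(n)εP|L||U|,
c(n) is a modest linear function of n, and ε is the machine precision.
If x0 is the true solution, the computed solution x satisfies this error bound:
$$\frac{\|x - x_0\|_\infty}{\|x\|_\infty} \le c(n)\,\operatorname{cond}(A,x)\,\varepsilon,$$
where $\operatorname{cond}(A,x) = \| |A^{-1}| |A| |x| \|_\infty / \|x\|_\infty \le \|A^{-1}\|_\infty \|A\|_\infty = \kappa_\infty(A)$.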
Note that cond(A,x) can be much smaller than κ∞(A); the condition number of A^T and A^H might or might
not be equal to κ∞(A).
The approximate number of floating-point operations for one right-hand side vector b is 2n^2 for real flavors
and 8n^2 for complex flavors.
See Also
Matrix Storage Schemes
?gbtrs
Solves a system of linear equations with an LU-factored
band coefficient matrix, with multiple right-hand
sides.
Syntax
lapack_int LAPACKE_sgbtrs (int matrix_layout , char trans , lapack_int n , lapack_int
kl , lapack_int ku , lapack_int nrhs , const float * ab , lapack_int ldab , const
lapack_int * ipiv , float * b , lapack_int ldb );
lapack_int LAPACKE_dgbtrs (int matrix_layout , char trans , lapack_int n , lapack_int
kl , lapack_int ku , lapack_int nrhs , const double * ab , lapack_int ldab , const
lapack_int * ipiv , double * b , lapack_int ldb );
lapack_int LAPACKE_cgbtrs (int matrix_layout , char trans , lapack_int n , lapack_int
kl , lapack_int ku , lapack_int nrhs , const lapack_complex_float * ab , lapack_int
ldab , const lapack_int * ipiv , lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_zgbtrs (int matrix_layout , char trans , lapack_int n , lapack_int
kl , lapack_int ku , lapack_int nrhs , const lapack_complex_double * ab , lapack_int
ldab , const lapack_int * ipiv , lapack_complex_double * b , lapack_int ldb );
Include Files
• mkl.h
Description
The routine solves for X the following systems of linear equations:
A*X = B if trans='N',
A^T*X = B if trans='T',
A^H*X = B if trans='C'.
Here A is an LU-factored general band matrix of order n with kl nonzero subdiagonals and ku nonzero
superdiagonals. Before calling this routine, call ?gbtrf to compute the LU factorization of A.
Input Parameters
b Array of size max(1, ldb*nrhs) for column major layout and max(1,
ldb*n) for row major layout.
The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.
ldab The leading dimension of the array ab; ldab ≥ 2*kl + ku + 1.
ldb The leading dimension of b; ldb ≥ max(1, n) for column major layout
and ldb ≥ nrhs for row major layout.
ipiv Array, size at least max(1, n). The ipiv array, as returned by ?gbtrf.
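The bound on ldab is easier to read next to an index computation. The sketch below assumes the conventional LAPACK band layout for column major storage, in which A(i,j) sits in row kl+ku+i-j of column j (0-based) and the first kl rows are kept free for fill-in created during factorization. That layout is not described in this section, so treat the mapping as an assumption and confirm it in Matrix Storage Schemes. The function names are hypothetical.

```c
#include <stddef.h>

/* Smallest leading dimension allowed for ab: 2*kl + ku + 1. */
size_t gb_min_ldab(size_t kl, size_t ku)
{
    return 2 * kl + ku + 1;
}

/*
 * 0-based offset of A(i,j) inside a column-major band array with
 * leading dimension ldab, under the assumed layout described above.
 * Only valid inside the band: j <= i + ku and i <= j + kl.
 */
size_t gb_offset(size_t kl, size_t ku, size_t ldab, size_t i, size_t j)
{
    return (kl + ku + i - j) + j * ldab;
}
```

For a tridiagonal band (kl = ku = 1) the minimum ldab is 4, and the diagonal element of column j lands at offset 2 + 4*j.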
Output Parameters
Return Values
This function returns a value info.
Application Notes
For each right-hand side b, the computed solution is the exact solution of a perturbed system of equations (A
+ E)x = b, where
Note that cond(A,x) can be much smaller than κ∞(A); the condition number of A^T and A^H might or might
not be equal to κ∞(A).
The approximate number of floating-point operations for one right-hand side vector is 2n(ku + 2kl) for real
flavors. The number of operations for complex flavors is 4 times greater. All these estimates assume that kl
and ku are much less than n.
To estimate the condition number κ∞(A), call ?gbcon.
See Also
Matrix Storage Schemes
?gttrs
Solves a system of linear equations with a tridiagonal
coefficient matrix using the LU factorization computed
by ?gttrf.
Syntax
lapack_int LAPACKE_sgttrs (int matrix_layout , char trans , lapack_int n , lapack_int
nrhs , const float * dl , const float * d , const float * du , const float * du2 ,
const lapack_int * ipiv , float * b , lapack_int ldb );
lapack_int LAPACKE_dgttrs (int matrix_layout , char trans , lapack_int n , lapack_int
nrhs , const double * dl , const double * d , const double * du , const double * du2 ,
const lapack_int * ipiv , double * b , lapack_int ldb );
lapack_int LAPACKE_cgttrs (int matrix_layout , char trans , lapack_int n , lapack_int
nrhs , const lapack_complex_float * dl , const lapack_complex_float * d , const
lapack_complex_float * du , const lapack_complex_float * du2 , const lapack_int *
ipiv , lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_zgttrs (int matrix_layout , char trans , lapack_int n , lapack_int
nrhs , const lapack_complex_double * dl , const lapack_complex_double * d , const
lapack_complex_double * du , const lapack_complex_double * du2 , const lapack_int *
ipiv , lapack_complex_double * b , lapack_int ldb );
Include Files
• mkl.h
Description
The routine solves for X the following systems of linear equations with multiple right-hand sides:
A*X = B if trans='N',
A^T*X = B if trans='T',
A^H*X = B if trans='C'.
Before calling this routine, you must call ?gttrf to compute the LU factorization of A.
Input Parameters
matrix_layout Specifies whether matrix storage layout for array b is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).
n The order of A; n ≥ 0.
nrhs The number of right-hand sides, that is, the number of columns in B;
nrhs ≥ 0.
b Array of size max(1, ldb*nrhs) for column major layout and max(1,
n*ldb) for row major layout. Contains the matrix B whose columns
are the right-hand sides for the systems of equations.
Output Parameters
Return Values
This function returns a value info.
Application Notes
For each right-hand side b, the computed solution is the exact solution of a perturbed system of equations (A
+ E)x = b, where
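|E| ≤ c(n)εP|L||U|,
c(n) is a modest linear function of n, and ε is the machine precision.
If x0 is the true solution, the computed solution x satisfies this error bound:
$$\frac{\|x - x_0\|_\infty}{\|x\|_\infty} \le c(n)\,\operatorname{cond}(A,x)\,\varepsilon,$$
where $\operatorname{cond}(A,x) = \| |A^{-1}| |A| |x| \|_\infty / \|x\|_\infty \le \|A^{-1}\|_\infty \|A\|_\infty = \kappa_\infty(A)$.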
Note that cond(A,x) can be much smaller than κ∞(A); the condition number of A^T and A^H might or might
not be equal to κ∞(A).
The approximate number of floating-point operations for one right-hand side vector b is 7n (including n
divisions) for real flavors and 34n (including 2n divisions) for complex flavors.
See Also
Matrix Storage Schemes
?dttrsb
Solves a system of linear equations with a diagonally
dominant tridiagonal coefficient matrix using the LU
factorization computed by ?dttrfb.
Syntax
void sdttrsb (const char * trans, const MKL_INT * n, const MKL_INT * nrhs, const float
* dl, const float * d, const float * du, float * b, const MKL_INT * ldb, MKL_INT *
info );
void ddttrsb (const char * trans, const MKL_INT * n, const MKL_INT * nrhs, const double
* dl, const double * d, const double * du, double * b, const MKL_INT * ldb, MKL_INT *
info );
void cdttrsb (const char * trans, const MKL_INT * n, const MKL_INT * nrhs, const
MKL_Complex8 * dl, const MKL_Complex8 * d, const MKL_Complex8 * du, MKL_Complex8 * b,
const MKL_INT * ldb, MKL_INT * info );
void zdttrsb (const char * trans, const MKL_INT * n, const MKL_INT * nrhs, const
MKL_Complex16 * dl, const MKL_Complex16 * d, const MKL_Complex16 * du, MKL_Complex16 *
b, const MKL_INT * ldb, MKL_INT * info );
Include Files
• mkl.h
Description
The ?dttrsb routine solves the following systems of linear equations with multiple right-hand sides for X:
A*X = B if trans='N',
A^T*X = B if trans='T',
A^H*X = B if trans='C'.
Input Parameters
n The order of A; n ≥ 0.
nrhs The number of right-hand sides, that is, the number of columns in B;
nrhs ≥ 0.
Output Parameters
?potrs
Solves a system of linear equations with a Cholesky-factored
symmetric (Hermitian) positive-definite
coefficient matrix.
Syntax
lapack_int LAPACKE_spotrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const float * a , lapack_int lda , float * b , lapack_int ldb );
lapack_int LAPACKE_dpotrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const double * a , lapack_int lda , double * b , lapack_int ldb );
lapack_int LAPACKE_cpotrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const lapack_complex_float * a , lapack_int lda , lapack_complex_float * b ,
lapack_int ldb );
lapack_int LAPACKE_zpotrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const lapack_complex_double * a , lapack_int lda , lapack_complex_double * b ,
lapack_int ldb );
Include Files
• mkl.h
Description
The routine solves for X the system of linear equations A*X = B with a symmetric positive-definite or, for
complex data, Hermitian positive-definite matrix A, given the Cholesky factorization of A:
A = U^T*U for real data, A = U^H*U for complex data if uplo='U'
A = L*L^T for real data, A = L*L^H for complex data if uplo='L'
where L is a lower triangular matrix and U is upper triangular. The system is solved with multiple right-hand
sides stored in the columns of the matrix B.
Before calling this routine, you must call ?potrf to compute the Cholesky factorization of A.
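Once the Cholesky factor is available, solving reduces to two triangular substitutions. The sketch below illustrates this for real data and an upper factor U with A = U^T*U, using a dense column-major array. It is a plain C illustration under those assumptions, not the library's implementation, and the name chol_upper_solve_sketch is hypothetical.

```c
#include <stddef.h>

/*
 * Illustrative only: solves A*x = b where A = U^T * U.
 *   u - n-by-n column-major array, leading dimension n; only the
 *       upper triangle is read
 *   b - right-hand side on entry, solution x on exit
 */
void chol_upper_solve_sketch(size_t n, const double *u, double *b)
{
    /* Forward substitution: U^T * y = b. Row i of U^T is column i of U. */
    for (size_t i = 0; i < n; ++i) {
        double s = b[i];
        for (size_t k = 0; k < i; ++k)
            s -= u[k + i * n] * b[k];
        b[i] = s / u[i + i * n];
    }

    /* Back substitution: U * x = y. */
    for (size_t i = n; i-- > 0; ) {
        double s = b[i];
        for (size_t k = i + 1; k < n; ++k)
            s -= u[i + k * n] * b[k];
        b[i] = s / u[i + i * n];
    }
}
```

Each substitution costs about n^2 operations, which agrees with a total on the order of 2n^2 per right-hand side for real data.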
Input Parameters
a Array a of size at least max(1, lda*n).
b The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations. The size of b must be at least
max(1, ldb*nrhs) for column major layout and max(1, ldb*n) for
row major layout.
ldb The leading dimension of b; ldb ≥ max(1, n) for column major layout
and ldb ≥ nrhs for row major layout.
Output Parameters
Return Values
This function returns a value info.
Application Notes
If uplo = 'U', the computed solution for each right-hand side b is the exact solution of a perturbed system
of equations (A + E)x = b, where
Note that cond(A,x) can be much smaller than κ∞(A). The approximate number of floating-point operations
for one right-hand side vector b is 2n^2 for real flavors and 8n^2 for complex flavors.
See Also
Matrix Storage Schemes
?pftrs
Solves a system of linear equations with a Cholesky-factored
symmetric (Hermitian) positive-definite
coefficient matrix using the Rectangular Full Packed
(RFP) format.
Syntax
lapack_int LAPACKE_spftrs (int matrix_layout , char transr , char uplo , lapack_int n ,
lapack_int nrhs , const float * a , float * b , lapack_int ldb );
lapack_int LAPACKE_dpftrs (int matrix_layout , char transr , char uplo , lapack_int n ,
lapack_int nrhs , const double * a , double * b , lapack_int ldb );
lapack_int LAPACKE_cpftrs (int matrix_layout , char transr , char uplo , lapack_int n ,
lapack_int nrhs , const lapack_complex_float * a , lapack_complex_float * b ,
lapack_int ldb );
lapack_int LAPACKE_zpftrs (int matrix_layout , char transr , char uplo , lapack_int n ,
lapack_int nrhs , const lapack_complex_double * a , lapack_complex_double * b ,
lapack_int ldb );
Include Files
• mkl.h
Description
The routine solves a system of linear equations A*X = B with a symmetric positive-definite or, for complex
data, Hermitian positive-definite matrix A using the Cholesky factorization of A:
A = U^T*U for real data, A = U^H*U for complex data if uplo='U'
A = L*L^T for real data, A = L*L^H for complex data if uplo='L'
Before calling ?pftrs, you must call ?pftrf to compute the Cholesky factorization of A. L stands for a lower
triangular matrix and U for an upper triangular matrix.
The matrix A is in the Rectangular Full Packed (RFP) format. For the description of the RFP format, see Matrix
Storage Schemes.
Input Parameters
transr Must be 'N', 'T' (for real data) or 'C' (for complex data).
If uplo = 'L', L is stored, where A = L*L^T for real data, A = L*L^H for
complex data.
nrhs The number of right-hand sides, that is, the number of columns of the
matrix B; nrhs ≥ 0.
b The array b of size max(1, ldb*nrhs) for column major layout and
max(1, ldb*n) for row major layout contains the matrix B whose
columns are the right-hand sides for the systems of equations.
ldb The leading dimension of b; ldb ≥ max(1, n) for column major layout
and ldb ≥ nrhs for row major layout.
Output Parameters
Return Values
This function returns a value info.
See Also
Matrix Storage Schemes
?pptrs
Solves a system of linear equations with a packed
Cholesky-factored symmetric (Hermitian)
positive-definite coefficient matrix.
Syntax
lapack_int LAPACKE_spptrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const float * ap , float * b , lapack_int ldb );
lapack_int LAPACKE_dpptrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const double * ap , double * b , lapack_int ldb );
lapack_int LAPACKE_cpptrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const lapack_complex_float * ap , lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_zpptrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const lapack_complex_double * ap , lapack_complex_double * b , lapack_int ldb );
Include Files
• mkl.h
Description
The routine solves for X the system of linear equations A*X = B with a packed symmetric positive-definite or,
for complex data, Hermitian positive-definite matrix A, given the Cholesky factorization of A:
A = U^T*U for real data, A = U^H*U for complex data if uplo='U'
A = L*L^T for real data, A = L*L^H for complex data if uplo='L'
where L is a lower triangular matrix and U is upper triangular. The system is solved with multiple right-hand
sides stored in the columns of the matrix B.
Before calling this routine, you must call ?pptrf to compute the Cholesky factorization of A.
Input Parameters
b The array b of size max(1, ldb*nrhs) for column major layout and
max(1, ldb*n) for row major layout contains the matrix B whose
columns are the right-hand sides for the systems of equations.
ldb The leading dimension of b; ldb ≥ max(1, n) for column major layout
and ldb ≥ nrhs for row major layout.
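The size and leading-dimension rules for b depend on the matrix layout, which is easy to get wrong when switching between row major and column major. The sketch below restates the rules given above as small C helpers. The enum and function names are hypothetical stand-ins for illustration and are not part of the LAPACKE interface.

```c
#include <stddef.h>

/* Hypothetical layout tag used only by this sketch. */
enum sketch_layout { SKETCH_ROW_MAJOR, SKETCH_COL_MAJOR };

/* Smallest ldb: max(1, n) for column major, nrhs for row major. */
size_t min_ldb(enum sketch_layout layout, size_t n, size_t nrhs)
{
    if (layout == SKETCH_COL_MAJOR)
        return n > 1 ? n : 1;
    return nrhs;
}

/* Minimum element count of b: max(1, ldb*nrhs) or max(1, ldb*n). */
size_t min_b_elems(enum sketch_layout layout, size_t n, size_t nrhs,
                   size_t ldb)
{
    size_t len = (layout == SKETCH_COL_MAJOR) ? ldb * nrhs : ldb * n;
    return len > 1 ? len : 1;
}

/* 0-based offset of B(i,j), where i is the row and j the column. */
size_t b_offset(enum sketch_layout layout, size_t ldb, size_t i, size_t j)
{
    return (layout == SKETCH_COL_MAJOR) ? i + j * ldb : i * ldb + j;
}
```

For n = 3 and nrhs = 2 both layouts need at least six elements, but the minimum ldb is 3 for column major and 2 for row major.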
Output Parameters
Return Values
This function returns a value info.
Application Notes
If uplo = 'U', the computed solution for each right-hand side b is the exact solution of a perturbed system
of equations (A + E)x = b, where
If x0 is the true solution, the computed solution x satisfies this error bound:
The approximate number of floating-point operations for one right-hand side vector b is 2n^2 for real flavors
and 8n^2 for complex flavors.
See Also
Matrix Storage Schemes
?pbtrs
Solves a system of linear equations with a Cholesky-factored
symmetric (Hermitian) positive-definite band
coefficient matrix.
Syntax
lapack_int LAPACKE_spbtrs (int matrix_layout , char uplo , lapack_int n , lapack_int
kd , lapack_int nrhs , const float * ab , lapack_int ldab , float * b , lapack_int
ldb );
lapack_int LAPACKE_dpbtrs (int matrix_layout , char uplo , lapack_int n , lapack_int
kd , lapack_int nrhs , const double * ab , lapack_int ldab , double * b , lapack_int
ldb );
lapack_int LAPACKE_cpbtrs (int matrix_layout , char uplo , lapack_int n , lapack_int
kd , lapack_int nrhs , const lapack_complex_float * ab , lapack_int ldab ,
lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_zpbtrs (int matrix_layout , char uplo , lapack_int n , lapack_int
kd , lapack_int nrhs , const lapack_complex_double * ab , lapack_int ldab ,
lapack_complex_double * b , lapack_int ldb );
Include Files
• mkl.h
Description
The routine solves for X a system of linear equations A*X = B with a symmetric positive-definite or,
for complex data, Hermitian positive-definite band matrix A, given the Cholesky factorization of A:
A = U^T*U for real data, A = U^H*U for complex data if uplo='U'
A = L*L^T for real data, A = L*L^H for complex data if uplo='L'
where L is a lower triangular matrix and U is upper triangular. The system is solved with multiple right-hand
sides stored in the columns of the matrix B.
Before calling this routine, you must call ?pbtrf to compute the Cholesky factorization of A in the band
storage form.
Input Parameters
b The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.
The size of b is at least max(1, ldb*nrhs) for column major layout
and max(1, ldb*n) for row major layout.
ldb The leading dimension of b; ldb ≥ max(1, n) for column major layout
and ldb ≥ nrhs for row major layout.
Output Parameters
Return Values
This function returns a value info.
Application Notes
For each right-hand side b, the computed solution is the exact solution of a perturbed system of equations (A
+ E)x = b, where
The approximate number of floating-point operations for one right-hand side vector is 4n*kd for real flavors
and 16n*kd for complex flavors.
To estimate the condition number κ∞(A), call ?pbcon.
See Also
Matrix Storage Schemes
?pttrs
Solves a system of linear equations with a symmetric
(Hermitian) positive-definite tridiagonal coefficient
matrix using the factorization computed by ?pttrf.
Syntax
lapack_int LAPACKE_spttrs( int matrix_layout, lapack_int n, lapack_int nrhs, const
float* d, const float* e, float* b, lapack_int ldb );
lapack_int LAPACKE_dpttrs( int matrix_layout, lapack_int n, lapack_int nrhs, const
double* d, const double* e, double* b, lapack_int ldb );
lapack_int LAPACKE_cpttrs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const float* d, const lapack_complex_float* e, lapack_complex_float* b, lapack_int
ldb );
lapack_int LAPACKE_zpttrs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const double* d, const lapack_complex_double* e, lapack_complex_double* b, lapack_int
ldb );
Include Files
• mkl.h
Description
The routine solves for X a system of linear equations A*X = B with a symmetric (Hermitian) positive-definite
tridiagonal matrix A. Before calling this routine, call ?pttrf to compute the L*D*L^T or U^T*D*U factorization
of A for real data, or the L*D*L^H or U^H*D*U factorization of A for complex data.
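For real data, the factor-then-solve sequence on a tridiagonal matrix is short enough to write out in full. The sketch below is a plain C illustration of the L*D*L^T approach for one right-hand side, assuming d holds the diagonal and e the subdiagonal. It is not the library's implementation, it ignores the complex case, and both function names are hypothetical.

```c
#include <stddef.h>

/*
 * Illustrative only: in-place L*D*L^T factorization of a symmetric
 * positive-definite tridiagonal matrix.
 *   d - n diagonal entries on entry, diagonal of D on exit
 *   e - n-1 subdiagonal entries on entry, subdiagonal of L on exit
 * Returns 0 on success, or i+1 if the i-th pivot is not positive.
 */
int pt_factor_sketch(size_t n, double *d, double *e)
{
    for (size_t i = 0; i + 1 < n; ++i) {
        if (d[i] <= 0.0)
            return (int)(i + 1);
        double ei = e[i];
        e[i] = ei / d[i];
        d[i + 1] -= e[i] * ei;
    }
    if (n > 0 && d[n - 1] <= 0.0)
        return (int)n;
    return 0;
}

/* Solves L*D*L^T * x = b using the factors above; b is overwritten by x. */
void pt_solve_sketch(size_t n, const double *d, const double *e, double *b)
{
    if (n == 0)
        return;
    /* Forward substitution: L * y = b. */
    for (size_t i = 1; i < n; ++i)
        b[i] -= e[i - 1] * b[i - 1];
    /* Diagonal scaling and back substitution: D * L^T * x = y. */
    b[n - 1] /= d[n - 1];
    for (size_t i = n - 1; i-- > 0; )
        b[i] = b[i] / d[i] - e[i] * b[i + 1];
}
```

Both stages touch each entry a constant number of times, so the cost grows linearly with n for each right-hand side.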
Input Parameters
n The order of A; n ≥ 0.
nrhs The number of right-hand sides, that is, the number of columns of the
matrix B; nrhs ≥ 0.
e, b The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.
The size of b is at least max(1, ldb*nrhs) for column major layout
and max(1, ldb*n) for row major layout.
ldb The leading dimension of b; ldb ≥ max(1, n) for column major layout
and ldb ≥ nrhs for row major layout.
Output Parameters
Return Values
This function returns a value info.
If info = -i, parameter i had an illegal value.
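A caller typically branches on the returned info value. The sketch below shows one way to decode it, assuming the usual convention that info = 0 signals success. The helper names are hypothetical, and info is taken as long here only so the sketch does not depend on how lapack_int is defined.

```c
#include <stdio.h>

/* Returns the 1-based position of the rejected argument, or 0 if info >= 0. */
int info_illegal_argument(long info)
{
    return info < 0 ? (int)(-info) : 0;
}

/*
 * Writes a short status message into buf.
 * Returns 0 when info reports success and 1 otherwise.
 */
int describe_info(const char *routine, long info, char *buf, size_t len)
{
    if (info == 0) {
        snprintf(buf, len, "%s: successful exit", routine);
        return 0;
    }
    if (info < 0) {
        snprintf(buf, len, "%s: parameter %d had an illegal value",
                 routine, info_illegal_argument(info));
        return 1;
    }
    /* Positive values are not described in this section. */
    snprintf(buf, len, "%s: info = %ld", routine, info);
    return 1;
}
```

Checking info immediately after the call keeps an illegal argument from being mistaken for a valid solution left in b.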
See Also
Matrix Storage Schemes
?sytrs
Solves a system of linear equations with a UDU^T- or
LDL^T-factored symmetric coefficient matrix.
Syntax
lapack_int LAPACKE_ssytrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const float * a , lapack_int lda , const lapack_int * ipiv , float * b ,
lapack_int ldb );
lapack_int LAPACKE_dsytrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const double * a , lapack_int lda , const lapack_int * ipiv , double * b ,
lapack_int ldb );
lapack_int LAPACKE_csytrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const lapack_complex_float * a , lapack_int lda , const lapack_int * ipiv ,
lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_zsytrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const lapack_complex_double * a , lapack_int lda , const lapack_int * ipiv ,
lapack_complex_double * b , lapack_int ldb );
Include Files
• mkl.h
Description
The routine solves for X the system of linear equations A*X = B with a symmetric matrix A, given the
Bunch-Kaufman factorization of A:
if uplo='U', A = U*D*U^T
if uplo='L', A = L*D*L^T,
where U and L are upper and lower triangular matrices with unit diagonal and D is a symmetric
block-diagonal matrix. The system is solved with multiple right-hand sides stored in the columns of the matrix B.
You must supply to this routine the factor U (or L) and the array ipiv returned by the factorization
routine ?sytrf.
Input Parameters
If uplo = 'L', the array a stores the lower triangular factor L of the
factorization A = L*D*L^T.
ipiv Array, size at least max(1, n). The ipiv array, as returned by ?sytrf.
a The array a of size max(1, lda*n) contains the factor U or L (see
uplo).
b The array b contains the matrix B whose columns are the right-hand
sides for the system of equations.
The size of b is at least max(1, ldb*nrhs) for column major layout
and max(1, ldb*n) for row major layout.
ldb The leading dimension of b; ldb ≥ max(1, n) for column major layout
and ldb ≥ nrhs for row major layout.
Output Parameters
Return Values
This function returns a value info.
Application Notes
For each right-hand side b, the computed solution is the exact solution of a perturbed system of equations (A
+ E)x = b, where
The total number of floating-point operations for one right-hand side vector is approximately 2n^2 for real
flavors or 8n^2 for complex flavors.
To estimate the condition number κ∞(A), call ?sycon.
See Also
Matrix Storage Schemes
?sytrs_aa
Solves a system of linear equations A * X = B with a
symmetric matrix.
Syntax
lapack_int LAPACKE_ssytrs_aa (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, const float * A, lapack_int lda, const lapack_int * ipiv, float * B, lapack_int
ldb);
lapack_int LAPACKE_dsytrs_aa (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, const double * A, lapack_int lda, const lapack_int * ipiv, double * B, lapack_int
ldb);
lapack_int LAPACKE_csytrs_aa (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, const lapack_complex_float * A, lapack_int lda, const lapack_int * ipiv,
lapack_complex_float * B, lapack_int ldb);
lapack_int LAPACKE_zsytrs_aa (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, const lapack_complex_double * A, lapack_int lda, const lapack_int * ipiv,
lapack_complex_double * B, lapack_int ldb);
Description
?sytrs_aa solves a system of linear equations A * X = B with a symmetric matrix A using the factorization A
= U*T*U^T or A = L*T*L^T computed by ?sytrf_aa.
Input Parameters
uplo Specifies whether the details of the factorization are stored as an upper or
lower triangular matrix.
nrhs The number of right-hand sides; that is, the number of columns of the
matrix B. nrhs ≥ 0.
ldb The leading dimension of the array B. ldb ≥ max(1, n) for column-major
layout and ldb ≥ nrhs for row-major layout.
Output Parameters
Return Values
This function returns a value info.
= 0: Successful exit.
< 0: If info = -i, the ith argument had an illegal value.
?sytrs_rook
Solves a system of linear equations with a UDU- or
LDL-factored symmetric coefficient matrix.
Syntax
lapack_int LAPACKE_ssytrs_rook (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, const float * a, lapack_int lda, const lapack_int * ipiv, float * b, lapack_int
ldb);
lapack_int LAPACKE_dsytrs_rook (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, const double * a, lapack_int lda, const lapack_int * ipiv, double * b, lapack_int
ldb);
lapack_int LAPACKE_csytrs_rook (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, const lapack_complex_float * a, lapack_int lda, const lapack_int * ipiv,
lapack_complex_float * b, lapack_int ldb);
lapack_int LAPACKE_zsytrs_rook (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, const lapack_complex_double * a, lapack_int lda, const lapack_int * ipiv,
lapack_complex_double * b, lapack_int ldb);
Include Files
• mkl.h
Description
The routine solves a system of linear equations A*X = B with a symmetric matrix A, using the factorization A
= U*D*U^T or A = L*D*L^T computed by ?sytrf_rook.
Input Parameters
matrix_layout Specifies whether matrix storage layout for array b is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).
ipiv Array, size at least max(1, n). The ipiv array, as returned
by ?sytrf_rook.
The array a contains the block diagonal matrix D and the multipliers
used to obtain U or L as computed by ?sytrf_rook (see uplo).
The array b contains the matrix B whose columns are the right-hand
sides for the system of equations.
ldb The leading dimension of b; ldb ≥ max(1, n) for column major layout
and ldb ≥ nrhs for row major layout.
Output Parameters
Return Values
This function returns a value info.
Application Notes
The total number of floating-point operations for one right-hand side vector is approximately 2n^2 for real
flavors or 8n^2 for complex flavors.
See Also
Matrix Storage Schemes
?hetrs
Solves a system of linear equations with a UDU^H- or
LDL^H-factored Hermitian coefficient matrix.
Syntax
lapack_int LAPACKE_chetrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const lapack_complex_float * a , lapack_int lda , const lapack_int * ipiv ,
lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_zhetrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const lapack_complex_double * a , lapack_int lda , const lapack_int * ipiv ,
lapack_complex_double * b , lapack_int ldb );
Include Files
• mkl.h
Description
The routine solves for X the system of linear equations A*X = B with a Hermitian matrix A, given the
Bunch-Kaufman factorization of A:
if uplo='U', A = U*D*U^H
if uplo='L', A = L*D*L^H,
where U and L are upper and lower triangular matrices with unit diagonal and D is a Hermitian
block-diagonal matrix. The system is solved with multiple right-hand sides stored in the columns of the matrix B.
You must supply to this routine the factor U (or L) and the array ipiv returned by the factorization
routine ?hetrf.
Input Parameters
If uplo = 'L', the array a stores the lower triangular factor L of the
factorization A = L*D*L^H.
a The array a of size max(1, lda*n) contains the factor U or L (see
uplo).
b The array b contains the matrix B whose columns are the right-hand
sides for the system of equations.
The size of b is at least max(1, ldb*nrhs) for column major layout
and max(1, ldb*n) for row major layout.
ldb The leading dimension of b; ldb ≥ max(1, n) for column major layout
and ldb ≥ nrhs for row major layout.
Output Parameters
Return Values
This function returns a value info.
Application Notes
For each right-hand side b, the computed solution is the exact solution of a perturbed system of equations (A
+ E)x = b, where
The total number of floating-point operations for one right-hand side vector is approximately 8n^2.
See Also
Matrix Storage Schemes
?hetrs_aa
Solves a system of linear equations A*X = B with a
complex Hermitian matrix.
Syntax
LAPACK_DECL lapack_int LAPACKE_chetrs_aa (int matrix_layout, char uplo, lapack_int n,
lapack_int nrhs, const lapack_complex_float * a, lapack_int lda, const lapack_int *
ipiv, lapack_complex_float * b, lapack_int ldb );
LAPACK_DECL lapack_int LAPACKE_zhetrs_aa (int matrix_layout, char uplo, lapack_int n,
lapack_int nrhs, const lapack_complex_double * a, lapack_int lda, const lapack_int *
ipiv, lapack_complex_double * b, lapack_int ldb );
Description
?hetrs_aa solves a system of linear equations A*X = B with a complex Hermitian matrix A using the
factorization A = U * T * U^H or A = L * T * L^H computed by ?hetrf_aa.
Input Parameters
uplo Specifies whether the details of the factorization are stored as an upper or
lower triangular matrix.
If uplo = 'U': Upper triangular of the form A = U * T * U^H.
nrhs The number of right-hand sides, that is, the number of columns of the matrix B;
nrhs ≥ 0.
Output Parameters
Return Values
This function returns a value info.
If info < 0: if info = -i, the i-th argument had an illegal value.
Syntax - Workspace
Use this interface if you want to explicitly provide the workspace array.
LAPACK_DECL lapack_int LAPACKE_chetrs_aa_work (int matrix_layout, char uplo, lapack_int
n, lapack_int nrhs, const lapack_complex_float * a, lapack_int lda, const lapack_int *
ipiv, lapack_complex_float * b, lapack_int ldb, lapack_complex_float * work, lapack_int
lwork );
LAPACK_DECL lapack_int LAPACKE_zhetrs_aa_work (int matrix_layout, char uplo, lapack_int
n, lapack_int nrhs, const lapack_complex_double * a, lapack_int lda, const lapack_int *
ipiv, lapack_complex_double * b, lapack_int ldb, lapack_complex_double * work,
lapack_int lwork );
?hetrs_rook
Solves a system of linear equations with a UDU- or
LDL-factored Hermitian coefficient matrix.
Syntax
lapack_int LAPACKE_chetrs_rook (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, const lapack_complex_float * a, lapack_int lda, const lapack_int * ipiv,
lapack_complex_float * b, lapack_int ldb);
lapack_int LAPACKE_zhetrs_rook (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, const lapack_complex_double * a, lapack_int lda, const lapack_int * ipiv,
lapack_complex_double * b, lapack_int ldb);
Include Files
• mkl.h
Description
The routine solves a system of linear equations A*X = B with a complex Hermitian matrix A using the
factorization A = U*D*U^H or A = L*D*L^H computed by ?hetrf_rook.
Input Parameters
matrix_layout Specifies whether matrix storage layout for array b is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).
The array a contains the block diagonal matrix D and the multipliers
used to obtain the factor U or L as computed by ?hetrf_rook (see
uplo).
The array b contains the matrix B whose columns are the right-hand
sides for the system of equations.
ldb The leading dimension of b; ldb ≥ max(1, n) for column major layout
and ldb ≥ nrhs for row major layout.
Output Parameters
Return Values
This function returns a value info.
?sytrs2
Solves a system of linear equations with a UDU- or
LDL-factored symmetric coefficient matrix.
Syntax
lapack_int LAPACKE_ssytrs2 (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const float * a , lapack_int lda , const lapack_int * ipiv , float * b ,
lapack_int ldb );
lapack_int LAPACKE_dsytrs2 (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const double * a , lapack_int lda , const lapack_int * ipiv , double * b ,
lapack_int ldb );
lapack_int LAPACKE_csytrs2 (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const lapack_complex_float * a , lapack_int lda , const lapack_int * ipiv ,
lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_zsytrs2 (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const lapack_complex_double * a , lapack_int lda , const lapack_int * ipiv ,
lapack_complex_double * b , lapack_int ldb );
Include Files
• mkl.h
Description
The routine solves a system of linear equations A*X = B with a symmetric matrix A using the factorization of
A:
if uplo='U', A = U*D*U^T
if uplo='L', A = L*D*L^T
where
• U and L are upper and lower triangular matrices with unit diagonal
• D is a symmetric block-diagonal matrix.
The factorization is computed by ?sytrf.
Input Parameters
If uplo = 'L', the array a stores the lower triangular factor L of the
factorization A = L*D*L^T.
a The array a of size max(1, lda*n) contains the block diagonal matrix D
and the multipliers used to obtain the factor U or L as computed
by ?sytrf.
The size of b is at least max(1, ldb*nrhs) for column major layout
and max(1, ldb*n) for row major layout.
ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.
ipiv Array of size n. The ipiv array contains details of the interchanges
and the block structure of D as determined by ?sytrf.
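The ldb and size rules for b can be checked before the call. The following is a minimal sketch of those rules only; the names sketch_layout, min_ldb, and b_elements are hypothetical helpers, not part of the library, and clamping the row-major minimum to 1 is an assumption made so that nrhs = 0 still gives a positive stride.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout tag for this sketch; real code would pass
   LAPACK_ROW_MAJOR or LAPACK_COL_MAJOR from mkl.h instead. */
typedef enum { SKETCH_ROW_MAJOR, SKETCH_COL_MAJOR } sketch_layout;

static size_t max_sz(size_t x, size_t y) { return x > y ? x : y; }

/* Smallest leading dimension of b allowed by the rule stated above:
   column major: ldb >= max(1, n); row major: ldb >= nrhs
   (clamped to 1 here as an assumption of this sketch). */
size_t min_ldb(sketch_layout layout, size_t n, size_t nrhs)
{
    return layout == SKETCH_COL_MAJOR ? max_sz(1, n) : max_sz(1, nrhs);
}

/* Number of elements that b must hold for a given ldb:
   max(1, ldb*nrhs) for column major, max(1, ldb*n) for row major. */
size_t b_elements(sketch_layout layout, size_t n, size_t nrhs, size_t ldb)
{
    assert(ldb >= min_ldb(layout, n, nrhs));
    return layout == SKETCH_COL_MAJOR ? max_sz(1, ldb * nrhs)
                                      : max_sz(1, ldb * n);
}
```

For n = 4 and nrhs = 2, both layouts need at least 8 elements, but the minimal ldb is 4 for column major and 2 for row major.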
Output Parameters
Return Values
This function returns a value info.
See Also
?sytrf
Matrix Storage Schemes
?hetrs2
Solves a system of linear equations with a UDU- or
LDL-factored Hermitian coefficient matrix.
Syntax
lapack_int LAPACKE_chetrs2 (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const lapack_complex_float * a , lapack_int lda , const lapack_int * ipiv ,
lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_zhetrs2 (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const lapack_complex_double * a , lapack_int lda , const lapack_int * ipiv ,
lapack_complex_double * b , lapack_int ldb );
Include Files
• mkl.h
Description
The routine solves a system of linear equations A*X = B with a complex Hermitian matrix A using the
factorization of A:
if uplo='U', A = U*D*U^H
if uplo='L', A = L*D*L^H
where
• U and L are upper and lower triangular matrices with unit diagonal
• D is a Hermitian block-diagonal matrix.
The factorization is computed by ?hetrf.
Input Parameters
If uplo = 'L', the array a stores the lower triangular factor L of the
factorization A = L*D*L^H.
a The array a of size max(1, lda*n) contains the block diagonal matrix
D and the multipliers used to obtain the factor U or L as computed
by ?hetrf.
b The array b of size max(1, ldb*nrhs) for column major layout and
max(1, ldb*n) for row major layout contains the right-hand side
matrix B.
ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.
ipiv Array of size n. The ipiv array contains details of the interchanges
and the block structure of D as determined by ?hetrf.
Output Parameters
Return Values
This function returns a value info.
See Also
?hetrf
Matrix Storage Schemes
?sytrs_3
Solves a system of linear equations A * X = B with a
real or complex symmetric matrix.
lapack_int LAPACKE_ssytrs_3 (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, const float * A, lapack_int lda, const float * e, const lapack_int * ipiv, float
* B, lapack_int ldb);
lapack_int LAPACKE_dsytrs_3 (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, const double * A, lapack_int lda, const double * e, const lapack_int * ipiv,
double * B, lapack_int ldb);
lapack_int LAPACKE_csytrs_3 (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, const lapack_complex_float * A, lapack_int lda, const lapack_complex_float * e,
const lapack_int * ipiv, lapack_complex_float * B, lapack_int ldb);
lapack_int LAPACKE_zsytrs_3 (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, const lapack_complex_double * A, lapack_int lda, const lapack_complex_double * e,
const lapack_int * ipiv, lapack_complex_double * B, lapack_int ldb);
Description
?sytrs_3 solves a system of linear equations A * X = B with a real or complex symmetric matrix A using the
factorization computed by ?sytrf_rk: A = P*U*D*(U^T)*(P^T) or A = P*L*D*(L^T)*(P^T), where U (or L) is a unit
upper (or lower) triangular matrix, U^T (or L^T) is the transpose of U (or L), P is a permutation matrix, P^T is the
transpose of P, and D is a symmetric block diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks.
This algorithm uses Level 3 BLAS.
Input Parameters
uplo Specifies whether the details of the factorization are stored as an upper or
lower triangular matrix:
nrhs The number of right-hand sides; that is, the number of columns of the
matrix B. nrhs ≥ 0.
A Array of size max(1, lda*n). Diagonal of the block diagonal matrix D and
factors U or L as computed by ?sytrf_rk:
—and—
• If uplo = 'U', factor U in the superdiagonal part of A. If uplo = 'L',
factor L in the subdiagonal part of A.
ipiv Array of size n. Details of the interchanges and the block structure of D as
determined by ?sytrf_rk.
ldb The leading dimension of the array B. ldb ≥ max(1, n) for column-major
layout and ldb ≥ nrhs for row-major layout.
Output Parameters
Return Values
This function returns a value info.
= 0: Successful exit.
< 0: If info = -i, the ith argument had an illegal value.
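The return-value convention above can be turned into a small diagnostic helper. This is only a sketch: describe_info is a hypothetical name, info is taken as long on the assumption that lapack_int fits in it, and positive values are reported without interpretation because this section does not describe them.

```c
#include <assert.h>
#include <stdio.h>

/* Formats a message for an info value following the convention above.
   Returns 0 when the call succeeded and 1 otherwise. */
int describe_info(long info, char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);
    if (info == 0) {
        snprintf(buf, buflen, "successful exit");
        return 0;
    }
    if (info < 0) {
        /* info = -i: the i-th argument had an illegal value. */
        snprintf(buf, buflen, "argument %ld had an illegal value", -info);
        return 1;
    }
    /* Positive values are not covered by the text above. */
    snprintf(buf, buflen, "routine-specific status %ld", info);
    return 1;
}
```

A caller would typically pass the value returned by the LAPACKE routine straight to such a helper and stop processing when it returns nonzero.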
?hetrs_3
Solves a system of linear equations A * X = B with a
complex Hermitian matrix using the factorization
computed by ?hetrf_rk.
lapack_int LAPACKE_chetrs_3 (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, const lapack_complex_float * A, lapack_int lda, const lapack_complex_float * e,
const lapack_int * ipiv, lapack_complex_float * B, lapack_int ldb);
lapack_int LAPACKE_zhetrs_3 (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, const lapack_complex_double * A, lapack_int lda, const lapack_complex_double * e,
const lapack_int * ipiv, lapack_complex_double * B, lapack_int ldb);
Description
?hetrs_3 solves a system of linear equations A * X = B with a complex Hermitian matrix A using the
factorization computed by ?hetrf_rk: A = P*U*D*(U^H)*(P^T) or A = P*L*D*(L^H)*(P^T), where U (or L) is a unit
upper (or lower) triangular matrix, U^H (or L^H) is the conjugate transpose of U (or L), P is a permutation matrix, P^T is the
transpose of P, and D is a Hermitian block diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks.
This algorithm uses Level 3 BLAS.
Input Parameters
uplo Specifies whether the details of the factorization are stored as an upper or
lower triangular matrix:
• = 'U': Upper triangular; form is A = P*U*D*(U^H)*(P^T).
• = 'L': Lower triangular; form is A = P*L*D*(L^H)*(P^T).
nrhs The number of right-hand sides; that is, the number of columns in the
matrix B. nrhs ≥ 0.
A Array of size max(1, lda*n). Diagonal of the block diagonal matrix D and
factor U or L as computed by ?hetrf_rk:
ipiv Array of size n. Details of the interchanges and the block structure of D as
determined by ?hetrf_rk.
ldb The leading dimension of the array B. ldb ≥ max(1, n) for column-major
layout and ldb ≥ nrhs for row-major layout.
Output Parameters
Return Values
This function returns a value info.
= 0: Successful exit.
< 0: If info = -i, the ith argument had an illegal value.
?sptrs
Solves a system of linear equations with a UDU- or
LDL-factored symmetric coefficient matrix using
packed storage.
Syntax
lapack_int LAPACKE_ssptrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const float * ap , const lapack_int * ipiv , float * b , lapack_int ldb );
lapack_int LAPACKE_dsptrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const double * ap , const lapack_int * ipiv , double * b , lapack_int ldb );
lapack_int LAPACKE_csptrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const lapack_complex_float * ap , const lapack_int * ipiv , lapack_complex_float
* b , lapack_int ldb );
lapack_int LAPACKE_zsptrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const lapack_complex_double * ap , const lapack_int * ipiv ,
lapack_complex_double * b , lapack_int ldb );
Include Files
• mkl.h
Description
The routine solves for X the system of linear equations A*X = B with a symmetric matrix A, given the Bunch-
Kaufman factorization of A:
if uplo='U', A = U*D*U^T
if uplo='L', A = L*D*L^T,
where U and L are upper and lower packed triangular matrices with unit diagonal and D is a symmetric
block-diagonal matrix. The system is solved with multiple right-hand sides stored in the columns of the
matrix B. You must supply the factor U (or L) and the array ipiv returned by the factorization routine ?sptrf.
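The packed layout of a triangle can be illustrated with a short index computation. The sketch below assumes the standard LAPACK column-major packed scheme (see Matrix Storage Schemes) and 0-based indices; packed_index and pack_triangle are hypothetical helper names, and the row-major case is not covered.

```c
#include <assert.h>
#include <stddef.h>

/* Index of element (i, j), 0-based, in column-major packed storage.
   uplo == 'U' requires i <= j; uplo == 'L' requires i >= j. */
size_t packed_index(char uplo, size_t n, size_t i, size_t j)
{
    assert(i < n && j < n);
    if (uplo == 'U') {
        assert(i <= j);
        return i + j * (j + 1) / 2;
    }
    assert(uplo == 'L' && i >= j);
    return i + j * (2 * n - j - 1) / 2;
}

/* Copies the selected triangle of a dense column-major n-by-n matrix a
   (leading dimension lda) into ap, which must hold n*(n+1)/2 elements. */
void pack_triangle(char uplo, size_t n, const double *a, size_t lda,
                   double *ap)
{
    assert(lda >= n);
    for (size_t j = 0; j < n; ++j) {
        size_t lo = (uplo == 'U') ? 0 : j;
        size_t hi = (uplo == 'U') ? j : n - 1;
        for (size_t i = lo; i <= hi; ++i)
            ap[packed_index(uplo, n, i, j)] = a[i + j * lda];
    }
}
```

The same index mapping describes where the factor U (or L) returned by the factorization routine sits inside ap.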
Input Parameters
ipiv Array, size at least max(1, n). The ipiv array, as returned by ?sptrf.
b The array b contains the matrix B whose columns are the right-hand
sides for the system of equations. The size of b is max(1, ldb*nrhs)
for column major layout and max(1, ldb*n) for row major layout.
ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.
Output Parameters
Return Values
This function returns a value info.
Application Notes
For each right-hand side b, the computed solution is the exact solution of a perturbed system of equations (A
+ E)x = b, where
The total number of floating-point operations for one right-hand side vector is approximately 2n^2 for real
flavors or 8n^2 for complex flavors.
See Also
Matrix Storage Schemes
?hptrs
Solves a system of linear equations with a UDU- or
LDL-factored Hermitian coefficient matrix using
packed storage.
Syntax
lapack_int LAPACKE_chptrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const lapack_complex_float * ap , const lapack_int * ipiv , lapack_complex_float
* b , lapack_int ldb );
Include Files
• mkl.h
Description
The routine solves for X the system of linear equations A*X = B with a Hermitian matrix A, given the Bunch-
Kaufman factorization of A:
if uplo='U', A = U*D*U^H
if uplo='L', A = L*D*L^H,
where U and L are upper and lower packed triangular matrices with unit diagonal and D is a Hermitian
block-diagonal matrix. The system is solved with multiple right-hand sides stored in the columns of the
matrix B.
You must supply to this routine the arrays ap (containing U or L) and ipiv in the form returned by the
factorization routine ?hptrf.
Input Parameters
ipiv Array, size at least max(1, n). The ipiv array, as returned by ?hptrf.
b The array b contains the matrix B whose columns are the right-hand
sides for the system of equations. The size of b is max(1, ldb*nrhs)
for column major layout and max(1, ldb*n) for row major layout.
ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.
Output Parameters
Return Values
This function returns a value info.
Application Notes
For each right-hand side b, the computed solution is the exact solution of a perturbed system of equations (A
+ E)x = b, where
The total number of floating-point operations for one right-hand side vector is approximately 8n^2 for complex
flavors.
To estimate the condition number κ_∞(A), call ?hpcon.
See Also
Matrix Storage Schemes
?trtrs
Solves a system of linear equations with a triangular
coefficient matrix, with multiple right-hand sides.
Syntax
lapack_int LAPACKE_strtrs (int matrix_layout , char uplo , char trans , char diag ,
lapack_int n , lapack_int nrhs , const float * a , lapack_int lda , float * b ,
lapack_int ldb );
lapack_int LAPACKE_dtrtrs (int matrix_layout , char uplo , char trans , char diag ,
lapack_int n , lapack_int nrhs , const double * a , lapack_int lda , double * b ,
lapack_int ldb );
lapack_int LAPACKE_ctrtrs (int matrix_layout , char uplo , char trans , char diag ,
lapack_int n , lapack_int nrhs , const lapack_complex_float * a , lapack_int lda ,
lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_ztrtrs (int matrix_layout , char uplo , char trans , char diag ,
lapack_int n , lapack_int nrhs , const lapack_complex_double * a , lapack_int lda ,
lapack_complex_double * b , lapack_int ldb );
Include Files
• mkl.h
Description
The routine solves for X the following systems of linear equations with a triangular matrix A, with multiple
right-hand sides stored in B:
A*X = B if trans='N',
A^T*X = B if trans='T',
A^H*X = B if trans='C' (for complex matrices only).
Input Parameters
b The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.
The size of b is max(1, ldb*nrhs) for column major layout and
max(1, ldb*n) for row major layout.
ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.
Output Parameters
Return Values
This function returns a value info.
Application Notes
For each right-hand side b, the computed solution is the exact solution of a perturbed system of equations (A
+ E)x = b, where
Note that cond(A,x) can be much smaller than κ_∞(A); the condition number of A^T and A^H might or might
not be equal to κ_∞(A).
The approximate number of floating-point operations for one right-hand side vector b is n^2 for real flavors
and 4n^2 for complex flavors.
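The operation count for a real triangular solve can be seen in a plain back substitution. The sketch below is an illustration of the computation for one right-hand side, not the library's implementation; it assumes an upper triangular, non-unit-diagonal, column-major matrix with trans='N', and upper_solve with its return convention is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Solves A*x = b in place for an upper triangular, non-unit-diagonal,
   column-major n-by-n matrix a (leading dimension lda). On return b holds x.
   Returns 0 on success, or k+1 if diagonal element k (0-based) is exactly
   zero, in which case b is left unchanged. If flops is not NULL, it receives
   the number of floating-point operations performed. */
int upper_solve(size_t n, const double *a, size_t lda, double *b,
                size_t *flops)
{
    assert(lda >= n);
    size_t count = 0;

    /* Check for exact singularity before touching b. */
    for (size_t k = 0; k < n; ++k)
        if (a[k + k * lda] == 0.0)
            return (int)(k + 1);

    /* Back substitution, last unknown first. */
    for (size_t j = n; j-- > 0; ) {
        b[j] /= a[j + j * lda];
        count += 1;
        for (size_t i = 0; i < j; ++i) {
            b[i] -= b[j] * a[i + j * lda];
            count += 2;
        }
    }
    if (flops)
        *flops = count;
    return 0;
}
```

The loop performs n divisions and n(n-1)/2 multiply-subtract pairs, which adds up to n^2 operations, in line with the estimate for real flavors.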
See Also
Matrix Storage Schemes
?tptrs
Solves a system of linear equations with a packed
triangular coefficient matrix, with multiple right-hand
sides.
Syntax
lapack_int LAPACKE_stptrs (int matrix_layout , char uplo , char trans , char diag ,
lapack_int n , lapack_int nrhs , const float * ap , float * b , lapack_int ldb );
lapack_int LAPACKE_dtptrs (int matrix_layout , char uplo , char trans , char diag ,
lapack_int n , lapack_int nrhs , const double * ap , double * b , lapack_int ldb );
lapack_int LAPACKE_ctptrs (int matrix_layout , char uplo , char trans , char diag ,
lapack_int n , lapack_int nrhs , const lapack_complex_float * ap , lapack_complex_float
* b , lapack_int ldb );
lapack_int LAPACKE_ztptrs (int matrix_layout , char uplo , char trans , char diag ,
lapack_int n , lapack_int nrhs , const lapack_complex_double * ap ,
lapack_complex_double * b , lapack_int ldb );
Include Files
• mkl.h
Description
The routine solves for X the following systems of linear equations with a packed triangular matrix A, with
multiple right-hand sides stored in B:
A*X = B if trans='N',
A^T*X = B if trans='T',
A^H*X = B if trans='C' (for complex matrices only).
Input Parameters
b The array b contains the matrix B whose columns are the right-hand
sides for the system of equations.
The size of b is max(1, ldb*nrhs) for column major layout and
max(1, ldb*n) for row major layout.
ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.
Output Parameters
Return Values
This function returns a value info.
Application Notes
For each right-hand side b, the computed solution is the exact solution of a perturbed system of equations (A
+ E)x = b, where
Note that cond(A,x) can be much smaller than κ_∞(A); the condition number of A^T and A^H might or might
not be equal to κ_∞(A).
The approximate number of floating-point operations for one right-hand side vector b is n^2 for real flavors
and 4n^2 for complex flavors.
See Also
Matrix Storage Schemes
?tbtrs
Solves a system of linear equations with a band
triangular coefficient matrix, with multiple right-hand
sides.
Syntax
lapack_int LAPACKE_stbtrs (int matrix_layout , char uplo , char trans , char diag ,
lapack_int n , lapack_int kd , lapack_int nrhs , const float * ab , lapack_int ldab ,
float * b , lapack_int ldb );
lapack_int LAPACKE_dtbtrs (int matrix_layout , char uplo , char trans , char diag ,
lapack_int n , lapack_int kd , lapack_int nrhs , const double * ab , lapack_int ldab ,
double * b , lapack_int ldb );
lapack_int LAPACKE_ctbtrs (int matrix_layout , char uplo , char trans , char diag ,
lapack_int n , lapack_int kd , lapack_int nrhs , const lapack_complex_float * ab ,
lapack_int ldab , lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_ztbtrs (int matrix_layout , char uplo , char trans , char diag ,
lapack_int n , lapack_int kd , lapack_int nrhs , const lapack_complex_double * ab ,
lapack_int ldab , lapack_complex_double * b , lapack_int ldb );
Include Files
• mkl.h
Description
The routine solves for X the following systems of linear equations with a band triangular matrix A, with
multiple right-hand sides stored in B:
A*X = B if trans='N',
A^T*X = B if trans='T',
A^H*X = B if trans='C' (for complex matrices only).
Input Parameters
b The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.
The size of b is max(1, ldb*nrhs) for column major layout and
max(1, ldb*n) for row major layout.
ldab The leading dimension of ab; ldab ≥ kd + 1.
ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.
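The requirement ldab ≥ kd + 1 follows from how a triangular band matrix is laid out in ab. The sketch below assumes the standard LAPACK column-major band scheme for an upper triangular matrix with kd superdiagonals (see Matrix Storage Schemes) and 0-based indices; pack_upper_band is a hypothetical helper, and the row-major case is not covered.

```c
#include <assert.h>
#include <stddef.h>

/* Copies the band of an upper triangular column-major matrix a (n-by-n,
   leading dimension lda, kd superdiagonals) into band storage ab with
   leading dimension ldab >= kd + 1. Element A(i, j) is stored at
   ab[(kd + i - j) + j*ldab] for max(0, j - kd) <= i <= j.
   Entries of ab that correspond to no matrix element are left untouched. */
void pack_upper_band(size_t n, size_t kd, const double *a, size_t lda,
                     double *ab, size_t ldab)
{
    assert(lda >= n && ldab >= kd + 1);
    for (size_t j = 0; j < n; ++j) {
        size_t first = j > kd ? j - kd : 0;
        for (size_t i = first; i <= j; ++i)
            ab[(kd + i - j) + j * ldab] = a[i + j * lda];
    }
}
```

Each column of ab holds at most kd + 1 entries, with the diagonal element in row kd, which is why the solve costs on the order of n*kd operations rather than n^2.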
Output Parameters
Return Values
This function returns a value info.
Application Notes
For each right-hand side b, the computed solution is the exact solution of a perturbed system of equations (A
+ E)x = b, where
|E| ≤ c(n)ε|A|,
c(n) is a modest linear function of n, and ε is the machine precision. If x_0 is the true solution, the computed
solution x satisfies this error bound:
Note that cond(A,x) can be much smaller than κ_∞(A); the condition number of A^T and A^H might or might
not be equal to κ_∞(A).
The approximate number of floating-point operations for one right-hand side vector b is 2n*kd for real
flavors and 8n*kd for complex flavors.
See Also
Matrix Storage Schemes
?gecon
Estimates the reciprocal of the condition number of a
general matrix in the 1-norm or the infinity-norm.
Syntax
lapack_int LAPACKE_sgecon( int matrix_layout, char norm, lapack_int n, const float* a,
lapack_int lda, float anorm, float* rcond );
lapack_int LAPACKE_dgecon( int matrix_layout, char norm, lapack_int n, const double* a,
lapack_int lda, double anorm, double* rcond );
lapack_int LAPACKE_cgecon( int matrix_layout, char norm, lapack_int n, const
lapack_complex_float* a, lapack_int lda, float anorm, float* rcond );
lapack_int LAPACKE_zgecon( int matrix_layout, char norm, lapack_int n, const
lapack_complex_double* a, lapack_int lda, double anorm, double* rcond );
Include Files
• mkl.h
Description
The routine estimates the reciprocal of the condition number of a general matrix A in the 1-norm or infinity-
norm:
κ_1(A) = ||A||_1 ||A^{-1}||_1 = κ_∞(A^T) = κ_∞(A^H)
κ_∞(A) = ||A||_∞ ||A^{-1}||_∞ = κ_1(A^T) = κ_1(A^H).
An estimate is obtained for ||A^{-1}||, and the reciprocal of the condition number is computed as rcond =
1 / (||A|| ||A^{-1}||).
Before calling this routine:
Input Parameters
Output Parameters
rcond An estimate of the reciprocal of the condition number. The routine sets
rcond = 0 if the estimate underflows; in this case the matrix is
singular (to working precision). However, anytime rcond is small
compared to 1.0, for the working precision, the matrix may be poorly
conditioned or even singular.
Return Values
This function returns a value info.
Application Notes
The computed rcond is never less than r (the reciprocal of the true condition number) and in practice is
nearly always less than 10r. A call to this routine involves solving a number of systems of linear equations
A*x = b or A^H*x = b; the number is usually 4 or 5 and never more than 11. Each solution requires
approximately 2*n^2 floating-point operations for real flavors and 8*n^2 for complex flavors.
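A caller usually turns rcond into a decision. The following sketch shows one possible policy for double precision; cond_class, classify_rcond, and cond_lower_bound are hypothetical names, and the DBL_EPSILON threshold is an assumption of this sketch rather than a rule stated by the library.

```c
#include <assert.h>
#include <float.h>

typedef enum { COND_OK, COND_ILL, COND_SINGULAR } cond_class;

/* Classifies a matrix from its estimated reciprocal condition number.
   rcond == 0 means the estimate underflowed (singular to working precision).
   The DBL_EPSILON cut-off for "ill-conditioned" is an assumed policy. */
cond_class classify_rcond(double rcond)
{
    assert(rcond >= 0.0);
    if (rcond == 0.0)
        return COND_SINGULAR;
    if (rcond < DBL_EPSILON)
        return COND_ILL;
    return COND_OK;
}

/* Because rcond is never less than the true reciprocal r, the value
   1/rcond does not exceed the true condition number 1/r. */
double cond_lower_bound(double rcond)
{
    assert(rcond > 0.0);
    return 1.0 / rcond;
}
```

Applications with looser accuracy requirements may prefer a larger threshold; the right cut-off depends on how many digits of the solution are needed.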
See Also
Matrix Storage Schemes
?gbcon
Estimates the reciprocal of the condition number of a
band matrix in the 1-norm or the infinity-norm.
Syntax
lapack_int LAPACKE_sgbcon( int matrix_layout, char norm, lapack_int n, lapack_int kl,
lapack_int ku, const float* ab, lapack_int ldab, const lapack_int* ipiv, float anorm,
float* rcond );
lapack_int LAPACKE_dgbcon( int matrix_layout, char norm, lapack_int n, lapack_int kl,
lapack_int ku, const double* ab, lapack_int ldab, const lapack_int* ipiv, double anorm,
double* rcond );
lapack_int LAPACKE_cgbcon( int matrix_layout, char norm, lapack_int n, lapack_int kl,
lapack_int ku, const lapack_complex_float* ab, lapack_int ldab, const lapack_int* ipiv,
float anorm, float* rcond );
lapack_int LAPACKE_zgbcon( int matrix_layout, char norm, lapack_int n, lapack_int kl,
lapack_int ku, const lapack_complex_double* ab, lapack_int ldab, const lapack_int*
ipiv, double anorm, double* rcond );
Include Files
• mkl.h
Description
The routine estimates the reciprocal of the condition number of a general band matrix A in the 1-norm or
infinity-norm:
κ_1(A) = ||A||_1 ||A^{-1}||_1 = κ_∞(A^T) = κ_∞(A^H)
κ_∞(A) = ||A||_∞ ||A^{-1}||_∞ = κ_1(A^T) = κ_1(A^H).
An estimate is obtained for ||A^{-1}||, and the reciprocal of the condition number is computed as rcond =
1 / (||A|| ||A^{-1}||).
Before calling this routine:
Input Parameters
ldab The leading dimension of the array ab; ldab ≥ 2*kl + ku + 1.
ipiv Array, size at least max(1, n). The ipiv array, as returned by ?gbtrf.
ab The array ab of size max(1, ldab*n) contains the factored band matrix
A, as returned by ?gbtrf.
Output Parameters
rcond An estimate of the reciprocal of the condition number. The routine sets
rcond = 0 if the estimate underflows; in this case the matrix is singular
(to working precision). However, anytime rcond is small compared to
1.0, for the working precision, the matrix may be poorly conditioned
or even singular.
Return Values
This function returns a value info.
Application Notes
The computed rcond is never less than r (the reciprocal of the true condition number) and in practice is
nearly always less than 10r. A call to this routine involves solving a number of systems of linear equations
A*x = b or A^H*x = b; the number is usually 4 or 5 and never more than 11. Each solution requires
approximately 2n(ku + 2kl) floating-point operations for real flavors and 8n(ku + 2kl) for complex
flavors.
See Also
Matrix Storage Schemes
?gtcon
Estimates the reciprocal of the condition number of a
tridiagonal matrix.
Syntax
lapack_int LAPACKE_sgtcon( char norm, lapack_int n, const float* dl, const float* d,
const float* du, const float* du2, const lapack_int* ipiv, float anorm, float* rcond );
lapack_int LAPACKE_dgtcon( char norm, lapack_int n, const double* dl, const double* d,
const double* du, const double* du2, const lapack_int* ipiv, double anorm, double*
rcond );
lapack_int LAPACKE_cgtcon( char norm, lapack_int n, const lapack_complex_float* dl,
const lapack_complex_float* d, const lapack_complex_float* du, const
lapack_complex_float* du2, const lapack_int* ipiv, float anorm, float* rcond );
lapack_int LAPACKE_zgtcon( char norm, lapack_int n, const lapack_complex_double* dl,
const lapack_complex_double* d, const lapack_complex_double* du, const
lapack_complex_double* du2, const lapack_int* ipiv, double anorm, double* rcond );
Include Files
• mkl.h
Description
The routine estimates the reciprocal of the condition number of a real or complex tridiagonal matrix A in the
1-norm or infinity-norm:
κ_1(A) = ||A||_1 ||A^{-1}||_1
κ_∞(A) = ||A||_∞ ||A^{-1}||_∞
An estimate is obtained for ||A^{-1}||, and the reciprocal of the condition number is computed as rcond =
1 / (||A|| ||A^{-1}||).
Before calling this routine:
Input Parameters
ipiv Array, size (n). The array of pivot indices, as returned by ?gttrf.
Output Parameters
rcond An estimate of the reciprocal of the condition number. The routine sets
rcond = 0 if the estimate underflows; in this case the matrix is singular
(to working precision). However, anytime rcond is small compared to
1.0, for the working precision, the matrix may be poorly conditioned
or even singular.
Return Values
This function returns a value info.
Application Notes
The computed rcond is never less than r (the reciprocal of the true condition number) and in practice is
nearly always less than 10r. A call to this routine involves solving a number of systems of linear equations
A*x = b; the number is usually 4 or 5 and never more than 11. Each solution requires approximately 2n^2
floating-point operations for real flavors and 8n^2 for complex flavors.
?pocon
Estimates the reciprocal of the condition number of a
symmetric (Hermitian) positive-definite matrix.
Syntax
lapack_int LAPACKE_spocon( int matrix_layout, char uplo, lapack_int n, const float* a,
lapack_int lda, float anorm, float* rcond );
lapack_int LAPACKE_dpocon( int matrix_layout, char uplo, lapack_int n, const double* a,
lapack_int lda, double anorm, double* rcond );
lapack_int LAPACKE_cpocon( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_float* a, lapack_int lda, float anorm, float* rcond );
lapack_int LAPACKE_zpocon( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_double* a, lapack_int lda, double anorm, double* rcond );
Include Files
• mkl.h
Description
The routine estimates the reciprocal of the condition number of a symmetric (Hermitian) positive-definite
matrix A:
κ_1(A) = ||A||_1 ||A^{-1}||_1 (since A is symmetric or Hermitian, κ_∞(A) = κ_1(A)).
An estimate is obtained for ||A^{-1}||, and the reciprocal of the condition number is computed as rcond =
1 / (||A|| ||A^{-1}||).
Before calling this routine:
Input Parameters
Output Parameters
rcond An estimate of the reciprocal of the condition number. The routine sets
rcond = 0 if the estimate underflows; in this case the matrix is singular
(to working precision). However, anytime rcond is small compared to
1.0, for the working precision, the matrix may be poorly conditioned
or even singular.
Return Values
This function returns a value info.
Application Notes
The computed rcond is never less than r (the reciprocal of the true condition number) and in practice is
nearly always less than 10r. A call to this routine involves solving a number of systems of linear equations
A*x = b; the number is usually 4 or 5 and never more than 11. Each solution requires approximately 2n^2
floating-point operations for real flavors and 8n^2 for complex flavors.
See Also
Matrix Storage Schemes
?ppcon
Estimates the reciprocal of the condition number of a
packed symmetric (Hermitian) positive-definite
matrix.
Syntax
lapack_int LAPACKE_sppcon( int matrix_layout, char uplo, lapack_int n, const float* ap,
float anorm, float* rcond );
lapack_int LAPACKE_dppcon( int matrix_layout, char uplo, lapack_int n, const double*
ap, double anorm, double* rcond );
lapack_int LAPACKE_cppcon( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_float* ap, float anorm, float* rcond );
lapack_int LAPACKE_zppcon( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_double* ap, double anorm, double* rcond );
Include Files
• mkl.h
Description
The routine estimates the reciprocal of the condition number of a packed symmetric (Hermitian) positive-
definite matrix A:
κ_1(A) = ||A||_1 ||A^{-1}||_1 (since A is symmetric or Hermitian, κ_∞(A) = κ_1(A)).
An estimate is obtained for ||A^{-1}||, and the reciprocal of the condition number is computed as rcond =
1 / (||A|| ||A^{-1}||).
Before calling this routine:
Input Parameters
If uplo = 'L', A is factored as A = L*L^T for real flavors or A = L*L^H
for complex flavors, and L is stored.
Output Parameters
rcond An estimate of the reciprocal of the condition number. The routine sets
rcond = 0 if the estimate underflows; in this case the matrix is singular
(to working precision). However, anytime rcond is small compared to
1.0, for the working precision, the matrix may be poorly conditioned
or even singular.
Return Values
This function returns a value info.
Application Notes
The computed rcond is never less than r (the reciprocal of the true condition number) and in practice is
nearly always less than 10r. A call to this routine involves solving a number of systems of linear equations
A*x = b; the number is usually 4 or 5 and never more than 11. Each solution requires approximately 2n^2
floating-point operations for real flavors and 8n^2 for complex flavors.
See Also
Matrix Storage Schemes
?pbcon
Estimates the reciprocal of the condition number of a
symmetric (Hermitian) positive-definite band matrix.
Syntax
lapack_int LAPACKE_spbcon( int matrix_layout, char uplo, lapack_int n, lapack_int kd,
const float* ab, lapack_int ldab, float anorm, float* rcond );
lapack_int LAPACKE_dpbcon( int matrix_layout, char uplo, lapack_int n, lapack_int kd,
const double* ab, lapack_int ldab, double anorm, double* rcond );
lapack_int LAPACKE_cpbcon( int matrix_layout, char uplo, lapack_int n, lapack_int kd,
const lapack_complex_float* ab, lapack_int ldab, float anorm, float* rcond );
lapack_int LAPACKE_zpbcon( int matrix_layout, char uplo, lapack_int n, lapack_int kd,
const lapack_complex_double* ab, lapack_int ldab, double anorm, double* rcond );
Include Files
• mkl.h
Description
The routine estimates the reciprocal of the condition number of a symmetric (Hermitian) positive-definite
band matrix A:
κ_1(A) = ||A||_1 ||A^{-1}||_1 (since A is symmetric or Hermitian, κ_∞(A) = κ_1(A)).
An estimate is obtained for ||A^{-1}||, and the reciprocal of the condition number is computed as rcond =
1 / (||A|| ||A^{-1}||).
Before calling this routine:
Input Parameters
Output Parameters
rcond An estimate of the reciprocal of the condition number. The routine sets
rcond = 0 if the estimate underflows; in this case the matrix is singular
(to working precision). However, anytime rcond is small compared to
1.0, for the working precision, the matrix may be poorly conditioned
or even singular.
Return Values
This function returns a value info.
Application Notes
The computed rcond is never less than r (the reciprocal of the true condition number) and in practice is
nearly always less than 10r. A call to this routine involves solving a number of systems of linear equations
A*x = b; the number is usually 4 or 5 and never more than 11. Each solution requires approximately
4*n(kd + 1) floating-point operations for real flavors and 16*n(kd + 1) for complex flavors.
See Also
Matrix Storage Schemes
?ptcon
Estimates the reciprocal of the condition number of a
symmetric (Hermitian) positive-definite tridiagonal
matrix.
Syntax
lapack_int LAPACKE_sptcon( lapack_int n, const float* d, const float* e, float anorm,
float* rcond );
lapack_int LAPACKE_dptcon( lapack_int n, const double* d, const double* e, double
anorm, double* rcond );
lapack_int LAPACKE_cptcon( lapack_int n, const float* d, const lapack_complex_float* e,
float anorm, float* rcond );
lapack_int LAPACKE_zptcon( lapack_int n, const double* d, const lapack_complex_double*
e, double anorm, double* rcond );
Include Files
• mkl.h
Description
The routine computes the reciprocal of the condition number (in the 1-norm) of a real symmetric or complex
Hermitian positive-definite tridiagonal matrix using the factorization A = L*D*L^T for real flavors and A =
L*D*L^H for complex flavors or A = U^T*D*U for real flavors and A = U^H*D*U for complex flavors computed
by ?pttrf:
Input Parameters
Output Parameters
rcond An estimate of the reciprocal of the condition number. The routine sets
rcond = 0 if the estimate underflows; in this case the matrix is singular
(to working precision). However, anytime rcond is small compared to
1.0, for the working precision, the matrix may be poorly conditioned
or even singular.
Return Values
This function returns a value info.
Application Notes
The computed rcond is never less than r (the reciprocal of the true condition number) and in practice is
nearly always less than 10r. A call to this routine involves solving a number of systems of linear equations
A*x = b; the number is usually 4 or 5 and never more than 11. Each solution requires approximately
4*n(kd + 1) floating-point operations for real flavors and 16*n(kd + 1) for complex flavors.
?sycon
Estimates the reciprocal of the condition number of a
symmetric matrix.
Syntax
lapack_int LAPACKE_ssycon( int matrix_layout, char uplo, lapack_int n, const float* a,
lapack_int lda, const lapack_int* ipiv, float anorm, float* rcond );
lapack_int LAPACKE_dsycon( int matrix_layout, char uplo, lapack_int n, const double* a,
lapack_int lda, const lapack_int* ipiv, double anorm, double* rcond );
lapack_int LAPACKE_csycon( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_float* a, lapack_int lda, const lapack_int* ipiv, float anorm, float*
rcond );
lapack_int LAPACKE_zsycon( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_double* a, lapack_int lda, const lapack_int* ipiv, double anorm, double*
rcond );
Include Files
• mkl.h
Description
The routine estimates the reciprocal of the condition number of a symmetric matrix A:
κ_1(A) = ||A||_1 ||A^{-1}||_1 (since A is symmetric, κ_∞(A) = κ_1(A)).
An estimate is obtained for ||A^{-1}||, and the reciprocal of the condition number is computed as rcond =
1 / (||A|| ||A^{-1}||).
Before calling this routine:
• compute anorm (either ||A||_1 = max_j Σ_i |a_ij| or ||A||_∞ = max_i Σ_j |a_ij|)
• call ?sytrf to compute the factorization of A.
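The value passed as anorm has to be computed from the original matrix, before ?sytrf overwrites it with the factors. The following is a minimal sketch of the 1-norm computation, assuming a dense row-major array that holds the full matrix; the function name one_norm_rowmajor is hypothetical and is not part of the library. In practice a LAPACK norm routine such as ?lansy would normally be used instead.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper (not a library routine): 1-norm of an n-by-n matrix
   stored densely in row-major order with leading dimension lda.
   ||A||_1 is the largest absolute column sum. */
double one_norm_rowmajor(const double *a, size_t n, size_t lda)
{
    double best = 0.0;
    for (size_t j = 0; j < n; ++j) {
        double colsum = 0.0;
        for (size_t i = 0; i < n; ++i)
            colsum += fabs(a[i * lda + j]);   /* element (i, j) */
        if (colsum > best)
            best = colsum;
    }
    return best;
}
```

For a symmetric matrix the same value is also the infinity-norm, so either definition of anorm listed above gives the same input.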
Input Parameters
If uplo = 'L', the array a stores the lower triangular factor L of the
factorization A = L*D*L^T.
Output Parameters
rcond An estimate of the reciprocal of the condition number. The routine sets
rcond = 0 if the estimate underflows; in this case the matrix is singular
(to working precision). However, anytime rcond is small compared to
1.0, for the working precision, the matrix may be poorly conditioned
or even singular.
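The description of rcond suggests a simple check on the returned value. The sketch below is one possible way to act on it for double-precision flavors; the type and function names are hypothetical, and using DBL_EPSILON as the threshold for "small compared to 1.0" is an assumption of this sketch, not a rule stated by the library.

```c
#include <float.h>

/* Hypothetical classification of a double-precision rcond value. */
typedef enum {
    RCOND_ACCEPTABLE,        /* rcond is not small relative to working precision */
    RCOND_ILL_CONDITIONED,   /* rcond is below the chosen threshold              */
    RCOND_SINGULAR           /* estimate underflowed, rcond == 0                 */
} rcond_class;

rcond_class classify_rcond(double rcond)
{
    if (rcond == 0.0)
        return RCOND_SINGULAR;
    if (rcond < DBL_EPSILON)          /* threshold is an assumption of this sketch */
        return RCOND_ILL_CONDITIONED;
    return RCOND_ACCEPTABLE;
}
```

An application with stricter accuracy requirements may prefer a larger threshold.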
Return Values
This function returns a value info.
Application Notes
The computed rcond is never less than r (the reciprocal of the true condition number) and in practice is
nearly always less than 10r. A call to this routine involves solving a number of systems of linear equations
A*x = b; the number is usually 4 or 5 and never more than 11. Each solution requires approximately 2n^2
floating-point operations for real flavors and 8n^2 for complex flavors.
See Also
Matrix Storage Schemes
?sycon_3
Estimates the reciprocal of the condition number (in
the 1-norm) of a real or complex symmetric matrix A
using the factorization computed by ?sytrf_rk.
Syntax
lapack_int LAPACKE_ssycon_3( int matrix_layout, char uplo, lapack_int n, const float* A,
lapack_int lda, const float* e, const lapack_int* ipiv, float anorm, float* rcond );
lapack_int LAPACKE_dsycon_3( int matrix_layout, char uplo, lapack_int n, const double* A,
lapack_int lda, const double* e, const lapack_int* ipiv, double anorm, double* rcond );
lapack_int LAPACKE_csycon_3( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_float* A, lapack_int lda, const lapack_complex_float* e, const
lapack_int* ipiv, float anorm, float* rcond );
lapack_int LAPACKE_zsycon_3( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_double* A, lapack_int lda, const lapack_complex_double* e, const
lapack_int* ipiv, double anorm, double* rcond );
Description
?sycon_3 estimates the reciprocal of the condition number (in the 1-norm) of a real or complex symmetric
matrix A using the factorization computed by ?sytrf_rk: A = P*U*D*(U^T)*(P^T) or A = P*L*D*(L^T)*(P^T),
where U (or L) is a unit upper (or lower) triangular matrix, U^T (or L^T) is the transpose of U (or L), P is a
permutation matrix, P^T is the transpose of P, and D is symmetric and block diagonal with 1-by-1 and 2-by-2
diagonal blocks.
An estimate is obtained for norm(inv(A)), and the reciprocal of the condition number is computed as rcond
= 1 / (anorm * norm(inv(A))).
Input Parameters
uplo Specifies whether the details of the factorization are stored as an upper or
lower triangular matrix:
A Array of size max(1, lda*n). Diagonal of the block diagonal matrix D and
factors U or L as computed by ?sytrf_rk:
—and—
• If uplo = 'U', factor U in the superdiagonal part of A. If uplo = 'L',
factor L in the subdiagonal part of A.
e Array of size n. On entry, contains the superdiagonal (or subdiagonal)
elements of the symmetric block diagonal matrix D with 1-by-1 or 2-by-2
diagonal blocks. If uplo = 'U', e(i) = D(i-1,i), i=2:N, and e(1) is not
referenced. If uplo = 'L', e(i) = D(i+1,i), i=1:N-1, and e(n) is not
referenced.
ipiv Array of size n. Details of the interchanges and the block structure of D as
determined by ?sytrf_rk.
Output Parameters
rcond The reciprocal of the condition number of the matrix A, computed as rcond
= 1/(anorm * AINVNM), where AINVNM is an estimate of the 1-norm of
inv(A) computed in this routine.
Return Values
This function returns a value info.
= 0: Successful exit.
< 0: If info = -i, the i-th argument had an illegal value.
?hecon
Estimates the reciprocal of the condition number of a
Hermitian matrix.
Syntax
lapack_int LAPACKE_checon( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_float* a, lapack_int lda, const lapack_int* ipiv, float anorm, float*
rcond );
lapack_int LAPACKE_zhecon( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_double* a, lapack_int lda, const lapack_int* ipiv, double anorm, double*
rcond );
Include Files
• mkl.h
Description
The routine estimates the reciprocal of the condition number of a Hermitian matrix A:
κ_1(A) = ||A||_1 ||A^{-1}||_1 (since A is Hermitian, κ_∞(A) = κ_1(A)).
Before calling this routine:
Input Parameters
If uplo = 'L', the array a stores the lower triangular factor L of the
factorization A = L*D*L^H.
Output Parameters
rcond An estimate of the reciprocal of the condition number. The routine sets
rcond = 0 if the estimate underflows; in this case the matrix is singular
(to working precision). However, anytime rcond is small compared to
1.0, for the working precision, the matrix may be poorly conditioned
or even singular.
Return Values
This function returns a value info.
Application Notes
The computed rcond is never less than r (the reciprocal of the true condition number) and in practice is
nearly always less than 10r. A call to this routine involves solving a number of systems of linear equations
A*x = b; the number is usually 5 and never more than 11. Each solution requires approximately 8n^2
floating-point operations.
See Also
Matrix Storage Schemes
?hecon_3
Estimates the reciprocal of the condition number (in
the 1-norm) of a complex Hermitian matrix A.
Syntax
lapack_int LAPACKE_checon_3( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_float* A, lapack_int lda, const lapack_complex_float* e, const
lapack_int* ipiv, float anorm, float* rcond );
lapack_int LAPACKE_zhecon_3( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_double* A, lapack_int lda, const lapack_complex_double* e, const
lapack_int* ipiv, double anorm, double* rcond );
Description
?hecon_3 estimates the reciprocal of the condition number (in the 1-norm) of a complex Hermitian matrix A
using the factorization computed by ?hetrf_rk: A = P*U*D*(U^H)*(P^T) or A = P*L*D*(L^H)*(P^T), where U (or
L) is a unit upper (or lower) triangular matrix, U^H (or L^H) is the conjugate transpose of U (or L), P is a
permutation matrix, P^T is the transpose of P, and D is Hermitian and block diagonal with 1-by-1 and 2-by-2
diagonal blocks. An estimate is obtained for norm(inv(A)), and the reciprocal of the condition number is
computed as rcond = 1 / (anorm * norm(inv(A))).
This routine uses BLAS3 solver ?hetrs_3.
Input Parameters
uplo Specifies whether the details of the factorization are stored as an upper or
lower triangular matrix:
= 'U': Upper triangular, form is A = P*U*D*(U^H)*(P^T);
= 'L': Lower triangular, form is A = P*L*D*(L^H)*(P^T).
A Array of size max(1, lda*n). Diagonal of the block diagonal matrix D and
factor U or L as computed by ?hetrf_rk:
—and—
• If uplo = 'U', factor U in the superdiagonal part of A. If uplo = 'L',
factor L in the subdiagonal part of A.
ipiv Array of size n. Details of the interchanges and the block structure of D as
determined by ?hetrf_rk.
Output Parameters
rcond The reciprocal of the condition number of the matrix A, computed as rcond
= 1/(anorm * AINVNM), where AINVNM is an estimate of the 1-norm of
inv(A) computed in this routine.
Return Values
This function returns a value info.
= 0: Successful exit.
< 0: If info = -i, the i-th argument had an illegal value.
?spcon
Estimates the reciprocal of the condition number of a
packed symmetric matrix.
Syntax
lapack_int LAPACKE_sspcon( int matrix_layout, char uplo, lapack_int n, const float* ap,
const lapack_int* ipiv, float anorm, float* rcond );
lapack_int LAPACKE_dspcon( int matrix_layout, char uplo, lapack_int n, const double*
ap, const lapack_int* ipiv, double anorm, double* rcond );
lapack_int LAPACKE_cspcon( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_float* ap, const lapack_int* ipiv, float anorm, float* rcond );
lapack_int LAPACKE_zspcon( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_double* ap, const lapack_int* ipiv, double anorm, double* rcond );
Include Files
• mkl.h
Description
The routine estimates the reciprocal of the condition number of a packed symmetric matrix A:
κ_1(A) = ||A||_1 ||A^{-1}||_1 (since A is symmetric, κ_∞(A) = κ_1(A)).
An estimate is obtained for ||A^{-1}||, and the reciprocal of the condition number is computed as rcond =
1 / (||A|| ||A^{-1}||).
Before calling this routine:
Input Parameters
If uplo = 'U', the array ap stores the packed upper triangular factor
U of the factorization A = U*D*U^T.
If uplo = 'L', the array ap stores the packed lower triangular factor
L of the factorization A = L*D*L^T.
Output Parameters
rcond An estimate of the reciprocal of the condition number. The routine sets
rcond = 0 if the estimate underflows; in this case the matrix is
singular (to working precision). However, anytime rcond is small
compared to 1.0, for the working precision, the matrix may be poorly
conditioned or even singular.
Return Values
This function returns a value info.
Application Notes
The computed rcond is never less than r (the reciprocal of the true condition number) and in practice is
nearly always less than 10r. A call to this routine involves solving a number of systems of linear equations
A*x = b; the number is usually 4 or 5 and never more than 11. Each solution requires approximately 2n^2
floating-point operations for real flavors and 8n^2 for complex flavors.
See Also
Matrix Storage Schemes
?hpcon
Estimates the reciprocal of the condition number of a
packed Hermitian matrix.
Syntax
lapack_int LAPACKE_chpcon( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_float* ap, const lapack_int* ipiv, float anorm, float* rcond );
lapack_int LAPACKE_zhpcon( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_double* ap, const lapack_int* ipiv, double anorm, double* rcond );
Include Files
• mkl.h
Description
The routine estimates the reciprocal of the condition number of a packed Hermitian matrix A:
κ_1(A) = ||A||_1 ||A^{-1}||_1 (since A is Hermitian, κ_∞(A) = κ_1(A)).
An estimate is obtained for ||A^{-1}||, and the reciprocal of the condition number is computed as rcond =
1 / (||A|| ||A^{-1}||).
Before calling this routine:
Input Parameters
If uplo = 'L', the array ap stores the packed lower triangular factor
L of the factorization A = L*D*L^H.
ipiv Array, size at least max(1, n). The array ipiv, as returned by ?hptrf.
Output Parameters
rcond An estimate of the reciprocal of the condition number. The routine sets
rcond = 0 if the estimate underflows; in this case the matrix is singular
(to working precision). However, anytime rcond is small compared to
1.0, for the working precision, the matrix may be poorly conditioned
or even singular.
Return Values
This function returns a value info.
Application Notes
The computed rcond is never less than r (the reciprocal of the true condition number) and in practice is
nearly always less than 10r. A call to this routine involves solving a number of systems of linear equations
A*x = b; the number is usually 5 and never more than 11. Each solution requires approximately 8n^2
floating-point operations.
See Also
Matrix Storage Schemes
?trcon
Estimates the reciprocal of the condition number of a
triangular matrix.
Syntax
lapack_int LAPACKE_strcon( int matrix_layout, char norm, char uplo, char diag,
lapack_int n, const float* a, lapack_int lda, float* rcond );
lapack_int LAPACKE_dtrcon( int matrix_layout, char norm, char uplo, char diag,
lapack_int n, const double* a, lapack_int lda, double* rcond );
lapack_int LAPACKE_ctrcon( int matrix_layout, char norm, char uplo, char diag,
lapack_int n, const lapack_complex_float* a, lapack_int lda, float* rcond );
lapack_int LAPACKE_ztrcon( int matrix_layout, char norm, char uplo, char diag,
lapack_int n, const lapack_complex_double* a, lapack_int lda, double* rcond );
Include Files
• mkl.h
Description
The routine estimates the reciprocal of the condition number of a triangular matrix A in either the 1-norm or
infinity-norm:
κ_1(A) = ||A||_1 ||A^{-1}||_1 = κ_∞(A^T) = κ_∞(A^H)
κ_∞(A) = ||A||_∞ ||A^{-1}||_∞ = κ_1(A^T) = κ_1(A^H).
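For a small matrix, the definitions above can be evaluated exactly by forming the inverse, which gives a reference value to compare against the estimate returned in rcond. The following is a sketch only: the function names, the row-major contiguous storage, the upper triangular restriction, and the fixed size limit TRI_MAXN are all assumptions made for illustration. The library routine estimates the norm of the inverse without forming it.

```c
#include <math.h>

#define TRI_MAXN 8  /* illustration only: small fixed upper bound on n */

/* 1-norm (norm == '1') or infinity-norm (any other value) of a row-major
   n-by-n matrix m stored contiguously (leading dimension n). */
static double dense_norm(const double *m, int n, char norm)
{
    double best = 0.0;
    for (int p = 0; p < n; ++p) {
        double s = 0.0;
        for (int q = 0; q < n; ++q)
            s += fabs(norm == '1' ? m[q * n + p] : m[p * n + q]);
        if (s > best)
            best = s;
    }
    return best;
}

/* Exact reciprocal condition number of a non-singular upper triangular
   matrix u (row-major, n <= TRI_MAXN), computed by forming inv(u).
   Returns -1.0 if n is out of range. */
double tri_upper_rcond_exact(const double *u, int n, char norm)
{
    double inv[TRI_MAXN * TRI_MAXN] = {0.0};
    if (n < 1 || n > TRI_MAXN)
        return -1.0;
    for (int j = 0; j < n; ++j) {            /* column j of inv(u)          */
        for (int i = j; i >= 0; --i) {       /* back substitution, u*x=e_j  */
            double s = (i == j) ? 1.0 : 0.0;
            for (int k = i + 1; k <= j; ++k)
                s -= u[i * n + k] * inv[k * n + j];
            inv[i * n + j] = s / u[i * n + i];
        }
    }
    return 1.0 / (dense_norm(u, n, norm) * dense_norm(inv, n, norm));
}
```

This costs on the order of n^3 operations, so it is useful only as a cross-check on small test matrices.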
Input Parameters
Output Parameters
rcond An estimate of the reciprocal of the condition number. The routine sets
rcond = 0 if the estimate underflows; in this case the matrix is singular
(to working precision). However, anytime rcond is small compared to
1.0, for the working precision, the matrix may be poorly conditioned
or even singular.
Return Values
This function returns a value info.
Application Notes
The computed rcond is never less than r (the reciprocal of the true condition number) and in practice is
nearly always less than 10r. A call to this routine involves solving a number of systems of linear equations
A*x = b; the number is usually 4 or 5 and never more than 11. Each solution requires approximately n^2
floating-point operations for real flavors and 4n^2 operations for complex flavors.
See Also
Matrix Storage Schemes
?tpcon
Estimates the reciprocal of the condition number of a
packed triangular matrix.
Syntax
lapack_int LAPACKE_stpcon( int matrix_layout, char norm, char uplo, char diag,
lapack_int n, const float* ap, float* rcond );
lapack_int LAPACKE_dtpcon( int matrix_layout, char norm, char uplo, char diag,
lapack_int n, const double* ap, double* rcond );
lapack_int LAPACKE_ctpcon( int matrix_layout, char norm, char uplo, char diag,
lapack_int n, const lapack_complex_float* ap, float* rcond );
lapack_int LAPACKE_ztpcon( int matrix_layout, char norm, char uplo, char diag,
lapack_int n, const lapack_complex_double* ap, double* rcond );
Include Files
• mkl.h
Description
The routine estimates the reciprocal of the condition number of a packed triangular matrix A in either the
1-norm or infinity-norm:
κ_1(A) = ||A||_1 ||A^{-1}||_1 = κ_∞(A^T) = κ_∞(A^H)
κ_∞(A) = ||A||_∞ ||A^{-1}||_∞ = κ_1(A^T) = κ_1(A^H).
Input Parameters
Output Parameters
rcond An estimate of the reciprocal of the condition number. The routine sets
rcond = 0 if the estimate underflows; in this case the matrix is singular
(to working precision). However, anytime rcond is small compared to
1.0, for the working precision, the matrix may be poorly conditioned
or even singular.
Return Values
This function returns a value info.
Application Notes
The computed rcond is never less than r (the reciprocal of the true condition number) and in practice is
nearly always less than 10r. A call to this routine involves solving a number of systems of linear equations
A*x = b; the number is usually 4 or 5 and never more than 11. Each solution requires approximately n^2
floating-point operations for real flavors and 4n^2 operations for complex flavors.
See Also
Matrix Storage Schemes
?tbcon
Estimates the reciprocal of the condition number of a
triangular band matrix.
Syntax
lapack_int LAPACKE_stbcon( int matrix_layout, char norm, char uplo, char diag,
lapack_int n, lapack_int kd, const float* ab, lapack_int ldab, float* rcond );
lapack_int LAPACKE_dtbcon( int matrix_layout, char norm, char uplo, char diag,
lapack_int n, lapack_int kd, const double* ab, lapack_int ldab, double* rcond );
lapack_int LAPACKE_ctbcon( int matrix_layout, char norm, char uplo, char diag,
lapack_int n, lapack_int kd, const lapack_complex_float* ab, lapack_int ldab, float*
rcond );
lapack_int LAPACKE_ztbcon( int matrix_layout, char norm, char uplo, char diag,
lapack_int n, lapack_int kd, const lapack_complex_double* ab, lapack_int ldab, double*
rcond );
Include Files
• mkl.h
Description
The routine estimates the reciprocal of the condition number of a triangular band matrix A in either the
1-norm or infinity-norm:
κ_1(A) = ||A||_1 ||A^{-1}||_1 = κ_∞(A^T) = κ_∞(A^H)
κ_∞(A) = ||A||_∞ ||A^{-1}||_∞ = κ_1(A^T) = κ_1(A^H).
Input Parameters
If diag = 'U', then A is unit triangular: diagonal elements are
assumed to be 1 and not referenced in the array ab.
Output Parameters
rcond An estimate of the reciprocal of the condition number. The routine sets
rcond = 0 if the estimate underflows; in this case the matrix is singular
(to working precision). However, anytime rcond is small compared to
1.0, for the working precision, the matrix may be poorly conditioned
or even singular.
Return Values
This function returns a value info.
Application Notes
The computed rcond is never less than r (the reciprocal of the true condition number) and in practice is
nearly always less than 10r. A call to this routine involves solving a number of systems of linear equations
A*x = b; the number is usually 4 or 5 and never more than 11. Each solution requires approximately
2*n(kd + 1) floating-point operations for real flavors and 8*n(kd + 1) operations for complex flavors.
See Also
Matrix Storage Schemes
Refining the Solution and Estimating Its Error: LAPACK Computational Routines
This section describes the LAPACK routines for refining the computed solution of a system of linear equations
and estimating the solution error. You can call these routines after factorizing the matrix of the system of
equations and computing the solution (see Routines for Matrix Factorization and Routines for Solving
Systems of Linear Equations).
?gerfs
Refines the solution of a system of linear equations
with a general coefficient matrix and estimates its
error.
Syntax
lapack_int LAPACKE_sgerfs( int matrix_layout, char trans, lapack_int n, lapack_int
nrhs, const float* a, lapack_int lda, const float* af, lapack_int ldaf, const
lapack_int* ipiv, const float* b, lapack_int ldb, float* x, lapack_int ldx, float* ferr,
float* berr );
lapack_int LAPACKE_dgerfs( int matrix_layout, char trans, lapack_int n, lapack_int
nrhs, const double* a, lapack_int lda, const double* af, lapack_int ldaf, const
lapack_int* ipiv, const double* b, lapack_int ldb, double* x, lapack_int ldx, double*
ferr, double* berr );
Include Files
• mkl.h
Description
The routine performs an iterative refinement of the solution to a system of linear equations A*X = B or A^T*X
= B or A^H*X = B with a general matrix A, with multiple right-hand sides. For each computed solution vector
x, the routine computes the component-wise backward error β. This error is the smallest relative
perturbation in elements of A and b such that x is the exact solution of the perturbed system:
|δa_ij| ≤ β|a_ij|, |δb_i| ≤ β|b_i| such that (A + δA)x = (b + δb).
Finally, the routine estimates the component-wise forward error in the computed solution
||x - x_e||_∞ / ||x||_∞ (here x_e is the exact solution).
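The backward error β defined above has a well-known closed form: the largest ratio |b - A*x|_i / (|A|*|x| + |b|)_i over all rows i. The sketch below evaluates that formula directly for one right-hand side, assuming a dense row-major real matrix; the function name is hypothetical, and the safeguards a production implementation applies to very small denominators are omitted.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: component-wise backward error
       beta = max_i |b - A*x|_i / (|A|*|x| + |b|)_i
   for an n-by-n row-major matrix a with leading dimension lda.
   Rows with a zero denominator have a zero residual and are skipped. */
double componentwise_backward_error(const double *a, size_t n, size_t lda,
                                    const double *x, const double *b)
{
    double beta = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double resid = b[i];          /* accumulates (b - A*x)_i          */
        double denom = fabs(b[i]);    /* accumulates (|A|*|x| + |b|)_i    */
        for (size_t j = 0; j < n; ++j) {
            resid -= a[i * lda + j] * x[j];
            denom += fabs(a[i * lda + j]) * fabs(x[j]);
        }
        if (denom > 0.0) {
            double ratio = fabs(resid) / denom;
            if (ratio > beta)
                beta = ratio;
        }
    }
    return beta;
}
```

A value of β near the unit roundoff indicates that x solves a nearby system about as accurately as the working precision allows.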
Before calling this routine:
Input Parameters
a, af, b, x Arrays:
a (size max(1, lda*n)) contains the original matrix A, as supplied
to ?getrf.
b, of size max(1, ldb*nrhs) for column major layout and max(1,
ldb*n) for row major layout, contains the right-hand side matrix B.
x, of size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout, contains the solution matrix X.
ldb The leading dimension of b; ldb ≥ max(1, n) for column major layout
and ldb ≥ nrhs for row major layout.
ldx The leading dimension of x; ldx ≥ max(1, n) for column major layout
and ldx ≥ nrhs for row major layout.
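The size and leading-dimension rules for b and x depend on matrix_layout. The following sketch turns those rules into two small helpers; the helper names are hypothetical, the layout constants are local copies of the LAPACK_ROW_MAJOR and LAPACK_COL_MAJOR values so that the fragment needs no library headers, and clamping the row-major leading dimension to at least 1 is a defensive assumption of this sketch.

```c
#include <stddef.h>

/* Local copies of the LAPACK_ROW_MAJOR / LAPACK_COL_MAJOR values. */
enum { SKETCH_ROW_MAJOR = 101, SKETCH_COL_MAJOR = 102 };

static size_t max_size(size_t a, size_t b) { return a > b ? a : b; }

/* Smallest valid leading dimension for an n-by-nrhs matrix such as b or x. */
size_t rhs_min_leading_dim(int layout, size_t n, size_t nrhs)
{
    return layout == SKETCH_COL_MAJOR ? max_size(1, n) : max_size(1, nrhs);
}

/* Number of elements the array must hold for a given leading dimension ld:
   max(1, ld*nrhs) for column major, max(1, ld*n) for row major. */
size_t rhs_min_elements(int layout, size_t ld, size_t n, size_t nrhs)
{
    return layout == SKETCH_COL_MAJOR ? max_size(1, ld * nrhs)
                                      : max_size(1, ld * n);
}
```

For example, with n = 3 and nrhs = 2 both layouts need at least 6 elements, but the minimal leading dimension is 3 for column major and 2 for row major.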
Output Parameters
ferr, berr Arrays, size at least max(1, nrhs). Contain the component-wise
forward and backward errors, respectively, for each solution vector.
Return Values
This function returns a value info.
Application Notes
The bounds returned in ferr are not rigorous, but in practice they almost always overestimate the actual
error.
For each right-hand side, computation of the backward error involves a minimum of 4n^2 floating-point
operations (for real flavors) or 16n^2 operations (for complex flavors). In addition, each step of iterative
refinement involves 6n^2 operations (for real flavors) or 24n^2 operations (for complex flavors); the number of
iterations may range from 1 to 5. Estimating the forward error involves solving a number of systems of linear
equations A*x = b with the same coefficient matrix A and different right-hand sides b; the number is usually
4 or 5 and never more than 11. Each solution requires approximately 2n^2 floating-point operations for real
flavors or 8n^2 for complex flavors.
See Also
Matrix Storage Schemes
?gerfsx
Uses extra precise iterative refinement to improve the
solution to the system of linear equations with a
general coefficient matrix A and provides error bounds
and backward error estimates.
Syntax
lapack_int LAPACKE_sgerfsx( int matrix_layout, char trans, char equed, lapack_int n,
lapack_int nrhs, const float* a, lapack_int lda, const float* af, lapack_int ldaf,
const lapack_int* ipiv, const float* r, const float* c, const float* b, lapack_int ldb,
float* x, lapack_int ldx, float* rcond, float* berr, lapack_int n_err_bnds, float*
err_bnds_norm, float* err_bnds_comp, lapack_int nparams, float* params );
lapack_int LAPACKE_dgerfsx( int matrix_layout, char trans, char equed, lapack_int n,
lapack_int nrhs, const double* a, lapack_int lda, const double* af, lapack_int ldaf,
const lapack_int* ipiv, const double* r, const double* c, const double* b, lapack_int
ldb, double* x, lapack_int ldx, double* rcond, double* berr, lapack_int n_err_bnds,
double* err_bnds_norm, double* err_bnds_comp, lapack_int nparams, double* params );
lapack_int LAPACKE_cgerfsx( int matrix_layout, char trans, char equed, lapack_int n,
lapack_int nrhs, const lapack_complex_float* a, lapack_int lda, const
lapack_complex_float* af, lapack_int ldaf, const lapack_int* ipiv, const float* r,
const float* c, const lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x,
lapack_int ldx, float* rcond, float* berr, lapack_int n_err_bnds, float* err_bnds_norm,
float* err_bnds_comp, lapack_int nparams, float* params );
lapack_int LAPACKE_zgerfsx( int matrix_layout, char trans, char equed, lapack_int n,
lapack_int nrhs, const lapack_complex_double* a, lapack_int lda, const
lapack_complex_double* af, lapack_int ldaf, const lapack_int* ipiv, const double* r,
const double* c, const lapack_complex_double* b, lapack_int ldb, lapack_complex_double*
x, lapack_int ldx, double* rcond, double* berr, lapack_int n_err_bnds, double*
err_bnds_norm, double* err_bnds_comp, lapack_int nparams, double* params );
Include Files
• mkl.h
Description
The routine improves the computed solution to a system of linear equations and provides error bounds and
backward error estimates for the solution. In addition to a normwise error bound, the code provides a
maximum componentwise error bound, if possible. See comments for err_bnds_norm and err_bnds_comp
for details of the error bounds.
The original system of linear equations may have been equilibrated before calling this routine, as described
by the parameters equed, r, and c below. In this case, the solution and error bounds returned are for the
original unequilibrated system.
Input Parameters
equed Must be 'N', 'R', 'C', or 'B'.
If equed = 'R', row equilibration was done, that is, A has been
premultiplied by diag(r).
If equed = 'C', column equilibration was done, that is, A has been
postmultiplied by diag(c).
If equed = 'B', both row and column equilibration was done, that is,
A has been replaced by diag(r)*A*diag(c). The right-hand side B
has been changed accordingly.
The array af contains the factored form of the matrix A, that is, the
factors L and U from the factorization A = P*L*U as computed
by ?getrf.
The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.
ipiv Array, size at least max(1, n). Contains the pivot indices as
computed by ?getrf; for 1 ≤ i ≤ n, row i of the matrix was
interchanged with row ipiv(i).
r, c Arrays: r (size n), c (size n). The array r contains the row scale
factors for A, and the array c contains the column scale factors for A.
ldb The leading dimension of the array b; ldb ≥ max(1, n) for column
major layout and ldb ≥ nrhs for row major layout.
x Array, of size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout.
The solution matrix X as computed by ?getrs.
ldx The leading dimension of the output array x; ldx ≥ max(1, n) for
column major layout and ldx ≥ nrhs for row major layout.
n_err_bnds Number of error bounds to return for each right hand side and each
type (normwise or componentwise). See err_bnds_norm and
err_bnds_comp descriptions in Output Arguments section below.
Default 10.0
params[2] : Flag determining if the code will attempt to find a
solution with a small componentwise relative error in the double-
precision algorithm. Positive is true, 0.0 is false. Default: 1.0 (attempt
componentwise convergence).
Output Parameters
precision flavors. This error bound should
only be trusted if the previous boolean is
true.
params Output parameter only if the input contains erroneous values, namely,
in params[0], params[1], params[2]. In such a case, the
corresponding elements of params are filled with default values on
output.
Return Values
This function returns a value info.
If info = 0, the execution is successful. The solution to every right-hand side is guaranteed.
If 0 < info ≤ n: U(info,info) is exactly zero. The factorization has been completed, but the factor U is exactly
singular, so the solution and error bounds could not be computed; rcond = 0 is returned.
If info = n+j: The solution corresponding to the j-th right-hand side is not guaranteed. The solutions
corresponding to other right-hand sides k with k > j may not be guaranteed as well, but only the first such
right-hand side is reported. If a small componentwise error is not requested (params[2] = 0.0), then the j-th
right-hand side is the first with a normwise error bound that is not guaranteed (the smallest j such that, for
column major layout, err_bnds_norm[j - 1] = 0.0 or err_bnds_comp[j - 1] = 0.0; or, for row major
layout, err_bnds_norm[(j - 1)*n_err_bnds] = 0.0 or err_bnds_comp[(j - 1)*n_err_bnds] = 0.0).
See the definition of err_bnds_norm and err_bnds_comp for err = 1. To get information about all of the
right-hand sides, check err_bnds_norm or err_bnds_comp.
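The three ranges of info described above can be separated with a few comparisons. The sketch below is one possible way to decode the value; the enum and function names are hypothetical, and the negative branch assumes the usual LAPACKE convention that info = -i reports an illegal i-th argument, which is not restated in this section.

```c
/* Hypothetical decoding of the info value returned by ?gerfsx. */
typedef enum {
    RFSX_OK,                  /* info = 0                                   */
    RFSX_ILLEGAL_ARGUMENT,    /* info < 0 (assumed LAPACKE convention)      */
    RFSX_SINGULAR_U,          /* 0 < info <= n: U(info,info) is zero        */
    RFSX_RHS_NOT_GUARANTEED   /* info = n + j: j-th solution not guaranteed */
} rfsx_status;

/* n is the order of the matrix. On return *index holds the 1-based
   argument position, diagonal position, or right-hand side number,
   depending on the status; it is set to 0 when info = 0. */
rfsx_status classify_rfsx_info(long info, long n, long *index)
{
    if (info == 0) {
        *index = 0;
        return RFSX_OK;
    }
    if (info < 0) {
        *index = -info;
        return RFSX_ILLEGAL_ARGUMENT;
    }
    if (info <= n) {
        *index = info;
        return RFSX_SINGULAR_U;
    }
    *index = info - n;
    return RFSX_RHS_NOT_GUARANTEED;
}
```

Because only the first unreliable right-hand side is reported, a caller that receives RFSX_RHS_NOT_GUARANTEED still needs to inspect err_bnds_norm or err_bnds_comp for the remaining ones.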
See Also
Matrix Storage Schemes
?gbrfs
Refines the solution of a system of linear equations
with a general band coefficient matrix and estimates
its error.
Syntax
lapack_int LAPACKE_sgbrfs( int matrix_layout, char trans, lapack_int n, lapack_int kl,
lapack_int ku, lapack_int nrhs, const float* ab, lapack_int ldab, const float* afb,
lapack_int ldafb, const lapack_int* ipiv, const float* b, lapack_int ldb, float* x,
lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_dgbrfs( int matrix_layout, char trans, lapack_int n, lapack_int kl,
lapack_int ku, lapack_int nrhs, const double* ab, lapack_int ldab, const double* afb,
lapack_int ldafb, const lapack_int* ipiv, const double* b, lapack_int ldb, double* x,
lapack_int ldx, double* ferr, double* berr );
lapack_int LAPACKE_cgbrfs( int matrix_layout, char trans, lapack_int n, lapack_int kl,
lapack_int ku, lapack_int nrhs, const lapack_complex_float* ab, lapack_int ldab, const
lapack_complex_float* afb, lapack_int ldafb, const lapack_int* ipiv, const
lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x, lapack_int ldx,
float* ferr, float* berr );
lapack_int LAPACKE_zgbrfs( int matrix_layout, char trans, lapack_int n, lapack_int kl,
lapack_int ku, lapack_int nrhs, const lapack_complex_double* ab, lapack_int ldab, const
lapack_complex_double* afb, lapack_int ldafb, const lapack_int* ipiv, const
lapack_complex_double* b, lapack_int ldb, lapack_complex_double* x, lapack_int ldx,
double* ferr, double* berr );
Include Files
• mkl.h
Description
The routine performs an iterative refinement of the solution to a system of linear equations A*X = B or A^T*X
= B or A^H*X = B with a band matrix A, with multiple right-hand sides. For each computed solution vector x,
the routine computes the component-wise backward error β. This error is the smallest relative perturbation
in elements of A and b such that x is the exact solution of the perturbed system:
|δa_ij| ≤ β|a_ij|, |δb_i| ≤ β|b_i| such that (A + δA)x = (b + δb).
Finally, the routine estimates the component-wise forward error in the computed solution
||x - x_e||_∞ / ||x||_∞ (here x_e is the exact solution).
Before calling this routine:
Input Parameters
If trans = 'N', the system has the form A*X = B.
ab, afb, b, x Arrays:
ab (size max(1, ldab*n)) contains the original band matrix A, as
supplied to ?gbtrf, but stored in rows from 1 to kl + ku + 1 for
column major layout, and columns from 1 to kl + ku + 1 for row
major layout.
afb (size max(1, ldafb*n)) contains the factored band matrix A, as
returned by ?gbtrf.
b, of size max(1, ldb*nrhs) for column major layout and max(1,
ldb*n) for row major layout, contains the right-hand side matrix B.
x, of size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout, contains the solution matrix X.
ldb The leading dimension of b; ldb ≥ max(1, n) for column major layout
and ldb ≥ nrhs for row major layout.
ipiv Array, size at least max(1, n). The ipiv array, as returned by ?gbtrf.
Output Parameters
ferr, berr Arrays, size at least max(1, nrhs). Contain the component-wise
forward and backward errors, respectively, for each solution vector.
Return Values
This function returns a value info.
Application Notes
The bounds returned in ferr are not rigorous, but in practice they almost always overestimate the actual
error.
For each right-hand side, computation of the backward error involves a minimum of 4n(kl + ku) floating-
point operations (for real flavors) or 16n(kl + ku) operations (for complex flavors). In addition, each step of
iterative refinement involves 2n(4kl + 3ku) operations (for real flavors) or 8n(4kl + 3ku) operations (for
complex flavors); the number of iterations may range from 1 to 5. Estimating the forward error involves
solving a number of systems of linear equations A*x = b; the number is usually 4 or 5 and never more than
11. Each solution requires approximately 2n^2 floating-point operations for real flavors or 8n^2 for complex
flavors.
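The operation counts quoted above can be combined into a rough per-right-hand-side estimate. The function below is only a sketch: the name gbrfs_flops_estimate and its parameters are invented for this example, and the result is the arithmetic implied by the counts in these notes, not a measurement. Each complex count quoted above is four times the corresponding real count, which the sketch uses directly.

```c
#include <assert.h>

/*
 * Hypothetical helper: approximate floating-point operation count for one
 * right-hand side, assembled from the figures in the Application Notes.
 *   refine_steps   - number of refinement iterations (1 to 5)
 *   forward_solves - systems solved for the forward error (at most 11)
 */
double gbrfs_flops_estimate(int is_complex, double n, double kl, double ku,
                            int refine_steps, int forward_solves)
{
    double berr_cost, step_cost, solve_cost, total;
    assert(refine_steps >= 1 && refine_steps <= 5);
    assert(forward_solves >= 0 && forward_solves <= 11);
    berr_cost  = 4.0 * n * (kl + ku);              /* backward error     */
    step_cost  = 2.0 * n * (4.0 * kl + 3.0 * ku);  /* one refinement step */
    solve_cost = 2.0 * n * n;                      /* one extra solve    */
    total = berr_cost + refine_steps * step_cost
          + forward_solves * solve_cost;
    return is_complex ? 4.0 * total : total;
}
```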
See Also
Matrix Storage Schemes
?gbrfsx
Uses extra precise iterative refinement to improve the
solution to the system of linear equations with a
banded coefficient matrix A and provides error bounds
and backward error estimates.
Syntax
lapack_int LAPACKE_sgbrfsx( int matrix_layout, char trans, char equed, lapack_int n,
lapack_int kl, lapack_int ku, lapack_int nrhs, const float* ab, lapack_int ldab, const
float* afb, lapack_int ldafb, const lapack_int* ipiv, const float* r, const float* c,
const float* b, lapack_int ldb, float* x, lapack_int ldx, float* rcond, float* berr,
lapack_int n_err_bnds, float* err_bnds_norm, float* err_bnds_comp, lapack_int nparams,
float* params );
lapack_int LAPACKE_dgbrfsx( int matrix_layout, char trans, char equed, lapack_int n,
lapack_int kl, lapack_int ku, lapack_int nrhs, const double* ab, lapack_int ldab, const
double* afb, lapack_int ldafb, const lapack_int* ipiv, const double* r, const double*
c, const double* b, lapack_int ldb, double* x, lapack_int ldx, double* rcond, double*
berr, lapack_int n_err_bnds, double* err_bnds_norm, double* err_bnds_comp, lapack_int
nparams, double* params );
lapack_int LAPACKE_cgbrfsx( int matrix_layout, char trans, char equed, lapack_int n,
lapack_int kl, lapack_int ku, lapack_int nrhs, const lapack_complex_float* ab,
lapack_int ldab, const lapack_complex_float* afb, lapack_int ldafb, const lapack_int*
ipiv, const float* r, const float* c, const lapack_complex_float* b, lapack_int ldb,
lapack_complex_float* x, lapack_int ldx, float* rcond, float* berr, lapack_int
n_err_bnds, float* err_bnds_norm, float* err_bnds_comp, lapack_int nparams, float*
params );
lapack_int LAPACKE_zgbrfsx( int matrix_layout, char trans, char equed, lapack_int n,
lapack_int kl, lapack_int ku, lapack_int nrhs, const lapack_complex_double* ab,
lapack_int ldab, const lapack_complex_double* afb, lapack_int ldafb, const lapack_int*
ipiv, const double* r, const double* c, const lapack_complex_double* b, lapack_int ldb,
lapack_complex_double* x, lapack_int ldx, double* rcond, double* berr, lapack_int
n_err_bnds, double* err_bnds_norm, double* err_bnds_comp, lapack_int nparams, double*
params );
Include Files
• mkl.h
Description
The routine improves the computed solution to a system of linear equations and provides error bounds and
backward error estimates for the solution. In addition to a normwise error bound, the code provides a
maximum componentwise error bound, if possible. See comments for err_bnds_norm and err_bnds_comp
for details of the error bounds.
The original system of linear equations may have been equilibrated before calling this routine, as described
by the parameters equed, r, and c below. In this case, the solution and error bounds returned are for the
original unequilibrated system.
Input Parameters
If trans = 'C', the system has the form A^H*X = B (conjugate transpose
for complex flavors, transpose for real flavors).
Specifies the form of equilibration that was done to A before calling this
routine.
If equed = 'N', no equilibration was done.
If equed = 'R', row equilibration was done, that is, A has been
premultiplied by diag(r).
If equed = 'C', column equilibration was done, that is, A has been
postmultiplied by diag(c).
If equed = 'B', both row and column equilibration was done, that is, A has
been replaced by diag(r)*A*diag(c). The right-hand side B has been
changed accordingly.
nrhs The number of right-hand sides; the number of columns of the matrices B
and X; nrhs ≥ 0.
ab, afb, b The array ab of size max(1, ldab*n) contains the original matrix A in band
storage, in rows from 1 to kl + ku + 1 for column major layout, and in
columns from 1 to kl + ku + 1 for row major layout.
The array b of size max(1, ldb*nrhs) for column major layout and max(1,
ldb*n) for row major layout contains the matrix B whose columns are the
right-hand sides for the systems of equations.
ipiv Array, size at least max(1, n). Contains the pivot indices as computed
by ?gbtrf; for 1 ≤ i ≤ n, row i of the matrix was interchanged with row
ipiv[i-1].
r, c Arrays: r(n), c(n). The array r contains the row scale factors for A, and
the array c contains the column scale factors for A.
ldb The leading dimension of the array b; ldb ≥ max(1, n) for column major
layout and ldb ≥ nrhs for row major layout.
x Array, size max(1, ldx*nrhs) for column major layout and max(1, ldx*n)
for row major layout.
The solution matrix X as computed by sgbtrs/dgbtrs for real flavors or
cgbtrs/zgbtrs for complex flavors.
ldx The leading dimension of the output array x; ldx ≥ max(1, n) for column
major layout and ldx ≥ nrhs for row major layout.
n_err_bnds Number of error bounds to return for each right-hand side and each type
(normwise or componentwise). See err_bnds_norm and err_bnds_comp
descriptions in the Output Parameters section below.
nparams Specifies the number of parameters set in params. If ≤ 0, the params array
is never referenced and default values are used.
params Array, size nparams. Specifies algorithm parameters. If an entry is less than
0.0, that entry will be filled with the default value used for that parameter.
Only positions up to nparams are accessed; defaults are used for higher-
numbered parameters. If defaults are acceptable, you can pass nparams =
0, which prevents the source code from accessing the params argument.
=0.0 No refinement is performed and no error bounds
are computed.
Default 10.0
Output Parameters
berr Array, size at least max(1, nrhs). Contains the componentwise relative
backward error for each solution vector xj, that is, the smallest relative
change in any element of A or B that makes xj an exact solution.
The information for right-hand side i, where 1 ≤i≤nrhs, and type of error
err is stored in:
err=1 "Trust/don't trust" boolean. Trust the answer if
the reciprocal condition number is less than the
threshold sqrt(n)*slamch(ε) for single
precision flavors and sqrt(n)*dlamch(ε) for
double precision flavors.
The information for right-hand side i, where 1 ≤i≤nrhs, and type of error
err is stored in:
params Output parameter only if the input contains erroneous values, namely, in
params[0], params[1], and params[2]. In such a case, the corresponding
elements of params are filled with default values on output.
Return Values
This function returns a value info.
If info = 0, the execution is successful. The solution to every right-hand side is guaranteed.
If 0 < info ≤ n: U(info, info) is exactly zero. The factorization has been completed, but the factor U is exactly
singular, so the solution and error bounds could not be computed; rcond = 0 is returned.
If info = n+j: The solution corresponding to the j-th right-hand side is not guaranteed. The solutions
corresponding to other right-hand sides k with k > j may not be guaranteed as well, but only the first such
right-hand side is reported. If a small componentwise error is not requested (params[2] = 0.0), then the j-th
right-hand side is the first with a normwise error bound that is not guaranteed (the smallest j such that for
column major layout err_bnds_norm[j - 1] = 0.0 or err_bnds_comp[j - 1] = 0.0; or for row major
layout err_bnds_norm[(j - 1)*n_err_bnds] = 0.0 or err_bnds_comp[(j - 1)*n_err_bnds] = 0.0).
See the definition of err_bnds_norm and err_bnds_comp for err = 1. To get information about all of the
right-hand sides, check err_bnds_norm or err_bnds_comp.
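The return value ranges described above can be separated with a small classifier. This is a sketch only: the enum and the function classify_info are hypothetical names, and the negative branch relies on the general LAPACKE convention that info < 0 reports an illegal argument, which this section does not restate.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of the info value returned by ?gbrfsx. */
typedef enum {
    INFO_OK,                 /* info = 0                                */
    INFO_BAD_ARGUMENT,       /* info < 0 (general LAPACKE convention)   */
    INFO_SINGULAR_U,         /* 0 < info <= n: U(info, info) is zero    */
    INFO_RHS_NOT_GUARANTEED  /* info = n + j: j-th right-hand side      */
} info_kind;

/*
 * Classifies info for a system of order n. On return *index holds the
 * 1-based argument position, diagonal position, or right-hand side number
 * that the category refers to (0 for INFO_OK).
 */
info_kind classify_info(long info, long n, long *index)
{
    assert(index != NULL && n >= 0);
    if (info == 0) { *index = 0;     return INFO_OK; }
    if (info < 0)  { *index = -info; return INFO_BAD_ARGUMENT; }
    if (info <= n) { *index = info;  return INFO_SINGULAR_U; }
    *index = info - n;
    return INFO_RHS_NOT_GUARANTEED;
}
```

Because only the first right-hand side that is not guaranteed is reported, a caller that needs the status of every right-hand side still has to inspect err_bnds_norm or err_bnds_comp.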
See Also
Matrix Storage Schemes
?gtrfs
Refines the solution of a system of linear equations
with a tridiagonal coefficient matrix and estimates its
error.
Syntax
lapack_int LAPACKE_sgtrfs( int matrix_layout, char trans, lapack_int n, lapack_int
nrhs, const float* dl, const float* d, const float* du, const float* dlf, const float*
df, const float* duf, const float* du2, const lapack_int* ipiv, const float* b,
lapack_int ldb, float* x, lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_dgtrfs( int matrix_layout, char trans, lapack_int n, lapack_int
nrhs, const double* dl, const double* d, const double* du, const double* dlf, const
double* df, const double* duf, const double* du2, const lapack_int* ipiv, const double*
b, lapack_int ldb, double* x, lapack_int ldx, double* ferr, double* berr );
lapack_int LAPACKE_cgtrfs( int matrix_layout, char trans, lapack_int n, lapack_int
nrhs, const lapack_complex_float* dl, const lapack_complex_float* d, const
lapack_complex_float* du, const lapack_complex_float* dlf, const lapack_complex_float*
df, const lapack_complex_float* duf, const lapack_complex_float* du2, const lapack_int*
ipiv, const lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x,
lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_zgtrfs( int matrix_layout, char trans, lapack_int n, lapack_int
nrhs, const lapack_complex_double* dl, const lapack_complex_double* d, const
lapack_complex_double* du, const lapack_complex_double* dlf, const
lapack_complex_double* df, const lapack_complex_double* duf, const
lapack_complex_double* du2, const lapack_int* ipiv, const lapack_complex_double* b,
lapack_int ldb, lapack_complex_double* x, lapack_int ldx, double* ferr, double* berr );
Include Files
• mkl.h
Description
The routine performs an iterative refinement of the solution to a system of linear equations A*X = B, A^T*X = B,
or A^H*X = B with a tridiagonal matrix A and multiple right-hand sides. For each computed solution
vector x, the routine computes the component-wise backward error β. This error is the smallest relative
perturbation in elements of A and b such that x is the exact solution of the perturbed system:
|δa_ij| ≤ β|a_ij|, |δb_i| ≤ β|b_i| such that (A + δA)x = (b + δb).
Finally, the routine estimates the component-wise forward error in the computed solution,
||x - x_e||∞ / ||x||∞ (here x_e is the exact solution).
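The forward error ratio above can be evaluated directly only when the exact solution is known, for instance in a test with a manufactured solution; the routine itself has to estimate it. The following sketch computes the ratio for one solution vector. The function name relative_forward_error is an assumption of this example and not part of the library.

```c
#include <assert.h>
#include <stddef.h>

/* Absolute value without pulling in <math.h>. */
static double abs_val(double v) { return v < 0.0 ? -v : v; }

/*
 * Hypothetical helper: returns max_i |x_i - xe_i| / max_i |x_i|, the
 * infinity-norm relative error of the computed vector x with respect to
 * the known exact solution xe. The computed x must be nonzero.
 */
double relative_forward_error(size_t n, const double *x, const double *xe)
{
    double err_norm = 0.0; /* infinity norm of x - xe */
    double x_norm = 0.0;   /* infinity norm of x      */
    for (size_t i = 0; i < n; ++i) {
        double diff = abs_val(x[i] - xe[i]);
        double mag = abs_val(x[i]);
        if (diff > err_norm) err_norm = diff;
        if (mag > x_norm) x_norm = mag;
    }
    assert(x_norm > 0.0);
    return err_norm / x_norm;
}
```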
Before calling this routine:
• call the solver routine ?gttrs.
Input Parameters
nrhs The number of right-hand sides, that is, the number of columns of the
matrix B; nrhs ≥ 0.
dlf Array dlf of size n - 1 contains the (n - 1) multipliers that define the
matrix L from the LU factorization of A as computed by ?gttrf.
ldb The leading dimension of b; ldb ≥ max(1, n) for column major layout
and ldb ≥ nrhs for row major layout.
ldx The leading dimension of x; ldx ≥ max(1, n) for column major layout
and ldx ≥ nrhs for row major layout.
ipiv Array, size at least max(1, n). The ipiv array, as returned by ?gttrf.
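Iterative refinement is driven by the residual b - A*x formed from the original (unfactored) matrix, which is why the routine takes dl, d, and du in addition to the factors. The sketch below forms that residual for one right-hand side with trans = 'N'. It assumes the usual LAPACK meaning of the three arrays (dl: n - 1 subdiagonal elements, d: n diagonal elements, du: n - 1 superdiagonal elements); tridiag_residual is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: r = b - A*x for a tridiagonal matrix A given by its
 * subdiagonal dl (n-1 entries), diagonal d (n entries), and superdiagonal
 * du (n-1 entries).
 */
void tridiag_residual(size_t n, const double *dl, const double *d,
                      const double *du, const double *x, const double *b,
                      double *r)
{
    assert(d != NULL && x != NULL && b != NULL && r != NULL);
    for (size_t i = 0; i < n; ++i) {
        double ax = d[i] * x[i];
        if (i > 0)
            ax += dl[i - 1] * x[i - 1];  /* element left of the diagonal  */
        if (i + 1 < n)
            ax += du[i] * x[i + 1];      /* element right of the diagonal */
        r[i] = b[i] - ax;
    }
}
```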
Output Parameters
Return Values
This function returns a value info.
See Also
Matrix Storage Schemes
?porfs
Refines the solution of a system of linear equations
with a symmetric (Hermitian) positive-definite
coefficient matrix and estimates its error.
Syntax
lapack_int LAPACKE_sporfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const float* a, lapack_int lda, const float* af, lapack_int ldaf, const float* b,
lapack_int ldb, float* x, lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_dporfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const double* a, lapack_int lda, const double* af, lapack_int ldaf, const double* b,
lapack_int ldb, double* x, lapack_int ldx, double* ferr, double* berr );
lapack_int LAPACKE_cporfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const lapack_complex_float* a, lapack_int lda, const lapack_complex_float* af,
lapack_int ldaf, const lapack_complex_float* b, lapack_int ldb, lapack_complex_float*
x, lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_zporfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const lapack_complex_double* a, lapack_int lda, const lapack_complex_double* af,
lapack_int ldaf, const lapack_complex_double* b, lapack_int ldb, lapack_complex_double*
x, lapack_int ldx, double* ferr, double* berr );
Include Files
• mkl.h
Description
The routine performs an iterative refinement of the solution to a system of linear equations A*X = B with a
symmetric (Hermitian) positive definite matrix A and multiple right-hand sides. For each computed solution
vector x, the routine computes the component-wise backward error β. This error is the smallest relative
perturbation in elements of A and b such that x is the exact solution of the perturbed system:
|δa_ij| ≤ β|a_ij|, |δb_i| ≤ β|b_i| such that (A + δA)x = (b + δb).
Finally, the routine estimates the component-wise forward error in the computed solution,
||x - x_e||∞ / ||x||∞ (here x_e is the exact solution).
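One step of the iterative refinement mentioned above can be sketched as follows: form the residual r = b - A*x with the original matrix, solve A*d = r using the Cholesky factor, and update x by d. The code is a minimal dense illustration for a real matrix and a single right-hand side. The names refine_once and cholesky_solve, the row-by-row storage, and the fixed size limit are assumptions of this example, not the library's implementation.

```c
#include <assert.h>
#include <stddef.h>

enum { SKETCH_MAX_N = 16 };  /* size limit of this sketch only */

/* Solves L*L^T*d = r in place; L is lower triangular, stored row by row. */
static void cholesky_solve(size_t n, const double *l, double *r)
{
    for (size_t i = 0; i < n; ++i) {          /* forward: L*y = r    */
        for (size_t k = 0; k < i; ++k)
            r[i] -= l[i * n + k] * r[k];
        r[i] /= l[i * n + i];
    }
    for (size_t i = n; i-- > 0; ) {           /* backward: L^T*d = y */
        for (size_t k = i + 1; k < n; ++k)
            r[i] -= l[k * n + i] * r[k];
        r[i] /= l[i * n + i];
    }
}

/* One refinement step: x <- x + d, where A*d = b - A*x. */
void refine_once(size_t n, const double *a, const double *l,
                 const double *b, double *x)
{
    double r[SKETCH_MAX_N];
    assert(n <= SKETCH_MAX_N);
    for (size_t i = 0; i < n; ++i) {
        r[i] = b[i];
        for (size_t j = 0; j < n; ++j)
            r[i] -= a[i * n + j] * x[j];      /* residual with original A */
    }
    cholesky_solve(n, l, r);                  /* r now holds the correction */
    for (size_t i = 0; i < n; ++i)
        x[i] += r[i];
}
```

The residual is computed from the original matrix while the correction uses the factor, which is why both a and af are passed to the routine.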
Before calling this routine:
Input Parameters
b Array b of size max(1, ldb*nrhs) for column major layout and max(1,
ldb*n) for row major layout contains the right-hand side matrix B.
The second dimension of b must be at least max(1, nrhs).
x Array x of size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout contains the solution matrix X.
ldb The leading dimension of b; ldb ≥ max(1, n) for column major layout
and ldb ≥ nrhs for row major layout.
ldx The leading dimension of x; ldx ≥ max(1, n) for column major layout
and ldx ≥ nrhs for row major layout.
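The size and leading dimension rules for b and x stated above depend on the layout, and they can be captured in two small helpers. This is a sketch: the enum constants carry the same values as LAPACK_ROW_MAJOR and LAPACK_COL_MAJOR in the LAPACKE headers but are redefined under hypothetical names so that the example compiles on its own, and the function names are likewise invented.

```c
#include <assert.h>

/* Same values as LAPACK_ROW_MAJOR / LAPACK_COL_MAJOR; names are local. */
enum { SKETCH_ROW_MAJOR = 101, SKETCH_COL_MAJOR = 102 };

/* Smallest leading dimension of b or x allowed by the rules above. */
long min_ld(int layout, long n, long nrhs)
{
    assert(layout == SKETCH_ROW_MAJOR || layout == SKETCH_COL_MAJOR);
    if (layout == SKETCH_COL_MAJOR)
        return n > 1 ? n : 1;   /* ld >= max(1, n) */
    return nrhs;                /* ld >= nrhs      */
}

/* Number of elements to allocate for b or x: max(1, ld*nrhs) for column
 * major layout and max(1, ld*n) for row major layout. */
long rhs_array_len(int layout, long ld, long n, long nrhs)
{
    long len;
    assert(layout == SKETCH_ROW_MAJOR || layout == SKETCH_COL_MAJOR);
    len = (layout == SKETCH_COL_MAJOR) ? ld * nrhs : ld * n;
    return len > 1 ? len : 1;
}
```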
Output Parameters
ferr, berr Arrays, size at least max(1, nrhs). Contain the component-wise
forward and backward errors, respectively, for each solution vector.
Return Values
This function returns a value info.
Application Notes
The bounds returned in ferr are not rigorous, but in practice they almost always overestimate the actual
error.
For each right-hand side, computation of the backward error involves a minimum of 4n^2 floating-point
operations (for real flavors) or 16n^2 operations (for complex flavors). In addition, each step of iterative
refinement involves 6n^2 operations (for real flavors) or 24n^2 operations (for complex flavors); the number of
iterations may range from 1 to 5. Estimating the forward error involves solving a number of systems of linear
equations A*x = b; the number is usually 4 or 5 and never more than 11. Each solution requires
approximately 2n^2 floating-point operations for real flavors or 8n^2 for complex flavors.
See Also
Matrix Storage Schemes
?porfsx
Uses extra precise iterative refinement to improve the
solution to the system of linear equations with a
symmetric/Hermitian positive-definite coefficient
matrix A and provides error bounds and backward
error estimates.
Syntax
lapack_int LAPACKE_sporfsx( int matrix_layout, char uplo, char equed, lapack_int n,
lapack_int nrhs, const float* a, lapack_int lda, const float* af, lapack_int ldaf,
const float* s, const float* b, lapack_int ldb, float* x, lapack_int ldx, float* rcond,
float* berr, lapack_int n_err_bnds, float* err_bnds_norm, float* err_bnds_comp,
lapack_int nparams, float* params );
lapack_int LAPACKE_dporfsx( int matrix_layout, char uplo, char equed, lapack_int n,
lapack_int nrhs, const double* a, lapack_int lda, const double* af, lapack_int ldaf,
const double* s, const double* b, lapack_int ldb, double* x, lapack_int ldx, double*
rcond, double* berr, lapack_int n_err_bnds, double* err_bnds_norm, double*
err_bnds_comp, lapack_int nparams, double* params );
lapack_int LAPACKE_cporfsx( int matrix_layout, char uplo, char equed, lapack_int n,
lapack_int nrhs, const lapack_complex_float* a, lapack_int lda, const
lapack_complex_float* af, lapack_int ldaf, const float* s, const lapack_complex_float*
b, lapack_int ldb, lapack_complex_float* x, lapack_int ldx, float* rcond, float* berr,
lapack_int n_err_bnds, float* err_bnds_norm, float* err_bnds_comp, lapack_int nparams,
float* params );
lapack_int LAPACKE_zporfsx( int matrix_layout, char uplo, char equed, lapack_int n,
lapack_int nrhs, const lapack_complex_double* a, lapack_int lda, const
lapack_complex_double* af, lapack_int ldaf, const double* s, const
lapack_complex_double* b, lapack_int ldb, lapack_complex_double* x, lapack_int ldx,
double* rcond, double* berr, lapack_int n_err_bnds, double* err_bnds_norm, double*
err_bnds_comp, lapack_int nparams, double* params );
Include Files
• mkl.h
Description
The routine improves the computed solution to a system of linear equations and provides error bounds and
backward error estimates for the solution. In addition to a normwise error bound, the code provides a
maximum componentwise error bound, if possible. See comments for err_bnds_norm and err_bnds_comp
for details of the error bounds.
The original system of linear equations may have been equilibrated before calling this routine, as described
by the parameters equed and s below. In this case, the solution and error bounds returned are for the
original unequilibrated system.
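For the symmetric case the equilibrated system has the form diag(s)*A*diag(s), with the right-hand side scaled by diag(s). The sketch below applies that scaling to a small dense real matrix and one right-hand side, and maps a solution y of the scaled system back to the original unknowns through x = diag(s)*y. Both function names are hypothetical, and the code only illustrates the transformation rather than the routine's internals.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: replaces A by diag(s)*A*diag(s) and b by diag(s)*b.
 * A is a dense n-by-n matrix stored row by row.
 */
void equilibrate_sym(size_t n, const double *s, double *a, double *b)
{
    assert(s != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j)
            a[i * n + j] *= s[i] * s[j];
        b[i] *= s[i];
    }
}

/*
 * Hypothetical helper: converts a solution y of the scaled system into the
 * solution x = diag(s)*y of the original system, in place.
 */
void unscale_solution(size_t n, const double *s, double *y)
{
    assert(s != NULL && y != NULL);
    for (size_t i = 0; i < n; ++i)
        y[i] *= s[i];
}
```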
Input Parameters
Specifies the form of equilibration that was done to A before calling this
routine.
If equed = 'N', no equilibration was done.
If equed = 'Y', both row and column equilibration was done, that is, A has
been replaced by diag(s)*A*diag(s). The right-hand side B has been
changed accordingly.
nrhs The number of right-hand sides; the number of columns of the matrices B
and X; nrhs ≥ 0.
b The array b (size max(1, ldb*nrhs) for column major layout and max(1,
ldb*n) for row major layout) contains the matrix B whose columns are the
right-hand sides for the systems of equations.
ldb The leading dimension of the array b; ldb ≥ max(1, n) for column major
layout and ldb ≥ nrhs for row major layout.
x Array, size max(1, ldx*nrhs) for column major layout and max(1, ldx*n)
for row major layout.
The solution matrix X as computed by ?potrs.
ldx The leading dimension of the output array x; ldx ≥ max(1, n) for column
major layout and ldx ≥ nrhs for row major layout.
n_err_bnds Number of error bounds to return for each right-hand side and each type
(normwise or componentwise). See err_bnds_norm and err_bnds_comp
descriptions in the Output Parameters section below.
nparams Specifies the number of parameters set in params. If ≤ 0, the params array
is never referenced and default values are used.
params Array, size nparams. Specifies algorithm parameters. If an entry is less than
0.0, that entry will be filled with the default value used for that parameter.
Only positions up to nparams are accessed; defaults are used for higher-
numbered parameters. If defaults are acceptable, you can pass nparams =
0, which prevents the source code from accessing the params argument.
Default 10.0
Output Parameters
rcond Reciprocal scaled condition number. An estimate of the reciprocal Skeel
condition number of the matrix A after equilibration (if done). If rcond is
less than the machine precision, in particular, if rcond = 0, the matrix is
singular to working precision. Note that the error may still be small even if
this number is very small and the matrix appears ill-conditioned.
berr Array, size at least max(1, nrhs). Contains the componentwise relative
backward error for each solution vector xj, that is, the smallest relative
change in any element of A or B that makes xj an exact solution.
The information for right-hand side i, where 1 ≤i≤nrhs, and type of error
err is stored in:
The information for right-hand side i, where 1 ≤i≤nrhs, and type of error
err is stored in:
• Column major layout: err_bnds_comp[(err - 1)*nrhs + i - 1].
• Row major layout: err_bnds_comp[err - 1 + (i - 1)*n_err_bnds]
params Output parameter only if the input contains erroneous values, namely in
params[0], params[1], or params[2]. In such a case, the corresponding
elements of params are filled with default values on output.
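The two indexing formulas given above for err_bnds_comp can be wrapped in one function so that calling code does not repeat the layout test. This is a sketch under stated assumptions: the layout constants mirror the values of LAPACK_ROW_MAJOR and LAPACK_COL_MAJOR but use hypothetical local names, and err_bnds_index is not a library function.

```c
#include <assert.h>

/* Same values as LAPACK_ROW_MAJOR / LAPACK_COL_MAJOR; names are local. */
enum { SKETCH_ROW_MAJOR = 101, SKETCH_COL_MAJOR = 102 };

/*
 * Hypothetical helper: offset of the entry for right-hand side i
 * (1 <= i <= nrhs) and error type err (1 <= err <= n_err_bnds) inside an
 * error bounds array.
 */
long err_bnds_index(int layout, long i, long err, long nrhs, long n_err_bnds)
{
    assert(layout == SKETCH_ROW_MAJOR || layout == SKETCH_COL_MAJOR);
    assert(i >= 1 && i <= nrhs);
    assert(err >= 1 && err <= n_err_bnds);
    if (layout == SKETCH_COL_MAJOR)
        return (err - 1) * nrhs + i - 1;
    return err - 1 + (i - 1) * n_err_bnds;
}
```

For example, err_bnds_comp[err_bnds_index(layout, i, 1, nrhs, n_err_bnds)] reads the err = 1 entry for right-hand side i in either layout.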
Return Values
This function returns a value info.
If info = 0, the execution is successful. The solution to every right-hand side is guaranteed.
If 0 < info ≤ n: U(info, info) is exactly zero. The factorization has been completed, but the factor U is exactly
singular, so the solution and error bounds could not be computed; rcond = 0 is returned.
If info = n+j: The solution corresponding to the j-th right-hand side is not guaranteed. The solutions
corresponding to other right-hand sides k with k > j may not be guaranteed as well, but only the first such
right-hand side is reported. If a small componentwise error is not requested (params[2] = 0.0), then the j-th
right-hand side is the first with a normwise error bound that is not guaranteed (the smallest j such that for
column major layout err_bnds_norm[j - 1] = 0.0 or err_bnds_comp[j - 1] = 0.0; or for row major
layout err_bnds_norm[(j - 1)*n_err_bnds] = 0.0 or err_bnds_comp[(j - 1)*n_err_bnds] = 0.0).
See the definition of err_bnds_norm and err_bnds_comp for err = 1. To get information about all of the
right-hand sides, check err_bnds_norm or err_bnds_comp.
See Also
Matrix Storage Schemes
?pprfs
Refines the solution of a system of linear equations
with a symmetric (Hermitian) positive-definite
coefficient matrix stored in a packed format and
estimates its error.
Syntax
lapack_int LAPACKE_spprfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const float* ap, const float* afp, const float* b, lapack_int ldb, float* x, lapack_int
ldx, float* ferr, float* berr );
lapack_int LAPACKE_dpprfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const double* ap, const double* afp, const double* b, lapack_int ldb, double* x,
lapack_int ldx, double* ferr, double* berr );
lapack_int LAPACKE_cpprfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const lapack_complex_float* ap, const lapack_complex_float* afp, const
lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x, lapack_int ldx,
float* ferr, float* berr );
lapack_int LAPACKE_zpprfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const lapack_complex_double* ap, const lapack_complex_double* afp, const
lapack_complex_double* b, lapack_int ldb, lapack_complex_double* x, lapack_int ldx,
double* ferr, double* berr );
Include Files
• mkl.h
Description
The routine performs an iterative refinement of the solution to a system of linear equations A*X = B with a
symmetric (Hermitian) positive definite matrix A and multiple right-hand sides. For each computed solution
vector x, the routine computes the component-wise backward error β. This error is the smallest relative
perturbation in elements of A and b such that x is the exact solution of the perturbed system:
|δa_ij| ≤ β|a_ij|, |δb_i| ≤ β|b_i| such that (A + δA)x = (b + δb).
Finally, the routine estimates the component-wise forward error in the computed solution
||x - x_e||∞ / ||x||∞
where x_e is the exact solution.
Input Parameters
x Array x of size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout contains the solution matrix X.
ldb The leading dimension of b; ldb ≥ max(1, n) for column major layout
and ldb ≥ nrhs for row major layout.
ldx The leading dimension of x; ldx ≥ max(1, n) for column major layout
and ldx ≥ nrhs for row major layout.
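The ap and afp arguments in the syntax above hold only one triangle of the matrix in a one-dimensional array. As an illustration, the sketch below maps a pair (i, j) to its offset under the standard LAPACK packed scheme for column major layout, where the stored triangle is laid out column by column. The function packed_index is hypothetical, row major layout is not covered, and for Hermitian matrices an element fetched through the swapped pair must additionally be conjugated.

```c
#include <assert.h>

/*
 * Hypothetical helper: 0-based offset of A(i, j) in a packed array for
 * column major layout. uplo is 'U' (upper triangle stored) or 'L' (lower
 * triangle stored). A pair in the other triangle is mapped to its
 * symmetric counterpart (j, i).
 */
long packed_index(char uplo, long n, long i, long j)
{
    assert(uplo == 'U' || uplo == 'L');
    assert(i >= 0 && i < n && j >= 0 && j < n);
    if ((uplo == 'U' && i > j) || (uplo == 'L' && i < j)) {
        long tmp = i;   /* use the stored element A(j, i) instead */
        i = j;
        j = tmp;
    }
    if (uplo == 'U')
        return i + j * (j + 1) / 2;
    return i + j * (2 * n - j - 1) / 2;
}
```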
Output Parameters
ferr, berr Arrays, size at least max(1, nrhs). Contain the component-wise
forward and backward errors, respectively, for each solution vector.
Return Values
This function returns a value info.
Application Notes
The bounds returned in ferr are not rigorous, but in practice they almost always overestimate the actual
error.
For each right-hand side, computation of the backward error involves a minimum of 4n^2 floating-point
operations (for real flavors) or 16n^2 operations (for complex flavors). In addition, each step of iterative
refinement involves 6n^2 operations (for real flavors) or 24n^2 operations (for complex flavors); the number of
iterations may range from 1 to 5.
Estimating the forward error involves solving a number of systems of linear equations A*x = b; the number
of systems is usually 4 or 5 and never more than 11. Each solution requires approximately 2n^2 floating-point
operations for real flavors or 8n^2 for complex flavors.
See Also
Matrix Storage Schemes
?pbrfs
Refines the solution of a system of linear equations
with a band symmetric (Hermitian) positive-definite
coefficient matrix and estimates its error.
Syntax
lapack_int LAPACKE_spbrfs( int matrix_layout, char uplo, lapack_int n, lapack_int kd,
lapack_int nrhs, const float* ab, lapack_int ldab, const float* afb, lapack_int ldafb,
const float* b, lapack_int ldb, float* x, lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_dpbrfs( int matrix_layout, char uplo, lapack_int n, lapack_int kd,
lapack_int nrhs, const double* ab, lapack_int ldab, const double* afb, lapack_int
ldafb, const double* b, lapack_int ldb, double* x, lapack_int ldx, double* ferr, double*
berr );
lapack_int LAPACKE_cpbrfs( int matrix_layout, char uplo, lapack_int n, lapack_int kd,
lapack_int nrhs, const lapack_complex_float* ab, lapack_int ldab, const
lapack_complex_float* afb, lapack_int ldafb, const lapack_complex_float* b, lapack_int
ldb, lapack_complex_float* x, lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_zpbrfs( int matrix_layout, char uplo, lapack_int n, lapack_int kd,
lapack_int nrhs, const lapack_complex_double* ab, lapack_int ldab, const
lapack_complex_double* afb, lapack_int ldafb, const lapack_complex_double* b,
lapack_int ldb, lapack_complex_double* x, lapack_int ldx, double* ferr, double* berr );
Include Files
• mkl.h
Description
The routine performs an iterative refinement of the solution to a system of linear equations A*X = B with a
symmetric (Hermitian) positive definite band matrix A and multiple right-hand sides. For each computed
solution vector x, the routine computes the component-wise backward error β. This error is the smallest
relative perturbation in elements of A and b such that x is the exact solution of the perturbed system:
|δa_ij| ≤ β|a_ij|, |δb_i| ≤ β|b_i| such that (A + δA)x = (b + δb).
Finally, the routine estimates the component-wise forward error in the computed solution,
||x - x_e||∞ / ||x||∞ (here x_e is the exact solution).
Before calling this routine:
Input Parameters
afb Array afb (size max(1, ldafb*n)) contains the factored band matrix A,
as returned by ?pbtrf.
x Array x of size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout contains the solution matrix X.
ldb The leading dimension of b; ldb ≥ max(1, n) for column major layout
and ldb ≥ nrhs for row major layout.
ldx The leading dimension of x; ldx ≥ max(1, n) for column major layout
and ldx ≥ nrhs for row major layout.
Output Parameters
ferr, berr Arrays, size at least max(1, nrhs). Contain the component-wise
forward and backward errors, respectively, for each solution vector.
Return Values
This function returns a value info.
Application Notes
The bounds returned in ferr are not rigorous, but in practice they almost always overestimate the actual
error.
For each right-hand side, computation of the backward error involves a minimum of 8n*kd floating-point
operations (for real flavors) or 32n*kd operations (for complex flavors). In addition, each step of iterative
refinement involves 12n*kd operations (for real flavors) or 48n*kd operations (for complex flavors); the
number of iterations may range from 1 to 5.
Estimating the forward error involves solving a number of systems of linear equations A*x = b; the number
is usually 4 or 5 and never more than 11. Each solution requires approximately 4n*kd floating-point
operations for real flavors or 16n*kd for complex flavors.
See Also
Matrix Storage Schemes
?ptrfs
Refines the solution of a system of linear equations
with a symmetric (Hermitian) positive-definite
tridiagonal coefficient matrix and estimates its error.
Syntax
lapack_int LAPACKE_sptrfs( int matrix_layout, lapack_int n, lapack_int nrhs, const
float* d, const float* e, const float* df, const float* ef, const float* b, lapack_int
ldb, float* x, lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_dptrfs( int matrix_layout, lapack_int n, lapack_int nrhs, const
double* d, const double* e, const double* df, const double* ef, const double* b,
lapack_int ldb, double* x, lapack_int ldx, double* ferr, double* berr );
lapack_int LAPACKE_cptrfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const float* d, const lapack_complex_float* e, const float* df, const
lapack_complex_float* ef, const lapack_complex_float* b, lapack_int ldb,
lapack_complex_float* x, lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_zptrfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const double* d, const lapack_complex_double* e, const double* df, const
lapack_complex_double* ef, const lapack_complex_double* b, lapack_int ldb,
lapack_complex_double* x, lapack_int ldx, double* ferr, double* berr );
Include Files
• mkl.h
Description
The routine performs an iterative refinement of the solution to a system of linear equations A*X = B with a
symmetric (Hermitian) positive definite tridiagonal matrix A and multiple right-hand sides. For each
computed solution vector x, the routine computes the component-wise backward error β. This error is the
smallest relative perturbation in elements of A and b such that x is the exact solution of the perturbed
system:
|δa_ij| ≤ β|a_ij|, |δb_i| ≤ β|b_i| such that (A + δA)x = (b + δb).
Finally, the routine estimates the component-wise forward error in the computed solution,
||x - x_e||∞ / ||x||∞ (here x_e is the exact solution).
Before calling this routine:
Input Parameters
The array b of size max(1, ldb*nrhs) for column major layout and
max(1, ldb*n) for row major layout contains the matrix B whose
columns are the right-hand sides for the systems of equations.
The array x of size max(1, ldx*nrhs) for column major layout and
max(1, ldx*n) for row major layout contains the solution matrix X as
computed by ?pttrs.
ldb The leading dimension of b; ldb ≥ max(1, n) for column major layout
and ldb ≥ nrhs for row major layout.
ldx The leading dimension of x; ldx ≥ max(1, n) for column major layout
and ldx ≥ nrhs for row major layout.
Output Parameters
ferr, berr Arrays, size at least max(1, nrhs). Contain the component-wise
forward and backward errors, respectively, for each solution vector.
Return Values
This function returns a value info.
See Also
Matrix Storage Schemes
?syrfs
Refines the solution of a system of linear equations
with a symmetric coefficient matrix and estimates its
error.
Syntax
lapack_int LAPACKE_ssyrfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const float* a, lapack_int lda, const float* af, lapack_int ldaf, const lapack_int*
ipiv, const float* b, lapack_int ldb, float* x, lapack_int ldx, float* ferr, float*
berr );
lapack_int LAPACKE_dsyrfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const double* a, lapack_int lda, const double* af, lapack_int ldaf, const lapack_int*
ipiv, const double* b, lapack_int ldb, double* x, lapack_int ldx, double* ferr, double*
berr );
lapack_int LAPACKE_csyrfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const lapack_complex_float* a, lapack_int lda, const lapack_complex_float* af,
lapack_int ldaf, const lapack_int* ipiv, const lapack_complex_float* b, lapack_int ldb,
lapack_complex_float* x, lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_zsyrfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const lapack_complex_double* a, lapack_int lda, const lapack_complex_double* af,
lapack_int ldaf, const lapack_int* ipiv, const lapack_complex_double* b, lapack_int
ldb, lapack_complex_double* x, lapack_int ldx, double* ferr, double* berr );
Include Files
• mkl.h
Description
The routine performs an iterative refinement of the solution to a system of linear equations A*X = B with a
symmetric full-storage matrix A, with multiple right-hand sides. For each computed solution vector x, the
routine computes the component-wise backward error β. This error is the smallest relative perturbation in
elements of A and b such that x is the exact solution of the perturbed system:
|δa_ij| ≤ β|a_ij|, |δb_i| ≤ β|b_i| such that (A + δA)x = (b + δb).
Finally, the routine estimates the component-wise forward error in the computed solution
||x - x_e||∞ / ||x||∞ (here x_e is the exact solution).
Before calling this routine:
Input Parameters
x Array x of size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout contains the solution matrix X.
ldb The leading dimension of b; ldb ≥ max(1, n) for column major layout
and ldb ≥ nrhs for row major layout.
ldx The leading dimension of x; ldx ≥ max(1, n) for column major layout
and ldx ≥ nrhs for row major layout.
ipiv Array, size at least max(1, n). The ipiv array, as returned by ?sytrf.
Output Parameters
ferr, berr Arrays, size at least max(1, nrhs). Contain the component-wise
forward and backward errors, respectively, for each solution vector.
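As an illustration of what berr measures, the sketch below evaluates the standard component-wise backward error formula β = max_i |b - A*x|_i / (|A|*|x| + |b|)_i for one right-hand side. It is not the library's implementation: the function name backward_error, the row-by-row full storage of A, and the handling of zero denominators are assumptions made for this example, and the actual routine applies additional safeguards.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Component-wise backward error of x for the dense system A*x = b:
 *   beta = max_i |b - A*x|_i / (|A|*|x| + |b|)_i
 * A is n-by-n and stored row by row (a[i*n + j]); this storage is an
 * assumption of the sketch. Rows with a zero denominator are skipped,
 * which is a simplification. */
double backward_error(size_t n, const double *a, const double *x, const double *b)
{
    double beta = 0.0;
    assert(a != NULL && x != NULL && b != NULL);
    for (size_t i = 0; i < n; ++i) {
        double r = b[i];            /* residual component (b - A*x)_i */
        double denom = fabs(b[i]);  /* (|A|*|x| + |b|)_i */
        for (size_t j = 0; j < n; ++j) {
            r -= a[i*n + j] * x[j];
            denom += fabs(a[i*n + j]) * fabs(x[j]);
        }
        if (denom > 0.0) {
            double ratio = fabs(r) / denom;
            if (ratio > beta) beta = ratio;
        }
    }
    return beta;
}
```

A value near machine precision indicates that x solves a system very close to the one supplied; an exact solution gives 0.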
Return Values
This function returns a value info.
Application Notes
The bounds returned in ferr are not rigorous, but in practice they almost always overestimate the actual
error.
For each right-hand side, computation of the backward error involves a minimum of 4n^2 floating-point
operations (for real flavors) or 16n^2 operations (for complex flavors). In addition, each step of iterative
refinement involves 6n^2 operations (for real flavors) or 24n^2 operations (for complex flavors); the number of
iterations may range from 1 to 5. Estimating the forward error involves solving a number of systems of linear
equations A*x = b; the number is usually 4 or 5 and never more than 11. Each solution requires
approximately 2n^2 floating-point operations for real flavors or 8n^2 for complex flavors.
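The operation counts above can be combined into a rough per-right-hand-side estimate. The helper below is only a sketch of that arithmetic; the name refine_cost_estimate and its parameters are hypothetical, and the result is a lower-bound style estimate because the backward error figure is stated as a minimum.

```c
#include <assert.h>

/* Rough operation count for one right-hand side, using the figures quoted
 * in the text. Hypothetical helper, not part of the library.
 *   is_complex:   0 for real flavors, nonzero for complex flavors
 *   refine_steps: refinement iterations performed (1 to 5 per the text)
 *   solves:       systems solved for the forward error estimate
 *                 (usually 4 or 5, never more than 11) */
double refine_cost_estimate(double n, int is_complex, int refine_steps, int solves)
{
    assert(n >= 0.0 && refine_steps >= 0 && solves >= 0);
    double scale = is_complex ? 4.0 : 1.0;  /* complex figures are 4x the real ones */
    double n2 = n * n;
    double backward = 4.0 * n2;                 /* minimum backward error cost */
    double refine   = 6.0 * n2 * refine_steps;  /* iterative refinement steps */
    double forward  = 2.0 * n2 * solves;        /* solves for the forward error */
    return scale * (backward + refine + forward);
}
```

For n = 1000 with one refinement step and four solves, the estimate is 18 million operations for real flavors and four times that for complex flavors.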
See Also
Matrix Storage Schemes
?syrfsx
Uses extra precise iterative refinement to improve the
solution to the system of linear equations with a
symmetric indefinite coefficient matrix A and provides
error bounds and backward error estimates.
Syntax
lapack_int LAPACKE_ssyrfsx( int matrix_layout, char uplo, char equed, lapack_int n,
lapack_int nrhs, const float* a, lapack_int lda, const float* af, lapack_int ldaf,
const lapack_int* ipiv, const float* s, const float* b, lapack_int ldb, float* x,
lapack_int ldx, float* rcond, float* berr, lapack_int n_err_bnds, float* err_bnds_norm,
float* err_bnds_comp, lapack_int nparams, float* params );
lapack_int LAPACKE_dsyrfsx( int matrix_layout, char uplo, char equed, lapack_int n,
lapack_int nrhs, const double* a, lapack_int lda, const double* af, lapack_int ldaf,
const lapack_int* ipiv, const double* s, const double* b, lapack_int ldb, double* x,
lapack_int ldx, double* rcond, double* berr, lapack_int n_err_bnds, double*
err_bnds_norm, double* err_bnds_comp, lapack_int nparams, double* params );
lapack_int LAPACKE_csyrfsx( int matrix_layout, char uplo, char equed, lapack_int n,
lapack_int nrhs, const lapack_complex_float* a, lapack_int lda, const
lapack_complex_float* af, lapack_int ldaf, const lapack_int* ipiv, const float* s,
const lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x, lapack_int ldx,
float* rcond, float* berr, lapack_int n_err_bnds, float* err_bnds_norm, float*
err_bnds_comp, lapack_int nparams, float* params );
lapack_int LAPACKE_zsyrfsx( int matrix_layout, char uplo, char equed, lapack_int n,
lapack_int nrhs, const lapack_complex_double* a, lapack_int lda, const
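lapack_complex_double* af, lapack_int ldaf, const lapack_int* ipiv, const double* s,
const lapack_complex_double* b, lapack_int ldb, lapack_complex_double* x, lapack_int
ldx, double* rcond, double* berr, lapack_int n_err_bnds, double* err_bnds_norm, double*
err_bnds_comp, lapack_int nparams, double* params );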
Include Files
• mkl.h
Description
The routine improves the computed solution to a system of linear equations when the coefficient matrix is
symmetric indefinite, and provides error bounds and backward error estimates for the solution. In addition to
a normwise error bound, the code provides a maximum componentwise error bound, if possible. See
comments for err_bnds_norm and err_bnds_comp for details of the error bounds.
The original system of linear equations may have been equilibrated before calling this routine, as described
by the parameters equed and s below. In this case, the solution and error bounds returned are for the
original unequilibrated system.
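To make the equilibration concrete, the sketch below applies the scaling diag(s)*A*diag(s) to a small full-storage matrix and scales a right-hand side by diag(s). The function names scale_symmetric and scale_rhs and the row-by-row storage are assumptions for illustration only; in practice the scale factors come from an equilibration routine rather than being applied by hand.

```c
#include <assert.h>
#include <stddef.h>

/* Replaces the n-by-n matrix A (stored as a[i*n + j], an assumption of this
 * sketch) by diag(s)*A*diag(s), that is, a_ij becomes s_i * a_ij * s_j.
 * A symmetric input stays symmetric. */
void scale_symmetric(size_t n, double *a, const double *s)
{
    assert(a != NULL && s != NULL);
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j)
            a[i*n + j] *= s[i] * s[j];
}

/* Scales one right-hand side to match: b_i becomes s_i * b_i. */
void scale_rhs(size_t n, double *b, const double *s)
{
    assert(b != NULL && s != NULL);
    for (size_t i = 0; i < n; ++i)
        b[i] *= s[i];
}
```

Solving the scaled system yields diag(s)^-1 times the original solution, which is why the routine reports its results for the original unequilibrated system.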
Input Parameters
equed Specifies the form of equilibration that was done to A before calling this
routine.
If equed = 'N', no equilibration was done.
If equed = 'Y', both row and column equilibration was done, that is, A has
been replaced by diag(s)*A*diag(s). The right-hand side B has been
changed accordingly.
nrhs The number of right-hand sides; the number of columns of the matrices B
and X; nrhs ≥ 0.
a, af, b The array a (size max(1, lda*n)) contains the symmetric/Hermitian matrix
A as specified by uplo. If uplo = 'U', the leading n-by-n upper triangular
part of a contains the upper triangular part of the matrix A and the strictly
lower triangular part of a is not referenced. If uplo = 'L', the leading n-
by-n lower triangular part of a contains the lower triangular part of the
matrix A and the strictly upper triangular part of a is not referenced.
The array b (size max(1, ldb*nrhs) for column major layout and max(1,
ldb*n) for row major layout) contains the matrix B whose columns are the
right-hand sides for the systems of equations.
ipiv Array, size at least max(1, n). Contains details of the interchanges and the
block structure of D as determined by ?sytrf.
s Array, size (n). The array s contains the scale factors for A.
ldb The leading dimension of the array b; ldb ≥ max(1, n) for column major
layout and ldb ≥ nrhs for row major layout.
x Array, of size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout.
The solution matrix X as computed by ?sytrs.
ldx The leading dimension of the output array x; ldx ≥ max(1, n) for column
major layout and ldx ≥ nrhs for row major layout.
n_err_bnds Number of error bounds to return for each right hand side and each type
(normwise or componentwise). See err_bnds_norm and err_bnds_comp
descriptions in the Output Parameters section below.
nparams Specifies the number of parameters set in params. If ≤ 0, the params array
is never referenced and default values are used.
params Array, size nparams. Specifies algorithm parameters. If an entry is less than
0.0, that entry will be filled with the default value used for that parameter.
Only positions up to nparams are accessed; defaults are used for higher-
numbered parameters. If defaults are acceptable, you can pass nparams =
0, which prevents the source code from accessing the params argument.
Default 10.0
Output Parameters
berr Array, size at least max(1, nrhs). Contains the componentwise relative
backward error for each solution vector xj, that is, the smallest relative
change in any element of A or B that makes xj an exact solution.
greater than the threshold sqrt(n)*slamch(ε)
for single precision flavors and
sqrt(n)*dlamch(ε) for double precision
flavors. This error bound should only be trusted
if the previous boolean is true.
The information for right-hand side i, where 1 ≤ i ≤ nrhs, and type of error
err is stored in:
The information for right-hand side i, where 1 ≤ i ≤ nrhs, and type of error
err is stored in:
params Output parameter only if the input contains erroneous values, namely, in
params[0], params[1], params[2]. In such a case, the corresponding
elements of params are filled with default values on output.
Return Values
This function returns a value info.
If info = 0, the execution is successful. The solution to every right-hand side is guaranteed.
If 0 < info ≤ n: U(info, info) is exactly zero. The factorization has been completed, but the factor U is exactly
singular, so the solution and error bounds could not be computed; rcond = 0 is returned.
If info = n+j: The solution corresponding to the j-th right-hand side is not guaranteed. The solutions
corresponding to other right-hand sides k with k > j may not be guaranteed as well, but only the first such
right-hand side is reported. If a small componentwise error is not requested (params[2] = 0.0), then the j-th
right-hand side is the first with a normwise error bound that is not guaranteed (the smallest j such that for
column major layout err_bnds_norm[j - 1] = 0.0 or err_bnds_comp[j - 1] = 0.0; or for row major
layout err_bnds_norm[(j - 1)*n_err_bnds] = 0.0 or err_bnds_comp[(j - 1)*n_err_bnds] = 0.0).
See the definition of err_bnds_norm and err_bnds_comp for err = 1. To get information about all of the
right-hand sides, check err_bnds_norm or err_bnds_comp.
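The return value cases above can be decoded mechanically. The sketch below is one way to do so; the enum rfsx_status and the functions classify_info and first_bound_index are hypothetical names, plain long is used in place of lapack_int to keep the example independent of mkl.h, and treating negative info as an invalid argument is an assumption based on the usual LAPACKE convention rather than on the text above.

```c
#include <assert.h>

/* Hypothetical classification of the info value described in the text. */
enum rfsx_status { RFSX_OK, RFSX_BAD_ARGUMENT, RFSX_SINGULAR, RFSX_RHS_NOT_GUARANTEED };

/* Classifies info for a system of order n. For RFSX_SINGULAR, *index is the
 * 1-based diagonal position; for RFSX_RHS_NOT_GUARANTEED, it is the 1-based
 * right-hand side j; otherwise it is set to 0. */
enum rfsx_status classify_info(long info, long n, long *index)
{
    assert(n >= 0 && index != 0);
    *index = 0;
    if (info == 0) return RFSX_OK;
    if (info < 0)  return RFSX_BAD_ARGUMENT;  /* assumption: usual LAPACKE convention */
    if (info <= n) { *index = info; return RFSX_SINGULAR; }
    *index = info - n;                         /* info = n + j */
    return RFSX_RHS_NOT_GUARANTEED;
}

/* Position of the err = 1 entry for right-hand side j (1-based), following
 * the two layouts quoted in the text. row_major is nonzero for row major. */
long first_bound_index(long j, long n_err_bnds, int row_major)
{
    assert(j >= 1 && n_err_bnds >= 1);
    return row_major ? (j - 1) * n_err_bnds : (j - 1);
}
```

Because only the first unguaranteed right-hand side is reported through info, a caller that needs the status of every right-hand side would still loop over the err = 1 entries located with first_bound_index.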
See Also
Matrix Storage Schemes
?herfs
Refines the solution of a system of linear equations
with a complex Hermitian coefficient matrix and
estimates its error.
Syntax
lapack_int LAPACKE_cherfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const lapack_complex_float* a, lapack_int lda, const lapack_complex_float* af,
lapack_int ldaf, const lapack_int* ipiv, const lapack_complex_float* b, lapack_int ldb,
lapack_complex_float* x, lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_zherfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const lapack_complex_double* a, lapack_int lda, const lapack_complex_double* af,
lapack_int ldaf, const lapack_int* ipiv, const lapack_complex_double* b, lapack_int
ldb, lapack_complex_double* x, lapack_int ldx, double* ferr, double* berr );
Include Files
• mkl.h
Description
The routine performs an iterative refinement of the solution to a system of linear equations A*X = B with a
complex Hermitian full-storage matrix A, with multiple right-hand sides. For each computed solution vector x,
the routine computes the component-wise backward error β. This error is the smallest relative perturbation
in elements of A and b such that x is the exact solution of the perturbed system:
|δa_ij| ≤ β|a_ij|, |δb_i| ≤ β|b_i| such that (A + δA)x = (b + δb).
Finally, the routine estimates the component-wise forward error in the computed solution
||x - x_e||∞ / ||x||∞ (here x_e is the exact solution).
Before calling this routine:
Input Parameters
a, af, b, x Arrays:
a (size max(1, lda*n)) contains the original matrix A, as supplied
to ?hetrf.
b of size max(1, ldb*nrhs) for column major layout and max(1,
ldb*n) for row major layout contains the right-hand side matrix B.
x of size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout contains the solution matrix X.
ldb The leading dimension of b; ldb ≥ max(1, n) for column major layout
and ldb ≥ nrhs for row major layout.
ldx The leading dimension of x; ldx ≥ max(1, n) for column major layout
and ldx ≥ nrhs for row major layout.
ipiv Array, size at least max(1, n). The ipiv array, as returned by ?hetrf.
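The size and leading-dimension rules for b and x depend on the layout, which is easy to get wrong when allocating buffers. The helpers below sketch those rules; the enum layout and both function names are hypothetical stand-ins (real code would pass LAPACK_ROW_MAJOR or LAPACK_COL_MAJOR from mkl.h), and the row major minimum is clamped to at least 1 as a conservative assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout tags for this sketch; real code would use the
 * LAPACK_ROW_MAJOR / LAPACK_COL_MAJOR constants from mkl.h. */
enum layout { COL_MAJOR, ROW_MAJOR };

/* Smallest leading dimension accepted for an n-by-nrhs matrix such as B or X:
 * max(1, n) for column major, nrhs (clamped to at least 1) for row major. */
size_t min_leading_dim(enum layout layout, size_t n, size_t nrhs)
{
    size_t ld = (layout == COL_MAJOR) ? n : nrhs;
    return ld > 0 ? ld : 1;
}

/* Required element count: max(1, ld*nrhs) for column major layout and
 * max(1, ld*n) for row major layout. */
size_t rhs_array_size(enum layout layout, size_t ld, size_t n, size_t nrhs)
{
    assert(ld >= min_leading_dim(layout, n, nrhs));
    size_t len = (layout == COL_MAJOR) ? ld * nrhs : ld * n;
    return len > 0 ? len : 1;
}
```

With n = 5 and nrhs = 2, both layouts need 10 elements at the minimum leading dimension; padding the leading dimension increases the required size accordingly.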
Output Parameters
ferr, berr Arrays, size at least max(1, nrhs). Contain the component-wise
forward and backward errors, respectively, for each solution vector.
Return Values
This function returns a value info.
Application Notes
The bounds returned in ferr are not rigorous, but in practice they almost always overestimate the actual
error.
For each right-hand side, computation of the backward error involves a minimum of 16n^2 operations. In
addition, each step of iterative refinement involves 24n^2 operations; the number of iterations may range
from 1 to 5.
Estimating the forward error involves solving a number of systems of linear equations A*x = b; the number
is usually 4 or 5 and never more than 11. Each solution requires approximately 8n^2 floating-point operations.
See Also
Matrix Storage Schemes
?herfsx
Uses extra precise iterative refinement to improve the
solution to the system of linear equations with a
Hermitian indefinite coefficient matrix A and provides
error bounds and backward error estimates.
Syntax
lapack_int LAPACKE_cherfsx( int matrix_layout, char uplo, char equed, lapack_int n,
lapack_int nrhs, const lapack_complex_float* a, lapack_int lda, const
lapack_complex_float* af, lapack_int ldaf, const lapack_int* ipiv, const float* s,
const lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x, lapack_int ldx,
float* rcond, float* berr, lapack_int n_err_bnds, float* err_bnds_norm, float*
err_bnds_comp, lapack_int nparams, float* params );
lapack_int LAPACKE_zherfsx( int matrix_layout, char uplo, char equed, lapack_int n,
lapack_int nrhs, const lapack_complex_double* a, lapack_int lda, const
lapack_complex_double* af, lapack_int ldaf, const lapack_int* ipiv, const double* s,
const lapack_complex_double* b, lapack_int ldb, lapack_complex_double* x, lapack_int
ldx, double* rcond, double* berr, lapack_int n_err_bnds, double* err_bnds_norm, double*
err_bnds_comp, lapack_int nparams, double* params );
Include Files
• mkl.h
Description
The routine improves the computed solution to a system of linear equations when the coefficient matrix is
Hermitian indefinite, and provides error bounds and backward error estimates for the solution. In addition to
a normwise error bound, the code provides a maximum componentwise error bound, if possible. See
comments for err_bnds_norm and err_bnds_comp for details of the error bounds.
The original system of linear equations may have been equilibrated before calling this routine, as described
by the parameters equed and s below. In this case, the solution and error bounds returned are for the
original unequilibrated system.
Input Parameters
equed Specifies the form of equilibration that was done to A before calling this
routine.
If equed = 'N', no equilibration was done.
If equed = 'Y', both row and column equilibration was done, that is, A has
been replaced by diag(s)*A*diag(s). The right-hand side B has been
changed accordingly.
nrhs The number of right-hand sides; the number of columns of the matrices B
and X; nrhs ≥ 0.
a, af, b The array a of size max(1, lda*n) contains the Hermitian matrix A as
specified by uplo. If uplo = 'U', the leading n-by-n upper triangular part
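of a contains the upper triangular part of the matrix A and the strictly lower
triangular part of a is not referenced. If uplo = 'L', the leading n-by-n lower
triangular part of a contains the lower triangular part of the matrix A and the
strictly upper triangular part of a is not referenced.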
The array af of size max(1, ldaf*n) contains the block diagonal matrix D
and the multipliers used to obtain the factor U or L from the factorization A
= U*D*U^H or A = L*D*L^H as computed by chetrf for cherfsx or zhetrf
for zherfsx.
The array b of size max(1, ldb*nrhs) for column major layout and max(1,
ldb*n) for row major layout contains the matrix B whose columns are
the right-hand sides for the systems of equations.
ipiv Array, size at least max(1, n). Contains details of the interchanges and the
block structure of D as determined by chetrf for cherfsx or zhetrf for
zherfsx.
s Array, size (n). The array s contains the scale factors for A.
ldb The leading dimension of the array b; ldb ≥ max(1, n) for column major
layout and ldb ≥ nrhs for row major layout.
x Array, size max(1, ldx*nrhs) for column major layout and max(1, ldx*n)
for row major layout.
The solution matrix X as computed by ?hetrs.
ldx The leading dimension of the output array x; ldx ≥ max(1, n) for column
major layout and ldx ≥ nrhs for row major layout.
n_err_bnds Number of error bounds to return for each right hand side and each type
(normwise or componentwise). See err_bnds_norm and err_bnds_comp
descriptions in the Output Parameters section below.
nparams Specifies the number of parameters set in params. If ≤ 0, the params array
is never referenced and default values are used.
params Array, size nparams. Specifies algorithm parameters. If an entry is less than
0.0, that entry will be filled with the default value used for that parameter.
Only positions up to nparams are accessed; defaults are used for higher-
numbered parameters. If defaults are acceptable, you can pass nparams =
0, which prevents the source code from accessing the params argument.
=0.0 No refinement is performed and no error bounds
are computed.
Default 10
Output Parameters
berr Array, size at least max(1, nrhs). Contains the componentwise relative
backward error for each solution vector xj, that is, the smallest relative
change in any element of A or B that makes xj an exact solution.
The information for right-hand side i, where 1 ≤ i ≤ nrhs, and type of error
err is stored in:
err=2 "Guaranteed" error bound. The estimated
forward error, almost certainly within a factor of
10 of the true error so long as the next entry is
greater than the threshold sqrt(n)*slamch(ε)
for cherfsx and sqrt(n)*dlamch(ε) for
zherfsx. This error bound should only be
trusted if the previous boolean is true.
The information for right-hand side i, where 1 ≤ i ≤ nrhs, and type of error
err is stored in:
params Output parameter only if the input contains erroneous values, namely, in
params[0], params[1], params[2]. In such a case, the corresponding
elements of params are filled with default values on output.
Return Values
This function returns a value info.
If info = 0, the execution is successful. The solution to every right-hand side is guaranteed.
If 0 < info ≤ n: U(info, info) is exactly zero. The factorization has been completed, but the factor D is exactly
singular, so the solution and error bounds could not be computed; rcond = 0 is returned.
If info = n+j: The solution corresponding to the j-th right-hand side is not guaranteed. The solutions
corresponding to other right-hand sides k with k > j may not be guaranteed as well, but only the first such
right-hand side is reported. If a small componentwise error is not requested (params[2] = 0.0), then the j-th
right-hand side is the first with a normwise error bound that is not guaranteed (the smallest j such that for
column major layout err_bnds_norm[j - 1] = 0.0 or err_bnds_comp[j - 1] = 0.0; or for row major
layout err_bnds_norm[(j - 1)*n_err_bnds] = 0.0 or err_bnds_comp[(j - 1)*n_err_bnds] = 0.0).
See the definition of err_bnds_norm and err_bnds_comp for err = 1. To get information about all of the
right-hand sides, check err_bnds_norm or err_bnds_comp.
See Also
Matrix Storage Schemes
?sprfs
Refines the solution of a system of linear equations
with a packed symmetric coefficient matrix and
estimates the solution error.
Syntax
lapack_int LAPACKE_ssprfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const float* ap, const float* afp, const lapack_int* ipiv, const float* b, lapack_int
ldb, float* x, lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_dsprfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const double* ap, const double* afp, const lapack_int* ipiv, const double* b,
lapack_int ldb, double* x, lapack_int ldx, double* ferr, double* berr );
lapack_int LAPACKE_csprfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const lapack_complex_float* ap, const lapack_complex_float* afp, const lapack_int*
ipiv, const lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x,
lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_zsprfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const lapack_complex_double* ap, const lapack_complex_double* afp, const lapack_int*
ipiv, const lapack_complex_double* b, lapack_int ldb, lapack_complex_double* x,
lapack_int ldx, double* ferr, double* berr );
Include Files
• mkl.h
Description
The routine performs an iterative refinement of the solution to a system of linear equations A*X = B with a
packed symmetric matrix A, with multiple right-hand sides. For each computed solution vector x, the routine
computes the component-wise backward error β. This error is the smallest relative perturbation in elements
of A and b such that x is the exact solution of the perturbed system:
|δa_ij| ≤ β|a_ij|, |δb_i| ≤ β|b_i| such that (A + δA)x = (b + δb).
Finally, the routine estimates the component-wise forward error in the computed solution
||x - x_e||∞ / ||x||∞ (here x_e is the exact solution).
Before calling this routine:
Input Parameters
ap, afp, b, x Arrays:
ap of size max(1, n*(n+1)/2) contains the original packed matrix A, as
supplied to ?sptrf.
b of size max(1, ldb*nrhs) for column major layout and max(1,
ldb*n) for row major layout contains the right-hand side matrix B.
x of size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout contains the solution matrix X.
ldb The leading dimension of b; ldb ≥ max(1, n) for column major layout
and ldb ≥ nrhs for row major layout.
ldx The leading dimension of x; ldx ≥ max(1, n) for column major layout
and ldx ≥ max(1, nrhs) for row major layout.
ipiv Array, size at least max(1, n). The ipiv array, as returned by ?sptrf.
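The n*(n+1)/2 size of ap follows from storing only one triangle of A. The sketch below shows the conventional LAPACK packed indexing for column major layout, where the stored triangle is laid out column by column; the function names packed_size and packed_index are hypothetical, indices are 0-based, and the row major packed layout is not covered here.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements in the packed array for an n-by-n matrix:
 * max(1, n*(n+1)/2). */
size_t packed_size(size_t n)
{
    size_t len = n * (n + 1) / 2;
    return len > 0 ? len : 1;
}

/* 0-based position of A(i, j) in ap for column major packed storage.
 *   upper != 0: upper triangle stored, requires i <= j
 *   upper == 0: lower triangle stored, requires i >= j */
size_t packed_index(size_t n, size_t i, size_t j, int upper)
{
    assert(i < n && j < n);
    if (upper) {
        assert(i <= j);
        return i + j * (j + 1) / 2;       /* columns 0..j-1 hold 1 + 2 + ... + j entries */
    }
    assert(i >= j);
    return i + j * (2 * n - j - 1) / 2;   /* column j starts after j shortened columns */
}
```

For a symmetric matrix, an element in the triangle that is not stored is read by swapping i and j before calling packed_index.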
Output Parameters
ferr, berr Arrays, size at least max(1, nrhs). Contain the component-wise
forward and backward errors, respectively, for each solution vector.
Return Values
This function returns a value info.
Application Notes
The bounds returned in ferr are not rigorous, but in practice they almost always overestimate the actual
error.
For each right-hand side, computation of the backward error involves a minimum of 4n^2 floating-point
operations (for real flavors) or 16n^2 operations (for complex flavors). In addition, each step of iterative
refinement involves 6n^2 operations (for real flavors) or 24n^2 operations (for complex flavors); the number of
iterations may range from 1 to 5.
Estimating the forward error involves solving a number of systems of linear equations A*x = b; the number
of systems is usually 4 or 5 and never more than 11. Each solution requires approximately 2n^2 floating-point
operations for real flavors or 8n^2 for complex flavors.
See Also
Matrix Storage Schemes
?hprfs
Refines the solution of a system of linear equations
with a packed complex Hermitian coefficient matrix
and estimates the solution error.
Syntax
lapack_int LAPACKE_chprfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const lapack_complex_float* ap, const lapack_complex_float* afp, const lapack_int*
ipiv, const lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x,
lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_zhprfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const lapack_complex_double* ap, const lapack_complex_double* afp, const lapack_int*
ipiv, const lapack_complex_double* b, lapack_int ldb, lapack_complex_double* x,
lapack_int ldx, double* ferr, double* berr );
Include Files
• mkl.h
Description
The routine performs an iterative refinement of the solution to a system of linear equations A*X = B with a
packed complex Hermitian matrix A, with multiple right-hand sides. For each computed solution vector x, the
routine computes the component-wise backward error β. This error is the smallest relative perturbation in
elements of A and b such that x is the exact solution of the perturbed system:
|δa_ij| ≤ β|a_ij|, |δb_i| ≤ β|b_i| such that (A + δA)x = (b + δb).
Finally, the routine estimates the component-wise forward error in the computed solution
||x - x_e||∞ / ||x||∞ (here x_e is the exact solution).
Before calling this routine:
Input Parameters
ap, afp, b, x Arrays:
ap of size max(1, n*(n + 1)/2) contains the original packed matrix A, as
supplied to ?hptrf.
b of size max(1, ldb*nrhs) for column major layout and max(1,
ldb*n) for row major layout contains the right-hand side matrix B.
x of size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout contains the solution matrix X.
ldb The leading dimension of b; ldb ≥ max(1, n) for column major layout
and ldb ≥ nrhs for row major layout.
ldx The leading dimension of x; ldx ≥ max(1, n) for column major layout
and ldx ≥ nrhs for row major layout.
ipiv Array, size at least max(1, n). The ipiv array, as returned by ?hptrf.
Output Parameters
ferr, berr Arrays, size at least max(1, nrhs). Contain the component-wise
forward and backward errors, respectively, for each solution vector.
Return Values
This function returns a value info.
Application Notes
The bounds returned in ferr are not rigorous, but in practice they almost always overestimate the actual
error.
For each right-hand side, computation of the backward error involves a minimum of 16n^2 operations. In
addition, each step of iterative refinement involves 24n^2 operations; the number of iterations may range
from 1 to 5.
Estimating the forward error involves solving a number of systems of linear equations A*x = b; the number
is usually 4 or 5 and never more than 11. Each solution requires approximately 8n^2 floating-point operations.
See Also
Matrix Storage Schemes
?trrfs
Estimates the error in the solution of a system of
linear equations with a triangular coefficient matrix.
Syntax
lapack_int LAPACKE_strrfs( int matrix_layout, char uplo, char trans, char diag,
lapack_int n, lapack_int nrhs, const float* a, lapack_int lda, const float* b,
lapack_int ldb, const float* x, lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_dtrrfs( int matrix_layout, char uplo, char trans, char diag,
lapack_int n, lapack_int nrhs, const double* a, lapack_int lda, const double* b,
lapack_int ldb, const double* x, lapack_int ldx, double* ferr, double* berr );
lapack_int LAPACKE_ctrrfs( int matrix_layout, char uplo, char trans, char diag,
lapack_int n, lapack_int nrhs, const lapack_complex_float* a, lapack_int lda, const
lapack_complex_float* b, lapack_int ldb, const lapack_complex_float* x, lapack_int ldx,
float* ferr, float* berr );
lapack_int LAPACKE_ztrrfs( int matrix_layout, char uplo, char trans, char diag,
lapack_int n, lapack_int nrhs, const lapack_complex_double* a, lapack_int lda, const
lapack_complex_double* b, lapack_int ldb, const lapack_complex_double* x, lapack_int
ldx, double* ferr, double* berr );
Include Files
• mkl.h
Description
The routine estimates the errors in the solution to a system of linear equations A*X = B or A^T*X = B or
A^H*X = B with a triangular matrix A, with multiple right-hand sides. For each computed solution vector x, the
routine computes the component-wise backward error β. This error is the smallest relative perturbation in
elements of A and b such that x is the exact solution of the perturbed system:
|δa_ij| ≤ β|a_ij|, |δb_i| ≤ β|b_i| such that (A + δA)x = (b + δb).
The routine also estimates the component-wise forward error in the computed solution
||x - x_e||∞ / ||x||∞ (here x_e is the exact solution).
Before calling this routine, call the solver routine ?trtrs.
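The forward error quantity defined above can be evaluated directly when a reference solution is available, for example in a test with a known answer. The sketch below does that; the names norm_inf and forward_error are hypothetical, and the routine itself only estimates this ratio because the exact solution is not known to it.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Infinity norm of a vector: the largest absolute value of its entries. */
static double norm_inf(size_t n, const double *v)
{
    double m = 0.0;
    for (size_t i = 0; i < n; ++i)
        if (fabs(v[i]) > m) m = fabs(v[i]);
    return m;
}

/* Relative forward error ||x - xe||_inf / ||x||_inf, where x is the computed
 * solution and xe a reference (exact) solution supplied by the caller.
 * Returns 0 if both norms are zero and HUGE_VAL if only ||x||_inf is zero. */
double forward_error(size_t n, const double *x, const double *xe)
{
    double diff = 0.0;
    assert(x != NULL && xe != NULL);
    for (size_t i = 0; i < n; ++i)
        if (fabs(x[i] - xe[i]) > diff) diff = fabs(x[i] - xe[i]);
    double nx = norm_inf(n, x);
    if (nx == 0.0) return diff == 0.0 ? 0.0 : HUGE_VAL;
    return diff / nx;
}
```

Comparing this value with the corresponding ferr entry on a problem with a known solution is a simple way to check that the reported bound is not smaller than the error actually observed.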
Input Parameters
a, b, x Arrays:
a (size max(1, lda*n)) contains the upper or lower triangular matrix
A, as specified by uplo.
b of size max(1, ldb*nrhs) for column major layout and max(1,
ldb*n) for row major layout contains the right-hand side matrix B.
x of size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout contains the solution matrix X.
ldb The leading dimension of b; ldb ≥ max(1, n) for column major layout
and ldb ≥ nrhs for row major layout.
ldx The leading dimension of x; ldx ≥ max(1, n) for column major layout
and ldx ≥ nrhs for row major layout.
Output Parameters
ferr, berr Arrays, size at least max(1, nrhs). Contain the component-wise
forward and backward errors, respectively, for each solution vector.
Return Values
This function returns a value info.
Application Notes
The bounds returned in ferr are not rigorous, but in practice they almost always overestimate the actual
error.
A call to this routine involves, for each right-hand side, solving a number of systems of linear equations A*x
= b; the number of systems is usually 4 or 5 and never more than 11. Each solution requires approximately
n^2 floating-point operations for real flavors or 4n^2 for complex flavors.
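The figure of about n^2 operations per triangular solve can be checked with a plain back substitution. The sketch below solves an upper triangular system and counts its floating-point operations; the function name back_substitute, the row-by-row full storage, and the non-unit diagonal are assumptions of this example, which is not how the library implements its solver.

```c
#include <assert.h>
#include <stddef.h>

/* Solves U*x = b for an upper triangular n-by-n matrix U stored row by row
 * (u[i*n + j]), overwriting b with x. Returns the number of floating-point
 * operations performed. Assumes a non-unit diagonal with no zero entries. */
size_t back_substitute(size_t n, const double *u, double *b)
{
    size_t flops = 0;
    assert(u != NULL && b != NULL);
    for (size_t i = n; i-- > 0; ) {          /* rows n-1 down to 0 */
        double sum = b[i];
        for (size_t j = i + 1; j < n; ++j) {
            sum -= u[i*n + j] * b[j];        /* one multiply, one subtract */
            flops += 2;
        }
        assert(u[i*n + i] != 0.0);
        b[i] = sum / u[i*n + i];             /* one divide */
        flops += 1;
    }
    return flops;
}
```

Row i contributes 2*(n-1-i) + 1 operations, so the total is n*(n-1) + n = n^2 for real data, in line with the estimate quoted above.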
See Also
Matrix Storage Schemes
?tprfs
Estimates the error in the solution of a system of
linear equations with a packed triangular coefficient
matrix.
Syntax
lapack_int LAPACKE_stprfs( int matrix_layout, char uplo, char trans, char diag,
lapack_int n, lapack_int nrhs, const float* ap, const float* b, lapack_int ldb, const
float* x, lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_dtprfs( int matrix_layout, char uplo, char trans, char diag,
lapack_int n, lapack_int nrhs, const double* ap, const double* b, lapack_int ldb, const
double* x, lapack_int ldx, double* ferr, double* berr );
lapack_int LAPACKE_ctprfs( int matrix_layout, char uplo, char trans, char diag,
lapack_int n, lapack_int nrhs, const lapack_complex_float* ap, const
lapack_complex_float* b, lapack_int ldb, const lapack_complex_float* x, lapack_int ldx,
float* ferr, float* berr );
lapack_int LAPACKE_ztprfs( int matrix_layout, char uplo, char trans, char diag,
lapack_int n, lapack_int nrhs, const lapack_complex_double* ap, const
lapack_complex_double* b, lapack_int ldb, const lapack_complex_double* x, lapack_int
ldx, double* ferr, double* berr );
Include Files
• mkl.h
Description
The routine estimates the errors in the solution to a system of linear equations A*X = B or A^T*X = B or
A^H*X = B with a packed triangular matrix A, with multiple right-hand sides. For each computed solution
vector x, the routine computes the component-wise backward error β. This error is the smallest relative
perturbation in elements of A and b such that x is the exact solution of the perturbed system:
|δa_ij| ≤ β|a_ij|, |δb_i| ≤ β|b_i| such that (A + δA)x = (b + δb).
The routine also estimates the component-wise forward error in the computed solution
||x - x_e||∞ / ||x||∞ (here x_e is the exact solution).
Before calling this routine, call the solver routine ?tptrs.
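The backward error defined above can be illustrated with a short sketch. The code below is not the library's implementation; it assumes real double-precision data, an upper triangular matrix packed by columns, and a single right-hand side. It evaluates the standard expression max_i |b - A*x|_i / (|A|*|x| + |b|)_i. The function names are hypothetical, and the safeguards a production routine applies to very small denominators are omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Absolute value helper, so the sketch needs no math library. */
static double abs_d(double v) { return v < 0.0 ? -v : v; }

/* Element (i, j), i <= j, of an upper triangular matrix packed by columns
   (0-based indices): column j starts at offset j*(j+1)/2. */
static double packed_upper(const double *ap, size_t i, size_t j)
{
    return ap[i + j * (j + 1) / 2];
}

/* Component-wise backward error of x for the upper triangular system A*x = b:
   the maximum over i of |b - A*x|_i / (|A|*|x| + |b|)_i.
   Rows whose denominator is exactly zero are skipped. */
double backward_error_packed_upper(size_t n, const double *ap,
                                   const double *b, const double *x)
{
    double berr = 0.0;
    assert(n == 0 || (ap != NULL && b != NULL && x != NULL));
    for (size_t i = 0; i < n; ++i) {
        double resid = b[i];
        double denom = abs_d(b[i]);
        for (size_t j = i; j < n; ++j) {
            double aij = packed_upper(ap, i, j);
            resid -= aij * x[j];
            denom += abs_d(aij) * abs_d(x[j]);
        }
        if (denom > 0.0) {
            double ratio = abs_d(resid) / denom;
            if (ratio > berr)
                berr = ratio;
        }
    }
    return berr;
}
```

A value near the machine precision indicates that x solves a system whose entries differ from A and b only at the rounding level.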
Input Parameters
ap, b, x Arrays:
ap, of size max(1, n(n + 1)/2), contains the upper or lower triangular matrix A,
as specified by uplo.
b, of size max(1, ldb*nrhs) for column major layout and max(1,
ldb*n) for row major layout, contains the right-hand side matrix B.
x, of size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout, contains the solution matrix X.
ldb The leading dimension of b; ldb ≥ max(1, n) for column major layout
and ldb ≥ nrhs for row major layout.
ldx The leading dimension of x; ldx ≥ max(1, n) for column major layout
and ldx ≥ nrhs for row major layout.
Output Parameters
ferr, berr Arrays, size at least max(1, nrhs). Contain the component-wise
forward and backward errors, respectively, for each solution vector.
Return Values
This function returns a value info.
Application Notes
The bounds returned in ferr are not rigorous, but in practice they almost always overestimate the actual
error.
A call to this routine involves, for each right-hand side, solving a number of systems of linear equations A*x
= b; the number of systems is usually 4 or 5 and never more than 11. Each solution requires approximately
n^2 floating-point operations for real flavors or 4n^2 for complex flavors.
See Also
Matrix Storage Schemes
?tbrfs
Estimates the error in the solution of a system of
linear equations with a triangular band coefficient
matrix.
Syntax
lapack_int LAPACKE_stbrfs( int matrix_layout, char uplo, char trans, char diag,
lapack_int n, lapack_int kd, lapack_int nrhs, const float* ab, lapack_int ldab, const
float* b, lapack_int ldb, const float* x, lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_dtbrfs( int matrix_layout, char uplo, char trans, char diag,
lapack_int n, lapack_int kd, lapack_int nrhs, const double* ab, lapack_int ldab, const
double* b, lapack_int ldb, const double* x, lapack_int ldx, double* ferr, double*
berr );
lapack_int LAPACKE_ctbrfs( int matrix_layout, char uplo, char trans, char diag,
lapack_int n, lapack_int kd, lapack_int nrhs, const lapack_complex_float* ab,
lapack_int ldab, const lapack_complex_float* b, lapack_int ldb, const
lapack_complex_float* x, lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_ztbrfs( int matrix_layout, char uplo, char trans, char diag,
lapack_int n, lapack_int kd, lapack_int nrhs, const lapack_complex_double* ab,
lapack_int ldab, const lapack_complex_double* b, lapack_int ldb, const
lapack_complex_double* x, lapack_int ldx, double* ferr, double* berr );
Include Files
• mkl.h
Description
The routine estimates the errors in the solution to a system of linear equations A*X = B, A^T*X = B, or
A^H*X = B with a triangular band matrix A and multiple right-hand sides. For each computed solution vector
x, the routine computes the component-wise backward error β. This error is the smallest relative
perturbation in elements of A and b such that x is the exact solution of the perturbed system:
|δa_ij| ≤ β|a_ij|, |δb_i| ≤ β|b_i| such that (A + δA)x = (b + δb).
The routine also estimates the component-wise forward error in the computed solution,
||x - x_e||∞ / ||x||∞ (here x_e is the exact solution).
Before calling this routine, call the solver routine ?tbtrs.
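For reference, the forward error ratio quoted above can be evaluated directly when an exact solution is available, for example in a test with a manufactured solution. The sketch below assumes real double-precision vectors and uses hypothetical function names. The routine itself can only estimate this quantity, because the exact solution is normally unknown.

```c
#include <assert.h>
#include <stddef.h>

/* Absolute value helper, so the sketch needs no math library. */
static double abs_d(double v) { return v < 0.0 ? -v : v; }

/* Infinity norm: the largest absolute value among the n entries of v. */
static double norm_inf(size_t n, const double *v)
{
    double big = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double a = abs_d(v[i]);
        if (a > big)
            big = a;
    }
    return big;
}

/* Relative forward error ||x - xe||_inf / ||x||_inf, where x is the computed
   solution and xe the exact one. The computed solution must be nonzero. */
double forward_error(size_t n, const double *x, const double *xe)
{
    double diff = 0.0;
    double xnorm = norm_inf(n, x);
    assert(xnorm > 0.0);
    for (size_t i = 0; i < n; ++i) {
        double d = abs_d(x[i] - xe[i]);
        if (d > diff)
            diff = d;
    }
    return diff / xnorm;
}
```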
Input Parameters
nrhs The number of right-hand sides; nrhs ≥ 0.
ab, b, x Arrays:
ab, of size max(1, ldab*n), contains the upper or lower triangular matrix
A, as specified by uplo, in band storage format.
b, of size max(1, ldb*nrhs) for column major layout and max(1,
ldb*n) for row major layout, contains the right-hand side matrix B.
x, of size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout, contains the solution matrix X.
ldb The leading dimension of b; ldb ≥ max(1, n) for column major layout
and ldb ≥ nrhs for row major layout.
ldx The leading dimension of x; ldx ≥ max(1, n) for column major layout
and ldx ≥ nrhs for row major layout.
Output Parameters
ferr, berr Arrays, size at least max(1, nrhs). Contain the component-wise
forward and backward errors, respectively, for each solution vector.
Return Values
This function returns a value info.
Application Notes
The bounds returned in ferr are not rigorous, but in practice they almost always overestimate the actual
error.
A call to this routine involves, for each right-hand side, solving a number of systems of linear equations A*x
= b; the number of systems is usually 4 or 5 and never more than 11. Each solution requires approximately
2n*kd floating-point operations for real flavors or 8n*kd operations for complex flavors.
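The operation count above follows from the band structure: each row of a triangular band solve touches at most kd off-diagonal elements. The sketch below is not the library's solver. It assumes real double-precision data, an upper triangular band matrix with a non-unit diagonal, and column-major LAPACK band storage, in which A(i, j) is held at ab[kd + i - j + j*ldab] (0-based). The function name is hypothetical. It performs one back substitution and counts the operations it executes.

```c
#include <assert.h>
#include <stddef.h>

/* Solves A*x = b for a non-unit upper triangular band matrix A with kd
   superdiagonals, stored column by column in band format: A(i, j) is at
   ab[kd + i - j + j*ldab] for max(0, j - kd) <= i <= j (0-based).
   On entry x holds b; on exit it holds the solution.
   Returns the number of floating-point operations performed. */
size_t band_upper_solve(size_t n, size_t kd, const double *ab, size_t ldab,
                        double *x)
{
    size_t flops = 0;
    assert(ldab >= kd + 1);
    for (size_t i = n; i-- > 0;) {
        /* Last column with a stored entry in row i. */
        size_t last = (i + kd < n - 1) ? i + kd : n - 1;
        double sum = x[i];
        for (size_t j = i + 1; j <= last; ++j) {
            sum -= ab[kd + i - j + j * ldab] * x[j];
            flops += 2; /* one multiplication and one subtraction */
        }
        x[i] = sum / ab[kd + i * ldab]; /* diagonal element A(i, i) */
        flops += 1;
    }
    return flops;
}
```

For n much larger than kd, the returned count approaches 2n*kd plus n divisions, in line with the estimate for real flavors.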
See Also
Matrix Storage Schemes
?getri
Computes the inverse of an LU-factored general
matrix.
Syntax
lapack_int LAPACKE_sgetri (int matrix_layout , lapack_int n , float * a , lapack_int
lda , const lapack_int * ipiv );
lapack_int LAPACKE_dgetri (int matrix_layout , lapack_int n , double * a , lapack_int
lda , const lapack_int * ipiv );
lapack_int LAPACKE_cgetri (int matrix_layout , lapack_int n , lapack_complex_float *
a , lapack_int lda , const lapack_int * ipiv );
lapack_int LAPACKE_zgetri (int matrix_layout , lapack_int n , lapack_complex_double *
a , lapack_int lda , const lapack_int * ipiv );
Include Files
• mkl.h
Description
The routine computes the inverse inv(A) of a general matrix A. Before calling this routine, call ?getrf to
factorize A.
Input Parameters
Output Parameters
Return Values
This function returns a value info.
Application Notes
The computed inverse X satisfies the following error bound:
|XA - I| ≤ c(n)ε|X|P|L||U|,
where c(n) is a modest linear function of n; ε is the machine precision; I denotes the identity matrix; P, L,
and U are the factors of the matrix factorization A = P*L*U.
The total number of floating-point operations is approximately (4/3)n^3 for real flavors and (16/3)n^3 for
complex flavors.
See Also
Matrix Storage Schemes
mkl_?getrinp
Computes the inverse of an LU-factored general
matrix without pivoting.
Syntax
lapack_int LAPACKE_mkl_sgetrinp (int matrix_layout , lapack_int n , float * a ,
lapack_int lda );
lapack_int LAPACKE_mkl_dgetrinp (int matrix_layout , lapack_int n , double * a ,
lapack_int lda );
lapack_int LAPACKE_mkl_cgetrinp (int matrix_layout , lapack_int n ,
lapack_complex_float * a , lapack_int lda );
lapack_int LAPACKE_mkl_zgetrinp (int matrix_layout , lapack_int n ,
lapack_complex_double * a , lapack_int lda );
Include Files
• mkl.h
Description
The routine computes the inverse inv(A) of a general matrix A. Before calling this routine, call
mkl_?getrfnp to factorize A.
Input Parameters
Output Parameters
Return Values
This function returns a value info.
If info = i, the i-th diagonal element of the factor U is zero, U is singular, and the inversion could not be
completed.
Application Notes
The total number of floating-point operations is approximately (4/3)n^3 for real flavors and (16/3)n^3 for
complex flavors.
See Also
Matrix Storage Schemes
?potri
Computes the inverse of a symmetric (Hermitian)
positive-definite matrix using the Cholesky
factorization.
Syntax
lapack_int LAPACKE_spotri (int matrix_layout , char uplo , lapack_int n , float * a ,
lapack_int lda );
lapack_int LAPACKE_dpotri (int matrix_layout , char uplo , lapack_int n , double * a ,
lapack_int lda );
lapack_int LAPACKE_cpotri (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * a , lapack_int lda );
lapack_int LAPACKE_zpotri (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * a , lapack_int lda );
Include Files
• mkl.h
Description
The routine computes the inverse inv(A) of a symmetric positive definite or, for complex flavors, Hermitian
positive-definite matrix A. Before calling this routine, call ?potrf to factorize A.
Input Parameters
Output Parameters
Return Values
This function returns a value info.
If info = i, the i-th diagonal element of the Cholesky factor (and therefore the factor itself) is zero, and the
inversion could not be completed.
Application Notes
The computed inverse X satisfies the following error bounds:
The 2-norm ||A||_2 of a matrix A is defined by ||A||_2 = max_{x·x=1} (Ax·Ax)^(1/2), and the condition number
κ_2(A) is defined by κ_2(A) = ||A||_2 ||A^-1||_2.
The total number of floating-point operations is approximately (2/3)n^3 for real flavors and (8/3)n^3 for
complex flavors.
See Also
Matrix Storage Schemes
?pftri
Computes the inverse of a symmetric (Hermitian)
positive-definite matrix in RFP format using the
Cholesky factorization.
Syntax
lapack_int LAPACKE_spftri (int matrix_layout , char transr , char uplo , lapack_int n ,
float * a );
lapack_int LAPACKE_dpftri (int matrix_layout , char transr , char uplo , lapack_int n ,
double * a );
lapack_int LAPACKE_cpftri (int matrix_layout , char transr , char uplo , lapack_int n ,
lapack_complex_float * a );
lapack_int LAPACKE_zpftri (int matrix_layout , char transr , char uplo , lapack_int n ,
lapack_complex_double * a );
Include Files
• mkl.h
Description
The routine computes the inverse inv(A) of a symmetric positive definite or, for complex data, Hermitian
positive-definite matrix A using the Cholesky factorization:
The matrix A is in the Rectangular Full Packed (RFP) format. For the description of the RFP format, see Matrix
Storage Schemes.
Input Parameters
transr Must be 'N', 'T' (for real data) or 'C' (for complex data).
If transr = 'N', the Normal form of RFP U (if uplo = 'U') or L (if
uplo = 'L') is stored.
If transr = 'T', the Transpose form of RFP U (if uplo = 'U') or L
(if uplo = 'L') is stored.
Output Parameters
Return Values
This function returns a value info.
If info = i, the (i,i) element of the factor U or L is zero, and the inverse could not be computed.
See Also
Matrix Storage Schemes
?pptri
Computes the inverse of a packed symmetric
(Hermitian) positive-definite matrix using Cholesky
factorization.
Syntax
lapack_int LAPACKE_spptri (int matrix_layout , char uplo , lapack_int n , float * ap );
lapack_int LAPACKE_dpptri (int matrix_layout , char uplo , lapack_int n , double *
ap );
lapack_int LAPACKE_cpptri (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * ap );
lapack_int LAPACKE_zpptri (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * ap );
Include Files
• mkl.h
Description
The routine computes the inverse inv(A) of a symmetric positive definite or, for complex flavors, Hermitian
positive-definite matrix A in packed form. Before calling this routine, call ?pptrf to factorize A.
Input Parameters
Output Parameters
Return Values
This function returns a value info.
Application Notes
The computed inverse X satisfies the following error bounds:
The 2-norm ||A||_2 of a matrix A is defined by ||A||_2 = max_{x·x=1} (Ax·Ax)^(1/2), and the condition number
κ_2(A) is defined by κ_2(A) = ||A||_2 ||A^-1||_2.
The total number of floating-point operations is approximately (2/3)n^3 for real flavors and (8/3)n^3 for
complex flavors.
See Also
Matrix Storage Schemes
?sytri
Computes the inverse of a symmetric matrix using
U*D*U^T or L*D*L^T Bunch-Kaufman factorization.
Syntax
lapack_int LAPACKE_ssytri (int matrix_layout , char uplo , lapack_int n , float * a ,
lapack_int lda , const lapack_int * ipiv );
lapack_int LAPACKE_dsytri (int matrix_layout , char uplo , lapack_int n , double * a ,
lapack_int lda , const lapack_int * ipiv );
lapack_int LAPACKE_csytri (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * a , lapack_int lda , const lapack_int * ipiv );
lapack_int LAPACKE_zsytri (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * a , lapack_int lda , const lapack_int * ipiv );
Include Files
• mkl.h
Description
The routine computes the inverse inv(A) of a symmetric matrix A. Before calling this routine, call ?sytrf to
factorize A.
Input Parameters
a Array a (size max(1, lda*n)) contains the factorization of the matrix A, as
returned by ?sytrf.
Output Parameters
Return Values
This function returns a value info.
Application Notes
The computed inverse X satisfies the following error bounds:
See Also
Matrix Storage Schemes
?hetri
Computes the inverse of a complex Hermitian matrix
using U*D*U^H or L*D*L^H Bunch-Kaufman
factorization.
Syntax
lapack_int LAPACKE_chetri (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * a , lapack_int lda , const lapack_int * ipiv );
lapack_int LAPACKE_zhetri (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * a , lapack_int lda , const lapack_int * ipiv );
Include Files
• mkl.h
Description
The routine computes the inverse inv(A) of a complex Hermitian matrix A. Before calling this routine,
call ?hetrf to factorize A.
Input Parameters
ipiv Array, size at least max(1, n). The ipiv array, as returned by ?hetrf.
Output Parameters
Return Values
This function returns a value info.
Application Notes
The computed inverse X satisfies the following error bounds:
See Also
Matrix Storage Schemes
?sytri2
Computes the inverse of a symmetric indefinite matrix
through allocating memory and calling ?sytri2x.
Syntax
lapack_int LAPACKE_ssytri2 (int matrix_layout , char uplo , lapack_int n , float * a ,
lapack_int lda , const lapack_int * ipiv );
lapack_int LAPACKE_dsytri2 (int matrix_layout , char uplo , lapack_int n , double * a ,
lapack_int lda , const lapack_int * ipiv );
lapack_int LAPACKE_csytri2 (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * a , lapack_int lda , const lapack_int * ipiv );
lapack_int LAPACKE_zsytri2 (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * a , lapack_int lda , const lapack_int * ipiv );
Include Files
• mkl.h
Description
The routine computes the inverse inv(A) of a symmetric indefinite matrix A using the factorization A =
U*D*U^T or A = L*D*L^T computed by ?sytrf.
The ?sytri2 routine allocates a temporary buffer before calling ?sytri2x that actually computes the
inverse.
Input Parameters
a Array a(size max(1, lda*n)) contains the block diagonal matrix D and
the multipliers used to obtain the factor U or L as returned by ?sytrf.
Output Parameters
If uplo = 'U', the upper triangular part of the inverse is formed and
the part of A below the diagonal is not referenced.
If uplo = 'L', the lower triangular part of the inverse is formed and
the part of A above the diagonal is not referenced.
Return Values
This function returns a value info.
See Also
?sytrf
?sytri2x
Matrix Storage Schemes
?hetri2
Computes the inverse of a Hermitian indefinite matrix
through allocating memory and calling ?hetri2x.
Syntax
lapack_int LAPACKE_chetri2 (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * a , lapack_int lda , const lapack_int * ipiv );
lapack_int LAPACKE_zhetri2 (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * a , lapack_int lda , const lapack_int * ipiv );
Include Files
• mkl.h
Description
The routine computes the inverse inv(A) of a Hermitian indefinite matrix A using the factorization A =
U*D*U^H or A = L*D*L^H computed by ?hetrf.
The ?hetri2 routine allocates a temporary buffer before calling ?hetri2x that actually computes the
inverse.
Input Parameters
a Array a (size max(1, lda*n)) contains the block diagonal matrix D and
the multipliers used to obtain the factor U or L as returned by ?hetrf.
lda The leading dimension of a; lda≥ max(1, n).
Output Parameters
If uplo = 'U', the upper triangular part of the inverse is formed and
the part of A below the diagonal is not referenced.
If uplo = 'L', the lower triangular part of the inverse is formed and
the part of A above the diagonal is not referenced.
Return Values
This function returns a value info.
See Also
?hetrf
?hetri2x
Matrix Storage Schemes
?sytri2x
Computes the inverse of a symmetric indefinite matrix
after ?sytri2 allocates memory.
Syntax
lapack_int LAPACKE_ssytri2x (int matrix_layout , char uplo , lapack_int n , float * a ,
lapack_int lda , const lapack_int * ipiv , lapack_int nb );
lapack_int LAPACKE_dsytri2x (int matrix_layout , char uplo , lapack_int n , double *
a , lapack_int lda , const lapack_int * ipiv , lapack_int nb );
lapack_int LAPACKE_csytri2x (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * a , lapack_int lda , const lapack_int * ipiv , lapack_int nb );
lapack_int LAPACKE_zsytri2x (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * a , lapack_int lda , const lapack_int * ipiv , lapack_int nb );
Include Files
• mkl.h
Description
The routine computes the inverse inv(A) of a symmetric indefinite matrix A using the factorization A =
U*D*U^T or A = L*D*L^T computed by ?sytrf.
The ?sytri2x routine performs the actual inversion. The ?sytri2 routine allocates the required memory and
then calls ?sytri2x.
Input Parameters
nb Block size.
Output Parameters
If uplo = 'U', the upper triangular part of the inverse is formed and
the part of A below the diagonal is not referenced.
If uplo = 'L', the lower triangular part of the inverse is formed and
the part of A above the diagonal is not referenced.
Return Values
This function returns a value info.
See Also
?sytrf
?sytri2
Matrix Storage Schemes
?hetri2x
Computes the inverse of a Hermitian indefinite matrix
after ?hetri2 allocates memory.
Syntax
lapack_int LAPACKE_chetri2x (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * a , lapack_int lda , const lapack_int * ipiv , lapack_int nb );
lapack_int LAPACKE_zhetri2x (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * a , lapack_int lda , const lapack_int * ipiv , lapack_int nb );
Include Files
• mkl.h
Description
The routine computes the inverse inv(A) of a Hermitian indefinite matrix A using the factorization A =
U*D*U^H or A = L*D*L^H computed by ?hetrf.
The ?hetri2x routine performs the actual inversion. The ?hetri2 routine allocates the required memory and
then calls ?hetri2x.
Input Parameters
nb Block size.
Output Parameters
If uplo = 'U', the upper triangular part of the inverse is formed and
the part of A below the diagonal is not referenced.
If uplo = 'L', the lower triangular part of the inverse is formed and
the part of A above the diagonal is not referenced.
Return Values
This function returns a value info.
See Also
?hetrf
?hetri2
Matrix Storage Schemes
?sytri_3
Computes the inverse of a real or complex symmetric
matrix.
Syntax
lapack_int LAPACKE_ssytri_3 (int matrix_layout, char uplo, lapack_int n, float * A,
lapack_int lda, const float * e, const lapack_int * ipiv);
lapack_int LAPACKE_dsytri_3 (int matrix_layout, char uplo, lapack_int n, double * A,
lapack_int lda, const double * e, const lapack_int * ipiv);
lapack_int LAPACKE_csytri_3 (int matrix_layout, char uplo, lapack_int n,
lapack_complex_float * A, lapack_int lda, const lapack_complex_float * e, const
lapack_int * ipiv);
lapack_int LAPACKE_zsytri_3 (int matrix_layout, char uplo, lapack_int n,
lapack_complex_double * A, lapack_int lda, const lapack_complex_double * e, const
lapack_int * ipiv);
Description
?sytri_3 computes the inverse of a real or complex symmetric matrix A using the factorization computed
by ?sytrf_rk: A = P*U*D*(U^T)*(P^T) or A = P*L*D*(L^T)*(P^T), where U (or L) is a unit upper (or lower)
triangular matrix, U^T (or L^T) is the transpose of U (or L), P is a permutation matrix, P^T is the transpose of P,
and D is symmetric and block diagonal with 1-by-1 and 2-by-2 diagonal blocks.
?sytri_3 sets the leading dimension of the workspace before calling ?sytri_3x, which actually computes
the inverse. This is the blocked version of the algorithm, calling Level-3 BLAS.
Input Parameters
uplo Specifies whether the details of the factorization are stored as an upper or
lower triangular matrix.
—and—
• If uplo = 'U', factor U in the superdiagonal part of A. If uplo = 'L',
factor L in the subdiagonal part of A.
ipiv Array of size n. Details of the interchanges and the block structure of D as
determined by ?sytrf_rk.
Output Parameters
Return Values
This function returns a value info.
= 0: Successful exit.
< 0: If info = -i, the ith argument had an illegal value.
> 0: If info = i, D(i,i) = 0; the matrix is singular and its inverse could not be computed.
?hetri_3
Computes the inverse of a complex Hermitian matrix
using the factorization computed by ?hetrf_rk.
Syntax
lapack_int LAPACKE_chetri_3 (int matrix_layout, char uplo, lapack_int n,
lapack_complex_float * A, lapack_int lda, const lapack_complex_float * e, const
lapack_int * ipiv);
lapack_int LAPACKE_zhetri_3 (int matrix_layout, char uplo, lapack_int n,
lapack_complex_double * A, lapack_int lda, const lapack_complex_double * e, const
lapack_int * ipiv);
Description
?hetri_3 computes the inverse of a complex Hermitian matrix A using the factorization computed
by ?hetrf_rk: A = P*U*D*(U^H)*(P^T) or A = P*L*D*(L^H)*(P^T), where U (or L) is a unit upper (or lower)
triangular matrix, U^H (or L^H) is the conjugate transpose of U (or L), P is a permutation matrix, P^T is the
transpose of P, and D is Hermitian and block diagonal with 1-by-1 and 2-by-2 diagonal blocks.
?hetri_3 sets the leading dimension of the workspace before calling ?hetri_3x, which actually computes
the inverse.
Input Parameters
uplo Specifies whether the details of the factorization are stored as an upper or
lower triangular matrix.
ipiv Array of size n. Details of the interchanges and the block structure of D as
determined by ?hetrf_rk.
Output Parameters
Return Values
This function returns a value info.
= 0: Successful exit.
< 0: If info = -i, the ith argument had an illegal value.
> 0: If info = i, D(i,i) = 0; the matrix is singular and its inverse could not be computed.
?sptri
Computes the inverse of a symmetric matrix using
U*D*U^T or L*D*L^T Bunch-Kaufman factorization of
matrix in packed storage.
Syntax
lapack_int LAPACKE_ssptri (int matrix_layout , char uplo , lapack_int n , float * ap ,
const lapack_int * ipiv );
lapack_int LAPACKE_dsptri (int matrix_layout , char uplo , lapack_int n , double * ap ,
const lapack_int * ipiv );
lapack_int LAPACKE_csptri (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * ap , const lapack_int * ipiv );
lapack_int LAPACKE_zsptri (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * ap , const lapack_int * ipiv );
Include Files
• mkl.h
Description
The routine computes the inverse inv(A) of a packed symmetric matrix A. Before calling this routine,
call ?sptrf to factorize A.
Input Parameters
ipiv Array, size at least max(1, n). The ipiv array, as returned by ?sptrf.
Output Parameters
Return Values
This function returns a value info.
Application Notes
The computed inverse X satisfies the following error bounds:
See Also
Matrix Storage Schemes
?hptri
Computes the inverse of a complex Hermitian matrix
using U*D*U^H or L*D*L^H Bunch-Kaufman factorization
of matrix in packed storage.
Syntax
lapack_int LAPACKE_chptri (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * ap , const lapack_int * ipiv );
lapack_int LAPACKE_zhptri (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * ap , const lapack_int * ipiv );
Include Files
• mkl.h
Description
The routine computes the inverse inv(A) of a complex Hermitian matrix A using packed storage. Before
calling this routine, call ?hptrf to factorize A.
Input Parameters
ap Array ap (size max(1,n(n+1)/2)) contains the factorization of the
matrix A, as returned by ?hptrf.
Output Parameters
Return Values
This function returns a value info.
If info = i, the i-th diagonal element of D is zero, D is singular, and the inversion could not be completed.
Application Notes
The computed inverse X satisfies the following error bounds:
See Also
Matrix Storage Schemes
?trtri
Computes the inverse of a triangular matrix.
Syntax
lapack_int LAPACKE_strtri (int matrix_layout , char uplo , char diag , lapack_int n ,
float * a , lapack_int lda );
lapack_int LAPACKE_dtrtri (int matrix_layout , char uplo , char diag , lapack_int n ,
double * a , lapack_int lda );
lapack_int LAPACKE_ctrtri (int matrix_layout , char uplo , char diag , lapack_int n ,
lapack_complex_float * a , lapack_int lda );
lapack_int LAPACKE_ztrtri (int matrix_layout , char uplo , char diag , lapack_int n ,
lapack_complex_double * a , lapack_int lda );
Include Files
• mkl.h
Description
Input Parameters
Output Parameters
Return Values
This function returns a value info.
Application Notes
The computed inverse X satisfies the following error bounds:
The total number of floating-point operations is approximately (1/3)n^3 for real flavors and (4/3)n^3 for
complex flavors.
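To show where an operation count of about (1/3)n^3 comes from, the following sketch inverts an upper triangular matrix in place, one column at a time. It is an unblocked illustration, not the library's algorithm. It assumes real double-precision data, a non-unit diagonal, and row-major storage with element (i, j) at a[i*lda + j]. The function name and its return convention are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Inverts a non-unit upper triangular matrix in place. The matrix is stored
   row-major: element (i, j) is a[i*lda + j]. Entries below the diagonal are
   not referenced. Returns 0 on success, or i (1-based) if the i-th diagonal
   element is exactly zero, in which case a is left unchanged. */
int invert_upper_triangular(size_t n, double *a, size_t lda)
{
    assert(lda >= n);
    for (size_t i = 0; i < n; ++i) {
        if (a[i * lda + i] == 0.0)
            return (int)(i + 1);
    }
    for (size_t j = 0; j < n; ++j) {
        a[j * lda + j] = 1.0 / a[j * lda + j];
        double neg_ajj = -a[j * lda + j];
        /* Column j above the diagonal: multiply it by the already inverted
           leading j-by-j block, then scale by -1/A(j, j). Rows are processed
           top to bottom, so every entry read is still the original one. */
        for (size_t i = 0; i < j; ++i) {
            double sum = 0.0;
            for (size_t k = i; k < j; ++k)
                sum += a[i * lda + k] * a[k * lda + j];
            a[i * lda + j] = sum * neg_ajj;
        }
    }
    return 0;
}
```

The innermost loop runs about n^3/6 times with one multiplication and one addition per pass, which gives the (1/3)n^3 figure for real data.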
See Also
Matrix Storage Schemes
?tftri
Computes the inverse of a triangular matrix stored in
the Rectangular Full Packed (RFP) format.
Syntax
lapack_int LAPACKE_stftri (int matrix_layout , char transr , char uplo , char diag ,
lapack_int n , float * a );
lapack_int LAPACKE_dtftri (int matrix_layout , char transr , char uplo , char diag ,
lapack_int n , double * a );
lapack_int LAPACKE_ctftri (int matrix_layout , char transr , char uplo , char diag ,
lapack_int n , lapack_complex_float * a );
lapack_int LAPACKE_ztftri (int matrix_layout , char transr , char uplo , char diag ,
lapack_int n , lapack_complex_double * a );
Include Files
• mkl.h
Description
Computes the inverse of a triangular matrix A stored in the Rectangular Full Packed (RFP) format. For the
description of the RFP format, see Matrix Storage Schemes.
This is the block version of the algorithm, calling Level 3 BLAS.
Input Parameters
transr Must be 'N', 'T' (for real data) or 'C' (for complex data).
a Array, size max(1, n*(n + 1)/2). The array a contains the matrix A in
the RFP format.
Output Parameters
Return Values
This function returns a value info.
If info = i, A(i,i) is exactly zero. The triangular matrix is singular and its inverse cannot be computed.
See Also
Matrix Storage Schemes
?tptri
Computes the inverse of a triangular matrix using
packed storage.
Syntax
lapack_int LAPACKE_stptri (int matrix_layout , char uplo , char diag , lapack_int n ,
float * ap );
lapack_int LAPACKE_dtptri (int matrix_layout , char uplo , char diag , lapack_int n ,
double * ap );
lapack_int LAPACKE_ctptri (int matrix_layout , char uplo , char diag , lapack_int n ,
lapack_complex_float * ap );
lapack_int LAPACKE_ztptri (int matrix_layout , char uplo , char diag , lapack_int n ,
lapack_complex_double * ap );
Include Files
• mkl.h
Description
Input Parameters
diag Must be 'N' or 'U'.
Output Parameters
Return Values
This function returns a value info.
Application Notes
The computed inverse X satisfies the following error bounds:
The total number of floating-point operations is approximately (1/3)n^3 for real flavors and (4/3)n^3 for
complex flavors.
See Also
Matrix Storage Schemes
?geequ
Computes row and column scaling factors intended to
equilibrate a general matrix and reduce its condition
number.
Syntax
lapack_int LAPACKE_sgeequ( int matrix_layout, lapack_int m, lapack_int n, const float*
a, lapack_int lda, float* r, float* c, float* rowcnd, float* colcnd, float* amax );
lapack_int LAPACKE_dgeequ( int matrix_layout, lapack_int m, lapack_int n, const double*
a, lapack_int lda, double* r, double* c, double* rowcnd, double* colcnd, double* amax );
Include Files
• mkl.h
Description
The routine computes row and column scalings intended to equilibrate an m-by-n matrix A and reduce its
condition number. The output array r returns the row scale factors and the array c the column scale factors.
These factors are chosen to try to make the largest element in each row and column of the matrix B with
elements b_ij = r[i-1]*a_ij*c[j-1] have absolute value 1.
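One way to obtain factors with this property is to scale rows first and columns second. The sketch below illustrates that idea under stated assumptions: real double-precision data, row-major storage with element (i, j) at a[i*lda + j], and no clamping of the factors to the SMLNUM and BIGNUM limits. The function name and return convention are hypothetical, and the code is not the library's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Absolute value helper, so the sketch needs no math library. */
static double abs_d(double v) { return v < 0.0 ? -v : v; }

/* Computes row factors r[0..m-1] and column factors c[0..n-1] so that the
   largest entry of r[i]*a(i, j)*c[j] in every row and column has absolute
   value 1. The matrix is row-major: a(i, j) is a[i*lda + j].
   Returns 0 on success, or -1 if a row or column is entirely zero. */
int equilibration_factors(size_t m, size_t n, const double *a, size_t lda,
                          double *r, double *c)
{
    assert(lda >= n);
    /* Row factors: reciprocal of the largest absolute value in each row. */
    for (size_t i = 0; i < m; ++i) {
        double big = 0.0;
        for (size_t j = 0; j < n; ++j) {
            double v = abs_d(a[i * lda + j]);
            if (v > big)
                big = v;
        }
        if (big == 0.0)
            return -1;
        r[i] = 1.0 / big;
    }
    /* Column factors: reciprocal of the largest row-scaled entry per column. */
    for (size_t j = 0; j < n; ++j) {
        double big = 0.0;
        for (size_t i = 0; i < m; ++i) {
            double v = r[i] * abs_d(a[i * lda + j]);
            if (v > big)
                big = v;
        }
        if (big == 0.0)
            return -1;
        c[j] = 1.0 / big;
    }
    return 0;
}
```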
Input Parameters
a Array: size max(1, lda*n) for column major layout and max(1,
lda*m) for row major layout.
Contains the m-by-n matrix A whose equilibration factors are to be
computed.
Output Parameters
colcnd If info = 0, colcnd contains the ratio of the smallest c[i] to the
largest c[i].
Return Values
This function returns a value info.
If info = -i, parameter i had an illegal value.
Application Notes
All the components of r and c are restricted to be between SMLNUM = smallest safe number and BIGNUM=
largest safe number. Use of these scaling factors is not guaranteed to reduce the condition number of A but
works well in practice.
SMLNUM and BIGNUM are parameters representing machine precision. You can use the ?lamch routines to
compute them. For example, compute single precision values of SMLNUM and BIGNUM as follows:
SMLNUM = slamch ('s')
BIGNUM = 1 / SMLNUM
If amax is very close to SMLNUM or very close to BIGNUM, the matrix A should be scaled.
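The limits discussed in these notes can also be approximated in plain C without calling the library. The sketch below assumes IEEE 754 single precision, where the safe minimum is expected to coincide with FLT_MIN from float.h; this is an assumption of the example, not a statement about what ?lamch returns on every platform. The text does not quantify "very close", so the margin parameter and the function names are hypothetical.

```c
#include <assert.h>
#include <float.h>

/* Smallest safe number for single precision.
   Assumption: IEEE 754 arithmetic, where it equals FLT_MIN. */
float smlnum_single(void) { return FLT_MIN; }

/* Largest safe number, defined as the reciprocal of the smallest one. */
float bignum_single(void) { return 1.0f / FLT_MIN; }

/* Returns 1 if amax lies within a factor `margin` of either limit, which
   suggests scaling the matrix; returns 0 otherwise. The margin is a
   caller-chosen threshold (hypothetical; the text gives no value). */
int amax_suggests_scaling(float amax, float margin)
{
    assert(margin >= 1.0f);
    return amax < smlnum_single() * margin || amax > bignum_single() / margin;
}
```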
See Also
Error Analysis
Matrix Storage Schemes
?geequb
Computes row and column scaling factors restricted to
a power of radix to equilibrate a general matrix and
reduce its condition number.
Syntax
lapack_int LAPACKE_sgeequb( int matrix_layout, lapack_int m, lapack_int n, const float*
a, lapack_int lda, float* r, float* c, float* rowcnd, float* colcnd, float* amax );
lapack_int LAPACKE_dgeequb( int matrix_layout, lapack_int m, lapack_int n, const
double* a, lapack_int lda, double* r, double* c, double* rowcnd, double* colcnd, double*
amax );
lapack_int LAPACKE_cgeequb( int matrix_layout, lapack_int m, lapack_int n, const
lapack_complex_float* a, lapack_int lda, float* r, float* c, float* rowcnd, float*
colcnd, float* amax );
lapack_int LAPACKE_zgeequb( int matrix_layout, lapack_int m, lapack_int n, const
lapack_complex_double* a, lapack_int lda, double* r, double* c, double* rowcnd, double*
colcnd, double* amax );
Include Files
• mkl.h
Description
The routine computes row and column scalings intended to equilibrate an m-by-n general matrix A and
reduce its condition number. The output array r returns the row scale factors and the array c returns the
column scale factors. These factors are chosen to try to make the largest element in each row and column of the
matrix B with elements b_ij = r[i-1]*a_ij*c[j-1] have an absolute value of at most the radix.
r[i-1] and c[j-1] are restricted to be a power of the radix between SMLNUM = smallest safe number and
BIGNUM = largest safe number. Use of these scaling factors is not guaranteed to reduce the condition number
of A but works well in practice.
SMLNUM and BIGNUM are parameters representing machine precision. You can use the ?lamch routines to
compute them. For example, compute single precision values of SMLNUM and BIGNUM as follows:
Input Parameters
a Array: size max(1, lda*n) for column major layout and max(1,
lda*m) for row major layout.
Contains the m-by-n matrix A whose equilibration factors are to be
computed.
Output Parameters
r, c Arrays: r (size m), c (size n).
If info = 0, or info > m, the array r contains the row scale factors for
the matrix A.
If info = 0, the array c contains the column scale factors for the
matrix A.
colcnd If info = 0, colcnd contains the ratio of the smallest c[i] to the
largest c[i]. If colcnd ≥ 0.1, it is not worth scaling by c.
amax Absolute value of the largest element of the matrix A. If amax is very
close to SMLNUM or very close to BIGNUM, the matrix should be scaled.
Return Values
This function returns a value info.
If info = 0, the execution is successful.
See Also
Error Analysis
Matrix Storage Schemes
?gbequ
Computes row and column scaling factors intended to
equilibrate a banded matrix and reduce its condition
number.
Syntax
lapack_int LAPACKE_sgbequ( int matrix_layout, lapack_int m, lapack_int n, lapack_int
kl, lapack_int ku, const float* ab, lapack_int ldab, float* r, float* c, float* rowcnd,
float* colcnd, float* amax );
lapack_int LAPACKE_dgbequ( int matrix_layout, lapack_int m, lapack_int n, lapack_int
kl, lapack_int ku, const double* ab, lapack_int ldab, double* r, double* c, double*
rowcnd, double* colcnd, double* amax );
lapack_int LAPACKE_cgbequ( int matrix_layout, lapack_int m, lapack_int n, lapack_int
kl, lapack_int ku, const lapack_complex_float* ab, lapack_int ldab, float* r, float* c,
float* rowcnd, float* colcnd, float* amax );
lapack_int LAPACKE_zgbequ( int matrix_layout, lapack_int m, lapack_int n, lapack_int
kl, lapack_int ku, const lapack_complex_double* ab, lapack_int ldab, double* r, double*
c, double* rowcnd, double* colcnd, double* amax );
Include Files
• mkl.h
Description
The routine computes row and column scalings intended to equilibrate an m-by-n band matrix A and reduce
its condition number. The output array r returns the row scale factors and the array c returns the column
scale factors. These factors are chosen to try to make the largest element in each row and column of the
matrix B with elements b(i,j) = r[i-1]*a(i,j)*c[j-1] have absolute value 1.
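The scaling rule above can be sketched directly on band storage. The example below assumes column-major band storage in which A(i,j) is held at ab[(ku + i - j) + j*ldab] (0-based), and it omits the clamping to [SMLNUM, BIGNUM]; the function name and the return convention are illustrative assumptions, not the library's code.

```c
#include <assert.h>
#include <math.h>

/* Row and column scale factors for an m-by-n band matrix with kl
   sub-diagonals and ku super-diagonals (column-major band storage).
   Returns 0 on success, i (1-based) if row i is entirely zero,
   and m + j if column j is entirely zero. */
int band_equilibrate(int m, int n, int kl, int ku,
                     const double *ab, int ldab, double *r, double *c)
{
    assert(m > 0 && n > 0 && kl >= 0 && ku >= 0 && ldab >= kl + ku + 1);
    for (int i = 0; i < m; ++i) r[i] = 0.0;

    /* Pass 1: largest magnitude in each row. */
    for (int j = 0; j < n; ++j) {
        int lo = (j - ku > 0) ? j - ku : 0;
        int hi = (j + kl < m - 1) ? j + kl : m - 1;
        for (int i = lo; i <= hi; ++i) {
            double v = fabs(ab[(ku + i - j) + j * ldab]);
            if (v > r[i]) r[i] = v;
        }
    }
    for (int i = 0; i < m; ++i) {
        if (r[i] == 0.0) return i + 1;
        r[i] = 1.0 / r[i];
    }

    /* Pass 2: largest magnitude in each column of the row-scaled matrix. */
    for (int j = 0; j < n; ++j) {
        int lo = (j - ku > 0) ? j - ku : 0;
        int hi = (j + kl < m - 1) ? j + kl : m - 1;
        double colmax = 0.0;
        for (int i = lo; i <= hi; ++i) {
            double v = fabs(ab[(ku + i - j) + j * ldab]) * r[i];
            if (v > colmax) colmax = v;
        }
        if (colmax == 0.0) return m + j + 1;
        c[j] = 1.0 / colmax;
    }
    return 0;
}
```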
Input Parameters
ab Array, size max(1, ldab*n) for column major layout and max(1,
ldab*m) for row major layout. Contains the original band matrix A.
Output Parameters
colcnd If info = 0, colcnd contains the ratio of the smallest c[i] to the
largest c[i].
Return Values
This function returns a value info.
If info = i and i ≤ m, the i-th row of A is exactly zero; if i > m, the (i-m)-th column of A is exactly zero.
Application Notes
All the components of r and c are restricted to be between SMLNUM = smallest safe number and BIGNUM=
largest safe number. Use of these scaling factors is not guaranteed to reduce the condition number of A but
works well in practice.
SMLNUM and BIGNUM are parameters representing machine precision. You can use the ?lamch routines to
compute them. For example, in single precision SMLNUM = slamch('s') and BIGNUM = 1/SMLNUM.
If amax is very close to SMLNUM or very close to BIGNUM, the matrix A should be scaled.
See Also
Error Analysis
Matrix Storage Schemes
?gbequb
Computes row and column scaling factors restricted to
a power of radix to equilibrate a banded matrix and
reduce its condition number.
Syntax
lapack_int LAPACKE_sgbequb( int matrix_layout, lapack_int m, lapack_int n, lapack_int
kl, lapack_int ku, const float* ab, lapack_int ldab, float* r, float* c, float* rowcnd,
float* colcnd, float* amax );
lapack_int LAPACKE_dgbequb( int matrix_layout, lapack_int m, lapack_int n, lapack_int
kl, lapack_int ku, const double* ab, lapack_int ldab, double* r, double* c, double*
rowcnd, double* colcnd, double* amax );
lapack_int LAPACKE_cgbequb( int matrix_layout, lapack_int m, lapack_int n, lapack_int
kl, lapack_int ku, const lapack_complex_float* ab, lapack_int ldab, float* r, float* c,
float* rowcnd, float* colcnd, float* amax );
lapack_int LAPACKE_zgbequb( int matrix_layout, lapack_int m, lapack_int n, lapack_int
kl, lapack_int ku, const lapack_complex_double* ab, lapack_int ldab, double* r, double*
c, double* rowcnd, double* colcnd, double* amax );
Include Files
• mkl.h
Description
The routine computes row and column scalings intended to equilibrate an m-by-n banded matrix A and
reduce its condition number. The output array r returns the row scale factors and the array c returns the
column scale factors. These factors are chosen to try to make the largest element in each row and column of
the matrix B with elements b(i,j) = r[i-1]*a(i,j)*c[j-1] have an absolute value of at most the radix.
r[i-1] and c[j-1] are restricted to be a power of the radix between SMLNUM = smallest safe number and
BIGNUM = largest safe number. Use of these scaling factors is not guaranteed to reduce the condition
number of A but works well in practice.
SMLNUM and BIGNUM are parameters representing machine precision. You can use the ?lamch routines to
compute them. For example, in single precision SMLNUM = slamch('s') and BIGNUM = 1/SMLNUM.
Input Parameters
ab Array, size max(1, ldab*n) for column major layout and max(1,
ldab*m) for row major layout.
Output Parameters
r, c Arrays: r (size m), c (size n).
If info = 0, or info > m, the array r contains the row scale factors for
the matrix A.
If info = 0, the array c contains the column scale factors for the
matrix A.
colcnd If info = 0, colcnd contains the ratio of the smallest c[i] to the
largest c[i]. If colcnd ≥ 0.1, it is not worth scaling by c.
amax Absolute value of the largest element of the matrix A. If amax is very
close to SMLNUM or BIGNUM, the matrix should be scaled.
Return Values
This function returns a value info.
See Also
Error Analysis
Matrix Storage Schemes
?poequ
Computes row and column scaling factors intended to
equilibrate a symmetric (Hermitian) positive definite
matrix and reduce its condition number.
Syntax
lapack_int LAPACKE_spoequ( int matrix_layout, lapack_int n, const float* a, lapack_int
lda, float* s, float* scond, float* amax );
lapack_int LAPACKE_dpoequ( int matrix_layout, lapack_int n, const double* a, lapack_int
lda, double* s, double* scond, double* amax );
lapack_int LAPACKE_cpoequ( int matrix_layout, lapack_int n, const lapack_complex_float*
a, lapack_int lda, float* s, float* scond, float* amax );
lapack_int LAPACKE_zpoequ( int matrix_layout, lapack_int n, const
lapack_complex_double* a, lapack_int lda, double* s, double* scond, double* amax );
Include Files
• mkl.h
Description
The routine computes row and column scalings intended to equilibrate a symmetric (Hermitian)
positive-definite matrix A and reduce its condition number (with respect to the two-norm). The output
array s returns scale factors such that s[i-1] contains 1/sqrt(A(i,i)).
These factors are chosen so that the scaled matrix B with elements Bi,j=s[i-1]*Ai,j*s[j-1] has diagonal
elements equal to 1.
This choice of s puts the condition number of B within a factor n of the smallest possible condition number
over all possible diagonal scalings.
Input Parameters
Output Parameters
s Array, size n.
If info = 0, the array s contains the scale factors for A.
scond If info = 0, scond contains the ratio of the smallest s[i] to the
largest s[i].
Return Values
This function returns a value info.
Application Notes
If scond≥ 0.1 and amax is neither too large nor too small, it is not worth scaling by s.
If amax is very close to SMLNUM or very close to BIGNUM, the matrix A should be scaled.
See Also
Error Analysis
Matrix Storage Schemes
?poequb
Computes row and column scaling factors intended to
equilibrate a symmetric (Hermitian) positive definite
matrix and reduce its condition number.
Syntax
lapack_int LAPACKE_spoequb( int matrix_layout, lapack_int n, const float* a, lapack_int
lda, float* s, float* scond, float* amax );
lapack_int LAPACKE_dpoequb( int matrix_layout, lapack_int n, const double* a,
lapack_int lda, double* s, double* scond, double* amax );
lapack_int LAPACKE_cpoequb( int matrix_layout, lapack_int n, const
lapack_complex_float* a, lapack_int lda, float* s, float* scond, float* amax );
lapack_int LAPACKE_zpoequb( int matrix_layout, lapack_int n, const
lapack_complex_double* a, lapack_int lda, double* s, double* scond, double* amax );
Include Files
• mkl.h
Description
The routine computes row and column scalings intended to equilibrate a symmetric (Hermitian) positive-
definite matrix A and reduce its condition number (with respect to the two-norm).
These factors are chosen so that the scaled matrix B with elements Bi,j=s[i-1]*Ai,j*s[j-1] has diagonal
elements equal to 1. s[i - 1] is a power of two nearest to, but not exceeding 1/sqrt(Ai,i).
This choice of s puts the condition number of B within a factor n of the smallest possible condition number
over all possible diagonal scalings.
Input Parameters
Output Parameters
scond If info = 0, scond contains the ratio of the smallest s[i] to the
largest s[i]. If scond≥ 0.1, and amax is neither too large nor too
small, it is not worth scaling by s.
amax Absolute value of the largest element of the matrix A. If amax is very
close to SMLNUM or BIGNUM, the matrix should be scaled.
Return Values
This function returns a value info.
See Also
Error Analysis
Matrix Storage Schemes
?ppequ
Computes row and column scaling factors intended to
equilibrate a symmetric (Hermitian) positive definite
matrix in packed storage and reduce its condition
number.
Syntax
lapack_int LAPACKE_sppequ( int matrix_layout, char uplo, lapack_int n, const float* ap,
float* s, float* scond, float* amax );
lapack_int LAPACKE_dppequ( int matrix_layout, char uplo, lapack_int n, const double*
ap, double* s, double* scond, double* amax );
lapack_int LAPACKE_cppequ( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_float* ap, float* s, float* scond, float* amax );
lapack_int LAPACKE_zppequ( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_double* ap, double* s, double* scond, double* amax );
Include Files
• mkl.h
Description
The routine computes row and column scalings intended to equilibrate a symmetric (Hermitian) positive
definite matrix A in packed storage and reduce its condition number (with respect to the two-norm). The
output array s returns scale factors such that s[i-1] contains 1/sqrt(a(i,i)).
These factors are chosen so that the scaled matrix B with elements bij=s[i-1]*aij*s[j-1] has diagonal
elements equal to 1.
This choice of s puts the condition number of B within a factor n of the smallest possible condition number
over all possible diagonal scalings.
Input Parameters
Output Parameters
scond If info = 0, scond contains the ratio of the smallest s[i] to the
largest s[i].
Return Values
This function returns a value info.
Application Notes
If scond≥ 0.1 and amax is neither too large nor too small, it is not worth scaling by s.
If amax is very close to SMLNUM or very close to BIGNUM, the matrix A should be scaled.
See Also
Error Analysis
Matrix Storage Schemes
?pbequ
Computes row and column scaling factors intended to
equilibrate a symmetric (Hermitian) positive-definite
band matrix and reduce its condition number.
Syntax
lapack_int LAPACKE_spbequ( int matrix_layout, char uplo, lapack_int n, lapack_int kd,
const float* ab, lapack_int ldab, float* s, float* scond, float* amax );
lapack_int LAPACKE_dpbequ( int matrix_layout, char uplo, lapack_int n, lapack_int kd,
const double* ab, lapack_int ldab, double* s, double* scond, double* amax );
lapack_int LAPACKE_cpbequ( int matrix_layout, char uplo, lapack_int n, lapack_int kd,
const lapack_complex_float* ab, lapack_int ldab, float* s, float* scond, float* amax );
lapack_int LAPACKE_zpbequ( int matrix_layout, char uplo, lapack_int n, lapack_int kd,
const lapack_complex_double* ab, lapack_int ldab, double* s, double* scond, double*
amax );
Include Files
• mkl.h
Description
The routine computes row and column scalings intended to equilibrate a symmetric (Hermitian) positive
definite band matrix A and reduce its condition number (with respect to the two-norm). The output array s
returns scale factors such that s[i-1] contains 1/sqrt(a(i,i)).
These factors are chosen so that the scaled matrix B with elements bij=s[i-1]*aij*s[j-1] has diagonal
elements equal to 1. This choice of s puts the condition number of B within a factor n of the smallest possible
condition number over all possible diagonal scalings.
Input Parameters
The array ab contains either the upper or the lower triangular part of
the matrix A (as specified by uplo) in band storage (see Matrix
Storage Schemes).
Output Parameters
scond If info = 0, scond contains the ratio of the smallest s[i] to the
largest s[i].
Return Values
This function returns a value info.
Application Notes
If scond≥ 0.1 and amax is neither too large nor too small, it is not worth scaling by s.
If amax is very close to SMLNUM or very close to BIGNUM, the matrix A should be scaled.
See Also
Error Analysis
Matrix Storage Schemes
?syequb
Computes row and column scaling factors intended to
equilibrate a symmetric indefinite matrix and reduce
its condition number.
Syntax
lapack_int LAPACKE_ssyequb( int matrix_layout, char uplo, lapack_int n, const float* a,
lapack_int lda, float* s, float* scond, float* amax );
lapack_int LAPACKE_dsyequb( int matrix_layout, char uplo, lapack_int n, const double*
a, lapack_int lda, double* s, double* scond, double* amax );
lapack_int LAPACKE_csyequb( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_float* a, lapack_int lda, float* s, float* scond, float* amax );
lapack_int LAPACKE_zsyequb( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_double* a, lapack_int lda, double* s, double* scond, double* amax );
Include Files
• mkl.h
Description
The routine computes row and column scalings intended to equilibrate a symmetric indefinite matrix A and
reduce its condition number (with respect to the two-norm).
The array s contains the scale factors, s[i-1] = 1/sqrt(a(i,i)). These factors are chosen so that the
scaled matrix B with elements b(i,j) = s[i-1]*a(i,j)*s[j-1] has ones on the diagonal.
This choice of s puts the condition number of B within a factor n of the smallest possible condition number
over all possible diagonal scalings.
Input Parameters
Output Parameters
scond If info = 0, scond contains the ratio of the smallest s[i] to the
largest s[i]. If scond≥ 0.1, and amax is neither too large nor too
small, it is not worth scaling by s.
amax Absolute value of the largest element of the matrix A. If amax is very
close to SMLNUM or BIGNUM, the matrix should be scaled.
Return Values
This function returns a value info.
See Also
Error Analysis
Matrix Storage Schemes
?heequb
Computes row and column scaling factors intended to
equilibrate a Hermitian indefinite matrix and reduce its
condition number.
Syntax
lapack_int LAPACKE_cheequb( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_float* a, lapack_int lda, float* s, float* scond, float* amax );
lapack_int LAPACKE_zheequb( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_double* a, lapack_int lda, double* s, double* scond, double* amax );
Include Files
• mkl.h
Description
The routine computes row and column scalings intended to equilibrate a Hermitian indefinite matrix A and
reduce its condition number (with respect to the two-norm).
The array s contains the scale factors, s[i-1] = 1/sqrt(ai,i). These factors are chosen so that the scaled
matrix B with elements bi,j=s[i-1]*ai,j*s[j-1] has ones on the diagonal.
This choice of s puts the condition number of B within a factor n of the smallest possible condition number
over all possible diagonal scalings.
Input Parameters
uplo Must be 'U' or 'L'.
Output Parameters
scond If info = 0, scond contains the ratio of the smallest s[i] to the
largest s[i]. If scond≥ 0.1, and amax is neither too large nor too
small, it is not worth scaling by s.
amax Absolute value of the largest element of the matrix A. If amax is very
close to SMLNUM or BIGNUM, the matrix should be scaled.
Return Values
This function returns a value info.
See Also
Error Analysis
Matrix Storage Schemes
Matrix type, storage scheme    Simple Driver    Expert Driver    Expert Driver using Extra-Precise Iterative Refinement
In this table ? stands for s (single precision real), d (double precision real), c (single precision complex), or z
(double precision complex). In the descriptions of the ?gesv and ?posv routines, ? also stands for the combined
character codes ds and zc of the mixed precision subroutines.
?gesv
Computes the solution to the system of linear
equations with a square coefficient matrix A and
multiple right-hand sides.
Syntax
lapack_int LAPACKE_sgesv (int matrix_layout , lapack_int n , lapack_int nrhs , float *
a , lapack_int lda , lapack_int * ipiv , float * b , lapack_int ldb );
lapack_int LAPACKE_dgesv (int matrix_layout , lapack_int n , lapack_int nrhs , double *
a , lapack_int lda , lapack_int * ipiv , double * b , lapack_int ldb );
lapack_int LAPACKE_cgesv (int matrix_layout , lapack_int n , lapack_int nrhs ,
lapack_complex_float * a , lapack_int lda , lapack_int * ipiv , lapack_complex_float *
b , lapack_int ldb );
lapack_int LAPACKE_zgesv (int matrix_layout , lapack_int n , lapack_int nrhs ,
lapack_complex_double * a , lapack_int lda , lapack_int * ipiv , lapack_complex_double
* b , lapack_int ldb );
lapack_int LAPACKE_dsgesv (int matrix_layout, lapack_int n, lapack_int nrhs, double *
a, lapack_int lda, lapack_int * ipiv, double * b, lapack_int ldb, double * x, lapack_int
ldx, lapack_int * iter);
lapack_int LAPACKE_zcgesv (int matrix_layout, lapack_int n, lapack_int nrhs,
lapack_complex_double * a, lapack_int lda, lapack_int * ipiv, lapack_complex_double *
b, lapack_int ldb, lapack_complex_double * x, lapack_int ldx, lapack_int * iter);
Include Files
• mkl.h
Description
The routine solves for X the system of linear equations A*X = B, where A is an n-by-n matrix, the columns
of matrix B are individual right-hand sides, and the columns of X are the corresponding solutions.
The LU decomposition with partial pivoting and row interchanges is used to factor A as A = P*L*U, where P
is a permutation matrix, L is unit lower triangular, and U is upper triangular. The factored form of A is then
used to solve the system of equations A*X = B.
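The factor-then-solve procedure described above can be sketched for a single right-hand side as follows. This is an illustrative textbook version, not the library routine: the function name is hypothetical, the matrix is row-major and overwritten by its factors, and, unlike the library, the sketch stops as soon as it meets a zero pivot instead of completing the factorization.

```c
#include <assert.h>
#include <math.h>

/* Solves A*x = b by LU factorization with partial pivoting.
   a: n-by-n, row-major, overwritten by L (below the diagonal) and U.
   ipiv: 1-based pivot rows; row k+1 was interchanged with row ipiv[k].
   b: right-hand side on entry, solution on exit.
   Returns 0 on success, or i (1-based) if U(i,i) is exactly zero. */
int lu_solve(int n, double *a, int *ipiv, double *b)
{
    assert(n > 0);
    for (int k = 0; k < n; ++k) {
        int p = k;                         /* row with the largest pivot */
        for (int i = k + 1; i < n; ++i)
            if (fabs(a[i * n + k]) > fabs(a[p * n + k])) p = i;
        ipiv[k] = p + 1;
        if (a[p * n + k] == 0.0) return k + 1;
        if (p != k) {                      /* interchange rows k and p */
            for (int j = 0; j < n; ++j) {
                double t = a[k * n + j];
                a[k * n + j] = a[p * n + j];
                a[p * n + j] = t;
            }
            double t = b[k]; b[k] = b[p]; b[p] = t;
        }
        for (int i = k + 1; i < n; ++i) {  /* eliminate below the pivot */
            double l = a[i * n + k] / a[k * n + k];
            a[i * n + k] = l;
            for (int j = k + 1; j < n; ++j)
                a[i * n + j] -= l * a[k * n + j];
            b[i] -= l * b[k];
        }
    }
    for (int i = n - 1; i >= 0; --i) {     /* back substitution with U */
        for (int j = i + 1; j < n; ++j)
            b[i] -= a[i * n + j] * b[j];
        b[i] /= a[i * n + i];
    }
    return 0;
}
```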
The dsgesv and zcgesv are mixed precision iterative refinement subroutines for exploiting fast single
precision hardware. They first attempt to factorize the matrix in single precision (dsgesv) or single complex
precision (zcgesv) and use this factorization within an iterative refinement procedure to produce a solution
with double precision (dsgesv) / double complex precision (zcgesv) normwise backward error quality (see
below). If the approach fails, the method switches to a double precision or double complex precision
factorization respectively and computes the solution.
Iterative refinement does not pay off if the ratio of single precision performance to double precision
performance is too small. A reasonable strategy should take the number of right-hand sides and the size of
the matrix into account. This might be done with a call to ilaenv in the future. At present, iterative
refinement is always attempted.
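The iterative refinement process is stopped if iter > itermax or, for all the right-hand sides,
rnrm < sqrt(n)*xnrm*anrm*eps*bwdmax, where iter is the number of the current iteration, rnrm is the
infinity-norm of the residual, xnrm is the infinity-norm of the solution, anrm is the infinity-operator-norm
of the matrix A, and eps is the machine epsilon returned by dlamch('Epsilon'). The values itermax and
bwdmax are fixed to 30 and 1.0, respectively.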
Input Parameters
n The number of linear equations, that is, the order of the matrix A; n ≥ 0.
nrhs The number of right-hand sides, that is, the number of columns of the
matrix B; nrhs ≥ 0.
b The array b of size max(1, ldb*nrhs) for column major layout and
max(1, ldb*n) for row major layout contains the n-by-nrhs
right-hand side matrix B.
ldb The leading dimension of the array b; ldb ≥ max(1, n) for column
major layout and ldb ≥ nrhs for row major layout.
ldx The leading dimension of the array x; ldx ≥ max(1, n) for column
major layout and ldx ≥ nrhs for row major layout.
Output Parameters
ipiv Array, size at least max(1, n). The pivot indices that define the
permutation matrix P; row i of the matrix was interchanged with row
ipiv[i-1]. Corresponds to the single precision factorization (if
info= 0 and iter≥ 0) or the double precision factorization (if info=
0 and iter < 0).
x Array, size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout. If info = 0, contains the n-by-nrhs
solution matrix X.
iter If iter > 0, iterative refinement has been successfully used; iter
returns the number of iterations.
Return Values
This function returns a value info.
If info = i, U(i,i) (computed in double precision for mixed precision subroutines) is exactly zero. The
factorization has been completed, but the factor U is exactly singular, so the solution could not be computed.
See Also
dlamch
sgetrf
Matrix Storage Schemes
?gesvx
Computes the solution to the system of linear
equations with a square coefficient matrix A and
multiple right-hand sides, and provides error bounds
on the solution.
Syntax
lapack_int LAPACKE_sgesvx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int nrhs, float* a, lapack_int lda, float* af, lapack_int ldaf, lapack_int* ipiv,
char* equed, float* r, float* c, float* b, lapack_int ldb, float* x, lapack_int ldx,
float* rcond, float* ferr, float* berr, float* rpivot );
lapack_int LAPACKE_dgesvx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int nrhs, double* a, lapack_int lda, double* af, lapack_int ldaf, lapack_int*
ipiv, char* equed, double* r, double* c, double* b, lapack_int ldb, double* x,
lapack_int ldx, double* rcond, double* ferr, double* berr, double* rpivot );
lapack_int LAPACKE_cgesvx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int nrhs, lapack_complex_float* a, lapack_int lda, lapack_complex_float* af,
lapack_int ldaf, lapack_int* ipiv, char* equed, float* r, float* c,
lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x, lapack_int ldx,
float* rcond, float* ferr, float* berr, float* rpivot );
lapack_int LAPACKE_zgesvx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int nrhs, lapack_complex_double* a, lapack_int lda, lapack_complex_double* af,
lapack_int ldaf, lapack_int* ipiv, char* equed, double* r, double* c,
lapack_complex_double* b, lapack_int ldb, lapack_complex_double* x, lapack_int ldx,
double* rcond, double* ferr, double* berr, double* rpivot );
Include Files
• mkl.h
Description
The routine uses the LU factorization to compute the solution to a real or complex system of linear equations
A*X = B, where A is an n-by-n matrix, the columns of matrix B are individual right-hand sides, and the
columns of X are the corresponding solutions.
Error bounds on the solution and a condition estimate are also provided.
The routine ?gesvx performs the following steps:
1. If fact = 'E', real scaling factors r and c are computed to equilibrate the system:
Input Parameters
If trans = 'C', the system has the form A^H*X = B (transpose for
real flavors, conjugate transpose for complex flavors).
nrhs The number of right-hand sides; the number of columns of the
matrices B and X; nrhs ≥ 0.
b The array b of size max(1, ldb*nrhs) for column major layout and
max(1, ldb*n) for row major layout contains the matrix B whose
columns are the right-hand sides for the systems of equations.
ldb The leading dimension of b; ldb ≥ max(1, n) for column major layout
and ldb ≥ nrhs for row major layout.
ipiv Array, size at least max(1, n). The array ipiv is an input argument if
fact = 'F'. It contains the pivot indices from the factorization A =
P*L*U as computed by ?getrf; row i of the matrix was interchanged
with row ipiv[i-1].
r, c Arrays: r (size n), c (size n). The array r contains the row scale
factors for A, and the array c contains the column scale factors for A.
These arrays are input arguments if fact = 'F' only; otherwise they
are output arguments.
If equed = 'R' or 'B', A is multiplied on the left by diag(r); if equed
= 'N' or 'C', r is not accessed.
If fact = 'F' and equed = 'R' or 'B', each element of r must be
positive.
If equed = 'C' or 'B', A is multiplied on the right by diag(c); if
equed = 'N' or 'R', c is not accessed.
ldx The leading dimension of the output array x; ldx ≥ max(1, n) for
column major layout and ldx ≥ nrhs for row major layout.
Output Parameters
x Array, size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout.
If info = 0 or info = n+1, the array x contains the solution matrix
X to the original system of equations. Note that A and B are modified
on exit if equed≠'N', and the solution to the equilibrated system is:
ferr Array, size at least max(1, nrhs). Contains the estimated forward
error bound for each solution vector xj (the j-th column of the
solution matrix X). If xtrue is the true solution corresponding to xj,
ferr[j-1] is an estimated upper bound for the magnitude of the
largest element in (xj - xtrue) divided by the magnitude of the
largest element in xj. The estimate is as reliable as the estimate for
rcond, and is almost always a slight overestimate of the true error.
berr Array, size at least max(1, nrhs). Contains the component-wise
relative backward error for each solution vector xj, that is, the
smallest relative change in any element of A or B that makes xj an
exact solution.
ipiv If fact = 'N' or 'E', then ipiv is an output argument and on exit
contains the pivot indices from the factorization A = P*L*U of the
original matrix A (if fact = 'N') or of the equilibrated matrix A (if
fact = 'E').
Return Values
This function returns a value info.
If info = i, and i ≤ n, then U(i,i) is exactly zero. The factorization has been completed, but the factor U is
exactly singular, so the solution and error bounds could not be computed; rcond = 0 is returned.
If info = n + 1, then U is nonsingular, but rcond is less than machine precision, meaning that the matrix is
singular to working precision. Nevertheless, the solution and error bounds are computed because there are a
number of situations where the computed solution can be more accurate than the value of rcond would
suggest.
See Also
Matrix Storage Schemes
?gesvxx
Uses extra precise iterative refinement to compute the
solution to the system of linear equations with a
square coefficient matrix A and multiple right-hand
sides.
Syntax
lapack_int LAPACKE_sgesvxx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int nrhs, float* a, lapack_int lda, float* af, lapack_int ldaf, lapack_int* ipiv,
char* equed, float* r, float* c, float* b, lapack_int ldb, float* x, lapack_int ldx,
float* rcond, float* rpvgrw, float* berr, lapack_int n_err_bnds, float* err_bnds_norm,
float* err_bnds_comp, lapack_int nparams, const float* params );
Include Files
• mkl.h
Description
The routine uses the LU factorization to compute the solution to a real or complex system of linear equations
A*X = B, where A is an n-by-n matrix, the columns of the matrix B are individual right-hand sides, and the
columns of X are the corresponding solutions.
Both normwise and maximum componentwise error bounds are also provided on request. The routine returns
a solution with a small guaranteed error (O(eps), where eps is the working machine precision) unless the
matrix is very ill-conditioned, in which case a warning is returned. Relevant condition numbers are also
calculated and returned.
The routine accepts user-provided factorizations and equilibration factors; see definitions of the fact and
equed options. Solving with refinement and using a factorization from a previous call of the routine also
produces a solution with O(eps) errors or warnings but that may not be true for general user-provided
factorizations and equilibration factors if they differ from what the routine would itself produce.
The routine ?gesvxx performs the following steps:
1. If fact = 'E', scaling factors r and c are computed to equilibrate the system:
3. If some U(i,i) = 0, so that U is exactly singular, then the routine returns with info = i. Otherwise, the
factored form of A is used to estimate the condition number of the matrix A (see the rcond parameter).
If the reciprocal of the condition number is less than machine precision, the routine still goes on to
solve for X and compute error bounds.
4. The system of equations is solved for X using the factored form of A.
5. By default, unless params[0] is set to zero, the routine applies iterative refinement to improve the computed
solution matrix and calculate error bounds. Refinement calculates the residual to at least twice the
working precision.
6. If equilibration was used, the matrix X is premultiplied by diag(c) (if trans = 'N') or diag(r) (if
trans = 'T' or 'C') so that it solves the original system before equilibration.
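The final step of the list above follows from the identity (diag(r)*A*diag(c)) * (inv(diag(c))*X) = diag(r)*B: the equilibrated system yields inv(diag(c))*X, so multiplying by diag(c) recovers X. A minimal sketch of that step, with a hypothetical function name and a column-major solution array, is:

```c
#include <assert.h>

/* Converts the solution of the equilibrated system back to the solution
   of the original system. x is n-by-nrhs, column-major, leading
   dimension ldx. Rows are scaled by c when trans = 'N' and by r
   otherwise. */
void undo_equilibration(char trans, int n, int nrhs,
                        const double *r, const double *c,
                        double *x, int ldx)
{
    const double *d = (trans == 'N') ? c : r;
    assert(n >= 0 && nrhs >= 0 && ldx >= (n > 1 ? n : 1));
    for (int j = 0; j < nrhs; ++j)
        for (int i = 0; i < n; ++i)
            x[i + j * ldx] *= d[i];
}
```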
Input Parameters
nrhs The number of right hand sides; the number of columns of the
matrices B and X; nrhs≥ 0.
The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.
ipiv Array, size at least max(1, n). The array ipiv is an input argument if
fact = 'F'. It contains the pivot indices from the factorization A =
P*L*U as computed by ?getrf; row i of the matrix was interchanged
with row ipiv[i-1].
r, c Arrays: r (size n), c (size n). The array r contains the row scale
factors for A, and the array c contains the column scale factors for A.
These arrays are input arguments if fact = 'F' only; otherwise they
are output arguments.
If equed = 'R' or 'B', A is multiplied on the left by diag(r); if equed
= 'N' or 'C', r is not accessed.
If fact = 'F' and equed = 'R' or 'B', each element of r must be
positive.
If equed = 'C' or 'B', A is multiplied on the right by diag(c); if
equed = 'N' or 'R', c is not accessed.
If fact = 'F' and equed = 'C' or 'B', each element of c must be
positive.
Each element of r or c should be a power of the radix to ensure a
reliable solution and error estimates. Scaling by powers of the radix
does not cause rounding errors unless the result underflows or
overflows. Rounding errors during scaling lead to refining with a
matrix that is not equivalent to the input matrix, producing error
estimates that may not be reliable.
ldb The leading dimension of the array b; ldb ≥ max(1, n) for column
major layout and ldb ≥ nrhs for row major layout.
ldx The leading dimension of the output array x; ldx ≥ max(1, n) for
column major layout and ldx ≥ nrhs for row major layout.
n_err_bnds Number of error bounds to return for each right-hand side and each
type (normwise or componentwise). See err_bnds_norm and
err_bnds_comp descriptions in the Output Parameters section below.
Default 10.0
Output Parameters
x Array, size max(1, ldx*nrhs) for column major layout and max(1, ldx*n)
for row major layout.
If info = 0, the array x contains the solution n-by-nrhs matrix X to the
original system of equations. Note that A and B are modified on exit if
equed≠'N', and the solution to the equilibrated system is:
a Array a is not modified on exit if fact = 'F' or 'N', or if fact = 'E' and
equed = 'N'.
If equed≠'N', A is scaled on exit as follows:
berr Array, size at least max(1, nrhs). Contains the componentwise relative
backward error for each solution vector xj, that is, the smallest relative
change in any element of A or B that makes xj an exact solution.
The information for right-hand side i, where 1 ≤ i ≤ nrhs, and type of error
err is stored in err_bnds_norm[(err-1)*nrhs + i - 1].
The information for right-hand side i, where 1 ≤ i ≤ nrhs, and type of error
err is stored in err_bnds_comp[(err-1)*nrhs + i - 1].
ipiv If fact = 'N' or 'E', then ipiv is an output argument and on exit
contains the pivot indices from the factorization A = P*L*U of the original
matrix A (if fact = 'N') or of the equilibrated matrix A (if fact = 'E').
params If an entry is less than 0.0, that entry is filled with the default value used
for that parameter; otherwise the entry is not modified.
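The indexing rule quoted above for err_bnds_norm and err_bnds_comp can be written as a small helper. This is a minimal sketch rather than library code: the name err_bnd_index is hypothetical, plain int stands in for lapack_int, and the function implements only the formula (err-1)*nrhs + i - 1 given in the text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the library): position of error-bound
   entry `err` (1-based) for right-hand side `i` (1-based) inside
   err_bnds_norm or err_bnds_comp, using the formula (err-1)*nrhs + i - 1. */
static size_t err_bnd_index(int err, int i, int nrhs, int n_err_bnds)
{
    assert(1 <= i && i <= nrhs);            /* right-hand side in range  */
    assert(1 <= err && err <= n_err_bnds);  /* requested entry was asked */
    return (size_t)(err - 1) * (size_t)nrhs + (size_t)(i - 1);
}
```

With nrhs = 3, for example, the err = 2 entry of the first right-hand side is at offset 3, directly after the three err = 1 entries.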
Return Values
This function returns a value info.
If info = 0, the execution is successful. The solution to every right-hand side is guaranteed.
If info = -i, parameter i had an illegal value.
If 0 < info ≤ n: U(info,info) is exactly zero. The factorization has been completed, but the factor U is exactly
singular, so the solution and error bounds could not be computed; rcond = 0 is returned.
If info = n+j: The solution corresponding to the j-th right-hand side is not guaranteed. The solutions
corresponding to other right-hand sides k with k > j may not be guaranteed as well, but only the first such
right-hand side is reported. If a small componentwise error is not requested (params[2] = 0.0), then the j-th
right-hand side is the first with a normwise error bound that is not guaranteed (the smallest j such that for
column major layout err_bnds_norm[j - 1] = 0.0 or err_bnds_comp[j - 1] = 0.0; or for row major
layout err_bnds_norm[(j - 1)*n_err_bnds] = 0.0 or err_bnds_comp[(j - 1)*n_err_bnds] = 0.0).
See the definition of err_bnds_norm and err_bnds_comp for err = 1. To get information about all of the
right-hand sides, check err_bnds_norm or err_bnds_comp.
See Also
Matrix Storage Schemes
?gbsv
Computes the solution to the system of linear
equations with a band coefficient matrix A and
multiple right-hand sides.
Syntax
lapack_int LAPACKE_sgbsv (int matrix_layout , lapack_int n , lapack_int kl , lapack_int
ku , lapack_int nrhs , float * ab , lapack_int ldab , lapack_int * ipiv , float * b ,
lapack_int ldb );
lapack_int LAPACKE_dgbsv (int matrix_layout , lapack_int n , lapack_int kl , lapack_int
ku , lapack_int nrhs , double * ab , lapack_int ldab , lapack_int * ipiv , double * b ,
lapack_int ldb );
lapack_int LAPACKE_cgbsv (int matrix_layout , lapack_int n , lapack_int kl , lapack_int
ku , lapack_int nrhs , lapack_complex_float * ab , lapack_int ldab , lapack_int *
ipiv , lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_zgbsv (int matrix_layout , lapack_int n , lapack_int kl , lapack_int
ku , lapack_int nrhs , lapack_complex_double * ab , lapack_int ldab , lapack_int *
ipiv , lapack_complex_double * b , lapack_int ldb );
Include Files
• mkl.h
Description
The routine solves for X the real or complex system of linear equations A*X = B, where A is an n-by-n band
matrix with kl subdiagonals and ku superdiagonals, the columns of matrix B are individual right-hand sides,
and the columns of X are the corresponding solutions.
The LU decomposition with partial pivoting and row interchanges is used to factor A as A = L*U, where L is a
product of permutation and unit lower triangular matrices with kl subdiagonals, and U is upper triangular
with kl+ku superdiagonals. The factored form of A is then used to solve the system of equations A*X = B.
Input Parameters
ab, b Arrays: ab (size max(1, ldab*n)), b of size max(1, ldb*nrhs) for
column major layout and max(1, ldb*n) for row major layout.
ldab The leading dimension of the array ab; ldab ≥ 2*kl + ku + 1.
ldb The leading dimension of b; ldb ≥ max(1, n) for column major layout
and ldb ≥ nrhs for row major layout.
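To illustrate why ldab must be at least 2*kl + ku + 1, the sketch below packs a dense matrix into the band array. It assumes column major layout and the standard LAPACK band storage used by ?gbtrf, in which element A(i, j) is stored in row kl + ku + i - j of column j (0-based) and the first kl rows of ab are left free for the fill-in produced by pivoting; see Matrix Storage Schemes for the authoritative description. The helper names band_offset and pack_band are hypothetical and not part of the library, and plain int stands in for lapack_int.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: offset of A(i,j), with 0-based i and j, inside a
   column-major band array ab laid out as assumed above. */
static size_t band_offset(int i, int j, int kl, int ku, int ldab)
{
    assert(ldab >= 2 * kl + ku + 1);
    assert(i - j <= kl && j - i <= ku);   /* (i,j) must lie inside the band */
    return (size_t)(kl + ku + i - j) + (size_t)j * (size_t)ldab;
}

/* Copy the band of a dense column-major n-by-n matrix a (leading dimension
   lda) into ab. Entries of a outside the band are ignored, and the first kl
   rows of each column of ab are not written. */
static void pack_band(int n, int kl, int ku,
                      const double *a, int lda, double *ab, int ldab)
{
    for (int j = 0; j < n; ++j) {
        int lo = (j - ku > 0) ? j - ku : 0;          /* first row in band */
        int hi = (j + kl < n - 1) ? j + kl : n - 1;  /* last row in band  */
        for (int i = lo; i <= hi; ++i)
            ab[band_offset(i, j, kl, ku, ldab)] = a[i + (size_t)j * lda];
    }
}
```

For a tridiagonal matrix (kl = ku = 1) with ldab = 4, the diagonal element of column j lands at offset 2 + 4*j, so row 0 of every column stays unused on input.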
Output Parameters
ipiv Array, size at least max(1, n). The pivot indices: row i was
interchanged with row ipiv[i-1].
Return Values
This function returns a value info.
If info = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular,
so the solution could not be computed.
See Also
Matrix Storage Schemes
?gbsvx
Computes the solution to the real or complex system
of linear equations with a band coefficient matrix A
and multiple right-hand sides, and provides error
bounds on the solution.
Syntax
lapack_int LAPACKE_sgbsvx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int kl, lapack_int ku, lapack_int nrhs, float* ab, lapack_int ldab, float* afb,
lapack_int ldafb, lapack_int* ipiv, char* equed, float* r, float* c, float* b,
lapack_int ldb, float* x, lapack_int ldx, float* rcond, float* ferr, float* berr, float*
rpivot );
lapack_int LAPACKE_dgbsvx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int kl, lapack_int ku, lapack_int nrhs, double* ab, lapack_int ldab, double* afb,
lapack_int ldafb, lapack_int* ipiv, char* equed, double* r, double* c, double* b,
lapack_int ldb, double* x, lapack_int ldx, double* rcond, double* ferr, double* berr,
double* rpivot );
lapack_int LAPACKE_cgbsvx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int kl, lapack_int ku, lapack_int nrhs, lapack_complex_float* ab, lapack_int
ldab, lapack_complex_float* afb, lapack_int ldafb, lapack_int* ipiv, char* equed,
float* r, float* c, lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x,
lapack_int ldx, float* rcond, float* ferr, float* berr, float* rpivot );
lapack_int LAPACKE_zgbsvx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int kl, lapack_int ku, lapack_int nrhs, lapack_complex_double* ab, lapack_int
ldab, lapack_complex_double* afb, lapack_int ldafb, lapack_int* ipiv, char* equed,
double* r, double* c, lapack_complex_double* b, lapack_int ldb, lapack_complex_double*
x, lapack_int ldx, double* rcond, double* ferr, double* berr, double* rpivot );
Include Files
• mkl.h
Description
The routine uses the LU factorization to compute the solution to a real or complex system of linear equations
A*X = B, A^T*X = B, or A^H*X = B, where A is a band matrix of order n with kl subdiagonals and ku
superdiagonals, the columns of matrix B are individual right-hand sides, and the columns of X are the
corresponding solutions.
Error bounds on the solution and a condition estimate are also provided.
The routine ?gbsvx performs the following steps:
1. If fact = 'E', real scaling factors r and c are computed to equilibrate the system:
5. Iterative refinement is applied to improve the computed solution matrix and calculate error bounds and
backward error estimates for it.
6. If equilibration was used, the matrix X is premultiplied by diag(c) (if trans = 'N') or diag(r) (if
trans = 'T' or 'C') so that it solves the original system before equilibration.
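Step 6 follows from the algebra of equilibration: if Y solves the scaled system (diag(r)*A*diag(c))*Y = diag(r)*B, then X = diag(c)*Y solves A*X = B. The routine performs this rescaling internally; the sketch below only illustrates the operation for trans = 'N', assuming column major layout. The name premultiply_diag is hypothetical, and plain int stands in for lapack_int.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: overwrite the column-major n-by-nrhs matrix x
   (leading dimension ldx) with diag(s)*x, that is, scale row i by s[i]. */
static void premultiply_diag(int n, int nrhs, const double *s,
                             double *x, int ldx)
{
    assert(ldx >= (n > 1 ? n : 1));
    for (int j = 0; j < nrhs; ++j)          /* each right-hand side */
        for (int i = 0; i < n; ++i)         /* each row             */
            x[i + (size_t)j * ldx] *= s[i];
}
```

For trans = 'T' or 'C' the same helper would be called with the row scale factors r in place of c.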
Input Parameters
If trans = 'C', the system has the form A^H*X = B (transpose for
real flavors, conjugate transpose for complex flavors).
nrhs The number of right-hand sides, the number of columns of the
matrices B and X; nrhs ≥ 0.
The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.
ldb The leading dimension of b; ldb ≥ max(1, n) for column major layout
and ldb ≥ nrhs for row major layout.
ipiv Array, size at least max(1, n). The array ipiv is an input argument if
fact = 'F'. It contains the pivot indices from the factorization A =
P*L*U as computed by ?gbtrf; row i of the matrix was interchanged
with row ipiv[i-1].
If equed = 'C', column equilibration was done, that is, A has been
postmultiplied by diag(c).
If equed = 'B', both row and column equilibration were done, that is,
A has been replaced by diag(r)*A*diag(c).
The array r contains the row scale factors for A, and the array c
contains the column scale factors for A. These arrays are input
arguments if fact = 'F' only; otherwise they are output arguments.
ldx The leading dimension of the output array x; ldx ≥ max(1, n) for
column major layout and ldx ≥ nrhs for row major layout.
Output Parameters
x Array, size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout.
afb If fact = 'N' or 'E', then afb is an output argument and on exit
returns details of the LU factorization of the original matrix A (if fact
= 'N') or of the equilibrated matrix A (if fact = 'E'). See the
description of ab for the form of the equilibrated matrix.
ferr Array, size at least max(1, nrhs). Contains the estimated forward
error bound for each solution vector xj (the j-th column of the solution
matrix X). If xtrue is the true solution corresponding to xj, ferr[j-1]
is an estimated upper bound for the magnitude of the largest element
in (xj - xtrue) divided by the magnitude of the largest element in xj.
The estimate is as reliable as the estimate for rcond, and is almost
always a slight overestimate of the true error.
ipiv If fact = 'N' or 'E', then ipiv is an output argument and on exit
contains the pivot indices from the factorization A = L*U of the
original matrix A (if fact = 'N') or of the equilibrated matrix A (if
fact = 'E').
equed If fact ≠ 'F', then equed is an output argument. It specifies the form
of equilibration that was done (see the description of equed in the Input
Parameters section).
Return Values
This function returns a value info.
If info = i, and i ≤ n, then U(i,i) is exactly zero. The factorization has been completed, but the factor U is
exactly singular, so the solution and error bounds could not be computed; rcond = 0 is returned. If info =
i, and i = n+1, then U is nonsingular, but rcond is less than machine precision, meaning that the matrix is
singular to working precision. Nevertheless, the solution and error bounds are computed because there are a
number of situations where the computed solution can be more accurate than the value of rcond would
suggest.
See Also
Matrix Storage Schemes
?gbsvxx
Uses extra precise iterative refinement to compute the
solution to the system of linear equations with a
banded coefficient matrix A and multiple right-hand
sides.
Syntax
lapack_int LAPACKE_sgbsvxx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int kl, lapack_int ku, lapack_int nrhs, float* ab, lapack_int ldab, float* afb,
lapack_int ldafb, lapack_int* ipiv, char* equed, float* r, float* c, float* b,
lapack_int ldb, float* x, lapack_int ldx, float* rcond, float* rpvgrw, float* berr,
lapack_int n_err_bnds, float* err_bnds_norm, float* err_bnds_comp, lapack_int nparams,
const float* params );
lapack_int LAPACKE_dgbsvxx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int kl, lapack_int ku, lapack_int nrhs, double* ab, lapack_int ldab, double* afb,
lapack_int ldafb, lapack_int* ipiv, char* equed, double* r, double* c, double* b,
lapack_int ldb, double* x, lapack_int ldx, double* rcond, double* rpvgrw, double* berr,
lapack_int n_err_bnds, double* err_bnds_norm, double* err_bnds_comp, lapack_int
nparams, const double* params );
lapack_int LAPACKE_cgbsvxx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int kl, lapack_int ku, lapack_int nrhs, lapack_complex_float* ab, lapack_int
ldab, lapack_complex_float* afb, lapack_int ldafb, lapack_int* ipiv, char* equed,
Include Files
• mkl.h
Description
The routine uses the LU factorization to compute the solution to a real or complex system of linear equations
A*X = B, A^T*X = B, or A^H*X = B, where A is an n-by-n banded matrix, the columns of the matrix B are
individual right-hand sides, and the columns of X are the corresponding solutions.
Both normwise and maximum componentwise error bounds are also provided on request. The routine returns
a solution with a small guaranteed error (O(eps), where eps is the working machine precision) unless the
matrix is very ill-conditioned, in which case a warning is returned. Relevant condition numbers are also
calculated and returned.
The routine accepts user-provided factorizations and equilibration factors; see definitions of the fact and
equed options. Solving with refinement and using a factorization from a previous call of the routine also
produces a solution with O(eps) errors or warnings but that may not be true for general user-provided
factorizations and equilibration factors if they differ from what the routine would itself produce.
The routine ?gbsvxx performs the following steps:
1. If fact = 'E', scaling factors r and c are computed to equilibrate the system:
Input Parameters
If fact = 'F' and equed is not 'N', then AB must have been
equilibrated by the scaling factors in r and/or c.
The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.
ipiv Array, size at least max(1, n). The array ipiv is an input argument if
fact = 'F'. It contains the pivot indices from the factorization A =
P*L*U as computed by ?gbtrf; row i of the matrix was interchanged
with row ipiv[i-1].
r, c Arrays: r (size n), c (size n). The array r contains the row scale factors
for A, and the array c contains the column scale factors for A. These
arrays are input arguments if fact = 'F' only; otherwise they are
output arguments.
If equed = 'R' or 'B', A is multiplied on the left by diag(r); if equed
= 'N' or 'C', r is not accessed.
If fact = 'F' and equed = 'R' or 'B', each element of r must be
positive.
If equed = 'C' or 'B', A is multiplied on the right by diag(c); if
equed = 'N' or 'R', c is not accessed.
If fact = 'F' and equed = 'C' or 'B', each element of c must be
positive.
Each element of r or c should be a power of the radix to ensure a
reliable solution and error estimates. Scaling by powers of the radix
does not cause rounding errors unless the result underflows or
overflows. Rounding errors during scaling lead to refining with a
matrix that is not equivalent to the input matrix, producing error
estimates that may not be reliable.
ldb The leading dimension of the array b; ldb ≥ max(1, n) for column
major layout and ldb ≥ nrhs for row major layout.
ldx The leading dimension of the output array x; ldx ≥ max(1, n) for
column major layout and ldx ≥ nrhs for row major layout.
n_err_bnds Number of error bounds to return for each right-hand side and each
type (normwise or componentwise). See err_bnds_norm and
err_bnds_comp descriptions in the Output Parameters section below.
params Array, size max(1, nparams). Specifies algorithm parameters. If an
entry is less than 0.0, that entry is filled with the default value used
for that parameter. Only positions up to nparams are accessed;
defaults are used for higher-numbered parameters. If defaults are
acceptable, you can pass nparams = 0, which prevents the source
code from accessing the params argument.
Default 10.0
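The recommendation in the description of r and c, that each scale factor be a power of the radix, can be checked before calling the routine with fact = 'F'. The sketch below is an illustration under the assumption that the floating-point radix is 2 (FLT_RADIX == 2); the name is_power_of_two is hypothetical and not part of the library.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical helper: returns 1 if v is a positive finite power of two,
   0 otherwise. Halving and doubling are exact in binary floating point,
   so the loops introduce no rounding error. */
static int is_power_of_two(double v)
{
    assert(FLT_RADIX == 2);                /* sketch assumes binary radix */
    if (!(v > 0.0) || v > DBL_MAX)         /* rejects <= 0, NaN and +inf  */
        return 0;
    while (v >= 2.0) v /= 2.0;             /* bring v down into [1, 2)    */
    while (v < 1.0)  v *= 2.0;             /* bring v up into [1, 2)      */
    return v == 1.0;
}
```

A caller could apply the check to every element of r and c and fall back to fact = 'E' when it fails, so that the routine computes its own scale factors.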
Output Parameters
x Array, size max(1, ldx*nrhs) for column major layout and max(1, ldx*n)
for row major layout.
If info = 0, the array x contains the solution n-by-nrhs matrix X to the
original system of equations. Note that A and B are modified on exit if
equed≠'N', and the solution to the equilibrated system is:
inv(diag(c))*X, if trans = 'N' and equed = 'C' or 'B'; or
inv(diag(r))*X, if trans = 'T' or 'C' and equed = 'R' or 'B'.
afb If fact = 'N' or 'E', then afb is an output argument and on exit returns
the factors L and U from the factorization A = P*L*U of the original matrix A
(if fact = 'N') or of the equilibrated matrix A (if fact = 'E').
berr Array, size at least max(1, nrhs). Contains the componentwise relative
backward error for each solution vector xj, that is, the smallest relative
change in any element of A or B that makes xj an exact solution.
err=2 "Guaranteed" error bound. The estimated
forward error, almost certainly within a factor of
10 of the true error so long as the next entry is
greater than the threshold sqrt(n)*slamch(ε)
for single precision flavors and
sqrt(n)*dlamch(ε) for double precision
flavors. This error bound should only be trusted
if the previous boolean is true.
The information for right-hand side i, where 1 ≤ i ≤ nrhs, and type of error
err is stored in err_bnds_norm[(err-1)*nrhs + i - 1].
The information for right-hand side i, where 1 ≤ i ≤ nrhs, and type of error
err is stored in err_bnds_comp[(err-1)*nrhs + i - 1].
ipiv If fact = 'N' or 'E', then ipiv is an output argument and on exit
contains the pivot indices from the factorization A = P*L*U of the original
matrix A (if fact = 'N') or of the equilibrated matrix A (if fact = 'E').
params If an entry is less than 0.0, that entry is filled with the default value used
for that parameter, otherwise the entry is not modified.
If info = n+j: The solution corresponding to the j-th right-hand side is not
guaranteed. The solutions corresponding to other right-hand sides k with k
> j may not be guaranteed as well, but only the first such right-hand side is
reported. If a small componentwise error is not requested (params[2] =
0.0), then the j-th right-hand side is the first with a normwise error bound
that is not guaranteed (the smallest j such that err_bnds_norm[j - 1] =
0.0 or err_bnds_comp[j - 1] = 0.0). See the definition of
err_bnds_norm and err_bnds_comp for err = 1. To get information about
all of the right-hand sides, check err_bnds_norm or err_bnds_comp.
Return Values
This function returns a value info.
If info = 0, the execution is successful. The solution to every right-hand side is guaranteed.
If 0 < info ≤ n: U(info,info) is exactly zero. The factorization has been completed, but the factor U is exactly
singular, so the solution and error bounds could not be computed; rcond = 0 is returned.
If info = n+j: The solution corresponding to the j-th right-hand side is not guaranteed. The solutions
corresponding to other right-hand sides k with k > j may not be guaranteed as well, but only the first such
right-hand side is reported. If a small componentwise error is not requested (params[2] = 0.0), then the j-th
right-hand side is the first with a normwise error bound that is not guaranteed (the smallest j such that for
column major layout err_bnds_norm[j - 1] = 0.0 or err_bnds_comp[j - 1] = 0.0; or for row major
layout err_bnds_norm[(j - 1)*n_err_bnds] = 0.0 or err_bnds_comp[(j - 1)*n_err_bnds] = 0.0).
See the definition of err_bnds_norm and err_bnds_comp for err = 1. To get information about all of the
right-hand sides, check err_bnds_norm or err_bnds_comp.
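The return-value rules above can be turned into a short post-processing step. The following is a sketch under stated assumptions, not library code: both function names are hypothetical, plain int stands in for lapack_int, and the scan over the error-bound arrays uses the column major indexing quoted above (entry j - 1 for err = 1).

```c
#include <assert.h>

/* Hypothetical helper: for a return value info of the kind described above,
   yields the 1-based index j of the first right-hand side whose solution is
   not guaranteed (info = n + j), or 0 if info reports no such side. */
static int first_unguaranteed_rhs(int info, int n, int nrhs)
{
    assert(n >= 0 && nrhs >= 0 && info <= n + nrhs);
    return (info > n) ? info - n : 0;
}

/* Column major layout only: counts the right-hand sides j for which
   err_bnds_norm[j-1] or err_bnds_comp[j-1] equals 0.0, which covers all
   right-hand sides rather than only the first one reported through info. */
static int count_unguaranteed_rhs(int nrhs, const double *err_bnds_norm,
                                  const double *err_bnds_comp)
{
    int count = 0;
    for (int j = 1; j <= nrhs; ++j)
        if (err_bnds_norm[j - 1] == 0.0 || err_bnds_comp[j - 1] == 0.0)
            ++count;
    return count;
}
```

With n = 4 and nrhs = 2, a return value of 6 maps to j = 2, while a value of 3 reports a singular factor instead and maps to 0.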
See Also
Matrix Storage Schemes
?gtsv
Computes the solution to the system of linear
equations with a tridiagonal coefficient matrix A and
multiple right-hand sides.
Syntax
lapack_int LAPACKE_sgtsv (int matrix_layout , lapack_int n , lapack_int nrhs , float *
dl , float * d , float * du , float * b , lapack_int ldb );
lapack_int LAPACKE_dgtsv (int matrix_layout , lapack_int n , lapack_int nrhs , double *
dl , double * d , double * du , double * b , lapack_int ldb );
lapack_int LAPACKE_cgtsv (int matrix_layout , lapack_int n , lapack_int nrhs ,
lapack_complex_float * dl , lapack_complex_float * d , lapack_complex_float * du ,
lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_zgtsv (int matrix_layout , lapack_int n , lapack_int nrhs ,
lapack_complex_double * dl , lapack_complex_double * d , lapack_complex_double * du ,
lapack_complex_double * b , lapack_int ldb );
Include Files
• mkl.h
Description
The routine solves for X the system of linear equations A*X = B, where A is an n-by-n tridiagonal matrix, the
columns of matrix B are individual right-hand sides, and the columns of X are the corresponding solutions.
The routine uses Gaussian elimination with partial pivoting.
Note that the equation A^T*X = B may be solved by interchanging the order of the arguments du and dl.
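The remark about interchanging du and dl holds because swapping the two off-diagonals of a tridiagonal matrix yields its transpose. The sketch below checks this on a dense copy; it assumes the usual LAPACK convention that dl[i] holds A(i+1, i) and du[i] holds A(i, i+1) (0-based), and the name tridiag_to_dense is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: expand the tridiagonal matrix given by dl (size n-1),
   d (size n) and du (size n-1) into a dense column-major n-by-n array a
   with leading dimension n; all other entries are set to zero. */
static void tridiag_to_dense(int n, const double *dl, const double *d,
                             const double *du, double *a)
{
    assert(n >= 1);
    for (size_t k = 0; k < (size_t)n * (size_t)n; ++k)
        a[k] = 0.0;
    for (int i = 0; i < n; ++i) {
        a[i + (size_t)i * n] = d[i];                 /* A(i, i)           */
        if (i + 1 < n) {
            a[(i + 1) + (size_t)i * n] = dl[i];      /* A(i+1, i): sub    */
            a[i + (size_t)(i + 1) * n] = du[i];      /* A(i, i+1): super  */
        }
    }
}
```

Calling tridiag_to_dense(n, du, d, dl, t) produces the transpose of the matrix built by tridiag_to_dense(n, dl, d, du, a), which is why passing the off-diagonals in swapped order solves the transposed system.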
Input Parameters
b The array b of size max(1, ldb*nrhs) for column major layout and
max(1, ldb*n) for row major layout contains the matrix B whose
columns are the right-hand sides for the systems of equations.
ldb The leading dimension of b; ldb ≥ max(1, n) for column major layout
and ldb ≥ nrhs for row major layout.
Output Parameters
Return Values
This function returns a value info.
If info = i, U(i,i) is exactly zero, and the solution has not been computed. The factorization has not been
completed unless i = n.
See Also
Matrix Storage Schemes
?gtsvx
Computes the solution to the real or complex system
of linear equations with a tridiagonal coefficient matrix
A and multiple right-hand sides, and provides error
bounds on the solution.
Syntax
lapack_int LAPACKE_sgtsvx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int nrhs, const float* dl, const float* d, const float* du, float* dlf, float*
df, float* duf, float* du2, lapack_int* ipiv, const float* b, lapack_int ldb, float* x,
lapack_int ldx, float* rcond, float* ferr, float* berr );
lapack_int LAPACKE_dgtsvx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int nrhs, const double* dl, const double* d, const double* du, double* dlf,
double* df, double* duf, double* du2, lapack_int* ipiv, const double* b, lapack_int ldb,
double* x, lapack_int ldx, double* rcond, double* ferr, double* berr );
lapack_int LAPACKE_cgtsvx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int nrhs, const lapack_complex_float* dl, const lapack_complex_float* d, const
lapack_complex_float* du, lapack_complex_float* dlf, lapack_complex_float* df,
lapack_complex_float* duf, lapack_complex_float* du2, lapack_int* ipiv, const
lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x, lapack_int ldx,
float* rcond, float* ferr, float* berr );
lapack_int LAPACKE_zgtsvx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int nrhs, const lapack_complex_double* dl, const lapack_complex_double* d, const
lapack_complex_double* du, lapack_complex_double* dlf, lapack_complex_double* df,
lapack_complex_double* duf, lapack_complex_double* du2, lapack_int* ipiv, const
lapack_complex_double* b, lapack_int ldb, lapack_complex_double* x, lapack_int ldx,
double* rcond, double* ferr, double* berr );
Include Files
• mkl.h
Description
The routine uses the LU factorization to compute the solution to a real or complex system of linear equations
A*X = B, A^T*X = B, or A^H*X = B, where A is a tridiagonal matrix of order n, the columns of matrix B are
individual right-hand sides, and the columns of X are the corresponding solutions.
Error bounds on the solution and a condition estimate are also provided.
The routine ?gtsvx performs the following steps:
1. If fact = 'N', the LU decomposition is used to factor the matrix A as A = L*U, where L is a product
of permutation and unit lower bidiagonal matrices and U is an upper triangular matrix with nonzeros in
only the main diagonal and first two superdiagonals.
2. If some U(i,i) = 0, so that U is exactly singular, then the routine returns with info = i. Otherwise, the
factored form of A is used to estimate the condition number of the matrix A. If the reciprocal of the
condition number is less than machine precision, info = n + 1 is returned as a warning, but the
routine still goes on to solve for X and compute error bounds as described below.
3. The system of equations is solved for X using the factored form of A.
4. Iterative refinement is applied to improve the computed solution matrix and calculate error bounds and
backward error estimates for it.
Input Parameters
Specifies whether or not the factored form of the matrix A has been
supplied on entry.
If fact = 'F': on entry, dlf, df, duf, du2, and ipiv contain the
factored form of A; arrays dl, d, du, dlf, df, duf, du2, and ipiv will not
be modified.
If fact = 'N', the matrix A will be copied to dlf, df, and duf and
factored.
nrhs The number of right-hand sides, the number of columns of the
matrices B and X; nrhs ≥ 0.
ldb The leading dimension of b; ldb ≥ max(1, n) for column major layout
and ldb ≥ nrhs for row major layout.
ldx The leading dimension of x; ldx ≥ max(1, n) for column major layout
and ldx ≥ nrhs for row major layout.
ipiv Array, size at least max(1, n). If fact = 'F', then ipiv is an input
argument and on entry contains the pivot indices, as returned
by ?gttrf.
Output Parameters
x Array, size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout.
If info = 0 or info = n+1, the array x contains the solution matrix
X.
dlf If fact = 'N', then dlf is an output argument and on exit contains
the (n-1) multipliers that define the matrix L from the LU
factorization of A.
duf If fact = 'N', then duf is an output argument and on exit contains
the (n-1) elements of the first superdiagonal of U.
du2 If fact = 'N', then du2 is an output argument and on exit contains
the (n-2) elements of the second superdiagonal of U.
ferr Array, size at least max(1, nrhs). Contains the estimated forward
error bound for each solution vector xj (the j-th column of the solution
matrix X). If xtrue is the true solution corresponding to xj, ferr[j-1]
is an estimated upper bound for the magnitude of the largest element
in xj - xtrue divided by the magnitude of the largest element in xj. The
estimate is as reliable as the estimate for rcond, and is almost always
a slight overestimate of the true error.
Return Values
This function returns a value info.
If info = i, and i ≤ n, then U(i,i) is exactly zero. The factorization has not been completed unless i = n, but
the factor U is exactly singular, so the solution and error bounds could not be computed; rcond = 0 is
returned. If info = i, and i = n + 1, then U is nonsingular, but rcond is less than machine precision,
meaning that the matrix is singular to working precision. Nevertheless, the solution and error bounds are
computed because there are a number of situations where the computed solution can be more accurate than
the value of rcond would suggest.
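The cases above can be summarized in code. This is a minimal sketch with hypothetical names (svx_status, classify_svx_info, svx_solution_available); plain int stands in for lapack_int, and the negative-info branch assumes the usual convention that info = -i flags an illegal value of parameter i.

```c
#include <assert.h>

/* Hypothetical classification of the value returned by an expert driver
   such as ?gtsvx for a matrix of order n. */
enum svx_status {
    SVX_OK,               /* info = 0                                    */
    SVX_BAD_ARGUMENT,     /* info < 0 (assumed convention, see lead-in)  */
    SVX_SINGULAR,         /* 0 < info <= n: U(info,info) is exactly zero */
    SVX_ILL_CONDITIONED   /* info = n + 1: rcond below machine precision */
};

static enum svx_status classify_svx_info(int info, int n)
{
    assert(n >= 0 && info <= n + 1);
    if (info == 0) return SVX_OK;
    if (info < 0)  return SVX_BAD_ARGUMENT;
    if (info <= n) return SVX_SINGULAR;
    return SVX_ILL_CONDITIONED;
}

/* The array x holds a solution only for info = 0 or info = n + 1. */
static int svx_solution_available(int info, int n)
{
    enum svx_status s = classify_svx_info(info, n);
    return s == SVX_OK || s == SVX_ILL_CONDITIONED;
}
```

Treating info = n + 1 as a warning rather than a failure matches the text: the solution and error bounds are still computed, so the caller can inspect rcond and ferr before deciding whether to accept x.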
See Also
Matrix Storage Schemes
?dtsvb
Computes the solution to the system of linear
equations with a diagonally dominant tridiagonal
coefficient matrix A and multiple right-hand sides.
Syntax
void sdtsvb (const MKL_INT * n, const MKL_INT * nrhs, float * dl, float * d, const
float * du, float * b, const MKL_INT * ldb, MKL_INT * info );
void ddtsvb (const MKL_INT * n, const MKL_INT * nrhs, double * dl, double * d, const
double * du, double * b, const MKL_INT * ldb, MKL_INT * info );
void cdtsvb (const MKL_INT * n, const MKL_INT * nrhs, MKL_Complex8 * dl, MKL_Complex8 *
d, const MKL_Complex8 * du, MKL_Complex8 * b, const MKL_INT * ldb, MKL_INT * info );
void zdtsvb (const MKL_INT * n, const MKL_INT * nrhs, MKL_Complex16 * dl, MKL_Complex16
* d, const MKL_Complex16 * du, MKL_Complex16 * b, const MKL_INT * ldb, MKL_INT *
info );
Include Files
• mkl.h
Description
The ?dtsvb routine solves a system of linear equations A*X = B for X, where A is an n-by-n diagonally
dominant tridiagonal matrix, the columns of matrix B are individual right-hand sides, and the columns of X
are the corresponding solutions. The routine uses the BABE (Burning At Both Ends) algorithm.
Note that the equation A^T*X = B may be solved by interchanging the order of the arguments du and dl.
Input Parameters
dl, d, du, b Arrays: dl (size n - 1), d (size n), du (size n - 1), b (size max(1, ldb*nrhs)
for column major layout and max(1, ldb*n) for row major layout).
The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.
Output Parameters
If info = i, U(i,i) is exactly zero, and the solution has not been
computed. The factorization has not been completed unless i = n.
Application Notes
A tridiagonal system is diagonally dominant if |d_i| > |dl_{i-1}| + |du_i| for every i.
The underlying BABE algorithm is designed for diagonally dominant systems. Such systems can be solved
stably without pivoting, unlike general tridiagonal systems, which need elimination with partial pivoting (see ?gtsv).
Solving a diagonally dominant system this way is much faster than solving a general system with pivoting.
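Because the solver relies on diagonal dominance, it can be useful to verify the condition before calling it. The sketch below does so for real data; the name is_diagonally_dominant is hypothetical, the missing neighbors of the first and last rows are treated as zero, and complex flavors would need the complex modulus in place of fabs.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: returns 1 if |d[i]| > |dl[i-1]| + |du[i]| holds for
   every row i of the n-by-n tridiagonal matrix (0-based arrays, with
   dl and du of size n-1), and 0 otherwise. */
static int is_diagonally_dominant(int n, const double *dl, const double *d,
                                  const double *du)
{
    assert(n >= 1);
    for (int i = 0; i < n; ++i) {
        double off = 0.0;                        /* sum of off-diagonals */
        if (i > 0)     off += fabs(dl[i - 1]);   /* element left of d[i] */
        if (i < n - 1) off += fabs(du[i]);       /* element right of it  */
        if (!(fabs(d[i]) > off))
            return 0;                            /* strict test failed   */
    }
    return 1;
}
```

When the check fails, ?gtsv is the safer choice, since it pivots.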
NOTE
• The current implementation of BABE has a potential accuracy issue on very small or large data
close to the underflow or overflow threshold respectively. Scale the matrix before applying the
solver in the case of such input data.
• Applying the ?dtsvb factorization to systems that are not diagonally dominant may lead to a loss of
accuracy or to false detection of singularity, because no pivoting is performed.
?posv
Computes the solution to the system of linear
equations with a symmetric or Hermitian positive-
definite coefficient matrix A and multiple right-hand
sides.
Syntax
lapack_int LAPACKE_sposv (int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
float * a, lapack_int lda, float * b, lapack_int ldb);
lapack_int LAPACKE_dposv (int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
double * a, lapack_int lda, double * b, lapack_int ldb);
lapack_int LAPACKE_cposv (int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
lapack_complex_float * a, lapack_int lda, lapack_complex_float * b, lapack_int ldb);
lapack_int LAPACKE_zposv (int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
lapack_complex_double * a, lapack_int lda, lapack_complex_double * b, lapack_int ldb);
lapack_int LAPACKE_dsposv (int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
double * a, lapack_int lda, double * b, lapack_int ldb, double * x, lapack_int ldx,
lapack_int * iter);
Include Files
• mkl.h
Description
The routine solves for X the real or complex system of linear equations A*X = B, where A is an n-by-n
symmetric/Hermitian positive-definite matrix, the columns of matrix B are individual right-hand sides, and
the columns of X are the corresponding solutions.
The Cholesky decomposition is used to factor A as
A = U^T*U (real flavors) and A = U^H*U (complex flavors), if uplo = 'U'
or A = L*L^T (real flavors) and A = L*L^H (complex flavors), if uplo = 'L',
where U is an upper triangular matrix and L is a lower triangular matrix. The factored form of A is then used
to solve the system of equations A*X = B.
The dsposv and zcposv are mixed precision iterative refinement subroutines for exploiting fast single
precision hardware. They first attempt to factorize the matrix in single precision (dsposv) or single complex
precision (zcposv) and use this factorization within an iterative refinement procedure to produce a solution
with double precision (dsposv) / double complex precision (zcposv) normwise backward error quality (see
below). If the approach fails, the method switches to a double precision or double complex precision
factorization respectively and computes the solution.
Iterative refinement is not a winning strategy if single precision (or single complex) performance is not
sufficiently higher than double precision (or double complex) performance. A reasonable strategy should also
take the number of right-hand sides and the size of the matrix into account. This might be done with a call to
ilaenv in the future. At present, iterative refinement is always attempted.
The iterative refinement process is stopped if
iter > itermax
or for all the right-hand sides:
rnrm < sqrt(n)*xnrm*anrm*eps*bwdmax,
where
• iter is the number of the current iteration in the iterative refinement process
• rnrm is the infinity-norm of the residual
• xnrm is the infinity-norm of the solution
• anrm is the infinity-operator-norm of the matrix A
• eps is the machine epsilon returned by dlamch('Epsilon').
The values itermax and bwdmax are fixed to 30 and 1.0 respectively.
Input Parameters
If uplo = 'L', the lower triangle of A is stored.
Note that in the case of zcposv the imaginary parts of the diagonal
elements need not be set and are assumed to be zero.
The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.
ldb The leading dimension of b; ldb ≥ max(1, n) for column major layout
and ldb ≥ nrhs for row major layout.
ldx The leading dimension of the array x; ldx ≥ max(1, n) for column
major layout and ldx ≥ nrhs for row major layout.
Output Parameters
x Array, size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout. If info = 0, contains the n-by-nrhs
solution matrix X.
Return Values
This function returns a value info.
If info = i, the leading minor of order i (and therefore the matrix A itself) is not positive definite, so the
factorization could not be completed, and the solution has not been computed.
See Also
Matrix Storage Schemes
?posvx
Uses the Cholesky factorization to compute the
solution to the system of linear equations with a
symmetric or Hermitian positive-definite coefficient
matrix A, and provides error bounds on the solution.
Syntax
lapack_int LAPACKE_sposvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, float* a, lapack_int lda, float* af, lapack_int ldaf, char* equed,
float* s, float* b, lapack_int ldb, float* x, lapack_int ldx, float* rcond, float* ferr,
float* berr );
lapack_int LAPACKE_dposvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, double* a, lapack_int lda, double* af, lapack_int ldaf, char* equed,
double* s, double* b, lapack_int ldb, double* x, lapack_int ldx, double* rcond, double*
ferr, double* berr );
lapack_int LAPACKE_cposvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, lapack_complex_float* a, lapack_int lda, lapack_complex_float* af,
lapack_int ldaf, char* equed, float* s, lapack_complex_float* b, lapack_int ldb,
lapack_complex_float* x, lapack_int ldx, float* rcond, float* ferr, float* berr );
lapack_int LAPACKE_zposvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, lapack_complex_double* a, lapack_int lda, lapack_complex_double* af,
lapack_int ldaf, char* equed, double* s, lapack_complex_double* b, lapack_int ldb,
lapack_complex_double* x, lapack_int ldx, double* rcond, double* ferr, double* berr );
Include Files
• mkl.h
Description
The routine uses the Cholesky factorization A=U^T*U (real flavors) / A=U^H*U (complex flavors) or A=L*L^T (real
flavors) / A=L*L^H (complex flavors) to compute the solution to a real or complex system of linear equations
A*X = B, where A is an n-by-n real symmetric/Hermitian positive definite matrix, the columns of matrix B are
individual right-hand sides, and the columns of X are the corresponding solutions.
Error bounds on the solution and a condition estimate are also provided.
The routine ?posvx performs the following steps:
1. If fact = 'E', real scaling factors s are computed to equilibrate the system:
diag(s)*A*diag(s)*inv(diag(s))*X = diag(s)*B.
Whether or not the system will be equilibrated depends on the scaling of the matrix A, but if
equilibration is used, A is overwritten by diag(s)*A*diag(s) and B by diag(s)*B.
2. If fact = 'N' or 'E', the Cholesky decomposition is used to factor the matrix A (after equilibration if
fact = 'E') as
A = U^T*U (real), A = U^H*U (complex), if uplo = 'U',
or A = L*L^T (real), A = L*L^H (complex), if uplo = 'L',
Input Parameters
The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.
ldb The leading dimension of b; ldb ≥ max(1, n) for column major layout
and ldb ≥ nrhs for row major layout.
s Array, size (n). The array s contains the scale factors for A. This array
is an input argument if fact = 'F' only; otherwise it is an output
argument.
If equed = 'N', s is not accessed.
ldx The leading dimension of the output array x; ldx ≥ max(1, n) for
column major layout and ldx ≥ nrhs for row major layout.
Output Parameters
x Array, size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout.
If info = 0 or info = n+1, the array x contains the solution matrix
X to the original system of equations. Note that if equed = 'Y', A
and B are modified on exit, and the solution to the equilibrated system
is inv(diag(s))*X.
af If fact = 'N' or 'E', then af is an output argument and on exit
returns the triangular factor U or L from the Cholesky factorization
A=U^T*U or A=L*L^T (real routines), A=U^H*U or A=L*L^H (complex
routines) of the original matrix A (if fact = 'N'), or of the
equilibrated matrix A (if fact = 'E'). See the description of a for the
form of the equilibrated matrix.
ferr Array, size at least max(1, nrhs). Contains the estimated forward
error bound for each solution vector xj (the j-th column of the solution
matrix X). If xtrue is the true solution corresponding to xj, ferr[j-1]
is an estimated upper bound for the magnitude of the largest element
in (xj - xtrue) divided by the magnitude of the largest element in xj.
The estimate is as reliable as the estimate for rcond, and is almost
always a slight overestimate of the true error.
Return Values
This function returns a value info.
If info = i, and i ≤ n, the leading minor of order i (and therefore the matrix A itself) is not positive-definite,
so the factorization could not be completed, and the solution and error bounds could not be computed;
rcond = 0 is returned.
If info = i, and i = n + 1, then U is nonsingular, but rcond is less than machine precision, meaning that the
matrix is singular to working precision. Nevertheless, the solution and error bounds are computed because
there are a number of situations where the computed solution can be more accurate than the value of rcond
would suggest.
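A caller typically needs to distinguish these cases after the call. The following C sketch shows one way to classify the returned info value for ?posvx. The enum and function names are hypothetical and not part of MKL, and values outside the range described in this section are simply reported as "other".

```c
#include <assert.h>

/* Hypothetical status codes for the outcomes described in this section. */
enum posvx_status {
    POSVX_OK,                /* info = 0: x holds the solution */
    POSVX_NOT_POS_DEF,       /* 1 <= info <= n: no solution computed */
    POSVX_ILL_CONDITIONED,   /* info = n + 1: solution computed, with warning */
    POSVX_OTHER              /* any value this section does not describe */
};

/* Hypothetical helper: maps the returned info value and the matrix order n
 * to one of the status codes above. */
enum posvx_status classify_posvx_info(int info, int n)
{
    assert(n >= 0);
    if (info == 0)
        return POSVX_OK;
    if (info >= 1 && info <= n)
        return POSVX_NOT_POS_DEF;
    if (info == n + 1)
        return POSVX_ILL_CONDITIONED;
    return POSVX_OTHER;
}
```

With this split, x, ferr, and berr are only read for POSVX_OK and POSVX_ILL_CONDITIONED, which matches the description of x above.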
See Also
Matrix Storage Schemes
?posvxx
Uses extra precise iterative refinement to compute the
solution to the system of linear equations with a
symmetric or Hermitian positive-definite coefficient
matrix A applying the Cholesky factorization.
Syntax
lapack_int LAPACKE_sposvxx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, float* a, lapack_int lda, float* af, lapack_int ldaf, char* equed,
float* s, float* b, lapack_int ldb, float* x, lapack_int ldx, float* rcond, float*
rpvgrw, float* berr, lapack_int n_err_bnds, float* err_bnds_norm, float* err_bnds_comp,
lapack_int nparams, const float* params );
lapack_int LAPACKE_dposvxx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, double* a, lapack_int lda, double* af, lapack_int ldaf, char* equed,
double* s, double* b, lapack_int ldb, double* x, lapack_int ldx, double* rcond, double*
rpvgrw, double* berr, lapack_int n_err_bnds, double* err_bnds_norm, double*
err_bnds_comp, lapack_int nparams, const double* params );
lapack_int LAPACKE_cposvxx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, lapack_complex_float* a, lapack_int lda, lapack_complex_float* af,
lapack_int ldaf, char* equed, float* s, lapack_complex_float* b, lapack_int ldb,
lapack_complex_float* x, lapack_int ldx, float* rcond, float* rpvgrw, float* berr,
lapack_int n_err_bnds, float* err_bnds_norm, float* err_bnds_comp, lapack_int nparams,
const float* params );
lapack_int LAPACKE_zposvxx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, lapack_complex_double* a, lapack_int lda, lapack_complex_double* af,
lapack_int ldaf, char* equed, double* s, lapack_complex_double* b, lapack_int ldb,
lapack_complex_double* x, lapack_int ldx, double* rcond, double* rpvgrw, double* berr,
lapack_int n_err_bnds, double* err_bnds_norm, double* err_bnds_comp, lapack_int
nparams, const double* params );
Include Files
• mkl.h
Description
The routine uses the Cholesky factorization A=U^T*U (real flavors) / A=U^H*U (complex flavors) or A=L*L^T (real
flavors) / A=L*L^H (complex flavors) to compute the solution to a real or complex system of linear equations
A*X = B, where A is an n-by-n real symmetric/Hermitian positive definite matrix, the columns of matrix B
are individual right-hand sides, and the columns of X are the corresponding solutions.
Both normwise and maximum componentwise error bounds are also provided on request. The routine returns
a solution with a small guaranteed error (O(eps), where eps is the working machine precision) unless the
matrix is very ill-conditioned, in which case a warning is returned. Relevant condition numbers are also
calculated and returned.
The routine accepts user-provided factorizations and equilibration factors; see definitions of the fact and
equed options. Solving with refinement and using a factorization from a previous call of the routine also
produces a solution with O(eps) errors or warnings but that may not be true for general user-provided
factorizations and equilibration factors if they differ from what the routine would itself produce.
The routine ?posvxx performs the following steps:
1. If fact = 'E', real scaling factors s are computed to equilibrate the system:
diag(s)*A*diag(s)*inv(diag(s))*X = diag(s)*B.
Whether or not the system will be equilibrated depends on the scaling of the matrix A, but if
equilibration is used, A is overwritten by diag(s)*A*diag(s) and B by diag(s)*B.
2. If fact = 'N' or 'E', the Cholesky decomposition is used to factor the matrix A (after equilibration if
fact = 'E') as
A = U^T*U (real), A = U^H*U (complex), if uplo = 'U',
or A = L*L^T (real), A = L*L^H (complex), if uplo = 'L',
Input Parameters
The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.
s Array, size (n). The array s contains the scale factors for A. This array
is an input argument if fact = 'F' only; otherwise it is an output
argument.
If equed = 'N', s is not accessed.
ldb The leading dimension of the array b; ldb ≥ max(1, n) for column
major layout and ldb ≥ nrhs for row major layout.
ldx The leading dimension of the output array x; ldx ≥ max(1, n) for
column major layout and ldx ≥ nrhs for row major layout.
n_err_bnds Number of error bounds to return for each right-hand side and each
type (normwise or componentwise). See err_bnds_norm and
err_bnds_comp descriptions in the Output Parameters section below.
defaults are used for higher-numbered parameters. If defaults are
acceptable, you can pass nparams = 0, which prevents the source
code from accessing the params argument.
Default 10.0
Output Parameters
x Array, size max(1, ldx*nrhs) for column major layout and max(1, ldx*n)
for row major layout.
If info = 0, the array x contains the n-by-nrhs solution matrix X to the
original system of equations. Note that A and B are modified on exit if
equed ≠ 'N', and the solution to the equilibrated system is:
inv(diag(s))*X.
a Array a is not modified on exit if fact = 'F' or 'N', or if fact = 'E' and
equed = 'N'.
If fact = 'E' and equed = 'Y', A is overwritten by diag(s)*A*diag(s).
berr Array, size at least max(1, nrhs). Contains the componentwise relative
backward error for each solution vector xj, that is, the smallest relative
change in any element of A or B that makes xj an exact solution.
sqrt(n)*slamch(ε) for single precision flavors
and sqrt(n)*dlamch(ε) for double precision
flavors to determine if the error estimate is
"guaranteed". These reciprocal condition
numbers for some appropriately scaled matrix Z
are:
The information for right-hand side i, where 1 ≤ i ≤ nrhs, and type of error
err is stored in err_bnds_norm[(err-1)*nrhs + i - 1].
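The indexing rule above can be wrapped in a small helper so that the 1-based pair (err, i) is translated to a 0-based array offset in one place. The sketch below is illustrative only. The function name is hypothetical, and the check that err does not exceed n_err_bnds is an assumption based on the description of n_err_bnds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: offset of the entry for right-hand side i
 * (1 <= i <= nrhs) and error type err (assumed 1 <= err <= n_err_bnds)
 * in err_bnds_norm or err_bnds_comp, following (err-1)*nrhs + i - 1. */
size_t err_bnds_index(int err, int i, int nrhs, int n_err_bnds)
{
    assert(i >= 1 && i <= nrhs);
    assert(err >= 1 && err <= n_err_bnds);
    return (size_t)(err - 1) * (size_t)nrhs + (size_t)(i - 1);
}
```

For example, with nrhs = 3 the entry for err = 2 and i = 1 is at offset 3, immediately after the three entries for err = 1.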
The information for right-hand side i, where 1 ≤ i ≤ nrhs, and type of error
err is stored in err_bnds_comp[(err-1)*nrhs + i - 1].
params If an entry is less than 0.0, that entry is filled with the default value used
for that parameter, otherwise the entry is not modified.
Return Values
This function returns a value info.
If info = 0, the execution is successful. The solution to every right-hand side is guaranteed.
If 0 < info ≤ n: U(info,info) is exactly zero. The factorization has been completed, but the factor U is exactly
singular, so the solution and error bounds could not be computed; rcond = 0 is returned.
If info = n+j: The solution corresponding to the j-th right-hand side is not guaranteed. The solutions
corresponding to other right-hand sides k with k > j may not be guaranteed as well, but only the first such
right-hand side is reported. If a small componentwise error is not requested (params[2] = 0.0), then the j-th
right-hand side is the first with a normwise error bound that is not guaranteed (the smallest j such that for
column major layout err_bnds_norm[j - 1] = 0.0 or err_bnds_comp[j - 1] = 0.0; or for row major
layout err_bnds_norm[(j - 1)*n_err_bnds] = 0.0 or err_bnds_comp[(j - 1)*n_err_bnds] = 0.0).
See the definition of err_bnds_norm and err_bnds_comp for err = 1. To get information about all of the
right-hand sides, check err_bnds_norm or err_bnds_comp.
See Also
Matrix Storage Schemes
?ppsv
Computes the solution to the system of linear
equations with a symmetric (Hermitian) positive
definite packed coefficient matrix A and multiple right-
hand sides.
Syntax
lapack_int LAPACKE_sppsv (int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
float * ap, float * b, lapack_int ldb);
lapack_int LAPACKE_dppsv (int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
double * ap, double * b, lapack_int ldb);
lapack_int LAPACKE_cppsv (int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
lapack_complex_float * ap, lapack_complex_float * b, lapack_int ldb);
lapack_int LAPACKE_zppsv (int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
lapack_complex_double * ap, lapack_complex_double * b, lapack_int ldb);
Include Files
• mkl.h
Description
The routine solves for X the real or complex system of linear equations A*X = B, where A is an n-by-n real
symmetric/Hermitian positive-definite matrix stored in packed format, the columns of matrix B are individual
right-hand sides, and the columns of X are the corresponding solutions.
The Cholesky decomposition is used to factor A as
A = U^T*U (real flavors) and A = U^H*U (complex flavors), if uplo = 'U'
or A = L*L^T (real flavors) and A = L*L^H (complex flavors), if uplo = 'L',
where U is an upper triangular matrix and L is a lower triangular matrix. The factored form of A is then used
to solve the system of equations A*X = B.
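Because the packed format stores only one triangle of A in a one-dimensional array, it can help to see how an element (i, j) maps to a position in that array. The sketch below assumes the conventional LAPACK column-major packed layout (see Matrix Storage Schemes for the authoritative definition) and uses 0-based indices. The helper names are hypothetical and not part of MKL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of elements in a packed array for an n-by-n
 * matrix, max(1, n*(n+1)/2). */
size_t packed_len(int n)
{
    assert(n >= 0);
    size_t len = (size_t)n * (size_t)(n + 1) / 2;
    return len > 0 ? len : 1;
}

/* Hypothetical helper: 0-based offset of element (i, j) in a column-major
 * packed array. For uplo = 'U' only i <= j is stored; for uplo = 'L' only
 * i >= j is stored. */
size_t packed_index(char uplo, int n, int i, int j)
{
    assert(i >= 0 && i < n && j >= 0 && j < n);
    if (uplo == 'U') {
        assert(i <= j);
        return (size_t)i + (size_t)j * (size_t)(j + 1) / 2;
    }
    assert(uplo == 'L' && i >= j);
    return (size_t)i + (size_t)j * (size_t)(2 * n - j - 1) / 2;
}
```

For n = 3 and uplo = 'U', the stored elements (0,0), (0,1), (1,1), (0,2), (1,2), (2,2) land at offsets 0 through 5 in that order.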
Input Parameters
ldb The leading dimension of b; ldb ≥ max(1, n) for column major layout
and ldb ≥ nrhs for row major layout.
Output Parameters
Return Values
This function returns a value info.
If info = i, the leading minor of order i (and therefore the matrix A itself) is not positive-definite, so the
factorization could not be completed, and the solution has not been computed.
See Also
Matrix Storage Schemes
?ppsvx
Uses the Cholesky factorization to compute the
solution to the system of linear equations with a
symmetric (Hermitian) positive definite packed
coefficient matrix A, and provides error bounds on the
solution.
Syntax
lapack_int LAPACKE_sppsvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, float* ap, float* afp, char* equed, float* s, float* b, lapack_int ldb,
float* x, lapack_int ldx, float* rcond, float* ferr, float* berr );
lapack_int LAPACKE_dppsvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, double* ap, double* afp, char* equed, double* s, double* b, lapack_int
ldb, double* x, lapack_int ldx, double* rcond, double* ferr, double* berr );
lapack_int LAPACKE_cppsvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, lapack_complex_float* ap, lapack_complex_float* afp, char* equed,
float* s, lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x, lapack_int
ldx, float* rcond, float* ferr, float* berr );
lapack_int LAPACKE_zppsvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, lapack_complex_double* ap, lapack_complex_double* afp, char* equed,
double* s, lapack_complex_double* b, lapack_int ldb, lapack_complex_double* x,
lapack_int ldx, double* rcond, double* ferr, double* berr );
Include Files
• mkl.h
Description
The routine uses the Cholesky factorization A=U^T*U (real flavors) / A=U^H*U (complex flavors) or A=L*L^T (real
flavors) / A=L*L^H (complex flavors) to compute the solution to a real or complex system of linear equations
A*X = B, where A is an n-by-n symmetric or Hermitian positive-definite matrix stored in packed format, the
columns of matrix B are individual right-hand sides, and the columns of X are the corresponding solutions.
Error bounds on the solution and a condition estimate are also provided.
The routine ?ppsvx performs the following steps:
1. If fact = 'E', real scaling factors s are computed to equilibrate the system:
diag(s)*A*diag(s)*inv(diag(s))*X = diag(s)*B.
Whether or not the system will be equilibrated depends on the scaling of the matrix A, but if
equilibration is used, A is overwritten by diag(s)*A*diag(s) and B by diag(s)*B.
2. If fact = 'N' or 'E', the Cholesky decomposition is used to factor the matrix A (after equilibration if
fact = 'E') as
A = U^T*U (real), A = U^H*U (complex), if uplo = 'U',
or A = L*L^T (real), A = L*L^H (complex), if uplo = 'L',
Input Parameters
ap, afp, b Arrays: ap (size max(1, n*(n+1)/2)), afp (size max(1, n*(n+1)/2)), b of size
max(1, ldb*nrhs) for column major layout and max(1, ldb*n) for
row major layout.
The array afp is an input argument if fact = 'F' and contains the
triangular factor U or L from the Cholesky factorization of A in the
same storage format as A. If equed is not 'N', then afp is the
factored form of the equilibrated matrix A.
The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.
ldb The leading dimension of b; ldb ≥ max(1, n) for column major layout
and ldb ≥ nrhs for row major layout.
s Array, size (n). The array s contains the scale factors for A. This array
is an input argument if fact = 'F' only; otherwise it is an output
argument.
If equed = 'N', s is not accessed.
ldx The leading dimension of the output array x; ldx ≥ max(1, n) for
column major layout and ldx ≥ nrhs for row major layout.
Output Parameters
x Array, size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout.
If info = 0 or info = n+1, the array x contains the solution matrix
X to the original system of equations. Note that if equed = 'Y', A
and B are modified on exit, and the solution to the equilibrated system
is inv(diag(s))*X.
afp If fact = 'N' or 'E', then afp is an output argument and on exit
returns the triangular factor U or L from the Cholesky factorization
A=U^T*U or A=L*L^T (real routines), A=U^H*U or A=L*L^H (complex
routines) of the original matrix A (if fact = 'N'), or of the
equilibrated matrix A (if fact = 'E'). See the description of ap for
the form of the equilibrated matrix.
ferr Array, size at least max(1, nrhs). Contains the estimated forward
error bound for each solution vector xj (the j-th column of the solution
matrix X). If xtrue is the true solution corresponding to xj,
ferr[j-1] is an estimated upper bound for the magnitude of the
largest element in (xj - xtrue) divided by the magnitude of the
largest element in xj. The estimate is as reliable as the estimate for
rcond, and is almost always a slight overestimate of the true error.
Return Values
This function returns a value info.
If info = i, and i ≤ n, the leading minor of order i (and therefore the matrix A itself) is not positive-definite,
so the factorization could not be completed, and the solution and error bounds could not be computed;
rcond = 0 is returned.
If info = i, and i = n + 1, then U is nonsingular, but rcond is less than machine precision, meaning that the
matrix is singular to working precision. Nevertheless, the solution and error bounds are computed because
there are a number of situations where the computed solution can be more accurate than the value of rcond
would suggest.
See Also
Matrix Storage Schemes
?pbsv
Computes the solution to the system of linear
equations with a symmetric or Hermitian positive-
definite band coefficient matrix A and multiple right-
hand sides.
Syntax
lapack_int LAPACKE_spbsv (int matrix_layout, char uplo, lapack_int n, lapack_int kd,
lapack_int nrhs, float * ab, lapack_int ldab, float * b, lapack_int ldb);
lapack_int LAPACKE_dpbsv (int matrix_layout, char uplo, lapack_int n, lapack_int kd,
lapack_int nrhs, double * ab, lapack_int ldab, double * b, lapack_int ldb);
lapack_int LAPACKE_cpbsv (int matrix_layout, char uplo, lapack_int n, lapack_int kd,
lapack_int nrhs, lapack_complex_float * ab, lapack_int ldab,
lapack_complex_float * b, lapack_int ldb);
lapack_int LAPACKE_zpbsv (int matrix_layout, char uplo, lapack_int n, lapack_int kd,
lapack_int nrhs, lapack_complex_double * ab, lapack_int ldab,
lapack_complex_double * b, lapack_int ldb);
Include Files
• mkl.h
Description
The routine solves for X the real or complex system of linear equations A*X = B, where A is an n-by-n
symmetric/Hermitian positive definite band matrix, the columns of matrix B are individual right-hand sides,
and the columns of X are the corresponding solutions.
The Cholesky decomposition is used to factor A as
A = U^T*U (real flavors) and A = U^H*U (complex flavors), if uplo = 'U'
or A = L*L^T (real flavors) and A = L*L^H (complex flavors), if uplo = 'L',
where U is an upper triangular band matrix and L is a lower triangular band matrix, with the same number of
superdiagonals or subdiagonals as A. The factored form of A is then used to solve the system of equations
A*X = B.
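Since both A and its factor keep the same band structure, only kd + 1 diagonals need to be stored. The sketch below shows how a 0-based element (i, j) maps into a band array, assuming the conventional LAPACK column-major band layout with ldab ≥ kd + 1 (see Matrix Storage Schemes for the authoritative definition). The function name is hypothetical and not part of MKL.

```c
#include <assert.h>

/* Hypothetical helper: 0-based offset of element (i, j) in a column-major
 * band array ab with kd super- or subdiagonals, or -1 if (i, j) lies
 * outside the stored triangle or outside the band. */
int band_index(char uplo, int n, int kd, int ldab, int i, int j)
{
    assert(kd >= 0 && ldab >= kd + 1);
    assert(i >= 0 && i < n && j >= 0 && j < n);
    if (uplo == 'U') {
        if (i > j || j - i > kd)
            return -1;
        return (kd + i - j) + j * ldab;   /* diagonal sits in row kd */
    }
    assert(uplo == 'L');
    if (i < j || i - j > kd)
        return -1;
    return (i - j) + j * ldab;            /* diagonal sits in row 0 */
}
```

For a tridiagonal case (kd = 1, ldab = 2) with uplo = 'U', the diagonal element (1, 1) is at offset 3 and the superdiagonal element (0, 1) is at offset 2.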
Input Parameters
ab, b Arrays: ab (size max(1, ldab*n)), b of size max(1, ldb*nrhs) for
column major layout and max(1, ldb*n) for row major layout. The
array ab contains the upper or the lower triangular part of the matrix
A (as specified by uplo) in band storage (see Matrix Storage
Schemes).
The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.
ldb The leading dimension of b; ldb ≥ max(1, n) for column major layout
and ldb ≥ nrhs for row major layout.
Output Parameters
Return Values
This function returns a value info.
If info = i, the leading minor of order i (and therefore the matrix A itself) is not positive-definite, so the
factorization could not be completed, and the solution has not been computed.
See Also
Matrix Storage Schemes
?pbsvx
Uses the Cholesky factorization to compute the
solution to the system of linear equations with a
symmetric (Hermitian) positive-definite band
coefficient matrix A, and provides error bounds on the
solution.
Syntax
lapack_int LAPACKE_spbsvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int kd, lapack_int nrhs, float* ab, lapack_int ldab, float* afb, lapack_int
ldafb, char* equed, float* s, float* b, lapack_int ldb, float* x, lapack_int ldx, float*
rcond, float* ferr, float* berr );
lapack_int LAPACKE_dpbsvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int kd, lapack_int nrhs, double* ab, lapack_int ldab, double* afb, lapack_int
ldafb, char* equed, double* s, double* b, lapack_int ldb, double* x, lapack_int ldx,
double* rcond, double* ferr, double* berr );
lapack_int LAPACKE_cpbsvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int kd, lapack_int nrhs, lapack_complex_float* ab, lapack_int ldab,
lapack_complex_float* afb, lapack_int ldafb, char* equed, float* s,
lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x, lapack_int ldx,
float* rcond, float* ferr, float* berr );
lapack_int LAPACKE_zpbsvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int kd, lapack_int nrhs, lapack_complex_double* ab, lapack_int ldab,
lapack_complex_double* afb, lapack_int ldafb, char* equed, double* s,
lapack_complex_double* b, lapack_int ldb, lapack_complex_double* x, lapack_int ldx,
double* rcond, double* ferr, double* berr );
Include Files
• mkl.h
Description
The routine uses the Cholesky factorization A=U^T*U (real flavors) / A=U^H*U (complex flavors) or A=L*L^T (real
flavors) / A=L*L^H (complex flavors) to compute the solution to a real or complex system of linear equations
A*X = B, where A is an n-by-n symmetric or Hermitian positive definite band matrix, the columns of matrix B
are individual right-hand sides, and the columns of X are the corresponding solutions.
Error bounds on the solution and a condition estimate are also provided.
The routine ?pbsvx performs the following steps:
1. If fact = 'E', real scaling factors s are computed to equilibrate the system:
diag(s)*A*diag(s)*inv(diag(s))*X = diag(s)*B.
Whether or not the system will be equilibrated depends on the scaling of the matrix A, but if
equilibration is used, A is overwritten by diag(s)*A*diag(s) and B by diag(s)*B.
2. If fact = 'N' or 'E', the Cholesky decomposition is used to factor the matrix A (after equilibration if
fact = 'E') as
A = U^T*U (real), A = U^H*U (complex), if uplo = 'U',
or A = L*L^T (real), A = L*L^H (complex), if uplo = 'L',
where U is an upper triangular band matrix and L is a lower triangular band matrix.
3. If the leading i-by-i principal minor is not positive definite, then the routine returns with info = i.
Otherwise, the factored form of A is used to estimate the condition number of the matrix A. If the
reciprocal of the condition number is less than machine precision, info = n+1 is returned as a
warning, but the routine still goes on to solve for X and compute error bounds as described below.
4. The system of equations is solved for X using the factored form of A.
5. Iterative refinement is applied to improve the computed solution matrix and calculate error bounds and
backward error estimates for it.
6. If equilibration was used, the matrix X is premultiplied by diag(s) so that it solves the original system
before equilibration.
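The scaling in steps 1 and 6 can be sketched in C as follows. For readability the sketch uses a dense column-major matrix rather than band storage, takes the scale factors s as given, and leaves out the factorization and solve in between. The helper names are hypothetical and not part of MKL.

```c
#include <assert.h>

/* Hypothetical helper: overwrites the column-major n-by-n matrix a with
 * diag(s)*A*diag(s), as in step 1. */
void equilibrate_matrix(int n, double *a, int lda, const double *s)
{
    assert(n >= 0 && lda >= (n > 1 ? n : 1));
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i)
            a[i + j * lda] *= s[i] * s[j];
}

/* Hypothetical helper: overwrites the column-major n-by-nrhs matrix m with
 * diag(s)*M. Applied to B in step 1, and to the solution of the
 * equilibrated system in step 6 to recover X for the original system. */
void scale_rows(int n, int nrhs, double *m, int ldm, const double *s)
{
    assert(n >= 0 && nrhs >= 0 && ldm >= (n > 1 ? n : 1));
    for (int j = 0; j < nrhs; ++j)
        for (int i = 0; i < n; ++i)
            m[i + j * ldm] *= s[i];
}
```

With s = (0.5, 0.25), the matrix with rows (4, 2) and (2, 16) becomes one with unit diagonal and off-diagonal entries 0.25, which is the kind of balancing equilibration aims for.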
Input Parameters
ab and afb will not be modified.
If fact = 'N', the matrix A will be copied to afb and factored.
ab, afb, b Arrays: ab (size max(1, ldab*n)), afb (size max(1, ldafb*n)), b of size
max(1, ldb*nrhs) for column major layout and max(1, ldb*n) for
row major layout.
The array ab contains the upper or lower triangle of the matrix A in
band storage (see Matrix Storage Schemes).
If fact = 'F' and equed = 'Y', then ab must contain the
equilibrated matrix diag(s)*A*diag(s).
ldb The leading dimension of b; ldb ≥ max(1, n) for column major layout
and ldb ≥ nrhs for row major layout.
s Array, size (n). The array s contains the scale factors for A. This array
is an input argument if fact = 'F' only; otherwise it is an output
argument.
If equed = 'N', s is not accessed.
ldx The leading dimension of the output array x; ldx ≥ max(1, n) for
column major layout and ldx ≥ nrhs for row major layout.
Output Parameters
x Array, size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout.
If info = 0 or info = n+1, the array x contains the solution matrix X
to the original system of equations. Note that if equed = 'Y', A and
B are modified on exit, and the solution to the equilibrated system is
inv(diag(s))*X.
afb If fact = 'N' or 'E', then afb is an output argument and on exit
returns the triangular factor U or L from the Cholesky factorization
A=U^T*U or A=L*L^T (real routines), A=U^H*U or A=L*L^H (complex
routines) of the original matrix A (if fact = 'N'), or of the
equilibrated matrix A (if fact = 'E'). See the description of ab for
the form of the equilibrated matrix.
ferr Array, size at least max(1, nrhs). Contains the estimated forward
error bound for each solution vector xj (the j-th column of the
solution matrix X). If xtrue is the true solution corresponding to xj,
ferr[j-1] is an estimated upper bound for the magnitude of the
largest element in (xj - xtrue) divided by the magnitude of the
largest element in xj. The estimate is as reliable as the estimate for
rcond, and is almost always a slight overestimate of the true error.
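To make the quantity bounded by ferr concrete, the following C sketch computes the actual relative forward error of one solution column when the true solution is known, for example in a test with a manufactured right-hand side. The function name is hypothetical and not part of MKL. In practice xtrue is unknown, which is why the routine returns an estimate instead.

```c
#include <assert.h>

/* Local absolute value, used to keep the example free of library calls. */
static double abs_val(double v)
{
    return v < 0.0 ? -v : v;
}

/* Hypothetical helper: magnitude of the largest element of (xj - xtrue)
 * divided by the magnitude of the largest element of xj. */
double relative_forward_error(int n, const double *xj, const double *xtrue)
{
    double max_diff = 0.0;
    double max_x = 0.0;
    for (int i = 0; i < n; ++i) {
        double diff = abs_val(xj[i] - xtrue[i]);
        double mag = abs_val(xj[i]);
        if (diff > max_diff)
            max_diff = diff;
        if (mag > max_x)
            max_x = mag;
    }
    assert(max_x > 0.0);   /* the ratio is undefined for a zero vector */
    return max_diff / max_x;
}
```

A reliable ferr[j-1] would be expected to be at least as large as the value this function returns for column j.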
Return Values
This function returns a value info.
If info = 0, the execution is successful.
If info = i, and i ≤ n, the leading minor of order i (and therefore the matrix A itself) is not positive definite,
so the factorization could not be completed, and the solution and error bounds could not be computed;
rcond = 0 is returned.
If info = i, and i = n + 1, then U is nonsingular, but rcond is less than machine precision,
meaning that the matrix is singular to working precision. Nevertheless, the solution and error bounds are
computed because there are a number of situations where the computed solution can be more accurate than
the value of rcond would suggest.
See Also
Matrix Storage Schemes
?ptsv
Computes the solution to the system of linear
equations with a symmetric or Hermitian positive
definite tridiagonal coefficient matrix A and multiple
right-hand sides.
Syntax
lapack_int LAPACKE_sptsv( int matrix_layout, lapack_int n, lapack_int nrhs, float* d,
float* e, float* b, lapack_int ldb );
lapack_int LAPACKE_dptsv( int matrix_layout, lapack_int n, lapack_int nrhs, double* d,
double* e, double* b, lapack_int ldb );
lapack_int LAPACKE_cptsv( int matrix_layout, lapack_int n, lapack_int nrhs, float* d,
lapack_complex_float* e, lapack_complex_float* b, lapack_int ldb );
lapack_int LAPACKE_zptsv( int matrix_layout, lapack_int n, lapack_int nrhs, double* d,
lapack_complex_double* e, lapack_complex_double* b, lapack_int ldb );
Include Files
• mkl.h
Description
The routine solves for X the real or complex system of linear equations A*X = B, where A is an n-by-n
symmetric/Hermitian positive-definite tridiagonal matrix, the columns of matrix B are individual right-hand
sides, and the columns of X are the corresponding solutions.
A is factored as A = L*D*L^T (real flavors) or A = L*D*L^H (complex flavors), and the factored form of A is
then used to solve the system of equations A*X = B.
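The factor-then-solve sequence described above can be sketched in plain C for the real case with one right-hand side. This is only an illustration of the arithmetic, not the MKL implementation; the function name ldlt_tridiag_solve is hypothetical, and the return value imitates the info convention of this routine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an MKL function): factor a symmetric positive
 * definite tridiagonal matrix as A = L*D*L^T in place, then solve A*x = b.
 *   d[0..n-1] : diagonal of A on entry, diagonal of D on exit
 *   e[0..n-2] : subdiagonal of A on entry, subdiagonal of L on exit
 *   b[0..n-1] : right-hand side on entry, solution x on exit
 * Returns 0 on success, or i (1-based) if the leading minor of order i
 * is not positive definite. */
int ldlt_tridiag_solve(int n, double *d, double *e, double *b)
{
    assert(n >= 0 && d != NULL && b != NULL);

    /* Factorization: L(i,i-1) = A(i,i-1) / D(i-1),
     *                D(i)     = A(i,i) - L(i,i-1) * A(i,i-1). */
    for (int i = 0; i < n; ++i) {
        if (i > 0) {
            double sub = e[i - 1];
            e[i - 1] = sub / d[i - 1];
            d[i] -= e[i - 1] * sub;
        }
        if (d[i] <= 0.0)
            return i + 1;
    }

    /* Forward substitution: L*z = b. */
    for (int i = 1; i < n; ++i)
        b[i] -= e[i - 1] * b[i - 1];

    /* Diagonal scaling and back substitution: D*L^T*x = z. */
    for (int i = n - 1; i >= 0; --i) {
        b[i] /= d[i];
        if (i < n - 1)
            b[i] -= e[i] * b[i + 1];
    }
    return 0;
}
```

For the 3-by-3 matrix with diagonal 2 and subdiagonal -1, and b = (1, 0, 1), the sketch returns 0 and overwrites b with (1, 1, 1). As in the LAPACKE interface, d, e, and b are overwritten, so copy them first if the original data is still needed.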
Input Parameters
e, b Arrays: e (size n - 1), b of size max(1, ldb*nrhs) for column major
layout and max(1, ldb*n) for row major layout. The array e contains
the (n - 1) subdiagonal elements of A.
The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.
ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.
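The layout-dependent rules for ldb and for the size of b can be expressed as small helpers. The sketch below is illustrative only: the enum and function names are hypothetical, and real code would pass LAPACK_ROW_MAJOR or LAPACK_COL_MAJOR as matrix_layout and use lapack_int rather than int.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout tags for this sketch only. */
enum sketch_layout { SKETCH_ROW_MAJOR, SKETCH_COL_MAJOR };

/* Smallest ldb allowed by the rule stated for this parameter. */
int min_ldb(enum sketch_layout layout, int n, int nrhs)
{
    if (layout == SKETCH_COL_MAJOR)
        return n > 1 ? n : 1;   /* ldb >= max(1, n) */
    return nrhs;                /* ldb >= nrhs      */
}

/* Minimum element count of b: max(1, ldb*nrhs) for column major,
 * max(1, ldb*n) for row major. */
size_t b_size(enum sketch_layout layout, int n, int nrhs, int ldb)
{
    assert(ldb >= min_ldb(layout, n, nrhs));
    size_t other = (size_t)(layout == SKETCH_COL_MAJOR ? nrhs : n);
    size_t count = (size_t)ldb * other;
    return count > 0 ? count : 1;
}

/* Offset of B(i, j) inside b, with 0-based row i and column j. */
size_t b_offset(enum sketch_layout layout, int i, int j, int ldb)
{
    if (layout == SKETCH_COL_MAJOR)
        return (size_t)i + (size_t)j * (size_t)ldb;
    return (size_t)i * (size_t)ldb + (size_t)j;
}
```

With n = 4 and nrhs = 2, both layouts need at least 8 elements, but the second element of the second right-hand side sits at offset 5 in column major layout (ldb = 4) and at offset 3 in row major layout (ldb = 2).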
Output Parameters
Return Values
This function returns a value info.
If info = i, the leading minor of order i (and therefore the matrix A itself) is not positive-definite, and the
solution has not been computed. The factorization has not been completed unless i = n.
See Also
Matrix Storage Schemes
?ptsvx
Uses factorization to compute the solution to the
system of linear equations with a symmetric
(Hermitian) positive definite tridiagonal coefficient
matrix A, and provides error bounds on the solution.
Syntax
lapack_int LAPACKE_sptsvx( int matrix_layout, char fact, lapack_int n, lapack_int nrhs,
const float* d, const float* e, float* df, float* ef, const float* b, lapack_int ldb,
float* x, lapack_int ldx, float* rcond, float* ferr, float* berr );
lapack_int LAPACKE_dptsvx( int matrix_layout, char fact, lapack_int n, lapack_int nrhs,
const double* d, const double* e, double* df, double* ef, const double* b, lapack_int
ldb, double* x, lapack_int ldx, double* rcond, double* ferr, double* berr );
lapack_int LAPACKE_cptsvx( int matrix_layout, char fact, lapack_int n, lapack_int nrhs,
const float* d, const lapack_complex_float* e, float* df, lapack_complex_float* ef,
const lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x, lapack_int ldx,
float* rcond, float* ferr, float* berr );
lapack_int LAPACKE_zptsvx( int matrix_layout, char fact, lapack_int n, lapack_int nrhs,
const double* d, const lapack_complex_double* e, double* df, lapack_complex_double* ef,
const lapack_complex_double* b, lapack_int ldb, lapack_complex_double* x, lapack_int
ldx, double* rcond, double* ferr, double* berr );
Include Files
• mkl.h
Description
The routine uses the Cholesky factorization A = L*D*L^T (real)/A = L*D*L^H (complex) to compute the
solution to a real or complex system of linear equations A*X = B, where A is an n-by-n symmetric or
Hermitian positive definite tridiagonal matrix, the columns of matrix B are individual right-hand sides, and
the columns of X are the corresponding solutions.
Error bounds on the solution and a condition estimate are also provided.
The routine ?ptsvx performs the following steps:
1. If fact = 'N', the matrix A is factored as A = L*D*L^T (real flavors)/A = L*D*L^H (complex flavors),
where L is a unit lower bidiagonal matrix and D is diagonal. The factorization can also be regarded as
having the form A = U^T*D*U (real flavors)/A = U^H*D*U (complex flavors).
2. If the leading i-by-i principal minor is not positive-definite, then the routine returns with info = i.
Otherwise, the factored form of A is used to estimate the condition number of the matrix A. If the
reciprocal of the condition number is less than machine precision, info = n+1 is returned as a
warning, but the routine still goes on to solve for X and compute error bounds as described below.
3. The system of equations is solved for X using the factored form of A.
4. Iterative refinement is applied to improve the computed solution matrix and calculate error bounds and
backward error estimates for it.
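Steps 3 and 4 can be illustrated with a single refinement pass for the real case and one right-hand side. This is a simplified sketch, not the library's algorithm: the function names are hypothetical, the residual is computed in working precision, and the stopping test and error-bound computation are omitted.

```c
#include <assert.h>

/* Hypothetical helper: solve (L*D*L^T)*y = rhs in place, given the
 * diagonal df of D and the subdiagonal ef of the unit bidiagonal L. */
void ldlt_tridiag_apply(int n, const double *df, const double *ef, double *rhs)
{
    for (int i = 1; i < n; ++i)          /* forward: L*z = rhs */
        rhs[i] -= ef[i - 1] * rhs[i - 1];
    for (int i = n - 1; i >= 0; --i) {   /* backward: D*L^T*y = z */
        rhs[i] /= df[i];
        if (i < n - 1)
            rhs[i] -= ef[i] * rhs[i + 1];
    }
}

/* One refinement pass: r = b - A*x, solve A*dx = r, then x += dx.
 * d and e hold the original tridiagonal A; r is workspace of length n. */
void refine_once(int n, const double *d, const double *e,
                 const double *df, const double *ef,
                 const double *b, double *x, double *r)
{
    assert(n >= 1);
    for (int i = 0; i < n; ++i) {
        r[i] = b[i] - d[i] * x[i];
        if (i > 0)
            r[i] -= e[i - 1] * x[i - 1];
        if (i < n - 1)
            r[i] -= e[i] * x[i + 1];
    }
    ldlt_tridiag_apply(n, df, ef, r);
    for (int i = 0; i < n; ++i)
        x[i] += r[i];
}
```

The sketch keeps the original d and e separate from the factors df and ef, which mirrors why ?ptsvx takes both sets of arrays: the residual must be formed with the unfactored matrix.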
Input Parameters
e, ef, b Arrays: e (size n - 1), ef (size n - 1), b of size max(1, ldb*nrhs) for column
major layout and max(1, ldb*n) for row major layout. The array e
contains the (n - 1) subdiagonal elements of the tridiagonal matrix
A.
The array ef is an input argument if fact = 'F' and on entry
contains the (n - 1) subdiagonal elements of the unit bidiagonal
factor L from the L*D*L^T (real)/L*D*L^H (complex) factorization of A.
The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.
ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.
ldx The leading dimension of x; ldx≥ max(1, n) for column major layout
and ldx≥nrhs for row major layout.
Output Parameters
x Array, size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout.
If info = 0 or info = n+1, the array x contains the solution matrix
X to the system of equations.
df, ef These arrays are output arguments if fact = 'N'. See the
description of df, ef in the Input Parameters section.
ferr Array, size at least max(1, nrhs). Contains the estimated forward
error bound for each solution vector xj (the j-th column of the
solution matrix X). If xtrue is the true solution corresponding to xj,
ferr[j-1] is an estimated upper bound for the magnitude of the largest
element in (xj - xtrue) divided by the magnitude of the largest
element in xj. The estimate is as reliable as the estimate for rcond,
and is almost always a slight overestimate of the true error.
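When a reference solution is known (for example, in a test), the quantity that ferr bounds can be computed directly and compared with the returned estimate. The helper below is a hypothetical sketch for one real solution vector; it is not part of the library.

```c
#include <assert.h>

/* Hypothetical helper: returns max_i |x[i] - xtrue[i]| / max_i |x[i]|,
 * the quantity that ferr[j-1] is described as bounding for column j.
 * Assumes x has at least one nonzero element. */
double relative_forward_error(int n, const double *x, const double *xtrue)
{
    assert(n > 0);
    double max_diff = 0.0;
    double max_x = 0.0;
    for (int i = 0; i < n; ++i) {
        double diff = x[i] - xtrue[i];
        if (diff < 0.0)
            diff = -diff;
        double ax = x[i] < 0.0 ? -x[i] : x[i];
        if (diff > max_diff)
            max_diff = diff;
        if (ax > max_x)
            max_x = ax;
    }
    assert(max_x > 0.0);
    return max_diff / max_x;
}
```

For x = (1, 2) and xtrue = (1, 1.5) the result is 0.25. In a test with a known solution, the returned ferr[j-1] would be expected to be at least as large as this value in almost all cases.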
Return Values
This function returns a value info.
If info = i, and i ≤ n, the leading minor of order i (and therefore the matrix A itself) is not positive-definite,
so the factorization could not be completed, and the solution and error bounds could not be computed;
rcond = 0 is returned.
If info = i, and i = n + 1, then U is nonsingular, but rcond is less than machine precision, meaning that the
matrix is singular to working precision. Nevertheless, the solution and error bounds are computed because
there are a number of situations where the computed solution can be more accurate than the value of rcond
would suggest.
See Also
Matrix Storage Schemes
?sysv
Computes the solution to the system of linear
equations with a real or complex symmetric coefficient
matrix A and multiple right-hand sides.
Syntax
lapack_int LAPACKE_ssysv( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
float* a, lapack_int lda, lapack_int* ipiv, float* b, lapack_int ldb );
lapack_int LAPACKE_dsysv( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
double* a, lapack_int lda, lapack_int* ipiv, double* b, lapack_int ldb );
lapack_int LAPACKE_csysv( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
lapack_complex_float* a, lapack_int lda, lapack_int* ipiv, lapack_complex_float* b,
lapack_int ldb );
lapack_int LAPACKE_zsysv( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
lapack_complex_double* a, lapack_int lda, lapack_int* ipiv, lapack_complex_double* b,
lapack_int ldb );
Include Files
• mkl.h
Description
The routine solves for X the real or complex system of linear equations A*X = B, where A is an n-by-n
symmetric matrix, the columns of matrix B are individual right-hand sides, and the columns of X are the
corresponding solutions.
The diagonal pivoting method is used to factor A as A = U*D*U^T or A = L*D*L^T, where U (or L) is a product
of permutation and unit upper (lower) triangular matrices, and D is symmetric and block diagonal with
1-by-1 and 2-by-2 diagonal blocks.
The factored form of A is then used to solve the system of equations A*X = B.
Input Parameters
a, b Arrays: a (size max(1, lda*n)), b of size max(1, ldb*nrhs) for column
major layout and max(1, ldb*n) for row major layout.
The array a contains the upper or the lower triangular part of the
symmetric matrix A (see uplo).
The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.
ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.
Output Parameters
ipiv Array, size at least max(1, n). Contains details of the interchanges
and the block structure of D, as determined by ?sytrf.
Return Values
This function returns a value info.
If info = i, D(i,i) is 0. The factorization has been completed, but D is exactly singular, so the solution could
not be computed.
See Also
Matrix Storage Schemes
?sysv_aa
Computes the solution to a system of linear equations
A * X = B for symmetric matrices.
lapack_int LAPACKE_ssysv_aa (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, float * A, lapack_int lda, lapack_int * ipiv, float * B, lapack_int ldb);
lapack_int LAPACKE_dsysv_aa (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, double * A, lapack_int lda, lapack_int * ipiv, double * B, lapack_int ldb);
lapack_int LAPACKE_csysv_aa (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, lapack_complex_float * A, lapack_int lda, lapack_int * ipiv, lapack_complex_float
* B, lapack_int ldb);
lapack_int LAPACKE_zsysv_aa (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, lapack_complex_double * A, lapack_int lda, lapack_int * ipiv,
lapack_complex_double * B, lapack_int ldb);
Description
The ?sysv_aa routine computes the solution to a real or complex system of linear equations A * X = B, where A is an
n-by-n symmetric matrix and X and B are n-by-nrhs matrices.
Aasen's algorithm is used to factor A as A = U * T * U^T, if uplo = 'U', or A = L * T * L^T, if uplo = 'L',
where U (or L) is a product of permutation and unit upper (lower) triangular matrices, and T is symmetric
tridiagonal. The factored form of A is then used to solve the system of equations A * X = B.
Input Parameters
n The number of linear equations; that is, the order of the matrix A. n ≥ 0.
nrhs The number of right-hand sides; that is, the number of columns of the
matrix B. nrhs ≥ 0.
ldb The leading dimension of the array B. ldb ≥ max(1, n) for column-major
layout and ldb ≥ nrhs for row-major layout.
Output Parameters
ipiv Array of size n. On exit, it contains the details of the interchanges; that is,
the row and column k of A were interchanged with the row and column
ipiv(k).
Return Values
This function returns a value info.
= 0: Successful exit.
< 0: If info = -i, the ith argument had an illegal value.
> 0: If info = i, D(i,i) is exactly zero. The factorization has been completed, but the block diagonal matrix D
is exactly singular, so the solution could not be computed.
?sysv_rook
Computes the solution to the system of linear
equations with a real or complex symmetric coefficient
matrix A and multiple right-hand sides.
Syntax
lapack_int LAPACKE_ssysv_rook( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
float* a, lapack_int lda, lapack_int* ipiv, float* b, lapack_int ldb );
lapack_int LAPACKE_dsysv_rook( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
double* a, lapack_int lda, lapack_int* ipiv, double* b, lapack_int ldb );
lapack_int LAPACKE_csysv_rook( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
lapack_complex_float* a, lapack_int lda, lapack_int* ipiv, lapack_complex_float* b,
lapack_int ldb );
lapack_int LAPACKE_zsysv_rook( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
lapack_complex_double* a, lapack_int lda, lapack_int* ipiv, lapack_complex_double* b,
lapack_int ldb );
Include Files
• mkl.h
Description
The routine solves for X the real or complex system of linear equations A*X = B, where A is an n-by-n
symmetric matrix, the columns of matrix B are individual right-hand sides, and the columns of X are the
corresponding solutions.
The diagonal pivoting method is used to factor A as A = U*D*U^T or A = L*D*L^T, where U (or L) is a product
of permutation and unit upper (lower) triangular matrices, and D is symmetric and block diagonal with
1-by-1 and 2-by-2 diagonal blocks.
The ?sytrf_rook routine is called to compute the factorization of a real or complex symmetric matrix A using the
bounded Bunch-Kaufman ("rook") diagonal pivoting method.
The factored form of A is then used to solve the system of equations A*X = B.
Input Parameters
uplo Must be 'U' or 'L'.
a, b Arrays: a (size max(1, lda*n)), b of size max(1, ldb*nrhs) for column
major layout and max(1, ldb*n) for row major layout.
The array a contains the upper or the lower triangular part of the
symmetric matrix A (see uplo). The second dimension of a must be at
least max(1, n).
The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations. The second dimension of b must
be at least max(1,nrhs).
ldb The leading dimension of b; ldb ≥ max(1, n) for column major layout
and ldb ≥ nrhs for row major layout.
Output Parameters
ipiv Array, size at least max(1, n). Contains details of the interchanges
and the block structure of D.
If ipiv[k - 1] > 0, then rows and columns k and ipiv[k - 1] were
interchanged and D(k, k) is a 1-by-1 diagonal block.
If uplo = 'U' and ipiv[k - 1] < 0 and ipiv[k - 2] < 0, then
rows and columns k and -ipiv[k - 1] were interchanged, rows and
columns k - 1 and -ipiv[k - 2] were interchanged, and D(k-1:k, k-1:k) is
a 2-by-2 diagonal block.
If uplo = 'L' and ipiv[k - 1] < 0 and ipiv[k] < 0, then rows
and columns k and -ipiv[k - 1] were interchanged, rows and columns
k + 1 and -ipiv[k] were interchanged, and D(k:k+1, k:k+1) is a 2-by-2
diagonal block.
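The block structure encoded in ipiv can be recovered by walking the array and treating each negative pair as one 2-by-2 block. The sketch below is a hypothetical helper that assumes the walk direction matches the order of the factorization (from the last row upward for uplo = 'U', from the first row downward for uplo = 'L') and uses int in place of lapack_int.

```c
#include <assert.h>

/* Hypothetical helper: count the 1-by-1 and 2-by-2 diagonal blocks of D
 * from a rook-pivoting ipiv array. k is the 1-based index used in the
 * parameter description; ipiv itself is a 0-based C array. */
void count_rook_blocks(char uplo, int n, const int *ipiv,
                       int *n1x1, int *n2x2)
{
    assert(uplo == 'U' || uplo == 'L');
    *n1x1 = 0;
    *n2x2 = 0;
    if (uplo == 'U') {
        int k = n;
        while (k >= 1) {
            if (ipiv[k - 1] > 0) {          /* 1-by-1 block D(k,k) */
                *n1x1 += 1;
                k -= 1;
            } else {                        /* 2-by-2 block D(k-1:k, k-1:k) */
                assert(k >= 2 && ipiv[k - 2] < 0);
                *n2x2 += 1;
                k -= 2;
            }
        }
    } else {
        int k = 1;
        while (k <= n) {
            if (ipiv[k - 1] > 0) {          /* 1-by-1 block D(k,k) */
                *n1x1 += 1;
                k += 1;
            } else {                        /* 2-by-2 block D(k:k+1, k:k+1) */
                assert(k < n && ipiv[k] < 0);
                *n2x2 += 1;
                k += 2;
            }
        }
    }
}
```

For uplo = 'L' and ipiv = {1, -3, -2, 4}, the helper reports two 1-by-1 blocks and one 2-by-2 block occupying rows and columns 2 and 3.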
Return Values
This function returns a value info.
If info = i, D(i,i) is 0. The factorization has been completed, but D is exactly singular, so the solution could
not be computed.
See Also
Matrix Storage Schemes
?sysv_rk
Computes the solution to a system of linear equations
A * X = B for symmetric matrices.
lapack_int LAPACKE_ssysv_rk (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, float * A, lapack_int lda, float * e, lapack_int * ipiv, float * B, lapack_int
ldb);
lapack_int LAPACKE_dsysv_rk (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, double * A, lapack_int lda, double * e, lapack_int * ipiv, double * B, lapack_int
ldb);
lapack_int LAPACKE_csysv_rk (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, lapack_complex_float * A, lapack_int lda, lapack_complex_float * e, lapack_int *
ipiv, lapack_complex_float * B, lapack_int ldb);
lapack_int LAPACKE_zsysv_rk (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, lapack_complex_double * A, lapack_int lda, lapack_complex_double * e, lapack_int
* ipiv, lapack_complex_double * B, lapack_int ldb);
Description
?sysv_rk computes the solution to a real or complex system of linear equations A * X = B, where A is an n-
by-n symmetric matrix and X and B are n-by-nrhs matrices.
The bounded Bunch-Kaufman (rook) diagonal pivoting method is used to factor A as A = P*U*D*(U^T)*(P^T), if
uplo = 'U', or A = P*L*D*(L^T)*(P^T), if uplo = 'L', where U (or L) is a unit upper (or lower) triangular matrix,
U^T (or L^T) is the transpose of U (or L), P is a permutation matrix, P^T is the transpose of P, and D is symmetric
and block diagonal with 1-by-1 and 2-by-2 diagonal blocks.
?sytrf_rk is called to compute the factorization of a real or complex symmetric matrix. The factored form of
A is then used to solve the system of equations A * X = B by calling the BLAS3-based routine ?sytrs_3.
Input Parameters
uplo Specifies whether the upper or lower triangular part of the symmetric
matrix A is stored:
n The number of linear equations; that is, the order of the matrix A. n ≥ 0.
nrhs The number of right-hand sides; that is, the number of columns of the
matrix B. nrhs ≥ 0.
not referenced. If uplo = 'L', the leading n-by-n lower triangular part of A
contains the lower triangular part of the matrix A, and the strictly upper
triangular part of A is not referenced.
ldb The leading dimension of the array B. ldb ≥ max(1, n) for column-major
layout and ldb ≥ nrhs for row-major layout.
Output Parameters
A On exit, if info = 0, the diagonal of the block diagonal matrix D and factors
U or L as computed by ?sytrf_rk:
ipiv Array of size n. Details of the interchanges and the block structure of D, as
determined by ?sytrf_rk. For more information, see the description of
the ?sytrf_rk routine.
Return Values
This function returns a value info.
= 0: Successful exit.
< 0: If info = -k, the kth argument had an illegal value.
> 0: If info = k, the matrix A is singular. If uplo = 'U', column k in the upper triangular part of A contains
all zeros. If uplo = 'L', column k in the lower triangular part of A contains all zeros. Therefore D(k,k) is
exactly zero, and superdiagonal elements of column k of U (or subdiagonal elements of column k of L) are all
zeros. The factorization has been completed, but the block diagonal matrix D is exactly singular, and division
by zero will occur if it is used to solve a system of equations.
?sysvx
Uses the diagonal pivoting factorization to compute
the solution to the system of linear equations with a
real or complex symmetric coefficient matrix A, and
provides error bounds on the solution.
Syntax
lapack_int LAPACKE_ssysvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, const float* a, lapack_int lda, float* af, lapack_int ldaf,
lapack_int* ipiv, const float* b, lapack_int ldb, float* x, lapack_int ldx, float*
rcond, float* ferr, float* berr );
lapack_int LAPACKE_dsysvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, const double* a, lapack_int lda, double* af, lapack_int ldaf,
lapack_int* ipiv, const double* b, lapack_int ldb, double* x, lapack_int ldx, double*
rcond, double* ferr, double* berr );
lapack_int LAPACKE_csysvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, const lapack_complex_float* a, lapack_int lda, lapack_complex_float*
af, lapack_int ldaf, lapack_int* ipiv, const lapack_complex_float* b, lapack_int ldb,
lapack_complex_float* x, lapack_int ldx, float* rcond, float* ferr, float* berr );
lapack_int LAPACKE_zsysvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, const lapack_complex_double* a, lapack_int lda, lapack_complex_double*
af, lapack_int ldaf, lapack_int* ipiv, const lapack_complex_double* b, lapack_int ldb,
lapack_complex_double* x, lapack_int ldx, double* rcond, double* ferr, double* berr );
Include Files
• mkl.h
Description
The routine uses the diagonal pivoting factorization to compute the solution to a real or complex system of
linear equations A*X = B, where A is an n-by-n symmetric matrix, the columns of matrix B are individual
right-hand sides, and the columns of X are the corresponding solutions.
Error bounds on the solution and a condition estimate are also provided.
The routine ?sysvx performs the following steps:
1. If fact = 'N', the diagonal pivoting method is used to factor the matrix A. The form of the
factorization is A = U*D*U^T or A = L*D*L^T, where U (or L) is a product of permutation and unit upper
(lower) triangular matrices, and D is symmetric and block diagonal with 1-by-1 and 2-by-2 diagonal
blocks.
2. If some D(i,i) = 0, so that D is exactly singular, then the routine returns with info = i. Otherwise, the
factored form of A is used to estimate the condition number of the matrix A. If the reciprocal of the
condition number is less than machine precision, info = n+1 is returned as a warning, but the routine
still goes on to solve for X and compute error bounds as described below.
3. The system of equations is solved for X using the factored form of A.
4. Iterative refinement is applied to improve the computed solution matrix and calculate error bounds and
backward error estimates for it.
Input Parameters
fact Must be 'F' or 'N'.
Specifies whether or not the factored form of the matrix A has been
supplied on entry.
If fact = 'F': on entry, af and ipiv contain the factored form of A.
Arrays a, af, and ipiv will not be modified.
If fact = 'N', the matrix A will be copied to af and factored.
a, af, b Arrays: a (size max(1, lda*n)), af (size max(1, ldaf*n)), b of size
max(1, ldb*nrhs) for column major layout and max(1, ldb*n) for
row major layout.
The array a contains the upper or the lower triangular part of the
symmetric matrix A (see uplo).
The array af is an input argument if fact = 'F'. It contains the block
diagonal matrix D and the multipliers used to obtain the factor U or L
from the factorization A = U*D*U^T or A = L*D*L^T as computed
by ?sytrf.
The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.
ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.
ipiv Array, size at least max(1, n). The array ipiv is an input argument if
fact = 'F'. It contains details of the interchanges and the block
structure of D, as determined by ?sytrf.
ldx The leading dimension of the output array x; ldx≥ max(1, n) for
column major layout and ldx≥nrhs for row major layout.
Output Parameters
x Array, size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout.
If info = 0 or info = n+1, the array x contains the solution matrix
X to the system of equations.
ferr Array, size at least max(1, nrhs). Contains the estimated forward
error bound for each solution vector xj (the j-th column of the solution
matrix X). If xtrue is the true solution corresponding to xj, ferr[j-1]
is an estimated upper bound for the magnitude of the largest element
in (xj - xtrue) divided by the magnitude of the largest element in xj.
The estimate is as reliable as the estimate for rcond, and is almost
always a slight overestimate of the true error.
Return Values
This function returns a value info.
If info = i, and i ≤ n, then D(i,i) is exactly zero. The factorization has been completed, but the block diagonal
matrix D is exactly singular, so the solution and error bounds could not be computed; rcond = 0 is returned.
If info = i, and i = n + 1, then D is nonsingular, but rcond is less than machine precision, meaning that the
matrix is singular to working precision. Nevertheless, the solution and error bounds are computed because
there are a number of situations where the computed solution can be more accurate than the value of rcond
would suggest.
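A caller of an expert driver such as ?sysvx or ?ptsvx typically needs to separate the hard failure (1 ≤ info ≤ n) from the ill-conditioning warning (info = n + 1), because x is still populated in the second case. The sketch below shows one way to do this; the enum and function names are hypothetical, and int stands in for lapack_int. The negative branch relies on the usual LAPACKE convention that info = -i flags an illegal i-th argument.

```c
/* Hypothetical status codes for this sketch only. */
typedef enum {
    SOLVE_OK,               /* info = 0                                  */
    SOLVE_BAD_ARGUMENT,     /* info < 0: argument -info was illegal      */
    SOLVE_FACTOR_FAILED,    /* 1 <= info <= n: no solution, rcond = 0    */
    SOLVE_ILL_CONDITIONED   /* info = n + 1: solution computed, warning  */
} solve_status;

/* Classify the info value returned by an expert driver for an
 * n-by-n system, following the Return Values description. */
solve_status classify_expert_info(int info, int n)
{
    if (info < 0)
        return SOLVE_BAD_ARGUMENT;
    if (info == 0)
        return SOLVE_OK;
    if (info <= n)
        return SOLVE_FACTOR_FAILED;
    return SOLVE_ILL_CONDITIONED;
}

/* The solution array x holds valid output only in these two cases. */
int solution_is_available(solve_status status)
{
    return status == SOLVE_OK || status == SOLVE_ILL_CONDITIONED;
}
```

A caller might treat SOLVE_ILL_CONDITIONED as a cue to inspect rcond, ferr, and berr before trusting x, rather than discarding the result outright.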
See Also
Matrix Storage Schemes
?sysvxx
Uses extra precise iterative refinement to compute the
solution to the system of linear equations with a
symmetric indefinite coefficient matrix A applying the
diagonal pivoting factorization.
Syntax
lapack_int LAPACKE_ssysvxx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, float* a, lapack_int lda, float* af, lapack_int ldaf, lapack_int* ipiv,
char* equed, float* s, float* b, lapack_int ldb, float* x, lapack_int ldx, float* rcond,
float* rpvgrw, float* berr, lapack_int n_err_bnds, float* err_bnds_norm, float*
err_bnds_comp, lapack_int nparams, const float* params );
lapack_int LAPACKE_dsysvxx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, double* a, lapack_int lda, double* af, lapack_int ldaf, lapack_int*
ipiv, char* equed, double* s, double* b, lapack_int ldb, double* x, lapack_int ldx,
double* rcond, double* rpvgrw, double* berr, lapack_int n_err_bnds, double*
err_bnds_norm, double* err_bnds_comp, lapack_int nparams, const double* params );
lapack_int LAPACKE_csysvxx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, lapack_complex_float* a, lapack_int lda, lapack_complex_float* af,
lapack_int ldaf, lapack_int* ipiv, char* equed, float* s, lapack_complex_float* b,
lapack_int ldb, lapack_complex_float* x, lapack_int ldx, float* rcond, float* rpvgrw,
float* berr, lapack_int n_err_bnds, float* err_bnds_norm, float* err_bnds_comp,
lapack_int nparams, const float* params );
lapack_int LAPACKE_zsysvxx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, lapack_complex_double* a, lapack_int lda, lapack_complex_double* af,
lapack_int ldaf, lapack_int* ipiv, char* equed, double* s, lapack_complex_double* b,
lapack_int ldb, lapack_complex_double* x, lapack_int ldx, double* rcond, double*
rpvgrw, double* berr, lapack_int n_err_bnds, double* err_bnds_norm, double*
err_bnds_comp, lapack_int nparams, const double* params );
Include Files
• mkl.h
Description
The routine uses the diagonal pivoting factorization to compute the solution to a real or complex system of
linear equations A*X = B, where A is an n-by-n real symmetric/Hermitian matrix, the columns of matrix B
are individual right-hand sides, and the columns of X are the corresponding solutions.
Both normwise and maximum componentwise error bounds are also provided on request. The routine returns
a solution with a small guaranteed error (O(eps), where eps is the working machine precision) unless the
matrix is very ill-conditioned, in which case a warning is returned. Relevant condition numbers are also
calculated and returned.
The routine accepts user-provided factorizations and equilibration factors; see definitions of the fact and
equed options. Solving with refinement and using a factorization from a previous call of the routine also
produces a solution with O(eps) errors or warnings but that may not be true for general user-provided
factorizations and equilibration factors if they differ from what the routine would itself produce.
The routine ?sysvxx performs the following steps:
where U or L is a product of permutation and unit upper (lower) triangular matrices, and D is
symmetric and block diagonal with 1-by-1 and 2-by-2 diagonal blocks.
3. If some D(i,i) = 0, so that D is exactly singular, the routine returns with info = i. Otherwise, the
factored form of A is used to estimate the condition number of the matrix A (see the rcond parameter).
If the reciprocal of the condition number is less than machine precision, the routine still goes on to
solve for X and compute error bounds.
4. The system of equations is solved for X using the factored form of A.
5. By default, unless params[0] is set to zero, the routine applies iterative refinement to get a small error
and error bounds. Refinement calculates the residual to at least twice the working precision.
6. If equilibration was used, the matrix X is premultiplied by diag(r) so that it solves the original system
before equilibration.
Input Parameters
The array af is an input argument if fact = 'F'. It contains the
block diagonal matrix D and the multipliers used to obtain the factor U
or L from the factorization A = U*D*U^T or A = L*D*L^T as computed
by ?sytrf.
The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.
ipiv Array, size at least max(1, n). The array ipiv is an input argument if
fact = 'F'. It contains details of the interchanges and the block
structure of D as determined by ?sytrf. If ipiv[k-1] > 0, rows and
columns k and ipiv[k-1] were interchanged and D(k,k) is a 1-by-1
diagonal block.
If uplo = 'U' and ipiv[i] = ipiv[i - 1] = m < 0, D has a 2-by-2
diagonal block in rows and columns i and i + 1, and the i-th row
and column of A were interchanged with the m-th row and column.
If uplo = 'L' and ipiv[i] = ipiv[i - 1] = m < 0, D has a 2-by-2
diagonal block in rows and columns i and i + 1, and the (i + 1)-st
row and column of A were interchanged with the m-th row and
column.
s Array, size (n). The array s contains the scale factors for A. If equed
= 'Y', A is multiplied on the left and right by diag(s).
This array is an input argument if fact = 'F' only; otherwise it is an
output argument.
If fact = 'F' and equed = 'Y', each element of s must be positive.
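The two-sided scaling by diag(s) replaces each element A(i,j) with s[i]*A(i,j)*s[j], which preserves symmetry. The sketch below applies it to a full n-by-n array for clarity; the function name is hypothetical, and a real symmetric routine would touch only the triangle selected by uplo.

```c
#include <assert.h>

/* Hypothetical helper: overwrite A with diag(s)*A*diag(s).
 * a is stored as a full n-by-n array with leading dimension lda.
 * Because the scaled matrix stays symmetric, the same loop is valid
 * for column major and row major storage. */
void scale_symmetric(int n, double *a, int lda, const double *s)
{
    assert(lda >= n);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < n; ++i) {
            a[i + j * lda] *= s[i] * s[j];
        }
    }
}
```

With A = [4 2; 2 9] and s = (0.5, 0.25), the scaled matrix is [1 0.25; 0.25 0.5625]. Every element of s must be positive, as required above, so the scaling never changes the sign of an element of A.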
ldb The leading dimension of the array b; ldb≥ max(1, n) for column
major layout and ldb≥nrhs for row major layout.
ldx The leading dimension of the output array x; ldx≥ max(1, n) for
column major layout and ldx≥nrhs for row major layout.
n_err_bnds Number of error bounds to return for each right hand side and each
type (normwise or componentwise). See err_bnds_norm and
err_bnds_comp descriptions in the Output Parameters section below.
Default 10.0
Output Parameters
x Array, size max(1, ldx*nrhs) for column major layout and max(1, ldx*n)
for row major layout.
If info = 0, the array x contains the solution n-by-nrhs matrix X to the
original system of equations. Note that A and B are modified on exit if
equed≠'N', and the solution to the equilibrated system is:
inv(diag(s))*X.
af If fact = 'N', af is an output argument and on exit returns the block
diagonal matrix D and the multipliers used to obtain the factor U or L from
the factorization A = U*D*U^T or A = L*D*L^T.
berr Array, size at least max(1, nrhs). Contains the componentwise relative
backward error for each solution vector xj, that is, the smallest relative
change in any element of A or B that makes xj an exact solution.
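One common way to evaluate a componentwise relative backward error for a single solution vector is max over i of |b - A*x|(i) / (|A|*|x| + |b|)(i). The sketch below computes that ratio for a dense real matrix in column major storage. It is an assumption that this matches the library's internal computation in detail; the function name is hypothetical, and the safeguards a production routine applies to tiny denominators are omitted.

```c
#include <assert.h>

/* Hypothetical helper: componentwise relative backward error of x as an
 * approximate solution of A*x = b. a is n-by-n, column major, leading
 * dimension lda. Rows whose denominator is zero are skipped. */
double componentwise_backward_error(int n, const double *a, int lda,
                                    const double *x, const double *b)
{
    assert(n > 0 && lda >= n);
    double berr = 0.0;
    for (int i = 0; i < n; ++i) {
        double r = b[i];                             /* residual b - A*x  */
        double denom = b[i] < 0.0 ? -b[i] : b[i];    /* |A|*|x| + |b|     */
        for (int j = 0; j < n; ++j) {
            double t = a[i + j * lda] * x[j];
            r -= t;
            denom += t < 0.0 ? -t : t;
        }
        if (r < 0.0)
            r = -r;
        if (denom > 0.0 && r / denom > berr)
            berr = r / denom;
    }
    return berr;
}
```

For A = diag(2, 4), x = (1, 1), and b = (3, 4), the first row gives a residual of 1 against a denominator of 5, so the result is 0.2; an exact solution gives 0.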
The information for right-hand side i, where 1 ≤ i ≤ nrhs, and type of error
err is stored in err_bnds_norm[(err-1)*nrhs + i - 1].
err=3 Reciprocal condition number. Estimated
componentwise reciprocal condition number.
Compared with the threshold
sqrt(n)*slamch(ε) for single precision flavors
and sqrt(n)*dlamch(ε) for double precision
flavors to determine if the error estimate is
"guaranteed". These reciprocal condition
numbers for some appropriately scaled matrix Z
are:
The information for right-hand side i, where 1 ≤ i ≤ nrhs, and type of error
err is stored in err_bnds_comp[(err-1)*nrhs + i - 1].
ipiv If fact = 'N', ipiv is an output argument and on exit contains details of
the interchanges and the block structure of D, as determined by ssytrf for
single precision flavors and dsytrf for double precision flavors.
params If an entry is less than 0.0, that entry is filled with the default value used
for that parameter, otherwise the entry is not modified.
Return Values
This function returns a value info.
If info = 0, the execution is successful. The solution to every right-hand side is guaranteed.
If 0 < info ≤ n: U(info,info) is exactly zero. The factorization has been completed, but the factor U is exactly
singular, so the solution and error bounds could not be computed; rcond = 0 is returned.
If info = n+j: The solution corresponding to the j-th right-hand side is not guaranteed. The solutions
corresponding to other right-hand sides k with k > j may not be guaranteed as well, but only the first such
right-hand side is reported. If a small componentwise error is not requested (params[2] = 0.0), then the j-th
right-hand side is the first with a normwise error bound that is not guaranteed (the smallest j such that for
column major layout err_bnds_norm[j - 1] = 0.0 or err_bnds_comp[j - 1] = 0.0; or for row major
layout err_bnds_norm[(j - 1)*n_err_bnds] = 0.0 or err_bnds_comp[(j - 1)*n_err_bnds] = 0.0).
See the definition of err_bnds_norm and err_bnds_comp for err = 1. To get information about all of the
right-hand sides, check err_bnds_norm or err_bnds_comp.
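The err = 1 lookup described above can be sketched in C. This is only an illustration under stated assumptions: the names trust_index and first_unguaranteed_rhs are hypothetical, the bound arrays are taken as double, and the two layout constants are local stand-ins carrying the same values as LAPACK_ROW_MAJOR (101) and LAPACK_COL_MAJOR (102).

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins with the same values as LAPACK_ROW_MAJOR and LAPACK_COL_MAJOR. */
enum { ROW_MAJOR = 101, COL_MAJOR = 102 };

/* Position of the err = 1 entry for right-hand side j (1-based), following
   the two indexing rules quoted in the Return Values text. */
size_t trust_index(int layout, int j, int n_err_bnds)
{
    assert(j >= 1 && n_err_bnds >= 1);
    if (layout == COL_MAJOR)
        return (size_t)(j - 1);
    return (size_t)(j - 1) * (size_t)n_err_bnds;
}

/* Returns the smallest 1-based j whose normwise or componentwise err = 1
   entry equals 0.0, or 0 when no right-hand side is flagged. */
int first_unguaranteed_rhs(int layout, int nrhs, int n_err_bnds,
                           const double *err_bnds_norm,
                           const double *err_bnds_comp)
{
    for (int j = 1; j <= nrhs; ++j) {
        size_t k = trust_index(layout, j, n_err_bnds);
        if (err_bnds_norm[k] == 0.0 || err_bnds_comp[k] == 0.0)
            return j;
    }
    return 0;
}
```

Continuing the loop past the first hit instead of returning would list every flagged right-hand side, which is the information that info alone does not carry.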
See Also
Matrix Storage Schemes
?hesv
Computes the solution to the system of linear
equations with a Hermitian matrix A and multiple
right-hand sides.
Syntax
lapack_int LAPACKE_chesv (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , lapack_complex_float * a , lapack_int lda , lapack_int * ipiv ,
lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_zhesv (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , lapack_complex_double * a , lapack_int lda , lapack_int * ipiv ,
lapack_complex_double * b , lapack_int ldb );
Include Files
• mkl.h
Description
The routine solves for X the complex system of linear equations A*X = B, where A is an n-by-n Hermitian
matrix, the columns of matrix B are individual right-hand sides, and the columns of X are the corresponding
solutions.
The diagonal pivoting method is used to factor A as A = U*D*U^H or A = L*D*L^H, where U (or L) is a product
of permutation and unit upper (lower) triangular matrices, and D is Hermitian and block diagonal with 1-by-1
and 2-by-2 diagonal blocks.
The factored form of A is then used to solve the system of equations A*X = B.
Input Parameters
If uplo = 'L', the array a stores the lower triangular part of the
matrix A, and A is factored as L*D*L^H.
lda The leading dimension of a; lda≥ max(1, n).
ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.
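The leading-dimension rule for b can be written out as a small helper. This is a sketch, not library code: min_ldb and rhs_index are hypothetical names, the layout constants are local stand-ins with the values of LAPACK_ROW_MAJOR (101) and LAPACK_COL_MAJOR (102), and clamping the row-major result to at least 1 is a conservative assumption rather than something the text above states.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins with the same values as LAPACK_ROW_MAJOR and LAPACK_COL_MAJOR. */
enum { ROW_MAJOR = 101, COL_MAJOR = 102 };

/* Smallest leading dimension that satisfies the stated bound for an
   n-by-nrhs right-hand-side array. */
int min_ldb(int layout, int n, int nrhs)
{
    assert(n >= 0 && nrhs >= 0);
    if (layout == COL_MAJOR)
        return n > 1 ? n : 1;          /* ldb >= max(1, n) */
    return nrhs > 1 ? nrhs : 1;        /* ldb >= nrhs, clamped to 1 (assumption) */
}

/* Flat offset of B(i,j), 0-based, inside b for the given layout. */
size_t rhs_index(int layout, int i, int j, int ldb)
{
    if (layout == COL_MAJOR)
        return (size_t)i + (size_t)j * (size_t)ldb;
    return (size_t)i * (size_t)ldb + (size_t)j;
}
```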
Output Parameters
ipiv Array, size at least max(1, n). Contains details of the interchanges
and the block structure of D, as determined by ?hetrf.
Return Values
This function returns a value info.
If info = i, dii is 0. The factorization has been completed, but D is exactly singular, so the solution could
not be computed.
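One way to sanity-check a solution obtained with info = 0 is to form the residual A*x - b from a saved copy of A, reading only the triangle selected by uplo. The sketch below makes several assumptions: column-major storage, a local struct cplx standing in for the library's complex type, hypothetical helper names, and |Re| + |Im| used as the size of a residual entry. Note that the routine overwrites a and b on exit, so the inputs here must be copies taken before the call.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a double-precision complex element (assumption). */
typedef struct { double real, imag; } cplx;

/* A(i,j), 0-based, of a Hermitian matrix whose uplo triangle is stored
   column-major in a; the other triangle is implied by conjugation. */
cplx herm_elem(char uplo, const cplx *a, int lda, int i, int j)
{
    int stored = (uplo == 'U') ? (i <= j) : (i >= j);
    if (stored)
        return a[(size_t)i + (size_t)j * (size_t)lda];
    cplx t = a[(size_t)j + (size_t)i * (size_t)lda];
    t.imag = -t.imag;
    return t;
}

static double abs_d(double v) { return v < 0.0 ? -v : v; }

/* Largest |Re r_i| + |Im r_i| over the residual r = A*x - b for one
   right-hand side. */
double max_residual(char uplo, int n, const cplx *a, int lda,
                    const cplx *x, const cplx *b)
{
    assert(uplo == 'U' || uplo == 'L');
    assert(lda >= (n > 1 ? n : 1));
    double worst = 0.0;
    for (int i = 0; i < n; ++i) {
        double re = -b[i].real, im = -b[i].imag;
        for (int j = 0; j < n; ++j) {
            cplx e = herm_elem(uplo, a, lda, i, j);
            re += e.real * x[j].real - e.imag * x[j].imag;
            im += e.real * x[j].imag + e.imag * x[j].real;
        }
        double m = abs_d(re) + abs_d(im);
        if (m > worst)
            worst = m;
    }
    return worst;
}
```

A residual that is small relative to the sizes of A, x, and b suggests the solve went as expected; it does not by itself bound the forward error.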
See Also
Matrix Storage Schemes
?hesv_aa
Computes the solution to a system of linear equations
for Hermitian matrices.
LAPACK_DECL lapack_int LAPACKE_chesv_aa (int matrix_layout, char uplo, lapack_int n,
lapack_int nrhs, lapack_complex_float * a, lapack_int lda, lapack_int * ipiv,
lapack_complex_float * b, lapack_int ldb );
LAPACK_DECL lapack_int LAPACKE_chesv_aa_work (int matrix_layout, char uplo, lapack_int
n, lapack_int nrhs, lapack_complex_float * a, lapack_int lda, lapack_int * ipiv,
lapack_complex_float * b, lapack_int ldb, lapack_complex_float * work, lapack_int
lwork );
Description
?hesv_aa computes the solution to a complex system of linear equations A * X = B, where A is an n-by-n
Hermitian matrix and X and B are n-by-nrhs matrices. Aasen's algorithm is used to factor A as
A = U * T * U^H if uplo = 'U', or
A = L * T * L^H if uplo = 'L',
where U (or L) is a product of permutation and unit upper (lower) triangular matrices, and T is Hermitian and
tridiagonal. The factored form of A is then used to solve the system of equations A * X = B.
Input Parameters
nrhs The number of right hand sides or the number of columns of the matrix B.
nrhs≥ 0.
If uplo = 'U', the leading n-by-n upper triangular part of a contains the
upper triangular part of the matrix A, and the strictly lower triangular part
of a is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of a contains the
lower triangular part of the matrix A, and the strictly upper triangular part
of a is not referenced.
b Array of size ldb*nrhs. On entry, the n-by-nrhs right hand side matrix B.
lwork The length of work. lwork≥ max(1, 2*n, 3*n-2), and for best performance
lwork≥ max(1,n*nb), where nb is the optimal blocksize for ?hetrf.
If lwork < n, TRS is done with Level 2 BLAS. If lwork≥n, TRS is done with
Level 3 BLAS.
If lwork = -1, then a workspace query is assumed; the routine only
calculates the optimal size of the work array, returns this value as the first
entry of the work array, and no error message related to lwork is issued by
xerbla.
Output Parameters
ipiv Array of size n. On exit, it contains the details of the interchanges: row
and column k of A were interchanged with the row and column ipiv[k].
work Array of size (max(1, lwork)). On exit, if info = 0, work[0] returns the
optimal lwork.
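The lwork bound and the workspace query can be combined as in the following sketch. The helper names are hypothetical, and the second function assumes the usual LAPACK convention that the queried size arrives in the real part of work[0]; it only applies when calling a variant that takes work and lwork explicitly.

```c
#include <assert.h>

/* Smallest lwork allowed by the bound lwork >= max(1, 2*n, 3*n-2). */
int hesv_aa_min_lwork(int n)
{
    assert(n >= 0);
    int lw = 1;
    if (2 * n > lw)
        lw = 2 * n;
    if (3 * n - 2 > lw)
        lw = 3 * n - 2;
    return lw;
}

/* Chooses the size to allocate after a query made with lwork = -1.
   work0_real is the real part of work[0] written by that query. */
int hesv_aa_choose_lwork(int n, float work0_real)
{
    int queried = (int)work0_real;
    int floor_lw = hesv_aa_min_lwork(n);
    return queried > floor_lw ? queried : floor_lw;
}
```

Taking the larger of the two values guards against a query result that is smaller than the documented minimum.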
Return Values
This function returns a value info.
If info < 0: if info = -i, the i-th argument had an illegal value.
If info > 0: if info = i, D(i,i) is exactly zero. The factorization has been completed, but the block diagonal
matrix D is exactly singular, so the solution could not be computed.
?hesv_rk
?hesv_rk computes the solution to a system of linear
equations A * X = B for Hermitian matrices.
lapack_int LAPACKE_chesv_rk (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, lapack_complex_float * A, lapack_int lda, lapack_complex_float * e, lapack_int *
ipiv, lapack_complex_float * B, lapack_int ldb);
lapack_int LAPACKE_zhesv_rk (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, lapack_complex_double * A, lapack_int lda, lapack_complex_double * e, lapack_int
* ipiv, lapack_complex_double * B, lapack_int ldb);
Description
?hesv_rk computes the solution to a complex system of linear equations A * X = B, where A is an n-by-n
Hermitian matrix and X and B are n-by-nrhs matrices.
The bounded Bunch-Kaufman (rook) diagonal pivoting method is used to factor A as A = P*U*D*(U^H)*(P^T), if
uplo = 'U', or A = P*L*D*(L^H)*(P^T), if uplo = 'L', where U (or L) is a unit upper (or lower) triangular
matrix, U^H (or L^H) is the conjugate transpose of U (or L), P is a permutation matrix, P^T is the transpose of P, and D is
Hermitian and block diagonal with 1-by-1 and 2-by-2 diagonal blocks.
?hetrf_rk is called to compute the factorization of a complex Hermitian matrix. The factored form of A is
then used to solve the system of equations A * X = B by calling the BLAS3-based routine ?hetrs_3.
Input Parameters
uplo Specifies whether the upper or lower triangular part of the Hermitian matrix
A is stored:
n The number of linear equations; that is, the order of the matrix A. n ≥ 0.
nrhs The number of right-hand sides; that is, the number of columns of the
matrix B. nrhs ≥ 0.
ldb The leading dimension of the array B. ldb ≥ max(1, n) for column-major
layout and ldb ≥ nrhs for row-major layout.
Output Parameters
—and—
• If uplo = 'U', factor U in the superdiagonal part of A. If uplo = 'L',
factor L in the subdiagonal part of A.
For more information, see the description of the ?hetrf_rk routine.
ipiv Array of size n. Details of the interchanges and the block structure of D, as
determined by ?hetrf_rk.
Return Values
This function returns a value info.
= 0: Successful exit.
< 0: If info = -k, the kth argument had an illegal value.
> 0: If info = k, the matrix A is singular. If uplo = 'U', column k in the upper triangular part of A contains
all zeros. If uplo = 'L', column k in the lower triangular part of A contains all zeros. Therefore D(k,k) is
exactly zero, and superdiagonal elements of column k of U (or subdiagonal elements of column k of L ) are
all zeros. The factorization has been completed, but the block diagonal matrix D is exactly singular, and
division by zero will occur if it is used to solve a system of equations.
?hesvx
Uses the diagonal pivoting factorization to compute
the solution to the complex system of linear equations
with a Hermitian coefficient matrix A, and provides
error bounds on the solution.
Syntax
lapack_int LAPACKE_chesvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, const lapack_complex_float* a, lapack_int lda, lapack_complex_float*
af, lapack_int ldaf, lapack_int* ipiv, const lapack_complex_float* b, lapack_int ldb,
lapack_complex_float* x, lapack_int ldx, float* rcond, float* ferr, float* berr );
lapack_int LAPACKE_zhesvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, const lapack_complex_double* a, lapack_int lda, lapack_complex_double*
af, lapack_int ldaf, lapack_int* ipiv, const lapack_complex_double* b, lapack_int ldb,
lapack_complex_double* x, lapack_int ldx, double* rcond, double* ferr, double* berr );
Include Files
• mkl.h
Description
The routine uses the diagonal pivoting factorization to compute the solution to a complex system of linear
equations A*X = B, where A is an n-by-n Hermitian matrix, the columns of matrix B are individual right-hand
sides, and the columns of X are the corresponding solutions.
Error bounds on the solution and a condition estimate are also provided.
The routine ?hesvx performs the following steps:
1. If fact = 'N', the diagonal pivoting method is used to factor the matrix A. The form of the
factorization is A = U*D*U^H or A = L*D*L^H, where U (or L) is a product of permutation and unit upper
(lower) triangular matrices, and D is Hermitian and block diagonal with 1-by-1 and 2-by-2 diagonal
blocks.
2. If some di,i = 0, so that D is exactly singular, then the routine returns with info = i. Otherwise, the
factored form of A is used to estimate the condition number of the matrix A. If the reciprocal of the
condition number is less than machine precision, info = n+1 is returned as a warning, but the routine
still goes on to solve for X and compute error bounds as described below.
3. The system of equations is solved for X using the factored form of A.
4. Iterative refinement is applied to improve the computed solution matrix and calculate error bounds and
backward error estimates for it.
Input Parameters
fact Specifies whether or not the factored form of the matrix A has been
supplied on entry.
If fact = 'F': on entry, af and ipiv contain the factored form of A.
Arrays a, af, and ipiv are not modified.
If fact = 'N', the matrix A is copied to af and factored.
If uplo = 'L', the array a stores the lower triangular part of the
Hermitian matrix A; A is factored as L*D*L^H.
a, af, b Arrays: a (size max(1, lda*n)), af (size max(1, ldaf*n)), b of size
max(1, ldb*nrhs) for column major layout and max(1, ldb*n) for
row major layout.
The array a contains the upper or the lower triangular part of the
Hermitian matrix A (see uplo).
The array af is an input argument if fact = 'F'. It contains the block
diagonal matrix D and the multipliers used to obtain the factor U or L
from the factorization A = U*D*U^H or A = L*D*L^H as computed
by ?hetrf.
The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.
ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.
ipiv Array, size at least max(1, n). The array ipiv is an input argument if
fact = 'F'. It contains details of the interchanges and the block
structure of D, as determined by ?hetrf.
ldx The leading dimension of the output array x; ldx≥ max(1, n) for
column major layout and ldx≥nrhs for row major layout.
Output Parameters
x Array, size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout.
If info = 0 or info = n+1, the array x contains the solution matrix
X to the system of equations.
af, ipiv These arrays are output arguments if fact = 'N'. See the
description of af, ipiv in the Input Parameters section.
ferr Array, size at least max(1, nrhs). Contains the estimated forward
error bound for each solution vector xj (the j-th column of the solution
matrix X). If xtrue is the true solution corresponding to xj, ferr[j-1]
is an estimated upper bound for the magnitude of the largest element
in (xj - xtrue) divided by the magnitude of the largest element in xj.
The estimate is as reliable as the estimate for rcond, and is almost
always a slight overestimate of the true error.
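The quantity that ferr[j-1] bounds can be computed directly when a reference solution is known, for example in a test harness. The sketch below uses real-valued vectors to stay dependency-free; for the complex flavors each absolute value would be a complex modulus. The function name is hypothetical.

```c
#include <assert.h>

static double abs_d(double v) { return v < 0.0 ? -v : v; }

/* Largest |xj[i] - xtrue[i]| divided by the largest |xj[i]|. */
double relative_forward_error(int n, const double *xj, const double *xtrue)
{
    assert(n >= 1);
    double max_diff = 0.0, max_x = 0.0;
    for (int i = 0; i < n; ++i) {
        double d = abs_d(xj[i] - xtrue[i]);
        if (d > max_diff)
            max_diff = d;
        if (abs_d(xj[i]) > max_x)
            max_x = abs_d(xj[i]);
    }
    assert(max_x > 0.0);   /* the ratio is undefined for a zero solution */
    return max_diff / max_x;
}
```

Comparing this value against ferr[j-1] on problems with a known answer gives a feel for how conservative the estimate is.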
Return Values
This function returns a value info.
If info = i, and i≤n, then dii is exactly zero. The factorization has been completed, but the block diagonal
matrix D is exactly singular, so the solution and error bounds could not be computed; rcond = 0 is returned.
If info = i, and i = n + 1, then D is nonsingular, but rcond is less than machine precision, meaning that the
matrix is singular to working precision. Nevertheless, the solution and error bounds are computed because
there are a number of situations where the computed solution can be more accurate than the value of rcond
would suggest.
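The return codes above fall into a few cases that calling code usually handles differently. The following is a sketch with hypothetical names; the negative branch assumes the common LAPACKE convention that info = -i reports an illegal i-th argument, which this section does not list explicitly.

```c
#include <assert.h>

enum solve_status {
    SOLVE_OK,               /* info = 0 */
    SOLVE_BAD_ARGUMENT,     /* info < 0 (assumed LAPACKE convention) */
    SOLVE_SINGULAR,         /* 0 < info <= n: no solution computed */
    SOLVE_ILL_CONDITIONED   /* info = n + 1: solution computed, rcond tiny */
};

enum solve_status classify_hesvx_info(int info, int n)
{
    assert(n >= 0 && info <= n + 1);
    if (info == 0)
        return SOLVE_OK;
    if (info < 0)
        return SOLVE_BAD_ARGUMENT;
    if (info <= n)
        return SOLVE_SINGULAR;
    return SOLVE_ILL_CONDITIONED;
}

/* The array x holds a solution only for info = 0 or info = n + 1. */
int solution_available(enum solve_status s)
{
    return s == SOLVE_OK || s == SOLVE_ILL_CONDITIONED;
}
```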
See Also
Matrix Storage Schemes
?hesvxx
Uses extra precise iterative refinement to compute the
solution to the system of linear equations with a
Hermitian indefinite coefficient matrix A applying the
diagonal pivoting factorization.
Syntax
lapack_int LAPACKE_chesvxx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, lapack_complex_float* a, lapack_int lda, lapack_complex_float* af,
lapack_int ldaf, lapack_int* ipiv, char* equed, float* s, lapack_complex_float* b,
Include Files
• mkl.h
Description
The routine uses the diagonal pivoting factorization to compute the solution to a complex/double complex
system of linear equations A*X = B, where A is an n-by-n Hermitian matrix, the columns of matrix B are
individual right-hand sides, and the columns of X are the corresponding solutions.
Both normwise and maximum componentwise error bounds are also provided on request. The routine returns
a solution with a small guaranteed error (O(eps), where eps is the working machine precision) unless the
matrix is very ill-conditioned, in which case a warning is returned. Relevant condition numbers are also
calculated and returned.
The routine accepts user-provided factorizations and equilibration factors; see definitions of the fact and
equed options. Solving with refinement and using a factorization from a previous call of the routine also
produces a solution with O(eps) errors or warnings but that may not be true for general user-provided
factorizations and equilibration factors if they differ from what the routine would itself produce.
The routine ?hesvxx performs the following steps:
where U or L is a product of permutation and unit upper (lower) triangular matrices, and D is
Hermitian and block diagonal with 1-by-1 and 2-by-2 diagonal blocks.
3. If some D(i,i)=0, so that D is exactly singular, the routine returns with info = i. Otherwise, the
factored form of A is used to estimate the condition number of the matrix A (see the rcond parameter).
If the reciprocal of the condition number is less than machine precision, the routine still goes on to
solve for X and compute error bounds.
4. The system of equations is solved for X using the factored form of A.
5. By default, unless params[0] is set to zero, the routine applies iterative refinement to get a small error
and error bounds. Refinement calculates the residual to at least twice the working precision.
6. If equilibration was used, the matrix X is premultiplied by diag(s) so that it solves the original system
before equilibration.
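The relationship in the last step, between the solution of the equilibrated system and the solution of the original one, amounts to scaling row i of X by the i-th scale factor held in the array s. The routine performs this itself; the sketch below only illustrates the arithmetic, assuming column-major storage, a local struct cplx as a stand-in for the complex element type, and a hypothetical function name.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a double-precision complex element (assumption). */
typedef struct { double real, imag; } cplx;

/* Overwrites X with diag(s)*X, where X is n-by-nrhs, column-major,
   with leading dimension ldx. */
void apply_row_scaling(int n, int nrhs, const double *s, cplx *x, int ldx)
{
    assert(ldx >= (n > 1 ? n : 1));
    for (int j = 0; j < nrhs; ++j) {
        for (int i = 0; i < n; ++i) {
            assert(s[i] > 0.0);   /* scale factors are expected to be positive */
            cplx *e = &x[(size_t)i + (size_t)j * (size_t)ldx];
            e->real *= s[i];
            e->imag *= s[i];
        }
    }
}
```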
Input Parameters
The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.
ipiv Array, size at least max(1, n). The array ipiv is an input argument if
fact = 'F'. It contains details of the interchanges and the block
structure of D as determined by ?hetrf.
s Array, size (n). The array s contains the scale factors for A. If equed
= 'Y', A is multiplied on the left and right by diag(s).
This array is an input argument if fact = 'F' only; otherwise it is an
output argument.
If fact = 'F' and equed = 'Y', each element of s must be positive.
ldb The leading dimension of the array b; ldb≥ max(1, n) for column
major layout and ldb≥nrhs for row major layout.
ldx The leading dimension of the output array x; ldx≥ max(1, n) for
column major layout and ldx≥nrhs for row major layout.
n_err_bnds Number of error bounds to return for each right hand side and each
type (normwise or componentwise). See err_bnds_norm and
err_bnds_comp descriptions in the Output Parameters section below.
params[0] : Whether to perform iterative refinement or not. Default:
1.0 (for single precision flavors), 1.0D+0 (for double precision
flavors).
Default 10
Output Parameters
x Array, size max(1, ldx*nrhs) for column major layout and max(1, ldx*n)
for row major layout.
If info = 0, the array x contains the solution n-by-nrhs matrix X to the
original system of equations. Note that A and B are modified on exit if
equed≠'N', and the solution to the equilibrated system is:
inv(diag(s))*X.
berr Array, size at least max(1, nrhs). Contains the component-wise relative
backward error for each solution vector xj, that is, the smallest relative
change in any element of A or B that makes xj an exact solution.
The information for right-hand side i, where 1 ≤i≤nrhs, and type of error
err is stored in err_bnds_norm[(err-1)*nrhs + i - 1].
The information for right-hand side i, where 1 ≤i≤nrhs, and type of error
err is stored in err_bnds_comp[(err-1)*nrhs + i - 1].
ipiv If fact = 'N', ipiv is an output argument and on exit contains details of
the interchanges and the block structure of D, as determined by chetrf for
single precision flavors and zhetrf for double precision flavors.
params If an entry is less than 0.0, that entry is filled with the default value used
for that parameter, otherwise the entry is not modified.
Return Values
This function returns a value info.
If info = 0, the execution is successful. The solution to every right-hand side is guaranteed.
If 0 < info ≤ n: U(info,info) is exactly zero. The factorization has been completed, but the factor U is exactly
singular, so the solution and error bounds could not be computed; rcond = 0 is returned.
If info = n+j: The solution corresponding to the j-th right-hand side is not guaranteed. The solutions
corresponding to other right-hand sides k with k > j may not be guaranteed as well, but only the first such
right-hand side is reported. If a small componentwise error is not requested (params[2] = 0.0), then the j-th
right-hand side is the first with a normwise error bound that is not guaranteed (the smallest j such that for
column major layout err_bnds_norm[j - 1] = 0.0 or err_bnds_comp[j - 1] = 0.0; or for row major
layout err_bnds_norm[(j - 1)*n_err_bnds] = 0.0 or err_bnds_comp[(j - 1)*n_err_bnds] = 0.0).
See the definition of err_bnds_norm and err_bnds_comp for err = 1. To get information about all of the
right-hand sides, check err_bnds_norm or err_bnds_comp.
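The positive return codes above encode two different things, a singular pivot position or the index of a right-hand side. A small decoder keeps the two apart; this is a sketch with hypothetical names, not part of the library interface.

```c
#include <assert.h>

struct svxx_outcome {
    int singular_pivot;          /* i when 0 < info <= n, else 0 */
    int first_unguaranteed_rhs;  /* j when info = n + j, else 0 */
};

/* Splits a non-negative info value into the two cases described above. */
struct svxx_outcome decode_svxx_info(int info, int n)
{
    assert(n >= 0 && info >= 0);
    struct svxx_outcome out = {0, 0};
    if (info == 0)
        return out;
    if (info <= n)
        out.singular_pivot = info;
    else
        out.first_unguaranteed_rhs = info - n;
    return out;
}
```

Only the first unguaranteed right-hand side is reported this way, so the error-bound arrays still need to be inspected for the others.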
See Also
Matrix Storage Schemes
?spsv
Computes the solution to the system of linear
equations with a real or complex symmetric coefficient
matrix A stored in packed format, and multiple
right-hand sides.
Syntax
lapack_int LAPACKE_sspsv (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , float * ap , lapack_int * ipiv , float * b , lapack_int ldb );
lapack_int LAPACKE_dspsv (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , double * ap , lapack_int * ipiv , double * b , lapack_int ldb );
lapack_int LAPACKE_cspsv (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , lapack_complex_float * ap , lapack_int * ipiv , lapack_complex_float * b ,
lapack_int ldb );
lapack_int LAPACKE_zspsv (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , lapack_complex_double * ap , lapack_int * ipiv , lapack_complex_double * b ,
lapack_int ldb );
Include Files
• mkl.h
Description
The routine solves for X the real or complex system of linear equations A*X = B, where A is an n-by-n
symmetric matrix stored in packed format, the columns of matrix B are individual right-hand sides, and the
columns of X are the corresponding solutions.
The diagonal pivoting method is used to factor A as A = U*D*U^T or A = L*D*L^T, where U (or L) is a product
of permutation and unit upper (lower) triangular matrices, and D is symmetric and block diagonal with
1-by-1 and 2-by-2 diagonal blocks.
The factored form of A is then used to solve the system of equations A*X = B.
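For readers unfamiliar with packed format, the following sketch shows where element A(i,j) of a symmetric matrix sits inside ap under the standard LAPACK column-major packed scheme. It is an illustration under that assumption only; row-major packed layout is not covered, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements in the packed array for an n-by-n matrix. */
size_t packed_size(int n)
{
    assert(n >= 0);
    return (size_t)n * ((size_t)n + 1) / 2;
}

/* Offset of A(i,j), 0-based, in ap for column-major packed storage.
   By symmetry, a request outside the stored triangle is mapped to its
   mirror element. */
size_t packed_index(char uplo, int n, int i, int j)
{
    assert(uplo == 'U' || uplo == 'L');
    assert(i >= 0 && i < n && j >= 0 && j < n);
    if (uplo == 'U') {
        if (i > j) { int t = i; i = j; j = t; }
        return (size_t)i + (size_t)j * ((size_t)j + 1) / 2;
    }
    if (i < j) { int t = i; i = j; j = t; }
    return (size_t)i + (size_t)j * (size_t)(2 * n - j - 1) / 2;
}
```

For n = 3 and uplo = 'U' this places the columns of the upper triangle one after another: A(0,0), A(0,1), A(1,1), A(0,2), A(1,2), A(2,2).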
Input Parameters
ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.
Output Parameters
ipiv Array, size at least max(1, n). Contains details of the interchanges
and the block structure of D, as determined by ?sptrf.
If ipiv[i-1] = k > 0, then dii is a 1-by-1 block, and the i-th row
and column of A was interchanged with the k-th row and column.
If uplo = 'U' and ipiv[i] = ipiv[i-1] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and i-th row and column of A was
interchanged with the m-th row and column.
If uplo = 'L' and ipiv[i-1] = ipiv[i] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and (i+1)-th row and column of A
was interchanged with the m-th row and column.
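The ipiv rules above can be turned into a short scan that recovers the block structure of D. The sketch assumes ipiv elements fit in int (lapack_int may be wider in some interfaces) and uses a hypothetical function name; it relies only on the fact that a 2-by-2 block shows up as two equal negative neighbours, which holds for both values of uplo.

```c
#include <assert.h>

/* Returns the number of 2-by-2 diagonal blocks encoded in ipiv, or -1 if
   the array does not follow the pattern described above. */
int count_2x2_blocks(int n, const int *ipiv)
{
    assert(n >= 0);
    int blocks = 0;
    int i = 0;
    while (i < n) {
        if (ipiv[i] > 0) {
            i += 1;                      /* 1-by-1 block */
        } else {
            /* A negative entry must be followed by an equal one. */
            if (ipiv[i] == 0 || i + 1 >= n || ipiv[i + 1] != ipiv[i])
                return -1;
            blocks += 1;                 /* 2-by-2 block in rows i, i+1 */
            i += 2;
        }
    }
    return blocks;
}
```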
Return Values
This function returns a value info.
If info = i, dii is 0. The factorization has been completed, but D is exactly singular, so the solution could
not be computed.
See Also
Matrix Storage Schemes
?spsvx
Uses the diagonal pivoting factorization to compute
the solution to the system of linear equations with a
real or complex symmetric coefficient matrix A stored
in packed format, and provides error bounds on the
solution.
Syntax
lapack_int LAPACKE_sspsvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, const float* ap, float* afp, lapack_int* ipiv, const float* b,
lapack_int ldb, float* x, lapack_int ldx, float* rcond, float* ferr, float* berr );
lapack_int LAPACKE_dspsvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, const double* ap, double* afp, lapack_int* ipiv, const double* b,
lapack_int ldb, double* x, lapack_int ldx, double* rcond, double* ferr, double* berr );
lapack_int LAPACKE_cspsvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, const lapack_complex_float* ap, lapack_complex_float* afp, lapack_int*
ipiv, const lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x,
lapack_int ldx, float* rcond, float* ferr, float* berr );
lapack_int LAPACKE_zspsvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, const lapack_complex_double* ap, lapack_complex_double* afp,
lapack_int* ipiv, const lapack_complex_double* b, lapack_int ldb,
lapack_complex_double* x, lapack_int ldx, double* rcond, double* ferr, double* berr );
Include Files
• mkl.h
Description
The routine uses the diagonal pivoting factorization to compute the solution to a real or complex system of
linear equations A*X = B, where A is an n-by-n symmetric matrix stored in packed format, the columns of
matrix B are individual right-hand sides, and the columns of X are the corresponding solutions.
Error bounds on the solution and a condition estimate are also provided.
The routine ?spsvx performs the following steps:
1. If fact = 'N', the diagonal pivoting method is used to factor the matrix A. The form of the
factorization is A = U*D*U^T or A = L*D*L^T, where U (or L) is a product of permutation and unit upper
(lower) triangular matrices, and D is symmetric and block diagonal with 1-by-1 and 2-by-2 diagonal
blocks.
2. If some di,i = 0, so that D is exactly singular, then the routine returns with info = i. Otherwise, the
factored form of A is used to estimate the condition number of the matrix A. If the reciprocal of the
condition number is less than machine precision, info = n+1 is returned as a warning, but the routine
still goes on to solve for X and compute error bounds as described below.
3. The system of equations is solved for X using the factored form of A.
4. Iterative refinement is applied to improve the computed solution matrix and calculate error bounds and
backward error estimates for it.
Input Parameters
fact Specifies whether or not the factored form of the matrix A has been
supplied on entry.
If fact = 'F': on entry, afp and ipiv contain the factored form of A.
Arrays ap, afp, and ipiv are not modified.
If fact = 'N', the matrix A is copied to afp and factored.
If uplo = 'L', the array ap stores the lower triangular part of the
symmetric matrix A; A is factored as L*D*L^T.
The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.
ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.
ipiv Array, size at least max(1, n). The array ipiv is an input argument if
fact = 'F'. It contains details of the interchanges and the block
structure of D, as determined by ?sptrf.
If ipiv[i-1] = k > 0, then dii is a 1-by-1 block, and the i-th row
and column of A was interchanged with the k-th row and column.
If uplo = 'U' and ipiv[i] = ipiv[i-1] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and i-th row and column of A was
interchanged with the m-th row and column.
If uplo = 'L' and ipiv[i-1] = ipiv[i] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and (i+1)-th row and column of A
was interchanged with the m-th row and column.
ldx The leading dimension of the output array x; ldx≥ max(1, n) for
column major layout and ldx≥nrhs for row major layout.
Output Parameters
x Array, size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout.
If info = 0 or info = n+1, the array x contains the solution matrix
X to the system of equations.
afp, ipiv These arrays are output arguments if fact = 'N'. See the
description of afp, ipiv in the Input Parameters section.
ferr, berr Arrays, size at least max(1, nrhs). Contain the component-wise
forward and relative backward errors, respectively, for each solution
vector.
Return Values
This function returns a value info.
If info = i, and i≤n, then dii is exactly zero. The factorization has been completed, but the block diagonal
matrix D is exactly singular, so the solution and error bounds could not be computed; rcond = 0 is returned.
If info = i, and i = n + 1, then D is nonsingular, but rcond is less than machine precision, meaning that the
matrix is singular to working precision. Nevertheless, the solution and error bounds are computed because
there are a number of situations where the computed solution can be more accurate than the value of rcond
would suggest.
See Also
Matrix Storage Schemes
?hpsv
Computes the solution to the system of linear
equations with a Hermitian coefficient matrix A stored
in packed format, and multiple right-hand sides.
Syntax
lapack_int LAPACKE_chpsv (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , lapack_complex_float * ap , lapack_int * ipiv , lapack_complex_float * b ,
lapack_int ldb );
lapack_int LAPACKE_zhpsv (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , lapack_complex_double * ap , lapack_int * ipiv , lapack_complex_double * b ,
lapack_int ldb );
Include Files
• mkl.h
Description
The routine solves for X the system of linear equations A*X = B, where A is an n-by-n Hermitian matrix
stored in packed format, the columns of matrix B are individual right-hand sides, and the columns of X are
the corresponding solutions.
The diagonal pivoting method is used to factor A as A = U*D*U^H or A = L*D*L^H, where U (or L) is a product
of permutation and unit upper (lower) triangular matrices, and D is Hermitian and block diagonal with 1-by-1
and 2-by-2 diagonal blocks.
The factored form of A is then used to solve the system of equations A*X = B.
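Preparing the packed input for a Hermitian matrix can look like the sketch below, which copies the upper triangle of a full column-major array into ap and reads elements back, conjugating those that lie below the diagonal. It assumes the standard column-major packed scheme and uses a local struct cplx as a stand-in for the complex element type; both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a double-precision complex element (assumption). */
typedef struct { double real, imag; } cplx;

/* Copies the upper triangle of the column-major n-by-n array full
   (leading dimension lda) into ap, column by column. */
void pack_hermitian_upper(int n, const cplx *full, int lda, cplx *ap)
{
    assert(lda >= (n > 1 ? n : 1));
    size_t k = 0;
    for (int j = 0; j < n; ++j)
        for (int i = 0; i <= j; ++i)
            ap[k++] = full[(size_t)i + (size_t)j * (size_t)lda];
}

/* Reads A(i,j), 0-based, back from the packed upper triangle. */
cplx packed_hermitian_get(const cplx *ap, int i, int j)
{
    if (i <= j)
        return ap[(size_t)i + (size_t)j * ((size_t)j + 1) / 2];
    cplx t = ap[(size_t)j + (size_t)i * ((size_t)i + 1) / 2];
    t.imag = -t.imag;   /* lower part is the conjugate of the upper part */
    return t;
}
```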
Input Parameters
ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.
Output Parameters
ipiv Array, size at least max(1, n). Contains details of the interchanges
and the block structure of D, as determined by ?hptrf.
If ipiv[i-1] = k > 0, then dii is a 1-by-1 block, and the i-th row
and column of A was interchanged with the k-th row and column.
If uplo = 'U' and ipiv[i] = ipiv[i-1] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and i-th row and column of A was
interchanged with the m-th row and column.
If uplo = 'L' and ipiv[i-1] = ipiv[i] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and (i+1)-th row and column of A
was interchanged with the m-th row and column.
Return Values
This function returns a value info.
If info = i, dii is 0. The factorization has been completed, but D is exactly singular, so the solution could
not be computed.
See Also
Matrix Storage Schemes
?hpsvx
Uses the diagonal pivoting factorization to compute
the solution to the system of linear equations with a
Hermitian coefficient matrix A stored in packed
format, and provides error bounds on the solution.
Syntax
lapack_int LAPACKE_chpsvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, const lapack_complex_float* ap, lapack_complex_float* afp, lapack_int*
ipiv, const lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x,
lapack_int ldx, float* rcond, float* ferr, float* berr );
lapack_int LAPACKE_zhpsvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, const lapack_complex_double* ap, lapack_complex_double* afp,
lapack_int* ipiv, const lapack_complex_double* b, lapack_int ldb,
lapack_complex_double* x, lapack_int ldx, double* rcond, double* ferr, double* berr );
Include Files
• mkl.h
Description
The routine uses the diagonal pivoting factorization to compute the solution to a complex system of linear
equations A*X = B, where A is an n-by-n Hermitian matrix stored in packed format, the columns of matrix B
are individual right-hand sides, and the columns of X are the corresponding solutions.
Error bounds on the solution and a condition estimate are also provided.
The routine ?hpsvx performs the following steps:
1. If fact = 'N', the diagonal pivoting method is used to factor the matrix A. The form of the
factorization is A = U*D*U^H or A = L*D*L^H, where U (or L) is a product of permutation and unit upper
(lower) triangular matrices, and D is Hermitian and block diagonal with 1-by-1 and 2-by-2 diagonal
blocks.
2. If some di,i = 0, so that D is exactly singular, then the routine returns with info = i. Otherwise, the
factored form of A is used to estimate the condition number of the matrix A. If the reciprocal of the
condition number is less than machine precision, info = n+1 is returned as a warning, but the routine
still goes on to solve for X and compute error bounds as described below.
3. The system of equations is solved for X using the factored form of A.
4. Iterative refinement is applied to improve the computed solution matrix and calculate error bounds and
backward error estimates for it.
Input Parameters
fact Specifies whether or not the factored form of the matrix A has been
supplied on entry.
If fact = 'F': on entry, afp and ipiv contain the factored form of A.
Arrays ap, afp, and ipiv are not modified.
If fact = 'N', the matrix A is copied to afp and factored.
If uplo = 'L', the array ap stores the lower triangular part of the
Hermitian matrix A, and A is factored as L*D*LH.
The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.
ldb The leading dimension of b; ldb ≥ max(1, n) for column major layout
and ldb ≥ nrhs for row major layout.
ipiv Array, size at least max(1, n). The array ipiv is an input argument if
fact = 'F'. It contains details of the interchanges and the block
structure of D, as determined by ?hptrf.
If ipiv[i-1] = k > 0, then d_ii is a 1-by-1 block, and the i-th row
and column of A were interchanged with the k-th row and column.
If uplo = 'U' and ipiv[i] = ipiv[i-1] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and the i-th row and column of A were
interchanged with the m-th row and column.
If uplo = 'L' and ipiv[i-1] = ipiv[i] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and the (i+1)-th row and column of A
were interchanged with the m-th row and column.
ldx The leading dimension of the output array x; ldx ≥ max(1, n) for
column major layout and ldx ≥ nrhs for row major layout.
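The block structure of D encoded in ipiv can be read back with a short loop. The following is a minimal sketch, not an MKL routine: the helper name count_diag_blocks is hypothetical, plain int stands in for lapack_int so that the code compiles without mkl.h, and the pairing rule (two consecutive equal negative entries mark one 2-by-2 block) is taken from the ipiv description above.

```c
#include <assert.h>
#include <stddef.h>

/* Count the 1-by-1 and 2-by-2 diagonal blocks of D encoded in ipiv.
 * Hypothetical helper, not part of MKL. A positive entry marks a 1-by-1
 * block; two consecutive equal negative entries mark one 2-by-2 block
 * (this holds for uplo = 'U' and for uplo = 'L'). */
void count_diag_blocks(const int *ipiv, int n, int *n1x1, int *n2x2)
{
    assert(ipiv != NULL || n == 0);
    *n1x1 = 0;
    *n2x2 = 0;
    for (int i = 0; i < n; ) {
        if (ipiv[i] > 0) {
            ++*n1x1;
            i += 1;
        } else {
            /* A negative entry must be paired with an equal neighbour. */
            assert(i + 1 < n && ipiv[i + 1] == ipiv[i]);
            ++*n2x2;
            i += 2;
        }
    }
}
```

For example, ipiv = {1, -1, -1, 4} describes a 1-by-1 block, one 2-by-2 block in rows/columns 2 and 3, and another 1-by-1 block.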
Output Parameters
x Array, size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout.
If info = 0 or info = n+1, the array x contains the solution matrix
X to the system of equations.
afp, ipiv These arrays are output arguments if fact = 'N'. See the
description of afp, ipiv in the Input Parameters section.
ferr Array, size at least max(1, nrhs). Contains the estimated forward
error bound for each solution vector x_j (the j-th column of the solution
matrix X). If x_true is the true solution corresponding to x_j, ferr[j-1]
is an estimated upper bound for the magnitude of the largest element
in (x_j - x_true) divided by the magnitude of the largest element in x_j.
The estimate is as reliable as the estimate for rcond, and is almost
always a slight overestimate of the true error.
Return Values
This function returns a value info.
If info = i, and i ≤ n, then d_ii is exactly zero. The factorization has been completed, but the block diagonal
matrix D is exactly singular, so the solution and error bounds could not be computed; rcond = 0 is returned.
If info = i, and i = n + 1, then D is nonsingular, but rcond is less than machine precision, meaning that the
matrix is singular to working precision. Nevertheless, the solution and error bounds are computed because
there are a number of situations where the computed solution can be more accurate than the value of rcond
would suggest.
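A caller typically branches on the returned info value. The sketch below shows one way to do this; the enum and function names are hypothetical and not MKL identifiers, int stands in for lapack_int, and the treatment of info < 0 as an illegal argument is an assumption based on the usual LAPACKE convention rather than on the text above.

```c
#include <assert.h>

/* Hypothetical status codes for this sketch; they are not MKL identifiers. */
typedef enum {
    HPSVX_OK,              /* info = 0 */
    HPSVX_BAD_ARGUMENT,    /* info < 0 (assumed LAPACKE convention) */
    HPSVX_SINGULAR,        /* 1 <= info <= n: d_ii is exactly zero */
    HPSVX_ILL_CONDITIONED  /* info = n + 1: rcond below machine precision */
} hpsvx_status;

hpsvx_status classify_hpsvx_info(int info, int n)
{
    assert(n >= 0 && info <= n + 1);
    if (info == 0) return HPSVX_OK;
    if (info < 0) return HPSVX_BAD_ARGUMENT;
    if (info <= n) return HPSVX_SINGULAR;
    return HPSVX_ILL_CONDITIONED;
}

/* The array x holds a computed solution only for info = 0 or info = n + 1. */
int hpsvx_solution_available(int info, int n)
{
    hpsvx_status s = classify_hpsvx_info(info, n);
    return s == HPSVX_OK || s == HPSVX_ILL_CONDITIONED;
}
```

Treating info = n + 1 as a warning rather than a failure matches the description: the solution is still computed, but rcond should be inspected before the result is trusted.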
See Also
Matrix Storage Schemes
where R is an n-by-n upper triangular matrix with real diagonal elements, and Q is an m-by-m orthogonal (or
unitary) matrix.
You can use the QR factorization for solving the following least squares problem: minimize ||A*x - b||_2
where A is a full-rank m-by-n matrix (m ≥ n). After factoring the matrix, compute the solution x by solving
R*x = (Q1)^T*b.
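The final triangular solve can be illustrated with plain back substitution. This is a sketch only: back_substitute is a hypothetical helper, not an MKL routine, it assumes column major storage and a nonzero diagonal of R (A has full rank), and the vector c stands for the already computed product of the transposed Q1 with b.

```c
#include <assert.h>

/* Solve R*x = c by back substitution, where R is an n-by-n upper triangular
 * matrix stored column-major with leading dimension ldr. On exit c is
 * overwritten by x. Hypothetical helper; assumes a nonzero diagonal of R. */
void back_substitute(int n, const double *r, int ldr, double *c)
{
    assert(n >= 0 && ldr >= (n > 1 ? n : 1));
    for (int j = n - 1; j >= 0; --j) {
        assert(r[j + j * ldr] != 0.0);
        c[j] /= r[j + j * ldr];
        /* Eliminate x[j] from the rows above. */
        for (int i = 0; i < j; ++i)
            c[i] -= r[i + j * ldr] * c[j];
    }
}
```

In production code the same step is normally delegated to a BLAS triangular solver; the loop above only makes the arithmetic explicit.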
If m < n, the QR factorization is given by
A = Q*R = Q*(R1, R2)
where R is trapezoidal, R1 is upper triangular and R2 is rectangular.
Q is represented as a product of min(m, n) elementary reflectors. Routines are provided to work with Q in
this representation.
LQ Factorization
The LQ factorization of an m-by-n matrix A is as follows. If m ≤ n,
A = (L, 0)*Q = L*Q1
where L is an m-by-m lower triangular matrix with real diagonal elements, and Q is an n-by-n orthogonal (or
unitary) matrix whose first m rows form Q1.
If m > n, the LQ factorization is
A = L*Q, with L1 stacked on top of L2 to form L,
where L1 is an n-by-n lower triangular matrix, L2 is rectangular, and Q is an n-by-n orthogonal (or unitary)
matrix.
You can use the LQ factorization to find the minimum-norm solution of an underdetermined system of linear
equations A*x = b where A is an m-by-n matrix of rank m (m < n). After factoring the matrix, compute the
solution vector x as follows: solve L*y = b for y, and then compute x = (Q1)^H*y.
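Why this x has minimum norm can be sketched in a few lines. The derivation assumes that Q1 denotes the first m rows of Q, and it introduces an auxiliary vector z = Qx with blocks z_1 and z_2; this notation is chosen for the sketch only.

```latex
\begin{aligned}
A &= \begin{pmatrix} L & 0 \end{pmatrix} Q, \qquad
z = Qx = \begin{pmatrix} z_1 \\ z_2 \end{pmatrix},
\quad z_1 \in \mathbb{C}^{m},\; z_2 \in \mathbb{C}^{n-m}, \\
Ax = b \;&\Longleftrightarrow\; L z_1 = b \;\Longleftrightarrow\; z_1 = y, \\
\|x\|_2^2 &= \|Qx\|_2^2 = \|y\|_2^2 + \|z_2\|_2^2 \;\ge\; \|y\|_2^2, \\
z_2 = 0 \;&\Longrightarrow\; x = Q^H \begin{pmatrix} y \\ 0 \end{pmatrix} = Q_1^H y .
\end{aligned}
```

Every solution shares the same z_1 = y because L is nonsingular when A has rank m, so the norm is smallest exactly when z_2 vanishes.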
Table "Computational Routines for Orthogonal Factorization" lists LAPACK routines that perform orthogonal
factorization of matrices.
Computational Routines for Orthogonal Factorization
Matrix type, factorization | Factorize without pivoting | Factorize with pivoting | Generate matrix Q | Apply matrix Q
?geqrf
Computes the QR factorization of a general m-by-n
matrix.
Syntax
lapack_int LAPACKE_sgeqrf (int matrix_layout, lapack_int m, lapack_int n, float* a,
lapack_int lda, float* tau);
lapack_int LAPACKE_dgeqrf (int matrix_layout, lapack_int m, lapack_int n, double* a,
lapack_int lda, double* tau);
lapack_int LAPACKE_cgeqrf (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* tau);
lapack_int LAPACKE_zgeqrf (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* tau);
Include Files
• mkl.h
Description
The routine forms the QR factorization of a general m-by-n matrix A (see Orthogonal Factorizations). No
pivoting is performed.
The routine does not form the matrix Q explicitly. Instead, Q is represented as a product of min(m, n)
elementary reflectors. Routines are provided to work with Q in this representation.
NOTE
This routine supports the Progress Routine feature. See Progress Function for details.
Input Parameters
a Array a of size max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout contains the matrix A.
lda The leading dimension of a; at least max(1, m) for column major layout and
at least max(1, n) for row major layout.
Output Parameters
tau Array, size at least max(1, min(m, n)). Contains scalars that define
elementary reflectors for the matrix Q in its decomposition into a product of
elementary reflectors (see Orthogonal Factorizations).
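The size rules for a, lda, and tau depend on the layout, which is easy to get wrong when allocating buffers. The helpers below are a sketch under stated assumptions: all names (sketch_layout, min_lda, a_elements, tau_elements) are hypothetical, and int stands in for lapack_int so that the code compiles without mkl.h.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for the matrix_layout argument; hypothetical names used
 * only in this sketch. */
typedef enum { SKETCH_ROW_MAJOR, SKETCH_COL_MAJOR } sketch_layout;

int max_int(int x, int y) { return x > y ? x : y; }

/* Smallest lda accepted for an m-by-n matrix in the given layout. */
int min_lda(sketch_layout layout, int m, int n)
{
    return layout == SKETCH_COL_MAJOR ? max_int(1, m) : max_int(1, n);
}

/* Number of elements the array a must hold: max(1, lda*n) for column major
 * layout and max(1, lda*m) for row major layout. */
size_t a_elements(sketch_layout layout, int m, int n, int lda)
{
    assert(m >= 0 && n >= 0 && lda >= min_lda(layout, m, n));
    size_t other = (size_t)(layout == SKETCH_COL_MAJOR ? n : m);
    size_t count = (size_t)lda * other;
    return count > 0 ? count : 1;
}

/* Number of elements tau must hold: max(1, min(m, n)). */
size_t tau_elements(int m, int n)
{
    return (size_t)max_int(1, m < n ? m : n);
}
```

A caller could pass these counts to malloc, multiplied by the element size, before invoking the factorization.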
Return Values
This function returns a value info.
Application Notes
The computed factorization is the exact factorization of a matrix A + E, where
||E||_2 = O(ε)*||A||_2.
The approximate number of floating-point operations for real flavors is
(4/3)*n^3 if m = n,
(2/3)*n^2*(3m - n) if m > n,
(2/3)*m^2*(3n - m) if m < n.
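The three cases of the operation count can be folded into one function, which is convenient for rough performance estimates. This is an illustrative sketch; qr_flops_real is a hypothetical name and the result is only the approximation stated above.

```c
#include <assert.h>

/* Approximate floating-point operation count for the real flavors,
 * following the three cases listed above. Hypothetical helper. */
double qr_flops_real(int m, int n)
{
    assert(m >= 0 && n >= 0);
    double dm = m, dn = n;
    if (m == n)
        return 4.0 * dn * dn * dn / 3.0;
    if (m > n)
        return 2.0 * dn * dn * (3.0 * dm - dn) / 3.0;
    return 2.0 * dm * dm * (3.0 * dn - dm) / 3.0;
}
```

The m > n and m < n branches agree with the square case when m = n, since (2/3)*n^2*(3n - n) = (4/3)*n^3.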
(The columns of the computed X are the least squares solution vectors x.)
To compute the elements of Q explicitly, call ?orgqr (for real matrices) or ?ungqr (for complex matrices).
See Also
mkl_progress
?geqrfp
Computes the QR factorization of a general m-by-n
matrix with non-negative diagonal elements.
Syntax
lapack_int LAPACKE_sgeqrfp (int matrix_layout, lapack_int m, lapack_int n, float* a,
lapack_int lda, float* tau);
lapack_int LAPACKE_dgeqrfp (int matrix_layout, lapack_int m, lapack_int n, double* a,
lapack_int lda, double* tau);
lapack_int LAPACKE_cgeqrfp (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* tau);
lapack_int LAPACKE_zgeqrfp (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* tau);
Include Files
• mkl.h
Description
The routine forms the QR factorization of a general m-by-n matrix A (see Orthogonal Factorizations). No
pivoting is performed. The diagonal entries of R are real and nonnegative.
The routine does not form the matrix Q explicitly. Instead, Q is represented as a product of min(m, n)
elementary reflectors. Routines are provided to work with Q in this representation.
NOTE
This routine supports the Progress Routine feature. See Progress Function for details.
Input Parameters
a Array, size max(1,lda*n) for column major layout and max(1,lda*m) for
row major layout, containing the matrix A.
lda The leading dimension of a; at least max(1, m) for column major layout and
at least max(1, n) for row major layout.
Output Parameters
tau Array, size at least max(1, min(m, n)). Contains scalars that define
elementary reflectors for the matrix Q in its decomposition into a product of
elementary reflectors (see Orthogonal Factorizations).
Return Values
This function returns a value info.
Application Notes
The computed factorization is the exact factorization of a matrix A + E, where
||E||_2 = O(ε)*||A||_2.
The approximate number of floating-point operations for real flavors is
(4/3)*n^3 if m = n,
(2/3)*n^2*(3m - n) if m > n,
(2/3)*m^2*(3n - m) if m < n.
(The columns of the computed X are the least squares solution vectors x.)
To compute the elements of Q explicitly, call ?orgqr (for real matrices) or ?ungqr (for complex matrices).
See Also
mkl_progress
?geqrt
Computes a blocked QR factorization of a general real
or complex matrix using the compact WY
representation of Q.
Syntax
lapack_int LAPACKE_sgeqrt (int matrix_layout, lapack_int m, lapack_int n, lapack_int
nb, float* a, lapack_int lda, float* t, lapack_int ldt);
lapack_int LAPACKE_dgeqrt (int matrix_layout, lapack_int m, lapack_int n, lapack_int
nb, double* a, lapack_int lda, double* t, lapack_int ldt);
lapack_int LAPACKE_cgeqrt (int matrix_layout, lapack_int m, lapack_int n, lapack_int
nb, lapack_complex_float* a, lapack_int lda, lapack_complex_float* t, lapack_int ldt);
lapack_int LAPACKE_zgeqrt (int matrix_layout, lapack_int m, lapack_int n, lapack_int
nb, lapack_complex_double* a, lapack_int lda, lapack_complex_double* t, lapack_int
ldt);
Include Files
• mkl.h
Description
The strictly lower triangular matrix V contains the elementary reflectors H(i) in the i-th column below the
diagonal. For example, if m=5 and n=3, the matrix V is
V = (  1          )
    ( v1   1      )
    ( v1  v2   1  )
    ( v1  v2  v3  )
    ( v1  v2  v3  )
where vi represents one of the vectors that define H(i). The vectors are returned in the lower triangular part
of array a.
NOTE
The 1s along the diagonal of V are not stored in a.
Let k = min(m,n). The number of blocks is b = ceiling(k/nb), where each block is of order nb except for
the last block, which is of order ib = k - (b-1)*nb. For each of the b blocks, an upper triangular block
reflector factor is computed: t1, t2, ..., tb. The nb-by-nb (and ib-by-ib for the last block) factors are stored
in the nb-by-n array t as
t = (t1 t2 ... tb).
Input Parameters
a Array a of size max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout contains the m-by-n matrix A.
lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.
ldt The leading dimension of t; at least nb for column major layout and max(1,
min(m, n)) for row major layout.
Output Parameters
t Array, size max(1, ldt*min(m, n)) for column major layout and max(1,
ldt*nb) for row major layout.
The upper triangular block reflector factors, stored as a sequence of upper
triangular blocks.
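The block bookkeeping for t follows directly from k = min(m, n) and nb. The sketch below computes the number of blocks and the order of the last block; geqrt_blocks is a hypothetical helper, int stands in for lapack_int, and the precondition 1 ≤ nb ≤ min(m, n) is an assumption of this sketch.

```c
#include <assert.h>

/* Block layout of the t array for a blocked factorization with block size nb.
 * Hypothetical helper: computes k = min(m, n), the number of blocks
 * b = ceiling(k/nb), and the order ib of the last block. */
void geqrt_blocks(int m, int n, int nb, int *b, int *ib)
{
    int k = m < n ? m : n;
    assert(k >= 1 && nb >= 1 && nb <= k);
    *b = (k + nb - 1) / nb;      /* integer ceiling of k/nb */
    *ib = k - (*b - 1) * nb;     /* order of the last block */
}
```

For m = 5, n = 3, and nb = 2 this gives two blocks, the last of order 1.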
Return Values
This function returns a value info.
If info < 0 and info = -i, the i-th parameter had an illegal value.
?gemqrt
Multiplies a general matrix by the orthogonal/unitary
matrix Q of the QR factorization formed by ?geqrt.
Syntax
lapack_int LAPACKE_sgemqrt (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, lapack_int nb, const float* v, lapack_int ldv, const float*
t, lapack_int ldt, float* c, lapack_int ldc);
lapack_int LAPACKE_dgemqrt (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, lapack_int nb, const double* v, lapack_int ldv, const
double* t, lapack_int ldt, double* c, lapack_int ldc);
lapack_int LAPACKE_cgemqrt (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, lapack_int nb, const lapack_complex_float* v, lapack_int
ldv, const lapack_complex_float* t, lapack_int ldt, lapack_complex_float* c, lapack_int
ldc);
lapack_int LAPACKE_zgemqrt (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, lapack_int nb, const lapack_complex_double* v, lapack_int
ldv, const lapack_complex_double* t, lapack_int ldt, lapack_complex_double* c,
lapack_int ldc);
Include Files
• mkl.h
Description
The ?gemqrt routine overwrites the general real or complex m-by-n matrix C with Q*C, Q^T*C (Q^H*C for
complex flavors), C*Q, or C*Q^T (C*Q^H for complex flavors), as selected by side and trans,
where Q is a real orthogonal (complex unitary) matrix defined as the product of k elementary reflectors
Q = H(1) H(2)... H(k) = I - V*T*V^T for real flavors, and
Q = H(1) H(2)... H(k) = I - V*T*V^H for complex flavors,
generated using the compact WY representation as returned by ?geqrt. Q is of order m if side = 'L' and of
order n if side = 'R'.
Input Parameters
n The number of columns in the matrix C, (n ≥ 0).
nb The block size used for the storage of t, k ≥ nb ≥ 1. This must be the same
value of nb used to generate t in geqrt.
v Array of size max(1, ldv*k) for column major layout, max(1, ldv*m) for
row major layout and side = 'L', and max(1, ldv*n) for row major layout
and side = 'R'.
The i-th column must contain the vector that defines the elementary
reflector H(i), for i = 1,2,...,k, as returned by ?geqrt in the first k columns of
its array argument a.
ldv The leading dimension of the array v.
If side = 'L', ldv must be at least max(1,m) for column major layout and
max(1, k) for row major layout;
if side = 'R', ldv must be at least max(1,n) for column major layout and
max(1, k) for row major layout.
t Array, size max(1, ldt*min(m, n)) for column major layout and max(1,
ldt*nb) for row major layout.
The upper triangular factors of the block reflectors as returned by geqrt.
ldt The leading dimension of the array t. ldt must be at least nb for column
major layout and max(1, k) for row major layout.
ldc The leading dimension of the array c. ldc must be at least max(1, m) for
column major layout and max(1, n) for row major layout.
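The leading-dimension rules for v, t, and c can be collected in a few small functions that a caller might use to validate arguments before the call. This is a sketch only: every name below is hypothetical, int stands in for lapack_int, and the enum is a local stand-in for the matrix_layout argument.

```c
#include <assert.h>

/* Hypothetical stand-in for the matrix_layout argument in this sketch. */
typedef enum { WY_ROW_MAJOR, WY_COL_MAJOR } wy_layout;

int wy_max(int x, int y) { return x > y ? x : y; }

/* Order of Q: m when Q is applied from the left, n when applied from the right. */
int wy_q_order(char side, int m, int n)
{
    assert(side == 'L' || side == 'R');
    return side == 'L' ? m : n;
}

/* Smallest admissible ldv according to the rules above. */
int wy_min_ldv(wy_layout layout, char side, int m, int n, int k)
{
    int order = wy_q_order(side, m, n);
    return layout == WY_COL_MAJOR ? wy_max(1, order) : wy_max(1, k);
}

/* Smallest admissible ldt: nb for column major, max(1, k) for row major. */
int wy_min_ldt(wy_layout layout, int nb, int k)
{
    assert(nb >= 1 && nb <= k);
    return layout == WY_COL_MAJOR ? nb : wy_max(1, k);
}

/* Smallest admissible ldc: max(1, m) for column major, max(1, n) for row major. */
int wy_min_ldc(wy_layout layout, int m, int n)
{
    return layout == WY_COL_MAJOR ? wy_max(1, m) : wy_max(1, n);
}
```

Checking these bounds up front gives a clearer diagnostic than decoding a negative info value after the call.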
Output Parameters
Return Values
This function returns a value info.
?geqpf
Computes the QR factorization of a general m-by-n
matrix with pivoting.
Syntax
lapack_int LAPACKE_sgeqpf (int matrix_layout, lapack_int m, lapack_int n, float* a,
lapack_int lda, lapack_int* jpvt, float* tau);
lapack_int LAPACKE_dgeqpf (int matrix_layout, lapack_int m, lapack_int n, double* a,
lapack_int lda, lapack_int* jpvt, double* tau);
lapack_int LAPACKE_cgeqpf (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_int* jpvt, lapack_complex_float* tau);
lapack_int LAPACKE_zgeqpf (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_int* jpvt, lapack_complex_double*
tau);
Include Files
• mkl.h
Description
The routine is deprecated and has been replaced by routine geqp3.
The routine ?geqpf forms the QR factorization of a general m-by-n matrix A with column pivoting: A*P =
Q*R (see Orthogonal Factorizations). Here P denotes an n-by-n permutation matrix.
The routine does not form the matrix Q explicitly. Instead, Q is represented as a product of min(m, n)
elementary reflectors. Routines are provided to work with Q in this representation.
Input Parameters
a Array a of size max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout contains the matrix A.
lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.
Output Parameters
tau Array, size at least max(1, min(m, n)). Contains additional information on
the matrix Q.
Return Values
This function returns a value info.
Application Notes
The computed factorization is the exact factorization of a matrix A + E, where
||E||_2 = O(ε)*||A||_2.
The approximate number of floating-point operations for real flavors is
(4/3)*n^3 if m = n,
(2/3)*n^2*(3m - n) if m > n,
(2/3)*m^2*(3n - m) if m < n.
(The columns of the computed X are the permuted least squares solution vectors x; the output array jpvt
specifies the permutation order.)
To compute the elements of Q explicitly, call ?orgqr (for real matrices) or ?ungqr (for complex matrices).
?geqp3
Computes the QR factorization of a general m-by-n
matrix with column pivoting using level 3 BLAS.
Syntax
lapack_int LAPACKE_sgeqp3 (int matrix_layout, lapack_int m, lapack_int n, float* a,
lapack_int lda, lapack_int* jpvt, float* tau);
Include Files
• mkl.h
Description
The routine forms the QR factorization of a general m-by-n matrix A with column pivoting: A*P = Q*R (see
Orthogonal Factorizations) using Level 3 BLAS. Here P denotes an n-by-n permutation matrix. Use this
routine instead of geqpf for better performance.
The routine does not form the matrix Q explicitly. Instead, Q is represented as a product of min(m, n)
elementary reflectors. Routines are provided to work with Q in this representation.
Input Parameters
a Array a of size max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout contains the matrix A.
lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.
Output Parameters
tau Array, size at least max(1, min(m, n)). Contains scalar factors of the
elementary reflectors for the matrix Q.
jpvt Overwritten by details of the permutation matrix P in the factorization A*P
= Q*R. More precisely, the columns of A*P are the columns of A in the
following order:
jpvt[0], jpvt[1], ..., jpvt[n - 1].
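The meaning of jpvt can be made concrete by forming A*P from A. The sketch below does this for column major storage. It is not an MKL routine: apply_column_pivots is a hypothetical name, int stands in for lapack_int, and the code assumes that the values stored in jpvt are 1-based column numbers, as in the underlying Fortran LAPACK interface.

```c
#include <assert.h>

/* Copy the columns of the m-by-n matrix a (column-major, leading dimension
 * lda) into ap in the order jpvt[0], jpvt[1], ..., jpvt[n-1], so that ap
 * holds A*P. Assumption of this sketch: jpvt values are 1-based. */
void apply_column_pivots(int m, int n, const double *a, int lda,
                         const int *jpvt, double *ap, int ldap)
{
    assert(lda >= m && ldap >= m);
    for (int j = 0; j < n; ++j) {
        int src = jpvt[j] - 1;   /* convert 1-based column number to 0-based */
        assert(src >= 0 && src < n);
        for (int i = 0; i < m; ++i)
            ap[i + j * ldap] = a[i + src * lda];
    }
}
```

With jpvt = {3, 1, 2}, the first column of A*P is the third column of A, followed by the first and second columns of A.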
Return Values
This function returns a value info.
Application Notes
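To solve a set of least squares problems minimizing ||A*x - b||_2 for all columns b of a given matrix B, you
can call the following:
?geqp3 (this routine) to factorize A*P = Q*R;
?ormqr (for real matrices) to compute C = Q^T*B, or ?unmqr (for complex matrices) to compute C = Q^H*B;
?trsm (a BLAS routine) to solve R*X = C.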
(The columns of the computed X are the permuted least squares solution vectors x; the output array jpvt
specifies the permutation order.)
To compute the elements of Q explicitly, call ?orgqr (for real matrices) or ?ungqr (for complex matrices).
?orgqr
Generates the real orthogonal matrix Q of the QR
factorization formed by ?geqrf.
Syntax
lapack_int LAPACKE_sorgqr (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
float* a, lapack_int lda, const float* tau);
lapack_int LAPACKE_dorgqr (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
double* a, lapack_int lda, const double* tau);
Include Files
• mkl.h
Description
The routine generates the whole or part of the m-by-m orthogonal matrix Q of the QR factorization formed by
the routines ?geqrf or ?geqpf. Use this routine after a call to sgeqrf/dgeqrf or sgeqpf/dgeqpf.
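Usually Q is determined from the QR factorization of an m-by-p matrix A with m ≥ p. To compute the whole
matrix Q, use:
LAPACKE_?orgqr(matrix_layout, m, m, p, a, lda, tau)
To compute the leading p columns of Q, which form an orthonormal basis in the space spanned by the columns
of A, use:
LAPACKE_?orgqr(matrix_layout, m, p, p, a, lda, tau)
To compute the matrix Qk of the QR factorization of the leading k columns of the matrix A, use:
LAPACKE_?orgqr(matrix_layout, m, m, k, a, lda, tau)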
Input Parameters
a, tau Arrays:
a and tau are the arrays returned by sgeqrf / dgeqrf or sgeqpf / dgeqpf.
The size of a is max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout.
The size of tau must be at least max(1, k).
lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.
Output Parameters
Return Values
This function returns a value info.
Application Notes
The computed Q differs from an exactly orthogonal matrix by a matrix E such that
||E||_2 = O(ε)*||A||_2, where ε is the machine precision.
The total number of floating-point operations is approximately 4*m*n*k - 2*(m + n)*k^2 + (4/3)*k^3.
If n = k, the number is approximately (2/3)*n^2*(3m - n).
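The special case n = k follows from the general count by substitution, which the sketch below lets you confirm numerically. Both function names are hypothetical, and the values are the approximations quoted above, not measured operation counts.

```c
#include <assert.h>

/* General approximate operation count: 4*m*n*k - 2*(m + n)*k^2 + (4/3)*k^3. */
double orgqr_flops(int m, int n, int k)
{
    assert(m >= n && n >= k && k >= 0);
    double dm = m, dn = n, dk = k;
    return 4.0 * dm * dn * dk - 2.0 * (dm + dn) * dk * dk
           + 4.0 * dk * dk * dk / 3.0;
}

/* Special case n = k: (2/3)*n^2*(3m - n). */
double orgqr_flops_n_equals_k(int m, int n)
{
    assert(m >= n && n >= 0);
    double dm = m, dn = n;
    return 2.0 * dn * dn * (3.0 * dm - dn) / 3.0;
}
```

Setting k = n in the general expression gives 4*m*n^2 - 2*(m + n)*n^2 + (4/3)*n^3 = 2*m*n^2 - (2/3)*n^3, which is the same as (2/3)*n^2*(3m - n).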
?ormqr
Multiplies a real matrix by the orthogonal matrix Q of
the QR factorization formed by ?geqrf or ?geqpf.
Syntax
lapack_int LAPACKE_sormqr (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const float* a, lapack_int lda, const float* tau, float* c,
lapack_int ldc);
lapack_int LAPACKE_dormqr (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const double* a, lapack_int lda, const double* tau, double*
c, lapack_int ldc);
Include Files
• mkl.h
Description
The routine multiplies a real matrix C by Q or Q^T, where Q is the orthogonal matrix Q of the QR factorization
formed by the routine ?geqrf or ?geqpf.
Depending on the parameters side and trans, the routine can form one of the matrix products
Q*C, Q^T*C, C*Q, or C*Q^T (overwriting the result on C).
Input Parameters
a, tau, c Arrays:
a and tau are the arrays returned by sgeqrf / dgeqrf or sgeqpf / dgeqpf.
The size of a is max(1, lda*k) for column major layout, max(1, lda*m) for
row major layout and side = 'L', and max(1, lda*n) for row major layout
and side = 'R'.
Output Parameters
c Overwritten by the product Q*C, Q^T*C, C*Q, or C*Q^T (as specified by side
and trans).
Return Values
This function returns a value info.
Application Notes
The complex counterpart of this routine is unmqr.
?ungqr
Generates the complex unitary matrix Q of the QR
factorization formed by ?geqrf.
Syntax
lapack_int LAPACKE_cungqr (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
lapack_complex_float* a, lapack_int lda, const lapack_complex_float* tau);
lapack_int LAPACKE_zungqr (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
lapack_complex_double* a, lapack_int lda, const lapack_complex_double* tau);
Include Files
• mkl.h
Description
The routine generates the whole or part of the m-by-m unitary matrix Q of the QR factorization formed by the
routines ?geqrf or ?geqpf. Use this routine after a call to cgeqrf/zgeqrf or cgeqpf/zgeqpf.
Usually Q is determined from the QR factorization of an m-by-p matrix A with m ≥ p. To compute the whole
matrix Q, use:
LAPACKE_?ungqr(matrix_layout, m, m, p, a, lda, tau)
Input Parameters
a, tau Arrays: a and tau are the arrays returned by cgeqrf/zgeqrf or cgeqpf/zgeqpf.
The size of a is max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout.
The size of tau must be at least max(1, k).
lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.
Output Parameters
Return Values
This function returns a value info.
Application Notes
The computed Q differs from an exactly unitary matrix by a matrix E such that ||E||_2 = O(ε)*||A||_2,
where ε is the machine precision.
The total number of floating-point operations is approximately 16*m*n*k - 8*(m + n)*k^2 + (16/3)*k^3.
?unmqr
Multiplies a complex matrix by the unitary matrix Q of
the QR factorization formed by ?geqrf.
Syntax
lapack_int LAPACKE_cunmqr (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const lapack_complex_float* a, lapack_int lda, const
lapack_complex_float* tau, lapack_complex_float* c, lapack_int ldc);
lapack_int LAPACKE_zunmqr (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const lapack_complex_double* a, lapack_int lda, const
lapack_complex_double* tau, lapack_complex_double* c, lapack_int ldc);
Include Files
• mkl.h
Description
The routine multiplies a rectangular complex matrix C by Q or Q^H, where Q is the unitary matrix Q of the QR
factorization formed by the routines ?geqrf or ?geqpf.
Depending on the parameters side and trans, the routine can form one of the matrix products Q*C, Q^H*C,
C*Q, or C*Q^H (overwriting the result on C).
Input Parameters
a, c, tau Arrays:
a and tau are the arrays returned by cgeqrf / zgeqrf or cgeqpf / zgeqpf.
The size of a is max(1, lda*k) for column major layout, max(1, lda*m) for row
major layout when side = 'L', and max(1, lda*n) for row major layout
when side = 'R'.
The size of tau must be at least max(1, k).
c (size max(1, ldc*n) for column major layout and max(1, ldc*m) for row
major layout) contains the m-by-n matrix C.
lda The leading dimension of a. If side = 'R', lda ≥ max(1, n) for column
major layout and lda ≥ max(1, k) for row major layout.
Output Parameters
c Overwritten by the product Q*C, Q^H*C, C*Q, or C*Q^H (as specified by side
and trans).
Return Values
This function returns a value info.
Application Notes
The real counterpart of this routine is ormqr.
?gelqf
Computes the LQ factorization of a general m-by-n
matrix.
Syntax
lapack_int LAPACKE_sgelqf (int matrix_layout, lapack_int m, lapack_int n, float* a,
lapack_int lda, float* tau);
lapack_int LAPACKE_dgelqf (int matrix_layout, lapack_int m, lapack_int n, double* a,
lapack_int lda, double* tau);
lapack_int LAPACKE_cgelqf (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* tau);
lapack_int LAPACKE_zgelqf (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* tau);
Include Files
• mkl.h
Description
The routine forms the LQ factorization of a general m-by-n matrix A (see Orthogonal Factorizations). No
pivoting is performed.
The routine does not form the matrix Q explicitly. Instead, Q is represented as a product of min(m, n)
elementary reflectors. Routines are provided to work with Q in this representation.
NOTE
This routine supports the Progress Routine feature. See Progress Function for details.
Input Parameters
a Array a of size max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout contains the matrix A.
lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.
Output Parameters
tau Contains scalars that define elementary reflectors for the matrix Q (see
Orthogonal Factorizations).
Return Values
This function returns a value info.
Application Notes
The computed factorization is the exact factorization of a matrix A + E, where
||E||_2 = O(ε)*||A||_2.
The approximate number of floating-point operations for real flavors is
(4/3)*n^3 if m = n,
(2/3)*n^2*(3m - n) if m > n,
(2/3)*m^2*(3n - m) if m < n.
(The columns of the computed X are the minimum-norm solution vectors x. Here A is an m-by-n matrix with
m < n; Q1 denotes the first m columns of Q).
To compute the elements of Q explicitly, call ?orglq (for real matrices) or ?unglq (for complex matrices).
See Also
mkl_progress
?orglq
Generates the real orthogonal matrix Q of the LQ
factorization formed by ?gelqf.
Syntax
lapack_int LAPACKE_sorglq (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
float* a, lapack_int lda, const float* tau);
lapack_int LAPACKE_dorglq (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
double* a, lapack_int lda, const double* tau);
Include Files
• mkl.h
Description
The routine generates the whole or part of the n-by-n orthogonal matrix Q of the LQ factorization formed by the
routine ?gelqf. Use this routine after a call to sgelqf/dgelqf.
Usually Q is determined from the LQ factorization of a p-by-n matrix A with n ≥ p. To compute the whole
matrix Q, use:
LAPACKE_?orglq(matrix_layout, n, n, p, a, lda, tau)
Input Parameters
a, tau Arrays: a (size max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout) and tau are the arrays returned by sgelqf/dgelqf.
lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.
Output Parameters
Return Values
This function returns a value info.
Application Notes
The computed Q differs from an exactly orthogonal matrix by a matrix E such that ||E||_2 = O(ε)*||A||_2,
where ε is the machine precision.
The total number of floating-point operations is approximately 4*m*n*k - 2*(m + n)*k^2 + (4/3)*k^3.
?ormlq
Multiplies a real matrix by the orthogonal matrix Q of
the LQ factorization formed by ?gelqf.
Syntax
lapack_int LAPACKE_sormlq (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const float* a, lapack_int lda, const float* tau, float* c,
lapack_int ldc);
lapack_int LAPACKE_dormlq (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const double* a, lapack_int lda, const double* tau, double*
c, lapack_int ldc);
Include Files
• mkl.h
Description
The routine multiplies a real m-by-n matrix C by Q or Q^T, where Q is the orthogonal matrix Q of the LQ
factorization formed by the routine ?gelqf.
Depending on the parameters side and trans, the routine can form one of the matrix products Q*C, Q^T*C,
C*Q, or C*Q^T (overwriting the result on C).
Input Parameters
a, c, tau Arrays:
a and tau are arrays returned by ?gelqf.
lda The leading dimension of a. For column major layout, lda ≥ max(1, k). For
row major layout, if side = 'L', lda ≥ max(1, m), or, if side = 'R', lda ≥
max(1, n).
ldc The leading dimension of c; ldc ≥ max(1, m) for column major layout and
max(1, n) for row major layout.
Output Parameters
c Overwritten by the product Q*C, Q^T*C, C*Q, or C*Q^T (as specified by side
and trans).
Return Values
This function returns a value info.
Application Notes
The complex counterpart of this routine is unmlq.
?unglq
Generates the complex unitary matrix Q of the LQ
factorization formed by ?gelqf.
Syntax
lapack_int LAPACKE_cunglq (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
lapack_complex_float* a, lapack_int lda, const lapack_complex_float* tau);
lapack_int LAPACKE_zunglq (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
lapack_complex_double* a, lapack_int lda, const lapack_complex_double* tau);
Include Files
• mkl.h
Description
The routine generates the whole or part of the n-by-n unitary matrix Q of the LQ factorization formed by the
routine ?gelqf. Use this routine after a call to cgelqf/zgelqf.
Usually Q is determined from the LQ factorization of a p-by-n matrix A with n ≥ p. To compute the whole
matrix Q, use:
LAPACKE_?unglq(matrix_layout, n, n, p, a, lda, tau)
To compute the leading k rows of Qk, which form an orthonormal basis in the space spanned by the leading k
rows of A, use:
LAPACKE_?unglq(matrix_layout, k, n, k, a, lda, tau)
Input Parameters
a, tau Arrays: a (size max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout) and tau are the arrays returned by cgelqf/zgelqf.
lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.
Output Parameters
Return Values
This function returns a value info.
Application Notes
The computed Q differs from an exactly unitary matrix by a matrix E such that ||E||_2 = O(ε)*||A||_2,
where ε is the machine precision.
The total number of floating-point operations is approximately 16*m*n*k - 8*(m + n)*k^2 + (16/3)*k^3.
?unmlq
Multiplies a complex matrix by the unitary matrix Q of
the LQ factorization formed by ?gelqf.
Syntax
lapack_int LAPACKE_cunmlq (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const lapack_complex_float* a, lapack_int lda, const
lapack_complex_float* tau, lapack_complex_float* c, lapack_int ldc);
lapack_int LAPACKE_zunmlq (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const lapack_complex_double* a, lapack_int lda, const
lapack_complex_double* tau, lapack_complex_double* c, lapack_int ldc);
Include Files
• mkl.h
Description
The routine multiplies a complex m-by-n matrix C by Q or Q^H, where Q is the unitary matrix Q of the LQ
factorization formed by the routine ?gelqf.
Depending on the parameters side and trans, the routine can form one of the matrix products Q*C, Q^H*C,
C*Q, or C*Q^H (overwriting the result on C).
Input Parameters
a, c, tau Arrays:
a and tau are arrays returned by ?gelqf.
lda The leading dimension of a. For column major layout, lda ≥ max(1, k). For
row major layout, lda ≥ max(1, m) if side = 'L', or lda ≥ max(1, n) if
side = 'R'.
ldc The leading dimension of c; ldc ≥ max(1, m) for column major layout and
max(1, n) for row major layout.
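The leading-dimension constraints above depend on both the layout and side. A minimal sketch of how a caller might compute the smallest admissible values follows; the function names are hypothetical, and the two layout constants are local stand-ins chosen to mirror LAPACK_ROW_MAJOR and LAPACK_COL_MAJOR.

```c
/* Local stand-ins for the layout constants (assumption: same values
   as LAPACK_ROW_MAJOR / LAPACK_COL_MAJOR in the LAPACKE headers). */
enum { LAYOUT_ROW_MAJOR = 101, LAYOUT_COL_MAJOR = 102 };

static int max_int(int a, int b) { return a > b ? a : b; }

/* Smallest lda allowed by the constraints stated for ?unmlq.
   Hypothetical helper, not an MKL function. */
int unmlq_min_lda(int layout, char side, int m, int n, int k)
{
    if (layout == LAYOUT_COL_MAJOR)
        return max_int(1, k);
    return (side == 'L') ? max_int(1, m) : max_int(1, n);
}

/* Smallest ldc allowed by the constraints stated for ?unmlq. */
int unmlq_min_ldc(int layout, int m, int n)
{
    return (layout == LAYOUT_COL_MAJOR) ? max_int(1, m) : max_int(1, n);
}
```

Passing a leading dimension smaller than these values violates the documented constraints, so checking them before the call is a cheap safeguard.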
Output Parameters
c Overwritten by the product Q*C, Q^H*C, C*Q, or C*Q^H (as specified by side
and trans).
Return Values
This function returns a value info.
Application Notes
The real counterpart of this routine is ormlq.
?geqlf
Computes the QL factorization of a general m-by-n
matrix.
Syntax
lapack_int LAPACKE_sgeqlf (int matrix_layout, lapack_int m, lapack_int n, float* a,
lapack_int lda, float* tau);
lapack_int LAPACKE_dgeqlf (int matrix_layout, lapack_int m, lapack_int n, double* a,
lapack_int lda, double* tau);
lapack_int LAPACKE_cgeqlf (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* tau);
lapack_int LAPACKE_zgeqlf (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* tau);
Include Files
• mkl.h
Description
The routine forms the QL factorization of a general m-by-n matrix A (see Orthogonal Factorizations). No
pivoting is performed.
The routine does not form the matrix Q explicitly. Instead, Q is represented as a product of min(m, n)
elementary reflectors. Routines are provided to work with Q in this representation.
NOTE
This routine supports the Progress Routine feature. See Progress Function for details.
Input Parameters
a Array a of size max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout contains the matrix A.
lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.
Output Parameters
tau Array, size at least max(1, min(m, n)). Contains scalar factors of the
elementary reflectors for the matrix Q (see Orthogonal Factorizations).
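The size rules for a, lda, and tau can be gathered into small helpers when allocating buffers. This is a sketch under the sizes stated above; the helper names are hypothetical and the layout constants are local stand-ins mirroring LAPACK_ROW_MAJOR and LAPACK_COL_MAJOR.

```c
#include <stddef.h>

/* Local stand-ins for the layout constants (assumption). */
enum { LAYOUT_ROW_MAJOR = 101, LAYOUT_COL_MAJOR = 102 };

static size_t max_sz(size_t a, size_t b) { return a > b ? a : b; }
static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Minimum leading dimension of a: max(1, m) or max(1, n). */
size_t geqlf_min_lda(int layout, size_t m, size_t n)
{
    return (layout == LAYOUT_COL_MAJOR) ? max_sz(1, m) : max_sz(1, n);
}

/* Number of elements in a: max(1, lda*n) or max(1, lda*m). */
size_t geqlf_a_len(int layout, size_t m, size_t n, size_t lda)
{
    return (layout == LAYOUT_COL_MAJOR) ? max_sz(1, lda * n)
                                        : max_sz(1, lda * m);
}

/* Minimum number of elements in tau: max(1, min(m, n)). */
size_t geqlf_tau_len(size_t m, size_t n)
{
    return max_sz(1, min_sz(m, n));
}
```

For a 4-by-3 matrix, both layouts need 12 elements for a when the minimum leading dimension is used, and tau needs 3 elements.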
Return Values
This function returns a value info.
Application Notes
Related routines include:
See Also
mkl_progress
?orgql
Generates the real matrix Q of the QL factorization
formed by ?geqlf.
Syntax
lapack_int LAPACKE_sorgql (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
float* a, lapack_int lda, const float* tau);
lapack_int LAPACKE_dorgql (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
double* a, lapack_int lda, const double* tau);
Include Files
• mkl.h
Description
The routine generates an m-by-n real matrix Q with orthonormal columns, which is defined as the last n
columns of a product of k elementary reflectors H(i) of order m: Q = H(k)*...*H(2)*H(1), as returned
by ?geqlf. Use this routine after a call to sgeqlf/dgeqlf.
Input Parameters
a, tau Arrays: a (size max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout), tau.
On entry, the (n - k + i)th column of a must contain the vector which
defines the elementary reflector H(i), for i = 1,2,...,k, as returned by
sgeqlf/dgeqlf in the last k columns of its array argument a; tau[i - 1]
must contain the scalar factor of the elementary reflector H(i), as returned
by sgeqlf/dgeqlf;
lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.
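The entry condition on a uses 1-based indices: the vector defining H(i) sits in column n - k + i, and its scalar factor is tau[i - 1]. The sketch below converts that to 0-based C indexing and to a linear offset in either layout; the helper names and layout constants are assumptions introduced for illustration.

```c
#include <stddef.h>

/* Local stand-ins for the layout constants (assumption). */
enum { LAYOUT_ROW_MAJOR = 101, LAYOUT_COL_MAJOR = 102 };

/* 0-based column of a that holds the vector defining H(i),
   for 1 <= i <= k <= n. Hypothetical helper. */
size_t ql_reflector_col(size_t n, size_t k, size_t i)
{
    return n - k + i - 1;
}

/* Linear offset of the element at 0-based (row, col) in an array
   with leading dimension ld. */
size_t elem_offset(int layout, size_t row, size_t col, size_t ld)
{
    return (layout == LAYOUT_COL_MAJOR) ? row + col * ld
                                        : row * ld + col;
}
```

With n = 5 and k = 3, the reflectors H(1), H(2), H(3) occupy the 0-based columns 2, 3, and 4, which are the last k columns of a.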
Output Parameters
Return Values
This function returns a value info.
Application Notes
The complex counterpart of this routine is ungql.
?ungql
Generates the complex matrix Q of the QL
factorization formed by ?geqlf.
Syntax
lapack_int LAPACKE_cungql (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
lapack_complex_float* a, lapack_int lda, const lapack_complex_float* tau);
lapack_int LAPACKE_zungql (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
lapack_complex_double* a, lapack_int lda, const lapack_complex_double* tau);
Include Files
• mkl.h
Description
The routine generates an m-by-n complex matrix Q with orthonormal columns, which is defined as the last n
columns of a product of k elementary reflectors H(i) of order m: Q = H(k)*...*H(2)*H(1), as returned
by ?geqlf. Use this routine after a call to cgeqlf/zgeqlf.
Input Parameters
a, tau Arrays: a (size max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout), tau.
On entry, the (n - k + i)th column of a must contain the vector which
defines the elementary reflector H(i), for i = 1,2,...,k, as returned by
cgeqlf/zgeqlf in the last k columns of its array argument a;
tau[i - 1] must contain the scalar factor of the elementary reflector H(i), as
returned by cgeqlf/zgeqlf;
lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.
Output Parameters
Return Values
This function returns a value info.
Application Notes
The real counterpart of this routine is orgql.
?ormql
Multiplies a real matrix by the orthogonal matrix Q of
the QL factorization formed by ?geqlf.
Syntax
lapack_int LAPACKE_sormql (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const float* a, lapack_int lda, const float* tau, float* c,
lapack_int ldc);
lapack_int LAPACKE_dormql (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const double* a, lapack_int lda, const double* tau, double*
c, lapack_int ldc);
Include Files
• mkl.h
Description
The routine multiplies a real m-by-n matrix C by Q or Q^T, where Q is the orthogonal matrix of the QL
factorization formed by the routine ?geqlf.
Depending on the parameters side and trans, the routine ormql can form one of the matrix products Q*C,
Q^T*C, C*Q, or C*Q^T (overwriting the result over C).
Input Parameters
On entry, the i-th column of a must contain the vector which defines the
elementary reflector H(i), for i = 1,2,...,k, as returned by sgeqlf/dgeqlf in
the last k columns of its array argument a.
tau[i - 1] must contain the scalar factor of the elementary reflector H(i), as
returned by sgeqlf/dgeqlf.
ldc The leading dimension of c; ldc ≥ max(1, m) for column major layout and
max(1, n) for row major layout.
Output Parameters
c Overwritten by the product Q*C, Q^T*C, C*Q, or C*Q^T (as specified by side
and trans).
Return Values
This function returns a value info.
Application Notes
The complex counterpart of this routine is unmql.
?unmql
Multiplies a complex matrix by the unitary matrix Q of
the QL factorization formed by ?geqlf.
Syntax
lapack_int LAPACKE_cunmql (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const lapack_complex_float* a, lapack_int lda, const
lapack_complex_float* tau, lapack_complex_float* c, lapack_int ldc);
lapack_int LAPACKE_zunmql (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const lapack_complex_double* a, lapack_int lda, const
lapack_complex_double* tau, lapack_complex_double* c, lapack_int ldc);
Include Files
• mkl.h
Description
The routine multiplies a complex m-by-n matrix C by Q or Q^H, where Q is the unitary matrix of the QL
factorization formed by the routine ?geqlf.
Depending on the parameters side and trans, the routine unmql can form one of the matrix products Q*C,
Q^H*C, C*Q, or C*Q^H (overwriting the result over C).
Input Parameters
On entry, the i-th column of a must contain the vector which defines the
elementary reflector H(i), for i = 1,2,...,k, as returned by cgeqlf/zgeqlf
in the last k columns of its array argument a.
tau[i - 1] must contain the scalar factor of the elementary reflector H(i),
as returned by cgeqlf/zgeqlf.
ldc The leading dimension of c; ldc ≥ max(1, m) for column major layout and
max(1, n) for row major layout.
Output Parameters
c Overwritten by the product Q*C, Q^H*C, C*Q, or C*Q^H (as specified by side
and trans).
Return Values
This function returns a value info.
Application Notes
The real counterpart of this routine is ormql.
?gerqf
Computes the RQ factorization of a general m-by-n
matrix.
Syntax
lapack_int LAPACKE_sgerqf (int matrix_layout, lapack_int m, lapack_int n, float* a,
lapack_int lda, float* tau);
lapack_int LAPACKE_dgerqf (int matrix_layout, lapack_int m, lapack_int n, double* a,
lapack_int lda, double* tau);
lapack_int LAPACKE_cgerqf (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* tau);
lapack_int LAPACKE_zgerqf (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* tau);
Include Files
• mkl.h
Description
The routine forms the RQ factorization of a general m-by-n matrix A (see Orthogonal Factorizations). No
pivoting is performed.
The routine does not form the matrix Q explicitly. Instead, Q is represented as a product of min(m, n)
elementary reflectors. Routines are provided to work with Q in this representation.
NOTE
This routine supports the Progress Routine feature. See Progress Function for details.
Input Parameters
a Array a of size max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout contains the m-by-n matrix A.
lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.
Output Parameters
tau Array, size at least max(1, min(m, n)). (See Orthogonal Factorizations.)
Contains scalar factors of the elementary reflectors for the matrix Q.
Return Values
This function returns a value info.
Application Notes
Related routines include:
See Also
mkl_progress
?orgrq
Generates the real matrix Q of the RQ factorization
formed by ?gerqf.
Syntax
lapack_int LAPACKE_sorgrq (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
float* a, lapack_int lda, const float* tau);
lapack_int LAPACKE_dorgrq (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
double* a, lapack_int lda, const double* tau);
Include Files
• mkl.h
Description
The routine generates an m-by-n real matrix with orthonormal rows, which is defined as the last m rows of a
product of k elementary reflectors H(i) of order n: Q = H(1)*H(2)*...*H(k), as returned by
?gerqf. Use this routine after a call to sgerqf/dgerqf.
Input Parameters
a, tau Arrays: a (size max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout), tau.
On entry, the (m - k + i)-th row of a must contain the vector which defines
the elementary reflector H(i), for i = 1,2,...,k, as returned by sgerqf/dgerqf
in the last k rows of its array argument a;
tau[i - 1] must contain the scalar factor of the elementary reflector H(i),
as returned by sgerqf/dgerqf.
lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.
Output Parameters
Return Values
This function returns a value info.
Application Notes
The complex counterpart of this routine is ungrq.
?ungrq
Generates the complex matrix Q of the RQ
factorization formed by ?gerqf.
Syntax
lapack_int LAPACKE_cungrq (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
lapack_complex_float* a, lapack_int lda, const lapack_complex_float* tau);
lapack_int LAPACKE_zungrq (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
lapack_complex_double* a, lapack_int lda, const lapack_complex_double* tau);
Include Files
• mkl.h
Description
The routine generates an m-by-n complex matrix with orthonormal rows, which is defined as the last m rows
of a product of k elementary reflectors H(i) of order n: Q = H(1)^H*H(2)^H*...*H(k)^H, as returned by
?gerqf. Use this routine after a call to cgerqf/zgerqf.
Input Parameters
a, tau Arrays: a (size max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout), tau.
On entry, the (m - k + i)-th row of a must contain the vector which defines
the elementary reflector H(i), for i = 1,2,...,k, as returned by cgerqf/zgerqf
in the last k rows of its array argument a;
tau[i - 1] must contain the scalar factor of the elementary reflector H(i), as
returned by cgerqf/zgerqf.
lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.
Output Parameters
Return Values
This function returns a value info.
Application Notes
The real counterpart of this routine is orgrq.
?ormrq
Multiplies a real matrix by the orthogonal matrix Q of
the RQ factorization formed by ?gerqf.
Syntax
lapack_int LAPACKE_sormrq (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const float* a, lapack_int lda, const float* tau, float* c,
lapack_int ldc);
lapack_int LAPACKE_dormrq (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const double* a, lapack_int lda, const double* tau, double*
c, lapack_int ldc);
Include Files
• mkl.h
Description
The routine multiplies a real m-by-n matrix C by Q or Q^T, where Q is the real orthogonal matrix defined as a
product of k elementary reflectors H(i): Q = H(1)*H(2)*...*H(k), as returned by the RQ factorization routine ?gerqf.
Depending on the parameters side and trans, the routine can form one of the matrix products Q*C, Q^T*C,
C*Q, or C*Q^T (overwriting the result over C).
Input Parameters
k The number of elementary reflectors whose product defines the matrix Q.
Constraints:
0 ≤ k ≤ m, if side = 'L';
0 ≤ k ≤ n, if side = 'R'.
a, tau, c Arrays: a (size for side = 'L': max(1, lda*m) for column major layout and
max(1, lda*k) for row major layout; for side = 'R': max(1, lda*n) for
column major layout and max(1, lda*k) for row major layout), tau, c (size
max(1, ldc*n) for column major layout and max(1, ldc*m) for row major
layout).
On entry, the i-th row of a must contain the vector which defines the
elementary reflector H(i), for i = 1,2,...,k, as returned by sgerqf/dgerqf in
the last k rows of its array argument a.
tau[i - 1] must contain the scalar factor of the elementary reflector H(i), as
returned by sgerqf/dgerqf.
lda The leading dimension of a; lda ≥ max(1, k) for column major layout. For
row major layout, lda ≥ max(1, m) if side = 'L', and lda ≥ max(1, n) if
side = 'R'.
ldc The leading dimension of c; ldc ≥ max(1, m) for column major layout and
max(1, n) for row major layout.
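The constraint on k and the size of a both change with side. A minimal sketch of a pre-call check follows, using only the rules listed above; the function names are hypothetical and the layout constants are local stand-ins mirroring LAPACK_ROW_MAJOR and LAPACK_COL_MAJOR.

```c
/* Local stand-ins for the layout constants (assumption). */
enum { LAYOUT_ROW_MAJOR = 101, LAYOUT_COL_MAJOR = 102 };

/* Returns 1 when k satisfies 0 <= k <= m (side = 'L')
   or 0 <= k <= n (side = 'R'), and 0 otherwise. */
int rq_apply_k_ok(char side, int m, int n, int k)
{
    int limit = (side == 'L') ? m : n;
    return k >= 0 && k <= limit;
}

/* Number of elements in a, following the sizes stated above:
   column major: max(1, lda*m) for 'L', max(1, lda*n) for 'R';
   row major:    max(1, lda*k) for either side. */
long rq_apply_a_len(int layout, char side, int m, int n, int k, int lda)
{
    long len;
    if (layout == LAYOUT_COL_MAJOR)
        len = (long)lda * ((side == 'L') ? m : n);
    else
        len = (long)lda * k;
    return len > 1 ? len : 1;
}
```

For example, with m = 4, n = 6, and k = 5, the value of k is acceptable only for side = 'R'.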
Output Parameters
c Overwritten by the product Q*C, Q^T*C, C*Q, or C*Q^T (as specified by side
and trans).
Return Values
This function returns a value info.
Application Notes
The complex counterpart of this routine is unmrq.
?unmrq
Multiplies a complex matrix by the unitary matrix Q of
the RQ factorization formed by ?gerqf.
Syntax
lapack_int LAPACKE_cunmrq (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const lapack_complex_float* a, lapack_int lda, const
lapack_complex_float* tau, lapack_complex_float* c, lapack_int ldc);
lapack_int LAPACKE_zunmrq (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const lapack_complex_double* a, lapack_int lda, const
lapack_complex_double* tau, lapack_complex_double* c, lapack_int ldc);
Include Files
• mkl.h
Description
The routine multiplies a complex m-by-n matrix C by Q or Q^H, where Q is the complex unitary matrix defined
as a product of k elementary reflectors H(i) of order n: Q = H(1)^H*H(2)^H*...*H(k)^H, as returned by the
RQ factorization routine ?gerqf.
Depending on the parameters side and trans, the routine can form one of the matrix products Q*C, Q^H*C,
C*Q, or C*Q^H (overwriting the result over C).
Input Parameters
a, tau, c Arrays: a (size for side = 'L': max(1, lda*m) for column major layout and
max(1, lda*k) for row major layout; for side = 'R': max(1, lda*n) for
column major layout and max(1, lda*k) for row major layout), tau, c (size
max(1, ldc*n) for column major layout and max(1, ldc*m) for row major
layout).
On entry, the ith row of a must contain the vector which defines the
elementary reflector H(i), for i = 1,2,...,k, as returned by cgerqf/zgerqf in
the last k rows of its array argument a.
tau[i - 1] must contain the scalar factor of the elementary reflector H(i), as
returned by cgerqf/zgerqf.
lda The leading dimension of a; lda ≥ max(1, k) for column major layout. For row
major layout, lda ≥ max(1, m) if side = 'L', and lda ≥ max(1, n) if side
= 'R'.
ldc The leading dimension of c; ldc ≥ max(1, m) for column major layout and
max(1, n) for row major layout.
Output Parameters
c Overwritten by the product Q*C, Q^H*C, C*Q, or C*Q^H (as specified by side
and trans).
Return Values
This function returns a value info.
Application Notes
The real counterpart of this routine is ormrq.
?tzrzf
Reduces the upper trapezoidal matrix A to upper
triangular form.
Syntax
lapack_int LAPACKE_stzrzf (int matrix_layout, lapack_int m, lapack_int n, float* a,
lapack_int lda, float* tau);
lapack_int LAPACKE_dtzrzf (int matrix_layout, lapack_int m, lapack_int n, double* a,
lapack_int lda, double* tau);
lapack_int LAPACKE_ctzrzf (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* tau);
lapack_int LAPACKE_ztzrzf (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* tau);
Include Files
• mkl.h
Description
The routine reduces the m-by-n (m ≤ n) real/complex upper trapezoidal matrix A to upper triangular form by
means of orthogonal/unitary transformations. The upper trapezoidal matrix
A = [A1 A2] = [A(1:m, 1:m), A(1:m, m+1:n)] is factored as
A = [R 0]*Z,
where Z is an n-by-n orthogonal/unitary matrix, R is an m-by-m upper triangular matrix, and 0 is the
m-by-(n-m) zero matrix.
The ?tzrzf routine replaces the deprecated ?tzrqf routine.
Input Parameters
a Array a is of size max(1, lda*n) for column major layout and max(1,
lda*m) for row major layout.
The leading m-by-n upper trapezoidal part of the array a contains the
matrix A to be factorized.
lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.
Output Parameters
tau Array, size at least max(1, m). Contains scalar factors of the elementary
reflectors for the matrix Z.
Return Values
This function returns a value info.
Application Notes
The factorization is obtained by Householder's method. The k-th transformation matrix, Z(k), which is used
to introduce zeros into the (m - k + 1)-th row of A, is given in the form
Z(k) = [I 0; 0 T(k)],
where for real flavors
T(k) = I - tau*u(k)*u(k)^T, u(k) = [1; 0; z(k)],
and for complex flavors
T(k) = I - tau*u(k)*u(k)^H, u(k) = [1; 0; z(k)].
Here tau is a scalar and z(k) is an l-element vector (l = n - m). tau and z(k) are chosen to annihilate the
elements of the k-th row of A2.
The scalar tau is returned in the k-th element of tau and the vector u(k) in the k-th row of A, such that the
elements of z(k) are stored in the last n - m elements of the k-th row of array a.
?ormrz
Multiplies a real matrix by the orthogonal matrix
defined from the factorization formed by ?tzrzf.
Syntax
lapack_int LAPACKE_sormrz (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, lapack_int l, const float* a, lapack_int lda, const float*
tau, float* c, lapack_int ldc);
lapack_int LAPACKE_dormrz (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, lapack_int l, const double* a, lapack_int lda, const
double* tau, double* c, lapack_int ldc);
Include Files
• mkl.h
Description
The ?ormrz routine multiplies a real m-by-n matrix C by Q or Q^T, where Q is the real orthogonal matrix
defined as a product of k elementary reflectors H(i) of order n: Q = H(1)*H(2)*...*H(k), as returned by
the factorization routine ?tzrzf.
Depending on the parameters side and trans, the routine can form one of the matrix products Q*C, Q^T*C,
C*Q, or C*Q^T (overwriting the result over C).
The matrix Q is of order m if side = 'L' and of order n if side = 'R'.
Input Parameters
a, tau, c Arrays: a (size for side = 'L': max(1, lda*m) for column major layout and
max(1, lda*k) for row major layout; for side = 'R': max(1, lda*n) for
column major layout and max(1, lda*k) for row major layout), tau, c (size
max(1, ldc*n) for column major layout and max(1, ldc*m) for row major
layout).
On entry, the ith row of a must contain the vector which defines the
elementary reflector H(i), for i = 1,2,...,k, as returned by stzrzf/dtzrzf
in the last k rows of its array argument a.
tau[i - 1] must contain the scalar factor of the elementary reflector H(i),
as returned by stzrzf/dtzrzf.
c contains the m-by-n matrix C.
lda The leading dimension of a; lda ≥ max(1, k) for column major layout. For row
major layout, lda ≥ max(1, m) if side = 'L', and lda ≥ max(1, n) if side
= 'R'.
ldc The leading dimension of c; ldc ≥ max(1, m) for column major layout and
max(1, n) for row major layout.
Output Parameters
c Overwritten by the product Q*C, Q^T*C, C*Q, or C*Q^T (as specified by side
and trans).
Return Values
This function returns a value info.
Application Notes
The complex counterpart of this routine is unmrz.
?unmrz
Multiplies a complex matrix by the unitary matrix
defined from the factorization formed by ?tzrzf.
Syntax
lapack_int LAPACKE_cunmrz (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, lapack_int l, const lapack_complex_float* a, lapack_int
lda, const lapack_complex_float* tau, lapack_complex_float* c, lapack_int ldc);
lapack_int LAPACKE_zunmrz (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, lapack_int l, const lapack_complex_double* a, lapack_int
lda, const lapack_complex_double* tau, lapack_complex_double* c, lapack_int ldc);
Include Files
• mkl.h
Description
The routine multiplies a complex m-by-n matrix C by Q or QH, where Q is the unitary matrix defined as a
product of k elementary reflectors H(i):
Input Parameters
a, tau, c Arrays: a (size for side = 'L': max(1, lda*m) for column major layout and
max(1, lda*k) for row major layout; for side = 'R': max(1, lda*n) for
column major layout and max(1, lda*k) for row major layout), tau, c (size
max(1, ldc*n) for column major layout and max(1, ldc*m) for row major
layout).
On entry, the ith row of a must contain the vector which defines the
elementary reflector H(i), for i = 1,2,...,k, as returned by ctzrzf/ztzrzf
in the last k rows of its array argument a.
tau[i - 1] must contain the scalar factor of the elementary reflector H(i),
as returned by ctzrzf/ztzrzf.
lda The leading dimension of a; lda ≥ max(1, k) for column major layout. For
row major layout, lda ≥ max(1, m) if side = 'L', and lda ≥ max(1, n) if
side = 'R'.
ldc The leading dimension of c; ldc ≥ max(1, m) for column major layout and
max(1, n) for row major layout.
Output Parameters
c Overwritten by the product Q*C, Q^H*C, C*Q, or C*Q^H (as specified by side
and trans).
Return Values
This function returns a value info.
Application Notes
The real counterpart of this routine is ormrz.
?ggqrf
Computes the generalized QR factorization of two
matrices.
Syntax
lapack_int LAPACKE_sggqrf (int matrix_layout, lapack_int n, lapack_int m, lapack_int p,
float* a, lapack_int lda, float* taua, float* b, lapack_int ldb, float* taub);
lapack_int LAPACKE_dggqrf (int matrix_layout, lapack_int n, lapack_int m, lapack_int p,
double* a, lapack_int lda, double* taua, double* b, lapack_int ldb, double* taub);
lapack_int LAPACKE_cggqrf (int matrix_layout, lapack_int n, lapack_int m, lapack_int p,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* taua,
lapack_complex_float* b, lapack_int ldb, lapack_complex_float* taub);
lapack_int LAPACKE_zggqrf (int matrix_layout, lapack_int n, lapack_int m, lapack_int p,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* taua,
lapack_complex_double* b, lapack_int ldb, lapack_complex_double* taub);
Include Files
• mkl.h
Description
The routine forms the generalized QR factorization of an n-by-m matrix A and an n-by-p matrix B as A =
Q*R, B = Q*T*Z, where Q is an n-by-n orthogonal/unitary matrix, Z is a p-by-p orthogonal/unitary matrix,
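and R and T assume one of the forms (blocks separated by a semicolon are stacked vertically):
if n ≥ m, R = [R11; 0], where R11 is an m-by-m upper triangular matrix and 0 is the (n-m)-by-m zero matrix,
or, if n < m, R = [R11 R12], where R11 is an n-by-n upper triangular matrix and R12 is n-by-(m-n);
if n ≤ p, T = [0 T12], where 0 is the n-by-(p-n) zero matrix and T12 is an n-by-n upper triangular matrix,
or, if n > p, T = [T11; T21], where T11 is (n-p)-by-p and T21 is a p-by-p upper triangular matrix.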
Input Parameters
a, b Array a of size max(1, lda*m) for column major layout and max(1, lda*n)
for row major layout contains the matrix A.
Array b of size max(1, ldb*p) for column major layout and max(1, ldb*n)
for row major layout contains the matrix B.
lda The leading dimension of a; at least max(1, n) for column major layout and
at least max(1, m) for row major layout.
ldb The leading dimension of b; at least max(1, n) for column major layout and
at least max(1, p) for row major layout.
Output Parameters
taua, taub Arrays, size at least max(1, min(n, m)) for taua and at least max(1,
min(n, p)) for taub. The array taua contains the scalar factors of the
elementary reflectors which represent the orthogonal/unitary matrix Q.
The array taub contains the scalar factors of the elementary reflectors
which represent the orthogonal/unitary matrix Z.
Return Values
This function returns a value info.
Application Notes
The matrix Q is represented as a product of elementary reflectors
Q = H(1)H(2)...H(k), where k = min(n,m).
Each H(i) has the form
H(i) = I - τa*v*v^T for real flavors, or
H(i) = I - τa*v*v^H for complex flavors,
where τa is a real/complex scalar, and v is a real/complex vector with v_j = 0 for 1 ≤ j ≤ i - 1, v_i = 1.
On exit, for i + 1 ≤ j ≤ n, v_j is stored in a[(j - 1) + (i - 1)*lda] for column major layout and in
a[(j - 1)*lda + (i - 1)] for row major layout, and τa is stored in taua[i - 1].
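To make the reflector form concrete, the following sketch applies a single real reflector H = I - τ*v*v^T to a vector without forming H explicitly. It is an illustration of the formula above only, not the library's implementation; apply_reflector is a hypothetical name.

```c
#include <stddef.h>

/* Overwrites x with (I - tau*v*v^T)*x for real data.
   Uses the identity H*x = x - tau*(v^T x)*v, so H is never formed.
   Hypothetical helper, not an MKL function. */
void apply_reflector(size_t n, double tau, const double *v, double *x)
{
    double dot = 0.0;
    size_t j;

    /* dot = v^T x */
    for (j = 0; j < n; ++j)
        dot += v[j] * x[j];

    /* x = x - tau*dot*v */
    for (j = 0; j < n; ++j)
        x[j] -= tau * dot * v[j];
}
```

With v = (1, 0) and τ = 2, the reflector is diag(-1, 1), so the vector (3, 4) maps to (-3, 4). Applying reflectors in this way costs O(n) per vector, which is why the factorization routines keep Q in factored form.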
The matrix Z is represented as a product of elementary reflectors
Z = H(1)H(2)...H(k), where k = min(n,p).
Each H(i) has the form
H(i) = I - τb*v*v^T for real flavors, or
H(i) = I - τb*v*v^H for complex flavors,
where τb is a real/complex scalar, and v is a real/complex vector with v_(p-k+i) = 1 and v_j = 0 for
p - k + i + 1 ≤ j ≤ p.
On exit, for 1 ≤ j ≤ p - k + i - 1, v_j is stored in b[(n - k + i - 1) + (j - 1)*ldb] for column major layout
and in b[(n - k + i - 1)*ldb + (j - 1)] for row major layout, and τb is stored in taub[i - 1].
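The two index expressions for b can be wrapped in one helper so that code reading back the reflector vectors of Z does not repeat the arithmetic. This is a sketch of the formulas quoted above; the function name and layout constants are assumptions introduced for illustration.

```c
#include <stddef.h>

/* Local stand-ins for the layout constants (assumption). */
enum { LAYOUT_ROW_MAJOR = 101, LAYOUT_COL_MAJOR = 102 };

/* Offset into b of the j-th component of the vector defining the
   i-th reflector of Z, with 1-based i and j as in the text and
   k = min(n, p). Meaningful for 1 <= j <= p - k + i - 1. */
size_t ggqrf_b_offset(int layout, size_t n, size_t k,
                      size_t i, size_t j, size_t ldb)
{
    size_t row = n - k + i - 1;   /* 0-based row holding the vector */
    size_t col = j - 1;           /* 0-based column of component j  */
    return (layout == LAYOUT_COL_MAJOR) ? row + col * ldb
                                        : row * ldb + col;
}
```

For n = 4, p = 6, k = 4, i = 2, and j = 3, the component is at offset 9 in column major layout with ldb = 4 and at offset 8 in row major layout with ldb = 6.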
?ggrqf
Computes the generalized RQ factorization of two
matrices.
Syntax
lapack_int LAPACKE_sggrqf (int matrix_layout, lapack_int m, lapack_int p, lapack_int n,
float* a, lapack_int lda, float* taua, float* b, lapack_int ldb, float* taub);
lapack_int LAPACKE_dggrqf (int matrix_layout, lapack_int m, lapack_int p, lapack_int n,
double* a, lapack_int lda, double* taua, double* b, lapack_int ldb, double* taub);
lapack_int LAPACKE_cggrqf (int matrix_layout, lapack_int m, lapack_int p, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* taua,
lapack_complex_float* b, lapack_int ldb, lapack_complex_float* taub);
lapack_int LAPACKE_zggrqf (int matrix_layout, lapack_int m, lapack_int p, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* taua,
lapack_complex_double* b, lapack_int ldb, lapack_complex_double* taub);
Include Files
• mkl.h
Description
The routine forms the generalized RQ factorization of an m-by-n matrix A and a p-by-n matrix B as A =
R*Q, B = Z*T*Q, where Q is an n-by-n orthogonal/unitary matrix, Z is a p-by-p orthogonal/unitary matrix,
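and R and T assume one of the forms (blocks separated by a semicolon are stacked vertically):
if m ≤ n, R = [0 R12], where 0 is the m-by-(n-m) zero matrix and R12 is an m-by-m upper triangular matrix,
or, if m > n, R = [R11; R21], where R11 is (m-n)-by-n and R21 is an n-by-n upper triangular matrix;
if p ≥ n, T = [T11; 0], where T11 is an n-by-n upper triangular matrix and 0 is the (p-n)-by-n zero matrix,
or, if p < n, T = [T11 T12], where T11 is a p-by-p upper triangular matrix and T12 is p-by-(n-p).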
Input Parameters
a, b Arrays:
a (size max(1, lda*n) for column major layout and max(1, lda*m) for row
major layout) contains the m-by-n matrix A.
b (size max(1, ldb*n) for column major layout and max(1, ldb*p) for row
major layout) contains the p-by-n matrix B.
lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.
ldb The leading dimension of b; at least max(1, p) for column major layout and
max(1, n) for row major layout.
Output Parameters
if m > n, the elements on and above the (m-n)th subdiagonal contain the
m-by-n upper trapezoidal matrix R;
the remaining elements, with the array taua, represent the orthogonal/
unitary matrix Q as a product of elementary reflectors.
The elements on and above the diagonal of the array b contain the
min(p,n)-by-n upper trapezoidal matrix T (T is upper triangular if p ≥ n); the
elements below the diagonal, with the array taub, represent the orthogonal/
unitary matrix Z as a product of elementary reflectors.
taua, taub Arrays, size at least max(1, min(m, n)) for taua and at least max(1,
min(p, n)) for taub.
The array taua contains the scalar factors of the elementary reflectors
which represent the orthogonal/unitary matrix Q.
The array taub contains the scalar factors of the elementary reflectors
which represent the orthogonal/unitary matrix Z.
Return Values
This function returns a value info.
Application Notes
The matrix Q is represented as a product of elementary reflectors
Q = H(1)H(2)...H(k), where k = min(m,n).
Each H(i) has the form
H(i) = I - taua*v*v^T for real flavors, or
H(i) = I - taua*v*v^H for complex flavors,
where taua is a real/complex scalar, and v is a real/complex vector with v_(n-k+i) = 1 and v_j = 0 for
n - k + i + 1 ≤ j ≤ n.
On exit, v_j for 1 ≤ j ≤ n - k + i - 1 is stored in a(m-k+i, 1:n-k+i-1), and taua is stored in taua[i - 1].
?tpqrt
Computes a blocked QR factorization of a real or
complex "triangular-pentagonal" matrix, which is
composed of a triangular block and a pentagonal
block, using the compact WY representation for Q.
Syntax
lapack_int LAPACKE_stpqrt (int matrix_layout, lapack_int m, lapack_int n, lapack_int l,
lapack_int nb, float* a, lapack_int lda, float* b, lapack_int ldb, float* t, lapack_int
ldt);
lapack_int LAPACKE_dtpqrt (int matrix_layout, lapack_int m, lapack_int n, lapack_int l,
lapack_int nb, double* a, lapack_int lda, double* b, lapack_int ldb, double* t,
lapack_int ldt);
lapack_int LAPACKE_ctpqrt (int matrix_layout, lapack_int m, lapack_int n, lapack_int l,
lapack_int nb, lapack_complex_float* a, lapack_int lda, lapack_complex_float* b,
lapack_int ldb, lapack_complex_float* t, lapack_int ldt);
lapack_int LAPACKE_ztpqrt (int matrix_layout, lapack_int m, lapack_int n, lapack_int l,
lapack_int nb, lapack_complex_double* a, lapack_int lda, lapack_complex_double* b,
lapack_int ldb, lapack_complex_double* t, lapack_int ldt);
Include Files
• mkl.h
Description
The input matrix C is an (n+m)-by-n matrix formed by stacking A on top of B,
where A is an n-by-n upper triangular matrix, and B is an m-by-n pentagonal matrix consisting of an
(m-l)-by-n rectangular matrix B1 on top of an l-by-n upper trapezoidal matrix B2.
The upper trapezoidal matrix B2 consists of the first l rows of an n-by-n upper triangular matrix, where 0 ≤
l ≤ min(m,n). If l=0, B is an m-by-n rectangular matrix. If m=l=n, B is upper triangular. The elementary
reflectors H(i) are stored in the ith column below the diagonal in the (n+m)-by-n input matrix C. The
structure of vectors defining the elementary reflectors is illustrated by:
The elements of the unit matrix I are not stored. Thus, V contains all of the necessary information, and is
returned in array b.
NOTE
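Note that V has the same form as B:
$$V = \begin{bmatrix} V_1 \\ V_2 \end{bmatrix}$$
where V1 is an (m-l)-by-n rectangular matrix and V2 is an l-by-n upper trapezoidal matrix.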
Input Parameters
b size max(1, ldb*n) for column major layout and max(1, ldb*m) for row
major layout, the pentagonal m-by-n matrix B. The first (m-l) rows contain
the rectangular B1 matrix, and the next l rows contain the upper
trapezoidal B2 matrix.
ldb The leading dimension of b; at least max(1, m) for column major layout and
at least max(1, n) for row major layout.
ldt The leading dimension of t; at least nb for column major layout and at least
max(1, n) for row major layout.
Output Parameters
a The elements on and above the diagonal of the array contain the upper
triangular matrix R.
t Array, size ldt*n for column major layout and ldt*nb for row major
layout.
The upper triangular block reflectors stored in compact form as a sequence
of upper triangular blocks.
Return Values
This function returns a value info.
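The size and leading-dimension rules listed above for b and t can be collected into small helpers. The following is a minimal sketch in plain C and is not part of Intel MKL: the sketch_layout enum and the tpqrt_* helper names are hypothetical, and real code would pass LAPACK_ROW_MAJOR or LAPACK_COL_MAJOR from mkl.h instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout tag for this sketch only (not an MKL type). */
typedef enum { SKETCH_ROW_MAJOR, SKETCH_COL_MAJOR } sketch_layout;

size_t sketch_max(size_t x, size_t y) { return x > y ? x : y; }

/* Minimum ldb: max(1, m) for column major, max(1, n) for row major. */
size_t tpqrt_min_ldb(sketch_layout layout, size_t m, size_t n)
{
    return layout == SKETCH_COL_MAJOR ? sketch_max(1, m) : sketch_max(1, n);
}

/* Minimum ldt: nb for column major, max(1, n) for row major. */
size_t tpqrt_min_ldt(sketch_layout layout, size_t n, size_t nb)
{
    return layout == SKETCH_COL_MAJOR ? nb : sketch_max(1, n);
}

/* Elements in b: max(1, ldb*n) for column major, max(1, ldb*m) for row major. */
size_t tpqrt_b_elems(sketch_layout layout, size_t m, size_t n, size_t ldb)
{
    assert(ldb >= tpqrt_min_ldb(layout, m, n));
    return layout == SKETCH_COL_MAJOR ? sketch_max(1, ldb * n)
                                      : sketch_max(1, ldb * m);
}

/* Elements in t: ldt*n for column major, ldt*nb for row major. */
size_t tpqrt_t_elems(sketch_layout layout, size_t n, size_t nb, size_t ldt)
{
    assert(ldt >= tpqrt_min_ldt(layout, n, nb));
    return layout == SKETCH_COL_MAJOR ? ldt * n : ldt * nb;
}
```

For example, with m = 5, n = 3, and nb = 2 in column major layout, these helpers give ldb ≥ 5, ldt ≥ 2, 15 elements for b, and 6 elements for t.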
?tpmqrt
Applies a real or complex orthogonal matrix obtained
from a "triangular-pentagonal" complex block reflector
to a general real or complex matrix, which consists of
two blocks.
Syntax
lapack_int LAPACKE_stpmqrt (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, lapack_int l, lapack_int nb, const float* v, lapack_int ldv,
const float* t, lapack_int ldt, float* a, lapack_int lda, float* b, lapack_int ldb);
lapack_int LAPACKE_dtpmqrt (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, lapack_int l, lapack_int nb, const double* v, lapack_int
ldv, const double* t, lapack_int ldt, double* a, lapack_int lda, double* b, lapack_int
ldb);
lapack_int LAPACKE_ctpmqrt (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, lapack_int l, lapack_int nb, const lapack_complex_float* v,
lapack_int ldv, const lapack_complex_float* t, lapack_int ldt, lapack_complex_float* a,
lapack_int lda, lapack_complex_float* b, lapack_int ldb);
lapack_int LAPACKE_ztpmqrt (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, lapack_int l, lapack_int nb, const lapack_complex_double*
v, lapack_int ldv, const lapack_complex_double* t, lapack_int ldt,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* b, lapack_int ldb);
Include Files
• mkl.h
Description
The columns of the pentagonal matrix V contain the elementary reflectors H(1), H(2), ..., H(k); V is
composed of a rectangular block V1 and a trapezoidal block V2:
The size of the trapezoidal block V2 is determined by the parameter l, where 0 ≤ l ≤ k. V2 is upper
trapezoidal, consisting of the first l rows of a k-by-k upper triangular matrix.
If side = 'L':
$$C = \begin{bmatrix} A \\ B \end{bmatrix}$$
where A is k-by-n, B is m-by-n, and V is m-by-k.
If side = 'R':
$$C = \begin{bmatrix} A & B \end{bmatrix}$$
where A is m-by-k, B is m-by-n, and V is n-by-k.
Input Parameters
nb The block size used for the storage of t, k ≥ nb ≥ 1. This must be the same
value of nb used to generate t in tpqrt.
v Size ldv*k for column major layout; ldv*m for row major layout and side
= 'L', ldv*n for row major layout and side = 'R'.
The ith column must contain the vector which defines the elementary
reflector H(i), for i = 1,2,...,k, as returned by tpqrt in array argument b.
If side = 'L', ldv must be at least max(1,m) for column major layout and
max(1, k) for row major layout;
If side = 'R', ldv must be at least max(1,n) for column major layout and
max(1, k) for row major layout.
t Array, size ldt*k for column major layout and ldt*nb for row major
layout.
The upper triangular factors of the block reflectors as returned by tpqrt.
ldt The leading dimension of the array t. ldt must be at least nb for column
major layout and max(1, k) for row major layout.
a If side = 'L', size lda*n for column major layout and lda*k for row major
layout.
If side = 'R', size lda*k for column major layout and lda*m for row major
layout.
The k-by-n or m-by-k matrix A.
If side = 'L', lda must be at least max(1,k) for column major layout and
max(1, n) for row major layout.
If side = 'R', lda must be at least max(1,m) for column major layout and
max(1, k) for row major layout.
b Size ldb*n for column major layout and ldb*m for row major layout.
ldb The leading dimension of the array b. ldb must be at least max(1,m) for
column major layout and max(1, n) for row major layout.
Output Parameters
Return Values
This function returns a value info.
You can use the SVD to find a minimum-norm solution to a (possibly) rank-deficient least squares problem of
minimizing ||Ax - b||_2. The effective rank k of the matrix A can be determined as the number of singular
values which exceed a suitable threshold. The minimum-norm solution is
x = V_k*(Σ_k)^(-1)*c
where Σ_k is the leading k-by-k submatrix of Σ, the matrix V_k consists of the first k columns of V = P*V_1, and
the vector c consists of the first k elements of U^H*b = U_1^H*Q^H*b.
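The rank decision and the formula for the minimum-norm solution can be sketched in plain C as follows. This is an illustration only, not MKL code: the function names effective_rank and min_norm_solution are hypothetical, the singular values are assumed to be sorted in descending order, V is assumed to be stored column-major, and the threshold tol is left to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Number of leading singular values strictly greater than tol.
   Assumes sigma[0] >= sigma[1] >= ... >= 0 (descending order). */
size_t effective_rank(const double *sigma, size_t n, double tol)
{
    size_t k = 0;
    while (k < n && sigma[k] > tol) {
        ++k;
    }
    return k;
}

/* Computes x = V_k * inv(Sigma_k) * c for an n-by-n matrix V stored
   column-major with leading dimension ldv; only the first k columns
   of V and the first k entries of sigma and c are used. */
void min_norm_solution(size_t n, size_t k, const double *v, size_t ldv,
                       const double *sigma, const double *c, double *x)
{
    assert(k <= n && ldv >= n);
    for (size_t i = 0; i < n; ++i) {
        x[i] = 0.0;
    }
    for (size_t j = 0; j < k; ++j) {
        double scaled = c[j] / sigma[j];   /* j-th entry of inv(Sigma_k)*c */
        for (size_t i = 0; i < n; ++i) {
            x[i] += v[i + j * ldv] * scaled;
        }
    }
}
```

Singular values at or below tol are dropped entirely, so they never appear as divisors; this is what keeps the solution bounded for a rank-deficient A.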
?gebrd
Reduces a general matrix to bidiagonal form.
Syntax
lapack_int LAPACKE_sgebrd( int matrix_layout, lapack_int m, lapack_int n, float* a,
lapack_int lda, float* d, float* e, float* tauq, float* taup );
lapack_int LAPACKE_dgebrd( int matrix_layout, lapack_int m, lapack_int n, double* a,
lapack_int lda, double* d, double* e, double* tauq, double* taup );
lapack_int LAPACKE_cgebrd( int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_float* a, lapack_int lda, float* d, float* e, lapack_complex_float*
tauq, lapack_complex_float* taup );
lapack_int LAPACKE_zgebrd( int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_double* a, lapack_int lda, double* d, double* e, lapack_complex_double*
tauq, lapack_complex_double* taup );
Include Files
• mkl.h
Description
The routine reduces a general m-by-n matrix A to a bidiagonal matrix B by an orthogonal (unitary)
transformation.
If m≥n, the reduction is given by
$$A = Q B P^H = Q \begin{pmatrix} B_1 \\ 0 \end{pmatrix} P^H = Q_1 B_1 P^H,$$
where B1 is an n-by-n upper bidiagonal matrix, Q and P are orthogonal or, for a complex A, unitary matrices;
Q1 consists of the first n columns of Q.
If m < n, the reduction is given by
$$A = Q B P^H = Q \begin{pmatrix} B_1 & 0 \end{pmatrix} P^H = Q B_1 P_1^H,$$
where B1 is an m-by-m lower bidiagonal matrix and P1 consists of the first m columns of P.
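The routine does not form the matrices Q and P explicitly, but represents them as products of elementary
reflectors. Routines are provided to work with the matrices Q and P in this representation:
If the matrix A is real, call ?orgbr to compute Q and P explicitly, or ?ormbr to multiply a general matrix by Q or P.
If the matrix A is complex, call ?ungbr to compute Q and P explicitly, or ?unmbr to multiply a general matrix by Q or P.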
Input Parameters
a Arrays:
a(size max(1, lda*n) for column major layout and max(1, lda*m) for row
major layout) contains the matrix A.
lda The leading dimension of a; at least max(1, m) for column major layout
and at least max(1, n) for row major layout.
Output Parameters
tauq, taup Arrays, size at least max (1, min(m, n)). The scalar factors of the
elementary reflectors which represent the orthogonal or unitary matrices Q
and P, respectively.
Return Values
This function returns a value info.
Application Notes
The computed matrices Q, B, and P satisfy Q*B*P^H = A + E, where ||E||_2 = c(n)*ε*||A||_2, c(n) is a
modestly increasing function of n, and ε is the machine precision.
The approximate number of floating-point operations for real flavors is
(4/3)*n^2*(3*m - n) for m≥n,
(4/3)*m^2*(3*n - m) for m < n.
The number of operations for complex flavors is four times greater.
If n is much less than m, it can be more efficient to first form the QR factorization of A by calling geqrf and
then reduce the factor R to bidiagonal form. This requires approximately 2*n^2*(m + n) floating-point
operations.
If m is much less than n, it can be more efficient to first form the LQ factorization of A by calling gelqf and
then reduce the factor L to bidiagonal form. This requires approximately 2*m^2*(m + n) floating-point
operations.
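To see where the factor-first approach starts to pay off, the operation counts quoted above can be compared directly. The sketch below is plain C and only evaluates those approximate formulas for real flavors; the function names are hypothetical, and actual run time also depends on factors the counts do not capture (blocking, threading, memory traffic).

```c
#include <assert.h>

/* Approximate flop count for reducing an m-by-n real matrix directly. */
double gebrd_flops_direct(double m, double n)
{
    assert(m >= 0.0 && n >= 0.0);
    return m >= n ? (4.0 / 3.0) * n * n * (3.0 * m - n)
                  : (4.0 / 3.0) * m * m * (3.0 * n - m);
}

/* Approximate flop count when a QR (m >= n) or LQ (m < n) factorization
   is formed first and the triangular factor is then reduced. */
double gebrd_flops_factor_first(double m, double n)
{
    assert(m >= 0.0 && n >= 0.0);
    return m >= n ? 2.0 * n * n * (m + n)
                  : 2.0 * m * m * (m + n);
}

/* Returns 1 if the factor-first count is strictly smaller, 0 otherwise. */
int gebrd_prefer_factor_first(double m, double n)
{
    return gebrd_flops_factor_first(m, n) < gebrd_flops_direct(m, n);
}
```

Equating the two counts for m≥n gives m = (5/3)*n, so under these estimates the factor-first path has the lower count only when m exceeds roughly 1.67*n (and symmetrically for m < n).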
?gbbrd
Reduces a general band matrix to bidiagonal form.
Syntax
lapack_int LAPACKE_sgbbrd( int matrix_layout, char vect, lapack_int m, lapack_int n,
lapack_int ncc, lapack_int kl, lapack_int ku, float* ab, lapack_int ldab, float* d,
float* e, float* q, lapack_int ldq, float* pt, lapack_int ldpt, float* c, lapack_int
ldc );
lapack_int LAPACKE_dgbbrd( int matrix_layout, char vect, lapack_int m, lapack_int n,
lapack_int ncc, lapack_int kl, lapack_int ku, double* ab, lapack_int ldab, double* d,
double* e, double* q, lapack_int ldq, double* pt, lapack_int ldpt, double* c, lapack_int
ldc );
lapack_int LAPACKE_cgbbrd( int matrix_layout, char vect, lapack_int m, lapack_int n,
lapack_int ncc, lapack_int kl, lapack_int ku, lapack_complex_float* ab, lapack_int
ldab, float* d, float* e, lapack_complex_float* q, lapack_int ldq,
lapack_complex_float* pt, lapack_int ldpt, lapack_complex_float* c, lapack_int ldc );
lapack_int LAPACKE_zgbbrd( int matrix_layout, char vect, lapack_int m, lapack_int n,
lapack_int ncc, lapack_int kl, lapack_int ku, lapack_complex_double* ab, lapack_int
ldab, double* d, double* e, lapack_complex_double* q, lapack_int ldq,
lapack_complex_double* pt, lapack_int ldpt, lapack_complex_double* c, lapack_int ldc );
Include Files
• mkl.h
Description
The routine reduces an m-by-n band matrix A to upper bidiagonal matrix B: A = Q*B*P^H. Here the matrices
Q and P are orthogonal (for real A) or unitary (for complex A). They are determined as products of Givens
rotation matrices, and may be formed explicitly by the routine if required. The routine can also update a
matrix C as follows: C = Q^H*C.
Input Parameters
ab, c Arrays:
ab(size max(1, ldab*n) for column major layout and max(1, ldab*m) for
row major layout) contains the matrix A in band storage (see Matrix
Storage Schemes).
c(size max(1, ldc*ncc) for column major layout and max(1, ldc*m) for
row major layout) contains an m-by-ncc matrix C.
If ncc = 0, the array c is not referenced.
Output Parameters
d Array, size at least max(1, min(m, n)). Contains the diagonal elements of
the matrix B.
q, pt Arrays:
q, size max(1, ldq*m), contains the output m-by-m matrix Q.
Return Values
This function returns a value info.
Application Notes
The computed matrices Q, B, and P satisfy Q*B*P^H = A + E, where ||E||_2 = c(n)*ε*||A||_2, c(n) is a
modestly increasing function of n, and ε is the machine precision.
If m = n, the total number of floating-point operations for real flavors is approximately the sum of:
?orgbr
Generates the real orthogonal matrix Q or P^T
determined by ?gebrd.
Syntax
lapack_int LAPACKE_sorgbr (int matrix_layout, char vect, lapack_int m, lapack_int n,
lapack_int k, float* a, lapack_int lda, const float* tau);
lapack_int LAPACKE_dorgbr (int matrix_layout, char vect, lapack_int m, lapack_int n,
lapack_int k, double* a, lapack_int lda, const double* tau);
Include Files
• mkl.h
Description
The routine generates the whole or part of the orthogonal matrices Q and P^T formed by the routine ?gebrd.
Use this routine after a call to sgebrd/dgebrd. All valid combinations of arguments are described in Input
Parameters. In most cases you need the following:
To compute the whole m-by-m matrix Q:
LAPACKE_?orgbr(matrix_layout, 'Q', m, m, n, a, lda, tau)
(note that the array a must have at least m columns).
To compute the whole n-by-n matrix P^T:
LAPACKE_?orgbr(matrix_layout, 'P', n, n, m, a, lda, tau)
(note that the array a must have at least n rows).
Input Parameters
a Array, size at least lda*n for column major layout and lda*m for row major
layout. The vectors which define the elementary reflectors, as returned by
gebrd.
lda The leading dimension of the array a. lda ≥ max(1, m) for column major
layout and at least max(1, n) for row major layout.
tau Array, size min (m,k) if vect = 'Q', min (n,k) if vect = 'P'.
Scalar factor of the elementary reflector H(i) or G(i), which determines Q
and P^T as returned by gebrd in the array tauq or taup.
Output Parameters
Return Values
This function returns a value info.
Application Notes
The computed matrix Q differs from an exactly orthogonal matrix by a matrix E such that ||E||_2 = O(ε).
The approximate numbers of floating-point operations for the cases listed in Description are as follows:
To form the whole of Q:
(4/3)*n*(3m^2 - 3m*n + n^2) if m > n;
(4/3)*m^3 if m ≤ n.
To form the n leading columns of Q when m > n:
(2/3)*n^2*(3m - n).
To form the whole of P^T:
(4/3)*n^3 if m≥n;
(4/3)*m*(3n^2 - 3m*n + m^2) if m < n.
To form the m leading rows of P^T when m < n:
(2/3)*m^2*(3n - m).
The complex counterpart of this routine is ungbr.
?ormbr
Multiplies an arbitrary real matrix by the real
orthogonal matrix Q or P^T determined by ?gebrd.
Syntax
lapack_int LAPACKE_sormbr (int matrix_layout, char vect, char side, char trans,
lapack_int m, lapack_int n, lapack_int k, const float* a, lapack_int lda, const float*
tau, float* c, lapack_int ldc);
lapack_int LAPACKE_dormbr (int matrix_layout, char vect, char side, char trans,
lapack_int m, lapack_int n, lapack_int k, const double* a, lapack_int lda, const
double* tau, double* c, lapack_int ldc);
Include Files
• mkl.h
Description
Given an arbitrary real matrix C, this routine forms one of the matrix products Q*C, Q^T*C, C*Q, C*Q^T, P*C,
P^T*C, C*P, C*P^T, where Q and P are orthogonal matrices computed by a call to ?gebrd. The routine overwrites
C with the product.
Input Parameters
In the descriptions below, r denotes the order of Q or P^T:
If side = 'L', r = m; if side = 'R', r = n.
If side = 'L', multipliers are applied to C from the left.
Constraints: m≥ 0, n≥ 0, k≥ 0.
a, c Arrays:
a is the array a as returned by ?gebrd.
The size of a depends on the value of the matrix_layout, vect, and side
parameters:
c(size max(1, ldc*n) for column major layout and max(1, ldc*m) for row
major layout) holds the matrix C.
lda≥ max(1, min(r,k)) for column major layout and at least max(1, r) for
row major layout if vect = 'P'.
ldc The leading dimension of c; ldc≥ max(1, m) for column major layout and
ldc≥ max(1, n) for row major layout.
Output Parameters
c Overwritten by the product Q*C, Q^T*C, C*Q, C*Q^T, P*C, P^T*C, C*P, or C*P^T,
as specified by vect, side, and trans.
Return Values
This function returns a value info.
Application Notes
The computed product differs from the exact product by a matrix E such that ||E||_2 = O(ε)*||C||_2.
?ungbr
Generates the complex unitary matrix Q or P^H
determined by ?gebrd.
Syntax
lapack_int LAPACKE_cungbr (int matrix_layout, char vect, lapack_int m, lapack_int n,
lapack_int k, lapack_complex_float* a, lapack_int lda, const lapack_complex_float*
tau);
lapack_int LAPACKE_zungbr (int matrix_layout, char vect, lapack_int m, lapack_int n,
lapack_int k, lapack_complex_double* a, lapack_int lda, const lapack_complex_double*
tau);
Include Files
• mkl.h
Description
The routine generates the whole or part of the unitary matrices Q and P^H formed by the routine ?gebrd. Use
this routine after a call to cgebrd/zgebrd. All valid combinations of arguments are described in Input
Parameters; in most cases you need the following:
To compute the whole m-by-m matrix Q, use:
LAPACKE_?ungbr(matrix_layout, 'Q', m, m, n, a, lda, tau)
(note that the array a must have at least m columns).
To compute the whole n-by-n matrix P^H, use:
LAPACKE_?ungbr(matrix_layout, 'P', n, n, m, a, lda, tau)
(note that the array a must have at least n rows).
Input Parameters
Constraints: m≥ 0, n≥ 0, k≥ 0.
a Arrays:
a, size at least lda*n for column major layout and lda*m for row major
layout, is the array a as returned by ?gebrd.
lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.
tau For vect = 'Q', the array tauq as returned by ?gebrd. For vect = 'P',
the array taup as returned by ?gebrd.
The dimension of tau must be at least max(1, min(m, k)) for vect = 'Q',
or max(1, min(n, k)) for vect = 'P'.
Output Parameters
Return Values
This function returns a value info.
Application Notes
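The computed matrix Q differs from an exactly unitary matrix by a matrix E such that ||E||_2 = O(ε).
The approximate numbers of floating-point operations are as follows:
To form the n leading columns of Q when m > n:
(8/3)*n^2*(3m - n).
To compute the whole matrix P^H:
(16/3)*n^3 if m≥n;
(16/3)*m*(3n^2 - 3m*n + m^2) if m < n.
To form the m leading rows of P^H when m < n:
(8/3)*m^2*(3n - m).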
?unmbr
Multiplies an arbitrary complex matrix by the unitary
matrix Q or P determined by ?gebrd.
Syntax
lapack_int LAPACKE_cunmbr (int matrix_layout, char vect, char side, char trans,
lapack_int m, lapack_int n, lapack_int k, const lapack_complex_float* a, lapack_int
lda, const lapack_complex_float* tau, lapack_complex_float* c, lapack_int ldc);
lapack_int LAPACKE_zunmbr (int matrix_layout, char vect, char side, char trans,
lapack_int m, lapack_int n, lapack_int k, const lapack_complex_double* a, lapack_int
lda, const lapack_complex_double* tau, lapack_complex_double* c, lapack_int ldc);
Include Files
• mkl.h
Description
Given an arbitrary complex matrix C, this routine forms one of the matrix products Q*C, Q^H*C, C*Q, C*Q^H,
P*C, P^H*C, C*P, or C*P^H, where Q and P are unitary matrices computed by a call to ?gebrd. The routine
overwrites C with the product.
Input Parameters
In the descriptions below, r denotes the order of Q or P^H:
If side = 'L', r = m; if side = 'R', r = n.
If vect = 'Q', then Q or QH is applied to C.
Constraints: m≥ 0, n≥ 0, k≥ 0.
a, c Arrays:
a is the array a as returned by ?gebrd.
The size of a depends on the value of the matrix_layout, vect, and side
parameters:
c(size max(1, ldc*n) for column major layout and max(1, ldc*m) for row
major layout) holds the matrix C.
lda≥ max(1, min(r,k)) for column major layout and at least max(1, r) for
row major layout if vect = 'P'.
Output Parameters
c Overwritten by the product Q*C, Q^H*C, C*Q, C*Q^H, P*C, P^H*C, C*P, or
C*P^H, as specified by vect, side, and trans.
Return Values
This function returns a value info.
Application Notes
The computed product differs from the exact product by a matrix E such that ||E||_2 = O(ε)*||C||_2.
?bdsqr
Computes the singular value decomposition of a
general matrix that has been reduced to bidiagonal
form.
Syntax
lapack_int LAPACKE_sbdsqr( int matrix_layout, char uplo, lapack_int n, lapack_int ncvt,
lapack_int nru, lapack_int ncc, float* d, float* e, float* vt, lapack_int ldvt, float*
u, lapack_int ldu, float* c, lapack_int ldc );
lapack_int LAPACKE_dbdsqr( int matrix_layout, char uplo, lapack_int n, lapack_int ncvt,
lapack_int nru, lapack_int ncc, double* d, double* e, double* vt, lapack_int ldvt,
double* u, lapack_int ldu, double* c, lapack_int ldc );
lapack_int LAPACKE_cbdsqr( int matrix_layout, char uplo, lapack_int n, lapack_int ncvt,
lapack_int nru, lapack_int ncc, float* d, float* e, lapack_complex_float* vt,
lapack_int ldvt, lapack_complex_float* u, lapack_int ldu, lapack_complex_float* c,
lapack_int ldc );
lapack_int LAPACKE_zbdsqr( int matrix_layout, char uplo, lapack_int n, lapack_int ncvt,
lapack_int nru, lapack_int ncc, double* d, double* e, lapack_complex_double* vt,
lapack_int ldvt, lapack_complex_double* u, lapack_int ldu, lapack_complex_double* c,
lapack_int ldc );
Include Files
• mkl.h
Description
The routine computes the singular values and, optionally, the right and/or left singular vectors from the
Singular Value Decomposition (SVD) of a real n-by-n (upper or lower) bidiagonal matrix B using the implicit
zero-shift QR algorithm. The SVD of B has the form B = Q*S*P^H where S is the diagonal matrix of singular
values, Q is an orthogonal matrix of left singular vectors, and P is an orthogonal matrix of right singular
vectors. If left singular vectors are requested, this subroutine actually returns U*Q instead of Q, and, if right
singular vectors are requested, this subroutine returns P^H*V^T instead of P^H, for given real/complex input
matrices U and V^T. When U and V^T are the orthogonal/unitary matrices that reduce a general matrix A to
bidiagonal form: A = U*B*V^T, as computed by ?gebrd, then
A = (U*Q)*S*(P^H*V^T)
is the SVD of A. Optionally, the subroutine may also compute Q^H*C for a given real/complex input matrix C.
Input Parameters
ncvt The number of columns of the matrix VT, that is, the number of right
singular vectors (ncvt≥ 0).
nru The number of rows in U, that is, the number of left singular vectors (nru≥
0).
Set nru = 0 if no left singular vectors are required.
ncc The number of columns in the matrix C used for computing the product
Q^H*C (ncc≥ 0). Set ncc = 0 if no matrix C is supplied.
d, e Arrays:
d contains the diagonal elements of B.
The size of d must be at least max(1, n).
vt, u, c Arrays:
vt, size max(1, ldvt*ncvt) for column major layout and max(1, ldvt*n)
for row major layout, contains an n-by-ncvt matrix V^T.
vt is not referenced if ncvt = 0.
u, size max(1, ldu*n) for column major layout and max(1, ldu*nru) for
row major layout, contains an nru-by-n matrix U.
u is not referenced if nru = 0.
c, size max(1, ldc*ncc) for column major layout and max(1, ldc*n) for
row major layout, contains the n-by-ncc matrix C for computing the
product Q^H*C.
Output Parameters
Return Values
This function returns a value info.
If info > 0,
Application Notes
Each singular value and singular vector is computed to high relative accuracy. However, the reduction to
bidiagonal form (prior to calling the routine) may decrease the relative accuracy in the small singular values
of the original matrix if its singular values vary widely in magnitude.
If σi is an exact singular value of B, and si is the corresponding computed value, then
|si - σi| ≤ p(m,n)*ε*σi
where p(m, n) is a modestly increasing function of m and n, and ε is the machine precision.
If only singular values are computed, they are computed more accurately than when some singular vectors
are also computed (that is, the function p(m, n) is smaller).
If ui is the corresponding exact left singular vector of B, and wi is the corresponding computed left singular
vector, then the angle θ(ui, wi) between them is bounded as follows:
θ(ui, wi) ≤ p(m,n)*ε / min_{j≠i}(|σi - σj|/|σi + σj|).
Here min_{j≠i}(|σi - σj|/|σi + σj|) is the relative gap between σi and the other singular values. A similar
error bound holds for the right singular vectors.
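The relative gap that appears in the denominator of this bound is straightforward to evaluate once the singular values are known. The following plain C sketch computes it for one index; the function name relative_gap is hypothetical, and the code assumes all singular values are non-negative, so |σi + σj| equals σi + σj.

```c
#include <assert.h>
#include <stddef.h>

/* Relative gap between sigma[i] and the other singular values:
   the minimum over j != i of |sigma[i] - sigma[j]| / (sigma[i] + sigma[j]).
   A pair of zero singular values is treated as having gap 0. */
double relative_gap(const double *sigma, size_t n, size_t i)
{
    assert(n >= 2 && i < n);
    double gap = -1.0;                      /* sentinel: no pair seen yet */
    for (size_t j = 0; j < n; ++j) {
        if (j == i) {
            continue;
        }
        double diff = sigma[i] - sigma[j];
        if (diff < 0.0) {
            diff = -diff;
        }
        double sum = sigma[i] + sigma[j];
        double ratio = (sum > 0.0) ? diff / sum : 0.0;
        if (gap < 0.0 || ratio < gap) {
            gap = ratio;
        }
    }
    return gap;
}
```

A small gap (clustered or repeated singular values) makes the bound on the angle large, which signals that the corresponding individual singular vectors are poorly determined.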
The total number of real floating-point operations is roughly proportional to n^2 if only the singular values are
computed. About 6*n^2*nru additional operations (12*n^2*nru for complex flavors) are required to compute the
left singular vectors and about 6*n^2*ncvt operations (12*n^2*ncvt for complex flavors) to compute the right
singular vectors.
?bdsdc
Computes the singular value decomposition of a real
bidiagonal matrix using a divide and conquer method.
Syntax
lapack_int LAPACKE_sbdsdc (int matrix_layout, char uplo, char compq, lapack_int n,
float* d, float* e, float* u, lapack_int ldu, float* vt, lapack_int ldvt, float* q,
lapack_int* iq);
lapack_int LAPACKE_dbdsdc (int matrix_layout, char uplo, char compq, lapack_int n,
double* d, double* e, double* u, lapack_int ldu, double* vt, lapack_int ldvt, double* q,
lapack_int* iq);
Include Files
• mkl.h
Description
The routine computes the Singular Value Decomposition (SVD) of a real n-by-n (upper or lower) bidiagonal
matrix B: B = U*Σ*V^T, using a divide and conquer method, where Σ is a diagonal matrix with non-negative
diagonal elements (the singular values of B), and U and V are orthogonal matrices of left and right singular
vectors, respectively. ?bdsdc can be used to compute all singular values, and optionally, singular vectors or
singular vectors in compact form.
This routine uses ?lasd0, ?lasd1, ?lasd2, ?lasd3, ?lasd4, ?lasd5, ?lasd6, ?lasd7, ?lasd8, ?lasd9, ?lasda,
?lasdq, ?lasdt.
Input Parameters
d, e Arrays:
d contains the n diagonal elements of the bidiagonal matrix B. The size of d
must be at least max(1, n).
e contains the off-diagonal elements of the bidiagonal matrix B. The size of
e must be at least max(1, n).
Output Parameters
e On exit, e is overwritten.
If compq = 'P', then on exit, if info = 0, q and iq contain the left and
right singular vectors in a compact form. Specifically, iq contains all the
lapack_int data for singular vectors. For other values of compq, iq is not
referenced.
Return Values
This function returns a value info.
If info = i, the algorithm failed to compute a singular value. The update process of divide and conquer
failed.
?sytrd
Reduces a real symmetric matrix to tridiagonal form.
Syntax
lapack_int LAPACKE_ssytrd (int matrix_layout, char uplo, lapack_int n, float* a,
lapack_int lda, float* d, float* e, float* tau);
lapack_int LAPACKE_dsytrd (int matrix_layout, char uplo, lapack_int n, double* a,
lapack_int lda, double* d, double* e, double* tau);
Include Files
• mkl.h
Description
The routine reduces a real symmetric matrix A to symmetric tridiagonal form T by an orthogonal similarity
transformation: A = Q*T*Q^T. The orthogonal matrix Q is not formed explicitly but is represented as a
product of n-1 elementary reflectors. Routines are provided for working with Q in this representation (see
Application Notes below).
Input Parameters
n The order of the matrix A (n≥ 0).
Output Parameters
a On exit,
if uplo = 'U', the diagonal and first superdiagonal of A are overwritten by
the corresponding elements of the tridiagonal matrix T, and the elements
above the first superdiagonal, with the array tau, represent the orthogonal
matrix Q as a product of elementary reflectors;
if uplo = 'L', the diagonal and first subdiagonal of A are overwritten by
the corresponding elements of the tridiagonal matrix T, and the elements
below the first subdiagonal, with the array tau, represent the orthogonal
matrix Q as a product of elementary reflectors.
d, e, tau Arrays:
d contains the diagonal elements of the matrix T.
The size of d must be at least max(1, n).
e contains the off-diagonal elements of T.
The size of e must be at least max(1, n-1).
tau stores (n-1) scalars that define elementary reflectors in decomposition
of the orthogonal matrix Q in a product of n-1 elementary reflectors. tau(n)
is used as workspace.
The size of tau must be at least max(1, n).
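Because T is returned only through its diagonal d and off-diagonal e, it can be useful when checking results to expand it into a full matrix. The sketch below is plain C and independent of MKL; the function name tridiag_to_dense is hypothetical, and the output is assumed to be a row-major n-by-n array with leading dimension ldt.

```c
#include <assert.h>
#include <stddef.h>

/* Expands the symmetric tridiagonal matrix T, given by its diagonal d
   (n entries) and off-diagonal e (n-1 entries), into a dense row-major
   n-by-n array t with leading dimension ldt >= n. */
void tridiag_to_dense(size_t n, const double *d, const double *e,
                      double *t, size_t ldt)
{
    assert(ldt >= n);
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            t[i * ldt + j] = 0.0;
        }
    }
    for (size_t i = 0; i < n; ++i) {
        t[i * ldt + i] = d[i];
        if (i + 1 < n) {
            t[i * ldt + (i + 1)] = e[i];     /* superdiagonal */
            t[(i + 1) * ldt + i] = e[i];     /* subdiagonal, by symmetry */
        }
    }
}
```

The same e[i] is written above and below the diagonal, so the expanded matrix is symmetric regardless of the uplo value used in the reduction.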
Return Values
This function returns a value info.
Application Notes
The computed matrix T is exactly similar to a matrix A+E, where ||E||_2 = c(n)*ε*||A||_2, c(n) is a
modestly increasing function of n, and ε is the machine precision.
The approximate number of floating-point operations is (4/3)*n^3.
After calling this routine, you can call the following:
?orgtr to form the computed matrix Q explicitly;
?ormtr to multiply a real matrix by Q.
?orgtr
Generates the real orthogonal matrix Q determined
by ?sytrd.
Syntax
lapack_int LAPACKE_sorgtr (int matrix_layout, char uplo, lapack_int n, float* a,
lapack_int lda, const float* tau);
lapack_int LAPACKE_dorgtr (int matrix_layout, char uplo, lapack_int n, double* a,
lapack_int lda, const double* tau);
Include Files
• mkl.h
Description
The routine explicitly generates the n-by-n orthogonal matrix Q formed by ?sytrd when reducing a real
symmetric matrix A to tridiagonal form: A = Q*T*Q^T. Use this routine after a call to ?sytrd.
Input Parameters
a, tau Arrays:
a (size max(1, lda*n)) is the array a as returned by ?sytrd.
Output Parameters
Return Values
This function returns a value info.
Application Notes
The computed matrix Q differs from an exactly orthogonal matrix by a matrix E such that ||E||_2 = O(ε),
where ε is the machine precision.
The approximate number of floating-point operations is (4/3)*n^3.
The complex counterpart of this routine is ungtr.
?ormtr
Multiplies a real matrix by the real orthogonal matrix
Q determined by ?sytrd.
Syntax
lapack_int LAPACKE_sormtr (int matrix_layout, char side, char uplo, char trans,
lapack_int m, lapack_int n, const float* a, lapack_int lda, const float* tau, float* c,
lapack_int ldc);
lapack_int LAPACKE_dormtr (int matrix_layout, char side, char uplo, char trans,
lapack_int m, lapack_int n, const double* a, lapack_int lda, const double* tau, double*
c, lapack_int ldc);
Include Files
• mkl.h
Description
The routine multiplies a real matrix C by Q or Q^T, where Q is the orthogonal matrix formed by ?sytrd when
reducing a real symmetric matrix A to tridiagonal form: A = Q*T*Q^T. Use this routine after a call to ?sytrd.
Depending on the parameters side and trans, the routine can form one of the matrix products Q*C, Q^T*C,
C*Q, or C*Q^T (overwriting C with the result).
Input Parameters
In the descriptions below, r denotes the order of Q:
If side = 'L', r = m; if side = 'R', r = n.
a, c, tau a (size max(1, lda*r)) and tau are the arrays returned by ?sytrd.
c(size max(1, ldc*n) for column major layout and max(1, ldc*m) for row
major layout) contains the matrix C.
ldc The leading dimension of c; ldc≥ max(1, m) for column major layout and
at least max(1, n) for row major layout.
Output Parameters
c Overwritten by the product Q*C, Q^T*C, C*Q, or C*Q^T (as specified by side
and trans).
Return Values
This function returns a value info.
Application Notes
The computed product differs from the exact product by a matrix E such that ||E||_2 = O(ε)*||C||_2.
The total number of floating-point operations is approximately 2*m^2*n, if side = 'L', or 2*n^2*m, if side =
'R'.
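The dependence of the operation count on side can be written as a small helper. This is a plain C sketch that only evaluates the approximate counts quoted above; the function name ormtr_flops is hypothetical, and returning -1.0 for an unrecognized side value is a convention of this sketch, not MKL behavior.

```c
#include <assert.h>

/* Approximate flop count for applying Q (or its transpose) to an
   m-by-n matrix C: 2*m^2*n from the left, 2*n^2*m from the right.
   Returns -1.0 if side is neither 'L' nor 'R' (either case accepted). */
double ormtr_flops(char side, double m, double n)
{
    assert(m >= 0.0 && n >= 0.0);
    if (side == 'L' || side == 'l') {
        return 2.0 * m * m * n;
    }
    if (side == 'R' || side == 'r') {
        return 2.0 * n * n * m;
    }
    return -1.0;
}
```

In both cases the count is 2*r^2 times the other dimension of C, where r is the order of Q as defined in Input Parameters.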
The complex counterpart of this routine is unmtr.
?hetrd
Reduces a complex Hermitian matrix to tridiagonal
form.
Syntax
lapack_int LAPACKE_chetrd( int matrix_layout, char uplo, lapack_int n,
lapack_complex_float* a, lapack_int lda, float* d, float* e, lapack_complex_float*
tau );
lapack_int LAPACKE_zhetrd( int matrix_layout, char uplo, lapack_int n,
lapack_complex_double* a, lapack_int lda, double* d, double* e, lapack_complex_double*
tau );
Include Files
• mkl.h
Description
The routine reduces a complex Hermitian matrix A to symmetric tridiagonal form T by a unitary similarity
transformation: A = Q*T*Q^H. The unitary matrix Q is not formed explicitly but is represented as a product of
n-1 elementary reflectors. Routines are provided to work with Q in this representation. (They are described
later in this topic.)
Input Parameters
Output Parameters
a On exit,
if uplo = 'U', the diagonal and first superdiagonal of A are overwritten by
the corresponding elements of the tridiagonal matrix T, and the elements
above the first superdiagonal, with the array tau, represent the unitary
matrix Q as a product of elementary reflectors;
if uplo = 'L', the diagonal and first subdiagonal of A are overwritten by
the corresponding elements of the tridiagonal matrix T, and the elements
below the first subdiagonal, with the array tau, represent the unitary
matrix Q as a product of elementary reflectors.
d, e Arrays:
d contains the diagonal elements of the matrix T.
The dimension of d must be at least max(1, n).
e contains the off-diagonal elements of T.
The dimension of e must be at least max(1, n-1).
tau Array, size at least max(1, n-1). Stores (n-1) scalars that define elementary
reflectors in decomposition of the unitary matrix Q in a product of n-1
elementary reflectors.
Return Values
This function returns a value info.
Application Notes
The computed matrix T is exactly similar to a matrix A + E, where ||E||_2 = c(n)*ε*||A||_2, c(n) is a
modestly increasing function of n, and ε is the machine precision.
The approximate number of floating-point operations is (16/3)*n^3.
?ungtr
Generates the complex unitary matrix Q determined
by ?hetrd.
Syntax
lapack_int LAPACKE_cungtr (int matrix_layout, char uplo, lapack_int n,
lapack_complex_float* a, lapack_int lda, const lapack_complex_float* tau);
lapack_int LAPACKE_zungtr (int matrix_layout, char uplo, lapack_int n,
lapack_complex_double* a, lapack_int lda, const lapack_complex_double* tau);
Include Files
• mkl.h
Description
The routine explicitly generates the n-by-n unitary matrix Q formed by ?hetrd when reducing a complex
Hermitian matrix A to tridiagonal form: A = Q*T*Q^H. Use this routine after a call to ?hetrd.
Input Parameters
a, tau Arrays:
a (size max(1, lda*n)) is the array a as returned by ?hetrd.
Output Parameters
Return Values
This function returns a value info.
Application Notes
The computed matrix Q differs from an exactly unitary matrix by a matrix E such that ||E||_2 = O(ε), where
ε is the machine precision.
The approximate number of floating-point operations is (16/3)*n^3.
?unmtr
Multiplies a complex matrix by the complex unitary
matrix Q determined by ?hetrd.
Syntax
lapack_int LAPACKE_cunmtr (int matrix_layout, char side, char uplo, char trans,
lapack_int m, lapack_int n, const lapack_complex_float* a, lapack_int lda, const
lapack_complex_float* tau, lapack_complex_float* c, lapack_int ldc);
lapack_int LAPACKE_zunmtr (int matrix_layout, char side, char uplo, char trans,
lapack_int m, lapack_int n, const lapack_complex_double* a, lapack_int lda, const
lapack_complex_double* tau, lapack_complex_double* c, lapack_int ldc);
Include Files
• mkl.h
Description
The routine multiplies a complex matrix C by Q or Q^H, where Q is the unitary matrix formed by ?hetrd
when reducing a complex Hermitian matrix A to tridiagonal form: A = Q*T*Q^H. Use this routine after a call
to ?hetrd.
Depending on the parameters side and trans, the routine can form one of the matrix products Q*C, Q^H*C,
C*Q, or C*Q^H (overwriting C with the result).
Input Parameters
In the descriptions below, r denotes the order of Q:
If side = 'L', r = m; if side = 'R', r = n.
a, c, tau a (size max(1, lda*r)) and tau are the arrays returned by ?hetrd.
ldc The leading dimension of c; ldc ≥ max(1, m) for column major layout and
ldc ≥ max(1, n) for row major layout.
Output Parameters
c Overwritten by the product Q*C, Q^H*C, C*Q, or C*Q^H (as specified by side
and trans).
Return Values
This function returns a value info.
Application Notes
The computed product differs from the exact product by a matrix E such that ||E||_2 = O(ε)*||C||_2, where
ε is the machine precision.
The total number of floating-point operations is approximately 8*m^2*n if side = 'L' or 8*n^2*m if side =
'R'.
The real counterpart of this routine is ?ormtr.
?sptrd
Reduces a real symmetric matrix to tridiagonal form
using packed storage.
Syntax
lapack_int LAPACKE_ssptrd (int matrix_layout, char uplo, lapack_int n, float* ap,
float* d, float* e, float* tau);
lapack_int LAPACKE_dsptrd (int matrix_layout, char uplo, lapack_int n, double* ap,
double* d, double* e, double* tau);
Include Files
• mkl.h
Description
The routine reduces a packed real symmetric matrix A to symmetric tridiagonal form T by an orthogonal
similarity transformation: A = Q*T*Q^T. The orthogonal matrix Q is not formed explicitly but is represented as
a product of n-1 elementary reflectors. Routines are provided for working with Q in this representation. See
Application Notes below for details.
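Packed storage keeps only one triangle of A, column by column, in a one-dimensional array of n*(n+1)/2 elements. The sketch below shows the usual LAPACK index mapping for column major layout with 0-based indices. The function names packed_index and packed_length are hypothetical, and the mapping is stated here as an assumption because this excerpt does not spell it out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: position of A(i, j) (0-based) inside the packed
   array ap, assuming column major layout.
   uplo = 'U': upper triangle stored, requires i <= j.
   uplo = 'L': lower triangle stored, requires i >= j. */
size_t packed_index(char uplo, size_t n, size_t i, size_t j)
{
    assert(i < n && j < n);
    if (uplo == 'U') {
        assert(i <= j);
        return i + j * (j + 1) / 2;
    }
    assert(uplo == 'L' && i >= j);
    return i + j * (2 * n - j - 1) / 2;
}

/* Number of elements in the packed array for a matrix of order n. */
size_t packed_length(size_t n)
{
    return n * (n + 1) / 2;
}
```

For n = 3 and uplo = 'U', the packed array holds A(0,0), A(0,1), A(1,1), A(0,2), A(1,2), A(2,2) in that order.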
Input Parameters
Output Parameters
d, e, tau Arrays:
d contains the diagonal elements of the matrix T.
The dimension of d must be at least max(1, n).
e contains the off-diagonal elements of T.
The dimension of e must be at least max(1, n-1).
tau Stores the n-1 scalar factors of the elementary reflectors whose product
represents the orthogonal matrix Q.
Return Values
This function returns a value info.
Application Notes
The matrix Q is represented as a product of n-1 elementary reflectors, as follows:
• If uplo = 'U', Q = H(n-1) ... H(2)H(1)
On exit, tau is stored in tau[i - 1], and v(1:i-1) is stored in AP, overwriting A(1:i-1, i+1).
• If uplo = 'L', Q = H(1)H(2) ... H(n-1)
On exit, tau is stored in tau[i - 1], and v(i+2:n) is stored in AP, overwriting A(i+2:n, i).
The computed matrix T is exactly similar to a matrix A + E, where ||E||_2 = c(n)*ε*||A||_2, c(n) is a
modestly increasing function of n, and ε is the machine precision. The approximate number of floating-point
operations is (4/3)*n^3.
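To make the reflector representation concrete, the sketch below applies a single elementary reflector to a vector. It assumes the standard LAPACK form H = I - tau*v*v^T for the real case, which this excerpt does not state explicitly. The function name apply_reflector is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: overwrites x with H*x, where
   H = I - tau*v*v^T (real case, assumed standard LAPACK form). */
void apply_reflector(size_t n, double tau, const double *v, double *x)
{
    double dot = 0.0;
    size_t k;
    assert(v != NULL && x != NULL);
    for (k = 0; k < n; ++k)        /* dot = v^T * x */
        dot += v[k] * x[k];
    for (k = 0; k < n; ++k)        /* x = x - tau * dot * v */
        x[k] -= tau * dot * v[k];
}
```

Applying the n-1 reflectors one after another in the order given for the chosen uplo has the same effect as multiplying by Q, which is why Q never needs to be formed explicitly.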
?opgtr
Generates the real orthogonal matrix Q determined
by ?sptrd.
Syntax
lapack_int LAPACKE_sopgtr (int matrix_layout, char uplo, lapack_int n, const float* ap,
const float* tau, float* q, lapack_int ldq);
lapack_int LAPACKE_dopgtr (int matrix_layout, char uplo, lapack_int n, const double*
ap, const double* tau, double* q, lapack_int ldq);
Include Files
• mkl.h
Description
The routine explicitly generates the n-by-n orthogonal matrix Q formed by ?sptrd when reducing a packed real
symmetric matrix A to tridiagonal form: A = Q*T*Q^T. Use this routine after a call to ?sptrd.
Input Parameters
uplo Must be 'U' or 'L'. Use the same uplo as supplied to ?sptrd.
ldq The leading dimension of the output array q; at least max(1, n).
Output Parameters
Return Values
This function returns a value info.
Application Notes
The computed matrix Q differs from an exactly orthogonal matrix by a matrix E such that ||E||_2 = O(ε),
where ε is the machine precision.
The approximate number of floating-point operations is (4/3)*n^3.
?opmtr
Multiplies a real matrix by the real orthogonal matrix
Q determined by ?sptrd.
Syntax
lapack_int LAPACKE_sopmtr (int matrix_layout, char side, char uplo, char trans,
lapack_int m, lapack_int n, const float* ap, const float* tau, float* c, lapack_int
ldc);
lapack_int LAPACKE_dopmtr (int matrix_layout, char side, char uplo, char trans,
lapack_int m, lapack_int n, const double* ap, const double* tau, double* c, lapack_int
ldc);
Include Files
• mkl.h
Description
The routine multiplies a real matrix C by Q or Q^T, where Q is the orthogonal matrix formed by ?sptrd when
reducing a packed real symmetric matrix A to tridiagonal form: A = Q*T*Q^T. Use this routine after a call
to ?sptrd.
Depending on the parameters side and trans, the routine can form one of the matrix products Q*C, Q^T*C,
C*Q, or C*Q^T (overwriting C with the result).
Input Parameters
In the descriptions below, r denotes the order of Q:
If side = 'L', r = m; if side = 'R', r = n.
ldc The leading dimension of c; ldc ≥ max(1, m) for column major layout and
ldc ≥ max(1, n) for row major layout.
Output Parameters
c Overwritten by the product Q*C, Q^T*C, C*Q, or C*Q^T (as specified by side
and trans).
Return Values
This function returns a value info.
Application Notes
The computed product differs from the exact product by a matrix E such that ||E||_2 = O(ε)*||C||_2, where
ε is the machine precision.
The total number of floating-point operations is approximately 2*m^2*n if side = 'L', or 2*n^2*m if side =
'R'.
The complex counterpart of this routine is ?upmtr.
?hptrd
Reduces a complex Hermitian matrix to tridiagonal
form using packed storage.
Syntax
lapack_int LAPACKE_chptrd( int matrix_layout, char uplo, lapack_int n,
lapack_complex_float* ap, float* d, float* e, lapack_complex_float* tau );
lapack_int LAPACKE_zhptrd( int matrix_layout, char uplo, lapack_int n,
lapack_complex_double* ap, double* d, double* e, lapack_complex_double* tau );
Include Files
• mkl.h
Description
The routine reduces a packed complex Hermitian matrix A to symmetric tridiagonal form T by a unitary
similarity transformation: A = Q*T*Q^H. The unitary matrix Q is not formed explicitly but is represented as a
product of n-1 elementary reflectors. Routines are provided for working with Q in this representation (see
Application Notes below).
Input Parameters
Output Parameters
d, e Arrays:
d contains the diagonal elements of the matrix T.
The size of d must be at least max(1, n).
e contains the off-diagonal elements of T.
The size of e must be at least max(1, n-1).
tau Array, size at least max(1, n-1). Stores the n-1 scalar factors of the
elementary reflectors whose product represents the unitary matrix Q.
Return Values
This function returns a value info.
Application Notes
The computed matrix T is exactly similar to a matrix A + E, where ||E||_2 = c(n)*ε*||A||_2, c(n) is a
modestly increasing function of n, and ε is the machine precision.
The approximate number of floating-point operations is (16/3)*n^3.
?upgtr
Generates the complex unitary matrix Q determined
by ?hptrd.
Syntax
lapack_int LAPACKE_cupgtr (int matrix_layout, char uplo, lapack_int n, const
lapack_complex_float* ap, const lapack_complex_float* tau, lapack_complex_float* q,
lapack_int ldq);
lapack_int LAPACKE_zupgtr (int matrix_layout, char uplo, lapack_int n, const
lapack_complex_double* ap, const lapack_complex_double* tau, lapack_complex_double* q,
lapack_int ldq);
Include Files
• mkl.h
Description
The routine explicitly generates the n-by-n unitary matrix Q formed by ?hptrd when reducing a packed
complex Hermitian matrix A to tridiagonal form: A = Q*T*Q^H. Use this routine after a call to ?hptrd.
Input Parameters
uplo Must be 'U' or 'L'. Use the same uplo as supplied to ?hptrd.
Output Parameters
Contains the computed matrix Q.
Return Values
This function returns a value info.
Application Notes
The computed matrix Q differs from an exactly unitary matrix by a matrix E such that ||E||_2 = O(ε),
where ε is the machine precision.
The approximate number of floating-point operations is (16/3)*n^3.
?upmtr
Multiplies a complex matrix by the unitary matrix Q
determined by ?hptrd.
Syntax
lapack_int LAPACKE_cupmtr (int matrix_layout, char side, char uplo, char trans,
lapack_int m, lapack_int n, const lapack_complex_float* ap, const lapack_complex_float*
tau, lapack_complex_float* c, lapack_int ldc);
lapack_int LAPACKE_zupmtr (int matrix_layout, char side, char uplo, char trans,
lapack_int m, lapack_int n, const lapack_complex_double* ap, const
lapack_complex_double* tau, lapack_complex_double* c, lapack_int ldc);
Include Files
• mkl.h
Description
The routine multiplies a complex matrix C by Q or Q^H, where Q is the unitary matrix formed by ?hptrd when
reducing a packed complex Hermitian matrix A to tridiagonal form: A = Q*T*Q^H. Use this routine after a call
to ?hptrd.
Depending on the parameters side and trans, the routine can form one of the matrix products Q*C, Q^H*C,
C*Q, or C*Q^H (overwriting C with the result).
Input Parameters
In the descriptions below, r denotes the order of Q:
If side = 'L', r = m; if side = 'R', r = n.
ldc The leading dimension of c; ldc ≥ max(1, m) for column major layout and
ldc ≥ max(1, n) for row major layout.
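The rules for the order r of Q and for the smallest valid ldc can be written as two small functions. This is an illustrative sketch only: ex_layout, q_order, and min_ldc are hypothetical names, and the enum stands in for the matrix_layout argument of the real routines.

```c
#include <assert.h>

/* Hypothetical stand-in for the matrix_layout argument. */
typedef enum { EX_ROW_MAJOR, EX_COL_MAJOR } ex_layout;

/* Order r of Q: r = m if Q is applied from the left, r = n from the right. */
long q_order(char side, long m, long n)
{
    assert(side == 'L' || side == 'R');
    return side == 'L' ? m : n;
}

/* Smallest valid leading dimension for the m-by-n matrix C:
   max(1, m) for column major layout, max(1, n) for row major layout. */
long min_ldc(ex_layout layout, long m, long n)
{
    long dim = (layout == EX_COL_MAJOR) ? m : n;
    return dim > 1 ? dim : 1;
}
```

For a 4-by-7 matrix C, Q has order 4 when side = 'L' and order 7 when side = 'R'; the minimum ldc is 4 for column major layout and 7 for row major layout.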
Output Parameters
c Overwritten by the product Q*C, Q^H*C, C*Q, or C*Q^H (as specified by side
and trans).
Return Values
This function returns a value info.
Application Notes
The computed product differs from the exact product by a matrix E such that ||E||_2 = O(ε)*||C||_2, where
ε is the machine precision.
The total number of floating-point operations is approximately 8*m^2*n if side = 'L' or 8*n^2*m if side =
'R'.
The real counterpart of this routine is ?opmtr.
?sbtrd
Reduces a real symmetric band matrix to tridiagonal
form.
Syntax
lapack_int LAPACKE_ssbtrd (int matrix_layout, char vect, char uplo, lapack_int n,
lapack_int kd, float* ab, lapack_int ldab, float* d, float* e, float* q, lapack_int
ldq);
lapack_int LAPACKE_dsbtrd (int matrix_layout, char vect, char uplo, lapack_int n,
lapack_int kd, double* ab, lapack_int ldab, double* d, double* e, double* q, lapack_int
ldq);
Include Files
• mkl.h
Description
The routine reduces a real symmetric band matrix A to symmetric tridiagonal form T by an orthogonal
similarity transformation: A = Q*T*Q^T. The orthogonal matrix Q is determined as a product of Givens
rotations.
If required, the routine can also form the matrix Q explicitly.
Input Parameters
ab, q ab (size at least max(1, ldab*n) for column major layout and at least
max(1, ldab*(kd+1)) for row major layout) is an array containing either
upper or lower triangular part of the matrix A (as specified by uplo) in band
storage format.
q (size max(1, ldq*n)) is an array.
If vect = 'U', the q array must contain an n-by-n matrix X.
ldab The leading dimension of ab; at least kd+1 for column major layout and n
for row major layout.
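Band storage keeps only the kd+1 diagonals of the chosen triangle. The sketch below gives the usual LAPACK index mapping into ab for column major layout with 0-based indices. The function name band_index is hypothetical, and the mapping is stated as an assumption because the storage scheme itself is described elsewhere in this reference.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: position of A(i, j) (0-based) inside the band
   array ab, assuming column major layout and ldab >= kd + 1.
   uplo = 'U': requires j - kd <= i <= j.
   uplo = 'L': requires j <= i <= j + kd. */
size_t band_index(char uplo, size_t kd, size_t ldab, size_t i, size_t j)
{
    assert(ldab >= kd + 1);
    if (uplo == 'U') {
        assert(i <= j && j - i <= kd);
        return (kd + i - j) + j * ldab;
    }
    assert(uplo == 'L' && i >= j && i - j <= kd);
    return (i - j) + j * ldab;
}
```

With kd = 1 and ldab = 2 (a tridiagonal band), the diagonal element A(1,1) is at position 3 for uplo = 'U' and at position 2 for uplo = 'L'.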
Output Parameters
d, e, q Arrays:
d contains the diagonal elements of the matrix T.
The size of d must be at least max(1, n).
e contains the off-diagonal elements of T.
The size of e must be at least max(1, n-1).
q is not referenced if vect = 'N'.
Return Values
This function returns a value info.
Application Notes
The computed matrix T is exactly similar to a matrix A + E, where ||E||_2 = c(n)*ε*||A||_2, c(n) is a
modestly increasing function of n, and ε is the machine precision. The computed matrix Q differs from an
exactly orthogonal matrix by a matrix E such that ||E||_2 = O(ε).
The total number of floating-point operations is approximately 6*n^2*kd if vect = 'N', with 3*n^3*(kd-1)/kd
additional operations if vect = 'V'.
?hbtrd
Reduces a complex Hermitian band matrix to
tridiagonal form.
Syntax
lapack_int LAPACKE_chbtrd( int matrix_layout, char vect, char uplo, lapack_int n,
lapack_int kd, lapack_complex_float* ab, lapack_int ldab, float* d, float* e,
lapack_complex_float* q, lapack_int ldq );
lapack_int LAPACKE_zhbtrd( int matrix_layout, char vect, char uplo, lapack_int n,
lapack_int kd, lapack_complex_double* ab, lapack_int ldab, double* d, double* e,
lapack_complex_double* q, lapack_int ldq );
Include Files
• mkl.h
Description
The routine reduces a complex Hermitian band matrix A to symmetric tridiagonal form T by a unitary
similarity transformation: A = Q*T*Q^H. The unitary matrix Q is determined as a product of Givens rotations.
Input Parameters
ab ab (size at least max(1, ldab*n) for column major layout and at least
max(1, ldab*(kd+1)) for row major layout) is an array containing either
upper or lower triangular part of the matrix A (as specified by uplo) in band
storage format.
ldab The leading dimension of ab; at least kd+1 for column major layout and n
for row major layout.
Output Parameters
d, e Arrays:
d contains the diagonal elements of the matrix T.
The dimension of d must be at least max(1, n).
Return Values
This function returns a value info.
Application Notes
The computed matrix T is exactly similar to a matrix A + E, where ||E||_2 = c(n)*ε*||A||_2, c(n) is a
modestly increasing function of n, and ε is the machine precision. The computed matrix Q differs from an
exactly unitary matrix by a matrix E such that ||E||_2 = O(ε).
The total number of floating-point operations is approximately 20*n^2*kd if vect = 'N', with
10*n^3*(kd-1)/kd additional operations if vect = 'V'.
The real counterpart of this routine is ?sbtrd.
?sterf
Computes all eigenvalues of a real symmetric
tridiagonal matrix using the QR algorithm.
Syntax
lapack_int LAPACKE_ssterf (lapack_int n, float* d, float* e);
lapack_int LAPACKE_dsterf (lapack_int n, double* d, double* e);
Include Files
• mkl.h
Description
The routine computes all the eigenvalues of a real symmetric tridiagonal matrix T (which can be obtained by
reducing a symmetric or Hermitian matrix to tridiagonal form). The routine uses a square-root-free variant of
the QR algorithm.
If you need not only the eigenvalues but also the eigenvectors, call steqr.
Input Parameters
d, e Arrays:
d contains the diagonal elements of T.
The dimension of d must be at least max(1, n).
e contains the off-diagonal elements of T.
The dimension of e must be at least max(1, n-1).
Output Parameters
Return Values
This function returns a value info.
If info = i, the algorithm failed to find all the eigenvalues after 30n iterations:
i off-diagonal elements have not converged to zero. On exit, d and e contain, respectively, the diagonal and
off-diagonal elements of a tridiagonal matrix orthogonally similar to T.
If info = -i, the i-th parameter had an illegal value.
Application Notes
The computed eigenvalues are exact for a matrix T + E such that ||E||_2 = O(ε)*||T||_2,
where ε is the machine precision.
If λ_i is an exact eigenvalue, and μ_i is the corresponding computed value, then
|μ_i - λ_i| ≤ c(n)*ε*||T||_2
where c(n) is a modestly increasing function of n.
The total number of floating-point operations depends on how rapidly the algorithm converges. Typically, it is
about 14*n^2.
?steqr
Computes all eigenvalues and eigenvectors of a
symmetric or Hermitian matrix reduced to tridiagonal
form (QR algorithm).
Syntax
lapack_int LAPACKE_ssteqr( int matrix_layout, char compz, lapack_int n, float* d,
float* e, float* z, lapack_int ldz );
lapack_int LAPACKE_dsteqr( int matrix_layout, char compz, lapack_int n, double* d,
double* e, double* z, lapack_int ldz );
lapack_int LAPACKE_csteqr( int matrix_layout, char compz, lapack_int n, float* d,
float* e, lapack_complex_float* z, lapack_int ldz );
lapack_int LAPACKE_zsteqr( int matrix_layout, char compz, lapack_int n, double* d,
double* e, lapack_complex_double* z, lapack_int ldz );
Include Files
• mkl.h
Description
The routine computes all the eigenvalues and (optionally) all the eigenvectors of a real symmetric tridiagonal
matrix T. In other words, the routine can compute the spectral factorization: T = Z*Λ*Z^T. Here Λ is a
diagonal matrix whose diagonal elements are the eigenvalues λ_i; Z is an orthogonal matrix whose columns
are eigenvectors. Thus,
T*z_i = λ_i*z_i for i = 1, 2, ..., n.
The routine normalizes the eigenvectors so that ||z_i||_2 = 1.
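One way to sanity-check a computed eigenpair is to evaluate the residual of the eigenvalue equation directly from the arrays d and e. The sketch below is a hypothetical helper, not a library routine, and returns the squared 2-norm of T*z - λ*z for a real symmetric tridiagonal T.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns ||T*z - lambda*z||_2^2, where T has
   diagonal d[0..n-1] and off-diagonal e[0..n-2]. */
double tridiag_residual_sq(size_t n, const double *d, const double *e,
                           double lambda, const double *z)
{
    double sum = 0.0;
    size_t i;
    assert(n >= 1);
    for (i = 0; i < n; ++i) {
        double r = (d[i] - lambda) * z[i];          /* diagonal term */
        if (i > 0)
            r += e[i - 1] * z[i - 1];               /* subdiagonal term */
        if (i + 1 < n)
            r += e[i] * z[i + 1];                   /* superdiagonal term */
        sum += r * r;
    }
    return sum;
}
```

For the 2-by-2 matrix with d = {2, 2} and e = {1}, the pair λ = 3, z = (1, 1) gives a zero residual; scaling z to unit length does not change that.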
You can also use the routine for computing the eigenvalues and eigenvectors of an arbitrary real symmetric
(or complex Hermitian) matrix A reduced to tridiagonal form T: A = Q*T*Q^H. In this case, the spectral
factorization is as follows: A = Q*T*Q^H = (Q*Z)*Λ*(Q*Z)^H. Before calling ?steqr, you must reduce A to
tridiagonal form and generate the explicit matrix Q by calling the following routines:
If you need eigenvalues only, it's more efficient to call sterf. If T is positive-definite, pteqr can compute small
eigenvalues more accurately than ?steqr.
To solve the problem by a single call, use one of the divide and conquer routines stevd, syevd, spevd, or
sbevd for real symmetric matrices or heevd, hpevd, or hbevd for complex Hermitian matrices.
Input Parameters
d, e Arrays:
d contains the diagonal elements of T.
The size of d must be at least max(1, n).
e contains the off-diagonal elements of T.
The size of e must be at least max(1, n-1).
If compz = 'V', z must contain the orthogonal matrix used in the reduction
to tridiagonal form.
ldz The leading dimension of z. Constraints:
ldz≥ 1 if compz = 'N';
ldz≥ max(1, n) if compz = 'V' or 'I'.
Output Parameters
Return Values
This function returns a value info.
If info = i, the algorithm failed to find all the eigenvalues after 30n iterations: i off-diagonal elements have
not converged to zero. On exit, d and e contain, respectively, the diagonal and off-diagonal elements of a
tridiagonal matrix orthogonally similar to T.
If info = -i, the i-th parameter had an illegal value.
Application Notes
The computed eigenvalues and eigenvectors are exact for a matrix T + E such that ||E||_2 = O(ε)*||T||_2,
where ε is the machine precision.
If λ_i is an exact eigenvalue, and μ_i is the corresponding computed value, then
|μ_i - λ_i| ≤ c(n)*ε*||T||_2
If z_i is the corresponding exact eigenvector, and w_i is the corresponding computed vector, then the angle
θ(z_i, w_i) between them is bounded as follows:
θ(z_i, w_i) ≤ c(n)*ε*||T||_2 / min_{i≠j}|λ_i - λ_j|.
The total number of floating-point operations depends on how rapidly the algorithm converges. Typically, it is
about
24*n^2 if compz = 'N';
7*n^3 (for complex flavors, 14*n^3) if compz = 'V' or 'I'.
?stemr
Computes selected eigenvalues and eigenvectors of a
real symmetric tridiagonal matrix.
Syntax
lapack_int LAPACKE_sstemr( int matrix_layout, char jobz, char range, lapack_int n,
const float* d, float* e, float vl, float vu, lapack_int il, lapack_int iu, lapack_int*
m, float* w, float* z, lapack_int ldz, lapack_int nzc, lapack_int* isuppz,
lapack_logical* tryrac );
Include Files
• mkl.h
Description
The routine computes selected eigenvalues and, optionally, eigenvectors of a real symmetric tridiagonal
matrix T. Any such unreduced matrix has a well-defined set of pairwise different real eigenvalues, and the
corresponding real eigenvectors are pairwise orthogonal.
The spectrum may be computed either completely or partially by specifying either an interval (vl,vu] or a
range of indices il:iu for the desired eigenvalues.
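The two selection modes differ in how many eigenvalues they return. As a rough illustration, the sketch below counts how many values of an already sorted spectrum fall in the half-open interval (vl, vu], and how many are selected by a 1-based index range il:iu. Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of entries of w[0..n-1] that lie in the
   half-open interval (vl, vu]; vl itself is excluded, vu is included. */
size_t count_in_interval(size_t n, const double *w, double vl, double vu)
{
    size_t m = 0;
    size_t k;
    assert(vl < vu);
    for (k = 0; k < n; ++k)
        if (w[k] > vl && w[k] <= vu)
            ++m;
    return m;
}

/* Hypothetical helper: number of eigenvalues selected by the 1-based
   index range il:iu, with 1 <= il <= iu <= n. */
size_t count_in_index_range(size_t n, size_t il, size_t iu)
{
    assert(1 <= il && il <= iu && iu <= n);
    return iu - il + 1;
}
```

With an index range the count is known before the call, whereas with an interval it depends on the spectrum, which is why the number of eigenvalues found is returned in m.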
Depending on the number of desired eigenvalues, these are computed either by bisection or the dqds
algorithm. Numerically orthogonal eigenvectors are computed by the use of various suitable L*D*L^T
factorizations near clusters of close eigenvalues (referred to as RRRs, Relatively Robust Representations). An
informal sketch of the algorithm follows.
For each unreduced block (submatrix) of T,
a. Compute T - sigma*I = L*D*L^T, so that L and D define all the wanted eigenvalues to high relative
accuracy. This means that small relative changes in the entries of L and D cause only small relative
changes in the eigenvalues and eigenvectors. The standard (unfactored) representation of the
tridiagonal matrix T does not have this property in general.
b. Compute the eigenvalues to suitable accuracy. If the eigenvectors are desired, the algorithm attains full
accuracy of the computed eigenvalues only right before the corresponding vectors have to be
computed, see steps c and d.
c. For each cluster of close eigenvalues, select a new shift close to the cluster, find a new factorization,
and refine the shifted eigenvalues to suitable accuracy.
d. For each eigenvalue with a large enough relative separation compute the corresponding eigenvector by
forming a rank revealing twisted factorization. Go back to step c for any clusters that remain.
Normal execution of ?stemr may create NaNs and infinities and may abort due to a floating point exception
in environments that do not handle NaNs and infinities in the IEEE standard default manner.
For more details, see: [Dhillon04], [Dhillon04-02], [Dhillon97]
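Step a of the sketch above factors the shifted tridiagonal matrix into a unit lower bidiagonal L and a diagonal D. The following is a minimal illustration of that recurrence only, without the shift selection or the accuracy safeguards a production routine needs. The function name ldlt_shifted and its return convention are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: factors T - sigma*I = L*D*L^T for a symmetric
   tridiagonal T with diagonal d[0..n-1] and off-diagonal e[0..n-2].
   dd[0..n-1] receives the diagonal of D and ll[0..n-2] the subdiagonal
   of the unit lower bidiagonal L.
   Returns 0 on success, or the 1-based position of a zero pivot that
   prevents the factorization from continuing. */
int ldlt_shifted(size_t n, const double *d, const double *e, double sigma,
                 double *dd, double *ll)
{
    size_t i;
    assert(n >= 1);
    dd[0] = d[0] - sigma;
    for (i = 0; i + 1 < n; ++i) {
        if (dd[i] == 0.0)
            return (int)(i + 1);
        ll[i] = e[i] / dd[i];
        dd[i + 1] = (d[i + 1] - sigma) - ll[i] * e[i];
    }
    return 0;
}
```

For d = {2, 2}, e = {1}, and sigma = 0, the factors are D = diag(2, 1.5) and a subdiagonal entry of 0.5 in L.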
Input Parameters
If jobz = 'N', then only eigenvalues are computed.
d Array, size n.
Contains n diagonal elements of the tridiagonal matrix T.
e Array, size n.
Contains (n-1) off-diagonal elements of the tridiagonal matrix T in
elements 0 to n-2 of e. e[n - 1] need not be set on input, but is used
internally as workspace.
vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues. Constraint: vl<vu.
il, iu If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned.
Constraint: 1≤il≤iu≤n, if n>0.
ldz ≥ 1 otherwise.
If nzc = -1, then a workspace query is assumed; the routine calculates the
number of columns of the array z that are needed to hold the eigenvectors.
This value is returned as the first entry of the array z, and no error
message related to nzc is issued by the routine xerbla.
tryrac If tryrac is true, it indicates that the code should check whether the
tridiagonal matrix defines its eigenvalues to high relative accuracy. If so,
the code uses relative-accuracy preserving algorithms that might be (a bit)
slower depending on the matrix. If the matrix does not define its
eigenvalues to high relative accuracy, the code can use possibly faster
algorithms.
If tryrac is not true, the code is not required to guarantee relatively
accurate eigenvalues and can use the fastest possible techniques.
Output Parameters
w Array, size n.
The first m elements contain the selected eigenvalues in ascending order.
z Array, size max(1, ldz*m) for column major layout and max(1, ldz*n) for
row major layout.
If jobz = 'V', and info = 0, then the first m columns of z contain the
orthonormal eigenvectors of the matrix T corresponding to the selected
eigenvalues, with the i-th column of z holding the eigenvector associated
with w(i).
Note: the exact value of m is not known in advance and can be computed
with a workspace query by setting nzc=-1, see description of the
parameter nzc.
tryrac On exit, a tryrac that was true on entry is set to false if the matrix does not define its
eigenvalues to high relative accuracy.
Return Values
This function returns a value info.
?stedc
Computes all eigenvalues and eigenvectors of a
symmetric tridiagonal matrix using the divide and
conquer method.
Syntax
lapack_int LAPACKE_sstedc( int matrix_layout, char compz, lapack_int n, float* d,
float* e, float* z, lapack_int ldz );
lapack_int LAPACKE_dstedc( int matrix_layout, char compz, lapack_int n, double* d,
double* e, double* z, lapack_int ldz );
lapack_int LAPACKE_cstedc( int matrix_layout, char compz, lapack_int n, float* d,
float* e, lapack_complex_float* z, lapack_int ldz );
lapack_int LAPACKE_zstedc( int matrix_layout, char compz, lapack_int n, double* d,
double* e, lapack_complex_double* z, lapack_int ldz );
Include Files
• mkl.h
Description
The routine computes all the eigenvalues and (optionally) all the eigenvectors of a symmetric tridiagonal
matrix using the divide and conquer method. The eigenvectors of a full or band real symmetric or complex
Hermitian matrix can also be found if sytrd/hetrd or sptrd/hptrd or sbtrd/hbtrd has been used to reduce this
matrix to tridiagonal form.
Input Parameters
d, e Arrays:
d contains the diagonal elements of the tridiagonal matrix.
The dimension of d must be at least max(1, n).
e contains the subdiagonal elements of the tridiagonal matrix.
The dimension of e must be at least max(1, n-1).
Output Parameters
Return Values
This function returns a value info.
If info = i, the algorithm failed to compute an eigenvalue while working on the submatrix lying in rows and
columns i/(n+1) through mod(i, n+1).
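A positive return value therefore encodes two indices in one integer. The sketch below unpacks them with integer division and remainder; the type stedc_block and the function stedc_failed_block are hypothetical names.

```c
#include <assert.h>

/* Hypothetical helper: rows and columns of the submatrix reported by a
   positive info value from ?stedc for a matrix of order n. */
typedef struct {
    long first;  /* info / (n + 1)   */
    long last;   /* mod(info, n + 1) */
} stedc_block;

stedc_block stedc_failed_block(long info, long n)
{
    stedc_block b;
    assert(info > 0 && n > 0);
    b.first = info / (n + 1);
    b.last  = info % (n + 1);
    return b;
}
```

For n = 10, a return value of 40 points at the submatrix in rows and columns 3 through 7, since 40 = 3*11 + 7.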
?stegr
Computes selected eigenvalues and eigenvectors of a
real symmetric tridiagonal matrix.
Syntax
lapack_int LAPACKE_sstegr( int matrix_layout, char jobz, char range, lapack_int n,
float* d, float* e, float vl, float vu, lapack_int il, lapack_int iu, float abstol,
lapack_int* m, float* w, float* z, lapack_int ldz, lapack_int* isuppz );
lapack_int LAPACKE_dstegr( int matrix_layout, char jobz, char range, lapack_int n,
double* d, double* e, double vl, double vu, lapack_int il, lapack_int iu, double abstol,
lapack_int* m, double* w, double* z, lapack_int ldz, lapack_int* isuppz );
lapack_int LAPACKE_cstegr( int matrix_layout, char jobz, char range, lapack_int n,
float* d, float* e, float vl, float vu, lapack_int il, lapack_int iu, float abstol,
lapack_int* m, float* w, lapack_complex_float* z, lapack_int ldz, lapack_int* isuppz );
lapack_int LAPACKE_zstegr( int matrix_layout, char jobz, char range, lapack_int n,
double* d, double* e, double vl, double vu, lapack_int il, lapack_int iu, double abstol,
lapack_int* m, double* w, lapack_complex_double* z, lapack_int ldz, lapack_int*
isuppz );
Include Files
• mkl.h
Description
The routine computes selected eigenvalues and, optionally, eigenvectors of a real symmetric tridiagonal
matrix T.
The spectrum may be computed either completely or partially by specifying either an interval (vl,vu] or a
range of indices il:iu for the desired eigenvalues.
?stegr is a compatibility wrapper around the improved stemr routine. See its description for further details.
Note that the abstol parameter no longer provides any benefit and hence is no longer used.
Input Parameters
d, e Arrays:
d contains the diagonal elements of T.
The dimension of d must be at least max(1, n).
e contains the subdiagonal elements of T in elements 0 to n-2; e[n - 1] need
not be set on input, but it is used as a workspace.
The dimension of e must be at least max(1, n).
vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
Constraint: vl< vu.
il, iu If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned.
Constraint: 1 ≤il≤iu≤n, if n > 0.
abstol Unused. Was the absolute error tolerance for the eigenvalues/eigenvectors
in previous versions.
Output Parameters
Note: if range = 'V', the exact value of m is not known in advance and an
upper bound must be used. Using n as the upper bound for m is always safe.
Return Values
This function returns a value info.
?pteqr
Computes all eigenvalues and (optionally) all
eigenvectors of a real symmetric positive-definite
tridiagonal matrix.
Syntax
lapack_int LAPACKE_spteqr( int matrix_layout, char compz, lapack_int n, float* d,
float* e, float* z, lapack_int ldz );
lapack_int LAPACKE_dpteqr( int matrix_layout, char compz, lapack_int n, double* d,
double* e, double* z, lapack_int ldz );
lapack_int LAPACKE_cpteqr( int matrix_layout, char compz, lapack_int n, float* d,
float* e, lapack_complex_float* z, lapack_int ldz );
lapack_int LAPACKE_zpteqr( int matrix_layout, char compz, lapack_int n, double* d,
double* e, lapack_complex_double* z, lapack_int ldz );
Include Files
• mkl.h
Description
The routine computes all the eigenvalues and (optionally) all the eigenvectors of a real symmetric
positive-definite tridiagonal matrix T. In other words, the routine can compute the spectral factorization:
T = Z*Λ*Z^T.
Here Λ is a diagonal matrix whose diagonal elements are the eigenvalues λ_i; Z is an orthogonal matrix whose
columns are eigenvectors. Thus,
T*z_i = λ_i*z_i for i = 1, 2, ..., n.
(The routine normalizes the eigenvectors so that ||z_i||_2 = 1.)
You can also use the routine for computing the eigenvalues and eigenvectors of real symmetric (or complex
Hermitian) positive-definite matrices A reduced to tridiagonal form T: A = Q*T*Q^H. In this case, the spectral
factorization is as follows: A = Q*T*Q^H = (Q*Z)*Λ*(Q*Z)^H. Before calling ?pteqr, you must reduce A to
tridiagonal form and generate the explicit matrix Q by calling the following routines:
The routine first factorizes T as L*D*L^H where L is a unit lower bidiagonal matrix, and D is a diagonal matrix.
Then it forms the bidiagonal matrix B = L*D^(1/2) and calls ?bdsqr to compute the singular values of B, which
are the square roots of the eigenvalues of T.
Input Parameters
d, e Arrays:
d contains the diagonal elements of T.
Output Parameters
Return Values
This function returns a value info.
If info = i, the leading minor of order i (and hence T itself) is not positive-definite.
If info = n + i, the algorithm for computing singular values failed to converge; i off-diagonal elements
have not converged to zero.
If info = -i, the i-th parameter had an illegal value.
Application Notes
If λ_i is an exact eigenvalue, and μ_i is the corresponding computed value, then
|μ_i - λ_i| ≤ c(n)*ε*K*λ_i
where c(n) is a modestly increasing function of n, ε is the machine precision, and
K = ||D*T*D||_2 * ||(D*T*D)^(-1)||_2, where D is diagonal with d_ii = t_ii^(-1/2).
If z_i is the corresponding exact eigenvector, and w_i is the corresponding computed vector, then the angle
θ(z_i, w_i) between them is bounded as follows:
θ(z_i, w_i) ≤ c(n)*ε*K / min_{i≠j}(|λ_i - λ_j|/|λ_i + λ_j|).
Here min_{i≠j}(|λ_i - λ_j|/|λ_i + λ_j|) is the relative gap between λ_i and the other eigenvalues.
The total number of floating-point operations depends on how rapidly the algorithm converges.
Typically, it is about
30n^2 if compz = 'N';
6n^3 (for complex flavors, 12n^3) if compz = 'V' or 'I'.
?stebz
Computes selected eigenvalues of a real symmetric
tridiagonal matrix by bisection.
Syntax
lapack_int LAPACKE_sstebz (char range, char order, lapack_int n, float vl, float vu,
lapack_int il, lapack_int iu, float abstol, const float* d, const float* e, lapack_int*
m, lapack_int* nsplit, float* w, lapack_int* iblock, lapack_int* isplit);
lapack_int LAPACKE_dstebz (char range, char order, lapack_int n, double vl, double vu,
lapack_int il, lapack_int iu, double abstol, const double* d, const double* e,
lapack_int* m, lapack_int* nsplit, double* w, lapack_int* iblock, lapack_int* isplit);
Include Files
• mkl.h
Description
The routine computes some (or all) of the eigenvalues of a real symmetric tridiagonal matrix T by bisection.
The routine searches for zero or negligible off-diagonal elements to see if T splits into block-diagonal form
T = diag(T_1, T_2, ...). Then it performs bisection on each of the blocks T_i and returns the block index of
each computed eigenvalue, so that a subsequent call to ?stein can also take advantage of the block structure.
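To make the bisection idea concrete, here is a minimal plain-C sketch that locates one eigenvalue of a symmetric tridiagonal matrix with a Sturm count. It is an illustration under simplifying assumptions, not the oneMKL algorithm: it ignores block splitting, uses a crude fixed pivot floor, and the helper names sturm_count and kth_eigenvalue are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: counts the negative pivots of T - x*I, which
 * equals the number of eigenvalues of T below x.
 * d[0..n-1] is the diagonal, e[0..n-2] the off-diagonal. */
size_t sturm_count(size_t n, const double *d, const double *e, double x)
{
    size_t count = 0;
    double q = 1.0;
    for (size_t i = 0; i < n; ++i) {
        double off = (i == 0) ? 0.0 : e[i - 1];
        q = d[i] - x - (off * off) / q;
        if (q == 0.0)
            q = -1e-300;              /* crude pivot floor (assumption) */
        if (q < 0.0)
            ++count;
    }
    return count;
}

/* Hypothetical helper: bisection for the k-th smallest eigenvalue
 * (k is 0-based), starting from the Gershgorin interval. */
double kth_eigenvalue(size_t n, const double *d, const double *e,
                      size_t k, double tol)
{
    assert(n > 0 && k < n);
    double lo = d[0], hi = d[0];
    for (size_t i = 0; i < n; ++i) {
        double r = 0.0;
        if (i > 0)
            r += fabs(e[i - 1]);
        if (i + 1 < n)
            r += fabs(e[i]);
        if (d[i] - r < lo) lo = d[i] - r;
        if (d[i] + r > hi) hi = d[i] + r;
    }
    for (int it = 0; it < 200 && hi - lo > tol; ++it) {
        double mid = lo + 0.5 * (hi - lo);
        if (sturm_count(n, d, e, mid) > k)
            hi = mid;                 /* more than k eigenvalues below mid */
        else
            lo = mid;
    }
    return lo + 0.5 * (hi - lo);
}
```

For the 2-by-2 matrix with diagonal (2, 2) and off-diagonal -1, whose eigenvalues are 1 and 3, kth_eigenvalue(2, d, e, 0, 1e-12) converges to approximately 1 and kth_eigenvalue(2, d, e, 1, 1e-12) to approximately 3.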
Input Parameters
vl, vu If range = 'V', the routine computes eigenvalues w[i] in the half-open
interval:
vl < w[i] ≤ vu.
If range = 'A' or 'I', vl and vu are not referenced.
If range = 'I', the routine computes eigenvalues w[i] such that il ≤ i ≤ iu
(assuming that the eigenvalues w[i] are in ascending order).
d, e Arrays:
d contains the diagonal elements of T.
The size of d must be at least max(1, n).
e contains the off-diagonal elements of T.
The size of e must be at least max(1, n-1).
Output Parameters
w Array, size at least max(1, n). The computed eigenvalues, stored in w[0] to
w[m - 1].
Return Values
This function returns a value info.
If info = 1, for range = 'A' or 'V', the algorithm failed to compute some of the required eigenvalues to
the desired accuracy; iblock[i] < 0 indicates that the eigenvalue stored in w[i] failed to converge.
If info = 2, for range = 'I', the algorithm failed to compute some of the required eigenvalues. Try calling
the routine again with range = 'A'.
If info = 3:
If info = 4, no eigenvalues have been computed. The floating-point arithmetic on the computer is not
behaving as expected.
If info = -i, the i-th parameter had an illegal value.
Application Notes
The eigenvalues of T are computed to high relative accuracy which means that if they vary widely in
magnitude, then any small eigenvalues will be computed more accurately than, for example, with the
standard QR method. However, the reduction to tridiagonal form (prior to calling the routine) may exclude
the possibility of obtaining high relative accuracy in the small eigenvalues of the original matrix if its
eigenvalues vary widely in magnitude.
?stein
Computes the eigenvectors corresponding to specified
eigenvalues of a real symmetric tridiagonal matrix.
Syntax
lapack_int LAPACKE_sstein( int matrix_layout, lapack_int n, const float* d, const
float* e, lapack_int m, const float* w, const lapack_int* iblock, const lapack_int*
isplit, float* z, lapack_int ldz, lapack_int* ifailv );
lapack_int LAPACKE_dstein( int matrix_layout, lapack_int n, const double* d, const
double* e, lapack_int m, const double* w, const lapack_int* iblock, const lapack_int*
isplit, double* z, lapack_int ldz, lapack_int* ifailv );
lapack_int LAPACKE_cstein( int matrix_layout, lapack_int n, const float* d, const
float* e, lapack_int m, const float* w, const lapack_int* iblock, const lapack_int*
isplit, lapack_complex_float* z, lapack_int ldz, lapack_int* ifailv );
lapack_int LAPACKE_zstein( int matrix_layout, lapack_int n, const double* d, const
double* e, lapack_int m, const double* w, const lapack_int* iblock, const lapack_int*
isplit, lapack_complex_double* z, lapack_int ldz, lapack_int* ifailv );
Include Files
• mkl.h
Description
The routine computes the eigenvectors of a real symmetric tridiagonal matrix T corresponding to specified
eigenvalues, by inverse iteration. It is designed to be used in particular after the specified eigenvalues have
been computed by ?stebz with order = 'B', but may also be used when the eigenvalues have been
computed by other routines.
If you use this routine after ?stebz, it can take advantage of the block structure by performing inverse
iteration on each block T_i separately, which is more efficient than using the whole matrix T.
If T has been formed by reduction of a full symmetric or Hermitian matrix A to tridiagonal form, you can
transform eigenvectors of T to eigenvectors of A by calling ?ormtr or ?opmtr (for real flavors) or by
calling ?unmtr or ?upmtr (for complex flavors).
Input Parameters
d, e, w Arrays:
iblock, isplit Arrays, size at least max(1, n). The arrays iblock and isplit, as returned
by ?stebz with order = 'B'.
If you did not call ?stebz with order = 'B', set all elements of iblock to
1, and isplit[0] to n.
ldz The leading dimension of the output array z; ldz ≥ max(1, n) for column
major layout and ldz ≥ max(1, m) for row major layout.
Output Parameters
Return Values
This function returns a value info.
If info = i, then i eigenvectors (as indicated by the parameter ifailv) each failed to converge in 5 iterations.
The current iterates are stored in the corresponding columns/rows of the array z.
If info = -i, the i-th parameter had an illegal value.
Application Notes
Each computed eigenvector z_i is an exact eigenvector of a matrix T + E_i, where ||E_i||_2 = O(ε)*||T||_2.
However, a set of eigenvectors computed by this routine may not be orthogonal to as high a degree of
accuracy as those computed by ?steqr.
?disna
Computes the reciprocal condition numbers for the
eigenvectors of a symmetric/Hermitian matrix or for
the left or right singular vectors of a general matrix.
Syntax
lapack_int LAPACKE_sdisna (char job, lapack_int m, lapack_int n, const float* d, float*
sep);
lapack_int LAPACKE_ddisna (char job, lapack_int m, lapack_int n, const double* d,
double* sep);
Include Files
• mkl.h
Description
The routine computes the reciprocal condition numbers for the eigenvectors of a real symmetric or complex
Hermitian matrix or for the left or right singular vectors of a general m-by-n matrix.
The reciprocal condition number is the 'gap' between the corresponding eigenvalue or singular value and the
nearest other one.
The bound on the error, measured by angle in radians, in the i-th computed vector is given by
?lamch('E')*(anorm/sep(i))
where anorm = ||A||_2 = max( |d(j)| ). sep(i) is not allowed to be smaller than ?lamch('E')*anorm in
order to limit the size of the error bound.
?disna may also be used to compute error bounds for eigenvectors of the generalized symmetric definite
eigenproblem.
Input Parameters
job Must be 'E','L', or 'R'. Specifies for which problem the reciprocal
condition numbers should be computed:
job = 'E': for the eigenvectors of a symmetric/Hermitian matrix;
n If job = 'L', or 'R', the number of columns of the matrix (n≥ 0). Ignored
if job = 'E'.
d This array must contain the eigenvalues (if job = 'E') or singular values
(if job = 'L' or 'R') of the matrix, in either increasing or decreasing
order.
If singular values, they must be non-negative.
Output Parameters
sep Array, dimension at least max(1,m) if job = 'E', and at least max(1,
min(m,n)) if job = 'L' or 'R'. The reciprocal condition numbers of the
vectors.
Return Values
This function returns a value info.
?sygst
Reduces a real symmetric-definite generalized
eigenvalue problem to the standard form.
Syntax
lapack_int LAPACKE_ssygst (int matrix_layout, lapack_int itype, char uplo, lapack_int
n, float* a, lapack_int lda, const float* b, lapack_int ldb);
lapack_int LAPACKE_dsygst (int matrix_layout, lapack_int itype, char uplo, lapack_int
n, double* a, lapack_int lda, const double* b, lapack_int ldb);
Include Files
• mkl.h
Description
Input Parameters
itype Must be 1 or 2 or 3.
If itype = 1, the generalized eigenproblem is A*z = lambda*B*z
If uplo = 'U', the array a stores the upper triangle of A; you must supply
B in the factored form B = U^T*U.
If uplo = 'L', the array a stores the lower triangle of A; you must supply
B in the factored form B = L*L^T.
a, b Arrays:
a (size max(1, lda*n)) contains the upper or lower triangle of A.
Output Parameters
Return Values
This function returns a value info.
Application Notes
Forming the reduced matrix C is a stable procedure. However, it involves implicit multiplication by inv(B) (if
itype = 1) or B (if itype = 2 or 3). When the routine is used as a step in the computation of eigenvalues
and eigenvectors of the original problem, there may be a significant loss of accuracy if B is ill-conditioned
with respect to inversion.
The approximate number of floating-point operations is n^3.
?hegst
Reduces a complex Hermitian positive-definite
generalized eigenvalue problem to the standard form.
Syntax
lapack_int LAPACKE_chegst (int matrix_layout, lapack_int itype, char uplo, lapack_int
n, lapack_complex_float* a, lapack_int lda, const lapack_complex_float* b, lapack_int
ldb);
lapack_int LAPACKE_zhegst (int matrix_layout, lapack_int itype, char uplo, lapack_int
n, lapack_complex_double* a, lapack_int lda, const lapack_complex_double* b, lapack_int
ldb);
Include Files
• mkl.h
Description
The routine reduces a complex Hermitian positive-definite generalized eigenvalue problem to standard form.
For itype = 3, the problem has the form B*A*x = λ*x.
Before calling this routine, you must call ?potrf to compute the Cholesky factorization: B = U^H*U or
B = L*L^H.
Input Parameters
itype Must be 1 or 2 or 3.
If itype = 1, the generalized eigenproblem is A*z = lambda*B*z
If uplo = 'U', the array a stores the upper triangle of A; you must supply
B in the factored form B = U^H*U.
If uplo = 'L', the array a stores the lower triangle of A; you must supply
B in the factored form B = L*L^H.
a, b Arrays:
a (size max(1, lda*n)) contains the upper or lower triangle of A.
Output Parameters
Return Values
This function returns a value info.
Application Notes
Forming the reduced matrix C is a stable procedure. However, it involves implicit multiplication by inv(B) (if
itype = 1) or B (if itype = 2 or 3). When the routine is used as a step in the computation of eigenvalues
and eigenvectors of the original problem, there may be a significant loss of accuracy if B is ill-conditioned
with respect to inversion.
The approximate number of floating-point operations is n^3.
?spgst
Reduces a real symmetric-definite generalized
eigenvalue problem to the standard form using packed
storage.
Syntax
lapack_int LAPACKE_sspgst (int matrix_layout, lapack_int itype, char uplo, lapack_int
n, float* ap, const float* bp);
lapack_int LAPACKE_dspgst (int matrix_layout, lapack_int itype, char uplo, lapack_int
n, double* ap, const double* bp);
Include Files
• mkl.h
Description
Input Parameters
itype Must be 1 or 2 or 3.
If itype = 1, the generalized eigenproblem is A*z = lambda*B*z
ap, bp Arrays:
ap contains the packed upper or lower triangle of A.
The dimension of ap must be at least max(1, n*(n+1)/2).
bp contains the packed Cholesky factor of B (as returned by ?pptrf with
the same uplo value).
The dimension of bp must be at least max(1, n*(n+1)/2).
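As a hedged illustration of the packed array sizes above, the following plain-C sketch computes the required length and the position of element A(i, j) inside a packed triangle. It assumes the conventional LAPACK column-major packed layout with zero-based indices; the helper names packed_len and packed_index are hypothetical and are not oneMKL functions.

```c
#include <assert.h>
#include <stddef.h>

/* Minimum length of ap or bp: max(1, n*(n+1)/2). */
size_t packed_len(size_t n)
{
    size_t len = n * (n + 1) / 2;
    return len > 0 ? len : 1;
}

/* Zero-based position of A(i, j) in a column-major packed triangle.
 * For uplo = 'U' the caller must pass i <= j; for 'L', i >= j. */
size_t packed_index(char uplo, size_t n, size_t i, size_t j)
{
    assert(i < n && j < n);
    if (uplo == 'U') {
        assert(i <= j);
        return i + j * (j + 1) / 2;       /* columns of length 1, 2, ..., n */
    }
    assert(uplo == 'L' && i >= j);
    return i + j * (2 * n - j - 1) / 2;   /* columns of length n, n-1, ..., 1 */
}
```

For n = 3, both triangles occupy 6 elements, and the last diagonal element A(2, 2) sits at position 5 in either layout.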
Output Parameters
Return Values
This function returns a value info.
Application Notes
Forming the reduced matrix C is a stable procedure. However, it involves implicit multiplication by inv(B) (if
itype = 1) or B (if itype = 2 or 3). When the routine is used as a step in the computation of eigenvalues
and eigenvectors of the original problem, there may be a significant loss of accuracy if B is ill-conditioned
with respect to inversion.
The approximate number of floating-point operations is n^3.
?hpgst
Reduces a generalized eigenvalue problem with a
Hermitian matrix to a standard eigenvalue problem
using packed storage.
Syntax
lapack_int LAPACKE_chpgst (int matrix_layout, lapack_int itype, char uplo, lapack_int
n, lapack_complex_float* ap, const lapack_complex_float* bp);
lapack_int LAPACKE_zhpgst (int matrix_layout, lapack_int itype, char uplo, lapack_int
n, lapack_complex_double* ap, const lapack_complex_double* bp);
Include Files
• mkl.h
Description
Input Parameters
itype Must be 1 or 2 or 3.
If itype = 1, the generalized eigenproblem is A*z = lambda*B*z
If uplo = 'U', ap stores the packed upper triangle of A; you must supply
B in the factored form B = U^H*U.
If uplo = 'L', ap stores the packed lower triangle of A; you must supply B
in the factored form B = L*L^H.
ap, bp Arrays:
ap contains the packed upper or lower triangle of A.
The dimension of ap must be at least max(1, n*(n+1)/2).
bp contains the packed Cholesky factor of B (as returned by ?pptrf with
the same uplo value).
The dimension of bp must be at least max(1, n*(n+1)/2).
Output Parameters
Return Values
This function returns a value info.
Application Notes
Forming the reduced matrix C is a stable procedure. However, it involves implicit multiplication by inv(B) (if
itype = 1) or B (if itype = 2 or 3). When the routine is used as a step in the computation of eigenvalues
and eigenvectors of the original problem, there may be a significant loss of accuracy if B is ill-conditioned
with respect to inversion.
The approximate number of floating-point operations is n^3.
?sbgst
Reduces a real symmetric-definite generalized
eigenproblem for banded matrices to the standard
form using the factorization performed by ?pbstf.
Syntax
lapack_int LAPACKE_ssbgst (int matrix_layout, char vect, char uplo, lapack_int n,
lapack_int ka, lapack_int kb, float* ab, lapack_int ldab, const float* bb, lapack_int
ldbb, float* x, lapack_int ldx);
lapack_int LAPACKE_dsbgst (int matrix_layout, char vect, char uplo, lapack_int n,
lapack_int ka, lapack_int kb, double* ab, lapack_int ldab, const double* bb, lapack_int
ldbb, double* x, lapack_int ldx);
Include Files
• mkl.h
Description
To reduce the real symmetric-definite generalized eigenproblem A*z = λ*B*z to the standard form C*y = λ*y,
where A, B and C are banded, this routine must be preceded by a call to ?pbstf, which computes the split
Cholesky factorization of the positive-definite matrix B: B = S^T*S. The split Cholesky factorization, compared
with the ordinary Cholesky factorization, allows the work to be approximately halved.
This routine overwrites A with C = X^T*A*X, where X = inv(S)*Q and Q is an orthogonal matrix chosen
(implicitly) to preserve the bandwidth of A. The routine also has an option to allow the accumulation of X,
and then, if z is an eigenvector of C, X*z is an eigenvector of the original system.
Input Parameters
ab, bb ab (size at least max(1, ldab*n) for column major layout and at least
max(1, ldab*(ka + 1)) for row major layout) is an array containing either
upper or lower triangular part of the symmetric matrix A (as specified by
uplo) in band storage format.
bb (size at least max(1, ldbb*n) for column major layout and at least
max(1, ldbb*(kb + 1)) for row major layout) is an array containing the
banded split Cholesky factor of B as specified by uplo, n and kb and
returned by ?pbstf.
ldab The leading dimension of the array ab; must be at least ka+1 for column
major layout and max(1, n) for row major layout.
ldbb The leading dimension of the array bb; must be at least kb+1 for column
major layout and max(1, n) for row major layout.
ldx The leading dimension of the output array x. Constraints: if vect = 'N',
then ldx≥ 1;
Output Parameters
x Array.
If vect = 'V', then x (size at least max(1, ldx*n)) contains the n-by-n
matrix X = inv(S)*Q.
Return Values
This function returns a value info.
Application Notes
Forming the reduced matrix C involves implicit multiplication by inv(B). When the routine is used as a step
in the computation of eigenvalues and eigenvectors of the original problem, there may be a significant loss of
accuracy if B is ill-conditioned with respect to inversion.
If ka and kb are much less than n then the total number of floating-point operations is approximately
6n^2*kb, when vect = 'N'. Additional (3/2)n^3*(kb/ka) operations are required when vect = 'V'.
?hbgst
Reduces a complex Hermitian positive-definite
generalized eigenproblem for banded matrices to the
standard form using the factorization performed
by ?pbstf.
Syntax
lapack_int LAPACKE_chbgst (int matrix_layout, char vect, char uplo, lapack_int n,
lapack_int ka, lapack_int kb, lapack_complex_float* ab, lapack_int ldab, const
lapack_complex_float* bb, lapack_int ldbb, lapack_complex_float* x, lapack_int ldx);
lapack_int LAPACKE_zhbgst (int matrix_layout, char vect, char uplo, lapack_int n,
lapack_int ka, lapack_int kb, lapack_complex_double* ab, lapack_int ldab, const
lapack_complex_double* bb, lapack_int ldbb, lapack_complex_double* x, lapack_int ldx);
Include Files
• mkl.h
Description
To reduce the complex Hermitian positive-definite generalized eigenproblem A*z = λ*B*z to the standard
form C*y = λ*y, where A, B and C are banded, this routine must be preceded by a call to ?pbstf, which
computes the split Cholesky factorization of the positive-definite matrix B: B = S^H*S. The split Cholesky
factorization, compared with the ordinary Cholesky factorization, allows the work to be approximately halved.
This routine overwrites A with C = X^H*A*X, where X = inv(S)*Q, and Q is a unitary matrix chosen
(implicitly) to preserve the bandwidth of A. The routine also has an option to allow the accumulation of X,
and then, if z is an eigenvector of C, X*z is an eigenvector of the original system.
Input Parameters
ab, bb ab (size at least max(1, ldab*n) for column major layout and at least
max(1, ldab*(ka + 1)) for row major layout) is an array containing either
upper or lower triangular part of the Hermitian matrix A (as specified by
uplo) in band storage format.
bb (size at least max(1, ldbb*n) for column major layout and at least
max(1, ldbb*(kb + 1)) for row major layout) is an array containing the
banded split Cholesky factor of B as specified by uplo, n and kb and
returned by ?pbstf.
ldab The leading dimension of the array ab; must be at least ka+1 for column
major layout and max(1, n) for row major layout.
ldbb The leading dimension of the array bb; must be at least kb+1 for column
major layout and max(1, n) for row major layout.
Output Parameters
x Array.
If vect = 'V', then x (size at least max(1, ldx*n)) contains the n-by-n
matrix X = inv(S)*Q.
Return Values
This function returns a value info.
Application Notes
Forming the reduced matrix C involves implicit multiplication by inv(B). When the routine is used as a step
in the computation of eigenvalues and eigenvectors of the original problem, there may be a significant loss of
accuracy if B is ill-conditioned with respect to inversion. The total number of floating-point operations is
approximately 20n^2*kb, when vect = 'N'. Additional 5n^3*(kb/ka) operations are required when vect =
'V'. All these estimates assume that both ka and kb are much less than n.
?pbstf
Computes a split Cholesky factorization of a real
symmetric or complex Hermitian positive-definite
banded matrix used in ?sbgst/?hbgst.
Syntax
lapack_int LAPACKE_spbstf (int matrix_layout, char uplo, lapack_int n, lapack_int kb,
float* bb, lapack_int ldbb);
lapack_int LAPACKE_dpbstf (int matrix_layout, char uplo, lapack_int n, lapack_int kb,
double* bb, lapack_int ldbb);
lapack_int LAPACKE_cpbstf (int matrix_layout, char uplo, lapack_int n, lapack_int kb,
lapack_complex_float* bb, lapack_int ldbb);
lapack_int LAPACKE_zpbstf (int matrix_layout, char uplo, lapack_int n, lapack_int kb,
lapack_complex_double* bb, lapack_int ldbb);
Include Files
• mkl.h
Description
The routine computes a split Cholesky factorization of a real symmetric or complex Hermitian
positive-definite band matrix B. It is to be used in conjunction with ?sbgst/?hbgst.
The factorization has the form B = S^T*S (or B = S^H*S for complex flavors), where S is a band matrix of the
same bandwidth as B and the following structure: S is upper triangular in the first (n+kb)/2 rows and lower
triangular in the remaining rows.
Input Parameters
bb bb (size at least max(1, ldbb*n) for column major layout and at least
max(1, ldbb*(kb + 1)) for row major layout) is an array containing either
upper or lower triangular part of the matrix B (as specified by uplo) in band
storage format.
ldbb The leading dimension of bb; must be at least kb+1 for column major and at
least max(1, n) for row major.
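The band storage format and the leading-dimension constraint above can be illustrated with a small plain-C sketch. It assumes the conventional LAPACK band layout for column major storage with zero-based indices, where the diagonal of an upper triangle is held in row kb of the band array and the diagonal of a lower triangle in row 0. The helper name band_index is hypothetical and is not a oneMKL function.

```c
#include <stddef.h>

/* Hypothetical helper: zero-based position of B(i, j) inside a
 * column-major band array with kd off-diagonals and leading
 * dimension ldab. Returns -1 if the element lies outside the stored
 * triangle or band, or if ldab is smaller than kd + 1. */
ptrdiff_t band_index(char uplo, size_t kd, size_t ldab, size_t i, size_t j)
{
    if (ldab < kd + 1)
        return -1;
    if (uplo == 'U') {
        if (i > j || j - i > kd)
            return -1;
        return (ptrdiff_t)((kd + i - j) + j * ldab);
    }
    if (uplo == 'L') {
        if (i < j || i - j > kd)
            return -1;
        return (ptrdiff_t)((i - j) + j * ldab);
    }
    return -1;
}
```

With kb = 1 and ldbb = 2, the upper-triangle diagonal elements B(0, 0) and B(1, 1) map to positions 1 and 3, while B(0, 2) lies outside the band and yields -1.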
Output Parameters
Return Values
This function returns a value info.
If info = i, then the factorization could not be completed, because the updated element b_ii would be the
square root of a negative number; hence the matrix B is not positive-definite.
If info = -i, the i-th parameter had an illegal value.
Application Notes
The computed factor S is the exact factor of a perturbed matrix B + E, where
?gehrd
Reduces a general matrix to upper Hessenberg form.
Syntax
lapack_int LAPACKE_sgehrd (int matrix_layout, lapack_int n, lapack_int ilo, lapack_int
ihi, float* a, lapack_int lda, float* tau);
lapack_int LAPACKE_dgehrd (int matrix_layout, lapack_int n, lapack_int ilo, lapack_int
ihi, double* a, lapack_int lda, double* tau);
lapack_int LAPACKE_cgehrd (int matrix_layout, lapack_int n, lapack_int ilo, lapack_int
ihi, lapack_complex_float* a, lapack_int lda, lapack_complex_float* tau);
lapack_int LAPACKE_zgehrd (int matrix_layout, lapack_int n, lapack_int ilo, lapack_int
ihi, lapack_complex_double* a, lapack_int lda, lapack_complex_double* tau);
Include Files
• mkl.h
Description
The routine reduces a general matrix A to upper Hessenberg form H by an orthogonal or unitary similarity
transformation A = Q*H*Q^H. Here H has real subdiagonal elements.
The routine does not form the matrix Q explicitly. Instead, Q is represented as a product of elementary
reflectors. Routines are provided to work with Q in this representation.
Input Parameters
ilo, ihi If A is output by ?gebal, then ilo and ihi must contain the values
returned by that routine. Otherwise ilo = 1 and ihi = n. (If n > 0, then
1 ≤ ilo ≤ ihi ≤ n; if n = 0, ilo = 1 and ihi = 0.)
a Arrays:
Output Parameters
a The elements on and above the subdiagonal contain the upper Hessenberg
matrix H. The subdiagonal elements of H are real. The elements below the
subdiagonal, with the array tau, represent the orthogonal matrix Q as a
product of n elementary reflectors.
Return Values
This function returns a value info.
Application Notes
The computed Hessenberg matrix H is exactly similar to a nearby matrix A + E, where
||E||_2 < c(n)*ε*||A||_2, c(n) is a modestly increasing function of n, and ε is the machine precision.
The approximate number of floating-point operations for real flavors is
(2/3)*(ihi - ilo)^2*(2ihi + 2ilo + 3n); for complex flavors it is 4 times greater.
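As a quick worked check of the operation count above, the following plain-C sketch evaluates the estimate for given n, ilo, and ihi. The function names are hypothetical and serve only to illustrate the arithmetic. For an unbalanced matrix (ilo = 1, ihi = n) the expression reduces to (2/3)*(n-1)^2*(5n+2), which is roughly (10/3)*n^3 for large n.

```c
#include <assert.h>

/* Hypothetical helper: estimated flop count for the real flavors,
 * (2/3)*(ihi - ilo)^2*(2*ihi + 2*ilo + 3*n). */
double gehrd_flops_real(long n, long ilo, long ihi)
{
    assert(n >= 0 && ilo <= ihi + 1);
    double w = (double)(ihi - ilo);
    return (2.0 / 3.0) * w * w * (2.0 * ihi + 2.0 * ilo + 3.0 * n);
}

/* The estimate for the complex flavors is 4 times greater. */
double gehrd_flops_complex(long n, long ilo, long ihi)
{
    return 4.0 * gehrd_flops_real(n, ilo, ihi);
}
```

For n = 100 with ilo = 1 and ihi = 100, the real estimate is (2/3)*99^2*502 = 3,280,068 operations.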
?orghr
Generates the real orthogonal matrix Q determined
by ?gehrd.
Syntax
lapack_int LAPACKE_sorghr (int matrix_layout, lapack_int n, lapack_int ilo, lapack_int
ihi, float* a, lapack_int lda, const float* tau);
lapack_int LAPACKE_dorghr (int matrix_layout, lapack_int n, lapack_int ilo, lapack_int
ihi, double* a, lapack_int lda, const double* tau);
Include Files
• mkl.h
Description
The routine explicitly generates the orthogonal matrix Q that has been determined by a preceding call to
sgehrd/dgehrd. (The routine ?gehrd reduces a real general matrix A to upper Hessenberg form H by an
orthogonal similarity transformation, A = Q*H*Q^T, and represents the matrix Q as a product of ihi-ilo
elementary reflectors. Here ilo and ihi are values determined by sgebal/dgebal when balancing the
matrix; if the matrix has not been balanced, ilo = 1 and ihi = n.)
Input Parameters
ilo, ihi These must be the same parameters ilo and ihi, respectively, as supplied
to ?gehrd. (If n > 0, then 1 ≤ ilo ≤ ihi ≤ n; if n = 0, ilo = 1 and ihi =
0.)
a, tau Arrays: a (size max(1, lda*n)) contains details of the vectors which define
the elementary reflectors, as returned by ?gehrd.
Output Parameters
Return Values
This function returns a value info.
Application Notes
The computed matrix Q differs from the exact result by a matrix E such that ||E||_2 = O(ε), where ε is the
machine precision.
The approximate number of floating-point operations is (4/3)*(ihi-ilo)^3.
?ormhr
Multiplies an arbitrary real matrix C by the real
orthogonal matrix Q determined by ?gehrd.
Syntax
lapack_int LAPACKE_sormhr (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int ilo, lapack_int ihi, const float* a, lapack_int lda, const
float* tau, float* c, lapack_int ldc);
lapack_int LAPACKE_dormhr (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int ilo, lapack_int ihi, const double* a, lapack_int lda, const
double* tau, double* c, lapack_int ldc);
Include Files
• mkl.h
Description
The routine multiplies a matrix C by the orthogonal matrix Q that has been determined by a preceding call to
sgehrd/dgehrd. (The routine ?gehrd reduces a real general matrix A to upper Hessenberg form H by an
orthogonal similarity transformation, A = Q*H*Q^T, and represents the matrix Q as a product of ihi-ilo
elementary reflectors. Here ilo and ihi are values determined by sgebal/dgebal when balancing the
matrix; if the matrix has not been balanced, ilo = 1 and ihi = n.)
With ?ormhr, you can form one of the matrix products Q*C, Q^T*C, C*Q, or C*Q^T, overwriting the result on C
(which may be any real rectangular matrix).
A common application of ?ormhr is to transform a matrix V of eigenvectors of H to the matrix QV of
eigenvectors of A.
Input Parameters
n The number of columns in C (n≥ 0).
ilo, ihi These must be the same parameters ilo and ihi, respectively, as supplied
to ?gehrd.
a, tau, c Arrays:
a (size max(1, lda*n) for side = 'R' and size max(1, lda*m) for side = 'L')
contains details of the vectors which define the elementary reflectors, as
returned by ?gehrd.
The dimension of tau must be at least max(1, m-1) if side = 'L' and at
least max(1, n-1) if side = 'R'.
c (size max(1, ldc*n) for column major layout and max(1, ldc*m) for row
major layout) contains the m-by-n matrix C.
lda The leading dimension of a; at least max(1, m) if side = 'L' and at least
max(1, n) if side = 'R'.
ldc The leading dimension of c; at least max(1, m) for column major layout and
at least max(1, n) for row major layout.
Output Parameters
Return Values
This function returns a value info.
Application Notes
The computed matrix Q differs from the exact result by a matrix E such that ||E||_2 = O(ε)*||C||_2, where
ε is the machine precision.
The approximate number of floating-point operations is
2n*(ihi-ilo)^2 if side = 'L';
2m*(ihi-ilo)^2 if side = 'R'.
The complex counterpart of this routine is ?unmhr.
?unghr
Generates the complex unitary matrix Q determined
by ?gehrd.
Syntax
lapack_int LAPACKE_cunghr (int matrix_layout, lapack_int n, lapack_int ilo, lapack_int
ihi, lapack_complex_float* a, lapack_int lda, const lapack_complex_float* tau);
lapack_int LAPACKE_zunghr (int matrix_layout, lapack_int n, lapack_int ilo, lapack_int
ihi, lapack_complex_double* a, lapack_int lda, const lapack_complex_double* tau);
Include Files
• mkl.h
Description
The routine is intended to be used following a call to cgehrd/zgehrd, which reduces a complex matrix A to
upper Hessenberg form H by a unitary similarity transformation: A = Q*H*Q^H. ?gehrd represents the matrix
Q as a product of ihi-ilo elementary reflectors. Here ilo and ihi are values determined by cgebal/zgebal
when balancing the matrix; if the matrix has not been balanced, ilo = 1 and ihi = n.
Use the routine ?unghr to generate Q explicitly as a square matrix. The matrix Q has the structure:
Input Parameters
ilo, ihi These must be the same parameters ilo and ihi, respectively, as supplied
to ?gehrd. (If n > 0, then 1 ≤ ilo ≤ ihi ≤ n. If n = 0, then ilo = 1 and
ihi = 0.)
a, tau Arrays:
a (size max(1, lda*n)) contains details of the vectors which define the
elementary reflectors, as returned by ?gehrd.
tau contains further details of the elementary reflectors, as returned
by ?gehrd.
Output Parameters
Return Values
This function returns a value info.
Application Notes
The computed matrix Q differs from the exact result by a matrix E such that ||E||_2 = O(ε), where ε is the
machine precision.
The approximate number of real floating-point operations is (16/3)*(ihi-ilo)^3.
?unmhr
Multiplies an arbitrary complex matrix C by the
complex unitary matrix Q determined by ?gehrd.
Syntax
lapack_int LAPACKE_cunmhr (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int ilo, lapack_int ihi, const lapack_complex_float* a, lapack_int
lda, const lapack_complex_float* tau, lapack_complex_float* c, lapack_int ldc);
lapack_int LAPACKE_zunmhr (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int ilo, lapack_int ihi, const lapack_complex_double* a,
lapack_int lda, const lapack_complex_double* tau, lapack_complex_double* c, lapack_int
ldc);
Include Files
• mkl.h
Description
The routine multiplies a matrix C by the unitary matrix Q that has been determined by a preceding call to
cgehrd/zgehrd. (The routine ?gehrd reduces a complex general matrix A to upper Hessenberg form H by a
unitary similarity transformation, A = Q*H*Q^H, and represents the matrix Q as a product of ihi-ilo
elementary reflectors. Here ilo and ihi are values determined by cgebal/zgebal when balancing the
matrix; if the matrix has not been balanced, ilo = 1 and ihi = n.)
With ?unmhr, you can form one of the matrix products Q*C, Q^H*C, C*Q, or C*Q^H, overwriting the result on C
(which may be any complex rectangular matrix). A common application of this routine is to transform a
matrix V of eigenvectors of H to the matrix Q*V of eigenvectors of A.
Input Parameters
ilo, ihi These must be the same parameters ilo and ihi, respectively, as supplied
to ?gehrd.
a, tau, c Arrays:
a(size max(1,lda*n) for side='R' and size max(1,lda*m) for side='L')
contains details of the vectors which define the elementary reflectors, as
returned by ?gehrd.
c(size max(1, ldc*n) for column major layout and max(1, ldc*m) for row
major layout) contains the m-by-n matrix C.
lda The leading dimension of a; at least max(1, m) if side = 'L' and at least
max (1, n) if side = 'R'.
ldc The leading dimension of c; at least max(1, m) for column major layout and
at least max(1, n) for row major layout.
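The size and leading-dimension rules for c can be illustrated with a small indexing helper. This is only a sketch: the enum sketch_layout and the sketch_* function names are hypothetical stand-ins for the matrix_layout argument and are not part of the library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the matrix_layout argument (illustration only). */
typedef enum { SKETCH_ROW_MAJOR, SKETCH_COL_MAJOR } sketch_layout;

static size_t max_sz(size_t a, size_t b) { return a > b ? a : b; }

/* Minimum leading dimension of an m-by-n matrix c. */
size_t sketch_min_ldc(sketch_layout layout, size_t m, size_t n)
{
    return layout == SKETCH_COL_MAJOR ? max_sz(1, m) : max_sz(1, n);
}

/* Number of elements the array c must hold for leading dimension ldc. */
size_t sketch_c_size(sketch_layout layout, size_t m, size_t n, size_t ldc)
{
    assert(ldc >= sketch_min_ldc(layout, m, n));
    return layout == SKETCH_COL_MAJOR ? max_sz(1, ldc * n) : max_sz(1, ldc * m);
}

/* Offset of element C(i,j) in the array c, with zero-based i and j. */
size_t sketch_c_index(sketch_layout layout, size_t i, size_t j, size_t ldc)
{
    return layout == SKETCH_COL_MAJOR ? i + j * ldc : i * ldc + j;
}
```

For a 3-by-5 matrix, the smallest admissible ldc is 3 in column major layout and 5 in row major layout; a larger ldc simply leaves unused padding between columns or rows.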
Output Parameters
Return Values
This function returns a value info.
Application Notes
The computed product differs from the exact product by a matrix E such that ||E||2 = O(ε)*||C||2, where
ε is the machine precision.
The approximate number of floating-point operations is
8*n*(ihi-ilo)^2 if side = 'L';
8*m*(ihi-ilo)^2 if side = 'R'.
The real counterpart of this routine is ormhr.
?gebal
Balances a general matrix to improve the accuracy of
computed eigenvalues and eigenvectors.
Syntax
lapack_int LAPACKE_sgebal( int matrix_layout, char job, lapack_int n, float* a,
lapack_int lda, lapack_int* ilo, lapack_int* ihi, float* scale );
lapack_int LAPACKE_dgebal( int matrix_layout, char job, lapack_int n, double* a,
lapack_int lda, lapack_int* ilo, lapack_int* ihi, double* scale );
lapack_int LAPACKE_cgebal( int matrix_layout, char job, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_int* ilo, lapack_int* ihi, float*
scale );
lapack_int LAPACKE_zgebal( int matrix_layout, char job, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_int* ilo, lapack_int* ihi, double*
scale );
Include Files
• mkl.h
Description
The routine balances a matrix A by performing either or both of the following two similarity transformations:
(1) The routine first attempts to permute A to block upper triangular form:
A' = P*A*P^T = [A'11 A'12 A'13; 0 A'22 A'23; 0 0 A'33] (block rows separated by semicolons),
where P is a permutation matrix, and A'11 and A'33 are upper triangular. The diagonal elements of A'11 and
A'33 are eigenvalues of A. The rest of the eigenvalues of A are the eigenvalues of the central diagonal block
A'22, in rows and columns ilo to ihi. Subsequent operations to compute the eigenvalues of A (or its Schur
factorization) need only be applied to these rows and columns; this can save a significant amount of work if
ilo > 1 and ihi < n.
If no suitable permutation exists (as is often the case), the routine sets ilo = 1 and ihi = n, and A'22 is
the whole of A.
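(2) The routine applies a diagonal similarity transformation to A', to make the rows and columns of A'22 as
close in norm as possible:
A'' = D*A'*inv(D), where D = diag(I, D22, I) is a diagonal scaling matrix that acts only on rows and columns ilo to ihi.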
This scaling can reduce the norm of the matrix (that is, ||A''22|| < ||A'22||), and hence reduce the
effect of rounding errors on the accuracy of computed eigenvalues and eigenvectors.
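The effect of the diagonal similarity transformation in step (2) can be seen on a tiny example. The sketch below is not how ?gebal chooses its scaling; the diagonal d is supplied by hand, the matrix is stored row major without padding, and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Overwrite the n-by-n row-major matrix a with D*a*inv(D), where D = diag(d). */
void sketch_diag_similarity(size_t n, double *a, const double *d)
{
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j) {
            assert(d[j] != 0.0);
            a[i * n + j] *= d[i] / d[j];
        }
}

/* 1-norm (largest absolute column sum) of an n-by-n row-major matrix. */
double sketch_one_norm(size_t n, const double *a)
{
    double best = 0.0;
    for (size_t j = 0; j < n; ++j) {
        double col = 0.0;
        for (size_t i = 0; i < n; ++i) {
            double v = a[i * n + j];
            col += v < 0.0 ? -v : v;
        }
        if (col > best)
            best = col;
    }
    return best;
}
```

With a = {1, 1024, 1/1024, 1} and d = {1, 1024}, the off-diagonal entries both become 1, so the rows and columns have equal norms and the 1-norm drops from 1025 to 2. The eigenvalues are unchanged because the transformation is a similarity.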
Input Parameters
If job = 'N', then A is neither permuted nor scaled (but ilo, ihi, and scale
get their values).
If job = 'P', then A is permuted but not scaled.
Output Parameters
ilo, ihi The values ilo and ihi such that on exit a(i,j) is zero if i > j and 1 ≤ j <
ilo or ihi < j ≤ n.
If job = 'N' or 'S', then ilo = 1 and ihi = n.
Return Values
This function returns a value info.
Application Notes
The errors are negligible, compared with those in subsequent computations.
If the matrix A is balanced by this routine, then any eigenvectors computed subsequently are eigenvectors of
the matrix A'' and hence you must call gebak to transform them back to eigenvectors of A.
If the Schur vectors of A are required, do not call this routine with job = 'S' or 'B', because then the
balancing transformation is not orthogonal (not unitary for complex flavors).
If you call this routine with job = 'P', then any Schur vectors computed subsequently are Schur vectors of
the matrix A'', and you need to call gebak (with side = 'R') to transform them back to Schur vectors of A.
?gebak
Transforms eigenvectors of a balanced matrix to those
of the original nonsymmetric matrix.
Syntax
lapack_int LAPACKE_sgebak( int matrix_layout, char job, char side, lapack_int n,
lapack_int ilo, lapack_int ihi, const float* scale, lapack_int m, float* v, lapack_int
ldv );
lapack_int LAPACKE_dgebak( int matrix_layout, char job, char side, lapack_int n,
lapack_int ilo, lapack_int ihi, const double* scale, lapack_int m, double* v,
lapack_int ldv );
lapack_int LAPACKE_cgebak( int matrix_layout, char job, char side, lapack_int n,
lapack_int ilo, lapack_int ihi, const float* scale, lapack_int m, lapack_complex_float*
v, lapack_int ldv );
lapack_int LAPACKE_zgebak( int matrix_layout, char job, char side, lapack_int n,
lapack_int ilo, lapack_int ihi, const double* scale, lapack_int m,
lapack_complex_double* v, lapack_int ldv );
Include Files
• mkl.h
Description
The routine is intended to be used after a matrix A has been balanced by a call to ?gebal, and eigenvectors
of the balanced matrix A''22 have subsequently been computed. For a description of balancing, see gebal. The
balanced matrix A'' is obtained as A'' = D*P*A*P^T*inv(D), where P is a permutation matrix and D is a
diagonal scaling matrix. This routine transforms the eigenvectors as follows:
if x is a right eigenvector of A'', then P^T*inv(D)*x is a right eigenvector of A; if y is a left eigenvector of A'',
then P^T*D*y is a left eigenvector of A.
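The right-eigenvector formula above can be checked by hand on a small case. The following sketch applies P^T*inv(D) to a vector; it does not reproduce how ?gebak decodes the scale array. The encodings used here (d as the diagonal of D, perm with (P*v)[i] = v[perm[i]]) and the function name are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Map a right eigenvector x2 of A'' = D*P*A*P^T*inv(D) to x = P^T*inv(D)*x2.
   d holds the diagonal of D; perm encodes P through (P*v)[i] = v[perm[i]].
   Both encodings are assumptions of this sketch. */
void sketch_back_transform_right(size_t n, const double *d, const size_t *perm,
                                 const double *x2, double *x)
{
    for (size_t i = 0; i < n; ++i) {
        assert(d[i] != 0.0);
        x[perm[i]] = x2[i] / d[i];   /* inv(D) first, then P^T */
    }
}
```

For example, take A = [2 0; 1 3], perm = {1, 0} and d = {1, 4}, which gives A'' = [3 0.25; 0 2]. The vector x2 = (-0.25, 1) satisfies A''*x2 = 2*x2, and the helper returns x = (0.25, -0.25), which satisfies A*x = 2*x.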
Input Parameters
job Must be 'N' or 'P' or 'S' or 'B'. The same parameter job as supplied
to ?gebal.
ilo, ihi The values ilo and ihi, as returned by ?gebal. (If n > 0, then 1 ≤ ilo ≤ ihi ≤ n;
if n = 0, then ilo = 1 and ihi = 0.)
v Arrays:
v(size max(1, ldv*n) for column major layout and max(1, ldv*m) for row
major layout) contains the matrix of left or right eigenvectors to be
transformed.
ldv The leading dimension of v; at least max(1, n) for column major layout and
at least max(1, m) for row major layout.
Output Parameters
Return Values
This function returns a value info.
If info = -i, the i-th parameter had an illegal value.
Application Notes
The errors in this routine are negligible.
The number of floating-point operations is approximately proportional to m*n.
?hseqr
Computes all eigenvalues and (optionally) the Schur
factorization of a matrix reduced to Hessenberg form.
Syntax
lapack_int LAPACKE_shseqr( int matrix_layout, char job, char compz, lapack_int n,
lapack_int ilo, lapack_int ihi, float* h, lapack_int ldh, float* wr, float* wi, float*
z, lapack_int ldz );
lapack_int LAPACKE_dhseqr( int matrix_layout, char job, char compz, lapack_int n,
lapack_int ilo, lapack_int ihi, double* h, lapack_int ldh, double* wr, double* wi,
double* z, lapack_int ldz );
lapack_int LAPACKE_chseqr( int matrix_layout, char job, char compz, lapack_int n,
lapack_int ilo, lapack_int ihi, lapack_complex_float* h, lapack_int ldh,
lapack_complex_float* w, lapack_complex_float* z, lapack_int ldz );
lapack_int LAPACKE_zhseqr( int matrix_layout, char job, char compz, lapack_int n,
lapack_int ilo, lapack_int ihi, lapack_complex_double* h, lapack_int ldh,
lapack_complex_double* w, lapack_complex_double* z, lapack_int ldz );
Include Files
• mkl.h
Description
The routine computes all the eigenvalues, and optionally the Schur factorization, of an upper Hessenberg
matrix H: H = Z*T*Z^H, where T is an upper triangular (or, for real flavors, quasi-triangular) matrix (the
Schur form of H), and Z is the unitary or orthogonal matrix whose columns are the Schur vectors zi.
You can also use this routine to compute the Schur factorization of a general matrix A which has been
reduced to upper Hessenberg form H:
A = Q*H*Q^H, where Q is unitary (orthogonal for real flavors);
A = (Q*Z)*T*(Q*Z)^H.
In this case, after reducing A to Hessenberg form by ?gehrd, call orghr (unghr for complex flavors) to form Q
explicitly and then pass Q to ?hseqr with compz = 'V'.
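You can also call gebal to balance the original matrix before reducing it to Hessenberg form by ?gehrd, so
that the Hessenberg matrix H will have the structure:
H = [H11 H12 H13; 0 H22 H23; 0 0 H33] (block rows separated by semicolons), where H11 and H33 are upper triangular.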
Input Parameters
If compz = 'N', then no Schur vectors are computed (and the array z is not
referenced).
If compz = 'I', then the Schur vectors of H are computed (and the array z
is initialized by the routine).
If compz = 'V', then the Schur vectors of A are computed (and the array z
must contain the matrix Q on entry).
ilo, ihi If A has been balanced by ?gebal, then ilo and ihi must contain the values
returned by ?gebal. Otherwise, ilo must be set to 1 and ihi to n.
h, z Arrays:
h (size max(1, ldh*n)) The n-by-n upper Hessenberg matrix H.
If compz = 'V', then z must contain the matrix Q from the reduction to
Hessenberg form.
If compz = 'I', then z need not be set.
Output Parameters
w Array, size at least max (1, n). Contains the computed eigenvalues, unless
info>0. The eigenvalues are stored in the same order as on the diagonal of
the Schur form T (if computed).
Return Values
This function returns a value info.
If info = i > 0, ?hseqr failed to compute all of the eigenvalues. Elements 1, 2, ..., ilo-1 and i+1, i+2, ..., n of
the eigenvalue arrays (wr and wi for real flavors and w for complex flavors) contain the real and imaginary
parts of those eigenvalues that have been successfully found.
If info > 0, and job = 'E', then on exit, the remaining unconverged eigenvalues are the eigenvalues of
the upper Hessenberg matrix rows and columns ilo through info of the final output value of H.
If info > 0, and job = 'S', then on exit (initial value of H)*U = U*(final value of H), where U is a unitary
matrix. The final value of H is upper Hessenberg and triangular in rows and columns info+1 through ihi.
If info > 0, and compz = 'V', then on exit (final value of Z) = (initial value of Z)*U, where U is the
unitary matrix (regardless of the value of job).
If info > 0, and compz = 'I', then on exit (final value of Z) = U, where U is the unitary matrix (regardless
of the value of job).
If info > 0, and compz = 'N', then Z is not accessed.
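The rule for which eigenvalue entries are usable after a partial failure translates into zero-based C indexing as sketched below. The helper name is hypothetical, and plain int is used in place of lapack_int to keep the example free of library headers.

```c
#include <assert.h>

/* Fill valid[k] (zero-based k) with 1 where the k-th eigenvalue entry is
   usable after a call that returned info, following the one-based rule
   "elements 1..ilo-1 and info+1..n". Returns the number of usable entries,
   or -1 for a negative info (illegal argument, nothing was computed). */
int sketch_mark_converged(int n, int ilo, int info, int *valid)
{
    int count = 0;
    assert(n >= 0);
    if (info < 0)
        return -1;
    for (int k = 0; k < n; ++k) {
        int one_based = k + 1;
        valid[k] = (info == 0) || one_based < ilo || one_based > info;
        count += valid[k];
    }
    return count;
}
```

With n = 6, ilo = 2 and info = 4, only the one-based elements 1, 5 and 6 (zero-based 0, 4 and 5) hold converged eigenvalues.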
Application Notes
The computed Schur factorization is the exact factorization of a nearby matrix H + E, where ||E||2 = O(ε)*||H||2,
and ε is the machine precision.
If λi is an exact eigenvalue, and μi is the corresponding computed value, then |λi - μi|≤c(n)*ε*||H||2/si,
where c(n) is a modestly increasing function of n, and si is the reciprocal condition number of λi. The
condition numbers si may be computed by calling trsna.
The total number of floating-point operations depends on how rapidly the algorithm converges; typical
numbers are as follows.
?hsein
Computes selected eigenvectors of an upper
Hessenberg matrix that correspond to specified
eigenvalues.
Syntax
lapack_int LAPACKE_shsein( int matrix_layout, char side, char eigsrc, char initv,
lapack_logical* select, lapack_int n, const float* h, lapack_int ldh, float* wr, const
float* wi, float* vl, lapack_int ldvl, float* vr, lapack_int ldvr, lapack_int mm,
lapack_int* m, lapack_int* ifaill, lapack_int* ifailr );
lapack_int LAPACKE_dhsein( int matrix_layout, char side, char eigsrc, char initv,
lapack_logical* select, lapack_int n, const double* h, lapack_int ldh, double* wr,
const double* wi, double* vl, lapack_int ldvl, double* vr, lapack_int ldvr, lapack_int
mm, lapack_int* m, lapack_int* ifaill, lapack_int* ifailr );
lapack_int LAPACKE_chsein( int matrix_layout, char side, char eigsrc, char initv, const
lapack_logical* select, lapack_int n, const lapack_complex_float* h, lapack_int ldh,
lapack_complex_float* w, lapack_complex_float* vl, lapack_int ldvl,
lapack_complex_float* vr, lapack_int ldvr, lapack_int mm, lapack_int* m, lapack_int*
ifaill, lapack_int* ifailr );
lapack_int LAPACKE_zhsein( int matrix_layout, char side, char eigsrc, char initv, const
lapack_logical* select, lapack_int n, const lapack_complex_double* h, lapack_int ldh,
lapack_complex_double* w, lapack_complex_double* vl, lapack_int ldvl,
lapack_complex_double* vr, lapack_int ldvr, lapack_int mm, lapack_int* m, lapack_int*
ifaill, lapack_int* ifailr );
Include Files
• mkl.h
Description
The routine computes left and/or right eigenvectors of an upper Hessenberg matrix H, corresponding to
selected eigenvalues.
The right eigenvector x and the left eigenvector y, corresponding to an eigenvalue λ, are defined by: H*x =
λ*x and y^H*H = λ*y^H (or H^H*y = conj(λ)*y). Here conj(λ) denotes the complex conjugate of λ.
The eigenvectors are computed by inverse iteration. They are scaled so that, for a real eigenvector x,
max|xi| = 1, and for a complex eigenvector, max(|Re(xi)| + |Im(xi)|) = 1.
If H has been formed by reduction of a general matrix A to upper Hessenberg form, then eigenvectors of H
may be transformed to eigenvectors of A by ormhr or unmhr.
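The scaling convention described above can be sketched as two small helpers. They only illustrate the normalization rule; they are not the library's implementation, the function names are hypothetical, and holding the complex vector as separate real and imaginary arrays is an assumption of this example.

```c
#include <assert.h>
#include <stddef.h>

static double abs_d(double v) { return v < 0.0 ? -v : v; }

/* Scale a real vector so that max|x[i]| = 1. A zero vector is left as is. */
void sketch_scale_real(size_t n, double *x)
{
    double big = 0.0;
    assert(x != NULL);
    for (size_t i = 0; i < n; ++i)
        if (abs_d(x[i]) > big)
            big = abs_d(x[i]);
    if (big == 0.0)
        return;
    for (size_t i = 0; i < n; ++i)
        x[i] /= big;
}

/* Scale a complex vector, held as separate real and imaginary parts,
   so that max(|re[i]| + |im[i]|) = 1. A zero vector is left as is. */
void sketch_scale_complex(size_t n, double *re, double *im)
{
    double big = 0.0;
    assert(re != NULL && im != NULL);
    for (size_t i = 0; i < n; ++i) {
        double s = abs_d(re[i]) + abs_d(im[i]);
        if (s > big)
            big = s;
    }
    if (big == 0.0)
        return;
    for (size_t i = 0; i < n; ++i) {
        re[i] /= big;
        im[i] /= big;
    }
}
```

Note that the complex rule uses |Re| + |Im| rather than the modulus, which avoids a square root per element.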
Input Parameters
If eigsrc = 'Q', then the eigenvalues of H were found using hseqr; thus if
H has any zero sub-diagonal elements (and so is block triangular), then the
j-th eigenvalue can be assumed to be an eigenvalue of the block containing
the j-th row/column. This property allows the routine to perform inverse
iteration on just one diagonal block. If eigsrc = 'N', then no such
assumption is made and the routine performs inverse iteration using the
whole matrix.
If initv = 'N', then no initial estimates for the selected eigenvectors are
supplied.
If initv = 'U', then initial estimates for the selected eigenvectors are
supplied in vl and/or vr.
select Array, size at least max (1, n). Specifies which eigenvectors are to be
computed.
For real flavors:
To obtain the real eigenvector corresponding to the real eigenvalue wr[j],
set select[j] to 1
h, vl, vr Arrays:
h (size max(1, ldh*n)) The n-by-n upper Hessenberg matrix H. If a NaN
value is detected in h, the routine returns with info = -6.
vl(size max(1, ldvl*mm) for column major layout and max(1, ldvl*n) for
row major layout)
If initv = 'U' and side = 'L' or 'B', then vl must contain starting
vectors for inverse iteration for the left eigenvectors. Each starting vector
must be stored in the same column or columns as will be used to store the
corresponding eigenvector.
If initv = 'N', then vl need not be set.
vr(size max(1, ldvr*mm) for column major layout and max(1, ldvr*n) for
row major layout)
If initv = 'U' and side = 'R' or 'B', then vr must contain starting
vectors for inverse iteration for the right eigenvectors. Each starting vector
must be stored in the same column or columns as will be used to store the
corresponding eigenvector.
If initv = 'N', then vr need not be set.
If side = 'R' or 'B', ldvr ≥ max(1, n) for column major layout and ldvr ≥
max(1, mm) for row major layout.
Output Parameters
vl, vr If side = 'L' or 'B', vl contains the computed left eigenvectors (as
specified by select).
If side = 'R' or 'B', vr contains the computed right eigenvectors (as
specified by select).
The eigenvectors treated column-wise form a rectangular n-by-mm matrix.
m For real flavors: the number of columns of vl and/or vr required to store the
selected eigenvectors.
For complex flavors: the number of selected eigenvectors.
Return Values
This function returns a value info.
If info = i > 0, then i eigenvectors (as indicated by the parameters ifaill and/or ifailr above) failed to converge.
The corresponding columns of vl and/or vr contain no useful information.
Application Notes
Each computed right eigenvector xi is the exact eigenvector of a nearby matrix A + Ei, such that ||Ei|| <
O(ε)*||A||. Hence the residual is small:
||A*xi - λi*xi|| = O(ε)*||A||.
However, eigenvectors corresponding to close or coincident eigenvalues may not accurately span the relevant
subspaces.
Similar remarks apply to computed left eigenvectors.
?trevc
Computes selected eigenvectors of an upper (quasi-)
triangular matrix computed by ?hseqr.
Syntax
lapack_int LAPACKE_strevc( int matrix_layout, char side, char howmny, lapack_logical*
select, lapack_int n, const float* t, lapack_int ldt, float* vl, lapack_int ldvl, float*
vr, lapack_int ldvr, lapack_int mm, lapack_int* m );
lapack_int LAPACKE_dtrevc( int matrix_layout, char side, char howmny, lapack_logical*
select, lapack_int n, const double* t, lapack_int ldt, double* vl, lapack_int ldvl,
double* vr, lapack_int ldvr, lapack_int mm, lapack_int* m );
lapack_int LAPACKE_ctrevc( int matrix_layout, char side, char howmny, const
lapack_logical* select, lapack_int n, lapack_complex_float* t, lapack_int ldt,
lapack_complex_float* vl, lapack_int ldvl, lapack_complex_float* vr, lapack_int ldvr,
lapack_int mm, lapack_int* m );
lapack_int LAPACKE_ztrevc( int matrix_layout, char side, char howmny, const
lapack_logical* select, lapack_int n, lapack_complex_double* t, lapack_int ldt,
lapack_complex_double* vl, lapack_int ldvl, lapack_complex_double* vr, lapack_int ldvr,
lapack_int mm, lapack_int* m );
Include Files
• mkl.h
Description
The routine computes some or all of the right and/or left eigenvectors of an upper triangular matrix T (or, for
real flavors, an upper quasi-triangular matrix T). Matrices of this type are produced by the Schur
factorization of a general matrix: A = Q*T*Q^H, as computed by hseqr.
The right eigenvector x and the left eigenvector y of T corresponding to an eigenvalue w are defined by:
T*x = w*x, y^H*T = w*y^H, where y^H denotes the conjugate transpose of y.
The eigenvalues are not input to this routine, but are read directly from the diagonal blocks of T.
This routine returns the matrices X and/or Y of right and left eigenvectors of T, or the products Q*X and/or
Q*Y, where Q is an input matrix.
If Q is the orthogonal/unitary factor that reduces a matrix A to Schur form T, then Q*X and Q*Y are the
matrices of right and left eigenvectors of A.
Input Parameters
If omega[j - 1] and omega[j] are the real and imaginary parts of a complex
eigenvalue, the corresponding complex eigenvector is computed if either
select[j - 1] or select[j] is 1, and on exit select[j - 1] is set to 1 and select[j]
is set to 0.
t, vl, vr Arrays:
t (size max(1, ldt*n)) contains the n-by-n matrix T in Schur canonical
form. For complex flavors ctrevc and ztrevc, contains the upper
triangular matrix T.
vl(size max(1, ldvl*mm) for column major layout and max(1, ldvl*n) for
row major layout)
If howmny = 'B' and side = 'L' or 'B', then vl must contain an n-by-n
matrix Q (usually the matrix of Schur vectors returned by ?hseqr).
vr(size max(1, ldvr*mm) for column major layout and max(1, ldvr*n) for
row major layout)
If howmny = 'B' and side = 'R' or 'B', then vr must contain an n-by-n
matrix Q (usually the matrix of Schur vectors returned by ?hseqr).
mm The number of columns in the arrays vl and/or vr. Must be at least m (the
precise number of columns required).
If howmny = 'A' or 'B', mm = n.
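For real flavors with selected eigenvectors, the number of columns m that mm must cover can be worked out in advance from T and select. The sketch below assumes standard LAPACK conventions that this section does not spell out: a nonzero subdiagonal entry T(j+1,j) marks a 2-by-2 diagonal block (a complex conjugate pair), a selected real eigenvector occupies one column, and a selected complex eigenvector occupies two. The function name is hypothetical, T is taken in column major layout, and plain int replaces lapack_int and lapack_logical.

```c
#include <assert.h>
#include <stddef.h>

/* For an n-by-n quasi-triangular T (column major, leading dimension ldt),
   normalize select the way the select description states for complex pairs
   and return the number of columns needed for the selected eigenvectors. */
int sketch_count_columns(int n, const double *t, int ldt, int *select)
{
    int m = 0;
    int j = 0;
    assert(ldt >= (n > 1 ? n : 1));
    while (j < n) {
        /* A nonzero T(j+1,j) means rows/columns j and j+1 form a 2-by-2 block. */
        int pair = (j + 1 < n) &&
                   t[(size_t)(j + 1) + (size_t)j * (size_t)ldt] != 0.0;
        if (pair) {
            if (select[j] || select[j + 1]) {
                select[j] = 1;       /* first flag of the pair is set   */
                select[j + 1] = 0;   /* second flag of the pair cleared */
                m += 2;              /* real and imaginary parts        */
            }
            j += 2;
        } else {
            if (select[j])
                m += 1;
            j += 1;
        }
    }
    return m;
}
```

Allocating vl and vr with mm equal to the returned value (or simply mm = n) satisfies the requirement that mm be at least m.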
Output Parameters
vl, vr If side = 'L' or 'B', vl contains the computed left eigenvectors (as
specified by howmny and select).
If side = 'R' or 'B', vr contains the computed right eigenvectors (as
specified by howmny and select).
The eigenvectors treated column-wise form a rectangular n-by-mm matrix.
Return Values
This function returns a value info.
Application Notes
If xi is an exact right eigenvector and yi is the corresponding computed eigenvector, then the angle θ(yi,
xi) between them is bounded as follows: θ(yi, xi) ≤ (c(n)*ε*||T||2)/sepi, where sepi is the reciprocal
condition number of xi. The condition number sepi may be computed by calling ?trsna.
?trevc3
Computes selected eigenvectors of an upper (quasi-)
triangular matrix computed by ?hseqr using Level 3
BLAS
Syntax
call strevc3(side, howmny, select, n, t, ldt, vl, ldvl, vr, ldvr, mm, m, work, lwork,
info)
call dtrevc3(side, howmny, select, n, t, ldt, vl, ldvl, vr, ldvr, mm, m, work, lwork,
info)
call ctrevc3(side, howmny, select, n, t, ldt, vl, ldvl, vr, ldvr, mm, m, work, lwork,
rwork, lrwork, info)
call ztrevc3(side, howmny, select, n, t, ldt, vl, ldvl, vr, ldvr, mm, m, work, lwork,
rwork, lrwork, info)
Include Files
• mkl.fi
Description
This routine computes some or all of the right and left eigenvectors of an upper triangular matrix T (or, for
real flavors, an upper quasi-triangular matrix T) using Level 3 BLAS. Matrices of this type are produced by
the Schur factorization of a general matrix: A = Q*T*Q^H, as computed by hseqr.
The right eigenvector x and the left eigenvector y of T corresponding to an eigenvalue w are defined by the
following:
T*x = w*x, y^H*T = w*y^H, where y^H denotes the conjugate transpose of y.
If Q is the orthogonal/unitary factor that reduces a matrix A to Schur form T, then Q*X and Q*Y are the
matrices of the right and left eigenvectors of A.
Input Parameters
side CHARACTER*1
Must be 'R', 'L', or 'B'.
howmny CHARACTER*1
Must be 'A', 'B', or 'S'.
• If select(j) is .TRUE., the eigenvector corresponding to the jth
eigenvalue is computed.
n INTEGER
The order of the matrix T (n≥ 0).
lwork INTEGER
The size of the work array. Must be at least max(1, 3*n) for real
flavors, and at least max(1, 2*n) for complex flavors.
ldt INTEGER
The leading dimension of t. It is at least max(1, n).
ldvl INTEGER
ldvr INTEGER
The leading dimension of vr.
mm INTEGER
The number of columns in one or both of the arrays vl and vr. Must
be at least m (the precise number of columns required).
Constraint: 0 ≤ mm ≤ n.
lrwork INTEGER
The size of the rwork array. It must be at least max(1, n).
Output Parameters
vl, vr If side = 'L' or 'B', vl contains the computed left eigenvectors (as
specified by howmny and select).
Treated column-wise, the eigenvectors form a rectangular n-by-mm
matrix.
m INTEGER
rwork(1) On exit, if info = 0, then rwork(1) returns the required optimal size
of lrwork.
info INTEGER
If info = 0, the execution is successful.
Application Notes
If xi is an exact right eigenvector and yi is the corresponding computed eigenvector, the angle θ(yi, xi)
between them is bounded as follows:
θ(yi, xi) ≤ (c(n)*ε*||T||2)/sepi
where sepi is the reciprocal condition number of xi. You can compute the condition number sepi by
calling ?trsna.
See Also
Matrix Storage Schemes
?trsna
Estimates condition numbers for specified eigenvalues
and right eigenvectors of an upper (quasi-) triangular
matrix.
Syntax
lapack_int LAPACKE_strsna( int matrix_layout, char job, char howmny, const
lapack_logical* select, lapack_int n, const float* t, lapack_int ldt, const float* vl,
lapack_int ldvl, const float* vr, lapack_int ldvr, float* s, float* sep, lapack_int mm,
lapack_int* m );
lapack_int LAPACKE_dtrsna( int matrix_layout, char job, char howmny, const
lapack_logical* select, lapack_int n, const double* t, lapack_int ldt, const double*
vl, lapack_int ldvl, const double* vr, lapack_int ldvr, double* s, double* sep,
lapack_int mm, lapack_int* m );
lapack_int LAPACKE_ctrsna( int matrix_layout, char job, char howmny, const
lapack_logical* select, lapack_int n, const lapack_complex_float* t, lapack_int ldt,
const lapack_complex_float* vl, lapack_int ldvl, const lapack_complex_float* vr,
lapack_int ldvr, float* s, float* sep, lapack_int mm, lapack_int* m );
lapack_int LAPACKE_ztrsna( int matrix_layout, char job, char howmny, const
lapack_logical* select, lapack_int n, const lapack_complex_double* t, lapack_int ldt,
const lapack_complex_double* vl, lapack_int ldvl, const lapack_complex_double* vr,
lapack_int ldvr, double* s, double* sep, lapack_int mm, lapack_int* m );
Include Files
• mkl.h
Description
The routine estimates condition numbers for specified eigenvalues and/or right eigenvectors of an upper
triangular matrix T (or, for real flavors, upper quasi-triangular matrix T in canonical Schur form). These are
the same as the condition numbers of the eigenvalues and right eigenvectors of an original matrix A =
Z*T*Z^H (with unitary or, for real flavors, orthogonal Z), from which T may have been derived.
The routine computes the reciprocal of the condition number of an eigenvalue λi as
si = |v^T*u|/(||u||E*||v||E) for real flavors and si = |v^H*u|/(||u||E*||v||E) for complex flavors,
where:
• u and v are the right and left eigenvectors of T, respectively, corresponding to λi.
• v^T/v^H denote transpose/conjugate transpose of v, respectively.
This reciprocal condition number always lies between zero (ill-conditioned) and one (well-conditioned).
An approximate error estimate for a computed eigenvalue λi is then given by ε*||T||/si, where ε is the
machine precision.
To estimate the reciprocal of the condition number of the right eigenvector corresponding to λi, the routine
first calls trexc to reorder the diagonal elements of matrix T so that λi is in the leading position:
The reciprocal condition number of the eigenvector is then estimated as sepi, the smallest singular value of
the matrix T22 - λi*I.
An approximate error estimate for a computed right eigenvector u corresponding to λi is then given by
ε*||T||/sepi.
Input Parameters
If job = 'E', then condition numbers for eigenvalues only are computed.
If job = 'V', then condition numbers for eigenvectors only are computed.
If howmny = 'A', then the condition numbers for all eigenpairs are
computed.
If howmny = 'S', then condition numbers for selected eigenpairs (as
specified by select) are computed.
select Array, size at least max (1, n) if howmny = 'S' and at least 1 otherwise.
t, vl, vr Arrays:
t (size max(1, ldt*n)) contains the n-by-n matrix T.
vl(size max(1, ldvl*mm) for column major layout and max(1, ldvl*n) for
row major layout)
If job = 'E' or 'B', then vl must contain the left eigenvectors of T (or of
any matrix Q*T*Q^H with Q unitary or orthogonal) corresponding to the
eigenpairs specified by howmny and select. The eigenvectors must be
stored in consecutive columns of vl, as returned by trevc or hsein.
The array vl is not referenced if job = 'V'.
vr(size max(1, ldvr*mm) for column major layout and max(1, ldvr*n) for
row major layout)
If job = 'E' or 'B', then vr must contain the right eigenvectors of T (or of
any matrix Q*T*Q^H with Q unitary or orthogonal) corresponding to the
eigenpairs specified by howmny and select. The eigenvectors must be
stored in consecutive columns of vr, as returned by trevc or hsein.
The array vr is not referenced if job = 'V'.
mm The number of elements in the arrays s and sep, and the number of
columns in vl and vr (if used). Must be at least m (the precise number
required).
If howmny = 'A', mm = n;
Output Parameters
s Array, size at least max(1, mm) if job = 'E' or 'B' and at least 1 if job =
'V'.
Contains the reciprocal condition numbers of the selected eigenvalues if job
= 'E' or 'B', stored in consecutive elements of the array. Thus s[j - 1],
sep[j - 1] and the j-th columns of vl and vr all correspond to the same
eigenpair (but not in general the j-th eigenpair unless all eigenpairs have
been selected).
For real flavors: for a complex conjugate pair of eigenvalues, two
consecutive elements of s are set to the same value. The array s is not
referenced if job = 'V'.
sep Array, size at least max(1, mm) if job = 'V' or 'B' and at least 1 if job =
'E'. Contains the estimated reciprocal condition numbers of the selected
right eigenvectors if job = 'V' or 'B', stored in consecutive elements of
the array.
For real flavors: for a complex eigenvector, two consecutive elements of sep
are set to the same value; if the eigenvalues cannot be reordered to
compute sep[j - 1], then sep[j - 1] is set to zero; this can only occur when
the true value would be very small anyway. The array sep is not referenced
if job = 'E'.
For real flavors: the number of elements of s and/or sep actually used to
store the estimated condition numbers.
If howmny = 'A', m is set to n.
Return Values
This function returns a value info.
Application Notes
The computed values sepi may overestimate the true value, but seldom by a factor of more than 3.
?trexc
Reorders the Schur factorization of a general matrix.
Syntax
lapack_int LAPACKE_strexc( int matrix_layout, char compq, lapack_int n, float* t,
lapack_int ldt, float* q, lapack_int ldq, lapack_int* ifst, lapack_int* ilst );
lapack_int LAPACKE_dtrexc( int matrix_layout, char compq, lapack_int n, double* t,
lapack_int ldt, double* q, lapack_int ldq, lapack_int* ifst, lapack_int* ilst );
lapack_int LAPACKE_ctrexc( int matrix_layout, char compq, lapack_int n,
lapack_complex_float* t, lapack_int ldt, lapack_complex_float* q, lapack_int ldq,
lapack_int ifst, lapack_int ilst );
lapack_int LAPACKE_ztrexc( int matrix_layout, char compq, lapack_int n,
lapack_complex_double* t, lapack_int ldt, lapack_complex_double* q, lapack_int ldq,
lapack_int ifst, lapack_int ilst );
Include Files
• mkl.h
Description
The routine reorders the Schur factorization of a general matrix A = Q*T*Q^H, so that the diagonal element or
block of T with row index ifst is moved to row ilst.
The reordered Schur form S is computed by a unitary (or, for real flavors, orthogonal) similarity
transformation: S = Z^H*T*Z. Optionally the updated matrix P of Schur vectors is computed as P = Q*Z,
giving A = P*S*P^H.
Input Parameters
t, q Arrays:
t (size max(1, ldt*n)) contains the n-by-n matrix T.
Output Parameters
Return Values
This function returns a value info.
Application Notes
The computed matrix S is exactly similar to a matrix T+E, where ||E||2 = O(ε)*||T||2, and ε is the
machine precision.
Note that if a 2 by 2 diagonal block is involved in the re-ordering, its off-diagonal elements are in general
changed; the diagonal elements and the eigenvalues of the block are unchanged unless the block is
sufficiently ill-conditioned, in which case they may be noticeably altered. It is possible for a 2 by 2 block to
break into two 1 by 1 blocks, that is, for a pair of complex eigenvalues to become purely real.
The approximate number of floating-point operations is
?trsen
Reorders the Schur factorization of a matrix and
(optionally) computes the reciprocal condition
numbers for the selected cluster of eigenvalues and
respective invariant subspace.
Syntax
lapack_int LAPACKE_strsen( int matrix_layout, char job, char compq, const
lapack_logical* select, lapack_int n, float* t, lapack_int ldt, float* q, lapack_int
ldq, float* wr, float* wi, lapack_int* m, float* s, float* sep );
lapack_int LAPACKE_dtrsen( int matrix_layout, char job, char compq, const
lapack_logical* select, lapack_int n, double* t, lapack_int ldt, double* q, lapack_int
ldq, double* wr, double* wi, lapack_int* m, double* s, double* sep );
lapack_int LAPACKE_ctrsen( int matrix_layout, char job, char compq, const
lapack_logical* select, lapack_int n, lapack_complex_float* t, lapack_int ldt,
lapack_complex_float* q, lapack_int ldq, lapack_complex_float* w, lapack_int* m, float*
s, float* sep );
lapack_int LAPACKE_ztrsen( int matrix_layout, char job, char compq, const
lapack_logical* select, lapack_int n, lapack_complex_double* t, lapack_int ldt,
lapack_complex_double* q, lapack_int ldq, lapack_complex_double* w, lapack_int* m,
double* s, double* sep );
Include Files
• mkl.h
Description
The routine reorders the Schur factorization of a general matrix A = Q*T*Q^T (for real flavors) or A = Q*T*Q^H
(for complex flavors) so that a selected cluster of eigenvalues appears in the leading diagonal elements (or,
for real flavors, diagonal blocks) of the Schur form. The reordered Schur form R is computed by a unitary
(orthogonal) similarity transformation: R = Z^H*T*Z. Optionally the updated matrix P of Schur vectors is
computed as P = Q*Z, giving A = P*R*P^H.
Let
R = [ T11  T12 ]
    [  0   T22 ]
where the selected eigenvalues are precisely the eigenvalues of the leading m-by-m submatrix T11. Let P be
correspondingly partitioned as (Q1 Q2), where Q1 consists of the first m columns of P. Then A*Q1 = Q1*T11,
and so the m columns of Q1 form an orthonormal basis for the invariant subspace corresponding to the
selected cluster of eigenvalues.
Optionally the routine also computes estimates of the reciprocal condition numbers of the average of the
cluster of eigenvalues and of the invariant subspace.
Input Parameters
If job = 'E', then only the condition number for the cluster of eigenvalues
is computed.
If job = 'V', then only the condition number for the invariant subspace is
computed.
If job = 'B', then condition numbers for both the cluster and the invariant
subspace are computed.
For real flavors: to select a complex conjugate pair of eigenvalues λ_j and λ_(j+1)
(corresponding to a 2-by-2 diagonal block), select[j - 1] and/or select[j] must
be 1; the complex conjugate pair λ_j and λ_(j+1) must be either both included in the
cluster or both excluded.
t, q Arrays:
t (size max(1, ldt*n)) The upper quasi-triangular n-by-n matrix T, in Schur
canonical form.
q (size max(1, ldq*n))
Output Parameters
q If compq = 'V', q contains the updated matrix of Schur vectors; the first
m columns of Q form an orthogonal basis for the specified invariant
subspace.
wr, wi Arrays, size at least max(1, n). Contain the real and imaginary parts,
respectively, of the reordered eigenvalues of R. The eigenvalues are stored
in the same order as on the diagonal of R. Note that if a complex
eigenvalue is sufficiently ill-conditioned, then its value may differ
significantly from its value before reordering.
For real flavors: if info = 1, then s is set to zero. s is not referenced if job
= 'N' or 'V'.
sep If job = 'V' or 'B', sep is the estimated reciprocal condition number of
the specified invariant subspace.
If m = 0 or n, then sep = ||T||.
Return Values
This function returns a value info.
If info = 1, the reordering of T failed because some eigenvalues are too close to separate (the problem is
very ill-conditioned); T may have been partially reordered, and wr and wi contain the eigenvalues in the
same order as in T; s and sep (if requested) are set to zero.
Application Notes
The computed matrix R is exactly similar to a matrix T+E, where ||E||_2 = O(ε)*||T||_2, and ε is the
machine precision. The computed s cannot underestimate the true reciprocal condition number by more than
a factor of (min(m, n-m))^(1/2); sep may differ from the true value by (m*n - m^2)^(1/2). The angle between the
computed invariant subspace and the true subspace is O(ε)*||A||_2/sep. Note that if a 2-by-2 diagonal
block is involved in the re-ordering, its off-diagonal elements are in general changed; the diagonal elements
and the eigenvalues of the block are unchanged unless the block is sufficiently ill-conditioned, in which case
they may be noticeably altered. It is possible for a 2-by-2 block to break into two 1-by-1 blocks, that is, for a
pair of complex eigenvalues to become purely real.
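The selection rule for real flavors (a complex conjugate pair is either wholly inside or wholly outside the cluster) fixes the dimension m of the invariant subspace. The following is a minimal sketch of that counting rule, not MKL code: the name cluster_dimension is hypothetical, plain int stands in for lapack_int and lapack_logical, and column-major storage is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of MKL: returns the number m of eigenvalues
 * selected from an n-by-n upper quasi-triangular matrix t (column-major,
 * leading dimension ldt). A nonzero subdiagonal entry t(j+1,j) marks a
 * 2-by-2 block, that is, a complex conjugate pair; the pair counts as two
 * eigenvalues if select[j] or select[j+1] is nonzero. */
int cluster_dimension(int n, const double *t, int ldt, const int *select)
{
    assert(n >= 0 && ldt >= (n > 1 ? n : 1));
    int m = 0;
    int j = 0;
    while (j < n) {
        int is_pair = (j + 1 < n) && (t[(size_t)j * ldt + (j + 1)] != 0.0);
        if (is_pair) {
            if (select[j] || select[j + 1]) m += 2;
            j += 2;   /* skip both rows of the 2-by-2 block */
        } else {
            if (select[j]) m += 1;
            j += 1;
        }
    }
    return m;
}
```

For a 3-by-3 matrix whose leading 2-by-2 block holds a conjugate pair, selecting either member of the pair gives m = 2.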
?trsyl
Solves Sylvester equation for real quasi-triangular or
complex triangular matrices.
Syntax
lapack_int LAPACKE_strsyl( int matrix_layout, char trana, char tranb, lapack_int isgn,
lapack_int m, lapack_int n, const float* a, lapack_int lda, const float* b, lapack_int
ldb, float* c, lapack_int ldc, float* scale );
lapack_int LAPACKE_dtrsyl( int matrix_layout, char trana, char tranb, lapack_int isgn,
lapack_int m, lapack_int n, const double* a, lapack_int lda, const double* b,
lapack_int ldb, double* c, lapack_int ldc, double* scale );
lapack_int LAPACKE_ctrsyl( int matrix_layout, char trana, char tranb, lapack_int isgn,
lapack_int m, lapack_int n, const lapack_complex_float* a, lapack_int lda, const
lapack_complex_float* b, lapack_int ldb, lapack_complex_float* c, lapack_int ldc,
float* scale );
lapack_int LAPACKE_ztrsyl( int matrix_layout, char trana, char tranb, lapack_int isgn,
lapack_int m, lapack_int n, const lapack_complex_double* a, lapack_int lda, const
lapack_complex_double* b, lapack_int ldb, lapack_complex_double* c, lapack_int ldc,
double* scale );
Include Files
• mkl.h
Description
The routine solves the Sylvester matrix equation op(A)*X ± X*op(B) = α*C, where op(A) = A or A^H, and the
matrices A and B are upper triangular (or, for real flavors, upper quasi-triangular in canonical Schur form); α ≤
1 is a scale factor determined by the routine to avoid overflow in X; A is m-by-m, B is n-by-n, and C and X
are both m-by-n. The matrix X is obtained by a straightforward process of back substitution.
The equation has a unique solution if and only if α_i ± β_j ≠ 0 for all i and j, where {α_i} and {β_j} are the
eigenvalues of A and B, respectively, and the sign (+ or -) is the same as that used in the equation to be solved.
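The back substitution mentioned above can be sketched for the simplest case: real, strictly upper triangular A and B (no 2-by-2 blocks), op(A) = A, op(B) = B, and no scaling. This is an illustration only, not the MKL implementation; the name solve_sylvester_upper and the row-major, unpadded storage are assumptions made for readability.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch (not MKL code): solves A*X + sign*X*B = C for X, where A (m-by-m)
 * and B (n-by-n) are upper triangular, sign is +1 or -1, and all matrices
 * are stored row-major without padding. On entry x holds C; on exit it holds
 * X. No scaling is applied, which corresponds to a scale factor of 1.
 * Returns 0 on success, 1 if some a[k][k] + sign*b[l][l] is exactly zero. */
int solve_sylvester_upper(int m, int n, int sign,
                          const double *a, const double *b, double *x)
{
    assert(sign == 1 || sign == -1);
    for (int l = 0; l < n; ++l) {              /* columns of X, left to right */
        for (int k = m - 1; k >= 0; --k) {     /* rows of X, bottom to top    */
            double rhs = x[(size_t)k * n + l];
            for (int i = k + 1; i < m; ++i)    /* solved entries below (k,l)  */
                rhs -= a[(size_t)k * m + i] * x[(size_t)i * n + l];
            for (int j = 0; j < l; ++j)        /* solved entries left of (k,l) */
                rhs -= sign * x[(size_t)k * n + j] * b[(size_t)j * n + l];
            double d = a[(size_t)k * m + k] + sign * b[(size_t)l * n + l];
            if (d == 0.0) return 1;            /* no unique solution */
            x[(size_t)k * n + l] = rhs / d;
        }
    }
    return 0;
}
```

The divisor a[k][k] + sign*b[l][l] is exactly the quantity α_i ± β_j from the uniqueness condition, which is why a zero value stops the sketch.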
Input Parameters
a, b, c Arrays:
a (size max(1, lda*m)) contains the matrix A.
c (size max(1, ldc*n) for column major layout and max(1, ldc*m) for row
major layout) contains the matrix C.
lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.
ldc The leading dimension of c; at least max(1, m) for column major layout and
at least max(1, n) for row major layout.
Output Parameters
Return Values
This function returns a value info.
If info = 1, A and B have common or close eigenvalues; perturbed values were used to solve the equation.
Application Notes
Let X be the exact, Y the corresponding computed solution, and R the residual matrix: R = C - (A*Y ± Y*B).
Then the residual is always small:
||R||_F = O(ε)*(||A||_F + ||B||_F)*||Y||_F.
However, Y is not necessarily the exact solution of a slightly perturbed equation; in other words, the solution
is not backwards stable.
For the forward error, the following bound holds:
||Y - X||_F ≤ ||R||_F/sep(A,B)
but this may be a considerable overestimate. See [Golub96] for a definition of sep(A, B).
The approximate number of floating-point operations for real flavors is m*n*(m + n). For complex flavors it
is 4 times greater.
Routine name   Operation performed
gghrd          Reduces a pair of matrices to generalized upper Hessenberg form using
               orthogonal/unitary transformations.
hgeqz          Implements the QZ method for finding the generalized eigenvalues of the
               matrix pair (H,T).
tgevc          Computes some or all of the right and/or left generalized eigenvectors of a
               pair of upper triangular matrices.
tgexc          Reorders the generalized Schur decomposition of a pair of matrices (A,B) so
               that one diagonal block of (A,B) moves to another row index.
tgsen          Reorders the generalized Schur decomposition of a pair of matrices (A,B) so
               that a selected cluster of eigenvalues appears in the leading diagonal
               blocks of (A,B).
tgsyl          Solves the generalized Sylvester equation.
tgsna          Estimates reciprocal condition numbers for specified eigenvalues and/or
               eigenvectors of a pair of matrices in generalized real Schur canonical form.
?gghrd
Reduces a pair of matrices to generalized upper
Hessenberg form using orthogonal/unitary
transformations.
Syntax
lapack_int LAPACKE_sgghrd (int matrix_layout, char compq, char compz, lapack_int n,
lapack_int ilo, lapack_int ihi, float* a, lapack_int lda, float* b, lapack_int ldb,
float* q, lapack_int ldq, float* z, lapack_int ldz);
lapack_int LAPACKE_dgghrd (int matrix_layout, char compq, char compz, lapack_int n,
lapack_int ilo, lapack_int ihi, double* a, lapack_int lda, double* b, lapack_int ldb,
double* q, lapack_int ldq, double* z, lapack_int ldz);
lapack_int LAPACKE_cgghrd (int matrix_layout, char compq, char compz, lapack_int n,
lapack_int ilo, lapack_int ihi, lapack_complex_float* a, lapack_int lda,
lapack_complex_float* b, lapack_int ldb, lapack_complex_float* q, lapack_int ldq,
lapack_complex_float* z, lapack_int ldz);
lapack_int LAPACKE_zgghrd (int matrix_layout, char compq, char compz, lapack_int n,
lapack_int ilo, lapack_int ihi, lapack_complex_double* a, lapack_int lda,
lapack_complex_double* b, lapack_int ldb, lapack_complex_double* q, lapack_int ldq,
lapack_complex_double* z, lapack_int ldz);
Include Files
• mkl.h
Description
The routine reduces a pair of real/complex matrices (A,B) to generalized upper Hessenberg form using
orthogonal/unitary transformations, where A is a general matrix and B is upper triangular. The form of the
generalized eigenvalue problem is A*x = λ*B*x, and B is typically made upper triangular by computing its
QR factorization and moving the orthogonal matrix Q to the left side of the equation.
This routine simultaneously reduces A to a Hessenberg matrix H:
Q^H*A*Z = H
The orthogonal/unitary matrices Q and Z are determined as products of Givens rotations. They may either be
formed explicitly, or they may be postmultiplied into input matrices Q1 and Z1, so that
Q1*A*Z1^H = (Q1*Q)*H*(Z1*Z)^H
Q1*B*Z1^H = (Q1*Q)*T*(Z1*Z)^H
If Q1 is the orthogonal/unitary matrix from the QR factorization of B in the original equation A*x = λ*B*x,
then the routine ?gghrd reduces the original problem to generalized Hessenberg form.
Input Parameters
ilo, ihi ilo and ihi mark the rows and columns of A which are to be reduced. It is
assumed that A is already upper triangular in rows and columns 1:ilo-1 and
ihi+1:n. Values of ilo and ihi are normally set by a previous call to ggbal;
otherwise they should be set to 1 and n respectively.
Constraint:
If n > 0, then 1 ≤ ilo ≤ ihi ≤ n;
a, b, q, z Arrays:
a (size max(1, lda*n)) contains the n-by-n general matrix A.
z (size max(1, ldz*n))
Output Parameters
a On exit, the upper triangle and the first subdiagonal of A are overwritten
with the upper Hessenberg matrix H, and the rest is set to zero.
Return Values
This function returns a value info.
?ggbal
Balances a pair of general real or complex matrices.
Syntax
lapack_int LAPACKE_sggbal( int matrix_layout, char job, lapack_int n, float* a,
lapack_int lda, float* b, lapack_int ldb, lapack_int* ilo, lapack_int* ihi, float*
lscale, float* rscale );
lapack_int LAPACKE_dggbal( int matrix_layout, char job, lapack_int n, double* a,
lapack_int lda, double* b, lapack_int ldb, lapack_int* ilo, lapack_int* ihi, double*
lscale, double* rscale );
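lapack_int LAPACKE_cggbal( int matrix_layout, char job, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* b, lapack_int ldb,
lapack_int* ilo, lapack_int* ihi, float* lscale, float* rscale );
lapack_int LAPACKE_zggbal( int matrix_layout, char job, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* b, lapack_int ldb,
lapack_int* ilo, lapack_int* ihi, double* lscale, double* rscale );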
Include Files
• mkl.h
Description
The routine balances a pair of general real/complex matrices (A,B). This involves, first, permuting A and B by
similarity transformations to isolate eigenvalues in the first 1 to ilo-1 and last ihi+1 to n elements on the
diagonal; and second, applying a diagonal similarity transformation to rows and columns ilo to ihi to make the
rows and columns as close in norm as possible. Both steps are optional. Balancing may reduce the 1-norm of
the matrices, and improve the accuracy of the computed eigenvalues and/or eigenvectors in the generalized
eigenvalue problem A*x = λ*B*x.
Input Parameters
a, b Arrays:
a (size max(1, lda*n)) contains the matrix A.
Output Parameters
ilo, ihi ilo and ihi are set to integers such that on exit A(i,j) = 0 and B(i,j) = 0 if i > j
and j = 1,..., ilo-1 or i = ihi+1,..., n.
lscale contains details of the permutations and scaling factors applied to the
left side of A and B.
If P_j is the index of the row interchanged with row j, and D_j is the scaling
factor applied to row j, then
lscale[j - 1] = P_j, for j = 1,..., ilo-1
              = D_j, for j = ilo,..., ihi
              = P_j, for j = ihi+1,..., n.
rscale contains details of the permutations and scaling factors applied to the
right side of A and B.
If P_j is the index of the column interchanged with column j, and D_j is the
scaling factor applied to column j, then
rscale[j - 1] = P_j, for j = 1,..., ilo-1
              = D_j, for j = ilo,..., ihi
              = P_j, for j = ihi+1,..., n.
The order in which the interchanges are made is n to ihi+1, then 1 to ilo-1.
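The zero pattern that ilo and ihi describe on exit can be checked with a few lines of code. The sketch below is an illustration, not MKL code: the name has_isolated_zero_pattern is hypothetical, the matrix is assumed real and column-major, and i, j, ilo, ihi are 1-based as in the text.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch (not MKL code): returns 1 if the n-by-n column-major matrix a
 * satisfies a(i,j) = 0 for all i > j with j <= ilo-1 or i >= ihi+1
 * (1-based i, j, ilo, ihi), and 0 otherwise. */
int has_isolated_zero_pattern(int n, const double *a, int lda,
                              int ilo, int ihi)
{
    assert(lda >= (n > 1 ? n : 1));
    for (int j = 1; j <= n; ++j) {
        for (int i = j + 1; i <= n; ++i) {
            if ((j <= ilo - 1 || i >= ihi + 1) &&
                a[(size_t)(j - 1) * lda + (i - 1)] != 0.0)
                return 0;
        }
    }
    return 1;
}
```

Applying the same check to both A and B confirms that the eigenvalues outside rows and columns ilo to ihi are already isolated on the diagonal.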
Return Values
This function returns a value info.
?ggbak
Forms the right or left eigenvectors of a generalized
eigenvalue problem.
Syntax
lapack_int LAPACKE_sggbak( int matrix_layout, char job, char side, lapack_int n,
lapack_int ilo, lapack_int ihi, const float* lscale, const float* rscale, lapack_int m,
float* v, lapack_int ldv );
lapack_int LAPACKE_dggbak( int matrix_layout, char job, char side, lapack_int n,
lapack_int ilo, lapack_int ihi, const double* lscale, const double* rscale, lapack_int
m, double* v, lapack_int ldv );
lapack_int LAPACKE_cggbak( int matrix_layout, char job, char side, lapack_int n,
lapack_int ilo, lapack_int ihi, const float* lscale, const float* rscale, lapack_int m,
lapack_complex_float* v, lapack_int ldv );
lapack_int LAPACKE_zggbak( int matrix_layout, char job, char side, lapack_int n,
lapack_int ilo, lapack_int ihi, const double* lscale, const double* rscale, lapack_int
m, lapack_complex_double* v, lapack_int ldv );
Include Files
• mkl.h
Description
The routine forms the right or left eigenvectors of a real/complex generalized eigenvalue problem
A*x = λ*B*x
by backward transformation on the computed eigenvectors of the balanced pair of matrices output by ggbal.
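One way to picture the backward transformation for right eigenvectors is: multiply rows ilo to ihi by the stored scaling factors, then undo the row interchanges in the reverse of the order in which balancing made them. The sketch below follows that outline under stated assumptions; it is not the MKL implementation, the names swap_rows and back_transform_right are hypothetical, v is real and column-major, and the permutation indices in rscale are assumed to be 1-based values stored as floating-point numbers.

```c
#include <assert.h>
#include <stddef.h>

/* Swaps rows r1 and r2 (0-based) of a column-major matrix with m columns. */
void swap_rows(int m, double *v, int ldv, int r1, int r2)
{
    for (int j = 0; j < m; ++j) {
        double tmp = v[(size_t)j * ldv + r1];
        v[(size_t)j * ldv + r1] = v[(size_t)j * ldv + r2];
        v[(size_t)j * ldv + r2] = tmp;
    }
}

/* Sketch (not MKL code): back-transforms the m right eigenvectors stored in
 * the columns of v (n rows, leading dimension ldv). ilo and ihi are 1-based.
 * rscale[j-1] is a scaling factor for j = ilo..ihi and a 1-based row index
 * otherwise (an assumption about the storage convention). */
void back_transform_right(int n, int ilo, int ihi, const double *rscale,
                          int m, double *v, int ldv)
{
    assert(1 <= ilo && ilo <= ihi && ihi <= n && ldv >= n);
    /* Apply the scaling factors to rows ilo..ihi. */
    for (int i = ilo; i <= ihi; ++i)
        for (int j = 0; j < m; ++j)
            v[(size_t)j * ldv + (i - 1)] *= rscale[i - 1];
    /* Undo the interchanges in reverse order: rows ilo-1 down to 1,
     * then rows ihi+1 up to n. */
    for (int i = ilo - 1; i >= 1; --i)
        swap_rows(m, v, ldv, i - 1, (int)rscale[i - 1] - 1);
    for (int i = ihi + 1; i <= n; ++i)
        swap_rows(m, v, ldv, i - 1, (int)rscale[i - 1] - 1);
}
```

For left eigenvectors the same outline would use lscale in place of rscale.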
Input Parameters
job Specifies the type of backward transformation required. Must be 'N', 'P',
'S', or 'B'.
If job = 'N', then no operations are done; return.
ilo, ihi The integers ilo and ihi determined by ?ggbal. Constraint:
The array rscale contains details of the permutations and/or scaling factors
applied to the right side of A and B, as returned by ?ggbal.
v Array v (size max(1, ldv*m) for column major layout and max(1, ldv*n) for
row major layout). Contains the matrix of right or left eigenvectors to be
transformed, as returned by tgevc.
ldv The leading dimension of v; at least max(1, n) for column major layout and
at least max(1, m) for row major layout.
Output Parameters
Return Values
This function returns a value info.
If info = -i, the i-th parameter had an illegal value.
?gghd3
Reduces a pair of matrices to generalized upper
Hessenberg form.
Syntax
lapack_int LAPACKE_sgghd3 (int matrix_layout, char compq, char compz, lapack_int n,
lapack_int ilo, lapack_int ihi, float * a, lapack_int lda, float * b, lapack_int ldb,
float * q, lapack_int ldq, float * z, lapack_int ldz);
lapack_int LAPACKE_dgghd3 (int matrix_layout, char compq, char compz, lapack_int n,
lapack_int ilo, lapack_int ihi, double * a, lapack_int lda, double * b, lapack_int ldb,
double * q, lapack_int ldq, double * z, lapack_int ldz);
lapack_int LAPACKE_cgghd3 (int matrix_layout, char compq, char compz, lapack_int n,
lapack_int ilo, lapack_int ihi, lapack_complex_float * a, lapack_int lda,
lapack_complex_float * b, lapack_int ldb, lapack_complex_float * q, lapack_int ldq,
lapack_complex_float * z, lapack_int ldz);
lapack_int LAPACKE_zgghd3 (int matrix_layout, char compq, char compz, lapack_int n,
lapack_int ilo, lapack_int ihi, lapack_complex_double * a, lapack_int lda,
lapack_complex_double * b, lapack_int ldb, lapack_complex_double * q, lapack_int ldq,
lapack_complex_double * z, lapack_int ldz);
Include Files
• mkl.h
Description
?gghd3 reduces a pair of real or complex matrices (A, B) to generalized upper Hessenberg form using
orthogonal/unitary transformations, where A is a general matrix and B is upper triangular. The form of the
generalized eigenvalue problem is
A*x = λ*B*x,
and B is typically made upper triangular by computing its QR factorization and moving the orthogonal/unitary
matrix Q to the left side of the equation.
This subroutine simultaneously reduces A to a Hessenberg matrix H:
Q^T*A*Z = H for real flavors
or
Q^H*A*Z = H for complex flavors
and transforms B to another upper triangular matrix T:
Q^T*B*Z = T for real flavors
or
Q^H*B*Z = T for complex flavors
in order to reduce the problem to its standard form
H*y = λ*T*y
where y = Z^T*x for real flavors
or
y = Z^H*x for complex flavors.
The orthogonal/unitary matrices Q and Z are determined as products of Givens rotations. They may either be
formed explicitly, or they may be postmultiplied into input matrices Q1 and Z1, so that
for real flavors:
Q1 * A * Z1^T = (Q1*Q) * H * (Z1*Z)^T
Q1 * B * Z1^T = (Q1*Q) * T * (Z1*Z)^T
for complex flavors:
Q1 * A * Z1^H = (Q1*Q) * H * (Z1*Z)^H
Q1 * B * Z1^H = (Q1*Q) * T * (Z1*Z)^H
If Q1 is the orthogonal/unitary matrix from the QR factorization of B in the original equation A*x = λ*B*x,
then ?gghd3 reduces the original problem to generalized Hessenberg form.
This is a blocked variant of ?gghrd, using matrix-matrix multiplications for parts of the computation to
enhance performance.
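The step from the original problem to the standard form H*y = λ*T*y follows from multiplying on the left by the conjugate transpose of Q and inserting the identity Z*Z^H = I. Written out for complex flavors (for real flavors, read the superscript H as T):

```latex
\begin{aligned}
A x &= \lambda B x \\
Q^{H} A Z \,\bigl(Z^{H} x\bigr) &= \lambda\, Q^{H} B Z \,\bigl(Z^{H} x\bigr)
  && \text{(multiply by } Q^{H}\text{, insert } Z Z^{H} = I\text{)} \\
H y &= \lambda T y, && y = Z^{H} x .
\end{aligned}
```

An eigenvector x of the original pair is therefore recovered from y as x = Z*y.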
Input Parameters
ilo, ihi ilo and ihi mark the rows and columns of a which are to be reduced. It is
assumed that a is already upper triangular in rows and columns 1:ilo - 1
and ihi + 1:n. ilo and ihi are normally set by a previous call to ?ggbal;
otherwise they should be set to 1 and n, respectively.
lda ≥ max(1,n).
b Array, size (ldb*n).
ldb The leading dimension of the array b.
ldb ≥ max(1,n).
ldz The leading dimension of the array z. ldz ≥ n if compz = 'V' or 'I'; ldz ≥ 1
otherwise.
Output Parameters
Return Values
This function returns a value info.
= 0: successful exit.
< 0: if info = -i, the i-th argument had an illegal value.
Application Notes
This routine reduces A to Hessenberg form and maintains B in triangular form using a blocked variant of Moler
and Stewart's original algorithm, as described by Kagstrom, Kressner, Quintana-Orti, and Quintana-Orti (BIT 2008).
?hgeqz
Implements the QZ method for finding the generalized
eigenvalues of the matrix pair (H,T).
Syntax
lapack_int LAPACKE_shgeqz( int matrix_layout, char job, char compq, char compz,
lapack_int n, lapack_int ilo, lapack_int ihi, float* h, lapack_int ldh, float* t,
lapack_int ldt, float* alphar, float* alphai, float* beta, float* q, lapack_int ldq,
float* z, lapack_int ldz );
lapack_int LAPACKE_dhgeqz( int matrix_layout, char job, char compq, char compz,
lapack_int n, lapack_int ilo, lapack_int ihi, double* h, lapack_int ldh, double* t,
lapack_int ldt, double* alphar, double* alphai, double* beta, double* q, lapack_int ldq,
double* z, lapack_int ldz );
lapack_int LAPACKE_chgeqz( int matrix_layout, char job, char compq, char compz,
lapack_int n, lapack_int ilo, lapack_int ihi, lapack_complex_float* h, lapack_int ldh,
lapack_complex_float* t, lapack_int ldt, lapack_complex_float* alpha,
lapack_complex_float* beta, lapack_complex_float* q, lapack_int ldq,
lapack_complex_float* z, lapack_int ldz );
lapack_int LAPACKE_zhgeqz( int matrix_layout, char job, char compq, char compz,
lapack_int n, lapack_int ilo, lapack_int ihi, lapack_complex_double* h, lapack_int ldh,
lapack_complex_double* t, lapack_int ldt, lapack_complex_double* alpha,
lapack_complex_double* beta, lapack_complex_double* q, lapack_int ldq,
lapack_complex_double* z, lapack_int ldz );
Include Files
• mkl.h
Description
The routine computes the eigenvalues of a real/complex matrix pair (H,T), where H is an upper Hessenberg
matrix and T is upper triangular, using the double-shift version (for real flavors) or single-shift version (for
complex flavors) of the QZ method. Matrix pairs of this type are produced by the reduction to generalized
upper Hessenberg form of a real/complex matrix pair (A,B):
A = Q1*H*Z1^H, B = Q1*T*Z1^H,
as computed by ?gghrd.
For real flavors, the routine can optionally reduce the pair (H,T) to generalized real Schur form:
H = Q*S*Z^T, T = Q*P*Z^T,
where Q and Z are orthogonal matrices, P is an upper triangular matrix, and S is a quasi-triangular matrix
with 1-by-1 and 2-by-2 diagonal blocks. The 1-by-1 blocks correspond to real eigenvalues of the matrix pair
(H,T) and the 2-by-2 blocks correspond to complex conjugate pairs of eigenvalues.
Additionally, the 2-by-2 upper triangular diagonal blocks of P corresponding to 2-by-2 blocks of S are reduced
to positive diagonal form, that is, if S(j+1,j) is non-zero, then P(j+1,j) = P(j,j+1) = 0, P(j,j) > 0, and
P(j+1,j+1) > 0.
For complex flavors, the routine can optionally reduce the pair (H,T) to generalized Schur form:
H = Q*S*Z^H, T = Q*P*Z^H,
where Q and Z are unitary matrices, and S and P are upper triangular.
For all function flavors:
Optionally, the orthogonal/unitary matrix Q from the generalized Schur factorization may be post-multiplied
by an input matrix Q1, and the orthogonal/unitary matrix Z may be post-multiplied by an input matrix Z1.
If Q1 and Z1 are the orthogonal/unitary matrices from ?gghrd that reduced the matrix pair (A,B) to
generalized upper Hessenberg form, then the output matrices Q1Q and Z1Z are the orthogonal/unitary
factors from the generalized Schur factorization of (A,B):
A = (Q1*Q)*S*(Z1*Z)^H, B = (Q1*Q)*P*(Z1*Z)^H.
To avoid overflow, eigenvalues of the matrix pair (H,T) (equivalently, of (A,B)) are computed as a pair of
values (alpha,beta). For chgeqz/zhgeqz, alpha and beta are complex, and for shgeqz/dhgeqz, alpha is
complex and beta real. If beta is nonzero, λ = alpha/beta is an eigenvalue of the generalized
nonsymmetric eigenvalue problem (GNEP)
A*x = λ*B*x
and if alpha is nonzero, μ = beta/alpha is an eigenvalue of the alternate form of the GNEP
μ*A*y = B*y .
Real eigenvalues (for real flavors) or the values of alpha and beta for the i-th eigenvalue (for complex
flavors) can be read directly from the generalized Schur form:
alpha = S(i,i), beta = P(i,i).
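Because each eigenvalue is returned as a pair (alpha, beta), the caller decides when it is safe to form the quotient. The sketch below shows one cautious way to do this for real flavors, where alpha = alphar + i*alphai and beta is real. It is an illustration only; the name gen_eigenvalue is hypothetical, and it guards only against beta being exactly zero, not against overflow for very small beta.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch (not MKL code): forms lambda = alpha/beta for one generalized
 * eigenvalue of a real pair, given alpha = alphar + i*alphai and real beta.
 * Returns 1 and stores the result if beta is nonzero; returns 0 (leaving the
 * outputs untouched) if beta is zero, in which case lambda is not finite. */
int gen_eigenvalue(double alphar, double alphai, double beta,
                   double *re, double *im)
{
    assert(re != NULL && im != NULL);
    if (beta == 0.0) return 0;
    *re = alphar / beta;
    *im = alphai / beta;
    return 1;
}
```

When the function returns 0, the alternate form μ = beta/alpha may still be usable, provided alpha is nonzero.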
Input Parameters
If compq = 'I', q is initialized to the unit matrix and the matrix of left
Schur vectors of (H,T) is returned;
If compq = 'V', q must contain an orthogonal/unitary matrix Q1 on entry
and the product Q1*Q is returned.
If compz = 'I', z is initialized to the unit matrix and the matrix of right
Schur vectors of (H,T) is returned;
If compz = 'V', z must contain an orthogonal/unitary matrix Z1 on entry
and the product Z1*Z is returned.
ilo, ihi ilo and ihi mark the rows and columns of H which are in Hessenberg form.
It is assumed that H is already upper triangular in rows and columns 1:ilo-1
and ihi+1:n.
Constraint:
h, t, q, z Arrays:
On entry, h (size max(1, ldh*n)) contains the n-by-n upper Hessenberg
matrix H.
On entry, t (size max(1, ldt*n)) contains the n-by-n upper triangular
matrix T.
q (size max(1, ldq*n)) :
Output Parameters
t If job = 'S', then, on exit, t contains the upper triangular matrix P from
the generalized Schur factorization.
For real flavors:
2-by-2 diagonal blocks of P corresponding to 2-by-2 blocks of S are reduced
to positive diagonal form, that is, if h(j+1,j) is non-zero, then
t(j+1,j) = t(j,j+1) = 0 and t(j,j) and t(j+1,j+1) will be positive.
If job = 'E', then on exit the diagonal blocks of t match those of P, but
the rest of t is unspecified.
For complex flavors:
if job = 'E', then on exit the diagonal of t matches that of P, but the rest
of t is unspecified.
alphar, alphai Arrays, size at least max(1, n). The real and imaginary parts, respectively,
of each scalar alpha defining an eigenvalue of GNEP.
If alphai[j - 1] is zero, then the j-th eigenvalue is real; if positive, then the
j-th and (j+1)-th eigenvalues are a complex conjugate pair, with
alphai[j] = -alphai[j - 1].
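The sign convention for alphai makes it straightforward to separate real eigenvalues from complex conjugate pairs. The sketch below counts the pairs and rejects arrays that do not follow the convention. It is an illustration, not MKL code; the name count_conjugate_pairs is hypothetical and 0-based C indexing is used.

```c
#include <assert.h>

/* Sketch (not MKL code): counts complex conjugate pairs among n generalized
 * eigenvalues, using the convention that a positive alphai[j] starts a pair
 * whose second member has alphai[j+1] = -alphai[j]. Returns -1 if the array
 * does not follow that convention. */
int count_conjugate_pairs(int n, const double *alphai)
{
    assert(n >= 0);
    int pairs = 0;
    for (int j = 0; j < n; ++j) {
        if (alphai[j] == 0.0) continue;          /* real eigenvalue */
        if (alphai[j] < 0.0 || j + 1 >= n || alphai[j + 1] != -alphai[j])
            return -1;                           /* malformed pair */
        ++pairs;
        ++j;                                     /* skip the second member */
    }
    return pairs;
}
```

The number of real eigenvalues is then n minus twice the returned count.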
Return Values
This function returns a value info.
If info = 1,..., n, the QZ iteration did not converge. (H,T) is not in Schur form, but alphar[i - 1],
alphai[i - 1] (for real flavors), alpha[i - 1] (for complex flavors), and beta[i - 1], i = info+1,..., n should be correct.
If info = n+1,..., 2n, the shift calculation failed. (H,T) is not in Schur form, but alphar[i - 1],
alphai[i - 1] (for real flavors), alpha[i - 1] (for complex flavors), and beta[i - 1], i = info-n+1,..., n should be correct.
?tgevc
Computes some or all of the right and/or left
generalized eigenvectors of a pair of upper triangular
matrices.
Syntax
lapack_int LAPACKE_stgevc (int matrix_layout, char side, char howmny, const
lapack_logical* select, lapack_int n, const float* s, lapack_int lds, const float* p,
lapack_int ldp, float* vl, lapack_int ldvl, float* vr, lapack_int ldvr, lapack_int mm,
lapack_int* m);
lapack_int LAPACKE_dtgevc (int matrix_layout, char side, char howmny, const
lapack_logical* select, lapack_int n, const double* s, lapack_int lds, const double* p,
lapack_int ldp, double* vl, lapack_int ldvl, double* vr, lapack_int ldvr, lapack_int mm,
lapack_int* m);
lapack_int LAPACKE_ctgevc (int matrix_layout, char side, char howmny, const
lapack_logical* select, lapack_int n, const lapack_complex_float* s, lapack_int lds,
const lapack_complex_float* p, lapack_int ldp, lapack_complex_float* vl, lapack_int
ldvl, lapack_complex_float* vr, lapack_int ldvr, lapack_int mm, lapack_int* m);
lapack_int LAPACKE_ztgevc (int matrix_layout, char side, char howmny, const
lapack_logical* select, lapack_int n, const lapack_complex_double* s, lapack_int lds,
const lapack_complex_double* p, lapack_int ldp, lapack_complex_double* vl, lapack_int
ldvl, lapack_complex_double* vr, lapack_int ldvr, lapack_int mm, lapack_int* m);
Include Files
• mkl.h
Description
The routine computes some or all of the right and/or left eigenvectors of a pair of real/complex matrices
(S,P), where S is quasi-triangular (for real flavors) or upper triangular (for complex flavors) and P is upper
triangular.
Matrix pairs of this type are produced by the generalized Schur factorization of a real/complex matrix pair
(A,B):
A = Q*S*Z^H, B = Q*P*Z^H
as computed by ?gghrd plus ?hgeqz.
The right eigenvector x and the left eigenvector y of (S,P) corresponding to an eigenvalue w are defined by:
S*x = w*P*x, y^H*S = w*y^H*P
The eigenvalues are not input to this routine, but are computed directly from the diagonal blocks or diagonal
elements of S and P.
This routine returns the matrices X and/or Y of right and left eigenvectors of (S,P), or the products Z*X
and/or Q*Y, where Z and Q are input matrices.
If Q and Z are the orthogonal/unitary factors from the generalized Schur factorization of a matrix pair (A,B),
then Z*X and Q*Y are the matrices of right and left eigenvectors of (A,B).
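For a pair (S,P) in which both matrices are upper triangular, a right eigenvector can be obtained by back substitution on the singular triangular system (beta*S - alpha*P)*x = 0, with alpha and beta read from the diagonals. The sketch below illustrates the idea for real data with distinct eigenvalues. It is not the MKL algorithm (which also handles 2-by-2 blocks and scaling); the name right_eigvec_upper and the row-major storage are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch (not MKL code): right eigenvector of an upper triangular pair (S,P),
 * both n-by-n and row-major, for the k-th eigenvalue (0-based k), which is
 * alpha/beta with alpha = s[k][k] and beta = p[k][k]. Solves
 * (beta*S - alpha*P)*x = 0 with x[k] = 1 and x[i] = 0 for i > k by back
 * substitution. Returns 0 on success, or 1 if a zero pivot is met (repeated
 * eigenvalue), in which case x is incomplete. */
int right_eigvec_upper(int n, const double *s, const double *p,
                       int k, double *x)
{
    assert(0 <= k && k < n);
    double alpha = s[(size_t)k * n + k];
    double beta  = p[(size_t)k * n + k];
    for (int i = k + 1; i < n; ++i) x[i] = 0.0;
    x[k] = 1.0;
    for (int i = k - 1; i >= 0; --i) {
        double sum = 0.0;
        for (int j = i + 1; j <= k; ++j)
            sum += (beta * s[(size_t)i * n + j]
                    - alpha * p[(size_t)i * n + j]) * x[j];
        double d = beta * s[(size_t)i * n + i] - alpha * p[(size_t)i * n + i];
        if (d == 0.0) return 1;
        x[i] = -sum / d;
    }
    return 0;
}
```

Working with beta*S - alpha*P rather than S - (alpha/beta)*P avoids dividing by beta, in the same spirit as returning eigenvalues as (alpha, beta) pairs.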
Input Parameters
If w[j] and w[j + 1] are the real and imaginary parts of a complex
eigenvalue, the corresponding complex eigenvector is computed if either
select[j] or select[j + 1] is 1, and on exit select[j] is set to 1 and select[j +
1] is set to 0.
s, p, vl, vr Arrays:
s (size max(1, lds*n)) contains the matrix S from a generalized Schur
factorization as computed by ?hgeqz. This matrix is upper quasi-triangular
for real flavors, and upper triangular for complex flavors.
p (size max(1, ldp*n)) contains the upper triangular matrix P from a
generalized Schur factorization as computed by ?hgeqz.
If side = 'L' or 'B' and howmny = 'B', vl (size max(1, ldvl*mm) for
column major layout and max(1, ldvl*n) for row major layout) must
contain an n-by-n matrix Q (usually the orthogonal/unitary matrix Q of left
Schur vectors returned by ?hgeqz).
If side = 'R' or 'B' and howmny = 'B', vr (size max(1, ldvr*mm) for
column major layout and max(1, ldvr*n) for row major layout) must
contain an n-by-n matrix Z (usually the orthogonal/unitary matrix Z of right
Schur vectors returned by ?hgeqz).
Output Parameters
A complex eigenvector corresponding to a complex eigenvalue is stored in
two consecutive columns, the first holding the real part, and the second the
imaginary part.
m The number of columns in the arrays vl and/or vr actually used to store the
eigenvectors.
If howmny = 'A' or 'B', m is set to n.
Return Values
This function returns a value info.
?tgexc
Reorders the generalized Schur decomposition of a
pair of matrices (A,B) so that one diagonal block of
(A,B) moves to another row index.
Syntax
lapack_int LAPACKE_stgexc (int matrix_layout, lapack_logical wantq, lapack_logical
wantz, lapack_int n, float* a, lapack_int lda, float* b, lapack_int ldb, float* q,
lapack_int ldq, float* z, lapack_int ldz, lapack_int* ifst, lapack_int* ilst);
lapack_int LAPACKE_dtgexc (int matrix_layout, lapack_logical wantq, lapack_logical
wantz, lapack_int n, double* a, lapack_int lda, double* b, lapack_int ldb, double* q,
lapack_int ldq, double* z, lapack_int ldz, lapack_int* ifst, lapack_int* ilst);
lapack_int LAPACKE_ctgexc (int matrix_layout, lapack_logical wantq, lapack_logical
wantz, lapack_int n, lapack_complex_float* a, lapack_int lda, lapack_complex_float* b,
lapack_int ldb, lapack_complex_float* q, lapack_int ldq, lapack_complex_float* z,
lapack_int ldz, lapack_int ifst, lapack_int ilst);
lapack_int LAPACKE_ztgexc (int matrix_layout, lapack_logical wantq, lapack_logical
wantz, lapack_int n, lapack_complex_double* a, lapack_int lda, lapack_complex_double*
b, lapack_int ldb, lapack_complex_double* q, lapack_int ldq, lapack_complex_double* z,
lapack_int ldz, lapack_int ifst, lapack_int ilst);
Include Files
• mkl.h
Description
The routine reorders the generalized real-Schur/Schur decomposition of a real/complex matrix pair (A,B)
using an orthogonal/unitary equivalence transformation
(A,B) = Q*(A,B)*Z^H,
so that the diagonal block of (A, B) with row index ifst is moved to row ilst. Matrix pair (A, B) must be in a
generalized real-Schur/Schur canonical form (as returned by gges), that is, A is block upper triangular with
1-by-1 and 2-by-2 diagonal blocks and B is upper triangular. Optionally, the matrices Q and Z of generalized
Schur vectors are updated.
Q_in*A_in*Z_in^T = Q_out*A_out*Z_out^T
Q_in*B_in*Z_in^T = Q_out*B_out*Z_out^T.
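The structural requirement on A for real flavors (block upper triangular with 1-by-1 and 2-by-2 diagonal blocks) can be checked before calling the routine. The sketch below tests only that block structure, not the further normalization of 2-by-2 blocks implied by canonical form. It is an illustration, not MKL code; the name is_quasi_upper_triangular is hypothetical and column-major storage is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch (not MKL code): returns 1 if the n-by-n column-major matrix a is
 * block upper triangular with 1-by-1 and 2-by-2 diagonal blocks, that is,
 * zero below the first subdiagonal and with no two consecutive nonzero
 * subdiagonal entries; returns 0 otherwise. */
int is_quasi_upper_triangular(int n, const double *a, int lda)
{
    assert(lda >= (n > 1 ? n : 1));
    /* Everything below the first subdiagonal must be zero. */
    for (int j = 0; j < n; ++j)
        for (int i = j + 2; i < n; ++i)
            if (a[(size_t)j * lda + i] != 0.0) return 0;
    /* Two adjacent nonzero subdiagonal entries would form a 3-by-3 block. */
    for (int j = 0; j + 2 < n; ++j)
        if (a[(size_t)j * lda + (j + 1)] != 0.0 &&
            a[(size_t)(j + 1) * lda + (j + 2)] != 0.0)
            return 0;
    return 1;
}
```

A matching check for B only needs the first loop with i starting at j + 1, since B must be upper triangular.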
Input Parameters
a, b, q, z Arrays:
a (size max(1, lda*n)) contains the matrix A.
ifst, ilst Specify the reordering of the diagonal blocks of (A, B). The block with row
index ifst is moved to row ilst, by a sequence of swapping between adjacent
blocks. Constraint: 1 ≤ ifst, ilst ≤ n.
Output Parameters
Return Values
This function returns a value info.
If info = 1, the transformed matrix pair (A, B) would be too far from generalized Schur form; the problem
is ill-conditioned. (A, B) may have been partially reordered, and ilst points to the first row of the current
position of the block being moved.
?tgsen
Reorders the generalized Schur decomposition of a
pair of matrices (A,B) so that a selected cluster of
eigenvalues appears in the leading diagonal blocks of
(A,B).
Syntax
lapack_int LAPACKE_stgsen( int matrix_layout, lapack_int ijob, lapack_logical wantq,
lapack_logical wantz, const lapack_logical* select, lapack_int n, float* a, lapack_int
lda, float* b, lapack_int ldb, float* alphar, float* alphai, float* beta, float* q,
lapack_int ldq, float* z, lapack_int ldz, lapack_int* m, float* pl, float* pr, float*
dif );
lapack_int LAPACKE_dtgsen( int matrix_layout, lapack_int ijob, lapack_logical wantq,
lapack_logical wantz, const lapack_logical* select, lapack_int n, double* a, lapack_int
lda, double* b, lapack_int ldb, double* alphar, double* alphai, double* beta, double* q,
lapack_int ldq, double* z, lapack_int ldz, lapack_int* m, double* pl, double* pr,
double* dif );
lapack_int LAPACKE_ctgsen( int matrix_layout, lapack_int ijob, lapack_logical wantq,
lapack_logical wantz, const lapack_logical* select, lapack_int n, lapack_complex_float*
a, lapack_int lda, lapack_complex_float* b, lapack_int ldb, lapack_complex_float*
alpha, lapack_complex_float* beta, lapack_complex_float* q, lapack_int ldq,
lapack_complex_float* z, lapack_int ldz, lapack_int* m, float* pl, float* pr, float*
dif );
lapack_int LAPACKE_ztgsen( int matrix_layout, lapack_int ijob, lapack_logical wantq,
lapack_logical wantz, const lapack_logical* select, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* b, lapack_int ldb,
lapack_complex_double* alpha, lapack_complex_double* beta, lapack_complex_double* q,
lapack_int ldq, lapack_complex_double* z, lapack_int ldz, lapack_int* m, double* pl,
double* pr, double* dif );
Include Files
• mkl.h
Description
The routine reorders the generalized real-Schur/Schur decomposition of a real/complex matrix pair (A, B) (in
terms of an orthogonal/unitary equivalence transformation Q^T*(A,B)*Z for real flavors or Q^H*(A,B)*Z for
complex flavors), so that a selected cluster of eigenvalues appears in the leading diagonal blocks of the pair
(A, B). The leading columns of Q and Z form orthonormal/unitary bases of the corresponding left and right
eigenspaces (deflating subspaces).
(A, B) must be in generalized real-Schur/Schur canonical form (as returned by gges), that is, A is upper
quasi-triangular (block upper triangular with 1-by-1 and 2-by-2 diagonal blocks) for real flavors or upper
triangular for complex flavors, and B is upper triangular.
?tgsen also computes the generalized eigenvalues
ωj = (alphar(j) + alphai(j)*i)/beta(j) (for real flavors)
ωj = alpha(j)/beta(j) (for complex flavors)
of the reordered matrix pair (A, B).
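The formula for real flavors can be sketched in plain C. This is an illustration only: the type gen_eig and the function gen_eigs_from_triples are hypothetical names that are not part of Intel MKL, and the choice to report an eigenvalue as non-finite when beta[j] is exactly zero is an assumption made for the example. The ratio may also overflow or underflow for extreme values, which this sketch does not guard against.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for one generalized eigenvalue omega_j. */
typedef struct {
    double re;      /* real part of omega_j */
    double im;      /* imaginary part of omega_j */
    int    finite;  /* 0 when beta[j] == 0, 1 otherwise */
} gen_eig;

/* Forms omega_j = (alphar[j] + alphai[j]*i) / beta[j] for j = 0..n-1.
 * Entries with beta[j] == 0 are flagged instead of divided. */
void gen_eigs_from_triples(const double *alphar, const double *alphai,
                           const double *beta, int n, gen_eig *out)
{
    assert(alphar != NULL && alphai != NULL && beta != NULL && out != NULL);
    assert(n >= 0);
    for (int j = 0; j < n; ++j) {
        out[j].re = 0.0;
        out[j].im = 0.0;
        out[j].finite = 0;
        if (beta[j] != 0.0) {
            out[j].re = alphar[j] / beta[j];
            out[j].im = alphai[j] / beta[j];
            out[j].finite = 1;
        }
    }
}
```

For complex flavors the same division applies to alpha[j] and beta[j] directly.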
Optionally, the routine computes the estimates of reciprocal condition numbers for eigenvalues and
eigenspaces. These are Difu[(A11, B11), (A22, B22)] and Difl[(A11, B11), (A22, B22)], that is, the
separation(s) between the matrix pairs (A11, B11) and (A22, B22) that correspond to the selected cluster and
the eigenvalues outside the cluster, respectively, and norms of "projections" onto left and right eigenspaces
with respect to the selected cluster in the (1,1)-block.
Input Parameters
ijob Specifies whether condition numbers are required for the cluster of
eigenvalues (pl and pr) or the deflating subspaces Difu and Difl.
If ijob = 4, compute pl, pr and dif (i.e., options 0, 1 and 2 above). This is
an economic version to get it all;
If ijob = 5, compute pl, pr and dif (i.e., options 0, 1 and 3 above).
select Array, size at least max (1, n). Specifies the eigenvalues in the selected
cluster.
To select an eigenvalue ωj, select[j - 1] must be 1. For real flavors: to
select a complex conjugate pair of eigenvalues ωj and ωj+1 (corresponding
to a 2-by-2 diagonal block), select[j - 1] and/or select[j] must be set to 1;
the complex conjugate eigenvalues ωj and ωj+1 must be either both included
in the cluster or both excluded.
a, b, q, z Arrays:
a (size max(1, lda*n)) contains the matrix A.
For real flavors: B is upper triangular, with (A, B) in generalized real Schur
canonical form.
For complex flavors: B is upper triangular, in generalized Schur canonical
form.
q (size at least 1 if wantq = 0 and at least max(1, ldq*n) if wantq = 1)
Output Parameters
alphar, alphai Arrays, size at least max(1, n). Contain values that form generalized
eigenvalues in real flavors.
See beta.
alpha Array, size at least max(1, n). Contains values that form generalized
eigenvalues in complex flavors.
See beta.
m The dimension of the specified pair of left and right eigen-spaces (deflating
subspaces); 0 ≤ m ≤ n.
Return Values
This function returns a value info.
If info = 1, reordering of (A, B) failed because the transformed matrix pair (A, B) would be too far from
generalized Schur form; the problem is very ill-conditioned. (A, B) may have been partially reordered.
If ijob > 0, 0 is returned in dif, pl and pr.
?tgsyl
Solves the generalized Sylvester equation.
Syntax
lapack_int LAPACKE_stgsyl( int matrix_layout, char trans, lapack_int ijob, lapack_int
m, lapack_int n, const float* a, lapack_int lda, const float* b, lapack_int ldb, float*
c, lapack_int ldc, const float* d, lapack_int ldd, const float* e, lapack_int lde,
float* f, lapack_int ldf, float* scale, float* dif );
lapack_int LAPACKE_dtgsyl( int matrix_layout, char trans, lapack_int ijob, lapack_int
m, lapack_int n, const double* a, lapack_int lda, const double* b, lapack_int ldb,
double* c, lapack_int ldc, const double* d, lapack_int ldd, const double* e, lapack_int
lde, double* f, lapack_int ldf, double* scale, double* dif );
lapack_int LAPACKE_ctgsyl( int matrix_layout, char trans, lapack_int ijob, lapack_int
m, lapack_int n, const lapack_complex_float* a, lapack_int lda, const
lapack_complex_float* b, lapack_int ldb, lapack_complex_float* c, lapack_int ldc, const
lapack_complex_float* d, lapack_int ldd, const lapack_complex_float* e, lapack_int lde,
lapack_complex_float* f, lapack_int ldf, float* scale, float* dif );
lapack_int LAPACKE_ztgsyl( int matrix_layout, char trans, lapack_int ijob, lapack_int
m, lapack_int n, const lapack_complex_double* a, lapack_int lda, const
lapack_complex_double* b, lapack_int ldb, lapack_complex_double* c, lapack_int ldc,
const lapack_complex_double* d, lapack_int ldd, const lapack_complex_double* e,
lapack_int lde, lapack_complex_double* f, lapack_int ldf, double* scale, double* dif );
Include Files
• mkl.h
Description
Here I_k is the identity matrix of size k and X^T is the transpose/conjugate-transpose of X. kron(X, Y) is the
Kronecker product between the matrices X and Y.
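For readers unfamiliar with kron, the following is a minimal sketch of the Kronecker product for column-major arrays. The function name kron_colmajor is hypothetical and is not an Intel MKL routine; it only illustrates the definition and is not how ?tgsyl works internally.

```c
#include <assert.h>
#include <stddef.h>

/* Kronecker product of column-major X (p-by-q) and Y (r-by-s).
 * out must hold (p*r)*(q*s) doubles and is written in column-major order.
 * Block (i, j) of the result equals X(i, j) * Y. */
void kron_colmajor(const double *x, int p, int q,
                   const double *y, int r, int s, double *out)
{
    assert(x != NULL && y != NULL && out != NULL);
    assert(p > 0 && q > 0 && r > 0 && s > 0);
    size_t rows = (size_t)p * (size_t)r;   /* leading dimension of the result */
    for (int j = 0; j < q; ++j)
        for (int l = 0; l < s; ++l)
            for (int i = 0; i < p; ++i)
                for (int k = 0; k < r; ++k) {
                    size_t row = (size_t)i * (size_t)r + (size_t)k;
                    size_t col = (size_t)j * (size_t)s + (size_t)l;
                    out[row + col * rows] =
                        x[(size_t)i + (size_t)j * (size_t)p] *
                        y[(size_t)k + (size_t)l * (size_t)r];
                }
}
```

With X equal to the 2-by-2 identity, the result is block diagonal with two copies of Y.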
If trans = 'T' (for real flavors), or trans = 'C' (for complex flavors), the routine ?tgsyl solves the
transposed/conjugate-transposed system Z^T*y = scale*b, which is equivalent to solving for R and L in
A^T*R + D^T*L = scale*C
R*B^T + L*E^T = scale*(-F)
This case (trans = 'T' for stgsyl/dtgsyl or trans = 'C' for ctgsyl/ztgsyl) is used to compute a
one-norm-based estimate of Dif[(A, D), (B, E)], the separation between the matrix pairs (A, D) and
(B, E).
If ijob ≥ 1, ?tgsyl computes a Frobenius norm-based estimate of Dif[(A, D), (B, E)], that is, the
reciprocal of a lower bound on the reciprocal of the smallest singular value of Z. This is a level 3 BLAS
algorithm.
Input Parameters
If trans = 'T', solve the 'transposed' system (for real flavors only).
If trans = 'C', solve the ' conjugate transposed' system (for complex
flavors only).
If ijob = 3, only an estimate of Dif[(A, D), (B, E)] is computed (look ahead
strategy is used);
If ijob = 4, only an estimate of Dif[(A, D), (B, E)] is computed (?gecon on
sub-systems is used). If trans = 'T' or 'C', ijob is not referenced.
m The order of the matrices A and D, and the row dimension of the matrices
C, F, R and L.
n The order of the matrices B and E, and the column dimension of the
matrices C, F, R and L.
a, b, c, d, e, f Arrays:
a (size max(1, lda*m)) contains the upper quasi-triangular (for real flavors)
or upper triangular (for complex flavors) matrix A.
b (size max(1, ldb*n)) contains the upper quasi-triangular (for real flavors)
or upper triangular (for complex flavors) matrix B.
c (size max(1, ldc*n) for column major layout and max(1, ldc*m) for row
major layout) contains the right-hand-side of the first matrix equation in
the generalized Sylvester equation (as defined by trans).
d (size max(1, ldd*m)) contains the upper triangular matrix D.
f (size max(1, ldf*n) for column major layout and max(1, ldf*m) for row
major layout) contains the right-hand-side of the second matrix equation in
the generalized Sylvester equation (as defined by trans).
ldc The leading dimension of c; at least max(1, m) for column major layout and
at least max(1, n) for row major layout.
ldf The leading dimension of f; at least max(1, m) for column major layout and
at least max(1, n) for row major layout.
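The layout-dependent sizes of c and f are easy to get wrong, so here is a small sketch that computes them. The helper names tgsyl_min_ld and tgsyl_rhs_len are hypothetical, and int stands in for lapack_int on the assumption of a 32-bit integer interface. The enum values 101 and 102 mirror LAPACK_ROW_MAJOR and LAPACK_COL_MAJOR so that the block compiles without the MKL headers.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for LAPACK_ROW_MAJOR / LAPACK_COL_MAJOR. */
enum { ROW_MAJOR = 101, COL_MAJOR = 102 };

/* Smallest legal leading dimension of c or f: max(1, m) for column major
 * layout, max(1, n) for row major layout. */
int tgsyl_min_ld(int layout, int m, int n)
{
    assert(layout == ROW_MAJOR || layout == COL_MAJOR);
    assert(m >= 0 && n >= 0);
    int dim = (layout == COL_MAJOR) ? m : n;
    return dim > 1 ? dim : 1;
}

/* Number of elements to allocate for c or f with leading dimension ld:
 * max(1, ld*n) for column major layout, max(1, ld*m) for row major layout. */
size_t tgsyl_rhs_len(int layout, int m, int n, int ld)
{
    assert(ld >= tgsyl_min_ld(layout, m, n));
    size_t other = (size_t)((layout == COL_MAJOR) ? n : m);
    size_t len = (size_t)ld * other;
    return len > 1 ? len : 1;
}
```

A caller would typically allocate tgsyl_rhs_len(...) elements for each of c and f before filling in the right-hand sides.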
Output Parameters
If ijob=3 or 4 and trans = 'N', c holds R, the solution achieved during the
computation of the Dif-estimate.
dif On exit, dif is the reciprocal of a lower bound of the reciprocal of the Dif-
function, that is, dif is an upper bound of Dif[(A, D), (B, E)] =
sigma_min(Z), where Z is as defined in the description.
If ijob = 0, or trans = 'T' (for real flavors), or trans = 'C' (for
complex flavors), dif is not touched.
scale On exit, scale is the scaling factor in the generalized Sylvester equation.
If 0 < scale < 1, c and f hold the solutions R and L, respectively, to a
slightly perturbed system but the input matrices A, B, D and E have not
been changed.
If scale = 0, c and f hold the solutions R and L, respectively, to the
homogeneous system with C = F = 0. Normally, scale = 1.
Return Values
This function returns a value info.
?tgsna
Estimates reciprocal condition numbers for specified
eigenvalues and/or eigenvectors of a pair of matrices
in generalized real Schur canonical form.
Syntax
lapack_int LAPACKE_stgsna( int matrix_layout, char job, char howmny, const
lapack_logical* select, lapack_int n, const float* a, lapack_int lda, const float* b,
lapack_int ldb, const float* vl, lapack_int ldvl, const float* vr, lapack_int ldvr,
float* s, float* dif, lapack_int mm, lapack_int* m );
lapack_int LAPACKE_dtgsna( int matrix_layout, char job, char howmny, const
lapack_logical* select, lapack_int n, const double* a, lapack_int lda, const double* b,
lapack_int ldb, const double* vl, lapack_int ldvl, const double* vr, lapack_int ldvr,
double* s, double* dif, lapack_int mm, lapack_int* m );
lapack_int LAPACKE_ctgsna( int matrix_layout, char job, char howmny, const
lapack_logical* select, lapack_int n, const lapack_complex_float* a, lapack_int lda,
const lapack_complex_float* b, lapack_int ldb, const lapack_complex_float* vl,
lapack_int ldvl, const lapack_complex_float* vr, lapack_int ldvr, float* s, float* dif,
lapack_int mm, lapack_int* m );
lapack_int LAPACKE_ztgsna( int matrix_layout, char job, char howmny, const
lapack_logical* select, lapack_int n, const lapack_complex_double* a, lapack_int lda,
const lapack_complex_double* b, lapack_int ldb, const lapack_complex_double* vl,
lapack_int ldvl, const lapack_complex_double* vr, lapack_int ldvr, double* s, double*
dif, lapack_int mm, lapack_int* m );
Include Files
• mkl.h
Description
The real flavors stgsna/dtgsna of this routine estimate reciprocal condition numbers for specified
eigenvalues and/or eigenvectors of a matrix pair (A, B) in generalized real Schur canonical form (or of any
matrix pair (Q*A*Z^T, Q*B*Z^T) with orthogonal matrices Q and Z).
(A, B) must be in generalized real Schur form (as returned by gges), that is, A is block upper triangular
with 1-by-1 and 2-by-2 diagonal blocks and B is upper triangular.
The complex flavors ctgsna/ztgsna estimate reciprocal condition numbers for specified eigenvalues and/or
eigenvectors of a matrix pair (A, B). (A, B) must be in generalized Schur canonical form, that is, A and B are
both upper triangular.
Input Parameters
If job = 'B', for both eigenvalues and eigenvectors (compute both s and
dif).
a, b, vl, vr Arrays:
a (size max(1, lda*n)) contains the upper quasi-triangular (for real flavors)
or upper triangular (for complex flavors) matrix A in the pair (A, B).
b (size max(1, ldb*n)) contains the upper triangular matrix B in the pair
(A, B).
If job = 'E' or 'B', vl (size max(1, ldvl*m) for column major layout and
max(1, ldvl*n) for row major layout) must contain left eigenvectors of (A,
B), corresponding to the eigenpairs specified by howmny and select. The
eigenvectors must be stored in consecutive columns of vl, as returned
by ?tgevc.
If job = 'E' or 'B', vr (size max(1, ldvr*m) for column major layout and
max(1, ldvr*n) for row major layout) must contain right eigenvectors of
(A, B), corresponding to the eigenpairs specified by howmny and select.
The eigenvectors must be stored in consecutive columns of vr, as returned
by ?tgevc.
If job = 'E' or 'B', then ldvl ≥ max(1, n) for column major layout and
ldvl ≥ max(1, m) for row major layout.
If job = 'E' or 'B', then ldvr ≥ max(1, n) for column major layout and
ldvr ≥ max(1, m) for row major layout.
Output Parameters
m The number of elements in the arrays s and dif used to store the specified
condition numbers; for each selected eigenvalue one element is used.
If howmny = 'A', m is set to n.
Return Values
This function returns a value info.
You can use routines listed in the above table as well as the driver routine ggsvd to find the GSVD of a pair of
general rectangular matrices.
?ggsvp
Computes the preprocessing decomposition for the
generalized SVD (deprecated).
Syntax
lapack_int LAPACKE_sggsvp( int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int p, lapack_int n, float* a, lapack_int lda, float* b, lapack_int
ldb, float tola, float tolb, lapack_int* k, lapack_int* l, float* u, lapack_int ldu,
float* v, lapack_int ldv, float* q, lapack_int ldq );
lapack_int LAPACKE_dggsvp( int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int p, lapack_int n, double* a, lapack_int lda, double* b,
lapack_int ldb, double tola, double tolb, lapack_int* k, lapack_int* l, double* u,
lapack_int ldu, double* v, lapack_int ldv, double* q, lapack_int ldq );
lapack_int LAPACKE_cggsvp( int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int p, lapack_int n, lapack_complex_float* a, lapack_int lda,
lapack_complex_float* b, lapack_int ldb, float tola, float tolb, lapack_int* k,
lapack_int* l, lapack_complex_float* u, lapack_int ldu, lapack_complex_float* v,
lapack_int ldv, lapack_complex_float* q, lapack_int ldq );
lapack_int LAPACKE_zggsvp( int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int p, lapack_int n, lapack_complex_double* a, lapack_int lda,
lapack_complex_double* b, lapack_int ldb, double tola, double tolb, lapack_int* k,
lapack_int* l, lapack_complex_double* u, lapack_int ldu, lapack_complex_double* v,
lapack_int ldv, lapack_complex_double* q, lapack_int ldq );
Include Files
• mkl.h
Description
This routine is deprecated; use ggsvp3.
where the k-by-k matrix A12 and l-by-l matrix B13 are nonsingular upper triangular; A23 is l-by-l upper
triangular if m-k-l ≥ 0, otherwise A23 is (m-k)-by-l upper trapezoidal. The sum k+l is equal to the effective
numerical rank of the (m+p)-by-n matrix (A^H,B^H)^H.
This decomposition is the preprocessing step for computing the Generalized Singular Value Decomposition
(GSVD), see subroutine ?tgsja.
Input Parameters
If jobu = 'N', U is not computed.
a, b Arrays:
a (size at least max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout) contains the m-by-n matrix A.
b (size at least max(1, ldb*n) for column major layout and max(1, ldb*p)
for row major layout) contains the p-by-n matrix B.
lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.
ldb The leading dimension of b; at least max(1, p) for column major layout and
max(1, n) for row major layout.
tola, tolb tola and tolb are the thresholds to determine the effective numerical rank of
matrix B and a subblock of A. Generally, they are set to
tola = max(m, n)*||A||*MACHEPS,
tolb = max(p, n)*||B||*MACHEPS.
The size of tola and tolb may affect the size of backward errors of the
decomposition.
ldu The leading dimension of the output array u. ldu ≥ max(1, m) if jobu =
'U'; ldu ≥ 1 otherwise.
ldv The leading dimension of the output array v. ldv ≥ max(1, p) if jobv =
'V'; ldv ≥ 1 otherwise.
ldq The leading dimension of the output array q. ldq ≥ max(1, n) if jobq =
'Q'; ldq ≥ 1 otherwise.
Output Parameters
u, v, q Arrays:
If jobu = 'U', u (size max(1, ldu*m)) contains the orthogonal/unitary
matrix U.
If jobu = 'N', u is not referenced.
Return Values
This function returns a value info.
?ggsvp3
Performs preprocessing for a generalized SVD.
Syntax
lapack_int LAPACKE_sggsvp3 (int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int p, lapack_int n, float * a, lapack_int lda, float * b,
lapack_int ldb, float tola, float tolb, lapack_int * k, lapack_int * l, float * u,
lapack_int ldu, float * v, lapack_int ldv, float * q, lapack_int ldq);
lapack_int LAPACKE_dggsvp3 (int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int p, lapack_int n, double * a, lapack_int lda, double * b,
lapack_int ldb, double tola, double tolb, lapack_int * k, lapack_int * l, double * u,
lapack_int ldu, double * v, lapack_int ldv, double * q, lapack_int ldq);
lapack_int LAPACKE_cggsvp3 (int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int p, lapack_int n, lapack_complex_float * a, lapack_int lda,
lapack_complex_float * b, lapack_int ldb, float tola, float tolb, lapack_int * k,
lapack_int * l, lapack_complex_float * u, lapack_int ldu, lapack_complex_float * v,
lapack_int ldv, lapack_complex_float * q, lapack_int ldq);
lapack_int LAPACKE_zggsvp3 (int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int p, lapack_int n, lapack_complex_double * a, lapack_int lda,
lapack_complex_double * b, lapack_int ldb, double tola, double tolb, lapack_int * k,
lapack_int * l, lapack_complex_double * u, lapack_int ldu, lapack_complex_double * v,
lapack_int ldv, lapack_complex_double * q, lapack_int ldq);
Include Files
• mkl_lapack.h
• mkl.h
Description
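?ggsvp3 computes orthogonal or unitary matrices U, V, and Q such that
for real flavors:

\[
U^T A Q =
\begin{array}{c|ccc}
 & n-k-l & k & l \\ \hline
k & 0 & A_{12} & A_{13} \\
l & 0 & 0 & A_{23} \\
m-k-l & 0 & 0 & 0
\end{array}
\quad \text{if } m-k-l \ge 0;
\]

\[
U^T A Q =
\begin{array}{c|ccc}
 & n-k-l & k & l \\ \hline
k & 0 & A_{12} & A_{13} \\
m-k & 0 & 0 & A_{23}
\end{array}
\quad \text{if } m-k-l < 0;
\]

\[
V^T B Q =
\begin{array}{c|ccc}
 & n-k-l & k & l \\ \hline
l & 0 & 0 & B_{13} \\
p-l & 0 & 0 & 0
\end{array}
\]

for complex flavors:

\[
U^H A Q =
\begin{array}{c|ccc}
 & n-k-l & k & l \\ \hline
k & 0 & A_{12} & A_{13} \\
l & 0 & 0 & A_{23} \\
m-k-l & 0 & 0 & 0
\end{array}
\quad \text{if } m-k-l \ge 0;
\]

\[
U^H A Q =
\begin{array}{c|ccc}
 & n-k-l & k & l \\ \hline
k & 0 & A_{12} & A_{13} \\
m-k & 0 & 0 & A_{23}
\end{array}
\quad \text{if } m-k-l < 0;
\]

\[
V^H B Q =
\begin{array}{c|ccc}
 & n-k-l & k & l \\ \hline
l & 0 & 0 & B_{13} \\
p-l & 0 & 0 & 0
\end{array}
\]

where the k-by-k matrix A12 and l-by-l matrix B13 are nonsingular upper triangular; A23 is l-by-l upper
triangular if m-k-l ≥ 0, otherwise A23 is (m-k)-by-l upper trapezoidal. The sum k + l is equal to the effective
numerical rank of the (m + p)-by-n matrix (A^T,B^T)^T for real flavors or (A^H,B^H)^H for complex flavors.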
This decomposition is the preprocessing step for computing the Generalized Singular Value Decomposition
(GSVD), see ?ggsvd3.
Input Parameters
lda≥ max(1,m).
ldb≥ max(1,p).
tola, tolb tola and tolb are the thresholds to determine the effective numerical rank
of matrix B and a subblock of A. Generally, they are set to
tola = max(m,n)*norm(a)*MACHEPS,
tolb = max(p,n)*norm(b)*MACHEPS.
The size of tola and tolb may affect the size of backward errors of the
decomposition.
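The two threshold formulas can be sketched as follows. Several choices here are assumptions rather than statements from this reference: norm() is taken to be the 1-norm (maximum absolute column sum), MACHEPS is approximated by DBL_EPSILON, the matrix is stored in column major layout, and the helper names one_norm_colmajor and gsvd_threshold are hypothetical.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* 1-norm (maximum absolute column sum) of an m-by-n column-major matrix. */
double one_norm_colmajor(const double *a, int m, int n, int lda)
{
    assert(a != NULL && m >= 0 && n >= 0);
    assert(lda >= (m > 1 ? m : 1));
    double best = 0.0;
    for (int j = 0; j < n; ++j) {
        double sum = 0.0;
        for (int i = 0; i < m; ++i) {
            double v = a[(size_t)i + (size_t)j * (size_t)lda];
            sum += (v < 0.0) ? -v : v;
        }
        if (sum > best)
            best = sum;
    }
    return best;
}

/* tol = max(rows, n) * norm * MACHEPS, with DBL_EPSILON used for MACHEPS.
 * Pass rows = m and the norm of A for tola, rows = p and the norm of B for tolb. */
double gsvd_threshold(int rows, int n, double norm)
{
    assert(rows >= 0 && n >= 0 && norm >= 0.0);
    int dim = rows > n ? rows : n;
    return (double)dim * norm * DBL_EPSILON;
}
```

The resulting values would be passed as the tola and tolb arguments; larger thresholds lower the detected numerical rank.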
Output Parameters
If jobu = 'N', u is not referenced.
Return Values
This function returns a value info.
= 0: successful exit.
< 0: if info = -i, the i-th argument had an illegal value.
Application Notes
The subroutine uses LAPACK subroutine ?geqp3 for the QR factorization with column pivoting to detect the
effective numerical rank of the A matrix. It may be replaced by a better rank determination strategy.
?ggsvp3 replaces the deprecated subroutine ?ggsvp.
?ggsvd3
Computes generalized SVD.
Syntax
lapack_int LAPACKE_sggsvd3 (int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int n, lapack_int p, lapack_int * k, lapack_int * l, float * a,
lapack_int lda, float * b, lapack_int ldb, float * alpha, float * beta, float * u,
lapack_int ldu, float * v, lapack_int ldv, float * q, lapack_int ldq, lapack_int *
iwork);
lapack_int LAPACKE_dggsvd3 (int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int n, lapack_int p, lapack_int * k, lapack_int * l, double * a,
lapack_int lda, double * b, lapack_int ldb, double * alpha, double * beta, double * u,
lapack_int ldu, double * v, lapack_int ldv, double * q, lapack_int ldq, lapack_int *
iwork);
lapack_int LAPACKE_cggsvd3 (int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int n, lapack_int p, lapack_int * k, lapack_int * l,
lapack_complex_float * a, lapack_int lda, lapack_complex_float * b, lapack_int ldb,
float * alpha, float * beta, lapack_complex_float * u, lapack_int ldu,
lapack_complex_float * v, lapack_int ldv, lapack_complex_float * q, lapack_int ldq,
lapack_int * iwork);
lapack_int LAPACKE_zggsvd3 (int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int n, lapack_int p, lapack_int * k, lapack_int * l,
lapack_complex_double * a, lapack_int lda, lapack_complex_double * b, lapack_int ldb,
double * alpha, double * beta, lapack_complex_double * u, lapack_int ldu,
lapack_complex_double * v, lapack_int ldv, lapack_complex_double * q, lapack_int ldq,
lapack_int * iwork);
Include Files
• mkl.h
Description
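?ggsvd3 computes the generalized singular value decomposition (GSVD) of an m-by-n real or complex matrix
A and p-by-n real or complex matrix B:
U^T*A*Q = D1*(0 R), V^T*B*Q = D2*(0 R) for real flavors
or
U^H*A*Q = D1*(0 R), V^H*B*Q = D2*(0 R) for complex flavors,
where U, V and Q are orthogonal/unitary matrices, R is a nonsingular upper triangular matrix, and D1 and D2
are "diagonal" matrices of the following structures:
If m - k - l ≥ 0,

\[
D_1 =
\begin{array}{c|cc}
 & k & l \\ \hline
k & I & 0 \\
l & 0 & C \\
m-k-l & 0 & 0
\end{array}
\qquad
D_2 =
\begin{array}{c|cc}
 & k & l \\ \hline
l & 0 & S \\
p-l & 0 & 0
\end{array}
\]

\[
(0 \;\; R) =
\begin{array}{c|ccc}
 & n-k-l & k & l \\ \hline
k & 0 & R_{11} & R_{12} \\
l & 0 & 0 & R_{22}
\end{array}
\]

where
C = diag(alpha(k+1), ... , alpha(k+l)),
S = diag(beta(k+1), ... , beta(k+l)),
C^2 + S^2 = I.
If m - k - l < 0,

\[
D_1 =
\begin{array}{c|ccc}
 & k & m-k & k+l-m \\ \hline
k & I & 0 & 0 \\
m-k & 0 & C & 0
\end{array}
\qquad
D_2 =
\begin{array}{c|ccc}
 & k & m-k & k+l-m \\ \hline
m-k & 0 & S & 0 \\
k+l-m & 0 & 0 & I \\
p-l & 0 & 0 & 0
\end{array}
\]

where
C = diag(alpha(k+1), ... , alpha(m)),
S = diag(beta(k+1), ... , beta(m)),
C^2 + S^2 = I.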
The routine computes C, S, R, and optionally the orthogonal/unitary transformation matrices U, V and Q.
In particular, if B is an n-by-n nonsingular matrix, then the GSVD of A and B implicitly gives the SVD of
A*inv(B):
A*inv(B) = U*(D1*inv(D2))*V^T for real flavors
or
A*inv(B) = U*(D1*inv(D2))*V^H for complex flavors.
If (A^T,B^T)^T for real flavors or (A^H,B^H)^H for complex flavors has orthonormal columns, then the GSVD of
A and B is also equal to the CS decomposition of A and B. Furthermore, the GSVD can be used to derive the
solution of the eigenvalue problem:
A^T*A*x = λ*B^T*B*x for real flavors
or
A^H*A*x = λ*B^H*B*x for complex flavors.
In some literature, the GSVD of A and B is presented in the form
U^T*A*X = ( 0 D1 ), V^T*B*X = ( 0 D2 ) for real (A, B)
or
U^H*A*X = ( 0 D1 ), V^H*B*X = ( 0 D2 ) for complex (A, B)
where U and V are orthogonal and X is nonsingular, D1 and D2 are "diagonal". The former GSVD form can be
converted to the latter form by taking the nonsingular matrix X as

\[
X = Q \begin{pmatrix} I & 0 \\ 0 & R^{-1} \end{pmatrix}
\]
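The conversion can be checked in one line. The following worked step is a sketch that assumes the decomposition is written as U^T*A*Q = D1*(0 R), the form given in the ?tgsja description, with the zero block of (0 R) having n-k-l columns and R being (k+l)-by-(k+l):

```latex
\[
U^T A X
  = U^T A Q \begin{pmatrix} I_{n-k-l} & 0 \\ 0 & R^{-1} \end{pmatrix}
  = D_1 \begin{pmatrix} 0 & R \end{pmatrix}
        \begin{pmatrix} I_{n-k-l} & 0 \\ 0 & R^{-1} \end{pmatrix}
  = D_1 \begin{pmatrix} 0 & I_{k+l} \end{pmatrix}
  = \begin{pmatrix} 0 & D_1 \end{pmatrix}
\]
```

The same computation with V and D2 gives V^T*B*X = ( 0 D2 ); for complex flavors, replace each transpose by a conjugate transpose.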
Input Parameters
lda≥ max(1,m).
ldb≥ max(1,p).
Output Parameters
On exit, alpha and beta contain the generalized singular value pairs
of a and b;
alpha[0:k - 1] = 1,
beta[0:k - 1] = 0,
and if m - k - l ≥ 0,
alpha[k:k + l - 1] = C,
beta[k:k + l - 1] = S,
or if m - k - l < 0,
alpha[k:m - 1] = C, alpha[m:k + l - 1] = 0
beta[k:m - 1] = S, beta[m:k + l - 1] = 1
and
alpha[k + l:n - 1] = 0
beta[k + l:n - 1] = 0
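Because C^2 + S^2 = I, each returned pair in the C and S range should satisfy alpha[i]^2 + beta[i]^2 = 1 up to rounding. The following sanity check is a sketch only: gsvd_pairs_are_normalized is a hypothetical helper, int stands in for lapack_int, and the tolerance is left to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if alpha[i]^2 + beta[i]^2 is within tol of 1 for every index i
 * in the C/S range k <= i < min(m, k + l), and 0 otherwise. */
int gsvd_pairs_are_normalized(const double *alpha, const double *beta,
                              int m, int k, int l, double tol)
{
    assert(alpha != NULL && beta != NULL);
    assert(k >= 0 && l >= 0 && tol >= 0.0);
    int end = (m < k + l) ? m : k + l;
    for (int i = k; i < end; ++i) {
        double r = alpha[i] * alpha[i] + beta[i] * beta[i] - 1.0;
        if (r < 0.0)
            r = -r;
        if (r > tol)
            return 0;
    }
    return 1;
}
```

A failed check after a successful call usually points to mismatched k, l or m values rather than to the routine itself.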
iwork On exit, iwork stores the sorting information. More precisely, the
following loop uses iwork to sort alpha:
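The loop itself is not reproduced in this section. A minimal sketch of such a loop is given here under stated assumptions: iwork is taken to hold 1-based positions (the convention of the underlying Fortran routine), the swaps cover indices k to min(m, k + l) - 1, and int stands in for lapack_int. The function name apply_gsvd_sort is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Applies the swaps recorded in iwork to alpha so that the entries in the
 * range k <= i < min(m, k + l) end up in non-increasing order.
 * Assumption: iwork[i] is a 1-based index, hence the "- 1". */
void apply_gsvd_sort(double *alpha, const int *iwork, int m, int k, int l)
{
    assert(alpha != NULL && iwork != NULL);
    assert(k >= 0 && l >= 0);
    int end = (m < k + l) ? m : k + l;
    for (int i = k; i < end; ++i) {
        int j = iwork[i] - 1;
        double tmp = alpha[i];
        alpha[i] = alpha[j];
        alpha[j] = tmp;
    }
}
```

Applying the same swaps to beta keeps each (alpha, beta) pair together.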
Return Values
This function returns a value info.
= 0: successful exit.
< 0: if info = -i, the i-th argument had an illegal value.
Application Notes
?ggsvd3 replaces the deprecated subroutine ?ggsvd.
?tgsja
Computes the generalized SVD of two upper triangular
or trapezoidal matrices.
Syntax
lapack_int LAPACKE_stgsja( int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int p, lapack_int n, lapack_int k, lapack_int l, float* a,
lapack_int lda, float* b, lapack_int ldb, float tola, float tolb, float* alpha, float*
beta, float* u, lapack_int ldu, float* v, lapack_int ldv, float* q, lapack_int ldq,
lapack_int* ncycle );
lapack_int LAPACKE_dtgsja( int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int p, lapack_int n, lapack_int k, lapack_int l, double* a,
lapack_int lda, double* b, lapack_int ldb, double tola, double tolb, double* alpha,
double* beta, double* u, lapack_int ldu, double* v, lapack_int ldv, double* q,
lapack_int ldq, lapack_int* ncycle );
lapack_int LAPACKE_ctgsja( int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int p, lapack_int n, lapack_int k, lapack_int l,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* b, lapack_int ldb, float
tola, float tolb, float* alpha, float* beta, lapack_complex_float* u, lapack_int ldu,
lapack_complex_float* v, lapack_int ldv, lapack_complex_float* q, lapack_int ldq,
lapack_int* ncycle );
lapack_int LAPACKE_ztgsja( int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int p, lapack_int n, lapack_int k, lapack_int l,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* b, lapack_int ldb,
double tola, double tolb, double* alpha, double* beta, lapack_complex_double* u,
lapack_int ldu, lapack_complex_double* v, lapack_int ldv, lapack_complex_double* q,
lapack_int ldq, lapack_int* ncycle );
Include Files
• mkl.h
Description
The routine computes the generalized singular value decomposition (GSVD) of two real/complex upper
triangular (or trapezoidal) matrices A and B. On entry, it is assumed that matrices A and B have the following
forms, which may be obtained by the preprocessing subroutine ggsvp from a general m-by-n matrix A and p-
by-n matrix B:
where the k-by-k matrix A12 and l-by-l matrix B13 are nonsingular upper triangular; A23 is l-by-l upper
triangular if m-k-l≥0, otherwise A23 is (m-k)-by-l upper trapezoidal.
On exit,
U^H*A*Q = D1*(0 R), V^H*B*Q = D2*(0 R),
where U, V and Q are orthogonal/unitary matrices, R is a nonsingular upper triangular matrix, and D1 and D2
are "diagonal" matrices, which are of the following structures:
If m-k-l≥0,
where
C = diag(alpha[k],...,alpha[k+l-1])
S = diag(beta[k],...,beta[k+l-1])
C^2 + S^2 = I
R is stored in a(1:k+l, n-k-l+1:n) on exit.
If m-k-l < 0,
where
C = diag(alpha[k],...,alpha[m-1]),
S = diag(beta[k],...,beta[m-1]),
C^2 + S^2 = I
Input Parameters
a, b, u, v, q Arrays:
a (size at least max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout) contains the m-by-n matrix A.
b (size at least max(1, ldb*n) for column major layout and max(1, ldb*p)
for row major layout) contains the p-by-n matrix B.
If jobu = 'U', u (size max(1, ldu*m)) must contain a matrix U1 (usually
the orthogonal/unitary matrix returned by ?ggsvp).
lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.
ldb The leading dimension of b; at least max(1, p) for column major layout and
max(1, n) for row major layout.
tola, tolb tola and tolb are the convergence criteria for the Jacobi-Kogbetliantz
iteration procedure. Generally, they are the same as used in ?ggsvp:
Output Parameters
b On exit, if necessary, b(m-k+1:l, n+m-k-l+1:n) contains a part of R.
alpha, beta Arrays, size at least max(1, n). Contain the generalized singular value pairs
of A and B:
alpha(1:k) = 1,
beta(1:k) = 0,
and if m-k-l≥ 0,
alpha(k+1:k+l) = diag(C),
beta(k+1:k+l) = diag(S),
or if m-k-l < 0,
alpha(k+1:m) = diag(C), alpha(m+1:k+l) = 0,
beta(k+1:m) = diag(S), beta(m+1:k+l) = 1,
and
alpha(k+l+1:n) = 0 and
beta(k+l+1:n) = 0.
Return Values
This function returns a value info.
See Also
CS Driver Routine
?bbcsd
Computes the CS decomposition of an orthogonal/
unitary matrix in bidiagonal-block form.
Syntax
lapack_int LAPACKE_sbbcsd( int matrix_layout, char jobu1, char jobu2, char jobv1t, char
jobv2t, char trans, lapack_int m, lapack_int p, lapack_int q, float* theta, float* phi,
float* u1, lapack_int ldu1, float* u2, lapack_int ldu2, float* v1t, lapack_int ldv1t,
float* v2t, lapack_int ldv2t, float* b11d, float* b11e, float* b12d, float* b12e, float*
b21d, float* b21e, float* b22d, float* b22e );
lapack_int LAPACKE_dbbcsd( int matrix_layout, char jobu1, char jobu2, char jobv1t, char
jobv2t, char trans, lapack_int m, lapack_int p, lapack_int q, double* theta, double*
phi, double* u1, lapack_int ldu1, double* u2, lapack_int ldu2, double* v1t, lapack_int
ldv1t, double* v2t, lapack_int ldv2t, double* b11d, double* b11e, double* b12d, double*
b12e, double* b21d, double* b21e, double* b22d, double* b22e );
lapack_int LAPACKE_cbbcsd( int matrix_layout, char jobu1, char jobu2, char jobv1t, char
jobv2t, char trans, lapack_int m, lapack_int p, lapack_int q, float* theta, float* phi,
lapack_complex_float* u1, lapack_int ldu1, lapack_complex_float* u2, lapack_int ldu2,
lapack_complex_float* v1t, lapack_int ldv1t, lapack_complex_float* v2t, lapack_int
ldv2t, float* b11d, float* b11e, float* b12d, float* b12e, float* b21d, float* b21e,
float* b22d, float* b22e );
lapack_int LAPACKE_zbbcsd( int matrix_layout, char jobu1, char jobu2, char jobv1t, char
jobv2t, char trans, lapack_int m, lapack_int p, lapack_int q, double* theta, double*
phi, lapack_complex_double* u1, lapack_int ldu1, lapack_complex_double* u2, lapack_int
ldu2, lapack_complex_double* v1t, lapack_int ldv1t, lapack_complex_double* v2t,
lapack_int ldv2t, double* b11d, double* b11e, double* b12d, double* b12e, double* b21d,
double* b21e, double* b22d, double* b22e );
Include Files
• mkl.h
Description
The routine ?bbcsd computes the CS decomposition of an orthogonal or unitary matrix in
bidiagonal-block form:
or
respectively.
x is m-by-m with the top-left block p-by-q. Note that q must not be larger than p, m-p, or m-q. If q is not
the smallest index, x must be transposed and/or permuted in constant time using the trans option.
See ?orcsd/?uncsd for details.
The bidiagonal matrices b11, b12, b21, and b22 are represented implicitly by angles theta(1:q) and
phi(1:q-1).
The orthogonal/unitary matrices u1, u2, v1t, and v2t are input/output. The input matrices are pre- or post-
multiplied by the appropriate singular vector matrices.
Input Parameters
trans = 'T': x, u1, u2, v1t, v2t are stored in row-major order.
q The number of columns in the top-left block of x. 0 ≤ q ≤ min(p, m-p, m-q).
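The constraint on q is worth checking before the call, since it decides whether the trans option is needed. This is a sketch only: bbcsd_q_is_valid is a hypothetical helper, int stands in for lapack_int, and the additional range checks 0 ≤ p ≤ m and q ≤ m are assumptions added so that m-p and m-q are non-negative.

```c
#include <assert.h>

/* Returns 1 if 0 <= q <= min(p, m - p, m - q), and 0 otherwise.
 * Also rejects dimensions that cannot describe an m-by-m matrix with a
 * p-by-q top-left block (assumed: 0 <= p <= m and 0 <= q <= m). */
int bbcsd_q_is_valid(int m, int p, int q)
{
    if (m < 0 || p < 0 || p > m || q < 0 || q > m)
        return 0;
    int bound = p;
    if (m - p < bound)
        bound = m - p;
    if (m - q < bound)
        bound = m - q;
    assert(bound >= 0);
    return q <= bound;
}
```

When the check fails, the matrix would have to be transposed and/or permuted as described for the trans option.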
ldu1 The leading dimension of the array u1, ldu1 ≥ max(1, p).
ldu2 The leading dimension of the array u2, ldu2 ≥ max(1, m-p).
ldv1t The leading dimension of the array v1t, ldv1t ≥ max(1, q).
ldv2t The leading dimension of the array v2t, ldv2t ≥ max(1, m-q).
Output Parameters
theta On exit, the angles whose cosines and sines define the diagonal blocks in
the CS decomposition.
v2t On exit, v2t is premultiplied by the transpose of the right singular vector
matrix common to [ b12 0 0 ; 0 -I 0 ] and [ b22 0 0 ; 0 0 I ].
When ?bbcsd converges, b11e contains zeros. If ?bbcsd fails to converge,
b11e contains the superdiagonal of the partially reduced top left block.
Return Values
This function returns a value info.
If info > 0 and if ?bbcsd did not converge, info specifies the number of nonzero entries in phi, and b11d,
b11e, etc. contain the partially reduced matrix.
See Also
?orcsd/?uncsd
xerbla
?orbdb/?unbdb
Simultaneously bidiagonalizes the blocks of a
partitioned orthogonal/unitary matrix.
Syntax
lapack_int LAPACKE_sorbdb( int matrix_layout, char trans, char signs, lapack_int m,
lapack_int p, lapack_int q, float* x11, lapack_int ldx11, float* x12, lapack_int ldx12,
float* x21, lapack_int ldx21, float* x22, lapack_int ldx22, float* theta, float* phi,
float* taup1, float* taup2, float* tauq1, float* tauq2 );
lapack_int LAPACKE_dorbdb( int matrix_layout, char trans, char signs, lapack_int m,
lapack_int p, lapack_int q, double* x11, lapack_int ldx11, double* x12, lapack_int
ldx12, double* x21, lapack_int ldx21, double* x22, lapack_int ldx22, double* theta,
double* phi, double* taup1, double* taup2, double* tauq1, double* tauq2 );
lapack_int LAPACKE_cunbdb( int matrix_layout, char trans, char signs, lapack_int m,
lapack_int p, lapack_int q, lapack_complex_float* x11, lapack_int ldx11,
lapack_complex_float* x12, lapack_int ldx12, lapack_complex_float* x21, lapack_int
ldx21, lapack_complex_float* x22, lapack_int ldx22, float* theta, float* phi,
lapack_complex_float* taup1, lapack_complex_float* taup2, lapack_complex_float* tauq1,
lapack_complex_float* tauq2 );
lapack_int LAPACKE_zunbdb( int matrix_layout, char trans, char signs, lapack_int m,
lapack_int p, lapack_int q, lapack_complex_double* x11, lapack_int ldx11,
lapack_complex_double* x12, lapack_int ldx12, lapack_complex_double* x21, lapack_int
ldx21, lapack_complex_double* x22, lapack_int ldx22, double* theta, double* phi,
lapack_complex_double* taup1, lapack_complex_double* taup2, lapack_complex_double*
tauq1, lapack_complex_double* tauq2 );
Include Files
• mkl.h
Description
The routines ?orbdb/?unbdb simultaneously bidiagonalize the blocks of an m-by-m partitioned orthogonal
matrix X:
or unitary matrix:
x11 is p-by-q. q must not be larger than p, m-p, or m-q. Otherwise, x must be transposed and/or permuted
in constant time using the trans and signs options.
The orthogonal/unitary matrices p1, p2, q1, and q2 are p-by-p, (m-p)-by-(m-p), q-by-q, (m-q)-by-(m-q),
respectively. They are represented implicitly by Householder vectors.
The bidiagonal matrices b11, b12, b21, and b22 are q-by-q bidiagonal matrices represented implicitly by angles
theta[0], ..., theta[q - 1] and phi[0], ..., phi[q - 2]. b11 and b12 are upper bidiagonal, while b21 and
b22 are lower bidiagonal. Every entry in each bidiagonal band is a product of a sine or cosine of theta with a
sine or cosine of phi. See [Sutton09] for details.
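The size constraint on q stated above can be checked before calling the routine. The following is a minimal sketch, not part of oneMKL: the helper names csd_partition_ok and min3 are hypothetical, and the additional check 0 ≤ p ≤ m is an assumption taken from the usual LAPACK argument rules rather than from this section.

```c
#include <stdbool.h>

/* Hypothetical helper: smallest of three values. */
static long min3(long a, long b, long c)
{
    long t = a < b ? a : b;
    return t < c ? t : c;
}

/* Hypothetical helper, not a oneMKL function.
 * Returns true when the partition sizes satisfy
 * 0 <= q <= min(p, m-p, m-q), as documented for ?orbdb/?unbdb.
 * The check 0 <= p <= m is an assumption (usual LAPACK rule). */
bool csd_partition_ok(long m, long p, long q)
{
    if (m < 0 || p < 0 || p > m || q < 0) {
        return false;
    }
    return q <= min3(p, m - p, m - q);
}
```

If the check fails, the text above indicates that the matrix has to be transposed and/or permuted (through the trans and signs options) so that q becomes the smallest dimension.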
Input Parameters
trans = 'T': x, u1, u2, v1t, v2t are stored in row-major order.
q The number of columns in x11 and x21. 0 ≤q≤ min(p,m-p,m-q).
x11 Array, size max(1, ldx11*q) for column major layout and max(1,
ldx11*p) for row major layout.
On entry, the top-left block of the orthogonal/unitary matrix to be reduced.
ldx11 The leading dimension of the array X11. If trans = 'N', ldx11≥p for column
major layout and ldx11≥q for row major layout. Otherwise, ldx11≥q.
x12 Array, size max(1, ldx12*(m-q)) for column major layout and max(1,
ldx12*p) for row major layout.
On entry, the top-right block of the orthogonal/unitary matrix to be
reduced.
ldx12 The leading dimension of the array X12. If trans = 'N', ldx12≥p for column
major layout and ldx12≥m-q for row major layout. Otherwise,
ldx12≥m-q.
x21 Array, size max(1, ldx21*q) for column major layout and max(1,
ldx21*(m-p)) for row major layout.
On entry, the bottom-left block of the orthogonal/unitary matrix to be
reduced.
ldx21 The leading dimension of the array X21. If trans = 'N', ldx21≥m-p for
column major layout and ldx21≥q for row major layout. Otherwise,
ldx21≥q.
x22 Array, size max(1, ldx22*(m-q)) for column major layout and max(1,
ldx22*(m-p)) for row major layout.
On entry, the bottom-right block of the orthogonal/unitary matrix to be
reduced.
ldx22 The leading dimension of the array X22. If trans = 'N', ldx22≥m-p for
column major layout and ldx22≥m-q for row major layout. Otherwise,
ldx22≥m-q.
Output Parameters
If trans='N', the columns of the upper triangle of x12 specify the first
p reflectors for q2;
otherwise (trans='T'), the columns of the lower triangle of x12 specify the first
p reflectors for q2.
theta Array, size q. The entries of bidiagonal blocks b11, b12, b21, and b22 can be
computed from the angles theta and phi. See the Description section for
details.
phi Array, size q-1. The entries of bidiagonal blocks b11, b12, b21, and b22 can
be computed from the angles theta and phi. See the Description section
for details.
Return Values
This function returns a value info.
See Also
?orcsd/?uncsd
?orgqr
?ungqr
?orglq
?unglq
xerbla
Generalized LLS Problems
Symmetric Eigenproblems
Nonsymmetric Eigenproblems
Singular Value Decomposition
Cosine-Sine Decomposition
Generalized Symmetric Definite Eigenproblems
Generalized Nonsymmetric Eigenproblems
gelsy Computes the minimum-norm solution to a linear least squares problem using a
complete orthogonal factorization of A.
gelss Computes the minimum-norm solution to a linear least squares problem using the
singular value decomposition of A.
gelsd Computes the minimum-norm solution to a linear least squares problem using the
singular value decomposition of A and a divide and conquer method.
?gels
Uses QR or LQ factorization to solve an overdetermined
or underdetermined linear system with full rank
matrix.
Syntax
lapack_int LAPACKE_sgels (int matrix_layout, char trans, lapack_int m, lapack_int n,
lapack_int nrhs, float* a, lapack_int lda, float* b, lapack_int ldb);
lapack_int LAPACKE_dgels (int matrix_layout, char trans, lapack_int m, lapack_int n,
lapack_int nrhs, double* a, lapack_int lda, double* b, lapack_int ldb);
lapack_int LAPACKE_cgels (int matrix_layout, char trans, lapack_int m, lapack_int n,
lapack_int nrhs, lapack_complex_float* a, lapack_int lda, lapack_complex_float* b,
lapack_int ldb);
lapack_int LAPACKE_zgels (int matrix_layout, char trans, lapack_int m, lapack_int n,
lapack_int nrhs, lapack_complex_double* a, lapack_int lda, lapack_complex_double* b,
lapack_int ldb);
Include Files
• mkl.h
Description
The routine solves overdetermined or underdetermined real/complex linear systems involving an m-by-n
matrix A, or its transpose/conjugate-transpose, using a QR or LQ factorization of A. It is assumed that A has
full rank.
The following options are provided:
1. If trans = 'N' and m≥n: find the least squares solution of an overdetermined system, that is, solve the
least squares problem
minimize ||b - A*x||_2
2. If trans = 'N' and m < n: find the minimum norm solution of an underdetermined system A*X = B.
3. If trans = 'T' or 'C' and m≥n: find the minimum norm solution of an underdetermined system A^H*X = B.
4. If trans = 'T' or 'C' and m < n: find the least squares solution of an overdetermined system, that is,
solve the least squares problem
minimize ||b - A^H*x||_2
Several right hand side vectors b and solution vectors x can be handled in a single call; they are formed by
the columns of the right hand side matrix B and the solution matrix X (when the coefficient matrix is A, B is
m-by-nrhs and X is n-by-nrhs; if the coefficient matrix is A^T or A^H, B is n-by-nrhs and X is m-by-nrhs).
Input Parameters
If trans = 'T', the linear system involves the transposed matrix A^T (for
real flavors only);
If trans = 'C', the linear system involves the conjugate-transposed
matrix A^H (for complex flavors only).
nrhs The number of right-hand sides; the number of columns in B (nrhs≥ 0).
a, b Arrays:
a(size max(1, lda*n) for column major layout and max(1, lda*m) for row
major layout) contains the m-by-n matrix A.
b(size max(1, ldb*nrhs) for column major layout and max(1, ldb*max(m,
n)) for row major layout) contains the matrix B of right hand side vectors.
lda The leading dimension of a; at least max(1, m) for column major layout and
at least max(1, n) for row major layout.
Output Parameters
if trans = 'N' and m≥n, rows 1 to n of b contain the least squares solution
vectors; the residual sum of squares for the solution in each column is
given by the sum of squares of modulus of elements n+1 to m in that
column;
if trans = 'N' and m < n, rows 1 to n of b contain the minimum norm
solution vectors;
if trans = 'T' or 'C' and m≥n, rows 1 to m of b contain the minimum
norm solution vectors;
if trans = 'T' or 'C' and m < n, rows 1 to m of b contain the least
squares solution vectors; the residual sum of squares for the solution in
each column is given by the sum of squares of modulus of elements m+1 to
n in that column.
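The residual rule above can be illustrated for the first case (trans = 'N', m ≥ n). This is a sketch under stated assumptions: the helper name gels_residual_ss is hypothetical, b is assumed to be the double-precision output of a successful call, and column major layout with leading dimension ldb is assumed.

```c
#include <stddef.h>

/* Hypothetical helper, not a oneMKL function.
 * For trans = 'N' and m >= n, returns the residual sum of squares of
 * right-hand side column j (0-based), computed from elements n+1..m
 * (1-based) of that column of b. Assumes column major storage. */
double gels_residual_ss(const double *b, size_t ldb,
                        size_t m, size_t n, size_t j)
{
    double ss = 0.0;
    for (size_t i = n; i < m; ++i) {      /* 0-based rows n..m-1 */
        double r = b[i + j * ldb];
        ss += r * r;
    }
    return ss;
}
```

For the complex flavors the same idea applies with the squared modulus of each element in place of r * r.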
Return Values
This function returns a value info.
If info = i, the i-th diagonal element of the triangular factor of A is zero, so that A does not have full rank;
the least squares solution could not be computed.
?gelsy
Computes the minimum-norm solution to a linear least
squares problem using a complete orthogonal
factorization of A.
Syntax
lapack_int LAPACKE_sgelsy( int matrix_layout, lapack_int m, lapack_int n, lapack_int
nrhs, float* a, lapack_int lda, float* b, lapack_int ldb, lapack_int* jpvt, float rcond,
lapack_int* rank );
lapack_int LAPACKE_dgelsy( int matrix_layout, lapack_int m, lapack_int n, lapack_int
nrhs, double* a, lapack_int lda, double* b, lapack_int ldb, lapack_int* jpvt, double
rcond, lapack_int* rank );
lapack_int LAPACKE_cgelsy( int matrix_layout, lapack_int m, lapack_int n, lapack_int
nrhs, lapack_complex_float* a, lapack_int lda, lapack_complex_float* b, lapack_int ldb,
lapack_int* jpvt, float rcond, lapack_int* rank );
lapack_int LAPACKE_zgelsy( int matrix_layout, lapack_int m, lapack_int n, lapack_int
nrhs, lapack_complex_double* a, lapack_int lda, lapack_complex_double* b, lapack_int
ldb, lapack_int* jpvt, double rcond, lapack_int* rank );
Include Files
• mkl.h
Description
The ?gelsy routine computes the minimum-norm solution to a real/complex linear least squares problem:
with R11 defined as the largest leading submatrix whose estimated condition number is less than 1/rcond.
The order of R11, rank, is the effective rank of A. Then, R22 is considered to be negligible, and R12 is
annihilated by orthogonal/unitary transformations from the right, arriving at the complete orthogonal
factorization:
where Q1 consists of the first rank columns of Q.
The ?gelsy routine is identical to the original deprecated ?gelsx routine except for the following
differences:
• The call to the subroutine ?geqpf has been substituted by the call to the subroutine ?geqp3, which is a
BLAS-3 version of the QR factorization with column pivoting.
• The matrix B (the right hand side) is updated with BLAS-3.
• The permutation of the matrix B (the right hand side) is faster and simpler.
Input Parameters
nrhs The number of right-hand sides; the number of columns in B (nrhs≥ 0).
a, b Arrays:
a(size max(1, lda*n) for column major layout and max(1, lda*m) for row
major layout) contains the m-by-n matrix A.
b(size max(1, ldb*nrhs) for column major layout and max(1, ldb*max(m,
n)) for row major layout) contains the m-by-nrhs right hand side matrix B.
lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.
rcond rcond is used to determine the effective rank of A, which is defined as the
order of the largest leading triangular submatrix R11 in the QR factorization
with pivoting of A, whose estimated condition number is less than 1/rcond.
Output Parameters
jpvt On exit, if jpvt[i - 1]= k, then the i-th column of AP was the k-th column of
A.
rank The effective rank of A, that is, the order of the submatrix R11. This is the
same as the order of the submatrix T11 in the complete orthogonal
factorization of A.
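The meaning of jpvt on exit can be made concrete by rebuilding the permuted matrix AP from A. The sketch below is illustrative only: apply_column_pivots is a hypothetical helper, plain int stands in for lapack_int, and column major storage is assumed.

```c
#include <stddef.h>

/* Hypothetical helper, not a oneMKL function.
 * Builds AP from A using the documented rule: if jpvt[i - 1] = k, then
 * the i-th column of AP was the k-th column of A (i and k are 1-based).
 * Both matrices are m-by-n, column major. */
void apply_column_pivots(const double *a, size_t lda, size_t m, size_t n,
                         const int *jpvt, double *ap, size_t ldap)
{
    for (size_t c = 0; c < n; ++c) {
        size_t src = (size_t)(jpvt[c] - 1);   /* convert 1-based to 0-based */
        for (size_t r = 0; r < m; ++r) {
            ap[r + c * ldap] = a[r + src * lda];
        }
    }
}
```

Note that the routine overwrites a on exit, so a copy of the original A would be needed to apply the permutation this way.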
Return Values
This function returns a value info.
?gelss
Computes the minimum-norm solution to a linear least
squares problem using the singular value
decomposition of A.
Syntax
lapack_int LAPACKE_sgelss( int matrix_layout, lapack_int m, lapack_int n, lapack_int
nrhs, float* a, lapack_int lda, float* b, lapack_int ldb, float* s, float rcond,
lapack_int* rank );
lapack_int LAPACKE_dgelss( int matrix_layout, lapack_int m, lapack_int n, lapack_int
nrhs, double* a, lapack_int lda, double* b, lapack_int ldb, double* s, double rcond,
lapack_int* rank );
lapack_int LAPACKE_cgelss( int matrix_layout, lapack_int m, lapack_int n, lapack_int
nrhs, lapack_complex_float* a, lapack_int lda, lapack_complex_float* b, lapack_int ldb,
float* s, float rcond, lapack_int* rank );
lapack_int LAPACKE_zgelss( int matrix_layout, lapack_int m, lapack_int n, lapack_int
nrhs, lapack_complex_double* a, lapack_int lda, lapack_complex_double* b, lapack_int
ldb, double* s, double rcond, lapack_int* rank );
Include Files
• mkl.h
Description
The routine computes the minimum norm solution to a real linear least squares problem:
minimize ||b - A*x||_2
using the singular value decomposition (SVD) of A. A is an m-by-n matrix which may be rank-deficient.
Several right hand side vectors b and solution vectors x can be handled in a single call; they are stored as
the columns of the m-by-nrhs right hand side matrix B and the n-by-nrhs solution matrix X. The effective
rank of A is determined by treating as zero those singular values which are less than rcond times the largest
singular value.
Input Parameters
nrhs The number of right-hand sides; the number of columns in B (nrhs≥ 0).
a, b Arrays:
a(size max(1, lda*n) for column major layout and max(1, lda*m) for row
major layout) contains the m-by-n matrix A.
b(size max(1, ldb*nrhs) for column major layout and max(1, ldb*max(m,
n)) for row major layout) contains the m-by-nrhs right hand side matrix B.
lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.
rcond rcond is used to determine the effective rank of A. Singular values s(i)
≤ rcond*s(1) are treated as zero.
If rcond < 0, machine precision is used instead.
Output Parameters
a On exit, the first min(m, n) rows of a are overwritten with the matrix of
right singular vectors of A, stored row-wise.
rank The effective rank of A, that is, the number of singular values which are
greater than rcond *s(1).
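The rank rule can be reproduced from the returned singular values. This is a minimal sketch, not the library's implementation: effective_rank is a hypothetical helper, s is assumed to hold the singular values in decreasing order (so s[0] is the largest), and DBL_EPSILON is used as a stand-in for "machine precision", which may differ from the value the routine uses internally.

```c
#include <float.h>
#include <stddef.h>

/* Hypothetical helper, not a oneMKL function.
 * Counts the singular values strictly greater than rcond * s[0].
 * s has k entries, assumed sorted in decreasing order.
 * A negative rcond selects DBL_EPSILON (an assumption, see lead-in). */
size_t effective_rank(const double *s, size_t k, double rcond)
{
    if (k == 0) {
        return 0;
    }
    double tol = (rcond < 0.0 ? DBL_EPSILON : rcond) * s[0];
    size_t rank = 0;
    for (size_t i = 0; i < k; ++i) {
        if (s[i] > tol) {
            ++rank;
        }
    }
    return rank;
}
```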
Return Values
This function returns a value info.
If info = i, then the algorithm for computing the SVD failed to converge; i indicates the number of off-
diagonal elements of an intermediate bidiagonal form which did not converge to zero.
?gelsd
Computes the minimum-norm solution to a linear least
squares problem using the singular value
decomposition of A and a divide and conquer method.
Syntax
lapack_int LAPACKE_sgelsd( int matrix_layout, lapack_int m, lapack_int n, lapack_int
nrhs, float* a, lapack_int lda, float* b, lapack_int ldb, float* s, float rcond,
lapack_int* rank );
Include Files
• mkl.h
Description
The routine computes the minimum-norm solution to a real linear least squares problem:
minimize ||b - A*x||_2
using the singular value decomposition (SVD) of A. A is an m-by-n matrix which may be rank-deficient.
Several right hand side vectors b and solution vectors x can be handled in a single call; they are stored as
the columns of the m-by-nrhs right hand side matrix B and the n-by-nrhs solution matrix X.
The problem is solved in three steps:
1. Reduce the coefficient matrix A to bidiagonal form with Householder transformations, reducing the
original problem into a "bidiagonal least squares problem" (BLS).
2. Solve the BLS using a divide and conquer approach.
3. Apply back all the Householder transformations to solve the original least squares problem.
The effective rank of A is determined by treating as zero those singular values which are less than rcond
times the largest singular value.
Input Parameters
nrhs The number of right-hand sides; the number of columns in B (nrhs≥ 0).
a, b Arrays:
a(size max(1, lda*n) for column major layout and max(1, lda*m) for row
major layout) contains the m-by-n matrix A.
b(size max(1, ldb*nrhs) for column major layout and max(1, ldb*max(m,
n)) for row major layout) contains the m-by-nrhs right hand side matrix B.
lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.
ldb The leading dimension of b; must be at least max(1, m, n) for column
major layout and at least max(1, nrhs) for row major layout.
rcond rcond is used to determine the effective rank of A. Singular values s(i)
≤ rcond*s(1) are treated as zero. If rcond ≤ 0, machine precision is used
instead.
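The layout-dependent size rules for a, b, lda, and ldb listed above can be collected in one place. The following sketch only restates those rules; the names sketch_layout, ls_buffer_dims, and gelsd_min_dims are hypothetical, the enum is a local stand-in for the matrix_layout argument (not the oneMKL constants), and non-negative m, n, and nrhs are assumed.

```c
#include <assert.h>

/* Local stand-in for the matrix_layout argument so that the sketch
 * compiles without mkl.h. These are not the oneMKL constants. */
typedef enum { SKETCH_COL_MAJOR, SKETCH_ROW_MAJOR } sketch_layout;

typedef struct {
    long lda;    /* smallest admissible leading dimension of a */
    long ldb;    /* smallest admissible leading dimension of b */
    long a_len;  /* number of elements needed in a for that lda */
    long b_len;  /* number of elements needed in b for that ldb */
} ls_buffer_dims;

static long max2(long x, long y) { return x > y ? x : y; }

/* Hypothetical helper, not a oneMKL function. */
ls_buffer_dims gelsd_min_dims(sketch_layout layout, long m, long n, long nrhs)
{
    assert(m >= 0 && n >= 0 && nrhs >= 0);   /* assumed preconditions */
    ls_buffer_dims d;
    if (layout == SKETCH_COL_MAJOR) {
        d.lda = max2(1, m);
        d.ldb = max2(1, max2(m, n));
        d.a_len = max2(1, d.lda * n);
        d.b_len = max2(1, d.ldb * nrhs);
    } else {
        d.lda = max2(1, n);
        d.ldb = max2(1, nrhs);
        d.a_len = max2(1, d.lda * m);
        d.b_len = max2(1, d.ldb * max2(m, n));
    }
    return d;
}
```

The b buffer is sized by max(m, n) rows because the same array holds the m-by-nrhs right-hand sides on entry and the n-by-nrhs solution on exit.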
Output Parameters
rank The effective rank of A, that is, the number of singular values which are
greater than rcond *s(1).
Return Values
This function returns a value info.
If info = i, then the algorithm for computing the SVD failed to converge; i indicates the number of off-
diagonal elements of an intermediate bidiagonal form that did not converge to zero.
gglse Solves the linear equality-constrained least squares problem using a generalized RQ
factorization.
?gglse
Solves the linear equality-constrained least squares
problem using a generalized RQ factorization.
Syntax
lapack_int LAPACKE_sgglse (int matrix_layout, lapack_int m, lapack_int n, lapack_int p,
float* a, lapack_int lda, float* b, lapack_int ldb, float* c, float* d, float* x);
lapack_int LAPACKE_dgglse (int matrix_layout, lapack_int m, lapack_int n, lapack_int p,
double* a, lapack_int lda, double* b, lapack_int ldb, double* c, double* d, double* x);
Include Files
• mkl.h
Description
The routine solves the linear equality-constrained least squares (LSE) problem:
minimize ||c - A*x||_2 subject to B*x = d
where A is an m-by-n matrix, B is a p-by-n matrix, c is a given m-vector, and d is a given p-vector. It is
assumed that p≤n≤m+p, and
These conditions ensure that the LSE problem has a unique solution, which is obtained using a generalized
RQ factorization of the matrices (B, A) given by
Input Parameters
a, b, c, d Arrays:
a(size max(1, lda*n) for column major layout and max(1, lda*m) for row
major layout) contains the m-by-n matrix A.
b(size max(1, ldb*n) for column major layout and max(1, ldb*p) for row
major layout) contains the p-by-n matrix B.
c, size at least max(1, m), contains the right hand side vector for the least
squares part of the LSE problem.
d, size at least max(1, p), contains the right hand side vector for the
constrained equation.
lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.
ldb The leading dimension of b; at least max(1, p) for column major layout and
max(1, n) for row major layout.
Output Parameters
a The elements on and above the diagonal contain the min(m, n)-by-n upper
trapezoidal matrix T as returned by ?ggrqf.
b On exit, the upper right triangle contains the p-by-p upper triangular matrix
R as returned by ?ggrqf.
d On exit, d is destroyed.
c On exit, the residual sum-of-squares for the solution is given by the sum of
squares of elements n-p+1 to m of vector c.
Return Values
This function returns a value info.
If info = 1, the upper triangular factor R associated with B in the generalized RQ factorization of the pair
(B, A) is singular, so that rank(B) < p; the least squares solution could not be computed.
If info = 2, the (n-p)-by-(n-p) part of the upper trapezoidal factor T associated with A in the generalized
RQ factorization of the pair (B, A) is singular, so that
?ggglm
Solves a general Gauss-Markov linear model problem
using a generalized QR factorization.
Syntax
lapack_int LAPACKE_sggglm (int matrix_layout, lapack_int n, lapack_int m, lapack_int p,
float* a, lapack_int lda, float* b, lapack_int ldb, float* d, float* x, float* y);
lapack_int LAPACKE_dggglm (int matrix_layout, lapack_int n, lapack_int m, lapack_int p,
double* a, lapack_int lda, double* b, lapack_int ldb, double* d, double* x, double* y);
lapack_int LAPACKE_cggglm (int matrix_layout, lapack_int n, lapack_int m, lapack_int p,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* b, lapack_int ldb,
lapack_complex_float* d, lapack_complex_float* x, lapack_complex_float* y);
lapack_int LAPACKE_zggglm (int matrix_layout, lapack_int n, lapack_int m, lapack_int p,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* b, lapack_int ldb,
lapack_complex_double* d, lapack_complex_double* x, lapack_complex_double* y);
Include Files
• mkl.h
Description
Under these assumptions, the constrained equation is always consistent, and there is a unique solution x and
a minimal 2-norm solution y, which is obtained using a generalized QR factorization of the matrices (A, B )
given by
In particular, if matrix B is square nonsingular, then the problem GLM is equivalent to the following weighted
linear least squares problem
minimize_x ||B^(-1)*(d - A*x)||_2.
Input Parameters
a, b, d Arrays:
a(size max(1, lda*m) for column major layout and max(1, lda*n) for row
major layout) contains the n-by-m matrix A.
b(size max(1, ldb*p) for column major layout and max(1, ldb*n) for row
major layout) contains the n-by-p matrix B.
d, size at least max(1, n), contains the left hand side of the GLM equation.
lda The leading dimension of a; at least max(1, n) for column major layout and
max(1, m) for row major layout.
ldb The leading dimension of b; at least max(1, n) for column major layout and
max(1, p) for row major layout.
Output Parameters
a On exit, the upper triangular part of the array a contains the m-by-m upper
triangular matrix R.
d On exit, d is destroyed.
Return Values
This function returns a value info.
If info = 1, the upper triangular factor R associated with A in the generalized QR factorization of the pair
(A, B) is singular, so that rank(A) < m; the least squares solution could not be computed.
If info = 2, the bottom (n-m)-by-(n-m) part of the upper trapezoidal factor T associated with B in the
generalized QR factorization of the pair (A, B) is singular, so that rank(AB) < n; the least squares solution
could not be computed.
syevd/heevd Computes all eigenvalues and (optionally) all eigenvectors of a real symmetric /
Hermitian matrix using divide and conquer algorithm.
spevd/hpevd Uses divide and conquer algorithm to compute all eigenvalues and (optionally) all
eigenvectors of a real symmetric / Hermitian matrix held in packed storage.
sbev /hbev Computes all eigenvalues and, optionally, eigenvectors of a real symmetric /
Hermitian band matrix.
sbevd/hbevd Computes all eigenvalues and (optionally) all eigenvectors of a real symmetric /
Hermitian band matrix using divide and conquer algorithm.
stevd Computes all eigenvalues and (optionally) all eigenvectors of a real symmetric
tridiagonal matrix using divide and conquer algorithm.
?syev
Computes all eigenvalues and, optionally,
eigenvectors of a real symmetric matrix.
Syntax
lapack_int LAPACKE_ssyev (int matrix_layout, char jobz, char uplo, lapack_int n, float*
a, lapack_int lda, float* w);
lapack_int LAPACKE_dsyev (int matrix_layout, char jobz, char uplo, lapack_int n,
double* a, lapack_int lda, double* w);
Include Files
• mkl.h
Description
The routine computes all eigenvalues and, optionally, eigenvectors of a real symmetric matrix A.
Note that for most cases of real symmetric eigenvalue problems the default choice should be the syevr
function, as its underlying algorithm is faster and uses less workspace.
Input Parameters
Output Parameters
(if uplo = 'L') or the upper triangle (if uplo = 'U') of A, including the
diagonal, is overwritten.
Return Values
This function returns a value info.
If info = i, then the algorithm failed to converge; i indicates the number of elements of an intermediate
tridiagonal form which did not converge to zero.
?heev
Computes all eigenvalues and, optionally,
eigenvectors of a Hermitian matrix.
Syntax
lapack_int LAPACKE_cheev( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_complex_float* a, lapack_int lda, float* w );
lapack_int LAPACKE_zheev( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_complex_double* a, lapack_int lda, double* w );
Include Files
• mkl.h
Description
The routine computes all eigenvalues and, optionally, eigenvectors of a complex Hermitian matrix A.
Note that for most cases of complex Hermitian eigenvalue problems the default choice should be the heevr
function, as its underlying algorithm is faster and uses less workspace.
Input Parameters
lda The leading dimension of the array a. Must be at least max(1, n).
Output Parameters
(if uplo = 'L') or the upper triangle (if uplo = 'U') of A, including the
diagonal, is overwritten.
Return Values
This function returns a value info.
If info = i, then the algorithm failed to converge; i indicates the number of elements of an intermediate
tridiagonal form which did not converge to zero.
?syevd
Computes all eigenvalues and, optionally, all
eigenvectors of a real symmetric matrix using divide
and conquer algorithm.
Syntax
lapack_int LAPACKE_ssyevd (int matrix_layout, char jobz, char uplo, lapack_int n,
float* a, lapack_int lda, float* w);
lapack_int LAPACKE_dsyevd (int matrix_layout, char jobz, char uplo, lapack_int n,
double* a, lapack_int lda, double* w);
Include Files
• mkl.h
Description
The routine computes all the eigenvalues, and optionally all the eigenvectors, of a real symmetric matrix A.
In other words, it can compute the spectral factorization of A as: A = Z*Λ*Z^T.
Here Λ is a diagonal matrix whose diagonal elements are the eigenvalues λ_i, and Z is the orthogonal matrix
whose columns are the eigenvectors z_i. Thus,
A*z_i = λ_i*z_i for i = 1, 2, ..., n.
If the eigenvectors are requested, then this routine uses a divide and conquer algorithm to compute
eigenvalues and eigenvectors. However, if only eigenvalues are required, then it uses the Pal-Walker-Kahan
variant of the QL or QR algorithm.
Note that for most cases of real symmetric eigenvalue problems the default choice should be the syevr
function, as its underlying algorithm is faster and uses less workspace. ?syevd requires more workspace but
is faster in some cases, especially for large matrices.
Input Parameters
Output Parameters
Return Values
This function returns a value info.
If info = i, and jobz = 'N', then the algorithm failed to converge; i indicates the number of off-diagonal
elements of an intermediate tridiagonal form which did not converge to zero.
If info = i, and jobz = 'V', then the algorithm failed to compute an eigenvalue while working on the
submatrix lying in rows and columns info/(n+1) through mod(info,n+1).
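For jobz = 'V', the positive info value packs two indices into one integer. The sketch below decodes them using integer division and remainder, as the expressions info/(n+1) and mod(info,n+1) suggest; the type submatrix_range and the function syevd_failed_block are hypothetical names, and long stands in for lapack_int.

```c
#include <assert.h>

/* Hypothetical result type: 1-based first and last row/column of the
 * submatrix on which the eigenvalue computation failed. */
typedef struct {
    long first;
    long last;
} submatrix_range;

/* Hypothetical helper, not a oneMKL function.
 * Decodes info > 0 returned with jobz = 'V' for a matrix of order n:
 * first = info / (n + 1), last = info mod (n + 1). */
submatrix_range syevd_failed_block(long info, long n)
{
    assert(info > 0 && n > 0);
    submatrix_range r = { info / (n + 1), info % (n + 1) };
    return r;
}
```

For example, with n = 10 an info value of 40 decodes to rows and columns 3 through 7, since 40 = 3*11 + 7.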
Application Notes
The computed eigenvalues and eigenvectors are exact for a matrix A+E such that ||E||_2 = O(ε)*||A||_2,
where ε is the machine precision.
The complex analogue of this routine is heevd.
?heevd
Computes all eigenvalues and, optionally, all
eigenvectors of a complex Hermitian matrix using
divide and conquer algorithm.
Syntax
lapack_int LAPACKE_cheevd( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_complex_float* a, lapack_int lda, float* w );
lapack_int LAPACKE_zheevd( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_complex_double* a, lapack_int lda, double* w );
Include Files
• mkl.h
Description
The routine computes all the eigenvalues, and optionally all the eigenvectors, of a complex Hermitian matrix
A. In other words, it can compute the spectral factorization of A as: A = Z*Λ*Z^H.
Here Λ is a real diagonal matrix whose diagonal elements are the eigenvalues λ_i, and Z is the (complex)
unitary matrix whose columns are the eigenvectors z_i. Thus,
A*z_i = λ_i*z_i for i = 1, 2, ..., n.
If the eigenvectors are requested, then this routine uses a divide and conquer algorithm to compute
eigenvalues and eigenvectors. However, if only eigenvalues are required, then it uses the Pal-Walker-Kahan
variant of the QL or QR algorithm.
Note that for most cases of complex Hermitian eigenvalue problems the default choice should be the heevr
function, as its underlying algorithm is faster and uses less workspace. ?heevd requires more workspace but
is faster in some cases, especially for large matrices.
Input Parameters
lda The leading dimension of the array a. Must be at least max(1, n).
Output Parameters
a If jobz = 'V', then on exit this array is overwritten by the unitary matrix
Z which contains the eigenvectors of A.
Return Values
This function returns a value info.
If info = i, and jobz = 'N', then the algorithm failed to converge; i off-diagonal elements of an
intermediate tridiagonal form did not converge to zero;
if info = i, and jobz = 'V', then the algorithm failed to compute an eigenvalue while working on the
submatrix lying in rows and columns info/(n+1) through mod(info, n+1).
Application Notes
The computed eigenvalues and eigenvectors are exact for a matrix A + E such that ||E||_2 = O(ε)*||A||_2,
where ε is the machine precision.
The real analogue of this routine is syevd. See also hpevd for matrices held in packed storage, and hbevd for
banded matrices.
?syevx
Computes selected eigenvalues and, optionally,
eigenvectors of a symmetric matrix.
Syntax
lapack_int LAPACKE_ssyevx (int matrix_layout, char jobz, char range, char uplo,
lapack_int n, float* a, lapack_int lda, float vl, float vu, lapack_int il, lapack_int
iu, float abstol, lapack_int* m, float* w, float* z, lapack_int ldz, lapack_int* ifail);
lapack_int LAPACKE_dsyevx (int matrix_layout, char jobz, char range, char uplo,
lapack_int n, double* a, lapack_int lda, double vl, double vu, lapack_int il, lapack_int
iu, double abstol, lapack_int* m, double* w, double* z, lapack_int ldz, lapack_int*
ifail);
Include Files
• mkl.h
Description
The routine computes selected eigenvalues and, optionally, eigenvectors of a real symmetric matrix A.
Eigenvalues and eigenvectors can be selected by specifying either a range of values or a range of indices for
the desired eigenvalues.
Note that for most real symmetric eigenvalue problems the default choice should be the syevr function,
as its underlying algorithm is faster and uses less workspace. ?syevx is faster when only a few
eigenvalues are selected.
Input Parameters
If range = 'V', all eigenvalues in the half-open interval (vl, vu] will be
found.
If range = 'I', the eigenvalues with indices il through iu will be found.
vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues; vl ≤ vu. Not referenced if range = 'A' or 'I'.
il, iu If range = 'I', the indices of the smallest and largest eigenvalues to be
returned.
Constraints: 1 ≤ il ≤ iu ≤ n, if n > 0;
il = 1 and iu = 0, if n = 0.
Not referenced if range = 'A' or 'V'.
abstol The absolute error tolerance for the eigenvalues. See Application Notes for
more information.
If jobz = 'V', then ldz ≥ max(1, n) for column major layout and ldz ≥
max(1, m) for row major layout.
Output Parameters
a On exit, the lower triangle (if uplo = 'L') or the upper triangle (if uplo =
'U') of A, including the diagonal, is overwritten.
w Array, size at least max(1, n). The first m elements contain the selected
eigenvalues of the matrix A in ascending order.
z Array z (size max(1, ldz*m) for column major layout and max(1, ldz*n) for
row major layout) contains eigenvectors.
If jobz = 'V', then if info = 0, the first m columns of z contain the
orthonormal eigenvectors of the matrix A corresponding to the selected
eigenvalues, with the i-th column of z holding the eigenvector associated
with w[i - 1].
If an eigenvector fails to converge, then that column of z contains the latest
approximation to the eigenvector, and the index of the eigenvector is
returned in ifail.
If jobz = 'N', then z is not referenced.
Note: you must ensure that at least max(1,m) columns are supplied in the
array z; if range = 'V', the exact value of m is not known in advance and
an upper bound must be used.
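The sizing rules for z can be sketched as follows. The helper names and the row_major flag are hypothetical and only illustrate the documented bounds; plain int stands in for lapack_int, and n is used as the upper bound on m when range = 'V'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: largest number of eigenvalues that can be returned. */
int max_selected(char range, int n, int il, int iu)
{
    if (range == 'I')
        return iu - il + 1;   /* exact count for an index range */
    return n;                 /* 'A' returns n; for 'V' n is a safe upper bound */
}

/* Hypothetical helper: number of elements to allocate for z.
   row_major is nonzero for row major layout, zero for column major. */
size_t z_elements(int row_major, int n, int m_max, int ldz)
{
    int min_ldz = row_major ? m_max : n;
    if (min_ldz < 1)
        min_ldz = 1;
    assert(ldz >= min_ldz);   /* documented lower bound on ldz */

    /* max(1, ldz*m) for column major, max(1, ldz*n) for row major */
    size_t count = (size_t)ldz * (size_t)(row_major ? n : m_max);
    return count > 0 ? count : 1;
}
```

With n = 10 and range = 'I', il = 3, iu = 5, both layouts need 30 elements when the smallest permitted ldz is used (10 for column major, 3 for row major).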
Return Values
This function returns a value info.
If info = i, then i eigenvectors failed to converge; their indices are stored in the array ifail.
Application Notes
An approximate eigenvalue is accepted as converged when it is determined to lie in an interval [a,b] of width
less than or equal to abstol+ε*max(|a|,|b|), where ε is the machine precision.
If abstol is less than or equal to zero, then ε*||T|| is used as tolerance, where ||T|| is the 1-norm of the
tridiagonal matrix obtained by reducing A to tridiagonal form. Eigenvalues are computed most accurately
when abstol is set to twice the underflow threshold, 2*?lamch('S'), not zero.
If this routine returns with info > 0, indicating that some eigenvectors did not converge, try setting abstol
to 2*?lamch('S').
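The acceptance test described in these notes can be written out as a short sketch. It is not the library's internal code: the function names are hypothetical, and the machine precision eps and the tridiagonal matrix T (diagonal d, off-diagonal e) are assumed to be supplied by the caller.

```c
#include <assert.h>

double abs_d(double x) { return x < 0.0 ? -x : x; }

/* 1-norm (largest absolute column sum) of a symmetric tridiagonal matrix
   with diagonal d[0..n-1] and off-diagonal e[0..n-2]. */
double tridiag_norm1(int n, const double *d, const double *e)
{
    assert(n > 0);
    double norm = 0.0;
    for (int j = 0; j < n; ++j) {
        double sum = abs_d(d[j]);
        if (j > 0)     sum += abs_d(e[j - 1]);
        if (j < n - 1) sum += abs_d(e[j]);
        if (sum > norm) norm = sum;
    }
    return norm;
}

/* Returns 1 if the interval [a, b] is narrow enough to accept the eigenvalue:
   width <= abstol + eps*max(|a|, |b|), with eps*||T|| replacing abstol
   when abstol <= 0. */
int interval_converged(double a, double b, double abstol,
                       double eps, double t_norm1)
{
    double tol = (abstol <= 0.0) ? eps * t_norm1 : abstol;
    double big = abs_d(a) > abs_d(b) ? abs_d(a) : abs_d(b);
    return (b - a) <= tol + eps * big;
}
```

The sketch only makes the role of abstol visible; the routine itself applies this test internally during bisection.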
?heevx
Computes selected eigenvalues and, optionally,
eigenvectors of a Hermitian matrix.
Syntax
lapack_int LAPACKE_cheevx( int matrix_layout, char jobz, char range, char uplo,
lapack_int n, lapack_complex_float* a, lapack_int lda, float vl, float vu, lapack_int
il, lapack_int iu, float abstol, lapack_int* m, float* w, lapack_complex_float* z,
lapack_int ldz, lapack_int* ifail );
lapack_int LAPACKE_zheevx( int matrix_layout, char jobz, char range, char uplo,
lapack_int n, lapack_complex_double* a, lapack_int lda, double vl, double vu,
lapack_int il, lapack_int iu, double abstol, lapack_int* m, double* w,
lapack_complex_double* z, lapack_int ldz, lapack_int* ifail );
Include Files
• mkl.h
Description
The routine computes selected eigenvalues and, optionally, eigenvectors of a complex Hermitian matrix A.
Eigenvalues and eigenvectors can be selected by specifying either a range of values or a range of indices for
the desired eigenvalues.
Note that for most complex Hermitian eigenvalue problems the default choice should be the heevr
function, as its underlying algorithm is faster and uses less workspace. ?heevx is faster when only a few
eigenvalues are selected.
Input Parameters
If jobz = 'N', then only eigenvalues are computed.
If range = 'V', all eigenvalues in the half-open interval (vl, vu] will be
found.
If range = 'I', the eigenvalues with indices il through iu will be found.
lda The leading dimension of the array a. Must be at least max(1, n).
vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues; vl ≤ vu. Not referenced if range = 'A' or 'I'.
il, iu If range = 'I', the indices of the smallest and largest eigenvalues to be
returned. Constraints:
1 ≤ il ≤ iu ≤ n, if n > 0; il = 1 and iu = 0, if n = 0. Not referenced if range =
'A' or 'V'.
abstol The absolute error tolerance for the eigenvalues. See Application Notes for
more information.
ldz The leading dimension of the output array z; ldz ≥ 1.
If jobz = 'V', then ldz ≥ max(1, n) for column major layout and ldz ≥
max(1, m) for row major layout.
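A caller may want to check the index constraints before invoking the routine. The sketch below is one possible pre-check for range = 'I'; index_range_valid is a hypothetical helper, not a oneMKL function, and plain int stands in for lapack_int.

```c
#include <assert.h>

/* Hypothetical pre-check for range = 'I':
   1 <= il <= iu <= n if n > 0; il = 1 and iu = 0 if n = 0. */
int index_range_valid(int n, int il, int iu)
{
    assert(n >= 0);
    if (n == 0)
        return il == 1 && iu == 0;
    return 1 <= il && il <= iu && iu <= n;
}
```

When the check passes for n > 0, the routine returns m = iu - il + 1 eigenvalues.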
Output Parameters
a On exit, the lower triangle (if uplo = 'L') or the upper triangle (if uplo =
'U') of A, including the diagonal, is overwritten.
w Array, size max(1, n). The first m elements contain the selected eigenvalues
of the matrix A in ascending order.
z Array z (size max(1, ldz*m) for column major layout and max(1, ldz*n) for
row major layout) contains eigenvectors.
If jobz = 'V', then if info = 0, the first m columns of z contain the
orthonormal eigenvectors of the matrix A corresponding to the selected
eigenvalues, with the i-th column of z holding the eigenvector associated
with w[i - 1].
Return Values
This function returns a value info.
If info = i, then i eigenvectors failed to converge; their indices are stored in the array ifail.
Application Notes
An approximate eigenvalue is accepted as converged when it is determined to lie in an interval [a,b] of width
less than or equal to abstol+ε*max(|a|,|b|), where ε is the machine precision.
If abstol is less than or equal to zero, then ε*||T|| will be used in its place, where ||T|| is the 1-norm of
the tridiagonal matrix obtained by reducing A to tridiagonal form. Eigenvalues will be computed most
accurately when abstol is set to twice the underflow threshold 2*?lamch('S'), not zero.
If this routine returns with info > 0, indicating that some eigenvectors did not converge, try setting abstol
to 2*?lamch('S').
?syevr
Computes selected eigenvalues and, optionally,
eigenvectors of a real symmetric matrix using the
Relatively Robust Representations.
Syntax
lapack_int LAPACKE_ssyevr (int matrix_layout, char jobz, char range, char uplo,
lapack_int n, float* a, lapack_int lda, float vl, float vu, lapack_int il, lapack_int
iu, float abstol, lapack_int* m, float* w, float* z, lapack_int ldz, lapack_int*
isuppz);
lapack_int LAPACKE_dsyevr (int matrix_layout, char jobz, char range, char uplo,
lapack_int n, double* a, lapack_int lda, double vl, double vu, lapack_int il, lapack_int
iu, double abstol, lapack_int* m, double* w, double* z, lapack_int ldz, lapack_int*
isuppz);
Include Files
• mkl.h
Description
The routine computes selected eigenvalues and, optionally, eigenvectors of a real symmetric matrix A.
Eigenvalues and eigenvectors can be selected by specifying either a range of values or a range of indices for
the desired eigenvalues.
The routine first reduces the matrix A to tridiagonal form T. Then, whenever possible, ?syevr calls stemr to
compute the eigenspectrum using Relatively Robust Representations. stemr computes eigenvalues by the
dqds algorithm, while orthogonal eigenvectors are computed from various "good" L*D*L^T representations
(also known as Relatively Robust Representations). Gram-Schmidt orthogonalization is avoided as far as
possible. More specifically, the various steps of the algorithm are as follows. For each unreduced block of
T:
a. Compute T - σ*I = L*D*L^T, so that L and D define all the wanted eigenvalues to high relative
accuracy. This means that small relative changes in the entries of D and L cause only small relative
changes in the eigenvalues and eigenvectors. The standard (unfactored) representation of the
tridiagonal matrix T does not have this property in general.
b. Compute the eigenvalues to suitable accuracy. If the eigenvectors are desired, the algorithm attains full
accuracy of the computed eigenvalues only right before the corresponding vectors have to be
computed, see Steps c) and d).
c. For each cluster of close eigenvalues, select a new shift close to the cluster, find a new factorization,
and refine the shifted eigenvalues to suitable accuracy.
d. For each eigenvalue with a large enough relative separation, compute the corresponding eigenvector by
forming a rank revealing twisted factorization. Go back to Step c) for any clusters that remain.
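Step a. can be illustrated with a minimal sketch of the factorization T - σ*I = L*D*L^T for a symmetric tridiagonal T. This is a plain textbook recurrence under simple assumptions (no pivoting, no choice of shift), not the code used inside stemr; shifted_ldlt is a hypothetical name.

```c
#include <assert.h>

/* Factor T - sigma*I = L*D*L^T for a symmetric tridiagonal T with diagonal
   diag[0..n-1] and off-diagonal off[0..n-2].
   d_out[0..n-1] receives the diagonal of D; l_out[0..n-2] receives the
   subdiagonal of the unit lower bidiagonal L.
   Returns 0 on success, -1 if a zero pivot is met for this shift. */
int shifted_ldlt(int n, const double *diag, const double *off, double sigma,
                 double *d_out, double *l_out)
{
    assert(n > 0);
    d_out[0] = diag[0] - sigma;
    for (int i = 0; i < n - 1; ++i) {
        if (d_out[i] == 0.0)
            return -1;                       /* breakdown: try another shift */
        l_out[i] = off[i] / d_out[i];
        d_out[i + 1] = (diag[i + 1] - sigma) - l_out[i] * off[i];
    }
    return 0;
}
```

For the 3-by-3 matrix with diagonal (5, 4, 3), off-diagonal (2, 1), and σ = 1, the recurrence gives D = diag(4, 2, 1.5) and subdiagonal entries of L equal to (0.5, 0.5).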
The desired accuracy of the output can be specified by the input parameter abstol.
The routine ?syevr calls stemr when the full spectrum is requested on machines that conform to the
IEEE-754 floating point standard. ?syevr calls stebz and stein on non-IEEE machines and when partial
spectrum requests are made.
Normal execution of ?syevr may create NaNs and infinities and may abort due to a floating point exception
in environments that do not handle NaNs and infinities in the IEEE standard default manner.
Note that ?syevr is preferable for most cases of real symmetric eigenvalue problems as its underlying
algorithm is fast and uses less workspace.
NOTE
This routine supports the Progress Routine feature. See Progress Function for details.
Input Parameters
For range = 'V' or 'I' and iu - il < n - 1, sstebz/dstebz and sstein/dstein
are called.
lda The leading dimension of the array a. Must be at least max(1, n).
vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
Constraint: vl < vu.
il, iu If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned.
Constraint:
1 ≤ il ≤ iu ≤ n, if n > 0;
il = 1 and iu = 0, if n = 0.
If range = 'A' or 'V', il and iu are not referenced.
abstol If jobz = 'V', the eigenvalues and eigenvectors output have residual
norms bounded by abstol, and the dot products between different
eigenvectors are bounded by abstol.
If abstol < n*eps*||T||, then n*eps*||T|| is used instead, where
eps is the machine precision, and ||T|| is the 1-norm of the matrix T. The
eigenvalues are computed to an accuracy of eps*||T|| irrespective of
abstol.
If high relative accuracy is important, set abstol to ?lamch('S').
Output Parameters
a On exit, the lower triangle (if uplo = 'L') or the upper triangle (if uplo =
'U') of A, including the diagonal, is overwritten.
If range = 'A', m = n, if range = 'I', m = iu-il+1, and if range =
'V' the exact value of m is not known in advance.
w, z Arrays:
w, size at least max(1, n), contains the selected eigenvalues in ascending
order, stored in w[0] to w[m - 1];
z (size max(1, ldz*m) for column major layout and max(1, ldz*n) for row
major layout).
If jobz = 'V', then if info = 0, the first m columns of z contain the
orthonormal eigenvectors of the matrix A corresponding to the selected
eigenvalues, with the i-th column of z holding the eigenvector associated
with w[i - 1].
Return Values
This function returns a value info.
Application Notes
?heevr
Computes selected eigenvalues and, optionally,
eigenvectors of a Hermitian matrix using the
Relatively Robust Representations.
Syntax
lapack_int LAPACKE_cheevr( int matrix_layout, char jobz, char range, char uplo,
lapack_int n, lapack_complex_float* a, lapack_int lda, float vl, float vu, lapack_int
il, lapack_int iu, float abstol, lapack_int* m, float* w, lapack_complex_float* z,
lapack_int ldz, lapack_int* isuppz );
lapack_int LAPACKE_zheevr( int matrix_layout, char jobz, char range, char uplo,
lapack_int n, lapack_complex_double* a, lapack_int lda, double vl, double vu,
lapack_int il, lapack_int iu, double abstol, lapack_int* m, double* w,
lapack_complex_double* z, lapack_int ldz, lapack_int* isuppz );
Include Files
• mkl.h
Description
The routine computes selected eigenvalues and, optionally, eigenvectors of a complex Hermitian matrix A.
Eigenvalues and eigenvectors can be selected by specifying either a range of values or a range of indices for
the desired eigenvalues.
The routine first reduces the matrix A to tridiagonal form T with a call to hetrd. Then, whenever
possible, ?heevr calls stegr to compute the eigenspectrum using Relatively Robust Representations. ?stegr
computes eigenvalues by the dqds algorithm, while orthogonal eigenvectors are computed from various
"good" L*D*L^T representations (also known as Relatively Robust Representations). Gram-Schmidt
orthogonalization is avoided as far as possible. More specifically, the various steps of the algorithm are as
follows. For each unreduced block (submatrix) of T:
a. Compute T - σ*I = L*D*L^T, so that L and D define all the wanted eigenvalues to high relative
accuracy. This means that small relative changes in the entries of D and L cause only small relative
changes in the eigenvalues and eigenvectors. The standard (unfactored) representation of the
tridiagonal matrix T does not have this property in general.
b. Compute the eigenvalues to suitable accuracy. If the eigenvectors are desired, the algorithm attains full
accuracy of the computed eigenvalues only right before the corresponding vectors have to be
computed, see Steps c) and d).
c. For each cluster of close eigenvalues, select a new shift close to the cluster, find a new factorization,
and refine the shifted eigenvalues to suitable accuracy.
d. For each eigenvalue with a large enough relative separation, compute the corresponding eigenvector by
forming a rank revealing twisted factorization. Go back to Step c) for any clusters that remain.
The desired accuracy of the output can be specified by the input parameter abstol.
The routine ?heevr calls stemr when the full spectrum is requested on machines which conform to the
IEEE-754 floating point standard, or stebz and stein on non-IEEE machines and when partial spectrum
requests are made.
Note that the routine ?heevr is preferable for most cases of complex Hermitian eigenvalue problems as its
underlying algorithm is fast and uses less workspace.
Input Parameters
If uplo = 'L', a stores the lower triangular part of A.
vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
Constraint: vl < vu.
il, iu If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned.
Constraint: 1 ≤ il ≤ iu ≤ n, if n > 0; il = 1 and iu = 0 if n = 0.
Output Parameters
a On exit, the lower triangle (if uplo = 'L') or the upper triangle (if uplo =
'U') of A, including the diagonal, is overwritten.
z Array z (size max(1, ldz*m) for column major layout and max(1, ldz*n) for
row major layout).
If jobz = 'V', then if info = 0, the first m columns of z contain the
orthonormal eigenvectors of the matrix A corresponding to the selected
eigenvalues, with the i-th column of z holding the eigenvector associated
with w[i - 1].
Return Values
This function returns a value info.
Application Notes
Normal execution of ?stemr may create NaNs and infinities and hence may abort due to a floating point
exception in environments which do not handle NaNs and infinities in the IEEE standard default manner.
• Inderjit S. Dhillon and Beresford N. Parlett: "Multiple representations to compute orthogonal eigenvectors
of symmetric tridiagonal matrices," Linear Algebra and its Applications, 387(1), pp. 1-28, August 2004.
• Inderjit Dhillon and Beresford Parlett: "Orthogonal Eigenvectors and Relative Gaps," SIAM Journal on
Matrix Analysis and Applications, Vol. 25, 2004. Also LAPACK Working Note 154.
• Inderjit Dhillon: "A new O(n^2) algorithm for the symmetric tridiagonal eigenvalue/eigenvector problem",
Computer Science Division Technical Report No. UCB/CSD-97-971, UC Berkeley, May 1997.
?spev
Computes all eigenvalues and, optionally,
eigenvectors of a real symmetric matrix in packed
storage.
Syntax
lapack_int LAPACKE_sspev (int matrix_layout, char jobz, char uplo, lapack_int n, float*
ap, float* w, float* z, lapack_int ldz);
lapack_int LAPACKE_dspev (int matrix_layout, char jobz, char uplo, lapack_int n,
double* ap, double* w, double* z, lapack_int ldz);
Include Files
• mkl.h
Description
The routine computes all the eigenvalues and, optionally, eigenvectors of a real symmetric matrix A in
packed storage.
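The packed layout itself is not spelled out in this section. As a hedged illustration, the sketch below assumes the conventional LAPACK column-major packed scheme, in which the selected triangle is stored column by column in an array of n*(n+1)/2 elements. The helper names are hypothetical, indices are 0-based, and packed arrays in row major layout are arranged differently and are not covered here.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements in the packed array ap for an n-by-n matrix. */
size_t packed_size(int n)
{
    assert(n >= 0);
    return (size_t)n * (size_t)(n + 1) / 2;
}

/* 0-based position of A(i, j) in ap, assuming column-major packed storage.
   For uplo = 'U' the element must satisfy i <= j; for 'L', i >= j. */
size_t packed_index(char uplo, int n, int i, int j)
{
    assert(0 <= i && i < n && 0 <= j && j < n);
    if (uplo == 'U') {
        assert(i <= j);
        return (size_t)i + (size_t)j * (size_t)(j + 1) / 2;
    }
    assert(uplo == 'L' && i >= j);
    return (size_t)i + (size_t)j * (size_t)(2 * n - j - 1) / 2;
}
```

For n = 3 and uplo = 'L', the elements A(0,0), A(1,0), A(2,0), A(1,1), A(2,1), A(2,2) land at positions 0 through 5 in that order.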
Input Parameters
Output Parameters
w, z Arrays:
w, size at least max(1, n).
If info = 0, w contains the eigenvalues of the matrix A in ascending order.
Return Values
This function returns a value info.
If info = i, then the algorithm failed to converge; i indicates the number of elements of an intermediate
tridiagonal form which did not converge to zero.
?hpev
Computes all eigenvalues and, optionally,
eigenvectors of a Hermitian matrix in packed storage.
Syntax
lapack_int LAPACKE_chpev( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_complex_float* ap, float* w, lapack_complex_float* z, lapack_int ldz );
lapack_int LAPACKE_zhpev( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_complex_double* ap, double* w, lapack_complex_double* z, lapack_int ldz );
Include Files
• mkl.h
Description
The routine computes all the eigenvalues and, optionally, eigenvectors of a complex Hermitian matrix A in
packed storage.
Input Parameters
Output Parameters
If info = 0, w contains the eigenvalues of the matrix A in ascending order.
Return Values
This function returns a value info.
If info = i, then the algorithm failed to converge; i indicates the number of elements of an intermediate
tridiagonal form which did not converge to zero.
?spevd
Uses divide and conquer algorithm to compute all
eigenvalues and (optionally) all eigenvectors of a real
symmetric matrix held in packed storage.
Syntax
lapack_int LAPACKE_sspevd (int matrix_layout, char jobz, char uplo, lapack_int n,
float* ap, float* w, float* z, lapack_int ldz);
lapack_int LAPACKE_dspevd (int matrix_layout, char jobz, char uplo, lapack_int n,
double* ap, double* w, double* z, lapack_int ldz);
Include Files
• mkl.h
Description
The routine computes all the eigenvalues, and optionally all the eigenvectors, of a real symmetric matrix A
(held in packed storage). In other words, it can compute the spectral factorization of A as:
A = Z*Λ*Z^T.
Here Λ is a diagonal matrix whose diagonal elements are the eigenvalues λ_i, and Z is the orthogonal matrix
whose columns are the eigenvectors z_i. Thus,
A*z_i = λ_i*z_i for i = 1, 2, ..., n.
If the eigenvectors are requested, then this routine uses a divide and conquer algorithm to compute
eigenvalues and eigenvectors. However, if only eigenvalues are required, then it uses the Pal-Walker-Kahan
variant of the QL or QR algorithm.
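The eigenpair relation A*z_i = λ_i*z_i gives a simple way to check a computed result. The sketch below evaluates the largest component of A*z - λ*z for one eigenpair. It is a hypothetical helper, not a oneMKL function, and for simplicity it takes A as a dense row-major array rather than in packed storage.

```c
#include <assert.h>

/* Largest absolute component of A*z - lambda*z for a dense n-by-n matrix
   stored row by row in a[0..n*n-1]. A value near zero (relative to the
   size of A) indicates that (lambda, z) is an accurate eigenpair. */
double eigen_residual(int n, const double *a, const double *z, double lambda)
{
    assert(n > 0);
    double worst = 0.0;
    for (int i = 0; i < n; ++i) {
        double r = -lambda * z[i];
        for (int j = 0; j < n; ++j)
            r += a[i * n + j] * z[j];
        if (r < 0.0)
            r = -r;
        if (r > worst)
            worst = r;
    }
    return worst;
}
```

For the 2-by-2 matrix with rows (2, 1) and (1, 2), the pairs (1, (1, -1)) and (3, (1, 1)) give a zero residual, while pairing (1, 1) with the wrong value 2 gives a residual of 1.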
Input Parameters
Output Parameters
w, z Arrays:
w, size at least max(1, n).
If info = 0, contains the eigenvalues of the matrix A in ascending order.
See also info.
z (size max(1, ldz*n)).
If jobz = 'V', then this array is overwritten by the orthogonal matrix Z
which contains the eigenvectors of A. If jobz = 'N', then z is not
referenced.
Return Values
This function returns a value info.
If info = i, then the algorithm failed to converge; i indicates the number of elements of an intermediate
tridiagonal form which did not converge to zero.
If info = -i, the i-th parameter had an illegal value.
Application Notes
The computed eigenvalues and eigenvectors are exact for a matrix A + E such that ||E||_2 = O(ε)*||A||_2,
where ε is the machine precision.
The complex analogue of this routine is hpevd.
See also syevd for matrices held in full storage, and sbevd for banded matrices.
?hpevd
Uses divide and conquer algorithm to compute all
eigenvalues and, optionally, all eigenvectors of a
complex Hermitian matrix held in packed storage.
Syntax
lapack_int LAPACKE_chpevd( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_complex_float* ap, float* w, lapack_complex_float* z, lapack_int ldz );
lapack_int LAPACKE_zhpevd( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_complex_double* ap, double* w, lapack_complex_double* z, lapack_int ldz );
Include Files
• mkl.h
Description
The routine computes all the eigenvalues, and optionally all the eigenvectors, of a complex Hermitian matrix
A (held in packed storage). In other words, it can compute the spectral factorization of A as: A = Z*Λ*Z^H.
Here Λ is a real diagonal matrix whose diagonal elements are the eigenvalues λ_i, and Z is the (complex)
unitary matrix whose columns are the eigenvectors z_i. Thus,
A*z_i = λ_i*z_i for i = 1, 2, ..., n.
If the eigenvectors are requested, then this routine uses a divide and conquer algorithm to compute
eigenvalues and eigenvectors. However, if only eigenvalues are required, then it uses the Pal-Walker-Kahan
variant of the QL or QR algorithm.
Input Parameters
Output Parameters
If jobz = 'V', then this array is overwritten by the unitary matrix Z which
contains the eigenvectors of A.
If jobz = 'N', then z is not referenced.
Return Values
This function returns a value info.
If info = i, then the algorithm failed to converge; i indicates the number of elements of an intermediate
tridiagonal form which did not converge to zero.
Application Notes
The computed eigenvalues and eigenvectors are exact for a matrix A + E such that ||E||_2 = O(ε)*||A||_2,
where ε is the machine precision.
The real analogue of this routine is spevd.
See also heevd for matrices held in full storage, and hbevd for banded matrices.
?spevx
Computes selected eigenvalues and, optionally,
eigenvectors of a real symmetric matrix in packed
storage.
Syntax
lapack_int LAPACKE_sspevx (int matrix_layout, char jobz, char range, char uplo,
lapack_int n, float* ap, float vl, float vu, lapack_int il, lapack_int iu, float abstol,
lapack_int* m, float* w, float* z, lapack_int ldz, lapack_int* ifail);
lapack_int LAPACKE_dspevx (int matrix_layout, char jobz, char range, char uplo,
lapack_int n, double* ap, double vl, double vu, lapack_int il, lapack_int iu, double
abstol, lapack_int* m, double* w, double* z, lapack_int ldz, lapack_int* ifail);
Include Files
• mkl.h
Description
The routine computes selected eigenvalues and, optionally, eigenvectors of a real symmetric matrix A in
packed storage. Eigenvalues and eigenvectors can be selected by specifying either a range of values or a
range of indices for the desired eigenvalues.
Input Parameters
vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
Constraint: vl < vu.
il, iu If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned.
Constraint: 1 ≤ il ≤ iu ≤ n, if n > 0; il = 1 and iu = 0 if n = 0.
abstol The absolute error tolerance to which each eigenvalue is required. See
Application Notes for details on error tolerance.
If jobz = 'V', then ldz ≥ max(1, n) for column major layout and ldz ≥
max(1, m) for row major layout.
Output Parameters
w, z Arrays:
w, size at least max(1, n).
If info = 0, contains the selected eigenvalues of the matrix A in ascending
order.
z(size max(1, ldz*m) for column major layout and max(1, ldz*n) for row
major layout).
If jobz = 'V', then if info = 0, the first m columns of z contain the
orthonormal eigenvectors of the matrix A corresponding to the selected
eigenvalues, with the i-th column of z holding the eigenvector associated
with w[i - 1].
Return Values
This function returns a value info.
If info = i, then i eigenvectors failed to converge; their indices are stored in the array ifail.
Application Notes
An approximate eigenvalue is accepted as converged when it is determined to lie in an interval [a,b] of width
less than or equal to abstol+ε*max(|a|,|b|), where ε is the machine precision.
If abstol is less than or equal to zero, then ε*||T||_1 will be used in its place, where T is the tridiagonal
matrix obtained by reducing A to tridiagonal form. Eigenvalues will be computed most accurately when abstol
is set to twice the underflow threshold 2*?lamch('S'), not zero.
If this routine returns with info > 0, indicating that some eigenvectors did not converge, try setting abstol
to 2*?lamch('S').
?hpevx
Computes selected eigenvalues and, optionally,
eigenvectors of a Hermitian matrix in packed storage.
Syntax
lapack_int LAPACKE_chpevx( int matrix_layout, char jobz, char range, char uplo,
lapack_int n, lapack_complex_float* ap, float vl, float vu, lapack_int il, lapack_int
iu, float abstol, lapack_int* m, float* w, lapack_complex_float* z, lapack_int ldz,
lapack_int* ifail );
lapack_int LAPACKE_zhpevx( int matrix_layout, char jobz, char range, char uplo,
lapack_int n, lapack_complex_double* ap, double vl, double vu, lapack_int il,
lapack_int iu, double abstol, lapack_int* m, double* w, lapack_complex_double* z,
lapack_int ldz, lapack_int* ifail );
Include Files
• mkl.h
Description
The routine computes selected eigenvalues and, optionally, eigenvectors of a complex Hermitian matrix A in
packed storage. Eigenvalues and eigenvectors can be selected by specifying either a range of values or a
range of indices for the desired eigenvalues.
Input Parameters
vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
Constraint: vl < vu.
il, iu If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned.
Constraint: 1 ≤ il ≤ iu ≤ n, if n > 0; il = 1 and iu = 0 if n = 0.
abstol The absolute error tolerance to which each eigenvalue is required. See
Application Notes for details on error tolerance.
If jobz = 'V', then ldz ≥ max(1, n) for column major layout and ldz ≥
max(1, m) for row major layout.
Output Parameters
z Array z (size max(1, ldz*m) for column major layout and max(1, ldz*n) for
row major layout).
If jobz = 'V', then if info = 0, the first m columns of z contain the
orthonormal eigenvectors of the matrix A corresponding to the selected
eigenvalues, with the i-th column of z holding the eigenvector associated
with w[i - 1].
If an eigenvector fails to converge, then that column of z contains the latest
approximation to the eigenvector, and the index of the eigenvector is
returned in ifail.
If jobz = 'N', then z is not referenced.
Return Values
This function returns a value info.
If info = i, then i eigenvectors failed to converge; their indices are stored in the array ifail.
Application Notes
An approximate eigenvalue is accepted as converged when it is determined to lie in an interval [a,b] of width
less than or equal to abstol+ε*max(|a|,|b|), where ε is the machine precision.
If abstol is less than or equal to zero, then ε*||T||_1 will be used in its place, where T is the tridiagonal
matrix obtained by reducing A to tridiagonal form. Eigenvalues will be computed most accurately when abstol
is set to twice the underflow threshold 2*?lamch('S'), not zero.
If this routine returns with info > 0, indicating that some eigenvectors did not converge, try setting abstol
to 2*?lamch('S').
?sbev
Computes all eigenvalues and, optionally,
eigenvectors of a real symmetric band matrix.
Syntax
lapack_int LAPACKE_ssbev (int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int kd, float* ab, lapack_int ldab, float* w, float* z, lapack_int ldz);
lapack_int LAPACKE_dsbev (int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int kd, double* ab, lapack_int ldab, double* w, double* z, lapack_int ldz);
Include Files
• mkl.h
Description
The routine computes all eigenvalues and, optionally, eigenvectors of a real symmetric band matrix A.
Input Parameters
ab ab (size at least max(1, ldab*n) for column major layout and at least
max(1, ldab*(kd + 1)) for row major layout) is an array containing either
upper or lower triangular part of the symmetric matrix A (as specified by
uplo) in band storage format.
ldab The leading dimension of ab; must be at least kd + 1 for column major
layout and n for row major layout.
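The band storage format is not defined in this section. The sketch below assumes the conventional LAPACK column-major band scheme with ldab ≥ kd + 1, where column j of ab holds the band entries of column j of A. The helper names are hypothetical, indices are 0-based, and the row major arrangement is not covered.

```c
#include <assert.h>
#include <stddef.h>

/* 0-based position of A(i, j) in ab, assuming column-major band storage.
   uplo = 'U': entries with j - kd <= i <= j are stored in band row kd + i - j.
   uplo = 'L': entries with j <= i <= j + kd are stored in band row i - j. */
size_t band_index(char uplo, int kd, int ldab, int i, int j)
{
    assert(kd >= 0 && ldab >= kd + 1);
    if (uplo == 'U') {
        assert(i <= j && j - i <= kd);
        return (size_t)(kd + i - j) + (size_t)j * (size_t)ldab;
    }
    assert(uplo == 'L' && i >= j && i - j <= kd);
    return (size_t)(i - j) + (size_t)j * (size_t)ldab;
}

/* Copy the lower band of a dense column-major symmetric matrix a
   (leading dimension n) into ab. Unused tail entries of ab are left alone. */
void pack_band_lower(int n, int kd, const double *a, double *ab, int ldab)
{
    for (int j = 0; j < n; ++j) {
        int last = (j + kd < n - 1) ? j + kd : n - 1;
        for (int i = j; i <= last; ++i)
            ab[band_index('L', kd, ldab, i, j)] = a[i + j * n];
    }
}
```

For a 3-by-3 tridiagonal matrix (kd = 1, ldab = 2) with 2 on the diagonal and 1 off it, the lower band packs to (2, 1, 2, 1, 2) followed by one unused entry.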
Output Parameters
w, z Arrays:
w, size at least max(1, n).
If info = 0, contains the eigenvalues of the matrix A in ascending order.
Return Values
This function returns a value info.
If info = -i, the i-th parameter had an illegal value.
?hbev
Computes all eigenvalues and, optionally,
eigenvectors of a Hermitian band matrix.
Syntax
lapack_int LAPACKE_chbev( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int kd, lapack_complex_float* ab, lapack_int ldab, float* w,
lapack_complex_float* z, lapack_int ldz );
lapack_int LAPACKE_zhbev( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int kd, lapack_complex_double* ab, lapack_int ldab, double* w,
lapack_complex_double* z, lapack_int ldz );
Include Files
• mkl.h
Description
The routine computes all eigenvalues and, optionally, eigenvectors of a complex Hermitian band matrix A.
Input Parameters
ab ab (size at least max(1, ldab*n) for column major layout and at least
max(1, ldab*(kd + 1)) for row major layout) is an array containing either
upper or lower triangular part of the Hermitian matrix A (as specified by
uplo) in band storage format.
ldab The leading dimension of ab; must be at least kd + 1 for column major
layout and n for row major layout.
Output Parameters
Return Values
This function returns a value info.
If info = i, then the algorithm failed to converge; i indicates the number of elements of an intermediate
tridiagonal form which did not converge to zero.
?sbevd
Computes all eigenvalues and, optionally, all
eigenvectors of a real symmetric band matrix using
divide and conquer algorithm.
Syntax
lapack_int LAPACKE_ssbevd (int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int kd, float* ab, lapack_int ldab, float* w, float* z, lapack_int ldz);
lapack_int LAPACKE_dsbevd (int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int kd, double* ab, lapack_int ldab, double* w, double* z, lapack_int ldz);
Include Files
• mkl.h
Description
The routine computes all the eigenvalues, and optionally all the eigenvectors, of a real symmetric band
matrix A. In other words, it can compute the spectral factorization of A as:
A = Z*Λ*Z^T
Here Λ is a diagonal matrix whose diagonal elements are the eigenvalues λ_i, and Z is the orthogonal matrix
whose columns are the eigenvectors z_i. Thus,
A*z_i = λ_i*z_i for i = 1, 2, ..., n.
If the eigenvectors are requested, then this routine uses a divide and conquer algorithm to compute
eigenvalues and eigenvectors. However, if only eigenvalues are required, then it uses the Pal-Walker-Kahan
variant of the QL or QR algorithm.
Input Parameters
ab ab (size at least max(1, ldab*n) for column major layout and at least
max(1, ldab*(kd + 1)) for row major layout) is an array containing either
upper or lower triangular part of the symmetric matrix A (as specified by
uplo) in band storage format.
ldab The leading dimension of ab; must be at least kd+1 for column major
layout and n for row major layout.
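The band storage format is not spelled out in this entry. As a minimal sketch, assuming the conventional LAPACK layout for uplo = 'U' with column major storage, element A(i,j) of the upper triangle (0-based, max(0, j-kd) ≤ i ≤ j) is stored at ab[(kd + i - j) + j*ldab]. The helper name pack_upper_band_colmajor is hypothetical and is not part of the library.

```c
#include <stddef.h>

/* Hypothetical helper, not an MKL function: copies the upper triangle of a
 * dense symmetric n-by-n matrix a (column major, leading dimension lda) into
 * band storage ab for uplo = 'U', column major layout.
 * Assumes the conventional LAPACK band layout and ldab >= kd + 1. */
static void pack_upper_band_colmajor(size_t n, size_t kd,
                                     const double *a, size_t lda,
                                     double *ab, size_t ldab)
{
    for (size_t j = 0; j < n; ++j) {
        /* First row of column j that lies inside the band. */
        size_t i0 = (j > kd) ? j - kd : 0;
        for (size_t i = i0; i <= j; ++i) {
            ab[(kd + i - j) + j * ldab] = a[i + j * lda];
        }
    }
}
```

With this layout the diagonal of A occupies row kd of ab, and the leading entries of the first kd columns that fall outside the matrix are left untouched.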
Output Parameters
w, z Arrays:
w, size at least max(1, n).
If info = 0, contains the eigenvalues of the matrix A in ascending order.
See also info.
z, size max(1, ldz*n) if jobz = 'V' and at least 1 if jobz = 'N'.
If jobz = 'V', then this array is overwritten by the orthogonal matrix Z
which contains the eigenvectors of A. The i-th column of Z contains the
eigenvector which corresponds to the eigenvalue w[i - 1].
Return Values
This function returns a value info.
If info = i, then the algorithm failed to converge; i indicates the number of elements of an intermediate
tridiagonal form which did not converge to zero.
If info = -i, the i-th parameter had an illegal value.
Application Notes
The computed eigenvalues and eigenvectors are exact for a matrix A + E such that ||E||_2 = O(ε)*||A||_2,
where ε is the machine precision.
The complex analogue of this routine is hbevd.
See also syevd for matrices held in full storage, and spevd for matrices held in packed storage.
?hbevd
Computes all eigenvalues and, optionally, all
eigenvectors of a complex Hermitian band matrix
using divide and conquer algorithm.
Syntax
lapack_int LAPACKE_chbevd( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int kd, lapack_complex_float* ab, lapack_int ldab, float* w,
lapack_complex_float* z, lapack_int ldz );
lapack_int LAPACKE_zhbevd( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int kd, lapack_complex_double* ab, lapack_int ldab, double* w,
lapack_complex_double* z, lapack_int ldz );
Include Files
• mkl.h
Description
The routine computes all the eigenvalues, and optionally all the eigenvectors, of a complex Hermitian band
matrix A. In other words, it can compute the spectral factorization of A as: A = Z*Λ*Z^H.
Here Λ is a real diagonal matrix whose diagonal elements are the eigenvalues λ_i, and Z is the (complex)
unitary matrix whose columns are the eigenvectors z_i. Thus,
A*z_i = λ_i*z_i for i = 1, 2, ..., n.
If the eigenvectors are requested, then this routine uses a divide and conquer algorithm to compute
eigenvalues and eigenvectors. However, if only eigenvalues are required, then it uses the Pal-Walker-Kahan
variant of the QL or QR algorithm.
Input Parameters
If jobz = 'V', then eigenvalues and eigenvectors are computed.
ab ab (size at least max(1, ldab*n) for column major layout and at least
max(1, ldab*(kd + 1)) for row major layout) is an array containing either
upper or lower triangular part of the Hermitian matrix A (as specified by
uplo) in band storage format.
ldab The leading dimension of ab; must be at least kd+1 for column major
layout and n for row major layout.
Output Parameters
z Array, size max(1, ldz*n) if jobz = 'V' and at least 1 if jobz = 'N'.
If jobz = 'V', then this array is overwritten by the unitary matrix Z which
contains the eigenvectors of A. The i-th column of Z contains the
eigenvector which corresponds to the eigenvalue w[i - 1].
Return Values
This function returns a value info.
Application Notes
The computed eigenvalues and eigenvectors are exact for a matrix A + E such that ||E||_2 = O(ε)*||A||_2,
where ε is the machine precision.
The real analogue of this routine is sbevd.
See also heevd for matrices held in full storage, and hpevd for matrices held in packed storage.
?sbevx
Computes selected eigenvalues and, optionally,
eigenvectors of a real symmetric band matrix.
Syntax
lapack_int LAPACKE_ssbevx (int matrix_layout, char jobz, char range, char uplo,
lapack_int n, lapack_int kd, float* ab, lapack_int ldab, float* q, lapack_int ldq, float
vl, float vu, lapack_int il, lapack_int iu, float abstol, lapack_int* m, float* w,
float* z, lapack_int ldz, lapack_int* ifail);
lapack_int LAPACKE_dsbevx (int matrix_layout, char jobz, char range, char uplo,
lapack_int n, lapack_int kd, double* ab, lapack_int ldab, double* q, lapack_int ldq,
double vl, double vu, lapack_int il, lapack_int iu, double abstol, lapack_int* m,
double* w, double* z, lapack_int ldz, lapack_int* ifail);
Include Files
• mkl.h
Description
The routine computes selected eigenvalues and, optionally, eigenvectors of a real symmetric band matrix A.
Eigenvalues and eigenvectors can be selected by specifying either a range of values or a range of indices for
the desired eigenvalues.
Input Parameters
ab Arrays:
Array ab (size at least max(1, ldab*n) for column major layout and at least
max(1, ldab*(kd + 1)) for row major layout) contains either upper or
lower triangular part of the symmetric matrix A (as specified by uplo) in
band storage format.
ldab The leading dimension of ab; must be at least kd +1 for column major
layout and n for row major layout.
vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
Constraint: vl < vu.
il, iu If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned.
Constraint: 1 ≤ il ≤ iu ≤ n, if n > 0; il = 1 and iu = 0 if n = 0.
abstol The absolute error tolerance to which each eigenvalue is required. See
Application notes for details on error tolerance.
ldq, ldz The leading dimensions of the output arrays q and z, respectively.
Constraints:
ldq ≥ 1, ldz ≥ 1;
If jobz = 'V', then ldq ≥ max(1, n) and ldz ≥ max(1, n) for column
major layout and ldz ≥ max(1, m) for row major layout.
Output Parameters
w, z Arrays:
w, size at least max(1, n). The first m elements of w contain the selected
eigenvalues of the matrix A in ascending order.
z (size at least max(1, ldz*m) for column major layout and max(1, ldz*n)
for row major layout).
If jobz = 'V', then if info = 0, the first m columns of z contain the
orthonormal eigenvectors of the matrix A corresponding to the selected
eigenvalues, with the i-th column of z holding the eigenvector associated
with w[i - 1].
Return Values
This function returns a value info.
If info = i, then i eigenvectors failed to converge; their indices are stored in the array ifail.
Application Notes
An approximate eigenvalue is accepted as converged when it is determined to lie in an interval [a,b] of width
less than or equal to abstol + ε*max(|a|,|b|), where ε is the machine precision.
If abstol is less than or equal to zero, then ε*||T||_1 is used as tolerance, where T is the tridiagonal matrix
obtained by reducing A to tridiagonal form. Eigenvalues will be computed most accurately when abstol is set
to twice the underflow threshold 2*?lamch('S'), not zero.
If this routine returns with info > 0, indicating that some eigenvectors did not converge, try setting abstol
to 2*?lamch('S').
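The acceptance rule and the recommended tolerance can be sketched in plain C. This is an illustration only: the function names are hypothetical, and 2*DBL_MIN stands in for 2*dlamch('S') on the assumption that the safe minimum equals DBL_MIN, which is the usual case for IEEE-754 double precision. The value of ε is passed in by the caller because the library's exact definition of machine precision is not restated here.

```c
#include <float.h>
#include <stdbool.h>

/* Absolute value without pulling in libm. */
static double abs_value(double x)
{
    return x < 0.0 ? -x : x;
}

/* Stand-in for 2*dlamch('S'); assumes the safe minimum equals DBL_MIN. */
static double recommended_abstol(void)
{
    return 2.0 * DBL_MIN;
}

/* Acceptance test described in the text: the interval [a,b] is narrow
 * enough when its width is at most abstol + eps*max(|a|,|b|). */
static bool interval_converged(double a, double b, double abstol, double eps)
{
    double fa = abs_value(a);
    double fb = abs_value(b);
    double scale = fa > fb ? fa : fb;
    return (b - a) <= abstol + eps * scale;
}
```

A caller would pass recommended_abstol() as the abstol argument of the double-precision routine when the most accurate eigenvalues are wanted.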
?hbevx
Computes selected eigenvalues and, optionally,
eigenvectors of a Hermitian band matrix.
Syntax
lapack_int LAPACKE_chbevx( int matrix_layout, char jobz, char range, char uplo,
lapack_int n, lapack_int kd, lapack_complex_float* ab, lapack_int ldab,
lapack_complex_float* q, lapack_int ldq, float vl, float vu, lapack_int il, lapack_int
iu, float abstol, lapack_int* m, float* w, lapack_complex_float* z, lapack_int ldz,
lapack_int* ifail );
lapack_int LAPACKE_zhbevx( int matrix_layout, char jobz, char range, char uplo,
lapack_int n, lapack_int kd, lapack_complex_double* ab, lapack_int ldab,
lapack_complex_double* q, lapack_int ldq, double vl, double vu, lapack_int il,
lapack_int iu, double abstol, lapack_int* m, double* w, lapack_complex_double* z,
lapack_int ldz, lapack_int* ifail );
Include Files
• mkl.h
Description
The routine computes selected eigenvalues and, optionally, eigenvectors of a complex Hermitian band matrix
A. Eigenvalues and eigenvectors can be selected by specifying either a range of values or a range of indices
for the desired eigenvalues.
Input Parameters
ab ab (size at least max(1, ldab*n) for column major layout and at least
max(1, ldab*(kd + 1)) for row major layout) is an array containing either
upper or lower triangular part of the Hermitian matrix A (as specified by
uplo) in band storage format.
ldab The leading dimension of ab; must be at least kd +1 for column major
layout and n for row major layout.
vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
Constraint: vl < vu.
il, iu If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned.
Constraint: 1 ≤ il ≤ iu ≤ n, if n > 0; il = 1 and iu = 0 if n = 0.
abstol The absolute error tolerance to which each eigenvalue is required. See
Application notes for details on error tolerance.
ldq, ldz The leading dimensions of the output arrays q and z, respectively.
Constraints:
ldq ≥ 1, ldz ≥ 1;
If jobz = 'V', then ldq ≥ max(1, n) and ldz ≥ max(1, n) for column major
layout and ldz ≥ max(1, m) for row major layout.
Output Parameters
w Array, size at least max(1, n). The first m elements contain the selected
eigenvalues of the matrix A in ascending order.
z Array z (size at least max(1, ldz*m) for column major layout and max(1,
ldz*n) for row major layout).
If jobz = 'V', then if info = 0, the first m columns of z contain the
orthonormal eigenvectors of the matrix A corresponding to the selected
eigenvalues, with the i-th column of z holding the eigenvector associated
with w[i - 1].
Return Values
This function returns a value info.
If info = i, then i eigenvectors failed to converge; their indices are stored in the array ifail.
Application Notes
An approximate eigenvalue is accepted as converged when it is determined to lie in an interval [a,b] of width
less than or equal to abstol + ε * max( |a|,|b| ), where ε is the machine precision.
If abstol is less than or equal to zero, then ε*||T||_1 will be used in its place, where T is the tridiagonal
matrix obtained by reducing A to tridiagonal form. Eigenvalues will be computed most accurately when abstol
is set to twice the underflow threshold 2*?lamch('S'), not zero.
If this routine returns with info > 0, indicating that some eigenvectors did not converge, try setting abstol
to 2*?lamch('S').
?stev
Computes all eigenvalues and, optionally,
eigenvectors of a real symmetric tridiagonal matrix.
Syntax
lapack_int LAPACKE_sstev (int matrix_layout, char jobz, lapack_int n, float* d, float*
e, float* z, lapack_int ldz);
lapack_int LAPACKE_dstev (int matrix_layout, char jobz, lapack_int n, double* d,
double* e, double* z, lapack_int ldz);
Include Files
• mkl.h
Description
The routine computes all eigenvalues and, optionally, eigenvectors of a real symmetric tridiagonal matrix A.
Input Parameters
d, e Arrays:
Array d contains the n diagonal elements of the tridiagonal matrix A.
The size of e must be at least max(1, n). The n-th element of this array is
used as workspace.
ldz The leading dimension of the output array z; ldz≥ 1. If jobz = 'V' then
ldz≥ max(1, n).
Output Parameters
Return Values
This function returns a value info.
?stevd
Computes all eigenvalues and, optionally, all
eigenvectors of a real symmetric tridiagonal matrix
using divide and conquer algorithm.
Syntax
lapack_int LAPACKE_sstevd (int matrix_layout, char jobz, lapack_int n, float* d, float*
e, float* z, lapack_int ldz);
lapack_int LAPACKE_dstevd (int matrix_layout, char jobz, lapack_int n, double* d,
double* e, double* z, lapack_int ldz);
Include Files
• mkl.h
Description
The routine computes all the eigenvalues, and optionally all the eigenvectors, of a real symmetric tridiagonal
matrix T. In other words, the routine can compute the spectral factorization of T as: T = Z*Λ*Z^T.
Here Λ is a diagonal matrix whose diagonal elements are the eigenvalues λ_i, and Z is the orthogonal matrix
whose columns are the eigenvectors z_i. Thus,
T*z_i = λ_i*z_i for i = 1, 2, ..., n.
If the eigenvectors are requested, then this routine uses a divide and conquer algorithm to compute
eigenvalues and eigenvectors. However, if only eigenvalues are required, then it uses the Pal-Walker-Kahan
variant of the QL or QR algorithm.
There is no complex analogue of this routine.
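The defining relation between T, an eigenvalue, and its eigenvector can be checked directly from the d and e arrays. The following is a minimal sketch with a hypothetical helper name. It assumes d holds the n diagonal elements and e the n-1 off-diagonal elements as they were before the call, since a routine of this kind may overwrite its inputs.

```c
#include <stddef.h>

/* Hypothetical helper: returns the largest absolute component of
 * T*z - lambda*z, where T is the symmetric tridiagonal matrix with
 * diagonal d[0..n-1] and off-diagonal e[0..n-2]. */
static double tridiag_residual(size_t n, const double *d, const double *e,
                               const double *z, double lambda)
{
    double worst = 0.0;
    for (size_t k = 0; k < n; ++k) {
        double r = (d[k] - lambda) * z[k];
        if (k > 0) {
            r += e[k - 1] * z[k - 1];   /* subdiagonal contribution */
        }
        if (k + 1 < n) {
            r += e[k] * z[k + 1];       /* superdiagonal contribution */
        }
        if (r < 0.0) {
            r = -r;
        }
        if (r > worst) {
            worst = r;
        }
    }
    return worst;
}
```

A residual that is small relative to ε times the norm of T indicates that the pair is consistent with the relation above.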
Input Parameters
d, e Arrays:
d contains the n diagonal elements of the tridiagonal matrix T.
The dimension of d must be at least max(1, n).
e contains the n-1 off-diagonal elements of T.
The dimension of e must be at least max(1, n). The n-th element of this
array is used as workspace.
Output Parameters
Return Values
This function returns a value info.
If info = i, then the algorithm failed to converge; i indicates the number of elements of an intermediate
tridiagonal form which did not converge to zero.
If info = -i, the i-th parameter had an illegal value.
Application Notes
The computed eigenvalues and eigenvectors are exact for a matrix T + E such that ||E||_2 = O(ε)*||T||_2,
where ε is the machine precision.
If λ_i is an exact eigenvalue, and μ_i is the corresponding computed value, then
|μ_i - λ_i| ≤ c(n)*ε*||T||_2
where c(n) is a modestly increasing function of n.
If z_i is the corresponding exact eigenvector, and w_i is the corresponding computed vector, then the angle
θ(z_i, w_i) between them is bounded as follows:
θ(z_i, w_i) ≤ c(n)*ε*||T||_2 / min_{j≠i} |λ_i - λ_j|.
Thus the accuracy of a computed eigenvector depends on the gap between its eigenvalue and all the other
eigenvalues.
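The denominator of the eigenvector bound is the distance from one eigenvalue to its nearest neighbour. As a rough sketch, it can be estimated from the computed eigenvalues in w. The helper name is hypothetical, and the computed values are used in place of the exact eigenvalues, so the result is only an estimate of the true gap.

```c
#include <stddef.h>

/* Hypothetical helper: smallest distance from w[i] to any other entry of w,
 * that is, the minimum over j != i of |w[i] - w[j]|.
 * Returns -1.0 when there is no other eigenvalue (n < 2). */
static double eigenvalue_gap(const double *w, size_t n, size_t i)
{
    double gap = -1.0;
    for (size_t j = 0; j < n; ++j) {
        if (j == i) {
            continue;
        }
        double dist = w[i] > w[j] ? w[i] - w[j] : w[j] - w[i];
        if (gap < 0.0 || dist < gap) {
            gap = dist;
        }
    }
    return gap;
}
```

A small gap relative to the norm of T signals that the corresponding eigenvector may be computed with reduced accuracy.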
?stevx
Computes selected eigenvalues and eigenvectors of a
real symmetric tridiagonal matrix.
Syntax
lapack_int LAPACKE_sstevx (int matrix_layout, char jobz, char range, lapack_int n,
float* d, float* e, float vl, float vu, lapack_int il, lapack_int iu, float abstol,
lapack_int* m, float* w, float* z, lapack_int ldz, lapack_int* ifail);
lapack_int LAPACKE_dstevx (int matrix_layout, char jobz, char range, lapack_int n,
double* d, double* e, double vl, double vu, lapack_int il, lapack_int iu, double abstol,
lapack_int* m, double* w, double* z, lapack_int ldz, lapack_int* ifail);
Include Files
• mkl.h
Description
The routine computes selected eigenvalues and, optionally, eigenvectors of a real symmetric tridiagonal
matrix A. Eigenvalues and eigenvectors can be selected by specifying either a range of values or a range of
indices for the desired eigenvalues.
Input Parameters
If range = 'I', the routine computes eigenvalues with indices il to iu.
d, e Arrays:
d contains the n diagonal elements of the tridiagonal matrix A.
The dimension of d must be at least max(1, n).
e contains the n-1 subdiagonal elements of A.
The dimension of e must be at least max(1, n-1). The n-th element of this
array is used as workspace.
vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
Constraint: vl < vu.
il, iu If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned.
Constraint: 1 ≤ il ≤ iu ≤ n, if n > 0; il = 1 and iu = 0 if n = 0.
abstol
ldz The leading dimension of the output array z; ldz ≥ 1. If jobz = 'V', then
ldz ≥ max(1, n) for column major layout and ldz ≥ max(1, m) for row major
layout.
Output Parameters
w, z Arrays:
w, size at least max(1, n).
The first m elements of w contain the selected eigenvalues of the matrix A
in ascending order.
z (size at least max(1, ldz*m) for column major layout and max(1, ldz*n)
for row major layout).
If jobz = 'V', then if info = 0, the first m columns of z contain the
orthonormal eigenvectors of the matrix A corresponding to the selected
eigenvalues, with the i-th column of z holding the eigenvector associated
with w[i - 1].
Return Values
This function returns a value info.
If info = i, then i eigenvectors failed to converge; their indices are stored in the array ifail.
Application Notes
An approximate eigenvalue is accepted as converged when it is determined to lie in an interval [a,b] of width
less than or equal to abstol + ε*max(|a|,|b|), where ε is the machine precision.
If abstol is less than or equal to zero, then ε*||A||_1 is used instead. Eigenvalues are computed most accurately
when abstol is set to twice the underflow threshold 2*?lamch('S'), not zero.
If this routine returns with info > 0, indicating that some eigenvectors did not converge, set abstol to
2*?lamch('S').
?stevr
Computes selected eigenvalues and, optionally,
eigenvectors of a real symmetric tridiagonal matrix
using the Relatively Robust Representations.
Syntax
lapack_int LAPACKE_sstevr (int matrix_layout, char jobz, char range, lapack_int n,
float* d, float* e, float vl, float vu, lapack_int il, lapack_int iu, float abstol,
lapack_int* m, float* w, float* z, lapack_int ldz, lapack_int* isuppz);
lapack_int LAPACKE_dstevr (int matrix_layout, char jobz, char range, lapack_int n,
double* d, double* e, double vl, double vu, lapack_int il, lapack_int iu, double abstol,
lapack_int* m, double* w, double* z, lapack_int ldz, lapack_int* isuppz);
Include Files
• mkl.h
Description
The routine computes selected eigenvalues and, optionally, eigenvectors of a real symmetric tridiagonal
matrix T. Eigenvalues and eigenvectors can be selected by specifying either a range of values or a range of
indices for the desired eigenvalues.
Whenever possible, the routine calls stemr to compute the eigenspectrum using Relatively Robust
Representations. stemr computes eigenvalues by the dqds algorithm, while orthogonal eigenvectors are
computed from various "good" L*D*L^T representations (also known as Relatively Robust Representations).
Gram-Schmidt orthogonalization is avoided as far as possible. More specifically, the various steps of the
algorithm are as follows. For the i-th unreduced block of T:
The desired accuracy of the output can be specified by the input parameter abstol.
The routine ?stevr calls stemr when the full spectrum is requested on machines which conform to the
IEEE-754 floating point standard. ?stevr calls stebz and stein on non-IEEE machines and when partial
spectrum requests are made.
Input Parameters
For range = 'V' or 'I' and iu-il < n-1, sstebz/dstebz and sstein/dstein
are called.
d, e Arrays:
d contains the n diagonal elements of the tridiagonal matrix T.
The dimension of d must be at least max(1, n).
e contains the n-1 subdiagonal elements of T.
The dimension of e must be at least max(1, n-1). The n-th element of this
array is used as workspace.
vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
Constraint: vl < vu.
il, iu If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned.
Constraint: 1 ≤ il ≤ iu ≤ n, if n > 0; il = 1 and iu = 0 if n = 0.
Output Parameters
w, z Arrays:
w, size at least max(1, n).
The first m elements of w contain the selected eigenvalues of the matrix T
in ascending order.
z (size at least max(1, ldz*m) for column major layout and max(1, ldz*n)
for row major layout).
If jobz = 'V', then if info = 0, the first m columns of z contain the
orthonormal eigenvectors of the matrix T corresponding to the selected
eigenvalues, with the i-th column of z holding the eigenvector associated
with w[i - 1].
Return Values
This function returns a value info.
Application Notes
Normal execution of the routine ?stegr may create NaNs and infinities and hence may abort due to a floating
point exception in environments which do not handle NaNs and infinities in the IEEE standard default manner.
gees Computes the eigenvalues and Schur factorization of a general matrix, and orders
the factorization so that selected eigenvalues are at the top left of the Schur form.
geesx Computes the eigenvalues and Schur factorization of a general matrix, orders the
factorization and computes reciprocal condition numbers.
geev Computes the eigenvalues and left and right eigenvectors of a general matrix.
geevx Computes the eigenvalues and left and right eigenvectors of a general matrix, with
preliminary matrix balancing, and computes reciprocal condition numbers for the
eigenvalues and right eigenvectors.
?gees
Computes the eigenvalues and Schur factorization of a
general matrix, and orders the factorization so that
selected eigenvalues are at the top left of the Schur
form.
Syntax
lapack_int LAPACKE_sgees( int matrix_layout, char jobvs, char sort, LAPACK_S_SELECT2
select, lapack_int n, float* a, lapack_int lda, lapack_int* sdim, float* wr, float* wi,
float* vs, lapack_int ldvs );
lapack_int LAPACKE_dgees( int matrix_layout, char jobvs, char sort, LAPACK_D_SELECT2
select, lapack_int n, double* a, lapack_int lda, lapack_int* sdim, double* wr, double*
wi, double* vs, lapack_int ldvs );
lapack_int LAPACKE_cgees( int matrix_layout, char jobvs, char sort, LAPACK_C_SELECT1
select, lapack_int n, lapack_complex_float* a, lapack_int lda, lapack_int* sdim,
lapack_complex_float* w, lapack_complex_float* vs, lapack_int ldvs );
lapack_int LAPACKE_zgees( int matrix_layout, char jobvs, char sort, LAPACK_Z_SELECT1
select, lapack_int n, lapack_complex_double* a, lapack_int lda, lapack_int* sdim,
lapack_complex_double* w, lapack_complex_double* vs, lapack_int ldvs );
Include Files
• mkl.h
Description
The routine computes for an n-by-n real/complex nonsymmetric matrix A, the eigenvalues, the real-Schur/Schur
form T, and, optionally, the matrix of Schur vectors Z. This gives the Schur factorization A = Z*T*Z^H.
Optionally, it also orders the eigenvalues on the diagonal of the real-Schur/Schur form so that selected
eigenvalues are at the top left. The leading columns of Z then form an orthonormal basis for the invariant
subspace corresponding to the selected eigenvalues.
A real matrix is in real-Schur form if it is upper quasi-triangular with 1-by-1 and 2-by-2 blocks. 2-by-2 blocks
will be standardized in the form
Input Parameters
sort Must be 'N' or 'S'. Specifies whether or not to order the eigenvalues on
the diagonal of the Schur form.
If sort = 'N', then eigenvalues are not ordered.
select If sort = 'S', select is used to select eigenvalues to sort to the top left of
the Schur form.
If sort = 'N', select is not referenced.
a Arrays:
a (size at least max(1, lda*n)) is an array containing the n-by-n matrix A.
lda The leading dimension of the array a. Must be at least max(1, n).
Output Parameters
wr, wi Arrays, size at least max(1, n) each. Contain the real and imaginary parts,
respectively, of the computed eigenvalues, in the same order that they
appear on the diagonal of the output real-Schur form T. Complex conjugate
pairs of eigenvalues appear consecutively with the eigenvalue having
positive imaginary part first.
w Array, size at least max(1, n). Contains the computed eigenvalues. The
eigenvalues are stored in the same order as they appear on the diagonal of
the output Schur form T.
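For the real flavors, the documented ordering of wr and wi (conjugate pairs stored consecutively, positive imaginary part first) makes it straightforward to walk the eigenvalue list. The sketch below counts such pairs. The helper name is hypothetical and the code relies only on the ordering stated above.

```c
#include <stddef.h>

/* Hypothetical helper: counts complex conjugate pairs among the eigenvalues
 * whose imaginary parts are stored in wi[0..n-1]. A pair occupies two
 * consecutive slots, with the positive imaginary part first. */
static size_t count_conjugate_pairs(const double *wi, size_t n)
{
    size_t pairs = 0;
    for (size_t j = 0; j + 1 < n; ++j) {
        if (wi[j] > 0.0 && wi[j + 1] < 0.0) {
            ++pairs;
            ++j;   /* skip the conjugate partner */
        }
    }
    return pairs;
}
```

Each counted pair corresponds to one 2-by-2 block on the diagonal of the real-Schur form.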
Return Values
This function returns a value info.
If info = i, and
i ≤ n:
the QR algorithm failed to compute all the eigenvalues; elements 1:ilo-1 and i+1:n of wr and wi (for real
flavors) or w (for complex flavors) contain those eigenvalues which have converged; if jobvs = 'V', vs
contains the matrix which reduces A to its partially converged Schur form;
i = n+1:
the eigenvalues could not be reordered because some eigenvalues were too close to separate (the problem is
very ill-conditioned);
i = n+2:
after reordering, round-off changed values of some complex eigenvalues so that leading eigenvalues in the
Schur form no longer satisfy select = 1. This could also be caused by underflow due to scaling.
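The return value cases above can be turned into a small classifier. This is a sketch with hypothetical names. The treatment of info = 0 as success and info < 0 as an illegal argument follows the convention stated for other routines in this reference and is an assumption for this routine.

```c
/* Hypothetical status codes for interpreting the value returned by ?gees. */
typedef enum {
    GEES_OK,
    GEES_BAD_ARGUMENT,
    GEES_QR_FAILED,
    GEES_REORDER_FAILED,
    GEES_SELECT_VIOLATED,
    GEES_UNKNOWN
} gees_status;

/* Maps info to one of the documented cases for an n-by-n problem. */
static gees_status classify_gees_info(long info, long n)
{
    if (info == 0) {
        return GEES_OK;               /* assumed: zero means success */
    }
    if (info < 0) {
        return GEES_BAD_ARGUMENT;     /* assumed: -i flags the i-th parameter */
    }
    if (info <= n) {
        return GEES_QR_FAILED;        /* QR algorithm did not converge */
    }
    if (info == n + 1) {
        return GEES_REORDER_FAILED;   /* eigenvalues too close to separate */
    }
    if (info == n + 2) {
        return GEES_SELECT_VIOLATED;  /* round-off changed selected eigenvalues */
    }
    return GEES_UNKNOWN;
}
```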
?geesx
Computes the eigenvalues and Schur factorization of a
general matrix, orders the factorization and computes
reciprocal condition numbers.
Syntax
lapack_int LAPACKE_sgeesx( int matrix_layout, char jobvs, char sort, LAPACK_S_SELECT2
select, char sense, lapack_int n, float* a, lapack_int lda, lapack_int* sdim, float* wr,
float* wi, float* vs, lapack_int ldvs, float* rconde, float* rcondv );
lapack_int LAPACKE_dgeesx( int matrix_layout, char jobvs, char sort, LAPACK_D_SELECT2
select, char sense, lapack_int n, double* a, lapack_int lda, lapack_int* sdim, double*
wr, double* wi, double* vs, lapack_int ldvs, double* rconde, double* rcondv );
lapack_int LAPACKE_cgeesx( int matrix_layout, char jobvs, char sort, LAPACK_C_SELECT1
select, char sense, lapack_int n, lapack_complex_float* a, lapack_int lda, lapack_int*
sdim, lapack_complex_float* w, lapack_complex_float* vs, lapack_int ldvs, float*
rconde, float* rcondv );
lapack_int LAPACKE_zgeesx( int matrix_layout, char jobvs, char sort, LAPACK_Z_SELECT1
select, char sense, lapack_int n, lapack_complex_double* a, lapack_int lda, lapack_int*
sdim, lapack_complex_double* w, lapack_complex_double* vs, lapack_int ldvs, double*
rconde, double* rcondv );
Include Files
• mkl.h
Description
The routine computes for an n-by-n real/complex nonsymmetric matrix A, the eigenvalues, the real-Schur/
Schur form T, and, optionally, the matrix of Schur vectors Z. This gives the Schur factorization A = Z*T*Z^H.
Optionally, it also orders the eigenvalues on the diagonal of the real-Schur/Schur form so that selected
eigenvalues are at the top left; computes a reciprocal condition number for the average of the selected
eigenvalues (rconde); and computes a reciprocal condition number for the right invariant subspace
corresponding to the selected eigenvalues (rcondv). The leading columns of Z form an orthonormal basis for
this invariant subspace.
For further explanation of the reciprocal condition numbers rconde and rcondv, see [LUG], Section 4.10
(where these quantities are called s and sep respectively).
A real matrix is in real-Schur form if it is upper quasi-triangular with 1-by-1 and 2-by-2 blocks. 2-by-2 blocks
will be standardized in the form
Input Parameters
sort Must be 'N' or 'S'. Specifies whether or not to order the eigenvalues on
the diagonal of the Schur form.
If sort = 'N', then eigenvalues are not ordered.
select If sort = 'S', select is used to select eigenvalues to sort to the top left of
the Schur form.
If sort = 'N', select is not referenced.
For real flavors:
An eigenvalue wr[j]+sqrt(-1)*wi[j] is selected if select(wr[j], wi[j]) is
true; that is, if either one of a complex conjugate pair of eigenvalues is
selected, then both complex eigenvalues are selected.
For complex flavors:
An eigenvalue w[j] is selected if select(w[j]) is true.
Note that a selected complex eigenvalue may no longer satisfy select(wr[j],
wi[j])= 1 after ordering, since ordering may change the value of complex
eigenvalues (especially if the eigenvalue is ill-conditioned); in this case info
may be set to n+2 (see info below).
sense Must be 'N', 'E', 'V', or 'B'. Determines which reciprocal condition
numbers are computed.
If sense = 'N', none are computed;
a Arrays:
a (size at least max(1, lda*n)) is an array containing the n-by-n matrix A.
lda The leading dimension of the array a. Must be at least max(1, n).
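As an illustration of the select argument for the real flavors, the callback below picks eigenvalues with a negative real part. It is a sketch under assumptions: the name select_stable and the logical_t typedef are hypothetical stand-ins, and the pointer-argument shape mirrors the usual LAPACKE callback form, which should be checked against the LAPACK_D_SELECT2 declaration in mkl.h.

```c
/* Stand-in for the library's logical type; the real callback type is
 * declared in mkl.h and is not redefined here. */
typedef int logical_t;

/* Selects eigenvalues with negative real part. The decision uses only the
 * real part, so both members of a complex conjugate pair get the same
 * answer, in line with the pairing rule described for select. */
static logical_t select_stable(const double *wr, const double *wi)
{
    (void)wi;   /* the imaginary part is not needed for this criterion */
    return *wr < 0.0;
}
```

With sort = 'S', a function of this kind would be passed as the select argument so that the selected eigenvalues are moved to the top left of the Schur form.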
Output Parameters
wr, wi Arrays, size at least max(1, n) each. Contain the real and imaginary parts,
respectively, of the computed eigenvalues, in the same order that they
appear on the diagonal of the output real-Schur form T. Complex conjugate
pairs of eigenvalues appear consecutively with the eigenvalue having
positive imaginary part first.
w Array, size at least max(1, n). Contains the computed eigenvalues. The
eigenvalues are stored in the same order as they appear on the diagonal of
the output Schur form T.
rconde, rcondv If sense = 'E' or 'B', rconde contains the reciprocal condition number for
the average of the selected eigenvalues.
If sense = 'N' or 'V', rconde is not referenced.
If sense = 'V' or 'B', rcondv contains the reciprocal condition number for
the selected right invariant subspace.
If sense = 'N' or 'E', rcondv is not referenced.
Return Values
This function returns a value info.
If info = i, and
i ≤ n:
the QR algorithm failed to compute all the eigenvalues; elements 1:ilo-1 and i+1:n of wr and wi (for real
flavors) or w (for complex flavors) contain those eigenvalues which have converged; if jobvs = 'V', vs
contains the transformation which reduces A to its partially converged Schur form;
i = n+1:
the eigenvalues could not be reordered because some eigenvalues were too close to separate (the problem is
very ill-conditioned);
i = n+2:
after reordering, roundoff changed values of some complex eigenvalues so that leading eigenvalues in the
Schur form no longer satisfy select = 1. This could also be caused by underflow due to scaling.
?geev
Computes the eigenvalues and left and right
eigenvectors of a general matrix.
Syntax
lapack_int LAPACKE_sgeev( int matrix_layout, char jobvl, char jobvr, lapack_int n,
float* a, lapack_int lda, float* wr, float* wi, float* vl, lapack_int ldvl, float* vr,
lapack_int ldvr );
lapack_int LAPACKE_dgeev( int matrix_layout, char jobvl, char jobvr, lapack_int n,
double* a, lapack_int lda, double* wr, double* wi, double* vl, lapack_int ldvl, double*
vr, lapack_int ldvr );
lapack_int LAPACKE_cgeev( int matrix_layout, char jobvl, char jobvr, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* w, lapack_complex_float*
vl, lapack_int ldvl, lapack_complex_float* vr, lapack_int ldvr );
lapack_int LAPACKE_zgeev( int matrix_layout, char jobvl, char jobvr, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* w,
lapack_complex_double* vl, lapack_int ldvl, lapack_complex_double* vr, lapack_int
ldvr );
Include Files
• mkl.h
Description
The routine computes for an n-by-n real/complex nonsymmetric matrix A, the eigenvalues and, optionally,
the left and/or right eigenvectors. The right eigenvector v of A satisfies
A*v = λ*v
where λ is its eigenvalue.
Input Parameters
lda The leading dimension of the array a. Must be at least max(1, n).
ldvl, ldvr The leading dimensions of the output arrays vl and vr, respectively.
Constraints:
ldvl ≥ 1; ldvr ≥ 1.
If jobvl = 'V', ldvl ≥ max(1, n);
Output Parameters
vl, vr Arrays:
vl (size at least max(1, ldvl*n)).
If jobvl = 'N', vl is not referenced.
Return Values
This function returns a value info.
If info = i, the QR algorithm failed to compute all the eigenvalues, and no eigenvectors have been
computed; elements i+1:n of wr and wi (for real flavors) or w (for complex flavors) contain those
eigenvalues which have converged.
?geevx
Computes the eigenvalues and left and right
eigenvectors of a general matrix, with preliminary
matrix balancing, and computes reciprocal condition
numbers for the eigenvalues and right eigenvectors.
Syntax
lapack_int LAPACKE_sgeevx( int matrix_layout, char balanc, char jobvl, char jobvr, char
sense, lapack_int n, float* a, lapack_int lda, float* wr, float* wi, float* vl,
lapack_int ldvl, float* vr, lapack_int ldvr, lapack_int* ilo, lapack_int* ihi, float*
scale, float* abnrm, float* rconde, float* rcondv );
lapack_int LAPACKE_dgeevx( int matrix_layout, char balanc, char jobvl, char jobvr, char
sense, lapack_int n, double* a, lapack_int lda, double* wr, double* wi, double* vl,
lapack_int ldvl, double* vr, lapack_int ldvr, lapack_int* ilo, lapack_int* ihi, double*
scale, double* abnrm, double* rconde, double* rcondv );
lapack_int LAPACKE_cgeevx( int matrix_layout, char balanc, char jobvl, char jobvr, char
sense, lapack_int n, lapack_complex_float* a, lapack_int lda, lapack_complex_float* w,
lapack_complex_float* vl, lapack_int ldvl, lapack_complex_float* vr, lapack_int ldvr,
lapack_int* ilo, lapack_int* ihi, float* scale, float* abnrm, float* rconde, float*
rcondv );
lapack_int LAPACKE_zgeevx( int matrix_layout, char balanc, char jobvl, char jobvr, char
sense, lapack_int n, lapack_complex_double* a, lapack_int lda, lapack_complex_double*
w, lapack_complex_double* vl, lapack_int ldvl, lapack_complex_double* vr, lapack_int
ldvr, lapack_int* ilo, lapack_int* ihi, double* scale, double* abnrm, double* rconde,
double* rcondv );
Include Files
• mkl.h
Description
The routine computes for an n-by-n real/complex nonsymmetric matrix A, the eigenvalues and, optionally,
the left and/or right eigenvectors.
Optionally also, it computes a balancing transformation to improve the conditioning of the eigenvalues and
eigenvectors (ilo, ihi, scale, and abnrm), reciprocal condition numbers for the eigenvalues (rconde), and
reciprocal condition numbers for the right eigenvectors (rcondv).
The right eigenvector v of A satisfies
A·v = λ·v
where λ is its eigenvalue.
The left eigenvector u of A satisfies
u^H·A = λ·u^H
where u^H denotes the conjugate transpose of u. The computed eigenvectors are normalized to have Euclidean
norm equal to 1 and largest component real.
Balancing a matrix means permuting the rows and columns to make it more nearly upper triangular, and
applying a diagonal similarity transformation D*A*inv(D), where D is a diagonal matrix, to make its rows and
columns closer in norm and the condition numbers of its eigenvalues and eigenvectors smaller. The computed
reciprocal condition numbers correspond to the balanced matrix. Permuting rows and columns will not
change the condition numbers (in exact arithmetic), but diagonal scaling will. For further explanation of
balancing, see [LUG], Section 4.10.
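As a rough illustration of the diagonal similarity transformation D*A*inv(D) described above, the following sketch scales a small column-major matrix in place. The function name scale_similarity and the plain array d holding the diagonal of D are assumptions made for this example only; the sketch does not reproduce how the library chooses D or performs balancing internally.

```c
/* Hypothetical helper (not part of the library): overwrite the n-by-n
 * column-major matrix a (leading dimension lda) with D*A*inv(D), where
 * D = diag(d[0], ..., d[n-1]). Entry (i,j) becomes d[i] * a(i,j) / d[j].
 * Every d[j] must be nonzero. */
void scale_similarity(int n, double *a, int lda, const double *d)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < n; ++i) {
            a[i + j * lda] = d[i] * a[i + j * lda] / d[j];
        }
    }
}
```

Diagonal entries are unchanged by this transformation, while off-diagonal entries are rescaled, which is how the rows and columns can be brought closer in norm.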
Input Parameters
balanc Must be 'N', 'P', 'S', or 'B'. Indicates how the input matrix should be
diagonally scaled and/or permuted to improve the conditioning of its
eigenvalues.
If balanc = 'N', do not diagonally scale or permute;
sense Must be 'N', 'E', 'V', or 'B'. Determines which reciprocal condition
numbers are computed.
If sense = 'N', none are computed;
If sense is 'E' or 'B', both left and right eigenvectors must also be
computed (jobvl = 'V' and jobvr = 'V').
a Arrays:
a (size at least max(1, lda*n)) is an array containing the n-by-n matrix A.
lda The leading dimension of the array a. Must be at least max(1, n).
ldvl, ldvr The leading dimensions of the output arrays vl and vr, respectively.
Constraints:
ldvl≥ 1; ldvr≥ 1.
If jobvl = 'V', ldvl≥ max(1, n);
Output Parameters
wr, wi Arrays, size at least max(1, n) each. Contain the real and imaginary parts,
respectively, of the computed eigenvalues. Complex conjugate pairs of
eigenvalues appear consecutively with the eigenvalue having positive
imaginary part first.
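The ordering rule for wr and wi can be used when walking through the results of a real flavor. The sketch below counts complex conjugate pairs; the function name count_conjugate_pairs is hypothetical, and the code relies only on the ordering stated above (pair members consecutive, positive imaginary part first).

```c
/* Hypothetical helper (not part of the library): count the complex
 * conjugate pairs among the n eigenvalues stored in wr (real parts)
 * and wi (imaginary parts). */
int count_conjugate_pairs(int n, const double *wr, const double *wi)
{
    int pairs = 0;
    int j = 0;
    (void)wr; /* real parts are not needed for counting */
    while (j < n) {
        if (wi[j] > 0.0 && j + 1 < n) {
            /* entries j and j+1 form one conjugate pair */
            ++pairs;
            j += 2;
        } else {
            /* entry j is treated as a real eigenvalue */
            ++j;
        }
    }
    return pairs;
}
```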
vl, vr Arrays:
vl (size at least max(1, ldvl*n)).
If jobvl = 'N', vl is not referenced.
ilo, ihi ilo and ihi are integer values determined when A was balanced.
The balanced A(i,j) = 0 if i > j and j = 1,..., ilo-1 or i = ihi+1,..., n.
If balanc = 'N' or 'S', ilo = 1 and ihi = n.
scale Array, size at least max(1, n). Details of the permutations and scaling
factors applied when balancing A.
If P[j - 1] is the index of the row and column interchanged with row and
column j, and D[j - 1] is the scaling factor applied to row and column j,
then
scale[j - 1] = P[j - 1], for j = 1,...,ilo-1
= D[j - 1], for j = ilo,...,ihi
= P[j - 1] for j = ihi+1,..., n.
The order in which the interchanges are made is n to ihi+1, then 1 to ilo-1.
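The piecewise layout of scale can be unpacked as in the following sketch. The function name decode_scale and the output arrays perm and d are assumptions for illustration, and int stands in for lapack_int so that the example compiles without mkl.h. Interchange indices are copied as stored, without any change of index base.

```c
/* Hypothetical helper (not part of the library): split scale into
 * interchange indices and scaling factors. ilo and ihi are the 1-based
 * values returned by the routine.
 *   j outside ilo..ihi: perm[j-1] = stored interchange index, d[j-1] = 1.0
 *   j inside  ilo..ihi: d[j-1] = stored scaling factor, perm[j-1] = j */
void decode_scale(int n, int ilo, int ihi, const double *scale,
                  int *perm, double *d)
{
    for (int j = 1; j <= n; ++j) {
        if (j >= ilo && j <= ihi) {
            d[j - 1] = scale[j - 1];          /* scaling factor D[j - 1] */
            perm[j - 1] = j;                  /* no interchange recorded */
        } else {
            d[j - 1] = 1.0;                   /* no scaling recorded */
            perm[j - 1] = (int)scale[j - 1];  /* interchange index P[j - 1] */
        }
    }
}
```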
abnrm The one-norm of the balanced matrix (the maximum of the sum of absolute
values of elements of any column).
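For reference, the one-norm as defined here can be computed directly. The following is a minimal sketch for a column-major array; the function name one_norm_colmajor is hypothetical, and the sketch computes the norm of whatever matrix is passed in (it does not perform balancing).

```c
/* Hypothetical helper (not part of the library): one-norm of an n-by-n
 * column-major matrix, that is, the largest column sum of absolute values. */
double one_norm_colmajor(int n, const double *a, int lda)
{
    double best = 0.0;
    for (int j = 0; j < n; ++j) {
        double sum = 0.0;
        for (int i = 0; i < n; ++i) {
            double x = a[i + j * lda];
            sum += (x < 0.0) ? -x : x;   /* absolute value without math.h */
        }
        if (sum > best) {
            best = sum;
        }
    }
    return best;
}
```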
Return Values
This function returns a value info.
If info = i, the QR algorithm failed to compute all the eigenvalues, and no eigenvectors or condition
numbers have been computed; elements 1:ilo-1 and i+1:n of wr and wi (for real flavors) or w (for complex
flavors) contain eigenvalues which have converged.
?gesdd Computes the singular value decomposition of a general rectangular matrix using a
divide and conquer method.
?gejsv Computes the singular value decomposition of a real matrix using a preconditioned
Jacobi SVD method.
?gesvj Computes the singular value decomposition of a real matrix using Jacobi plane
rotations.
?gesvdx Computes the SVD and left and right singular vectors for a matrix.
?gesvda_batch_strided Computes the truncated SVD of a group of general m-by-n matrices that are stored
at a constant stride from each other in a contiguous block of memory.
?gesvd
Computes the singular value decomposition of a
general rectangular matrix.
Syntax
lapack_int LAPACKE_sgesvd( int matrix_layout, char jobu, char jobvt, lapack_int m,
lapack_int n, float* a, lapack_int lda, float* s, float* u, lapack_int ldu, float* vt,
lapack_int ldvt, float* superb );
lapack_int LAPACKE_dgesvd( int matrix_layout, char jobu, char jobvt, lapack_int m,
lapack_int n, double* a, lapack_int lda, double* s, double* u, lapack_int ldu, double*
vt, lapack_int ldvt, double* superb );
lapack_int LAPACKE_cgesvd( int matrix_layout, char jobu, char jobvt, lapack_int m,
lapack_int n, lapack_complex_float* a, lapack_int lda, float* s, lapack_complex_float*
u, lapack_int ldu, lapack_complex_float* vt, lapack_int ldvt, float* superb );
lapack_int LAPACKE_zgesvd( int matrix_layout, char jobu, char jobvt, lapack_int m,
lapack_int n, lapack_complex_double* a, lapack_int lda, double* s,
lapack_complex_double* u, lapack_int ldu, lapack_complex_double* vt, lapack_int ldvt,
double* superb );
Include Files
• mkl.h
Description
The routine computes the singular value decomposition (SVD) of a real/complex m-by-n matrix A, optionally
computing the left and/or right singular vectors. The SVD is written as
A = U*Σ*VT for real routines
A = U*Σ*VH for complex routines
where Σ is an m-by-n matrix which is zero except for its min(m,n) diagonal elements, U is an m-by-m
orthogonal/unitary matrix, and V is an n-by-n orthogonal/unitary matrix. The diagonal elements of Σ are the
singular values of A; they are real and non-negative, and are returned in descending order. The first min(m,
n) columns of U and V are the left and right singular vectors of A.
The routine returns VT (for real flavors) or VH (for complex flavors), not V.
Input Parameters
jobu Must be 'A', 'S', 'O', or 'N'. Specifies options for computing all or part
of the matrix U.
If jobu = 'A', all m columns of U are returned in the array u;
if jobu = 'S', the first min(m, n) columns of U (the left singular vectors)
are returned in the array u;
if jobu = 'O', the first min(m, n) columns of U (the left singular vectors)
are overwritten on the array a;
if jobu = 'N', no columns of U (no left singular vectors) are computed.
jobvt Must be 'A', 'S', 'O', or 'N'. Specifies options for computing all or part
of the matrix VT/VH.
If jobvt = 'A', all n rows of VT/VH are returned in the array vt;
if jobvt = 'S', the first min(m,n) rows of VT/VH (the right singular
vectors) are returned in the array vt;
if jobvt = 'O', the first min(m,n) rows of VT/VH (the right singular
vectors) are overwritten on the array a;
if jobvt = 'N', no rows of VT/VH (no right singular vectors) are computed.
a Arrays:
a (size at least max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout) is an array containing the m-by-n matrix A.
lda The leading dimension of the array a.
Must be at least max(1, m) for column major layout and at least max(1, n)
for row major layout.
ldu, ldvt The leading dimensions of the output arrays u and vt, respectively.
Constraints:
ldu≥ 1; ldvt≥ 1.
If jobu = 'A', ldu≥m;
If jobu = 'S', ldu≥m for column major layout and ldu≥ min(m, n) for row
major layout;
If jobvt = 'A', ldvt≥n;
If jobvt = 'S', ldvt≥ min(m, n) for column major layout and ldvt≥n for
row major layout.
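The constraints on ldu and ldvt listed above can be collected into two small functions, sketched below. The function names and the SKETCH_ constants are assumptions for this example; the constants carry the same values as LAPACK_ROW_MAJOR and LAPACK_COL_MAJOR and are redefined only so that the sketch compiles without mkl.h, and int stands in for lapack_int.

```c
/* Same values as LAPACK_ROW_MAJOR / LAPACK_COL_MAJOR; redefined here
 * only to keep the sketch independent of mkl.h. */
enum { SKETCH_ROW_MAJOR = 101, SKETCH_COL_MAJOR = 102 };

static int sketch_min(int a, int b) { return a < b ? a : b; }
static int sketch_max(int a, int b) { return a > b ? a : b; }

/* Hypothetical helper: smallest ldu allowed by the constraints above. */
int gesvd_min_ldu(int layout, char jobu, int m, int n)
{
    int ld = 1;
    if (jobu == 'A') {
        ld = m;
    } else if (jobu == 'S') {
        ld = (layout == SKETCH_COL_MAJOR) ? m : sketch_min(m, n);
    }
    return sketch_max(1, ld);
}

/* Hypothetical helper: smallest ldvt allowed by the constraints above. */
int gesvd_min_ldvt(int layout, char jobvt, int m, int n)
{
    int ld = 1;
    if (jobvt == 'A') {
        ld = n;
    } else if (jobvt == 'S') {
        ld = (layout == SKETCH_COL_MAJOR) ? sketch_min(m, n) : n;
    }
    return sketch_max(1, ld);
}
```

A caller might use these values to size u and vt before invoking the routine; larger leading dimensions are also acceptable.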
Output Parameters
a On exit,
If jobu = 'O', a is overwritten with the first min(m,n) columns of U (the
left singular vectors stored columnwise);
If jobvt = 'O', a is overwritten with the first min(m, n) rows of VT/VH (the
right singular vectors stored rowwise);
If jobu≠'O' and jobvt≠'O', the contents of a are destroyed.
u, vt Arrays:
Array u minimum size:
If jobvt = 'S', vt contains the first min(m, n) rows of VT/VH (the right
singular vectors stored row-wise).
If jobvt = 'N' or 'O', vt is not referenced.
superb If ?bdsqr does not converge (indicated by the return value info > 0), on
exit superb(0:min(m,n)-2) contains the unconverged superdiagonal
elements of an upper bidiagonal matrix B whose diagonal is in s (not
necessarily sorted). B satisfies A = u*B*VT (real flavors) or A = u*B*VH
(complex flavors), so it has the same singular values as A, and singular
vectors related by u and vt.
Return Values
This function returns a value info.
If info = i, then if ?bdsqr did not converge, i specifies how many superdiagonals of the intermediate
bidiagonal form B did not converge to zero (see the description of the superb parameter for details).
?gesdd
Computes the singular value decomposition of a
general rectangular matrix using a divide and conquer
method.
Syntax
lapack_int LAPACKE_sgesdd( int matrix_layout, char jobz, lapack_int m, lapack_int n,
float* a, lapack_int lda, float* s, float* u, lapack_int ldu, float* vt, lapack_int
ldvt );
lapack_int LAPACKE_dgesdd( int matrix_layout, char jobz, lapack_int m, lapack_int n,
double* a, lapack_int lda, double* s, double* u, lapack_int ldu, double* vt, lapack_int
ldvt );
lapack_int LAPACKE_cgesdd( int matrix_layout, char jobz, lapack_int m, lapack_int n,
lapack_complex_float* a, lapack_int lda, float* s, lapack_complex_float* u, lapack_int
ldu, lapack_complex_float* vt, lapack_int ldvt );
lapack_int LAPACKE_zgesdd( int matrix_layout, char jobz, lapack_int m, lapack_int n,
lapack_complex_double* a, lapack_int lda, double* s, lapack_complex_double* u,
lapack_int ldu, lapack_complex_double* vt, lapack_int ldvt );
Include Files
• mkl.h
Description
The routine computes the singular value decomposition (SVD) of a real/complex m-by-n matrix A, optionally
computing the left and/or right singular vectors.
If singular vectors are desired, it uses a divide-and-conquer algorithm. The SVD is written
A = U*Σ*VT for real routines,
A = U*Σ*VH for complex routines,
where Σ is an m-by-n matrix which is zero except for its min(m,n) diagonal elements, U is an m-by-m
orthogonal/unitary matrix, and V is an n-by-n orthogonal/unitary matrix. The diagonal elements of Σ are the
singular values of A; they are real and non-negative, and are returned in descending order. The first min(m,
n) columns of U and V are the left and right singular vectors of A.
Note that the routine returns vt = VT (for real flavors) or vt =VH (for complex flavors), not V.
Input Parameters
if m≥ n, the first n columns of U are overwritten in the array a and all rows
of VT or VH are returned in the array vt;
if m<n, all columns of U are returned in the array u and the first m rows of
VT or VH are overwritten in the array a;
if jobz = 'N', no columns of U or rows of VT or VH are computed.
a a(size max(1, lda*n) for column major layout and max(1, lda*m) for row
major layout) is an array containing the m-by-n matrix A.
lda The leading dimension of the array a. Must be at least max(1, m) for
column major layout and at least max(1, n) for row major layout.
ldu, ldvt The leading dimensions of the output arrays u and vt, respectively.
The minimum size of ldu is
jobz    m≥n    m<n
'N'     1      1
'A'     m      m
'O'     1      m
The minimum size of ldvt is
jobz    m≥n    m<n
'N'     1      1
'A'     n      n
'O'     n      1
Output Parameters
a On exit:
If jobz = 'O', then if m≥ n, a is overwritten with the first n columns of U
(the left singular vectors, stored columnwise). If m < n, a is overwritten
with the first m rows of VT (the right singular vectors, stored rowwise);
If jobz≠'O', the contents of a are destroyed.
u, vt Arrays:
Array u is of size:
'N' 1 1
Array vt is of size:
'N' 1 1
If jobz = 'A' or jobz = 'O' and m≥n, vt contains the n-by-n orthogonal/
unitary matrix VT.
If jobz = 'S', vt contains the first min(m, n) rows of VT (the right singular
vectors, stored rowwise).
If jobz = 'O' and m < n, or jobz = 'N', vt is not referenced.
Return Values
This function returns a value info.
?gejsv
Computes the singular value decomposition using a
preconditioned Jacobi SVD method.
Syntax
lapack_int LAPACKE_sgejsv (int matrix_layout, char joba, char jobu, char jobv, char
jobr, char jobt, char jobp, lapack_int m, lapack_int n, float * a, lapack_int lda, float
* sva, float * u, lapack_int ldu, float * v, lapack_int ldv, float * stat, lapack_int *
istat);
lapack_int LAPACKE_dgejsv (int matrix_layout, char joba, char jobu, char jobv, char
jobr, char jobt, char jobp, lapack_int m, lapack_int n, double * a, lapack_int lda,
double * sva, double * u, lapack_int ldu, double * v, lapack_int ldv, double * stat,
lapack_int * istat);
lapack_int LAPACKE_cgejsv (int matrix_layout, char joba, char jobu, char jobv, char
jobr, char jobt, char jobp, lapack_int m, lapack_int n, lapack_complex_float * a,
lapack_int lda, float * sva, lapack_complex_float * u, lapack_int ldu,
lapack_complex_float * v, lapack_int ldv, float * stat, lapack_int * istat);
lapack_int LAPACKE_zgejsv (int matrix_layout, char joba, char jobu, char jobv, char
jobr, char jobt, char jobp, lapack_int m, lapack_int n, lapack_complex_double * a,
lapack_int lda, double * sva, lapack_complex_double * u, lapack_int ldu,
lapack_complex_double * v, lapack_int ldv, double * stat, lapack_int * istat);
Include Files
• mkl.h
Description
The routine computes the singular value decomposition (SVD) of a real/complex m-by-n matrix A, where m≥n.
The ?gejsv routine can sometimes compute tiny singular values and their singular vectors much more
accurately than other SVD routines.
The routine implements a preconditioned Jacobi SVD algorithm. It uses ?geqp3, ?geqrf, and ?gelqf as
preprocessors and preconditioners. Optionally, an additional row pivoting can be used as a preprocessor,
which in some cases results in much higher accuracy. An example is matrix A with the structure A = D1 * C
* D2, where D1, D2 are arbitrarily ill-conditioned diagonal matrices and C is a well-conditioned matrix. In that
case, complete pivoting in the first QR factorizations provides accuracy dependent on the condition number
of C, and independent of D1, D2. Such higher accuracy is not completely understood theoretically, but it
works well in practice.
If A can be written as A = B*D, with well-conditioned B and some diagonal D, then the high accuracy is
guaranteed, both theoretically and in software, independent of D. For more details see [Drmac08-1],
[Drmac08-2].
The computational range for the singular values can be the full range ( UNDERFLOW,OVERFLOW ), provided
that the machine arithmetic and the BLAS and LAPACK routines called by ?gejsv are implemented to work in
that range. If that is not the case, the restriction for safe computation with the singular values in the range
of normalized IEEE numbers is that the spectral condition number kappa(A)=sigma_max(A)/sigma_min(A)
does not overflow. This code (?gejsv) is best used in this restricted range, meaning that singular values of
magnitude below ||A||_2 / slamch('O') (for single precision) or ||A||_2 / dlamch('O') (for double
precision) are returned as zeros. See jobr for details on this.
This implementation is slower than the one described in [Drmac08-1], [Drmac08-2] due to replacement of
some non-LAPACK components, and because the choice of some tuning parameters in the iterative part
(?gesvj) is left to the implementer on a particular machine.
The rank revealing QR factorization (in this code: ?geqp3) should be implemented as in [Drmac08-3].
If m is much larger than n, the initial QRF with column pivoting can be preprocessed by a
QRF without pivoting. That well-known trick is not used in ?gejsv because in some cases heavy row
weighting can be treated with complete pivoting. The overhead in cases where m is much larger than n is then only
due to pivoting, but the benefits in accuracy have prevailed. You can incorporate this extra QRF step easily
and also improve data movement (matrix transpose, matrix copy, matrix transposed copy); this
implementation of ?gejsv uses only the simplest, naive data movement.
Product and Performance Information
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.
Input Parameters
If joba = 'R', the procedure is similar to the 'A' option. Rank revealing
property of the initial QR factorization is used to reveal (using triangular
factor) a gap sigma_{r+1} < epsilon * sigma_r, in which case the
numerical rank is declared to be r. The SVD is computed with absolute error
bounds, but more accurately than with 'A'.
If jobu = 'F', a full set of m left singular vectors is returned in the array u.
Specifies the range for the singular values. If small positive singular values
are outside the specified range, they may be set to zero. If A is scaled so
that the largest singular value of the scaled matrix is around sqrt(big),
big = ?lamch('O'), the function can remove columns of A whose norm in
the scaled matrix is less than sqrt(?lamch('S')) (for jobr = 'R'), or
less than small = ?lamch('S')/?lamch('E').
If jobr = 'N', the function does not remove small columns of the scaled
matrix. This option assumes that BLAS and QR factorizations and triangular
solvers are implemented to work in that range. If the condition number of A is
greater than big, use ?gesvj.
If jobr = 'R', the restricted range for singular values of the scaled matrix A is
[sqrt(?lamch('S')), sqrt(big)], roughly as described above. This
option is recommended.
For computing the singular values in the full range [?lamch('S'),big],
use ?gesvj.
The decision is based on two values of entropy over the adjoint orbit of AT *
A (for real flavors) or AH * A (for complex flavors). See the descriptions of
stat[5] and stat[6].
If jobt = 'T', the function performs transposition if the entropy test
indicates possibly faster convergence of the Jacobi process, if A is taken as
input. If A is replaced with AT or AH, the row pivoting is included
automatically.
If jobt = 'N', the function attempts no speculations. This option can be
used to compute only the singular values, or the full SVD (u, sigma, and v).
For only one set of singular vectors (u or v), the caller should provide both
u and v, as one of the arrays is used as workspace if the matrix A is
transposed. The implementer can easily remove this constraint and make
the code more complicated. See the descriptions of u and v.
Caution
The jobt = 'T' option is experimental and its effect might not
be the same in subsequent releases. Consider using jobt =
'N' instead.
a, u, v Array a(size lda*n for column major layout and lda*m for row major
layout) is an array containing the m-by-n matrix A.
u is a workspace array, its size for column major layout is ldu*n for
jobu='U' or 'W' and ldu*m for jobu='F'; for row major layout its size is at
least ldu*m. When jobt = 'T' and m = n, u must be provided even though
jobu = 'N'.
v is a workspace array, its size is ldv*n. When jobt = 'T' and m = n, v
must be provided even though jobv = 'N'.
lda The leading dimension of the array a. Must be at least max(1, m) for
column major layout and at least max(1, n) for row major layout .
jobu = 'U' or 'F' or 'W', ldu≥m for column major layout; for row major
layout if jobu = 'U' or jobu = 'W', ldu≥n and if jobu = 'F', ldu≥m.
rwork rwork is an array of size at least max(7, lrwork) for real flavors and at
least max(7, lwork) for complex flavors.
Output Parameters
sva On exit:
u On exit:
If jobu = 'U', contains the m-by-n matrix of the left singular vectors.
If jobu = 'F', contains the m-by-m matrix of the left singular vectors,
including an orthonormal basis of the orthogonal complement of the range
of A.
If jobu = 'W' and jobv = 'V', jobt = 'T', and m = n, then u is used
as workspace if the procedure replaces A with AT (for real flavors) or AH (for
complex flavors). In that case, v is computed in u as left singular vectors of
AT or AH and copied back to the v array. This 'W' option is just a reminder
to the caller that in this case u is reserved as workspace of length n*n.
v On exit:
If jobv = 'V' or 'J', contains the n-by-n matrix of the right singular
vectors.
If jobv = 'W' and jobu = 'U', jobt = 'T', and m = n, then v is used
as workspace if the procedure replaces A with AT (for real flavors) or AH (for
complex flavors). In that case, u is computed in v as right singular vectors
of AT or AH and copied back to the u array. This 'W' option is just a
reminder to the caller that in this case v is reserved as workspace of length
n*n.
If jobv = 'N', v is not referenced.
stat On exit,
stat[0] = scale = stat[1]/stat[0] is the scaling factor such that
scale*sva(1:n) are the computed singular values of A. See the
description of sva.
If full SVD is needed, the following two condition numbers are useful for the
analysis of the algorithm. They are provided for a user who is familiar with
the details of the method.
stat[3] = an estimate of the scaled condition number of the triangular
factor in the first QR factorization.
stat[4] = an estimate of the scaled condition number of the triangular
factor in the second QR factorization.
The following two parameters are computed if jobt = 'T'. They are
provided for a user who is familiar with the details of the method.
stat[5] = the entropy of AT*A: this is the Shannon entropy of
diag(AT*A) / Trace(AT*A) taken as a point in the probability simplex.
stat[6] = the entropy of A*AT.
istat On exit,
istat[0] = the numerical rank determined after the initial QR factorization
with pivoting. See the descriptions of joba and jobr.
Return Values
This function returns a value info.
If info > 0, the function did not converge in the maximal number of sweeps. The computed values may be
inaccurate.
See Also
?geqp3
?geqrf
?gelqf
?gesvj
?lamch
?pocon
?ormlq
?gesvj
Computes the singular value decomposition of a real
matrix using Jacobi plane rotations.
Syntax
lapack_int LAPACKE_sgesvj (int matrix_layout, char joba, char jobu, char jobv,
lapack_int m, lapack_int n, float * a, lapack_int lda, float * sva, lapack_int mv, float
* v, lapack_int ldv, float * stat);
lapack_int LAPACKE_dgesvj (int matrix_layout, char joba, char jobu, char jobv,
lapack_int m, lapack_int n, double * a, lapack_int lda, double * sva, lapack_int mv,
double * v, lapack_int ldv, double * stat);
lapack_int LAPACKE_cgesvj (int matrix_layout, char joba, char jobu, char jobv,
lapack_int m, lapack_int n, lapack_complex_float * a, lapack_int lda, float * sva,
lapack_int mv, lapack_complex_float * v, lapack_int ldv, float * stat);
lapack_int LAPACKE_zgesvj (int matrix_layout, char joba, char jobu, char jobv,
lapack_int m, lapack_int n, lapack_complex_double * a, lapack_int lda, double * sva,
lapack_int mv, lapack_complex_double * v, lapack_int ldv, double * stat);
Include Files
• mkl.h
Description
The routine computes the singular value decomposition (SVD) of a real or complex m-by-n matrix A, where
m≥n.
The SVD of A is written as
A = U*Σ*VT for real flavors, or
A = U*Σ*VH for complex flavors,
where Σ is an m-by-n diagonal matrix, U is an m-by-n orthonormal matrix, and V is an n-by-n orthogonal/
unitary matrix. The diagonal elements of Σ are the singular values of A; the columns of U and V are the left
and right singular vectors of A, respectively. The matrices U and V are computed and stored in the arrays u
and v, respectively. The diagonal of Σ is computed and stored in the array sva.
The ?gesvj routine can sometimes compute tiny singular values and their singular vectors much more
accurately than other SVD routines.
The n-by-n orthogonal matrix V is obtained as a product of Jacobi plane rotations. The rotations are
implemented as fast scaled rotations of Anda and Park [AndaPark94]. In the case of underflow of the Jacobi
angle, a modified Jacobi transformation of Drmac ([Drmac08-4]) is used. Pivot strategy uses column
interchanges of de Rijk ([deRijk98]). The relative accuracy of the computed singular values and the accuracy
of the computed singular vectors (in angle metric) is as guaranteed by the theory of Demmel and Veselic
[Demmel92]. The condition number that determines the accuracy in the full rank case is essentially
min_{D=diag} κ(A*D),
where κ(.) is the spectral condition number. The best performance of this Jacobi SVD procedure is achieved if
used in an accelerated version of Drmac and Veselic [Drmac08-1], [Drmac08-2].
The computational range for the nonzero singular values is the machine number interval
( UNDERFLOW,OVERFLOW ). In extreme cases, even denormalized singular values can be computed with the
corresponding gradual loss of accurate digits.
Input Parameters
Specifies whether to compute the right singular vectors, that is, the matrix
V:
If jobv = 'V', the matrix V is computed and returned in the array v.
If jobv = 'A', the Jacobi rotations are applied to the mv-by-n array v. In
other words, the right singular vector matrix V is not computed explicitly;
instead it is applied to an mv-by-n matrix initially stored in the first mv rows
of v.
If jobv = 'N', the matrix V is not computed and the array v is not
referenced.
a, v Array a (size at least lda*n for column major layout and lda*m for row
major layout) is an array containing the m-by-n matrix A.
Array v(size at least max(1, ldv*n)) contains, if jobv = 'A' the mv-by-n
matrix to be post-multiplied by Jacobi rotations.
lda The leading dimension of the array a. Must be at least max(1, m) for
column major layout and at least max(1, n) for row major layout.
stat Array size 6. If jobu = 'C', stat[0] = CTOL, where CTOL defines the
threshold for convergence. The process stops if all columns of A are
mutually orthogonal up to CTOL*EPS, where EPS = ?lamch('E'). It is
required that CTOL≥ 1 - that is, it is not allowed to force the routine to
obtain orthogonality below ε.
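To make the notion of columns being mutually orthogonal up to a tolerance concrete, the sketch below tests every pair of columns of a column-major matrix. The function name columns_orthogonal and the specific criterion |a_p . a_q| <= tol * ||a_p|| * ||a_q|| are assumptions for illustration; the exact test used inside ?gesvj is not stated here. A caller would pass tol = CTOL*EPS.

```c
/* Hypothetical check (not part of the library): returns 1 if all columns
 * of the m-by-n column-major matrix a satisfy
 *     |a_p . a_q| <= tol * ||a_p|| * ||a_q||   for all p < q,
 * and 0 otherwise. Both sides are squared so that no sqrt is needed. */
int columns_orthogonal(int m, int n, const double *a, int lda, double tol)
{
    for (int p = 0; p < n; ++p) {
        for (int q = p + 1; q < n; ++q) {
            double pp = 0.0, qq = 0.0, pq = 0.0;
            for (int i = 0; i < m; ++i) {
                double x = a[i + p * lda];
                double y = a[i + q * lda];
                pp += x * x;   /* squared norm of column p */
                qq += y * y;   /* squared norm of column q */
                pq += x * y;   /* inner product of the two columns */
            }
            if (pq * pq > tol * tol * pp * qq) {
                return 0;
            }
        }
    }
    return 1;
}
```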
Output Parameters
a On exit:
If jobu = 'U' or jobu = 'C':
• if info = 0, note that the left singular vectors are 'for free' in the one-
sided Jacobi SVD algorithm. However, if only the singular values are
needed, the level of numerical orthogonality of u is not an issue and
iterations are stopped when the columns of the iterated matrix are
numerically orthogonal up to approximately m*EPS. Thus, on exit, a
contains the columns of u scaled with the corresponding singular values.
• if info > 0, the procedure ?gesvj did not converge in the given
number of iterations (sweeps).
If info > 0, the procedure ?gesvj did not converge in the given number
of iterations (sweeps) and scale*sva(1:n) may not be accurate.
v On exit:
If jobv = 'V', contains the n-by-n matrix of the right singular vectors.
If jobv = 'A', then v contains the product of the computed right singular
vector matrix and the initial matrix in the array v.
stat On exit,
stat[0] = scale is the scaling factor such that scale*sva(1:n) are the
computed singular values of A. See the description of sva.
Return Values
This function returns a value info.
If info > 0, the function did not converge in the maximal number (30) of sweeps. The output may still be
useful. See the description of stat.
?ggsvd
Computes the generalized singular value
decomposition of a pair of general rectangular
matrices (deprecated).
Syntax
lapack_int LAPACKE_sggsvd( int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int n, lapack_int p, lapack_int* k, lapack_int* l, float* a,
lapack_int lda, float* b, lapack_int ldb, float* alpha, float* beta, float* u,
lapack_int ldu, float* v, lapack_int ldv, float* q, lapack_int ldq, lapack_int* iwork );
lapack_int LAPACKE_dggsvd( int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int n, lapack_int p, lapack_int* k, lapack_int* l, double* a,
lapack_int lda, double* b, lapack_int ldb, double* alpha, double* beta, double* u,
lapack_int ldu, double* v, lapack_int ldv, double* q, lapack_int ldq, lapack_int*
iwork );
lapack_int LAPACKE_cggsvd( int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int n, lapack_int p, lapack_int* k, lapack_int* l,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* b, lapack_int ldb,
float* alpha, float* beta, lapack_complex_float* u, lapack_int ldu,
lapack_complex_float* v, lapack_int ldv, lapack_complex_float* q, lapack_int ldq,
lapack_int* iwork );
lapack_int LAPACKE_zggsvd( int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int n, lapack_int p, lapack_int* k, lapack_int* l,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* b, lapack_int ldb,
double* alpha, double* beta, lapack_complex_double* u, lapack_int ldu,
lapack_complex_double* v, lapack_int ldv, lapack_complex_double* q, lapack_int ldq,
lapack_int* iwork );
Include Files
• mkl.h
Description
This routine is deprecated; use ?ggsvd3.
The routine computes the generalized singular value decomposition (GSVD) of an m-by-n real/complex
matrix A and p-by-n real/complex matrix B:
U'*A*Q = D1*(0 R), V'*B*Q = D2*(0 R),
where U, V and Q are orthogonal/unitary matrices and U', V' mean transpose/conjugate transpose of U and V
respectively.
Let k+l = the effective numerical rank of the matrix (A', B')', then R is a (k+l)-by-(k+l) nonsingular upper
triangular matrix, and D1 and D2 are m-by-(k+l) and p-by-(k+l) "diagonal" matrices of the following
structures, respectively:
If m-k-l≥0,
where
C = diag(alpha[k],..., alpha[k + l - 1])
S = diag(beta[k],..., beta[k + l - 1])
C^2 + S^2 = I
Nonzero element r(i,j) (1 ≤ i ≤ j ≤ k + l) of R is stored in a[(i - 1) + (n - k - l + j - 1)*lda] for column
major layout and in a[(i - 1)*lda + (n - k - l + j - 1)] for row major layout.
If m-k-l < 0,
where
C = diag(alpha[k],..., alpha[m - 1]),
S = diag(beta[k],...,beta[m - 1]),
C^2 + S^2 = I
On exit, the location of nonzero element r_ij (1 ≤ i ≤ j ≤ k + l) of R depends on the value of i. For i ≤ m this element
is stored in a[(i - 1) + (n - k - l + j - 1)*lda] for column major layout and in a[(i - 1)*lda +
(n - k - l + j - 1)] for row major layout. For m < i ≤ k + l it is stored in b[(i - k - 1) + (n - k -
l + j - 1)*ldb] for column major layout and in b[(i - k - 1)*ldb + (n - k - l + j - 1)] for row
major layout.
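The storage rules for R above can be sketched as a small index helper. This is only an illustration: the `sketch_layout` enum and the function name `ggsvd_r_offset` are hypothetical and not part of the library, and real code would pass the layout constants defined in mkl.h instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout tags for this sketch only; they are not the
   layout constants that mkl.h defines. */
typedef enum { SKETCH_ROW_MAJOR, SKETCH_COL_MAJOR } sketch_layout;

/* Locate r_ij (1 <= i <= j <= k+l) after a ?ggsvd call.
   Sets *in_b to 0 when the element is stored in a, 1 when it is stored
   in b, and returns the zero-based offset into that array. */
static size_t ggsvd_r_offset(sketch_layout layout, int m, int n, int k, int l,
                             int lda, int ldb, int i, int j, int *in_b)
{
    assert(1 <= i && i <= j && j <= k + l);
    size_t col = (size_t)(n - k - l + j - 1);      /* zero-based column */

    if (m - k - l >= 0 || i <= m) {                /* element lives in a */
        size_t row = (size_t)(i - 1);
        *in_b = 0;
        return layout == SKETCH_COL_MAJOR ? row + col * (size_t)lda
                                          : row * (size_t)lda + col;
    }

    /* m < i <= k+l: element lives in b */
    size_t row = (size_t)(i - k - 1);
    *in_b = 1;
    return layout == SKETCH_COL_MAJOR ? row + col * (size_t)ldb
                                      : row * (size_t)ldb + col;
}
```

With m = 3, n = 6, k = 2, l = 2 and ldb = 4, the element r_44 falls in the second case and is read from b rather than a.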
The routine computes C, S, R, and optionally the orthogonal/unitary transformation matrices U, V and Q.
In particular, if B is an n-by-n nonsingular matrix, then the GSVD of A and B implicitly gives the SVD of
A*B^(-1):
A*B^(-1) = U*(D1*D2^(-1))*V'.
If (A', B')' has orthonormal columns, then the GSVD of A and B is also equal to the CS decomposition of A
and B. Furthermore, the GSVD can be used to derive the solution of the eigenvalue problem:
A'*A*x = λ*B'*B*x.
Input Parameters
a, b Arrays:
a (size at least max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout) contains the m-by-n matrix A.
b (size at least max(1, ldb*n) for column major layout and max(1, ldb*p)
for row major layout) contains the p-by-n matrix B.
lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.
ldb The leading dimension of b; at least max(1, p) for column major layout and
max(1, n) for row major layout.
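As a hedged illustration of the size rules above, the following sketch computes the smallest admissible leading dimension and array length for a rows-by-cols matrix. The helper names `min_leading_dim` and `min_array_len` are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest admissible leading dimension for a rows-by-cols matrix.
   col_major != 0 selects column major layout. */
static int min_leading_dim(int col_major, int rows, int cols)
{
    assert(rows >= 0 && cols >= 0);
    int ld = col_major ? rows : cols;
    return ld > 1 ? ld : 1;
}

/* Smallest admissible element count of the array that holds the matrix
   with leading dimension ld: max(1, ld*cols) for column major layout and
   max(1, ld*rows) for row major layout. */
static size_t min_array_len(int col_major, int ld, int rows, int cols)
{
    assert(ld >= min_leading_dim(col_major, rows, cols));
    size_t len = (size_t)ld * (size_t)(col_major ? cols : rows);
    return len > 0 ? len : 1;
}
```

For the m-by-n matrix A, pass rows = m and cols = n; for the p-by-n matrix B, pass rows = p and cols = n.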
Output Parameters
k, l On exit, k and l specify the dimension of the subblocks. The sum k+l is
equal to the effective numerical rank of (A', B')'.
alpha(k+1:k+l) = C,
beta(k+1:k+l) = S,
or if m-k-l < 0,
alpha(k+1:m)= C, alpha(m+1:k+l)=0
beta(k+1:m) = S, beta(m+1:k+l) = 1
and
alpha(k+l+1:n) = 0
beta(k+l+1:n) = 0.
u, v, q Arrays:
u, size at least max(1, ldu*m).
Return Values
This function returns a value info.
If info = 1, the Jacobi-type procedure failed to converge. For further details, see subroutine tgsja.
?gesvdx
Computes the SVD and left and right singular vectors
for a matrix.
Syntax
lapack_int LAPACKE_sgesvdx (int matrix_layout, char jobu, char jobvt, char range,
lapack_int m, lapack_int n, float * a, lapack_int lda, float vl, float vu, lapack_int
il, lapack_int iu, lapack_int * ns, float * s, float * u, lapack_int ldu, float * vt,
lapack_int ldvt, lapack_int * superb);
lapack_int LAPACKE_dgesvdx (int matrix_layout, char jobu, char jobvt, char range,
lapack_int m, lapack_int n, double * a, lapack_int lda, double vl, double vu, lapack_int
il, lapack_int iu, lapack_int *ns, double * s, double * u, lapack_int ldu, double * vt,
lapack_int ldvt, lapack_int * superb);
lapack_int LAPACKE_cgesvdx (int matrix_layout, char jobu, char jobvt, char range,
lapack_int m, lapack_int n, lapack_complex_float * a, lapack_int lda, float vl, float
vu, lapack_int il, lapack_int iu, lapack_int * ns, float * s, lapack_complex_float * u,
lapack_int ldu, lapack_complex_float * vt, lapack_int ldvt, lapack_int * superb);
lapack_int LAPACKE_zgesvdx (int matrix_layout, char jobu, char jobvt, char range,
lapack_int m, lapack_int n, lapack_complex_double * a, lapack_int lda, double vl,
double vu, lapack_int il, lapack_int iu, lapack_int * ns, double * s,
lapack_complex_double * u, lapack_int ldu, lapack_complex_double * vt, lapack_int ldvt,
lapack_int * superb);
Include Files
• mkl.h
Description
?gesvdx computes the singular value decomposition (SVD) of a real or complex m-by-n matrix A, optionally
computing the left and right singular vectors. The SVD is written
A = U * Σ * transpose(V)
where Σ is an m-by-n matrix which is zero except for its min(m,n) diagonal elements, U is an m-by-m matrix,
and V is an n-by-n matrix. The matrices U and V are orthogonal for real A, and unitary for complex A. The
diagonal elements of Σ are the singular values of A; they are real and non-negative, and are returned in
descending order. The first min(m,n) columns of U and V are the left and right singular vectors of A.
?gesvdx uses an eigenvalue problem for obtaining the SVD, which allows for the computation of a subset of
singular values and vectors. See ?bdsvdx for details.
Input Parameters
jobvt Specifies options for computing all or part of the matrix VT:
= 'V': the first min(m,n) rows of VT (the right singular vectors) or as
specified by range are returned in the array vt;
lda≥ max(1,m).
vl vl≥0.
vu If range='V', the lower and upper bounds of the interval to be searched for
singular values. vu > vl. Not referenced if range = 'A' or 'I'.
il
iu If range='I', the indices (in ascending order) of the smallest and largest
singular values to be returned. 1 ≤il≤iu≤ min(m,n), if min(m,n) > 0. Not
referenced if range = 'A' or 'V'.
ldu The leading dimension of the array u. ldu≥ 1; if jobu = 'V', ldu≥m.
ldvt The leading dimension of the array vt. ldvt≥ 1; if jobvt = 'V', ldvt≥ns
(see above).
Output Parameters
NOTE
Make sure that ucol≥ns; if range = 'V', the exact value of ns
is not known in advance and an upper bound must be used.
NOTE
Make sure that ldvt≥ns; if range = 'V', the exact value of
ns is not known in advance and an upper bound must be
used.
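The notes above ask for an upper bound on ns when sizing u and vt. A minimal sketch of such a bound follows, assuming the convention from the reference LAPACK documentation that ns = min(m,n) for range = 'A' and ns = iu - il + 1 for range = 'I'; for range = 'V' the bound min(m,n) is used. The function name `gesvdx_ns_bound` is hypothetical.

```c
#include <assert.h>

/* Upper bound on ns, the number of singular values ?gesvdx can return.
   Returns -1 for an unrecognized range character. */
static int gesvdx_ns_bound(char range, int m, int n, int il, int iu)
{
    int mn = m < n ? m : n;

    switch (range) {
    case 'I':
        /* Index range: exactly iu - il + 1 values are requested. */
        assert(mn == 0 || (1 <= il && il <= iu && iu <= mn));
        return mn == 0 ? 0 : iu - il + 1;
    case 'A':   /* all singular values */
    case 'V':   /* value interval: exact count unknown in advance */
        return mn;
    default:
        return -1;
    }
}
```

The returned value can serve as the number of columns of u and rows of vt to allocate before the call.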
Return Values
This function returns a value info.
= 0: successful exit.
< 0: if info = -i, the i-th argument had an illegal value.
> 0: if info = i, then i eigenvectors failed to converge in ?bdsvdx/?stevx. If info = n*2 + 1, an internal error
occurred in ?bdsvdx.
?bdsvdx
Computes the SVD of a bidiagonal matrix.
Syntax
lapack_int LAPACKE_sbdsvdx (int matrix_layout, char uplo, char jobz, char range,
lapack_int n, float * d, float * e, float vl, float vu, lapack_int il, lapack_int iu,
lapack_int * ns, float * s, float * z, lapack_int ldz, lapack_int * superb);
lapack_int LAPACKE_dbdsvdx (int matrix_layout, char uplo, char jobz, char range,
lapack_int n, double * d, double * e, double vl, double vu, lapack_int il, lapack_int
iu, lapack_int * ns, double * s, double * z, lapack_int ldz, lapack_int * superb);
Include Files
• mkl.h
Description
?bdsvdx computes the singular value decomposition (SVD) of a real n-by-n (upper or lower) bidiagonal
matrix B, B = U * S * VT, where S is a diagonal matrix with non-negative diagonal elements (the singular
values of B), and U and VT are orthogonal matrices of left and right singular vectors, respectively.
Given an upper bidiagonal B with diagonal d = [d_1 d_2 ... d_n] and superdiagonal e = [e_1 e_2 ... e_(n-1)], ?bdsvdx
computes the singular value decomposition of B through the eigenvalues and eigenvectors of the 2n-by-2n
tridiagonal matrix
\[
TGK = \begin{pmatrix}
0 & d_1 & & & \\
d_1 & 0 & e_1 & & \\
& e_1 & 0 & d_2 & \\
& & d_2 & \ddots & \ddots \\
& & & \ddots & \ddots
\end{pmatrix}
\]
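If (s,u,v) is a singular triplet of B with ||u|| = ||v|| = 1, then (±s,q), ||q|| = 1, are eigenpairs of TGK, with
\[
q = P \cdot \frac{(u' \ \pm v')}{\sqrt{2}} = \frac{(v_1 \ u_1 \ v_2 \ u_2 \ \cdots \ v_n \ u_n)}{\sqrt{2}}, \quad P = [e_{n+1} \ e_1 \ e_{n+2} \ e_2 \ \cdots].
\]
Given a TGK matrix, one can either: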
1. compute -s, -v and change signs so that the singular values (and corresponding vectors) are already in
descending order (as in ?gesvd/?gesdd) or
2. compute s, v and reorder the values (and corresponding vectors).
?bdsvdx implements (1) by calling ?stevx (bisection plus inverse iteration, to be replaced with a version of
the Multiple Relative Robust Representation algorithm). See P. Willems and B. Lang, A framework for the
MR^3 algorithm: theory and implementation, SIAM J. Sci. Comput., 35:740-766, 2013.
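To make the TGK construction above concrete, here is a small sketch that builds the off-diagonal of TGK from d and e and applies TGK to a vector. It only illustrates the structure of the matrix; it is not how ?bdsvdx is implemented, and the names `tgk_offdiag` and `tgk_matvec` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Fill off[0..2n-2] with the off-diagonal of TGK for an upper bidiagonal
   matrix with diagonal d[0..n-1] and superdiagonal e[0..n-2]:
   off = (d_1, e_1, d_2, e_2, ..., d_n). The main diagonal of TGK is zero. */
static void tgk_offdiag(size_t n, const double *d, const double *e, double *off)
{
    assert(n >= 1);
    for (size_t i = 0; i < n; ++i) {
        off[2 * i] = d[i];
        if (i + 1 < n)
            off[2 * i + 1] = e[i];
    }
}

/* y = TGK * x for the symmetric tridiagonal 2n-by-2n matrix whose
   off-diagonal is off and whose main diagonal is zero. */
static void tgk_matvec(size_t n, const double *off, const double *x, double *y)
{
    size_t dim = 2 * n;
    for (size_t r = 0; r < dim; ++r) {
        double acc = 0.0;
        if (r > 0)
            acc += off[r - 1] * x[r - 1];   /* subdiagonal entry */
        if (r + 1 < dim)
            acc += off[r] * x[r + 1];       /* superdiagonal entry */
        y[r] = acc;
    }
}
```

For B = diag(3, 4), the singular triplet s = 3, u = v = (1, 0) gives the interleaved vector (v_1, u_1, v_2, u_2) = (1, 1, 0, 0), and applying TGK to it returns 3 times the same vector, as the eigenpair relation predicts.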
Input Parameters
d Array, size n.
vl vl≥ 0.
vu If range='V', the lower and upper bounds of the interval to be searched for
singular values. vu > vl.
il, iu If range='I', the indices (in ascending order) of the smallest and largest
singular values to be returned.
1 ≤ il ≤ iu ≤ n, if n > 0.
Output Parameters
z = [ U ]
    [ V ]
If jobz = 'N', then z is not referenced.
NOTE
Make sure that at least k = ns+1 columns are supplied in
the array z; if range = 'V', the exact value of ns is not
known in advance and an upper bound must be used.
Return Values
This function returns a value info.
= 0: successful exit.
< 0: if info = -i, the i-th argument had an illegal value.
> 0:
if info = i, then i eigenvectors failed to converge in ?stevx. The indices of the eigenvectors (as returned
by ?stevx) are stored in the array iwork.
?gesvda_batch_strided
Computes the truncated SVD of a group of general m-
by-n matrices that are stored at a constant stride from
each other in a contiguous block of memory.
Syntax
void sgesvda_batch_strided(
const MKL_INT* iparm, MKL_INT* irank,
const MKL_INT* m, const MKL_INT* n,
float* a, const MKL_INT* lda, const MKL_INT* stride_a,
float* s, const MKL_INT* stride_s,
Include Files
mkl.h
Description
The ?gesvda_batch_strided routines compute the truncated SVD for a group of general m-by-n matrices.
All matrices have the same parameters (matrix size, leading dimension) and are stored at constant
stride_a from each other in a contiguous block of memory. The operation is defined as
for i = 0 … batch_size-1
Ai is a matrix at offset i * stride_a from A
Ai := Ui * Si * Vi^T
end for
where Ui and Vi are orthogonal matrices, and Si is a diagonal matrix with singular values on the diagonal.
Singular values are nonnegative and listed in decreasing order. A truncated SVD of a given m-by-n matrix
produces matrices with the specified number of columns, where the number of columns is defined by the
user or determined at runtime with the help of the user-defined tolerance threshold.
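The strided layout described above can be sketched with two addressing helpers. The sketch assumes column major storage within each matrix, which is consistent with the requirement lda ≥ max(1, m) but is an assumption of this example; the names `batch_matrix` and `batch_elem` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Pointer to the first element of matrix Ai in a strided batch:
   Ai starts at offset i * stride_a from the start of a. */
static float *batch_matrix(float *a, size_t stride_a, size_t i)
{
    return a + i * stride_a;
}

/* Pointer to element (r, c) of Ai, zero-based, assuming column major
   storage with leading dimension lda inside each matrix. */
static float *batch_elem(float *a, size_t lda, size_t stride_a,
                         size_t i, size_t r, size_t c)
{
    assert(r < lda);
    return batch_matrix(a, stride_a, i) + r + c * lda;
}
```

For two 2-by-3 matrices stored back to back with lda = 2 and stride_a = 6, element (1, 2) of the second matrix sits at offset 6 + 1 + 2*2 = 11 from the start of the buffer.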
An approximation of each matrix can also be obtained as a product of two low-rank matrices (low-rank
product):
Ai = Pi×Qi
where Pi = Ui×Si, Qi = Vi^T if m ≥ n, and Pi = Ui, Qi = Si×Vi^T otherwise.
• Compute truncated SVD with the help of the input array rank where rank(i) specifies the number of
singular values and vectors to be computed in parameters Ui ,Vi and Si for each matrix Ai.
• Compute truncated SVD using a tolerance threshold. While computing SVD, singular values that are less
than the user-defined tolerance are treated as zero, and they are not computed but set to zero.
• Compute truncated SVD using the effective rank. The effective rank of A is determined by treating as zero
those singular values that are less than the user-defined tolerance threshold times the largest singular
value.
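The effective-rank rule in the last item can be sketched as follows. This is an illustration of the stated rule only, written in double precision for a list of singular values already sorted in decreasing order; the function name `effective_rank` is hypothetical and the routine's internal logic may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Count the singular values that are NOT treated as zero, that is, those
   that are at least tolerance * (largest singular value).
   s[0..count-1] must be nonnegative and sorted in decreasing order. */
static size_t effective_rank(const double *s, size_t count, double tolerance)
{
    assert(tolerance >= 0.0);
    if (count == 0)
        return 0;

    double cutoff = tolerance * s[0];   /* s[0] is the largest value */
    size_t rank = 0;
    /* Exact zeros never count toward the rank, even when cutoff is 0. */
    while (rank < count && s[rank] >= cutoff && s[rank] > 0.0)
        ++rank;
    return rank;
}
```

With singular values (10, 5, 0.5, 0.01) and a tolerance of 0.1, the cutoff is 1 and the effective rank is 2.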
The routines can also be used for computing singular values only.
Input Parameters
0*    Computes the truncated SVD as a product of three matrices: Ai = Ui×Si×Vi^T
1     Computes the truncated SVD as a low-rank product: Ai = Pi×Qi
NOTE
iparm[4]–iparm[15] are reserved for future use.
lda Specifies the leading dimension of the Ai matrices: lda ≥ max(1, m).
ldu Specifies the leading dimension of the Ui matrices: ldu ≥ max(1, m).
stride_u The stride between two consecutive Ui matrices: stride_u ≥ max(1, ldu
* m).
ldvt Specifies the leading dimension of the Vi^T matrices: ldvt ≥ max(1, n).
stride_vt The stride between two consecutive Vi^T matrices: stride_vt ≥ max(1,
ldvt * n).
tolerance Specifies the tolerance threshold for computing truncated SVD in the cases
of iparm[0]=1 and iparm[0]=2. Not used otherwise.
lwork The dimension of the array work.
If lwork = -1, a workspace query is assumed: the routine only calculates
the optimal size of the work array and returns this value as the first entry of
the work array, and no error message related to lwork is issued by xerbla. If
lwork is less than the required minimum size but is positive, the routine
internally allocates the needed memory.
Output Parameters
Ai := Ai - Ui×Si×Vi^T
if iparm[2] = 0, and
Ai := Ai - Pi×Qi
otherwise.
info Array of size at least batch_size, which reports the status for each
matrix.
If info[i] = 0, the execution is successful for Ai.
See Also
CS Computational Routines
?orcsd/?uncsd
Computes the CS decomposition of a block-partitioned
orthogonal/unitary matrix.
Syntax
lapack_int LAPACKE_sorcsd( int matrix_layout, char jobu1, char jobu2, char jobv1t, char
jobv2t, char trans, char signs, lapack_int m, lapack_int p, lapack_int q, float* x11,
lapack_int ldx11, float* x12, lapack_int ldx12, float* x21, lapack_int ldx21, float*
x22, lapack_int ldx22, float* theta, float* u1, lapack_int ldu1, float* u2, lapack_int
ldu2, float* v1t, lapack_int ldv1t, float* v2t, lapack_int ldv2t );
lapack_int LAPACKE_dorcsd( int matrix_layout, char jobu1, char jobu2, char jobv1t, char
jobv2t, char trans, char signs, lapack_int m, lapack_int p, lapack_int q, double* x11,
lapack_int ldx11, double* x12, lapack_int ldx12, double* x21, lapack_int ldx21, double*
x22, lapack_int ldx22, double* theta, double* u1, lapack_int ldu1, double* u2,
lapack_int ldu2, double* v1t, lapack_int ldv1t, double* v2t, lapack_int ldv2t );
lapack_int LAPACKE_cuncsd( int matrix_layout, char jobu1, char jobu2, char jobv1t, char
jobv2t, char trans, char signs, lapack_int m, lapack_int p, lapack_int q,
lapack_complex_float* x11, lapack_int ldx11, lapack_complex_float* x12, lapack_int
ldx12, lapack_complex_float* x21, lapack_int ldx21, lapack_complex_float* x22,
lapack_int ldx22, float* theta, lapack_complex_float* u1, lapack_int ldu1,
lapack_complex_float* u2, lapack_int ldu2, lapack_complex_float* v1t, lapack_int ldv1t,
lapack_complex_float* v2t, lapack_int ldv2t );
lapack_int LAPACKE_zuncsd( int matrix_layout, char jobu1, char jobu2, char jobv1t, char
jobv2t, char trans, char signs, lapack_int m, lapack_int p, lapack_int q,
lapack_complex_double* x11, lapack_int ldx11, lapack_complex_double* x12, lapack_int
ldx12, lapack_complex_double* x21, lapack_int ldx21, lapack_complex_double* x22,
lapack_int ldx22, double* theta, lapack_complex_double* u1, lapack_int ldu1,
lapack_complex_double* u2, lapack_int ldu2, lapack_complex_double* v1t, lapack_int
ldv1t, lapack_complex_double* v2t, lapack_int ldv2t );
Include Files
• mkl.h
Description
The routines ?orcsd/?uncsd compute the CS decomposition of an m-by-m partitioned orthogonal matrix X:
or unitary matrix:
x11 is p-by-q. The orthogonal/unitary matrices u1, u2, v1, and v2 are p-by-p, (m-p)-by-(m-p), q-by-q, (m-q)-
by-(m-q), respectively. C and S are r-by-r nonnegative diagonal matrices satisfying C^2 + S^2 = I, in which r
= min(p,m-p,q,m-q).
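The block sizes stated above follow directly from m, p, and q. A minimal sketch, using a hypothetical `csd_dims` struct and `csd_dimensions` helper that are not part of the library:

```c
#include <assert.h>

/* Orders of the square factors in the CS decomposition of an m-by-m
   matrix whose leading block x11 is p-by-q. */
typedef struct {
    int u1;   /* u1 is p-by-p           */
    int u2;   /* u2 is (m-p)-by-(m-p)   */
    int v1;   /* v1 is q-by-q           */
    int v2;   /* v2 is (m-q)-by-(m-q)   */
    int r;    /* C and S are r-by-r     */
} csd_dims;

static int min2(int x, int y) { return x < y ? x : y; }

static csd_dims csd_dimensions(int m, int p, int q)
{
    assert(0 <= p && p <= m && 0 <= q && q <= m);
    csd_dims dims;
    dims.u1 = p;
    dims.u2 = m - p;
    dims.v1 = q;
    dims.v2 = m - q;
    dims.r  = min2(min2(p, m - p), min2(q, m - q));
    return dims;
}
```

For m = 10, p = 4, q = 3 the factors have orders 4, 6, 3, and 7, and r = 3.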
Input Parameters
trans = 'T': x, u1, u2, v1t, v2t are stored in row-major order.
x11, x12, x21, x22 Arrays of size x11 (ldx11,q), x12 (ldx12,m - q), x21 (ldx21,q), and x22
(ldx22,m - q).
ldx11, ldx12, ldx21, ldx22 The leading dimensions of the parts of array X. ldx11≥ max(1, p), ldx12≥
max(1, p), ldx21≥ max(1, m - p), ldx22≥ max(1, m - p).
ldu1 The leading dimension of the array u1. If jobu1 = 'Y', ldu1≥ max(1,p).
ldu2 The leading dimension of the array u2. If jobu2 = 'Y', ldu2≥ max(1,m-p).
ldv1t The leading dimension of the array v1t. If jobv1t = 'Y', ldv1t≥
max(1,q).
ldv2t The leading dimension of the array v2t. If jobv2t = 'Y', ldv2t≥ max(1,m-
q).
Output Parameters
If jobv1t = 'Y', v1t contains the q-by-q orthogonal matrix v1^T or unitary
matrix v1^H.
Return Values
This function returns a value info.
See Also
?bbcsd
xerbla
?orcsd2by1/?uncsd2by1
Computes the CS decomposition of a block-partitioned
orthogonal/unitary matrix.
Syntax
lapack_int LAPACKE_sorcsd2by1 (int matrix_layout, char jobu1, char jobu2, char jobv1t,
lapack_int m, lapack_int p, lapack_int q, float * x11, lapack_int ldx11, float * x21,
lapack_int ldx21, float * theta, float * u1, lapack_int ldu1, float * u2, lapack_int
ldu2, float * v1t, lapack_int ldv1t);
lapack_int LAPACKE_dorcsd2by1 (int matrix_layout, char jobu1, char jobu2, char jobv1t,
lapack_int m, lapack_int p, lapack_int q, double * x11, lapack_int ldx11, double * x21,
lapack_int ldx21, double * theta, double * u1, lapack_int ldu1, double * u2, lapack_int
ldu2, double * v1t, lapack_int ldv1t);
lapack_int LAPACKE_cuncsd2by1 (int matrix_layout, char jobu1, char jobu2, char jobv1t,
lapack_int m, lapack_int p, lapack_int q, lapack_complex_float * x11, lapack_int ldx11,
lapack_complex_float * x21, lapack_int ldx21, float * theta, lapack_complex_float * u1,
lapack_int ldu1, lapack_complex_float * u2, lapack_int ldu2, lapack_complex_float *
v1t, lapack_int ldv1t);
lapack_int LAPACKE_zuncsd2by1 (int matrix_layout, char jobu1, char jobu2, char jobv1t,
lapack_int m, lapack_int p, lapack_int q, lapack_complex_double * x11, lapack_int
ldx11, lapack_complex_double * x21, lapack_int ldx21, double * theta,
lapack_complex_double * u1, lapack_int ldu1, lapack_complex_double * u2, lapack_int
ldu2, lapack_complex_double * v1t, lapack_int ldv1t);
Include Files
• mkl.h
Description
The routines ?orcsd2by1/?uncsd2by1 compute the CS decomposition of an m-by-q matrix X with
orthonormal columns that has been partitioned into a 2-by-1 block structure:
x11 is p-by-q. The orthogonal/unitary matrices u1, u2, and v1 are p-by-p, (m-p)-by-(m-p), and q-by-q,
respectively. C and S are r-by-r nonnegative diagonal matrices satisfying C^2 + S^2 = I, in which r
= min(p,m-p,q,m-q).
Input Parameters
jobv1t If equal to 'Y', then v1t is computed. Otherwise, v1t is not computed.
ldu1 The leading dimension of the array u1. If jobu1 = 'Y', ldu1≥ max(1,p).
ldu2 The leading dimension of the array u2. If jobu2 = 'Y', ldu2≥ max(1,m-p).
ldv1t The leading dimension of the array v1t. If jobv1t = 'Y', ldv1t≥
max(1,q).
Output Parameters
Return Values
This function returns a value info.
= 0: successful exit
< 0: if info = -i, the i-th argument has an illegal value
See Also
?bbcsd
xerbla
?sygv
Computes all eigenvalues and, optionally,
eigenvectors of a real generalized symmetric definite
eigenproblem.
Syntax
lapack_int LAPACKE_ssygv (int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, float* a, lapack_int lda, float* b, lapack_int ldb, float* w);
lapack_int LAPACKE_dsygv (int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, double* a, lapack_int lda, double* b, lapack_int ldb, double* w);
Include Files
• mkl.h
Description
The routine computes all the eigenvalues, and optionally, the eigenvectors of a real generalized symmetric-
definite eigenproblem, of the form
A*x = λ*B*x, A*B*x = λ*x, or B*A*x = λ*x.
Here A and B are assumed to be symmetric and B is also positive definite.
Input Parameters
itype Must be 1 or 2 or 3.
Specifies the problem type to be solved:
if itype = 1, the problem type is A*x = lambda*B*x;
a, b Arrays:
a (size at least max(1, lda*n)) contains the upper or lower triangle of the
symmetric matrix A, as specified by uplo.
b (size at least max(1, ldb*n)) contains the upper or lower triangle of the
symmetric positive definite matrix B, as specified by uplo.
Output Parameters
if itype = 3, Z^T*inv(B)*Z = I;
If jobz = 'N', then on exit the upper triangle (if uplo = 'U') or the
lower triangle (if uplo = 'L') of A, including the diagonal, is destroyed.
Return Values
This function returns a value info.
If info = i≤n, ssyev/dsyev failed to converge, and i off-diagonal elements of an intermediate tridiagonal
did not converge to zero;
If info = n + i, for 1 ≤i≤n, then the leading minor of order i of B is not positive-definite. The factorization
of B could not be completed and no eigenvalues or eigenvectors were computed.
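The return-value rules above can be turned into a small decoder. The sketch below also covers info = 0 (success) and info < 0 (illegal argument), following the usual LAPACKE convention; the enum `gv_status` and the function `sygv_classify` are hypothetical names used only for illustration.

```c
#include <assert.h>

typedef enum {
    GV_OK,                       /* info == 0                           */
    GV_BAD_ARGUMENT,             /* info == -i: i-th argument illegal   */
    GV_NO_CONVERGENCE,           /* 0 < info <= n                       */
    GV_B_NOT_POSITIVE_DEFINITE   /* info == n + i, 1 <= i <= n          */
} gv_status;

/* Classify the info value returned by ?sygv for a problem of order n.
   *detail receives the argument index, the number of unconverged
   off-diagonal elements, or the order of the failing leading minor of B. */
static gv_status sygv_classify(int info, int n, int *detail)
{
    assert(n >= 0 && info <= 2 * n);
    *detail = 0;
    if (info == 0)
        return GV_OK;
    if (info < 0) {
        *detail = -info;
        return GV_BAD_ARGUMENT;
    }
    if (info <= n) {
        *detail = info;
        return GV_NO_CONVERGENCE;
    }
    *detail = info - n;
    return GV_B_NOT_POSITIVE_DEFINITE;
}
```

For n = 5, a return value of 7 means that the leading minor of order 2 of B is not positive definite.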
?hegv
Computes all eigenvalues and, optionally,
eigenvectors of a complex generalized Hermitian
positive-definite eigenproblem.
Syntax
lapack_int LAPACKE_chegv( int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, lapack_complex_float* a, lapack_int lda, lapack_complex_float* b,
lapack_int ldb, float* w );
lapack_int LAPACKE_zhegv( int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, lapack_complex_double* a, lapack_int lda, lapack_complex_double* b,
lapack_int ldb, double* w );
Include Files
• mkl.h
Description
The routine computes all the eigenvalues, and optionally, the eigenvectors of a complex generalized
Hermitian positive-definite eigenproblem, of the form
A*x = λ*B*x, A*B*x = λ*x, or B*A*x = λ*x.
Here A and B are assumed to be Hermitian and B is also positive definite.
Input Parameters
a, b Arrays:
a (size at least max(1, lda*n)) contains the upper or lower triangle of the
Hermitian matrix A, as specified by uplo.
b (size at least max(1, ldb*n)) contains the upper or lower triangle of the
Hermitian positive definite matrix B, as specified by uplo.
Output Parameters
if itype = 3, Z^H*inv(B)*Z = I;
If jobz = 'N', then on exit the upper triangle (if uplo = 'U') or the
lower triangle (if uplo = 'L') of A, including the diagonal, is destroyed.
Return Values
This function returns a value info.
If info = -i, the i-th parameter had an illegal value.
If info = i ≤ n, cheev/zheev failed to converge, and i off-diagonal elements of an intermediate tridiagonal did
not converge to zero;
If info = n + i, for 1 ≤ i ≤ n, then the leading minor of order i of B is not positive-definite. The factorization
of B could not be completed and no eigenvalues or eigenvectors were computed.
?sygvd
Computes all eigenvalues and, optionally,
eigenvectors of a real generalized symmetric definite
eigenproblem using a divide and conquer method.
Syntax
lapack_int LAPACKE_ssygvd (int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, float* a, lapack_int lda, float* b, lapack_int ldb, float* w);
lapack_int LAPACKE_dsygvd (int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, double* a, lapack_int lda, double* b, lapack_int ldb, double* w);
Include Files
• mkl.h
Description
The routine computes all the eigenvalues, and optionally, the eigenvectors of a real generalized symmetric-
definite eigenproblem, of the form
A*x = λ*B*x, A*B*x = λ*x, or B*A*x = λ*x.
Here A and B are assumed to be symmetric and B is also positive definite.
It uses a divide and conquer algorithm.
Input Parameters
a, b Arrays:
a (size at least lda*n) contains the upper or lower triangle of the
symmetric matrix A, as specified by uplo.
b (size at least ldb*n) contains the upper or lower triangle of the
symmetric positive definite matrix B, as specified by uplo.
Output Parameters
if itype = 3, Z^T*inv(B)*Z = I;
If jobz = 'N', then on exit the upper triangle (if uplo = 'U') or the
lower triangle (if uplo = 'L') of A, including the diagonal, is destroyed.
Return Values
This function returns a value info.
• For info≤n:
• If info = i and jobz = 'N', then the algorithm failed to converge; i off-diagonal elements of an
intermediate tridiagonal form did not converge to zero.
• If jobz = 'V', then the algorithm failed to compute an eigenvalue while working on the submatrix
lying in rows and columns info/(n+1) through mod(info,n+1).
• For info > n:
• If info = n + i, for 1 ≤i≤n, then the leading minor of order i of B is not positive-definite. The
factorization of B could not be completed and no eigenvalues or eigenvectors were computed.
?hegvd
Computes all the eigenvalues, and optionally, the
eigenvectors of a complex generalized Hermitian
positive-definite eigenproblem using a divide and
conquer method.
Syntax
lapack_int LAPACKE_chegvd( int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, lapack_complex_float* a, lapack_int lda, lapack_complex_float* b,
lapack_int ldb, float* w );
lapack_int LAPACKE_zhegvd( int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, lapack_complex_double* a, lapack_int lda, lapack_complex_double* b,
lapack_int ldb, double* w );
Include Files
• mkl.h
Description
The routine computes all the eigenvalues, and optionally, the eigenvectors of a complex generalized
Hermitian positive-definite eigenproblem, of the form
A*x = λ*B*x, A*B*x = λ*x, or B*A*x = λ*x.
Here A and B are assumed to be Hermitian and B is also positive definite.
It uses a divide and conquer algorithm.
Input Parameters
a, b Arrays:
a (size at least max(1, lda*n)) contains the upper or lower triangle of the
Hermitian matrix A, as specified by uplo.
b (size at least max(1, ldb*n)) contains the upper or lower triangle of the
Hermitian positive definite matrix B, as specified by uplo.
Output Parameters
if itype = 3, Z^H*inv(B)*Z = I;
If jobz = 'N', then on exit the upper triangle (if uplo = 'U') or the
lower triangle (if uplo = 'L') of A, including the diagonal, is destroyed.
Return Values
This function returns a value info.
If info = i, and jobz = 'N', then the algorithm failed to converge; i off-diagonal elements of an
intermediate tridiagonal form did not converge to zero;
if info = i, and jobz = 'V', then the algorithm failed to compute an eigenvalue while working on the
submatrix lying in rows and columns info/(n+1) through mod(info, n+1).
If info = n + i, for 1 ≤i≤n, then the leading minor of order i of B is not positive-definite. The factorization
of B could not be completed and no eigenvalues or eigenvectors were computed.
?sygvx
Computes selected eigenvalues and, optionally,
eigenvectors of a real generalized symmetric definite
eigenproblem.
Syntax
lapack_int LAPACKE_ssygvx (int matrix_layout, lapack_int itype, char jobz, char range,
char uplo, lapack_int n, float* a, lapack_int lda, float* b, lapack_int ldb, float vl,
float vu, lapack_int il, lapack_int iu, float abstol, lapack_int* m, float* w, float* z,
lapack_int ldz, lapack_int* ifail);
lapack_int LAPACKE_dsygvx (int matrix_layout, lapack_int itype, char jobz, char range,
char uplo, lapack_int n, double* a, lapack_int lda, double* b, lapack_int ldb, double
vl, double vu, lapack_int il, lapack_int iu, double abstol, lapack_int* m, double* w,
double* z, lapack_int ldz, lapack_int* ifail);
Include Files
• mkl.h
Description
The routine computes selected eigenvalues, and optionally, the eigenvectors of a real generalized symmetric-
definite eigenproblem, of the form
A*x = λ*B*x, A*B*x = λ*x, or B*A*x = λ*x.
Here A and B are assumed to be symmetric and B is also positive definite. Eigenvalues and eigenvectors can
be selected by specifying either a range of values or a range of indices for the desired eigenvalues.
Input Parameters
a, b Arrays:
a (size at least max(1, lda*n)) contains the upper or lower triangle of the
symmetric matrix A, as specified by uplo.
b (size at least max(1, ldb*n)) contains the upper or lower triangle of the
symmetric positive definite matrix B, as specified by uplo.
vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
Constraint: vl< vu.
il, iu
If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned.
Constraint: 1 ≤il≤iu≤n, if n > 0; il=1 and iu=0
if n = 0.
abstol The absolute error tolerance for the eigenvalues. See Application Notes for
more information.
ldz The leading dimension of the output array z. Constraints:
ldz ≥ 1; if jobz = 'V', ldz ≥ max(1, n) for column major layout and ldz ≥
max(1, m) for row major layout.
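The constraints on il, iu, and ldz above can be checked before the call. The following sketch encodes them directly; the helper names `index_range_ok` and `min_ldz` are hypothetical, and the library performs its own argument checks regardless.

```c
#include <assert.h>

/* Returns 1 when (il, iu) is a valid index range for range = 'I':
   1 <= il <= iu <= n if n > 0; il = 1 and iu = 0 if n = 0. */
static int index_range_ok(int n, int il, int iu)
{
    assert(n >= 0);
    if (n == 0)
        return il == 1 && iu == 0;
    return 1 <= il && il <= iu && iu <= n;
}

/* Smallest admissible ldz. m is the number of eigenvalues found (or an
   upper bound on it). col_major != 0 selects column major layout. */
static int min_ldz(int col_major, char jobz, int n, int m)
{
    if (jobz != 'V')
        return 1;                 /* only ldz >= 1 is required */
    int need = col_major ? n : m;
    return need > 1 ? need : 1;
}
```

For n = 5 and m = 2 with jobz = 'V', the minimum ldz is 5 in column major layout and 2 in row major layout.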
Output Parameters
a On exit, the upper triangle (if uplo = 'U') or the lower triangle (if uplo =
'L') of A, including the diagonal, is overwritten.
w, z Arrays:
w, size at least max(1, n).
The first m elements of w contain the selected eigenvalues in ascending
order.
z (size at least max(1, ldz*m) for column major layout and max(1, ldz*n)
for row major layout).
If jobz = 'V', then if info = 0, the first m columns of z contain the
orthonormal eigenvectors of the matrix A corresponding to the selected
eigenvalues, with the i-th column of z holding the eigenvector associated
with w[i - 1]. The eigenvectors are normalized as follows:
if itype = 1 or 2, Z^T*B*Z = I;
if itype = 3, Z^T*inv(B)*Z = I;
If jobz = 'V', then if info = 0, the first m elements of ifail are zero; if
info > 0, the ifail contains the indices of the eigenvectors that failed to
converge.
If jobz = 'N', then ifail is not referenced.
Return Values
This function returns a value info.
If info = i ≤ n, ssyevx/dsyevx failed to converge: i eigenvectors did not converge, and their indices are
stored in the array ifail;
If info = n + i, for 1 ≤i≤n, then the leading minor of order i of B is not positive-definite. The factorization
of B could not be completed and no eigenvalues or eigenvectors were computed.
Application Notes
An approximate eigenvalue is accepted as converged when it is determined to lie in an interval [a,b] of
width less than or equal to abstol+ε*max(|a|,|b|), where ε is the machine precision.
If abstol is less than or equal to zero, then ε*||T||_1 is used as tolerance, where T is the tridiagonal matrix
obtained by reducing C to tridiagonal form, where C is the symmetric matrix of the standard symmetric
problem to which the generalized problem is transformed. Eigenvalues will be computed most accurately
when abstol is set to twice the underflow threshold 2*?lamch('S'), not zero.
If this routine returns with info > 0, indicating that some eigenvectors did not converge, set abstol to
2*?lamch('S').
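The acceptance criterion above can be written as a one-line test. The sketch below uses DBL_EPSILON from float.h as a stand-in for the machine precision ε; the value that ?lamch reports may differ from it by a constant factor, so treat this as an approximation. The function name `interval_converged` is hypothetical.

```c
#include <assert.h>
#include <float.h>

/* Returns 1 when the interval [a, b] is narrow enough for an approximate
   eigenvalue inside it to be accepted as converged:
   b - a <= abstol + eps * max(|a|, |b|).
   DBL_EPSILON is used for eps as an assumption of this sketch. */
static int interval_converged(double a, double b, double abstol)
{
    assert(a <= b);
    double abs_a = a < 0.0 ? -a : a;
    double abs_b = b < 0.0 ? -b : b;
    double scale = abs_a > abs_b ? abs_a : abs_b;
    return (b - a) <= abstol + DBL_EPSILON * scale;
}
```

On typical IEEE double precision platforms, 2*DBL_MIN is a plausible counterpart of the recommended 2*?lamch('S') setting, although that correspondence is also an assumption here.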
?hegvx
Computes selected eigenvalues and, optionally,
eigenvectors of a complex generalized Hermitian
positive-definite eigenproblem.
Syntax
lapack_int LAPACKE_chegvx( int matrix_layout, lapack_int itype, char jobz, char range,
char uplo, lapack_int n, lapack_complex_float* a, lapack_int lda, lapack_complex_float*
b, lapack_int ldb, float vl, float vu, lapack_int il, lapack_int iu, float abstol,
lapack_int* m, float* w, lapack_complex_float* z, lapack_int ldz, lapack_int* ifail );
lapack_int LAPACKE_zhegvx( int matrix_layout, lapack_int itype, char jobz, char range,
char uplo, lapack_int n, lapack_complex_double* a, lapack_int lda,
lapack_complex_double* b, lapack_int ldb, double vl, double vu, lapack_int il,
lapack_int iu, double abstol, lapack_int* m, double* w, lapack_complex_double* z,
lapack_int ldz, lapack_int* ifail );
Include Files
• mkl.h
Description
The routine computes selected eigenvalues, and optionally, the eigenvectors of a complex generalized
Hermitian positive-definite eigenproblem, of the form
A*x = λ*B*x, A*B*x = λ*x, or B*A*x = λ*x.
Here A and B are assumed to be Hermitian and B is also positive definite. Eigenvalues and eigenvectors can
be selected by specifying either a range of values or a range of indices for the desired eigenvalues.
Input Parameters
a, b Arrays:
a (size at least max(1, lda*n)) contains the upper or lower triangle of the
Hermitian matrix A, as specified by uplo.
b (size at least max(1, ldb*n)) contains the upper or lower triangle of the
Hermitian positive definite matrix B, as specified by uplo.
vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
Constraint: vl < vu.
il, iu
If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned.
Constraint: 1 ≤ il ≤ iu ≤ n if n > 0; il = 1 and iu = 0 if n = 0.
abstol The absolute error tolerance for the eigenvalues. See Application Notes for
more information.
Output Parameters
a On exit, the upper triangle (if uplo = 'U') or the lower triangle (if uplo =
'L') of A, including the diagonal, is overwritten.
z Array z (size at least max(1, ldz*m) for column major layout and max(1,
ldz*n) for row major layout).
If jobz = 'V', then if info = 0, the first m columns of z contain the
orthonormal eigenvectors of the matrix A corresponding to the selected
eigenvalues, with the i-th column of z holding the eigenvector associated
with w[i - 1]. The eigenvectors are normalized as follows:
if itype = 1 or 2, Z^H*B*Z = I;
if itype = 3, Z^H*inv(B)*Z = I;
ifail If jobz = 'V', then if info = 0, the first m elements of ifail are zero; if
info > 0, ifail contains the indices of the eigenvectors that failed to
converge.
If jobz = 'N', then ifail is not referenced.
Return Values
This function returns a value info.
If info = i, for 1 ≤ i ≤ n, cheevx/zheevx failed to converge, and i eigenvectors failed to converge. Their
indices are stored in the array ifail;
If info = n + i, for 1 ≤ i ≤ n, then the leading minor of order i of B is not positive-definite. The factorization
of B could not be completed and no eigenvalues or eigenvectors were computed.
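The return-value rules above can be turned into a small decoding helper. The following is a minimal sketch: the names gev_status, gev_classify, and gev_report are hypothetical and not part of the library, plain int stands in for lapack_int, and the negative-info branch relies on the general LAPACKE convention (info = -i means the i-th argument had an illegal value), which this section does not restate.

```c
#include <stdio.h>

/* Hypothetical helper, not part of the library: interprets the info value
   returned by ?hegvx for a problem of order n. */
typedef enum {
    GEV_OK,             /* info = 0 */
    GEV_BAD_ARGUMENT,   /* info < 0 (general LAPACKE convention, assumed) */
    GEV_NOT_CONVERGED,  /* 1 <= info <= n */
    GEV_B_NOT_POSDEF,   /* n < info <= 2n */
    GEV_UNEXPECTED      /* anything else */
} gev_status;

/* On return, *detail holds the argument position (bad argument), the number
   of unconverged eigenvectors, or the order of the failing leading minor. */
gev_status gev_classify(int info, int n, int *detail)
{
    *detail = 0;
    if (info == 0) return GEV_OK;
    if (info < 0) { *detail = -info; return GEV_BAD_ARGUMENT; }
    if (info <= n) { *detail = info; return GEV_NOT_CONVERGED; }
    if (info <= 2 * n) { *detail = info - n; return GEV_B_NOT_POSDEF; }
    return GEV_UNEXPECTED;
}

/* Prints a one-line diagnosis for the given info value. */
void gev_report(int info, int n)
{
    int d;
    switch (gev_classify(info, n, &d)) {
    case GEV_OK:            printf("success\n"); break;
    case GEV_BAD_ARGUMENT:  printf("argument %d had an illegal value\n", d); break;
    case GEV_NOT_CONVERGED: printf("%d eigenvectors failed to converge; see ifail\n", d); break;
    case GEV_B_NOT_POSDEF:  printf("leading minor of order %d of B is not positive-definite\n", d); break;
    default:                printf("unexpected info value %d\n", info); break;
    }
}
```

A caller would pass the value returned by LAPACKE_chegvx or LAPACKE_zhegvx together with n, and inspect ifail only in the GEV_NOT_CONVERGED case.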
Application Notes
An approximate eigenvalue is accepted as converged when it is determined to lie in an interval [a,b] of width
less than or equal to abstol+ε*max(|a|,|b|), where ε is the machine precision.
If abstol is less than or equal to zero, then ε*||T||_1 will be used in its place, where T is the tridiagonal
matrix obtained by reducing C to tridiagonal form, where C is the symmetric matrix of the standard
symmetric problem to which the generalized problem is transformed. Eigenvalues will be computed most
accurately when abstol is set to twice the underflow threshold 2*?lamch('S'), not zero.
If this routine returns with info > 0, indicating that some eigenvectors did not converge, try setting abstol
to 2*?lamch('S').
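The acceptance test and the recommended tolerance can be illustrated in plain C. This is a sketch under stated assumptions: DBL_MIN is used as a stand-in for dlamch('S') (the two normally coincide on IEEE-754 systems, but that is an assumption here), DBL_EPSILON stands in for ε (the value LAPACK uses internally may differ by a small constant factor), and the function names are hypothetical.

```c
#include <float.h>

/* Recommended tolerance from the notes above: twice the underflow threshold.
   Assumption: DBL_MIN approximates dlamch('S') on this platform. */
double suggested_abstol(void)
{
    return 2.0 * DBL_MIN;
}

/* Tolerance actually applied: abstol itself when positive, otherwise
   eps * ||T||_1, where t_norm1 is the 1-norm of the tridiagonal matrix T. */
double effective_abstol(double abstol, double t_norm1)
{
    return (abstol > 0.0) ? abstol : DBL_EPSILON * t_norm1;
}

/* Acceptance test: the interval [a, b] is narrow enough when its width is
   at most abstol + eps * max(|a|, |b|). */
int interval_converged(double a, double b, double abstol)
{
    double abs_a = (a < 0.0) ? -a : a;
    double abs_b = (b < 0.0) ? -b : b;
    double scale = (abs_a > abs_b) ? abs_a : abs_b;
    double width = (b > a) ? (b - a) : (a - b);
    return width <= abstol + DBL_EPSILON * scale;
}
```

Passing suggested_abstol() as the abstol argument makes the relative term ε*max(|a|,|b|) dominate the test for all but the tiniest eigenvalues, which is why the notes recommend it over zero.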
?spgv
Computes all eigenvalues and, optionally,
eigenvectors of a real generalized symmetric definite
eigenproblem with matrices in packed storage.
Syntax
lapack_int LAPACKE_sspgv (int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, float* ap, float* bp, float* w, float* z, lapack_int ldz);
lapack_int LAPACKE_dspgv (int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, double* ap, double* bp, double* w, double* z, lapack_int ldz);
Include Files
• mkl.h
Description
The routine computes all the eigenvalues, and optionally, the eigenvectors of a real generalized symmetric-
definite eigenproblem, of the form
A*x = λ*B*x, A*B*x = λ*x, or B*A*x = λ*x.
Here A and B are assumed to be symmetric, stored in packed format, and B is also positive definite.
Input Parameters
ap, bp Arrays:
ap contains the packed upper or lower triangle of the symmetric matrix A,
as specified by uplo.
The dimension of ap must be at least max(1, n*(n+1)/2).
bp contains the packed upper or lower triangle of the symmetric matrix B,
as specified by uplo.
The dimension of bp must be at least max(1, n*(n+1)/2).
ldz The leading dimension of the output array z; ldz ≥ 1. If jobz = 'V', ldz ≥
max(1, n).
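The size n*(n+1)/2 follows from storing only one triangle column by column. The sketch below shows the conventional LAPACK packed index mapping for column major layout with 0-based indices; the helper names are hypothetical, and the row major packed layout is not covered here.

```c
#include <stddef.h>

/* Minimum number of elements in ap or bp for a matrix of order n. */
size_t packed_size(size_t n)
{
    size_t s = n * (n + 1) / 2;
    return (s > 0) ? s : 1;
}

/* uplo = 'U': position of A(i, j), i <= j, when the upper triangle is
   packed column by column (0-based indices). */
size_t packed_index_upper(size_t i, size_t j)
{
    return i + j * (j + 1) / 2;
}

/* uplo = 'L': position of A(i, j), i >= j, when the lower triangle is
   packed column by column (0-based indices). */
size_t packed_index_lower(size_t i, size_t j, size_t n)
{
    return i + j * (2 * n - j - 1) / 2;
}
```

For n = 3 both mappings enumerate the six stored elements as positions 0 through 5, so an array of packed_size(3) = 6 elements is sufficient.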
Output Parameters
w, z Arrays:
w, size at least max(1, n).
If info = 0, contains the eigenvalues in ascending order.
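z contains, if jobz = 'V' and info = 0, the matrix Z of eigenvectors. The eigenvectors are
normalized as follows:
if itype = 1 or 2, Z^T*B*Z = I;
if itype = 3, Z^T*inv(B)*Z = I;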
Return Values
This function returns a value info.
If info = i, for 1 ≤ i ≤ n, sspev/dspev failed to converge, and i off-diagonal elements of an intermediate
tridiagonal did not converge to zero;
If info = n + i, for 1 ≤ i ≤ n, then the leading minor of order i of B is not positive-definite. The factorization
of B could not be completed and no eigenvalues or eigenvectors were computed.
?hpgv
Computes all eigenvalues and, optionally,
eigenvectors of a complex generalized Hermitian
positive-definite eigenproblem with matrices in packed
storage.
Syntax
lapack_int LAPACKE_chpgv( int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, lapack_complex_float* ap, lapack_complex_float* bp, float* w,
lapack_complex_float* z, lapack_int ldz );
lapack_int LAPACKE_zhpgv( int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, lapack_complex_double* ap, lapack_complex_double* bp, double* w,
lapack_complex_double* z, lapack_int ldz );
Include Files
• mkl.h
Description
The routine computes all the eigenvalues, and optionally, the eigenvectors of a complex generalized
Hermitian positive-definite eigenproblem, of the form
A*x = λ*B*x, A*B*x = λ*x, or B*A*x = λ*x.
Here A and B are assumed to be Hermitian, stored in packed format, and B is also positive definite.
Input Parameters
jobz Must be 'N' or 'V'.
If jobz = 'N', then compute eigenvalues only.
If jobz = 'V', then compute eigenvalues and eigenvectors.
ap, bp Arrays:
ap contains the packed upper or lower triangle of the Hermitian matrix A, as
specified by uplo.
The dimension of ap must be at least max(1, n*(n+1)/2).
bp contains the packed upper or lower triangle of the Hermitian matrix B,
as specified by uplo.
The dimension of bp must be at least max(1, n*(n+1)/2).
ldz The leading dimension of the output array z; ldz ≥ 1. If jobz = 'V', ldz ≥
max(1, n).
Output Parameters
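z If jobz = 'V' and info = 0, contains the matrix Z of eigenvectors. The eigenvectors are
normalized as follows:
if itype = 1 or 2, Z^H*B*Z = I;
if itype = 3, Z^H*inv(B)*Z = I;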
Return Values
This function returns a value info.
If info = i, for 1 ≤ i ≤ n, chpev/zhpev failed to converge, and i off-diagonal elements of an intermediate
tridiagonal did not converge to zero;
If info = n + i, for 1 ≤ i ≤ n, then the leading minor of order i of B is not positive-definite. The factorization
of B could not be completed and no eigenvalues or eigenvectors were computed.
?spgvd
Computes all eigenvalues and, optionally,
eigenvectors of a real generalized symmetric definite
eigenproblem with matrices in packed storage using a
divide and conquer method.
Syntax
lapack_int LAPACKE_sspgvd (int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, float* ap, float* bp, float* w, float* z, lapack_int ldz);
lapack_int LAPACKE_dspgvd (int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, double* ap, double* bp, double* w, double* z, lapack_int ldz);
Include Files
• mkl.h
Description
The routine computes all the eigenvalues, and optionally, the eigenvectors of a real generalized symmetric-
definite eigenproblem, of the form
A*x = λ*B*x, A*B*x = λ*x, or B*A*x = λ*x.
Here A and B are assumed to be symmetric, stored in packed format, and B is also positive definite.
If eigenvectors are desired, it uses a divide and conquer algorithm.
Input Parameters
ap, bp Arrays:
ap contains the packed upper or lower triangle of the symmetric matrix A,
as specified by uplo.
The dimension of ap must be at least max(1, n*(n+1)/2).
bp contains the packed upper or lower triangle of the symmetric matrix B,
as specified by uplo.
The dimension of bp must be at least max(1, n*(n+1)/2).
ldz The leading dimension of the output array z; ldz ≥ 1. If jobz = 'V', ldz ≥
max(1, n).
Output Parameters
w, z Arrays:
w, size at least max(1, n).
If info = 0, contains the eigenvalues in ascending order.
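z contains, if jobz = 'V' and info = 0, the matrix Z of eigenvectors. The eigenvectors are
normalized as follows:
if itype = 1 or 2, Z^T*B*Z = I;
if itype = 3, Z^T*inv(B)*Z = I;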
Return Values
This function returns a value info.
If info = i, for 1 ≤ i ≤ n, sspevd/dspevd failed to converge, and i off-diagonal elements of an intermediate
tridiagonal did not converge to zero;
If info = n + i, for 1 ≤ i ≤ n, then the leading minor of order i of B is not positive-definite. The factorization
of B could not be completed and no eigenvalues or eigenvectors were computed.
?hpgvd
Computes all eigenvalues and, optionally,
eigenvectors of a complex generalized Hermitian
positive-definite eigenproblem with matrices in packed
storage using a divide and conquer method.
Syntax
lapack_int LAPACKE_chpgvd( int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, lapack_complex_float* ap, lapack_complex_float* bp, float* w,
lapack_complex_float* z, lapack_int ldz );
lapack_int LAPACKE_zhpgvd( int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, lapack_complex_double* ap, lapack_complex_double* bp, double* w,
lapack_complex_double* z, lapack_int ldz );
Include Files
• mkl.h
Description
The routine computes all the eigenvalues, and optionally, the eigenvectors of a complex generalized
Hermitian positive-definite eigenproblem, of the form
A*x = λ*B*x, A*B*x = λ*x, or B*A*x = λ*x.
Here A and B are assumed to be Hermitian, stored in packed format, and B is also positive definite.
If eigenvectors are desired, it uses a divide and conquer algorithm.
Input Parameters
ap, bp Arrays:
ap contains the packed upper or lower triangle of the Hermitian matrix A, as
specified by uplo.
The dimension of ap must be at least max(1, n*(n+1)/2).
bp contains the packed upper or lower triangle of the Hermitian matrix B,
as specified by uplo.
The dimension of bp must be at least max(1, n*(n+1)/2).
ldz The leading dimension of the output array z; ldz ≥ 1. If jobz = 'V', ldz ≥
max(1, n).
Output Parameters
w Array, size at least max(1, n).
If info = 0, contains the eigenvalues in ascending order.
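z If jobz = 'V' and info = 0, contains the matrix Z of eigenvectors. The eigenvectors are
normalized as follows:
if itype = 1 or 2, Z^H*B*Z = I;
if itype = 3, Z^H*inv(B)*Z = I;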
Return Values
This function returns a value info.
If info = i, for 1 ≤ i ≤ n, chpevd/zhpevd failed to converge, and i off-diagonal elements of an intermediate
tridiagonal did not converge to zero;
If info = n + i, for 1 ≤ i ≤ n, then the leading minor of order i of B is not positive-definite. The factorization
of B could not be completed and no eigenvalues or eigenvectors were computed.
?spgvx
Computes selected eigenvalues and, optionally,
eigenvectors of a real generalized symmetric definite
eigenproblem with matrices in packed storage.
Syntax
lapack_int LAPACKE_sspgvx (int matrix_layout, lapack_int itype, char jobz, char range,
char uplo, lapack_int n, float* ap, float* bp, float vl, float vu, lapack_int il,
lapack_int iu, float abstol, lapack_int* m, float* w, float* z, lapack_int ldz,
lapack_int* ifail);
lapack_int LAPACKE_dspgvx (int matrix_layout, lapack_int itype, char jobz, char range,
char uplo, lapack_int n, double* ap, double* bp, double vl, double vu, lapack_int il,
lapack_int iu, double abstol, lapack_int* m, double* w, double* z, lapack_int ldz,
lapack_int* ifail);
Include Files
• mkl.h
Description
The routine computes selected eigenvalues, and optionally, the eigenvectors of a real generalized symmetric-
definite eigenproblem, of the form
A*x = λ*B*x, A*B*x = λ*x, or B*A*x = λ*x.
Here A and B are assumed to be symmetric, stored in packed format, and B is also positive definite.
Eigenvalues and eigenvectors can be selected by specifying either a range of values or a range of indices for
the desired eigenvalues.
Input Parameters
ap, bp Arrays:
ap contains the packed upper or lower triangle of the symmetric matrix A,
as specified by uplo.
The size of ap must be at least max(1, n*(n+1)/2).
bp contains the packed upper or lower triangle of the symmetric matrix B,
as specified by uplo.
The size of bp must be at least max(1, n*(n+1)/2).
vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
Constraint: vl < vu.
il, iu If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned.
Constraint: 1 ≤ il ≤ iu ≤ n if n > 0; il = 1 and iu = 0 if n = 0.
abstol The absolute error tolerance for the eigenvalues. See Application Notes for
more information.
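The constraints on vl, vu, il, and iu can be checked before the call. The helper below is a hypothetical sketch, not a library function; it also assumes the usual LAPACK meaning of range = 'A' (all eigenvalues), which is not described in this parameter list. For range = 'I' it returns the number of eigenvalues the routine is expected to report in m.

```c
/* Result codes for the hypothetical check below. */
enum { RANGE_COUNT_UNKNOWN = -1, RANGE_INVALID = -2 };

/* Validates the selection arguments of an "x" driver for a problem of
   order n. Returns the expected eigenvalue count when it is known in
   advance, RANGE_COUNT_UNKNOWN for a valid value interval, and
   RANGE_INVALID when a documented constraint is violated. */
long expected_count(char range, long n, double vl, double vu, long il, long iu)
{
    switch (range) {
    case 'A':                       /* assumption: all n eigenvalues */
        return n;
    case 'V':                       /* interval search: requires vl < vu */
        return (vl < vu) ? RANGE_COUNT_UNKNOWN : RANGE_INVALID;
    case 'I':                       /* index selection */
        if (n == 0)
            return (il == 1 && iu == 0) ? 0 : RANGE_INVALID;
        return (1 <= il && il <= iu && iu <= n) ? iu - il + 1 : RANGE_INVALID;
    default:
        return RANGE_INVALID;
    }
}
```

Knowing the count in advance is useful for sizing z, because its required size depends on m for column major layout.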
Output Parameters
w, z Arrays:
w, size at least max(1, n).
If info = 0, contains the eigenvalues in ascending order.
z (size at least max(1, ldz*m) for column major layout and max(1, ldz*n)
for row major layout).
If jobz = 'V', then if info = 0, the first m columns of z contain the
orthonormal eigenvectors of the matrix A corresponding to the selected
eigenvalues, with the i-th column of z holding the eigenvector associated
with w[i - 1]. The eigenvectors are normalized as follows:
if itype = 1 or 2, Z^T*B*Z = I;
if itype = 3, Z^T*inv(B)*Z = I;
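The normalization for itype = 1 or 2 can be verified numerically after a call. The sketch below is illustrative only: it assumes B has been expanded from packed storage into a full n-by-n column major array (the packed input is not used directly), that z is column major with leading dimension ldz, and the function name is hypothetical.

```c
/* Returns the largest absolute deviation of Z^T*B*Z from the identity,
   where b is a full n-by-n column-major matrix and z holds m eigenvectors
   as columns with leading dimension ldz. */
double normalization_residual(int n, int m, const double *b,
                              const double *z, int ldz)
{
    double worst = 0.0;
    for (int p = 0; p < m; ++p) {
        for (int q = 0; q < m; ++q) {
            double s = 0.0;
            for (int i = 0; i < n; ++i)
                for (int j = 0; j < n; ++j)
                    s += z[i + p * ldz] * b[i + j * n] * z[j + q * ldz];
            double target = (p == q) ? 1.0 : 0.0;
            double diff = (s > target) ? (s - target) : (target - s);
            if (diff > worst)
                worst = diff;
        }
    }
    return worst;
}
```

A residual near machine precision indicates that the returned eigenvectors are B-orthonormal; for itype = 3 the same check would need inv(B) in place of B.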
Return Values
This function returns a value info.
If info = i, for 1 ≤ i ≤ n, sspevx/dspevx failed to converge, and i eigenvectors failed to converge. Their
indices are stored in the array ifail;
If info = n + i, for 1 ≤ i ≤ n, then the leading minor of order i of B is not positive-definite. The factorization
of B could not be completed and no eigenvalues or eigenvectors were computed.
Application Notes
An approximate eigenvalue is accepted as converged when it is determined to lie in an interval [a,b] of width
less than or equal to abstol+ε*max(|a|,|b|), where ε is the machine precision.
If abstol is less than or equal to zero, then ε*||T||_1 is used instead, where T is the tridiagonal matrix
obtained by reducing A to tridiagonal form. Eigenvalues are computed most accurately when abstol is set to
twice the underflow threshold 2*?lamch('S'), not zero.
If this routine returns with info > 0, indicating that some eigenvectors did not converge, set abstol to
2*?lamch('S').
?hpgvx
Computes selected eigenvalues and, optionally,
eigenvectors of a generalized Hermitian positive-
definite eigenproblem with matrices in packed
storage.
Syntax
lapack_int LAPACKE_chpgvx( int matrix_layout, lapack_int itype, char jobz, char range,
char uplo, lapack_int n, lapack_complex_float* ap, lapack_complex_float* bp, float vl,
float vu, lapack_int il, lapack_int iu, float abstol, lapack_int* m, float* w,
lapack_complex_float* z, lapack_int ldz, lapack_int* ifail );
lapack_int LAPACKE_zhpgvx( int matrix_layout, lapack_int itype, char jobz, char range,
char uplo, lapack_int n, lapack_complex_double* ap, lapack_complex_double* bp, double
vl, double vu, lapack_int il, lapack_int iu, double abstol, lapack_int* m, double* w,
lapack_complex_double* z, lapack_int ldz, lapack_int* ifail );
Include Files
• mkl.h
Description
The routine computes selected eigenvalues, and optionally, the eigenvectors of a complex generalized
Hermitian positive-definite eigenproblem, of the form
A*x = λ*B*x, A*B*x = λ*x, or B*A*x = λ*x.
Here A and B are assumed to be Hermitian, stored in packed format, and B is also positive definite.
Eigenvalues and eigenvectors can be selected by specifying either a range of values or a range of indices for
the desired eigenvalues.
Input Parameters
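itype Must be 1 or 2 or 3. Specifies the problem type to be solved:
if itype = 1, the problem type is A*x = lambda*B*x;
if itype = 2, the problem type is A*B*x = lambda*x;
if itype = 3, the problem type is B*A*x = lambda*x.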
ap, bp Arrays:
ap contains the packed upper or lower triangle of the Hermitian matrix A, as
specified by uplo.
The dimension of ap must be at least max(1, n*(n+1)/2).
bp contains the packed upper or lower triangle of the Hermitian matrix B,
as specified by uplo.
The dimension of bp must be at least max(1, n*(n+1)/2).
vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
Constraint: vl < vu.
il, iu If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned.
Constraint: 1 ≤ il ≤ iu ≤ n if n > 0; il = 1 and iu = 0 if n = 0.
ldz The leading dimension of the output array z; ldz ≥ 1. If jobz = 'V', ldz ≥
max(1, n) for column major layout and ldz ≥ max(1, m) for row major
layout.
Output Parameters
z Array z (size at least max(1, ldz*m) for column major layout and max(1,
ldz*n) for row major layout).
If jobz = 'V', then if info = 0, the first m columns of z contain the
orthonormal eigenvectors of the matrix A corresponding to the selected
eigenvalues, with the i-th column of z holding the eigenvector associated
with w[i - 1]. The eigenvectors are normalized as follows:
if itype = 1 or 2, Z^H*B*Z = I;
if itype = 3, Z^H*inv(B)*Z = I;
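The layout-dependent rules for ldz and for the size of z can be collected in two small helpers. This is a sketch with hypothetical function names; the enum values mirror the LAPACK_ROW_MAJOR and LAPACK_COL_MAJOR constants but are redefined locally so the fragment compiles without the library headers. Because m is an output, a caller would typically use iu - il + 1 for range = 'I' and n as a safe upper bound otherwise.

```c
/* Local stand-ins for the matrix_layout constants (assumed values). */
enum { LAYOUT_ROW_MAJOR = 101, LAYOUT_COL_MAJOR = 102 };

static long max_long(long a, long b) { return (a > b) ? a : b; }

/* Smallest valid ldz: 1 when eigenvectors are not requested, otherwise
   max(1, n) for column major layout and max(1, m) for row major layout. */
long min_ldz(int layout, char jobz, long n, long m)
{
    if (jobz != 'V')
        return 1;
    return (layout == LAYOUT_COL_MAJOR) ? max_long(1, n) : max_long(1, m);
}

/* Minimum element count of z: max(1, ldz*m) for column major layout and
   max(1, ldz*n) for row major layout. */
long z_size(int layout, long ldz, long n, long m)
{
    return (layout == LAYOUT_COL_MAJOR) ? max_long(1, ldz * m)
                                        : max_long(1, ldz * n);
}
```

For n = 5 and m = 2 both layouts need ten elements at the minimum leading dimension, only the stride between consecutive rows or columns differs.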
Return Values
This function returns a value info.
If info = i, for 1 ≤ i ≤ n, chpevx/zhpevx failed to converge, and i eigenvectors failed to converge. Their
indices are stored in the array ifail;
If info = n + i, for 1 ≤ i ≤ n, then the leading minor of order i of B is not positive-definite. The factorization
of B could not be completed and no eigenvalues or eigenvectors were computed.
Application Notes
An approximate eigenvalue is accepted as converged when it is determined to lie in an interval [a,b] of width
less than or equal to abstol+ε*max(|a|,|b|), where ε is the machine precision.
If abstol is less than or equal to zero, then ε*||T||_1 is used as tolerance, where T is the tridiagonal matrix
obtained by reducing A to tridiagonal form. Eigenvalues will be computed most accurately when abstol is set
to twice the underflow threshold 2*?lamch('S'), not zero.
If this routine returns with info > 0, indicating that some eigenvectors did not converge, try setting abstol
to 2*?lamch('S').
?sbgv
Computes all eigenvalues and, optionally,
eigenvectors of a real generalized symmetric definite
eigenproblem with banded matrices.
Syntax
lapack_int LAPACKE_ssbgv (int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int ka, lapack_int kb, float* ab, lapack_int ldab, float* bb, lapack_int ldbb,
float* w, float* z, lapack_int ldz);
lapack_int LAPACKE_dsbgv (int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int ka, lapack_int kb, double* ab, lapack_int ldab, double* bb, lapack_int ldbb,
double* w, double* z, lapack_int ldz);
Include Files
• mkl.h
Description
The routine computes all the eigenvalues, and optionally, the eigenvectors of a real generalized symmetric-
definite banded eigenproblem, of the form A*x = λ*B*x. Here A and B are assumed to be symmetric and
banded, and B is also positive definite.
Input Parameters
ab, bb Arrays:
ab(size at least max(1, ldab*n) for column major layout and max(1,
ldab*(ka + 1)) for row major layout) is an array containing either upper or
lower triangular part of the symmetric matrix A (as specified by uplo) in
band storage format.
bb(size at least max(1, ldbb*n) for column major layout and max(1,
ldbb*(kb + 1)) for row major layout) is an array containing either upper or
lower triangular part of the symmetric matrix B (as specified by uplo) in
band storage format.
ldab The leading dimension of the array ab; must be at least ka+1 for column
major layout and at least max(1, n) for row major layout.
ldbb The leading dimension of the array bb; must be at least kb+1 for column
major layout and at least max(1, n) for row major layout.
ldz The leading dimension of the output array z; ldz ≥ 1. If jobz = 'V', ldz ≥
max(1, n).
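To make the band storage format concrete, the sketch below gives the conventional LAPACK index mapping for column major layout with 0-based indices, where ka is the number of super- or sub-diagonals. The helper names are hypothetical, and the row major variant is not shown.

```c
#include <stddef.h>

/* uplo = 'U': position in ab of A(i, j) for max(0, j - ka) <= i <= j.
   The diagonal lands in row ka of each column of ab. */
size_t band_index_upper(size_t i, size_t j, size_t ka, size_t ldab)
{
    return (ka + i - j) + j * ldab;
}

/* uplo = 'L': position in ab of A(i, j) for j <= i <= min(n - 1, j + ka).
   The diagonal lands in row 0 of each column of ab. */
size_t band_index_lower(size_t i, size_t j, size_t ldab)
{
    return (i - j) + j * ldab;
}

/* Copies the upper band of a full column-major n-by-n matrix a into ab
   (ldab >= ka + 1). Entries of ab outside the band are left untouched. */
void pack_upper_band(size_t n, size_t ka, const double *a,
                     double *ab, size_t ldab)
{
    for (size_t j = 0; j < n; ++j) {
        size_t first = (j > ka) ? (j - ka) : 0;
        for (size_t i = first; i <= j; ++i)
            ab[band_index_upper(i, j, ka, ldab)] = a[i + j * n];
    }
}
```

The same mapping applies to bb with kb and ldbb in place of ka and ldab.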
Output Parameters
w, z Arrays:
w, size at least max(1, n).
If info = 0, contains the eigenvalues in ascending order.
Return Values
This function returns a value info.
If info = i, for 1 ≤ i ≤ n, the algorithm failed to converge, and i off-diagonal elements of an intermediate
tridiagonal did not converge to zero;
If info = n + i, for 1 ≤ i ≤ n, then ?pbstf returned info = i and B is not positive-definite. The
factorization of B could not be completed and no eigenvalues or eigenvectors were computed.
?hbgv
Computes all eigenvalues and, optionally,
eigenvectors of a complex generalized Hermitian
positive-definite eigenproblem with banded matrices.
Syntax
lapack_int LAPACKE_chbgv( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int ka, lapack_int kb, lapack_complex_float* ab, lapack_int ldab,
lapack_complex_float* bb, lapack_int ldbb, float* w, lapack_complex_float* z,
lapack_int ldz );
lapack_int LAPACKE_zhbgv( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int ka, lapack_int kb, lapack_complex_double* ab, lapack_int ldab,
lapack_complex_double* bb, lapack_int ldbb, double* w, lapack_complex_double* z,
lapack_int ldz );
Include Files
• mkl.h
Description
The routine computes all the eigenvalues, and optionally, the eigenvectors of a complex generalized
Hermitian positive-definite banded eigenproblem, of the form A*x = λ*B*x. Here A and B are Hermitian and
banded matrices, and matrix B is also positive definite.
Input Parameters
ab, bb Arrays:
ab(size at least max(1, ldab*n) for column major layout and max(1,
ldab*(ka + 1)) for row major layout) is an array containing either upper or
lower triangular part of the Hermitian matrix A (as specified by uplo) in
band storage format.
bb(size at least max(1, ldbb*n) for column major layout and max(1,
ldbb*(kb + 1)) for row major layout) is an array containing either upper or
lower triangular part of the Hermitian matrix B (as specified by uplo) in
band storage format.
ldab The leading dimension of the array ab; must be at least ka+1 for column
major layout and at least max(1, n) for row major layout.
ldbb The leading dimension of the array bb; must be at least kb+1 for column
major layout and at least max(1, n) for row major layout.
ldz The leading dimension of the output array z; ldz ≥ 1. If jobz = 'V', ldz ≥
max(1, n).
Output Parameters
Return Values
This function returns a value info.
If info = i, for 1 ≤ i ≤ n, the algorithm failed to converge, and i off-diagonal elements of an intermediate
tridiagonal did not converge to zero;
If info = n + i, for 1 ≤ i ≤ n, then ?pbstf returned info = i and B is not positive-definite. The
factorization of B could not be completed and no eigenvalues or eigenvectors were computed.
?sbgvd
Computes all eigenvalues and, optionally,
eigenvectors of a real generalized symmetric definite
eigenproblem with banded matrices. If eigenvectors
are desired, it uses a divide and conquer method.
Syntax
lapack_int LAPACKE_ssbgvd (int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int ka, lapack_int kb, float* ab, lapack_int ldab, float* bb, lapack_int ldbb,
float* w, float* z, lapack_int ldz);
lapack_int LAPACKE_dsbgvd (int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int ka, lapack_int kb, double* ab, lapack_int ldab, double* bb, lapack_int ldbb,
double* w, double* z, lapack_int ldz);
Include Files
• mkl.h
Description
The routine computes all the eigenvalues, and optionally, the eigenvectors of a real generalized symmetric-
definite banded eigenproblem, of the form A*x = λ*B*x. Here A and B are assumed to be symmetric and
banded, and B is also positive definite.
If eigenvectors are desired, it uses a divide and conquer algorithm.
Input Parameters
ab, bb Arrays:
ab(size at least max(1, ldab*n) for column major layout and max(1,
ldab*(ka + 1)) for row major layout) is an array containing either upper or
lower triangular part of the symmetric matrix A (as specified by uplo) in
band storage format.
bb(size at least max(1, ldbb*n) for column major layout and max(1,
ldbb*(kb + 1)) for row major layout) is an array containing either upper or
lower triangular part of the symmetric matrix B (as specified by uplo) in
band storage format.
ldab The leading dimension of the array ab; must be at least ka+1 for column
major layout and at least max(1, n) for row major layout.
ldbb The leading dimension of the array bb; must be at least kb+1 for column
major layout and at least max(1, n) for row major layout.
ldz The leading dimension of the output array z; ldz ≥ 1. If jobz = 'V', ldz ≥
max(1, n).
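The leading-dimension and array-size rules for the band arrays can be expressed as follows. This is a sketch with hypothetical function names; the enum values mirror LAPACK_ROW_MAJOR and LAPACK_COL_MAJOR but are redefined locally so the fragment does not depend on the library headers. The same functions apply to bb when kb is passed in place of ka.

```c
/* Local stand-ins for the matrix_layout constants (assumed values). */
enum { LAYOUT_ROW_MAJOR = 101, LAYOUT_COL_MAJOR = 102 };

static long max_long(long a, long b) { return (a > b) ? a : b; }

/* Smallest valid ldab: ka + 1 for column major layout,
   max(1, n) for row major layout. */
long min_ldab(int layout, long n, long ka)
{
    return (layout == LAYOUT_COL_MAJOR) ? ka + 1 : max_long(1, n);
}

/* Minimum element count of ab: max(1, ldab*n) for column major layout,
   max(1, ldab*(ka + 1)) for row major layout. */
long ab_size(int layout, long ldab, long n, long ka)
{
    return (layout == LAYOUT_COL_MAJOR) ? max_long(1, ldab * n)
                                        : max_long(1, ldab * (ka + 1));
}
```

With n = 10 and ka = 2, both layouts require 30 elements at the minimum leading dimension.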
Output Parameters
w, z Arrays:
w, size at least max(1, n).
If info = 0, contains the eigenvalues in ascending order.
Return Values
This function returns a value info.
If info = i, for 1 ≤ i ≤ n, the algorithm failed to converge, and i off-diagonal elements of an intermediate
tridiagonal did not converge to zero;
If info = n + i, for 1 ≤ i ≤ n, then ?pbstf returned info = i and B is not positive-definite. The
factorization of B could not be completed and no eigenvalues or eigenvectors were computed.
?hbgvd
Computes all eigenvalues and, optionally,
eigenvectors of a complex generalized Hermitian
positive-definite eigenproblem with banded matrices.
If eigenvectors are desired, it uses a divide and
conquer method.
Syntax
lapack_int LAPACKE_chbgvd( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int ka, lapack_int kb, lapack_complex_float* ab, lapack_int ldab,
lapack_complex_float* bb, lapack_int ldbb, float* w, lapack_complex_float* z,
lapack_int ldz );
lapack_int LAPACKE_zhbgvd( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int ka, lapack_int kb, lapack_complex_double* ab, lapack_int ldab,
lapack_complex_double* bb, lapack_int ldbb, double* w, lapack_complex_double* z,
lapack_int ldz );
Include Files
• mkl.h
Description
The routine computes all the eigenvalues, and optionally, the eigenvectors of a complex generalized
Hermitian positive-definite banded eigenproblem, of the form A*x = λ*B*x. Here A and B are assumed to be
Hermitian and banded, and B is also positive definite.
If eigenvectors are desired, it uses a divide and conquer algorithm.
Input Parameters
ab, bb Arrays:
ab(size at least max(1, ldab*n) for column major layout and max(1,
ldab*(ka + 1)) for row major layout) is an array containing either upper or
lower triangular part of the Hermitian matrix A (as specified by uplo) in
band storage format.
bb(size at least max(1, ldbb*n) for column major layout and max(1,
ldbb*(kb + 1)) for row major layout) is an array containing either upper or
lower triangular part of the Hermitian matrix B (as specified by uplo) in
band storage format.
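ldab The leading dimension of the array ab; must be at least ka+1 for column
major layout and at least max(1, n) for row major layout.
ldbb The leading dimension of the array bb; must be at least kb+1 for column
major layout and at least max(1, n) for row major layout.
ldz The leading dimension of the output array z; ldz ≥ 1. If jobz = 'V', ldz ≥
max(1, n).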
Output Parameters
Return Values
This function returns a value info.
If info = i, for 1 ≤ i ≤ n, the algorithm failed to converge, and i off-diagonal elements of an intermediate
tridiagonal did not converge to zero;
If info = n + i, for 1 ≤ i ≤ n, then ?pbstf returned info = i and B is not positive-definite. The
factorization of B could not be completed and no eigenvalues or eigenvectors were computed.
?sbgvx
Computes selected eigenvalues and, optionally,
eigenvectors of a real generalized symmetric definite
eigenproblem with banded matrices.
Syntax
lapack_int LAPACKE_ssbgvx (int matrix_layout, char jobz, char range, char uplo,
lapack_int n, lapack_int ka, lapack_int kb, float* ab, lapack_int ldab, float* bb,
lapack_int ldbb, float* q, lapack_int ldq, float vl, float vu, lapack_int il, lapack_int
iu, float abstol, lapack_int* m, float* w, float* z, lapack_int ldz, lapack_int* ifail);
lapack_int LAPACKE_dsbgvx (int matrix_layout, char jobz, char range, char uplo,
lapack_int n, lapack_int ka, lapack_int kb, double* ab, lapack_int ldab, double* bb,
lapack_int ldbb, double* q, lapack_int ldq, double vl, double vu, lapack_int il,
lapack_int iu, double abstol, lapack_int* m, double* w, double* z, lapack_int ldz,
lapack_int* ifail);
Include Files
• mkl.h
Description
The routine computes selected eigenvalues, and optionally, the eigenvectors of a real generalized symmetric-
definite banded eigenproblem, of the form A*x = λ*B*x. Here A and B are assumed to be symmetric and
banded, and B is also positive definite. Eigenvalues and eigenvectors can be selected by specifying either all
eigenvalues, a range of values or a range of indices for the desired eigenvalues.
Input Parameters
ab, bb Arrays:
ab(size at least max(1, ldab*n) for column major layout and max(1,
ldab*(ka + 1)) for row major layout) is an array containing either upper or
lower triangular part of the symmetric matrix A (as specified by uplo) in
band storage format.
bb(size at least max(1, ldbb*n) for column major layout and max(1,
ldbb*(kb + 1)) for row major layout) is an array containing either upper or
lower triangular part of the symmetric matrix B (as specified by uplo) in
band storage format.
ldab The leading dimension of the array ab; must be at least ka+1 for column
major layout and at least max(1, n) for row major layout.
ldbb The leading dimension of the array bb; must be at least kb+1 for column
major layout and at least max(1, n) for row major layout.
vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
Constraint: vl < vu.
il, iu If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned.
Constraint: 1 ≤ il ≤ iu ≤ n if n > 0; il = 1 and iu = 0 if n = 0.
abstol The absolute error tolerance for the eigenvalues. See Application Notes for
more information.
ldz The leading dimension of the output array z; ldz ≥ 1. If jobz = 'V', ldz ≥
max(1, n).
Output Parameters
w, z, q Arrays:
w, size at least max(1, n).
If info = 0, contains the eigenvalues in ascending order.
z (size max(1, ldz*m) for column major layout and max(1, ldz*n) for row
major layout).
If jobz = 'V', then if info = 0, z contains the matrix Z of eigenvectors,
with the i-th column of z holding the eigenvector associated with w[i - 1]. The
eigenvectors are normalized so that Z^T*B*Z = I.
Return Values
This function returns a value info.
If info = 0, the execution is successful.
If info = i, for 1 ≤ i ≤ n, the algorithm failed to converge, and i off-diagonal elements of an intermediate
tridiagonal did not converge to zero;
If info = n + i, for 1 ≤ i ≤ n, then ?pbstf returned info = i and B is not positive-definite. The
factorization of B could not be completed and no eigenvalues or eigenvectors were computed.
Application Notes
An approximate eigenvalue is accepted as converged when it is determined to lie in an interval [a,b] of width
less than or equal to abstol+ε*max(|a|,|b|), where ε is the machine precision.
If abstol is less than or equal to zero, then ε*||T||_1 is used as tolerance, where T is the tridiagonal matrix
obtained by reducing A to tridiagonal form. Eigenvalues will be computed most accurately when abstol is set
to twice the underflow threshold 2*?lamch('S'), not zero.
If this routine returns with info > 0, indicating that some eigenvectors did not converge, try setting abstol
to 2*?lamch('S').
?hbgvx
Computes selected eigenvalues and, optionally,
eigenvectors of a complex generalized Hermitian
positive-definite eigenproblem with banded matrices.
Syntax
lapack_int LAPACKE_chbgvx( int matrix_layout, char jobz, char range, char uplo,
lapack_int n, lapack_int ka, lapack_int kb, lapack_complex_float* ab, lapack_int ldab,
lapack_complex_float* bb, lapack_int ldbb, lapack_complex_float* q, lapack_int ldq,
float vl, float vu, lapack_int il, lapack_int iu, float abstol, lapack_int* m, float* w,
lapack_complex_float* z, lapack_int ldz, lapack_int* ifail );
lapack_int LAPACKE_zhbgvx( int matrix_layout, char jobz, char range, char uplo,
lapack_int n, lapack_int ka, lapack_int kb, lapack_complex_double* ab, lapack_int ldab,
lapack_complex_double* bb, lapack_int ldbb, lapack_complex_double* q, lapack_int ldq,
double vl, double vu, lapack_int il, lapack_int iu, double abstol, lapack_int* m,
double* w, lapack_complex_double* z, lapack_int ldz, lapack_int* ifail );
Include Files
• mkl.h
Description
The routine computes selected eigenvalues, and optionally, the eigenvectors of a complex generalized
Hermitian positive-definite banded eigenproblem, of the form A*x = λ*B*x. Here A and B are assumed to be
Hermitian and banded, and B is also positive definite. Eigenvalues and eigenvectors can be selected by
specifying either all eigenvalues, a range of values or a range of indices for the desired eigenvalues.
Input Parameters
ab, bb Arrays:
ab(size at least max(1, ldab*n) for column major layout and max(1,
ldab*(ka + 1)) for row major layout) is an array containing either upper or
lower triangular part of the Hermitian matrix A (as specified by uplo) in
band storage format.
bb(size at least max(1, ldbb*n) for column major layout and max(1,
ldbb*(kb + 1)) for row major layout) is an array containing either upper or
lower triangular part of the Hermitian matrix B (as specified by uplo) in
band storage format.
ldab The leading dimension of the array ab; must be at least ka+1 for column
major layout and at least max(1, n) for row major layout.
ldbb The leading dimension of the array bb; must be at least kb+1 for column
major layout and at least max(1, n) for row major layout.
vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
Constraint: vl < vu.
il, iu If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned.
Constraint: 1 ≤ il ≤ iu ≤ n, if n > 0; il = 1 and iu = 0 if n = 0.
abstol The absolute error tolerance for the eigenvalues. See Application Notes for
more information.
ldz The leading dimension of the output array z; ldz≥ 1. If jobz = 'V', ldz≥
max(1, n) for column major layout and at least max(1, m) for row major
layout.
ldq The leading dimension of the output array q; ldq≥ 1. If jobz = 'V', ldq≥
max(1, n).
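To make the band storage used by ab and bb concrete, the sketch below packs the upper triangle of a dense matrix with ka superdiagonals into a band array for column major layout and uplo = 'U'. It assumes the usual LAPACK band convention, in which the zero-based element A(i,j) is stored in row ka + i - j of column j. Real entries are used for brevity (the index mapping is the same for complex elements), and the helper name is hypothetical.

```c
#include <stddef.h>

/* Pack the upper band of a dense column-major n-by-n matrix a (leading
   dimension lda) into band storage ab (leading dimension ldab >= ka + 1).
   Zero-based mapping (assumed LAPACK convention):
   A(i,j) -> ab[(ka + i - j) + j*ldab] for max(0, j - ka) <= i <= j.
   Entries of ab outside the band are left untouched. */
static void pack_upper_band(size_t n, size_t ka, const double *a, size_t lda,
                            double *ab, size_t ldab)
{
    for (size_t j = 0; j < n; ++j) {
        size_t first = (j > ka) ? j - ka : 0;
        for (size_t i = first; i <= j; ++i)
            ab[(ka + i - j) + j * ldab] = a[i + j * lda];
    }
}
```

With this mapping the diagonal of A lands in row ka of ab, which is why ldab must be at least ka + 1 for column major layout.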
Output Parameters
z, q Arrays:
z(size max(1, ldz*m) for column major layout and max(1, ldz*n) for row
major layout).
If jobz = 'V', then if info = 0, z contains the matrix Z of eigenvectors,
with the i-th column of z holding the eigenvector associated with
w[i - 1]. The eigenvectors are normalized so that Z^H*B*Z = I.
If jobz = 'N', then z is not referenced.
Return Values
This function returns a value info.
If info = i and i ≤ n, the algorithm failed to converge, and i off-diagonal elements of an intermediate
tridiagonal did not converge to zero.
If info = n + i, for 1 ≤ i ≤ n, then ?pbstf returned info = i and B is not positive-definite. The
factorization of B could not be completed and no eigenvalues or eigenvectors were computed.
Application Notes
An approximate eigenvalue is accepted as converged when it is determined to lie in an interval [a,b] of width
less than or equal to abstol+ε*max(|a|,|b|), where ε is the machine precision.
If abstol is less than or equal to zero, then ε*||T||_1 will be used in its place, where T is the tridiagonal
matrix obtained by reducing A to tridiagonal form. Eigenvalues will be computed most accurately when abstol
is set to twice the underflow threshold 2*?lamch('S'), not zero.
If this routine returns with info > 0, indicating that some eigenvectors did not converge, try setting abstol
to 2*?lamch('S').
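The acceptance test described in these notes can be written out directly. The sketch below is illustrative only: the function name is hypothetical, DBL_EPSILON stands in for the machine precision ε, and t_norm1 is a caller-supplied value for ||T||_1. How the routine evaluates these quantities internally is not specified here.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* True when the interval [a, b] is narrow enough for the eigenvalue it
   brackets to be accepted: width <= tol + eps * max(|a|, |b|), where tol is
   abstol, or eps * ||T||_1 when abstol <= 0.
   DBL_EPSILON is an assumed stand-in for the machine precision. */
static bool interval_converged(double a, double b, double abstol,
                               double t_norm1)
{
    double tol = (abstol > 0.0) ? abstol : DBL_EPSILON * t_norm1;
    double scale = (fabs(a) > fabs(b)) ? fabs(a) : fabs(b);
    return (b - a) <= tol + DBL_EPSILON * scale;
}
```

A wide interval such as [1.0, 1.1] passes only when abstol is large (for example 0.2), while a degenerate interval passes for any tolerance.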
gges Computes the generalized eigenvalues, Schur form, and the left and/or right Schur
vectors for a pair of nonsymmetric matrices.
ggesx Computes the generalized eigenvalues, Schur form, and, optionally, the left and/or
right matrices of Schur vectors.
ggev Computes the generalized eigenvalues, and the left and/or right generalized
eigenvectors for a pair of nonsymmetric matrices.
ggevx Computes the generalized eigenvalues, and, optionally, the left and/or right
generalized eigenvectors.
?gges
Computes the generalized eigenvalues, Schur form,
and the left and/or right Schur vectors for a pair of
nonsymmetric matrices.
Syntax
lapack_int LAPACKE_sgges( int matrix_layout, char jobvsl, char jobvsr, char sort,
LAPACK_S_SELECT3 select, lapack_int n, float* a, lapack_int lda, float* b, lapack_int
ldb, lapack_int* sdim, float* alphar, float* alphai, float* beta, float* vsl, lapack_int
ldvsl, float* vsr, lapack_int ldvsr );
lapack_int LAPACKE_dgges( int matrix_layout, char jobvsl, char jobvsr, char sort,
LAPACK_D_SELECT3 select, lapack_int n, double* a, lapack_int lda, double* b, lapack_int
ldb, lapack_int* sdim, double* alphar, double* alphai, double* beta, double* vsl,
lapack_int ldvsl, double* vsr, lapack_int ldvsr );
lapack_int LAPACKE_cgges( int matrix_layout, char jobvsl, char jobvsr, char sort,
LAPACK_C_SELECT2 select, lapack_int n, lapack_complex_float* a, lapack_int lda,
lapack_complex_float* b, lapack_int ldb, lapack_int* sdim, lapack_complex_float* alpha,
lapack_complex_float* beta, lapack_complex_float* vsl, lapack_int ldvsl,
lapack_complex_float* vsr, lapack_int ldvsr );
lapack_int LAPACKE_zgges( int matrix_layout, char jobvsl, char jobvsr, char sort,
LAPACK_Z_SELECT2 select, lapack_int n, lapack_complex_double* a, lapack_int lda,
lapack_complex_double* b, lapack_int ldb, lapack_int* sdim, lapack_complex_double*
alpha, lapack_complex_double* beta, lapack_complex_double* vsl, lapack_int ldvsl,
lapack_complex_double* vsr, lapack_int ldvsr );
Include Files
• mkl.h
Description
The ?gges routine computes the generalized eigenvalues, the generalized real/complex Schur form (S,T), and,
optionally, the left and/or right matrices of Schur vectors (vsl and vsr) for a pair of n-by-n real/complex
nonsymmetric matrices (A,B). This gives the generalized Schur factorization
(A,B) = ( vsl*S*vsr^H, vsl*T*vsr^H )
Optionally, it also orders the eigenvalues so that a selected cluster of eigenvalues appears in the leading
diagonal blocks of the upper quasi-triangular matrix S and the upper triangular matrix T. The leading
columns of vsl and vsr then form an orthonormal/unitary basis for the corresponding left and right
eigenspaces (deflating subspaces).
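One way to check a computed factorization is to rebuild A from vsl, S, and vsr and compare it with the original matrix. The following sketch does this for real flavors in column major layout, where the conjugate transpose reduces to the plain transpose. The function name is hypothetical and the leading dimension is assumed to equal n.

```c
#include <stddef.h>

/* out = q * s * z^T for n-by-n column-major matrices with leading
   dimension n (real flavors only). For complex flavors z^T would be
   replaced by the conjugate transpose z^H. */
static void schur_reconstruct(size_t n, const double *q, const double *s,
                              const double *z, double *out)
{
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (size_t k = 0; k < n; ++k)
                for (size_t l = 0; l < n; ++l)
                    /* q(i,k) * s(k,l) * z(j,l) */
                    sum += q[i + k * n] * s[k + l * n] * z[j + l * n];
            out[i + j * n] = sum;
        }
    }
}
```

Calling it once with S and once with T (both with the same vsl and vsr) rebuilds A and B respectively, up to rounding error.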
If only the generalized eigenvalues are needed, use the driver ggev instead, which is faster.
A generalized eigenvalue for a pair of matrices (A,B) is a scalar w or a ratio alpha / beta = w, such that A -
w*B is singular. It is usually represented as the pair (alpha, beta), as there is a reasonable interpretation
for beta=0 or for both being zero. A pair of matrices (S,T) is in the generalized real Schur form if T is upper
triangular with non-negative diagonal and S is block upper triangular with 1-by-1 and 2-by-2 blocks. 1-by-1
blocks correspond to real generalized eigenvalues, while 2-by-2 blocks of S are "standardized" by making the
corresponding elements of T have the form:
[ a 0 ]
[ 0 b ]
and the pair of corresponding 2-by-2 blocks in S and T has a complex conjugate pair of generalized
eigenvalues. A pair of matrices (S,T) is in generalized complex Schur form if S and T are upper triangular
and, in addition, the diagonal elements of T are non-negative real numbers.
The ?gges routine replaces the deprecated ?gegs routine.
Input Parameters
If jobvsl = 'N', then the left Schur vectors are not computed.
If jobvsr = 'N', then the right Schur vectors are not computed.
sort Must be 'N' or 'S'. Specifies whether or not to order the eigenvalues on
the diagonal of the generalized Schur form.
If sort = 'N', then eigenvalues are not ordered.
a, b Arrays:
a (size at least max(1, lda*n)) is an array containing the n-by-n matrix A
(first of the pair of matrices).
b (size at least max(1, ldb*n)) is an array containing the n-by-n matrix B
(second of the pair of matrices).
lda The leading dimension of the array a. Must be at least max(1, n).
ldb The leading dimension of the array b. Must be at least max(1, n).
ldvsl, ldvsr The leading dimensions of the output matrices vsl and vsr, respectively.
Constraints:
ldvsl ≥ 1. If jobvsl = 'V', ldvsl ≥ max(1, n).
ldvsr ≥ 1. If jobvsr = 'V', ldvsr ≥ max(1, n).
Output Parameters
a On exit, this array has been overwritten by its generalized Schur form S.
b On exit, this array has been overwritten by its generalized Schur form T.
Note that for real flavors complex conjugate pairs for which select is true
for either eigenvalue count as 2.
alphar, alphai Arrays, size at least max(1, n) each. Contain values that form generalized
eigenvalues in real flavors.
See beta.
alpha Array, size at least max(1, n). Contain values that form generalized
eigenvalues in complex flavors. See beta.
If jobvsl = 'V', this array will contain the left Schur vectors.
If jobvsr = 'V', this array will contain the right Schur vectors.
Return Values
This function returns a value info.
If info=0, the execution is successful.
If info = i and i ≤ n, the QZ iteration failed. (A, B) is not in Schur form, but alphar[j], alphai[j] (for real
flavors), or alpha[j] (for complex flavors), and beta[j], j = info,..., n - 1 should be correct.
Application Notes
The quotients alphar[j]/beta[j] and alphai[j]/beta[j] may easily over- or underflow, and beta[j] may even be
zero. Thus, you should avoid simply computing the ratio. However, alphar and alphai will always be less than
and usually comparable with norm(A) in magnitude, and beta will always be less than and usually comparable
with norm(B).
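Because of this, it is often more robust to work with the pair (alpha, beta) directly. The sketch below, for real flavors, classifies an eigenvalue and tests whether it lies inside the unit circle without forming the raw quotient. The names are hypothetical, and the interpretation of beta = 0 as an infinite eigenvalue, and of alpha = beta = 0 as an undetermined one, is the conventional reading and is an assumption of this example.

```c
#include <math.h>

typedef enum { EIG_FINITE, EIG_INFINITE, EIG_UNDETERMINED } eig_kind;

/* Classify (alphar + i*alphai)/beta without dividing. Exact comparisons
   with zero are used for simplicity; a tolerance may be preferable. */
static eig_kind eig_classify(double alphar, double alphai, double beta)
{
    if (beta != 0.0) return EIG_FINITE;
    return (alphar == 0.0 && alphai == 0.0) ? EIG_UNDETERMINED
                                            : EIG_INFINITE;
}

/* Test |lambda| < 1 as |alpha| < |beta|. All three values are scaled by
   their largest magnitude first, so the squares cannot overflow. */
static int eig_inside_unit_circle(double alphar, double alphai, double beta)
{
    double m = fabs(alphar);
    if (fabs(alphai) > m) m = fabs(alphai);
    if (fabs(beta) > m) m = fabs(beta);
    if (m == 0.0) return 0; /* undetermined: not counted as inside */
    alphar /= m; alphai /= m; beta /= m;
    return alphar * alphar + alphai * alphai < beta * beta;
}
```

The scaling step keeps the comparison meaningful even for entries near the overflow threshold, where the direct quotient alphar/beta could fail.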
?ggesx
Computes the generalized eigenvalues, Schur form,
and, optionally, the left and/or right matrices of Schur
vectors.
Syntax
lapack_int LAPACKE_sggesx( int matrix_layout, char jobvsl, char jobvsr, char sort,
LAPACK_S_SELECT3 select, char sense, lapack_int n, float* a, lapack_int lda, float* b,
lapack_int ldb, lapack_int* sdim, float* alphar, float* alphai, float* beta, float* vsl,
lapack_int ldvsl, float* vsr, lapack_int ldvsr, float* rconde, float* rcondv );
lapack_int LAPACKE_dggesx( int matrix_layout, char jobvsl, char jobvsr, char sort,
LAPACK_D_SELECT3 select, char sense, lapack_int n, double* a, lapack_int lda, double* b,
lapack_int ldb, lapack_int* sdim, double* alphar, double* alphai, double* beta, double*
vsl, lapack_int ldvsl, double* vsr, lapack_int ldvsr, double* rconde, double* rcondv );
lapack_int LAPACKE_cggesx( int matrix_layout, char jobvsl, char jobvsr, char sort,
LAPACK_C_SELECT2 select, char sense, lapack_int n, lapack_complex_float* a, lapack_int
lda, lapack_complex_float* b, lapack_int ldb, lapack_int* sdim, lapack_complex_float*
alpha, lapack_complex_float* beta, lapack_complex_float* vsl, lapack_int ldvsl,
lapack_complex_float* vsr, lapack_int ldvsr, float* rconde, float* rcondv );
lapack_int LAPACKE_zggesx( int matrix_layout, char jobvsl, char jobvsr, char sort,
LAPACK_Z_SELECT2 select, char sense, lapack_int n, lapack_complex_double* a, lapack_int
lda, lapack_complex_double* b, lapack_int ldb, lapack_int* sdim, lapack_complex_double*
alpha, lapack_complex_double* beta, lapack_complex_double* vsl, lapack_int ldvsl,
lapack_complex_double* vsr, lapack_int ldvsr, double* rconde, double* rcondv );
Include Files
• mkl.h
Description
The routine computes for a pair of n-by-n real/complex nonsymmetric matrices (A,B), the generalized
eigenvalues, the generalized real/complex Schur form (S,T), and, optionally, the left and/or right matrices of
Schur vectors (vsl and vsr). This gives the generalized Schur factorization
(A,B) = ( vsl*S*vsr^H, vsl*T*vsr^H )
Optionally, it also orders the eigenvalues so that a selected cluster of eigenvalues appears in the leading
diagonal blocks of the upper quasi-triangular matrix S and the upper triangular matrix T; computes a
reciprocal condition number for the average of the selected eigenvalues (rconde); and computes a reciprocal
condition number for the right and left deflating subspaces corresponding to the selected eigenvalues
(rcondv). The leading columns of vsl and vsr then form an orthonormal/unitary basis for the corresponding
left and right eigenspaces (deflating subspaces).
A generalized eigenvalue for a pair of matrices (A,B) is a scalar w or a ratio alpha / beta = w, such that A
- w*B is singular. It is usually represented as the pair (alpha, beta), as there is a reasonable interpretation
for beta=0 or for both being zero. A pair of matrices (S,T) is in generalized real Schur form if T is upper
triangular with non-negative diagonal and S is block upper triangular with 1-by-1 and 2-by-2 blocks. 1-by-1
blocks correspond to real generalized eigenvalues, while 2-by-2 blocks of S will be "standardized" by making
the corresponding elements of T have the form:
[ a 0 ]
[ 0 b ]
and the pair of corresponding 2-by-2 blocks in S and T has a complex conjugate pair of generalized
eigenvalues. A pair of matrices (S,T) is in generalized complex Schur form if S and T are upper triangular
and, in addition, the diagonal elements of T are non-negative real numbers.
Input Parameters
If jobvsl = 'N', then the left Schur vectors are not computed.
If jobvsr = 'N', then the right Schur vectors are not computed.
sort Must be 'N' or 'S'. Specifies whether or not to order the eigenvalues on
the diagonal of the generalized Schur form.
If sort = 'N', then eigenvalues are not ordered.
sense Must be 'N', 'E', 'V', or 'B'. Determines which reciprocal condition
numbers are computed.
If sense = 'N', none are computed;
a, b Arrays:
a (size at least max(1, lda*n)) is an array containing the n-by-n matrix A
(first of the pair of matrices).
b (size at least max(1, ldb*n)) is an array containing the n-by-n matrix B
(second of the pair of matrices).
ldvsl, ldvsr The leading dimensions of the output matrices vsl and vsr, respectively.
Constraints:
ldvsl≥ 1. If jobvsl = 'V', ldvsl≥ max(1, n).
ldvsr≥ 1. If jobvsr = 'V', ldvsr≥ max(1, n).
Output Parameters
a On exit, this array has been overwritten by its generalized Schur form S.
b On exit, this array has been overwritten by its generalized Schur form T.
Note that for real flavors complex conjugate pairs for which select is true
for either eigenvalue count as 2.
alphar, alphai Arrays, size at least max(1, n) each. Contain values that form generalized
eigenvalues in real flavors.
See beta.
alpha Array, size at least max(1, n). Contain values that form generalized
eigenvalues in complex flavors. See beta.
If jobvsl = 'V', this array will contain the left Schur vectors.
If jobvsr = 'V', this array will contain the right Schur vectors.
Return Values
This function returns a value info.
If info = i and i ≤ n, the QZ iteration failed. (A, B) is not in Schur form, but alphar[j], alphai[j] (for real
flavors), or alpha[j] (for complex flavors), and beta[j], j = info,..., n - 1 should be correct.
Application Notes
The quotients alphar[j]/beta[j] and alphai[j]/beta[j] may easily over- or underflow, and beta[j] may even be
zero. Thus, you should avoid simply computing the ratio. However, alphar and alphai will always be less than
and usually comparable with norm(A) in magnitude, and beta will always be less than and usually comparable
with norm(B).
?gges3
Computes generalized Schur factorization for a pair of
matrices.
Syntax
lapack_int LAPACKE_sgges3 (int matrix_layout, char jobvsl, char jobvsr, char sort,
LAPACK_S_SELECT3 selctg, lapack_int n, float * a, lapack_int lda, float * b, lapack_int
ldb, lapack_int * sdim, float * alphar, float * alphai, float * beta, float * vsl,
lapack_int ldvsl, float * vsr, lapack_int ldvsr);
lapack_int LAPACKE_dgges3 (int matrix_layout, char jobvsl, char jobvsr, char sort,
LAPACK_D_SELECT3 selctg, lapack_int n, double * a, lapack_int lda, double * b,
lapack_int ldb, lapack_int * sdim, double * alphar, double * alphai, double * beta,
double * vsl, lapack_int ldvsl, double * vsr, lapack_int ldvsr);
lapack_int LAPACKE_cgges3 (int matrix_layout, char jobvsl, char jobvsr, char sort,
LAPACK_C_SELECT2 selctg, lapack_int n, lapack_complex_float * a, lapack_int lda,
lapack_complex_float * b, lapack_int ldb, lapack_int * sdim, lapack_complex_float *
alpha, lapack_complex_float * beta, lapack_complex_float * vsl, lapack_int ldvsl,
lapack_complex_float * vsr, lapack_int ldvsr);
lapack_int LAPACKE_zgges3 (int matrix_layout, char jobvsl, char jobvsr, char sort,
LAPACK_Z_SELECT2 selctg, lapack_int n, lapack_complex_double * a, lapack_int lda,
lapack_complex_double * b, lapack_int ldb, lapack_int * sdim, lapack_complex_double *
alpha, lapack_complex_double * beta, lapack_complex_double * vsl, lapack_int ldvsl,
lapack_complex_double * vsr, lapack_int ldvsr);
Include Files
• mkl.h
Description
For a pair of n-by-n real or complex nonsymmetric matrices (A,B), ?gges3 computes the generalized
eigenvalues, the generalized real or complex Schur form (S,T), and optionally the left and/or right matrices of
Schur vectors (VSL and VSR). This gives the generalized Schur factorization
(A,B) = ( (VSL)*S*(VSR)^T, (VSL)*T*(VSR)^T ) for real (A,B)
or
(A,B) = ( (VSL)*S*(VSR)^H, (VSL)*T*(VSR)^H ) for complex (A,B)
where (VSR)^H is the conjugate-transpose of VSR.
Optionally, it also orders the eigenvalues so that a selected cluster of eigenvalues appears in the leading
diagonal blocks of the upper quasi-triangular matrix S and the upper triangular matrix T. The leading
columns of VSL and VSR then form an orthonormal basis for the corresponding left and right eigenspaces
(deflating subspaces).
NOTE
If only the generalized eigenvalues are needed, use the driver ?ggev instead, which is faster.
A generalized eigenvalue for a pair of matrices (A,B) is a scalar w or a ratio alpha/beta = w, such that A -
w*B is singular. It is usually represented as the pair (alpha,beta), as there is a reasonable interpretation for
beta=0 or both being zero.
For real flavors:
A pair of matrices (S,T) is in generalized real Schur form if T is upper triangular with non-negative diagonal
and S is block upper triangular with 1-by-1 and 2-by-2 blocks. 1-by-1 blocks correspond to real generalized
eigenvalues, while 2-by-2 blocks of S will be "standardized" by making the corresponding elements of T have
the form:
[ a 0 ]
[ 0 b ]
and the pair of corresponding 2-by-2 blocks in S and T have a complex conjugate pair of generalized
eigenvalues.
For complex flavors:
A pair of matrices (S,T) is in generalized complex Schur form if S and T are upper triangular and, in addition,
the diagonal elements of T are non-negative real numbers.
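For the real flavors, the statement that a standardized 2-by-2 block pair carries a complex conjugate pair of eigenvalues can be checked from the characteristic polynomial of the block. With S2 = [s11 s12; s21 s22] and T2 = diag(a, b), det(S2 - λ*T2) = a*b*λ^2 - (a*s22 + b*s11)*λ + (s11*s22 - s12*s21). The sketch below evaluates the discriminant of this quadratic; the function names are hypothetical, and a*b is assumed to be nonzero.

```c
/* Discriminant of det(S2 - lambda*T2) = 0 for S2 = [s11 s12; s21 s22]
   and T2 = diag(a, b):
   a*b*lambda^2 - (a*s22 + b*s11)*lambda + (s11*s22 - s12*s21) = 0. */
static double block_discriminant(double s11, double s12, double s21,
                                 double s22, double a, double b)
{
    double lin = a * s22 + b * s11;
    double det = s11 * s22 - s12 * s21;
    return lin * lin - 4.0 * a * b * det;
}

/* Nonzero when the block pair has a complex conjugate pair of
   generalized eigenvalues (assumes a*b != 0). */
static int block_has_complex_pair(double s11, double s12, double s21,
                                  double s22, double a, double b)
{
    return block_discriminant(s11, s12, s21, s22, a, b) < 0.0;
}
```

For S2 = [0 1; -1 0] and T2 = I the quadratic is λ^2 + 1 = 0, so the discriminant is -4 and the eigenvalues are ±i.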
Input Parameters
sort Specifies whether or not to order the eigenvalues on the diagonal of the
generalized Schur form.
= 'N': Eigenvalues are not ordered;
= 'S': Eigenvalues are ordered (see selctg).
selctg selctg is a function of three arguments for real flavors or two arguments
for complex flavors. In the C interface it is passed as a function pointer of
type LAPACK_S_SELECT3 or LAPACK_D_SELECT3 (real flavors) or LAPACK_C_SELECT2
or LAPACK_Z_SELECT2 (complex flavors). If sort = 'N', selctg is not
referenced. If sort = 'S', selctg is used to select eigenvalues to sort to
the top left of the Schur form.
For real flavors:
An eigenvalue (alphar[j - 1] + i*alphai[j - 1])/beta[j - 1], where
i = sqrt(-1), is selected if
selctg(&alphar[j - 1], &alphai[j - 1], &beta[j - 1]) is true.
If either one of a complex conjugate pair of eigenvalues is
selected, then both complex eigenvalues are selected.
ldvsl The leading dimension of the matrix VSL. ldvsl≥ 1, and if jobvsl = 'V',
ldvsl≥ n.
ldvsr The leading dimension of the matrix VSR. ldvsr≥ 1, and if jobvsr = 'V',
ldvsr≥ n.
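As an illustration of the selctg argument for the double precision real flavor, the sketch below selects eigenvalues with a negative real part. The typedef is a local stand-in that mirrors the three-pointer shape of LAPACK_D_SELECT3; its name and the plain int return type are assumptions of this example, and the actual types come from mkl.h.

```c
/* Local stand-in for the callback shape of LAPACK_D_SELECT3 (assumed:
   three const pointers, nonzero return value means "selected"). */
typedef int (*demo_d_select3)(const double *, const double *, const double *);

/* Select eigenvalues with Re(lambda) < 0. Since Re(lambda) = alphar/beta,
   the sign can be read from the signs of alphar and beta, so no division
   is needed. The result does not depend on alphai, so both members of a
   complex conjugate pair receive the same answer. */
static int select_left_half_plane(const double *alphar, const double *alphai,
                                  const double *beta)
{
    (void)alphai;
    if (*beta == 0.0 || *alphar == 0.0) return 0;
    return (*alphar < 0.0) != (*beta < 0.0);
}
```

A function of this shape would be passed as selctg together with sort = 'S'.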
Output Parameters
Note: the quotients alphar[j - 1]/beta[j - 1] and
alphai[j - 1]/beta[j - 1] can easily over- or underflow, and beta[j - 1]
might even be zero. Thus, you should avoid computing the ratio
alpha/beta by simply dividing alpha by beta. However, alphar and
alphai are always less than and usually comparable with norm(a) in
magnitude, and beta is always less than and usually comparable with
norm(b).
If jobvsl = 'V', vsl contains the left Schur vectors. Not referenced if
jobvsl = 'N'.
If jobvsr = 'V', vsr contains the right Schur vectors. Not referenced
if jobvsr = 'N'.
Return Values
This function returns a value info.
= 0: successful exit.
< 0: if info = -i, the i-th argument had an illegal value.
= 1,...,n: the QZ iteration failed. (a,b) are not in Schur form, but alpha[j] and beta[j] should be correct for
j = info,...,n - 1.
= n+2: after reordering, roundoff changed values of some complex eigenvalues so that leading eigenvalues in
the generalized Schur form no longer satisfy selctg ≠ 0. This could also be caused by scaling.
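When the QZ iteration fails, only the trailing entries of the eigenvalue arrays are described as correct. The helper below is a hypothetical sketch of that rule; long stands in for lapack_int. It returns the first zero-based index that is covered, and n when none is. Values of info outside 0,...,n are treated conservatively here, which is a choice of this example and not a statement about the routine.

```c
/* First zero-based index j for which alpha[j] and beta[j] are covered by
   the return-value rules, or n when no entry is covered by this sketch. */
static long first_trusted_index(long info, long n)
{
    if (info == 0) return 0;                  /* success: all entries */
    if (info >= 1 && info <= n) return info;  /* QZ failed: j = info..n-1 */
    return n;                                 /* other codes: none assumed */
}
```

A caller could then loop over j from first_trusted_index(info, n) to n - 1 when post-processing eigenvalues.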
?ggev
Computes the generalized eigenvalues, and the left
and/or right generalized eigenvectors for a pair of
nonsymmetric matrices.
Syntax
lapack_int LAPACKE_sggev( int matrix_layout, char jobvl, char jobvr, lapack_int n,
float* a, lapack_int lda, float* b, lapack_int ldb, float* alphar, float* alphai, float*
beta, float* vl, lapack_int ldvl, float* vr, lapack_int ldvr );
lapack_int LAPACKE_dggev( int matrix_layout, char jobvl, char jobvr, lapack_int n,
double* a, lapack_int lda, double* b, lapack_int ldb, double* alphar, double* alphai,
double* beta, double* vl, lapack_int ldvl, double* vr, lapack_int ldvr );
lapack_int LAPACKE_cggev( int matrix_layout, char jobvl, char jobvr, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* b, lapack_int ldb,
lapack_complex_float* alpha, lapack_complex_float* beta, lapack_complex_float* vl,
lapack_int ldvl, lapack_complex_float* vr, lapack_int ldvr );
lapack_int LAPACKE_zggev( int matrix_layout, char jobvl, char jobvr, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* b, lapack_int ldb,
lapack_complex_double* alpha, lapack_complex_double* beta, lapack_complex_double* vl,
lapack_int ldvl, lapack_complex_double* vr, lapack_int ldvr );
Include Files
• mkl.h
Description
The ?ggev routine computes the generalized eigenvalues, and optionally, the left and/or right generalized
eigenvectors for a pair of n-by-n real/complex nonsymmetric matrices (A,B).
A generalized eigenvalue for a pair of matrices (A,B) is a scalar λ or a ratio alpha / beta = λ, such that A -
λ*B is singular. It is usually represented as the pair (alpha, beta), as there is a reasonable interpretation for
beta =0 and even for both being zero.
The right generalized eigenvector v(j) corresponding to the generalized eigenvalue λ(j) of (A,B) satisfies
A*v(j) = λ(j)*B*v(j).
The left generalized eigenvector u(j) corresponding to the generalized eigenvalue λ(j) of (A,B) satisfies
u(j)^H*A = λ(j)*u(j)^H*B
where u(j)^H denotes the conjugate transpose of u(j).
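The defining relation for a right eigenvector can be verified without forming λ = alpha/beta by multiplying through by beta: beta*A*v = alpha*B*v. The sketch below computes the largest residual entry for a real eigenpair in column major layout. The function name is hypothetical and the leading dimension is assumed to equal n.

```c
#include <stddef.h>
#include <math.h>

/* Largest absolute entry of r = beta*A*v - alpha*B*v for a real eigenpair.
   a and b are n-by-n column-major matrices with leading dimension n. */
static double eig_residual_max(size_t n, const double *a, const double *b,
                               double alpha, double beta, const double *v)
{
    double worst = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double r = 0.0;
        for (size_t k = 0; k < n; ++k)
            r += (beta * a[i + k * n] - alpha * b[i + k * n]) * v[k];
        if (fabs(r) > worst) worst = fabs(r);
    }
    return worst;
}
```

For A = diag(2, 3) and B = diag(1, 2), the pair (alpha, beta) = (3, 2) with v = (0, 1) gives a zero residual, corresponding to λ = 3/2.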
Input Parameters
jobvr Must be 'N' or 'V'.
a, b Arrays:
a (size at least max(1, lda*n)) is an array containing the n-by-n matrix A
(first of the pair of matrices).
b (size at least max(1, ldb*n)) is an array containing the n-by-n matrix B
(second of the pair of matrices).
lda The leading dimension of the array a. Must be at least max(1, n).
ldb The leading dimension of the array b. Must be at least max(1, n).
ldvl, ldvr The leading dimensions of the output matrices vl and vr, respectively.
Constraints:
ldvl≥ 1. If jobvl = 'V', ldvl≥ max(1, n).
ldvr≥ 1. If jobvr = 'V', ldvr≥ max(1, n).
Output Parameters
alphar, alphai Arrays, size at least max(1, n) each. Contain values that form generalized
eigenvalues in real flavors.
See beta.
alpha Array, size at least max(1, n). Contain values that form generalized
eigenvalues in complex flavors. See beta.
vl, vr Arrays:
vl (size at least max(1, ldvl*n)). Contains the matrix of left generalized
eigenvectors VL.
If jobvl = 'V', the left generalized eigenvectors u(j) are stored one after
another in the columns of VL, in the same order as their eigenvalues. Each
eigenvector is scaled so the largest component has abs(Re) + abs(Im) = 1.
If jobvl = 'N', vl is not referenced.
If the j-th and (j+1)-st eigenvalues form a complex conjugate pair, then for
i = sqrt(-1), the k-th component of the j-th left eigenvector u(j) is
vl[(k - 1) + (j - 1)*ldvl] + i*vl[(k - 1) + j*ldvl] for column
major layout and vl[(k - 1)*ldvl + (j - 1)] + i*vl[(k - 1)*ldvl + j]
for row major layout. Similarly, the k-th component of the left
eigenvector u(j+1) is vl[(k - 1) + (j - 1)*ldvl] - i*vl[(k - 1) + j*ldvl]
for column major layout and vl[(k - 1)*ldvl + (j - 1)] -
i*vl[(k - 1)*ldvl + j] for row major layout.
If the j-th and (j+1)-st eigenvalues form a complex conjugate pair, then the
k-th component of the j-th right eigenvector v(j) can be computed as
vr[(k - 1) + (j - 1)*ldvr] + i*vr[(k - 1) + j*ldvr] for column major
layout and vr[(k - 1)*ldvr + (j - 1)] + i*vr[(k - 1)*ldvr + j]
for row major layout. Similarly, the k-th component of the right
eigenvector v(j+1) can be computed as vr[(k - 1) + (j - 1)*ldvr]
- i*vr[(k - 1) + j*ldvr] for column major layout and
vr[(k - 1)*ldvr + (j - 1)] - i*vr[(k - 1)*ldvr + j] for row major layout.
For complex flavors:
The k-th component of the j-th right eigenvector v(j) is stored in
vr[(k - 1) + (j - 1)*ldvr] for column major layout and in
vr[(k - 1)*ldvr + (j - 1)] for row major layout.
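For real flavors, the indexing rules above mean that a complex conjugate pair of eigenvectors occupies two consecutive real columns. The following sketch extracts the real and imaginary parts of the first eigenvector of such a pair from vr in column major layout; the second eigenvector of the pair has the same real part and the negated imaginary part. The function name is hypothetical, and j0 is the zero-based index of the first column of the pair.

```c
#include <stddef.h>

/* Unpack a complex conjugate pair stored in columns j0 and j0 + 1 of a
   column-major real array vr (leading dimension ldvr):
   v(j)   = re + i*im
   v(j+1) = re - i*im
   where j = j0 + 1 in the one-based numbering of the text. */
static void unpack_conjugate_pair(size_t n, const double *vr, size_t ldvr,
                                  size_t j0, double *re, double *im)
{
    for (size_t k = 0; k < n; ++k) {
        re[k] = vr[k + j0 * ldvr];
        im[k] = vr[k + (j0 + 1) * ldvr];
    }
}
```

The same indexing applies to vl with ldvl in place of ldvr.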
Return Values
This function returns a value info.
If info = i and i ≤ n, the QZ iteration failed. No eigenvectors have been calculated, but alphar[j], alphai[j]
(for real flavors), or alpha[j] (for complex flavors), and beta[j], j = info,..., n - 1 should be correct.
Application Notes
The quotients alphar[j]/beta[j] and alphai[j]/beta[j] may easily over- or underflow, and beta[j] may even be
zero. Thus, you should avoid simply computing the ratio. However, alphar and alphai (for real flavors) or
alpha (for complex flavors) will always be less than and usually comparable with norm(A) in magnitude, and
beta will always be less than and usually comparable with norm(B).
?ggevx
Computes the generalized eigenvalues, and,
optionally, the left and/or right generalized
eigenvectors.
Syntax
lapack_int LAPACKE_sggevx( int matrix_layout, char balanc, char jobvl, char jobvr, char
sense, lapack_int n, float* a, lapack_int lda, float* b, lapack_int ldb, float* alphar,
float* alphai, float* beta, float* vl, lapack_int ldvl, float* vr, lapack_int ldvr,
lapack_int* ilo, lapack_int* ihi, float* lscale, float* rscale, float* abnrm, float*
bbnrm, float* rconde, float* rcondv );
lapack_int LAPACKE_dggevx( int matrix_layout, char balanc, char jobvl, char jobvr, char
sense, lapack_int n, double* a, lapack_int lda, double* b, lapack_int ldb, double*
alphar, double* alphai, double* beta, double* vl, lapack_int ldvl, double* vr,
lapack_int ldvr, lapack_int* ilo, lapack_int* ihi, double* lscale, double* rscale,
double* abnrm, double* bbnrm, double* rconde, double* rcondv );
lapack_int LAPACKE_cggevx( int matrix_layout, char balanc, char jobvl, char jobvr, char
sense, lapack_int n, lapack_complex_float* a, lapack_int lda, lapack_complex_float* b,
lapack_int ldb, lapack_complex_float* alpha, lapack_complex_float* beta,
lapack_complex_float* vl, lapack_int ldvl, lapack_complex_float* vr, lapack_int ldvr,
lapack_int* ilo, lapack_int* ihi, float* lscale, float* rscale, float* abnrm, float*
bbnrm, float* rconde, float* rcondv );
lapack_int LAPACKE_zggevx( int matrix_layout, char balanc, char jobvl, char jobvr, char
sense, lapack_int n, lapack_complex_double* a, lapack_int lda, lapack_complex_double*
b, lapack_int ldb, lapack_complex_double* alpha, lapack_complex_double* beta,
lapack_complex_double* vl, lapack_int ldvl, lapack_complex_double* vr, lapack_int ldvr,
lapack_int* ilo, lapack_int* ihi, double* lscale, double* rscale, double* abnrm,
double* bbnrm, double* rconde, double* rcondv );
Include Files
• mkl.h
Description
The routine computes for a pair of n-by-n real/complex nonsymmetric matrices (A,B), the generalized
eigenvalues, and optionally, the left and/or right generalized eigenvectors.
Optionally also, it computes a balancing transformation to improve the conditioning of the eigenvalues and
eigenvectors (ilo, ihi, lscale, rscale, abnrm, and bbnrm), reciprocal condition numbers for the eigenvalues
(rconde), and reciprocal condition numbers for the right eigenvectors (rcondv).
A generalized eigenvalue for a pair of matrices (A,B) is a scalar λ or a ratio alpha / beta = λ, such that A -
λ*B is singular. It is usually represented as the pair (alpha, beta), as there is a reasonable interpretation for
beta=0 and even for both being zero. The right generalized eigenvector v(j) corresponding to the
generalized eigenvalue λ(j) of (A,B) satisfies
A*v(j) = λ(j)*B*v(j).
The left generalized eigenvector u(j) corresponding to the generalized eigenvalue λ(j) of (A,B) satisfies
u(j)^H*A = λ(j)*u(j)^H*B
where u(j)^H denotes the conjugate transpose of u(j).
Input Parameters
balanc Must be 'N', 'P', 'S', or 'B'. Specifies the balance option to be
performed.
If balanc = 'N', do not diagonally scale or permute;
sense Must be 'N', 'E', 'V', or 'B'. Determines which reciprocal condition
numbers are computed.
If sense = 'N', none are computed;
If sense = 'E', computed for eigenvalues only;
a, b Arrays:
a (size at least max(1, lda*n)) is an array containing the n-by-n matrix A
(first of the pair of matrices).
b (size at least max(1, ldb*n)) is an array containing the n-by-n matrix B
(second of the pair of matrices).
ldvl, ldvr The leading dimensions of the output matrices vl and vr, respectively.
Constraints:
ldvl≥ 1. If jobvl = 'V', ldvl≥ max(1, n).
ldvr≥ 1. If jobvr = 'V', ldvr≥ max(1, n).
Output Parameters
alphar, alphai Arrays, size at least max(1, n) each. Contain values that form generalized
eigenvalues in real flavors.
See beta.
alpha Array, size at least max(1, n). Contain values that form generalized
eigenvalues in complex flavors. See beta.
If jobvl = 'V', the left generalized eigenvectors u(j) are stored one after
another in the columns of vl, in the same order as their eigenvalues. Each
eigenvector is scaled so that the largest component has abs(Re) +
abs(Im) = 1.
If jobvl = 'N', vl is not referenced.
If jobvr = 'V', the right generalized eigenvectors v(j) are stored one after
another in the columns of vr, in the same order as their eigenvalues. Each
eigenvector is scaled so that the largest component has abs(Re) +
abs(Im) = 1.
If jobvr = 'N', vr is not referenced.
If the j-th and (j+1)-st eigenvalues form a complex conjugate pair, then,
with i = sqrt(-1), the k-th component of the j-th right eigenvector v(j)
can be computed as
vr[(k - 1) + (j - 1)*ldvr] + i*vr[(k - 1) + j*ldvr] for column
major layout and vr[(k - 1)*ldvr + (j - 1)] + i*vr[(k - 1)*ldvr + j]
for row major layout. Similarly, the k-th component of the right
eigenvector v(j+1) can be computed as vr[(k - 1) + (j - 1)*ldvr]
- i*vr[(k - 1) + j*ldvr] for column major layout and
vr[(k - 1)*ldvr + (j - 1)] - i*vr[(k - 1)*ldvr + j] for row major layout.
For complex flavors:
The k-th component of the j-th right eigenvector v(j) is stored in
vr[(k - 1) + (j - 1)*ldvr] for column major layout and in
vr[(k - 1)*ldvr + (j - 1)] for row major layout.
ilo, ihi ilo and ihi are integer values such that on exit A(i,j) = 0 and B(i,j) = 0 if i >
j and j = 1,..., ilo-1 or i = ihi+1,..., n.
If balanc = 'N' or 'S', ilo = 1 and ihi = n.
Return Values
This function returns a value info.
If info = i and i ≤ n, the QZ iteration failed. No eigenvectors have been calculated, but alphar[j], alphai[j] (for real flavors),
or alpha[j] (for complex flavors), and beta[j], j = info,..., n - 1, should be correct.
Application Notes
The quotients alphar[j]/beta[j] and alphai[j]/beta[j] may easily over- or underflow, and beta[j] may even be
zero. Thus, you should avoid simply computing the ratio. However, alphar and alphai (for real flavors) or
alpha (for complex flavors) will always be less than and usually comparable with norm(A) in magnitude, and
beta is always less than and usually comparable with norm(B).
?ggev3
Computes the generalized eigenvalues and the left
and right generalized eigenvectors for a pair of
matrices.
Syntax
lapack_int LAPACKE_sggev3 (int matrix_layout, char jobvl, char jobvr, lapack_int n,
float * a, lapack_int lda, float * b, lapack_int ldb, float * alphar, float * alphai,
float * beta, float * vl, lapack_int ldvl, float * vr, lapack_int ldvr);
lapack_int LAPACKE_dggev3 (int matrix_layout, char jobvl, char jobvr, lapack_int n,
double * a, lapack_int lda, double * b, lapack_int ldb, double * alphar, double *
alphai, double * beta, double * vl, lapack_int ldvl, double * vr, lapack_int ldvr);
lapack_int LAPACKE_cggev3 (int matrix_layout, char jobvl, char jobvr, lapack_int n,
lapack_complex_float * a, lapack_int lda, lapack_complex_float * b, lapack_int ldb,
lapack_complex_float * alpha, lapack_complex_float * beta, lapack_complex_float * vl,
lapack_int ldvl, lapack_complex_float * vr, lapack_int ldvr);
lapack_int LAPACKE_zggev3 (int matrix_layout, char jobvl, char jobvr, lapack_int n,
lapack_complex_double * a, lapack_int lda, lapack_complex_double * b, lapack_int ldb,
lapack_complex_double * alpha, lapack_complex_double * beta, lapack_complex_double *
vl, lapack_int ldvl, lapack_complex_double * vr, lapack_int ldvr);
Include Files
• mkl.h
Description
For a pair of n-by-n real or complex nonsymmetric matrices (A, B), ?ggev3 computes the generalized
eigenvalues, and optionally, the left and right generalized eigenvectors.
A generalized eigenvalue for a pair of matrices (A, B) is a scalar λ or a ratio alpha/beta = λ, such that A -
λ*B is singular. It is usually represented as the pair (alpha,beta), as there is a reasonable interpretation for
beta=0, and even for both being zero.
For real flavors:
The right eigenvector vj corresponding to the eigenvalue λj of (A, B) satisfies
A * vj = λj * B * vj.
The left eigenvector uj corresponding to the eigenvalue λj of (A, B) satisfies
ujH * A = λj * ujH * B
where ujH is the conjugate-transpose of uj.
For complex flavors:
The right generalized eigenvector vj corresponding to the generalized eigenvalue λj of (A, B) satisfies
A * vj = λj * B * vj.
The left generalized eigenvector uj corresponding to the generalized eigenvalue λj of (A, B) satisfies
ujH * A = λj * ujH * B
where ujH is the conjugate-transpose of uj.
Input Parameters
lda≥ max(1,n).
ldb≥ max(1,n).
Output Parameters
a On exit, a is overwritten.
b On exit, b is overwritten.
If jobvr = 'V', the right eigenvectors vj are stored one after another
in the columns of vr, in the same order as their eigenvalues. If the j-
th eigenvalue is real, then vj = the j-th column of vr. If the j-th and (j
+ 1)-st eigenvalues form a complex conjugate pair, then the real part
of vj = the j-th column of vr and the imaginary part of vj = the (j +
1)-st column of vr.
Return Values
This function returns a value info.
= 0: successful exit
< 0: if info = -i, the i-th argument had an illegal value.
=1,...,n:
> n:
?lacgv
Conjugates a complex vector.
Syntax
lapack_int LAPACKE_clacgv (lapack_int n, lapack_complex_float* x, lapack_int incx);
lapack_int LAPACKE_zlacgv (lapack_int n, lapack_complex_double* x, lapack_int incx);
Include Files
• mkl.h
Description
The routine conjugates a complex vector x of length n and increment incx (see "Vector Arguments in BLAS"
in Appendix B).
Input Parameters
A <datatype> placeholder, if present, is used for the C interface data types in the C interface section above.
See C Interface Conventions for the C interface principal conventions and type definitions.
Output Parameters
?lacrm
Multiplies a complex matrix by a square real matrix.
Syntax
call clacrm( m, n, a, lda, b, ldb, c, ldc, rwork )
call zlacrm( m, n, a, lda, b, ldb, c, ldc, rwork )
Include Files
• mkl.h
Description
Input Parameters
m INTEGER. The number of rows of the matrix A and of the matrix C (m≥ 0).
n INTEGER. The number of columns and rows of the matrix B and the number
of columns of the matrix C
(n≥ 0).
Array, DIMENSION(lda, n). Contains the m-by-n matrix A.
ldc INTEGER. The leading dimension of the output array c, ldc≥max(1, n).
Output Parameters
?syconv
Converts a symmetric matrix given by a triangular
matrix factorization into two matrices and vice versa.
Syntax
lapack_int LAPACKE_ssyconv (int matrix_layout, char uplo, char way, lapack_int n, float
* a, lapack_int lda, const lapack_int * ipiv, float * e);
lapack_int LAPACKE_dsyconv (int matrix_layout, char uplo, char way, lapack_int n,
double* a, lapack_int lda, const lapack_int * ipiv, double * e);
lapack_int LAPACKE_csyconv (int matrix_layout, char uplo, char way, lapack_int n,
lapack_complex_float * a, lapack_int lda, const lapack_int * ipiv, lapack_complex_float
* e);
lapack_int LAPACKE_zsyconv (int matrix_layout, char uplo, char way, lapack_int n,
lapack_complex_double* a, lapack_int lda, const lapack_int * ipiv,
lapack_complex_double * e);
Include Files
• mkl.h
Description
The routine converts matrix A, which results from a triangular matrix factorization, into matrices L and D and
vice versa. The routine returns non-diagonalized elements of D and applies or reverses permutation done
with the triangular matrix factorization.
Input Parameters
Output Parameters
Return Values
See Also
?sytrf
?syr
Performs the symmetric rank-1 update of a complex
symmetric matrix.
Syntax
lapack_int LAPACKE_csyr (int matrix_layout, char uplo, lapack_int n,
lapack_complex_float alpha, const lapack_complex_float * x, lapack_int incx,
lapack_complex_float * a, lapack_int lda);
lapack_int LAPACKE_zsyr (int matrix_layout, char uplo, lapack_int n,
lapack_complex_double alpha, const lapack_complex_double * x, lapack_int incx,
lapack_complex_double * a, lapack_int lda);
Include Files
• mkl.h
Description
Input Parameters
uplo Specifies whether the upper or lower triangular part of the array a is used:
If uplo = 'U' or 'u', then the upper triangular part of the array a is used.
If uplo = 'L' or 'l', then the lower triangular part of the array a is used.
n Specifies the order of the matrix a. The value of n must be at least zero.
incx Specifies the increment for the elements of x. The value of incx must not be
zero.
a Array, size max(1, lda*n). Before entry with uplo = 'U' or 'u', the
leading n-by-n upper triangular part of the array a must contain the upper
triangular part of the symmetric matrix and the strictly lower triangular part
of a is not referenced.
Before entry with uplo = 'L' or 'l', the leading n-by-n lower triangular
part of the array a must contain the lower triangular part of the symmetric
matrix and the strictly upper triangular part of a is not referenced.
Output Parameters
a With uplo = 'U' or 'u', the upper triangular part of the array a is
overwritten by the upper triangular part of the updated matrix.
With uplo = 'L' or 'l', the lower triangular part of the array a is
overwritten by the lower triangular part of the updated matrix.
Return Values
This function returns a value info.
If info = 0, the execution is successful.
i?max1
Finds the index of the vector element whose real part
has maximum absolute value.
Syntax
MKL_INT icmax1(const MKL_INT*n, const MKL_Complex8*cx, const MKL_INT*incx)
MKL_INT izmax1(const MKL_INT*n, const MKL_Complex16*cx, const MKL_INT*incx)
Include Files
• mkl.h
Description
Given a complex vector cx, the i?max1 functions return the index of the first vector element of maximum
absolute value. These functions are based on the BLAS functions icamax/izamax, but using the absolute
value of components. They are designed for use with clacon/zlacon.
Input Parameters
Return Values
Index of the vector element of maximum absolute value.
?sum1
Forms the 1-norm of the complex vector using the
true absolute value.
Syntax
float scsum1(const MKL_INT*n, const MKL_Complex8*cx, const MKL_INT*incx)
double dzsum1(const MKL_INT*n, const MKL_Complex16*cx, const MKL_INT*incx)
Include Files
• mkl.h
Description
Given a complex vector cx, scsum1/dzsum1 functions take the sum of the absolute values of vector elements
and return a single/double precision result, respectively. These functions are based on scasum/dzasum from
Level 1 BLAS, but use the true absolute value and were designed for use with clacon/zlacon.
Input Parameters
incx Specifies the spacing between successive elements of cx (incx > 0).
Return Values
Sum of absolute values.
?gelq2
Computes the LQ factorization of a general
rectangular matrix using an unblocked algorithm.
Syntax
lapack_int LAPACKE_sgelq2 (int matrix_layout, lapack_int m, lapack_int n, float* a,
lapack_int lda, float* tau);
lapack_int LAPACKE_dgelq2 (int matrix_layout, lapack_int m, lapack_int n, double* a,
lapack_int lda, double * tau);
lapack_int LAPACKE_cgelq2 (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* tau);
lapack_int LAPACKE_zgelq2 (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* tau);
Include Files
• mkl.h
Description
The routine does not form the matrix Q explicitly. Instead, Q is represented as a product of min(m, n)
elementary reflectors:
Q = H(k) ... H(2) H(1) (or Q = H(k)^H ... H(2)^H H(1)^H for complex flavors), where k = min(m, n)
Each H(i) has the form
H(i) = I - tau*v*v^T for real flavors, or
H(i) = I - tau*v*v^H for complex flavors,
where tau is a real/complex scalar stored in tau(i), and v is a real/complex vector with v(1:i-1) = 0 and v(i) =
1.
On exit, the j-th (i+1 ≤j≤n) component of vector v (for real functions) or its conjugate (for complex functions)
is stored in a[i - 1 + lda*(j - 1)] for column major layout or in a[j - 1 + lda*(i - 1)] for row
major layout.
Input Parameters
A <datatype> placeholder, if present, is used for the C interface data types in the C interface section above.
See C Interface Conventions for the C interface principal conventions and type definitions.
a Array, size at least max(1, lda*n) for column major and max(1, lda*m)
for row major layout. Array a contains the m-by-n matrix A.
lda The leading dimension of a; at least max(1, m) for column major layout and
max(1,n) for row major layout.
Output Parameters
Return Values
This function returns a value info.
If info = 0, the execution is successful.
?geqr2
Computes the QR factorization of a general
rectangular matrix using an unblocked algorithm.
Syntax
lapack_int LAPACKE_sgeqr2 (int matrix_layout, lapack_int m, lapack_int n, float* a,
lapack_int lda, float* tau);
lapack_int LAPACKE_dgeqr2 (int matrix_layout, lapack_int m, lapack_int n, double* a,
lapack_int lda, double* tau);
lapack_int LAPACKE_cgeqr2 (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* tau);
lapack_int LAPACKE_zgeqr2 (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* tau);
Include Files
• mkl.h
Description
The routine does not form the matrix Q explicitly. Instead, Q is represented as a product of min(m, n)
elementary reflectors:
Q = H(1)*H(2)* ... *H(k), where k = min(m, n)
Each H(i) has the form
H(i) = I - tau*v*v^T for real flavors, or
H(i) = I - tau*v*v^H for complex flavors
where tau is a real/complex scalar stored in tau[i], and v is a real/complex vector with v(1:i-1) = 0 and v(i) =
1.
On exit, v(i+1:m) is stored in a(i+1:m, i).
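Because Q is kept in factored form, applying it means applying each reflector in turn. The following minimal sketch shows one such step for real data, with the hypothetical name apply_reflector; it assumes the caller has already assembled the full vector v, including its leading zeros and the implicit unit entry, and it is not an MKL routine.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Overwrite x (length m) with H*x, where H = I - tau * v * v^T.
 * Only a dot product and a vector update are needed; H is never formed.
 */
void apply_reflector(size_t m, double tau, const double *v, double *x)
{
    assert(v != NULL && x != NULL);

    double dot = 0.0;                 /* dot = v^T * x */
    for (size_t i = 0; i < m; ++i)
        dot += v[i] * x[i];

    for (size_t i = 0; i < m; ++i)    /* x = x - tau * dot * v */
        x[i] -= tau * dot * v[i];
}
```

With v = (1, 1) and tau = 1 the reflector swaps and negates the two entries, so x = (3, 4) becomes (-4, -3) and its Euclidean length is unchanged.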
Input Parameters
A <datatype> placeholder, if present, is used for the C interface data types in the C interface section above.
See C Interface Conventions for the C interface principal conventions and type definitions.
a Array, size at least max(1, lda*n) for column major and max(1, lda*m)
for row major layout. Array a contains the m-by-n matrix A.
lda The leading dimension of a; at least max(1, m) for column major layout and
max(1,n) for row major layout.
Output Parameters
Return Values
This function returns a value info.
If info = 0, the execution is successful.
?geqrt2
Computes a QR factorization of a general real or
complex matrix using the compact WY representation
of Q.
Syntax
lapack_int LAPACKE_sgeqrt2 (int matrix_layout, lapack_int m, lapack_int n, float * a,
lapack_int lda, float * t, lapack_int ldt );
lapack_int LAPACKE_dgeqrt2 (int matrix_layout, lapack_int m, lapack_int n, double * a,
lapack_int lda, double * t, lapack_int ldt );
lapack_int LAPACKE_cgeqrt2 (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_float * a, lapack_int lda, lapack_complex_float * t, lapack_int ldt );
lapack_int LAPACKE_zgeqrt2 (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_double * a, lapack_int lda, lapack_complex_double * t, lapack_int ldt );
Include Files
• mkl.h
Description
The strictly lower triangular matrix V contains the elementary reflectors H(i) in the ith column below the
diagonal. For example, if m=5 and n=3, the matrix V is
    ( 1           )
    ( v1  1       )
V = ( v1  v2  1   )
    ( v1  v2  v3  )
    ( v1  v2  v3  )
where vi represents the vector that defines H(i). The vectors are returned in the lower triangular part of array
a.
NOTE
The 1s along the diagonal of V are not stored in a.
Input Parameters
a Array, size at least max(1, lda*n) for column major and max(1, lda*m)
for row major layout. Array a contains the m-by-n matrix A.
lda The leading dimension of a; at least max(1, m) for column major layout and
max(1,n) for row major layout.
Output Parameters
The n-by-n upper triangular factor of the block reflector. The elements on
and above the diagonal contain the block reflector T. The elements below
the diagonal are not used.
Return Values
This function returns a value info.
If info = 0, the execution is successful.
If info < 0 and info = -i, the ith argument had an illegal value.
?geqrt3
Recursively computes a QR factorization of a general
real or complex matrix using the compact WY
representation of Q.
Syntax
lapack_int LAPACKE_sgeqrt3 (int matrix_layout , lapack_int m , lapack_int n , float *
a , lapack_int lda , float * t , lapack_int ldt );
lapack_int LAPACKE_dgeqrt3 (int matrix_layout , lapack_int m , lapack_int n , double *
a , lapack_int lda , double * t , lapack_int ldt );
lapack_int LAPACKE_cgeqrt3 (int matrix_layout , lapack_int m , lapack_int n ,
lapack_complex_float * a , lapack_int lda , lapack_complex_float * t , lapack_int
ldt );
lapack_int LAPACKE_zgeqrt3 (int matrix_layout , lapack_int m , lapack_int n ,
lapack_complex_double * a , lapack_int lda , lapack_complex_double * t , lapack_int
ldt );
Include Files
• mkl.h
Description
The strictly lower triangular matrix V contains the elementary reflectors H(i) in the ith column below the
diagonal. For example, if m=5 and n=3, the matrix V is
    ( 1           )
    ( v1  1       )
V = ( v1  v2  1   )
    ( v1  v2  v3  )
    ( v1  v2  v3  )
where vi represents one of the vectors that define H(i). The vectors are returned in the lower triangular part of
array a.
NOTE
The 1s along the diagonal of V are not stored in a.
Input Parameters
a Array, size at least max(1, lda*n) for column major and max(1, lda*m)
for row major layout. Array a contains the m-by-n matrix A.
lda The leading dimension of a; at least max(1, m) for column major layout and
max(1,n) for row major layout.
Output Parameters
a The elements on and above the diagonal of the array contain the n-by-n
upper triangular matrix R. The elements below the diagonal are the
columns of V.
The n-by-n upper triangular factor of the block reflector. The elements on
and above the diagonal contain the block reflector T. The elements below
the diagonal are not used.
Return Values
This function returns a value info.
If info = 0, the execution is successful.
If info < 0 and info = -i, the ith argument had an illegal value.
?getf2
Computes the LU factorization of a general m-by-n
matrix using partial pivoting with row interchanges
(unblocked algorithm).
Syntax
lapack_int LAPACKE_sgetf2 (int matrix_layout, lapack_int m, lapack_int n, float* a,
lapack_int lda, lapack_int * ipiv);
lapack_int LAPACKE_dgetf2 (int matrix_layout, lapack_int m, lapack_int n, double* a,
lapack_int lda, lapack_int * ipiv);
lapack_int LAPACKE_cgetf2 (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_int * ipiv);
lapack_int LAPACKE_zgetf2 (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_int * ipiv);
Include Files
• mkl.h
Description
The routine computes the LU factorization of a general m-by-n matrix A using partial pivoting with row
interchanges. The factorization has the form
A = P*L*U
where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m >
n) and U is upper triangular (upper trapezoidal if m < n).
Input Parameters
A <datatype> placeholder, if present, is used for the C interface data types in the C interface section above.
See C Interface Conventions for the C interface principal conventions and type definitions.
a Array, size at least max(1, lda*n) for column major and max(1, lda*m)
for row major layout. Array a contains the m-by-n matrix A.
lda The leading dimension of a; at least max(1, m) for column major layout and
max(1,n) for row major layout.
Output Parameters
The pivot indices: for 1 ≤ i ≤ min(m, n), row i was interchanged with row ipiv[i - 1].
Return Values
This function returns a value info.
If info = -i, the i-th parameter had an illegal value.
If info = i >0, uii is 0. The factorization has been completed, but U is exactly singular. Division by 0 will
occur if you use the factor U for solving a system of linear equations.
If info = -1011, memory allocation error occurred.
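To make the factorization and the meaning of the pivot indices concrete, here is a textbook unblocked LU sketch in plain C for column major data. It is an illustration only and not the MKL implementation; the name lu_unblocked is hypothetical and plain int stands in for lapack_int.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/*
 * Textbook unblocked LU with partial pivoting on a column-major m-by-n array.
 * On return a holds L below the diagonal (unit diagonal implied) and U on and
 * above it; ipiv[i] is the 1-based row that was swapped with row i+1.
 * Returns 0, or i > 0 if u(i,i) is exactly zero.
 */
int lu_unblocked(size_t m, size_t n, double *a, size_t lda, int *ipiv)
{
    assert(a != NULL && ipiv != NULL && lda >= m);
    int info = 0;
    size_t kmax = (m < n) ? m : n;

    for (size_t j = 0; j < kmax; ++j) {
        /* Pivot search: largest |a(i,j)| for i >= j. */
        size_t p = j;
        for (size_t i = j + 1; i < m; ++i)
            if (fabs(a[i + j * lda]) > fabs(a[p + j * lda]))
                p = i;
        ipiv[j] = (int)p + 1;

        if (a[p + j * lda] == 0.0) {      /* exactly singular column */
            if (info == 0)
                info = (int)j + 1;
            continue;
        }
        if (p != j) {                     /* swap rows j and p */
            for (size_t c = 0; c < n; ++c) {
                double t = a[j + c * lda];
                a[j + c * lda] = a[p + c * lda];
                a[p + c * lda] = t;
            }
        }
        /* Compute multipliers and update the trailing submatrix. */
        for (size_t i = j + 1; i < m; ++i) {
            a[i + j * lda] /= a[j + j * lda];
            for (size_t c = j + 1; c < n; ++c)
                a[i + c * lda] -= a[i + j * lda] * a[j + c * lda];
        }
    }
    return info;
}
```

For the 2-by-2 matrix with rows (4, 3) and (6, 3), the sketch swaps the two rows first, so ipiv is (2, 2), U has rows (6, 3) and (0, 1), and the single multiplier in L is 2/3.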
?lacn2
Estimates the 1-norm of a square matrix, using
reverse communication for evaluating matrix-vector
products.
Syntax
C:
lapack_int LAPACKE_slacn2 (lapack_int n, float * v, float * x, lapack_int * isgn, float
* est, lapack_int * kase, lapack_int * isave);
lapack_int LAPACKE_clacn2 (lapack_int n, lapack_complex_float * v, lapack_complex_float
* x, float * est, lapack_int * kase, lapack_int * isave);
lapack_int LAPACKE_dlacn2 (lapack_int n, double * v, double * x, lapack_int * isgn,
double * est, lapack_int * kase, lapack_int * isave);
lapack_int LAPACKE_zlacn2 (lapack_int n, lapack_complex_double * v,
lapack_complex_double * x, double * est, lapack_int * kase, lapack_int * isave);
Include Files
• mkl.h
Description
The routine estimates the 1-norm of a square, real or complex matrix A. Reverse communication is used for
evaluating matrix-vector products.
Input Parameters
isgn Workspace array, size (n), used with real flavors only.
Output Parameters
A^T*x, if kase = 2 (for real flavors),
A^H*x, if kase = 2 (for complex flavors),
and the routine must be re-called with all the other parameters unchanged.
isave This parameter is used to save variables between calls to the routine.
Return Values
This function returns a value info.
If info = 0, the execution is successful.
?lacpy
Copies all or part of one two-dimensional array to
another.
Syntax
lapack_int LAPACKE_slacpy (int matrix_layout, char uplo, lapack_int m, lapack_int n,
const float* a, lapack_int lda, float* b, lapack_int ldb);
lapack_int LAPACKE_dlacpy (int matrix_layout, char uplo, lapack_int m, lapack_int n,
const double* a, lapack_int lda, double* b, lapack_int ldb);
lapack_int LAPACKE_clacpy (int matrix_layout, char uplo, lapack_int m, lapack_int n,
const lapack_complex_float* a, lapack_int lda, lapack_complex_float* b, lapack_int
ldb);
lapack_int LAPACKE_zlacpy (int matrix_layout, char uplo, lapack_int m, lapack_int n,
const lapack_complex_double* a, lapack_int lda, lapack_complex_double* b, lapack_int
ldb);
Include Files
• mkl.h
Description
Input Parameters
A <datatype> placeholder, if present, is used for the C interface data types in the C interface section above.
See C Interface Conventions for the C interface principal conventions and type definitions.
a Array, size at least max(1, lda*n) for column major and max(1, lda*m)
for row major layout. Array a contains the m-by-n matrix A.
Output Parameters
b Array, size at least max(1, ldb*n) for column major and max(1, ldb*m)
for row major layout. Array b contains the m-by-n matrix B.
Return Values
This function returns a value info.
If info = 0, the execution is successful.
?lakf2
Forms a matrix containing Kronecker products
between the given matrices.
Syntax
void slakf2 (lapack_int *m, lapack_int *n, float *a, lapack_int *lda, float *b, float
*d, float *e, float *z, lapack_int *ldz);
void dlakf2 (lapack_int *m, lapack_int *n, double *a, lapack_int *lda, double *b, double
*d, double *e, double *z, lapack_int *ldz);
void clakf2 (lapack_int *m, lapack_int *n, lapack_complex *a, lapack_int *lda,
lapack_complex *b, lapack_complex *d, lapack_complex *e, lapack_complex *z, lapack_int
*ldz);
void zlakf2 (lapack_int *m, lapack_int *n, lapack_complex_double *a, lapack_int *lda,
lapack_complex_double *b, lapack_complex_double *d, lapack_complex_double *e,
lapack_complex_double *z, lapack_int *ldz);
Include Files
• mkl.h
Description
where I_n is the identity matrix of size n and X^T is the transpose of X. kron(X, Y) is the Kronecker product
between the matrices X and Y.
Input Parameters
m Size of matrix, m≥ 1
n Size of matrix, n≥ 1
Output Parameters
?lange
Returns the value of the 1-norm, Frobenius norm,
infinity-norm, or the largest absolute value of any
element of a general rectangular matrix.
Syntax
float LAPACKE_slange (int matrix_layout, char norm, lapack_int m, lapack_int n, const
float * a, lapack_int lda);
double LAPACKE_dlange (int matrix_layout, char norm, lapack_int m, lapack_int n, const
double * a, lapack_int lda);
float LAPACKE_clange (int matrix_layout, char norm, lapack_int m, lapack_int n, const
lapack_complex_float * a, lapack_int lda);
double LAPACKE_zlange (int matrix_layout, char norm, lapack_int m, lapack_int n, const
lapack_complex_double * a, lapack_int lda);
Include Files
• mkl.h
Description
The function ?lange returns the value of the 1-norm, or the Frobenius norm, or the infinity norm, or the
element of largest absolute value of a real/complex matrix A.
Input Parameters
A <datatype> placeholder, if present, is used for the C interface data types in the C interface section above.
See C Interface Conventions for the C interface principal conventions and type definitions.
a Array, size at least max(1, lda*n) for column major and max(1, lda*m)
for row major layout. Array a contains the m-by-n matrix A.
?lansy
Returns the value of the 1-norm, or the Frobenius
norm, or the infinity norm, or the element of largest
absolute value of a real/complex symmetric matrix.
Syntax
float LAPACKE_slansy (int matrix_layout, char norm, char uplo, lapack_int n, const
float * a, lapack_int lda);
double LAPACKE_dlansy (int matrix_layout, char norm, char uplo, lapack_int n, const
double * a, lapack_int lda);
float LAPACKE_clansy (int matrix_layout, char norm, char uplo, lapack_int n, const
lapack_complex_float * a, lapack_int lda);
double LAPACKE_zlansy (int matrix_layout, char norm, char uplo, lapack_int n, const
lapack_complex_double * a, lapack_int lda);
Include Files
• mkl.h
Description
The function ?lansy returns the value of the 1-norm, or the Frobenius norm, or the infinity norm, or the
element of largest absolute value of a real/complex symmetric matrix A.
Input Parameters
A <datatype> placeholder, if present, is used for the C interface data types in the C interface section above.
See C Interface Conventions for the C interface principal conventions and type definitions.
uplo Specifies whether the upper or lower triangular part of the symmetric
matrix A is to be referenced.
= 'U': Upper triangular part of A is referenced.
?lanhe
Returns the value of the 1-norm, or the Frobenius
norm, or the infinity norm, or the element of largest
absolute value of a complex Hermitian matrix.
Syntax
float LAPACKE_clanhe (int matrix_layout, char norm, char uplo, lapack_int n, const
lapack_complex_float * a, lapack_int lda);
double LAPACKE_zlanhe (int matrix_layout, char norm, char uplo, lapack_int n, const
lapack_complex_double * a, lapack_int lda);
Include Files
• mkl.h
Description
The function ?lanhe returns the value of the 1-norm, or the Frobenius norm, or the infinity norm, or the
element of largest absolute value of a complex Hermitian matrix A.
Input Parameters
A <datatype> placeholder, if present, is used for the C interface data types in the C interface section above.
See C Interface Conventions for the C interface principal conventions and type definitions.
uplo Specifies whether the upper or lower triangular part of the Hermitian matrix
A is to be referenced.
= 'U': Upper triangular part of A is referenced.
?lantr
Returns the value of the 1-norm, or the Frobenius
norm, or the infinity norm, or the element of largest
absolute value of a trapezoidal or triangular matrix.
Syntax
float LAPACKE_slantr (char * norm, char * uplo, char * diag, lapack_int * m, lapack_int
* n, const float * a, lapack_int * lda, float * work);
double LAPACKE_dlantr (char * norm, char * uplo, char * diag, lapack_int * m,
lapack_int * n, const double * a, lapack_int * lda, double * work);
float LAPACKE_clantr (char * norm, char * uplo, char * diag, lapack_int * m, lapack_int
* n, const lapack_complex_float * a, lapack_int * lda, float * work);
double LAPACKE_zlantr (char * norm, char * uplo, char * diag, lapack_int * m,
lapack_int * n, const lapack_complex_double * a, lapack_int * lda, double * work);
Include Files
• mkl.h
Description
The function ?lantr returns the value of the 1-norm, or the Frobenius norm, or the infinity norm, or the
element of largest absolute value of a trapezoidal or triangular matrix A.
Input Parameters
A <datatype> placeholder, if present, is used for the C interface data types in the C interface section above.
See C Interface Conventions for the C interface principal conventions and type definitions.
a Array, size at least max(1, lda*n) for column major and max(1, lda*m)
for row major layout.
The trapezoidal matrix A (A is triangular if m = n).
If uplo = 'U', the leading m-by-n upper trapezoidal part of the array a
contains the upper trapezoidal matrix, and the strictly lower triangular part
of A is not referenced.
If uplo = 'L', the leading m-by-n lower trapezoidal part of the array a
contains the lower trapezoidal matrix, and the strictly upper triangular part
of A is not referenced. Note that when diag = 'U', the diagonal elements
of A are not referenced and are assumed to be one.
LAPACKE_set_nancheck
Turns NaN checking off or on
LAPACKE_set_nancheck(int flag);
Description
The routine sets a value for the LAPACKE NaN checking flag, which indicates whether or not LAPACKE
routines check input matrices for NaNs.
Input Parameters
LAPACKE_get_nancheck
Gets the current NaN checking flag, which indicates
whether NaN checking has been turned off or on.
int flag = LAPACKE_get_nancheck ();
Description
The function returns the current value for the LAPACKE NaN checking flag, which indicates whether or not
LAPACKE routines check input matrices for NaNs.
Return Value
An integer value is returned which indicates the current NaN checking status.
The returned flag value is either 0 (OFF) or 1 (ON), even though any integer value can be used as an input
parameter for LAPACKE_set_nancheck.
LAPACKE_set_nancheck(100);
int flag = LAPACKE_get_nancheck(); // flag==1, not 100.
?lapmr
Rearranges rows of a matrix as specified by a
permutation vector.
Syntax
lapack_int LAPACKE_slapmr (int matrix_layout, lapack_logical forwrd, lapack_int m,
lapack_int n, float* x, lapack_int ldx, lapack_int * k);
lapack_int LAPACKE_dlapmr (int matrix_layout, lapack_logical forwrd, lapack_int m,
lapack_int n, double* x, lapack_int ldx, lapack_int * k);
lapack_int LAPACKE_clapmr (int matrix_layout, lapack_logical forwrd, lapack_int m,
lapack_int n, lapack_complex_float* x, lapack_int ldx, lapack_int * k);
lapack_int LAPACKE_zlapmr (int matrix_layout, lapack_logical forwrd, lapack_int m,
lapack_int n, lapack_complex_double* x, lapack_int ldx, lapack_int * k);
Include Files
• mkl.h
Description
The ?lapmr routine rearranges the rows of the m-by-n matrix X as specified by the permutation k[0],
k[1], ... , k[m-1] of the integers 1,...,m.
If forwrd is true, forward permutation:
X(k[i-1],:) is moved to X(i,:) for i = 1,2,...,m.
If forwrd is false, backward permutation:
X(i,:) is moved to X(k[i-1],:) for i = 1,2,...,m.
Input Parameters
A <datatype> placeholder, if present, is used for the C interface data types in the C interface section above.
See C Interface Conventions for the C interface principal conventions and type definitions.
x Array, size at least max(1, ldx*n) for column major and max(1, ldx*m)
for row major layout. On entry, the m-by-n matrix X.
ldx The leading dimension of the array x, ldx ≥ max(1,m) for column major
layout and ldx ≥ max(1,n) for row major layout.
k Array, size (m). On entry, k contains the permutation vector and is used as
internal workspace.
Output Parameters
Return Values
This function returns a value info.
If info = 0, the execution is successful.
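The forward permutation rule, where row k[i-1] of the input becomes row i of the result, can be sketched in plain C as follows. This is not how the library routine works internally: the hypothetical permute_rows_forward copies each column to a temporary buffer instead of permuting in place, and it uses int where the LAPACKE interface uses lapack_int.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Forward row permutation of a column-major m-by-n matrix x.
 * k holds a permutation of 1..m (1-based); row k[i] of the input ends up
 * in row i+1 of the output. Returns 0 on success, -1 if allocation fails.
 */
int permute_rows_forward(size_t m, size_t n, double *x, size_t ldx,
                         const int *k)
{
    assert(x != NULL && k != NULL && ldx >= m);
    if (m == 0 || n == 0)
        return 0;

    double *col = malloc(m * sizeof *col);
    if (col == NULL)
        return -1;

    for (size_t j = 0; j < n; ++j) {
        /* Save the original column, then gather its entries in the new order. */
        memcpy(col, x + j * ldx, m * sizeof *col);
        for (size_t i = 0; i < m; ++i)
            x[i + j * ldx] = col[k[i] - 1];
    }
    free(col);
    return 0;
}
```

With a single column (10, 20, 30) and k = (3, 1, 2), the result is (30, 10, 20).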
See Also
?lapmt
Performs a forward or backward permutation of the
columns of a matrix.
Syntax
lapack_int LAPACKE_slapmt (int matrix_layout, lapack_logical forwrd, lapack_int m,
lapack_int n, float * x, lapack_int ldx, lapack_int * k);
lapack_int LAPACKE_dlapmt (int matrix_layout, lapack_logical forwrd, lapack_int m,
lapack_int n, double * x, lapack_int ldx, lapack_int * k);
lapack_int LAPACKE_clapmt (int matrix_layout, lapack_logical forwrd, lapack_int m,
lapack_int n, lapack_complex_float * x, lapack_int ldx, lapack_int * k);
lapack_int LAPACKE_zlapmt (int matrix_layout, lapack_logical forwrd, lapack_int m,
lapack_int n, lapack_complex_double * x, lapack_int ldx, lapack_int * k);
Include Files
• mkl.h
Description
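The routine ?lapmt rearranges the columns of the m-by-n matrix X as specified by the permutation k[i - 1]
for i = 1,...,n.
If forwrd ≠ 0, forward permutation:
X(:,k[j-1]) is moved to X(:,j) for j = 1,2,...,n.
If forwrd = 0, backward permutation:
X(:,j) is moved to X(:,k[j-1]) for j = 1,2,...,n.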
Input Parameters
k Array, size (n). On entry, k contains the permutation vector and is used as
internal workspace.
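One well-known way to apply a permutation in place, and a plausible reason for k doubling as workspace, is cycle following with sign flags: each entry of k is negated to mark it as unvisited and restored once its column has been placed. The sketch below shows this technique for a forward column permutation in plain C. Whether the library works exactly this way is an assumption; the function name is hypothetical and the matrix is assumed to be column major.

```c
#include <assert.h>

/* In-place forward column permutation by cycle following (sketch only).
 * After the call, column j holds the old column k[j]-1. The entries of k
 * (a permutation of 1..n) are negated temporarily and restored on exit. */
void permute_cols_forward_inplace(int m, int n, double *x, int ldx, int *k)
{
    assert(m <= 0 || ldx >= m);

    for (int i = 0; i < n; ++i) {
        k[i] = -k[i];                      /* mark every position unvisited */
    }
    for (int i = 0; i < n; ++i) {
        if (k[i] > 0) {
            continue;                      /* already placed in an earlier cycle */
        }
        int j = i;
        k[j] = -k[j];                      /* restore and mark visited */
        int in = k[j] - 1;                 /* column that belongs at position j */
        while (k[in] <= 0) {
            for (int r = 0; r < m; ++r) {  /* swap columns j and in */
                double tmp = x[r + j * ldx];
                x[r + j * ldx] = x[r + in * ldx];
                x[r + in * ldx] = tmp;
            }
            k[in] = -k[in];
            j = in;
            in = k[in] - 1;
        }
    }
}
```

The approach needs no temporary copy of the matrix, at the cost of briefly modifying k.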
Output Parameters
See Also
?lapmr
?lapy2
Returns sqrt(x^2+y^2).
Syntax
float LAPACKE_slapy2 (float x, float y);
double LAPACKE_dlapy2 (double x, double y);
Include Files
• mkl.h
Description
The function ?lapy2 returns sqrt(x^2+y^2), avoiding unnecessary overflow or harmful underflow.
Input Parameters
A <datatype> placeholder, if present, is used for the C interface data types in the C interface section above.
See C Interface Conventions for the C interface principal conventions and type definitions.
Return Values
The function returns a value val.
?lapy3
Returns sqrt(x^2+y^2+z^2).
Syntax
float LAPACKE_slapy3 (float x, float y, float z);
double LAPACKE_dlapy3 (double x, double y, double z);
Include Files
• mkl.h
Description
The function ?lapy3 returns sqrt(x^2+y^2+z^2), avoiding unnecessary overflow or harmful underflow.
Input Parameters
A <datatype> placeholder, if present, is used for the C interface data types in the C interface section above.
See C Interface Conventions for the C interface principal conventions and type definitions.
Return Values
This function returns a value val.
?laran
Returns a random real number from a uniform
distribution.
Syntax
float slaran (lapack_int *iseed);
double dlaran (lapack_int *iseed);
Description
The ?laran routine returns a random real number from a uniform (0,1) distribution. This routine uses a
multiplicative congruential method with modulus 2^48 and multiplier 33952834046453. 48-bit integers are
stored in four integer array elements with 12 bits per element. Hence the routine is portable across machines
with integers of 32 bits or more.
Input Parameters
iseed Array, size 4. On entry, the seed of the random number generator. The
array elements must be between 0 and 4095, and iseed[3] must be odd.
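The generator described above can be sketched with 64-bit integer arithmetic. This is an illustration of the recurrence, not the library's code: the function name lcg48_next is hypothetical, and the sketch assumes that iseed[0] holds the most significant 12 bits and iseed[3] the least significant 12 bits (consistent with the requirement that iseed[3] be odd). Updating iseed in place is also an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* One step of the 48-bit multiplicative congruential generator
 * state <- (multiplier * state) mod 2^48, with the state stored as four
 * 12-bit digits. Returns state / 2^48, a value in (0,1). Sketch only. */
double lcg48_next(int iseed[4])
{
    const uint64_t multiplier = 33952834046453ULL;
    const uint64_t mask48 = (1ULL << 48) - 1;

    for (int i = 0; i < 4; ++i) {
        assert(iseed[i] >= 0 && iseed[i] <= 4095);
    }
    assert(iseed[3] % 2 == 1);

    /* Pack the four 12-bit digits into one 48-bit state. */
    uint64_t state = ((uint64_t)iseed[0] << 36) | ((uint64_t)iseed[1] << 24)
                   | ((uint64_t)iseed[2] << 12) | (uint64_t)iseed[3];

    /* Unsigned overflow wraps modulo 2^64, so masking afterwards still
     * gives the exact product modulo 2^48. */
    state = (state * multiplier) & mask48;

    /* Unpack the new state back into 12-bit digits. */
    iseed[0] = (int)((state >> 36) & 0xFFF);
    iseed[1] = (int)((state >> 24) & 0xFFF);
    iseed[2] = (int)((state >> 12) & 0xFFF);
    iseed[3] = (int)(state & 0xFFF);

    return (double)state / 281474976710656.0;   /* divide by 2^48 */
}
```

Starting from the seed {0, 0, 0, 1}, one step leaves the multiplier itself in the state, which in 12-bit digits is {494, 322, 2508, 2549}.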
Output Parameters
Return Values
The function returns a random number.
?larfb
Applies a block reflector or its transpose/conjugate-
transpose to a general rectangular matrix.
Syntax
lapack_int LAPACKE_slarfb (int matrix_layout , char side , char trans , char direct ,
char storev , lapack_int m , lapack_int n , lapack_int k , const float * v , lapack_int
ldv , const float * t , lapack_int ldt , float * c , lapack_int ldc );
lapack_int LAPACKE_dlarfb (int matrix_layout , char side , char trans , char direct ,
char storev , lapack_int m , lapack_int n , lapack_int k , const double * v ,
lapack_int ldv , const double * t , lapack_int ldt , double * c , lapack_int
ldc );
lapack_int LAPACKE_clarfb (int matrix_layout , char side , char trans , char
direct , char storev , lapack_int m , lapack_int n , lapack_int k , const
lapack_complex_float * v , lapack_int ldv , const lapack_complex_float * t , lapack_int
ldt , lapack_complex_float * c , lapack_int ldc );
lapack_int LAPACKE_zlarfb (int matrix_layout , char side , char trans , char direct ,
char storev , lapack_int m , lapack_int n , lapack_int k , const lapack_complex_double
* v , lapack_int ldv , const lapack_complex_double * t , lapack_int ldt ,
lapack_complex_double * c , lapack_int ldc );
Include Files
• mkl.h
Description
The real flavors of the routine ?larfb apply a real block reflector H or its transpose H^T to a real m-by-n
matrix C from either left or right.
The complex flavors of the routine ?larfb apply a complex block reflector H or its conjugate transpose H^H to
a complex m-by-n matrix C from either left or right.
Input Parameters
A <datatype> placeholder, if present, is used for the C interface data types in the C interface section above.
See C Interface Conventions for the C interface principal conventions and type definitions.
side If side = 'L': apply H or H^T for real flavors and H or H^H for complex
flavors from the left.
If side = 'R': apply H or H^T for real flavors and H or H^H for complex
flavors from the right.
storev Indicates how the vectors which define the elementary reflectors are
stored:
If storev = 'C': Column-wise
storev = C storev = R
ldv The leading dimension of the array v. It should satisfy the following
conditions:
storev = C storev = R
c Array, size at least max(1, ldc * n) for column major layout and max(1, ldc
* m) for row major layout.
On entry, the m-by-n matrix C.
Output Parameters
Return Values
This function returns a value info.
If info = 0, the execution is successful.
Application Notes
The shape of the matrix V and the storage of the vectors which define the H(i) is best illustrated by the
following example with n = 5 and k = 3. The elements equal to 1 are not stored; the corresponding array
elements are modified but restored on exit. The rest of the array is not used.
?larfg
Generates an elementary reflector (Householder
matrix).
Syntax
lapack_int LAPACKE_slarfg (lapack_int n , float * alpha , float * x , lapack_int incx ,
float * tau );
lapack_int LAPACKE_dlarfg (lapack_int n , double * alpha , double * x , lapack_int
incx , double * tau );
lapack_int LAPACKE_clarfg (lapack_int n , lapack_complex_float * alpha ,
lapack_complex_float * x , lapack_int incx , lapack_complex_float * tau );
lapack_int LAPACKE_zlarfg (lapack_int n , lapack_complex_double * alpha ,
lapack_complex_double * x , lapack_int incx , lapack_complex_double * tau );
Include Files
• mkl.h
Description
The routine ?larfg generates a real/complex elementary reflector H of order n, such that
H^H * (alpha; x) = (beta; 0), H^H * H = I,
where (a; b) denotes a stacked column vector, alpha and beta are scalars (with beta real for all flavors), and
x is an (n-1)-element real/complex vector. H is represented in the form
H = I - tau * (1; v) * (1 v^H),
where tau is a real/complex scalar and v is an (n-1)-element real/complex vector.
Input Parameters
A <datatype> placeholder, if present, is used for the C interface data types in the C interface section above.
See C Interface Conventions for the C interface principal conventions and type definitions.
alpha
x Array, size (1+(n-2)*abs(incx)).
On entry, the vector x.
Output Parameters
tau
Return Values
This function returns a value info.
If info = 0, the execution is successful.
?larft
Forms the triangular factor T of a block reflector H = I
- V*T*V**H.
Syntax
lapack_int LAPACKE_slarft (int matrix_layout , char direct , char storev , lapack_int
n , lapack_int k , const float * v , lapack_int ldv , const float * tau , float * t ,
lapack_int ldt );
lapack_int LAPACKE_dlarft (int matrix_layout , char direct , char storev , lapack_int
n , lapack_int k , const double * v , lapack_int ldv , const double * tau , double * t ,
lapack_int ldt );
lapack_int LAPACKE_clarft (int matrix_layout , char direct , char storev , lapack_int
n , lapack_int k , const lapack_complex_float * v , lapack_int ldv , const
lapack_complex_float * tau , lapack_complex_float * t , lapack_int ldt );
lapack_int LAPACKE_zlarft (int matrix_layout , char direct , char storev , lapack_int
n , lapack_int k , const lapack_complex_double * v , lapack_int ldv , const
lapack_complex_double * tau , lapack_complex_double * t , lapack_int ldt );
Include Files
• mkl.h
Description
The routine ?larft forms the triangular factor T of a real/complex block reflector H of order n, which is
defined as a product of k elementary reflectors.
If direct = 'F', H = H(1)*H(2)* . . .*H(k) and T is upper triangular;
If storev = 'C', the vector which defines the elementary reflector H(i) is stored in the i-th column of the
array v, and H = I - V*T*V^T (for real flavors) or H = I - V*T*V^H (for complex flavors).
If storev = 'R', the vector which defines the elementary reflector H(i) is stored in the i-th row of the array
v, and H = I - V^T*T*V (for real flavors) or H = I - V^H*T*V (for complex flavors).
Input Parameters
A <datatype> placeholder, if present, is used for the C interface data types in the C interface section above.
See C Interface Conventions for the C interface principal conventions and type definitions.
direct Specifies the order in which the elementary reflectors are multiplied to form
the block reflector:
= 'F': H = H(1)*H(2)*. . . *H(k) (forward)
storev Specifies how the vectors which define the elementary reflectors are stored
(see also Application Notes below):
= 'C': column-wise
= 'R': row-wise.
storev = C storev = R
tau Array, size (k). tau[i-1] must contain the scalar factor of the elementary
reflector H(i).
Output Parameters
t Array, size ldt * k. The k-by-k triangular factor T of the block reflector. If
direct = 'F', T is upper triangular; if direct = 'B', T is lower
triangular. The rest of the array is not used.
v The matrix V.
Application Notes
The shape of the matrix V and the storage of the vectors which define the H(i) is best illustrated by the
following example with n = 5 and k = 3. The elements equal to 1 are not stored; the corresponding array
elements are modified but restored on exit. The rest of the array is not used.
?larfx
Applies an elementary reflector to a general
rectangular matrix, with loop unrolling when the
reflector has order less than or equal to 10.
Syntax
lapack_int LAPACKE_slarfx (int matrix_layout , char side , lapack_int m , lapack_int
n , const float * v , float tau , float * c , lapack_int ldc , float * work );
lapack_int LAPACKE_dlarfx (int matrix_layout , char side , lapack_int m , lapack_int
n , const double * v , double tau , double * c , lapack_int ldc , double * work );
lapack_int LAPACKE_clarfx (int matrix_layout , char side , lapack_int m , lapack_int
n , const lapack_complex_float * v , lapack_complex_float tau , lapack_complex_float *
c , lapack_int ldc , lapack_complex_float * work );
lapack_int LAPACKE_zlarfx (int matrix_layout , char side , lapack_int m , lapack_int
n , const lapack_complex_double * v , lapack_complex_double tau , lapack_complex_double
* c , lapack_int ldc , lapack_complex_double * work );
Include Files
• mkl.h
Description
The routine ?larfx applies a real/complex elementary reflector H to a real/complex m-by-n matrix C, from
either the left or the right.
H is represented in the following forms:
• H = I - tau*v*v^T, where tau is a real scalar and v is a real vector.
• H = I - tau*v*v^H, where tau is a complex scalar and v is a complex vector.
If tau = 0, then H is taken to be the unit matrix.
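For the real case applied from the left, H*C = C - tau*v*(v^T*C), so the reflector never needs to be formed explicitly. The plain-C sketch below illustrates this; apply_reflector_left is a hypothetical name, the matrix is assumed to be column major, and the loop unrolling mentioned in the routine summary is deliberately left out.

```c
#include <assert.h>

/* Apply H = I - tau*v*v^T from the left to the column-major m-by-n
 * matrix c (leading dimension ldc), overwriting c with H*c.
 * v has m elements. Sketch only; no unrolling, real data only. */
void apply_reflector_left(int m, int n, const double *v, double tau,
                          double *c, int ldc)
{
    assert(m <= 0 || ldc >= m);

    if (tau == 0.0) {
        return;                            /* H is the unit matrix */
    }
    for (int j = 0; j < n; ++j) {
        double *col = c + (long)j * ldc;

        /* dot = v^T * C(:,j) */
        double dot = 0.0;
        for (int i = 0; i < m; ++i) {
            dot += v[i] * col[i];
        }
        /* C(:,j) <- C(:,j) - tau * dot * v */
        for (int i = 0; i < m; ++i) {
            col[i] -= tau * dot * v[i];
        }
    }
}
```

With v = (1, 0) and tau = 2, H is diag(-1, 1), so the call simply negates the first row of C.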
Input Parameters
A <datatype> placeholder, if present, is used for the C interface data types in the C interface section above.
See C Interface Conventions for the C interface principal conventions and type definitions.
v Array, size
(m) if side = 'L' or
c Array, size at least max(1, ldc*n) for column major layout and max (1,
ldc*m) for row major layout. On entry, the m-by-n matrix C.
Output Parameters
?large
Pre- and post-multiplies a real general matrix with a
random orthogonal matrix.
Syntax
void slarge (lapack_int *n, float *a, lapack_int *lda, lapack_int *iseed, float * work,
lapack_int *info);
void dlarge (lapack_int *n, double *a, lapack_int *lda, lapack_int *iseed, double *
work, lapack_int *info);
void clarge (lapack_int *n, lapack_complex *a, lapack_int *lda, lapack_int *iseed,
lapack_complex * work, lapack_int *info);
void zlarge (lapack_int *n, lapack_complex_double *a, lapack_int *lda, lapack_int
*iseed, lapack_complex_double * work, lapack_int *info);
Include Files
• mkl.h
Description
The routine ?large pre- and post-multiplies a general n-by-n matrix A with a random orthogonal or unitary
matrix: A = U*D*U^T.
Input Parameters
Output Parameters
?larnd
Returns a random real number from a uniform or
normal distribution.
Syntax
float slarnd (lapack_int *idist, lapack_int *iseed);
double dlarnd (lapack_int *idist, lapack_int *iseed);
The data types for complex variations depend on whether or not the application links with Gnu Fortran
(gfortran) libraries.
For non-gfortran (libmkl_intel_*) interface libraries:
void clarnd (lapack_complex_float *res, lapack_int *idist, lapack_int *iseed);
void zlarnd (lapack_complex_double *res, lapack_int *idist, lapack_int *iseed);
For gfortran (libmkl_gf_*) interface libraries:
lapack_complex_float clarnd (lapack_int *idist, lapack_int *iseed);
lapack_complex_double zlarnd (lapack_int *idist, lapack_int *iseed);
To understand the difference between the non-gfortran and gfortran interfaces and when to use each of
them, see Dynamic Libraries in the lib/intel64 Directory in the oneAPI Math Kernel Library Developer Guide.
Include Files
• mkl.h
Description
The routine ?larnd returns a random number from a uniform or normal distribution.
Input Parameters
idist Specifies the distribution of the random numbers. For slarnd and dlarnd:
= 1: uniform (0,1)
= 2: uniform (-1,1)
= 3: normal (0,1).
For clarnd and zlarnd:
Output Parameters
Return Values
The function returns a random number (for complex variations, the libmkl_intel_* interface libraries return
the result through the parameter res).
?larnv
Returns a vector of random numbers from a uniform
or normal distribution.
Syntax
lapack_int LAPACKE_slarnv (lapack_int idist , lapack_int * iseed , lapack_int n , float
* x );
lapack_int LAPACKE_dlarnv (lapack_int idist , lapack_int * iseed , lapack_int n ,
double * x );
lapack_int LAPACKE_clarnv (lapack_int idist , lapack_int * iseed , lapack_int n ,
lapack_complex_float * x );
lapack_int LAPACKE_zlarnv (lapack_int idist , lapack_int * iseed , lapack_int n ,
lapack_complex_double * x );
Include Files
• mkl.h
Description
The routine ?larnv returns a vector of n random real/complex numbers from a uniform or normal
distribution.
This routine calls the auxiliary routine ?laruv to generate random real numbers from a uniform (0,1)
distribution, in batches of up to 128 using vectorisable code. The Box-Muller method is used to transform
numbers from a uniform to a normal distribution.
Input Parameters
A <datatype> placeholder, if present, is used for the C interface data types in the C interface section above.
See C Interface Conventions for the C interface principal conventions and type definitions.
idist Specifies the distribution of the random numbers: for slarnv and dlarnv:
= 1: uniform (0,1)
= 2: uniform (-1,1)
= 3: normal (0,1).
for clarnv and zlarnv:
Output Parameters
Return Values
This function returns a value info.
If info = 0, the execution is successful.
?laror
Pre- or post-multiplies an m-by-n matrix by a random
orthogonal/unitary matrix.
Syntax
void slaror (char *side, char *init, lapack_int *m, lapack_int *n, float *a, lapack_int
*lda, lapack_int *iseed, float *x, lapack_int *info);
void dlaror (char *side, char *init, lapack_int *m, lapack_int *n, double *a, lapack_int
*lda, lapack_int *iseed, double *x, lapack_int *info);
void claror (char *side, char *init, lapack_int *m, lapack_int *n, lapack_complex *a,
lapack_int *lda, lapack_int *iseed, lapack_complex *x, lapack_int *info);
void zlaror (char *side, char *init, lapack_int *m, lapack_int *n,
lapack_complex_double *a, lapack_int *lda, lapack_int *iseed, lapack_complex_double *x,
lapack_int *info);
Include Files
• mkl.h
Description
The routine ?laror pre- or post-multiplies an m-by-n matrix A by a random orthogonal or unitary matrix U,
overwriting A. A may optionally be initialized to the identity matrix before multiplying by U. U is generated
using the method of G.W. Stewart (SIAM J. Numer. Anal. 17, 1980, 403-409).
Input Parameters
If side = 'C' or 'T', multiply A on the left by U and the right by U^T.
n The number of columns of A.
'L' 2*m + n
'R' 2*n + m
'C' or 'T' 3*n
Output Parameters
a On exit, overwritten
by UA ( if side = 'L' ),
by AU ( if side = 'R' ),
iseed The values of iseed are changed on exit, and can be used in the next call
to continue the same random number sequence.
?larot
Applies a Givens rotation to two adjacent rows or
columns.
Syntax
void slarot (lapack_logical *lrows, lapack_logical *ileft, lapack_logical *iright,
lapack_int *nl, float *c, float *s, float *a, lapack_int *lda, float *xleft, float
*xright);
void dlarot (lapack_logical *lrows, lapack_logical *ileft, lapack_logical *iright,
lapack_int *nl, double *c, double *s, double *a, lapack_int *lda, double *xleft, double
*xright);
void clarot (lapack_logical *lrows, lapack_logical *ileft, lapack_logical *iright,
lapack_int *nl, lapack_complex *c, lapack_complex *s, lapack_complex *a, lapack_int
*lda, lapack_complex *xleft, lapack_complex *xright);
void zlarot (lapack_logical *lrows, lapack_logical *ileft, lapack_logical *iright,
lapack_int *nl, lapack_complex_double *c, lapack_complex_double *s,
lapack_complex_double *a, lapack_int *lda, lapack_complex_double *xleft,
lapack_complex_double *xright);
Include Files
• mkl.h
Description
The routine ?larot applies a Givens rotation to two adjacent rows or columns, where one element of the
first or last column or row is stored in some format other than GE so that elements of the matrix may be
used or modified for which no array element is provided.
One example is a symmetric matrix in SB format (bandwidth = 4), for which uplo = 'L'. Two adjacent rows
will have the format:
Input Parameters
nl The length of the rows (if lrows=1) or columns (if lrows=0) to be rotated.
If xleft or xright are used, the columns or rows they are in should be
included in nl, e.g., if ileft = iright = 1, then nl must be at least 2.
a The array containing the rows or columns to be rotated. The first element of
a should be the upper left element to be rotated.
xleft If ileft = 1, xleft is used and modified instead of a[1] (if lrows = 1)
or a[lda + 1] (if lrows = 0).
Output Parameters
?lartgp
Generates a plane rotation.
Syntax
lapack_int LAPACKE_slartgp (float f, float g, float* cs, float* sn, float* r);
lapack_int LAPACKE_dlartgp (double f, double g, double* cs, double* sn, double* r);
Include Files
• mkl.h
Description
The routine generates a plane rotation so that
This is a slower, more accurate version of the BLAS Level 1 routine ?rotg, except for the following
differences:
Input Parameters
A <datatype> placeholder, if present, is used for the C interface data types in the C interface section above.
See C Interface Conventions for the C interface principal conventions and type definitions.
Output Parameters
r The nonzero component of the rotated vector.
Return Values
If info = 0, the execution is successful.
See Also
?rotg
?lartgs
?lartgs
Generates a plane rotation designed to introduce a
bulge in implicit QR iteration for the bidiagonal SVD
problem.
Syntax
lapack_int LAPACKE_slartgs (float x, float y, float sigma, float* cs, float* sn);
lapack_int LAPACKE_dlartgs (double x, double y, double sigma, double* cs, double* sn);
Include Files
• mkl.h
Description
The routine generates a plane rotation designed to introduce a bulge in Golub-Reinsch-style implicit QR
iteration for the bidiagonal SVD problem. x and y are the top-row entries, and sigma is the shift. The
computed cs and sn define a plane rotation that satisfies the following:
with r nonnegative.
If x^2 - sigma and x * y are 0, the rotation is by π/2.
Input Parameters
A <datatype> placeholder, if present, is used for the C interface data types in the C interface section above.
See C Interface Conventions for the C interface principal conventions and type definitions.
sigma Shift
Output Parameters
Return Values
If info = 0, the execution is successful.
If info = - 1, x is NaN.
If info = - 2, y is NaN.
If info = - 3, sigma is NaN.
See Also
?lartgp
?lascl
Multiplies a general rectangular matrix by a real scalar
defined as cto/cfrom.
Syntax
lapack_int LAPACKE_slascl (int matrix_layout, char type, lapack_int kl, lapack_int ku,
float cfrom, float cto, lapack_int m, lapack_int n, float * a, lapack_int lda);
lapack_int LAPACKE_dlascl (int matrix_layout, char type, lapack_int kl, lapack_int ku,
double cfrom, double cto, lapack_int m, lapack_int n, double * a, lapack_int lda);
lapack_int LAPACKE_clascl (int matrix_layout, char type, lapack_int kl, lapack_int ku,
float cfrom, float cto, lapack_int m, lapack_int n, lapack_complex_float * a,
lapack_int lda);
lapack_int LAPACKE_zlascl (int matrix_layout, char type, lapack_int kl, lapack_int ku,
double cfrom, double cto, lapack_int m, lapack_int n, lapack_complex_double * a,
lapack_int lda);
Include Files
• mkl.h
Description
The routine ?lascl multiplies the m-by-n real/complex matrix A by the real scalar cto/cfrom. The operation
is performed without over/underflow as long as the final result cto*A(i,j)/cfrom does not over/underflow.
type specifies that A may be full, upper triangular, lower triangular, upper Hessenberg, or banded.
Input Parameters
type This parameter specifies the storage type of the input matrix.
= 'G': A is a full matrix.
= 'B': A is a symmetric band matrix with lower bandwidth kl and upper
bandwidth ku and with only the lower half stored.
= 'Q': A is a symmetric band matrix with lower bandwidth kl and upper
bandwidth ku and with only the upper half stored.
= 'Z': A is a band matrix with lower bandwidth kl and upper bandwidth ku.
See description of the ?gbtrf function for storage details.
cfrom, cto The matrix A is multiplied by cto/cfrom. A(i,j) is computed without over/
underflow if the final result cto*A(i,j)/cfrom can be represented without
over/underflow. cfrom must be nonzero.
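The point of the condition above is that the ratio cto/cfrom may itself overflow or underflow even when every final product is representable. One way to handle this is to apply the scaling in several safe steps, pulling powers of the smallest normal number out of cfrom or cto until the remaining ratio is in range. The plain-C sketch below illustrates that idea for a flat array of doubles; scale_safely is a hypothetical name, finite inputs are assumed, and this is not presented as the library's implementation.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

static double abs_val(double v) { return v < 0.0 ? -v : v; }

/* Multiply a[0..count-1] by cto/cfrom in safe steps (sketch only).
 * Each pass multiplies by smlnum, bignum, or the final in-range ratio. */
void scale_safely(double cfrom, double cto, double *a, size_t count)
{
    const double smlnum = DBL_MIN;         /* smallest normal double */
    const double bignum = 1.0 / smlnum;
    int done = 0;

    assert(cfrom != 0.0);
    assert(abs_val(cfrom) <= DBL_MAX && abs_val(cto) <= DBL_MAX);  /* finite */

    while (!done) {
        double cfrom1 = cfrom * smlnum;
        double cto1 = cto / bignum;
        double mul;

        if (abs_val(cfrom1) > abs_val(cto) && cto != 0.0) {
            mul = smlnum;                  /* ratio too small: peel off smlnum */
            cfrom = cfrom1;
        } else if (abs_val(cto1) > abs_val(cfrom)) {
            mul = bignum;                  /* ratio too large: peel off bignum */
            cto = cto1;
        } else {
            mul = cto / cfrom;             /* remaining ratio is representable */
            done = 1;
        }
        for (size_t i = 0; i < count; ++i) {
            a[i] *= mul;
        }
    }
}
```

As an example, with cfrom = 1e300, cto = 1e-300 and an element equal to 1e308, the direct ratio 1e-600 underflows to zero in double precision, while the stepwise scaling yields a value close to 1e-292.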
a Array, size (lda*n). The matrix to be multiplied by cto/cfrom. See type for
the storage type.
Output Parameters
See Also
?gbtrf
?lasd0
Computes the singular values of a real upper
bidiagonal n-by-m matrix B with diagonal d and off-
diagonal e. Used by ?bdsdc.
Syntax
void slasd0( lapack_int *n, lapack_int *sqre, float *d, float *e, float *u, lapack_int
*ldu, float *vt, lapack_int *ldvt, lapack_int *smlsiz, lapack_int *iwork, float *work,
lapack_int *info );
void dlasd0( lapack_int *n, lapack_int *sqre, double *d, double *e, double *u,
lapack_int *ldu, double *vt, lapack_int *ldvt, lapack_int *smlsiz, lapack_int *iwork,
double *work, lapack_int *info );
Include Files
• mkl.h
Description
Using a divide and conquer approach, the routine ?lasd0 computes the singular value decomposition (SVD)
of a real upper bidiagonal n-by-m matrix B with diagonal d and off-diagonal e, where m = n + sqre.
The algorithm computes orthogonal matrices U and V^T such that B = U*S*V^T. The singular values S are
overwritten on d.
The related subroutine ?lasda computes only the singular values, and optionally, the singular vectors in
compact form.
Input Parameters
n On entry, the row dimension of the upper bidiagonal matrix. This is also the
dimension of the main diagonal array d.
Output Parameters
u Array, DIMENSION at least (ldu, n). On exit, u contains the left singular
vectors.
vt Array, DIMENSION at least (ldvt, m). On exit, vt^T contains the right singular
vectors.
?lasd1
Computes the SVD of an upper bidiagonal matrix B of
the specified size. Used by ?bdsdc.
Syntax
void slasd1( lapack_int *nl, lapack_int *nr, lapack_int *sqre, float *d, float *alpha,
float *beta, float *u, lapack_int *ldu, float *vt, lapack_int *ldvt, lapack_int *idxq,
lapack_int *iwork, float *work, lapack_int *info );
void dlasd1( lapack_int *nl, lapack_int *nr, lapack_int *sqre, double *d, double *alpha,
double *beta, double *u, lapack_int *ldu, double *vt, lapack_int *ldvt, lapack_int
*idxq, lapack_int *iwork, double *work, lapack_int *info );
Include Files
• mkl.h
Description
The routine computes the SVD of an upper bidiagonal n-by-m matrix B, where n = nl + nr + 1 and m = n
+ sqre.
The routine ?lasd1 is called from ?lasd0.
A related subroutine ?lasd7 handles the case in which the singular values (and the singular vectors in
factored form) are desired.
?lasd1 computes the SVD as follows:
= U(out)*(D(out) 0)*VT(out)
where Z^T = (Z1^T a Z2^T b) = u^T*VT^T, and u is a vector of dimension m with alpha and beta in the (nl+1)-th
and (nl+2)-th entries and zeros elsewhere; and the entry b is empty if sqre = 0.
The left singular vectors of the original matrix are stored in u, and the transpose of the right singular vectors
are stored in vt, and the singular values are in d. The algorithm consists of three stages:
1. The first stage consists of deflating the size of the problem when there are multiple singular values or
when there are zeros in the Z vector. For each such occurrence the dimension of the secular equation
problem is reduced by one. This stage is performed by the routine ?lasd2.
2. The second stage consists of calculating the updated singular values. This is done by finding the square
roots of the roots of the secular equation via the routine ?lasd4 (as called by ?lasd3). This routine
also calculates the singular vectors of the current problem.
3. The final stage consists of computing the updated singular vectors directly using the updated singular
values. The singular vectors for the current problem are multiplied with the singular vectors from the
overall problem.
Input Parameters
alpha Contains the diagonal element associated with the added row.
beta Contains the off-diagonal element associated with the added row.
u Array, DIMENSION (ldu, n). On entry u(1:nl, 1:nl) contains the left
singular vectors of the upper block; u(nl+2:n, nl+2:n) contains the left
singular vectors of the lower block.
Output Parameters
alpha On exit, the diagonal element associated with the added row deflated by
max( abs( alpha ), abs( beta ), abs( D(I) ) ), I = 1,n.
beta On exit, the off-diagonal element associated with the added row deflated by
max( abs( alpha ), abs( beta ), abs( D(I) ) ), I = 1,n.
vt On exit, vt^T contains the right singular vectors of the bidiagonal matrix.
idxq Array, DIMENSION (n). Contains the permutation which will reintegrate the
subproblem just solved back into sorted order, that is, d(idxq(i)), i = 1,...,n,
will be in ascending order.
If info = 1, a singular value did not converge.
?lasd2
Merges the two sets of singular values together into a
single sorted set. Used by ?bdsdc.
Syntax
void slasd2( lapack_int *nl, lapack_int *nr, lapack_int *sqre, lapack_int *k, float *d,
float *z, float *alpha, float *beta, float *u, lapack_int *ldu, float *vt, lapack_int
*ldvt, float *dsigma, float *u2, lapack_int *ldu2, float *vt2, lapack_int *ldvt2,
lapack_int *idxp, lapack_int *idx, lapack_int *idxq, lapack_int *coltyp, lapack_int
*info );
void dlasd2( lapack_int *nl, lapack_int *nr, lapack_int *sqre, lapack_int *k, double *d,
double *z, double *alpha, double *beta, double *u, lapack_int *ldu, double *vt,
lapack_int *ldvt, double *dsigma, double *u2, lapack_int *ldu2, double *vt2, lapack_int
*ldvt2, lapack_int *idxp, lapack_int *idx, lapack_int *idxq, lapack_int *coltyp,
lapack_int *info );
Include Files
• mkl.h
Description
The routine ?lasd2 merges the two sets of singular values together into a single sorted set. Then it tries to
deflate the size of the problem. There are two ways in which deflation can occur: when two or more singular
values are close together or if there is a tiny entry in the Z vector. For each such occurrence the order of the
related secular equation problem is reduced by one.
The routine ?lasd2 is called from ?lasd1.
Input Parameters
d Array, DIMENSION (n). On entry d contains the singular values of the two
submatrices to be combined.
alpha Contains the diagonal element associated with the added row.
beta Contains the off-diagonal element associated with the added row.
u Array, DIMENSION (ldu, n). On entry u contains the left singular vectors of
two submatrices in the two square blocks with corners at (1,1), (nl, nl), and
(nl+2, nl+2), (n,n).
vt Array, DIMENSION (ldvt, m). On entry, vt^T contains the right singular
vectors of two submatrices in the two square blocks with corners at (1,1),
(nl+1, nl+1), and (nl+2, nl+2), (m, m).
idxp Workspace array, DIMENSION (n). This will contain the permutation used to
place deflated values of D at the end of the array. On output idxp(2:k)
points to the nondeflated d-values and idxp(k+1:n) points to the deflated
singular values.
idx Workspace array, DIMENSION (n). This will contain the permutation used to
sort the contents of d into ascending order.
coltyp Workspace array, DIMENSION (n). As workspace, this array contains a label
that indicates which of the following types a column in the u2 matrix or a
row in the vt2 matrix is:
1 : non-zero in the upper half only
2 : non-zero in the lower half only
3 : dense
4 : deflated.
idxq Array, DIMENSION (n). This parameter contains the permutation that
separately sorts the two sub-problems in D in the ascending order. Note
that entries in the first half of this permutation must first be moved one
position backwards and entries in the second half must have nl+1 added to
their values.
Output Parameters
k Contains the dimension of the non-deflated matrix. This is the order of the
related secular equation. 1 ≤ k ≤ n.
d On exit D contains the trailing (n-k) updated singular values (those which
were deflated) sorted into increasing order.
u On exit u contains the trailing (n-k) updated left singular vectors (those
which were deflated) in its last n-k columns.
z Array, DIMENSION (n). On exit, z contains the updating row vector in the
secular equation.
dsigma Array, DIMENSION (n). Contains a copy of the diagonal elements (k-1
singular values and one zero) in the secular equation.
u2 Array, DIMENSION (ldu2, n). Contains a copy of the first k-1 left singular
vectors which will be used by ?lasd3 in a matrix multiply (?gemm) to solve
for the new left singular vectors. u2 is arranged into four blocks. The first
block contains a column with 1 at nl+1 and zero everywhere else; the
second block contains non-zero entries only at and above nl; the third
contains non-zero entries only below nl+1; and the fourth is dense.
vt On exit, vt^T contains the trailing (n-k) updated right singular vectors (those
which were deflated) in its last n-k columns. In case sqre =1, the last row
of vt spans the right null space.
vt2 Array, DIMENSION (ldvt2, n). vt2^T contains a copy of the first k right
singular vectors which will be used by ?lasd3 in a matrix multiply (?gemm)
to solve for the new right singular vectors. vt2 is arranged into three blocks.
The first block contains a row that corresponds to the special 0 diagonal
element in sigma; the second block contains non-zeros only at and before
nl +1; the third block contains non-zeros only at and after nl +2.
idxc Array, DIMENSION (n). This will contain the permutation used to arrange the
columns of the deflated u matrix into three groups: the first group contains
non-zero entries only at and above nl, the second contains non-zero entries
only below nl+2, and the third is dense.
?lasd3
Finds all square roots of the roots of the secular
equation, as defined by the values in D and Z, and
then updates the singular vectors by matrix
multiplication. Used by ?bdsdc.
Syntax
void slasd3( lapack_int *nl, lapack_int *nr, lapack_int *sqre, lapack_int *k, float *d,
float *q, lapack_int *ldq, float *dsigma, float *u, lapack_int *ldu, float *u2,
lapack_int *ldu2, float *vt, lapack_int *ldvt, float *vt2, lapack_int *ldvt2,
lapack_int *idxc, lapack_int *ctot, float *z, lapack_int *info );
void dlasd3( lapack_int *nl, lapack_int *nr, lapack_int *sqre, lapack_int *k, double *d,
double *q, lapack_int *ldq, double *dsigma, double *u, lapack_int *ldu, double *u2,
lapack_int *ldu2, double *vt, lapack_int *ldvt, double *vt2, lapack_int *ldvt2,
lapack_int *idxc, lapack_int *ctot, double *z, lapack_int *info );
Include Files
• mkl.h
Description
The routine ?lasd3 finds all the square roots of the roots of the secular equation, as defined by the values in
D and Z.
It makes the appropriate calls to ?lasd4 and then updates the singular vectors by matrix multiplication.
Input Parameters
dsigma Array, DIMENSION (k). The first k elements of this array contain the old
roots of the deflated updating problem. These are the poles of the secular
equation.
u2 The first k columns of this matrix contain the non-deflated left singular
vectors for the split problem.
vt2 The first k columns of vt2' contain the non-deflated right singular vectors
for the split problem.
idxc The permutation used to arrange the columns of u (and rows of vt) into
three groups: the first group contains non-zero entries only at and above
(or before) nl+1; the second contains non-zero entries only at and below
(or after) nl+2; and the third is dense. The first column of u and the row of
vt are treated separately, however. The rows of the singular vectors found
by ?lasd4 must be likewise permuted before the matrix multiplies can take
place.
ctot Array, DIMENSION (4). A count of the total number of the various types of
columns in u (or rows in vt), as described in idxc.
The fourth column type is any column which has been deflated.
z Array, DIMENSION (k). The first k elements of this array contain the
components of the deflation-adjusted updating row vector.
Output Parameters
d Array, DIMENSION (k). On exit the square roots of the roots of the secular
equation, in ascending order.
u The last n - k columns of this matrix contain the deflated left singular
vectors.
vt The last m - k columns of vt' contain the deflated right singular vectors.
z Destroyed on exit.
Application Notes
This code makes very mild assumptions about floating point arithmetic. It will work on machines with a guard
digit in add/subtract, or on those binary machines without guard digits which subtract like the Cray XMP, Cray
YMP, Cray C 90, or Cray 2. It could conceivably fail on hexadecimal or decimal machines without guard digits,
but we know of none.
?lasd4
Computes the square root of the i-th updated
eigenvalue of a positive symmetric rank-one
modification to a positive diagonal matrix. Used
by ?bdsdc.
Syntax
void slasd4( lapack_int *n, lapack_int *i, float *d, float *z, float *delta, float *rho,
float *sigma, float *work, lapack_int *info);
void dlasd4( lapack_int *n, lapack_int *i, double *d, double *z, double *delta, double
*rho, double *sigma, double *work, lapack_int *info);
Include Files
• mkl.h
Description
The routine computes the square root of the i-th updated eigenvalue of a positive symmetric rank-one
modification to a positive diagonal matrix whose entries are given as the squares of the corresponding
entries in the array d. It is assumed that 0 ≤ d(i) < d(j) for i < j and that rho > 0. This is arranged by the
calling routine, and is no loss in generality. The rank-one modified system is thus
diag(d)*diag(d) + rho*Z*Z^T,
where the Euclidean norm of Z is equal to 1. The method consists of approximating the rational functions in
the secular equation by simpler interpolating rational functions.
Input Parameters
d The original eigenvalues. They must be in order, 0 ≤ d(i) < d(j) for i < j.
If n = 1, then work( 1 ) = 1.
Output Parameters
?lasd5
Computes the square root of the i-th eigenvalue of a
positive symmetric rank-one modification of a 2-by-2
diagonal matrix. Used by ?bdsdc.
Syntax
void slasd5( lapack_int *i, float *d, float *z, float *delta, float *rho, float *dsigma,
float *work );
void dlasd5( lapack_int *i, double *d, double *z, double *delta, double *rho, double
*dsigma, double *work );
Include Files
• mkl.h
Description
The routine computes the square root of the i-th eigenvalue of a positive symmetric rank-one modification of
a 2-by-2 diagonal matrix diag(d)*diag(d) + rho*Z*Z^T.
The diagonal entries in the array d must satisfy 0 ≤ d(i) < d(j) for i < j, rho must be greater than 0, and
the Euclidean norm of the vector Z must be equal to 1.
Input Parameters
d Array, dimension (2 ).
z Array, dimension ( 2 ).
Output Parameters
?lasd6
Computes the SVD of an updated upper bidiagonal
matrix obtained by merging two smaller ones by
appending a row. Used by ?bdsdc.
Syntax
void slasd6( lapack_int *icompq, lapack_int *nl, lapack_int *nr, lapack_int *sqre,
float *d, float *vf, float *vl, float *alpha, float *beta, lapack_int *idxq, lapack_int
*perm, lapack_int *givptr, lapack_int *givcol, lapack_int *ldgcol, float *givnum,
lapack_int *ldgnum, float *poles, float *difl, float *difr, float *z, lapack_int *k,
float *c, float *s, float *work, lapack_int *iwork, lapack_int *info );
void dlasd6( lapack_int *icompq, lapack_int *nl, lapack_int *nr, lapack_int *sqre,
double *d, double *vf, double *vl, double *alpha, double *beta, lapack_int *idxq,
lapack_int *perm, lapack_int *givptr, lapack_int *givcol, lapack_int *ldgcol, double
*givnum, lapack_int *ldgnum, double *poles, double *difl, double *difr, double *z,
lapack_int *k, double *c, double *s, double *work, lapack_int *iwork, lapack_int
*info );
Include Files
• mkl.h
Description
The routine ?lasd6 computes the SVD of an updated upper bidiagonal matrix B obtained by merging two
smaller ones by appending a row. This routine is used only for the problem which requires all singular values
and optionally singular vector matrices in factored form. B is an n-by-m matrix with n = nl + nr + 1 and m
= n + sqre. A related subroutine, ?lasd1, handles the case in which all singular values and singular vectors
of the bidiagonal matrix are desired. ?lasd6 computes the SVD as follows:
             ( D1(in)   0    0        0 )
B = U(in)  * ( Z1'      a    Z2'      b ) * VT(in)
             ( 0        0    D2(in)   0 )
  = U(out) * ( D(out) 0 ) * VT(out)
where Z' = (Z1' a Z2' b) = u'*VT', and u is a vector of dimension m with alpha and beta in the (nl+1)-th
and (nl+2)-th entries and zeros elsewhere; the entry b is empty if sqre = 0.
The singular values of B can be computed using D1, D2, the first components of all the right singular vectors
of the lower block, and the last components of all the right singular vectors of the upper block. These
components are stored and updated in vf and vl, respectively, in ?lasd6. Hence U and VT are not explicitly
referenced.
The singular values are stored in D. The algorithm consists of two stages:
1. The first stage consists of deflating the size of the problem when there are multiple singular values or if
there is a zero in the Z vector. For each such occurrence the dimension of the secular equation problem
is reduced by one. This stage is performed by the routine ?lasd7.
2. The second stage consists of calculating the updated singular values. This is done by finding the roots
of the secular equation via the routine ?lasd4 (as called by ?lasd8). This routine also updates vf and
vl and computes the distances between the updated singular values and the old singular
values. ?lasd6 is called from ?lasda.
Input Parameters
nr≥ 1.
vf Array, dimension ( m ).
vl Array, dimension ( m ).
alpha Contains the diagonal element associated with the added row.
beta Contains the off-diagonal element associated with the added row.
ldgcol The leading dimension of the output array givcol, must be at least n.
ldgnum The leading dimension of the output arrays givnum and poles, must be at
least n.
Output Parameters
vf On exit, vf contains the first components of all right singular vectors of the
bidiagonal matrix.
vl On exit, vl contains the last components of all right singular vectors of the
bidiagonal matrix.
alpha On exit, the diagonal element associated with the added row deflated by
max(abs(alpha), abs(beta), abs(D(I))), I = 1,n.
beta On exit, the off-diagonal element associated with the added row deflated by
max(abs(alpha), abs(beta), abs(D(I))), I = 1,n.
idxq Array, dimension (n). This contains the permutation which will reintegrate
the subproblem just solved back into sorted order, that is, d( idxq( i =
1, n ) ) will be in ascending order.
perm Array, dimension (n). The permutations (from deflation and sorting) to be
applied to each block. Not referenced if icompq = 0.
givptr The number of Givens rotations which took place in this subproblem. Not
referenced if icompq = 0.
difl Array, dimension (n). On exit, difl(i) is the distance between the i-th updated
(undeflated) singular value and the i-th (undeflated) old singular value.
z Array, dimension ( m ).
The first elements of this array contain the components of the deflation-
adjusted updating row vector.
k Contains the dimension of the non-deflated matrix. This is the order of the
related secular equation. 1 ≤ k ≤ n.
?lasd7
Merges the two sets of singular values together into a
single sorted set. Then it tries to deflate the size of
the problem. Used by ?bdsdc.
Syntax
void slasd7( lapack_int *icompq, lapack_int *nl, lapack_int *nr, lapack_int *sqre,
lapack_int *k, float *d, float *z, float *zw, float *vf, float *vfw, float *vl, float
*vlw, float *alpha, float *beta, float *dsigma, lapack_int *idx, lapack_int *idxp,
lapack_int *idxq, lapack_int *perm, lapack_int *givptr, lapack_int *givcol, lapack_int
*ldgcol, float *givnum, lapack_int *ldgnum, float *c, float *s, lapack_int *info );
void dlasd7( lapack_int *icompq, lapack_int *nl, lapack_int *nr, lapack_int *sqre,
lapack_int *k, double *d, double *z, double *zw, double *vf, double *vfw, double *vl,
double *vlw, double *alpha, double *beta, double *dsigma, lapack_int *idx, lapack_int
*idxp, lapack_int *idxq, lapack_int *perm, lapack_int *givptr, lapack_int *givcol,
lapack_int *ldgcol, double *givnum, lapack_int *ldgnum, double *c, double *s,
lapack_int *info );
Include Files
• mkl.h
Description
The routine ?lasd7 merges the two sets of singular values together into a single sorted set. Then it tries to
deflate the size of the problem. There are two ways in which deflation can occur: when two or more singular
values are close together or if there is a tiny entry in the Z vector. For each such occurrence the order of the
related secular equation problem is reduced by one. ?lasd7 is called from ?lasd6.
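The merge step alone can be sketched as a two-way merge that produces a sorting permutation rather than moving data. The sketch assumes the two sets are stored back to back in one array, each already in ascending order; it leaves out the deflation tests and the Givens rotations that ?lasd7 also performs, and merge_sorted_runs is a hypothetical helper name.

```c
#include <assert.h>

/* Hypothetical helper (not an MKL routine): d[0..n1-1] and d[n1..n1+n2-1]
 * are each sorted in ascending order. On return, index[] holds 0-based
 * positions such that d[index[0]] <= d[index[1]] <= ... */
void merge_sorted_runs(int n1, int n2, const double *d, int *index)
{
    assert(n1 >= 0 && n2 >= 0);

    int i = 0;        /* cursor in the first run  */
    int j = n1;       /* cursor in the second run */
    int k = 0;        /* next free slot in index  */

    while (i < n1 && j < n1 + n2)
        index[k++] = (d[i] <= d[j]) ? i++ : j++;
    while (i < n1)
        index[k++] = i++;
    while (j < n1 + n2)
        index[k++] = j++;
}
```

Returning a permutation keeps the original values in place, so the same index array can later be applied to companion vectors such as z, vf, and vl.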
Input Parameters
d Array, DIMENSION (n). On entry d contains the singular values of the two
submatrices to be combined.
zw Array, DIMENSION ( m ).
Workspace for z.
vl Array, DIMENSION ( m ).
beta Contains the off-diagonal element associated with the added row.
idx Workspace array, DIMENSION (n). This will contain the permutation used to
sort the contents of d into ascending order.
idxp Workspace array, DIMENSION (n). This will contain the permutation used to
place deflated values of d at the end of the array.
idxq Array, DIMENSION (n). This contains the permutation which separately sorts the
two sub-problems in d into ascending order. Note that entries in the first half of this
permutation must first be moved one position backward, and entries in the
second half must first have nl+1 added to their values.
ldgcol The leading dimension of the output array givcol, must be at least n.
ldgnum The leading dimension of the output array givnum, must be at least n.
Output Parameters
k Contains the dimension of the non-deflated matrix; this is the order of the
related secular equation.
1 ≤ k ≤ n.
d On exit, d contains the trailing (n-k) updated singular values (those which
were deflated) sorted into increasing order.
vf On exit, vf contains the first components of all right singular vectors of the
bidiagonal matrix.
vl On exit, vl contains the last components of all right singular vectors of the
bidiagonal matrix.
dsigma Array, DIMENSION (n). Contains a copy of the diagonal elements (k-1
singular values and one zero) in the secular equation.
idxp On output, idxp(2: k) points to the nondeflated d-values and idxp( k+1:n)
points to the deflated singular values.
givptr The number of Givens rotations which took place in this subproblem. Not
referenced if icompq = 0.
?lasd8
Finds the square roots of the roots of the secular
equation, and stores, for each element in D, the
distance to its two nearest poles. Used by ?bdsdc.
Syntax
void slasd8( lapack_int *icompq, lapack_int *k, float *d, float *z, float *vf, float
*vl, float *difl, float *difr, lapack_int *lddifr, float *dsigma, float *work,
lapack_int *info );
void dlasd8( lapack_int *icompq, lapack_int *k, double *d, double *z, double *vf, double
*vl, double *difl, double *difr, lapack_int *lddifr, double *dsigma, double *work,
lapack_int *info );
Include Files
• mkl.h
Description
The routine ?lasd8 finds the square roots of the roots of the secular equation, as defined by the values in
dsigma and z. It makes the appropriate calls to ?lasd4, and stores, for each element in d, the distance to its
two nearest poles (elements in dsigma). It also updates the arrays vf and vl, the first and last components of
all the right singular vectors of the original bidiagonal matrix. ?lasd8 is called from ?lasd6.
Input Parameters
z Array, DIMENSION ( k ).
The first k elements of this array contain the components of the deflation-
adjusted updating row vector.
vf Array, DIMENSION ( k ).
lddifr The leading dimension of the output array difr, must be at least k.
dsigma The first k elements of this array contain the old roots of the deflated
updating problem. These are the poles of the secular equation.
Output Parameters
d Array, DIMENSION ( k ).
z Updated on exit.
difr Array,
DIMENSION ( lddifr, 2 ) if icompq = 1 and
DIMENSION ( k ) if icompq = 0.
On exit, difr(i,1) = d(i) - dsigma(i+1), difr(k,1) is not defined
and will not be referenced. If icompq = 1, difr(1:k,2) is an array
containing the normalizing factors for the right singular vector matrix.
dsigma The elements of this array may be very slightly altered in value.
> 0: If info = 1, a singular value did not converge.
?lasd9
Finds the square roots of the roots of the secular
equation, and stores, for each element in D, the
distance to its two nearest poles. Used by ?bdsdc.
Syntax
void slasd9( lapack_int *icompq, lapack_int *k, float *d, float *z, float *vf, float
*vl, float *difl, float *difr, float *dsigma, float *work, lapack_int *info );
void dlasd9( lapack_int *icompq, lapack_int *k, double *d, double *z, double *vf, double
*vl, double *difl, double *difr, double *dsigma, double *work, lapack_int *info );
Include Files
• mkl.h
Description
The routine ?lasd9 finds the square roots of the roots of the secular equation, as defined by the values in
dsigma and z. It makes the appropriate calls to ?lasd4, and stores, for each element in d, the distance to its
two nearest poles (elements in dsigma). It also updates the arrays vf and vl, the first and last components of
all the right singular vectors of the original bidiagonal matrix. ?lasd9 is called from ?lasd7.
Input Parameters
dsigma The first k elements of this array contain the old roots of the deflated
updating problem. These are the poles of the secular equation.
z Array, DIMENSION (k). The first k elements of this array contain the
components of the deflation-adjusted updating row vector.
Output Parameters
difr Array,
DIMENSION (ldu, 2) if icompq =1 and
DIMENSION (k) if icompq = 0.
On exit, difr(i, 1) = d(i) - dsigma(i+1), difr(k, 1) is not defined
and will not be referenced.
If icompq = 1, difr(1:k, 2) is an array containing the normalizing
factors for the right singular vector matrix.
?lasda
Computes the singular value decomposition (SVD) of a
real upper bidiagonal matrix with diagonal d and off-
diagonal e. Used by ?bdsdc.
Syntax
void slasda( lapack_int *icompq, lapack_int *smlsiz, lapack_int *n, lapack_int *sqre,
float *d, float *e, float *u, lapack_int *ldu, float *vt, lapack_int *k, float *difl,
float *difr, float *z, float *poles, lapack_int *givptr, lapack_int *givcol, lapack_int
*ldgcol, lapack_int *perm, float *givnum, float *c, float *s, float *work, lapack_int
*iwork, lapack_int *info );
void dlasda( lapack_int *icompq, lapack_int *smlsiz, lapack_int *n, lapack_int *sqre,
double *d, double *e, double *u, lapack_int *ldu, double *vt, lapack_int *k, double
*difl, double *difr, double *z, double *poles, lapack_int *givptr, lapack_int *givcol,
lapack_int *ldgcol, lapack_int *perm, double *givnum, double *c, double *s, double
*work, lapack_int *iwork, lapack_int *info );
Include Files
• mkl.h
Description
Using a divide and conquer approach, ?lasda computes the singular value decomposition (SVD) of a real
upper bidiagonal n-by-m matrix B with diagonal d and off-diagonal e, where m = n + sqre.
The algorithm computes the singular values in the SVD B = U*S*VT. The orthogonal matrices U and VT are
optionally computed in compact form. A related subroutine ?lasd0 computes the singular values and the
singular vectors in explicit form.
Input Parameters
smlsiz The maximum size of the subproblems at the bottom of the computation
tree.
n The row dimension of the upper bidiagonal matrix. This is also the
dimension of the main diagonal array d.
ldu The leading dimension of arrays u, vt, difl, difr, poles, givnum, and z.
ldu ≥ n.
Output Parameters
difr Array,
DIMENSION (ldu, 2*nlvl) if icompq = 1 and
DIMENSION (n) if icompq = 0.
If icompq = 1, on exit, difl(1:n, i) and difr(1:n, 2*i - 1) record distances
between singular values on the i-th level and singular values on the (i-1)-th
level, and difr(1:n, 2*i) contains the normalizing factors for the right
singular vector matrix. See ?lasd8 for details.
z Array,
DIMENSION ( ldu, nlvl ) if icompq = 1 and
DIMENSION (n) if icompq = 0. The first k elements of z(1, i) contain the
components of the deflation-adjusted updating row vector for subproblems
on the i-th level.
c Array,
DIMENSION (n) if icompq = 1, and
DIMENSION (1) if icompq = 0.
If icompq = 1 and the i-th subproblem is not square, on exit, c(i) contains
the C-value of a Givens rotation related to the right null space of the i-th
subproblem.
s Array,
DIMENSION (n) if icompq = 1, and
DIMENSION (1) if icompq = 0.
If icompq = 1 and the i-th subproblem is not square, on exit, s(i) contains
the S-value of a Givens rotation related to the right null space of the i-th
subproblem.
?lasdq
Computes the SVD of a real bidiagonal matrix with
diagonal d and off-diagonal e. Used by ?bdsdc.
Syntax
void slasdq( char *uplo, lapack_int *sqre, lapack_int *n, lapack_int *ncvt, lapack_int
*nru, lapack_int *ncc, float *d, float *e, float *vt, lapack_int *ldvt, float *u,
lapack_int *ldu, float *c, lapack_int *ldc, float *work, lapack_int *info );
void dlasdq( char *uplo, lapack_int *sqre, lapack_int *n, lapack_int *ncvt, lapack_int
*nru, lapack_int *ncc, double *d, double *e, double *vt, lapack_int *ldvt, double *u,
lapack_int *ldu, double *c, lapack_int *ldc, double *work, lapack_int *info );
Include Files
• mkl.h
Description
The routine ?lasdq computes the singular value decomposition (SVD) of a real (upper or lower) bidiagonal
matrix with diagonal d and off-diagonal e, accumulating the transformations if desired. If B is the input
bidiagonal matrix, the algorithm computes orthogonal matrices Q and P such that B = Q*S*P^T. The singular
values S are overwritten on d.
The input matrix U is changed to U*Q if desired.
Input Parameters
uplo On entry, uplo specifies whether the input bidiagonal matrix is upper or
lower bidiagonal.
If uplo = 'U' or 'u', B is upper bidiagonal;
n On entry, n specifies the number of rows and columns in the matrix. n must
be at least 0.
ncvt On entry, ncvt specifies the number of columns of the matrix VT. ncvt must
be at least 0.
nru On entry, nru specifies the number of rows of the matrix U. nru must be at
least 0.
ncc On entry, ncc specifies the number of columns of the matrix C. ncc must be
at least 0.
ldvt On entry, ldvt specifies the leading dimension of vt as declared in the calling
(sub) program. ldvt must be at least 1. If ncvt is nonzero, ldvt must also be
at least n.
u Array, DIMENSION (ldu, n). On entry, contains a matrix which on exit has
been postmultiplied by Q, dimension nru-by-n if sqre = 0 and nru-by-(n
+1) if sqre = 1 (not referenced if nru=0).
ldu On entry, ldu specifies the leading dimension of u as declared in the calling
(sub) program. ldu must be at least max(1, nru ) .
ldc On entry, ldc specifies the leading dimension of C as declared in the calling
(sub) program. ldc must be at least 1. If ncc is non-zero, ldc must also be
at least n.
work Array, DIMENSION (4n). This is a workspace array. Only referenced if one of
ncvt, nru, or ncc is nonzero, and if n is at least 2.
Output Parameters
e On normal exit, e will contain 0. If the algorithm does not converge, d and e
will contain the diagonal and superdiagonal entries of a bidiagonal matrix
orthogonally equivalent to the one given as input.
info On exit, a value of 0 indicates a successful exit. If info < 0, argument
number -info is illegal. If info > 0, the algorithm did not converge, and
info specifies how many superdiagonals did not converge.
?lasdt
Creates a tree of subproblems for bidiagonal divide
and conquer. Used by ?bdsdc.
Syntax
void slasdt( lapack_int *n, lapack_int *lvl, lapack_int *nd, lapack_int *inode,
lapack_int *ndiml, lapack_int *ndimr, lapack_int *msub );
void dlasdt( lapack_int *n, lapack_int *lvl, lapack_int *nd, lapack_int *inode,
lapack_int *ndiml, lapack_int *ndimr, lapack_int *msub );
Include Files
• mkl.h
Description
The routine creates a tree of subproblems for bidiagonal divide and conquer.
Input Parameters
msub On entry, the maximum row dimension that each subproblem at the bottom of
the tree can have.
Output Parameters
?laset
Initializes the off-diagonal elements and the diagonal
elements of a matrix to given values.
Syntax
lapack_int LAPACKE_slaset (int matrix_layout , char uplo , lapack_int m , lapack_int
n , float alpha , float beta , float * a , lapack_int lda );
lapack_int LAPACKE_dlaset (int matrix_layout , char uplo , lapack_int m , lapack_int
n , double alpha , double beta , double * a , lapack_int lda );
Include Files
• mkl.h
Description
The routine initializes an m-by-n matrix A to beta on the diagonal and alpha on the off-diagonals.
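The effect of the routine can be illustrated with a plain-C reference loop for a row-major matrix. This is a sketch of the semantics only, not the library implementation: it assumes the usual LAPACK meaning of uplo ('U' touches only the strictly upper triangle, 'L' only the strictly lower triangle, any other value touches all off-diagonal elements), and laset_rowmajor is a hypothetical helper name.

```c
#include <assert.h>

/* Hypothetical reference helper (not an MKL routine): sets the diagonal of
 * a row-major m-by-n matrix to beta and the selected off-diagonal part to
 * alpha. Elements outside the selected triangle are left unchanged. */
void laset_rowmajor(char uplo, int m, int n, double alpha, double beta,
                    double *a, int lda)
{
    assert(m >= 0 && n >= 0 && lda >= (n > 1 ? n : 1));

    int upper_only = (uplo == 'U' || uplo == 'u');
    int lower_only = (uplo == 'L' || uplo == 'l');

    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            if (i == j)
                a[i * lda + j] = beta;        /* diagonal, for all uplo */
            else if (j > i && !lower_only)
                a[i * lda + j] = alpha;       /* strictly upper part    */
            else if (j < i && !upper_only)
                a[i * lda + j] = alpha;       /* strictly lower part    */
        }
    }
}
```

Calling the helper with alpha = 0 and beta = 1 and a uplo value other than 'U' or 'L' produces an identity-like matrix, which is a common use of this kind of initialization.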
Input Parameters
A <datatype> placeholder, if present, is used for the C interface data types in the C interface section above.
See C Interface Conventions for the C interface principal conventions and type definitions.
alpha, beta The constants to which the off-diagonal and diagonal elements are to be
set, respectively.
a Array, size at least max(1, lda*n) for column major and max(1, lda*m)
for row major layout.
The array a contains the m-by-n matrix A.
Output Parameters
and, for all uplo, Aii = beta, 1≤i≤min(m, n).
Return Values
This function returns a value info.
If info = 0, the execution is successful.
?lasrt
Sorts numbers in increasing or decreasing order.
Syntax
lapack_int LAPACKE_slasrt (char id , lapack_int n , float * d );
lapack_int LAPACKE_dlasrt (char id , lapack_int n , double * d );
Include Files
• mkl.h
Description
The routine ?lasrt sorts the numbers in d in increasing order (if id = 'I') or in decreasing order (if id =
'D'). It uses Quick Sort, reverting to Insertion Sort on arrays of size ≤ 20. Dimension of stack limits n to
about 2^32.
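The calling pattern (an order flag, a length, and an array sorted in place) can be mimicked with the C standard library. The sketch below is a stand-in built on qsort, not the Quick Sort and Insertion Sort hybrid described above; the name sort_like_lasrt is hypothetical, and the negative return codes for a bad id or a negative n are an assumption modeled on the usual LAPACK info convention.

```c
#include <assert.h>
#include <stdlib.h>

/* Three-way comparison for ascending order. */
static int cmp_increasing(const void *p, const void *q)
{
    double x = *(const double *)p;
    double y = *(const double *)q;
    return (x > y) - (x < y);
}

/* Descending order is the ascending comparison with arguments swapped. */
static int cmp_decreasing(const void *p, const void *q)
{
    return cmp_increasing(q, p);
}

/* Hypothetical stand-in (not an MKL routine): sorts d[0..n-1] in place,
 * increasing for id = 'I' and decreasing for id = 'D'. Returns 0 on
 * success, -1 for an unknown id, -2 for a negative n (assumed codes). */
int sort_like_lasrt(char id, int n, double *d)
{
    int increasing = (id == 'I' || id == 'i');
    int decreasing = (id == 'D' || id == 'd');

    if (!increasing && !decreasing)
        return -1;
    if (n < 0)
        return -2;
    assert(n == 0 || d != NULL);

    if (n > 1)
        qsort(d, (size_t)n, sizeof *d,
              increasing ? cmp_increasing : cmp_decreasing);
    return 0;
}
```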
Input Parameters
A <datatype> placeholder, if present, is used for the C interface data types in the C interface section above.
See C Interface Conventions for the C interface principal conventions and type definitions.
Output Parameters
Return Values
This function returns a value info.
If info = 0, the execution is successful.
?laswp
Performs a series of row interchanges on a general
rectangular matrix.
Syntax
lapack_int LAPACKE_slaswp (int matrix_layout , lapack_int n , float * a , lapack_int
lda , lapack_int k1 , lapack_int k2 , const lapack_int * ipiv , lapack_int incx );
lapack_int LAPACKE_dlaswp (int matrix_layout , lapack_int n , double * a , lapack_int
lda , lapack_int k1 , lapack_int k2 , const lapack_int * ipiv , lapack_int incx );
lapack_int LAPACKE_claswp (int matrix_layout , lapack_int n , lapack_complex_float *
a , lapack_int lda , lapack_int k1 , lapack_int k2 , const lapack_int * ipiv ,
lapack_int incx );
lapack_int LAPACKE_zlaswp (int matrix_layout , lapack_int n , lapack_complex_double *
a , lapack_int lda , lapack_int k1 , lapack_int k2 , const lapack_int * ipiv ,
lapack_int incx );
Include Files
• mkl.h
Description
The routine performs a series of row interchanges on the matrix A. One row interchange is initiated for each
of rows k1 through k2 of A.
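The interchange sequence can be sketched in plain C for a row-major matrix and unit increment. This illustrates the semantics only: k1, k2, and the pivot values are 1-based as in LAPACK, the pivot for row i is read from ipiv[i-1], negative or non-unit incx is not handled, and laswp_rowmajor is a hypothetical helper name.

```c
#include <assert.h>

/* Hypothetical reference helper (not an MKL routine): for i = k1..k2
 * (1-based), swaps row i of the row-major matrix a with row ipiv[i-1].
 * The interchanges are applied in order, so later swaps see the effect
 * of earlier ones. */
void laswp_rowmajor(int n, double *a, int lda, int k1, int k2,
                    const int *ipiv)
{
    assert(n >= 0 && lda >= n && k1 >= 1 && k2 >= k1);

    for (int i = k1; i <= k2; ++i) {
        int ip = ipiv[i - 1];          /* 1-based row to swap with row i */
        if (ip == i)
            continue;                  /* nothing to do for this row     */
        for (int j = 0; j < n; ++j) {
            double t = a[(i - 1) * lda + j];
            a[(i - 1) * lda + j] = a[(ip - 1) * lda + j];
            a[(ip - 1) * lda + j] = t;
        }
    }
}
```

Because the swaps are sequential, ipiv is not a permutation vector in the usual sense: the same target row may appear several times, as in pivot arrays produced by LU factorization.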
Input Parameters
A <datatype> placeholder, if present, is used for the C interface data types in the C interface section above.
See C Interface Conventions for the C interface principal conventions and type definitions.
a Array, size max(1, lda*n) for column major and max(1, lda*mm) for row
major layout. Here mm is not less than maximum of values
ipiv[k1-1+j*|incx|], 0≤j<k2-k1.
Array a contains the m-by-n matrix A.
k1 The first element of ipiv for which a row interchange will be done.
k2 The last element of ipiv for which a row interchange will be done.
incx The increment between successive values of ipiv. If incx is negative, the
pivots are applied in reverse order.
Output Parameters
Return Values
This function returns a value info.
If info = 0, the execution is successful.
?latm1
Computes the entries of a matrix as specified.
Syntax
void slatm1 (lapack_int *mode, float *cond, lapack_int *irsign, lapack_int *idist, lapack_int
*iseed, float *d, lapack_int *n, lapack_int *info);
void dlatm1 (lapack_int *mode, double *cond, lapack_int *irsign, lapack_int *idist, lapack_int
*iseed, double *d, lapack_int *n, lapack_int *info);
void clatm1 (lapack_int *mode, float *cond, lapack_int *irsign, lapack_int *idist, lapack_int
*iseed, lapack_complex_float *d, lapack_int *n, lapack_int *info);
void zlatm1 (lapack_int *mode, double *cond, lapack_int *irsign, lapack_int *idist, lapack_int
*iseed, lapack_complex_double *d, lapack_int *n, lapack_int *info);
Include Files
• mkl.h
Description
The ?latm1 routine computes the entries of D(1..n) as specified by mode, cond and irsign. idist and
iseed determine the generation of random numbers.
?latm1 is called by slatmr (for slatm1 and dlatm1), and by clatmr (for clatm1 and zlatm1) to generate
random test matrices for LAPACK programs.
Input Parameters
= 1: uniform (0,1)
= 2: uniform (-1,1)
= 3: normal (0,1)
For clatm1 and zlatm1:
d Array, size n.
n Number of entries of d.
Output Parameters
If info = -2, mode is neither -6, 0 nor 6, and irsign is neither 0 nor 1.
If info = -3, mode is neither -6, 0 nor 6 and cond is less than 1.
?latm2
Returns an entry of a random matrix.
Syntax
float slatm2 (lapack_int *m, lapack_int *n, lapack_int *i, lapack_int *j, lapack_int
*kl, lapack_int *ku, lapack_int *idist, lapack_int *iseed, float *d, lapack_int *igrade,
float *dl, float *dr, lapack_int *ipvtng, lapack_int *iwork, float *sparse);
double dlatm2 (lapack_int *m, lapack_int *n, lapack_int *i, lapack_int *j, lapack_int
*kl, lapack_int *ku, lapack_int *idist, lapack_int *iseed, double *d, lapack_int
*igrade, double *dl, double *dr, lapack_int *ipvtng, lapack_int *iwork, double *sparse);
The data types for complex variations depend on whether or not the application links with Gnu Fortran
(gfortran) libraries.
For non-gfortran (libmkl_intel_*) interface libraries:
void clatm2 (lapack_complex_float *res, lapack_int *m, lapack_int *n, lapack_int *i,
lapack_int *j, lapack_int *kl, lapack_int *ku, lapack_int *idist, lapack_int *iseed,
lapack_complex_float *d, lapack_int *igrade, lapack_complex_float *dl,
lapack_complex_float *dr, lapack_int *ipvtng, lapack_int *iwork, float *sparse);
void zlatm2 (lapack_complex_double *res, lapack_int *m, lapack_int *n, lapack_int *i,
lapack_int *j, lapack_int *kl, lapack_int *ku, lapack_int *idist, lapack_int *iseed,
lapack_complex_double *d, lapack_int *igrade, lapack_complex_double *dl,
lapack_complex_double *dr, lapack_int *ipvtng, lapack_int *iwork, double *sparse);
For gfortran (libmkl_gf_*) interface libraries:
lapack_complex_float clatm2 (lapack_int *m, lapack_int *n, lapack_int *i, lapack_int
*j, lapack_int *kl, lapack_int *ku, lapack_int *idist, lapack_int *iseed,
lapack_complex_float *d, lapack_int *igrade, lapack_complex_float *dl,
lapack_complex_float *dr, lapack_int *ipvtng, lapack_int *iwork, float *sparse);
lapack_complex_double zlatm2 (lapack_int *m, lapack_int *n, lapack_int *i, lapack_int
*j, lapack_int *kl, lapack_int *ku, lapack_int *idist, lapack_int *iseed,
lapack_complex_double *d, lapack_int *igrade, lapack_complex_double *dl,
lapack_complex_double *dr, lapack_int *ipvtng, lapack_int *iwork, double *sparse);
To understand the difference between the non-gfortran and gfortran interfaces and when to use each of
them, see Dynamic Libraries in the lib/intel64 Directory in the oneAPI Math Kernel Library Developer Guide.
Include Files
• mkl.h
Description
The ?latm2 routine returns entry (i , j ) of a random matrix of dimension (m, n). It is called by the ?latmr
routine in order to build random test matrices. No error checking on parameters is done, because this routine
is called in a tight loop by ?latmr which has already checked the parameters.
Use of ?latm2 differs from ?latm3 in the order in which the random number generator is called to fill in
random matrix entries. With ?latm2, the generator is called to fill in the pivoted matrix columnwise.
With ?latm3, the generator is called to fill in the matrix columnwise, after which it is pivoted. Thus, ?latm3
can be used to construct random matrices which differ only in their order of rows and/or columns. ?latm2 is
used to construct band matrices while avoiding calling the random number generator for entries outside the
band (and therefore generating random numbers).
The matrix whose (i , j ) entry is returned is constructed as follows (this routine only computes one entry):
• If i is outside (1..m) or j is outside (1..n), returns zero (this is convenient for generating matrices in
band format).
• Generate a matrix A with random entries of distribution idist.
• Set the diagonal to D.
• Grade the matrix, if desired, from the left (by dl) and/or from the right (by dr or dl) as specified by
igrade.
• Permute, if desired, the rows and/or columns as specified by ipvtng and iwork.
• Band the matrix to have lower bandwidth kl and upper bandwidth ku.
• Set random entries to zero as specified by sparse.
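The first and the banding steps above both amount to deciding when an entry is structurally zero. A minimal sketch of that test follows; the helper name entry_in_band is hypothetical and is not part of the library, and the indices are 1-based as in the description.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not an MKL function.
   Returns true when entry (i, j), 1-based, lies inside an m-by-n matrix
   and inside the band with lower bandwidth kl and upper bandwidth ku.
   Entries for which this returns false are reported as zero. */
bool entry_in_band(int m, int n, int i, int j, int kl, int ku)
{
    assert(m >= 0 && n >= 0 && kl >= 0 && ku >= 0);
    if (i < 1 || i > m || j < 1 || j > n) {
        return false;              /* outside the matrix */
    }
    if (i - j > kl) {
        return false;              /* below the lower band */
    }
    if (j - i > ku) {
        return false;              /* above the upper band */
    }
    return true;
}
```

With kl = 1 and ku = 0, for example, entry (2, 1) is inside the band while (3, 1) and (1, 2) are not.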
Input Parameters
kl Lower bandwidth.
ku Upper bandwidth.
= 1: uniform (0,1)
= 2: uniform (-1,1)
= 3: normal (0,1)
for clatm2 and zlatm2:
= 2: matrix postmultiplied by diag( dr )
iwork Array, size (i or j), as appropriate. This array specifies the permutation
used. The row (or column) in position k was originally in position iwork[k
- 1]. This differs from iwork for ?latm3.
Output Parameters
Return Values
The function returns an entry of a random matrix (for complex variations, the libmkl_intel_* interface
libraries return the result through the parameter res).
?latm3
Returns a specified entry of a random matrix.
Syntax
float slatm3 (lapack_int *m, lapack_int *n, lapack_int *i, lapack_int *j, lapack_int
*isub, lapack_int *jsub, lapack_int *kl, lapack_int *ku, lapack_int *idist, lapack_int
*iseed, float *d, lapack_int *igrade, float *dl, float *dr, lapack_int *ipvtng,
lapack_int *iwork, float *sparse);
double dlatm3 (lapack_int *m, lapack_int *n, lapack_int *i, lapack_int *j, lapack_int
*isub, lapack_int *jsub, lapack_int *kl, lapack_int *ku, lapack_int *idist, lapack_int
*iseed, double *d, lapack_int *igrade, double *dl, double *dr, lapack_int *ipvtng,
lapack_int *iwork, double *sparse);
The data types for complex variations depend on whether or not the application links with Gnu Fortran
(gfortran) libraries.
For non-gfortran (libmkl_intel_*) interface libraries:
void clatm3 (lapack_complex_float *res, lapack_int *m, lapack_int *n, lapack_int *i,
lapack_int *j, lapack_int *isub, lapack_int *jsub, lapack_int *kl, lapack_int *ku,
lapack_int *idist, lapack_int *iseed, lapack_complex_float *d, lapack_int *igrade,
lapack_complex_float *dl, lapack_complex_float *dr, lapack_int *ipvtng, lapack_int
*iwork, float *sparse);
void zlatm3 (lapack_complex_double *res, lapack_int *m, lapack_int *n, lapack_int *i,
lapack_int *j, lapack_int *isub, lapack_int *jsub, lapack_int *kl, lapack_int *ku,
lapack_int *idist, lapack_int *iseed, lapack_complex_double *d, lapack_int *igrade,
lapack_complex_double *dl, lapack_complex_double *dr, lapack_int *ipvtng, lapack_int
*iwork, double *sparse);
For gfortran (libmkl_gf_*) interface libraries:
lapack_complex_float clatm3 (lapack_int *m, lapack_int *n, lapack_int *i, lapack_int
*j, lapack_int *isub, lapack_int *jsub, lapack_int *kl, lapack_int *ku, lapack_int
*idist, lapack_int *iseed, lapack_complex_float *d, lapack_int *igrade,
lapack_complex_float *dl, lapack_complex_float *dr, lapack_int *ipvtng, lapack_int
*iwork, float *sparse);
lapack_complex_double zlatm3 (lapack_int *m, lapack_int *n, lapack_int *i, lapack_int
*j, lapack_int *isub, lapack_int *jsub, lapack_int *kl, lapack_int *ku, lapack_int
*idist, lapack_int *iseed, lapack_complex_double *d, lapack_int *igrade,
lapack_complex_double *dl, lapack_complex_double *dr, lapack_int *ipvtng, lapack_int
*iwork, double *sparse);
To understand the difference between the non-gfortran and gfortran interfaces and when to use each of
them, see Dynamic Libraries in the lib/intel64 Directory in the oneAPI Math Kernel Library Developer Guide.
Include Files
• mkl.h
Description
The ?latm3 routine returns the (isub, jsub) entry of a random matrix of dimension (m, n) described by the
other parameters. (isub, jsub) is the final position of the (i ,j ) entry after pivoting according to ipvtng and
iwork. ?latm3 is called by the ?latmr routine in order to build random test matrices. No error checking on
parameters is done, because this routine is called in a tight loop by ?latmr which has already checked the
parameters.
Use of ?latm3 differs from ?latm2 in the order in which the random number generator is called to fill in
random matrix entries. With ?latm2, the generator is called to fill in the pivoted matrix columnwise.
With ?latm3, the generator is called to fill in the matrix columnwise, after which it is pivoted. Thus, ?latm3
can be used to construct random matrices which differ only in their order of rows and/or columns. ?latm2 is
used to construct band matrices while avoiding calling the random number generator for entries outside the
band (and therefore generating random numbers in different orders for different pivot orders).
The matrix whose (isub, jsub ) entry is returned is constructed as follows (this routine only computes one
entry):
• If isub is outside (1..m) or jsub is outside (1..n), returns zero (this is convenient for generating
matrices in band format).
• Generate a matrix A with random entries of distribution idist.
• Set the diagonal to D.
• Grade the matrix, if desired, from the left (by dl) and/or from the right (by dr or dl) as specified by
igrade.
• Permute, if desired, the rows and/or columns as specified by ipvtng and iwork.
• Band the matrix to have lower bandwidth kl and upper bandwidth ku.
• Set random entries to zero as specified by sparse.
Input Parameters
kl Lower bandwidth.
ku Upper bandwidth.
= 1: uniform (0,1)
= 2: uniform (-1,1)
= 3: normal (0,1)
for clatm3 and zlatm3:
Output Parameters
Return Values
The function returns an entry of a random matrix (for complex variations, the libmkl_intel_* interface
libraries return the result through the parameter res).
?latm5
Generates matrices involved in the Generalized
Sylvester equation.
Syntax
void slatm5 (lapack_int *prtype, lapack_int *m, lapack_int *n, float *a, lapack_int *lda,
float *b, lapack_int *ldb, float *c, lapack_int *ldc, float *d, lapack_int *ldd, float *e,
lapack_int *lde, float *f, lapack_int *ldf, float *r, lapack_int *ldr, float *l,
lapack_int *ldl, float *alpha, lapack_int *qblcka, lapack_int *qblckb);
void dlatm5 (lapack_int *prtype, lapack_int *m, lapack_int *n, double *a, lapack_int *lda,
double *b, lapack_int *ldb, double *c, lapack_int *ldc, double *d, lapack_int *ldd, double *e,
lapack_int *lde, double *f, lapack_int *ldf, double *r, lapack_int *ldr, double *l,
lapack_int *ldl, double *alpha, lapack_int *qblcka, lapack_int *qblckb);
void clatm5 (lapack_int *prtype, lapack_int *m, lapack_int *n, lapack_complex_float *a,
lapack_int *lda, lapack_complex_float *b, lapack_int *ldb, lapack_complex_float *c,
lapack_int *ldc, lapack_complex_float *d, lapack_int *ldd, lapack_complex_float *e,
lapack_int *lde, lapack_complex_float *f, lapack_int *ldf, lapack_complex_float *r,
lapack_int *ldr, lapack_complex_float *l, lapack_int *ldl, float *alpha, lapack_int *qblcka,
lapack_int *qblckb);
void zlatm5 (lapack_int *prtype, lapack_int *m, lapack_int *n, lapack_complex_double *a,
lapack_int *lda, lapack_complex_double *b, lapack_int *ldb, lapack_complex_double *c,
lapack_int *ldc, lapack_complex_double *d, lapack_int *ldd, lapack_complex_double *e,
lapack_int *lde, lapack_complex_double *f, lapack_int *ldf, lapack_complex_double *r,
lapack_int *ldr, lapack_complex_double *l, lapack_int *ldl, double *alpha, lapack_int
*qblcka, lapack_int *qblckb);
Include Files
• mkl.h
Description
The ?latm5 routine generates matrices involved in the Generalized Sylvester equation:
A * R - L * B = C
D * R - L * E = F
They also satisfy the diagonalization condition:
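The condition itself is not reproduced in this text. One block form that is consistent with the two equations above is sketched here; it is a reconstruction, not a quotation from the reference. Multiplying out the left-hand side gives the upper-right blocks A*R - L*B - C and D*R - L*E - F, which vanish exactly when the two Sylvester equations hold.

```latex
\begin{bmatrix} I & -L \\ 0 & I \end{bmatrix}
\left(
  \begin{bmatrix} A & -C \\ 0 & B \end{bmatrix},
  \begin{bmatrix} D & -F \\ 0 & E \end{bmatrix}
\right)
\begin{bmatrix} I & R \\ 0 & I \end{bmatrix}
=
\left(
  \begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix},
  \begin{bmatrix} D & 0 \\ 0 & E \end{bmatrix}
\right)
```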
Input Parameters
A:
If (i == j) then Ai, j = 1.0.
B:
If (i == j) then Bi, j = 1.0 - alpha.
D:
If (i == j) then Di, j = 1.0.
E:
If (i == j) then Ei, j = 1.0
L = R are chosen from [-10...10], which specifies the right hand sides
(C, F).
• If prtype = 2 or 3: Triangular and/or quasi-triangular.
A:
If (i ≤ j) then Ai, j = [-1...1].
Ak + 1, k = [-1...1];
k = 1, m - 1, qblcka.
B:
If (i ≤ j) then Bi, j = [-1...1].
Bk + 1, k = [-1...1]
k = 1, n - 1, qblckb.
D:
If (i ≤ j) then Di, j = [-1...1].
E:
If (i ≤ j) then Ei, j = [-1...1].
Otherwise Ei, j = 0.0, i, j = 1...N.
L, R are chosen from [-10...10], which specifies the right hand sides (C,
F).
• If prtype = 4: Full.
Ai, j = [-10...10]
Bi, j = [-10...10]
Ri, j = [-10...10]
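Because L and R are chosen first, the right-hand sides follow from the Sylvester equations. The sketch below computes C = A*R - L*B for small column-major matrices; F = D*R - L*E can be obtained by calling it with D and E. The name sylvester_rhs is hypothetical, and the leading dimensions are assumed equal to the row counts to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Column-major element access with 0-based indices. */
#define ELEM(x, ld, i, j) ((x)[(size_t)(i) + (size_t)(j) * (size_t)(ld)])

/* Hypothetical helper, not an MKL function.
   Computes C = A*R - L*B, where A is m-by-m, B is n-by-n, and R, L, C are
   m-by-n. Every matrix is column-major with leading dimension equal to
   its number of rows (an assumption of this sketch). */
void sylvester_rhs(int m, int n, const double *a, const double *b,
                   const double *r, const double *l, double *c)
{
    assert(m > 0 && n > 0);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            double sum = 0.0;
            for (int k = 0; k < m; ++k) {
                sum += ELEM(a, m, i, k) * ELEM(r, m, k, j);   /* (A*R)(i, j) */
            }
            for (int k = 0; k < n; ++k) {
                sum -= ELEM(l, m, i, k) * ELEM(b, n, k, j);   /* (L*B)(i, j) */
            }
            ELEM(c, m, i, j) = sum;
        }
    }
}
```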
qblcka When prtype = 3, specifies the distance between 2-by-2 blocks on the
diagonal in A. Otherwise, qblcka is not referenced. qblcka > 1.
qblckb When prtype = 3, specifies the distance between 2-by-2 blocks on the
diagonal in B. Otherwise, qblckb is not referenced. qblckb > 1.
Output Parameters
e Array, size lde*n. On exit e contains the n-by-n array E initialized according
to prtype.
?latm6
Generates test matrices for the generalized eigenvalue
problem, their corresponding right and left
eigenvector matrices, and also reciprocal condition
numbers for all eigenvalues and the reciprocal
condition numbers of eigenvectors corresponding to
the 1st and 5th eigenvalues.
Syntax
void slatm6 (lapack_int *type, lapack_int *n, float *a, lapack_int *lda, float *b, float
*x, lapack_int *ldx, float *y, lapack_int *ldy, float *alpha, float *beta, float *wx,
float *wy, float *s, float *dif);
void dlatm6 (lapack_int *type, lapack_int *n, double *a, lapack_int *lda, double *b,
double *x, lapack_int *ldx, double *y, lapack_int *ldy, double *alpha, double *beta,
double *wx, double *wy, double *s, double *dif);
void clatm6 (lapack_int *type, lapack_int *n, lapack_complex_float *a, lapack_int *lda,
lapack_complex_float *b, lapack_complex_float *x, lapack_int *ldx, lapack_complex_float
*y, lapack_int *ldy, lapack_complex_float *alpha, lapack_complex_float *beta,
lapack_complex_float *wx, lapack_complex_float *wy, float *s, float *dif);
void zlatm6 (lapack_int *type, lapack_int *n, lapack_complex_double *a, lapack_int
*lda, lapack_complex_double *b, lapack_complex_double *x, lapack_int *ldx,
lapack_complex_double *y, lapack_int *ldy, lapack_complex_double *alpha,
lapack_complex_double *beta, lapack_complex_double *wx, lapack_complex_double *wy,
double *s, double *dif);
Include Files
• mkl.h
Description
The ?latm6 routine generates test matrices for the generalized eigenvalue problem, their corresponding right
and left eigenvector matrices, and also reciprocal condition numbers for all eigenvalues and the reciprocal
condition numbers of eigenvectors corresponding to the 1st and 5th eigenvalues.
There are two kinds of test matrix pairs:
(A, B) = inverse(Y^H) * (Da, Db) * inverse(X)
Type 1:
Type 2:
In both cases the same inverse(Y^H) and inverse(X) are used to compute (A, B), giving the exact eigenvectors
to (A, B) as (Y^H, X):
,
where a, b, x and y will have all values independently of each other.
Input Parameters
Output Parameters
s Array, size (n). s[i - 1] is the reciprocal condition number for eigenvalue
i.
?latme
Generates random non-symmetric square matrices
with specified eigenvalues.
Syntax
void slatme (lapack_int *n, char *dist, lapack_int *iseed, float *d, lapack_int *mode,
float *cond, float *dmax, char *ei, char *rsign, char *upper, char *sim, float *ds,
lapack_int *modes, float *conds, lapack_int *kl, lapack_int *ku, float *anorm, float *a,
lapack_int *lda, float *work, lapack_int *info);
void dlatme (lapack_int *n, char *dist, lapack_int *iseed, double *d, lapack_int *mode,
double *cond, double *dmax, char *ei, char *rsign, char *upper, char *sim, double *ds,
lapack_int *modes, double *conds, lapack_int *kl, lapack_int *ku, double *anorm, double *a,
lapack_int *lda, double *work, lapack_int *info);
void clatme (lapack_int *n, char *dist, lapack_int *iseed, lapack_complex_float *d,
lapack_int *mode, float *cond, lapack_complex_float *dmax, char *ei, char *rsign,
char *upper, char *sim, float *ds, lapack_int *modes, float *conds, lapack_int *kl,
lapack_int *ku, float *anorm, lapack_complex_float *a, lapack_int *lda,
lapack_complex_float *work, lapack_int *info);
void zlatme (lapack_int *n, char *dist, lapack_int *iseed, lapack_complex_double *d,
lapack_int *mode, double *cond, lapack_complex_double *dmax, char *ei, char *rsign,
char *upper, char *sim, double *ds, lapack_int *modes, double *conds, lapack_int *kl,
lapack_int *ku, double *anorm, lapack_complex_double *a, lapack_int *lda,
lapack_complex_double *work, lapack_int *info);
Include Files
• mkl.h
Description
The ?latme routine generates random non-symmetric square matrices with specified eigenvalues. ?latme
operates by applying the following sequence of operations:
1. Set the diagonal to d, where d may be input or computed according to mode, cond, dmax, and rsign as
described below.
2. If upper = 'T', the upper triangle of a is set to random values out of distribution dist.
3. If sim='T', a is multiplied on the left by a random matrix X, whose singular values are specified by ds,
modes, and conds, and on the right by X inverse.
4. If kl < n-1, the lower bandwidth is reduced to kl using Householder transformations. If ku < n-1,
the upper bandwidth is reduced to ku.
5. If anorm is not negative, the matrix is scaled to have maximum-element-norm anorm.
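The final scaling step can be illustrated on a small dense matrix. This is only a sketch of the idea (find the largest absolute entry, then multiply every entry by anorm divided by that value); the helper name scale_to_max_abs is hypothetical, and the handling of an all-zero matrix is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not an MKL function.
   Scales the n-by-n column-major matrix a (leading dimension lda) so that
   its largest absolute entry equals anorm. Leaves the matrix unchanged when
   anorm is negative or when every entry is zero. Returns the factor used. */
double scale_to_max_abs(int n, double *a, int lda, double anorm)
{
    assert(n >= 0 && lda >= (n > 1 ? n : 1));
    if (anorm < 0.0) {
        return 1.0;                           /* scaling not requested */
    }
    double maxabs = 0.0;
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < n; ++i) {
            double v = a[(size_t)i + (size_t)j * (size_t)lda];
            if (v < 0.0) v = -v;
            if (v > maxabs) maxabs = v;
        }
    }
    if (maxabs == 0.0) {
        return 1.0;                           /* nothing to scale */
    }
    double factor = anorm / maxabs;
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < n; ++i) {
            a[(size_t)i + (size_t)j * (size_t)lda] *= factor;
        }
    }
    return factor;
}
```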
NOTE
Since the matrix cannot be reduced beyond Hessenberg form, no packing options are
available.
Input Parameters
dist On entry, dist specifies the type of distribution to be used to generate the
random eigen-/singular values, and for the upper triangle (see upper).
cond On entry, this is used as described under mode above. If used, it must be ≥
1.
If mode = 0, and ei[0] is not ' ' (space character), this array specifies
which elements of d (on input) are real eigenvalues and which are the real
and imaginary parts of a complex conjugate pair of eigenvalues. The
elements of ei may then only have the values 'R' and 'I'.
If mode is not 0, then ei is ignored. If mode is 0 and ei[0] = ' ', then
the eigenvalues will all be real.
rsign If mode is not 0, 6, or -6, and rsign = 'T', then the elements of d, as
computed according to mode and cond, are multiplied by a random sign (+1
or -1) for slatme and dlatme or by a complex number from the unit circle
|z| = 1 for clatme and zlatme.
If rsign = 'F', the elements of d are not multiplied. rsign may only have
the values 'T' or 'F'.
upper If upper = 'T', then the elements of a above the diagonal will be set to
random numbers out of dist.
If upper = 'F', they will not. upper may only have the values 'T' or 'F'.
ds This array is used to specify the singular values of X, in the same way that
d specifies the eigenvalues of a. If mode = 0, then ds contains the singular
values, which may not be zero.
modes Similar to mode, but for specifying the diagonal of S. modes = -6 and +6
are not allowed (since they would result in randomly ill-conditioned
eigenvalues.)
kl This specifies the lower bandwidth of the matrix. kl = 1 specifies upper
Hessenberg form. If kl is at least n-1, then A will have full lower
bandwidth.
If kl and ku are both at least n-1, then a will be dense. Only one of ku and
kl may be less than n-1.
Output Parameters
If info = -6, cond is less than 1.0, and mode is not -6, 0, or 6 .
If info = -16, ku is less than 1, or kl and ku are both less than n-1.
?latmr
Generates random matrices of various types.
Syntax
void slatmr (lapack_int *m, lapack_int *n, char *dist, lapack_int *iseed, char *sym,
float *d, lapack_int *mode, float *cond, float *dmax, char *rsign, char *grade, float
*dl, lapack_int *model, float *condl, float *dr, lapack_int *moder, float *condr, char
*pivtng, lapack_int *ipivot, lapack_int *kl, lapack_int *ku, float *sparse, float
*anorm, char *pack, float *a, lapack_int *lda, lapack_int *iwork, lapack_int *info);
void dlatmr (lapack_int *m, lapack_int *n, char *dist, lapack_int *iseed, char *sym,
double *d, lapack_int *mode, double *cond, double *dmax, char *rsign, char *grade,
double *dl, lapack_int *model, double *condl, double *dr, lapack_int *moder, double
*condr, char *pivtng, lapack_int *ipivot, lapack_int *kl, lapack_int *ku, double
*sparse, double *anorm, char *pack, double *a, lapack_int *lda, lapack_int *iwork,
lapack_int *info);
void clatmr (lapack_int *m, lapack_int *n, char *dist, lapack_int *iseed, char *sym,
lapack_complex_float *d, lapack_int *mode, float *cond, lapack_complex_float *dmax,
char *rsign, char *grade, lapack_complex_float *dl, lapack_int *model, float *condl,
lapack_complex_float *dr, lapack_int *moder, float *condr, char *pivtng, lapack_int
*ipivot, lapack_int *kl, lapack_int *ku, float *sparse, float *anorm, char *pack,
lapack_complex_float *a, lapack_int *lda, lapack_int *iwork, lapack_int *info);
void zlatmr (lapack_int *m, lapack_int *n, char *dist, lapack_int *iseed, char *sym,
lapack_complex_double *d, lapack_int *mode, double *cond, lapack_complex_double *dmax,
char *rsign, char *grade, lapack_complex_double *dl, lapack_int *model, double *condl,
lapack_complex_double *dr, lapack_int *moder, double *condr, char *pivtng, lapack_int
*ipivot, lapack_int *kl, lapack_int *ku, double *sparse, double *anorm, char *pack,
lapack_complex_double *a, lapack_int *lda, lapack_int *iwork, lapack_int *info);
Description
8. Pack the matrix if desired. See options specified by the pack parameter.
NOTE
If two calls to ?latmr differ only in the pack parameter, they generate mathematically equivalent
matrices. If two calls to ?latmr both have full bandwidth (kl = m-1 and ku = n-1), and differ only in
the pivtng and pack parameters, then the matrices generated differ only in the order of the rows and
columns, and otherwise contain the same data. This consistency cannot be and is not maintained with
less than full bandwidth.
Input Parameters
m Number of rows of A.
n Number of columns of A.
If dist = 'S', real and imaginary parts are independent uniform( -1, 1 ).
cond On entry, used as described under mode above. If used, cond must be ≥ 1.
rsign If mode is not -6, 0, or 6, specifies the sign of the diagonal as follows:
For slatmr and dlatmr, if rsign = 'T', diagonal entries are multiplied by 1
or -1, each with a probability of 0.5.
For clatmr and zlatmr, if rsign = 'T', diagonal entries are multiplied by
a random complex number uniformly distributed with absolute value 1.
If rsign = 'F', diagonal entries are unchanged.
NOTE
If grade = 'E', then m must equal n.
Not referenced if grade = 'N' or 'R'. Changed on exit.
model This specifies how the diagonal array dl is computed, just as mode specifies
how D is computed.
condl When model is not zero, this specifies the condition number of the
computed dl.
moder This specifies how the diagonal array dr is to be computed, just as mode
specifies how d is to be computed.
condr When moder is not zero, this specifies the condition number of the
computed dr.
If pivtng = 'B' or 'F': both or full pivoting, i.e., on both sides. In this
case, m must equal n.
ipivot Array, size (n or m). This array specifies the permutation used. After the
basic matrix is generated, the rows, columns, or both are permuted.
If row pivoting is selected, ?latmr starts with the last row and interchanges
row m and row ipivot[m - 1], then moves to the next-to-last row,
interchanging row m - 1 and row ipivot[m - 2], and so on. In terms
of "2-cycles", the permutation is (1 ipivot[0]) (2 ipivot[1]) ...
(m ipivot[m - 1]) where the rightmost cycle is applied first. This is the
inverse of the effect of pivoting in LINPACK. The idea is that factoring (with
pivoting) an identity matrix which has been inverse-pivoted in this way
should result in a pivot vector identical to ipivot. Not referenced if pivtng
= 'N'.
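The row-pivoting order described for ipivot can be sketched as a loop that walks from the last row to the first. The helper apply_row_pivots is hypothetical; it assumes a column-major matrix and an ipivot array that stores 1-based row numbers in a 0-based C array, as the ipivot[k - 1] notation suggests.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not an MKL function.
   Applies the row interchanges held in ipivot to the m-by-n column-major
   matrix a (leading dimension lda). For k = m, m - 1, ..., 1 (1-based),
   row k is swapped with row ipivot[k - 1]. */
void apply_row_pivots(int m, int n, double *a, int lda, const int *ipivot)
{
    assert(m >= 0 && n >= 0 && lda >= (m > 1 ? m : 1));
    for (int k = m; k >= 1; --k) {
        int p = ipivot[k - 1];
        assert(p >= 1 && p <= m);
        if (p == k) {
            continue;                          /* trivial 2-cycle */
        }
        for (int j = 0; j < n; ++j) {
            double *col = a + (size_t)j * (size_t)lda;
            double tmp = col[k - 1];
            col[k - 1] = col[p - 1];
            col[p - 1] = tmp;
        }
    }
}
```

For a single column holding 1, 2, 3 and ipivot = {2, 3, 3}, the last row is left alone, rows 2 and 3 are swapped, and then rows 1 and 2 are swapped.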
If pack = 'U': zero out all subdiagonal entries (if symmetric or Hermitian)
GB 'Z'
PB, HB or TB 'B' or 'Q'
PP, HP or TP 'C' or 'R'
If two calls to ?latmr differ only in the pack parameter, they generate
mathematically equivalent matrices.
lda On entry, lda specifies the first dimension of a as declared in the calling
program.
If pack = 'N', 'U' or 'L', lda must be at least max( 1, m ).
If pack = 'Z', lda must be at least kuu + kll + 1, where kuu =
min( ku, n-1 ) and kll = min( kl, n-1 ).
iwork Array, size (n or m). Workspace. Not referenced if pivtng = 'N'. Changed
on exit.
Output Parameters
a On exit, a is the desired test matrix. Only those entries of a which are
significant on output are referenced (even if a is in packed or band
storage format). The unoccupied corners of a in band format are
zeroed out.
If info = -8, cond is less than 1.0, and mode is neither -6, 0 nor 6.
?lauum
Computes the product U*U^T (U*U^H) or L^T*L (L^H*L),
where U and L are upper or lower triangular matrices
(blocked algorithm).
Syntax
lapack_int LAPACKE_slauum (int matrix_layout , char uplo , lapack_int n , float * a ,
lapack_int lda );
lapack_int LAPACKE_dlauum (int matrix_layout , char uplo , lapack_int n , double * a ,
lapack_int lda );
lapack_int LAPACKE_clauum (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * a , lapack_int lda );
lapack_int LAPACKE_zlauum (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * a , lapack_int lda );
Include Files
• mkl.h
Description
The routine ?lauum computes the product U*U^T or L^T*L for real flavors, and U*U^H or L^H*L for complex
flavors. Here the triangular factor U or L is stored in the upper or lower triangular part of the array a.
If uplo = 'U' or 'u', then the upper triangle of the result is stored, overwriting the factor U in A.
If uplo = 'L' or 'l', then the lower triangle of the result is stored, overwriting the factor L in A.
This is the blocked form of the algorithm, calling BLAS Level 3 Routines.
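For orientation, the real upper-triangular case can be written as a plain triple loop. This is an unblocked reference sketch, not the blocked algorithm the routine uses; the name upper_times_transpose is hypothetical and column-major storage is assumed. Entry (i, j) of U*U^T with i ≤ j only needs columns j and later of U, so processing the columns left to right allows the result to overwrite U in place.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not an MKL function.
   Overwrites the upper triangle of the n-by-n column-major array a
   (leading dimension lda), which holds an upper triangular U, with the
   upper triangle of U*U^T. The strictly lower triangle is not touched. */
void upper_times_transpose(int n, double *a, int lda)
{
    assert(n >= 0 && lda >= (n > 1 ? n : 1));
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i <= j; ++i) {
            double sum = 0.0;
            /* U(i, k) * U(j, k) is nonzero only for k >= j when i <= j. */
            for (int k = j; k < n; ++k) {
                sum += a[(size_t)i + (size_t)k * (size_t)lda]
                     * a[(size_t)j + (size_t)k * (size_t)lda];
            }
            a[(size_t)i + (size_t)j * (size_t)lda] = sum;
        }
    }
}
```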
Input Parameters
A <datatype> placeholder, if present, is used for the C interface data types in the C interface section above.
See C Interface Conventions for the C interface principal conventions and type definitions.
uplo Specifies whether the triangular factor stored in the array a is upper or
lower triangular:
= 'U': Upper triangular
Output Parameters
a On exit,
if uplo = 'U', then the upper triangle of a is overwritten with the upper
triangle of the product U*U^T (U*U^H);
if uplo = 'L', then the lower triangle of a is overwritten with the lower
triangle of the product L^T*L (L^H*L).
Return Values
This function returns a value info.
If info = 0, the execution is successful.
?syswapr
Applies an elementary permutation on the rows and
columns of a symmetric matrix.
Syntax
lapack_int LAPACKE_ssyswapr (int matrix_layout , char uplo , lapack_int n , float * a ,
lapack_int i1 , lapack_int i2 );
lapack_int LAPACKE_dsyswapr (int matrix_layout , char uplo , lapack_int n , double *
a , lapack_int i1 , lapack_int i2 );
lapack_int LAPACKE_csyswapr (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * a , lapack_int i1 , lapack_int i2 );
lapack_int LAPACKE_zsyswapr (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * a , lapack_int i1 , lapack_int i2 );
Include Files
• mkl.h
Description
The routine applies an elementary permutation on the rows and columns of a symmetric matrix.
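When only one triangle of a symmetric matrix is stored, swapping rows and columns i1 and i2 has to be done in segments, because part of row i1 lives in column i2 of the stored triangle. The sketch below shows this for upper storage. The function sym_swap_upper, its 0-based indices, and its lda argument are assumptions of this example and do not mirror the LAPACKE signature above.

```c
#include <assert.h>
#include <stddef.h>

/* Address of element (i, j), 0-based, in a column-major array. */
static double *at(double *a, int lda, int i, int j)
{
    return a + (size_t)i + (size_t)j * (size_t)lda;
}

static void swap_values(double *x, double *y)
{
    double t = *x;
    *x = *y;
    *y = t;
}

/* Hypothetical helper, not an MKL function.
   Swaps rows and columns i1 and i2 (0-based, i1 < i2) of an n-by-n
   symmetric matrix whose upper triangle is stored in a. The strictly
   lower triangle is not referenced. */
void sym_swap_upper(int n, double *a, int lda, int i1, int i2)
{
    assert(0 <= i1 && i1 < i2 && i2 < n && lda >= n);
    /* Column segments above row i1. */
    for (int k = 0; k < i1; ++k) {
        swap_values(at(a, lda, k, i1), at(a, lda, k, i2));
    }
    /* Diagonal entries. */
    swap_values(at(a, lda, i1, i1), at(a, lda, i2, i2));
    /* Row i1 between the two indices trades places with column i2. */
    for (int k = i1 + 1; k < i2; ++k) {
        swap_values(at(a, lda, i1, k), at(a, lda, k, i2));
    }
    /* Row segments to the right of column i2. */
    for (int k = i2 + 1; k < n; ++k) {
        swap_values(at(a, lda, i1, k), at(a, lda, i2, k));
    }
    /* Entry (i1, i2) maps to itself and stays in place. */
}
```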
Input Parameters
A <datatype> placeholder, if present, is used for the C interface data types in the C interface section above.
See C Interface Conventions for the C interface principal conventions and type definitions.
If uplo = 'L', the array a stores the lower triangular factor L of the
factorization A = L*D*L^T.
The array a contains the block diagonal matrix D and the multipliers used to
obtain the factor U or L as computed by ?sytrf.
Output Parameters
If uplo = 'U', the upper triangular part of the inverse is formed and the part of
A below the diagonal is not referenced.
If uplo = 'L', the lower triangular part of the inverse is formed and the part of
A above the diagonal is not referenced.
Return Values
This function returns a value info.
See Also
?sytrf
?heswapr
Applies an elementary permutation on the rows and
columns of a Hermitian matrix.
Syntax
lapack_int LAPACKE_cheswapr (int matrix_layout, char uplo, lapack_int n,
lapack_complex_float* a, lapack_int i1, lapack_int i2);
lapack_int LAPACKE_zheswapr (int matrix_layout, char uplo, lapack_int n,
lapack_complex_double* a, lapack_int i1, lapack_int i2);
Include Files
• mkl.h
Description
The routine applies an elementary permutation on the rows and columns of a Hermitian matrix.
Input Parameters
A <datatype> placeholder, if present, is used for the C interface data types in the C interface section above.
See C Interface Conventions for the C interface principal conventions and type definitions.
If uplo = 'L', the array a stores the lower triangular factor L of the
factorization A = L*D*L^H.
The array a contains the block diagonal matrix D and the multipliers used to
obtain the factor U or L as computed by ?hetrf.
Output Parameters
If uplo = 'U', the upper triangular part of the inverse is formed and the part of
A below the diagonal is not referenced.
If uplo = 'L', the lower triangular part of the inverse is formed and the part of
A above the diagonal is not referenced.
Return Values
This function returns a value info.
See Also
?hetrf
?sfrk
Performs a symmetric rank-k operation for matrix in
RFP format.
Syntax
lapack_int LAPACKE_ssfrk (int matrix_layout , char transr , char uplo , char trans ,
lapack_int n , lapack_int k , float alpha , const float * a , lapack_int lda , float
beta , float * c );
lapack_int LAPACKE_dsfrk (int matrix_layout , char transr , char uplo , char trans ,
lapack_int n , lapack_int k , double alpha , const double * a , lapack_int lda , double
beta , double * c );
Include Files
• mkl.h
Description
The ?sfrk routines perform a matrix-matrix operation using symmetric matrices. The operation is defined as
C := alpha*A*A^T + beta*C,
or
C := alpha*A^T*A + beta*C,
where:
alpha and beta are scalars,
C is an n-by-n symmetric matrix in rectangular full packed (RFP) format,
A is an n-by-k matrix in the first case and a k-by-n matrix in the second case.
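The first form of the operation can be checked against a simple reference loop. The sketch below keeps C in ordinary full column-major storage rather than RFP format, which is a deliberate simplification; the name sym_rank_k_upper and its ldc argument are hypothetical. Only the upper triangle of C is updated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not an MKL function.
   Computes the upper triangle of C := alpha*A*A^T + beta*C, where A is
   n-by-k and C is n-by-n, both column-major in full (not RFP) storage.
   The strictly lower triangle of C is not touched. */
void sym_rank_k_upper(int n, int k, double alpha, const double *a, int lda,
                      double beta, double *c, int ldc)
{
    assert(n >= 0 && k >= 0);
    assert(lda >= (n > 1 ? n : 1) && ldc >= (n > 1 ? n : 1));
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i <= j; ++i) {
            double sum = 0.0;
            for (int p = 0; p < k; ++p) {
                /* (A*A^T)(i, j) = sum over p of A(i, p) * A(j, p) */
                sum += a[(size_t)i + (size_t)p * (size_t)lda]
                     * a[(size_t)j + (size_t)p * (size_t)lda];
            }
            double *cij = c + (size_t)i + (size_t)j * (size_t)ldc;
            *cij = alpha * sum + beta * (*cij);
        }
    }
}
```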
Input Parameters
uplo Specifies whether the upper or lower triangular part of the array c is used.
If uplo = 'U' or 'u', then the upper triangular part of the array c is used.
If uplo = 'L' or 'l', then the lower triangular part of the array c is used.
n Specifies the order of the matrix C. The value of n must be at least zero.
k On entry with trans = 'N' or 'n', k specifies the number of columns of
the matrix A, and on entry with trans = 'T' or 't', k specifies the
number of rows of the matrix A.
The value of k must be at least zero.
             Col_major   Row_major
trans = 'N'  k           n
trans = 'T'  n           k
Before entry with trans = 'N' or 'n', the leading n-by-k part of the array
a must contain the matrix A, otherwise the leading k-by-n part of the array
a must contain the matrix A.
Output Parameters
Return Values
This function returns a value info.
If info = 0, the execution is successful.
?hfrk
Performs a Hermitian rank-k operation for matrix in
RFP format.
Syntax
lapack_int LAPACKE_chfrk( int matrix_layout, char transr, char uplo, char trans,
lapack_int n, lapack_int k, float alpha, const lapack_complex_float* a, lapack_int lda,
float beta, lapack_complex_float* c );
lapack_int LAPACKE_zhfrk( int matrix_layout, char transr, char uplo, char trans,
lapack_int n, lapack_int k, double alpha, const lapack_complex_double* a, lapack_int
lda, double beta, lapack_complex_double* c );
Include Files
• mkl.h
Description
The ?hfrk routines perform a matrix-matrix operation using Hermitian matrices. The operation is defined as
C := alpha*A*A^H + beta*C,
or
C := alpha*A^H*A + beta*C,
where:
alpha and beta are real scalars,
C is an n-by-n Hermitian matrix in RFP format,
A is an n-by-k matrix in the first case and a k-by-n matrix in the second case.
Input Parameters
uplo Specifies whether the upper or lower triangular part of the array c is used.
If uplo = 'U' or 'u', then the upper triangular part of the array c is used.
If uplo = 'L' or 'l', then the lower triangular part of the array c is used.
n Specifies the order of the matrix C. The value of n must be at least zero.
             Col_major   Row_major
trans = 'N'  k           n
trans = 'T'  n           k
Before entry with trans = 'N' or 'n', the leading n-by-k part of the array
a must contain the matrix A, otherwise the leading k-by-n part of the array
a must contain the matrix A.
Output Parameters
Return Values
This function returns a value info.
If info = 0, the execution is successful.
?tfsm
Solves a matrix equation (one operand is a triangular
matrix in RFP format).
Syntax
lapack_int LAPACKE_stfsm (int matrix_layout , char transr , char side , char uplo ,
char trans , char diag , lapack_int m , lapack_int n , float alpha , const float * a ,
float * b , lapack_int ldb );
lapack_int LAPACKE_dtfsm (int matrix_layout , char transr , char side , char uplo ,
char trans , char diag , lapack_int m , lapack_int n , double alpha , const double * a ,
double * b , lapack_int ldb );
lapack_int LAPACKE_ctfsm (int matrix_layout , char transr , char side , char uplo ,
char trans , char diag , lapack_int m , lapack_int n , lapack_complex_float alpha ,
const lapack_complex_float * a , lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_ztfsm (int matrix_layout , char transr , char side , char uplo ,
char trans , char diag , lapack_int m , lapack_int n , lapack_complex_double alpha ,
const lapack_complex_double * a , lapack_complex_double * b , lapack_int ldb );
Include Files
• mkl.h
Description
The ?tfsm routines solve one of the following matrix equations:
op(A)*X = alpha*B,
or
X*op(A) = alpha*B,
where:
alpha is a scalar,
X and B are m-by-n matrices,
A is a unit, or non-unit, upper or lower triangular matrix in rectangular full packed (RFP) format.
op(A) can be one of the following:
• op(A) = A or op(A) = A^T for real flavors
• op(A) = A or op(A) = A^H for complex flavors
The matrix B is overwritten by the solution matrix X.
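One case of the equation, op(A) = A on the left with A upper triangular and non-unit, reduces to back substitution on each column of B. The sketch below uses full column-major storage for A instead of RFP format, and the name upper_solve_left is hypothetical; it is meant only to show how B ends up holding X.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not an MKL function.
   Solves A*X = alpha*B for X, where A is an m-by-m non-unit upper
   triangular matrix in full column-major storage (not RFP) and B is
   m-by-n. On return, B holds X. Diagonal entries of A must be nonzero. */
void upper_solve_left(int m, int n, double alpha, const double *a, int lda,
                      double *b, int ldb)
{
    assert(m >= 0 && n >= 0);
    assert(lda >= (m > 1 ? m : 1) && ldb >= (m > 1 ? m : 1));
    for (int j = 0; j < n; ++j) {
        double *x = b + (size_t)j * (size_t)ldb;   /* column j of B, then X */
        for (int i = m - 1; i >= 0; --i) {
            double sum = alpha * x[i];
            /* Rows below i already hold solution values. */
            for (int k = i + 1; k < m; ++k) {
                sum -= a[(size_t)i + (size_t)k * (size_t)lda] * x[k];
            }
            x[i] = sum / a[(size_t)i + (size_t)i * (size_t)lda];
        }
    }
}
```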
Input Parameters
side Specifies whether op(A) appears on the left or right of X in the equation:
m Specifies the number of rows of B. The value of m must be at least zero.
b Array, size max(1, ldb*n) for column major and max(1, ldb*m) for row
major.
Before entry, the leading m-by-n part of the array b must contain the right-
hand side matrix B.
Output Parameters
Return Values
This function returns a value info.
If info = 0, the execution is successful.
?tfttp
Copies a triangular matrix from the rectangular full
packed format (TF) to the standard packed format
(TP).
Syntax
lapack_int LAPACKE_stfttp (int matrix_layout , char transr , char uplo , lapack_int n ,
const float * arf , float * ap );
lapack_int LAPACKE_dtfttp (int matrix_layout , char transr , char uplo , lapack_int n ,
const double * arf , double * ap );
lapack_int LAPACKE_ctfttp (int matrix_layout , char transr , char uplo , lapack_int n ,
const lapack_complex_float * arf , lapack_complex_float * ap );
lapack_int LAPACKE_ztfttp (int matrix_layout , char transr , char uplo , lapack_int n ,
const lapack_complex_double * arf , lapack_complex_double * ap );
Include Files
• mkl.h
Description
The routine copies a triangular matrix A from the Rectangular Full Packed (RFP) format to the standard
packed format. For the description of the RFP format, see Matrix Storage Schemes.
Input Parameters
On entry, the upper or lower triangular matrix A stored in the RFP format.
Output Parameters
Return Values
This function returns a value info.
If info = 0, the execution is successful.
?tfttr
Copies a triangular matrix from the rectangular full
packed format (TF) to the standard full format (TR).
Syntax
lapack_int LAPACKE_stfttr (int matrix_layout , char transr , char uplo , lapack_int n ,
const float * arf , float * a , lapack_int lda );
lapack_int LAPACKE_dtfttr (int matrix_layout , char transr , char uplo , lapack_int n ,
const double * arf , double * a , lapack_int lda );
lapack_int LAPACKE_ctfttr (int matrix_layout , char transr , char uplo , lapack_int n ,
const lapack_complex_float * arf , lapack_complex_float * a , lapack_int lda );
lapack_int LAPACKE_ztfttr (int matrix_layout , char transr , char uplo , lapack_int n ,
const lapack_complex_double * arf , lapack_complex_double * a , lapack_int lda );
Include Files
• mkl.h
Description
The routine copies a triangular matrix A from the Rectangular Full Packed (RFP) format to the standard full
format. For the description of the RFP format, see Matrix Storage Schemes.
Input Parameters
Output Parameters
On exit, the triangular matrix A. If uplo = 'U', the leading n-by-n upper
triangular part of the array a contains the upper triangular matrix, and the
strictly lower triangular part of a is not referenced. If uplo = 'L', the leading
n-by-n lower triangular part of the array a contains the lower triangular
matrix, and the strictly upper triangular part of a is not referenced.
Return Values
This function returns a value info.
If info = 0, the execution is successful.
?tpqrt2
Computes a QR factorization of a real or complex
"triangular-pentagonal" matrix, which is composed of
a triangular block and a pentagonal block, using the
compact WY representation for Q.
Syntax
lapack_int LAPACKE_stpqrt2 (int matrix_layout, lapack_int m, lapack_int n, lapack_int
l, float * a, lapack_int lda, float * b, lapack_int ldb, float * t, lapack_int ldt);
lapack_int LAPACKE_dtpqrt2 (int matrix_layout, lapack_int m, lapack_int n, lapack_int
l, double * a, lapack_int lda, double * b, lapack_int ldb, double * t, lapack_int ldt);
lapack_int LAPACKE_ctpqrt2 (int matrix_layout, lapack_int m, lapack_int n, lapack_int
l, lapack_complex_float * a, lapack_int lda, lapack_complex_float * b, lapack_int ldb,
lapack_complex_float * t, lapack_int ldt );
lapack_int LAPACKE_ztpqrt2 (int matrix_layout, lapack_int m, lapack_int n, lapack_int
l, lapack_complex_double * a, lapack_int lda, lapack_complex_double * b, lapack_int
ldb, lapack_complex_double * t, lapack_int ldt );
Include Files
• mkl.h
Description
The input matrix C is an (n+m)-by-n matrix formed by stacking A on top of B,
where A is an n-by-n upper triangular matrix, and B is an m-by-n pentagonal matrix consisting of an
(m-l)-by-n rectangular matrix B1 on top of an l-by-n upper trapezoidal matrix B2.
The upper trapezoidal matrix B2 consists of the first l rows of an n-by-n upper triangular matrix, where 0 ≤
l ≤ min(m,n). If l=0, B is an m-by-n rectangular matrix. If m=l=n, B is upper triangular. The matrix W
contains the elementary reflectors H(i) in the i-th column below the diagonal (of A) in the (n+m)-by-n input
matrix C so that W can be represented as the n-by-n identity matrix I on top of an m-by-n matrix V.
Thus, V contains all of the information needed for W, and is returned in array b.
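The pentagonal shape of B can be easier to see as a membership test. The sketch below is plain C and does not call oneMKL; the function name in_pentagon and the 0-based indexing are assumptions made for illustration.

```c
#include <stdbool.h>

/* Hypothetical helper: returns true when entry (i, j) of the m-by-n
 * pentagonal matrix B (0-based indices) lies inside its structurally
 * nonzero region. Rows 0..m-l-1 form the rectangular block B1; the last
 * l rows form B2, the first l rows of an n-by-n upper triangular matrix. */
static bool in_pentagon(int m, int n, int l, int i, int j)
{
    if (i < 0 || i >= m || j < 0 || j >= n)
        return false;
    if (i < m - l)
        return true;            /* inside the rectangular block B1 */
    int r = i - (m - l);        /* row index within B2 */
    return j >= r;              /* on or above the diagonal of the triangle */
}
```

With l = 0 every entry passes the test (B is rectangular), and with m = l = n only the upper triangle does.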
NOTE
V has the same form as B: a rectangular block V1 on top of an upper trapezoidal block V2.
Input Parameters
a, b Arrays: a, size max(1, lda*n), contains the n-by-n upper triangular matrix
A.
b, size max(1, ldb*n) for column major and max(1, ldb*m) for row major,
contains the pentagonal m-by-n matrix B. The first (m-l) rows contain the
rectangular B1 matrix, and the next l rows contain the upper trapezoidal
B2 matrix.
ldb The leading dimension of b; at least max(1, m) for column major and
max(1,n) for row major.
Output Parameters
a The elements on and above the diagonal of the array contain the upper
triangular matrix R.
Return Values
This function returns a value info.
If info = 0, the execution is successful.
If info < 0 and info = -i, the ith argument had an illegal value.
?tprfb
Applies a real or complex "triangular-pentagonal"
blocked reflector to a real or complex matrix, which is
composed of two blocks.
Syntax
lapack_int LAPACKE_stprfb (int matrix_layout, char side, char trans, char direct, char
storev, lapack_int m, lapack_int n, lapack_int k, lapack_int l, const float * v,
lapack_int ldv, const float * t, lapack_int ldt, float * a, lapack_int lda, float * b,
lapack_int ldb);
lapack_int LAPACKE_dtprfb (int matrix_layout, char side, char trans, char direct, char
storev, lapack_int m, lapack_int n, lapack_int k, lapack_int l, const double * v,
lapack_int ldv, const double * t, lapack_int ldt, double * a, lapack_int lda, double *
b, lapack_int ldb);
lapack_int LAPACKE_ctprfb (int matrix_layout, char side, char trans, char direct, char
storev, lapack_int m, lapack_int n, lapack_int k, lapack_int l, const
lapack_complex_float * v, lapack_int ldv, const lapack_complex_float * t, lapack_int
ldt, lapack_complex_float * a, lapack_int lda, lapack_complex_float * b, lapack_int
ldb);
lapack_int LAPACKE_ztprfb (int matrix_layout, char side, char trans, char direct, char
storev, lapack_int m, lapack_int n, lapack_int k, lapack_int l, const
lapack_complex_double * v, lapack_int ldv, const lapack_complex_double * t, lapack_int
ldt, lapack_complex_double * a, lapack_int lda, lapack_complex_double * b, lapack_int
ldb);
Include Files
• mkl.h
Description
The ?tprfb routine applies a real or complex "triangular-pentagonal" block reflector H, H^T, or H^H from either
the left or the right to a real or complex matrix C, which is composed of two blocks A and B.
The block B is m-by-n. If side = 'R', A is m-by-k, and if side = 'L', A is k-by-n.
The pentagonal matrix V is composed of a rectangular block V1 and a trapezoidal block V2. The size of the
trapezoidal block is determined by the parameter l, where 0 ≤ l ≤ k. If l=k, the V2 block of V is triangular; if
l=0, there is no trapezoidal block, thus V = V1 is rectangular.
              direct='F'                              direct='B'
storev='C'    V2 is upper trapezoidal (first l rows   V2 is lower trapezoidal (last l rows
              of k-by-k upper triangular matrix)      of k-by-k lower triangular matrix)
storev='R'

              side='L'        side='R'
storev='C'    V is m-by-k     V is n-by-k
              V2 is l-by-k    V2 is l-by-k
storev='R'    V is k-by-m     V is k-by-n
              V2 is k-by-l    V2 is k-by-l
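The shape of V listed above depends only on side and storev, so it can be written as a small lookup. This is a sketch in plain C, not a oneMKL call; the function name tprfb_v_shape and its return convention are assumptions.

```c
#include <ctype.h>

/* Hypothetical helper: stores the number of rows and columns of V for the
 * given side and storev. Returns 0 on success, -1 for an unknown option. */
static int tprfb_v_shape(char side, char storev, int m, int n, int k,
                         int *rows, int *cols)
{
    int len;                                /* length of each reflector vector */
    switch (toupper((unsigned char)side)) {
    case 'L': len = m; break;
    case 'R': len = n; break;
    default:  return -1;
    }
    switch (toupper((unsigned char)storev)) {
    case 'C': *rows = len; *cols = k;   return 0;  /* vectors stored as columns */
    case 'R': *rows = k;   *cols = len; return 0;  /* vectors stored as rows */
    default:  return -1;
    }
}
```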
Input Parameters
storev Indicates how the vectors that define the elementary reflectors are stored:
= 'C': Columns,
= 'R': Rows.
storev = C storev = R
ldv The leading dimension of the array v. It should satisfy the following
conditions:
storev = C storev = R
t Array, size max(1, ldt*k). The triangular k-by-k matrix T in the
representation of the block reflector.
                 side = L        side = R
Column major     max(1,lda*n)    max(1,lda*k)
Row major        max(1,lda*k)    max(1,lda*m)
lda The leading dimension of the array a should satisfy the following conditions:
                 side = L        side = R
Column major     max(1,k)        max(1,m)
Row major        max(1,n)        max(1,k)
b Array size at least max(1, ldb *n) for column major layout and max(1, ldb
*m) for row major layout, the m-by-n matrix B.
ldb The leading dimension of the array b (ldb ≥ max(1, m) for column major
layout and ldb ≥ max(1, n) for row major layout).
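The size and leading-dimension rules for a given above can be sketched as one helper that returns the smallest valid lda and the matching element count. This is plain C and not part of oneMKL; the name tprfb_a_storage and the row_major flag are assumptions.

```c
#include <stddef.h>

static size_t max_sz(size_t x, size_t y) { return x > y ? x : y; }

/* Hypothetical helper: computes the minimal lda and the resulting number of
 * elements of a. A is k-by-n when side = 'L' and m-by-k when side = 'R'.
 * Returns 0 on success, -1 for an unknown side. */
static int tprfb_a_storage(int row_major, char side,
                           size_t m, size_t n, size_t k,
                           size_t *lda_min, size_t *elems)
{
    size_t other;   /* the dimension that multiplies lda in the size formula */
    if (side == 'L' || side == 'l') {
        *lda_min = max_sz(1, row_major ? n : k);
        other = row_major ? k : n;
    } else if (side == 'R' || side == 'r') {
        *lda_min = max_sz(1, row_major ? k : m);
        other = row_major ? m : k;
    } else {
        return -1;
    }
    *elems = max_sz(1, *lda_min * other);
    return 0;
}
```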
Output Parameters
a Contains the corresponding block of H*C, H^T*C, H^H*C, C*H, C*H^T, or C*H^H.
b Contains the corresponding block of H*C, H^T*C, H^H*C, C*H, C*H^T, or C*H^H.
Return Values
This function returns a value info.
If info = 0, the execution is successful.
?tpttf
Copies a triangular matrix from the standard packed
format (TP) to the rectangular full packed format (TF).
Syntax
lapack_int LAPACKE_stpttf (int matrix_layout , char transr , char uplo , lapack_int n ,
const float * ap , float * arf );
lapack_int LAPACKE_dtpttf (int matrix_layout , char transr , char uplo , lapack_int n ,
const double * ap , double * arf );
lapack_int LAPACKE_ctpttf (int matrix_layout , char transr , char uplo , lapack_int n ,
const lapack_complex_float * ap , lapack_complex_float * arf );
lapack_int LAPACKE_ztpttf (int matrix_layout , char transr , char uplo , lapack_int n ,
const lapack_complex_double * ap , lapack_complex_double * arf );
Include Files
• mkl.h
Description
The routine copies a triangular matrix A from the standard packed format to the Rectangular Full Packed
(RFP) format. For the description of the RFP format, see Matrix Storage Schemes.
Input Parameters
= 'T': arf must be in the Transpose format (for stpttf and dtpttf),
Output Parameters
Return Values
This function returns a value info.
If info = 0, the execution is successful.
?tpttr
Copies a triangular matrix from the standard packed
format (TP) to the standard full format (TR).
Syntax
lapack_int LAPACKE_stpttr (int matrix_layout , char uplo , lapack_int n , const float *
ap , float * a , lapack_int lda );
lapack_int LAPACKE_dtpttr (int matrix_layout , char uplo , lapack_int n , const double
* ap , double * a , lapack_int lda );
lapack_int LAPACKE_ctpttr (int matrix_layout , char uplo , lapack_int n , const
lapack_complex_float * ap , lapack_complex_float * a , lapack_int lda );
lapack_int LAPACKE_ztpttr (int matrix_layout , char uplo , lapack_int n , const
lapack_complex_double * ap , lapack_complex_double * a , lapack_int lda );
Include Files
• mkl.h
Description
The routine copies a triangular matrix A from the standard packed format to the standard full format.
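As an illustration of what the copy does, the sketch below expands an upper triangular matrix from packed to full storage by hand. It assumes column-major layout and uplo = 'U', does not call oneMKL, and the name packed_upper_to_full is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: expands an upper triangular matrix from column-major
 * packed storage (columns stored one after another, n*(n+1)/2 elements)
 * into the upper triangle of a column-major full array. The strictly lower
 * triangle of a is left untouched. */
static void packed_upper_to_full(size_t n, const double *ap,
                                 double *a, size_t lda)
{
    size_t p = 0;                            /* running index into ap */
    for (size_t j = 0; j < n; ++j)
        for (size_t i = 0; i <= j; ++i)
            a[i + j * lda] = ap[p++];
}
```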
Input Parameters
ap Array, size at least max(1, n*(n+1)/2) (see Matrix Storage Schemes).
Output Parameters
Return Values
This function returns a value info.
If info = 0, the execution is successful.
?trttf
Copies a triangular matrix from the standard full
format (TR) to the rectangular full packed format (TF).
Syntax
lapack_int LAPACKE_strttf (int matrix_layout , char transr , char uplo , lapack_int n ,
const float * a , lapack_int lda , float * arf );
lapack_int LAPACKE_dtrttf (int matrix_layout , char transr , char uplo , lapack_int n ,
const double * a , lapack_int lda , double * arf );
lapack_int LAPACKE_ctrttf (int matrix_layout , char transr , char uplo , lapack_int n ,
const lapack_complex_float * a , lapack_int lda , lapack_complex_float * arf );
lapack_int LAPACKE_ztrttf (int matrix_layout , char transr , char uplo , lapack_int n ,
const lapack_complex_double * a , lapack_int lda , lapack_complex_double * arf );
Include Files
• mkl.h
Description
The routine copies a triangular matrix A from the standard full format to the Rectangular Full Packed (RFP)
format. For the description of the RFP format, see Matrix Storage Schemes.
Input Parameters
= 'T': arf must be in the Transpose format (for strttf and dtrttf),
Output Parameters
Return Values
This function returns a value info.
If info = 0, the execution is successful.
?trttp
Copies a triangular matrix from the standard full
format (TR) to the standard packed format (TP).
Syntax
lapack_int LAPACKE_strttp (int matrix_layout , char uplo , lapack_int n , const float *
a , lapack_int lda , float * ap );
lapack_int LAPACKE_dtrttp (int matrix_layout , char uplo , lapack_int n , const double
* a , lapack_int lda , double * ap );
lapack_int LAPACKE_ctrttp (int matrix_layout , char uplo , lapack_int n , const
lapack_complex_float * a , lapack_int lda , lapack_complex_float * ap );
lapack_int LAPACKE_ztrttp (int matrix_layout , char uplo , lapack_int n , const
lapack_complex_double * a , lapack_int lda , lapack_complex_double * ap );
Include Files
• mkl.h
Description
The routine copies a triangular matrix A from the standard full format to the standard packed format.
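The reverse direction can be sketched the same way. The example below packs the lower triangle of a full column-major array; it assumes uplo = 'L' and column-major layout, does not call oneMKL, and the name full_lower_to_packed is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: packs the lower triangle of a column-major full
 * array into column-major packed storage of n*(n+1)/2 elements. Entries
 * above the diagonal of a are never read. */
static void full_lower_to_packed(size_t n, const double *a, size_t lda,
                                 double *ap)
{
    size_t p = 0;                            /* running index into ap */
    for (size_t j = 0; j < n; ++j)
        for (size_t i = j; i < n; ++i)
            ap[p++] = a[i + j * lda];
}
```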
Input Parameters
Output Parameters
Return Values
This function returns a value info.
If info = 0, the execution is successful.
?lacp2
Copies all or part of a real two-dimensional array to a
complex array.
Syntax
lapack_int LAPACKE_clacp2 (int matrix_layout , char uplo , lapack_int m , lapack_int
n , const float * a , lapack_int lda , lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_zlacp2 (int matrix_layout , char uplo , lapack_int m , lapack_int
n , const double * a , lapack_int lda , lapack_complex_double * b , lapack_int ldb );
Include Files
• mkl.h
Description
Input Parameters
If uplo = 'U', only the upper triangle or trapezoid is accessed; if uplo =
'L', only the lower triangle or trapezoid is accessed.
lda The leading dimension of a; lda ≥ max(1, m) for column major and lda ≥
max(1, n) for row major.
ldb The leading dimension of the output array b; ldb ≥ max(1, m) for column
major and ldb ≥ max(1, n) for row major.
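A minimal sketch of the uplo = 'U' case may help: only the upper triangle or trapezoid of the real array is read, and each value becomes a complex number with zero imaginary part. The code uses the C99 float complex type and column-major layout as assumptions; it does not call oneMKL, and the function name is hypothetical.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical helper: copies the upper triangle or trapezoid of the real
 * m-by-n array a into the complex array b (both column major). Entries of b
 * below the diagonal are left untouched. */
static void copy_real_upper_to_complex(size_t m, size_t n,
                                       const float *a, size_t lda,
                                       float complex *b, size_t ldb)
{
    for (size_t j = 0; j < n; ++j) {
        size_t count = (j + 1 < m) ? j + 1 : m;   /* rows 0..min(j, m-1) */
        for (size_t i = 0; i < count; ++i)
            b[i + j * ldb] = a[i + j * lda];      /* imaginary part is 0 */
    }
}
```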
Output Parameters
Return Values
This function returns a value info.
If info = 0, the execution is successful.
?larcm
Multiplies a square real matrix by a complex matrix.
Syntax
lapack_int LAPACKE_clarcm(int matrix_layout,lapack_int m,lapack_int n,const float
*a,lapack_int lda,const lapack_complex_float * b,lapack_int ldb,lapack_complex_float *
c,lapack_int ldc);
lapack_int LAPACKE_zlarcm(int matrix_layout,lapack_int m,lapack_int n,const double *
a,lapack_int lda,const lapack_complex_double *b,lapack_int ldb,lapack_complex_double
*c ,lapack_int ldc);
Description
Input Parameters
m The number of rows and columns of matrix A and the number of rows of
matrix C (m ≥ 0).
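The operation itself is an ordinary matrix product with a real left factor. The sketch below spells it out in plain C, assuming column-major layout, the C99 double complex type, and that n is the number of columns of B and C; it does not call oneMKL, and the function name is hypothetical.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical helper: computes C := A*B, where A is a real m-by-m matrix
 * and B and C are complex m-by-n matrices. All arrays are column major. */
static void real_times_complex(size_t m, size_t n,
                               const double *a, size_t lda,
                               const double complex *b, size_t ldb,
                               double complex *c, size_t ldc)
{
    for (size_t j = 0; j < n; ++j)
        for (size_t i = 0; i < m; ++i) {
            double complex sum = 0.0;
            for (size_t k = 0; k < m; ++k)
                sum += a[i + k * lda] * b[k + j * ldb];
            c[i + j * ldc] = sum;
        }
}
```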
Output Parameters
Return Values
This function returns a value info. If info = 0, the execution is successful. If info = -i, parameter i had
an illegal value.
mkl_?tppack
Copies a triangular/symmetric matrix or submatrix
from standard full format to standard packed format.
Syntax
lapack_int LAPACKE_mkl_stppack (int matrix_layout, char uplo, char trans, lapack_int n,
float* ap, lapack_int i, lapack_int j, lapack_int rows, lapack_int cols, const float* a,
lapack_int lda);
lapack_int LAPACKE_mkl_dtppack (int matrix_layout, char uplo, char trans, lapack_int n,
double* ap, lapack_int i, lapack_int j, lapack_int rows, lapack_int cols, const double*
a, lapack_int lda);
lapack_int LAPACKE_mkl_ctppack (int matrix_layout, char uplo, char trans, lapack_int n,
MKL_Complex8* ap, lapack_int i, lapack_int j, lapack_int rows, lapack_int cols, const
MKL_Complex8* a, lapack_int lda);
lapack_int LAPACKE_mkl_ztppack (int matrix_layout, char uplo, char trans, lapack_int n,
MKL_Complex16* ap, lapack_int i, lapack_int j, lapack_int rows, lapack_int cols, const
MKL_Complex16* a, lapack_int lda);
Include Files
• mkl.h
Description
The routine copies a triangular or symmetric matrix or its submatrix from standard full format to packed
format:
AP(i:i+rows-1, j:j+cols-1) := op(A)
• GE: general
• TR: triangular
• SY: symmetric indefinite
• HE: Hermitian indefinite
• PO: symmetric or Hermitian positive definite
NOTE
Any elements of the copied rectangular submatrix that lie outside the triangular part of the
matrix AP are skipped.
Input Parameters
A <datatype> placeholder, if present, is used for the C interface data types in the C interface section above.
See C Interface Conventions for the C interface principal conventions and type definitions.
If trans = 'C', conjugate transpose: op(A) = A^H. For real data this is the
same as trans = 'T'.
If uplo = 'L', 1 ≤ j ≤ i ≤ n.
NOTE
If there are elements outside of the triangular part of AP, they
are skipped and are not copied from a.
Output Parameters
Return Values
This function returns a value info. If info=0, the execution is successful. If info = -i, the i-th parameter
had an illegal value.
mkl_?tpunpack
Copies a triangular/symmetric matrix or submatrix
from standard packed format to full format.
Syntax
lapack_int LAPACKE_mkl_stpunpack ( int matrix_layout, char uplo, char trans,
lapack_int n, const float* ap, lapack_int i, lapack_int j, lapack_int rows,
lapack_int cols, float* a, lapack_int lda );
lapack_int LAPACKE_mkl_dtpunpack ( int matrix_layout, char uplo, char trans,
lapack_int n, const double* ap, lapack_int i, lapack_int j, lapack_int rows,
lapack_int cols, double* a, lapack_int lda );
lapack_int LAPACKE_mkl_ctpunpack ( int matrix_layout, char uplo, char trans,
lapack_int n, const MKL_Complex8* ap, lapack_int i, lapack_int j, lapack_int rows,
lapack_int cols, MKL_Complex8* a, lapack_int lda );
lapack_int LAPACKE_mkl_ztpunpack ( int matrix_layout, char uplo, char trans,
lapack_int n, const MKL_Complex16* ap, lapack_int i, lapack_int j, lapack_int rows,
lapack_int cols, MKL_Complex16* a, lapack_int lda );
Include Files
• mkl.h
Description
The routine copies a triangular or symmetric matrix or its submatrix from standard packed format to full
format.
A := op(AP(i:i+rows-1, j:j+cols-1))
• GE: general
• TR: triangular
• SY: symmetric indefinite
• HE: Hermitian indefinite
• PO: symmetric or Hermitian positive definite
NOTE
Any elements of the copied rectangular submatrix that lie outside the triangular part of AP are
skipped.
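The skipping rule in the note can be sketched for one case: an upper triangular AP in column-major packed storage, copied without transposition. The starting indices i and j are taken as 1-based here. This is plain C rather than a oneMKL call, and the function name unpack_upper_block is hypothetical.

```c
/* Hypothetical helper: copies the rows-by-cols block of an upper triangular
 * packed matrix that starts at 1-based position (i, j) into the column-major
 * array a. Block entries that fall below the diagonal of AP are skipped, so
 * the matching entries of a keep their previous values. */
static void unpack_upper_block(const double *ap, int i, int j,
                               int rows, int cols, double *a, int lda)
{
    for (int c = 0; c < cols; ++c) {
        int gc = j - 1 + c;                  /* 0-based global column */
        for (int r = 0; r < rows; ++r) {
            int gr = i - 1 + r;              /* 0-based global row */
            if (gr <= gc)                    /* inside the upper triangle */
                a[r + c * lda] = ap[gr + gc * (gc + 1) / 2];
        }
    }
}
```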
Input Parameters
A <datatype> placeholder, if present, is used for the C interface data types in the C interface section above.
See C Interface Conventions for the C interface principal conventions and type definitions.
If uplo = 'L', 1 ≤ j ≤ i ≤ n.
Output Parameters
The size of a is
NOTE
If there are elements outside of the triangular part of ap
indicated by uplo, they are skipped and are not copied to
a.
Return Values
This function returns a value info. If info=0, the execution is successful. If info = -i, the i-th parameter
had an illegal value.
LAPACK Utility Routines
Routine Name    Data Types    Description
See Also
lsame Tests two characters for equality regardless of the case.
lsamen Tests two character strings for equality regardless of the case.
second/dsecnd Returns elapsed time in seconds. Use to estimate real time between two calls to
this function.
xerbla Error handling function called by BLAS, LAPACK, Vector Math, and Vector Statistics
functions.
ilaver
Returns the version of the LAPACK library.
Syntax
void LAPACKE_ilaver (lapack_int * vers_major, lapack_int * vers_minor, lapack_int *
vers_patch);
Include Files
• mkl.h
Description
This routine returns the version of the LAPACK library.
Output Parameters
vers_minor Returns the minor version number, within the major version, of the LAPACK library.
vers_patch Returns the patch version number, within the minor version, of the LAPACK library.
ilaenv
Environmental enquiry function that returns values for
tuning algorithmic performance.
Syntax
MKL_INT ilaenv (const MKL_INT *ispec, const char *name, const char *opts, const MKL_INT
*n1, const MKL_INT *n2, const MKL_INT *n3, const MKL_INT *n4);
Include Files
• mkl.h
Description
The enquiry function ilaenv is called from the LAPACK routines to choose problem-dependent parameters
for the local environment. See ispec below for a description of the parameters.
This version provides a set of parameters that should give good, but not optimal, performance on many of
the currently available computers.
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/
PerformanceIndex.
Notice revision #20201201
Input Parameters
name The name of the calling subroutine, in either upper case or lower case.
opts The character options to the subroutine name, concatenated into a single
character string. For example, uplo = 'U', trans = 'T', and diag =
'N' for a triangular routine would be specified as opts = 'UTN'.
NOTE
Use only uppercase characters for the opts string.
n1, n2, n3, n4 Problem dimensions for the subroutine name; these may not all be
required.
Output Parameters
Return Values
ilaenv returns value.
If value ≥ 0: the value of the parameter specified by ispec;
Application Notes
The following conventions have been used when calling ilaenv from the LAPACK routines:
1. opts is a concatenation of all of the character options to subroutine name, in the same order that they
appear in the argument list for name, even if they are not used in determining the value of the
parameter specified by ispec.
2. The problem dimensions n1, n2, n3, n4 are specified in the order that they appear in the argument list
for name. n1 is used first, n2 second, and so on, and unused problem dimensions are passed a value of
-1.
3. The parameter value returned by ilaenv is checked for validity in the calling subroutine.
The following example calls ilaenv to retrieve the block sizes for dsytrd and dormtr:
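#include <stdio.h>
#include "mkl.h"
int main(void)
{
    /* MKL_INT matches the integer width of the linked interface (LP64 or ILP64). */
    MKL_INT size = 1000;
    MKL_INT ispec = 1;   /* ispec = 1 requests the optimal block size */
    MKL_INT dummy = -1;  /* unused problem dimensions are passed as -1 */
    MKL_INT blockSize1 = ilaenv(&ispec, "dsytrd", "U", &size, &dummy, &dummy, &dummy);
    MKL_INT blockSize2 = ilaenv(&ispec, "dormtr", "LUN", &size, &size, &dummy, &dummy);
    /* Cast before printing so the format is valid for either integer width. */
    printf("DSYTRD blocksize = %lld\n", (long long)blockSize1);
    printf("DORMTR blocksize = %lld\n", (long long)blockSize2);
    return 0;
}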
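The opts convention from the notes above (all character options concatenated in argument order, uppercase only) can be sketched as a small string builder. This helper is hypothetical and is not part of oneMKL; its name, signature, and return convention are assumptions.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: writes the option characters, in the order given and
 * converted to upper case, into out as a NUL-terminated string. Returns 0 on
 * success or -1 when out (of capacity cap) is too small. */
static int build_opts(const char *options, size_t count,
                      char *out, size_t cap)
{
    if (count + 1 > cap)
        return -1;
    for (size_t i = 0; i < count; ++i)
        out[i] = (char)toupper((unsigned char)options[i]);
    out[count] = '\0';
    return 0;
}
```

For example, passing the characters for uplo, trans, and diag in that order yields a three-character string such as "UTN".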
See Also
?hseqr
?lamch
Determines machine parameters for floating-point
arithmetic.
Syntax
float LAPACKE_slamch (char cmach );
double LAPACKE_dlamch (char cmach );
Include Files
• mkl.h
Description
The function ?lamch determines single precision and double precision machine parameters.
Input Parameters
where
eps = relative machine precision;
sfmin = safe minimum, such that 1/sfmin does not overflow;
base = base of the machine;
prec = eps*base;
t = number of (base) digits in the mantissa;
rnd = 1.0 when rounding occurs in addition, 0.0 otherwise;
emin = minimum exponent before (gradual) underflow;
rmin = underflow threshold, equal to base**(emin-1);
emax = largest exponent before overflow;
rmax = overflow threshold, equal to (base**emax)*(1-eps).
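For orientation, several of these quantities can be related to the constants in the standard header float.h. The sketch below does so for double precision without calling oneMKL. It assumes IEEE 754 binary64 arithmetic with round-to-nearest, under which eps is taken as half of DBL_EPSILON; that choice, the struct, and the function names are assumptions, and the values returned by ?lamch on a given system may differ.

```c
#include <float.h>

/* Hypothetical container for a subset of the parameters listed above. */
struct fp_params {
    int base, t, emin, emax;
    double eps, prec, rmin, rmax;
};

/* Returns base**e using exact multiplications or divisions by the radix. */
static double radix_pow(int e)
{
    double r = 1.0;
    for (; e > 0; --e) r *= FLT_RADIX;
    for (; e < 0; ++e) r /= FLT_RADIX;
    return r;
}

/* Hypothetical helper: derives double precision parameters from float.h. */
static struct fp_params double_params(void)
{
    struct fp_params p;
    p.base = FLT_RADIX;
    p.t    = DBL_MANT_DIG;              /* digits in the mantissa */
    p.eps  = DBL_EPSILON * 0.5;         /* assumes round-to-nearest */
    p.prec = p.eps * p.base;
    p.emin = DBL_MIN_EXP;
    p.emax = DBL_MAX_EXP;
    p.rmin = radix_pow(p.emin - 1);     /* base**(emin-1) */
    /* (base**emax)*(1-eps), ordered so no intermediate value overflows */
    p.rmax = (1.0 - p.eps) * radix_pow(p.emax - 1) * p.base;
    return p;
}
```

Under the stated assumptions, rmin and rmax computed this way coincide with DBL_MIN and DBL_MAX.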
NOTE
You can use a character string for cmach instead of a single
character in order to make your code more readable. The first
character of the string determines the value to be returned. For
example, 'Precision' is interpreted as 'p'.
Output Parameters
?lagge
Generates a general m-by-n matrix.
Syntax
lapack_int LAPACKE_slagge (int matrix_layout , lapack_int m , lapack_int n , lapack_int
kl , lapack_int ku , const float * d , float * a , lapack_int lda , lapack_int *
iseed );
lapack_int LAPACKE_dlagge (int matrix_layout , lapack_int m , lapack_int n , lapack_int
kl , lapack_int ku , const double * d , double * a , lapack_int lda , lapack_int *
iseed );
lapack_int LAPACKE_clagge (int matrix_layout , lapack_int m , lapack_int n , lapack_int
kl , lapack_int ku , const float * d , lapack_complex_float * a , lapack_int lda ,
lapack_int * iseed );
lapack_int LAPACKE_zlagge (int matrix_layout , lapack_int m , lapack_int n , lapack_int
kl , lapack_int ku , const double * d , lapack_complex_double * a , lapack_int lda ,
lapack_int * iseed );
Include Files
• mkl.h
Description
The routine generates a general m-by-n matrix A, by pre- and post-multiplying a real diagonal matrix D with
random matrices U and V:
A := U*D*V,
where U and V are orthogonal for real flavors and unitary for complex flavors. The lower and upper
bandwidths may then be reduced to kl and ku by additional orthogonal transformations.
Input Parameters
A <datatype> placeholder, if present, is used for the C interface data types in the C interface section above.
See C Interface Conventions for the C interface principal conventions and type definitions.
d The array d with the dimension of (min(m, n)) contains the diagonal
elements of the diagonal matrix D.
lda The leading dimension of the array a (lda ≥ m for column major layout and
lda ≥ n for row major layout).
iseed The array iseed with the dimension of 4 contains the seed of the random
number generator. The elements must be between 0 and 4095 and iseed[3]
must be odd.
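The seed constraints are easy to check before calling a generator. The sketch below reads the oddness requirement as applying to the last element, iseed[3], as the ?laghe and ?lagsy entries state. It uses plain int in place of lapack_int, and the function name is hypothetical.

```c
#include <stdbool.h>

/* Hypothetical helper: returns true when all four seed elements lie in
 * [0, 4095] and the last element is odd. */
static bool iseed_is_valid(const int iseed[4])
{
    for (int i = 0; i < 4; ++i)
        if (iseed[i] < 0 || iseed[i] > 4095)
            return false;
    return (iseed[3] % 2) == 1;
}
```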
Output Parameters
a The array a with size at least max(1,lda*n) for column major layout and
max(1,lda*m) for row major layout contains the generated m-by-n matrix
A.
Return Values
This function returns a value info.
If info = 0, the execution is successful.
?laghe
Generates a complex Hermitian matrix.
Syntax
lapack_int LAPACKE_claghe (int matrix_layout , lapack_int n , lapack_int k , const
float * d , lapack_complex_float * a , lapack_int lda , lapack_int * iseed );
lapack_int LAPACKE_zlaghe (int matrix_layout , lapack_int n , lapack_int k , const
double * d , lapack_complex_double * a , lapack_int lda , lapack_int * iseed );
Include Files
• mkl.h
Description
The routine generates a complex Hermitian matrix A, by pre- and post-multiplying a real diagonal matrix D
with a random unitary matrix U:
A := U*D*U^H
The semi-bandwidth may then be reduced to k by additional unitary transformations.
Input Parameters
A <datatype> placeholder, if present, is used for the C interface data types in the C interface section above.
See C Interface Conventions for the C interface principal conventions and type definitions.
d The array d with the dimension of (n) contains the diagonal elements of the
diagonal matrix D.
iseed The array iseed with the dimension of 4 contains the seed of the random
number generator. The elements must be between 0 and 4095 and
iseed[3] must be odd.
Output Parameters
a The array a of size at least max(1, lda*n) contains the generated n-by-n
Hermitian matrix A.
Return Values
This function returns a value info.
If info = 0, the execution is successful.
?lagsy
Generates a symmetric matrix by pre- and post-multiplying
a real diagonal matrix with a random unitary matrix.
Syntax
lapack_int LAPACKE_slagsy (int matrix_layout , lapack_int n , lapack_int k , const
float * d , float * a , lapack_int lda , lapack_int * iseed );
lapack_int LAPACKE_dlagsy (int matrix_layout , lapack_int n , lapack_int k , const
double * d , double * a , lapack_int lda , lapack_int * iseed );
lapack_int LAPACKE_clagsy (int matrix_layout , lapack_int n , lapack_int k , const
float * d , lapack_complex_float * a , lapack_int lda , lapack_int * iseed );
lapack_int LAPACKE_zlagsy (int matrix_layout , lapack_int n , lapack_int k , const
double * d , lapack_complex_double * a , lapack_int lda , lapack_int * iseed );
Include Files
• mkl.h
Description
The ?lagsy routine generates a symmetric matrix A by pre- and post-multiplying a real diagonal matrix D
with a random matrix U:
A := U*D*U^T,
where U is orthogonal for real flavors and unitary for complex flavors. The semi-bandwidth may then be
reduced to k by additional unitary transformations.
Input Parameters
A <datatype> placeholder, if present, is used for the C interface data types in the C interface section above.
See C Interface Conventions for the C interface principal conventions and type definitions.
d The array d with the dimension of (n) contains the diagonal elements of the
diagonal matrix D.
iseed The array iseed with the dimension of 4 contains the seed of the random
number generator. The elements must be between 0 and 4095 and
iseed[3] must be odd.
Output Parameters
a The array a of size max(1, lda*n) contains the generated symmetric n-by-n
matrix A.
Return Values
This function returns a value info.
If info = 0, the execution is successful.
?latms
Generates a general m-by-n matrix with specific
singular values.
Syntax
lapack_int LAPACKE_slatms (int matrix_layout, lapack_int m, lapack_int n, char dist,
lapack_int * iseed, char sym, float * d, lapack_int mode, float cond, float dmax,
lapack_int kl, lapack_int ku, char pack, float * a, lapack_int lda);
lapack_int LAPACKE_dlatms (int matrix_layout, lapack_int m, lapack_int n, char dist,
lapack_int * iseed, char sym, double * d, lapack_int mode, double cond, double dmax,
lapack_int kl, lapack_int ku, char pack, double * a, lapack_int lda);
lapack_int LAPACKE_clatms (int matrix_layout, lapack_int m, lapack_int n, char dist,
lapack_int * iseed, char sym, float * d, lapack_int mode, float cond, float dmax,
lapack_int kl, lapack_int ku, char pack, lapack_complex_float * a, lapack_int lda);
lapack_int LAPACKE_zlatms (int matrix_layout, lapack_int m, lapack_int n, char dist,
lapack_int * iseed, char sym, double * d, lapack_int mode, double cond, double dmax,
lapack_int kl, lapack_int ku, char pack, lapack_complex_double * a, lapack_int lda);
Include Files
• mkl.h
Description
The ?latms routine generates random matrices with specified singular values, or symmetric/Hermitian
matrices with specified eigenvalues for testing LAPACK programs.
It applies this sequence of operations:
1. Set the diagonal to d, where d is input or computed according to mode, cond, dmax, and sym as
described in Input Parameters.
2. Generate a matrix with the appropriate band structure, by one of two methods:
Method A is chosen if the bandwidth is a large fraction of the order of the matrix, and lda is at least m (so a
dense matrix can be stored.) Method B is chosen if the bandwidth is small (less than (1/2)*n for symmetric
or Hermitian or less than .3*n+m for nonsymmetric), or lda is less than m and not less than the bandwidth.
Pack the matrix if desired, using one of the methods specified by the pack parameter.
If Method B is chosen and band format is specified, then the matrix is generated in the band format and no
repacking is necessary.
Input Parameters
A <datatype> placeholder, if present, is used for the C interface data types in the C interface section above.
See C Interface Conventions for the C interface principal conventions and type definitions.
This array is used to specify the singular values or eigenvalues of A (see the
description of sym). If mode=0, then d is assumed to contain the
eigenvalues or singular values, otherwise elements of d are computed
according to mode, cond, and dmax.
mode < 0 has the same meaning as ABS(mode), except that the order of the
elements of d is reversed. Thus, if mode is positive, d has entries ranging
from 1 to 1/cond; if mode is negative, from 1/cond to 1.
If sym='S' or 'H', and mode is not 0, 6, or -6, then the elements of d are
also given a random sign (multiplied by +1 or -1).
cond Used in setting d as described for the mode parameter. If used, cond ≥ 1.
dmax If mode is not -6, 0, or 6, the contents of d, as computed according to mode
and cond, are scaled by dmax / max(abs(d[i-1])); thus, the maximum
absolute eigenvalue or singular value (the norm) is abs(dmax).
NOTE
dmax need not be positive: if dmax is negative (or zero), d will be
scaled by a negative number (or zero).
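The dmax scaling rule can be sketched in plain C. This only illustrates the arithmetic described for dmax and is not the library's implementation; the helper name scale_by_dmax is hypothetical, and the sketch assumes d has at least one nonzero entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of oneMKL): rescales d[0..len-1] by
   dmax / max(abs(d[i])), as described for the dmax parameter. */
void scale_by_dmax(double *d, size_t len, double dmax)
{
    double maxabs = 0.0;
    for (size_t i = 0; i < len; ++i) {
        double a = d[i] < 0.0 ? -d[i] : d[i];   /* abs(d[i]) */
        if (a > maxabs)
            maxabs = a;
    }
    assert(maxabs > 0.0);   /* the sketch does not handle an all-zero d */

    const double factor = dmax / maxabs;
    for (size_t i = 0; i < len; ++i)
        d[i] *= factor;
}
```

With a negative dmax every entry changes sign, which matches the note above: the largest absolute value still equals abs(dmax).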
kl Specifies the lower bandwidth of the matrix. For example, kl=0 implies
upper triangular, kl=1 implies upper Hessenberg, and kl being at least m -
1 means that the matrix has full lower bandwidth. kl must equal ku if the
matrix is symmetric or Hermitian.
ku Specifies the upper bandwidth of the matrix. For example, ku=0 implies
lower triangular, ku=1 implies lower Hessenberg, and ku being at least n -
1 means that the matrix has full upper bandwidth. kl must equal ku if the
matrix is symmetric or Hermitian.
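As an illustration of how kl and ku delimit the band, the following sketch tests whether a zero-based entry (i, j) can be nonzero. The function name in_band is hypothetical and is not a oneMKL routine.

```c
#include <assert.h>

/* Hypothetical helper: returns 1 if the zero-based entry (i, j) lies
   inside a band with kl subdiagonals and ku superdiagonals, else 0. */
int in_band(int i, int j, int kl, int ku)
{
    assert(kl >= 0 && ku >= 0);
    return (i - j <= kl) && (j - i <= ku);
}
```

With kl=0 no entry below the diagonal is inside the band (upper triangular); with kl=1 only the first subdiagonal is admitted (upper Hessenberg).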
• 'N': no packing
• 'U': zero out all subdiagonal entries (if symmetric or Hermitian)
• 'L': zero out all superdiagonal entries (if symmetric or Hermitian)
• 'B': store the lower triangle in band storage scheme (only if matrix
symmetric, Hermitian, or lower triangular)
• 'Q': store the upper triangle in band storage scheme (only if matrix
symmetric, Hermitian, or upper triangular)
• 'Z': store the entire matrix in band storage scheme (pivoting can be
provided for by using this option to store A in the trailing rows of the
allocated storage)
Using these options, the various LAPACK packed and banded storage
schemes can be obtained:
If two calls to ?latms differ only in the pack parameter, they generate
mathematically equivalent matrices.
lda lda specifies the first dimension of a as declared in the calling program.
If pack='N', 'U', 'L', 'C', or 'R', then lda must be at least m for column major
or at least n for row major.
If pack='Z', lda must be large enough to hold the packed array: MIN( ku,
n - 1) + MIN( kl, m - 1) + 1.
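The two lda requirements stated above can be collected in a small helper. This is a sketch under stated assumptions: the name latms_min_lda and the row_major flag are hypothetical, plain int stands in for lapack_int, and pack values whose requirement is not given above return -1.

```c
/* Hypothetical helper: smallest lda accepted for the pack values covered
   by the text above; returns -1 for any other pack value. */
int latms_min_lda(char pack, int row_major, int m, int n, int kl, int ku)
{
    switch (pack) {
    case 'N': case 'U': case 'L': case 'C': case 'R':
        /* at least m for column major, at least n for row major */
        return row_major ? n : m;
    case 'Z': {
        int up = ku < n - 1 ? ku : n - 1;   /* MIN(ku, n - 1) */
        int lo = kl < m - 1 ? kl : m - 1;   /* MIN(kl, m - 1) */
        return up + lo + 1;
    }
    default:
        return -1;
    }
}
```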
Output Parameters
NOTE
The array d is not modified if mode = 0.
Return Values
This function returns a value info.
If info = 0, the execution is successful.
LAPACK_DECL lapack_int LAPACKE_zhesv_aa_2stage (int matrix_layout , char uplo ,
lapack_int n , lapack_int nrhs , lapack_complex_double * a , lapack_int lda ,
lapack_complex_double * tb , lapack_int ltb , lapack_int * ipiv , lapack_int * ipiv2 ,
lapack_complex_double * b , lapack_int ldb );
LAPACK_DECL lapack_int LAPACKE_chetrf_aa_2stage (int matrix_layout , char uplo ,
lapack_int n , lapack_complex_float * a , lapack_int lda , lapack_complex_float * tb ,
lapack_int ltb , lapack_int * ipiv , lapack_int * ipiv2 );
LAPACK_DECL lapack_int LAPACKE_dsytrf_aa_2stage (int matrix_layout , char uplo ,
lapack_int n , double * a , lapack_int lda , double * tb , lapack_int ltb , lapack_int
* ipiv , lapack_int * ipiv2 );
LAPACK_DECL lapack_int LAPACKE_ssytrf_aa_2stage (int matrix_layout , char uplo ,
lapack_int n , float * a , lapack_int lda , float * tb , lapack_int ltb , lapack_int *
ipiv , lapack_int * ipiv2 );
LAPACK_DECL lapack_int LAPACKE_zhetrf_aa_2stage (int matrix_layout , char uplo ,
lapack_int n , lapack_complex_double * a , lapack_int lda , lapack_complex_double *
tb , lapack_int ltb , lapack_int * ipiv , lapack_int * ipiv2 );
LAPACK_DECL lapack_int LAPACKE_chetrs_aa_2stage (int matrix_layout , char uplo ,
lapack_int n , lapack_int nrhs , lapack_complex_float * a , lapack_int lda ,
lapack_complex_float * tb , lapack_int ltb , lapack_int * ipiv , lapack_int * ipiv2 ,
lapack_complex_float * b , lapack_int ldb );
LAPACK_DECL lapack_int LAPACKE_dsytrs_aa_2stage (int matrix_layout , char uplo ,
lapack_int n , lapack_int nrhs , double * a , lapack_int lda , double * tb , lapack_int
ltb , lapack_int * ipiv , lapack_int * ipiv2 , double * b , lapack_int ldb );
LAPACK_DECL lapack_int LAPACKE_ssytrs_aa_2stage (int matrix_layout , char uplo ,
lapack_int n , lapack_int nrhs , float * a , lapack_int lda , float * tb , lapack_int
ltb , lapack_int * ipiv , lapack_int * ipiv2 , float * b , lapack_int ldb );
LAPACK_DECL lapack_int LAPACKE_zhetrs_aa_2stage (int matrix_layout , char uplo ,
lapack_int n , lapack_int nrhs , lapack_complex_double * a , lapack_int lda ,
lapack_complex_double * tb , lapack_int ltb , lapack_int * ipiv , lapack_int * ipiv2 ,
lapack_complex_double * b , lapack_int ldb );
call csysv_aa_2stage (uplo , n , nrhs , a , lda , tb , ltb , ipiv , ipiv2 , b , ldb ,
info);
LAPACK_DECL lapack_int LAPACKE_csysv_aa_2stage (int matrix_layout , char uplo ,
lapack_int n , lapack_int nrhs , lapack_complex_float * a , lapack_int lda ,
lapack_complex_float * tb , lapack_int ltb , lapack_int * ipiv , lapack_int * ipiv2 ,
lapack_complex_float * b , lapack_int ldb );
call zsysv_aa_2stage (uplo , n , nrhs , a , lda , tb , ltb , ipiv , ipiv2 , b , ldb ,
info);
LAPACK_DECL lapack_int LAPACKE_zsysv_aa_2stage (int matrix_layout , char uplo ,
lapack_int n , lapack_int nrhs , lapack_complex_double * a , lapack_int lda ,
lapack_complex_double * tb , lapack_int ltb , lapack_int * ipiv , lapack_int * ipiv2 ,
lapack_complex_double * b , lapack_int ldb );
LAPACK_DECL lapack_int LAPACKE_csytrf_aa_2stage (int matrix_layout , char uplo ,
lapack_int n , lapack_complex_float * a , lapack_int lda , lapack_complex_float * tb ,
lapack_int ltb , lapack_int * ipiv , lapack_int * ipiv2 );
LAPACK_DECL lapack_int LAPACKE_zsytrf_aa_2stage (int matrix_layout , char uplo ,
lapack_int n , lapack_complex_double * a , lapack_int lda , lapack_complex_double *
tb , lapack_int ltb , lapack_int * ipiv , lapack_int * ipiv2 );
LAPACK_DECL lapack_int LAPACKE_zheevr_2stage (int matrix_layout, char jobz, char range,
char uplo, lapack_int n, lapack_complex_double * a, lapack_int lda, double vl, double
vu, lapack_int il, lapack_int iu, double abstol, lapack_int * m, double * w,
lapack_complex_double * z, lapack_int ldz, lapack_int * isuppz);
LAPACK_DECL lapack_int LAPACKE_cheevx_2stage (int matrix_layout, char jobz, char range,
char uplo, lapack_int n, lapack_complex_float * a, lapack_int lda, float vl, float vu,
lapack_int il, lapack_int iu, float abstol, lapack_int * m, float * w,
lapack_complex_float * z, lapack_int ldz, lapack_int * ifail);
LAPACK_DECL lapack_int LAPACKE_zheevx_2stage (int matrix_layout, char jobz, char range,
char uplo, lapack_int n, lapack_complex_double * a, lapack_int lda, double vl, double
vu, lapack_int il, lapack_int iu, double abstol, lapack_int * m, double * w,
lapack_complex_double * z, lapack_int ldz, lapack_int * ifail);
LAPACK_DECL lapack_int LAPACKE_chegv_2stage (int matrix_layout, lapack_int itype, char
jobz, char uplo, lapack_int n, lapack_complex_float * a, lapack_int lda,
lapack_complex_float * b, lapack_int ldb, float * w);
LAPACK_DECL lapack_int LAPACKE_zhegv_2stage (int matrix_layout, lapack_int itype, char
jobz, char uplo, lapack_int n, lapack_complex_double * a, lapack_int lda,
lapack_complex_double * b, lapack_int ldb, double * w);
LAPACK_DECL lapack_int LAPACKE_ssbev_2stage (int matrix_layout, char jobz, char uplo,
lapack_int n, lapack_int kd, float * ab, lapack_int ldab, float * w, float * z,
lapack_int ldz);
LAPACK_DECL lapack_int LAPACKE_dsbev_2stage (int matrix_layout, char jobz, char uplo,
lapack_int n, lapack_int kd, double * ab, lapack_int ldab, double * w, double * z,
lapack_int ldz);
LAPACK_DECL lapack_int LAPACKE_ssbevd_2stage (int matrix_layout, char jobz, char uplo,
lapack_int n, lapack_int kd, float * ab, lapack_int ldab, float * w, float * z,
lapack_int ldz);
LAPACK_DECL lapack_int LAPACKE_dsbevd_2stage (int matrix_layout, char jobz, char uplo,
lapack_int n, lapack_int kd, double * ab, lapack_int ldab, double * w, double * z,
lapack_int ldz);
LAPACK_DECL lapack_int LAPACKE_ssbevx_2stage (int matrix_layout, char jobz, char range,
char uplo, lapack_int n, lapack_int kd, float * ab, lapack_int ldab, float * q,
lapack_int ldq, float vl, float vu, lapack_int il, lapack_int iu, float abstol,
lapack_int * m, float * w, float * z, lapack_int ldz, lapack_int * ifail);
LAPACK_DECL lapack_int LAPACKE_dsbevx_2stage (int matrix_layout, char jobz, char range,
char uplo, lapack_int n, lapack_int kd, double * ab, lapack_int ldab, double * q,
lapack_int ldq, double vl, double vu, lapack_int il, lapack_int iu, double abstol,
lapack_int * m, double * w, double * z, lapack_int ldz, lapack_int * ifail);
LAPACK_DECL lapack_int LAPACKE_chbev_2stage (int matrix_layout, char jobz, char uplo,
lapack_int n, lapack_int kd, lapack_complex_float * ab, lapack_int ldab, float * w,
lapack_complex_float * z, lapack_int ldz);
LAPACK_DECL lapack_int LAPACKE_zhbev_2stage (int matrix_layout, char jobz, char uplo,
lapack_int n, lapack_int kd, lapack_complex_double * ab, lapack_int ldab, double * w,
lapack_complex_double * z, lapack_int ldz);
LAPACK_DECL lapack_int LAPACKE_chbevd_2stage (int matrix_layout, char jobz, char uplo,
lapack_int n, lapack_int kd, lapack_complex_float * ab, lapack_int ldab, float * w,
lapack_complex_float * z, lapack_int ldz);
LAPACK_DECL lapack_int LAPACKE_zhbevd_2stage (int matrix_layout, char jobz, char uplo,
lapack_int n, lapack_int kd, lapack_complex_double * ab, lapack_int ldab, double * w,
lapack_complex_double * z, lapack_int ldz);
ScaLAPACK Routines
Intel® oneAPI Math Kernel Library implements routines from the ScaLAPACK package for distributed-memory
architectures. Routines are supported for both real and complex dense and band matrices to perform the
tasks of solving systems of linear equations, solving linear least-squares problems, eigenvalue and singular
value problems, as well as performing a number of related computational tasks.
Intel® oneAPI Math Kernel Library (oneMKL) ScaLAPACK routines are written in FORTRAN 77, with the exception
of a few utility routines written in C to exploit IEEE arithmetic. All routines are available in all four precision
types: single precision, double precision, complex, and double complex. See
the mkl_scalapack.h header file for C declarations of ScaLAPACK routines.
NOTE
ScaLAPACK routines are provided only for Intel® 64 or Intel® Many Integrated Core architectures.
See the descriptions of ScaLAPACK computational routines that perform distinct computational tasks, as well as
driver routines for solving standard types of problems in one call. Additionally, Intel® oneAPI Math Kernel
Library implements ScaLAPACK Auxiliary Routines, Utility Functions and Routines, and Matrix Redistribution/
Copy Routines. The library includes routines for both real and complex data.
The <install_directory>/examples/scalapackf directory contains sample code demonstrating the use
of ScaLAPACK routines.
Generally, ScaLAPACK runs on a network of computers using MPI as a message-passing layer and a set of
prebuilt communication subprograms (BLACS), as well as a set of BLAS optimized for the target architecture.
The Intel® oneAPI Math Kernel Library (oneMKL) version of ScaLAPACK is optimized for Intel® processors. For the
detailed system and environment requirements, see the Intel® oneAPI Math Kernel Library (oneMKL) Release
Notes and the Intel® oneAPI Math Kernel Library (oneMKL) Developer Guide.
For full reference on ScaLAPACK routines and related information, see [SLUG].
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.
Notice revision #20201201
dtype_a desca[dtype_] Descriptor type (=1 for dense matrices). 0
ctxt_a desca[ctxt_] BLACS context handle for the process grid. 1
m_a desca[m_] Number of rows in the global matrix A. 2
n_a desca[n_] Number of columns in the global matrix A. 3
mb_a desca[mb_] Row blocking factor. 4
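A minimal sketch of filling a dense-matrix descriptor by hand is shown below. The zero-based positions 0 to 4 follow the rows above. Positions 5 to 8 (nb_, rsrc_, csrc_, lld_) are not shown in this excerpt and are assumed here to follow the usual ScaLAPACK dense descriptor layout. The helper name fill_dense_desc is hypothetical, and plain int stands in for MKL_INT.

```c
#include <assert.h>

/* Zero-based positions in a dense-matrix descriptor. Entries 0..4 follow
   the table above; entries 5..8 are the usual ScaLAPACK layout (assumed). */
enum { DTYPE_ = 0, CTXT_ = 1, M_ = 2, N_ = 3, MB_ = 4,
       NB_ = 5, RSRC_ = 6, CSRC_ = 7, LLD_ = 8, DLEN_ = 9 };

/* Hypothetical helper: fills desc[0..DLEN_-1] directly. Real code would
   normally let a ScaLAPACK tool routine initialize and validate the
   descriptor instead of writing the entries by hand. */
void fill_dense_desc(int *desc, int ctxt, int m, int n, int mb, int nb,
                     int rsrc, int csrc, int lld)
{
    assert(m >= 0 && n >= 0 && mb > 0 && nb > 0 && lld > 0);
    desc[DTYPE_] = 1;      /* 1 = dense matrix */
    desc[CTXT_]  = ctxt;   /* BLACS context handle */
    desc[M_]     = m;      /* rows of the global matrix */
    desc[N_]     = n;      /* columns of the global matrix */
    desc[MB_]    = mb;     /* row blocking factor */
    desc[NB_]    = nb;     /* column blocking factor (assumed position) */
    desc[RSRC_]  = rsrc;   /* process row owning the first row (assumed) */
    desc[CSRC_]  = csrc;   /* process column owning the first column (assumed) */
    desc[LLD_]   = lld;    /* leading dimension of the local array (assumed) */
}
```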
Similar notations are used for different matrices. For example: lld_b is the leading dimension of the local
matrix storing the local blocks of the distributed matrix B and dtype_z is the type of the global matrix Z.
The numbers of rows and columns of a global dense matrix that a particular process in a grid receives after
the data is distributed are denoted by LOCr() and LOCc(), respectively. To compute these numbers, you can use the
ScaLAPACK tool routine numroc.
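To make the LOCr()/LOCc() counts concrete, the following is a serial sketch of the block-cyclic counting rule that numroc is generally understood to apply. It is a re-implementation for illustration, not the library routine: the name numroc_sketch and the by-value int arguments are assumptions, and production code should call numroc itself.

```c
#include <assert.h>

/* Illustrative re-implementation (hypothetical name): number of rows or
   columns of a dimension of global size n, distributed block-cyclically
   with block size nb, that process iproc out of nprocs owns when process
   isrcproc holds the first block. */
int numroc_sketch(int n, int nb, int iproc, int isrcproc, int nprocs)
{
    assert(n >= 0 && nb > 0 && nprocs > 0);
    int mydist    = (nprocs + iproc - isrcproc) % nprocs; /* distance from source */
    int nblocks   = n / nb;                   /* number of whole blocks */
    int count     = (nblocks / nprocs) * nb;  /* share every process receives */
    int extrablks = nblocks % nprocs;         /* leftover whole blocks */

    if (mydist < extrablks)
        count += nb;       /* this process gets one extra whole block */
    else if (mydist == extrablks)
        count += n % nb;   /* this process gets the trailing partial block */
    return count;
}
```

For example, 10 rows in blocks of 2 over 3 processes give local counts 4, 4, and 2, which sum to the global size.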
After the block-cyclic distribution of global data is done, you may choose to perform an operation on a
submatrix sub(A) of the global matrix A defined by the following 6 values (for dense matrices):
desca The array descriptor for the global matrix A
The second and third letters yy indicate the matrix type as:
ge general
gb general band
sy symmetric
he Hermitian
or orthogonal
tz trapezoidal
un unitary
For computational routines, the last three letters zzz indicate the computation performed and have the same
meaning as for LAPACK routines.
For driver routines, the last two letters zz or three letters zzz have the following meaning:
evd a simple driver for solving an eigenvalue problem using a divide and conquer
algorithm
gvx an expert driver for solving a generalized symmetric definite eigenvalue problem
A simple driver here means that the driver just solves the general problem, whereas an expert driver is more
versatile and can also optionally perform some related computations (for example, refining the
solution and computing error bounds after the linear system is solved).
Table “Computational Routines for Systems of Linear Equations” lists the ScaLAPACK computational routines
for factorizing, equilibrating, and inverting matrices, estimating their condition numbers, solving systems of
equations with real matrices, refining the solution, and estimating its error.
Computational Routines for Systems of Linear Equations
Matrix type, storage scheme                         Factorize  Equilibrate  Solve    Condition  Estimate  Invert
                                                    matrix     matrix       system   number     error     matrix
general (partial pivoting)                          p?getrf    p?geequ      p?getrs  p?gecon    p?gerfs   p?getri
general band (partial pivoting)                     p?gbtrf    -            p?gbtrs  -          -         -
general band (no pivoting)                          p?dbtrf    -            p?dbtrs  -          -         -
general tridiagonal (no pivoting)                   p?dttrf    -            p?dttrs  -          -         -
symmetric/Hermitian positive-definite               p?potrf    p?poequ      p?potrs  p?pocon    p?porfs   p?potri
symmetric/Hermitian positive-definite, band         p?pbtrf    -            p?pbtrs  -          -         -
symmetric/Hermitian positive-definite, tridiagonal  p?pttrf    -            p?pttrs  -          -         -
triangular                                          -          -            p?trtrs  p?trcon    p?trrfs   p?trtri
In this table ? stands for s (single precision real), d (double precision real), c (single precision complex), or z
(double precision complex).
p?getrf
Computes the LU factorization of a general m-by-n
distributed matrix.
Syntax
void psgetrf (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , MKL_INT *ipiv , MKL_INT *info );
void pdgetrf (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , MKL_INT *ipiv , MKL_INT *info );
void pcgetrf (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *ipiv , MKL_INT *info );
void pzgetrf (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *ipiv , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?getrf function forms the LU factorization of a general m-by-n distributed matrix
sub(A) = A(ia:ia+m-1, ja:ja+n-1) as
sub(A) = P*L*U
where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n),
and U is upper triangular (upper trapezoidal if m < n). L and U are stored in sub(A).
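The in-place storage of L and U can be illustrated with a serial, unblocked LU factorization with partial pivoting. This sketch is not the algorithm used by p?getrf, which operates on a distributed matrix; it is restricted to a square column-major matrix held by one process, and the name lu_sketch is hypothetical.

```c
#include <assert.h>

/* Serial illustration only. Factors the n-by-n column-major matrix a
   (leading dimension n) in place. On exit the strict lower triangle holds
   L (its unit diagonal is implied), the upper triangle holds U, and
   ipiv[k] is the 1-based row that was swapped with row k+1.
   Returns 0 on success, or k+1 if the k-th pivot is exactly zero. */
int lu_sketch(int n, double *a, int *ipiv)
{
    assert(n >= 0);
    for (int k = 0; k < n; ++k) {
        /* pivot search: largest |a(i,k)| for i >= k */
        int p = k;
        double best = a[k + k * n] < 0.0 ? -a[k + k * n] : a[k + k * n];
        for (int i = k + 1; i < n; ++i) {
            double v = a[i + k * n] < 0.0 ? -a[i + k * n] : a[i + k * n];
            if (v > best) { best = v; p = i; }
        }
        ipiv[k] = p + 1;
        if (best == 0.0)
            return k + 1;

        /* swap rows k and p across all columns */
        if (p != k) {
            for (int j = 0; j < n; ++j) {
                double t = a[k + j * n];
                a[k + j * n] = a[p + j * n];
                a[p + j * n] = t;
            }
        }

        /* store multipliers below the pivot, update the trailing block */
        for (int i = k + 1; i < n; ++i) {
            a[i + k * n] /= a[k + k * n];
            for (int j = k + 1; j < n; ++j)
                a[i + j * n] -= a[i + k * n] * a[k + j * n];
        }
    }
    return 0;
}
```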
NOTE
This function supports the Progress Routine feature. See mkl_progress for details.
Input Parameters
a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
Output Parameters
ipiv Contains the pivoting information: local row i was interchanged with global
row ipiv[i-1]. This array is tied to the distributed matrix A.
info (global)
If info=0, the execution is successful.
info < 0: if the i-th argument is an array and the j-th entry, indexed j - 1,
had an illegal value, then info = -(i*100+j); if the i-th argument is a
scalar and had an illegal value, then info = -i.
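The encoding of negative info values can be unpacked as follows. This is a sketch with a hypothetical helper name; it assumes the entry number j and the positions of scalar arguments are both below 100, which is what makes the two encodings distinguishable.

```c
#include <assert.h>

/* Hypothetical helper: splits a negative info value into the 1-based
   argument position and, for array arguments, the 1-based entry number j
   (stored at zero-based index j - 1). *entry is 0 for scalar arguments. */
void decode_negative_info(int info, int *arg, int *entry)
{
    assert(info < 0);
    int code = -info;
    if (code >= 100) {      /* info = -(i*100 + j): array argument */
        *arg   = code / 100;
        *entry = code % 100;
    } else {                /* info = -i: scalar argument */
        *arg   = code;
        *entry = 0;
    }
}
```

For p?getrf, info = -602 would therefore point at the sixth argument (desca) and its second entry, desca[1], while info = -2 would point at the scalar n.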
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?gbtrf
Computes the LU factorization of a general n-by-n
banded distributed matrix.
Syntax
void psgbtrf (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , float *a , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *ipiv , float *af , MKL_INT *laf , float *work , MKL_INT
*lwork , MKL_INT *info );
void pdgbtrf (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , double *a , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *ipiv , double *af , MKL_INT *laf , double *work , MKL_INT
*lwork , MKL_INT *info );
void pcgbtrf (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_Complex8 *a , MKL_INT
*ja , MKL_INT *desca , MKL_INT *ipiv , MKL_Complex8 *af , MKL_INT *laf , MKL_Complex8
*work , MKL_INT *lwork , MKL_INT *info );
void pzgbtrf (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_Complex16 *a , MKL_INT
*ja , MKL_INT *desca , MKL_INT *ipiv , MKL_Complex16 *af , MKL_INT *laf , MKL_Complex16
*work , MKL_INT *lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?gbtrf function computes the LU factorization of a general n-by-n real/complex banded distributed
matrix A(1:n, ja:ja+n-1) using partial pivoting with row interchanges.
The resulting factorization is not the same factorization as returned from the LAPACK function ?gbtrf.
Additional permutations are performed on the matrix for the sake of parallelism.
The factorization has the form
A(1:n, ja:ja+n-1) = P*L*U*Q
where P and Q are permutation matrices, and L and U are banded lower and upper triangular matrices,
respectively. The matrix Q represents reordering of columns for the sake of parallelism, while P represents
reordering of rows for numerical stability using classic partial pivoting.
Input Parameters
a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1)
where
ja (global) The index in the global matrix A indicating the start of the matrix to
be operated on (which may be either all of A or a submatrix of A).
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
If dtype_a = 501, then dlen_≥ 7;
If laf is not large enough, an error code will be returned and the minimum
acceptable size will be returned in af[0].
lwork (local or global) The size of the work array (lwork ≥ 1). If lwork is too
small, the minimum acceptable size is returned in work[0] and an error
code is returned.
Output Parameters
a On exit, this array contains details of the factorization. Note that additional
permutations are performed on the matrix, so that the factors returned are
different from those returned by LAPACK.
ipiv Contains pivot indices for local factorizations. Note that you should not alter
the contents of this array between factorization and solve.
af (local)
Array of size laf.
Auxiliary fill-in space. The fill-in space is created in a call to the factorization
function p?gbtrf and is stored in af.
info (global)
If info=0, the execution is successful.
info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.
info > 0:
If info = k ≤ NPROCS, the submatrix stored on processor info and
factored locally was singular, and the factorization was not
completed.
If info = k > NPROCS, the submatrix stored on processor info-NPROCS
representing interactions with other processors was singular, and
the factorization was not completed.
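The two positive ranges of info described above can be told apart with a small helper. The function name failing_processor is hypothetical; the caller is assumed to know NPROCS, the number of processes in the grid.

```c
#include <assert.h>

/* Hypothetical helper: for info > 0, returns the processor number named in
   the description above. *interaction is set to 1 when the failing
   submatrix is the one representing interactions with other processors
   (info > nprocs), and to 0 when it is the locally factored submatrix. */
int failing_processor(int info, int nprocs, int *interaction)
{
    assert(info > 0 && nprocs > 0);
    *interaction = info > nprocs;
    return *interaction ? info - nprocs : info;
}
```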
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?dbtrf
Computes the LU factorization of an n-by-n diagonally
dominant-like banded distributed matrix.
Syntax
void psdbtrf (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , float *a , MKL_INT *ja ,
MKL_INT *desca , float *af , MKL_INT *laf , float *work , MKL_INT *lwork , MKL_INT
*info );
void pddbtrf (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , double *a , MKL_INT *ja ,
MKL_INT *desca , double *af , MKL_INT *laf , double *work , MKL_INT *lwork , MKL_INT
*info );
void pcdbtrf (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_Complex8 *a , MKL_INT
*ja , MKL_INT *desca , MKL_Complex8 *af , MKL_INT *laf , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzdbtrf (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_Complex16 *a , MKL_INT
*ja , MKL_INT *desca , MKL_Complex16 *af , MKL_INT *laf , MKL_Complex16 *work , MKL_INT
*lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?dbtrf function computes the LU factorization of an n-by-n real/complex diagonally dominant-like
banded distributed matrix A(1:n, ja:ja+n-1) without pivoting.
NOTE
A matrix is called diagonally dominant-like if pivoting is not required for LU to be
numerically stable.
Note that the resulting factorization is not the same factorization as returned from LAPACK. Additional
permutations are performed on the matrix for the sake of parallelism.
Input Parameters
a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).
Contains the local pieces of the n-by-n distributed banded matrix A(1:n,
ja:ja+n-1) to be factored.
ja (global) The index in the global matrix A indicating the start of the matrix to
be operated on (which may be either all of A or a submatrix of A).
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
If dtype_a = 501, then dlen_≥ 7;
If laf is not large enough, an error code will be returned and the minimum
acceptable size will be returned in af[0].
lwork (local or global) The size of the work array; must be lwork ≥
max(bwl,bwu)*max(bwl,bwu). If lwork is too small, the minimum acceptable size is
returned in work[0] and an error code is returned.
Output Parameters
a On exit, this array contains details of the factorization. Note that additional
permutations are performed on the matrix, so that the factors returned are
different from those returned by LAPACK.
af (local)
Array of size laf.
Auxiliary fill-in space. The fill-in space is created in a call to the factorization
function p?dbtrf and is stored in af.
work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.
info (global)
If info=0, the execution is successful.
info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.
info > 0:
If info = k ≤ NPROCS, the submatrix stored on processor info and
factored locally was not diagonally dominant-like, and the factorization was
not completed.
If info = k > NPROCS, the submatrix stored on processor info-NPROCS
representing interactions with other processors was singular, and
the factorization was not completed.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?dttrf
Computes the LU factorization of a diagonally
dominant-like tridiagonal distributed matrix.
Syntax
void psdttrf (MKL_INT *n , float *dl , float *d , float *du , MKL_INT *ja , MKL_INT
*desca , float *af , MKL_INT *laf , float *work , MKL_INT *lwork , MKL_INT *info );
void pddttrf (MKL_INT *n , double *dl , double *d , double *du , MKL_INT *ja , MKL_INT
*desca , double *af , MKL_INT *laf , double *work , MKL_INT *lwork , MKL_INT *info );
void pcdttrf (MKL_INT *n , MKL_Complex8 *dl , MKL_Complex8 *d , MKL_Complex8 *du ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *af , MKL_INT *laf , MKL_Complex8 *work ,
MKL_INT *lwork , MKL_INT *info );
void pzdttrf (MKL_INT *n , MKL_Complex16 *dl , MKL_Complex16 *d , MKL_Complex16 *du ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *af , MKL_INT *laf , MKL_Complex16 *work ,
MKL_INT *lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?dttrf function computes the LU factorization of an n-by-n real/complex diagonally dominant-like
tridiagonal distributed matrix A(1:n, ja:ja+n-1) without pivoting for stability.
The resulting factorization is not the same factorization as returned from LAPACK. Additional permutations
are performed on the matrix for the sake of parallelism.
The factorization has the form:
A(1:n, ja:ja+n-1) = P*L*U*P^T,
where P is a permutation matrix, and L and U are banded lower and upper triangular matrices, respectively.
Input Parameters
n (global) The number of rows and columns to be operated on, that is, the
order of the distributed submatrix A(1:n, ja:ja+n-1) (n≥ 0).
dl, d, du (local)
Pointers to the local arrays of size nb_a each.
On entry, the array dl contains the local part of the global vector storing
the subdiagonal elements of the matrix. Globally, dl[0] is not referenced,
and dl must be aligned with d.
On entry, the array d contains the local part of the global vector storing the
diagonal elements of the matrix.
On entry, the array du contains the local part of the global vector storing
the super-diagonal elements of the matrix. du[n-1] is not referenced, and
du must be aligned with d.
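The alignment of dl, d, and du can be illustrated by filling the three global vectors for a simple tridiagonal matrix. The sketch below shows only the global view on a single process; the block distribution across processes is not shown. The helper name fill_laplacian_tridiag and the choice of matrix (2 on the diagonal, -1 beside it) are assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical helper: fills three global vectors of length n. Index i of
   each vector refers to row i of the matrix, so dl[0] and du[n-1] are
   padding entries that are not referenced. */
void fill_laplacian_tridiag(int n, double *dl, double *d, double *du)
{
    assert(n > 0);
    for (int i = 0; i < n; ++i) {
        dl[i] = (i > 0)     ? -1.0 : 0.0;  /* A(i, i-1); dl[0] is padding */
        d[i]  = 2.0;                       /* A(i, i) */
        du[i] = (i < n - 1) ? -1.0 : 0.0;  /* A(i, i+1); du[n-1] is padding */
    }
}
```

Keeping all three vectors the same length, with padding at dl[0] and du[n-1], is what lets an index i address the same matrix row in each of them.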
ja (global) The index in the global matrix A indicating the start of the matrix to
be operated on (which may be either all of A or a submatrix of A).
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
If dtype_a = 501, then dlen_≥ 7;
If laf is not large enough, an error code will be returned and the minimum
acceptable size will be returned in af[0].
lwork (local or global) The size of the work array; must be at least
8*NPCOL (lwork ≥ 8*NPCOL).
Output Parameters
dl, d, du On exit, overwritten by the information containing the factors of the matrix.
af (local)
Array of size laf.
Auxiliary fill-in space. The fill-in space is created in a call to the factorization
function p?dttrf and is stored in af.
Note that if a linear system is to be solved using p?dttrs after the
factorization function, af must not be altered.
work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.
info (global)
If info=0, the execution is successful.
info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.
info > 0:
If info = k ≤ NPROCS, the submatrix stored on processor info and
factored locally was not diagonally dominant-like, and the factorization was
not completed.
If info = k > NPROCS, the submatrix stored on processor info-NPROCS
representing interactions with other processors was singular, and
the factorization was not completed.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?potrf
Computes the Cholesky factorization of a symmetric
(Hermitian) positive-definite distributed matrix.
Syntax
void pspotrf (char *uplo , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , MKL_INT *info );
void pdpotrf (char *uplo , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , MKL_INT *info );
void pcpotrf (char *uplo , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *info );
void pzpotrf (char *uplo , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?potrf function computes the Cholesky factorization of a real symmetric or complex Hermitian
positive-definite distributed n-by-n matrix A(ia:ia+n-1, ja:ja+n-1), denoted below as sub(A).
Input Parameters
uplo (global)
Indicates whether the upper or lower triangular part of sub(A) is stored.
Must be 'U' or 'L'.
If uplo = 'U', the array a stores the upper triangular part of the matrix
sub(A) that is factored as U^H*U.
If uplo = 'L', the array a stores the lower triangular part of the
matrix sub(A) that is factored as L*L^H.
n (global) The order of the distributed matrix sub(A) (n≥0).
a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1).
On entry, this array contains the local pieces of the n-by-n symmetric/
Hermitian distributed matrix sub(A) to be factored.
Depending on uplo, the array a contains either the upper or the lower
triangular part of the matrix sub(A) (see uplo).
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
Output Parameters
info (global)
If info=0, the execution is successful.
info < 0: if the i-th argument is an array, and the j-th entry, indexed j -
1, had an illegal value, then info = -(i*100+j); if the i-th argument is a
scalar and had an illegal value, then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?pbtrf
Computes the Cholesky factorization of a symmetric
(Hermitian) positive-definite banded distributed
matrix.
Syntax
void pspbtrf (char *uplo , MKL_INT *n , MKL_INT *bw , float *a , MKL_INT *ja , MKL_INT
*desca , float *af , MKL_INT *laf , float *work , MKL_INT *lwork , MKL_INT *info );
void pdpbtrf (char *uplo , MKL_INT *n , MKL_INT *bw , double *a , MKL_INT *ja , MKL_INT
*desca , double *af , MKL_INT *laf , double *work , MKL_INT *lwork , MKL_INT *info );
void pcpbtrf (char *uplo , MKL_INT *n , MKL_INT *bw , MKL_Complex8 *a , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex8 *af , MKL_INT *laf , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzpbtrf (char *uplo , MKL_INT *n , MKL_INT *bw , MKL_Complex16 *a , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex16 *af , MKL_INT *laf , MKL_Complex16 *work , MKL_INT
*lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?pbtrf function computes the Cholesky factorization of an n-by-n real symmetric or complex Hermitian
positive-definite banded distributed matrix A(1:n, ja:ja+n-1).
The resulting factorization is not the same factorization as returned from LAPACK. Additional permutations
are performed on the matrix for the sake of parallelism.
The factorization has the form:
A(1:n, ja:ja+n-1) = P*U^H*U*P^T, if uplo='U', or
A(1:n, ja:ja+n-1) = P*L*L^H*P^T, if uplo='L',
where P is a permutation matrix and U and L are banded upper and lower triangular matrices, respectively.
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/
PerformanceIndex.
Notice revision #20201201
Input Parameters
n (global) The order of the distributed matrix A(1:n, ja:ja+n-1) (n≥0).
bw (global)
The number of superdiagonals of the distributed matrix if uplo = 'U', or
the number of subdiagonals if uplo = 'L' (bw≥0).
a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1).
On entry, this array contains the local pieces of the upper or lower triangle
of the symmetric/Hermitian band distributed matrix A(1:n, ja:ja+n-1) to
be factored.
ja (global) The index in the global matrix A indicating the start of the matrix to
be operated on (which may be either all of A or a submatrix of A).
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
If dtype_a = 501, then dlen_≥ 7;
If laf is not large enough, an error code will be returned and the minimum
acceptable size will be returned in af[0].
lwork (local or global) The size of the work array; must be lwork ≥ bw^2.
Output Parameters
af (local)
Array of size laf. Auxiliary fill-in space. The fill-in space is created in a call
to the factorization function p?pbtrf and stored in af. Note that if a linear
system is to be solved using p?pbtrs after the factorization function, af
must not be altered.
work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.
info (global)
If info=0, the execution is successful.
info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.
info>0:
If info = k ≤ NPROCS, the submatrix stored on processor info and
factored locally was not positive definite, and the factorization was not
completed.
If info = k > NPROCS, the submatrix stored on processor info-NPROCS
representing interactions with other processors was not nonsingular, and
the factorization was not completed.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?pttrf
Computes the Cholesky factorization of a symmetric
(Hermitian) positive-definite tridiagonal distributed
matrix.
Syntax
void pspttrf (MKL_INT *n , float *d , float *e , MKL_INT *ja , MKL_INT *desca , float
*af , MKL_INT *laf , float *work , MKL_INT *lwork , MKL_INT *info );
void pdpttrf (MKL_INT *n , double *d , double *e , MKL_INT *ja , MKL_INT *desca , double
*af , MKL_INT *laf , double *work , MKL_INT *lwork , MKL_INT *info );
void pcpttrf (MKL_INT *n , float *d , MKL_Complex8 *e , MKL_INT *ja , MKL_INT *desca ,
MKL_Complex8 *af , MKL_INT *laf , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT
*info );
void pzpttrf (MKL_INT *n , double *d , MKL_Complex16 *e , MKL_INT *ja , MKL_INT
*desca , MKL_Complex16 *af , MKL_INT *laf , MKL_Complex16 *work , MKL_INT *lwork ,
MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?pttrf function computes the Cholesky factorization of an n-by-n real symmetric or complex Hermitian
positive-definite tridiagonal distributed matrix A(1:n, ja:ja+n-1).
The resulting factorization is not the same factorization as returned from LAPACK. Additional permutations
are performed on the matrix for the sake of parallelism.
The factorization has the form:
A(1:n, ja:ja+n-1) = P*L*D*L^H*P^T, or
A(1:n, ja:ja+n-1) = P*U^H*D*U*P^T,
where P is a permutation matrix, and U and L are tridiagonal upper and lower triangular matrices,
respectively.
Input Parameters
n (global) The order of the distributed matrix A(1:n, ja:ja+n-1) (n≥0).
d, e (local)
Pointers into the local memory to arrays of size nb_a each.
On entry, the array d contains the local part of the global vector storing the
main diagonal of the distributed matrix A.
On entry, the array e contains the local part of the global vector storing the
upper diagonal of the distributed matrix A.
ja (global) The index in the global matrix A indicating the start of the matrix to
be operated on (which may be either all of A or a submatrix of A).
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
If dtype_a = 501, then dlen_≥ 7;
Must be laf≥nb_a+2.
If laf is not large enough, an error code will be returned and the minimum
acceptable size will be returned in af[0].
lwork (local or global) The size of the work array; must be
lwork ≥ 8*NPCOL.
Output Parameters
af (local)
Array of size laf.
Auxiliary fill-in space. The fill-in space is created in a call to the factorization
function p?pttrf and stored in af.
work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.
info (global)
If info=0, the execution is successful.
info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.
info> 0:
If info = k ≤ NPROCS, the submatrix stored on processor info and
factored locally was not positive definite, and the factorization was not
completed.
If info = k > NPROCS, the submatrix stored on processor info-NPROCS
representing interactions with other processors was not nonsingular, and
the factorization was not completed.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?getrs
Solves a system of distributed linear equations with a
general square matrix, using the LU factorization
computed by p?getrf.
Syntax
void psgetrs (char *trans , MKL_INT *n , MKL_INT *nrhs , float *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , float *b , MKL_INT *ib , MKL_INT *jb ,
MKL_INT *descb , MKL_INT *info );
void pdgetrs (char *trans , MKL_INT *n , MKL_INT *nrhs , double *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , double *b , MKL_INT *ib , MKL_INT *jb ,
MKL_INT *descb , MKL_INT *info );
void pcgetrs (char *trans , MKL_INT *n , MKL_INT *nrhs , MKL_Complex8 *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , MKL_Complex8 *b , MKL_INT *ib ,
MKL_INT *jb , MKL_INT *descb , MKL_INT *info );
void pzgetrs (char *trans , MKL_INT *n , MKL_INT *nrhs , MKL_Complex16 *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , MKL_Complex16 *b , MKL_INT *ib ,
MKL_INT *jb , MKL_INT *descb , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?getrs function solves a system of distributed linear equations with a general n-by-n distributed matrix
sub(A) = A(ia:ia+n-1, ja:ja+n-1) using the LU factorization computed by p?getrf.
Before calling this function, you must call p?getrf to compute the LU factorization of sub(A).
Input Parameters
n (global) The number of linear equations; the order of the matrix sub(A)
(n≥0).
nrhs (global) The number of right hand sides; the number of columns of the
distributed matrix sub(B) (nrhs≥0).
a, b (local)
Pointers into the local memory to arrays of local sizes lld_a*LOCc(ja+n-1)
and lld_b*LOCc(jb+nrhs-1), respectively.
On entry, the array a contains the local pieces of the factors L and U from
the factorization sub(A) = P*L*U; the unit diagonal elements of L are not
stored. On entry, the array b contains the right hand sides sub(B).
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of the matrix sub(B), respectively.
descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.
Output Parameters
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?gbtrs
Solves a system of distributed linear equations with a
general band matrix, using the LU factorization
computed by p?gbtrf.
Syntax
void psgbtrs (char *trans , MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs ,
float *a , MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , float *b , MKL_INT *ib ,
MKL_INT *descb , float *af , MKL_INT *laf , float *work , MKL_INT *lwork , MKL_INT
*info );
void pdgbtrs (char *trans , MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs ,
double *a , MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , double *b , MKL_INT *ib ,
MKL_INT *descb , double *af , MKL_INT *laf , double *work , MKL_INT *lwork , MKL_INT
*info );
void pcgbtrs (char *trans , MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs ,
MKL_Complex8 *a , MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , MKL_Complex8 *b ,
MKL_INT *ib , MKL_INT *descb , MKL_Complex8 *af , MKL_INT *laf , MKL_Complex8 *work ,
MKL_INT *lwork , MKL_INT *info );
void pzgbtrs (char *trans , MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs ,
MKL_Complex16 *a , MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , MKL_Complex16 *b ,
MKL_INT *ib , MKL_INT *descb , MKL_Complex16 *af , MKL_INT *laf , MKL_Complex16 *work ,
MKL_INT *lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?gbtrs function solves a system of distributed linear equations with a general band distributed matrix
sub(A) = A(1:n, ja:ja+n-1) using the LU factorization computed by p?gbtrf.
Before calling this function, you must call p?gbtrf to compute the LU factorization of sub(A).
Input Parameters
n (global) The number of linear equations; the order of the distributed matrix
sub(A) (n≥ 0).
nrhs (global) The number of right hand sides; the number of columns of the
distributed matrix sub(B) (nrhs≥ 0).
a, b (local)
ja (global) The index in the global matrix A indicating the start of the matrix to
be operated on (which may be either all of A or a submatrix of A).
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
If dtype_a = 501, then dlen_≥ 7;
ib (global) The row index in the global matrix B indicating the first row of the
matrix to be operated on (which may be either all of B or a submatrix of B).
descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.
If dtype_b = 502, then dlen_≥ 7;
Must be laf≥nb_a*(bwl+bwu)+6*(bwl+bwu)*(bwl+2*bwu).
If laf is not large enough, an error code will be returned and the minimum
acceptable size will be returned in af[0].
lwork (local or global) The size of the work array; must be
lwork ≥ nrhs*(nb_a+2*bwl+4*bwu).
Output Parameters
Contains pivot indices for local factorizations. Note that you should not alter
the contents of this array between factorization and solve.
af (local)
Array of size laf.
work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.
info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?dbtrs
Solves a system of linear equations with a diagonally
dominant-like banded distributed matrix using the
factorization computed by p?dbtrf.
Syntax
void psdbtrs (char *trans , MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs ,
float *a , MKL_INT *ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *descb ,
float *af , MKL_INT *laf , float *work , MKL_INT *lwork , MKL_INT *info );
void pddbtrs (char *trans , MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs ,
double *a , MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *descb ,
double *af , MKL_INT *laf , double *work , MKL_INT *lwork , MKL_INT *info );
void pcdbtrs (char *trans , MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs ,
MKL_Complex8 *a , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib ,
MKL_INT *descb , MKL_Complex8 *af , MKL_INT *laf , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzdbtrs (char *trans , MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs ,
MKL_Complex16 *a , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib ,
MKL_INT *descb , MKL_Complex16 *af , MKL_INT *laf , MKL_Complex16 *work , MKL_INT
*lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?dbtrs function solves for X one of the systems of equations:
sub(A)*X = sub(B),
(sub(A))^T*X = sub(B), or
(sub(A))^H*X = sub(B),
where sub(A) = A(1:n, ja:ja+n-1) is a diagonally dominant-like banded distributed matrix, and sub(B)
denotes the distributed matrix B(ib:ib+n-1, 1:nrhs).
Input Parameters
nrhs (global) The number of right hand sides; the number of columns of the
distributed matrix sub(B) (nrhs≥ 0).
a, b (local)
Pointers into the local memory to arrays of local sizes lld_a*LOCc(ja+n-1)
and lld_b*LOCc(nrhs), respectively.
On entry, the array b contains the local pieces of the right hand side
distributed matrix sub(B).
ja (global) The index in the global matrix A indicating the start of the matrix to
be operated on (which may be either all of A or a submatrix of A).
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
If dtype_a = 501, then dlen_≥ 7;
ib (global) The row index in the global matrix B indicating the first row of the
matrix to be operated on (which may be either all of B or a submatrix of B).
descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.
If dtype_b = 502, then dlen_≥ 7;
Must be laf ≥ NB*(bwl+bwu)+6*(max(bwl,bwu))^2.
If laf is not large enough, an error code will be returned and the minimum
acceptable size will be returned in af[0].
lwork (local or global) The size of the array work; must be
lwork ≥ (max(bwl,bwu))^2.
Output Parameters
b On exit, this array contains the local pieces of the solution distributed
matrix X.
work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?dttrs
Solves a system of linear equations with a diagonally
dominant-like tridiagonal distributed matrix using the
factorization computed by p?dttrf.
Syntax
void psdttrs (char *trans , MKL_INT *n , MKL_INT *nrhs , float *dl , float *d , float
*du , MKL_INT *ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *descb , float
*af , MKL_INT *laf , float *work , MKL_INT *lwork , MKL_INT *info );
void pddttrs (char *trans , MKL_INT *n , MKL_INT *nrhs , double *dl , double *d , double
*du , MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *descb , double
*af , MKL_INT *laf , double *work , MKL_INT *lwork , MKL_INT *info );
void pcdttrs (char *trans , MKL_INT *n , MKL_INT *nrhs , MKL_Complex8 *dl ,
MKL_Complex8 *d , MKL_Complex8 *du , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b ,
MKL_INT *ib , MKL_INT *descb , MKL_Complex8 *af , MKL_INT *laf , MKL_Complex8 *work ,
MKL_INT *lwork , MKL_INT *info );
void pzdttrs (char *trans , MKL_INT *n , MKL_INT *nrhs , MKL_Complex16 *dl ,
MKL_Complex16 *d , MKL_Complex16 *du , MKL_INT *ja , MKL_INT *desca , MKL_Complex16
*b , MKL_INT *ib , MKL_INT *descb , MKL_Complex16 *af , MKL_INT *laf , MKL_Complex16
*work , MKL_INT *lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?dttrs function solves for X one of the systems of equations:
sub(A)*X = sub(B),
(sub(A))^T*X = sub(B), or
(sub(A))^H*X = sub(B),
where sub(A) = A(1:n, ja:ja+n-1) is a diagonally dominant-like tridiagonal distributed matrix, and sub(B)
denotes the distributed matrix B(ib:ib+n-1, 1:nrhs).
Input Parameters
nrhs (global) The number of right hand sides; the number of columns of the
distributed matrix sub(B) (nrhs≥ 0).
dl, d, du (local)
Pointers to the local arrays of size nb_a each.
ja (global) The index in the global matrix A indicating the start of the matrix to
be operated on (which may be either all of A or a submatrix of A).
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
If dtype_a = 501 or dtype_a = 502, then dlen_≥ 7;
On entry, the array b contains the local pieces of the n-by-nrhs right hand
side distributed matrix sub(B).
ib (global) The row index in the global matrix B indicating the first row of the
matrix to be operated on (which may be either all of B or a submatrix of B).
descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.
If dtype_b = 502, then dlen_≥ 7;
The array af contains auxiliary fill-in space. The fill-in space is created in a
call to the factorization function p?dttrf and is stored in af. If a linear
system is to be solved using p?dttrs after the factorization function, af
must not be altered.
The array work is a workspace array.
Must be laf≥NB*(bwl+bwu)+6*(bwl+bwu)*(bwl+2*bwu).
If laf is not large enough, an error code will be returned and the minimum
acceptable size will be returned in af[0].
lwork (local or global) The size of the array work; must be lwork ≥
10*NPCOL+4*nrhs.
Output Parameters
b On exit, this array contains the local pieces of the solution distributed
matrix X.
work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?potrs
Solves a system of linear equations with a Cholesky-
factored symmetric/Hermitian distributed positive-
definite matrix.
Syntax
void pspotrs (char *uplo , MKL_INT *n , MKL_INT *nrhs , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_INT
*info );
void pdpotrs (char *uplo , MKL_INT *n , MKL_INT *nrhs , double *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb ,
MKL_INT *info );
void pcpotrs (char *uplo , MKL_INT *n , MKL_INT *nrhs , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb , MKL_INT *info );
void pzpotrs (char *uplo , MKL_INT *n , MKL_INT *nrhs , MKL_Complex16 *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *jb ,
MKL_INT *descb , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?potrs function solves for X a system of distributed linear equations in the form:
sub(A)*X = sub(B),
where sub(A) = A(ia:ia+n-1, ja:ja+n-1) is an n-by-n real symmetric or complex Hermitian positive
definite distributed matrix, and sub(B) denotes the distributed matrix B(ib:ib+n-1, jb:jb+nrhs-1).
Input Parameters
nrhs (global) The number of right hand sides; the number of columns of the
distributed matrix sub(B) (nrhs≥0).
a, b (local)
Pointers into the local memory to arrays of local sizes
lld_a*LOCc(ja+n-1) and lld_b*LOCc(jb+nrhs-1), respectively.
On entry, the array b contains the local pieces of the right hand sides
sub(B).
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of the matrix sub(B), respectively.
descb (local) array of size dlen_. The array descriptor for the distributed matrix B.
Output Parameters
info < 0: if the i-th argument is an array and the j-th entry, indexed j - 1,
had an illegal value, then info = -(i*100+j); if the i-th argument is a
scalar and had an illegal value, then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?pbtrs
Solves a system of linear equations with a Cholesky-
factored symmetric/Hermitian positive-definite band
matrix.
Syntax
void pspbtrs (char *uplo , MKL_INT *n , MKL_INT *bw , MKL_INT *nrhs , float *a , MKL_INT
*ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *descb , float *af , MKL_INT
*laf , float *work , MKL_INT *lwork , MKL_INT *info );
void pdpbtrs (char *uplo , MKL_INT *n , MKL_INT *bw , MKL_INT *nrhs , double *a ,
MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *descb , double *af ,
MKL_INT *laf , double *work , MKL_INT *lwork , MKL_INT *info );
void pcpbtrs (char *uplo , MKL_INT *n , MKL_INT *bw , MKL_INT *nrhs , MKL_Complex8 *a ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *descb ,
MKL_Complex8 *af , MKL_INT *laf , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT
*info );
void pzpbtrs (char *uplo , MKL_INT *n , MKL_INT *bw , MKL_INT *nrhs , MKL_Complex16
*a , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *descb ,
MKL_Complex16 *af , MKL_INT *laf , MKL_Complex16 *work , MKL_INT *lwork , MKL_INT
*info );
Include Files
• mkl_scalapack.h
Description
The p?pbtrs function solves for X a system of distributed linear equations in the form:
sub(A)*X = sub(B),
where sub(A) = A(1:n, ja:ja+n-1) is an n-by-n real symmetric or complex Hermitian positive definite
distributed band matrix, and sub(B) denotes the distributed matrix B(ib:ib+n-1, 1:nrhs).
Input Parameters
nrhs (global) The number of right hand sides; the number of columns of the
distributed matrix sub(B) (nrhs≥0).
a, b (local)
Pointers into the local memory to arrays of local sizes lld_a*LOCc(ja+n-1)
and lld_b*LOCc(nrhs-1), respectively.
On entry, the array b contains the local pieces of the n-by-nrhs right hand
side distributed matrix sub(B).
ja (global) The index in the global matrix A indicating the start of the matrix to
be operated on (which may be either all of A or a submatrix of A).
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
If dtype_a = 501, then dlen_≥ 7;
ib (global) The row index in the global matrix B indicating the first row of the
matrix sub(B).
descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.
If dtype_b = 502, then dlen_≥ 7;
The array af is of size laf. It contains auxiliary fill-in space. The fill-in
space is created in a call to the factorization function p?pbtrf and is stored
in af.
Must be laf≥nrhs*bw.
If laf is not large enough, an error code will be returned and the minimum
acceptable size will be returned in af[0].
lwork (local or global) The size of the array work; must be lwork ≥ bw^2.
Output Parameters
b On exit, if info=0, this array contains the local pieces of the n-by-nrhs
solution distributed matrix X.
work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.
info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?pttrs
Solves a system of linear equations with a symmetric
(Hermitian) positive-definite tridiagonal distributed
matrix using the factorization computed by p?pttrf.
Syntax
void pspttrs (MKL_INT *n , MKL_INT *nrhs , float *d , float *e , MKL_INT *ja , MKL_INT
*desca , float *b , MKL_INT *ib , MKL_INT *descb , float *af , MKL_INT *laf , float
*work , MKL_INT *lwork , MKL_INT *info );
void pdpttrs (MKL_INT *n , MKL_INT *nrhs , double *d , double *e , MKL_INT *ja , MKL_INT
*desca , double *b , MKL_INT *ib , MKL_INT *descb , double *af , MKL_INT *laf , double
*work , MKL_INT *lwork , MKL_INT *info );
void pcpttrs (char *uplo , MKL_INT *n , MKL_INT *nrhs , float *d , MKL_Complex8 *e ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *descb ,
MKL_Complex8 *af , MKL_INT *laf , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT
*info );
void pzpttrs (char *uplo , MKL_INT *n , MKL_INT *nrhs , double *d , MKL_Complex16 *e ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *descb ,
MKL_Complex16 *af , MKL_INT *laf , MKL_Complex16 *work , MKL_INT *lwork , MKL_INT
*info );
Include Files
• mkl_scalapack.h
Description
The p?pttrs function solves for X a system of distributed linear equations in the form:
sub(A)*X = sub(B),
where sub(A) = A(1:n, ja:ja+n-1) is an n-by-n real symmetric or complex Hermitian positive definite
tridiagonal distributed matrix, and sub(B) denotes the distributed matrix B(ib:ib+n-1, 1:nrhs).
Input Parameters
nrhs (global) The number of right hand sides; the number of columns of the
distributed matrix sub(B) (nrhs≥0).
d, e (local)
Pointers into the local memory to arrays of size nb_a each.
ja (global) The index in the global matrix A indicating the start of the matrix to
be operated on (which may be either all of A or a submatrix of A).
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
If dtype_a = 501 or dtype_a = 502, then dlen_≥ 7;
On entry, the array b contains the local pieces of the n-by-nrhs right hand
side distributed matrix sub(B).
ib (global) The row index in the global matrix B indicating the first row of the
matrix to be operated on (which may be either all of B or a submatrix of B).
descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.
If dtype_b = 502, then dlen_≥ 7;
Must be laf≥nb_a+2.
If laf is not large enough, an error code is returned and the minimum
acceptable size will be returned in af[0].
lwork (local or global) The size of the array work; must be
lwork ≥ (10+2*min(100,nrhs))*NPCOL+4*nrhs.
Output Parameters
b On exit, this array contains the local pieces of the solution distributed
matrix X.
work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.
info < 0:
if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a
scalar and had an illegal value, then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?trtrs
Solves a system of linear equations with a triangular
distributed matrix.
Syntax
void pstrtrs (char *uplo , char *trans , char *diag , MKL_INT *n , MKL_INT *nrhs , float
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *jb ,
MKL_INT *descb , MKL_INT *info );
void pdtrtrs (char *uplo , char *trans , char *diag , MKL_INT *n , MKL_INT *nrhs ,
double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib ,
MKL_INT *jb , MKL_INT *descb , MKL_INT *info );
void pctrtrs (char *uplo , char *trans , char *diag , MKL_INT *n , MKL_INT *nrhs ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b ,
MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_INT *info );
void pztrtrs (char *uplo , char *trans , char *diag , MKL_INT *n , MKL_INT *nrhs ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b ,
MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?trtrs function solves for X one of the following systems of linear equations:
sub(A)*X = sub(B),
(sub(A))^T*X = sub(B), or
(sub(A))^H*X = sub(B),
where sub(A) = A(ia:ia+n-1, ja:ja+n-1) is a triangular distributed matrix of order n, and sub(B) denotes
the distributed matrix B(ib:ib+n-1, jb:jb+nrhs-1).
Input Parameters
nrhs (global) The number of right-hand sides; i.e., the number of columns of the
distributed matrix sub(B) (nrhs≥0).
a, b (local)
Pointers into the local memory to arrays of local sizes lld_a*LOCc(ja+n-1)
and lld_b*LOCc(jb+nrhs-1), respectively.
The array a contains the local pieces of the distributed triangular matrix
sub(A).
If uplo = 'U', the leading n-by-n upper triangular part of sub(A) contains
the upper triangular matrix, and the strictly lower triangular part of sub(A)
is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of sub(A) contains
the lower triangular matrix, and the strictly upper triangular part of sub(A)
is not referenced.
If diag = 'U', the diagonal elements of sub(A) are also not referenced
and are assumed to be 1.
On entry, the array b contains the local pieces of the right hand side
distributed matrix sub(B).
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of the matrix sub(B), respectively.
descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.
Output Parameters
info < 0:
if the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.
info> 0:
if info = i, the i-th diagonal element of sub(A) is zero, indicating that the
submatrix is singular and the solutions X have not been computed.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?gecon
Estimates the reciprocal of the condition number of a
general distributed matrix in either the 1-norm or the
infinity-norm.
Syntax
void psgecon (char *norm , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *anorm , float *rcond , float *work , MKL_INT *lwork , MKL_INT *iwork ,
MKL_INT *liwork , MKL_INT *info );
void pdgecon (char *norm , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , double *anorm , double *rcond , double *work , MKL_INT *lwork , MKL_INT
*iwork , MKL_INT *liwork , MKL_INT *info );
void pcgecon (char *norm , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , float *anorm , float *rcond , MKL_Complex8 *work , MKL_INT *lwork ,
float *rwork , MKL_INT *lrwork , MKL_INT *info );
void pzgecon (char *norm , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , double *anorm , double *rcond , MKL_Complex16 *work , MKL_INT *lwork ,
double *rwork , MKL_INT *lrwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?gecon function estimates the reciprocal of the condition number of a general distributed real/complex
matrix sub(A) = A(ia:ia+n-1, ja:ja+n-1) in either the 1-norm or infinity-norm, using the LU factorization
computed by p?getrf.
An estimate is obtained for ||(sub(A))^-1||, and the reciprocal of the condition number is computed as
rcond = 1 / (||sub(A)|| * ||(sub(A))^-1||).
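As an illustration of the quantity being estimated, the reciprocal condition number is 1 / (||A|| * ||A^-1||). The sketch below evaluates it exactly in the 1-norm for a 2-by-2 column-major matrix by forming the inverse explicitly. This is a sequential toy example only: the helper names are hypothetical, and p?gecon itself works from the LU factors and estimates the norm of the inverse without forming it.

```c
/* Absolute value without <math.h>. */
static double dabs(double v) { return v < 0.0 ? -v : v; }

/* 1-norm (largest absolute column sum) of a 2-by-2 column-major matrix. */
static double norm1_2x2(const double m[4])
{
    double c0 = dabs(m[0]) + dabs(m[1]);
    double c1 = dabs(m[2]) + dabs(m[3]);
    return c0 > c1 ? c0 : c1;
}

/* rcond = 1 / (||A|| * ||A^-1||) in the 1-norm, using the explicit
   inverse. Returns 0 for an exactly singular matrix. */
static double rcond1_2x2(const double a[4])
{
    double det = a[0] * a[3] - a[2] * a[1];
    if (det == 0.0)
        return 0.0;
    /* Column-major inverse of [[a11, a12], [a21, a22]]. */
    double inv[4] = { a[3] / det, -a[1] / det, -a[2] / det, a[0] / det };
    return 1.0 / (norm1_2x2(a) * norm1_2x2(inv));
}
```

For the matrix with columns (2, 1) and (1, 3), the 1-norm is 4 and the 1-norm of the inverse is 0.8, so the function returns 1 / 3.2 = 0.3125. Values close to 0 indicate an ill-conditioned matrix.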
Input Parameters
a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1).
The array a contains the local pieces of the factors L and U from the
factorization sub(A) = P*L*U; the unit diagonal elements of L are not
stored.
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
anorm (global)
If norm = '1' or 'O', the 1-norm of the original distributed matrix sub(A);
if norm = 'I', the infinity-norm of the original distributed matrix sub(A).
work (local)
The array work of size lwork is a workspace array.
LOCr and LOCc values can be computed using the ScaLAPACK tool function
numroc; NPROW and NPCOL can be determined by calling the function
blacs_gridinfo.
NOTE
iceil(x,y) is the ceiling of x/y, and mod(x,y) is the integer
remainder of x/y.
iwork (local) Workspace array of size liwork. Used in real flavors only.
liwork (local or global) The size of the array iwork; used in real flavors only. Must
be at least
liwork≥LOCr(n+mod(ia-1,mb_a)).
rwork (local)
Workspace array of size lrwork. Used in complex flavors only.
lrwork (local or global) The size of the array rwork; used in complex flavors only.
Must be at least
lrwork≥ max(1, 2*LOCc(n+mod(ja-1,nb_a))).
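The workspace bounds above depend on LOCr and LOCc, which the text says can be obtained from numroc. The following sketch re-implements the standard block-cyclic count locally so that the bounds can be evaluated without linking the library. All function names here are hypothetical, and the sketch assumes that iarow and iacol are the process row and column owning global row ia and column ja; in an application you would call numroc and blacs_gridinfo instead.

```c
#include <assert.h>

/* Ceiling of x/y for x >= 0 and y > 0 (the iceil of the NOTE above). */
static int iceil(int x, int y)
{
    assert(x >= 0 && y > 0);
    return (x + y - 1) / y;
}

/* Local sketch of the count numroc returns: how many of n rows (or
   columns), dealt out in blocks of nb over nprocs processes starting at
   process isrcproc, are stored on process iproc. */
static int local_numroc(int n, int nb, int iproc, int isrcproc, int nprocs)
{
    assert(n >= 0 && nb > 0 && nprocs > 0);
    int mydist = (nprocs + iproc - isrcproc) % nprocs;
    int nblocks = n / nb;                 /* number of full blocks */
    int count = (nblocks / nprocs) * nb;  /* blocks every process gets */
    int extra = nblocks % nprocs;         /* leftover full blocks */
    if (mydist < extra)
        count += nb;                      /* one more full block */
    else if (mydist == extra)
        count += n % nb;                  /* the trailing partial block */
    return count;
}

/* liwork >= LOCr(n + mod(ia-1, mb_a)); ia is a 1-based global index. */
static int min_liwork(int n, int ia, int mb_a, int myrow, int iarow, int nprow)
{
    return local_numroc(n + (ia - 1) % mb_a, mb_a, myrow, iarow, nprow);
}

/* lrwork >= max(1, 2*LOCc(n + mod(ja-1, nb_a))); ja is 1-based. */
static int min_lrwork(int n, int ja, int nb_a, int mycol, int iacol, int npcol)
{
    int need = 2 * local_numroc(n + (ja - 1) % nb_a, nb_a, mycol, iacol, npcol);
    return need > 1 ? need : 1;
}
```

With n = 10, a block size of 2, and three processes in the relevant grid dimension, processes 0, 1, and 2 hold 4, 4, and 2 rows respectively, so the bounds differ from process to process.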
Output Parameters
rcond (global)
The reciprocal of the condition number of the distributed matrix sub(A). See
Description.
work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.
iwork[0] On exit, iwork[0] contains the minimum value of liwork required for
optimum performance (for real flavors).
rwork[0] On exit, rwork[0] contains the minimum value of lrwork required for
optimum performance (for complex flavors).
info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?pocon
Estimates the reciprocal of the condition number (in
the 1-norm) of a symmetric/Hermitian positive-definite
distributed matrix.
Syntax
void pspocon (char *uplo , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *anorm , float *rcond , float *work , MKL_INT *lwork , MKL_INT *iwork ,
MKL_INT *liwork , MKL_INT *info );
void pdpocon (char *uplo , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , double *anorm , double *rcond , double *work , MKL_INT *lwork , MKL_INT
*iwork , MKL_INT *liwork , MKL_INT *info );
void pcpocon (char *uplo , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , float *anorm , float *rcond , MKL_Complex8 *work , MKL_INT *lwork ,
float *rwork , MKL_INT *lrwork , MKL_INT *info );
void pzpocon (char *uplo , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , double *anorm , double *rcond , MKL_Complex16 *work , MKL_INT *lwork ,
double *rwork , MKL_INT *lrwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?pocon function estimates the reciprocal of the condition number (in the 1-norm) of a real symmetric
or complex Hermitian positive definite distributed matrix sub(A) = A(ia:ia+n-1, ja:ja+n-1), using the
Cholesky factorization sub(A) = U^H*U or sub(A) = L*L^H computed by p?potrf.
An estimate is obtained for ||(sub(A))^-1||, and the reciprocal of the condition number is computed as
rcond = 1 / (||sub(A)|| * ||(sub(A))^-1||).
Input Parameters
a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1).
The array a contains the local pieces of the factors L or U from the Cholesky
factorization sub(A) = U^H*U, or sub(A) = L*L^H, as computed by p?potrf.
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
anorm (global)
The 1-norm of the symmetric/Hermitian distributed matrix sub(A).
work (local)
The array work of size lwork is a workspace array.
NOTE
iceil(x,y) is the ceiling of x/y, and mod(x,y) is the integer
remainder of x/y.
iwork (local) Workspace array of size liwork. Used in real flavors only.
liwork (local or global) The size of the array iwork; used in real flavors only. Must
be at least liwork≥LOCr(n+mod(ia-1,mb_a)).
rwork (local)
Workspace array of size lrwork. Used in complex flavors only.
lrwork (local or global) The size of the array rwork; used in complex flavors only.
Must be at least lrwork≥ 2*LOCc(n+mod(ja-1,nb_a)).
Output Parameters
rcond (global)
The reciprocal of the condition number of the distributed matrix sub(A).
work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.
iwork[0] On exit, iwork[0] contains the minimum value of liwork required for
optimum performance (for real flavors).
rwork[0] On exit, rwork[0] contains the minimum value of lrwork required for
optimum performance (for complex flavors).
info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?trcon
Estimates the reciprocal of the condition number of a
triangular distributed matrix in either 1-norm or
infinity-norm.
Syntax
void pstrcon (char *norm , char *uplo , char *diag , MKL_INT *n , float *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , float *rcond , float *work , MKL_INT *lwork ,
MKL_INT *iwork , MKL_INT *liwork , MKL_INT *info );
void pdtrcon (char *norm , char *uplo , char *diag , MKL_INT *n , double *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , double *rcond , double *work , MKL_INT *lwork ,
MKL_INT *iwork , MKL_INT *liwork , MKL_INT *info );
void pctrcon (char *norm , char *uplo , char *diag , MKL_INT *n , MKL_Complex8 *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *rcond , MKL_Complex8 *work ,
MKL_INT *lwork , float *rwork , MKL_INT *lrwork , MKL_INT *info );
void pztrcon (char *norm , char *uplo , char *diag , MKL_INT *n , MKL_Complex16 *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *rcond , MKL_Complex16 *work ,
MKL_INT *lwork , double *rwork , MKL_INT *lrwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?trcon function estimates the reciprocal of the condition number of a triangular distributed matrix
sub(A) = A(ia:ia+n-1, ja:ja+n-1), in either the 1-norm or the infinity-norm.
The norm of sub(A) is computed and an estimate is obtained for ||(sub(A))^-1||, then the reciprocal of the
condition number is computed as
rcond = 1 / (||sub(A)|| * ||(sub(A))^-1||).
Input Parameters
a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1).
The array a contains the local pieces of the triangular distributed matrix
sub(A).
If uplo = 'U', the leading n-by-n upper triangular part of this distributed
matrix contains the upper triangular matrix, and its strictly lower triangular
part is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of this distributed
matrix contains the lower triangular matrix, and its strictly upper triangular
part is not referenced.
If diag = 'U', the diagonal elements of sub(A) are also not referenced
and are assumed to be 1.
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
work (local)
The array work of size lwork is a workspace array.
NOTE
iceil(x,y) is the ceiling of x/y, and mod(x,y) is the integer
remainder of x/y.
iwork (local) Workspace array of size liwork. Used in real flavors only.
liwork (local or global) The size of the array iwork; used in real flavors only. Must
be at least
liwork≥LOCr(n+mod(ia-1,mb_a)).
rwork (local)
Workspace array of size lrwork. Used in complex flavors only.
lrwork (local or global) The size of the array rwork; used in complex flavors only.
Must be at least
lrwork≥LOCc(n+mod(ja-1,nb_a)).
Output Parameters
rcond (global)
The reciprocal of the condition number of the distributed matrix sub(A).
work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.
iwork[0] On exit, iwork[0] contains the minimum value of liwork required for
optimum performance (for real flavors).
rwork[0] On exit, rwork[0] contains the minimum value of lrwork required for
optimum performance (for complex flavors).
info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
Refining the Solution and Estimating Its Error: ScaLAPACK Computational Routines
This section describes the ScaLAPACK routines for refining the computed solution of a system of linear
equations and estimating the solution error. You can call these routines after factorizing the matrix of the
system of equations and computing the solution (see Routines for Matrix Factorization and Solving Systems
of Linear Equations).
p?gerfs
Improves the computed solution to a system of linear
equations and provides error bounds and backward
error estimates for the solution.
Syntax
void psgerfs (char *trans , MKL_INT *n , MKL_INT *nrhs , float *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , float *af , MKL_INT *iaf , MKL_INT *jaf , MKL_INT
*descaf , MKL_INT *ipiv , float *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , float
*x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , float *ferr , float *berr , float
*work , MKL_INT *lwork , MKL_INT *iwork , MKL_INT *liwork , MKL_INT *info );
void pdgerfs (char *trans , MKL_INT *n , MKL_INT *nrhs , double *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *af , MKL_INT *iaf , MKL_INT *jaf , MKL_INT
*descaf , MKL_INT *ipiv , double *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb ,
double *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , double *ferr , double *berr ,
double *work , MKL_INT *lwork , MKL_INT *iwork , MKL_INT *liwork , MKL_INT *info );
void pcgerfs (char *trans , MKL_INT *n , MKL_INT *nrhs , MKL_Complex8 *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *af , MKL_INT *iaf , MKL_INT *jaf ,
MKL_INT *descaf , MKL_INT *ipiv , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb , MKL_Complex8 *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , float *ferr ,
float *berr , MKL_Complex8 *work , MKL_INT *lwork , float *rwork , MKL_INT *lrwork ,
MKL_INT *info );
void pzgerfs (char *trans , MKL_INT *n , MKL_INT *nrhs , MKL_Complex16 *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *af , MKL_INT *iaf , MKL_INT *jaf ,
MKL_INT *descaf , MKL_INT *ipiv , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *jb ,
MKL_INT *descb , MKL_Complex16 *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , double
*ferr , double *berr , MKL_Complex16 *work , MKL_INT *lwork , double *rwork , MKL_INT
*lrwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?gerfs function improves the computed solution to one of the systems of linear equations
sub(A)*sub(X) = sub(B),
sub(A)^T*sub(X) = sub(B), or
sub(A)^H*sub(X) = sub(B)
and provides error bounds and backward error estimates for the solution.
Here sub(A) = A(ia:ia+n-1, ja:ja+n-1), sub(B) = B(ib:ib+n-1, jb:jb+nrhs-1), and
sub(X) = X(ix:ix+n-1, jx:jx+nrhs-1).
Input Parameters
nrhs (global) The number of right-hand sides, i.e., the number of columns of the
matrices sub(B) and sub(X) (nrhs≥ 0).
a, af, b, x (local)
Pointers into the local memory to arrays of local sizes
a: lld_a * LOCc(ja+n-1),
af: lld_af * LOCc(jaf+n-1),
b: lld_b * LOCc(jb+nrhs-1),
x: lld_x * LOCc(jx+nrhs-1).
The array a contains the local pieces of the distributed matrix sub(A).
The array af contains the local pieces of the distributed factors of the
matrix sub(A) = P*L*U as computed by p?getrf.
The array b contains the local pieces of the distributed matrix of right hand
sides sub(B).
On entry, the array x contains the local pieces of the distributed solution
matrix sub(X).
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
iaf, jaf (global) The row and column indices in the global matrix AF indicating the
first row and the first column of the matrix sub(AF), respectively.
descaf (global and local) array of size dlen_. The array descriptor for the
distributed matrix AF.
ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of the matrix sub(B), respectively.
descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.
ix, jx (global) The row and column indices in the global matrix X indicating the
first row and the first column of the matrix sub(X), respectively.
descx (global and local) array of size dlen_. The array descriptor for the
distributed matrix X.
ipiv (local)
Array of size LOCr(m_af) + mb_af.
work (local)
The array work of size lwork is a workspace array.
NOTE
mod(x,y) is the integer remainder of x/y.
iwork (local) Workspace array, size liwork. Used in real flavors only.
liwork (local or global) The size of the array iwork; used in real flavors only. Must
be at least
liwork≥LOCr(n+mod(ib-1,mb_b)).
rwork (local)
Workspace array, size lrwork. Used in complex flavors only.
lrwork (local or global) The size of the array rwork; used in complex flavors only.
Must be at least lrwork≥LOCr(n+mod(ib-1,mb_b)).
Output Parameters
The array ferr contains the estimated forward error bound for each
solution vector of sub(X).
If XTRUE is the true solution corresponding to sub(X), ferr is an estimated
upper bound for the magnitude of the largest element in (sub(X) - XTRUE)
divided by the magnitude of the largest element in sub(X). The estimate is
as reliable as the estimate for rcond, and is almost always a slight
overestimate of the true error.
work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.
iwork[0] On exit, iwork[0] contains the minimum value of liwork required for
optimum performance (for real flavors).
rwork[0] On exit, rwork[0] contains the minimum value of lrwork required for
optimum performance (for complex flavors).
info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.
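The quantity that ferr bounds, as described above, is the largest element of (sub(X) - XTRUE) in magnitude divided by the largest element of sub(X) in magnitude. The sketch below evaluates that ratio for one solution vector when a reference solution happens to be known, which can be useful in tests that compare against the returned ferr. The function name is hypothetical, and the routine itself never sees XTRUE; it only estimates an upper bound for this ratio.

```c
/* Absolute value without <math.h>. */
static double dabs(double v) { return v < 0.0 ? -v : v; }

/* max_i |x[i] - xtrue[i]| / max_i |x[i]| for one solution vector.
   Returns 0 when x is identically zero. */
static double relative_forward_error(int n, const double *x,
                                     const double *xtrue)
{
    double max_diff = 0.0, max_x = 0.0;
    for (int i = 0; i < n; ++i) {
        double d = dabs(x[i] - xtrue[i]);
        if (d > max_diff)
            max_diff = d;
        if (dabs(x[i]) > max_x)
            max_x = dabs(x[i]);
    }
    return max_x > 0.0 ? max_diff / max_x : 0.0;
}
```

For x = (1, 2) and a reference solution (1, 2.5), the ratio is 0.5 / 2 = 0.25. A well-behaved ferr for that column would be expected to be at least this large.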
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?porfs
Improves the computed solution to a system of linear
equations with symmetric/Hermitian positive definite
distributed matrix and provides error bounds and
backward error estimates for the solution.
Syntax
void psporfs (char *uplo , MKL_INT *n , MKL_INT *nrhs , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *af , MKL_INT *iaf , MKL_INT *jaf , MKL_INT *descaf , float
*b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , float *x , MKL_INT *ix , MKL_INT *jx ,
MKL_INT *descx , float *ferr , float *berr , float *work , MKL_INT *lwork , MKL_INT
*iwork , MKL_INT *liwork , MKL_INT *info );
void pdporfs (char *uplo , MKL_INT *n , MKL_INT *nrhs , double *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *af , MKL_INT *iaf , MKL_INT *jaf , MKL_INT
*descaf , double *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , double *x , MKL_INT
*ix , MKL_INT *jx , MKL_INT *descx , double *ferr , double *berr , double *work ,
MKL_INT *lwork , MKL_INT *iwork , MKL_INT *liwork , MKL_INT *info );
void pcporfs (char *uplo , MKL_INT *n , MKL_INT *nrhs , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *af , MKL_INT *iaf , MKL_INT *jaf , MKL_INT
*descaf , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_Complex8
*x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , float *ferr , float *berr ,
MKL_Complex8 *work , MKL_INT *lwork , float *rwork , MKL_INT *lrwork , MKL_INT *info );
void pzporfs (char *uplo , MKL_INT *n , MKL_INT *nrhs , MKL_Complex16 *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *af , MKL_INT *iaf , MKL_INT *jaf ,
MKL_INT *descaf , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb ,
MKL_Complex16 *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , double *ferr , double
*berr , MKL_Complex16 *work , MKL_INT *lwork , double *rwork , MKL_INT *lrwork ,
MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?porfs function improves the computed solution to the system of linear equations
sub(A)*sub(X) = sub(B),
where sub(A) = A(ia:ia+n-1, ja:ja+n-1) is a real symmetric or complex Hermitian positive definite
distributed matrix and
sub(B) = B(ib:ib+n-1, jb:jb+nrhs-1),
sub(X) = X(ix:ix+n-1, jx:jx+nrhs-1)
are right-hand side and solution submatrices, respectively. This function also provides error bounds and
backward error estimates for the solution.
Input Parameters
nrhs (global) The number of right-hand sides, i.e., the number of columns of the
matrices sub(B) and sub(X) (nrhs≥0).
a, af, b, x (local)
Pointers into the local memory to arrays of local sizes
a: lld_a * LOCc(ja+n-1),
af: lld_af * LOCc(jaf+n-1),
b: lld_b * LOCc(jb+nrhs-1),
x: lld_x * LOCc(jx+nrhs-1).
The array a contains the local pieces of the n-by-n symmetric/Hermitian
distributed matrix sub(A).
If uplo = 'U', the leading n-by-n upper triangular part of sub(A) contains
the upper triangular part of the matrix, and its strictly lower triangular part
is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of sub(A) contains
the lower triangular part of the distributed matrix, and its strictly upper
triangular part is not referenced.
The array af contains the factors L or U from the Cholesky factorization
sub(A) = L*L^H or sub(A) = U^H*U, as computed by p?potrf.
On entry, the array b contains the local pieces of the distributed matrix of
right hand sides sub(B).
On entry, the array x contains the local pieces of the solution vectors
sub(X).
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
iaf, jaf (global) The row and column indices in the global matrix AF indicating the
first row and the first column of the matrix sub(AF), respectively.
descaf (global and local) array of size dlen_. The array descriptor for the
distributed matrix AF.
ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of the matrix sub(B), respectively.
descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.
ix, jx (global) The row and column indices in the global matrix X indicating the
first row and the first column of the matrix sub(X), respectively.
descx (global and local) array of size dlen_. The array descriptor for the
distributed matrix X.
work (local)
The array work of size lwork is a workspace array.
NOTE
mod(x,y) is the integer remainder of x/y.
iwork (local) Workspace array of size liwork. Used in real flavors only.
liwork (local or global) The size of the array iwork; used in real flavors only. Must
be at least
liwork≥LOCr(n+mod(ib-1,mb_b)).
rwork (local)
Workspace array of size lrwork. Used in complex flavors only.
lrwork (local or global) The size of the array rwork; used in complex flavors only.
Must be at least lrwork≥LOCr(n+mod(ib-1,mb_b)).
Output Parameters
The array ferr contains the estimated forward error bound for each
solution vector of sub(X).
If XTRUE is the true solution corresponding to sub(X), ferr is an estimated
upper bound for the magnitude of the largest element in (sub(X) - XTRUE)
divided by the magnitude of the largest element in sub(X). The estimate is
as reliable as the estimate for rcond, and is almost always a slight
overestimate of the true error.
This array is tied to the distributed matrix X.
The array berr contains the component-wise relative backward error of
each solution vector (that is, the smallest relative change in any entry of
sub(A) or sub(B) that makes sub(X) an exact solution). This array is tied to
the distributed matrix X.
work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.
iwork[0] On exit, iwork[0] contains the minimum value of liwork required for
optimum performance (for real flavors).
rwork[0] On exit, rwork[0] contains the minimum value of lrwork required for
optimum performance (for complex flavors).
info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.
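The component-wise relative backward error that berr reports can be illustrated on a small dense system. The sketch below uses the usual definition, the largest ratio |b - A*x|_i / (|A|*|x| + |b|)_i over the rows i, for a single right-hand side stored sequentially in column-major order. The function name is hypothetical, and this is a simplified illustration: rows with a zero denominator are skipped here, whereas library implementations guard small denominators differently.

```c
/* Absolute value without <math.h>. */
static double dabs(double v) { return v < 0.0 ? -v : v; }

/* Component-wise relative backward error of x as a solution of A*x = b.
   a is n-by-n, column-major; x and b have n entries. */
static double componentwise_berr(int n, const double *a,
                                 const double *x, const double *b)
{
    double berr = 0.0;
    for (int i = 0; i < n; ++i) {
        double resid = b[i];        /* accumulates (b - A*x)_i */
        double denom = dabs(b[i]);  /* accumulates (|A|*|x| + |b|)_i */
        for (int j = 0; j < n; ++j) {
            resid -= a[i + j * n] * x[j];
            denom += dabs(a[i + j * n]) * dabs(x[j]);
        }
        if (denom > 0.0 && dabs(resid) / denom > berr)
            berr = dabs(resid) / denom;
    }
    return berr;
}
```

With A equal to the 2-by-2 identity, x = (1, 1), and b = (1, 2), the second row has residual 1 and denominator 3, so the result is 1/3. An exact solution gives 0.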
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?trrfs
Provides error bounds and backward error estimates
for the solution to a system of linear equations with a
distributed triangular coefficient matrix.
Syntax
void pstrrfs (char *uplo , char *trans , char *diag , MKL_INT *n , MKL_INT *nrhs , float
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *jb ,
MKL_INT *descb , float *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , float *ferr ,
float *berr , float *work , MKL_INT *lwork , MKL_INT *iwork , MKL_INT *liwork , MKL_INT
*info );
void pdtrrfs (char *uplo , char *trans , char *diag , MKL_INT *n , MKL_INT *nrhs ,
double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib ,
MKL_INT *jb , MKL_INT *descb , double *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx ,
double *ferr , double *berr , double *work , MKL_INT *lwork , MKL_INT *iwork , MKL_INT
*liwork , MKL_INT *info );
void pctrrfs (char *uplo , char *trans , char *diag , MKL_INT *n , MKL_INT *nrhs ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b ,
MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_Complex8 *x , MKL_INT *ix , MKL_INT
*jx , MKL_INT *descx , float *ferr , float *berr , MKL_Complex8 *work , MKL_INT *lwork ,
float *rwork , MKL_INT *lrwork , MKL_INT *info );
void pztrrfs (char *uplo , char *trans , char *diag , MKL_INT *n , MKL_INT *nrhs ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b ,
MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_Complex16 *x , MKL_INT *ix , MKL_INT
*jx , MKL_INT *descx , double *ferr , double *berr , MKL_Complex16 *work , MKL_INT
*lwork , double *rwork , MKL_INT *lrwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?trrfs function provides error bounds and backward error estimates for the solution to one of the
systems of linear equations
sub(A)*sub(X) = sub(B),
sub(A)^T*sub(X) = sub(B), or
sub(A)^H*sub(X) = sub(B),
where sub(A) = A(ia:ia+n-1, ja:ja+n-1) is a triangular matrix,
sub(B) = B(ib:ib+n-1, jb:jb+nrhs-1), and
sub(X) = X(ix:ix+n-1, jx:jx+nrhs-1).
The solution matrix X must be computed by p?trtrs or some other means before entering this function. The
function p?trrfs does not do iterative refinement because doing so cannot improve the backward error.
Input Parameters
nrhs (global) The number of right-hand sides, that is, the number of columns of
the matrices sub(B) and sub(X) (nrhs≥0).
a, b, x (local)
Pointers into the local memory to arrays of local sizes
a: lld_a * LOCc(ja+n-1),
b: lld_b * LOCc(jb+nrhs-1),
x: lld_x * LOCc(jx+nrhs-1).
The array a contains the local pieces of the original triangular distributed
matrix sub(A).
If uplo = 'U', the leading n-by-n upper triangular part of sub(A) contains
the upper triangular part of the matrix, and its strictly lower triangular part
is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of sub(A) contains
the lower triangular part of the distributed matrix, and its strictly upper
triangular part is not referenced.
If diag = 'U', the diagonal elements of sub(A) are also not referenced
and are assumed to be 1.
On entry, the array b contains the local pieces of the distributed matrix of
right hand sides sub(B).
On entry, the array x contains the local pieces of the solution vectors
sub(X).
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of the matrix sub(B), respectively.
descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.
ix, jx (global) The row and column indices in the global matrix X indicating the
first row and the first column of the matrix sub(X), respectively.
descx (global and local) array of size dlen_. The array descriptor for the
distributed matrix X.
work (local)
The array work of size lwork is a workspace array.
lwork≥ 2*LOCr(n+mod(ia-1,mb_a))
NOTE
mod(x,y) is the integer remainder of x/y.
iwork (local) Workspace array of size liwork. Used in real flavors only.
liwork (local or global) The size of the array iwork; used in real flavors only. Must
be at least
liwork≥LOCr(n+mod(ib-1,mb_b)).
rwork (local)
Workspace array of size lrwork. Used in complex flavors only.
lrwork (local or global) The size of the array rwork; used in complex flavors only.
Must be at least lrwork≥LOCr(n+mod(ib-1,mb_b)).
Output Parameters
The array ferr contains the estimated forward error bound for each
solution vector of sub(X).
If XTRUE is the true solution corresponding to sub(X), ferr is an estimated
upper bound for the magnitude of the largest element in (sub(X) - XTRUE)
divided by the magnitude of the largest element in sub(X). The estimate is
as reliable as the estimate for rcond, and is almost always a slight
overestimate of the true error.
This array is tied to the distributed matrix X.
The array berr contains the component-wise relative backward error of
each solution vector (that is, the smallest relative change in any entry of
sub(A) or sub(B) that makes sub(X) an exact solution). This array is tied to
the distributed matrix X.
work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.
iwork[0] On exit, iwork[0] contains the minimum value of liwork required for
optimum performance (for real flavors).
rwork[0] On exit, rwork[0] contains the minimum value of lrwork required for
optimum performance (for complex flavors).
info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?getri
Computes the inverse of an LU-factored distributed
matrix.
Syntax
void psgetri (MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca ,
MKL_INT *ipiv , float *work , MKL_INT *lwork , MKL_INT *iwork , MKL_INT *liwork ,
MKL_INT *info );
void pdgetri (MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca ,
MKL_INT *ipiv , double *work , MKL_INT *lwork , MKL_INT *iwork , MKL_INT *liwork ,
MKL_INT *info );
void pcgetri (MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , MKL_INT *ipiv , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT *iwork , MKL_INT
*liwork , MKL_INT *info );
void pzgetri (MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , MKL_INT *ipiv , MKL_Complex16 *work , MKL_INT *lwork , MKL_INT *iwork ,
MKL_INT *liwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?getri function computes the inverse of a general distributed matrix sub(A) = A(ia:ia+n-1, ja:ja+n-1)
using the LU factorization computed by p?getrf. This method inverts U and then computes the
inverse of sub(A) by solving the system
inv(sub(A))*L = inv(U)
for inv(sub(A)).
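The two steps named above, inverting U and then solving inv(sub(A))*L = inv(U), can be sketched sequentially on a small dense matrix. The code below is an illustration under stated assumptions, not the distributed algorithm: the function names are hypothetical, the factors are packed in one column-major array (U on and above the diagonal, the strictly lower part of unit lower triangular L below it), and no row interchanges are applied, that is, P is taken to be the identity.

```c
#include <assert.h>

#define IDX(i, j, n) ((i) + (j) * (n))   /* column-major indexing */

/* Writes the inverse of the upper triangle of u into uinv (n-by-n). */
static void invert_upper(int n, const double *u, double *uinv)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < n; ++i)
            uinv[IDX(i, j, n)] = 0.0;
        assert(u[IDX(j, j, n)] != 0.0);   /* U must be nonsingular */
        uinv[IDX(j, j, n)] = 1.0 / u[IDX(j, j, n)];
        for (int i = j - 1; i >= 0; --i) {
            double s = 0.0;
            for (int k = i + 1; k <= j; ++k)
                s += u[IDX(i, k, n)] * uinv[IDX(k, j, n)];
            uinv[IDX(i, j, n)] = -s / u[IDX(i, i, n)];
        }
    }
}

/* Solves X*L = B in place (x holds B on entry, X on exit), where L is
   unit lower triangular; only the strictly lower part of l is read. */
static void solve_xl_unit_lower(int n, const double *l, double *x)
{
    for (int j = n - 2; j >= 0; --j)
        for (int k = j + 1; k < n; ++k)
            for (int i = 0; i < n; ++i)
                x[IDX(i, j, n)] -= x[IDX(i, k, n)] * l[IDX(k, j, n)];
}

/* inv(L*U) from packed LU factors, assuming no pivoting. */
static void inverse_from_lu(int n, const double *lu, double *ainv)
{
    invert_upper(n, lu, ainv);         /* ainv = inv(U) */
    solve_xl_unit_lower(n, lu, ainv);  /* ainv*L = inv(U) */
}
```

For the matrix with rows (4, 3) and (6, 3), the unpivoted factors are L with l21 = 1.5 and U with rows (4, 3) and (0, -1.5); the sketch then yields the inverse with rows (-0.5, 0.5) and (1, -2/3). With a nontrivial P, the columns of the result would additionally have to be permuted.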
Input Parameters
n (global) The number of rows and columns to be operated on, that is, the
order of the distributed matrix sub(A) (n≥0).
a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).
On entry, the array a contains the local pieces of the L and U obtained by
the factorization sub(A) = P*L*U computed by p?getrf.
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
work (local)
The array work of size lwork is a workspace array.
lwork (local) The size of the array work. lwork must be at least
lwork≥LOCr(n+mod(ia-1,mb_a))*nb_a.
NOTE
mod(x,y) is the integer remainder of x/y.
The array work is used to keep at most an entire column block of sub(A).
iwork (local) Workspace array used for physically transposing the pivots, size
liwork.
Output Parameters
ipiv (local)
Array of size LOCr(m_a)+ mb_a.
work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.
iwork[0] On exit, iwork[0] contains the minimum value of liwork required for
optimum performance.
info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.
info> 0:
If info = i, the matrix element U(i,i) is exactly zero. The factorization has
been completed, but the factor U is exactly singular, and division by zero
will occur if it is used to solve a system of equations.
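The negative info encoding described above can be unpacked with integer division. The following is a minimal sketch, not part of Intel MKL: the helper name decode_info is hypothetical, plain int stands in for MKL_INT, and it assumes that no routine has 100 or more arguments, so that any code of 100 or more denotes an array argument.

```c
/* Hypothetical helper (not an MKL function): unpacks a negative info value.
 * Array argument:  info = -(i*100 + j)  ->  *arg = i, *entry = j (1-based).
 * Scalar argument: info = -i            ->  *arg = i, *entry = 0.
 * For info >= 0 both outputs are set to 0 (no argument error). */
static void decode_info(int info, int *arg, int *entry)
{
    int code = -info;

    if (info >= 0) {
        *arg = 0;
        *entry = 0;
    } else if (code >= 100) {   /* array argument: code = i*100 + j */
        *arg = code / 100;
        *entry = code % 100;
    } else {                    /* scalar argument: code = i */
        *arg = code;
        *entry = 0;
    }
}
```

For example, with the pdgetri argument order shown in the syntax, info = -502 would point at the second entry of desca (the fifth argument), which is stored at desca[1].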
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?potri
Computes the inverse of a symmetric/Hermitian
positive definite distributed matrix.
Syntax
void pspotri (char *uplo , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , MKL_INT *info );
void pdpotri (char *uplo , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , MKL_INT *info );
void pcpotri (char *uplo , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *info );
void pzpotri (char *uplo , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?potri function computes the inverse of a real symmetric or complex Hermitian positive definite
distributed matrix sub(A) = A(ia:ia+n-1, ja:ja+n-1) using the Cholesky factorization sub(A) = U^H*U or
sub(A) = L*L^H computed by p?potrf.
Input Parameters
n (global) The number of rows and columns to be operated on, that is, the
order of the distributed matrix sub(A) (n≥0).
a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).
On entry, the array a contains the local pieces of the triangular factor U or L
from the Cholesky factorization sub(A) = U^H*U, or sub(A) = L*L^H, as
computed by p?potrf.
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
Output Parameters
a On exit, overwritten by the local pieces of the upper or lower triangle of the
(symmetric/Hermitian) inverse of sub(A).
info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.
info> 0:
If info = i, the element (i, i) of the factor U or L is zero, and the inverse
could not be computed.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?trtri
Computes the inverse of a triangular distributed
matrix.
Syntax
void pstrtri (char *uplo , char *diag , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , MKL_INT *info );
void pdtrtri (char *uplo , char *diag , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , MKL_INT *info );
void pctrtri (char *uplo , char *diag , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_INT *info );
void pztrtri (char *uplo , char *diag , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?trtri function computes the inverse of a real or complex upper or lower triangular distributed matrix
sub(A) = A(ia:ia+n-1, ja:ja+n-1).
Input Parameters
diag Must be 'N' or 'U'.
n (global) The number of rows and columns to be operated on, that is, the
order of the distributed matrix sub(A) (n≥0).
a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).
The array a contains the local pieces of the triangular distributed matrix
sub(A).
If uplo = 'U', the leading n-by-n upper triangular part of sub(A) contains
the upper triangular matrix to be inverted, and the strictly lower triangular
part of sub(A) is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of sub(A) contains
the lower triangular matrix, and the strictly upper triangular part of sub(A)
is not referenced.
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
Output Parameters
info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.
info> 0:
If info = k, the matrix element A(ia+k-1, ja+k-1) is exactly zero. The
triangular matrix sub(A) is singular and its inverse cannot be computed.
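The meaning of a positive info value can be illustrated with a serial analogue of the computation. The sketch below is an assumption-laden illustration, not the distributed algorithm used by p?trtri: it treats the whole matrix as local to one process, handles only uplo = 'U' with diag = 'N', uses column-major storage with 0-based indices, and the function name trtri_upper_sketch is hypothetical.

```c
/* Serial sketch only (hypothetical helper, not an MKL function).
 * Inverts an n-by-n upper triangular, non-unit matrix in place.
 * a is column-major with leading dimension lda; the strictly lower
 * triangle is not referenced. Returns 0 on success, or k (1-based)
 * if the k-th diagonal element is exactly zero. */
static int trtri_upper_sketch(int n, double *a, int lda)
{
    /* Singularity check first, so the matrix is untouched on failure. */
    for (int k = 0; k < n; ++k)
        if (a[k + k * lda] == 0.0)
            return k + 1;

    for (int j = 0; j < n; ++j) {
        a[j + j * lda] = 1.0 / a[j + j * lda];
        double ajj = -a[j + j * lda];

        /* Column j above the diagonal becomes -inv(T)*x / d, where inv(T)
         * is the already inverted leading j-by-j block, x is the old
         * column, and d is the old diagonal element. Ascending i is safe
         * because row i only reads entries k >= i of the old column. */
        for (int i = 0; i < j; ++i) {
            double s = 0.0;
            for (int k = i; k < j; ++k)
                s += a[i + k * lda] * a[k + j * lda];
            a[i + j * lda] = s * ajj;
        }
    }
    return 0;
}
```

With the 2-by-2 matrix [[2, 1], [0, 4]] the sketch produces [[0.5, -0.125], [0, 0.25]]; replacing the 4 by 0 makes it return 2, matching the info = k convention.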
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?geequ
Computes row and column scaling factors intended to
equilibrate a general rectangular distributed matrix
and reduce its condition number.
Syntax
void psgeequ (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *r , float *c , float *rowcnd , float *colcnd , float *amax , MKL_INT
*info );
void pdgeequ (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , double *r , double *c , double *rowcnd , double *colcnd , double *amax ,
MKL_INT *info );
void pcgeequ (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , float *r , float *c , float *rowcnd , float *colcnd , float *amax ,
MKL_INT *info );
void pzgeequ (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , double *r , double *c , double *rowcnd , double *colcnd , double
*amax , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?geequ function computes row and column scalings intended to equilibrate an m-by-n distributed matrix
sub(A) = A(ia:ia+m-1, ja:ja+n-1) and reduce its condition number. The output array r returns the row
scale factors r_i, and the array c returns the column scale factors c_j. These factors are chosen to try to make
the largest element in each row and column of the matrix B with elements b_ij = r_i*a_ij*c_j have absolute value 1.
r_i and c_j are restricted to be between SMLNUM = smallest safe number and BIGNUM = largest safe number.
Use of these scaling factors is not guaranteed to reduce the condition number of sub(A) but works well in
practice.
SMLNUM and BIGNUM are parameters representing machine precision. You can use the ?lamch routines to
compute them. For example, the single precision value of SMLNUM is returned by slamch('s'), and BIGNUM
is then 1/SMLNUM.
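The scaling rule can be sketched serially. This is an illustration under stated assumptions, not the distributed implementation: the matrix is held by a single process in column-major order, indices are 0-based, the function name geequ_sketch is hypothetical, and FLT_MIN is used in place of slamch('S') on the assumption that the two coincide on IEEE 754 hardware. The ordering (rows first, then columns of the row-scaled matrix) follows the reference LAPACK ?geequ approach.

```c
#include <float.h>

static float absf(float x) { return x < 0.0f ? -x : x; }

static float clampf(float x, float lo, float hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}

/* Serial sketch only (hypothetical helper, not an MKL function).
 * Computes r[i] and c[j] so that the largest entry of each row and column
 * of B, with b_ij = r[i]*a_ij*c[j], has absolute value close to 1.
 * Returns 0 on success, i (1-based) if row i is exactly zero, or
 * m + j (j 1-based) if column j is exactly zero. */
static int geequ_sketch(int m, int n, const float *a, int lda,
                        float *r, float *c)
{
    /* Assumption: FLT_MIN stands in for slamch('S'). */
    const float smlnum = FLT_MIN;
    const float bignum = 1.0f / smlnum;

    for (int i = 0; i < m; ++i) {
        float rowmax = 0.0f;
        for (int j = 0; j < n; ++j) {
            float v = absf(a[i + j * lda]);
            if (v > rowmax) rowmax = v;
        }
        if (rowmax == 0.0f) return i + 1;
        r[i] = 1.0f / clampf(rowmax, smlnum, bignum);
    }

    for (int j = 0; j < n; ++j) {
        float colmax = 0.0f;
        for (int i = 0; i < m; ++i) {
            float v = absf(a[i + j * lda]) * r[i];
            if (v > colmax) colmax = v;
        }
        if (colmax == 0.0f) return m + j + 1;
        c[j] = 1.0f / clampf(colmax, smlnum, bignum);
    }
    return 0;
}
```

For the 2-by-2 matrix [[1, 2], [4, 8]] this gives r = {0.5, 0.125} and c = {2, 1}, so every entry of the scaled matrix equals 1.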
Input Parameters
m (global) The number of rows to be operated on, that is, the number of rows
of the distributed matrix sub(A) (m≥ 0).
n (global) The number of columns to be operated on, that is, the number of
columns of the distributed matrix sub(A) (n≥ 0).
a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).
The array a contains the local pieces of the m-by-n distributed matrix whose
equilibration factors are to be computed.
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
Output Parameters
r, c (local)
Arrays of sizes LOCr(m_a) and LOCc(n_a), respectively.
If info = 0, or info > ia+m-1, r[i] contains the row scale factors for sub(A)
for ia-1 ≤ i < ia+m-1. r is aligned with the distributed matrix A, and
replicated across every process column. r is tied to the distributed matrix
A.
If info = 0, c[i] contains the column scale factors for sub(A) for ja-1 ≤
i < ja+n-1. c is aligned with the distributed matrix A, and replicated down
every process row. c is tied to the distributed matrix A.
amax (global)
Absolute value of the largest matrix element. If amax is very close to
overflow or very close to underflow, the matrix should be scaled.
info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.
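info > 0:
If info = i and
i ≤ m, the i-th row of the distributed matrix sub(A) is exactly zero;
i > m, the (i-m)-th column of the distributed matrix sub(A) is exactly zero.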
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?poequ
Computes row and column scaling factors intended to
equilibrate a symmetric (Hermitian) positive definite
distributed matrix and reduce its condition number.
Syntax
void pspoequ (MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float
*sr , float *sc , float *scond , float *amax , MKL_INT *info );
void pdpoequ (MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca ,
double *sr , double *sc , double *scond , double *amax , MKL_INT *info );
void pcpoequ (MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *sr , float *sc , float *scond , float *amax , MKL_INT *info );
void pzpoequ (MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , double *sr , double *sc , double *scond , double *amax , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?poequ function computes row and column scalings intended to equilibrate a real symmetric or
complex Hermitian positive definite distributed matrix sub(A) = A(ia:ia+n-1, ja:ja+n-1) and reduce its
condition number (with respect to the two-norm). The output arrays sr and sc return the row and column
scale factors s(i) = 1/sqrt(a_ii).
These factors are chosen so that the scaled distributed matrix B with elements b_ij = s(i)*a_ij*s(j) has ones on
the diagonal.
This choice of sr and sc puts the condition number of B within a factor n of the smallest possible condition
number over all possible diagonal scalings.
The auxiliary function p?laqsy uses scaling factors computed by p?poequ to scale a symmetric/Hermitian
matrix.
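The unit diagonal of B follows directly from the definition of its elements. The short derivation below writes s_i for the scale factor s(i) and a_ii for the i-th diagonal entry of sub(A); it is a restatement under those notational assumptions, not additional behavior of the routine.

```latex
b_{ii} = s_i \, a_{ii} \, s_i = s_i^{2} \, a_{ii} = 1
\quad\Longrightarrow\quad
s_i = \frac{1}{\sqrt{a_{ii}}}, \qquad a_{ii} > 0 .
```

The condition a_ii > 0 is what makes the square root well defined, which is consistent with a nonpositive diagonal entry being reported through a positive info value.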
Input Parameters
n (global) The number of rows and columns to be operated on, that is, the
order of the distributed matrix sub(A) (n≥0).
a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
Output Parameters
sr, sc (local)
Arrays of sizes LOCr(m_a) and LOCc(n_a), respectively.
If info = 0, the array sr(ia:ia+n-1) contains the row scale factors for
sub(A). sr is aligned with the distributed matrix A, and replicated across
every process column. sr is tied to the distributed matrix A.
scond (global)
amax (global)
Absolute value of the largest matrix element. If amax is very close to
overflow or very close to underflow, the matrix should be scaled.
info (global)
If info=0, the execution is successful.
info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.
info> 0:
If info = k, the k-th diagonal entry of sub(A) is nonpositive.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?geqrf
Computes the QR factorization of a general m-by-n
matrix.
Syntax
void psgeqrf (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdgeqrf (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );
void pcgeqrf (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT
*info );
void pzgeqrf (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT *lwork , MKL_INT
*info );
Include Files
• mkl_scalapack.h
Description
The p?geqrf function forms the QR factorization of a general m-by-n distributed matrix sub(A) = A(ia:ia+m-1,
ja:ja+n-1) as
sub(A) = Q*R.
Input Parameters
m (global) The number of rows in the distributed matrix sub(A); (m≥ 0).
n (global) The number of columns in the distributed matrix sub(A); (n≥ 0).
a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A(ia:ia+m-1, ja:ja+n-1),
respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
work (local)
Workspace array of size lwork.
Output Parameters
a The elements on and above the diagonal of sub(A) contain the min(m,n)-by-n
upper trapezoidal matrix R (R is upper triangular if m≥n); the elements
below the diagonal, with the array tau, represent the orthogonal/unitary
matrix Q as a product of elementary reflectors (see Application Notes
below).
tau (local)
Array of size LOCc(ja+min(m,n)-1).
work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.
info (global)
= 0, the execution is successful.
< 0, if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.
Application Notes
The matrix Q is represented as a product of elementary reflectors
Q = H(ja)*H(ja+1)*...*H(ja+k-1),
where k = min(m,n).
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?geqpf
Computes the QR factorization of a general m-by-n
matrix with pivoting.
Syntax
void psgeqpf (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , MKL_INT *ipiv , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdgeqpf (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , MKL_INT *ipiv , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );
void pcgeqpf (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *ipiv , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT
*lwork , float *rwork , MKL_INT *lrwork , MKL_INT *info );
void pzgeqpf (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *ipiv , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT
*lwork , double *rwork , MKL_INT *lrwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?geqpf function forms the QR factorization with column pivoting of a general m-by-n distributed matrix
sub(A)= A(ia:ia+m-1, ja:ja+n-1) as
sub(A)*P=Q*R.
Input Parameters
a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).
Contains the local pieces of the distributed matrix sub(A) to be factored.
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A(ia:ia+m-1, ja:ja+n-1),
respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
work (local)
Workspace array of size lwork.
You can determine MYROW, MYCOL, NPROW and NPCOL by calling the
blacs_gridinfo function.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.
rwork (local)
Workspace array of size lrwork (complex flavors only).
lrwork (local or global) size of rwork (complex flavors only). The value of lrwork
must be at least
lrwork ≥ LOCc(ja+n-1) + nq0.
Here
iroff = mod(ia-1, mb_a), icoff = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
mp0 = numroc(m+iroff, mb_a, MYROW, iarow, NPROW ),
nq0 = numroc(n+icoff, nb_a, MYCOL, iacol, NPCOL),
You can determine MYROW, MYCOL, NPROW and NPCOL by calling the
blacs_gridinfo function.
If lrwork = -1, then lrwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.
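The workspace query described above is typically used as a two-call pattern: query with lwork = -1, allocate the reported size, then call again. The sketch below shows that control flow only. It assumes int in place of MKL_INT (true only for the LP64 interface), takes the routine through a function pointer with the real-flavor p?geqpf argument list so that it compiles without MKL or MPI, and uses a hypothetical stand-in, stub_geqpf, that merely imitates the lwork protocol with an arbitrary required size of 64.

```c
#include <stdlib.h>

/* Same argument list as pdgeqpf in the syntax above, with int standing in
 * for MKL_INT (an assumption valid only for the LP64 interface). */
typedef void (*geqpf_fn)(int *m, int *n, double *a, int *ia, int *ja,
                         int *desca, int *ipiv, double *tau,
                         double *work, int *lwork, int *info);

/* Hypothetical stand-in, NOT the real routine: it only imitates the lwork
 * protocol and pretends that 64 workspace elements are required. */
static void stub_geqpf(int *m, int *n, double *a, int *ia, int *ja,
                       int *desca, int *ipiv, double *tau,
                       double *work, int *lwork, int *info)
{
    const int needed = 64;
    (void)m; (void)n; (void)a; (void)ia; (void)ja;
    (void)desca; (void)ipiv; (void)tau;

    if (*lwork == -1) {             /* workspace query: report size only */
        work[0] = (double)needed;
        *info = 0;
    } else if (*lwork < needed) {   /* lwork is the 10th argument */
        *info = -10;
    } else {
        work[0] = (double)needed;
        *info = 0;
    }
}

/* Two-call pattern. Returns info from the routine, or -1000 if malloc
 * fails (a value chosen for this sketch only). */
static int factor_with_query(geqpf_fn geqpf, int m, int n, double *a,
                             int ia, int ja, int *desca, int *ipiv,
                             double *tau, int *lwork_used)
{
    double query = 0.0;
    int lwork = -1, info = 0;

    /* First call: lwork = -1, the size comes back in the first entry. */
    geqpf(&m, &n, a, &ia, &ja, desca, ipiv, tau, &query, &lwork, &info);
    if (info != 0) return info;

    lwork = (int)query;
    double *work = malloc((size_t)(lwork > 0 ? lwork : 1) * sizeof *work);
    if (work == NULL) return -1000;

    /* Second call: the actual computation with the allocated workspace. */
    geqpf(&m, &n, a, &ia, &ja, desca, ipiv, tau, work, &lwork, &info);
    free(work);
    *lwork_used = lwork;
    return info;
}
```

In a real program the function pointer would be replaced by a direct call to the chosen flavor, and the complex flavors would query rwork and lrwork in the same way.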
Output Parameters
ipiv[i] = k, the local (i+1)-th column of sub(A)*P was the global k-th
column of sub(A) (0 ≤ i < LOCc(ja+n-1)). ipiv is tied to the distributed
matrix A.
tau (local)
Array of size LOCc(ja+min(m, n)-1).
Contains the scalar factor tau of elementary reflectors. tau is tied to the
distributed matrix A.
work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.
rwork[0] On exit, rwork[0] contains the minimum value of lrwork required for
optimum performance.
info (global)
= 0, the execution is successful.
< 0, if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.
Application Notes
The matrix Q is represented as a product of elementary reflectors
Q = H(1)*H(2)*...*H(k)
where k = min(m,n).
The matrix P is represented in ipiv as follows: if ipiv[j] = i, then the (j+1)-th column of P is the i-th
canonical unit vector (0 ≤ j < LOCc(ja+n-1)).
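Because column j+1 of P is the ipiv[j]-th unit vector, column j+1 of sub(A)*P is simply column ipiv[j] of sub(A). The sketch below applies that rule serially. It assumes a single process that holds the whole matrix (so local and global column indices coincide), column-major storage, and 1-based values in ipiv; the function name apply_column_pivots is hypothetical.

```c
/* Serial sketch only (hypothetical helper, not an MKL function).
 * Forms ap = a*P for an m-by-n column-major matrix a, where column j
 * (0-based) of P is the unit vector selected by the 1-based ipiv[j]. */
static void apply_column_pivots(int m, int n, const double *a, int lda,
                                const int *ipiv, double *ap, int ldap)
{
    for (int j = 0; j < n; ++j) {
        int src = ipiv[j] - 1;           /* convert to a 0-based column */
        for (int i = 0; i < m; ++i)
            ap[i + j * ldap] = a[i + src * lda];
    }
}
```

For a single row {10, 20, 30} and ipiv = {3, 1, 2}, the permuted row is {30, 10, 20}.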
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?orgqr
Generates the orthogonal matrix Q of the QR
factorization formed by p?geqrf.
Syntax
void psorgqr (MKL_INT *m , MKL_INT *n , MKL_INT *k , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdorgqr (MKL_INT *m , MKL_INT *n , MKL_INT *k , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?orgqr function generates the whole or part of the m-by-n real distributed matrix Q denoting A(ia:ia+m-1,
ja:ja+n-1) with orthonormal columns, which is defined as the first n columns of a product of k elementary
reflectors of order m
Q= H(1)*H(2)*...*H(k)
as returned by p?geqrf.
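How the first n columns of Q arise from the reflectors can be sketched serially. The code below is an illustration under assumptions that the text above does not spell out: each reflector is taken to have the standard LAPACK form H(j) = I - tau[j]*v_j*v_j^T, with v_j stored below the diagonal in column j of a, an implicit 1 on the diagonal, and zeros above it. Storage is column-major with 0-based indices on a single process, and the function name form_q is hypothetical.

```c
/* Serial sketch only (hypothetical helper, not an MKL function).
 * Writes the first n columns of Q = H(1)*H(2)*...*H(k) into q (m-by-n,
 * column-major, leading dimension ldq). Assumed convention:
 * H(j) = I - tau[j]*v*v^T with v[i] = 0 for i < j, v[j] = 1 and
 * v[i] = a[i + j*lda] for i > j (0-based indices). */
static void form_q(int m, int n, int k, const double *a, int lda,
                   const double *tau, double *q, int ldq)
{
    for (int c = 0; c < n; ++c) {
        double *col = q + c * ldq;

        /* Start from the c-th unit vector. */
        for (int i = 0; i < m; ++i)
            col[i] = (i == c) ? 1.0 : 0.0;

        /* Q*e_c = H(1)*...*H(k)*e_c, so the last reflector acts first. */
        for (int j = k - 1; j >= 0; --j) {
            double dot = col[j];                    /* implicit v[j] = 1 */
            for (int i = j + 1; i < m; ++i)
                dot += a[i + j * lda] * col[i];

            col[j] -= tau[j] * dot;
            for (int i = j + 1; i < m; ++i)
                col[i] -= tau[j] * dot * a[i + j * lda];
        }
    }
}
```

With m = n = 2, k = 1, v = (1, 1) and tau = 1, the single reflector is [[0, -1], [-1, 0]], and the sketch returns exactly those two columns.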
Input Parameters
a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).
The j-th column of the matrix stored in a must contain the vector that
defines the elementary reflector H(j), ja ≤ j ≤ ja+k-1, as returned by
p?geqrf in the k columns of its distributed matrix argument A(ia:*,
ja:ja+k-1).
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A(ia:ia+m-1, ja:ja+n-1),
respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
tau (local)
Array of size LOCc(ja+k-1).
work (local)
Workspace array of size lwork.
Output Parameters
work[0] On exit, work[0] contains the minimum value of lwork required for optimum
performance.
info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?ungqr
Generates the complex unitary matrix Q of the QR
factorization formed by p?geqrf.
Syntax
void pcungqr (MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzungqr (MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT
*lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
This function generates the whole or part of m-by-n complex distributed matrix Q denoting A(ia:ia+m-1,
ja:ja+n-1) with orthonormal columns, which is defined as the first n columns of a product of k elementary
reflectors of order m
Q = H(1)*H(2)*...*H(k)
as returned by p?geqrf.
Input Parameters
a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1). The
j-th column of the matrix stored in a must contain the vector that defines
the elementary reflector H(j), ja ≤ j ≤ ja+k-1, as returned by p?geqrf in
the k columns of its distributed matrix argument A(ia:*, ja:ja+k-1).
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
tau (local)
Array of size LOCc(ja+k-1).
work (local)
Workspace array of size lwork.
indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.
Output Parameters
work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.
info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?ormqr
Multiplies a general matrix by the orthogonal matrix Q
of the QR factorization formed by p?geqrf.
Syntax
void psormqr (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , float
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *tau , float *c , MKL_INT *ic ,
MKL_INT *jc , MKL_INT *descc , float *work , MKL_INT *lwork , MKL_INT *info );
void pdormqr (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , double
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *tau , double *c , MKL_INT
*ic , MKL_INT *jc , MKL_INT *descc , double *work , MKL_INT *lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?ormqr function overwrites the general real m-by-n distributed matrix sub(C) = C(ic:ic+m-1, jc:jc+n-1)
with
               side = 'L'       side = 'R'
trans = 'N':   Q*sub(C)         sub(C)*Q
trans = 'T':   Q^T*sub(C)       sub(C)*Q^T
where Q is a real orthogonal distributed matrix defined as the product of k elementary reflectors
Q = H(1) H(2)... H(k)
Input Parameters
side (global)
='L': Q or Q^T is applied from the left.
='R': Q or Q^T is applied from the right.
trans (global)
='N', no transpose, Q is applied.
='T', transpose, Q^T is applied.
a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1). The
j-th column of the matrix stored in a must contain the vector that defines
the elementary reflector H(j), ja≤j≤ja+k-1, as returned by p?geqrf in the
k columns of its distributed matrix argument A(ia:*, ja:ja+k-1). A(ia:*,
ja:ja+k-1) is modified by the function but restored on exit.
If side = 'L', lld_a ≥ max(1, LOCr(ia+m-1))
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
tau (local)
Array of size LOCc(ja+k-1).
c (local)
Pointer into the local memory to an array of local size lld_c*LOCc(jc+n-1).
ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the matrix sub(C), respectively.
descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.
work (local)
If side = 'L',
lwork ≥ max((nb_a*(nb_a-1))/2, (nqc0+mpc0)*nb_a) + nb_a*nb_a
else if side = 'R',
lwork ≥ max((nb_a*(nb_a-1))/2, (nqc0+max(npa0+numroc(numroc(n+icoffc, nb_a, 0, 0, NPCOL), nb_a, 0, 0, lcmq), mpc0))*nb_a) + nb_a*nb_a
end if
where
lcmq = lcm/NPCOL with lcm = ilcm(NPROW, NPCOL),
iroffa = mod(ia-1, mb_a),
icoffa = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),
npa0= numroc(n+iroffa, mb_a, MYROW, iarow, NPROW),
iroffc = mod(ic-1, mb_c),
icoffc = mod(jc-1, nb_c),
icrow = indxg2p(ic, mb_c, MYROW, rsrc_c, NPROW),
iccol = indxg2p(jc, nb_c, MYCOL, csrc_c, NPCOL),
mpc0= numroc(m+iroffc, mb_c, MYROW, icrow, NPROW),
nqc0= numroc(n+icoffc, nb_c, MYCOL, iccol, NPCOL),
ilcm, indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL,
NPROW and NPCOL can be determined by calling the function
blacs_gridinfo.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.
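The helper quantities in the lwork bound are plain integer arithmetic, so the bound can be evaluated before any allocation. The sketch below computes the bound that contains the nested numroc calls. The functions numroc_sketch, indxg2p_sketch and ilcm_sketch are hypothetical re-implementations that are assumed to follow the behavior of the like-named ScaLAPACK tool functions; the struct types and the function ormqr_nested_lwork are likewise illustrative, and plain int stands in for MKL_INT.

```c
/* Hypothetical re-implementations for illustration; in a real program the
 * ScaLAPACK tool functions themselves would be called. */
static int numroc_sketch(int n, int nb, int iproc, int isrcproc, int nprocs)
{
    int mydist = (nprocs + iproc - isrcproc) % nprocs;
    int nblocks = n / nb;                   /* number of whole blocks */
    int count = (nblocks / nprocs) * nb;    /* blocks every process gets */
    int extra = nblocks % nprocs;           /* leftover whole blocks */

    if (mydist < extra) count += nb;              /* one extra whole block */
    else if (mydist == extra) count += n % nb;    /* the partial block */
    return count;
}

static int indxg2p_sketch(int indxglob, int nb, int iproc, int isrcproc,
                          int nprocs)
{
    (void)iproc;    /* unused; kept to mirror the five-argument call */
    return (isrcproc + (indxglob - 1) / nb) % nprocs;
}

static int ilcm_sketch(int a, int b)
{
    int x = a, y = b;
    while (y != 0) { int t = x % y; x = y; y = t; }  /* x = gcd(a, b) */
    return a / x * b;
}

static int imax(int a, int b) { return a > b ? a : b; }

typedef struct { int myrow, mycol, nprow, npcol; } grid_t;   /* process grid */
typedef struct { int mb, nb, rsrc, csrc; } layout_t;         /* descriptor part */

/* Evaluates the lwork bound with the nested numroc calls, term by term. */
static int ormqr_nested_lwork(int m, int n, int ia, int ic, int jc,
                              layout_t la, layout_t lc, grid_t g)
{
    int lcmq   = ilcm_sketch(g.nprow, g.npcol) / g.npcol;
    int iroffa = (ia - 1) % la.mb;
    int iarow  = indxg2p_sketch(ia, la.mb, g.myrow, la.rsrc, g.nprow);
    int npa0   = numroc_sketch(n + iroffa, la.mb, g.myrow, iarow, g.nprow);
    int iroffc = (ic - 1) % lc.mb;
    int icoffc = (jc - 1) % lc.nb;
    int icrow  = indxg2p_sketch(ic, lc.mb, g.myrow, lc.rsrc, g.nprow);
    int iccol  = indxg2p_sketch(jc, lc.nb, g.mycol, lc.csrc, g.npcol);
    int mpc0   = numroc_sketch(m + iroffc, lc.mb, g.myrow, icrow, g.nprow);
    int nqc0   = numroc_sketch(n + icoffc, lc.nb, g.mycol, iccol, g.npcol);
    int inner  = numroc_sketch(
                     numroc_sketch(n + icoffc, la.nb, 0, 0, g.npcol),
                     la.nb, 0, 0, lcmq);

    return imax((la.nb * (la.nb - 1)) / 2,
                (nqc0 + imax(npa0 + inner, mpc0)) * la.nb)
           + la.nb * la.nb;
}
```

On a 1-by-1 process grid with 2-by-2 blocks, m = 4, n = 3 and all starting indices equal to 1, the helper values are npa0 = 3, mpc0 = 4, nqc0 = 3 and the bound evaluates to 22.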
Output Parameters
work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.
info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?unmqr
Multiplies a complex matrix by the unitary matrix Q of
the QR factorization formed by p?geqrf.
Syntax
void pcunmqr (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau ,
MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex8 *work ,
MKL_INT *lwork , MKL_INT *info );
void pzunmqr (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau ,
MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex16 *work ,
MKL_INT *lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
This function overwrites the general complex m-by-n distributed matrix sub(C) = C(ic:ic+m-1, jc:jc+n-1)
with
               side = 'L'       side = 'R'
trans = 'N':   Q*sub(C)         sub(C)*Q
trans = 'C':   Q^H*sub(C)       sub(C)*Q^H
where Q is a complex unitary distributed matrix defined as the product of k elementary reflectors
Q = H(1) H(2)... H(k) as returned by p?geqrf. Q is of order m if side = 'L' and of order n if side = 'R'.
Input Parameters
side (global)
='L': Q or Q^H is applied from the left.
='R': Q or Q^H is applied from the right.
trans (global)
='N', no transpose, Q is applied.
='C', conjugate transpose, Q^H is applied.
a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+k-1). The
j-th column of the matrix stored in a must contain the vector that defines
the elementary reflector H(j), ja≤j≤ja+k-1, as returned by p?geqrf in the
k columns of its distributed matrix argument A(ia:*, ja:ja+k-1). A(ia:*,
ja:ja+k-1) is modified by the function but restored on exit.
If side = 'L', lld_a ≥ max(1, LOCr(ia+m-1))
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
tau (local)
Array of size LOCc(ja+k-1).
c (local)
Pointer into the local memory to an array of local size lld_c*LOCc(jc+n-1).
ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the submatrix C, respectively.
descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.
work (local)
Workspace array of size lwork.
If side = 'L',
npa0 = numroc(n+iroffa, mb_a, MYROW, iarow, NPROW),
iroffc = mod(ic-1, mb_c),
icoffc = mod(jc-1, nb_c),
icrow = indxg2p(ic, mb_c, MYROW, rsrc_c, NPROW),
iccol = indxg2p(jc, nb_c, MYCOL, csrc_c, NPCOL),
mpc0 = numroc(m+iroffc, mb_c, MYROW, icrow, NPROW),
nqc0 = numroc(n+icoffc, nb_c, MYCOL, iccol, NPCOL),
ilcm, indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL,
NPROW and NPCOL can be determined by calling the function
blacs_gridinfo.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.
Output Parameters
work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.
info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?gelqf
Computes the LQ factorization of a general
rectangular matrix.
Syntax
void psgelqf (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdgelqf (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );
void pcgelqf (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT
*info );
void pzgelqf (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT *lwork , MKL_INT
*info );
Include Files
• mkl_scalapack.h
Description
The p?gelqf function computes the LQ factorization of a real/complex distributed m-by-n matrix sub(A)=
A(ia:ia+m-1,ja:ja+n-1) = L*Q.
Input Parameters
m (global) The number of rows in the distributed submatrix sub(A) (m≥ 0).
a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).
ia, ja (global) The row and column indices in the global array A indicating the first
row and the first column of the submatrix A(ia:ia+m-1,ja:ja+n-1),
respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
work (local)
Workspace array of size lwork.
NOTE
mod(x,y) is the integer remainder of x/y.
Output Parameters
a The elements on and below the diagonal of sub(A) contain the m-by-min(m,n)
lower trapezoidal matrix L (L is lower triangular if m ≤ n); the
elements above the diagonal, with the array tau, represent the orthogonal/unitary
matrix Q as a product of elementary reflectors (see Application
Notes below).
tau (local)
Array of size LOCr(ia+min(m, n)-1).
work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.
info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.
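The info encoding described above can be unpacked with a few lines of C. The following is a minimal sketch, not part of oneMKL: the type info_error and the function decode_info are hypothetical names introduced here for illustration. It relies only on the rule stated above, info = -(i*100+j) for an array argument and info = -i for a scalar argument.

```c
#include <assert.h>

/* Hypothetical helper (not a oneMKL function): splits a non-positive info
   value into the 1-based argument position i and, for array arguments,
   the 1-based entry position j. */
typedef struct {
    int arg;    /* i: position of the offending argument, 0 if info == 0 */
    int entry;  /* j: offending entry of an array argument, 0 for scalars */
} info_error;

static info_error decode_info(long long info)
{
    info_error e = {0, 0};
    assert(info <= 0);              /* the text above defines only info <= 0 */
    long long code = -info;
    if (code >= 100) {              /* array argument: info = -(i*100 + j) */
        e.arg = (int)(code / 100);
        e.entry = (int)(code % 100);
    } else {                        /* scalar argument: info = -i */
        e.arg = (int)code;
    }
    return e;
}
```

For example, info = -604 decodes to argument 6, entry 4 (stored at index 3 in C), while info = -2 decodes to the scalar argument in position 2.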
Application Notes
The matrix Q is represented as a product of elementary reflectors
Q = H(ia+k-1)*H(ia+k-2)*...*H(ia),
where k = min(m,n)
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?orglq
Generates the real orthogonal matrix Q of the LQ
factorization formed by p?gelqf.
Syntax
void psorglq (MKL_INT *m , MKL_INT *n , MKL_INT *k , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdorglq (MKL_INT *m , MKL_INT *n , MKL_INT *k , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?orglq function generates the whole or part of the m-by-n real distributed matrix Q denoting
A(ia:ia+m-1,ja:ja+n-1) with orthonormal rows, which is defined as the first m rows of a product of k
elementary reflectors of order n
Q = H(k)*...*H(2)*H(1)
as returned by p?gelqf.
Input Parameters
a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).
On entry, the i-th row of the matrix stored in a must contain the vector that
defines the elementary reflector H(i), ia≤i≤ia+k-1, as returned by
p?gelqf in the k rows of its distributed matrix argument A(ia:ia+k-1,
ja:*).
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A(ia:ia+m-1,ja:ja+n-1),
respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
tau (local)
Array of size LOCr(ia+k-1).
work (local)
Workspace array of size lwork.
NOTE
mod(x,y) is the integer remainder of x/y.
indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.
Output Parameters
work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.
info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?unglq
Generates the unitary matrix Q of the LQ factorization
formed by p?gelqf.
Syntax
void pcunglq (MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzunglq (MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT
*lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
This function generates the whole or part of the m-by-n complex distributed matrix Q denoting
A(ia:ia+m-1,ja:ja+n-1) with orthonormal rows, which is defined as the first m rows of a product of k
elementary reflectors of order n
Q = H(k)' ... H(2)' H(1)'
as returned by p?gelqf.
Input Parameters
a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).
On entry, the i-th row of the matrix stored in a must contain the vector that
defines the elementary reflector H(i), ia≤i≤ia+k-1, as returned by
p?gelqf in the k rows of its distributed matrix argument A(ia:ia+k-1,
ja:*).
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A(ia:ia+m-1,ja:ja+n-1),
respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
tau (local)
Array of size LOCr(ia+k-1).
work (local)
Workspace array of size lwork.
NOTE
mod(x,y) is the integer remainder of x/y.
Output Parameters
work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.
info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?ormlq
Multiplies a general matrix by the orthogonal matrix Q
of the LQ factorization formed by p?gelqf.
Syntax
void psormlq (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , float
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *tau , float *c , MKL_INT *ic ,
MKL_INT *jc , MKL_INT *descc , float *work , MKL_INT *lwork , MKL_INT *info );
void pdormlq (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , double
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *tau , double *c , MKL_INT
*ic , MKL_INT *jc , MKL_INT *descc , double *work , MKL_INT *lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?ormlq function overwrites the general real m-by-n distributed matrix sub(C) = C(ic:ic+m-1,jc:jc+n-1)
with Q*sub(C) or Q^T*sub(C) (side = 'L'), or with sub(C)*Q or sub(C)*Q^T (side = 'R'), depending on trans,
where Q is a real orthogonal distributed matrix defined as the product of k elementary reflectors
Q = H(k)...H(2) H(1)
Input Parameters
side (global)
='L': Q or Q^T is applied from the left.
='R': Q or Q^T is applied from the right.
trans (global)
='N', no transpose, Q is applied.
='T', transpose, Q^T is applied.
a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+m-1), if
side = 'L' and lld_a*LOCc(ja+n-1), if side = 'R'. The i-th row of the
matrix stored in a must contain the vector that defines the elementary
reflector H(i), ia≤i≤ia+k-1, as returned by p?gelqf in the k rows of its
distributed matrix argument A(ia:ia+k-1, ja:*).
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
tau (local)
Array of size LOCc(ja+k-1).
c (local)
Pointer into the local memory to an array of local size lld_c*LOCc(jc+n-1).
ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the submatrix C, respectively.
descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.
work (local)
Workspace array of size lwork.
If side = 'L',
lwork ≥ max((mb_a*(mb_a-1))/2, (mpc0 + max(mqa0 +
numroc(numroc(m+iroffc, mb_a, 0, 0, NPROW), mb_a, 0, 0,
lcmp), nqc0))*mb_a) + mb_a*mb_a
else if side = 'R',
lwork ≥ max((mb_a*(mb_a-1))/2, (mpc0+nqc0)*mb_a) + mb_a*mb_a
end if
where
lcmp = lcm/NPROW with lcm = ilcm (NPROW, NPCOL),
iroffa = mod(ia-1, mb_a),
icoffa = mod(ja-1, nb_a),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
mqa0 = numroc(m+icoffa, nb_a, MYCOL, iacol, NPCOL),
iroffc = mod(ic-1, mb_c),
icoffc = mod(jc-1, nb_c),
icrow = indxg2p(ic, mb_c, MYROW, rsrc_c, NPROW),
iccol = indxg2p(jc, nb_c, MYCOL, csrc_c, NPCOL),
mpc0 = numroc(m+iroffc, mb_c, MYROW, icrow, NPROW),
nqc0 = numroc(n+icoffc, nb_c, MYCOL, iccol, NPCOL),
NOTE
mod(x,y) is the integer remainder of x/y.
ilcm, indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL,
NPROW and NPCOL can be determined by calling the function
blacs_gridinfo.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.
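The quantities mpc0 and nqc0 in the formulas above are local row and column counts obtained from numroc and indxg2p. The sketch below re-implements the arithmetic of those two tool functions in plain C to show what they compute; a real program would call the ScaLAPACK tool functions themselves. The names numroc_sketch, indxg2p_sketch and mpc0_sketch are hypothetical, process coordinates are 0-based, global indices are 1-based, and the unused process argument of indxg2p is dropped here for brevity.

```c
#include <assert.h>

/* Number of rows (or columns) out of n, dealt out in blocks of size nb,
   that land on process iproc when the first block is owned by process
   isrcproc and there are nprocs processes in that grid dimension. */
static int numroc_sketch(int n, int nb, int iproc, int isrcproc, int nprocs)
{
    assert(n >= 0 && nb > 0 && nprocs > 0);
    int mydist = (nprocs + iproc - isrcproc) % nprocs; /* distance from owner of block 0 */
    int nblocks = n / nb;                  /* number of whole blocks */
    int count = (nblocks / nprocs) * nb;   /* blocks from complete rounds */
    int extra = nblocks % nprocs;          /* whole blocks left over */
    if (mydist < extra)
        count += nb;                       /* one more whole block */
    else if (mydist == extra)
        count += n % nb;                   /* the trailing partial block */
    return count;
}

/* Coordinate of the process that owns the 1-based global index indxglob. */
static int indxg2p_sketch(int indxglob, int nb, int isrcproc, int nprocs)
{
    return (isrcproc + (indxglob - 1) / nb) % nprocs;
}

/* mpc0 as defined above: numroc(m + iroffc, mb_c, MYROW, icrow, NPROW). */
static int mpc0_sketch(int m, int ic, int mb_c, int rsrc_c,
                       int myrow, int nprow)
{
    int iroffc = (ic - 1) % mb_c;
    int icrow = indxg2p_sketch(ic, mb_c, rsrc_c, nprow);
    return numroc_sketch(m + iroffc, mb_c, myrow, icrow, nprow);
}
```

With 7 rows, block size 2, and 2 process rows, the blocks {1,2}, {3,4}, {5,6}, {7} are dealt alternately, so process row 0 holds 4 rows and process row 1 holds 3. nqc0 follows the same pattern with the column quantities.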
Output Parameters
work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.
info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?unmlq
Multiplies a general matrix by the unitary matrix Q of
the LQ factorization formed by p?gelqf.
Syntax
void pcunmlq (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau ,
MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex8 *work ,
MKL_INT *lwork , MKL_INT *info );
void pzunmlq (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau ,
MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex16 *work ,
MKL_INT *lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
This function overwrites the general complex m-by-n distributed matrix sub(C) = C(ic:ic+m-1,jc:jc+n-1)
with Q*sub(C) or Q^H*sub(C) (side = 'L'), or with sub(C)*Q or sub(C)*Q^H (side = 'R'), depending on trans,
where Q is a complex unitary distributed matrix defined as the product of k elementary reflectors
Q = H(k)' ... H(2)' H(1)'
Input Parameters
side (global)
='L': Q or Q^H is applied from the left.
='R': Q or Q^H is applied from the right.
trans (global)
='N', no transpose, Q is applied.
='C', conjugate transpose, Q^H is applied.
a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+m-1), if
side = 'L' and lld_a*LOCc(ja+n-1), if side = 'R', where lld_a ≥
max(1, LOCr(ia+k-1)). The i-th row of the matrix stored in a must
contain the vector that defines the elementary reflector H(i), ia≤i≤ia+k-1,
as returned by p?gelqf in the k rows of its distributed matrix argument
A(ia:ia+k-1, ja:*). A(ia:ia+k-1, ja:*) is modified by the function but
restored on exit.
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
tau (local)
Array of size LOCc(ia+k-1).
c (local)
Pointer into the local memory to an array of local size lld_c*LOCc(jc+n-1).
ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the submatrix C, respectively.
descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.
work (local)
Workspace array of size lwork.
If side = 'L',
NOTE
mod(x,y) is the integer remainder of x/y.
ilcm, indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL,
NPROW and NPCOL can be determined by calling the function
blacs_gridinfo.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.
Output Parameters
work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.
info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?geqlf
Computes the QL factorization of a general matrix.
Syntax
void psgeqlf (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdgeqlf (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );
void pcgeqlf (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT
*info );
void pzgeqlf (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT *lwork , MKL_INT
*info );
Include Files
• mkl_scalapack.h
Description
The p?geqlf function forms the QL factorization of a real/complex distributed m-by-n matrix sub(A)=
A(ia:ia+m-1, ja:ja+n-1) = Q*L.
Input Parameters
a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).
Contains the local pieces of the distributed matrix sub(A) to be factored.
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A(ia:ia+m-1, ja:ja+n-1),
respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
work (local)
Workspace array of size lwork.
NOTE
mod(x,y) is the integer remainder of x/y.
numroc and indxg2p are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.
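The desca argument above is an integer array of dlen_ entries. As background, the sketch below lays out the nine entries of a dense block-cyclic descriptor at their 0-based C positions. The enum names follow the usual ScaLAPACK ones (dtype_, ctxt_, m_, n_, mb_, nb_, rsrc_, csrc_, lld_), and fill_desc_sketch is a hypothetical helper written for illustration; production code would normally let the ScaLAPACK descinit tool routine fill and validate the descriptor. int stands in for MKL_INT here, which is an assumption that matches the LP64 interface only.

```c
#include <assert.h>

/* 0-based positions of the entries of a dense block-cyclic descriptor.
   DLEN_ ends up as the number of entries. */
enum { DTYPE_ = 0, CTXT_, M_, N_, MB_, NB_, RSRC_, CSRC_, LLD_, DLEN_ };

/* Hypothetical helper that fills a descriptor by hand. */
static void fill_desc_sketch(int desc[DLEN_], int ictxt, int m, int n,
                             int mb, int nb, int rsrc, int csrc, int lld)
{
    assert(m >= 0 && n >= 0 && mb >= 1 && nb >= 1 && lld >= 1);
    desc[DTYPE_] = 1;      /* 1 marks a dense block-cyclic matrix */
    desc[CTXT_]  = ictxt;  /* BLACS context handle */
    desc[M_]     = m;      /* rows of the global matrix */
    desc[N_]     = n;      /* columns of the global matrix */
    desc[MB_]    = mb;     /* row block size */
    desc[NB_]    = nb;     /* column block size */
    desc[RSRC_]  = rsrc;   /* process row that owns the first row */
    desc[CSRC_]  = csrc;   /* process column that owns the first column */
    desc[LLD_]   = lld;    /* leading dimension of the local array */
}
```

The entries mb_a, nb_a, rsrc_a, csrc_a and lld_a that appear in the size formulas of this chapter are the corresponding entries of desca.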
Output Parameters
tau (local)
Array of size LOCc(ja+n-1).
work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.
info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.
Application Notes
The matrix Q is represented as a product of elementary reflectors
Q = H(ja+k-1)*...*H(ja+1)*H(ja)
where k = min(m,n)
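The notes above say that Q is stored as a product of elementary reflectors but do not spell out the form of a single reflector. In the LAPACK and ScaLAPACK convention, which is assumed here rather than taken from the text above, a real reflector is H = I - tau*v*v^T for a scalar tau and a vector v. The following serial sketch applies one such reflector to a vector; apply_reflector is a hypothetical name, and the distributed functions in this chapter perform the equivalent operation on block-cyclic data.

```c
#include <assert.h>
#include <stddef.h>

/* Applies H = I - tau * v * v^T to x in place (real case, serial). */
static void apply_reflector(size_t n, const double *v, double tau, double *x)
{
    assert(v != NULL && x != NULL);
    double dot = 0.0;
    for (size_t i = 0; i < n; ++i)
        dot += v[i] * x[i];          /* dot = v^T x */
    for (size_t i = 0; i < n; ++i)
        x[i] -= tau * dot * v[i];    /* x = x - tau * (v^T x) * v */
}
```

With v = (1, 0) and tau = 2, the reflector flips the sign of the first component, so x = (3, 4) becomes (-3, 4). Applying the reflectors H(ja), H(ja+1), ... in the order given by the product above yields the action of Q without forming Q explicitly.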
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?orgql
Generates the orthogonal matrix Q of the QL
factorization formed by p?geqlf.
Syntax
void psorgql (MKL_INT *m , MKL_INT *n , MKL_INT *k , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdorgql (MKL_INT *m , MKL_INT *n , MKL_INT *k , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?orgql function generates the whole or part of the m-by-n real distributed matrix Q denoting
A(ia:ia+m-1,ja:ja+n-1) with orthonormal columns, which is defined as the last n columns of a product of k
elementary reflectors of order m
Q = H(k)*...*H(2)*H(1)
as returned by p?geqlf.
Input Parameters
a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).
On entry, the j-th column of the matrix stored in a must contain the vector
that defines the elementary reflector H(j), ja+n-k≤j≤ja+n-1, as returned
by p?geqlf in the k columns of its distributed matrix argument
A(ia:*, ja+n-k:ja+n-1).
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A(ia:ia+m-1,ja:ja+n-1),
respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
tau (local)
Array of size LOCc(ja+n-1).
work (local)
Workspace array of size lwork.
NOTE
mod(x,y) is the integer remainder of x/y.
indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.
Output Parameters
work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.
info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?ungql
Generates the unitary matrix Q of the QL factorization
formed by p?geqlf.
Syntax
void pcungql (const MKL_INT *m , const MKL_INT *n , const MKL_INT *k , MKL_Complex8
*a , const MKL_INT *ia , const MKL_INT *ja , const MKL_INT *desca , const MKL_Complex8
*tau , MKL_Complex8 *work , const MKL_INT *lwork , MKL_INT *info );
void pzungql (const MKL_INT *m , const MKL_INT *n , const MKL_INT *k , MKL_Complex16
*a , const MKL_INT *ia , const MKL_INT *ja , const MKL_INT *desca , const MKL_Complex16
*tau , MKL_Complex16 *work , const MKL_INT *lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
This function generates the whole or part of the m-by-n complex distributed matrix Q denoting
A(ia:ia+m-1,ja:ja+n-1) with orthonormal columns, which is defined as the last n columns of a product of k
elementary reflectors of order m
Q = H(k)*...*H(2)*H(1)
as returned by p?geqlf.
Input Parameters
a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).
On entry, the j-th column of the matrix stored in a must contain the vector
that defines the elementary reflector H(j), ja+n-k≤j≤ja+n-1, as returned
by p?geqlf in the k columns of its distributed matrix argument
A(ia:*, ja+n-k:ja+n-1).
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A(ia:ia+m-1,ja:ja+n-1),
respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
tau (local)
Array of size LOCc(ja+n-1).
work (local)
Workspace array of size lwork.
indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.
Output Parameters
work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.
info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?ormql
Multiplies a general matrix by the orthogonal matrix Q
of the QL factorization formed by p?geqlf.
Syntax
void psormql (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , float
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *tau , float *c , MKL_INT *ic ,
MKL_INT *jc , MKL_INT *descc , float *work , MKL_INT *lwork , MKL_INT *info );
void pdormql (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , double
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *tau , double *c , MKL_INT
*ic , MKL_INT *jc , MKL_INT *descc , double *work , MKL_INT *lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
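The p?ormql function overwrites the general real m-by-n distributed matrix sub(C) = C(ic:ic+m-1,jc:jc+n-1)
with Q*sub(C) or Q^T*sub(C) (side = 'L'), or with sub(C)*Q or sub(C)*Q^T (side = 'R'), depending on trans,
where Q is a real orthogonal distributed matrix defined as the product of k elementary reflectors
Q = H(k)...H(2) H(1)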
Input Parameters
side (global)
='L': Q or Q^T is applied from the left.
='R': Q or Q^T is applied from the right.
trans (global)
='N', no transpose, Q is applied.
='T', transpose, Q^T is applied.
m (global) The number of rows in the distributed matrix sub(C), (m≥0).
a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+k-1). The
j-th column of the matrix stored in a must contain the vector that defines
the elementary reflector H(j), ja≤j≤ja+k-1, as returned by p?geqlf in the
k columns of its distributed matrix argument A(ia:*, ja:ja+k-1). A(ia:*,
ja:ja+k-1) is modified by the function but restored on exit.
If side = 'L', lld_a ≥ max(1, LOCr(ia+m-1)),
if side = 'R', lld_a ≥ max(1, LOCr(ia+n-1)).
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
tau (local)
Array of size LOCc(ja+n-1).
c (local)
Pointer into the local memory to an array of local size lld_c*LOCc(jc+n-1).
ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the submatrix C, respectively.
descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.
work (local)
Workspace array of size lwork.
If side = 'L',
lwork ≥ max((nb_a*(nb_a-1))/2, (nqc0+mpc0)*nb_a) + nb_a*nb_a
else if side = 'R',
lwork ≥ max((nb_a*(nb_a-1))/2, (nqc0+max(npa0 +
numroc(numroc(n+icoffc, nb_a, 0, 0, NPCOL), nb_a, 0, 0,
lcmq), mpc0))*nb_a) + nb_a*nb_a
end if
where
lcmq = lcm/NPCOL with lcm = ilcm (NPROW, NPCOL),
iroffa = mod(ia-1, mb_a),
icoffa = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),
npa0 = numroc(n+iroffa, mb_a, MYROW, iarow, NPROW),
iroffc = mod(ic-1, mb_c),
icoffc = mod(jc-1, nb_c),
icrow = indxg2p(ic, mb_c, MYROW, rsrc_c, NPROW),
iccol = indxg2p(jc, nb_c, MYCOL, csrc_c, NPCOL),
mpc0 = numroc(m+iroffc, mb_c, MYROW, icrow, NPROW),
nqc0 = numroc(n+icoffc, nb_c, MYCOL, iccol, NPCOL),
NOTE
mod(x,y) is the integer remainder of x/y.
ilcm, indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL,
NPROW and NPCOL can be determined by calling the function
blacs_gridinfo.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.
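The quantity lcmq above depends only on the shape of the process grid. As an illustration, the sketch below computes a least common multiple with Euclid's algorithm and derives lcmq from it. ilcm_sketch and lcmq_sketch are hypothetical stand-ins written for this example; a real program would call the ScaLAPACK tool function ilcm.

```c
#include <assert.h>

/* Least common multiple of two positive integers via Euclid's algorithm. */
static int ilcm_sketch(int a, int b)
{
    assert(a > 0 && b > 0);
    int x = a, y = b;
    while (y != 0) {        /* on exit x holds gcd(a, b) */
        int t = x % y;
        x = y;
        y = t;
    }
    return (a / x) * b;     /* divide first to keep the product small */
}

/* lcmq as defined above: lcm / NPCOL with lcm = ilcm(NPROW, NPCOL). */
static int lcmq_sketch(int nprow, int npcol)
{
    return ilcm_sketch(nprow, npcol) / npcol;
}
```

On a 2-by-3 process grid the least common multiple is 6, so lcmq is 2; on any square grid lcmq is 1.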
Output Parameters
work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.
info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?unmql
Multiplies a general matrix by the unitary matrix Q of
the QL factorization formed by p?geqlf.
Syntax
void pcunmql (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau ,
MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex8 *work ,
MKL_INT *lwork , MKL_INT *info );
void pzunmql (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau ,
MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex16 *work ,
MKL_INT *lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
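This function overwrites the general complex m-by-n distributed matrix sub(C) = C(ic:ic+m-1,jc:jc+n-1)
with Q*sub(C) or Q^H*sub(C) (side = 'L'), or with sub(C)*Q or sub(C)*Q^H (side = 'R'), depending on trans,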
where Q is a complex unitary distributed matrix defined as the product of k elementary reflectors
Q = H(k)' ... H(2)' H(1)'
Input Parameters
side (global)
='L': Q or Q^H is applied from the left.
='R': Q or Q^H is applied from the right.
trans (global)
='N', no transpose, Q is applied.
='C', conjugate transpose, Q^H is applied.
a (local)
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
tau (local)
Array of size LOCc(ja+n-1).
c (local)
Pointer into the local memory to an array of local size lld_c*LOCc(jc+n-1).
ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the submatrix C, respectively.
descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.
work (local)
Workspace array of size lwork.
If side = 'L',
iroffc = mod(ic-1, mb_c),
icoffc = mod(jc-1, nb_c),
icrow = indxg2p(ic, mb_c, MYROW, rsrc_c, NPROW),
iccol = indxg2p(jc, nb_c, MYCOL, csrc_c, NPCOL),
mpc0 = numroc(m+iroffc, mb_c, MYROW, icrow, NPROW),
nqc0 = numroc(n+icoffc, nb_c, MYCOL, iccol, NPCOL),
NOTE
mod(x,y) is the integer remainder of x/y.
ilcm, indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL,
NPROW and NPCOL can be determined by calling the function
blacs_gridinfo.
Output Parameters
work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.
info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?gerqf
Computes the RQ factorization of a general
rectangular matrix.
Syntax
void psgerqf (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdgerqf (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?gerqf function forms the RQ factorization of a general m-by-n distributed matrix sub(A) =
A(ia:ia+m-1, ja:ja+n-1) as
A = R*Q
Input Parameters
a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A(ia:ia+m-1, ja:ja+n-1),
respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
work (local)
Workspace array of size lwork.
NOTE
mod(x,y) is the integer remainder of x/y.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.
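The workspace query described above is usually used in two calls: one with lwork = -1 to learn the size, and one with a workspace of that size. The sketch below shows the pattern through a function pointer whose argument list copies pdgerqf from the Syntax section, so the logic can be exercised without MPI. The names rqf_fn, factor_with_query and stub_rqf are hypothetical, int stands in for MKL_INT (an assumption that matches the LP64 interface only), and the return code 1 for a failed allocation is an invention of this example.

```c
#include <assert.h>
#include <stdlib.h>

/* Same argument order as pdgerqf, with int standing in for MKL_INT. */
typedef void (*rqf_fn)(int *m, int *n, double *a, int *ia, int *ja,
                       int *desca, double *tau, double *work,
                       int *lwork, int *info);

/* Queries the workspace size with lwork = -1, allocates it, then runs the
   factorization. Returns info from the last call made, or 1 if the
   allocation fails. */
static int factor_with_query(rqf_fn f, int m, int n, double *a,
                             int ia, int ja, int *desca, double *tau)
{
    double query = 0.0;
    int lwork = -1, info = 0;

    f(&m, &n, a, &ia, &ja, desca, tau, &query, &lwork, &info);
    if (info != 0)
        return info;

    lwork = (int)query;              /* size reported in work[0] */
    assert(lwork >= 1);
    double *work = malloc((size_t)lwork * sizeof *work);
    if (work == NULL)
        return 1;

    f(&m, &n, a, &ia, &ja, desca, tau, work, &lwork, &info);
    free(work);
    return info;
}

/* Stand-in with the same signature, used only to exercise the helper
   without MPI or oneMKL: it reports a fixed size of 64 on a query. */
static int stub_calls = 0;
static int stub_last_lwork = 0;

static void stub_rqf(int *m, int *n, double *a, int *ia, int *ja,
                     int *desca, double *tau, double *work,
                     int *lwork, int *info)
{
    (void)m; (void)n; (void)a; (void)ia; (void)ja; (void)desca; (void)tau;
    stub_calls++;
    stub_last_lwork = *lwork;
    if (*lwork == -1)
        work[0] = 64.0;
    *info = 0;
}
```

In an MPI program, passing pdgerqf in place of stub_rqf would follow the same two-step pattern, provided the a, desca and tau arguments are set up as described in the parameter lists and every process in the grid makes both calls.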
Output Parameters
tau (local)
Array of size LOCr(ia+m-1).
work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.
info (global)
= 0, the execution is successful.
< 0, if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.
Application Notes
The matrix Q is represented as a product of elementary reflectors
Q = H(ia)*H(ia+1)*...*H(ia+k-1),
where k = min(m,n).
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?orgrq
Generates the orthogonal matrix Q of the RQ
factorization formed by p?gerqf.
Syntax
void psorgrq (MKL_INT *m , MKL_INT *n , MKL_INT *k , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdorgrq (MKL_INT *m , MKL_INT *n , MKL_INT *k , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?orgrq function generates the whole or part of the m-by-n real distributed matrix Q denoting
A(ia:ia+m-1,ja:ja+n-1) with orthonormal rows that is defined as the last m rows of a product of k
elementary reflectors of order n
Q= H(1)*H(2)*...*H(k)
as returned by p?gerqf.
Input Parameters
a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).
The i-th row of the matrix stored in a must contain the vector that defines
the elementary reflector H(i), ia+m-k≤i≤ia+m-1, as returned by p?gerqf in the
k rows of its distributed matrix argument A(ia+m-k:ia+m-1, ja:*).
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
tau (local)
Array of size LOCr(ia+m-1).
work (local)
Workspace array of size lwork.
indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.
NOTE
mod(x,y) is the integer remainder of x/y.
Output Parameters
work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.
info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?ungrq
Generates the unitary matrix Q of the RQ factorization
formed by p?gerqf.
Syntax
void pcungrq (MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzungrq (MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT
*lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
This function generates the m-by-n complex distributed matrix Q denoting A(ia:ia+m-1,ja:ja+n-1) with
orthonormal rows, which is defined as the last m rows of a product of k elementary reflectors of order n
Q = H(1)' H(2)' ... H(k)'
as returned by p?gerqf.
Input Parameters
a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1). The
i-th row of the matrix stored in a must contain the vector that defines the
elementary reflector H(i), ia+m-k≤i≤ia+m-1, as returned by p?gerqf in
the k rows of its distributed matrix argument A(ia+m-k:ia+m-1, ja:*).
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
tau (local)
Array of size LOCr(ia+m-1).
work (local)
Workspace array of size lwork.
NOTE
mod(x,y) is the integer remainder of x/y.
indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.
Output Parameters
work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.
info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?ormr3
Applies an orthogonal distributed matrix to a general
m-by-n distributed matrix.
Syntax
void psormr3 (const char* side, const char* trans, const MKL_INT* m, const MKL_INT* n,
const MKL_INT* k, const MKL_INT* l, const float* a, const MKL_INT* ia, const MKL_INT*
ja, const MKL_INT* desca, const float* tau, float* c, const MKL_INT* ic, const MKL_INT*
jc, const MKL_INT* descc, float* work, const MKL_INT* lwork, MKL_INT* info);
void pdormr3 (const char* side, const char* trans, const MKL_INT* m, const MKL_INT* n,
const MKL_INT* k, const MKL_INT* l, const double* a, const MKL_INT* ia, const MKL_INT*
ja, const MKL_INT* desca, const double* tau, double* c, const MKL_INT* ic, const
MKL_INT* jc, const MKL_INT* descc, double* work, const MKL_INT* lwork, MKL_INT* info);
Include Files
• mkl_scalapack.h
Description
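p?ormr3 overwrites the general real m-by-n distributed matrix sub( C ) = C(ic:ic+m-1,jc:jc+n-1) with
                side = 'L'          side = 'R'
trans = 'N':    Q * sub( C )        sub( C ) * Q
trans = 'T':    Q^T * sub( C )      sub( C ) * Q^T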
where Q is a real orthogonal distributed matrix defined as the product of k elementary reflectors
Input Parameters
side (global)
= 'L': apply Q or Q^T from the Left;
= 'R': apply Q or Q^T from the Right.
trans (global)
= 'N': No transpose, apply Q;
= 'T': Transpose, apply Q^T.
m (global)
The number of rows to be operated on, i.e., the number of rows of the
distributed submatrix sub( C ). m >= 0.
n (global)
The number of columns to be operated on, i.e., the number of columns of the
distributed submatrix sub( C ). n >= 0.
k (global)
The number of elementary reflectors whose product defines the matrix Q.
If side = 'L', m >= k >= 0, if side = 'R', n >= k >= 0.
l (global)
The columns of the distributed submatrix sub( A ) containing the
meaningful part of the Householder reflectors.
If side = 'L', m >= l >= 0, if side = 'R', n >= l >= 0.
a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+m-1) if
side='L', and lld_a*LOCc(ja+n-1) if side='R', where lld_a >=
MAX(1,LOCr(ia+k-1));
On entry, the i-th row must contain the vector which defines the elementary
reflector H(i), ia <= i <= ia+k-1, as returned by p?tzrzf in the k rows of
its distributed matrix argument A(ia:ia+k-1,ja:*).
ia (global)
The row index in the global array a indicating the first row of sub( A ).
ja (global)
The column index in the global array a indicating the first column of
sub( A ).
tau (local)
Array, size LOCc(ia+k-1).
This array contains the scalar factors tau(i) of the elementary reflectors
H(i) as returned by p?tzrzf. tau is tied to the distributed matrix A.
c (local)
Pointer into the local memory to an array of size lld_c*LOCc(jc+n-1).
ic (global)
The row index in the global array c indicating the first row of sub( C ).
jc (global)
The column index in the global array c indicating the first column of
sub( C ).
work (local)
Array, size (lwork)
lwork (local)
The size of the array work.
Output Parameters
info (local)
= 0: successful exit
< 0: If the i-th argument is an array and the j-th entry had an illegal
value, then info = -(i*100+j), if the i-th argument is a scalar and
had an illegal value, then info = -i.
Application Notes
Alignment requirements
The distributed submatrices A(ia:*, ja:*) and C(ic:ic+m-1,jc:jc+n-1) must verify some alignment
properties, namely the following expressions should be true:
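If side = 'L', ( nb_a = mb_c and icoffa = iroffc )
If side = 'R', ( nb_a = nb_c and icoffa = icoffc and iacol = iccol )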
p?unmr3
Applies a unitary distributed matrix to a general
m-by-n distributed matrix.
Syntax
void pcunmr3 (const char* side, const char* trans, const MKL_INT* m, const MKL_INT* n,
const MKL_INT* k, const MKL_INT* l, const MKL_Complex8* a, const MKL_INT* ia, const
MKL_INT* ja, const MKL_INT* desca, const MKL_Complex8* tau, MKL_Complex8* c, const
MKL_INT* ic, const MKL_INT* jc, const MKL_INT* descc, MKL_Complex8* work, const
MKL_INT* lwork, MKL_INT* info);
void pzunmr3 (const char* side, const char* trans, const MKL_INT* m, const MKL_INT* n,
const MKL_INT* k, const MKL_INT* l, const MKL_Complex16* a, const MKL_INT* ia, const
MKL_INT* ja, const MKL_INT* desca, const MKL_Complex16* tau, MKL_Complex16* c, const
MKL_INT* ic, const MKL_INT* jc, const MKL_INT* descc, MKL_Complex16* work, const
MKL_INT* lwork, MKL_INT* info);
Include Files
• mkl_scalapack.h
Description
p?unmr3 overwrites the general complex m-by-n distributed matrix sub( C ) = C(ic:ic+m-1,jc:jc+n-1) with
                side = 'L'          side = 'R'
trans = 'N':    Q * sub( C )        sub( C ) * Q
trans = 'C':    Q^H * sub( C )      sub( C ) * Q^H
where Q is a complex unitary distributed matrix defined as the product of k elementary reflectors
Input Parameters
side (global)
= 'L': apply Q or Q^H from the Left;
= 'R': apply Q or Q^H from the Right.
trans (global)
= 'N': No transpose, apply Q;
= 'C': Conjugate transpose, apply Q^H.
m (global)
The number of rows to be operated on, i.e., the number of rows of the
distributed submatrix sub( C ). m >= 0.
n (global)
The number of columns to be operated on, i.e., the number of columns of the
distributed submatrix sub( C ). n >= 0.
k (global)
The number of elementary reflectors whose product defines the matrix Q.
If side = 'L', m >= k >= 0, if side = 'R', n >= k >= 0.
l (global)
The columns of the distributed submatrix sub( A ) containing the
meaningful part of the Householder reflectors.
If side = 'L', m >= l >= 0, if side = 'R', n >= l >= 0.
a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+m-1) if
side='L', and lld_a*LOCc(ja+n-1) if side='R', where lld_a >=
MAX(1,LOCr(ia+k-1));
On entry, the i-th row must contain the vector which defines the elementary
reflector H(i), ia <= i <= ia+k-1, as returned by p?tzrzf in the k rows of
its distributed matrix argument A(ia:ia+k-1,ja:*).
ia (global)
The row index in the global array a indicating the first row of sub( A ).
ja (global)
The column index in the global array a indicating the first column of
sub( A ).
tau (local)
Array, size LOCc(ia+k-1).
This array contains the scalar factors tau(i) of the elementary reflectors
H(i) as returned by p?tzrzf. tau is tied to the distributed matrix A.
c (local)
Pointer into the local memory to an array of size lld_c*LOCc(jc+n-1).
ic (global)
The row index in the global array c indicating the first row of sub( C ).
jc (global)
The column index in the global array c indicating the first column of
sub( C ).
work (local)
Array, size (lwork)
Output Parameters
work (local)
Array, size (lwork)
info (local)
= 0: successful exit
< 0: If the i-th argument is an array and the j-th entry had an illegal
value, then info = -(i*100+j), if the i-th argument is a scalar and
had an illegal value, then info = -i.
Application Notes
Alignment requirements
The distributed submatrices A(ia:*, ja:*) and C(ic:ic+m-1,jc:jc+n-1) must verify some alignment
properties, namely the following expressions should be true:
If side = 'L', ( nb_a = mb_c and icoffa = iroffc )
If side = 'R', ( nb_a = nb_c and icoffa = icoffc and iacol = iccol )
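The alignment conditions can be checked before the call. The sketch below is illustrative only. It assumes the offsets and process columns are defined as in the workspace formulas of the related routines (icoffa = mod(ja-1, nb_a), iroffc = mod(ic-1, mb_c), icoffc = mod(jc-1, nb_c), iacol and iccol obtained from indxg2p), and it replaces the ScaLAPACK tool function indxg2p with a local re-implementation of its commonly documented rule. The names alignment_inputs, ref_indxg2p and unmr3_alignment_ok are hypothetical, and plain int stands in for MKL_INT.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-in for the ScaLAPACK tool function indxg2p (assumed rule):
 * process coordinate that owns the 1-based global index indxglob.
 * iproc is accepted only to mirror the documented argument list. */
static int ref_indxg2p(int indxglob, int nb, int iproc, int isrcproc, int nprocs)
{
    (void)iproc;
    assert(indxglob >= 1 && nb >= 1 && nprocs >= 1);
    return (isrcproc + (indxglob - 1) / nb) % nprocs;
}

/* Hypothetical bundle of the descriptor fields the check needs. */
typedef struct {
    int ja, nb_a, csrc_a;            /* sub(A): first column, column block size, source column */
    int ic, jc, mb_c, nb_c, csrc_c;  /* sub(C): first row/column, block sizes, source column */
    int mycol, npcol;                /* this process's grid column, number of grid columns */
} alignment_inputs;

/* Returns true when the alignment expressions for the given side hold. */
static bool unmr3_alignment_ok(char side, const alignment_inputs *p)
{
    int icoffa = (p->ja - 1) % p->nb_a;
    if (side == 'L') {
        int iroffc = (p->ic - 1) % p->mb_c;
        return p->nb_a == p->mb_c && icoffa == iroffc;
    }
    /* side == 'R' */
    int icoffc = (p->jc - 1) % p->nb_c;
    int iacol = ref_indxg2p(p->ja, p->nb_a, p->mycol, p->csrc_a, p->npcol);
    int iccol = ref_indxg2p(p->jc, p->nb_c, p->mycol, p->csrc_c, p->npcol);
    return p->nb_a == p->nb_c && icoffa == icoffc && iacol == iccol;
}
```

With 4-by-4 blocks on two process columns, ja = jc = 1 satisfies both conditions, whereas moving jc to 5 keeps the offsets equal but places sub(C) in a different process column, so the side = 'R' check fails.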
p?ormrq
Multiplies a general matrix by the orthogonal matrix Q
of the RQ factorization formed by p?gerqf.
Syntax
void psormrq (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , float
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *tau , float *c , MKL_INT *ic ,
MKL_INT *jc , MKL_INT *descc , float *work , MKL_INT *lwork , MKL_INT *info );
void pdormrq (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , double
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *tau , double *c , MKL_INT
*ic , MKL_INT *jc , MKL_INT *descc , double *work , MKL_INT *lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?ormrq function overwrites the general real m-by-n distributed matrix sub(C) = C(ic:ic+m-1,
jc:jc+n-1) with
                side = 'L'        side = 'R'
trans = 'N':    Q*sub(C)          sub(C)*Q
trans = 'T':    Q^T*sub(C)        sub(C)*Q^T
where Q is a real orthogonal distributed matrix defined as the product of k elementary reflectors
Input Parameters
side (global)
='L': Q or Q^T is applied from the left.
='R': Q or Q^T is applied from the right.
trans (global)
='N', no transpose, Q is applied.
='T', transpose, Q^T is applied.
a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+m-1) if
side = 'L', and lld_a*LOCc(ja+n-1) if side = 'R'.
The i-th row of the matrix stored in a must contain the vector that defines
the elementary reflector H(i), ia≤i≤ia+k-1, as returned by p?gerqf in the
k rows of its distributed matrix argument A(ia:ia+k-1, ja:*). A(ia:ia
+k-1, ja:*) is modified by the function but restored on exit.
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
tau (local)
Array of size LOCc(ja+k-1).
c (local)
Pointer into the local memory to an array of local size lld_c*LOCc(jc+n-1).
ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the matrix sub(C), respectively.
descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.
work (local)
Workspace array of size lwork.
lwork (local or global) size of work, must be at least:
If side = 'L',
NOTE
mod(x,y) is the integer remainder of x/y.
ilcm, indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL,
NPROW and NPCOL can be determined by calling the function
blacs_gridinfo.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.
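The workspace query described above leads to a two-call pattern: query with lwork = -1, allocate, then call again. The sketch below shows only that control flow. Because a real p?ormrq call needs an initialized BLACS grid and distributed data, the routine is replaced by demo_routine, a stand-in that is NOT a oneMKL function and whose size rule is arbitrary. The name call_with_queried_workspace is also hypothetical, and plain int stands in for MKL_INT.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for a ScaLAPACK-style routine (not a oneMKL function).
 * It mimics only the workspace convention: with *lwork == -1 it stores
 * the required size in work[0] and returns without doing any work. */
static void demo_routine(int n, double *work, const int *lwork, int *info)
{
    assert(n >= 0);
    const int required = 3 * n + 8;   /* arbitrary illustrative size rule */
    *info = 0;
    if (*lwork == -1) {               /* workspace query */
        work[0] = (double)required;
        return;
    }
    if (*lwork < required) {          /* too small: flag argument 3 (lwork) */
        *info = -3;
        return;
    }
    work[0] = (double)required;       /* a real call also reports the size */
}

/* Query, allocate, call again. Returns info, or 1 if allocation fails. */
static int call_with_queried_workspace(int n)
{
    double query = 0.0;
    int lwork = -1;
    int info = 0;

    demo_routine(n, &query, &lwork, &info);   /* first call: size query only */
    if (info != 0) return info;

    lwork = (int)query;                       /* size reported in work[0] */
    double *work = malloc((size_t)lwork * sizeof *work);
    if (work == NULL) return 1;

    demo_routine(n, work, &lwork, &info);     /* second call: real work */
    free(work);
    return info;
}
```

In a real program the same shape applies with the stand-in replaced by the ScaLAPACK routine and its full argument list; a one-element array is enough for the query call because only work[0] is written.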
Output Parameters
work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.
info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?unmrq
Multiplies a general matrix by the unitary matrix Q of
the RQ factorization formed by p?gerqf.
Syntax
void pcunmrq (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau ,
MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex8 *work ,
MKL_INT *lwork , MKL_INT *info );
void pzunmrq (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau ,
MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex16 *work ,
MKL_INT *lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
This function overwrites the general complex m-by-n distributed matrix sub(C) = C(ic:ic+m-1,jc:jc+n-1)
with
                side = 'L'        side = 'R'
trans = 'N':    Q*sub(C)          sub(C)*Q
trans = 'C':    Q^H*sub(C)        sub(C)*Q^H
where Q is a complex unitary distributed matrix defined as the product of k elementary reflectors
Input Parameters
side (global)
='L': Q or Q^H is applied from the left.
='R': Q or Q^H is applied from the right.
trans (global)
='N', no transpose, Q is applied.
='C', conjugate transpose, Q^H is applied.
k (global) The number of elementary reflectors whose product defines the
matrix Q. Constraints:
If side = 'L', m≥k≥0
a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+m-1) if
side = 'L', and lld_a*LOCc(ja+n-1) if side = 'R'. The i-th row of the
matrix stored in a must contain the vector that defines the elementary
reflector H(i), ia≤i≤ia+k-1, as returned by p?gerqf in the k rows of its
distributed matrix argument A(ia:ia+k-1, ja:*). A(ia:ia+k-1, ja:*) is
modified by the function but restored on exit.
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
tau (local)
Array of size LOCc(ja+k-1).
c (local)
Pointer into the local memory to an array of local size lld_c*LOCc(jc+n-1).
ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the submatrix C, respectively.
descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.
work (local)
Workspace array of size lwork.
If side = 'L',
lwork≥max((mb_a*(mb_a-1))/2, (mpc0 +
max(mqa0+numroc(numroc(n+iroffc, mb_a, 0, 0, NPROW), mb_a,
0, 0, lcmp), nqc0))*mb_a) + mb_a*mb_a
else if side = 'R',
NOTE
mod(x,y) is the integer remainder of x/y.
ilcm, indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL,
NPROW and NPCOL can be determined by calling the function
blacs_gridinfo.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.
Output Parameters
work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.
info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?tzrzf
Reduces the upper trapezoidal matrix A to upper
triangular form.
Syntax
void pstzrzf (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdtzrzf (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );
void pctzrzf (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT
*info );
void pztzrzf (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT *lwork , MKL_INT
*info );
Include Files
• mkl_scalapack.h
Description
The p?tzrzf function reduces the m-by-n (m ≤ n) real/complex upper trapezoidal matrix sub(A) = A(ia:ia+m-1,
ja:ja+n-1) to upper triangular form by means of orthogonal/unitary transformations. The upper
trapezoidal matrix A is factored as
A = (R 0)*Z,
where Z is an n-by-n orthogonal/unitary matrix and R is an m-by-m upper triangular matrix.
Input Parameters
a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1).
Contains the local pieces of the m-by-n distributed matrix sub (A) to be
factored.
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
work (local)
Workspace array of size lwork.
NOTE
mod(x,y) is the integer remainder of x/y.
indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.
Output Parameters
a On exit, the leading m-by-m upper triangular part of sub(A) contains the
upper triangular matrix R, and elements m+1 to n of the first m rows of sub
(A), with the array tau, represent the orthogonal/unitary matrix Z as a
product of m elementary reflectors.
work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.
tau (local)
Array of size LOCr(ia+m-1).
info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.
Application Notes
The factorization is obtained by Householder's method. The k-th transformation matrix, Z(k), which is (or
whose conjugate transpose is) used to introduce zeros into the (m - k + 1)-th row of sub(A), is given in the
form
Z(k) = ( I    0   )
       ( 0   T(k) ),
where
T(k) = I - tau*u(k)*u(k)', u(k) = (1, 0, z(k))',
tau is a scalar and z(k) is an (n - m) element vector. tau and z(k) are chosen to annihilate the elements of
the k-th row of sub(A). The scalar tau is returned in the k-th element of tau, indexed k-1, and the vector
u(k) in the k-th row of sub(A), such that the elements of z(k) are in a(k, m + 1),..., a(k, n). The
elements of R are returned in the upper triangular part of sub(A). Z is given by
Z = Z(1) * Z(2) *... * Z(m).
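To make the reflector form concrete, the sketch below applies a single real matrix of the form T = I - tau*u*u' to a vector without forming T, using x := x - tau*(u'x)*u. It is a serial illustration only and does not reproduce the distributed algorithm used by p?tzrzf; the function name apply_reflector is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Applies T = I - tau*u*u' to x in place (real case):
 * x := x - tau*(u'x)*u. T itself is never formed. */
static void apply_reflector(size_t len, double tau, const double *u, double *x)
{
    assert(u != NULL && x != NULL);
    double dot = 0.0;
    for (size_t i = 0; i < len; ++i) {
        dot += u[i] * x[i];               /* u'x */
    }
    const double scale = tau * dot;
    for (size_t i = 0; i < len; ++i) {
        x[i] -= scale * u[i];             /* x - tau*(u'x)*u */
    }
}
```

With u = (1, 0, 2)' and tau = 2/(u'u) = 0.4, T is an orthogonal reflection: x = (1, 2, 3)' maps to approximately (-1.8, 2, -2.6)', which has the same 2-norm, and applying T a second time recovers x.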
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?ormrz
Multiplies a general matrix by the orthogonal matrix
from a reduction to upper triangular form formed by
p?tzrzf.
Syntax
void psormrz (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_INT
*l , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *tau , float *c ,
MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , float *work , MKL_INT *lwork , MKL_INT
*info );
void pdormrz (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_INT
*l , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *tau , double *c ,
MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , double *work , MKL_INT *lwork , MKL_INT
*info );
Include Files
• mkl_scalapack.h
Description
This function overwrites the general real m-by-n distributed matrix sub(C) = C(ic:ic+m-1,jc:jc+n-1) with
                side = 'L'        side = 'R'
trans = 'N':    Q*sub(C)          sub(C)*Q
trans = 'T':    Q^T*sub(C)        sub(C)*Q^T
where Q is a real orthogonal distributed matrix defined as the product of k elementary reflectors
Input Parameters
side (global)
='L': Q or Q^T is applied from the left.
='R': Q or Q^T is applied from the right.
trans (global)
='N', no transpose, Q is applied.
='T', transpose, Q^T is applied.
If side = 'L', m ≥ k ≥0
l (global)
The columns of the distributed matrix sub(A) containing the meaningful
part of the Householder reflectors.
If side = 'L', m ≥ l ≥0
a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+m-1) if
side = 'L', and lld_a*LOCc(ja+n-1) if side = 'R', where lld_a ≥
max(1,LOCr(ia+k-1)).
The i-th row of the matrix stored in a must contain the vector that defines
the elementary reflector H(i), ia≤i≤ia+k-1, as returned by p?tzrzf in the
k rows of its distributed matrix argument A(ia:ia+k-1, ja:*). A(ia:ia
+k-1, ja:*) is modified by the function but restored on exit.
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
tau (local)
Array of size LOCc(ia+k-1).
c (local)
Pointer into the local memory to an array of local size lld_c*LOCc(jc+n-1).
ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the submatrix C, respectively.
descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.
work (local)
Workspace array of size lwork.
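If side = 'L',
lwork≥max((mb_a*(mb_a-1))/2, (mpc0 + max(mqa0+numroc(numroc(n+iroffc, mb_a, 0, 0, NPROW), mb_a, 0, 0, lcmp), nqc0))*mb_a) + mb_a*mb_a
else if side = 'R',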
lwork≥max((mb_a*(mb_a-1))/2, (mpc0 + nqc0)*mb_a) + mb_a*mb_a
end if
where
lcmp = lcm/NPROW with lcm = ilcm (NPROW, NPCOL),
iroffa = mod(ia-1, mb_a), icoffa = mod(ja-1, nb_a),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
mqa0 = numroc(n+icoffa, nb_a, MYCOL, iacol, NPCOL),
iroffc = mod(ic-1, mb_c),
icoffc = mod(jc-1, nb_c),
icrow = indxg2p(ic, mb_c, MYROW, rsrc_c, NPROW),
iccol = indxg2p(jc, nb_c, MYCOL, csrc_c, NPCOL),
mpc0 = numroc(m+iroffc, mb_c, MYROW, icrow, NPROW),
nqc0 = numroc(n+icoffc, nb_c, MYCOL, iccol, NPCOL),
NOTE
mod(x,y) is the integer remainder of x/y.
ilcm, indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL,
NPROW and NPCOL can be determined by calling the function
blacs_gridinfo.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.
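The bound max((mb_a*(mb_a-1))/2, (mpc0 + nqc0)*mb_a) + mb_a*mb_a given above can be evaluated from descriptor fields and grid coordinates. The sketch below is illustrative only: ref_numroc and ref_indxg2p are local re-implementations of the commonly documented rules behind the ScaLAPACK tool functions numroc and indxg2p, not the library routines themselves. The names ormrz_sizes and ormrz_lwork_bound are hypothetical, and plain int stands in for MKL_INT. Which value of side a given bound applies to should be taken from the parameter description, and in practice the lwork = -1 query remains the more reliable way to size work.

```c
#include <assert.h>

/* Local stand-in for numroc (assumed rule): number of rows or columns of
 * an n-element dimension, split into blocks of nb, owned by process iproc. */
static int ref_numroc(int n, int nb, int iproc, int isrcproc, int nprocs)
{
    assert(n >= 0 && nb >= 1 && nprocs >= 1);
    int mydist = (nprocs + iproc - isrcproc) % nprocs;
    int nblocks = n / nb;                    /* whole blocks */
    int count = (nblocks / nprocs) * nb;     /* blocks every process gets */
    int extrablks = nblocks % nprocs;
    if (mydist < extrablks) {
        count += nb;                         /* one extra whole block */
    } else if (mydist == extrablks) {
        count += n % nb;                     /* trailing partial block */
    }
    return count;
}

/* Local stand-in for indxg2p (assumed rule); iproc is unused. */
static int ref_indxg2p(int indxglob, int nb, int iproc, int isrcproc, int nprocs)
{
    (void)iproc;
    return (isrcproc + (indxglob - 1) / nb) % nprocs;
}

static int max_int(int a, int b) { return a > b ? a : b; }

/* Hypothetical bundle of the values the bound depends on. */
typedef struct {
    int m, n;                        /* size of sub(C) */
    int mb_a;                        /* row block size of A */
    int ic, jc;                      /* 1-based first row/column of sub(C) */
    int mb_c, nb_c;                  /* block sizes of C */
    int rsrc_c, csrc_c;              /* process row/column of C's first block */
    int myrow, mycol, nprow, npcol;  /* grid position and grid shape */
} ormrz_sizes;

/* Evaluates max((mb_a*(mb_a-1))/2, (mpc0 + nqc0)*mb_a) + mb_a*mb_a. */
static int ormrz_lwork_bound(const ormrz_sizes *s)
{
    int iroffc = (s->ic - 1) % s->mb_c;
    int icoffc = (s->jc - 1) % s->nb_c;
    int icrow = ref_indxg2p(s->ic, s->mb_c, s->myrow, s->rsrc_c, s->nprow);
    int iccol = ref_indxg2p(s->jc, s->nb_c, s->mycol, s->csrc_c, s->npcol);
    int mpc0 = ref_numroc(s->m + iroffc, s->mb_c, s->myrow, icrow, s->nprow);
    int nqc0 = ref_numroc(s->n + icoffc, s->nb_c, s->mycol, iccol, s->npcol);
    return max_int((s->mb_a * (s->mb_a - 1)) / 2, (mpc0 + nqc0) * s->mb_a)
           + s->mb_a * s->mb_a;
}
```

For a 10-by-8 sub(C) starting at (1,1) with 2-by-2 blocks on a 2-by-2 grid, process (0,0) holds mpc0 = 6 rows and nqc0 = 4 columns, so with mb_a = 2 the bound evaluates to max(1, 20) + 4 = 24.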
Output Parameters
work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.
info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?unmrz
Multiplies a general matrix by the unitary
transformation matrix from a reduction to upper
triangular form determined by p?tzrzf.
Syntax
void pcunmrz (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_INT
*l , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau ,
MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex8 *work ,
MKL_INT *lwork , MKL_INT *info );
void pzunmrz (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_INT
*l , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16
*tau , MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex16
*work , MKL_INT *lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
This function overwrites the general complex m-by-n distributed matrix sub(C) = C(ic:ic+m-1,jc:jc+n-1)
with
                side = 'L'        side = 'R'
trans = 'N':    Q*sub(C)          sub(C)*Q
trans = 'C':    Q^H*sub(C)        sub(C)*Q^H
where Q is a complex unitary distributed matrix defined as the product of k elementary reflectors
Input Parameters
side (global)
='L': Q or Q^H is applied from the left.
='R': Q or Q^H is applied from the right.
trans (global)
='N', no transpose, Q is applied.
='C', conjugate transpose, Q^H is applied.
l (global) The columns of the distributed matrix sub(A) containing the
meaningful part of the Householder reflectors.
If side = 'L', m≥l≥0
a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+m-1) if
side = 'L', and lld_a*LOCc(ja+n-1) if side = 'R', where lld_a ≥
max(1, LOCr(ia+k-1)). The i-th row of the matrix stored in a must
contain the vector that defines the elementary reflector H(i), ia≤i≤ia+k-1,
as returned by p?tzrzf in the k rows of its distributed matrix argument
A(ia:ia+k-1, ja:*). A(ia:ia+k-1, ja:*) is modified by the function but
restored on exit.
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
tau (local)
Array of size LOCc(ia+k-1).
c (local)
Pointer into the local memory to an array of local size lld_c*LOCc(jc+n-1).
ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the submatrix C, respectively.
descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.
work (local)
Workspace array of size lwork.
If side = 'L',
lwork≥max((mb_a*(mb_a-1))/2, (mpc0+max(mqa0+numroc(numroc(n
+iroffc, mb_a, 0, 0, NPROW), mb_a, 0, 0, lcmp), nqc0))*mb_a)
+ mb_a*mb_a
else if side ='R',
NOTE
mod(x,y) is the integer remainder of x/y.
ilcm, indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL,
NPROW and NPCOL can be determined by calling the function
blacs_gridinfo.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.
Output Parameters
work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.
info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?ggqrf
Computes the generalized QR factorization.
Syntax
void psggqrf (MKL_INT *n , MKL_INT *m , MKL_INT *p , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *taua , float *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb , float *taub , float *work , MKL_INT *lwork , MKL_INT *info );
void pdggqrf (MKL_INT *n , MKL_INT *m , MKL_INT *p , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *taua , double *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb , double *taub , double *work , MKL_INT *lwork , MKL_INT *info );
void pcggqrf (MKL_INT *n , MKL_INT *m , MKL_INT *p , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *taua , MKL_Complex8 *b , MKL_INT *ib ,
MKL_INT *jb , MKL_INT *descb , MKL_Complex8 *taub , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzggqrf (MKL_INT *n , MKL_INT *m , MKL_INT *p , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *taua , MKL_Complex16 *b , MKL_INT *ib ,
MKL_INT *jb , MKL_INT *descb , MKL_Complex16 *taub , MKL_Complex16 *work , MKL_INT
*lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?ggqrf function forms the generalized QR factorization of an n-by-m matrix sub(A) = A(ia:ia+n-1,
ja:ja+m-1) and an n-by-p matrix sub(B) = B(ib:ib+n-1, jb:jb+p-1)
as
sub(A) = Q*R, sub(B) = Q*T*Z,
where Q is an n-by-n orthogonal/unitary matrix, Z is a p-by-p orthogonal/unitary matrix, and R and T
assume one of the forms:
If n ≥ m
or if n < m
Input Parameters
n (global) The number of rows in the distributed matrices sub(A) and sub(B)
(n≥0).
a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+m-1).
Contains the local pieces of the n-by-m matrix sub(A) to be factored.
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
b (local)
Pointer into the local memory to an array of size lld_b*LOCc(jb+p-1).
Contains the local pieces of the n-by-p matrix sub(B) to be factored.
ib, jb (global) The row and column indices in the global matrix B
indicating the first row and the first column of the submatrix B,
respectively.
descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.
work (local)
Workspace array of size lwork.
lwork≥max(nb_a*(npa0+mqa0+nb_a), max((nb_a*(nb_a-1))/2,
(pqb0+npb0)*nb_a)+nb_a*nb_a, mb_b*(npb0+pqb0+mb_b)),
where
iroffa = mod(ia-1, mb_a),
icoffa = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
npa0 = numroc (n+iroffa, mb_a, MYROW, iarow, NPROW),
mqa0 = numroc (m+icoffa, nb_a, MYCOL, iacol, NPCOL)
iroffb = mod(ib-1, mb_b),
icoffb = mod(jb-1, nb_b),
ibrow = indxg2p(ib, mb_b, MYROW, rsrc_b, NPROW),
ibcol = indxg2p(jb, nb_b, MYCOL, csrc_b, NPCOL),
npb0 = numroc (n+iroffb, mb_b, MYROW, ibrow, NPROW),
pqb0 = numroc (p+icoffb, nb_b, MYCOL, ibcol, NPCOL)
NOTE
mod(x,y) is the integer remainder of x/y.
numroc and indxg2p are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.
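The lower bound on lwork given above is a maximum over three terms. The following minimal sketch evaluates it from local sizes that the caller has already obtained (for example with numroc). The function name ggqrf_min_lwork is hypothetical, and plain int stands in for MKL_INT, so very large problems under an ILP64 interface would need a wider integer type.

```c
#include <assert.h>

static int max_int(int a, int b) { return a > b ? a : b; }

/* Hypothetical helper: evaluates
 *   max(nb_a*(npa0+mqa0+nb_a),
 *       max((nb_a*(nb_a-1))/2, (pqb0+npb0)*nb_a) + nb_a*nb_a,
 *       mb_b*(npb0+pqb0+mb_b))
 * from precomputed local row and column counts. */
static int ggqrf_min_lwork(int nb_a, int mb_b,
                           int npa0, int mqa0, int npb0, int pqb0)
{
    assert(nb_a >= 1 && mb_b >= 1);
    int term1 = nb_a * (npa0 + mqa0 + nb_a);
    int term2 = max_int((nb_a * (nb_a - 1)) / 2, (pqb0 + npb0) * nb_a)
                + nb_a * nb_a;
    int term3 = mb_b * (npb0 + pqb0 + mb_b);
    return max_int(term1, max_int(term2, term3));
}
```

For nb_a = 2, mb_b = 3, npa0 = 4, mqa0 = 5, npb0 = 4 and pqb0 = 6, the three terms are 22, 24 and 39, so the bound is 39.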
Output Parameters
a On exit, the elements on and above the diagonal of sub (A) contain the
min(n, m)-by-m upper trapezoidal matrix R (R is upper triangular if n≥m); the
elements below the diagonal, with the array taua, represent the
orthogonal/unitary matrix Q as a product of min(n, m) elementary
reflectors. (See Application Notes below).
work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.
info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.
Application Notes
The matrix Q is represented as a product of elementary reflectors
Q = H(ja)*H(ja+1)*...*H(ja+k-1),
where k= min(n,m).
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?ggrqf
Computes the generalized RQ factorization.
Syntax
void psggrqf (MKL_INT *m , MKL_INT *p , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *taua , float *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb , float *taub , float *work , MKL_INT *lwork , MKL_INT *info );
void pdggrqf (MKL_INT *m , MKL_INT *p , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *taua , double *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb , double *taub , double *work , MKL_INT *lwork , MKL_INT *info );
void pcggrqf (MKL_INT *m , MKL_INT *p , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *taua , MKL_Complex8 *b , MKL_INT *ib ,
MKL_INT *jb , MKL_INT *descb , MKL_Complex8 *taub , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzggrqf (MKL_INT *m , MKL_INT *p , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *taua , MKL_Complex16 *b , MKL_INT *ib ,
MKL_INT *jb , MKL_INT *descb , MKL_Complex16 *taub , MKL_Complex16 *work , MKL_INT
*lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?ggrqf function forms the generalized RQ factorization of an m-by-n matrix sub(A) = A(ia:ia+m-1,
ja:ja+n-1) and a p-by-n matrix sub(B) = B(ib:ib+p-1, jb:jb+n-1):
sub(A) = R*Q, sub(B) = Z*T*Q,
where Q is an n-by-n orthogonal/unitary matrix, Z is a p-by-p orthogonal/unitary matrix, and R and T
assume one of the forms:
or
or
Input Parameters
m (global) The number of rows in the distributed matrix sub(A) (m≥0).
a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1).
Contains the local pieces of the m-by-n distributed matrix sub(A) to be
factored.
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
b (local)
Pointer into the local memory to an array of size lld_b*LOCc(jb+n-1).
ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of the submatrix B, respectively.
descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.
work (local)
Workspace array of size lwork.
NOTE
mod(x,y) is the integer remainder of x/y.
numroc and indxg2p are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.
Output Parameters
work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.
info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.
Application Notes
The matrix Q is represented as a product of elementary reflectors
Q = H(ia)*H(ia+1)*...*H(ia+k-1),
where k= min(m,n).
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?syngst
Reduces a real symmetric-definite generalized
eigenproblem to standard form.
Syntax
void pssyngst (const MKL_INT* ibtype, const char* uplo, const MKL_INT* n, float* a,
const MKL_INT* ia, const MKL_INT* ja, const MKL_INT* desca, const float* b, const
MKL_INT* ib, const MKL_INT* jb, const MKL_INT* descb, float* scale, float* work, const
MKL_INT* lwork, MKL_INT* info);
void pdsyngst (const MKL_INT* ibtype, const char* uplo, const MKL_INT* n, double* a,
const MKL_INT* ia, const MKL_INT* ja, const MKL_INT* desca, const double* b, const
MKL_INT* ib, const MKL_INT* jb, const MKL_INT* descb, double* scale, double* work,
const MKL_INT* lwork, MKL_INT* info);
Include Files
• mkl_scalapack.h
Description
p?syngst reduces a real symmetric-definite generalized eigenproblem to standard form.
p?syngst performs the same function as p?sygst, but is based on rank 2K updates, which are faster and
more scalable than triangular solves (the basis of p?sygst).
p?syngst calls p?sygst when uplo='U', hence p?syngst provides improved performance only when
uplo='L' and ibtype=1.
p?syngst also calls p?sygst when insufficient workspace is provided, hence p?syngst provides improved
performance only when lwork >= 2 * NP0 * NB + NQ0 * NB + NB * NB.
In the following, sub( A ) denotes A( ia:ia+n-1, ja:ja+n-1 ) and sub( B ) denotes B( ib:ib+n-1, jb:jb+n-1 ).
If ibtype = 1, the problem is sub( A )*x = lambda*sub( B )*x, and sub( A ) is overwritten by
inv(UH)*sub( A )*inv(U) or inv(L)*sub( A )*inv(LH).
If ibtype = 2 or 3, the problem is sub( A )*sub( B )*x = lambda*x or sub( B )*sub( A )*x = lambda*x, and
sub( A ) is overwritten by U*sub( A )*UH or LH*sub( A )*L.
sub( B ) must have been previously factorized as UH*U or L*LH by p?potrf.
Input Parameters
ibtype (global)
= 1: compute inv(UH)*sub( A )*inv(U) or inv(L)*sub( A )*inv(LH);
= 2 or 3: compute U*sub( A )*UH or LH*sub( A )*L.
uplo (global)
= 'U': Upper triangle of sub( A ) is stored and sub( B ) is factored as UH*U;
= 'L': Lower triangle of sub( A ) is stored and sub( B ) is factored as L*LH.
n (global)
The order of the matrices sub( A ) and sub( B ). n >= 0.
a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1).
On entry, this array contains the local pieces of the n-by-n Hermitian
distributed matrix sub( A ). If uplo = 'U', the leading n-by-n upper
triangular part of sub( A ) contains the upper triangular part of the matrix,
and its strictly lower triangular part is not referenced. If uplo = 'L', the
leading n-by-n lower triangular part of sub( A ) contains the lower
triangular part of the matrix, and its strictly upper triangular part is not
referenced.
ia (global)
A's global row index, which points to the beginning of the submatrix which
is to be operated on.
ja (global)
A's global column index, which points to the beginning of the submatrix
which is to be operated on.
b (local)
Pointer into the local memory to an array of size lld_b*LOCc(jb+n-1).
On entry, this array contains the local pieces of the triangular factor from
the Cholesky factorization of sub( B ), as returned by p?potrf.
ib (global)
B's global row index, which points to the beginning of the submatrix which
is to be operated on.
jb (global)
B's global column index, which points to the beginning of the submatrix
which is to be operated on.
work (local)
Array, size (lwork)
lwork is local input and must be at least lwork >= MAX( NB * ( NP0 + 1 ),
3 * NB ).
When ibtype = 1 and uplo = 'L', p?syngst provides improved
performance when lwork >= 2 * NP0 * NB + NQ0 * NB + NB * NB.
Output Parameters
scale (global)
Amount by which the eigenvalues should be scaled to compensate for
the scaling performed in this routine. At present, scale is always
returned as 1.0; it is returned here to allow for future enhancement.
work (local)
Array, size (lwork)
On exit, work[0] returns the minimal and optimal lwork.
info (global)
= 0: successful exit
< 0: If the i-th argument is an array and the j-th entry had an illegal
value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.
p?syntrd
Reduces a real symmetric matrix to symmetric
tridiagonal form.
Syntax
void pssyntrd (const char* uplo, const MKL_INT* n, float* a, const MKL_INT* ia, const
MKL_INT* ja, const MKL_INT* desca, float* d, float* e, float* tau, float* work, const
MKL_INT* lwork, MKL_INT* info);
void pdsyntrd (const char* uplo, const MKL_INT* n, double* a, const MKL_INT* ia, const
MKL_INT* ja, const MKL_INT* desca, double* d, double* e, double* tau, double* work,
const MKL_INT* lwork, MKL_INT* info);
Include Files
• mkl_scalapack.h
Description
p?syntrd is a prototype version of p?sytrd which uses tailored codes (either the serial, ?sytrd, or the
parallel code, p?syttrd) when the workspace provided by the user is adequate.
p?syntrd reduces a real symmetric matrix sub( A ) to symmetric tridiagonal form T by an orthogonal
similarity transformation:
Q' * sub( A ) * Q = T, where sub( A ) = A(ia:ia+n-1,ja:ja+n-1).
Features
p?syntrd is faster than p?sytrd on almost all matrices, particularly small ones (i.e. n < 500 * sqrt(P) ),
provided that enough workspace is available to use the tailored codes.
The tailored codes provide performance that is essentially independent of the input data layout.
The tailored codes place no restrictions on ia, ja, MB or NB. At present, ia, ja, MB and NB are restricted to
those values allowed by p?sytrd to keep the interface simple (see the Application Notes section for more
information about the restrictions).
Input Parameters
uplo (global)
Specifies whether the upper or lower triangular part of the symmetric
matrix sub( A ) is stored:
= 'U': Upper triangular
= 'L': Lower triangular
n (global)
The number of rows and columns to be operated on, i.e. the order of the
distributed submatrix sub( A ). n >= 0.
a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1).
On entry, this array contains the local pieces of the symmetric distributed
matrix sub( A ). If uplo = 'U', the leading n-by-n upper triangular part of
sub( A ) contains the upper triangular part of the matrix, and its strictly
lower triangular part is not referenced. If uplo = 'L', the leading n-by-n
lower triangular part of sub( A ) contains the lower triangular part of the
matrix, and its strictly upper triangular part is not referenced.
ia (global)
The row index in the global array a indicating the first row of sub( A ).
ja (global)
The column index in the global array a indicating the first column of
sub( A ).
work (local)
Array, size (lwork)
Output Parameters
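a On exit, if uplo = 'U', the diagonal and first superdiagonal of sub( A )
are overwritten by the corresponding elements of the tridiagonal matrix T,
and the elements above the first superdiagonal, with the array tau,
represent the orthogonal matrix Q as a product of elementary reflectors;
if uplo = 'L', the diagonal and first subdiagonal
of sub( A ) are overwritten by the corresponding elements of the
tridiagonal matrix T, and the elements below the first subdiagonal,
with the array tau, represent the orthogonal matrix Q as a product of
elementary reflectors. See Application Notes.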
d (local)
Array, size LOCc(ja+n-1)
e (local)
Array, size LOCc(ja+n-1) if uplo = 'U', LOCc(ja+n-2) otherwise.
tau (local)
Array, size LOCc(ja+n-1).
This array contains the scalar factors tau of the elementary reflectors.
tau is tied to the distributed matrix A.
work (local)
Array, size (lwork)
info (global)
= 0: successful exit
< 0: If the i-th argument is an array and the j-th entry had an illegal
value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.
Application Notes
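If uplo = 'U', the matrix Q is represented as a product of elementary reflectors
Q = H(n-1)*...*H(2)*H(1).
If uplo = 'L', the matrix Q is represented as a product of elementary reflectors
Q = H(1)*H(2)*...*H(n-1).
Each H(i) has the form H(i) = I - tau*v*v', where tau is a real scalar and v is a real vector.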
The contents of sub( A ) on exit are illustrated by the following examples with n = 5:
if uplo = 'U':
  d   e   v2  v3  v4
      d   e   v3  v4
          d   e   v4
              d   e
                  d
if uplo = 'L':
  d
  e   d
  v1  e   d
  v1  v2  e   d
  v1  v2  v3  e   d
where d and e denote diagonal and off-diagonal elements of T, and vi denotes an element of the vector
defining H(i).
Alignment requirements
The distributed submatrix sub( A ) must satisfy an alignment property; namely, the following expression
must be true:
( mb_a = nb_a and IROFFA = ICOFFA and IROFFA = 0 ) with IROFFA = mod( ia-1, mb_a), and ICOFFA =
mod( ja-1, nb_a ).
p?sytrd
Reduces a symmetric matrix to real symmetric
tridiagonal form by an orthogonal similarity
transformation.
Syntax
void pssytrd (char *uplo , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *d , float *e , float *tau , float *work , MKL_INT *lwork , MKL_INT
*info );
void pdsytrd (char *uplo , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , double *d , double *e , double *tau , double *work , MKL_INT *lwork , MKL_INT
*info );
Include Files
• mkl_scalapack.h
Description
The p?sytrd function reduces a real symmetric matrix sub(A) to symmetric tridiagonal form T by an
orthogonal similarity transformation:
Q'*sub(A)*Q = T,
where sub(A) = A(ia:ia+n-1,ja:ja+n-1).
Input Parameters
uplo (global)
Specifies whether the upper or lower triangular part of the symmetric
matrix sub(A) is stored:
If uplo = 'U', upper triangular
If uplo = 'L', lower triangular
a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1). On
entry, this array contains the local pieces of the symmetric distributed
matrix sub(A).
If uplo = 'U', the leading n-by-n upper triangular part of sub(A) contains
the upper triangular part of the matrix, and its strictly lower triangular part
is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of sub(A) contains
the lower triangular part of the matrix, and its strictly upper triangular part
is not referenced. See Application Notes below.
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
work (local)
Workspace array of size lwork.
Output Parameters
a On exit, if uplo = 'U', the diagonal and first superdiagonal of sub(A) are
overwritten by the corresponding elements of the tridiagonal matrix T, and
the elements above the first superdiagonal, with the array tau, represent
the orthogonal matrix Q as a product of elementary reflectors; if uplo =
'L', the diagonal and first subdiagonal of sub(A) are overwritten by the
corresponding elements of the tridiagonal matrix T, and the elements below
the first subdiagonal, with the array tau, represent the orthogonal matrix Q
as a product of elementary reflectors. See Application Notes below.
d (local)
Array of size LOCc(ja+n-1). The diagonal elements of the tridiagonal
matrix T:
d[i] = A(i+1,i+1), 0 ≤ i < LOCc(ja+n-1).
d is tied to the distributed matrix A.
e (local)
Array of size LOCc(ja+n-1) if uplo = 'U', LOCc(ja+n-2) otherwise.
tau (local)
Array of size LOCc(ja+n-1). This array contains the scalar factors of the
elementary reflectors. tau is tied to the distributed matrix A.
work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.
info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.
Application Notes
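If uplo = 'U', the matrix Q is represented as a product of elementary reflectors
Q = H(n-1)*...*H(2)*H(1).
If uplo = 'L', the matrix Q is represented as a product of elementary reflectors
Q = H(1)*H(2)*...*H(n-1).
Each H(i) has the form H(i) = I - tau*v*v', where tau is a real scalar and v is a real vector.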
The contents of sub(A) on exit are illustrated by the following examples with n = 5:
If uplo = 'U':
  d   e   v2  v3  v4
      d   e   v3  v4
          d   e   v4
              d   e
                  d
If uplo = 'L':
  d
  e   d
  v1  e   d
  v1  v2  e   d
  v1  v2  v3  e   d
where d and e denote diagonal and off-diagonal elements of T, and vi denotes an element of the vector
defining H(i).
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?ormtr
Multiplies a general matrix by the orthogonal
transformation matrix from a reduction to tridiagonal
form determined by p?sytrd.
Syntax
void psormtr (char *side , char *uplo , char *trans , MKL_INT *m , MKL_INT *n , float
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *tau , float *c , MKL_INT *ic ,
MKL_INT *jc , MKL_INT *descc , float *work , MKL_INT *lwork , MKL_INT *info );
void pdormtr (char *side , char *uplo , char *trans , MKL_INT *m , MKL_INT *n , double
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *tau , double *c , MKL_INT
*ic , MKL_INT *jc , MKL_INT *descc , double *work , MKL_INT *lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
This function overwrites the general real distributed m-by-n matrix sub(C) = C(ic:ic+m-1,jc:jc+n-1) with
                side = 'L'      side = 'R'
trans = 'N':    Q*sub(C)        sub(C)*Q
trans = 'T':    QT*sub(C)       sub(C)*QT
where Q is a real orthogonal distributed matrix of order nq, with nq = m if side = 'L' and nq = n if side =
'R'.
Input Parameters
side (global)
='L': Q or QT is applied from the left.
='R': Q or QT is applied from the right.
trans (global)
='N', no transpose, Q is applied.
='T', transpose, QT is applied.
uplo (global)
= 'U': Upper triangle of A(ia:*, ja:*) contains elementary reflectors
from p?sytrd;
= 'L': Lower triangle of A(ia:*, ja:*) contains elementary reflectors
from p?sytrd.
a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+m-1) if
side = 'L', and lld_a*LOCc(ja+n-1) if side = 'R'.
Contains the vectors that define the elementary reflectors, as returned by
p?sytrd.
If side='L', lld_a ≥ max(1,LOCr(ia+m-1));
if side='R', lld_a ≥ max(1,LOCr(ia+n-1)).
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
tau (local)
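Array of size ltau, where
if side = 'L' and uplo = 'U', ltau = LOCc(m_a),
if side = 'L' and uplo = 'L', ltau = LOCc(ja+m-2),
if side = 'R' and uplo = 'U', ltau = LOCc(n_a),
if side = 'R' and uplo = 'L', ltau = LOCc(ja+n-2).
tau[i] must contain the scalar factor of the elementary reflector H(i+1), as
returned by p?sytrd (0 ≤ i < ltau). tau is tied to the distributed matrix A.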
c (local)
Pointer into the local memory to an array of size lld_c*LOCc(jc+n-1).
Contains the local pieces of the distributed matrix sub(C).
ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the submatrix C, respectively.
descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.
work (local)
Workspace array of size lwork.
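if uplo = 'U',
  iaa = ia; jaa = ja+1; icc = ic; jcc = jc;
else uplo = 'L',
  iaa = ia+1; jaa = ja;
  if side = 'L',
    icc = ic+1; jcc = jc;
  else
    icc = ic; jcc = jc+1;
  end if
end if
if side = 'L',
  mi = m-1; ni = n;
  lwork ≥ max((nb_a*(nb_a-1))/2, (nqc0 + mpc0)*nb_a) + nb_a*nb_a
else if side = 'R',
  mi = m; ni = n-1;
  lwork ≥ max((nb_a*(nb_a-1))/2, (nqc0 +
    max(npa0+numroc(numroc(ni+icoffc, nb_a, 0, 0, NPCOL), nb_a,
    0, 0, lcmq), mpc0))*nb_a) + nb_a*nb_a
end if
where lcmq = lcm/NPCOL with lcm = ilcm(NPROW, NPCOL),
iroffa = mod(iaa-1, mb_a),
icoffa = mod(jaa-1, nb_a),
iarow = indxg2p(iaa, mb_a, MYROW, rsrc_a, NPROW),
npa0 = numroc(ni+iroffa, mb_a, MYROW, iarow, NPROW),
iroffc = mod(icc-1, mb_c),
icoffc = mod(jcc-1, nb_c),
icrow = indxg2p(icc, mb_c, MYROW, rsrc_c, NPROW),
iccol = indxg2p(jcc, nb_c, MYCOL, csrc_c, NPCOL),
mpc0 = numroc(mi+iroffc, mb_c, MYROW, icrow, NPROW),
nqc0 = numroc(ni+icoffc, nb_c, MYCOL, iccol, NPCOL).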
NOTE
mod(x,y) is the integer remainder of x/y.
ilcm, indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL,
NPROW and NPCOL can be determined by calling the function
blacs_gridinfo. If lwork = -1, then lwork is global input and a
workspace query is assumed; the function only calculates the minimum and
optimal size for all work arrays. Each of these values is returned in the first
entry of the corresponding work array, and no error message is issued by
pxerbla.
Output Parameters
work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.
info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?hengst
Reduces a complex Hermitian-definite generalized
eigenproblem to standard form.
Syntax
void pchengst (const MKL_INT* ibtype, const char* uplo, const MKL_INT* n, MKL_Complex8*
a, const MKL_INT* ia, const MKL_INT* ja, const MKL_INT* desca, const MKL_Complex8* b,
const MKL_INT* ib, const MKL_INT* jb, const MKL_INT* descb, float* scale, MKL_Complex8*
work, const MKL_INT* lwork, MKL_INT* info);
void pzhengst (const MKL_INT* ibtype, const char* uplo, const MKL_INT* n,
MKL_Complex16* a, const MKL_INT* ia, const MKL_INT* ja, const MKL_INT* desca, const
MKL_Complex16* b, const MKL_INT* ib, const MKL_INT* jb, const MKL_INT* descb, double*
scale, MKL_Complex16* work, const MKL_INT* lwork, MKL_INT* info);
Include Files
• mkl_scalapack.h
Description
p?hengst reduces a complex Hermitian-definite generalized eigenproblem to standard form.
p?hengst performs the same function as p?hegst, but is based on rank 2K updates, which are faster and
more scalable than triangular solves (the basis of p?hegst).
p?hengst calls p?hegst when uplo='U', hence p?hengst provides improved performance only when
uplo='L' and ibtype=1.
p?hengst also calls p?hegst when insufficient workspace is provided, hence p?hengst provides improved
performance only when lwork is sufficient (as described in the parameter descriptions).
In the following sub( A ) denotes the submatrix A( ia:ia+n-1, ja:ja+n-1 ) and sub( B ) denotes the
submatrix B( ib:ib+n-1, jb:jb+n-1 ).
If ibtype = 1, the problem is sub( A )*x = lambda*sub( B )*x, and sub( A ) is overwritten by
inv(UH)*sub( A )*inv(U) or inv(L)*sub( A )*inv(LH)
If ibtype = 2 or 3, the problem is sub( A )*sub( B )*x = lambda*x or sub( B )*sub( A )*x = lambda*x, and
sub( A ) is overwritten by U*sub( A )*UH or LH*sub( A )*L.
sub( B ) must have been previously factorized as UH*U or L*LH by p?potrf.
Input Parameters
ibtype (global)
= 1: compute inv(UH)*sub( A )*inv(U) or inv(L)*sub( A )*inv(LH);
= 2 or 3: compute U*sub( A )*UH or LH*sub( A )*L.
uplo (global)
= 'U': Upper triangle of sub( A ) is stored and sub( B ) is factored as UH*U;
= 'L': Lower triangle of sub( A ) is stored and sub( B ) is factored as L*LH.
n (global)
The order of the matrices sub( A ) and sub( B ). n >= 0.
a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1).
On entry, this array contains the local pieces of the n-by-n Hermitian
distributed matrix sub( A ). If uplo = 'U', the leading n-by-n upper
triangular part of sub( A ) contains the upper triangular part of the matrix,
and its strictly lower triangular part is not referenced. If uplo = 'L', the
leading n-by-n lower triangular part of sub( A ) contains the lower
triangular part of the matrix, and its strictly upper triangular part is not
referenced.
ia (global)
Global row index of matrix A, which points to the beginning of the
submatrix on which to operate.
ja (global)
Global column index of matrix A, which points to the beginning of the
submatrix on which to operate.
b (local)
Pointer into the local memory to an array of size lld_b*LOCc(jb+n-1).
ib (global)
Global row index of matrix B, which points to the beginning of the
submatrix on which to operate.
jb (global)
Global column index of matrix B, which points to the beginning of the
submatrix on which to operate.
work (local)
Array, size (lwork)
lwork (local)
The size of the array work.
lwork is local input and must be at least lwork >= MAX( NB * ( NP0
+1 ), 3 * NB ).
When ibtype = 1 and uplo = 'L', p?hengst provides improved
performance when lwork >= 2 * NP0 * NB + NQ0 * NB + NB * NB, where
NB = mb_a = nb_a, NP0 = numroc( n, NB, 0, 0, NPROW ), NQ0 =
numroc( n, NB, 0, 0, NPCOL ), and numroc is a ScaLAPACK tool function.
MYROW, MYCOL, NPROW and NPCOL can be determined by calling the
subroutine blacs_gridinfo.
Output Parameters
scale (global)
Amount by which the eigenvalues should be scaled to compensate for
the scaling performed in this routine.
scale is always returned as 1.0.
work On exit, work[0] returns the minimal and optimal lwork.
info (global)
= 0: successful exit
< 0: If the i-th argument is an array and the j-th entry had an illegal
value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.
p?hentrd
Reduces a complex Hermitian matrix to Hermitian
tridiagonal form.
Syntax
void pchentrd (const char* uplo, const MKL_INT* n, MKL_Complex8* a, const MKL_INT* ia,
const MKL_INT* ja, const MKL_INT* desca, float* d, float* e, MKL_Complex8* tau,
MKL_Complex8* work, const MKL_INT* lwork, float* rwork, const MKL_INT* lrwork, MKL_INT*
info);
void pzhentrd (const char* uplo, const MKL_INT* n, MKL_Complex16* a, const MKL_INT* ia,
const MKL_INT* ja, const MKL_INT* desca, double* d, double* e, MKL_Complex16* tau,
MKL_Complex16* work, const MKL_INT* lwork, double* rwork, const MKL_INT* lrwork,
MKL_INT* info);
Include Files
• mkl_scalapack.h
Description
p?hentrd is a prototype version of p?hetrd which uses tailored codes (either the serial, ?hetrd, or the
parallel code, p?hettrd) when adequate workspace is provided.
p?hentrd reduces a complex Hermitian matrix sub( A ) to Hermitian tridiagonal form T by a unitary
similarity transformation:
Q' * sub( A ) * Q = T, where sub( A ) = A(ia:ia+n-1,ja:ja+n-1).
p?hentrd is faster than p?hetrd on almost all matrices, particularly small ones (i.e. n < 500 * sqrt(P) ),
provided that enough workspace is available to use the tailored codes.
The tailored codes provide performance that is essentially independent of the input data layout.
The tailored codes place no restrictions on ia, ja, MB or NB. At present, ia, ja, MB and NB are restricted to
those values allowed by p?hetrd to keep the interface simple (see the Application Notes section for more
information about the restrictions).
Input Parameters
uplo (global)
Specifies whether the upper or lower triangular part of the Hermitian matrix
sub( A ) is stored:
= 'U': Upper triangular
= 'L': Lower triangular
n (global)
The number of rows and columns to be operated on, i.e. the order of the
distributed submatrix sub( A ). n >= 0.
a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1).
On entry, this array contains the local pieces of the Hermitian distributed
matrix sub( A ). If uplo = 'U', the leading n-by-n upper triangular part of
sub( A ) contains the upper triangular part of the matrix, and its strictly
lower triangular part is not referenced. If uplo = 'L', the leading n-by-n
lower triangular part of sub( A ) contains the lower triangular part of the
matrix, and its strictly upper triangular part is not referenced.
ia (global)
The row index in the global array a indicating the first row of sub( A ).
ja (global)
The column index in the global array a indicating the first column of
sub( A ).
work (local)
Array, size (lwork)
rwork (local)
Array, size (lrwork)
lrwork is local input and must be at least lrwork >= 1.
For optimal performance, greater workspace is needed, i.e. lrwork >=
MAX( 2 * n )
Output Parameters
d (local)
Array, size LOCc(ja+n-1)
e (local)
Array, size LOCc(ja+n-1) if uplo = 'U', LOCc(ja+n-2) otherwise.
tau (local)
Array, size LOCc(ja+n-1).
This array contains the scalar factors tau of the elementary reflectors.
tau is tied to the distributed matrix A.
info (global)
= 0: successful exit
< 0: If the i-th argument is an array and the j-th entry had an illegal
value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.
Application Notes
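If uplo = 'U', the matrix Q is represented as a product of elementary reflectors
Q = H(n-1)*...*H(2)*H(1).
If uplo = 'L', the matrix Q is represented as a product of elementary reflectors
Q = H(1)*H(2)*...*H(n-1).
Each H(i) has the form H(i) = I - tau*v*v', where tau is a complex scalar and v is a complex vector.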
The contents of sub( A ) on exit are illustrated by the following examples with n = 5:
if uplo = 'U':
  d   e   v2  v3  v4
      d   e   v3  v4
          d   e   v4
              d   e
                  d
if uplo = 'L':
  d
  e   d
  v1  e   d
  v1  v2  e   d
  v1  v2  v3  e   d
where d and e denote diagonal and off-diagonal elements of T, and vi denotes an element of the vector
defining H(i).
Alignment requirements
The distributed submatrix sub( A ) must satisfy an alignment property; namely, the following expression
must be true:
( mb_a = nb_a and IROFFA = ICOFFA and IROFFA = 0 ) with IROFFA = mod( ia-1, mb_a), and ICOFFA =
mod( ja-1, nb_a ).
p?hetrd
Reduces a Hermitian matrix to Hermitian tridiagonal
form by a unitary similarity transformation.
Syntax
void pchetrd (char *uplo , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , float *d , float *e , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzhetrd (char *uplo , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , double *d , double *e , MKL_Complex16 *tau , MKL_Complex16 *work ,
MKL_INT *lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?hetrd function reduces a complex Hermitian matrix sub(A) to Hermitian tridiagonal form T by a
unitary similarity transformation:
Q'*sub(A)*Q = T
where sub(A) = A(ia:ia+n-1,ja:ja+n-1).
Input Parameters
uplo (global)
Specifies whether the upper or lower triangular part of the Hermitian matrix
sub(A) is stored:
If uplo = 'U', upper triangular
If uplo = 'L', lower triangular
a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1). On
entry, this array contains the local pieces of the Hermitian distributed
matrix sub(A).
If uplo = 'U', the leading n-by-n upper triangular part of sub(A) contains
the upper triangular part of the matrix, and its strictly lower triangular part
is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of sub(A) contains
the lower triangular part of the matrix, and its strictly upper triangular part
is not referenced (see Application Notes below).
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
work (local)
Workspace array of size lwork.
Output Parameters
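a On exit, if uplo = 'U', the diagonal and first superdiagonal of sub(A) are
overwritten by the corresponding elements of the tridiagonal matrix T, and
the elements above the first superdiagonal, with the array tau, represent
the unitary matrix Q as a product of elementary reflectors; if uplo =
'L', the diagonal and first subdiagonal of sub(A) are overwritten by the
corresponding elements of the tridiagonal matrix T, and the elements below
the first subdiagonal, with the array tau, represent the unitary matrix Q
as a product of elementary reflectors. See Application Notes below.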
d (local)
Array of size LOCc(ja+n-1). The diagonal elements of the tridiagonal
matrix T:
d[i] = A(i+1,i+1), 0 ≤ i < LOCc(ja+n-1).
d is tied to the distributed matrix A.
e (local)
Array of size LOCc(ja+n-1) if uplo = 'U', LOCc(ja+n-2) otherwise.
tau (local)
Array of size LOCc(ja+n-1). This array contains the scalar factors of the
elementary reflectors. tau is tied to the distributed matrix A.
work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.
info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.
Application Notes
If uplo = 'U', the matrix Q is represented as a product of elementary reflectors
Q = H(n-1)*...*H(2)*H(1).
If uplo = 'L', the matrix Q is represented as a product of elementary reflectors
Q = H(1)*H(2)*...*H(n-1).
Each H(i) has the form
H(i) = I - tau*v*v',
where tau is a complex scalar and v is a complex vector. For uplo = 'L', v(1:i) = 0 and v(i+1) = 1;
v(i+2:n) is stored on exit in A(ia+i+1:ia+n-1,ja+i-1), and tau in tau[ja+i-2].
The contents of sub(A) on exit are illustrated by the following examples with n = 5:
If uplo = 'U':
  d   e   v2  v3  v4
      d   e   v3  v4
          d   e   v4
              d   e
                  d
If uplo = 'L':
  d
  e   d
  v1  e   d
  v1  v2  e   d
  v1  v2  v3  e   d
where d and e denote diagonal and off-diagonal elements of T, and vi denotes an element of the vector
defining H(i).
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?unmtr
Multiplies a general matrix by the unitary
transformation matrix from a reduction to tridiagonal
form determined by p?hetrd.
Syntax
void pcunmtr (char *side , char *uplo , char *trans , MKL_INT *m , MKL_INT *n ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau ,
MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex8 *work ,
MKL_INT *lwork , MKL_INT *info );
void pzunmtr (char *side , char *uplo , char *trans , MKL_INT *m , MKL_INT *n ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau ,
MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex16 *work ,
MKL_INT *lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
This function overwrites the general complex distributed m-by-n matrix sub(C) = C(ic:ic+m-1,jc:jc+n-1)
with
                side = 'L'      side = 'R'
trans = 'N':    Q*sub(C)        sub(C)*Q
trans = 'C':    QH*sub(C)       sub(C)*QH
where Q is a complex unitary distributed matrix of order nq, with nq = m if side = 'L' and nq = n if side =
'R'.
Q is defined as the product of nq-1 elementary reflectors, as returned by p?hetrd.
Input Parameters
side (global)
='L': Q or QH is applied from the left.
='R': Q or QH is applied from the right.
trans (global)
='N', no transpose, Q is applied.
='C', conjugate transpose, QH is applied.
uplo (global)
= 'U': Upper triangle of A(ia:*, ja:*) contains elementary reflectors
from p?hetrd;
= 'L': Lower triangle of A(ia:*, ja:*) contains elementary reflectors
from p?hetrd.
a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+m-1) if
side = 'L', and lld_a*LOCc(ja+n-1) if side = 'R'.
Contains the vectors which define the elementary reflectors, as returned by
p?hetrd.
If side='L', lld_a ≥ max(1,LOCr(ia+m-1));
if side='R', lld_a ≥ max(1,LOCr(ia+n-1)).
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
tau (local)
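Array of size ltau, where
if side = 'L' and uplo = 'U', ltau = LOCc(m_a),
if side = 'L' and uplo = 'L', ltau = LOCc(ja+m-2),
if side = 'R' and uplo = 'U', ltau = LOCc(n_a),
if side = 'R' and uplo = 'L', ltau = LOCc(ja+n-2).
tau[i] must contain the scalar factor of the elementary reflector H(i+1), as
returned by p?hetrd (0 ≤ i < ltau). tau is tied to the distributed matrix A.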
c (local)
Pointer into the local memory to an array of size lld_c*LOCc(jc+n-1).
Contains the local pieces of the distributed matrix sub(C).
ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the submatrix C, respectively.
descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.
work (local)
Workspace array of size lwork.
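if uplo = 'U',
  iaa = ia; jaa = ja+1; icc = ic; jcc = jc;
else uplo = 'L',
  iaa = ia+1; jaa = ja;
  if side = 'L',
    icc = ic+1; jcc = jc;
  else
    icc = ic; jcc = jc+1;
  end if
end if
if side = 'L',
  mi = m-1; ni = n;
  lwork ≥ max((nb_a*(nb_a-1))/2, (nqc0 + mpc0)*nb_a) + nb_a*nb_a
else if side = 'R',
  mi = m; ni = n-1;
  lwork ≥ max((nb_a*(nb_a-1))/2, (nqc0 +
    max(npa0+numroc(numroc(ni+icoffc, nb_a, 0, 0, NPCOL), nb_a,
    0, 0, lcmq), mpc0))*nb_a) + nb_a*nb_a
end if
where lcmq = lcm/NPCOL with lcm = ilcm(NPROW, NPCOL),
iroffa = mod(iaa-1, mb_a),
icoffa = mod(jaa-1, nb_a),
iarow = indxg2p(iaa, mb_a, MYROW, rsrc_a, NPROW),
npa0 = numroc(ni+iroffa, mb_a, MYROW, iarow, NPROW),
iroffc = mod(icc-1, mb_c),
icoffc = mod(jcc-1, nb_c),
icrow = indxg2p(icc, mb_c, MYROW, rsrc_c, NPROW),
iccol = indxg2p(jcc, nb_c, MYCOL, csrc_c, NPCOL),
mpc0 = numroc(mi+iroffc, mb_c, MYROW, icrow, NPROW),
nqc0 = numroc(ni+icoffc, nb_c, MYCOL, iccol, NPCOL).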
NOTE
mod(x,y) is the integer remainder of x/y.
ilcm, indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL,
NPROW and NPCOL can be determined by calling the function
blacs_gridinfo. If lwork = -1, then lwork is global input and a
workspace query is assumed; the function only calculates the minimum and
optimal size for all work arrays. Each of these values is returned in the first
entry of the corresponding work array, and no error message is issued by
pxerbla.
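The tool functions named in this note can be sketched in plain C to show what the workspace formulas compute. The versions below follow the commonly documented ScaLAPACK TOOLS logic, but they are illustrative stand-ins only: the names numroc_sketch, indxg2p_sketch and ilcm_sketch are hypothetical, plain int is used instead of MKL_INT, and the unused process argument of indxg2p is dropped. A real program should call the library functions.

```c
/* Illustrative stand-ins for ScaLAPACK tool functions; names are hypothetical. */

/* Number of rows or columns of a dimension of global size n, distributed in
   blocks of size nb, that process iproc owns when the first block lives on
   process isrcproc and the grid dimension has nprocs processes. */
int numroc_sketch(int n, int nb, int iproc, int isrcproc, int nprocs)
{
    int mydist = (nprocs + iproc - isrcproc) % nprocs; /* distance from source */
    int nblocks = n / nb;                               /* number of full blocks */
    int count = (nblocks / nprocs) * nb;                /* complete rounds */
    int extrablks = nblocks % nprocs;                   /* leftover full blocks */

    if (mydist < extrablks)
        count += nb;        /* this process holds one extra full block */
    else if (mydist == extrablks)
        count += n % nb;    /* this process holds the trailing partial block */
    return count;
}

/* Process coordinate that owns the 1-based global index indxglob. */
int indxg2p_sketch(int indxglob, int nb, int isrcproc, int nprocs)
{
    return (isrcproc + (indxglob - 1) / nb) % nprocs;
}

/* Least common multiple of two positive integers. */
int ilcm_sketch(int m, int n)
{
    int a = m, b = n;
    while (b != 0) {        /* Euclid: a becomes gcd(m, n) */
        int t = a % b;
        a = b;
        b = t;
    }
    return (m / a) * n;
}
```

With these helpers, lcmq from the formula above would be ilcm_sketch(NPROW, NPCOL) / NPCOL.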
Output Parameters
work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.
info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.
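The negative info encoding described above can be unpacked with a few lines of C. This is a minimal sketch: decode_info and info_location are hypothetical names, plain int stands in for MKL_INT, and the code assumes that the entry number j is below 100 so that the two cases can be told apart.

```c
/* Hypothetical helper type: where an illegal argument was detected. */
typedef struct {
    int arg;    /* i: 1-based position of the offending argument, 0 if none */
    int entry;  /* j: 1-based entry within an array argument, 0 for a scalar */
} info_location;

/* Split a negative info code into argument number and entry number. */
info_location decode_info(int info)
{
    info_location loc = {0, 0};
    int code;

    if (info >= 0)
        return loc;             /* success, or a routine-specific positive code */

    code = -info;
    if (code >= 100) {          /* array argument: info = -(i*100 + j) */
        loc.arg = code / 100;
        loc.entry = code % 100;
    } else {                    /* scalar argument: info = -i */
        loc.arg = code;
    }
    return loc;
}
```

For example, info = -702 would point at entry 2 (C index 1) of the seventh argument.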
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?stebz
Computes the eigenvalues of a symmetric tridiagonal
matrix by bisection.
Syntax
void psstebz (MKL_INT *ictxt , char *range , char *order , MKL_INT *n , float *vl ,
float *vu , MKL_INT *il , MKL_INT *iu , float *abstol , float *d , float *e , MKL_INT
*m , MKL_INT *nsplit , float *w , MKL_INT *iblock , MKL_INT *isplit , float *work ,
MKL_INT *lwork , MKL_INT *iwork , MKL_INT *liwork , MKL_INT *info );
void pdstebz (MKL_INT *ictxt , char *range , char *order , MKL_INT *n , double *vl ,
double *vu , MKL_INT *il , MKL_INT *iu , double *abstol , double *d , double *e ,
MKL_INT *m , MKL_INT *nsplit , double *w , MKL_INT *iblock , MKL_INT *isplit , double
*work , MKL_INT *lwork , MKL_INT *iwork , MKL_INT *liwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?stebz function computes the eigenvalues of a symmetric tridiagonal matrix in parallel. These may be
all eigenvalues, all eigenvalues in the interval [vl, vu], or the eigenvalues il through iu. A static partitioning
of work is done at the beginning of p?stebz which results in all processes finding an (almost) equal number
of eigenvalues.
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/
PerformanceIndex.
Notice revision #20201201
Input Parameters
vl, vu (global)
If range = 'V', vl and vu are the lower and upper bounds of the interval
to be searched; the function computes the eigenvalues in the interval [vl, vu].
il, iu (global)
Constraint: 1≤il≤iu≤n.
If range = 'I', il is the index of the smallest eigenvalue to be returned
and iu is the index of the largest eigenvalue to be returned (assuming that
the eigenvalues are in ascending order).
If range = 'A' or 'V', il and iu are not referenced.
abstol (global)
The absolute tolerance to which each eigenvalue is required. An eigenvalue
(or cluster) is considered to have converged if it lies in an interval of width
abstol. If abstol ≤ 0, then the tolerance is taken as ulp*||T||, where ulp is
the machine precision and ||T|| is the 1-norm of T.
Eigenvalues will be computed most accurately when abstol is set to the
underflow threshold slamch('U'), not 0. Note that if eigenvectors are
desired later by inverse iteration (p?stein), abstol should be set to
2*p?lamch('S').
d (global)
Array of size n.
e (global)
Array of size n - 1.
work (local)
Array of size max(5n, 7). This is a workspace array.
lwork (local) The size of the work array must be ≥ max(5n, 7).
liwork (local) The size of the iwork array must be ≥ max(4n, 14, NPROCS).
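The default tolerance and the minimum workspace sizes above can be checked by hand with a small sketch. The helpers below are hypothetical (they are not part of the library), cover the double precision case only, use plain int instead of MKL_INT, and take DBL_EPSILON as an assumed stand-in for ulp.

```c
#include <float.h>

static double abs_d(double x) { return x < 0.0 ? -x : x; }
static int max_int(int a, int b) { return a > b ? a : b; }

/* 1-norm (largest absolute column sum) of the symmetric tridiagonal matrix T
   with diagonal d[0..n-1] and off-diagonal e[0..n-2]. */
double tridiag_norm1(int n, const double *d, const double *e)
{
    double norm = 0.0;
    for (int j = 0; j < n; ++j) {
        double sum = abs_d(d[j]);
        if (j > 0)
            sum += abs_d(e[j - 1]);   /* entry above the diagonal */
        if (j < n - 1)
            sum += abs_d(e[j]);       /* entry below the diagonal */
        if (sum > norm)
            norm = sum;
    }
    return norm;
}

/* Tolerance in effect: abstol if positive, otherwise ulp*||T||
   (ulp approximated by DBL_EPSILON, which is an assumption). */
double effective_abstol(double abstol, int n, const double *d, const double *e)
{
    if (abstol > 0.0)
        return abstol;
    return DBL_EPSILON * tridiag_norm1(n, d, e);
}

/* Minimum lwork quoted above: max(5n, 7). */
int stebz_min_lwork(int n)
{
    return max_int(5 * n, 7);
}

/* Minimum liwork quoted above: max(4n, 14, NPROCS). */
int stebz_min_liwork(int n, int nprocs)
{
    return max_int(max_int(4 * n, 14), nprocs);
}
```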
Output Parameters
w (global)
Array of size n. On exit, the first m elements of w contain the eigenvalues on
all processes.
iblock (global)
Array of size n. At each row/column j where e[j-1] is zero or small, the
matrix T is considered to split into a block diagonal matrix. On exit
iblock[i] specifies which block (from 1 to the number of blocks) the
eigenvalue w[i] belongs to.
NOTE
In the (theoretically impossible) event that bisection does not
converge for some or all eigenvalues, info is set to 1 and the
eigenvalues that did not converge are identified by a negative
block number.
isplit (global)
Array of size n.
info (global)
If info = 0, the execution is successful.
If info < 0: if info = -i, the i-th argument has an illegal value.
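Because non-converged eigenvalues are marked by a negative block number in iblock, a caller can scan the first m entries after the call. The helper below is a hypothetical sketch with plain int in place of MKL_INT.

```c
/* Count the eigenvalues flagged as not converged, that is, the entries of
   iblock[0..m-1] that carry a negative block number. */
int count_unconverged(int m, const int *iblock)
{
    int count = 0;
    for (int i = 0; i < m; ++i) {
        if (iblock[i] < 0)
            ++count;
    }
    return count;
}
```

A result of zero is expected in practice, since the note describes non-convergence as theoretically impossible.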
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?stedc
Computes all eigenvalues and eigenvectors of a
symmetric tridiagonal matrix in parallel.
Syntax
void psstedc (const char* compz, const MKL_INT* n, float* d, float* e, float* q, const
MKL_INT* iq, const MKL_INT* jq, const MKL_INT* descq, float* work, MKL_INT* lwork,
MKL_INT* iwork, const MKL_INT* liwork, MKL_INT* info);
void pdstedc (const char* compz, const MKL_INT* n, double* d, double* e, double* q,
const MKL_INT* iq, const MKL_INT* jq, const MKL_INT* descq, double* work, MKL_INT*
lwork, MKL_INT* iwork, const MKL_INT* liwork, MKL_INT* info);
Include Files
• mkl_scalapack.h
Description
p?stedc computes all eigenvalues and eigenvectors of a symmetric tridiagonal matrix in parallel, using the
divide and conquer algorithm.
Input Parameters
n (global)
The order of the tridiagonal matrix T. n >= 0.
d (global)
Array, size (n)
e (global)
Array, size (n-1).
iq (global)
Q's global row index, which points to the beginning of the submatrix which
is to be operated on.
jq (global)
Q's global column index, which points to the beginning of the submatrix
which is to be operated on.
work (local)
Array, size (lwork)
lwork (local)
The size of the array work.
NQ = numroc( n, NB, MYCOL, DESCQ( csrc_ ), NPCOL )
iwork (local)
Array, size (liwork)
Output Parameters
q (local)
Array, local size ( lld_q, LOCc(jq+n-1))
info (global)
= 0: successful exit.
< 0: if the i-th argument is an array and the j-th entry had an illegal
value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.
p?stein
Computes the eigenvectors of a tridiagonal matrix
using inverse iteration.
Syntax
void psstein (MKL_INT *n , float *d , float *e , MKL_INT *m , float *w , MKL_INT
*iblock , MKL_INT *isplit , float *orfac , float *z , MKL_INT *iz , MKL_INT *jz ,
MKL_INT *descz , float *work , MKL_INT *lwork , MKL_INT *iwork , MKL_INT *liwork ,
MKL_INT *ifail , MKL_INT *iclustr , float *gap , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?stein function computes the eigenvectors of a symmetric tridiagonal matrix T corresponding to
specified eigenvalues, by inverse iteration. p?stein does not orthogonalize vectors that are on different
processes. The extent of orthogonalization is controlled by the input parameter lwork. Eigenvectors that are
to be orthogonalized are computed by the same process. p?stein decides on the allocation of work among
the processes and then calls ?stein2 (modified LAPACK function) on each individual process. If insufficient
workspace is allocated, the expected orthogonalization may not be done.
NOTE
If the eigenvectors obtained are not orthogonal, increase lwork and run the code again.
Input Parameters
d, e, w (global)
Arrays:
iblock (global)
Array of size n. The submatrix indices associated with the corresponding
eigenvalues in w: 1 for eigenvalues belonging to the first submatrix from
the top, 2 for those belonging to the second submatrix, etc. (The output
array iblock from p?stebz is expected here).
isplit (global)
Array of size n. The splitting points at which T breaks up into submatrices.
The first submatrix consists of rows/columns 1 to isplit[0], the second of
rows/columns isplit[0]+1 through isplit[1], and so on, and the
nsplit-th submatrix consists of rows/columns isplit[nsplit-2]+1
through isplit[nsplit-1]=n. (The output array isplit from p?stebz is
expected here.)
orfac (global)
orfac specifies which eigenvectors should be orthogonalized. Eigenvectors
that correspond to eigenvalues within orfac*||T|| of each other are to be
orthogonalized. However, if the workspace is insufficient (see lwork), this
tolerance may be decreased until all eigenvectors can be stored in one
process. No orthogonalization is done if orfac is equal to zero. A default
value of 1000 is used if orfac is negative. orfac should be identical on all
processes.
iz, jz (global) The row and column indices in the global matrix Z indicating the
first row and the first column of the submatrix Z, respectively.
descz (global and local) array of size dlen_. The array descriptor for the
distributed matrix Z.
work (local).
Workspace array of size lwork.
lwork (local)
lwork controls the extent of orthogonalization which can be done. The
number of eigenvectors for which storage is allocated on each process is
nvec = floor((lwork-max(5*n,np00*mq00))/n). Eigenvectors
corresponding to eigenvalue clusters of size (nvec - ceil(m/p) + 1) are
guaranteed to be orthogonal (the orthogonality is similar to that obtained
from ?stein2).
NOTE
lwork must be no smaller than max(5*n,np00*mq00) + ceil(m/
p)*n and should have the same input value on all processes.
iwork (local)
liwork (local) The size of the array iwork. It must be at least 3*n+p+1.
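Two of the input conventions above lend themselves to a short sketch: reading submatrix boundaries out of isplit, and relating lwork to the number of eigenvectors that can be orthogonalized. All helper names below are hypothetical. np00 and mq00 are taken as inputs because they depend on the process grid, and integer division is used for floor on the assumption that lwork is at least max(5*n, np00*mq00) and n > 0.

```c
static long max_long(long a, long b) { return a > b ? a : b; }

/* ceil(a / b) for a >= 0 and b > 0 */
static long ceil_div(long a, long b) { return (a + b - 1) / b; }

/* 1-based first and last row/column of submatrix number block
   (1 <= block <= nsplit), given the isplit array produced by p?stebz. */
void block_range(const int *isplit, int block, int *first, int *last)
{
    *first = (block == 1) ? 1 : isplit[block - 2] + 1;
    *last = isplit[block - 1];
}

/* Smallest accepted lwork: max(5*n, np00*mq00) + ceil(m/p)*n */
long stein_min_lwork(long n, long m, long p, long np00, long mq00)
{
    return max_long(5 * n, np00 * mq00) + ceil_div(m, p) * n;
}

/* nvec = floor((lwork - max(5*n, np00*mq00)) / n) */
long stein_nvec(long lwork, long n, long np00, long mq00)
{
    return (lwork - max_long(5 * n, np00 * mq00)) / n;
}

/* Largest cluster size guaranteed to be orthogonalized:
   nvec - ceil(m/p) + 1 */
long stein_max_cluster(long lwork, long n, long m, long p,
                       long np00, long mq00)
{
    return stein_nvec(lwork, n, np00, mq00) - ceil_div(m, p) + 1;
}
```

At the minimum lwork the guaranteed cluster size is 1, which matches the advice to increase lwork when the computed eigenvectors are not orthogonal.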
Output Parameters
z (local)
Array of size (descz[dlen_-1], n/NPCOL + NB). z contains the computed
eigenvectors associated with the specified eigenvalues. Any vector which
fails to converge is set to its current iterate after MAXIT iterations
(See ?stein2). On output, z is distributed across the p processes in block
cyclic format.
work On exit, work[0] gives a lower bound on the workspace (lwork) that
guarantees the user desired orthogonalization (see orfac). Note that this
may overestimate the minimum workspace needed.
ifail (global) Array of size m. On normal exit, all elements of ifail are zero. If
one or more eigenvectors fail to converge after MAXIT iterations (as
in ?stein), then info > 0 is returned. If mod(info, m+1)>0, then for i=1
to mod(info,m+1), the eigenvector corresponding to the eigenvalue
w[ifail[i-1]-1] failed to converge (w refers to the array of eigenvalues
on output).
NOTE
mod(x,y) is the integer remainder of x/y.
gap (global)
This output array contains the gap between eigenvalues whose
eigenvectors could not be orthogonalized. The info/m output values
in this array correspond to the info/(m+1) clusters indicated by the
array iclustr. As a result, the dot product between eigenvectors
corresponding to the i-th cluster may be as high as
(O(n)*macheps)/gap[i-1].
info (global)
If info = 0, the execution is successful.
If info < 0: if the i-th argument is an array and the j-th entry, indexed
j-1, had an illegal value, then info = -(i*100+j); if the i-th argument is
a scalar and had an illegal value, then info = -i.
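The descriptions of ifail and gap above read two quantities out of a positive info value: mod(info, m+1) eigenvectors that failed to converge, and info/(m+1) clusters that could not be orthogonalized. A minimal sketch of that split follows; stein_decode_info is a hypothetical name and plain int stands in for MKL_INT.

```c
/* Split a positive info value from p?stein into the number of eigenvectors
   that failed to converge and the number of clusters that could not be
   orthogonalized. */
void stein_decode_info(int info, int m, int *n_failed, int *n_clusters)
{
    *n_failed = info % (m + 1);     /* mod(info, m+1) */
    *n_clusters = info / (m + 1);   /* integer quotient info/(m+1) */
}
```

The first n_failed entries of ifail then identify the affected eigenvalues, as described for ifail above.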
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?gehrd
Reduces a general matrix to upper Hessenberg form.
Syntax
void psgehrd (MKL_INT *n , MKL_INT *ilo , MKL_INT *ihi , float *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , float *tau , float *work , MKL_INT *lwork , MKL_INT
*info );
void pdgehrd (MKL_INT *n , MKL_INT *ilo , MKL_INT *ihi , double *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *tau , double *work , MKL_INT *lwork , MKL_INT
*info );
void pcgehrd (MKL_INT *n , MKL_INT *ilo , MKL_INT *ihi , MKL_Complex8 *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?gehrd function reduces a real/complex general distributed matrix sub(A) to upper Hessenberg form H
by an orthogonal or unitary similarity transformation
Q'*sub(A)*Q = H,
where sub(A) = A(ia:ia+n-1, ja:ja+n-1).
Input Parameters
a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1). On
entry, this array contains the local pieces of the n-by-n general distributed
matrix sub(A) to be reduced.
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
work (local)
Workspace array of size lwork.
lwork (local or global) size of the array work. lwork is local input and must
satisfy
lwork ≥ NB*NB + NB*max(ihip+1, ihlp+inlq)
where NB = mb_a = nb_a,
ilcol = indxg2p(ja+ilo-1, NB, MYCOL, csrc_a, NPCOL),
inlq = numroc(n-ilo+ioff+1, NB, MYCOL, ilcol, NPCOL),
NOTE
mod(x,y) is the integer remainder of x/y.
indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.
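Once the local counts are known, the lwork bound for p?gehrd is a single expression. The helper below is a hypothetical sketch that takes ihip, ihlp and inlq as inputs rather than deriving them, since they depend on the process grid and the array descriptor.

```c
/* Lower bound quoted above: NB*NB + NB*max(ihip+1, ihlp+inlq).
   nb is the (square) block size NB = mb_a = nb_a. */
long gehrd_min_lwork(long nb, long ihip, long ihlp, long inlq)
{
    long a = ihip + 1;
    long b = ihlp + inlq;
    return nb * nb + nb * (a > b ? a : b);
}
```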
Output Parameters
a On exit, the upper triangle and the first subdiagonal of sub(A) are
overwritten with the upper Hessenberg matrix H, and the elements below
the first subdiagonal, with the array tau, represent the orthogonal/unitary
matrix Q as a product of elementary reflectors (see Application Notes
below).
tau (local).
Array of size at least max(ja+n-2).
work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.
info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.
Application Notes
The matrix Q is represented as a product of (ihi-ilo) elementary reflectors
Q = H(ilo)*H(ilo+1)*...*H(ihi-1).
The contents of A(ia:ia+n-1, ja:ja+n-1) are illustrated by the following example, with n = 7, ilo = 2 and ihi = 6:
on entry
on exit
where a denotes an element of the original matrix sub(A), H denotes a modified element of the upper
Hessenberg matrix H, and vi denotes an element of the vector defining H(ja+ilo+i-2).
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?ormhr
Multiplies a general matrix by the orthogonal
transformation matrix from a reduction to Hessenberg
form determined by p?gehrd.
Syntax
void psormhr (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *ilo ,
MKL_INT *ihi , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *tau ,
float *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , float *work , MKL_INT *lwork ,
MKL_INT *info );
void pdormhr (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *ilo ,
MKL_INT *ihi , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *tau ,
double *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , double *work , MKL_INT *lwork ,
MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
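The p?ormhr function overwrites the general real distributed m-by-n matrix sub(C) = C(ic:ic+m-1, jc:jc+n-1)
with Q*sub(C) or QT*sub(C) if side = 'L', or with sub(C)*Q or sub(C)*QT if side = 'R', depending on trans,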
where Q is a real orthogonal distributed matrix of order nq, with nq = m if side = 'L' and nq = n if side =
'R'.
Q is defined as the product of ihi-ilo elementary reflectors, as returned by p?gehrd.
Input Parameters
side (global)
='L': Q or QT is applied from the left.
='R': Q or QT is applied from the right.
trans (global)
='N', no transpose, Q is applied.
='T', transpose, QT is applied.
m (global) The number of rows in the distributed matrix sub (C) (m≥0).
a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+m-1) if
side = 'L', and lld_a*LOCc(ja+n-1) if side = 'R'.
Contains the vectors which define the elementary reflectors, as returned by
p?gehrd.
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
tau (local)
c (local)
Pointer into the local memory to an array of size lld_c*LOCc(jc+n-1).
ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the submatrix C, respectively.
descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.
work (local)
Workspace array with size lwork.
NOTE
mod(x,y) is the integer remainder of x/y.
ilcm, indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL,
NPROW and NPCOL can be determined by calling the function
blacs_gridinfo.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.
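The workspace query convention described in this note is usually applied as a two-call pattern: query with lwork = -1, allocate, then call again. The sketch below shows only that pattern. mock_routine is a stand-in written for this example, not a library function, and its required size is arbitrary. A real program would pass the full argument list of the ScaLAPACK routine in both calls.

```c
#include <stdlib.h>

/* Stand-in for a ScaLAPACK computational routine; NOT a library function.
   It mimics only the lwork convention: when *lwork == -1 it stores the
   required workspace size in work[0] and returns without computing. */
static void mock_routine(int n, double *work, const int *lwork, int *info)
{
    int needed = 3 * n + 8;          /* arbitrary size chosen for the demo */

    *info = 0;
    if (*lwork == -1) {
        work[0] = (double)needed;    /* workspace query */
        return;
    }
    if (*lwork < needed) {
        *info = -3;                  /* lwork is the third argument here */
        return;
    }
    work[0] = (double)needed;        /* the real computation would go here */
}

/* Two-call pattern: query, allocate, call again. Returns info from the
   routine, or 1 if the allocation fails. The size used is stored in
   *lwork_used. */
int run_with_queried_workspace(int n, int *lwork_used)
{
    double query = 0.0;
    double *work;
    int lwork = -1;
    int info = 0;

    mock_routine(n, &query, &lwork, &info);   /* first call: size query */
    if (info != 0)
        return info;

    lwork = (int)query;
    work = malloc((size_t)lwork * sizeof *work);
    if (work == NULL)
        return 1;

    mock_routine(n, work, &lwork, &info);     /* second call: computation */
    *lwork_used = lwork;
    free(work);
    return info;
}
```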
Output Parameters
work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.
info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?unmhr
Multiplies a general matrix by the unitary
transformation matrix from a reduction to Hessenberg
form determined by p?gehrd.
Syntax
void pcunmhr (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *ilo ,
MKL_INT *ihi , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca ,
MKL_Complex8 *tau , MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc ,
MKL_Complex8 *work , MKL_INT *lwork , MKL_INT *info );
void pzunmhr (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *ilo ,
MKL_INT *ihi , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca ,
MKL_Complex16 *tau , MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc ,
MKL_Complex16 *work , MKL_INT *lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
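This function overwrites the general complex distributed m-by-n matrix sub(C) = C(ic:ic+m-1, jc:jc+n-1)
with Q*sub(C) or QH*sub(C) if side = 'L', or with sub(C)*Q or sub(C)*QH if side = 'R', depending on trans,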
where Q is a complex unitary distributed matrix of order nq, with nq = m if side = 'L' and nq = n if side
= 'R'.
Q is defined as the product of ihi-ilo elementary reflectors, as returned by p?gehrd.
Input Parameters
side (global)
='L': Q or QH is applied from the left.
='R': Q or QH is applied from the right.
trans (global)
='N', no transpose, Q is applied.
='C', conjugate transpose, QH is applied.
m (global) The number of rows in the distributed matrix sub (C) (m≥0).
n (global) The number of columns in the distributed matrix sub (C) (n≥0).
a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+m-1) if
side = 'L', and lld_a*LOCc(ja+n-1) if side = 'R'.
Contains the vectors which define the elementary reflectors, as returned by
p?gehrd.
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
tau (local)
Array of size LOCc(ja+m-2), if side = 'L', and LOCc(ja+n-2) if side =
'R'.
tau[j] contains the scalar factor of the elementary reflector H(j+1) as
returned by p?gehrd (0 ≤ j < size(tau)). tau is tied to the distributed
matrix A.
c (local)
Pointer into the local memory to an array of size lld_c*LOCc(jc+n-1).
ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the submatrix C, respectively.
descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.
work (local)
Workspace array with size lwork.
NOTE
mod(x,y) is the integer remainder of x/y.
ilcm, indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL,
NPROW and NPCOL can be determined by calling the function
blacs_gridinfo.
Output Parameters
work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.
info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?lahqr
Computes the Schur decomposition and/or
eigenvalues of a matrix already in Hessenberg form.
Syntax
void pslahqr (MKL_INT *wantt, MKL_INT *wantz, MKL_INT *n, MKL_INT *ilo, MKL_INT *ihi,
float *a, MKL_INT *desca, float *wr, float *wi, MKL_INT *iloz, MKL_INT *ihiz, float *z,
MKL_INT *descz, float *work, MKL_INT *lwork, MKL_INT *iwork, MKL_INT *ilwork, MKL_INT
*info );
void pdlahqr (MKL_INT *wantt, MKL_INT *wantz, MKL_INT *n, MKL_INT *ilo, MKL_INT *ihi,
double *a, MKL_INT *desca, double *wr, double *wi, MKL_INT *iloz, MKL_INT *ihiz, double
*z, MKL_INT *descz, double *work, MKL_INT *lwork, MKL_INT *iwork, MKL_INT *ilwork,
MKL_INT *info );
void pclahqr (const MKL_INT *wantt, const MKL_INT *wantz, const MKL_INT *n, const
MKL_INT *ilo, const MKL_INT *ihi, MKL_Complex8 *a, const MKL_INT *desca, MKL_Complex8
*w, const MKL_INT *iloz, const MKL_INT *ihiz, MKL_Complex8 *z, const MKL_INT *descz,
MKL_Complex8 *work, const MKL_INT *lwork, const MKL_INT *iwork, const MKL_INT *ilwork,
MKL_INT *info );
void pzlahqr (const MKL_INT *wantt, const MKL_INT *wantz, const MKL_INT *n, const
MKL_INT *ilo, const MKL_INT *ihi, MKL_Complex16 *a, const MKL_INT *desca, MKL_Complex16
*w, const MKL_INT *iloz, const MKL_INT *ihiz, MKL_Complex16 *z, const MKL_INT *descz,
MKL_Complex16 *work, const MKL_INT *lwork, const MKL_INT *iwork, const MKL_INT *ilwork,
MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
This is an auxiliary function used to find the Schur decomposition and/or eigenvalues of a matrix already in
Hessenberg form from columns ilo to ihi.
NOTE
These restrictions apply to the use of p?lahqr:
• The code requires the distributed block size to be square and at least 6.
• The code requires A and Z to be distributed identically and have identical contexts.
• The matrix A must be in upper Hessenberg form. If elements below the subdiagonal are non-zero,
the resulting transformations can be nonsimilar.
• All eigenvalues are distributed to all the nodes.
Input Parameters
wantt (global)
If wantt≠ 0, the full Schur form T is required;
wantz (global)
If wantz≠ 0, the matrix of Schur vectors Z is required;
a (global)
Array, of size lld_a * LOCc(n) . On entry, the upper Hessenberg matrix A.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
iloz, ihiz (global) Specify the rows of the matrix Z to which transformations must be
applied if wantz is non-zero. 1≤iloz≤ilo; ihi≤ihiz≤n.
z (global )
Array. If wantz is non-zero, on entry z must contain the current matrix Z of
transformations accumulated by pdhseqr. If wantz is zero, z is not
referenced.
descz (global and local) array of size dlen_. The array descriptor for the
distributed matrix Z.
work (local)
Workspace array with size lwork.
lwork (local) The size of work. lwork is assumed big enough so that lwork ≥ 3*n
+ max(2*max(lld_z,lld_a) + 2*LOCq(n), 7*ceil(n/hbl)/lcm(NPROW,NPCOL)).
If lwork = -1, then work[0] gets set to the above number and the code
returns immediately.
iwork (global and local) array of size ilwork. Not referenced and can be NULL
pointer.
ilwork (local) This holds some of the iblk integer arrays. Not referenced and can be
NULL pointer.
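The lwork expression for p?lahqr can be evaluated ahead of time when the local sizes are known. The sketch below is hypothetical and makes two assumptions that the text does not state: hbl is taken to be the distribution block size, and both quotients are rounded up so that the estimate is never smaller than the real-valued expression. Calling the routine with lwork = -1 remains the authoritative way to obtain the size.

```c
/* Least common multiple of two positive integers. */
static long lcm_long(long a, long b)
{
    long x = a, y = b;
    while (y != 0) {                 /* Euclid: x becomes gcd(a, b) */
        long t = x % y;
        x = y;
        y = t;
    }
    return (a / x) * b;
}

/* Estimate of
   3*n + max(2*max(lld_z,lld_a) + 2*LOCq(n), 7*ceil(n/hbl)/lcm(NPROW,NPCOL)).
   locq_n is LOCq(n); rounding up is an assumption of this sketch. */
long lahqr_lwork_estimate(long n, long lld_a, long lld_z, long locq_n,
                          long hbl, long nprow, long npcol)
{
    long lld = lld_a > lld_z ? lld_a : lld_z;
    long first = 2 * lld + 2 * locq_n;
    long blocks = (n + hbl - 1) / hbl;        /* ceil(n/hbl) */
    long l = lcm_long(nprow, npcol);
    long second = (7 * blocks + l - 1) / l;   /* quotient rounded up */

    return 3 * n + (first > second ? first : second);
}
```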
Output Parameters
work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.
info (global)
= 0: the execution is successful.
< 0: parameter number -info is incorrect or inconsistent;
> 0: p?lahqr failed to compute all the eigenvalues ilo to ihi in a total of
30*(ihi-ilo+1) iterations; if info = i, elements i+1: ihi of wr and wi
contain the eigenvalues that have been successfully computed.
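When info = i > 0, only part of wr and wi is valid. The hypothetical helper below translates the range i+1:ihi into C array indices, assuming the range is 1-based as in the underlying Fortran interface.

```c
/* After p?lahqr returns info = i > 0, the converged eigenvalues are
   elements i+1..ihi (1-based) of wr and wi, that is, C indices i..ihi-1.
   Stores the first C index and returns the number of valid entries. */
int lahqr_converged_range(int info, int ihi, int *first_index)
{
    *first_index = info;     /* 1-based i+1 corresponds to 0-based i */
    return ihi - info;       /* number of elements in i+1..ihi */
}
```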
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?trevc
Computes right and/or left eigenvectors of a complex
upper triangular matrix in parallel.
Syntax
void pctrevc (const char* side, const char* howmny, const MKL_INT* select, const
MKL_INT* n, MKL_Complex8* t, const MKL_INT* desct, MKL_Complex8* vl, const MKL_INT*
descvl, MKL_Complex8* vr, const MKL_INT* descvr, const MKL_INT* mm, MKL_INT* m,
MKL_Complex8* work, float* rwork, MKL_INT* info);
void pztrevc (const char* side, const char* howmny, const MKL_INT* select, const
MKL_INT* n, MKL_Complex16* t, const MKL_INT* desct, MKL_Complex16* vl, const MKL_INT*
descvl, MKL_Complex16* vr, const MKL_INT* descvr, const MKL_INT* mm, MKL_INT* m,
MKL_Complex16* work, double* rwork, MKL_INT* info);
void pdtrevc (const char* side, const char* howmny, const MKL_INT* select, const
MKL_INT* n, double* t, const MKL_INT* desct, double* vl, const MKL_INT* descvl, double*
vr, const MKL_INT* descvr, const MKL_INT* mm, MKL_INT* m, double* work, MKL_INT* info);
void pstrevc (const char* side, const char* howmny, const MKL_INT* select, const
MKL_INT* n, float* t, const MKL_INT* desct, float* vl, const MKL_INT* descvl, float* vr,
const MKL_INT* descvr, const MKL_INT* mm, MKL_INT* m, float* work, MKL_INT* info);
Include Files
• mkl_scalapack.h
Description
p?trevc computes some or all of the right and/or left eigenvectors of a complex upper triangular matrix T in
parallel.
The right eigenvector x and the left eigenvector y of T corresponding to an eigenvalue w are defined by:
T*x = w*x,
y'*T = w*y'
where y' denotes the conjugate transpose of the vector y.
If all eigenvectors are requested, the routine may either return the matrices X and/or Y of right or left
eigenvectors of T, or the products Q*X and/or Q*Y, where Q is an input unitary matrix. If T was obtained
from the Schur factorization of an original matrix A = Q*T*Q', then Q*X and Q*Y are the matrices of right or
left eigenvectors of A.
Input Parameters
side (global)
= 'R': compute right eigenvectors only;
= 'L': compute left eigenvectors only;
= 'B': compute both right and left eigenvectors.
howmny (global)
= 'A': compute all right and/or left eigenvectors;
= 'B': compute all right and/or left eigenvectors, and backtransform them
using the input matrices supplied in vr and/or vl;
select (global)
n (global)
The order of the matrix T. n >= 0.
t (local)
Array, size lld_t*LOCc(n).
vl (local)
Array, size (descvl(lld_),mm)
On entry, if side = 'L' or 'B' and howmny = 'B', vl must contain an n-by-n
matrix Q (usually the unitary matrix Q of Schur vectors returned
by ?hseqr).
vr (local)
Array, size descvr(lld_)*mm.
On entry, if side = 'R' or 'B' and howmny = 'B', vr must contain an n-by-n
matrix Q (usually the unitary matrix Q of Schur vectors returned
by ?hseqr).
mm (global)
The number of columns in the arrays vl and/or vr. mm >= m.
work (local)
Array, size ( 2*desct(lld_) )
Output Parameters
m (global)
The number of columns in the arrays vl and/or vr actually used to
store the eigenvectors. If howmny = 'A' or 'B', m is set to n. Each
selected eigenvector occupies one column.
info (global)
= 0: successful exit
< 0: if info = -i, the i-th argument had an illegal value
Application Notes
The algorithm used in this program is basically backward (forward) substitution. Scaling should be used to
make the code robust against possible overflow. But scaling has not yet been implemented in p?lattrs
which is called by this routine to solve the triangular systems. p?lattrs just calls p?trsv.
Each eigenvector is normalized so that the element of largest magnitude has magnitude 1; here the
magnitude of a complex number (x,y) is taken to be |x| + |y|.
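The normalization rule above uses |x| + |y| rather than the Euclidean modulus. A minimal serial sketch of that rule follows. The cplx type is a hypothetical stand-in for MKL_Complex16, and the function works on one local vector only; it does not reproduce how p?trevc handles distributed data.

```c
/* Hypothetical complex type standing in for MKL_Complex16. */
typedef struct {
    double real;
    double imag;
} cplx;

static double abs_d(double x) { return x < 0.0 ? -x : x; }

/* Magnitude as defined above: |x| + |y| for the complex number (x, y). */
double mag1(cplx z)
{
    return abs_d(z.real) + abs_d(z.imag);
}

/* Scale v[0..n-1] so that its element of largest magnitude has magnitude 1.
   Returns the scale factor applied, or 0 if the vector is all zero. */
double normalize_mag1(int n, cplx *v)
{
    double vmax = 0.0;
    double scale;

    for (int i = 0; i < n; ++i) {
        double m = mag1(v[i]);
        if (m > vmax)
            vmax = m;
    }
    if (vmax == 0.0)
        return 0.0;

    scale = 1.0 / vmax;
    for (int i = 0; i < n; ++i) {
        v[i].real *= scale;
        v[i].imag *= scale;
    }
    return scale;
}
```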
p?gebrd
Reduces a general matrix to bidiagonal form.
Syntax
void psgebrd (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *d , float *e , float *tauq , float *taup , float *work , MKL_INT
*lwork , MKL_INT *info );
void pdgebrd (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , double *d , double *e , double *tauq , double *taup , double *work , MKL_INT
*lwork , MKL_INT *info );
void pcgebrd (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , float *d , float *e , MKL_Complex8 *tauq , MKL_Complex8 *taup ,
MKL_Complex8 *work , MKL_INT *lwork , MKL_INT *info );
void pzgebrd (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , double *d , double *e , MKL_Complex16 *tauq , MKL_Complex16 *taup ,
MKL_Complex16 *work , MKL_INT *lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?gebrd function reduces a real/complex general m-by-n distributed matrix sub(A)= A(ia:ia+m-1,
ja:ja+n-1) to upper or lower bidiagonal form B by an orthogonal/unitary transformation:
Q'*sub(A)*P = B.
If m≥ n, B is upper bidiagonal; if m < n, B is lower bidiagonal.
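The shape rule above determines both the kind of bidiagonal matrix and the global lengths of its diagonal and off-diagonal. The helpers below are a hypothetical sketch of that bookkeeping. They return global counts, not the local sizes of the distributed d and e arrays.

```c
/* Kind of bidiagonal matrix B produced for an m-by-n sub(A):
   'U' (upper) if m >= n, 'L' (lower) otherwise. */
char bidiagonal_kind(int m, int n)
{
    return (m >= n) ? 'U' : 'L';
}

/* Global number of diagonal entries of B: min(m, n). */
int bidiagonal_diag_len(int m, int n)
{
    return (m < n) ? m : n;
}

/* Global number of off-diagonal entries of B: min(m, n) - 1,
   or 0 when B is empty. */
int bidiagonal_offdiag_len(int m, int n)
{
    int k = bidiagonal_diag_len(m, n);
    return (k > 0) ? k - 1 : 0;
}
```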
Input Parameters
a (local)
Real pointer into the local memory to an array of size lld_a*LOCc(ja+n-1).
On entry, this array contains the distributed matrix sub (A).
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
work (local)
Workspace array of size lwork.
iarow = indxg2p(ia, nb, MYROW, rsrc_a, NPROW),
iacol = indxg2p (ja, nb, MYCOL, csrc_a, NPCOL),
mpa0 = numroc(m +iroffa, nb, MYROW, iarow, NPROW),
nqa0 = numroc(n +icoffa, nb, MYCOL, iacol, NPCOL),
NOTE
mod(x,y) is the integer remainder of x/y.
indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.
Output Parameters
a On exit, if m≥n, the diagonal and the first superdiagonal of sub(A) are
overwritten with the upper bidiagonal matrix B; the elements below the
diagonal, with the array tauq, represent the orthogonal/unitary matrix Q as
a product of elementary reflectors, and the elements above the first
superdiagonal, with the array taup, represent the orthogonal matrix P as a
product of elementary reflectors. If m < n, the diagonal and the first
subdiagonal are overwritten with the lower bidiagonal matrix B; the
elements below the first subdiagonal, with the array tauq, represent the
orthogonal/unitary matrix Q as a product of elementary reflectors, and the
elements above the diagonal, with the array taup, represent the orthogonal
matrix P as a product of elementary reflectors. See Application Notes below.
d (local)
Array of size LOCc(ja+min(m,n)-1) if m≥n and LOCr(ia+min(m,n)-1)
otherwise. The distributed diagonal elements of the bidiagonal matrix B:
d[i] = A(i+1,i+1), 0 ≤ i < size (d).
d is tied to the distributed matrix A.
e (local)
Array of size LOCr(ia+min(m,n)-1) if m≥n; LOCc(ja+min(m,n)-2)
otherwise. The distributed off-diagonal elements of the bidiagonal
distributed matrix B:
If m≥n, e[i] = A(i+1,i+2) for i = 0,1,..., n-2; if m < n, e[i] = A(i+2,i+1)
for i = 0,1,..., m-2. e is tied to the distributed matrix A.
work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.
info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.
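The encoding of a negative info value can be unpacked as follows. This is a minimal sketch: the type info_location and the function decode_negative_info are hypothetical names, and the code assumes that scalar argument positions are below 100, so any magnitude of 100 or more is read as the array form.

```c
#include <assert.h>

/* Decoded form of a negative info value (illustrative type). */
typedef struct {
    int argument; /* 1-based position i of the offending argument */
    int entry;    /* 1-based entry j of an array argument, 0 for a scalar */
} info_location;

info_location decode_negative_info(int info)
{
    assert(info < 0);
    info_location loc;
    int code = -info;
    if (code >= 100) {      /* array argument: info = -(i*100 + j) */
        loc.argument = code / 100;
        loc.entry    = code % 100;
    } else {                /* scalar argument: info = -i */
        loc.argument = code;
        loc.entry    = 0;
    }
    return loc;
}
```

With this sketch, info = -602 decodes to argument 6, entry 2, that is, the element at C index 1 of the sixth argument.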
Application Notes
The matrices Q and P are represented as products of elementary reflectors:
If m≥n,
Q = H(1)*H(2)*...*H(n), and P = G(1)*G(2)*...*G(n-1)
If m < n,
Q = H(1)*H(2)*...*H(m-1), and P = G(1)*G(2)*...*G(m)
Each H(i) and G(i) has the form:
H(i) = I - tauq*v*v' and G(i) = I - taup*u*u'
where tauq and taup are real/complex scalars, and v and u are real/complex vectors. For m < n,
v(1:i) = 0, v(i+1) = 1, and v(i+2:m) is stored on exit in A(ia+i:ia+m-1,ja+i-1); u(1:i-1) = 0, u(i) = 1, and
u(i+1:n) is stored on exit in A(ia+i-1,ja+i+1:ja+n-1).
In the layout of sub(A) on exit, d and e denote diagonal and off-diagonal elements of B, vi denotes an element
of the vector defining H(i), and ui an element of the vector defining G(i).
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?ormbr
Multiplies a general matrix by one of the orthogonal
matrices from a reduction to bidiagonal form
determined by p?gebrd.
Syntax
void psormbr (char *vect , char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT
*k , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *tau , float *c ,
MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , float *work , MKL_INT *lwork , MKL_INT
*info );
void pdormbr (char *vect , char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT
*k , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *tau , double *c ,
MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , double *work , MKL_INT *lwork , MKL_INT
*info );
Include Files
• mkl_scalapack.h
Description
If vect = 'Q', the p?ormbr function overwrites the general real distributed m-by-n matrix sub(C) =
C(ic:ic+m-1,jc:jc+n-1) with Q*sub(C), Q^T*sub(C), sub(C)*Q, or sub(C)*Q^T, as selected by side and trans;
if vect = 'P', P and P^T take the place of Q and Q^T.
Here Q and P^T are the orthogonal distributed matrices determined by p?gebrd when reducing a real
distributed matrix A(ia:*, ja:*) to bidiagonal form: A(ia:*, ja:*) = Q*B*P^T. Q and P^T are defined as
products of elementary reflectors H(i) and G(i) respectively.
Let nq = m if side = 'L' and nq = n if side = 'R'. Therefore nq is the order of the orthogonal matrix Q or
P^T that is applied.
If vect = 'Q', A(ia:*, ja:*) is assumed to have been an nq-by-k matrix:
If nq ≥ k, Q = H(1) H(2)...H(k);
Input Parameters
vect (global)
If vect = 'Q', then Q or Q^T is applied.
side (global)
If side = 'L', then Q or Q^T, P or P^T is applied from the left.
trans (global)
If trans = 'N', no transpose, Q or P is applied.
k (global)
If vect = 'Q', the number of columns in the original distributed matrix
reduced by p?gebrd;
Constraints: k≥ 0.
a (local)
Pointer into the local memory to an array of size lld_a * LOCc(ja+min(nq,k)-1)
if vect = 'Q', and lld_a * LOCc(ja+nq-1) if vect = 'P'.
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
tau (local)
Array of size LOCc(ja+min(nq, k)-1), if vect = 'Q', and LOCr(ia+min(nq, k)-1),
if vect = 'P'.
tau[i] must contain the scalar factor of the elementary reflector H(i+1) or
G(i+1), which determines Q or P, as returned by p?gebrd in its array argument
tauq or taup. tau is tied to the distributed matrix A.
c (local)
Pointer into the local memory to an array of size lld_c*LOCc(jc+n-1).
ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the submatrix C, respectively.
descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.
work (local)
Workspace array of size lwork.
If side = 'L'
  nq = m;
  if ((vect = 'Q' and nq≥k) or (vect is not equal to 'Q' and nq>k)),
    iaa= ia; jaa= ja; mi= m; ni= n; icc= ic; jcc= jc;
  else
    iaa= ia+1; jaa= ja; mi= m-1; ni= n; icc= ic+1; jcc= jc;
  end if
else
  If side = 'R', nq = n;
  if ((vect = 'Q' and nq≥k) or (vect is not equal to 'Q' and nq>k)),
    iaa= ia; jaa= ja; mi= m; ni= n; icc= ic; jcc= jc;
  else
    iaa= ia; jaa= ja+1; mi= m; ni= n-1; icc= ic; jcc= jc+1;
  end if
end if
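The index adjustment above can be written as a small C helper. This is a sketch only: the type ormbr_offsets and the function sketch_ormbr_offsets are hypothetical names, and the test applied for side = 'R' is assumed to be the same as the one shown for side = 'L'.

```c
#include <assert.h>

/* Adjusted indices and sizes used by the workspace formulas (illustrative type). */
typedef struct { int nq, iaa, jaa, mi, ni, icc, jcc; } ormbr_offsets;

ormbr_offsets sketch_ormbr_offsets(char vect, char side, int m, int n, int k,
                                   int ia, int ja, int ic, int jc)
{
    assert(side == 'L' || side == 'R');
    /* Start from the unshifted case: iaa=ia, jaa=ja, mi=m, ni=n, icc=ic, jcc=jc. */
    ormbr_offsets o = { 0, ia, ja, m, n, ic, jc };
    o.nq = (side == 'L') ? m : n;

    /* (vect = 'Q' and nq >= k) or (vect != 'Q' and nq > k) */
    int unshifted = (vect == 'Q') ? (o.nq >= k) : (o.nq > k);
    if (!unshifted) {
        if (side == 'L') {          /* skip the first row */
            o.iaa = ia + 1; o.mi = m - 1; o.icc = ic + 1;
        } else {                    /* skip the first column */
            o.jaa = ja + 1; o.ni = n - 1; o.jcc = jc + 1;
        }
    }
    return o;
}
```

For vect = 'P', side = 'L', m = 5, and k = 5, the condition nq > k fails, so the helper returns iaa = ia+1, mi = m-1, and icc = ic+1.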
If vect = 'Q',
NOTE
mod(x,y) is the integer remainder of x/y.
indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.
Output Parameters
work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.
info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?unmbr
Multiplies a general matrix by one of the unitary
transformation matrices from a reduction to bidiagonal
form determined by p?gebrd.
Syntax
void pcunmbr (char *vect , char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT
*k , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau ,
MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex8 *work ,
MKL_INT *lwork , MKL_INT *info );
void pzunmbr (char *vect , char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT
*k , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16
*tau , MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex16
*work , MKL_INT *lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
If vect = 'Q', the p?unmbr function overwrites the general complex distributed m-by-n matrix sub(C) =
C(ic:ic+m-1,jc:jc+n-1) with Q*sub(C), Q^H*sub(C), sub(C)*Q, or sub(C)*Q^H, as selected by side and trans;
if vect = 'P', P and P^H take the place of Q and Q^H.
Here Q and P^H are the unitary distributed matrices determined by p?gebrd when reducing a complex
distributed matrix A(ia:*, ja:*) to bidiagonal form: A(ia:*, ja:*) = Q*B*P^H.
Q and P^H are defined as products of elementary reflectors H(i) and G(i) respectively.
Let nq = m if side = 'L' and nq = n if side = 'R'. Therefore nq is the order of the unitary matrix Q or P^H
that is applied.
If vect = 'Q', A(ia:*, ja:*) is assumed to have been an nq-by-k matrix:
Input Parameters
vect (global)
If vect = 'Q', then Q or Q^H is applied.
side (global)
If side = 'L', then Q or Q^H, P or P^H is applied from the left.
trans (global)
If trans = 'N', no transpose, Q or P is applied.
m (global) The number of rows in the distributed matrix sub(C) (m≥0).
n (global) The number of columns in the distributed matrix sub(C) (n≥0).
k (global)
If vect = 'Q', the number of columns in the original distributed matrix
reduced by p?gebrd;
Constraints: k≥ 0.
a (local)
Pointer into the local memory to an array of size lld_a * LOCc(ja+min(nq,k)-1)
if vect = 'Q', and lld_a * LOCc(ja+nq-1) if vect = 'P'.
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
tau (local)
Array of size LOCc(ja+min(nq, k)-1), if vect = 'Q', and LOCr(ia+min(nq, k)-1),
if vect = 'P'.
tau[i] must contain the scalar factor of the elementary reflector H(i+1) or
G(i+1), which determines Q or P, as returned by p?gebrd in its array
argument tauq or taup. tau is tied to the distributed matrix A.
c (local)
Pointer into the local memory to an array of size lld_c*LOCc(jc+n-1).
ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the submatrix C, respectively.
descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.
work (local)
Workspace array of size lwork.
If side = 'L'
  nq = m;
  if ((vect = 'Q' and nq ≥ k) or (vect is not equal to 'Q' and nq>k)),
    iaa= ia; jaa= ja; mi= m; ni= n; icc= ic; jcc= jc;
  else
    iaa= ia+1; jaa= ja; mi= m-1; ni= n; icc= ic+1; jcc= jc;
  end if
else
  If side = 'R', nq = n;
  if ((vect = 'Q' and nq ≥ k) or (vect is not equal to 'Q' and nq>k)),
    iaa= ia; jaa= ja; mi= m; ni= n; icc= ic; jcc= jc;
  else
    iaa= ia; jaa= ja+1; mi= m; ni= n-1; icc= ic; jcc= jc+1;
  end if
end if
If vect = 'Q',
if side = 'L',
NOTE
mod(x,y) is the integer remainder of x/y.
indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.
Output Parameters
work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.
info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?sygst
Reduces a real symmetric-definite generalized
eigenvalue problem to the standard form.
Syntax
void pssygst (MKL_INT *ibtype , char *uplo , MKL_INT *n , float *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb ,
float *scale , MKL_INT *info );
void pdsygst (MKL_INT *ibtype , char *uplo , MKL_INT *n , double *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb ,
double *scale , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?sygst function reduces real symmetric-definite generalized eigenproblems to the standard form.
In the following sub(A) denotes A(ia:ia+n-1, ja:ja+n-1) and sub(B) denotes B(ib:ib+n-1, jb:jb+n-1).
sub(A)*x = λ*sub(B)*x,
Input Parameters
If uplo = 'U', the upper triangle of sub(A) is stored and sub(B) is
factored as U^T*U.
If uplo = 'L', the lower triangle of sub(A) is stored and sub(B) is
factored as L*L^T.
n (global) The order of the matrices sub (A) and sub (B) (n≥ 0).
a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1). On
entry, the array contains the local pieces of the n-by-n symmetric
distributed matrix sub(A).
If uplo = 'U', the leading n-by-n upper triangular part of sub(A) contains
the upper triangular part of the matrix, and its strictly lower triangular part
is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of sub(A) contains
the lower triangular part of the matrix, and its strictly upper triangular part
is not referenced.
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
b (local)
Pointer into the local memory to an array of size lld_b*LOCc(jb+n-1). On
entry, the array contains the local pieces of the triangular factor from the
Cholesky factorization of sub (B) as returned by p?potrf.
ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of the submatrix B, respectively.
descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.
Output Parameters
scale (global)
Amount by which the eigenvalues should be scaled to compensate for the
scaling performed in this function. At present, scale is always returned as
1.0; it is returned here to allow for future enhancement.
info (global)
If info = 0, the execution is successful.
If info < 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and had an
illegal value, then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?hegst
Reduces a Hermitian positive-definite generalized
eigenvalue problem to the standard form.
Syntax
void pchegst (MKL_INT *ibtype , char *uplo , MKL_INT *n , MKL_Complex8 *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *jb ,
MKL_INT *descb , float *scale , MKL_INT *info );
void pzhegst (MKL_INT *ibtype , char *uplo , MKL_INT *n , MKL_Complex16 *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *jb ,
MKL_INT *descb , double *scale , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?hegst function reduces complex Hermitian positive-definite generalized eigenproblems to the
standard form.
In the following sub(A) denotes A(ia:ia+n-1, ja:ja+n-1) and sub(B) denotes B(ib:ib+n-1, jb:jb+n-1).
sub(A)*x = λ*sub(B)*x,
and sub(A) is overwritten by inv(U^H)*sub(A)*inv(U), or inv(L)*sub(A)*inv(L^H).
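For intuition, the lower-triangular form of this overwrite can be sketched on a small non-distributed real symmetric matrix, where L^H reduces to L^T. The function sketch_reduce_to_standard is a hypothetical helper, not part of the library; unlike the library routine, it works on a full row-major array rather than on one stored triangle.

```c
#include <assert.h>
#include <stddef.h>

/* Overwrite the n-by-n row-major array a with inv(L)*a*inv(L^T), where l
   holds a lower triangular factor with nonzero diagonal (row-major). */
void sketch_reduce_to_standard(size_t n, double *a, const double *l)
{
    /* Step 1: a <- inv(L)*a, forward substitution down each column. */
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < n; ++i) {
            double s = a[i * n + j];
            for (size_t p = 0; p < i; ++p)
                s -= l[i * n + p] * a[p * n + j];
            assert(l[i * n + i] != 0.0);
            a[i * n + j] = s / l[i * n + i];
        }
    }
    /* Step 2: a <- a*inv(L^T), forward substitution along each row. */
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double s = a[i * n + j];
            for (size_t p = 0; p < j; ++p)
                s -= a[i * n + p] * l[j * n + p];
            a[i * n + j] = s / l[j * n + j];
        }
    }
}
```

With A = [4 2; 2 3] and B = L*L^T = [4 2; 2 2], where L = [2 0; 1 1], the reduced matrix is diag(1, 2), whose eigenvalues 1 and 2 are the roots of det(A - λ*B) = 0.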
If ibtype = 2 or 3, the problem is
Input Parameters
If uplo = 'U', the upper triangle of sub(A) is stored and sub(B) is
factored as U^H*U.
If uplo = 'L', the lower triangle of sub(A) is stored and sub(B) is
factored as L*L^H.
n (global) The order of the matrices sub (A) and sub (B) (n≥0).
a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1). On
entry, the array contains the local pieces of the n-by-n Hermitian distributed
matrix sub(A). If uplo = 'U', the leading n-by-n upper triangular part of
sub(A) contains the upper triangular part of the matrix, and its strictly
lower triangular part is not referenced. If uplo = 'L', the leading n-by-n
lower triangular part of sub(A) contains the lower triangular part of the
matrix, and its strictly upper triangular part is not referenced.
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
b (local)
Pointer into the local memory to an array of size lld_b*LOCc(jb+n-1). On
entry, the array contains the local pieces of the triangular factor from the
Cholesky factorization of sub (B) as returned by p?potrf.
ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of the submatrix B, respectively.
descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.
Output Parameters
scale (global)
Amount by which the eigenvalues should be scaled to compensate for the
scaling performed in this function. At present, scale is always returned as
1.0; it is returned here to allow for future enhancement.
info (global)
If info = 0, the execution is successful.
If info < 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and had an
illegal value, then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?geevx
Computes for an n-by-n real/complex non-symmetric
matrix A, the eigenvalues and, optionally, the left
and/or right eigenvectors.
Syntax
void psgeevx (const char *balanc, const char *jobvl, const char *jobvr, const char
*sense, const MKL_INT *n, float *a, const MKL_INT *desca, float *wr, float *wi, float
*vl, const MKL_INT *descvl, float *vr, const MKL_INT *descvr, MKL_INT *ilo, MKL_INT
*ihi, float *scale, float *abnrm, float *rconde, float *rcondv, float *work, const
MKL_INT *lwork, MKL_INT *info);
void pdgeevx (const char *balanc, const char *jobvl, const char *jobvr, const char
*sense, const MKL_INT *n, double *a, const MKL_INT *desca, double *wr, double *wi,
double *vl, const MKL_INT *descvl, double *vr, const MKL_INT *descvr, MKL_INT *ilo,
MKL_INT *ihi, double *scale, double *abnrm, double *rconde, double *rcondv, double
*work, const MKL_INT *lwork, MKL_INT *info);
void pcgeevx (const char *balanc, const char *jobvl, const char *jobvr, const char
*sense, const MKL_INT *n, MKL_Complex8 *a, const MKL_INT *desca, MKL_Complex8 *w,
MKL_Complex8 *vl, const MKL_INT *descvl, MKL_Complex8 *vr, const MKL_INT *descvr,
MKL_INT *ilo, MKL_INT *ihi, float *scale, float *abnrm, float *rconde, float *rcondv,
MKL_Complex8 *work, const MKL_INT *lwork, MKL_INT *info);
void pzgeevx (const char *balanc, const char *jobvl, const char *jobvr, const char
*sense, const MKL_INT *n, MKL_Complex16 *a, const MKL_INT *desca, MKL_Complex16 *w,
MKL_Complex16 *vl, const MKL_INT *descvl, MKL_Complex16 *vr, const MKL_INT *descvr,
MKL_INT *ilo, MKL_INT *ihi, double *scale, double *abnrm, double *rconde, double
*rcondv, MKL_Complex16 *work, const MKL_INT *lwork, MKL_INT *info);
Include Files
• mkl_scalapack.h
Description
The p?geevx function computes for an n-by-n real/complex non-symmetric matrix A, the eigenvalues and,
optionally, the left and/or right eigenvectors.
Optionally also, it computes a balancing transformation to improve the conditioning of the eigenvalues and
eigenvectors (ilo, ihi, scale, and abnrm), reciprocal condition numbers for the eigenvalues (rconde).
The right eigenvector v of A satisfies
A⋅v = λ⋅v
where λ is its eigenvalue.
The left eigenvector u of A satisfies
u^H⋅A = λ⋅u^H
where u^H denotes the conjugate transpose of u. The computed eigenvectors are normalized to have
Euclidean norm equal to 1 and largest component real.
Balancing a matrix means permuting the rows and columns to make it more nearly upper triangular, and
applying a diagonal similarity transformation D*A*inv(D), where D is a diagonal matrix, to make its rows and
columns closer in norm and the condition number of its eigenvalues smaller. The computed reciprocal
condition numbers correspond to the balanced matrix. Permuting rows and columns will not change the
condition numbers in exact arithmetic, but diagonal scaling will.
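The diagonal similarity transformation D*A*inv(D) replaces each element a(i,j) by d(i)*a(i,j)/d(j). The following sketch applies it to a small non-distributed matrix. The function sketch_diag_similarity is a hypothetical helper, and D is chosen by hand for illustration; how p?geevx selects D is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Overwrite the n-by-n row-major array a with D*a*inv(D), where D = diag(d)
   and every d[j] is nonzero. */
void sketch_diag_similarity(size_t n, double *a, const double *d)
{
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            assert(d[j] != 0.0);
            a[i * n + j] *= d[i] / d[j];  /* row i scaled by d[i], column j by 1/d[j] */
        }
    }
}
```

For A = [1 1024; 1/1024 1] and D = diag(1, 1024), the result is [1 1; 1 1]: the diagonal is unchanged, while the rows and columns become equal in norm.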
NOTE
• The current version does not support computation of the reciprocal condition numbers for the
right eigenvectors.
• The current implementation of p?lahqr requires the distributed block size to be square and at least six
(6); unlike simpler codes like LU, this algorithm is extremely sensitive to block size.
• The current implementation of p?lahqr requires the input matrix A and the left and/or right eigenvector
matrices VL and VR to be distributed identically and to have identical context.
Parameters
balanc (global). Must be 'N', 'P', 'S', or 'B'. Indicates how the input matrix should
be diagonally scaled and/or permuted to improve the conditioning of its
eigenvalues.
If balanc = 'N', do not diagonally scale or permute;
If balanc = 'P', perform permutations to make the matrix more nearly upper
triangular. Do not diagonally scale;
If balanc = 'S', diagonally scale the matrix, that is, replace A by
D*A*inv(D), where D is a diagonal matrix chosen to make the rows and
columns of A more equal in norm. Do not permute;
If balanc = 'B', both diagonally scale and permute A.
sense (global). Must be 'N' or 'E'. Determines which reciprocal condition numbers
are computed.
If sense = 'N', none are computed.
a (local)
Pointer into the local memory to an array of size lld_a*LOCc(n). On entry,
this array contains the local pieces of the n-by-n general distributed matrix
A to be reduced.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
wr, wi (global output) Arrays, size at least max (1, n) each. Contain the real and
imaginary parts, respectively, of the computed eigenvalues. Complex
conjugate pairs of eigenvalues appear consecutively with the eigenvalue
having positive imaginary part first.
w (global output) Array, size at least max(1, n). Contains the computed
eigenvalues.
vl (local output)
Pointer into the local memory to an array of size (DESCVL(LLD_),LOCc(n)).
If jobvl = 'N', vl is not referenced. If jobvl = 'V', the vl parameter contains
the local pieces of the left eigenvectors of the matrix A.
descvl (global and local input) array of size dlen_. The array descriptor for the
distributed matrix vl.
vr (local output)
Pointer into the local memory to an array of size (DESCVR(LLD_),LOCc(n)).
If jobvr = 'N', vr is not referenced. If jobvr = 'V', the vr parameter contains
the local pieces of the right eigenvectors of the matrix A.
descvr (global and local input) array of size dlen_. The array descriptor for the
distributed matrix vr.
The order in which the interchanges are made is n to ihi+1, then 1 to ilo-1.
abnrm The one-norm of the balanced matrix (the maximum of the sum of absolute
values of elements of any column).
work (local)
Workspace array of size lwork.
info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j- 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and had
an illegal value, then info = -i.
p?gesv
Computes the solution to the system of linear
equations with a square distributed matrix and
multiple right-hand sides.
Syntax
void psgesv (MKL_INT *n , MKL_INT *nrhs , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , MKL_INT *ipiv , float *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_INT
*info );
void pdgesv (MKL_INT *n , MKL_INT *nrhs , double *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *ipiv , double *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb , MKL_INT *info );
void pcgesv (MKL_INT *n , MKL_INT *nrhs , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *ipiv , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb , MKL_INT *info );
void pzgesv (MKL_INT *n , MKL_INT *nrhs , MKL_Complex16 *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , MKL_INT *ipiv , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *jb ,
MKL_INT *descb , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?gesv function computes the solution to a real or complex system of linear equations sub(A)*X =
sub(B), where sub(A) = A(ia:ia+n-1, ja:ja+n-1) is an n-by-n distributed matrix and X and sub(B) =
B(ib:ib+n-1, jb:jb+nrhs-1) are n-by-nrhs distributed matrices.
The LU decomposition with partial pivoting and row interchanges is used to factor sub(A) as sub(A) =
P*L*U, where P is a permutation matrix, L is unit lower triangular, and U is upper triangular. L and U are
stored in sub(A). The factored form of sub(A) is then used to solve the system of equations sub(A)*X =
sub(B).
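The factor-then-solve scheme can be sketched for a small non-distributed system with one right-hand side. This is not the library implementation: sketch_gesv and abs_d are hypothetical names, the matrix is stored row-major, and, unlike the behavior described for p?gesv, the sketch stops at the first exactly zero pivot instead of completing the factorization.

```c
#include <assert.h>
#include <stddef.h>

static double abs_d(double v) { return v < 0.0 ? -v : v; }

/* Solve a*x = b for one right-hand side. a is n-by-n row-major and is
   overwritten by its L and U factors; b is overwritten by x.
   ipiv[k] is the 1-based row that was interchanged with row k+1.
   Returns 0 on success, or k > 0 if U(k,k) is exactly zero. */
int sketch_gesv(size_t n, double *a, size_t *ipiv, double *b)
{
    assert(a && ipiv && b);
    for (size_t k = 0; k < n; ++k) {
        /* Partial pivoting: largest magnitude entry in column k, rows k..n-1. */
        size_t piv = k;
        for (size_t i = k + 1; i < n; ++i)
            if (abs_d(a[i * n + k]) > abs_d(a[piv * n + k]))
                piv = i;
        ipiv[k] = piv + 1;
        if (a[piv * n + k] == 0.0)
            return (int)(k + 1);            /* singular factor U */
        if (piv != k) {                     /* row interchange in a and b */
            for (size_t j = 0; j < n; ++j) {
                double t = a[k * n + j];
                a[k * n + j] = a[piv * n + j];
                a[piv * n + j] = t;
            }
            double t = b[k]; b[k] = b[piv]; b[piv] = t;
        }
        for (size_t i = k + 1; i < n; ++i) {
            double f = a[i * n + k] / a[k * n + k];
            a[i * n + k] = f;               /* multiplier kept in the L part */
            for (size_t j = k + 1; j < n; ++j)
                a[i * n + j] -= f * a[k * n + j];
            b[i] -= f * b[k];               /* forward elimination on b */
        }
    }
    for (size_t i = n; i-- > 0; ) {         /* back substitution with U */
        double s = b[i];
        for (size_t j = i + 1; j < n; ++j)
            s -= a[i * n + j] * b[j];
        b[i] = s / a[i * n + i];
    }
    return 0;
}
```

For A = [0 2; 1 1] and b = (4, 3), the first pivot step interchanges rows 1 and 2, and the sketch returns x = (1, 2).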
Input Parameters
n (global) The number of rows and columns to be operated on, that is, the
order of the distributed submatrix sub(A) (n≥ 0).
nrhs (global) The number of right hand sides, that is, the number of columns of
the distributed submatrices B and X (nrhs ≥ 0).
a, b (local)
Pointers into the local memory to arrays of local size a: lld_a*LOCc(ja+n-1)
and b: lld_b*LOCc(jb+nrhs-1), respectively.
On entry, the array a contains the local pieces of the n-by-n distributed
matrix sub(A) to be factored.
On entry, the array b contains the right hand side distributed matrix sub(B).
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of sub(A), respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of sub(B), respectively.
descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.
Output Parameters
ipiv (local) Array of size LOCr(m_a)+mb_a. This array contains the pivoting
information. The (local) row i of the matrix was interchanged with the
(global) row ipiv[i - 1].
info < 0:
If the i-th argument is an array and the j-th entry had an illegal value, then
info = -(i*100+j); if the i-th argument is a scalar and had an illegal
value, then info = -i.
info> 0:
If info = k, U(ia+k-1,ja+k-1) is exactly zero. The factorization has been
completed, but the factor U is exactly singular, so the solution could not be
computed.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?gesvx
Uses the LU factorization to compute the solution to
the system of linear equations with a square matrix A
and multiple right-hand sides, and provides error
bounds on the solution.
Syntax
void psgesvx (char *fact , char *trans , MKL_INT *n , MKL_INT *nrhs , float *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , float *af , MKL_INT *iaf , MKL_INT *jaf , MKL_INT
*descaf , MKL_INT *ipiv , char *equed , float *r , float *c , float *b , MKL_INT *ib ,
MKL_INT *jb , MKL_INT *descb , float *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx ,
float *rcond , float *ferr , float *berr , float *work , MKL_INT *lwork , MKL_INT
*iwork , MKL_INT *liwork , MKL_INT *info );
void pdgesvx (char *fact , char *trans , MKL_INT *n , MKL_INT *nrhs , double *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *af , MKL_INT *iaf , MKL_INT *jaf ,
MKL_INT *descaf , MKL_INT *ipiv , char *equed , double *r , double *c , double *b ,
MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , double *x , MKL_INT *ix , MKL_INT *jx ,
MKL_INT *descx , double *rcond , double *ferr , double *berr , double *work , MKL_INT
*lwork , MKL_INT *iwork , MKL_INT *liwork , MKL_INT *info );
void pcgesvx (char *fact , char *trans , MKL_INT *n , MKL_INT *nrhs , MKL_Complex8 *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *af , MKL_INT *iaf , MKL_INT
*jaf , MKL_INT *descaf , MKL_INT *ipiv , char *equed , float *r , float *c ,
MKL_Complex8 *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_Complex8 *x ,
MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , float *rcond , float *ferr , float *berr ,
MKL_Complex8 *work , MKL_INT *lwork , float *rwork , MKL_INT *lrwork , MKL_INT *info );
void pzgesvx (char *fact , char *trans , MKL_INT *n , MKL_INT *nrhs , MKL_Complex16
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *af , MKL_INT *iaf ,
MKL_INT *jaf , MKL_INT *descaf , MKL_INT *ipiv , char *equed , double *r , double *c ,
MKL_Complex16 *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_Complex16 *x ,
MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , double *rcond , double *ferr , double
*berr , MKL_Complex16 *work , MKL_INT *lwork , double *rwork , MKL_INT *lrwork ,
MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?gesvx function uses the LU factorization to compute the solution to a real or complex system of linear
equations AX = B, where A denotes the n-by-n submatrix A(ia:ia+n-1, ja:ja+n-1), B denotes the
n-by-nrhs submatrix B(ib:ib+n-1, jb:jb+nrhs-1) and X denotes the n-by-nrhs submatrix
X(ix:ix+n-1, jx:jx+nrhs-1).
Error bounds on the solution and a condition estimate are also provided.
In the following description, af stands for the subarray of af from row iaf and column jaf to row iaf+n-1 and
column jaf+n-1.
The function p?gesvx performs the following steps:
1. If fact = 'E', real scaling factors R and C are computed to equilibrate the system:
Input Parameters
If trans = 'C', the system has the form A^H*X = B (Conjugate transpose);
n (global) The number of linear equations; the order of the submatrix A (n ≥ 0).
nrhs (global) The number of right hand sides; the number of columns of the
distributed submatrices B and X (nrhs ≥ 0).
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A(ia:ia+n-1, ja:ja+n-1), respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
iaf, jaf (global) The row and column indices in the global matrix AF indicating the
first row and the first column of the subarray af, respectively.
descaf (global and local) array of size dlen_. The array descriptor for the
distributed matrix AF.
ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of the submatrix B(ib:ib+n-1, jb:jb+nrhs-1), respectively.
descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.
equed (global) Must be 'N', 'R', 'C', or 'B'. equed is an input argument if fact
= 'F' . It specifies the form of equilibration that was done:
If equed = 'R', row equilibration was done, that is, A has been
premultiplied by diag(r);
If equed = 'C', column equilibration was done, that is, A has been
postmultiplied by diag(c);
If equed = 'B', both row and column equilibration was done; A has been
replaced by diag(r)*A*diag(c).
r, c (local)
Arrays of size LOCr(m_a) and LOCc(n_a), respectively.
The array r contains the row scale factors for A, and the array c contains
the column scale factors for A. These arrays are input arguments if fact =
'F' only; otherwise they are output arguments. If equed = 'R' or 'B', A
is multiplied on the left by diag(r); if equed = 'N' or 'C', r is not
accessed.
If fact = 'F' and equed = 'R' or 'B', each element of r must be
positive.
If equed = 'C' or 'B', A is multiplied on the right by diag(c); if equed =
'N' or 'R', c is not accessed.
If fact = 'F' and equed = 'C' or 'B', each element of c must be
positive. Array r is replicated in every process column, and is aligned with
the distributed matrix A. Array c is replicated in every process row, and is
aligned with the distributed matrix A.
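The effect of each equed value on A can be sketched for a small non-distributed matrix. The function sketch_apply_equilibration is a hypothetical helper that applies scale factors supplied by the caller; it does not compute r and c, and it reads r or c only when equed requires them.

```c
#include <assert.h>
#include <stddef.h>

/* Apply the scaling that equed describes to the m-by-n row-major array a:
   'N' leaves a unchanged, 'R' forms diag(r)*a, 'C' forms a*diag(c),
   and 'B' forms diag(r)*a*diag(c). */
void sketch_apply_equilibration(char equed, size_t m, size_t n, double *a,
                                const double *r, const double *c)
{
    assert(equed == 'N' || equed == 'R' || equed == 'C' || equed == 'B');
    int rows = (equed == 'R' || equed == 'B');
    int cols = (equed == 'C' || equed == 'B');
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double s = 1.0;
            if (rows) s *= r[i];   /* r is not accessed for 'N' or 'C' */
            if (cols) s *= c[j];   /* c is not accessed for 'N' or 'R' */
            a[i * n + j] *= s;
        }
    }
}
```

For A = [2 4; 8 16], r = (0.5, 0.25), and c = (1, 0.5), equed = 'B' gives [1 1; 2 2].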
ix, jx (global) The row and column indices in the global matrix X indicating the
first row and the first column of the submatrix X(ix:ix+n-1, jx:jx+nrhs-1), respectively.
descx (global and local) array of size dlen_. The array descriptor for the
distributed matrix X.
lwork (local or global) The size of the array work; must be at least
max(p?gecon(lwork), p?gerfs(lwork))+LOCr(n_a).
liwork (local, psgesvx/pdgesvx only). The size of the array iwork; must be at
least LOCr(n_a).
rwork (local)
Workspace array, used in complex flavors only.
The size of rwork is lrwork.
lrwork (local or global, pcgesvx/pzgesvx only). The size of the array rwork; must
be at least 2*LOCc(n_a).
Output Parameters
x (local)
Pointer into the local memory to an array of local size lld_x*LOCc(jx+nrhs-1).
If info = 0, the array x contains the solution matrix X to the original
system of equations. Note that A and B are modified on exit if equed≠'N',
and the solution to the equilibrated system is:
inv(diag(C))*X, if trans = 'N' and equed = 'C' or 'B'; and
inv(diag(R))*X, if trans = 'T' or 'C' and equed = 'R' or 'B'.
a Array a is not modified on exit if fact = 'F' or 'N', or if fact = 'E' and
equed = 'N'.
If equed≠'N', A is scaled on exit as follows:
See the description of r, c in the Input Parameters section.
rcond (global).
An estimate of the reciprocal condition number of the matrix A after
equilibration (if done). The function sets rcond =0 if the estimate
underflows; in this case the matrix is singular (to working precision).
However, anytime rcond is small compared to 1.0, for the working
precision, the matrix may be poorly conditioned or even singular.
ipiv If fact = 'N' or 'E', then ipiv is an output argument and on exit contains
the pivot indices from the factorization A = P*L*U of the original matrix A
(if fact = 'N') or of the equilibrated matrix A (if fact = 'E').
work[0] If info=0, on exit work[0] returns the minimum value of lwork required
for optimum performance.
iwork[0] If info=0, on exit iwork[0] returns the minimum value of liwork required
for optimum performance.
rwork[0] If info=0, on exit rwork[0] returns the minimum value of lrwork required
for optimum performance.
info < 0: if the i-th argument is an array and the j-th entry had an illegal
value, then info = -(i*100+j); if the i-th argument is a scalar and had an
illegal value, then info = -i.
If info = i, and i ≤ n, then U(i,i) is exactly zero. The factorization has
been completed, but the factor U is exactly singular, so the solution and
error bounds could not be computed.
If info = i, and i = n+1, then U is nonsingular, but rcond is less than
machine precision. The factorization has been completed, but the matrix is
singular to working precision and the solution and error bounds have not
been computed.
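The info encoding described above can be unpacked mechanically. The following is a minimal sketch in plain C and is not part of oneMKL: the names info_kind and classify_gesvx_info are hypothetical, and the decoding assumes that argument positions used in info = -i and array entry indices j are both below 100, so that the two negative encodings cannot collide.

```c
/* Hypothetical classification of a p?gesvx return code (not an oneMKL type). */
typedef enum {
    INFO_OK,              /* info = 0 */
    INFO_BAD_SCALAR,      /* info = -i */
    INFO_BAD_ARRAY_ENTRY, /* info = -(i*100+j) */
    INFO_SINGULAR_U,      /* 0 < info <= n: U(info,info) is exactly zero */
    INFO_RCOND_TINY,      /* info = n+1: rcond is below machine precision */
    INFO_UNKNOWN          /* value not covered by the description */
} info_kind;

/* Splits info into a category plus the argument position i and, for array
 * arguments, the entry j. For INFO_SINGULAR_U, *i holds the pivot index. */
info_kind classify_gesvx_info(long long info, long long n,
                              long long *i, long long *j)
{
    *i = 0;
    *j = 0;
    if (info == 0)
        return INFO_OK;
    if (info < 0) {
        long long code = -info;
        if (code >= 100) {          /* assumed: array case, -(i*100+j) */
            *i = code / 100;
            *j = code % 100;
            return INFO_BAD_ARRAY_ENTRY;
        }
        *i = code;                  /* scalar case, -i */
        return INFO_BAD_SCALAR;
    }
    if (info <= n) {
        *i = info;
        return INFO_SINGULAR_U;
    }
    if (info == n + 1)
        return INFO_RCOND_TINY;
    return INFO_UNKNOWN;
}
```

For example, under these assumptions info = -702 would be reported as entry 2 of argument 7.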
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?gbsv
Computes the solution to the system of linear
equations with a general banded distributed matrix
and multiple right-hand sides.
Syntax
void psgbsv (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs , float *a ,
MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , float *b , MKL_INT *ib , MKL_INT *descb ,
float *work , MKL_INT *lwork , MKL_INT *info );
void pdgbsv (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs , double *a ,
MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , double *b , MKL_INT *ib , MKL_INT
*descb , double *work , MKL_INT *lwork , MKL_INT *info );
void pcgbsv (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs , MKL_Complex8
*a , MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , MKL_Complex8 *b , MKL_INT *ib ,
MKL_INT *descb , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT *info );
void pzgbsv (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs , MKL_Complex16
*a , MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , MKL_Complex16 *b , MKL_INT *ib ,
MKL_INT *descb , MKL_Complex16 *work , MKL_INT *lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?gbsv function computes the solution to a real or complex system of linear equations
sub(A)*X = sub(B),
where sub(A) = A(1:n, ja:ja+n-1) is an n-by-n real/complex general banded distributed matrix with bwl
subdiagonals and bwu superdiagonals, and X and sub(B) = B(ib:ib+n-1, 1:nrhs) are n-by-nrhs distributed
matrices.
The LU decomposition with partial pivoting and row interchanges is used to factor sub(A) as sub(A) =
P*L*U*Q, where P and Q are permutation matrices, and L and U are banded lower and upper triangular
matrices, respectively. The matrix Q represents reordering of columns for the sake of parallelism, while P
represents reordering of rows for numerical stability using classic partial pivoting.
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.
Notice revision #20201201
Input Parameters
n (global) The number of rows and columns to be operated on, that is, the
order of the distributed matrix sub(A) (n≥ 0).
bwl (global) The number of subdiagonals within the band of A (0 ≤ bwl ≤ n-1).
bwu (global) The number of superdiagonals within the band of A (0 ≤ bwu ≤ n-1).
nrhs (global) The number of right hand sides; the number of columns of the
distributed matrix sub(B) (nrhs≥ 0).
a, b (local)
Pointers into the local memory to arrays of local size a: lld_a*LOCc(ja+n-1)
and b: lld_b*LOCc(nrhs).
On entry, the array a contains the local pieces of the global array A.
On entry, the array b contains the right hand side distributed matrix sub(B).
ja (global) The index in the global matrix A indicating the start of the matrix to
be operated on (which may be either all of A or a submatrix of A).
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
If desca[dtype_ - 1] = 501, then dlen_≥ 7;
ib (global) The row index in the global matrix B indicating the first row of the
matrix to be operated on (which may be either all of B or a submatrix of B).
descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.
If descb[dtype_-1] = 502, then dlen_≥ 7;
work (local)
Workspace array of size lwork.
lwork (local or global) The size of the array work; must be at least
lwork ≥ (NB+bwu)*(bwl+bwu) + 6*(bwl+bwu)*(bwl+2*bwu)
+ max(nrhs*(NB+2*bwl+4*bwu), 1).
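As a worked instance of the lwork expression for p?gbsv, the sketch below evaluates its right-hand side in plain C. The function name gbsv_lwork_bound is hypothetical (it is not an oneMKL routine), long long stands in for MKL_INT so that the block compiles without the library headers, and nb is assumed to be the blocking factor NB taken from the array descriptor.

```c
/* Hypothetical helper: evaluates
 *   (NB+bwu)*(bwl+bwu) + 6*(bwl+bwu)*(bwl+2*bwu)
 *   + max(nrhs*(NB+2*bwl+4*bwu), 1)
 * for the given blocking factor and bandwidths. */
long long gbsv_lwork_bound(long long nb, long long bwl, long long bwu,
                           long long nrhs)
{
    long long band = bwl + bwu;
    long long rhs_term = nrhs * (nb + 2 * bwl + 4 * bwu);
    if (rhs_term < 1)
        rhs_term = 1;               /* the max(..., 1) term */
    return (nb + bwu) * band + 6 * band * (bwl + 2 * bwu) + rhs_term;
}
```

With nb = 64, bwl = 2, bwu = 3 and nrhs = 1 the three terms are 335, 240 and 80. The result would be converted to MKL_INT before being passed as lwork.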
Output Parameters
b On exit, this array contains the local pieces of the solution distributed
matrix X.
work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.
info < 0: If the i-th argument is an array and the j-th entry had an illegal value, then
info = -(i*100+j); if the i-th argument is a scalar and had an illegal
value, then info = -i.
info > 0:
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?dbsv
Solves a general band system of linear equations.
Syntax
void psdbsv (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs , float *a ,
MKL_INT *ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *descb , float *work ,
MKL_INT *lwork , MKL_INT *info );
void pddbsv (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs , double *a ,
MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *descb , double *work ,
MKL_INT *lwork , MKL_INT *info );
void pcdbsv (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs , MKL_Complex8
*a , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *descb ,
MKL_Complex8 *work , MKL_INT *lwork , MKL_INT *info );
void pzdbsv (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs , MKL_Complex16
*a , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *descb ,
MKL_Complex16 *work , MKL_INT *lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?dbsv function solves the following system of linear equations:
A(1:n, ja:ja+n-1)*X = B(ib:ib+n-1, 1:nrhs)
Input Parameters
nrhs (global) The number of right-hand sides; the number of columns of the
distributed submatrix B, (nrhs ≥ 0).
a (local).
Pointer into the local memory to an array with leading size lld_a ≥ (bwl+bwu+1)
(stored in desca). On entry, this array contains the local pieces of
the distributed matrix.
ja (global) The index in the global matrix A indicating the start of the matrix to
be operated on (which may be either all of A or a submatrix of A).
b (local)
Pointer into the local memory to an array of local lead size lld_b ≥ nb. On
entry, this array contains the local pieces of the right hand sides
B(ib:ib+n-1, 1:nrhs).
ib (global) The row index in the global matrix B indicating the first row of the
matrix to be operated on (which may be either all of B or a submatrix of B).
work (local).
Temporary workspace. This space may be overwritten in between calls to
functions. work must be the size given in lwork.
lwork (local or global) Size of user-input workspace work. If lwork is too small,
the minimal acceptable size will be returned in work[0] and an error code
is returned.
lwork ≥ nb*(bwl+bwu) + 6*max(bwl,bwu)*max(bwl,bwu)
+ max(max(bwl,bwu)*nrhs, max(bwl,bwu)*max(bwl,bwu))
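The p?dbsv workspace expression can be evaluated as follows. This is a sketch only: dbsv_lwork_bound is a hypothetical name, long long stands in for MKL_INT, and nb is assumed to be the blocking factor stored in the descriptor.

```c
/* Hypothetical helper: evaluates
 *   nb*(bwl+bwu) + 6*max(bwl,bwu)^2
 *   + max(max(bwl,bwu)*nrhs, max(bwl,bwu)^2). */
long long dbsv_lwork_bound(long long nb, long long bwl, long long bwu,
                           long long nrhs)
{
    long long bw = (bwl > bwu) ? bwl : bwu;      /* max(bwl, bwu) */
    long long tail = (bw * nrhs > bw * bw) ? bw * nrhs : bw * bw;
    return nb * (bwl + bwu) + 6 * bw * bw + tail;
}
```

With nb = 64, bwl = 2, bwu = 3 and nrhs = 1 the terms are 320, 54 and 9.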
Output Parameters
b On exit, this array contains the local pieces of the solution distributed matrix X.
info < 0: If the i-th argument is an array and the j-th entry had an illegal value,
then info = -(i*100+j); if the i-th argument is a scalar and had an
illegal value, then info = -i.
> 0: If info = k < NPROCS, the submatrix stored on processor info and
factored locally was not positive definite, and the factorization was not
completed.
If info = k > NPROCS, the submatrix stored on processor info-NPROCS
representing interactions with other processors was not positive definite,
and the factorization was not completed.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?dtsv
Solves a general tridiagonal system of linear
equations.
Syntax
void psdtsv (MKL_INT *n , MKL_INT *nrhs , float *dl , float *d , float *du , MKL_INT
*ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *descb , float *work , MKL_INT
*lwork , MKL_INT *info );
void pddtsv (MKL_INT *n , MKL_INT *nrhs , double *dl , double *d , double *du , MKL_INT
*ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *descb , double *work , MKL_INT
*lwork , MKL_INT *info );
void pcdtsv (MKL_INT *n , MKL_INT *nrhs , MKL_Complex8 *dl , MKL_Complex8 *d ,
MKL_Complex8 *du , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib ,
MKL_INT *descb , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT *info );
void pzdtsv (MKL_INT *n , MKL_INT *nrhs , MKL_Complex16 *dl , MKL_Complex16 *d ,
MKL_Complex16 *du , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib ,
MKL_INT *descb , MKL_Complex16 *work , MKL_INT *lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The function solves a system of linear equations
A(1:n, ja:ja+n-1) * X = B(ib:ib+n-1, 1:nrhs),
where A(1:n, ja:ja+n-1) is an n-by-n real/complex tridiagonal diagonally dominant-like distributed matrix.
Gaussian elimination without pivoting is used to factor a reordering of the matrix into L*U.
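To illustrate elimination without pivoting on a tridiagonal matrix, here is a serial sketch in plain C for a single right-hand side. It is not the distributed algorithm: it performs no reordering and no communication, and the function name tridiag_solve_nopivot is hypothetical. The arrays follow the dl, d, du convention used by this routine, with dl[0] and du[n - 1] never read.

```c
/* Serial sketch: solves a tridiagonal system by elimination without pivoting.
 * dl[k] is the subdiagonal entry of row k (dl[0] is not referenced),
 * d[k] the diagonal entry, du[k] the superdiagonal entry of row k
 * (du[n-1] is not referenced). d and b are overwritten; on success b holds
 * the solution. Returns 0 on success, or the 1-based row of a zero pivot. */
int tridiag_solve_nopivot(int n, const double *dl, double *d,
                          const double *du, double *b)
{
    if (n <= 0)
        return 0;
    /* Forward elimination: remove the subdiagonal entry of each row. */
    for (int k = 1; k < n; ++k) {
        if (d[k - 1] == 0.0)
            return k;
        double m = dl[k] / d[k - 1];
        d[k] -= m * du[k - 1];
        b[k] -= m * b[k - 1];
    }
    if (d[n - 1] == 0.0)
        return n;
    /* Back substitution with the resulting upper bidiagonal factor. */
    b[n - 1] /= d[n - 1];
    for (int k = n - 2; k >= 0; --k)
        b[k] = (b[k] - du[k] * b[k + 1]) / d[k];
    return 0;
}
```

Skipping the pivot search is only safe when the pivots stay well away from zero, which is why the routine targets diagonally dominant-like matrices.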
Input Parameters
nrhs The number of right hand sides; the number of columns of the distributed
matrix B (nrhs ≥ 0).
dl (local).
Pointer to local part of global vector storing the lower diagonal of the
matrix. Globally, dl[0] is not referenced, and dl must be aligned with d.
Must be of size ≥ desca[nb_ - 1].
d (local).
Pointer to local part of global vector storing the main diagonal of the matrix.
du (local).
Pointer to local part of global vector storing the upper diagonal of the
matrix. Globally, du[n - 1] is not referenced, and du must be aligned with
d.
ja (global) The index in the global matrix A indicating the start of the matrix to
be operated on (which may be either all of A or a submatrix of A).
b (local)
Pointer into the local memory to an array of local lead size lld_b ≥ nb. On
entry, this array contains the local pieces of the right hand sides
B(ib:ib+n-1, 1:nrhs).
ib (global) The row index in the global matrix B indicating the first row of the
matrix to be operated on (which may be either all of B or a submatrix of B).
work (local).
lwork (local or global) Size of user-input workspace work. If lwork is too small,
the minimal acceptable size will be returned in work[0] and an error code
is returned.
lwork ≥ (12*NPCOL+3*nb) + max((10+2*min(100, nrhs))*NPCOL + 4*nrhs, 8*NPCOL)
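The right-hand side of the p?dtsv workspace expression can be evaluated with a few lines of C. This is a sketch under stated assumptions: dtsv_lwork_bound is a hypothetical name, long long stands in for MKL_INT, npcol is the number of process columns reported by blacs_gridinfo, and nb is the blocking factor from the descriptor.

```c
/* Hypothetical helper: evaluates
 *   (12*NPCOL + 3*nb)
 *   + max((10 + 2*min(100, nrhs))*NPCOL + 4*nrhs, 8*NPCOL). */
long long dtsv_lwork_bound(long long npcol, long long nb, long long nrhs)
{
    long long capped = (nrhs < 100) ? nrhs : 100;    /* min(100, nrhs) */
    long long a = (10 + 2 * capped) * npcol + 4 * nrhs;
    long long b = 8 * npcol;
    return (12 * npcol + 3 * nb) + ((a > b) ? a : b);
}
```

With npcol = 4, nb = 64 and nrhs = 1 the two groups are 240 and 52.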
Output Parameters
b On exit, this array contains the local pieces of the solution distributed matrix X.
info < 0: If the i-th argument is an array and the j-th entry had an illegal value,
then info = -(i*100+j); if the i-th argument is a scalar and had an
illegal value, then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?posv
Solves a symmetric positive definite system of linear
equations.
Syntax
void psposv (char *uplo , MKL_INT *n , MKL_INT *nrhs , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_INT
*info );
void pdposv (char *uplo , MKL_INT *n , MKL_INT *nrhs , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_INT
*info );
void pcposv (char *uplo , MKL_INT *n , MKL_INT *nrhs , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb , MKL_INT *info );
void pzposv (char *uplo , MKL_INT *n , MKL_INT *nrhs , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?posv function computes the solution to a real/complex system of linear equations
sub(A)*X = sub(B),
where sub(A) denotes A(ia:ia+n-1, ja:ja+n-1) and is an n-by-n symmetric/Hermitian positive definite
distributed matrix, and X and sub(B), denoting B(ib:ib+n-1, jb:jb+nrhs-1), are n-by-nrhs distributed
matrices. The Cholesky decomposition is used to factor sub(A) as
sub(A) = U^T*U, if uplo = 'U', or
sub(A) = L*L^T, if uplo = 'L',
where U is an upper triangular matrix and L is a lower triangular matrix. The factored form of sub(A) is then
used to solve the system of equations.
Input Parameters
nrhs The number of right-hand sides; the number of columns of the distributed
matrix sub(B) (nrhs≥ 0).
a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1). On
entry, this array contains the local pieces of the n-by-n symmetric
distributed matrix sub(A) to be factored.
If uplo = 'U', the leading n-by-n upper triangular part of sub(A) contains
the upper triangular part of the matrix, and its strictly lower triangular part
is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of sub(A) contains
the lower triangular part of the distributed matrix, and its strictly upper
triangular part is not referenced.
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
b (local)
Pointer into the local memory to an array of size lld_b*LOCc(jb+nrhs-1).
On entry, this array contains the local pieces of the right-hand side distributed matrix sub(B).
ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of the submatrix B, respectively.
descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.
Output Parameters
a On exit, if info = 0, this array contains the local pieces of the factor U or L
from the Cholesky factorization sub(A) = U^H*U or L*L^H.
info (global)
If info =0, the execution is successful.
If info < 0: if the i-th argument is an array and the j-th entry, indexed
j-1, had an illegal value, then info = -(i*100+j); if the i-th argument is
a scalar and had an illegal value, then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?posvx
Solves a symmetric or Hermitian positive definite
system of linear equations.
Syntax
void psposvx (char *fact , char *uplo , MKL_INT *n , MKL_INT *nrhs , float *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , float *af , MKL_INT *iaf , MKL_INT *jaf , MKL_INT
*descaf , char *equed , float *sr , float *sc , float *b , MKL_INT *ib , MKL_INT *jb ,
MKL_INT *descb , float *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , float *rcond ,
float *ferr , float *berr , float *work , MKL_INT *lwork , MKL_INT *iwork , MKL_INT
*liwork , MKL_INT *info );
void pdposvx (char *fact , char *uplo , MKL_INT *n , MKL_INT *nrhs , double *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , double *af , MKL_INT *iaf , MKL_INT *jaf , MKL_INT
*descaf , char *equed , double *sr , double *sc , double *b , MKL_INT *ib , MKL_INT
*jb , MKL_INT *descb , double *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , double
*rcond , double *ferr , double *berr , double *work , MKL_INT *lwork , MKL_INT *iwork ,
MKL_INT *liwork , MKL_INT *info );
void pcposvx (char *fact , char *uplo , MKL_INT *n , MKL_INT *nrhs , MKL_Complex8 *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *af , MKL_INT *iaf , MKL_INT
*jaf , MKL_INT *descaf , char *equed , float *sr , float *sc , MKL_Complex8 *b , MKL_INT
*ib , MKL_INT *jb , MKL_INT *descb , MKL_Complex8 *x , MKL_INT *ix , MKL_INT *jx ,
MKL_INT *descx , float *rcond , float *ferr , float *berr , MKL_Complex8 *work ,
MKL_INT *lwork , float *rwork , MKL_INT *lrwork , MKL_INT *info );
void pzposvx (char *fact , char *uplo , MKL_INT *n , MKL_INT *nrhs , MKL_Complex16 *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *af , MKL_INT *iaf , MKL_INT
*jaf , MKL_INT *descaf , char *equed , double *sr , double *sc , MKL_Complex16 *b ,
MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_Complex16 *x , MKL_INT *ix , MKL_INT
*jx , MKL_INT *descx , double *rcond , double *ferr , double *berr , MKL_Complex16
*work , MKL_INT *lwork , double *rwork , MKL_INT *lrwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?posvx function uses the Cholesky factorization A = U^T*U or A = L*L^T to compute the solution to a real or
complex system of linear equations
A(ia:ia+n-1, ja:ja+n-1)*X = B(ib:ib+n-1, jb:jb+nrhs-1),
where A(ia:ia+n-1, ja:ja+n-1) is an n-by-n matrix and X and B(ib:ib+n-1, jb:jb+nrhs-1) are n-by-nrhs
matrices.
Error bounds on the solution and a condition estimate are also provided.
In the following comments y denotes Y(iy:iy+m-1, jy:jy+k-1), an m-by-k matrix where y can be a, af, b
and x.
The function p?posvx performs the following steps:
1. If fact = 'E', real scaling factors s are computed to equilibrate the system:
diag(sr)*A*diag(sc)*inv(diag(sc))*X = diag(sr)*B
Whether or not the system will be equilibrated depends on the scaling of the matrix A, but if
equilibration is used, A is overwritten by diag(sr)*A*diag(sc) and B by diag(sr)*B .
2. If fact = 'N' or 'E', the Cholesky decomposition is used to factor the matrix A (after equilibration if
fact = 'E') as
A = U^T*U, if uplo = 'U', or
A = L*L^T, if uplo = 'L',
where U is an upper triangular matrix and L is a lower triangular matrix.
3. The factored form of A is used to estimate the condition number of the matrix A. If the reciprocal of the
condition number is less than machine precision, steps 4-6 are skipped.
4. The system of equations is solved for X using the factored form of A.
5. Iterative refinement is applied to improve the computed solution matrix and calculate error bounds and
backward error estimates for it.
6. If equilibration was used, the matrix X is premultiplied by diag(sr) so that it solves the original system
before equilibration.
Input Parameters
nrhs (global) The number of right-hand sides; the number of columns of the
distributed submatrices B and X (nrhs ≥ 0).
a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).
On entry, the symmetric/Hermitian matrix A, except if fact = 'F'
and equed = 'Y', then A must contain the equilibrated matrix
diag(sr)*A*diag(sc).
If uplo = 'U', the leading n-by-n upper triangular part of A contains the
upper triangular part of the matrix A, and the strictly lower triangular part
of A is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of A contains the
lower triangular part of the matrix A, and the strictly upper triangular part
of A is not referenced. On exit, A is not modified if fact = 'F' or 'N', or if
fact = 'E' and equed = 'N'.
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
af (local)
Pointer into the local memory to an array of local size lld_af*LOCc(ja+n-1).
If fact = 'F', then af is an input argument and on entry contains the
triangular factor U or L from the Cholesky factorization A = U^T*U or A =
L*L^T, in the same storage format as A. If equed ≠ 'N', then af is the
factored form of the equilibrated matrix diag(sr)*A*diag(sc).
iaf, jaf (global) The row and column indices in the global matrix AF indicating the
first row and the first column of the submatrix AF, respectively.
descaf (global and local) array of size dlen_. The array descriptor for the
distributed matrix AF.
sr (local)
Array of size lld_a.
The array s contains the scale factors for A. This array is an input argument
if fact = 'F' only; otherwise it is an output argument.
If fact = 'F' and equed = 'Y', each element of s must be positive.
b (local)
Pointer into the local memory to an array of local size lld_b*LOCc(jb+nrhs-1).
On entry, the n-by-nrhs right-hand side matrix B.
ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of the submatrix B, respectively.
descb (global and local) Array of size dlen_. The array descriptor for the
distributed matrix B.
x (local)
Pointer into the local memory to an array of local size lld_x*LOCc(jx+nrhs-1).
ix, jx (global) The row and column indices in the global matrix X indicating the
first row and the first column of the submatrix X, respectively.
descx (global and local) array of size dlen_. The array descriptor for the
distributed matrix X.
work (local)
Workspace array of size lwork.
Output Parameters
x (local)
If info = 0, the n-by-nrhs solution matrix X to the original system of
equations.
Note that A and B are modified on exit if equed ≠ 'N', and the solution to
the equilibrated system is
inv(diag(sc))*X if trans = 'N' and equed = 'C' or 'B', or
inv(diag(sr))*X if trans = 'T' or 'C' and equed = 'R' or 'B'.
rcond (global)
An estimate of the reciprocal condition number of the matrix A after
equilibration (if done). If rcond is less than the machine precision (in
particular, if rcond=0), the matrix is singular to working precision. This
condition is indicated by a return code of info > 0.
ferr Arrays of size at least max(LOC,n_b). The estimated forward error bounds
for each solution vector X(j) (the j-th column of the solution matrix X). If
xtrue is the true solution, ferr[j - 1] bounds the magnitude of the largest
entry in (X(j) - xtrue) divided by the magnitude of the largest entry in
X(j). The quality of the error bound depends on the quality of the estimate
of norm(inv(A)) computed in the code; if the estimate of norm(inv(A))
is accurate, the error bound is guaranteed.
berr (local)
Arrays of size at least max(LOC,n_b). The componentwise relative
backward error of each solution vector X(j) (the smallest relative change in
any entry of A or B that makes X(j) an exact solution).
work[0] (local) On exit, work[0] returns the minimal and optimal lwork.
info (global)
If info=0, the execution is successful.
> 0: If info = i and i ≤ n, the leading minor of order i of
a is not positive definite, so the factorization could not be completed, and
the solution and error bounds could not be computed.
= n+1: rcond is less than machine precision. The factorization has been
completed, but the matrix is singular to working precision, and the solution
and error bounds have not been computed.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?pbsv
Solves a symmetric/Hermitian positive definite banded
system of linear equations.
Syntax
void pspbsv (char *uplo , MKL_INT *n , MKL_INT *bw , MKL_INT *nrhs , float *a , MKL_INT
*ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *descb , float *work , MKL_INT
*lwork , MKL_INT *info );
void pdpbsv (char *uplo , MKL_INT *n , MKL_INT *bw , MKL_INT *nrhs , double *a , MKL_INT
*ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *descb , double *work , MKL_INT
*lwork , MKL_INT *info );
void pcpbsv (char *uplo , MKL_INT *n , MKL_INT *bw , MKL_INT *nrhs , MKL_Complex8 *a ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *descb ,
MKL_Complex8 *work , MKL_INT *lwork , MKL_INT *info );
void pzpbsv (char *uplo , MKL_INT *n , MKL_INT *bw , MKL_INT *nrhs , MKL_Complex16 *a ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *descb ,
MKL_Complex16 *work , MKL_INT *lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?pbsv function solves a system of linear equations
A(1:n, ja:ja+n-1)*X = B(ib:ib+n-1, 1:nrhs)
Input Parameters
a (local).
Pointer into the local memory to an array with leading size lld_a ≥ (bw+1)
(stored in desca). On entry, this array contains the local pieces of the
distributed matrix sub(A) to be factored.
ja (global) The index in the global matrix A indicating the start of the matrix to
be operated on (which may be either all of A or a submatrix of A).
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
b (local)
Pointer into the local memory to an array of local lead size lld_b ≥ nb. On
entry, this array contains the local pieces of the right hand sides
B(ib:ib+n-1, 1:nrhs).
ib (global) The row index in the global matrix B indicating the first row of the
matrix to be operated on (which may be either all of B or a submatrix of B).
work (local).
Temporary workspace. This space may be overwritten in between calls to
functions. work must be the size given in lwork.
lwork (local or global) Size of user-input workspace work. If lwork is too small,
the minimal acceptable size will be returned in work[0] and an error code
is returned.
lwork ≥ (nb+2*bw)*bw + max(bw*nrhs, bw*bw)
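For p?pbsv the workspace expression depends only on the blocking factor, the bandwidth, and the number of right-hand sides. A minimal sketch, assuming nb is the blocking factor from the descriptor; pbsv_lwork_bound is a hypothetical name and long long stands in for MKL_INT:

```c
/* Hypothetical helper: evaluates (nb + 2*bw)*bw + max(bw*nrhs, bw*bw). */
long long pbsv_lwork_bound(long long nb, long long bw, long long nrhs)
{
    long long tail = (bw * nrhs > bw * bw) ? bw * nrhs : bw * bw;
    return (nb + 2 * bw) * bw + tail;
}
```

With nb = 64, bw = 3 and nrhs = 1 the two terms are 210 and 9.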
Output Parameters
info < 0: If the i-th argument is an array and the j-th entry had an illegal value,
then info = -(i*100+j); if the i-th argument is a scalar and had an
illegal value, then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?ptsv
Solves a symmetric or Hermitian positive definite tridiagonal system of linear equations.
Syntax
void psptsv (MKL_INT *n , MKL_INT *nrhs , float *d , float *e , MKL_INT *ja , MKL_INT
*desca , float *b , MKL_INT *ib , MKL_INT *descb , float *work , MKL_INT *lwork ,
MKL_INT *info );
void pdptsv (MKL_INT *n , MKL_INT *nrhs , double *d , double *e , MKL_INT *ja , MKL_INT
*desca , double *b , MKL_INT *ib , MKL_INT *descb , double *work , MKL_INT *lwork ,
MKL_INT *info );
void pcptsv (char *uplo , MKL_INT *n , MKL_INT *nrhs , float *d , MKL_Complex8 *e ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *descb ,
MKL_Complex8 *work , MKL_INT *lwork , MKL_INT *info );
void pzptsv (char *uplo , MKL_INT *n , MKL_INT *nrhs , double *d , MKL_Complex16 *e ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *descb ,
MKL_Complex16 *work , MKL_INT *lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?ptsv function solves a system of linear equations
A(1:n, ja:ja+n-1)*X = B(ib:ib+n-1, 1:nrhs)
Input Parameters
nrhs (global) The number of right-hand sides; the number of columns of the
distributed submatrix B (nrhs ≥ 0).
d (local)
Pointer to local part of global vector storing the main diagonal of the matrix.
e (local)
Pointer to local part of global vector storing the upper diagonal of the
matrix. Globally, e[n - 1] is not referenced, and e must be aligned with d.
ja (global) The index in the global matrix A indicating the start of the matrix to
be operated on (which may be either all of A or a submatrix of A).
b (local)
Pointer into the local memory to an array of local lead size lld_b ≥ nb.
On entry, this array contains the local pieces of the right hand sides
B(ib:ib+n-1, 1:nrhs).
ib (global) The row index in the global matrix B indicating the first row of the
matrix to be operated on (which may be either all of B or a submatrix of B).
work (local).
Temporary workspace. This space may be overwritten in between calls to
functions. work must be the size given in lwork.
lwork (local or global) Size of user-input workspace work. If lwork is too small,
the minimal acceptable size will be returned in work[0] and an error code
is returned.
lwork ≥ (12*NPCOL+3*nb) + max((10+2*min(100, nrhs))*NPCOL + 4*nrhs, 8*NPCOL).
Output Parameters
d On exit, this array contains information containing the factors of the matrix.
Must be of size greater than or equal to desca[nb_ - 1].
e On exit, this array contains information containing the factors of the matrix.
Must be of size greater than or equal to desca[nb_ - 1].
b On exit, this array contains the local pieces of the solution distributed matrix X.
info < 0: If the i-th argument is an array and the j-th entry had an illegal value,
then info = -(i*100+j); if the i-th argument is a scalar and had an
illegal value, then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?gels
Solves overdetermined or underdetermined linear
systems involving a matrix of full rank.
Syntax
void psgels (char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *nrhs , float *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb , float *work , MKL_INT *lwork , MKL_INT *info );
void pdgels (char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *nrhs , double *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb , double *work , MKL_INT *lwork , MKL_INT *info );
void pcgels (char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *nrhs , MKL_Complex8 *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib , MKL_INT
*jb , MKL_INT *descb , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT *info );
void pzgels (char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *nrhs , MKL_Complex16 *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib , MKL_INT
*jb , MKL_INT *descb , MKL_Complex16 *work , MKL_INT *lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?gels function solves overdetermined or underdetermined real/complex linear systems involving an
m-by-n matrix sub(A) = A(ia:ia+m-1, ja:ja+n-1), or its transpose/conjugate-transpose, using a QR or
LQ factorization of sub(A). It is assumed that sub(A) has full rank.
The following options are provided:
1. If trans = 'N' and m≥n: find the least squares solution of an overdetermined system, that is, solve
the least squares problem
Input Parameters
If trans = 'T', the linear system involves the transposed matrix A^T (for
real flavors only).
m (global) The number of rows in the distributed matrix sub(A) (m ≥ 0).
n (global) The number of columns in the distributed matrix sub(A) (n ≥ 0).
nrhs (global) The number of right-hand sides; the number of columns in the
distributed submatrices sub(B) and X (nrhs ≥ 0).
a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1). On
entry, contains the m-by-n matrix A.
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
b (local)
Pointer into the local memory to an array of local size lld_b*LOCc(jb+nrhs-1).
On entry, this array contains the local pieces of the distributed
matrix B of right-hand side vectors, stored columnwise; sub(B) is m-by-nrhs
if trans = 'N', and n-by-nrhs otherwise.
ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of the submatrix B, respectively.
descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.
work (local)
Workspace array with size lwork.
The size of the array work. lwork is local input and must be at least lwork ≥
ltau + max(lwf, lws), where if m ≥ n, then
ltau = numroc(ja+min(m,n)-1, nb_a, MYCOL, csrc_a, NPCOL),
lwf = nb_a*(mpa0 + nqa0 + nb_a)
lws = max((nb_a*(nb_a-1))/2, (nrhsqb0 + mpb0)*nb_a) +
nb_a*nb_a
else
ltau = numroc(ia+min(m,n)-1, mb_a, MYROW, rsrc_a, NPROW),
lwf = mb_a * (mpa0 + nqa0 + mb_a)
lws = max((mb_a*(mb_a-1))/2, (npb0 + max(nqa0 +
numroc(numroc(n+iroffb, mb_a, 0, 0, NPROW), mb_a, 0, 0,
lcmp), nrhsqb0))*mb_a) + mb_a*mb_a
end if,
where lcmp = lcm/NPROW with lcm = ilcm(NPROW, NPCOL),
NOTE
mod(x,y) is the integer remainder of x/y.
ilcm, indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL,
NPROW, and NPCOL can be determined by calling the function
blacs_gridinfo.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.
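The ltau and lws terms above are built from numroc, which counts how many rows or columns of a block-cyclically distributed dimension fall on one process. The following is a minimal sketch of that count, assuming the standard block-cyclic rule. numroc_sketch is a hypothetical stand-in written for illustration only; production code should call the ScaLAPACK tool function numroc.

```c
#include <assert.h>

/* Hypothetical illustration of the block-cyclic count performed by numroc.
   n        : number of rows or columns of the global dimension
   nb       : block size
   iproc    : coordinate of the process whose local count is wanted
   isrcproc : coordinate of the process that owns the first block
   nprocs   : number of processes along this dimension */
int numroc_sketch(int n, int nb, int iproc, int isrcproc, int nprocs)
{
    assert(n >= 0 && nb > 0 && nprocs > 0);

    int mydist    = (nprocs + iproc - isrcproc) % nprocs; /* distance from the source process */
    int nblocks   = n / nb;                    /* number of complete blocks */
    int count     = (nblocks / nprocs) * nb;   /* blocks every process receives */
    int extrablks = nblocks % nprocs;          /* complete blocks left over */

    if (mydist < extrablks)
        count += nb;        /* this process gets one extra complete block */
    else if (mydist == extrablks)
        count += n % nb;    /* this process gets the trailing partial block */
    return count;
}
```

With n = 10, nb = 2 and two processes, process 0 holds 6 entries and process 1 holds 4, so the local counts add up to n.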
Output Parameters
work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.
info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry had an illegal value,
then info = -(i*100+j); if the i-th argument is a scalar and had an
illegal value, then info = -i.
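The negative info encoding can be unpacked mechanically. The sketch below is one way to do it; the info_error type and the decode_negative_info name are hypothetical, and plain int is used in place of MKL_INT to keep the example self-contained.

```c
#include <assert.h>

/* Hypothetical helper type: position of the offending argument. */
typedef struct {
    int arg;    /* 1-based position i of the illegal argument */
    int entry;  /* 1-based entry j inside an array argument, 0 for a scalar */
} info_error;

/* Decode info < 0 as documented: -(i*100+j) for arrays, -i for scalars. */
info_error decode_negative_info(int info)
{
    assert(info < 0);   /* only negative values carry an argument position */

    info_error e = {0, 0};
    int code = -info;
    if (code >= 100) {          /* array argument: i*100 + j */
        e.arg   = code / 100;
        e.entry = code % 100;
    } else {                    /* scalar argument: i */
        e.arg = code;
    }
    return e;
}
```

For instance, info = -802 points at entry 2 of the eighth argument, while info = -3 points at the third (scalar) argument.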
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?syev
Computes all eigenvalues and, optionally,
eigenvectors of a symmetric matrix.
Syntax
void pssyev (char *jobz , char *uplo , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *w , float *z , MKL_INT *iz , MKL_INT *jz , MKL_INT
*descz , float *work , MKL_INT *lwork , MKL_INT *info );
void pdsyev (char *jobz , char *uplo , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *w , double *z , MKL_INT *iz , MKL_INT *jz , MKL_INT
*descz , double *work , MKL_INT *lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?syev function computes all eigenvalues and, optionally, eigenvectors of a real symmetric matrix A by
calling the recommended sequence of ScaLAPACK functions.
In its present form, the function assumes a homogeneous system and makes no checks for consistency of
the eigenvalues or eigenvectors across the different processes. Because of this, it is possible that a
heterogeneous system may return incorrect results without any error messages.
Input Parameters
np = the number of rows local to a given process.
nq = the number of columns local to a given process.
uplo (global) Must be 'U' or 'L'. Specifies whether the upper or lower
triangular part of the symmetric matrix A is stored:
If uplo = 'U', a stores the upper triangular part of A.
n (global) The number of rows and columns of the matrix A (n ≥ 0).
a (local)
Block cyclic array of global size n*n and local size lld_a*LOCc(ja+n-1).
On entry, the symmetric matrix A.
If uplo = 'U', only the upper triangular part of A is used to define the
elements of the symmetric matrix.
If uplo = 'L', only the lower triangular part of A is used to define the
elements of the symmetric matrix.
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
iz, jz (global) The row and column indices in the global matrix Z indicating the
first row and the first column of the submatrix Z, respectively.
descz (global and local) array of size dlen_. The array descriptor for the
distributed matrix Z.
work (local)
Array of size lwork.
lwork (local) See below for definitions of variables used to define lwork.
If no eigenvectors are requested (jobz = 'N'), then lwork ≥ 5*n +
sizesytrd + 1,
where sizesytrd is the workspace for p?sytrd and is max(NB*(np+1),
3*NB).
If eigenvectors are requested (jobz = 'V') then the amount of workspace
required to guarantee that all eigenvectors are computed is:
qrmem = 2*n-2
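As a rough illustration of the jobz = 'N' bound, the following sketch evaluates lwork ≥ 5*n + sizesytrd + 1 with sizesytrd = max(NB*(np+1), 3*NB). The function name and the use of plain long are assumptions made for illustration. NB is taken to be the block size and np the local row count, and both are passed in rather than computed here.

```c
#include <assert.h>

/* Hypothetical helper: minimum lwork for p?syev when only eigenvalues
   are requested (jobz = 'N'), following the bound stated in the text. */
long syev_lwork_values_only(long n, long nb, long np)
{
    assert(n >= 0 && nb > 0 && np >= 0);

    long a = nb * (np + 1);
    long b = 3 * nb;
    long sizesytrd = a > b ? a : b;   /* workspace for p?sytrd */
    return 5 * n + sizesytrd + 1;
}
```

In practice a workspace query with lwork = -1 remains the safer way to obtain the size; a closed-form estimate like this one is mainly useful for planning memory ahead of time.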
Output Parameters
a On exit, the lower triangle (if uplo='L') or the upper triangle (if
uplo='U') of A, including the diagonal, is destroyed.
w (global).
Array of size n.
On normal exit, the first entries contain the selected eigenvalues in
ascending order.
z (local).
Array, global size n*n, local size lld_z*LOCc(jz+n-1). If jobz = 'V',
then on normal exit the first columns of z contain the orthonormal
eigenvectors of the matrix corresponding to the selected eigenvalues.
If jobz = 'N', then z is not referenced.
info (global)
If info = 0, the execution is successful.
If info < 0: If the i-th argument is an array and the j-th entry had an illegal
value, then info = -(i*100+j); if the i-th argument is a scalar and had
an illegal value, then info = -i.
If info > 0:
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?syevd
Computes all eigenvalues and eigenvectors of a real
symmetric matrix by using a divide and conquer
algorithm.
Syntax
void pssyevd (char *jobz , char *uplo , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *w , float *z , MKL_INT *iz , MKL_INT *jz , MKL_INT
*descz , float *work , MKL_INT *lwork , MKL_INT *iwork , MKL_INT *liwork , MKL_INT
*info );
void pdsyevd (char *jobz , char *uplo , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *w , double *z , MKL_INT *iz , MKL_INT *jz , MKL_INT
*descz , double *work , MKL_INT *lwork , MKL_INT *iwork , MKL_INT *liwork , MKL_INT
*info );
Include Files
• mkl_scalapack.h
Description
The p?syevd function computes all eigenvalues and eigenvectors of a real symmetric matrix A by using a
divide and conquer algorithm.
Input Parameters
np = the number of rows local to a given process.
nq = the number of columns local to a given process.
Specifies whether the upper or lower triangular part of the symmetric matrix
A is stored:
If uplo = 'U', a stores the upper triangular part of A.
n (global) The number of rows and columns of the matrix A (n ≥ 0).
a (local).
Block cyclic array of global size n*n and local size lld_a*LOCc(ja+n-1).
On entry, the symmetric matrix A.
If uplo = 'U', only the upper triangular part of A is used to define the
elements of the symmetric matrix.
If uplo = 'L', only the lower triangular part of A is used to define the
elements of the symmetric matrix.
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A. If desca[ctxt_ - 1] is incorrect, p?syevd cannot
guarantee correct error reporting.
iz, jz (global) The row and column indices in the global matrix Z indicating the
first row and the first column of the submatrix Z, respectively.
descz (global and local) array of size dlen_. The array descriptor for the
distributed matrix Z. descz[ctxt_ - 1] must equal desca[ctxt_ - 1].
work (local).
Array of size lwork.
Output Parameters
a On exit, the lower triangle (if uplo = 'L'), or the upper triangle (if uplo =
'U') of A, including the diagonal, is overwritten.
w (global).
Array of size n. If info = 0, w contains the eigenvalues in ascending
order.
z (local).
Array, global size n*n, local size lld_z*LOCc(jz+n-1).
iwork[0] (local).
On exit, if liwork > 0, iwork[0] returns the optimal liwork.
info (global)
If info = 0, the execution is successful.
If info < 0:
If the i-th argument is an array and the j-th entry had an illegal value, then
info = -(i*100+j). If the i-th argument is a scalar and had an illegal
value, then info = -i.
If info > 0:
NOTE
mod(x,y) is the integer remainder of x/y.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?syevr
Computes selected eigenvalues and, optionally,
eigenvectors of a real symmetric matrix using
Relatively Robust Representation.
Syntax
void pssyevr(char* jobz, char* range, char* uplo, MKL_INT* n, float* a, MKL_INT* ia,
MKL_INT* ja, MKL_INT* desca, float* vl, float* vu, MKL_INT* il, MKL_INT* iu, MKL_INT* m,
MKL_INT* nz, float* w, float* z, MKL_INT* iz, MKL_INT* jz, MKL_INT* descz, float* work,
MKL_INT* lwork, MKL_INT* iwork, MKL_INT* liwork, MKL_INT* info);
void pdsyevr(char* jobz, char* range, char* uplo, MKL_INT* n, double* a, MKL_INT* ia,
MKL_INT* ja, MKL_INT* desca, double* vl, double* vu, MKL_INT* il, MKL_INT* iu, MKL_INT*
m, MKL_INT* nz, double* w, double* z, MKL_INT* iz, MKL_INT* jz, MKL_INT* descz, double*
work, MKL_INT* lwork, MKL_INT* iwork, MKL_INT* liwork, MKL_INT* info);
Include Files
• mkl_scalapack.h
Description
p?syevr computes selected eigenvalues and, optionally, eigenvectors of a real symmetric matrix A
distributed in 2D block-cyclic format by calling the recommended sequence of ScaLAPACK functions.
First, the matrix A is reduced to real symmetric tridiagonal form. Then, the eigenproblem is solved using the
parallel MRRR algorithm. Last, if eigenvectors have been computed, a backtransformation is done.
Upon successful completion, each processor stores a copy of all computed eigenvalues in w. The eigenvector
matrix z is stored in 2D block-cyclic format distributed over all processors.
Note that subsets of eigenvalues/vectors can be selected by specifying a range of values or a range of indices
for the desired eigenvalues.
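To make the 2D block-cyclic layout of z more concrete, the sketch below computes which process coordinate owns a given global row or column index along one dimension of the grid. It follows the usual block-cyclic ownership rule (the one the ScaLAPACK tool function indxg2p implements); owner_process is a hypothetical name used only for this illustration, and global indices are taken to be 1-based as in the rest of this description.

```c
#include <assert.h>

/* Hypothetical illustration of block-cyclic ownership along one dimension.
   indxglob : 1-based global row or column index
   nb       : block size along this dimension
   isrcproc : process coordinate that owns the first block
   nprocs   : number of processes along this dimension */
int owner_process(int indxglob, int nb, int isrcproc, int nprocs)
{
    assert(indxglob >= 1 && nb > 0 && nprocs > 0);
    assert(isrcproc >= 0 && isrcproc < nprocs);

    int block = (indxglob - 1) / nb;       /* 0-based block that contains the index */
    return (isrcproc + block) % nprocs;    /* blocks are dealt out cyclically */
}
```

Applying the function once with the row parameters and once with the column parameters gives the grid coordinates of the process that stores a particular entry of an eigenvector.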
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/
PerformanceIndex.
Notice revision #20201201
Input Parameters
jobz (global)
Specifies whether or not to compute the eigenvectors:
= 'N': Compute eigenvalues only.
= 'V': Compute eigenvalues and eigenvectors.
range (global)
= 'A': all eigenvalues will be found.
= 'V': all eigenvalues in the interval [vl,vu] will be found.
uplo (global)
Specifies whether the upper or lower triangular part of the symmetric
matrix A is stored:
= 'U': Upper triangular
= 'L': Lower triangular
n (global)
The number of rows and columns of the matrix a. n ≥ 0.
a Block cyclic array of global size n * n, local size lld_a * LOCc(ja+n-1).
This array contains the local pieces of the symmetric distributed matrix A. If
uplo = 'U', only the upper triangular part of a is used to define the
elements of the symmetric matrix. If uplo = 'L', only the lower triangular
part of a is used to define the elements of the symmetric matrix.
On exit, the lower triangle (if uplo='L') or the upper triangle (if uplo='U')
of a, including the diagonal, is destroyed.
ia (global)
Global row index in the global matrix A that points to the beginning of the
submatrix which is to be operated on. It should be set to 1 when operating
on a full matrix.
ja (global)
Global column index in the global matrix A that points to the beginning of
the submatrix which is to be operated on. It should be set to 1 when
operating on a full matrix.
vl (global)
If range = 'V', the lower bound of the interval to be searched for
eigenvalues. Not referenced if range = 'A' or 'I'.
vu (global)
If range = 'V', the upper bound of the interval to be searched for
eigenvalues. Not referenced if range = 'A' or 'I'.
il (global)
If range = 'I', the index (from smallest to largest) of the smallest eigenvalue
to be returned. il ≥ 1.
iu (global)
If range = 'I', the index (from smallest to largest) of the largest eigenvalue
to be returned. min(il,n) ≤ iu ≤ n.
iz (global)
Global row index in the global matrix Z that points to the beginning of the
submatrix which is to be operated on. It should be set to 1 when operating
on a full matrix.
jz (global)
Global column index in the global matrix Z that points to the beginning of
the submatrix which is to be operated on. It should be set to 1 when
operating on a full matrix.
The context descz[ctxt_ - 1] must equal desca[ctxt_ - 1]. Also note the
array alignment requirements specified below.
lwork (local)
Size of work, must be at least 3.
nn = max( n, neig, 2 )
liwork (local)
Size of iwork.
Output Parameters
m (global)
Total number of eigenvalues found. 0 ≤ m ≤ n.
nz (global)
Total number of eigenvectors computed. 0 ≤ nz ≤ m.
If jobz = 'V', nz = m
Upon successful exit, the first m entries contain the selected eigenvalues in
ascending order.
work On return, work[0] contains the optimal amount of workspace required for
efficient execution. If jobz='N' work[0] = optimal amount of workspace
required to compute the eigenvalues. If jobz='V' work[0] = optimal
amount of workspace required to compute eigenvalues and eigenvectors.
info (global)
= 0: successful exit
< 0: If the i-th argument is an array and the j-th entry had an illegal value,
then info = -(i*100+j); if the i-th argument is a scalar and had an illegal
value, then info = -i.
Application Notes
The distributed submatrices a(ia:*, ja:*) and z(iz:iz+m-1,jz:jz+n-1) must satisfy the following
alignment properties:
NOTE
mod(x,y) is the integer remainder of x/y.
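When range = 'I' is used, the constraints il ≥ 1 and min(il,n) ≤ iu ≤ n from the input parameters can be checked before the call. The sketch below shows one such check; both function names are hypothetical, and plain int stands in for MKL_INT.

```c
#include <assert.h>

/* Hypothetical pre-call check of the index range for range = 'I'.
   Returns 1 if il and iu satisfy il >= 1 and min(il,n) <= iu <= n. */
int index_range_is_valid(int il, int iu, int n)
{
    assert(n >= 0);
    int lo = il < n ? il : n;   /* min(il, n) */
    return il >= 1 && lo <= iu && iu <= n;
}

/* Number of eigenvalues requested by a valid index range. */
int index_range_count(int il, int iu)
{
    return iu - il + 1;
}
```

Note that for n = 0 the pair il = 1, iu = 0 passes the check and requests zero eigenvalues.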
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?syevx
Computes selected eigenvalues and, optionally,
eigenvectors of a symmetric matrix.
Syntax
void pssyevx (char *jobz , char *range , char *uplo , MKL_INT *n , float *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , float *vl , float *vu , MKL_INT *il , MKL_INT *iu ,
float *abstol , MKL_INT *m , MKL_INT *nz , float *w , float *orfac , float *z , MKL_INT
*iz , MKL_INT *jz , MKL_INT *descz , float *work , MKL_INT *lwork , MKL_INT *iwork ,
MKL_INT *liwork , MKL_INT *ifail , MKL_INT *iclustr , float *gap , MKL_INT *info );
void pdsyevx (char *jobz , char *range , char *uplo , MKL_INT *n , double *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , double *vl , double *vu , MKL_INT *il , MKL_INT
*iu , double *abstol , MKL_INT *m , MKL_INT *nz , double *w , double *orfac , double
*z , MKL_INT *iz , MKL_INT *jz , MKL_INT *descz , double *work , MKL_INT *lwork ,
MKL_INT *iwork , MKL_INT *liwork , MKL_INT *ifail , MKL_INT *iclustr , double *gap ,
MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?syevx function computes selected eigenvalues and, optionally, eigenvectors of a real symmetric matrix
A by calling the recommended sequence of ScaLAPACK functions. Eigenvalues and eigenvectors can be
selected by specifying either a range of values or a range of indices for the desired eigenvalues.
Input Parameters
np = the number of rows local to a given process.
nq = the number of columns local to a given process.
If range = 'V', all eigenvalues in the half-open interval [vl, vu] will be
found.
If range = 'I', the eigenvalues with indices il through iu will be found.
n (global) The number of rows and columns of the matrix A (n ≥ 0).
a (local).
Block cyclic array of global size n*n and local size lld_a*LOCc(ja+n-1).
On entry, the symmetric matrix A.
If uplo = 'U', only the upper triangular part of A is used to define the
elements of the symmetric matrix.
If uplo = 'L', only the lower triangular part of A is used to define the
elements of the symmetric matrix.
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
vl, vu (global)
If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues; vl ≤ vu. Not referenced if range = 'A' or 'I'.
il, iu (global)
If range ='I', the indices of the smallest and largest eigenvalues to be
returned.
Constraints: il ≥ 1
min(il,n) ≤ iu ≤ n
Not referenced if range = 'A' or 'V'.
abstol (global).
The absolute error tolerance for the eigenvalues. An approximate
eigenvalue is accepted as converged when it is determined to lie in an
interval [a, b] of width less than or equal to
abstol + eps*max(|a|, |b|), where eps is the machine precision.
If jobz = 'V', setting abstol to p?lamch(context, 'U') yields the most
orthogonal eigenvectors.
orfac (global).
Specifies which eigenvectors should be reorthogonalized. Eigenvectors that
correspond to eigenvalues which are within tol = orfac*norm(A) of each
other are to be reorthogonalized. However, if the workspace is insufficient
(see lwork), tol may be decreased until all eigenvectors to be
reorthogonalized can be stored in one process. No reorthogonalization will
be done if orfac equals zero. A default value of 1.0e-3 is used if orfac is
negative. orfac should be identical on all processes.
iz, jz (global) The row and column indices in the global matrix Z indicating the
first row and the first column of the submatrix Z, respectively.
descz (global and local) array of size dlen_. The array descriptor for the
distributed matrix Z. descz[ctxt_ - 1] must equal desca[ctxt_ - 1].
work (local)
Array of size lwork.
lwork ≥ max(lwork, 5*n + nsytrd_lwopt),
where lwork, as defined previously, depends upon the number of
eigenvectors requested, and
nsytrd_lwopt = n + 2*(anb+1)*(4*nps+2) + (nps + 3)*nps;
anb = pjlaenv(desca[ctxt_ - 1], 3, 'p?syttrd', 'L', 0, 0, 0,
0);
sqnpc = int(sqrt(dble(NPROW * NPCOL)));
nps = max(numroc(n, 1, 0, 0, sqnpc), 2*anb);
numroc is a ScaLAPACK tool function;
pjlaenv is a ScaLAPACK environmental inquiry function;
MYROW, MYCOL, NPROW and NPCOL can be determined by calling the function
blacs_gridinfo.
For large n, no extra workspace is needed; however, the biggest boost in
performance comes for small n, so it is wise to provide the extra workspace
(typically less than a megabyte per process).
If clustersize > n/sqrt(NPROW*NPCOL), then providing enough space
to compute all the eigenvectors orthogonally will cause serious degradation
in performance. At the limit (that is, clustersize = n-1) p?stein will
perform no better than ?stein on a single processor.
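The extra-workspace formulas above can be evaluated ahead of time. The following sketch computes nsytrd_lwopt and the resulting lwork target under two assumptions: anb is supplied by the caller (in real code it comes from pjlaenv, which is not called here), and numroc(n, 1, 0, 0, sqnpc) is replaced by the equivalent ceiling division, which is what the block-cyclic count reduces to for block size 1 on process 0. All function names are hypothetical.

```c
#include <assert.h>

/* Largest integer s with s*s <= p; integer form of int(sqrt(dble(p))). */
long isqrt_floor(long p)
{
    assert(p >= 0);
    long s = 0;
    while ((s + 1) * (s + 1) <= p)
        s++;
    return s;
}

/* nsytrd_lwopt = n + 2*(anb+1)*(4*nps+2) + (nps+3)*nps */
long nsytrd_lwopt_sketch(long n, long nprow, long npcol, long anb)
{
    assert(n >= 0 && nprow > 0 && npcol > 0 && anb >= 0);

    long sqnpc = isqrt_floor(nprow * npcol);
    long rows  = (n + sqnpc - 1) / sqnpc;            /* numroc(n, 1, 0, 0, sqnpc) */
    long nps   = rows > 2 * anb ? rows : 2 * anb;    /* max(rows, 2*anb) */
    return n + 2 * (anb + 1) * (4 * nps + 2) + (nps + 3) * nps;
}

/* lwork >= max(lwork_min, 5*n + nsytrd_lwopt), where lwork_min is the
   previously defined minimum for the requested eigenvectors. */
long syevx_lwork_fast(long lwork_min, long n, long nsytrd_lwopt)
{
    long fast = 5 * n + nsytrd_lwopt;
    return lwork_min > fast ? lwork_min : fast;
}
```

A caller would typically compare the result with available memory and fall back to the minimum lwork when the extra space is not affordable.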
Output Parameters
a On exit, the lower triangle (if uplo = 'L') or the upper triangle (if uplo =
'U') of A, including the diagonal, is overwritten.
w (global).
Array of size n. The first m elements contain the selected eigenvalues in
ascending order.
z (local).
Array, global size n*n, local size lld_z*LOCc(jz+n-1).
If jobz = 'V', then on normal exit the first m columns of z contain the
orthonormal eigenvectors of the matrix corresponding to the selected
eigenvalues. If an eigenvector fails to converge, then that column of z
contains the latest approximation to the eigenvector, and the index of the
eigenvector is returned in ifail.
If jobz = 'N', then z is not referenced.
ifail (global).
Array of size n.
If jobz = 'V', then on normal exit, the first m elements of ifail are zero. If
(mod(info,2) ≠ 0) on exit, then ifail contains the indices of the
eigenvectors that failed to converge.
If jobz = 'N', then ifail is not referenced.
gap (global)
Array of size NPROW*NPCOL
This array contains the gap between eigenvalues whose eigenvectors could
not be reorthogonalized. The output values in this array correspond to the
clusters indicated by the array iclustr. As a result, the dot product between
eigenvectors corresponding to the i-th cluster may be as high as
(C*n)/gap[i - 1], where C is a small constant.
info (global)
If info = 0, the execution is successful.
If info < 0:
If the i-th argument is an array and the j-th entry had an illegal value, then
info = -(i*100+j); if the i-th argument is a scalar and had an illegal
value, then info = -i.
NOTE
mod(x,y) is the integer remainder of x/y.
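The ifail description above implies a simple post-call check: when mod(info,2) is nonzero, the nonzero entries of ifail name the eigenvectors that failed to converge. The sketch below counts them. The function name is hypothetical, plain int stands in for MKL_INT, and the scan over the first m entries is an assumption based on the statement that those entries are zero on normal exit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical post-call helper for p?syevx with jobz = 'V'.
   Returns the number of eigenvectors reported as not converged. */
int count_failed_eigenvectors(int info, const int *ifail, int m)
{
    assert(ifail != NULL && m >= 0);

    if (info <= 0 || info % 2 == 0)
        return 0;                 /* lowest bit of info not set: no failures flagged */

    int count = 0;
    for (int i = 0; i < m; i++) {
        if (ifail[i] != 0)        /* nonzero entry: index of a failed eigenvector */
            count++;
    }
    return count;
}
```

A nonzero result means the corresponding columns of z hold only the latest approximations, so they should be treated with care.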
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?heev
Computes all eigenvalues and, optionally,
eigenvectors of a complex Hermitian matrix.
Syntax
void pcheev (char *jobz , char *uplo , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , float *w , MKL_Complex8 *z , MKL_INT *iz , MKL_INT *jz ,
MKL_INT *descz , MKL_Complex8 *work , MKL_INT *lwork , float *rwork , MKL_INT *lrwork ,
MKL_INT *info );
void pzheev (char *jobz , char *uplo , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *w , MKL_Complex16 *z , MKL_INT *iz , MKL_INT
*jz , MKL_INT *descz , MKL_Complex16 *work , MKL_INT *lwork , double *rwork , MKL_INT
*lrwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?heev function computes all eigenvalues and, optionally, eigenvectors of a complex Hermitian matrix A
by calling the recommended sequence of ScaLAPACK functions. The function assumes a homogeneous
system and makes spot checks of the consistency of the eigenvalues across the different processes. A
heterogeneous system may return incorrect results without any error messages.
Input Parameters
np = the number of rows local to a given process.
nq = the number of columns local to a given process.
Specifies whether the upper or lower triangular part of the Hermitian matrix
A is stored:
If uplo = 'U', a stores the upper triangular part of A.
n (global) The number of rows and columns of the matrix A (n ≥ 0).
a (local).
Block cyclic array of global size n*n and local size lld_a*LOCc(ja+n-1).
On entry, the Hermitian matrix A.
If uplo = 'U', only the upper triangular part of A is used to define the
elements of the Hermitian matrix.
If uplo = 'L', only the lower triangular part of A is used to define the
elements of the Hermitian matrix.
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A. If desca[ctxt_ - 1] is incorrect, p?heev cannot
guarantee correct error reporting.
iz, jz (global) The row and column indices in the global matrix Z indicating the
first row and the first column of the submatrix Z, respectively.
descz (global and local) array of size dlen_. The array descriptor for the
distributed matrix Z. descz[ctxt_ - 1] must equal desca[ctxt_ - 1].
work (local).
Array of size lwork.
If eigenvectors are requested (jobz = 'V'), then the amount of workspace
required:
lwork ≥ (np0+nq0+nb)*nb + 3*n + n^2
with nb = desca[mb_ - 1] = desca[nb_ - 1] = descz[mb_ - 1] = descz[nb_ - 1]
np0 = numroc(nn, nb, 0, 0, NPROW).
nq0 = numroc( max( n, nb, 2 ), nb, 0, 0, NPCOL).
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the size required for optimal
performance for all work arrays. The required workspace is returned as the
first element of the corresponding work arrays, and no error message is
issued by pxerbla.
rwork (local).
Workspace array of size lrwork.
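The jobz = 'V' workspace bound for work can be evaluated directly once np0 and nq0 are known. The sketch below does so, reading the final term of the bound as n squared; the function name is hypothetical, np0 and nq0 are assumed to have been obtained from numroc beforehand, and long long is used to reduce the risk of overflow in the n*n term.

```c
#include <assert.h>

/* Hypothetical helper: lwork >= (np0 + nq0 + nb)*nb + 3*n + n*n
   for p?heev when eigenvectors are requested (jobz = 'V'). */
long long heev_lwork_vectors(long long n, long long nb,
                             long long np0, long long nq0)
{
    assert(n >= 0 && nb > 0 && np0 >= 0 && nq0 >= 0);
    return (np0 + nq0 + nb) * nb + 3 * n + n * n;
}
```

Because of the n*n term the requirement grows quadratically with the matrix order, which is worth checking against per-process memory before the call.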
Output Parameters
a On exit, the lower triangle (if uplo = 'L'), or the upper triangle (if uplo =
'U') of A, including the diagonal, is overwritten.
w (global).
Array of size n. The first m elements contain the selected eigenvalues in
ascending order.
z (local).
Array, global size n*n, local size lld_z*LOCc(jz+n-1).
If jobz ='V', then on normal exit the first columns of z contain the
orthonormal eigenvectors of the matrix corresponding to the selected
eigenvalues. If an eigenvector fails to converge, then that column of z
contains the latest approximation to the eigenvector, and the index of the
eigenvector is returned in ifail.
If jobz = 'N', then z is not referenced.
rwork[0] (local)
On output, rwork[0] returns workspace required to guarantee completion.
info (global)
If info = 0, the execution is successful.
If info < 0:
If the i-th argument is an array and the j-th entry had an illegal value, then
info = -(i*100+j). If the i-th argument is a scalar and had an illegal
value, then info = -i.
If info > 0:
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?heevd
Computes all eigenvalues and eigenvectors of a
complex Hermitian matrix by using a divide and
conquer algorithm.
Syntax
void pcheevd (char *jobz , char *uplo , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , float *w , MKL_Complex8 *z , MKL_INT *iz , MKL_INT *jz ,
MKL_INT *descz , MKL_Complex8 *work , MKL_INT *lwork , float *rwork , MKL_INT *lrwork ,
MKL_INT *iwork , MKL_INT *liwork , MKL_INT *info );
void pzheevd (char *jobz , char *uplo , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *w , MKL_Complex16 *z , MKL_INT *iz , MKL_INT
*jz , MKL_INT *descz , MKL_Complex16 *work , MKL_INT *lwork , double *rwork , MKL_INT
*lrwork , MKL_INT *iwork , MKL_INT *liwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?heevd function computes all eigenvalues and eigenvectors of a complex Hermitian matrix A by using a
divide and conquer algorithm.
Input Parameters
np = the number of rows local to a given process.
nq = the number of columns local to a given process.
If jobz = 'N', then only eigenvalues are computed (not yet
implemented).
If jobz = 'V', then eigenvalues and eigenvectors are computed.
Specifies whether the upper or lower triangular part of the Hermitian matrix
A is stored:
If uplo = 'U', a stores the upper triangular part of A.
n (global) The number of rows and columns of the matrix A (n ≥ 0).
a (local).
Block cyclic array of global size n*n and local size lld_a*LOCc(ja+n-1).
On entry, the Hermitian matrix A.
If uplo = 'U', only the upper triangular part of A is used to define the
elements of the Hermitian matrix.
If uplo = 'L', only the lower triangular part of A is used to define the
elements of the Hermitian matrix.
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A. If desca[ctxt_ - 1] is incorrect, p?heevd cannot
guarantee correct error reporting.
iz, jz (global) The row and column indices in the global matrix Z indicating the
first row and the first column of the submatrix Z, respectively.
descz (global and local) array of size dlen_. The array descriptor for the
distributed matrix Z. descz[ctxt_ - 1] must equal desca[ctxt_ - 1].
work (local).
Array of size lwork.
rwork (local).
Workspace array of size lrwork.
Output Parameters
a On exit, the lower triangle (if uplo = 'L'), or the upper triangle (if uplo =
'U') of A, including the diagonal, is overwritten.
w (global).
Array of size n. If info = 0, w contains the eigenvalues in ascending
order.
z (local).
Array, global size n*n, local size lld_z*LOCc(jz+n-1).
rwork[0] (local)
On output, rwork[0] returns workspace required to guarantee completion.
iwork[0] (local).
On return, iwork[0] contains the amount of integer workspace required.
info (global)
If info = 0, the execution is successful.
If info < 0:
If the i-th argument is an array and the j-th entry had an illegal value, then
info = -(i*100+j). If the i-th argument is a scalar and had an illegal
value, then info = -i.
If info > 0:
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?heevr
Computes selected eigenvalues and, optionally,
eigenvectors of a Hermitian matrix using Relatively
Robust Representation.
Syntax
void pcheevr(char* jobz, char* range, char* uplo, MKL_INT* n, MKL_Complex8* a, MKL_INT*
ia, MKL_INT* ja, MKL_INT* desca, float* vl, float* vu, MKL_INT* il, MKL_INT* iu,
MKL_INT* m, MKL_INT* nz, float* w, MKL_Complex8* z, MKL_INT* iz, MKL_INT* jz, MKL_INT*
descz, MKL_Complex8* work, MKL_INT* lwork, float* rwork, MKL_INT* lrwork, MKL_INT*
iwork, MKL_INT* liwork, MKL_INT* info);
void pzheevr(char* jobz, char* range, char* uplo, MKL_INT* n, MKL_Complex16* a, MKL_INT*
ia, MKL_INT* ja, MKL_INT* desca, double* vl, double* vu, MKL_INT* il, MKL_INT* iu,
MKL_INT* m, MKL_INT* nz, double* w, MKL_Complex16* z, MKL_INT* iz, MKL_INT* jz, MKL_INT*
descz, MKL_Complex16* work, MKL_INT* lwork, double* rwork, MKL_INT* lrwork, MKL_INT*
iwork, MKL_INT* liwork, MKL_INT* info);
Include Files
• mkl_scalapack.h
Description
p?heevr computes selected eigenvalues and, optionally, eigenvectors of a complex Hermitian matrix A
distributed in 2D block-cyclic format by calling the recommended sequence of ScaLAPACK functions.
First, the matrix A is reduced to complex Hermitian tridiagonal form. Then, the eigenproblem is solved using
the parallel MRRR algorithm. Last, if eigenvectors have been computed, a backtransformation is done.
Upon successful completion, each processor stores a copy of all computed eigenvalues in w. The eigenvector
matrix Z is stored in 2D block-cyclic format distributed over all processors.
Input Parameters
jobz (global)
Specifies whether or not to compute the eigenvectors:
= 'N': Compute eigenvalues only.
= 'V': Compute eigenvalues and eigenvectors.
range (global)
= 'A': all eigenvalues will be found.
= 'V': all eigenvalues in the interval [vl,vu] will be found.
uplo (global)
Specifies whether the upper or lower triangular part of the Hermitian matrix
A is stored:
= 'U': Upper triangular
= 'L': Lower triangular
n (global)
ia (global)
Global row index in the global matrix A that points to the beginning of the
submatrix which is to be operated on. It should be set to 1 when operating
on a full matrix.
ja (global)
Global column index in the global matrix A that points to the beginning of
the submatrix which is to be operated on. It should be set to 1 when
operating on a full matrix.
desca (global and local) array of size dlen_. (The ScaLAPACK descriptor length is
dlen_ = 9.)
The array descriptor for the distributed matrix a. The descriptor stores
details about the 2D block-cyclic storage, see the notes below. If desca is
incorrect, p?heevr cannot work correctly.
vl (global)
If range='V', the lower bound of the interval to be searched for
eigenvalues. Not referenced if range = 'A' or 'I'.
vu (global)
If range='V', the upper bound of the interval to be searched for
eigenvalues. Not referenced if range = 'A' or 'I'.
il (global )
If range='I', the index (from smallest to largest) of the smallest eigenvalue
to be returned. il≥ 1.
iu (global )
If range='I', the index (from smallest to largest) of the largest eigenvalue
to be returned. min(il,n) ≤iu≤n.
iz (global )
Global row index in the global matrix Z that points to the beginning of the
submatrix which is to be operated on. It should be set to 1 when operating
on a full matrix.
jz (global )
Global column index in the global matrix Z that points to the beginning of
the submatrix which is to be operated on. It should be set to 1 when
operating on a full matrix.
lwork (local )
Size of work array, must be at least 3.
ictxt = desca[ctxt_ - 1]
lrwork (local )
Size of rwork, must be at least 3.
NOTE
iceil(x,y) is the ceiling of x/y.
Variable definitions:
neig = number of eigenvectors requested
nb = desca[ mb_ - 1] = desca[ nb_ - 1] = descz[ mb_ - 1] = descz[nb_
- 1]
nn = max( n, nb, 2 )
liwork (local )
size of iwork
Output Parameters
a The lower triangle (if uplo='L') or the upper triangle (if uplo='U') of a,
including the diagonal, is destroyed.
m (global )
Total number of eigenvalues found. 0 ≤m≤n.
nz (global )
Total number of eigenvectors computed. 0 ≤nz≤m.
If jobz = 'V', nz = m
If jobz = 'V', then on normal exit the first m columns of z contain the
orthonormal eigenvectors of the matrix corresponding to the selected
eigenvalues.
If jobz = 'N', then z is not referenced.
info (global )
= 0: successful exit
< 0: If the i-th argument is an array and the j-th entry had an illegal value,
then info = -(i*100+j). If the i-th argument is a scalar and had an illegal
value, then info = -i.
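The negative info encoding can be unpacked with a few integer operations. The sketch below is only an illustration: the type info_location and the function decode_negative_info are hypothetical names, and the code assumes that scalar argument positions are below 100 so that the two encodings cannot collide.

```c
#include <assert.h>

/* Position of the offending argument (1-based) and, for array arguments,
   the 1-based entry within that array. entry is 0 for scalar arguments. */
typedef struct {
    int arg;
    int entry;
} info_location;

/* Decode info < 0 according to info = -(i*100+j) or info = -i. */
static info_location decode_negative_info(int info)
{
    info_location loc = {0, 0};
    int code;

    assert(info < 0);
    code = -info;
    if (code >= 100) {          /* array argument: i*100 + j */
        loc.arg = code / 100;
        loc.entry = code % 100;
    } else {                    /* scalar argument: i */
        loc.arg = code;
    }
    return loc;
}
```

For example, info = -802 would point at entry 2 of the eighth argument, while info = -4 would point at the fourth (scalar) argument.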
Application Notes
The distributed submatrices a(ia:*, ja:*) and z(iz:iz+m-1,jz:jz+n-1) must satisfy the following
alignment properties:
NOTE
mod(x,y) is the integer remainder of x/y.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?heevx
Computes selected eigenvalues and, optionally,
eigenvectors of a Hermitian matrix.
Syntax
void pcheevx (char *jobz , char *range , char *uplo , MKL_INT *n , MKL_Complex8 *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *vl , float *vu , MKL_INT *il ,
MKL_INT *iu , float *abstol , MKL_INT *m , MKL_INT *nz , float *w , float *orfac ,
MKL_Complex8 *z , MKL_INT *iz , MKL_INT *jz , MKL_INT *descz , MKL_Complex8 *work ,
MKL_INT *lwork , float *rwork , MKL_INT *lrwork , MKL_INT *iwork , MKL_INT *liwork ,
MKL_INT *ifail , MKL_INT *iclustr , float *gap , MKL_INT *info );
void pzheevx (char *jobz , char *range , char *uplo , MKL_INT *n , MKL_Complex16 *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *vl , double *vu , MKL_INT *il ,
MKL_INT *iu , double *abstol , MKL_INT *m , MKL_INT *nz , double *w , double *orfac ,
MKL_Complex16 *z , MKL_INT *iz , MKL_INT *jz , MKL_INT *descz , MKL_Complex16 *work ,
MKL_INT *lwork , double *rwork , MKL_INT *lrwork , MKL_INT *iwork , MKL_INT *liwork ,
MKL_INT *ifail , MKL_INT *iclustr , double *gap , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?heevx function computes selected eigenvalues and, optionally, eigenvectors of a complex Hermitian
matrix A by calling the recommended sequence of ScaLAPACK functions. Eigenvalues and eigenvectors can
be selected by specifying either a range of values or a range of indices for the desired eigenvalues.
Input Parameters
np = the number of rows local to a given process.
nq = the number of columns local to a given process.
If range = 'V', all eigenvalues in the half-open interval [vl, vu] will be
found.
If range = 'I', the eigenvalues with indices il through iu will be found.
Specifies whether the upper or lower triangular part of the Hermitian matrix
A is stored:
If uplo = 'U', a stores the upper triangular part of A.
If uplo = 'L', a stores the lower triangular part of A.
n (global) The number of rows and columns of the matrix A (n ≥ 0).
a (local).
Block cyclic array of global size n*n and local size lld_a*LOCc(ja+n-1).
On entry, the Hermitian matrix A.
If uplo = 'U', only the upper triangular part of A is used to define the
elements of the Hermitian matrix.
If uplo = 'L', only the lower triangular part of A is used to define the
elements of the Hermitian matrix.
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A. If desca[ctxt_ - 1] is incorrect, p?heevx cannot
guarantee correct error reporting.
vl, vu (global)
If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues; not referenced if range = 'A' or 'I'.
il, iu (global)
If range ='I', the indices of the smallest and largest eigenvalues to be
returned.
Constraints:
il ≥ 1; min(il,n) ≤ iu ≤ n.
Not referenced if range = 'A' or 'V'.
abstol (global).
If jobz='V', setting abstol to p?lamch(context, 'U') yields the most
orthogonal eigenvectors.
The absolute error tolerance for the eigenvalues. An approximate
eigenvalue is accepted as converged when it is determined to lie in an
interval [a, b] of width less than or equal to abstol+eps*max(|a|,|b|),
where eps is the machine precision. If abstol is less than or equal to zero,
then eps*norm(T) will be used in its place, where norm(T) is the 1-norm
of the tridiagonal matrix obtained by reducing A to tridiagonal form.
Eigenvalues are computed most accurately when abstol is set to twice the
underflow threshold 2*p?lamch('S'), not zero. If this function returns
with ((mod(info,2)≠0).or.(mod(info/8,2)≠0)), indicating that some
eigenvalues or eigenvectors did not converge, try setting abstol to
2*p?lamch('S').
NOTE
mod(x,y) is the integer remainder of x/y.
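The two conditions described for abstol can be written out directly. This is a minimal sketch with hypothetical helper names: convergence_failed applies the mod(info,2) and mod(info/8,2) tests to a non-negative info, and interval_converged applies the acceptance rule width ≤ abstol + eps*max(|a|,|b|). The caller is assumed to pass the effective abstol (that is, eps*norm(T) already substituted when the user value is not positive).

```c
#include <assert.h>

static double dabs(double x) { return x < 0.0 ? -x : x; }

/* Nonzero if info (>= 0) signals that some eigenvalues or eigenvectors
   did not converge: mod(info,2) != 0 or mod(info/8,2) != 0. */
static int convergence_failed(int info)
{
    assert(info >= 0);
    return (info % 2 != 0) || ((info / 8) % 2 != 0);
}

/* Nonzero if the bracketing interval [a, b] is narrow enough for the
   eigenvalue to be accepted as converged. */
static int interval_converged(double a, double b, double abstol, double eps)
{
    double scale;

    assert(a <= b);
    scale = dabs(a) > dabs(b) ? dabs(a) : dabs(b);
    return (b - a) <= abstol + eps * scale;
}
```

A caller could, for instance, retry with abstol set to 2*p?lamch('S') whenever convergence_failed(info) is nonzero, as the text above suggests.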
orfac (global).
iz, jz (global) The row and column indices in the global matrix Z indicating the
first row and the first column of the submatrix Z, respectively.
descz (global and local) array of size dlen_. The array descriptor for the
distributed matrix Z. descz[ctxt_ - 1] must equal desca[ctxt_ - 1].
work (local).
Array of size lwork.
rwork (local)
Workspace array of size lrwork.
If eigenvectors are requested (jobz = 'V'), then the amount of workspace
required to guarantee that all eigenvectors are computed is:
lrwork≥ 4*n + max(5*nn, np0*mq0+2*nb*nb) + iceil(neig,
NPROW*NPCOL)*nn
The computed eigenvectors may not be orthogonal if the minimal
workspace is supplied and orfac is too small. If you want to guarantee
orthogonality (at the cost of potentially poor performance) you should add
the following values to lrwork:
(clustersize-1)*n,
where clustersize is the number of eigenvalues in the largest cluster, where
a cluster is defined as a set of close eigenvalues:
{w[k - 1],..., w[k+clustersize-2]|w[j] ≤
w[j-1]+orfac*2*norm(A)}.
Variable definitions:
neig = number of eigenvectors requested;
nb = desca[mb_ - 1] = desca[nb_ - 1] = descz[mb_ - 1] =
descz[nb_ - 1];
nn = max(n, nb, 2);
desca[rsrc_ - 1] = desca[csrc_ - 1] = descz[rsrc_ - 1] =
descz[csrc_ - 1] = 0;
np0 = numroc(nn, nb, 0, 0, NPROW);
mq0 = numroc(max(neig, nb, 2), nb, 0, 0, NPCOL);
iceil(x, y) is a ScaLAPACK function returning ceiling(x/y)
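The variable definitions above can be combined into a small calculator for the lrwork bound that applies when eigenvectors are requested. This is a sketch under stated assumptions: numroc_sketch and iceil_sketch are hypothetical stand-ins that reproduce the usual numroc and iceil arithmetic without linking ScaLAPACK, heevx_min_lrwork is a hypothetical name, and plain int/long replace MKL_INT. In real code the library's own numroc would normally be called, or the routine itself would be asked for its workspace requirement.

```c
#include <assert.h>

static int imax(int a, int b) { return a > b ? a : b; }

/* ceiling(x/y) for non-negative x and positive y */
static int iceil_sketch(int x, int y)
{
    assert(x >= 0 && y > 0);
    return (x + y - 1) / y;
}

/* Number of rows or columns of an n-long dimension, split into blocks of
   size nb, that process iproc owns when the first block sits on isrcproc. */
static int numroc_sketch(int n, int nb, int iproc, int isrcproc, int nprocs)
{
    int mydist, nblocks, count, extrablks;

    assert(n >= 0 && nb >= 1 && nprocs >= 1);
    mydist = (nprocs + iproc - isrcproc) % nprocs;
    nblocks = n / nb;                    /* whole blocks */
    count = (nblocks / nprocs) * nb;     /* blocks every process gets */
    extrablks = nblocks % nprocs;
    if (mydist < extrablks)
        count += nb;                     /* one extra whole block */
    else if (mydist == extrablks)
        count += n % nb;                 /* the trailing partial block */
    return count;
}

/* lrwork >= 4*n + max(5*nn, np0*mq0 + 2*nb*nb) + iceil(neig, NPROW*NPCOL)*nn */
static long heevx_min_lrwork(int n, int nb, int neig, int nprow, int npcol)
{
    int nn = imax(imax(n, nb), 2);
    int np0 = numroc_sketch(nn, nb, 0, 0, nprow);
    int mq0 = numroc_sketch(imax(imax(neig, nb), 2), nb, 0, 0, npcol);
    long t1 = 5L * nn;
    long t2 = (long)np0 * mq0 + 2L * nb * nb;

    return 4L * n + (t1 > t2 ? t1 : t2)
         + (long)iceil_sketch(neig, nprow * npcol) * nn;
}
```

For n = neig = 1000, nb = 64 and a 2-by-2 process grid, np0 = mq0 = 512 and the bound evaluates to 524336 elements. Adding (clustersize-1)*n on top of this corresponds to the orthogonality guarantee described earlier.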
When lrwork is too small:
If lwork is too small to guarantee orthogonality, p?heevx attempts to
maintain orthogonality in the clusters with the smallest spacing between the
eigenvalues. If lwork is too small to compute all the eigenvectors requested,
no computation is performed and info= -23 is returned. Note that when
range='V', p?heevx does not know how many eigenvectors are requested
until the eigenvalues are computed. Therefore, when range='V' and as
long as lwork is large enough to allow p?heevx to compute the eigenvalues,
p?heevx will compute the eigenvalues and as many eigenvectors as it can.
Relationship between workspace, orthogonality and performance:
If clustersize ≥ n/sqrt(NPROW*NPCOL), then providing enough space
to compute all the eigenvectors orthogonally will cause serious degradation
in performance. In the limit (that is, clustersize = n-1) p?stein will
perform no better than ?stein on 1 processor.
liwork ≥ 6*nnp
Where: nnp = max(n, NPROW*NPCOL+1, 4)
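The integer workspace bound is simple enough to compute inline. The function name heevx_min_liwork is hypothetical, and plain int is used in place of MKL_INT for brevity.

```c
#include <assert.h>

/* liwork >= 6*nnp, where nnp = max(n, NPROW*NPCOL + 1, 4) */
static int heevx_min_liwork(int n, int nprow, int npcol)
{
    int nnp;

    assert(n >= 0 && nprow >= 1 && npcol >= 1);
    nnp = n;
    if (nprow * npcol + 1 > nnp)
        nnp = nprow * npcol + 1;
    if (nnp < 4)
        nnp = 4;
    return 6 * nnp;
}
```

For small matrices on large process grids the NPROW*NPCOL+1 term dominates: n = 3 on a 4-by-4 grid gives nnp = 17 and a bound of 102.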
Output Parameters
a On exit, the lower triangle (if uplo = 'L'), or the upper triangle (if uplo =
'U') of A, including the diagonal, is overwritten.
w (global).
Array of size n. The first m elements contain the selected eigenvalues in
ascending order.
z (local).
Array, global size n*n, local size lld_z*LOCc(jz+n-1).
If jobz ='V', then on normal exit the first m columns of z contain the
orthonormal eigenvectors of the matrix corresponding to the selected
eigenvalues. If an eigenvector fails to converge, then that column of z
contains the latest approximation to the eigenvector, and the index of the
eigenvector is returned in ifail.
If jobz = 'N', then z is not referenced.
rwork (local).
Array of size lrwork. On return, rwork[0] contains the optimal amount of
workspace required for efficient execution.
If jobz = 'N', rwork[0] = optimal amount of workspace required to compute
eigenvalues efficiently.
If jobz = 'V', rwork[0] = optimal amount of workspace required to compute
eigenvalues and eigenvectors efficiently with no guarantee on orthogonality.
If range='V', it is assumed that all eigenvectors may be required.
iwork[0] (local)
On return, iwork[0] contains the amount of integer workspace required.
ifail (global)
Array of size n.
If jobz ='V', then on normal exit, the first m elements of ifail are zero. If
(mod(info,2)≠0) on exit, then ifail contains the indices of the eigenvectors
that failed to converge.
If jobz = 'N', then ifail is not referenced.
iclustr (global)
Array of size 2*NPROW*NPCOL.
gap (global)
Array of size (NPROW*NPCOL)
This array contains the gap between eigenvalues whose eigenvectors could
not be reorthogonalized. The output values in this array correspond to the
clusters indicated by the array iclustr. As a result, the dot product between
eigenvectors corresponding to the i-th cluster may be as high as
(C*n)/gap[i - 1], where C is a small constant.
info (global)
If info = 0, the execution is successful.
If info < 0:
If the i-th argument is an array and the j-th entry had an illegal value, then
info = -(i*100+j). If the i-th argument is a scalar and had an illegal
value, then info = -i.
If info > 0:
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?gesvd
Computes the singular value decomposition of a
general matrix, optionally computing the left and/or
right singular vectors.
Syntax
void psgesvd (char *jobu , char *jobvt , MKL_INT *m , MKL_INT *n , float *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , float *s , float *u , MKL_INT *iu , MKL_INT *ju ,
MKL_INT *descu , float *vt , MKL_INT *ivt , MKL_INT *jvt , MKL_INT *descvt , float
*work , MKL_INT *lwork , float *rwork , MKL_INT *info );
void pdgesvd (char *jobu , char *jobvt , MKL_INT *m , MKL_INT *n , double *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , double *s , double *u , MKL_INT *iu , MKL_INT *ju ,
MKL_INT *descu , double *vt , MKL_INT *ivt , MKL_INT *jvt , MKL_INT *descvt , double
*work , MKL_INT *lwork , double *rwork , MKL_INT *info );
void pcgesvd (char *jobu , char *jobvt , MKL_INT *m , MKL_INT *n , MKL_Complex8 *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *s , MKL_Complex8 *u , MKL_INT *iu ,
MKL_INT *ju , MKL_INT *descu , MKL_Complex8 *vt , MKL_INT *ivt , MKL_INT *jvt , MKL_INT
*descvt , MKL_Complex8 *work , MKL_INT *lwork , float *rwork , MKL_INT *info );
void pzgesvd (char *jobu , char *jobvt , MKL_INT *m , MKL_INT *n , MKL_Complex16 *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *s , MKL_Complex16 *u , MKL_INT
*iu , MKL_INT *ju , MKL_INT *descu , MKL_Complex16 *vt , MKL_INT *ivt , MKL_INT *jvt ,
MKL_INT *descvt , MKL_Complex16 *work , MKL_INT *lwork , double *rwork , MKL_INT
*info );
Include Files
• mkl_scalapack.h
Description
The p?gesvd function computes the singular value decomposition (SVD) of an m-by-n matrix A, optionally
computing the left and/or right singular vectors. The SVD is written
A = U*Σ*V^T,
where Σ is an m-by-n matrix that is zero except for its min(m, n) diagonal elements, U is an m-by-m
orthogonal matrix, and V is an n-by-n orthogonal matrix. The diagonal elements of Σ are the singular values
of A, and the columns of U and V are the corresponding left and right singular vectors, respectively. The
singular values are returned in array s in decreasing order, and only the first min(m,n) columns of U and rows
of vt = V^T are computed.
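To make the shapes in the factorization concrete, the following serial sketch rebuilds A from the reduced factors: u with min(m,n) columns, s with min(m,n) entries, and vt with min(m,n) rows. It is not distributed and does not call p?gesvd; svd_reconstruct is a hypothetical helper, and column-major storage with leading dimensions equal to the row counts is an assumption made for the illustration.

```c
#include <assert.h>

/* Compute a = u * diag(s) * vt for column-major arrays:
   u is m-by-k, s has k entries, vt is k-by-n, a is m-by-n, k = min(m, n). */
static void svd_reconstruct(int m, int n, const double *u, const double *s,
                            const double *vt, double *a)
{
    int i, j, p, k;

    assert(m >= 0 && n >= 0);
    k = m < n ? m : n;
    for (j = 0; j < n; ++j) {
        for (i = 0; i < m; ++i) {
            double sum = 0.0;
            for (p = 0; p < k; ++p)
                sum += u[i + p * m] * s[p] * vt[p + j * k];
            a[i + j * m] = sum;
        }
    }
}
```

Such a reconstruction can serve as a sanity check on small test problems, for example by comparing the result against the original matrix after gathering the distributed factors.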
NOTE
The distributed submatrix sub(A) must satisfy certain alignment properties. These
expressions must be true:
• mb_a = nb_a = nb
• iroffa = icoffa
where:
• iroffa = mod(ia-1, nb )
• icoffa = mod(ja-1, nb )
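The alignment conditions in the note can be checked before the call. In this sketch, gesvd_alignment_ok is a hypothetical helper, MB_IDX and NB_IDX are illustrative names for the zero-based positions desca[mb_ - 1] and desca[nb_ - 1] (with mb_ = 5 and nb_ = 6 in the standard ScaLAPACK descriptor), and int stands in for MKL_INT.

```c
#include <assert.h>

/* Zero-based positions of the row and column block sizes in a descriptor. */
enum { MB_IDX = 4, NB_IDX = 5 };

/* Nonzero if mb_a = nb_a and mod(ia-1, nb) = mod(ja-1, nb). */
static int gesvd_alignment_ok(const int *desca, int ia, int ja)
{
    int mb, nb, iroffa, icoffa;

    assert(desca != 0 && ia >= 1 && ja >= 1);
    mb = desca[MB_IDX];
    nb = desca[NB_IDX];
    if (mb <= 0 || mb != nb)
        return 0;               /* square blocking is required */
    iroffa = (ia - 1) % nb;     /* row offset inside the first block */
    icoffa = (ja - 1) % nb;     /* column offset inside the first block */
    return iroffa == icoffa;
}
```

When operating on a full matrix with ia = ja = 1, both offsets are zero and only the square-block condition matters.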
Input Parameters
mp = number of local rows in A and U
nq = number of local columns in A and VT
size = min(m, n)
sizeq = number of local columns in U
sizep = number of local rows in VT
jobu (global) Specifies options for computing all or part of the matrix U.
If jobu = 'V', the first size columns of U (the left singular vectors) are
returned in the array u;
If jobu = 'N', no columns of U (no left singular vectors) are computed.
jobvt (global)
Specifies options for computing all or part of the matrix VT.
If jobvt = 'V', the first size rows of VT (the right singular vectors) are
returned in the array vt;
If jobvt = 'N', no rows of VT (no right singular vectors) are computed.
a (local).
Block cyclic array, global size (m, n), local size (mp, nq).
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
iu, ju (global) The row and column indices in the global matrix U indicating the
first row and the first column of the submatrix U, respectively.
descu (global and local) array of size dlen_. The array descriptor for the
distributed matrix U.
ivt, jvt (global) The row and column indices in the global matrix VT indicating the
first row and the first column of the submatrix VT, respectively.
descvt (global and local) array of size dlen_. The array descriptor for the
distributed matrix VT.
work (local).
Workspace array of size lwork
wantu(wantvt) = 1, if left/right singular vectors are wanted, and
wantu(wantvt) = 0, otherwise. w?bdsqr, wp?ormbrqln, and wp?ormbrprt
refer respectively to the workspace required for the subprograms ?bdsqr,
p?ormbr(qln), and p?ormbr(prt), where qln and prt are the values of the
arguments vect, side, and trans in the call to p?ormbr. nru is equal to the
local number of rows of the matrix U when distributed across a 1-dimensional
"column" of processes. Analogously, ncvt is equal to the local number of
columns of the matrix VT when distributed across a 1-dimensional "row" of
processes. Calling the LAPACK procedure ?bdsqr requires
rwork Workspace array of size 1 + 4*sizeb. Not used for psgesvd and pdgesvd.
Output Parameters
s (global).
Array of size size.
Contains the singular values of A sorted so that s(i) ≥ s(i+1).
u (local).
Local size mp*sizeq, global size m*size.
If jobu = 'V', u contains the first min(m, n) columns of U.
vt (local).
local size (sizep, nq), global size (size, n)
If jobvt = 'V', vt contains the first size rows of VT. If jobvt = 'N', vt is
not referenced.
work On exit, if info = 0, then work[0] returns the required minimal size of
lwork.
rwork On exit, if info = 0, then rwork[0] returns the required size of rwork.
info (global)
If info = 0, the execution is successful.
If info < 0: if info = -i, the i-th parameter had an illegal value.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?sygvx
Computes selected eigenvalues and, optionally,
eigenvectors of a real generalized symmetric definite
eigenproblem.
Syntax
void pssygvx (MKL_INT *ibtype , char *jobz , char *range , char *uplo , MKL_INT *n ,
float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT
*jb , MKL_INT *descb , float *vl , float *vu , MKL_INT *il , MKL_INT *iu , float
*abstol , MKL_INT *m , MKL_INT *nz , float *w , float *orfac , float *z , MKL_INT *iz ,
MKL_INT *jz , MKL_INT *descz , float *work , MKL_INT *lwork , MKL_INT *iwork , MKL_INT
*liwork , MKL_INT *ifail , MKL_INT *iclustr , float *gap , MKL_INT *info );
void pdsygvx (MKL_INT *ibtype , char *jobz , char *range , char *uplo , MKL_INT *n ,
double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib ,
MKL_INT *jb , MKL_INT *descb , double *vl , double *vu , MKL_INT *il , MKL_INT *iu ,
double *abstol , MKL_INT *m , MKL_INT *nz , double *w , double *orfac , double *z ,
MKL_INT *iz , MKL_INT *jz , MKL_INT *descz , double *work , MKL_INT *lwork , MKL_INT
*iwork , MKL_INT *liwork , MKL_INT *ifail , MKL_INT *iclustr , double *gap , MKL_INT
*info );
Include Files
• mkl_scalapack.h
Description
The p?sygvx function computes all the eigenvalues, and optionally, the eigenvectors of a real generalized
symmetric-definite eigenproblem, of the form
sub(A)*x = λ*sub(B)*x, sub(A)*sub(B)*x = λ*x, or sub(B)*sub(A)*x = λ*x.
Here x denotes the eigenvectors and λ (lambda) denotes the eigenvalues. sub(A), denoting A(ia:ia+n-1, ja:ja+n-1),
is assumed to be symmetric, and sub(B), denoting B(ib:ib+n-1, jb:jb+n-1), is assumed to be symmetric and positive definite.
Input Parameters
If ibtype = 3, the problem type is sub(B)*sub(A)*x = lambda*x.
If uplo = 'U', arrays a and b store the upper triangles of sub(A) and sub
(B);
If uplo = 'L', arrays a and b store the lower triangles of sub(A) and sub
(B).
a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1). On
entry, this array contains the local pieces of the n-by-n symmetric
distributed matrix sub(A).
If uplo = 'U', the leading n-by-n upper triangular part of sub(A) contains
the upper triangular part of the matrix.
If uplo = 'L', the leading n-by-n lower triangular part of sub(A) contains
the lower triangular part of the matrix.
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A. If desca[ctxt_ - 1] is incorrect, p?sygvx cannot
guarantee correct error reporting.
b (local).
Pointer into the local memory to an array of size lld_b*LOCc(jb+n-1). On
entry, this array contains the local pieces of the n-by-n symmetric
distributed matrix sub(B).
If uplo = 'U', the leading n-by-n upper triangular part of sub(B) contains
the upper triangular part of the matrix.
If uplo = 'L', the leading n-by-n lower triangular part of sub(B) contains
the lower triangular part of the matrix.
ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of the submatrix B, respectively.
descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B. descb[ctxt_ - 1] must be equal to desca[ctxt_ -
1].
vl, vu (global)
If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
If range = 'A' or 'I', vl and vu are not referenced.
il, iu (global)
If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned. Constraint: il ≥ 1, min(il, n) ≤ iu ≤ n.
abstol (global)
If jobz='V', setting abstol to p?lamch(context, 'U') yields the most
orthogonal eigenvectors.
The absolute error tolerance for the eigenvalues. An approximate
eigenvalue is accepted as converged when it is determined to lie in an
interval [a,b] of width less than or equal to
abstol + eps*max(|a|,|b|),
where eps is the machine precision. If abstol is less than or equal to zero,
then eps*norm(T) will be used in its place, where norm(T) is the 1-norm
of the tridiagonal matrix obtained by reducing A to tridiagonal form.
Eigenvalues will be computed most accurately when abstol is set to twice
the underflow threshold 2*p?lamch('S'), not zero. If this function returns
with ((mod(info,2)≠0) or (mod(info/8,2)≠0)), indicating that some
eigenvalues or eigenvectors did not converge, try setting abstol to
2*p?lamch('S').
NOTE
mod(x,y) is the integer remainder of x/y.
orfac (global).
Specifies which eigenvectors should be reorthogonalized. Eigenvectors that
correspond to eigenvalues which are within tol=orfac*norm(A) of each
other are to be reorthogonalized. However, if the workspace is insufficient
(see lwork), tol may be decreased until all eigenvectors to be
reorthogonalized can be stored in one process. No reorthogonalization will
be done if orfac equals zero. A default value of 1.0e-3 is used if orfac is
negative. orfac should be identical on all processes.
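The orfac rules translate into a short helper. This sketch assumes norm_a holds the value of norm(A) referred to in the text; reorth_tolerance is a hypothetical name and the routine's internal computation may differ in detail.

```c
#include <assert.h>

/* Threshold tol = orfac*norm(A) below which neighbouring eigenvalues have
   their eigenvectors reorthogonalized. A negative orfac selects the default
   1.0e-3; orfac == 0 yields tol == 0, that is, no reorthogonalization. */
static double reorth_tolerance(double orfac, double norm_a)
{
    double effective;

    assert(norm_a >= 0.0);
    effective = (orfac < 0.0) ? 1.0e-3 : orfac;
    return effective * norm_a;
}
```

Because orfac should be identical on all processes, it is safest to compute or broadcast it once rather than derive it from process-local data.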
iz, jz (global) The row and column indices in the global matrix Z indicating the
first row and the first column of the submatrix Z, respectively.
descz (global and local) array of size dlen_. The array descriptor for the
distributed matrix Z. descz[ctxt_ - 1] must equal desca[ctxt_ - 1].
work (local)
Workspace array of size lwork
lwork (local)
Size of the array work. See below for definitions of variables used to define
lwork.
If no eigenvectors are requested (jobz = 'N'), then lwork ≥ 5*n +
max(5*nn, NB*(np0 + 1)).
If eigenvectors are requested (jobz = 'V'), then the amount of workspace
required to guarantee that all eigenvectors are computed is:
lwork ≥ 5*n + max(5*nn, np0*mq0 + 2*nb*nb) + iceil(neig,
NPROW*NPCOL)*nn.
The computed eigenvectors may not be orthogonal if the minimal
workspace is supplied and orfac is too small. If you want to guarantee
orthogonality at the cost of potentially poor performance you should add
the following to lwork:
(clustersize-1)*n,
where clustersize is the number of eigenvalues in the largest cluster, where
a cluster is defined as a set of close eigenvalues:
{w[k - 1],..., w[k+clustersize - 2]|w[j] ≤ w[j - 1] +
orfac*2*norm(A)}
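Given eigenvalues in ascending order, the cluster definition above amounts to finding the longest run in which each value lies within orfac*2*norm(A) of its predecessor. The sketch below is illustrative only: largest_cluster and extra_workspace are hypothetical names, and in practice the eigenvalues are not known before the call, so the estimate would typically come from a previous run or from knowledge of the spectrum.

```c
#include <assert.h>

/* Size of the largest cluster in w[0..m-1] (ascending), where consecutive
   members satisfy w[j] <= w[j-1] + orfac*2*norm_a. */
static int largest_cluster(const double *w, int m, double orfac, double norm_a)
{
    double thresh;
    int j, run = 1, best = 1;

    assert(m >= 0 && (w != 0 || m == 0));
    if (m == 0)
        return 0;
    thresh = orfac * 2.0 * norm_a;
    for (j = 1; j < m; ++j) {
        run = (w[j] <= w[j - 1] + thresh) ? run + 1 : 1;
        if (run > best)
            best = run;
    }
    return best;
}

/* Additional workspace (clustersize-1)*n that guarantees orthogonality. */
static long extra_workspace(int clustersize, int n)
{
    assert(clustersize >= 1 && n >= 0);
    return (long)(clustersize - 1) * n;
}
```

A cluster of three eigenvalues in a problem of order n = 1000 would therefore add 2000 elements to the minimal workspace.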
Variable definitions:
neig = number of eigenvectors requested,
nb = desca[mb_ - 1] = desca[nb_ - 1] = descz[mb_ - 1] =
descz[nb_ - 1],
nn = max(n, nb, 2),
desca[rsrc_ - 1] = desca[csrc_ - 1] = descz[rsrc_ - 1] =
descz[csrc_ - 1] = 0,
np0 = numroc(nn, nb, 0, 0, NPROW),
Output Parameters
a On exit,
If jobz = 'V', and if info = 0, sub(A) contains the distributed matrix Z
of eigenvectors. The eigenvectors are normalized as follows:
for ibtype = 1 or 2, Z^T*sub(B)*Z = I;
If jobz = 'N', then on exit the upper triangle (if uplo='U') or the lower
triangle (if uplo='L') of sub(A), including the diagonal, is destroyed.
nz (global)
Total number of eigenvectors computed. 0 ≤ nz ≤ m. The number of
columns of z that are filled.
If jobz ≠ 'V', nz is not referenced.
w (global)
Array of size n. On normal exit, the first m entries contain the selected
eigenvalues in ascending order.
z (local).
If jobz = 'V', then on normal exit the first m columns of z contain the
orthonormal eigenvectors of the matrix corresponding to the selected
eigenvalues. If an eigenvector fails to converge, then that column of z
contains the latest approximation to the eigenvector, and the index of the
eigenvector is returned in ifail.
If jobz = 'N', then z is not referenced.
ifail (global)
Array of size n.
iclustr (global)
Array of size (2*NPROW*NPCOL). This array contains indices of eigenvectors
corresponding to a cluster of eigenvalues that could not be reorthogonalized
due to insufficient workspace (see lwork, orfac and info). Eigenvectors
corresponding to clusters of eigenvalues indexed iclustr[2*i - 2] to
iclustr[2*i - 1], could not be reorthogonalized due to lack of
workspace. Hence the eigenvectors corresponding to these clusters may not
be orthogonal. iclustr is a zero terminated array.
gap (global)
Array of size NPROW*NPCOL. This array contains the gap between
eigenvalues whose eigenvectors could not be reorthogonalized. The output
values in this array correspond to the clusters indicated by the array iclustr.
As a result, the dot product between eigenvectors corresponding to the i-th
cluster may be as high as (C*n)/gap[i - 1], where C is a small constant.
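After the call, iclustr and gap can be scanned together to see which clusters were left without reorthogonalization. The sketch assumes the layout described above: pairs of 1-based eigenvector indices stored at iclustr[2*i - 2] and iclustr[2*i - 1], a zero terminating the list, and at most NPROW*NPCOL pairs. The function names are hypothetical, and int stands in for MKL_INT.

```c
#include <assert.h>

/* Number of clusters recorded in the zero-terminated iclustr array.
   nprocs is NPROW*NPCOL, so iclustr holds at most nprocs index pairs. */
static int count_clusters(const int *iclustr, int nprocs)
{
    int count = 0;

    assert(iclustr != 0 && nprocs >= 1);
    while (count < nprocs && iclustr[2 * count] != 0)
        ++count;
    return count;
}

/* Smallest reported gap among the first nclusters entries of gap; the
   corresponding cluster has the weakest orthogonality bound (C*n)/gap. */
static double smallest_gap(const double *gap, int nclusters)
{
    double best;
    int i;

    assert(gap != 0 && nclusters >= 1);
    best = gap[0];
    for (i = 1; i < nclusters; ++i)
        if (gap[i] < best)
            best = gap[i];
    return best;
}
```

If count_clusters returns zero, no cluster was skipped and the gap array carries no information for this call.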
info (global)
If info = 0, the execution is successful.
If info < 0: if the i-th argument is an array and the j-th entry had an illegal
value, then info = -(i*100+j); if the i-th argument is a scalar and had
an illegal value, then info = -i.
If info > 0:
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?hegvx
Computes selected eigenvalues and, optionally,
eigenvectors of a complex generalized Hermitian
positive-definite eigenproblem.
Syntax
void pchegvx (MKL_INT *ibtype , char *jobz , char *range , char *uplo , MKL_INT *n ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b ,
MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , float *vl , float *vu , MKL_INT *il ,
MKL_INT *iu , float *abstol , MKL_INT *m , MKL_INT *nz , float *w , float *orfac ,
MKL_Complex8 *z , MKL_INT *iz , MKL_INT *jz , MKL_INT *descz , MKL_Complex8 *work ,
MKL_INT *lwork , float *rwork , MKL_INT *lrwork , MKL_INT *iwork , MKL_INT *liwork ,
MKL_INT *ifail , MKL_INT *iclustr , float *gap , MKL_INT *info );
void pzhegvx (MKL_INT *ibtype , char *jobz , char *range , char *uplo , MKL_INT *n ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b ,
MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , double *vl , double *vu , MKL_INT *il ,
MKL_INT *iu , double *abstol , MKL_INT *m , MKL_INT *nz , double *w , double *orfac ,
MKL_Complex16 *z , MKL_INT *iz , MKL_INT *jz , MKL_INT *descz , MKL_Complex16 *work ,
MKL_INT *lwork , double *rwork , MKL_INT *lrwork , MKL_INT *iwork , MKL_INT *liwork ,
MKL_INT *ifail , MKL_INT *iclustr , double *gap , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?hegvx function computes all the eigenvalues, and optionally, the eigenvectors of a complex
generalized Hermitian positive-definite eigenproblem, of the form
sub(A)*x = λ*sub(B)*x, sub(A)*sub(B)*x = λ*x, or sub(B)*sub(A)*x = λ*x.
Here sub(A), denoting A(ia:ia+n-1, ja:ja+n-1), and sub(B), denoting B(ib:ib+n-1, jb:jb+n-1), are assumed
to be Hermitian, and sub(B) is also positive definite.
Input Parameters
sub(A)*x = lambda*sub(B)*x;
If ibtype = 2, the problem type is
sub(A)*sub(B)*x = lambda*x;
If ibtype = 3, the problem type is
sub(B)*sub(A)*x = lambda*x.
If uplo = 'U', arrays a and b store the upper triangles of sub(A) and sub
(B);
If uplo = 'L', arrays a and b store the lower triangles of sub(A) and sub
(B).
n (global)
The order of the matrices sub(A) and sub(B) (n ≥ 0).
a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1). On
entry, this array contains the local pieces of the n-by-n Hermitian
distributed matrix sub(A). If uplo = 'U', the leading n-by-n upper
triangular part of sub(A) contains the upper triangular part of the matrix. If
uplo = 'L', the leading n-by-n lower triangular part of sub(A) contains
the lower triangular part of the matrix.
ia, ja (global)
The row and column indices in the global matrix A indicating the first row
and the first column of the submatrix A, respectively.
b (local).
Pointer into the local memory to an array of size lld_b*LOCc(jb+n-1). On
entry, this array contains the local pieces of the n-by-n Hermitian
distributed matrix sub(B).
If uplo = 'U', the leading n-by-n upper triangular part of sub(B) contains
the upper triangular part of the matrix.
If uplo = 'L', the leading n-by-n lower triangular part of sub(B) contains
the lower triangular part of the matrix.
ib, jb (global)
The row and column indices in the global matrix B indicating the first row
and the first column of the submatrix B, respectively.
descb (global and local) array of size dlen_.
The array descriptor for the distributed matrix B. descb[ctxt_ - 1] must
be equal to desca[ctxt_ - 1].
vl, vu (global)
If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
If range = 'A' or 'I', vl and vu are not referenced.
il, iu (global)
If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned. Constraint: il ≥ 1, min(il, n) ≤ iu ≤ n.
abstol (global)
If jobz='V', setting abstol to p?lamch(context, 'U') yields the most
orthogonal eigenvectors.
The absolute error tolerance for the eigenvalues. An approximate
eigenvalue is accepted as converged when it is determined to lie in an
interval [a,b] of width less than or equal to
abstol + eps*max(|a|,|b|),
where eps is the machine precision. If abstol is less than or equal to zero,
then eps*norm(T) will be used in its place, where norm(T) is the 1-norm of
the tridiagonal matrix obtained by reducing A to tridiagonal form.
Eigenvalues will be computed most accurately when abstol is set to twice
the underflow threshold 2*p?lamch('S'), not zero. If this function returns
with ((mod(info,2)≠0).or.(mod(info/8,2)≠0)), indicating that
some eigenvalues or eigenvectors did not converge, try setting abstol to
2*p?lamch('S').
NOTE
mod(x,y) is the integer remainder of x/y.
orfac (global).
Specifies which eigenvectors should be reorthogonalized. Eigenvectors that
correspond to eigenvalues which are within tol=orfac*norm(A) of each
other are to be reorthogonalized. However, if the workspace is insufficient
(see lwork), tol may be decreased until all eigenvectors to be
reorthogonalized can be stored in one process. No reorthogonalization will
be done if orfac equals zero. A default value of 1.0E-3 is used if orfac is
negative. orfac should be identical on all processes.
iz, jz (global) The row and column indices in the global matrix Z indicating the
first row and the first column of the submatrix Z, respectively.
descz (global and local) array of size dlen_. The array descriptor for the
distributed matrix Z. descz[ctxt_ - 1] must equal desca[ctxt_ - 1].
work (local)
Workspace array of size lwork
lwork (local).
The size of the array work.
If only eigenvalues are requested:
lwork ≥ n + max(NB*(np0 + 1), 3)
If eigenvectors are requested:
lwork ≥ n + (np0 + mq0 + NB)*NB
with nq0 = numroc(nn, NB, 0, 0, NPCOL).
rwork (local)
Workspace array of size lrwork.
The computed eigenvectors may not be orthogonal if the minimal
workspace is supplied and orfac is too small. If you want to guarantee
orthogonality (at the cost of potentially poor performance) you should add
the following value to lrwork:
(clustersize-1)*n,
where clustersize is the number of eigenvalues in the largest cluster, where
a cluster is defined as a set of close eigenvalues:
{w[k - 1],..., w[k+clustersize - 2] | w[j] ≤ w[j - 1]+orfac*2*norm(A)}
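The cluster definition above can be evaluated on an already computed, ascending list of eigenvalues. The following is a minimal serial sketch; the function names are hypothetical, and the caller is assumed to supply orfac and norm(A).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: size of the largest cluster in the ascending array w,
   where w[j] joins the cluster of w[j-1] if w[j] <= w[j-1] + orfac*2*norm(A). */
size_t largest_cluster(const double *w, size_t n, double orfac, double anorm)
{
    assert(w != NULL || n == 0);
    if (n == 0) {
        return 0;
    }
    double tol = orfac * 2.0 * anorm;
    size_t best = 1;
    size_t run = 1;
    for (size_t j = 1; j < n; ++j) {
        run = (w[j] <= w[j - 1] + tol) ? run + 1 : 1;
        if (run > best) {
            best = run;
        }
    }
    return best;
}

/* Extra workspace (clustersize-1)*n suggested in the text to guarantee
   orthogonality. */
size_t extra_workspace_for_orthogonality(size_t clustersize, size_t n)
{
    return clustersize > 0 ? (clustersize - 1) * n : 0;
}
```

With eigenvalues {1.0, 1.0005, 1.001, 2.0, 3.0, 3.0001}, orfac = 1.0e-3, and norm(A) = 1, the largest cluster has three members, so the extra workspace for n = 6 is 12 elements.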
Variable definitions:
neig = number of eigenvectors requested;
nb = desca[mb_ - 1] = desca[nb_ - 1] = descz[mb_ - 1] =
descz[nb_ - 1];
nn = max(n, nb, 2);
desca[rsrc_ - 1] = desca[nb_ - 1] = descz[rsrc_ - 1] =
descz[csrc_ - 1] = 0 ;
np0 = numroc(nn, nb, 0, 0, NPROW);
mq0 = numroc(max(neig, nb, 2), nb, 0, 0, NPCOL);
iceil(x, y) is a ScaLAPACK function returning ceiling(x/y).
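As an illustration of the variable definitions above, the sketch below evaluates the two lwork lower bounds. The names numroc_sketch and hegvx_min_lwork are hypothetical; numroc_sketch is a plain C reimplementation that is assumed to follow the usual block-cyclic counting of the ScaLAPACK numroc tool function, so prefer the library's own numroc in production code.

```c
#include <assert.h>

/* Assumed equivalent of numroc: number of rows or columns of a dimension of
   size n, split into blocks of nb, owned by process iproc when the first
   block lives on isrcproc and there are nprocs processes. */
int numroc_sketch(int n, int nb, int iproc, int isrcproc, int nprocs)
{
    assert(nb > 0 && nprocs > 0);
    int mydist = (nprocs + iproc - isrcproc) % nprocs;
    int nblocks = n / nb;
    int num = (nblocks / nprocs) * nb;
    int extrablks = nblocks % nprocs;
    if (mydist < extrablks) {
        num += nb;
    } else if (mydist == extrablks) {
        num += n % nb;
    }
    return num;
}

static int max_int(int a, int b)
{
    return a > b ? a : b;
}

/* Lower bound on lwork from the formulas in the text. */
int hegvx_min_lwork(int n, int nb, int neig, int nprow, int npcol,
                    int want_vectors)
{
    int nn = max_int(n, max_int(nb, 2));
    int np0 = numroc_sketch(nn, nb, 0, 0, nprow);
    if (!want_vectors) {
        return n + max_int(nb * (np0 + 1), 3);
    }
    int mq0 = numroc_sketch(max_int(neig, max_int(nb, 2)), nb, 0, 0, npcol);
    return n + (np0 + mq0 + nb) * nb;
}
```

For n = 100, nb = 8, neig = 100 on a 2-by-2 process grid, this gives np0 = mq0 = 52, a bound of 524 for eigenvalues only, and 996 when eigenvectors are requested.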
When lrwork is too small:
If lwork is too small to guarantee orthogonality, p?hegvx attempts to
maintain orthogonality in the clusters with the smallest spacing between the
eigenvalues.
If lwork is too small to compute all the eigenvectors requested, no
computation is performed and info= -25 is returned. Note that when
range='V', p?hegvx does not know how many eigenvectors are requested
until the eigenvalues are computed. Therefore, when range='V' and as
long as lwork is large enough to allow p?hegvx to compute the eigenvalues,
p?hegvx will compute the eigenvalues and as many eigenvectors as it can.
Relationship between workspace, orthogonality & performance:
If clustersize > n/sqrt(NPROW*NPCOL), then providing enough space
to compute all the eigenvectors orthogonally will cause serious degradation
in performance. In the limit (that is, clustersize = n-1) p?stein will
perform no better than ?stein on 1 processor.
Output Parameters
If jobz = 'N', then on exit the upper triangle (if uplo='U') or the lower
triangle (if uplo='L') of sub(A), including the diagonal, is destroyed.
w (global)
Array of size n. On normal exit, the first m entries contain the selected
eigenvalues in ascending order.
z (local).
global size n*n, local size lld_z*LOCc(jz+n-1).
If jobz = 'V', then on normal exit the first m columns of z contain the
orthonormal eigenvectors of the matrix corresponding to the selected
eigenvalues. If an eigenvector fails to converge, then that column of z
contains the latest approximation to the eigenvector, and the index of the
eigenvector is returned in ifail.
If jobz = 'N', then z is not referenced.
rwork On exit, rwork[0] contains the amount of workspace required for optimal
efficiency.
If jobz='N', rwork[0] = optimal amount of workspace required to compute
eigenvalues efficiently.
If jobz='V', rwork[0] = optimal amount of workspace required to compute
eigenvalues and eigenvectors efficiently with no guarantee on orthogonality.
If range='V', it is assumed that all eigenvectors may be required when
computing optimal workspace.
ifail (global)
Array of size n.
ifail provides additional information when info≠0.
iclustr (global)
Array of size (2*NPROW*NPCOL). This array contains indices of eigenvectors
corresponding to a cluster of eigenvalues that could not be reorthogonalized
due to insufficient workspace (see lwork, orfac and info). Eigenvectors
corresponding to clusters of eigenvalues indexed iclustr(2*i-1) to
iclustr(2*i), could not be reorthogonalized due to lack of workspace.
Hence the eigenvectors corresponding to these clusters may not be
orthogonal.
iclustr() is a zero-terminated array: (iclustr(2*k)≠0 .and.
iclustr(2*k+1)=0) if and only if k is the number of clusters.
iclustr is not referenced if jobz = 'N'.
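The zero-termination rule for iclustr can be read back in C as follows. This is a sketch with a hypothetical helper name; it uses int where real code would use MKL_INT, and it maps the one-based pair iclustr(2*i-1), iclustr(2*i) from the text to the C elements iclustr[2*(i-1)] and iclustr[2*(i-1)+1].

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of clusters recorded in iclustr, an array of
   2*nprow*npcol entries that is zero terminated unless it is completely full. */
int count_unorthogonalized_clusters(const int *iclustr, int nprow, int npcol)
{
    assert(iclustr != NULL && nprow > 0 && npcol > 0);
    int max_clusters = nprow * npcol;
    int k = 0;
    /* Cluster k+1 starts at C index 2*k; a zero there ends the list. */
    while (k < max_clusters && iclustr[2 * k] != 0) {
        ++k;
    }
    return k;
}
```

On a 2-by-2 grid, the contents {3, 5, 9, 12, 0, 0, 0, 0} describe two clusters, covering eigenvector indices 3 to 5 and 9 to 12.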
gap (global)
Array of size NPROW*NPCOL.
This array contains the gap between eigenvalues whose eigenvectors could
not be reorthogonalized. The output values in this array correspond to the
clusters indicated by the array iclustr. As a result, the dot product between
eigenvectors corresponding to the i-th cluster may be as high as (C*n)/
gap(i), where C is a small constant.
info (global)
If info = 0, the execution is successful.
If info < 0: if the i-th argument is an array and the j-th entry had an illegal
value, then info = -(i*100+j); if the i-th argument is a scalar and had
an illegal value, then info = -i.
If info> 0:
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?max1 c,z Finds the index of the element whose real part has maximum
absolute value (similar to the Level 1 PBLAS p?amax, but using the
absolute value to the real part).
pmpim2 s,d Computes the eigenpair range assignments for all processes.
?combamax1 c,z Finds the element with maximum real part absolute value and its
corresponding global index.
p?sum1 sc,dz Forms the 1-norm of a complex vector similar to Level 1 PBLAS
p?asum, but using the true absolute value.
Routine Name Data Types Description
p?labrd s,d,c,z Reduces the first nb rows and columns of a general rectangular
matrix A to real bidiagonal form by an orthogonal/unitary
transformation, and returns auxiliary matrices that are needed to
apply the transformation to the unreduced part of A.
p?lacon s,d,c,z Estimates the 1-norm of a square matrix, using the reverse
communication for evaluating matrix-vector products.
p?lacp3 s,d Copies from a global parallel array into a local replicated array or
vice versa.
p?laevswp s,d,c,z Moves the eigenvectors from where they are computed to
ScaLAPACK standard block cyclic array.
p?lange s,d,c,z Returns the value of the 1-norm, Frobenius norm, infinity-norm, or
the largest absolute value of any element, of a general rectangular
matrix.
p?lanhs s,d,c,z Returns the value of the 1-norm, Frobenius norm, infinity-norm, or
the largest absolute value of any element, of an upper Hessenberg
matrix.
p?lansy, p?lanhe s,d,c,z/c,z Returns the value of the 1-norm, Frobenius norm, infinity-norm, or
the largest absolute value of any element of a real symmetric or
complex Hermitian matrix.
p?lantr s,d,c,z Returns the value of the 1-norm, Frobenius norm, infinity-norm, or
the largest absolute value of any element, of a triangular matrix.
p?laqge s,d,c,z Scales a general rectangular matrix, using row and column scaling
factors computed by p?geequ.
p?laqr1 s,d Sets a scalar multiple of the first column of the product of a 2-by-2
or 3-by-3 matrix and specified shifts.
p?lared1d s,d Redistributes an array assuming that the input array bycol is
distributed across rows and that all process columns contain the
same copy of bycol.
p?lared2d s,d Redistributes an array assuming that the input array byrow is
distributed across columns and that all process rows contain the
same copy of byrow .
p?lasmsub s,d Looks for a small subdiagonal element from the bottom of the
matrix that it can safely set to zero.
p?lauu2 s,d,c,z Computes the product U*U^H or L^H*L, where U and L are upper or
lower triangular matrices (local unblocked algorithm).
p?lauum s,d,c,z Computes the product U*U^H or L^H*L, where U and L are upper or
lower triangular matrices.
p?pbtrsv s,d,c,z Solves a single triangular linear system via frontsolve or backsolve
where the triangular matrix is a factor of a banded matrix
computed by p?pbtrf.
p?pttrsv s,d,c,z Solves a single triangular linear system via frontsolve or backsolve
where the triangular matrix is a factor of a tridiagonal matrix
computed by p?pttrf.
?lamsh s,d Sends multiple shifts through a small (single node) matrix to
maximize the number of bulges that can be sent through.
?larrb2 s,d Provides limited bisection to locate eigenvalues for more accuracy.
?larre2 s,d Given a tridiagonal matrix, sets small off-diagonal elements to zero
and for each unreduced block, finds base representations and
eigenvalues.
?larre2a s,d Given a tridiagonal matrix, sets small off-diagonal elements to zero
and for each unreduced block, finds base representations and
eigenvalues.
?larrf2 s,d Finds a new relatively robust representation such that at least one
of the eigenvalues is relatively isolated.
?stegr2b s,d From eigenvalues and initial representations computes the selected
eigenvalues and eigenvectors of the real symmetric tridiagonal
matrix in parallel on multiple processors.
?dttrsv s,d,c,z Solves a general tridiagonal system of linear equations using the LU
factorization computed by ?dttrf.
pilaenv NA Returns the positive integer value of the logical blocking size.
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.
Notice revision #20201201
p?lacgv
Conjugates a complex vector.
Syntax
void pclacgv (MKL_INT *n , MKL_Complex8 *x , MKL_INT *ix , MKL_INT *jx , MKL_INT
*descx , MKL_INT *incx );
void pzlacgv (MKL_INT *n , MKL_Complex16 *x , MKL_INT *ix , MKL_INT *jx , MKL_INT
*descx , MKL_INT *incx );
Include Files
• mkl_scalapack.h
Description
The p?lacgv function conjugates a complex vector sub(X) of length n, where sub(X) denotes
X(ix, jx:jx+n-1) if incx = m_x, and X(ix:ix+n-1, jx) if incx = 1.
Input Parameters
x (local).
Pointer into the local memory to an array of size lld_x * LOCc(n_x). On
entry the vector to be conjugated x[i] = X(ix+(jx-1)*m_x+i*incx), 0
≤ i < n.
ix (global) The row index in the global matrix X indicating the first row of
sub(X).
jx (global) The column index in the global matrix X indicating the first column
of sub(X).
descx (global and local) Array of size dlen_=9. The array descriptor for the
distributed matrix X.
incx (global) The global increment for the elements of X. Only two values of
incx are supported in this version, namely 1 and m_x. incx must not be
zero.
Output Parameters
x (local).
On exit, the local pieces of conjugated distributed vector sub(X).
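For intuition, the operation on a single process reduces to negating the imaginary part of each strided element. The sketch below is a serial analogue only, not the distributed routine; the type complex8_sketch and the function name are hypothetical stand-ins (the struct is assumed to mirror the real/imag layout of MKL_Complex8).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a single-precision complex element. */
typedef struct {
    float real;
    float imag;
} complex8_sketch;

/* Serial analogue: conjugate n elements of x taken with stride incx. */
void conjugate_strided(size_t n, complex8_sketch *x, size_t incx)
{
    assert(incx != 0);
    for (size_t i = 0; i < n; ++i) {
        x[i * incx].imag = -x[i * incx].imag;
    }
}
```

Calling conjugate_strided(2, v, 2) on a four-element array conjugates v[0] and v[2] and leaves v[1] and v[3] unchanged.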
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?max1
Finds the index of the element whose real part has
maximum absolute value (similar to the Level 1 PBLAS
p?amax, but using the absolute value to the real part).
Syntax
void pcmax1 (MKL_INT *n , MKL_Complex8 *amax , MKL_INT *indx , MKL_Complex8 *x ,
MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , MKL_INT *incx );
void pzmax1 (MKL_INT *n , MKL_Complex16 *amax , MKL_INT *indx , MKL_Complex16 *x ,
MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , MKL_INT *incx );
Include Files
• mkl_scalapack.h
Description
The p?max1 function computes the global index of the maximum element in absolute value of a distributed
vector sub(X). The global index is returned in indx and the value is returned in amax, where sub(X) denotes
X(ix:ix+n-1, jx) if incx = 1, X(ix, jx:jx+n-1) if incx = m_x.
Input Parameters
x (local)
Pointer into the local memory to an array of size lld_x * LOCc(jx+n-1). On
entry this array contains the local pieces of the distributed vector sub(X).
ix (global) The row index in the global matrix X indicating the first row of
sub(X).
jx (global) The column index in the global matrix X indicating the first column
of sub(X).
descx (global and local) Array of size dlen_. The array descriptor for the
distributed matrix X.
incx (global) The global increment for the elements of X. Only two values of incx
are supported in this version, namely 1 and m_x. incx must not be zero.
Output Parameters
amax (global output) The absolute value of the largest entry of the distributed
vector sub(X) only in the scope of sub(X).
indx (global output) The global index of the element of the distributed vector
sub(X) whose real part has maximum absolute value.
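The selection rule (largest absolute value of the real part) can be sketched serially as below. This is not the distributed routine: the type and function names are hypothetical, the returned position is zero-based within the strided vector, and keeping the first maximum on ties is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a single-precision complex element. */
typedef struct {
    float real;
    float imag;
} complex8_sketch;

/* Serial analogue: zero-based position of the element whose real part has
   the largest absolute value; the first such element wins on ties. */
size_t index_of_max_abs_real(size_t n, const complex8_sketch *x, size_t incx)
{
    assert(x != NULL && n > 0 && incx != 0);
    size_t best = 0;
    float best_val = x[0].real < 0.0f ? -x[0].real : x[0].real;
    for (size_t i = 1; i < n; ++i) {
        float re = x[i * incx].real;
        float val = re < 0.0f ? -re : re;
        if (val > best_val) {
            best_val = val;
            best = i;
        }
    }
    return best;
}
```

Note that the imaginary parts play no role: for {1+9i, -5, 3+100i} the selected position is 1.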
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
pilaver
Returns the ScaLAPACK version.
Syntax
void pilaver (MKL_INT* vers_major, MKL_INT* vers_minor, MKL_INT* vers_patch);
Include Files
• mkl_scalapack.h
Description
This function returns the ScaLAPACK version.
Output Parameters
vers_minor Return the ScaLAPACK minor version from the major version.
vers_patch Return the ScaLAPACK patch version from the minor version.
pmpcol
Finds the collaborators of a process.
Syntax
void pmpcol(MKL_INT* myproc, MKL_INT* nprocs, MKL_INT* iil, MKL_INT* needil, MKL_INT*
neediu, MKL_INT* pmyils, MKL_INT* pmyius, MKL_INT* colbrt, MKL_INT* frstcl, MKL_INT*
lastcl);
Include Files
• mkl_scalapack.h
Description
Using the output from pmpim2 and given the information on eigenvalue clusters, pmpcol finds the
collaborators of myproc.
Input Parameters
pmyils array
For each processor p, 0 < p ≤ nprocs, pmyils[p-1] is the index of the first
eigenvalue in the eigenvalue cluster to be computed.
pmyils[p-1] equals zero if p stays idle.
pmyius array
For each processor p, pmyius[p-1] is the index of the last eigenvalue in the
eigenvalue cluster to be computed.
pmyius[p-1] equals zero if p stays idle.
Output Parameters
myproc collaborates with:
frstcl, ..., myproc-1, myproc+1, ...,lastcl
If myproc = frstcl, there are no collaborators on the left. If myproc =
lastcl, there are no collaborators on the right.
If frstcl = 0 and lastcl = nprocs-1, then myproc collaborates with
everybody.
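The meaning of frstcl and lastcl can be expressed with two small helpers. This is a sketch with hypothetical function names; it assumes zero-based process numbers, as suggested by the frstcl = 0 and lastcl = nprocs-1 case above.

```c
#include <assert.h>
#include <stdbool.h>

/* Number of collaborators of myproc: every process in frstcl..lastcl
   except myproc itself. */
int collaborator_count(int myproc, int frstcl, int lastcl)
{
    assert(frstcl <= myproc && myproc <= lastcl);
    return lastcl - frstcl;
}

/* True when the collaborator range spans all nprocs processes. */
bool collaborates_with_everybody(int frstcl, int lastcl, int nprocs)
{
    return frstcl == 0 && lastcl == nprocs - 1;
}
```

For myproc = 2 with frstcl = 1 and lastcl = 4, the collaborators are processes 1, 3, and 4.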
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
pmpim2
Computes the eigenpair range assignments for all
processes.
Syntax
void pmpim2(MKL_INT* il, MKL_INT* iu, MKL_INT* nprocs, MKL_INT* pmyils, MKL_INT*
pmyius);
Include Files
• mkl_scalapack.h
Description
pmpim2 is the scheduling function. It computes for all processors the eigenpair range assignments.
Input Parameters
Output Parameters
pmyils array
For each processor p, pmyils[p-1] is the index of the first eigenvalue
in a cluster to be computed.
pmyils[p-1] equals zero if p stays idle.
pmyius array
For each processor p, pmyius[p-1] is the index of the last eigenvalue
in a cluster to be computed.
pmyius[p-1] equals zero if p stays idle.
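One way to consume the two output arrays is to count the eigenpairs assigned to a given process. The helper below is a hypothetical sketch; it uses int where real code would use MKL_INT and assumes the range pmyils[p-1]..pmyius[p-1] is inclusive.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of eigenpairs assigned to processor p
   (one-based, as in the text), or 0 if p stays idle. */
int eigenpairs_assigned(const int *pmyils, const int *pmyius, int p)
{
    assert(pmyils != NULL && pmyius != NULL && p >= 1);
    if (pmyils[p - 1] == 0) {
        return 0; /* a zero entry marks an idle processor */
    }
    return pmyius[p - 1] - pmyils[p - 1] + 1;
}
```

With pmyils = {1, 4, 0} and pmyius = {3, 6, 0}, processors 1 and 2 each handle three eigenpairs and processor 3 is idle.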
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
?combamax1
Finds the element with maximum real part absolute
value and its corresponding global index.
Syntax
void ccombamax1 (MKL_Complex8 *v1 , MKL_Complex8 *v2 );
void zcombamax1 (MKL_Complex16 *v1 , MKL_Complex16 *v2 );
Include Files
• mkl_scalapack.h
Description
The ?combamax1 function finds the element having maximum real part absolute value as well as its
corresponding global index.
Input Parameters
v1 (local)
Array of size 2. The first maximum absolute value element and its global
index. v1[0]=amax, v1[1]=indx.
v2 (local)
Array of size 2. The second maximum absolute value element and its global
index. v2[0]=amax, v2[1]=indx.
Output Parameters
v1 (local).
The first maximum absolute value element and its global index.
v1[0]=amax, v1[1]=indx.
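The combine step can be pictured as keeping whichever (amax, indx) pair has the larger absolute real part. The sketch below is an illustration under assumptions, not the library source: the type and function names are hypothetical, and keeping v1 on ties is an assumed tie-breaking rule.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a single-precision complex element. */
typedef struct {
    float real;
    float imag;
} complex8_sketch;

static float abs_real(complex8_sketch z)
{
    return z.real < 0.0f ? -z.real : z.real;
}

/* v[0] holds amax and v[1] holds indx. If v2 carries the larger absolute
   real part, it overwrites v1; otherwise v1 is kept (assumed tie rule). */
void combamax1_sketch(complex8_sketch *v1, const complex8_sketch *v2)
{
    assert(v1 != NULL && v2 != NULL);
    if (abs_real(v1[0]) < abs_real(v2[0])) {
        v1[0] = v2[0];
        v1[1] = v2[1];
    }
}
```

A pairwise operation of this shape is what a reduction across processes needs: applying it repeatedly leaves the overall maximum and its index in v1.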
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?sum1
Forms the 1-norm of a complex vector similar to Level
1 PBLAS p?asum, but using the true absolute value.
Syntax
void pscsum1 (MKL_INT *n , float *asum , MKL_Complex8 *x , MKL_INT *ix , MKL_INT *jx ,
MKL_INT *descx , MKL_INT *incx );
void pdzsum1 (MKL_INT *n , double *asum , MKL_Complex16 *x , MKL_INT *ix , MKL_INT
*jx , MKL_INT *descx , MKL_INT *incx );
Include Files
• mkl_scalapack.h
Description
The p?sum1 function returns the sum of absolute values of a complex distributed vector sub(X) in asum,
where sub(X) denotes X(ix:ix+n-1, jx:jx), if incx = 1, X(ix:ix, jx:jx+n-1), if incx = m_x.
Based on p?asum from the Level 1 PBLAS. The change is to use the 'genuine' absolute value.
Input Parameters
x (local )
Pointer into the local memory to an array of size lld_x * LOCc(jx+n-1). This
array contains the local pieces of the distributed vector sub(X).
ix (global) The row index in the global matrix X indicating the first row of
sub(X).
jx (global) The column index in the global matrix X indicating the first column
of sub(X).
descx (local) Array of size dlen_=9. The array descriptor for the distributed matrix
X.
incx (global) The global increment for the elements of X. Only two values of
incx are supported in this version, namely 1 and m_x.
Output Parameters
asum (local)
The sum of absolute values of the distributed vector sub(X) only in its
scope.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?dbtrsv
Computes an LU factorization of a general triangular
matrix with no pivoting. The function is called by
p?dbtrs.
Syntax
void psdbtrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu ,
MKL_INT *nrhs , float *a , MKL_INT *ja , MKL_INT *desca , float *b , MKL_INT *ib ,
MKL_INT *descb , float *af , MKL_INT *laf , float *work , MKL_INT *lwork , MKL_INT
*info );
void pddbtrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu ,
MKL_INT *nrhs , double *a , MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib ,
MKL_INT *descb , double *af , MKL_INT *laf , double *work , MKL_INT *lwork , MKL_INT
*info );
void pcdbtrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu ,
MKL_INT *nrhs , MKL_Complex8 *a , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b ,
MKL_INT *ib , MKL_INT *descb , MKL_Complex8 *af , MKL_INT *laf , MKL_Complex8 *work ,
MKL_INT *lwork , MKL_INT *info );
void pzdbtrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu ,
MKL_INT *nrhs , MKL_Complex16 *a , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b ,
MKL_INT *ib , MKL_INT *descb , MKL_Complex16 *af , MKL_INT *laf , MKL_Complex16 *work ,
MKL_INT *lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?dbtrsv function solves a banded triangular system of linear equations
A(1:n, ja:ja+n-1)^T * X = B(ib:ib+n-1, 1:nrhs) (for real flavors); A(1:n, ja:ja+n-1)^H * X = B(ib:ib+n-1,
1:nrhs) (for complex flavors),
where A(1:n, ja:ja+n-1) is a banded triangular matrix factor produced by the Gaussian elimination code of
p?dbtrf and is stored in A(1:n, ja:ja+n-1) and af. The matrix stored in A(1:n, ja:ja+n-1) is either
upper or lower triangular according to uplo, and the parameter trans selects whether to solve with
A(1:n, ja:ja+n-1) or A(1:n, ja:ja+n-1)^T.
The function p?dbtrf must be called first.
Input Parameters
uplo (global)
If uplo='U', the upper triangle of A(1:n, ja:ja+n-1) is stored,
trans (global)
If trans = 'N', solve with A(1:n, ja:ja+n-1),
nrhs (global) The number of right-hand sides; the number of columns of the
distributed submatrix B (nrhs≥ 0).
a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1),
where lld_a≥(bwl+bwu+1). On entry, this array contains the local pieces of
the n-by-n unsymmetric banded distributed Cholesky factor L or L^T,
represented in global A as A(1 :n, ja:ja+n-1). This local portion is stored
in the packed banded format used in LAPACK. See the Application Notes
below and the ScaLAPACK manual for more detail on the format of
distributed matrices.
ja (global) The index in the global matrix A that points to the start of the
matrix to be operated on (which may be either all of A or a submatrix of A).
b (local)
Pointer into the local memory to an array of local lead dimension lld_b≥nb.
On entry, this array contains the local pieces of the right-hand sides
B(ib:ib+n-1, 1:nrhs).
ib (global) The row index in the global matrix B that points to the first row of
the matrix to be operated on (which may be either all of B or a submatrix of
B).
if 2d type (dtype_b =1), dlen≥9. The array descriptor for the distributed
matrix B. Contains information of mapping B to memory.
laf (local)
Size of user-input auxiliary fill-in space af.
work (local).
Temporary workspace. This space may be overwritten in between function
calls.
work must be the size given in lwork.
Output Parameters
a (local).
This local portion is stored in the packed banded format used in LAPACK.
Please see the ScaLAPACK manual for more detail on the format of
distributed matrices.
b On exit, this contains the local piece of the solutions distributed matrix X.
af (local).
Auxiliary fill-in space. The fill-in space is created in a call to the factorization
function p?dbtrf and is stored in af. If a linear system is to be solved
using p?dbtrs after the factorization function, af must not be altered after
the factorization.
info (local).
If info = 0, the execution is successful.
< 0: If the i-th argument is an array and the j-th entry, indexed j-1, had an
illegal value, then info= - (i*100+j), if the i-th argument is a scalar and
had an illegal value, then info = -i.
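The encoding of negative info values can be undone as follows. This is a sketch with hypothetical names; it assumes, as the formula implies, that scalar argument positions stay below 100 so that the two cases can be told apart.

```c
#include <assert.h>

/* Hypothetical result type: one-based argument position and, for array
   arguments, the one-based entry j (C index j-1); entry is 0 for scalars. */
typedef struct {
    int argument;
    int entry;
} illegal_arg;

/* Decodes info = -(i*100+j) for arrays and info = -i for scalars. */
illegal_arg decode_negative_info(int info)
{
    assert(info < 0);
    illegal_arg result;
    int code = -info;
    if (code >= 100) {
        result.argument = code / 100;
        result.entry = code % 100;
    } else {
        result.argument = code;
        result.entry = 0;
    }
    return result;
}
```

For example, info = -902 points at entry 2 (C index 1) of the ninth argument, while info = -3 flags the third argument, a scalar.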
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?dttrsv
Computes an LU factorization of a general band
matrix, using partial pivoting with row interchanges.
The function is called by p?dttrs.
Syntax
void psdttrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *nrhs , float *dl , float
*d , float *du , MKL_INT *ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT
*descb , float *af , MKL_INT *laf , float *work , MKL_INT *lwork , MKL_INT *info );
void pddttrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *nrhs , double *dl ,
double *d , double *du , MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib ,
MKL_INT *descb , double *af , MKL_INT *laf , double *work , MKL_INT *lwork , MKL_INT
*info );
void pcdttrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *nrhs , MKL_Complex8
*dl , MKL_Complex8 *d , MKL_Complex8 *du , MKL_INT *ja , MKL_INT *desca , MKL_Complex8
*b , MKL_INT *ib , MKL_INT *descb , MKL_Complex8 *af , MKL_INT *laf , MKL_Complex8
*work , MKL_INT *lwork , MKL_INT *info );
void pzdttrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *nrhs , MKL_Complex16
*dl , MKL_Complex16 *d , MKL_Complex16 *du , MKL_INT *ja , MKL_INT *desca ,
MKL_Complex16 *b , MKL_INT *ib , MKL_INT *descb , MKL_Complex16 *af , MKL_INT *laf ,
MKL_Complex16 *work , MKL_INT *lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?dttrsv function solves a tridiagonal triangular system of linear equations
Input Parameters
uplo (global)
If uplo='U', the upper triangle of A(1:n, ja:ja+n-1) is stored,
if uplo = 'L', the lower triangle of A(1:n, ja:ja+n-1) is stored.
trans (global)
If trans = 'N', solve with A(1:n, ja:ja+n-1),
nrhs (global) The number of right-hand sides; the number of columns of the
distributed submatrix B(ib:ib+n-1, 1:nrhs). (nrhs≥ 0).
dl (local).
Pointer to local part of global vector storing the lower diagonal of the
matrix.
Globally, dl[0] is not referenced, and dl must be aligned with d.
d (local).
Pointer to local part of global vector storing the main diagonal of the matrix.
du (local).
Pointer to local part of global vector storing the upper diagonal of the
matrix.
Globally, du[n-1] is not referenced, and du must be aligned with d.
ja (global) The index in the global matrix A that points to the start of the
matrix to be operated on (which may be either all of A or a submatrix of A).
b (local)
Pointer into the local memory to an array of local lead dimension lld_b≥nb.
On entry, this array contains the local pieces of the right-hand sides
B(ib:ib+n-1, 1 :nrhs).
ib (global) The row index in the global matrix B that points to the first row of
the matrix to be operated on (which may be either all of B or a submatrix of
B).
laf (local).
laf≥ 2*(nb+2). If laf is not large enough, an error code is returned and
the minimum acceptable size will be returned in af[0].
work (local).
Temporary workspace. This space may be overwritten in between function
calls.
work must be the size given in lwork.
lwork≥ 10*npcol+4*nrhs.
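The two size bounds quoted above (laf ≥ 2*(nb+2) and lwork ≥ 10*npcol+4*nrhs) are simple enough to compute before allocating the buffers. The helper names below are hypothetical and the sketch uses int where real code would use MKL_INT.

```c
#include <assert.h>

/* Minimum laf from the text: 2*(nb+2), where nb is the block size. */
int dttrsv_min_laf(int nb)
{
    assert(nb > 0);
    return 2 * (nb + 2);
}

/* Minimum lwork from the text: 10*npcol + 4*nrhs. */
int dttrsv_min_lwork(int npcol, int nrhs)
{
    assert(npcol > 0 && nrhs >= 0);
    return 10 * npcol + 4 * nrhs;
}
```

With nb = 64, npcol = 4, and nrhs = 3, the bounds are laf ≥ 132 and lwork ≥ 52.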
Output Parameters
dl (local).
On exit, this array contains information containing the factors of the matrix.
d On exit, this array contains information containing the factors of the matrix.
Must be of size ≥nb_a.
b On exit, this contains the local piece of the solutions distributed matrix X.
af (local).
Auxiliary fill-in space. The fill-in space is created in a call to the factorization
function p?dttrf and is stored in af. If a linear system is to be solved
using p?dttrs after the factorization function, af must not be altered after
the factorization.
info (local).
If info=0, the execution is successful.
p?gebal
Balances a general real/complex matrix.
Syntax
void psgebal(char* job, MKL_INT* n, float* a, MKL_INT* desca, MKL_INT* ilo, MKL_INT*
ihi, float* scale, MKL_INT* info);
void pdgebal(char* job, MKL_INT* n, double* a, MKL_INT* desca, MKL_INT* ilo, MKL_INT*
ihi, double* scale, MKL_INT* info);
void pcgebal(char* job, MKL_INT* n, complex float* a, MKL_INT* desca, MKL_INT* ilo,
MKL_INT* ihi, float* scale, MKL_INT* info);
void pzgebal(char* job, MKL_INT* n, complex double* a, MKL_INT* desca, MKL_INT* ilo,
MKL_INT* ihi, double* scale, MKL_INT* info);
Include Files
• mkl_scalapack.h
Description
p?gebal balances a general real/complex matrix A. This involves, first, permuting A by a similarity
transformation to isolate eigenvalues in the first 1 to ilo-1 and last ihi+1 to n elements on the diagonal;
and second, applying a diagonal similarity transformation to rows and columns ilo to ihi to make the rows
and columns as close in norm as possible. Both steps are optional.
Balancing may reduce the 1-norm of the matrix, and improve the accuracy of the computed eigenvalues
and/or eigenvectors.
Input Parameters
job (global )
Specifies the operations to be performed on a:
= 'N': none: simply set ilo = 1, ihi = n, scale[i] = 1.0 for i = 0,...,n-1;
n (global )
The order of the matrix A (n≥ 0).
a (local ) Pointer into the local memory to an array of size lld_a * LOCc(n)
Output Parameters
info (global )
= 0: successful exit.
< 0: if info = -i, the i-th argument had an illegal value.
Application Notes
The permutations consist of row and column interchanges which put the matrix in the form

        ( T1  X   Y  )
P*A*P = (  0  B   Z  )
        (  0  0   T2 )

where T1 and T2 are upper triangular matrices whose eigenvalues lie along the diagonal. The column indices
ilo and ihi mark the starting and ending columns of the submatrix B. Balancing consists of applying a
diagonal similarity transformation D^(-1)*B*D to make the 1-norms of each row of B and its corresponding column
nearly equal. The output matrix is

( T1      X*D           Y      )
(  0   D^(-1)*B*D   D^(-1)*Z   )
(  0       0            T2     )

Information about the permutations P and the diagonal matrix D is returned in the vector scale.
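To make the scaling step concrete, the sketch below applies a diagonal similarity transformation to a small dense matrix on one process. It is a serial illustration only: the function name is hypothetical, the matrix is stored row-major for readability, and the scaling factors are supplied by the caller rather than computed.

```c
#include <assert.h>
#include <stddef.h>

/* Overwrites the n-by-n row-major matrix a with D^(-1) * A * D, where D is
   the diagonal matrix with entries d[0..n-1], all nonzero:
   element (i,j) becomes a(i,j) * d[j] / d[i]. */
void diagonal_similarity(size_t n, double *a, const double *d)
{
    for (size_t i = 0; i < n; ++i) {
        assert(d[i] != 0.0);
        for (size_t j = 0; j < n; ++j) {
            a[i * n + j] = a[i * n + j] * d[j] / d[i];
        }
    }
}
```

For the matrix with rows (1, 16) and (0.0625, 1) and d = (16, 1), the result has every entry equal to 1: row 1 previously had 1-norm 17 against 1.0625 for column 1, and afterwards both are 2.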
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?gebd2
Reduces a general rectangular matrix to real
bidiagonal form by an orthogonal/unitary
transformation (unblocked algorithm).
Syntax
void psgebd2 (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *d , float *e , float *tauq , float *taup , float *work , MKL_INT
*lwork , MKL_INT *info );
void pdgebd2 (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , double *d , double *e , double *tauq , double *taup , double *work , MKL_INT
*lwork , MKL_INT *info );
void pcgebd2 (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , float *d , float *e , MKL_Complex8 *tauq , MKL_Complex8 *taup ,
MKL_Complex8 *work , MKL_INT *lwork , MKL_INT *info );
void pzgebd2 (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , double *d , double *e , MKL_Complex16 *tauq , MKL_Complex16 *taup ,
MKL_Complex16 *work , MKL_INT *lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?gebd2 function reduces a real/complex general m-by-n distributed matrix sub(A) = A(ia:ia+m-1,
ja:ja+n-1) to upper or lower bidiagonal form B by an orthogonal/unitary transformation:
Q'*sub(A)*P = B.
If m ≥ n, B is the upper bidiagonal; if m<n, B is the lower bidiagonal.
Input Parameters
m (global)
The number of rows of the distributed matrix sub(A). (m≥0).
n (global)
The number of columns in the distributed matrix sub(A). (n≥0).
a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).
On entry, this array contains the local pieces of the general distributed
matrix sub(A).
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
work (local).
This is a workspace array of size lwork.
Output Parameters
a (local).
On exit, if m ≥ n, the diagonal and the first superdiagonal of sub(A) are
overwritten with the upper bidiagonal matrix B; the elements below the
diagonal, with the array tauq, represent the orthogonal/unitary matrix Q as
a product of elementary reflectors, and the elements above the first
superdiagonal, with the array taup, represent the orthogonal matrix P as a
product of elementary reflectors. If m < n, the diagonal and the first
subdiagonal are overwritten with the lower bidiagonal matrix B; the
elements below the first subdiagonal, with the array tauq, represent the
orthogonal/unitary matrix Q as a product of elementary reflectors, and the
elements above the diagonal, with the array taup, represent the orthogonal
matrix P as a product of elementary reflectors. See Application Notes
below.
d (local)
Array of size LOCc(ja+min(m,n)-1) if m ≥ n; LOCr(ia+min(m,n)-1)
otherwise. The distributed diagonal elements of the bidiagonal matrix B:
d[i] = A(i+1,i+1), i=0, 1,..., size (d) - 1 . d is tied to the distributed matrix
A.
e (local)
Array of size LOCc(ja+min(m,n)-1) if m≥ n; LOCr(ia+min(m,n)-2)
otherwise. The distributed off-diagonal elements of the bidiagonal matrix B:
if m ≥ n, e[i] = A(i+1,i+2) for i = 0, 1, ... , n-2;
tauq (local).
Array of size LOCc(ja+min(m,n)-1). The scalar factors of the elementary
reflectors which represent the orthogonal/unitary matrix Q. tauq is tied to
the distributed matrix A.
taup (local).
Array of size LOCr(ia+min(m,n)-1). The scalar factors of the elementary
reflectors which represent the orthogonal/unitary matrix P. taup is tied to
the distributed matrix A.
info (local)
If info < 0: if the i-th argument is an array and the j-th entry, indexed
j-1, had an illegal value, then info = -(i*100+j); if the i-th argument is a
scalar and had an illegal value, then info = -i.
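The info encoding described above can be unpacked mechanically. The following is a minimal sketch under the assumption that every routine has fewer than 100 arguments, so a scalar error code -i never collides with an array error code -(i*100+j). The type info_report and the function decode_info are hypothetical names introduced for illustration only.

```c
/* Decoded form of a ScaLAPACK-style info value (hypothetical helper type). */
typedef struct {
    int ok;     /* 1 if info == 0, otherwise 0 */
    int arg;    /* 1-based position of the offending argument, 0 if none */
    int entry;  /* 1-based entry inside an array argument, 0 for a scalar */
} info_report;

static info_report decode_info(int info)
{
    info_report r = {0, 0, 0};
    int code;

    if (info == 0) {           /* successful execution */
        r.ok = 1;
        return r;
    }
    if (info > 0)              /* positive values are not covered by this sketch */
        return r;

    code = -info;
    if (code >= 100) {         /* info = -(i*100+j): array argument i, entry j */
        r.arg   = code / 100;
        r.entry = code % 100;
    } else {                   /* info = -i: scalar argument i */
        r.arg = code;
    }
    return r;
}
```

For example, info = -702 decodes to argument 7, entry 2 (stored at index 1 of that array), while info = -4 decodes to scalar argument 4.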
Application Notes
The matrices Q and P are represented as products of elementary reflectors:
If m≥n,
If m < n,
where d and e denote diagonal and off-diagonal elements of B, vi denotes an element of the vector defining
H(i), and ui an element of the vector defining G(i).
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?gehd2
Reduces a general matrix to upper Hessenberg form
by an orthogonal/unitary similarity transformation
(unblocked algorithm).
Syntax
void psgehd2 (MKL_INT *n , MKL_INT *ilo , MKL_INT *ihi , float *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , float *tau , float *work , MKL_INT *lwork , MKL_INT
*info );
void pdgehd2 (MKL_INT *n , MKL_INT *ilo , MKL_INT *ihi , double *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *tau , double *work , MKL_INT *lwork , MKL_INT
*info );
void pcgehd2 (MKL_INT *n , MKL_INT *ilo , MKL_INT *ihi , MKL_Complex8 *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzgehd2 (MKL_INT *n , MKL_INT *ilo , MKL_INT *ihi , MKL_Complex16 *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT
*lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?gehd2 function reduces a real/complex general distributed matrix sub(A) to upper Hessenberg form H
by an orthogonal/unitary similarity transformation: Q'*sub(A)*Q = H, where sub(A) = A(ia:ia+n-1,
ja:ja+n-1).
Input Parameters
ilo, ihi (global) It is assumed that the matrix sub(A) is already upper triangular in
rows ia:ia+ilo-2 and ia+ihi:ia+n-1 and columns ja:ja+ilo-2 and
ja+ihi:ja+n-1. See Application Notes for further information.
If n≥ 0, 1 ≤ ilo ≤ ihi ≤ n; otherwise set ilo = 1, ihi = n.
a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).
On entry, this array contains the local pieces of the n-by-n general
distributed matrix sub(A) to be reduced.
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of sub(A), respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
work (local).
This is a workspace array of size lwork.
Output Parameters
a (local). On exit, the upper triangle and the first subdiagonal of sub(A) are
overwritten with the upper Hessenberg matrix H, and the elements below
the first subdiagonal, with the array tau, represent the orthogonal/unitary
matrix Q as a product of elementary reflectors. (see Application Notes
below).
tau (local).
info (local)
If info = 0, the execution is successful.
If info < 0: if the i-th argument is an array and the j-th entry, indexed j-1,
had an illegal value, then info = -(i*100+j); if the i-th argument is a
scalar and had an illegal value, then info = -i.
Application Notes
The matrix Q is represented as a product of (ihi-ilo) elementary reflectors
Q = H(ilo)*H(ilo+1)*...*H(ihi-1).
Each H(i) has the form
H(i) = I - tau*v*v',
where tau is a real/complex scalar, and v is a real/complex vector with v(1:i) = 0, v(i+1) = 1, and
v(ihi+1:n) = 0; v(i+2:ihi) is stored on exit in A(ia+ilo+i:ia+ihi-1, ja+ilo+i-2), and tau in
tau[ja+ilo+i-3].
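The layout of the reflector vector described above can be made concrete with a small serial sketch. It rebuilds the full length-n vector v from the stored part v(i+2:ihi), using the 1-based indices i and ihi of the text. The function expand_reflector is a hypothetical helper for illustration, restricted to real data, and does not touch distributed storage.

```c
/* Rebuild the full reflector vector v (length n) for H(i).
   i and ihi are 1-based, with i < ihi <= n, as in the text.
   stored holds v(i+2:ihi), that is, ihi-i-1 values. */
static void expand_reflector(int n, int i, int ihi, const double *stored, double *v)
{
    int k;

    for (k = 0; k < n; ++k)
        v[k] = 0.0;                     /* covers v(1:i) = 0 and v(ihi+1:n) = 0 */
    v[i] = 1.0;                         /* v(i+1) = 1, at 0-based index i */
    for (k = i + 2; k <= ihi; ++k)      /* copy v(i+2:ihi) */
        v[k - 1] = stored[k - (i + 2)];
}
```

With n = 7, i = 2, and ihi = 6, three stored values fill v(4:6), v(3) is the implicit 1, and the remaining entries are zero.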
The contents of A(ia:ia+n-1, ja:ja+n-1) are illustrated by the following example, with n = 7, ilo = 2
and ihi = 6:
where a denotes an element of the original matrix sub(A), h denotes a modified element of the upper
Hessenberg matrix H, and vi denotes an element of the vector defining H(ja+ilo+i-2).
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?gelq2
Computes an LQ factorization of a general rectangular
matrix (unblocked algorithm).
Syntax
void psgelq2 (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdgelq2 (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );
void pcgelq2 (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT
*info );
void pzgelq2 (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT *lwork , MKL_INT
*info );
Include Files
• mkl_scalapack.h
Description
The p?gelq2 function computes an LQ factorization of a real/complex distributed m-by-n matrix sub(A) =
A(ia:ia+m-1, ja:ja+n-1) = L*Q.
Input Parameters
m (global)
The number of rows of the distributed matrix sub(A). (m≥0).
n (global)
The number of columns of the distributed matrix sub(A). (n≥0).
a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).
On entry, this array contains the local pieces of the m-by-n distributed
matrix sub(A) which is to be factored.
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of sub(A), respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
work (local).
This is a workspace array of size lwork.
Output Parameters
a (local).
On exit, the elements on and below the diagonal of sub(A) contain the m by
min(m,n) lower trapezoidal matrix L (L is lower triangular if m ≤ n); the
elements above the diagonal, with the array tau, represent the orthogonal/
unitary matrix Q as a product of elementary reflectors (see Application
Notes below).
tau (local).
Array of size LOCr(ia+min(m, n)-1). This array contains the scalar
factors of the elementary reflectors. tau is tied to the distributed matrix A.
info (local) If info = 0, the execution is successful. If info < 0: if the i-th
argument is an array and the j-th entry, indexed j-1, had an illegal value,
then info = -(i*100+j); if the i-th argument is a scalar and had an illegal
value, then info = -i.
Application Notes
The matrix Q is represented as a product of elementary reflectors
Q = H(ia+k-1)*H(ia+k-2)*...*H(ia) for real flavors,
Q = (H(ia+k-1))^H*(H(ia+k-2))^H*...*(H(ia))^H for complex flavors,
where k = min(m,n).
Each H(i) has the form
H(i) = I - tau*v*v',
where tau is a real/complex scalar, and v is a real/complex vector with v(1:i-1) = 0 and v(i) = 1;
v(i+1:n) (for real flavors) or conjg(v(i+1:n)) (for complex flavors) is stored on exit in
A(ia+i-1, ja+i:ja+n-1), and tau in tau[ia+i-2].
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?geql2
Computes a QL factorization of a general rectangular
matrix (unblocked algorithm).
Syntax
void psgeql2 (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdgeql2 (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );
void pcgeql2 (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT
*info );
void pzgeql2 (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT *lwork , MKL_INT
*info );
Include Files
• mkl_scalapack.h
Description
The p?geql2 function computes a QL factorization of a real/complex distributed m-by-n matrix sub(A) =
A(ia:ia+m-1, ja:ja+n-1) = Q*L.
Input Parameters
m (global)
The number of rows in the distributed matrix sub(A). (m≥ 0).
n (global)
The number of columns in the distributed matrix sub(A). (n≥ 0).
a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).
On entry, this array contains the local pieces of the m-by-n distributed
matrix sub(A) which is to be factored.
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of sub(A), respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
work (local).
This is a workspace array of size lwork.
Output Parameters
a (local).
On exit,
if m ≥ n, the lower triangle of the distributed submatrix A(ia+m-n:ia+m-1,
ja:ja+n-1) contains the n-by-n lower triangular matrix L;
if m ≤ n, the elements on and below the (n-m)-th superdiagonal contain
the m-by-n lower trapezoidal matrix L; the remaining elements, with the
array tau, represent the orthogonal/ unitary matrix Q as a product of
elementary reflectors (see Application Notes below).
tau (local).
Array of size LOCc(ja+n-1). This array contains the scalar factors of the
elementary reflectors. tau is tied to the distributed matrix A.
info (local).
If info = 0, the execution is successful. If info < 0: if the i-th argument
is an array and the j-th entry, indexed j-1, had an illegal value, then info
= -(i*100+j); if the i-th argument is a scalar and had an illegal value,
then info = -i.
Application Notes
The matrix Q is represented as a product of elementary reflectors
Q = H(ja+k-1)*...*H(ja+1)*H(ja), where k = min(m,n).
Each H(i) has the form
H(i) = I - tau*v*v',
where tau is a real/complex scalar and v is a real/complex vector.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?geqr2
Computes a QR factorization of a general rectangular
matrix (unblocked algorithm).
Syntax
void psgeqr2 (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdgeqr2 (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );
void pcgeqr2 (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT
*info );
void pzgeqr2 (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT *lwork , MKL_INT
*info );
Include Files
• mkl_scalapack.h
Description
The p?geqr2 function computes a QR factorization of a real/complex distributed m-by-n matrix sub(A) =
A(ia:ia+m-1, ja:ja+n-1) = Q*R.
Input Parameters
m (global)
The number of rows in the distributed matrix sub(A). (m≥0).
a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).
On entry, this array contains the local pieces of the m-by-n distributed
matrix sub(A) which is to be factored.
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of sub(A), respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
work (local).
This is a workspace array of size lwork.
Output Parameters
a (local).
On exit, the elements on and above the diagonal of sub(A) contain the
min(m,n) by n upper trapezoidal matrix R (R is upper triangular if m≥n); the
elements below the diagonal, with the array tau, represent the orthogonal/
unitary matrix Q as a product of elementary reflectors (see Application
Notes below).
tau (local).
Array of size LOCc(ja+min(m,n)-1). This array contains the scalar factors of
the elementary reflectors. tau is tied to the distributed matrix A.
info (local)
If info = 0, the execution is successful. If info < 0:
if the i-th argument is an array and the j-th entry, indexed j-1, had an
illegal value, then info = -(i*100+j);
if the i-th argument is a scalar and had an illegal value, then info = -i.
Application Notes
The matrix Q is represented as a product of elementary reflectors
Q = H(ja)*H(ja+1)*...*H(ja+k-1), where k = min(m,n).
Each H(i) has the form
H(i) = I - tau*v*v',
where tau is a real/complex scalar, and v is a real/complex vector with v(1: i-1) = 0 and v(i) = 1; v(i+1: m)
is stored on exit in A(ia+i:ia+m-1, ja+i-1), and tau in tau[ja+i-2].
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?gerq2
Computes an RQ factorization of a general rectangular
matrix (unblocked algorithm).
Syntax
void psgerq2 (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdgerq2 (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );
void pcgerq2 (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT
*info );
void pzgerq2 (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT *lwork , MKL_INT
*info );
Include Files
• mkl_scalapack.h
Description
The p?gerq2 function computes an RQ factorization of a real/complex distributed m-by-n matrix sub(A) =
A(ia:ia+m-1, ja:ja+n-1) = R*Q.
Input Parameters
a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).
On entry, this array contains the local pieces of the m-by-n distributed
matrix sub(A) which is to be factored.
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of sub(A), respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
work (local).
This is a workspace array of size lwork.
Output Parameters
a (local).
On exit,
if m ≤ n, the upper triangle of A(ia:ia+m-1, ja+n-m:ja+n-1) contains the
m-by-m upper triangular matrix R;
if m ≥ n, the elements on and above the (m-n)-th subdiagonal contain the
m-by-n upper trapezoidal matrix R; the remaining elements, with the array
tau, represent the orthogonal/ unitary matrix Q as a product of elementary
reflectors (see Application Notes below).
tau (local).
Array of size LOCr(ia+m-1). This array contains the scalar factors of the
elementary reflectors. tau is tied to the distributed matrix A.
info (local)
If info = 0, the execution is successful.
If info < 0: if the i-th argument is an array and the j-th entry, indexed j-1,
had an illegal value, then info = -(i*100+j); if the i-th argument is a
scalar and had an illegal value, then info = -i.
Application Notes
The matrix Q is represented as a product of elementary reflectors
Q = H(ia)*H(ia+1)*...*H(ia+k-1) for real flavors,
Q = (H(ia))^H*(H(ia+1))^H*...*(H(ia+k-1))^H for complex flavors,
where k = min(m, n).
Each H(i) has the form
H(i) = I - tau*v*v',
where tau is a real/complex scalar, and v is a real/complex vector with v(n-k+i+1:n) = 0 and
v(n-k+i) = 1; v(1:n-k+i-1) for real flavors or conjg(v(1:n-k+i-1)) for complex flavors is stored
on exit in A(ia+m-k+i-1, ja:ja+n-k+i-2), and tau in tau[ia+m-k+i-2].
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?getf2
Computes an LU factorization of a general matrix,
using partial pivoting with row interchanges (local
blocked algorithm).
Syntax
void psgetf2 (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , MKL_INT *ipiv , MKL_INT *info );
void pdgetf2 (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , MKL_INT *ipiv , MKL_INT *info );
void pcgetf2 (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *ipiv , MKL_INT *info );
void pzgetf2 (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *ipiv , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?getf2 function computes an LU factorization of a general m-by-n distributed matrix sub(A) =
A(ia:ia+m-1, ja:ja+n-1) using partial pivoting with row interchanges.
The factorization has the form sub(A) = P*L*U, where P is a permutation matrix, L is lower triangular
with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Parallel Level 2 BLAS version of the algorithm.
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.
Notice revision #20201201
Input Parameters
m (global)
The number of rows in the distributed matrix sub(A). (m≥0).
a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).
On entry, this array contains the local pieces of the m-by-n distributed
matrix sub(A).
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
Output Parameters
ipiv (local)
Array of size (LOCr(m_a) + mb_a). This array contains the pivoting
information: ipiv[i] is the global row that local row (i+1) was swapped
with, i = 0, 1, ..., LOCr(m_a) + mb_a - 1. This array is tied to the
distributed matrix A.
info (local).
If info = 0: successful exit.
If info < 0:
• if the i-th argument is an array and the j-th entry, indexed j-1, had an
illegal value, then info = -(i*100+j),
• if the i-th argument is a scalar and had an illegal value, then info = -i.
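The meaning of the ipiv entries described above can be illustrated with a serial sketch that applies the recorded row interchanges, in order, to a vector. It ignores the data distribution entirely and assumes that all rows live on one process, so the global row numbers can be used directly as 1-based indices. The function apply_row_swaps is a hypothetical helper and not a oneMKL routine.

```c
/* Apply the row interchanges recorded in ipiv to the vector b.
   ipiv[i] is the 1-based row that row (i+1) was swapped with;
   npiv is the number of recorded interchanges. Serial illustration only. */
static void apply_row_swaps(int npiv, const int *ipiv, double *b)
{
    int i;

    for (i = 0; i < npiv; ++i) {
        int p = ipiv[i] - 1;       /* convert the 1-based row to a 0-based index */
        if (p != i) {
            double t = b[i];       /* swap rows i and p */
            b[i] = b[p];
            b[p] = t;
        }
    }
}
```

With ipiv = {3, 3, 3} and b = (10, 20, 30), the first step swaps rows 1 and 3, the second swaps rows 2 and 3, and the third leaves row 3 in place, giving (30, 10, 20).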
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?labrd
Reduces the first nb rows and columns of a general
rectangular matrix A to real bidiagonal form by an
orthogonal/unitary transformation, and returns
auxiliary matrices that are needed to apply the
transformation to the unreduced part of A.
Syntax
void pslabrd (MKL_INT *m , MKL_INT *n , MKL_INT *nb , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *d , float *e , float *tauq , float *taup , float *x ,
MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , float *y , MKL_INT *iy , MKL_INT *jy ,
MKL_INT *descy , float *work );
void pdlabrd (MKL_INT *m , MKL_INT *n , MKL_INT *nb , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *d , double *e , double *tauq , double *taup , double *x ,
MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , double *y , MKL_INT *iy , MKL_INT *jy ,
MKL_INT *descy , double *work );
void pclabrd (MKL_INT *m , MKL_INT *n , MKL_INT *nb , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , float *d , float *e , MKL_Complex8 *tauq , MKL_Complex8
*taup , MKL_Complex8 *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , MKL_Complex8 *y ,
MKL_INT *iy , MKL_INT *jy , MKL_INT *descy , MKL_Complex8 *work );
void pzlabrd (MKL_INT *m , MKL_INT *n , MKL_INT *nb , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *d , double *e , MKL_Complex16 *tauq ,
MKL_Complex16 *taup , MKL_Complex16 *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx ,
MKL_Complex16 *y , MKL_INT *iy , MKL_INT *jy , MKL_INT *descy , MKL_Complex16 *work );
Include Files
• mkl_scalapack.h
Description
The p?labrd function reduces the first nb rows and columns of a real/complex general m-by-n distributed
matrix sub(A) = A(ia:ia+m-1, ja:ja+n-1) to upper or lower bidiagonal form by an orthogonal/unitary
transformation Q'* A * P, and returns the matrices X and Y necessary to apply the transformation to the
unreduced part of sub(A).
If m ≥n, sub(A) is reduced to upper bidiagonal form; if m < n, sub(A) is reduced to lower bidiagonal form.
Input Parameters
m (global) The number of rows in the distributed matrix sub(A). (m≥ 0).
nb (global)
The number of leading rows and columns of sub(A) to be reduced.
a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).
On entry, this array contains the local pieces of the general distributed
matrix sub(A).
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
ix, jx (global) The row and column indices in the global matrix X indicating the
first row and the first column of the matrix sub(X), respectively.
descx (global and local) array of size dlen_. The array descriptor for the
distributed matrix X.
iy, jy (global) The row and column indices in the global matrix Y indicating the
first row and the first column of the matrix sub(Y), respectively.
descy (global and local) array of size dlen_. The array descriptor for the
distributed matrix Y.
work (local).
Workspace array of size lwork.
Output Parameters
a (local)
On exit, the first nb rows and columns of the matrix are overwritten; the
rest of the distributed matrix sub(A) is unchanged.
If m < n, elements below the diagonal in the first nb columns, with the
array tauq, represent the orthogonal/unitary matrix Q as a product of
elementary reflectors, and elements on and above the diagonal in the first
nb rows, with the array taup, represent the orthogonal/unitary matrix P as
a product of elementary reflectors. See Application Notes below.
d (local).
Array of size LOCr(ia+min(m,n)-1) if m ≥ n; LOCc(ja+min(m,n)-1)
otherwise. The distributed diagonal elements of the bidiagonal distributed
matrix B:
d[i] = A(ia+i, ja+i), i= 0, 1, ..., size (d)-1
d is tied to the distributed matrix A.
e (local).
Array of size LOCr(ia+min(m,n)-1) if m ≥ n; LOCc(ja+min(m,n)-2)
otherwise. The distributed off-diagonal elements of the bidiagonal
distributed matrix B:
if m ≥ n, e[i] = A(ia+i, ja+i+1) for i = 0, 1, ..., n-2;
x (local)
Pointer into the local memory to an array of size lld_x* nb. On exit, the
local pieces of the distributed m-by-nb matrix X(ix:ix+m-1, jx:jx+nb-1)
required to update the unreduced part of sub(A).
y (local).
Pointer into the local memory to an array of size lld_y* nb. On exit, the
local pieces of the distributed n-by-nb matrix Y(iy:iy+n-1, jy:jy+nb-1)
required to update the unreduced part of sub(A).
Application Notes
The matrices Q and P are represented as products of elementary reflectors:
Q = H(1)*H(2)*...*H(nb), and P = G(1)*G(2)*...*G(nb)
Each H(i) and G(i) has the form:
If m < n, v(1: i) = 0, v(i+1 ) = 1, and v(i+1:m) is stored on exit in
where a denotes an element of the original matrix which is unchanged, vi denotes an element of the vector
defining H(i), and ui an element of the vector defining G(i).
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?lacon
Estimates the 1-norm of a square matrix, using the
reverse communication for evaluating matrix-vector
products.
Syntax
void pslacon (MKL_INT *n , float *v , MKL_INT *iv , MKL_INT *jv , MKL_INT *descv , float
*x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , MKL_INT *isgn , float *est , MKL_INT
*kase );
void pdlacon (MKL_INT *n , double *v , MKL_INT *iv , MKL_INT *jv , MKL_INT *descv ,
double *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , MKL_INT *isgn , double *est ,
MKL_INT *kase );
void pclacon (MKL_INT *n , MKL_Complex8 *v , MKL_INT *iv , MKL_INT *jv , MKL_INT
*descv , MKL_Complex8 *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , float *est ,
MKL_INT *kase );
void pzlacon (MKL_INT *n , MKL_Complex16 *v , MKL_INT *iv , MKL_INT *jv , MKL_INT
*descv , MKL_Complex16 *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , double *est ,
MKL_INT *kase );
Include Files
• mkl_scalapack.h
Description
The p?lacon function estimates the 1-norm of a square, real/complex distributed matrix A. Reverse
communication is used for evaluating matrix-vector products. x and v are aligned with the distributed matrix
A; this information is implicitly contained within iv, ix, descv, and descx.
Input Parameters
v (local).
Pointer into the local memory to an array of size LOCr(n+mod(iv-1, mb_v)).
On the final return, v = A*w, where est = norm(v)/norm(w) (w is not
returned).
iv, jv (global) The row and column indices in the global matrix V indicating the
first row and the first column of the submatrix V, respectively.
descv (global and local) array of size dlen_. The array descriptor for the
distributed matrix V.
x (local).
Pointer into the local memory to an array of size LOCr(n+mod(ix-1, mb_x)).
ix, jx (global) The row and column indices in the global matrix X indicating the
first row and the first column of the submatrix X, respectively.
descx (global and local) array of size dlen_. The array descriptor for the
distributed matrix X.
isgn (local).
Array of size LOCr(n+mod(ix-1, mb_x)). isgn is aligned with x and v.
kase (local).
On the initial call to p?lacon, kase should be 0.
Output Parameters
x (local).
On an intermediate return, X should be overwritten by A*X if kase = 1, or by
A'*X if kase = 2, and p?lacon must be re-called with all the other
parameters unchanged.
est (global).
kase (local)
On an intermediate return, kase is 1 or 2, indicating whether X should be
overwritten by A*X, or A'*X. On the final return from p?lacon, kase is
again 0.
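The reverse-communication protocol described by kase can be sketched with a self-contained toy. The routine below is not the p?lacon algorithm: it is a hypothetical serial stand-in that requests A*e_j for every unit vector e_j and keeps the largest column sum, so it only ever returns kase = 1 (never kase = 2). What it shares with p?lacon is the calling pattern: start with kase = 0, perform the requested product on each intermediate return, call again, and stop when kase comes back as 0.

```c
/* State kept between calls of the toy routine (hypothetical). */
typedef struct {
    int    next_col;  /* next unit vector to request */
    double best;      /* largest column sum seen so far */
} toy_state;

/* One reverse-communication step. Set *kase = 0 before the first call.
   On return, *kase == 1 asks the caller to overwrite x with A*x and call
   again; *kase == 0 means *est holds the final value. */
static void toy_norm1_step(int n, double *x, double *est, int *kase, toy_state *st)
{
    int i;

    if (*kase == 0) {                    /* initial call: reset the state */
        st->next_col = 0;
        st->best = 0.0;
    } else {                             /* x now holds A*e_j for the previous j */
        double s = 0.0;
        for (i = 0; i < n; ++i)
            s += (x[i] < 0.0) ? -x[i] : x[i];
        if (s > st->best)
            st->best = s;
    }
    if (st->next_col == n) {             /* every column was visited: final return */
        *est = st->best;
        *kase = 0;
        return;
    }
    for (i = 0; i < n; ++i)              /* hand back the next unit vector */
        x[i] = 0.0;
    x[st->next_col++] = 1.0;
    *kase = 1;
}

/* Caller-side loop for a dense column-major n-by-n matrix a, n <= 16. */
static double toy_norm1(int n, const double *a)
{
    double x[16], y[16], est = 0.0;
    toy_state st;
    int kase = 0, i, j;

    for (;;) {
        toy_norm1_step(n, x, &est, &kase, &st);
        if (kase == 0)
            break;                       /* final return */
        for (i = 0; i < n; ++i) {        /* the product requested by kase == 1 */
            y[i] = 0.0;
            for (j = 0; j < n; ++j)
                y[i] += a[i + j * n] * x[j];
        }
        for (i = 0; i < n; ++i)
            x[i] = y[i];
    }
    return est;
}
```

The design point is that the estimator never sees the matrix. The caller owns A and supplies only products, which is why the same pattern works when A is available only implicitly, for example through its factorization.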
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?laconsb
Looks for two consecutive small subdiagonal elements.
Syntax
void pslaconsb (const float *a, const MKL_INT *desca, const MKL_INT *i, const MKL_INT
*l, MKL_INT *m, const float *h44, const float *h33, const float *h43h34, float *buf,
const MKL_INT *lwork );
void pdlaconsb (const double *a, const MKL_INT *desca, const MKL_INT *i, const MKL_INT
*l, MKL_INT *m, const double *h44, const double *h33, const double *h43h34, double *buf,
const MKL_INT *lwork );
void pclaconsb (const MKL_Complex8 *a , const MKL_INT *desca , const MKL_INT *i , const
MKL_INT *l , MKL_INT *m , const MKL_Complex8 *h44 , const MKL_Complex8 *h33 , const
MKL_Complex8 *h43h34 , MKL_Complex8 *buf , const MKL_INT *lwork );
void pzlaconsb (const MKL_Complex16 *a , const MKL_INT *desca , const MKL_INT *i ,
const MKL_INT *l , MKL_INT *m , const MKL_Complex16 *h44 , const MKL_Complex16 *h33 ,
const MKL_Complex16 *h43h34 , MKL_Complex16 *buf , const MKL_INT *lwork );
Include Files
• mkl_scalapack.h
Description
The p?laconsb function looks for two consecutive small subdiagonal elements by analyzing the effect of
starting a double shift QR iteration given by h44, h33, and h43h34 to see if this process makes a subdiagonal
negligible.
Input Parameters
a (local)
Array of size lld_a*LOCc(n_a). On entry, the Hessenberg matrix whose
tridiagonal part is being scanned. Unchanged on exit.
i (global)
The global location of the bottom of the unreduced submatrix of A.
Unchanged on exit.
l (global)
The global location of the top of the unreduced submatrix of A. Unchanged
on exit.
lwork (local)
This must be at least 7*ceil(ceil((i-l)/mb_a)/lcm(nprow, npcol)).
Here lcm is the least common multiple and nprow*npcol is the
logical grid size.
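The lower bound on lwork can be evaluated with integer arithmetic. The following sketch assumes i ≥ l, mb_a > 0, and positive grid dimensions, and interprets both ceil operations as ceilings of real-valued quotients. The helper names are hypothetical.

```c
/* Greatest common divisor (Euclid). */
static int gcd_int(int a, int b)
{
    while (b != 0) {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Least common multiple of two positive integers. */
static int lcm_int(int a, int b)
{
    return a / gcd_int(a, b) * b;
}

/* Ceiling of a/b for a >= 0 and b > 0. */
static int ceil_div(int a, int b)
{
    return (a + b - 1) / b;
}

/* Smallest lwork allowed by the bound 7*ceil(ceil((i-l)/mb_a)/lcm(nprow, npcol)). */
static int laconsb_min_lwork(int i, int l, int mb_a, int nprow, int npcol)
{
    int blocks = ceil_div(i - l, mb_a);
    return 7 * ceil_div(blocks, lcm_int(nprow, npcol));
}
```

For i = 100, l = 1, mb_a = 8 on a 2-by-3 grid, the inner ceiling is 13, lcm(2, 3) = 6, the outer ceiling is 3, and the bound is 21.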
Output Parameters
m (global). On exit, this yields the starting location of the QR double shift.
This will satisfy:
l ≤ m ≤ i-2.
buf (local).
Array of size lwork.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?lacp2
Copies all or part of a distributed matrix to another
distributed matrix.
Syntax
void pslacp2 (char *uplo , MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb );
void pdlacp2 (char *uplo , MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb );
void pclacp2 (char *uplo , MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb );
void pzlacp2 (char *uplo , MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb );
Include Files
• mkl_scalapack.h
Description
The p?lacp2 function copies all or part of a distributed matrix A to another distributed matrix B. No
communication is performed; p?lacp2 performs a local copy sub(B) := sub(A), where sub(A) denotes
A(ia:ia+m-1, ja:ja+n-1) and sub(B) denotes B(ib:ib+m-1, jb:jb+n-1).
p?lacp2 requires that only one dimension of the matrix operands is distributed.
Input Parameters
uplo (global) Specifies the part of the distributed matrix sub(A) to be copied:
= 'U': Upper triangular part is copied; the strictly lower triangular part of
sub(A) is not referenced;
= 'L': Lower triangular part is copied; the strictly upper triangular part of
sub(A) is not referenced.
Otherwise: all of the matrix sub(A) is copied.
m (global)
The number of rows in the distributed matrix sub(A). (m ≥ 0).
n (global)
The number of columns in the distributed matrix sub(A). (n ≥ 0).
a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).
On entry, this array contains the local pieces of the m-by-n distributed
matrix sub(A).
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of sub(A), respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of sub(B), respectively.
descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.
Output Parameters
b (local).
Pointer into the local memory to an array of size lld_b * LOCc(jb+n-1).
This array contains on exit the local pieces of the distributed matrix sub( B )
set as follows:
if uplo = 'U', B(ib+i-1, jb+j-1) = A(ia+i-1, ja+j-1), 1≤i≤j, 1≤j≤n;
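The effect of uplo can be illustrated with a serial sketch on ordinary column-major arrays. It copies the upper triangular part, the lower triangular part, or the whole matrix, following the uplo description in the input parameters. The function copy_part is a hypothetical helper that ignores the distribution and the array descriptors.

```c
/* Copy the part of the m-by-n column-major matrix a selected by uplo into b.
   lda and ldb are the leading dimensions of a and b. Serial illustration only. */
static void copy_part(char uplo, int m, int n,
                      const double *a, int lda, double *b, int ldb)
{
    int i, j;

    for (j = 0; j < n; ++j) {
        int first = 0, last = m;                 /* rows [first, last) of column j */
        if (uplo == 'U')
            last = (j + 1 < m) ? j + 1 : m;      /* rows on or above the diagonal */
        else if (uplo == 'L')
            first = (j < m) ? j : m;             /* rows on or below the diagonal */
        for (i = first; i < last; ++i)
            b[i + j * ldb] = a[i + j * lda];
    }
}
```

Entries of b outside the selected part are left untouched, which matches the statement that the other triangle of sub(A) is not referenced.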
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?lacp3
Copies from a global parallel array into a local
replicated array or vice versa.
Syntax
void pslacp3 (const MKL_INT *m, const MKL_INT *i, float *a, const MKL_INT *desca, float
*b, const MKL_INT *ldb, const MKL_INT *ii, const MKL_INT *jj, const MKL_INT *rev );
void pdlacp3 (const MKL_INT *m, const MKL_INT *i, double *a, const MKL_INT *desca,
double *b, const MKL_INT *ldb, const MKL_INT *ii, const MKL_INT *jj, const MKL_INT
*rev );
void pclacp3 (const MKL_INT *m, const MKL_INT *i, MKL_Complex8 *a, const MKL_INT
*desca, MKL_Complex8 *b, const MKL_INT *ldb, const MKL_INT *ii, const MKL_INT *jj,
const MKL_INT *rev);
void pzlacp3 (const MKL_INT *m, const MKL_INT *i, MKL_Complex16 *a, const MKL_INT
*desca, MKL_Complex16 *b, const MKL_INT *ldb, const MKL_INT *ii, const MKL_INT *jj,
const MKL_INT *rev);
Include Files
• mkl_scalapack.h
Description
This is an auxiliary function that copies from a global parallel array into a local replicated array or vice versa.
Note that the entire submatrix that is copied gets placed on one node or more. The receiving node can be
specified precisely, or all nodes can receive, or just one row or column of nodes.
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/
PerformanceIndex.
Notice revision #20201201
Input Parameters
m (global)
The order of the square submatrix that is copied.
m ≥ 0. Unchanged on exit.
i (global) The matrix element A(i, i) is the global location that the copying
starts from. Unchanged on exit.
a (local)
Array of size lld_a*LOCc(n_a). On entry, the parallel matrix to be copied
into or from.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
b (local)
Array of size ldb*LOCc(m). If rev = 0, this is the global portion of the
matrix A(i:i+m-1, i:i+m-1). If rev = 1, this is unchanged on exit.
ldb (local) The leading dimension of b.
ii, jj (global) By using rev 0 and 1, data can be sent out and returned again. If
rev = 0, then ii is the destination row index and jj is the destination column
index for the node(s) receiving the replicated matrix B. If ii ≥ 0, jj ≥ 0, then
node (ii, jj) receives the data. If ii = -1, jj ≥ 0, then all rows in column jj
receive the data. If ii ≥ 0, jj = -1, then all columns in row ii receive the data.
If ii = -1, jj = -1, then all nodes receive the data. If rev != 0, then ii is the
source row index for the node(s) sending the replicated B.
rev (global) Use rev = 0 to send the global matrix A into the locally replicated
matrix B (on node (ii, jj)). Use rev != 0 to send the locally replicated B from
node (ii, jj) to its owner (which changes depending on its location in A) into the
global A.
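The ii/jj selection rule for rev = 0 can be sketched as a small predicate. The helper name lacp3_receives and the myrow/mycol arguments (the calling process's grid coordinates) are hypothetical and only illustrate the rule stated above; they are not part of the library.

```c
#include <stdbool.h>

/* Hypothetical helper (not an MKL routine): returns true when the process
 * at grid coordinates (myrow, mycol) is a receiver for rev = 0, following
 * the ii/jj rules stated above. A value of -1 acts as a wildcard. */
bool lacp3_receives(int ii, int jj, int myrow, int mycol)
{
    bool row_matches = (ii == -1) || (ii == myrow);
    bool col_matches = (jj == -1) || (jj == mycol);
    return row_matches && col_matches;
}
```

For example, lacp3_receives(-1, 2, myrow, mycol) is true on every process of grid column 2 and false elsewhere.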
Output Parameters
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?lacpy
Copies all or part of one two-dimensional array to
another.
Syntax
void pslacpy (char *uplo , MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb );
void pdlacpy (char *uplo , MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb );
void pclacpy (char *uplo , MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb );
void pzlacpy (char *uplo , MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb );
Include Files
• mkl_scalapack.h
Description
The p?lacpy function copies all or part of a distributed matrix A to another distributed matrix B. No
communication is performed; p?lacpy performs a local copy sub(B) := sub(A), where sub(A) denotes
A(ia:ia+m-1, ja:ja+n-1) and sub(B) denotes B(ib:ib+m-1, jb:jb+n-1).
Input Parameters
uplo (global) Specifies the part of the distributed matrix sub(A) to be copied:
= 'U': Upper triangular part; the strictly lower triangular part of sub(A) is
not referenced;
= 'L': Lower triangular part; the strictly upper triangular part of sub(A) is
not referenced.
Otherwise: all of the matrix sub(A) is copied.
m (global)
The number of rows in the distributed matrix sub(A). (m≥0).
n (global)
The number of columns in the distributed matrix sub(A). (n≥0).
a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).
On entry, this array contains the local pieces of the distributed matrix
sub(A).
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of sub(B) respectively.
descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.
Output Parameters
b (local).
Pointer into the local memory to an array of size lld_b * LOCc(jb+n-1).
This array contains on exit the local pieces of the distributed matrix sub(B)
set as follows:
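if uplo = 'U', B(ib+i-1, jb+j-1) = A(ia+i-1, ja+j-1), 1≤i≤j, 1≤j≤n;
if uplo = 'L', B(ib+i-1, jb+j-1) = A(ia+i-1, ja+j-1), j≤i≤m, 1≤j≤n;
otherwise, B(ib+i-1, jb+j-1) = A(ia+i-1, ja+j-1), 1≤i≤m, 1≤j≤n.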
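As a single-process analogue of the uplo rule, the following sketch copies the selected part of a column-major array. The function copy_part and its lda/ldb arguments are hypothetical and stand in for the distributed arrays and descriptors; it does not call p?lacpy.

```c
#include <stddef.h>

/* Hypothetical single-process analogue of the uplo rule (not an MKL call).
 * a and b are column-major with leading dimensions lda and ldb. */
void copy_part(char uplo, int m, int n,
               const double *a, int lda, double *b, int ldb)
{
    for (int j = 0; j < n; ++j) {
        int first = 0, last = m;             /* full column by default */
        if (uplo == 'U') {
            last = (j + 1 < m) ? j + 1 : m;  /* rows 0..min(j, m-1) */
        } else if (uplo == 'L') {
            first = (j < m) ? j : m;         /* rows j..m-1 */
        }
        for (int i = first; i < last; ++i)
            b[i + (size_t)j * ldb] = a[i + (size_t)j * lda];
    }
}
```

Entries of b outside the selected part are left untouched, which mirrors the "not referenced" wording for the opposite triangle.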
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?laevswp
Moves the eigenvectors from where they are
computed to ScaLAPACK standard block cyclic array.
Syntax
void pslaevswp (MKL_INT *n , float *zin , MKL_INT *ldzi , float *z , MKL_INT *iz ,
MKL_INT *jz , MKL_INT *descz , MKL_INT *nvs , MKL_INT *key , float *work , MKL_INT
*lwork );
void pdlaevswp (MKL_INT *n , double *zin , MKL_INT *ldzi , double *z , MKL_INT *iz ,
MKL_INT *jz , MKL_INT *descz , MKL_INT *nvs , MKL_INT *key , double *work , MKL_INT
*lwork );
void pclaevswp (MKL_INT *n , float *zin , MKL_INT *ldzi , MKL_Complex8 *z , MKL_INT
*iz , MKL_INT *jz , MKL_INT *descz , MKL_INT *nvs , MKL_INT *key , float *rwork ,
MKL_INT *lrwork );
void pzlaevswp (MKL_INT *n , double *zin , MKL_INT *ldzi , MKL_Complex16 *z , MKL_INT
*iz , MKL_INT *jz , MKL_INT *descz , MKL_INT *nvs , MKL_INT *key , double *rwork ,
MKL_INT *lrwork );
Include Files
• mkl_scalapack.h
Description
The p?laevswp function moves the eigenvectors (potentially unsorted) from where they are computed, to a
ScaLAPACK standard block cyclic array, sorted so that the corresponding eigenvalues are sorted.
Input Parameters
np = the number of rows local to a given process.
nq = the number of columns local to a given process.
n (global)
The order of the matrix A. n ≥ 0.
zin (local).
Array of size ldzi * nvs[iam+1]. The eigenvectors on input. iam is a process
rank from [0, nprocs) interval. Each eigenvector resides entirely in one
process. Each process holds a contiguous set of nvs[iam+1] eigenvectors.
The global number of the first eigenvector that the process holds is: ((sum
for i=[0, iam] of nvs[i])+1).
ldzi (local)
The leading dimension of the zin array.
iz, jz (global) The row and column indices in the global matrix Z indicating the
first row and the first column of the submatrix Z, respectively.
nvs (global)
Array of size nprocs+1
nvs[i] = number of eigenvectors held by processes [0, i)
nvs[0] = number of eigenvectors held by processes [0, 0) = 0
nvs[nprocs]= number of eigenvectors held by [0, nprocs)= total number of
eigenvectors.
key (global)
Array of size n. Indicates the actual index (after sorting) for each of the
eigenvectors.
rwork (local).
Array of size lrwork.
lrwork (local)
Size of rwork.
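The nvs array can be read as a running total of eigenvector counts. The sketch below builds it from per-process counts and derives the first global eigenvector number for a process. The helper names and the counts array are hypothetical, and the sketch follows the cumulative definition given in the nvs entry; it is an interpretation for illustration, not taken from the library source.

```c
/* Hypothetical helper: fill nvs[0..nprocs] from per-process counts so that
 * nvs[i] is the number of eigenvectors held by processes [0, i). */
void build_nvs(int nprocs, const int *counts, int *nvs)
{
    nvs[0] = 0;
    for (int p = 0; p < nprocs; ++p)
        nvs[p + 1] = nvs[p] + counts[p];
}

/* 1-based global number of the first eigenvector held by process iam,
 * assuming the cumulative reading of nvs used in build_nvs. */
int first_global_eigvec(const int *nvs, int iam)
{
    return nvs[iam] + 1;
}
```

With this reading, nvs[nprocs] equals the total number of eigenvectors, as stated in the nvs entry.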
Output Parameters
z (local).
Array of global size n * n and of local size lld_z * nq. The eigenvectors on
output. The eigenvectors are distributed in a block cyclic manner in both
dimensions, with a block size of nb.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?lahrd
Reduces the first nb columns of a general rectangular
matrix A so that elements below the k-th subdiagonal
are zero, by an orthogonal/unitary transformation,
and returns auxiliary matrices that are needed to
apply the transformation to the unreduced part of A.
Syntax
void pslahrd (MKL_INT *n , MKL_INT *k , MKL_INT *nb , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *tau , float *t , float *y , MKL_INT *iy , MKL_INT *jy ,
MKL_INT *descy , float *work );
void pdlahrd (MKL_INT *n , MKL_INT *k , MKL_INT *nb , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *tau , double *t , double *y , MKL_INT *iy , MKL_INT *jy ,
MKL_INT *descy , double *work );
void pclahrd (MKL_INT *n , MKL_INT *k , MKL_INT *nb , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *t , MKL_Complex8 *y ,
MKL_INT *iy , MKL_INT *jy , MKL_INT *descy , MKL_Complex8 *work );
void pzlahrd (MKL_INT *n , MKL_INT *k , MKL_INT *nb , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *t , MKL_Complex16
*y , MKL_INT *iy , MKL_INT *jy , MKL_INT *descy , MKL_Complex16 *work );
Include Files
• mkl_scalapack.h
Description
The p?lahrd function reduces the first nb columns of a real general n-by-(n-k+1) distributed matrix
A(ia:ia+n-1, ja:ja+n-k) so that elements below the k-th subdiagonal are zero. The reduction is performed by
an orthogonal/unitary similarity transformation Q'*A*Q. The function returns the matrices V and T which
determine Q as a block reflector I-V*T*V', and also the matrix Y = A*V*T.
This is an auxiliary function called by p?gehrd. In the following comments sub(A) denotes
A(ia:ia+n-1, ja:ja+n-1).
Input Parameters
n (global)
The order of the distributed matrix sub(A). n ≥ 0.
k (global)
The offset for the reduction. Elements below the k-th subdiagonal in the
first nb columns are reduced to zero.
nb (global)
The number of columns to be reduced.
a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-k). On
entry, this array contains the local pieces of the n-by-(n-k+1) general
distributed matrix A(ia:ia+n-1, ja:ja+n-k).
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
iy, jy (global) The row and column indices in the global matrix Y indicating the
first row and the first column of the matrix sub(Y), respectively.
descy (global and local) array of size dlen_. The array descriptor for the
distributed matrix Y.
work (local).
Array of size nb.
Output Parameters
a (local).
On exit, the elements on and above the k-th subdiagonal in the first nb
columns are overwritten with the corresponding elements of the reduced
distributed matrix; the elements below the k-th subdiagonal, with the array
tau, represent the matrix Q as a product of elementary reflectors. The other
columns of the matrix A(ia:ia+n-1, ja:ja+n-k) are unchanged. (See
Application Notes below.)
tau (local)
Array of size LOCc(ja+n-2). The scalar factors of the elementary reflectors
(see Application Notes below). tau is tied to the distributed matrix A.
t (local)
Array of size nb_a * nb_a. The upper triangular matrix T.
y (local).
Pointer into the local memory to an array of size lld_y * nb_a. On exit, this
array contains the local pieces of the n-by-nb distributed matrix Y. lld_y ≥
LOCr(ia+n-1).
Application Notes
The matrix Q is represented as a product of nb elementary reflectors
Q = H(1)*H(2)*...*H(nb).
Each H(i) has the form
H(i) = I-tau*v*v',
where tau is a real/complex scalar, and v is a real/complex vector with v(1:i+k-1) = 0, v(i+k) = 1;
v(i+k+1:n) is stored on exit in A(ia+i+k:ia+n-1, ja+i-1), and tau in tau[ja+i-2].
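To make the form of an elementary reflector concrete, the sketch below applies H = I - tau*v*v' to a vector in the real case. The function apply_reflector is a hypothetical single-process helper and is not how p?lahrd applies the transformation to distributed data.

```c
/* Apply H = I - tau * v * v' to x in place (real case, length n).
 * Hypothetical helper for illustration; not an MKL routine. */
void apply_reflector(int n, double tau, const double *v, double *x)
{
    double dot = 0.0;
    for (int i = 0; i < n; ++i)
        dot += v[i] * x[i];          /* v' * x */
    for (int i = 0; i < n; ++i)
        x[i] -= tau * dot * v[i];    /* x - tau * v * (v' * x) */
}
```

Forming v' * x first keeps the cost at O(n) per vector; the matrix H is never built explicitly.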
The elements of the vectors v together form the (n-k+1)-by-nb matrix V which is needed, with T and Y, to
apply the transformation to the unreduced part of the matrix, using an update of the form: A(ia:ia+n-1,
ja:ja+n-k) := (I-V*T*V')*(A(ia:ia+n-1, ja:ja+n-k)-Y*V'). The contents of A(ia:ia+n-1, ja:ja+n-k) on exit
are illustrated by the following example with n = 7, k = 3, and nb = 2:
where a denotes an element of the original matrix A(ia:ia+n-1, ja:ja+n-k), h denotes a modified element
of the upper Hessenberg matrix H, and vi denotes an element of the vector defining H(i).
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?laiect
Exploits IEEE arithmetic to accelerate the
computations of eigenvalues.
Syntax
void pslaiect (float *sigma , MKL_INT *n , float *d , MKL_INT *count );
void pdlaiectb (float *sigma , MKL_INT *n , float *d , MKL_INT *count );
void pdlaiectl (float *sigma , MKL_INT *n , float *d , MKL_INT *count );
Include Files
• mkl_scalapack.h
Description
The p?laiect function computes the number of negative eigenvalues of (A - σI). This implementation of the
Sturm Sequence loop exploits IEEE arithmetic and has no conditionals in the innermost loop. The signbit for
real function pslaiect is assumed to be bit 32. Double-precision functions pdlaiectb and pdlaiectl differ
in the order of the double precision word storage and, consequently, in the signbit location. For pdlaiectb,
the double precision word is stored in the big-endian word order and the signbit is assumed to be bit 32. For
pdlaiectl, the double precision word is stored in the little-endian word order and the signbit is assumed to
be bit 64.
This is a ScaLAPACK internal function and arguments are not checked for unreasonable values.
Input Parameters
sigma The shift. p?laiect finds the number of eigenvalues less than or equal to
sigma.
n The order of the tridiagonal matrix T. n≥ 1.
d Array of size 2n-1. On entry, this array contains the diagonals and the squares of the
off-diagonal elements of the tridiagonal matrix T. These elements are assumed
to be interleaved in memory for better cache performance. The diagonal
entries of T are in the entries d[0], d[2],..., d[2n-2], while the
squares of the off-diagonal entries are d[1], d[3], ..., d[2n-3]. To
avoid overflow, the matrix must be scaled so that its largest entry is no
greater than overflow^(1/2) * underflow^(1/4) in absolute value, and for
greatest accuracy, it should not be much smaller than that.
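The interleaved layout and the branch-free counting loop can be sketched in portable C. The function sturm_count is a hypothetical illustration that uses the standard signbit macro instead of the bit-position tests mentioned in the description, and works in double precision; it is not the library's implementation.

```c
#include <math.h>

/* Portable sketch of a Sturm-sequence count on the interleaved layout
 * described above: d[2*i] holds diagonal entry i, and d[2*i+1] holds the
 * squared off-diagonal entry between rows i and i+1. */
int sturm_count(double sigma, int n, const double *d)
{
    double tmp = d[0] - sigma;
    int count = signbit(tmp) ? 1 : 0;
    for (int i = 1; i < n; ++i) {
        /* Under IEEE arithmetic a zero pivot produces an infinity here,
         * so the loop body needs no special-case test. */
        tmp = d[2 * i] - d[2 * i - 1] / tmp - sigma;
        count += signbit(tmp) ? 1 : 0;
    }
    return count;
}
```

For the 2-by-2 matrix with diagonal entries 2, 2 and off-diagonal entry 1 (eigenvalues 1 and 3), d is {2, 1, 2} and a shift of 2.5 gives a count of 1.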
Output Parameters
count The number of eigenvalues of T less than or equal to sigma.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?lamve
Copies all or part of one two-dimensional distributed
array to another.
Syntax
void pslamve(char* uplo, MKL_INT* m, MKL_INT* n, float* a, MKL_INT* ia, MKL_INT* ja,
MKL_INT* desca, float* b, MKL_INT* ib, MKL_INT* jb, MKL_INT* descb, float* dwork);
void pdlamve(char* uplo, MKL_INT* m, MKL_INT* n, double* a, MKL_INT* ia, MKL_INT* ja,
MKL_INT* desca, double* b, MKL_INT* ib, MKL_INT* jb, MKL_INT* descb, double* dwork);
Include Files
• mkl_scalapack.h
Description
p?lamve copies all or part of a distributed matrix A to another distributed matrix B. There are no alignment
assumptions at all except that A and B are of the same size.
Input Parameters
uplo (global )
Specifies the part of the distributed matrix sub( A ) to be copied:
= 'U': Upper triangular part is copied; the strictly lower triangular part of
sub( A ) is not referenced;
= 'L': Lower triangular part is copied; the strictly upper triangular part of
sub( A ) is not referenced;
m (global )
The number of rows to be operated on, which is the number of rows of the
distributed matrix sub( A ). m≥ 0.
n (global )
The number of columns to be operated on, which is the number of columns
of the distributed matrix sub( A ). n≥ 0.
a (local) Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).
This array contains the local pieces of the distributed matrix sub( A ) to be
copied from.
ia (global )
The row index in the global matrix A indicating the first row of sub( A ).
ja (global )
The column index in the global matrix A indicating the first column of
sub( A ).
ib (global )
The row index in the global matrix B indicating the first row of sub( B ).
jb (global )
The column index in the global matrix B indicating the first column of
sub( B ).
Output Parameters
b (local) Pointer into the local memory to an array of size lld_b * LOCc(jb+n-1).
This array contains on exit the local pieces of the distributed matrix
sub( B ).
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?lange
Returns the value of the 1-norm, Frobenius norm,
infinity-norm, or the largest absolute value of any
element, of a general rectangular matrix.
Syntax
float pslange (char *norm , MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *work );
double pdlange (char *norm , MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *work );
float pclange (char *norm , MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , float *work );
double pzlange (char *norm , MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *work );
Include Files
• mkl_scalapack.h
Description
The p?lange function returns the value of the 1-norm, or the Frobenius norm, or the infinity norm, or the
element of largest absolute value of a distributed matrix sub(A) = A(ia:ia+m-1, ja:ja+n-1).
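For reference, three of these quantities are sketched below for a small column-major matrix held by one process: the largest absolute value, the 1-norm (maximum column sum), and the infinity norm (maximum row sum). The function names are hypothetical; the Frobenius norm (square root of the sum of squares) is left out only to keep the sketch free of math-library calls.

```c
static double abs_val(double x) { return x < 0 ? -x : x; }

/* Largest absolute value of any element of an m-by-n column-major matrix. */
double norm_max(int m, int n, const double *a, int lda)
{
    double val = 0.0;
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            double t = abs_val(a[i + j * lda]);
            if (t > val) val = t;
        }
    return val;
}

/* 1-norm: maximum over columns of the sum of absolute values. */
double norm_one(int m, int n, const double *a, int lda)
{
    double val = 0.0;
    for (int j = 0; j < n; ++j) {
        double sum = 0.0;
        for (int i = 0; i < m; ++i)
            sum += abs_val(a[i + j * lda]);
        if (sum > val) val = sum;
    }
    return val;
}

/* Infinity norm: maximum over rows of the sum of absolute values. */
double norm_inf(int m, int n, const double *a, int lda)
{
    double val = 0.0;
    for (int i = 0; i < m; ++i) {
        double sum = 0.0;
        for (int j = 0; j < n; ++j)
            sum += abs_val(a[i + j * lda]);
        if (sum > val) val = sum;
    }
    return val;
}
```

All three return zero when m or n is zero, matching the behavior described for empty submatrices.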
Input Parameters
m (global)
The number of rows in the distributed matrix sub(A). When m = 0,
p?lange is set to zero. m ≥ 0.
n (global)
The number of columns in the distributed matrix sub(A). When n = 0,
p?lange is set to zero. n ≥ 0.
a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1)
containing the local pieces of the distributed matrix sub(A).
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
work (local).
Array of size lwork.
where
iroffa = mod(ia-1, mb_a), icoffa = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, myrow, rsrc_a, nprow),
iacol = indxg2p(ja, nb_a, mycol, csrc_a, npcol),
mp0 = numroc(m+iroffa, mb_a, myrow, iarow, nprow),
nq0 = numroc(n+icoffa, nb_a, mycol, iacol, npcol),
indxg2p and numroc are ScaLAPACK tool functions; myrow, mycol, nprow,
and npcol can be determined by calling the function blacs_gridinfo.
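The quantities above can be illustrated without a running process grid. The sketch below re-implements the block-cyclic counting rule and evaluates mp0 for one process row. The functions numroc_sketch, indxg2p_sketch, and mp0_sketch are hypothetical stand-ins written for this example (indxg2p_sketch drops the unused process-coordinate argument); production code should call the ScaLAPACK tool functions numroc and indxg2p.

```c
/* Number of rows or columns of a block-cyclically distributed dimension of
 * size n (block size nb) that land on process iproc, when the first block
 * is owned by isrcproc and there are nprocs processes. */
int numroc_sketch(int n, int nb, int iproc, int isrcproc, int nprocs)
{
    int mydist    = (nprocs + iproc - isrcproc) % nprocs; /* distance from source */
    int nblocks   = n / nb;                               /* complete blocks */
    int count     = (nblocks / nprocs) * nb;              /* full rounds */
    int extrablks = nblocks % nprocs;
    if (mydist < extrablks)
        count += nb;          /* one extra complete block */
    else if (mydist == extrablks)
        count += n % nb;      /* the trailing partial block */
    return count;
}

/* Process coordinate that owns the 1-based global index indxglob. */
int indxg2p_sketch(int indxglob, int nb, int isrcproc, int nprocs)
{
    return (isrcproc + (indxglob - 1) / nb) % nprocs;
}

/* mp0 as defined by the formulas above, for process row myrow. */
int mp0_sketch(int m, int ia, int mb_a, int myrow, int rsrc_a, int nprow)
{
    int iroffa = (ia - 1) % mb_a;
    int iarow  = indxg2p_sketch(ia, mb_a, rsrc_a, nprow);
    return numroc_sketch(m + iroffa, mb_a, myrow, iarow, nprow);
}
```

For 10 rows in blocks of 2 over 3 process rows starting at row 0, the local counts are 4, 4, and 2.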
Output Parameters
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?lanhs
Returns the value of the 1-norm, Frobenius norm,
infinity-norm, or the largest absolute value of any
element, of an upper Hessenberg matrix.
Syntax
float pslanhs (char *norm , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *work );
double pdlanhs (char *norm , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , double *work );
float pclanhs (char *norm , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , float *work );
double pzlanhs (char *norm , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *work );
Include Files
• mkl_scalapack.h
Description
The p?lanhs function returns the value of the 1-norm, or the Frobenius norm, or the infinity norm, or the
element of largest absolute value of an upper Hessenberg distributed matrix sub(A) = A(ia:ia+n-1,
ja:ja+n-1).
Input Parameters
norm (global) Specifies the value to be returned by the function:
= 'M' or 'm': val = max(abs(Aij)), largest absolute value of the matrix A,
= '1' or 'O' or 'o': val = norm1(A), 1-norm of the matrix A (maximum column sum),
= 'I' or 'i': val = normI(A), infinity norm of the matrix A (maximum row sum),
= 'F', 'f', 'E' or 'e': val = normF(A), Frobenius norm of the matrix A (square root of sum of squares).
n (global)
The number of columns in the distributed matrix sub(A). When n =
0, p?lanhs is set to zero. n≥ 0.
a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1)
containing the local pieces of the distributed matrix sub(A).
ia, ja (global)
The row and column indices in the global matrix A indicating the first row
and the first column of the matrix sub(A), respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
work (local).
Array of size lwork.
lwork ≥ 0 if norm = 'M' or 'm' (not referenced),
where
iroffa = mod( ia-1, mb_a ), icoffa = mod( ja-1, nb_a ),
indxg2p and numroc are ScaLAPACK tool functions; myrow, mycol, nprow,
and npcol can be determined by calling the function blacs_gridinfo.
Output Parameters
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?lansy, p?lanhe
Returns the value of the 1-norm, Frobenius norm,
infinity-norm, or the largest absolute value of any
element, of a real symmetric or a complex Hermitian
matrix.
Syntax
float pslansy (char *norm , char *uplo , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *work );
double pdlansy (char *norm , char *uplo , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *work );
float pclansy (char *norm , char *uplo , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , float *work );
double pzlansy (char *norm , char *uplo , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *work );
float pclanhe (char *norm , char *uplo , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , float *work );
double pzlanhe (char *norm , char *uplo , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *work );
Include Files
• mkl_scalapack.h
Description
The p?lansy and p?lanhe functions return the value of the 1-norm, or the Frobenius norm, or the infinity
norm, or the element of largest absolute value of a distributed matrix sub(A) = A(ia:ia+n-1,
ja:ja+n-1).
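Because the matrix is symmetric, a norm can be evaluated from one triangle only. The sketch below computes the 1-norm of a real symmetric matrix while reading just the upper triangle of a column-major array, as in the uplo = 'U' case. The function sym_norm_one_upper is a hypothetical single-process illustration, not the distributed algorithm.

```c
static double abs_val(double x) { return x < 0 ? -x : x; }

/* 1-norm of an n-by-n real symmetric matrix, reading only the upper
 * triangle of the column-major array a (leading dimension lda). */
double sym_norm_one_upper(int n, const double *a, int lda)
{
    double val = 0.0;
    for (int j = 0; j < n; ++j) {
        double sum = 0.0;
        for (int i = 0; i <= j; ++i)
            sum += abs_val(a[i + j * lda]);   /* stored part of column j */
        for (int i = j + 1; i < n; ++i)
            sum += abs_val(a[j + i * lda]);   /* mirrored from row j */
        if (sum > val) val = sum;
    }
    return val;
}
```

Entries in the strictly lower triangle are never read, so they may hold arbitrary values.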
Input Parameters
uplo (global) Specifies whether the upper or lower triangular part of the
symmetric matrix sub(A) is to be referenced.
= 'U': Upper triangular part of sub(A) is referenced,
= 'L': Lower triangular part of sub(A) is referenced.
n (global)
The number of columns in the distributed matrix sub(A). When n = 0,
p?lansy is set to zero. n ≥ 0.
a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1)
containing the local pieces of the distributed matrix sub(A).
If uplo = 'U', the leading n-by-n upper triangular part of sub(A) contains
the upper triangular matrix whose norm is to be computed, and the strictly
lower triangular part of this matrix is not referenced. If uplo = 'L', the
leading n-by-n lower triangular part of sub(A) contains the lower triangular
matrix whose norm is to be computed, and the strictly upper triangular part
of sub(A) is not referenced.
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
work (local).
Array of size lwork.
lwork ≥ 0 if norm = 'M' or 'm' (not referenced),
where lcm is the least common multiple of nprow and npcol, lcm =
ilcm( nprow, npcol ) and iceil(x,y) is a ScaLAPACK function that
returns ceiling (x/y).
ilcm, iceil, indxg2p, and numroc are ScaLAPACK tool functions; myrow,
mycol, nprow, and npcol can be determined by calling the function
blacs_gridinfo.
Output Parameters
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?lantr
Returns the value of the 1-norm, Frobenius norm,
infinity-norm, or the largest absolute value of any
element, of a triangular matrix.
Syntax
float pslantr (char *norm , char *uplo , char *diag , MKL_INT *m , MKL_INT *n , float
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *work );
double pdlantr (char *norm , char *uplo , char *diag , MKL_INT *m , MKL_INT *n , double
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *work );
float pclantr (char *norm , char *uplo , char *diag , MKL_INT *m , MKL_INT *n ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *work );
double pzlantr (char *norm , char *uplo , char *diag , MKL_INT *m , MKL_INT *n ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *work );
Include Files
• mkl_scalapack.h
Description
The p?lantr function returns the value of the 1-norm, or the Frobenius norm, or the infinity norm, or the
element of largest absolute value of a trapezoidal or triangular distributed matrix sub(A) = A(ia:ia+m-1,
ja:ja+n-1).
Input Parameters
uplo (global)
Specifies whether the distributed matrix sub(A) is upper or lower
trapezoidal.
= 'U': Upper trapezoidal,
= 'L': Lower trapezoidal.
diag (global)
Specifies whether the distributed matrix sub(A) has unit diagonal.
= 'N': Non-unit diagonal.
m (global)
The number of rows in the distributed matrix sub(A). When m = 0,
p?lantr is set to zero. m ≥ 0.
n (global)
The number of columns in the distributed matrix sub(A). When n = 0,
p?lantr is set to zero. n ≥ 0.
a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1)
containing the local pieces of the distributed matrix sub(A).
ia, ja (global)
The row and column indices in the global matrix A indicating the first row
and the first column of the matrix sub(A), respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
work (local).
Array of size lwork.
lwork ≥ 0 if norm = 'M' or 'm' (not referenced),
nq0 if norm = '1', 'O' or 'o',
Output Parameters
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?lapiv
Applies a permutation matrix to a general distributed
matrix, resulting in row or column pivoting.
Syntax
void pslapiv (char *direc , char *rowcol , char *pivroc , MKL_INT *m , MKL_INT *n ,
float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , MKL_INT *ip ,
MKL_INT *jp , MKL_INT *descip , MKL_INT *iwork );
void pdlapiv (char *direc , char *rowcol , char *pivroc , MKL_INT *m , MKL_INT *n ,
double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , MKL_INT *ip ,
MKL_INT *jp , MKL_INT *descip , MKL_INT *iwork );
void pclapiv (char *direc , char *rowcol , char *pivroc , MKL_INT *m , MKL_INT *n ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , MKL_INT
*ip , MKL_INT *jp , MKL_INT *descip , MKL_INT *iwork );
void pzlapiv (char *direc , char *rowcol , char *pivroc , MKL_INT *m , MKL_INT *n ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , MKL_INT
*ip , MKL_INT *jp , MKL_INT *descip , MKL_INT *iwork );
Include Files
• mkl_scalapack.h
Description
The p?lapiv function applies either P (permutation matrix indicated by ipiv) or inv(P) to a general m-by-n
distributed matrix sub(A) = A(ia:ia+m-1, ja:ja+n-1), resulting in row or column pivoting. The pivot
vector may be distributed across a process row or a column. The pivot vector should be aligned with the
distributed matrix A. This function will transpose the pivot vector, if necessary.
For example, if the row pivots should be applied to the columns of sub(A), pass rowcol='C' and
pivroc='C'.
Input Parameters
direc (global)
Specifies in which order the permutation is applied:
= 'F' (Forward): Applies pivots forward from top of matrix. Computes
P*sub(A).
= 'B' (Backward): Applies pivots backward from bottom of matrix.
Computes inv(P)*sub(A).
rowcol (global)
Specifies if the rows or columns are to be permuted:
= 'R': Rows will be permuted,
= 'C': Columns will be permuted.
pivroc (global)
Specifies whether ipiv is distributed over a process row or column:
= 'R': ipiv is distributed over a process row,
= 'C': ipiv is distributed over a process column.
m (global)
The number of rows in the distributed matrix sub(A). When m = 0,
p?lapiv is set to zero. m ≥ 0.
n (global)
The number of columns in the distributed matrix sub(A). When n = 0,
p?lapiv is set to zero. n ≥ 0.
a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1)
containing the local pieces of the distributed matrix sub(A).
ia, ja (global)
The row and column indices in the global matrix A indicating the first row
and the first column of the matrix sub(A), respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
ipiv (local)
Array of size lipiv ;
when rowcol='R' or 'r':
ip, jp (global) The row and column indices in the global matrix P indicating the
first row and the first column of the matrix sub(P), respectively.
descip (global and local) array of size dlen_. The array descriptor for the
distributed vector ipiv.
iwork (local).
Array of size ldw, where ldw is equal to the workspace necessary for
transposition, and the storage of the transposed ipiv:
Let lcm be the least common multiple of nprow and npcol.
Output Parameters
a (local).
On exit, the local pieces of the permuted distributed submatrix.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?lapv2
Applies a permutation to an m-by-n distributed matrix.
Syntax
void pslapv2 (const char* direc, const char* rowcol, const MKL_INT* m, const MKL_INT*
n, float* a, const MKL_INT* ia, const MKL_INT* ja, const MKL_INT* desca, const MKL_INT*
ipiv, const MKL_INT* ip, const MKL_INT* jp, const MKL_INT* descip);
void pdlapv2 (const char* direc, const char* rowcol, const MKL_INT* m, const MKL_INT*
n, double* a, const MKL_INT* ia, const MKL_INT* ja, const MKL_INT* desca, const
MKL_INT* ipiv, const MKL_INT* ip, const MKL_INT* jp, const MKL_INT* descip);
void pclapv2 (const char* direc, const char* rowcol, const MKL_INT* m, const MKL_INT*
n, MKL_Complex8* a, const MKL_INT* ia, const MKL_INT* ja, const MKL_INT* desca, const
MKL_INT* ipiv, const MKL_INT* ip, const MKL_INT* jp, const MKL_INT* descip);
void pzlapv2 (const char* direc, const char* rowcol, const MKL_INT* m, const MKL_INT*
n, MKL_Complex16* a, const MKL_INT* ia, const MKL_INT* ja, const MKL_INT* desca, const
MKL_INT* ipiv, const MKL_INT* ip, const MKL_INT* jp, const MKL_INT* descip);
Include Files
• mkl_scalapack.h
Description
p?lapv2 applies either P (permutation matrix indicated by ipiv) or inv( P ) to an m-by-n distributed matrix
sub( A ) denoting A(ia:ia+m-1,ja:ja+n-1), resulting in row or column pivoting. The pivot vector should be
aligned with the distributed matrix A. For pivoting the rows of sub( A ), ipiv should be distributed along a
process column and replicated over all process rows. Similarly, ipiv should be distributed along a process
row and replicated over all process columns for column pivoting.
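The difference between the forward and backward directions can be sketched on a single process. The example assumes that ipiv encodes a sequence of row interchanges (row i swapped with row ipiv[i]), in the style of LAPACK pivot vectors, and uses 0-based indices; both are assumptions of this sketch. The functions swap_rows and apply_row_pivots are hypothetical and do not reproduce the distributed communication.

```c
/* Swap rows r1 and r2 of a column-major matrix with n columns. */
static void swap_rows(int n, double *a, int lda, int r1, int r2)
{
    if (r1 == r2) return;
    for (int j = 0; j < n; ++j) {
        double t = a[r1 + j * lda];
        a[r1 + j * lda] = a[r2 + j * lda];
        a[r2 + j * lda] = t;
    }
}

/* direc = 'F': apply interchanges i <-> ipiv[i] for i = 0..m-1 (P * A).
 * Any other direc: apply them in reverse order (inv(P) * A).
 * ipiv is 0-based here, an assumption of this sketch. */
void apply_row_pivots(char direc, int m, int n, double *a, int lda,
                      const int *ipiv)
{
    if (direc == 'F') {
        for (int i = 0; i < m; ++i)
            swap_rows(n, a, lda, i, ipiv[i]);
    } else {
        for (int i = m - 1; i >= 0; --i)
            swap_rows(n, a, lda, i, ipiv[i]);
    }
}
```

Applying 'F' and then 'B' with the same ipiv restores the original matrix, which is the sense in which the backward direction computes inv(P) * sub(A).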
Input Parameters
direc (global)
Specifies in which order the permutation is applied:
= 'F' (Forward) Applies pivots Forward from top of matrix. Computes P *
sub( A );
= 'B' (Backward) Applies pivots Backward from bottom of matrix. Computes
inv( P ) * sub( A ).
rowcol (global)
Specifies if the rows or columns are to be permuted:
= 'R' Rows will be permuted,
= 'C' Columns will be permuted.
m (global)
The number of rows to be operated on, i.e. the number of rows of the
distributed submatrix sub( A ). m >= 0.
n (global)
The number of columns to be operated on, i.e. the number of columns of
the distributed submatrix sub( A ). n >= 0.
a (local)
On entry, this local array contains the local pieces of the distributed matrix
sub( A ) to which the row or column interchanges will be applied.
ia (global)
The row index in the global array a indicating the first row of sub( A ).
ja (global)
The column index in the global array a indicating the first column of
sub( A ).
ip (global)
The global row index of ipiv, which points to the beginning of the
submatrix on which to operate.
jp (global)
The global column index of ipiv, which points to the beginning of the
submatrix on which to operate.
Output Parameters
p?laqge
Scales a general rectangular matrix, using row and
column scaling factors computed by p?geequ.
Syntax
void pslaqge (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *r , float *c , float *rowcnd , float *colcnd , float *amax , char
*equed );
void pdlaqge (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , double *r , double *c , double *rowcnd , double *colcnd , double *amax , char
*equed );
void pclaqge (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , float *r , float *c , float *rowcnd , float *colcnd , float *amax ,
char *equed );
void pzlaqge (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , double *r , double *c , double *rowcnd , double *colcnd , double
*amax , char *equed );
Include Files
• mkl_scalapack.h
Description
The p?laqge function equilibrates a general m-by-n distributed matrix sub(A) = A(ia:ia+m-1, ja:ja+n-1)
using the row and column scaling factors in the vectors r and c computed by p?geequ.
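A serial sketch of this kind of equilibration is given below for a single, non-distributed column-major matrix. The function name equilibrate_general is hypothetical. The decision thresholds (0.1 and the small/large bounds) and the return codes 'R', 'C' and 'B' are assumptions borrowed from the LAPACK ?laqge convention; they are not stated in this section.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Serial sketch: scale A(i,j) by r[i] and/or c[j] when the scale factors
 * are judged worth applying. Returns 'N', 'R', 'C' or 'B'.
 * ASSUMPTION: thresholds follow the LAPACK ?laqge convention. */
char equilibrate_general(int m, int n, double *a, int lda,
                         const double *r, const double *c,
                         double rowcnd, double colcnd, double amax)
{
    const double thresh = 0.1;                  /* assumed threshold */
    const double small = DBL_MIN / DBL_EPSILON; /* assumed lower bound */
    const double large = 1.0 / small;           /* assumed upper bound */

    assert(m >= 0 && n >= 0 && lda >= (m > 0 ? m : 1));
    if (m == 0 || n == 0)
        return 'N';

    int scale_rows = !(rowcnd >= thresh && amax >= small && amax <= large);
    int scale_cols = !(colcnd >= thresh);

    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            double f = (scale_rows ? r[i] : 1.0) * (scale_cols ? c[j] : 1.0);
            a[i + (size_t)j * lda] *= f;
        }
    }
    if (scale_rows && scale_cols) return 'B';
    if (scale_rows) return 'R';
    if (scale_cols) return 'C';
    return 'N';
}
```

In the distributed routine each process would apply the same scaling to its local piece of sub(A), using the locally stored parts of r and c.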
Input Parameters
m (global)
The number of rows in the distributed matrix sub(A). (m ≥0).
n (global)
The number of columns in the distributed matrix sub(A). (n ≥0).
a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
r (local).
Array of size LOCr(m_a). The row scale factors for sub(A). r is aligned with
the distributed matrix A, and replicated across every process column. r is
tied to the distributed matrix A.
c (local).
Array of size LOCc(n_a). The column scale factors for sub(A). c is aligned with
the distributed matrix A, and replicated down every process row. c is
tied to the distributed matrix A.
rowcnd (local).
The global ratio of the smallest r[i] to the largest r[i], ia-1 ≤ i ≤ ia+m-2.
colcnd (local).
The global ratio of the smallest c[i] to the largest c[i], ja-1 ≤ i ≤ ja+n-2.
amax (global).
Absolute value of largest distributed submatrix entry.
Output Parameters
a (local).
On exit, the equilibrated distributed matrix. See equed for the form of the
equilibrated distributed submatrix.
equed (global)
Specifies the form of equilibration that was done.
= 'N': No equilibration
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?laqr0
Computes the eigenvalues of a Hessenberg matrix and
optionally returns the matrices from the Schur
decomposition.
Syntax
void pslaqr0(MKL_INT* wantt, MKL_INT* wantz, MKL_INT* n, MKL_INT* ilo, MKL_INT* ihi,
float* h, MKL_INT* desch, float* wr, float* wi, MKL_INT* iloz, MKL_INT* ihiz, float* z,
MKL_INT* descz, float* work, MKL_INT* lwork, MKL_INT* iwork, MKL_INT* liwork, MKL_INT*
info, MKL_INT* reclevel);
void pdlaqr0(MKL_INT* wantt, MKL_INT* wantz, MKL_INT* n, MKL_INT* ilo, MKL_INT* ihi,
double* h, MKL_INT* desch, double* wr, double* wi, MKL_INT* iloz, MKL_INT* ihiz, double*
z, MKL_INT* descz, double* work, MKL_INT* lwork, MKL_INT* iwork, MKL_INT* liwork,
MKL_INT* info, MKL_INT* reclevel);
Include Files
• mkl_scalapack.h
Description
p?laqr0 computes the eigenvalues of a Hessenberg matrix H and, optionally, the matrices T and Z from the
Schur decomposition H = Z*T*Z^T, where T is an upper quasi-triangular matrix (the Schur form), and Z is the
orthogonal matrix of Schur vectors.
Optionally Z may be postmultiplied into an input orthogonal matrix Q so that this function can give the Schur
factorization of a matrix A which has been reduced to the Hessenberg form H by the orthogonal matrix Q: A
= Q * H * Q^T = (QZ) * T * (QZ)^T.
Input Parameters
wantt (global )
Non-zero : the full Schur form T is required;
Zero : only eigenvalues are required.
wantz (global )
Non-zero : the matrix of Schur vectors Z is required;
Zero: Schur vectors are not required.
n (global )
The order of the Hessenberg matrix H (and Z if wantz is non-zero). n ≥ 0.
iloz, ihiz Specify the rows of the matrix Z to which transformations must be applied if
wantz is non-zero, 1 ≤ iloz ≤ ilo; ihi ≤ ihiz ≤ n.
If wantz equals zero, z is not referenced.
lwork (local )
The length of the workspace array work.
liwork (local )
The length of the workspace array iwork.
reclevel (local )
Level of recursion. reclevel = 0 must hold on entry.
Output Parameters
wr, wi The real and imaginary parts, respectively, of the computed eigenvalues
ilo to ihi are stored in the corresponding elements of wr and wi. If two
eigenvalues are computed as a complex conjugate pair, they are stored in
consecutive elements of wr and wi, say the i-th and (i+1)th, with wi[i-1] >
0 and wi[i] < 0. If wantt is non-zero, the eigenvalues are stored in the
same order as on the diagonal of the Schur form returned in h.
info > 0: if info = i, then the function failed to compute all the eigenvalues.
Elements 0:ilo-2 and i:n-1 of wr and wi contain those eigenvalues which
have been successfully computed.
> 0: if wantt is non-zero, then (initial value of H)*U = U*(final value of H),
where U is an orthogonal/unitary matrix. The final value of H is upper
Hessenberg and quasi-triangular/triangular in rows and columns info+1
through ihi.
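The storage convention for wr and wi (a complex conjugate pair occupies two consecutive elements, with the positive imaginary part first) can be checked with a short sketch. The helper count_conjugate_pairs is hypothetical and inspects only the sign pattern of wi.

```c
#include <assert.h>

/* Count complex conjugate pairs in the imaginary parts wi[0..n-1].
 * A pair is expected as wi[i] > 0 immediately followed by wi[i+1] < 0.
 * Returns the number of pairs, or -1 if the sign pattern does not match
 * that convention. */
int count_conjugate_pairs(int n, const double *wi)
{
    int pairs = 0;
    int i = 0;

    assert(n >= 0);
    while (i < n) {
        if (wi[i] == 0.0) {          /* real eigenvalue */
            ++i;
            continue;
        }
        /* complex eigenvalue: must start a pair (+ then -) */
        if (wi[i] < 0.0 || i + 1 >= n || !(wi[i + 1] < 0.0))
            return -1;
        ++pairs;
        i += 2;
    }
    return pairs;
}
```

The number of real eigenvalues is then n minus twice the returned pair count.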
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?laqr1
Sets a scalar multiple of the first column of the
product of a 2-by-2 or 3-by-3 matrix and specified
shifts.
Syntax
void pslaqr1(MKL_INT* wantt, MKL_INT* wantz, MKL_INT* n, MKL_INT* ilo, MKL_INT* ihi,
float* a, MKL_INT* desca, float* wr, float* wi, MKL_INT* iloz, MKL_INT* ihiz, float* z,
MKL_INT* descz, float* work, MKL_INT* lwork, MKL_INT* iwork, MKL_INT* ilwork, MKL_INT*
info);
void pdlaqr1(MKL_INT* wantt, MKL_INT* wantz, MKL_INT* n, MKL_INT* ilo, MKL_INT* ihi,
double* a, MKL_INT* desca, double* wr, double* wi, MKL_INT* iloz, MKL_INT* ihiz, double*
z, MKL_INT* descz, double* work, MKL_INT* lwork, MKL_INT* iwork, MKL_INT* ilwork,
MKL_INT* info);
Include Files
• mkl_scalapack.h
Description
p?laqr1 is an auxiliary function used to find the Schur decomposition and/or eigenvalues of a matrix already
in Hessenberg form from columns ilo to ihi.
This is a modified version of p?lahqr from ScaLAPACK version 1.7.3. The following modifications were
made:
Product and Performance Information
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/
PerformanceIndex.
Notice revision #20201201
Input Parameters
wantt (global )
Non-zero : the full Schur form T is required;
Zero: only eigenvalues are required.
n (global )
The order of the Hessenberg matrix A (and Z if wantz is non-zero). n ≥ 0.
lwork (local )
The size of the work array (lwork>=1).
ilwork (local )
The size of the iwork array (ilwork≥ 3 ).
Output Parameters
info (global )
< 0: parameter number -info incorrect or inconsistent
= 0: successful exit
> 0: p?laqr1 failed to compute all the eigenvalues ilo to ihi in a total of
30*(ihi-ilo+1) iterations; if info = i, elements i:ihi-1 of wr and wi
contain those eigenvalues which have been successfully computed.
Application Notes
This algorithm is very similar to p?lahqr. Unlike p?lahqr, instead of sending one double shift through the
largest unreduced submatrix, this algorithm sends multiple double shifts and spaces them apart so that there
can be parallelism across several processor rows/columns. Another critical difference is that this algorithm
aggregates multiple transforms together in order to apply them in a block fashion.
Current Notes and/or Restrictions:
• This code requires the distributed block size to be square and at least six (6); unlike simpler codes like
LU, this algorithm is extremely sensitive to block size. Unwise choices of too small a block size can lead to
bad performance.
• This code requires a and z to be distributed identically and have identical contexts.
• This release currently does not have a function for resolving the Schur blocks into regular 2x2 form after
this code is completed. Because of this, a significant performance penalty is incurred when the deflation is
sometimes done by a single column of processors.
• This code does not currently block the initial transforms so that none of the rows or columns for any bulge
are completed until all are started. To offset pipeline start-up it is recommended that at least
2*LCM(NPROW,NPCOL) bulges are used (if possible).
• The maximum number of bulges currently supported is fixed at 32. In future versions this will be limited
only by the incoming work array.
• The matrix A must be in upper Hessenberg form. If elements below the subdiagonal are nonzero, the
resulting transforms may be nonsimilar. This is also true with the LAPACK function.
• For this release, it is assumed that rsrc_ = csrc_ = 0.
• Currently, all the eigenvalues are distributed to all the nodes. Future releases will probably distribute the
eigenvalues by the column partitioning.
• The internals of this function are subject to change.
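Several of the restrictions above can be checked before the call. The sketch below is illustrative only: the helper names are hypothetical, and the descriptor positions are assumed to be the usual zero-based ScaLAPACK layout (ctxt_ at 1, mb_ at 4, nb_ at 5, rsrc_ at 6, csrc_ at 7). The bulge count is the recommended 2*LCM(NPROW,NPCOL), capped at the stated maximum of 32.

```c
#include <assert.h>

/* ASSUMPTION: zero-based positions in a ScaLAPACK array descriptor. */
enum { CTXT_ = 1, MB_ = 4, NB_ = 5, RSRC_ = 6, CSRC_ = 7 };

int gcd_int(int a, int b)
{
    while (b != 0) {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

int lcm_int(int a, int b)
{
    return a / gcd_int(a, b) * b;
}

/* Recommended number of bulges: 2*LCM(nprow, npcol), at most 32. */
int recommended_bulges(int nprow, int npcol)
{
    assert(nprow > 0 && npcol > 0);
    int want = 2 * lcm_int(nprow, npcol);
    return want > 32 ? 32 : want;
}

/* Returns 1 if the layout satisfies the listed restrictions, else 0:
 * square block size of at least 6, a and z in the same context with the
 * same blocking, and rsrc_ = csrc_ = 0. */
int layout_ok(const int *desca, const int *descz)
{
    if (desca[MB_] != desca[NB_] || desca[MB_] < 6) return 0;
    if (desca[CTXT_] != descz[CTXT_]) return 0;
    if (desca[MB_] != descz[MB_] || desca[NB_] != descz[NB_]) return 0;
    if (desca[RSRC_] != 0 || desca[CSRC_] != 0) return 0;
    if (descz[RSRC_] != 0 || descz[CSRC_] != 0) return 0;
    return 1;
}
```

This check covers only the restrictions that can be read from the descriptors; the Hessenberg structure of A still has to be guaranteed by the caller.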
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?laqr2
Performs the orthogonal/unitary similarity
transformation of a Hessenberg matrix to detect and
deflate fully converged eigenvalues from a trailing
principal submatrix (aggressive early deflation).
Syntax
void pslaqr2(MKL_INT* wantt, MKL_INT* wantz, MKL_INT* n, MKL_INT* ktop, MKL_INT* kbot,
MKL_INT* nw, float* a, MKL_INT* desca, MKL_INT* iloz, MKL_INT* ihiz, float* z, MKL_INT*
descz, MKL_INT* ns, MKL_INT* nd, float* sr, float* si, float* t, MKL_INT* ldt, float* v,
MKL_INT* ldv, float* wr, float* wi, float* work, MKL_INT* lwork);
void pdlaqr2(MKL_INT* wantt, MKL_INT* wantz, MKL_INT* n, MKL_INT* ktop, MKL_INT* kbot,
MKL_INT* nw, double* a, MKL_INT* desca, MKL_INT* iloz, MKL_INT* ihiz, double* z,
MKL_INT* descz, MKL_INT* ns, MKL_INT* nd, double* sr, double* si, double* t, MKL_INT*
ldt, double* v, MKL_INT* ldv, double* wr, double* wi, double* work, MKL_INT* lwork);
Include Files
• mkl_scalapack.h
Description
p?laqr2 accepts as input an upper Hessenberg matrix A and performs an orthogonal similarity
transformation designed to detect and deflate fully converged eigenvalues from a trailing principal submatrix.
On output A is overwritten by a new Hessenberg matrix that is a perturbation of an orthogonal similarity
transformation of A. It is to be hoped that the final version of A has many zero subdiagonal entries.
This function handles small deflation windows that a single processor can afford to process. Normally, it is called by
p?laqr1. All the inputs are assumed to be valid without checking.
Input Parameters
wantt (global )
wantz (global )
If wantz is non-zero, then the orthogonal matrix Z is updated so that the
orthogonal Schur factor may be computed (in cooperation with the calling
function).
If wantz equals zero, then z is not referenced.
n (global )
The order of the matrix A and (if wantz is non-zero) the order of the
orthogonal matrix Z.
nw (global )
Deflation window size. 1 ≤ nw ≤ (kbot-ktop+1). Normally nw ≥ 3 if p?laqr2 is
called by p?laqr1.
ldt (local )
The leading dimension of the array t. ldt≥nw.
ldv (local )
The leading dimension of the array v. ldv≥nw.
lwork (local )
work(lwork) is a local array and lwork is assumed big enough so that
lwork≥nw*nw.
Output Parameters
z
ns (global )
The number of unconverged (that is, approximate) eigenvalues returned in
sr and si that may be used as shifts by the calling function.
nd (global )
The number of converged eigenvalues uncovered by this function.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?laqr3
Performs the orthogonal/unitary similarity
transformation of a Hessenberg matrix to detect and
deflate fully converged eigenvalues from a trailing
principal submatrix (aggressive early deflation).
Syntax
void pslaqr3(MKL_INT* wantt, MKL_INT* wantz, MKL_INT* n, MKL_INT* ktop, MKL_INT* kbot,
MKL_INT* nw, float* h, MKL_INT* desch, MKL_INT* iloz, MKL_INT* ihiz, float* z, MKL_INT*
descz, MKL_INT* ns, MKL_INT* nd, float* sr, float* si, float* v, MKL_INT* descv,
MKL_INT* nh, float* t, MKL_INT* desct, MKL_INT* nv, float* wv, MKL_INT* descw, float*
work, MKL_INT* lwork, MKL_INT* iwork, MKL_INT* liwork, MKL_INT* reclevel);
void pdlaqr3(MKL_INT* wantt, MKL_INT* wantz, MKL_INT* n, MKL_INT* ktop, MKL_INT* kbot,
MKL_INT* nw, double* h, MKL_INT* desch, MKL_INT* iloz, MKL_INT* ihiz, double* z,
MKL_INT* descz, MKL_INT* ns, MKL_INT* nd, double* sr, double* si, double* v, MKL_INT*
descv, MKL_INT* nh, double* t, MKL_INT* desct, MKL_INT* nv, double* wv, MKL_INT* descw,
double* work, MKL_INT* lwork, MKL_INT* iwork, MKL_INT* liwork, MKL_INT* reclevel);
Include Files
• mkl_scalapack.h
Description
This function accepts as input an upper Hessenberg matrix H and performs an orthogonal similarity
transformation designed to detect and deflate fully converged eigenvalues from a trailing principal submatrix.
On output H is overwritten by a new Hessenberg matrix that is a perturbation of an orthogonal similarity
transformation of H. It is to be hoped that the final version of H has many zero subdiagonal entries.
Input Parameters
wantt (global )
If wantt is non-zero, then the Hessenberg matrix H is fully updated so that
the quasi-triangular Schur factor may be computed (in cooperation with the
calling function).
If wantt equals zero, then only enough of H is updated to preserve the
eigenvalues.
wantz (global )
If wantz is non-zero, then the orthogonal matrix Z is updated so that the
orthogonal Schur factor may be computed (in cooperation with the calling
function).
If wantz equals zero, then z is not referenced.
n (global )
The order of the matrix H and (if wantz is non-zero), the order of the
orthogonal matrix Z.
ktop (global )
It is assumed that either ktop = 1 or H (ktop,ktop-1)=0. kbot and ktop
together determine an isolated block along the diagonal of the Hessenberg
matrix.
kbot (global )
It is assumed without a check that either kbot = n or H (kbot+1,kbot)=0.
kbot and ktop together determine an isolated block along the diagonal of
the Hessenberg matrix.
nw (global )
Deflation window size. 1 ≤nw≤ (kbot-ktop+1).
desch (global and local) array of size dlen_.
The array descriptor for the distributed matrix H.
nv (global )
The number of rows of work array wv available for workspace. nv≥nw.
lwork (local )
The size of the work array work (lwork≥1). lwork = 2*nw suffices, but
greater efficiency may result from larger values of lwork.
liwork (local )
The length of the workspace array iwork (liwork≥1).
Output Parameters
ns (global )
The number of unconverged (that is, approximate) eigenvalues returned in
sr and si that may be used as shifts by the calling function.
nd (global )
The number of converged eigenvalues uncovered by this function.
sr, si (global ) array of size kbot. The real and imaginary parts of approximate
eigenvalues that may be used for shifts are stored in sr[kbot-nd-ns]
through sr[kbot-nd-1] and si[kbot-nd-ns] through si[kbot-nd-1],
respectively. The real and imaginary parts of converged eigenvalues are
stored in sr[kbot-nd] through sr[kbot-1] and si[kbot-nd] through
si[kbot-1], respectively.
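The index ranges used for sr and si can be computed directly from kbot, nd and ns. The sketch below simply restates the layout given above in code; the type index_range and both helper names are hypothetical.

```c
#include <assert.h>

/* Inclusive, zero-based index range; empty when last < first. */
typedef struct {
    int first;
    int last;
} index_range;

/* Positions of the ns approximate eigenvalues usable as shifts. */
index_range shift_range(int kbot, int nd, int ns)
{
    assert(nd >= 0 && ns >= 0 && nd + ns <= kbot);
    index_range r = { kbot - nd - ns, kbot - nd - 1 };
    return r;
}

/* Positions of the nd converged eigenvalues. */
index_range converged_range(int kbot, int nd)
{
    assert(nd >= 0 && nd <= kbot);
    index_range r = { kbot - nd, kbot - 1 };
    return r;
}
```

A calling function would read shifts from sr and si over the first range and treat the entries in the second range as final eigenvalues.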
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?laqr5
Performs a single small-bulge multi-shift QR sweep.
Syntax
void pslaqr5(MKL_INT* wantt, MKL_INT* wantz, MKL_INT* kacc22, MKL_INT* n, MKL_INT*
ktop, MKL_INT* kbot, MKL_INT* nshfts, float* sr, float* si, float* h, MKL_INT* desch,
MKL_INT* iloz, MKL_INT* ihiz, float* z, MKL_INT* descz, float* work, MKL_INT* lwork,
MKL_INT* iwork, MKL_INT* liwork);
void pdlaqr5(MKL_INT* wantt, MKL_INT* wantz, MKL_INT* kacc22, MKL_INT* n, MKL_INT*
ktop, MKL_INT* kbot, MKL_INT* nshfts, double* sr, double* si, double* h, MKL_INT* desch,
MKL_INT* iloz, MKL_INT* ihiz, double* z, MKL_INT* descz, double* work, MKL_INT* lwork,
MKL_INT* iwork, MKL_INT* liwork);
Include Files
• mkl_scalapack.h
Description
This auxiliary function called by p?laqr0 performs a single small-bulge multi-shift QR sweep by chasing
separated groups of bulges along the main block diagonal of a Hessenberg matrix H.
Input Parameters
kacc22 (global)
Value 0, 1, or 2. Specifies the computation mode of far-from-diagonal
orthogonal updates.
= 0: p?laqr5 does not accumulate reflections and does not use
matrix-matrix multiply to update far-from-diagonal matrix entries.
= 1: p?laqr5 accumulates reflections and uses matrix-matrix multiply
to update the far-from-diagonal matrix entries.
= 2: p?laqr5 accumulates reflections, uses matrix-matrix multiply to
update the far-from-diagonal matrix entries, and takes advantage of
2-by-2 block structure during matrix multiplies.
n (global) scalar
The order of the Hessenberg matrix H and, if wantz is non-zero, the
order of the orthogonal matrix Z.
sr contains the real parts and si contains the imaginary parts of the
nshfts shifts of origin that define the multi-shift QR sweep.
lwork (local)
The size of the work array (lwork≥1).
liwork (local)
The size of the iwork array (liwork≥1).
Output Parameters
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?laqsy
Scales a symmetric/Hermitian matrix, using scaling
factors computed by p?poequ.
Syntax
void pslaqsy (char *uplo , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *sr , float *sc , float *scond , float *amax , char *equed );
void pdlaqsy (char *uplo , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , double *sr , double *sc , double *scond , double *amax , char *equed );
void pclaqsy (char *uplo , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , float *sr , float *sc , float *scond , float *amax , char *equed );
void pzlaqsy (char *uplo , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , double *sr , double *sc , double *scond , double *amax , char *equed );
Include Files
• mkl_scalapack.h
Description
The p?laqsy function equilibrates a symmetric distributed matrix sub(A) = A(ia:ia+n-1, ja:ja+n-1) using
the scaling factors in the vectors sr and sc. The scaling factors are computed by p?poequ.
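The symmetric scaling can be sketched serially for one triangle of a column-major matrix. The function name equilibrate_symmetric is hypothetical, the element-wise form sr[i]*A(i,j)*sc[j] is an assumption modeled on the LAPACK ?laqsy convention, and so are the decision thresholds. Only the triangle selected by uplo is touched.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Serial sketch: scale the stored triangle of a symmetric matrix.
 * Returns 'Y' if scaling was applied, 'N' otherwise.
 * ASSUMPTION: A(i,j) becomes sr[i]*A(i,j)*sc[j], and scaling is applied
 * when scond < 0.1 or amax is outside [small, large]. */
char equilibrate_symmetric(char uplo, int n, double *a, int lda,
                           const double *sr, const double *sc,
                           double scond, double amax)
{
    const double thresh = 0.1;                  /* assumed threshold */
    const double small = DBL_MIN / DBL_EPSILON; /* assumed lower bound */
    const double large = 1.0 / small;           /* assumed upper bound */

    assert(uplo == 'U' || uplo == 'L');
    assert(n >= 0 && lda >= (n > 0 ? n : 1));
    if (n == 0 || (scond >= thresh && amax >= small && amax <= large))
        return 'N';

    for (int j = 0; j < n; ++j) {
        int ibeg = (uplo == 'U') ? 0 : j;      /* first row in column j */
        int iend = (uplo == 'U') ? j : n - 1;  /* last row in column j */
        for (int i = ibeg; i <= iend; ++i)
            a[i + (size_t)j * lda] = sr[i] * a[i + (size_t)j * lda] * sc[j];
    }
    return 'Y';
}
```

Entries in the opposite triangle are left as they were, matching the statement that the other triangle is not referenced.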
Input Parameters
uplo (global) Specifies whether the upper or lower triangular part of the symmetric
distributed matrix sub(A) is to be referenced:
n (global)
The order of the distributed matrix sub(A). n ≥ 0.
a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).
On entry, this array contains the local pieces of the distributed symmetric matrix
sub(A).
If uplo = 'U', the leading n-by-n upper triangular part of sub(A) contains
the upper triangular part of the matrix, and the strictly lower triangular part
of sub(A) is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of sub(A) contains
the lower triangular part of the matrix, and the strictly upper triangular part
of sub(A) is not referenced.
ia, ja (global)
The row and column indices in the global matrix A indicating the first row
and the first column of the matrix sub(A), respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
sr (local)
Array of size LOCr(m_a). The scale factors for the matrix A(ia:ia+m-1, ja:ja+n-1).
sr is aligned with the distributed matrix A, and replicated across
every process column. sr is tied to the distributed matrix A.
sc (local)
Array of size LOCc(n_a). The scale factors for the matrix A(ia:ia+m-1,
ja:ja+n-1). sc is aligned with the distributed matrix A, and replicated
down every process row. sc is tied to the distributed matrix A.
scond (global).
amax (global).
Absolute value of largest distributed submatrix entry.
Output Parameters
a On exit,
if equed = 'Y', the equilibrated matrix:
equed (global).
Specifies whether or not equilibration was done.
= 'N': No equilibration.
= 'Y': Equilibration was done, that is, sub(A) has been replaced by:
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?lared1d
Redistributes an array assuming that the input array,
bycol, is distributed across rows and that all process
columns contain the same copy of bycol.
Syntax
void pslared1d (MKL_INT *n , MKL_INT *ia , MKL_INT *ja , MKL_INT *desc , float *bycol ,
float *byall , float *work , MKL_INT *lwork );
void pdlared1d (MKL_INT *n , MKL_INT *ia , MKL_INT *ja , MKL_INT *desc , double
*bycol , double *byall , double *work , MKL_INT *lwork );
Include Files
• mkl_scalapack.h
Description
The p?lared1d function redistributes a 1D array. It assumes that the input array bycol is distributed across
rows and that all process columns contain the same copy of bycol. The output array byall is identical on all
processes and contains the entire array.
Input Parameters
np = Number of local rows in bycol()
n (global)
The number of elements to be redistributed. n≥ 0.
bycol (local).
Distributed block cyclic array of global size n and of local size np. bycol is
distributed across the process rows. All process columns are assumed to
contain the same value.
work (local).
size lwork. Used to hold the buffers sent from one process to another.
lwork (local)
The size of the work array. lwork ≥ numroc(n, desc[nb_], 0, 0,
npcol).
Output Parameters
byall (global).
Global size n, local size n. byall is exactly duplicated on all processes. It
contains the same values as bycol, but it is replicated across all processes
rather than being distributed.
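The relationship between the distributed input and the replicated output can be simulated in a single process. In the sketch below, numroc_sketch is a plain C re-implementation of the usual numroc formula (it is not the MKL tool function), and gather_block_cyclic is a hypothetical helper that assembles byall from the per-process local pieces. It assumes that the first block lives on process row 0 and that the array starts at global index 1.

```c
#include <assert.h>

/* Number of elements of an n-element block-cyclic array (block size nb)
 * owned by process iproc, when the first block is on process isrcproc. */
int numroc_sketch(int n, int nb, int iproc, int isrcproc, int nprocs)
{
    assert(n >= 0 && nb > 0 && nprocs > 0);
    int mydist = (nprocs + iproc - isrcproc) % nprocs;
    int nblocks = n / nb;
    int count = (nblocks / nprocs) * nb;
    int extrablks = nblocks % nprocs;
    if (mydist < extrablks)
        count += nb;            /* one more full block */
    else if (mydist == extrablks)
        count += n % nb;        /* the trailing partial block */
    return count;
}

/* Assemble the full array from the local pieces held by each process row.
 * local[p] points to the np = numroc_sketch(n, nb, p, 0, nprow) elements
 * of process row p. ASSUMPTION: source process row is 0. */
void gather_block_cyclic(int n, int nb, int nprow,
                         const double *const *local, double *byall)
{
    for (int p = 0; p < nprow; ++p) {
        int np = numroc_sketch(n, nb, p, 0, nprow);
        for (int l = 0; l < np; ++l) {
            /* local index l -> zero-based global index g */
            int g = ((l / nb) * nprow + p) * nb + l % nb;
            byall[g] = local[p][l];
        }
    }
}
```

In the real routine this assembly is done with message passing, and work holds the buffers exchanged between processes.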
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?lared2d
Redistributes an array assuming that the input array
byrow is distributed across columns and that all
process rows contain the same copy of byrow.
Syntax
void pslared2d (MKL_INT *n , MKL_INT *ia , MKL_INT *ja , MKL_INT *desc , float *byrow ,
float *byall , float *work , MKL_INT *lwork );
void pdlared2d (MKL_INT *n , MKL_INT *ia , MKL_INT *ja , MKL_INT *desc , double
*byrow , double *byall , double *work , MKL_INT *lwork );
Include Files
• mkl_scalapack.h
Description
The p?lared2d function redistributes a 1D array. It assumes that the input array byrow is distributed across
columns and that all process rows contain the same copy of byrow. The output array byall will be identical
on all processes and will contain the entire array.
Input Parameters
np = Number of local rows in byrow()
n (global)
The number of elements to be redistributed. n≥ 0.
desc (local) array of size dlen_. A 2D array descriptor, which describes byrow.
byrow (local).
Distributed block cyclic array of global size n and of local size np.
byrow is distributed across the process columns. All process rows
are assumed to contain the same value.
work (local).
size lwork. Used to hold the buffers sent from one process to another.
lwork (local) The size of the work array. lwork ≥ numroc(n, desc[nb_], 0, 0,
npcol).
Output Parameters
byall (global).
Global size n, local size n. byall is exactly duplicated on all processes. It
contains the same values as byrow, but it is replicated across all processes
rather than being distributed.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?larf
Applies an elementary reflector to a general
rectangular matrix.
Syntax
void pslarf (char *side , MKL_INT *m , MKL_INT *n , float *v , MKL_INT *iv , MKL_INT
*jv , MKL_INT *descv , MKL_INT *incv , float *tau , float *c , MKL_INT *ic , MKL_INT
*jc , MKL_INT *descc , float *work );
void pdlarf (char *side , MKL_INT *m , MKL_INT *n , double *v , MKL_INT *iv , MKL_INT
*jv , MKL_INT *descv , MKL_INT *incv , double *tau , double *c , MKL_INT *ic , MKL_INT
*jc , MKL_INT *descc , double *work );
void pclarf (char *side , MKL_INT *m , MKL_INT *n , MKL_Complex8 *v , MKL_INT *iv ,
MKL_INT *jv , MKL_INT *descv , MKL_INT *incv , MKL_Complex8 *tau , MKL_Complex8 *c ,
MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex8 *work );
void pzlarf (char *side , MKL_INT *m , MKL_INT *n , MKL_Complex16 *v , MKL_INT *iv ,
MKL_INT *jv , MKL_INT *descv , MKL_INT *incv , MKL_Complex16 *tau , MKL_Complex16 *c ,
MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex16 *work );
Include Files
• mkl_scalapack.h
Description
The p?larf function applies a real/complex elementary reflector Q (or Q^T) to a real/complex m-by-n
distributed matrix sub(C) = C(ic:ic+m-1, jc:jc+n-1), from either the left or the right. Q is represented
in the form
Q = I-tau*v*v',
where tau is a real/complex scalar and v is a real/complex vector.
If tau = 0, then Q is taken to be the unit matrix.
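For the real case with side = 'L', the product Q*C with Q = I-tau*v*v' can be formed without building Q explicitly. The serial sketch below uses a hypothetical helper apply_reflector_left on a non-distributed column-major matrix; the caller supplies a work array of n elements.

```c
#include <assert.h>
#include <stddef.h>

/* Overwrite the m-by-n column-major matrix C with (I - tau*v*v')*C.
 * v has m elements, work has n elements. Serial sketch only. */
void apply_reflector_left(int m, int n, const double *v, double tau,
                          double *c, int ldc, double *work)
{
    assert(m >= 0 && n >= 0 && ldc >= (m > 0 ? m : 1));
    if (tau == 0.0)
        return;                      /* Q is the unit matrix */

    /* work = C' * v */
    for (int j = 0; j < n; ++j) {
        double w = 0.0;
        for (int i = 0; i < m; ++i)
            w += c[i + (size_t)j * ldc] * v[i];
        work[j] = w;
    }
    /* C = C - tau * v * work' */
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i)
            c[i + (size_t)j * ldc] -= tau * v[i] * work[j];
}
```

With v = (1, 1) and tau = 1, Q is the orthogonal matrix that swaps and negates the two rows, which makes a convenient hand check.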
Input Parameters
side (global).
= 'L': form Q*sub(C),
m (global)
The number of rows in the distributed submatrix sub(C). (m ≥ 0).
n (global)
The number of columns in the distributed submatrix sub(C). (n ≥ 0).
v (local).
Pointer into the local memory to an array of size lld_v * LOCc(n_v),
containing the local pieces of the global distributed matrix V representing
the Householder transformation Q,
V(iv:iv+m-1, jv) if side = 'L' and incv = 1,
iv, jv (global) The row and column indices in the global matrix V indicating the
first row and the first column of the matrix sub(V), respectively.
descv (global and local) array of size dlen_. The array descriptor for the
distributed matrix V.
incv (global)
The global increment for the elements of V. Only two values of incv are
supported in this version, namely 1 and m_v.
incv must not be zero.
tau (local).
Array of size LOCc(jv) if incv = 1, and LOCr(iv) otherwise. This array
contains the Householder scalars related to the Householder vectors.
tau is tied to the distributed matrix V.
c (local).
Pointer into the local memory to an array of size lld_c * LOCc(jc+n-1),
containing the local pieces of sub(C).
ic, jc (global)
The row and column indices in the global matrix C indicating the first row
and the first column of the matrix sub(C), respectively.
descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.
work (local).
if ivcol = iccol,
lwork≥nqc0
else
lwork≥mpc0 + max( 1, nqc0 )
end if
else if side = 'R' ,
icoffc,nb_v,0,0,npcol),nb_v,0,0,lcmq ) )
end if
else if incv = m_v,
if side = 'L',
if ivrow = icrow,
lwork≥mpc0
else
lwork≥nqc0 + max( 1, mpc0 )
end if
end if
end if,
where lcm is the least common multiple of nprow and npcol and lcm =
ilcm( nprow, npcol ), lcmp = lcm/nprow, lcmq = lcm/npcol,
iroffc = mod( ic-1, mb_c ), icoffc = mod( jc-1, nb_c ),
icrow = indxg2p( ic, mb_c, myrow, rsrc_c, nprow ),
iccol = indxg2p( jc, nb_c, mycol, csrc_c, npcol ),
mpc0 = numroc( m+iroffc, mb_c, myrow, icrow, nprow ),
nqc0 = numroc( n+icoffc, nb_c, mycol, iccol, npcol ),
ilcm, indxg2p, and numroc are ScaLAPACK tool functions; myrow, mycol,
nprow, and npcol can be determined by calling the function
blacs_gridinfo.
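The auxiliary quantities defined above can be evaluated as follows. This is a sketch only: numroc_sketch, indxg2p_sketch and ilcm_sketch are plain C re-implementations of the usual formulas behind the tool functions of the same names, not calls into the library, and the struct larf_sizes is hypothetical. The grid coordinates are passed in directly instead of being obtained from blacs_gridinfo.

```c
#include <assert.h>

int numroc_sketch(int n, int nb, int iproc, int isrcproc, int nprocs)
{
    assert(n >= 0 && nb > 0 && nprocs > 0);
    int mydist = (nprocs + iproc - isrcproc) % nprocs;
    int nblocks = n / nb;
    int count = (nblocks / nprocs) * nb;
    int extrablks = nblocks % nprocs;
    if (mydist < extrablks)
        count += nb;
    else if (mydist == extrablks)
        count += n % nb;
    return count;
}

/* Process coordinate owning 1-based global index indxglob. */
int indxg2p_sketch(int indxglob, int nb, int isrcproc, int nprocs)
{
    assert(indxglob >= 1 && nb > 0 && nprocs > 0);
    return (isrcproc + (indxglob - 1) / nb) % nprocs;
}

int ilcm_sketch(int a, int b)
{
    int x = a, y = b;
    while (y != 0) {               /* Euclid: x ends as gcd(a, b) */
        int t = x % y;
        x = y;
        y = t;
    }
    return a / x * b;
}

typedef struct {
    int iroffc, icoffc, icrow, iccol, mpc0, nqc0, lcm, lcmp, lcmq;
} larf_sizes;

larf_sizes larf_work_quantities(int m, int n, int ic, int jc,
                                int mb_c, int nb_c, int rsrc_c, int csrc_c,
                                int myrow, int mycol, int nprow, int npcol)
{
    larf_sizes s;
    s.lcm = ilcm_sketch(nprow, npcol);
    s.lcmp = s.lcm / nprow;
    s.lcmq = s.lcm / npcol;
    s.iroffc = (ic - 1) % mb_c;
    s.icoffc = (jc - 1) % nb_c;
    s.icrow = indxg2p_sketch(ic, mb_c, rsrc_c, nprow);
    s.iccol = indxg2p_sketch(jc, nb_c, csrc_c, npcol);
    s.mpc0 = numroc_sketch(m + s.iroffc, mb_c, myrow, s.icrow, nprow);
    s.nqc0 = numroc_sketch(n + s.icoffc, nb_c, mycol, s.iccol, npcol);
    return s;
}
```

The resulting mpc0 and nqc0 are the values that enter the lwork bounds for the applicable side and incv case.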
Output Parameters
c (local).
On exit, sub(C) is overwritten by Q*sub(C) if side = 'L',
or sub(C) * Q if side = 'R'.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?larfb
Applies a block reflector or its transpose/conjugate-
transpose to a general rectangular matrix.
Syntax
void pslarfb (char *side , char *trans , char *direct , char *storev , MKL_INT *m ,
MKL_INT *n , MKL_INT *k , float *v , MKL_INT *iv , MKL_INT *jv , MKL_INT *descv , float
*t , float *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , float *work );
void pdlarfb (char *side , char *trans , char *direct , char *storev , MKL_INT *m ,
MKL_INT *n , MKL_INT *k , double *v , MKL_INT *iv , MKL_INT *jv , MKL_INT *descv ,
double *t , double *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , double *work );
void pclarfb (char *side , char *trans , char *direct , char *storev , MKL_INT *m ,
MKL_INT *n , MKL_INT *k , MKL_Complex8 *v , MKL_INT *iv , MKL_INT *jv , MKL_INT *descv ,
MKL_Complex8 *t , MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc ,
MKL_Complex8 *work );
void pzlarfb (char *side , char *trans , char *direct , char *storev , MKL_INT *m ,
MKL_INT *n , MKL_INT *k , MKL_Complex16 *v , MKL_INT *iv , MKL_INT *jv , MKL_INT
*descv , MKL_Complex16 *t , MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT
*descc , MKL_Complex16 *work );
Include Files
• mkl_scalapack.h
Description
The p?larfb function applies a real/complex block reflector Q or its transpose Q^T/conjugate transpose Q^H to a
real/complex distributed m-by-n matrix sub(C) = C(ic:ic+m-1, jc:jc+n-1) from the left or the right.
Input Parameters
side (global)
if side = 'L': apply Q or Q^T for real flavors (Q^H for complex flavors) from
the Left;
if side = 'R': apply Q or Q^T for real flavors (Q^H for complex flavors) from
the Right.
trans (global)
if trans = 'N': no transpose, apply Q;
storev (global)
Indicates how the vectors that define the elementary reflectors are stored:
if storev = 'C': Columnwise
m (global)
The number of rows in the distributed matrix sub(C). (m ≥ 0).
n (global)
The number of columns in the distributed matrix sub(C). (n ≥ 0).
k (global)
The order of the matrix T.
v (local).
Pointer into the local memory to an array of size
iv, jv (global)
The row and column indices in the global matrix V indicating the first row
and the first column of the matrix sub(V), respectively.
descv (global and local) array of size dlen_. The array descriptor for the
distributed matrix V.
c (local).
Pointer into the local memory to an array of size lld_c * LOCc(jc+n-1),
containing the local pieces of sub(C).
ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the matrix sub(C), respectively.
descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.
work (local).
Workspace array of size lwork.
If storev = 'C',
if side = 'L',
else if side = 'R',
if side = 'L' ,
ilcm, indxg2p, and numroc are ScaLAPACK tool functions; myrow, mycol,
nprow, and npcol can be determined by calling the function
blacs_gridinfo.
Output Parameters
t (local).
Array of size mb_v * mb_v if storev = 'R', and nb_v * nb_v if storev =
'C'. t holds the triangular matrix T in the representation of the block reflector.
c (local).
On exit, sub(C) is overwritten by the Q*sub(C), or Q'*sub(C), or
sub(C)*Q, or sub(C)*Q'. Q' is transpose (conjugate transpose) of Q.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?larfc
Applies the conjugate transpose of an elementary
reflector to a general matrix.
Syntax
void pclarfc (char *side , MKL_INT *m , MKL_INT *n , MKL_Complex8 *v , MKL_INT *iv ,
MKL_INT *jv , MKL_INT *descv , MKL_INT *incv , MKL_Complex8 *tau , MKL_Complex8 *c ,
MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex8 *work );
void pzlarfc (char *side , MKL_INT *m , MKL_INT *n , MKL_Complex16 *v , MKL_INT *iv ,
MKL_INT *jv , MKL_INT *descv , MKL_INT *incv , MKL_Complex16 *tau , MKL_Complex16 *c ,
MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex16 *work );
Include Files
• mkl_scalapack.h
Description
The p?larfc function applies a complex elementary reflector Q^H to a complex m-by-n distributed matrix
sub(C) = C(ic:ic+m-1, jc:jc+n-1), from either the left or the right. Q is represented in the form
Q = I - tau*v*v',
where tau is a complex scalar and v is a complex vector.
If tau = 0, then Q is taken to be the unit matrix.
Input Parameters
side (global)
if side = 'L': form Q^H*sub(C);
m (global)
The number of rows in the distributed matrix sub(C). (m ≥ 0).
n (global)
The number of columns in the distributed matrix sub(C). (n ≥ 0).
v (local).
Pointer into the local memory to an array of size lld_v * LOCc(n_v),
containing the local pieces of the global distributed matrix V representing
the Householder transformation Q,
V(iv:iv+m-1, jv) if side = 'L' and incv = 1,
iv, jv (global)
The row and column indices in the global matrix V indicating the first row
and the first column of the matrix sub(V), respectively.
descv (global and local) array of size dlen_. The array descriptor for the
distributed matrix V.
incv (global)
The global increment for the elements of v. Only two values of incv are
supported in this version, namely 1 and m_v.
incv must not be zero.
tau (local)
Array of size LOCc(jv) if incv = 1, and LOCr(iv) otherwise. This array
contains the Householder scalars related to the Householder vectors.
tau is tied to the distributed matrix V.
c (local).
Pointer into the local memory to an array of size lld_c * LOCc(jc+n-1),
containing the local pieces of sub(C).
ic, jc (global)
The row and column indices in the global matrix C indicating the first row
and the first column of the matrix sub(C), respectively.
descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.
work (local).
Workspace array of size lwork.
If incv = 1,
if side = 'L' ,
if ivcol = iccol,
lwork ≥ nqc0
else
lwork ≥ mpc0 + max( 1, nqc0 )
end if
else if side = 'R',
n+icoffc,nb_v,0,0,npcol ), nb_v,0,0,lcmq ) )
end if
else if incv = m_v,
if side = 'L',
m+iroffc,mb_v,0,0,nprow ),mb_v,0,0,lcmp ) )
if ivrow = icrow,
lwork ≥ mpc0
else
lwork ≥ nqc0 + max( 1, mpc0 )
end if
end if
end if,
where lcm is the least common multiple of nprow and npcol and lcm =
ilcm(nprow, npcol),
lcmp = lcm/nprow, lcmq = lcm/npcol,
iroffc = mod(ic-1, mb_c), icoffc = mod(jc-1, nb_c),
icrow = indxg2p(ic, mb_c, myrow, rsrc_c, nprow),
iccol = indxg2p(jc, nb_c, mycol, csrc_c, npcol),
mpc0 = numroc(m+iroffc, mb_c, myrow, icrow, nprow),
nqc0 = numroc(n+icoffc, nb_c, mycol, iccol, npcol),
ilcm, indxg2p, and numroc are ScaLAPACK tool functions; myrow, mycol,
nprow, and npcol can be determined by calling the function
blacs_gridinfo.
Output Parameters
c (local).
On exit, sub(C) is overwritten by Q^H*sub(C) if side = 'L', or by
sub(C)*Q^H if side = 'R'.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?larfg
Generates an elementary reflector (Householder
matrix).
Syntax
void pslarfg (MKL_INT *n , float *alpha , MKL_INT *iax , MKL_INT *jax , float *x ,
MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , MKL_INT *incx , float *tau );
void pdlarfg (MKL_INT *n , double *alpha , MKL_INT *iax , MKL_INT *jax , double *x ,
MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , MKL_INT *incx , double *tau );
void pclarfg (MKL_INT *n , MKL_Complex8 *alpha , MKL_INT *iax , MKL_INT *jax ,
MKL_Complex8 *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , MKL_INT *incx ,
MKL_Complex8 *tau );
void pzlarfg (MKL_INT *n , MKL_Complex16 *alpha , MKL_INT *iax , MKL_INT *jax ,
MKL_Complex16 *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , MKL_INT *incx ,
MKL_Complex16 *tau );
Include Files
• mkl_scalapack.h
Description
The p?larfg function generates a real/complex elementary reflector H of order n, such that
H * ( X(iax, jax); sub(X) ) = ( alpha; 0 ), H' * H = I,
where ( a; b ) denotes a column vector with a stacked on top of b, alpha is a scalar (a real scalar for complex flavors), and sub(X) is an (n-1)-element real/complex
distributed vector X(ix:ix+n-2, jx) if incx = 1 and X(ix, jx:jx+n-2) if incx = m_x. H is represented
in the form
H = I - tau * ( 1; v ) * ( 1 v' ),
where tau is a real/complex scalar and v is a real/complex (n-1)-element vector. Note that H is not
Hermitian.
If the elements of sub(X) are all zero (and X(iax, jax) is real for complex flavors), then tau = 0 and H is
taken to be the unit matrix.
Otherwise 1 ≤ real(tau) ≤ 2 and abs(tau-1) ≤ 1.
Input Parameters
n (global)
The global order of the elementary reflector. n ≥ 0.
x (local).
Pointer into the local memory to an array of size lld_x * LOCc(n_x). This
array contains the local pieces of the distributed vector sub(X). Before
entry, the incremented array sub(X) must contain vector x.
ix, jx (global)
The row and column indices in the global matrix X indicating the first row
and the first column of sub(X), respectively.
incx (global)
The global increment for the elements of x. Only two values of incx are
supported in this version, namely 1 and m_x. incx must not be zero.
Output Parameters
alpha (local)
On exit, alpha is computed in the process scope having the vector sub(X).
x (local).
On exit, it is overwritten with the vector v.
tau (local).
Array of size LOCc(jx) if incx = 1, and LOCr(ix) otherwise. This array
contains the Householder scalars related to the Householder vectors.
tau is tied to the distributed matrix X.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?larft
Forms the triangular factor T of a block reflector
H = I - V*T*V^H.
Syntax
void pslarft (char *direct , char *storev , MKL_INT *n , MKL_INT *k , float *v , MKL_INT
*iv , MKL_INT *jv , MKL_INT *descv , float *tau , float *t , float *work );
void pdlarft (char *direct , char *storev , MKL_INT *n , MKL_INT *k , double *v ,
MKL_INT *iv , MKL_INT *jv , MKL_INT *descv , double *tau , double *t , double *work );
void pclarft (char *direct , char *storev , MKL_INT *n , MKL_INT *k , MKL_Complex8 *v ,
MKL_INT *iv , MKL_INT *jv , MKL_INT *descv , MKL_Complex8 *tau , MKL_Complex8 *t ,
MKL_Complex8 *work );
void pzlarft (char *direct , char *storev , MKL_INT *n , MKL_INT *k , MKL_Complex16
*v , MKL_INT *iv , MKL_INT *jv , MKL_INT *descv , MKL_Complex16 *tau , MKL_Complex16
*t , MKL_Complex16 *work );
Include Files
• mkl_scalapack.h
Description
The p?larft function forms the triangular factor T of a real/complex block reflector H of order n, which is
defined as a product of k elementary reflectors.
If direct = 'F', H = H(1)*H(2)*...*H(k), and T is upper triangular;
If storev = 'C', the vector which defines the elementary reflector H(i) is stored in the i-th column of the
distributed matrix V, and
H = I-V*T*V'
If storev = 'R', the vector which defines the elementary reflector H(i) is stored in the i-th row of the
distributed matrix V, and
H = I-V'*T*V.
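The relation H = H(1)*H(2)*...*H(k) = I - V*T*V' for direct = 'F' and storev = 'C' can be illustrated with a serial, real-valued sketch. The helper name and the plain column-major layout are assumptions made for illustration; this is not the distributed routine. It assumes V is unit lower trapezoidal, so the ones on the diagonal and the zeros above it are implied and never read.

```c
/* Hypothetical serial helper (forward, columnwise case only).
 * v   : n-by-k, column-major, leading dimension ldv, unit lower trapezoidal;
 *       entries on and above the diagonal are implied and not read.
 * tau : the k Householder scalars.
 * t   : k-by-k, column-major, leading dimension ldt; only the upper
 *       triangle is written. */
static void form_t_forward_columnwise(int n, int k, const double *v, int ldv,
                                      const double *tau, double *t, int ldt)
{
    for (int i = 0; i < k; ++i) {
        /* t(0:i-1, i) = -tau(i) * V(i:n-1, 0:i-1)' * V(i:n-1, i) */
        for (int j = 0; j < i; ++j) {
            double dot = v[i + j * ldv];      /* row i, where V(i,i) = 1 */
            for (int r = i + 1; r < n; ++r)
                dot += v[r + j * ldv] * v[r + i * ldv];
            t[j + i * ldt] = -tau[i] * dot;
        }
        /* t(0:i-1, i) = T(0:i-1, 0:i-1) * t(0:i-1, i); ascending j is safe
         * in place because row j only needs entries j..i-1 of the column. */
        for (int j = 0; j < i; ++j) {
            double s = 0.0;
            for (int c = j; c < i; ++c)
                s += t[j + c * ldt] * t[c + i * ldt];
            t[j + i * ldt] = s;
        }
        t[i + i * ldt] = tau[i];
    }
}
```

With two reflectors, the only off-diagonal entry is T(1,2) = -tau(1)*tau(2)*(v1'*v2), which is what expanding the product H(1)*H(2) gives.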
Input Parameters
direct (global)
Specifies the order in which the elementary reflectors are multiplied to form
the block reflector:
if direct = 'F': H = H(1)*H(2)*...*H(k) (forward)
storev (global)
Specifies how the vectors that define the elementary reflectors are stored
(See Application Notes below):
if storev = 'C': columnwise;
n (global)
The order of the block reflector H. n ≥ 0.
k (global)
The order of the triangular factor T, equal to the number of elementary
reflectors.
1 ≤ k ≤ mb_v (= nb_v).
iv, jv (global)
The row and column indices in the global matrix V indicating the first row
and the first column of the matrix sub(V), respectively.
descv (local) array of size dlen_. The array descriptor for the distributed matrix V.
tau (local)
Array of size LOCr(iv+k-1) if incv = m_v, and LOCc(jv+k-1) otherwise.
This array contains the Householder scalars related to the Householder
vectors.
tau is tied to the distributed matrix V.
work (local).
Workspace array of size k*(k-1)/2.
Output Parameters
v
t (local)
Array of size nb_v * nb_v if storev = 'C', and mb_v * mb_v otherwise. It
contains the k-by-k triangular factor of the block reflector associated with v.
If direct = 'F', t is upper triangular;
Application Notes
The shape of the matrix V and the storage of the vectors that define the H(i) is best illustrated by the
following example with n = 5 and k = 3. The elements equal to 1 are not stored; the corresponding array
elements are modified but restored on exit. The rest of the array is not used.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?larz
Applies an elementary reflector as returned by
p?tzrzf to a general matrix.
Syntax
void pslarz (char *side , MKL_INT *m , MKL_INT *n , MKL_INT *l , float *v , MKL_INT
*iv , MKL_INT *jv , MKL_INT *descv , MKL_INT *incv , float *tau , float *c , MKL_INT
*ic , MKL_INT *jc , MKL_INT *descc , float *work );
void pdlarz (char *side , MKL_INT *m , MKL_INT *n , MKL_INT *l , double *v , MKL_INT
*iv , MKL_INT *jv , MKL_INT *descv , MKL_INT *incv , double *tau , double *c , MKL_INT
*ic , MKL_INT *jc , MKL_INT *descc , double *work );
void pclarz (char *side , MKL_INT *m , MKL_INT *n , MKL_INT *l , MKL_Complex8 *v ,
MKL_INT *iv , MKL_INT *jv , MKL_INT *descv , MKL_INT *incv , MKL_Complex8 *tau ,
MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex8 *work );
void pzlarz (char *side , MKL_INT *m , MKL_INT *n , MKL_INT *l , MKL_Complex16 *v ,
MKL_INT *iv , MKL_INT *jv , MKL_INT *descv , MKL_INT *incv , MKL_Complex16 *tau ,
MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex16 *work );
Include Files
• mkl_scalapack.h
Description
The p?larz function applies a real/complex elementary reflector Q (or Q^T) to a real/complex m-by-n
distributed matrix sub(C) = C(ic:ic+m-1, jc:jc+n-1), from either the left or the right. Q is represented
in the form
Q = I-tau*v*v',
where tau is a real/complex scalar and v is a real/complex vector.
If tau = 0, then Q is taken to be the unit matrix.
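The effect of Q = I - tau*v*v' applied from the left can be sketched serially as C := C - tau*v*(v'*C). The helper below is a hypothetical real-valued illustration on a local column-major array; it treats v as a dense vector and does not exploit the special structure of reflectors returned by p?tzrzf, nor does it handle any data distribution.

```c
/* Hypothetical serial helper: C := (I - tau*v*v') * C for an m-by-n
 * column-major matrix c with leading dimension ldc.
 * work must provide room for n values. */
static void apply_reflector_left(int m, int n, const double *v, double tau,
                                 double *c, int ldc, double *work)
{
    if (tau == 0.0)
        return;                         /* Q is the unit matrix */

    /* work(j) = v' * C(:, j) */
    for (int j = 0; j < n; ++j) {
        double w = 0.0;
        for (int i = 0; i < m; ++i)
            w += v[i] * c[i + j * ldc];
        work[j] = w;
    }

    /* C(:, j) -= tau * v * work(j) */
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i)
            c[i + j * ldc] -= tau * v[i] * work[j];
}
```

For example, with v = (1, 0.5) and tau = 1.6, the column (3, 4) is mapped to approximately (-5, 0).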
Input Parameters
side (global)
if side = 'L': form Q*sub(C),
m (global)
The number of rows in the distributed matrix sub(C). (m ≥ 0).
n (global)
The number of columns in the distributed matrix sub(C). (n ≥ 0).
l (global)
The columns of the distributed matrix sub(A) containing the meaningful
part of the Householder reflectors. If side = 'L', m ≥ l ≥ 0,
if side = 'R', n ≥ l ≥ 0.
v (local).
Pointer into the local memory to an array of size lld_v * LOCc(n_v)
containing the local pieces of the global distributed matrix V representing
the Householder transformation Q,
V(iv:iv+l-1, jv) if side = 'L' and incv = 1,
iv, jv (global) The row and column indices in the global distributed matrix V
indicating the first row and the first column of the matrix sub(V),
respectively.
descv (global and local) array of size dlen_. The array descriptor for the
distributed matrix V.
incv (global)
The global increment for the elements of V. Only two values of incv are
supported in this version, namely 1 and m_v.
incv must not be zero.
tau (local)
Array of size LOCc(jv) if incv = 1, and LOCr(iv) otherwise. This array
contains the Householder scalars related to the Householder vectors.
tau is tied to the distributed matrix V.
c (local).
Pointer into the local memory to an array of size lld_c * LOCc(jc+n-1),
containing the local pieces of sub(C).
ic, jc (global)
The row and column indices in the global matrix C indicating the first row
and the first column of the matrix sub(C), respectively.
descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.
work (local).
Array of size lwork.
If incv = 1,
if side = 'L' ,
if ivcol = iccol,
lwork ≥ NqC0
else
lwork ≥ MpC0 + max(1, NqC0)
end if
else if side = 'R' ,
if ivrow = icrow,
lwork ≥ MpC0
else
lwork ≥ NqC0 + max(1, MpC0)
end if
end if
end if.
Here lcm is the least common multiple of nprow and npcol and
lcm = ilcm( nprow, npcol ), lcmp = lcm / nprow,
ilcm, indxg2p, and numroc are ScaLAPACK tool functions; myrow, mycol,
nprow, and npcol can be determined by calling the function
blacs_gridinfo.
Output Parameters
c (local).
On exit, sub(C) is overwritten by the Q*sub(C) if side = 'L', or
sub(C)*Q if side = 'R'.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?larzb
Applies a block reflector or its transpose/conjugate-
transpose as returned by p?tzrzf to a general
matrix.
Syntax
void pslarzb (char *side , char *trans , char *direct , char *storev , MKL_INT *m ,
MKL_INT *n , MKL_INT *k , MKL_INT *l , float *v , MKL_INT *iv , MKL_INT *jv , MKL_INT
*descv , float *t , float *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , float
*work );
void pdlarzb (char *side , char *trans , char *direct , char *storev , MKL_INT *m ,
MKL_INT *n , MKL_INT *k , MKL_INT *l , double *v , MKL_INT *iv , MKL_INT *jv , MKL_INT
*descv , double *t , double *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , double
*work );
void pclarzb (char *side , char *trans , char *direct , char *storev , MKL_INT *m ,
MKL_INT *n , MKL_INT *k , MKL_INT *l , MKL_Complex8 *v , MKL_INT *iv , MKL_INT *jv ,
MKL_INT *descv , MKL_Complex8 *t , MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc ,
MKL_INT *descc , MKL_Complex8 *work );
void pzlarzb (char *side , char *trans , char *direct , char *storev , MKL_INT *m ,
MKL_INT *n , MKL_INT *k , MKL_INT *l , MKL_Complex16 *v , MKL_INT *iv , MKL_INT *jv ,
MKL_INT *descv , MKL_Complex16 *t , MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc ,
MKL_INT *descc , MKL_Complex16 *work );
Include Files
• mkl_scalapack.h
Description
The p?larzb function applies a real/complex block reflector Q or its transpose Q^T (conjugate transpose Q^H for
complex flavors) to a real/complex distributed m-by-n matrix sub(C) = C(ic:ic+m-1, jc:jc+n-1) from the
left or the right.
Q is a product of k elementary reflectors as returned by p?tzrzf.
Input Parameters
side (global)
if side = 'L': apply Q or Q^T (Q^H for complex flavors) from the Left;
if side = 'R': apply Q or Q^T (Q^H for complex flavors) from the Right.
trans (global)
if trans = 'N': No transpose, apply Q;
direct (global)
Indicates how H is formed from a product of elementary reflectors.
if direct = 'F': H = H(1)*H(2)*...*H(k) - forward (not supported);
storev (global)
Indicates how the vectors that define the elementary reflectors are stored:
if storev = 'C': columnwise (not supported).
m (global)
The number of rows in the distributed submatrix sub(C). (m ≥ 0).
n (global)
The number of columns in the distributed submatrix sub(C). (n ≥ 0).
k (global)
The order of the matrix T. (= the number of elementary reflectors whose
product defines the block reflector).
l (global)
The columns of the distributed submatrix sub(A) containing the meaningful
part of the Householder reflectors.
If side = 'L', m ≥ l ≥ 0,
if side = 'R', n ≥ l ≥ 0.
v (local).
Pointer into the local memory to an array of size lld_v * LOCc(jv+m-1) if
side = 'L', lld_v * LOCc(jv+n-1) if side = 'R'.
It contains the local pieces of the distributed vectors V representing the
Householder transformation as returned by p?tzrzf.
lld_v ≥ LOCr(iv+k-1).
iv, jv (global)
The row and column indices in the global matrix V indicating the first row
and the first column of the submatrix sub(V), respectively.
descv (global and local) array of size dlen_. The array descriptor for the
distributed matrix V.
t (local)
Array of size mb_v * mb_v.
The lower triangular matrix T in the representation of the block reflector.
c (local).
Pointer into the local memory to an array of size lld_c * LOCc(jc+n-1).
ic, jc (global)
The row and column indices in the global matrix C indicating the first row
and the first column of the submatrix sub(C), respectively.
descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.
work (local).
If storev = 'C' ,
if side = 'L' ,
Output Parameters
c (local).
On exit, sub(C) is overwritten by the Q*sub(C), or Q'*sub(C), or
sub(C)*Q, or sub(C)*Q', where Q' is the transpose (conjugate transpose)
of Q.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?larzc
Applies (multiplies by) the conjugate transpose of an
elementary reflector as returned by p?tzrzf to a
general matrix.
Syntax
void pclarzc (char *side , MKL_INT *m , MKL_INT *n , MKL_INT *l , MKL_Complex8 *v ,
MKL_INT *iv , MKL_INT *jv , MKL_INT *descv , MKL_INT *incv , MKL_Complex8 *tau ,
MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex8 *work );
void pzlarzc (char *side , MKL_INT *m , MKL_INT *n , MKL_INT *l , MKL_Complex16 *v ,
MKL_INT *iv , MKL_INT *jv , MKL_INT *descv , MKL_INT *incv , MKL_Complex16 *tau ,
MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex16 *work );
Include Files
• mkl_scalapack.h
Description
The p?larzc function applies a complex elementary reflector Q^H to a complex m-by-n distributed matrix
sub(C) = C(ic:ic+m-1, jc:jc+n-1), from either the left or the right. Q is represented in the form
Q = I - tau*v*v',
where tau is a complex scalar and v is a complex vector.
If tau = 0, then Q is taken to be the unit matrix.
Input Parameters
side (global)
if side = 'L': form Q^H*sub(C);
m (global)
The number of rows in the distributed matrix sub(C). (m ≥ 0).
n (global)
The number of columns in the distributed matrix sub(C). (n ≥ 0).
l (global)
The columns of the distributed matrix sub(A) containing the meaningful
part of the Householder reflectors.
If side = 'L', m ≥ l ≥ 0,
if side = 'R', n ≥ l ≥ 0.
v (local).
iv, jv (global)
The row and column indices in the global matrix V indicating the first row
and the first column of the matrix sub(V), respectively.
descv (global and local) array of size dlen_. The array descriptor for the
distributed matrix V.
incv (global)
The global increment for the elements of V. Only two values of incv are
supported in this version, namely 1 and m_v.
incv must not be zero.
tau (local)
Array of size LOCc(jv) if incv = 1, and LOCr(iv) otherwise. This array
contains the Householder scalars related to the Householder vectors.
tau is tied to the distributed matrix V.
c (local).
Pointer into the local memory to an array of size lld_c * LOCc(jc+n-1),
containing the local pieces of sub(C).
ic, jc (global)
The row and column indices in the global matrix C indicating the first row
and the first column of the matrix sub(C), respectively.
descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.
work (local).
If incv = 1,
if side = 'L' ,
if ivcol = iccol,
lwork ≥ nqc0
else
lwork ≥ mpc0 + max(1, nqc0)
end if
else if side = 'R' ,
lwork ≥ nqc0 + max(max(1, mpc0), numroc(numroc(n+icoffc, nb_v,
0, 0, npcol),
nb_v, 0, 0, lcmq))
end if
else if incv = m_v,
if side = 'L' ,
lwork ≥ mpc0 + max(max(1, nqc0), numroc(numroc(m+iroffc, mb_v,
0, 0, nprow),
mb_v, 0, 0, lcmp))
else if side = 'R',
if ivrow = icrow,
lwork ≥ mpc0
else
lwork ≥ nqc0 + max(1, mpc0)
end if
end if
end if
Here lcm is the least common multiple of nprow and npcol;
lcm = ilcm(nprow, npcol), lcmp = lcm/nprow, lcmq = lcm/npcol,
iroffc = mod(ic-1, mb_c), icoffc = mod(jc-1, nb_c),
icrow = indxg2p(ic, mb_c, myrow, rsrc_c, nprow),
iccol = indxg2p(jc, nb_c, mycol, csrc_c, npcol),
mpc0 = numroc(m+iroffc, mb_c, myrow, icrow, nprow),
nqc0 = numroc(n+icoffc, nb_c, mycol, iccol, npcol),
ilcm, indxg2p, and numroc are ScaLAPACK tool functions;
myrow, mycol, nprow, and npcol can be determined by calling the function
blacs_gridinfo.
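The workspace bound above can be evaluated before allocating work. The sketch below uses plain-C stand-ins for the tool functions (numroc_ref, indxg2p_ref, ilcm_ref) that follow the usual block-cyclic definitions; these names, the larzc_dims structure, and larzc_lwork_ref are hypothetical and are not part of the library. The definitions of ivrow and ivcol are not given in the text above, so computing them from the V descriptor in the same way as icrow and iccol is an assumption.

```c
/* Stand-in for numroc: number of rows/columns owned by process iproc. */
static int numroc_ref(int n, int nb, int iproc, int isrcproc, int nprocs)
{
    int mydist = (nprocs + iproc - isrcproc) % nprocs;
    int nblocks = n / nb;
    int count = (nblocks / nprocs) * nb;
    int extra = nblocks % nprocs;
    if (mydist < extra) count += nb;
    else if (mydist == extra) count += n % nb;
    return count;
}

/* Stand-in for indxg2p: process owning 1-based global index indxglob. */
static int indxg2p_ref(int indxglob, int nb, int iproc, int isrcproc, int nprocs)
{
    (void)iproc;   /* kept only to mirror the five-argument form used above */
    return (isrcproc + (indxglob - 1) / nb) % nprocs;
}

/* Stand-in for ilcm: least common multiple of two positive integers. */
static int ilcm_ref(int a, int b)
{
    int x = a, y = b;
    while (y != 0) { int r = x % y; x = y; y = r; }
    return a / x * b;
}

static int imax(int a, int b) { return a > b ? a : b; }

typedef struct {
    int m, n;                         /* size of sub(C) */
    int ic, jc, iv, jv;               /* 1-based global start indices */
    int mb_c, nb_c, rsrc_c, csrc_c;   /* taken from descc */
    int mb_v, nb_v, rsrc_v, csrc_v;   /* taken from descv */
    int myrow, mycol, nprow, npcol;   /* as returned by blacs_gridinfo */
} larzc_dims;

/* Minimum lwork per the formula above; incv_is_one = 0 means incv = m_v. */
static int larzc_lwork_ref(char side, int incv_is_one, const larzc_dims *d)
{
    int lcm = ilcm_ref(d->nprow, d->npcol);
    int lcmp = lcm / d->nprow, lcmq = lcm / d->npcol;
    int iroffc = (d->ic - 1) % d->mb_c, icoffc = (d->jc - 1) % d->nb_c;
    int icrow = indxg2p_ref(d->ic, d->mb_c, d->myrow, d->rsrc_c, d->nprow);
    int iccol = indxg2p_ref(d->jc, d->nb_c, d->mycol, d->csrc_c, d->npcol);
    /* Assumption: ivrow/ivcol are derived from descv like icrow/iccol. */
    int ivrow = indxg2p_ref(d->iv, d->mb_v, d->myrow, d->rsrc_v, d->nprow);
    int ivcol = indxg2p_ref(d->jv, d->nb_v, d->mycol, d->csrc_v, d->npcol);
    int mpc0 = numroc_ref(d->m + iroffc, d->mb_c, d->myrow, icrow, d->nprow);
    int nqc0 = numroc_ref(d->n + icoffc, d->nb_c, d->mycol, iccol, d->npcol);

    if (incv_is_one) {
        if (side == 'L')
            return (ivcol == iccol) ? nqc0 : mpc0 + imax(1, nqc0);
        return nqc0 + imax(imax(1, mpc0),
            numroc_ref(numroc_ref(d->n + icoffc, d->nb_v, 0, 0, d->npcol),
                       d->nb_v, 0, 0, lcmq));
    }
    if (side == 'L')
        return mpc0 + imax(imax(1, nqc0),
            numroc_ref(numroc_ref(d->m + iroffc, d->mb_v, 0, 0, d->nprow),
                       d->mb_v, 0, 0, lcmp));
    return (ivrow == icrow) ? mpc0 : nqc0 + imax(1, mpc0);
}
```

In a real program the grid values would come from blacs_gridinfo and the block sizes from the descriptors; the structure here only gathers them in one place for the calculation.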
Output Parameters
c (local).
On exit, sub(C) is overwritten by Q^H*sub(C) if side = 'L', or by
sub(C)*Q^H if side = 'R'.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?larzt
Forms the triangular factor T of a block reflector
H = I - V*T*V^H as returned by p?tzrzf.
Syntax
void pslarzt (char *direct , char *storev , MKL_INT *n , MKL_INT *k , float *v , MKL_INT
*iv , MKL_INT *jv , MKL_INT *descv , float *tau , float *t , float *work );
void pdlarzt (char *direct , char *storev , MKL_INT *n , MKL_INT *k , double *v ,
MKL_INT *iv , MKL_INT *jv , MKL_INT *descv , double *tau , double *t , double *work );
void pclarzt (char *direct , char *storev , MKL_INT *n , MKL_INT *k , MKL_Complex8 *v ,
MKL_INT *iv , MKL_INT *jv , MKL_INT *descv , MKL_Complex8 *tau , MKL_Complex8 *t ,
MKL_Complex8 *work );
Include Files
• mkl_scalapack.h
Description
The p?larzt function forms the triangular factor T of a real/complex block reflector H of order greater than
n, which is defined as a product of k elementary reflectors as returned by p?tzrzf.
If storev = 'C', the vector that defines the elementary reflector H(i) is stored in the i-th column of the
array v, and
H = I - V*T*V'.
If storev = 'R', the vector that defines the elementary reflector H(i) is stored in the i-th row of the
array v, and
H = I - V'*T*V.
Currently, only storev = 'R' and direct = 'B' are supported.
Input Parameters
direct (global)
Specifies the order in which the elementary reflectors are multiplied to form
the block reflector:
if direct = 'F': H = H(1)*H(2)*...*H(k) (Forward, not supported)
storev (global)
Specifies how the vectors that define the elementary reflectors are
stored:
if storev = 'C': columnwise (not supported);
n (global)
The order of the block reflector H. n ≥ 0.
k (global)
The order of the triangular factor T (= the number of elementary
reflectors).
1 ≤ k ≤ mb_v (= nb_v).
iv, jv (global) The row and column indices in the global matrix V indicating the
first row and the first column of the matrix sub(V), respectively.
descv (local) array of size dlen_. The array descriptor for the distributed matrix V.
tau (local)
Array of size LOCr(iv+k-1) if incv = m_v, and LOCc(jv+k-1) otherwise.
This array contains the Householder scalars related to the Householder
vectors.
tau is tied to the distributed matrix V.
work (local).
Workspace array of size k*(k-1)/2.
Output Parameters
v
t (local)
Array of size mb_v * mb_v. It contains the k-by-k triangular factor of the
block reflector associated with v. t is lower triangular.
Application Notes
The shape of the matrix V and the storage of the vectors which define the H(i) is best illustrated by the
following example with n = 5 and k = 3. The elements equal to 1 are not stored; the corresponding array
elements are modified but restored on exit. The rest of the array is not used.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?lascl
Multiplies a general rectangular matrix by a real scalar
defined as Cto/Cfrom.
Syntax
void pslascl (char *type , float *cfrom , float *cto , MKL_INT *m , MKL_INT *n , float
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_INT *info );
void pdlascl (char *type , double *cfrom , double *cto , MKL_INT *m , MKL_INT *n ,
double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_INT *info );
void pclascl (char *type , float *cfrom , float *cto , MKL_INT *m , MKL_INT *n ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_INT *info );
void pzlascl (char *type , double *cfrom , double *cto , MKL_INT *m , MKL_INT *n ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?lascl function multiplies the m-by-n real/complex distributed matrix sub(A) denoting A(ia:ia+m-1,
ja:ja+n-1) by the real scalar cto/cfrom. This is done without over/underflow as long as the final
result cto*A(i,j)/cfrom does not over/underflow. type specifies that sub(A) may be full, upper triangular,
lower triangular or upper Hessenberg.
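One way to scale by cto/cfrom without intermediate overflow or underflow is to apply the factor in several safe steps. The following serial sketch for a full matrix follows the stepwise strategy familiar from serial LAPACK-style scaling routines; whether the distributed routine proceeds in the same way is an assumption, and the helper names are hypothetical. It assumes cfrom is nonzero and all values are finite.

```c
#include <float.h>

static double dabs(double x) { return x < 0.0 ? -x : x; }

/* Hypothetical serial helper: multiply the m-by-n column-major matrix a
 * (leading dimension lda) by cto/cfrom in overflow-safe steps.
 * Assumes cfrom != 0 and finite inputs. */
static void scale_full_ref(double cfrom, double cto, int m, int n,
                           double *a, int lda)
{
    const double smlnum = DBL_MIN;        /* smallest normalized number */
    const double bignum = 1.0 / smlnum;
    double cfromc = cfrom, ctoc = cto;
    int done;

    do {
        double cfrom1 = cfromc * smlnum;
        double cto1 = ctoc / bignum;
        double mul;

        if (dabs(cfrom1) > dabs(ctoc) && ctoc != 0.0) {
            mul = smlnum;                 /* ratio too small: step down */
            done = 0;
            cfromc = cfrom1;
        } else if (dabs(cto1) > dabs(cfromc)) {
            mul = bignum;                 /* ratio too large: step up */
            done = 0;
            ctoc = cto1;
        } else {
            mul = ctoc / cfromc;          /* remaining factor is safe */
            done = 1;
        }

        for (int j = 0; j < n; ++j)
            for (int i = 0; i < m; ++i)
                a[i + j * lda] *= mul;
    } while (!done);
}
```

With cfrom = 1e-300 and cto = 1e300 the direct ratio is not representable, yet an entry equal to 1e-300 is still scaled to about 1e300 because the factor is applied in two steps.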
Input Parameters
type (global)
type indicates the storage type of the input distributed matrix.
if type = 'G': sub(A) is a full matrix,
m (global)
The number of rows in the distributed matrix sub(A). (m≥0).
n (global)
The number of columns in the distributed matrix sub(A). (n≥0).
This array contains the local pieces of the distributed matrix sub(A).
ia, ja (global)
The row and column indices in the global matrix A indicating the first row
and column of the matrix sub(A), respectively.
Output Parameters
a (local).
On exit, this array contains the local pieces of the distributed matrix
multiplied by cto/cfrom.
info (local)
if info = 0: the execution is successful.
if info < 0: If the i-th argument is an array and the j-th entry, indexed
j-1, had an illegal value, then info = -(i*100+j),
if the i-th argument is a scalar and had an illegal value, then info = -i.
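The encoding of negative info values can be unpacked with a small helper. The sketch below is hypothetical (the type and function names are not part of the library) and simply inverts the rule info = -(i*100+j) for array arguments and info = -i for scalar arguments.

```c
/* Hypothetical helper type: which argument was illegal, and for array
 * arguments which 1-based entry j (stored at index j-1). entry == 0
 * means the argument was a scalar. arg == 0 means no error was encoded. */
typedef struct {
    int arg;
    int entry;
} info_location;

static info_location decode_info(int info)
{
    info_location loc = {0, 0};
    if (info >= 0)
        return loc;                 /* success: nothing to decode */

    int code = -info;
    if (code >= 100) {              /* array argument: -(i*100 + j) */
        loc.arg = code / 100;
        loc.entry = code % 100;
    } else {                        /* scalar argument: -i */
        loc.arg = code;
    }
    return loc;
}
```

For instance, info = -905 decodes to argument 9, entry 5; in the p?lascl argument list the ninth argument is desca, so this would point at desca[4].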
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?lase2
Initializes an m-by-n distributed matrix.
Syntax
void pslase2 (const char* uplo, const MKL_INT* m, const MKL_INT* n, const float* alpha,
const float* beta, float* a, const MKL_INT* ia, const MKL_INT* ja, const MKL_INT*
desca);
void pdlase2 (const char* uplo, const MKL_INT* m, const MKL_INT* n, const double*
alpha, const double* beta, double* a, const MKL_INT* ia, const MKL_INT* ja, const
MKL_INT* desca);
void pclase2 (const char* uplo, const MKL_INT* m, const MKL_INT* n, const MKL_Complex8*
alpha, const MKL_Complex8* beta, MKL_Complex8* a, const MKL_INT* ia, const MKL_INT* ja,
const MKL_INT* desca);
void pzlase2 (const char* uplo, const MKL_INT* m, const MKL_INT* n, const
MKL_Complex16* alpha, const MKL_Complex16* beta, MKL_Complex16* a, const MKL_INT* ia,
const MKL_INT* ja, const MKL_INT* desca);
Include Files
• mkl_scalapack.h
Description
p?lase2 initializes an m-by-n distributed matrix sub( A ) denoting A(ia:ia+m-1,ja:ja+n-1) to beta on the
diagonal and alpha on the off-diagonals. p?lase2 requires that only the dimension of the matrix operand is
distributed.
Input Parameters
uplo (global)
Specifies the part of the distributed matrix sub( A ) to be set:
= 'U': Upper triangular part is set; the strictly lower triangular part of
sub( A ) is not changed;
= 'L': Lower triangular part is set; the strictly upper triangular part of
sub( A ) is not changed;
Otherwise: All of the matrix sub( A ) is set.
m (global)
The number of rows to be operated on, that is, the number of rows of the
distributed submatrix sub( A ). m >= 0.
n (global)
The number of columns to be operated on, that is, the number of columns of the
distributed submatrix sub( A ). n >= 0.
alpha (global)
The constant to which the off-diagonal elements are to be set.
beta (global)
The constant to which the diagonal elements are to be set.
ia (global)
The row index in the global array a indicating the first row of sub( A ).
ja (global)
The column index in the global array a indicating the first column of
sub( A ).
Output Parameters
a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1).
This array contains the local pieces of the distributed matrix sub( A )
to be set.
On exit, the leading m-by-n submatrix sub( A ) is set as follows:
p?laset
Initializes the off-diagonal elements of a matrix to alpha
and the diagonal elements to beta.
Syntax
void pslaset (char *uplo , MKL_INT *m , MKL_INT *n , float *alpha , float *beta , float
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca );
void pdlaset (char *uplo , MKL_INT *m , MKL_INT *n , double *alpha , double *beta ,
double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca );
void pclaset (char *uplo , MKL_INT *m , MKL_INT *n , MKL_Complex8 *alpha , MKL_Complex8
*beta , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca );
void pzlaset (char *uplo , MKL_INT *m , MKL_INT *n , MKL_Complex16 *alpha ,
MKL_Complex16 *beta , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca );
Include Files
• mkl_scalapack.h
Description
The p?laset function initializes an m-by-n distributed matrix sub(A) denoting A(ia:ia+m-1, ja:ja+n-1) to
beta on the diagonal and alpha on the off-diagonals.
Input Parameters
uplo (global)
Specifies the part of the distributed matrix sub(A) to be set:
if uplo = 'U': upper triangular part; the strictly lower triangular part of
sub(A) is not changed;
if uplo = 'L': lower triangular part; the strictly upper triangular part of
sub(A) is not changed.
Otherwise: all of the matrix sub(A) is set.
m (global)
The number of rows in the distributed matrix sub(A). (m≥0).
n (global)
The number of columns in the distributed matrix sub(A). (n≥0).
alpha (global).
The constant to which the off-diagonal elements are to be set.
beta (global).
The constant to which the diagonal elements are to be set.
Output Parameters
a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).
This array contains the local pieces of the distributed matrix sub(A) to be
set. On exit, the leading m-by-n matrix sub(A) is set as follows:
if uplo = 'U', A(ia+i-1, ja+j-1) = alpha, 1≤i≤j-1, 1≤j≤n,
otherwise, A(ia+i-1, ja+j-1) = alpha, 1≤i≤m, 1≤j≤n, ia+i≠ja+j, and, for
all uplo, A(ia+i-1, ja+i-1) = beta, 1≤i≤min(m,n).
ia, ja (global)
The row and column indices in the distributed matrix A indicating the first
row and column of the matrix sub(A), respectively.
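The effect of uplo can be illustrated serially on a local column-major array. The helper below is a hypothetical sketch, not the distributed routine: indices are relative to the submatrix, so i == j marks its diagonal, and entries outside the selected part are left unchanged.

```c
/* Hypothetical serial helper: set the selected part of the m-by-n
 * column-major matrix a (leading dimension lda) to alpha off the diagonal
 * and to beta on the diagonal.
 *   uplo = 'U': strictly upper part and diagonal are set
 *   uplo = 'L': strictly lower part and diagonal are set
 *   otherwise : the whole matrix is set */
static void laset_ref(char uplo, int m, int n, double alpha, double beta,
                      double *a, int lda)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            if (i == j) {
                a[i + j * lda] = beta;
            } else if (uplo == 'U') {
                if (i < j) a[i + j * lda] = alpha;
            } else if (uplo == 'L') {
                if (i > j) a[i + j * lda] = alpha;
            } else {
                a[i + j * lda] = alpha;
            }
        }
    }
}
```

Calling it with uplo = 'U', alpha = 7, and beta = 1 on a zero-filled 3-by-3 array gives ones on the diagonal, sevens above it, and leaves the zeros below it untouched.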
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?lasmsub
Looks for a small subdiagonal element from the
bottom of the matrix that it can safely set to zero.
Syntax
void pslasmsub (const float *a, const MKL_INT *desca, const MKL_INT *i, const MKL_INT
*l, MKL_INT *k, const float *smlnum, float *buf, const MKL_INT *lwork );
void pdlasmsub (const double *a, const MKL_INT *desca, const MKL_INT *i, const MKL_INT
*l, MKL_INT *k, const double *smlnum, double *buf, const MKL_INT *lwork );
void pclasmsub (const MKL_Complex8 *a , const MKL_INT *desca , const MKL_INT *i , const
MKL_INT *l , MKL_INT *k , const float *smlnum , MKL_Complex8 *buf , const MKL_INT
*lwork );
void pzlasmsub (const MKL_Complex16 *a , const MKL_INT *desca , const MKL_INT *i ,
const MKL_INT *l , MKL_INT *k , const double *smlnum , MKL_Complex16 *buf , const
MKL_INT *lwork );
Include Files
• mkl_scalapack.h
Description
The p?lasmsub function looks for a small subdiagonal element from the bottom of the matrix that it can
safely set to zero. This function performs a global maximum and must be called by all processes.
Input Parameters
a (local)
Array of size lld_a*LOCc(n_a).
On entry, the Hessenberg matrix whose tridiagonal part is being scanned.
Unchanged on exit.
i (global)
The global location of the bottom of the unreduced submatrix of A.
Unchanged on exit.
l (global)
The global location of the top of the unreduced submatrix of A.
Unchanged on exit.
smlnum (global)
On entry, a "small number" for the given matrix. Unchanged on exit. The
machine-dependent constants for the stopping criterion.
lwork (local)
This must be at least 2*ceil(ceil((i-l)/mb_a )/ lcm(nprow,npcol)).
Here lcm is the least common multiple and nprow x npcol is the logical grid size.
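The lower bound on lwork can be evaluated with integer arithmetic before allocating buf. The sketch below assumes i ≥ l and positive mb_a, nprow, and npcol; the helper names (gcd_l, lcm_l, ceil_div, lasmsub_min_lwork) are hypothetical and not part of Intel MKL.

```c
/* Greatest common divisor by Euclid's algorithm. */
long gcd_l(long a, long b)
{
    while (b != 0) {
        long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Least common multiple of two positive integers. */
long lcm_l(long a, long b)
{
    return a / gcd_l(a, b) * b;
}

/* Ceiling of p/q for p >= 0 and q > 0. */
long ceil_div(long p, long q)
{
    return (p + q - 1) / q;
}

/* Evaluates 2*ceil(ceil((i-l)/mb_a)/lcm(nprow,npcol)) as stated above. */
long lasmsub_min_lwork(long i, long l, long mb_a, long nprow, long npcol)
{
    long blocks = ceil_div(i - l, mb_a);
    return 2 * ceil_div(blocks, lcm_l(nprow, npcol));
}
```

For example, i = 100, l = 1, mb_a = 8 on a 2 x 3 grid gives ceil(99/8) = 13 blocks, lcm(2,3) = 6, and a minimum lwork of 2*ceil(13/6) = 6.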
Output Parameters
k (global)
On exit, this yields the bottom portion of the unreduced submatrix. This will
satisfy: l ≤ k ≤ i-1.
buf (local).
Array of size lwork.
Application Notes
This routine parallelizes the code from ?lahqr that looks for a single small subdiagonal element.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?lasrt
Sorts the numbers in an array and the corresponding
vectors in increasing order.
Syntax
void pslasrt (const char* id, const MKL_INT* n, float* d, const float* q, const
MKL_INT* iq, const MKL_INT* jq, const MKL_INT* descq, float* work, const MKL_INT*
lwork, MKL_INT* iwork, const MKL_INT* liwork, MKL_INT* info);
void pdlasrt (const char* id, const MKL_INT* n, double* d, const double* q, const
MKL_INT* iq, const MKL_INT* jq, const MKL_INT* descq, double* work, const MKL_INT*
lwork, MKL_INT* iwork, const MKL_INT* liwork, MKL_INT* info);
Include Files
• mkl_scalapack.h
Description
p?lasrt sorts the numbers in d and the corresponding vectors in q in increasing order.
Input Parameters
id (global)
= 'I': sort d in increasing order;
= 'D': sort d in decreasing order. (NOT IMPLEMENTED YET)
n (global)
The number of columns to be operated on, that is, the number of columns of the
distributed submatrix sub( Q ). n ≥ 0.
d (global)
Array, size (n)
q (local)
Pointer into the local memory to an array of size lld_q*LOCc(jq+n-1) .
This array contains the local pieces of the distributed matrix sub( Q ) to be
copied from.
iq (global)
The row index in the global array Q indicating the first row of sub( Q ).
jq (global)
The column index in the global array Q indicating the first column of
sub( Q ).
work (local)
Array, size (lwork)
lwork (local)
The size of the array work.
iwork (local)
Array, size (liwork)
liwork (local)
The size of the array iwork.
Output Parameters
info (global)
= 0: successful exit
< 0: If the i-th argument is an array and the j-th entry had an illegal
value, then info = -(i*100+j), if the i-th argument is a scalar and
had an illegal value, then info = -i.
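The info encoding above can be decoded mechanically. The following sketch assumes that array entries are numbered from 1 and never exceed 99, so any magnitude of 100 or more denotes an array argument; the type info_loc and the function decode_info are hypothetical names, not part of Intel MKL.

```c
/* Location of an illegal argument decoded from a negative info value. */
typedef struct {
    int arg;    /* 1-based position of the offending argument, 0 if none */
    int entry;  /* 1-based entry within an array argument, 0 for a scalar */
} info_loc;

info_loc decode_info(int info)
{
    info_loc r = {0, 0};
    if (info >= 0)
        return r;               /* successful exit: nothing to report */
    int v = -info;
    if (v >= 100) {             /* info = -(i*100 + j): array argument */
        r.arg = v / 100;
        r.entry = v % 100;
    } else {                    /* info = -i: scalar argument */
        r.arg = v;
    }
    return r;
}
```

For p?lasrt, info = -702 would point at the second entry of the seventh argument (descq), while info = -2 would point at the scalar n.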
p?lassq
Updates a sum of squares represented in scaled form.
Syntax
void pslassq (MKL_INT *n , float *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx ,
MKL_INT *incx , float *scale , float *sumsq );
void pdlassq (MKL_INT *n , double *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx ,
MKL_INT *incx , double *scale , double *sumsq );
void pclassq (MKL_INT *n , MKL_Complex8 *x , MKL_INT *ix , MKL_INT *jx , MKL_INT
*descx , MKL_INT *incx , float *scale , float *sumsq );
void pzlassq (MKL_INT *n , MKL_Complex16 *x , MKL_INT *ix , MKL_INT *jx , MKL_INT
*descx , MKL_INT *incx , double *scale , double *sumsq );
Include Files
• mkl_scalapack.h
Description
The p?lassq function returns the values scl and smsq such that
scl^2 * smsq = x_1^2 + ... + x_n^2 + scale^2 * sumsq,
where x_i is the i-th element of sub(X) (its absolute value for pclassq/pzlassq), 1 ≤ i ≤ n.
For real functions pslassq/pdlassq the value of sumsq is assumed to be non-negative and scl returns the
value
scl = max(scale, abs(x_i)).
For complex functions pclassq/pzlassq the value of sumsq is assumed to be at least unity and the value of
smsq will then satisfy
1.0 ≤ smsq ≤ sumsq + 2n
For all functions p?lassq, the values scale and sumsq must be supplied in scale and sumsq respectively, and
scale and sumsq are overwritten by scl and smsq respectively.
All functions p?lassq make only one pass through the vector sub(X).
Input Parameters
n (global)
The length of the distributed vector sub(X).
x The array that stores the vector for which a scaled sum of squares is
computed:
x[ix + (jx-1)*m_x + i*incx], 0 ≤ i < n.
ix (global)
The row index in the global matrix X indicating the first row of sub(X).
jx (global)
The column index in the global matrix X indicating the first column of
sub(X).
incx (global)
The global increment for the elements of X. Only two values of incx are
supported in this version, namely 1 and m_x. The argument incx must not
equal zero.
scale (local).
On entry, the value scale in the equation above.
sumsq (local)
On entry, the value sumsq in the equation above.
Output Parameters
scale (local).
On exit, scale is overwritten with scl , the scaling factor for the sum of
squares.
sumsq (local).
On exit, sumsq is overwritten with the value smsq, the basic sum of squares
from which scl has been factored out.
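The scaled update can be sketched for a local real vector as follows. This is a serial illustration of the one-pass technique only, assuming a positive stride incx; lassq_ref is a hypothetical name, and the distributed indexing through ix, jx, and descx is not modeled.

```c
#include <stddef.h>

/* One-pass scaled sum-of-squares update (serial sketch, not an MKL routine).
 * On exit, (*scale)^2 * (*sumsq) equals the entry value of
 * (*scale)^2 * (*sumsq) plus the sum of squares of the n elements of x. */
void lassq_ref(int n, const double *x, int incx, double *scale, double *sumsq)
{
    for (int i = 0; i < n; ++i) {
        double xi = x[(size_t)i * incx];
        double absxi = (xi < 0.0) ? -xi : xi;
        if (absxi == 0.0)
            continue;                       /* zeros contribute nothing */
        if (*scale < absxi) {
            /* New largest magnitude: rescale the accumulated sum. */
            double r = *scale / absxi;
            *sumsq = 1.0 + *sumsq * r * r;
            *scale = absxi;
        } else {
            double r = absxi / *scale;
            *sumsq += r * r;
        }
    }
}
```

Starting from scale = 0 and sumsq = 1, the vector (3, 4) yields scale = 4 and sumsq = 1.5625, so scale^2 * sumsq = 25. Keeping the largest magnitude in scale is what avoids overflow when squaring large elements.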
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?laswp
Performs a series of row interchanges on a general
rectangular matrix.
Syntax
void pslaswp (char *direc , char *rowcol , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , MKL_INT *k1 , MKL_INT *k2 , MKL_INT *ipiv );
void pdlaswp (char *direc , char *rowcol , MKL_INT *n , double *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_INT *k1 , MKL_INT *k2 , MKL_INT *ipiv );
void pclaswp (char *direc , char *rowcol , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_INT *k1 , MKL_INT *k2 , MKL_INT *ipiv );
void pzlaswp (char *direc , char *rowcol , MKL_INT *n , MKL_Complex16 *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , MKL_INT *k1 , MKL_INT *k2 , MKL_INT *ipiv );
Include Files
• mkl_scalapack.h
Description
The p?laswp function performs a series of row or column interchanges on the distributed matrix
sub(A)=A(ia:ia+n-1, ja:ja+n-1). One interchange is initiated for each of rows or columns k1 through k2
of sub(A). This function assumes that the pivoting information has already been broadcast along the process
row or column. Also note that this function works only when k1 through k2 lie in the same mb (or nb) block. If
you want to pivot a full matrix, use p?lapiv.
Input Parameters
direc (global)
Specifies in which order the permutation is applied:
= 'F' - forward,
= 'B' - backward.
rowcol (global)
Specifies if the rows or columns are permuted:
= 'R' - rows,
= 'C' - columns.
n (global)
If rowcol='R', the length of the rows of the distributed matrix A(*,
ja:ja+n-1) to be permuted;
If rowcol='C', the length of the columns of the distributed matrix
A(ia:ia+n-1, *) to be permuted.
a (local)
Pointer into the local memory to an array of size lld_a * LOCc(n_a). On
entry, this array contains the local pieces of the distributed matrix to which
the row/columns interchanges will be applied.
ia (global)
The row index in the global matrix A indicating the first row of sub(A).
ja (global)
The column index in the global matrix A indicating the first column of
sub(A).
k1 (global)
The first element of ipiv for which a row or column interchange will be done.
k2 (global)
The last element of ipiv for which a row or column interchange will be done.
ipiv (local)
Array of size LOCr(m_a)+mb_a for row pivoting and LOCr(n_a)+nb_a for
column pivoting. This array is tied to the matrix A, ipiv[k]=l implies rows
(or columns) k+1 and l are to be interchanged, k = 0, 1, ..., size (ipiv) -1.
Output Parameters
a (local)
On exit, the permuted distributed matrix.
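The meaning of direc, k1, k2, and ipiv for row pivoting can be sketched on a local array. The code below is a serial illustration under the assumption of a column-major array with leading dimension lda and 1-based pivot values, matching the convention that ipiv[k] = l interchanges rows k+1 and l; swap_rows and laswp_rows_ref are hypothetical names, not Intel MKL routines.

```c
#include <stddef.h>

/* Swap rows r1 and r2 (0-based) across n columns of a column-major array. */
void swap_rows(double *a, int lda, int n, int r1, int r2)
{
    if (r1 == r2)
        return;
    for (int j = 0; j < n; ++j) {
        double t = a[(size_t)j * lda + r1];
        a[(size_t)j * lda + r1] = a[(size_t)j * lda + r2];
        a[(size_t)j * lda + r2] = t;
    }
}

/* Apply the interchanges for rows k1..k2 (1-based) in forward ('F')
 * or backward (any other value) order. */
void laswp_rows_ref(char direc, int n, double *a, int lda,
                    int k1, int k2, const int *ipiv)
{
    if (direc == 'F') {
        for (int k = k1; k <= k2; ++k)
            swap_rows(a, lda, n, k - 1, ipiv[k - 1] - 1);
    } else {
        for (int k = k2; k >= k1; --k)
            swap_rows(a, lda, n, k - 1, ipiv[k - 1] - 1);
    }
}
```

Applying the same pivots backward after a forward pass restores the original row order, which is why both directions are provided.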
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?latra
Computes the trace of a general square distributed
matrix.
Syntax
float pslatra (MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca );
double pdlatra (MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca );
void pclatra (MKL_Complex8 * , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca );
void pzlatra (MKL_Complex16 * , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca );
Include Files
• mkl_scalapack.h
Description
This function computes the trace of an n-by-n distributed matrix sub(A) denoting A(ia:ia+n-1,
ja:ja+n-1). The result is left on every process of the grid.
Input Parameters
n (global)
The number of rows and columns to be operated on, that is, the order of
the distributed matrix sub(A). n ≥0.
a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1)
containing the local pieces of the distributed matrix, the trace of which is to
be computed.
ia, ja (global) The row and column indices in the global matrix A
indicating the first row and the first column of the matrix sub(A),
respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
Output Parameters
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?latrd
Reduces the first nb rows and columns of a
symmetric/Hermitian matrix A to real tridiagonal form
by an orthogonal/unitary similarity transformation.
Syntax
void pslatrd (char *uplo , MKL_INT *n , MKL_INT *nb , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *d , float *e , float *tau , float *w , MKL_INT *iw ,
MKL_INT *jw , MKL_INT *descw , float *work );
void pdlatrd (char *uplo , MKL_INT *n , MKL_INT *nb , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *d , double *e , double *tau , double *w , MKL_INT *iw ,
MKL_INT *jw , MKL_INT *descw , double *work );
void pclatrd (char *uplo , MKL_INT *n , MKL_INT *nb , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , float *d , float *e , MKL_Complex8 *tau , MKL_Complex8
*w , MKL_INT *iw , MKL_INT *jw , MKL_INT *descw , MKL_Complex8 *work );
void pzlatrd (char *uplo , MKL_INT *n , MKL_INT *nb , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *d , double *e , MKL_Complex16 *tau ,
MKL_Complex16 *w , MKL_INT *iw , MKL_INT *jw , MKL_INT *descw , MKL_Complex16 *work );
Include Files
• mkl_scalapack.h
Description
The p?latrd function reduces nb rows and columns of a real symmetric or complex Hermitian matrix
sub(A)= A(ia:ia+n-1, ja:ja+n-1) to symmetric/complex tridiagonal form by an orthogonal/unitary
similarity transformation Q'*sub(A)*Q, and returns the matrices V and W, which are needed to apply the
transformation to the unreduced part of sub(A).
If uplo = 'U', p?latrd reduces the last nb rows and columns of a matrix, of which the upper triangle is
supplied;
if uplo = 'L', p?latrd reduces the first nb rows and columns of a matrix, of which the lower triangle is
supplied.
This is an auxiliary function called by p?sytrd/p?hetrd.
Input Parameters
uplo (global)
Specifies whether the upper or lower triangular part of the symmetric/
Hermitian matrix sub(A) is stored:
= 'U': Upper triangular
= 'L': Lower triangular.
n (global)
The number of rows and columns to be operated on, that is, the order of
the distributed matrix sub(A). n ≥ 0.
nb (global)
The number of rows and columns to be reduced.
ia (global)
The row index in the global matrix A indicating the first row of sub(A).
ja (global)
The column index in the global matrix A indicating the first column of
sub(A).
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
iw (global)
The row index in the global matrix W indicating the first row of sub(W).
jw (global)
The column index in the global matrix W indicating the first column of
sub(W).
descw (global and local) array of size dlen_. The array descriptor for the
distributed matrix W.
work (local)
Workspace array of size nb_a.
Output Parameters
a (local)
On exit, if uplo = 'U', the last nb columns have been reduced to
tridiagonal form, with the diagonal elements overwriting the diagonal
elements of sub(A); the elements above the diagonal with the array tau
represent the orthogonal/unitary matrix Q as a product of elementary
reflectors;
if uplo = 'L', the first nb columns have been reduced to tridiagonal form,
with the diagonal elements overwriting the diagonal elements of sub(A);
the elements below the diagonal with the array tau represent the
orthogonal/unitary matrix Q as a product of elementary reflectors.
d (local)
Array of size LOCc(ja+n-1).
e (local)
Array of size LOCc(ja+n-1) if uplo = 'U', LOCc(ja+n-2) otherwise.
tau (local)
Array of size LOCc(ja+n-1). This array contains the scalar factors of the
elementary reflectors. tau is tied to the distributed matrix A.
w (local)
Pointer into the local memory to an array of size lld_w* nb_w. This array
contains the local pieces of the n-by-nb_w matrix w required to update the
unreduced part of sub(A).
Application Notes
If uplo = 'U', the matrix Q is represented as a product of elementary reflectors
Q = H(n)*H(n-1)*...*H(n-nb+1)
Each H(i) has the form
H(i) = I - tau*v*v' ,
where tau is a real/complex scalar, and v is a real/complex vector with v(i:n) = 0 and v(i-1) = 1;
v(1:i-1) is stored on exit in A(ia:ia+i-1, ja+i), and tau in tau[ja+i-2].
If uplo = 'L', the matrix Q is represented as a product of elementary reflectors
Q = H(1)*H(2)*...*H(nb)
Each H(i) has the form
H(i) = I - tau*v*v' ,
where tau is a real/complex scalar, and v is a real/complex vector with v(1:i) = 0 and v(i+1) = 1;
v(i+2:n) is stored on exit in A(ia+i+1: ia+n-1, ja+i-1), and tau in tau[ja+i-2].
The elements of the vectors v together form the n-by-nb matrix V which is needed, with W, to apply the
transformation to the unreduced part of the matrix, using a symmetric/Hermitian rank-2k update of the
form:
sub(A) := sub(A) - V*W' - W*V'.
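The rank-2k update can be written out explicitly for the real case. The sketch below is a serial illustration that updates every entry of a local n-by-n column-major array, whereas the routines that call p?latrd apply the update only to the unreduced part; rank2k_update_ref and the leading dimensions ldv and ldw are hypothetical names introduced for the example.

```c
#include <stddef.h>

/* A := A - V*W' - W*V' for a real n-by-n array A and n-by-nb arrays V and W,
 * all column-major (serial sketch, not an MKL routine). */
void rank2k_update_ref(int n, int nb, double *a, int lda,
                       const double *v, int ldv,
                       const double *w, int ldw)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < n; ++i) {
            double s = 0.0;
            for (int p = 0; p < nb; ++p) {
                s += v[(size_t)p * ldv + i] * w[(size_t)p * ldw + j]
                   + w[(size_t)p * ldw + i] * v[(size_t)p * ldv + j];
            }
            a[(size_t)j * lda + i] -= s;
        }
    }
}
```

Because V*W' + W*V' is symmetric, the update preserves the symmetry of A, so a production implementation needs to touch only one triangle.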
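The contents of a on exit are illustrated by the following examples with
n = 5 and nb = 2:
if uplo = 'U':                       if uplo = 'L':
  (  a   a   a   v4  v5 )              (  d                  )
  (      a   a   v4  v5 )              (  1   d              )
  (          a   1   v5 )              (  v1  1   a          )
  (              d   1  )              (  v1  v2  a   a      )
  (                  d  )              (  v1  v2  a   a   a  )
where d denotes a diagonal element of the reduced matrix, a denotes an element of the original matrix that
is unchanged, and vi denotes an element of the vector defining H(i).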
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?latrs
Solves a triangular system of equations with the scale
factor set to prevent overflow.
Syntax
void pslatrs (char *uplo , char *trans , char *diag , char *normin , MKL_INT *n , float
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *x , MKL_INT *ix , MKL_INT *jx ,
MKL_INT *descx , float *scale , float *cnorm , float *work );
void pdlatrs (char *uplo , char *trans , char *diag , char *normin , MKL_INT *n , double
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *x , MKL_INT *ix , MKL_INT
*jx , MKL_INT *descx , double *scale , double *cnorm , double *work );
void pclatrs (char *uplo , char *trans , char *diag , char *normin , MKL_INT *n ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *x ,
MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , float *scale , float *cnorm , MKL_Complex8
*work );
void pzlatrs (char *uplo , char *trans , char *diag , char *normin , MKL_INT *n ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *x ,
MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , double *scale , double *cnorm ,
MKL_Complex16 *work );
Include Files
• mkl_scalapack.h
Description
The p?latrs function solves a triangular system of equations A*x = s*b, A^T*x = s*b, or A^H*x = s*b, where s is a
scale factor set to prevent overflow. The description of the function will be extended in future releases.
Input Parameters
normin (global)
= 'N': cnorm is not set on entry. On exit, the norms will be computed and
stored in cnorm.
a (local)
If uplo = 'L', the leading n-by-n lower triangular part of the array a
contains the lower triangular matrix, and the strictly upper triangular part
of a is not referenced.
If diag = 'U', the diagonal elements of a are also not referenced and are
assumed to be 1.
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
x Array of size n. On entry, the right hand side b of the triangular system.
ix (global)
The row index in the global matrix X indicating the first row of
sub(X).
jx (global)
The column index in the global matrix X indicating the first column of
sub(X).
descx (global and local)
Array of size dlen_. The array descriptor for the distributed matrix X.
cnorm Array of size n. If normin = 'Y', cnorm is an input argument and cnorm[j]
contains the norm of the off-diagonal part of the (j+1)-th column of the
matrix A, j=0, 1, ..., n-1. If trans = 'N', cnorm[j] must be greater than or
equal to the infinity-norm, and if trans = 'T' or 'C', cnorm[j] must be
greater than or equal to the 1-norm.
work (local).
Temporary workspace.
Output Parameters
scale The scaling factor s for the triangular system as
described above.
If scale = 0, the matrix A is singular or badly scaled, and the vector x is
an exact or approximate solution to Ax = 0.
cnorm If normin = 'N', cnorm is an output argument and cnorm[j] returns the 1-
norm of the off-diagonal part of the (j+1)-th column of A, j=0, 1, ..., n-1.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?latrz
Reduces an upper trapezoidal matrix to upper
triangular form by means of orthogonal/unitary
transformations.
Syntax
void pslatrz (MKL_INT *m , MKL_INT *n , MKL_INT *l , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *tau , float *work );
void pdlatrz (MKL_INT *m , MKL_INT *n , MKL_INT *l , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *tau , double *work );
void pclatrz (MKL_INT *m , MKL_INT *n , MKL_INT *l , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work );
void pzlatrz (MKL_INT *m , MKL_INT *n , MKL_INT *l , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work );
Include Files
• mkl_scalapack.h
Description
The p?latrz function reduces the m-by-n (m ≤ n) real/complex upper trapezoidal matrix sub(A) =
[A(ia:ia+m-1, ja:ja+m-1) A(ia:ia+m-1, ja+n-l:ja+n-1)] to upper triangular form by means of
orthogonal/unitary transformations.
The upper trapezoidal matrix sub(A) is factored as
sub(A) = ( R 0 )*Z,
where Z is an n-by-n orthogonal/unitary matrix and R is an m-by-m upper triangular matrix.
Input Parameters
m (global)
The number of rows in the distributed matrix sub(A). m ≥ 0.
n (global)
The number of columns in the distributed matrix sub(A). n ≥ 0.
l (global)
The number of columns of the distributed matrix sub(A) containing the
meaningful part of the Householder reflectors. l > 0.
a (local)
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1). On
entry, the local pieces of the m-by-n distributed matrix sub(A), which is to
be factored.
ia (global)
The row index in the global matrix A indicating the first row of sub(A).
ja (global)
The column index in the global matrix A indicating the first column of
sub(A).
work (local)
Workspace array of size lwork.
lwork ≥ nq0 + max(1, mp0), where
iroff = mod(ia-1, mb_a),
icoff = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, myrow, rsrc_a, nprow),
iacol = indxg2p(ja, nb_a, mycol, csrc_a, npcol),
mp0 = numroc(m+iroff, mb_a, myrow, iarow, nprow),
nq0 = numroc(n+icoff, nb_a, mycol, iacol, npcol),
indxg2p and numroc are ScaLAPACK tool functions; myrow,
mycol, nprow, and npcol can be determined by calling the function
blacs_gridinfo.
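The workspace bound above can be evaluated without calling into the library by re-deriving the block-cyclic counts. The following sketch assumes the standard block-cyclic distribution rules; indxg2p_ref, numroc_ref, and platrz_min_lwork are hypothetical local helpers, not the ScaLAPACK tool functions themselves, and indxg2p_ref omits the process-coordinate argument because it does not affect the result.

```c
/* Process coordinate that owns global index indxglob (1-based). */
int indxg2p_ref(int indxglob, int nb, int isrcproc, int nprocs)
{
    return (isrcproc + (indxglob - 1) / nb) % nprocs;
}

/* Number of rows or columns of an n-element dimension, distributed in
 * blocks of nb starting at process isrcproc, that land on process iproc. */
int numroc_ref(int n, int nb, int iproc, int isrcproc, int nprocs)
{
    int mydist = (nprocs + iproc - isrcproc) % nprocs;
    int nblocks = n / nb;                    /* whole blocks */
    int count = (nblocks / nprocs) * nb;     /* blocks every process gets */
    int extra = nblocks % nprocs;            /* leftover whole blocks */
    if (mydist < extra)
        count += nb;
    else if (mydist == extra)
        count += n % nb;                     /* trailing partial block */
    return count;
}

/* lwork >= nq0 + max(1, mp0), following the formulas above. */
int platrz_min_lwork(int m, int n, int ia, int ja, int mb_a, int nb_a,
                     int rsrc_a, int csrc_a, int myrow, int mycol,
                     int nprow, int npcol)
{
    int iroff = (ia - 1) % mb_a;
    int icoff = (ja - 1) % nb_a;
    int iarow = indxg2p_ref(ia, mb_a, rsrc_a, nprow);
    int iacol = indxg2p_ref(ja, nb_a, csrc_a, npcol);
    int mp0 = numroc_ref(m + iroff, mb_a, myrow, iarow, nprow);
    int nq0 = numroc_ref(n + icoff, nb_a, mycol, iacol, npcol);
    return nq0 + (mp0 > 1 ? mp0 : 1);
}
```

For a 10-by-20 submatrix with 4-by-4 blocks on a 2 x 2 grid, process (0,0) holds 6 rows and 12 columns and needs lwork of at least 18, while process (1,1) holds 4 rows and 8 columns and needs at least 12.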
Output Parameters
a On exit, the leading m-by-m upper triangular part of sub(A) contains the
upper triangular matrix R, and elements n-l+1 to n of the first m rows of
sub(A), with the array tau, represent the orthogonal/unitary matrix Z as a
product of m elementary reflectors.
tau (local)
Array of size LOCr(ja+m-1). This array contains the scalar factors of the
elementary reflectors. tau is tied to the distributed matrix A.
Application Notes
The factorization is obtained by Householder's method. The k-th transformation matrix, Z(k), which is used
(or, in case of complex functions, whose conjugate transpose is used) to introduce zeros into the (m - k +
1)-th row of sub(A), is given in the form
Z(k) = ( I    0   )
       ( 0   T(k) )
where
T(k) = I - tau*u(k)*u(k)',   u(k) = (  1   )
                                    (  0   )
                                    ( z(k) )
tau is a scalar and z( k ) is an (n-m)-element vector. tau and z( k ) are chosen to annihilate the elements
of the k-th row of sub(A). The scalar tau is returned in the k-th element of tau, indexed k-1, and the vector
u( k ) in the k-th row of sub(A), such that the elements of z( k ) are in A( k, m + 1 ), ..., A( k,
n ). The elements of R are returned in the upper triangular part of sub(A).
Z is given by
Z = Z(1)Z(2)...Z(m).
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?lauu2
Computes the product U*U' or L'*L, where U and L
are upper or lower triangular matrices (local
unblocked algorithm).
Syntax
void pslauu2 (char *uplo , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca );
void pdlauu2 (char *uplo , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca );
void pclauu2 (char *uplo , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca );
void pzlauu2 (char *uplo , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca );
Include Files
• mkl_scalapack.h
Description
The p?lauu2 function computes the product U*U' or L'*L, where the triangular factor U or L is stored in the
upper or lower triangular part of the distributed matrix
sub(A)= A(ia:ia+n-1, ja:ja+n-1).
If uplo = 'U' or 'u', then the upper triangle of the result is stored, overwriting the factor U in sub(A).
If uplo = 'L' or 'l', then the lower triangle of the result is stored, overwriting the factor L in sub(A).
This is the unblocked form of the algorithm, calling BLAS Level 2 Routines. No communication is performed
by this function; the matrix to operate on should be strictly local to one process.
Input Parameters
uplo (global)
Specifies whether the triangular factor stored in the matrix sub(A) is upper
or lower triangular:
= 'U': upper triangular
= 'L': lower triangular.
n (global)
The number of rows and columns to be operated on, that is, the order of
the triangular factor U or L. n ≥ 0.
a (local)
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1). On
entry, the local pieces of the triangular factor U or L.
ia (global)
The row index in the global matrix A indicating the first row of sub(A).
ja (global)
The column index in the global matrix A indicating the first column of
sub(A).
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
Output Parameters
a (local)
On exit, if uplo = 'U', the upper triangle of the distributed matrix sub(A)
is overwritten with the upper triangle of the product U*U'; if uplo = 'L',
the lower triangle of sub(A) is overwritten with the lower triangle of the
product L'*L.
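The in-place computation for uplo = 'U' can be sketched on a local real array. This is a plain triple loop assuming column-major storage with leading dimension lda, not the BLAS Level 2 formulation used by the library; lauum_upper_ref is a hypothetical name.

```c
#include <stddef.h>

/* Overwrite the upper triangle of the n-by-n column-major array a, which
 * holds U, with the upper triangle of U*U' (serial sketch, real case).
 * Entry (i,j) of U*U' is the sum over p >= j of U(i,p)*U(j,p) for i <= j. */
void lauum_upper_ref(int n, double *a, int lda)
{
    /* Columns are processed left to right and rows top to bottom, so every
     * entry of U that is read has not been overwritten yet. */
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i <= j; ++i) {
            double s = 0.0;
            for (int p = j; p < n; ++p)
                s += a[(size_t)p * lda + i] * a[(size_t)p * lda + j];
            a[(size_t)j * lda + i] = s;
        }
    }
}
```

The strictly lower triangle of a is never read or written, matching the statement that only the triangle selected by uplo is overwritten.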
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?lauum
Computes the product U*U' or L'*L, where U and L
are upper or lower triangular matrices.
Syntax
void pslauum (char *uplo , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca );
void pdlauum (char *uplo , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca );
void pclauum (char *uplo , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca );
void pzlauum (char *uplo , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca );
Include Files
• mkl_scalapack.h
Description
The p?lauum function computes the product U*U' or L'*L, where the triangular factor U or L is stored in the
upper or lower triangular part of the matrix sub(A)= A(ia:ia+n-1, ja:ja+n-1).
If uplo = 'U' or 'u', then the upper triangle of the result is stored, overwriting the factor U in sub(A). If
uplo = 'L' or 'l', then the lower triangle of the result is stored, overwriting the factor L in sub(A).
Input Parameters
uplo (global)
Specifies whether the triangular factor stored in the matrix sub(A) is upper
or lower triangular:
= 'U': upper triangular
= 'L': lower triangular.
n (global)
The number of rows and columns to be operated on, that is, the order of
the triangular factor U or L. n ≥ 0.
a (local)
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1). On
entry, the local pieces of the triangular factor U or L.
ia (global)
The row index in the global matrix A indicating the first row of sub(A).
ja (global)
The column index in the global matrix A indicating the first column of
sub(A).
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
Output Parameters
a (local)
On exit, if uplo = 'U', the upper triangle of the distributed matrix sub(A)
is overwritten with the upper triangle of the product U*U' ; if uplo = 'L',
the lower triangle of sub(A) is overwritten with the lower triangle of the
product L'*L.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?lawil
Forms the Wilkinson transform.
Syntax
void pslawil (const MKL_INT *ii, const MKL_INT *jj, const MKL_INT *m, const float *a,
const MKL_INT *desca, const float *h44, const float *h33, const float *h43h34, float
*v );
void pdlawil (const MKL_INT *ii, const MKL_INT *jj, const MKL_INT *m, const double *a,
const MKL_INT *desca, const double *h44, const double *h33, const double *h43h34,
double *v );
void pclawil (const MKL_INT *ii , const MKL_INT *jj , const MKL_INT *m , const
MKL_Complex8 *a , const MKL_INT *desca , const MKL_Complex8 *h44 , const MKL_Complex8
*h33 , const MKL_Complex8 *h43h34 , MKL_Complex8 *v );
void pzlawil (const MKL_INT *ii , const MKL_INT *jj , const MKL_INT *m , const
MKL_Complex16 *a , const MKL_INT *desca , const MKL_Complex16 *h44 , const
MKL_Complex16 *h33 , const MKL_Complex16 *h43h34 , MKL_Complex16 *v );
Include Files
• mkl_scalapack.h
Description
The p?lawil function gets the transform given by h44, h33, and h43h34 into v starting at row m.
Input Parameters
ii (global)
Number of the process row which owns the matrix element A(m+2, m+2).
jj (global)
Number of the process column which owns the matrix element A(m+2, m+2).
m (global)
On entry, the location from where the transform starts (row m). Unchanged
on exit.
a (local)
Array of size lld_a*LOCc(n_a).
On entry, the Hessenberg matrix. Unchanged on exit.
h44, h33, h43h34 (global)
These three values are for the double shift QR iteration. Unchanged on exit.
Output Parameters
v (global)
Array of size 3 that contains the transform on output.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?org2l/p?ung2l
Generates all or part of the orthogonal/unitary matrix
Q from a QL factorization determined by p?geqlf
(unblocked algorithm).
Syntax
void psorg2l (MKL_INT *m , MKL_INT *n , MKL_INT *k , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdorg2l (MKL_INT *m , MKL_INT *n , MKL_INT *k , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );
void pcung2l (MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzung2l (MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT
*lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?org2l/p?ung2l function generates an m-by-n real/complex distributed matrix Q denoting A(ia:ia+m-1,
ja:ja+n-1) with orthonormal columns, which is defined as the last n columns of a product of k
elementary reflectors of order m:
Q = H(k)*...*H(2)*H(1)
as returned by p?geqlf.
Input Parameters
m (global)
The number of rows in the distributed submatrix Q. m ≥ 0.
n (global)
The number of columns in the distributed submatrix Q. m ≥ n ≥ 0.
k (global)
The number of elementary reflectors whose product defines the matrix Q.
n≥ k ≥ 0.
a (local)
On entry, the j-th column of the matrix stored in a must contain the vector
that defines the elementary reflector H(j), ja+n-k ≤ j ≤ ja+n-1, as
returned by p?geqlf in the k columns of its distributed matrix argument
A(ia:*,ja+n-k:ja+n-1).
ia (global)
The row index in the global matrix A indicating the first row of sub(A).
ja (global)
The column index in the global matrix A indicating the first column of
sub(A).
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
tau (local)
Array of size LOCc(ja+n-1).
work (local)
Workspace array of size lwork.
lwork (local or global)
The size of the array work.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.
Output Parameters
a On exit, this array contains the local pieces of the m-by-n distributed matrix
Q.
info (local).
= 0: successful exit
< 0: if the i-th argument is an array and the j-th entry, indexed j-1, had an
illegal value,
then info = - (i*100 +j); if the i-th argument is a scalar and had an illegal
value, then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?org2r/p?ung2r
Generates all or part of the orthogonal/unitary matrix
Q from a QR factorization determined by p?geqrf
(unblocked algorithm).
Syntax
void psorg2r (MKL_INT *m , MKL_INT *n , MKL_INT *k , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdorg2r (MKL_INT *m , MKL_INT *n , MKL_INT *k , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );
void pcung2r (MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzung2r (MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT
*lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?org2r/p?ung2r function generates an m-by-n real/complex matrix Q denoting A(ia:ia+m-1, ja:ja+n-1)
with orthonormal columns, which is defined as the first n columns of a product of k elementary
reflectors of order m:
Q = H(1)*H(2)*...*H(k)
as returned by p?geqrf.
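The product Q = H(1)*H(2)*...*H(k) can be illustrated with a small serial sketch. It is not the distributed routine: it assumes each reflector has the form H = I - tau*v*v^T, that every vector v is stored explicitly and in full (the packed storage produced by p?geqrf is not modeled), and the names apply_reflector and form_q are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Apply H = I - tau * v * v^T to the m-vector x in place. */
void apply_reflector(size_t m, const double *v, double tau, double *x)
{
    double dot = 0.0;
    for (size_t i = 0; i < m; ++i)
        dot += v[i] * x[i];
    for (size_t i = 0; i < m; ++i)
        x[i] -= tau * dot * v[i];
}

/* Form the first n columns of Q = H(1)*H(2)*...*H(k) (serial illustration).
 * v   : m-by-k, column-major; column r holds the full vector of H(r+1)
 * tau : k scalar factors
 * q   : m-by-n, column-major output */
void form_q(size_t m, size_t n, size_t k,
            const double *v, const double *tau, double *q)
{
    assert(n <= m && k <= n);            /* m >= n >= k >= 0, as in the text */
    for (size_t j = 0; j < n; ++j) {
        double *col = q + j * m;
        for (size_t i = 0; i < m; ++i)   /* start from the unit vector e_j */
            col[i] = (i == j) ? 1.0 : 0.0;
        /* Q*e_j = H(1)*(H(2)*(...(H(k)*e_j))), so H(k) is applied first. */
        for (size_t r = k; r-- > 0; )
            apply_reflector(m, v + r * m, tau[r], col);
    }
}
```

With m = n = 2, k = 1, v = (1, 1) and tau = 1, form_q fills q with the columns (0, -1) and (-1, 0), which are orthonormal.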
Input Parameters
m (global)
The number of rows in the distributed submatrix Q. m ≥ 0.
n (global)
The number of columns in the distributed submatrix Q. m ≥ n ≥ 0.
k (global)
The number of elementary reflectors whose product defines the matrix Q. n
≥ k ≥ 0.
a (local)
On entry, the j-th column of the matrix stored in a must contain the vector
that defines the elementary reflector H(j), ja ≤ j ≤ ja+k-1, as returned
by p?geqrf in the k columns of its distributed matrix argument
A(ia:*,ja:ja+k-1).
ia (global)
The row index in the global matrix A indicating the first row of sub(A).
ja (global)
The column index in the global matrix A indicating the first column of
sub(A).
tau (local)
Array of size LOCc(ja+k-1).
work (local)
Workspace array of size lwork.
where
iroffa = mod(ia-1, mb_a), icoffa = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, myrow, rsrc_a, nprow),
iacol = indxg2p(ja, nb_a, mycol, csrc_a, npcol),
mpa0 = numroc(m+iroffa, mb_a, myrow, iarow, nprow),
nqa0 = numroc(n+icoffa, nb_a, mycol, iacol, npcol).
indxg2p and numroc are ScaLAPACK tool functions; myrow, mycol, nprow,
and npcol can be determined by calling the function blacs_gridinfo.
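The local sizes mpa0 and nqa0 above depend on the block-cyclic distribution. The following is a minimal serial sketch of that arithmetic. The functions numroc_sketch and indxg2p_sketch are stand-ins written for illustration and assumed to mirror the usual behavior of the ScaLAPACK tool functions; they are not the library entry points, and mpa0_sketch is a hypothetical helper.

```c
#include <assert.h>

/* Coordinate of the process that owns global index indxglob (1-based).
 * The iproc argument is kept only to mirror the call shape used in the text. */
int indxg2p_sketch(int indxglob, int nb, int iproc, int isrcproc, int nprocs)
{
    (void)iproc;
    assert(indxglob >= 1 && nb > 0 && nprocs > 0);
    return (isrcproc + (indxglob - 1) / nb) % nprocs;
}

/* Number of rows or columns, out of n distributed in blocks of nb over
 * nprocs processes starting at isrcproc, that land on process iproc. */
int numroc_sketch(int n, int nb, int iproc, int isrcproc, int nprocs)
{
    assert(n >= 0 && nb > 0 && nprocs > 0);
    int mydist = (nprocs + iproc - isrcproc) % nprocs;
    int nblocks = n / nb;                  /* number of full blocks */
    int count = (nblocks / nprocs) * nb;   /* full rounds over all processes */
    int extrablks = nblocks % nprocs;      /* leftover full blocks */
    if (mydist < extrablks)
        count += nb;                       /* one extra full block */
    else if (mydist == extrablks)
        count += n % nb;                   /* the trailing partial block */
    return count;
}

/* mpa0 = numroc(m+iroffa, mb_a, myrow, iarow, nprow), as in the text. */
int mpa0_sketch(int m, int ia, int mb_a, int myrow, int rsrc_a, int nprow)
{
    int iroffa = (ia - 1) % mb_a;
    int iarow = indxg2p_sketch(ia, mb_a, myrow, rsrc_a, nprow);
    return numroc_sketch(m + iroffa, mb_a, myrow, iarow, nprow);
}
```

For example, 10 rows in blocks of 2 over 2 process rows give 6 local rows on process row 0 and 4 on process row 1.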
Output Parameters
a On exit, this array contains the local pieces of the m-by-n distributed matrix
Q.
info (local).
= 0: successful exit
< 0: if the i-th argument is an array and the j-th entry, indexed j-1, had an
illegal value,
then info = -(i*100+j); if the i-th argument is a scalar and had an illegal value,
then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?orgl2/p?ungl2
Generates all or part of the orthogonal/unitary matrix
Q from an LQ factorization determined by p?gelqf
(unblocked algorithm).
Syntax
void psorgl2 (MKL_INT *m , MKL_INT *n , MKL_INT *k , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdorgl2 (MKL_INT *m , MKL_INT *n , MKL_INT *k , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );
void pcungl2 (MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzungl2 (MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT
*lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?orgl2/p?ungl2 function generates an m-by-n real/complex matrix Q denoting A(ia:ia+m-1, ja:ja+n-1)
with orthonormal rows, which is defined as the first m rows of a product of k elementary reflectors of
order n, as returned by p?gelqf.
Input Parameters
m (global)
The number of rows in the distributed submatrix Q. m ≥ 0.
n (global)
The number of columns in the distributed submatrix Q. n ≥ m ≥ 0.
k (global)
The number of elementary reflectors whose product defines the matrix Q. m
≥ k ≥ 0.
a (local)
On entry, the i-th row of the matrix stored in a must contain the vector that
defines the elementary reflector H(i), ia ≤ i ≤ ia+k-1, as returned by
p?gelqf in the k rows of its distributed matrix argument A(ia:ia+k-1,
ja:*).
ia (global)
The row index in the global matrix A indicating the first row of sub(A).
ja (global)
The column index in the global matrix A indicating the first column of
sub(A).
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
tau (local)
Array of size LOCr(ia+k-1). tau[j] contains the scalar factor of the
elementary reflector H(j+1), j = 0, 1, ..., LOCr(ia+k-1)-1, as returned by
p?gelqf. This array is tied to the distributed matrix A.
work (local)
Workspace array of size lwork.
indxg2p and numroc are ScaLAPACK tool functions; myrow, mycol, nprow,
and npcol can be determined by calling the function blacs_gridinfo.
Output Parameters
a On exit, this array contains the local pieces of the m-by-n distributed matrix
Q.
info (local)
= 0: successful exit
< 0: if the i-th argument is an array and the j-th entry, indexed j-1, had an
illegal value,
then info = -(i*100+j); if the i-th argument is a scalar and had an illegal value,
then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?orgr2/p?ungr2
Generates all or part of the orthogonal/unitary matrix
Q from an RQ factorization determined by p?gerqf
(unblocked algorithm).
Syntax
void psorgr2 (MKL_INT *m , MKL_INT *n , MKL_INT *k , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdorgr2 (MKL_INT *m , MKL_INT *n , MKL_INT *k , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );
void pcungr2 (MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzungr2 (MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT
*lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?orgr2/p?ungr2 function generates an m-by-n real/complex matrix Q denoting A(ia:ia+m-1, ja:ja+n-1)
with orthonormal rows, which is defined as the last m rows of a product of k elementary reflectors of
order n, as returned by p?gerqf.
Input Parameters
m (global)
The number of rows in the distributed submatrix Q. m ≥ 0.
n (global)
The number of columns in the distributed submatrix Q. n ≥ m ≥ 0.
k (global)
The number of elementary reflectors whose product defines the matrix Q. m
≥ k ≥ 0.
a (local)
On entry, the i-th row of the matrix stored in a must contain the vector that
defines the elementary reflector H(i), ia+m-k ≤ i ≤ ia+m-1, as returned by
p?gerqf in the k rows of its distributed matrix argument
A(ia+m-k:ia+m-1, ja:*).
ia (global)
The row index in the global matrix A indicating the first row of sub(A).
ja (global)
The column index in the global matrix A indicating the first column of
sub(A).
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
tau (local)
Array of size LOCr(ia+m-1). tau[j] contains the scalar factor of the
elementary reflector H(j+1), j = 0, 1, ..., LOCr(ia+m-1)-1, as returned by
p?gerqf. This array is tied to the distributed matrix A.
work (local)
Workspace array of size lwork.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.
Output Parameters
a On exit, this array contains the local pieces of the m-by-n distributed matrix
Q.
info (local)
= 0: successful exit
< 0: if the i-th argument is an array and the j-th entry, indexed j-1, had an
illegal value,
then info = -(i*100+j); if the i-th argument is a scalar and had an illegal value,
then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?orm2l/p?unm2l
Multiplies a general matrix by the orthogonal/unitary
matrix from a QL factorization determined by p?geqlf
(unblocked algorithm).
Syntax
void psorm2l (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , float
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *tau , float *c , MKL_INT *ic ,
MKL_INT *jc , MKL_INT *descc , float *work , MKL_INT *lwork , MKL_INT *info );
void pdorm2l (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , double
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *tau , double *c , MKL_INT
*ic , MKL_INT *jc , MKL_INT *descc , double *work , MKL_INT *lwork , MKL_INT *info );
void pcunm2l (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau ,
MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex8 *work ,
MKL_INT *lwork , MKL_INT *info );
void pzunm2l (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau ,
MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex16 *work ,
MKL_INT *lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?orm2l/p?unm2l function overwrites the general real/complex m-by-n distributed matrix
sub(C)=C(ic:ic+m-1,jc:jc+n-1) with
Q*sub(C) if side = 'L' and trans = 'N', or
Q^T*sub(C) / Q^H*sub(C) if side = 'L' and trans = 'T' (for real flavors) or trans = 'C' (for complex
flavors), or
sub(C)*Q if side = 'R' and trans = 'N', or
sub(C)*Q^T / sub(C)*Q^H if side = 'R' and trans = 'T' (for real flavors) or trans = 'C' (for complex
flavors),
where Q is a real orthogonal or complex unitary distributed matrix defined as the product of k elementary
reflectors
Q = H(k)*...*H(2)*H(1) as returned by p?geqlf. Q is of order m if side = 'L' and of order n if side =
'R'.
Input Parameters
side (global)
= 'L': apply Q or Q^T for real flavors (Q^H for complex flavors) from the left,
= 'R': apply Q or Q^T for real flavors (Q^H for complex flavors) from the
right.
trans (global)
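= 'N': apply Q (no transpose)
= 'T': apply Q^T (transpose, for real flavors)
= 'C': apply Q^H (conjugate transpose, for complex flavors)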
m (global)
The number of rows in the distributed matrix sub(C). m ≥ 0.
n (global)
The number of columns in the distributed matrix sub(C). n ≥ 0.
k (global)
The number of elementary reflectors whose product defines the matrix Q.
If side = 'L', m ≥ k ≥ 0;
if side = 'R', n ≥ k ≥ 0.
a (local)
Pointer into the local memory to an array of size lld_a * LOCc(ja+k-1).
On entry, the j-th column of the matrix stored in a must contain the vector that
defines the elementary reflector H(j), ja ≤ j ≤ ja+k-1, as returned by
p?geqlf in the k columns of its distributed matrix argument
A(ia:*,ja:ja+k-1). The argument A(ia:*,ja:ja+k-1) is modified by the function but
restored on exit.
If side = 'L', lld_a ≥ max(1, LOCr(ia+m-1)),
if side = 'R', lld_a ≥ max(1, LOCr(ia+n-1)).
ia (global)
The row index in the global matrix A indicating the first row of sub(A).
ja (global)
The column index in the global matrix A indicating the first column of
sub(A).
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
tau (local)
Array of size LOCc(ja+n-1). tau[j] contains the scalar factor of the
elementary reflector H(j+1), j = 0, 1, ..., LOCc(ja+n-1)-1, as returned by
p?geqlf. This array is tied to the distributed matrix A.
c (local)
Pointer into the local memory to an array of size lld_c * LOCc(jc+n-1). On
entry, the local pieces of the distributed matrix sub(C).
ic (global)
The row index in the global matrix C indicating the first row of sub(C).
jc (global)
The column index in the global matrix C indicating the first column of
sub(C).
descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.
work (local)
Workspace array of size lwork.
On exit, work[0] returns the minimal and optimal lwork.
lcmq = lcm/npcol,
lcm = ilcm(nprow, npcol),
iroffc = mod(ic-1, mb_c),
icoffc = mod(jc-1, nb_c),
icrow = indxg2p(ic, mb_c, myrow, rsrc_c, nprow),
iccol = indxg2p(jc, nb_c, mycol, csrc_c, npcol),
mpc0 = numroc(m+iroffc, mb_c, myrow, icrow, nprow),
nqc0 = numroc(n+icoffc, nb_c, mycol, iccol, npcol),
ilcm, indxg2p, and numroc are ScaLAPACK tool functions; myrow, mycol,
nprow, and npcol can be determined by calling the function
blacs_gridinfo.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.
Output Parameters
info (local)
= 0: successful exit
< 0: if the i-th argument is an array and the j-th entry, indexed j-1, had an
illegal value,
then info = -(i*100+j); if the i-th argument is a scalar and had an illegal value,
then info = -i.
NOTE
The distributed submatrices A(ia:*, ja:*) and C(ic:ic+m-1,jc:jc+n-1) must satisfy some
alignment properties, namely the following expressions should be true:
If side = 'L', ( mb_a == mb_c && iroffa == iroffc && iarow == icrow )
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?orm2r/p?unm2r
Multiplies a general matrix by the orthogonal/unitary
matrix from a QR factorization determined by
p?geqrf (unblocked algorithm).
Syntax
void psorm2r (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , float
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *tau , float *c , MKL_INT *ic ,
MKL_INT *jc , MKL_INT *descc , float *work , MKL_INT *lwork , MKL_INT *info );
void pdorm2r (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , double
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *tau , double *c , MKL_INT
*ic , MKL_INT *jc , MKL_INT *descc , double *work , MKL_INT *lwork , MKL_INT *info );
void pcunm2r (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau ,
MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex8 *work ,
MKL_INT *lwork , MKL_INT *info );
void pzunm2r (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau ,
MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex16 *work ,
MKL_INT *lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?orm2r/p?unm2r function overwrites the general real/complex m-by-n distributed matrix
sub(C)=C(ic:ic+m-1, jc:jc+n-1) with
Q*sub(C) if side = 'L' and trans = 'N', or
Q^T*sub(C) / Q^H*sub(C) if side = 'L' and trans = 'T' (for real flavors) or trans = 'C' (for complex
flavors), or
sub(C)*Q if side = 'R' and trans = 'N', or
sub(C)*Q^T / sub(C)*Q^H if side = 'R' and trans = 'T' (for real flavors) or trans = 'C' (for complex
flavors),
where Q is a real orthogonal or complex unitary matrix defined as the product of k elementary reflectors
Q = H(1)*H(2)*...*H(k) as returned by p?geqrf. Q is of order m if side = 'L' and of order n if side = 'R'.
Input Parameters
side (global)
= 'L': apply Q or Q^T for real flavors (Q^H for complex flavors) from the left,
= 'R': apply Q or Q^T for real flavors (Q^H for complex flavors) from the
right.
trans (global)
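= 'N': apply Q (no transpose)
= 'T': apply Q^T (transpose, for real flavors)
= 'C': apply Q^H (conjugate transpose, for complex flavors)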
m (global)
The number of rows in the distributed matrix sub(C). m ≥ 0.
n (global)
The number of columns in the distributed matrix sub(C). n ≥ 0.
k (global)
The number of elementary reflectors whose product defines the matrix Q.
If side = 'L', m ≥ k ≥ 0;
if side = 'R', n ≥ k ≥ 0.
a (local)
Pointer into the local memory to an array of size lld_a * LOCc(ja+k-1).
On entry, the j-th column of the matrix stored in a must contain the vector
that defines the elementary reflector H(j), ja ≤ j ≤ ja+k-1, as returned by
p?geqrf in the k columns of its distributed matrix argument
A(ia:*,ja:ja+k-1). The argument A(ia:*,ja:ja+k-1) is modified by the function but
restored on exit.
If side = 'L', lld_a ≥ max(1, LOCr(ia+m-1)),
if side = 'R', lld_a ≥ max(1, LOCr(ia+n-1)).
ia (global)
The row index in the global matrix A indicating the first row of sub(A).
ja (global)
The column index in the global matrix A indicating the first column of
sub(A).
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
tau (local)
Array of size LOCc(ja+k-1). tau[j] contains the scalar factor of the
elementary reflector H(j+1), j = 0, 1, ..., LOCc(ja+k-1)-1, as returned by
p?geqrf. This array is tied to the distributed matrix A.
c (local)
Pointer into the local memory to an array of size lld_c * LOCc(jc+n-1).
ic (global)
The row index in the global matrix C indicating the first row of sub(C).
jc (global)
The column index in the global matrix C indicating the first column of
sub(C).
work (local)
Workspace array of size lwork.
if side = 'R', lwork ≥ nqc0 + max(max(1, mpc0), numroc(numroc(n+icoffc,
nb_a, 0, 0, npcol), nb_a, 0, 0, lcmq)),
where
lcmq = lcm/npcol,
lcm = ilcm(nprow, npcol),
iroffc = mod(ic-1, mb_c),
icoffc = mod(jc-1, nb_c),
icrow = indxg2p(ic, mb_c, myrow, rsrc_c, nprow),
iccol = indxg2p(jc, nb_c, mycol, csrc_c, npcol),
mpc0 = numroc(m+iroffc, mb_c, myrow, icrow, nprow),
nqc0 = numroc(n+icoffc, nb_c, mycol, iccol, npcol),
ilcm, indxg2p and numroc are ScaLAPACK tool functions; myrow, mycol,
nprow, and npcol can be determined by calling the function
blacs_gridinfo.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.
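The side = 'R' workspace bound given for work can be evaluated by hand before calling the routine. The sketch below shows the arithmetic under stated assumptions: ilcm_sketch and numroc_sketch are illustrative re-implementations assumed to behave like the ScaLAPACK tool functions ilcm and numroc, orm2r_lwork_right_sketch is a hypothetical helper, and mpc0 and nqc0 are taken as inputs. In practice a workspace query with lwork = -1 remains the more reliable way to obtain the size.

```c
#include <assert.h>

/* Least common multiple of two positive integers. */
int ilcm_sketch(int a, int b)
{
    assert(a > 0 && b > 0);
    int x = a, y = b;
    while (y != 0) {          /* Euclid: x ends up as gcd(a, b) */
        int t = x % y;
        x = y;
        y = t;
    }
    return (a / x) * b;
}

/* Local count of n items in blocks of nb on process iproc. */
int numroc_sketch(int n, int nb, int iproc, int isrcproc, int nprocs)
{
    assert(n >= 0 && nb > 0 && nprocs > 0);
    int mydist = (nprocs + iproc - isrcproc) % nprocs;
    int nblocks = n / nb;
    int count = (nblocks / nprocs) * nb;
    int extrablks = nblocks % nprocs;
    if (mydist < extrablks)
        count += nb;
    else if (mydist == extrablks)
        count += n % nb;
    return count;
}

int max_int(int a, int b) { return a > b ? a : b; }

/* nqc0 + max(max(1, mpc0),
 *            numroc(numroc(n+icoffc, nb_a, 0, 0, npcol), nb_a, 0, 0, lcmq)) */
int orm2r_lwork_right_sketch(int mpc0, int nqc0, int n, int icoffc,
                             int nb_a, int nprow, int npcol)
{
    int lcm = ilcm_sketch(nprow, npcol);
    int lcmq = lcm / npcol;
    int inner = numroc_sketch(n + icoffc, nb_a, 0, 0, npcol);
    int outer = numroc_sketch(inner, nb_a, 0, 0, lcmq);
    return nqc0 + max_int(max_int(1, mpc0), outer);
}
```

On a 2-by-3 process grid with n = 12, icoffc = 0, nb_a = 2, mpc0 = 1, and nqc0 = 4, the inner numroc gives 4, the outer gives 2, and the bound evaluates to 6.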
Output Parameters
info (local)
= 0: successful exit
< 0: if the i-th argument is an array and the j-th entry, indexed j-1, had an
illegal value,
then info = -(i*100+j); if the i-th argument is a scalar and had an illegal value,
then info = -i.
NOTE
The distributed submatrices A(ia:*, ja:*) and C(ic:ic+m-1, jc:jc+n-1) must satisfy some
alignment properties, namely the following expressions should be true:
If side = 'L', (mb_a == mb_c) && (iroffa == iroffc) && (iarow == icrow).
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?orml2/p?unml2
Multiplies a general matrix by the orthogonal/unitary
matrix from an LQ factorization determined by
p?gelqf (unblocked algorithm).
Syntax
void psorml2 (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , float
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *tau , float *c , MKL_INT *ic ,
MKL_INT *jc , MKL_INT *descc , float *work , MKL_INT *lwork , MKL_INT *info );
void pdorml2 (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , double
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *tau , double *c , MKL_INT
*ic , MKL_INT *jc , MKL_INT *descc , double *work , MKL_INT *lwork , MKL_INT *info );
void pcunml2 (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau ,
MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex8 *work ,
MKL_INT *lwork , MKL_INT *info );
void pzunml2 (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau ,
MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex16 *work ,
MKL_INT *lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?orml2/p?unml2 function overwrites the general real/complex m-by-n distributed matrix
sub(C)=C(ic:ic+m-1, jc:jc+n-1) with
Q*sub(C) if side = 'L' and trans = 'N', or
Q^T*sub(C) / Q^H*sub(C) if side = 'L' and trans = 'T' (for real flavors) or trans = 'C' (for complex
flavors), or
sub(C)*Q if side = 'R' and trans = 'N', or
sub(C)*Q^T / sub(C)*Q^H if side = 'R' and trans = 'T' (for real flavors) or trans = 'C' (for complex
flavors),
where Q is a real orthogonal or complex unitary distributed matrix defined as the product of k elementary
reflectors
Q = H(k)*...*H(2)*H(1) (for real flavors)
Q = (H(k))^H*...*(H(2))^H*(H(1))^H (for complex flavors)
as returned by p?gelqf. Q is of order m if side = 'L' and of order n if side = 'R'.
Input Parameters
side (global)
= 'L': apply Q or Q^T for real flavors (Q^H for complex flavors) from the left,
= 'R': apply Q or Q^T for real flavors (Q^H for complex flavors) from the
right.
trans (global)
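= 'N': apply Q (no transpose)
= 'T': apply Q^T (transpose, for real flavors)
= 'C': apply Q^H (conjugate transpose, for complex flavors)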
m (global)
The number of rows in the distributed matrix sub(C). m ≥ 0.
n (global)
The number of columns in the distributed matrix sub(C). n ≥ 0.
k (global)
The number of elementary reflectors whose product defines the matrix Q.
If side = 'L', m ≥ k ≥ 0;
if side = 'R', n ≥ k ≥ 0.
a (local)
Pointer into the local memory to an array of size
lld_a * LOCc(ja+m-1) if side = 'L', or lld_a * LOCc(ja+n-1) if side = 'R'.
On entry, the i-th row of the matrix stored in a must contain the vector that
defines the elementary reflector H(i), ia ≤ i ≤ ia+k-1, as returned by
p?gelqf in the k rows of its distributed matrix argument A(ia:ia+k-1,
ja:*). The argument A(ia:ia+k-1, ja:*) is modified by the function but
restored on exit.
ia (global)
The row index in the global matrix A indicating the first row of sub(A).
ja (global)
The column index in the global matrix A indicating the first column of
sub(A).
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
tau (local)
Array of size LOCc(ia+k-1). tau[i] contains the scalar factor of the
elementary reflector H(i+1), i = 0, 1, ..., LOCc(ja+k-1)-1, as returned by
p?gelqf. This array is tied to the distributed matrix A.
c (local)
Pointer into the local memory to an array of size lld_c * LOCc(jc+n-1). On
entry, the local pieces of the distributed matrix sub(C).
ic (global)
The row index in the global matrix C indicating the first row of sub(C).
jc (global)
The column index in the global matrix C indicating the first column of
sub(C).
descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.
work (local)
Workspace array of size lwork.
where
lcmp = lcm / nprow,
lcm = ilcm(nprow, npcol),
iroffc = mod(ic-1, mb_c),
icoffc = mod(jc-1, nb_c),
icrow = indxg2p(ic, mb_c, myrow, rsrc_c, nprow),
iccol = indxg2p(jc, nb_c, mycol, csrc_c, npcol),
mpc0 = numroc(m+iroffc, mb_c, myrow, icrow, nprow),
nqc0 = numroc(n+icoffc, nb_c, mycol, iccol, npcol),
ilcm, indxg2p and numroc are ScaLAPACK tool functions; myrow, mycol,
nprow, and npcol can be determined by calling the function
blacs_gridinfo.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.
Output Parameters
info (local)
= 0: successful exit
< 0: if the i-th argument is an array and the j-th entry, indexed j-1, had an
illegal value,
then info = - (i*100 +j),
if the i-th argument is a scalar and had an illegal value,
then info = -i.
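The encoding of negative info values can be unpacked with a few integer operations. The helper below is a minimal sketch: the type info_location and the function decode_info are hypothetical names, and the code assumes that array-entry errors always produce a magnitude of at least 100 while scalar-argument errors stay below 100, which holds for the argument counts of the routines described here.

```c
/* Location of an illegal argument reported through info.
 * arg   : 1-based position of the offending argument (0 if none)
 * entry : 1-based entry j inside an array argument (C index j-1),
 *         or 0 when the argument is a scalar */
typedef struct {
    int arg;
    int entry;
} info_location;

info_location decode_info(int info)
{
    info_location loc = {0, 0};
    if (info >= 0)
        return loc;              /* 0 means successful exit */
    int code = -info;
    if (code >= 100) {           /* info = -(i*100 + j): array argument */
        loc.arg = code / 100;
        loc.entry = code % 100;
    } else {                     /* info = -i: scalar argument */
        loc.arg = code;
    }
    return loc;
}
```

For instance, info = -702 decodes to argument 7, entry 2 (C index 1), which for the routines in this section would point into an array descriptor, while info = -3 points at the third (scalar) argument.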
NOTE
The distributed submatrices A(ia:*, ja:*) and C(ic:ic+m-1, jc:jc+n-1) must satisfy some
alignment properties, namely the following expressions should be true:
If side = 'L', (nb_a == mb_c && icoffa == iroffc)
If side = 'R', (nb_a == nb_c && icoffa == icoffc && iacol == iccol).
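The two alignment conditions in this note can be checked before the call. The function below is a sketch that transcribes them directly; orml2_aligned is a hypothetical name, and the offsets and process coordinates are assumed to have been computed by the caller (icoffa = mod(ja-1, nb_a), iroffc = mod(ic-1, mb_c), icoffc = mod(jc-1, nb_c), with iacol and iccol obtained from indxg2p).

```c
#include <stdbool.h>

/* Returns true when the alignment conditions of the note hold.
 * side = 'L': nb_a == mb_c && icoffa == iroffc
 * side = 'R': nb_a == nb_c && icoffa == icoffc && iacol == iccol */
bool orml2_aligned(char side,
                   int nb_a, int mb_c, int nb_c,
                   int icoffa, int iroffc, int icoffc,
                   int iacol, int iccol)
{
    if (side == 'L' || side == 'l')
        return nb_a == mb_c && icoffa == iroffc;
    if (side == 'R' || side == 'r')
        return nb_a == nb_c && icoffa == icoffc && iacol == iccol;
    return false;   /* unknown side value */
}
```

Checking this up front gives a clearer failure point than a negative info value returned from inside the routine.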
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?ormr2/p?unmr2
Multiplies a general matrix by the orthogonal/unitary
matrix from an RQ factorization determined by
p?gerqf (unblocked algorithm).
Syntax
void psormr2 (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , float
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *tau , float *c , MKL_INT *ic ,
MKL_INT *jc , MKL_INT *descc , float *work , MKL_INT *lwork , MKL_INT *info );
void pdormr2 (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , double
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *tau , double *c , MKL_INT
*ic , MKL_INT *jc , MKL_INT *descc , double *work , MKL_INT *lwork , MKL_INT *info );
void pcunmr2 (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau ,
MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex8 *work ,
MKL_INT *lwork , MKL_INT *info );
void pzunmr2 (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau ,
MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex16 *work ,
MKL_INT *lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?ormr2/p?unmr2 function overwrites the general real/complex m-by-n distributed matrix
sub(C)=C(ic:ic+m-1, jc:jc+n-1) with
Q*sub(C) if side = 'L' and trans = 'N', or
Q^T*sub(C) / Q^H*sub(C) if side = 'L' and trans = 'T' (for real flavors) or trans = 'C' (for complex
flavors), or
sub(C)*Q if side = 'R' and trans = 'N', or
sub(C)*Q^T / sub(C)*Q^H if side = 'R' and trans = 'T' (for real flavors) or trans = 'C' (for complex
flavors),
where Q is a real orthogonal or complex unitary distributed matrix defined as the product of k elementary
reflectors
Q = H(1)*H(2)*...*H(k) (for real flavors)
Q = (H(1))^H*(H(2))^H*...*(H(k))^H (for complex flavors)
as returned by p?gerqf. Q is of order m if side = 'L' and of order n if side = 'R'.
Input Parameters
side (global)
= 'L': apply Q or Q^T for real flavors (Q^H for complex flavors) from the left,
= 'R': apply Q or Q^T for real flavors (Q^H for complex flavors) from the
right.
trans (global)
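= 'N': apply Q (no transpose)
= 'T': apply Q^T (transpose, for real flavors)
= 'C': apply Q^H (conjugate transpose, for complex flavors)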
m (global)
The number of rows in the distributed matrix sub(C). m ≥ 0.
n (global)
The number of columns in the distributed matrix sub(C). n ≥ 0.
k (global)
The number of elementary reflectors whose product defines the matrix Q.
If side = 'L', m ≥ k ≥ 0;
if side = 'R', n ≥ k ≥ 0.
a (local)
Pointer into the local memory to an array of size
lld_a * LOCc(ja+m-1) if side = 'L', or lld_a * LOCc(ja+n-1) if side = 'R'.
On entry, the i-th row of the matrix stored in a must contain the vector that
defines the elementary reflector H(i), ia ≤ i ≤ ia+k-1, as returned by
p?gerqf in the k rows of its distributed matrix argument A(ia:ia+k-1,
ja:*).
The argument A(ia:ia+k-1, ja:*) is modified by the function but
restored on exit.
ia (global)
The row index in the global matrix A indicating the first row of sub(A).
ja (global)
The column index in the global matrix A indicating the first column of
sub(A).
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
tau (local)
Array of size LOCc(ia+k-1). tau[j] contains the scalar factor of the
elementary reflector H(j+1), j = 0, 1, ..., LOCc(ja+k-1)-1, as returned by
p?gerqf. This array is tied to the distributed matrix A.
c (local)
Pointer into the local memory to an array of size lld_c * LOCc(jc+n-1). On
entry, the local pieces of the distributed matrix sub(C).
ic (global)
The row index in the global matrix C indicating the first row of sub(C).
jc (global)
The column index in the global matrix C indicating the first column of
sub(C).
descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.
work (local)
Workspace array of size lwork.
Output Parameters
info (local)
= 0: successful exit
< 0: if the i-th argument is an array and the j-th entry, indexed j-1, had an
illegal value,
then info = -(i*100+j); if the i-th argument is a scalar and had an illegal value,
then info = -i.
NOTE
The distributed submatrices A(ia:*, ja:*) and C(ic:ic+m-1,jc:jc+n-1) must satisfy some
alignment properties, namely the following expressions should be true:
If side = 'L', (nb_a == mb_c) && (icoffa == iroffc).
If side = 'R', (nb_a == nb_c) && (icoffa == icoffc) && (iacol == iccol).
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?pbtrsv
Solves a single triangular linear system via frontsolve
or backsolve where the triangular matrix is a factor of
a banded matrix computed by p?pbtrf.
Syntax
void pspbtrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *bw , MKL_INT *nrhs ,
float *a , MKL_INT *ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *descb ,
float *af , MKL_INT *laf , float *work , MKL_INT *lwork , MKL_INT *info );
void pdpbtrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *bw , MKL_INT *nrhs ,
double *a , MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *descb ,
double *af , MKL_INT *laf , double *work , MKL_INT *lwork , MKL_INT *info );
void pcpbtrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *bw , MKL_INT *nrhs ,
MKL_Complex8 *a , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib ,
MKL_INT *descb , MKL_Complex8 *af , MKL_INT *laf , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzpbtrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *bw , MKL_INT *nrhs ,
MKL_Complex16 *a , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib ,
MKL_INT *descb , MKL_Complex16 *af , MKL_INT *laf , MKL_Complex16 *work , MKL_INT
*lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?pbtrsv function solves a banded triangular system of linear equations
A(1:n, ja:ja+n-1)*X = B(ib:ib+n-1, 1:nrhs)
or
A(1:n, ja:ja+n-1)^T*X = B(ib:ib+n-1, 1:nrhs) for real flavors,
where A(1:n, ja:ja+n-1) is a banded triangular matrix factor produced by the Cholesky factorization code
p?pbtrf and is stored in A(1:n, ja:ja+n-1) and af. The matrix stored in A(1:n, ja:ja+n-1) is either
upper or lower triangular according to uplo.
Input Parameters
If trans = 'T' or 'C' for real flavors, solve with A(1:n, ja:ja+n-1)^T.
n (global)
The number of rows and columns to be operated on, that is, the order of
the distributed submatrix A(1:n, ja:ja+n-1). n ≥ 0.
bw (global)
The number of subdiagonals in 'L' or 'U', 0 ≤ bw ≤ n-1.
nrhs (global)
The number of right hand sides; the number of columns of the distributed
submatrix B(ib:ib+n-1, 1:nrhs); nrhs ≥ 0.
a (local)
Pointer into the local memory to an array with first dimension lld_a ≥ (bw+1),
stored in desca.
On entry, this array contains the local pieces of the n-by-n symmetric
banded distributed Cholesky factor L or L^T*A(1:n, ja:ja+n-1).
This local portion is stored in the packed banded format used in LAPACK.
See the Application Notes below and the ScaLAPACK manual for more detail
on the format of distributed matrices.
ja (global) The index in the global matrix A that points to the
start of the matrix to be operated on (which may be either all of A or a
submatrix of A).
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
If 1D type (dtype_a = 501), then dlen_ ≥ 7;
b (local)
Pointer into the local memory to an array of local leading dimension lld_b ≥ nb.
On entry, this array contains the local pieces of the right hand sides
B(ib:ib+n-1, 1:nrhs).
ib (global) The row index in the global matrix B that points to the first row of
the matrix to be operated on (which may be either all of B or a submatrix of
B).
descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.
If 1D type (dtype_b = 502), then dlen_ ≥ 7;
laf (local)
The size of user-input auxiliary fill-in space af. Must be laf ≥
(nb+2*bw)*bw. If laf is not large enough, an error code will be returned and
the minimum acceptable size will be returned in af[0].
work (local)
The array work is a temporary workspace array of size lwork. This space
may be overwritten in between function calls.
lwork (local or global) The size of the user-input workspace work, must be at
least lwork ≥ bw*nrhs. If lwork is too small, the minimal acceptable size
will be returned in work[0] and an error code is returned.
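The two size requirements stated for laf and lwork are simple products and can be computed before allocating af and work. This is a minimal sketch of that arithmetic only; the helper names pbtrsv_min_laf and pbtrsv_min_lwork are hypothetical, and nb is assumed to be the block size taken from the array descriptor.

```c
#include <stddef.h>

/* Minimum fill-in space: laf >= (nb + 2*bw) * bw. */
size_t pbtrsv_min_laf(size_t nb, size_t bw)
{
    return (nb + 2 * bw) * bw;
}

/* Minimum workspace: lwork >= bw * nrhs. */
size_t pbtrsv_min_lwork(size_t bw, size_t nrhs)
{
    return bw * nrhs;
}
```

For nb = 64, bw = 3, and nrhs = 4 this gives laf ≥ 210 and lwork ≥ 12 elements of the routine's data type. Note that af must be the array filled by the earlier p?pbtrf call, so its size is fixed at factorization time.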
Output Parameters
af (local)
The array af is of size laf. It contains auxiliary fill-in space. The fill-in
space is created in a call to the factorization function p?pbtrf and is stored
in af. If a linear system is to be solved using p?pbtrs after the
factorization function, af must not be altered after the factorization.
b On exit, this array contains the local piece of the solutions distributed
matrix X.
info (local)
= 0: successful exit
< 0: if the i-th argument is an array and the j-th entry, indexed j-1, had an
illegal value,
then info = -(i*100+j); if the i-th argument is a scalar and had an illegal value, then info = -i.
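A negative info value can be split back into the argument position and, for array arguments, the entry position. The sketch below assumes the usual ScaLAPACK convention that a scalar argument i is reported as info = -i and an array entry as info = -(i*100+j); the type and function names are hypothetical and not part of the library.

```c
#include <assert.h>

/* Hypothetical result type: 1-based positions, 0 meaning "not applicable". */
typedef struct {
    int arg;    /* position i of the offending argument, 0 if info >= 0 */
    int entry;  /* position j inside an array argument, 0 for a scalar  */
} arg_error;

/* Hypothetical helper: decode a negative info value of the form
 * -i (scalar argument) or -(i*100 + j) (array argument). */
static arg_error decode_info(int info)
{
    arg_error e = {0, 0};
    if (info >= 0)
        return e;               /* success or a routine-specific positive code */
    int code = -info;
    if (code >= 100) {          /* array argument: i*100 + j */
        e.arg = code / 100;
        e.entry = code % 100;
    } else {                    /* scalar argument: i */
        e.arg = code;
    }
    return e;
}
```

With this convention, info = -602 means that entry 2 (index 1 in C) of the 6th argument was illegal.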
Application Notes
If the factorization function and the solve function are to be called separately to solve various sets of right-
hand sides using the same coefficient matrix, the auxiliary space af must not be altered between calls to the
factorization function and the solve function.
The best algorithm for solving banded and tridiagonal linear systems depends on a variety of parameters,
especially the bandwidth. Currently, only algorithms designed for the case N/P>>bw are implemented. These
algorithms go by many names, including Divide and Conquer, Partitioning, domain decomposition-type, etc.
The Divide and Conquer algorithm assumes the matrix is narrowly banded compared with the number of
equations. In this situation, it is best to distribute the input matrix A one-dimensionally, with columns atomic
and rows divided amongst the processes. The basic algorithm divides the banded matrix up into P pieces with
one stored on each processor, and then proceeds in 2 phases for the factorization or 3 for the solution of a
linear system.
1. Local Phase: The individual pieces are factored independently and in parallel. These factors are
applied to the matrix creating fill-in, which is stored in a non-inspectable way in auxiliary space af.
Mathematically, this is equivalent to reordering the matrix A as P*A*P^T and then factoring the principal
leading submatrix of size equal to the sum of the sizes of the matrices factored on each processor. The
factors of these submatrices overwrite the corresponding parts of A in memory.
2. Reduced System Phase: A small (bw*(P-1)) system is formed representing interaction of the larger
blocks and is stored (as are its factors) in the space af. A parallel Block Cyclic Reduction algorithm is
used. For a linear system, a parallel front solve followed by an analogous backsolve, both using the
structure of the factored matrix, are performed.
3. Back Substitution Phase: For a linear system, a local backsubstitution is performed on each processor
in parallel.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
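As an illustration of the 1D descriptors mentioned above (dtype 501 or 502, at least 7 entries), the following sketch fills such an array. The field order (type, BLACS context, distributed dimension, block size, source process, local leading dimension, reserved) follows the ScaLAPACK convention for 1D descriptors as the editor understands it; verify it against the Overview before relying on it. The function name is hypothetical, and plain int stands in for MKL_INT.

```c
#include <assert.h>

enum { DESC_1D_LEN = 7 };

/* Hypothetical helper: fill a 1D ScaLAPACK-style descriptor.
 * Assumed layout: [0] dtype (501 or 502), [1] BLACS context,
 * [2] size of the distributed dimension, [3] block size,
 * [4] process holding the first block, [5] local leading dimension,
 * [6] reserved. */
static void fill_desc_1d(int desc[DESC_1D_LEN], int dtype, int ictxt,
                         int n, int nb, int src, int lld)
{
    assert(dtype == 501 || dtype == 502);
    desc[0] = dtype;
    desc[1] = ictxt;
    desc[2] = n;
    desc[3] = nb;
    desc[4] = src;
    desc[5] = lld;
    desc[6] = 0;   /* reserved entry */
}
```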
p?pttrsv
Solves a single triangular linear system via frontsolve
or backsolve where the triangular matrix is a factor of
a tridiagonal matrix computed by p?pttrf .
Syntax
void pspttrsv (char *uplo , MKL_INT *n , MKL_INT *nrhs , float *d , float *e , MKL_INT
*ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *descb , float *af , MKL_INT
*laf , float *work , MKL_INT *lwork , MKL_INT *info );
void pdpttrsv (char *uplo , MKL_INT *n , MKL_INT *nrhs , double *d , double *e , MKL_INT
*ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *descb , double *af , MKL_INT
*laf , double *work , MKL_INT *lwork , MKL_INT *info );
void pcpttrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *nrhs , float *d ,
MKL_Complex8 *e , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib ,
MKL_INT *descb , MKL_Complex8 *af , MKL_INT *laf , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzpttrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *nrhs , double *d ,
MKL_Complex16 *e , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib ,
MKL_INT *descb , MKL_Complex16 *af , MKL_INT *laf , MKL_Complex16 *work , MKL_INT
*lwork , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
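The p?pttrsv function solves a tridiagonal triangular system of linear equations
A(1:n, ja:ja+n-1)*X = B(jb:jb+n-1, 1:nrhs)
or
A(1:n, ja:ja+n-1)^T*X = B(jb:jb+n-1, 1:nrhs) for real flavors,
A(1:n, ja:ja+n-1)^H*X = B(jb:jb+n-1, 1:nrhs) for complex flavors,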
where A(1:n, ja:ja+n-1) is a tridiagonal triangular matrix factor produced by the Cholesky factorization
code p?pttrf and is stored in A(1:n, ja:ja+n-1) and af. The matrix stored in A(1:n, ja:ja+n-1) is
either upper or lower triangular according to uplo.
Input Parameters
n (global)
The number of rows and columns to be operated on, that is, the order of
the distributed submatrix A(1:n, ja:ja+n-1). n ≥ 0.
nrhs (global)
The number of right hand sides; the number of columns of the distributed
submatrix B(jb:jb+n-1, 1:nrhs); nrhs ≥ 0.
d (local)
Pointer to the local part of the global vector storing the main diagonal of the
matrix; must be of size ≥ nb_a.
e (local)
Pointer to the local part of the global vector du storing the upper diagonal of
the matrix; must be of size ≥ nb_a. Globally, du(n) is not referenced, and du
must be aligned with d.
ja (global) The index in the global matrix A that points to the start of the
matrix to be operated on (which may be either all of A or a submatrix of A).
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
If 1D type (dtype_a = 501 or 502), then dlen ≥ 7;
b (local)
Pointer into the local memory to an array of local lead size lld_b ≥ nb.
On entry, this array contains the local pieces of the right hand sides
B(jb:jb+n-1, 1:nrhs).
ib (global) The row index in the global matrix B that points to the first row of
the matrix to be operated on (which may be either all of B or a submatrix of
B).
descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.
If 1D type (dtype_b = 502), then dlen ≥ 7;
laf (local)
The size of user-input auxiliary fill-in space af. Must be laf ≥ (nb+2*bw)*bw.
If laf is not large enough, an error code will be returned and the minimum
acceptable size will be returned in af[0].
work (local)
The array work is a temporary workspace array of size lwork. This space
may be overwritten in between function calls.
lwork (local or global) The size of the user-input workspace work; must be
lwork ≥ (10+2*min(100, nrhs))*npcol+4*nrhs. If lwork is too
small, the minimal acceptable size will be returned in work[0] and an error
code is returned.
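The lwork bound above can be evaluated directly. This is a small sketch with a hypothetical helper name; npcol is assumed to be the number of process columns in the grid, as elsewhere in ScaLAPACK, and plain int stands in for MKL_INT.

```c
#include <assert.h>

/* Hypothetical helper: evaluates the stated bound
 * lwork >= (10 + 2*min(100, nrhs)) * npcol + 4*nrhs. */
static int pttrsv_min_lwork(int nrhs, int npcol)
{
    assert(nrhs >= 0 && npcol >= 1);
    int capped = nrhs < 100 ? nrhs : 100;   /* min(100, nrhs) */
    return (10 + 2 * capped) * npcol + 4 * nrhs;
}
```

For one right-hand side on four process columns this gives (10 + 2) * 4 + 4 = 52 elements.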
Output Parameters
d, e (local).
On exit, these arrays contain information on the factors of the matrix.
af (local)
The array af is of size laf. It contains auxiliary fill-in space. The fill-in
space is created in a call to the factorization function p?pttrf and is stored
in af. If a linear system is to be solved using p?pttrs after the
factorization function, af must not be altered after the factorization.
b On exit, this array contains the local piece of the solutions distributed
matrix X.
info (local)
= 0: successful exit
< 0: if the i-th argument is an array and the j-th entry, indexed j-1, had an
illegal value,
then info = -(i*100+j); if the i-th argument is a scalar and had an illegal value, then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?potf2
Computes the Cholesky factorization of a symmetric/
Hermitian positive definite matrix (local unblocked
algorithm).
Syntax
void pspotf2 (char *uplo , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , MKL_INT *info );
void pdpotf2 (char *uplo , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , MKL_INT *info );
void pcpotf2 (char *uplo , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *info );
void pzpotf2 (char *uplo , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?potf2 function computes the Cholesky factorization of a real symmetric or complex Hermitian positive
definite distributed matrix sub (A)=A(ia:ia+n-1, ja:ja+n-1).
The factorization has the form sub(A) = U'*U or sub(A) = L*L',
where U is an upper triangular matrix, L is lower triangular. X' denotes transpose (conjugate transpose) of X.
Input Parameters
uplo (global)
Specifies whether the upper or lower triangular part of the symmetric/
Hermitian matrix A is stored.
= 'U': upper triangle of sub (A) is stored;
= 'L': lower triangle of sub (A) is stored.
n (global)
The number of rows and columns to be operated on, that is, the order of
the distributed matrix sub (A). n ≥ 0.
a (local)
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1)
containing the local pieces of the n-by-n symmetric distributed matrix
sub(A) to be factored.
If uplo = 'U', the leading n-by-n upper triangular part of sub(A) contains
the upper triangular matrix and the strictly lower triangular part of this
matrix is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of sub(A) contains
the lower triangular matrix and the strictly upper triangular part of sub(A) is
not referenced.
ia, ja (global)
The row and column indices in the global matrix A indicating the first row
and the first column of the sub(A), respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
Output Parameters
a (local)
On exit,
if uplo = 'U', the upper triangular part of the distributed matrix contains
the Cholesky factor U;
if uplo = 'L', the lower triangular part of the distributed matrix contains
the Cholesky factor L.
info (local)
= 0: successful exit
< 0: if the i-th argument is an array and the j-th entry, indexed j-1, had an
illegal value,
then info = -(i*100+j); if the i-th argument is a scalar and had an illegal value, then info = -i.
> 0: if info = k, the leading minor of order k is not positive definite, and
the factorization could not be completed.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?rot
Applies a planar rotation to two distributed vectors.
Syntax
void psrot(MKL_INT* n, float* x, MKL_INT* ix, MKL_INT* jx, MKL_INT* descx, MKL_INT*
incx, float* y, MKL_INT* iy, MKL_INT* jy, MKL_INT* descy, MKL_INT* incy, float* cs,
float* sn, float* work, MKL_INT* lwork, MKL_INT* info);
void pdrot(MKL_INT* n, double* x, MKL_INT* ix, MKL_INT* jx, MKL_INT* descx, MKL_INT*
incx, double* y, MKL_INT* iy, MKL_INT* jy, MKL_INT* descy, MKL_INT* incy, double* cs,
double* sn, double* work, MKL_INT* lwork, MKL_INT* info);
Include Files
• mkl_scalapack.h
Description
p?rot applies a planar rotation defined by cs and sn to the two distributed vectors sub(x) and sub(y).
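To make the operation concrete, the sketch below applies a planar rotation to two ordinary (non-distributed) arrays. It assumes the same convention as the BLAS ?rot routine, x := cs*x + sn*y and y := cs*y - sn*x; the source does not spell out the formula, so treat this as an illustration rather than the library's implementation. The function name is hypothetical.

```c
#include <assert.h>

/* Hypothetical local analogue of a planar rotation, assuming the
 * BLAS ?rot convention:
 *   x[i] := cs*x[i] + sn*y[i]
 *   y[i] := cs*y[i] - sn*x[i]   (using the old x[i]) */
static void rot_local(int n, double *x, double *y, double cs, double sn)
{
    assert(n >= 0);
    for (int i = 0; i < n; ++i) {
        double xi = x[i];
        double yi = y[i];
        x[i] = cs * xi + sn * yi;
        y[i] = cs * yi - sn * xi;
    }
}
```

With cs = 0 and sn = 1, the rotation moves y into x and stores -x in y.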
Input Parameters
n (global )
The number of elements to operate on when applying the planar rotation to
x and y (n≥0).
ix (global )
The global row index of the submatrix of the distributed matrix x to operate
on. If incx = 1, then it is required that ix = iy. 1 ≤ ix ≤ m_x.
jx (global )
The global column index of the submatrix of the distributed matrix x to
operate on. If incx = m_x, then it is required that jx = jy. 1 ≤ jx ≤ n_x.
descx (global and local)
The array descriptor of the distributed matrix x.
incx (global )
The global increment for the elements of x. Only two values of incx are
supported in this version, namely 1 and m_x. Moreover, it must hold that
incx = m_x if incy =m_y and that incx = 1 if incy = 1.
iy (global )
The global row index of the submatrix of the distributed matrix y to operate
on. If incy = 1, then it is required that iy = ix. 1 ≤ iy ≤ m_y.
jy (global )
The global column index of the submatrix of the distributed matrix y to
operate on. If incy = m_y, then it is required that jy = jx. 1 ≤ jy ≤ n_y.
incy (global )
The global increment for the elements of y. Only two values of incy are
supported in this version, namely 1 and m_y. Moreover, it must hold that
incy = m_y if incx = m_x and that incy = 1 if incx = 1.
cs, sn (global)
The parameters defining the properties of the planar rotation. It must hold
that 0 ≤ cs, sn ≤ 1 and that sn^2 + cs^2 = 1. The latter condition is hardly checked in
finite-precision arithmetic.
lwork (local )
The length of the workspace array work.
Output Parameters
x
y
work[0] On exit, if info = 0, work[0] returns the optimal lwork
info (global )
= 0: successful exit
< 0: if info = -i, the i-th argument had an illegal value.
If the i-th argument is an array and the j-th entry, indexed j-1, had an
illegal value, then info = -(i*100+j), if the i-th argument is a scalar and
had an illegal value, then info = -i.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?rscl
Multiplies a vector by the reciprocal of a real scalar.
Syntax
void psrscl (MKL_INT *n , float *sa , float *sx , MKL_INT *ix , MKL_INT *jx , MKL_INT
*descx , MKL_INT *incx );
void pdrscl (MKL_INT *n , double *sa , double *sx , MKL_INT *ix , MKL_INT *jx , MKL_INT
*descx , MKL_INT *incx );
void pcsrscl (MKL_INT *n , float *sa , MKL_Complex8 *sx , MKL_INT *ix , MKL_INT *jx ,
MKL_INT *descx , MKL_INT *incx );
void pzdrscl (MKL_INT *n , double *sa , MKL_Complex16 *sx , MKL_INT *ix , MKL_INT *jx ,
MKL_INT *descx , MKL_INT *incx );
Include Files
• mkl_scalapack.h
Description
The p?rscl function multiplies an n-element real/complex vector sub(X) by the real scalar 1/a. This is done
without overflow or underflow as long as the final result sub(X)/a does not overflow or underflow.
sub(X) denotes X(ix:ix+n-1, jx:jx), if incx = 1,
and X(ix:ix, jx:jx+n-1), if incx = m_x.
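One way to scale by 1/a without intermediate overflow or underflow is to apply the factor in several safe steps. The sketch below works on a plain local array and is modelled on the approach used by the reference LAPACK ?rscl routine; it is not taken from the MKL implementation. The names rscl_local and abs_d are hypothetical.

```c
#include <assert.h>
#include <float.h>

static double abs_d(double v) { return v < 0.0 ? -v : v; }

/* Hypothetical local sketch: x[i] := x[i] / a, applied as a sequence of
 * multiplications so that no intermediate factor overflows or underflows. */
static void rscl_local(int n, double a, double *x)
{
    const double smlnum = DBL_MIN;        /* smallest normalized double */
    const double bignum = 1.0 / smlnum;
    double cden = a;                      /* remaining denominator */
    double cnum = 1.0;                    /* remaining numerator   */
    int done = 0;

    assert(n >= 0);
    while (!done) {
        double cden1 = cden * smlnum;
        double cnum1 = cnum / bignum;
        double mul;
        if (abs_d(cden1) > abs_d(cnum) && cnum != 0.0) {
            mul = smlnum;                 /* pre-multiply by smlnum: |a| is huge */
            cden = cden1;
        } else if (abs_d(cnum1) > abs_d(cden)) {
            mul = bignum;                 /* pre-multiply by bignum: |a| is tiny */
            cnum = cnum1;
        } else {
            mul = cnum / cden;            /* the remaining factor is safe */
            done = 1;
        }
        for (int i = 0; i < n; ++i)
            x[i] *= mul;
    }
}
```

For moderate values of a the loop runs once and simply multiplies by 1/a.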
Input Parameters
n (global)
The number of components of the distributed vector sub(X). n ≥ 0.
sa The scalar a that is used to divide each component of the vector sub(X).
This parameter must be ≥ 0.
jx (global)
The column index of the submatrix of the distributed matrix X to operate
on.
incx (global)
The increment for the elements of X. This version supports only two values
of incx, namely 1 and m_x.
Output Parameters
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?sygs2/p?hegs2
Reduces a symmetric/Hermitian positive-definite
generalized eigenproblem to standard form, using the
factorization results obtained from p?potrf (local
unblocked algorithm).
Syntax
void pssygs2 (MKL_INT *ibtype , char *uplo , MKL_INT *n , float *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb ,
MKL_INT *info );
void pdsygs2 (MKL_INT *ibtype , char *uplo , MKL_INT *n , double *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb ,
MKL_INT *info );
void pchegs2 (MKL_INT *ibtype , char *uplo , MKL_INT *n , MKL_Complex8 *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *jb ,
MKL_INT *descb , MKL_INT *info );
void pzhegs2 (MKL_INT *ibtype , char *uplo , MKL_INT *n , MKL_Complex16 *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *jb ,
MKL_INT *descb , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?sygs2/p?hegs2 function reduces a real symmetric-definite or a complex Hermitian positive-definite
generalized eigenproblem to standard form.
Here sub(A) denotes A(ia:ia+n-1, ja:ja+n-1), and sub(B) denotes B(ib:ib+n-1, jb:jb+n-1).
If ibtype = 1, the problem is
sub(A)*x = λ*sub(B)*x
and sub(A) is overwritten by inv(U^T)*sub(A)*inv(U) or inv(L)*sub(A)*inv(L^T) for real flavors, and by
inv(U^H)*sub(A)*inv(U) or inv(L)*sub(A)*inv(L^H) for complex flavors.
Input Parameters
ibtype (global)
= 1:
compute inv(U^T)*sub(A)*inv(U), or inv(L)*sub(A)*inv(L^T) for real
functions,
and inv(U^H)*sub(A)*inv(U), or inv(L)*sub(A)*inv(L^H) for complex
functions;
= 2 or 3:
compute U*sub(A)*U^T, or L^T*sub(A)*L for real functions,
and U*sub(A)*U^H, or L^H*sub(A)*L for complex functions.
uplo (global)
Specifies whether the upper or lower triangular part of the symmetric/
Hermitian matrix sub(A) is stored, and how sub(B) is factorized.
= 'U': Upper triangular of sub(A) is stored and sub(B) is factorized as U^T*U
(for real functions) or as U^H*U (for complex functions).
= 'L': Lower triangular of sub(A) is stored and sub(B) is factorized as L*L^T
(for real functions) or as L*L^H (for complex functions).
n (global)
The order of the matrices sub(A) and sub(B). n ≥ 0.
a (local)
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).
On entry, this array contains the local pieces of the n-by-n symmetric/
Hermitian distributed matrix sub(A).
If uplo = 'U', the leading n-by-n upper triangular part of sub(A) contains
the upper triangular part of the matrix, and the strictly lower triangular part
of sub(A) is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of sub(A) contains
the lower triangular part of the matrix, and the strictly upper triangular part
of sub(A) is not referenced.
ia, ja (global)
The row and column indices in the global matrix A indicating the first row
and the first column of the sub(A), respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
b (local)
Pointer into the local memory to an array of size lld_b * LOCc(jb+n-1).
On entry, this array contains the local pieces of the triangular factor from
the Cholesky factorization of sub(B) as returned by p?potrf.
ib, jb (global)
The row and column indices in the global matrix B indicating the first row
and the first column of the sub(B), respectively.
descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.
Output Parameters
a (local)
On exit, if info = 0, the transformed matrix is stored in the same format
as sub(A).
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?sytd2/p?hetd2
Reduces a symmetric/Hermitian matrix to real
symmetric tridiagonal form by an orthogonal/unitary
similarity transformation (local unblocked algorithm).
Syntax
void pssytd2 (char *uplo, MKL_INT *n, float *a, MKL_INT *ia, MKL_INT *ja, MKL_INT
*desca, float *d, float *e, float *tau, float *work, MKL_INT *lwork, MKL_INT *info);
void pdsytd2 (char *uplo, MKL_INT *n, double *a, MKL_INT *ia, MKL_INT *ja, MKL_INT
*desca, double *d, double *e, double *tau, double *work, MKL_INT *lwork, MKL_INT *info);
void pchetd2 (char *uplo, MKL_INT *n, MKL_Complex8 *a, MKL_INT *ia, MKL_INT *ja, MKL_INT
*desca, float *d, float *e, MKL_Complex8 *tau, MKL_Complex8 *work, MKL_INT *lwork,
MKL_INT *info);
void pzhetd2 (char *uplo, MKL_INT *n, MKL_Complex16 *a, MKL_INT *ia, MKL_INT *ja,
MKL_INT *desca, double *d, double *e, MKL_Complex16 *tau, MKL_Complex16 *work, MKL_INT
*lwork, MKL_INT *info);
Include Files
• mkl_scalapack.h
Description
The p?sytd2/p?hetd2 function reduces a real symmetric/complex Hermitian matrix sub(A) to symmetric/
Hermitian tridiagonal form T by an orthogonal/unitary similarity transformation:
Q'*sub(A)*Q = T, where sub(A) = A(ia:ia+n-1, ja:ja+n-1).
Input Parameters
uplo (global)
Specifies whether the upper or lower triangular part of the symmetric/
Hermitian matrix sub(A) is stored:
= 'U': upper triangular
= 'L': lower triangular
n (global)
The number of rows and columns to be operated on, that is, the order of
the distributed matrix sub(A). n ≥ 0.
a (local)
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).
On entry, this array contains the local pieces of the n-by-n symmetric/
Hermitian distributed matrix sub(A).
If uplo = 'U', the leading n-by-n upper triangular part of sub(A) contains
the upper triangular part of the matrix, and the strictly lower triangular part
of sub(A) is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of sub(A) contains
the lower triangular part of the matrix, and the strictly upper triangular part
of sub(A) is not referenced.
ia, ja (global)
The row and column indices in the global matrix A indicating the first row
and the first column of the sub(A), respectively.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
work (local)
The array work is a temporary workspace array of size lwork.
Output Parameters
a On exit, if uplo = 'U', the diagonal and first superdiagonal of sub(A) are
overwritten by the corresponding elements of the tridiagonal matrix T, and
the elements above the first superdiagonal, with the array tau, represent
the orthogonal/unitary matrix Q as a product of elementary reflectors;
if uplo = 'L', the diagonal and first subdiagonal of A are overwritten by
the corresponding elements of the tridiagonal matrix T, and the elements
below the first subdiagonal, with the array tau, represent the orthogonal/
unitary matrix Q as a product of elementary reflectors. See the Application
Notes below.
d (local)
Array of size LOCc(ja+n-1). The diagonal elements of the tridiagonal matrix
T:
d[i] = A(i+1,i+1), where i=0,1, ..., LOCc(ja+n-1)-1; d is tied to the
distributed matrix A.
e (local)
Array of size LOCc(ja+n-1),
tau (local)
Array of size LOCc(ja+n-1).
The scalar factors of the elementary reflectors. tau is tied to the distributed
matrix A.
work[0] On exit, work[0] returns the minimal and optimal value of lwork.
info (local)
= 0: successful exit
< 0: if the i-th argument is an array and the j-th entry, indexed j-1, had an
illegal value,
then info = -(i*100+j); if the i-th argument is a scalar and had an illegal value, then info = -i.
Application Notes
If uplo = 'U', the matrix Q is represented as a product of elementary reflectors
Q = H(n-1)*...*H(2)*H(1)
Each H(i) has the form
H(i) = I - tau*v*v',
where tau is a real/complex scalar, and v is a real/complex vector with v(i+1:n) = 0 and v(i) = 1;
v(1:i-1) is stored on exit in A(ia:ia+i-2, ja+i), and tau in tau[ja+i-2].
If uplo = 'L', the matrix Q is represented as a product of elementary reflectors
Q = H(1)*H(2)*...*H(n-1).
Each H(i) has the form
H(i) = I - tau*v*v' ,
where tau is a real/complex scalar, and v is a real/complex vector with v(1:i) = 0 and v(i+1) = 1;
v(i+2:n) is stored on exit in A(ia+i+1:ia+n-1, ja+i-1), and tau in tau[ja+i-2].
The contents of sub (A) on exit are illustrated by the following examples with n = 5:
if uplo = 'U':
( d   e   v2  v3  v4 )
(     d   e   v3  v4 )
(         d   e   v4 )
(             d   e  )
(                 d  )
if uplo = 'L':
( d                  )
( e   d              )
( v1  e   d          )
( v1  v2  e   d      )
( v1  v2  v3  e   d  )
where d and e denote diagonal and off-diagonal elements of T, and vi denotes an element of the vector
defining H(i).
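The effect of a single elementary reflector H = I - tau*v*v' on a vector can be sketched without forming H explicitly. The example below covers the real case only and uses a hypothetical helper name; it illustrates the formula above and is not the routine's internal code.

```c
#include <assert.h>

/* Hypothetical helper (real case): x := (I - tau*v*v') * x,
 * computed as x - tau * (v'x) * v. */
static void apply_reflector(int n, double tau, const double *v, double *x)
{
    assert(n >= 0);
    double dot = 0.0;
    for (int i = 0; i < n; ++i)
        dot += v[i] * x[i];               /* v' * x */
    for (int i = 0; i < n; ++i)
        x[i] -= tau * dot * v[i];
}
```

With v = (1, 0) and tau = 2, the reflector flips the sign of the first component and leaves the second unchanged.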
NOTE
The distributed matrix sub(A) must verify some alignment properties, namely the following
expression should be true:
( mb_a==nb_a && iroffa==icoffa ), where iroffa = mod(ia - 1, mb_a) and icoffa =
mod(ja - 1, nb_a).
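The alignment condition in the note can be checked before the call. A minimal sketch, with a hypothetical function name and plain int in place of MKL_INT; ia and ja are the 1-based global indices and mb_a, nb_a the row and column block sizes from the descriptor:

```c
#include <assert.h>

/* Hypothetical helper: returns 1 if sub(A) satisfies
 * mb_a == nb_a && iroffa == icoffa, and 0 otherwise. */
static int sytd2_is_aligned(int ia, int ja, int mb_a, int nb_a)
{
    assert(ia >= 1 && ja >= 1 && mb_a >= 1 && nb_a >= 1);
    int iroffa = (ia - 1) % mb_a;   /* row offset inside the block    */
    int icoffa = (ja - 1) % nb_a;   /* column offset inside the block */
    return mb_a == nb_a && iroffa == icoffa;
}
```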
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?trord
Reorders the Schur factorization of a general matrix.
Syntax
void pstrord( char* compq, MKL_INT* select, MKL_INT* para, MKL_INT* n, float* t,
MKL_INT* it, MKL_INT* jt, MKL_INT* desct, float* q, MKL_INT* iq, MKL_INT* jq, MKL_INT*
descq, float* wr, float* wi, MKL_INT* m, float* work, MKL_INT* lwork, MKL_INT* iwork,
MKL_INT* liwork, MKL_INT* info);
void pdtrord(char* compq, MKL_INT* select, MKL_INT* para, MKL_INT* n, double* t,
MKL_INT* it, MKL_INT* jt, MKL_INT* desct, double* q, MKL_INT* iq, MKL_INT* jq, MKL_INT*
descq, double* wr, double* wi, MKL_INT* m, double* work, MKL_INT* lwork, MKL_INT* iwork,
MKL_INT* liwork, MKL_INT* info);
Include Files
• mkl_scalapack.h
Description
p?trord reorders the real Schur factorization of a real matrix A = Q*T*Q^T, so that a selected cluster of
eigenvalues appears in the leading diagonal blocks of the upper quasi-triangular matrix T, and the leading
columns of Q form an orthonormal basis of the corresponding right invariant subspace.
T must be in Schur form (as returned by p?lahqr), that is, block upper triangular with 1-by-1 and 2-by-2
diagonal blocks.
This function uses a delay and accumulate procedure for performing the off-diagonal updates.
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/
PerformanceIndex.
Notice revision #20201201
Input Parameters
compq (global)
= 'V': update the matrix q of Schur vectors;
= 'N': do not update q.
para (global)
Block parameters:
n (global)
it, jt (global)
The row and column index in the global matrix T indicating the first column
of T. it = jt = 1 must hold (see Application Notes).
q (local)
On entry, if compq = 'V', the local pieces of the global distributed matrix Q
of Schur vectors.
If compq = 'N', q is not referenced.
iq, jq (global)
The column index in the global matrix Q indicating the first column of Q. iq
= jq = 1 must hold (see Application Notes).
lwork (local)
The size of the array work.
liwork (local)
The size of the array iwork.
Output Parameters
q On exit, if compq = 'V', q has been postmultiplied by the global orthogonal
transformation matrix which reorders t; the leading m columns of q form an
orthonormal basis for the specified invariant subspace.
If compq = 'N', q is not referenced.
m (global )
The size of the specified invariant subspace.
0 ≤ m ≤ n.
info (global)
= 0: successful exit
< 0: if info = -i, the i-th argument had an illegal value. If the i-th
argument is an array and the j-th entry, indexed j-1, had an illegal value,
then info = -(i*1000+j), if the i-th argument is a scalar and had an illegal
value, then info = -i.
On exit, info = {the index of t where the swap failed (indexing starts
at 1)}.
• A 2-by-2 block to be reordered split into two 1-by-1 blocks and the
second block failed to swap with an adjacent block.
On exit, info = {the index of t where the swap failed}.
• If info = n+1, there is no valid BLACS context (see the BLACS
documentation for details).
Application Notes
The following alignment requirements must hold:
This algorithm cannot work on submatrices of t and q, i.e., it = jt = iq = jq = 1 must hold. This is
however no limitation since p?lahqr does not compute Schur forms of submatrices anyway.
• Use a square grid, if possible, for maximum performance. The block parameters in para should be kept
well below the data distribution block size.
• In general, the parallel algorithm strives to perform as much work as possible without crossing the block
borders on the main block diagonal.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?trsen
Reorders the Schur factorization of a matrix and
(optionally) computes the reciprocal condition
numbers and invariant subspace for the selected
cluster of eigenvalues.
Syntax
void pstrsen(char* job, char* compq, MKL_INT* select, MKL_INT* para, MKL_INT* n, float*
t, MKL_INT* it, MKL_INT* jt, MKL_INT* desct, float* q, MKL_INT* iq, MKL_INT* jq,
MKL_INT* descq, float* wr, float* wi, MKL_INT* m, float* s, float* sep, float* work,
MKL_INT* lwork, MKL_INT* iwork, MKL_INT* liwork, MKL_INT* info);
void pdtrsen(char* job, char* compq, MKL_INT* select, MKL_INT* para, MKL_INT* n, double*
t, MKL_INT* it, MKL_INT* jt, MKL_INT* desct, double* q, MKL_INT* iq, MKL_INT* jq,
MKL_INT* descq, double* wr, double* wi, MKL_INT* m, double* s, double* sep, double*
work, MKL_INT* lwork, MKL_INT* iwork, MKL_INT* liwork, MKL_INT* info);
Include Files
• mkl_scalapack.h
Description
p?trsen reorders the real Schur factorization of a real matrix A = Q*T*Q^T, so that a selected cluster of
eigenvalues appears in the leading diagonal blocks of the upper quasi-triangular matrix T, and the leading
columns of Q form an orthonormal basis of the corresponding right invariant subspace. The reordering is
performed by p?trord.
Optionally the function computes the reciprocal condition numbers of the cluster of eigenvalues and/or the
invariant subspace.
T must be in Schur form (as returned by p?lahqr), that is, block upper triangular with 1-by-1 and 2-by-2
diagonal blocks.
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/
PerformanceIndex.
Notice revision #20201201
Input Parameters
job (global )
Specifies whether condition numbers are required for the cluster of
eigenvalues (s) or the invariant subspace (sep):
= 'N': no condition numbers are required;
= 'E': only the condition number for the cluster of eigenvalues is computed
(s);
= 'V': only the condition number for the invariant subspace is computed
(sep);
= 'B': condition numbers for both the cluster and the invariant subspace are
computed (s and sep).
compq (global )
= 'V': update the matrix q of Schur vectors;
= 'N': do not update q.
para (global )
Block parameters:
n (global )
The order of the globally distributed matrix t. n ≥ 0.
it, jt (global )
The row and column index in the global matrix T indicating the first column
of T. it = jt = 1 must hold (see Application Notes).
q (local)
On entry, if compq = 'V', the local pieces of the global distributed matrix Q
of Schur vectors.
If compq = 'N', q is not referenced.
iq, jq (global )
The column index in the global matrix Q indicating the first column of Q. iq
= jq = 1 must hold (see Application Notes).
lwork (local )
The size of the array work.
liwork (local )
The size of the array iwork.
Output Parameters
If compq = 'N', q is not referenced.
m (global )
The size of the specified invariant subspace. 0 ≤ m ≤ n.
s (global )
If job = 'E' or 'B', s is a lower bound on the reciprocal condition number for
the selected cluster of eigenvalues. s cannot underestimate the true
reciprocal condition number by more than a factor of sqrt(n). If m = 0 or n,
s = 1.
If job = 'N' or 'V', s is not referenced.
sep (global )
If job = 'V' or 'B', sep is the estimated reciprocal condition number of the
specified invariant subspace. If
m = 0 or n, sep = norm(t).
If job = 'N' or 'E', sep is not referenced.
info (global )
= 0: successful exit
< 0: if info = -i, the i-th argument had an illegal value.
If the i-th argument is an array and the j-th entry, indexed j-1, had an
illegal value, then info = -(i*1000+j), if the i-th argument is a scalar and
had an illegal value, then info = -i.
On exit, info = {the index of t where the swap failed (indexing starts
at 1)}.
• A 2-by-2 block to be reordered split into two 1-by-1 blocks and the
second block failed to swap with an adjacent block.
On exit, info = {the index of t where the swap failed}.
• If info = n+1, there is no valid BLACS context (see the BLACS
documentation for details).
Application Notes
The following alignment requirements must hold:
For parallel execution, use a square grid, if possible, for maximum performance. The block parameters in
para should be kept well below the data distribution block size.
In general, the parallel algorithm strives to perform as much work as possible without crossing the block
borders on the main block diagonal.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?trti2
Computes the inverse of a triangular matrix (local
unblocked algorithm).
Syntax
void pstrti2 (char *uplo , char *diag , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , MKL_INT *info );
void pdtrti2 (char *uplo , char *diag , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , MKL_INT *info );
void pctrti2 (char *uplo , char *diag , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_INT *info );
void pztrti2 (char *uplo , char *diag , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The p?trti2 function computes the inverse of a real/complex upper or lower triangular block matrix sub (A)
= A(ia:ia+n-1, ja:ja+n-1).
This matrix should be contained in one and only one process memory space (local operation).
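For orientation, the following sketch inverts a small upper triangular matrix in place, column by column, in the style of an unblocked algorithm. It operates on an ordinary column-major array, covers only the real upper triangular case, and does not guard against a zero diagonal entry; the function name is hypothetical and the code is not the library's implementation.

```c
#include <assert.h>

/* Hypothetical helper: in-place inverse of an n-by-n upper triangular
 * matrix stored column-major with leading dimension lda.
 * unit_diag != 0 mirrors diag = 'U': the diagonal is taken to be 1
 * and is not referenced. */
static void trti2_upper_local(int n, double *a, int lda, int unit_diag)
{
    assert(n >= 0 && lda >= (n > 0 ? n : 1));
    for (int j = 0; j < n; ++j) {
        double ajj = -1.0;
        if (!unit_diag) {
            a[j + j * lda] = 1.0 / a[j + j * lda];
            ajj = -a[j + j * lda];
        }
        /* Column j above the diagonal: multiply by the already inverted
         * leading j-by-j block, then scale by -1/a(j,j). Row i only reads
         * rows i..j-1 of column j, so updating in ascending i is safe. */
        for (int i = 0; i < j; ++i) {
            double s = (unit_diag ? 1.0 : a[i + i * lda]) * a[i + j * lda];
            for (int k = i + 1; k < j; ++k)
                s += a[i + k * lda] * a[k + j * lda];
            a[i + j * lda] = s * ajj;
        }
    }
}
```

For the matrix with rows (2, 1) and (0, 4), the result has rows (0.5, -0.125) and (0, 0.25).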
Input Parameters
uplo (global)
Specifies whether the matrix sub (A) is upper or lower triangular.
= 'U': sub (A) is upper triangular
= 'L': sub (A) is lower triangular.
diag (global)
1764
Developer Reference for Intel® oneAPI Math Kernel Library - C 1
Specifies whether or not the matrix A is unit triangular.
= 'N': sub (A) is non-unit triangular
n (global)
The number of rows and columns to be operated on, i.e., the order of the
distributed submatrix sub(A). n ≥ 0.
a (local)
Pointer into the local memory to an array, size lld_a * LOCc(ja+n-1).
On entry, this array contains the local pieces of the triangular matrix
sub(A).
If uplo = 'U', the leading n-by-n upper triangular part of the matrix
sub(A) contains the upper triangular part of the matrix, and the strictly
lower triangular part of sub(A) is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of the matrix
sub(A) contains the lower triangular part of the matrix, and the strictly
upper triangular part of sub(A) is not referenced. If diag = 'U', the
diagonal elements of sub(A) are not referenced either and are assumed to
be 1.
ia, ja (global)
The row and column indices in the global matrix A indicating the first row
and the first column of the sub(A), respectively.
desca (global and local)
Array of size dlen_. The array descriptor for the distributed matrix A.
Output Parameters
a On exit, the (triangular) inverse of the original matrix, in the same storage
format.
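The local computation can be illustrated with a plain C sketch. It is not the MKL implementation and does not call MKL. It assumes uplo = 'U', diag = 'N', double precision, and column-major storage with a leading dimension lda. The name upper_tri_inverse and the use of int instead of MKL_INT are hypothetical choices made for this illustration.

```c
#include <stddef.h>

/* Hypothetical helper, not an MKL routine: inverts an n-by-n upper triangular,
   non-unit matrix in place. The matrix is stored column-major with leading
   dimension lda, as a single process would hold it locally.
   Returns 0 on success, or j+1 if diagonal element j (0-based) is exactly zero. */
int upper_tri_inverse(int n, double *a, int lda)
{
    for (int j = 0; j < n; ++j) {
        double *col_j = a + (size_t)j * (size_t)lda;
        if (col_j[j] == 0.0)
            return j + 1;
        col_j[j] = 1.0 / col_j[j];
        double neg_ajj = -col_j[j];
        /* Multiply the already inverted leading j-by-j block by the part of
           column j above the diagonal. Ascending i keeps the update in place,
           because entries col_j[p] with p >= i are still the original values. */
        for (int i = 0; i < j; ++i) {
            double sum = 0.0;
            for (int p = i; p < j; ++p)
                sum += a[(size_t)p * (size_t)lda + (size_t)i] * col_j[p];
            col_j[i] = sum;
        }
        /* Scale by minus the inverted diagonal element. */
        for (int i = 0; i < j; ++i)
            col_j[i] *= neg_ajj;
    }
    return 0;
}
```

The strictly lower triangular part is never read or written, which matches the statement above that it is not referenced when uplo = 'U'.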
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
?lahqr2
Updates the eigenvalues and Schur decomposition.
Syntax
void clahqr2 (const MKL_INT* wantt, const MKL_INT* wantz, const MKL_INT* n, const
MKL_INT* ilo, const MKL_INT* ihi, MKL_Complex8* h, const MKL_INT* ldh, MKL_Complex8* w,
const MKL_INT* iloz, const MKL_INT* ihiz, MKL_Complex8* z, const MKL_INT* ldz, MKL_INT*
info);
void zlahqr2 (const MKL_INT* wantt, const MKL_INT* wantz, const MKL_INT* n, const
MKL_INT* ilo, const MKL_INT* ihi, MKL_Complex16* h, const MKL_INT* ldh, MKL_Complex16*
w, const MKL_INT* iloz, const MKL_INT* ihiz, MKL_Complex16* z, const MKL_INT* ldz,
MKL_INT* info);
Include Files
• mkl_scalapack.h
Description
?lahqr2 is an auxiliary routine called by ?hseqr to update the eigenvalues and Schur decomposition already
computed by ?hseqr, by dealing with the Hessenberg submatrix in rows and columns ilo to ihi. This
version of ?lahqr (not the standard LAPACK version) uses a double-shift algorithm (like LAPACK's ?lahqr).
Unlike the standard LAPACK convention, this does not assume the subdiagonal is real, nor does it work to
preserve this quality if given.
Input Parameters
ilo, ihi It is assumed that the matrix H is upper triangular in rows and columns
ihi+1:n, and that matrix element H(ilo,ilo-1) = 0 (unless ilo = 1).
?lahqr works primarily with the Hessenberg submatrix in rows and
columns ilo to ihi, but applies transformations to all of h if wantt is
nonzero.
1 <= ilo <= max(1,ihi); ihi <= n.
iloz, ihiz Specify the rows of Z to which transformations must be applied if wantz ≠ 0.
ldz The leading dimension of the array z. ldz >= max(1,n).
Output Parameters
?lamsh
Sends multiple shifts through a small (single node)
matrix to maximize the number of bulges that can be
sent through.
Syntax
void slamsh (float *s, const MKL_INT *lds, MKL_INT *nbulge, const MKL_INT *jblk, float
*h, const MKL_INT *ldh, const MKL_INT *n, const float *ulp );
void dlamsh (double *s, const MKL_INT *lds, MKL_INT *nbulge, const MKL_INT *jblk,
double *h, const MKL_INT *ldh, const MKL_INT *n, const double *ulp );
void clamsh (MKL_Complex8 *s , const MKL_INT *lds , MKL_INT *nbulge , const MKL_INT
*jblk , MKL_Complex8 *h , const MKL_INT *ldh , const MKL_INT *n , const float *ulp );
void zlamsh (MKL_Complex16 *s , const MKL_INT *lds , MKL_INT *nbulge , const MKL_INT
*jblk , MKL_Complex16 *h , const MKL_INT *ldh , const MKL_INT *n , const double *ulp );
Include Files
• mkl_scalapack.h
Description
The ?lamsh function sends multiple shifts through a small (single node) matrix to see how small consecutive
subdiagonal elements are modified by subsequent shifts in an effort to maximize the number of bulges that
can be sent through. The function should only be called when there are multiple shifts/bulges (nbulge > 1)
and the first shift is starting in the middle of an unreduced Hessenberg matrix because of two or more small
consecutive subdiagonal elements.
Input Parameters
s (local)
Array of size lds*2*jblk.
lds (local)
On entry, the leading dimension of S; unchanged on exit. 1<nbulge ≤ jblk
≤ lds/2.
nbulge (local)
On entry, the number of bulges to send through h (>1). nbulge should be
less than the maximum determined (jblk). 1<nbulge ≤ jblk ≤ lds/2.
jblk (local)
On entry, the number of double shifts determined for S; unchanged on exit.
h (local)
Array of size ldh*n.
ldh (local)
n (local)
On entry, the size of H. If all the bulges are expected to go through, n
should be at least 4*nbulge+2. Otherwise, nbulge may be reduced by this
function.
ulp (local)
On entry, machine precision. Unchanged on exit.
Output Parameters
nbulge On exit, the maximum number of bulges that can be sent through.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
?lapst
Sorts the numbers in increasing or decreasing order.
Syntax
void slapst (const char* id, const MKL_INT* n, const float* d, MKL_INT* indx, MKL_INT*
info);
void dlapst (const char* id, const MKL_INT* n, const double* d, MKL_INT* indx, MKL_INT*
info);
Include Files
• mkl_scalapack.h
Description
?lapst is a modified version of the LAPACK routine ?lasrt.
Defines a permutation indx that sorts the numbers in d in increasing order (if id = 'I') or in decreasing order
(if id = 'D').
Uses Quick Sort, reverting to insertion sort on arrays of size <= 20. The dimension of STACK limits n to about
2^32.
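The idea of returning a sorting permutation instead of reordering d can be sketched in plain C. The sketch below is not the MKL implementation: it uses only insertion sort (the fallback mentioned above for small arrays), double precision, and int instead of MKL_INT. The name sort_permutation is hypothetical, and storing 1-based positions in indx is an assumption modeled on the Fortran origin of the routine.

```c
/* Hypothetical helper, not an MKL routine: fills indx with a permutation
   that lists the entries of d in increasing order (id == 'I') or in
   decreasing order (any other id). d itself is left untouched.
   Assumption: indx holds 1-based positions into d. */
void sort_permutation(char id, int n, const double *d, int *indx)
{
    for (int i = 0; i < n; ++i)
        indx[i] = i + 1;

    /* Insertion sort on the index array, comparing the referenced values. */
    for (int i = 1; i < n; ++i) {
        int key = indx[i];
        double key_val = d[key - 1];
        int j = i - 1;
        while (j >= 0) {
            double cur = d[indx[j] - 1];
            int out_of_order = (id == 'I') ? (cur > key_val) : (cur < key_val);
            if (!out_of_order)
                break;
            indx[j + 1] = indx[j];
            --j;
        }
        indx[j + 1] = key;
    }
}
```

With this convention, d[indx[0]-1], d[indx[1]-1], ... visits the values in sorted order.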
Input Parameters
Output Parameters
?laqr6
Performs a single small-bulge multi-shift QR sweep
collecting the transformations.
Syntax
void slaqr6(char* job, MKL_INT* wantt, MKL_INT* wantz, MKL_INT* kacc22, MKL_INT* n,
MKL_INT* ktop, MKL_INT* kbot, MKL_INT* nshfts, float* sr, float* si, float* h, MKL_INT*
ldh, MKL_INT* iloz, MKL_INT* ihiz, float* z, MKL_INT* ldz, float* v, MKL_INT* ldv,
float* u, MKL_INT* ldu, MKL_INT* nv, float* wv, MKL_INT* ldwv, MKL_INT* nh, float* wh,
MKL_INT* ldwh);
void dlaqr6(char* job, MKL_INT* wantt, MKL_INT* wantz, MKL_INT* kacc22, MKL_INT* n,
MKL_INT* ktop, MKL_INT* kbot, MKL_INT* nshfts, double* sr, double* si, double* h,
MKL_INT* ldh, MKL_INT* iloz, MKL_INT* ihiz, double* z, MKL_INT* ldz, double* v, MKL_INT*
ldv, double* u, MKL_INT* ldu, MKL_INT* nv, double* wv, MKL_INT* ldwv, MKL_INT* nh,
double* wh, MKL_INT* ldwh);
Include Files
• mkl_scalapack.h
Description
This auxiliary function performs a single small-bulge multi-shift QR sweep, moving the chain of bulges from
top to bottom in the submatrix H(ktop:kbot,ktop:kbot), collecting the transformations in the matrix V or
accumulating the transformations in the matrix Z (see below).
This is a modified version of ?laqr5 from LAPACK 3.1.
Input Parameters
wantz wantz is non-zero if the orthogonal Schur factor is being computed. wantz
is set to zero otherwise.
ktop, kbot These are the first and last rows and columns of an isolated diagonal block
upon which the QR sweep is to be applied. It is assumed without a check
that either ktop = 1 or H(ktop,ktop-1) = 0 and either kbot = n or
H(kbot+1,kbot) = 0.
nshfts nshfts gives the number of simultaneous shifts. nshfts must be positive
and even.
sr contains the real parts and si contains the imaginary parts of the
nshfts shifts of origin that define the multi-shift QR sweep.
ldh ldh is the leading dimension of H just as declared in the calling function.
ldh ≥ max(1,n).
iloz, ihiz Specify the rows of the matrix Z to which transformations must be applied if
wantz is non-zero. 1 ≤ iloz ≤ ihiz ≤ n.
ldz ldz is the leading dimension of z just as declared in the calling function.
ldz ≥ n.
ldv ldv is the leading dimension of v as declared in the calling function. ldv ≥ 3.
ldu ldu is the leading dimension of u just as declared in the calling function.
ldu ≥ 3*nshfts-3.
ldwv scalar
ldwv is the leading dimension of wv as declared in the calling
function. ldwv ≥ nv.
Output Parameters
Application Notes
Notes
Based on contributions by Karen Braman and Ralph Byers, Department of Mathematics, University of Kansas,
USA, and Robert Granat, Department of Computing Science and HPC2N, Umeå University, Sweden.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
?lar1va
Computes scaled eigenvector corresponding to given
eigenvalue.
Syntax
void slar1va(MKL_INT* n, MKL_INT* b1, MKL_INT* bn, float* lambda, float* d, float* l,
float* ld, float* lld, float* pivmin, float* gaptol, float* z, MKL_INT* wantnc, MKL_INT*
negcnt, float* ztz, float* mingma, MKL_INT* r, MKL_INT* isuppz, float* nrminv, float*
resid, float* rqcorr, float* work);
void dlar1va(MKL_INT* n, MKL_INT* b1, MKL_INT* bn, double* lambda, double* d, double* l,
double* ld, double* lld, double* pivmin, double* gaptol, double* z, MKL_INT* wantnc,
MKL_INT* negcnt, double* ztz, double* mingma, MKL_INT* r, MKL_INT* isuppz, double*
nrminv, double* resid, double* rqcorr, double* work);
Include Files
• mkl_scalapack.h
Description
?lar1va computes the (scaled) r-th column of the inverse of the submatrix in rows b1 through bn of the
tridiagonal matrix LDL^T - λI. When λ is close to an eigenvalue, the computed vector is an accurate
eigenvector. Usually, r corresponds to the index where the eigenvector is largest in magnitude. The following
steps accomplish this computation:
Input Parameters
d Array of size n
lld Array of size n-1
gaptol Tolerance that indicates when eigenvector entries are negligible with respect
to their contribution to the residual.
z Array of size n
Output Parameters
z On output, z contains the (scaled) r-th column of the inverse. The scaling is
such that z[r-1] equals 1.
negcnt If wantnc is non-zero then negcnt = the number of pivots < pivmin in the
matrix factorization LDL^T, and negcnt = -1 otherwise.
mingma The reciprocal of the largest (in magnitude) diagonal element of the inverse
of LDL^T - σI.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
?laref
Applies Householder reflectors to matrices on their
rows or columns.
Syntax
void slaref (const char* type, float* a, const MKL_INT* lda, const MKL_INT* wantz,
float* z, const MKL_INT* ldz, const MKL_INT* block, MKL_INT* irow1, MKL_INT* icol1,
const MKL_INT* istart, const MKL_INT* istop, const MKL_INT* itmp1, const MKL_INT*
itmp2, const MKL_INT* liloz, const MKL_INT* lihiz, const float* vecs, float* v2, float*
v3, float* t1, float* t2, float* t3);
void dlaref (const char* type, double* a, const MKL_INT* lda, const MKL_INT* wantz,
double* z, const MKL_INT* ldz, const MKL_INT* block, MKL_INT* irow1, MKL_INT* icol1,
const MKL_INT* istart, const MKL_INT* istop, const MKL_INT* itmp1, const MKL_INT*
itmp2, const MKL_INT* liloz, const MKL_INT* lihiz, const double* vecs, double* v2,
double* v3, double* t1, double* t2, double* t3);
void claref (const char* type, MKL_Complex8* a, const MKL_INT* lda, const MKL_INT*
wantz, MKL_Complex8* z, const MKL_INT* ldz, const MKL_INT* block, MKL_INT* irow1,
MKL_INT* icol1, const MKL_INT* istart, const MKL_INT* istop, const MKL_INT* itmp1,
const MKL_INT* itmp2, const MKL_INT* liloz, const MKL_INT* lihiz, const MKL_Complex8*
vecs, MKL_Complex8* v2, MKL_Complex8* v3, MKL_Complex8* t1, MKL_Complex8* t2,
MKL_Complex8* t3);
void zlaref (const char* type, MKL_Complex16* a, const MKL_INT* lda, const MKL_INT*
wantz, MKL_Complex16* z, const MKL_INT* ldz, const MKL_INT* block, MKL_INT* irow1,
MKL_INT* icol1, const MKL_INT* istart, const MKL_INT* istop, const MKL_INT* itmp1,
const MKL_INT* itmp2, const MKL_INT* liloz, const MKL_INT* lihiz, const MKL_Complex16*
vecs, MKL_Complex16* v2, MKL_Complex16* v3, MKL_Complex16* t1, MKL_Complex16* t2,
MKL_Complex16* t3);
Include Files
• mkl_scalapack.h
Description
?laref applies one or several Householder reflectors of size 3 to one or two matrices (if column is specified)
on either their rows or columns.
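A size-3 reflector applied to rows can be sketched in plain C as follows. The sketch is not the MKL implementation. It assumes the convention used in LAPACK's ?lahqr, where the reflector is H = I - t1*v*v^T with v = (1, v2, v3), t2 = t1*v2 and t3 = t1*v3. Whether ?laref uses exactly this convention is an assumption. The name apply_reflector3_rows, the 0-based indices, and the use of int instead of MKL_INT are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper, not an MKL routine: applies H = I - t1 * v * v^T,
   v = (1, v2, v3), from the left to rows row, row+1, row+2 of the
   column-major matrix a (leading dimension lda), for columns
   col_first..col_last inclusive. All indices are 0-based.
   Assumption: t2 == t1 * v2 and t3 == t1 * v3, as in LAPACK's ?lahqr. */
void apply_reflector3_rows(double *a, int lda, int row,
                           int col_first, int col_last,
                           double v2, double v3,
                           double t1, double t2, double t3)
{
    for (int j = col_first; j <= col_last; ++j) {
        double *c = a + (size_t)j * (size_t)lda + (size_t)row;
        double sum = c[0] + v2 * c[1] + v3 * c[2];   /* v^T times the 3-vector */
        c[0] -= sum * t1;
        c[1] -= sum * t2;
        c[2] -= sum * t3;
    }
}
```

For example, with v2 = v3 = 0.5 and t1 = 4/3, the reflector maps the column (1, 2, 2) to (-3, 0, 0), which is the usual Householder elimination of the two trailing entries.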
Input Parameters
type (local)
If 'R': Apply reflectors to the rows of the matrix (apply from left)
Otherwise: Apply reflectors to the columns of the matrix
Unchanged on exit.
a (local)
Array, lld_a*LOCc(ja+n-1)
lda (local)
Unchanged on exit.
wantz (local)
If wantz ≠ 0, then apply any column reflections to z as well.
z (local)
Array, ldz*ncols, where the value ncols depends on other arguments. If
wantz ≠ 0 and type ≠ 'R' then ncols = icol1 + 3*(lihiz - liloz +
1). Otherwise, ncols is unused.
On entry, the second matrix to receive column reflections.
This is changed only if wantz is set.
ldz (local)
Unchanged on exit.
block (local)
If nonzero, then apply several reflectors at once and read their data from
the vecs array.
If zero, apply the single reflector given by v2, v3, t1, t2, and t3.
irow1 (local)
icol1 (local)
istart (local)
Specifies the "number" of the first reflector. This is used as an index into
vecs if block is set. istart is ignored if block is zero.
istop (local)
Specifies the "number" of the last reflector. This is used as an index into
vecs if block is set. istop is ignored if block is zero.
itmp1 (local)
Starting range into a. For rows, this is the local first column. For columns,
this is the local first row.
itmp2 (local)
Ending range into a. For rows, this is the local last column. For columns,
this is the local last row.
liloz, lihiz (local)
These serve the same purpose as itmp1, itmp2 but for z when wantz is
set.
vecs (local)
Output Parameters
?larrb2
Provides limited bisection to locate eigenvalues for
more accuracy.
Syntax
void slarrb2(MKL_INT* n, float* d, float* lld, MKL_INT* ifirst, MKL_INT* ilast, float*
rtol1, float* rtol2, MKL_INT* offset, float* w, float* wgap, float* werr, float* work,
MKL_INT* iwork, float* pivmin, float* lgpvmn, float* lgspdm, MKL_INT* twist, MKL_INT*
info);
void dlarrb2(MKL_INT* n, double* d, double* lld, MKL_INT* ifirst, MKL_INT* ilast,
double* rtol1, double* rtol2, MKL_INT* offset, double* w, double* wgap, double* werr,
double* work, MKL_INT* iwork, double* pivmin, double* lgpvmn, double* lgspdm, MKL_INT*
twist, MKL_INT* info);
Include Files
• mkl_scalapack.h
Description
Given the relatively robust representation (RRR) LDL^T, ?larrb2 does "limited" bisection to refine the
eigenvalues of LDL^T with indices in a given range to more accuracy. Initial guesses for these eigenvalues are
input in w; the corresponding estimates of the error in these guesses and their gaps are input in werr and
wgap, respectively. During bisection, intervals [left, right] are maintained by storing their mid-points and
semi-widths in the arrays w and werr respectively. The range of indices is specified by the ifirst, ilast,
and offset parameters, as explained in Input Parameters.
NOTE
There are only a few minor differences between ?larrb from LAPACK and this
function, ?larrb2. The main reason for creating this nearly identical copy is
profiling: in the ScaLAPACK MRRR algorithm, eigenvalue computation using ?larrb2 is used
for refinement in the construction of the representation tree, as opposed to the initial
computation of the eigenvalues for the root RRR, which uses ?larrb. When profiling, this
makes it easy to quantify refinement work versus computing eigenvalues of the root.
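Bisection on an LDL^T representation can be sketched in plain C. The sketch below is not the MKL implementation. It counts negative pivots of the stationary factorization LDL^T - λI = L+ D+ L+^T, using d and lld (where lld[i] = l[i]*l[i]*d[i]), and then halves an interval that is assumed to contain the k-th smallest eigenvalue. The names negcount_stationary and refine_eigenvalue, the zero-pivot guard, and the use of int instead of MKL_INT are hypothetical choices for this illustration.

```c
#include <math.h>

/* Hypothetical helper, not an MKL routine: number of negative pivots of
   L D L^T - lambda*I, which equals the number of eigenvalues below lambda.
   d has n entries and lld has n-1 entries, lld[i] = l[i]*l[i]*d[i].
   Assumption: a pivot smaller in magnitude than pivmin is replaced by
   -pivmin to avoid division by zero. */
int negcount_stationary(int n, const double *d, const double *lld,
                        double lambda, double pivmin)
{
    int count = 0;
    double s = -lambda;
    for (int i = 0; i < n; ++i) {
        double dplus = d[i] + s;
        if (fabs(dplus) < pivmin)
            dplus = -pivmin;
        if (dplus < 0.0)
            ++count;
        if (i < n - 1)
            s = s * lld[i] / dplus - lambda;
    }
    return count;
}

/* Hypothetical helper: refines the k-th smallest eigenvalue (k is 1-based),
   assumed to lie in [left, right], until the interval is no wider than tol. */
double refine_eigenvalue(int n, const double *d, const double *lld, int k,
                         double left, double right, double tol, double pivmin)
{
    while (right - left > tol) {
        double mid = 0.5 * (left + right);
        if (mid <= left || mid >= right)
            break;                      /* interval cannot be split further */
        if (negcount_stationary(n, d, lld, mid, pivmin) >= k)
            right = mid;                /* eigenvalue k is at or below mid */
        else
            left = mid;
    }
    return 0.5 * (left + right);
}
```

For d = {2, 3} and lld = {0.5}, the matrix LDL^T has eigenvalues 1.5 and 4, so refining k = 1 on [0, 2] converges to 1.5. The routine described here additionally tracks error bounds and gaps, which this sketch omits.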
Input Parameters
d Array of size n.
offset Offset for the arrays w, wgap and werr, i.e., the elements indexed
ifirst - offset - 1 through ilast - offset - 1 of these arrays are to be used.
w Array of size n
Workspace.
Workspace.
twist The twist index for the twisted factorization that is used for the negcount.
twist = n: Compute negcount from LDL^T - λI = L+ D+ L+^T
twist = 1: Compute negcount from LDL^T - λI = U- D- U-^T
twist = r, 1 < r < n: Compute negcount from LDL^T - λI = N_r Δ_r N_r^T
Output Parameters
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
?larrd2
Computes the eigenvalues of a symmetric tridiagonal
matrix to suitable accuracy.
Syntax
void slarrd2(char* range, char* order, MKL_INT* n, float* vl, float* vu, MKL_INT* il,
MKL_INT* iu, float* gers, float* reltol, float* d, float* e, float* e2, float* pivmin,
MKL_INT* nsplit, MKL_INT* isplit, MKL_INT* m, float* w, float* werr, float* wl, float*
wu, MKL_INT* iblock, MKL_INT* indexw, float* work, MKL_INT* iwork, MKL_INT* dol,
MKL_INT* dou, MKL_INT* info);
void dlarrd2(char* range, char* order, MKL_INT* n, double* vl, double* vu, MKL_INT* il,
MKL_INT* iu, double* gers, double* reltol, double* d, double* e, double* e2, double*
pivmin, MKL_INT* nsplit, MKL_INT* isplit, MKL_INT* m, double* w, double* werr, double*
wl, double* wu, MKL_INT* iblock, MKL_INT* indexw, double* work, MKL_INT* iwork, MKL_INT*
dol, MKL_INT* dou, MKL_INT* info);
Include Files
• mkl_scalapack.h
Description
?larrd2 computes the eigenvalues of a symmetric tridiagonal matrix T to limited initial accuracy. This is an
auxiliary code to be called from ?larre2a.
?larrd2 has been created using the LAPACK code ?larrd, which itself stems from ?stebz. The motivation for
creating ?larrd2 is efficiency: when computing eigenvalues in parallel and the input tridiagonal matrix splits
into blocks, ?larrd2 can skip over blocks which contain none of the eigenvalues from dol to dou for which
the processor is responsible. In extreme cases (such as large matrices consisting of many blocks of small size
like 2x2), the gain can be substantial.
Product and Performance Information
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/
PerformanceIndex.
Notice revision #20201201
Input Parameters
order = 'B': ("By Block") the eigenvalues will be grouped by split-off block (see
iblock, isplit) and ordered from smallest to largest within the block.
= 'E': ("Entire matrix") the eigenvalues for the entire matrix will be ordered
from smallest to largest.
vl, vu If range='V', the lower and upper bounds of the interval to be searched for
eigenvalues. Eigenvalues less than or equal to vl, or greater than vu, will
not be returned. vl < vu.
il, iu If range='I', the indices (in ascending order) of the smallest eigenvalue, to
be returned in w[il-1], and largest eigenvalue, to be returned in w[iu-1].
d Array of size n
dol, dou Specifying an index range dol:dou allows the user to work on only a
selected part of the representation tree.
Otherwise, the setting dol=1, dou=n should be applied.
Note that dol and dou refer to the order in which the eigenvalues are
stored in W.
Output Parameters
w Array of size n
wl, wu The interval (wl, wu] contains all the wanted eigenvalues.
on the spectrum.
If range='I', then wl and wu are computed by SLAEBZ from the
At each row/column j where e[j-1] is zero or small, the matrix T is
considered to split into a block diagonal matrix. On exit, if info = 0,
iblock[i] specifies to which block (from 0 to the number of blocks minus
one) the eigenvalue w[i] belongs. (?larrd2 may use the remaining n-m
elements as workspace.)
The indices of the eigenvalues within each block (submatrix); for example,
indexw[i]= j and iblock[i]=k imply that the (i+1)-th eigenvalue w[i] is the
j-th eigenvalue in block k.
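The splitting of T into diagonal blocks at small off-diagonal entries can be sketched in plain C. The sketch is not the MKL implementation. It assumes a simple absolute threshold tol for "small", whereas the library's actual splitting criterion is not stated here. The name find_splits and the use of int instead of MKL_INT are hypothetical.

```c
#include <math.h>

/* Hypothetical helper, not an MKL routine: records where the tridiagonal
   matrix splits. e[j] (0-based) couples rows j+1 and j+2 (1-based).
   On return, isplit[b] is the 1-based last row of block b, and the return
   value is the number of blocks. isplit must have room for n entries.
   Assumption: an entry is "small" when |e[j]| <= tol. */
int find_splits(int n, const double *e, double tol, int *isplit)
{
    int nsplit = 0;
    for (int j = 0; j < n - 1; ++j) {
        if (fabs(e[j]) <= tol)
            isplit[nsplit++] = j + 1;
    }
    isplit[nsplit++] = n;   /* the last block always ends at row n */
    return nsplit;
}
```

A matrix made of many 2x2 blocks, the extreme case mentioned in the description, produces one split after every second row.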
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
?larre2
Given a tridiagonal matrix, sets small off-diagonal
elements to zero and for each unreduced block, finds
base representations and eigenvalues.
Syntax
void slarre2(char* range, MKL_INT* n, float* vl, float* vu, MKL_INT* il, MKL_INT* iu,
float* d, float* e, float* e2, float* rtol1, float* rtol2, float* spltol, MKL_INT*
nsplit, MKL_INT* isplit, MKL_INT* m, MKL_INT* dol, MKL_INT* dou, float* w, float* werr,
float* wgap, MKL_INT* iblock, MKL_INT* indexw, float* gers, float* pivmin, float* work,
MKL_INT* iwork, MKL_INT* info);
void dlarre2(char* range, MKL_INT* n, double* vl, double* vu, MKL_INT* il, MKL_INT* iu,
double* d, double* e, double* e2, double* rtol1, double* rtol2, double* spltol, MKL_INT*
nsplit, MKL_INT* isplit, MKL_INT* m, MKL_INT* dol, MKL_INT* dou, double* w, double*
werr, double* wgap, MKL_INT* iblock, MKL_INT* indexw, double* gers, double* pivmin,
double* work, MKL_INT* iwork, MKL_INT* info);
Include Files
• mkl_scalapack.h
Description
To find the desired eigenvalues of a given real symmetric tridiagonal matrix T, ?larre2 sets, via ?larra,
"small" off-diagonal elements to zero. For each block Ti, it finds
?larre2 is more suitable for parallel computation than the original LAPACK code for computing the root RRR
and its eigenvalues. When computing eigenvalues in parallel and the input tridiagonal matrix splits into
blocks, ?larre2 can skip over blocks which contain none of the eigenvalues from dol to dou for which the
processor is responsible. In extreme cases (such as large matrices consisting of many blocks of small size,
e.g. 2x2), the gain can be substantial.
Input Parameters
vl, vu If range='V', the lower and upper bounds for the eigenvalues.
Eigenvalues less than or equal to vl, or greater than vu, will not be
returned. vl < vu.
il, iu If range='I', the indices (in ascending order) of the smallest eigenvalue, to
be returned in w[il-1], and largest eigenvalue, to be returned in w[iu-1].
1 ≤il≤iu≤n.
d Array of size n
e Array of size n
The first (n-1) entries contain the subdiagonal elements of the tridiagonal
matrix T; e[n-1] need not be set.
e2 Array of size n
The first (n-1) entries contain the squares of the subdiagonal elements of
the tridiagonal matrix T; e2[n-1] need not be set.
dol, dou Specifying an index range dol:dou allows the user to work on only a
selected part of the representation tree. Otherwise, the setting dol=1,
dou=n should be applied.
Note that dol and dou refer to the order in which the eigenvalues are
stored in w.
Output Parameters
vl, vu On exit, if range='I' or range='A', vl and vu contain bounds on the desired part of the
spectrum.
e e contains the subdiagonal elements of the unit bidiagonal matrices L_i. The
entries e[isplit[i]], 0 ≤ i < nsplit, contain the base points σ_(i+1) on output.
w Array of size n
The first m elements contain the eigenvalues. The eigenvalues of each of the
blocks, L_i D_i L_i^T, are sorted in ascending order (?larre2 may use the
remaining n-m elements as workspace).
Note that immediately after exiting this function, only the eigenvalues in
w with indices in the range dol-1:dou-1 might be reliable on this processor when the
eigenvalue computation is done in parallel.
Note that immediately after exiting this function, only the uncertainties in
werr with indices in the range dol-1:dou-1 might be reliable on this processor when
the eigenvalue computation is done in parallel.
The gap is only with respect to the eigenvalues of the same block as each
block has its own representation tree.
The indices of the eigenvalues within each block (submatrix); for example,
indexw[i]= 10 and iblock[i]=2 imply that the (i+1)-th eigenvalue w[i] is
the 10th eigenvalue in block 2.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
?larre2a
Given a tridiagonal matrix, sets small off-diagonal
elements to zero and for each unreduced block, finds
base representations and eigenvalues.
Syntax
void slarre2a(char* range, MKL_INT* n, float* vl, float* vu, MKL_INT* il, MKL_INT* iu,
float* d, float* e, float* e2, float* rtol1, float* rtol2, float* spltol, MKL_INT*
nsplit, MKL_INT* isplit, MKL_INT* m, MKL_INT* dol, MKL_INT* dou, MKL_INT* needil,
MKL_INT* neediu, float* w, float* werr, float* wgap, MKL_INT* iblock, MKL_INT* indexw,
float* gers, float* sdiam, float* pivmin, float* work, MKL_INT* iwork, float* minrgp,
MKL_INT* info);
void dlarre2a(char* range, MKL_INT* n, double* vl, double* vu, MKL_INT* il, MKL_INT* iu,
double* d, double* e, double* e2, double* rtol1, double* rtol2, double* spltol, MKL_INT*
nsplit, MKL_INT* isplit, MKL_INT* m, MKL_INT* dol, MKL_INT* dou, MKL_INT* needil,
MKL_INT* neediu, double* w, double* werr, double* wgap, MKL_INT* iblock, MKL_INT*
indexw, double* gers, double* sdiam, double* pivmin, double* work, MKL_INT* iwork,
double* minrgp, MKL_INT* info);
Include Files
• mkl_scalapack.h
Description
To find the desired eigenvalues of a given real symmetric tridiagonal matrix T, ?larre2a sets any "small"
off-diagonal elements to zero, and for each unreduced block T_i, it finds
NOTE
The algorithm obtains a crude picture of all the wanted eigenvalues (as selected by range).
However, to reduce work and improve scalability, only the eigenvalues dol to dou are
refined. Furthermore, if the matrix splits into blocks, RRRs for blocks that do not contain
eigenvalues from dol to dou are skipped. The DQDS algorithm (function ?lasq2) is not used,
unlike in the sequential case. Instead, eigenvalues are computed in parallel to some figures
using bisection.
Input Parameters
vl, vu If range='V', the lower and upper bounds for the eigenvalues. Eigenvalues
less than or equal to vl, or greater than vu, will not be returned. vl < vu.
il, iu If range='I', the indices (in ascending order) of the smallest eigenvalue, to
be returned in w[il-1], and largest eigenvalue, to be returned in w[iu-1].
1 ≤il≤iu≤n.
d Array of size n
e Array of size n
The first (n-1) entries contain the subdiagonal elements of the tridiagonal
matrix T; e[n-1] need not be set.
e2 Array of size n
The first (n-1) entries contain the squares of the subdiagonal elements of
the tridiagonal matrix T; e2[n-1] need not be set.
dol, dou To work on only a selected part of the representation tree,
specify an index range dol:dou.
Note that dol and dou refer to the order in which the eigenvalues are
stored in w.
Output Parameters
vl, vu If range='V', the lower and upper bounds for the eigenvalues. Eigenvalues
less than or equal to vl, or greater than vu, are not returned. vl < vu.
e e contains the subdiagonal elements of the unit bidiagonal matrices L_i. The
entries e[isplit[i]], 0 ≤ i < nsplit, contain the base points σ_(i+1) on output.
The first block consists of rows/columns 1 to isplit[0], the second of
rows/columns isplit[0]+1 through isplit[1], etc., and the nsplit-th
block consists of rows/columns isplit[nsplit-2]+1 through
isplit[nsplit-1]=n.
needil, neediu The indices of the leftmost and rightmost eigenvalues of the root node RRR
which are needed to accurately compute the relevant part of the
representation tree.
w Array of size n
The first m elements contain the eigenvalues. The eigenvalues of each of the
blocks, L_i D_i L_i^T, are sorted in ascending order (?larre2a may use the
remaining n-m elements as workspace).
Note that immediately after exiting this function, only the eigenvalues in
w with indices in the range dol-1:dou-1 are reliable on this processor because the
eigenvalue computation is done in parallel.
Note that immediately after exiting this function, only the uncertainties in
werr with indices in the range dol-1:dou-1 are reliable on this processor
because the eigenvalue computation is done in parallel.
The separation from the right neighbor eigenvalue in w. The gap is only with
respect to the eigenvalues of the same block as each block has its own
representation tree.
Exception: at the right end of a block, the left gap is stored.
Note that immediately after exiting this function, only the gaps in wgap with
indices in the range dol-1:dou-1 are reliable on this processor because the
eigenvalue computation is done in parallel.
The indices of the eigenvalues within each block (submatrix); for example,
indexw[i]= 10 and iblock[i]=2 imply that the (i+1)-th eigenvalue w[i] is
the 10th eigenvalue in block 2.
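The block layout encoded in isplit can be decoded with a few lines of plain C. This is only an illustration of the convention stated above (the first block covers rows/columns 1 to isplit[0], the next one isplit[0]+1 through isplit[1], and so on). The name block_range, the 0-based block number b, and the use of int instead of MKL_INT are hypothetical.

```c
/* Hypothetical helper, not an MKL routine: returns the 1-based first and
   last row/column of the block with 0-based number b, given the isplit
   array filled by the routine (isplit[b] is the last row of block b). */
void block_range(const int *isplit, int b, int *first, int *last)
{
    *first = (b == 0) ? 1 : isplit[b - 1] + 1;
    *last = isplit[b];
}
```

For isplit = {2, 5, 6} and nsplit = 3, the three blocks cover rows/columns 1 to 2, 3 to 5, and 6 to 6.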
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
?larrf2
Finds a new relatively robust representation such that
at least one of the eigenvalues is relatively isolated.
Syntax
void slarrf2(MKL_INT* n, float* d, float* l, float* ld, MKL_INT* clstrt, MKL_INT* clend,
MKL_INT* clmid1, MKL_INT* clmid2, float* w, float* wgap, float* werr, MKL_INT* trymid,
float* spdiam, float* clgapl, float* clgapr, float* pivmin, float* sigma, float* dplus,
float* lplus, float* work, MKL_INT* info);
void dlarrf2(MKL_INT* n, double* d, double* l, double* ld, MKL_INT* clstrt, MKL_INT*
clend, MKL_INT* clmid1, MKL_INT* clmid2, double* w, double* wgap, double* werr, MKL_INT*
trymid, double* spdiam, double* clgapl, double* clgapr, double* pivmin, double* sigma,
double* dplus, double* lplus, double* work, MKL_INT* info);
Include Files
• mkl_scalapack.h
Description
Given the initial representation LDL^T and its cluster of close eigenvalues (in a relative measure), defined by
the indices of the first and last eigenvalues in the cluster, ?larrf2 finds a new relatively robust
representation LDL^T - σI = L+ D+ L+^T such that at least one of the eigenvalues of L+ D+ L+^T is relatively
isolated.
This is an enhanced version of ?larrf that also tries shifts in the middle of the cluster, should there be a
large gap, in order to break large clusters into at least two pieces.
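For a given shift σ, the factors of the shifted representation can be computed with the differential stationary qd transform. The plain C sketch below shows only that transform. It is not the MKL implementation, and it leaves out the choice of σ and the element-growth checks that decide whether a representation is acceptable. The name shift_representation and the use of int instead of MKL_INT are hypothetical.

```c
/* Hypothetical helper, not an MKL routine: given L D L^T (d has n entries,
   l holds the n-1 subdiagonal entries of the unit bidiagonal L) and a shift
   sigma, computes dplus (n entries) and lplus (n-1 entries) such that
   L D L^T - sigma*I = L+ D+ L+^T.
   Assumption: no pivot dplus[i] becomes zero. */
void shift_representation(int n, const double *d, const double *l,
                          double sigma, double *dplus, double *lplus)
{
    double s = -sigma;
    for (int i = 0; i < n - 1; ++i) {
        dplus[i] = d[i] + s;
        lplus[i] = (d[i] * l[i]) / dplus[i];
        s = lplus[i] * l[i] * s - sigma;
    }
    dplus[n - 1] = d[n - 1] + s;
}
```

As a check, d = {2, 3}, l = {0.5} represents the matrix with rows (2, 1) and (1, 3.5). With σ = 1 the sketch yields dplus = {1, 1.5} and lplus = {1}, and L+ D+ L+^T has rows (1, 1) and (1, 2.5), which is the original matrix minus the identity.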
Input Parameters
d Array of size n
ld Array of size n-1
clmid1, clmid2 The index of a middle eigenvalue pair with large gap.
spdiam Estimate of the spectral diameter obtained from the Gerschgorin intervals.
Output Parameters
wgap Contains refined values of its input approximations. Very small gaps are
unchanged.
The first (n-1) elements of lplus contain the subdiagonal elements of the
unit bidiagonal matrix L+.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
?larrv2
Computes the eigenvectors of the tridiagonal matrix T
= L*D*L^T given L, D and the eigenvalues of L*D*L^T.
Syntax
void slarrv2(MKL_INT* n, float* vl, float* vu, float* d, float* l, float* pivmin,
MKL_INT* isplit, MKL_INT* m, MKL_INT* dol, MKL_INT* dou, MKL_INT* needil, MKL_INT*
neediu, float* minrgp, float* rtol1, float* rtol2, float* w, float* werr, float* wgap,
MKL_INT* iblock, MKL_INT* indexw, float* gers, float* sdiam, float* z, MKL_INT* ldz,
MKL_INT* isuppz, float* work, MKL_INT* iwork, MKL_INT* vstart, MKL_INT* finish,
MKL_INT* maxcls, MKL_INT* ndepth, MKL_INT* parity, MKL_INT* zoffset, MKL_INT* info);
void dlarrv2(MKL_INT* n, double* vl, double* vu, double* d, double* l, double* pivmin,
MKL_INT* isplit, MKL_INT* m, MKL_INT* dol, MKL_INT* dou, MKL_INT* needil, MKL_INT*
neediu, double* minrgp, double* rtol1, double* rtol2, double* w, double* werr, double*
wgap, MKL_INT* iblock, MKL_INT* indexw, double* gers, double* sdiam, double* z, MKL_INT*
ldz, MKL_INT* isuppz, double* work, MKL_INT* iwork, MKL_INT* vstart, MKL_INT* finish,
MKL_INT* maxcls, MKL_INT* ndepth, MKL_INT* parity, MKL_INT* zoffset, MKL_INT* info);
Include Files
• mkl_scalapack.h
Description
?larrv2 computes the eigenvectors of the tridiagonal matrix T = L*D*L^T given L, D and approximations to the
eigenvalues of L*D*L^T. The input eigenvalues should have been computed by ?larre2a or by previous calls
to ?larrv2.
The major difference between the parallel and the sequential construction of the representation tree is that in
the parallel case, not all eigenvalues of a given cluster might be computed locally. Other processors might
"own" and refine part of an eigenvalue cluster. This is crucial for scalability. Thus there might be
communication necessary before the current level of the representation tree can be parsed.
Please note:
• The calling sequence has two additional integer parameters, dol and dou, that should satisfy
m≥dou≥dol≥1. These parameters are only relevant when both eigenvalues and eigenvectors are computed
(?stegr2b parameter jobz = 'V'). ?larrv2 only computes the eigenvectors corresponding to eigenvalues
dol through dou in w. That is, instead of computing the eigenvectors belonging to w[0] through w[m-1],
only the eigenvectors belonging to eigenvalues w[dol-1] through w[dou-1] are computed. In this case,
only the eigenvalues dol:dou are guaranteed to be accurately refined to all figures by Rayleigh-Quotient
iteration.
• The additional arguments vstart, finish, ndepth, parity, and zoffset are included as a thread-safe
equivalent to save variables. These variables store details about the local representation
tree, which is computed layer by layer. For scalability reasons, eigenvalues belonging to the locally relevant
representation tree might be computed on other processors. These need to be communicated before the
inspection of the RRRs can proceed on any given layer. The computation has ended only when the variable
finish is nonzero. At that point, all eigenpairs between dol and dou have been computed and m is set to
dou - dol + 1.
• ?larrv2 needs more workspace in z than the sequential ?larrv. It is used to store the conformal
embedding of the local representation tree.
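The index convention in the first note can be made concrete with a small sketch. It does not call the library: plain int stands in for MKL_INT, and the helper names (range_is_valid, selected_count, sum_selected) are hypothetical and chosen only for illustration.

```c
#include <assert.h>

/* Returns 1 if the 1-based range dol..dou satisfies m >= dou >= dol >= 1. */
int range_is_valid(int m, int dol, int dou)
{
    return m >= dou && dou >= dol && dol >= 1;
}

/* Number of eigenpairs selected by the 1-based range dol..dou. */
int selected_count(int dol, int dou)
{
    return dou - dol + 1;
}

/* Visits the selected eigenvalues w[dol-1] .. w[dou-1] and returns their sum.
 * The sum is only a stand-in for whatever per-eigenvalue work is needed. */
double sum_selected(const double *w, int m, int dol, int dou)
{
    assert(range_is_valid(m, dol, dou));
    double s = 0.0;
    for (int k = dol; k <= dou; ++k)  /* k is the 1-based eigenvalue index */
        s += w[k - 1];                /* C arrays are 0-based */
    return s;
}
```

With m = 5, dol = 2, and dou = 4, the loop touches w[1], w[2], and w[3], and selected_count returns 3.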
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/
PerformanceIndex.
Notice revision #20201201
Input Parameters
vl, vu Lower and upper bounds of the interval that contains the desired
eigenvalues. vl < vu. Needed to compute gaps on the left or right end of
the extremal eigenvalues in the desired range. vu is currently not used but
is kept as a parameter in case it is needed.
d Array of size n
l Array of size n
isplit The splitting points, at which the matrix T breaks up into blocks. The first
block consists of rows/columns 1 to isplit[ 0 ], the second of rows/
columns isplit[ 0 ] + 1 through isplit[ 1 ], etc.
dol, dou If you want to compute only selected eigenvectors from all the eigenvalues
supplied, you can specify an index range dol:dou. Otherwise, set
dol=1 and dou=m. Note that dol and dou refer to the order
in which the eigenvalues are stored in w. If you want to compute only
selected eigenpairs, the columns dol-1 to dou+1 of the eigenvector space
Z contain the computed eigenvectors. All other columns of Z are set to
zero.
If dol > 1, then Z(:,dol-1-zoffset) is used.
needil, neediu Describe which are the left and right outermost eigenvalues that still need
to be included in the computation. These indices indicate whether
eigenvalues from other processors are needed to correctly compute the
conformally embedded representation tree.
When dol≤needil≤neediu≤dou, all required eigenvalues are local to the
processor and no communication is required to compute its part of the
representation tree.
rtol1, rtol2 Parameters for bisection. An interval [left,right] has converged if right-left <
max( rtol1*gap, rtol2*max(|left|,|right|) )
w Array of size n
werr The first m elements contain the semiwidth of the uncertainty interval of the
corresponding eigenvalue in w.
indexw The indices of the eigenvalues within each block (submatrix). For example:
indexw[i]= 10 and iblock[i]=2 imply that the (i+1)-th eigenvalue w[i] is
the 10th eigenvalue in block 2.
ldz The leading dimension of the array z. ldz≥ 1, and if stegr2b parameter
jobz = 'V', ldz≥ max(1,n).
finish A flag that indicates whether all eigenpairs have been computed.
maxcls The largest cluster worked on by this processor in the representation tree.
ndepth The current depth of the representation tree. Set to zero on initial pass,
changed when the deeper levels of the representation tree are generated.
parity An internal parameter needed for the storage of the clusters on the current
level of the representation tree.
zoffset Offset for storing the eigenpairs when z is distributed in 1D-cyclic fashion.
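The bisection stopping rule given for rtol1 and rtol2 above can be written as a small predicate. This is only a sketch of the stated inequality, not the library's internal code; the function name interval_converged and the helpers abs_val and max_val are hypothetical, and gap is assumed to be the eigenvalue gap supplied by the caller.

```c
#include <assert.h>

static double abs_val(double x) { return x < 0.0 ? -x : x; }
static double max_val(double a, double b) { return a > b ? a : b; }

/* Returns 1 when the interval [left, right] satisfies
 *   right - left < max( rtol1*gap, rtol2*max(|left|, |right|) )
 * and 0 otherwise. */
int interval_converged(double left, double right, double gap,
                       double rtol1, double rtol2)
{
    assert(left <= right);
    double width = right - left;
    double tol = max_val(rtol1 * gap,
                         rtol2 * max_val(abs_val(left), abs_val(right)));
    return width < tol;
}
```

The first term ties the tolerance to the gap between neighboring eigenvalues, while the second term keeps the test meaningful when the gap is tiny relative to the magnitude of the eigenvalue.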
Output Parameters
needil, neediu
w Unshifted eigenvalues for which eigenvectors have already been computed.
wgap Contains refined values of its input approximations. Very small gaps are
changed.
isuppz The support of the eigenvectors in z, i.e., the indices indicating the nonzero
elements in z. The i-th eigenvector is nonzero only in elements
isuppz[ 2*i-2 ] through isuppz[ 2*i-1 ].
finish A flag that indicates whether all eigenpairs have been computed.
maxcls The largest cluster worked on by this processor in the representation tree.
ndepth The current depth of the representation tree. Set to zero on initial pass,
changed when the deeper levels of the representation tree are generated.
parity An internal parameter needed for the storage of the clusters on the current
level of the representation tree.
info =-2: Problem in ?larrf2 when computing the RRR of a child. When a child
is inside a tight cluster, it can be difficult to find an RRR. A partial remedy
from the user's point of view is to make the parameter minrgp smaller and
recompile. However, as the orthogonality of the computed vectors is
proportional to 1/minrgp, be aware that decreasing minrgp might
reduce precision.
=-3: Problem in ?larrb2 when refining a single eigenvalue after the
Rayleigh correction was rejected.
= 5: The Rayleigh Quotient Iteration failed to converge to full accuracy.
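The block bookkeeping described for isplit and iblock under Input Parameters can be decoded as in the following sketch. It assumes, as in the example given for indexw, that block numbers in iblock are 1-based, and it uses plain int in place of MKL_INT. The helper names are hypothetical.

```c
#include <assert.h>

/* Rows/columns (1-based, inclusive) covered by block number b (1-based):
 * block 1 is 1..isplit[0], block 2 is isplit[0]+1..isplit[1], and so on. */
void block_bounds(const int *isplit, int b, int *first, int *last)
{
    assert(b >= 1);
    *first = (b == 1) ? 1 : isplit[b - 2] + 1;
    *last = isplit[b - 1];
}

/* Size of the block that contains the eigenvalue stored in w[i]. */
int block_size_of_eigenvalue(const int *isplit, const int *iblock, int i)
{
    int first, last;
    block_bounds(isplit, iblock[i], &first, &last);
    return last - first + 1;
}
```

For isplit = {3, 7, 10}, block 2 covers rows/columns 4 through 7.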
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
?lasorte
Sorts eigenpairs by real and complex data types.
Syntax
void slasorte (float *s , MKL_INT *lds , MKL_INT *j , float *out , MKL_INT *info );
void dlasorte (double *s , MKL_INT *lds , MKL_INT *j , double *out , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The ?lasorte function sorts eigenpairs so that real eigenpairs are together and complex eigenpairs are
together. This makes it easy to employ 2x2 shifts, since every second subdiagonal is guaranteed to be zero. This
function does no parallel work and makes no calls.
Input Parameters
s (local)
Array of size lds.
On entry, a matrix already in Schur form.
lds (local)
On entry, the leading dimension of the array s; unchanged on exit.
j (local)
On entry, the order of the matrix S; unchanged on exit.
out (local)
Array of size 2*j. The work buffer required by the function.
info (local)
Set, if the input matrix had an odd number of real eigenvalues and things
could not be paired or if the input matrix S was not originally in Schur form.
0 indicates successful completion.
Output Parameters
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
?lasrt2
Sorts numbers in increasing or decreasing order.
Syntax
void slasrt2 (char *id , MKL_INT *n , float *d , MKL_INT *key , MKL_INT *info );
void dlasrt2 (char *id , MKL_INT *n , double *d , MKL_INT *key , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The ?lasrt2 function is a modified version of the LAPACK function ?lasrt, which sorts the numbers in d in
increasing order (if id = 'I') or in decreasing order (if id = 'D'). It uses Quick Sort, reverting to Insertion
Sort on arrays of size ≤ 20. The size of STACK limits n to about 2^32.
Input Parameters
d Array of size n.
On entry, the array to be sorted.
Output Parameters
key On exit, key is permuted in exactly the same manner as d was permuted
from input to output. Therefore, if key[i] = i+1 for all i =0, ..., n-1 on input,
d[i] on output equals d[key[i]-1] on input.
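The relationship between d and key can be illustrated without the library. The sketch below is a plain insertion sort that applies every move to key as well; it is not the Quick Sort used by ?lasrt2, and the name sort_with_key is hypothetical. It only demonstrates the documented property that, if key[i] = i+1 on input, then d[i] on output equals d[key[i]-1] on input.

```c
#include <assert.h>

/* Sorts d[0..n-1] in increasing order (id = 'I') or decreasing order
 * (id = 'D'), permuting key in exactly the same way.
 * Returns 0 on success and -1 if id is neither 'I' nor 'D'. */
int sort_with_key(char id, int n, double *d, int *key)
{
    assert(n >= 0);
    if (id != 'I' && id != 'D')
        return -1;
    for (int i = 1; i < n; ++i) {
        double dv = d[i];
        int kv = key[i];
        int j = i - 1;
        /* Shift entries that are out of order relative to dv. */
        while (j >= 0 && (id == 'I' ? d[j] > dv : d[j] < dv)) {
            d[j + 1] = d[j];
            key[j + 1] = key[j];
            --j;
        }
        d[j + 1] = dv;
        key[j + 1] = kv;
    }
    return 0;
}
```

Keeping a copy of the unsorted input makes it possible to check the property afterwards by comparing d[i] with the copy at position key[i]-1.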
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
?stegr2
Computes selected eigenvalues and eigenvectors of a
real symmetric tridiagonal matrix.
Syntax
void sstegr2(char* jobz, char* range, MKL_INT* n, float* d, float* e, float* vl, float*
vu, MKL_INT* il, MKL_INT* iu, MKL_INT* m, float* w, float* z, MKL_INT* ldz, MKL_INT*
nzc, MKL_INT* isuppz, float* work, MKL_INT* lwork, MKL_INT* iwork, MKL_INT* liwork,
MKL_INT* dol, MKL_INT* dou, MKL_INT* zoffset, MKL_INT* info);
void dstegr2(char* jobz, char* range, MKL_INT* n, double* d, double* e, double* vl,
double* vu, MKL_INT* il, MKL_INT* iu, MKL_INT* m, double* w, double* z, MKL_INT* ldz,
MKL_INT* nzc, MKL_INT* isuppz, double* work, MKL_INT* lwork, MKL_INT* iwork, MKL_INT*
liwork, MKL_INT* dol, MKL_INT* dou, MKL_INT* zoffset, MKL_INT* info);
Include Files
• mkl_scalapack.h
Description
?stegr2 computes selected eigenvalues and, optionally, eigenvectors of a real symmetric tridiagonal matrix
T. It is invoked in the ScaLAPACK MRRR driver p?syevr and the corresponding Hermitian version either when
only eigenvalues are to be computed, or when only a single processor is used (the sequential-like case).
?stegr2 has been adapted from LAPACK's ?stegr. Please note the following crucial changes.
1. The calling sequence has two additional integer parameters, dol and dou, that should satisfy
m≥dou≥dol≥1. ?stegr2 only computes the eigenpairs corresponding to eigenvalues dol through dou in
w, indexed dol-1 through dou-1. That is, instead of computing the eigenpairs belonging to w[0]
through w[m-1], only the eigenvectors belonging to eigenvalues w[dol-1] through w[dou-1] are
computed. In this case, only the eigenvalues dol through dou are guaranteed to be fully accurate.
2. m is not the number of eigenvalues specified by range, but is m = dou - dol + 1. This concerns the
case where only eigenvalues are computed, but on more than one processor. Thus, in this case m refers
to the number of eigenvalues computed on this processor.
3. The arrays w and z might not contain all the wanted eigenpairs locally; instead, this information is
distributed over other processors.
Input Parameters
= 'I': eigenvalues of the entire matrix with the indices in a given range will
be found.
n The order of the matrix. n≥ 0.
d Array of size n
e Array of size n
vl, vu If range='V', the lower and upper bounds of the interval to be searched for
eigenvalues. vl < vu.
il, iu If range='I', the indices (in ascending order) of the smallest eigenvalue, to
be returned in w[il-1], and largest eigenvalue, to be returned in w[iu-1].
1 ≤il≤iu≤n, if n > 0.
ldz The leading dimension of the array z. ldz≥ 1, and if jobz = 'V', then ldz≥
max(1,n).
nzc The number of eigenvectors to be held in the array z, storing the matrix Z.
If nzc = -1, then a workspace query is assumed; the function calculates the
number of columns of the matrix Z that are needed to hold the
eigenvectors. This value is returned as the first entry of the z array, and no
error message related to nzc is issued.
lwork The size of the array work. lwork≥ max(1,18*n)
if jobz = 'V', and lwork≥ max(1,12*n) if jobz = 'N'. If lwork = -1, then a
workspace query is assumed; the function only calculates the optimal size
of the work array, returns this value as the first entry of the work array,
and no error message related to lwork is issued.
liwork The size of the array iwork. liwork≥ max(1,10*n) if the eigenvectors are
desired, and liwork≥ max(1,8*n) if only the eigenvalues are to be
computed.
If liwork = -1, then a workspace query is assumed; the function only
calculates the optimal size of the iwork array, returns this value as the first
entry of the iwork array, and no error message related to liwork is issued.
dol, dou From the eigenvalues w[0] through w[m-1], only eigenvectors Z(:,dol) to
Z(:,dou) are computed.
zoffset Offset for storing the eigenpairs when z is distributed in 1D-cyclic fashion.
Output Parameters
w Array of size n
z If jobz = 'V', and if info = 0, then the first m columns of the matrix Z
stored in z contain some of the orthonormal eigenvectors of the matrix T
corresponding to the selected eigenvalues, with the i-th column of Z holding
the eigenvector associated with w[i-1].
Note: the user must ensure that at least max(1,m) columns of the matrix
are supplied in the array z; if range = 'V', the exact value of m is not known
in advance and can be computed with a workspace query by setting nzc =
-1 (see the description of nzc).
isuppz The support of the eigenvectors in z, i.e., the indices indicating the nonzero
elements in z. The i-th computed eigenvector is nonzero only in elements
isuppz[ 2*i-2 ] through isuppz[ 2*i-1 ]. This is relevant in the case when
the matrix is split. isuppz is only set if n>2.
work On exit, if info = 0, work[0] returns the optimal (and minimal) lwork.
info = 0: successful exit
other: if info = -i, the i-th argument had an illegal value
Here, the digit X = ABS( iinfo ) < 10, where iinfo is the nonzero error
code returned by ?larre2 or ?larrv, respectively.
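The support information in isuppz lets a caller skip the zero entries of an eigenvector. The sketch below shows the indexing only. It assumes that the entries of isuppz are 1-based row indices (as in the underlying Fortran interface) and that z is stored column by column with leading dimension ldz. The function name support_sum_of_squares is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of squares of the i-th eigenvector (i is 1-based), visiting only the
 * elements isuppz[2*i-2] through isuppz[2*i-1] of column i of Z. */
double support_sum_of_squares(const double *z, int ldz,
                              const int *isuppz, int i)
{
    assert(i >= 1 && ldz >= 1);
    int first = isuppz[2 * i - 2];
    int last = isuppz[2 * i - 1];
    const double *col = z + (size_t)(i - 1) * (size_t)ldz;
    double s = 0.0;
    for (int row = first; row <= last; ++row)  /* row is 1-based */
        s += col[row - 1] * col[row - 1];
    return s;
}
```

For an orthonormal eigenvector the result should be close to 1; the test data used to check the sketch are not normalized and only exercise the index arithmetic.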
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
?stegr2a
Computes selected eigenvalues and initial
representations needed for eigenvector computations.
Syntax
void sstegr2a(char* jobz, char* range, MKL_INT* n, float* d, float* e, float* vl, float*
vu, MKL_INT* il, MKL_INT* iu, MKL_INT* m, float* w, float* z, MKL_INT* ldz, MKL_INT*
nzc, float* work, MKL_INT* lwork, MKL_INT* iwork, MKL_INT* liwork, MKL_INT* dol,
MKL_INT* dou, MKL_INT* needil, MKL_INT* neediu, MKL_INT* inderr, MKL_INT* nsplit,
float* pivmin, float* scale, float* wl, float* wu, MKL_INT* info);
void dstegr2a(char* jobz, char* range, MKL_INT* n, double* d, double* e, double* vl,
double* vu, MKL_INT* il, MKL_INT* iu, MKL_INT* m, double* w, double* z, MKL_INT* ldz,
MKL_INT* nzc, double* work, MKL_INT* lwork, MKL_INT* iwork, MKL_INT* liwork, MKL_INT*
dol, MKL_INT* dou, MKL_INT* needil, MKL_INT* neediu, MKL_INT* inderr, MKL_INT* nsplit,
double* pivmin, double* scale, double* wl, double* wu, MKL_INT* info);
Include Files
• mkl_scalapack.h
Description
?stegr2a computes selected eigenvalues and initial representations needed for eigenvector computations
in ?stegr2b. It is invoked in the ScaLAPACK MRRR driver p?syevr and the corresponding Hermitian version
when both eigenvalues and eigenvectors are computed in parallel on multiple processors. For this
case, ?stegr2a implements the first part of the MRRR algorithm, parallel eigenvalue computation and finding
the root RRR. At the end of ?stegr2a, other processors might have a part of the spectrum that is needed to
continue the computation locally. Once this eigenvalue information has been received by the processor, the
computation can then proceed by calling the second part of the parallel MRRR algorithm, ?stegr2b.
Please note:
• Compared to LAPACK's ?stegr, the calling sequence has two additional integer parameters, dol and dou,
that should satisfy m≥dou≥dol≥1. These parameters are only relevant for the case jobz = 'V'.
Globally invoked over all processors, ?stegr2a computes all the eigenvalues specified by range.
Locally, ?stegr2a only computes the eigenvalues dol through dou in w,
indexed dol-1 through dou-1. That is, instead of computing the eigenvalues w[0] through
w[m-1], only the eigenvalues w[dol-1] through w[dou-1] are computed. In this
case, only the eigenvalues dol through dou are guaranteed to be fully accurate.
• m is not the number of eigenvalues specified by range. Instead, m = dou - dol + 1 is
the number of eigenvalues computed on this processor.
• No eigenvectors are computed in ?stegr2a itself (this is done later in ?stegr2b). However,
if jobz = 'V' then, depending on range and dol, dou, ?stegr2a might need more workspace in z than
the original ?stegr. In particular, the arrays w and z might not contain all the wanted eigenpairs locally;
instead, this information is distributed over other processors.
Input Parameters
= 'I': eigenvalues of the entire matrix with the indices in a given range will
be found.
d Array of size n
e Array of size n
vl, vu If range='V', the lower and upper bounds of the interval to be searched for
eigenvalues. vl < vu.
il, iu If range='I', the indices (in ascending order) of the smallest eigenvalue, to
be returned in w[il-1], and largest eigenvalue, to be returned in w[iu-1]. 1
≤il≤iu≤n, if n > 0.
ldz The leading dimension of the array z. ldz≥ 1, and if jobz = 'V', then ldz≥
max(1,n).
If nzc = -1, then a workspace query is assumed; the function calculates the
number of columns of the matrix stored in array z that are needed to hold
the eigenvectors. This value is returned as the first entry of the z array, and
no error message related to nzc is issued.
lwork The size of the array work. lwork≥ max(1,18*n) if jobz = 'V', and lwork≥
max(1,12*n) if jobz = 'N'.
liwork The size of the array iwork. liwork≥ max(1,10*n) if the eigenvectors are
desired, and liwork≥ max(1,8*n) if only the eigenvalues are to be
computed.
If liwork = -1, then a workspace query is assumed; the function only
calculates the optimal size of the iwork array, returns this value as the first
entry of the iwork array, and no error message related to liwork is issued.
dol, dou From all the eigenvalues w[0] through w[m-1], only eigenvalues w[dol-1]
through w[dou-1] are computed.
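The minimal workspace sizes stated for lwork and liwork can be computed up front, as in the sketch below. It treats "eigenvectors are desired" as jobz = 'V', which is an assumption, and the helper names min_lwork and min_liwork are hypothetical. A workspace query with lwork = -1 or liwork = -1, as described above, remains the way to obtain the optimal sizes from the library itself.

```c
#include <assert.h>

/* Minimal lwork: max(1, 18*n) if jobz = 'V', max(1, 12*n) if jobz = 'N'. */
int min_lwork(char jobz, int n)
{
    assert(n >= 0);
    int size = (jobz == 'V' ? 18 : 12) * n;
    return size > 1 ? size : 1;
}

/* Minimal liwork: max(1, 10*n) with eigenvectors, max(1, 8*n) without. */
int min_liwork(char jobz, int n)
{
    assert(n >= 0);
    int size = (jobz == 'V' ? 10 : 8) * n;
    return size > 1 ? size : 1;
}
```

For n = 100 and jobz = 'V' this gives 1800 elements for work and 1000 for iwork; for n = 0 both sizes are 1.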
Output Parameters
w Array of size n
work On exit, if info = 0, work[0] returns the optimal (and minimal) lwork.
needil, neediu The indices of the leftmost and rightmost eigenvalues needed to accurately
compute the relevant part of the representation tree. This information can
be used to find out which processors have the relevant eigenvalue
information needed so that it can be communicated.
inderr inderr points to the place in the work space where the eigenvalue
uncertainties (errors) are stored.
wl, wu The interval (wl, wu] contains all the wanted eigenvalues.
Here, the digit x = abs( iinfo ) < 10, where iinfo is the nonzero error
code returned by ?larre2a.
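The needil and neediu outputs can be compared with dol and dou to decide whether eigenvalue data must be fetched from other processors. The ?larrv2 description states that no communication is required when dol≤needil≤neediu≤dou. The sketch below encodes that test. The two counting helpers go one step further and assume that every index outside dol..dou is owned by another processor, which is an interpretation rather than something the reference states. All names are hypothetical.

```c
#include <assert.h>

/* Returns 1 when dol <= needil <= neediu <= dou, that is, when all required
 * eigenvalues are local and no communication is needed. */
int all_needed_eigenvalues_local(int dol, int dou, int needil, int neediu)
{
    assert(dol <= dou);
    return dol <= needil && needil <= neediu && neediu <= dou;
}

/* Assumed count of required eigenvalues to the left of the local range. */
int missing_on_left(int dol, int needil)
{
    return needil < dol ? dol - needil : 0;
}

/* Assumed count of required eigenvalues to the right of the local range. */
int missing_on_right(int dou, int neediu)
{
    return neediu > dou ? neediu - dou : 0;
}
```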
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
?stegr2b
From eigenvalues and initial representations computes
the selected eigenvalues and eigenvectors of the real
symmetric tridiagonal matrix in parallel on multiple
processors.
Syntax
void sstegr2b(char* jobz, MKL_INT* n, float* d, float* e, MKL_INT* m, float* w, float*
z, MKL_INT* ldz, MKL_INT* nzc, MKL_INT* isuppz, float* work, MKL_INT* lwork, MKL_INT*
iwork, MKL_INT* liwork, MKL_INT* dol, MKL_INT* dou, MKL_INT* needil, MKL_INT* neediu,
MKL_INT* indwlc, float* pivmin, float* scale, float* wl, float* wu, MKL_INT* vstart,
MKL_INT* finish, MKL_INT* maxcls, MKL_INT* ndepth, MKL_INT* parity, MKL_INT* zoffset,
MKL_INT* info);
void dstegr2b(char* jobz, MKL_INT* n, double* d, double* e, MKL_INT* m, double* w,
double* z, MKL_INT* ldz, MKL_INT* nzc, MKL_INT* isuppz, double* work, MKL_INT* lwork,
MKL_INT* iwork, MKL_INT* liwork, MKL_INT* dol, MKL_INT* dou, MKL_INT* needil, MKL_INT*
neediu, MKL_INT* indwlc, double* pivmin, double* scale, double* wl, double* wu, MKL_INT*
vstart, MKL_INT* finish, MKL_INT* maxcls, MKL_INT* ndepth, MKL_INT* parity, MKL_INT*
zoffset, MKL_INT* info);
Include Files
• mkl_scalapack.h
Description
?stegr2b should only be called after a call to ?stegr2a. From eigenvalues and initial representations
computed by ?stegr2a, ?stegr2b computes the selected eigenvalues and eigenvectors of the real
symmetric tridiagonal matrix in parallel on multiple processors. It is potentially invoked multiple times on a
given processor because the locally relevant representation tree might depend on spectral information that is
"owned" by other processors and might need to be communicated.
Please note:
• The calling sequence has two additional integer parameters, dol and dou, that should satisfy
m≥dou≥dol≥1. These parameters are only relevant for the case jobz = 'V'. ?stegr2b only computes the
eigenvectors corresponding to eigenvalues dol through dou in w, indexed dol-1 through dou-1. That is,
instead of computing the eigenvectors belonging to w[0] through w[m-1], only the eigenvectors belonging
to eigenvalues w[dol-1] through w[dou-1] are computed. In this case, only the eigenvalues dol through
dou are guaranteed to be accurately refined to all figures by Rayleigh-Quotient iteration.
• The additional arguments vstart, finish, ndepth, parity, and zoffset are included as a thread-safe
equivalent to save variables. These variables store details about the local representation
tree, which is computed layer by layer. For scalability reasons, eigenvalues belonging to the locally relevant
representation tree might be computed on other processors. These need to be communicated before the
inspection of the RRRs can proceed on any given layer. The computation has ended only when the variable
finish is nonzero. At that point, all eigenpairs between dol and dou have been computed and m is set to
dou - dol + 1.
• ?stegr2b needs more workspace in z than the sequential ?stegr. It is used to store the conformal
embedding of the local representation tree.
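Because ?stegr2b may be invoked several times on one processor, the calling code is naturally a loop that runs until finish becomes nonzero. The following sketch shows only that control flow. It does not call ?stegr2b: mock_step is a hypothetical stand-in that pretends each call processes one layer of the representation tree, and max_depth is an invented parameter that controls how many layers the mock has.

```c
#include <assert.h>

/* Mock of the state carried between calls (a subset of the real arguments). */
typedef struct {
    int finish;  /* nonzero once all eigenpairs have been computed */
    int ndepth;  /* current depth of the representation tree       */
} tree_state;

/* Hypothetical stand-in for one call: advances one layer, or sets finish
 * when the deepest layer has already been reached. */
void mock_step(tree_state *s, int max_depth)
{
    if (s->ndepth >= max_depth)
        s->finish = 1;
    else
        s->ndepth += 1;
}

/* Calls the step function until finish is nonzero; returns the call count. */
int run_until_finished(int max_depth)
{
    assert(max_depth >= 0);
    tree_state s = { 0, 0 };  /* ndepth is zero on the initial pass */
    int calls = 0;
    while (!s.finish) {
        /* A real program would exchange eigenvalue data between
         * processors here before inspecting the next layer. */
        mock_step(&s, max_depth);
        ++calls;
    }
    return calls;
}
```

The point of the pattern is that the state variables are owned by the caller and passed back in unchanged on every call, which is what makes the scheme thread-safe compared with saved local variables.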
Input Parameters
d Array of size n
e Array of size n
w Array of size n
ldz The leading dimension of the array z. ldz≥ 1, and if jobz = 'V', then ldz≥
max(1,n).
nzc The number of eigenvectors to be held in the array z, storing the matrix Z.
liwork The size of the array iwork. liwork≥ max(1,10*n) if the eigenvectors are
desired, and liwork≥ max(1,8*n) if only the eigenvalues are to be
computed.
If liwork = -1, then a workspace query is assumed; the function only
calculates the optimal size of the iwork array, returns this value as the first
entry of the iwork array, and no error message related to liwork is issued.
dol, dou From the eigenvalues w[0] through w[m-1], only eigenvectors Z(:,dol) to
Z(:,dou) are computed.
needil, neediu Describes which are the left and right outermost eigenvalues still to be
computed. Initially computed by ?larre2a, modified in the course of the
algorithm.
scale The scaling factor for T. Used for unscaling the eigenvalues at the very end
of the algorithm.
wl, wu The interval (wl, wu] contains all the wanted eigenvalues.
maxcls The largest cluster worked on by this processor in the representation tree.
ndepth The current depth of the representation tree. Set to zero on initial pass,
changed when the deeper levels of the representation tree are generated.
parity An internal parameter needed for the storage of the clusters on the current
level of the representation tree.
zoffset Offset for storing the eigenpairs when z is distributed in 1D-cyclic fashion.
Output Parameters
z If jobz = 'V', and if info = 0, then a subset of the first m columns of the
matrix Z, stored in z, contain the orthonormal eigenvectors of the matrix T
corresponding to the selected eigenvalues, with the i-th column of Z holding
the eigenvector associated with w[i-1].
isuppz The support of the eigenvectors in z, i.e., the indices indicating the nonzero
elements in z. The i-th computed eigenvector is nonzero only in elements
isuppz[ 2*i-2 ] through isuppz[ 2*i-1 ]. This is relevant in the case when
the matrix is split. isuppz is only set if n>2.
work On exit, if info = 0, work[0] returns the optimal (and minimal) lwork.
indwlc Pointer into the workspace location where the local eigenvalue
representations are stored. ("Local eigenvalues" are those relative to the
individual shifts of the RRRs.)
maxcls The largest cluster worked on by this processor in the representation tree.
ndepth The current depth of the representation tree. Set to zero on initial pass,
changed when the deeper levels of the representation tree are generated.
parity An internal parameter needed for the storage of the clusters on the current
level of the representation tree.
info = 0: successful exit
other: if info = -i, the i-th argument had an illegal value
Here, the digit x = abs( iinfo ) < 10, where iinfo is the nonzero error
code returned by ?larrv2.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
?stein2
Computes the eigenvectors corresponding to specified
eigenvalues of a real symmetric tridiagonal matrix,
using inverse iteration.
Syntax
void sstein2 (MKL_INT *n , float *d , float *e , MKL_INT *m , float *w , MKL_INT
*iblock , MKL_INT *isplit , float *orfac , float *z , MKL_INT *ldz , float *work ,
MKL_INT *iwork , MKL_INT *ifail , MKL_INT *info );
void dstein2 (MKL_INT *n , double *d , double *e , MKL_INT *m , double *w , MKL_INT
*iblock , MKL_INT *isplit , double *orfac , double *z , MKL_INT *ldz , double *work ,
MKL_INT *iwork , MKL_INT *ifail , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The ?stein2 function is a modified version of the LAPACK function ?stein. It computes the eigenvectors of a real
symmetric tridiagonal matrix T corresponding to specified eigenvalues, using inverse iteration.
The maximum number of iterations allowed for each eigenvector is specified by an internal parameter maxits
(currently set to 5).
Input Parameters
d, e , w Arrays:
d, of size n. The n diagonal elements of the tridiagonal matrix T.
e, of size n.
The (n-1) subdiagonal elements of the tridiagonal matrix T, in elements e[0]
through e[n-2]. e[n-1] need not be set.
w, of size n.
The first m elements of w contain the eigenvalues for which eigenvectors are
to be computed. The eigenvalues should be grouped by split-off block and
ordered from smallest to largest within the block. (The output array w
from ?stebz with ORDER = 'B' is expected here).
ldz The leading dimension of the output array z; ldz ≥ max(1, n).
Output Parameters
ifail On normal exit, all elements of ifail are zero. If one or more eigenvectors
fail to converge after maxits iterations, then their indices are stored in the
array ifail.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
?dbtf2
Computes an LU factorization of a general band matrix
with no pivoting (local unblocked algorithm).
Syntax
void sdbtf2 (MKL_INT *m , MKL_INT *n , MKL_INT *kl , MKL_INT *ku , float *ab , MKL_INT
*ldab , MKL_INT *info );
void ddbtf2 (MKL_INT *m , MKL_INT *n , MKL_INT *kl , MKL_INT *ku , double *ab , MKL_INT
*ldab , MKL_INT *info );
void cdbtf2 (MKL_INT *m , MKL_INT *n , MKL_INT *kl , MKL_INT *ku , MKL_Complex8 *ab ,
MKL_INT *ldab , MKL_INT *info );
void zdbtf2 (MKL_INT *m , MKL_INT *n , MKL_INT *kl , MKL_INT *ku , MKL_Complex16 *ab ,
MKL_INT *ldab , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The ?dbtf2 function computes an LU factorization of a general real/complex m-by-n band matrix A without
using partial pivoting or row interchanges.
This is the unblocked version of the algorithm, calling BLAS Routines and Functions.
Input Parameters
Output Parameters
Application Notes
The band storage scheme is illustrated by the following example, when m = n = 6, kl = 2, ku = 1:
The function does not use array elements marked *; elements marked + need not be set on entry, but the
function requires them to store elements of U, because of fill-in resulting from the row interchanges.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
?dbtrf
Computes an LU factorization of a general band matrix
with no pivoting (local blocked algorithm).
Syntax
void sdbtrf (MKL_INT *m , MKL_INT *n , MKL_INT *kl , MKL_INT *ku , float *ab , MKL_INT
*ldab , MKL_INT *info );
void ddbtrf (MKL_INT *m , MKL_INT *n , MKL_INT *kl , MKL_INT *ku , double *ab , MKL_INT
*ldab , MKL_INT *info );
void cdbtrf (MKL_INT *m , MKL_INT *n , MKL_INT *kl , MKL_INT *ku , MKL_Complex8 *ab ,
MKL_INT *ldab , MKL_INT *info );
void zdbtrf (MKL_INT *m , MKL_INT *n , MKL_INT *kl , MKL_INT *ku , MKL_Complex16 *ab ,
MKL_INT *ldab , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
This function computes an LU factorization of a real or complex m-by-n band matrix A without using partial
pivoting or row interchanges.
This is the blocked version of the algorithm, calling BLAS Routines and Functions.
Input Parameters
n The number of columns in A (n ≥ 0).
Output Parameters
Application Notes
The band storage scheme is illustrated by the following example, when m = n = 6, kl = 2, ku = 1:
?dttrf
Computes an LU factorization of a general tridiagonal
matrix with no pivoting (local blocked algorithm).
Syntax
void sdttrf (MKL_INT *n , float *dl , float *d , float *du , MKL_INT *info );
void ddttrf (MKL_INT *n , double *dl , double *d , double *du , MKL_INT *info );
void cdttrf (MKL_INT *n , MKL_Complex8 *dl , MKL_Complex8 *d , MKL_Complex8 *du ,
MKL_INT *info );
void zdttrf (MKL_INT *n , MKL_Complex16 *dl , MKL_Complex16 *d , MKL_Complex16 *du ,
MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The ?dttrf function computes an LU factorization of a real or complex tridiagonal matrix A using elimination
without partial pivoting.
The factorization has the form A = L*U, where L is a product of unit lower bidiagonal matrices and U is upper
triangular with nonzeros only in the main diagonal and first superdiagonal.
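The factorization A = L*U of a tridiagonal matrix without pivoting can be sketched in a few lines of plain C. This is the textbook elimination, shown for real double precision data only, and is not taken from the library's implementation. The function name tridiag_lu_nopiv is hypothetical. Unlike the library function, which per the info description completes the factorization even when U is singular, the sketch stops at the first zero pivot.

```c
#include <assert.h>

/* Factors a tridiagonal matrix A = L*U without pivoting.
 *   dl[0..n-2]: subdiagonal of A on entry, multipliers defining L on exit
 *   d [0..n-1]: diagonal of A on entry, diagonal of U on exit
 *   du[0..n-2]: superdiagonal of A, which is also the superdiagonal of U
 * Returns 0 on success, or i (1-based) if U(i,i) is exactly zero. */
int tridiag_lu_nopiv(int n, double *dl, double *d, const double *du)
{
    assert(n >= 0);
    for (int i = 0; i + 1 < n; ++i) {
        if (d[i] == 0.0)
            return i + 1;          /* zero pivot: the sketch stops here */
        double m = dl[i] / d[i];   /* multiplier that eliminates dl[i]  */
        dl[i] = m;
        d[i + 1] -= m * du[i];     /* update the next diagonal entry    */
    }
    if (n > 0 && d[n - 1] == 0.0)
        return n;
    return 0;
}
```

For the 3-by-3 matrix with diagonal (2, 2, 2) and off-diagonals equal to 1, the multipliers are 1/2 and 2/3, and the diagonal of U is (2, 3/2, 4/3).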
Input Parameters
Output Parameters
dl Overwritten by the (n-1) multipliers that define the matrix L from the LU
factorization of A.
info > 0: if info = i, the matrix element U(i,i) is exactly 0. The factorization has
been completed, but the factor U is exactly singular. Division by 0 will occur
if you use the factor U for solving a system of linear equations.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
?dttrsv
Solves a general tridiagonal system of linear equations
using the LU factorization computed by ?dttrf.
Syntax
void sdttrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *nrhs , float *dl , float
*d , float *du , float *b , MKL_INT *ldb , MKL_INT *info );
void ddttrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *nrhs , double *dl ,
double *d , double *du , double *b , MKL_INT *ldb , MKL_INT *info );
void cdttrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *nrhs , MKL_Complex8
*dl , MKL_Complex8 *d , MKL_Complex8 *du , MKL_Complex8 *b , MKL_INT *ldb , MKL_INT
*info );
void zdttrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *nrhs , MKL_Complex16
*dl , MKL_Complex16 *d , MKL_Complex16 *du , MKL_Complex16 *b , MKL_INT *ldb , MKL_INT
*info );
Include Files
• mkl_scalapack.h
Description
The ?dttrsv function solves one of the following systems of linear equations:
Input Parameters
nrhs The number of right-hand sides, that is, the number of columns in the
matrix B (nrhs ≥ 0).
dl,d,du,b The array dl of size (n - 1) contains the (n - 1) multipliers that define the
matrix L from the LU factorization of A.
The array d of size n contains n diagonal elements of the upper triangular
matrix U from the LU factorization of A.
The array du of size (n - 1) contains the (n - 1) elements of the first
superdiagonal of U.
On entry, the array b of size ldb * nrhs contains the right-hand side
matrix B.
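As a rough illustration of how the arrays dl, d, and du are used, the following plain-C sketch performs the two no-transpose solves for a single right-hand side: one with the unit lower bidiagonal factor L and one with the upper bidiagonal factor U. The function names are hypothetical, and the way the routine's uplo and trans arguments select among the systems is not reproduced here.

```c
#include <stddef.h>

/* Hypothetical helper: solves L*y = b in place, where L is unit lower
 * bidiagonal with the n-1 multipliers dl on its subdiagonal. */
void solve_unit_lower_bidiag(size_t n, const double *dl, double *b)
{
    for (size_t i = 1; i < n; ++i) {
        b[i] -= dl[i - 1] * b[i - 1];   /* forward substitution */
    }
}

/* Hypothetical helper: solves U*x = b in place, where U is upper
 * bidiagonal with diagonal d (n entries) and superdiagonal du (n-1).
 * Assumes every d[i] is nonzero. */
void solve_upper_bidiag(size_t n, const double *d, const double *du, double *b)
{
    if (n == 0) {
        return;
    }
    b[n - 1] /= d[n - 1];
    for (size_t i = n - 1; i-- > 0; ) {
        b[i] = (b[i] - du[i] * b[i + 1]) / d[i];   /* back substitution */
    }
}
```

Applying the L solve and then the U solve to a right-hand side b yields the solution of A*x = b for A = L*U.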
Output Parameters
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
?pttrsv
Solves a symmetric (Hermitian) positive-definite
tridiagonal system of linear equations, using the
L*D*L^H factorization computed by ?pttrf.
Syntax
void spttrsv (char *trans , MKL_INT *n , MKL_INT *nrhs , float *d , float *e , float
*b , MKL_INT *ldb , MKL_INT *info );
void dpttrsv (char *trans , MKL_INT *n , MKL_INT *nrhs , double *d , double *e , double
*b , MKL_INT *ldb , MKL_INT *info );
void cpttrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *nrhs , float *d ,
MKL_Complex8 *e , MKL_Complex8 *b , MKL_INT *ldb , MKL_INT *info );
void zpttrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *nrhs , double *d ,
MKL_Complex16 *e , MKL_Complex16 *b , MKL_INT *ldb , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The ?pttrsv function solves one of the triangular systems:
Input Parameters
trans Specifies the form of the system of equations:
for real flavors:
if trans = 'N': L*X = B (no transpose)
nrhs The number of right-hand sides, that is, the number of columns of the
matrix B (nrhs ≥ 0).
d Array of size n. The n diagonal elements of the diagonal matrix D from the
factorization computed by ?pttrf.
e Array of size (n-1). The (n-1) off-diagonal elements of the unit bidiagonal
factor U or L from the factorization computed by ?pttrf. See uplo.
Output Parameters
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
?steqr2
Computes all eigenvalues and, optionally,
eigenvectors of a symmetric tridiagonal matrix using
the implicit QL or QR method.
Syntax
void ssteqr2 (char *compz , MKL_INT *n , float *d , float *e , float *z , MKL_INT *ldz ,
MKL_INT *nr , float *work , MKL_INT *info );
void dsteqr2 (char *compz , MKL_INT *n , double *d , double *e , double *z , MKL_INT
*ldz , MKL_INT *nr , double *work , MKL_INT *info );
Include Files
• mkl_scalapack.h
Description
The ?steqr2 function is a modified version of the LAPACK function ?steqr. It computes all
eigenvalues and, optionally, eigenvectors of a symmetric tridiagonal matrix using the implicit QL or QR
method. The modification allows each ScaLAPACK process running ?steqr2 to perform
updates on a distributed matrix Q. For proper usage of ?steqr2, examine the ScaLAPACK
function p?syev.
Input Parameters
d, e, work Arrays:
d contains the diagonal elements of T. The size of d must be at least
max(1, n).
e contains the (n-1) subdiagonal elements of T. The size of e must be at
least max(1, n-1).
z (local)
Array of global size n * n and of local size ldz * nr.
ldz ≥ 1,
ldz ≥ max(1, n), if eigenvectors are desired.
Output Parameters
if compz = 'V', z contains the orthonormal eigenvectors of the original
symmetric matrix, and if compz = 'I', z contains the orthonormal
eigenvectors of the symmetric tridiagonal matrix. If compz = 'N', then z is
not referenced.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
?trmvt
Performs matrix-vector operations.
Syntax
void strmvt (const char* uplo, const MKL_INT* n, const float* t, const MKL_INT* ldt,
float* x, const MKL_INT* incx, const float* y, const MKL_INT* incy, float* w, const
MKL_INT* incw, const float* z, const MKL_INT* incz);
void dtrmvt (const char* uplo, const MKL_INT* n, const double* t, const MKL_INT* ldt,
double* x, const MKL_INT* incx, const double* y, const MKL_INT* incy, double* w, const
MKL_INT* incw, const double* z, const MKL_INT* incz);
void ctrmvt (const char* uplo, const MKL_INT* n, const MKL_Complex8* t, const MKL_INT*
ldt, MKL_Complex8* x, const MKL_INT* incx, const MKL_Complex8* y, const MKL_INT* incy,
MKL_Complex8* w, const MKL_INT* incw, const MKL_Complex8* z, const MKL_INT* incz);
void ztrmvt (const char* uplo, const MKL_INT* n, const MKL_Complex16* t, const MKL_INT*
ldt, MKL_Complex16* x, const MKL_INT* incx, const MKL_Complex16* y, const MKL_INT*
incy, MKL_Complex16* w, const MKL_INT* incw, const MKL_Complex16* z, const MKL_INT*
incz);
Include Files
• mkl_scalapack.h
Description
?trmvt performs the matrix-vector operations as follows:
strmvt and dtrmvt: x := T' *y, and w := T *z
Input Parameters
uplo On entry, uplo specifies whether the matrix is an upper or lower triangular
matrix as follows:
Unchanged on exit.
Before entry with uplo = 'U' or 'u', the leading n-by-n upper triangular part
of the array t must contain the upper triangular matrix and the strictly
lower triangular part of t is not referenced.
Before entry with uplo = 'L' or 'l', the leading n-by-n lower triangular part
of the array t must contain the lower triangular matrix and the strictly
upper triangular part of t is not referenced.
ldt On entry, ldt specifies the first dimension of T as declared in the calling
(sub)program. ldt must be at least max(1, n).
Unchanged on exit.
incx On entry, incx specifies the increment for the elements of x. incx must
not be zero.
Unchanged on exit.
Before entry, the incremented array y must contain the n element vector y.
Unchanged on exit.
incy On entry, incy specifies the increment for the elements of y. incy must
not be zero.
Unchanged on exit.
incw On entry, incw specifies the increment for the elements of w. incw must
not be zero.
Unchanged on exit.
Before entry, the incremented array z must contain the n element vector z.
Unchanged on exit.
incz On entry, incz specifies the increment for the elements of z. incz must
not be zero.
Unchanged on exit.
Output Parameters
t Before entry with uplo = 'U' or 'u', the leading n-by-n upper
triangular part of the array t must contain the upper triangular matrix
and the strictly lower triangular part of t is not referenced.
Before entry with uplo = 'L' or 'l', the leading n-by-n lower triangular
part of the array t must contain the lower triangular matrix and the
strictly upper triangular part of t is not referenced.
x On exit, x = T' * y.
w On exit, w = T * z.
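The two products can be formed in a single pass over the stored triangle. The following plain-C sketch illustrates this for the real, upper triangular case, assuming column-major storage and unit increments; the function name trmvt_upper_sketch is hypothetical and the code is not the library implementation.

```c
#include <stddef.h>

/* Hypothetical helper: computes x = T' * y and w = T * z for an n-by-n
 * upper triangular matrix T stored column-major with leading dimension
 * ldt. Only the upper triangle of t is referenced. Unit increments are
 * assumed for x, y, w, and z. */
void trmvt_upper_sketch(size_t n, const double *t, size_t ldt,
                        double *x, const double *y,
                        double *w, const double *z)
{
    for (size_t i = 0; i < n; ++i) {
        x[i] = 0.0;
        w[i] = 0.0;
    }
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i <= j; ++i) {
            double tij = t[i + j * ldt];
            x[j] += tij * y[i];   /* row j of T' is column j of T */
            w[i] += tij * z[j];   /* row i of T */
        }
    }
}
```

Each element of T is read once and contributes to both results, which is the practical benefit of combining the two operations.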
pilaenv
Returns the positive integer value of the logical
blocking size.
Syntax
MKL_INT pilaenv (const MKL_INT *ictxt , const char *prec);
Include Files
• mkl_pblas.h
Description
pilaenv returns the positive integer value of the logical blocking size. This value is machine and precision
specific. This version provides a logical blocking size which should give good though not optimal performance
on many of the currently available distributed-memory concurrent computers. You are encouraged to modify
this subroutine to set this tuning parameter for your particular machine.
Input Parameters
ictxt On entry, ictxt specifies the BLACS context handle, indicating the global
context of the operation. The context itself is global, but the value of ictxt
is local.
prec On input, prec specifies the precision for which the logical block size should
be returned as follows:
prec = 'S' or 's' single precision real,
prec = 'D' or 'd' double precision real,
prec = 'C' or 'c' single precision complex,
prec = 'Z' or 'z' double precision complex,
prec = 'I' or 'i' integer.
Application Notes
Before modifying this routine to tune the library performance on your system, be aware of the following:
1. The value this function returns must be strictly greater than zero.
2. If you are planning to link your program with different instances of the library (for example, on a
heterogeneous machine), you must compile each instance of the library with exactly the same version
of this routine for obvious interoperability reasons.
pilaenvx
Called from the ScaLAPACK routines to choose
problem-dependent parameters for the local
environment.
Syntax
MKL_INT pilaenvx (const MKL_INT* ictxt, const MKL_INT* ispec, const char* name, const
char* opts, const MKL_INT* n1, const MKL_INT* n2, const MKL_INT* n3, const MKL_INT*
Include Files
• mkl.h
Description
pilaenvx is called from the ScaLAPACK routines to choose problem-dependent parameters for the local
environment. See ispec for a description of the parameters. This version provides a set of parameters which
should give good, though not optimal, performance on many of the currently available computers. You are
encouraged to modify this subroutine to set the tuning parameters for your particular machine using the
option and problem size information in the arguments.
Input Parameters
ictxt (local input) On entry, ictxt specifies the BLACS context handle, indicating
the global context of the operation. The context itself is global, but the
value of ictxt is local.
= 9: maximum size of the subproblems at the bottom of the computation
tree in the divide-and-conquer algorithm (used by ?gelsd and ?gesdd).
opts (global input) The character options to the subroutine name, concatenated
into a single character string. For example, uplo = 'U', trans = 'T',
and diag = 'N' for a triangular routine would be specified as opts =
'UTN'.
n1, n2, n3, and n4 (global input) Problem dimensions for the subroutine name; these may not
all be required.
Output Parameters
Application Notes
The following conventions have been used when calling ilaenv from the LAPACK routines:
1. opts is a concatenation of all of the character options to subroutine name, in the same order that they
appear in the argument list for name, even if they are not used in determining the value of the
parameter specified by ispec.
2. The problem dimensions n1, n2, n3, and n4 are specified in the order that they appear in the argument
list for name. n1 is used first, n2 second, and so on, and unused problem dimensions are passed a value
of -1.
3. The parameter value returned by ilaenv is checked for validity in the calling subroutine. For example,
ilaenv is used to retrieve the optimal block size for strtri as follows:
pjlaenv
Called from the ScaLAPACK symmetric and Hermitian
tailored eigen-routines to choose problem-dependent
parameters for the local environment.
Syntax
MKL_INT pjlaenv (const MKL_INT* ictxt, const MKL_INT* ispec, const char* name, const
char* opts, const MKL_INT* n1, const MKL_INT* n2, const MKL_INT* n3, const MKL_INT*
n4);
Include Files
• mkl.h
Description
pjlaenv is called from the ScaLAPACK symmetric and Hermitian tailored eigen-routines to choose problem-
dependent parameters for the local environment. See ispec for a description of the parameters. This version
provides a set of parameters which should give good, though not optimal, performance on many of the
currently available computers. You are encouraged to modify this subroutine to set the tuning parameters for
your particular machine using the option and problem size information in the arguments.
Input Parameters
name (global input) The name of the calling subroutine, in either upper case or
lower case.
opts (global input) The character options to the subroutine name, concatenated
into a single character string. For example, uplo = 'U', trans = 'T',
and diag = 'N' for a triangular routine would be specified as opts =
'UTN'.
n1, n2, n3, and n4 (global input) Problem dimensions for the subroutine name; these may not
all be required. At present, only n1 is used, and it (n1) is used only for
'TTRD'.
Output Parameters
< 0: if pjlaenv = -k, the k-th argument had an illegal value. Most
parameters set via a call to pjlaenv must be identical on all
processors, and hence pjlaenv returns the same value to all
processors (that is, global output). However, some parameters, in
particular the panel blocking factor, can be different on each processor, and
hence pjlaenv can return different values on different processors (that is, local output).
Application Notes
The following conventions have been used when calling pjlaenv from the ScaLAPACK routines:
1. opts is a concatenation of all of the character options to subroutine name, in the same order that they
appear in the argument list for name, even if they are not used in determining the value of the
parameter specified by ispec.
2. The problem dimensions n1, n2, n3, and n4 are specified in the order that they appear in the argument
list for name. n1 is used first, n2 second, and so on, and unused problem dimensions are passed a
value of -1.
3. The parameter value returned by pjlaenv is checked for validity in the calling subroutine. For
example, pjlaenv is used to retrieve the optimal block size for STRTRI as follows:
void pslamr1d (const MKL_INT *n , float *a , const MKL_INT *ia , const MKL_INT *ja ,
const MKL_INT *desca , float *b , const MKL_INT *ib , const MKL_INT *jb , const MKL_INT
*descb );
void pdlamr1d (const MKL_INT *n , double *a , const MKL_INT *ia , const MKL_INT *ja ,
const MKL_INT *desca , double *b , const MKL_INT *ib , const MKL_INT *jb , const
MKL_INT *descb );
void pclamr1d (const MKL_INT *n , MKL_Complex8 *a , const MKL_INT *ia , const MKL_INT
*ja , const MKL_INT *desca , MKL_Complex8 *b , const MKL_INT *ib , const MKL_INT *jb ,
const MKL_INT *descb );
void pzlamr1d (const MKL_INT *n , MKL_Complex16 *a , const MKL_INT *ia , const MKL_INT
*ja , const MKL_INT *desca , MKL_Complex16 *b , const MKL_INT *ib , const MKL_INT *jb ,
const MKL_INT *descb );
void clanv2 (MKL_Complex8 *a , MKL_Complex8 *b , MKL_Complex8 *c , MKL_Complex8 *d ,
MKL_Complex8 *rt1 , MKL_Complex8 *rt2 , float *cs , MKL_Complex8 *sn );
void zlanv2 (MKL_Complex16 *a , MKL_Complex16 *b , MKL_Complex16 *c , MKL_Complex16
*d , MKL_Complex16 *rt1 , MKL_Complex16 *rt2 , double *cs , MKL_Complex16 *sn );
void pclattrs (const char *uplo , const char *trans , const char *diag , const char
*normin , const MKL_INT *n , const MKL_Complex8 *a , const MKL_INT *ia , const MKL_INT
*ja , const MKL_INT *desca , MKL_Complex8 *x , const MKL_INT *ix , const MKL_INT *jx ,
const MKL_INT *descx , float *scale , float *cnorm , MKL_INT *info );
void pzlattrs (const char *uplo , const char *trans , const char *diag , const char
*normin , const MKL_INT *n , const MKL_Complex16 *a , const MKL_INT *ia , const MKL_INT
*ja , const MKL_INT *desca , MKL_Complex16 *x , const MKL_INT *ix , const MKL_INT *jx ,
const MKL_INT *descx , double *scale , double *cnorm , MKL_INT *info );
void pssyttrd (const char *uplo , const MKL_INT *n , float *a , const MKL_INT *ia ,
const MKL_INT *ja , const MKL_INT *desca , float *d , float *e , float *tau , float
*work , const MKL_INT *lwork , MKL_INT *info );
void pdsyttrd (const char *uplo , const MKL_INT *n , double *a , const MKL_INT *ia ,
const MKL_INT *ja , const MKL_INT *desca , double *d , double *e , double *tau , double
*work , const MKL_INT *lwork , MKL_INT *info );
MKL_INT piparmq (const MKL_INT *ictxt , const MKL_INT *ispec , const char *name , const
char *opts , const MKL_INT *n , const MKL_INT *ilo , const MKL_INT *ihi , const MKL_INT
*lworknb );
For descriptions of these functions, please see http://www.netlib.org/scalapack/explore-html/files.html.
p?labad s,d Returns the square root of the underflow and overflow thresholds if the
exponent-range is very large.
p?lachkieee s,d Performs a simple check for the features of the IEEE standard.
p?lasnbt s,d Computes the position of the sign bit of a floating-point number.
See Also
pxerbla Error handling routine called by ScaLAPACK routines.
p?labad
Returns the square root of the underflow and overflow
thresholds if the exponent-range is very large.
Syntax
void pslabad (MKL_INT *ictxt , float *small , float *large );
void pdlabad (MKL_INT *ictxt , double *small , double *large );
Include Files
• mkl_scalapack.h
Description
The p?labad function takes as input the values computed by p?lamch for underflow and overflow, and
returns the square root of each of these values if the log of large is sufficiently large. This function is
intended to identify machines with a large exponent range, such as the Crays, and redefine the underflow
and overflow limits to be the square roots of the values computed by p?lamch. This function is needed
because p?lamch does not compensate for poor arithmetic in the upper half of the exponent range, as is
found on a Cray.
In addition, this function performs a global minimization and maximization on these values, to support
heterogeneous computing networks.
Input Parameters
ictxt (global)
The BLACS context handle in which the computation takes place.
small (local).
On entry, the underflow threshold as computed by p?lamch.
large (local).
On entry, the overflow threshold as computed by p?lamch.
Output Parameters
small (local).
On exit, if log10(large) is sufficiently large, the square root of small,
otherwise unchanged.
large (local).
On exit, if log10(large) is sufficiently large, the square root of large,
otherwise unchanged.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?lachkieee
Performs a simple check for the features of the IEEE
standard.
Syntax
void pslachkieee (MKL_INT *isieee , float *rmax , float *rmin );
void pdlachkieee (MKL_INT *isieee , float *rmax , float *rmin );
Include Files
• mkl_scalapack.h
Description
The p?lachkieee function performs a simple check to make sure that the features of the IEEE standard are
implemented. In some implementations, p?lachkieee may not return.
This is a ScaLAPACK internal function and arguments are not checked for unreasonable values.
Input Parameters
rmax (local).
The overflow threshold (= ?lamch ('O')).
rmin (local).
The underflow threshold (= ?lamch ('U')).
Output Parameters
isieee (local).
On exit, isieee = 1 implies that all the features of the IEEE standard that
we rely on are implemented. On exit, isieee = 0 implies that some of the
features of the IEEE standard that we rely on are missing.
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?lamch
Determines machine parameters for floating-point
arithmetic.
Syntax
float pslamch (MKL_INT *ictxt , char *cmach );
double pdlamch (MKL_INT *ictxt , char *cmach );
Include Files
• mkl_scalapack.h
Description
The p?lamch function determines machine parameters for single-precision (pslamch) or double-precision (pdlamch) floating-point arithmetic.
Input Parameters
ictxt (global). The BLACS context handle in which the computation takes place.
cmach (global)
Specifies the value to be returned by p?lamch:
where
eps = relative machine precision
sfmin = safe minimum, such that 1/sfmin does not overflow
base = base of the machine
prec = eps*base
t = number of (base) digits in the mantissa
rnd = 1.0 when rounding occurs in addition, 0.0 otherwise
emin = minimum exponent before (gradual) underflow
rmin = underflow threshold = base^(emin-1)
emax = largest exponent before overflow
rmax = overflow threshold = (base^emax)*(1-eps)
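The relationships among these quantities can be checked against the constants in the standard C header float.h. The sketch below does this for single precision; it is an illustration only, not how p?lamch obtains its values. The struct and function names are hypothetical, and the formula eps = 0.5 * base^(1-t) assumes that rounding occurs in addition.

```c
#include <float.h>

/* Hypothetical container for a subset of the parameters listed above. */
typedef struct {
    double base, t, eps, prec, emin, rmin, emax, rmax;
} fp_params;

/* Integer power by repeated multiplication or division (exact for base 2). */
static double int_power(double base, int exponent)
{
    double result = 1.0;
    for (int i = 0; i < exponent; ++i) result *= base;
    for (int i = 0; i > exponent; --i) result /= base;
    return result;
}

/* Derives single-precision parameters from <float.h>.
 * Assumption: rounding occurs in addition, so eps = 0.5 * base^(1-t). */
fp_params single_precision_params(void)
{
    fp_params p;
    p.base = (double)FLT_RADIX;
    p.t    = (double)FLT_MANT_DIG;
    p.eps  = 0.5 * int_power(p.base, 1 - FLT_MANT_DIG);
    p.prec = p.eps * p.base;
    p.emin = (double)FLT_MIN_EXP;
    p.rmin = int_power(p.base, FLT_MIN_EXP - 1);               /* base^(emin-1) */
    p.emax = (double)FLT_MAX_EXP;
    p.rmax = int_power(p.base, FLT_MAX_EXP) * (1.0 - p.eps);   /* (base^emax)*(1-eps) */
    return p;
}
```

On a platform with IEEE binary32 floats, rmin and rmax computed this way coincide with FLT_MIN and FLT_MAX.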
Output Parameters
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?lasnbt
Computes the position of the sign bit of a floating-
point number.
Syntax
void pslasnbt (MKL_INT *ieflag );
void pdlasnbt (MKL_INT *ieflag );
Include Files
• mkl_scalapack.h
Description
The p?lasnbt function finds the position of the sign bit of a single- or double-precision floating-point number. This
function assumes IEEE arithmetic and hence tests only the 32nd bit (for single precision) or the 32nd and 64th
bits (for double precision) as a possibility for the sign bit. sizeof(int) is assumed to be 4 bytes.
If a compile-time flag (NO_IEEE) indicates that the machine does not have IEEE arithmetic, ieflag = 0 is
returned.
Output Parameters
ieflag This flag indicates the position of the sign bit of any single- or double-precision
floating-point number.
ieflag = 0, if the compile-time flag NO_IEEE indicates that the machine
does not have IEEE arithmetic, or if sizeof(int) is different from 4 bytes.
ieflag = 1 indicates that the sign bit is the 32nd bit for a single-precision
function.
In the case of a double-precision function:
ieflag = 1 indicates that the sign bit is the 32nd bit (Big Endian).
ieflag = 2 indicates that the sign bit is the 64th bit (Little Endian).
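The double-precision case can be illustrated with a small plain-C probe that views a negative double as two 32-bit words and reports which word carries the sign bit. This is a sketch of the idea only: the function name is hypothetical, and the use of uint32_t and memcpy is a choice of this example rather than a description of the library code.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical probe: returns 1 if the sign bit of a double sits in the
 * first 32-bit word in memory (big endian), 2 if it sits in the second
 * word (little endian), and 0 if the layout is not recognized. */
int double_sign_word(void)
{
    double value = -1.0;
    uint32_t words[2];

    if (sizeof value != sizeof words) {
        return 0;                       /* double is not 64 bits wide */
    }
    memcpy(words, &value, sizeof words);
    if (words[0] & 0x80000000u) {
        return 1;                       /* sign bit is the 32nd bit */
    }
    if (words[1] & 0x80000000u) {
        return 2;                       /* sign bit is the 64th bit */
    }
    return 0;
}
```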
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
descinit
Initializes the array descriptor for a distributed matrix.
Syntax
void descinit (MKL_INT *desc, const MKL_INT *m, const MKL_INT *n, const MKL_INT *mb,
const MKL_INT *nb, const MKL_INT *irsrc, const MKL_INT *icsrc, const MKL_INT *ictxt,
const MKL_INT *lld, MKL_INT *info);
Description
The descinit function initializes the array descriptor for a distributed matrix.
Input Parameters
mb (global input) The blocking factor used to distribute the rows of the matrix.
MB >= 1.
nb (global input) The blocking factor used to distribute the columns of the
matrix. NB >= 1.
irsrc (global input) The process row over which the first row of the matrix is
distributed. 0 <= IRSRC < NPROW.
icsrc (global input) The process column over which the first column of the matrix
is distributed. 0 <= ICSRC < NPCOL.
ictxt (global input) The BLACS context handle, indicating the global context of
the operation on the matrix. The context itself is global.
lld (local input) The leading dimension of the local array storing the local
blocks of the distributed matrix. LLD >= MAX(1,LOCr(M)). LOCr() denotes
the number of rows of a global dense matrix that the process in a grid
receives after the data is distributed.
Output Parameters
info (output)
= 0: successful exit
< 0: if INFO = -i, the i-th argument had an illegal value
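The parameter constraints above can be summarized in a plain-C sketch that validates the arguments and fills a nine-element descriptor. The function name and the extra arguments nprow, npcol, and local_rows are hypothetical; the real routine obtains the grid shape from the BLACS context. The descriptor layout used here (type, context, M, N, MB, NB, RSRC, CSRC, LLD) is the conventional ScaLAPACK ordering for dense matrices and is an assumption of this example, as is the omission of checks on m and n.

```c
/* Assumed positions of the descriptor entries for a dense matrix. */
enum {
    DESC_DTYPE = 0, DESC_CTXT, DESC_M, DESC_N, DESC_MB,
    DESC_NB, DESC_RSRC, DESC_CSRC, DESC_LLD, DESC_LEN
};

/* Hypothetical stand-in for descinit. Returns 0 on success or -i when
 * the i-th argument of descinit (counting desc as the first) is illegal.
 * nprow, npcol, and local_rows (LOCr(M)) are passed in explicitly. */
int fill_descriptor(int desc[DESC_LEN], int m, int n, int mb, int nb,
                    int irsrc, int icsrc, int ictxt, int lld,
                    int nprow, int npcol, int local_rows)
{
    int min_lld = local_rows > 1 ? local_rows : 1;

    if (mb < 1) return -4;                            /* MB >= 1 */
    if (nb < 1) return -5;                            /* NB >= 1 */
    if (irsrc < 0 || irsrc >= nprow) return -6;       /* 0 <= IRSRC < NPROW */
    if (icsrc < 0 || icsrc >= npcol) return -7;       /* 0 <= ICSRC < NPCOL */
    if (lld < min_lld) return -9;                     /* LLD >= MAX(1,LOCr(M)) */

    desc[DESC_DTYPE] = 1;                             /* dense matrix */
    desc[DESC_CTXT]  = ictxt;
    desc[DESC_M]     = m;
    desc[DESC_N]     = n;
    desc[DESC_MB]    = mb;
    desc[DESC_NB]    = nb;
    desc[DESC_RSRC]  = irsrc;
    desc[DESC_CSRC]  = icsrc;
    desc[DESC_LLD]   = lld;
    return 0;
}
```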
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
numroc
Computes the number of rows or columns of a
distributed matrix owned by the process.
Syntax
MKL_INT numroc (const MKL_INT *n, const MKL_INT *nb, const MKL_INT *iproc, const
MKL_INT *srcproc, const MKL_INT *nprocs);
Description
The numroc function computes the number of rows or columns of a distributed matrix owned by the process.
Input Parameters
nb (global input) Block size, size of the blocks the distributed matrix is split
into.
iproc (local input) The coordinate of the process whose local array row or column
is to be determined.
srcproc (global input) The coordinate of the process that possesses the first row or
column of the distributed matrix.
nprocs (global input) The total number of processes over which the matrix is
distributed.
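The count that numroc returns follows from the block-cyclic distribution: whole blocks are dealt to the processes in round-robin order starting at srcproc, and a final partial block, if any, goes to the next process in line. A plain-C sketch of that arithmetic is shown below, assuming n is the number of rows or columns of the distributed matrix; the function name numroc_sketch is hypothetical and takes its arguments by value rather than by pointer.

```c
/* Hypothetical re-implementation of the counting rule: number of rows or
 * columns (out of n, in blocks of nb) owned by process iproc when the
 * first block belongs to process srcproc and there are nprocs processes. */
int numroc_sketch(int n, int nb, int iproc, int srcproc, int nprocs)
{
    int my_dist      = (nprocs + iproc - srcproc) % nprocs; /* distance from the source process */
    int full_blocks  = n / nb;                               /* number of complete blocks */
    int count        = (full_blocks / nprocs) * nb;          /* blocks every process receives */
    int extra_blocks = full_blocks % nprocs;                 /* leftover complete blocks */

    if (my_dist < extra_blocks) {
        count += nb;            /* this process gets one more complete block */
    } else if (my_dist == extra_blocks) {
        count += n % nb;        /* this process gets the final partial block */
    }
    return count;
}
```

For n = 10, nb = 2, and three processes with srcproc = 0, the five blocks are split as 4, 4, and 2 elements.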
Output Parameters
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?gemr2d s,d,c,z,i Copies a submatrix from one general rectangular matrix to another.
See Also
pxerbla Error handling routine called by ScaLAPACK routines.
p?gemr2d
Copies a submatrix from one general rectangular
matrix to another.
Syntax
void psgemr2d (MKL_INT *m, MKL_INT *n, float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_INT *ictxt );
void pdgemr2d (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , double *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_INT *ictxt );
void pcgemr2d (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_INT
*ictxt );
void pzgemr2d (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb ,
MKL_INT *ictxt );
void pigemr2d (MKL_INT *m , MKL_INT *n , MKL_INT *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_INT
*ictxt );
Include Files
• mkl_scalapack.h
Description
The p?gemr2d function copies the indicated matrix or submatrix of A to the indicated matrix or submatrix of
B. It provides a truly general copy from any block cyclically distributed matrix or submatrix to any other block
cyclically distributed matrix or submatrix. With p?trmr2d, these functions are the only ones in the ScaLAPACK
library which provide inter-context operations: they can take a matrix or submatrix A in context A
(distributed over process grid A) and copy it to a matrix or submatrix B in context B (distributed over process
grid B).
There does not need to be a relationship between the two operand matrices or submatrices other than their
global size and the fact that they are both legal block cyclically distributed matrices or submatrices. This
means that they can, for example, be distributed across different process grids, have varying block sizes and
differing matrix starting points, or be contained in different sized distributed matrices.
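To make the effect of different block sizes and process counts concrete, the following plain-C sketch maps a zero-based global index along one dimension to the owning process and the local index under a block-cyclic distribution. The type and function names are hypothetical and the code is not part of the library; it only shows why the same global element can live in different places under the source and destination distributions.

```c
/* Hypothetical result type: owning process coordinate and local index. */
typedef struct {
    int process;
    int local_index;
} bc_location;

/* Maps a zero-based global index to its owner under a one-dimensional
 * block-cyclic distribution with block size nb over nprocs processes,
 * where process src_proc owns the first block. */
bc_location block_cyclic_locate(int global_index, int nb,
                                int src_proc, int nprocs)
{
    int block = global_index / nb;            /* global block number */
    bc_location loc;

    loc.process     = (src_proc + block) % nprocs;
    loc.local_index = (block / nprocs) * nb   /* complete local blocks before it */
                    + global_index % nb;      /* offset inside the block */
    return loc;
}
```

For example, global index 7 is held by process 0 at local index 3 when nb = 2 and nprocs = 3, but by process 1 at local index 3 when nb = 4 and nprocs = 2, so a redistribution between these two layouts has to move it.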
Take care when context A is disjoint from context B. The general rules for which parameters need to be set
are:
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/
PerformanceIndex.
Notice revision #20201201
Input Parameters
a (local)
Pointer into the local memory to array of size lld_a* LOCc(ja+n-1)
containing the source matrix A.
ia, ja (global) The row and column indices in the array A indicating the first row
and the first column, respectively, of the submatrix of A to copy.
1 ≤ ia ≤ total_rows_in_a - m + 1, 1 ≤ ja ≤ total_columns_in_a - n + 1.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
Only dtype_a = 1 is supported, so dlen_ = 9.
If the calling process is not part of the context of A, ctxt_a must be equal to
-1.
ib, jb (global) The row and column indices in the array B indicating the first row
and the first column, respectively, of the submatrix B to which to copy the
matrix. 1 ≤ib≤total_rows_in_b - m +1, 1 ≤jb≤total_columns_in_b - n +1.
descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.
Only dtype_b = 1 is supported, so dlen_ = 9.
If the calling process is not part of the context of B, ctxt_b must be equal to
-1.
ictxt (global).
The context encompassing at least the union of all processes in context A
and context B. All processes in the context ictxt must call this function,
even if they do not own a piece of either matrix.
Output Parameters
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
p?trmr2d
Copies a submatrix from one trapezoidal matrix to
another.
Syntax
void pstrmr2d (char *uplo , char *diag , MKL_INT *m , MKL_INT *n , float *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb , MKL_INT *ictxt );
void pdtrmr2d (char *uplo , char *diag , MKL_INT *m , MKL_INT *n , double *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb , MKL_INT *ictxt );
void pctrmr2d (char *uplo , char *diag , MKL_INT *m , MKL_INT *n , MKL_Complex8 *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib , MKL_INT
*jb , MKL_INT *descb , MKL_INT *ictxt );
void pztrmr2d (char *uplo , char *diag , MKL_INT *m , MKL_INT *n , MKL_Complex16 *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib , MKL_INT
*jb , MKL_INT *descb , MKL_INT *ictxt );
void pitrmr2d (char *uplo , char *diag , MKL_INT *m , MKL_INT *n , MKL_INT *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , MKL_INT *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb , MKL_INT *ictxt );
Include Files
• mkl_scalapack.h
Description
The p?trmr2d function copies the indicated matrix or submatrix of A to the indicated matrix or submatrix of
B. It provides a truly general copy from any block cyclically distributed matrix or submatrix to any other block
cyclically distributed matrix or submatrix. With p?gemr2d, these functions are the only ones in the ScaLAPACK
library which provide inter-context operations: they can take a matrix or submatrix A in context A
(distributed over process grid A) and copy it to a matrix or submatrix B in context B (distributed over process
grid B).
The p?trmr2d function assumes the matrix or submatrix to be trapezoidal. Only the upper or lower part is
copied, and the other part is unchanged.
There does not need to be a relationship between the two operand matrices or submatrices other than their
global size and the fact that they are both legal block cyclically distributed matrices or submatrices. This
means that they can, for example, be distributed across different process grids, have varying block sizes and
differing matrix starting points, or be contained in different sized distributed matrices.
Take care when context A is disjoint from context B. The general rules for which parameters need to be set
are:
Because of its generality, p?trmr2d can be used for many operations not usually associated with copy
functions. For instance, it can be used to take a matrix on one process and distribute it across a process
grid, or the reverse. If a supercomputer is grouped into a virtual parallel machine with a workstation, for
instance, this function can be used to move the matrix from the workstation to the supercomputer and back.
In ScaLAPACK, it is called to copy matrices from a two-dimensional process grid to a one-dimensional
process grid. It can be used to redistribute matrices so that distributions providing maximal performance can
be used by various component libraries, as well.
Note that this function requires an array descriptor with dtype_ = 1.
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/
PerformanceIndex.
Notice revision #20201201
Input Parameters
uplo (global) Specifies whether to copy the upper or lower part of the matrix or
submatrix.
diag (global) Specifies whether to copy the diagonal of the matrix or submatrix.
a (local) Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1)
containing the source matrix A.
ia, ja (global) The row and column indices in the array A indicating the first row
and the first column, respectively, of the submatrix of A to copy.
1 ≤ ia ≤ total_rows_in_a - m + 1, 1 ≤ ja ≤ total_columns_in_a - n + 1.
desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
Only dtype_a = 1 is supported, so dlen_ = 9.
If the calling process is not part of the context of A, ctxt_a must be equal to
-1.
ib, jb (global) The row and column indices in the array B indicating the first row
and the first column, respectively, of the submatrix of B to which to copy the
matrix. 1 ≤ ib ≤ total_rows_in_b - m + 1, 1 ≤ jb ≤ total_columns_in_b - n + 1.
descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.
Only dtype_b = 1 is supported, so dlen_ = 9.
If the calling process is not part of the context of B, ctxt_b must be equal to
-1.
ictxt (global).
The context encompassing at least the union of all processes in context A
and context B. All processes in the context ictxt must call this function,
even if they do not own a piece of either matrix.
Output Parameters
See Also
Overview for details of ScaLAPACK array descriptor structures and related notations.
techniques [Schenk00-2]. To improve sequential and parallel sparse numerical factorization performance, the
algorithms are based on a Level-3 BLAS update and pipelining parallelism is used with a combination of left-
and right-looking supernode techniques [Schenk00, Schenk01, Schenk02, Schenk03]. The parallel pivoting
methods allow complete supernode pivoting in order to balance numerical stability against scalability during the
factorization process. For sufficiently large problem sizes, numerical experiments demonstrate that the
scalability of the parallel algorithm is nearly independent of the shared-memory multiprocessing architecture.
The following table lists the names of the Intel® oneAPI Math Kernel Library (oneMKL) PARDISO routines and
describes their general use.
oneMKL PARDISO Routines
Routine                      Description
pardisoinit                  Initializes Intel® oneAPI Math Kernel Library (oneMKL) PARDISO with
                             default parameters depending on the matrix type.
pardiso                      Calculates the solution of a set of sparse linear equations with
                             single or multiple right-hand sides.
pardiso_64                   Calculates the solution of a set of sparse linear equations with
                             single or multiple right-hand sides, 64-bit integer version.
mkl_pardiso_pivot            Replaces the routine which handles Intel® oneAPI Math Kernel Library
                             (oneMKL) PARDISO pivots with a user-defined routine.
pardiso_getdiag              Returns diagonal elements of the initial and factorized matrix.
pardiso_export               Places pointers dedicated for sparse representation of a requested
                             matrix into MKL PARDISO.
pardiso_handle_store         Stores internal structures from pardiso to a file.
pardiso_handle_restore       Restores pardiso internal structures from a file.
pardiso_handle_delete        Deletes files with pardiso internal structure data.
pardiso_handle_store_64      Stores internal structures from pardiso_64 to a file.
pardiso_handle_restore_64    Restores pardiso_64 internal structures from a file.
pardiso_handle_delete_64     Deletes files with pardiso_64 internal structure data.
The Intel® oneAPI Math Kernel Library (oneMKL) PARDISO solver supports a wide range of real and complex
sparse matrix types (see the figure below).
Sparse Matrices That Can Be Solved with the oneMKL PARDISO Solver
The Intel® oneAPI Math Kernel Library (oneMKL) PARDISO solver performs four tasks:
• analysis and symbolic factorization
• numerical factorization
• forward and backward substitution including iterative refinement
• termination to release all internal solver memory.
To find code examples that use Intel® oneAPI Math Kernel Library (oneMKL) PARDISO routines to solve
systems of linear equations, unzip the C archive file in the examples folder of the Intel® oneAPI Math Kernel
Library (oneMKL) installation directory. Code examples will be in the examples/solverc/source folder.
Symmetric Matrices
The solver first computes a symmetric fill-in reducing permutation $P$ based on
either the minimum degree algorithm [Liu85] or the nested dissection algorithm
from the METIS package [Karypis98] (both included with Intel® oneAPI Math
Kernel Library (oneMKL)), followed by the parallel left-right looking numerical
Cholesky factorization [Schenk00-2] of $P A P^T = L L^T$ for symmetric
positive-definite matrices, or $P A P^T = L D L^T$ for symmetric indefinite
matrices. The solver uses diagonal pivoting, or 1x1 and 2x2 Bunch-Kaufman
pivoting for symmetric indefinite matrices. An approximation of X is found by
forward and backward substitution and optional iterative refinement.
Whenever numerically acceptable 1x1 and 2x2 pivots cannot be found within the
diagonal supernode block, the coefficient matrix is perturbed. One or two passes
of iterative refinement may be required to correct the effect of the perturbations.
This restricting notion of pivoting with iterative refinement is effective for highly
indefinite symmetric systems. Furthermore, for a large set of matrices from
different application areas, this method is as accurate as a direct factorization
method that uses complete sparse pivoting techniques [Schenk04].
Another method of improving the pivoting accuracy is to use symmetric weighted
matching algorithms. These algorithms identify large entries in the coefficient
matrix A that, if permuted close to the diagonal, permit the factorization process
to identify more acceptable pivots and proceed with fewer pivot perturbations.
These algorithms are based on maximum weighted matchings and improve the
quality of the factor in a complementary way to the alternative of using more
complete pivoting techniques.
The inertia is also computed for real symmetric indefinite matrices.
Structurally Symmetric Matrices
The solver first computes a symmetric fill-in reducing permutation $P$ followed by
the parallel numerical factorization of $P A P^T = Q L U^T$. The solver uses partial
pivoting in the supernodes and an approximation of X is found by forward and
backward substitution and optional iterative refinement.
Nonsymmetric Matrices
The solver first computes a nonsymmetric permutation $P_{MPS}$ and scaling matrices
$D_r$ and $D_c$ with the aim of placing large entries on the diagonal to enhance
reliability of the numerical factorization process [Duff99]. In the next step the
solver computes a fill-in reducing permutation $P$ based on the matrix
$P_{MPS} A + (P_{MPS} A)^T$ followed by the parallel numerical factorization
$$Q L U R = P P_{MPS} D_r A D_c P$$
with supernode pivoting matrices $Q$ and $R$. When the factorization algorithm
reaches a point where it cannot factor the supernodes with this pivoting strategy,
it uses a pivoting perturbation strategy similar to [Li99]. The magnitude of the
potential pivot is tested against a constant threshold of
$$\alpha = \epsilon \, \|A_2\|_\infty,$$
where $\epsilon$ is the machine precision, $A_2 = P P_{MPS} D_r A D_c P$, and
$\|A_2\|_\infty$ is the infinity norm of $A_2$. Any tiny pivots encountered during
elimination are set to $\operatorname{sign}(l_{ii}) \, \epsilon \, \|A_2\|_\infty$, which
trades off some numerical stability for the ability to keep pivots from getting too
small. Although many failures could render the factorization well-defined but
essentially useless, in practice the diagonal elements are rarely modified for a large
class of matrices. The result of this pivoting approach is that the factorization is,
in general, not exact and iterative refinement may be needed.
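The threshold test described above can be sketched on a small dense matrix. This is only an illustration of the rule (threshold equal to the machine precision times the infinity norm, tiny pivots replaced by a signed threshold), not the solver's internal code. The function names are hypothetical, DBL_EPSILON stands in for the machine precision, and a pivot that is exactly zero is treated as positive, which is a choice made for this sketch.

```c
#include <float.h>
#include <stddef.h>

static double abs_value(double v) { return v < 0.0 ? -v : v; }

/* Infinity norm (largest absolute row sum) of a dense n-by-n row-major matrix. */
double inf_norm(const double *m, size_t n)
{
    double best = 0.0;
    for (size_t i = 0; i < n; i++) {
        double row_sum = 0.0;
        for (size_t j = 0; j < n; j++)
            row_sum += abs_value(m[i * n + j]);
        if (row_sum > best)
            best = row_sum;
    }
    return best;
}

/* Threshold alpha = eps * ||A2||_inf for the permuted and scaled matrix A2. */
double pivot_threshold(const double *a2, size_t n)
{
    return DBL_EPSILON * inf_norm(a2, n);
}

/* Returns the pivot unchanged when its magnitude reaches alpha;
 * otherwise returns alpha carrying the sign of the original pivot. */
double perturb_pivot(double pivot, double alpha)
{
    if (abs_value(pivot) >= alpha)
        return pivot;
    return (pivot < 0.0) ? -alpha : alpha;
}
```

Because a perturbed pivot changes the matrix that is actually factored, the computed factors belong to a nearby matrix, which is why iterative refinement may be needed afterwards.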
NOTE
Intel® oneAPI Math Kernel Library (oneMKL) supports only the VBSR format for real and symmetric
positive definite or indefinite matrices (mtype = 2 or mtype = -2).
Intel® oneAPI Math Kernel Library (oneMKL) supports these features for all matrix types as long
as iparm[23]=1:
For all storage formats, the Intel® oneAPI Math Kernel Library (oneMKL) PARDISO parameter ja is used for
the columns array, ia is used for rowIndex, and a is used for values. The algorithms in Intel® oneAPI Math
Kernel Library (oneMKL) PARDISO require column indices ja to be in increasing order per row and that the
diagonal element in each row be present for any structurally symmetric matrix. For symmetric or
nonsymmetric matrices the diagonal elements which are equal to zero are not necessary.
Caution
Intel® oneAPI Math Kernel Library (oneMKL) PARDISO column indices ja must be in increasing order
per row. You can validate the sparse matrix structure with the matrix checker (iparm[26]).
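The two structural requirements stated above (column indices increasing within each row, and a stored diagonal entry in every row of a structurally symmetric matrix) can be checked before calling the solver. The following is a minimal sketch under stated assumptions: csr3_check and its return codes are hypothetical and unrelated to the library's own matrix checker, plain int is used in place of MKL_INT, and the index base (0 or 1) is passed explicitly.

```c
/* Structural check of a CSR3 matrix (illustrative helper).
 * Returns 0 if the structure is acceptable,
 *         1 if ia is inconsistent,
 *         2 if a column index is out of range or not strictly increasing,
 *         3 if require_diag is set and a row has no diagonal entry. */
int csr3_check(int n, const int *ia, const int *ja, int base, int require_diag)
{
    if (ia[0] != base)
        return 1;
    for (int i = 0; i < n; i++) {
        int begin = ia[i] - base;      /* first entry of row i in ja */
        int end   = ia[i + 1] - base;  /* one past the last entry of row i */
        int has_diag = 0;
        if (end < begin)
            return 1;
        for (int k = begin; k < end; k++) {
            int col = ja[k] - base;
            if (col < 0 || col >= n)
                return 2;
            if (k > begin && ja[k] <= ja[k - 1])
                return 2;              /* indices must increase within a row */
            if (col == i)
                has_diag = 1;
        }
        if (require_diag && !has_diag)
            return 3;
    }
    return 0;
}
```

Pass require_diag = 1 for structurally symmetric matrices, where the diagonal entry must be stored even if it is zero.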
NOTE
While the presence of zero diagonal elements for symmetric matrices is not required, you should
explicitly set zero diagonal elements for symmetric matrices. Otherwise, Intel® oneAPI Math Kernel
Library (oneMKL) PARDISO creates internal copies of arrays ia, ja, and a with all diagonal elements present,
which require additional memory and computational time. However, the memory and time required for the
diagonal elements in the internal arrays is usually not significant compared to the memory and the time
required to factor and solve the matrix.
Storage of Matrices
By default, Intel® oneAPI Math Kernel Library (oneMKL) PARDISO stores data in RAM. This is referred to as
In-Core (IC) mode. However, you can specify that Intel® oneAPI Math Kernel Library (oneMKL) PARDISO
store matrices on disk by setting iparm[59]. This mode is called the Out-of-Core (OOC) mode.
You can set the following parameters for the OOC mode.
By default, the current working directory is used in the OOC mode as a directory path for storing data. All
work arrays will be stored in files named ooc_temp with different extensions. When
MKL_PARDISO_OOC_FILE_NAME is not set and MKL_PARDISO_OOC_PATH is set, the names for the created files
will contain <path>/mkl_pardiso or <path>\mkl_pardiso depending on the OS. Setting
MKL_PARDISO_OOC_FILE_NAME=<filename> will override the path which could have been set in
MKL_PARDISO_OOC_PATH. In this case <filename> will be used for naming the OOC files.
By default, MKL_PARDISO_OOC_MAX_CORE_SIZE is 2000 (MB) and MKL_PARDISO_OOC_MAX_SWAP_SIZE is 0.
NOTE
Do not set the sum of MKL_PARDISO_OOC_MAX_CORE_SIZE and MKL_PARDISO_OOC_MAX_SWAP_SIZE
greater than the size of the RAM plus the size of the swap memory. Be sure to allow enough free
memory for the operating system and any other processes which need to be running.
By default, all temporary data files are deleted. To keep them, set
MKL_PARDISO_OOC_KEEP_FILE to 0.
OOC parameters can be set either through environment variables or in a configuration file. By default, the
configuration file is named pardiso_ooc.cfg and must be placed in the working directory. To use a different
location or name, set the environment variables MKL_PARDISO_OOC_CFG_PATH and
MKL_PARDISO_OOC_CFG_FILE_NAME. These variables specify the path and filename as follows:
• Linux* OS and OS X*: <MKL_PARDISO_OOC_CFG_PATH>/<MKL_PARDISO_OOC_CFG_FILE_NAME>
• Windows* OS: <MKL_PARDISO_OOC_CFG_PATH>\<MKL_PARDISO_OOC_CFG_FILE_NAME>
The configuration file contains lines of the following form:
MKL_PARDISO_OOC_PATH = <path>
MKL_PARDISO_OOC_MAX_CORE_SIZE = N
MKL_PARDISO_OOC_MAX_SWAP_SIZE = K
MKL_PARDISO_OOC_KEEP_FILE = 0 (or 1)
Caution
The maximum length of the path lines in the configuration files is 1000 characters.
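A program that enables OOC mode may want to generate the configuration text itself. The sketch below formats the four settings into a caller-supplied buffer. It is an illustration only: ooc_format_cfg is a hypothetical helper, not a library routine, and it assumes that the 1000-character limit applies to the whole MKL_PARDISO_OOC_PATH line. Writing the buffer to pardiso_ooc.cfg in the working directory is left to the caller.

```c
#include <stdio.h>
#include <string.h>

#define OOC_MAX_PATH_LINE 1000  /* documented limit for path lines */

/* Formats OOC settings as configuration-file text.
 * Returns the number of characters written (excluding the terminating
 * null), or -1 if the path line is too long or the buffer is too small. */
int ooc_format_cfg(char *buf, size_t size, const char *path,
                   int max_core_mb, int max_swap_mb, int keep_file)
{
    char line[OOC_MAX_PATH_LINE + 64];
    int len = snprintf(line, sizeof line, "MKL_PARDISO_OOC_PATH = %s", path);
    if (len < 0 || len > OOC_MAX_PATH_LINE)
        return -1;                       /* path line exceeds the limit */

    len = snprintf(buf, size,
                   "%s\n"
                   "MKL_PARDISO_OOC_MAX_CORE_SIZE = %d\n"
                   "MKL_PARDISO_OOC_MAX_SWAP_SIZE = %d\n"
                   "MKL_PARDISO_OOC_KEEP_FILE = %d\n",
                   line, max_core_mb, max_swap_mb, keep_file ? 1 : 0);
    if (len < 0 || (size_t)len >= size)
        return -1;                       /* output would be truncated */
    return len;
}
```

Calling ooc_format_cfg(buf, sizeof buf, "/tmp/ooc", 2000, 0, 1) reproduces the documented defaults for the core and swap sizes; the path /tmp/ooc is a placeholder.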
Alternatively, the OOC parameters can be set as environment variables via command line.
For Linux* OS and OS X*:
A real nonsymmetric matrix A (mtype = 11) is factored by Intel® oneAPI Math Kernel Library (oneMKL)
PARDISO as A = L*U. In this case the solution of the system A*x=b can be found by the following sequence:
L*y=b (forward substitution, phase = 331) and U*x=y (backward substitution, phase = 333).
Solving a system with a real symmetric indefinite matrix A (mtype = -2) is slightly different from the cases
above. Intel® oneAPI Math Kernel Library (oneMKL) PARDISO factors this matrix as A = L*D*L^T, and the solution
of the system A*x=b can be calculated as the following sequence of substitutions: L*y=b (forward
substitution, phase = 331), D*v=y (diagonal substitution, phase = 332), and finally L^T*x=v (backward
substitution, phase = 333). Diagonal substitution makes sense only for symmetric indefinite matrices (mtype
= -2, -4, 6). For matrices of other types a solution can be found as described in the first two examples.
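The three-step sequence for a symmetric indefinite matrix can be followed on a small dense example. The sketch below is not the solver's implementation: it assumes a unit lower-triangular L stored row-major and a strictly diagonal D stored as a vector (Bunch-Kaufman pivoting can produce 2x2 blocks in D, which this sketch does not handle), and all function names are hypothetical.

```c
/* Step 1 (corresponds to phase 331): solve L*y = b, L unit lower triangular. */
void forward_subst(int n, const double *l, const double *b, double *y)
{
    for (int i = 0; i < n; i++) {
        double s = b[i];
        for (int j = 0; j < i; j++)
            s -= l[i * n + j] * y[j];
        y[i] = s;                    /* unit diagonal, so no division */
    }
}

/* Step 2 (corresponds to phase 332): solve D*v = y, D given by its diagonal. */
void diagonal_subst(int n, const double *d, const double *y, double *v)
{
    for (int i = 0; i < n; i++)
        v[i] = y[i] / d[i];
}

/* Step 3 (corresponds to phase 333): solve L^T*x = v using the same L. */
void backward_subst_t(int n, const double *l, const double *v, double *x)
{
    for (int i = n - 1; i >= 0; i--) {
        double s = v[i];
        for (int j = i + 1; j < n; j++)
            s -= l[j * n + i] * x[j];  /* element (i, j) of L^T is L(j, i) */
        x[i] = s;
    }
}
```

For L = [[1, 0], [2, 1]] and D = diag(2, -3), the product L*D*L^T is [[2, 4], [4, 5]]. With b = (6, 9) the three steps give y = (6, -3), v = (3, 1), and x = (1, 1).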
Caution
The number of refinement steps (iparm[7]) must be set to zero if a solution is calculated with
separate substitutions (phase = 331, 332, 333), otherwise Intel® oneAPI Math Kernel Library
(oneMKL) PARDISO produces the wrong result.
NOTE
Different pivoting (iparm[20]) produces a different L*D*L^T factorization. Therefore results of forward,
diagonal and backward substitutions with diagonal pivoting can differ from results of the same steps
with Bunch-Kaufman pivoting. Of course, the final results of sequential execution of forward, diagonal
and backward substitution are equal to the results of the full solving step (phase=33) regardless of the
pivoting used.
To use the low rank update feature, set iparm[38] = 1 while also setting iparm[23] = 10. Additionally,
supply an array that lists the values in A2 that are different from A1 using the perm parameter as outlined in
the pardiso perm parameter description.
Important
Low rank update can only be called for matrices with the exact same pattern of nonzero values. As
such, the value of the mtype, ia, ja, and iparm[23] parameters should also be identical. In general,
the low rank factorization should be called with the same parameters as the preceding factorization
step for the same internal data structure handle (except for array a, iparm[38], and perm).
Low rank update does not currently support Intel TBB threading. In this case, Intel® oneAPI Math
Kernel Library (oneMKL) PARDISO defaults to full factorization instead.
Low rank update cannot be used in combination with a user-supplied permutation vector. In other
words, you must use the default values iparm[4] = 0, iparm[30] = 0, and iparm[35] = 0.
Additionally, iparm[3], iparm[5], iparm[27], iparm[36], iparm[55], and iparm[59] must all be
set to the default value of 0.
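The parameter restrictions listed above can be collected into a single precondition check that runs before requesting a low rank update. This is a minimal sketch that only encodes the iparm values named in this section. The function low_rank_update_allowed is hypothetical, it uses plain int instead of MKL_INT, and it does not verify the threading layer or the sparsity pattern.

```c
#include <stddef.h>

/* Returns 1 if iparm satisfies the settings this section lists for
 * low rank update, and 0 otherwise (illustrative helper). */
int low_rank_update_allowed(const int iparm[64])
{
    /* Entries that must keep their default value of 0. */
    static const int must_be_zero[] = { 3, 4, 5, 27, 30, 35, 36, 55, 59 };

    if (iparm[38] != 1)   /* low rank update must be requested */
        return 0;
    if (iparm[23] != 10)  /* two-level factorization must be selected */
        return 0;
    for (size_t k = 0; k < sizeof must_be_zero / sizeof must_be_zero[0]; k++) {
        if (iparm[must_be_zero[k]] != 0)
            return 0;
    }
    return 1;
}
```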
pardiso
Calculates the solution of a set of sparse linear
equations with single or multiple right-hand sides.
Syntax
void pardiso (_MKL_DSS_HANDLE_t pt, const MKL_INT *maxfct, const MKL_INT *mnum, const
MKL_INT *mtype, const MKL_INT *phase, const MKL_INT *n, const void *a, const MKL_INT
*ia, const MKL_INT *ja, MKL_INT *perm, const MKL_INT *nrhs, MKL_INT *iparm, const
MKL_INT *msglvl, void *b, void *x, MKL_INT *error);
Include Files
• mkl.h
Description
The pardiso routine calculates the solution of a set of sparse linear equations
A*X = B
with single or multiple right-hand sides, using a parallel LU, LDL^T, or LL^T factorization, where A is an n-by-n
matrix, and X and B are n-by-nrhs vectors or matrices.
Notes
• This routine supports usage of the mkl_progress with OpenMP, TBB, and sequential threading. See
mkl_progress for details. The case of iparm[23]=10 does not support this feature.
• If iparm[26] is set to 1 (Matrix checker), Intel® oneAPI Math Kernel Library PARDISO uses the
auxiliary routine sparse_matrix_checker to check integer arrays ia and ja.
sparse_matrix_checker has its own set of error values (from 21 to 24) that are returned in the
event of an unsuccessful matrix check. For more details, refer to the sparse_matrix_checker
documentation.
Input Parameters
Caution
After the first call to pardiso do not directly modify pt, as that
could cause a serious memory leak.
maxfct Maximum number of factors with identical sparsity structure that must be
kept in memory at the same time. In most applications this value is equal
to 1. It is possible to store several different factorizations with the same
nonzero structure at the same time in the internal data structure
management of the solver.
pardiso can process several matrices with an identical matrix sparsity
pattern and it can store the factors of these matrices at the same time.
Matrices with a different sparsity structure can be kept in memory with
different memory address pointers pt.
mnum Indicates the actual matrix for the solution phase. With this scalar you can
define which matrix to factorize. The value must be: 1 ≤ mnum ≤ maxfct.
mtype Defines the matrix type, which influences the pivoting method. The Intel®
oneAPI Math Kernel Library (oneMKL) PARDISO solver supports the
following matrices:
phase Solver Execution Steps
12 Analysis, numerical factorization
22 Numerical factorization
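$$A = \begin{bmatrix} L_{11} & 0 \\ L_{12} & L_{22} \end{bmatrix} \begin{bmatrix} D_{11} & 0 \\ 0 & D_{22} \end{bmatrix} \begin{bmatrix} U_{11} & U_{21} \\ 0 & U_{22} \end{bmatrix}$$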
You can supply a custom implementation for phase 332 instead of calling
pardiso. For example, it can be implemented with dense LAPACK
functionality. Custom implementation also allows you to substitute the
matrix S with your own.
NOTE
For very large Schur complement matrices use LAPACK
functionality to compute the Schur complement vector instead
of the Intel® oneAPI Math Kernel Library (oneMKL) PARDISO
phase 332 implementation.
For CSR3 format, the size of a is the same as that of ja. Refer to the
values array description in Three Array Variation of CSR Format for more
details.
For BSR3 format the size of a is the size of ja multiplied by the square of
the block size. Refer to the values array description in Three Array
Variation of BSR Format for more details.
NOTE
If you set iparm[36] to a negative value, Intel® oneAPI Math
Kernel Library (oneMKL) PARDISO converts the data from CSR3
format to an internal variable BSR (VBSR) format. See Sparse
Data Storage.
For CSR3 format, ia[i] (i<n) points to the first column index of row i in
the array ja. That is, ia[i] gives the index of the element in array a that
contains the first non-zero element from row i of A. The last element ia[n]
is taken to be equal to the number of non-zero elements in A, plus one.
Refer to rowIndex array description in Three Array Variation of CSR Format
for more details.
For BSR3 format, ia[i] (i<n) points to the first column index of row i in
the array ja. That is, ia[i] gives the index of the element in array a that
contains the first non-zero block from row i of A. The last element ia[n] is
taken to be equal to the number of non-zero blocks in A, plus one. Refer to
rowIndex array description in Three Array Variation of BSR Format for more
details.
The array ia is accessed in all phases of the solution process.
ja For CSR3 format, array ja contains column indices of the sparse matrix A.
It is important that the indices are in increasing order per row. For
structurally symmetric matrices it is assumed that all diagonal elements are
stored (even if they are zeros) in the list of non-zero elements in a and ja.
For symmetric matrices, the solver needs only the upper triangular part of
the system as is shown for columns array in Three Array Variation of CSR
Format.
For BSR3 format, array ja contains column indices of the sparse matrix A.
It is important that the indices are in increasing order per row. For
structurally symmetric matrices it is assumed that all diagonal blocks are
stored (even if they are zeros) in the list of non-zero blocks in a and ja. For
symmetric matrices, the solver needs only the upper triangular part of the
system as is shown for columns array in Three Array Variation of BSR
Format.
The array ja is accessed in all phases of the solution process.
perm Array, size (n). Depending on the value of iparm[4] and iparm[30], holds
the permutation vector of size n, specifies elements used for computing a
partial solution, or specifies differing values of the input matrices for low
rank update.
NOTE
Be aware that setting iparm[4] = 1 prevents use of a parallel
algorithm for the solve step.
iparm Array, size (64). This array is used to pass various parameters to Intel®
oneAPI Math Kernel Library (oneMKL) PARDISO and to return some useful
information after execution of the solver.
See pardiso iparm Parameter for more details about the iparm parameters.
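The descriptions of a, ia, and ja for the CSR3 format can be illustrated by building the three arrays from a small dense matrix. The sketch below is an assumption-laden example, not a library routine: dense_to_csr3 is a hypothetical name, plain int and double stand in for MKL_INT and the value type, and indices are one-based so that the last element of ia equals the number of non-zero elements plus one, as described for ia.

```c
/* Converts a dense row-major n-by-n matrix to one-based CSR3 arrays.
 * ia must hold n + 1 entries; ja and a must hold at least as many
 * entries as the matrix has non-zero elements. Returns that count. */
int dense_to_csr3(int n, const double *dense, int *ia, int *ja, double *a)
{
    int nnz = 0;
    for (int i = 0; i < n; i++) {
        ia[i] = nnz + 1;              /* one-based position of row i's first entry */
        for (int j = 0; j < n; j++) { /* ascending j keeps ja sorted per row */
            double v = dense[i * n + j];
            if (v != 0.0) {
                ja[nnz] = j + 1;      /* one-based column index */
                a[nnz]  = v;
                nnz++;
            }
        }
    }
    ia[n] = nnz + 1;                  /* number of non-zero elements, plus one */
    return nnz;
}
```

This sketch drops every zero entry, including zeros on the diagonal. For structurally symmetric matrices the diagonal elements must be stored even when they are zero, so a conversion used for such matrices would have to keep them.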
Output Parameters
(See also Intel MKL PARDISO Parameters in Tabular Form.)
iparm On output, some iparm values report information such as the numbers of
non-zero elements in the factors.
See pardiso iparm Parameter for more details about the iparm parameters.
error Information
0 no error
-1 input inconsistent
-3 reordering problem
-15 internal error which can appear for iparm[23]=10 and
iparm[12]=1. Try switching matching off (set
iparm[12]=0) and rerun.
pardisoinit
Initialize Intel® oneAPI Math Kernel Library (oneMKL)
PARDISO with default parameters in accordance with
the matrix type.
Syntax
void pardisoinit (_MKL_DSS_HANDLE_t pt, const MKL_INT *mtype, MKL_INT *iparm );
Include Files
• mkl.h
Description
This function initializes the solver handle pt for Intel® oneAPI Math Kernel Library (oneMKL) PARDISO with
zero values (as needed for the very first call of pardiso) and sets default iparm values in accordance with
the matrix type mtype.
It is recommended to avoid pardisoinit and instead initialize pt and set the values of the iparm
array manually, because the default parameters might not be the best for a particular use case.
An alternative method to set default iparm values is to call pardiso in the analysis phase with iparm[0]=0.
In this case, the solver handle pt must be initialized with zero values.
The pardisoinit routine initializes only the in-core version of Intel® oneAPI Math Kernel Library (oneMKL)
PARDISO. Switching to the out-of-core version of Intel® oneAPI Math Kernel Library (oneMKL) PARDISO as
well as changing default iparm values can be done after the call to pardisoinit but before the first call to
pardiso.
The pardisoinit routine cannot be used together with the pardiso_64 routine.
Input Parameters
mtype Matrix type. Based on this value pardisoinit chooses default values for
the iparm array. Refer to the section oneMKL PARDISO Parameters in
Tabular Form for more details about the default values of Intel® oneAPI Math
Kernel Library (oneMKL) PARDISO.
Output Parameters
NOTE
It is very important that pt is initialized with zero before the
first call of Intel® oneAPI Math Kernel Library (oneMKL)
PARDISO. After that first call you must never modify the array,
because it could cause a serious memory leak or a crash.
iparm Array of size 64. This array is used to set various options for Intel® oneAPI
Math Kernel Library (oneMKL) PARDISO and to return some useful
information after execution of the solver. The pardisoinit routine fills in
the iparm array with the default values. Refer to the section oneMKL
PARDISO Parameters in Tabular Form for more details about the default
values of Intel® oneAPI Math Kernel Library (oneMKL) PARDISO.
pardiso_64
Calculates the solution of a set of sparse linear
equations with single or multiple right-hand sides, 64-
bit integer version.
Syntax
void pardiso_64 (_MKL_DSS_HANDLE_t pt, const long long int *maxfct, const long long int
*mnum, const long long int *mtype, const long long int *phase, const long long int *n,
const void *a, const long long int *ia, const long long int *ja, long long int *perm,
const long long int *nrhs, long long int *iparm, const long long int *msglvl, void *b,
void *x, long long int *error);
Include Files
• mkl.h
Description
pardiso_64 is an alternative ILP64 (64-bit integer) version of the pardiso routine (see Description section
for more details). The interface of pardiso_64 is the same as the interface of pardiso, but it accepts and
returns all integer data as long long int.
Use pardiso_64 rather than pardiso for solving large matrices (with the number of non-zero elements on the
order of 500 million or more). You can use it together with the usual LP64 interfaces for the rest of Intel®
oneAPI Math Kernel Library (oneMKL) functionality. In other words, if you use the 64-bit integer version
(pardiso_64), you do not need to re-link your applications with ILP64 libraries. Take into account that
pardiso_64 may perform slower than regular pardiso on the reordering and symbolic factorization phase.
NOTE
pardiso_64 is supported only in the 64-bit libraries. If pardiso_64 is called from the 32-bit libraries,
it returns error = -12.
NOTE
This routine supports the Progress Routine feature. See Progress Function for details.
Input Parameters
The input parameters of pardiso_64 are the same as the input parameters of pardiso, but pardiso_64
accepts all integer data as long long int.
Output Parameters
The output parameters of pardiso_64 are the same as the output parameters of pardiso, but pardiso_64
returns all integer data as long long int.
mkl_pardiso_pivot
Replaces routine which handles Intel® oneAPI Math
Kernel Library (oneMKL) PARDISO pivots with user-
defined routine.
Syntax
void mkl_pardiso_pivot (const void *ai, void *bi, const void *eps);
Include Files
• mkl.h
Description
The mkl_pardiso_pivot routine allows you to handle diagonal elements which arise during numerical
factorization that are zero or near zero. By default, Intel® oneAPI Math Kernel Library (oneMKL) PARDISO
determines that a diagonal element bi is a pivot if bi < eps, and if so, replaces it with eps. But you can
provide your own routine to modify the resulting factorized matrix in case there are small elements on the
diagonal during the factorization step.
NOTE
To use this routine, you must set iparm[55] to 1 before the main pardiso loop.
NOTE
The matrix types mtype=2 (symmetric positive-definite matrix) and mtype=4 (complex and
Hermitian positive definite) are not supported, because the Cholesky factorization without
pivoting is used for these matrix types.
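A user-defined pivot handler with the parameter shape shown in the Syntax section might look like the sketch below. Several points are assumptions: the function name sketch_pivot_real is hypothetical, the data is taken to be real double precision, and the ai argument is ignored because its meaning is not described in this section. The sketch also compares magnitudes and keeps the sign of the small element, which is a choice made for this example and differs from the default rule quoted above (bi < eps, replaced by eps).

```c
/* Illustrative pivot handler for real double-precision data.
 * If the magnitude of *bi is below *eps, *bi is replaced by eps with
 * the sign of the original value (zero is treated as positive);
 * otherwise *bi is left unchanged. */
void sketch_pivot_real(const void *ai, void *bi, const void *eps)
{
    const double threshold = *(const double *)eps;
    double *pivot = (double *)bi;

    (void)ai;  /* not used in this sketch */

    if (*pivot < threshold && *pivot > -threshold)
        *pivot = (*pivot < 0.0) ? -threshold : threshold;
}
```

Keeping the sign avoids turning a small negative diagonal element into a positive one, which would change the inertia reported for symmetric indefinite matrices.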
Input Parameters
Output Parameters
bi If the element is chosen as a pivot, the value with which to replace it.
pardiso_getdiag
Returns diagonal elements of initial and factorized
matrix.
Syntax
void pardiso_getdiag (const _MKL_DSS_HANDLE_t pt, void *df, void *da, const MKL_INT
*mnum, MKL_INT *error);
Include Files
• mkl.h
Description
This routine returns the diagonal elements of the initial and factorized matrix for a real or Hermitian matrix.
NOTE
In order to use this routine, you must set iparm[55] to 1 before the main pardiso loop.
If iparm[23] is set to 10 (an improved two-level factorization algorithm for nonsymmetric matrices),
Intel® oneAPI Math Kernel Library PARDISO will automatically use the classic algorithm for
factorization.
Input Parameters
pt Array with a size of 64. Handle to internal data structure for the Intel®
oneAPI Math Kernel Library (oneMKL) PARDISO solver. The entries must be
set to zero prior to the first call to pardiso. Unique for factorization.
mnum Indicates the actual matrix for the solution phase of the Intel® oneAPI Math
Kernel Library (oneMKL) PARDISO solver. With this scalar you can define the
diagonal elements of the factorized matrix that you want to obtain. The
value must be: 1 ≤mnum ≤ maxfct. In most applications this value is 1.
Output Parameters
NOTE
Elements of df correspond to diagonal elements of matrix
L computed during phase 22. Because during phase 22 Intel®
oneAPI Math Kernel Library (oneMKL) PARDISO makes
additional permutations to improve stability, it is possible that
array df is not in line with the perm array computed during phase
11.
error Information
0 no error
pardiso_export
Places pointers dedicated for sparse representation of
a requested matrix (values, rows, and columns) into
MKL PARDISO
Syntax
void pardiso_export (const _MKL_DSS_HANDLE_t pt, void* values, MKL_INT* rows, MKL_INT*
columns, MKL_INT* step, MKL_INT* iparm, MKL_INT* error);
Include Files
• mkl.h
Description
This auxiliary routine places pointers dedicated for sparse representation of a requested matrix (values,
rows, and columns) into MKL PARDISO. The matrix will be stored in the three-array variant of the
compressed sparse row (CSR3 format) with 0-based indexing.
NOTE
Currently, this routine can be used only for a sparse Schur complement matrix. All
parameters related to the Schur complement matrix (perm, iparm) must be set before the
reordering stage of MKL PARDISO (phase = 11) is called.
Input Parameters
pt Array with a size of 64. Handle to internal data structure for the Intel®
MKL PARDISO solver. The entries must be set to zero prior to the first
call to pardiso. Unique for factorization.
iparm This array is used to pass various parameters to Intel® MKL PARDISO
and to return some useful information after execution of the solver.
Step value  Notes
1           Used to place pointers related to a Schur complement
            matrix in MKL PARDISO. The routine with step equal to
            1 must be called between the reordering and
            factorization phases of MKL PARDISO.
-1          Used to clean the internal handle.
Input/Output Parameters
• 0 indicates no error.
• 1 indicates inconsistent input data.
Usage Example
The following C-style example demonstrates how to use the pardiso_export routine to get the sparse
representation (that is, three-array CSR format) of a Schur complement matrix.
#include "mkl.h"
/*
* Call the reordering phase of MKL PARDISO with iparm[35] set to -1 in
* order to compute the Schur complement matrix only, or -2 to compute all
* factorization arrays. perm array indices related to the Schur complement
* matrix must be set to 1.
*/
phase = 11;
for ( i = 0; i < schur_size; i++ ) { perm[i] = 1; }
iparm[35] = -1;
pardiso(pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja, perm, &nrhs,
iparm, &msglvl, b, x, &error);
/*
* After the reordering phase, iparm[35] will contain the number of non-zero
* elements for the Schur complement matrix. Arrays dedicated to the sparse
* representation of the Schur complement matrix must be allocated before
* the factorization stage of MKL PARDISO is called.
*/
schur_nnz = iparm[35];
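/* mkl_malloc takes a size in bytes, so scale each element count by the element size. */
schur_rows = (MKL_INT *) mkl_malloc((schur_size + 1) * sizeof(MKL_INT), ALIGNMENT);
schur_columns = (MKL_INT *) mkl_malloc(schur_nnz * sizeof(MKL_INT), ALIGNMENT);
schur_values = (DATA_TYPE *) mkl_malloc(schur_nnz * sizeof(DATA_TYPE), ALIGNMENT);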
/*
* Call to the pardiso_export routine with step equal to 1 in order to put
* pointers related to the three-array CSR format into MKL PARDISO:
*/
step = 1;
pardiso_export(pt, schur_values, schur_rows, schur_columns, &step, iparm, &error);
/*
* Call the factorization phase of PARDISO with iparm[35] equal to -1 or -2
* to compute the Schur complement matrix:
*/
phase = 22;
iparm[35] = -1;
pardiso(pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja, perm, &nrhs,
iparm, &msglvl, b, x, &error);
/*
* After the factorization stage, schur_values, schur_rows, and
* schur_columns will contain the Schur complement matrix in CSR3 format.
*/
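Once the three arrays are filled, individual entries of the Schur complement can be read back by walking one row of the CSR structure. The helper below is a minimal sketch and not part of the library: the name csr3_get, the csr_int stand-in for MKL_INT, the use of double for DATA_TYPE, and the explicit base argument (0 or 1, matching the indexing base the arrays were produced with) are all assumptions made for illustration.

```c
#include <assert.h>

/* Stand-in for MKL_INT so the sketch compiles without mkl.h (assumption). */
typedef long long csr_int;

/*
 * Return element (row, col) of a matrix stored in three-array CSR format.
 * rows has n+1 entries; the entries of row i occupy positions
 * rows[i] .. rows[i+1]-1 of columns and values (after removing the base).
 * base is 0 for zero-based or 1 for one-based indexing and applies to the
 * stored indices as well as to the row and col arguments.
 * Entries that are not stored are structural zeros.
 */
double csr3_get(const csr_int *rows, const csr_int *columns,
                const double *values, csr_int base,
                csr_int row, csr_int col)
{
    csr_int r, k;

    assert(base == 0 || base == 1);
    r = row - base;
    for (k = rows[r] - base; k < rows[r + 1] - base; k++) {
        if (columns[k] == col) {
            return values[k];
        }
    }
    return 0.0;
}
```

With the arrays from the example above, a call would look like csr3_get(schur_rows, schur_columns, schur_values, base, i, j), where base is whichever indexing base was in effect for the pardiso call.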
pardiso_handle_store
Store internal structures from pardiso to a file.
Syntax
void pardiso_handle_store (_MKL_DSS_HANDLE_t pt, const char *dirname, MKL_INT *error);
Include Files
• mkl.h
Description
This function stores Intel® oneAPI Math Kernel Library (oneMKL) PARDISO internal structures to a file, so
that you can save them between the stages of the pardiso routine. The pardiso_handle_restore routine can
restore the Intel® oneAPI Math Kernel Library (oneMKL) PARDISO internal structures from the file.
Input Parameters
dirname String containing the name of the directory to which to write the files with
the content of the internal structures. Use an empty string ("") to specify
the current directory. The routine creates a file named handle.pds in the
directory.
Output Parameters
error Information
0 No error.
-11 Error while writing to file.
pardiso_handle_restore
Restore pardiso internal structures from a file.
Syntax
void pardiso_handle_restore (_MKL_DSS_HANDLE_t pt, const char *dirname, MKL_INT
*error);
Include Files
• mkl.h
Description
This function restores Intel® oneAPI Math Kernel Library (oneMKL) PARDISO structures from a file. This
allows you to restore Intel® oneAPI Math Kernel Library (oneMKL) PARDISO internal structures stored
by pardiso_handle_store after a phase of the pardiso routine and continue execution of the next phase.
Input Parameters
dirname String containing the name of the directory in which the file with the
content of the internal structures is located. Use an empty string ("") to
specify the current directory.
Output Parameters
error Information
0 No error.
pardiso_handle_delete
Delete files with pardiso internal structure data.
Syntax
void pardiso_handle_delete (const char *dirname, MKL_INT *error);
Include Files
• mkl.h
Description
This function deletes files generated with pardiso_handle_store that contain Intel® oneAPI Math Kernel
Library (oneMKL) PARDISO internal structures.
Input Parameters
dirname String containing the name of the directory in which the file with the
content of the internal structures is located. Use an empty string ("") to
specify the current directory.
Output Parameters
error Information
0 No error.
pardiso_handle_store_64
Store internal structures from pardiso_64 to a file.
Syntax
void pardiso_handle_store_64 (_MKL_DSS_HANDLE_t pt, const char *dirname, MKL_INT
*error);
Include Files
• mkl.h
Description
This function stores Intel® oneAPI Math Kernel Library (oneMKL) PARDISO internal structures to a file, so
that you can save them between the stages of the pardiso_64 routine. The pardiso_handle_restore_64 routine
can restore the Intel® oneAPI Math Kernel Library (oneMKL) PARDISO internal structures from the file.
Input Parameters
dirname String containing the name of the directory to which to write the files with
the content of the internal structures. Use an empty string ("") to specify
the current directory. The routine creates a file named handle.pds in the
directory.
Output Parameters
error Information
0 No error.
-2 Not enough memory.
pardiso_handle_restore_64
Restore pardiso_64 internal structures from a file.
Syntax
void pardiso_handle_restore_64 (_MKL_DSS_HANDLE_t pt, const char *dirname, MKL_INT
*error);
Include Files
• mkl.h
Description
This function restores Intel® oneAPI Math Kernel Library (oneMKL) PARDISO structures from a file. This
allows you to restore Intel® oneAPI Math Kernel Library (oneMKL) PARDISO internal structures stored
by pardiso_handle_store_64 after a phase of the pardiso_64 routine and continue execution of the next
phase.
Input Parameters
dirname String containing the name of the directory in which the file with the
content of the internal structures is located. Use an empty string ("") to
specify the current directory.
Output Parameters
error Information
0 No error.
pardiso_handle_delete_64
Delete files with pardiso_64 internal structure data.
Syntax
void pardiso_handle_delete_64 (const char *dirname, MKL_INT *error);
Include Files
• mkl.h
Description
This function deletes files generated with pardiso_handle_store_64 that contain Intel® oneAPI Math Kernel
Library (oneMKL) PARDISO internal structures.
Input Parameters
dirname String containing the name of the directory in which the file with the
content of the internal structures is located. Use an empty string ("") to
specify the current directory.
Output Parameters
error Information
0 No error.
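The error values returned by the handle routines can be turned into readable messages with a small lookup. The sketch below covers only the codes that appear in the tables of this section (0, -2, and -11); the function name describe_handle_error is hypothetical, and long long is used as a stand-in for MKL_INT so that the block compiles without mkl.h. Any other value is reported as unrecognized rather than guessed at.

```c
/*
 * Map an error value from the pardiso_handle_* routines to a message.
 * Only the codes listed in this section are covered (assumption: other
 * codes may exist and are reported as unrecognized here).
 */
const char *describe_handle_error(long long error)
{
    switch (error) {
    case 0:
        return "No error.";
    case -2:
        return "Not enough memory.";
    case -11:
        return "Error while writing to file.";
    default:
        return "Unrecognized error code.";
    }
}
```

A caller would typically check error after each store, restore, or delete call and stop before starting the next pardiso phase if the value is nonzero.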
Parameter Type Description Values Comments In/Out
NOTE
Unless you have specified low rank update, iparm[34] indicates whether row/column
indexing starts from 1 or 0.
Parameter: nrhs
Type: MKL_INT*
Description: Number of right-hand sides that need to be solved for
Values: >=0
Comments: Generally used value is 1. To obtain better Intel® oneAPI Math Kernel Library (oneMKL) PARDISO performance, during the numerical factorization phase you can provide the maximum number of
In/Out: in
NOTE If a two-level factorization algorithm is chosen (that is, iparm[23]=1), then only
nested dissection algorithms are available (iparm[1]=2 or iparm[1]=3).
Component Description
NOTE
Setting iparm[1] = 3 prevents the use of CNR mode (iparm[33] > 0)
because Intel® oneAPI Math Kernel Library (oneMKL) PARDISO uses dynamic
parallelism.
iparm[2]
Reserved. Set to zero.
iparm[3]
Preconditioned CGS/CG.
input This parameter controls preconditioned CGS [Sonn89] for nonsymmetric or structurally
symmetric matrices and Conjugate-Gradients for symmetric matrices. iparm[3] has
the form iparm[3]= 10*L+K.
K=0
The factorization is always computed as required by phase.
K=1
CGS iteration replaces the computation of LU. The preconditioner is LU that
was computed at a previous step (the first step or last step with a failure) in a
sequence of solutions needed for identical sparsity patterns.
K=2
CGS iteration for symmetric positive definite matrices replaces the computation
of L*L^T. The preconditioner is L*L^T that was computed at a previous step (the
first step or last step with a failure) in a sequence of solutions needed for
identical sparsity patterns.
The value L controls the stopping criterion of the Krylov Subspace iteration:
epsCGS = 10^{-L} is used in the stopping criterion
||dx_i|| / ||dx_0|| < epsCGS
where ||dx_i|| = ||inv(L*U)*r_i|| for K = 1 or ||dx_i|| = ||inv(L*L^T)*r_i|| for
K = 2 and r_i is the residual at iteration i of the preconditioned Krylov Subspace
iteration.
A maximum number of 150 iterations is fixed with the assumption that the iteration
will converge before consuming half the factorization time. Intermediate convergence
rates and residual excursions are checked and can terminate the iteration process. If
phase = 23, then the factorization for a given A is automatically recomputed in cases
where the Krylov Subspace iteration failed, and the corresponding direct solution is
returned. Otherwise the solution from the preconditioned Krylov Subspace iteration is
returned. Using phase = 33 results in an error message (error = -4) if the stopping
criteria for the Krylov Subspace iteration cannot be reached. More information on the
failure can be obtained from iparm[19].
The default is iparm[3]=0, and other values are only recommended for an advanced
user. iparm[3] must be greater than or equal to zero.
Examples:
iparm[3] Description
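The encoding iparm[3] = 10*L + K described above can be sketched as a pair of helper functions. The names cgs_encode and cgs_decode are hypothetical and not part of the library; the sketch uses plain int instead of MKL_INT so that it compiles without mkl.h, and it computes epsCGS = 10^{-L} by repeated division rather than by calling a math library routine.

```c
#include <assert.h>

/* Compose iparm[3] = 10*L + K from the tolerance exponent L and mode K. */
int cgs_encode(int L, int K)
{
    assert(L >= 0);
    assert(K >= 0 && K <= 2);
    return 10 * L + K;
}

/*
 * Split a value of iparm[3] back into L and K and return the tolerance
 * epsCGS = 10^(-L) used in the stopping criterion.
 */
double cgs_decode(int iparm3, int *L, int *K)
{
    double eps = 1.0;
    int i;

    assert(iparm3 >= 0);
    *K = iparm3 % 10;
    *L = iparm3 / 10;
    for (i = 0; i < *L; i++) {
        eps /= 10.0;
    }
    return eps;
}
```

For instance, cgs_encode(3, 1) yields 31, which selects K = 1 with a tolerance of about 1.0E-3, and cgs_encode(6, 2) yields 62, which selects K = 2 with a tolerance of about 1.0E-6.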
iparm[4]
User permutation.
input
This parameter controls whether user supplied fill-in reducing permutation is used
instead of the integrated multiple-minimum degree or nested dissection algorithms.
Another use of this parameter is to control obtaining the fill-in reducing permutation
vector calculated during the reordering stage of Intel® oneAPI Math Kernel Library
(oneMKL) PARDISO.
This option is useful for testing reordering algorithms, adapting the code to special
application problems (for instance, to move zero diagonal elements to the end of
P*A*P^T), or for using the permutation vector more than once for matrices with
identical sparsity structures. For the definition of the permutation, see the description of
the perm parameter.
Caution
You can only set one of iparm[4], iparm[30], and iparm[35], so be sure that the
iparm[30] (partial solution) and the iparm[35] (Schur complement) parameters are 0
if you set iparm[4].
0*
User permutation in the perm array is ignored.
1
Intel® oneAPI Math Kernel Library (oneMKL) PARDISO uses the user supplied
fill-in reducing permutation from the perm array. iparm[1] is ignored.
NOTE
Setting iparm[4] = 1 prevents use of a parallel algorithm for the solve step.
2
Intel® oneAPI Math Kernel Library (oneMKL) PARDISO returns the permutation
vector computed at phase 1 in the perm array.
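The restriction stated in the Caution for iparm[4] can be checked before calling the solver. The following is a minimal sketch under stated assumptions: the function name iparm_exclusive_ok is hypothetical, plain int stands in for MKL_INT so that the block compiles without mkl.h, and the array passed in must have at least 36 entries.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return 1 if at most one of iparm[4] (user permutation), iparm[30]
 * (partial solution), and iparm[35] (Schur complement) is nonzero,
 * and 0 otherwise.
 */
int iparm_exclusive_ok(const int *iparm)
{
    int active = 0;

    assert(iparm != NULL);
    active += (iparm[4] != 0);
    active += (iparm[30] != 0);
    active += (iparm[35] != 0);
    return active <= 1;
}
```

Calling such a check right after filling iparm makes a conflicting combination visible before the reordering phase starts, rather than as a solver error later.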
iparm[5]
Write solution on x.
input
NOTE
The array x is always used.
0*
The array x contains the solution; right-hand side vector b is kept unchanged.
1
The solver stores the solution on the right-hand side b.
iparm[6]
Number of iterative refinement steps performed.
output Reports the number of iterative refinement steps that were actually performed during
the solve step.
iparm[7]
Iterative refinement step.
input On entry to the solve and iterative refinement step, iparm[7] must be set to the
maximum number of iterative refinement steps that the solver performs.
2*
The solver automatically performs two steps of iterative refinement when
iparm[0] is set to 0.
>0
Maximum number of iterative refinement steps that the solver performs. The
solver will stop the iterative refinement process if:
• a satisfactory level of accuracy of the solution in terms of backward error
(see the iparm[8] description) is achieved,
NOTE Currently, this feature is supported only for sequential and OpenMP
threading.
iparm[8] input
Tolerance level for the backward error in the iterative refinement process. If set to a
non-zero value, the following criterion is used for stopping the iterative refinement:
where x is the computed solution and b is the right-hand side. A modulus sign on a
vector or a matrix is used to indicate the vector or matrix obtained by replacing all
entries by their moduli (absolute values). The i-th component of the scaled residual
8*
The default value for symmetric indefinite matrices (mtype = -2, mtype = -4,
mtype = 6), eps = 10^{-8}.
iparm[10]
Scaling vectors.
input Intel® oneAPI Math Kernel Library (oneMKL) PARDISO uses a maximum weight
matching algorithm to permute large elements on the diagonal and to scale so that the
diagonal elements are equal to 1 and the absolute values of the off-diagonal entries
are less than or equal to 1. This scaling method is applied only to nonsymmetric
matrices (mtype = 11 or mtype = 13). The scaling can also be used for symmetric
indefinite matrices (mtype = -2, mtype =-4, or mtype = 6) when the symmetric
weighted matchings are applied (iparm[12] = 1).
Use iparm[10] = 1 (scaling) and iparm[12] = 1 (matching) for highly indefinite
symmetric matrices, for example, from interior point optimizations or saddle point
problems. Note that in the analysis phase (phase=11) you must provide the numerical
values of the matrix A in array a in case of scaling and symmetric weighted matching.
0*
Disable scaling. Default for symmetric indefinite matrices.
1*
Enable scaling. Default for nonsymmetric matrices.
Scale the matrix so that the diagonal elements are equal to 1 and the absolute
values of the off-diagonal entries are less than or equal to 1. This scaling method is
applied to nonsymmetric matrices (mtype = 11, mtype = 13). The scaling can
also be used for symmetric indefinite matrices (mtype = -2, mtype = -4,
mtype = 6) when the symmetric weighted matchings are applied (iparm[12]
= 1).
Note that in the analysis phase (phase=11) you must provide the numerical
values of the matrix A in case of scaling.
iparm[11]
Solve with transposed or conjugate transposed matrix A.
input
NOTE
For real matrices, the terms transposed and conjugate transposed are equivalent.
0*
Solve a linear system AX = B.
1
Solve a conjugate transposed system A^H X = B based on the factorization of the
matrix A.
2
Solve a transposed system A^T X = B based on the factorization of the matrix A.
iparm[12]
Improved accuracy using (non-) symmetric weighted matching.
input Intel® oneAPI Math Kernel Library (oneMKL) PARDISO can use a maximum weighted
matching algorithm to permute large elements close to the diagonal. This strategy adds
an additional level of reliability to the factorization methods and complements the
alternative of using more complete pivoting techniques during the numerical
factorization.
0*
Disable matching. Default for symmetric indefinite matrices.
1*
Enable matching. Default for nonsymmetric matrices.
Maximum weighted matching algorithm to permute large elements close to the
diagonal.