A = U * SIGMA * transpose(V)
where SIGMA is an M-by-N matrix which is zero except for its
min(M,N) diagonal elements, U is an M-by-M orthogonal matrix, and
V is an N-by-N orthogonal matrix. The diagonal elements of SIGMA
are the singular values of A, and the columns of U and V are the
corresponding left and right singular vectors, respectively. The
singular values are returned in array S in decreasing order and
only the first min(M,N) columns of U and rows of VT = V**T are
computed.
Notes
=====
Each global data object is described by an associated description
vector. This vector stores the information required to establish
the mapping between an object element and its corresponding process
and memory location.
Let A be a generic term for any 2D block cyclically distributed array.
Such a global array has an associated description vector DESCA.
In the following comments, the character _ should be read as
"of the global array".
NOTATION        STORED IN       EXPLANATION
--------------- --------------- --------------------------------------
DTYPE_A(global) DESCA( DTYPE_ ) The descriptor type. In this case,
                                DTYPE_A = 1.
CTXT_A (global) DESCA( CTXT_ )  The BLACS context handle, indicating
                                the BLACS process grid A is
                                distributed over. The context itself
                                is global, but the handle (the
                                integer value) may vary.
M_A    (global) DESCA( M_ )     The number of rows in the global
                                array A.
N_A    (global) DESCA( N_ )     The number of columns in the global
                                array A.
MB_A   (global) DESCA( MB_ )    The blocking factor used to distribute
                                the rows of the array.
NB_A   (global) DESCA( NB_ )    The blocking factor used to distribute
                                the columns of the array.
RSRC_A (global) DESCA( RSRC_ )  The process row over which the first
                                row of the array A is distributed.
CSRC_A (global) DESCA( CSRC_ )  The process column over which the
                                first column of the array A is
                                distributed.
LLD_A  (local)  DESCA( LLD_ )   The leading dimension of the local
                                array. LLD_A >= MAX(1,LOCr(M_A)).
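The descriptor entries above can be mirrored in a small record for illustration. The following is a minimal sketch, not part of ScaLAPACK: the class ArrayDescriptor and the helper make_descriptor are hypothetical names, the fields simply follow the order of the table, and the local row count LOCr(M_A) is assumed to be supplied by the caller.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class ArrayDescriptor:
    """Hypothetical mirror of the nine DESCA entries, in table order."""
    dtype: int  # DTYPE_A: descriptor type, 1 in this case
    ctxt: int   # CTXT_A: BLACS context handle
    m: int      # M_A: rows of the global array
    n: int      # N_A: columns of the global array
    mb: int     # MB_A: row blocking factor
    nb: int     # NB_A: column blocking factor
    rsrc: int   # RSRC_A: process row holding the first global row
    csrc: int   # CSRC_A: process column holding the first global column
    lld: int    # LLD_A: leading dimension of the local array


def make_descriptor(ctxt, m, n, mb, nb, rsrc, csrc, lld, local_rows):
    """Build a descriptor after checking LLD_A >= MAX(1, LOCr(M_A)).

    local_rows stands for LOCr(M_A) on the calling process (an input
    assumed to be known here, e.g. from NUMROC).
    """
    if mb < 1 or nb < 1:
        raise ValueError("blocking factors must be positive")
    if lld < max(1, local_rows):
        raise ValueError("LLD_A must be at least MAX(1, LOCr(M_A))")
    return ArrayDescriptor(1, ctxt, m, n, mb, nb, rsrc, csrc, lld)


# Example values are made up for illustration only.
desc = make_descriptor(ctxt=0, m=10, n=8, mb=3, nb=3,
                       rsrc=0, csrc=0, lld=6, local_rows=6)
```

Only LLD_A is local in this record; every other field is expected to hold the same value on all processes of the grid.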
Let K be the number of rows or columns of a distributed matrix, and
assume that its process grid has dimension p x q. LOCr( K ) denotes
the number of elements of K that a process would receive if K were
distributed over the p processes of its process column. Similarly,
LOCc( K ) denotes the number of elements of K that a process would
receive if K were distributed over the q processes of its process
row. The values of LOCr() and LOCc() may be determined via a call
to the ScaLAPACK tool function, NUMROC:
LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ),
LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ).
An upper bound for these quantities may be computed by:
LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A
LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A
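To make the counting rule concrete, the following is a minimal Python sketch of the block-cyclic local count together with the upper bound quoted above. It is an illustrative port written for this note, not the library routine itself; the function names numroc and loc_upper_bound and the sample sizes are assumptions.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Number of rows or columns of an n-element dimension, distributed
    in blocks of nb over nprocs processes starting at isrcproc, that
    land on process iproc."""
    # Distance of this process from the one owning the first block.
    mydist = (nprocs + iproc - isrcproc) % nprocs
    nblocks = n // nb                     # number of full blocks
    count = (nblocks // nprocs) * nb      # full rounds every process gets
    extra = nblocks % nprocs              # leftover full blocks
    if mydist < extra:
        count += nb                       # one more full block
    elif mydist == extra:
        count += n % nb                   # the trailing partial block
    return count


def loc_upper_bound(n, nb, nprocs):
    """ceil( ceil(n/nb)/nprocs )*nb, using integer arithmetic."""
    def ceil_div(a, b):
        return -(-a // b)
    return ceil_div(ceil_div(n, nb), nprocs) * nb


# 10 rows in blocks of 3 over 2 process rows, first block on row 0.
local_counts = [numroc(10, 3, p, 0, 2) for p in range(2)]
```

Summing the local counts over all processes of a grid dimension recovers the global size, and no process exceeds the upper bound.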
MP    = number of local rows in A and U
NQ    = number of local columns in A and VT
SIZE  = min( M, N )
SIZEQ = number of local columns in U
SIZEP = number of local rows in VT
LWORK > 2 + 6*SIZEB + MAX(WATOBD, WBDTOSVD),
where SIZEB = MAX(M,N), and WATOBD and WBDTOSVD refer, respectively, to the workspace required to bidiagonalize the matrix A and to go from the bidiagonal matrix to the singular value decomposition U*S*VT.
For WATOBD, the following holds:
WATOBD = MAX(MAX(WPDLANGE,WPDGEBRD), MAX(WPDLARED2D,WPDLARED1D)),
where WPDLANGE, WPDLARED1D, WPDLARED2D, WPDGEBRD are the workspaces required respectively for the subprograms PDLANGE, PDLARED1D, PDLARED2D, PDGEBRD. Using the standard notation
MP = NUMROC( M, MB, MYROW, DESCA( RSRC_ ), NPROW ),
NQ = NUMROC( N, NB, MYCOL, DESCA( CSRC_ ), NPCOL ),
the workspaces required for the above subprograms are
WPDLANGE   = MP,
WPDLARED1D = NQ0,
WPDLARED2D = MP0,
WPDGEBRD   = NB*(MP + NQ + 1) + NQ,
where NQ0 and MP0 refer, respectively, to the values obtained at MYCOL = 0 and MYROW = 0. In general, the upper limit for the workspace is given by the workspace required on processor (0,0):
WATOBD <= NB*(MP0 + NQ0 + 1) + NQ0.
In case of a homogeneous process grid this upper limit can be used as an estimate of the minimum workspace for every processor.
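The WATOBD expressions can be evaluated directly once the local sizes are known. The following is a minimal sketch under the assumption that MP, NQ, MP0 and NQ0 have already been computed (for example with NUMROC); the function names watobd and watobd_upper and the sample numbers are hypothetical.

```python
def watobd(mp, nq, mp0, nq0, nb):
    """Workspace for the bidiagonalization stage on one process,
    following the formulas in the text."""
    wpdlange = mp
    wpdlared1d = nq0
    wpdlared2d = mp0
    wpdgebrd = nb * (mp + nq + 1) + nq
    return max(max(wpdlange, wpdgebrd), max(wpdlared2d, wpdlared1d))


def watobd_upper(mp0, nq0, nb):
    """Upper limit, i.e. the value obtained on process (0,0)."""
    return nb * (mp0 + nq0 + 1) + nq0


# Illustrative local sizes: process (0,0) holds 6 rows and 5 columns,
# another process holds 4 rows and 3 columns; NB = 3.
on_origin = watobd(mp=6, nq=5, mp0=6, nq0=5, nb=3)
elsewhere = watobd(mp=4, nq=3, mp0=6, nq0=5, nb=3)
limit = watobd_upper(mp0=6, nq0=5, nb=3)
```

In this example the value on process (0,0) coincides with the upper limit, while the other process needs less.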
For WBDTOSVD, the following holds:
WBDTOSVD = SIZE*(WANTU*NRU + WANTVT*NCVT) + MAX(WDBDSQR, MAX(WANTU*WPDORMBRQLN, WANTVT*WPDORMBRPRT)),
where
  WANTU(WANTVT) = 1, if left(right) singular vectors are wanted,
                  0, otherwise,
and WDBDSQR, WPDORMBRQLN and WPDORMBRPRT refer respectively to the workspace required for the subprograms DBDSQR, PDORMBR(QLN), and PDORMBR(PRT), where QLN and PRT are the values of the arguments VECT, SIDE, and TRANS in the call to PDORMBR. NRU is equal to the local number of rows of the matrix U when distributed across a 1-dimensional "column" of processes. Analogously, NCVT is equal to the local number of columns of the matrix VT when distributed across a 1-dimensional "row" of processes. Calling the LAPACK procedure DBDSQR requires
WDBDSQR = MAX(1, 2*SIZE + (2*SIZE - 4)*MAX(WANTU, WANTVT))
on every processor. Finally,
WPDORMBRQLN = MAX( (NB*(NB-1))/2, (SIZEQ+MP)*NB ) + NB*NB,
WPDORMBRPRT = MAX( (MB*(MB-1))/2, (SIZEP+NQ)*MB ) + MB*MB.
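The WBDTOSVD formulas and the overall LWORK bound can be combined into a short calculation. The following is a minimal sketch that treats the local sizes (MP, NQ, SIZEQ, SIZEP, NRU, NCVT) and WATOBD as given inputs; the function names wbdtosvd and lwork_bound and all sample values are assumptions made for illustration.

```python
def wbdtosvd(m, n, mb, nb, mp, nq, sizeq, sizep, nru, ncvt, wantu, wantvt):
    """Workspace for the bidiagonal-to-SVD stage on one process.
    wantu and wantvt are 1 if the corresponding vectors are wanted,
    0 otherwise."""
    size = min(m, n)
    wdbdsqr = max(1, 2 * size + (2 * size - 4) * max(wantu, wantvt))
    # nb*(nb-1) is always even, so integer division is exact.
    wpdormbrqln = max((nb * (nb - 1)) // 2, (sizeq + mp) * nb) + nb * nb
    wpdormbrprt = max((mb * (mb - 1)) // 2, (sizep + nq) * mb) + mb * mb
    return (size * (wantu * nru + wantvt * ncvt)
            + max(wdbdsqr, max(wantu * wpdormbrqln, wantvt * wpdormbrprt)))


def lwork_bound(m, n, watobd, wbdtosvd_value):
    """Right-hand side of LWORK > 2 + 6*SIZEB + MAX(WATOBD, WBDTOSVD).
    The inequality is strict, so LWORK must exceed this value."""
    sizeb = max(m, n)
    return 2 + 6 * sizeb + max(watobd, wbdtosvd_value)


# Illustrative 10-by-8 problem with MB = NB = 3 and both sets of
# singular vectors requested; local sizes are example inputs.
w_svd = wbdtosvd(m=10, n=8, mb=3, nb=3, mp=6, nq=5, sizeq=5, sizep=5,
                 nru=3, ncvt=2, wantu=1, wantvt=1)
bound = lwork_bound(m=10, n=8, watobd=41, wbdtosvd_value=w_svd)
```

When no singular vectors are requested, only the DBDSQR term remains in WBDTOSVD.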
If LWORK = -1, then LWORK is global input and a workspace query is assumed; the routine only calculates the minimum size for the work array. The required workspace is returned as the first element of WORK and no error message is issued by PXERBLA.