This section describes the calling sequence arguments for vectors and matrices, and shows how to distribute vectors, matrices, and sequences in your program for the following areas:
An example of block-cyclic distribution of a global matrix in a Fortran 90 program in a message passing environment is shown in Appendix B, Sample Programs. See the following:
For the Level 2 and 3 PBLAS, Dense Linear Algebraic Equations, and Eigensystem Analysis and Singular Value Analysis subroutines, certain calling sequence arguments are used to specify block-cyclically-distributed vectors or matrices.
Table 14 describes the arguments associated with a vector X. Table 15 describes the arguments associated with a matrix A.
Table 14. Calling Sequence Arguments for a Block-Cyclically-Distributed Vector
| Argument | Meaning |
|---|---|
| x | is the local part of the global matrix X. To determine the size of the local array for X, see Determining the Number of Rows and Columns in Your Local Arrays. |
| ix | is the row index of global matrix X. |
| jx | is the column index of global matrix X. |
| desc_x | is the array descriptor for global matrix X. (See Table 16.) |
| incx | is the stride for global vector X. |
Table 15. Calling Sequence Arguments for a Block-Cyclically-Distributed Matrix
| Argument | Meaning |
|---|---|
| a | is the local part of the global matrix A. To determine the size of the local array for A, see Determining the Number of Rows and Columns in Your Local Arrays. |
| ia | is the row index of the global matrix A. |
| ja | is the column index of the global matrix A. |
| desc_a | is the array descriptor for global matrix A. (See Table 16.) |
An array descriptor, which is an integer array, is needed for each block-cyclically-distributed vector or matrix. The process grid definition and array descriptor are used to establish the mapping between the global vector or matrix and its corresponding process and distributed memory location.
Throughout this book, the _ (underscore) symbol in the array descriptor is followed by an X to indicate a vector or an A to indicate a matrix.
An example of setting up descriptor arrays in a Fortran 90 program is shown in Appendix B, Sample Programs. See the subroutines initialize_rarray and initialize_carray in Module Scale.
Table 16 shows the type-1 array descriptor, as it is used in the Level 2 and 3 PBLAS, Dense Linear Algebraic Equations, and Eigensystem Analysis and Singular Value Analysis subroutines.
Table 16. Type-1 Array Descriptor for Block-Cyclically Distributed Vector or Matrix
| DESC_( ) | Symbolic name | Meaning |
|---|---|---|
| 1 | DTYPE_ | Descriptor type, where DTYPE_=1 |
| 2 | CTXT_ | BLACS context in which the global matrix is defined. (See Initializing the BLACS.) |
| 3 | M_ | Number of rows in the global matrix |
| 4 | N_ | Number of columns in the global matrix |
| 5 | MB_ | Row block size |
| 6 | NB_ | Column block size |
| 7 | RSRC_ | The process row of the p × q process grid over which the first row of the global matrix is distributed |
| 8 | CSRC_ | The process column of the p × q process grid over which the first column of the global matrix is distributed |
| 9 | LLD_ | Leading dimension of the local array. (See Determining the Number of Rows and Columns in Your Local Arrays.) This value may be different on each process. |
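As a minimal sketch of how the nine entries of Table 16 fit together, the following Python helper assembles a type-1 descriptor as a plain list. The function name `make_type1_descriptor` and the use of `0` as a stand-in for a BLACS context handle are assumptions made for illustration; they are not part of Parallel ESSL. The sample values are those of the column-oriented vector example (length 24, block size 3) given in this section.

```python
from typing import List


def make_type1_descriptor(ctxt: int, m: int, n: int, mb: int, nb: int,
                          rsrc: int, csrc: int, lld: int) -> List[int]:
    """Return a 9-element list laid out in the order given by Table 16.

    Hypothetical helper: Python lists are 0-based, so DESC_(k) in the
    table corresponds to desc[k - 1] here.
    """
    if mb < 1 or nb < 1:
        raise ValueError("block sizes must be positive")
    if lld < 1:
        raise ValueError("leading dimension must be positive")
    dtype = 1  # DTYPE_ = 1 identifies a type-1 descriptor
    return [dtype, ctxt, m, n, mb, nb, rsrc, csrc, lld]


# Column-oriented vector example: M_X = 24, N_X = 1, MB_X = 3, NB_X = 1.
# The context value 0 is a placeholder, not a real BLACS context.
desc_x = make_type1_descriptor(ctxt=0, m=24, n=1, mb=3, nb=1,
                               rsrc=0, csrc=0, lld=6)
print(desc_x)
```

Because LLD_ may differ on each process, a real program would build this list separately on every process with its own local leading dimension.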
After a global vector or matrix is block-cyclically distributed over a process grid, you may decide to use only a portion of the global data structure. This is called a submatrix. For examples of how to specify the calling sequence arguments, listed in Table 14 and Table 15, for a submatrix, see:
Suppose you decide to distribute your global vector or matrix over the process grid, starting at a process other than 0,0. For examples of how to set the array descriptor values, listed in Table 16, see:
In a Parallel ESSL calling sequence, you specify an array that contains the local part of the global vector or matrix. To determine LOCp(M_) or LOCq(N_), which are used in the subroutine descriptions in Part 2 of this book, you must make a call to NUMROC:
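LOCp(M_) = NUMROC (M_, MB_, myrow, RSRC_, p)
where myrow is the process row index of the calling process and p is the number of rows in the p × q process grid.
LOCq(N_) = NUMROC (N_, NB_, mycol, CSRC_, q)
where mycol is the process column index of the calling process and q is the number of columns in the p × q process grid.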
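The arithmetic behind these two calls can be sketched in a few lines of Python. This is a model of the block-cyclic counting rule, written to mirror the argument order of the NUMROC calls above; it is not the library routine itself, and the parameter names `iproc`, `isrcproc`, and `nprocs` are chosen here for illustration.

```python
def numroc(n: int, nb: int, iproc: int, isrcproc: int, nprocs: int) -> int:
    """Count the rows or columns of a block-cyclically distributed
    dimension of global size n that land on process iproc.

    nb       block size
    iproc    coordinate of the process being asked about
    isrcproc coordinate of the process that owns the first block
    nprocs   number of processes along this dimension
    """
    # Distance of this process from the one holding the first block.
    mydist = (nprocs + iproc - isrcproc) % nprocs
    nblocks = n // nb                      # number of complete blocks
    count = (nblocks // nprocs) * nb       # every process gets at least this
    extra = nblocks % nprocs               # leftover complete blocks
    if mydist < extra:
        count += nb                        # one more complete block
    elif mydist == extra:
        count += n % nb                    # the trailing partial block, if any
    return count


# Uneven example from this section: length 20, block size 3, two processes.
print(numroc(20, 3, 0, 0, 2), numroc(20, 3, 1, 0, 2))
```

For the uneven vector example in this section (length 20, block size 3, two process rows), the sketch gives local lengths of 11 and 9, which agree with the LLD_X values listed for P00 and P10.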
For the Banded Linear Algebraic Equations, certain calling sequence arguments are used to specify block-cyclically distributed matrices on one-dimensional process grids.
Although the global array is block-cyclically distributed, the actual submatrix used in computation is either block-row or block-column distributed. See the appropriate subroutine for restrictions.
A symmetric band matrix must be distributed over a one-dimensional process grid:
Table 17 describes the calling sequence arguments associated with a symmetric band matrix.
Table 17. Calling Sequence Arguments for a Distributed Symmetric Band Matrix
| Argument | Meaning |
|---|---|
| n | is the order of the global symmetric band submatrix A. |
| a | is the local part of the global symmetric band matrix A. |
| ja | is the column index of the global symmetric band matrix A. |
| desc_a | is the array descriptor for the global symmetric band matrix A. For more details, see Table 21 and Table 16. |
A general tridiagonal matrix, represented as three vectors, must be distributed over a one-dimensional process grid using a block-cyclic data distribution. Because vectors are one-dimensional data structures, you can use a type-501, type-502, or type-1 array descriptor regardless of whether the process grid is p × 1 or 1 × p. Table 18 describes the calling sequence arguments associated with a general tridiagonal matrix.
Table 18. Calling Sequence Arguments for General Tridiagonal Matrix
| Argument | Meaning |
|---|---|
| n | is the order of the global general tridiagonal submatrix A. |
| dl, d, du | is the local part of the global vectors. (The general tridiagonal matrix A is stored in tridiagonal storage mode in dl, d, and du.) |
| ia | is the row index of the global general tridiagonal matrix A. |
| desc_a | is the array descriptor for the global general tridiagonal matrix A. For more details, see Table 21, Table 16, or Table 22. |
A symmetric tridiagonal matrix, represented as two vectors, must be distributed over a one-dimensional process grid using block-cyclic data distribution.
Because vectors are one-dimensional data structures, you can use a type-501, type-502, or type-1 array descriptor regardless of whether the process grid is p × 1 or 1 × p. Table 19 describes the calling sequence arguments associated with a symmetric tridiagonal matrix.
Table 19. Calling Sequence Arguments for a Symmetric Tridiagonal Matrix
| Argument | Meaning |
|---|---|
| n | is the order of the global symmetric tridiagonal submatrix A. |
| d, e | is the local part of the global vectors. (The symmetric tridiagonal matrix A is stored in parallel-symmetric-tridiagonal storage mode in d and e.) |
| ia | is the row index of the global symmetric tridiagonal matrix A. |
| desc_a | is the array descriptor for the global symmetric tridiagonal matrix A. For more details, see Table 21, Table 16, or Table 22. |
For the Banded Linear Algebraic Equations subroutines, a general matrix consisting of multiple right-hand sides must be distributed over a one-dimensional process grid:
Table 20 describes the calling sequence arguments associated with the general matrix.
Table 20. Calling Sequence Arguments for a Matrix Containing the Multiple Right-Hand Sides
| Argument | Meaning |
|---|---|
| n | is the number of rows in the global general submatrix B. |
| b | is the local part of the global general matrix B. |
| ib | is the row index of the global general matrix B. |
| desc_b | is the array descriptor for the global general matrix B. For more details, see Table 22 and Table 16. |
An array descriptor, which is an integer array, is needed for each block-distributed matrix. The process grid definition and the array descriptor are used to establish the mapping between the global matrix and its corresponding process and distributed memory location.
In the Banded Linear Algebraic Equations sections throughout this book, the _ (underscore) symbol in the array descriptor is followed by an A or a B. A indicates a banded, tridiagonal, or symmetric tridiagonal matrix. B indicates a matrix containing the multiple right-hand sides.
When you place a call to the banded or tridiagonal subroutines, you must be careful to choose consistent combinations of array descriptor types for matrix A and matrix B, and process grids. For consistent combinations, see the "Notes and Coding Rules" in the subroutine descriptions in Part 2 of this book.
Therefore, depending on which subroutine you are using in the Banded Linear Algebraic Equations, you may choose different array descriptors in the same subroutine calling sequence. Keep in mind that you must create only one process grid; that is, CTXT_A = CTXT_B.
For example, when calling PDPBSV, suppose you choose DTYPE_A = 501 for the band matrix A and DTYPE_B = 502 for matrix B. If you specify CTXT_A as 1 × p, you must also specify CTXT_B as 1 × p. Or, if you specify CTXT_A as p × 1, you must also specify CTXT_B as p × 1. For an example of how to set the array descriptor values, see Example.
Table 21. Type-501 Array Descriptor
| DESC_( ) | Symbolic name | Value |
|---|---|---|
| 1 | DTYPE_ | DTYPE_ = 501 for 1 × p or p × 1, where p is the number of processes in a process grid. |
| 2 | CTXT_ | BLACS context in which the global matrix is defined. The BLACS process grid can be defined as 1 × p or p × 1. (See Initializing the BLACS.) |
| 3 | N_ | Number of columns in the global matrix |
| 4 | NB_ | Column block size. |
| 5 | CSRC_ | The process column over which the first column of the global matrix is distributed |
| 6 | LLD_ | Leading dimension of the local array. (See Determining the Number of Rows or Columns in Your Local Arrays.) This value may be different on each process. For the tridiagonal subroutines, this argument is ignored. |
| 7 | -- | Reserved. |
Table 22. Type-502 Array Descriptor
| DESC_( ) | Symbolic name | Value |
|---|---|---|
| 1 | DTYPE_ | DTYPE_ = 502 for p × 1 or 1 × p, where p is the number of processes in a process grid. |
| 2 | CTXT_ | BLACS context in which the global matrix is defined. The BLACS process grid can be defined as 1 × p or p × 1. (See Initializing the BLACS.) |
| 3 | M_ | Number of rows in the global matrix |
| 4 | MB_ | Row block size. |
| 5 | RSRC_ | The process row over which the first row of the global matrix is distributed |
| 6 | LLD_ | Leading dimension of the local array. (See Determining the Number of Rows or Columns in Your Local Arrays.) This value may be different on each process. For the tridiagonal subroutines, this argument is ignored for matrix A. |
| 7 | -- | Reserved. |
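The seven-entry layouts of Table 21 and Table 22, together with the rule that CTXT_A must equal CTXT_B, can be sketched as follows. The helper names, the value used for the reserved seventh entry, and the integer standing in for a BLACS context are all assumptions made for illustration. The sample values pair the 7 × 7 general tridiagonal example (type-501) with the seven-row right-hand-side example (type-502) from this section.

```python
from typing import List

RESERVED = 0  # assumption: placeholder for the reserved seventh entry


def make_type501_descriptor(ctxt: int, n: int, nb: int,
                            csrc: int, lld: int) -> List[int]:
    """Hypothetical helper: entries in the order given by Table 21."""
    return [501, ctxt, n, nb, csrc, lld, RESERVED]


def make_type502_descriptor(ctxt: int, m: int, mb: int,
                            rsrc: int, lld: int) -> List[int]:
    """Hypothetical helper: entries in the order given by Table 22."""
    return [502, ctxt, m, mb, rsrc, lld, RESERVED]


def same_context(desc_a: List[int], desc_b: List[int]) -> bool:
    """Check the CTXT_A = CTXT_B rule; the context is the second entry."""
    return desc_a[1] == desc_b[1]


ctxt = 0  # placeholder, not a real BLACS context
# Tridiagonal matrix A: LLD_ is ignored, so 0 is only a filler here.
desc_a = make_type501_descriptor(ctxt, n=7, nb=2, csrc=0, lld=0)
# Right-hand sides B on process P00, where the local array has 3 rows.
desc_b = make_type502_descriptor(ctxt, m=7, mb=2, rsrc=0, lld=3)
print(same_context(desc_a, desc_b))
```

Whether a particular pairing of descriptor types is accepted still depends on the "Notes and Coding Rules" of the subroutine being called; the sketch only checks the shared-context requirement.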
For local arrays described by a type-501 array descriptor, the number of rows in the local matrix is always equal to the number of rows in the global matrix. The number of columns in the local array is determined as follows:
For a 1 × p process grid:
LOCq(N_) = NUMROC(N_,NB_,mycol,CSRC_,q)
For a p × 1 process grid:
LOCq(N_) = NUMROC(N_,NB_,myrow,CSRC_,q)
where:
For local arrays described by a type-502 array descriptor, the number of columns in the local matrix is always equal to the number of columns in the global matrix. The number of rows in the local array is determined as follows:
For a p × 1 process grid:
LOCp(M_) = NUMROC(M_,MB_,myrow,RSRC_,p)
For a 1 × p process grid:
LOCp(M_) = NUMROC(M_,MB_,mycol,RSRC_,p)
where:
You must distribute your data before calling Parallel ESSL from your message passing program. This section shows you how to distribute your data.
All the Parallel ESSL message passing subroutines, except the Banded Linear Algebraic Equations and Fourier transform subroutines, support block-cyclic distribution. The Banded Linear Algebraic Equations and the Fourier transform subroutines only support block distribution.
The following sections provide examples for distributing data over one- or two-dimensional process grids:
Parallel ESSL supports block-cyclic distribution for vectors over one- or two-dimensional process grids. A vector is distributed over a single row or column of the process grid, except for PDURNG. For PDURNG, vectors are distributed block-cyclically over the entire one- or two-dimensional process grid using row-major order, where the length n of the vector x must be evenly divisible by the number of available processes np multiplied by the block size nb. In other words, n/((np)(nb)) must be an integer.
This example shows how a global vector of length 24 with blocks of size 3 is distributed block-cyclically over one-dimensional process grids. Assume the following:
Global vector x:
B,D 0
* *
| 8 |
0 | 2 |
| 3 |
| -- |
| 6 |
1 | 5 |
| 1 |
| -- |
| 9 |
2 | 5 |
| 3 |
| -- |
| 6 |
3 | 2 |
| 4 |
| -- |
| 10 |
4 | 7 |
| 4 |
| -- |
| 2 |
5 | 8 |
| 2 |
| -- |
| 8 |
6 | 9 |
| 2 |
| -- |
| 3 |
7 | 11 |
| 10 |
* *
Column-oriented, 4 × 1 process grid:
B,D  |    0
-----| -------
0    |   P00
4    |
-----| -------
1    |   P10
5    |
-----| -------
2    |   P20
6    |
-----| -------
3    |   P30
7    |
Local arrays:
p,q | 0
-----|----
| 8
| 2
| 3
0 | 10
| 7
| 4
-----|----
| 6
| 5
| 1
1 | 2
| 8
| 2
-----|----
| 9
| 5
| 3
2 | 8
| 9
| 2
-----|----
| 6
| 2
| 4
3 | 3
| 11
| 10
For the column-oriented example, the array descriptor DESC_X
contains the following:
| DESC_X( ) | Symbolic name | Value |
|---|---|---|
| 1 | DTYPE_X | 1 |
| 2 | CTXT_X | BLACS context |
| 3 | M_X | 24 |
| 4 | N_X | 1 |
| 5 | MB_X | 3 |
| 6 | NB_X | 1 |
| 7 | RSRC_X | 0 |
| 8 | CSRC_X | 0 |
| 9 | LLD_X | 6 |
Row-oriented, 1 × 4 process grid:
B,D  |   0 4   |   1 5   |   2 6   |  3 7
-----| ------- | ------- | ------- |-------
0    |   P00   |   P01   |   P02   |  P03
Local array:
p,q  |       0       |       1       |       2       |       3
-----|---------------|---------------|---------------|----------------
0    | 8 2 3 10 7 4  | 6 5 1 2 8 2   | 9 5 3 8 9 2   | 6 2 4 3 11 10
For the row-oriented example, the array descriptor DESC_X
contains the following:
| DESC_X( ) | Symbolic name | Value |
|---|---|---|
| 1 | DTYPE_X | 1 |
| 2 | CTXT_X | BLACS context |
| 3 | M_X | 1 |
| 4 | N_X | 24 |
| 5 | MB_X | 1 |
| 6 | NB_X | 3 |
| 7 | RSRC_X | 0 |
| 8 | CSRC_X | 0 |
| 9 | LLD_X | 1 |
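The two one-dimensional examples above deal out the same blocks in the same order, so a single routine can model both. The following Python sketch is an illustration only: the function name `distribute_block_cyclic` is hypothetical, and the sketch assumes the first block is owned by process 0 unless `src` says otherwise.

```python
from typing import List, Sequence


def distribute_block_cyclic(x: Sequence[int], nb: int, nprocs: int,
                            src: int = 0) -> List[List[int]]:
    """Deal consecutive blocks of size nb to processes in round-robin
    order, starting at process src, and return one local list per process."""
    local: List[List[int]] = [[] for _ in range(nprocs)]
    for start in range(0, len(x), nb):
        block = start // nb                 # global block index
        owner = (src + block) % nprocs      # process that receives the block
        local[owner].extend(x[start:start + nb])
    return local


# Global vector of length 24 from the example, block size 3, four processes.
x = [8, 2, 3, 6, 5, 1, 9, 5, 3, 6, 2, 4,
     10, 7, 4, 2, 8, 2, 8, 9, 2, 3, 11, 10]
local_arrays = distribute_block_cyclic(x, nb=3, nprocs=4)
for rank, values in enumerate(local_arrays):
    print(rank, values)
```

The local lists are the same for the 4 × 1 and the 1 × 4 grid; only the descriptor entries (M_X, N_X, MB_X, NB_X, LLD_X) change between the two cases.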
This example shows how a global vector of length 18 with block size of 3 is distributed over two-dimensional grids. When a two-dimensional process grid is used, the global vector can be distributed over any single row or any single column of the grid. Assume the following:
Global vector x:
B,D 0
* *
| 4 |
0 | 11 |
| 17 |
| -- |
| 21 |
1 | 3 |
| 7 |
| -- |
| 12 |
2 | 5 |
| 3 |
| -- |
| 15 |
3 | 3 |
| 4 |
| -- |
| 9 |
4 | 17 |
| 1 |
| -- |
| 10 |
5 | 9 |
| 25 |
* *
Two-dimensional, 2 × 3 process grid:
B,D  |   --    |   --    |    0
-----| ------- | ------- |-------
0    |   P00   |   P01   |   P02
2    |         |         |
4    |         |         |
-----| ------- | ------- |-------
1    |   P10   |   P11   |   P12
3    |         |         |
5    |         |         |
If the global vector is distributed over the third column of a 2 × 3 process grid, then P02 and P12 contain the following local arrays:
p,q | 2
-----|----
| 4
| 11
| 17
| 12
0 | 5
| 3
| 9
| 17
| 1
-----|----
| 21
| 3
| 7
| 15
1 | 3
| 4
| 10
| 9
| 25
For the single column example, the array descriptor DESC_X
contains the following:
| DESC_X( ) | Symbolic name | Value |
|---|---|---|
| 1 | DTYPE_X | 1 |
| 2 | CTXT_X | BLACS context |
| 3 | M_X | 18 |
| 4 | N_X | 1 |
| 5 | MB_X | 3 |
| 6 | NB_X | 1 |
| 7 | RSRC_X | 0 |
| 8 | CSRC_X | 2 |
| 9 | LLD_X | 9 |
If the global vector is distributed over the second row of a 2 × 3 process grid, then P10, P11, and P12 contain the following local arrays:
p,q  |        0         |        1        |        2
-----|------------------|-----------------|-----------------
1    | 4 11 17 15 3 4   | 21 3 7 9 17 1   | 12 5 3 10 9 25
For the single row example, the array descriptor DESC_X contains
the following:
| DESC_X( ) | Symbolic name | Value |
|---|---|---|
| 1 | DTYPE_X | 1 |
| 2 | CTXT_X | BLACS context |
| 3 | M_X | 1 |
| 4 | N_X | 18 |
| 5 | MB_X | 1 |
| 6 | NB_X | 3 |
| 7 | RSRC_X | 1 |
| 8 | CSRC_X | 0 |
| 9 | LLD_X | 1 |
For PDURNG, the global vector is distributed block-cyclically over the entire 2 × 3 process grid using row-major order, as follows:
p,q  |     0     |     1      |     2
-----|-----------|------------|-----------
0    | 4 11 17   | 21 3 7     | 12 5 3
-----|-----------|------------|-----------
1    | 15 3 4    | 9 17 1     | 10 9 25
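The row-major layout described for PDURNG can be modeled with a short Python sketch. It only reproduces the placement of blocks shown above and the divisibility requirement on n; it does not call or imitate PDURNG itself, and the function name `distribute_over_grid_row_major` is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def distribute_over_grid_row_major(
        x: Sequence[int], nb: int, p: int, q: int
) -> Dict[Tuple[int, int], List[int]]:
    """Deal blocks of size nb over a p x q grid, visiting processes in
    row-major order: (0,0), (0,1), ..., (0,q-1), (1,0), and so on."""
    nprocs = p * q
    if len(x) % (nprocs * nb) != 0:
        raise ValueError("n must be divisible by (number of processes) * nb")
    local = {(r, c): [] for r in range(p) for c in range(q)}
    for block, start in enumerate(range(0, len(x), nb)):
        rank = block % nprocs               # position in row-major order
        local[(rank // q, rank % q)].extend(x[start:start + nb])
    return local


# Global vector of length 18 from the example, block size 3, 2 x 3 grid.
x18 = [4, 11, 17, 21, 3, 7, 12, 5, 3, 15, 3, 4, 9, 17, 1, 10, 9, 25]
grid = distribute_over_grid_row_major(x18, nb=3, p=2, q=3)
for coords in sorted(grid):
    print(coords, grid[coords])
```

Here n = 18, np = 6, and nb = 3, so n/((np)(nb)) = 1 and each process receives exactly one block.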
Notes:
Following is an example of uneven block-cyclic distribution for a global vector of length 20 with block size of 3, where the two local arrays are different sizes. In this case, a fragment of a block with two elements occurs at the end of the vector. Assume the following:
X = (0, 5, 6, 3, 21, 5, 6, 1, 8, 9, 13, 11, 12, 15, 14, 15, 11, 17, 18, 19)
Following is a global vector x with block size 3:
B,D 0
* *
| 0 |
0 | 5 |
| 6 |
| -- |
| 3 |
1 | 21 |
| 5 |
| -- |
| 6 |
2 | 1 |
| 8 |
| -- |
| 9 |
3 | 13 |
| 11 |
| -- |
| 12 |
4 | 15 |
| 14 |
| -- |
| 15 |
5 | 11 |
| 17 |
| -- |
6 | 18 |
| 19 |
* *
Two-dimensional, 2 × 3 process grid:
B,D  |    0    |   --    |   --
-----| ------- | ------- |-------
0    |   P00   |   P01   |   P02
2    |         |         |
4    |         |         |
6    |         |         |
-----| ------- | ------- |-------
1    |   P10   |   P11   |   P12
3    |         |         |
5    |         |         |
If the vector is distributed over the first column of a 2 × 3 process grid, then P00 and P10 contain the following local arrays:
p,q | 0
-----|----
| 0
| 5
| 6
| 6
| 1
0 | 8
| 12
| 15
| 14
| 18
| 19
-----|----
| 3
| 21
| 5
| 9
1 | 13
| 11
| 15
| 11
| 17
Array descriptor DESC_X contains the following:
| DESC_X( ) | Symbolic name | Value |
|---|---|---|
| 1 | DTYPE_X | 1 |
| 2 | CTXT_X | BLACS context |
| 3 | M_X | 20 |
| 4 | N_X | 1 |
| 5 | MB_X | 3 |
| 6 | NB_X | 1 |
| 7 | RSRC_X | 0 |
| 8 | CSRC_X | 0 |
| 9 | LLD_X | 11 (for P00); 9 (for P10) |
If the vector is distributed over the first row of the 2 × 3 process grid, then P00, P01, and P02 contain the following local arrays:
p,q  |            0            |         1          |         2
-----|-------------------------|--------------------|-------------------
0    | 0 5 6 9 13 11 18 19     | 3 21 5 12 15 14    | 6 1 8 15 11 17
Array descriptor DESC_X contains the following:
| DESC_X( ) | Symbolic name | Value |
|---|---|---|
| 1 | DTYPE_X | 1 |
| 2 | CTXT_X | BLACS context |
| 3 | M_X | 1 |
| 4 | N_X | 20 |
| 5 | MB_X | 1 |
| 6 | NB_X | 3 |
| 7 | RSRC_X | 0 |
| 8 | CSRC_X | 0 |
| 9 | LLD_X | 1 |
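When the distribution is uneven, it can help to compute directly where a single global element ends up. The sketch below maps a global index to its owning process and its position in that process's local array. It uses 0-based indices, which is a choice made for the Python illustration (the subroutine arguments in this book are described in Fortran terms), and the function name `global_to_local` is hypothetical.

```python
from typing import Tuple


def global_to_local(g: int, nb: int, nprocs: int, src: int = 0) -> Tuple[int, int]:
    """Map 0-based global index g to (owner process, 0-based local index)
    for a block-cyclic distribution with block size nb whose first block
    is held by process src."""
    block = g // nb                          # global block containing g
    owner = (src + block) % nprocs           # process holding that block
    # Complete rounds of blocks already stored locally, plus the offset
    # of g inside its own block.
    local_index = (block // nprocs) * nb + g % nb
    return owner, local_index


# Uneven example: length 20, block size 3. Global index 18 is the first
# element of the trailing two-element block fragment.
print(global_to_local(18, nb=3, nprocs=2))   # first-column case, 2 process rows
print(global_to_local(18, nb=3, nprocs=3))   # first-row case, 3 process columns
```

In the first-column case the fragment lands at local positions 9 and 10 on P00, which is why that local array has 11 elements while the one on P10 has 9.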
The Parallel ESSL subroutines, except the Banded Linear Algebraic Equations, support block-cyclic data distribution for matrices using one- or two-dimensional process grids. The Banded Linear Algebraic Equations support only block data distribution using one-dimensional process grids.
The following terminology is used when it is necessary to distinguish special types of matrices:
This section describes how to distribute a matrix block-cyclically over a one-dimensional process grid. It also shows how matrices for the Banded Linear Algebraic Equations are distributed over a one-dimensional process grid using block distribution.
The examples that follow show how a 6 × 8 global matrix A with blocks of size 2 × 2 is distributed block-cyclically over one-dimensional process grids. Assume the following global matrix A:
B,D 0 1 2 3
* *
0 | 0 1 | 2 3 | 4 5 | 6 7 |
| 10 11 | 12 13 | 14 15 | 16 17 |
| ---------|-----------|-----------|--------- |
1 | 20 21 | 22 23 | 24 25 | 26 27 |
| 30 31 | 32 33 | 34 35 | 36 37 |
| ---------|-----------|-----------|--------- |
2 | 40 41 | 42 43 | 44 45 | 46 47 |
| 50 51 | 52 53 | 54 55 | 56 57 |
* *
Column-oriented, 3 × 1 process grid:
B,D  | 0 1 2 3
-----| -------
0    |   P00
-----| -------
1    |   P10
-----| -------
2    |   P20
Local arrays:
p,q | 0
-----|---------------------------------
0 | 0 1 2 3 4 5 6 7
| 10 11 12 13 14 15 16 17
-----|---------------------------------
1 | 20 21 22 23 24 25 26 27
| 30 31 32 33 34 35 36 37
-----|---------------------------------
2 | 40 41 42 43 44 45 46 47
| 50 51 52 53 54 55 56 57
For the column-oriented example, the array descriptor DESC_A
contains:
| DESC_A( ) | Symbolic name | Value |
|---|---|---|
| 1 | DTYPE_A | 1 |
| 2 | CTXT_A | BLACS context |
| 3 | M_A | 6 |
| 4 | N_A | 8 |
| 5 | MB_A | 2 |
| 6 | NB_A | 2 |
| 7 | RSRC_A | 0 |
| 8 | CSRC_A | 0 |
| 9 | LLD_A | 2 |
Row-oriented, 1 × 2 process grid:
B,D  |   0 2   | 1 3
-----| ------- |-----
0    |   P00   | P01
1    |         |
2    |         |
Local arrays:
p,q | 0 | 1
-----|------------------|------------------
| 0 1 4 5 | 2 3 6 7
| 10 11 14 15 | 12 13 16 17
| 20 21 24 25 | 22 23 26 27
0 | 30 31 34 35 | 32 33 36 37
| 40 41 44 45 | 42 43 46 47
| 50 51 54 55 | 52 53 56 57
For the row-oriented example, the array descriptor DESC_A contains:
| DESC_A( ) | Symbolic name | Value |
|---|---|---|
| 1 | DTYPE_A | 1 |
| 2 | CTXT_A | BLACS context |
| 3 | M_A | 6 |
| 4 | N_A | 8 |
| 5 | MB_A | 2 |
| 6 | NB_A | 2 |
| 7 | RSRC_A | 0 |
| 8 | CSRC_A | 0 |
| 9 | LLD_A | 6 |
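Both matrix examples above follow the same rule: row blocks are dealt cyclically over process rows and column blocks over process columns. The following Python sketch models that rule for a general p × q grid. The function name `distribute_matrix` is hypothetical, and the sketch assumes RSRC_A = CSRC_A = 0 as in the examples.

```python
from typing import Dict, List, Tuple

Matrix = List[List[int]]


def distribute_matrix(a: Matrix, mb: int, nb: int,
                      p: int, q: int) -> Dict[Tuple[int, int], Matrix]:
    """Return the local array of every process in a p x q grid for a
    block-cyclic distribution with mb x nb blocks, first block on (0,0)."""
    m, n = len(a), len(a[0])
    local: Dict[Tuple[int, int], Matrix] = {}
    for r in range(p):
        # Global rows whose row block is assigned to process row r.
        rows = [i for i in range(m) if (i // mb) % p == r]
        for c in range(q):
            # Global columns whose column block goes to process column c.
            cols = [j for j in range(n) if (j // nb) % q == c]
            local[(r, c)] = [[a[i][j] for j in cols] for i in rows]
    return local


# The 6 x 8 global matrix of the example: entry (i, j) equals 10*i + j.
a = [[10 * i + j for j in range(8)] for i in range(6)]
col_grid = distribute_matrix(a, mb=2, nb=2, p=3, q=1)   # 3 x 1 grid
row_grid = distribute_matrix(a, mb=2, nb=2, p=1, q=2)   # 1 x 2 grid
print(row_grid[(0, 0)][0], row_grid[(0, 1)][0])
```

The number of rows in each local array is the LLD_A value in the corresponding descriptor: 2 on the 3 × 1 grid and 6 on the 1 × 2 grid.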
For an example of distributing a matrix over a one-dimensional process grid in a Fortran 90 program, see matrix F in Appendix B, Sample Programs, which is:
This section shows how to distribute a symmetric band matrix A over a one-dimensional process grid using block-cyclic distribution.
Assume the following symmetric band matrix A of size 9 × 9 with a half bandwidth of 2:
* *
| 11 21 31 0 0 0 0 0 0 |
| 21 22 32 42 0 0 0 0 0 |
| 31 32 33 34 53 0 0 0 0 |
A = | 0 42 34 44 54 64 0 0 0 |
| 0 0 53 54 55 65 75 0 0 |
| 0 0 0 64 65 66 76 86 0 |
| 0 0 0 0 75 76 77 87 97 |
| 0 0 0 0 0 86 87 88 98 |
| 0 0 0 0 0 0 97 98 99 |
* *
Matrix A must be stored in upper- or lower-band-packed storage mode. The sections that follow contain examples describing these two storage modes. In these examples, matrix A is stored in an array with dimensions 3 × 9.
The global matrix A with block size of 2 is stored in upper-band-packed storage mode, as follows:
B,D 0 1 2 3 4
* *
| * * | 31 42 | 53 64 | 75 86 | 97 |
0 | * 21 | 32 34 | 54 65 | 76 87 | 98 |
| 11 22 | 33 44 | 55 66 | 77 88 | 99 |
* *
Following is a row-oriented, 1 × 3 process grid:
B,D  |   0 3   |   1 4   |   2
-----| ------- | ------- |-------
0    |   P00   |   P01   |   P02
The following local arrays A are distributed block-cyclically over the 1 × 3 process grid:
p,q | 0 | 1 | 2
-----|--------------|-----------|--------
| * * 75 86 | 31 42 97 | 53 64
0 | * 21 76 87 | 32 34 98 | 54 65
| 11 22 77 88 | 33 44 99 | 55 66
where * means you do not have to store a value in that position in the local array. However, these storage positions are required and overwritten during the computation.
The type-501 array descriptor DESC_A contains the
following:
| DESC_A( ) | Symbolic name | Value |
|---|---|---|
| 1 | DTYPE_A | DTYPE_A = 501 for 1 × p |
| 2 | CTXT_A | BLACS context |
| 3 | N_A | 9 |
| 4 | NB_A | 2 |
| 5 | CSRC_A | 0 |
| 6 | LLD_A | 3 |
| 7 | -- | Reserved |
Alternately, the type-1 array descriptor DESC_A contains the
following:
| DESC_A( ) | Symbolic name | Value |
|---|---|---|
| 1 | DTYPE_A | DTYPE_A = 1 for 1 × p |
| 2 | CTXT_A | BLACS context |
| 3 | M_A | 3 |
| 4 | N_A | 9 |
| 5 | MB_A | 1 |
| 6 | NB_A | 2 |
| 7 | RSRC_A | 0 |
| 8 | CSRC_A | 0 |
| 9 | LLD_A | 3 |
The global matrix A with block size of 2 is stored in lower-band-packed storage mode, as follows:
B,D 0 1 2 3 4
* *
| 11 22 | 33 44 | 55 66 | 77 88 | 99 |
0 | 21 32 | 34 54 | 65 76 | 87 98 | * |
| 31 42 | 53 64 | 75 86 | 97 * | * |
* *
Following is a row-oriented, 1 × 3 process grid:
B,D  |   0 3   |   1 4   |   2
-----| ------- | ------- |-------
0    |   P00   |   P01   |   P02
The following local arrays A are distributed block-cyclically over the 1 × 3 process grid:
p,q | 0 | 1 | 2
-----|-------------|----------|--------
| 11 22 77 88 | 33 44 99 | 55 66
0 | 21 32 87 98 | 34 54 * | 65 76
| 31 42 97 * | 53 64 * | 75 86
where * means you do not have to store a value in that position in the local array. However, these storage positions are required and overwritten during the computation.
The type-501 array descriptor DESC_A contains the
following:
| DESC_A( ) | Symbolic name | Value |
|---|---|---|
| 1 | DTYPE_A | DTYPE_A = 501 for 1 × p |
| 2 | CTXT_A | BLACS context |
| 3 | N_A | 9 |
| 4 | NB_A | 2 |
| 5 | CSRC_A | 0 |
| 6 | LLD_A | 3 |
| 7 | -- | Reserved |
Alternately, the type-1 array descriptor DESC_A contains the
following:
| DESC_A( ) | Symbolic name | Value |
|---|---|---|
| 1 | DTYPE_A | DTYPE_A = 1 for 1 × p |
| 2 | CTXT_A | BLACS context |
| 3 | M_A | 3 |
| 4 | N_A | 9 |
| 5 | MB_A | 1 |
| 6 | NB_A | 2 |
| 7 | RSRC_A | 0 |
| 8 | CSRC_A | 0 |
| 9 | LLD_A | 3 |
For more information on how to store symmetric band matrices, see the ESSL Version 3 Guide and Reference manual.
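As a rough illustration of the two storage modes shown above, the following Python sketch packs the 9 × 9 example matrix into a 3 × 9 array in each mode. The function names are hypothetical, `None` stands in for the positions marked * in the diagrams, and indices are 0-based; the authoritative definition of the storage modes is the ESSL manual referenced above.

```python
from typing import List, Optional

Packed = List[List[Optional[int]]]


def pack_upper_band(a: List[List[int]], k: int) -> Packed:
    """Upper-band-packed: the diagonal goes in the last row, and the
    superdiagonals go above it. k is the half bandwidth."""
    n = len(a)
    ab: Packed = [[None] * n for _ in range(k + 1)]
    for j in range(n):
        for i in range(max(0, j - k), j + 1):
            ab[k + i - j][j] = a[i][j]
    return ab


def pack_lower_band(a: List[List[int]], k: int) -> Packed:
    """Lower-band-packed: the diagonal goes in the first row, and the
    subdiagonals go below it."""
    n = len(a)
    ab: Packed = [[None] * n for _ in range(k + 1)]
    for j in range(n):
        for i in range(j, min(n, j + k + 1)):
            ab[i - j][j] = a[i][j]
    return ab


# The symmetric band matrix A of the example (half bandwidth 2).
band_a = [
    [11, 21, 31, 0, 0, 0, 0, 0, 0],
    [21, 22, 32, 42, 0, 0, 0, 0, 0],
    [31, 32, 33, 34, 53, 0, 0, 0, 0],
    [0, 42, 34, 44, 54, 64, 0, 0, 0],
    [0, 0, 53, 54, 55, 65, 75, 0, 0],
    [0, 0, 0, 64, 65, 66, 76, 86, 0],
    [0, 0, 0, 0, 75, 76, 77, 87, 97],
    [0, 0, 0, 0, 0, 86, 87, 88, 98],
    [0, 0, 0, 0, 0, 0, 97, 98, 99],
]
upper = pack_upper_band(band_a, 2)
lower = pack_lower_band(band_a, 2)
for row in upper:
    print(row)
```

The columns of the packed array are then distributed over the 1 × 3 process grid in blocks of two, as shown in the local arrays above.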
A general tridiagonal matrix, represented as three vectors, must be distributed over a one-dimensional process grid using a block-cyclic data distribution. Because vectors are one-dimensional data structures, you can use a type-501, type-502, or type-1 array descriptor regardless of whether the process grid is 1 × p or p × 1.
The first part of this section shows how to distribute a general tridiagonal matrix A over a p × 1 process grid. The second part shows how to distribute the same matrix over a 1 × p process grid. In both cases, the values contained in the corresponding local arrays are identical.
Assume the following general tridiagonal matrix A of size 7 × 7:
* *
| 11 12 0 0 0 0 0 |
| 21 22 23 0 0 0 0 |
| 0 32 33 34 0 0 0 |
| 0 0 43 44 45 0 0 |
| 0 0 0 54 55 56 0 |
| 0 0 0 0 65 66 67 |
| 0 0 0 0 0 76 77 |
* *
Matrix A is stored in tridiagonal storage mode in the following three vectors:
dl = (*, 21, 32, 43, 54, 65, 76)
d = (11, 22, 33, 44, 55, 66, 77)
du = (12, 23, 34, 45, 56, 67, *)
The general tridiagonal matrix A is stored in tridiagonal storage mode in vectors dl, d, and du.
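The relationship between the matrix and the three vectors can be sketched in Python as follows. The function name `to_tridiagonal_storage` is hypothetical, and `None` stands in for the positions marked * in dl and du, which must exist in storage but need not hold a value.

```python
from typing import List, Optional, Tuple

Vector = List[Optional[int]]


def to_tridiagonal_storage(a: List[List[int]]) -> Tuple[Vector, Vector, Vector]:
    """Split a square tridiagonal matrix into (dl, d, du), each of length n.

    dl holds the subdiagonal with an unused first position,
    d holds the main diagonal,
    du holds the superdiagonal with an unused last position.
    """
    n = len(a)
    dl: Vector = [None] + [a[i][i - 1] for i in range(1, n)]
    d: Vector = [a[i][i] for i in range(n)]
    du: Vector = [a[i][i + 1] for i in range(n - 1)] + [None]
    return dl, d, du


# The 7 x 7 general tridiagonal matrix A of the example.
tri_a = [
    [11, 12, 0, 0, 0, 0, 0],
    [21, 22, 23, 0, 0, 0, 0],
    [0, 32, 33, 34, 0, 0, 0],
    [0, 0, 43, 44, 45, 0, 0],
    [0, 0, 0, 54, 55, 56, 0],
    [0, 0, 0, 0, 65, 66, 67],
    [0, 0, 0, 0, 0, 76, 77],
]
dl, d, du = to_tridiagonal_storage(tri_a)
print(dl, d, du, sep="\n")
```

All three vectors have the same length as the order of the matrix, so they can share one array descriptor when they are distributed.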
Following is global vector dl:
B,D 0
* *
0 | * |
| 21 |
| -- |
1 | 32 |
| 43 |
| -- |
2 | 54 |
| 65 |
| -- |
3 | 76 |
* *
Following is global vector d:
B,D 0
* *
0 | 11 |
| 22 |
| -- |
1 | 33 |
| 44 |
| -- |
2 | 55 |
| 66 |
| -- |
3 | 77 |
* *
Following is global vector du:
B,D 0
* *
0 | 12 |
| 23 |
| -- |
1 | 34 |
| 45 |
| -- |
2 | 56 |
| 67 |
| -- |
3 | * |
* *
Following is a column-oriented, 3 × 1 process grid:
B,D  |    0
-----| -------
0    |   P00
3    |
-----| -------
1    |   P10
-----| -------
2    |   P20
The arrays are block-cyclically distributed over the 3 × 1 process grid.
Following are the local arrays for DL:
p,q | 0
-----|----
0 | *
| 21
| 76
-----|----
1 | 32
| 43
-----|----
2 | 54
| 65
Following are the local arrays for D:
p,q | 0
-----|----
0 | 11
| 22
| 77
-----|----
1 | 33
| 44
-----|----
2 | 55
| 66
Following are the local arrays for DU:
p,q | 0
-----|----
0 | 12
| 23
| *
-----|----
1 | 34
| 45
-----|----
2 | 56
| 67
where "*" means you do not have to store a value in that position in the local array. However, these storage positions are required.
The type-502 array descriptor DESC_A contains the
following:
| DESC_A( ) | Symbolic name | Value |
|---|---|---|
| 1 | DTYPE_A | DTYPE_A = 502 for p × 1 |
| 2 | CTXT_A | BLACS context |
| 3 | M_A | 7 |
| 4 | MB_A | 2 |
| 5 | RSRC_A | 0 |
| 6 | LLD_A | Not used |
| 7 | - | Reserved |
Alternately, the type-1 array descriptor DESC_A contains the
following:
| DESC_A( ) | Symbolic name | Value |
|---|---|---|
| 1 | DTYPE_A | DTYPE_A = 1 for p × 1 |
| 2 | CTXT_A | BLACS context |
| 3 | M_A | 7 |
| 4 | N_A | 1 |
| 5 | MB_A | 2 |
| 6 | NB_A | 1 |
| 7 | RSRC_A | 0 |
| 8 | CSRC_A | 0 |
| 9 | LLD_A | Not used |
The general tridiagonal matrix A is stored in tridiagonal storage mode in vectors dl, d, and du. Because vectors are one-dimensional data structures, the block-cyclically distributed arrays on a 1 × p process grid are identical to the block-cyclically distributed arrays on a p × 1 process grid.
Following is global vector dl:
B,D 0 1 2 3
* *
0 | * 21 | 32 43 | 54 65 | 76 |
* *
Following is global vector d:
B,D 0 1 2 3
* *
0 | 11 22 | 33 44 | 55 66 | 77 |
* *
Following is global vector du:
B,D 0 1 2 3
* *
0 | 12 23 | 34 45 | 56 67 | * |
* *
Following is a row-oriented, 1 × 3 process grid:
B,D  |   0 3   |    1    |    2
-----| ------- | ------- |-------
0    |   P00   |   P01   |   P02
The arrays are block-cyclically distributed over the 1 × 3 process grid.
Following are the local arrays for DL:
p,q  |    0    |   1   |   2
-----|---------|-------|------
0    | * 21 76 | 32 43 | 54 65
Following are the local arrays for D:
p,q  |    0     |   1   |   2
-----|----------|-------|------
0    | 11 22 77 | 33 44 | 55 66
Following are the local arrays for DU:
p,q  |    0     |   1    |   2
-----|----------|--------|-------
0    | 12 23 *  | 34 45  | 56 67
where "*" means you do not have to store a value in that position in the local array. However, these storage positions are required.
The type-501 array descriptor DESC_A contains the
following:
| DESC_A( ) | Symbolic name | Value |
|---|---|---|
| 1 | DTYPE_A | DTYPE_A = 501 for 1 × p |
| 2 | CTXT_A | BLACS context |
| 3 | N_A | 7 |
| 4 | NB_A | 2 |
| 5 | CSRC_A | 0 |
| 6 | LLD_A | Not used |
| 7 | - | Reserved |
Alternately, the type-1 array descriptor DESC_A contains the
following:
| DESC_A( ) | Symbolic name | Value |
|---|---|---|
| 1 | DTYPE_A | DTYPE_A = 1 for 1 × p |
| 2 | CTXT_A | BLACS context |
| 3 | M_A | 1 |
| 4 | N_A | 7 |
| 5 | MB_A | 1 |
| 6 | NB_A | 2 |
| 7 | RSRC_A | 0 |
| 8 | CSRC_A | 0 |
| 9 | LLD_A | Not used |
For more information on how to store general tridiagonal matrices, see the ESSL Version 3 Guide and Reference manual.
A symmetric tridiagonal matrix, represented as two vectors, must be distributed over a one-dimensional process grid using a block-cyclic data distribution. Because vectors are one-dimensional data structures, you can use a type-501, type-502, or type-1 array descriptor regardless of whether the process grid is p × 1 or 1 × p.
The first part of this section shows how to distribute a symmetric tridiagonal matrix A over a p × 1 process grid. The second part shows how to distribute the same matrix over a 1 × p process grid. In both cases, the values contained in the corresponding local arrays are identical.
Assume the following symmetric tridiagonal matrix A of size 7 × 7:
* *
| 10 1 0 0 0 0 0 |
| 1 20 2 0 0 0 0 |
| 0 2 30 3 0 0 0 |
| 0 0 3 40 4 0 0 |
| 0 0 0 4 50 5 0 |
| 0 0 0 0 5 60 6 |
| 0 0 0 0 0 6 70 |
* *
Matrix A is stored in parallel-symmetric-tridiagonal storage mode in the following two vectors:
d = (10, 20, 30, 40, 50, 60, 70)
e = (1, 2, 3, 4, 5, 6, *)
The symmetric tridiagonal matrix A is stored in parallel-symmetric-tridiagonal storage mode in vectors d and e.
Following is global vector d:
B,D 0
* *
| 10 |
0 | 20 |
| 30 |
| -- |
1 | 40 |
| 50 |
| 60 |
| -- |
2 | 70 |
* *
Following is global vector e:
B,D 0
* *
| 1 |
0 | 2 |
| 3 |
| - |
1 | 4 |
| 5 |
| 6 |
| - |
2 | * |
* *
Following is a column-oriented, 2 × 1 process grid:
B,D  |    0
-----| -------
0    |   P00
2    |
-----| -------
1    |   P10
The arrays are block-cyclically distributed over the 2 × 1 process grid.
Following are the local arrays for D:
p,q | 0
-----|----
| 10
0 | 20
| 30
| 70
-----|----
1 | 40
| 50
| 60
Following are the local arrays for E:
p,q | 0
-----|---
| 1
0 | 2
| 3
| *
-----|---
1 | 4
| 5
| 6
where * means you do not have to store a value in that position in the local array. However, these storage positions are required.
The type-502 array descriptor DESC_A contains the
following:
| DESC_A( ) | Symbolic name | Value |
|---|---|---|
| 1 | DTYPE_A | DTYPE_A = 502 for p × 1 |
| 2 | CTXT_A | BLACS context |
| 3 | M_A | 7 |
| 4 | MB_A | 3 |
| 5 | RSRC_A | 0 |
| 6 | LLD_A | Not used |
| 7 | - | Reserved |
Alternately, the type-1 array descriptor DESC_A contains the
following:
| DESC_A( ) | Symbolic name | Value |
|---|---|---|
| 1 | DTYPE_A | DTYPE_A = 1 for p × 1 |
| 2 | CTXT_A | BLACS context |
| 3 | M_A | 7 |
| 4 | N_A | 1 |
| 5 | MB_A | 3 |
| 6 | NB_A | 1 |
| 7 | RSRC_A | 0 |
| 8 | CSRC_A | 0 |
| 9 | LLD_A | Not used |
The symmetric tridiagonal matrix A is stored in parallel-symmetric-tridiagonal storage mode in vectors d and e. Because vectors are one-dimensional data structures, the block-cyclically distributed arrays on a 1 × p process grid are identical to the block-cyclically distributed arrays on a p × 1 process grid.
Following is global vector d:
B,D 0 1 2
* *
0 | 10 20 30 | 40 50 60 | 70 |
* *
Following is global vector e:
B,D 0 1 2
* *
0 | 1 2 3 | 4 5 6 | * |
* *
Following is a row-oriented, 1 × 2 process grid:
B,D  |   0 2   |  1
-----| ------- |-----
0    |   P00   | P01
The arrays are block-cyclically distributed over the 1 × 2 process grid.
Following are the local arrays for D:
p,q  |      0       |    1
-----|--------------|---------
0    | 10 20 30 70  | 40 50 60
Following are the local arrays for E:
p,q  |    0    |   1
-----|---------|------
0    | 1 2 3 * | 4 5 6
where "*" marks a position that does not need to hold a value. The storage position itself must still be present in the local array.
The type-501 array descriptor DESC_A contains the following:
| DESC_A( ) | Symbolic name | Value |
|---|---|---|
| 1 | DTYPE_A | DTYPE_A = 501 for 1 × p |
| 2 | CTXT_A | BLACS context |
| 3 | N_A | 7 |
| 4 | NB_A | 3 |
| 5 | CSRC_A | 0 |
| 6 | LLD_A | Not used |
| 7 | - | Reserved |
Alternatively, the type-1 array descriptor DESC_A contains the following:
| DESC_A( ) | Symbolic name | Value |
|---|---|---|
| 1 | DTYPE_A | DTYPE_A = 1 for 1 × p |
| 2 | CTXT_A | BLACS context |
| 3 | M_A | 1 |
| 4 | N_A | 7 |
| 5 | MB_A | 1 |
| 6 | NB_A | 3 |
| 7 | RSRC_A | 0 |
| 8 | CSRC_A | 0 |
| 9 | LLD_A | Not used |
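The statement that a vector is distributed identically on a 1 × p and a p × 1 process grid follows from the index mapping, which depends only on the block size and the number of processes. The following Python sketch illustrates this under stated assumptions: indices are 0-based, and the function name `global_to_local` is hypothetical rather than a Parallel ESSL routine.

```python
def global_to_local(k, block_size, nprocs, src=0):
    """Map a 0-based global index k to (owning process, 0-based local index)."""
    block = k // block_size            # which global block holds k
    owner = (src + block) % nprocs     # blocks are dealt out cyclically
    local_block = block // nprocs      # blocks the owner already holds
    return owner, local_block * block_size + k % block_size


# N_A = 7, NB_A = 3, two processes. The mapping never asks whether the
# processes form a 1 x 2 grid or a 2 x 1 grid.
mapping = [global_to_local(k, block_size=3, nprocs=2) for k in range(7)]
print(mapping)
```

In this sketch the last element of d (global index 6, value 70) maps to process 0 at local index 3, which is the fourth position of the local array on P00 in both grid orientations.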
This section shows how to block-cyclically distribute a general matrix B containing the multiple right-hand sides for the Banded Linear Algebraic Equations subroutines.
Following is the global matrix B:
B,D 0
* *
0 | 11 12 13 |
| 21 22 23 |
| -------- |
1 | 31 32 33 |
| 41 42 43 |
| -------- |
2 | 51 52 53 |
| 61 62 63 |
| -------- |
3 | 71 72 73 |
* *
Following is a 3 × 1 process grid:
B,D  |    0
-----| -------
0    |   P00
3    |
-----| -------
1    |   P10
-----| -------
2    |   P20
Following are the local arrays:
p,q | 0
-----|----------
0 | 11 12 13
| 21 22 23
| 71 72 73
-----|----------
1 | 31 32 33
| 41 42 43
-----|----------
2 | 51 52 53
| 61 62 63
The type-502 array descriptor DESC_B contains the following:
| DESC_B( ) | Symbolic name | Value |
|---|---|---|
| 1 | DTYPE_B | DTYPE_B = 502 for p × 1 |
| 2 | CTXT_B | BLACS context |
| 3 | M_B | 7 |
| 4 | MB_B | 2 |
| 5 | RSRC_B | 0 |
| 6 | LLD_B | 3 (for P00), 2 (for P10 and P20) |
| 7 | -- | Reserved |
Alternatively, the type-1 array descriptor DESC_B contains the following:
| DESC_B( ) | Symbolic name | Value |
|---|---|---|
| 1 | DTYPE_B | DTYPE_B = 1 for p × 1 |
| 2 | CTXT_B | BLACS context |
| 3 | M_B | 7 |
| 4 | N_B | 3 |
| 5 | MB_B | 2 |
| 6 | NB_B | 1 |
| 7 | RSRC_B | 0 |
| 8 | CSRC_B | 0 |
| 9 | LLD_B | 3 (for P00), 2 (for P10 and P20) |
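The LLD_B values in these descriptors equal the number of rows of B that each process holds. The following Python sketch computes that count. It follows the logic of the ScaLAPACK tool function NUMROC, rewritten here for illustration only; it is not a Parallel ESSL interface, and it assumes that the leading dimension is chosen as small as possible.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Number of rows or columns of a distributed dimension owned by iproc.

    n: global extent, nb: block size, iproc: process coordinate,
    isrcproc: coordinate of the process owning the first block,
    nprocs: number of processes along this dimension.
    """
    dist = (nprocs + iproc - isrcproc) % nprocs  # distance from the source
    nblocks = n // nb                            # number of full blocks
    count = (nblocks // nprocs) * nb             # full rounds of full blocks
    extra = nblocks % nprocs                     # leftover full blocks
    if dist < extra:
        count += nb                              # one more full block
    elif dist == extra:
        count += n % nb                          # the trailing partial block
    return count


# M_B = 7, MB_B = 2, RSRC_B = 0, three process rows.
local_rows = [numroc(7, 2, p, 0, 3) for p in range(3)]
print(local_rows)
```

For this example the sketch yields 3 rows on P00 and 2 rows on each of P10 and P20, in agreement with the LLD_B entries above.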
This section shows how to distribute general, symmetric, and upper triangular matrices over a two-dimensional process grid using block-cyclic distribution.
This example shows how the data for a global matrix A with block size of 2 × 3 is distributed block-cyclically over the entire 2 × 3 process grid. Assume the following 9 × 26 global matrix A with 45 blocks:
B,D 0 1 2 3 4 5 6 7 8
* *
0 | 112 5 7 | 8 9 3 | 7 5 1 | 3 2 1 | 8 98 4 | 8 9 4 | 1 3 10 | 3 3 10 | 5 3 |
| 116 9 6 | 7 2 3 | 6 5 6 | 4 3 2 | 7 2 111 | 7 2 1 | 7 6 15 | 7 6 15 | 7 6 |
| ---------|---------|---------|---------|------------|---------|----------|----------|----- |
1 | 1 5 7 | 1 9 3 | 1 5 1 | 1 2 1 | 1 9 4 | 1 9 4 | 5 8 10 | 3 3 11 | 5 3 |
| 6 9 6 | 7 2 3 | 6 5 6 | 4 3 2 | 7 2 1 | 7 2 1 | 7 6 19 | 7 1 15 | 7 2 |
| ---------|---------|---------|---------|------------|---------|----------|----------|----- |
2 | 2 5 7 | 2 9 3 | 2 5 1 | 2 2 1 | 2 9 4 | 2 9 4 | 1 8 10 | 2 3 11 | 3 3 |
| 6 9 6 | 7 2 3 | 6 5 6 | 4 3 2 | 7 2 1 | 7 2 1 | 7 3 19 | 7 4 15 | 7 8 |
| ---------|---------|---------|---------|------------|---------|----------|----------|----- |
3 | 3 5 7 | 3 9 3 | 3 5 1 | 3 2 1 | 3 9 4 | 3 9 4 | 9 8 10 | 2 3 11 | 3 3 |
| 6 9 6 | 7 2 3 | 6 5 6 | 4 3 2 | 7 2 1 | 7 2 1 | 1 3 49 | 7 4 55 | 7 3 |
| ---------|---------|---------|---------|------------|---------|----------|----------|----- |
4 | 20 1 9 | 4 5 6 | 9 8 7 | 1 4 3 | 1 15 21 | 4 7 6 | 9 8 12 | 3 9 18 | 2 4 |
* *
Two-dimensional, 2 × 3 process grid:
B,D  |  0 3 6  |  1 4 7  |  2 5 8
-----| ------- | ------- |-------
0    |   P00   |   P01   |   P02
2    |         |         |
4    |         |         |
-----| ------- | ------- |-------
1    |   P10   |   P11   |   P12
3    |         |         |
Local arrays:
p,q | 0 | 1 | 2
-----|-----------------------|-------------------------|------------------
| 112 5 7 3 2 1 1 3 10 | 8 9 3 8 98 4 3 3 10 | 7 5 1 8 9 4 5 3
| 116 9 6 4 3 2 7 6 15 | 7 2 3 7 2 111 7 6 15 | 6 5 6 7 2 1 7 6
0 | 2 5 7 2 2 1 1 8 10 | 2 9 3 2 9 4 2 3 11 | 2 5 1 2 9 4 3 3
| 6 9 6 4 3 2 7 3 19 | 7 2 3 7 2 1 7 4 15 | 6 5 6 7 2 1 7 8
| 20 1 9 1 4 3 9 8 12 | 4 5 6 1 15 21 3 9 18 | 9 8 7 4 7 6 2 4
-----|-----------------------|-------------------------|------------------
| 1 5 7 1 2 1 5 8 10 | 1 9 3 1 9 4 3 3 11 | 1 5 1 1 9 4 5 3
| 6 9 6 4 3 2 7 6 19 | 7 2 3 7 2 1 7 1 15 | 6 5 6 7 2 1 7 2
1 | 3 5 7 3 2 1 9 8 10 | 3 9 3 3 9 4 2 3 11 | 3 5 1 3 9 4 3 3
| 6 9 6 4 3 2 1 3 49 | 7 2 3 7 2 1 7 4 55 | 6 5 6 7 2 1 7 3
Array descriptor DESC_A contains the following:
| DESC_A( ) | Symbolic name | Value |
|---|---|---|
| 1 | DTYPE_A | 1 |
| 2 | CTXT_A | BLACS context |
| 3 | M_A | 9 |
| 4 | N_A | 26 |
| 5 | MB_A | 2 |
| 6 | NB_A | 3 |
| 7 | RSRC_A | 0 |
| 8 | CSRC_A | 0 |
| 9 | LLD_A | 5 (for P00, P01, and P02), 4 (for P10, P11, and P12) |
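The two-dimensional distribution applies the one-dimensional block-cyclic rule independently to rows and to columns. The following Python sketch shows the mapping for this example. The function names are hypothetical, indices are 0-based, and `local_shape` assumes RSRC_A = CSRC_A = 0 as in the descriptor above.

```python
def owner_and_local(i, j, mb, nb, nprow, npcol, rsrc=0, csrc=0):
    """Map 0-based global (i, j) to ((p, q), (local_i, local_j))."""
    bi, bj = i // mb, j // nb              # global block coordinates
    p = (rsrc + bi) % nprow                # owning process row
    q = (csrc + bj) % npcol                # owning process column
    li = (bi // nprow) * mb + i % mb       # row within the local array
    lj = (bj // npcol) * nb + j % nb       # column within the local array
    return (p, q), (li, lj)


def local_shape(m, n, mb, nb, nprow, npcol, p, q):
    """Count the rows and columns that land on process (p, q)."""
    rows = sum(1 for i in range(m) if (i // mb) % nprow == p)
    cols = sum(1 for j in range(n) if (j // nb) % npcol == q)
    return rows, cols


M, N, MB, NB, NPROW, NPCOL = 9, 26, 2, 3, 2, 3
shapes = {(p, q): local_shape(M, N, MB, NB, NPROW, NPCOL, p, q)
          for p in range(NPROW) for q in range(NPCOL)}
print(shapes)
print(owner_and_local(8, 24, MB, NB, NPROW, NPCOL))
```

The row counts (5 for process row 0, 4 for process row 1) correspond to the LLD_A values in the descriptor. The last call places the element in the last global row and column 24 (0-based) on P02 at local position (4, 6), which is the value 2 in the last row of the local array for P02.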
This example shows how the data for a global symmetric matrix A with block size of 3 × 3 is distributed block-cyclically over a 3 × 2 process grid. Assume the following 18 × 18 global symmetric matrix A with 36 blocks:
B,D 0 1 2 3 4 5
* *
| 1 2 3 | 4 5 6 | 7 8 9 | 10 11 12 | 13 14 15 | 16 17 18 |
0 | 2 10 11 | 12 13 14 | 15 16 17 | 18 19 20 | 21 22 23 | 24 25 26 |
| 3 11 20 | 21 22 23 | 24 25 26 | 27 28 29 | 30 31 32 | 33 34 35 |
| ----------|------------|------------|------------|------------|---------- |
| 4 12 21 | 2 3 5 | 7 11 13 | 17 19 23 | 29 31 37 | 41 43 47 |
1 | 5 13 22 | 3 1 4 | 9 16 25 | 36 49 64 | 81 10 12 | 14 16 19 |
| 6 14 23 | 5 4 5 | 6 10 11 | 15 16 20 | 21 25 26 | 30 31 35 |
| ----------|------------|------------|------------|------------|---------- |
| 7 15 24 | 7 9 6 | 1 2 3 | 4 5 6 | 7 8 9 | 10 11 12 |
2 | 8 16 25 | 11 16 10 | 2 11 13 | 15 17 19 | 21 23 25 | 27 29 31 |
| 9 17 26 | 13 25 11 | 3 13 2 | 4 6 8 | 10 12 14 | 16 18 20 |
| ----------|------------|------------|------------|------------|---------- |
| 10 18 27 | 17 36 15 | 4 15 4 | 3 6 9 | 2 4 6 | 3 6 9 |
3 | 11 19 28 | 19 49 16 | 5 17 6 | 6 1 2 | 3 4 5 | 6 7 8 |
| 12 20 29 | 23 64 20 | 6 19 8 | 9 2 1 | 3 5 7 | 9 11 13 |
| ----------|------------|------------|------------|------------|---------- |
| 13 21 30 | 29 81 21 | 7 21 10 | 2 3 3 | 20 22 21 | 24 23 25 |
4 | 14 22 31 | 31 10 25 | 8 23 12 | 4 4 5 | 22 4 5 | 6 9 10 |
| 15 23 32 | 37 12 26 | 9 25 14 | 6 5 7 | 21 5 3 | 2 7 8 |
| ----------|------------|------------|------------|------------|---------- |
| 16 24 33 | 41 14 30 | 10 27 16 | 3 6 9 | 24 6 2 | 4 11 15 |
5 | 17 25 34 | 43 16 31 | 11 29 18 | 6 7 11 | 23 9 7 | 11 17 13 |
| 18 26 35 | 47 19 35 | 12 31 20 | 9 8 13 | 25 10 8 | 15 13 21 |
* *
Two-dimensional, 3 × 2 process grid:
B,D  |  0 2 4  | 1 3 5
-----| ------- |-----
0    |   P00   |  P01
3    |         |
-----| ------- |-----
1    |   P10   |  P11
4    |         |
-----| ------- |-----
2    |   P20   |  P21
5    |         |
The symmetric matrix is distributed block-cyclically in lower storage mode over a 3 × 2 process grid:
p,q | 0 | 1
-----|-----------------------------|-----------------------------
| 1 * * * * * * * * | * * * * * * * * *
| 2 10 * * * * * * * | * * * * * * * * *
| 3 11 20 * * * * * * | * * * * * * * * *
0 | 10 18 27 4 15 4 * * * | 17 36 15 3 * * * * *
| 11 19 28 5 17 6 * * * | 19 49 16 6 1 * * * *
| 12 20 29 6 19 8 * * * | 23 64 20 9 2 1 * * *
-----|-----------------------------|-----------------------------
| 4 12 21 * * * * * * | 2 * * * * * * * *
| 5 13 22 * * * * * * | 3 1 * * * * * * *
| 6 14 23 * * * * * * | 5 4 5 * * * * * *
1 | 13 21 30 7 21 10 20 * * | 29 81 21 2 3 3 * * *
| 14 22 31 8 23 12 22 4 * | 31 10 25 4 4 5 * * *
| 15 23 32 9 25 14 21 5 3 | 37 12 26 6 5 7 * * *
-----|-----------------------------|-----------------------------
| 7 15 24 1 * * * * * | 7 9 6 * * * * * *
| 8 16 25 2 11 * * * * | 11 16 10 * * * * * *
| 9 17 26 3 13 2 * * * | 13 25 11 * * * * * *
2 | 16 24 33 10 27 16 24 6 2 | 41 14 30 3 6 9 4 * *
| 17 25 34 11 29 18 23 9 7 | 43 16 31 6 7 11 11 17 *
| 18 26 35 12 31 20 25 10 8 | 47 19 35 9 8 13 15 13 21
where * marks a position that does not need to hold a value. The storage position itself must still be present in the local array.
Notice that the local arrays are not symmetric.
Array descriptor DESC_A contains the following:
| DESC_A( ) | Symbolic name | Value |
|---|---|---|
| 1 | DTYPE_A | 1 |
| 2 | CTXT_A | BLACS context |
| 3 | M_A | 18 |
| 4 | N_A | 18 |
| 5 | MB_A | 3 |
| 6 | NB_A | 3 |
| 7 | RSRC_A | 0 |
| 8 | CSRC_A | 0 |
| 9 | LLD_A | 6 |
For more information on how to store symmetric matrices, see the ESSL Version 3 Guide and Reference manual.
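Whether a local position holds a value in lower storage mode depends only on the global indices of that position: the value is stored when the global row index is greater than or equal to the global column index. The following Python sketch prints the storage pattern of one process for this example. The function name `local_mask` is hypothetical, and the sketch assumes RSRC_A = CSRC_A = 0.

```python
def local_mask(n, nb, nprow, npcol, p, q, uplo="L"):
    """Build the local storage pattern of an n x n matrix on process (p, q).

    "x" marks a position that holds a value; "*" marks a position that is
    allocated but does not need to hold a value.
    """
    rows = [i for i in range(n) if (i // nb) % nprow == p]
    cols = [j for j in range(n) if (j // nb) % npcol == q]
    if uplo == "L":
        stored = lambda i, j: i >= j   # lower storage mode
    else:
        stored = lambda i, j: i <= j   # upper storage mode
    return [" ".join("x" if stored(i, j) else "*" for j in cols) for i in rows]


# 18 x 18 matrix, 3 x 3 blocks, 3 x 2 process grid, process P01.
for line in local_mask(18, 3, 3, 2, p=0, q=1):
    print(line)
```

For P01 the first three local rows contain no stored values, and the last three rows show the staircase that appears in the local array above. This is why the local arrays are not themselves symmetric or triangular.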
This example shows how the data for a global upper triangular matrix A with block size of 2 × 2 is distributed block-cyclically over a 2 × 3 process grid. Assume the following 12 × 12 global upper triangular matrix A with 36 blocks:
B,D 0 1 2 3 4 5
* *
0 | 2 1 | 2 13 | 13 10 | 15 21 | 26 31 | 7 5 |
| 0 3 | 4 4 | 11 23 | 41 45 | 59 67 | 1 8 |
| -------|---------|---------|---------|---------|------- |
1 | 0 0 | 5 9 | 6 9 | 33 65 | 21 14 | 9 4 |
| 0 0 | 0 7 | 16 8 | 7 33 | 3 7 | 5 3 |
| -------|---------|---------|---------|---------|------- |
2 | 0 0 | 0 0 | 11 25 | 10 5 | 23 7 | 10 6 |
| 0 0 | 0 0 | 0 13 | 36 12 | 3 13 | 5 6 |
| -------|---------|---------|---------|---------|------- |
3 | 0 0 | 0 0 | 0 0 | 17 49 | 14 1 | 7 2 |
| 0 0 | 0 0 | 0 0 | 0 19 | 64 16 | 1 7 |
| -------|---------|---------|---------|---------|------- |
4 | 0 0 | 0 0 | 0 0 | 0 0 | 23 81 | 6 15 |
| 0 0 | 0 0 | 0 0 | 0 0 | 0 29 | 9 4 |
| -------|---------|---------|---------|---------|------- |
5 | 0 0 | 0 0 | 0 0 | 0 0 | 0 0 | 5 3 |
| 0 0 | 0 0 | 0 0 | 0 0 | 0 0 | 0 4 |
* *
Two-dimensional, 2 × 3 process grid:
B,D  |  0 3    |  1 4    |  2 5
-----| ------- | ------- |-------
0    |   P00   |   P01   |   P02
2    |         |         |
4    |         |         |
-----| ------- | ------- |-------
1    |   P10   |   P11   |   P12
3    |         |         |
5    |         |         |
The following local arrays are distributed block-cyclically in upper-triangular storage mode over a 2 × 3 process grid:
p,q | 0 | 1 | 2
-----|--------------|---------------|--------------
| 2 1 15 21 | 2 13 26 31 | 13 10 7 5
| * 3 41 45 | 4 4 59 67 | 11 23 1 8
| * * 10 5 | * * 23 7 | 11 25 10 6
0 | * * 36 12 | * * 3 13 | * 13 5 6
| * * * * | * * 23 81 | * * 6 15
| * * * * | * * * 29 | * * 9 4
-----|--------------|---------------|--------------
| * * 33 65 | 5 9 21 14 | 6 9 9 4
| * * 7 33 | * 7 3 7 | 16 8 5 3
| * * 17 49 | * * 14 1 | * * 7 2
1 | * * * 19 | * * 64 16 | * * 1 7
| * * * * | * * * * | * * 5 3
| * * * * | * * * * | * * * 4
where "*" marks a position that does not need to hold a value. The storage position itself must still be present in the local array.
Notice the local arrays are not upper triangular.
Array descriptor DESC_A contains the following:
| DESC_A( ) | Symbolic name | Value |
|---|---|---|
| 1 | DTYPE_A | 1 |
| 2 | CTXT_A | BLACS context |
| 3 | M_A | 12 |
| 4 | N_A | 12 |
| 5 | MB_A | 2 |
| 6 | NB_A | 2 |
| 7 | RSRC_A | 0 |
| 8 | CSRC_A | 0 |
| 9 | LLD_A | 6 |
For more information on how to store triangular matrices, see the ESSL Version 3 Guide and Reference manual.
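The local arrays of the upper triangular example can be rebuilt directly from the global matrix. The following Python sketch does so for a single process. The function name `local_upper` is hypothetical, indices are 0-based, and the sketch assumes RSRC_A = CSRC_A = 0.

```python
# The 12 x 12 global upper triangular matrix A from the example.
A = [
    [2, 1, 2, 13, 13, 10, 15, 21, 26, 31, 7, 5],
    [0, 3, 4, 4, 11, 23, 41, 45, 59, 67, 1, 8],
    [0, 0, 5, 9, 6, 9, 33, 65, 21, 14, 9, 4],
    [0, 0, 0, 7, 16, 8, 7, 33, 3, 7, 5, 3],
    [0, 0, 0, 0, 11, 25, 10, 5, 23, 7, 10, 6],
    [0, 0, 0, 0, 0, 13, 36, 12, 3, 13, 5, 6],
    [0, 0, 0, 0, 0, 0, 17, 49, 14, 1, 7, 2],
    [0, 0, 0, 0, 0, 0, 0, 19, 64, 16, 1, 7],
    [0, 0, 0, 0, 0, 0, 0, 0, 23, 81, 6, 15],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 29, 9, 4],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 3],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4],
]


def local_upper(a, nb, nprow, npcol, p, q):
    """Extract the local array of process (p, q), with "*" below the diagonal."""
    n = len(a)
    rows = [i for i in range(n) if (i // nb) % nprow == p]
    cols = [j for j in range(n) if (j // nb) % npcol == q]
    return [[a[i][j] if i <= j else "*" for j in cols] for i in rows]


# 2 x 2 blocks on the 2 x 3 process grid, process P10.
for row in local_upper(A, nb=2, nprow=2, npcol=3, p=1, q=0):
    print(row)
```

The rows printed for P10 match the lower-left local array shown above: the first two local columns hold no values, and the diagonal element 19 sits in the fourth local row, not on the diagonal of the local array.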
For the Fortran 90 and Fortran 77 sparse linear algebraic equation subroutines, you must use the sparse utility subroutines provided with Parallel ESSL to build the sparse matrices on each process in the process grid. This section shows the calling sequence arguments associated with the sparse matrix A.
This section describes the calling sequence arguments associated with a sparse matrix A.