pylops_mpi.basicoperators.MPIMatrixMult
- pylops_mpi.basicoperators.MPIMatrixMult(A, M, saveAt=False, base_comm=<mpi4py.MPI.Intracomm object>, kind='summa', dtype='float64')
MPI Distributed Matrix Multiplication Operator
This operator performs distributed matrix-matrix multiplication using either SUMMA (Scalable Universal Matrix Multiplication Algorithm [1]) or a 1D block-row decomposition algorithm, depending on the kind parameter.
- Parameters:
- A
numpy.ndarray
Local block of the matrix operator.
- M
int
Global number of columns in the operand and result matrices.
- saveAt
bool, optional
If True, store both \(\mathbf{A}\) and its conjugate transpose \(\mathbf{A}^H\) to accelerate adjoint operations (uses twice the memory). Default is False.
- base_comm
mpi4py.MPI.Comm, optional
MPI communicator to use. Defaults to MPI.COMM_WORLD.
- kind
str, optional
Algorithm used to perform matrix multiplication: 'block' for the 1D block-row/column decomposition, or 'summa' for the SUMMA algorithm. Default is 'summa'.
- dtype
str, optional
Type of elements in the input array. Defaults to numpy.float64.
- Raises:
- NotImplementedError
If kind is not one of 'summa' or 'block'.
- Exception
If the MPI communicator does not form a compatible grid for the selected algorithm.
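The two error conditions can be illustrated with a small validation helper. This is a hypothetical sketch, not pylops_mpi's actual code: check_grid is an illustrative name, and it assumes the only grid constraint is the one described in the Notes, namely that 'summa' needs a square \([\sqrt{P} \times \sqrt{P}]\) grid while 'block' uses a \([1 \times P]\) grid.

```python
import math


def check_grid(kind: str, nprocs: int) -> int:
    """Hypothetical check mirroring the documented Raises section.

    Returns the grid side sqrt(P) for 'summa' or P for 'block'.
    Assumption: a square grid is the only constraint for 'summa' and
    'block' accepts any number of ranks; the real library may check more.
    """
    if kind not in ("summa", "block"):
        raise NotImplementedError(f"kind={kind!r} is not one of 'summa' or 'block'")
    if kind == "summa":
        side = math.isqrt(nprocs)
        if side * side != nprocs:
            # SUMMA arranges ranks on a [sqrt(P) x sqrt(P)] grid
            raise Exception(f"{nprocs} ranks cannot form a square process grid")
        return side
    # 'block' arranges ranks on a [1 x P] grid
    return nprocs


print(check_grid("summa", 4))  # 2
print(check_grid("block", 3))  # 3
```

With 6 ranks, check_grid("summa", 6) raises, because 6 is not a perfect square.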
Notes
The forward operator computes:
\[\mathbf{Y} = \mathbf{A} \cdot \mathbf{X}\]
where:
\(\mathbf{A}\) is the distributed operator matrix of shape \([N \times K]\)
\(\mathbf{X}\) is the distributed operand matrix of shape \([K \times M]\)
\(\mathbf{Y}\) is the resulting distributed matrix of shape \([N \times M]\)
The adjoint (conjugate-transpose) operation computes:
\[\mathbf{X}_{adj} = \mathbf{A}^H \cdot \mathbf{Y}\]
where \(\mathbf{A}^H\) is the complex-conjugate transpose of \(\mathbf{A}\).
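To make the forward and adjoint definitions concrete, the sketch below applies them to small dense matrices in plain Python (no MPI, no distribution) and runs a dot test: for any \(\mathbf{X}\) and \(\mathbf{Y}\), \(\langle \mathbf{A}\mathbf{X}, \mathbf{Y}\rangle = \langle \mathbf{X}, \mathbf{A}^H\mathbf{Y}\rangle\) under the Frobenius inner product. The helpers matmul, conj_transpose, and inner are illustrative names, not part of pylops_mpi.

```python
def matmul(A, B):
    """Dense product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def conj_transpose(A):
    """Complex-conjugate transpose A^H."""
    return [[complex(x).conjugate() for x in col] for col in zip(*A)]


def inner(U, V):
    """Frobenius inner product <U, V> = sum of conj(U_ij) * V_ij."""
    return sum(complex(u).conjugate() * v
               for ru, rv in zip(U, V) for u, v in zip(ru, rv))


# Small complex example with N=2, K=3, M=2
A = [[1 + 1j, 2, 0], [0, 1j, 3]]        # N x K
X = [[1, 2], [0, 1j], [1 - 1j, 0]]      # K x M
Y = [[2, 1], [1j, 1]]                   # N x M

AX = matmul(A, X)                       # forward: Y = A X, shape N x M
AH = conj_transpose(A)                  # A^H, computed once and reused
AHY = matmul(AH, Y)                     # adjoint: X_adj = A^H Y, shape K x M

# Dot test: <A X, Y> should equal <X, A^H Y> up to rounding
print(abs(inner(AX, Y) - inner(X, AHY)) < 1e-12)  # True
```

Keeping AH around instead of recomputing it on every adjoint call is analogous to the memory-for-speed trade-off that saveAt=True describes.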
Depending on kind, the distribution layouts of the operator, model, and data differ as follows:
- Summa:
2D block-grid distribution over a square process grid \([\sqrt{P} \times \sqrt{P}]\):
\(\mathbf{A}\), \(\mathbf{X}\), and \(\mathbf{Y}\) are partitioned into \([N_{loc} \times K_{loc}]\), \([K_{loc} \times M_{loc}]\), and \([N_{loc} \times M_{loc}]\) tiles on each rank, respectively.
Each SUMMA iteration broadcasts row- and column-blocks of \(\mathbf{A}\) and \(\mathbf{X}\) (forward) or \(\mathbf{Y}\) (adjoint) and accumulates local partial products.
- Block:
1D block-row distribution over a \([1 \times P]\) grid:
\(\mathbf{A}\) is partitioned into \([N_{loc} \times K]\) blocks across ranks.
\(\mathbf{X}\) and \(\mathbf{Y}\) are partitioned column-wise into \([K \times M_{loc}]\) and \([N \times M_{loc}]\) blocks, respectively.
Local multiplication is followed by row-wise gather (forward) or allreduce (adjoint) across ranks.
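Both layouts can be simulated in a single Python process to check that they reproduce \(\mathbf{A}\mathbf{X}\) and \(\mathbf{A}^H\mathbf{Y}\). This is a sketch under simplifying assumptions, not the pylops_mpi implementation: tiles are sliced from full list-of-lists matrices instead of living on separate ranks; broadcasts, gathers, and allreduces are replaced by plain reads, concatenation, and summation; and the block example models one group of ranks that share the same column block of \(\mathbf{X}\) (so \(M_{loc} = M\) here). All function names are hypothetical.

```python
import math


def matmul(A, B):
    """Dense product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def conj_transpose(A):
    """Complex-conjugate transpose A^H."""
    return [[complex(x).conjugate() for x in col] for col in zip(*A)]


def tile(A, rows, cols):
    """Sub-matrix of A selected by the index ranges rows and cols."""
    return [[A[i][j] for j in cols] for i in rows]


def splits(n, parts):
    """Split range(n) into `parts` contiguous, nearly equal ranges."""
    q, r = divmod(n, parts)
    out, start = [], 0
    for p in range(parts):
        size = q + (1 if p < r else 0)
        out.append(range(start, start + size))
        start += size
    return out


def summa_forward(A, X, P):
    """Simulate SUMMA forward Y = A X on a sqrt(P) x sqrt(P) grid of tiles."""
    s = math.isqrt(P)
    rN, rK, rM = splits(len(A), s), splits(len(X), s), splits(len(X[0]), s)
    Y = [[0] * len(X[0]) for _ in range(len(A))]
    for k in range(s):  # one SUMMA step per tile index along K
        for i in range(s):
            # With MPI, tile A(i, k) would be broadcast along process row i
            Aik = tile(A, rN[i], rK[k])
            for j in range(s):
                # ... and tile X(k, j) along process column j
                partial = matmul(Aik, tile(X, rK[k], rM[j]))
                # rank (i, j) accumulates its local partial product
                for a, gi in enumerate(rN[i]):
                    for b, gj in enumerate(rM[j]):
                        Y[gi][gj] += partial[a][b]
    return Y


def block_forward(A, X, P):
    """Rank r computes A_r X for its [N_loc x K] row block; stacking rows stands in for the gather."""
    cols = range(len(A[0]))
    local = [matmul(tile(A, rows, cols), X) for rows in splits(len(A), P)]
    return [row for Yr in local for row in Yr]


def block_adjoint(A, Y, P):
    """Rank r computes A_r^H Y_r; summing over ranks stands in for the allreduce."""
    K, M = len(A[0]), len(Y[0])
    Xadj = [[0] * M for _ in range(K)]
    for rows in splits(len(A), P):
        part = matmul(conj_transpose(tile(A, rows, range(K))), tile(Y, rows, range(M)))
        for i in range(K):
            for j in range(M):
                Xadj[i][j] += part[i][j]
    return Xadj


A = [[1, 2, 0, 1], [0, 1, 3, 2], [2, 0, 1, 1], [1, 1, 1, 0]]  # N=4, K=4
X = [[1, 0], [2, 1], [0, 3], [1, 1]]                          # K=4, M=2
Y = matmul(A, X)                                              # N=4, M=2

print(summa_forward(A, X, 4) == Y)                             # True
print(block_forward(A, X, 2) == Y)                             # True
print(block_adjoint(A, Y, 2) == matmul(conj_transpose(A), Y))  # True
```

The sketch reflects the asymmetry described above for the block layout: the forward pass only stacks row blocks of the result, while the adjoint has to sum a contribution from every rank.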
[1] van de Geijn, R. A., and Watts, J. “SUMMA: Scalable Universal Matrix Multiplication Algorithm”, 1995.