pylops_mpi.basicoperators.MPIMatrixMult

pylops_mpi.basicoperators.MPIMatrixMult(A, M, saveAt=False, base_comm=<mpi4py.MPI.Intracomm object>, kind='summa', dtype='float64')

MPI Distributed Matrix Multiplication Operator

This operator performs distributed matrix-matrix multiplication using either SUMMA (the Scalable Universal Matrix Multiplication Algorithm [1]) or a 1D block-row decomposition, as selected by the kind parameter.

Parameters:
A : numpy.ndarray

Local block of the matrix operator.

M : int

Global number of columns in the operand and result matrices.

saveAt : bool, optional

If True, store both \(\mathbf{A}\) and its conjugate transpose \(\mathbf{A}^H\) to accelerate adjoint operations (uses twice the memory). Default is False.

base_comm : mpi4py.MPI.Comm, optional

MPI communicator to use. Defaults to MPI.COMM_WORLD.

kind : str, optional

Algorithm used to perform matrix multiplication: 'block' for block-row-column decomposition, or 'summa' for the SUMMA algorithm. Default is 'summa'.

dtype : str, optional

Type of elements in input array. Defaults to numpy.float64.

Attributes:
shape : tuple

Operator shape

kind : str

Selected distributed matrix multiply algorithm ('block' or 'summa').

Raises:
NotImplementedError

If kind is not one of 'summa' or 'block'.

Exception

If the MPI communicator does not form a compatible grid for the selected algorithm.
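
For 'summa', the Notes describe a square \([\sqrt{P} \times \sqrt{P}]\) process grid, so the number of ranks has to be a perfect square. As a rough illustration only, the hypothetical helper below (is_square_grid is not part of pylops_mpi, and the library's own validation may differ) tests whether a given process count can form such a grid:

```python
import math


def is_square_grid(n_procs):
    """Return True if n_procs ranks can form a sqrt(P) x sqrt(P) grid.

    Hypothetical helper for illustration; not part of pylops_mpi.
    """
    if n_procs < 1:
        return False
    root = math.isqrt(n_procs)  # integer square root, avoids float rounding
    return root * root == n_procs


# Process counts between 1 and 16 that can form a square grid
square_ok = [p for p in range(1, 17) if is_square_grid(p)]
```

With mpi4py, the process count would typically come from base_comm.Get_size().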

Notes

The forward operator computes:

\[\mathbf{Y} = \mathbf{A} \cdot \mathbf{X}\]

where:

  • \(\mathbf{A}\) is the distributed operator matrix of shape \([N \times K]\)

  • \(\mathbf{X}\) is the distributed operand matrix of shape \([K \times M]\)

  • \(\mathbf{Y}\) is the resulting distributed matrix of shape \([N \times M]\)

The adjoint (conjugate-transpose) operation computes:

\[\mathbf{X}_{adj} = \mathbf{A}^H \cdot \mathbf{Y}\]

where \(\mathbf{A}^H\) is the complex-conjugate transpose of \(\mathbf{A}\).
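
The two definitions above can be checked on a single process with plain NumPy. The sketch below is not distributed and does not use pylops_mpi; the sizes and the helper names forward and adjoint are illustrative assumptions. It verifies the dot-test identity \(\langle \mathbf{A}\mathbf{X}, \mathbf{Y} \rangle = \langle \mathbf{X}, \mathbf{A}^H \mathbf{Y} \rangle\) that any forward/adjoint pair should satisfy:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, M = 6, 4, 5  # illustrative sizes

# Complex-valued matrices so that the conjugation in A^H matters
A = rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))
X = rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))
Y = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))


def forward(A, X):
    """Y = A X, shape [N x M]."""
    return A @ X


def adjoint(A, Y):
    """X_adj = A^H Y, shape [K x M]."""
    return A.conj().T @ Y


Y_fwd = forward(A, X)
X_adj = adjoint(A, Y)

# np.vdot conjugates its first argument and flattens both inputs,
# so it computes the matrix inner product <U, V> = sum(conj(U) * V)
lhs = np.vdot(Y_fwd, Y)   # <A X, Y>
rhs = np.vdot(X, X_adj)   # <X, A^H Y>
dot_test_passed = bool(np.isclose(lhs, rhs))
```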

Depending on the choice of kind, the distribution layouts of the operator, model, and data differ as follows:

SUMMA:

2D block-grid distribution over a square process grid \([\sqrt{P} \times \sqrt{P}]\):

  • \(\mathbf{A}\), \(\mathbf{X}\), and \(\mathbf{Y}\) are partitioned into \([N_{loc} \times K_{loc}]\), \([K_{loc} \times M_{loc}]\), and \([N_{loc} \times M_{loc}]\) tiles on each rank, respectively.

  • Each SUMMA iteration broadcasts row- and column-blocks of \(\mathbf{A}\) and \(\mathbf{X}\) (forward) or \(\mathbf{Y}\) (adjoint) and accumulates local partial products.
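
The tile arithmetic behind SUMMA can be emulated serially. The sketch below is a single-process NumPy emulation under stated assumptions, not the library's implementation: the helper names (split_tiles, summa_forward, summa_adjoint) are hypothetical, the "broadcasts" are replaced by direct indexing into nested lists of tiles, and p stands for \(\sqrt{P}\).

```python
import numpy as np


def split_tiles(B, p):
    """Split B into a p x p nested list of tiles (rows first, then columns)."""
    return [np.array_split(rows, p, axis=1)
            for rows in np.array_split(B, p, axis=0)]


def summa_forward(A, X, p):
    """Serial emulation of SUMMA for Y = A @ X on a p x p virtual grid."""
    A_t, X_t = split_tiles(A, p), split_tiles(X, p)
    dtype = np.result_type(A, X)
    # Virtual rank (i, j) owns one [N_loc x M_loc] tile of Y
    Y_t = [[np.zeros((A_t[i][0].shape[0], X_t[0][j].shape[1]), dtype=dtype)
            for j in range(p)] for i in range(p)]
    for k in range(p):  # one iteration per block of the K dimension
        for i in range(p):
            for j in range(p):
                # A_t[i][k] stands for the tile broadcast along grid row i,
                # X_t[k][j] for the tile broadcast along grid column j
                Y_t[i][j] += A_t[i][k] @ X_t[k][j]
    return np.block(Y_t)


def summa_adjoint(A, Y, p):
    """Serial emulation of the adjoint X_adj = A^H @ Y on the same grid."""
    A_t, Y_t = split_tiles(A, p), split_tiles(Y, p)
    dtype = np.result_type(A, Y)
    # Virtual rank (k, j) owns one [K_loc x M_loc] tile of X_adj
    X_t = [[np.zeros((A_t[0][k].shape[1], Y_t[0][j].shape[1]), dtype=dtype)
            for j in range(p)] for k in range(p)]
    for i in range(p):  # one iteration per block of the N dimension
        for k in range(p):
            for j in range(p):
                X_t[k][j] += A_t[i][k].conj().T @ Y_t[i][j]
    return np.block(X_t)


rng = np.random.default_rng(0)
N, K, M, p = 6, 4, 5, 2  # illustrative sizes on a 2 x 2 virtual grid
A = rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))
X = rng.standard_normal((K, M))
Y = rng.standard_normal((N, M))

Y_summa = summa_forward(A, X, p)
X_adj_summa = summa_adjoint(A, Y, p)
```

Here np.array_split yields nearly equal tiles when a dimension is not divisible by p; how the library pads or balances uneven tiles is not covered by this sketch.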

Block:

1D block-row distribution over a \([1 \times P]\) grid:

  • \(\mathbf{A}\) is partitioned into \([N_{loc} \times K]\) blocks across ranks.

  • \(\mathbf{X}\) and \(\mathbf{Y}\) are partitioned column-wise into \([K \times M_{loc}]\) and \([N \times M_{loc}]\) blocks, respectively.

  • Local multiplication is followed by row-wise gather (forward) or allreduce (adjoint) across ranks.
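
The gather and allreduce steps of the block layout can also be emulated on one process. The sketch below is an assumption-laden illustration rather than the library's code: block_forward and block_adjoint are hypothetical names, np.vstack stands in for the row-wise gather, a Python sum stands in for allreduce(SUM), and every row block of \(\mathbf{A}\) is simply paired with every column block of \(\mathbf{X}\), which may not match the actual mapping of blocks to ranks.

```python
import numpy as np


def block_forward(A_blocks, X_blocks):
    """Emulate Y = A @ X with row blocks of A and column blocks of X."""
    Y_blocks = []
    for X_j in X_blocks:
        # Local products, each of shape [N_loc x M_loc]
        partial = [A_i @ X_j for A_i in A_blocks]
        # Row-wise gather: stack the pieces into one [N x M_loc] block of Y
        Y_blocks.append(np.vstack(partial))
    return Y_blocks


def block_adjoint(A_blocks, Y_blocks):
    """Emulate X_adj = A^H @ Y with the same blocks."""
    # Row offsets of each A block inside the global N dimension
    edges = np.cumsum([0] + [A_i.shape[0] for A_i in A_blocks])
    X_blocks = []
    for Y_j in Y_blocks:
        # Local products A_i^H @ (matching rows of Y_j), each [K x M_loc]
        partial = [A_i.conj().T @ Y_j[edges[i]:edges[i + 1]]
                   for i, A_i in enumerate(A_blocks)]
        # Allreduce(SUM): add the partial products
        X_blocks.append(sum(partial))
    return X_blocks


rng = np.random.default_rng(0)
N, K, M, P = 6, 4, 5, 3  # illustrative sizes, P emulated ranks
A = rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))
X = rng.standard_normal((K, M))

A_blocks = np.array_split(A, P, axis=0)  # [N_loc x K] row blocks
X_blocks = np.array_split(X, P, axis=1)  # [K x M_loc] column blocks

Y_blocks = block_forward(A_blocks, X_blocks)
X_adj_blocks = block_adjoint(A_blocks, Y_blocks)
```

Concatenating Y_blocks with np.hstack recovers the full \([N \times M]\) product, which makes the emulation easy to compare against A @ X.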

[1]

van de Geijn, R. A., and Watts, J. “SUMMA: Scalable Universal Matrix Multiplication Algorithm”, 1995.