pyspark.mllib.linalg.distributed.
BlockMatrix
Represents a distributed matrix in blocks of local matrices.
pyspark.RDD
An RDD of sub-matrix blocks ((blockRowIndex, blockColIndex), sub-matrix) that form this distributed matrix. If multiple blocks with the same index exist, the results for operations like add and multiply will be unpredictable.
Number of rows that make up each block. The blocks forming the final rows are not required to have the given number of rows.
Number of columns that make up each block. The blocks forming the final columns are not required to have the given number of columns.
Number of rows of this matrix. If the supplied value is less than or equal to zero, the number of rows will be calculated when numRows is invoked.
Number of columns of this matrix. If the supplied value is less than or equal to zero, the number of columns will be calculated when numCols is invoked.
Methods
add(other)
add
Adds two block matrices together.
cache()
cache
Caches the underlying RDD.
multiply(other)
multiply
Left multiplies this BlockMatrix by other, another BlockMatrix.
numCols()
numCols
Get or compute the number of cols.
numRows()
numRows
Get or compute the number of rows.
persist(storageLevel)
persist
Persists the underlying RDD with the specified storage level.
subtract(other)
subtract
Subtracts the given block matrix other from this block matrix: this - other.
toCoordinateMatrix()
toCoordinateMatrix
Convert this matrix to a CoordinateMatrix.
toIndexedRowMatrix()
toIndexedRowMatrix
Convert this matrix to an IndexedRowMatrix.
toLocalMatrix()
toLocalMatrix
Collect the distributed matrix on the driver as a DenseMatrix.
transpose()
transpose
Transpose this BlockMatrix.
validate()
validate
Validates the block matrix info against the matrix data (blocks) and throws an exception if any error is found.
Attributes
blocks
The RDD of sub-matrix blocks ((blockRowIndex, blockColIndex), sub-matrix) that form this distributed matrix.
colsPerBlock
Number of columns that make up each block.
numColBlocks
Number of columns of blocks in the BlockMatrix.
numRowBlocks
Number of rows of blocks in the BlockMatrix.
rowsPerBlock
Number of rows that make up each block.
Methods Documentation
Adds two block matrices together. The matrices must have the same size and matching rowsPerBlock and colsPerBlock values. If one of the sub matrix blocks that are being added is a SparseMatrix, the resulting sub matrix block will also be a SparseMatrix, even if it is being added to a DenseMatrix. If two dense sub matrix blocks are added, the output block will also be a DenseMatrix.
Examples
>>> dm1 = Matrices.dense(3, 2, [1, 2, 3, 4, 5, 6]) >>> dm2 = Matrices.dense(3, 2, [7, 8, 9, 10, 11, 12]) >>> sm = Matrices.sparse(3, 2, [0, 1, 3], [0, 1, 2], [7, 11, 12]) >>> blocks1 = sc.parallelize([((0, 0), dm1), ((1, 0), dm2)]) >>> blocks2 = sc.parallelize([((0, 0), dm1), ((1, 0), dm2)]) >>> blocks3 = sc.parallelize([((0, 0), sm), ((1, 0), dm2)]) >>> mat1 = BlockMatrix(blocks1, 3, 2) >>> mat2 = BlockMatrix(blocks2, 3, 2) >>> mat3 = BlockMatrix(blocks3, 3, 2)
>>> mat1.add(mat2).toLocalMatrix() DenseMatrix(6, 2, [2.0, 4.0, 6.0, 14.0, 16.0, 18.0, 8.0, 10.0, 12.0, 20.0, 22.0, 24.0], 0)
>>> mat1.add(mat3).toLocalMatrix() DenseMatrix(6, 2, [8.0, 2.0, 3.0, 14.0, 16.0, 18.0, 4.0, 16.0, 18.0, 20.0, 22.0, 24.0], 0)
New in version 2.0.0.
Left multiplies this BlockMatrix by other, another BlockMatrix. The colsPerBlock of this matrix must equal the rowsPerBlock of other. If other contains any SparseMatrix blocks, they will have to be converted to DenseMatrix blocks. The output BlockMatrix will only consist of DenseMatrix blocks. This may cause some performance issues until support for multiplying two sparse matrices is added.
>>> dm1 = Matrices.dense(2, 3, [1, 2, 3, 4, 5, 6]) >>> dm2 = Matrices.dense(2, 3, [7, 8, 9, 10, 11, 12]) >>> dm3 = Matrices.dense(3, 2, [1, 2, 3, 4, 5, 6]) >>> dm4 = Matrices.dense(3, 2, [7, 8, 9, 10, 11, 12]) >>> sm = Matrices.sparse(3, 2, [0, 1, 3], [0, 1, 2], [7, 11, 12]) >>> blocks1 = sc.parallelize([((0, 0), dm1), ((0, 1), dm2)]) >>> blocks2 = sc.parallelize([((0, 0), dm3), ((1, 0), dm4)]) >>> blocks3 = sc.parallelize([((0, 0), sm), ((1, 0), dm4)]) >>> mat1 = BlockMatrix(blocks1, 2, 3) >>> mat2 = BlockMatrix(blocks2, 3, 2) >>> mat3 = BlockMatrix(blocks3, 3, 2)
>>> mat1.multiply(mat2).toLocalMatrix() DenseMatrix(2, 2, [242.0, 272.0, 350.0, 398.0], 0)
>>> mat1.multiply(mat3).toLocalMatrix() DenseMatrix(2, 2, [227.0, 258.0, 394.0, 450.0], 0)
>>> blocks = sc.parallelize([((0, 0), Matrices.dense(3, 2, [1, 2, 3, 4, 5, 6])), ... ((1, 0), Matrices.dense(3, 2, [7, 8, 9, 10, 11, 12]))])
>>> mat = BlockMatrix(blocks, 3, 2) >>> print(mat.numCols()) 2
>>> mat = BlockMatrix(blocks, 3, 2, 7, 6) >>> print(mat.numCols()) 6
>>> mat = BlockMatrix(blocks, 3, 2) >>> print(mat.numRows()) 6
>>> mat = BlockMatrix(blocks, 3, 2, 7, 6) >>> print(mat.numRows()) 7
Subtracts the given block matrix other from this block matrix: this - other. The matrices must have the same size and matching rowsPerBlock and colsPerBlock values. If one of the sub matrix blocks that are being subtracted is a SparseMatrix, the resulting sub matrix block will also be a SparseMatrix, even if it is being subtracted from a DenseMatrix. If two dense sub matrix blocks are subtracted, the output block will also be a DenseMatrix.
>>> dm1 = Matrices.dense(3, 2, [3, 1, 5, 4, 6, 2]) >>> dm2 = Matrices.dense(3, 2, [7, 8, 9, 10, 11, 12]) >>> sm = Matrices.sparse(3, 2, [0, 1, 3], [0, 1, 2], [1, 2, 3]) >>> blocks1 = sc.parallelize([((0, 0), dm1), ((1, 0), dm2)]) >>> blocks2 = sc.parallelize([((0, 0), dm2), ((1, 0), dm1)]) >>> blocks3 = sc.parallelize([((0, 0), sm), ((1, 0), dm2)]) >>> mat1 = BlockMatrix(blocks1, 3, 2) >>> mat2 = BlockMatrix(blocks2, 3, 2) >>> mat3 = BlockMatrix(blocks3, 3, 2)
>>> mat1.subtract(mat2).toLocalMatrix() DenseMatrix(6, 2, [-4.0, -7.0, -4.0, 4.0, 7.0, 4.0, -6.0, -5.0, -10.0, 6.0, 5.0, 10.0], 0)
>>> mat2.subtract(mat3).toLocalMatrix() DenseMatrix(6, 2, [6.0, 8.0, 9.0, -4.0, -7.0, -4.0, 10.0, 9.0, 9.0, -6.0, -5.0, -10.0], 0)
>>> blocks = sc.parallelize([((0, 0), Matrices.dense(1, 2, [1, 2])), ... ((1, 0), Matrices.dense(1, 2, [7, 8]))]) >>> mat = BlockMatrix(blocks, 1, 2).toCoordinateMatrix() >>> mat.entries.take(3) [MatrixEntry(0, 0, 1.0), MatrixEntry(0, 1, 2.0), MatrixEntry(1, 0, 7.0)]
>>> blocks = sc.parallelize([((0, 0), Matrices.dense(3, 2, [1, 2, 3, 4, 5, 6])), ... ((1, 0), Matrices.dense(3, 2, [7, 8, 9, 10, 11, 12]))]) >>> mat = BlockMatrix(blocks, 3, 2).toIndexedRowMatrix()
>>> # This BlockMatrix will have 6 effective rows, due to >>> # having two sub-matrix blocks stacked, each with 3 rows. >>> # The ensuing IndexedRowMatrix will also have 6 rows. >>> print(mat.numRows()) 6
>>> # This BlockMatrix will have 2 effective columns, due to >>> # having two sub-matrix blocks stacked, each with 2 columns. >>> # The ensuing IndexedRowMatrix will also have 2 columns. >>> print(mat.numCols()) 2
>>> blocks = sc.parallelize([((0, 0), Matrices.dense(3, 2, [1, 2, 3, 4, 5, 6])), ... ((1, 0), Matrices.dense(3, 2, [7, 8, 9, 10, 11, 12]))]) >>> mat = BlockMatrix(blocks, 3, 2).toLocalMatrix()
>>> # This BlockMatrix will have 6 effective rows, due to >>> # having two sub-matrix blocks stacked, each with 3 rows. >>> # The ensuing DenseMatrix will also have 6 rows. >>> print(mat.numRows) 6
>>> # This BlockMatrix will have 2 effective columns, due to >>> # having two sub-matrix blocks stacked, each with 2 >>> # columns. The ensuing DenseMatrix will also have 2 columns. >>> print(mat.numCols) 2
Transpose this BlockMatrix. Returns a new BlockMatrix instance sharing the same underlying data. Is a lazy operation.
>>> blocks = sc.parallelize([((0, 0), Matrices.dense(3, 2, [1, 2, 3, 4, 5, 6])), ... ((1, 0), Matrices.dense(3, 2, [7, 8, 9, 10, 11, 12]))]) >>> mat = BlockMatrix(blocks, 3, 2)
>>> mat_transposed = mat.transpose() >>> mat_transposed.toLocalMatrix() DenseMatrix(2, 6, [1.0, 4.0, 2.0, 5.0, 3.0, 6.0, 7.0, 10.0, 8.0, 11.0, 9.0, 12.0], 0)
Attributes Documentation
>>> mat = BlockMatrix( ... sc.parallelize([((0, 0), Matrices.dense(3, 2, [1, 2, 3, 4, 5, 6])), ... ((1, 0), Matrices.dense(3, 2, [7, 8, 9, 10, 11, 12]))]), 3, 2) >>> blocks = mat.blocks >>> blocks.first() ((0, 0), DenseMatrix(3, 2, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 0))
>>> blocks = sc.parallelize([((0, 0), Matrices.dense(3, 2, [1, 2, 3, 4, 5, 6])), ... ((1, 0), Matrices.dense(3, 2, [7, 8, 9, 10, 11, 12]))]) >>> mat = BlockMatrix(blocks, 3, 2) >>> mat.colsPerBlock 2
>>> blocks = sc.parallelize([((0, 0), Matrices.dense(3, 2, [1, 2, 3, 4, 5, 6])), ... ((1, 0), Matrices.dense(3, 2, [7, 8, 9, 10, 11, 12]))]) >>> mat = BlockMatrix(blocks, 3, 2) >>> mat.numColBlocks 1
>>> blocks = sc.parallelize([((0, 0), Matrices.dense(3, 2, [1, 2, 3, 4, 5, 6])), ... ((1, 0), Matrices.dense(3, 2, [7, 8, 9, 10, 11, 12]))]) >>> mat = BlockMatrix(blocks, 3, 2) >>> mat.numRowBlocks 2
>>> blocks = sc.parallelize([((0, 0), Matrices.dense(3, 2, [1, 2, 3, 4, 5, 6])), ... ((1, 0), Matrices.dense(3, 2, [7, 8, 9, 10, 11, 12]))]) >>> mat = BlockMatrix(blocks, 3, 2) >>> mat.rowsPerBlock 3