pyspark.mllib.linalg.distributed.CoordinateMatrix
Represents a matrix in coordinate format.
Parameters
entries : pyspark.RDD
An RDD of MatrixEntry inputs or (int, int, float) tuples.
numRows : int, optional
Number of rows in the matrix. A non-positive value means unknown, in which case the number of rows is determined by the maximum row index plus one.
numCols : int, optional
Number of columns in the matrix. A non-positive value means unknown, in which case the number of columns is determined by the maximum column index plus one.
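The size rule for non-positive values can be illustrated without a Spark cluster. What follows is a plain-Python sketch, not Spark's implementation: the helper name `infer_dims` is hypothetical, and entries are modeled as (row, column, value) tuples.

```python
def infer_dims(entries, num_rows=0, num_cols=0):
    """Mimic the documented rule: a non-positive size means "unknown",
    so it is replaced by the largest index seen plus one."""
    entries = list(entries)
    if num_rows <= 0:
        # default=-1 makes an empty input yield 0; that is a choice of this sketch
        num_rows = max((i for i, _, _ in entries), default=-1) + 1
    if num_cols <= 0:
        num_cols = max((j for _, j, _ in entries), default=-1) + 1
    return num_rows, num_cols


entries = [(0, 0, 1.2), (1, 0, 2.0), (2, 1, 3.7)]
print(infer_dims(entries))        # (3, 2)
print(infer_dims(entries, 7, 6))  # (7, 6)
```

Explicit positive sizes are taken as given, so a matrix can be declared larger than its highest populated index.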
Methods
numCols()
Get or compute the number of cols.
numRows()
Get or compute the number of rows.
toBlockMatrix([rowsPerBlock, colsPerBlock])
Convert this matrix to a BlockMatrix.
toIndexedRowMatrix()
Convert this matrix to an IndexedRowMatrix.
toRowMatrix()
Convert this matrix to a RowMatrix.
transpose()
Transpose this CoordinateMatrix.
Attributes
entries
Entries of the CoordinateMatrix stored as an RDD of MatrixEntries.
Methods Documentation
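numCols()
Examples
>>> # sc is assumed to be an existing SparkContext
>>> from pyspark.mllib.linalg.distributed import CoordinateMatrix, MatrixEntry
>>> entries = sc.parallelize([MatrixEntry(0, 0, 1.2),
...                           MatrixEntry(1, 0, 2),
...                           MatrixEntry(2, 1, 3.7)])
>>> mat = CoordinateMatrix(entries)
>>> print(mat.numCols())
2
>>> mat = CoordinateMatrix(entries, 7, 6)
>>> print(mat.numCols())
6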
numRows()
Examples
>>> mat = CoordinateMatrix(entries)
>>> print(mat.numRows())
3
>>> mat = CoordinateMatrix(entries, 7, 6)
>>> print(mat.numRows())
7
toBlockMatrix([rowsPerBlock, colsPerBlock])
Parameters
rowsPerBlock : int, optional
Number of rows that make up each block. The blocks forming the final rows are not required to have the given number of rows.
colsPerBlock : int, optional
Number of columns that make up each block. The blocks forming the final columns are not required to have the given number of columns.
Examples
>>> entries = sc.parallelize([MatrixEntry(0, 0, 1.2),
...                           MatrixEntry(6, 4, 2.1)])
>>> mat = CoordinateMatrix(entries).toBlockMatrix()
>>> # This CoordinateMatrix will have 7 effective rows, due to
>>> # the highest row index being 6, and the ensuing
>>> # BlockMatrix will have 7 rows as well.
>>> print(mat.numRows())
7
>>> # This CoordinateMatrix will have 5 columns, due to the
>>> # highest column index being 4, and the ensuing
>>> # BlockMatrix will have 5 columns as well.
>>> print(mat.numCols())
5
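The remark that the final blocks need not have the full size can be made concrete with a little arithmetic. This is a plain-Python sketch under the assumption that blocks tile the matrix starting from the top-left corner; `block_grid` is a hypothetical helper, not part of PySpark.

```python
def block_grid(num_rows, num_cols, rows_per_block, cols_per_block):
    """Return (block rows, block columns,
    rows in the last block row, columns in the last block column)."""
    # Integer ceiling division: how many blocks are needed along each axis
    n_block_rows = (num_rows + rows_per_block - 1) // rows_per_block
    n_block_cols = (num_cols + cols_per_block - 1) // cols_per_block
    # Whatever is left after the full-sized blocks goes into the final ones
    last_rows = num_rows - (n_block_rows - 1) * rows_per_block
    last_cols = num_cols - (n_block_cols - 1) * cols_per_block
    return n_block_rows, n_block_cols, last_rows, last_cols


# A 7 x 5 matrix cut into 3 x 2 blocks
print(block_grid(7, 5, 3, 2))        # (3, 3, 1, 1)
# Blocks larger than the matrix: a single, smaller-than-requested block
print(block_grid(7, 5, 1024, 1024))  # (1, 1, 7, 5)
```

In both cases the overall matrix keeps its 7 rows and 5 columns; only the shape of the trailing blocks changes.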
toIndexedRowMatrix()
Examples
>>> entries = sc.parallelize([MatrixEntry(0, 0, 1.2),
...                           MatrixEntry(6, 4, 2.1)])
>>> mat = CoordinateMatrix(entries).toIndexedRowMatrix()
>>> # This CoordinateMatrix will have 7 effective rows, due to
>>> # the highest row index being 6, and the ensuing
>>> # IndexedRowMatrix will have 7 rows as well.
>>> print(mat.numRows())
7
>>> # This CoordinateMatrix will have 5 columns, due to the
>>> # highest column index being 4, and the ensuing
>>> # IndexedRowMatrix will have 5 columns as well.
>>> print(mat.numCols())
5
toRowMatrix()
Examples
>>> entries = sc.parallelize([MatrixEntry(0, 0, 1.2),
...                           MatrixEntry(6, 4, 2.1)])
>>> mat = CoordinateMatrix(entries).toRowMatrix()
>>> # This CoordinateMatrix will have 7 effective rows, due to
>>> # the highest row index being 6, but the ensuing RowMatrix
>>> # will only have 2 rows since there are only entries on 2
>>> # unique rows.
>>> print(mat.numRows())
2
>>> # This CoordinateMatrix will have 5 columns, due to the
>>> # highest column index being 4, and the ensuing RowMatrix
>>> # will have 5 columns as well.
>>> print(mat.numCols())
5
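The different row counts reported after the two row-oriented conversions come down to whether row indices are kept. The sketch below models that in plain Python; it is not Spark's implementation, and `to_indexed_rows` and `to_rows` are hypothetical helpers operating on (row, column, value) tuples.

```python
from collections import defaultdict


def to_indexed_rows(entries):
    """Group entries by row index: {row: {col: value}}."""
    rows = defaultdict(dict)
    for i, j, value in entries:
        rows[i][j] = value
    return dict(rows)


def to_rows(entries):
    """Drop the row indices, keeping only rows that hold at least one entry."""
    return list(to_indexed_rows(entries).values())


entries = [(0, 0, 1.2), (6, 4, 2.1)]
indexed = to_indexed_rows(entries)
print(max(indexed) + 1)       # 7: indices survive, so the size follows the largest index
print(len(to_rows(entries)))  # 2: only the populated rows remain
```

Once the indices are discarded, the original position of each row can no longer be recovered, which is why an index-free conversion is best reserved for cases where row identity does not matter.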
transpose()
New in version 2.0.0.
Examples
>>> entries = sc.parallelize([MatrixEntry(0, 0, 1.2),
...                           MatrixEntry(1, 0, 2),
...                           MatrixEntry(2, 1, 3.7)])
>>> mat = CoordinateMatrix(entries)
>>> mat_transposed = mat.transpose()
>>> print(mat_transposed.numRows())
2
>>> print(mat_transposed.numCols())
3
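In coordinate format, a transpose amounts to swapping the row and column index of every entry. The following plain-Python sketch shows the idea on (row, column, value) tuples; `transpose_entries` is a hypothetical helper and not how Spark necessarily implements the operation.

```python
def transpose_entries(entries):
    """Swap the row and column index of each (row, col, value) entry."""
    return [(j, i, value) for i, j, value in entries]


entries = [(0, 0, 1.2), (1, 0, 2.0), (2, 1, 3.7)]
transposed = transpose_entries(entries)
print(transposed)  # [(0, 0, 1.2), (0, 1, 2.0), (1, 2, 3.7)]

# Sizes inferred from the largest index plus one, as for an unsized matrix
print(max(i for i, _, _ in transposed) + 1)  # 2
print(max(j for _, j, _ in transposed) + 1)  # 3
```

The inferred sizes swap along with the indices: the 3 x 2 input becomes a 2 x 3 result.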
Attributes Documentation
entries
Examples
>>> mat = CoordinateMatrix(sc.parallelize([MatrixEntry(0, 0, 1.2),
...                                        MatrixEntry(6, 4, 2.1)]))
>>> entries = mat.entries
>>> entries.first()
MatrixEntry(0, 0, 1.2)