Multiply Two Numpy Matrices In Pyspark
Let's say I have these two NumPy arrays:

A = np.arange(1024 ** 2, dtype=np.float64).reshape(1024, 1024)
B = np.arange(1024 ** 2, dtype=np.float64).reshape(1024, 1024)

and I want to compute their matrix product in PySpark.
Solution 1:
Using the as_block_matrix method from this post, you could do the following (but see @kennytm's comment on why this method can be slow for larger matrices):
import numpy as np
from pyspark.mllib.linalg.distributed import IndexedRow, IndexedRowMatrix

A = np.arange(1024 ** 2, dtype=np.float64).reshape(1024, 1024)
B = np.arange(1024 ** 2, dtype=np.float64).reshape(1024, 1024)

def as_block_matrix(rdd, rowsPerBlock=1024, colsPerBlock=1024):
    # Pair each row with its index, wrap each as an IndexedRow,
    # then convert the IndexedRowMatrix into a BlockMatrix.
    return IndexedRowMatrix(
        rdd.zipWithIndex().map(lambda xi: IndexedRow(xi[1], xi[0]))
    ).toBlockMatrix(rowsPerBlock, colsPerBlock)

# `sc` is the active SparkContext.
matrixA = as_block_matrix(sc.parallelize(A))
matrixB = as_block_matrix(sc.parallelize(B))
product = matrixA.multiply(matrixB)
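The result product is itself a distributed BlockMatrix; if it fits in driver memory you can bring it back with product.toLocalMatrix().toArray(). For a quick correctness check, the same multiplication can be done locally with plain NumPy (shown here with small 4x4 matrices rather than the 1024x1024 ones above, so it runs instantly; this is a sketch, not part of the original answer):

```python
import numpy as np

# Small local analogue of the distributed computation above.
A = np.arange(16, dtype=np.float64).reshape(4, 4)
B = np.arange(16, dtype=np.float64).reshape(4, 4)

# Plain NumPy matrix product; the distributed BlockMatrix product
# should agree with this, e.g.
#   local = product.toLocalMatrix().toArray()
#   assert np.allclose(local, expected)
expected = A @ B

# First entry: dot of A's first row [0,1,2,3] with B's first column [0,4,8,12].
print(expected[0, 0])  # 56.0
```

Comparing against a local NumPy product like this is usually the easiest way to validate the block-matrix pipeline before scaling up to matrices that no longer fit on one machine.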