
Cartesian Product Of Two RDDs In Spark

I am completely new to Apache Spark and I am trying to compute the Cartesian product of two RDDs. As an example, I have A and B like: A = {(a1,v1),(a2,v2),...} B = {(b1,s1),(b2,s2),...} I need a new

Solution 1:

That's not the dot product; that's the Cartesian product. Use the cartesian method:

def cartesian[U](other: spark.api.java.JavaRDDLike[U, _]): JavaPairRDD[T, U]

Return the Cartesian product of this RDD and another one, that is, the RDD of all pairs of elements (a, b) where a is in this and b is in other.
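To make the description above concrete, here is a small sketch in plain Python that models which pairs come out, without needing a Spark cluster. The sample values for A and B are hypothetical stand-ins for the question's data, and `cartesian_local` is an illustrative helper, not a Spark function. In PySpark the analogous call should be `rddA.cartesian(rddB)` on two RDDs, as noted in the comment.

```python
from itertools import product

# Hypothetical sample data standing in for the question's A and B.
A = [("a1", "v1"), ("a2", "v2")]
B = [("b1", "s1"), ("b2", "s2")]


def cartesian_local(left, right):
    """Local model of the pairing: every (x, y) with x from left and y from right."""
    return list(product(left, right))


pairs = cartesian_local(A, B)
for pair in pairs:
    print(pair)

# With a SparkContext `sc` available (an assumption, not created here),
# the analogous PySpark call would be:
#   sc.parallelize(A).cartesian(sc.parallelize(B)).collect()
# The order of the collected pairs should not be relied on.
```

The result has len(A) * len(B) elements, so the output grows quickly; on real data it is usually worth filtering or sampling the inputs first.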


Solution 2:

In case you are curious how to do this with multiple lists, here's an example in PySpark:

>>> a = [1, 2, 3]
>>> b = [5, 6, 7, 8]
>>> c = [11, 22, 33, 44, 55]
>>> import itertools
>>> abcCartesianRDD = sc.parallelize(itertools.product(a, b, c))
>>> abcCartesianRDD.count()  # Test
60
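Note that the example above builds the whole product on the driver before handing it to Spark. If the inputs are already RDDs, one alternative is to chain pairwise products, but then the elements come out nested as ((x, y), z) rather than as flat triples. The sketch below models that shape in plain Python with the same three lists; the variable names `nested` and `flat` are illustrative only, and the PySpark form in the comment assumes hypothetical RDDs `rddA`, `rddB`, `rddC`.

```python
from itertools import product

a = [1, 2, 3]
b = [5, 6, 7, 8]
c = [11, 22, 33, 44, 55]

# Chaining two pairwise products nests the first pair inside each element.
nested = list(product(product(a, b), c))

# Unpack each ((x, y), z) into a flat (x, y, z) triple.
flat = [(x, y, z) for (x, y), z in nested]

print(nested[0])
print(flat[0])
print(len(flat))

# Analogous PySpark form, assuming RDDs rddA, rddB, rddC already exist:
#   rddA.cartesian(rddB).cartesian(rddC).map(lambda t: (t[0][0], t[0][1], t[1]))
```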
