
Joining Dataframes With Same Column Name In PySpark

I have two dataframes that were read from two CSV files.

+---+----------+-----------------+
| ID|    NUMBER|  RECHARGE_AMOUNT|
+---+----------+-----------------+
|  1|9090909

Solution 1:

After the join, you can select the columns from each dataframe and alias them, so that the names that appear in both (ID and RECHARGE_AMOUNT) no longer clash.
Like this:

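# Join on NUMBER, then alias the columns that exist in both dataframes.
# Inside select(), dfFinal still refers to the dataframe as it was before the join.
dfFinal = dfFinal.join(df2, on=['NUMBER'], how='inner') \
                 .select('NUMBER',
                         dfFinal.ID.alias('ID_1'),
                         dfFinal.RECHARGE_AMOUNT.alias('RECHARGE_AMOUNT_1'),
                         df2.ID.alias('ID_2'),
                         df2.RECHARGE_AMOUNT.alias('RECHARGE_AMOUNT_2'))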
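
If there are more than a couple of shared columns, writing each alias by hand gets tedious. One possible variation, sketched here as an assumption rather than as part of the original answer, is to rename the non-key columns on both sides before joining. The helper names `suffixed_names` and `join_with_suffixes` are hypothetical; the only PySpark calls used are `DataFrame.columns`, `DataFrame.withColumnRenamed`, and `DataFrame.join`.

```python
def suffixed_names(columns, keys, suffix):
    """Map every non-key column name to name + suffix; key columns are skipped."""
    return {c: f"{c}{suffix}" for c in columns if c not in keys}


def join_with_suffixes(left, right, keys, how="inner"):
    """Rename the non-key columns on both sides, then join on the key columns.

    `left` and `right` are assumed to be PySpark DataFrames. Every non-key
    column gets a suffix ("_1" for left, "_2" for right), whether or not it
    actually clashes with a column on the other side.
    """
    for old, new in suffixed_names(left.columns, keys, "_1").items():
        left = left.withColumnRenamed(old, new)
    for old, new in suffixed_names(right.columns, keys, "_2").items():
        right = right.withColumnRenamed(old, new)
    # Passing the keys as a list of names keeps a single copy of each key column.
    return left.join(right, on=list(keys), how=how)


# Usage with the two dataframes from the question (needs an active SparkSession):
# dfFinal = join_with_suffixes(dfFinal, df2, keys=["NUMBER"])
```

With the columns from the question, this should give the same column names as the select above: NUMBER, ID_1, RECHARGE_AMOUNT_1, ID_2, RECHARGE_AMOUNT_2. The trade-off is that columns which do not clash are suffixed too; if that is unwanted, restrict the renaming to the names present in both `left.columns` and `right.columns`.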
