Pyspark Hive Context -- Read Table With Utf-8 Encoding
I have a table in Hive, and I am reading that table in PySpark as df_sprk_df:
from pyspark import SparkContext
from pyspark.sql import HiveContext
sc = SparkContext()
hive_context = Hi
Solution 1:
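This workaround solved the problem: change the default encoding for the session. It applies to Python 2 only, since sys.setdefaultencoding was removed in Python 3, where the default encoding is already UTF-8.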
import sys
reload(sys)
sys.setdefaultencoding('UTF-8')
and then convert the whole DataFrame to strings:
df_pandas_df = df_pandas_df.astype(str)
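For anyone on Python 3, where the default-encoding trick is not available, here is a minimal sketch of doing the same thing explicitly. It assumes the string values arrive as UTF-8 bytes, and to_text is a hypothetical helper name, not part of pandas or PySpark.

```python
import sys

# Python 3 has no sys.setdefaultencoding; the default is already UTF-8.
default_encoding = sys.getdefaultencoding()


def to_text(value, encoding="utf-8"):
    """Hypothetical helper: return value as text, decoding bytes explicitly."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return str(value)


# Assumption: a cell value that arrived as UTF-8 encoded bytes.
raw = "caf\u00e9".encode("utf-8")

# str() on bytes does not decode; it returns the repr, starting with b'
naive = str(raw)

# Decoding explicitly recovers the original text.
decoded = to_text(raw)
```

A helper like this could be passed to a column's .apply(...) before, or instead of, the astype(str) call.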
Solution 2:
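Instead of casting directly to string, try inferring the type of each column of the pandas DataFrame with the following statement (in current pandas the function is available as pd.api.types.infer_dtype; older releases exposed it as pd.lib.infer_dtype):
df_pandas_df.apply(lambda x: pd.lib.infer_dtype(x.values))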
UPD:
Try to perform the mapping without the .str invocation.
Maybe something like below:
for cols in df_pandas_df.columns:
    df_pandas_df[cols] = df_pandas_df[cols].apply(lambda x: unicode(x, errors='ignore'))
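To show what errors='ignore' actually does, here is a small Python 3 sketch. The name decode_lenient is hypothetical, and the sample bytes are made up for illustration. Note that the helper only decodes bytes: calling unicode(x, errors=...) in Python 2, or str(x, errors=...) in Python 3, on a value that is already text raises a TypeError.

```python
def decode_lenient(value, encoding="utf-8"):
    """Hypothetical helper: decode bytes, dropping bytes that cannot be decoded."""
    if isinstance(value, bytes):
        return value.decode(encoding, errors="ignore")
    # Already text (or a non-string value): leave it untouched.
    return value


# Made-up sample: valid UTF-8 for "café ", followed by one invalid byte.
raw = b"caf\xc3\xa9 \xff"

# errors="ignore" silently drops the invalid byte.
ignored = decode_lenient(raw)

# errors="replace" keeps a visible marker (U+FFFD) where the bad byte was.
replaced = raw.decode("utf-8", errors="replace")
```

With 'ignore' the bad data disappears without a trace, so 'replace' may be the safer choice if you need to find the affected rows later.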