
Pyspark Creating Timestamp Column

I am using Spark 2.1.0 and am not able to create a timestamp column in PySpark. I am using the code snippet below. Please help.

df = df.withColumn('Age', lit(datetime.now()))

I am getting a

Solution 1:

I am not sure about 2.1.0, but on 2.2.1 at least you can just do:

from pyspark.sql import functions as F
# withColumn returns a new DataFrame, so assign the result
df = df.withColumn('Age', F.current_timestamp())

Hope it helps!

Solution 2:

Assuming you have a dataframe from your code snippet and you want the same timestamp for all your rows, let me create a dummy dataframe.

>>> dict = [{'name': 'Alice', 'age': 1},{'name': 'Again', 'age': 2}]
>>> df = spark.createDataFrame(dict)
>>> import time
>>> import datetime
>>> timestamp = datetime.datetime.fromtimestamp(time.time()).strftime('%Y-%m-%d %H:%M:%S')
>>> type(timestamp)
<class 'str'>

>>> from pyspark.sql.functions import lit,unix_timestamp
>>> timestamp
'2017-08-02 16:16:14'
>>> new_df = df.withColumn('time',unix_timestamp(lit(timestamp),'yyyy-MM-dd HH:mm:ss').cast("timestamp"))
>>> new_df.show(truncate = False)
+---+-----+---------------------+
|age|name |time                 |
+---+-----+---------------------+
|1  |Alice|2017-08-02 16:16:14.0|
|2  |Again|2017-08-02 16:16:14.0|
+---+-----+---------------------+

>>> new_df.printSchema()
root
 |-- age: long (nullable = true)
 |-- name: string (nullable = true)
 |-- time: timestamp (nullable = true)
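
As a side note, the timestamp string in the session above can probably be built a little more directly with datetime.now(), without going through time.time(). The following is only a minimal sketch in plain Python (no Spark needed); the names PY_FORMAT and make_timestamp_string are made up for illustration and are not part of PySpark.

```python
from datetime import datetime

# Python strftime pattern. The matching Java-style pattern passed to
# unix_timestamp in the session above is 'yyyy-MM-dd HH:mm:ss'.
PY_FORMAT = '%Y-%m-%d %H:%M:%S'


def make_timestamp_string(moment=None):
    """Format a datetime (default: the current local time) as 'YYYY-MM-DD HH:MM:SS'.

    Hypothetical helper for illustration only.
    """
    if moment is None:
        moment = datetime.now()
    return moment.strftime(PY_FORMAT)


# A fixed datetime makes the result easy to check by eye.
fixed = datetime(2017, 8, 2, 16, 16, 14)
timestamp = make_timestamp_string(fixed)
print(timestamp)  # 2017-08-02 16:16:14

# Round-trip check: parsing the string back recovers the same datetime.
assert datetime.strptime(timestamp, PY_FORMAT) == fixed
```

The returned string can then be passed to lit(...) in the same way as timestamp is in the session above. Sub-second precision is dropped by this format, which matches the '.0' fraction shown in the output table.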

Solution 3:

Adding on to balalaika's answer: if, like me, you just want to add the date without the time, you can use the code below:

from pyspark.sql import functions as F
# withColumn returns a new DataFrame, so assign the result
df = df.withColumn('Age', F.current_date())

Hope this helps
