Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Assign date values for null in a column in a pyspark dataframe

I have a pyspark dataframe:

Location        Month       New_Date    Sales
USA             1/1/2020    1/1/2020    34.56%
COL             1/1/2020    1/1/2020    66.4%
AUS             1/1/2020    1/1/2020    32.98%
NZ              null        null        44.59%
CHN             null        null        21.13%

Im creating New_Date column from Month column (MM/dd/yyyy format).
I need to populate New_date values for the rows having Month as null.

And this is what I tried:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df1=df.filter(col('Month').isNull()) \
.withColumn("current_date",current_date()) \
.withColumn("New_date", trunc(col("current_date"), "month"))

But Im getting first date of current month.
I need the first date of Month column
Pls suggest any other approach.

Location        Month       New_Date    Sales
USA             1/1/2020    1/1/2020    34.56%
COL             1/1/2020    1/1/2020    66.4%
AUS             1/1/2020    1/1/2020    32.98%
NZ              null        1/1/2020    44.59%
CHN             null        1/1/2020    21.13%

>Solution :

You can use first function over window:

from pyspark.sql import functions as F, Window

w = (Window.orderBy("Month")
     .rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)
     )

df1 = df.withColumn(
    "New_date",
    F.coalesce(F.col("Month"), F.first("Month", ignorenulls=True).over(w))
)

df1.show()
#+--------+--------+--------+------+
#|Location|   Month|New_date| Sales|
#+--------+--------+--------+------+
#|      NZ|    null|1/1/2020|44.59%|
#|     CHN|    null|1/1/2020|21.13%|
#|     USA|1/1/2020|1/1/2020|34.56%|
#|     COL|1/1/2020|1/1/2020| 66.4%|
#|     AUS|1/1/2020|1/1/2020|32.98%|
#+--------+--------+--------+------+
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading