Column with array with all months in x amount of years from starting date – Pyspark

Imagine you have a dataframe df as follows:

ID  Years     Date
A    5    2021-02-01
B    3    2021-02-01
C    6    2021-02-01

I want to add an array column, Dates, that holds one date per month, starting one month after Date and ending Years years after Date (inclusive). The result would look like this:

ID  Years     Date        Dates
A    5    2021-02-01     [2021-03-01,2021-04-01,...,2026-02-01]
B    3    2021-02-01     [2021-03-01,2021-04-01,...,2024-02-01]
C    6    2021-02-01     [2021-03-01,2021-04-01,...,2027-02-01]

Solution:

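For Spark >= 2.4, you can build the array with the sequence and add_months functions. sequence generates dates from a start to a stop (inclusive) in steps of one month; add_months shifts Date by 1 month for the start and by Years * 12 months for the stop.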

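from pyspark.sql import functions as F

# Start one month after Date; stop Years * 12 months after Date (inclusive).
df = df.withColumn('Dates',
                   F.expr('sequence(add_months(to_date(Date), 1), add_months(to_date(Date), int(Years) * 12), interval 1 month)')
                   )
df.show(truncate=False)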
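
To sanity-check what the Spark expression should return for a single row, the same month arithmetic can be sketched in plain Python. This is only an illustration, not part of the Spark solution: the helper name month_starts_after is hypothetical, and it assumes the day of month in Date exists in every month (true for the 1st, as in the example data), so it does no end-of-month clamping.

```python
from datetime import date


def month_starts_after(start: date, years: int) -> list[date]:
    """Hypothetical helper: dates from start + 1 month to start + years * 12 months.

    Assumes start.day is valid in every month (e.g. the 1st); no clamping is done.
    """
    dates = []
    for offset in range(1, years * 12 + 1):
        # Work with a zero-based month count so year rollover is simple division.
        total = start.year * 12 + (start.month - 1) + offset
        dates.append(date(total // 12, total % 12 + 1, start.day))
    return dates


# Row A from the question: Date = 2021-02-01, Years = 5.
dates_a = month_starts_after(date(2021, 2, 1), 5)
print(dates_a[0], dates_a[-1], len(dates_a))
```

For row A this gives 60 entries running from 2021-03-01 to 2026-02-01, which matches the expected Dates column: each array should contain Years * 12 dates.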