Imagine you have a dataframe df as follows:
ID Years Date
A 5 2021-02-01
B 3 2021-02-01
C 6 2021-02-01
I want to be able to create an additional date array column with all the dates starting from the initial date + 1 month all the way to the x amount of years in the years column. It would look like the following:
ID Years Date Dates
A 5 2021-02-01 [2021-03-01,2021-04-01,...,2026-02-01]
B 3 2021-03-01 [2021-04-01,2021-04-01,...,2024-03-01]
C 6 2021-02-01 [2021-03-01,2021-04-01,...,2027-02-01]
>Solution :
For spark >= 2.4, you can use the sequence and add_months functions to generate the desired sequence of dates.
df = df.withColumn('Dates',
F.expr('sequence(add_months(to_date(Date), 1), add_months(to_date(Date), int(Years) * 12), interval 1 month)')
)
df.show(truncate=False)