My input is this dataframe :
df = pd.DataFrame(
{
'ID': ['ID001', 'ID002', 'ID003'],
'DATE': ['24/12/2023', '01/02/2024', '12/02/2024'],
}
)
df['DATE'] = pd.to_datetime(df['DATE'], dayfirst=True)
print(df)
ID DATE
0 ID001 2023-12-24
1 ID002 2024-02-01
2 ID003 2024-02-12
I’m trying to duplicate the rows for each id N times with N being the number of weeks between the column DATE and the current date. At the end, each last row for a given id will have the week of the column DATE.
For that I made the code below but it gives me a wrong output :
number_of_weeks = (pd.Timestamp('now') - df['DATE']).dt.days // 7
final = df.copy()
final['YEAR'] = final['DATE'].dt.isocalendar().year
final['WEEK'] = final['DATE'].dt.isocalendar().week
final['WEEKS'] = (pd.Timestamp('now') - df['DATE']).dt.days // 7
for index, row in final.iterrows():
for i in range(1, row['WEEKS'] + 1):
final.loc[i, 'WEEK'] = i
final = final.ffill().drop(columns='WEEKS')
print(final)
ID DATE YEAR WEEK
0 ID001 2023-12-24 2023 51
1 ID002 2024-02-01 2024 1
2 ID003 2024-02-12 2024 2
3 ID003 2024-02-12 2024 3
4 ID003 2024-02-12 2024 4
5 ID003 2024-02-12 2024 5
6 ID003 2024-02-12 2024 6
7 ID003 2024-02-12 2024 7
Have you guys encountered a similar problem ? I’m open to any suggestion.
My expected output is this :
ID DATE YEAR WEEK
0 ID001 24/12/2023 2023 51
1 ID001 24/12/2023 2023 52
2 ID001 24/12/2023 2024 1
3 ID001 24/12/2023 2024 2
4 ID001 24/12/2023 2024 3
5 ID001 24/12/2023 2024 4
6 ID001 24/12/2023 2024 5
7 ID001 24/12/2023 2024 6
8 ID001 24/12/2023 2024 7
##################################
9 ID002 01/02/2024 2024 5
10 ID002 01/02/2024 2024 6
11 ID002 01/02/2024 2024 7
##################################
12 ID003 12/02/2024 2024 7
>Solution :
I would use periods for that and Index.repeat:
# compute exact difference in weeks
n_weeks = (df['DATE'].dt.to_period('W')
.rsub(pd.Timestamp('now').to_period('W'))
.apply(lambda x: x.n)
)
# repeat rows, add incrementing days, convert to week number
out = (df.loc[df.index.repeat(n_weeks+1)]
.assign(WEEK=lambda d: df['DATE']
.add(pd.to_timedelta(d.groupby(level=0).cumcount()*7, unit='D'))
.dt.isocalendar().week
)
)
Output:
ID DATE WEEK
0 ID001 2023-12-24 51
0 ID001 2023-12-24 52
0 ID001 2023-12-24 1
0 ID001 2023-12-24 2
0 ID001 2023-12-24 3
0 ID001 2023-12-24 4
0 ID001 2023-12-24 5
0 ID001 2023-12-24 6
0 ID001 2023-12-24 7
1 ID002 2024-02-01 5
1 ID002 2024-02-01 6
1 ID002 2024-02-01 7
2 ID003 2024-02-12 7