Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to duplicate rows based on the number of weeks between two dates?

My input is this dataframe :

df = pd.DataFrame(
    {
        'ID': ['ID001', 'ID002', 'ID003'],
        'DATE': ['24/12/2023', '01/02/2024', '12/02/2024'],
    }
)

df['DATE'] = pd.to_datetime(df['DATE'], dayfirst=True)

print(df)

      ID       DATE
0  ID001 2023-12-24
1  ID002 2024-02-01
2  ID003 2024-02-12

I’m trying to duplicate the rows for each id N times with N being the number of weeks between the column DATE and the current date. At the end, each last row for a given id will have the week of the column DATE.

For that I made the code below but it gives me a wrong output :

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

number_of_weeks = (pd.Timestamp('now') - df['DATE']).dt.days // 7

final = df.copy()
final['YEAR'] = final['DATE'].dt.isocalendar().year
final['WEEK'] = final['DATE'].dt.isocalendar().week
final['WEEKS'] = (pd.Timestamp('now') - df['DATE']).dt.days // 7

for index, row in final.iterrows():
    for i in range(1, row['WEEKS'] + 1):
        final.loc[i, 'WEEK'] = i

final = final.ffill().drop(columns='WEEKS')

print(final)

      ID       DATE  YEAR  WEEK
0  ID001 2023-12-24  2023    51
1  ID002 2024-02-01  2024     1
2  ID003 2024-02-12  2024     2
3  ID003 2024-02-12  2024     3
4  ID003 2024-02-12  2024     4
5  ID003 2024-02-12  2024     5
6  ID003 2024-02-12  2024     6
7  ID003 2024-02-12  2024     7

Have you guys encountered a similar problem ? I’m open to any suggestion.

My expected output is this :

       ID        DATE  YEAR   WEEK
0   ID001  24/12/2023  2023     51
1   ID001  24/12/2023  2023     52
2   ID001  24/12/2023  2024      1
3   ID001  24/12/2023  2024      2
4   ID001  24/12/2023  2024      3
5   ID001  24/12/2023  2024      4
6   ID001  24/12/2023  2024      5
7   ID001  24/12/2023  2024      6
8   ID001  24/12/2023  2024      7
##################################
9   ID002  01/02/2024  2024      5
10  ID002  01/02/2024  2024      6
11  ID002  01/02/2024  2024      7
##################################
12  ID003  12/02/2024  2024      7

>Solution :

I would use periods for that and Index.repeat:

# compute exact difference in weeks
n_weeks = (df['DATE'].dt.to_period('W')
           .rsub(pd.Timestamp('now').to_period('W'))
           .apply(lambda x: x.n)
          )

# repeat rows, add incrementing days, convert to week number
out = (df.loc[df.index.repeat(n_weeks+1)]
         .assign(WEEK=lambda d: df['DATE']
                 .add(pd.to_timedelta(d.groupby(level=0).cumcount()*7, unit='D'))
                 .dt.isocalendar().week
                )
      )

Output:

      ID       DATE  WEEK
0  ID001 2023-12-24    51
0  ID001 2023-12-24    52
0  ID001 2023-12-24     1
0  ID001 2023-12-24     2
0  ID001 2023-12-24     3
0  ID001 2023-12-24     4
0  ID001 2023-12-24     5
0  ID001 2023-12-24     6
0  ID001 2023-12-24     7
1  ID002 2024-02-01     5
1  ID002 2024-02-01     6
1  ID002 2024-02-01     7
2  ID003 2024-02-12     7
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading