Home how to get a unique week number for start and end dates in multi years – Pandas

Questions

how to get a unique week number for start and end dates in multi years – Pandas

June 19, 2023

I have a dataframe where two of the columns represent the start and end date of the data record. There are multiple years. My goal is to assign a new column that represents the time step of the data record in each row. Since I have a location columns as well, some of these weeks will be repeating.

import pandas as pd


dates = pd.date_range(start='2021-11-11', periods=20, freq='W')


df = pd.DataFrame({
    'start_date': np.repeat(dates, 5),
    'end_date': np.repeat(dates + pd.DateOffset(days=6), 5),
    'country': ['USA', 'Canada', 'UK', 'Australia', 'Russia'] * 20
})

df = df.sort_values("start_date")


    start_date  end_date    country 
0   2021-11-14  2021-11-20  USA 
1   2021-11-14  2021-11-20  Canada
2   2021-11-14  2021-11-20  UK
3   2021-11-14  2021-11-20  Australia   
4   2021-11-14  2021-11-20  Russia

I can get the week number using isocalendar().week, but it is giving the week number of the corresponding year. For instance, if 2021-11-14 and 2021-11-20 is the first week in the data frame, it should get 1. It may skip the next week, and have another record starting from 2021-11-27. Such time step should be the second week for me in the data frame.

>Solution :

IIUC, you can use groupby_ngroup:

df['week'] = df.groupby(df['start_date']).ngroup().add(1)
print(df)

# Output
   start_date   end_date    country  week
0  2021-11-14 2021-11-20        USA     1
1  2021-11-14 2021-11-20     Canada     1
2  2021-11-14 2021-11-20         UK     1
3  2021-11-14 2021-11-20  Australia     1
4  2021-11-14 2021-11-20     Russia     1
..        ...        ...        ...   ...
98 2022-03-27 2022-04-02  Australia    20
95 2022-03-27 2022-04-02        USA    20
96 2022-03-27 2022-04-02     Canada    20
97 2022-03-27 2022-04-02         UK    20
99 2022-03-27 2022-04-02     Russia    20

[100 rows x 4 columns]

Alternative with pd.factorize (IF the dataframe is already sorted by start_date value:

df['week'] = pd.factorize(df['start_date'])[0] + 1

pandas

byMR

Published June 19, 2023

Add a comment

How to determine unknown socket options?

byMR

June 19, 2023

Questions

What is the diffrence between prefix/postfix and increment operator

byMR

June 19, 2023

Questions

How to compare values in two columns and if values are equal keep as is, but if values are different, sum by row

byMR

June 19, 2023

Questions

EXCEL Pivot Table not showing all the months

byMR

June 19, 2023

Questions

Value not being displayed after render

byMR

June 19, 2023

Questions

Is there a way to convert string of certain format to dataframe in Python

byMR

June 19, 2023

how to get a unique week number for start and end dates in multi years – Pandas

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Like this:

Leave a ReplyCancel reply

Read more

How to determine unknown socket options?

What is the diffrence between prefix/postfix and increment operator

How to compare values in two columns and if values are equal keep as is, but if values are different, sum by row

EXCEL Pivot Table not showing all the months

Value not being displayed after render

Is there a way to convert string of certain format to dataframe in Python

Keep Up to Date with the Most Important News

how to get a unique week number for start and end dates in multi years – Pandas

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Share this:

Like this:

Leave a ReplyCancel reply

Keep Up to Date with the Most Important News

Read more

How to determine unknown socket options?

What is the diffrence between prefix/postfix and increment operator

How to compare values in two columns and if values are equal keep as is, but if values are different, sum by row

EXCEL Pivot Table not showing all the months

Value not being displayed after render

Is there a way to convert string of certain format to dataframe in Python

Discover more from Dev solutions