We have this DataFrame:
import pandas as pd

df = pd.DataFrame({
    'date': [pd.Timestamp('2020-12-26'),   # week 52 of year 2020
             pd.Timestamp('2020-12-27'),   # last day of week 52 of year 2020
             pd.Timestamp('2021-03-10'),   # week 10 of year 2021
             pd.Timestamp('2022-01-03'),   # first day of week 1 of year 2022
             pd.Timestamp('2022-01-09')],  # last day of week 1 of year 2022
    'value': [15, 15.5, 26, 36, 36.15]
})
We want a new DataFrame that looks like this:
date value
0 202052 15.50
1 202201 36.15
In other words we need to:
1. convert 'date' to year/week-number format (and store the result as an object)
2. select only the rows whose date corresponds to the last day of the week
Note that both (1) and (2) must follow the ISO 8601 definition of weeks. The actual dataset has thousands of rows.
How do we do this?
>Solution :
You can work directly on the Series through the dt accessor: dt.isocalendar() returns the ISO 8601 year, week number, and weekday of each date. ISO weeks run from Monday (1) to Sunday (7), so the last day of the week can be found with an equality check against 7.
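iso = df.date.dt.isocalendar()  # DataFrame with ISO 'year', 'week' and 'day' columns
mask = iso.day == 7             # ISO weekday 7 is Sunday, the last day of the week
# Build 'YYYYWW' from the ISO year and the zero-padded ISO week number
df.loc[mask].assign(date=iso.year.astype(str) + iso.week.astype(str).str.rjust(2, "0"))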
date value
1 202052 15.50
4 202201 36.15
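The result above keeps the original row labels (1 and 4). If the 0-based index shown in the question is wanted, a reset_index(drop=True) can be appended. The sketch below wraps the same approach in a helper; the function name last_day_of_iso_week and the extra 2021-01-03 row (with a made-up value of 20.0) are illustrative assumptions, added only to show why the ISO year from isocalendar() should be used rather than dt.year. It also assumes the date column contains no missing values.

```python
import pandas as pd


def last_day_of_iso_week(df: pd.DataFrame, date_col: str = "date") -> pd.DataFrame:
    """Keep only ISO Sundays and relabel the date as 'YYYYWW' (hypothetical helper)."""
    iso = df[date_col].dt.isocalendar()  # ISO 8601 year, week, day per row
    mask = iso["day"] == 7               # ISO weekday 7 is Sunday
    # Use the ISO year, not the calendar year: the two differ around New Year.
    label = iso["year"].astype(str) + iso["week"].astype(str).str.zfill(2)
    out = df.loc[mask].copy()
    out[date_col] = label[mask].astype(object)  # store the label as object dtype
    return out.reset_index(drop=True)           # renumber rows from 0


# Illustrative data: 2021-01-03 is a Sunday that belongs to ISO week 53 of 2020.
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-12-27", "2021-01-03", "2022-01-03", "2022-01-09"]),
    "value": [15.5, 20.0, 36.0, 36.15],  # 20.0 is a made-up value
})
result = last_day_of_iso_week(df)
print(result)
```

Here the 2021-01-03 row is labelled 202053, whereas combining dt.year with the ISO week would have produced 202153, a week that does not exist.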