Given a dataframe like the one below, how do I filter for the earliest and latest date in each month? Note that the actual data runs to tens of thousands of rows.
Input:
| Date | Deg |
|---|---|
| 02/01/1990 | 1210.92 |
| 13/01/1990 | 1226.83 |
| 14/01/1990 | 1224.52 |
| 15/01/1990 | 1220.77 |
| 08/02/1990 | 1164.32 |
| 09/02/1990 | 1156.72 |
| 12/02/1990 | 1145.18 |
| 13/02/1990 | 1146.88 |
| 24/02/1990 | 1149.07 |
Desired output:
| Date | Deg |
|---|---|
| 02/01/1990 | 1210.92 |
| 15/01/1990 | 1220.77 |
| 08/02/1990 | 1164.32 |
| 24/02/1990 | 1149.07 |
Solution:
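Your data looks sorted, but sort explicitly anyway: `first()` and `last()` take the first and last row of each group in the current row order. The dates are day-first strings, so parse them into datetimes before using `.dt`:

```python
import pandas as pd

# Parse day-first strings such as "13/01/1990"; .dt needs a datetime column
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")
df = df.sort_values("Date")  # first()/last() depend on row order

df["year"] = df["Date"].dt.year
df["month"] = df["Date"].dt.month

grouped = df.groupby(["year", "month"])
result = (
    pd.concat([grouped.first(), grouped.last()])  # year/month become the index
    .drop_duplicates()       # a month with only one row would otherwise appear twice
    .sort_values(by="Date")
    .reset_index(drop=True)  # discard the (year, month) index
)
```

Note that `first()` and `last()` return the first/last non-null value of each column separately, so if `Deg` can be missing, `grouped.head(1)` and `grouped.tail(1)` are safer because they keep whole rows.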