Hi, I am trying to write a for loop that filters the dataframe to the max date in the `date` column and, if that date has only one row, drops it, repeating until the max date no longer has exactly one row.
Here, max date means the latest date in the dataframe's `date` column.
I am working backwards from the latest date because data is appended to the dataframe daily. If the last date has only one row, I delete it so that the append feature (in a different function) can continue appending from the max date.
The table looks like this:
| date | names | age |
|---|---|---|
| 2022-04-01 | john | 15 |
| 2022-04-01 | jane | 15 |
| 2022-04-01 | Swansa | 15 |
| 2022-04-02 | Timmy | 15 |
| 2022-04-02 | (null) | 15 |
| 2022-04-02 | (null) | 15 |
| 2022-04-03 | (null) | 15 |
| 2022-04-04 | (null) | (null) |
| 2022-04-05 | (null) | (null) |
The expected end result is:
| date | names | age |
|---|---|---|
| 2022-04-01 | john | 15 |
| 2022-04-01 | jane | 15 |
| 2022-04-01 | Swansa | 15 |
| 2022-04-02 | Timmy | 15 |
| 2022-04-02 | (null) | 15 |
| 2022-04-02 | (null) | 15 |
I have tried this:
```python
latestDate = df['date'].max()
dfRow = len(df[df['date'] == latestDate])
if dfRow == 1:
    df = df[df['date'] != latestDate]
```
How do I loop the code above until the number of rows for the max date is no longer 1?
How do I write a for loop over a date range in descending order?
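One way to turn the attempt above into a loop is sketched below. It is a minimal sketch under a few assumptions: the sample frame rebuilds the question's table with `None` standing in for `(null)`, `date` is converted with `pd.to_datetime`, and the function names `drop_trailing_while` and `drop_trailing_for` are hypothetical. The first function repeats the original check in a `while` loop; the second walks the distinct dates newest first, which is one way to write the descending for loop.

```python
import pandas as pd

# Sample data mirroring the question's table (None stands in for "(null)")
df = pd.DataFrame({
    "date": ["2022-04-01"] * 3 + ["2022-04-02"] * 3
            + ["2022-04-03", "2022-04-04", "2022-04-05"],
    "names": ["john", "jane", "Swansa", "Timmy", None, None, None, None, None],
    "age": [15] * 7 + [None, None],
})
df["date"] = pd.to_datetime(df["date"])


def drop_trailing_while(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop the latest date while it has exactly one row."""
    while not frame.empty:
        latest = frame["date"].max()
        is_latest = frame["date"] == latest
        if is_latest.sum() != 1:
            break  # latest date no longer has exactly one row
        frame = frame[~is_latest]
    return frame


def drop_trailing_for(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: descending for loop over the distinct dates."""
    for day in sorted(frame["date"].unique(), reverse=True):
        is_day = frame["date"] == day
        if is_day.sum() != 1:
            break  # first date (counting from the end) with more than one row
        frame = frame[~is_day]
    return frame


result_while = drop_trailing_while(df)
result_for = drop_trailing_for(df)
print(result_while)
```

Both versions stop at the first date, counting back from the newest, that has more than one row, so dates earlier in the data are never inspected.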
Solution:
You can try using `groupby` with `transform`:
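```python
import pandas as pd
df['date'] = pd.to_datetime(df['date'])
# 'size' counts every row per date; 'count' would skip rows whose 'names' is null
df_filtered = df[(df.groupby('date')['names'].transform('size') != 1) | (df['date'].dt.year != 2022)]
```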
| | date | names | age |
|---|---|---|---|
| 0 | 2022-04-01 | john | 15 |
| 1 | 2022-04-01 | jane | 15 |
| 2 | 2022-04-01 | Swansa | 15 |
| 3 | 2022-04-02 | Timmy | 15 |
| 4 | 2022-04-02 | (null) | 15 |
| 5 | 2022-04-02 | (null) | 15 |
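The `groupby` filter above drops every date that has exactly one row, wherever it sits in the data. If only the unbroken run of single-row dates at the end should go, as described in the question, a vectorized sketch along these lines may fit better. It rebuilds the question's table as sample data (with `None` for `(null)`), and the variable names are illustrative only.

```python
import pandas as pd

# Sample data mirroring the question's table (None stands in for "(null)")
df = pd.DataFrame({
    "date": ["2022-04-01"] * 3 + ["2022-04-02"] * 3
            + ["2022-04-03", "2022-04-04", "2022-04-05"],
    "names": ["john", "jane", "Swansa", "Timmy", None, None, None, None, None],
    "age": [15] * 7 + [None, None],
})
df["date"] = pd.to_datetime(df["date"])

# Rows per date, newest date first (size() counts rows even when names is null)
rows_per_date = df.groupby("date").size().sort_index(ascending=False)

# Running count of dates with != 1 rows, scanning newest to oldest;
# it stays 0 only across the trailing run of single-row dates
not_single_seen = (rows_per_date != 1).astype(int).cumsum()
trailing_single_dates = not_single_seen[not_single_seen == 0].index

result = df[~df["date"].isin(trailing_single_dates)]
print(result)
```

Because the running count becomes nonzero at the first date with more than one row, an isolated single-row date earlier in the data is kept. Whether that is preferable to the `groupby` filter depends on what the append function expects.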