Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas – Join two dataframes by groupby with conditions

I would like to left join table_B using table_A on column ID and groupby the number of records table A has when the date in table_A is smaller than the one in talbe_B. For example,

table_A = 
| ID | Date |
| ----| -------------| 
| 0   | 2022-03-01   |
| 1   | 2022-03-02   |
| 0   | 2022-05-02   |
| 1   | 2022-04-02   |

table_B =

| ID | Date         |
| -- | ------------ | 
| 0   | 2022-05-04  |
| 1   | 2022-03-20  |
| 1   | 2022-05-02  |
| 2   | 2022-02-22  |

The resulting table should be

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

| ID | Date       | number_of_records |
| ---| -----------| ----------------  |
| 0  | 2022-05-04 | 2                 |
| 1  | 2022-03-20 | 1                 |
| 1  | 2022-05-02 | 2                 |
| 2  | 2022-02-22 | 0                 |

import datetime
table_A = pd.DataFrame({'ID': [0,1,0,1], 
        'Date':[datetime.date(2022,3,1), datetime.date(2022,3,2), datetime.date(2022,5,2), datetime.date(2022,4,2)]})
table_B = pd.DataFrame({'ID': [0,1,1,2], 
        'Date':[datetime.date(2022,5,4), datetime.date(2022,3,20), datetime.date(2022,5,2), datetime.date(2022,2,22)]})
# The desired output
table_C = pd.DataFrame({'ID': [0,1,1,2], 
        'Date':[datetime.date(2022,5,4), datetime.date(2022,3,20), datetime.date(2022,5,2), datetime.date(2022,2,22)], 
        'number_of_records': [2,1,2,0]})

I tried to find the unique ID in table B and use groupby to count in table_A but I am not sure how to apply the condition "smaller than the one in talbe_B". Thanks a lot!

>Solution :

table_A.Date = pd.to_datetime(table_A.Date)
table_B.Date = pd.to_datetime(table_B.Date)
table_A.merge(table_B, on='ID', how='right', suffixes=['_count', None])[lambda x: x.Date_count.lt(x.Date) | x.Date_count.isna()].groupby(['ID', 'Date'], as_index=False).count()

Output:

   ID       Date  Date_count
0   0 2022-05-04           2
1   1 2022-03-20           1
2   1 2022-05-02           2
3   2 2022-02-22           0
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading