Filter dataframe for value that exists in all dates

Say I have a dataframe like this:

import pandas as pd

df = pd.DataFrame({
    'PortDt': ['2022-01-31', '2022-02-28', '2022-02-28', '2022-03-31', '2022-03-31'],
    'loannum': ['111', '111', '222', '111', '333']
})

I want to filter the dataset so that I am left with only the records whose loannum appears under every distinct value of PortDt.

For this example, the result would be:

PortDt     |   loannum
-----------+-------------
2022-01-31 |  111
2022-02-28 |  111
2022-03-31 |  111

Solution:

Use groupby.transform with 'nunique' to count the distinct PortDt values per loannum, then compare that count to the overall number of unique PortDt values:

out = df[df.groupby('loannum')['PortDt']
           .transform('nunique').eq(df['PortDt'].nunique())]
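
To make the one-liner easier to follow, here is a sketch that splits it into named steps on the question's sample data. The names dates_per_loan, total_dates and mask are only illustrative and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'PortDt': ['2022-01-31', '2022-02-28', '2022-02-28', '2022-03-31', '2022-03-31'],
    'loannum': ['111', '111', '222', '111', '333']
})

# For every row, the number of distinct dates on which that row's loannum appears.
# transform returns a Series aligned to df's index, so it can be used as a row mask.
dates_per_loan = df.groupby('loannum')['PortDt'].transform('nunique')

# Number of distinct dates in the whole frame.
total_dates = df['PortDt'].nunique()

# Keep rows whose loannum is present on every date.
mask = dates_per_loan.eq(total_dates)
out = df[mask]
```

Because 'nunique' counts distinct dates rather than rows, a loannum that is listed twice on the same date is not over-counted.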

Or the same logic with better efficiency, aggregating once per loannum and then filtering with isin:

s = df.groupby('loannum')['PortDt'].nunique().eq(df['PortDt'].nunique())

out = df[df['loannum'].isin(s[s].index)]
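
As a rough illustration of where the saving probably comes from, the sketch below inspects the intermediate Series on the sample data: the aggregation produces one boolean per loannum instead of one value per row. The variable name keep is an assumption added for readability.

```python
import pandas as pd

df = pd.DataFrame({
    'PortDt': ['2022-01-31', '2022-02-28', '2022-02-28', '2022-03-31', '2022-03-31'],
    'loannum': ['111', '111', '222', '111', '333']
})

# One boolean per loannum: True if it appears on every distinct date.
s = df.groupby('loannum')['PortDt'].nunique().eq(df['PortDt'].nunique())

# s[s] keeps only the True entries; its index holds the qualifying loan numbers.
keep = s[s].index

out = df[df['loannum'].isin(keep)]
```

Here s has three entries (one per loan) while df has five rows, so the comparison runs on the smaller object.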

Or with crosstab+all instead of groupby:

s = pd.crosstab(df['PortDt'], df['loannum']).all()
out = df[df['loannum'].isin(s[s].index)]
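
For intuition about the crosstab variant, this sketch builds the intermediate table on the sample data. The names counts and present_everywhere are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'PortDt': ['2022-01-31', '2022-02-28', '2022-02-28', '2022-03-31', '2022-03-31'],
    'loannum': ['111', '111', '222', '111', '333']
})

# Rows are dates, columns are loan numbers, cells are occurrence counts.
counts = pd.crosstab(df['PortDt'], df['loannum'])

# all() works column-wise by default: a column is True only if every
# date has a non-zero count for that loan number.
present_everywhere = counts.all()

out = df[df['loannum'].isin(present_everywhere[present_everywhere].index)]
```

Note that the crosstab is a dense dates-by-loans table, so with many distinct loan numbers it may use noticeably more memory than the groupby versions.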

Output:

       PortDt loannum
0  2022-01-31     111
1  2022-02-28     111
3  2022-03-31     111