Say I have a dataframe like this:
df = pd.DataFrame({
'PortDt': ['2022-01-31', '2022-02-28', '2022-02-28', '2022-03-31', '2022-03-31'],
'loannum': ['111', '111', '222', '111', '333']
})
I want to filter the dataset so that I am left with only records who appear in every distinct value for PortDt.
For this example, the result would be:
PortDt | loannum
-----------+-------------
2022-01-31 | 111
2022-02-28 | 111
2022-03-31 | 111
>Solution :
Using groupby.transform with ‘nunique’ and comparing to the overall number of unique values:
out = df[df.groupby('loannum')['PortDt']
.transform('nunique').eq(df['PortDt'].nunique())]
Or same logic with better efficiency:
s = df.groupby('loannum')['PortDt'].nunique().eq(df['PortDt'].nunique())
df[df['loannum'].isin(s[s].index)]
Or with crosstab+all instead of groupby:
s = pd.crosstab(df['PortDt'], df['loannum']).all()
out = df[df['loannum'].isin(s[s].index)]
Output:
PortDt loannum
0 2022-01-31 111
1 2022-02-28 111
3 2022-03-31 111