Home Keep only data in pandas dataframe where column value occurs more than once for a groupby arrangement

Questions

Keep only data in pandas dataframe where column value occurs more than once for a groupby arrangement

November 7, 2023

I have a dataframe like so:

ID 1          ID 2
1             5
1             5
1             6
2             7
2             5
2             7
3             8
3             9
3             10

What I am trying to do is only keep the rows that, when the df is grouped by ID 1, ID 2 exists more than once. I have used .groupby and .value_counts in this method:

df_temp = df.groupby('ID 1')
df_output = df_temp['ID 2'].value_counts()[df_temp1['ID 2'].value_counts() > 1]

This returns something like so:

ID 1          ID 2
1             5         2
2             7         2

Is there a way I can use this in order to only keep the rows in the initial df with the ID 2 in this groupedby object? To get a result like this:

ID 1          ID 2
1             5
1             5
2             7
2             7

>Solution :

In [422]: df = pd.DataFrame(dict(id1=[1,1,1,2,2,2,3,3,3], id2=[5,5,6,7,5,7,8,9,10]))

In [423]: df
Out[423]:
   id1  id2
0    1    5
1    1    5
2    1    6
3    2    7
4    2    5
5    2    7
6    3    8
7    3    9
8    3   10

In [424]: counts = pd.DataFrame(df.value_counts(subset=['id1', 'id2']))

In [425]: counts
Out[425]:
         0
id1 id2
2   7    2
1   5    2
3   10   1
    9    1
    8    1
2   5    1
1   6    1

In [426]: counts[counts[0]>1]
Out[426]:
         0
id1 id2
2   7    2
1   5    2