I have a dataframe like so:
ID 1 ID 2
1 5
1 5
1 6
2 7
2 5
2 7
3 8
3 9
3 10
What I am trying to do is only keep the rows that, when the df is grouped by ID 1, ID 2 exists more than once. I have used .groupby and .value_counts in this method:
df_temp = df.groupby('ID 1')
df_output = df_temp['ID 2'].value_counts()[df_temp1['ID 2'].value_counts() > 1]
This returns something like so:
ID 1 ID 2
1 5 2
2 7 2
Is there a way I can use this in order to only keep the rows in the initial df with the ID 2 in this groupedby object? To get a result like this:
ID 1 ID 2
1 5
1 5
2 7
2 7
>Solution :
In [422]: df = pd.DataFrame(dict(id1=[1,1,1,2,2,2,3,3,3], id2=[5,5,6,7,5,7,8,9,10]))
In [423]: df
Out[423]:
id1 id2
0 1 5
1 1 5
2 1 6
3 2 7
4 2 5
5 2 7
6 3 8
7 3 9
8 3 10
In [424]: counts = pd.DataFrame(df.value_counts(subset=['id1', 'id2']))
In [425]: counts
Out[425]:
0
id1 id2
2 7 2
1 5 2
3 10 1
9 1
8 1
2 5 1
1 6 1
In [426]: counts[counts[0]>1]
Out[426]:
0
id1 id2
2 7 2
1 5 2