I have the following dataframe:
import pandas as pd

df_test = pd.DataFrame(data=[['AP1', 'House1'],
                             ['AP1', 'House1'],
                             ['AP2', 'House1'],
                             ['AP3', 'House2'],
                             ['AP4', 'House2'],
                             ['AP5', 'House2']],
                       columns=['AP', 'House'],
                       index=[0, 1, 2, 0, 1, 1])
For each distinct value in a column, I need to check whether the rows holding that value contain duplicated indices. For example, in column House, the three rows with House1 have indices 0, 1, 2, so there are no duplicated indices. The three rows with House2 have indices 0, 1, 1, so index 1 is duplicated once.
I have tried this:
print(f'{df_test.index.duplicated().sum()} repeated entries')
But this reports 3 repeated entries, because it checks the index of the whole dataframe and does not consider each value of the column separately.
>Solution :
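A possible solution is to move the index into a regular column with reset_index (it is named 'index' because the index has no name) and then count the rows that repeat the same (index, column value) pair: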
print(df_test.reset_index().duplicated(['index', 'AP']).sum())
print(df_test.reset_index().duplicated(['index', 'House']).sum())
Output:
0
1
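If a count per value is needed rather than a single total, one option is to group by the column and count duplicated indices inside each group. The following is only a sketch: the helper name duplicated_index_per_value and the temporary column names 'value' and 'idx' are made up for this example and are not part of pandas.

```python
import pandas as pd

df_test = pd.DataFrame(
    data=[['AP1', 'House1'], ['AP1', 'House1'], ['AP2', 'House1'],
          ['AP3', 'House2'], ['AP4', 'House2'], ['AP5', 'House2']],
    columns=['AP', 'House'],
    index=[0, 1, 2, 0, 1, 1],
)


def duplicated_index_per_value(df, column):
    """Hypothetical helper: count duplicated indices for each value of `column`."""
    # Pair every column value with its index label in a fresh frame,
    # so the duplicated labels in the original index cannot interfere.
    pairs = pd.DataFrame({'value': df[column].to_numpy(),
                          'idx': df.index.to_numpy()})
    # Within each group, count index labels that repeat an earlier one.
    return pairs.groupby('value')['idx'].agg(lambda s: int(s.duplicated().sum()))


per_house = duplicated_index_per_value(df_test, 'House')
per_ap = duplicated_index_per_value(df_test, 'AP')
print(per_house)
print(per_ap)
```

For the data above this gives 0 for House1 and 1 for House2, and 0 for every AP value. Summing the result reproduces the totals from the reset_index approach.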