Home pandas – drop duplicates in a column, count the number of duplicates and aggregate one column

Questions

pandas – drop duplicates in a column, count the number of duplicates and aggregate one column

December 17, 2022

I’m trying to remove duplicates values in ID column, count the duplicates in the ID column and create a new column called Count, and concatenate the Axis column

THIS IS MY CURRENT DATAFRAME:

ID    Axis    
1   1 2 3 4 
1   0 1 2 3 
1   4 5 2 4 
2   7 8 9 10 
2   1 2 3 4 
3   6 7 8 9 
4   1 2 3 4 
4   0 1 2 3

Desired output


 ID  count  Axis    
 1    3    [1 2 3 4 , 0 1 2 3 ,  4 5 2 4]
 2    2    [ 7 8 9 10 ,  1 2 3 4] 
 3    1    [6 7 8 9 ]
 4    2    [1 2 3 4 , 0 1 2 3]

I know I’m supposed to use aggregate function, but I’m not getting it. If someone can guide me, I would really appreciate it

>Solution :

Use:

df2 = df.groupby('ID').agg(lambda x: list(x))
df2['count'] = df2['Axis'].apply(lambda x: len(x))
print(df2)

which gives:

                                          Axis  count
ID                                                   
1   [[1, 2, 3, 4], [0, 1, 2, 3], [4, 5, 2, 4]]      3
2                [[7, 8, 9, 10], [1, 2, 3, 4]]      2
3                               [[6, 7, 8, 9]]      1
4                 [[1, 2, 3, 4], [0, 1, 2, 3]]      2

for the DataFrame

  ID           Axis
0   1   [1, 2, 3, 4]
1   1   [0, 1, 2, 3]
2   1   [4, 5, 2, 4]
3   2  [7, 8, 9, 10]
4   2   [1, 2, 3, 4]
5   3   [6, 7, 8, 9]
6   4   [1, 2, 3, 4]
7   4   [0, 1, 2, 3]