I have a dataframe such as
Groups Names
G1 A
G1 A
G1 B
G1 B
G1 C
G1 C
G1 C
G1 D
G2 A
G2 B
G2 C
G3 A
G3 A
G4 F
G4 F
G4 E
For each group in Groups, I would like to count the number of Names values that are duplicated (i.e. appear at least twice) within the group, and add this information in a new column called Nb_duplicated. I would also like to add another column called Number_unique_names containing the number of unique Names values within each group.
I should then get:
Groups Names Nb_duplicated Number_unique_names
G1 A 3 4
G1 A 3 4
G1 B 3 4
G1 B 3 4
G1 C 3 4
G1 C 3 4
G1 C 3 4
G1 D 3 4
G2 A 0 3
G2 B 0 3
G2 C 0 3
G3 A 1 1
G3 A 1 1
G4 F 1 2
G4 F 1 2
G4 E 1 2
Solution:
You can compute the number of unique names and the number of non-duplicated names (with GroupBy.transform), then subtract the two to get the number of duplicated names:
# set up group
g = df.groupby('Groups')
# number of unique names per group
df['unique'] = g['Names'].transform('nunique')
# number of names that occur exactly once per group
non_dup = g['Names'].transform(lambda x: (~x.duplicated(keep=False)).sum())
# duplicates = unique - non-duplicates
df['duplicated'] = df['unique'] - non_dup
NB: I used an intermediate variable non_dup here for clarity, but you can use a one-liner.
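The one-liner could look like this (a sketch folding the intermediate into the subtraction, shown on a rebuilt copy of the example data):

```python
import pandas as pd

# rebuild the example data from the question
df = pd.DataFrame({
    'Groups': ['G1'] * 8 + ['G2'] * 3 + ['G3'] * 2 + ['G4'] * 3,
    'Names': list('AABBCCCD') + list('ABC') + list('AA') + list('FFE'),
})

g = df.groupby('Groups')
df['unique'] = g['Names'].transform('nunique')
# one-liner: duplicated = unique - names occurring exactly once in the group
df['duplicated'] = df['unique'] - g['Names'].transform(
    lambda x: (~x.duplicated(keep=False)).sum()
)
```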
Output (with the intermediate non_dup shown as a column for clarity):
Groups Names unique duplicated non_dup
0 G1 A 4 3 1
1 G1 A 4 3 1
2 G1 B 4 3 1
3 G1 B 4 3 1
4 G1 C 4 3 1
5 G1 C 4 3 1
6 G1 C 4 3 1
7 G1 D 4 3 1
8 G2 A 3 0 3
9 G2 B 3 0 3
10 G2 C 3 0 3
11 G3 A 1 1 0
12 G3 A 1 1 0
13 G4 F 2 1 1
14 G4 F 2 1 1
15 G4 E 2 1 1
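As a variant (not part of the answer above), you could count the duplicated names directly with value_counts and write the results straight into the column names the question asks for; a sketch on a rebuilt copy of the example data:

```python
import pandas as pd

# rebuild the example data from the question
df = pd.DataFrame({
    'Groups': ['G1'] * 8 + ['G2'] * 3 + ['G3'] * 2 + ['G4'] * 3,
    'Names': list('AABBCCCD') + list('ABC') + list('AA') + list('FFE'),
})

g = df.groupby('Groups')
df['Number_unique_names'] = g['Names'].transform('nunique')
# count names that appear more than once in each group
df['Nb_duplicated'] = g['Names'].transform(lambda x: x.value_counts().gt(1).sum())
```

The scalar returned by each lambda call is broadcast back to every row of its group by transform, which is what produces the per-row columns shown in the expected output.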