Pandas rename rows after grouping by columns

January 6, 2023

I’ve recently started playing around with pandas to manipulate some data, and I am now trying to anonymize a few columns after a groupby that finds the unique persons.

For example, suppose the following DF:

   First Name Last Name         DOB
0  Bob        One               28/05/1973
1  Bob        One               28/05/1973
2  Ana        Two               28/07/1991
3  Ana        Two               28/07/1991
4  Ana        Two               28/07/1991
5  Jim        Three             07/01/1994

I can easily find the unique persons by First Name, Last Name and DOB using df.groupby(['First Name', 'Last Name', 'DOB']).
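For reference, here is a minimal sketch of that groupby on the sample data (the DataFrame construction simply re-types the table above). `ngroups` gives the number of unique persons and `size()` the number of rows per person; `sort=False` keeps the groups in order of first appearance.

```python
import pandas as pd

# Sample data from the question (column names contain spaces)
df = pd.DataFrame({
    "First Name": ["Bob", "Bob", "Ana", "Ana", "Ana", "Jim"],
    "Last Name": ["One", "One", "Two", "Two", "Two", "Three"],
    "DOB": ["28/05/1973", "28/05/1973", "28/07/1991",
            "28/07/1991", "28/07/1991", "07/01/1994"],
})

keys = ["First Name", "Last Name", "DOB"]

# One group per unique person
grouped = df.groupby(keys, sort=False)

print(grouped.ngroups)  # 3
print(grouped.size())   # rows per person: Bob 2, Ana 3, Jim 1
```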

However, I’d like to apply a function to every unique combination that transforms the names into a known, anonymized (incremental) version:

   First Name Last Name         DOB
0  F1         L1                28/05/1973
1  F1         L1                28/05/1973
2  F2         L2                28/07/1991
3  F2         L2                28/07/1991
4  F2         L2                28/07/1991
5  F3         L3                07/01/1994

I’ve tried a few things with the transform and apply methods of the DataFrame groupby, but with no luck so far. How could I achieve this?
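The desired mapping (each distinct person gets the next integer, in the order first seen) can be illustrated without pandas. This is a minimal plain-Python sketch; `incremental_ids` is a hypothetical helper name, not a pandas function.

```python
# Hypothetical helper: assigns 1, 2, 3, ... to each distinct key
# in the order it is first seen.
def incremental_ids(keys):
    seen = {}
    ids = []
    for key in keys:
        if key not in seen:
            seen[key] = len(seen) + 1  # next unused ID
        ids.append(seen[key])
    return ids

# (First Name, Last Name, DOB) rows from the question
rows = [
    ("Bob", "One", "28/05/1973"),
    ("Bob", "One", "28/05/1973"),
    ("Ana", "Two", "28/07/1991"),
    ("Ana", "Two", "28/07/1991"),
    ("Ana", "Two", "28/07/1991"),
    ("Jim", "Three", "07/01/1994"),
]

ids = incremental_ids(rows)
# Replace the names with prefixed IDs, keep DOB unchanged
anonymized = [(f"F{i}", f"L{i}", dob) for i, (_, _, dob) in zip(ids, rows)]

print(ids)  # [1, 1, 2, 2, 2, 3]
```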

Solution:

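import pandas as pd

# Assumes columns named FirstName/LastName; the question's DataFrame
# uses "First Name"/"Last Name", so adjust the names to match yours.
# ngroup() numbers each unique (FirstName, LastName, DOB) triple.
ids = (df.groupby(["FirstName", "LastName", "DOB"], sort=False)
         .ngroup().add(1)   # ngroup is 0-based; start the IDs at 1
         .astype(str))      # strings, so they concatenate with the prefixes

df["FirstName"] = "F" + ids
df["LastName"]  = "L" + ids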

Identify each (FirstName, LastName, DOB) triple by its group number (ngroup):
- sort=False numbers the groups in the order they first appear, instead of in sorted key order
- ngroup is 0-based, so we add(1)
Then prepend the prefixes "F" and "L" to the IDs and assign the results to the corresponding columns

This gives:

>>> df
  FirstName LastName         DOB
0        F1       L1  28/05/1973
1        F1       L1  28/05/1973
2        F2       L2  28/07/1991
3        F2       L2  28/07/1991
4        F2       L2  28/07/1991
5        F3       L3  07/01/1994

where ids is:

>>> ids
0    1
1    1
2    2
3    2
4    2
5    3
dtype: object
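The answer's code assumes columns named FirstName/LastName. Below is a self-contained sketch of the same ngroup approach using the question's column names (with spaces). It also builds an optional lookup table from each ID back to the original person; that table is an addition not present in the answer, useful only if the anonymization must be reversible.

```python
import pandas as pd

# Sample data from the question (column names contain spaces)
df = pd.DataFrame({
    "First Name": ["Bob", "Bob", "Ana", "Ana", "Ana", "Jim"],
    "Last Name": ["One", "One", "Two", "Two", "Two", "Three"],
    "DOB": ["28/05/1973", "28/05/1973", "28/07/1991",
            "28/07/1991", "28/07/1991", "07/01/1994"],
})
keys = ["First Name", "Last Name", "DOB"]

# 1-based group number per row, in order of first appearance
ids = df.groupby(keys, sort=False).ngroup().add(1)

# Optional lookup table (one row per person), built before the names
# are overwritten, in case the mapping has to be reversed later
lookup = df[keys].assign(ID=ids).drop_duplicates().reset_index(drop=True)

# Replace the names with prefixed IDs; DOB is left unchanged
df["First Name"] = "F" + ids.astype(str)
df["Last Name"] = "L" + ids.astype(str)

print(df)
print(lookup)
```

Keeping the lookup table separate from the anonymized data is a design choice: the anonymized DataFrame can be shared on its own while the mapping stays with whoever is allowed to re-identify persons.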

By MR

Published January 06, 2023
