
Getting different values when using groupby(column)["id"].nunique() and when adding a column with transform("nunique")

I’m trying to count the distinct values of some_id per group in a dataset and add them as a new column to another table. The first approach works; the second one produces wrong values.
When I use the following code:
unique_id_per_column = source_table.groupby("disease").some_id.nunique()
I get:

|    | disease                 | some_id |
|---:|:------------------------|--------:|
|  0 | disease1                |   121   |
|  1 | disease2                |     1   |
|  2 | disease3                |     5   |
|  3 | disease4                |     9   |
|  4 | disease5                |    77   |
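A minimal reproduction of this first step, using hypothetical toy data in place of the real source_table (which isn't shown): nunique on the grouped column yields one count per disease, returned as a Series indexed by disease.

```python
import pandas as pd

# Hypothetical toy data standing in for source_table (the real data is not shown).
source_table = pd.DataFrame({
    "disease": ["disease1", "disease1", "disease1", "disease2", "disease2"],
    "some_id": [10, 10, 11, 20, 20],
})

# One entry per group: number of distinct some_id values for each disease.
unique_id_per_column = source_table.groupby("disease").some_id.nunique()
print(unique_id_per_column)  # disease1 -> 2, disease2 -> 1

# reset_index() turns it into a table with columns disease and some_id,
# like the table above.
print(unique_id_per_column.reset_index())
```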

These numbers look correct, but I want to add them to another table that already has one row per disease and a column with the total number of values per group.
So I used the following code:
table["unique_ids"] = source_table.groupby("disease").some_id.transform("nunique")
and got the following table, in which every row except the first has the wrong number:

|    | disease                 |some_id |   unique_ids      |
|---:|:------------------------|-------:|------------------:|
|  0 | disease1                |   151  |               121 |
|  1 | disease2                |     1  |               121 |
|  2 | disease3                |     5  |               121 |
|  3 | disease4                |     9  |               121 |
|  4 | disease5                |    91  |               121 |

I expected the same results as in the first table. Does anyone know why the value for the first row is repeated instead of the correct numbers?
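The symptom can be reproduced with hypothetical toy data (both DataFrames below are made up, with source_table sorted so its first rows belong to disease1). The sketch shows that transform returns one value per row of source_table, and that assigning that Series to table keeps only the values whose index labels also exist in table.

```python
import pandas as pd

# Hypothetical data: source_table has one row per observation and is
# sorted so that its first rows all belong to disease1.
source_table = pd.DataFrame({
    "disease": ["disease1", "disease1", "disease1", "disease2", "disease3"],
    "some_id": [10, 11, 12, 20, 30],
})

# Hypothetical summary table: one row per disease, default index 0..2.
table = pd.DataFrame({"disease": ["disease1", "disease2", "disease3"]})

# transform broadcasts each group's result back to every row of source_table,
# so the result has len(source_table) values and source_table's index.
per_row = source_table.groupby("disease").some_id.transform("nunique")
print(per_row.tolist())  # [3, 3, 3, 1, 1]

# Column assignment aligns on index labels, not on disease: table rows 0, 1, 2
# receive source_table rows 0, 1, 2, which all belong to disease1.
table["unique_ids"] = per_row
print(table)
```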


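Solution:

transform returns one value per row of source_table, indexed like source_table, and assigning it to a column of table aligns by index label, not by disease. So row i of table receives the count computed for row i of source_table, and those first rows presumably all belong to disease1.

If you need to create the column in another DataFrame, aggregate first and then use Series.map to look up each row's disease: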

s = source_table.groupby("disease").some_id.nunique()

table["unique_ids"] = table["disease"].map(s) 
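A runnable sketch of the map approach on hypothetical toy data (neither DataFrame comes from the question; table is deliberately in a different row order to show that order does not matter):

```python
import pandas as pd

# Hypothetical data for illustration only.
source_table = pd.DataFrame({
    "disease": ["disease1", "disease1", "disease1", "disease2", "disease3"],
    "some_id": [10, 11, 11, 20, 30],
})
table = pd.DataFrame({"disease": ["disease3", "disease1", "disease2"]})

# Aggregate once: a Series of distinct-id counts indexed by disease.
s = source_table.groupby("disease").some_id.nunique()

# map looks up each row's disease label in s's index, so the match is by
# disease rather than by row position or index label.
table["unique_ids"] = table["disease"].map(s)
print(table)
```

A disease present in table but absent from source_table would get NaN. If the count is wanted on every row of source_table itself, transform("nunique") is the right tool there, because the result shares source_table's index.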