Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to add new columns in Pandas for unique values of certain key (problem agregate)

How to add a new column of aggregated data

I want to create 03 new columns in a dataframe

Column 01: unique_list

Create a new column in the dataframe of unique values of cfop_code for each key

Column 02: unique_count

A column that check the number of unique values that shows in unique_list

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Column 03: not_unique_count

A column that check the number of not unique values that shows in unique_list

example_df

key product cfop_code
0 12345678901234567890 product a 2551
1 12345678901234567890 product b 2551
2 12345678901234567890 product c 3551
3 12345678901234567895 product a 2551
4 12345678901234567897 product b 2551
5 12345678901234567897 product c 2407

Expected Result

key product cfop_code unique_list unique_count not_unique_count
0 12345678901234567890 product a 2551 2251, 3551 2 3
1 12345678901234567890 product b 2551 2251, 3551 2 3
2 12345678901234567890 product c 3551 2251, 3551 2 3
3 12345678901234567895 product a 2551 2251 1 1
4 12345678901234567897 product b 2551 2407, 2551 2 2
5 12345678901234567897 product c 2407 2407, 2551 2 2

What i had tried

Create a list of unique values

df.groupby('key')["cfop"].unique()

key
12345678901234567890    [2551, 3551]
12345678901234567895          [2551]
12345678901234567897    [2551, 2407]
Name: cfop, dtype: object

Getting the count not unique values

df.groupby("key").agg(**{"unique_values": pd.NamedAgg(column='cfop', aggfunc="size")}).reset_index()

key unique_values
0   12345678901234567890    3
1   12345678901234567895    1
2   12345678901234567897    2

Getting the count unique values into data frame

df.groupby("key").agg(**{"unique_values": pd.NamedAgg(column='cfop', aggfunc="nunique")}).reset_index()

key unique_values
0   12345678901234567890    2
1   12345678901234567895    1
2   12345678901234567897    2

But FAIL adding a new column

df['unique_list'] = df.groupby('key')["cfop"].unique()
df['unique_count'] = df.groupby("key").agg(**{"unique_values": pd.NamedAgg(column='cfop', aggfunc="nunique")}).reset_index()
df['not_unique_count'] =df.groupby("key").agg(**{"unique_values": pd.NamedAgg(column='cfop', aggfunc="size")}).reset_index()

>Solution :

Try:

tmp = (
    df.groupby("key")["cfop_code"]
    .agg(
        unique_list = lambda s: sorted(s.unique()), 
        unique_count = "nunique", 
        not_unique_count = "size")
    .reset_index()
)
res = df.merge(tmp, on="key")

print(res)
                    key    product  cfop_code   unique_list  unique_count  not_unique_count
0  12345678901234567890  product a       2551  [2551, 3551]             2                 3
1  12345678901234567890  product b       2551  [2551, 3551]             2                 3
2  12345678901234567890  product c       3551  [2551, 3551]             2                 3
3  12345678901234567895  product a       2551        [2551]             1                 1
4  12345678901234567897  product b       2551  [2407, 2551]             2                 2
5  12345678901234567897  product c       2407  [2407, 2551]             2                 2

The problem with your attempt is that:

df.groupby("key").agg(**{"unique_values": pd.NamedAgg(column='cfop_code', aggfunc="nunique")}).reset_index()

returns a DataFrame.You try to assign this whole DataFrame to a new column which fails.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading