Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Is there pandas aggregate function that combines features of 'any' and 'unique'?

I have a large dataset with similar data:

>>> df = pd.DataFrame(
...     {'A': ['one', 'two', 'two', 'one', 'one', 'three'],
...      'B': ['a', 'b', 'c', 'a', 'a', np.nan]})          
>>> df
       A    B
0    one    a
1    two    b
2    two    c
3    one    a
4    one    a
5  three  NaN

There are two aggregation functions ‘any’ and ‘unique’:

>>> df.groupby('A')['B'].any()
A
one       True
three    False
two       True
Name: B, dtype: bool

>>> df.groupby('A')['B'].unique()
A
one         [a]
three     [nan]
two      [b, c]
Name: B, dtype: object

but I want to get the folowing result (or something close to it):

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

A
one           a
three     False
two        True

I can do it with some complex code, but it is better for me to find appropriate function in python packages or the easiest way to solve problem. I’d be grateful if you could help me with that.

>Solution :

You can aggregate Series.nunique for first column and unique values with remove possible missing values for another columns:

df1 = df.groupby('A').agg(count=('B','nunique'), 
                          uniq_without_NaNs = ('B', lambda x: x.dropna().unique()))
print (df1)
       count uniq_without_NaNs
A                             
one        1               [a]
three      0                []
two        2            [b, c]

Then create mask if greater column count by 1 and replace values by uniq_without_NaNs if equal count with 1:

out = df1['count'].gt(1).mask(df1['count'].eq(1), df1['uniq_without_NaNs'].str[0])
print (out)
A
one          a
three    False
two       True
Name: count, dtype: object
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading