How to extract group list element using pandas criteria

I have a pandas DataFrame like the one shown below:

ID,color
1, Yellow
1, Red
1, Green
2, Red
2, np.nan
3, Green
3, Red
3, Green
4, Yellow
4, Red
5, Green
5, np.nan
6, Red
7, Red

fd = pd.read_clipboard(sep=',')

As you can see in the input DataFrame, some IDs have multiple colors associated with them.

Whenever an ID has multiple colors, I would like to select only one color based on the criteria below:

['Green','Red','Yellow'] = Choose 'Green'
['Red', 'Yellow'] = Choose 'Yellow'
['Green', 'Yellow'] = Choose 'Green'

Basically, Green has first preference, Yellow second, and Red last.

So, if an ID has Green in any of its rows, choose Green regardless of its other colors.

If an ID has Yellow and Red but no Green, choose Yellow.

If all rows for an ID are NA, leave it as NA.

I tried the code below, but it only gives me the list of colors for each ID:

fd.groupby('ID',as_index=False)['color'].aggregate(lambda x: list(x))
fd[final_color] = [if i[0] =='Green' for i in fd[col]]

I expect my output to have one row per ID, holding the color selected by the rules above.

Solution:

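Sort the DataFrame on color, using a preference dictionary as the sort key, then drop duplicates on ID. Because drop_duplicates keeps the first row for each ID and rows with a missing color sort last, the highest-preference color survives.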

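# Lower number = higher preference
d = {'Green': 1, 'Yellow': 2, 'Red': 3}
# The key maps each color to its rank; the first row kept per ID is the best-ranked one
df.sort_values('color', key=lambda c: c.map(d)).drop_duplicates('ID')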
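
As a minimal, self-contained sketch of the sort-and-deduplicate approach above: the DataFrame is rebuilt by hand from the question's sample data, and a hypothetical ID 8 (not in the original data) is added with only a missing color, to check the "leave it as NA" rule. The names rank and result are illustrative, and the final sort on ID is an extra step added here only to make the output easier to read.

```python
import numpy as np
import pandas as pd

# Sample data from the question, plus a hypothetical ID 8 (an assumption,
# not in the original) whose only color is missing.
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 7, 8],
    "color": ["Yellow", "Red", "Green", "Red", np.nan, "Green", "Red", "Green",
              "Yellow", "Red", "Green", np.nan, "Red", "Red", np.nan],
})

# Lower rank means higher preference.
rank = {"Green": 1, "Yellow": 2, "Red": 3}

result = (
    df.sort_values("color", key=lambda c: c.map(rank))  # missing colors sort last
      .drop_duplicates("ID")                            # keep the first (best-ranked) row per ID
      .sort_values("ID")                                # optional: put rows back in ID order
      .reset_index(drop=True)
)
print(result)
```

Without the final sort on ID, the rows come back ordered by preference rank rather than by ID.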

Alternatively, convert the color column to an ordered categorical type first, then group by ID and aggregate with min to select the highest-preference color:

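# Category order encodes the preference: Green < Yellow < Red
df['color'] = pd.Categorical(df['color'], categories=['Green', 'Yellow', 'Red'], ordered=True)
# min picks the most preferred color present in each group
df.groupby('ID', as_index=False)['color'].agg('min')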

   ID   color
0   1   Green
1   2     Red
2   3   Green
3   4  Yellow
4   5   Green
5   6     Red
6   7     Red
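
The ordered-categorical approach can be sketched end to end in the same way. This is a hedged, self-contained version rather than the answer's exact code: the DataFrame is rebuilt by hand from the question's sample data, the names preference, ordered and result are illustrative, and assign is used instead of overwriting the column so the original df is left untouched.

```python
import numpy as np
import pandas as pd

# Sample data from the question, rebuilt by hand.
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 7],
    "color": ["Yellow", "Red", "Green", "Red", np.nan, "Green", "Red", "Green",
              "Yellow", "Red", "Green", np.nan, "Red", "Red"],
})

# Category order encodes the preference: Green < Yellow < Red.
preference = ["Green", "Yellow", "Red"]
ordered = df.assign(
    color=pd.Categorical(df["color"], categories=preference, ordered=True)
)

# min over an ordered categorical returns the most preferred color in each
# group; missing values are skipped when the group has at least one color.
result = ordered.groupby("ID", as_index=False)["color"].agg("min")
print(result)
```

The color column in the result keeps its categorical dtype; converting it back with astype(str) is one option if plain strings are needed downstream.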