Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Boolean column in pandas with a windowed function that checking a complex score condition

I have a frame that looks like this:

x        y         score    
Cat      Dog       0.1    
Dog      Monkey    0.5     
Fish     Cat       0.2     
Cat      Fish      0.2   
Monkey   Dog       0.4    
Dog      Cat       0.7    

Basically I want a new boolean column that looks at all pairs and their reverse, for example (Cat, Dog) and (Dog,Cat) and assign a True for when the score of a pair is greater than the score of the reversed pair, otherwise False. If there is equality, both pairs are assigned True. That is:

x        y         score    bool
Cat      Dog       0.1      False   # False because Dog,Cat > Cat,Dog
Dog      Monkey    0.5      True    # True because Dog,Monkey > Monkey, Dog
Fish     Cat       0.2      True    # True because both have the same score
Cat      Fish      0.2      True    # ...
Monkey   Dog       0.4      False
Dog      Cat       0.7      True

I am certain I can get a function done by applying on the rows a filter for the reverse and then return a checks on the score; however, the list can be quite long (n>100k) and time is a factor. It turns out this is trickier than I thought. I was wondering if this can be done in a more pythonic way with a magic Pandas function that haven’t encoutered yet or a rolling window.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Note:

  1. Every pair (x,y) is unique.
  2. For every pair (x,y) there is exactly
    one reverse (y,x).
  3. There can be multiple occurences of the same category
    within the same column, e.g. Cat appears twice in this frame for x.
  4. Scores range from 0-1.

Frame:

df = pd.DataFrame.from_records(zip(["Cat","Dog","Fish","Cat","Monkey","Dog"], ["Dog","Monkey","Cat","Fish","Dog","Cat"], [0.1,0.5,0.2,0.2,0.4,0.7]),columns=["x","y","score"])

>Solution :

You can swap columns x & y, merge on x/y pair, and compare score vs reversed score:

(df.merge(
     df.assign(x = df.y, y = df.x)
       .rename(columns={'score': 'rev_score'})
 ).assign(bool = lambda x: x.score >= x.rev_score)
 .drop('rev_score', axis=1))

        x       y  score   bool
0     Cat     Dog    0.1  False
1     Dog  Monkey    0.5   True
2    Fish     Cat    0.2   True
3     Cat    Fish    0.2   True
4  Monkey     Dog    0.4  False
5     Dog     Cat    0.7   True
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading