I am stuck writing the Python code for the problem below

June 4, 2022

I have the data frame below.

import pandas as pd

df = pd.DataFrame({'vin':['aaa','bbb','bbb','bbb','ccc','ccc','ddd','eee','eee','fff'],'module':['NORMAL','1ST_PRIORITY','2ND_PRIORITY','HELLO','3RD_PRIORITY','2ND_PRIORITY','2ND_PRIORITY','3RD_PRIORITY','HELLO','ABS']})

I want to add a Result column. If a vin value is unique, Result should be 'YES'. If a vin value is not unique, the module column should be checked and 'YES' returned only for the row whose module has the highest priority; the other rows for that vin should get 'NO'.

I want output like the data frame below.

df = pd.DataFrame({'vin':['aaa','bbb','bbb','bbb','ccc','ccc','ddd','eee','eee','fff'],'module':['NORMAL','1ST_PRIORITY','2ND_PRIORITY','HELLO','3RD_PRIORITY','2ND_PRIORITY','2ND_PRIORITY','3RD_PRIORITY','HELLO','ABS'],
               'Result':['YES','YES','NO','NO','NO','YES','YES','YES','NO','YES']})

>Solution :

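IIUC, you can use duplicated after sort_values. Sorting by module puts the highest-priority row of each vin first (the labels 1ST_, 2ND_, 3RD_ sort ahead of the other module names because digits sort before letters), so duplicated('vin') flags every row except that first one, and the result is aligned back to df by index: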

df['Result'] = ((~df.sort_values('module').duplicated('vin'))
                    .replace({True: 'YES', False: 'NO'}))
print(df)

# Output
   vin        module Result
0  aaa        NORMAL    YES
1  bbb  1ST_PRIORITY    YES
2  bbb  2ND_PRIORITY     NO
3  bbb         HELLO     NO
4  ccc  3RD_PRIORITY     NO
5  ccc  2ND_PRIORITY    YES
6  ddd  2ND_PRIORITY    YES
7  eee  3RD_PRIORITY    YES
8  eee         HELLO     NO
9  fff           ABS    YES
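
The one-liner above depends on the module labels happening to sort alphabetically in priority order. If that does not hold for the real data, one possible variation is to rank the modules explicitly. This is only a sketch: the `priority` mapping is an assumption (the question never spells out the ranking), and any module missing from it is treated as lowest priority.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'vin': ['aaa', 'bbb', 'bbb', 'bbb', 'ccc', 'ccc', 'ddd', 'eee', 'eee', 'fff'],
    'module': ['NORMAL', '1ST_PRIORITY', '2ND_PRIORITY', 'HELLO', '3RD_PRIORITY',
               '2ND_PRIORITY', '2ND_PRIORITY', '3RD_PRIORITY', 'HELLO', 'ABS'],
})

# Assumed ranking (not stated in the question): lower number = higher priority.
priority = {'1ST_PRIORITY': 1, '2ND_PRIORITY': 2, '3RD_PRIORITY': 3}

# Modules missing from the mapping rank below every listed priority.
rank = df['module'].map(priority).fillna(len(priority) + 1)

# A stable sort keeps the original row order among ties, so the first row
# of each vin after sorting is its highest-priority row.
order = rank.sort_values(kind='stable').index
is_best = ~df.loc[order].duplicated('vin')

# Realign to the original row order before turning the flags into labels.
df['Result'] = np.where(is_best.reindex(df.index), 'YES', 'NO')
print(df)
```

With the sample data this gives the same Result column as the expected output in the question. If two rows of the same vin tie on priority, only the one that appears first in the original frame is marked 'YES'.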