Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

I am writing python code and stuck in one condition

Below is my data frame:

df = pd.DataFrame({
    'P_ID': [348374, 348374, 348374, 1348374, 1348374, 1348374, 1348374],
    'I_ISSUE_TYPE_CODE': ['CLAIM', 'RSA', 'RSA', 'EPQR', 'CLAIM', 'EPQR', 'EPQR'],
    'A_b': [234, 5671, 8900, 3451, 660, 770, 660]
})

I want to check in column I_ISSUE_TYPE_CODE if it is RSA with respect to P_ID column then put 1 in new_column else put zero. if there is duplicate RSA in same P_ID then the second one will be zero.
if there is no RSA in I_ISSUE_TYPE_CODE with respect to P_ID then put 1 at its 1st occurrence.

Below is my expected output

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df = pd.DataFrame({
    'P_ID': [348374, 348374, 348374, 1348374, 1348374, 1348374, 1348374],
    'I_ISSUE_TYPE_CODE': ['CLAIM', 'RSA', 'RSA', 'EPQR', 'CLAIM', 'EPQR', 'EPQR'],
    'A_b': [234, 5671, 8900, 3451, 660, 770, 660],
    'New_column': [0, 1, 0, 1, 0, 0, 0]
})

Below code I have tried

df['New_column'] = df.groupby('P_ID')['I_ISSUE_TYPE_CODE'].apply(lambda x: (x == 'RSA').cumsum().eq(1).astype(int))

Still not getting the correct output

>Solution :

Use boolean arithmetic with help of duplicated and groupby.transform('any'), then convert the output to integer with astype:

# is the value the first RSA per group?
m1 = (df['I_ISSUE_TYPE_CODE'].eq('RSA')
     & ~df[['P_ID', 'I_ISSUE_TYPE_CODE']].duplicated()
     )

# if there is no RSA in the group, identify the first index
m2 = ~df['P_ID'].duplicated() & ~m1.groupby(df['P_ID']).transform('any')

df['New_column'] = (m1|m2).astype(int)

Output:

      P_ID I_ISSUE_TYPE_CODE   A_b  New_column
0   348374             CLAIM   234           0
1   348374               RSA  5671           1
2   348374               RSA  8900           0
3  1348374              EPQR  3451           1
4  1348374             CLAIM   660           0
5  1348374              EPQR   770           0
6  1348374              EPQR   660           0

Intermediates:

      P_ID I_ISSUE_TYPE_CODE   A_b  New_column     m1     m2
0   348374             CLAIM   234           0  False  False
1   348374               RSA  5671           1   True  False
2   348374               RSA  8900           0  False  False
3  1348374              EPQR  3451           1  False   True
4  1348374             CLAIM   660           0  False  False
5  1348374              EPQR   770           0  False  False
6  1348374              EPQR   660           0  False  False

Alternative:

Same logic with a custom function for groupby.apply:

def flag(g):
    s = g['I_ISSUE_TYPE_CODE'].eq('RSA') & ~g['I_ISSUE_TYPE_CODE'].duplicated()
    if s.any():
        return s
    return ~g['P_ID'].duplicated()

df['New_column'] = df.groupby('P_ID', group_keys=False).apply(flag).astype(int)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading