I am writing python code and stuck in one condition

June 6, 2023

Below is my data frame:

df = pd.DataFrame({
    'P_ID': [348374, 348374, 348374, 1348374, 1348374, 1348374, 1348374],
    'I_ISSUE_TYPE_CODE': ['CLAIM', 'RSA', 'RSA', 'EPQR', 'CLAIM', 'EPQR', 'EPQR'],
    'A_b': [234, 5671, 8900, 3451, 660, 770, 660]
})

I want to check in column I_ISSUE_TYPE_CODE if it is RSA with respect to P_ID column then put 1 in new_column else put zero. if there is duplicate RSA in same P_ID then the second one will be zero.
if there is no RSA in I_ISSUE_TYPE_CODE with respect to P_ID then put 1 at its 1st occurrence.

Below is my expected output

df = pd.DataFrame({
    'P_ID': [348374, 348374, 348374, 1348374, 1348374, 1348374, 1348374],
    'I_ISSUE_TYPE_CODE': ['CLAIM', 'RSA', 'RSA', 'EPQR', 'CLAIM', 'EPQR', 'EPQR'],
    'A_b': [234, 5671, 8900, 3451, 660, 770, 660],
    'New_column': [0, 1, 0, 1, 0, 0, 0]
})

Below code I have tried

df['New_column'] = df.groupby('P_ID')['I_ISSUE_TYPE_CODE'].apply(lambda x: (x == 'RSA').cumsum().eq(1).astype(int))

Still not getting the correct output

>Solution :

Use boolean arithmetic with help of duplicated and groupby.transform('any'), then convert the output to integer with astype:

# is the value the first RSA per group?
m1 = (df['I_ISSUE_TYPE_CODE'].eq('RSA')
     & ~df[['P_ID', 'I_ISSUE_TYPE_CODE']].duplicated()
     )

# if there is no RSA in the group, identify the first index
m2 = ~df['P_ID'].duplicated() & ~m1.groupby(df['P_ID']).transform('any')

df['New_column'] = (m1|m2).astype(int)

Output:

      P_ID I_ISSUE_TYPE_CODE   A_b  New_column
0   348374             CLAIM   234           0
1   348374               RSA  5671           1
2   348374               RSA  8900           0
3  1348374              EPQR  3451           1
4  1348374             CLAIM   660           0
5  1348374              EPQR   770           0
6  1348374              EPQR   660           0

Intermediates:

      P_ID I_ISSUE_TYPE_CODE   A_b  New_column     m1     m2
0   348374             CLAIM   234           0  False  False
1   348374               RSA  5671           1   True  False
2   348374               RSA  8900           0  False  False
3  1348374              EPQR  3451           1  False   True
4  1348374             CLAIM   660           0  False  False
5  1348374              EPQR   770           0  False  False
6  1348374              EPQR   660           0  False  False

Alternative:

Same logic with a custom function for groupby.apply:

def flag(g):
    s = g['I_ISSUE_TYPE_CODE'].eq('RSA') & ~g['I_ISSUE_TYPE_CODE'].duplicated()
    if s.any():
        return s
    return ~g['P_ID'].duplicated()

df['New_column'] = df.groupby('P_ID', group_keys=False).apply(flag).astype(int)