Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Matching Columns in a dataframe and factoring in a List

I have the following dataframe where I am trying to match Account Codes. Hypothetically columns Account_Spread_v2 and Account_Codes_v2 were merged into the dataframe. And the idea is to match column Account_Codes_v2 against Account_Codes. See function below to apply this.

df = pd.DataFrame([[31,1234567890,'USD',3.5,'D12',3.5,'D3'],
                    [10,7854567890,'USD',2.7,'TT',2.7,'TT'],
                    [10,7854567899,'AUS',8,'D1',8,'D1'],
                    [6,7854567893,'USD',2.7,'D55',2.7,'H1'],
                    [10,7854567893,'EUR',2.7,'JG',2.7,'JG'],      
                    [31,9632587415,'USD',1.4,'D55',1.4,'D2']],
columns = ['branch','Account','Cur','Account_Spread','Account_Codes','Account_Spread_v2','Account_Codes_v2'])

Output:

branch  Account     Cur Account_Spread  Account_Codes   Account_Spread_v2   Account_Codes_v2
0   31  1234567890  USD 3.5             D12             3.5                 D3
1   10  7854567890  USD 2.7             TT              2.7                 TT
2   10  7854567899  AUS 8.0             D1              8.0                 D1
3   6   7854567893  USD 2.7             D55             2.7                 H1
4   10  7854567893  EUR 2.7             JG              2.7                 JG
5   31  9632587415  USD 1.4             D55             1.4                 D2

Fucntion:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

def compute_match_codes(row):
    codes = ['D1','D2','D4','D3']
    m = 'NA'
    if row['Account_Codes'] == row['Account_Codes_v2']:
        m = 'MatchOnCodes'
    else:
        m = 'MismatchOnCodes'
    return(m)
df = (pd.concat([df,(df.apply(compute_match_codes, axis=1, result_type='expand')),], axis=1))

branch  Account     Cur Account_Spread  Account_Codes   Account_Spread_v2   Account_Codes_v2 0
0   31  1234567890  USD 3.5             D12             3.5                 D3         MismatchOnCodes
1   10  7854567890  USD 2.7             TT              2.7                 TT           MatchOnCodes
2   10  7854567899  AUS 8.0             D1              8.0                 D1         MatchOnCodes
3   6   7854567893  USD 2.7             D55             2.7                 H1         MismatchOnCodes
4   10  7854567893  EUR 2.7             JG              2.7                 JG        MatchOnCodes
5   31  9632587415  USD 1.4             D55             1.4                 D2     MismatchOnCodes

The challenge I have is that if an account is USD, on branch 31 and its Account Code is either D12 and D55 in column "Account_Codes" it can substitute for any of the codes in the list called "codes".

By applying this row 0 and 5 will actual match then. I tried using the isin() method but isn’t working out. Any idea on how to up to edit the function to accommodate this?

>Solution :

I would use a nested np.where() to eliminate all exact matches first and then ddress the more complex logic you need. I believe this would also be a faster solution as its vectorized than using apply with concat and a custom function. The code would look like this:

codes = ['D1','D2','D3','D4']
df['Match'] = np.where(df['Account_Codes'] == df['Account_Codes_v2'],'MatchOnCodes',
                       np.where((df['Cur'] == 'USD') & (df['branch'] == 31) & (df['Account_Codes'].isin(['D12','D55'])) & (df['Account_Codes_v2'].isin(codes)),'MatchOnCodes','NoMatchOnCodes'))

This outputs:

   branch     Account  Cur  ...  Account_Spread_v2 Account_Codes_v2           Match
0      31  1234567890  USD  ...                3.5               D3    MatchOnCodes
1      10  7854567890  USD  ...                2.7               TT    MatchOnCodes
2      10  7854567899  AUS  ...                8.0               D1    MatchOnCodes
3       6  7854567893  USD  ...                2.7               H1  NoMatchOnCodes
4      10  7854567893  EUR  ...                2.7               JG    MatchOnCodes
5      31  9632587415  USD  ...                1.4               D2    MatchOnCodes

Per OPs comment:

codes = ['D1','D2','D3','D4']
def matching_func(row):
  if row['Account_Codes'] == row['Account_Codes_v2']:
    return 'MatchOnCodes'
  elif (row['Cur'] == 'USD') & (row['branch'] == 31) & (row['Account_Codes'] in ['D12','D55']) & (row['Account_Codes_v2'] in codes):
    return 'MatchOnCodes'
  else:
    return 'NoMatchOnCodes'
df['Match'] = df.apply(lambda x: matching_func(x),axis=1)

Output:

   branch     Account  Cur  ...  Account_Spread_v2 Account_Codes_v2           Match
0      31  1234567890  USD  ...                3.5               D3    MatchOnCodes
1      10  7854567890  USD  ...                2.7               TT    MatchOnCodes
2      10  7854567899  AUS  ...                8.0               D1    MatchOnCodes
3       6  7854567893  USD  ...                2.7               H1  NoMatchOnCodes
4      10  7854567893  EUR  ...                2.7               JG    MatchOnCodes
5      31  9632587415  USD  ...                1.4               D2    MatchOnCodes
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading