Home Sum of Two Columns should be Resulted in one of two columns based on the result in Pandas

Questions

Sum of Two Columns should be Resulted in one of two columns based on the result in Pandas

January 24, 2024

I have this data set

import pandas as pd

data = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 2, 3, 1],
    'Debit': [0, 5000, 0, 5000, 3000, 0, 2000, 1000],
    'Credit': [-100, 0, -700, 0, 0, -8000, 0, 0]
  })

names_index = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6, 7, 8],
    'names': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
    'state1': [1, 2, 3, 4, 5, 6, 7, 8],
    'state2': [9, 10, 11, 12, 13, 14, 15, 16],
 })

I want to sum according to the ID from both of the columns altogether and want to show the result value into the relevant column based on the result value

what i am trying is…

balance = data.groupby('ID')[['Debit', 'Credit']].sum()
result = balance.merge(names_index, on='ID', how='inner')
result = result[['ID', 'names', 'Debit', 'Credit', 'state1', 'state2']]
result['state'] = result.pop('state1').where(result['Debit'].ne(0), result.pop('state2'))
print(result)

result:

  ID names  Debit  Credit  state
0  1     A   1000    -100      1
1  2     B   5000   -8000      2
2  3     C   2000    -700      3
3  4     D   5000       0      4
4  5     E   3000       0      5

The Result I want is:

  ID names  Debit  Credit state
0  1     A   900        0     1
1  2     B   0      -3000    10
2  3     C   1300       0     3
3  4     D   5000       0     4
4  5     E   3000       0     5

tried the pandas melt function but it resulted me in one column like so

balance = data.melt('ID').groupby('ID').value.sum().reset_index()

resulted:

   ID  value
0   1    900
1   2  -3000
2   3   1300
3   4   5000
4   5   3000

Please Help to sort it out..

>Solution :

If I understand correctly, aggregate as sum after your groupby, craft a DataFrame and merge, then perform indexing lookup:

s = data.groupby('ID')[['Debit', 'Credit']].sum().sum(axis=1)
m = s>0

tmp = pd.DataFrame({'Debit': s.where(m, 0),
                    'Credit': s.mask(m, 0),
                    'state': np.where(m, 'state1', 'state2'),
                   }).reset_index()

# merge with names
out = names_index.merge(tmp, on='ID')

# replace the state1/state2 by the actual value after merge
idx, cols = pd.factorize(out['state'])
out['state'] = out.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]

# drop the useless columns
out.drop(columns=['state1', 'state2'], inplace=True)
# or, if all columns are used at least once in "state"
# out.drop(columns=cols, inplace=True)

Output:

   ID names  Debit  Credit  state
0   1     A    900       0      1
1   2     B      0   -3000     10
2   3     C   1300       0      3
3   4     D   5000       0      4
4   5     E   3000       0      5