Sum of Two Columns should be Resulted in one of two columns based on the result in Pandas

Advertisements

I have this data set

import pandas as pd

data = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 2, 3, 1],
    'Debit': [0, 5000, 0, 5000, 3000, 0, 2000, 1000],
    'Credit': [-100, 0, -700, 0, 0, -8000, 0, 0]
  })

names_index = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6, 7, 8],
    'names': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
    'state1': [1, 2, 3, 4, 5, 6, 7, 8],
    'state2': [9, 10, 11, 12, 13, 14, 15, 16],
 })

I want to sum according to the ID from both of the columns altogether and want to show the result value into the relevant column based on the result value

what i am trying is…

balance = data.groupby('ID')[['Debit', 'Credit']].sum()
result = balance.merge(names_index, on='ID', how='inner')
result = result[['ID', 'names', 'Debit', 'Credit', 'state1', 'state2']]
result['state'] = result.pop('state1').where(result['Debit'].ne(0), result.pop('state2'))
print(result)

result:

  ID names  Debit  Credit  state
0  1     A   1000    -100      1
1  2     B   5000   -8000      2
2  3     C   2000    -700      3
3  4     D   5000       0      4
4  5     E   3000       0      5

The Result I want is:

  ID names  Debit  Credit state
0  1     A   900        0     1
1  2     B   0      -3000    10
2  3     C   1300       0     3
3  4     D   5000       0     4
4  5     E   3000       0     5

tried the pandas melt function but it resulted me in one column like so

balance = data.melt('ID').groupby('ID').value.sum().reset_index()

resulted:

   ID  value
0   1    900
1   2  -3000
2   3   1300
3   4   5000
4   5   3000

Please Help to sort it out..

>Solution :

If I understand correctly, aggregate as sum after your groupby, craft a DataFrame and merge, then perform indexing lookup:

s = data.groupby('ID')[['Debit', 'Credit']].sum().sum(axis=1)
m = s>0

tmp = pd.DataFrame({'Debit': s.where(m, 0),
                    'Credit': s.mask(m, 0),
                    'state': np.where(m, 'state1', 'state2'),
                   }).reset_index()

# merge with names
out = names_index.merge(tmp, on='ID')

# replace the state1/state2 by the actual value after merge
idx, cols = pd.factorize(out['state'])
out['state'] = out.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]

# drop the useless columns
out.drop(columns=['state1', 'state2'], inplace=True)
# or, if all columns are used at least once in "state"
# out.drop(columns=cols, inplace=True)

Output:

   ID names  Debit  Credit  state
0   1     A    900       0      1
1   2     B      0   -3000     10
2   3     C   1300       0      3
3   4     D   5000       0      4
4   5     E   3000       0      5

Leave a ReplyCancel reply