Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Sum of Two Columns should be Resulted in one of two columns based on the result in Pandas

I have this data set

import pandas as pd

data = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 2, 3, 1],
    'Debit': [0, 5000, 0, 5000, 3000, 0, 2000, 1000],
    'Credit': [-100, 0, -700, 0, 0, -8000, 0, 0]
  })

names_index = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6, 7, 8],
    'names': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
    'state1': [1, 2, 3, 4, 5, 6, 7, 8],
    'state2': [9, 10, 11, 12, 13, 14, 15, 16],
 })

I want to sum according to the ID from both of the columns altogether and want to show the result value into the relevant column based on the result value

what i am trying is…

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

balance = data.groupby('ID')[['Debit', 'Credit']].sum()
result = balance.merge(names_index, on='ID', how='inner')
result = result[['ID', 'names', 'Debit', 'Credit', 'state1', 'state2']]
result['state'] = result.pop('state1').where(result['Debit'].ne(0), result.pop('state2'))
print(result)

result:

  ID names  Debit  Credit  state
0  1     A   1000    -100      1
1  2     B   5000   -8000      2
2  3     C   2000    -700      3
3  4     D   5000       0      4
4  5     E   3000       0      5

The Result I want is:

  ID names  Debit  Credit state
0  1     A   900        0     1
1  2     B   0      -3000    10
2  3     C   1300       0     3
3  4     D   5000       0     4
4  5     E   3000       0     5

tried the pandas melt function but it resulted me in one column like so

balance = data.melt('ID').groupby('ID').value.sum().reset_index()

resulted:

   ID  value
0   1    900
1   2  -3000
2   3   1300
3   4   5000
4   5   3000

Please Help to sort it out..

>Solution :

If I understand correctly, aggregate as sum after your groupby, craft a DataFrame and merge, then perform indexing lookup:

s = data.groupby('ID')[['Debit', 'Credit']].sum().sum(axis=1)
m = s>0

tmp = pd.DataFrame({'Debit': s.where(m, 0),
                    'Credit': s.mask(m, 0),
                    'state': np.where(m, 'state1', 'state2'),
                   }).reset_index()

# merge with names
out = names_index.merge(tmp, on='ID')

# replace the state1/state2 by the actual value after merge
idx, cols = pd.factorize(out['state'])
out['state'] = out.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]

# drop the useless columns
out.drop(columns=['state1', 'state2'], inplace=True)
# or, if all columns are used at least once in "state"
# out.drop(columns=cols, inplace=True)

Output:

   ID names  Debit  Credit  state
0   1     A    900       0      1
1   2     B      0   -3000     10
2   3     C   1300       0      3
3   4     D   5000       0      4
4   5     E   3000       0      5
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading