Advertisements
I have this data set
import pandas as pd
data = pd.DataFrame({
'ID': [1, 2, 3, 4, 5, 2, 3, 1],
'Debit': [0, 5000, 0, 5000, 3000, 0, 2000, 1000],
'Credit': [-100, 0, -700, 0, 0, -8000, 0, 0]
})
names_index = pd.DataFrame({
'ID': [1, 2, 3, 4, 5, 6, 7, 8],
'names': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
'state1': [1, 2, 3, 4, 5, 6, 7, 8],
'state2': [9, 10, 11, 12, 13, 14, 15, 16],
})
I want to sum according to the ID from both of the columns altogether and want to show the result value into the relevant column based on the result value
what i am trying is…
balance = data.groupby('ID')[['Debit', 'Credit']].sum()
result = balance.merge(names_index, on='ID', how='inner')
result = result[['ID', 'names', 'Debit', 'Credit', 'state1', 'state2']]
result['state'] = result.pop('state1').where(result['Debit'].ne(0), result.pop('state2'))
print(result)
result:
ID names Debit Credit state
0 1 A 1000 -100 1
1 2 B 5000 -8000 2
2 3 C 2000 -700 3
3 4 D 5000 0 4
4 5 E 3000 0 5
The Result I want is:
ID names Debit Credit state
0 1 A 900 0 1
1 2 B 0 -3000 10
2 3 C 1300 0 3
3 4 D 5000 0 4
4 5 E 3000 0 5
tried the pandas melt function but it resulted me in one column like so
balance = data.melt('ID').groupby('ID').value.sum().reset_index()
resulted:
ID value
0 1 900
1 2 -3000
2 3 1300
3 4 5000
4 5 3000
Please Help to sort it out..
>Solution :
If I understand correctly, aggregate as sum
after your groupby, craft a DataFrame and merge
, then perform indexing lookup:
s = data.groupby('ID')[['Debit', 'Credit']].sum().sum(axis=1)
m = s>0
tmp = pd.DataFrame({'Debit': s.where(m, 0),
'Credit': s.mask(m, 0),
'state': np.where(m, 'state1', 'state2'),
}).reset_index()
# merge with names
out = names_index.merge(tmp, on='ID')
# replace the state1/state2 by the actual value after merge
idx, cols = pd.factorize(out['state'])
out['state'] = out.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]
# drop the useless columns
out.drop(columns=['state1', 'state2'], inplace=True)
# or, if all columns are used at least once in "state"
# out.drop(columns=cols, inplace=True)
Output:
ID names Debit Credit state
0 1 A 900 0 1
1 2 B 0 -3000 10
2 3 C 1300 0 3
3 4 D 5000 0 4
4 5 E 3000 0 5