I wanted to create a simple script, which counts values in one column, that are higher in another column:
d = {'a': [1, 3], 'b': [0, 2]}
df = pd.DataFrame(data=d, index=[1, 2])
print(df)
a b
1 1 0
2 3 2
My function:
def diff(dataframe):
a_counter=0
b_counter=0
for i in dataframe["a"]:
for ii in dataframe["b"]:
if i>ii:
a_counter+=1
elif ii>i:
b_counter+=1
return a_counter, b_counter
However
diff(df)
returns (3, 1), instead of (2,0). I know the problem is that every single value of one column gets compared to every value of the other column (e.g. 1 gets compared to 0 and 2 of column b). There probably is a special function for my problem, but can you help me fix my script?
>Solution :
I would suggest adding some helper columns in an intuitive way to help compute the sum of each condition a > b and b > a
A working example based on your code :
import numpy as np
import pandas as pd
d = {'a': [1, 3], 'b': [0, 2]}
df = pd.DataFrame(data=d, index=[1, 2])
def diff(dataframe):
dataframe['a>b'] = np.where(dataframe['a']>dataframe['b'], 1, 0)
dataframe['b>a'] = np.where(dataframe['b']>dataframe['a'], 1, 0)
return dataframe['a>b'].sum(), dataframe['b>a'].sum()
print(diff(df))
>>> (2, 0)
Basically what np.where() does, the way I used it, is that it produces 1 if the condition is met and 0 otherwise. You can then add those columns up using a simple sum() function applied on the desired columns.