Scripting a simple counter

June 2, 2022

I wanted to create a simple script, which counts values in one column, that are higher in another column:

d = {'a': [1, 3], 'b': [0, 2]}
df = pd.DataFrame(data=d, index=[1, 2])

print(df)

   a  b
1  1  0
2  3  2

My function:


def diff(dataframe):
    a_counter=0
    b_counter=0
    
    for i in dataframe["a"]:
        for ii in dataframe["b"]:
            if i>ii:
                a_counter+=1
            elif ii>i:
                b_counter+=1

    return a_counter, b_counter

However

diff(df)

returns (3, 1), instead of (2,0). I know the problem is that every single value of one column gets compared to every value of the other column (e.g. 1 gets compared to 0 and 2 of column b). There probably is a special function for my problem, but can you help me fix my script?

>Solution :

I would suggest adding some helper columns in an intuitive way to help compute the sum of each condition a > b and b > a

A working example based on your code :

import numpy as np
import pandas as pd

d = {'a': [1, 3], 'b': [0, 2]}
df = pd.DataFrame(data=d, index=[1, 2])

def diff(dataframe):
    dataframe['a>b'] = np.where(dataframe['a']>dataframe['b'], 1, 0)
    dataframe['b>a'] = np.where(dataframe['b']>dataframe['a'], 1, 0)
    return dataframe['a>b'].sum(), dataframe['b>a'].sum()

print(diff(df))

>>> (2, 0)

Basically what np.where() does, the way I used it, is that it produces 1 if the condition is met and 0 otherwise. You can then add those columns up using a simple sum() function applied on the desired columns.