add a column of bins by sum of a number

This is my dataframe:

df = pd.DataFrame({'a': range(100, 111)})

I want to add a column to this dataframe. My desired output looks like this:

    a  b
0  100  NaN
1  101  NaN
2  102  NaN
3  103  1
4  104  1
5  105  1
6  106  2
7  107  2
8  108  2
9  109  3
10 110  3

I have a step value, which in this case is 3. I want 1 in column b where a is from 103 up to but not including 106, 2 where a is from 106 up to but not including 109, and so on. The inclusiveness should match the example.
I have tried a couple of solutions. One of them was pd.cut, but I couldn't figure out how to make it work. This was one of my attempts:

df['b'] = pd.cut(df.a, [100, 103, 106, 109], include_lowest=True)

But since I don't know how many bins my other samples will have, I can't hard-code the bin edges like this.

>Solution :

One option that avoids cut and uses simple arithmetic (floor division) instead:

N = 3
start = df['a'].min()+N

s = df['a'].sub(start).floordiv(N).add(1)
df['b'] = s.where(s.gt(0))

# or in one line
df['b'] = df['a'].sub(start).floordiv(N).add(1).where(df['a'].ge(start))
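
If integer labels are preferred over floats, the same floor-division idea can be wrapped in a small function. This is only a sketch: the helper name `add_bins` and its parameters are hypothetical, and it assumes the first bin starts `n` above the column minimum, as in the example.

```python
import pandas as pd

def add_bins(df, n, col="a", out="b"):
    """Hypothetical helper: label rows in steps of n, starting n above the minimum."""
    start = df[col].min() + n
    # Offset from the first bin edge, floor-divided by the step, 1-based
    s = df[col].sub(start).floordiv(n).add(1)
    result = df.copy()
    # where() blanks rows below start; the nullable Int64 dtype keeps integers
    result[out] = s.where(df[col].ge(start)).astype("Int64")
    return result

df = pd.DataFrame({"a": range(100, 111)})
out = add_bins(df, 3)
print(out)
```

Rows below the first edge then show as `<NA>` rather than `NaN`, and the labels stay 1, 2, 3 instead of 1.0, 2.0, 3.0.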

With cut:

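import numpy as np

N = 3

start = df['a'].min()+N
end = df['a'].max()

# end+N+1 keeps the last edge above the maximum even when the maximum
# falls exactly on a bin edge (integer data assumed)
bins = np.arange(start, end+N+1, N)
df['b'] = pd.cut(df['a'], bins,
                 labels=range(1, len(bins)), right=False)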

Output (floor-division version; cut gives the same labels as a categorical column):

      a    b
0   100  NaN
1   101  NaN
2   102  NaN
3   103  1.0
4   104  1.0
5   105  1.0
6   106  2.0
7   107  2.0
8   108  2.0
9   109  3.0
10  110  3.0
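
As a rough cross-check, the sketch below compares the two approaches on a frame whose maximum (109) sits exactly on a bin edge. The names `by_div`, `by_cut` and `edges` are illustrative, and the extra `+ 1` in the edge calculation is an assumption that only holds for integer data.

```python
import numpy as np
import pandas as pd

N = 3
df = pd.DataFrame({"a": range(100, 110)})  # max is 109, exactly on a bin edge
start = df["a"].min() + N
end = df["a"].max()

# Floor-division labels: float values, NaN below start
by_div = df["a"].sub(start).floordiv(N).add(1).where(df["a"].ge(start))

# cut labels; the + 1 (assumption, integer data) pushes the last edge
# strictly above the maximum so the final value still falls in a bin
edges = np.arange(start, end + N + 1, N)
by_cut = pd.cut(df["a"], edges, labels=list(range(1, len(edges))), right=False)

# cut returns a categorical, so convert to float before comparing
same = by_div.equals(by_cut.astype(float))
print(same)
```

Deriving the labels from `len(edges)` keeps the number of labels and the number of bins in step, whatever the range of the data.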
