This is my dataframe:
import pandas as pd
df = pd.DataFrame({'a': range(100, 111)})
I want to add a column to this dataframe. My desired output looks like this:
      a    b
0   100  NaN
1   101  NaN
2   102  NaN
3   103    1
4   104    1
5   105    1
6   106    2
7   107    2
8   108    2
9   109    3
10  110    3
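I have a value, which in this case is 3, that sets the width of each group. I want 1 in column b if the value in a is at least 103 and less than 106, and 2 in b if the value is at least 106 and less than 109, and so on. Each range should include its lower bound and exclude its upper bound, as in the example.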
I have tried a couple of solutions. One of them was pd.cut, but I couldn’t figure out how to make it work. This was one of my attempts:
df['b'] = pd.cut(df.a, [100, 103, 106, 109], include_lowest=True)
But since I don’t know how many bins my other samples will have, I can’t hard-code the bin edges like this.
>Solution :
One option without using cut, just simple arithmetic (floor division):
N = 3
start = df['a'].min()+N
s = df['a'].sub(start).floordiv(N).add(1)
df['b'] = s.where(s.gt(0))
# or in one line
df['b'] = df['a'].sub(start).floordiv(N).add(1).where(df['a'].ge(start))
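Since the number of bins is not known in advance, the floor-division idea can be wrapped in a small helper. This is only a sketch: the name `label_by_width` and its parameters are hypothetical, not part of pandas, and it assumes the first `width` values above the minimum should stay NaN, as in the example.

```python
import pandas as pd

def label_by_width(values: pd.Series, width: int) -> pd.Series:
    """Number consecutive groups of `width`, starting at min + width.

    Hypothetical helper: rows below min + width are left as NaN.
    """
    start = values.min() + width
    # Distance from start, in whole multiples of width, shifted to begin at 1
    labels = values.sub(start).floordiv(width).add(1)
    # Keep labels only where the value has reached the start threshold
    return labels.where(values.ge(start))

df = pd.DataFrame({'a': range(100, 111)})
df['b'] = label_by_width(df['a'], 3)

# The same helper with another range and width; no bin count is needed
wider = label_by_width(pd.Series(range(100, 121)), 5)
```

The number of labels follows from the data, so nothing has to change when another sample covers a longer or shorter range.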
With cut:
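import numpy as np

N = 3
start = df['a'].min()+N
end = df['a'].max()
# end+N+1 keeps the last edge past the max, even when (end-start) is a multiple of N
df['b'] = pd.cut(df['a'], np.arange(start, end+N+1, N),
                 labels=range(1, (end-start)//N+2), right=False)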
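A possible variant, assuming plain numeric codes are acceptable instead of categorical labels: passing `labels=False` makes `pd.cut` return the zero-based bin index, so the label count does not have to be computed at all. The `edges` name is only for illustration, and the edges are extended to `end + N + 1` on the assumption that a maximum landing exactly on an edge should still fall into a bin.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': range(100, 111)})

N = 3
start = df['a'].min() + N
end = df['a'].max()

# Left-closed bins [start, start+N), [start+N, start+2N), ...
edges = np.arange(start, end + N + 1, N)

# labels=False yields the bin index (0, 1, 2, ...); values below start become NaN
df['b'] = pd.cut(df['a'], edges, labels=False, right=False).add(1)
```

With this form the column is numeric (float, because of the NaN rows) rather than categorical.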
Output:
      a    b
0   100  NaN
1   101  NaN
2   102  NaN
3   103  1.0
4   104  1.0
5   105  1.0
6   106  2.0
7   107  2.0
8   108  2.0
9   109  3.0
10  110  3.0
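The labels print as 1.0 rather than 1 because the NaN rows force a float column. If integer labels are preferred, one option (a sketch, assuming a pandas version with nullable integer dtypes) is to convert the result to `Int64`, which stores missing values as `<NA>`:

```python
import pandas as pd

df = pd.DataFrame({'a': range(100, 111)})

N = 3
start = df['a'].min() + N

# Same floor-division arithmetic as above, producing a float column with NaN
b = df['a'].sub(start).floordiv(N).add(1).where(df['a'].ge(start))

# Nullable integer dtype: whole-number floats become ints, NaN becomes <NA>
df['b'] = b.astype('Int64')
```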