I'm using Python 3.9, and I'm trying to calculate the values of a new DataFrame column based on which range the values in another column fall into.
For instance, in df['a'], I have integers between 0 and 50, in no particular order.
I am trying to create a new column, df['output_column'], in that same DataFrame using an if statement:
import pandas as pd
import numpy as np

p = 'a'
if df[p] in range(0, 7):
    df['output_column'] = 95
elif df[p] in range(8, 14):
    df['output_column'] = 90
elif df[p] in range(15, 21):
    df['output_column'] = 85
elif df[p] in range(22, 28):
    df['output_column'] = 80
else:
    df['output_column'] = 75
However, I get the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [18], in <module>
1 p = 'a'
----> 3 if df[p] in range(0, 7):
4 df['output_column'] = 95
5 elif df[p] in range(8, 14):
File ~\path_to_pandas\pandas\core\generic.py:1535, in NDFrame.__nonzero__(self)
1533 @final
1534 def __nonzero__(self):
-> 1535 raise ValueError(
1536 f"The truth value of a {type(self).__name__} is ambiguous. "
1537 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
1538 )
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
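To see where the error comes from, here is a minimal sketch using a small made-up Series s (not from the post). An if statement calls bool() on its condition, and a multi-element Series has no single truth value. Series.isin() is not used in the original post; it is shown only as one way to get an element-wise membership test (one True/False per row) instead of a single boolean.

```python
import pandas as pd

# Made-up sample data for illustration only.
s = pd.Series([3, 10, 45])

# bool() on a multi-element Series raises the same ValueError as in the traceback.
error_message = ""
try:
    bool(s == 3)
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# Element-wise membership test: one boolean per row instead of one overall.
mask = s.isin(range(0, 7))
print(mask.tolist())  # [True, False, False]
```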
How can I correct this?
Solution:
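An if statement needs a single True/False, but df[p] is a whole column (a Series) with one value per row, so pandas refuses to reduce the comparison to one boolean and raises this error. Instead, build one boolean condition per range with .between() (inclusive on both ends by default) and populate the new output_column with np.select(). Note also that range(0, 7) stops at 6 and range(8, 14) starts at 8, so in the original code the values 7, 14 and 21 would not fall into any of the ranges; the bounds below are contiguous.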
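import pandas as pd
import numpy as np

# One boolean Series per range; between() includes both endpoints by default
conditions = [df['a'].between(0, 6),
              df['a'].between(7, 13),
              df['a'].between(14, 20),
              df['a'].between(21, 27)]
values = [95, 90, 85, 80]

# For each row, np.select takes the value of the first True condition;
# rows matching none of them (the "else" case) get the default
df['output_column'] = np.select(conditions, values, default=75)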
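Below is a self-contained sketch of the same np.select approach with a hypothetical sample DataFrame (the asker's real data is not shown), plus pd.cut as an alternative that the original answer does not use. It assumes the values are non-missing integers of at least 0. Note that without default=..., np.select fills rows that match no condition with 0.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the asker's df (integers 0-50).
df = pd.DataFrame({'a': [0, 6, 7, 13, 14, 20, 21, 27, 28, 50]})

# Option 1: np.select with inclusive between() conditions and a default.
conditions = [
    df['a'].between(0, 6),
    df['a'].between(7, 13),
    df['a'].between(14, 20),
    df['a'].between(21, 27),
]
values = [95, 90, 85, 80]
df['output_column'] = np.select(conditions, values, default=75)

# Option 2: pd.cut with bin edges. With right=True (the default) each bin is
# (left, right], so these edges give 0-6, 7-13, 14-20, 21-27 and 28 and up.
bins = [-1, 6, 13, 20, 27, np.inf]
labels = [95, 90, 85, 80, 75]
# Values outside the bins would become NaN and make astype(int) fail.
df['output_cut'] = pd.cut(df['a'], bins=bins, labels=labels).astype(int)

print(df)
```

pd.cut keeps all the boundaries in one list, which can be easier to maintain when there are many ranges; np.select is more flexible when the conditions are not simple numeric intervals.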