Pandas: Calculating a value in a data frame column based on a range of values in another data frame column (python)

I’m using Python 3.9, and I’m trying to fill a dataframe column with an output value that depends on which range the value in another column falls into.

For instance, in df['a'], I have integers between 0 and 50, in no particular order.

I am trying to create another column named df['output_column'] in that same dataframe based on an if statement.

import pandas as pd
import numpy as np

p = 'a'

if df[p] in range(0, 7):
    df['output_column'] = 95
elif df[p] in range(8, 14):
    df['output_column'] = 90
elif df[p] in range(15, 21):
    df['output_column'] = 85
elif df[p] in range(22, 28):
    df['output_column'] = 80
else:
    df['output_column'] = 75

However, I get the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [18], in <module>
      1 p = 'a'
----> 3 if df[p] in range(0, 7):
      4     df['output_column'] = 95
      5 elif df[p] in range(8, 14):

File ~\path_to_pandas\pandas\core\generic.py:1535, in NDFrame.__nonzero__(self)
   1533 @final
   1534 def __nonzero__(self):
-> 1535     raise ValueError(
   1536         f"The truth value of a {type(self).__name__} is ambiguous. "
   1537         "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
   1538     )

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

How can I correct this?

Solution:

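The error occurs because an if statement needs a single True or False, while df[p] is a whole Series, so the test cannot be reduced to one truth value. Instead, you can build a boolean mask for each range with .between() (inclusive at both ends by default) and then populate your new output_column with np.select().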

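import pandas as pd
import numpy as np

# One boolean mask per range; between() includes both endpoints.
ranges = [df['a'].between(0, 6),
          df['a'].between(7, 13),
          df['a'].between(14, 20),
          df['a'].between(21, 27),
          df['a'].between(28, 999)]

# Output value for each range, in the same order as the masks.
values = [95, 90, 85, 80, 75]

# Each row takes the value of the first mask that is True (0 if none match).
df['output_column'] = np.select(ranges, values)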
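
The else branch in the question sends every remaining value to 75. One way to keep that fallback without picking an upper bound such as 999 is the default argument of np.select(). The following is only a minimal sketch: the sample frame and its values are made up for illustration, and the range boundaries are copied from the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the asker's dataframe.
df = pd.DataFrame({'a': [3, 7, 13, 14, 20, 25, 28, 50]})

# One boolean mask per range; between() includes both endpoints.
conditions = [
    df['a'].between(0, 6),
    df['a'].between(7, 13),
    df['a'].between(14, 20),
    df['a'].between(21, 27),
]
values = [95, 90, 85, 80]

# default=75 plays the role of the else branch: it is used for every
# row that matches none of the masks.
df['output_column'] = np.select(conditions, values, default=75)

print(df)
```

With this sample, the rows holding 28 and 50 match none of the masks and so receive 75, while 7 and 13 both land in the second range and receive 90.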