Prevent numpy select from casting values in the "choicelist" and "default" arguments

I have a list of boolean pandas Series (all of the same length) and a list of values with as many entries as there are Series. I'm trying to use NumPy's select function to build a new Series from the result, with np.nan as the default for rows that don't fulfill any of the conditions. Here's what the line looks like on its own:

result = pd.Series(np.select(list_of_conditions, list_of_values, default=np.nan))
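For context on what that line does: np.select looks at each position and takes the value paired with the first condition that is True there, falling back to default when no condition is True. A minimal sketch with made-up numeric values (cond_a, cond_b and the numbers are illustrative only, chosen so no string casting is involved):

```python
import numpy as np

# Two illustrative boolean conditions over three rows.
cond_a = np.array([True, False, False])
cond_b = np.array([True, True, False])

# Row 0: both conditions are True, so the first one (cond_a -> 10) wins.
# Row 1: only cond_b is True -> 20.
# Row 2: no condition is True -> default (-1).
out = np.select([cond_a, cond_b], [10, 20], default=-1)
print(out)  # [10 20 -1]
```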

The issue I'm having is that, at some point during the process, all the values (including np.nan) seem to get cast to strings if even a single one of them is a string.
Here’s a minimal reproducible example:

import numpy as np
import pandas as pd

series1 = pd.Series([True, False, False])
series2 = pd.Series([True, True, False])
list_of_series = [series1, series2]
list_of_values = ['1', '2']

result = pd.Series(np.select(list_of_series, list_of_values, default=np.nan))

print(result.unique())
# Printed result --> ['1' '2' 'nan']
# Desired result --> ['1' '2' nan]

I also tried, to no avail, replacing list_of_values with a NumPy array of object dtype:

# [...]
list_of_values = np.array(['1', 2], dtype=object)
# 2 (an int) instead of '2' shows that items from the array
# get cast to string too, not just the `default` argument

print(list_of_values)
# Printed result --> ['1' 2], as expected

result = pd.Series(np.select(list_of_series, list_of_values, default=np.nan))

print(result.unique())
# Printed result --> ['1' '2' 'nan'], still cast to strings

select has no dtype argument, so I don't know what to do, aside from hacking together a workaround that replaces every 'nan' string with an actual NaN, which irks me. Do I have to reimplement my own version of NumPy's select, or did I miss something?

Edit: I'm using NumPy 1.21.3 and pandas 1.3.4.

Solution:

You need to pass the default as an array of dtype 'object':

result = pd.Series(np.select(list_of_series, list_of_values, default=np.array(np.nan, dtype='object')))

Result of print(result.unique()):

['1' '2' nan]

This is because np.select computes the result dtype with np.result_type from the dtypes of the choice list and the default: np.result_type('U1', float) yields '<U32', whereas np.result_type('U1', np.dtype('object')) yields 'O'.
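A minimal sketch that checks the promotion rule and the fix on the question's data. It also tries a second option, which is my own suggestion rather than part of the answer: wrapping each value in its own 0-d object array. This assumes np.select converts each entry of the choice list separately with np.asarray (as far as I can tell from NumPy's implementation, it does), which would also explain why one object array holding all the values did not help: iterating over it hands select plain str and int items. The name object_choices is illustrative only.

```python
import numpy as np
import pandas as pd

# Data from the question.
series1 = pd.Series([True, False, False])
series2 = pd.Series([True, True, False])
list_of_series = [series1, series2]
list_of_values = ['1', '2']

# Promotion rule behind the problem: str + float -> str, str + object -> object.
print(np.result_type('U1', float))               # <U32
print(np.result_type('U1', np.dtype('object')))  # object

# Original call: the NaN default is promoted to a string dtype and becomes 'nan'.
broken = np.select(list_of_series, list_of_values, default=np.nan)
print(broken.dtype, broken[2] == 'nan')

# Fix from the answer: an object-dtype default makes the whole result object dtype.
fixed = np.select(list_of_series, list_of_values,
                  default=np.array(np.nan, dtype='object'))

# Alternative (my suggestion, illustrative name `object_choices`): wrap each
# choice in its own 0-d object array so the object dtype survives per-choice
# conversion inside np.select.
object_choices = [np.array(v, dtype=object) for v in list_of_values]
alt = np.select(list_of_series, object_choices, default=np.nan)

for arr in (fixed, alt):
    # Both should hold the strings '1' and '2' plus a real float NaN.
    print(arr.dtype, pd.isna(arr[2]))

result = pd.Series(fixed)
print(result.unique())
```

Of the two, passing an object-dtype default is the smaller change, since the choice list can stay as plain strings.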
