What is the 'fillna()' equivalent for dtype 'Int32'?

Short question: How can I set all values that are <1 or <NA> to 1?

Long question: Say I have a pure-int (int32!) pandas column. I can do this to cap the minimum:

>>> import numpy as np
>>> import pandas as pd
>>> shots = pd.DataFrame([2, 0, 1], index=['foo', 'bar', 'baz'], columns=['shots'], dtype='int32')
>>> shots
     shots
foo      2
bar      0
baz      1

>>> max(shots.loc['foo', 'shots'], 1)
2

>>> max(shots.loc['bar', 'shots'], 1)
1

So far, so good. Now, say the dtype of column shots changes from 'int32' to 'Int32', which allows <NA>. This gets me into trouble when accessing <NA> records. I get this error:

>>> shots = pd.DataFrame([2, np.nan, 1], index=['foo', 'bar', 'baz'], columns=['shots'], dtype='Int32')
>>> shots
     shots
foo      2
bar   <NA>
baz      1

>>> max(shots.loc['bar', 'shots'], 1)
TypeError: boolean value of NA is ambiguous

What should I do?

My first intuition was: "OK, let's fill the values, then apply max()." But that also fails:

>>> shots.loc[idx, 'shots'].fillna(1)

AttributeError: 'NAType' object has no attribute 'fillna'

What is the most pandastic/pythonic way to apply a condition to <NA> values, i.e., setting all <NA> to 1, or applying some other form of basic math, such as max(<NA>, 1)?

Versions

  • Python 3.8.6
  • Pandas 1.2.3
  • Numpy 1.19.2

Solution:

idx should be a collection. If it is a scalar, .loc returns a scalar value (here <NA>) instead of a Series, and a scalar has no fillna() method:

# idx = 'bar'

>>> shots.loc[idx, 'shots']
<NA>

>>> shots.loc[idx, 'shots'].fillna(1)
...
AttributeError: 'NAType' object has no attribute 'fillna'

>>> shots.loc[[idx], 'shots'].fillna(1)
bar    1
Name: shots, dtype: Int32
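
If you really do need to work with a single scalar lookup, one option is to test for <NA> explicitly before calling max(). The following is only a minimal sketch: the helper name at_least_one is made up for illustration, and the sample frame is an assumption modelled on the one in the question.

```python
import pandas as pd

# Sample data modelled on the question; 'bar' holds <NA>, 'baz' holds 0.
shots = pd.DataFrame({'shots': [2, pd.NA, 0]},
                     index=['foo', 'bar', 'baz'], dtype='Int32')


def at_least_one(value):
    """Hypothetical helper: return 1 for <NA>, otherwise max(value, 1)."""
    # pd.isna() returns a plain bool for scalars, so it is safe in an `if`,
    # unlike a comparison with pd.NA, which yields an ambiguous NA.
    if pd.isna(value):
        return 1
    return int(max(value, 1))


# Apply the helper to each scalar lookup.
capped = {label: at_least_one(shots.loc[label, 'shots'])
          for label in shots.index}
print(capped)
```

This keeps the scalar access pattern from the question, at the cost of a Python-level check per value.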

So the question is: how is idx defined?
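
For the short question (set every value that is <1 or <NA> to 1), it may be simpler to skip idx altogether and operate on the whole column. A minimal sketch, assuming a frame shaped like the one in the question, that chains fillna() with clip():

```python
import pandas as pd

# Sample data modelled on the question; 'bar' holds <NA>, 'baz' holds 0.
shots = pd.DataFrame({'shots': [2, pd.NA, 0]},
                     index=['foo', 'bar', 'baz'], dtype='Int32')

# First replace <NA> with 1, then raise anything below 1 up to 1.
capped = shots['shots'].fillna(1).clip(lower=1)
print(capped)
```

Filling first means clip() never sees a missing value, so the two steps stay easy to reason about. To keep the result, assign it back with shots['shots'] = capped.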


Old answer

Your problem is not reproducible for me.

>>> shots = pd.DataFrame({'shots': [2, 1, pd.NA]}, dtype=pd.Int32Dtype())
>>> idx = [2]

>>> shots
   shots
0      2
1      1
2   <NA>

>>> shots.dtypes
shots    Int32
dtype: object

>>> shots.loc[idx, 'shots'].fillna(1)
2    1
Name: shots, dtype: Int32

Versions:

  • Python 3.9.7
  • Pandas 1.4.1
  • Numpy 1.21.5