Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to set second and following occurrences of 0 to NaN in python

I am trying to figure out how to set all second occurrences of 0 in my dataframe to NaN in python. However, this is a tranposed dataframe so 0 would occur across the columns. To explain, I have the following dataframe:

     A   B   C   D   E   F   G
-------------------------------
0   55  34  30  29  15    0  0
1   63  59  49  32  21   16  0   
2   42  24  12   5   0    0  0
3   25  6    0   0   0    0  0
4   13  0    0   0   0    0  0

And so rather than calculated occurrences of 0 within a column, I want to count occurrences of 0 across each row (apologies if I did not word this corresctly, and if I should not have called this "transposed").

And so, for each row in this dataframe, starting from column "A" and going to column "G" in order, I want to see each occurrence of 0 after the first occurence to NaN. And so I want to produce the following output dataframe:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

     A     B     C      D     E     F     G
--------------------------------------------
0   55    34     30    29    15     0   NaN
1   63    59     49    32    21    16     0   
2   42    24     12     5     0   NaN   NaN
3   25     6      0   NaN   NaN   NaN   NaN
4   13     0    NaN   NaN   NaN   NaN   NaN

How can I set all of those 0s after the first occurrence of 0 across each row to NaN? I know it is more straightforward to simply set all 0 to NaN, but I cannot figure out a simple way to tell python I want to separate all occurrences of 0 after the first to be set to NaN.

To explain the reasoning for why I would want to do this, I am trying to plot each row in my dataframe across the letters (as the x-axis). And so having all those 0s in a row will lead to unnecessary points in the plot, as I only am interested in the point where the curve hits 0. A trail of 0s will lead to a differently shaped curve than only containing a single instance of 0. And so I want to only plot each row until the initial point at which 0 is hit (a decay curve).

>Solution :

Compute the cumulative counts of zeros over the rows using df.eq(0).cumsum(axis=1) and then fill the values with zero count greater than 1 using DataFrame.mask

res = df.mask(df.eq(0).cumsum(axis=1).gt(1))

Output:

>>> res

    A   B     C     D     E     F    G
0  55  34  30.0  29.0  15.0   0.0  NaN
1  63  59  49.0  32.0  21.0  16.0  0.0
2  42  24  12.0   5.0   0.0   NaN  NaN
3  25   6   0.0   NaN   NaN   NaN  NaN
4  13   0   NaN   NaN   NaN   NaN  NaN

Intermediate Outputs:

>>> df.eq(0).cumsum(axis=1) 

   A  B  C  D  E  F  G
0  0  0  0  0  0  1  2
1  0  0  0  0  0  0  1
2  0  0  0  0  1  2  3
3  0  0  1  2  3  4  5
4  0  1  2  3  4  5  6

>>> df.eq(0).cumsum(axis=1).gt(1)

       A      B      C      D      E      F      G
0  False  False  False  False  False  False   True
1  False  False  False  False  False  False  False
2  False  False  False  False  False   True   True
3  False  False  False   True   True   True   True
4  False  False   True   True   True   True   True
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading