Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Python Dataframe avoid non-NaN values dropping during <> operations

My code:

xdf = pd.DataFrame(data={'A':[-10,np.nan,-2.2],'B':[np.nan,2,1.5],'C':[3,1,-0.3]},index=['2023-05-13 08:40:00','2023-05-13 08:41:00','2023-05-13 08:42:00'])
xdf = 
                          A      B      C
2023-05-13 08:40:00     -10.0   NaN     3.0
2023-05-13 08:41:00     NaN     2.0     1.0
2023-05-13 08:42:00     -2.2    1.5     -0.3

Consider only values below 4.0 and above -4.0 in each row of the dataframe

print(xdf[((xdf<4.0).all(axis=1))&((xdf>-4.0).all(axis=1))])

Present output:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

                          A      B      C
2023-05-13 08:42:00     -2.2    1.5     -0.3

Expected output: My above code drops a row if there is a NaN in one column, despite other columns satisfying the filter condition. So, I want to omit NaN columns and consider non-NaN columns in <> operation.

                          A      B      C
2023-05-13 08:41:00     NaN     2.0     1.0
2023-05-13 08:42:00     -2.2    1.5     -0.3

Edit:

One working solution:

print(xdf[((xdf.fillna(True)<4.0).all(axis=1))&((xdf.fillna(True)>-4.0).all(axis=1))])

>Solution :

I suggest add new mask DataFrame.isna chained with | (bitwise OR) for test missing values:

print(xdf[((xdf<4.0) & (xdf>-4.0) | xdf.isna()).all(axis=1)])
                       A    B    C
2023-05-13 08:41:00  NaN  2.0  1.0
2023-05-13 08:42:00 -2.2  1.5 -0.3

How it working:

print ((xdf<4.0) & (xdf>-4.0))
                         A      B     C
2023-05-13 08:40:00  False  False  True
2023-05-13 08:41:00  False   True  True
2023-05-13 08:42:00   True   True  True

print (((xdf<4.0) & (xdf>-4.0) | xdf.isna()))
                         A     B     C
2023-05-13 08:40:00  False  True  True
2023-05-13 08:41:00   True  True  True
2023-05-13 08:42:00   True  True  True

print(((xdf<4.0) & (xdf>-4.0) | xdf.isna()).all(axis=1))
2023-05-13 08:40:00    False
2023-05-13 08:41:00     True
2023-05-13 08:42:00     True
dtype: bool

Another idea is use DataFrame.lt and
DataFrame.gt :

print(xdf[(xdf.lt(4.0) & xdf.gt(-4.0) | xdf.isna()).all(axis=1)])
                       A    B    C
2023-05-13 08:41:00  NaN  2.0  1.0
2023-05-13 08:42:00 -2.2  1.5 -0.3
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading