
Selecting rows of a dataframe where columns fall within a range

I am working with dataframes in Python. I have an original dataframe covering 10 days, which I have split into one dataframe per day so that I can plot each day. Some columns (here y and z) contain strange values, so I am using the 'between' method to restrict them to the range 0 to 100. The code works, but I get a warning. Can anyone help, please?

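import matplotlib.pyplot as plt

# listofDF holds one dataframe per day
for df in listofDF:
    if len(df) != 0:
        # keep rows where ' y' is in [0, 100]
        f_df = df[df[' y'].between(0, 100)]
        # then keep rows where ' z' is in [0, 100]
        f_df = f_df[df[' z'].between(0, 100)]
        maxTemp = f_df[' y']
        minTemp = f_df[' z']
        Time = f_df['x']
        plt.plot(Time, maxTemp)
        plt.plot(Time, minTemp)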
        

The warning I am getting is:

UserWarning: Boolean Series key will be reindexed to match DataFrame index.
f_df = f_df[df[' z'].between(0,100)]

Solution:

TL;DR Solution

Change f_df = f_df[df[' z'].between(0, 100)] to f_df = f_df[f_df[' z'].between(0, 100)], so that the boolean mask is built from the same dataframe it is used to filter.
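Applied to the loop from the question, the filtering step might look like the sketch below. The sample day is made-up data using the question's column names ('x', ' y', ' z', including the leading spaces), the list name filtered is hypothetical, and the plotting calls are left out so the filtering can be checked on its own. UserWarning is turned into an error only to show that the fixed line no longer triggers the reindexing warning.

```python
import warnings

import pandas as pd

# Hypothetical sample data shaped like one day from the question:
# an 'x' time column plus ' y' and ' z' columns (note the leading spaces).
day = pd.DataFrame({
    'x': [0, 1, 2, 3, 4],
    ' y': [20, 150, 35, -5, 60],
    ' z': [10, 15, 250, 12, 30],
})
listofDF = [day, pd.DataFrame()]  # the empty frame exercises the len() check

filtered = []  # hypothetical container for the cleaned daily frames
with warnings.catch_warnings():
    # Any UserWarning (such as the reindexing warning) now raises an exception
    warnings.simplefilter('error', UserWarning)
    for df in listofDF:
        if len(df) != 0:
            f_df = df[df[' y'].between(0, 100)]
            # The mask now comes from f_df itself, so its index matches f_df's index
            f_df = f_df[f_df[' z'].between(0, 100)]
            filtered.append(f_df)

print(filtered[0])
```

Only the rows labelled 0 and 4 survive both filters; in the real loop you would plot f_df['x'] against f_df[' y'] and f_df[' z'] at that point.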


The warning you are getting is because of this line:

f_df = f_df[df[' z'].between(0,100)]

There's an issue with this line. Can you spot it?


You're using a mask built from df to index f_df. df[' z'].between(0, 100) returns a boolean Series with one entry for every row of df, labelled with df's index. f_df, however, contains only the rows that passed the first filter on ' y', so the mask and f_df have different lengths and different sets of index labels.

When you index a dataframe with a boolean Series, pandas matches the Series to the dataframe by index label, not by position. Because the two indexes differ, pandas first reindexes the mask to f_df's index (discarding the labels f_df doesn't have) and then keeps the rows whose labels map to True. The warning is pandas telling you that this reindexing happened implicitly, which may not be what you intended.

For example, if the mask built from df is True at labels 1 and 10, pandas keeps the rows of f_df labelled 1 and 10, wherever they sit in f_df, rather than the rows at positions 1 and 10.
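To see this label matching in isolation, here is a minimal sketch with made-up data: the index labels (10, 20, 30, 40) deliberately differ from the row positions, and f_df is a reordered subset of df. The warning is recorded rather than printed to stderr so it can be inspected.

```python
import warnings

import pandas as pd

# Made-up frame whose index labels (10, 20, 30, 40) differ from row positions (0..3)
df = pd.DataFrame({'z': [5, 500, 50, 7]}, index=[10, 20, 30, 40])

# Mask built from df: one entry per row of df, labelled with df's index
mask = df['z'].between(0, 100)  # True at labels 10, 30 and 40

# A subset of df in a different order, so labels and positions no longer line up
f_df = df.loc[[40, 20, 10]]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    # pandas reindexes mask to f_df's index, then selects by label
    picked = f_df[mask]

print(picked.index.tolist())  # [40, 10]
print([str(w.message) for w in caught])
```

Label 20 (z = 500) is dropped, labels 40 and 10 are kept in f_df's own order, and label 30 in the mask is simply ignored because f_df has no such row.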

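In your case, this happens to give the right result: filtering df to create f_df keeps the original index labels (you can see them on the left when you print f_df), so each label in the mask still refers to the same row. Building the mask from f_df itself selects the same rows without the warning.

Here is a small example that reproduces the warning: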

>>> import pandas as pd
>>> df = pd.DataFrame([('a', 1, 51), ('b', 51, 31)], columns=['letter', 'x', 'y'])
>>> f_df = df[df.x.between(0, 50)]
>>> f_df
  letter  x   y
0      a  1  51
>>> f_df = f_df[df.y.between(0, 50)]
<stdin>:1: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
>>> f_df
Empty DataFrame
Columns: [letter, x, y]
Index: []
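As a sketch of the fix on the same toy data, the second filter can take its mask from f_df, or, as another option, both conditions can be combined with & into a single mask over df so there is no intermediate frame to misalign. Nothing here goes beyond standard pandas; the UserWarning filter set to 'error' is there only to make it visible that neither version triggers the warning.

```python
import warnings

import pandas as pd

df = pd.DataFrame([('a', 1, 51), ('b', 51, 31)], columns=['letter', 'x', 'y'])

with warnings.catch_warnings():
    # Fail loudly if pandas has to reindex a boolean mask
    warnings.simplefilter('error', UserWarning)

    # Fix 1: build the second mask from f_df itself
    f_df = df[df.x.between(0, 50)]
    f_df = f_df[f_df.y.between(0, 50)]

    # Fix 2: combine both conditions into one mask over df, in a single step
    combined = df[df.x.between(0, 50) & df.y.between(0, 50)]

print(f_df.empty, combined.empty)  # True True
```

Both results are empty, matching the session above, because the only row with x in range has y = 51. If you write the conditions as comparisons instead of between calls, wrap each one in parentheses, e.g. (df.x >= 0) & (df.x <= 50), since & binds more tightly than the comparison operators.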