Is a pandas.DataFrame still sorted after using the method `query`?

I am working with a DataFrame df in Python. I need to query it and sort the results multiple times, but on different columns:

for x in X:
   # query the dataframe and sort the result
   query_result = df.query(f"column_name == '{x}'").sort_values(by="other_column")
   # ... use query_result ...

I am wondering whether I can factor out the sorting operation to make the code run faster, like this:

# First sort the dataframe
df.sort_values(by="other_column", inplace=True)

for x in X:
   # then query it
   query_result = df.query(f"column_name == '{x}'")
   # ... use query_result, assuming it is sorted by other_column ...

In the second snippet, do I have any guarantee that query_result is sorted?

Thank you for your help

>Solution :

query only filters rows; it doesn’t change their order. So if your input is sorted, the output will be sorted too.
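
As a quick sanity check, here is a minimal sketch of that behaviour. The sample data is made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({
    "column_name": ["a", "b", "a", "b", "a"],
    "other_column": [5, 3, 1, 4, 2],
})

# Sort once, up front.
df_sorted = df.sort_values(by="other_column")

# Filter afterwards; the surviving rows keep their relative order.
x = "a"
query_result = df_sorted.query(f"column_name == '{x}'")

# True if other_column is still in ascending order after the query.
is_sorted = query_result["other_column"].is_monotonic_increasing
print(query_result)
print(is_sorted)
```

If you want the code to fail loudly should that assumption ever be broken, an `assert query_result["other_column"].is_monotonic_increasing` inside the loop is a cheap guard.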

Note however that for what you’re trying to do, a better approach would be to use groupby:

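# ensure we only keep the values that are in X, and sort once
tmp = df.loc[df['column_name'].isin(X)].sort_values(by="other_column")

# sort=False keeps the groups in order of first appearance;
# rows within each group keep the order they have in tmp
for x, query_result in tmp.groupby('column_name', sort=False):
    ...  # use query_result, which is sorted by other_column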
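
For reference, a self-contained version of the groupby approach might look like the sketch below. The data, the contents of X, and the `results` dictionary are assumptions made for illustration; they are not part of the question.

```python
import pandas as pd

# Hypothetical sample data; "c" is deliberately not in X.
df = pd.DataFrame({
    "column_name": ["a", "b", "c", "a", "b", "a"],
    "other_column": [5, 3, 9, 1, 4, 2],
})
X = ["a", "b"]

# Keep only the rows whose value is in X, then sort a single time.
tmp = df.loc[df["column_name"].isin(X)].sort_values(by="other_column")

# Collect the sorted values per group (illustrative use of each group).
results = {}
for x, query_result in tmp.groupby("column_name", sort=False):
    results[x] = query_result["other_column"].tolist()

print(results)
```

This sorts and filters once instead of once per value of x, which is where the speed-up would come from on larger frames.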