I am working with a DataFrame df in Python. I need to query it and sort the results multiple times, each time filtering on a different value:
for x in X:
    # query the dataframe and sort the result
    query_result = df.query(f"column_name == '{x}'").sort_values(by="other_column")
    # ... use query_result ...
I am wondering if I can hoist the sorting operation out of the loop to make the code run faster, like this:
# First sort the dataframe
df.sort_values(by="other_column", inplace=True)
for x in X:
    # then query it
    query_result = df.query(f"column_name == '{x}'")
    # ... use query_result, assuming it is sorted by other_column ...
In the second version, do I have any guarantee that query_result is sorted?
Thank you for your help
>Solution :
query only filters rows; it doesn't change their order. So if your input is sorted, the output will be sorted too.
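As a quick sanity check of that claim, here is a minimal sketch on made-up data (the toy values are my own assumption; only the column names come from the question). It compares sorting before the query with sorting after it:

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the question
df = pd.DataFrame({
    "column_name": ["a", "b", "a", "b", "a"],
    "other_column": [5, 3, 1, 4, 2],
})

# Sort once, up front
df_sorted = df.sort_values(by="other_column")

x = "a"
# Query the pre-sorted frame: rows keep the order they had in df_sorted
presorted = df_sorted.query(f"column_name == '{x}'")
# Reference: query first, then sort (the original approach)
sorted_after = df.query(f"column_name == '{x}'").sort_values(by="other_column")

same = presorted["other_column"].tolist() == sorted_after["other_column"].tolist()
print(presorted["other_column"].is_monotonic_increasing, same)
```

One caveat: if other_column contains ties, the two approaches may order the tied rows differently, because the default sort algorithm of sort_values is not stable. Passing kind="stable" should make the tie order predictable.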
Note however that for what you’re trying to do, a better approach would be to use groupby:
# ensure we only keep the values that are in X, and sort
tmp = df.loc[df['column_name'].isin(X)].sort_values(by="other_column")
for x, query_result in tmp.groupby('column_name', sort=False):
    # do something with query_result (already sorted by other_column)
    ...
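For illustration, here is the groupby approach run end to end on a small made-up frame (the data, the contents of X, and the results dict are assumptions for the sketch, not part of the question). groupby keeps the rows of each group in the order they had in tmp, so every group stays sorted by other_column:

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the question
df = pd.DataFrame({
    "column_name": ["a", "b", "c", "a", "b", "c"],
    "other_column": [6, 5, 4, 3, 2, 1],
})
X = ["a", "c"]  # assumed list of values to look up

# Filter to the values of interest and sort once
tmp = df.loc[df["column_name"].isin(X)].sort_values(by="other_column")

# Collect each group's values to show they are already sorted
results = {}
for x, query_result in tmp.groupby("column_name", sort=False):
    results[x] = query_result["other_column"].tolist()

print(results)
```

Note that with sort=False the groups are yielded in order of first appearance in tmp, not in the order of X, so if the iteration order over X matters you would need to look the groups up explicitly.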