I know pandas provides various ways to index data. Is there a performance difference between the following two methods, i.e. is one of them faster, or are they the same?
# method 1
df = table.loc[table.some_col==True, :]
# method 2
df = table[table.some_col==True]
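Before comparing speed, it helps to confirm that the two methods select the same data. The sketch below uses a small hypothetical frame (the column names and size are assumptions, chosen to mirror the question) and checks the two results with pandas' own testing helper.

```python
import numpy as np
import pandas as pd

# Small hypothetical frame; 'some_col' is a boolean column as in the question
rng = np.random.default_rng(0)
table = pd.DataFrame(rng.random((1000, 3)), columns=list("abc"))
table["some_col"] = table["a"] > 0.6

# Same comparison as in the question (== True kept on purpose)
mask = table.some_col == True  # noqa: E712

df1 = table.loc[mask, :]  # method 1: .loc with a boolean row mask, all columns
df2 = table[mask]         # method 2: plain boolean indexing via []

# Raises AssertionError if rows, columns, dtypes or values differ
pd.testing.assert_frame_equal(df1, df2)
print(df1.shape == df2.shape)
```

If the frames were not identical, assert_frame_equal would raise, so the question really is only about speed.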
Solution:
The second is a bit faster (about 17 ms out of roughly 250 ms in the timings below). That makes sense to me, because the first combines DataFrame.loc with boolean indexing, while the second uses boolean indexing only:
import numpy as np
import pandas as pd

np.random.seed(2021)
table = pd.DataFrame(np.random.rand(10**7, 5), columns=list('abcde'))
table['some_col'] = table.a > 0.6
In [130]: %timeit table.loc[table.some_col==True, :]
258 ms ± 2.39 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [131]: %timeit df = table[table.some_col==True]
241 ms ± 1.52 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
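The %timeit magic only works in IPython or Jupyter. A plain-script version using the standard timeit module could look like the sketch below. It is an assumption-laden variant, not the original benchmark: it uses 10**5 rows instead of 10**7 so it runs quickly, and it precomputes the mask so that only the indexing step is timed (in the timings above, building the mask is included in both measurements). Absolute numbers will depend on your machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Smaller than the 10**7 rows above so the sketch runs quickly; scale up as needed
np.random.seed(2021)
table = pd.DataFrame(np.random.rand(10**5, 5), columns=list("abcde"))
table["some_col"] = table.a > 0.6

# Precompute the mask so only the indexing step is timed
mask = table.some_col == True  # noqa: E712

candidates = {
    "loc": lambda: table.loc[mask, :],  # method 1
    "getitem": lambda: table[mask],     # method 2
}

results = {}
for name, fn in candidates.items():
    # Best of 5 repeats of 20 calls each, converted to seconds per call
    best = min(timeit.repeat(fn, repeat=5, number=20)) / 20
    results[name] = best
    print(f"{name:8s} {best * 1e3:.3f} ms per call")
```

Taking the minimum over repeats is the usual timeit convention, since higher values mostly reflect interference from other processes rather than the code being measured.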