Performance of using loc vs. simply indexing inside square brackets

December 17, 2021

I know pandas provides various ways to index data. Is there a performance difference between the following two methods, i.e. is one faster, or are both the same?

# method 1

df = table.loc[table.some_col==True, :]

# method 2

df = table[table.some_col==True]

>Solution :

The second is a bit faster, which makes sense to me: the first solution combines DataFrame.loc with boolean indexing, while the second uses boolean indexing only:

import numpy as np
import pandas as pd

np.random.seed(2021)
table = pd.DataFrame(np.random.rand(10**7, 5), columns=list('abcde'))
table['some_col'] = table.a > 0.6

In [130]: %timeit table.loc[table.some_col==True, :]
258 ms ± 2.39 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [131]: %timeit df = table[table.some_col==True]
241 ms ± 1.52 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
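
To reproduce the comparison outside IPython, here is a minimal sketch using the standard-library timeit module. The smaller frame (10**5 rows), the run count of 20, and the names mask, via_loc and via_brackets are my assumptions for illustration, not part of the timings above. It also checks that both forms return the same data, and it drops the == True comparison because some_col is already boolean:

```python
import timeit

import numpy as np
import pandas as pd

# Assumption: a smaller frame than the 10**7 rows above, so the check runs quickly.
rng = np.random.default_rng(2021)
table = pd.DataFrame(rng.random((10**5, 5)), columns=list("abcde"))
table["some_col"] = table.a > 0.6

# some_col is already a boolean Series, so "== True" is not needed.
mask = table.some_col

via_loc = table.loc[mask, :]   # method 1
via_brackets = table[mask]     # method 2

# Both forms should select the same rows and columns.
same_result = via_loc.equals(via_brackets)

# Total time for 20 calls of each form; numbers vary by machine and pandas version.
t_loc = timeit.timeit(lambda: table.loc[mask, :], number=20)
t_brackets = timeit.timeit(lambda: table[mask], number=20)

print(f"same result: {same_result}")
print(f".loc: {t_loc:.4f} s, brackets: {t_brackets:.4f} s (20 runs each)")
```

The gap between the two may be smaller or larger on other data sizes and pandas versions, so it is worth measuring on your own data before choosing one form for speed alone.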