I want to change a DataFrame column so that its values are lowercase and have their surrounding whitespace stripped:
df.loc[:, column] = df.loc[:, column].str.lower().str.strip()
The snippet above works, but it looks messy because I have to use .str twice. Is there a better solution?
>Solution :
You can use a list comprehension:
df['col2'] = [x.lower().strip() for x in df['col']]
This can be faster than chaining multiple .str calls:
%%timeit
df['col2'] = df['col'].str.strip().str.lower()
# 344 ms ± 12.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
df['col2'] = [x.lower().strip() for x in df['col']]
# 182 ms ± 3.13 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Input used (1M rows):
df = pd.DataFrame({'col': [' aBc DeF ']*1000000})
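The %%timeit lines above only work in IPython/Jupyter. A rough sketch of the same comparison as a plain script, using the standard timeit module, could look like this. The function names are made up for illustration, and the row count is reduced from 1M so that it finishes quickly; absolute timings will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# Smaller than the 1M rows used above so the script runs quickly (assumption).
df = pd.DataFrame({'col': [' aBc DeF '] * 100_000})


def with_str_accessor():
    # Chained vectorised string methods.
    return df['col'].str.strip().str.lower()


def with_list_comprehension():
    # Plain Python string methods applied element by element.
    return [x.strip().lower() for x in df['col']]


# Best of 3 repeats, 5 calls each, reported as time per call.
t_str = min(timeit.repeat(with_str_accessor, number=5, repeat=3)) / 5
t_list = min(timeit.repeat(with_list_comprehension, number=5, repeat=3)) / 5

print(f'.str accessor:      {t_str * 1e3:.1f} ms per call')
print(f'list comprehension: {t_list * 1e3:.1f} ms per call')
```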
NB. In the .str version of the comparison I called strip before lower, as this is faster than lower followed by strip.
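One caveat with the list comprehension: it calls the string methods on every element directly, so it raises an AttributeError if the column contains missing values or other non-strings, whereas the .str accessor passes missing values through. A minimal sketch of a guarded version follows; the helper name clean and the small example frame are assumptions for illustration, not part of the original benchmark.

```python
import pandas as pd

# Example data (assumed for illustration): one value is missing.
df = pd.DataFrame({'col': [' aBc DeF ', None, 'XyZ  ']})


def clean(value):
    # Hypothetical helper: normalise strings, leave anything else untouched.
    return value.strip().lower() if isinstance(value, str) else value


df['col2'] = [clean(x) for x in df['col']]

# The .str accessor handles the missing value on its own.
expected = df['col'].str.strip().str.lower()
```

The extra isinstance check costs some of the speed advantage, so it is only worth adding when the column may actually contain non-string values.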