'TypeError: expected string or bytes-like object' while trying to replace consecutive white spaces with a single space in all entries of a DataFrame

I have a DataFrame where every entry is a string value and a given entry may contain consecutive white spaces. For example:

import re
import pandas as pd
df = pd.DataFrame({'col1':['a--b','c  d'], 'col2':['e   f','g---h']})
print(df)

Output of print(df) (this is the initial df):

   col1   col2
0  a--b  e   f
1  c  d  g---h

I want to replace any run of consecutive white spaces with a single space in all the entries of df. So in this example, 'c  d' (with two consecutive white spaces) should be replaced with 'c d', and 'e   f' (with three consecutive white spaces) should be replaced with 'e f'.

Approach 1:
I get the correct result using df.replace, like so:

# Approach 1 - works fine (raw string so that \s is not treated as a string escape)
df = df.replace(r'\s+', ' ', regex=True)
print(df)

Output of print(df) (this is the correct result expected):

   col1   col2
0  a--b    e f
1   c d  g---h

Approach 2:
However, I get TypeError: expected string or bytes-like object when using df.transform, like so:

# Approach 2 - gives TypeError
df = df.transform(lambda s: re.sub(r'\s+', ' ', s))
print(df)

Output:

...
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/re.py", line 210, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object

Approach 3:
I get ValueError: Transform function failed if I do:

# Approach 3 - gives ValueError
df = df.transform(lambda s: ' '.join(s.split()))
print(df)

Output:

...
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/apply.py", line 227, in transform
    raise ValueError("Transform function failed") from err
ValueError: Transform function failed

So where am I going wrong with Approaches 2 and 3?
I am asking because df.transform seems more powerful for transforming each cell in a DataFrame, and I will need it in my project for more complex transformations. Thank you!

Solution:

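You need DataFrame.applymap for element-wise processing, because both of your functions work on scalars (single strings). Note that applymap was renamed to DataFrame.map in pandas 2.1, where applymap is deprecated:

df = df.applymap(lambda s: re.sub(r'\s+', ' ', s))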
print(df)
   col1   col2
0  a--b    e f
1   c d  g---h

df = df.applymap(lambda s: ' '.join(s.split()))
print(df)
   col1   col2
0  a--b    e f
1   c d  g---h

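DataFrame.transform passes each column to the function as a Series, not as individual cell values, so both approaches fail: re.sub expects a string, and a Series has no split method.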
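To see the difference for yourself, here is a minimal sketch that records the type of object each method hands to the function. The helper names record_column and record_cell are made up for illustration, and the hasattr check is only there so the snippet runs on pandas versions both before and after the applymap rename:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['a--b', 'c  d'], 'col2': ['e   f', 'g---h']})

column_types = []  # types seen by the function passed to DataFrame.transform
cell_types = []    # types seen by the function passed to the element-wise method

def record_column(x):
    # hypothetical helper: remember what transform passed in, return it unchanged
    column_types.append(type(x).__name__)
    return x

def record_cell(x):
    # hypothetical helper: remember what the element-wise method passed in
    cell_types.append(type(x).__name__)
    return x

df.transform(record_column)

# DataFrame.map exists from pandas 2.1; fall back to applymap on older versions.
elementwise = df.map if hasattr(df, 'map') else df.applymap
elementwise(record_cell)

print(set(column_types))  # whole columns (Series)
print(set(cell_types))    # individual cell values (str)
```

With the example data, transform should only ever see Series objects while the element-wise method should only see str values, which is why re.sub and str.split work in one case and not in the other.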

You can rewrite the second solution with Series.str.split and Series.str.join, which operate on whole columns (Series), so it works with transform:

def f(x):
    # x is a whole column (Series), so use the vectorized .str methods
    return x.str.split().str.join(' ')

df = df.transform(f)
print(df)

   col1   col2
0  a--b    e f
1   c d  g---h
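The regex version (Approach 2) can be adapted to transform in the same way. The sketch below assumes Series.str.replace with regex=True is an acceptable stand-in for re.sub, and collapse_whitespace is just an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['a--b', 'c  d'], 'col2': ['e   f', 'g---h']})

def collapse_whitespace(col):
    # col is a whole column (Series); .str.replace applies the regex to each string in it
    return col.str.replace(r'\s+', ' ', regex=True)

result = df.transform(collapse_whitespace)
print(result)
```

For more complex per-column transformations, the same pattern applies: write the function so that it takes and returns a Series, and use the .str accessor wherever you would otherwise call a plain string method.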