I have a DataFrame column from which I'm trying to delete all one-word responses, with or without punctuation and possibly with leading spaces. Most of the values are full, long sentences, but below are examples of the kind I am trying to remove:
| column |
|---|
| thanks |
| hello! |
| really…. |
My attempt:
```python
textonly = re.sub('^.\w+\w+.$' , " " , df.column)
```
This raises an error, even though the dtype is string: `expected string or bytes-like object`
Another attempt, which runs without an error but doesn't change anything :/
```python
textonly = re.sub('^.\w+\w+.$' , " " , str(df.column))
```
Please help me identify what I'm missing.
Solution:
You can use `Series.str.replace`, which applies the regex to each value in the column:
```python
df['column'] = df['column'].str.replace(r'^\W*\w+\W*$', '', regex=True)
```
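A minimal sketch to check the pattern on the sample values from the question (the DataFrame below is hypothetical test data, with one full sentence added that should survive). It also illustrates why the original attempts did not work: `re.sub` accepts a single string, not a Series, and `str(df.column)` renders the whole Series as one multi-line string, which a pattern anchored with `^` and `$` never matches as a whole.

```python
import re

import pandas as pd

# Hypothetical sample data: the one-word values from the question (one with
# leading spaces) plus one full sentence that should survive
df = pd.DataFrame(
    {"column": ["thanks", "hello!", "  really….", "This is a full, long sentence."]}
)

# Series.str.replace runs the anchored regex on every value separately
cleaned = df["column"].str.replace(r"^\W*\w+\W*$", "", regex=True)
print(cleaned.tolist())  # ['', '', '', 'This is a full, long sentence.']

# First attempt from the question: re.sub accepts one string, not a Series
try:
    re.sub(r"^.\w+\w+.$", " ", df.column)
    first_attempt_failed = False
except TypeError as exc:
    first_attempt_failed = True
    print("TypeError:", exc)

# Second attempt: str() turns the whole Series into ONE multi-line string
# (index, values, dtype footer), so the pattern anchored at ^ and $ never matches
as_text = str(df.column)
unchanged = re.sub(r"^.\w+\w+.$", " ", as_text) == as_text
print(unchanged)
```

Note that even when the second attempt did match, its result would be a plain string assigned to `textonly`, not a change to `df`.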
If by "words" you mean natural-language words, i.e. consisting only of letters, you may use
```python
df['column'] = df['column'].str.replace(r'^[\W\d_]*[^\W\d_]+[\W\d_]*$', '', regex=True)
```
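To see where the two patterns differ, here is a small sketch on made-up values (the Series `s` and its contents are hypothetical). `\w` also counts digits and `_` as word characters, while `[^\W\d_]` excludes them, so a digits-only reply is removed only by the first pattern, and a token like `some_token` counts as two letter runs under the second.

```python
import pandas as pd

# Hypothetical values: a digits-only reply, a snake_case token,
# a real one-word reply and a two-word reply
s = pd.Series(["42!", "some_token", " thanks.", "Two words"])

# \w+ treats letters, digits and "_" alike as one word
any_word = s.str.replace(r"^\W*\w+\W*$", "", regex=True)

# [^\W\d_]+ requires a single run of letters; digits and "_" act as separators
letters_only = s.str.replace(r"^[\W\d_]*[^\W\d_]+[\W\d_]*$", "", regex=True)

print(any_word.tolist())      # ['', '', '', 'Two words']
print(letters_only.tolist())  # ['42!', 'some_token', '', 'Two words']
```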
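The regex matches:
- `^` – start of string
- `\W*` – zero or more non-word chars (in the second pattern, `[\W\d_]*` – zero or more non-word chars, digits and `_`)
- `\w+` – one or more word chars (in the second pattern, `[^\W\d_]+` – one or more chars other than non-word chars, digits and `_`)
- `\W*` – zero or more non-word chars (in the second pattern, `[\W\d_]*` again)
- `$` – end of string.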
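Both replacements leave an empty string where each one-word response was, rather than deleting the row. If the goal is to drop those rows, one option is sketched below, assuming pandas 1.1 or later for `Series.str.fullmatch`: build a boolean mask from the same pattern and keep the rows that do not match. `fullmatch` anchors at both ends, so `^` and `$` are not needed, and `na=False` treats missing values as non-matching so they are kept. The sample data is hypothetical.

```python
import pandas as pd

# Hypothetical sample data, including a missing value
df = pd.DataFrame(
    {"column": ["thanks", "hello!", "  really….", "This is a full, long sentence.", None]}
)

# True where the whole value is one word surrounded only by non-word chars;
# na=False makes missing values count as "not a one-word response"
one_word = df["column"].str.fullmatch(r"\W*\w+\W*", na=False)

# Keep every other row and renumber the index
df = df[~one_word].reset_index(drop=True)
print(df)
```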