Python regex for one word

I have a DataFrame from which I’m trying to delete all one-word responses. They may or may not have punctuation, and they may have leading spaces. Most of the values are full, long sentences; below are examples of the kind I am trying to remove.

column
thanks
hello!
really….

My attempt:
textonly = re.sub('^.\w+\w+.$' , " " , df.column)

Error (even though the dtype is string): expected string or bytes-like object

Another attempt, which runs without error but doesn't change anything:

textonly = re.sub('^.\w+\w+.$' , " " , str(df.column))

Please help me identify what I’m missing.

Solution:

You can use

df['column'] = df['column'].str.replace(r'^\W*\w+\W*$', '', regex=True)

If by "words" you mean natural-language words, i.e. words consisting only of letters, you may use

df['column'] = df['column'].str.replace(r'^[\W\d_]*[^\W\d_]+[\W\d_]*$', '', regex=True)
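As a quick check of the first pattern, here is a minimal sketch on made-up data (the sample values and the names `raised` and `cleaned` are assumptions for illustration, not part of the original question). It also shows why the attempt in the question fails: `re.sub` works on a single string, so passing it a whole Series raises a `TypeError`, whereas `.str.replace` applies the regex to each cell separately.

```python
import re

import pandas as pd

# Hypothetical sample data: three one-word responses and one full sentence.
df = pd.DataFrame(
    {"column": ["thanks", " hello!", "really….", "This is a full sentence."]}
)

# re.sub expects one string, not a Series, so this raises TypeError.
try:
    re.sub(r'^\W*\w+\W*$', '', df["column"])
    raised = False
except TypeError:
    raised = True

# The vectorised string method applies the pattern cell by cell.
cleaned = df["column"].str.replace(r'^\W*\w+\W*$', '', regex=True)
print(cleaned.tolist())
```

Note that this blanks the one-word cells rather than dropping the rows; the full sentence is left untouched because the pattern must match the entire string.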

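The two regexes match, in order:

  • ^ – start of string
  • \W* – zero or more non-word chars (in the letters-only version, [\W\d_]* – zero or more non-word chars, digits and _)
  • \w+ – one or more word chars (in the letters-only version, [^\W\d_]+ – one or more chars other than non-word chars, digits and _, i.e. essentially letters)
  • \W* – zero or more non-word chars (in the letters-only version, [\W\d_]* again)
  • $ – end of string.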
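The breakdown above can be checked on plain strings with the standard `re` module before touching the DataFrame. This is only a sketch: the sample strings and the names `ANY_WORD`, `LETTERS_ONLY` and `results` are made up for illustration.

```python
import re

# The two patterns from the answer, compiled once.
ANY_WORD = re.compile(r'^\W*\w+\W*$')
LETTERS_ONLY = re.compile(r'^[\W\d_]*[^\W\d_]+[\W\d_]*$')

# Hypothetical sample responses.
samples = ["thanks", "  hello!", "really….", "2021", "Thanks a lot!"]

# Map each sample to (matches ANY_WORD, matches LETTERS_ONLY).
results = {
    s: (bool(ANY_WORD.search(s)), bool(LETTERS_ONLY.search(s)))
    for s in samples
}

for text, flags in results.items():
    print(repr(text), flags)
```

The one-word samples match both patterns, the multi-word sentence matches neither, and "2021" matches only the first pattern, since digits count as word characters for `\w` but are excluded by `[^\W\d_]`.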