extract uppercase words from string


I want to extract all the words that are entirely in uppercase (not just the first letter, but every letter in the word) from the strings in columnY of dataset X.

I have the following script:

X['uppercase'] = X['columnY'].str.extract('([A-Z][A-Z]+)')

But that only extracts the first uppercase word in each string.

Then I tried extractall:

X['uppercase'] = X['columnY'].str.extractall('([A-Z][A-Z]+)')

But I got the following error:

TypeError: incompatible index of inserted column with frame index

What am I doing wrong?

>Solution :

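str.extractall returns a DataFrame with one row per match and a MultiIndex of (original row, match number), so its index does not line up with X and it cannot be assigned directly as a column. Instead, we can apply re.findall to each row, which returns a list of all matches for that row: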

import re

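def extract_uppercase_words(text):
    # \b...\b matches whole words only. Note that + also matches single-letter
    # words such as "I"; use [A-Z]{2,} to require at least two letters.
    return re.findall(r'\b[A-Z]+\b', text)

# Store the list of matches for each row in a new column
X['uppercase'] = X['columnY'].apply(extract_uppercase_words)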
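
The same result can probably be had without a helper function by staying within the pandas string methods. The sketch below is only an illustration: the sample data is made up, the `uppercase_joined` column name is hypothetical, and the pattern assumes that "uppercase word" means two or more letters, as in the question's original regex. It shows `str.findall`, which returns one list per row, and one way to collapse the output of `str.extractall` back to one value per row so that it can be assigned.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's dataset X.
X = pd.DataFrame(
    {"columnY": ["the NASA and ESA teams", "no caps here", "I saw an UFO"]}
)

# Two or more uppercase letters forming a whole word (assumption: single
# letters such as "I" should not count, matching the question's regex).
pattern = r"\b[A-Z]{2,}\b"

# str.findall returns one list per row, aligned with X's index,
# so it can be assigned directly as a new column.
X["uppercase"] = X["columnY"].str.findall(pattern)

# str.extractall needs a capture group and returns one row per match,
# indexed by (original row, match number). Grouping on the first index
# level collapses it back to one value per original row.
matches = X["columnY"].str.extractall(r"(\b[A-Z]{2,}\b)")[0]
X["uppercase_joined"] = matches.groupby(level=0).agg(", ".join)

print(X)
```

Rows without any match get an empty list from `str.findall`, but end up as NaN in the `extractall` variant, because those rows never appear in its result.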
