I want to extract all the words that are entirely in uppercase (every letter in the word, not just the first) from the strings in columnY of dataset X.
I have the following script:
X['uppercase'] = X['columnY'].str.extract('([A-Z][A-Z]+)')
But that only extracts the first uppercase word in each string.
Then I tried extractall:
X['uppercase'] = X['columnY'].str.extractall('([A-Z][A-Z]+)')
But I got the following error:
TypeError: incompatible index of inserted column with frame index
What am I doing wrong?
Solution:
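The error occurs because str.extractall returns a DataFrame with one row per match and a MultiIndex (original index, match number), so it cannot be assigned directly as a column of X. One way around this is to apply re.findall to each string, which returns a list of all matches per row: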
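import re

def extract_uppercase_words(text):
    # Whole words made only of uppercase letters. This also matches single letters
    # such as "I"; use [A-Z]{2,} to require at least two, as in the question.
    return re.findall(r'\b[A-Z]+\b', text)

X['uppercase'] = X['columnY'].apply(extract_uppercase_words)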
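If you would rather stay within pandas' string methods, the sketch below shows two alternatives: Series.str.findall, which gives one list per row, and str.extractall followed by a groupby on the first index level to collapse the matches back to one value per original row. The sample data and the uppercase_joined column name are made up for illustration, and the pattern assumes you want at least two letters, as in the question's regex.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's dataset X.
X = pd.DataFrame(
    {"columnY": ["The NASA and ESA teams met", "no caps here", "I like SQL"]}
)

# Assumption: require two or more uppercase letters, like the question's pattern.
pattern = r"\b[A-Z]{2,}\b"

# Option 1: str.findall returns a list of all matches for each row.
X["uppercase"] = X["columnY"].str.findall(pattern)

# Option 2: extractall yields one row per match with a MultiIndex
# (original index, match number); group on level 0 to get one value per row.
matches = X["columnY"].str.extractall(r"\b([A-Z]{2,})\b")[0]
joined = matches.groupby(level=0).agg(", ".join)

# Rows with no match are absent from `joined`, so reindex to the frame's index.
X["uppercase_joined"] = joined.reindex(X.index, fill_value="")

print(X)
```

Option 1 is usually the simpler choice when a list per row is acceptable; option 2 is useful when you want a single joined string per row.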