I have the text below in a dataframe in Python:
Text = provide written informed consent healthy male or female age between 31 to 59 years fluent in german language
I need to look for the word age and keep the 5 words before and after it.
target value = age
My desired output:
result = healthy male or female age between 31 to 59 years
My code:
import re
Text = "provide written informed consent healthy male or female age between 31 to 59 years fluent in german language"
r1 = re.search(r"(?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,3} age (?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,3}", Text)
r1.group()
My result is:
age 16 years old
My data also contains words like manage or agent, which should be ignored.
Thanks.
>Solution :
One way to do so, without using regex, is to split the text into words and look up the position of age in the word list. Because the lookup compares whole words, manage and agent are not matched.
Text = "provide written informed consent healthy male or female age between 31 to 59 years fluent in german language"
words = Text.split()
i = words.index("age")  # position of the exact word "age"
# 4 words before "age" and 5 after it, as in the desired output
result = words[i - 4:i + 6]
print(" ".join(result))  # healthy male or female age between 31 to 59 years
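
The slice above assumes that age is present and sits at least four words into the text. If it is missing, index raises ValueError, and if it is closer to the start, the negative start index wraps around to the end of the list. A minimal sketch of a more defensive version follows. The helper names context_window and context_window_regex and their parameters are made up for illustration and are not part of the original answer. The second helper is a regex alternative that uses \b word boundaries so that manage and agent are skipped.

```python
import re


def context_window(text, target="age", before=4, after=5):
    """Return target with up to `before`/`after` surrounding words, or None."""
    words = text.split()
    if target not in words:  # whole-word comparison: "manage"/"agent" do not count
        return None
    i = words.index(target)
    start = max(i - before, 0)  # clamp so the start index never goes negative
    return " ".join(words[start:i + after + 1])


def context_window_regex(text, target="age", before=4, after=5):
    """Regex variant of the same idea, returning the matched span or None."""
    # \b on both sides keeps "manage" and "agent" from matching the target.
    pattern = r"(?:\S+\s+){0,%d}\b%s\b(?:\s+\S+){0,%d}" % (
        before,
        re.escape(target),
        after,
    )
    m = re.search(pattern, text)
    return m.group() if m else None


Text = "provide written informed consent healthy male or female age between 31 to 59 years fluent in german language"
print(context_window(Text))
print(context_window_regex(Text))
```

The two helpers can differ on punctuation: the split-based one only matches a token that is exactly age, while the regex one also matches age when it is followed by a comma or a period.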