Extract sentence and value from string in R

April 25, 2023

I have a dataset that looks like this:

> dput(df)
structure(list(Person.ID = c(123L, 234L), Date = c("10/10/09", 
"11/11/03"), Text = c("Here are some random words that I do not want. The person was allowed to cool to a core body temperature of 16.5 degrees centigrade. Here are some other random words I do not want.", 
"Here are some random words that I do not want. A cooling mechanism was applied to cool the patient to a core body temperature of 19.1 degrees centigrade. Here are some other random words I do not want."
)), class = "data.frame", row.names = c(NA, -2L))

For each person, I would like to extract the sentence that mentions their body temperature (and get rid of the unwanted words). I would also like a separate column that contains ONLY the temperature value. The desired output should look like:

> dput(df2)
structure(list(Person.ID = c(123L, 234L), Date = c("10/10/09", 
"11/11/03"), Text = c("The person was allowed to cool to a core body temperature of 16.5 degrees centigrade.", 
"A cooling mechanism was applied to cool the patient to a core body temperature of 19.1 degrees centigrade. "
), Value = c(16.5, 19.1)), class = "data.frame", row.names = c(NA, 
-2L))

Solution:

Here is one option:

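library(stringr)
library(tidyr)   # separate_longer_delim() needs tidyr >= 1.3.0
library(dplyr)

df %>% 
  # one row per sentence: split on whitespace that follows a period
  separate_longer_delim(Text, delim = regex("(?<=\\.)\\s+")) %>% 
  # keep only the sentence that mentions the temperature
  filter(str_detect(Text, "temperature")) %>% 
  # first number in that sentence; the decimal part is optional, so "9" also matches
  mutate(Value = as.numeric(str_extract(Text, "\\d+(?:\\.\\d+)?")))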

Output:

 Person.ID     Date
1       123 10/10/09
2       234 11/11/03
                                                                                                        Text Value
1                      The person was allowed to cool to a core body temperature of 16.5 degrees centigrade.  16.5
2 A cooling mechanism was applied to cool the patient to a core body temperature of 19.1 degrees centigrade.  19.1
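If you would rather not depend on the tidyverse, the same steps can be done in base R. This is a minimal sketch, assuming (as in the sample data) that sentences end with a period followed by whitespace and that the reading always follows the phrase "temperature of"; neither is guaranteed for the real dataset.

```r
# Sample data rebuilt from the dput() in the question
df <- data.frame(
  Person.ID = c(123L, 234L),
  Date = c("10/10/09", "11/11/03"),
  Text = c(
    "Here are some random words that I do not want. The person was allowed to cool to a core body temperature of 16.5 degrees centigrade. Here are some other random words I do not want.",
    "Here are some random words that I do not want. A cooling mechanism was applied to cool the patient to a core body temperature of 19.1 degrees centigrade. Here are some other random words I do not want."
  ),
  stringsAsFactors = FALSE
)

# Split each Text into sentences on whitespace that follows a period
sentences <- strsplit(df$Text, "(?<=\\.)\\s+", perl = TRUE)

# Keep the first sentence in each row that mentions "temperature" (NA if none)
df$Text <- vapply(sentences, function(s) {
  hit <- grep("temperature", s, fixed = TRUE, value = TRUE)
  if (length(hit) > 0) hit[1] else NA_character_
}, character(1))

# Capture the number (integer or decimal) directly after "temperature of"
pattern <- ".*temperature of (\\d+(?:\\.\\d+)?).*"
has_value <- grepl(pattern, df$Text, perl = TRUE)
df$Value <- NA_real_
df$Value[has_value] <- as.numeric(sub(pattern, "\\1", df$Text[has_value], perl = TRUE))

df
```

Anchoring the number to "temperature of" keeps any other number in the same sentence from being picked up. The optional group `(?:\\.\\d+)?` also accepts a whole-number reading such as 9, which a pattern like `\\d+\\.?\\d+?` cannot match: its lazy `\\d+?` still requires one digit, so it needs at least two digits in total.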