I have the first dataset below, and I want to create the desired dataset by splitting the text in its Name column. How could I do this?
Basically, the text should be split right after "XYZ-1" or "AAA-2": everything up to and including that code goes into a new Study column, and the rest goes into a Question column.
Any help is appreciated. Thanks!
1st dataset:
Name <- c("A B XYZ-1 Where","C AAA-2 When","ABC R SS XYZ-1 Where")
x <- data.frame(Name)
desired dataset:
Name <- c("A B XYZ-1 Where","C AAA-2 When","ABC R SS XYZ-1 Where")
Study <- c("A B XYZ-1","C AAA-2","ABC R SS XYZ-1")
Question <- c("Where","When","Where")
x <- data.frame(Name,Study,Question)
                  Name          Study Question
1      A B XYZ-1 Where      A B XYZ-1    Where
2         C AAA-2 When        C AAA-2     When
3 ABC R SS XYZ-1 Where ABC R SS XYZ-1    Where
Solution:
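Use separate() from tidyr and pass a regex with lookarounds in sep. The pattern matches one or more spaces (\\s+) that follow three uppercase letters, a hyphen and a digit ((?<=[A-Z]{3}-\\d)) and that precede an uppercase letter ((?=[A-Z])). Because lookarounds are zero-width, only the spaces are consumed, so "XYZ-1" or "AAA-2" stays at the end of Study and the question word becomes Question.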
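library(tidyr)

# Split Name into Study and Question at the space(s) right after a code
# such as "XYZ-1"; remove = FALSE keeps the original Name column
separate(x, Name, into = c("Study", "Question"),
         sep = "(?<=[A-Z]{3}-\\d)\\s+(?=[A-Z])", remove = FALSE)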
Output:
                  Name          Study Question
1      A B XYZ-1 Where      A B XYZ-1    Where
2         C AAA-2 When        C AAA-2     When
3 ABC R SS XYZ-1 Where ABC R SS XYZ-1    Where
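If you'd rather not depend on tidyr, a base R sketch along the same lines should also work. This swaps in strsplit() instead of separate(); it is an alternative, not part of the answer above. It reuses the same lookaround pattern, but needs perl = TRUE because base R's default regex engine does not support lookarounds. The variable names pattern and parts are just for illustration.

```r
# Base R alternative (assumption: no tidyr available); same regex as separate()
Name <- c("A B XYZ-1 Where", "C AAA-2 When", "ABC R SS XYZ-1 Where")
x <- data.frame(Name)

# Spaces preceded by e.g. "XYZ-1"/"AAA-2" and followed by an uppercase letter;
# the lookarounds are zero-width, so only the spaces are removed
pattern <- "(?<=[A-Z]{3}-\\d)\\s+(?=[A-Z])"

# perl = TRUE enables lookbehind/lookahead; as.character() guards against
# Name being a factor (the default in R < 4.0)
parts <- strsplit(as.character(x$Name), pattern, perl = TRUE)

# First piece is the Study, second piece is the Question
x$Study <- vapply(parts, `[`, character(1), 1)
x$Question <- vapply(parts, `[`, character(1), 2)

print(x)
```

A row whose Name contains no such code is not split, so its Question ends up NA rather than raising an error.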