Splitting text to create a new variable

January 9, 2023

I have the first dataset below, and I want to create the desired dataset by splitting the text in its Name column. How could I do this?

Basically, each value should be split right after "XYZ-1" or "AAA-2": the part up to and including that code becomes Study, and the rest becomes Question.
I appreciate any help. Thanks!

1st dataset:

Name <- c("A B XYZ-1 Where","C AAA-2 When","ABC R SS XYZ-1 Where")
x <- data.frame(Name)

desired dataset:

Name <- c("A B XYZ-1 Where","C AAA-2 When","ABC R SS XYZ-1 Where")
Study <- c("A B XYZ-1","C AAA-2","ABC R SS XYZ-1")
Question <- c("Where","When","Where")
x <- data.frame(Name,Study,Question)

Name                      Study             Question

A B XYZ-1 Where           A B XYZ-1         Where       
C AAA-2 When              C AAA-2           When        
ABC R SS XYZ-1 Where      ABC R SS XYZ-1    Where

>Solution :

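Use separate from tidyr and pass a regex with lookarounds as sep. The pattern matches one or more whitespace characters (\\s+) that follow three uppercase letters, a hyphen, and a digit ((?<=[A-Z]{3}-\\d)) and that precede an uppercase letter ((?=[A-Z])). Lookarounds consume no characters, so only the whitespace at the split point is dropped. remove = FALSE keeps the original Name column.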

library(tidyr)

# Split Name at the whitespace after the study code; keep the original column
separate(x, Name, into = c("Study", "Question"),
         sep = "(?<=[A-Z]{3}-\\d)\\s+(?=[A-Z])", remove = FALSE)

output:

                  Name          Study Question
1      A B XYZ-1 Where      A B XYZ-1    Where
2         C AAA-2 When        C AAA-2     When
3 ABC R SS XYZ-1 Where ABC R SS XYZ-1    Where
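
If tidyr is not available, the same split can be sketched in base R with sub() and two capture groups. This is only an illustration under the same assumption as the answer above, namely that every Name contains a code of three uppercase letters, a hyphen, and a digit followed by whitespace. The variable name pattern is made up for this example, and [0-9] is used instead of \\d purely for readability.

```r
# Rebuild the first dataset from the question
Name <- c("A B XYZ-1 Where", "C AAA-2 When", "ABC R SS XYZ-1 Where")
x <- data.frame(Name, stringsAsFactors = FALSE)

# Group 1: everything up to and including the study code (e.g. "XYZ-1")
# Group 2: everything after the whitespace that follows the code
# (pattern is a hypothetical helper name, not part of the original answer)
pattern <- "^(.*[A-Z]{3}-[0-9])\\s+(.*)$"

# Keep only the first group for Study and only the second for Question
x$Study    <- sub(pattern, "\\1", x$Name)
x$Question <- sub(pattern, "\\2", x$Name)

print(x)
```

Note that sub() returns the input unchanged when the pattern does not match, so a Name without a study code would end up copied into both new columns rather than raising an error.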