Converting rows into a categorical column using R

I have a transcribed interview and the data is organized as follows:

[1,]  "Interviewer"
[2,]  "What is your favorite food?"
[3,]  "Interviewee"
[4,]  "I love to eat pizza"
[5,]  "Interviewer"
[6,]  "Cool. But have you ever tried eating salad?"
[7,]  "Interviewee "
[8,]  "Yeah..."
[9,]  "Interviewer"
[10,] "I love salad, pizza is bad."
[11,] "Interviewee "
[12,] "I don't totally agree" 

I would like to remove the author of the speech from the rows and turn it into a categorical column, as in the example:

      [,1]                [,2]  
[1,]  "Interviewer"       "What is your favorite food?"
[2,]  "Interviewee"       "I love to eat pizza"
[3,]  "Interviewer"       "Cool. But have you ever tried eating salad?"
[4,]  "Interviewee"       "Yeah..."
[5,]  "Interviewer"       "I love salad, pizza is bad."
[6,]  "Interviewee"       "I don't totally agree"

The interview is a conversation between two people.
Does anyone know how to do this?
Thanks in advance!

Solution:

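We can create a grouping variable by taking the cumulative sum of grepl matches on lines that start with "Interview", then split the vector by that group and rbind the pieces. This assumes each speaker label is followed by exactly one line of speech.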

do.call(rbind, split(v1, cumsum(grepl("^Interview", v1))))

-output

  [,1]           [,2]                                         
1 "Interviewer"  "What is your favorite food?"                
2 "Interviewee"  "I love to eat pizza"                        
3 "Interviewer"  "Cool. But have you ever tried eating salad?"
4 "Interviewee " "Yeah..."                                    
5 "Interviewer"  "I love salad, pizza is bad."                
6 "Interviewee " "I don't totally agree"        
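
To see why the grouping works, it can help to look at the intermediate values. The following is a minimal sketch using the same v1 as in the data section; the names is_speaker, grp, pieces and res are introduced here only for illustration. Note that it would misclassify any line of speech that itself begins with "Interview".

```r
v1 <- c("Interviewer", "What is your favorite food?", "Interviewee",
        "I love to eat pizza", "Interviewer",
        "Cool. But have you ever tried eating salad?",
        "Interviewee ", "Yeah...", "Interviewer", "I love salad, pizza is bad.",
        "Interviewee ", "I don't totally agree")

# TRUE on the speaker-label lines, FALSE on the lines of speech
is_speaker <- grepl("^Interview", v1)

# The running count increases by one at every label, so a label and the
# line that follows it share the same group id
grp <- cumsum(is_speaker)

# One character vector per group, then stacked row-wise into a matrix
pieces <- split(v1, grp)
res <- do.call(rbind, pieces)
```

Here grp is 1 1 2 2 3 3 4 4 5 5 6 6, which is what lets split pair each label with its utterance.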

If speaker labels and lines of speech strictly alternate, either use a recycled logical index to create two columns

cbind(v1[c(TRUE, FALSE)], v1[c(FALSE, TRUE)])
     [,1]           [,2]                                         
[1,] "Interviewer"  "What is your favorite food?"                
[2,] "Interviewee"  "I love to eat pizza"                        
[3,] "Interviewer"  "Cool. But have you ever tried eating salad?"
[4,] "Interviewee " "Yeah..."                                    
[5,] "Interviewer"  "I love salad, pizza is bad."                
[6,] "Interviewee " "I don't totally agree"   
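
The recycling can be illustrated on a smaller vector. This is only a sketch: x, labels and speech are hypothetical names, and the length check is an added safeguard that is not part of the original answer.

```r
# Hypothetical four-element vector, used only for illustration
x <- c("A", "a says hi", "B", "b says hi")

# Guard: an odd length would leave a label without a line of speech
stopifnot(length(x) %% 2 == 0)

# c(TRUE, FALSE) is recycled to TRUE FALSE TRUE FALSE, keeping positions 1 and 3
labels <- x[c(TRUE, FALSE)]

# c(FALSE, TRUE) is recycled to FALSE TRUE FALSE TRUE, keeping positions 2 and 4
speech <- x[c(FALSE, TRUE)]

cbind(labels, speech)
```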

Or fill a two-column matrix row by row

matrix(v1, ncol = 2, byrow = TRUE)
     [,1]           [,2]                                         
[1,] "Interviewer"  "What is your favorite food?"                
[2,] "Interviewee"  "I love to eat pizza"                        
[3,] "Interviewer"  "Cool. But have you ever tried eating salad?"
[4,] "Interviewee " "Yeah..."                                    
[5,] "Interviewer"  "I love salad, pizza is bad."                
[6,] "Interviewee " "I don't totally agree"                
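
The outputs above still contain "Interviewee " with a trailing space, and the speaker column is plain character rather than categorical. One possible way to finish the job, sketched here under the assumption that a data frame with a factor column is acceptable, is to trim the labels with trimws and wrap them in factor. The names m, df, speaker and text are chosen for illustration only.

```r
v1 <- c("Interviewer", "What is your favorite food?", "Interviewee",
        "I love to eat pizza", "Interviewer",
        "Cool. But have you ever tried eating salad?",
        "Interviewee ", "Yeah...", "Interviewer", "I love salad, pizza is bad.",
        "Interviewee ", "I don't totally agree")

# Pair each label with the line that follows it
m <- matrix(v1, ncol = 2, byrow = TRUE)

# trimws() removes the stray trailing space so that "Interviewee " and
# "Interviewee" end up in the same factor level
df <- data.frame(
  speaker = factor(trimws(m[, 1])),
  text = m[, 2],
  stringsAsFactors = FALSE
)

levels(df$speaker)
```

Without the trimws call the factor would have three levels instead of two, because the padded and unpadded labels are different strings.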

data

v1 <- c("Interviewer", "What is your favorite food?", "Interviewee", 
"I love to eat pizza", "Interviewer", 
"Cool. But have you ever tried eating salad?", 
"Interviewee ", "Yeah...", "Interviewer", "I love salad, pizza is bad.", 
"Interviewee ", "I don't totally agree")