Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Reshape strings within a dataframe in R

I have a dataframe that looks like this

data.frame(service=c("1,2,3,4","2,4,5","1,3,4"),
                      score=c("1,2,1,3","1,1,3","1,1,1"))
        a       b
1 1,2,3,4 1,2,1,3
2   2,4,5   1,1,3
3   1,3,4   1,1,1

The variable service refers to a categorical variable from 1-5, while the score, is also a categorical variable from 1-3 for each of the previous service. Note that each respondent have taken different services, so the length is not the same for each variable is not the same

I need to reshape this dataframe to associate the score to each service. The final result would look like this

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

data_frame(ind=c(1,1,1,1,2,2,2,3,3,3),
                 serv=c(1,2,3,4,2,4,5,1,3,4),
                 score=c(1,2,1,3,1,1,3,1,1,1))

    ind  serv score
   <dbl> <dbl> <dbl>
 1     1     1     1
 2     1     2     2
 3     1     3     1
 4     1     4     3
 5     2     2     1
 6     2     4     1
 7     2     5     3
 8     3     1     1
 9     3     3     1
10     3     4     1

I first split the variable service to create all the categories in this way

library("qdapTools","tidyverse")

lst1 <- lapply(strsplit(df$serv, ","), function(x) 
  replace(x, (! x %in% c("1", "2", "3","4","5")) & !is.na(x), "other"))

lst1_tab<-mtabulate(lst1)%>% setNames(paste0('serv_', names(.)))
df<-cbind(df,lst1_tab)

     serv   score serv_1 serv_2 serv_3 serv_4 serv_5
1 1,2,3,4 1,2,1,3      1      1      1      1      0
2   2,4,5   1,1,3      0      1      0      1      1
3   1,3,4   1,1,1      1      0      1      1      0

I did that to later reshape the df in long form. However, not all individuals took all services. For instance, individual 1 did not take service 5. Therefore, I didn’t know how to split also the variable score to associate it to each service

>Solution :

We could use

library(dplyr)
library(tidyr)
df1 %>% 
 mutate(ind = row_number(), .before = 1) %>% 
 separate_longer_delim(c(service, score), delim = ",")

-output

   ind service score
1    1       1     1
2    1       2     2
3    1       3     1
4    1       4     3
5    2       2     1
6    2       4     1
7    2       5     3
8    3       1     1
9    3       3     1
10   3       4     1
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading