I have a dataframe that looks like this
data.frame(service=c("1,2,3,4","2,4,5","1,3,4"),
score=c("1,2,1,3","1,1,3","1,1,1"))
a b
1 1,2,3,4 1,2,1,3
2 2,4,5 1,1,3
3 1,3,4 1,1,1
The variable service refers to a categorical variable from 1-5, while the score, is also a categorical variable from 1-3 for each of the previous service. Note that each respondent have taken different services, so the length is not the same for each variable is not the same
I need to reshape this dataframe to associate the score to each service. The final result would look like this
data_frame(ind=c(1,1,1,1,2,2,2,3,3,3),
serv=c(1,2,3,4,2,4,5,1,3,4),
score=c(1,2,1,3,1,1,3,1,1,1))
ind serv score
<dbl> <dbl> <dbl>
1 1 1 1
2 1 2 2
3 1 3 1
4 1 4 3
5 2 2 1
6 2 4 1
7 2 5 3
8 3 1 1
9 3 3 1
10 3 4 1
I first split the variable service to create all the categories in this way
library("qdapTools","tidyverse")
lst1 <- lapply(strsplit(df$serv, ","), function(x)
replace(x, (! x %in% c("1", "2", "3","4","5")) & !is.na(x), "other"))
lst1_tab<-mtabulate(lst1)%>% setNames(paste0('serv_', names(.)))
df<-cbind(df,lst1_tab)
serv score serv_1 serv_2 serv_3 serv_4 serv_5
1 1,2,3,4 1,2,1,3 1 1 1 1 0
2 2,4,5 1,1,3 0 1 0 1 1
3 1,3,4 1,1,1 1 0 1 1 0
I did that to later reshape the df in long form. However, not all individuals took all services. For instance, individual 1 did not take service 5. Therefore, I didn’t know how to split also the variable score to associate it to each service
>Solution :
We could use
library(dplyr)
library(tidyr)
df1 %>%
mutate(ind = row_number(), .before = 1) %>%
separate_longer_delim(c(service, score), delim = ",")
-output
ind service score
1 1 1 1
2 1 2 2
3 1 3 1
4 1 4 3
5 2 2 1
6 2 4 1
7 2 5 3
8 3 1 1
9 3 3 1
10 3 4 1