Home R how to use pivot wider taking column names from multiple columns, and adding 1 or zero based on presence/absence per sample

Questions

R how to use pivot wider taking column names from multiple columns, and adding 1 or zero based on presence/absence per sample

byMR

August 19, 2022

I am searching a way for obtaining a large table from a table with a list of species.

Here I give you an example:

I have something like this

data.frame(survey_id = c("ID_1", "ID_2", "ID_3", "ID_4", "ID_5"), 
           list_1 = c("A", "A", "A", "B", "A"), 
           list_2 = c("B", "D", "E", "E", "F"), 
           list_3 = c("C", "", "", "F", ""))

and I want to obtain this

data.frame(survey_id = c("ID_1", "ID_2", "ID_3", "ID_4", "ID_5"), 
           A = c(1,1,1,0,1), 
           B = c(1,0,0,1,0),
           C = c(1,0,0,0,0), 
           D = c(0,1,0,0,0), 
           E = c(0,0,1,1,0), 
           F = c(0,0,0,1,1))

Any suggestion using pivot_wider function?

Thanks for your help.

>Solution :

A slightly round-a-bout way: pivot_longer first, so as you can count by (survey_id, value), and then pivot_wider.

library(dplyr)
library(tidyr)

df1 %>% 
  pivot_longer(-Survey_ID) %>% 
  count(Survey_ID, value) %>% 
  filter(value != "") %>% 
  pivot_wider(names_from = "value", 
              values_from = "n") %>% 
  replace(is.na(.), 0)

Result:

# A tibble: 5 × 7
  Survey_ID     A     B     C     D     E     F
  <chr>     <int> <int> <int> <int> <int> <int>
1 ID_1          1     1     1     0     0     0
2 ID_2          1     0     0     1     0     0
3 ID_3          1     0     0     0     1     0
4 ID_4          0     1     0     0     1     1
5 ID_5          1     0     0     0     0     1

Data df1:

df1 <- data.frame(survey_id = c("ID_1", "ID_2", "ID_3", "ID_4", "ID_5"), 
                  list_1 = c("A", "A", "A", "B", "A"), 
                  list_2 = c("B", "D", "E", "E", "F"), 
                  list_3 = c("C", "", "", "F", ""))