I have a data frame in R which is organized with sampled trees in rows and different lichen species in columns. Count data for each lichen species per tree is listed in each column.
| Tree_ID | Lichen_A | Lichen_B | Lichen_C |
|---|---|---|---|
| Tree1 | 1 | 2 | 3 |
| Tree2 | 2 | 5 | 7 |
I would like to categorize the lichens and count the number seen per tree according to category. Ideally, the categories could be imported from another data frame which could look like this. Certain lichens will share the same category.
| Lichen | Sensitivity |
|---|---|
| Lichen_A | Low |
| Lichen_B | Low |
| Lichen_C | Medium |
Combining these to get the sum of the categories would look like this:
| Tree_ID | Low | Medium |
|---|---|---|
| Tree1 | 3 | 3 |
| Tree2 | 7 | 7 |
In reality I have about 40 different lichen species, two sensitivity categories and multiple datasets of trees, so I’d like to find a solution that means I don’t need to do everything manually.
I am not sure how to set up the code for this, though I have a feeling it should be possible, perhaps with a loop and/or dplyr mutate?
>Solution :
With dplyr and tidyr:
library(dplyr)
library(tidyr)
df1 %>%
pivot_longer(unique(df2$Lichen)) %>%
inner_join(df2, by = c("name" = "Lichen")) %>%
group_by(Tree_ID, Sensitivity) %>%
summarise(value = sum(value)) %>%
pivot_wider(names_from = "Sensitivity")
Tree_ID Low Medium
1 Tree1 3 3
2 Tree2 7 7
data
df1 <- read.table(header = T,text = "Tree_ID Lichen_A Lichen_B Lichen_C
Tree1 1 2 3
Tree2 2 5 7")
df2 <- read.table(header = T,text= "
Lichen Sensitivity
Lichen_A Low
Lichen_B Low
Lichen_C Medium")