How do I add a column of abundance data by summing observations per site?

March 8, 2022

I have a dataframe containing observations of scallop presence/absence across multiple sites. I would like to count the number of scallops per site, using the UID (unique identifier) and the presence/absence column (binary: 0 is absent, 1 is present).

My dataframe looks like this:

UID	Present.Absent	Size.cm	binary
A-10-2021	Present	4.60	1
A-10-2021	Present	6.0	1
A-11-2021	Present	4.70	1
A-11-2021	Present	4.8	1
A-4-2021	Absent	NA	0
A-5-2021	Present	5.90	1
A-5-2021	Present	6.00	1
A-5-2021	Present	6.00	1
A-5-2021	Present	3.90	1
A-5-2021	Present	5.00	1
A-6-2021	Absent	NA	0

and it goes on for about ~6000 observations, with about 1500 different UIDs

I am new to R, and wasn’t sure how to go about this. Is there a way to have it so there’s one row per UID, with a column of abundance data? Any help is much appreciated, and if any additional information would help, I am happy to provide. Thank you!

Edit: added sample of data ; first 10 rows

structure(list(UID = c("A-10-2021", "A-10-2021", "A-11-2021", 
"A-11-2021", "A-1-2021", "A-1-2021", "A-1-2021", "A-12-2021", 
"A-12-2021", "A-12-2021"), Present.Absent = c("Present", "Present", 
"Present", "Present", "Present", "Present", "Present", "Present", 
"Present", "Present"), Alive.Dead = c("Alive", "Alive", "Alive", 
"Alive", "Alive", "Alive", "Alive", "Alive", "Alive", "Alive"
), Size.cm = c(4.6, 5.25, 4.7, 5.1, 3.5, 3.9, 4.7, 4.7, 4.9, 
4.9), binary = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1)), row.names = c(3L, 
4L, 9L, 10L, 14L, 15L, 17L, 36L, 37L, 38L), class = "data.frame")

>Solution :

You can use group_by() to achieve that:

# Your data
temp1 <- structure(list(UID = c("A-10-2021", "A-10-2021", "A-11-2021", 
"A-11-2021", "A-1-2021", "A-1-2021", "A-1-2021", "A-12-2021", 
"A-12-2021", "A-12-2021"), Present.Absent = c("Present", "Present", 
"Present", "Present", "Present", "Present", "Present", "Present", 
"Present", "Present"), Alive.Dead = c("Alive", "Alive", "Alive", 
"Alive", "Alive", "Alive", "Alive", "Alive", "Alive", "Alive"
), Size.cm = c(4.6, 5.25, 4.7, 5.1, 3.5, 3.9, 4.7, 4.7, 4.9, 
4.9), id = c(3L, 4L, 9L, 10L, 14L, 15L, 17L, 36L, 37L, 38L)), row.names = c(3L, 
4L, 9L, 10L, 14L, 15L, 17L, 36L, 37L, 38L), class = "data.frame")

Note that you can first create your binary column (isPresent) by using mutate() and ifelse().

library(tidyverse)

# Option 1: Create a new column with abundance, by UID, but keep the number of rows
temp1 %>% mutate(isPresent = ifelse(Present.Absent == "Present", 1, 0)) %>% group_by(UID) %>% mutate(abundance = sum(isPresent))

# Option 2: Get a summary, with one row per UID
temp1 %>% mutate(isPresent = ifelse(Present.Absent == "Present", 1, 0)) %>% group_by(UID) %>% summarise(abundance = sum(isPresent))