Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

R Code – Counting the number of columns that have data by group

I have a dataframe with number of times a bird species was detected at a site (n=110) during two separate visits:

   Site     Yr Total.ACFL Total.AMCR Total.AMGO Total.AMKE Total.AMRE
   <chr> <int>      <int>      <int>      <int>      <int>      <int>
 1 BRY1   2021          0          0          0          0          0
 2 BRY2   2021          0          0          3          0          0
 3 BRY2   2021          1          0          2          0          0
 4 BRY3   2021          0          0          2          0          0
 5 BRY3   2021          0          0          7          0          0
 6 BRY4   2021          0          0          1          0          0
 7 BRY4   2021          0          0          1          0          0
 8 BRY5   2021          0          0          4          0          0
 9 BRY5   2021          0          0          4          0          0
10 BRY6   2021          0          0          8          0          0

There are a total of 56 species (columns that start with Total.). What I want to do is determine the total species richness (number of unique species detected) at each site. I want to produce a script that does this by determining the number of columns (e.g. Total.ACFL) that contain at least one detection per site. For example, the site BRY2 would have a species richness of 2 since AMGO and ACFL were detected during the two visits.

Any help would be greatly appreciated.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I have not tried anything as of yet (at a bit of a loss, frankly), but I am hoping to be steered in the proper direction. Assuming I might use the vegan package to my advantage.

>Solution :

pivot_longer and summarize

You can first reshape your dataframe into a "long" format for easier manipulation, then count the number of species having the value > 0.

library(dplyr)

df |> 
  pivot_longer(starts_with("Total")) |> 
  summarize(value = sum(value), .by = c(Site, Yr, name)) |> 
  summarize(count = sum(value > 0), .by = c(Site, Yr))

summarize and reframe

Another method using reframe(). This does not require reshaping the dataframe.

df |>
  summarize(across(starts_with("Total"), sum), .by = c(Site, Yr)) |>
  reframe(count = rowSums(pick(starts_with("Total")) > 0), .by = c(Site, Yr))

They have the same output:

# A tibble: 6 × 3
  Site     Yr count
  <chr> <int> <dbl>
1 BRY1   2021     0
2 BRY2   2021     2
3 BRY3   2021     1
4 BRY4   2021     1
5 BRY5   2021     1
6 BRY6   2021     1
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading