Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to consider `NA` when counting using `group_by()` and two factor variables in R

I have a dataframe df1 in which I have different sites (df1$Site) belonging to different regions (df1$Regions) in which I have data about evidence of herbivory and its type (df1$Herbivory_type). When there is no herbivory then, df1$Herbivory_type is NA. Below I show an example of my dataframe:

df1 <- data.frame(Region=c("ALI1","ALI1","ALI1","ALI1","ALI2","ALI2","ALI2","ALI3","ALI3","ALI3","ALI3","ALI5","ALI5"),
                  Site=c("ALI1_A","ALI1_B","ALI1_C","ALI1_D","ALI2_A","ALI2_B","ALI2_C","ALI3_A","ALI3_B","ALI3_C","ALI3_D","ALI5_A","ALI5_B"),
                  Herbivory_type=c(NA,"S",NA,NA,NA,NA,NA,NA,"S","S",NA,NA,"S"))

df1$Herbivory_type <- as.factor(df1$Herbivory_type)

df1

   Region   Site Herbivory_type
1    ALI1 ALI1_A           <NA>
2    ALI1 ALI1_B              S
3    ALI1 ALI1_C           <NA>
4    ALI1 ALI1_D           <NA>
5    ALI2 ALI2_A           <NA>
6    ALI2 ALI2_B           <NA>
7    ALI2 ALI2_C           <NA>
8    ALI3 ALI3_A           <NA>
9    ALI3 ALI3_B              S
10   ALI3 ALI3_C              S
11   ALI3 ALI3_D           <NA>
12   ALI5 ALI5_A           <NA>
13   ALI5 ALI5_B              S

I need to know the number of episodes of herbivorism by Region considering NA in the counting in df1$Site. I would expect this result:

df2

   Region N_Hervivory_S
1   ALI1             1
2   ALI2             0   # All sites have `NA`, thus, herbivorims is 0 in this region.
3   ALI3             2
4   ALI5             1

I tried this:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

as.data.frame(df1 %>% group_by(Region,Herbivory_type) %>% summarise(N = n()))

But the output is not what I desire

  Region Herbivory_type N
1   ALI1              S 1
2   ALI1           <NA> 3
3   ALI2           <NA> 3
4   ALI3              S 2
5   ALI3           <NA> 2
6   ALI5              S 1
7   ALI5           <NA> 1

Does anyone know how to do it?

Thanks in advance

>Solution :

You can use count() to sum up !is.na(Herbivory_type) by group, and get the number of non-missing values for each region.

library(dplyr)

df1 %>%
  count(Region, wt = !is.na(Herbivory_type))

# # A tibble: 4 × 2
#   Region   res
#   <chr>  <int>
# 1 ALI1       1
# 2 ALI2       0
# 3 ALI3       2
# 4 ALI5       1
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading