Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

R: Calculate percentage of observations in a column that are below a certain value for panel data

I have panel data and I would like to get the percentage of observations in a column (Size) that are below 1 million.

My data is the following:

structure(list(Product = c("A", "A", "A", "A", "A", "A", "B", 
"B", "B", "B", "B", "B", "C", "C", "C", "C", "C", "C"), Date = c("02.05.2018", 
"04.05.2018", "05.05.2018", "06.05.2018", "07.05.2018", "08.05.2018", 
"02.05.2018", "04.05.2018", "05.05.2018", "06.05.2018", "07.05.2018", 
"08.05.2018", "02.05.2018", "04.05.2018", "05.05.2018", "06.05.2018", 
"07.05.2018", "08.05.2018"), Size = c(100023423, 1920, 2434324342, 
2342353566, 345345345, 432, 1.35135e+11, 312332, 23434, 4622436246, 
3252243, 234525, 57457457, 56848648, 36363546, 36535636, 2345, 
2.52646e+11)), class = "data.frame", row.names = c(NA, -18L))

So for instance, for Product A it would be 33.33% since two out of 6 observations are below one million.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I have tried the following in R

df <- df %>%
  group_by(Product) %>%
  dplyr:: summarise(CountDate = n(), SmallSize = count(Size<1000000))

However, I get an error saying that "no applicable method for ‘count’ applied to an object of class "logical"" eventhough the column Size has the format double.

After the code above I would then calculate SmallSize/CountDate to get the percentage.

What do I need to adjust to not get the error message?

>Solution :

Instead of count, which requires a data.frame/tibble, use sum on a logical vector to get the count – TRUE values will be counted as 1 and FALSE as 0

library(dplyr)
df %>%
  group_by(Product) %>%
  dplyr:: summarise(CountDate = n(),
     SmallSize = sum(Size<1000000, na.rm = TRUE), .groups = "drop") %>%
  dplyr::mutate(Percent = SmallSize/CountDate)
# A tibble: 3 × 4
  Product CountDate SmallSize Percent
  <chr>       <int>     <int>   <dbl>
1 A               6         2   0.333
2 B               6         3   0.5  
3 C               6         1   0.167

Also, we don’t need to create both the columns. It can be directly calculated with mean

df %>%
    group_by(Product) %>%
    dplyr::summarise(Percent = mean(Size < 1000000, na.rm = TRUE))
# A tibble: 3 × 2
  Product Percent
  <chr>     <dbl>
1 A         0.333
2 B         0.5  
3 C         0.167
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading