# R: Calculate percentage of observations in a column that are below a certain value for panel data

I have panel data and I would like to get the percentage of observations in a column (Size) that are below 1 million.

My data is the following:

``````structure(list(Product = c("A", "A", "A", "A", "A", "A", "B",
"B", "B", "B", "B", "B", "C", "C", "C", "C", "C", "C"), Date = c("02.05.2018",
"04.05.2018", "05.05.2018", "06.05.2018", "07.05.2018", "08.05.2018",
"02.05.2018", "04.05.2018", "05.05.2018", "06.05.2018", "07.05.2018",
"08.05.2018", "02.05.2018", "04.05.2018", "05.05.2018", "06.05.2018",
"07.05.2018", "08.05.2018"), Size = c(100023423, 1920, 2434324342,
2342353566, 345345345, 432, 1.35135e+11, 312332, 23434, 4622436246,
3252243, 234525, 57457457, 56848648, 36363546, 36535636, 2345,
2.52646e+11)), class = "data.frame", row.names = c(NA, -18L))
``````

So for instance, for Product A it would be 33.33% since two out of 6 observations are below one million.

I have tried the following in R

``````df <- df %>%
group_by(Product) %>%
dplyr:: summarise(CountDate = n(), SmallSize = count(Size<1000000))

``````

However, I get an error saying that "no applicable method for ‘count’ applied to an object of class "logical"" eventhough the column Size has the format double.

After the code above I would then calculate SmallSize/CountDate to get the percentage.

What do I need to adjust to not get the error message?

### >Solution :

Instead of `count`, which requires a data.frame/tibble, use `sum` on a logical vector to get the count – `TRUE` values will be counted as 1 and `FALSE` as 0

``````library(dplyr)
df %>%
group_by(Product) %>%
dplyr:: summarise(CountDate = n(),
SmallSize = sum(Size<1000000, na.rm = TRUE), .groups = "drop") %>%
dplyr::mutate(Percent = SmallSize/CountDate)
# A tibble: 3 × 4
Product CountDate SmallSize Percent
<chr>       <int>     <int>   <dbl>
1 A               6         2   0.333
2 B               6         3   0.5
3 C               6         1   0.167
``````

Also, we don’t need to create both the columns. It can be directly calculated with `mean`

``````df %>%
group_by(Product) %>%
dplyr::summarise(Percent = mean(Size < 1000000, na.rm = TRUE))
# A tibble: 3 × 2
Product Percent
<chr>     <dbl>
1 A         0.333
2 B         0.5
3 C         0.167
``````