Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How do I sum unique observations (var levels) for each level of another variable?

I have a dataset with 20000+ observations. I converted var ‘worker’ into a factor to find out how many workers there were (329 levels). Var org (the organisation workers belong to) is also a factor. The n column corresponds to the number of projects each worker completed with their org.

I need to know how many workers each org has (NOT projects/observations). How do I go about it?

    > data %>% group_by(data$org, data$worker) %>% count(data$org)
# A tibble: 329 x 3
# Groups:   data$org, data$worker[329]
   `data$org`         `data$worker`                  n
   <fct>            <fct>                          <int>
 1 ccdi-3           0ba0867be1b6c27645801da785cd00    19
 2 ccdi-3           0bee59c83cf1d6f269911722936d91   134
 3 ccdi-3           0d3280866efc2d9634f448cafcb520    79
 4 ccdi-3           12b7978ff64eab8022d858792c2235   135
 5 ccdi-3           19f3bf29cddde4018769536ff96f2b    85
 6 ccdi-3           1bd6a0f3b6c14a319923896c60ecc6   123
 7 ccdi-3           20761d73cccc97d45a143b1f776e90    32
 8 ccdi-3           2231d8b26e563b373c474003ba2f2c   124
 9 ccdi-3           2530d02cb21de9541bf498410a9926    45
10 ccdi-3           2ac3cb439ace387a397398484297bd     3
# ... with 319 more rows

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

You could group by org and summarise with n_distinct:

library(tidyverse)

# Sample data
df <- tribble(
  ~org, ~worker, ~n,
  "ccdi-3", "0ba0867be1b6c27645801da785cd00", 19,
  "ccdi-3", "0bee59c83cf1d6f269911722936d91", 134,
  "ccdi-3", "0d3280866efc2d9634f448cafcb520", 79,
  "ccdi-3", "12b7978ff64eab8022d858792c2235", 135,
  "ccdi-3", "19f3bf29cddde4018769536ff96f2b", 85,
  "ccdi-3", "1bd6a0f3b6c14a319923896c60ecc6", 123
)

df |> 
  group_by(org) |> 
  summarise(n_workers = n_distinct(worker)) |> 
  ungroup()

#> # A tibble: 1 × 2
#>   org    n_workers
#>   <chr>      <int>
#> 1 ccdi-3         6

Created on 2022-07-18 by the reprex package (v2.0.1)

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading