Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

dplyr summarise on group not flattening data

I have a dataset:

df <- structure(list(ID = c(101188, 101192, 101193, 101196, 101198, 
101202, 101203, 101206, 101211, 101212, 101216, 101219, 101220, 
101222, 101223, 101224, 101226, 101227, 101228, 101229), LA = c("Barking and Dagenham", 
"Barking and Dagenham", "Barking and Dagenham", "Barking and Dagenham", 
"Barking and Dagenham", "Barking and Dagenham", "Barking and Dagenham", 
"Barking and Dagenham", "Barking and Dagenham", "Barking and Dagenham", 
"Barking and Dagenham", "Barking and Dagenham", "Barking and Dagenham", 
"Barking and Dagenham", "Barking and Dagenham", "Barking and Dagenham", 
"Barking and Dagenham", "Barking and Dagenham", "Barking and Dagenham", 
"Barking and Dagenham"), EstablishmentGroup = c("Local authority maintained schools", 
"Local authority maintained schools", "Local authority maintained schools", 
"Local authority maintained schools", "Local authority maintained schools", 
"Local authority maintained schools", "Local authority maintained schools", 
"Local authority maintained schools", "Local authority maintained schools", 
"Local authority maintained schools", "Local authority maintained schools", 
"Local authority maintained schools", "Local authority maintained schools", 
"Local authority maintained schools", "Local authority maintained schools", 
"Local authority maintained schools", "Local authority maintained schools", 
"Local authority maintained schools", "Local authority maintained schools", 
"Local authority maintained schools")), row.names = c(NA, -20L
), class = c("tbl_df", "tbl", "data.frame"))

If I run the following code I expect the final summarise to flatten the data and tell me

df %>%
  group_by(LA) %>%
  mutate(All_schools = n()) %>%
  ungroup() %>%
  group_by(LA, EstablishmentGroup, All_schools) %>%
  summarise(total = n(),
            per = total/All_schools)

Barking and Dagenham Local authority maintained schools 20 20 1

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

But it gives me 20 rows instead. I could use a distinct, but not sure what I’ve done wrong.

>Solution :

You can summarise the count first, then mutate to calculate the percentage.

df %>%
  group_by(LA) %>%
  mutate(All_schools = n()) %>%
  ungroup() %>% 
  group_by(LA, EstablishmentGroup, All_schools) %>% 
  summarise(total = n()) %>% 
  mutate(per = total/All_schools)

Output:

# A tibble: 1 x 5
# Groups:   LA, EstablishmentGroup [1]
  LA                   EstablishmentGroup                 All_schools total   per
  <chr>                <chr>                                    <int> <int> <dbl>
1 Barking and Dagenham Local authority maintained schools          20    20     1
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading