How to calculate percentages within a subset of a column

I originally asked this question here:

https://datascience.stackexchange.com/questions/113526/need-help-with-calculating-percentage-across-a-subset-of-column

I need help calculating percentages on a subset of my data. The code below creates a sample of the data:

dff4 <- data.frame(stringsAsFactors = FALSE, check.names = FALSE,
    Region = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B",
        "C", "C", "C", "C", "C"),
    Brand = c("B1", "B2", "B3", "B4", "B5", "B1", "B2", "B3", "B4", "B5",
        "B1", "B2", "B3", "B4", "B5"),
    `2018` = c(2923, 2458, 2812, 2286, 1683, 1085, 2805, 3214, 1059, 1866,
        3280, 2481, 2016, 1230, 1763),
    `2019` = c(2497, 2306, 2264, 3602, 3381, 1778, 2470, 2249, 2297, 3264,
        1071, 2345, 3815, 3685, 1381),
    `2020` = c(3458, 1448, 2033, 1021, 2275, 1527, 1316, 2229, 3029, 1054,
        3590, 2978, 2633, 3531, 2608),
    `2021` = c(1496, 2196, 1448, 2344, 3853, 3499, 1681, 3282, 1693, 2102,
        2235, 2007, 3796, 3394, 2421),
    `2022` = c(3759, 3371, 2908, 3222, 1720, 2862, 3767, 2544, 3299, 3961,
        1030, 1268, 2652, 3656, 3053))

Created on 2022-08-15 by the reprex package (v2.0.1)

I want to calculate the percentage share of Brand B1 within each region, for every year column. In other words, each value should be divided by the total for its own region, like "% of parent total" in an Excel pivot table.

I tried the code below, but it divides each value by the total of the whole column rather than by the total for its Region.

dff4 %>% mutate(across(where(is.double), ~./sum(.), .names = "perc_{.col}"))

I also tried the code below. It gives exactly the answer I want, but I cannot apply it to all the columns without writing it out separately for each year from 2018 to 2022.

transform(dff4, percent = ave(dff4$`2018`, dff4$Region, 
    FUN = prop.table))

Any help would be appreciated.

Solution:

I think what you’re looking for is to group by Region first, so that sum() is computed within each region instead of over the whole column:

library(dplyr)
dff4 <- dff4 %>% 
  group_by(Region) %>% 
  # divide each year column by its per-Region total
  mutate(across(`2018`:`2022`, ~ .x / sum(.x), .names = "perc_{.col}")) %>% 
  ungroup()
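
To check a grouped result like this and read off only the B1 shares, something along the following lines could be used. It is a minimal sketch on a small made-up data frame (`toy`, with hypothetical values, not the `dff4` from the question); the names `shares`, `check` and `b1` are also just for illustration.

```r
library(dplyr)

# Hypothetical toy data (NOT the question's dff4): two regions, two brands.
toy <- data.frame(
  stringsAsFactors = FALSE, check.names = FALSE,
  Region = c("A", "A", "B", "B"),
  Brand  = c("B1", "B2", "B1", "B2"),
  `2018` = c(10, 30, 20, 20),
  `2019` = c(5, 15, 60, 20)
)

# Share of each brand within its Region, for every year column.
shares <- toy %>%
  group_by(Region) %>%
  mutate(across(`2018`:`2019`, ~ .x / sum(.x), .names = "perc_{.col}")) %>%
  ungroup()

# Sanity check: within each Region the shares should add up to 1.
check <- shares %>%
  group_by(Region) %>%
  summarise(across(starts_with("perc_"), sum))

# Keep only Brand B1 to read off its share per Region.
b1 <- shares %>%
  filter(Brand == "B1") %>%
  select(Region, starts_with("perc_"))

print(check)
print(b1)  # Region A: 0.25 and 0.25; Region B: 0.50 and 0.75
```

If the values in `check` are not all 1, the division was most likely done without grouping, which is the behaviour described in the question.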