Sum Columns in a dataframe where the names match a vector list

I have a dataframe made up largely of integer columns, each named after a community.
I have made vectors of the community names grouped by region, like so:

RegionA <- c(a,c,d)
RegionB <- c(b,e,f)
RegionC <- c(g,h,i)

    Year     a     b     c     d     e     f     g     h     i   `5`
   <dbl> <int> <int> <int> <int> <int> <int> <int> <int> <int> <dbl>
 1  2021    61    44     1    78    37    46    33    16    57     5
 2  2020    60    54    60     2    72    59    60    34    60     5
 3  2019    53    77    39    66    85    82    65    95    50     5
 4  2018    78    20    63    26    41    29    19    82    46     5
 5  2017    62    38    22    23     6    11    20    51    65     5
 6  2021    39    15    38    74    90    83    73    12    71     5
 7  2020    28    23    76    57   100    89    62    14    56     5
 8  2019    82    48    40    45    93    72    40    45    29     5
 9  2018    13    69   100    13     5    52    99    52    47     5
10  2017    92    13    13    96    98    17    46    49    74     5

I am trying to select the columns named in each Region vector and sum them into a new column.

I have tried using

df <- df %>%
   mutate(Region_A = rowSums(select(., colnames %in% RegionA)))

and

df <- df %>%
   rowwise %>%
   mutate(Region_A = sum(c_across(where(colnames %in% RegionA))))

with no success, getting this error:

Caused by error in `match()`:
! 'match' requires vector arguments

What could be the proper solution?

>Solution :

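The error arises because colnames is a function, not a vector of names, so colnames %in% RegionA hands a function to match(). The region vectors also need to hold the column names as quoted strings, which all_of() can then use to select the columns.

A possible solution: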

library(dplyr)

RegionA <- c("a","c","d")
RegionB <- c("b","e","f")
RegionC <- c("g","h","i")

df %>% 
  rowwise() %>%   # compute each sum within a single row
  mutate(RegionA = sum(c_across(all_of(RegionA))),
         RegionB = sum(c_across(all_of(RegionB))),
         RegionC = sum(c_across(all_of(RegionC)))) %>% 
  ungroup()       # drop the rowwise grouping afterwards

#> # A tibble: 10 × 13
#>     Year     a     b     c     d     e     f     g     h     i RegionA RegionB
#>    <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>   <int>   <int>
#>  1  2021    61    44     1    78    37    46    33    16    57     140     127
#>  2  2020    60    54    60     2    72    59    60    34    60     122     185
#>  3  2019    53    77    39    66    85    82    65    95    50     158     244
#>  4  2018    78    20    63    26    41    29    19    82    46     167      90
#>  5  2017    62    38    22    23     6    11    20    51    65     107      55
#>  6  2021    39    15    38    74    90    83    73    12    71     151     188
#>  7  2020    28    23    76    57   100    89    62    14    56     161     212
#>  8  2019    82    48    40    45    93    72    40    45    29     167     213
#>  9  2018    13    69   100    13     5    52    99    52    47     126     126
#> 10  2017    92    13    13    96    98    17    46    49    74     201     128
#> # … with 1 more variable: RegionC <int>
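
As an alternative to rowwise(), the asker's first attempt with rowSums() can be made to work by selecting the columns with across(all_of(...)). The sketch below is only an illustration under a few assumptions: dplyr 1.0 or later is installed, the small df is a cut-down copy of the first three rows of the question's data, and the output columns are named Region_A and Region_B (the names used in the question) so they do not clash with the vectors.

```r
library(dplyr)

# Illustrative data: first three rows and six community columns from the question
df <- tibble(
  Year = c(2021, 2020, 2019),
  a = c(61L, 60L, 53L),
  b = c(44L, 54L, 77L),
  c = c(1L, 60L, 39L),
  d = c(78L, 2L, 66L),
  e = c(37L, 72L, 85L),
  f = c(46L, 59L, 82L)
)

# Column names must be character strings
RegionA <- c("a", "c", "d")
RegionB <- c("b", "e", "f")

# across() selects the named columns as a data frame;
# rowSums() then adds them row by row in one vectorised call
df_sums <- df %>%
  mutate(Region_A = rowSums(across(all_of(RegionA))),
         Region_B = rowSums(across(all_of(RegionB))))

print(df_sums)
```

Because rowSums() works on all rows at once, no rowwise() or ungroup() step is needed, which tends to be faster on large data. Note that rowSums() returns doubles rather than integers.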