Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How best to do row operations in R

Below is the sample data

   year <- c (2016,2017,2018,2019,2020,2021,2016,2017,2018,2019,2020,2021,2016,2017,2018,2019,2020,2021,2016,2017,2018,2019,2020,2021)
  indcode <- c(71,71,71,71,71,71,72,72,72,72,72,72,44,44,44,44,44,44,45,45,45,45,45,45)
  avgemp <- c(44,46,48,50,55,56,10,11,12,13,14,15,21,22,22,23,25,25,61,62,62,63,69,77)
  ownership <-c(50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50)

  test3 <- data.frame (year,indcode,avgemp,ownership)

The desired result is to have where it sums the avgemp for two specific combinations (71+72 and 44+45) and produces one additional row per year. The items in the parentheses (below) are just there to illustrate which numbers get added. The primary source of my confusion is how to have it select and therefore add certain indcode combinations. My initial thought is that you would pivot wider, add the columns, and the pivot_longer but hoping for something a bit less convoluted.

  year     indcode        avgemp          ownership
  2016       71+72         54 (44+10)         50
  2016         71          44                 50
  2016         72          10
  2017       71+72         57 (46+11)         50
  2018       71+72         60 (48+12)         50
  2019       71+72         63 (50+13)         50
  2020       71+72         69 (55+14)         50
  2021       71+72         71 (56+15)         50

I know that it would start something like this

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

 test3 <- test3 %>% group_by (indcode) %>% mutate("71+72" = (something that filters out 71 and  72) 
  

>Solution :

Using data.table – convert the data.frame to ‘data.table’ with setDT, grouped by ‘year’, ‘ownership’, and the ‘indcode’ created by an ifelse/fcase method), get the sum of ‘avgemp’ as a summarised output

library(data.table)
setDT(test3)[, .(avgemp = sum(avgemp)), .(year, ownership, 
  indcode = fcase(indcode %in% c(71, 72), '71+72', default = '44+45'))]

-output

   year ownership indcode avgemp
    <num>     <num>  <char>  <num>
 1:  2016        50   71+72     54
 2:  2017        50   71+72     57
 3:  2018        50   71+72     60
 4:  2019        50   71+72     63
 5:  2020        50   71+72     69
 6:  2021        50   71+72     71
 7:  2016        50   44+45     82
 8:  2017        50   44+45     84
 9:  2018        50   44+45     84
10:  2019        50   44+45     86
11:  2020        50   44+45     94
12:  2021        50   44+45    102
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading