How do I replace the value of an observation based on another variable's values, within a group?

I’m trying to find a way to replace values based on the value of another observation within the same group, where the replacement value is linked to that observation through another variable.
In particular, I’m looking for a way to transform the dataframe below so that the variable "Flow" no longer contains any 0 values: each 0 should instead take the negative of the Flow value for the reversed "Border".
Let’s assume the following dataframe:

df <- data.frame("Hour" = c(rep(1, 6), rep(2, 6)),
                 "Border" = rep(c("BE - NL", "NL - BE", "BE - FR", "FR - BE", "BE - DE", "DE - BE"), 2),
                 "Flow" = c(0, 100, 200, 0, 150, 0, 250, 0, 50, 0, 300, 0))

What I want is that the value in row 1 (Hour = 1, Border = "BE - NL") equals the negative of the value in row 2 (Hour = 1, Border = "NL - BE"). Likewise, the value in row 4 (Hour = 1, Border = "FR - BE") should be the negative of the value in row 3 (Hour = 1, Border = "BE - FR"). The same applies to the other hours: whenever a Flow value is zero, it should take the negative of the Flow value of the observation in the same hour whose Border is reversed.
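The "reversed" border in this rule is just the two country codes swapped around the " - " separator. A minimal sketch of that swap, assuming every Border value has the form "XX - YY" with single spaces around the hyphen; `reverse_border()` is a hypothetical helper, not something from the question:

```r
# Hypothetical helper: swap the two codes in "XX - YY", e.g. "BE - NL" -> "NL - BE".
# Assumes exactly one " - " separator and no spaces inside a country code.
reverse_border <- function(border) {
  sub("^([^ ]+) - ([^ ]+)$", "\\2 - \\1", border)
}

reverse_border(c("BE - NL", "FR - BE", "DE - BE"))
#> [1] "NL - BE" "BE - FR" "BE - DE"
```

Within one hour, the partner of a row is then simply the row whose Border equals this reversed string.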

Any suggestions on how to do this? A dplyr solution would be great, but other suggestions are more than welcome!


My idea was to use group_by() %>% mutate() combined with case_when() to modify only the rows where Flow = 0, but I can’t figure out how to reference/subset the row with the reversed Border:

library(dplyr)
df %>% 
  group_by(Hour) %>% 
  mutate(Flow = case_when(Flow != 0 ~ Flow,
                          # not sure how to reference the subset (the row with the reversed Border)
                          Flow == 0 ~ -Flow[...]))
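The case_when() idea above can be completed without splitting Border: within each hour, match() returns the position of the row whose Border is the reversed string, and that position can index Flow directly. A sketch, assuming the "XX - YY" format and that each border's reverse appears in the same hour (otherwise match() returns NA and so does the new Flow); `reverse_border()` is a hypothetical helper, not part of dplyr:

```r
library(dplyr)

df <- data.frame("Hour" = c(rep(1, 6), rep(2, 6)),
                 "Border" = rep(c("BE - NL", "NL - BE", "BE - FR", "FR - BE", "BE - DE", "DE - BE"), 2),
                 "Flow" = c(0, 100, 200, 0, 150, 0, 250, 0, 50, 0, 300, 0))

# Hypothetical helper: "BE - NL" -> "NL - BE"
reverse_border <- function(border) sub("^([^ ]+) - ([^ ]+)$", "\\2 - \\1", border)

result <- df %>%
  group_by(Hour) %>%
  mutate(Flow = case_when(
    Flow != 0 ~ Flow,
    # index (within this hour) of the row whose Border is the reverse of this one
    Flow == 0 ~ -Flow[match(reverse_border(Border), Border)]
  )) %>%
  ungroup()
```

Unlike grouping on a sorted pair of codes, this does not require exactly one non-zero value per pair: a 0 whose partner is also 0 simply stays 0.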

Solution:

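Here is one option: split the column ‘Border’ into its two country codes, then order each pair with pmin()/pmax() so that "BE - NL" and "NL - BE" get the same key (BE, NL). Grouping by Hour and that key puts each border and its reverse in the same group, where the 0 value is replaced by the negative of the non-zero Flow value.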

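library(dplyr)
library(tidyr)

df %>% 
  # split "BE - NL" into b1 = "BE" and b2 = "NL", keeping the original Border column
  separate(Border, into = c("b1", "b2"), sep = "\\s+-\\s+", 
    remove = FALSE) %>% 
  # pmin()/pmax() give a border and its reverse the same (b1new, b2new) key
  group_by(Hour, b1new = pmin(b1, b2), b2new = pmax(b1, b2)) %>% 
  # assumes each group holds exactly one 0 and one non-zero Flow
  mutate(Flow = replace(Flow, Flow == 0, -Flow[Flow != 0])) %>% 
  ungroup() %>% 
  # drop the helper columns and restore the original column order
  select(names(df))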

Output:

# A tibble: 12 × 3
    Hour Border   Flow
   <dbl> <chr>   <dbl>
 1     1 BE - NL  -100
 2     1 NL - BE   100
 3     1 BE - FR   200
 4     1 FR - BE  -200
 5     1 BE - DE   150
 6     1 DE - BE  -150
 7     2 BE - NL   250
 8     2 NL - BE  -250
 9     2 BE - FR    50
10     2 FR - BE   -50
11     2 BE - DE   300
12     2 DE - BE  -300
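Since the question also welcomes non-dplyr suggestions, the same partner lookup can be done in base R by matching on Hour and the reversed Border together. A sketch, assuming Border values of the form "XX - YY"; the names `rev_border` and `partner` are introduced here for illustration only:

```r
df <- data.frame("Hour" = c(rep(1, 6), rep(2, 6)),
                 "Border" = rep(c("BE - NL", "NL - BE", "BE - FR", "FR - BE", "BE - DE", "DE - BE"), 2),
                 "Flow" = c(0, 100, 200, 0, 150, 0, 250, 0, 50, 0, 300, 0))

# Reversed label for every row, e.g. "BE - NL" -> "NL - BE"
rev_border <- sub("^([^ ]+) - ([^ ]+)$", "\\2 - \\1", df$Border)

# Row index of each row's partner: same Hour, reversed Border (NA if none exists)
partner <- match(paste(df$Hour, rev_border), paste(df$Hour, df$Border))

# Replace only the zeros with the negative of the partner's Flow
df$Flow <- ifelse(df$Flow == 0, -df$Flow[partner], df$Flow)
```

Combining Hour and Border into a single key keeps the lookup inside the same hour without any grouping step.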