Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Efficiently assigning multiple variables created from a subset of grouped data in R

I am trying to improve the efficiency of code that is already working. For instance, consider the toy dataset below:

df = data.frame(id = c(1,1,2,2,2,3,3,3),
                date = c(12,13,1,4,5,9,10,12),
                visit = c("out","in","in","out","out","out","in","in"))

df
 id date visit
  1   12   out
  1   13    in
  2    1    in
  2    4   out
  2    5   out
  3    9   out
  3   10    in
  3   12    in

What I am trying to do is create separate variables that contain the minimum date (here date is just day of the month) for both in and out visits grouped by ID.

If I try grouping on both variables, the output is correct but only one variable:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df %>% group_by(id,visit) %>%
  mutate(m_date = min(date))
# A tibble: 8 × 4
# Groups:   id, visit [6]
     id  date visit m_date
  <dbl> <dbl> <chr>  <dbl>
1     1    12 out       12
2     1    13 in        13
3     2     1 in         1
4     2     4 out        4
5     2     5 out        4
6     3     9 out        9
7     3    10 in        10
8     3    12 in        10

I’ve also tried mutate(case_when()) which seems to have slow performance and results in the same issue (all results under on variable) as would mutate(if_else()) if I’m not mistaken. The way I’m currently achieving this is a two-step process to filter and then rejoin the results:

df = full_join(df,
    df %>% group_by(id) %>%
    filter(visit == "out") %>%
    mutate(m_date_out = min(date)) %>%
    select(id, m_date_out) %>% distinct(id, .keep_all=TRUE),
    by = "id")

df = full_join(df,
     df %>% group_by(id) %>%
     filter(visit == "in") %>%
     mutate(m_date_in = min(date)) %>%
     select(id, m_date_in) %>% distinct(id, .keep_all=TRUE),
     by = "id")
df
  id date visit m_date_out m_date_in
1  1   12   out         12        13
2  1   13    in         12        13
3  2    1    in          4         1
4  2    4   out          4         1
5  2    5   out          4         1
6  3    9   out          9        10
7  3   10    in          9        10
8  3   12    in          9        10

However this seems overly complicated for what I"m trying to do. Can the more experienced R coders show me a simpler way to achieve the same output? Speed is also relevant as this will be used on a large dataset.

>Solution :

It looks like you’re kind of making a long and wide table at the same, which seems weird to me. But you can just filter directly inside the mutate call like so:

df %>% 
  group_by(id) %>%
  mutate(
    m_date_out = min(date[visit == "out"]),
    m_date_in = min(date[visit == "in"])
  )
# A tibble: 8 × 5
# Groups:   id [3]
     id  date visit m_date_out m_date_in
  <dbl> <dbl> <chr>      <dbl>     <dbl>
1     1    12 out           12        13
2     1    13 in            12        13
3     2     1 in             4         1
4     2     4 out            4         1
5     2     5 out            4         1
6     3     9 out            9        10
7     3    10 in             9        10
8     3    12 in             9        10
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading