Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

In R, how to get subset by id with column condition

I have the following data frame.

id <- c(1,1,1,2,2,2,3,3,3,4,4,4)
time <- c(1,2,3,1,2,3,1,2,3,1,2,3)
event1 <- c(0,0,1,0,0,1,0,0,0,1,0,0)
event2 <- c(0,1,0,0,0,0,0,1,0,1,0,0)   
event3 <- c(0,0,0,0,0,0,0,1,0,1,0,0)

temp<-data.frame(id,time,event1,event2,event3)


    id time event1 event2 event3
1   1    1      0      0      0
2   1    2      0      1      0
3   1    3      1      0      0
4   2    1      0      0      0
5   2    2      0      0      0
6   2    3      1      0      0
7   3    1      0      0      0
8   3    2      0      1      1
9   3    3      0      0      0
10  4    1      1      1      1
11  4    2      0      0      0
12  4    3      0      0      0

I wish to get a subset that each unique id and the event1, event2, event3 columns if that event happened (event = 1)

So that the output would be like this:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

    id event1 event2 event3
1   1    1      1      0
2   2    1      0      0
3   3    0      1      1
4   4    1      1      1

I tried to remove duplicates and keep the unique id but that gave me the wrong output since event1 through event3 might not happened in time = 1. The time doesn’t matter. As long as the event happens in one of the three time it counts.

I was trying to replace the 0s with 1 by id if 1 exists in that id for that event but this didn’t work.

temp %>% group_by(id) %>% mutate(id, event1=ifelse(event1==1,1,event1),
                                 event2 = ifelse(event2==1,1,event2),
                                 event3 =ifelse(event3 ==1,1,event3))

Any help would be appreciated.

library(tidyverse)
id <- c(1,1,1,2,2,2,3,3,3,4,4,4)
time <- c(1,2,3,1,2,3,1,2,3,1,2,3)
event1 <- as.factor(c(0,0,1,0,0,1,0,0,0,1,0,0))

event2 <-as.factor(c(0,1,0,0,0,0,0,1,0,1,0,0))

event3 <- as.factor(c(0,0,0,0,0,0,0,1,0,1,0,0))

temp<-data.frame(id,time,event1,event2,event3)

temp[,3:5][temp[,3:5]==0] <-NA
temp2 <- as.data.frame(temp%>% group_by(id)%>% 
                         fill(event1,event2,event3,.direction ="downup")
                       )
temp3 <- temp2[!duplicated(temp2[,'id']),]

temp3[is.na(temp3)] <-0
temp3

This gave my desired result but I feel like it’s overcomplicated.

>Solution :

You can use summarise() + across() in dplyr:

library(dplyr)

temp %>%
  group_by(id) %>%
  summarise(across(contains("event"), sum))

# A tibble: 4 x 4
     id event1 event2 event3
  <dbl>  <dbl>  <dbl>  <dbl>
1     1      1      1      0
2     2      1      0      0
3     3      0      1      1
4     4      1      1      1

The method above calculate counts of each event. If you just want to know whether an event happened or not, try

temp %>%
  group_by(id) %>%
  summarise(across(contains("event"), max))
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading