Filtering columns based on conditions related to IDs

I have the following data.frame:

df <- data.frame(plot = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
                 tree = c("1", "1", "1", "1", "2", "2",
                          "3", "4", "7", "7"),
                 trunk = c("1", "2", "3", "4", "1", "2",
                           "1", "1", "1", "2"),
                 name = c("A", "A", "A", "A", "A", "A",
                          "B", "C", "A", "A"),
                 time_1 = c("alive", "alive", "dead", "dead",
                            "alive", "alive",
                            "alive", "alive",
                            "dead", "dead"),
                 time_2 = c("dead", "alive", "dead", "dead",
                            "dead", "dead",
                            "dead", "dead",
                            "dead", "dead"))

To briefly explain the context: each plot contains a number of trees, and each tree can have a single trunk or multiple trunks. What I’m trying to do is keep only the trees that have time_1 == "alive" and time_2 == "dead". A tree can have multiple "dead" trunks, but if a single trunk is alive, I consider the whole tree to be "alive".

So, the first thing I did was add some identifiers for each tree and trunk:

# Add an ID for each trunk in each plot
df$trunk_id <- paste(df$plot, df$tree, df$trunk, sep = "_")

# Add an ID for each tree in each plot
df$tree_id <- paste(df$plot, df$tree, sep = "_")

Then I filtered only the rows where time_1 == "alive" and time_2 == "dead".

library(dplyr)

df2 <- df %>% filter(time_1 == "alive" & time_2 == "dead")

However, I noticed that this does not return exactly what I want. For example, comparing df with df2, I know that I don’t want tree_id == "1_1" (plot 1, tree 1), because at least one of its trunks (trunk 2) is still "alive" at time_2. Filtering row by row does not remove such cases.
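To make the problem concrete, here is a minimal base-R sketch that lists the tree IDs surviving the row-wise filter. The data frame is rebuilt in compact form (without the name column) so the snippet runs on its own, and row_keep and kept_trees are names introduced only for this illustration:

```r
# Compact rebuild of the example data (same values as above, name column omitted)
df <- data.frame(
  plot   = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
  tree   = c("1", "1", "1", "1", "2", "2", "3", "4", "7", "7"),
  trunk  = c("1", "2", "3", "4", "1", "2", "1", "1", "1", "2"),
  time_1 = c("alive", "alive", "dead", "dead", "alive", "alive",
             "alive", "alive", "dead", "dead"),
  time_2 = c("dead", "alive", "dead", "dead", "dead", "dead",
             "dead", "dead", "dead", "dead")
)
df$tree_id <- paste(df$plot, df$tree, sep = "_")

# Row-wise condition: each trunk is judged on its own
row_keep <- df$time_1 == "alive" & df$time_2 == "dead"

# Tree IDs that survive the row-wise filter
kept_trees <- unique(df$tree_id[row_keep])
print(kept_trees)
```

Tree "1_1" is still in the result because its trunk 1 satisfies the condition on its own, even though trunk 2 of the same tree is alive at time_2.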

What condition should I add so that the filter takes into account all the trunks of a tree when that tree has multiple trunks?

My ideal output would be these IDs, so that I can filter out everything that is irrelevant:

output <- c("1_2", "2_3", "2_4")

Solution:

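You can try grouping by plot and tree and adding all() to your condition, so that it is evaluated over every trunk of a tree rather than row by row, i.e.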

library(dplyr)

df %>%
  group_by(plot, tree) %>%
  filter(all(time_1 == 'alive') & all(time_2 == 'dead'))

# A tibble: 4 × 6
# Groups:   plot, tree [3]
   plot tree  trunk name  time_1 time_2
  <dbl> <chr> <chr> <chr> <chr>  <chr> 
1     1 2     1     A     alive  dead  
2     1 2     2     A     alive  dead  
3     2 3     1     B     alive  dead  
4     2 4     1     C     alive  dead  
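
If the rule from the question is read literally (a tree counts as alive when at least one trunk is alive, and as dead only when every trunk is dead), the time_1 condition would arguably be any() rather than all(). The sketch below is one possible variant under that assumption, not part of the original answer. It rebuilds the data compactly (without the name column) so it runs on its own, and died and ids are names introduced only for illustration. It also extracts the vector of tree IDs the question asked for:

```r
library(dplyr)

# Compact rebuild of the example data (same values as in the question)
df <- data.frame(
  plot   = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
  tree   = c("1", "1", "1", "1", "2", "2", "3", "4", "7", "7"),
  trunk  = c("1", "2", "3", "4", "1", "2", "1", "1", "1", "2"),
  time_1 = c("alive", "alive", "dead", "dead", "alive", "alive",
             "alive", "alive", "dead", "dead"),
  time_2 = c("dead", "alive", "dead", "dead", "dead", "dead",
             "dead", "dead", "dead", "dead")
)

died <- df %>%
  group_by(plot, tree) %>%
  # Alive at time_1: at least one trunk alive; dead at time_2: every trunk dead
  filter(any(time_1 == "alive") & all(time_2 == "dead")) %>%
  ungroup()

# Build the plot_tree IDs the question asked for
ids <- unique(paste(died$plot, died$tree, sep = "_"))
print(ids)
```

On this data both versions keep the same trees. They would differ for a tree that has a mix of alive and dead trunks at time_1 and only dead trunks at time_2: all() drops such a tree, while any() keeps it.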