Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

R dplyr filter data based on values in other rows

I am trying to filter a data frame using dplyr and I can’t really think of a way to achieve what I want. I have a data frame of the following form:

A    B    C
-----------
1    2    5
1    4    6
2    2    7
2    4    6

Each value in column A appears exactly 2 times. Column B has exactly 2 different values, each appearing exactly once for each value of A. Column C can have any positive values. I want to keep all rows where for one value of A, the row with the bigger B value has a smaller C value than the row with the smaller B value. In the example above, this would result in:

A    B    C
-----------
2    2    7
2    4    6

Is there a way to achieve this using dplyr?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

Sort by A and B to ensure that the larger B is always the second within A and then grouping by A use a filter based on diff(C) < 0.

library(dplyr)

DF %>%
  arrange(A, B) %>%
  group_by(A) %>%
  filter((diff(C) < 0)) %>%
  ungroup

giving:

# A tibble: 2 × 3
      A     B     C
  <int> <int> <int>
1     2     2     7
2     2     4     6

Note

Lines <- "A    B    C
1    2    5
1    4    6
2    2    7
2    4    6"

DF <- read.table(text = Lines, header = TRUE)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading