How can I use a value extracted from a data table to specify columns to subset in R?

I have a dataframe that I want to subset inside a function so that only rows where both columns are either 1 or NA remain. For df:

df <- data.frame(a = c(1,1,0,NA,0,1), 
                 b = c(0,1,0,1,0, NA),
                 c = c(0,0,0,0,0,0))

I want:

   a  b  c
2  1  1  0
4 NA  1  0
6  1 NA  0

The problem I’m having is I have many columns with names that change. So this works well:

subset(df, (is.na(a) | a == 1) & (is.na(b) | b == 1))

but when column names ‘a’ and ‘b’ become ‘d’ and ‘f’ during the operation of the function, it breaks. Specifying columns by index is more robust:

subset(df, (is.na(df[,1]) | df[,1] == 1) & (is.na(df[,2]) | df[,2] == 1))

But this is cumbersome, and if a previous processing step goes wrong and column ‘c’ ends up before ‘a’ or ‘b’, I end up subsetting by the wrong columns.

I also have another dataframe that specifies what the column names to subset by will be:

cro_df <- data.frame(pop = c('c1', 'c2'),
                     p1 = c('a', 'd'),
                     p2 = c('b', 'f'))

  pop p1 p2
1  c1  a  b
2  c2  d  f

I would like to be able to extract the column names from that dataframe to use in my subset function, e.g.:

col1 <- cro_df[cro_df[,'pop']=='c1', 'p1']
subset(df, is.na(col1) | col1 == 1)

This returns an empty dataframe. I have tried turning col1 into a symbol and a factor with no success:

subset(df, as.symbol(col1) == 1)
subset(df, sym(col1) == 1)
subset(df, as.factor(col1) == 1)

And they all return:

[1] a b c
<0 rows> (or 0-length row.names)

Is there a way I can specify my columns to subset using the second dataframe cro_df?

Solution:

Perhaps this is a good start?

with(cro_df[cro_df$pop == "c1",],
  df[ (is.na(df[[p1]]) | df[[p1]] == 1) & (is.na(df[[p2]]) | df[[p2]] == 1), ]
)
#    a  b c
# 2  1  1 0
# 4 NA  1 0
# 6  1 NA 0
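
If this needs to live inside a function and handle any number of name columns, the same idea can be wrapped up as below. This is only a sketch: `filter_one_or_na` and its arguments are hypothetical names that do not appear in the question, and it assumes `cro_df` has a `pop` column plus columns such as `p1` and `p2` that hold column names of `df`.

```r
df <- data.frame(a = c(1, 1, 0, NA, 0, 1),
                 b = c(0, 1, 0, 1, 0, NA),
                 c = c(0, 0, 0, 0, 0, 0))
cro_df <- data.frame(pop = c("c1", "c2"),
                     p1 = c("a", "d"),
                     p2 = c("b", "f"))

# Hypothetical helper: keep rows where every looked-up column is 1 or NA.
filter_one_or_na <- function(data, lookup, pop_id, name_cols = c("p1", "p2")) {
  # Pull the column names for this pop as a plain character vector
  cols <- unlist(lookup[lookup$pop == pop_id, name_cols], use.names = FALSE)
  cols <- as.character(cols)
  # Fail early if a name is missing instead of silently using the wrong column
  stopifnot(length(cols) > 0, all(cols %in% names(data)))
  # Build one logical vector per column, then combine them with &
  tests <- lapply(cols, function(col) is.na(data[[col]]) | data[[col]] == 1)
  keep <- Reduce(`&`, tests)
  data[keep, , drop = FALSE]
}

res <- filter_one_or_na(df, cro_df, "c1")
```

Because the columns are looked up by name with `[[`, the result does not depend on column order, and the `stopifnot` check turns a renamed or missing column into an error rather than a wrong subset.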

FYI, subset is intended for interactive use; its help page says:

Warning:

     This is a convenience function intended for use interactively.
     For programming it is better to use the standard subsetting
     functions like [, and in particular the non-standard evaluation
     of argument ‘subset’ can have unanticipated consequences.
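
That non-standard evaluation is also the likely reason the attempts in the question return zero rows: inside subset, `col1` evaluates to the string "a" rather than to the values of column a, so the condition compares a string with 1. A minimal illustration, reusing `df` from the question and setting `col1` by hand to the value the lookup would return:

```r
df <- data.frame(a = c(1, 1, 0, NA, 0, 1),
                 b = c(0, 1, 0, 1, 0, NA),
                 c = c(0, 0, 0, 0, 0, 0))
col1 <- "a"

# col1 holds a column name, not the column's values,
# so this condition is a single FALSE ("a" is neither NA nor equal to 1)
cond <- is.na(col1) | col1 == 1

# subset() recycles that single FALSE over all rows, so nothing is kept
empty <- subset(df, is.na(col1) | col1 == 1)

# [[ looks the column up by the name stored in col1
kept <- df[is.na(df[[col1]]) | df[[col1]] == 1, ]
```

Here `empty` has no rows, while `kept` contains the rows where column a is 1 or NA.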