R: reset values in data frame to zero based on vector with column indices

I have a table of integers (built with cbind, so strictly speaking a matrix rather than a data frame), like so:

# generate data frame
df = cbind(c(0,102,0,40,0,0), c(22,0,0,0,12,4), c(23,101,55,0,0,0),
           c(0,0,0,414,0,0), c(0,0,61,0,0,112), c(0,0,0,0,20,0))
colnames(df) = c('A', 'T', 'C', 'G', 'N', 'Del')
rownames(df) = c('Pos1', 'Pos2', 'Pos3', 'Pos4', 'Pos5', 'Pos6')
df
           A  T   C   G   N Del
    Pos1   0 22  23   0   0   0
    Pos2 102  0 101   0   0   0
    Pos3   0  0  55   0  61   0
    Pos4  40  0   0 414   0   0
    Pos5   0 12   0   0   0  20
    Pos6   0  4   0   0 112   0

I also have a vector with integers (which correspond to column indices of df):

# generate vector
cols = c(2,3,5,4,6,5)

Now, row by row, I want to reset to zero the entry of df whose column index is given by the corresponding element of the vector. For example, in the first row I want to reset column 2 to zero, in the second row column 3, and so on.

I solved this with the following piece of code:

for (i in seq_len(nrow(df))) {
    j = cols[[i]]    # column to reset in row i
    df[i, j] = 0
}
df
           A  T  C G N Del
    Pos1   0  0 23 0 0   0
    Pos2 102  0  0 0 0   0
    Pos3   0  0 55 0 0   0
    Pos4  40  0  0 0 0   0
    Pos5   0 12  0 0 0   0
    Pos6   0  4  0 0 0   0

As you can see, my code behaves as intended. However, it turns out to be very inefficient on large datasets. I therefore wondered whether there is an alternative that will be considerably faster than using a for-loop.

Note that it may look as though I am resetting the maximum value in each row, but that is not the case: in some rows it is the smaller of the two values that gets reset. So I cannot simply set the min or max of each row to zero.

Solution:

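You can use cbind to build a two-column matrix of row and column positions and index df with it. Each row of that matrix picks out exactly one cell, so all the targeted cells can be replaced with 0 in a single vectorized assignment, as follows.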

rows <- seq_len(nrow(df))
df[cbind(rows, cols)] <- 0

Result

df
#       A  T  C G N Del
#Pos1   0  0 23 0 0   0
#Pos2 102  0  0 0 0   0
#Pos3   0  0 55 0 0   0
#Pos4  40  0  0 0 0   0
#Pos5   0 12  0 0 0   0
#Pos6   0  4  0 0 0   0
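
To see why this works, it can help to extract with the same index before assigning, and to compare the result against the loop. The following is only a minimal sketch under a few assumptions: the data is a numeric matrix (as cbind produces in the question), cols holds one valid column index per row, and reset_cells is a hypothetical helper name, not something from the original post.

```r
# Same example data as in the question (cbind() returns a numeric matrix).
m <- cbind(c(0,102,0,40,0,0), c(22,0,0,0,12,4), c(23,101,55,0,0,0),
           c(0,0,0,414,0,0), c(0,0,61,0,0,112), c(0,0,0,0,20,0))
colnames(m) <- c('A', 'T', 'C', 'G', 'N', 'Del')
rownames(m) <- paste0('Pos', 1:6)
cols <- c(2, 3, 5, 4, 6, 5)

# Hypothetical helper (not from the original post): zero one cell per row.
reset_cells <- function(m, cols) {
  # Assumption: exactly one in-range column index per row of m.
  stopifnot(is.matrix(m), length(cols) == nrow(m),
            all(cols >= 1), all(cols <= ncol(m)))
  idx <- cbind(seq_len(nrow(m)), cols)  # each row of idx is a (row, column) pair
  m[idx] <- 0
  m
}

# Extracting with the same index shows which values will be overwritten.
targeted <- m[cbind(seq_len(nrow(m)), cols)]
targeted
# values: 22 101 61 414 20 112

# Reference result from an explicit loop, for comparison.
expected <- m
for (i in seq_len(nrow(m))) expected[i, cols[i]] <- 0

result <- reset_cells(m, cols)
identical(result, expected)
# TRUE
```

The extraction step confirms that the first row targets 22 (the smaller of its two non-zero values), which matches the note in the question that this is not a min or max reset.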