lapply returns the same value for every data frame element

I created a function that splits a string by ":" and takes the first element, which is the information I need from a vcf:

remove_semicolon = function(x){
    newstr = strsplit(x,":")[[1]][1]
    return(newstr)
}

I wish to apply it to every element of a data frame such as the following:

>rubbish
              NS05                   NS113                   NS137
1              0/0:1                  0/0:15                  0/0:25
2              0/0:1                  0/0:15                  0/0:25
3              0/0:1                  0/0:16                  0/0:25
4 1/1:0,1:1:3:39,3,0 1/1:0,16:16:48:621,48,0 1/1:0,26:26:78:969,78,0
5              0/0:1                  0/0:16                  0/0:29

So that for rubbish[1,1] the desired output is "0/0", for rubbish[4,1] "1/1" etc, with the matrix/data frame structure left intact. However,

 rubbish[]=lapply(rubbish,remove_semicolon)

returns:

> rubbish
NS05 NS113 NS137
1  0/0   0/0   0/0
2  0/0   0/0   0/0
3  0/0   0/0   0/0
4  0/0   0/0   0/0
5  0/0   0/0   0/0

even though, in contrast,

sapply(rubbish[,1],remove_semicolon)

returns what I want, i.e. a vector 0/0, 0/0, 0/0, 1/1, 0/0 rather than all 0/0:

         0/0:1              0/0:1              0/0:1 1/1:0,1:1:3:39,3,0 
         "0/0"              "0/0"              "0/0"              "1/1" 
         0/0:1 
         "0/0" 

What am I doing wrong when I use lapply? Shouldn’t it just apply the remove_semicolon function to every element of rubbish, in the same way that sapply does for every element of a column vector?

Solution:

Using apply(., MARGIN = 1:2, .) seems to work:

rubbish[] <- apply(rubbish, 1:2, remove_semicolon)
rubbish
  NS05 NS113 NS137
1  0/0   0/0   0/0
2  0/0   0/0   0/0
3  0/0   0/0   0/0
4  1/1   1/1   1/1
5  0/0   0/0   0/0

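If you look at the output of lapply(rubbish, remove_semicolon) (before assigning it back to the data frame) you’ll see that each element of the output is a length-1 vector, which then gets recycled to fill the column. This happens because remove_semicolon isn’t vectorized: lapply() passes each whole column to the function, and strsplit(x, ":")[[1]][1] keeps only the first piece of the first string in that column. sapply(rubbish[,1], remove_semicolon) behaves differently because it calls the function once per element of the column.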
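To see the recycling directly, here is a small sketch on a single column. The vector `col` and the one-column data frame `df` are made up for illustration, with values copied from the NS05 column in the question.

```r
# Stand-in for one column of `rubbish` (values copied from the question)
col <- c("0/0:1", "0/0:1", "0/0:1", "1/1:0,1:1:3:39,3,0", "0/0:1")

remove_semicolon <- function(x) {
    strsplit(x, ":")[[1]][1]
}

# strsplit() returns one list element per input string ...
pieces <- strsplit(col, ":")
length(pieces)   # 5

# ... but [[1]][1] keeps only the first piece of the first string
res <- remove_semicolon(col)
length(res)      # 1
res              # "0/0"

# Assigning that length-1 result into a 5-row column recycles it
df <- data.frame(NS05 = col, stringsAsFactors = FALSE)
df[] <- lapply(df, remove_semicolon)
df$NS05          # "0/0" "0/0" "0/0" "0/0" "0/0"
```

Row 4 ends up as "0/0" because only the first string in the column was ever inspected.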

This would work with lapply():

rs2 <- function(x) {
   newstr <- strsplit(x, ":")
   return(sapply(newstr, head, 1))
}
lapply(rubbish, rs2)
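On its own, lapply() returns a plain list. To keep the data frame structure, the result can be assigned back with `[]`, as in the question. The sketch below assumes the columns are character (the default for read.table in R 4.0 and later; stringsAsFactors = FALSE is set explicitly here to be safe), and the name `cleaned` is only for illustration.

```r
rubbish <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
NS05 NS113 NS137
0/0:1 0/0:15 0/0:25
0/0:1 0/0:15 0/0:25
0/0:1 0/0:16 0/0:25
1/1:0,1:1:3:39,3,0 1/1:0,16:16:48:621,48,0 1/1:0,26:26:78:969,78,0
0/0:1 0/0:16 0/0:29
")

# Vectorized version from the answer: split every string,
# then take the first piece of each split
rs2 <- function(x) {
    newstr <- strsplit(x, ":")
    sapply(newstr, head, 1)
}

# Work on a copy so the original stays available for comparison
cleaned <- rubbish
cleaned[] <- lapply(rubbish, rs2)   # `[]` keeps the data.frame class and names

cleaned$NS05   # "0/0" "0/0" "0/0" "1/1" "0/0"
```

Because rs2 returns one value per input string, every column keeps its full length and nothing is recycled.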

Another alternative would be to use gsub() (or stringr::str_extract) with a regular expression, e.g.

rs3 <- function(x) gsub("^([^:]+):.*$", "\\1", x)

(it does admittedly look a little like magic)
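To take some of the magic out of it: `^([^:]+)` captures everything from the start of the string up to the first colon, `:.*$` matches the rest, and `\\1` replaces the whole match with the captured part. A quick check on a made-up vector `x`, plus a shorter sub() call that gives the same result for these inputs:

```r
# Illustrative inputs; the last one has no colon at all
x <- c("0/0:1", "1/1:0,1:1:3:39,3,0", "0/0")

rs3 <- function(x) gsub("^([^:]+):.*$", "\\1", x)
out_gsub <- rs3(x)
out_gsub   # "0/0" "1/1" "0/0"

# Simpler pattern: delete the first colon and everything after it
out_sub <- sub(":.*", "", x)
out_sub    # "0/0" "1/1" "0/0"
```

Both functions are vectorized over x, so either can be passed straight to lapply() on the data frame columns.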

Example data:

rubbish <- read.table(header = TRUE, text = "
             NS05                   NS113                   NS137
              0/0:1                  0/0:15                  0/0:25
              0/0:1                  0/0:15                  0/0:25
              0/0:1                  0/0:16                  0/0:25
 1/1:0,1:1:3:39,3,0 1/1:0,16:16:48:621,48,0 1/1:0,26:26:78:969,78,0
              0/0:1                  0/0:16                  0/0:29
")