Apply a particular function in all files of a folder using R

I have developed an R function named DNAdupstability for a biological analysis. It takes a FASTA file (.fasta/.txt) as input and returns a data frame in this format:

 Sequence Position8 Position9 Position10 Position11 Position12 Position13
1        1 -1.473571 -1.473571  -1.462143  -1.412143  -1.412143  -1.371429
  Position14 Position15 Position16 Position17 Position18 Position19 Position20
1  -1.372143       -1.4  -1.428571  -1.439286  -1.430714  -1.420714  -1.397143

This is an example output; the number of Position columns depends on the length of the input sequence. I have a folder named Random_fasta that contains 1333 different FASTA sequences of equal length. DNAdupstability gives the desired outcome (the data frame above) for a single FASTA sequence from Random_fasta, but now I want to run the same function on the other 1332 sequences and form a combined data frame for all the sequences, in a format like this:

  Sequence Position8 Position9 Position10 Position11 Position12 Position13
1        1 -1.434286 -1.434286  -1.446429  -1.435714  -1.445714  -1.509286
2        2 -1.522143 -1.492143  -1.463571  -1.435714  -1.492857  -1.544286
3        3 -1.232857 -1.265000  -1.333571  -1.328571  -1.330000  -1.329286
4        4 -1.799286 -1.799286  -1.799286  -1.799286  -1.730714  -1.735714
5        5 -1.547143 -1.507143  -1.535714  -1.530714  -1.478571  -1.450714
  Position14 Position15 Position16 Position17 Position18 Position19 Position20
1  -1.452143  -1.402143  -1.390000  -1.457143  -1.509286  -1.498571  -1.458571
2  -1.544286  -1.544286  -1.544286  -1.544286  -1.601429  -1.715000  -1.755000
3  -1.340000  -1.328571  -1.333571  -1.344286  -1.384286  -1.446429  -1.486429
4  -1.667143  -1.605000  -1.536429  -1.486429  -1.536429  -1.605000  -1.600000
5  -1.450714  -1.450714  -1.412143  -1.372143  -1.434286  -1.531429  -1.615000

I want this so that I can calculate the position-wise mean, which will then be used for visualization with ggplot2. Is there any way to apply the same function to all the files in the folder using R and get the desired combined data frame? Any help will be greatly appreciated!

Solution:

One option is to list every file under the main folder with list.files (recursive = TRUE also searches subfolders), apply the custom function to each file with lapply, and combine the results into a single data.frame with do.call(rbind, ...):

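# List every FASTA file (.fasta or .txt) in the folder and its subfolders
files <- list.files('path/to/your/folder', recursive = TRUE,
  pattern = "\\.(fasta|txt)$", full.names = TRUE)
# Run the function on each file, then stack the per-file data frames
lst1 <- lapply(files, DNAdupstability)
out <- do.call(rbind, lst1)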
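
To make the pattern concrete, here is a minimal self-contained sketch that can be run without the real function. Everything in it is an assumption for illustration: fake_stability is a hypothetical stand-in for DNAdupstability, the three short sequences and the demo folder are made up, and the renumbering step assumes each single-file call reports Sequence = 1, as in the single-sequence output shown in the question.

```r
# Hypothetical stand-in for DNAdupstability: returns a one-row data frame
# with a Sequence column and two made-up Position columns.
fake_stability <- function(path) {
  seq_txt <- paste(readLines(path)[-1], collapse = "")  # drop the FASTA header line
  n <- nchar(seq_txt)
  data.frame(Sequence = 1, Position8 = -n / 10, Position9 = -n / 20)
}

# Create a throwaway folder with three small FASTA files for the demo.
demo_dir <- file.path(tempdir(), "Random_fasta_demo")
unlink(demo_dir, recursive = TRUE)
dir.create(demo_dir)
seqs <- c("ACGTACGT", "ACGTACGTAC", "ACGTAC")
for (i in seq_along(seqs)) {
  writeLines(c(paste0(">seq", i), seqs[i]),
             file.path(demo_dir, paste0("seq", i, ".fasta")))
}

# Same pattern as above: list the files, apply the function, row-bind.
files <- list.files(demo_dir, pattern = "\\.(fasta|txt)$", full.names = TRUE)
lst1 <- lapply(files, fake_stability)
out <- do.call(rbind, lst1)

# Renumber the rows and record the source file
# (assumes the function returns exactly one row per file).
out$Sequence <- seq_len(nrow(out))
out$File <- basename(files)
```

Keeping the file name next to each row makes it possible to trace a value in the combined data frame back to the sequence it came from.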

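Or we can use map_dfr from purrr, which applies the function to each file and row-binds the results into a single data.frame (it needs dplyr to be installed, and since purrr 1.0.0 it is superseded by map() followed by list_rbind()):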

library(purrr)
out <- map_dfr(files, DNAdupstability)
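
Once the combined data frame exists, the position-wise mean mentioned in the question could be computed along these lines. This is only a sketch using base R: the small out below is made-up stand-in data rather than real DNAdupstability output, and the names pos_cols, pos_mean and mean_df are introduced here for illustration.

```r
# Made-up stand-in for the combined data frame `out`.
out <- data.frame(
  Sequence  = 1:3,
  Position8 = c(-1.4, -1.5, -1.2),
  Position9 = c(-1.4, -1.5, -1.3)
)

# Pick the Position columns and average each one across all sequences.
pos_cols <- grep("^Position", names(out), value = TRUE)
pos_mean <- colMeans(out[pos_cols])

# Long format (one row per position) is convenient for plotting.
mean_df <- data.frame(
  Position = as.integer(sub("^Position", "", pos_cols)),
  Mean     = unname(pos_mean)
)
```

With ggplot2 loaded, mean_df could then be plotted with something like ggplot(mean_df, aes(Position, Mean)) + geom_line().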