A way to mutate a new column in every dataframe in a list of dataframes?

I have a list of files that I want to import as tibbles, then mutate a new column onto each tibble based on an if_else() threshold test. I didn't want to use a loop, so I have been trying to make this work by using map() wrapped around read_csv %>% mutate(), with no success. If possible, I would like to keep the pipes and/or do both the read_csv() and mutate() steps within the same command.

Here are the two solutions I have tried:

This loop works, but I want to make it better.


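library(readr)
library(dplyr)

filenames <- list.files(path = 'data-cleaned', full.names = TRUE)

# Read each file, add output_col, and overwrite the original file
# (`threshold` is assumed to be defined beforehand; its value is not given)
for (x in seq_along(filenames)) {
  file <- read_csv(filenames[[x]]) %>%
    mutate(output_col = if_else(target_col >= threshold, "Positive", "Negative"))
  write_csv(file, filenames[[x]])
}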

This is my attempted improvement, but I have been getting errors every time I try to change it.

filenames <- list.files(path = 'data-cleaned', full.names = TRUE)

files <- filenames %>%
  map(read_csv %>%
  ~ .x %>% mutate(output_col = if_else(x$target_col >= threshold, "Positive", "Negative"))
   )

Error in `as_mapper()`:
! Can't convert `.f`, a two-sided formula, to a function.

In both of these cases, I am starting from the list of file names, but here is what I would expect the tibbles (simplified) to look like after being read in:

example1 <- tibble(id = 1:50, 
                  target_col = sample(1:2000, 50), 
                  other_col = sample(1:200, 50))

      id target_col other_col
   <int>      <int>     <int>
 1     1        860       118
 2     2        582       120
 3     3       1514       190
 4     4       1266        14
 5     5       1921       137
 6     6        804         1
 7     7        222       139
 8     8       1748        26
 9     9        472       107
10    10       1013       189

And this is what I would expect each tibble to look like after the mutation (except that Positive or Negative would be assigned according to whether target_col meets the numeric threshold):

example2 <- tibble(id = 1:50, 
                  target_col = sample(1:2000, 50), 
                  other_col = sample(1:200, 50),
                  output_col = sample(rep(c("Positive", "Negative"), each = 25)))

      id target_col other_col output_col
   <int>      <int>     <int> <chr>     
 1     1       1886       174 Negative  
 2     2       1794        55 Negative  
 3     3        472       178 Positive  
 4     4       1583       195 Negative  
 5     5       1037       110 Positive  
 6     6       1652       158 Negative  
 7     7        152        69 Negative  
 8     8       1396       113 Positive  
 9     9       1016       150 Positive  
10    10       1922       157 Negative  
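For reference, a minimal sketch of the mutate step on its own, applied to one small tibble. It assumes a hypothetical threshold of 1000 (the question does not give the real value) and reuses the first five rows of example1 shown above. Unlike example2, whose output_col is sampled at random, here the label follows directly from target_col:

```r
library(dplyr)
library(tibble)

# Hypothetical cutoff; the question does not state the real value
threshold <- 1000

# Small stand-in for one tibble after read_csv() (first rows of example1)
example1 <- tibble(id = 1:5,
                   target_col = c(860L, 582L, 1514L, 1266L, 1921L),
                   other_col = c(118L, 120L, 190L, 14L, 137L))

# Label each row by comparing target_col against the threshold
example2 <- example1 %>%
  mutate(output_col = if_else(target_col >= threshold, "Positive", "Negative"))

print(example2)
```

With this threshold, rows with target_col 860 and 582 become "Negative" and the other three become "Positive".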

I have also looked at these existing questions but didn't find them helpful:

How to use map from purrr with dplyr::mutate to create multiple new columns based on column pairs

How to mutate multiple columns with dynamic variable using purrr:map function?

R using dplyr::mutate() within purrr::map without duplicating rows

Iterating over listed data frames within a piped purrr anonymous function call

Using purrr to iterate over two lists and then pipe into dplyr::filter across a list of data frames

Solution:

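Your use of the formula syntax is incorrect: a purrr lambda must start with ~. Because your ~ .x comes after read_csv %>%, the pipe passes read_csv into the formula as its left-hand side, so map() receives a two-sided formula instead of something it can turn into a function. You also use x$target_col, but x isn't defined anywhere; inside mutate() you can refer to target_col directly.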
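The error message can be reproduced with purrr::as_mapper(), the function map() uses to convert .f (the "Error in `as_mapper()`" line in the question points there). In this small sketch, a one-sided formula converts fine, while a two-sided one (lhs is just a placeholder name here, not evaluated) is rejected:

```r
library(purrr)

# A one-sided formula becomes a function of .x
double_it <- as_mapper(~ .x * 2)
print(double_it(3))

# A two-sided formula, like the one the pipe produced in the question, is rejected
msg <- tryCatch(
  as_mapper(lhs ~ .x * 2),
  error = function(e) conditionMessage(e)
)
print(msg)
```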

The following is correct:

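library(readr)
library(dplyr)
library(purrr)

# Read each file and add output_col; the result is a list of tibbles
files <- filenames %>%
  map(~ read_csv(.x) %>%
        mutate(output_col = if_else(target_col >= threshold, "Positive", "Negative")))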

purrr only recommends the formula syntax if you need backward compatibility with older versions of R, so I would personally use an anonymous function here (the \(x) shorthand requires R 4.1 or later):

files <- filenames %>% map(\(x) read_csv(x) %>% ...)
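A self-contained sketch of the whole pipeline with an anonymous function, which also reproduces the write-back step of the original loop using purrr::walk2(). Assumptions, all hypothetical: a threshold of 1000, and two tiny CSV files written to a temporary folder standing in for 'data-cleaned'. show_col_types = FALSE (readr 2.0 or later) only silences the column-type message:

```r
library(readr)
library(dplyr)
library(purrr)

# Hypothetical cutoff; the question does not give the real value
threshold <- 1000

# Two small CSV files in a temporary folder, standing in for 'data-cleaned'
data_dir <- file.path(tempdir(), "data-cleaned-demo")
dir.create(data_dir, showWarnings = FALSE)
write_csv(data.frame(id = 1:3, target_col = c(860, 1514, 222)),
          file.path(data_dir, "a.csv"))
write_csv(data.frame(id = 1:3, target_col = c(1921, 472, 1013)),
          file.path(data_dir, "b.csv"))

filenames <- list.files(path = data_dir, full.names = TRUE)

# Read each file and add output_col in one pass, using an anonymous function
files <- filenames %>%
  map(\(x) read_csv(x, show_col_types = FALSE) %>%
        mutate(output_col = if_else(target_col >= threshold, "Positive", "Negative")))

# Optionally overwrite each original file, as the loop in the question did
walk2(files, filenames, \(df, path) write_csv(df, path))
```

walk2() is used instead of map2() because writing files is done only for its side effect; it returns its input invisibly rather than building a list of results.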
