A way to mutate a new column in every dataframe in a list of dataframes?

October 13, 2023

I have a list of files that I want to import as tibbles, then add a new column to each tibble with mutate(), based on an if_else() test against a threshold. I didn’t want to use a loop, so I have been trying to make this work with map() wrapped around read_csv %>% mutate(), with no success. If possible, I would like to keep the pipes and/or do both the read_csv and mutate steps within the same command.

Here are the two solutions I have tried:

This loop works, but I want to make it better.

filenames <- list.files(path = 'data-cleaned', full.names = TRUE)

for (x in 1:length(filenames)) {
  file <- read_csv(filenames[[x]]) %>%
    mutate(output_col = if_else(target_col >= threshold, "Positive", "Negative"))
  filename <- filenames[[x]]
  write_csv(file, filename)
}

This is my attempted improvement, but I have been getting errors every time I try to change it.

filenames <- list.files(path = 'data-cleaned', full.names = TRUE)

files <- filenames %>%
  map(read_csv %>%
  ~ .x %>% mutate(output_col = if_else(x$target_col >= threshold, "Positive", "Negative"))
   )

Error in `as_mapper()`:
! Can't convert `.f`, a two-sided formula, to a function.

In both of these cases, I am starting from the list of file names, but here is what I would expect the tibbles (simplified) to look like after being read in:

example1 <- tibble(id = 1:50, 
                  target_col = sample(1:2000, 50), 
                  other_col = sample(1:200, 50))

      id target_col other_col
   <int>      <int>     <int>
 1     1        860       118
 2     2        582       120
 3     3       1514       190
 4     4       1266        14
 5     5       1921       137
 6     6        804         1
 7     7        222       139
 8     8       1748        26
 9     9        472       107
10    10       1013       189

And this is what I would expect each tibble to look like after the mutation (except that Positive or Negative would be assigned based on whether target_col met the numeric threshold):

example2 <- tibble(id = 1:50, 
                  target_col = sample(1:2000, 50), 
                  other_col = sample(1:200, 50),
                  output_col = sample(rep(c("Positive", "Negative"), each = 25)))

      id target_col other_col output_col
   <int>      <int>     <int> <chr>     
 1     1       1886       174 Negative  
 2     2       1794        55 Negative  
 3     3        472       178 Positive  
 4     4       1583       195 Negative  
 5     5       1037       110 Positive  
 6     6       1652       158 Negative  
 7     7        152        69 Negative  
 8     8       1396       113 Positive  
 9     9       1016       150 Positive  
10    10       1922       157 Negative

I also want to mention I have tried looking at these existing questions and didn’t find them helpful:

How to use map from purrr with dplyr::mutate to create multiple new columns based on column pairs

How to mutate multiple columns with dynamic variable using purrr:map function?

R using dplyr::mutate() within purrr::map without duplicating rows

Iterating over listed data frames within a piped purrr anonymous function call

Using purrr to iterate over two lists and then pipe into dplyr::filter across a list of data frames

>Solution :

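Your use of the formula syntax is incorrect: a purrr formula has to start with ~, with nothing on its left-hand side. Writing read_csv %>% in front of the ~ leaves you with a formula that has a left-hand side, which is the "two-sided formula" the error message complains about. In addition, x$target_col refers to an x that isn’t defined anywhere; inside mutate() you can refer to the column directly as target_col.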
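To make the one-sided versus two-sided distinction concrete, here is a small sketch. The names one_sided, two_sided and result are made up for illustration, and the exact wording of the error message may differ between purrr and rlang versions, so the code only captures it rather than relying on its text.

```r
library(purrr)

one_sided <- ~ .x + 1      # nothing to the left of the tilde
two_sided <- y ~ .x + 1    # something to the left of the tilde

# A formula is stored as a call: the tilde plus one or two operands
length(one_sided)
length(two_sided)

# map_*() accepts the one-sided formula as shorthand for a function of .x
doubled_check <- map_dbl(1:3, one_sided)

# The two-sided formula is rejected when purrr tries to turn it into a function
result <- tryCatch(
  map_dbl(1:3, two_sided),
  error = function(e) conditionMessage(e)
)
result
```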

The following is correct:

files <- filenames %>%
  map(~ read_csv(.x) %>%
        mutate(output_col = if_else(target_col >= threshold, "Positive", "Negative")))
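If you want to try the formula version without touching your real data, the sketch below may help. It is only an illustration: the temporary folder, the two tiny CSV files and the threshold value of 1000 are assumptions, since the question does not say what the real files or cutoff look like.

```r
library(readr)
library(dplyr)
library(purrr)

# Hypothetical setup: two small CSV files in a temporary folder
data_dir <- file.path(tempdir(), "data-cleaned-demo")
dir.create(data_dir, showWarnings = FALSE)
write_csv(tibble(id = 1:3, target_col = c(500, 1000, 1500)),
          file.path(data_dir, "a.csv"))
write_csv(tibble(id = 1:3, target_col = c(1200, 300, 1000)),
          file.path(data_dir, "b.csv"))

threshold <- 1000  # assumed cutoff; the question does not state its value

filenames <- list.files(path = data_dir, pattern = "\\.csv$", full.names = TRUE)

# Read each file and add the new column in a single map() call
files <- filenames %>%
  map(~ read_csv(.x, show_col_types = FALSE) %>%
        mutate(output_col = if_else(target_col >= threshold, "Positive", "Negative")))

files[[1]]
```

The result is a plain list with one tibble per file, in the same order as filenames.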

purrr recommends the formula syntax only if you require backward compatibility with older versions of R. I would personally use an anonymous function here (the \(x) shorthand needs R 4.1 or later):

files <- filenames %>% map(\(x) read_csv(x) %>% ...)
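For reference, a fuller sketch of the anonymous function version, with the mutate step from the question written out and the results written back to disk the way the original loop did. The temporary folder, the sample files and threshold <- 50 are assumptions for the demo, and naming the list with set_names() is an optional extra that is not part of the question.

```r
library(readr)
library(dplyr)
library(purrr)

# Hypothetical setup: two small CSV files in a temporary folder
data_dir <- file.path(tempdir(), "lambda-demo")
dir.create(data_dir, showWarnings = FALSE)
write_csv(tibble(id = 1:2, target_col = c(10, 90)), file.path(data_dir, "x.csv"))
write_csv(tibble(id = 1:2, target_col = c(70, 20)), file.path(data_dir, "y.csv"))

threshold <- 50  # assumed cutoff for the demo
filenames <- list.files(path = data_dir, pattern = "\\.csv$", full.names = TRUE)

# Name the vector by file name so the resulting list is easier to index,
# then read and mutate each file inside one anonymous function
files <- filenames %>%
  set_names(basename) %>%
  map(\(x) read_csv(x, show_col_types = FALSE) %>%
        mutate(output_col = if_else(target_col >= threshold, "Positive", "Negative")))

# Write each modified tibble back to the path it came from
walk2(files, filenames, \(df, path) write_csv(df, path))

names(files)
```

walk2() is used for the write step because only the side effect matters there; map2() would also work but would return a list you do not need.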


by MR
