Additional arguments to purrr::map don't work as expected

I'm using the purrr::map function to iterate over several columns and tidy the results. As a short example, consider the following code:

library(tidymodels)
library(broom)

> penguins %>% 
+   select(where(is.numeric)) %>% 
+   map(\(x) lm(x ~ penguins$species, .)) %>% 
+   map_df(broom::tidy, .id = "var")
# A tibble: 12 × 6
   var               term                       estimate std.error statistic   p.value
   <chr>             <chr>                         <dbl>     <dbl>     <dbl>     <dbl>
 1 bill_length_mm    (Intercept)                 38.8       0.241    161.    2.47e-322
 2 bill_length_mm    penguins$speciesChinstrap   10.0       0.432     23.2   4.23e- 72
 3 bill_length_mm    penguins$speciesGentoo       8.71      0.360     24.2   5.33e- 76
 4 bill_depth_mm     (Intercept)                 18.3       0.0912   201.    0        
 5 bill_depth_mm     penguins$speciesChinstrap    0.0742    0.164      0.453 6.50e-  1
 6 bill_depth_mm     penguins$speciesGentoo      -3.36      0.136    -24.7   7.93e- 78
 7 flipper_length_mm (Intercept)                190.        0.540    351.    0        
 8 flipper_length_mm penguins$speciesChinstrap    5.87      0.970      6.05  3.79e-  9
 9 flipper_length_mm penguins$speciesGentoo      27.2       0.807     33.8   1.84e-110
10 body_mass_g       (Intercept)               3701.       37.6       98.4   2.49e-251
11 body_mass_g       penguins$speciesChinstrap   32.4      67.5        0.480 6.31e-  1
12 body_mass_g       penguins$speciesGentoo    1375.       56.1       24.5   5.42e- 77

This works as expected.

However, when I map functions with additional arguments, I usually use an anonymous function, as suggested in the documentation.
When I try that in this example, changing only the last line of the previous code, I get the tidy table with all the regression results, but without the "var" column that tells me which variable was used in each regression.

> penguins %>% 
+   select(where(is.numeric)) %>% 
+   map(\(x) lm(x ~ penguins$species, .)) %>% 
+   map_df(\(x) broom::tidy(x, .id = "var"))
# A tibble: 12 × 5
   term                       estimate std.error statistic   p.value
   <chr>                         <dbl>     <dbl>     <dbl>     <dbl>
 1 (Intercept)                 38.8       0.241    161.    2.47e-322
 2 penguins$speciesChinstrap   10.0       0.432     23.2   4.23e- 72
 3 penguins$speciesGentoo       8.71      0.360     24.2   5.33e- 76
 4 (Intercept)                 18.3       0.0912   201.    0        
 5 penguins$speciesChinstrap    0.0742    0.164      0.453 6.50e-  1
 6 penguins$speciesGentoo      -3.36      0.136    -24.7   7.93e- 78
 7 (Intercept)                190.        0.540    351.    0        
 8 penguins$speciesChinstrap    5.87      0.970      6.05  3.79e-  9
 9 penguins$speciesGentoo      27.2       0.807     33.8   1.84e-110
10 (Intercept)               3701.       37.6       98.4   2.49e-251
11 penguins$speciesChinstrap   32.4      67.5        0.480 6.31e-  1
12 penguins$speciesGentoo    1375.       56.1       24.5   5.42e- 77
> penguins %>% 
+   select(where(is.numeric)) %>% 
+   map(\(x) lm(x ~ penguins$species, .)) %>% 
+   map_df(~ broom::tidy(.x, .id = "var"))
# A tibble: 12 × 5
   term                       estimate std.error statistic   p.value
   <chr>                         <dbl>     <dbl>     <dbl>     <dbl>
 1 (Intercept)                 38.8       0.241    161.    2.47e-322
 2 penguins$speciesChinstrap   10.0       0.432     23.2   4.23e- 72
 3 penguins$speciesGentoo       8.71      0.360     24.2   5.33e- 76
 4 (Intercept)                 18.3       0.0912   201.    0        
 5 penguins$speciesChinstrap    0.0742    0.164      0.453 6.50e-  1
 6 penguins$speciesGentoo      -3.36      0.136    -24.7   7.93e- 78
 7 (Intercept)                190.        0.540    351.    0        
 8 penguins$speciesChinstrap    5.87      0.970      6.05  3.79e-  9
 9 penguins$speciesGentoo      27.2       0.807     33.8   1.84e-110
10 (Intercept)               3701.       37.6       98.4   2.49e-251
11 penguins$speciesChinstrap   32.4      67.5        0.480 6.31e-  1
12 penguins$speciesGentoo    1375.       56.1       24.5   5.42e- 77

What is the reason for this behavior?

Solution:

The problem is that .id = "var" is not an argument of broom::tidy() but of purrr::map_df(). Under the hood, purrr::map_df() works like purrr::map() and returns a list, then calls dplyr::bind_rows() on that list to create a data frame. The .id argument is passed to bind_rows(), which turns the names of the list into a column with the name given in .id. When you move .id inside the anonymous function, it goes to broom::tidy() instead, and broom::tidy() discards it unless the tidying method has such an argument, so map_df() never receives it. This is why your column is missing.
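
A minimal sketch of the fix, assuming penguins is the dataset from the modeldata package (the one attached by tidymodels) and that purrr 1.0.0 or later is installed for the list_rbind() variant. The names models, res and res2 are only illustrative. The point is that .id stays outside the anonymous function, so it is matched by map_df() rather than handed to broom::tidy():

```r
library(dplyr)
library(purrr)
library(broom)

# Assumption: penguins is the copy shipped with modeldata (attached by tidymodels).
penguins <- modeldata::penguins

# What .id does in bind_rows(): the list names become a column.
bind_rows(list(a = tibble(x = 1), b = tibble(x = 2)), .id = "var")

# Fit one model per numeric column; the result is a named list of lm objects.
models <- penguins %>%
  select(where(is.numeric)) %>%
  map(\(x) lm(x ~ penguins$species))

# .id is an argument of map_df(), not of the anonymous function.
res <- map_df(models, \(m) broom::tidy(m), .id = "var")

# Alternative with purrr >= 1.0.0, where map_df() is superseded.
res2 <- models %>%
  map(\(m) broom::tidy(m)) %>%
  list_rbind(names_to = "var")
```

Both res and res2 should carry a "var" column holding the names of the numeric columns. Any genuine argument of the tidying method, such as conf.int = TRUE for lm models, still belongs inside the anonymous function.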
