map inside a mutate exploding number of rows in output tibble

Say I have data like this:

d <- tibble::tribble(
  ~sit_comfy_sofa_1, ~sit_comfy_sofa_2, ~sit_comfy_sofa_3, ~sit_comfy_sofa_4, ~sit_comfy_couch_1, ~sit_comfy_couch_2, ~sit_comfy_couch_3, ~sit_comfy_couch_4, ~sit_comfy_settee_1, ~sit_comfy_settee_2, ~sit_comfy_settee_3, ~sit_comfy_settee_4,
                 1L,                0L,                0L,                0L,                 0L,                 1L,                 0L,                 0L,                  0L,                  0L,                  1L,                  0L,
                 0L,                0L,                0L,                1L,                 0L,                 0L,                 0L,                 1L,                  0L,                  1L,                  0L,                  0L,
                 0L,                1L,                0L,                0L,                 1L,                 0L,                 0L,                 0L,                  1L,                  0L,                  0L,                  0L,
                 0L,                0L,                1L,                0L,                 0L,                 0L,                 1L,                 0L,                  0L,                  0L,                  0L,                  1L
  )

This tibble has three ‘categories’ of columns: one for _sofa_, one for _couch_, and one for _settee_. Within each category, I’m trying to construct a new variable whose value depends on which of that category’s columns equals 1.

I wrote this function to attempt that:

cleaning_fcn <- function(.df, .x){
  .df %>% 
    mutate(!!sym(paste0("explain_", .x)) := case_when(
      !!sym(paste0("sit_comfy_", .x ,"_1")) == 1 ~ "Just better",
      !!sym(paste0("sit_comfy_", .x, "_2")) == 1 ~ "Nice shape",
      !!sym(paste0("sit_comfy_", .x ,"_3")) == 1 ~ "Like the color",
      !!sym(paste0("sit_comfy_", .x ,"_4")) == 1 ~ "Nice material"),
      !!sym(paste0("explain_", .x)) := factor(!!sym(paste0("explain_", .x)), 
                                               levels = c("Just better", "Nice shape",
                                                          "Like the color", "Nice material")))
}

However, when I call it, I end up with a tibble that has 3x as many rows as the original tibble.

require(tidyverse)

purrr::map_dfr(
    .x = tidyselect::all_of(c("sofa", "couch", "settee")),
    .f = ~ cleaning_fcn(.df = d, .x))

Can anyone see where I’m going wrong?

Essentially, I want to achieve the same as the code below but ideally it’d be a function (and just generally with a lot less repetition):

d <- d %>% 
  mutate(explain_sofa = case_when(
    sit_comfy_sofa_1 == 1 ~ "Just better",
    sit_comfy_sofa_2 == 1 ~ "Nice shape",
    sit_comfy_sofa_3 == 1 ~ "Like the color",
    sit_comfy_sofa_4 == 1 ~ "Nice material"),
    explain_sofa = factor(explain_sofa, levels = c("Just better", "Nice shape",
                                                   "Like the color", "Nice material")))
d <- d %>% 
  mutate(explain_couch = case_when(
    sit_comfy_couch_1 == 1 ~ "Just better",
    sit_comfy_couch_2 == 1 ~ "Nice shape",
    sit_comfy_couch_3 == 1 ~ "Like the color",
    sit_comfy_couch_4 == 1 ~ "Nice material"),
    explain_couch = factor(explain_couch, levels = c("Just better", "Nice shape",
                                                   "Like the color", "Nice material")))

d <- d %>% 
  mutate(explain_settee = case_when(
    sit_comfy_settee_1 == 1 ~ "Just better",
    sit_comfy_settee_2 == 1 ~ "Nice shape",
    sit_comfy_settee_3 == 1 ~ "Like the color",
    sit_comfy_settee_4 == 1 ~ "Nice material"),
    explain_settee = factor(explain_settee, levels = c("Just better", "Nice shape",
                                                    "Like the color", "Nice material")))

>Solution :

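Using map_dfr, you create a list of data frames, one for each of your categories, which are then bound by rows. Hence you end up with a data frame with 3 times the number of rows. One option would be to use purrr::reduce instead, which passes the result of each call to cleaning_fcn on as the .df argument of the next call, so the new columns accumulate on a single data frame: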

library(tidyverse)

purrr::reduce(.x = c("sofa", "couch", "settee"), .f = cleaning_fcn, .init = d)
#> # A tibble: 4 × 15
#>   sit_comfy_sofa_1 sit_comfy_sofa_2 sit_comfy_sofa_3 sit_comfy_sofa_4
#>              <int>            <int>            <int>            <int>
#> 1                1                0                0                0
#> 2                0                0                0                1
#> 3                0                1                0                0
#> 4                0                0                1                0
#> # ℹ 11 more variables: sit_comfy_couch_1 <int>, sit_comfy_couch_2 <int>,
#> #   sit_comfy_couch_3 <int>, sit_comfy_couch_4 <int>, sit_comfy_settee_1 <int>,
#> #   sit_comfy_settee_2 <int>, sit_comfy_settee_3 <int>,
#> #   sit_comfy_settee_4 <int>, explain_sofa <fct>, explain_couch <fct>,
#> #   explain_settee <fct>
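
To make the difference between the two approaches concrete, here is a minimal sketch on toy data. The names toy and add_flag are hypothetical stand-ins for d and cleaning_fcn, not part of the question. It only compares row counts: map_dfr stacks one full copy of the data per category, while reduce keeps feeding the same data frame through the function.

```r
library(dplyr)
library(purrr)

# Toy data: two hypothetical categories ("a" and "b"), two rows
toy <- tibble::tibble(a_1 = c(1L, 0L), b_1 = c(0L, 1L))

# Hypothetical helper: adds one new column for the given category
add_flag <- function(.df, category) {
  old_col <- paste0(category, "_1")
  new_col <- paste0("flag_", category)
  .df %>%
    mutate(!!new_col := .data[[old_col]] == 1L)
}

# map_dfr: each call starts again from `toy`; the results are bound by rows,
# so columns missing from one copy are filled with NA in the stacked output
stacked <- map_dfr(c("a", "b"), ~ add_flag(toy, .x))

# reduce: the output of one call becomes the input of the next
accumulated <- reduce(c("a", "b"), add_flag, .init = toy)

nrow(stacked)      # 4 (two stacked copies of the 2-row tibble)
nrow(accumulated)  # 2 (same rows as `toy`, with both new columns)
```

In the stacked result, flag_a is NA in the rows that came from the "b" call and vice versa, which is another sign that separate copies were bound together rather than columns added to one data frame.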