What gets passed to mutate() and modify_if()?

I’m fairly new to R but not new to programming itself. I am using a simplified example of my code here. I have a dataframe with three columns (doc_id, tag_list, single_tag), all of which are character columns.

df <- data.frame('doc_id' = c('A', 'B', 'C', 'D'),
                 'tag_list' = c("tagA1,tagA2,tagA3", "tagB1,tabB2", "tagC3, tagC4", "tagD1,tagD3,tagD4"),
                 'single_tag' = c("tagA2", NA, "tagC", NA)
                 )

Here is what I’ve been doing:
If the value of single_tag is NA, I try to replace it with the value in tag_list.

df %>% mutate(single_tag = ifelse(is.na(single_tag), tag_list, single_tag))

This works as expected, with the following output:

  doc_id          tag_list        single_tag
1      A tagA1,tagA2,tagA3             tagA2
2      B       tagB1,tabB2       tagB1,tabB2
3      C      tagC3, tagC4              tagC
4      D tagD1,tagD3,tagD4 tagD1,tagD3,tagD4

Now I want to do the same thing again, but this time I would like to replace single_tag with the first value in tag_list if single_tag is NA (expected output below). Here’s the code I tried.

df %>% mutate(single_tag = ifelse(is.na(single_tag), str_split(tag_list, ",")[[1]][1], single_tag))

Expected output (** added for emphasis) :

  doc_id          tag_list single_tag
1      A tagA1,tagA2,tagA3      tagA2
2      B       tagB1,tabB2      **tagB1**
3      C      tagC3, tagC4       tagC
4      D tagD1,tagD3,tagD4      **tagD1**

Actual output (** added for emphasis):

  doc_id          tag_list single_tag
1      A tagA1,tagA2,tagA3      tagA2
2      B       tagB1,tabB2      **tagA1**
3      C      tagC3, tagC4       tagC
4      D tagD1,tagD3,tagD4      **tagA1**

I also tried this with modify_if

df <- df %>% mutate(single_tag = modify_if(.,is.na(single_tag), ~ str_split(tag_list, ",")[[1]][1], .else=single_tag))

I get the following error:

Error in `mutate()`:
ℹ In argument: `single_tag = modify_if(...)`.
Caused by error in `where_if()`:
! length(.p) == length(.x) is not TRUE

I did some digging and found that the length of .x is 3 and of the predicate .p is 4. I have discovered that .p produces a vector of four logical values one for each row in df. .x I presume is only getting the values of the three columns in one row.

While I know some ways to achieve what I need, I want to understand what is going on in these two cases. I feel like I’m using a traditional way of thinking about how functions and arguments work, but somehow it’s different in this case (because of vectorisation, perhaps?). I tried reading the documentation and the code, but I am stumped.

I’m on R version 4.2.3 if that matters.

Any help would be appreciated!

>Solution :

Going through your examples in order:

library(tidyverse)

df %>% mutate(single_tag = ifelse(is.na(single_tag), str_split(tag_list, ",")[[1]][1], single_tag))

With this, it’s instructive to look at the output of str_split(tag_list, ","):

str_split(df$tag_list, ",")
[[1]]
[1] "tagA1" "tagA2" "tagA3"

[[2]]
[1] "tagB1" "tabB2"

[[3]]
[1] "tagC3"  " tagC4"

[[4]]
[1] "tagD1" "tagD3" "tagD4"

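As you can see, str_split() returns a list with one element per row. [[1]] selects only the first list element (the pieces of the first row’s tag_list), and [1] then selects its first piece, "tagA1". That single string is what ifelse() receives as its second argument, so it is recycled into every row where single_tag is NA, hence your result.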
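To make the recycling visible outside of mutate(), here is a small sketch that evaluates the same expression on plain vectors. The names first_piece and recycled are only illustrative; the data is copied from the question.

```r
library(stringr)

tag_list   <- c("tagA1,tagA2,tagA3", "tagB1,tabB2", "tagC3, tagC4", "tagD1,tagD3,tagD4")
single_tag <- c("tagA2", NA, "tagC", NA)

# [[1]] keeps only the pieces of the FIRST string; [1] keeps the first of those pieces.
first_piece <- str_split(tag_list, ",")[[1]][1]
print(first_piece)          # a single string, not one value per row
print(length(first_piece))

# ifelse() stretches that single string to the length of the test vector,
# so every NA position receives the same value.
recycled <- ifelse(is.na(single_tag), first_piece, single_tag)
print(recycled)
```

The printed vector should match the "actual output" column from the question, with "tagA1" in both NA positions.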

df <- df %>% mutate(single_tag = modify_if(.,is.na(single_tag), tag_list, .else=single_tag))

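The issue with this is that .x (the first input of modify_if()) is, per the documentation, meant to be a vector, but the . you pass refers to the whole dataframe that was piped into mutate(). A dataframe is a list of its columns, so length(.x) is 3 (one per column, not the values of one row), while the predicate is.na(single_tag) has one value per row, i.e. length 4. That mismatch is what the length(.p) == length(.x) check reports.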
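The two lengths in that error can be checked directly in base R. This is just a sketch using the dataframe from the question; n_x and n_p are illustrative names for the two sides of the length check.

```r
df <- data.frame(doc_id     = c("A", "B", "C", "D"),
                 tag_list   = c("tagA1,tagA2,tagA3", "tagB1,tabB2", "tagC3, tagC4", "tagD1,tagD3,tagD4"),
                 single_tag = c("tagA2", NA, "tagC", NA))

# A dataframe is a list of columns, so its length is the number of columns.
n_x <- length(df)
print(n_x)

# The predicate is built from one column, so it has one value per row.
n_p <- length(is.na(df$single_tag))
print(n_p)

# The two differ, which is the condition the error message complains about.
print(n_x == n_p)
```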

Solutions

  1. Use str_extract() to get everything before the first comma (^ is the start of the string, . is any character, * means match it any number of times, ? makes the match non-greedy (i.e. it doesn’t match the whole string if it doesn’t have to), and (?=,) is a lookahead for a comma). Note that this pattern returns NA for a tag_list that contains no comma.
df |> mutate(single_tag = ifelse(is.na(single_tag), str_extract(tag_list, "^.*?(?=,)"), single_tag))
  2. Split the tag_list column into an actual list column, then take the first element of each entry (using map_chr()):
df |> mutate(tag_list = str_split(tag_list, ","),
             single_tag = ifelse(is.na(single_tag), map_chr(tag_list, 1), single_tag))
  3. Use map2_chr() to handle one row at a time:
df |> mutate(single_tag = map2_chr(tag_list, single_tag, \(t, s) if (is.na(s)) str_split(t, ",")[[1]][1] else s))
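As a further option that is not one of the approaches listed above, dplyr's coalesce() can stand in for the ifelse(is.na(...)) pattern, since it fills each NA with the value at the same position in its next argument. The sketch below pairs it with the pattern "^[^,]+" (one or more non-comma characters from the start), which is my own substitution so that a tag_list without any comma still yields a value. The name result is illustrative.

```r
library(dplyr)
library(stringr)

df <- data.frame(doc_id     = c("A", "B", "C", "D"),
                 tag_list   = c("tagA1,tagA2,tagA3", "tagB1,tabB2", "tagC3, tagC4", "tagD1,tagD3,tagD4"),
                 single_tag = c("tagA2", NA, "tagC", NA))

# str_extract() is vectorised: it returns the first tag for every row.
# coalesce() keeps single_tag where it is present and falls back to that
# first tag only where single_tag is NA.
result <- df |>
  mutate(single_tag = coalesce(single_tag, str_extract(tag_list, "^[^,]+")))

print(result)
```

Because both functions work on whole columns, no per-row indexing such as [[1]][1] is needed, which avoids the recycling problem entirely.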