What gets passed to the mutate and modify?

September 29, 2023

I’m fairly new to R but not new to programming. Here is a simplified example of my code. I have a dataframe with three columns (doc_id, tag_list, single_tag), all of which are character vectors.

library(tidyverse)

df <- data.frame('doc_id' = c('A', 'B', 'C', 'D'),
                 'tag_list' = c("tagA1,tagA2,tagA3", "tagB1,tabB2", "tagC3, tagC4", "tagD1,tagD3,tagD4"),
                 'single_tag' = c("tagA2", NA, "tagC", NA)
                 )

Here is what I’ve been doing:
If the value of single_tag is NA, I try to replace it with the value in tag_list.

df %>% mutate(single_tag = ifelse(is.na(single_tag), tag_list, single_tag))

This works as expected, with the following output:

  doc_id          tag_list        single_tag
1      A tagA1,tagA2,tagA3             tagA2
2      B       tagB1,tabB2       tagB1,tabB2
3      C      tagC3, tagC4              tagC
4      D tagD1,tagD3,tagD4 tagD1,tagD3,tagD4

Now I want to do the same thing again, but this time I would like to replace single_tag with the first value in tag_list if single_tag is NA (expected output below). Here’s the code I tried:

df %>% mutate(single_tag = ifelse(is.na(single_tag), str_split(tag_list, ",")[[1]][1], single_tag))

Expected output (** added for emphasis) :

  doc_id          tag_list single_tag
1      A tagA1,tagA2,tagA3      tagA2
2      B       tagB1,tabB2      **tagB1**
3      C      tagC3, tagC4       tagC
4      D tagD1,tagD3,tagD4      **tagD1**

Actual output (** added for emphasis):

  doc_id          tag_list single_tag
1      A tagA1,tagA2,tagA3      tagA2
2      B       tagB1,tabB2      **tagA1**
3      C      tagC3, tagC4       tagC
4      D tagD1,tagD3,tagD4      **tagA1**

I also tried this with modify_if:

df <- df %>% mutate(single_tag = modify_if(.,is.na(single_tag), ~ str_split(tag_list, ",")[[1]][1], .else=single_tag))

I get the following error:

Error in `mutate()`:
ℹ In argument: `single_tag = modify_if(...)`.
Caused by error in `where_if()`:
! length(.p) == length(.x) is not TRUE

I did some digging and found that the length of .x is 3 and the length of the predicate .p is 4. .p is a vector of four logical values, one for each row in df. I presume .x is only getting the values of the three columns in one row.

While I know other ways to achieve what I need, I want to understand what is going on in these two cases. I feel like I’m applying a traditional way of thinking about how functions and arguments work, but somehow it’s different in this case (because of vectorisation, perhaps?). I tried reading the documentation and the code, but I am stumped.

I’m on R version 4.2.3 if that matters.

Any help would be appreciated!

>Solution :

Going through your examples in order:

library(tidyverse)

df %>% mutate(single_tag = ifelse(is.na(single_tag), str_split(tag_list, ",")[[1]][1], single_tag))

With this, it’s instructive to look at the output of str_split(tag_list, ","):

str_split(df$tag_list, ",")
[[1]]
[1] "tagA1" "tagA2" "tagA3"

[[2]]
[1] "tagB1" "tabB2"

[[3]]
[1] "tagC3"  " tagC4"

[[4]]
[1] "tagD1" "tagD3" "tagD4"

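As you can see, str_split() is vectorised: it splits the whole column at once and returns a list with one element per row. [[1]] then selects the first list element (the tags from row 1) and [1] takes its first entry, "tagA1". ifelse() recycles that single value for every row where single_tag is NA, hence your result.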
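To see the recycling in isolation, here is a minimal sketch using the data from the question. It uses base R's strsplit() in place of str_split() so that it runs without loading any packages (that substitution is my assumption; for a plain "," the split is the same), and the variable names are only for illustration.

```r
# Same columns as in the question, as plain vectors.
tag_list   <- c("tagA1,tagA2,tagA3", "tagB1,tabB2", "tagC3, tagC4", "tagD1,tagD3,tagD4")
single_tag <- c("tagA2", NA, "tagC", NA)

# The whole column is split at once: a list with one element per row.
pieces <- strsplit(tag_list, ",")

# [[1]] picks row 1's tags and [1] picks the first of them: one single string.
first_of_first <- pieces[[1]][1]

# ifelse() recycles that single string to the length of the test,
# so every NA row receives the first tag of row 1.
recycled <- ifelse(is.na(single_tag), first_of_first, single_tag)

# Taking the first tag of every row gives a vector of length 4 instead,
# so each NA row receives its own first tag.
first_of_each <- vapply(pieces, function(p) p[1], character(1))
per_row <- ifelse(is.na(single_tag), first_of_each, single_tag)
```

recycled reproduces the "actual output" from the question, while per_row matches the "expected output".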

df <- df %>% mutate(single_tag = modify_if(.,is.na(single_tag), ~ str_split(tag_list, ",")[[1]][1], .else=single_tag))

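The issue with this is that .x (the first argument of modify_if()) is, per the documentation, meant to be a vector, but inside the pipe . is the whole dataframe. A dataframe is a list of its columns, so length(.x) is 3 (one per column), while the predicate is.na(single_tag) has length 4 (one per row). That is exactly the mismatch the error reports.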
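The two lengths can be checked directly in base R. This is only a small sketch on the question's data; n_columns and n_rows are names I made up for illustration.

```r
# The dataframe from the question.
df <- data.frame(
  doc_id     = c("A", "B", "C", "D"),
  tag_list   = c("tagA1,tagA2,tagA3", "tagB1,tabB2", "tagC3, tagC4", "tagD1,tagD3,tagD4"),
  single_tag = c("tagA2", NA, "tagC", NA)
)

# A dataframe is a list of columns, so its length is the number of columns.
n_columns <- length(df)

# The predicate is computed on one column, so it has one value per row.
n_rows <- length(is.na(df$single_tag))

print(c(columns = n_columns, rows = n_rows))
```

Since modify_if() needs a logical .p to be the same length as .x, passing the dataframe as .x with a per-row predicate cannot work here.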

Solutions

Use str_extract() to get everything before the first comma. In the pattern, ^ is the start of the string, . is any character, * means match it any number of times, ? makes the match non-greedy (i.e. it doesn’t match the whole string if it doesn’t have to), and (?=,) is a lookahead for a comma. The extracted tag goes in the "yes" branch of ifelse() and the existing single_tag in the "no" branch:

df |> mutate(single_tag = ifelse(is.na(single_tag), str_extract(tag_list, "^.*?(?=,)"), single_tag))
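One thing to watch with the lookahead pattern is that it needs a comma to match at all. The sketch below assumes stringr is installed; the value "solo" is made up to represent a tag_list with a single tag, and the second pattern is an alternative I am suggesting, not something from the question.

```r
library(stringr)

# Two values from the question plus a made-up one without a comma.
tags <- c("tagA1,tagA2,tagA3", "tagC3, tagC4", "solo")

# Lazy match from the start up to (not including) the first comma.
# Without a comma there is nothing for the lookahead to find.
lazy <- str_extract(tags, "^.*?(?=,)")

# Assumed alternative: one or more non-comma characters from the start,
# which also works when the string holds a single tag.
no_comma_ok <- str_extract(tags, "^[^,]+")

print(lazy)
print(no_comma_ok)
```

If every tag_list is guaranteed to contain a comma, the two patterns should give the same result.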

Split the tag_list column into an actual list column, then take the first element of each entry (using map_chr()):

df |> mutate(tag_list = str_split(tag_list, ","),
             single_tag = ifelse(is.na(single_tag), map_chr(tag_list, 1), single_tag))

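Use map2_chr() to handle one row at a time. Here t is a single string, so [[1]] is the right list element and [1] picks its first tag:

df |> mutate(single_tag = map2_chr(tag_list, single_tag, \(t, s) ifelse(is.na(s), str_split(t, ",")[[1]][1], s)))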

mutate

by MR

Published September 29, 2023
