Passing multiple column names to R function

Following on from my previous question, I’m trying to create a function using tidyr::complete that can fill in a grouped/summarised tibble with missing dates, with NA for relevant values, as an intermediate step before further calculations.

I’ve almost got the function working, but am having trouble with passing column names as arguments.

For reference, more info on what the function is trying to do is below. What I have so far is:

complete_dates <- function(data, datevar, grouping_vars) {
  calendar <- expand_grid("{{datevar}}" := seq(min(pull(data %>% select({{datevar}}))),  # Extract date vector from data 
                                               max(pull(data %>% select({{datevar}}))),by="1 day"))
  calendar %>% 
    left_join(data) %>% 
    ungroup() %>%
    complete({{datevar}}, {{grouping_vars}}) %>%
    filter(!if_any({{grouping_vars}}, is.na))
}

The problem arises in the line complete({{datevar}}, {{grouping_vars}}). As the name implies, I want to be able to pass multiple column names to include in the complete step. (It’s called grouping_vars because it corresponds to the columns used for the original group_by %>% summarise in the first place.)

But while the syntax above works with a single column name, it doesn’t work with a character vector of column names, e.g. c("GroupA", "GroupB").

I’ve read various SO articles about passing column names to R functions but I’m still an R noob and don’t fully grasp the dplyr syntax, even after reading the relevant blog post. Can anyone advise on the syntax I need please?


Info on function in question:

Basically, I’m starting with something like this:

grouped <- data %>% group_by(Date, Group) %>% summarise(mean = mean(Value))
head(grouped)
# A tibble: 6 × 3
# Groups:   Date [4]
  Date       Group  mean
  <date>     <fct> <dbl>
1 2021-02-18 A      37.4
2 2021-02-19 B      25.5
3 2021-02-19 A      26.1
4 2021-02-22 B      34.2
5 2021-02-22 A      26.4
6 2021-02-23 B      34.2

And want to get something like this:

   Date       Group  mean
   <date>     <fct> <dbl>
 1 2021-02-18 B      NA  
 2 2021-02-18 A      37.4
 3 2021-02-19 B      25.5
 4 2021-02-19 A      26.1
 5 2021-02-20 B      NA  
 6 2021-02-20 A      NA  
 7 2021-02-21 B      NA  
 8 2021-02-21 A      NA  
 9 2021-02-22 B      34.2
10 2021-02-22 A      26.4

where the missing dates are now there, with relevant grouping variables, but with values of NA.

Example data:

grouped <- structure(list(Date = structure(c(18676, 18677, 18677, 18680, 
18680, 18681, 18681), class = "Date"), Group = structure(c(2L, 
1L, 2L, 1L, 2L, 1L, 2L), levels = c("B", "A"), class = "factor"), 
    mean = c(37.43, 25.54, 26.13, 34.1966666666667, 26.4211111111111, 
    34.216, 22.8064285714286)), class = c("grouped_df", "tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -7L), groups = structure(list(
    Date = structure(c(18676, 18677, 18680, 18681), class = "Date"), 
    .rows = structure(list(1L, 2:3, 4:5, 6:7), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), row.names = c(NA, -4L), class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE))

Solution:

Try calling complete() directly, using full_seq() to fill in the missing dates:

library(dplyr)
library(tidyr)
grouped %>%
   ungroup %>%
   complete(Date = full_seq(Date, period = 1), Group) 

Output:

# A tibble: 12 × 3
   Date       Group  mean
   <date>     <fct> <dbl>
 1 2021-02-18 B      NA  
 2 2021-02-18 A      37.4
 3 2021-02-19 B      25.5
 4 2021-02-19 A      26.1
 5 2021-02-20 B      NA  
 6 2021-02-20 A      NA  
 7 2021-02-21 B      NA  
 8 2021-02-21 A      NA  
 9 2021-02-22 B      34.2
10 2021-02-22 A      26.4
11 2021-02-23 B      34.2
12 2021-02-23 A      22.8
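The gap-filling above comes from tidyr::full_seq(), which expands a vector to every value between its minimum and maximum at the given period (here, 1 day). A minimal sketch of that step alone, where the `dates` and `filled` names are only illustrative:

```r
library(tidyr)

# Three dates with a gap (the 20th and 21st are missing)
dates <- as.Date(c("2021-02-18", "2021-02-19", "2021-02-22"))

# full_seq() returns every day from the earliest to the latest date
filled <- full_seq(dates, period = 1)
filled
```

Inside complete(), naming the argument (Date = full_seq(Date, period = 1)) makes complete() use this full sequence instead of only the dates that already appear in the data.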

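To wrap this in a function, pass the date column unquoted and the grouping columns as a character vector; rlang::syms() turns the names into symbols and !!! splices them into complete() as separate columns: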

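complete_dates <- function(data, datevar, grouping_vars) {
  data %>%
    ungroup() %>%
    # "{{datevar}}" := keeps the original column name for the filled date sequence
    complete("{{datevar}}" := full_seq({{datevar}}, period = 1),
             # convert the character vector to symbols and splice them in
             !!!rlang::syms(grouping_vars))
}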

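and then call it with the grouping columns as a character vector, e.g. for columns named GroupA and GroupB:

complete_dates(grouped, Date, c("GroupA", "GroupB"))

For the example data above, which has a single grouping column, the call would be complete_dates(grouped, Date, "Group").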
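As a rough check of the multi-column case, here is a self-contained sketch. The tibble `toy` and its GroupA/GroupB values are hypothetical and not part of the original data; the function is the one from this answer.

```r
library(dplyr)
library(tidyr)

# Function from the answer: date column unquoted, grouping columns as strings
complete_dates <- function(data, datevar, grouping_vars) {
  data %>%
    ungroup() %>%
    complete("{{datevar}}" := full_seq({{datevar}}, period = 1),
             !!!rlang::syms(grouping_vars))
}

# Hypothetical data: two dates with a one-day gap, two grouping columns
toy <- tibble(
  Date   = as.Date(c("2021-02-18", "2021-02-20")),
  GroupA = c("x", "y"),
  GroupB = c("p", "q"),
  mean   = c(1, 2)
)

result <- complete_dates(toy, Date, c("GroupA", "GroupB"))
result
```

With 3 dates after filling and 2 values in each grouping column, the result should hold every Date/GroupA/GroupB combination, with mean set to NA wherever the combination was absent from `toy`.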
