Consolidating rows using 'mutate_at' in R?

March 17, 2023

I have a dataset of U.S. cities that includes 400 variables (with both character and numeric values). I want each row to be a unique city, but the dataset contains duplicate rows for many cities. In these duplicate rows, the values that are missing in one row are non-missing in the other, and vice versa. The unique id for each city is geocode.

This is an example of what my data look like now:

geocode   x    y
8049     NA    7
8049      2   NA
2045     NA    9
2045      3   NA

And this is what I want it to look like:

geocode   x    y
8049      2    7
2045      3    9

I’ve seen similar questions asked before, but none of the existing solutions have worked for me. This is what I’ve tried:

vars <- ls(df)
vars <- vars[vars != "geocode" & vars != "State_name"]

max <- function(x) max(x, na.rm = TRUE)
df = df %>% group_by(geocode) %>%
   mutate_at(vars, max)

However, I get the following error:

Error in `mutate()`:
...
Caused by error in `max()`:
! unused argument (na.rm = T)

And if I remove the na.rm = T from the function, I get the following error:

Error: C stack usage is too close to the limit

Any idea what I can do?

I’ve also tried limiting the list var to only numeric variables and I still get the same error.

>Solution :

One way could be a quasi "rowwise coalesce":

library(dplyr)

coalesce_all_columns <- function(df) {
  return(coalesce(!!! as.list(df)))
}

df %>% 
  group_by(geocode) %>%
  summarise_all(coalesce_all_columns)

  geocode     x     y
    <int> <int> <int>
1    2045     3     9
2    8049     2     7

Another way could be using pivoting:

library(dplyr)
library(tidyr)

df %>% 
  pivot_longer(-geocode) %>% 
  na.omit() %>% 
  pivot_wider(names_from = name, values_from = value)

geocode     y     x
    <int> <int> <int>
1    8049     7     2
2    2045     9     3