Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Consolidating rows using 'mutate_at' in R?

I have a dataset of U.S. cities that includes 400 variables (with both character and numeric values). I want each row to be a unique city, but the dataset contains duplicate rows for many cities. In these duplicate rows, the values that are missing in one row are non-missing in the other, and vice versa. The unique id for each city is geocode.

This is an example of what my data look like now:

geocode   x    y
8049     NA    7
8049      2   NA
2045     NA    9
2045      3   NA 

And this is what I want it to look like:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

geocode   x    y
8049      2    7
2045      3    9

I’ve seen similar questions asked before, but none of the existing solutions have worked for me. This is what I’ve tried:

vars <- ls(df)
vars <- vars[vars != "geocode" & vars != "State_name"]

max <- function(x) max(x, na.rm = TRUE)
df = df %>% group_by(geocode) %>%
   mutate_at(vars, max)

However, I get the following error:

Error in `mutate()`:
...
Caused by error in `max()`:
! unused argument (na.rm = T)

And if I remove the na.rm = T from the function, I get the following error:

Error: C stack usage is too close to the limit

Any idea what I can do?

I’ve also tried limiting the list var to only numeric variables and I still get the same error.

>Solution :

One way could be a quasi "rowwise coalesce":

library(dplyr)

coalesce_all_columns <- function(df) {
  return(coalesce(!!! as.list(df)))
}

df %>% 
  group_by(geocode) %>%
  summarise_all(coalesce_all_columns)

  geocode     x     y
    <int> <int> <int>
1    2045     3     9
2    8049     2     7

Another way could be using pivoting:

library(dplyr)
library(tidyr)

df %>% 
  pivot_longer(-geocode) %>% 
  na.omit() %>% 
  pivot_wider(names_from = name, values_from = value)

geocode     y     x
    <int> <int> <int>
1    8049     7     2
2    2045     9     3
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading