I have a list of data frames. Each data frame in the list looks something like this: the columns x_, y_, and t_ have the same value in every row, and the only things that differ are the var1, var2, and var3 values:
| x_ | y_ | t_ | var1 | var2 | var3 |
|---|---|---|---|---|---|
| 1 | 1 | 1 | 5 | NA | NA |
| 1 | 1 | 1 | NA | 9 | NA |
| 1 | 1 | 1 | NA | NA | 20 |
Here is the code for an example of the data frame above:
df <- data.frame(x_ = c(1,1,1),
                 y_ = c(1,1,1),
                 t_ = c(1,1,1),
                 var1 = c(5, NA, NA),
                 var2 = c(NA, 9, NA),
                 var3 = c(NA, NA, 20))
I would like to get the data frames to look something like this, where I can condense the data into a single row:
| x_ | y_ | t_ | var1 | var2 | var3 |
|---|---|---|---|---|---|
| 1 | 1 | 1 | 5 | 9 | 20 |
Is there a good way to do this?
>Solution :
One potential option is to fill the NAs in each column (down, then up) and then remove the duplicate rows, e.g.
library(tidyverse)
library(vctrs)
df <- data.frame(x_ = c(1,1,1),
                 y_ = c(1,1,1),
                 t_ = c(1,1,1),
                 var1 = c(5, NA, NA),
                 var2 = c(NA, 9, NA),
                 var3 = c(NA, NA, 20))
df2 <- df %>%
  mutate(across(everything(),
                ~vec_fill_missing(.x, direction = "downup")))
df2
#>   x_ y_ t_ var1 var2 var3
#> 1  1  1  1    5    9   20
#> 2  1  1  1    5    9   20
#> 3  1  1  1    5    9   20
df2 %>%
  distinct()
#>   x_ y_ t_ var1 var2 var3
#> 1  1  1  1    5    9   20
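Since the question mentions a list of data frames, here is a minimal sketch of applying the same fill-then-distinct step to every element with `lapply()`. The helper name `collapse_rows` and the list `df_list` (with elements `a` and `b`) are made up for illustration and are not from the question.

```r
library(dplyr)
library(vctrs)

# Hypothetical helper: fill NAs down then up in every column, then drop duplicate rows
collapse_rows <- function(d) {
  d %>%
    mutate(across(everything(),
                  ~vec_fill_missing(.x, direction = "downup"))) %>%
    distinct()
}

# Hypothetical list of data frames shaped like the one in the question
df_list <- list(
  a = data.frame(x_ = c(1,1,1), y_ = c(1,1,1), t_ = c(1,1,1),
                 var1 = c(5, NA, NA), var2 = c(NA, 9, NA), var3 = c(NA, NA, 20)),
  b = data.frame(x_ = c(2,2,2), y_ = c(3,3,3), t_ = c(4,4,4),
                 var1 = c(NA, 1, NA), var2 = c(2, NA, NA), var3 = c(NA, NA, 3))
)

# Apply the helper to each data frame; the result is a list of the same length
condensed <- lapply(df_list, collapse_rows)
```

Each element of `condensed` should then hold a single row, provided every variable has at most one distinct non-NA value within its data frame.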
If a column is NA in every row, there is nothing to fill from, so that column will still be NA in the final distinct row:
df3 <- data.frame(x_ = c(1,1,1),
                  y_ = c(1,1,1),
                  t_ = c(1,1,1),
                  var1 = c(5, NA, NA),
                  var2 = c(NA, 9, NA),
                  var3 = c(NA, NA, NA))
df4 <- df3 %>%
  mutate(across(everything(),
                ~vec_fill_missing(.x, direction = "downup")))
df4
#>   x_ y_ t_ var1 var2 var3
#> 1  1  1  1    5    9   NA
#> 2  1  1  1    5    9   NA
#> 3  1  1  1    5    9   NA
df4 %>%
  distinct()
#>   x_ y_ t_ var1 var2 var3
#> 1  1  1  1    5    9   NA
Created on 2023-03-17 with reprex v2.0.2
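One caveat: the fill above runs over the whole data frame, so it relies on x_, y_, and t_ being constant, as they are in the question. If a data frame could contain more than one x_/y_/t_ combination, values would be filled across combinations. A possible alternative, sketched here under that assumption, is to group by the key columns and keep the first non-NA value per group. The data frame `df5` and the helper `first_non_na` are invented for this example.

```r
library(dplyr)

# Hypothetical data with two different x_/y_/t_ combinations
df5 <- data.frame(x_ = c(1, 1, 2, 2),
                  y_ = c(1, 1, 2, 2),
                  t_ = c(1, 1, 2, 2),
                  var1 = c(5, NA, NA, 7),
                  var2 = c(NA, 9, 3, NA))

# Hypothetical helper: first non-NA value, or NA if the whole vector is NA
first_non_na <- function(v) v[!is.na(v)][1]

# One row per x_/y_/t_ combination
collapsed <- df5 %>%
  group_by(x_, y_, t_) %>%
  summarise(across(everything(), first_non_na), .groups = "drop")
```

Note that this keeps only the first non-NA value, so if a variable has several different non-NA values within one group, the later ones are silently dropped.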