I need to create a blank version of a dataset, clearing all the values while preserving the column names and, importantly, the classes of the variables.
Here's some toy data: three variables with three different classes.
library(dplyr)

df <- data.frame(x = rnorm(5),
                 y = factor(letters[5:1]),
                 z = c(1:2, NA, 4:5))
glimpse(df)
# Rows: 5
# Columns: 3
# $ x <dbl> -0.24530142, -0.05332072, 0.12387791, -0.26148671, -0.53779766
# $ y <fct> e, d, c, b, a
# $ z <int> 1, 2, NA, 4, 5
Now, when I try to clear the values using mutate() and across() in dplyr…
df %>%
  mutate(across(everything(), ~ NA)) -> blankDF
blankDF
#    x  y  z
# 1 NA NA NA
# 2 NA NA NA
# 3 NA NA NA
# 4 NA NA NA
# 5 NA NA NA
Looks good, but
glimpse(blankDF)
# Rows: 5
# Columns: 3
# $ x <lgl> NA, NA, NA, NA, NA
# $ y <lgl> NA, NA, NA, NA, NA
# $ z <lgl> NA, NA, NA, NA, NA
It has dropped the original classes of all the variables (including the factor levels of y), turning every column into a logical vector.
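For context, a likely explanation: a bare NA in R is itself a length-one logical vector, so ~ NA hands mutate() a logical value, which it recycles into a logical column. The base R sketch below shows the classes of the built-in missing-value constants. It also shows that there is no factor-typed constant, although subsetting an existing factor with NA (the hypothetical f here) keeps its levels.

```r
# A bare NA is a length-one logical vector, so `~ NA` gives mutate()
# a logical value that it recycles into a logical column.
class(NA)             # "logical"

# Typed missing constants exist for the basic atomic types...
class(NA_real_)       # "numeric"
class(NA_integer_)    # "integer"
class(NA_character_)  # "character"

# ...but not for factors, which also need their levels attribute.
# Indexing with a logical NA (recycled to full length) keeps the levels.
f <- factor(letters[5:1])   # hypothetical example factor
blank_f <- f[NA]
levels(blank_f)       # "a" "b" "c" "d" "e"
```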
Can someone advise me on how to get the blank dataset while keeping the original classes?
A tidyverse solution would be nice, but any solution is appreciated.
Solution:
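You could replace all values across the columns by passing each column (.x) to na_if() as both arguments. na_if(x, y) replaces with NA every element of x that equals y and returns a vector of the same type as x, so comparing a column with itself blanks every value while keeping its class, like this: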
library(dplyr)
glimpse(df)
#> Rows: 5
#> Columns: 3
#> $ x <dbl> -0.2006935, 1.3461746, -0.1433400, -0.8983886, -0.3190282
#> $ y <fct> e, d, c, b, a
#> $ z <int> 1, 2, NA, 4, 5
df_output = df %>%
  mutate(across(everything(), ~ na_if(.x, .x)))
glimpse(df_output)
#> Rows: 5
#> Columns: 3
#> $ x <dbl> NA, NA, NA, NA, NA
#> $ y <fct> NA, NA, NA, NA, NA
#> $ z <int> NA, NA, NA, NA, NA
Created on 2023-07-07 with reprex v2.0.2
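An alternative sketch that does not rely on comparing each column with itself: base R's replace(x, list, values) performs x[list] <- values, so replace(.x, TRUE, NA) assigns NA to every position in place. Because a logical NA never forces a type change on assignment, doubles stay double and integers stay integer, and the factor assignment method keeps the levels. The names blank_df and blank_base are illustrative only, and this is one possible approach, not the only one.

```r
library(dplyr)

set.seed(1)
# Toy data from the question
df <- data.frame(x = rnorm(5),
                 y = factor(letters[5:1]),
                 z = c(1:2, NA, 4:5))

# dplyr version: overwrite every element of each column with NA in place,
# so the column keeps its original type (and factor levels).
blank_df <- df %>%
  mutate(across(everything(), ~ replace(.x, TRUE, NA)))

# Base R version: lapply() calls replace(col, TRUE, NA) on each column;
# assigning into blank_base[] keeps the data.frame structure and names.
blank_base <- df
blank_base[] <- lapply(df, replace, TRUE, NA)

glimpse(blank_df)
```

Both versions keep the row count, column names, column classes and factor levels of df, with every value set to NA.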