Overwrite dataframe values with an exact number of random NAs per column

I’m using this code to insert a random number of NAs into a data frame. Here’s an example:

set.seed(1)
df <- mtcars[1:10,]
df <- as.data.frame(lapply(df, function(cc) cc[ sample(c(TRUE, NA), prob = c(0.7, 0.3), size = length(cc), replace = TRUE) ]))

> df
    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
1  21.0   6    NA 110   NA 2.620    NA  0  1    4    4
2  21.0   6 160.0 110 3.90    NA 17.02 NA NA    4    4
3  22.8   4 108.0  93   NA 2.320 18.61  1  1    4    1
4    NA   6 258.0 110 3.08 3.215 19.44  1  0   NA   NA
5  18.7  NA 360.0  NA 3.15 3.440 17.02  0 NA   NA    2
6    NA   6 225.0 105   NA 3.460 20.22 NA  0   NA    1
7    NA  NA 360.0  NA 3.21 3.570 15.84 NA NA    3    4
8  24.4  NA 146.7  62 3.69 3.190    NA  1  0    4    2
9  22.8   4    NA  NA   NA 3.150 22.90 NA  0   NA   NA
10 19.2  NA 167.6 123 3.92 3.440    NA NA  0    4    4

It works, but the number of NAs varies from column to column. I would like an exact number of NAs per column. Is there a way to insert exactly 3 random NAs in each column? Many thanks
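The reason the counts vary: in the code above each cell becomes NA independently with probability 0.3, so the number of NAs in a 10-row column follows a Binomial(10, 0.3) distribution. Exactly 3 is the most likely count, but nothing guarantees it. A small sketch that rebuilds the question’s data and counts the NAs per column (the names `na_per_col` and `p_exact3` are just illustrative):

```r
# Rebuild the data the same way as in the question
set.seed(1)
df <- mtcars[1:10, ]
df <- as.data.frame(lapply(df, function(cc)
  cc[sample(c(TRUE, NA), prob = c(0.7, 0.3), size = length(cc), replace = TRUE)]))

# NA count per column: the totals differ from column to column
na_per_col <- colSums(is.na(df))
na_per_col

# Each cell is NA independently with probability 0.3, so a 10-row column
# holds exactly 3 NAs only about 27% of the time
p_exact3 <- dbinom(3, size = 10, prob = 0.3)
p_exact3
```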


Solution:

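We can sample row positions with row_number() and replace exactly those positions with NA, so every column gets the same number of NAs. Here `df` is the original `mtcars[1:10, ]`, before any NAs were added: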

library(dplyr)
# For every column, pick 3 distinct row positions and set them to NA
df1 <- df %>%
   mutate(across(everything(),
     ~ replace(.x, sample(row_number(), 3), NA)))

Output:

df1
                   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0  NA 160.0  NA 3.90    NA    NA  0  1   NA    4
Mazda RX4 Wag     21.0  NA    NA 110 3.90 2.875 17.02  0 NA    4    4
Datsun 710        22.8   4    NA  NA 3.85 2.320 18.61  1  1   NA    1
Hornet 4 Drive      NA   6 258.0 110 3.08 3.215 19.44  1 NA   NA    1
Hornet Sportabout 18.7  NA 360.0  NA   NA 3.440    NA NA  0    3    2
Valiant           18.1   6 225.0 105   NA 3.460 20.22  1  0    3   NA
Duster 360          NA   8    NA 245 3.21    NA 15.84  0  0    3   NA
Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230            NA   4 140.8  95   NA 3.150 22.90 NA NA    4    2
Merc 280          19.2   6 167.6 123 3.92    NA    NA NA  0    4   NA
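To confirm that each column really ends up with exactly 3 NAs, count them per column after the mutate() step. A minimal, self-contained sketch (it rebuilds `df` from `mtcars[1:10, ]`, which contains no NAs, so all missing values come from the sampling):

```r
library(dplyr)

set.seed(1)
df <- mtcars[1:10, ]   # mtcars has no missing values

df1 <- df %>%
  mutate(across(everything(), ~ replace(.x, sample(row_number(), 3), NA)))

# Per-column NA counts; with this approach every entry is 3
na_counts <- colSums(is.na(df1))
na_counts
stopifnot(all(na_counts == 3))
```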

In base R, we can do the same by looping over the columns with lapply:

# `\(x)` is shorthand for function(x) (R >= 4.1); sample() picks 3 distinct positions per column
df[] <- lapply(df, \(x) replace(x, sample(seq_along(x), 3), NA))
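Both versions assume the columns start without missing values. If a column already contains NAs (as the question’s `df` does), a sampled position can land on an existing NA, so the total is generally not exactly 3. A sketch of one workaround, using a hypothetical helper `exact_na()` (not part of any package) that only draws new NAs from the non-missing positions, shown on toy data:

```r
set.seed(1)
# Toy data that already has some NAs (illustration only)
df <- data.frame(a = c(1, NA, 3, 4, 5, 6),
                 b = c(NA, NA, 3, 4, 5, 6))

# Hypothetical helper: make exactly n values of x missing in total.
# New NAs are drawn only from positions that are not already NA.
# n must not exceed length(x), otherwise sample.int() errors.
exact_na <- function(x, n) {
  ok <- which(!is.na(x))                   # positions still holding data
  need <- n - (length(x) - length(ok))     # NAs still to add
  if (need <= 0) return(x)                 # already at or above n: left unchanged
  # sample.int() over indices avoids sample()'s surprise when ok has length 1
  replace(x, ok[sample.int(length(ok), need)], NA)
}

df[] <- lapply(df, exact_na, n = 3)
colSums(is.na(df))
```

Columns that already hold more than `n` NAs are returned as they are, since removing NAs would mean inventing data.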