
Overwrite dataframe values with an exact number of random NAs per column

February 27, 2022

I’m using this code to generate a random number of NAs within a data frame. Here’s an example:

set.seed(1)
df <- mtcars[1:10,]
df <- as.data.frame(lapply(df, function(cc) cc[ sample(c(TRUE, NA), prob = c(0.7, 0.3), size = length(cc), replace = TRUE) ]))

> df
    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
1  21.0   6    NA 110   NA 2.620    NA  0  1    4    4
2  21.0   6 160.0 110 3.90    NA 17.02 NA NA    4    4
3  22.8   4 108.0  93   NA 2.320 18.61  1  1    4    1
4    NA   6 258.0 110 3.08 3.215 19.44  1  0   NA   NA
5  18.7  NA 360.0  NA 3.15 3.440 17.02  0 NA   NA    2
6    NA   6 225.0 105   NA 3.460 20.22 NA  0   NA    1
7    NA  NA 360.0  NA 3.21 3.570 15.84 NA NA    3    4
8  24.4  NA 146.7  62 3.69 3.190    NA  1  0    4    2
9  22.8   4    NA  NA   NA 3.150 22.90 NA  0   NA   NA
10 19.2  NA 167.6 123 3.92 3.440    NA NA  0    4    4
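One quick way to see the inconsistency is to count the NAs in each column with colSums(is.na(...)). The sketch below reruns the question's code and prints the per-column counts. Because every cell is blanked independently with probability 0.3, the count in each column is a separate random draw, so the totals differ between columns. The variable name na_counts is just for illustration.

```r
# Re-create the question's data: each cell independently becomes NA with probability 0.3
set.seed(1)
df <- mtcars[1:10, ]
df <- as.data.frame(lapply(df, function(cc) {
  cc[sample(c(TRUE, NA), prob = c(0.7, 0.3), size = length(cc), replace = TRUE)]
}))

# Number of NAs in each column; these vary because the draws are independent per cell
na_counts <- colSums(is.na(df))
print(na_counts)
```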

It’s useful, but the number of NAs differs from column to column. I would like an exact number of NAs in every column. Is there a way to create exactly 3 random NAs per column? Many thanks.

Solution:

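Start from the original data (mtcars[1:10, ]) rather than the version that already contains NAs; otherwise the new NAs are added on top of the existing ones. Then, in each column, sample 3 distinct row positions from row_number() and replace the values at those positions with NA:

library(dplyr)
df1 <- mtcars[1:10, ] %>%
   mutate(across(everything(),
     ~ replace(.x, sample(row_number(), 3), NA)))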

Output:

df1
                   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0  NA 160.0  NA 3.90    NA    NA  0  1   NA    4
Mazda RX4 Wag     21.0  NA    NA 110 3.90 2.875 17.02  0 NA    4    4
Datsun 710        22.8   4    NA  NA 3.85 2.320 18.61  1  1   NA    1
Hornet 4 Drive      NA   6 258.0 110 3.08 3.215 19.44  1 NA   NA    1
Hornet Sportabout 18.7  NA 360.0  NA   NA 3.440    NA NA  0    3    2
Valiant           18.1   6 225.0 105   NA 3.460 20.22  1  0    3   NA
Duster 360          NA   8    NA 245 3.21    NA 15.84  0  0    3   NA
Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230            NA   4 140.8  95   NA 3.150 22.90 NA NA    4    2
Merc 280          19.2   6 167.6 123 3.92    NA    NA NA  0    4   NA
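Why this gives exactly 3 NAs: sample() draws without replacement by default, so sample(row_number(), 3), or sample(seq_along(x), 3) in base R, returns 3 distinct positions, and replace() overwrites exactly those positions. A minimal base R illustration on a single column (the names x, idx and y are just for the example):

```r
# One column from the example data
x <- mtcars$mpg[1:10]

set.seed(42)
# sample() draws without replacement by default, so these are 3 distinct row positions
idx <- sample(seq_along(x), 3)

# replace() returns a copy of x with exactly those positions set to NA
y <- replace(x, idx, NA)

print(idx)
print(y)
print(sum(is.na(y)))  # 3, because idx contains no duplicates
```

If replace = TRUE were passed to sample(), the same position could be drawn twice and a column could end up with fewer than 3 NAs.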

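In base R, the same step can be done by looping over the columns with lapply(). Assigning into df[] keeps the data frame structure and row names, and the \(x) shorthand for function(x) requires R 4.1.0 or later:

df <- mtcars[1:10, ]
df[] <- lapply(df, \(x) replace(x, sample(seq_along(x), 3), NA))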
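Both approaches assume the columns start without NAs. If a column already has missing values (for example, if they are applied to the NA-filled df from the question), sample() can pick a position that is already NA, and the column ends up with a different total. Below is a sketch of a helper that samples only among the non-missing positions so that every column ends with exactly n NAs in total. The name add_exact_nas is hypothetical (not from any package), and it assumes every column has at least n rows.

```r
# Hypothetical helper: make every column contain exactly `n` NAs in total,
# counting any NAs that are already present.
add_exact_nas <- function(data, n) {
  data[] <- lapply(data, function(x) {
    missing_now <- sum(is.na(x))
    if (missing_now > n) {
      stop("a column already has more than ", n, " NAs")
    }
    candidates <- which(!is.na(x))   # positions that are not NA yet
    k <- n - missing_now             # how many more NAs this column needs
    # Index into `candidates` instead of calling sample(candidates, k):
    # with a single candidate left, sample() would treat it as 1:candidates
    picked <- candidates[sample.int(length(candidates), k)]
    replace(x, picked, NA)
  })
  data
}

set.seed(1)
df <- mtcars[1:10, ]
df$mpg[c(2, 5)] <- NA                # pretend mpg already has 2 missing values
df1 <- add_exact_nas(df, 3)
print(colSums(is.na(df1)))
```

The existing NAs in mpg are kept, and only one more is added there, while the other columns each receive 3. Asking for fewer NAs than a column already has stops with an error rather than silently removing data.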

Published by MR on February 27, 2022

Discover more from Dev solutions