Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

R: Recording the Index Iterations of a Function

I am working with the R programming language. In a previous question (R: Creating a Function to Randomly Replace Data from a Data Frame), I learned how to "randomly replace rows in a dataset with 0 according to different conditions":

  • Step 1: The dataset has 10 variables – in Step 1, randomly select "n" of these variables ("n" has to be less than 10).

  • Step 2: For the above "n" variables, if they are "factor", randomly select a subset (of size "m") of the levels for each of these factor variables. For each of the non-factor variables, split them randomly at a point between their minimum and their maximum (call this point "p").

    MEDevel.com: Open-source for Healthcare and Education

    Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

    Visit Medevel

  • Step 3: Generate a random number between 0 and 1 (call this "r").

  • Step 4: Select all rows identified in Step 2. For these rows, consider the columns that were not used in the logic condition. For these columns, there is a "r" percent probability that any element in these rows can be replaced with 0.

  • Step 5: Repeat Step 1 – Step 4 for 10 times.

    set.seed(123)
    
      num_var_1 <- rnorm(1000, 10, 1)
      num_var_2 <- rnorm(1000, 10, 5)
      num_var_3 <- rnorm(1000, 10, 10)
      num_var_4 <- rnorm(1000, 10, 10)
      num_var_5 <- rnorm(1000, 10, 10)
    
      factor_1 <- c("A","B", "C")
      factor_2 <- c("AA","BB", "CC")
      factor_3 <- c("AAA","BBB", "CCC", "DDD")
      factor_4 <- c("AAAA","BBBB", "CCCC", "DDDD", "EEEE")
      factor_5 <- c("AAAAA","BBBBB", "CCCCC", "DDDDD", "EEEEE", "FFFFFF")
    
      factor_var_1 <- as.factor(sample(factor_1, 1000, replace=TRUE, prob=c(0.3, 0.5, 0.2)))
      factor_var_2 <-  as.factor(sample(factor_2, 1000, replace=TRUE, prob=c(0.5, 0.3, 0.2)))
      factor_var_3 <-  as.factor(sample(factor_3, 1000, replace=TRUE, prob=c(0.5, 0.2, 0.2, 0.1)))
      factor_var_4 <-  as.factor(sample(factor_4, 1000, replace=TRUE, prob=c(0.5, 0.2, 0.1, 0.1, 0.1)))
      factor_var_5 <-  as.factor(sample(factor_4, 1000, replace=TRUE, prob=c(0.3, 0.2, 0.1, 0.1, 0.1)))
    
      my_data = data.frame(num_var_1, num_var_2, num_var_3, num_var_4, num_var_5, factor_var_1, factor_var_2, factor_var_3, factor_var_4, factor_var_5)
    
      random_drop <- function(x) {
        # Randomly select variables
        which_vars <- names(x[, sort(sample(ncol(x), sample(ncol(x), 1)))])
        # Randomly select factor levels subset or generate continuous cutoff value
        cutoff_vals <- lapply(
          which_vars,
          function(i) {
            if (is.factor(x[[i]])) {
              return(sample(levels(x[[i]]), sample(nlevels(x[[i]]), 1)))
            }
            runif(1, min(x[[i]], na.rm = TRUE), max(x[[i]], na.rm = TRUE))
          }
        )
        names(cutoff_vals) <- which_vars
        # Create random prob value
        r <- runif(1,0,1)
        # Generate idx for which rows to select
        row_idx <- Reduce(
          `&`,
          lapply(
            which_vars,
            function(i) {
              if (is.factor(x[[i]])) {
                return(x[[i]] %in% cutoff_vals[[i]])
              }
              x[[i]] > cutoff_vals[[i]]
            }
          )
        )
        x_sub <- x[row_idx, !colnames(x) %in% which_vars, drop = FALSE]
        # With prob. 'r' fill row values in with '0'
        r_mat <- matrix(
          sample(
            c(TRUE, FALSE), 
            ncol(x_sub)*nrow(x_sub), 
            replace = TRUE, 
            prob = c(r, 1 - r)
          ),
          nrow = nrow(x_sub),
          ncol = ncol(x_sub)
        )
        x_sub[r_mat] <- 0
        x[row_idx, !colnames(x) %in% which_vars] <- x_sub
        return(x)
      }
    
      random_drop_recurse <- function(x, n = 10) {
        if (n == 1) return(random_drop(x))
        random_drop_recurse(random_drop(x), n = n - 1)
      }
    
      suppressWarnings(
        head(
          random_drop_recurse(my_data[, c(1:3, 6:8)], 10),
          20
        )
      )
      #>    num_var_1 num_var_2 num_var_3 factor_var_1 factor_var_2 factor_var_3
      #> 1   9.439524  5.021006  4.883963            B           AA          AAA
      #> 2   9.769823  4.800225 12.369379            B           AA          AAA
      #> 3  11.558708  9.910099  0.000000            C           AA          BBB
      #> 4  10.070508  9.339124 22.192276            B           CC          DDD
      #> 5  10.129288 -2.746714 11.741359            B           AA          AAA
      #> 6  11.715065 15.202867  3.847317         <NA>           AA          CCC
      #> 7  10.460916 11.248629 -8.068930            C           CC         <NA>
      #> 8   8.734939 22.081037  0.000000            C           AA          BBB
      #> 9   9.313147 13.425991 30.460189            C           AA          BBB
      #> 10  9.554338  7.765203  4.392376            B           AA          AAA
      #> 11 11.224082 23.986956  1.640007            A         <NA>          AAA
      #> 12 10.359814 24.161130 16.529475            A           AA          AAA
      #> 13  0.000000  3.906441  0.000000            A           CC         <NA>
      #> 14 10.110683 12.345160 17.516291            B           CC          AAA
      #> 15  9.444159  8.943765  7.220249            A           AA          DDD
      #> 16 11.786913 10.935256 21.226542            B           CC          DDD
      #> 17 10.497850 11.137714 -1.726089            B           AA          AAA
      #> 18  8.033383  3.690498  9.511232            B           CC          CCC
      #> 19 10.701356 11.427948  2.958597            B           BB          AAA
      #> 20  9.527209 18.746237 16.807586            C           AA          BBB
    

Question: Now, I am trying to learn how to record the results of each iteration – that is, each time a combination of variables is selected to be replaced with 0, I would like to record that combination.

Can someone please show me how to do this?

Thanks!

>Solution :

Hey @stats555 thanks for splitting this out into a new question! I’ve made a very small tweak to the code in the former question; namely the random_drop_recurse function now looks like the following:

random_drop_recurse <- function(x, n = 10) {
  if (n == 1) {
    dropped <- random_drop(x)
    return(list(x = dropped, x_dropped = is.na(dropped) | dropped == 0))
  }
  random_drop_recurse(random_drop(x), n = n - 1)
}

Instead of just returning the matrix with the dropped entries, it now returns a boolean matrix showing all the indices that have been dropped. The following code chunk demonstrates on your supplied data:

set.seed(123)

num_var_1 <- rnorm(1000, 10, 1)
num_var_2 <- rnorm(1000, 10, 5)
num_var_3 <- rnorm(1000, 10, 10)
num_var_4 <- rnorm(1000, 10, 10)
num_var_5 <- rnorm(1000, 10, 10)

factor_1 <- c("A","B", "C")
factor_2 <- c("AA","BB", "CC")
factor_3 <- c("AAA","BBB", "CCC", "DDD")
factor_4 <- c("AAAA","BBBB", "CCCC", "DDDD", "EEEE")
factor_5 <- c("AAAAA","BBBBB", "CCCCC", "DDDDD", "EEEEE", "FFFFFF")

factor_var_1 <- as.factor(sample(factor_1, 1000, replace=TRUE, prob=c(0.3, 0.5, 0.2)))
factor_var_2 <-  as.factor(sample(factor_2, 1000, replace=TRUE, prob=c(0.5, 0.3, 0.2)))
factor_var_3 <-  as.factor(sample(factor_3, 1000, replace=TRUE, prob=c(0.5, 0.2, 0.2, 0.1)))
factor_var_4 <-  as.factor(sample(factor_4, 1000, replace=TRUE, prob=c(0.5, 0.2, 0.1, 0.1, 0.1)))
factor_var_5 <-  as.factor(sample(factor_4, 1000, replace=TRUE, prob=c(0.3, 0.2, 0.1, 0.1, 0.1)))

my_data = data.frame(num_var_1, num_var_2, num_var_3, num_var_4, num_var_5, factor_var_1, factor_var_2, factor_var_3, factor_var_4, factor_var_5)

random_drop <- function(x) {
  # Randomly select variables
  which_vars <- names(x[, sort(sample(ncol(x), sample(ncol(x), 1)))])
  # Randomly select factor levels subset or generate continuous cutoff value
  cutoff_vals <- lapply(
    which_vars,
    function(i) {
      if (is.factor(x[[i]])) {
        return(sample(levels(x[[i]]), sample(nlevels(x[[i]]), 1)))
      }
      runif(1, min(x[[i]], na.rm = TRUE), max(x[[i]], na.rm = TRUE))
    }
  )
  names(cutoff_vals) <- which_vars
  # Create random prob value
  r <- runif(1,0,1)
  # Generate idx for which rows to select
  row_idx <- Reduce(
    `&`,
    lapply(
      which_vars,
      function(i) {
        if (is.factor(x[[i]])) {
          return(x[[i]] %in% cutoff_vals[[i]])
        }
        x[[i]] > cutoff_vals[[i]]
      }
    )
  )
  x_sub <- x[row_idx, !colnames(x) %in% which_vars, drop = FALSE]
  # With prob. 'r' fill row values in with '0'
  r_mat <- matrix(
    sample(
      c(TRUE, FALSE), 
      ncol(x_sub)*nrow(x_sub), 
      replace = TRUE, 
      prob = c(r, 1 - r)
    ),
    nrow = nrow(x_sub),
    ncol = ncol(x_sub)
  )
  x_sub[r_mat] <- 0
  x[row_idx, !colnames(x) %in% which_vars] <- x_sub
  return(x)
}

random_drop_recurse <- function(x, n = 10) {
  if (n == 1) {
    dropped <- random_drop(x)
    return(list(x = dropped, x_dropped = is.na(dropped) | dropped == 0))
  }
  random_drop_recurse(random_drop(x), n = n - 1)
}

test <- suppressWarnings(
  random_drop_recurse(my_data[, c(1:3, 6:8)], 10)
)

# View the first 20 entries of the matrix with the dropped entries
test$x[1:20, ]
#>    num_var_1 num_var_2 num_var_3 factor_var_1 factor_var_2 factor_var_3
#> 1   9.439524  5.021006  4.883963            B           AA          AAA
#> 2   9.769823  4.800225 12.369379            B           AA          AAA
#> 3  11.558708  9.910099  0.000000            C           AA          BBB
#> 4  10.070508  9.339124 22.192276            B           CC          DDD
#> 5  10.129288 -2.746714 11.741359            B           AA          AAA
#> 6  11.715065 15.202867  3.847317         <NA>           AA          CCC
#> 7  10.460916 11.248629 -8.068930            C           CC         <NA>
#> 8   8.734939 22.081037  0.000000            C           AA          BBB
#> 9   9.313147 13.425991 30.460189            C           AA          BBB
#> 10  9.554338  7.765203  4.392376            B           AA          AAA
#> 11 11.224082 23.986956  1.640007            A         <NA>          AAA
#> 12 10.359814 24.161130 16.529475            A           AA          AAA
#> 13  0.000000  3.906441  0.000000            A           CC         <NA>
#> 14 10.110683 12.345160 17.516291            B           CC          AAA
#> 15  9.444159  8.943765  7.220249            A           AA          DDD
#> 16 11.786913 10.935256 21.226542            B           CC          DDD
#> 17 10.497850 11.137714 -1.726089            B           AA          AAA
#> 18  8.033383  3.690498  9.511232            B           CC          CCC
#> 19 10.701356 11.427948  2.958597            B           BB          AAA
#> 20  9.527209 18.746237 16.807586            C           AA          BBB

# View the corresponding boolean matrix showing dropped indices
test$x_dropped[1:20, ]
#>       num_var_1 num_var_2 num_var_3 factor_var_1 factor_var_2 factor_var_3
#>  [1,]     FALSE     FALSE     FALSE        FALSE        FALSE        FALSE
#>  [2,]     FALSE     FALSE     FALSE        FALSE        FALSE        FALSE
#>  [3,]     FALSE     FALSE      TRUE        FALSE        FALSE        FALSE
#>  [4,]     FALSE     FALSE     FALSE        FALSE        FALSE        FALSE
#>  [5,]     FALSE     FALSE     FALSE        FALSE        FALSE        FALSE
#>  [6,]     FALSE     FALSE     FALSE         TRUE        FALSE        FALSE
#>  [7,]     FALSE     FALSE     FALSE        FALSE        FALSE         TRUE
#>  [8,]     FALSE     FALSE      TRUE        FALSE        FALSE        FALSE
#>  [9,]     FALSE     FALSE     FALSE        FALSE        FALSE        FALSE
#> [10,]     FALSE     FALSE     FALSE        FALSE        FALSE        FALSE
#> [11,]     FALSE     FALSE     FALSE        FALSE         TRUE        FALSE
#> [12,]     FALSE     FALSE     FALSE        FALSE        FALSE        FALSE
#> [13,]      TRUE     FALSE      TRUE        FALSE        FALSE         TRUE
#> [14,]     FALSE     FALSE     FALSE        FALSE        FALSE        FALSE
#> [15,]     FALSE     FALSE     FALSE        FALSE        FALSE        FALSE
#> [16,]     FALSE     FALSE     FALSE        FALSE        FALSE        FALSE
#> [17,]     FALSE     FALSE     FALSE        FALSE        FALSE        FALSE
#> [18,]     FALSE     FALSE     FALSE        FALSE        FALSE        FALSE
#> [19,]     FALSE     FALSE     FALSE        FALSE        FALSE        FALSE
#> [20,]     FALSE     FALSE     FALSE        FALSE        FALSE        FALSE

# If you want the actual indices
which(test$x_dropped[1:20, ], arr.ind = TRUE)
#>      row col
#> [1,]  13   1
#> [2,]   3   3
#> [3,]   8   3
#> [4,]  13   3
#> [5,]   6   4
#> [6,]  11   5
#> [7,]   7   6
#> [8,]  13   6
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading