Create a large number of new columns

For a jittering exercise, I want to create a large number of new columns automatically, each based on a row-wise operation. I have a data frame with unit ids and one indicator measurement per unit and year. As a first step, I compute the standard deviation of the indicator across years for each unit:

library(tidyverse)
df <- data.frame(id = c("A", "A", "A", "B", "B", "B"),
                 year = c(2008, 2009, 2010),
                 indicator = c(12,13,8, 23,21,17))


df <- df %>%
  group_by(id) %>%
  mutate(indicator_sd = sd(indicator)) %>%
  ungroup()

Now, I want to create additional columns which should compute dummy indices for statistical testing. I use rnorm for that:

test <- df %>%
  group_by(id) %>%
  mutate(test1 = rnorm(n(), mean = indicator, sd = indicator_sd),
         test2 = rnorm(n(), mean = indicator, sd = indicator_sd),
         test3 = rnorm(n(), mean = indicator, sd = indicator_sd),
         test4 = rnorm(n(), mean = indicator, sd = indicator_sd)) %>%
  ungroup()

This works, but I need several hundred of these test columns rather than four. I have experimented with across(), but have not found a workable solution, even though this seems like it should be trivial.

Can anyone advise how to automate the mutate()? Thank you!

Solution:

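You could use the replicate function from base R. Because rnorm is vectorized over mean and sd, each row is drawn with its own parameters, so no group_by is needed for this step: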

# Sample data
df <- data.frame(id = c("A", "A", "A", "B", "B", "B"),
                 year = c(2008, 2009, 2010, 2008, 2009, 2010),
                 indicator = c(12, 13, 8, 23, 21, 17))

df <- df %>%
  group_by(id) %>%
  mutate(indicator_sd = sd(indicator)) %>%
  ungroup()

# Choose the number of test columns (e.g. replace 4 with 100 to get 100 columns)
n <- 4

# Generate n test columns using replicate
testCols <- as.data.frame(replicate(n, 
                                     rnorm(nrow(df),
                                           mean = df$indicator,
                                           sd = df$indicator_sd)))

# Rename the test columns to "test1", "test2", ...
names(testCols) <- paste0("test", 1:n)

# Bind the test columns to the original df
result <- bind_cols(df, testCols)

Printing result gives the following (the test values will differ between runs, because no seed is set):

# A tibble: 6 x 8
  id     year indicator indicator_sd test1 test2 test3 test4
  <chr> <dbl>     <dbl>        <dbl> <dbl> <dbl> <dbl> <dbl>
1 A      2008        12         2.65 11.7   9.99 11.7  12.8 
2 A      2009        13         2.65 15.0  13.7  14.9  16.5 
3 A      2010         8         2.65  6.12 11.2   9.94  6.43
4 B      2008        23         3.06 26.2  25.2  25.9  23.6 
5 B      2009        21         3.06 16.9  22.5  21.7  23.1 
6 B      2010        17         3.06 21.6  16.7  19.9  19.9 
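
For several hundred columns, the same idea can be sketched in base R alone, with a seed so that the draws are reproducible. This is only one possible variant, not the answer's exact method: ave() stands in for the group_by() and mutate() step, and the seed value, n = 300, and the names test_names and zero_sd_draws are assumptions chosen for illustration.

```r
# Rebuild the sample data with base R only, so the sketch runs on its own
df <- data.frame(id = rep(c("A", "B"), each = 3),
                 year = rep(2008:2010, times = 2),
                 indicator = c(12, 13, 8, 23, 21, 17))

# Per-id standard deviation; ave() stands in for group_by() + mutate() here
df$indicator_sd <- ave(df$indicator, df$id, FUN = sd)

set.seed(123)  # arbitrary seed (assumption), only so reruns give the same draws
n <- 300       # illustrative number of test columns (assumption)

# rnorm() is vectorized over mean and sd: with sd = 0 every draw equals its own mean
zero_sd_draws <- rnorm(nrow(df), mean = df$indicator, sd = 0)

# simplify = FALSE makes replicate() return a list of n vectors, one per new column
test_names <- paste0("test", seq_len(n))
df[test_names] <- replicate(n,
                            rnorm(nrow(df), mean = df$indicator, sd = df$indicator_sd),
                            simplify = FALSE)

dim(df)  # 6 rows, 4 original columns plus n test columns
```

Assigning a list to df[test_names] adds all columns in one step and names them at the same time, so the separate renaming and bind_cols steps are not needed in this variant.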
