Create a large number of new columns

For a jittering exercise, I want to create a large number of new columns automatically based on a row-wise operation. I have a data frame with unit ids and one indicator measurement per year. First, I compute the standard deviation of the indicator across years for each unit:

library(tidyverse)
df <- data.frame(id = c("A", "A", "A", "B", "B", "B"),
                 year = c(2008, 2009, 2010, 2008, 2009, 2010),
                 indicator = c(12,13,8, 23,21,17))


df <- df %>%
  group_by(id) %>%
  mutate(indicator_sd = sd(indicator)) %>%
  ungroup()

Now I want to add columns of simulated values to use for statistical testing. I draw them with rnorm, using each row's indicator as the mean and the unit's standard deviation as the sd:

test <- df %>%
  group_by(id) %>%
  mutate(test1 = rnorm(n(), mean = indicator, sd = indicator_sd),
         test2 = rnorm(n(), mean = indicator, sd = indicator_sd),
         test3 = rnorm(n(), mean = indicator, sd = indicator_sd),
         test4 = rnorm(n(), mean = indicator, sd = indicator_sd)) %>%
  ungroup()

This all works fine, except that I want to repeat the draw several hundred times. I have played around with across but have not found a workable solution, even though this seems like it should be trivial.

Can anyone advise me on how to automate the mutate? Thank you!

>Solution :

You could use the replicate function from base R. Because rnorm is vectorized over mean and sd, no grouping is needed for the draws:

library(tidyverse)

# Sample data
df <- data.frame(id = c("A", "A", "A", "B", "B", "B"),
                 year = c(2008, 2009, 2010, 2008, 2009, 2010),
                 indicator = c(12, 13, 8, 23, 21, 17))

df <- df %>%
  group_by(id) %>%
  mutate(indicator_sd = sd(indicator)) %>%
  ungroup()

# Choose the number of test columns (to repeat 100 times, replace 4 with 100)
n <- 4

# Generate n test columns using replicate
testCols <- as.data.frame(replicate(n, 
                                     rnorm(nrow(df),
                                           mean = df$indicator,
                                           sd = df$indicator_sd)))

# Rename the test columns to "test1", "test2", ...
names(testCols) <- paste0("test", seq_len(n))

# Bind the test columns to the original df
result <- bind_cols(df, testCols)

The output looks like this (the test values differ between runs because they are random draws):

# A tibble: 6 x 8
  id     year indicator indicator_sd test1 test2 test3 test4
  <chr> <dbl>     <dbl>        <dbl> <dbl> <dbl> <dbl> <dbl>
1 A      2008        12         2.65 11.7   9.99 11.7  12.8 
2 A      2009        13         2.65 15.0  13.7  14.9  16.5 
3 A      2010         8         2.65  6.12 11.2   9.94  6.43
4 B      2008        23         3.06 26.2  25.2  25.9  23.6 
5 B      2009        21         3.06 16.9  22.5  21.7  23.1 
6 B      2010        17         3.06 21.6  16.7  19.9  19.9 
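
Since the question asked for something in the tidyverse style, here is a minimal sketch of the same idea using purrr::map instead of replicate. It is only one possible variant, not part of the original answer. The names col_names and test_cols and the seed value are illustrative assumptions; the seed is there only so that the draws can be reproduced.

```r
library(dplyr)
library(purrr)

# Same sample data and per-id standard deviation as above
df <- data.frame(id = c("A", "A", "A", "B", "B", "B"),
                 year = c(2008, 2009, 2010, 2008, 2009, 2010),
                 indicator = c(12, 13, 8, 23, 21, 17)) %>%
  group_by(id) %>%
  mutate(indicator_sd = sd(indicator)) %>%
  ungroup()

set.seed(123)  # arbitrary seed (assumption), only for reproducible draws
n <- 4         # number of test columns; increase to several hundred as needed

# Character vector named by its own values: test1, test2, ...
col_names <- set_names(paste0("test", seq_len(n)))

# One vector of draws per column name; rnorm is vectorized over mean and sd,
# so every row gets its own mean and standard deviation
test_cols <- map(col_names, function(nm) {
  rnorm(nrow(df), mean = df$indicator, sd = df$indicator_sd)
})

# The named list becomes n new columns next to the original ones
result <- bind_cols(df, as_tibble(test_cols))
```

Because map returns a list named after col_names, the separate renaming step is not needed here. Whether this reads better than replicate is mostly a matter of taste.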