I have a simple problem that I wanted to solve using data.table. I was surprised by the following behaviour, as I thought assignments in base R always copy:
library(data.table)
df <- data.frame(
t = 1:10,
x = "x",
y = "y"
)
df$z <- df$y # I was assuming this is a full blown copy (since done in base R)
setDT(df)
df[t>=5, `:=`(z=x)]
df
In the end I want column z to be a copy of x where t>=5 and a copy of y otherwise. However, y is also changed, which I find surprising. What is the reason for that?
     t x y z
 1:  1 x y y
 2:  2 x y y
 3:  3 x y y
 4:  4 x y y
 5:  5 x x x
 6:  6 x x x
 7:  7 x x x
 8:  8 x x x
 9:  9 x x x
10: 10 x x x
>Solution :
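data.table modifies columns by reference to save memory, and you have discovered one consequence of that. Base R is copy-on-modify rather than copy-on-assign: after df$z <- df$y, the columns y and z still point to the same vector in memory. setDT converts the data.frame in place without copying its columns, so when := then updates z by reference, y changes as well.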
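One way to see the sharing for yourself is to compare memory addresses. This is a minimal sketch using data.table's address() and copy(); the variable names shared and separate are only for illustration.

```r
library(data.table)

df <- data.frame(t = 1:10, x = "x", y = "y")

# Plain base R assignment: nothing is duplicated yet (copy-on-modify),
# so y and z refer to the same underlying vector.
df$z <- df$y
shared <- address(df$y) == address(df$z)

# Explicit copy: z now gets its own vector.
df$z <- copy(df$y)
separate <- address(df$y) != address(df$z)

print(c(shared = shared, separate = separate))
```

If both values come out TRUE, the first assignment only created a second reference to y, which is why a later by-reference update of z also showed up in y.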
To get around this, create z on the data.table side with an explicit copy; here is an example with chaining:
library(data.table)
df <- data.frame(
t = 1:10,
x = "x",
y = "y"
)
setDT(df)[, z := copy(y)][t>=5, z := x]
df
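As an alternative sketch, z can be built in a single step so that it never shares memory with y. This assumes a data.table version that provides fifelse(); the expected object is only there to check the result.

```r
library(data.table)

df <- data.frame(
  t = 1:10,
  x = "x",
  y = "y"
)

# fifelse() builds a fresh vector: x where t >= 5, y otherwise.
setDT(df)[, z := fifelse(t >= 5, x, y)]

# y should be untouched and z should switch from "y" to "x" at t = 5.
expected <- rep(c("y", "x"), times = c(4, 6))
print(df)
```

Because z is a newly allocated column here, no separate copy() call is needed.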