Thanks in advance, and sorry if something is unclear; it’s my first time posting here. I am working on something that should be fairly simple, but I cannot seem to find a way of making it work.
The task that I want to complete is the following:
I have a dataset with hundreds of variables. I need to recode all of them following the same logic: if the GIVEN VARIABLE == 0 & a SPECIFIC VARIABLE == 1, the GIVEN VARIABLE must be set to -1. The SPECIFIC VARIABLE is the same for all of them.
What I have done is the following:
library(data.table)
set.seed(123)
data = data.table(a = 0:10, b = 0:10, c = 0:10, d = rep(1:0, length.out = 11))
Here "d" is the SPECIFIC VARIABLE and a:c are the GIVEN VARIABLEs
list_variables <- names(data)
list_variables_v2 <- list_variables[-c(4)]
I extracted the names of the variables from the dataset (minus d) and stored them in a character vector, so they can be fed into the loop
data_v1 = copy(data)
for(i in (list_variables_v2)) {
data_v1[(i) == 0 & d == 1, (i) := -1]
}
The problem is that when I run the loop, nothing happens: values that meet the condition (e.g. a == 0 & d == 1) are not recoded as -1. Several things could be going wrong, but I think I have narrowed it down to one. Potential problems:
a) The code, even outside the loop, does not work. But this is not true. The following code produces the expected result:
data_v1[a == 0 & d == 1, a := -1]
b) The loop is not working, so the variable names are not being iterated over and recognized. However, if I drop the (i) == 0 condition, the code does work, which implies that the loop works for the right side (the column assignment):
for(i in (list_variables_v2)) {
data_v1[d == 1, (i) := -1]
}
I think that the root of the problem is that R, in the row filtering side, is not recognizing (i) == 0 as e.g. a == 0. This is quite weird given that R, when dealing with the right side (columns), does recognize that (i) := -1 as e.g. a := -1. Any idea of what might be causing this and, hopefully, how to solve it?
Again, many many thanks, and please let me know if something is unclear or repeated.
>Solution :
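To see why the row filter in the question selects nothing, it may help to evaluate the condition by hand. The sketch below uses a small made-up table (`dt`, and the names `string_test` and `column_test`, are illustrative only and not from the question). Putting `i` in parentheses still leaves it a character string, so `(i) == 0` compares "a" with 0 and yields a single FALSE, which makes the whole filter FALSE for every row.

```r
library(data.table)

# Toy table (hypothetical; smaller than the one in the question)
dt <- data.table(a = c(0L, 1L, 0L), d = c(1L, 0L, 1L))
i <- "a"

# Parentheses do not turn a string into a column reference:
# this compares the string "a" with 0, which is FALSE
string_test <- (i) == 0

# get(i) looks up the column named "a", so the comparison is done per row
column_test <- dt[, get(i) == 0]

print(string_test)  # FALSE
print(column_test)  # TRUE FALSE TRUE
```

On the left-hand side of `:=`, by contrast, data.table treats `(i)` as "the column whose name is stored in i", which is why the assignment part of the loop already worked.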
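A simple correction is to wrap i in get() in the row filter, so that the column named by i is looked up instead of the string itself being compared with 0: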
for(i in (list_variables_v2)) {
data_v1[get(i) == 0 & d == 1, (i) := -1]
}
-output
> data_v1
        a     b     c     d
    <int> <int> <int> <int>
 1:    -1    -1    -1     1
 2:     1     1     1     0
 3:     2     2     2     1
 4:     3     3     3     0
 5:     4     4     4     1
 6:     5     5     5     0
 7:     6     6     6     1
 8:     7     7     7     0
 9:     8     8     8     1
10:     9     9     9     0
11:    10    10    10     1
> data
        a     b     c     d
    <int> <int> <int> <int>
 1:     0     0     0     1
 2:     1     1     1     0
 3:     2     2     2     1
 4:     3     3     3     0
 5:     4     4     4     1
 6:     5     5     5     0
 7:     6     6     6     1
 8:     7     7     7     0
 9:     8     8     8     1
10:     9     9     9     0
11:    10    10    10     1
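An alternative that avoids get() altogether, which may be worth considering with hundreds of columns, is to compute the matching rows with `[[` and update them with data.table's set(). This is only a sketch under the question's setup: `data_v2`, `cols` and `rows` are names introduced here for illustration, and d is written out with rep() so the recycling is explicit.

```r
library(data.table)

# Same example data as in the question; rep() spells out the recycling of d
data <- data.table(a = 0:10, b = 0:10, c = 0:10,
                   d = rep(1:0, length.out = 11))
data_v2 <- copy(data)

# Every column except the condition column "d"
cols <- setdiff(names(data_v2), "d")

for (col in cols) {
  # Row numbers where this column is 0 and d is 1
  rows <- which(data_v2[[col]] == 0 & data_v2[["d"]] == 1)
  # Update those cells by reference
  set(data_v2, i = rows, j = col, value = -1L)
}

print(data_v2)
```

Selecting the columns by name with setdiff() rather than by position means the loop keeps working if the column order changes.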