data.table: update join on a filtered table

I want to update a subset of table_a using an update join with table_b; however, after attempting the update join, the original (unfiltered) table_a does not change at all:

library(data.table)

dat_a <- 
  data.table(postcode = LETTERS[1:10],
             values = c(1:8, NaN, 10))
dat_b <- 
  data.table(postcode = LETTERS[1:10],
             values = c(990:999))

Tables A and B both have the same postcodes but different values for each. There is a NaN value in table_a which I want to update with the corresponding value from table_b; however, I don't want to change any of the other values in table_a.

dat_a[is.nan(values)
      ][dat_b,
        values := i.values,
        on = .(postcode)]

This does not change the table at all:


dat_a
    postcode values
 1:        A      1
 2:        B      2
 3:        C      3
 4:        D      4
 5:        E      5
 6:        F      6
 7:        G      7
 8:        H      8
 9:        I    NaN
10:        J     10

The operation itself seems to have worked, though, so I'm not sure why the specific value was not updated by reference. What do I need to do differently here?

dat_a[is.nan(values)
      ][dat_b,
        values := i.values,
        on = .(postcode)][]

   postcode values
1:        I    998
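One way to see what is going on is data.table's address() helper, which reports the memory address of an object. A small sketch (reusing the dat_a definition above) shows that the filtered subset is a separate object from dat_a:

```r
library(data.table)

dat_a <- data.table(postcode = LETTERS[1:10],
                    values = c(1:8, NaN, 10))

# dat_a[is.nan(values)] returns a brand-new data.table, not a view,
# so a chained := updates that copy, which is then discarded.
sub <- dat_a[is.nan(values)]
address(dat_a) != address(sub)   # TRUE: different objects in memory
```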

Note:
An alternative is to filter table_b instead; however, this is not a good solution for my real dataset. Instead of filtering table_a down to a single column of NaN values, I'd have to filter table_b on multiple columns, covering every combination of values that leads to a NaN result.

E.g. imagine that instead of just filtering by postcode, I had to filter by (postcode == "I" & age == 23 & qualification == "bachelors") and then repeat this for every other combination of those variables that leads to a NaN result.

dat_a[dat_b[postcode == "I"],
      values := i.values,
      on = .(postcode)]

dat_a
    postcode values
 1:        A      1
 2:        B      2
 3:        C      3
 4:        D      4
 5:        E      5
 6:        F      6
 7:        G      7
 8:        H      8
 9:        I    998
10:        J     10

> Solution:

By filtering, you create a deep-copied table that is separate from the original, so the := in the chained call updates that copy, which is then discarded. Instead of filtering, you can use fcoalesce here: it returns the first non-missing value element-wise, so existing values are kept and only the NaN entries are replaced.

dat_a[dat_b,
      # as.numeric ensures both arguments have the same type
      # (i.values is integer, values is numeric)
      values := fcoalesce(values, as.numeric(i.values)),
      on = .(postcode)]

dat_a
#>     postcode values
#>  1:        A      1
#>  2:        B      2
#>  3:        C      3
#>  4:        D      4
#>  5:        E      5
#>  6:        F      6
#>  7:        G      7
#>  8:        H      8
#>  9:        I    998
#> 10:        J     10
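For the multi-key case described in the question, the same fcoalesce pattern extends by joining on all the key columns at once, so no per-combination filtering is needed. A sketch with hypothetical age and qualification columns (invented here for illustration):

```r
library(data.table)

# Hypothetical multi-key tables (age/qualification columns are made up)
dat_a <- data.table(postcode      = c("I", "J"),
                    age           = c(23L, 30L),
                    qualification = c("bachelors", "masters"),
                    values        = c(NaN, 10))
dat_b <- data.table(postcode      = c("I", "J"),
                    age           = c(23L, 30L),
                    qualification = c("bachelors", "masters"),
                    values        = c(998, 999))

# Join on every key column; fcoalesce keeps existing values and fills
# only the NaN entries from dat_b (both values columns are numeric
# here, so no as.numeric() coercion is needed).
dat_a[dat_b,
      values := fcoalesce(values, i.values),
      on = .(postcode, age, qualification)]

dat_a
#>    postcode age qualification values
#> 1:        I  23     bachelors    998
#> 2:        J  30       masters     10
```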