data.table: update join on a filtered table

I want to update a subset of table_a using an update join with table_b; however, after attempting the update join, the original (unfiltered) table_a does not change at all:

library(data.table)

dat_a <- 
  data.table(postcode = LETTERS[1:10],
             values = c(1:8, NaN, 10))
dat_b <- 
  data.table(postcode = LETTERS[1:10],
             values = c(990:999))

Tables A and B both have the same postcodes but different values for each. There is a NaN value in table_a which I want to update with the corresponding value from table_b; however, I don't want to change any of the other values in table_a.

dat_a[is.nan(values)
      ][dat_b,
        values := i.values,
        on = .(postcode)]

This does not change the table at all:


dat_a
    postcode values
 1:        A      1
 2:        B      2
 3:        C      3
 4:        D      4
 5:        E      5
 6:        F      6
 7:        G      7
 8:        H      8
 9:        I    NaN
10:        J     10

The operation itself seems to have worked, though, so I'm not sure why the specific value was not updated by reference. What do I need to do differently here?

dat_a[is.nan(values)
      ][dat_b,
        values := i.values,
        on = .(postcode)][]

   postcode values
1:        I    998
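One way to see what is going on is data.table's address() helper, which reports the memory address of an object. A small sketch (reusing the dat_a definition above) shows that the filtered subset is a separate object from dat_a:

```r
library(data.table)

dat_a <- data.table(postcode = LETTERS[1:10],
                    values = c(1:8, NaN, 10))

# dat_a[is.nan(values)] returns a brand-new data.table, not a view,
# so a chained := updates that copy, which is then discarded.
sub <- dat_a[is.nan(values)]
address(dat_a) != address(sub)   # TRUE: different objects in memory
```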

Note:
An alternative is to filter table_b instead; however, this is not a good solution for my real dataset. Instead of filtering table_a down to a single column of NaN values, I'd have to filter table_b on multiple columns, covering every combination of values that leads to a NaN result.

E.g. imagine that instead of just filtering by postcode, I had to filter by (postcode == "I" & age == 23 & qualification == "bachelors") and then repeat this for every other combination of those variables that leads to a NaN result.

dat_a[dat_b[postcode == "I"],
      values := i.values,
      on = .(postcode)]

dat_a
    postcode values
 1:        A      1
 2:        B      2
 3:        C      3
 4:        D      4
 5:        E      5
 6:        F      6
 7:        G      7
 8:        H      8
 9:        I    998
10:        J     10

> Solution:

By filtering, you create a deep-copied table that is separate from the original, so the := in the chained call updates that copy, which is then discarded. Instead of filtering, you can use fcoalesce here: it returns the first non-missing value element-wise, so existing values are kept and only the NaN entries are replaced.

dat_a[dat_b,
      # as.numeric ensures both arguments have the same type
      # (i.values is integer, values is numeric)
      values := fcoalesce(values, as.numeric(i.values)),
      on = .(postcode)]

dat_a
#>     postcode values
#>  1:        A      1
#>  2:        B      2
#>  3:        C      3
#>  4:        D      4
#>  5:        E      5
#>  6:        F      6
#>  7:        G      7
#>  8:        H      8
#>  9:        I    998
#> 10:        J     10
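For the multi-key case described in the question, the same fcoalesce pattern extends by joining on all the key columns at once, so no per-combination filtering is needed. A sketch with hypothetical age and qualification columns (invented here for illustration):

```r
library(data.table)

# Hypothetical multi-key tables (age/qualification columns are made up)
dat_a <- data.table(postcode      = c("I", "J"),
                    age           = c(23L, 30L),
                    qualification = c("bachelors", "masters"),
                    values        = c(NaN, 10))
dat_b <- data.table(postcode      = c("I", "J"),
                    age           = c(23L, 30L),
                    qualification = c("bachelors", "masters"),
                    values        = c(998, 999))

# Join on every key column; fcoalesce keeps existing values and fills
# only the NaN entries from dat_b (both values columns are numeric
# here, so no as.numeric() coercion is needed).
dat_a[dat_b,
      values := fcoalesce(values, i.values),
      on = .(postcode, age, qualification)]

dat_a
#>    postcode age qualification values
#> 1:        I  23     bachelors    998
#> 2:        J  30       masters     10
```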