Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Merging a column existing in multiple datasets to the same column in target data

I have data as follows:

library(data.table)
all_questions <- fread("Variable_codes_2022 Variables_2022
              Cat1_1              This_question
              Cat1_2              Other_question
              Cat2_1              One_question
              Cat2_2              Another_question
              Cat3_1              Some_question
              Cat3_2              Extra_question
              Cat3_3              This_question
              Cat4_1              One_question
              Cat4_2              Wrong_question")

# I have other datasets that have the same questions with a key:

dat1 <- fread("Variable_codes Variables        Key
              Cat1              This_question   A1
              Cat1              Other_question  B3")

dat2 <- fread("Variable_codes Variables        Key
              Cat2              One_question    A7
              Cat2              Another_question C8")

I would like add the matching keys from dat1 and dat2 to all_questions, which I tried to do as follows:

all_questions <- merge(all_questions, dat1, by.x="Variables_2022", by.y="Variables", all.x=TRUE)

     Variables_2022 Variable_codes_2022 Variable_codes  Key
1: Another_question              Cat2_2           <NA> <NA>
2:   Extra_question              Cat3_2           <NA> <NA>
3:     One_question              Cat2_1           <NA> <NA>
4:     One_question              Cat4_1           <NA> <NA>
5:   Other_question              Cat1_2           Cat1   B3
6:    Some_question              Cat3_1           <NA> <NA>
7:    This_question              Cat1_1           Cat1   A1
8:    This_question              Cat3_3           Cat1   A1
9:   Wrong_question              Cat4_2           <NA> <NA>

The problem is that when I do it in this manner, when I try to merge the second data frame, the keys get spread over multiple columns with suffixes:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

all_questions <- merge(all_questions, dat2, by.x="Variables_2022", by.y="Variables", all.x=TRUE)

     Variables_2022 Variable_codes_2022 Variable_codes.x Key.x Variable_codes.y Key.y
1: Another_question              Cat2_2             <NA>  <NA>             Cat2    C8
2:   Extra_question              Cat3_2             <NA>  <NA>             <NA>  <NA>
3:     One_question              Cat2_1             <NA>  <NA>             Cat2    A7
4:     One_question              Cat4_1             <NA>  <NA>             Cat2    A7
5:   Other_question              Cat1_2             Cat1    B3             <NA>  <NA>
6:    Some_question              Cat3_1             <NA>  <NA>             <NA>  <NA>
7:    This_question              Cat1_1             Cat1    A1             <NA>  <NA>
8:    This_question              Cat3_3             Cat1    A1             <NA>  <NA>
9:   Wrong_question              Cat4_2             <NA>  <NA>             <NA>  <NA>

How can I add the Keys of dat1 and dat2 to all_questions, in a way that this does not happen?

>Solution :

We may do a join

library(data.table)
all_questions[dat1, Key1 := i.Key, on = .(Variables_2022 = Variables)]
all_questions[dat2, Key2 := i.Key, on = .(Variables_2022 = Variables)]

-output

> all_questions
   Variable_codes_2022   Variables_2022   Key1   Key2
                <char>           <char> <char> <char>
1:              Cat1_1    This_question     A1   <NA>
2:              Cat1_2   Other_question     B3   <NA>
3:              Cat2_1     One_question   <NA>     A7
4:              Cat2_2 Another_question   <NA>     C8
5:              Cat3_1    Some_question   <NA>   <NA>
6:              Cat3_2   Extra_question   <NA>   <NA>
7:              Cat3_3    This_question     A1   <NA>
8:              Cat4_1     One_question   <NA>     A7
9:              Cat4_2   Wrong_question   <NA>   <NA>

If there are multiple datasets, do it in a loop and if it needs only a single ‘Key’ column

lst1 <- list(dat1, dat2)
all_questions[, Key := NA_character_]

for(i in seq_along(lst1)) all_questions[lst1[[i]], 
  Key := fcoalesce(i.Key, Key), on = .(Variables_2022 = Variables)]

-output

> all_questions
   Variable_codes_2022   Variables_2022    Key
                <char>           <char> <char>
1:              Cat1_1    This_question     A1
2:              Cat1_2   Other_question     B3
3:              Cat2_1     One_question     A7
4:              Cat2_2 Another_question     C8
5:              Cat3_1    Some_question   <NA>
6:              Cat3_2   Extra_question   <NA>
7:              Cat3_3    This_question     A1
8:              Cat4_1     One_question     A7
9:              Cat4_2   Wrong_question   <NA>
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading