Selecting only one column from the right dataframe when joining

I have two huge dataframes. Some of their columns share a name even though they are unrelated. I have two join keys, though, and I want to add just one column from data_right to data_left. I tried:

output_df = data_left.join(data_right, on=["join_key_1", "join_key_2"], how="left").select("data_left.*", "data_right.extraColumn")

But it does not recognize the * even after importing it.

Sample:

data_left = 

col_1    col_2   join_key_1    join_key_2
   12        a          a_b             1
   14        c          r_t             2
   12        d          v_b             1
   24        r          a_s             2


data_right = 

col_3    col_4   join_key_1    join_key_2     extraColumn
   12        a          a_b             1             456
   14        g          r_t             2             654
   15        e          v_c             5             464
   24        r          a_s             2             546
   12        d          v_b             1             549

output_df =

       col_1    col_2   join_key_1       join_key_2     extraColumn
          12        a          a_b                1             456
          14        c          r_t                2             654
          12        d          v_b                1             549
          24        r          a_s                2             546

If data_right has no matching combination of join keys, extraColumn should stay empty (null).

>Solution :

Would this work for your use case?

output_df = data_left.join(data_right.select("join_key_1", "join_key_2", "extraColumn"), on=["join_key_1", "join_key_2"], how="left")