Union of unique list from two and three columns of a dataframe

How do I get the union of unique list items across two columns, and across three columns, of a dataframe?

This is the dataframe I am working with:

Col1 Extract              Col2 Extract           Col3 Extract      
------------              ------------           ------------
['unclassified']          ['sink', 'fridge']     ['unclassified']
['fridge', 'microwave']   ['fridge', 'stove']    ['sink']          
['unclassified']          ['unclassified']       ['unclassified']

What I would like is the union of unique items for ('Col1 Extract' + 'Col2 Extract') and for ('Col1 Extract' + 'Col2 Extract' + 'Col3 Extract'), done the pandas way. This is what I'm looking for:

Col1+Col2                             Col1+Col2+Col3
------------                          ---------------             
['unclassified', 'sink', 'fridge']    ['unclassified', 'sink', 'fridge']      
['fridge', 'microwave', 'stove']      ['fridge', 'microwave', 'stove', 'sink']          
['unclassified']                      ['unclassified']  

Solution:

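Concatenate the list columns row by row with +, then remove duplicates by converting each combined list to a set. Sets are unordered, so the order of items in the result is not guaranteed: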

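# Adding two columns of lists concatenates the lists row by row;
# list(set(x)) then drops the duplicates (item order is arbitrary).
df['Col1+Col2'] = (df['Col1 Extract'] + df['Col2 Extract']).apply(lambda x: list(set(x)))
df['Col1+Col2+Col3'] = (df['Col1 Extract'] + df['Col2 Extract'] + df['Col3 Extract']).apply(lambda x: list(set(x)))
print(df)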
          Col1 Extract     Col2 Extract    Col3 Extract  \
0       [unclassified]   [sink, fridge]  [unclassified]   
1  [fridge, microwave]  [fridge, stove]          [sink]   
2       [unclassified]   [unclassified]  [unclassified]   

                      Col1+Col2                    Col1+Col2+Col3  
0  [fridge, unclassified, sink]      [fridge, unclassified, sink]  
1    [stove, fridge, microwave]  [stove, fridge, microwave, sink]  
2                [unclassified]                    [unclassified] 
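
Because the item order produced by list(set(x)) can differ between runs, the output above may not be reproduced exactly. If a repeatable result is wanted but the original order does not matter, one option is to sort the set. The following is only a minimal sketch under that assumption; the dataframe is rebuilt from the first two columns of the question so the snippet runs on its own.

```python
import pandas as pd

# Sample data copied from the question (first two columns only).
df = pd.DataFrame({
    "Col1 Extract": [["unclassified"], ["fridge", "microwave"], ["unclassified"]],
    "Col2 Extract": [["sink", "fridge"], ["fridge", "stove"], ["unclassified"]],
})

# Concatenate the lists row by row, then deduplicate and sort alphabetically
# so the result is the same on every run.
combined = df["Col1 Extract"] + df["Col2 Extract"]
df["Col1+Col2"] = combined.apply(lambda items: sorted(set(items)))

print(df["Col1+Col2"].tolist())
```

Sorting gives alphabetical order, which is not the order the items first appeared in.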

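If the order matters, use dict.fromkeys, which keeps only the first occurrence of each item and preserves insertion order (guaranteed for dicts since Python 3.7):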

# dict.fromkeys(x) keeps the first occurrence of each item in its original position.
df['Col1+Col2'] = (df['Col1 Extract'] + df['Col2 Extract']).apply(lambda x: list(dict.fromkeys(x)))
df['Col1+Col2+Col3'] = (df['Col1 Extract'] + df['Col2 Extract'] + df['Col3 Extract']).apply(lambda x: list(dict.fromkeys(x)))
print(df)
          Col1 Extract     Col2 Extract    Col3 Extract  \
0       [unclassified]   [sink, fridge]  [unclassified]   
1  [fridge, microwave]  [fridge, stove]          [sink]   
2       [unclassified]   [unclassified]  [unclassified]   

                      Col1+Col2                    Col1+Col2+Col3  
0  [unclassified, sink, fridge]      [unclassified, sink, fridge]  
1    [fridge, microwave, stove]  [fridge, microwave, stove, sink]  
2                [unclassified]                    [unclassified]  
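
For reference, the order-preserving approach can be sketched as a self-contained script. The dataframe below is rebuilt from the question's sample data, and ordered_union is a hypothetical helper name introduced here for illustration; it is not part of pandas or of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Col1 Extract": [["unclassified"], ["fridge", "microwave"], ["unclassified"]],
    "Col2 Extract": [["sink", "fridge"], ["fridge", "stove"], ["unclassified"]],
    "Col3 Extract": [["unclassified"], ["sink"], ["unclassified"]],
})


def ordered_union(frame, columns):
    """Hypothetical helper: row-wise union of list columns, keeping first-seen order."""
    combined = frame[columns[0]]
    for col in columns[1:]:
        # Adding two Series of lists concatenates the lists row by row.
        combined = combined + frame[col]
    # dict.fromkeys drops later duplicates while preserving insertion order.
    return combined.apply(lambda items: list(dict.fromkeys(items)))


df["Col1+Col2"] = ordered_union(df, ["Col1 Extract", "Col2 Extract"])
df["Col1+Col2+Col3"] = ordered_union(
    df, ["Col1 Extract", "Col2 Extract", "Col3 Extract"]
)

print(df[["Col1+Col2", "Col1+Col2+Col3"]])
```

Wrapping the logic in a function avoids repeating the lambda for each combination of columns. It assumes every cell holds a list; missing values would need to be handled first.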