Join 2 columns with list values in pandas(Avoiding duplicates and NaN)

January 7, 2022

I have a pandas DataFrame as follows:

loc_1                               loc_2                             

[mumbai, gujarat, sri lanka]        [chennai, UP]
[Goa, telangana]                    [Kashmir, Goa, Rajkot]
NaN                                 [Bihar, Orissa]

I want to create a new column that combines both of the columns above.
I searched for similar questions, but the issue I am facing is the following.

When I do:

data['locations'] = data['loc_1'] + data['loc_2']

Output
--------
loc_1                               loc_2                       locations

[mumbai, gujarat, sri lanka]        [chennai, UP]                [mumbai, gujarat, sri lanka, chennai, UP]
[Goa, telangana]                    [Kashmir, Goa, Rajkot]       [Goa, telangana, Kashmir, Goa, Rajkot]
NaN                                 [Bihar, Orissa]              NaN

Issue

As you can see above, the combined column contains duplicate values (Goa appears twice in the second row), and the result is NaN wherever loc_1 is NaN.
How can I avoid both?

Remember

The original dataset contains list, str, and NaN values.

Dataset:


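import numpy as np
import pandas as pd

# loc_1 and loc_2 hold lists (or NaN); loc_3 holds plain strings.
# Named `data` to match the snippets in the rest of this post.
data = pd.DataFrame({
    'loc_1': [['mumbai', 'gujarat', 'sri lanka'], ['Goa', 'telangana'], np.nan],
    'loc_2': [['chennai', 'UP'], ['kashmir', 'goa', 'rajkot'], ['bihar', 'orissa']],
    'loc_3': ['Chennai', 'Bangalore', 'Vizag'],
})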

>Solution :

First, replace the NaN values (which are floats) in loc_1 with empty lists, then concatenate the two columns:

data['locations'] = data['loc_1'].apply(lambda x: [] if isinstance(x, float) else x) + data['loc_2']
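The question mentions that the real data mixes lists, strings, and NaN. The line above only guards loc_1 against NaN, so here is a minimal sketch of one way to normalize every cell before concatenating. The helper name `to_list` is hypothetical (it is not part of pandas), and treating a bare string as a one-element list is an assumption about what the asker wants.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'loc_1': [['mumbai', 'gujarat', 'sri lanka'], ['Goa', 'telangana'], np.nan],
    'loc_2': [['chennai', 'UP'], ['kashmir', 'goa', 'rajkot'], ['bihar', 'orissa']],
})

def to_list(value):
    """Hypothetical helper: coerce a cell to a list."""
    if isinstance(value, list):
        return value
    if isinstance(value, str):
        # Assumption: a bare string is a single location.
        return [value]
    # Anything else (here: NaN, which is a float) means "no locations".
    return []

# Normalize both columns, then concatenate the lists row by row.
data['locations'] = data['loc_1'].apply(to_list) + data['loc_2'].apply(to_list)

print(data['locations'].tolist())
```

Normalizing both sides means a NaN in loc_2 no longer produces a NaN in the result either.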

Then remove duplicates while preserving the original order by passing each list through dict.fromkeys:

data['locations'] = data['locations'].apply(lambda x: list(dict.fromkeys(x)))
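To see why this works, here is a small sketch on a plain list: dictionary keys are unique and, in Python 3.7+, keep insertion order, so the first occurrence of each value survives. Note that the comparison is case-sensitive, so 'Goa' and 'goa' (as in the sample dataset) count as different values. The second function, `dedupe_ignore_case`, is a hypothetical helper that assumes you want to treat them as the same location and keep the first spelling seen.

```python
items = ['Goa', 'telangana', 'Kashmir', 'Goa', 'Rajkot']

# Keys are unique and ordered by first insertion.
deduped = list(dict.fromkeys(items))
print(deduped)  # ['Goa', 'telangana', 'Kashmir', 'Rajkot']

def dedupe_ignore_case(values):
    """Hypothetical helper: drop duplicates that differ only by case."""
    seen = {}
    for value in values:
        # setdefault keeps the first spelling for each case-folded key.
        seen.setdefault(value.casefold(), value)
    return list(seen.values())

mixed = ['Goa', 'telangana', 'kashmir', 'goa', 'rajkot']
print(dedupe_ignore_case(mixed))  # ['Goa', 'telangana', 'kashmir', 'rajkot']
```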

If the order is not important you can use a set:

data['locations'] = data['locations'].apply(lambda x: list(set(x)))
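One caveat with the set approach is that the resulting order is arbitrary and may differ between runs. If you do not care about the original order but still want a reproducible result, one option (a sketch, not part of the original answer) is to sort the set:

```python
items = ['Goa', 'telangana', 'Kashmir', 'Goa', 'Rajkot']

# set() drops duplicates; sorted() gives a deterministic order.
# Default string sorting puts uppercase letters before lowercase ones.
unique_sorted = sorted(set(items))
print(unique_sorted)  # ['Goa', 'Kashmir', 'Rajkot', 'telangana']
```

On the DataFrame this would be `data['locations'].apply(lambda x: sorted(set(x)))`.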

dataframe

byMR

Published January 07, 2022
