I’ve got a dataframe column which represents the order in which fruit was bought at a supermarket. The dataframe looks something like this:
import pandas as pd

mydict = {
    'customer': ['Jack', 'Danny', 'Alex'],
    'fruit_bought': ['apple#orange#apple', 'orange#apple', 'apple#banana#banana'],
}
df = pd.DataFrame(mydict)
customer | fruit_bought
---------|---------------------
Jack     | apple#orange#apple
Danny    | orange#apple
Alex     | apple#banana#banana
What I’d like to do is reduce each string to the combination of unique fruits that the customer bought, which would look like this:
customer | fruit_bought
---------|--------------
Jack     | apple#orange
Danny    | apple#orange
Alex     | apple#banana
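For reference, the rule in the table above can be written as a plain-Python function on a single string. This is a sketch, not part of the original question: the name unique_fruits is hypothetical, and the alphabetical ordering is an assumption inferred from Danny's row, where orange#apple becomes apple#orange.

```python
def unique_fruits(basket: str, sep: str = "#") -> str:
    """Hypothetical helper: distinct items of `basket`, sorted and re-joined with `sep`."""
    # split into items, drop duplicates with a set, then sort so the order is deterministic
    return sep.join(sorted(set(basket.split(sep))))


for basket in ["apple#orange#apple", "orange#apple", "apple#banana#banana"]:
    print(basket, "->", unique_fruits(basket))
```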
I’m sure I can put together a long-winded apply function to help with this, but I’m looking at 200,000 rows of data, so I’d rather avoid using apply here in favour of a vectorized approach. Can anyone please help me with this?
Solution:
You can use str.split together with map:
>>> df = pd.DataFrame(mydict)
>>> df
  customer         fruit_bought
0     Jack   apple#orange#apple
1    Danny         orange#apple
2     Alex  apple#banana#banana
>>> df['Unique'] = df.fruit_bought.str.split('#').map(set).map(sorted).str.join('#')  # sets are unordered, so sort before joining
>>> df
  customer         fruit_bought        Unique
0     Jack   apple#orange#apple  apple#orange
1    Danny         orange#apple  apple#orange
2     Alex  apple#banana#banana  apple#banana
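A self-contained version of the session above, as a sketch: it adds .map(sorted) because Python sets have no guaranteed iteration order, so without it the joined string could come out as orange#apple instead of apple#orange. It also shows a swapped-in alternative that is not from the answer: str.get_dummies builds one 0/1 column per fruit (with the column names in sorted order), and multiplying that matrix by the names suffixed with '#' concatenates the names present in each row, since 1 * 'apple#' is 'apple#' and 0 * 'apple#' is ''. The column name Unique_alt is hypothetical. Note that map still calls a Python function once per row, so neither version is vectorized in the NumPy sense; I have not benchmarked them on 200,000 rows, so which is faster is something to measure on your data.

```python
import pandas as pd

mydict = {
    "customer": ["Jack", "Danny", "Alex"],
    "fruit_bought": ["apple#orange#apple", "orange#apple", "apple#banana#banana"],
}
df = pd.DataFrame(mydict)

# Approach from the answer, plus a sort: split -> dedupe with set -> sort -> join.
df["Unique"] = (
    df["fruit_bought"]
    .str.split("#")
    .map(set)
    .map(sorted)
    .str.join("#")
)

# Alternative (assumption: not from the original answer): one indicator column per
# fruit, then a matrix product with the "name#" labels concatenates the present names.
dummies = df["fruit_bought"].str.get_dummies(sep="#")
df["Unique_alt"] = dummies.dot(dummies.columns + "#").str.rstrip("#")

print(df)
```

Both columns should hold the same values here, because get_dummies orders its columns alphabetically, which matches the order produced by sorted.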