Concatenate pandas column values based on common index

Input dataframe

A      B
n1     "joe,jack"
n2     "kelly,john"
n3     "adam,sam"
n1     "jack,frank"
n3     "rita"
n4     "steve, buck"
n2     "john, kelly, peter"

Based on index column A, I want to concatenate the text in column B, separated by commas (,). Any repeated name should appear only once, so the expected output would look like this:

A       B
n1      joe,jack,frank
n2      kelly,john,peter
n3      adam,sam,rita
n4      steve, buck
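For a reproducible example, the input frame above can be built as follows. This is a sketch: the strings are copied from the table with the surrounding quotes dropped, and the column names A and B are taken from the question.

```python
import pandas as pd

# Rebuild the question's input: column "A" is the grouping key,
# column "B" holds comma-separated names (some with a space after the comma)
df = pd.DataFrame({
    "A": ["n1", "n2", "n3", "n1", "n3", "n4", "n2"],
    "B": ["joe,jack", "kelly,john", "adam,sam", "jack,frank",
          "rita", "steve, buck", "john, kelly, peter"],
})
print(df)
```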

Solution:

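Use GroupBy.agg with a custom function that splits each string on commas, deduplicates the names with a set comprehension, and joins them back with commas. This works if order is not important: a set does not keep insertion order, so the order of names within each group is arbitrary and may differ from the output below: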

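import pandas as pd

# normalize ", " to ",", split every string in the group, dedupe with a set, join back
f = lambda x: ','.join({z for y in x for z in y.replace(', ',',').split(',')})
df = df.groupby('A')['B'].agg(f).reset_index()
print(df)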
    A                 B
0  n1    jack,joe,frank
1  n2  john,kelly,peter
2  n3     adam,rita,sam
3  n4        steve,buck
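If the original order does not matter but a stable, repeatable result does, one option is to sort the deduplicated names before joining. The sketch below is a variation on the answer, not part of it; the helper name join_unique_sorted is hypothetical, and it strips whitespace around each name instead of replacing ", " only.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["n1", "n2", "n3", "n1", "n3", "n4", "n2"],
    "B": ["joe,jack", "kelly,john", "adam,sam", "jack,frank",
          "rita", "steve, buck", "john, kelly, peter"],
})

def join_unique_sorted(values):
    # split each comma-separated string, strip surrounding spaces,
    # deduplicate with a set, then sort so the result is deterministic
    names = {name.strip() for s in values for name in s.split(",")}
    return ",".join(sorted(names))

out = df.groupby("A")["B"].agg(join_unique_sorted).reset_index()
print(out)
```

Sorting gives alphabetical order (for example "frank,jack,joe" for n1), which is repeatable but is not the order in which the names first appeared.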

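If order is important, remove the duplicates with the dict.fromkeys trick instead. Dicts keep insertion order (guaranteed since Python 3.7), so the first occurrence of each name wins: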

# dict.fromkeys drops duplicate names but keeps the order of first appearance
f = lambda x: ','.join(dict.fromkeys([z for y in x for z in y.replace(', ',',').split(',')]))
df = df.groupby('A')['B'].agg(f).reset_index()
print(df)
    A                 B
0  n1    joe,jack,frank
1  n2  kelly,john,peter
2  n3     adam,sam,rita
3  n4        steve,buck
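An alternative that stays within pandas methods is to split the strings into lists, explode them to one name per row, drop duplicate (A, B) pairs, and join per group. This is a different technique from the answer above, sketched under the assumption of pandas 0.25 or later (where Series/DataFrame.explode exists) and no missing values in B; it also trims any amount of whitespace around each name.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["n1", "n2", "n3", "n1", "n3", "n4", "n2"],
    "B": ["joe,jack", "kelly,john", "adam,sam", "jack,frank",
          "rita", "steve, buck", "john, kelly, peter"],
})

out = (
    df.assign(B=df["B"].str.split(","))           # each cell becomes a list of names
      .explode("B")                               # one name per row, original row order kept
      .assign(B=lambda d: d["B"].str.strip())     # trim spaces around each name
      .drop_duplicates(subset=["A", "B"])         # keep the first occurrence within each group
      .groupby("A")["B"]
      .agg(",".join)                              # join the remaining names per group
      .reset_index()
)
print(out)
```

Because groupby keeps the row order inside each group, this matches the order-preserving result of the dict.fromkeys version.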