Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Aggregate and concatenate multiple columns

I want to groupby my dataframe and concatenate the values/strings from the other columns together.

   Year Letter  Number  Note   Text
0  2022      a       1     8     hi
1  2022      b       1     7  hello
2  2022      a       1     6    bye
3  2022      b       3     5    joe

To this:

             Column
Year Letter              
2022 a            1|8|hi; 1|6|bye
     b            1|7|hello; 3|5|joe

I tried some things with groupby, apply() and agg() but I can’t get it work:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df.groupby(['Year', 'Letter']).agg(lambda x: '|'.join(x))

Output:

                  Text
Year Letter           
2022 a          hi|bye
     b       hello|joe

>Solution :

You can first join values per rows converted to strings by DataFrame.astype and DataFrame.agg and then aggregate join in GroupBy.agg:

df1 = (df.assign(Text= df[['Number','Note','Text']].astype(str).agg('|'.join, axis=1))
         .groupby(['Year', 'Letter'])['Text']
         .agg('; '.join)
         .to_frame())
print (df1)
                           Text
Year Letter                    
2022 a          1|8|hi; 1|6|bye
     b       1|7|hello; 3|5|joe

Or create custom lambda function in GroupBy.apply:

f = lambda x:  '; '.join('|'.join(y) for y in x.astype(str).to_numpy())
df1 = (df.groupby(['Year', 'Letter'])[['Number','Note','Text']].apply(f)
         .to_frame(name='Text')
         )
print (df1)
                           Text
Year Letter                    
2022 a          1|8|hi; 1|6|bye
     b       1|7|hello; 3|5|joe

If need join all columns without grouping columns:

grouped = ['Year','Letter']

df1 = (df.assign(Text= df[df.columns.difference(grouped, sort=False)]
                .astype(str).agg('|'.join, axis=1))
         .groupby(['Year', 'Letter'])['Text']
         .agg('; '.join)
         .to_frame())

grouped = ['Year','Letter']

f = lambda x:  '; '.join('|'.join(y) for y in x.astype(str).to_numpy())
df1 = (df.groupby(grouped)[df.columns.difference(grouped, sort=False)].apply(f)
        .to_frame(name='Text')
         )
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading