Follow

Follow

Contact

Home How to merge multiple columns of a dataframe using regex？

Questions

How to merge multiple columns of a dataframe using regex？

byMR

February 3, 2023

I have a df which as following

import pandas as pd

df = pd.DataFrame(
    {'number_C1_E1': ['1', '2', None, None, '5', '6', '7', '8'],
     'fruit_C11_E1': ['apple', 'banana', None, None, 'watermelon', 'peach', 'orange', 'lemon'],
     'name_C111_E1': ['tom', 'jerry', None, None, 'paul', 'edward', 'reggie', 'nicholas'],
     'number_C2_E2': [None, None, '3', None, None, None, None, None],
     'fruit_C22_E2': [None, None, 'blueberry', None, None, None, None, None],
     'name_C222_E2': [None, None, 'anthony', None, None, None, None, None],
     'number_C3_E1': [None, None, '3', '4', None, None, None, None],
     'fruit_C33_E1': [None, None, 'blueberry', 'strawberry', None, None, None, None],
     'name_C333_E1': [None, None, 'anthony', 'terry', None, None, None, None],
     }
)

Here what I want to do is combine those columns and we have two rules:

If a column removes _C{0~9} or _C{0~9}{0~9} or _C{0~9}{0~9}{0~9} is equal to another column, these two columns can be combined.

Let’s take number_C1_E1 number_C2_E2 number_C3_E1 as an example, here number_C1_E1 and number_C3_E1 can be combined because they are both number_E1 after removing _C{0~9}.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.
Visit Medevel

The two combined columns should get rid of the None values.

The desired result is

  number_C1_1_E1 fruit_C11_1_E1 name_C111_1_E1 number_C2_1_E2 fruit_C22_1_E2 name_C222_1_E2
0              1          apple            tom           None           None           None
1              2         banana          jerry           None           None           None
2              3      blueberry        anthony              3      blueberry        anthony
3              4     strawberry          terry           None           None           None
4              5     watermelon           paul           None           None           None
5              6          peach         edward           None           None           None
6              7         orange         reggie           None           None           None
7              8          lemon       nicholas           None           None           None

Anyone has a good solution?

>Solution :

Use the same as your previous question, but also compute a renamer for your columns:

group = df.columns.str.replace(r'_C\d+', '', regex=True)

names = df.columns.to_series().groupby(group).first()

out = (df.groupby(group, axis=1, sort=False).first()
         .rename(columns=names)
       )

Output:

  number_C1_E1 fruit_C11_E1 name_C111_E1 number_C2_E2 fruit_C22_E2 name_C222_E2
0            1        apple          tom         None         None         None
1            2       banana        jerry         None         None         None
2            3    blueberry      anthony            3    blueberry      anthony
3            4   strawberry        terry         None         None         None
4            5   watermelon         paul         None         None         None
5            6        peach       edward         None         None         None
6            7       orange       reggie         None         None         None
7            8        lemon     nicholas         None         None         None

pandas

byMR

Published February 03, 2023

Add a comment

Leave a ReplyCancel reply

Read more

Questions

How to show all 3 urls are null in following table in postgresql/sql

byMR

February 3, 2023

Questions

Adding NOT NULL constraint to one SELECT Column

byMR

February 3, 2023

Questions

how to play gif whenever button is clicked in javascript modal

byMR

February 3, 2023

Questions

react-router-dom doesn't redirect to the linked page, instead it shows the content of the linked page below the menu | ReactJS

byMR

February 3, 2023

Questions

How to subtract date-time from current date-time in SQL?

byMR

February 3, 2023

Questions

Unusual names from across() when using scale() as the .fns

byMR

February 3, 2023