Create dictionary with pairs from column from pandas dataframe using regex

March 23, 2023

I have the following dataframe

import numpy as np
import pandas as pd

df = pd.DataFrame({'Original': [92, 93, 94, 95, 100, 101, 102],
                   'Sub_90': [99, 98, 99, 100, 102, 101, np.nan],
                   'Sub_80': [99, 98, 99, 100, 102, np.nan, np.nan],
                   'Gen_90': [99, 98, 99, 100, 102, 101, 101],
                   'Gen_80': [99, 98, 99, 100, 102, 101, 100]})

I would like to create the following dictionary

{
    'Gen_90': 'Original',
    'Sub_90': 'Gen_90',
    'Gen_80': 'Original',
    'Sub_80': 'Gen_80',
 }

using regex, because in my original data I also have Gen_70, Gen_60, ... , Gen_10 and Sub_70, Sub_60, ... , Sub_10.

So I would like to pair each Sub column with the Gen column that has the same _number, and also pair each Gen column with Original.
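The pairing rule described above can be expressed with Python's standard `re` module. This is a minimal sketch, assuming the column names are plain strings (for example `list(df.columns)`) and always follow the `Gen_<number>` / `Sub_<number>` pattern; the helper name `build_pairs` is hypothetical. A `Sub_<number>` column with no matching `Gen_<number>` is simply skipped.

```python
import re

# Matches names such as "Gen_90" or "Sub_10"; group 1 is the prefix, group 2 the number.
PATTERN = re.compile(r"(Gen|Sub)_(\d+)")

def build_pairs(columns):
    """Hypothetical helper: map each Gen_<n> to 'Original' and each Sub_<n> to Gen_<n>."""
    gens = {}  # number -> Gen column name
    subs = {}  # number -> Sub column name
    for col in columns:
        m = PATTERN.fullmatch(col)
        if m is None:
            continue  # e.g. "Original" or any unrelated column
        prefix, num = m.groups()
        (gens if prefix == "Gen" else subs)[num] = col
    pairs = {}
    for num, gen in gens.items():
        pairs[gen] = "Original"
        if num in subs:  # Sub_<n> without a Gen_<n> is left out
            pairs[subs[num]] = gen
    return pairs

cols = ["Original", "Sub_90", "Sub_80", "Gen_90", "Gen_80"]
print(build_pairs(cols))
# {'Gen_90': 'Original', 'Sub_90': 'Gen_90', 'Gen_80': 'Original', 'Sub_80': 'Gen_80'}
```

Because the pairs are keyed on the captured number rather than on position, adding Gen_70 ... Gen_10 and Sub_70 ... Sub_10 needs no change to the code.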

How can I do that?

Solution:

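You can do this with `DataFrame.filter(like=...)`, which selects columns by plain substring match rather than by regex: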

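# Select the Gen_* and Sub_* column names (substring match, not regex)
gen_cols = df.filter(like='Gen_').columns
sub_cols = df.filter(like='Sub_').columns

# Pair Sub_N with Gen_N; sorting aligns them only if both sets share the same numbers
d = dict(zip(sorted(sub_cols), sorted(gen_cols)))

# Map every Gen_N to 'Original'
d.update({g: 'Original' for g in gen_cols})
print(d)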

{'Sub_80': 'Gen_80',
 'Sub_90': 'Gen_90',
 'Gen_90': 'Original',
 'Gen_80': 'Original'}
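One caveat with `zip(sorted(...), sorted(...))`: it pairs columns by position, so it is only correct when every Sub_N has a Gen_N and vice versa. The sketch below uses hypothetical column lists in which Gen_70 exists but Sub_70 does not, and compares positional pairing with pairing on the numeric suffix extracted by a regex (the `suffix` helper is illustrative, not part of the answer).

```python
import re

gen_cols = ["Gen_90", "Gen_80", "Gen_70"]
sub_cols = ["Sub_90", "Sub_80"]  # hypothetical data: Sub_70 is missing

# Positional pairing, as in the answer above
by_position = dict(zip(sorted(sub_cols), sorted(gen_cols)))
print(by_position)  # {'Sub_80': 'Gen_70', 'Sub_90': 'Gen_80'}  (mispaired)

def suffix(name):
    """Return the digits after the final underscore, e.g. 'Gen_90' -> '90'."""
    return re.search(r"_(\d+)$", name).group(1)

# Pairing on the shared number instead of on sorted position
gen_by_num = {suffix(g): g for g in gen_cols}
by_suffix = {s: gen_by_num[suffix(s)] for s in sub_cols if suffix(s) in gen_by_num}
print(by_suffix)  # {'Sub_90': 'Gen_90', 'Sub_80': 'Gen_80'}
```

With the question's data, where the Sub and Gen numbers match one-to-one, both approaches give the same pairs.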