Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Merge two dataframes based on new column(s) values

I have two Dataframes

df1 = pd.DataFrame(
    {
        "A": ["1", "3", "22", "43"],
        "B": ["6", "19", "4", "31"],
        "C": ["47", "15", "8", "19"],
    },
    index=[0, 1, 2, 3],
)


df2 = pd.DataFrame(
    {
        "A": ["65", "47", "6", "13"],
        "B": ["29", "5", "2", "21"],
        "C": ["69", "9", "11", "80"],
    },
    index=[4, 5, 6, 7],
)

By using pandas, the final result should be:

    A   B   C   Ti  ID
0   1   6   47  am  01
1   3   19  15  am  01
2   22  4   8   am  01
3   43  31  19  am  01
4   65  29  69  pm  01
5   47  5   9   pm  01
6   6   2   11  pm  01
7   13  21  80  pm  01

I went through Pandas Documentation, and I am trying to merge these two Dataframes by using pd.concat. The code is:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

new_df = pd.concat([df1, df2], keys=['am', 'pm']).reset_index()

However, the new Dataframe came out with an extra column level_1 that I don’t want it to be there:

    level_0  level_1  A    B    C
0   am       0        1    6    47
1   am       1        3    19   15
2   am       2        22   4    8
3   am       3        43   31   19
4   pm       4        65   29   69
5   pm       5        47   5    9
6   pm       6        6    2    11
7   pm       7        13   21   80

I know reset_index() created that unwanted column. But why?

What else to do to get the same final Dataframe using pandas?

>Solution :

After you concatenate the DataFrames,

new_df = pd.concat([df1, df2], keys=['am', 'pm'])

new_df looks like

       A   B   C
am 0   1   6  47
   1   3  19  15
   2  22   4   8
   3  43  31  19
pm 4  65  29  69
   5  47   5   9
   6   6   2  11
   7  13  21  80
    

If we look at the index new_df.index, it’s a MultiIndex where the first level is the keys and second level is the old index:

MultiIndex([('am', 0),
            ('am', 1),
            ('am', 2),
            ('am', 3),
            ('pm', 4),
            ('pm', 5),
            ('pm', 6),
            ('pm', 7)],
           )

Then first we can rename the MultiIndex levels using rename_axis, and reset_index but only remove the first level from the index (which then becomes a column with its name). Note that by default, reset_index removes all levels from the index. That’s why you see level_0 and level_1 columns added after reset_index.

new_df = new_df.rename_axis(['Ti', None]).reset_index(level=0)

You can rearrange the columns by reassigning the DataFrame with a list of the columns with the desired order.

cols = new_df.columns.tolist()
new_df = new_df[cols[1:]+[cols[0]]]
new_df['ID'] = '01'

Output:

    A   B   C  Ti  ID
0   1   6  47  am  01
1   3  19  15  am  01
2  22   4   8  am  01
3  43  31  19  am  01
4  65  29  69  pm  01
5  47   5   9  pm  01
6   6   2  11  pm  01
7  13  21  80  pm  01
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading