Home Merge pandas datasets with different indices and columns

Questions

Merge pandas datasets with different indices and columns

July 26, 2023

I’m fairly sure I’m missing a version of this question elsewhere on the site, but cannot for the life of me find it. I can’t be the first person to have this issue.

In essence, I’m looking for a function that would merge pandas Dataframes in the equivalent way that a nested version of dict.merge would. If A and B are dictionaries of dictionaries, then we’d get C = {k: A.get(k, {}) | B.get(k, {}) for k in set().union(*[A, B])}

Concretely, I have two Pandas datasets A and B, with different sets of indices and columns (but not necessarily disjoint).

I would like a dataset C, whose indices are the union of A and B’s indices and whose columns are the union of A and B’s columns. The values are taken from B, if B has a non-NaN value for that particular index/column, otherwise it uses A.

I want a function SuperMerge: C = A.SuperMerge(B). Assume the dataframes have some method df.get(i, c, default=pd.NA) that returns df[i, c] if it exists otherwise it returns default.

SuperMerge works like this:

C[i,c] = B.get([i, c]) if not B.get([i, c]).isna() else A.get([i, c])

Based on the lack of questions that address this, and the lack of Pandas functionality, I assume this is extremely rare use-case, but it’s effectively how Python’s dict update works: the updated dictionary now has the union of keys from both dicts, taking values from the right dict if available else it takes the one from the left dict.

For example:

df1 = pd.DataFrame({'col1': ['a', 'b'], 'col2': [1, 2], 'col4': ['col4_x', 'col4_y']}, index=['x', 'y'])

  col1  col2    col4
x    a     1  col4_x
y    b     2  col4_y

df2 = pd.DataFrame({'col1': ['ALT', 'c'], 'col3': ['ONE', 'THREE'], 'col4': [pd.NA, 'col4_z']}, index=['x', 'z'])

  col1   col3    col4
x  ALT    ONE    <NA>
z    c  THREE  col4_z

df1.SuperMerge(df2)

  col1  col2   col3    col4
x  ALT     1    ONE  col4_x
y    b     2   <NA>  col4_y
z    c  <NA>  THREE  col4_z

>Solution :

Supermerge is essentially combine_first.

From the docs:

Combine two DataFrame objects by filling null values in one DataFrame
with non-null values from other DataFrame. The row and column indexes
of the resulting DataFrame will be the union of the two. The resulting
dataframe contains the ‘first’ dataframe values and overrides the
second one values where both first.loc[index, col] and
second.loc[index, col] are not missing values, upon calling
first.combine_first(second).

df2.combine_first(df1)

  col1  col2   col3    col4
x  ALT   1.0    ONE  col4_x
y    b   2.0    NaN  col4_y
z    c   NaN  THREE  col4_z

dataframe

byMR

Published July 26, 2023

Add a comment

How to remove specific character except for the first one using regex in JavaScript?

byMR

July 26, 2023

Questions

Linq extension methods are not using Equals of Object class

byMR

July 26, 2023

Questions

VBA finish loop if adjacing cells are empty

byMR

July 26, 2023

Questions

Excel VBA Copy sheet. Rename it and add it

byMR

July 26, 2023

Questions

Flatten a complex nested having a common column JSON using jolt transform

byMR

July 26, 2023

Questions

Visual Studio 2022 pre-release nuget not used

byMR

July 26, 2023

Merge pandas datasets with different indices and columns

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Like this:

Leave a ReplyCancel reply

Read more

How to remove specific character except for the first one using regex in JavaScript?

Linq extension methods are not using Equals of Object class

VBA finish loop if adjacing cells are empty

Excel VBA Copy sheet. Rename it and add it

Flatten a complex nested having a common column JSON using jolt transform

Visual Studio 2022 pre-release nuget not used

Keep Up to Date with the Most Important News

Merge pandas datasets with different indices and columns

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Share this:

Like this:

Leave a ReplyCancel reply

Keep Up to Date with the Most Important News

Read more

How to remove specific character except for the first one using regex in JavaScript?

Linq extension methods are not using Equals of Object class

VBA finish loop if adjacing cells are empty

Excel VBA Copy sheet. Rename it and add it

Flatten a complex nested having a common column JSON using jolt transform

Visual Studio 2022 pre-release nuget not used

Discover more from Dev solutions