How to add a column with given values to a Pandas dataframe, grouped by another column?

Advertisements

I have two Pandas dataframes:

print(df_a)
   ID  irrelevant_value
0   1  1.2
1   1  2.3
2   1  0.9
3   1  1.1
4   2  2.7
5   2  3.1
6   3  1.3
7   3  0.2
8   3  2.3
...

and

    ID  add_these_values_to_the_same_ID
0   1   100
1   2   120
2   3   90
...

I would like to combine them, such that – desired result:

print(df_a)

    ID  irrelevant_value  add_these_values_to_the_same_ID
0   1   1.2               100
1   1   2.3               100
2   1   0.9               100
3   1   1.1               100
4   2   2.7               120
5   2   3.1               120
6   3   1.3               90
7   3   0.2               90
8   3   2.3               90
...

How can this be accomplished?

I have been struggling with df_a.groupby(["ID"]), but cannot find a way forward.

>Solution :

The groupby function is not needed here; instead, just use merge. As it says in those docs,

If both key columns contain rows where the key is a null value, those rows will be matched against each other. This is different from usual SQL join behaviour and can lead to unexpected results.

And both of your dataframes contain ID. Therefore, you can merge the dataframes with this code (this is just one way to do so with one method, but it does work):

import pandas as pd

# Example dataframes - these might be defined elsewhere in your situation
data_a = {'ID': [1, 1, 1, 1, 2, 2, 3, 3, 3],
          'irrelevant_value': [1.2, 2.3, 0.9, 1.1, 2.7, 3.1, 1.3, 0.2, 2.3]}
df_a = pd.DataFrame(data_a)

data_b = {'ID': [1, 2, 3],
          'add_these_values_to_the_same_ID': [100, 120, 90]}
df_b = pd.DataFrame(data_b)

# Merge dataframes on the 'ID' column
result = df_a.merge(df_b, on='ID')

print(result)

The really important line there is:

result = df_a.merge(df_b, on='ID')

And this will output:

   ID  irrelevant_value  add_these_values_to_the_same_ID
0   1               1.2                             100
1   1               2.3                             100
2   1               0.9                             100
3   1               1.1                             100
4   2               2.7                             120
5   2               3.1                             120
6   3               1.3                              90
7   3               0.2                              90
8   3               2.3                              90

This should work, but if your data does not work with this method, you can take a look at how:

how{‘left’, ‘right’, ‘outer’, ‘inner’, ‘cross’}, default inner
Type of merge to be performed.

  • left: use only keys from left frame, similar to a SQL left outer join; preserve key order.
  • right: use only keys from right frame, similar to a SQL right outer join; preserve key order.
  • outer: use union of keys from both frames, similar to a SQL full outer join; sort keys lexicographically.
  • inner: use intersection of keys from both frames, similar to a SQL inner join; preserve the order of the left keys.
  • cross: creates the cartesian product from both frames, preserves the order of the left keys.

outer might work if the code above does not.

Leave a ReplyCancel reply